PI: Xindong Wu; Co-PIs: Abdullah N. Arslan and Xingquan Zhu U of Vermont & State Agricultural College
Pattern Matching with Wildcards and Length Constraints
This research defines a unique problem of pattern matching with wildcards and length constraints, and aims to design efficient algorithms for the problem. Given a pattern P and a text T, a substring S in T is a matching string of P if (1) the number of wildcards between each two consecutive pattern letters in S and (2) the length of S are both bounded by the user's specifications. The project seeks to find the maximum number of ``distinct'' occurrences of P in T. This is a complex problem that integrates both local constraints (in the form of gaps between consecutive pattern letters) and global length constraints in pattern matching.
The research team will start with an existing preliminary design and further investigate the pattern matching problem, by (1) exploring the time complexity of the problem, (2) designing new, efficient algorithms to deal with some special cases, and (3) applying these efficient algorithms in practical problems in text indexing, gene sequence analysis, network security and stream data mining.