This is an interinstitutional collaborative project. Combinatorial data consisting of sequences, trees, and graphs arise in many scientific disciplines. For example, the primary structure of a protein is a sequence, whereas its tertiary structure is a graph. Comparing such data to find similarities entails the use of a "distance metric" that measures the difference between two data items. Numerous distance metrics are possible. This work consists primarily of (i) inventing efficient ways to compute known distance metrics; (ii) developing a data structure to decide which of a set of data items is "closest" (according to a given distance metric) to a new data item; (iii) developing techniques and software for discovering patterns with minimum or near-minimum distance to a given set of data items with respect to a given distance metric; and (iv) developing software to solve such discovery problems on networks of occasionally idle workstations. This work will help every field in which approximate matching is important. Significant applications are expected in molecular biology and rational drug design, as well as in finding patterns in linguistic strings.
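
As a rough illustration of items (i) and (ii) in the abstract, the sketch below computes one well-known distance metric on sequences and uses it to pick the closest item in a set. The abstract does not name a specific metric or data structure, so the choice of Levenshtein edit distance, the linear scan, and the names `edit_distance` and `closest` are all assumptions made for illustration, not the project's methods.

```python
from typing import Callable, Iterable


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two sequences (illustrative metric only).

    Uses the standard dynamic program, keeping just two rows of the table.
    """
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def closest(items: Iterable[str], query: str,
            distance: Callable[[str, str], int] = edit_distance) -> str:
    """Return the item nearest to query under the given metric.

    A plain linear scan (hypothetical baseline): it calls the metric once
    per item, which is the cost a dedicated data structure would aim to cut.
    """
    return min(items, key=lambda item: distance(item, query))


if __name__ == "__main__":
    database = ["GATTACA", "CCCC", "ACGT"]
    print(closest(database, "ACGA"))
```

The metric is passed in as a parameter, so the same scan would work for a tree or graph distance, provided one is supplied.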

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Application #: 9531548
Program Officer: Maria Zemankova
Project Start:
Project End:
Budget Start: 1996-08-01
Budget End: 2000-01-31
Support Year:
Fiscal Year: 1995
Total Cost: $207,532
Indirect Cost:
Name: Rutgers University
Department:
Type:
DUNS #:
City: Newark
State: NJ
Country: United States
Zip Code: 07102