The goal of this research project is to make it possible to process approximate queries on combinatorial structures such as trees and graphs in a fast (i.e., sublinear with respect to the number of objects being searched) and efficient manner. The ultimate goal is to be as fast as approximate keyword search in text documents. The approach consists of: (1) discovering the best heuristics for answering these NP-complete problems in polynomial time with high quality, and (2) discovering data structures to make queries on thousands or millions of such objects fast. The results of this project will provide techniques and software to search for patterns among graph and tree structured data. Possible applications of such a search engine include searches among proteins, chemical compounds, neuroanatomical structures, Web/text filters and XML documents. The techniques developed in this research are in particular suitable for bioinformatics and biocomputing applications. Professors Shasha and Wang plan to collaborate with five additional researchers in these areas: Jack Collins, National Cancer Institute working in small molecule-protein docking for drug design; Michael Donohue, Professor of Biology and Director of the Harvard University Herbaria planning to apply this research to phylogenic trees; Bruce Shapiro, National Cancer Institute developing algorithms and computational systems for determining structure/function of nucleic acids; Cathy Wu, National Biomedical Research Foundation doing research in analysis and classification of protein sequences; and Daniel Zaharevitz, National Cancer Institute whose goal is to make biological and structural data more available to the research community via better search capabilities. These collaborators help to motivate and validate the tree and graph matching tools and algorithms development for biological applications. www.cis.njit.edu/~jason/sigmod.html

Agency
National Science Foundation (NSF)
Institute
Division of Information and Intelligent Systems (IIS)
Application #
9988636
Program Officer
Maria Zemankova
Project Start
Project End
Budget Start
2000-09-15
Budget End
2004-08-31
Support Year
Fiscal Year
1999
Total Cost
$194,656
Indirect Cost
Name
Rutgers University
Department
Type
DUNS #
City
Newark
State
NJ
Country
United States
Zip Code
07102