9224602 Shasha SDB: Discovering Motifs in Scientific Databases

This is the first-year funding of a three-year continuing award. This research is carried out in collaboration with Bruce Shapiro of the National Institutes of Health. Scientific progress often results from discovering structural commonalities that explain similar behavior. For example, in molecular biology, a set of proteins or DNA sequences may exhibit similar function in nature. This project aims to help scientists discover the common sequence or topological patterns that explain such similarity. Pattern discovery entails systematically generating candidate patterns and testing them. The tests are based on approximate pattern matching algorithms that yield distance metrics, so the commonalities found may be approximate rather than exact. The main research milestones are a family of algorithms covering pattern discovery, query processing, data organization, and index manipulation. The algorithms are to be tested on data drawn from the National Institutes of Health and from public genome databases. Whereas some of the algorithms are specific to the combinatorial structures present in biology, many of the techniques should generalize to any application that seeks to find patterns in databases. This project will help scientists discover, in large databases, patterns that determine natural behavior. Such patterns may lead to new drug designs or new treatments.
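
The generate-and-test approach described above can be illustrated with a minimal sketch. The abstract does not specify the project's algorithms, so everything here is an assumption made for illustration: candidate patterns are taken to be fixed-length substrings of the input sequences, the distance is the smallest edit distance between a candidate and any substring of a sequence, and the names `approx_match_distance`, `candidate_patterns`, `discover_motifs`, `max_distance`, and `min_support` are hypothetical.

```python
from typing import Iterator, List, Tuple


def approx_match_distance(pattern: str, text: str) -> int:
    """Smallest edit distance between `pattern` and any substring of `text`."""
    # Row 0 is all zeros because a match may start anywhere in `text`.
    prev = [0] * (len(text) + 1)
    for i, p in enumerate(pattern, start=1):
        curr = [i] + [0] * len(text)
        for j, t in enumerate(text, start=1):
            curr[j] = min(
                prev[j] + 1,             # drop p from the pattern
                curr[j - 1] + 1,         # absorb an extra character t
                prev[j - 1] + (p != t),  # match (cost 0) or substitute (cost 1)
            )
        prev = curr
    # Taking the minimum lets the match end anywhere in `text`.
    return min(prev)


def candidate_patterns(sequences: List[str], length: int) -> Iterator[str]:
    """Generate pattern guesses: every distinct substring of the given length."""
    seen = set()
    for seq in sequences:
        for start in range(len(seq) - length + 1):
            cand = seq[start:start + length]
            if cand not in seen:
                seen.add(cand)
                yield cand


def discover_motifs(
    sequences: List[str], length: int, max_distance: int, min_support: int
) -> List[Tuple[str, int]]:
    """Test each guess; keep those that approximately occur in enough sequences."""
    results = []
    for cand in candidate_patterns(sequences, length):
        support = sum(
            approx_match_distance(cand, seq) <= max_distance for seq in sequences
        )
        if support >= min_support:
            results.append((cand, support))
    # Highest support first, ties broken alphabetically.
    results.sort(key=lambda item: (-item[1], item[0]))
    return results


if __name__ == "__main__":
    toy_sequences = ["GATTACA", "CATTAGA", "TTTTACC"]  # made-up data
    for motif, support in discover_motifs(toy_sequences, 4, 1, 3):
        print(motif, support)
```

This exhaustive version tests every candidate against every sequence, which is practical only for small inputs. The query processing, data organization, and index manipulation work named among the milestones would presumably address that cost at database scale, but the sketch makes no attempt to model it.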