This is a continuation of a project to develop an efficient depth-first search algorithm for detecting patterns in molecular sequences (see last year's annual report). The method has been enhanced in three ways: (i) a rigorous statistical test has been added to the procedure for grouping related patterns and identifying regions that share a common motif; (ii) the method has been modified to allow DNA and RNA pattern searches; and (iii) a procedure has been developed to down-weight redundant sequences (that is, sequences sharing significant similarity with other sequences in the input set). In addition, the method has been incorporated into a new procedure, called PASS, for the semiautomatic construction of a protein motif database. The basic method has been implemented in a C program called ASSET, which has been made available via anonymous FTP. The utility of the program was demonstrated on several difficult test problems, including detection of the dinucleotide-binding fold present in only a small subset of a set of 91 distantly related and unrelated proteins, detection of the helix-turn-helix motif in 15 distantly related DNA-binding proteins, and the discovery of novel ankyrin-like repeats in a bacterial protein. The significance of the project lies in the development of a fast and sensitive pattern detection method as an aid to the characterization and classification of protein and nucleic acid sequences.
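
The abstract does not describe ASSET's search in detail, so the following is only a minimal sketch of the general idea of depth-first pattern discovery, under simplifying assumptions: a pattern is an exact, contiguous residue string, and a pattern is kept only if it occurs in at least `min_seqs` input sequences. The function name `dfs_patterns` and its parameters are hypothetical, and ASSET's actual pattern form, statistical test, and sequence weighting are not reproduced here. The key point is that support can only shrink as a pattern is extended, so any branch that falls below the threshold can be pruned without missing longer patterns.

```python
from collections import defaultdict


def dfs_patterns(seqs, min_seqs, min_len, max_len):
    """Hypothetical sketch: enumerate exact contiguous patterns of length
    min_len..max_len that occur in at least min_seqs of the input sequences.
    Returns {pattern: number of sequences containing it}."""
    if min_len < 1:
        raise ValueError("min_len must be at least 1")
    results = {}

    def support(occ):
        # Number of distinct sequences with at least one occurrence.
        return len({s for s, _ in occ})

    def extend(pattern, occ):
        # occ holds (sequence index, start position) for every match of pattern.
        if len(pattern) >= min_len:
            results[pattern] = support(occ)
        if len(pattern) == max_len:
            return
        # Group occurrences by the residue that follows the current match.
        children = defaultdict(list)
        for s, start in occ:
            end = start + len(pattern)
            if end < len(seqs[s]):
                children[seqs[s][end]].append((s, start))
        for residue in sorted(children):
            child_occ = children[residue]
            # Support never grows when a pattern is extended, so prune here.
            if support(child_occ) >= min_seqs:
                extend(pattern + residue, child_occ)

    # The empty pattern matches at every start position of every sequence.
    root = [(s, i) for s, seq in enumerate(seqs) for i in range(len(seq))]
    extend("", root)
    return results


if __name__ == "__main__":
    toy = ["GKSTAL", "AGKSTV", "MGKTST"]  # made-up toy sequences
    print(dfs_patterns(toy, min_seqs=3, min_len=2, max_len=4))
```

On the toy input, the only dipeptides shared by all three sequences are `GK` and `ST`, so the script prints `{'GK': 3, 'ST': 3}`. A realistic tool would also need spaced or degenerate positions and a significance test rather than a raw count, which this sketch deliberately omits.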

Agency: National Institutes of Health (NIH)
Institute: National Library of Medicine (NLM)
Type: Intramural Research (Z01)
Project #: 1Z01LM000052-02
Application #: 3759320
Support Year: 2
Fiscal Year: 1994
Organization: National Library of Medicine
Country: United States