This project continues work, begun at Washington University in St. Louis, on a method for detecting motifs in protein sequences. The basic method, which uses a depth-first strategy to search for statistically significant patterns, was extended by A. enhancing the basic algorithm and B. incorporating additional procedures to process its output.
A. Enhancing the basic algorithm. First, the depth-first search was made about ten times faster. Second, sensitivity to subtle motifs was improved by including in the search patterns that contain pairs of related residues (the original method searched only for patterns of single residues). A further modification, which allows a similarity scoring matrix to be incorporated, was also developed. Third, the statistical method for estimating pattern p-values was improved.
B. Incorporating additional procedures. Two new procedures were developed that take the output of the depth-first search as input and attempt to identify protein regions sharing a common motif. Because a search can yield a large number of related patterns, the first procedure groups these patterns and identifies the regions that match them. However, some regions may share significant similarity with a motif without matching any statistically significant pattern, and, conversely, regions with no significant similarity to a motif may match a pattern by chance. The second procedure attempts to correct for both effects in order to identify the regions most likely to share a common motif.
The significance of these methods lies in their ability to find subtle but significant motifs that are not detectable by other currently available methods. Because motifs usually correspond to structurally and functionally conserved regions, detecting them can aid experimentalists in protein characterization and classification.
A manuscript describing these methods and their application to several very difficult problems will soon be submitted for publication.
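The abstract does not specify the pattern language or the statistics used, so the following is only a rough Python sketch of how a depth-first pattern search of this general kind could work, not the project's method. It assumes gap-free patterns whose positions are either a single residue or one of a hypothetical set of "related" residue pairs (`RELATED_PAIRS`), counts support as the number of sequences containing a match, and approximates significance with a binomial tail under an independent-residue background, with no correction for the many patterns tested. All names (`dfs_patterns`, `matches`, `render`) are illustrative.

```python
from collections import Counter
from math import comb

# Hypothetical groups of "related" residues; the project's actual pairs are not given.
RELATED_PAIRS = ["IV", "IL", "LV", "FY", "DE", "KR", "ST", "NQ"]


def render(pattern):
    """Show a pattern PROSITE-style, e.g. [IV]-V-D."""
    return "-".join(
        next(iter(pos)) if len(pos) == 1 else "[" + "".join(sorted(pos)) + "]"
        for pos in pattern
    )


def matches(seq, pattern):
    """True if the gap-free pattern (a list of residue sets) occurs in seq."""
    m = len(pattern)
    return any(
        all(seq[i + j] in pattern[j] for j in range(m))
        for i in range(len(seq) - m + 1)
    )


def binomial_tail(n, k, q):
    """P(X >= k) for X ~ Binomial(n, q)."""
    return sum(comb(n, i) * q**i * (1 - q) ** (n - i) for i in range(k, n + 1))


def dfs_patterns(seqs, min_support, max_len, alpha):
    """Depth-first enumeration of patterns found in >= min_support sequences."""
    counts = Counter("".join(seqs))
    total = sum(counts.values())
    freqs = {a: c / total for a, c in counts.items()}  # background frequencies
    symbols = [frozenset(a) for a in sorted(counts)]
    symbols += [frozenset(p) for p in RELATED_PAIRS]
    n = len(seqs)
    mean_len = total / n
    results = []

    def extend(pattern, hits):
        for sym in symbols:
            new = pattern + [sym]
            # Only sequences that matched the shorter pattern can match the longer one.
            new_hits = [s for s in hits if matches(s, new)]
            if len(new_hits) < min_support:
                continue  # prune: extensions of this pattern cannot gain support
            # Chance of a match at one fixed site, assuming independent residues.
            p_site = 1.0
            for pos in new:
                p_site *= sum(freqs.get(a, 0.0) for a in pos)
            windows = max(mean_len - len(new) + 1, 0)
            q = 1 - (1 - p_site) ** windows  # approx. chance a sequence matches
            pvalue = binomial_tail(n, len(new_hits), q)
            if pvalue < alpha:
                results.append((render(new), len(new_hits), pvalue))
            if len(new) < max_len:
                extend(new, new_hits)

    extend([], list(seqs))
    return sorted(results, key=lambda r: r[2])


if __name__ == "__main__":
    seqs = ["MKIVDEAG", "AKVVDEGG", "QKIVDEPL", "STSTSTST"]
    for pat, k, p in dfs_patterns(seqs, min_support=3, max_len=3, alpha=0.01):
        print(pat, k, f"{p:.2g}")
```

The depth-first strategy is practical because extending a pattern can only keep or shrink the set of sequences it matches, so a branch can be abandoned as soon as its support drops below the threshold. The grouping and region-refinement procedures described above are not sketched here.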

Agency: National Institutes of Health (NIH)
Institute: National Library of Medicine (NLM)
Type: Intramural Research (Z01)
Project #: 1Z01LM000052-01
Application #: 3781284
Support Year: 1
Fiscal Year: 1993
Name: National Library of Medicine
Country: United States