The proposed work focuses on searching large nucleic acid sequence databases for functionally, structurally, and evolutionarily significant patterns. There is a need to develop efficient and biologically relevant strategies for defining higher-order patterns and for using the defined pattern as a basis for searches through nucleic acid sequence databases (e.g., the GenBank database).

We will design and install on the Cray computer an improved set of algorithms that will: (1) use a hash function to rapidly scan large databases for evidence of local similarity; (2) use a rigorous measure to find and score the best local alignments; (3) dynamically filter significant scores; and (4) store the results of the scan in a dynamically maintained database of similarity scores. The increased speed and reduced cost of these algorithms will allow us to examine in greater detail variations in measures of similarity (on both nucleic acid and protein levels); the programs will be well suited to the large-scale sequence comparison projects being planned for the Crays becoming available to the molecular biology research community.

We will develop an improved method for recognizing potential donor and acceptor sites for intron sequences. We will re-examine the notion of "consensus sequence" donor and acceptor patterns as well as define additional attributes that distinguish these patterns. These measures will be combined through discriminant analysis to create an algorithm that predicts whether or not a potential donor or acceptor site is functional.

We will develop a general approach for scanning large databases for structural patterns, initially in the context of a search for potential curved DNA sequences. First we will develop useful measures for gauging curvature in DNA, and then test the predictions of the Tung-Harvey model by comparison with experimental results for curved DNA. With the established measures and model, we will scan the GenBank database for anomalously curved regions, and determine whether or not they are correlated with the functional roles of coincident or neighboring sequences.
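
Steps (1) and (2) of the similarity search can be illustrated with a minimal sketch. It is not the project's implementation: the abstract names neither the hash function nor the "rigorous measure", so the sketch assumes a fixed-length word (k-mer) lookup table for the scan and the Smith-Waterman local alignment score with a linear gap penalty for the scoring step. The function names, the word length, and the scoring parameters are all hypothetical.

```python
from collections import defaultdict


def build_kmer_index(seq, k):
    """Map each length-k word in seq to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def diagonal_hits(query, index, k):
    """Count shared words per diagonal (database position minus query position).

    Many hits on one diagonal are evidence of local similarity there.
    """
    counts = defaultdict(int)
    for j in range(len(query) - k + 1):
        for i in index.get(query[j:j + k], ()):
            counts[i - j] += 1
    return dict(counts)


def local_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Best local alignment score (Smith-Waterman, linear gap penalty).

    Assumed scoring scheme; only the score is returned, not the alignment.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


if __name__ == "__main__":
    database = "TTGACGTACGTTAGC"  # toy stand-in for a database entry
    query = "ACGTACG"
    k = 4  # hypothetical word length
    index = build_kmer_index(database, k)
    print(diagonal_hits(query, index, k))
    print(local_alignment_score(query, database))
```

In a scan of this kind, the cheap word lookup selects candidate regions, and the more expensive dynamic-programming score is computed only for those candidates.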

Agency
National Institutes of Health (NIH)
Institute
National Institute of General Medical Sciences (NIGMS)
Type
Research Project (R01)
Project #
5R01GM037812-02
Application #
3293562
Study Section
(SSS)
Project Start
1986-12-01
Project End
1991-11-30
Budget Start
1987-12-01
Budget End
1988-11-30
Support Year
2
Fiscal Year
1988
Total Cost
Indirect Cost
Name
Los Alamos National Lab
Department
Type
Organized Research Units
DUNS #
City
Los Alamos
State
NM
Country
United States
Zip Code
87545