The development of rapid methods for molecular cloning, DNA sequencing, and protein and DNA sequence comparison has revolutionized the practice of molecular biology. Newly determined sequences are routinely compared against large sequence databases, and increasingly, inferences about structure are based on sequence similarity. During the past grant period, we (1) developed a rigorous new approach to evaluating sequence comparison algorithms and scoring parameters and discovered a simple but very effective normalization that significantly improves similarity searches; (2) implemented our search programs within six network parallel programming environments and evaluated their performance; (3) investigated tree-based multiple alignment strategies; and (4) developed more exhaustive distance-based evolutionary tree methods. During the next period, we will pursue five aims. (1) We will develop rapid methods for protein sequence comparison that perform as well as or better than the rigorous Smith-Waterman approach. We will incorporate statistical estimates into the FASTA and Smith-Waterman comparison programs. We will also search for scoring parameters and normalization functions that provide better search performance. We will extend our studies to DNA sequence comparison, using repeated sequence families and exon sequences to characterize the ability of algorithms and scoring parameters both to identify and to properly bound homologous DNA sequences. (2) Our current network-parallel comparison programs are not well suited for production environments. We will augment them to provide all the functions present in the widely used serial versions. We will also develop more robust and usable parallel platforms for "production" sequence searching on networks of shared workstations. (3) We will develop efficient heuristics for constructing evolutionary trees based on distance and parsimony criteria. 
We will focus on new approaches that sample evolutionary tree space more broadly and can produce information on sub-optimal trees. (4) We will continue to develop and characterize tree-based approaches to multiple sequence alignment. We will develop heuristic tree-based alignment algorithms that are capable of rapidly aligning dozens of sequences, and we will develop parallel implementations of these algorithms. We will also examine more sophisticated gap penalties for tree-based alignments. (5) We will examine effective "unified" approaches to phylogeny and alignment by combining the approaches outlined in aims 3 and 4 above.
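
For readers unfamiliar with the rigorous Smith-Waterman approach named in aim 1, the following is a minimal sketch of local alignment scoring. The function name `smith_waterman_score`, the match and mismatch values, and the linear gap penalty are illustrative assumptions and are not taken from the FASTA or Smith-Waterman programs described in this project; production comparison programs typically use amino acid substitution matrices and affine gap penalties.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b.

    Assumptions (illustrative only): identity scoring with fixed
    match/mismatch values and a linear gap penalty per gapped position.
    """
    # prev holds the previous row of the dynamic programming matrix.
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            # A local alignment may restart anywhere, so scores never drop below 0.
            curr[j] = max(
                0,
                prev[j - 1] + sub,   # align a[i-1] with b[j-1]
                prev[j] + gap,       # gap in b
                curr[j - 1] + gap,   # gap in a
            )
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("TTACGTTT", "GGACGGG"))
```

The sketch keeps only two rows of the matrix, so it reports the optimal score but not the alignment itself; recovering the alignment boundaries would require storing the full matrix or the position of the best cell and tracing back.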

Agency
National Institutes of Health (NIH)
Institute
National Library of Medicine (NLM)
Type
Research Project (R01)
Project #
5R01LM004969-10
Application #
2460257
Study Section
Biomedical Library and Informatics Review Committee (BLR)
Project Start
1988-08-01
Project End
1999-07-31
Budget Start
1997-08-01
Budget End
1998-07-31
Support Year
10
Fiscal Year
1997
Total Cost
Indirect Cost
Name
University of Virginia
Department
Biochemistry
Type
Schools of Medicine
DUNS #
001910777
City
Charlottesville
State
VA
Country
United States
Zip Code
22904
Pearson, William R; Mackey, Aaron J (2017) Using SQL Databases for Sequence Similarity Searching and Analysis. Curr Protoc Bioinformatics 59:9.4.1-9.4.22
Pearson, William R (2016) Finding Protein and Nucleotide Similarities with FASTA. Curr Protoc Bioinformatics 53:3.9.1-25
Triant, Deborah A; Pearson, William R (2015) Most partial domains in proteins are alignment and annotation artifacts. Genome Biol 16:99
Pearson, William R (2013) An introduction to sequence similarity ("homology") searching. Curr Protoc Bioinformatics Chapter 3:Unit3.1
Pearson, William R (2013) Selecting the Right Similarity-Scoring Matrix. Curr Protoc Bioinformatics 43:3.5.1-9
Mills, Lauren J; Pearson, William R (2013) Adjusting scoring matrices to correct overextended alignments. Bioinformatics 29:3007-13
Li, Weizhong; McWilliam, Hamish; Goujon, Mickael et al. (2012) PSI-Search: iterative HOE-reduced profile SSEARCH searching. Bioinformatics 28:1650-1
Holliday, Gemma L; Andreini, Claudia; Fischer, Julia D et al. (2012) MACiE: exploring the diversity of biochemical reactions. Nucleic Acids Res 40:D783-9
Gonzalez, Mileidy W; Pearson, William R (2010) RefProtDom: a protein database with improved domain boundaries and homology relationships. Bioinformatics 26:2361-2
Sierk, Michael L; Smoot, Michael E; Bass, Ellen J et al. (2010) Improving pairwise sequence alignment accuracy using near-optimal protein sequence alignments. BMC Bioinformatics 11:146

Showing the most recent 10 out of 29 publications