This project is a continuing study of questions concerning what similarities can be expected to occur purely by chance when two protein or DNA sequences are compared. A subsidiary and related question concerns the definition of scoring systems that are optimal for distinguishing biologically meaningful patterns from chance similarities. Advances this year include: a) The definition of improved scoring systems for DNA sequence comparison; b) An analysis of when protein comparison is more sensitive to biological relations than DNA comparison; c) The definition of a scoring system for macromolecular sequence comparison that is sensitive to similarities at all evolutionary distances, and an analysis of its statistics; d) The development of statistics for the sum of the scores of high-scoring segment pairs; e) The development of Poisson statistics for consistent high-scoring segment pairs; f) The estimation of optimal gap costs for pairwise sequence comparison.

Agency
National Institute of Health (NIH)
Institute
National Library of Medicine (NLM)
Type
Intramural Research (Z01)
Project #
1Z01LM000014-01
Application #
3845105
Study Section
Project Start
Project End
Budget Start
Budget End
Support Year
1
Fiscal Year
1992
Total Cost
Indirect Cost
Name
National Library of Medicine
Department
Type
DUNS #
City
State
Country
United States
Zip Code
Stojmirovic, Aleksandar; Gertz, E Michael; Altschul, Stephen F et al. (2008) The effectiveness of position- and composition-specific gap costs for protein similarity searches. Bioinformatics 24:i15-23
Yu, Yi-Kuo; Gertz, E Michael; Agarwala, Richa et al. (2006) Retrieval accuracy, statistical significance and compositional similarity in protein sequence database searches. Nucleic Acids Res 34:5966-73
Yu, Yi-Kuo; Altschul, Stephen F (2005) The construction of amino acid substitution matrices for the comparison of proteins with non-standard compositions. Bioinformatics 21:902-11
Yu, Yi-Kuo; Wootton, John C; Altschul, Stephen F (2003) The compositional adjustment of amino acid substitution matrices. Proc Natl Acad Sci U S A 100:15688-93
Altschul, S F; Bundschuh, R; Olsen, R et al. (2001) The estimation of statistical parameters for local alignment score distributions. Nucleic Acids Res 29:351-61
Schaffer, A A; Wolf, Y I; Ponting, C P et al. (1999) IMPALA: matching a protein sequence against a collection of PSI-BLAST-constructed position-specific score matrices. Bioinformatics 15:1000-11