This project is a continuing study of what similarities can be expected to occur purely by chance when two protein or DNA sequences are compared. A subsidiary, related question concerns the definition of scoring systems that are optimal for distinguishing biologically meaningful patterns from chance similarities. Work this year includes:
a) Investigation of the statistics of a block-based scoring system. Within certain parameter ranges, the distribution of optimal block-based scores was found to be reasonably well modelled by an extreme value distribution. Whether such scores are more sensitive than protein "profiles" or position-specific score matrices in recognizing distant biological relationships remains to be determined.
b) Initial investigation of the statistics of the "hybrid" local alignment scoring system. This method was found to produce scores that follow an extreme value distribution with a predictable scale parameter, lambda.
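To illustrate what a "predictable" scale parameter means, the sketch below uses the simpler and well-understood case of ungapped local alignment. It does not implement the block-based or hybrid systems studied in this project. For ungapped scores, Karlin-Altschul theory predicts lambda as the unique positive root of sum_ij p_i p_j exp(lambda * s_ij) = 1. The script solves that equation by bisection, then simulates optimal ungapped scores for random sequence pairs and fits an extreme value (Gumbel) distribution by the method of moments, so the predicted and fitted values can be compared. The uniform four-letter alphabet, the +1/-1 scoring scheme, the sequence length, the sample count, and all function names are illustrative assumptions.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def ungapped_lambda(freqs, score, tol=1e-12):
    """Karlin-Altschul lambda for ungapped scores: the unique lam > 0 with
    sum_ij p_i * p_j * exp(lam * s_ij) = 1."""
    pairs = [(pi * pj, score[i][j])
             for i, pi in enumerate(freqs)
             for j, pj in enumerate(freqs)]
    if sum(w * s for w, s in pairs) >= 0:
        raise ValueError("expected score must be negative")
    if max(s for w, s in pairs if w > 0) <= 0:
        raise ValueError("some pair with positive probability must score > 0")

    def f(lam):
        return sum(w * math.exp(lam * s) for w, s in pairs) - 1.0

    # f(0) = 0 and f is convex and initially decreasing, so f < 0 just above 0
    # and f > 0 for large lam; bisect between the two.
    lo, hi = 1e-6, 1.0
    while f(hi) <= 0:
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def max_ungapped_score(a, b, score):
    """Best ungapped local alignment score of a and b, taken over all diagonals."""
    m, n = len(a), len(b)
    best = 0
    for d in range(-(m - 1), n):  # diagonal offset d = j - i
        i, j, run = max(0, -d), max(0, d), 0
        while i < m and j < n:
            run = max(0, run + score[a[i]][b[j]])  # reset when the running sum goes negative
            best = max(best, run)
            i += 1
            j += 1
    return best


def gumbel_fit_moments(samples):
    """Method-of-moments Gumbel fit: var = pi^2 / (6 lam^2), mean = mu + gamma / lam."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / (n - 1)
    lam = math.pi / math.sqrt(6.0 * var)
    return lam, mean - EULER_GAMMA / lam


if __name__ == "__main__":
    # Assumed toy system: uniform 4-letter alphabet, +1 match / -1 mismatch.
    freqs = [0.25] * 4
    score = [[1 if i == j else -1 for j in range(4)] for i in range(4)]
    print(f"predicted lambda: {ungapped_lambda(freqs, score):.4f}")  # ln 3, about 1.0986

    rng = random.Random(1)  # fixed seed so runs are repeatable
    samples = []
    for _ in range(300):  # 300 random pairs of length 100 (assumed sizes)
        a = [rng.randrange(4) for _ in range(100)]
        b = [rng.randrange(4) for _ in range(100)]
        samples.append(max_ungapped_score(a, b, score))
    lam_fit, mu_fit = gumbel_fit_moments(samples)
    print(f"fitted lambda: {lam_fit:.4f}, fitted mu: {mu_fit:.2f}")
```

For this toy scheme the predicted lambda is ln 3. Because the scores are integers and the sequences are short, the moment-fitted lambda should only be expected to approximate the prediction. The method of moments is used here for brevity; maximum-likelihood fitting is the more usual choice in practice.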

Agency: National Institutes of Health (NIH)
Institute: National Library of Medicine (NLM)
Type: Intramural Research (Z01)
Project #: 1Z01LM000014-11
Application #: 6681316
Study Section: (CBB)
Support Year: 11
Fiscal Year: 2002
Organization Name: National Library of Medicine
Country: United States
Stojmirovic, Aleksandar; Gertz, E Michael; Altschul, Stephen F et al. (2008) The effectiveness of position- and composition-specific gap costs for protein similarity searches. Bioinformatics 24:i15-23
Yu, Yi-Kuo; Gertz, E Michael; Agarwala, Richa et al. (2006) Retrieval accuracy, statistical significance and compositional similarity in protein sequence database searches. Nucleic Acids Res 34:5966-73
Yu, Yi-Kuo; Altschul, Stephen F (2005) The construction of amino acid substitution matrices for the comparison of proteins with non-standard compositions. Bioinformatics 21:902-11
Yu, Yi-Kuo; Wootton, John C; Altschul, Stephen F (2003) The compositional adjustment of amino acid substitution matrices. Proc Natl Acad Sci U S A 100:15688-93
Altschul, S F; Bundschuh, R; Olsen, R et al. (2001) The estimation of statistical parameters for local alignment score distributions. Nucleic Acids Res 29:351-61
Schaffer, A A; Wolf, Y I; Ponting, C P et al. (1999) IMPALA: matching a protein sequence against a collection of PSI-BLAST-constructed position-specific score matrices. Bioinformatics 15:1000-11