Computational Approaches to Protein Sequence Analysis

States, David

Abstract

The large and growing databases of known protein sequences represent a knowledge base with the power to revolutionize biology, biochemistry, and biotechnology. These sequencing efforts have highlighted the growing gap between the sequence data and our ability to analyze this data. We are generally interested in answering specific questions about structure, function, and mechanisms. Much information can come from the identification of homologous proteins about which more is known. Identifying distant homologs is still difficult, even with the advent of new profile methods. Another powerful approach is to predict the tertiary structure. While progress is being made, we are still far from being able to reliably predict structures based on sequence data alone. Both of these techniques can be assisted by an analysis of the evolutionary record encoded in the sequences of available homologous proteins. We still do not have a good understanding of how to interpret this record, partially due to a lack of good models of the evolutionary process. Optimal score functions for the identification of distant homologies will be developed and analyzed, and the optimization techniques will be applied to the creation of optimal score functions for alignment of known homologs. Models of amino acid site substitutions will be used to create protein profiles that will allow the identification of further homologs and analogs. Optimization procedures will be developed for the identification of tertiary structures in proteins, including encoding the evolutionary patterns of sidechain conservation and variation. These techniques will be applied to the "inverse-folding" process, that is, identifying sequences that are likely to fold into a given structure. Simple models of the evolutionary process will be developed to examine how observed properties of proteins can be understood in an evolutionary context.
These models will be elaborated to include the effect of population dynamics on the evolutionary process, as well as selective pressure resulting from the need for the protein to be functional. These models will be used to explore which protein properties are likely to be inherent, and to understand how much information can be derived for proteins based on information about known homologs.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Library of Medicine (NLM)
Type: Research Project (R01)
Project #: 5R01LM005770-09
Application #: 6638862
Study Section: Biomedical Library and Informatics Review Committee (BLR)
Program Officer: Florance, Valerie

Project Start: 1995-04-01
Project End: 2005-03-31
Budget Start: 2003-04-01
Budget End: 2004-03-31
Support Year: 9
Fiscal Year: 2003
Total Cost: $186,287
Indirect Cost

Institution

Name: University of Michigan Ann Arbor
Department: Genetics
Type: Schools of Medicine
DUNS #: 073133571

City: Ann Arbor
State: MI
Country: United States
Zip Code: 48109

Related projects


NIH 2004 R01 LM	Computational Approaches to Protein Sequence Analysis States, David J. / University of Michigan Ann Arbor	$186,287
NIH 2003 R01 LM	Computational Approaches to Protein Sequence Analysis States, David J. / University of Michigan Ann Arbor	$186,287
NIH 2002 R01 LM	Computational Approaches to Protein Sequence Analysis States, David J. / University of Michigan Ann Arbor	$186,287
NIH 2001 R01 LM	Computational Approaches to Protein Sequence Analysis Goldstein, Richard A. / University of Michigan Ann Arbor	$186,437
NIH 2000 R01 LM	Computational Approaches to Protein Sequence Analysis Goldstein, Richard A. / University of Michigan Ann Arbor	$187,038
