The proposed research will increase understanding of the relationship between protein sequence and function through the development of innovative computational and statistical technologies. The approach is designed to maximize the information extracted from datasets that include dense sampling of sequences from diverse taxa. The development of new, fast phylogeny-based likelihood methods will allow researchers to take advantage of large multi-protein datasets sampled over a range and density of biodiversity that is currently uncommon but will increase rapidly in the near future. In the first phase, the project will develop novel computational methods to analyze patterns of protein evolution and coevolution; create a fast method for analyzing large, taxonomically diverse datasets; evaluate the utility and accuracy of model approximations using this method; and begin to develop methods to manage and visualize sequence, structure, function, and phylogenetic information from such datasets. In the second phase, it will further develop the methods for analyzing patterns of protein evolution and coevolution, apply the analytical tools to a broad range of proteins and protein complexes, implement these methods in computer programs that are accessible to the general community, and provide filtered access to protein sequence biodiversity data for easy analysis and visualization. The long-term goal of this project is to understand the relationship between sequence diversity and structure well enough that the effects of substitutions can be predicted more accurately. The project will also determine the value of taxonomic diversity in predicting functional and structural information.
By focusing on the near-human evolutionary environment (the vertebrates), results will be directly applicable to understanding the structural context of human proteins and the effects of substitutions in human proteins that may lead to both single-locus and quantitative disease.

Agency
National Institutes of Health (NIH)
Institute
National Institute of General Medical Sciences (NIGMS)
Type
Exploratory/Developmental Grants (R21)
Project #
1R21GM065612-01
Application #
6480118
Study Section
Special Emphasis Panel (ZRG1-SSS-H (01))
Program Officer
Wehrle, Janna P
Project Start
2002-08-01
Project End
2004-07-31
Budget Start
2002-08-01
Budget End
2003-07-31
Support Year
1
Fiscal Year
2002
Total Cost
$138,348
Indirect Cost
Name
Louisiana State University A&M Col Baton Rouge
Department
Biology
Type
Schools of Arts and Sciences
DUNS #
075050765
City
Baton Rouge
State
LA
Country
United States
Zip Code
70803
Castoe, Todd A; Poole, Alexander W; Gu, Wanjun et al. (2010) Rapid identification of thousands of copperhead snake (Agkistrodon contortrix) microsatellite loci from modest amounts of 454 shotgun genome sequence. Mol Ecol Resour 10:341-7
de Koning, A P Jason; Gu, Wanjun; Pollock, David D (2010) Rapid likelihood analysis on large phylogenies using partial sampling of substitution histories. Mol Biol Evol 27:249-65
Castoe, Todd A; Gu, Wanjun; de Koning, A P Jason et al. (2009) Dynamic nucleotide mutation gradients and control region usage in squamate reptile mitochondrial genomes. Cytogenet Genome Res 127:112-27
Castoe, Todd A; de Koning, A P Jason; Kim, Hyun-Min et al. (2009) Evidence for an ancient adaptive episode of convergent molecular evolution. Proc Natl Acad Sci U S A 106:8986-91
Gu, Wanjun; Castoe, Todd A; Hedges, Dale J et al. (2008) Identification of repeat structure in large genomes using repeat probability clouds. Anal Biochem 380:77-83
Krishnan, Neeraja M; Seligmann, Herve; Stewart, Caro-Beth et al. (2004) Ancestral sequence reconstruction in primate mitochondrial DNA: compositional bias and effect on functional inference. Mol Biol Evol 21:1871-83