The structure, function, and interactions of proteins produce evolutionary patterns that are imprinted on protein sequences. Here, mitochondrial and nuclear sequences will be used to study evolutionary processes and to develop an understanding of how proteins evolve under structural, energetic, and functional constraints. Improved models of protein evolution, informed by this deeper understanding, will be developed, and their utility in predicting mutational effects and structural features will be evaluated. These models will also be used to better predict adaptive bursts and levels of convergence and coevolution among residues, particularly in multigene families. This research is motivated by three insights from previous research. First, evolutionary simulations predict that substitution probabilities at individual positions in a protein fluctuate over time due to epistasis (interactions with substitutions at other sites in the same or other proteins). These expectations are supported by strong evidence that substitution processes do regularly fluctuate over time in real proteins. However, current models of protein evolution do not usually allow substitution processes to vary over time, and observed levels of amino acid convergence deviate substantially from the levels such models predict. Incorporating these temporal fluctuations is therefore a key feature of the proposed models. Second, current approaches that incorporate structure into evolutionary studies tend to use de novo prediction or pseudo-energy potentials to predict the acceptability of substitutions. These methods are not especially accurate for evolutionary analysis, which often involves sequences that have diverged substantially from those of proteins with known structures.
To account for this, rather than being used on their own, such predictions will be incorporated probabilistically into empirical substitution models, weighted according to their expected predictive accuracy and the distance from any sequence with a known structure. Third, a Bayesian approach to building complex evolutionary models was recently developed that is designed to allow relatively easy computation of processes that fluctuate among sites and over time. This approach, which uses partial sampling of substitution histories, makes the proposed methodology feasible. The proposed study is expected to significantly improve understanding of molecular evolution and how it relates to protein structure and function. One expected result is better prediction of mutational effects, which will improve the identification of disease-causing mutations in human genome and exome sequencing studies. Predictions of structural features that are currently unknown are also expected to improve, and researchers will be better able to understand how ancestral functional changes in proteins arose through adaptive sequence change.

Public Health Relevance

The proposed research is relevant to public health because it will develop new, more accurate methods for extracting information from comparative genomic data about protein structure and function, and about how they relate to the phenotypes of disease-related mutations in humans. The resulting predictions will be useful in genetic studies of human disease and in studies that aim to modify protein function with drugs to ameliorate disease. More broadly, the research will improve our basic understanding of how and why proteins work the way they do, supporting better-informed decisions in protein-related health research.

Agency: National Institutes of Health (NIH)
Institute: National Institute of General Medical Sciences (NIGMS)
Type: Research Project (R01)
Project #: 5R01GM083127-06
Application #: 9262235
Study Section: Genetic Variation and Evolution Study Section (GVE)
Program Officer: Janes, Daniel E
Project Start: 2009-03-01
Project End: 2020-02-29
Budget Start: 2017-03-01
Budget End: 2018-02-28
Support Year: 6
Fiscal Year: 2017
Total Cost: $321,926
Indirect Cost: $108,176

Name: University of Colorado Denver
Department: Biochemistry
Type: Schools of Medicine
DUNS #: 041096314
City: Aurora
State: CO
Country: United States
Zip Code: 80045

Publications

Goldstein, Richard A; Pollock, David D (2017) Sequence entropy of folding and the absolute rate of amino acid substitutions. Nat Ecol Evol 1:1923-1930
Goldstein, Richard A; Pollock, David D (2016) The tangled bank of amino acids. Protein Sci 25:1354-62
Goldstein, Richard A; Pollard, Stephen T; Shah, Seena D et al. (2015) Nonadaptive Amino Acid Convergence Rates Decrease over Time. Mol Biol Evol 32:1373-81
Li, Cai; Zhang, Yong; Li, Jianwen et al. (2014) Two Antarctic penguin genomes reveal insights into their evolutionary history and molecular changes related to the Antarctic environment. Gigascience 3:27
Wacholder, Aaron C; Cox, Corey; Meyer, Thomas J et al. (2014) Inference of transposable element ancestry. PLoS Genet 10:e1004482
Castoe, Todd A; de Koning, A P Jason; Hall, Kathryn T et al. (2013) The Burmese python genome reveals the molecular basis for extreme adaptation in snakes. Proc Natl Acad Sci U S A 110:20645-50
Nakayama, Maki; Castoe, Todd; Sosinowski, Tomasz et al. (2012) Germline TRAV5D-4 T-cell receptor sequence targets a primary insulin peptide of NOD mice. Diabetes 61:857-65
de Koning, A P Jason; Gu, Wanjun; Castoe, Todd A et al. (2012) Phylogenetics, likelihood, evolution and complexity. Bioinformatics 28:2989-90
Pollock, David D; Thiltgen, Grant; Goldstein, Richard A (2012) Amino acid coevolution induces an evolutionary Stokes shift. Proc Natl Acad Sci U S A 109:E1352-9
Yokoyama, Ken Daigoro; Pollock, David D (2012) SP transcription factor paralogs and DNA-binding sites coevolve and adaptively converge in mammals and birds. Genome Biol Evol 4:1102-17

Showing the most recent 10 out of 19 publications