Complete vertebrate genomes are accumulating rapidly, and the pace of accumulation will only increase. This is excellent news, because the utility of comparative analysis depends heavily on the diversity of species sampling. There are, however, substantial challenges to exploiting the full potential of such extensive data: development of novel methods and analytical approaches is needed.
We aim to develop and extend our capacity to analyze the dynamic evolutionary processes (across regions and through time) that have shaped extant genomes. We will achieve this goal using a Bayesian evolutionary analysis approach we recently developed, which is many orders of magnitude faster than competing approaches and scales well with model complexity and data size. Many of the studies we propose are based on biologically realistic paradigms that previously were impossible to consider or test because of computational limitations. We propose to comprehensively delineate the repetitive contents of a selected set of vertebrate genomes, including annotation of ancient elements from the "dark matter" of genomes (the currently unannotated portion). The transposable elements in this set of repeat sequences will be used to build the first complete genome-wide models of context-dependent substitution processes. We will consider contexts such as recombination, rearrangement, expression, and local nucleotide content, as well as unknown contexts, and analyze how the evolutionary processes influenced by these contexts have changed over time. These context-dependent substitution models will provide a powerful tool for identifying and annotating functional regions in interspecific comparisons of vertebrate genomes, and for differentiating and characterizing fitness-based effects in proteins. The core concept is that if we better understand genome-wide patterns of background nucleotide substitution, then we will be able to more accurately identify genomic regions that are likely functional, and to understand how selection directs the evolution of proteins.
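As a purely illustrative sketch of what "context-dependent substitution" means for the local nucleotide content case, the Python below tallies substitutions between an inferred ancestral sequence (for example, a transposable element consensus) and one aligned copy, grouped by the flanking nucleotides of each ancestral site. It is a simple frequency count under stated assumptions, not the Bayesian approach described above; the function names, the trinucleotide context definition, and the toy sequences are all hypothetical.

```python
from collections import Counter

VALID_BASES = set("ACGT")


def context_substitution_counts(ancestral, derived):
    """Tally substitutions by the trinucleotide context of the ancestral site.

    Both inputs are assumed to be pre-aligned strings of equal length.
    Sites with gaps or ambiguous bases in the context are skipped.
    """
    if len(ancestral) != len(derived):
        raise ValueError("sequences must be aligned to equal length")
    opportunities = Counter()  # times each ancestral triplet was observed
    substitutions = Counter()  # (ancestral triplet, new base) for changed sites
    for i in range(1, len(ancestral) - 1):
        triplet = ancestral[i - 1:i + 2]
        new_base = derived[i]
        if not set(triplet) <= VALID_BASES or new_base not in VALID_BASES:
            continue
        opportunities[triplet] += 1
        if new_base != ancestral[i]:
            substitutions[(triplet, new_base)] += 1
    return opportunities, substitutions


def context_rates(opportunities, substitutions):
    """Convert counts to per-context substitution proportions."""
    return {
        key: count / opportunities[key[0]]
        for key, count in substitutions.items()
    }


if __name__ == "__main__":
    # Toy data: a made-up consensus and one diverged copy.
    opp, subs = context_substitution_counts("ACGTACGA", "ATGTACAA")
    for (triplet, new_base), rate in sorted(context_rates(opp, subs).items()):
        print(f"{triplet} -> middle base becomes {new_base}: {rate:.2f}")
```

In practice such counts would be pooled over many copies of a repeat family, and a model-based analysis would also need to account for multiple substitutions at a site and for uncertainty in the ancestral sequence, neither of which this sketch attempts.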

Public Health Relevance

The proposed research is relevant to public health because it will develop new methods for understanding and interpreting vertebrate (and human) genomes, and for identifying genomic regions that are functionally important and thus relevant to phenotype and disease. The project is relevant to the NIH mission because it will provide methods for extracting information from comparative genomic data that will inform our understanding of the structure and function of genomes, and of how they relate to the phenotypes of disease-related mutations in humans.

National Institutes of Health (NIH)
National Institute of General Medical Sciences (NIGMS)
Research Project (R01)
Study Section: Genetic Variation and Evolution Study Section (GVE)
Program Officer: Krasnewich, Donna M
University of Colorado Denver
Schools of Medicine
United States
Goldstein, Richard A; Pollock, David D (2017) Sequence entropy of folding and the absolute rate of amino acid substitutions. Nat Ecol Evol 1:1923-1930
Goldstein, Richard A; Pollock, David D (2016) The tangled bank of amino acids. Protein Sci 25:1354-62
Andrew, Audra L; Card, Daren C; Ruggiero, Robert P et al. (2015) Rapid changes in gene expression direct rapid shifts in intestinal form and function in the Burmese python after feeding. Physiol Genomics 47:147-57
Hoen, Douglas R; Hickey, Glenn; Bourque, Guillaume et al. (2015) A call for benchmarking transposable element annotation methods. Mob DNA 6:13
Goldstein, Richard A; Pollard, Stephen T; Shah, Seena D et al. (2015) Nonadaptive Amino Acid Convergence Rates Decrease over Time. Mol Biol Evol 32:1373-81
Pollock, David D; Goldstein, Richard A (2014) Strong evidence for protein epistasis, weak evidence against it. Proc Natl Acad Sci U S A 111:E1450
Li, Cai; Zhang, Yong; Li, Jianwen et al. (2014) Two Antarctic penguin genomes reveal insights into their evolutionary history and molecular changes related to the Antarctic environment. Gigascience 3:27
Green, Richard E; Braun, Edward L; Armstrong, Joel et al. (2014) Three crocodilian genomes reveal ancestral patterns of evolution among archosaurs. Science 346:1254449
Wacholder, Aaron C; Cox, Corey; Meyer, Thomas J et al. (2014) Inference of transposable element ancestry. PLoS Genet 10:e1004482
Vonk, Freek J; Casewell, Nicholas R; Henkel, Christiaan V et al. (2013) The king cobra genome reveals dynamic gene evolution and adaptation in the snake venom system. Proc Natl Acad Sci U S A 110:20651-6