Complete vertebrate genomes are accumulating rapidly, and the pace of accumulation will only increase. This is excellent news, because the utility of comparative analysis depends heavily on the diversity of species sampling. There are, however, substantial challenges to exploiting the full potential of such extensive data: development of novel methods and analytical approaches is needed.
We aim to develop and extend our capacity to analyze the dynamic evolutionary processes (across regions and through time) that have shaped extant genomes. We will achieve this goal using a Bayesian evolutionary analysis approach that we recently developed, which is many orders of magnitude faster than competing approaches and scales well with model complexity and data size. Many of the studies we propose are based on biologically realistic paradigms that previously were impossible to consider or test because of computational limitations. We propose to comprehensively delineate the repetitive contents of a selected set of vertebrate genomes, including annotation of ancient elements from the "dark matter" of genomes (the currently unannotated portion). The transposable elements in this set of repeat sequences will be used to build the first complete genome-wide models of context-dependent substitution processes. We will consider contexts such as recombination, rearrangement, expression, and local nucleotide content, as well as unknown contexts, and analyze how the evolutionary processes influenced by these contexts have changed over time. These context-dependent substitution models will provide a powerful tool for identifying and annotating functional regions in interspecific comparisons of vertebrate genomes, and for differentiating and characterizing fitness-based effects in proteins. The core concept is that if we better understand genome-wide patterns of background nucleotide substitution, we will be able to identify genomic regions that are likely functional more accurately, and to understand how selection directs the evolution of proteins.

Public Health Relevance

The proposed research is relevant to public health because it will develop new methods for understanding and interpreting vertebrate (and human) genomes, and for identifying genomic regions that are functionally important and thus relevant to phenotype and disease. The project is relevant to the NIH mission because it will provide methods for extracting information from comparative genomic data that will inform the structure and function of genomes, and how they relate to phenotypes of disease-related mutations in humans.

Agency: National Institutes of Health (NIH)
Institute: National Institute of General Medical Sciences (NIGMS)
Type: Research Project (R01)
Project #: 3R01GM097251-03S1
Application #: 8776584
Study Section: Genetic Variation and Evolution Study Section (GVE)
Program Officer: Krasnewich, Donna M
Project Start: 2012-04-13
Project End: 2016-01-31
Budget Start: 2014-02-01
Budget End: 2015-01-31
Support Year: 3
Fiscal Year: 2014
Total Cost: $58,311
Indirect Cost: $20,640

Name: University of Colorado Denver
Department: Biochemistry
Type: Schools of Medicine
DUNS #: 041096314
City: Aurora
State: CO
Country: United States
Zip Code: 80045

Pollock, David D; Goldstein, Richard A (2014) Strong evidence for protein epistasis, weak evidence against it. Proc Natl Acad Sci U S A 111:E1450
Wacholder, Aaron C; Cox, Corey; Meyer, Thomas J et al. (2014) Inference of transposable element ancestry. PLoS Genet 10:e1004482