To be fully understood, the human genome must be considered in the context of evolution. The activities that have dominated human genomics for three decades (such as genome sequencing and annotation, interrogation with high-throughput biochemical assays, and the identification of associations between genetic variants and diseases) have been enormously informative, but these descriptive studies must eventually be understood within the theoretical framework of evolutionary genetics. We must continue to press forward from the "what?" to the "why?" and "how?" of human genetics. The goal of my laboratory is to interpret high-throughput genomic data from an evolutionary perspective. Drawing from ideas and techniques in molecular evolution, population genetics, statistics, and computer science, we aim both to understand the evolutionary forces that have shaped human genomes and to use evolution to shed light on the phenotypic importance of particular sequences. Our recent activities have focused on three major areas: (1) reconstruction of features of human evolution based on genome sequences; (2) prediction of the fitness consequences of human mutations; and (3) the study of transcriptional regulation and its evolution in primates. We have reported major findings in each of these areas, including the existence of gene flow from early modern humans to Eastern Neandertals, a map of fitness consequences for mutations across the human genome, and an analysis showing that the architecture of transcription initiation is highly similar at enhancers and promoters in the human genome. Here we propose to extend our research substantially in each of these areas, working together with a broad range of experimental and theoretical collaborators.
Our new goals include the development of improved methods for reconstructing human demography, with a focus on ancient gene flow; extensions of our ancestral recombination graph (ARG) sampling methods to accommodate much larger sample sizes, with applications in association mapping and the detection of natural selection; two complementary machine-learning approaches for improving the prediction of fitness consequences from sequence data; an experimental collaboration to leverage CRISPR-Cas9 screens in characterizing noncoding mutations; a multi-pronged study of the sequence determinants of RNA stability and their implications for the evolution of transcription units; and development of a new probabilistic model for turnover of regulatory elements. Together, these projects will address a wide variety of fundamental questions about the function and evolution of sequences in the human genome.
Vast quantities of genomic data are now available to describe patterns of genetic variation within human populations and across species, as well as various measures of biochemical activity along the human genome. These data need to be interpreted in light of the fundamental forces of mutation, recombination, natural selection, and genetic drift that have shaped genetic variation. This proposal describes a series of projects that make use of new computational, statistical, and theoretical methods to address fundamental questions in human evolutionary genetics, including how humans arose from our archaic hominin and ape cousins, how human populations diverged from one another, how new mutations influence human health and fitness, and how regulatory sequences contribute to unique aspects of human biology.
|Fang, Han; Huang, Yi-Fei; Radhakrishnan, Aditya et al. (2018) Scikit-ribo Enables Accurate Estimation and Robust Modeling of Translation Dynamics at Codon Resolution. Cell Syst 6:180-191.e4|
|Danko, Charles G; Choate, Lauren A; Marks, Brooke A et al. (2018) Dynamic evolution of regulatory element ensembles in primate CD4+ T cells. Nat Ecol Evol 2:537-548|
|Mohammed, Jaaved; Flynt, Alex S; Panzarino, Alexandra M et al. (2018) Deep experimental profiling of microRNA diversity, deployment, and evolution across the Drosophila genus. Genome Res 28:52-65|
|Ramani, Ritika; Krumholz, Katie; Huang, Yi-Fei et al. (2018) PhastWeb: a web interface for evolutionary conservation scoring of multiple sequence alignments using phastCons and phyloP. Bioinformatics :|
|Gulko, Brad; Siepel, Adam (2018) An evolutionary framework for measuring epigenomic information and estimating cell-type-specific fitness consequences. Nat Genet :|