Over the past decade, genome-wide association studies (GWAS) have expanded rapidly, alongside the development of large-scale consortia such as the UK Biobank and the All of Us project. While the number of genetic associations with human traits and disease is soaring, tools to characterize and interpret these variants are lacking. One challenge to realizing the potential of genomics is that over 99% of human genetic variation lies in non-coding, regulatory sequence. However, "regulatory grammar", the complex pattern of sequences that interact with transcription factors to control gene expression, is poorly understood. A repertoire of well-characterized causal variants is needed to build generalizable models that can unlock insights into the genetic basis of human health and history. Natural selection is a powerful driver of human genetic variation. As our species has encountered new climates, dramatic alterations in diet, and novel pathogens, these selective pressures have left hundreds of signatures of adaptation in our genomes, reflected in our species' diversity of disease risk and morphology. For selection to have acted positively on them, these adaptive alleles must exhibit relatively strong phenotypic effects, and they continue to contribute to modern traits and disease (e.g., height or sickle cell anemia). Salient examples of human adaptation include immunity, metabolism, and morphology, all of which have extensive, unresolved GWAS signals. This makes the lens of recent evolution a powerful but underutilized tool for identifying alleles that contribute to phenotypic variation in modern association studies.
This proposal aims to expand the repertoire of well-characterized GWAS signals by (A) using evolution to prioritize adaptive variants and (B) applying novel, high-throughput experimental and computational tools to comprehensively decipher the functions of regulatory variants. These approaches will identify much-needed causal variants, devise paradigms for their study, and inform future predictive models to characterize them. During the mentored phase of the K99, I will first develop methods to colocalize signals of selection with GWAS signals, and then use variant effect prediction (VEP) to predict the function of the colocalized variants. I will then employ high-throughput methods such as the massively parallel reporter assay (MPRA) and CRISPR non-coding screens to functionally characterize them directly. During the independent R00 phase, I will build in vivo systems to characterize more deeply the adaptive GWAS alleles that these screens identify, deploying a variety of genomic tools such as ChIP, ChIA-PET, and RNA-seq to understand the adaptive variants' molecular etiology. I will use the empirical data from these studies, together with the MPRA/HCR-FlowFISH screens, to build more accurate VEP models.
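
As a rough illustration of what colocalizing signals of selection with GWAS signals could look like in its simplest form, the sketch below matches GWAS lead variants to candidate selection intervals by position. It is not the method proposed here, which the abstract does not specify; purely positional overlap is a swapped-in simplification, and statistical colocalization would also account for linkage disequilibrium and association strength. The names `Interval`, `Hit`, `colocalize`, and `pad`, and all coordinates and trait labels, are hypothetical.

```python
from typing import Dict, List, NamedTuple


class Interval(NamedTuple):
    """A candidate selection signal (hypothetical toy record)."""
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive


class Hit(NamedTuple):
    """A GWAS lead variant (hypothetical toy record)."""
    chrom: str
    pos: int
    trait: str


def colocalize(
    sweeps: List[Interval], hits: List[Hit], pad: int = 0
) -> Dict[Interval, List[Hit]]:
    """Map each selection interval to the GWAS hits inside it.

    `pad` widens every interval by that many bases on both sides.
    Intervals with no matching hit are omitted from the result.
    """
    matches: Dict[Interval, List[Hit]] = {}
    for iv in sweeps:
        inside = [
            h for h in hits
            if h.chrom == iv.chrom and iv.start - pad <= h.pos < iv.end + pad
        ]
        if inside:
            matches[iv] = inside
    return matches


# Toy data only: these coordinates and traits are made up for illustration.
sweeps = [Interval("chr1", 1000, 2000), Interval("chr2", 500, 900)]
hits = [
    Hit("chr1", 1500, "trait_a"),
    Hit("chr1", 2500, "trait_b"),
    Hit("chr2", 950, "trait_c"),
]

strict = colocalize(sweeps, hits)            # exact interval bounds
padded = colocalize(sweeps, hits, pad=100)   # 100-base flank on each side
```

With exact bounds only the first interval picks up a hit; adding the 100-base flank also brings in the `chr2` hit just outside the second interval, which shows how sensitive a positional definition is to the window chosen.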

Public Health Relevance

While thousands of genomic regions have been linked to human evolution and disease, many of the genetic variants responsible are non-coding and thus difficult to interpret. I propose to identify adaptive human alleles underlying genome-wide association study signals and to characterize them comprehensively using novel computational and experimental tools. I will then build in vivo models of these alleles to test their function and effects on fitness, improving future predictions of how genetic variants impact human evolution and health.

National Institutes of Health (NIH)
National Human Genome Research Institute (NHGRI)
Career Transition Award (K99)
Study Section
National Human Genome Research Institute Initial Review Group (GNOM)
Program Officer
Pazin, Michael J
Broad Institute, Inc.
United States