Understanding the genetic causes of human disease has immense potential to benefit human health. The human genetics community has devoted tremendous resources to identifying those causes, including, most recently, whole-genome sequencing of patient cohorts. Those studies have found genetic variation in non-coding regions of the genome to be most often associated with diseases and drug responses. Unfortunately, since the effects of genetic variation on gene regulation remain poorly understood and difficult to study at the genome-wide scale, the full benefit of most of those studies has yet to be realized. Our long-term goal is to understand how non-coding genetic variants act through gene regulatory elements to influence phenotypes. The objective of this proposal, a step towards that long-term goal, is to develop a platform of empirical and statistical methods to reliably and systematically determine the regulatory mechanisms underlying human traits and diseases. Specifically, in Aim 1, we will use high-throughput reporter assays to quantify the effects of millions of human genetic variants on regulatory element activity. Those variants will represent diverse human ancestries, and will cover over 60% of all regions associated with a trait or disease via GWAS. The outcome will be the most extensive catalog of human regulatory variation ever created.
In Aim 2, we will develop new technologies to systematically relate those changes in regulatory element activity to changes in gene expression. That technology will combine our previous work developing CRISPR-Cas9-based epigenome editing screens with targeted single-cell RNA-seq.
In Aim 3, we will develop statistical analyses to integrate the effects of regulatory variants to infer changes in gene expression and differences in phenotypes between individuals. The resulting method will be analogous to gene-based association tests, but for the non-coding genome. The expected outcomes of this project are (i) dramatically improved ability to establish mechanisms underlying non-coding associations with human traits and diseases; (ii) better understanding of the genetic architecture of regulatory element activity and gene regulation that will guide the design and interpretation of future genetic association studies; and (iii) novel reagents, protocols, and software that other labs can use to complete similar investigations of their own model systems of interest. Taken together, we expect that this project will be a major step towards fully realizing the potential of genome-wide and whole-genome association studies.

Public Health Relevance

Much of human disease is caused by genetic variation that changes the regulation of disease-related genes. This project seeks to develop new information, resources, and technologies for determining how non-coding genetic variation changes gene regulation. In doing so, we aim to reveal new mechanisms of disease that can be targeted therapeutically in the future.

National Institutes of Health (NIH)
National Human Genome Research Institute (NHGRI)
Research Project (R01)
Study Section
Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer
Pazin, Michael J
Duke University
Biostatistics & Other Math Sci
Schools of Medicine
United States