A central challenge in understanding the genetic origins of disease is the inability to isolate which regions of the genome are functionally responsible for the aberrant expression of genes. To date, nearly 85% of identified disease-causing mutations lie within protein-coding exons (i.e., the "exome"), which comprises 2% of the human genome. Yet an estimated 50-75% of Mendelian disorders, and an even greater proportion of non-Mendelian (i.e., polygenic) diseases, have unexplained genetic etiologies that are suspected to involve genetic variants in the remaining 98% of the human genome, which is non-coding. Over the past 5 years, new genome-scale technologies have uncovered the existence of ~400,000 enhancer-like regions. Mutations in these regions are suspected to be a major source of the misregulation of gene expression levels, which can in turn manifest in disease. Nevertheless, the vast majority of these regions have never been directly tested for their ability to activate transcription, nor have they been definitively linked to the regulation of target genes. The K99 training phase of this award entails the development of a new generation of massively parallel reporter assay (MPRA) technologies that can interrogate the functional activity of 10,000-100,000 enhancers with high precision and reproducibility, an order of magnitude more than is currently possible (Aim 1). Coordinated with this effort will be the quantitative modeling of biological determinants that are predictive of enhancer activity (Aim 2). Complementing Aims 1 and 2 is the development of models designed to infer enhancer-promoter regulatory interactions. Toward this goal, self-attentive models, derived from the field of computational linguistics, will be trained to learn how the epigenetic marks and transcription factor binding events associated with distal enhancers contribute to gene expression levels in a diversity of cell types (Aim 3).
As this work transitions into the R00 independent phase of the award, deep convolutional neural networks will be trained to learn how underlying DNA sequences encode epigenetic and transcription factor binding information. This would thereby generate a mathematical function which links DNA sequence directly to gene expression levels, which would help to predict how specific genetic variants in distal enhancers might perturb the mRNA levels of target genes. These predictions will help to inform, at single-nucleotide resolution, which genetic variants identified by genome-wide association studies are causally linked to disease (Aim 4). Collectively, these aims will give insight into the cis-regulatory logic encoded in DNA that specifies mRNA abundance. The methods developed herein will lay a quantitative framework with which to evaluate enhancer function, prioritize which genetic variants are likely to be associated with disease, and shed light onto the elusive functions of the non-coding regions of the human genome.

Public Health Relevance

The goals of this project are to develop a technology that can measure the activity of >10,000 enhancers in parallel, and to devise quantitative models that describe the impact of enhancers on gene expression levels. Achieving these goals will give insight into the cis-regulatory logic encoded in DNA that specifies mRNA abundance. We anticipate that these methods will shed light upon the elusive functions of non-coding regions in the human genome, and help to dissect the molecular origins of diverse genetic diseases.

Agency
National Institutes of Health (NIH)
Institute
National Human Genome Research Institute (NHGRI)
Type
Career Transition Award (K99)
Project #
1K99HG010662-01
Application #
9804719
Study Section
National Human Genome Research Institute Initial Review Group (GNOM)
Program Officer
Gilchrist, Daniel A
Project Start
2019-08-02
Project End
2021-07-31
Budget Start
2019-08-02
Budget End
2020-07-31
Support Year
1
Fiscal Year
2019
Total Cost
Indirect Cost
Name
University of Washington
Department
Genetics
Type
Schools of Medicine
DUNS #
605799469
City
Seattle
State
WA
Country
United States
Zip Code
98195