Our capacity to sequence human genomes has exceeded our ability to interpret genetic variation, particularly in non-coding regions. To address this challenge, we recently developed a novel framework, Combined Annotation Dependent Depletion (CADD), for estimating the deleteriousness of any genetic variant. CADD defines an objective, data-rich, and quantitative integration of many genomic annotations into a single measure of variant effect at the organismal level. The goals of this R01 proposal are to further develop the CADD framework, to apply it in the context of ongoing genetic studies of both rare and common human diseases, and to experimentally evaluate its predictions.
In Specific Aim 1, we will substantially modify CADD in both straightforward and creative ways, with the goal of dramatically improving CADD's ability to annotate non-coding variants, not only to estimate their organismal effects but also to provide insights into molecular mechanisms.
In Specific Aim 2, we will apply CADD to a variety of ongoing whole genome sequencing studies of human disease, especially those in which non-coding variants are either known or suspected to be causal. As part of this effort, we will develop new statistical frameworks that directly incorporate CADD into traditional genome-wide discovery approaches.
In Specific Aim 3, we will perform a combination of high-throughput (massively parallel reporter assays), medium-throughput (CRISPR/Cas9), and low-throughput (in vivo mouse transgenics) experimental assays for systematic and targeted assessment of CADD predictions. This proposal includes both computational and experimental innovations, and builds on established collaborative relationships between investigators with complementary strengths. The completion of our aims will yield novel methods, data, and resources with which to annotate whole genome sequences, broadly enabling the field to more effectively identify and mechanistically understand non-coding genetic variants that are causally relevant to human disease.
As we enter an era of personalized medicine, a deep understanding of human genomes will be increasingly important to public health, contributing to the unraveling of the genetic basis of human disease and playing a growing role in clinical diagnostics. However, our limited understanding of the functional consequences of most genetic variants, especially those that do not alter protein sequence, represents a major obstacle. This proposal seeks to dramatically improve our ability to identify and interpret 'non-coding' variants that causally contribute to human disease. A recently developed computational approach will be substantially improved and evaluated in a variety of genetic studies, and its predictions will be experimentally validated. This project will provide much-needed methods and resources to address the looming analytical challenges associated with individual whole genome sequencing in both biomedical research and patient care.