Despite the successes of genome-wide association studies (GWAS), important challenges still limit their impact on human biology and medicine, especially for non-coding variants, whose functions remain poorly understood. In this proposal, we exploit recent advances (many pioneered by our group) to overcome these challenges and gain a systematic understanding of the role of non-coding variants in human disease and complex traits. First, we will develop new statistical methods that use high-resolution regulatory annotations to predict disease-relevant tissues, chromatin states, and regulatory motifs, and that combine epigenomic state information, comparative genomic information, and regulatory motif analysis to prioritize, within regions of genetic association, the non-coding variants more likely to have regulatory effects (Aim 1). Second, we will develop new Bayesian methods that link regulatory regions to their upstream regulators and downstream target genes by integrating genetic information across all associated regions in the context of regulatory networks; these networks connect regulators and regulatory regions through their correlated activity, regulatory motifs, and expression quantitative trait locus (eQTL) information (Aim 2). Third, we will validate our methods and predictions at three levels: massively parallel enhancer assays to test the effects of large numbers of regulatory variants in isolation; genome editing to test the effects of regulatory variants in their endogenous context; and cellular phenotypes and animal models to test the physiological effects of regulatory variants at the cellular and organismal levels (Aim 3). We will use these results to refine our computational methods and models. Although experimental validation will be performed only for the small number of traits and cell types amenable to such studies, our methods are general and will be applied to all genetic studies available through ongoing collaborations and public catalogs.
Most human variants associated with disease are non-coding and largely uncharacterized, making it a high priority to understand the cell types in which they act and their mechanisms of action. To address this challenge, we propose to develop methods that systematically study the regulatory impact of genetic variation by integrating genetic information from genome-wide association studies with genome annotations of regulatory elements across diverse tissues and cell types, regulatory motifs, and cellular circuits. We will systematically validate our predictions using next-generation tools for massively parallel assays and genome editing, testing the effects of non-coding variants on regulatory activity, gene expression, and molecular phenotypes associated with diabetes, heart disease, cancer, and neuropsychiatric disease.