In order to characterize the molecular and cellular causes of human disease, it will be essential to unravel the functional impact of genetic variation. However, we are currently unable to predict the impact of most genetic variants in non-coding regions of the genome, even though these regions harbor the majority of variants associated with complex disease. Additionally, recent evidence suggests that a substantial fraction of the non-coding genome is likely to be functional, often playing a role in gene regulation. Our limited understanding of non-coding variation is therefore a critical hurdle to characterizing the genetic basis of disease. The goal of this project is to develop methods for interpreting non-coding genetic variation: to provide a robust and extensible Bayesian method for predicting causal variants from full genomes, to identify and validate a large set of functional non-coding variants using CRISPR technology, and to predict the disease-relevant traits likely to be affected by each variant. Our project will leverage a unique cohort from a founder population in Sardinia, with genome sequence and/or transcriptome data available for 3,000 individuals, along with extensive phenotyping for hundreds of traits. We will combine advanced statistical modeling with experimental validation based on genome engineering to identify causal non-coding variants affecting biomedical traits in the cohort, and to predict the functional mechanisms through which these variants ultimately perturb the cell.
In Aim 1, we will develop computational methods for predicting causal non-coding variation from full genomes, incorporating informative genomic features, including epigenetic data, sequence motifs, and conservation, into a Bayesian approach that jointly models multiple transcriptomic signals. We will optimize and apply these methods on the genome and transcriptome data available for the Sardinia cohort to identify a large set of variants predicted to causally affect gene expression. In Aim 2, we will use these predictions to connect putative causal variants with the diverse set of disease-relevant traits measured in the cohort, using network inference to capture the cascade from genetic variation to gene expression to disease. Building on the models from Aim 1, we will develop methods that integrate information across variants to identify the common causal mechanisms underlying each trait.
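The idea of feeding genomic features into a Bayesian model of causal variants can be illustrated with a minimal sketch. It is not the project's method: it assumes a single causal variant per locus and a single expression trait (the proposed approach jointly models multiple transcriptomic signals), uses Wakefield's approximate Bayes factor as a stand-in likelihood, and turns a hypothetical weighted sum of binary features into prior causal probabilities. The feature columns, the weights, and the prior effect-size standard deviation (`prior_sd=0.2`) are illustrative assumptions.

```python
import numpy as np

def log_abf(beta_hat, se, prior_sd=0.2):
    """Log of Wakefield's approximate Bayes factor (association vs. null).

    prior_sd is an assumed prior standard deviation of the true effect size.
    """
    v = se ** 2          # sampling variance of each effect estimate
    w = prior_sd ** 2    # assumed prior variance of the true effect
    r = w / (v + w)      # shrinkage factor
    z = beta_hat / se    # association z-scores
    return 0.5 * np.log(1 - r) + 0.5 * z ** 2 * r

def annotation_prior(features, weights):
    """Prior causal probability per variant from hypothetical feature weights."""
    scores = features @ weights
    scores = scores - scores.max()  # subtract the max for numerical stability
    p = np.exp(scores)
    return p / p.sum()

def posterior_inclusion(beta_hat, se, features, weights, prior_sd=0.2):
    """Posterior probability that each variant is the single causal variant."""
    log_post = np.log(annotation_prior(features, weights)) + log_abf(beta_hat, se, prior_sd)
    log_post = log_post - log_post.max()
    post = np.exp(log_post)
    return post / post.sum()

# Hypothetical locus with three variants; feature columns are
# [open chromatin, TF motif disruption, conserved] (illustrative only).
beta_hat = np.array([0.30, 0.30, 0.10])
se = np.array([0.05, 0.05, 0.05])
features = np.array([[1.0, 1.0, 0.0],
                     [0.0, 0.0, 0.0],
                     [0.0, 0.0, 1.0]])
weights = np.array([1.0, 0.5, 0.8])  # hypothetical feature weights

pip = posterior_inclusion(beta_hat, se, features, weights)
print(pip.round(3))
```

Variants 0 and 1 have identical association statistics, so only the annotation prior separates them: the annotated variant receives about e^1.5 (roughly 4.5) times the posterior probability of the unannotated one. In a realistic model the feature weights would be estimated from the data rather than fixed by hand.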
In Aim 3, we will validate the causal impact of non-coding variants predicted to affect high-level traits. We will use CRISPR genome editing to introduce individual genetic variants into cell lines and use qPCR to validate the predicted effects on gene expression. Finally, a major goal throughout this proposal is to provide the research community with convenient computational tools for predicting causal non-coding variants from individual genomes. These tools will be updated on an ongoing basis to integrate the most recent genomic annotations and public data, in order to provide the best possible accuracy in predicting causal variants and the traits they are likely to affect. Our project will greatly advance our understanding of non-coding genetic variation, the specific mechanisms affected by causal variants, and the downstream consequences for the cell and for individual health.
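As an illustration of how a qPCR readout can be turned into an expression change for an edited cell line, the sketch below applies the widely used comparative Ct (2^-ΔΔCt, Livak) method. The proposal does not specify its qPCR analysis, so this choice is an assumption; the method itself assumes roughly 100% amplification efficiency for both the target and the reference gene. The Ct values are made up for illustration.

```python
from statistics import mean

def fold_change_ddct(ct_target_edited, ct_ref_edited, ct_target_ctrl, ct_ref_ctrl):
    """Fold change of a target gene in edited vs. control cells (2^-ddCt).

    Assumes ~100% PCR efficiency for both target and reference genes.
    Each argument is a list of replicate Ct values.
    """
    # Normalize the target gene to a reference (housekeeping) gene in each condition.
    d_edited = mean(ct_target_edited) - mean(ct_ref_edited)
    d_ctrl = mean(ct_target_ctrl) - mean(ct_ref_ctrl)
    # Compare the edited condition against the unedited control.
    dd = d_edited - d_ctrl
    return 2 ** (-dd)

# Hypothetical replicate Ct values for a variant knocked into a cell line.
fc = fold_change_ddct(
    ct_target_edited=[24.0, 24.1, 23.9],
    ct_ref_edited=[18.0, 18.1, 17.9],
    ct_target_ctrl=[25.0, 25.1, 24.9],
    ct_ref_ctrl=[18.0, 18.0, 18.0],
)
print(round(fc, 2))  # about 2.0: roughly twofold higher expression in edited cells
```

A lower Ct means earlier amplification and thus more transcript, which is why the fold change uses the negative exponent.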
Understanding the impact of variation across the entire genome, beyond the well-studied protein-coding regions, is essential to understanding the relationship between genetics and human health. This proposal addresses the problem of identifying functional non-coding genetic variants and predicting the impact of each variant on hundreds of disease-relevant traits. Our approach will focus on integrative methods, combining statistical modeling with genome-editing experiments, to understand the mechanisms underlying the function of the human genome.