In order to characterize the molecular and cellular causes of human disease, it will be essential to unravel the functional impact of genetic variation. However, we are currently unable to predict the impact of the majority of genetic variants that lie in non-coding regions of the genome, where most complex disease-associated variants are found. Additionally, recent evidence suggests that a significant fraction of the non-coding genome is likely to be functional, often playing a role in gene regulation. Our limited understanding of non-coding variation is therefore a critical hurdle to characterizing the genetic basis of disease. The goal of this project is to develop methods for interpreting non-coding genetic variation: to provide a robust and extensible Bayesian method for predicting causal variants from full genomes, to identify and validate a large set of functional non-coding variants using CRISPR technology, and to predict the disease-relevant traits likely to be affected by each variant. Our project will leverage a unique cohort from a founder population in Sardinia, with genome sequence and/or transcriptome data available for 3,000 individuals along with extensive phenotyping for hundreds of traits. We will combine advanced statistical modeling with experimental validation based on genome engineering to identify causal non-coding variants affecting biomedical traits in the cohort and to predict the functional mechanisms through which these variants ultimately perturb the cell.
In Aim 1, we will develop computational methods for predicting causal non-coding variants from full genomes, incorporating informative genomic features, including epigenetic data, sequence motifs, and conservation, into a Bayesian approach that jointly models multiple transcriptomic signals. We will optimize these methods and apply them to the genome and transcriptome data available for the Sardinia cohort to identify a large set of variants predicted to causally affect gene expression. Building on these predictions, in Aim 2 we will connect putative causal variants to the diverse disease-relevant traits measured in the cohort, using network inference to capture the cascade from genetic variation to gene expression to disease. We will develop methods that integrate across variants, using the models from Aim 1, to identify the common causal mechanisms related to each trait.
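The abstract does not specify the Aim 1 model, so the following is only a minimal sketch of one common way to let genomic annotations inform a Bayesian choice among candidate causal variants. It assumes a single causal variant per locus, marginal eQTL effect estimates with standard errors, Wakefield's approximate Bayes factors, and a log-linear prior over annotations. The function names, feature encoding, weights, and `prior_sd` are hypothetical, and unlike the proposed approach this sketch models only one transcriptomic signal.

```python
import math
from typing import List, Sequence


def log_abf(beta: float, se: float, prior_sd: float = 0.2) -> float:
    """Log Wakefield approximate Bayes factor (association vs. no association)
    for one variant, from its marginal effect estimate and standard error."""
    v = se ** 2            # sampling variance of the estimate
    w = prior_sd ** 2      # assumed (hypothetical) prior variance of the true effect
    r = w / (v + w)        # shrinkage factor
    z2 = (beta / se) ** 2  # squared z-score
    return 0.5 * math.log(1.0 - r) + 0.5 * z2 * r


def annotation_log_prior(features: Sequence[float], weights: Sequence[float]) -> float:
    """Unnormalized log prior: a linear score over genomic annotations.
    The weights are hypothetical; in practice they would be learned."""
    return sum(f * w for f, w in zip(features, weights))


def causal_posteriors(
    betas: Sequence[float],
    ses: Sequence[float],
    features: Sequence[Sequence[float]],
    weights: Sequence[float],
    prior_sd: float = 0.2,
) -> List[float]:
    """Posterior probability that each variant in one locus is the causal one,
    assuming exactly one causal variant per locus."""
    scores = [
        annotation_log_prior(x, weights) + log_abf(b, s, prior_sd)
        for b, s, x in zip(betas, ses, features)
    ]
    # Normalize with log-sum-exp for numerical stability; the prior's
    # normalizing constant cancels out in this ratio.
    m = max(scores)
    unnorm = [math.exp(s - m) for s in scores]
    total = sum(unnorm)
    return [u / total for u in unnorm]


# Toy locus with three variants; hypothetical features are
# [open chromatin (0/1), TF motif disrupted (0/1), conservation score].
betas = [0.5, 0.5, 0.1]
ses = [0.1, 0.1, 0.1]
features = [[1, 1, 0.9], [0, 0, 0.1], [0, 0, 0.0]]
weights = [1.0, 0.5, 0.8]
print(causal_posteriors(betas, ses, features, weights))
```

The first two variants have identical association statistics, so only the annotation prior separates them; with all weights set to zero, the sketch reduces to ranking variants by their approximate Bayes factors.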
In Aim 3, we will validate the causal impact of non-coding variants predicted to affect high-level traits. We will use CRISPR genome editing to introduce individual genetic variants into cell lines and use qPCR to validate their predicted effects on gene expression. Finally, a major goal throughout this proposal is to provide the research community with convenient computational tools for predicting causal non-coding variants from individual genomes. These tools will be updated on an ongoing basis to integrate the most recent genomic annotations and public data, providing the best possible accuracy in predicting causal variants and the traits they are likely to affect. Our project will greatly advance our understanding of non-coding genetic variation, the specific mechanisms affected by causal variants, and the downstream consequences for the cell and for individual health.
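The abstract does not say how the qPCR measurements will be quantified. One standard option is the comparative Ct (2^(-ddCt)) method; the sketch below assumes a single reference gene, unedited control cells, and roughly equal, near-100% amplification efficiencies for both assays. The function name and Ct values are hypothetical.

```python
from statistics import mean
from typing import Sequence


def ddct_fold_change(
    target_edited: Sequence[float],
    ref_edited: Sequence[float],
    target_control: Sequence[float],
    ref_control: Sequence[float],
) -> float:
    """Relative expression of a target gene in edited vs. control cells by the
    comparative Ct (2^-ddCt) method, from replicate Ct values. Assumes both
    assays amplify with roughly 100% efficiency."""
    # Normalize the target gene to a reference (housekeeping) gene in each condition.
    d_ct_edited = mean(target_edited) - mean(ref_edited)
    d_ct_control = mean(target_control) - mean(ref_control)
    # Compare the normalized values between conditions.
    dd_ct = d_ct_edited - d_ct_control
    # A lower Ct means more starting template, so fold change is 2 ** (-ddCt).
    return 2.0 ** (-dd_ct)


# Hypothetical Ct values: the target crosses threshold one cycle earlier
# in edited cells, while the reference gene is unchanged.
fc = ddct_fold_change([24.0, 24.0], [18.0, 18.0], [25.0, 25.0], [18.0, 18.0])
print(fc)  # 2.0
```

Here the edited cells show a two-fold increase in target expression relative to controls; a ddCt of +1 would instead indicate a two-fold decrease.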

Public Health Relevance

Understanding the impact of variation in the entire genome, beyond the well-studied protein-coding regions, is essential to understanding the relationship between genetics and human health. This proposal addresses the problem of identifying functional non-coding genetic variants and predicting the impact of each variant on hundreds of disease-relevant traits. Our approach will focus on integrative, transformative methods for understanding mechanisms underlying the function of the human genome.

National Institutes of Health (NIH)
National Human Genome Research Institute (NHGRI)
Research Project (R01)
Study Section: Special Emphasis Panel (ZHG1)
Program Officer: Brooks, Lisa
Stanford University
Schools of Medicine
United States

Publications

Liu, Boxiang; Pjanic, Milos; Wang, Ting et al. (2018) Genetic Regulatory Mechanisms of Smooth Muscle Cells Map to Coronary Artery Disease Risk Loci. Am J Hum Genet 103:377-388
Liu, Nian; Lee, Cameron H; Swigut, Tomek et al. (2018) Selective silencing of euchromatic L1s revealed by genome-wide screens for L1 regulators. Nature 553:228-232
Kernohan, Kristin D; Frésard, Laure; Zappala, Zachary et al. (2017) Whole-transcriptome sequencing in blood provides a diagnosis of spinal muscular atrophy with progressive myoclonic epilepsy. Hum Mutat 38:611-614
Chiang, Colby; Scott, Alexandra J; Davis, Joe R et al. (2017) The impact of structural variation on human gene expression. Nat Genet 49:692-699
Parsana, Princy; Amend, Sarah R; Hernandez, James et al. (2017) Identifying global expression patterns and key regulators in epithelial to mesenchymal transition through multi-study integration. BMC Cancer 17:447
Li, Xin; Kim, Yungil; Tsang, Emily K et al. (2017) The impact of rare variation on gene expression across tissues. Nature 550:239-243
Saha, Ashis; Kim, Yungil; Gewirtz, Ariel D H et al. (2017) Co-expression networks reveal the tissue-specific regulation of transcription and splicing. Genome Res 27:1843-1858
Knowles, David A; Davis, Joe R; Edgington, Hilary et al. (2017) Allele-specific expression reveals interactions between genetic variation and environment. Nat Methods 14:699-702
Gottlieb, Assaf; Daneshjou, Roxana; DeGorter, Marianne et al. (2017) Cohort-specific imputation of gene expression improves prediction of warfarin dose for African Americans. Genome Med 9:98
McAllister, Kimberly; Mechanic, Leah E; Amos, Christopher et al. (2017) Current Challenges and New Opportunities for Gene-Environment Interaction Studies of Complex Diseases. Am J Epidemiol 186:753-761

The 10 most recent of the project's 19 publications are listed above.