Understanding the genetic basis of human disease will require a deep understanding of genetic effects on gene expression. The vast majority of disease-risk loci are non-coding, so in order to link them to target genes, cellular pathways, and cell types, we seek to identify which genes' expression they disrupt and under what conditions. Population studies of gene expression have now provided thousands of "expression quantitative trait loci" (eQTLs) where individual genetic variants are associated with expression of a target gene. While eQTL studies across tissues and populations have served as a valuable resource for querying the likely gene targets of disease loci, key obstacles remain. First, eQTL studies simply do not address rare genetic variation, thus excluding evaluation of tens of thousands of variants per individual whole genome sequence, and many known pathogenic loci. Second, even among common variants, it is estimated that over half of disease loci do not coincide with any known eQTL, even based on current multi-tissue data. The remainder of disease loci and rare variants require new data and statistical methods in order to characterize their mechanisms. Here, we propose a research agenda to decipher the complex impact of regulatory genetic variation across the frequency spectrum.
1) First, we will pursue analysis of rare genetic variation and statistical methods for personal whole genome interpretation. Current methods simply do not provide confident predictions for the majority of the variants from whole genome sequencing, and the overall impact of rare regulatory variation on human disease is unknown. We will investigate the use of personal RNA-seq and other functional data to complement whole genome sequence (WGS) in the evaluation of rare variant impact, personal genome interpretation for rare disease patients, and incorporation of rare variants into population studies and genetic risk scores.
2) Second, we will consider common disease variants that are not characterized by current eQTL studies, which almost all use static, adult tissue samples and bulk RNA-seq data. Genetic effects on gene expression are not static, but rather vary over time, cell type, and environment, complicating the identification of disease mechanism. Some disease loci may have only transient effects on a proximal gene's expression during development, for example. We will study temporally dynamic and context-specific genetic effects. In a novel study, we will evaluate genetic effects on individual cell types and states during cellular differentiation using time-series single-cell RNA-seq across individuals. We will also evaluate dynamic genetic effects during disease progression based on patient longitudinal data. Combined with novel statistical methods, these will provide a map of genetic effects across cell type, time, and context that may better explain disease loci.
All data, methods, and software will be made publicly available. Our work will provide a greater understanding of regulatory genetic effects for both common and rare variants, enabling improved identification of the mechanisms underlying heritable disease.
Determining the genetic factors underlying human disease will support needed development of therapeutics and interventions. A critical step toward this is identifying the biological mechanisms and target genes that disease-associated genetic loci perturb, especially considering that most disease loci are not within the boundaries of any known gene. This proposal describes an integrative approach leveraging functional data to determine the target genes, cell types, and pathways affected by both rare and common genetic variation in the human population.