Large-scale genome-wide association studies (GWAS) have shown that the heritability explained by common variants is concentrated in non-coding functional annotations that are often cell-type or tissue specific. However, how to leverage non-coding regulatory variants to detect new disease genes or gene sets remains largely unknown. In this proposal, I will investigate the effects of non-coding variants in low-frequency variant architecture by developing a new statistical method for partitioning the heritability of low-frequency variants. Then, I will use this method to connect functional heritability to genes, in order to increase the statistical power to detect genes and gene sets enriched in coding and non-coding disease variants. My K99 training will be conducted at the Harvard T.H. Chan School of Public Health, as well as the Broad Institute, under the mentorship of Dr. Alkes Price. The key areas of my training will be: development of models for partitioning heritability explained by low-frequency variants across functional annotations (including gene set annotations); analyses of large-scale GWAS and whole genome sequencing datasets; and joint analyses of multiple large functional genomics datasets. The long-term goal of this research is to produce functional annotations and software that will enable geneticists to analyze large GWAS and whole genome sequencing datasets, in order to make discoveries that will improve our biological knowledge of human diseases.
The first aim of this proposal is to develop a method for partitioning the heritability of common and low-frequency variants across functional annotations. I will apply this method to large GWAS data sets, and will use the results to fit an evolutionary model that will predict the distribution of rare variant effect sizes for each annotation.
The second aim is to determine the best strategy to connect functional heritability to genes. I will compare different strategies using Hi-C data, conserved annotations, and other functional data to connect functional elements to genes and determine which strategy is maximally informative for trait heritability. Then, I will use this strategy to identify gene sets enriched for heritability.
The third aim will leverage insights from common and low-frequency variant enrichments estimated from large GWAS data sets (Aim 1) as well as insights on how to connect functional elements to a gene (Aim 2) to improve the statistical power of gene-based rare variant association tests. The new annotations and computational tools developed in this research proposal will be distributed to the scientific community.
The genetic architecture of human complex diseases is dominated by common non-coding regulatory variants. However, little is known about how to leverage this information to detect new disease genes or gene sets and to elucidate the role of low-frequency non-coding variants. The proposed research will develop and apply new statistical methods to: (1) understand the effects of non-coding variants in low-frequency variant architecture, (2) connect non-coding heritability to genes to detect new disease gene sets, and (3) increase the statistical power to detect genes enriched in rare coding and non-coding disease variants.