Although genome-wide association studies (GWAS) have identified thousands of disease susceptibility loci, the underlying genetic structure of these regions has not been fully characterized, and the GWAS signal at each locus likely originates from one or more as yet unidentified causal variants. To localize potential causal variants for follow-up experiments, fine-mapping studies in large populations are underway. To date, fine-mapping studies have used standard approaches that fail to account for the full array of information currently available, such as associations with gene expression (eQTLs) and genomic functional annotation. With the advent of large-scale initiatives such as The Encyclopedia of DNA Elements (ENCODE) and The Cancer Genome Atlas (TCGA), it may be possible to add a layer of functional information to fine-mapping studies, enhancing the ability to localize causal variants. Here, we propose to develop a statistical framework that incorporates both functional and genetic information. We will build variant-specific priors based on cell-specific functional annotation (e.g. DNase I hypersensitive sites, protein coding), associations with tissue-specific gene expression, and correlated phenotypes. We will capitalize on the publicly available ENCODE data to acquire functional annotation for each genetic variant. We will then estimate a posterior probability for each genetic variant based on its derived prior and the evidence for association with the outcome of interest. These posterior probabilities can then be used to prioritize genetic variants for further follow-up in a laboratory setting. Compared to existing approaches, our proposed method is unique in that it will jointly model internal (e.g. sequencing and gene expression data) and external (e.g. ENCODE, TCGA) sources.
It will also allow for multiple causal variants in each region and assess all loci jointly, allowing the method to "borrow" information between regions. To ensure generalizability, we will conduct extensive simulation studies covering numerous possible scenarios. We will apply our method to a multi-ethnic breast cancer targeted sequencing dataset of 2,288 breast cancer cases and 2,323 controls, for whom we have generated high-depth sequencing data for 12 GWAS-identified breast cancer regions. For a subset of these women, we also have mammographic density (n=1,000) and whole-genome expression data (n=250) in both normal and tumor tissue, allowing us to jointly model empirical sequencing, gene expression and phenotype data. We have assembled a multi-disciplinary research team with a track record of producing high-profile publications in fine-mapping, statistical methods, breast cancer epidemiology and population genetics, as well as publicly available software packages for the genetics community. Our work has the potential to bridge the gap between initial screening for regions of the genome that are associated with disease and prioritizing specific variants for further functional analysis. Such methods will have important implications for understanding the underlying biology of disease, a major challenge in the post-GWAS era.
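The core idea of combining a variant-specific prior with association evidence can be illustrated with a small sketch. This is not the proposed framework: it assumes exactly one causal variant in the region (the proposal allows several), uses a softmax over hypothetical annotation weights to form the prior, and substitutes a Wakefield-style approximate Bayes factor computed from z-scores for the association evidence. The function names, the weights, and the shrinkage parameter `r` are all illustrative assumptions.

```python
import math


def annotation_priors(annotations, weights):
    """Prior probability that each variant is causal, from its annotations.

    Assumption: a softmax over a linear score of binary annotation flags.
    The weights are hypothetical, not estimated from data.
    """
    scores = [sum(w * a for w, a in zip(weights, row)) for row in annotations]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def approx_bayes_factors(z_scores, r=0.5):
    """Wakefield-style approximate Bayes factor for each variant.

    r = W / (V + W), where V is the variance of the effect estimate and W is
    the prior variance of the effect size. Fixing r for all variants is a
    simplifying assumption.
    """
    return [math.sqrt(1.0 - r) * math.exp(0.5 * r * z * z) for z in z_scores]


def posterior_probabilities(z_scores, annotations, weights, r=0.5):
    """Posterior probability of causality under a single-causal-variant model."""
    priors = annotation_priors(annotations, weights)
    bayes_factors = approx_bayes_factors(z_scores, r)
    unnormalized = [p * bf for p, bf in zip(priors, bayes_factors)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


# Toy region: three variants, one annotation (e.g. overlaps a DNase I site).
z = [4.0, 4.0, 1.0]
annot = [[1], [0], [0]]
post = posterior_probabilities(z, annot, weights=[2.0])
ranking = sorted(range(len(post)), key=lambda i: post[i], reverse=True)
```

In this toy region the first two variants have identical association evidence, so the annotation alone separates them, and `ranking` orders the variants for follow-up by posterior probability.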

Public Health Relevance

Genome-wide association studies (GWAS) have identified thousands of genetic regions involved in disease, but the specific causal genetic variants within each region remain unknown. Here, we propose a novel statistical approach to fine-mapping that will prioritize plausible causal variants based on recent functional mapping of the genome and high-coverage sequencing data. Our methods will close the gap between initial screening of the genome and nominating specific, potentially causal genetic variants, one of the grand challenges of the post-GWAS era.

Agency: National Institutes of Health (NIH)
Institute: National Cancer Institute (NCI)
Type: Exploratory/Developmental Grants (R21)
Project #: 1R21CA182821-01A1
Application #: 8753749
Study Section: Special Emphasis Panel (ZCA1-SRLB-D (M1))
Program Officer: Mechanic, Leah E
Project Start: 2014-08-15
Project End: 2016-07-31
Budget Start: 2014-08-15
Budget End: 2015-07-31
Support Year: 1
Fiscal Year: 2014
Total Cost: $222,955
Indirect Cost: $69,575

Name: Harvard University
Department: Public Health & Prev Medicine
Type: Schools of Public Health
DUNS #: 149617367
City: Boston
State: MA
Country: United States
Zip Code: 02115

Kichaev, Gleb; Yang, Wen-Yun; Lindstrom, Sara et al. (2014) Integrating functional data to prioritize causal variants in statistical fine-mapping studies. PLoS Genet 10:e1004722