Prioritizing follow-up of GWAS loci using genetic and functional annotation data

Lindstroem, Sara

Abstract

Although genome-wide association studies (GWAS) have identified thousands of disease susceptibility loci, the underlying genetic structure in these regions has not been fully characterized, and it is likely that each GWAS signal originates from one or more as yet unidentified causal variants. To localize potential causal variants for follow-up experiments, fine-mapping studies in large populations are underway. To date, fine-mapping studies have used standard approaches that fail to account for the full array of information currently available, such as associations with gene expression (eQTLs) and genomic functional annotation. With the advent of large-scale initiatives such as The Encyclopedia of DNA Elements (ENCODE) and The Cancer Genome Atlas (TCGA), it may be possible to add a layer of functional information to fine-mapping studies, enhancing the ability to localize causal variants. Here we propose to develop a statistical framework that will incorporate both functional and genetic information. We will build variant-specific priors based on cell-specific functional annotation (e.g. DNase I hypersensitive sites, protein coding), associations with tissue-specific gene expression, and correlated phenotypes. We will capitalize on the publicly available ENCODE data to acquire functional annotation for each genetic variant. We will then estimate a posterior probability for each genetic variant based on its derived prior and the evidence for association with the outcome of interest. Such posterior probabilities can then be used to prioritize genetic variants for further follow-up in a laboratory setting. Compared to existing approaches, our proposed method is unique in that it will jointly model internal (e.g. sequencing and gene expression data) and external (e.g. ENCODE, TCGA) sources.
It will also allow for multiple causal variants in each region and assess all loci jointly, allowing the method to "borrow" information between regions. To ensure generalizability, we will conduct extensive simulation studies covering numerous possible scenarios. We will apply our method to a multi-ethnic breast cancer targeted sequencing dataset of 2,288 breast cancer cases and 2,323 controls for whom we have generated high-depth sequencing data for 12 GWAS-identified breast cancer regions. For a subset of these women, we also have mammographic density (n=1,000) and whole-genome expression data (n=250) in both normal and tumor tissue, allowing us to jointly model empirical sequencing, gene expression and phenotype data. We have assembled a multi-disciplinary research team with a track record of producing high-profile publications in fine-mapping, statistical methods, breast cancer epidemiology and population genetics, as well as publicly available software packages for the genetics community. Our work has the potential to bridge the gap between initial screening for regions in the genome that are associated with disease and prioritizing specific variants for further functional analysis. Such methods will have important implications for understanding the underlying biology of disease, a major challenge in the post-GWAS era.
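
To make the idea of combining an annotation-based prior with association evidence concrete, the following is a minimal sketch and not the framework proposed in this grant. It makes several simplifying assumptions that the abstract does not: exactly one causal variant per region (the proposal allows several), a log-linear prior in the annotations with hand-picked weights (the proposal would estimate these jointly across loci), and Wakefield-style approximate Bayes factors computed from z-scores. The function names, the `prior_sd` parameter, and all numbers are hypothetical and chosen only for illustration.

```python
import math


def log_abf(z, se, prior_sd=0.2):
    """Log approximate Bayes factor (association vs. null) for one variant.

    Assumption: the true effect size is Normal(0, prior_sd**2) under the
    alternative, and the estimate has standard error `se`.
    """
    v = se ** 2          # sampling variance of the effect estimate
    w = prior_sd ** 2    # prior variance of the true effect
    shrink = w / (v + w)
    return 0.5 * math.log(1.0 - shrink) + 0.5 * z * z * shrink


def posterior_probabilities(z_scores, std_errors, annotations, weights,
                            prior_sd=0.2):
    """Posterior probability that each variant is the single causal one.

    annotations: one row of annotation values per variant (e.g. 0/1 flags).
    weights: one hypothetical enrichment weight per annotation column.
    """
    # Log prior up to a constant: linear in the annotation values.
    log_prior = [sum(a * g for a, g in zip(row, weights))
                 for row in annotations]
    log_post = [lp + log_abf(z, se, prior_sd)
                for lp, z, se in zip(log_prior, z_scores, std_errors)]
    # Normalize on the log scale to avoid overflow.
    m = max(log_post)
    unnorm = [math.exp(x - m) for x in log_post]
    total = sum(unnorm)
    return [u / total for u in unnorm]


# Toy region with three variants; the second lies in a (hypothetical)
# DNase I hypersensitive site.
z_scores = [4.0, 3.8, 1.0]
std_errors = [0.05, 0.05, 0.05]
annotations = [[0], [1], [0]]

probs = posterior_probabilities(z_scores, std_errors, annotations, [2.0])
flat = posterior_probabilities(z_scores, std_errors, annotations, [0.0])
print("with annotation prior:", [round(p, 3) for p in probs])
print("with flat prior:      ", [round(p, 3) for p in flat])
```

In this toy region the annotated variant has a slightly weaker association signal than its neighbor, so a flat prior ranks the neighbor first, while the annotation-informed prior moves the annotated variant to the top. That shift in ranking is the kind of prioritization the abstract describes, under the much simpler assumptions stated above.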

Public Health Relevance

Genome-wide association studies (GWAS) have identified thousands of genetic regions involved in disease, but the specific causal genetic variants within each region remain unknown. Here we propose a novel statistical approach to fine-mapping that will prioritize plausible causal variants based on recent functional mapping of the genome and high-coverage sequencing data. Our methods will close the gap between initial screening of the genome and nominating specific, potentially causal genetic variants, one of the grand challenges in the post-GWAS era.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Cancer Institute (NCI)
Type: Exploratory/Developmental Grants (R21)
Project #: 1R21CA182821-01A1
Application #: 8753749
Study Section: Special Emphasis Panel (ZCA1-SRLB-D (M1))
Program Officer: Mechanic, Leah E

Project Start: 2014-08-15
Project End: 2016-07-31
Budget Start: 2014-08-15
Budget End: 2015-07-31
Support Year: 1
Fiscal Year: 2014
Total Cost: $222,955
Indirect Cost: $69,575

Institution

Name: Harvard University
Department: Public Health & Prev Medicine
Type: Schools of Public Health
DUNS #: 149617367

City: Boston
State: MA
Country: United States
Zip Code: 02115

Related projects


NIH 2015 R21 CA	Prioritizing follow-up of GWAS loci using genetic and functional annotation data Lindstroem, Sara / Harvard University
NIH 2015 R21 CA	Prioritizing follow-up of GWAS loci using genetic and functional annotation data Lindstroem, Sara / University of Washington
NIH 2014 R21 CA	Prioritizing follow-up of GWAS loci using genetic and functional annotation data Lindstroem, Sara / Harvard University	$222,955

Publications

Kichaev, Gleb; Roytman, Megan; Johnson, Ruth et al. (2017) Improved methods for multi-trait fine mapping of pleiotropic risk loci. Bioinformatics 33:248-255

Lindström, Sara; Ablorh, Akweley; Chapman, Brad et al. (2016) Deep targeted sequencing of 12 breast cancer susceptibility regions in 4611 women across four different ethnicities. Breast Cancer Res 18:109

Finucane, Hilary K; Bulik-Sullivan, Brendan; Gusev, Alexander et al. (2015) Partitioning heritability by functional annotation using genome-wide association summary statistics. Nat Genet 47:1228-35

Kichaev, Gleb; Pasaniuc, Bogdan (2015) Leveraging Functional-Annotation Data in Trans-ethnic Fine-Mapping Studies. Am J Hum Genet 97:260-71

Kichaev, Gleb; Yang, Wen-Yun; Lindstrom, Sara et al. (2014) Integrating functional data to prioritize causal variants in statistical fine-mapping studies. PLoS Genet 10:e1004722
