Genome-wide association studies (GWAS) have been highly successful at identifying loci associated with complex traits, including prostate cancer (PrCa). Although these newly discovered loci have shed some light on the underlying biology, their clinical impact on preventing or treating PrCa has been limited. Furthermore, known susceptibility variants fail to explain the majority of PrCa heritability. GWAS primarily focus on common genetic variation (MAF > 5%); however, empirical evidence suggests that low-frequency variants also play a role in cancer, including PrCa. Low-frequency variants are expected to explain a portion of the missing heritability of PrCa and could lead to actionable targets for prevention or treatment, but current discovery strategies are not feasible: whole-genome sequencing studies are too costly for the sample sizes needed to adequately evaluate low-frequency variants in PrCa. Innovative strategies are needed. Several studies have demonstrated that the accuracy of imputation for low-frequency variants improves as the size of the reference panel increases. Here, we propose the following:
AIM 1 - Assemble an enhanced reference panel of over 10,000 whole genomes and perform imputation for 83,991 PrCa cases and 58,430 disease-free controls assembled from available GWAS data (Aim 1a). Using these data, define the operating characteristics of imputation for less common and rare variants: clarify the statistical relationship between the GWAS marker frequency spectrum, the target marker frequency spectrum, reference panel size, and imputation success as measured by imputation R2, and determine a target set of imputable common, less common, and rare variants (Aim 1b);
AIM 2 - Perform a whole-genome scan of imputable common, less common, and rare variants for association with overall and aggressive PrCa susceptibility, and with PSA levels among disease-free controls;
AIM 3 - Develop novel statistical approaches to improve the estimation and partitioning of heritability by incorporating genomic annotation (e.g., exonic regions, regulatory regions, pathways) while accounting for imputation error, in order to improve risk prediction models.
Recently, we performed a meta-analysis of PrCa using GWAS data imputed to the 1000 Genomes Project for 31,663 PrCa cases and 35,870 disease-free controls, identifying over 30 novel common susceptibility loci that are currently being validated. Furthermore, we will genotype an additional 90,000 individuals of European ancestry (two-thirds PrCa cases and one-third controls) on a customized array with a GWAS backbone as part of the NCI Genetic Associations and Mechanisms in Oncology "post-GWAS" initiative (GAME-ON). Our recent experience demonstrates both our ability to implement imputation in a large data resource and the strength of our collaborations, which have assembled the world's largest PrCa case-control series with available GWAS data. We are collaborating with several groups to coordinate the assembly of an enhanced imputation reference panel, which we will make available to the research community. This proposal not only addresses a fundamental question in PrCa but will also greatly aid in quantifying the role of genetics in other complex phenotypes.
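The imputation-quality measure named in Aim 1b and the per-variant scan in Aim 2 can be illustrated with a minimal sketch. It is not the analysis pipeline of this proposal and rests on stated assumptions: imputation R2 is computed MaCH/minimac-style as the observed variance of imputed dosages divided by the variance expected under Hardy-Weinberg equilibrium, 2p(1-p); the R2 cutoff of 0.3 is assumed for illustration only; and association is tested with a 1-df score test from a covariate-free logistic regression of case-control status on additive dosage. The function names `imputation_r2`, `dosage_score_test`, and `scan` are hypothetical.

```python
import math
from statistics import fmean, pvariance


def imputation_r2(dosages):
    """MaCH/minimac-style imputation quality estimate for one variant.

    Observed variance of imputed genotype dosages (0-2) divided by the
    variance expected under Hardy-Weinberg equilibrium, 2p(1-p), where p is
    the alternate-allele frequency estimated from the dosages themselves.
    Near 1: genotypes are nearly certain. Near 0: dosages carry little
    information beyond the allele frequency. Can exceed 1 under HWE departure.
    """
    p = fmean(dosages) / 2.0
    expected_var = 2.0 * p * (1.0 - p)
    if expected_var == 0.0:  # monomorphic in this sample: treat as uninformative
        return 0.0
    return pvariance(dosages) / expected_var


def dosage_score_test(dosages, status):
    """1-df score test of case (1) / control (0) status on additive dosage.

    Equivalent to the score test of a logistic regression with no covariates
    (a Cochran-Armitage-style trend test using dosages). Returns (chi2, p).
    """
    d_bar = fmean(dosages)
    y_bar = fmean(status)
    u = sum((d - d_bar) * (y - y_bar) for d, y in zip(dosages, status))
    v = y_bar * (1.0 - y_bar) * sum((d - d_bar) ** 2 for d in dosages)
    if v == 0.0:  # no variation in dosage or in status: no information
        return 0.0, 1.0
    chi2 = u * u / v
    # Upper tail of a chi-square with 1 df: P(|Z| > sqrt(chi2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))


def scan(variants, status, min_r2=0.3):
    """Test only variants whose imputation R2 meets the cutoff.

    min_r2=0.3 is an assumed, illustrative threshold.
    Returns {variant_id: (r2, chi2, p)}.
    """
    results = {}
    for vid, dosages in variants.items():
        r2 = imputation_r2(dosages)
        if r2 >= min_r2:
            results[vid] = (r2, *dosage_score_test(dosages, status))
    return results
```

For example, `scan({"v1": [2, 1, 1, 0], "v2": [0.2] * 4}, [1, 1, 0, 0])` keeps only `v1` (R2 = 1.0, chi2 = 2.0), while the constant dosages of `v2` give R2 = 0 and it is excluded. A real scan would adjust for covariates such as study and ancestry principal components, and would model PSA levels with linear rather than logistic regression; these are omitted to keep the sketch self-contained.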
Prostate cancer (PrCa) is the second most common cancer among men worldwide, yet very few risk factors are known. Genome-wide association studies (GWAS) have demonstrated that common variation plays a central role in PrCa etiology, but known loci do not fully explain the inherited component. Using a cost-efficient strategy that leverages available resources, we will evaluate low-frequency variation to account for part of the missing heritability of PrCa and to identify potential therapeutic targets.