Rare variants have been heralded as key to uncovering "missing heritability" in complex diseases such as cancers. These variants can now be genotyped using next-generation sequencing technologies; nonetheless, rare haplotypes may also result from combinations of common SNPs available from Genome-Wide Association Studies (GWAS). In this regard, there may be a great deal of treasure yet to be mined from GWAS data to explore the common disease, rare variant hypothesis. Recently, we proposed an approach named Logistic Bayesian LASSO (LBL) to identify association with rare haplotypes in a case-control setting. LBL is an adaptation of the Bayesian counterpart of the penalized regression approach LASSO. Our approach is able to weed out unassociated (especially common) haplotypes, achieving enough noise reduction that the signals contained in the associated rare haplotypes can be more easily detected. Using LBL, we were able to implicate, for the first time, a specific rare haplotype in the Complement Factor H (CFH) gene for Age-related Macular Degeneration (AMD). In addition to rare variants, gene-environment interaction (GXE) is believed to be another important contributor to missing heritability. LBL has a flexible framework that can incorporate non-genetic (environmental) covariates and gene-environment interactions. In this project we propose methods for exploring interactions between rare haplotypes and environmental factors in cancer epidemiology, first in the setting of simple random sampling and then for stratified random sampling. We will develop methods both with and without the assumption of gene-environment independence. The methods will be extensively studied through simulations under a variety of settings. They will be applied to several cancer datasets available from NIH's database of Genotypes and Phenotypes (dbGaP) and to the AMD data.
Further, the method for stratified sampling will be used to analyze the NCI-sponsored Kidney Cancer Case-Control Study, in which the controls were selected by stratified sampling using frequency matching with cases. We will implement the proposed methods in well-documented, user-friendly software and make it available to the larger scientific community.
Unraveling the interplay between genes and environment is fundamental to the understanding of many cancers and other complex diseases; however, this problem is particularly challenging when the causal genetic variants occur infrequently in the population. We propose to develop statistical methods to identify interactions of rare genetic variants with environmental factors in causing disease and to apply them to several cancer datasets. Successful implementation and application of these methods will contribute greatly to uncovering the role and interplay of rare variants and environmental factors in the etiology of cancers and other complex diseases, which can be of potential value in aiding the prevention and treatment of those diseases.