Secondary Analysis of Longitudinal Trait in Genome Wide Association Studies

Abstract: Relating genome-wide association study (GWAS) data to longitudinal phenotype data can provide special advantages, but also presents certain challenges. These challenges are particularly acute in secondary analyses of data from case-control studies, a design common among GWAS investigations. Secondary data analyses using existing case-control GWAS data offer an effective and practical solution for genetic investigations of longitudinal traits, which afford the opportunity to examine disease heterogeneity over time and early disease detection. None of the current secondary analysis methodologies accommodates longitudinal data. The primary goal of this proposal is to investigate and develop statistical inference methods to analyze longitudinal secondary traits using case-control GWAS data. This project is motivated by research problems arising from the Prostate, Lung, Colorectal and Ovarian Cancer Screening Trial (PLCO), in which Cancer Antigen 125 (CA125) and Prostate Specific Antigen (PSA) levels were measured at baseline (T0) and then annually for five years in women and men, respectively, and BMI values were measured or reported at three time points (age 20, age 50 and enrollment). The Total Genotype Set, developed recently at the NCI for GWAS, includes genotype data for subjects from various case-control cancer studies within PLCO. This aggregated genotype dataset provides a unique opportunity to evaluate, as secondary analyses, the inherited determinants of these serum antigens and their serial time trajectories.
Our specific aims are: 1) Develop statistical approaches for secondary analysis of longitudinal traits in GWAS. We propose to integrate the mixed-effects model, commonly used for longitudinal analysis, with the weighted likelihood and retrospective likelihood methods, two theoretically justified secondary analysis methods for a trait measured at a single time point, to develop robust and efficient approaches for secondary data analysis of longitudinal traits. 2) Translate the proposed statistical methodology into practical research knowledge and software. We will apply the methods developed in Aim 1 to the PLCO BMI, CA125 and PSA GWAS data to evaluate the inherited determinants of these antigens and their serial time trajectories. We will also develop, distribute and support freely available software packages for the methods used in this proposal. Given the immense amount of recently generated genotype data, there is a tremendous need for sophisticated statistical methods for secondary analysis, as proposed herein. This study will greatly advance applied epidemiology research by allowing investigators to combine longitudinal data analysis with genetic analysis, taking advantage of the large supply of recently generated case-control GWAS data.
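To illustrate the weighting idea behind Aim 1, the following is a minimal sketch, not the method proposed here. It assumes the case and control sampling fractions are known, and it replaces the mixed-effects likelihood with a much simpler stand-in: inverse-probability-weighted least squares on the stacked repeated measures (an independence working structure). All function names, parameter values and the simulated data are hypothetical.

```python
import numpy as np

def ipw_weights(is_case, p_case, p_control):
    """Inverse sampling-probability weight for each subject (assumed known fractions)."""
    is_case = np.asarray(is_case, dtype=bool)
    return np.where(is_case, 1.0 / p_case, 1.0 / p_control)

def weighted_ls(X, y, w):
    """Solve the weighted normal equations (X' W X) beta = X' W y."""
    Xw = X * w[:, None]
    return np.linalg.solve(Xw.T @ X, Xw.T @ y)

def simulate_and_fit(seed=0, n_pop=200_000, n_times=6, beta_g=0.2):
    """Simulate a cohort, draw a case-control sample, and fit the trait model
    with and without sampling weights. Returns (naive, weighted) genotype effects."""
    rng = np.random.default_rng(seed)
    g = rng.binomial(2, 0.3, size=n_pop)      # genotype coded 0/1/2
    b = rng.normal(0.0, 1.0, size=n_pop)      # subject-level random intercept
    # Hypothetical disease model: risk depends on genotype and trait level,
    # which is what makes the case-control sample non-representative.
    logit = -4.0 + 0.5 * g + 1.0 * b
    d = rng.random(n_pop) < 1.0 / (1.0 + np.exp(-logit))
    # Case-control sampling: keep every case and roughly as many controls.
    p_case = 1.0
    p_control = d.sum() / (~d).sum()
    keep = d | (rng.random(n_pop) < p_control)
    g_s, b_s, d_s = g[keep], b[keep], d[keep]
    n = g_s.size
    # Stack the repeated measures subject by subject (long format).
    t = np.tile(np.arange(n_times), n)
    g_long = np.repeat(g_s, n_times)
    noise = rng.normal(0.0, 0.5, size=n * n_times)
    y = 1.0 + beta_g * g_long + 0.1 * t + np.repeat(b_s, n_times) + noise
    X = np.column_stack([np.ones(n * n_times), g_long, t])
    w = np.repeat(ipw_weights(d_s, p_case, p_control), n_times)
    naive = weighted_ls(X, y, np.ones_like(w))   # ignores the sampling design
    weighted = weighted_ls(X, y, w)              # reweights to the source cohort
    return naive[1], weighted[1]
```

Calling `simulate_and_fit()` returns the unweighted and weighted estimates of the genotype effect for one simulated dataset. A full treatment would also model within-subject correlation and use a variance estimator that accounts for the weights, neither of which this sketch attempts.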
We will develop statistical approaches for secondary analysis that use available case-control GWAS data to investigate the association between genotype and longitudinal trait data. We will apply the proposed inference methods to the Prostate, Lung, Colorectal and Ovarian Cancer Screening Trial BMI, CA125 and PSA GWAS data to evaluate the inherited determinants of these antigens and their kinetics. We will also develop, distribute and support freely available software packages for the methods used in this proposal.
Zhang, Yilong; Han, Sung Won; Cox, Laura M et al. (2017) A multivariate distance-based analytic framework for microbial interdependence association test in longitudinal study. Genet Epidemiol 41:769-778
Wu, Jing; Peters, Brandilyn A; Dominianni, Christine et al. (2016) Cigarette smoking and the oral microbiome in a large study of American adults. ISME J 10:2435-46
Li, Huilin; Chen, Jinbo (2016) Efficient unified rare variant association test by modeling the population genetic distribution in case-control studies. Genet Epidemiol 40:579-590