Several genome-wide association studies (GWAS) have been published on various complex diseases; these studies collect genotype data on a large number of single nucleotide polymorphisms (SNPs) to study the association between these SNPs and a disease. Although GWAS have identified new loci associated with different diseases, these loci generally explain very little of the genetic risk for these diseases. Much of the remaining trait variation is likely due to the combined effects of genes, environmental factors, and their interactions. However, most investigators conducting GWAS do not consider gene-environment (GxE) or gene-gene (GxG) interactions in their search for new genes. Moreover, most of these studies are cross-sectional. Complex diseases are frequently dynamic, varying over time with changing or accumulating environmental and physiological factors. The influence of genes on these diseases may also vary over time through interaction with factors such as age, developmental stage, or other time-dependent environmental factors. Variation in the effects of genetic variants at different stages of life could significantly alter the trajectories of traits. Hence, studies that do not consider the possibility of longitudinal variation in genetic associations may rely on oversimplified models of variant effects and consequently lack power to detect them. This gap is due in part to a current lack of efficient statistical methods and software for detecting the interplay between high-volume genetic data and time-dependent environmental factors. This proposal responds to this urgent need by developing advanced statistical methods and efficient computing algorithms to analyze high-throughput data from longitudinal gene-environment studies of unrelated individuals as well as families. We propose to develop two efficient methods to detect GxE interactions in longitudinal studies.
The two methods are as follows: (1) techniques for robust and efficient estimation of GxE interactions in longitudinal study designs using a likelihood-based dimension reduction approach; and (2) a powerful random-effects model for high-dimensional data to detect the joint effects of multiple SNPs and time-dependent environmental factors. The proposed methods are motivated by, and will be applied to, data from the Minnesota Center for Twin and Family Research (MCTFR), a longitudinal genome-wide study of genes, environments, and their interactions with respect to different behavioral traits. We intend to study the etiological underpinnings of substance use disorders (SUDs), which arise from various interacting biological and psychosocial factors that work together dynamically over the course of development. User-friendly, open-access statistical software will be developed and distributed.
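To make the idea of a time-varying genetic effect concrete, the display below writes out one generic longitudinal GxE model. It is an illustrative sketch under stated assumptions, not the likelihood-based dimension reduction method or the random-effects model proposed in the aims. The notation is assumed for illustration: y_{ij} is the trait for individual i at measurement occasion j, G_i is the genotype coded as a minor-allele count (0, 1, 2), E_{ij} is a time-dependent environmental exposure, t_{ij} is age or time, and b_i is a subject-specific random intercept that accounts for repeated measures on the same person. The interaction coefficients \beta_{GE} and \beta_{GT} allow the genetic effect to change with exposure and with age.

```latex
\begin{align*}
y_{ij} &= \beta_0 + \beta_G G_i + \beta_E E_{ij} + \beta_T t_{ij}
        + \beta_{GE}\, G_i E_{ij} + \beta_{GT}\, G_i t_{ij} + b_i + \varepsilon_{ij},\\
b_i &\sim N(0, \sigma_b^2), \qquad \varepsilon_{ij} \sim N(0, \sigma^2),\\
H_0 &: \beta_{GE} = \beta_{GT} = 0 .
\end{align*}
```

In family data, b_i would typically be given a kinship-structured covariance rather than being independent across individuals. For K SNPs G_{i1}, ..., G_{iK} analyzed jointly, one common random-effects formulation (again an assumption here, not necessarily the proposed model) replaces \beta_{GE} G_i E_{ij} with \sum_{k=1}^{K} \gamma_k G_{ik} E_{ij}, treats \gamma_k \sim N(0, \tau), and tests the single variance component H_0: \tau = 0, which avoids estimating K separate interaction coefficients.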

Public Health Relevance

We propose powerful statistical approaches for detecting genome-wide gene-environment interactions using longitudinal data from families and unrelated individuals. Our proposed approaches will provide insights into the complex interplay between genes and environmental factors in the development of a disease. We will apply these approaches to the Minnesota Center for Twin and Family Research (MCTFR) dataset, a prospective longitudinal study of families, to characterize the nature of gene-environment interplay in the development of substance use disorders.

National Institutes of Health (NIH)
National Institute on Drug Abuse (NIDA)
Research Project (R01)
Study Section
Biostatistical Methods and Research Design Study Section (BMRD)
Program Officer
Weinberg, Naimah Z
University of Minnesota Twin Cities
Biostatistics & Other Math Sci
Schools of Public Health
United States
Basu, Saonli; Zhang, Yiwei; Ray, Debashree et al. (2013) A rapid gene-based genome-wide association test with multivariate traits. Hum Hered 76:53-63