Risks of complex diseases, such as cancers, hypertension, diabetes, and schizophrenia, are determined by both genetic and environmental factors. Advances in human genome research have thus led to epidemiologic investigations not only of the effects of genes alone, but also of their effects in combination with environmental exposures. The case-control study design, which has been widely used in classical questionnaire-based epidemiologic studies, is now commonly employed to study the role of genes and gene-environment interactions in the etiology of complex diseases. Recently, a broad class of semiparametric retrospective-likelihood methods has been developed for the analysis of case-control genetic data in the presence of environmental factors. These methods exploit knowledge about the distribution of genetic variants in order to build estimators that are much more statistically efficient than other approaches, and are also statistically valid in the presence of incomplete genetic data, such as missing marker alleles and unknown haplotypes. Because this kind of methodology is not available in any commercial software, researchers have resorted to standard approaches, which lack statistical efficiency and sometimes validity. As a result, important gene-environment interactions are obscured, as are important main effects. The goal of this project is to develop Stata software to implement the semiparametric retrospective-likelihood and related methods. The software will accommodate missing genotypes, phase ambiguity, untyped markers, flexible disease-risk models with gene-gene and gene-environment interactions, genomewide association studies, population stratification, and models both with and without Hardy-Weinberg equilibrium. This tool will be highly useful to epidemiologists and geneticists in their search for genetic and environmental determinants of complex diseases.
Risks of complex diseases, such as cancers, hypertension, diabetes, and schizophrenia, are determined by both genetic and environmental factors. Advances in human genome research have thus led to epidemiologic investigations not only of the effects of genes alone, but also of their effects in combination with environmental exposures. This project will implement novel and efficient statistical methods for the analysis of case-control genetic data in the presence of environmental factors, and thus bring into the mainstream better ways of detecting genetic effects and gene-environment interactions.