We are developing methods for studying gene-environment interactions that use exposures measured in pooled specimens from several individuals together with genotypes measured separately on each individual. Suppose one has a case-control study in which each individual has been genotyped at a panel of SNPs (single nucleotide polymorphisms). Suppose also that one has biological specimens (e.g., serum or urine) from the same individuals but lacks the budget to assay each individual specimen for an exposure of interest. Pooling specimens and assaying the resulting pools not only saves assay costs but also preserves specimen volume for future use. We have previously developed methods for analyzing case-control studies with exposures measured in pooled specimens. Those methods assume, reasonably, that the value measured on a pooled specimen is the average of the values for the individual specimens in the pool. Under those methods, testing a gene-environment interaction at a single SNP required forming pools within strata of individuals who all had the same genotype at that SNP. Studying gene-environment interactions across a panel of SNPs would therefore require creating new pooled specimens for each SNP studied, and the potential savings in assay costs would disappear. The approach we are now developing treats the individual measurements as missing data and uses the pooled measurements in a principled way to impute them. With a given set of imputed data in hand, we can use standard statistical methods for case-control data to estimate gene-environment interactions. In practice, we use multiple imputation: we create several imputed data sets, carry out a case-control analysis on each, and combine the results across analyses. This approach has shown some promise, but some problems remain to be resolved. Work on this problem is ongoing.
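The paragraph does not say how the results from the multiple imputed data sets are combined. A standard choice is Rubin's rules, which average the per-imputation estimates and add a between-imputation component to the average within-imputation variance. The sketch below assumes that rule; the function name `combine_rubin` and the example numbers are hypothetical illustrations, not results or code from this project.

```python
from statistics import mean, variance


def combine_rubin(estimates, variances):
    """Pool m per-imputation results with Rubin's rules (illustrative sketch).

    estimates: point estimates (e.g., a log-odds-ratio interaction coefficient)
               from the case-control analysis of each imputed data set.
    variances: the corresponding squared standard errors.
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need >= 2 imputations and one variance per estimate")
    q_bar = mean(estimates)      # pooled point estimate
    w = mean(variances)          # average within-imputation variance
    b = variance(estimates)      # between-imputation variance (divisor m - 1)
    t = w + (1 + 1 / m) * b      # total variance of the pooled estimate
    return q_bar, t


# Hypothetical interaction estimates from three imputed data sets.
est, total_var = combine_rubin([0.40, 0.50, 0.60], [0.04, 0.05, 0.06])
print(round(est, 4), round(total_var, 4))  # prints 0.5 0.0633
```

The between-imputation term `b` is what carries the extra uncertainty from not observing the individual exposures; with only pooled measurements available, it would be expected to grow as the imputations disagree more.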
Identifying causative SNPs in a genome-wide study can be challenging when individual SNPs have small marginal effects, because significance thresholds must account for the large number of SNPs under study. For complex diseases, particular combinations of SNPs may dramatically increase risk, a kind of epistasis or gene-gene interaction. We are investigating the use of a machine-learning technique to discover sets of SNPs that together cause disease (causative SNPs) in case-parent triad data. First, we devised a way to use actual case-parent triad genotypes to create simulated genome-wide data sets that reflect realistic linkage disequilibrium structure and are seeded with known sets of causative SNPs. We are currently working to better characterize the genetic properties of populations simulated in this way. Second, we implemented an existing stochastic search algorithm, GA-KNN, which pairs an evolutionary (genetic) algorithm with nearest-neighbor classification, to find multiple sets of k SNPs that are predictive of disease (here k is small, say 2 or 4). By cataloguing the SNPs that appear most frequently among the predictive sets, we hope to uncover the sets of causative SNPs. In preliminary trials on simulated data seeded with two interacting sets of four SNPs each, our approach shows promise. In ongoing work, we are attempting to speed up the algorithm and to see whether its promising performance is maintained in more complex situations. (See also Z01 ES040007; PI Clarice Weinberg. Min Shi is also a within-lab collaborator on this project; her time is allocated to Weinberg's project but not to this one.)
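The cataloguing step described above can be sketched as simple frequency counting over the SNP sets that a search such as GA-KNN returns. This is a minimal illustration, assuming each predictive set arrives as a tuple of SNP identifiers; the function name `catalog_snps`, the pair-counting extension, and the toy SNP IDs are hypothetical and not taken from the project's implementation.

```python
from collections import Counter
from itertools import combinations


def catalog_snps(predictive_sets, top=5):
    """Rank SNPs, and co-occurring SNP pairs, by frequency among predictive sets.

    predictive_sets: iterable of SNP-ID tuples, e.g. the k-SNP subsets kept
    by a stochastic search (hypothetical input format).
    """
    snp_counts = Counter()
    pair_counts = Counter()
    for snp_set in predictive_sets:
        members = sorted(set(snp_set))  # ignore order and duplicate IDs
        snp_counts.update(members)
        pair_counts.update(combinations(members, 2))
    return snp_counts.most_common(top), pair_counts.most_common(top)


sets = [("rs1", "rs2"), ("rs1", "rs3"), ("rs2", "rs1"), ("rs4", "rs5")]
singles, pairs = catalog_snps(sets, top=2)
print(singles)   # [('rs1', 3), ('rs2', 2)]
print(pairs[0])  # (('rs1', 'rs2'), 2)
```

Counting pairs as well as single SNPs is one way to look for interacting sets rather than individually frequent SNPs, since a causative combination should recur together across many predictive sets.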

Support Year: 21
Fiscal Year: 2016
Name: U.S. National Inst of Environ Hlth Scis
Shi, Min; Umbach, David M; Weinberg, Clarice R (2015) Using parental phenotypes in case-parent studies. Front Genet 6:221
Shi, Min; Umbach, David M; Weinberg, Clarice R (2014) Disentangling pooled triad genotypes for association studies. Ann Hum Genet 78:345-56
Weinberg, Clarice R; Shi, Min; DeRoo, Lisa A et al. (2014) Asymmetry in family history implicates nonstandard genetic mechanisms: application to the genetics of breast cancer. PLoS Genet 10:e1004174
Deroo, Lisa A; Bolick, Sophia C E; Xu, Zongli et al. (2014) Global DNA methylation and one-carbon metabolism gene polymorphisms and the risk of breast cancer in the Sister Study. Carcinogenesis 35:333-8
Shi, Min; Umbach, David M; Weinberg, Clarice R (2013) Case-sibling studies that acknowledge unstudied parents and permit the inclusion of unmatched individuals. Int J Epidemiol 42:298-307
Weinberg, Clarice R; Shi, Min; Umbach, David M (2011) A sibling-augmented case-only approach for assessing multiplicative gene-environment interactions. Am J Epidemiol 174:1183-9
Weinberg, Clarice R; Shi, Min; Umbach, David M (2011) Re: "Genetic association and gene-environment interaction: a new method for overcoming the lack of exposure information in controls". Am J Epidemiol 173:1346-7; author reply 1347-8
Shi, Min; Umbach, David M; Weinberg, Clarice R (2011) Family-based gene-by-environment interaction studies: revelations and remedies. Epidemiology 22:400-7
Shi, Min; Umbach, David M; Weinberg, Clarice R (2010) Testing haplotype-environment interactions using case-parent triads. Hum Hered 70:23-33
Vermeulen, Sita H; Shi, Min; Weinberg, Clarice R et al. (2009) A hybrid design: case-parent triads supplemented by control-mother dyads. Genet Epidemiol 33:136-44
