Genome-wide association studies involving hundreds of thousands of single nucleotide polymorphisms (SNPs) require innovation in both study design and statistical analysis. The objective of this application is to develop statistical methods and computationally efficient algorithms that make the best use of the diverse data resources and the strengths of different study designs. In many association studies, interest will not be limited to characterizing individual SNPs or haplotypes associated with a disease outcome; it will also include identifying interactions, whether between SNPs within a gene (as in haplotype effects), between genes (epistasis), or between genes and environmental exposures such as drugs, smoking, and alcohol consumption.
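One common way to formalize these interaction types is a logistic regression with product terms. This is offered only as an illustrative sketch, not as the model proposed in the application, and all symbols are assumptions introduced here: D is disease status, G_1 and G_2 are additive genotype codes (0, 1, or 2 minor alleles) for two SNPs, and E is an environmental exposure.

```latex
\operatorname{logit} P(D = 1 \mid G_1, G_2, E)
  = \beta_0 + \beta_1 G_1 + \beta_2 G_2 + \beta_E E
  + \beta_{12}\, G_1 G_2 + \beta_{1E}\, G_1 E
```

Under this parameterization, a nonzero \beta_{12} represents a SNP-by-SNP interaction (within a gene, or between genes in the case of epistasis), and a nonzero \beta_{1E} represents a gene-by-environment interaction, both measured on the log-odds scale.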
The first aim of this application is to investigate situations, including study designs, in which it is possible to identify different types of interactions as well as to construct predictive models based on several single SNPs or haplotypes. The proposed statistical methods will use stage-wise or regularization strategies to carefully control statistical over-fitting in the context of high-dimensional SNP data. Study design also plays a critical role in this setting. Two common designs for association studies are the population-based case-control design and the family-based design. The population-based case-control design is popular because it is cost-effective, but it can be sensitive to population stratification. Family-based studies, which use family members as controls, are more robust to stratification and allow the evaluation of maternal or parent-of-origin effects on the disease. However, they can be inefficient because of over-matching on genotypes, and sampling ascertainment biases can substantially complicate the analysis. For these reasons, hybrid association studies that use both designs can increase the power to detect disease-associated SNPs.
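As a rough illustration of what a regularization strategy for SNP data can look like, the sketch below fits an L1-penalized (lasso) logistic regression by proximal gradient descent on simulated genotypes. It is not the method developed under this project; the function names, the simulated data, the effect size, and the penalty values are all assumptions chosen for the example.

```python
import math
import random


def simulate_genotypes(n, p, maf, rng):
    """Draw additive genotype codes (0, 1, or 2 minor alleles) for n subjects, p SNPs."""
    return [[int(rng.random() < maf) + int(rng.random() < maf) for _ in range(p)]
            for _ in range(n)]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; values within the threshold become 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_l1_logistic(X, y, lam, n_iter=150):
    """Proximal gradient descent for logistic loss plus lam * sum(|beta_j|).

    The intercept is left unpenalized. The step size uses a conservative
    bound on the Lipschitz constant of the logistic-loss gradient.
    """
    n, p = len(X), len(X[0])
    step = 1.0 / (0.25 * max(1.0 + sum(v * v for v in row) for row in X))
    intercept, beta = 0.0, [0.0] * p
    for _ in range(n_iter):
        # Residuals: fitted probability minus observed case status.
        resid = [sigmoid(intercept + sum(b * v for b, v in zip(beta, row))) - yi
                 for row, yi in zip(X, y)]
        grad0 = sum(resid) / n
        grad = [sum(r * row[j] for r, row in zip(resid, X)) / n for j in range(p)]
        intercept -= step * grad0
        # Gradient step followed by soft-thresholding (the L1 proximal operator).
        beta = [soft_threshold(b - step * g, step * lam) for b, g in zip(beta, grad)]
    return intercept, beta


# Hypothetical example: 200 subjects, 15 SNPs, only SNP 0 affects disease risk.
rng = random.Random(0)
X = simulate_genotypes(n=200, p=15, maf=0.3, rng=rng)
y = [int(rng.random() < sigmoid(-1.0 + 1.0 * row[0])) for row in X]

for lam in (0.02, 0.1):
    _, beta = fit_l1_logistic(X, y, lam)
    selected = [j for j, b in enumerate(beta) if b != 0.0]
    print(f"lam={lam}: SNPs with nonzero coefficients: {selected}")
```

Larger values of `lam` shrink more coefficients exactly to zero, which is how the penalty limits over-fitting when the number of SNPs is large relative to the sample size. In practice the penalty would be chosen by cross-validation or a similar criterion rather than fixed in advance.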
The second aim of this application is to develop unified statistical estimation and inference procedures for combining these resources, taking into account different ascertainment schemes and potential bias due to population stratification. In particular, we focus on methods that can be easily adapted to high-dimensional SNP data by exploiting the computational techniques developed under the first aim.

Agency: National Institutes of Health (NIH)
Institute: National Cancer Institute (NCI)
Type: Research Project--Cooperative Agreements (U01)
Project #: 5U01CA125489-03
Application #: 7487538
Study Section: Special Emphasis Panel (ZHG1-HGR-P (J1))
Program Officer: Wang, Wendy
Project Start: 2006-09-13
Project End: 2009-07-31
Budget Start: 2008-08-20
Budget End: 2009-07-31
Support Year: 3
Fiscal Year: 2008
Total Cost: $266,332
Indirect Cost:
Name: Fred Hutchinson Cancer Research Center
Department:
Type:
DUNS #: 078200995
City: Seattle
State: WA
Country: United States
Zip Code: 98109
Kooperberg, Charles; LeBlanc, Michael; Obenchain, Valerie (2010) Risk prediction using genome-wide association studies. Genet Epidemiol 34:643-52
Dai, James Y; LeBlanc, Michael; Smith, Nicholas L et al. (2009) SHARE: an adaptive algorithm to select the most informative set of SNPs for candidate genetic association. Biostatistics 10:680-93
Kooperberg, Charles; LeBlanc, Michael; Dai, James Y et al. (2009) Structures and Assumptions: Strategies to Harness Gene × Gene and Gene × Environment Interactions in GWAS. Stat Sci 24:472-488
Dai, James Y; LeBlanc, Michael; Kooperberg, Charles (2009) Semiparametric estimation exploiting covariate independence in two-phase randomized trials. Biometrics 65:178-87
LeBlanc, Michael; Kooperberg, Charles (2009) Adaptively weighted association statistics. Genet Epidemiol 33:442-52
Kooperberg, Charles; LeBlanc, Michael (2008) Increasing the power of identifying gene x gene interactions in genome-wide association studies. Genet Epidemiol 32:255-63