A key component of cancer prevention is uncovering the genetics behind various cancers and the complex traits and diseases that lead to cancer. To uncover the genetic etiology of cancers and other complex diseases or traits, it is necessary to use methods that jointly consider the multiple genetic components underlying the disease. Genome-wide association (GWA) studies scan the genome for possible genetic associations with disease risk. However, many GWA studies analyze the data with a univariate approach, treating each genetic marker as independent. Recently, methods for simultaneous significance testing and multivariate hierarchical models have started to consider multiple genes simultaneously rather than one at a time. Although they consider markers simultaneously, these methods restrict themselves to the assumption that, when scanning the genome, the number of genes detected will be very small compared to the number of genes investigated. In response, we propose to develop novel, more powerful tools that use Bayesian model averaging to include genetic structure in the models while simultaneously searching, on a genome-wide scale, for genes involved in a complex disease such as lung cancer. Models that include such biological information can increase the power to detect small contributors to risk for complex diseases and can still include sparsity information that controls false positives. Recently, we completed a methodological study showing that Bayesian model averaging performs better than standard selection techniques using multivariate logistic regression in a hypothesis-driven, candidate-gene type of approach. The central theme of this proposal is to develop Bayesian model averaging methods that incorporate the genetic structure inherent to the markers used in GWA studies and that can also search through the immense number of markers available for such studies.
We propose to develop fast Markov chain Monte Carlo algorithms for Bayesian model averaging techniques. We will calibrate the newly developed statistical techniques using simulation studies, and then apply the calibrated methods in a GWA study of lung cancer using data already available at M. D. Anderson Cancer Center. The significance of this proposal lies in developing new methods for GWA studies that incorporate available biological information, which can increase the power to detect genetic factors contributing to cancer while controlling false positives.
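To make the idea of Markov chain Monte Carlo for Bayesian model averaging concrete, the following is a minimal sketch and not the method proposed here. It makes several simplifying assumptions: a quantitative trait and a linear model stand in for case-control data and logistic regression, the marginal likelihood of each model is approximated by BIC, markers enter the prior independently with a fixed inclusion probability (a simple sparsity prior with no genetic structure), and the sampler is a basic add/drop Metropolis scheme. All names (`simulate`, `rss`, `log_score`, `mc3`, `prior_incl`) and the simulated effect sizes are hypothetical.

```python
import math
import random


def simulate(n=200, p=8, seed=1):
    """Simulate additive genotypes (0/1/2) and a trait driven by markers 0 and 3."""
    rng = random.Random(seed)
    X = [[float(sum(rng.random() < 0.3 for _ in range(2))) for _ in range(p)]
         for _ in range(n)]
    y = [1.0 * row[0] + 0.8 * row[3] + rng.gauss(0.0, 1.0) for row in X]
    return X, y


def rss(X, y, cols):
    """Residual sum of squares of a least-squares fit on an intercept plus cols."""
    n = len(y)
    A = [[1.0] + [X[i][j] for j in cols] for i in range(n)]
    k = len(cols) + 1
    # Augmented normal equations [A'A | A'y], with a tiny ridge for stability.
    M = [[sum(A[i][r] * A[i][c] for i in range(n)) + (1e-8 if r == c else 0.0)
          for c in range(k)] + [sum(A[i][r] * y[i] for i in range(n))]
         for r in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(M[r][c]))
        M[c], M[pivot] = M[pivot], M[c]
        for r in range(k):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    beta = [M[r][k] / M[r][r] for r in range(k)]
    return sum((y[i] - sum(b * a for b, a in zip(beta, A[i]))) ** 2
               for i in range(n))


def log_score(X, y, cols, prior_incl):
    """Approximate log posterior of a model: -BIC/2 plus the log sparsity prior."""
    n, p, k = len(y), len(X[0]), len(cols)
    bic_part = -0.5 * n * math.log(rss(X, y, cols) / n) - 0.5 * (k + 1) * math.log(n)
    prior_part = k * math.log(prior_incl) + (p - k) * math.log(1.0 - prior_incl)
    return bic_part + prior_part


def mc3(X, y, n_iter=1500, burn_in=300, prior_incl=0.2, seed=0):
    """Add/drop Metropolis sampler over models; returns posterior inclusion frequencies."""
    rng = random.Random(seed)
    p = len(X[0])
    gamma = [False] * p                      # current model: which markers are in
    current = log_score(X, y, [], prior_incl)
    counts = [0] * p
    for it in range(n_iter):
        j = rng.randrange(p)                 # symmetric proposal: flip one marker
        proposal = gamma[:]
        proposal[j] = not proposal[j]
        cols = [i for i in range(p) if proposal[i]]
        cand = log_score(X, y, cols, prior_incl)
        if rng.random() < math.exp(min(0.0, cand - current)):
            gamma, current = proposal, cand
        if it >= burn_in:
            for i in range(p):
                counts[i] += gamma[i]
    return [c / (n_iter - burn_in) for c in counts]


if __name__ == "__main__":
    X, y = simulate()
    for j, prob in enumerate(mc3(X, y)):
        print(f"marker {j}: posterior inclusion frequency {prob:.2f}")
```

Averaging over the visited models, rather than selecting a single best model, is what yields a per-marker inclusion frequency. A structured prior, for example one that groups markers by gene, would replace the independent `prior_incl` term in `log_score`.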

Public Health Relevance

A full understanding of cancer prevention depends on reliable research to identify the genetic risk factors that contribute to cancer. Developing a powerful method that incorporates previous knowledge to detect genes in genome-wide association scans of cancer can provide valuable inferences to enrich cancer prevention research, facilitating advances in clinical and genetic counseling.

Agency
National Institutes of Health (NIH)
Institute
National Cancer Institute (NCI)
Type
Small Research Grants (R03)
Project #
1R03CA141998-01
Application #
7751499
Study Section
Special Emphasis Panel (ZCA1-SRLB-F (M1))
Program Officer
Wang, Wendy
Project Start
2009-08-01
Project End
2011-07-30
Budget Start
2009-08-01
Budget End
2010-07-30
Support Year
1
Fiscal Year
2009
Total Cost
$77,000
Indirect Cost
Name
University of Texas MD Anderson Cancer Center
Department
Type
Schools of Medicine
DUNS #
800772139
City
Houston
State
TX
Country
United States
Zip Code
77030
Stingo, Francesco C; Swartz, Michael D; Vannucci, Marina (2015) A Bayesian approach to identify genes and gene-level SNP aggregates in a genetic analysis of cancer data. Stat Interface 8:137-151
Swartz, Michael D; Peterson, Christine B; Lupo, Philip J et al. (2013) Investigating multiple candidate genes and nutrients in the folate metabolism pathway to detect genetic and nutritional risk factors for lung cancer. PLoS One 8:e53475
Swartz, M D; Peng, B; Reyes-Gibby, C et al. (2011) Using Ascertainment for Targeted Resequencing to Increase Power to Identify Causal Variants. Stat Interface 4:285-294