It is now well established that many genes influence the risk of cancer. For major genes known to affect risk, an important task is to determine the risks conferred by individual variants. Geneticists consider variants to confer risk if they have been shown to segregate with disease in families, but increasingly the evidence will accrue from population-based association studies, where empirical evidence is obtained on the basis of case and control frequencies for all observed variants, many of which will necessarily occur very infrequently, perhaps only once, in the study. Furthermore, many of these variants will not have been observed in previous cancer-prone families. Hierarchical modeling offers a natural strategy to leverage the collective evidence from these rare variants with sparse data. This can be accomplished when the variants can be effectively grouped on the basis of higher-level covariates that characterize the functional properties of the variants that are relevant to risk prediction. In this application we propose to study in detail the properties of available hierarchical modeling techniques for this purpose, and suitable modifications of these techniques, with a view to establishing valid analytic strategies for obtaining relative risk estimates for rare variants. We will use simulations to evaluate the small-sample properties of pseudo-likelihood estimation of the relative risks of rare variants from a hierarchical model. The simulations will address bias and coverage probabilities of the individual estimators, their relative efficiency compared to ordinary logistic regression, the influence of the predictiveness of the higher-level covariates, the impact of model misspecification, the influence of sample size, the impact of missing data on higher-level covariates, and the use of explained variation as a measure of the extent to which the higher-level covariates explain the risk variation.
We will also examine the asymptotic properties of pseudo-likelihood estimation under various assumptions: a correctly specified hierarchical model; an incorrectly specified hierarchical model; and a setting in which the number of variants is allowed to increase indefinitely, but data on the individual variants remains sparse. These investigations address distinct questions of practical importance in the design and analysis of association (case-control) studies of major cancer genes.
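The idea of borrowing strength across sparse variants can be illustrated with a small simulation. The sketch below is not the pseudo-likelihood estimator proposed in the application; it substitutes a much simpler moment-based normal-normal shrinkage as a stand-in. It assumes a single binary higher-level covariate `z`, equal numbers of cases and controls, and a fixed small number of carriers per variant. All function names and parameter values (`pi0`, `pi1`, `tau`, `n_carriers`) are hypothetical choices made for illustration.

```python
import math
import random


def simulate_variants(n_variants, pi0, pi1, tau, n_carriers, rng):
    """Simulate rare variants under an assumed two-level model.

    Level 2: beta_i = pi0 + pi1 * z_i + delta_i, with delta_i ~ N(0, tau^2).
    Level 1: with equal numbers of cases and controls and a rare variant,
    a carrier is a case with probability OR / (1 + OR) = expit(beta_i).
    """
    variants = []
    for _ in range(n_variants):
        z = rng.randint(0, 1)  # hypothetical binary functional covariate
        beta = pi0 + pi1 * z + rng.gauss(0.0, tau)  # true log odds ratio
        p_case = 1.0 / (1.0 + math.exp(-beta))
        cases = sum(rng.random() < p_case for _ in range(n_carriers))
        variants.append({"z": z, "beta": beta, "cases": cases,
                         "controls": n_carriers - cases})
    return variants


def crude_estimate(v):
    """Per-variant log odds ratio with a 0.5 continuity correction,
    together with its approximate sampling variance."""
    a, c = v["cases"] + 0.5, v["controls"] + 0.5
    return math.log(a / c), 1.0 / a + 1.0 / c


def shrink(variants):
    """Shrink each crude estimate toward the mean of its covariate group.

    This is a simplified stand-in, not pseudo-likelihood estimation.
    Returns (crude estimates, shrunken estimates).
    """
    crude = [crude_estimate(v) for v in variants]
    shrunk = [None] * len(variants)
    for z in {v["z"] for v in variants}:
        idx = [i for i, v in enumerate(variants) if v["z"] == z]
        est = [crude[i][0] for i in idx]
        mean = sum(est) / len(est)
        spread = sum((e - mean) ** 2 for e in est) / max(len(est) - 1, 1)
        noise = sum(crude[i][1] for i in idx) / len(idx)
        # Moment estimate of the between-variant variance, floored at zero
        tau2 = max(spread - noise, 0.0)
        for i in idx:
            weight = tau2 / (tau2 + crude[i][1])  # always in [0, 1)
            shrunk[i] = mean + weight * (crude[i][0] - mean)
    return [c[0] for c in crude], shrunk


def mse(estimates, variants):
    """Mean squared error of estimates against the simulated true values."""
    return sum((e - v["beta"]) ** 2
               for e, v in zip(estimates, variants)) / len(variants)


if __name__ == "__main__":
    rng = random.Random(1)
    data = simulate_variants(200, pi0=0.0, pi1=1.0, tau=0.3,
                             n_carriers=3, rng=rng)
    crude, shrunk = shrink(data)
    print("MSE crude :", round(mse(crude, data), 3))
    print("MSE shrunk:", round(mse(shrunk, data), 3))
```

Varying `n_carriers`, `tau`, or `pi1` in such a sketch gives a rough feel for how data sparsity and the predictiveness of the higher-level covariate affect the estimates, which are among the questions the proposed simulations are meant to answer more rigorously.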

Public Health Relevance

Many major genes have been identified that strongly influence the risk of cancer. However, there are typically many different mutations in the gene, each of which may or may not confer increased risk. It is critical to identify which genetic mutations are harmful, and which ones are harmless, so that individuals who learn from genetic testing that they have a mutation can be appropriately counseled. This is a challenging task, since new mutations are continually being identified, and there is typically relatively little evidence available about each individual mutation. In this proposal we plan to examine new statistical techniques that have the potential to identify the mutations that are harmful with much greater accuracy. The research will involve hierarchical statistical modeling, a technique that aggregates the evidence about lots of rare mutations to increase the ability to predict the effects of each mutation individually.

Agency: National Institutes of Health (NIH)
Institute: National Cancer Institute (NCI)
Type: Research Project (R01)
Project #: 5R01CA131010-03
Application #: 7894393
Study Section: Biostatistical Methods and Research Design Study Section (BMRD)
Program Officer: Divi, Rao L
Project Start: 2008-08-12
Project End: 2012-07-30
Budget Start: 2010-08-01
Budget End: 2012-07-30
Support Year: 3
Fiscal Year: 2010
Total Cost: $314,736
Indirect Cost:
Name: Sloan-Kettering Institute for Cancer Research
Department:
Type:
DUNS #: 064931884
City: New York
State: NY
Country: United States
Zip Code: 10065
Capanu, Marinela; Gönen, Mithat; Begg, Colin B (2013) An assessment of estimation methods for generalized linear mixed models with binary outcomes. Stat Med 32:4550-66
Tischkowitz, Marc; Capanu, Marinela; Sabbaghian, Nelly et al. (2012) Rare germline mutations in PALB2 and breast cancer risk: a population-based study. Hum Mutat 33:674-80
Mukherjee, Bhramar; Delancey, John Oliver; Raskin, Leon et al. (2012) Risk of non-melanoma cancers in first-degree relatives of CDKN2A mutation carriers. J Natl Cancer Inst 104:953-6
Begg, Colin B; Zabor, Emily C (2012) Detecting and exploiting etiologic heterogeneity in epidemiologic studies. Am J Epidemiol 176:512-8
Begg, Colin B (2011) A strategy for distinguishing optimal cancer subtypes. Int J Cancer 129:931-7
Capanu, Marinela; Concannon, Patrick; Haile, Robert W et al. (2011) Assessment of rare BRCA1 and BRCA2 variants of unknown significance using hierarchical modeling. Genet Epidemiol 35:389-97
Capanu, Marinela; Begg, Colin B (2011) Hierarchical modeling for estimating relative risks of rare genetic variants: properties of the pseudo-likelihood method. Biometrics 67:371-80
Malone, Kathleen E; Begg, Colin B; Haile, Robert W et al. (2010) Population-based study of the risk of second primary contralateral breast cancer associated with carrying a mutation in BRCA1 or BRCA2. J Clin Oncol 28:2404-10
Borg, Ake; Haile, Robert W; Malone, Kathleen E et al. (2010) Characterization of BRCA1 and BRCA2 deleterious mutations and variants of unknown clinical significance in unilateral and bilateral breast cancer: the WECARE study. Hum Mutat 31:E1200-40
Kuligina, Ekatherina; Reiner, Anne; Imyanitov, Evgeny N et al. (2010) Evaluating cancer epidemiologic risk factors using multiple primary malignancies. Epidemiology 21:366-72
