The fundamental premise of this proposal is that cancer types defined by anatomic site may contain sub-types that are etiologically distinct, and considerable evidence for this has emerged in recent years. The goal of the proposal is to develop a strategy for optimally identifying such etiologically distinct tumor sub-types, and to develop the statistical techniques needed to accomplish this. In addition to clarifying cancer etiology, such an approach promises a more powerful strategy for detecting new risk factors, by focusing discovery studies on the sub-types that possess distinct etiology. Our research plan is motivated by a crucial new result regarding the occurrence of double primary malignancies. We show that the odds ratio linking the tumor sub-types of pairs of independently occurring cancers is directly related to the underlying population risk heterogeneity of the sub-types. Consequently, data from studies of double primaries can be used to determine the optimal tumor sub-classification from an etiologic perspective. In this proposal we build upon this result to develop multivariate clustering techniques that optimize the etiologic heterogeneity of the resulting clusters (Aim 1). We will develop analogous techniques that use known risk factors to create sub-types with maximal etiologic heterogeneity, for use in settings where data on multiple primary cancers are unavailable or unobtainable (Aim 2). We will determine the implications for statistical power of using sub-typing as a strategy for detecting new risk factors (Aim 3). Finally, we will develop freely available software to give other investigators easy access to the methods that we develop (Aim 4). The research will ultimately lead to a conceptual framework for investigating etiologic heterogeneity, and a suite of statistical tools for conducting the data analyses.

Public Health Relevance

Our research plan has the potential to change how cancer epidemiologic investigations are conducted, by using etiologic heterogeneity as a tool for improving their efficiency and statistical power. As such, it can speed the discovery of factors affecting cancer risk.

National Institutes of Health (NIH)
National Cancer Institute (NCI)
Research Project (R01)
Study Section: Biostatistical Methods and Research Design Study Section (BMRD)
Program Officer: Dunn, Michelle C
Sloan-Kettering Institute for Cancer Research
New York
United States
Begg, Colin B; Ostrovnaya, Irina; Carniello, Jose V Scarpa et al. (2016) Clonal relationships between lobular carcinoma in situ and other breast malignancies. Breast Cancer Res 18:66
Ostrovnaya, Irina; Seshan, Venkatraman E; Begg, Colin B (2015) Using somatic mutation data to test tumors for clonal relatedness. Ann Appl Stat 9:1533-1548
Begg, Colin B; Orlow, Irene; Zabor, Emily C et al. (2015) Identifying etiologically distinct sub-types of cancer: a demonstration project involving breast cancer. Cancer Med 4:1432-9
Ogino, Shuji; Campbell, Peter T; Nishihara, Reiko et al. (2015) Proceedings of the second international molecular pathological epidemiology (MPE) meeting. Cancer Causes Control 26:959-72
Begg, Colin B; Seshan, Venkatraman E; Zabor, Emily C et al. (2014) Genomic investigation of etiologic heterogeneity: methodologic challenges. BMC Med Res Methodol 14:138
Begg, Colin B; Zabor, Emily C; Bernstein, Jonine L et al. (2013) A conceptual and methodological framework for investigating etiologic heterogeneity. Stat Med 32:5039-52
Seshan, Venkatraman E; Gonen, Mithat; Begg, Colin B (2013) Comparing ROC curves derived from regression models. Stat Med 32:1483-93
Begg, Colin B; Gonen, Mithat; Seshan, Venkatraman E (2013) Testing the incremental predictive accuracy of new markers. Clin Trials 10:690-2
Begg, Colin B; Zabor, Emily C (2012) Detecting and exploiting etiologic heterogeneity in epidemiologic studies. Am J Epidemiol 176:512-8
Begg, Colin B; Pike, Malcolm C (2012) Comment on "the predictive capacity of personal genome sequencing". Sci Transl Med 4:135le3; author reply 135lr3

Showing the most recent 10 out of 11 publications