A major challenge in cancer epidemiologic studies, especially those of rare cancers, is observing enough cases. To address this issue, researchers often pool multiple studies (or cohorts) to achieve large sample sizes, which gives them greater power to study complex hypotheses. Combining studies, however, makes the pooled data difficult to analyze in the presence of heterogeneity. A simple pooled analysis, which increases statistical power for detecting risk factors with homogeneous effects, can misrepresent and obscure heterogeneous effects. Statistical solutions to this problem are limited and consist mainly of two-stage methods that combine study-specific estimates using fixed- or random-effects models. Moreover, when a large number of risk factors are under investigation, it is important not only to identify the important ones but also to distinguish predictors with homogeneous effects from those with heterogeneous effects. Knowing this structure can provide insight into disease etiology and have important implications for developing and evaluating cancer risk models using the pooled study strategy. However, statistical tests for detecting heterogeneity generally have low power and are not well suited to multivariate or high-dimensional risk factors. In this project, motivated by a collaborative nested case-control (NCC) study of ovarian cancer among the New York University Women's Health Study (NYUWHS), the Northern Sweden Health and Disease Study (NSHDS), and the Italian Hormones and Diet in the Etiology of Cancer Study (ORDET), we will investigate the novel use of penalty regularization ideas to handle heterogeneity in the context of pooled NCC studies.
We propose the following Specific Aims: (1) to develop an adaptive L1/Lq penalty regularized partial likelihood approach for integrating information from multiple NCC studies to identify important predictors, (2) to develop an adaptive L1 + L1/Lq penalty regularized partial likelihood approach for discovering the homogeneous and heterogeneous structure of predictors in pooled NCC studies, and (3) to translate the proposed procedures into practical knowledge and accessible software. As more and more research is conducted through collaborations among multiple studies, cohorts, and centers, novel statistical methodology for integrating information across multiple studies is imperative. The proposed project will yield new statistical methodologies, theoretically sound and empirically effective, to conduct pooled analyses, develop cancer risk models using the pooled study strategy, and readily evaluate existing models across multiple populations. Furthermore, the newly developed statistical methodology will be integrated into open-source software, providing practitioners with effective tools to analyze pooled studies. The developed methods will be applicable to many pooled studies and will lead to the identification of new cancer risk factors and a better understanding of the heterogeneity of effects for some cancer risk factors.
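To make the penalty structure named in Aims (1) and (2) concrete, the following is a minimal illustrative sketch, not the project's implementation. It assumes one plausible reading of the abstract: coefficients are stored as a K-by-p array (K studies, p predictors), the L1/Lq penalty groups each predictor's coefficients across studies, and the L1 + L1/Lq penalty splits each coefficient into a shared effect `mu` and study-specific deviations `alpha`. The function names, the default `q = 2`, the adaptive `weights`, and the classification rule are all assumptions introduced here for illustration; the partial likelihood term itself is omitted.

```python
import numpy as np


def l1_lq_penalty(beta, weights=None, q=2.0):
    """Adaptive L1/Lq penalty: sum_j w_j * ||beta[:, j]||_q.

    beta has shape (K, p): row k holds study k's coefficients.
    Each column (one predictor across all studies) forms a group,
    so a predictor is dropped from every study at once or kept in all.
    """
    beta = np.asarray(beta, dtype=float)
    col_norms = np.linalg.norm(beta, ord=q, axis=0)  # one Lq norm per predictor
    if weights is None:
        weights = np.ones(beta.shape[1])  # non-adaptive case (assumption)
    return float(np.sum(weights * col_norms))


def l1_plus_l1_lq_penalty(mu, alpha, lam1, lam2, q=2.0):
    """L1 penalty on shared effects mu (p,) plus L1/Lq on deviations alpha (K, p).

    Assumes the decomposition beta[k, j] = mu[j] + alpha[k, j].
    """
    mu = np.asarray(mu, dtype=float)
    return lam1 * float(np.sum(np.abs(mu))) + lam2 * l1_lq_penalty(alpha, q=q)


def classify_predictors(mu, alpha, tol=1e-8):
    """Label each predictor from a fitted (mu, alpha) pair (hypothetical rule)."""
    mu = np.asarray(mu, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    labels = []
    for j in range(mu.shape[0]):
        if np.linalg.norm(alpha[:, j]) > tol:
            labels.append("heterogeneous")  # effect differs across studies
        elif abs(mu[j]) > tol:
            labels.append("homogeneous")    # common nonzero effect
        else:
            labels.append("unimportant")    # zero in every study
    return labels
```

Under this reading, a predictor whose deviations are all shrunk to zero but whose shared effect survives would be treated as homogeneous, while any nonzero deviation group marks it as heterogeneous.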

Public Health Relevance

With increasing collaborative efforts towards pooling resources for cancer research, statistical methodologies that support and strengthen these collaborations are of significant value. This project aims to develop novel and effective statistical approaches to cancer model development and evaluation using the pooled study strategy. The application of newly developed analytical and computational tools to pooled cancer studies will further our understanding of the effects of cancer risk factors and their potential heterogeneity across populations.

National Institutes of Health (NIH)
National Cancer Institute (NCI)
Exploratory/Developmental Grants (R21)
Study Section
Epidemiology of Cancer Study Section (EPIC)
Program Officer
Divi, Rao L
New York University
Public Health & Prev Medicine
Schools of Medicine
New York
United States
Kang, Suhyun; Lu, Wenbin; Liu, Mengling (2017) Efficient estimation for accelerated failure time model under case-cohort and nested case-control sampling. Biometrics 73:114-123
Sit, Tony; Liu, Mengling; Shnaidman, Michael et al. (2016) Design and analysis of clinical trials in the presence of delayed treatment effect. Stat Med 35:1774-9
Cheng, Xin; Lu, Wenbin; Liu, Mengling (2015) Identification of homogeneous and heterogeneous variables in pooled cohort studies. Biometrics 71:397-403
Lu, Wenbin; Liu, Mengling; Chen, Yi-Hau (2014) Testing goodness-of-fit for the proportional hazards model based on nested case-control data. Biometrics 70:845-51
Liu, Mengling; Lu, Wenbin; Krogh, Vittorio et al. (2013) Estimation and selection of complex covariate effects in pooled nested case-control studies with heterogeneity. Biostatistics 14:682-94
Shang, Shulian; Liu, Mengling; Zeleniuch-Jacquotte, Anne et al. (2013) Partially Linear Single Index Cox Regression Model in Nested Case-Control Studies. Comput Stat Data Anal 67:199-212