A major challenge in cancer epidemiologic studies, especially those of rare cancers, is observing enough cases. To address this issue, researchers often pool multiple studies (or cohorts) to achieve large sample sizes, which increases the power to study complex hypotheses. Pooling, however, makes the combined data difficult to analyze in the presence of heterogeneity. A simple pooled analysis, which increases statistical power for detecting risk factors with homogeneous effects, can misrepresent and obscure heterogeneous effects. Statistical solutions to this problem are limited and consist mainly of two-stage methods that combine study-specific estimates using fixed- or random-effects models. Moreover, when a large number of risk factors are under investigation, it is important not only to identify the important ones but also to distinguish predictors with homogeneous effects from those with heterogeneous effects. Knowing this structure can provide insight into disease etiology and has important implications for developing and evaluating cancer risk models using the pooled study strategy. However, statistical tests for detecting heterogeneity generally have low power and are not suited to multivariate or high-dimensional risk factors. In this project, motivated by a collaborative nested case-control (NCC) study of ovarian cancer among the New York University Women's Health Study (NYUWHS), the Northern Sweden Health and Disease Study (NSHDS), and the Italian Hormones and Diet in the Etiology of Cancer Study (ORDET), we will investigate the novel use of penalized regularization to handle heterogeneity in the context of pooled NCC studies.
We propose the following Specific Aims: (1) to develop an adaptive L1/Lq penalty regularized partial likelihood approach for integrating information from multiple NCC studies to identify important predictors, (2) to develop an adaptive L1 + L1/Lq penalty regularized partial likelihood approach for discovering the homogeneous and heterogeneous structure of predictors in pooled NCC studies, and (3) to translate the proposed procedures into practical knowledge and accessible software. As more and more research is conducted through collaborations among multiple studies, cohorts, and centers, novel statistical methodology for integrating information across studies is imperative. The proposed project will yield new statistical methodologies, theoretically sound and empirically effective, to conduct pooled analyses, develop cancer risk models using the pooled study strategy, and readily evaluate existing models across multiple populations. Furthermore, the newly developed statistical methodology will be integrated into open-source software, providing practitioners with effective tools to analyze pooled studies. The developed methods will be applicable to many pooled studies and will lead to the identification of new cancer risk factors and a better understanding of the heterogeneity of effects for some cancer risk factors.
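To make the two penalized criteria named in Aims 1 and 2 concrete, one possible formulation is sketched below. The notation is entirely an illustrative assumption and is not taken from the proposal: M studies, p candidate predictors, study-specific coefficients, a conditional-logistic form of the NCC partial likelihood over each case's sampled risk set, adaptive weights, and a mean-plus-deviation decomposition for Aim 2. The proposed methods may differ in any of these details.

```latex
% Requires amsmath. All symbols are illustrative assumptions:
%   \beta_j^{(m)}        effect of predictor j in study m (j = 1..p, m = 1..M)
%   \mathcal{C}_m        cases in study m
%   \mathcal{R}_i        sampled risk set of case i (the case and its matched controls)
%   w_j, v_j             adaptive weights; \lambda, \lambda_1, \lambda_2 \ge 0 tuning parameters; q > 1
\begin{align*}
% Log partial likelihood of the NCC data in study m
\ell_m\bigl(\beta^{(m)}\bigr)
  &= \sum_{i \in \mathcal{C}_m} \Bigl[ x_i^\top \beta^{(m)}
     - \log \sum_{k \in \mathcal{R}_i} \exp\bigl( x_k^\top \beta^{(m)} \bigr) \Bigr], \\
% Aim 1 (sketch): adaptive L1/Lq penalty on each predictor's across-study coefficient vector
\hat\beta
  &= \arg\min_{\beta} \Bigl\{ -\sum_{m=1}^{M} \ell_m\bigl(\beta^{(m)}\bigr)
     + \lambda \sum_{j=1}^{p} w_j \bigl\| \beta_j \bigr\|_q \Bigr\},
  \qquad \beta_j = \bigl(\beta_j^{(1)}, \dots, \beta_j^{(M)}\bigr), \\
% Aim 2 (sketch): common effect plus study-specific deviation,
% with an L1 penalty on the former and an L1/Lq penalty on the latter
\beta_j^{(m)} &= \mu_j + \alpha_j^{(m)},
  \qquad \text{penalty} = \lambda_1 \sum_{j=1}^{p} w_j \lvert \mu_j \rvert
     + \lambda_2 \sum_{j=1}^{p} v_j \bigl\| \alpha_j \bigr\|_q .
\end{align*}
```

Under this sketch, a predictor whose group norm is shrunk to zero in the first criterion is dropped from every study at once. In the second, an estimate with a nonzero common effect and a zero deviation vector would be read as a homogeneous effect, and a nonzero deviation vector as a heterogeneous one.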
With increasing collaborative efforts towards pooling resources for cancer research, statistical methodologies that support and strengthen these collaborations are of significant value. This project aims to develop novel and effective statistical approaches to cancer model development and evaluation using the pooled study strategy. Applying the newly developed analytical and computational tools to pooled cancer studies will further our understanding of the effects of cancer risk factors and their potential heterogeneity across populations.