This renewal application proposes to carry out a Program Project of statistical methods research to address gaps and barriers arising in the analysis of large and complex data from observational studies in cancer research. The ultimate goal of the Program is to use rich data sources to develop effective strategies for reducing cancer burden in the U.S. and improving longevity and quality of life. This Program Project comprises three research projects and two cores. The three integrated projects jointly address the statistical needs of three research priority areas identified by the Division of Cancer Control and Population Sciences of the National Cancer Institute: Health Disparities; Comparative Effectiveness Research; and Public Health Genomics. In Project 1, we will develop statistical methods to overcome common data limitations in the investigation of social and racial disparities spanning the cancer continuum. We will analyze data from the SEER database linked with data from the National Longitudinal Mortality Study (NLMS). In Project 2, we will develop methods for comparative effectiveness research (CER) in cancer using large observational data. We will use the SEER-Medicare data and the CaPSURE cohort to emulate complex randomized trials that compare the effectiveness of personalized strategies for cancer diagnosis and dynamic strategies for cancer treatment. In Project 3, we will develop statistical methods for the analysis of next-generation sequencing data in genetic cancer epidemiological studies. The proposed research in Project 3 is motivated by and applied to the Harvard lung cancer and breast cancer exome and targeted sequencing studies, as well as the affiliated Genome-Wide Association Studies.
The Administrative Core will coordinate the overall scientific direction and programmatic activities of the Program, which will include regular P01 meetings, seminars, the annual retreat, the external advisory committee meeting, short courses, a visitor program, and dissemination of research results. The Statistical Computing Core will provide access to Harvard's largest high-performance computing cluster, perform data management, and ensure the development and dissemination of open-access, high-quality software. The Program PIs, Professors Xihong Lin and Francesca Dominici, are renowned biostatisticians with strong track records of methodological and collaborative research and academic administration.

Public Health Relevance

This research Program aims to develop innovative and practical statistical tools for the analysis of large and complex observational data to study social disparities in cancer; the comparative effectiveness of cancer diagnosis and treatment; and cancer risk assessment and prediction, prevention, and progression using genetic profiles and environmental, behavioral, and social exposures.

National Institutes of Health (NIH)
National Cancer Institute (NCI)
Research Program Projects (P01)
Study Section
Special Emphasis Panel (ZCA1-RPRB-2 (M1))
Program Officer
Mariotto, Angela B
Harvard University
Biostatistics & Other Math Sci
Schools of Public Health
United States