This renewal application proposes a Program Project of statistical methods research to address gaps and barriers arising in the analysis of large and complex data from observational studies in cancer research. The ultimate goal of the Program is to use rich data sources to develop effective strategies for reducing cancer burden in the U.S. and improving longevity and quality of life. This Program Project comprises three research projects and two cores. The three integrated projects jointly address the statistical needs of three research priority areas identified by the Division of Cancer Control and Population Sciences of the National Cancer Institute: Health Disparities; Comparative Effectiveness Research; and Public Health Genomics. In Project 1, we will develop statistical methods to overcome common data limitations in the investigation of social and racial disparities spanning the cancer continuum. We will analyze data from the SEER database linked with data from the National Longitudinal Mortality Study (NLMS). In Project 2, we will develop methods for comparative effectiveness research (CER) in cancer using large observational data. We will use the SEER-Medicare data and the CaPSURE cohort to emulate complex randomized trials that compare the effectiveness of personalized strategies for cancer diagnosis and dynamic strategies for cancer treatment. In Project 3, we will develop statistical methods for the analysis of next-generation sequencing data in genetic cancer epidemiological studies. The proposed research in Project 3 is motivated by and applied to the Harvard lung cancer and breast cancer exome and targeted sequencing studies as well as the affiliated Genome-Wide Association Studies.
The Administrative Core will coordinate the overall scientific direction and programmatic activities of the Program, which will include regular P01 meetings, seminars, the annual retreat, the external advisory committee meeting, short courses, a visitor program, and dissemination of research results. The Statistical Computing Core will provide access to Harvard's largest high-performance computing cluster, perform data management, and ensure the development and dissemination of open-access, high-quality software. The Program PIs, Professors Xihong Lin and Francesca Dominici, are renowned biostatisticians with strong track records of methodological and collaborative research and academic administration.

Public Health Relevance

This research Program aims to develop innovative and practical statistical tools for the analysis of large and complex observational data. These tools will be used to study social disparities in cancer; the comparative effectiveness of cancer diagnosis and treatment; and cancer risk assessment, prediction, prevention, and progression using genetic profiles and environmental, behavioral, and social exposures.

National Institutes of Health (NIH)
National Cancer Institute (NCI)
Research Program Projects (P01)
Study Section
Special Emphasis Panel (ZCA1-RPRB-2 (M1))
Program Officer
Mariotto, Angela B
Harvard University
Biostatistics & Other Math Sci
Schools of Public Health
United States
Bobb, Jennifer F; Claus Henn, Birgit; Valeri, Linda et al. (2018) Statistical software for analyzing the health effects of multiple concurrent exposures via Bayesian kernel machine regression. Environ Health 17:67
Chen, Han; Cade, Brian E; Gleason, Kevin J et al. (2018) Multiethnic Meta-Analysis Identifies RAI1 as a Possible Obstructive Sleep Apnea-related Quantitative Trait Locus in Men. Am J Respir Cell Mol Biol 58:391-401
Pierce, Brandon L; Kraft, Peter; Zhang, Chenan (2018) Mendelian randomization studies of cancer risk: a literature review. Curr Epidemiol Rep 5:184-196
Barfield, Richard; Feng, Helian; Gusev, Alexander et al. (2018) Transcriptome-wide association studies accounting for colocalization using Egger regression. Genet Epidemiol 42:418-433
Liu, Zhonghua; Lin, Xihong (2018) Multiple phenotype association tests using summary statistics in genome-wide association studies. Biometrics 74:165-175
Emilsson, Louise; García-Albéniz, Xabier; Logan, Roger W et al. (2018) Examining Bias in Studies of Statin Treatment and Survival in Patients With Cancer. JAMA Oncol 4:63-70
Sun, Ryan; Carroll, Raymond J; Christiani, David C et al. (2018) Testing for gene-environment interaction under exposure misspecification. Biometrics 74:653-662
Antonelli, Joseph; Cefalu, Matthew; Palmer, Nathan et al. (2018) Doubly robust matching estimators for high dimensional confounding adjustment. Biometrics
Wilson, Ander; Zigler, Corwin M; Patel, Chirag J et al. (2018) Model-averaged confounder adjustment for estimating multivariate exposure effects with linear regression. Biometrics 74:1034-1044
García-Albéniz, Xabier; Hsu, John; Hernán, Miguel A (2017) The value of explicitly emulating a target trial when using real world evidence: an application to colorectal cancer screening. Eur J Epidemiol 32:495-500