We propose a Program Project, Statistical Informatics in Cancer Research, to tackle a series of problems motivated by the analysis of high-dimensional data arising in population-based studies of cancer. This Program Project comprises three research projects and two cores.
Project 1 focuses on spatio-temporal modeling of disease count data collected for administrative areas. The specific aims are motivated by problems encountered in epidemiological studies designed to monitor and assess health disparities. Our proposed methods address issues associated with administrative boundaries changing over time, sparse disease counts, spatial confounding, and heavy computational burdens for large datasets. Methods will be applied to data on U.S. breast cancer incidence from three state cancer registries, Boston-area premature mortality, and NCI SEER data.
Project 2 is also motivated by spatially indexed data related to cancer incidence and mortality, but the emphasis is on population surveillance and spatial cluster detection. Three of the specific aims of Project 2 are motivated by the analysis of NCI SEER data and one by a case-control study designed to assess spatial clustering in childhood leukemia. The case-control dataset also includes individual-level data on several genetic biomarkers of susceptibility. One sub-aim of this project assesses gene-space interaction by studying whether disease clustering patterns differ according to genetic polymorphisms.
Project 3 focuses on methods for the analysis of very high-dimensional genomic and proteomic biomarkers. Extensions to spatially indexed genomic data are also considered in Project 3.
All of the aims of the three projects are closely integrated with the motivating real-world cancer studies in which the investigators are involved. The three projects link thematically through a focus on population-based, observational studies in cancer, as well as technically through the consideration of high-dimensional correlated data (arising from different sources) that require advanced statistical and computing methods. Several specific techniques (e.g., spatio-temporal modeling, penalized likelihoods, false discovery rates, hidden Markov models) are shared between two and in some cases all three projects.
The two cores consist of an Administrative Core and a Statistical Computing Core. The Administrative Core will coordinate the overall scientific direction and programmatic activities of the Program, which will include short courses, a visitor program, dissemination of research results, and an external advisory committee. The Statistical Computing Core will ensure the development and dissemination of open-access, good-quality, user-friendly software designed to implement the statistical methods developed in the Research Projects; producing this software is the final Specific Aim of each of the three projects. The Program Director and Co-Director, Professors Louise Ryan and Xihong Lin, respectively, are internationally known biostatisticians with strong track records of academic administration.

National Institutes of Health (NIH)
National Cancer Institute (NCI)
Research Program Projects (P01)
Project #
Application #
Study Section
Special Emphasis Panel (ZCA1-RPRB-7 (M1))
Program Officer
Dunn, Michelle C
Project Start
Project End
Budget Start
Budget End
Support Year
Fiscal Year
Total Cost
Indirect Cost
Harvard University
Biostatistics & Other Math Sci
Schools of Public Health
United States
Zip Code
Bind, M-A C; Vanderweele, T J; Coull, B A et al. (2016) Causal mediation analysis for longitudinal data with exogenous exposure. Biostatistics 17:122-34
Hernán, Miguel A; Robins, James M (2016) Using Big Data to Emulate a Target Trial When a Randomized Trial Is Not Available. Am J Epidemiol :
Chen, Jun; Just, Allan C; Schwartz, Joel et al. (2016) CpGFilter: model-based CpG probe filtering with replicates for epigenome-wide association studies. Bioinformatics 32:469-71
Lin, Xinyi; Lee, Seunggeun; Wu, Michael C et al. (2016) Test for rare variants by environment interactions in sequencing association studies. Biometrics 72:156-64
Lee, Kyu Ha; Tadesse, Mahlet G; Baccarelli, Andrea A et al. (2016) Multivariate Bayesian variable selection exploiting dependence structure among outcomes: Application to air pollution effects on DNA methylation. Biometrics :
Yung, Godwin; Lin, Xihong (2016) Validity of using ad hoc methods to analyze secondary traits in case-control association studies. Genet Epidemiol 40:732-743
Arvold, Nils D; Cefalu, Matthew; Wang, Yun et al. (2016) Comparative effectiveness of radiotherapy with vs. without temozolomide in older patients with glioblastoma. J Neurooncol :
Wasfy, Jason H; Dominici, Francesca; Yeh, Robert W (2016) Letter by Wasfy et al Regarding Article, "Facility Level Variation in Hospitalization, Mortality, and Costs in the 30 Days After Percutaneous Coronary Intervention: Insights on Short-Term Healthcare Value From the Veterans Affairs Clinical Assessment, Rep Circulation 133:e376
Carere, Deanna Alexis; Kraft, Peter; Kaphingst, Kimberly A et al. (2016) Consumers report lower confidence in their genetics knowledge following direct-to-consumer personal genomic testing. Genet Med 18:65-72
Zigler, Corwin Matthew (2016) The Central Role of Bayes' Theorem for Joint Estimation of Causal Effects and Propensity Scores. Am Stat 70:47-54

Showing the most recent 10 out of 136 publications