We propose a Program Project, Statistical Informatics in Cancer Research, to tackle a series of problems motivated by the analysis of high-dimensional data arising in population-based studies of cancer. This Program Project comprises three research projects and two cores.
Project 1 focuses on spatio-temporal modeling of disease count data collected for administrative areas. The specific aims are motivated by problems encountered in epidemiological studies designed to monitor and assess health disparities. Our proposed methods address administrative boundaries that change over time, sparse disease counts, spatial confounding, and the heavy computational burden of large data sets. Methods will be applied to data on U.S. breast cancer incidence from three state cancer registries, Boston-area premature mortality, and NCI SEER data.
Project 2 is also motivated by spatially indexed data related to cancer incidence and mortality, but the emphasis is on population surveillance and spatial cluster detection. Three of the specific aims of Project 2 are motivated by the analysis of NCI SEER data, and one by a case/control study designed to assess spatial clustering in childhood leukemia. The case/control dataset also includes individual-level data on several genetic biomarkers of susceptibility. One sub-aim of this project assesses gene-space interaction by studying whether disease clustering patterns differ according to genetic polymorphisms.
Project 3 focuses on methods for the analysis of very high-dimensional genomic and proteomic biomarkers. Extensions to spatially indexed genomic data are also considered in Project 3.
All of the aims of the three projects are closely integrated with the motivating real-world cancer studies in which the investigators are involved. The three projects are linked thematically through a focus on population-based, observational studies in cancer, and technically through the consideration of high-dimensional correlated data (arising from different sources) that require advanced statistical and computing methods. Several specific techniques (e.g., spatio-temporal modeling, penalized likelihoods, false discovery rates, hidden Markov models) are shared between two and in some cases all three projects.
The two cores are an Administrative Core and a Statistical Computing Core. The Administrative Core will coordinate the overall scientific direction and programmatic activities of the Program, which will include short courses, a visitor program, dissemination of research results, and an external advisory committee. The Statistical Computing Core will ensure the development and dissemination of open-access, high-quality, user-friendly software implementing the statistical methods developed in the Research Projects; this software development is the final Specific Aim of each of the three projects. The Program Director and Co-Director, Professors Louise Ryan and Xihong Lin, respectively, are internationally known biostatisticians with strong track records of academic administration.

National Institutes of Health (NIH)
National Cancer Institute (NCI)
Research Program Projects (P01)
Study Section
Special Emphasis Panel (ZCA1-RPRB-7 (M1))
Program Officer
Dunn, Michelle C
Harvard University
Biostatistics & Other Math Sci
Schools of Public Health
United States
García-Albéniz, Xabier; Maurel, Joan; Hernán, Miguel A (2015) Why post-progression survival and post-relapse survival are not appropriate measures of efficacy in cancer randomized clinical trials. Int J Cancer 136:2444-7
Aschard, Hugues; Vilhjálmsson, Bjarni J; Greliche, Nicolas et al. (2014) Maximizing the power of principal-component analysis of correlated phenotypes in genome-wide association studies. Am J Hum Genet 94:662-76
VanderWeele, Tyler J; Tchetgen Tchetgen, Eric J; Cornelis, Marilyn et al. (2014) Methodological challenges in mendelian randomization. Epidemiology 25:427-35
Krieger, Nancy; Kosheleva, Anna; Waterman, Pamela D et al. (2014) 50-year trends in US socioeconomic inequalities in health: US-born Black and White Americans, 1959-2008. Int J Epidemiol 43:1294-313
Holme, Øyvind; Løberg, Magnus; Kalager, Mette et al. (2014) Effect of flexible sigmoidoscopy screening on colorectal cancer incidence and mortality: a randomized clinical trial. JAMA 312:606-15
Bobb, Jennifer F; Obermeyer, Ziad; Wang, Yun et al. (2014) Cause-specific risk of hospital admission related to extreme heat in older adults. JAMA 312:2659-67
Lee, Seunggeung; Abecasis, Gonçalo R; Boehnke, Michael et al. (2014) Rare-variant association analysis: study designs and statistical tests. Am J Hum Genet 95:5-23
Arvold, Nils D; Wang, Yun; Zigler, Cory et al. (2014) Hospitalization burden and survival among older glioblastoma patients. Neuro Oncol 16:1530-40
Zigler, Corwin Matthew; Dominici, Francesca (2014) Uncertainty in Propensity Score Estimation: Bayesian Methods for Variable Selection and Model Averaged Causal Effects. J Am Stat Assoc 109:95-107
Wang, Yun; Schrag, Deborah; Brooks, Gabriel A et al. (2014) National trends in pancreatic cancer outcomes and pattern of care among Medicare beneficiaries, 2000 through 2010. Cancer 120:1050-8