We propose a Program Project, Statistical Informatics in Cancer Research, to tackle a series of problems motivated by the analysis of high-dimensional data arising in population-based studies of cancer. The Program Project comprises three research projects and two cores.

Project 1 focuses on spatio-temporal modeling of disease count data collected for administrative areas. Its specific aims are motivated by problems encountered in epidemiological studies designed to monitor and assess health disparities. The proposed methods address administrative boundaries that change over time, sparse disease counts, spatial confounding, and the heavy computational burden of large data sets. The methods will be applied to data on U.S. breast cancer incidence from three state cancer registries, on premature mortality in the Boston area, and from the NCI SEER program.

Project 2 is also motivated by spatially indexed data on cancer incidence and mortality, but its emphasis is on population surveillance and spatial cluster detection. Three of its specific aims are motivated by the analysis of NCI SEER data, and one by a case-control study designed to assess spatial clustering in childhood leukemia. The case-control dataset also includes individual-level data on several genetic biomarkers of susceptibility, and one sub-aim assesses gene-space interaction by studying whether disease clustering patterns differ according to genetic polymorphisms.

Project 3 focuses on methods for the analysis of very high-dimensional genomic and proteomic biomarkers, including extensions to spatially indexed genomic data.

The aims of all three projects are closely integrated with the real-world cancer studies that motivate them and in which the investigators are involved. The projects are linked thematically by their focus on population-based observational studies in cancer, and technically by their treatment of high-dimensional correlated data, arising from different sources, that require advanced statistical and computing methods. Several specific techniques (e.g., spatio-temporal modeling, penalized likelihoods, false discovery rates, and hidden Markov models) are shared by two, and in some cases all three, of the projects. The two cores are an Administrative Core and a Statistical Computing Core.
The Administrative Core will coordinate the overall scientific direction and programmatic activities of the Program, including short courses, a visitor program, dissemination of research results, and an external advisory committee. The Statistical Computing Core will ensure the development and dissemination of open-access, high-quality, user-friendly software implementing the statistical methods developed in the research projects; developing this software is the final specific aim of each of the three projects. The Program Director and Co-Director, Professors Louise Ryan and Xihong Lin, respectively, are internationally known biostatisticians with strong track records in academic administration.

Agency: National Institutes of Health (NIH)
Institute: National Cancer Institute (NCI)
Type: Research Program Projects (P01)
Project #: 5P01CA134294-02
Application #: 7686103
Study Section: Special Emphasis Panel (ZCA1-RPRB-7 (M1))
Program Officer: Dunn, Michelle C
Project Start: 2008-09-10
Project End: 2013-08-31
Budget Start: 2009-09-01
Budget End: 2010-08-31
Support Year: 2
Fiscal Year: 2009
Total Cost: $682,555
Indirect Cost:
Organization Name: Harvard University
Department: Biostatistics & Other Math Sci
Organization Type: Schools of Public Health
DUNS #: 149617367
City: Boston
State: MA
Country: United States
Zip Code: 02115
Bobb, Jennifer F; Claus Henn, Birgit; Valeri, Linda et al. (2018) Statistical software for analyzing the health effects of multiple concurrent exposures via Bayesian kernel machine regression. Environ Health 17:67
Chen, Han; Cade, Brian E; Gleason, Kevin J et al. (2018) Multiethnic Meta-Analysis Identifies RAI1 as a Possible Obstructive Sleep Apnea-related Quantitative Trait Locus in Men. Am J Respir Cell Mol Biol 58:391-401
Pierce, Brandon L; Kraft, Peter; Zhang, Chenan (2018) Mendelian randomization studies of cancer risk: a literature review. Curr Epidemiol Rep 5:184-196
Barfield, Richard; Feng, Helian; Gusev, Alexander et al. (2018) Transcriptome-wide association studies accounting for colocalization using Egger regression. Genet Epidemiol 42:418-433
Liu, Zhonghua; Lin, Xihong (2018) Multiple phenotype association tests using summary statistics in genome-wide association studies. Biometrics 74:165-175
Emilsson, Louise; García-Albéniz, Xabier; Logan, Roger W et al. (2018) Examining Bias in Studies of Statin Treatment and Survival in Patients With Cancer. JAMA Oncol 4:63-70
Sun, Ryan; Carroll, Raymond J; Christiani, David C et al. (2018) Testing for gene-environment interaction under exposure misspecification. Biometrics 74:653-662
Antonelli, Joseph; Cefalu, Matthew; Palmer, Nathan et al. (2018) Doubly robust matching estimators for high dimensional confounding adjustment. Biometrics
Wilson, Ander; Zigler, Corwin M; Patel, Chirag J et al. (2018) Model-averaged confounder adjustment for estimating multivariate exposure effects with linear regression. Biometrics 74:1034-1044
Braun, Danielle; Gorfine, Malka; Parmigiani, Giovanni et al. (2017) Propensity scores with misclassified treatment assignment: a likelihood-based adjustment. Biostatistics 18:695-710

Showing the most recent 10 out of 192 publications