With advances in technology, the cancer research enterprise is rapidly becoming data-intensive and data-driven. One example is the explosion of biotechnologies and the generation of massive genetic and genomic data, such as whole genome sequencing data. Another example is health informatics, which makes large administrative health care databases, such as electronic medical records and Medicare claim data, rapidly available. Cancer data science has become increasingly important in cancer research. Indeed, massive data provide unprecedented opportunities for new discovery in cancer. This Project aims to develop and apply statistical and computational methods for the analysis of massive and complex genetic and genomic data, together with epidemiological and clinical data, in the population and medical sciences of cancer research. Our ultimate goal is to use rich data sources to understand cancer etiology, risk, and prognosis, and to discover new, effective strategies for cancer prevention, intervention, and treatment. It has become increasingly evident that the shortage of methods suitable for analyzing massive data is a bottleneck in translating rich information into meaningful knowledge. There is a pressing need to develop statistical and computational methods for massive cancer data to bridge the technology and information transfer gap and to accelerate innovations in cancer prevention and treatment. This Project aims to narrow this gap. Specifically, to advance genetic and genomic cancer epidemiology, we will develop statistical and computational methods for (a) analysis of whole genome sequencing association studies; (b) integrative analysis of genetic, genomic, and environmental data; (c) study of gene-environment interactions; and (d) risk prediction using whole genome genetic and genomic data and environmental data.
To advance cancer genomic medicine, we will develop statistical and computational methods for integrative analysis of genetic, genomic, and clinical data to understand cancer prognosis and advance precision medicine using (a) data from genetic epidemiological cohort studies and (b) data from genetic epidemiological cohort studies combined with administrative databases such as electronic medical records and Medicare claim data. We have assembled a strong, collaborative, interdisciplinary team of biostatisticians, computational biologists, health informaticians, genetic epidemiologists, and clinical scientists. We will apply the proposed methods to lung, breast, and nasopharyngeal cancer genetic epidemiological and clinical studies. We will develop open-access, user-friendly software to be distributed to the research community, and open online educational modules for training cancer researchers in using the methods developed in this Project.

Public Health Relevance

Analytic methods, such as statistical and computational methods, that can handle the complexities associated with big cancer data play a pivotal role in capitalizing more fully on such data. They will enable cancer researchers to extract knowledge from massive, complex, and diverse data in a timely and effective manner, gain insights into cancer etiology, risk, and prognosis, and develop new strategies to reduce cancer burden and improve patient care.

Agency
National Institutes of Health (NIH)
Institute
National Cancer Institute (NCI)
Type
Unknown (R35)
Project #
5R35CA197449-04
Application #
9532792
Study Section
Special Emphasis Panel (ZCA1)
Program Officer
Chen, Huann-Sheng
Project Start
2015-08-05
Project End
2022-07-31
Budget Start
2018-08-01
Budget End
2019-07-31
Support Year
4
Fiscal Year
2018
Total Cost
Indirect Cost
Name
Harvard University
Department
Biostatistics & Other Math Sci
Type
Schools of Public Health
DUNS #
149617367
City
Boston
State
MA
Country
United States
Zip Code
Xia, Yin; Cai, Tianxi; Cai, T Tony (2018) Multiple Testing of Submatrices of a Precision Matrix with Applications to Identification of Between Pathway Interactions. J Am Stat Assoc 113:328-339
Domenyuk, Valeriy; Gatalica, Zoran; Santhanam, Radhika et al. (2018) Poly-ligand profiling differentiates trastuzumab-treated breast cancer patients according to their outcomes. Nat Commun 9:1219
Barfield, Richard; Feng, Helian; Gusev, Alexander et al. (2018) Transcriptome-wide association studies accounting for colocalization using Egger regression. Genet Epidemiol 42:418-433
Liu, Zhonghua; Lin, Xihong (2018) Multiple phenotype association tests using summary statistics in genome-wide association studies. Biometrics 74:165-175
Lopes-Ramos, Camila M; Kuijjer, Marieke L; Ogino, Shuji et al. (2018) Gene Regulatory Network Analysis Identifies Sex-Linked Differences in Colon Cancer Drug Metabolism. Cancer Res 78:5538-5547
Sinnott, Jennifer A; Cai, Tianxi (2018) Pathway aggregation for survival prediction via multiple kernel learning. Stat Med 37:2501-2515
Sun, Ryan; Carroll, Raymond J; Christiani, David C et al. (2018) Testing for gene-environment interaction under exposure misspecification. Biometrics 74:653-662
Antonelli, Joseph; Cefalu, Matthew; Palmer, Nathan et al. (2018) Doubly robust matching estimators for high dimensional confounding adjustment. Biometrics :
Wei, Yongyue; Liang, Junya; Zhang, Ruyang et al. (2018) Epigenetic modifications in KDM lysine demethylases associate with survival of early-stage NSCLC. Clin Epigenetics 10:41
Shen, Sipeng; Zhang, Ruyang; Guo, Yichen et al. (2018) A multi-omic study reveals BTG2 as a reliable prognostic marker for early-stage non-small cell lung cancer. Mol Oncol 12:913-924

Showing the most recent 10 out of 127 publications