Statistical Methods for Analysis of Massive Genetic and Genomic Data in Cancer Research

Lin, Xihong

Abstract

With advances in technology, the cancer research enterprise is rapidly becoming data-intensive and data-driven. One example is the explosion of biotechnologies that generate massive genetic and genomic data, such as whole genome sequencing data. Another example is health informatics, which makes large administrative health care databases, such as electronic medical records and Medicare claims data, rapidly available. Cancer data science has become increasingly important in cancer research. Indeed, massive data provide unprecedented opportunities for new discoveries in cancer. This Project aims to develop and apply statistical and computational methods for analyzing massive and complex genetic and genomic data, together with epidemiological and clinical data, in cancer population science and medical science. Our ultimate goal is to use these rich data sources to understand cancer etiology, risk, and prognosis, and to discover new, effective strategies for cancer prevention, intervention, and treatment. It has become increasingly evident that the scarcity of methods suitable for analyzing massive data is a bottleneck in effectively translating rich information into meaningful knowledge. There is a pressing need to develop statistical and computational methods for massive cancer data that bridge the technology and information transfer gap and accelerate innovations in cancer prevention and treatment. This Project aims to narrow this gap. Specifically, to advance genetic and genomic cancer epidemiology, we will develop statistical and computational methods for (a) analysis of whole genome sequencing association studies; (b) integrative analysis of genetic, genomic, and environmental data; (c) study of gene-environment interactions; and (d) risk prediction using whole genome genetic and genomic data together with environmental data.
To advance cancer genomic medicine, we will develop statistical and computational methods for integrative analysis of genetic, genomic, and clinical data to understand cancer prognosis and advance precision medicine, using (a) data from genetic epidemiological cohort studies and (b) data from genetic epidemiological cohort studies combined with administrative databases such as electronic medical records and Medicare claims data. We have assembled a strong, collaborative, interdisciplinary team of biostatisticians, computational biologists, health informaticians, genetic epidemiologists, and clinical scientists. We will apply the proposed methods to genetic epidemiological and clinical studies of lung, breast, and nasopharyngeal cancer. We will develop open-access, user-friendly software for distribution to the research community, as well as open online educational modules for training cancer researchers to use the methods developed in this Project.

Public Health Relevance

Analytic methods, such as statistical and computational methods, that can handle the complexities of big cancer data play a pivotal role in capitalizing more fully on such data. They will enable cancer researchers to extract knowledge from massive, complex, and diverse data in a timely and effective manner, gain insights into cancer etiology, risk, and prognosis, and develop new strategies to reduce the cancer burden and improve patient care.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Cancer Institute (NCI)
Type: Outstanding Investigator Award (R35)
Project #: 5R35CA197449-02
Application #: 9120850
Study Section: Special Emphasis Panel (ZCA1)
Program Officer: Chen, Huann-Sheng

Project Start: 2015-08-05
Project End: 2022-07-31
Budget Start: 2016-08-01
Budget End: 2017-07-31
Support Year: 2
Fiscal Year: 2016

Institution

Name: Harvard University
Department: Biostatistics & Other Math Sci
Type: Schools of Public Health
DUNS #: 149617367

City: Boston
State: MA
Country: United States

Related projects


NIH 2020 R35 CA	Statistical Methods for Analysis of Massive Genetic and Genomic Data in Cancer Research Lin, Xihong / Harvard University
NIH 2019 R35 CA	Statistical Methods for Analysis of Massive Genetic and Genomic Data in Cancer Research Lin, Xihong / Harvard University
NIH 2018 R35 CA	Statistical Methods for Analysis of Massive Genetic and Genomic Data in Cancer Research Lin, Xihong / Harvard University
NIH 2017 R35 CA	Statistical Methods for Analysis of Massive Genetic and Genomic Data in Cancer Research Lin, Xihong / Harvard University
NIH 2016 R35 CA	Statistical Methods for Analysis of Massive Genetic and Genomic Data in Cancer Research Lin, Xihong / Harvard University
NIH 2015 R35 CA	Statistical Methods for Analysis of Massive Genetic and Genomic Data in Cancer Research Lin, Xihong / Harvard University	$963,452

Publications

Xia, Yin; Cai, Tianxi; Cai, T Tony (2018) Multiple Testing of Submatrices of a Precision Matrix with Applications to Identification of Between Pathway Interactions. J Am Stat Assoc 113:328-339

Domenyuk, Valeriy; Gatalica, Zoran; Santhanam, Radhika et al. (2018) Poly-ligand profiling differentiates trastuzumab-treated breast cancer patients according to their outcomes. Nat Commun 9:1219

Barfield, Richard; Feng, Helian; Gusev, Alexander et al. (2018) Transcriptome-wide association studies accounting for colocalization using Egger regression. Genet Epidemiol 42:418-433

Liu, Zhonghua; Lin, Xihong (2018) Multiple phenotype association tests using summary statistics in genome-wide association studies. Biometrics 74:165-175

Lopes-Ramos, Camila M; Kuijjer, Marieke L; Ogino, Shuji et al. (2018) Gene Regulatory Network Analysis Identifies Sex-Linked Differences in Colon Cancer Drug Metabolism. Cancer Res 78:5538-5547

Sinnott, Jennifer A; Cai, Tianxi (2018) Pathway aggregation for survival prediction via multiple kernel learning. Stat Med 37:2501-2515

Sun, Ryan; Carroll, Raymond J; Christiani, David C et al. (2018) Testing for gene-environment interaction under exposure misspecification. Biometrics 74:653-662

Antonelli, Joseph; Cefalu, Matthew; Palmer, Nathan et al. (2018) Doubly robust matching estimators for high dimensional confounding adjustment. Biometrics :

Wei, Yongyue; Liang, Junya; Zhang, Ruyang et al. (2018) Epigenetic modifications in KDM lysine demethylases associate with survival of early-stage NSCLC. Clin Epigenetics 10:41

Shen, Sipeng; Zhang, Ruyang; Guo, Yichen et al. (2018) A multi-omic study reveals BTG2 as a reliable prognostic marker for early-stage non-small cell lung cancer. Mol Oncol 12:913-924

Showing the most recent 10 out of 127 publications
