To achieve the goal of broadly impacting cancer diagnosis and prognosis, we propose feature allocation models for the inference of tumor heterogeneity (TH) using next-generation sequencing (NGS) data. Building upon the Indian buffet process (IBP) in nonparametric Bayesian statistics, we propose posterior inference on unobserved subclones in a tumor sample at the nucleotide level. The subclones are marked by distinctive DNA sequences and copy numbers, reflecting the variations that occur during clonal expansion and tumorigenesis. We will also develop efficient computational approaches for analyzing the extensive data generated by NGS experiments, paving the way for real-life applications of the proposed methods.
In Aims 1 and 2, we will focus on developing statistical models that account for noise in the NGS data and on setting up scalable computation. We will generalize the classical IBP model to accommodate both categorical and dependent random matrices, giving rise to the cIBP and dIBP models, respectively.
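As a point of reference for the classical IBP that Aims 1 and 2 generalize, the following is a minimal sketch of a draw from the standard IBP prior over binary matrices. It is not the proposed cIBP or dIBP model and includes no NGS likelihood. The function names `sample_poisson` and `sample_ibp`, the parameter `alpha`, and the reading of rows as loci and columns as subclones are assumptions made for illustration only.

```python
import math
import random


def sample_poisson(rate, rng):
    """Draw from Poisson(rate) using Knuth's multiplication method."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_ibp(num_rows, alpha, seed=0):
    """Draw a binary matrix Z from the classical IBP prior.

    Assumed reading for illustration: rows are loci, columns are
    latent subclones, and Z[i][k] == 1 means locus i carries a
    variant in subclone k. The number of columns is random.
    """
    rng = random.Random(seed)
    rows = []           # ragged rows, padded to equal width at the end
    column_counts = []  # m_k: how many earlier rows selected column k
    for i in range(1, num_rows + 1):
        row = []
        # Revisit each existing column k with probability m_k / i.
        for k, m_k in enumerate(column_counts):
            take = 1 if rng.random() < m_k / i else 0
            row.append(take)
            column_counts[k] += take
        # Open Poisson(alpha / i) brand-new columns for this row.
        new_columns = sample_poisson(alpha / i, rng)
        row.extend([1] * new_columns)
        column_counts.extend([1] * new_columns)
        rows.append(row)
    width = len(column_counts)
    return [r + [0] * (width - len(r)) for r in rows]


if __name__ == "__main__":
    for row in sample_ibp(num_rows=8, alpha=2.0, seed=1):
        print("".join(str(v) for v in row))
```

Larger values of `alpha` tend to produce more columns. In this sketch every entry is binary; the categorical and dependent extensions named above would replace that binary matrix with richer structures.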
In Aim 3, we propose a TH-based clinical trial for personalized cancer treatment. A unique feature of the trial is its comparison of adaptive treatment strategies based on TH with a standard, fixed treatment strategy that ignores TH. In Aim 4, we intend to develop innovative and efficient Bayesian computational approaches, apply the proposed methods to in-house and publicly available genomics data, and disseminate all of the developed tools through our online portal at www.compgenome.org. The proposed research will advance statistical methodology and foster the development of new classes of Bayesian nonparametric models. These statistical advances will, in turn, help address important questions about tumor heterogeneity.

Public Health Relevance

Innovative statistical models for the inference of tumor heterogeneity are expected to significantly improve medical decision making, such as individualized treatment selection for cancer patients. The improved decisions will in turn accelerate the learning of the optimal treatment strategy, thereby improving the overall quality of patient care. The combination of statistical modeling and big-data implementation in a modern electronic health system will pioneer a new generation of medical practice and is expected to substantially improve the efficiency of disease prevention, diagnosis, and prognosis.

Agency: National Institutes of Health (NIH)
Institute: National Cancer Institute (NCI)
Type: Research Project (R01)
Project #: 5R01CA132897-07
Application #: 9064084
Study Section: Biostatistical Methods and Research Design Study Section (BMRD)
Program Officer: Li, Jerry
Project Start: 2007-09-01
Project End: 2020-04-30
Budget Start: 2016-05-01
Budget End: 2017-04-30
Support Year: 7
Fiscal Year: 2016
Total Cost:
Indirect Cost:

Name: Northshore University Healthsystem
Department:
Type:
DUNS #: 069490621
City: Evanston
State: IL
Country: United States
Zip Code: 60201

Wei, Lin; Jin, Zhilin; Yang, Shengjie et al. (2018) TCGA-assembler 2: software pipeline for retrieval and processing of TCGA/CPTAC data. Bioinformatics 34:1615-1617
Ni, Yang; Müller, Peter; Wei, Lin et al. (2018) Bayesian graphical models for computational network biology. BMC Bioinformatics 19:63
Xu, Yanxun; Müller, Peter; Tsimberidou, Apostolia M et al. (2018) A nonparametric Bayesian basket trial design. Biom J :
Müller, Peter; Xu, Yanxun; Thall, Peter F (2017) Clinical Trial Design as a Decision Problem. Appl Stoch Models Bus Ind 33:296-301
Narayanan, Jaishree; Dobrin, Sofia; Choi, Janet et al. (2017) Structured clinical documentation in the electronic medical record to improve quality and to support practice-based research in epilepsy. Epilepsia 58:68-76
Morita, Satoshi; Müller, Peter (2017) Bayesian population finding with biomarkers in a randomized clinical trial. Biometrics 73:1355-1365
Zuanetti, Daiane Aparecida; Müller, Peter; Zhu, Yitan et al. (2017) Clustering distributions with the marginalized nested Dirichlet process. Biometrics :
Shpak, Max; Ni, Yang; Lu, Jie et al. (2017) Variance in estimated pairwise genetic distance under high versus low coverage sequencing: The contribution of linkage disequilibrium. Theor Popul Biol 117:51-63
Manching, Heather; Sengupta, Subhajit; Hopper, Keith R et al. (2017) Phased Genotyping-by-Sequencing Enhances Analysis of Genetic Diversity and Reveals Divergent Copy Number Variants in Maize. G3 (Bethesda) 7:2161-2170
Sengupta, Subhajit; Gulukota, Kamalakar; Zhu, Yitan et al. (2016) Ultra-fast local-haplotype variant calling using paired-end DNA-sequencing data reveals somatic mosaicism in tumor and normal blood samples. Nucleic Acids Res 44:e25

Showing the most recent 10 out of 58 publications