Solid tissue samples frequently consist of two distinct compartments: an epithelium-derived tumor and its surrounding stroma. Current analyses of tissue samples composed of both tumor cells and stromal cells may under-detect gene expression signatures associated with cancer prognosis or response to treatment. Modeling the separate tissue compartments is necessary for a better understanding of the biological mechanisms underlying cancer. However, compartmental modeling is methodologically difficult, and adequate statistical methods have not yet been developed for this purpose. Current methods for in silico separation of expression levels from different compartments of a tissue sample have limited utility because they require prior knowledge of either the mixing proportions of the patient samples or the actual expression levels of a few genes (i.e., reference genes) across all tissue compartments. This limitation significantly restricts our ability to identify molecular subtypes in both tumor and stroma that are predictive of personalized therapeutic targets. We propose to develop novel methods and analytic tools to address these challenges for the in silico dissection of tumor samples, and to demonstrate the utility of these tools by investigating the effects of individual tumor sample components and their interactions with drug treatments for lung cancer.
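To make the prior-knowledge requirement of existing approaches concrete, the following is a minimal sketch of two-compartment deconvolution when the mixing proportions are already known. It is not the Bayesian hierarchical model proposed here. It assumes, purely for illustration, that each sample's expression is a linear mixture of one tumor profile and one stroma profile on the raw (non-log) scale; the function name and the toy data are hypothetical.

```python
def deconvolve_known_proportions(mixed, tumor_fraction):
    """Per-gene least-squares estimate of tumor and stroma expression.

    Assumed model (illustrative only): for sample i and gene g,
        mixed[i][g] = p_i * tumor[g] + (1 - p_i) * stroma[g]
    where p_i = tumor_fraction[i] is taken as known in advance.
    """
    n_genes = len(mixed[0])
    # Normal-equation terms for the two design columns [p, 1 - p].
    a = sum(p * p for p in tumor_fraction)
    b = sum(p * (1 - p) for p in tumor_fraction)
    d = sum((1 - p) ** 2 for p in tumor_fraction)
    det = a * d - b * b
    if abs(det) < 1e-12:
        # Identical fractions in every sample make the two profiles inseparable.
        raise ValueError("tumor fractions must differ across samples")
    tumor, stroma = [], []
    for g in range(n_genes):
        u = sum(p * row[g] for p, row in zip(tumor_fraction, mixed))
        v = sum((1 - p) * row[g] for p, row in zip(tumor_fraction, mixed))
        # Closed-form solution of the 2x2 normal equations.
        tumor.append((d * u - b * v) / det)
        stroma.append((a * v - b * u) / det)
    return tumor, stroma


# Toy data: three genes, four samples with different (known) tumor fractions.
true_tumor = [10.0, 40.0, 5.0]
true_stroma = [30.0, 8.0, 5.0]
fractions = [0.2, 0.5, 0.8, 0.65]
mixed = [
    [p * t + (1 - p) * s for t, s in zip(true_tumor, true_stroma)]
    for p in fractions
]
est_tumor, est_stroma = deconvolve_known_proportions(mixed, fractions)
```

On noise-free toy data the estimates match the simulated profiles up to floating-point error. The sketch fails by design when every sample has the same tumor fraction, which illustrates why such methods depend on knowing, and having variation in, the mixing proportions.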
Our Aim 1 will provide a Bayesian hierarchical model and related software tools that will have the ability to computationally dissect signals within patient samples. This model will take advantage of all existing data and multiple data types, which consequently reduces the need for the prior knowledge that would otherwise be difficult to obtain. This will enable researchers to investigate the expression profiles of individual tumor tissue and surrounding stromal tissues for a much larger set of samples than was previously feasible. It will also provide new ways to increase the accuracy of the genomic analysis of any mixed samples.
Our Aim 2 will re-analyze, by deconvolution, what is to our knowledge the largest set of genomic data for the molecular profiling of lung tumors, all of which were collected at MD Anderson Cancer Center. Lung cancer is the leading cause of cancer death worldwide. A thorough understanding of tumor biology is critical to the design of effective treatment modalities. Our analyses will include genomic data from more than 500 patients, generated from two innovative biomarker-based clinical trials: the Biomarker-integrated Approaches of Targeted Therapy for Lung Cancer Elimination (BATTLE) trials, and the Profiling of Resistance Patterns & Oncogenic Signaling Pathways in Evaluation of Cancers of the Thorax and Therapeutic Target Identification (PROSPECT) trials. We focus on one prototype example, lung cancer, because of the public health impact of the disease and the likely role of the tumor-stroma interaction in determining clinical outcomes. Our proof-of-principle investigation of the lung cancer data would be the first of its kind, and has the potential to identify new biomarkers predictive of the effects of drug treatments on the survival time of individuals with lung cancer.

Public Health Relevance

The proposed research is relevant to public health because it will provide tools for the in silico dissection of biological signals from multiple types of genomic data generated from tumor samples, which are often heterogeneous in cell type composition. These methods will provide a cost-effective alternative to time-consuming micro-dissection experiments. They will also offer new opportunities to understand the roles of stromal cells that surround lung tumors and to apply that understanding to the development of personalized therapeutics in lung cancer.

National Institutes of Health (NIH)
National Cancer Institute (NCI)
Research Project (R01)
Study Section
Biostatistical Methods and Research Design Study Section (BMRD)
Program Officer
Chen, Huann-Sheng
University of Texas MD Anderson Cancer Center
Biostatistics & Other Math Sci
United States
Ellrott, Kyle; Bailey, Matthew H; Saksena, Gordon et al. (2018) Scalable Open Science Approach for Mutation Calling of Tumor Exomes Using Multiple Genomic Pipelines. Cell Syst 6:271-281.e7
Korkut, Anil; Zaidi, Sobia; Kanchi, Rupa S et al. (2018) A Pan-Cancer Analysis Reveals High-Frequency Genetic Alterations in Mediators of Signaling by the TGF-β Superfamily. Cell Syst 7:422-437.e7
Li, Jialu; Fu, Chunxiao; Speed, Terence P et al. (2018) Accurate RNA Sequencing From Formalin-Fixed Cancer Tissue To Represent High-Quality Transcriptome From Frozen Tissue. JCO Precis Oncol 2018:
Peng, Gang; Bojadzieva, Jasmina; Ballinger, Mandy L et al. (2017) Estimating TP53 Mutation Carrier Probability in Families with Li-Fraumeni Syndrome Using LFSPRO. Cancer Epidemiol Biomarkers Prev 26:837-844
Ahn, Jaeil; Morita, Satoshi; Wang, Wenyi et al. (2017) Bayesian analysis of longitudinal dyadic data with informative missing data using a dyadic shared-parameter model. Stat Methods Med Res :962280217715051
Holik, Aliaksei Z; Law, Charity W; Liu, Ruijie et al. (2017) RNA-seq mixology: designing realistic control experiments to compare protocols and analysis methods. Nucleic Acids Res 45:e30
Nikooienejad, Amir; Wang, Wenyi; Johnson, Valen E (2016) Bayesian variable selection for binary outcomes in high-dimensional genomic studies using non-local priors. Bioinformatics 32:1338-45
Palculict, Timothy Blake; Ruteshouser, E Cristy; Fan, Yu et al. (2016) Identification of germline DICER1 mutations and loss of heterozygosity in familial Wilms tumour. J Med Genet 53:385-8
Fan, Yu; Xi, Liu; Hughes, Daniel S T et al. (2016) MuSE: accounting for tumor heterogeneity using a sample-specific error model improves sensitivity and specificity in mutation calling from sequencing data. Genome Biol 17:178
Lefterova, Martina I; Shen, Peidong; Odegaard, Justin I et al. (2016) Next-Generation Molecular Testing of Newborn Dried Blood Spots for Cystic Fibrosis. J Mol Diagn 18:267-82