Solid tissue samples frequently consist of two distinct compartments: an epithelium-derived tumor and its surrounding stroma. Standard analyses of tissue samples that contain both tumor cells and stromal cells may under-detect gene expression signatures associated with cancer prognosis or response to treatment. Modeling the separate tissue compartments is necessary for a better understanding of the biological mechanisms underlying cancer. However, compartmental modeling is methodologically difficult, and adequate statistical methods for this purpose have not yet been developed. Current methods for the in silico separation of expression levels from different compartments of a tissue sample have limited utility because they require prior knowledge of either the mixing proportions of each patient sample or the actual expression levels of a few genes (i.e., reference genes) in every tissue compartment. This limitation significantly restricts our ability to identify molecular subtypes, in both tumor and stroma, that are predictive of personalized therapeutic targets. This proposal aims to develop novel methods and analytic tools that address these challenges for the in silico dissection of tumor samples, and to demonstrate the utility of these tools by investigating the effects of individual tumor sample components, and their interactions with drug treatments, in lung cancer.
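To make concrete why existing methods depend on known mixing proportions, consider the standard two-compartment linear mixing assumption (an illustrative assumption here, not necessarily the model this project will use): on the linear, non-log scale, the observed expression of gene g in sample i is roughly pi_i * T_g + (1 - pi_i) * S_g, where pi_i is the tumor proportion of sample i and T_g, S_g are the tumor and stroma expression levels. When every pi_i is known, T_g and S_g can be recovered by ordinary least squares across samples. The sketch below uses NumPy; the function name `deconvolve_known_proportions` is hypothetical, and this is not the Bayesian hierarchical model proposed in Aim 1.

```python
import numpy as np


def deconvolve_known_proportions(Y, pi):
    """Estimate per-gene tumor and stroma expression from mixed samples.

    Hypothetical helper illustrating the two-compartment linear mixing model.
    Y  : array (n_samples, n_genes) of observed mixed expression (linear scale)
    pi : array (n_samples,) of KNOWN tumor proportions in [0, 1]
    Returns (tumor, stroma), each of shape (n_genes,).
    Needs at least two samples with different proportions to be identifiable.
    """
    Y = np.asarray(Y, dtype=float)
    pi = np.asarray(pi, dtype=float)
    if Y.shape[0] != pi.shape[0]:
        raise ValueError("Y and pi must have the same number of samples")
    # Design matrix: row i is [pi_i, 1 - pi_i].
    X = np.column_stack([pi, 1.0 - pi])
    # One least-squares solve fits every gene (each column of Y) at once.
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return coef[0], coef[1]


if __name__ == "__main__":
    # Simulated example: 4 samples, 3 genes, small additive noise.
    rng = np.random.default_rng(0)
    pi = np.array([0.2, 0.5, 0.8, 0.9])
    tumor_true = np.array([10.0, 2.0, 5.0])
    stroma_true = np.array([1.0, 8.0, 5.0])
    Y = np.outer(pi, tumor_true) + np.outer(1 - pi, stroma_true)
    Y = Y + rng.normal(scale=0.1, size=Y.shape)
    tumor_hat, stroma_hat = deconvolve_known_proportions(Y, pi)
    print(np.round(tumor_hat, 2), np.round(stroma_hat, 2))
```

The sketch exposes the dependency described above: without pi the design matrix X is unknown, so the problem is no longer a simple regression. That is the gap a model requiring less prior knowledge is meant to close.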
Our Aim 1 will provide a Bayesian hierarchical model and related software tools that can computationally "dissect" signals within patient samples. By drawing on all existing data and multiple data types, the model will reduce the need for prior knowledge that would otherwise be difficult to obtain. This will enable researchers to investigate the expression profiles of tumor tissue and surrounding stromal tissue separately, for a much larger set of samples than was previously feasible. It will also provide new ways to increase the accuracy of genomic analysis of any mixed samples.
Our Aim 2 will re-analyze, by deconvolution, what is to our knowledge the largest set of genomic data for the molecular profiling of lung tumors, all of which were collected at MD Anderson Cancer Center. Lung cancer is the leading cause of cancer death worldwide, and a thorough understanding of tumor biology is critical to the design of effective treatment modalities. Our analyses will include genomic data from more than 500 patients, generated by two innovative biomarker-based clinical trials: the Biomarker-integrated Approaches of Targeted Therapy for Lung Cancer Elimination (BATTLE) trials and the Profiling of Resistance Patterns & Oncogenic Signaling Pathways in Evaluation of Cancers of the Thorax and Therapeutic Target Identification (PROSPECT) trials. We focus on one prototype example, lung cancer, because of the public health impact of the disease and the likely role of tumor-stroma interaction in determining clinical outcomes. Our proof-of-principle investigation of these lung cancer data would be the first of its kind, and has the potential to identify new biomarkers that predict the effects of drug treatments on the survival time of individuals with lung cancer.

Public Health Relevance

The proposed research is relevant to public health because it will provide tools for the in silico dissection of biological signals from multiple types of genomic data generated from tumor samples, which are often heterogeneous in cell type composition. These methods will offer a cost-effective alternative to time-consuming micro-dissection experiments. They will also open new opportunities to understand the roles of the stromal cells that surround lung tumors and to apply that understanding to the development of personalized therapeutics for lung cancer.

National Institutes of Health (NIH)
National Cancer Institute (NCI)
Research Project (R01)
Study Section
Biostatistical Methods and Research Design Study Section (BMRD)
Program Officer
Chen, Huann-Sheng
University of Texas MD Anderson Cancer Center
Biostatistics & Other Math Sci
Other Domestic Higher Education
United States
Fan, Yu; Xi, Liu; Hughes, Daniel S T et al. (2016) MuSE: accounting for tumor heterogeneity using a sample-specific error model improves sensitivity and specificity in mutation calling from sequencing data. Genome Biol 17:178
Palculict, Timothy Blake; Ruteshouser, E Cristy; Fan, Yu et al. (2016) Identification of germline DICER1 mutations and loss of heterozygosity in familial Wilms tumour. J Med Genet 53:385-8
Nikooienejad, Amir; Wang, Wenyi; Johnson, Valen E (2016) Bayesian variable selection for binary outcomes in high-dimensional genomic studies using non-local priors. Bioinformatics 32:1338-45
Lefterova, Martina I; Shen, Peidong; Odegaard, Justin I et al. (2016) Next-Generation Molecular Testing of Newborn Dried Blood Spots for Cystic Fibrosis. J Mol Diagn 18:267-82
Ewing, Adam D; Houlahan, Kathleen E; Hu, Yin et al. (2015) Combining tumor genome simulation with crowdsourcing to benchmark somatic single-nucleotide-variant detection. Nat Methods 12:623-30
Cancer Genome Atlas Research Network (2015) The Molecular Taxonomy of Primary Prostate Cancer. Cell 163:1011-25
Fang, Li Tai; Afshar, Pegah Tootoonchi; Chhibber, Aparna et al. (2015) An ensemble approach to accurately detect somatic mutations using SomaticSeq. Genome Biol 16:197
Davis, Caleb F; Ricketts, Christopher J; Wang, Min et al. (2014) The somatic genomic landscape of chromophobe renal cell carcinoma. Cancer Cell 26:319-30
Peng, Gang; Fan, Yu; Wang, Wenyi (2014) FamSeq: a variant calling program for family-based sequencing data using graphics processing units. PLoS Comput Biol 10:e1003880