This proposal will develop a series of statistical predictive models to test the hypothesis that differences in gene expression patterns determine the differing biologic behaviors of colon cancers that are curable with primary surgical therapy and those that ultimately metastasize to the liver and kill. To generate data to test this hypothesis, Dr. Sanford Markowitz and his colleagues in the cancer genetics program at the Case Western Reserve University NCI-designated Comprehensive Cancer Center have established a series of data archives. The primary archive contains metastatic and non-metastatic colon cancers, anatomical staging information, clinical follow-up information, and gene expression data collected on Affymetrix human 40K GeneChips using Eos Biotechnology Inc. expression algorithms. Further, an independent validation archive of 350 colon cancers contains anatomical staging information and clinical follow-up data. In addition, promising candidate genes can be assayed for expression in colon cancers from this archive. The goal is to develop predictive models that will identify the genes making up a so-called "metastatic signature" for colon cancer. To accomplish this, Dr. J. Sunil Rao of the Department of Epidemiology and Biostatistics at CWRU has established a mentoring relationship with Dr. Markowitz in which Dr. Rao will use recent predictive data mining tools, with the modifications necessary for modeling these data. Specifically, Dr. Rao will develop methods that: 1) incorporate measurement error and differential variability for gene expressions; 2) aggregate tree-based classifiers from resampled data (bagging) for accurate predictions; 3) refine bagging using a within-cluster type of bagging to further increase prediction accuracy; 4) cluster observations (and genes) by estimating a finite mixture of Gamma distributions using approximate Bayesian computing and Dirichlet process priors; 5) formally deal with the tuning parameters implicit in the tree-building process; and lastly, 6) collate results into a high-dimensional visual graphic known as the CAT scan for extracting the nature of the hypothesized metastatic signature. A full theoretical and empirical evaluation of all algorithms will be made using simulations and the two colon cancer archives.
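To make item 2 concrete, the following is a minimal sketch of bagging, not the proposal's actual method. It assumes binary labels (0 = non-metastatic, 1 = metastatic), substitutes one-split trees (stumps) for full classification trees to stay short, and runs on synthetic two-feature data rather than GeneChip measurements. All function names (`fit_stump`, `bagged_stumps`, `predict_bagged`) are hypothetical and introduced only for illustration.

```python
import random
from collections import Counter

def fit_stump(X, y):
    """Fit a one-split tree: choose the feature, threshold, and leaf
    labels that give the fewest training errors."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for left, right in ((0, 1), (1, 0)):
                errors = sum((left if row[j] <= t else right) != label
                             for row, label in zip(X, y))
                if best is None or errors < best[0]:
                    best = (errors, j, t, left, right)
    return best[1:]  # (feature index, threshold, left label, right label)

def predict_stump(stump, row):
    j, t, left, right = stump
    return left if row[j] <= t else right

def bagged_stumps(X, y, n_models=25, seed=0):
    """Fit one stump per bootstrap resample of the training data."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample n rows with replacement
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return models

def predict_bagged(models, row):
    """Aggregate the resampled classifiers by majority vote."""
    votes = Counter(predict_stump(m, row) for m in models)
    return votes.most_common(1)[0][0]

# Synthetic data (an assumption for illustration): feature 0 is informative,
# feature 1 is pure noise.
data_rng = random.Random(1)
X, y = [], []
for i in range(60):
    label = i % 2
    X.append([data_rng.gauss(2.0 * label, 0.5), data_rng.gauss(0.0, 1.0)])
    y.append(label)

models = bagged_stumps(X, y)
accuracy = sum(predict_bagged(models, r) == l for r, l in zip(X, y)) / len(X)
print(f"training accuracy of the bagged ensemble: {accuracy:.2f}")
```

An odd number of resampled models avoids tied votes in the two-class case. The within-cluster refinement in item 3 would presumably change how the resamples are drawn, which this sketch does not attempt.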

National Institutes of Health (NIH)
National Cancer Institute (NCI)
Mentored Quantitative Research Career Development Award (K25)
Study Section
Subcommittee G - Education (NCI)
Program Officer
Eckstein, David J
Case Western Reserve University
Public Health & Prev Medicine
Schools of Medicine
United States
Dazard, Jean-Eudes; Rao, J Sunil; Markowitz, Sanford (2012) Local sparse bump hunting reveals molecular heterogeneity of colon tumors. Stat Med 31:1203-20
Dazard, Jean-Eudes; Rao, J Sunil (2010) Local Sparse Bump Hunting. J Comput Graph Stat 19:900-929
Ishwaran, Hemant; Rao, J Sunil; Kogalur, Udaya B (2006) BAMarray™: Java software for Bayesian analysis of variance for microarray data. BMC Bioinformatics 7:59
Rao, J Sunil; Li, Jingjin (2003) Statistical methods for chip calibration and saturation effects in antibody-spiked gene expression data. Respir Physiol Neurobiol 135:109-19