Biotechnological innovations over the past two decades have generated much enthusiasm for using genomic, proteomic, and other high-dimensional data to improve the treatment of diseases by tailoring therapies on an individualized basis. The expectation has been that multi-dimensional 'omics data can complement the traditional data and methods used in the clinical diagnosis and treatment of cancer and other diseases. Computerized classification algorithms have been used to predict risks and responses of patients based on high-dimensional data on predictor variables such as expression levels of many hundreds or thousands of genes. However, these algorithms have yet to demonstrate sufficient predictive ability to be adopted in routine clinical practice. A basic premise of this proposal is that the limited success realized so far may be due largely to the common assumption of population homogeneity. That is, existing classification algorithms classify all patients according to the same set of criteria. The present proposal takes a different view and allows for population heterogeneity that may not be readily apparent, and thus is not controlled for. Population heterogeneity suggests that instead of there being simply two well-defined classes in the population (e.g., healthy and diseased), there can exist unidentified or hidden subpopulations that span both classes. The novel selective-voting algorithm that forms the basis of this proposal allows for the existence of subpopulations that need not be identified in advance. A new classification ensemble with selective voting is developed based on the principal investigator's previous work with two-dimensional convex hulls of positive and negative training samples. Members of the ensemble are allowed to vote on an unknown test sample only if that sample is located within or "behind" suitably reduced, or trimmed, convex hulls of training samples.
It is proposed that the subsets of predictor variables associated with the votes of the members of the new classification ensemble may be useful in helping to identify the hidden subpopulations of patients. The added value of the selective-voting algorithm in supporting clinical decision making will be validated in real clinical practice with high-dimensional proteomic data on cervical/endometrial cancer patients at the University of Arkansas for Medical Sciences who experience gastrointestinal mucositis following radiation therapy. Validation of the new algorithm's increased accuracy will be carried out using publicly available data with cancer as the outcome variable and expression levels of thousands of genes as the predictors. R-based software will be developed for implementing the algorithms and associated graphical methods, and will be offered free to users.
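The selective-voting idea described above can be illustrated with a minimal Python sketch. It is not the investigators' implementation, and several details are assumptions made only for illustration: each ensemble member is taken to be one pair of predictor variables, a hull is "trimmed" by peeling off its outermost vertices `peels` times, and a member votes only when the test sample falls inside exactly one class's trimmed hull. The "behind" rule mentioned in the abstract is not specified there and is omitted. All function names are hypothetical.

```python
from itertools import combinations


def cross(o, a, b):
    """Z-component of the cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def trimmed_hull(points, peels=1):
    """Assumed trimming rule: drop the current hull's vertices `peels` times."""
    pts = list(set(points))
    for _ in range(peels):
        hull = convex_hull(pts)
        inner = [p for p in pts if p not in hull]
        if len(inner) < 3:  # too few points left to peel further
            break
        pts = inner
    return convex_hull(pts)


def inside_hull(hull, p):
    """True if p lies inside or on a counter-clockwise convex polygon."""
    if len(hull) < 3:
        return False
    n = len(hull)
    return all(cross(hull[i], hull[(i + 1) % n], p) >= 0 for i in range(n))


def member_vote(pos_hull, neg_hull, p):
    """+1 or -1 if p is inside exactly one trimmed hull; 0 means abstain."""
    in_pos = inside_hull(pos_hull, p)
    in_neg = inside_hull(neg_hull, p)
    if in_pos and not in_neg:
        return 1
    if in_neg and not in_pos:
        return -1
    return 0


def selective_vote(X_pos, X_neg, x, peels=1):
    """Sum the votes of all feature-pair members; the sign gives the class."""
    tally = 0
    for i, j in combinations(range(len(x)), 2):
        pos_hull = trimmed_hull([(r[i], r[j]) for r in X_pos], peels)
        neg_hull = trimmed_hull([(r[i], r[j]) for r in X_neg], peels)
        tally += member_vote(pos_hull, neg_hull, (x[i], x[j]))
    return tally
```

A positive tally would be read as a prediction of the positive class, a negative tally as the negative class, and a tally of zero as no decision. In this sketch, the feature pairs whose members actually cast a vote for a given sample are the kind of predictor subsets that the abstract suggests might help identify hidden subpopulations.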

Public Health Relevance

Unlike other classification algorithms that operate on high-dimensional data to predict a patient's cancer risk, prognosis, and response to therapy, the novel selective-voting scheme that forms the basis of this proposal allows for population heterogeneity that may not be readily apparent, and thus is not controlled for. This represents a fundamental departure from the norm and is expected to lead to significant advancement in the assignment of therapies on an individualized basis. Not only will this increase the likelihood of successful treatment, but it can also contribute to a better understanding of the underlying cancer processes.

National Institutes of Health (NIH)
National Cancer Institute (NCI)
Research Project (R01)
Study Section
Biostatistical Methods and Research Design Study Section (BMRD)
Program Officer
Jessup, John M
University of Arkansas for Medical Sciences
Biostatistics & Other Math Sci
Schools of Medicine
Little Rock
United States
Hauer-Jensen, Martin (2014) Toward development of interleukin-11 as a medical countermeasure for use in radiological/nuclear emergencies. Dig Dis Sci 59:1349-51
Zhang, Chuanlei; Kodell, Ralph L (2013) Subpopulation-specific confidence designation for more informative biomedical classification. Artif Intell Med 58:155-63
Kodell, Ralph L; Zhang, Chuanlei; Siegel, Eric R et al. (2012) Selective voting in convex-hull ensembles improves classification accuracy. Artif Intell Med 54:171-9