Recursive partitioning and ensemble methods for classifying an ordinal response

Archer, Kellie

Abstract

Classification methods applied to microarray data have largely been those developed by the machine learning community, because the large p (number of covariates) problem is inherent in high-throughput genomic experiments. The random forest (RF) methodology has been shown to be competitive with other machine learning approaches (e.g., neural networks and support vector machines). Apart from improved accuracy, a clear advantage of RF over most machine learning approaches is that the algorithm provides variable importance measures. One can therefore assess the relative importance of each gene in the predictive model. In many applications, the class to be predicted is inherently ordinal. Examples of ordinal responses include TNM stage (I, II, III, IV); drug toxicity (none, mild, moderate, severe); and response to treatment classified as complete response, partial response, stable disease, or progressive disease. In each case there is an inherent ordering among the categories, but no known numerical relationship between them. Although standard nominal response methods can be applied to ordinal response data, doing so discards the ordering information inherent in the data. Because ordinal classification methods have been largely neglected in the machine learning literature, the specific aims of this proposal are to (1) extend the recursive partitioning and RF methodologies to predict an ordinal response by developing computational tools for the R programming environment; (2) evaluate the proposed ordinal classification methods against alternative methods using simulated, benchmark, and gene expression datasets; and (3) develop and evaluate methods for assessing variable importance when the goal is to predict an ordinal response.
This proposal introduces novel splitting criteria for growing classification trees and methods for estimating variable importance, both of which appropriately account for the ordinal nature of the response. In addition, the generalized Gini index and ordered twoing methods will be studied under the ensemble learning framework, which has not previously been done. This project is significant to the scientific community because the ordinal classification methods it makes available will be broadly applicable to a variety of health, social, and behavioral research fields, which commonly collect responses on an ordinal scale.
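As a rough illustration of the two existing criteria named above, the sketch below assumes the textbook CART-style formulations rather than the novel criteria this project proposes. The generalized Gini impurity of a node t is taken as i(t) = sum over k, l of C(k, l) p(k|t) p(l|t), with misclassification cost C(k, l) = |k - l| (or (k - l)^2), so that confusing distant categories costs more than confusing adjacent ones. Ordered twoing is taken to score a split by its best two-class Gini decrease over the ordered superclass divisions {1, ..., h} versus {h + 1, ..., K}. All function names are hypothetical, and Python is used only for illustration; the project itself targets R.

```python
import numpy as np

def class_proportions(y, n_classes):
    """Proportions of ordinal classes 1..n_classes among the labels y."""
    counts = np.bincount(np.asarray(y) - 1, minlength=n_classes)
    return counts / counts.sum()

def generalized_gini(y, n_classes, cost="absolute"):
    """Generalized Gini impurity: sum_k sum_l C(k, l) p(k|t) p(l|t).

    Assumed costs: "absolute" uses C(k, l) = |k - l|,
    anything else uses the quadratic cost (k - l)**2.
    """
    p = class_proportions(y, n_classes)
    k = np.arange(1, n_classes + 1)
    diff = np.abs(k[:, None] - k[None, :])
    C = diff if cost == "absolute" else diff ** 2
    return float(p @ C @ p)

def impurity_decrease(y, left_mask, impurity):
    """Delta i = i(t) - p_L i(t_L) - p_R i(t_R) for a binary split."""
    y = np.asarray(y)
    left_mask = np.asarray(left_mask, dtype=bool)
    left, right = y[left_mask], y[~left_mask]
    if len(left) == 0 or len(right) == 0:
        return 0.0  # a split that sends every case one way is useless
    p_left = len(left) / len(y)
    return impurity(y) - p_left * impurity(left) - (1 - p_left) * impurity(right)

def two_class_gini(b):
    """Ordinary two-class Gini index 2 p (1 - p) for a boolean vector b."""
    p = np.mean(b)
    return 2 * p * (1 - p)

def ordered_twoing(y, left_mask, n_classes):
    """Best two-class Gini decrease over superclasses {1..h} vs {h+1..K}."""
    y = np.asarray(y)
    best = 0.0
    for h in range(1, n_classes):
        superclass = y <= h  # True for the lower superclass {1, ..., h}
        best = max(best, impurity_decrease(superclass, left_mask, two_class_gini))
    return best

# Toy data (invented for illustration): K = 4 ordered levels, one predictor x.
y = np.array([1, 1, 2, 2, 3, 3, 4, 4])
x = np.array([0.1, 0.3, 0.2, 0.5, 0.7, 0.6, 0.9, 0.8])
left = x <= 0.55  # candidate split: cases with x <= 0.55 go to the left child
gg = lambda labels: generalized_gini(labels, 4)

print(impurity_decrease(y, left, gg))  # 0.75
print(ordered_twoing(y, left, 4))      # 0.5
```

In a tree-growing loop, either score would be computed for every candidate cutpoint of every predictor and the largest value chosen. Unlike the ordinary Gini index, both scores change if the category labels are permuted, which is how they use the ordering.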

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Library of Medicine (NLM)
Type: Small Research Grants (R03)
Project #: 1R03LM009347-01A2
Application #: 7470967
Study Section: Biomedical Library and Informatics Review Committee (BLR)
Program Officer: Ye, Jane

Project Start: 2008-08-15
Project End: 2010-07-31
Budget Start: 2008-08-15
Budget End: 2009-07-31
Support Year: 1
Fiscal Year: 2008
Total Cost: $74,521
Indirect Cost

Institution

Name: Virginia Commonwealth University
Department: Biostatistics & Other Math Sci
Type: Schools of Medicine
DUNS #: 105300446

City: Richmond
State: VA
Country: United States
Zip Code: 23298

Related projects


NIH 2010 R03 LM	Recursive partitioning and ensemble methods for classifying an ordinal response Archer, Kellie J. / Virginia Commonwealth University	$5,742
NIH 2009 R03 LM	Recursive partitioning and ensemble methods for classifying an ordinal response Archer, Kellie J. / Virginia Commonwealth University	$74,750
NIH 2009 R03 LM	Recursive partitioning and ensemble methods for classifying an ordinal response Archer, Kellie J. / Virginia Commonwealth University	$75,000
NIH 2008 R03 LM	Recursive partitioning and ensemble methods for classifying an ordinal response Archer, Kellie J. / Virginia Commonwealth University	$74,521