Classification methods applied to microarray data have largely been those developed by the machine learning community, because the large-p problem (far more covariates than samples) is inherent in high-throughput genomic experiments. The random forest (RF) methodology has been shown to be competitive with other machine learning approaches, such as neural networks and support vector machines. Besides its competitive accuracy, a clear advantage of RF over most machine learning approaches is that the algorithm itself provides variable importance measures, so one can assess the relative importance of each gene to the predictive model.
In many applications, the class to be predicted is inherently ordinal. Examples of ordinal responses include TNM stage (I, II, III, IV); drug toxicity (none, mild, moderate, severe); and response to treatment, classified as complete response, partial response, stable disease, or progressive disease. Such responses are ordinal: there is an inherent ordering among the categories, but no known numerical relationship between them. Standard nominal response methods can be applied to ordinal data, but doing so discards the ordering information inherent in the data. Because ordinal classification methods have been largely neglected in the machine learning literature, the specific aims of this proposal are to (1) extend the recursive partitioning and RF methodologies to predict an ordinal response, developing computational tools for the R programming environment; (2) evaluate the proposed ordinal classification methods against alternative methods using simulated, benchmark, and gene expression datasets; and (3) develop and evaluate methods for assessing variable importance when the goal is to predict an ordinal response.
This project proposes novel splitting criteria for growing classification trees, and methods for estimating variable importance, that appropriately account for the ordinal nature of the response. In addition, the generalized Gini index and ordered twoing methods will be studied within the ensemble learning framework, which has not been done previously. The project is significant because the ordinal classification methods it makes available will be broadly applicable to the many health, social, and behavioral research fields that commonly collect responses on an ordinal scale.
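The abstract names the generalized Gini index and ordered twoing without defining them. As a rough illustration only, the Python sketch below shows one common textbook form of each, as they appear in the CART literature: a generalized Gini impurity, the sum over class pairs of C(k, l) p(k|t) p(l|t), where the misclassification cost C(k, l) is assumed here to be |k - l| (linear) or (k - l)^2 (quadratic); and ordered twoing, which scores each split of the ordered classes into the super-classes {1, ..., k} and {k+1, ..., K}. All function names are hypothetical, and this is not the R software the project proposes to build.

```python
from collections import Counter


def class_proportions(labels, num_classes):
    """Return p(k|t) for ordinal classes k = 1, ..., num_classes in node t."""
    counts = Counter(labels)
    n = len(labels)
    return [counts.get(k, 0) / n for k in range(1, num_classes + 1)]


def misclassification_cost(k, l, cost):
    """Hypothetical cost C(k, l) of confusing ordinal classes k and l."""
    if cost == "linear":
        return abs(k - l)
    if cost == "quadratic":
        return (k - l) ** 2
    if cost == "nominal":  # ordinary Gini: every error costs the same
        return 0 if k == l else 1
    raise ValueError(f"unknown cost scheme: {cost!r}")


def generalized_gini(labels, num_classes, cost="linear"):
    """Generalized Gini impurity: sum over k, l of C(k, l) * p(k|t) * p(l|t)."""
    p = class_proportions(labels, num_classes)
    # Indices are 0-based here; only the difference k - l matters for the costs.
    return sum(
        misclassification_cost(k, l, cost) * p[k] * p[l]
        for k in range(num_classes)
        for l in range(num_classes)
    )


def impurity_decrease(left, right, num_classes, cost="linear"):
    """Goodness of a split: i(t) - p_L * i(t_L) - p_R * i(t_R)."""
    parent = list(left) + list(right)
    n = len(parent)
    return (
        generalized_gini(parent, num_classes, cost)
        - len(left) / n * generalized_gini(left, num_classes, cost)
        - len(right) / n * generalized_gini(right, num_classes, cost)
    )


def ordered_twoing(left, right, num_classes):
    """Ordered twoing: best twoing value over cuts {1..k} vs {k+1..K}.

    Returns (value, k) for the cut k maximizing
    (p_L * p_R / 4) * (sum over super-classes j of |p(j|t_L) - p(j|t_R)|) ** 2.
    """
    n = len(left) + len(right)
    p_left, p_right = len(left) / n, len(right) / n
    q_left = class_proportions(left, num_classes)
    q_right = class_proportions(right, num_classes)
    best = (float("-inf"), None)
    for k in range(1, num_classes):
        low_left, low_right = sum(q_left[:k]), sum(q_right[:k])
        # With two super-classes the absolute differences are equal, so the sum is twice one of them.
        spread = 2 * abs(low_left - low_right)
        value = p_left * p_right / 4 * spread ** 2
        if value > best[0]:
            best = (value, k)
    return best


# The nominal Gini index cannot tell these two nodes apart; ordinal costs can.
adjacent = [1, 1, 2, 2]  # classes next to each other on the ordinal scale
extreme = [1, 1, 4, 4]   # classes at opposite ends of the scale
for scheme in ("nominal", "linear", "quadratic"):
    print(scheme, generalized_gini(adjacent, 4, scheme), generalized_gini(extreme, 4, scheme))
# nominal 0.5 0.5
# linear 0.5 1.5
# quadratic 0.5 4.5
```

The example shows why order matters: a node mixing stages I and IV receives the same impurity as one mixing stages I and II under the ordinary Gini index, but a higher impurity once misclassification costs grow with the distance between categories. The choice between linear and quadratic costs is an assumption here; it controls how heavily distant confusions are penalized.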

Agency: National Institutes of Health (NIH)
Institute: National Library of Medicine (NLM)
Type: Small Research Grants (R03)
Project #: 5R03LM009347-02
Application #: 7670456
Study Section: Biomedical Library and Informatics Review Committee (BLR)
Program Officer: Ye, Jane
Project Start: 2008-08-15
Project End: 2011-07-31
Budget Start: 2009-08-01
Budget End: 2011-07-31
Support Year: 2
Fiscal Year: 2009
Total Cost: $74,750
Indirect Cost:
Organization Name: Virginia Commonwealth University
Department: Biostatistics & Other Math Sci
Organization Type: Schools of Medicine
DUNS #: 105300446
City: Richmond
State: VA
Country: United States
Zip Code: 23298