Modern machine learning is limited in its ability to use diverse information during training. This project is developing algorithms in the support vector machine (SVM) family that make effective use of extra information during training, with the understanding that this extra information will not be available during actual operation. Examples of extra information include structural homologies between proteins, in a system designed to predict structure from amino acid sequences, and the values of a financial time series between the time at which a prediction is made and the time of the value being predicted. Preliminary testing has shown that such extra information can dramatically reduce the prediction error of the learned system compared with current-generation machine learning methods, which cannot use it.

This project encompasses analytic research to establish performance bounds on our new algorithms and to explore the relationship of this work to human learning. The project also includes experimental work: construction of novel training and testing datasets; software implementation of the algorithms; and training, testing, and analysis of experimental results. Areas of application include handwritten character recognition; 3-D protein structure prediction; non-linear time series prediction, for example of financial time series; and prediction of the likelihood of hospital readmission for elderly patients. The project aims to give greater insight into the nature of learning, whether in humans or machines, and seeks to formally take into account data that is today seen as only peripheral to the learning task and that current machine learning algorithms cannot use.

The project will produce technical articles, a book, and teaching materials explaining this research. In addition, the project will produce shareable software that implements the best version of the algorithm devised during the life of the project.

Project Start:
Project End:
Budget Start: 2009-09-15
Budget End: 2011-08-31
Support Year:
Fiscal Year: 2009
Total Cost: $499,071
Indirect Cost:
Name: Columbia University
Department:
Type:
DUNS #:
City: New York
State: NY
Country: United States
Zip Code: 10027