This project investigates the basic question of how unlabeled data can be used most effectively together with labeled data in machine learning. The goals of this work are threefold. First, the research aims to achieve a fundamental understanding of this problem, including new methods for reasoning about the kind of information unlabeled data can provide. Second, the research explores new algorithms that combine large amounts of unlabeled data with small amounts of labeled data and background knowledge, in order to achieve performance that greatly exceeds what is currently attainable using only labeled data and more traditional methods. The approaches used by the investigators include graph algorithms, random fields, Monte Carlo sampling, and spectral methods: closely connected areas of computer science that have found application in computer vision but have yet to be fully exploited in machine learning. Finally, targeted applications, including text analysis, image classification, and intrusion detection for computer security, will be investigated to validate the theoretical principles that are developed, to explore algorithms, and to suggest new directions for investigation.

The broader impact of this research will be to help enable new technologies to use the volumes of data that are being collected in so many new domains and on such a great scale. Advances in our understanding of the possibilities for, and fundamental limits to, combining labeled and unlabeled data have the potential to impact many scientific fields, allowing researchers to more easily use the vast quantities of data that are available but not necessarily annotated for their own specific needs. This understanding may also ultimately influence the future data collection initiatives that our society chooses to invest in.

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Type: Standard Grant (Standard)
Application #: 0312814
Program Officer: Edwina L. Rissland
Budget Start: 2003-09-01
Budget End: 2006-08-31
Fiscal Year: 2003
Total Cost: $395,000
Name: Carnegie-Mellon University
City: Pittsburgh
State: PA
Country: United States
Zip Code: 15213