Sequential and graph-structured data arise naturally in a wide variety of scientific, engineering, and intelligence problems, such as handwriting and speech recognition, text mining, gene finding, and network analysis. Although researchers have recently made significant progress on machine learning methods for structured data, these methods remain far less accessible to scientists, engineers, and analysts than the better-understood statistical learning techniques of classification and regression.

This project is researching methods to advance the state of the art in machine learning for structured data, building on recent work in conditional random fields and weighted transducers. The project is also developing a software toolkit that makes these advances accessible to researchers across a wide range of disciplines and application domains. The toolkit will enable users to define, train, and apply models for structured data without requiring advanced expertise in machine learning. Its functionality will include methods for specifying application-relevant features, automatically selecting the most relevant of those features, tuning parameters to optimize suitable training objectives, and combining models that address different facets of an application.

The software will be freely distributed, carefully documented, and tested with selected users in several application domains. The project will thus provide the scientific and engineering community with the first generally usable tool for learning from structured data, playing a role parallel to that of the standard classification and regression tools already in wide use.

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Type: Standard Grant (Standard)
Application #: 0427206
Program Officer: Tatiana D. Korelsky
Budget Start: 2004-09-01
Budget End: 2008-08-31
Fiscal Year: 2004
Total Cost: $333,134

Name: Carnegie-Mellon University
City: Pittsburgh
State: PA
Country: United States
Zip Code: 15213