This project considers the problem of simultaneously solving multiple component-level natural language processing problems. Such component-level tasks are necessary as building blocks for large-scale applications (e.g., automatic document summarization and machine translation), but are typically solved independently. These independent solutions ignore the natural connections that relate the output of one problem to the outputs of the others. This research explores the exploitation of such output correspondences to aid machine learning algorithms, an approach termed "Cross-Task Learning." These output correspondences provide strong prior information about the relationships among the desired outputs of multiple problems. This prior knowledge can potentially serve to improve task-level performance, even when large amounts of training data are unavailable. The research exploits such prior knowledge using a k-best methodology to maximize the applicability of these techniques. It also develops new techniques for semi-supervised learning based on the idea of output correspondences, in order to capitalize on the vast amounts of unannotated data that are available. In addition, the proposed techniques are analyzed in the context of computational learning theory. The outcome will be a set of techniques for learning across multiple natural language processing tasks. This technology will be empirically evaluated on low-level tasks such as shallow parsing and named entity recognition, as well as the high-level tasks of discourse analysis and automatic document summarization.
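The abstract names the k-best methodology without detailing it, so the following is only a plausible minimal sketch of the general idea, not the project's actual algorithm: each component task produces a k-best list of scored hypotheses, and a cross-task agreement term rewards joint outputs whose structures correspond, letting the reranker overturn one task's 1-best choice when it conflicts with the other task's output. The two tasks (named entity recognition and noun-phrase chunking), the agreement function, its weight, and the toy hypotheses are all invented for illustration.

```python
"""Hedged sketch of cross-task k-best reranking (illustrative only)."""
from itertools import product

# Toy k-best lists: (hypothesis, model score). Each hypothesis is a set
# of token spans -- entity spans for NER, NP chunk spans for chunking.
ner_kbest = [
    ({(0, 2)}, -1.1),          # "Salt Lake" as an entity (1-best)
    ({(0, 3)}, -1.3),          # "Salt Lake City" as an entity
]
chunk_kbest = [
    ({(0, 3), (4, 6)}, -0.9),  # NP chunks over tokens 0-3 and 4-6
    ({(0, 2), (4, 6)}, -1.4),
]

def agreement(entities, chunks):
    """Count entity spans whose boundaries coincide with a chunk span.

    Output correspondence: a named entity typically aligns with a
    noun-phrase chunk, so matching boundaries are evidence that the
    two analyses are mutually consistent.
    """
    return sum(1 for e in entities if e in chunks)

def rerank(kbest_a, kbest_b, weight=0.5):
    """Pick the joint pair maximizing task scores plus weighted agreement."""
    def joint_score(pair):
        (hyp_a, s_a), (hyp_b, s_b) = pair
        return s_a + s_b + weight * agreement(hyp_a, hyp_b)
    return max(product(kbest_a, kbest_b), key=joint_score)

best_ner, best_chunks = rerank(ner_kbest, chunk_kbest)
print("chosen NER spans:  ", best_ner[0])     # {(0, 3)}: 2nd-best NER wins
print("chosen chunk spans:", best_chunks[0])  # {(0, 3), (4, 6)}
```

In this toy run the agreement term flips the NER decision: the second-best entity hypothesis is chosen because its boundary matches a high-scoring chunk, which is exactly the kind of cross-task correction the abstract motivates.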

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Application #: 0712764
Program Officer: Tatiana D. Korelsky
Project Start:
Project End:
Budget Start: 2007-08-15
Budget End: 2011-07-31
Support Year:
Fiscal Year: 2007
Total Cost: $377,067
Indirect Cost:
Name: University of Utah
Department:
Type:
DUNS #:
City: Salt Lake City
State: UT
Country: United States
Zip Code: 84112