Many recent advances in natural language processing (e.g., in speech recognition and information extraction) are due to the widespread use of finite-state automata. These automata probabilistically transform input strings into output strings, and they can be quickly assembled to tackle new tasks via generic mathematical operations such as composition and forward application. However, they are poorly suited to many important problems that require syntax-sensitive transformations and large-scale reordering, such as language translation and summarization.

We are investigating tree automata as an alternative building block for new natural language systems. These automata walk over input trees and produce output trees. Fortunately, an extensive mathematical theory is associated with these devices, and they also fit many of the ad hoc models recently proposed in natural language research. However, several critical pieces are missing: (1) how to design efficient algorithms for generic tree operations, (2) how to design efficient machine learning algorithms for inducing tree automata and their probabilities from linguistic data, and (3) how to use these automata to accurately model problems in automatic language translation.
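To make the idea of an automaton that walks over an input tree and produces an output tree concrete, here is a minimal sketch in Python. It is an illustration only, not the project's formalism: it assumes a deterministic, single-state, unweighted top-down transducer, whereas the automata discussed here are probabilistic. The names `REORDER_RULES`, `LEXICON`, `transduce`, and `yield_of`, as well as the toy rules and word pairs, are hypothetical and chosen purely for the example.

```python
from typing import Dict, List, Tuple, Union

# A tree is either a leaf (a word) or a (label, children) pair.
Tree = Union[str, Tuple[str, List["Tree"]]]

# Hypothetical toy rules: for a node label, the order in which its
# children should appear in the output tree.
REORDER_RULES: Dict[str, List[int]] = {
    "VP": [1, 0],  # verb-object becomes object-verb
}

# Hypothetical toy lexicon used to rewrite leaves.
LEXICON: Dict[str, str] = {
    "he": "kare",
    "reads": "yomu",
    "books": "hon",
}


def transduce(tree: Tree) -> Tree:
    """Walk the input tree top-down and build the output tree."""
    if isinstance(tree, str):
        # Leaf: translate the word if the lexicon knows it.
        return LEXICON.get(tree, tree)
    label, children = tree
    # Internal node: apply the reordering rule for this label, if any,
    # otherwise keep the children in their original order.
    order = REORDER_RULES.get(label, list(range(len(children))))
    return (label, [transduce(children[i]) for i in order])


def yield_of(tree: Tree) -> List[str]:
    """Read the leaves of a tree from left to right."""
    if isinstance(tree, str):
        return [tree]
    _, children = tree
    words: List[str] = []
    for child in children:
        words.extend(yield_of(child))
    return words


source: Tree = ("S", [("NP", ["he"]),
                      ("VP", [("V", ["reads"]), ("NP", ["books"])])])
target = transduce(source)
print(yield_of(source))
print(yield_of(target))
```

The reordering happens at the `VP` node rather than on the word string, which is the kind of syntax-sensitive transformation that a string-to-string device cannot express directly. A probabilistic version would attach a weight to each rule and allow several competing rules per label.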

We expect this research to yield several benefits. Many researchers who currently use finite-state tools will switch to more powerful tree-based automata and obtain more accurate language processing systems. Tree automata will also enable a better understanding of how to model language translation more deeply and accurately and, importantly, in a way that still allows syntactic and lexical translation knowledge to be acquired fully automatically from text corpora.

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Application #: 0428020
Program Officer: Tatiana D. Korelsky
Project Start:
Project End:
Budget Start: 2004-09-15
Budget End: 2009-08-31
Support Year:
Fiscal Year: 2004
Total Cost: $1,250,000
Indirect Cost:

Name: University of Southern California
Department:
Type:
DUNS #:
City: Los Angeles
State: CA
Country: United States
Zip Code: 90089