Natural language processing systems currently degrade when used outside of their training domains and languages. However, when text is analyzed along with translations into another language, the two languages provide powerful constraints on each other. For example, a syntactic construction which is ambiguous in one language may be unambiguous in another. We exploit such constraints by using multilingual models that capture the ways in which linguistic structures correspond between one language and another. These models are then used to accurately analyze both sides of parallel texts, which can in turn be used to train new, better models for each language alone. Multilingual models are challenging because each language alone is complex, and the correspondences between languages can include deep syntactic and semantic restructurings. Focusing on syntactic parsing, we address these complexities with a hierarchy of increasingly complex models, each constraining the next. Our approach to multilingual analysis improves three technologies: resource projection, wherein tools for resource-rich languages are transferred to resource-poor ones; domain adaptation, wherein tools are transferred from one domain to another; and multilingual alignment, wherein correspondences between languages are extracted for use in machine translation pipelines. In addition to publishing the research results from this work, we also make freely available the multilingual modeling tools we develop.
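To make the resource-projection idea concrete, the following is a minimal sketch (not the project's actual system) of the simplest form of annotation projection: part-of-speech tags predicted on the resource-rich side of a parallel sentence pair are copied onto the aligned words of the resource-poor side. The tag set, sentence, and alignment below are invented for illustration.

```python
def project_tags(src_tags, alignments, tgt_len):
    """Project per-token tags from source to target via word alignments.

    src_tags:   list of tags for the source sentence, one per token
    alignments: iterable of (src_index, tgt_index) word-alignment pairs
    tgt_len:    number of tokens in the target sentence
    Unaligned target tokens receive None.
    """
    tgt_tags = [None] * tgt_len
    for s, t in alignments:
        tgt_tags[t] = src_tags[s]
    return tgt_tags

# Source side tagged by an existing tagger; target side untagged.
src_tags = ["DET", "NOUN", "VERB", "DET", "NOUN"]
alignments = [(0, 0), (1, 1), (2, 2), (4, 3)]  # hand-made alignment
print(project_tags(src_tags, alignments, 4))
# -> ['DET', 'NOUN', 'VERB', 'NOUN']
```

The projected tags can then serve as (noisy) supervision for training a tagger or parser in the target language; in practice, multilingual models like those described above are needed to handle the syntactic restructurings that a simple one-to-one copy cannot capture.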

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Type: Standard Grant (Standard)
Application #: 0915265
Program Officer: Tatiana D. Korelsky
Project Start:
Project End:
Budget Start: 2009-09-15
Budget End: 2010-08-31
Support Year:
Fiscal Year: 2009
Total Cost: $100,000
Indirect Cost:
Name: University of California Berkeley
Department:
Type:
DUNS #:
City: Berkeley
State: CA
Country: United States
Zip Code: 94704