Today, there are more than 6,000 living languages in the world. Linguists widely agree that human languages share substantial similarities at all levels of linguistic structure. The study of these connections has enabled major discoveries about human communication: it has revealed the evolutionary history of languages, facilitated the reconstruction of proto-languages, and advanced the understanding of language universals. The goal of cross-lingual learning is to capitalize on the deep connections between human languages to improve automatic language processing. This exploratory research effort focuses on hierarchical Bayesian models that jointly induce linguistic structure for each language while identifying cross-lingual correspondence patterns. Cross-lingual learning is studied on several tasks, ranging from morphological to syntactic analysis.

The expected benefits of this approach are threefold. First, cross-lingual learning could yield substantial improvements over state-of-the-art unsupervised approaches across a range of tasks. Second, cross-lingual learning is applicable to hundreds of human languages that have no annotated resources and are currently out of reach for existing text processing methods. Finally, tools developed in the course of this project will provide powerful methods of comparative analysis for researchers in fields such as linguistics, history, and anthropology.

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Type: Standard Grant (Standard)
Application #: 0835445
Program Officer: Tatiana D. Korelsky
Budget Start: 2008-06-01
Budget End: 2009-11-30
Fiscal Year: 2008
Total Cost: $56,297
Name: Massachusetts Institute of Technology
City: Cambridge
State: MA
Country: United States
Zip Code: 02139