Today there are more than 6,000 living languages in the world. Linguists widely agree that human languages share substantial similarity at all levels of linguistic structure. The study of these connections has enabled major discoveries about human communication: it has revealed the evolutionary history of languages, facilitated the reconstruction of proto-languages, and deepened our understanding of language universals. The goal of cross-lingual learning is to capitalize on this deep connection between human languages to improve automatic language processing. This exploratory research effort focuses on hierarchical Bayesian models that jointly induce linguistic structure for each language while identifying cross-lingual correspondence patterns. Cross-lingual learning is studied across several tasks, ranging from morphological to syntactic analysis.
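To make the idea of coupling languages through a shared prior concrete, the sketch below is a minimal illustration, not the project's actual model: each language draws its distribution over a small set of structural categories (here, a hypothetical part-of-speech inventory) from a common Dirichlet prior, so a resource-poor language borrows statistical strength from a resource-rich one. The language names, counts, and concentration parameter are all illustrative assumptions.

```python
import numpy as np

# Hypothetical tag inventory and observed tag counts per language
# (one resource-rich language, one resource-poor language).
TAGS = ["NOUN", "VERB", "ADJ", "OTHER"]
counts = {
    "lang_rich": np.array([5000.0, 3000.0, 1000.0, 1000.0]),
    "lang_poor": np.array([12.0, 2.0, 0.0, 1.0]),
}

# Cross-lingual prior: the mean of the shared Dirichlet.  A full hierarchical
# model would infer this; here we simply estimate it from the pooled counts.
pooled = sum(counts.values())
prior_mean = pooled / pooled.sum()
strength = 50.0                      # concentration: how tightly languages are coupled
alpha = strength * prior_mean        # shared Dirichlet pseudo-counts

for lang, c in counts.items():
    mle = c / c.sum()                                   # per-language maximum likelihood
    posterior = (c + alpha) / (c.sum() + alpha.sum())   # conjugate Dirichlet-multinomial update
    print(f"{lang:10s}  MLE={np.round(mle, 3)}  posterior={np.round(posterior, 3)}")
```

Running the sketch shows the effect of the shared prior: the rich language's posterior barely moves from its maximum-likelihood estimate, while the poor language's estimate is shrunk toward the cross-lingual mean, which is the kind of statistical sharing the proposed models exploit.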
The expected benefits of this approach are threefold. First, cross-lingual learning could yield substantial improvements over state-of-the-art unsupervised approaches across a range of tasks. Second, cross-lingual learning is applicable to the hundreds of human languages that lack annotated resources and are currently out of reach of existing text-processing methods. Finally, tools developed in the course of this project will provide powerful comparative analysis methods for researchers in fields such as linguistics, history, and anthropology.