This award is funded under the American Recovery and Reinvestment Act of 2009 (Public Law 111-5).

This project takes on two problems: (1) deciphering ancient texts using computers, and (2) training automated language translation systems without using parallel texts. Statistical language processing software has played little role to date in the analysis of ancient texts, where data is limited and human intuition has so far ruled. Data for automated language translation is more plentiful, and research has made great strides in the 21st century. However, researchers remain heavily dependent on training with large parallel texts, which are scarce for many of the languages and domains in which people need automated translation.

The project develops unsupervised methods that compensate for the lack of parallel data, using alternative sources of linguistic knowledge. For ancient languages, these sources include known languages as decipherment targets, capitalizing on tight connections within a language family. In translation, large quantities of untranslated data are exploited to induce strong bilingual connections. Formulating these tasks in a decipherment framework brings powerful cryptographic theory and algorithms to bear. Such theory also helps estimate expected translation accuracy given fixed data resources, and gauge whether a lost language is decipherable, given a fixed amount of script.

Computational analysis of ancient scripts offers a better understanding of ancient cultures, and unsupervised techniques uncover connections between languages that are of great interest to historical linguists. Applying such techniques to automated language translation offers the chance to make translation available to the population at large for many more language pairs and domains.

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Type: Standard Grant (Standard)
Application #: 0904684
Program Officer: Tatiana D. Korelsky
Budget Start: 2009-07-15
Budget End: 2013-06-30
Fiscal Year: 2009
Total Cost: $1,200,000

Name: University of Southern California
City: Los Angeles
State: CA
Country: United States
Zip Code: 90089