The world's linguistic diversity is diminishing at an alarming rate, and there are not enough resources (trained field linguists, or funding for them) to document all the endangered languages before they are gone. There is therefore a critical need for software tools that make field linguists more efficient. This project will develop software tools to assist in the documentation of endangered languages by merging two types of resources: collections of linguistic examples curated by linguists, and a cross-linguistic computational grammar resource called the Grammar Matrix. The result will be a system for creating machine-readable, or implemented, grammars from data collected and annotated by field linguists.

Implemented grammars can contribute to endangered language documentation in several ways. The grammars themselves provide a very rich resource, allowing linguists to explore analyses at a level of precision not usually achieved in prose descriptions. Furthermore, implemented grammars can be used to create treebanks, that is, collections of utterances associated with syntactic and semantic structures. The process of creating the treebank can give the field linguist important feedback about aspects of the linguistic data not covered by current analyses. The resulting treebanks can be used to create further computational tools. They are also a rich source of comparable data for qualitative and quantitative work in linguistic typology, grounding higher-level linguistic abstractions in actual utterances in a computationally tractable fashion.

While building an implemented grammar is typically not within the scope of a field linguistics project, field linguists do routinely create collections of examples of glossed, translated text (interlinear glossed text, or "IGT"), which encapsulate the result of extensive linguistic analysis. This project will further develop computational methods for extracting typological information from IGT, building on those pioneered by the RiPLes project (Xia & Lewis 2007, Lewis & Xia 2008), and will combine that information with the cross-linguistic resource produced by the Grammar Matrix project (Bender et al. 2002, 2010) to create implemented grammars for endangered languages.
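
To make the notion of IGT concrete, the following is a minimal sketch of one glossed example as a data structure, with a naive step that lines up morphemes with their glosses. It is an illustration only, not the RiPLes or Grammar Matrix method: the `IGT` class and the `align` and `grams` functions are hypothetical names introduced here, the sketch assumes morphemes are separated by hyphens and that grammatical glosses are written in capitals, and the Japanese sentence is a stand-in example rather than project data.

```python
from dataclasses import dataclass


@dataclass
class IGT:
    """One interlinear glossed text item (hypothetical structure)."""
    text: str         # source-language line, morphemes separated by "-"
    gloss: str        # morpheme-by-morpheme gloss line
    translation: str  # free translation


def align(igt):
    """Pair each morpheme with its gloss; fail if the two lines do not line up."""
    words = igt.text.split()
    glosses = igt.gloss.split()
    if len(words) != len(glosses):
        raise ValueError("text and gloss lines have different word counts")
    pairs = []
    for word, gloss in zip(words, glosses):
        morphemes = word.split("-")
        labels = gloss.split("-")
        if len(morphemes) != len(labels):
            raise ValueError(f"cannot align {word!r} with {gloss!r}")
        pairs.extend(zip(morphemes, labels))
    return pairs


def grams(igt):
    """Collect glosses in capitals, assumed here to mark grammatical morphemes."""
    return sorted({label for _, label in align(igt) if label.isupper()})


# Stand-in example (Japanese), chosen only because its glossing is uncontroversial.
example = IGT(
    text="inu-ga neko-o mi-ta",
    gloss="dog-NOM cat-ACC see-PST",
    translation="The dog saw the cat.",
)

print(align(example))
print(grams(example))
```

Even this toy alignment hints at the kind of typological cue a glossed example carries: the presence of NOM and ACC glosses suggests overt case marking, and the position of the verb suggests a word order to test against more data.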

The Division of Information & Intelligent Systems of the Directorate for Computer & Information Science & Engineering is funding this award as part of its commitment to support the development of computational tools and methods for the documentation of endangered languages.

Agency: National Science Foundation (NSF)
Institute: Division of Behavioral and Cognitive Sciences (BCS)
Type: Standard Grant (Standard)
Application #: 1160274
Program Officer: Colleen M. Fitzgerald
Budget Start: 2012-09-15
Budget End: 2015-06-30
Fiscal Year: 2011
Total Cost: $228,071
Name: University of Washington
City: Seattle
State: WA
Country: United States
Zip Code: 98195