Three related languages in Bolivia, Peru and Chile show dramatically different levels of health: Aymara, spoken by about a third of the population of Bolivia and with sizable representation in the other countries, is in fairly good shape. Jaqaru is spoken by a few thousand people in and around Tupe, Yauyos, Lima, Peru and by large numbers in the cities of Lima, Huancayo, Chincha, and Cañete; its situation is precarious but not desperate. Kawki is spoken only by a very few people in and around Cachuy, Yauyos, Lima, Peru and is clearly a dying language. The only time we will be able to understand these last two languages, and their relationship with Aymara, is the present. With National Science Foundation support, Dr. M.J. Hardman and colleagues Howard Beck, Elizabeth Lowe McCoy, Sue Legg and Dimas Bautista Iturrizaga will conduct a three-year project to transform a corpus of Jaqaru and Kawki materials, consisting of 50 field notebooks of texts, corresponding audiotapes, 450 photographs and related linguistic data, into an accessible, archived linguistic research database. The linguistic materials will be digitized, analyzed, parsed, edited, translated and entered into a database. A dictionary of the languages will also be created and made available electronically.

This project will make the field notes collected by Hardman over 50 years of linguistic field research in Peru available to the linguistic community. It also builds on the work done from 2004 to 2007 for the "Aymara on the Internet" Project funded by the U.S. Department of Education. The project will preserve and make available the texts, dictionary and grammar of two highly endangered Andean languages for linguistic research and for the use and future collaboration of heritage. The linguistic material will be translated from Jaqaru and Kawki into Spanish and English. It is anticipated that the government of Peru will want to use this project for broader dissemination for bilingual education and language preservation purposes. The broad scope of the linguistic material makes it an attractive sample for such widespread distribution. Its further use and elaboration will serve as a model for other projects.

Project Report

An Accessible Linguistic Research Database of the Endangered Jaqaru and Kawki Languages

NSF Project Team: Drs. M.J. Hardman, Howard Beck, Sue Legg, and Elizabeth Lowe

This project to preserve the endangered Jaqi languages of Peru is based on fieldwork conducted by Dr. M.J. Hardman, in collaboration with Dr. Dimas Bautista Iturrizaga, and represents the life work of these investigators in documenting the languages. Interviews taped by Dr. Hardman from 1959 to 1977 with native speakers of the Jaqaru and Kawki languages, born in the late 19th century and early 20th century in the mountains of Peru, were transcribed into 50 field notebooks. The texts include autobiographical accounts, fables, poetry (songs), historical accounts, descriptions of daily life, and descriptions of festivals involving communal labor, among other topics.

The digitized data were entered in Lyra, an online object-oriented database. The structure of the data files enables users to study the texts at the audio, story, phrase, and word levels. The texts, translated into Spanish and English, are presented as phrases, divided and identified by morpheme, and each morpheme is then given its individual definition and grammatical function. All 161 texts for Jaqaru have been fully analyzed with complete identification of all morphemes. The Kawki texts have been uploaded as phrases, translated, and divided and identified by morpheme. Most of these 110 texts have also been analyzed, giving the individual definition and grammatical function for each morpheme.

The scanned notebooks, digitized audio and parsed data have been uploaded to the Jaqi Language Archive site at the University of Florida Digital Library. Work will continue on the remaining Kawki phrases, and the files will be updated with new material.
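
To make the layering described above concrete, here is a minimal sketch of how texts might be organized at the story, phrase, word, and morpheme levels. It is an illustration only: the actual Lyra schema is not described in this report, so every class name and field below is an assumption, and the forms such as `root1` and `-sfx1` are placeholders, not real Jaqaru or Kawki data.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Morpheme:
    form: str          # placeholder form, e.g. "root1" or "-sfx1"
    gloss_es: str      # Spanish gloss
    gloss_en: str      # English gloss
    morph_class: str   # "root" or "suffix"


@dataclass
class Word:
    form: str
    morphemes: List[Morpheme]


@dataclass
class Phrase:
    text: str
    translation_es: str
    translation_en: str
    words: List[Word]


@dataclass
class Story:
    title: str
    audio_file: str    # hypothetical link to the digitized recording
    phrases: List[Phrase]


def words_containing(stories: List[Story], morpheme_form: str) -> List[str]:
    """Return the sorted, de-duplicated word forms that contain a morpheme."""
    found = set()
    for story in stories:
        for phrase in story.phrases:
            for word in phrase.words:
                if any(m.form == morpheme_form for m in word.morphemes):
                    found.add(word.form)
    return sorted(found)


# Placeholder data (hypothetical, for illustration only).
ROOT1 = Morpheme("root1", "glosa1", "gloss1", "root")
ROOT2 = Morpheme("root2", "glosa2", "gloss2", "root")
SFX1 = Morpheme("-sfx1", "sufijo1", "suffix1", "suffix")

DEMO_STORIES = [
    Story(
        title="placeholder story",
        audio_file="placeholder.wav",
        phrases=[
            Phrase(
                text="root1-sfx1 root2-sfx1 root1",
                translation_es="(traducción de ejemplo)",
                translation_en="(example translation)",
                words=[
                    Word("root1-sfx1", [ROOT1, SFX1]),
                    Word("root2-sfx1", [ROOT2, SFX1]),
                    Word("root1", [ROOT1]),
                ],
            )
        ],
    )
]

print(words_containing(DEMO_STORIES, "-sfx1"))
```

Nesting the levels this way means a query can start at any level, for example walking from a story down to its morphemes, or collecting every word in which a given morpheme occurs.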
Currently, the data includes the following numbers of morphemes and phrases:

Aymara base morphemes: 1031
Jaqaru base morphemes: 2404
Kawki base morphemes: 1500
Aymara phrases: 9229
Jaqaru phrases: 7942
Kawki phrases: 6835

In addition to the AILLA archive at the University of Texas (www.ailla.utexas.org/site/welcome.html), the data files are available at the following URLs for the digital archives at the University of Florida:

Aymara photographs at: http://ufdc.ufl.edu/UF00103162/00002
Jaqi Language Metadata Archive at: http://ufdc.ufl.edu/UF00103162/00001
Unzipped audio files at: http://ufdc.ufl.edu/UF00103162/00003
Notebooks and audio, with photographs at: http://ufdc.ufl.edu/jaqi
Jaqaru and morpheme level data at: http://ufdc.ufl.edu/UF00103162/00001/downloads

A dictionary for Jaqaru and Kawki that provides English and Spanish translations is online at http://test.aymara.ufl.edu/dictionary.html. The dictionary site was provided as a first step in publishing the Jaqaru and Kawki data by request for native speaker use in bilingual education, and is open for public use. The dictionaries (which also include Aymara) enable users to browse an alphabetical list of all morphemes within each of the three Jaqi languages. Information available for each morpheme includes the Spanish and English gloss, class (root or suffix), part of speech, and a concordance of all words containing the morpheme. The information is primarily displayed in Spanish, but it can also be displayed in English.

A pilot study entitled "Applying Computational Linguistics Tools to Analyze Grammatical Structures" was jointly conducted in 2011 by the University of Illinois at Urbana-Champaign (UI) and the University of Florida (UF).
Using SNoW (Sparse Network of Winnows), a high-performance linear classifier that learns a linear function over the example feature variables, we used a sample of the data to explore three tasks: 1) analysis and discovery of morpheme behavioral rules, 2) syntax analysis, and 3) shape verb analysis.

Rules for morpheme behavior can be induced by looking at the influence of neighboring morphemes and at the influence exerted by other morphemes within the overall syntactic structure of the phrase. The highly annotated nature of our Jaqi database facilitates such an analysis (morphological structure has been annotated for thousands of words for all three languages in the Jaqi collection).

In the Jaqi languages, syntactic relationships within noun and verb phrases and complete sentences are governed largely by suffix markers and vowel-dropping behavior. Formal rules are needed to build parsers that can automatically parse complete phrases. It may be possible to discover these rules in part by using the annotations that already exist in the archive.

Shape verbs in the Jaqi languages determine classes of nouns by the shape of the object and the way in which it is carried. Shapes in Jaqi are in movement (with a specific cognate suffix for stopping movement). We studied which categories of objects are associated with which shape verbs by searching the archive to see what verb-object combinations actually occur.

The results of these pilot studies warrant additional research to validate the applicability of computational linguistics for discovering formal grammatical rules in endangered languages. More texts can be added, both historical (at least another 50 notebooks exist) and current, to expand research possibilities.
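
As its name indicates, SNoW is built from Winnow-style learners, which keep positive weights over the active features of an example and adjust them multiplicatively after each mistake. The sketch below shows that update rule for a single node. It is not SNoW itself and not the pilot study's code. The class `SparseWinnow`, the threshold and promotion values, and the feature names such as `next=SFX_B` are all hypothetical, chosen only to suggest how a neighboring-morpheme context could be presented to such a learner.

```python
from typing import Dict, Iterable, List, Set, Tuple

Example = Tuple[Set[str], bool]  # (active feature names, label)


class SparseWinnow:
    """A single Winnow node over sparse binary features (illustrative only)."""

    def __init__(self, alpha: float = 2.0, threshold: float = 2.5) -> None:
        self.alpha = alpha          # promotion/demotion factor (assumed value)
        self.threshold = threshold  # decision threshold (assumed value)
        self.weights: Dict[str, float] = {}  # unseen features default to 1.0

    def score(self, features: Iterable[str]) -> float:
        # Linear function over the active features only.
        return sum(self.weights.get(f, 1.0) for f in features)

    def predict(self, features: Iterable[str]) -> bool:
        return self.score(features) >= self.threshold

    def update(self, features: Set[str], label: bool) -> bool:
        """Apply one mistake-driven update; return True if a mistake occurred."""
        if self.predict(features) == label:
            return False
        # Promote active features on a missed positive, demote on a false alarm.
        factor = self.alpha if label else 1.0 / self.alpha
        for f in features:
            self.weights[f] = self.weights.get(f, 1.0) * factor
        return True


def train(examples: List[Example], epochs: int = 10) -> SparseWinnow:
    """Cycle through the examples until an epoch passes with no mistakes."""
    model = SparseWinnow()
    for _ in range(epochs):
        mistakes = sum(model.update(feats, label) for feats, label in examples)
        if mistakes == 0:
            break
    return model


# Hypothetical toy data: the label is True exactly when the (placeholder)
# following suffix is SFX_B, regardless of the preceding root.
TOY_EXAMPLES: List[Example] = [
    ({"prev=ROOT_A", "next=SFX_B"}, True),
    ({"prev=ROOT_A", "next=SFX_C"}, False),
    ({"prev=ROOT_D", "next=SFX_B"}, True),
    ({"prev=ROOT_D", "next=SFX_C"}, False),
]

toy_model = train(TOY_EXAMPLES)
print(sorted(toy_model.weights.items()))
```

After training on the toy data, the weight for the informative placeholder feature ends up larger than the weights for the uninformative ones, and that is the sense in which a candidate rule can be read off a trained linear classifier. Real annotated morpheme data would need a much richer feature set and careful evaluation.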

Agency: National Science Foundation (NSF)
Institute: Division of Behavioral and Cognitive Sciences (BCS)
Application #: 0754550
Program Officer: Shobhana Chelliah
Project Start:
Project End:
Budget Start: 2008-07-01
Budget End: 2012-04-30
Support Year:
Fiscal Year: 2007
Total Cost: $154,843
Indirect Cost:
Name: University of Florida
Department:
Type:
DUNS #:
City: Gainesville
State: FL
Country: United States
Zip Code: 32611