This project will analyze and collect new information about syntax and texts in two indigenous languages of northern California, Karuk [kyh] and Yurok [yur]; each language has a handful of (elderly) speakers. Existing scholarly literature carefully describes the pronunciation and word formation patterns of both languages, but discourse and word order patterns have attracted relatively little attention. This project will investigate these topics through work with speakers, analysis of existing published and unpublished texts (collected by several linguists over the 20th century), and creation of syntactically annotated text corpora for both languages.

This project is scientifically important for three reasons. First, while indigenous California was linguistically the densest and most diverse area of its size in the western hemisphere, its languages have mostly not been analyzed in syntactic detail. Second, over the last 1,000 to 2,000 years there has been intensive cultural interaction between the neighboring Karuk and Yurok peoples, but its linguistic effects have not been carefully studied. Language contact effects tend to be especially conspicuous in syntax, and preliminary work suggests that the syntactic correspondences between these two languages will cast new light on the mechanisms of contact-induced language change. Third, the project is methodologically unusual: syntactically annotated corpora are rarely built for "small" languages, and the results are expected to show researchers throughout the world that this approach can yield interesting syntactic generalizations.

More broadly, this project will directly benefit Karuk and Yurok language learning. Learners in both communities have good knowledge of vocabulary, pronunciation, and word formation, but they are less familiar with the distinctive syntactic patterns of their heritage languages and naturally tend to use English patterns. This research will make it possible to write grammatical descriptions for learners and teachers that emphasize the aspects of each language that differ from English, reinforcing linguistic and cultural revival.

Project Report

This research project focused on Karuk, an endangered indigenous language of northern California. Project goals included (1) documentation, description, and analysis of syntax, a major area of the language that had not been thoroughly studied in previous work; (2) creation of a corpus of new and previously collected text material in the language; and (3) data annotation to parse the corpus morphologically and to create a "treebank," or syntactically annotated text corpus. Treebanks are common for well-studied languages with vast text resources, such as English, German, and Hindi, but ours is the first treebank created for an endangered indigenous language with no written tradition before the 20th century. Therefore, in addition to the specific research results it yields for Karuk, this treebank serves as an important proof of concept for other similar projects. We created over 200 hours of new documentary field recordings of Karuk, working with six fluent speakers in collaboration with several younger language teachers and advanced learners. Our documentation focused especially on syntactic patterns: how words are used in sentences, how simple clauses are formed, and how clauses are combined. Research included grammatical and vocabulary elicitation, stimulus-based prompts, and some monolingual elicitation sessions. All our work was recorded and has been organized in preparation for permanent archiving, to ensure access by scholars and Karuk community members. We expanded our digital corpus of Karuk text material from about 800 sentences at the beginning of the project to nearly 6,000 sentences as of December 2014. These sentences come from a variety of sources, including traditional narratives, conversation, procedural texts, anecdotes, and linguistic elicitation sessions, and span the period from 1903 to 2014 (involving many different speakers and researchers over the years).
The entire corpus is morphologically annotated: the internal structure of each word in the corpus is indicated and can be retrieved to see how word structure relates to syntactic and other patterns. We have also carefully annotated the corpus toward the goal of a Karuk treebank. Over 500 sentences are syntactically annotated; every word is tagged with information about head-dependency relations, syntactic status, and part of speech. To do this systematically, we also created a set of guidelines for annotators covering the major construction types, with examples of common words and problems; these guidelines will be useful in projects on other languages as well.
Broader impacts of our research concern science education and indigenous linguistic and cultural restoration. Karuk syntactic and semantic projects now feature in several large undergraduate classes at UC Berkeley, with at least nine advanced undergraduates undertaking independent work on the language. We hope this will contribute to greater public engagement with indigenous language issues. We also participate regularly in Karuk language education programs by attending and contributing to classes in Yreka, CA, for grade school students, high school students, and community members. We provide curriculum support and guidance and have created concrete pedagogical materials that we have distributed in person and through our website. All online resources available through our project (linguistics.berkeley.edu/~karuk) have been adapted with the interests of community members in mind; these resources include 165 texts that can be read in various display formats, depending on user interest, and a lexicon with about 7,000 entries. We have also worked with Karuk community members on orthography design, verb structure, word order, culturally significant lexical semantics, and the use of archival materials.
(Note: The title of our project mentions Yurok as well as Karuk, but budget constraints meant that most actual research focused on the Karuk language.)

Budget Start: 2011-06-15
Budget End: 2014-11-30
Fiscal Year: 2010
Total Cost: $154,425
Institution: University of California Berkeley
City: Berkeley
State: CA
Country: United States
Zip Code: 94710