Katso is an endangered minority language spoken in a single farming village in China's Yunnan Province. Its speakers are descendants of Mongol troops who conquered the region in the 13th century under Kublai Khan. The language evolved through extensive contact between the Mongols and other local minorities, such as the Bai and the Yi, and is therefore considered a mixed language. Today it is classified as a variant of the officially recognized Yi language (Tibeto-Burman), although the two are mutually unintelligible. The 5,000 speakers are under increasing pressure to speak Mandarin, since speaking Mandarin is the only way to pursue an education or find employment outside the village. Consequently, many children now learn Katso as a second language. It is therefore crucial that Katso be documented while it is still vibrant and in daily use.

The goals of the project are to create a comprehensive corpus of Katso and to write a detailed grammar of the language. The corpus will consist of digital audio and video recordings accompanied by transcriptions with linguistic annotation. The language captured will include a wide variety of discourse types, such as traditional narratives, personal anecdotes, conversation, ceremonies, idioms, jokes and songs. Archived both with the speaker community and at a professional language archive, the corpus and the resulting grammar will create a permanent record of the language for future use by villagers and scholars alike. The project will considerably expand our knowledge of this unique language and provide the foundation for future scholarly investigations into fundamental questions about mixed languages and language contact, historical change, and language classification. The project will also benefit the Katso community directly. Materials such as recordings of traditional songs and stories will be made available, and interested speakers will be trained and mentored. In addition, workshops on language documentation will be offered to interested students at the Yunnan University of Nationalities.

Project Report

Katso is an endangered minority language spoken only in the farming village of Xingmeng, in Yunnan Province in the People’s Republic of China. The speakers are ethnic Mongols descended from troops that Kublai Khan brought to the region in the 13th century. The language, however, is no longer recognizably related to Mongolian. It is much closer in structure to the neighboring Yi languages, although how it evolved is not well understood. The purpose of the two-year project was to document and preserve Katso through two main activities. The first was the creation of an annotated corpus of audio and video recordings, including natural and spontaneous speech, during a year of fieldwork in Xingmeng. The second, following the fieldwork year, was the writing of a comprehensive grammar of the language, which serves as the Co-PI’s doctoral dissertation.

The project requested support from the NSF to fund part of the fieldwork year in China (July 2012 through February 2013). The primary goal of that period was the creation of an extensive corpus of Katso. The corpus consists of 46 hours of recorded material featuring 55 native speakers, who represent a cross-section of village demographics and provide a variety of pronunciations and speaking styles. Of the 35 hours of audio recordings, 9 hours are elicited words and phrases, 10 hours are comparative pronunciation data, and 16 hours are spontaneous speech. The spontaneous speech includes traditional stories, personal anecdotes, conversation, instructions, idioms and songs. The 11 hours of video focus primarily on speakers demonstrating and talking about traditional activities, such as basket weaving, sewing, constructing straw stools, playing musical instruments, singing and dancing. Approximately 25 hours of the corpus have been transcribed to date, and this work continues. A digital photography archive of approximately 1,500 photos was also created to capture life in the village. It features daily activities, holiday celebrations and ceremonies, as well as traditional architecture, clothing and tools. In addition, because Katso has no writing system, an orthography was devised to allow native speakers to transcribe the recordings.

The second year was spent writing the grammar, a description of all the elements of the language and how they work together systematically to convey the complexity of human life. Many grammatical structures are apparent only in natural speech such as conversation. A discourse-based study of the corpus has therefore yielded a richer and more detailed analysis than simple elicitation methods allow. As a result, the grammar presents a fuller view of the language and its structure than was previously available. Its six chapters cover every aspect of Katso, from its sound system to the formation of complex clauses. All told, the grammar will contain an estimated 700 single-spaced pages of linguistic analysis, including more than 1,100 examples of Katso words and phrases. It also includes two important appendices of source material: a 2,000-word glossary of Katso and excerpted transcripts from the corpus. The grammar is expected to be completed by the end of 2014.

Both the corpus and the grammar considerably expand our knowledge of this unique language and set the stage for a number of longer-term projects. First, they preserve the language for the Katso community. All of the materials created are being shared with the village, where they may serve as the basis for future language maintenance and cultural preservation programs. Second, because of its unusual origin, Katso offers valuable insight into the linguistic questions of language contact, historical change and language classification. Without a comprehensive record of the language, however, no thorough investigation of these issues could take place. The corpus and the grammar now provide the data needed for future research into these questions. In addition, as the first in-depth work on Katso in English, the project makes data available for the first time to linguists, anthropologists and historians outside China.

National Science Foundation (NSF)
Division of Behavioral and Cognitive Sciences (BCS)
Standard Grant (Standard)
Program Officer: Shobhana Chelliah
University of California Santa Barbara, Santa Barbara, United States