Sino-Tibetan (ST), comprising Chinese on the one hand and the hundreds of Tibeto-Burman (TB) languages on the other, constitutes one of the great language families of the world, with well over a billion speakers. Many languages of the family are poorly attested and endangered, and the relationships among even the better known ones remain controversial. STEDT (the Sino-Tibetan Etymological Dictionary and Thesaurus project) began at U.C. Berkeley in 1987 with a twofold goal: (1) to create an etymological dictionary reconstructing the meaningful spoken units of the ancestor languages at major taxonomic levels (e.g., Proto-Kuki-Chin, Proto-Lolo-Burmese, Proto-TB, Proto-ST), paying due attention to typological and phonological plausibility; and (2) to produce a semantically based historical thesaurus, classifying the reconstructed roots by meaning, and recognizing phonosemantic variation at all time depths.
The backbone of STEDT research is an online MySQL database system with two main parts: (1) a lexical database containing nearly a million records of disparate types, for hundreds of languages and dialects, drawn from some 500 sources; and (2) a database of reconstructed protoforms including some 3,000 roots and root variants. Distinguishing true cognates from borrowings is especially difficult in this complex linguistic area where multilingualism is pervasive. Together with work by specialists in other language families of southern China and Southeast Asia (Tai-Kadai, Hmong-Mien, Mon-Khmer, Austronesian), STEDT research is helping to decide larger questions about the interrelationships among these great families.
During the first year of the new grant period, a new user-friendly browser-based query interface will be created and made available to the public. The second and third years will be devoted to the creation of an online "collaboratory", to solicit new ST comparative/historical data and findings from linguists worldwide. Visit STEDT on the web at http://stedt.berkeley.edu.
The Sino-Tibetan language family, comprising Chinese and hundreds of Tibeto-Burman languages and dialects, was arguably the most important language family in the world for which an etymological dictionary had not been produced. The Sino-Tibetan Etymological Dictionary and Thesaurus ("STEDT") project, begun in 1987 and continued without interruption until 2014, has filled this gap in an original and intellectually satisfying way. The originality of the project largely resides in the fact that the roots reconstructed for Proto-Tibeto-Burman and its sub-branches are presented not only in terms of their phonological shape, but also according to their semantic content. To achieve these results, we have created customized software capable of dealing with massive amounts of data from disparate sources. Much of our effort during the final few years of the project has been devoted to assuring the integrity and longevity of our findings, and their swift dissemination to the scholarly community and the interested public. We have welcomed feedback from specialists in the many subgroups of the Sino-Tibetan family, and have been pleased to note that many scholars (both established experts and newly minted researchers carrying out fieldwork in China, the Himalayan region, and peninsular Southeast Asia) have benefited from the use of our database and etymologies.
More specifically, we have created a very large database of forms from hundreds of Tibeto-Burman languages and dialects, extracted from both published and unpublished sources, developing specialized database tools for the comparative historical analysis of this family. Based on this database, we have reconstructed thousands of roots for Proto-Tibeto-Burman and its subgroups, comparing them to Chinese etyma where appropriate. We have traced the semantic associations of these roots and the meaning changes they have undergone through time.
In the process, we have refined the subgrouping of the vast and disparate Tibeto-Burman language family. An important outcome of this research has been to clarify the nature of the relationship between Chinese and Tibeto-Burman, which have hitherto been regarded as the two major branches of the great Sino-Tibetan family. Our ultimate goal has been to elevate Tibeto-Burman and Sino-Tibetan historical linguistic research to the point where it achieves parity with the work that has been done in other, better-known language families (Indo-European, Dravidian, Semitic, Bantu, Austronesian, etc.). We have taken great care in refining our reconstructions, accepting many of them without change, revising others, and rejecting a small proportion of them outright. It has frequently happened that new data have led us to combine etyma previously considered separate, or conversely, to split up a previously reconstructed root into two or more separate etyma. In the course of our work, we have put the study of proto-variation at the forefront; since no modern human language is free from variation at all levels of structure, it must be assumed that reconstructed proto-languages also display such patterns. We have developed a conceptual framework for categorizing and evaluating variational phenomena. The STEDT project has led to a number of important publications, including a series of ten monographs, and a Handbook of Proto-Tibeto-Burman (2003, xlii+752 pp.). Most importantly, a large printed version of the Dictionary-Thesaurus in two folio volumes, including all the etymologies and their supporting forms, will shortly become available at a reasonable cost. As an additional benefit from our research over nearly three decades, a number of scholars in other disciplines, including anthropologists, cultural historians, and computer scientists, have been stimulated to include historical linguistic data in their work.