Language data is central to the research of a large social sciences community: not only linguists, but also anthropologists, archaeologists, historians, and sociologists interested in the culture of indigenous peoples. Members of this research community currently face two urgent situations: the number of languages in the world is rapidly diminishing, while the number of initiatives to create digital archives of language data is rapidly multiplying. The latter might seem to be an unalloyed good in the face of the former, but there are two ways things may go wrong without adequate collaboration among archivists, linguists, and language engineers. First, a common standard for the digitization of linguistic data may never be agreed upon, and the resulting variation in archiving practices and language representation would seriously inhibit data access, searching, and cross-linguistic comparison. Second, standards may be implemented without guidance from the people who best know the range of structural possibilities in human language: descriptive linguists who have done fieldwork on poorly described languages.

If digital archives of language data and documentation are to offer the widest possible access and to provide information in a maximally useful form, consensus must be reached about certain aspects of archive infrastructure. As the largest linguistic organization in the world and the central electronic publication of the discipline, The LINGUIST List <www.linguistlist.org> is organizing a collaborative project with a dual objective: (1) to preserve endangered-language data and documentation and (2) to aid in the development of infrastructure for linguistic archives. One outcome of the project will be a LINGUIST List digital archive housing data from 10 endangered languages. But the focus on infrastructure will produce other, equally important results. In the first place, the LINGUIST archive will function not only as a repository but also as a 'showroom of best practice.' The archive will offer endangered-language data marked up and catalogued according to community consensus about best practice; furthermore, the archive will disseminate reference material delineating best practice and software tools supporting it. Another outcome will be the establishment on the LINGUIST List site of a central metadata server for the discipline; this server will organize information on all the language-related resources residing at distributed sites, not only endangered-language information. Other infrastructure-related outcomes include (1) the involvement of the linguistics community in establishing best practice, (2) the widespread dissemination of the resulting recommendations, and (3) the hands-on training of a substantial core of linguists and language archivists in the implementation of the guidelines. Although the data collection efforts will focus initially on endangered languages, the metadata server, the recommendations for best practice, and the distribution of supporting software will have a significant impact on all empirical research in linguistics. The project will thus add value to many other language-related projects currently planned or underway.

Agency: National Science Foundation (NSF)
Institute: Division of Behavioral and Cognitive Sciences (BCS)
Application #: 0729644
Program Officer: Joan Maling
Project Start:
Project End:
Budget Start: 2007-03-15
Budget End: 2008-06-30
Support Year:
Fiscal Year: 2007
Total Cost: $48,674
Indirect Cost:
Name: Eastern Michigan University
Department:
Type:
DUNS #:
City: Ypsilanti
State: MI
Country: United States
Zip Code: 48197