Audio and video data from understudied languages is useful to linguists, anthropologists, educators, and computer scientists interested in visual action extraction, speech technology, or software localization. Terabytes of such data exist, collected by documentary linguists since handheld devices made digital recording easy. As records of vanishing languages and cultures, audio and video recordings are far richer and more captivating than paper records, but they must be indexed and transcribed to reach their full potential as research tools. The current project, AARDVARC (Automatically Annotated Repository of Digital Audio and Video Resources Community), will address the problem of untranscribed, and therefore effectively unavailable, documentation of understudied languages by building an interdisciplinary community of linguists, anthropologists, and computer scientists who will share knowledge and collaborate on specifying a repository and a suite of tools to facilitate transcription. The project will fund two workshops and a symposium to design a "take one, leave one" repository and to explore recent advances in speech and video processing that will allow anthropologists and linguists to break the "transcription bottleneck" for language data. Even partial automation will greatly ease the analyst's work and dramatically increase the amount of transcribed audio and video available to researchers in multiple disciplines.

Agency: National Science Foundation (NSF)
Institute: Division of Behavioral and Cognitive Sciences (BCS)
Type: Standard Grant (Standard)
Application #: 1244713
Program Officer: William Badecker
Budget Start: 2012-09-15
Budget End: 2015-02-28
Fiscal Year: 2012
Total Cost: $84,982
Organization: Eastern Michigan University
City: Ypsilanti
State: MI
Country: United States
Zip Code: 48197