Advances in speech engineering now allow audio to be transcribed as text, even for languages with few computational resources. Automating transcription for more languages gives the public, communities, and researchers access to previously inaccessible materials. This project uses several thousand hours of radio broadcasts in an under-resourced language as a test case to improve rapid audio-to-text development techniques that are applicable to any language. The project allows speech engineers to apply the technology to new languages, to learn how the characteristics of those languages affect speech recognition performance, and to overcome those effects with the goal of building better speech recognition systems. It also enables communities to preserve their language, distribute tools and data, and, overall, ease the extreme resource limitations their language currently faces. The project encourages students to work and think across the fields of speech engineering, linguistics, and journalism.
In this EAGER project, the Uyghur language (ISO 639-3: uig), a severely under-resourced Turkic language of Xinjiang in Central Asia with about 11 million speakers, is used to test the rapid development of an Automatic Speech Recognition (ASR) system, with the long-term vision of creating web-based speech and language services including pronouncing dictionary generation, audio and text data archiving, and part-of-speech tagging. The project is exploratory because the language lacks computationally tractable resources, yet bootstrapping through a related language (Turkish) promises rapid ASR development; a sketch of this bootstrapping idea follows below. The project can serve as a model for such development for any language, large or small, and is potentially transformative -- first because so many of the world's languages are like Uyghur in having few available computational resources, and second because so many documentary linguists still rely entirely on non-automated methods.
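To make the bootstrapping idea concrete, the following is a minimal sketch, not the project's actual pipeline: it assumes a small PyTorch acoustic model pretrained on the related, better-resourced language (Turkish), whose output layer is replaced to match a Uyghur symbol inventory and then fine-tuned with a CTC objective on a small Uyghur set. The model architecture, symbol-set sizes, checkpoint name, and the random placeholder tensors standing in for radio-broadcast features and transcripts are all illustrative assumptions.

```python
# Sketch of cross-lingual ASR bootstrapping: reuse an encoder trained on a
# related high-resource language, re-initialize the output layer for the
# target language's symbols, and fine-tune on the small target-language corpus.
import torch
import torch.nn as nn

TURKISH_SYMBOLS = 32   # placeholder size of the Turkish output inventory
UYGHUR_SYMBOLS = 36    # placeholder size of the Uyghur output inventory
FEATURE_DIM = 40       # assumed acoustic feature dimension (e.g. filterbanks)

class AcousticModel(nn.Module):
    """Tiny BiLSTM acoustic model standing in for a real pretrained network."""
    def __init__(self, n_symbols):
        super().__init__()
        self.encoder = nn.LSTM(FEATURE_DIM, 128, num_layers=2,
                               batch_first=True, bidirectional=True)
        self.output = nn.Linear(2 * 128, n_symbols)

    def forward(self, feats):
        hidden, _ = self.encoder(feats)
        return self.output(hidden)          # (batch, time, symbols)

# 1. Start from a model trained on the related, better-resourced language.
model = AcousticModel(TURKISH_SYMBOLS)
# model.load_state_dict(torch.load("turkish_pretrained.pt"))  # hypothetical checkpoint

# 2. Swap the output layer to the Uyghur symbol inventory; keep the encoder.
model.output = nn.Linear(2 * 128, UYGHUR_SYMBOLS)

# 3. Fine-tune on the small Uyghur corpus with a CTC objective.
ctc = nn.CTCLoss(blank=0, zero_infinity=True)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# Placeholder batch standing in for Uyghur features and label sequences.
feats = torch.randn(4, 200, FEATURE_DIM)                 # 4 utterances, 200 frames
targets = torch.randint(1, UYGHUR_SYMBOLS, (4, 30))      # dummy transcript labels
input_lengths = torch.full((4,), 200, dtype=torch.long)
target_lengths = torch.full((4,), 30, dtype=torch.long)

log_probs = model(feats).log_softmax(dim=-1).transpose(0, 1)  # (time, batch, symbols)
loss = ctc(log_probs, targets, input_lengths, target_lengths)
loss.backward()
optimizer.step()
print(f"fine-tuning CTC loss on placeholder batch: {loss.item():.3f}")
```

The design choice illustrated here is that only the output layer is language-specific; the encoder's acoustic representations learned from Turkish are retained, which is what makes rapid development plausible when the target-language data are scarce.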