Speech technology can potentially allow everyone to participate in today's information revolution and can bridge language barriers. Unfortunately, constructing speech processing systems requires significant resources. With some 4500-6000 languages in the world, speech processing has traditionally been prohibitively expensive for all but the most economically viable languages. Moreover, in spite of recent improvements in speech processing, supporting a new language remains a skilled job requiring significant effort from trained individuals. This project overcomes both limitations by providing innovative methods and tools for users to develop speech processing models, collect appropriate data to build these models, and evaluate the results, allowing iterative improvement.
Building on the existing GlobalPhone and FestVox projects, the project will share knowledge and data between recognition and synthesis, such as phoneme sets, pronunciation dictionaries, acoustic models, and text resources. User studies will indicate how well speech systems can be built, how well the tools support users' efforts, and what must be improved to create even better systems. This research will increase the knowledge of how to rapidly create speech recognizers and synthesizers in new languages. Furthermore, archiving the data gathered on the fly from many cooperative native users will significantly increase the repository of languages and resources.
By integrating speech recognition and synthesis technologies into an interactive language creation and evaluation toolkit usable by unskilled users, this project will revolutionize speech system generation. Data and components for new languages will become available to everybody, improving mutual understanding and educational and cultural exchange between the U.S. and other countries.