Societal benefits of automatic speech recognition (ASR) technology are beginning to accrue, thanks to creative new devices and applications such as voice-enabled search on mobile devices, in-car functions via voice commands, and educational software for language learning. Application developers, however, have a difficult time mastering the complexities of ASR technology. Academic researchers who use ASR, e.g. for studying human-computer dialog or language acquisition by children, face a similarly significant barrier to entry. Finally, ASR researchers themselves need state-of-the-art baseline systems on which to make further improvements, and building such systems is a daunting task for most academic and even some industry groups. There is thus a strong demand for an ASR toolkit that is freely available, easy to use, state-of-the-art, and kept up to date with new advances in ASR technology. The Kaldi speech recognition toolkit, whose development was partially supported by past NSF awards, (i) is freely available via http://kaldi.sourceforge.net, (ii) provides easy-to-use recipes to develop high-performing ASR systems for a number of widely used datasets, and (iii) is being adopted by hundreds of researchers to fulfill the aforementioned needs. Keeping Kaldi up to date and providing advice and technical support to Kaldi users is therefore becoming a crucial enabler of the research of faculty, students and developers in a variety of academic disciplines and industrial sectors.

This CISE research infrastructure project seeks to enhance and maintain the Kaldi speech recognition toolkit. Johns Hopkins University researchers, who first created Kaldi, will further develop its capabilities to better enable future research in both core ASR and ASR applications. They have recently added support for deep neural network training and online (real-time) decoding. Specific new features that will be developed and added to Kaldi include improved voice activity detection, a faster decoder, support for recurrent neural net language models, a more flexible framework for experimenting with deep neural networks, and related capabilities such as recurrent neural net acoustic models. The investigators will publish their know-how and disseminate these enhancements through scientific conferences and workshops frequently attended by Kaldi users, and will continually solicit user feedback about future enhancements through discussion forums and conference participation. They have already set up a mailing list and a discussion forum where Kaldi users can post technical questions and exchange solutions to commonly encountered problems. This project will also provide user support via these online discussion lists and developer forums.

Agency: National Science Foundation (NSF)
Institute: Division of Computer and Network Systems (CNS)
Type: Standard Grant (Standard)
Application #: 1513128
Program Officer: Tatiana Korelsky
Project Start:
Project End:
Budget Start: 2015-07-01
Budget End: 2019-06-30
Support Year:
Fiscal Year: 2015
Total Cost: $839,999
Indirect Cost:
Name: Johns Hopkins University
Department:
Type:
DUNS #:
City: Baltimore
State: MD
Country: United States
Zip Code: 21218