Just a few years ago, the majority of our computational input was confined to the traditional computer keyboard. Now, with the advent of ubiquitous computational devices in our hands, pockets, televisions, cars, and glasses, more fluid, natural, and less distracting methods of input are desirable. Beyond the inherent limitations of the traditional keyboard, a more fundamental cost is the number of injuries sustained through repeated, long-term keyboard use. Carpal Tunnel Syndrome (CTS) alone affected nearly 5 million US workers in 2010. CTS and Cubital Tunnel Syndrome (CBTS) together account for $1 of every $3 spent on workers' compensation. Beyond these economic costs, these syndromes severely limit sufferers' ability to access computational technology, locking them out of professions such as computer programming.
This project aims to develop spoken language interfaces for computer programming. The immediate goal is to create a spoken language dictation system for the popular Java programming language, which removes the need for the programmer to dictate difficult-to-verbalize syntactic elements such as parentheses, brackets, punctuation, and word casing. Instead, the system will employ stochastic models to infer the intended program with high fidelity from the ambiguous speech stream of the user. This project lays the groundwork for a more general framework for relating ambiguous natural human language to the various formal languages and systems that drive the functioning of the computer.
The key technical innovations of this project lie in the development of stochastic models for computer programs. Such models have met with much success in recent years for inferring the structure, meaning, and translations of human languages. While traditional programming languages are designed as deterministic grammars, any speech input in a more natural human language idiom will perforce involve ambiguity. This ambiguity may only be resolved by developing accurate and predictive probability models over computer programs. The models to be explored in this project include traditional n-gram language models as well as syntactic language models that make use of the programming language's grammar in order to more accurately assign likelihood to various interpretations of the speech input. In the long term, this research lays the groundwork for the development of robust speech toolkits for a wide variety of computational languages, such as domain-specific languages for cars and entertainment devices and database query languages.
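To make the n-gram approach concrete, the sketch below (a toy illustration under assumed details, not the project's actual models) trains an add-one-smoothed bigram model over a few tokenized Java fragments and uses it to rank two candidate transcriptions of the ambiguous spoken phrase "i plus plus":

```python
from collections import defaultdict
import math

def train_bigram_model(token_streams):
    """Count bigrams and unigram contexts over tokenized programs."""
    bigrams = defaultdict(int)
    unigrams = defaultdict(int)
    vocab = set()
    for tokens in token_streams:
        padded = ["<s>"] + tokens  # sentence-start marker
        for prev, cur in zip(padded, padded[1:]):
            bigrams[(prev, cur)] += 1
            unigrams[prev] += 1
            vocab.add(cur)
    return bigrams, unigrams, vocab

def log_prob(tokens, bigrams, unigrams, vocab):
    """Add-one-smoothed log-probability of a token sequence."""
    padded = ["<s>"] + tokens
    v = len(vocab)
    lp = 0.0
    for prev, cur in zip(padded, padded[1:]):
        lp += math.log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + v))
    return lp

# Toy corpus of tokenized Java statements (illustrative only).
corpus = [
    ["for", "(", "int", "i", "=", "0", ";", "i", "<", "n", ";", "i", "++", ")"],
    ["int", "n", "=", "0", ";"],
    ["i", "++", ";"],
]
model = train_bigram_model(corpus)

# Two interpretations of the speech stream "i plus plus":
# the model prefers the reading whose bigrams are better attested.
a = log_prob(["i", "++", ";"], *model)   # increment statement
b = log_prob(["i", "+", "+", ";"], *model)  # ill-formed token split
print(a > b)
```

In a full system the corpus would be a large body of real Java code and the candidates would come from the speech recognizer's lattice, but the ranking principle is the same: the probability model resolves the ambiguity that the deterministic grammar alone cannot.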