The technology developed through this project is a specific way of using speech recognition to "trigger" a required action from a computing system using speech alone, with no push-to-talk button. Although speech recognition software is widely available, this technology has the potential to advance the industry. The project team has worked on this technology for over a decade. The proposed system will be offered as a software tool-kit for developers building applications in which access to the system is either 1) restricted or 2) prohibited.

This software tool-kit has the potential to help users who are directly or indirectly impaired, such as quadriplegics, surgeons, operators wearing HAZMAT suits, and operators of remotely operated vehicles, for whom the same operation would otherwise be very difficult, very expensive, or not possible at all.

Project Report

Intellectual Merit & Broader Impact

Voice recognition technology continues to improve, but one task is still very difficult: using voice commands to get the attention of an electronic device or computer. This task is referred to as Wake-On-Word. The challenge is that the device must listen to hours of irrelevant conversation, noise, and even soundtracks from adjacent TVs without triggering, yet must trigger infallibly when a wake-up-word (WUW) is spoken as a command by any human. The state of the art is to use unusual phrases, such as "OK, Google," as the wake-up-word. The Principal Investigator, Dr. Veton Kepuska, has researched a general solution to the wake-on-word problem for nearly two decades. His present implementation supports any word, in any language, by any normal speaker, with good rejection of false positives and very few false negatives.

Although a handful of voice command applications are currently on the market, they are offered only in embedded devices or as part of an existing product. Zëri plans to develop an API that enables application developers to program their own custom WUWs and then install them into any application that is capable of speech recognition. Zëri’s API will give developers full control over the integration of the voice command algorithm and allow them to create new capabilities for existing products on the market.

As part of the NSF I-Corps program, Zëri conducted customer validation interviews. These interviews uncovered a wide range of applications for the technology, but more importantly, they confirmed a practical channel to market. Application developers prefer to test new software products before they commit to incorporating them into their applications. Therefore, Zëri is deploying a web site that includes a "Freemium" product trial. Users can try the software for free. They can then purchase limited-use licenses for nominal fees.
Once application developers are comfortable with Zëri's product, they can purchase enterprise licenses for high-usage applications.

Cloud-based services are becoming very common, especially in mobile application development and for rapid prototyping. The software interfaces to cloud-based services are typically standardized and easily supported across a variety of client platforms. Developers can therefore focus on the unique business logic of their application and less on integration issues between software components. By providing industry-standard RESTful HTTP services, Zëri will let its customers focus fully on the value that hands-free voice commands add to their applications.

Zëri will provide native libraries for a limited number of supported platforms, and cloud-based services for applications that can rely on an active Internet connection. An example of a customer for the native library is a car manufacturer interested in providing voice commands in its vehicles. The manufacturer could integrate the native library into its vehicles to avoid requiring a constant Internet connection and the associated service charges. Mobile application developers, on the other hand, may opt for the cloud-based services because their users’ platforms vary in processing power and a reliable Internet connection can typically be assumed. The main advantages of the cloud-based service for customers are platform independence and the ability to provide voice commands on devices that would previously have been deemed underpowered for speech recognition.

The I-Corps interviews also demonstrated that large corporations, such as military contractors, are comfortable with Zëri’s sales model, and that they can discover the Zëri software through visibility on a very small number of blogs and programmer forums.
Thus, the Zëri sales process is focused on 1) visibility in the industry, 2) support of new customers, and 3) conversion of strong customers to enterprise licenses. Zëri contracts with cloud service providers, such as Microsoft or Amazon. This approach eliminates the need to acquire and operate computer servers.

Next Stages

The newly formed Zëri, Inc. is currently developing its first product. As we continue our development efforts to make the technology production-ready, we are seeking funding. We will be applying for grants and look forward to generating revenue by early 2015.

Agency: National Science Foundation (NSF)
Institute: Division of Industrial Innovation and Partnerships (IIP)
Type: Standard Grant (Standard)
Application #: 1312718
Program Officer: Rathindra DasGupta
Project Start:
Project End:
Budget Start: 2013-01-15
Budget End: 2014-06-30
Support Year:
Fiscal Year: 2013
Total Cost: $50,000
Indirect Cost:

Name: Florida Institute of Technology
Department:
Type:
DUNS #:
City: Melbourne
State: FL
Country: United States
Zip Code: 32901