NovaSpeech proposes to develop an innovative, perceptually oriented hybrid approach to unconstrained speech synthesis for generating individualized, customized voices of either gender and any age. The system will provide human-sounding, intelligible, and mimetic speech, yet have small storage requirements, support the cost-efficient addition of new voices, and be suitable for implementation on virtually any hardware platform. As a result, the technology will be well suited to virtually any unlimited-vocabulary synthesis application, but will be of special benefit to speech-impaired individuals, who have a particularly great need for natural-sounding, individualized voices on a broad range of devices. With the hybrid system, individuals who know they will lose their voice due to illness or surgery will be able to cost-efficiently capture their pre-injury voice and use it in a voice output communication aid, and all speech-impaired users will be able to obtain reliable, appropriate, individualized voices that can grow with them as they mature and age. No existing synthesis approach meets these needs: each type of technology trades off one desirable property for another, be it low storage requirements for natural voice quality, or human voice quality for flexibility. The hybrid approach overcomes these limitations by integrating, in a novel and principled way, the best features of two well-known synthesis techniques: corpus-based waveform concatenation and rule-based formant synthesis. Capitalizing on a number of important perceptual principles, the system will prestore only a small number of intrinsic units, such as stressed vowels, from the target speaker, and synthesize other, adaptable units by rule. Thus, with only a small prestored speech corpus and a common set of rules across voices, it will produce speech that sounds like the intended speaker.
In its proposed Phase II project, NovaSpeech will develop a complete hybrid prototype text-to-speech (TTS) system for eight voices in General American English: six base speakers (male and female children, adults, and elderly adults) and two speakers who know they will lose their ability to speak naturally as a result of future laryngectomies. Year 1 will focus on exploring possible system architectures, implementing rules for adaptable units, and exploring, through perceptual experiments, possible strategies for storing and selecting intrinsic units. Year 2 will focus on implementing a fully functional hybrid TTS prototype for the six base voices. By month six of Year 2 at the latest, the company will verify the ability to quickly add new voices by implementing the voices of the laryngectomy patients, providing them with functional systems for their voices, and obtaining feedback from them and from those who know them about the quality of the voices and the system features. The ultimate objective of the hybrid project is to improve the naturalness and mimetic quality of speech synthesized from unrestricted symbolic input, with the particular goal of enhancing the utility and flexibility of voice output communication aids for speech-impaired individuals.

Agency: National Institutes of Health (NIH)
Institute: National Institute on Deafness and Other Communication Disorders (NIDCD)
Type: Small Business Innovation Research Grants (SBIR) - Phase II (R44)
Project #: 5R44DC006761-03
Application #: 7271981
Study Section: Special Emphasis Panel (ZRG1-BBBP-B (10))
Program Officer: Shekim, Lana O
Project Start: 2004-04-01
Project End: 2010-07-31
Budget Start: 2007-08-01
Budget End: 2010-07-31
Support Year: 3
Fiscal Year: 2007
Total Cost: $376,850
Indirect Cost:
Name: Novaspeech, LLC
Department:
Type:
DUNS #: 144511263
City: Ithaca
State: NY
Country: United States
Zip Code: 14850