User Adaptation of AAC Device Voices

Klabbers, Esther

Abstract

Augmentative and alternative communication devices with voice output (also known as Speech Generating Devices, or SGDs) enable individuals to speak by electronic means. Typical users of SGDs are individuals who have suffered a stroke or traumatic brain injury, or who have neurodegenerative or neurodevelopmental disorders. In most cases, the user was able to speak previously, or had or still has intermittent speech. In these cases, the user's relatives and friends are familiar with the user's voice. An often-expressed desire is for the SGD to sound like the user. However, typical SGDs do not mimic any characteristics of the user's speech; in fact, they typically have an extremely limited array of synthetic voices, and the prosodic patterns of these voices are not customizable. As a result, the synthetic voice is impersonal, which may be a factor in discouraging impaired speakers and their communication partners from using the SGD. To address this impersonal character of current SGDs, we propose to offer a system with a wide range of personal customization options, by making use of (1) a substantial number of synthetic voices to choose from; (2) customizable prosody; and, most important, (3) Speaker Mimicry (SM) technology to mimic the user. Phase I of this project established the feasibility of using SM technology to adapt an existing Text-to-Speech (TTS) synthesis system to mimic a specific target speaker, requiring only a small set of "training" recordings to be made of the target speaker. Mimicry of the spectral aspects of the target speaker was achieved with two Voice Transformation (VT) technologies: one that required extremely few recordings that, moreover, did not need to be of high acoustic quality (hence, pre-morbid home videos could in principle be used); and a second one that required more and better-quality recordings, but also provided better results.
Mimicry of the prosodic aspects of the target speaker (Prosody Mimicry, or PM) was achieved by estimating static and dynamic parameters of the target speaker's intonational and durational patterns, which were then incorporated into the TTS system. The deliverables of this Phase II STTR project consist of: (1) a complete SM-capable SGD, comprising an SM-capable TTS system and a built-in touch-screen Graphical User Interface (GUI) for user input, installed on a low-cost, dedicated touch-screen "netbook" (alternative keyboards or special input devices will also be supported); and (2) efficient software tools and processes to be used by BioSpeech personnel to compute the individual user data needed for the SM capability. The SM capability will have multiple options, depending on the availability, quantity, and quality of user recordings. The goals of this Phase II proposal are to develop a complete prototype of this product concept, and to co-develop and field-test the system with a group of SGD users. Moreover, we will show that, even with these unique features, the system can be made available at a far lower cost than most current SGDs, thanks to BioSpeech's complete ownership of the technology, to minimal ROI pressures, and to the availability of low-cost "netbooks".

Public Health Relevance

Millions of Americans with impaired or absent speech communication ability rely on Augmentative and Alternative Communication devices with voice output (Speech Generating Devices) to communicate. A psychologically important feature that no currently available system has is the ability to speak with the user's voice, i.e., the ability to produce speech that mimics the individual's pre-morbid speech or speech that the individual may be able to intermittently produce. Phase I of this project established the feasibility of using Speaker Mimicry (SM) technology to adapt an existing SGD to mimic a specific target speaker, requiring only a small set of "training" recordings to be made of the target speaker; the goal of this Phase II proposal is to develop, further improve, and evaluate a complete prototype of this concept, and to deploy it in a limited release to a select group of individuals for in-the-field use.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Institute on Deafness and Other Communication Disorders (NIDCD)
Type: Small Business Technology Transfer (STTR) Grants - Phase II (R42)
Project #: 2R42DC008712-02
Application #: 7910803
Study Section: Special Emphasis Panel (ZRG1-BBBP-T (10))
Program Officer: Shekim, Lana O

Project Start: 2007-01-01
Project End: 2012-03-31
Budget Start: 2010-04-15
Budget End: 2011-03-31
Support Year: 2
Fiscal Year: 2010
Total Cost: $536,871

Institution

Name: Biospeech, Inc.
DUNS #: 144815151

City: Lake Oswego
State: OR
Country: United States
Zip Code: 97034

Related projects


NIH 2011 R42 DC	User Adaptation of AAC Device Voices Klabbers, Esther / Biospeech, Inc.	$410,194
NIH 2010 R42 DC	User Adaptation of AAC Device Voices Klabbers, Esther / Biospeech, Inc.	$536,871
