Adapting a Text-to-Speech Synthesizer to Convey User Identity

Patel, Rupal

Abstract

This project will advance computerized speech synthesis methods so that they can better approximate the unique vocal characteristics of individual human speakers. Voice quality is unique to each individual and is therefore inextricable from personality, self-image, and the perceptions of others. We can alter our voice to sound like others, attract attention, project confidence, convey authority, and perform countless other functions. This flexibility of the natural voice is inconceivable using even state-of-the-art text-to-speech (TTS) synthesis. While voice quality may not matter for many text-to-speech applications, it is essential for assistive communication aids that are meant to be an extension of the user. Over two million Americans have severe speech and motor impairments that require them to use an assistive communication aid, and many of them use TTS to speak on their behalf. The speech output options on commercially available devices are extremely limited. Moreover, the synthetic voices are not representative of the user along basic dimensions such as age, gender, rate of speech, and voice quality, thereby drawing unnecessary attention, detracting from the content of the spoken message, and impeding social integration.

This project aims to harness the residual vocal control in the productions of individuals with severe speech impairment in order to adapt a text-to-speech synthesizer so that the resulting voice resembles that of the user. Conventional methods of voice morphing cannot be applied directly given the severity of the user's speech impairment. Recent empirical work suggests that children and adults with severe speech impairment retain the ability to control fundamental frequency, accent, rhythm, and speaking rate, which are among the many acoustic cues that signal speaker identity. This research will leverage this preserved ability to build an adaptive text-to-speech synthesizer that conveys the user's identity without degrading intelligibility. Identity-bearing vocal cues will be elicited from children with severe speech impairment and used to adapt age- and gender-matched concatenative synthetic voices using novel voice transformation techniques. Usability tests will assess the impact of user identity adaptation on TTS intelligibility, naturalness, and acceptability, and the results will inform iterative design of TTS adaptation.
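The abstract does not specify which voice transformation techniques will be used. As a rough illustration only, one common baseline in voice conversion for a single identity cue, fundamental frequency (F0), is a log-domain mean and variance mapping: each voiced frame of the synthetic voice's F0 contour is standardized against the synthetic voice's log-F0 statistics and re-expressed in terms of the user's. The sketch assumes F0 contours are already available as per-frame values in Hz, with 0 marking unvoiced frames; the function names `log_f0_stats` and `map_f0` are hypothetical and are not part of this project.

```python
import math
from statistics import mean, stdev


def log_f0_stats(f0_hz):
    """Return (mean, sample standard deviation) of log F0 over voiced frames.

    Frames with F0 <= 0 are treated as unvoiced and ignored.
    Requires at least two voiced frames.
    """
    logs = [math.log(f) for f in f0_hz if f > 0]
    return mean(logs), stdev(logs)


def map_f0(synth_f0, synth_stats, user_stats):
    """Shift and scale a synthetic F0 contour toward a user's log-F0 range.

    Hypothetical baseline (log-domain mean/variance mapping), not the
    project's own method. Unvoiced frames (F0 <= 0) are returned as 0.0.
    """
    mu_s, sd_s = synth_stats
    mu_u, sd_u = user_stats
    out = []
    for f in synth_f0:
        if f <= 0:
            out.append(0.0)
        else:
            # Standardize against the synthetic voice, then rescale to the user.
            z = (math.log(f) - mu_s) / sd_s
            out.append(math.exp(mu_u + z * sd_u))
    return out
```

With made-up contours such as `synth = [200.0, 220.0, 0.0, 180.0, 210.0]` and `user = [300.0, 320.0, 0.0, 280.0, 350.0]`, calling `map_f0(synth, log_f0_stats(synth), log_f0_stats(user))` yields a contour whose voiced frames match the user's log-F0 mean and spread while the unvoiced frame stays 0. A working system would also have to address the other cues named above (accent, rhythm, speaking rate) and the resynthesis step, which this sketch omits.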

The research will have broader impact on users of assistive aids and able-bodied users of communication technologies that use speech synthesis. This project strives to make communication accessible and socially fulfilling by designing an enabling technology in which the line between system and user is blurred. The ultimate goal is to afford users of speech synthesis technology the same ownership and individuality as the natural voice. The interdisciplinary nature of this work will promote teaching, training and learning in computer science and in speech and hearing sciences.

Funding Agency

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Application #: 0712821
Program Officer: William Bainbridge

Budget Start: 2007-07-15
Budget End: 2011-06-30
Fiscal Year: 2007
Total Cost: $449,996

Northeastern University, Boston, MA, United States