The vast variability of the human speech signal remains a central challenge for Text-to-Speech (TTS) systems. The objective of this research is to develop TTS technologies that eliminate concatenation errors and support accurate speech modification in coarticulation, degree of articulation, prosodic effects, and speaker characteristics. The investigators are exploring an asynchronous interpolation model (AIM), which promises high-quality, flexible TTS. The core idea of AIM is to represent a short region of speech as a composition of several types of features, called streams. Each stream is computed by asynchronous interpolation of basis vectors, where each basis vector is associated with a particular phoneme, allophone, or more specialized unit. The speech region is thus described by the varying degrees of influence of several types of preceding and following acoustic features. Building on AIM, the investigators are also developing methods to optimally compress the acoustic inventories of TTS systems under a size or quality constraint, and to adapt the system to a new voice from a few training samples. The system under investigation is a hybrid of traditional concatenative and formant-based synthesis that combines the advantages of both, yielding a high-quality, optimized TTS system with voice adaptation capabilities.

TTS has widely recognized societal benefits for universal access, education, and information access by voice. This research will make it possible, for example, to build personalized TTS systems for individuals with speech disorders who can only intermittently produce normal speech sounds.
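To make the asynchronous interpolation idea above concrete, the following is a minimal sketch in Python, not the investigators' implementation: the stream choices, the raised-cosine ramp, and all names (interpolation_weight, synthesize_stream, onset, offset) are illustrative assumptions. It shows two feature streams over one speech region, each blending the basis vector of the preceding unit into that of the following unit on its own timing, which is what makes the interpolation asynchronous.

```python
import numpy as np

def interpolation_weight(t, onset, offset):
    """Smooth 0-to-1 transition weight. Each stream has its own
    onset/offset, so streams transition asynchronously."""
    if t <= onset:
        return 0.0
    if t >= offset:
        return 1.0
    x = (t - onset) / (offset - onset)
    return 0.5 - 0.5 * np.cos(np.pi * x)  # raised-cosine ramp (an assumption)

def synthesize_stream(basis_prev, basis_next, times, onset, offset):
    """One feature stream over a speech region: a time-varying blend of
    the basis vector of the preceding unit and that of the following unit."""
    frames = []
    for t in times:
        w = interpolation_weight(t, onset, offset)
        frames.append((1.0 - w) * basis_prev + w * basis_next)
    return np.stack(frames)

# Hypothetical example: two streams (say, a spectral-envelope stream and a
# voicing stream) transitioning at different times within the same region.
times = np.linspace(0.0, 1.0, 50)                  # normalized time in region
spec_prev, spec_next = np.zeros(12), np.ones(12)   # stand-in basis vectors
voice_prev, voice_next = np.zeros(1), np.ones(1)

spectral = synthesize_stream(spec_prev, spec_next, times, onset=0.3, offset=0.7)
voicing = synthesize_stream(voice_prev, voice_next, times, onset=0.5, offset=0.6)
# The region is described by the stack of streams; because each stream's
# transition timing differs, preceding and following units exert varying
# degrees of influence on each feature type independently.
```

In this toy form, compressing the acoustic inventory would amount to reducing the set of stored basis vectors, and voice adaptation to re-estimating them from a few samples of the new speaker; the abstract states these as research goals without specifying the method.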