VocaliD: Optimized Speech Corpora for Personalized Speech Synthesis

Patel, Rupal

Abstract

The human voice is a complex signal that conveys multiple aspects of one's identity, including age, gender, ethnicity, size, and personality. Yet, to date, users of augmentative and alternative communication (AAC) devices, screen-reading technologies, and other text-to-speech applications have relied on a limited set of generic voices. VocaliD Inc. aims to create custom-crafted synthetic voices that reflect the end user by combining the recipient's residual vocal abilities with the speech database of an anatomically similar donor. The resulting voice sounds like the recipient in age, personality, and vocal identity but is as clear and understandable as the donor's. We have successfully integrated our research prototype into several AAC devices, and three beta users are currently using the technology. Family members and users attribute increased AAC device usage in educational and social settings, as well as improved self-esteem and quality of life, to this innovation.
Our process to date, however, has relied on an onerous procedure to collect a sufficiently comprehensive corpus of donor speech and an ad hoc process to elicit speaker-identifying cues from recipients. This Phase I SBIR project aims to make VocaliD's personalized voice technology a viable option for millions of users by standardizing and optimizing the donor and recipient voice-collection processes. These advances are critical to transforming this work from a lab prototype into a commercial venture.
Our innovation is grounded in the source-filter theory of speech production, which divides speech into two largely independent components: a source (the vocal folds) and a filter (the rest of the vocal tract). Empirical evidence suggests that, despite impaired filter modulation, individuals with speech impairment retain residual control over source characteristics. Since source and filter characteristics both contribute to speaker identity, our key challenge is to extract as much identity information as possible from a limited amount of recipient vocalizations and to combine it with the speech clarity information from donor voices, creating a transformed voice that is both authentic and understandable.
Thus, this Phase I project has two specific aims: (1) to determine the optimal number and composition of stimuli recorded by donors that will yield a sufficiently intelligible and natural-sounding concatenative synthesis voice, and (2) to determine a set of speaker identity cues that can be extracted from sparse vocalization samples produced by voice recipients. Our ultimate goal is to produce voices that are acoustically and perceptually identified as belonging to the recipients. In the United States alone, over 2.5 million AAC users need to be heard in their own voices; an additional 3 to 5 million individuals with visual impairment could benefit from a personalized screen reader, especially when composing written text; and several hundred million devices and applications in the Internet of Things enable us to access information, communicate, and interact via speech. VocaliD has the potential to give the gift of voice to all those who need and want it, enhancing how they learn, work, and play.
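As a rough illustration of the source-filter separation described above, the Python sketch below builds a toy vowel from an impulse-train source (standing in for the vocal folds) passed through a cascade of second-order formant resonators (standing in for the vocal tract), then recovers the pitch with a simple autocorrelation estimator. It is a sketch under stated assumptions, not VocaliD's synthesis pipeline: the function names are hypothetical, the resonator uses the common Klatt-style difference equation, and the formant frequencies and bandwidths are illustrative values rather than measurements. It shows that a source property (pitch) survives filtering by two different "vocal tracts", which is the kind of independence the approach relies on when pairing recipient source cues with a donor's filter.

```python
import math

# Toy source-filter model (illustrative assumptions, not VocaliD's method):
# source = unit impulse train at f0, filter = cascade of formant resonators.


def impulse_train(f0_hz, fs_hz, n_samples):
    """Source: one unit impulse per glottal period (a crude vocal-fold model)."""
    period = fs_hz / f0_hz
    out = [0.0] * n_samples
    t = 0.0
    while t < n_samples:
        out[int(t)] = 1.0
        t += period
    return out


def resonator(signal, freq_hz, bw_hz, fs_hz):
    """Filter stage: Klatt-style 2nd-order resonator for one formant (unity gain at DC)."""
    T = 1.0 / fs_hz
    c = -math.exp(-2.0 * math.pi * bw_hz * T)
    b = 2.0 * math.exp(-math.pi * bw_hz * T) * math.cos(2.0 * math.pi * freq_hz * T)
    a = 1.0 - b - c
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y1, y2 = y, y1
    return out


def synthesize(f0_hz, formants, fs_hz=16000, dur_s=0.25):
    """Pass the source through each formant resonator in turn (cascade)."""
    n = int(fs_hz * dur_s)
    x = impulse_train(f0_hz, fs_hz, n)
    for freq_hz, bw_hz in formants:
        x = resonator(x, freq_hz, bw_hz, fs_hz)
    return x


def estimate_f0(signal, fs_hz=16000, fmin_hz=60.0, fmax_hz=400.0):
    """Source cue: pick the lag with the largest (unnormalized) autocorrelation."""
    n = len(signal)
    best_lag, best_r = None, float("-inf")
    for lag in range(int(fs_hz / fmax_hz), int(fs_hz / fmin_hz) + 1):
        r = sum(signal[i] * signal[i + lag] for i in range(n - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return fs_hz / best_lag


# Two hypothetical formant sets as (frequency, bandwidth) pairs in Hz.
VOCAL_TRACT_A = [(700, 80), (1220, 90), (2600, 120)]
VOCAL_TRACT_B = [(300, 80), (2300, 100), (3000, 120)]

if __name__ == "__main__":
    for tract in (VOCAL_TRACT_A, VOCAL_TRACT_B):
        # Same 125 Hz source through different filters; the estimate should stay near 125 Hz.
        print(estimate_f0(synthesize(125, tract)))
```

Swapping the formant set changes the vowel-like quality of the output but not the estimated pitch. A practical system would likely need a richer glottal model (pulse shape, jitter, breathiness) and more identity cues than pitch alone; the impulse train here is a deliberate simplification.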

Public Health Relevance

VocaliD Inc. aims to create custom-crafted synthetic voices that reflect the end user by combining the recipient's residual vocal abilities with the speech database of an anatomically similar donor. The resulting voice sounds like the recipient in age, personality, and vocal identity but is as clear and understandable as the donor's. This Phase I SBIR project aims to make VocaliD's personalized voice technology a viable option for millions of users by standardizing and optimizing the donor and recipient voice-collection processes.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Institute on Deafness and Other Communication Disorders (NIDCD)
Type: Small Business Innovation Research Grants (SBIR) - Phase I (R43)
Project #: 1R43DC014607-01
Application #: 8904180
Study Section: Special Emphasis Panel (ZRG1)
Program Officer: Shekim, Lana O

Project Start: 2015-06-01
Project End: 2016-05-31
Budget Start: 2015-06-01
Budget End: 2016-05-31
Support Year: 1
Fiscal Year: 2015

Institution

Vocalid, Inc., Belmont, MA, United States
