Speech is a unique human capability. The vocal tract is the universal human instrument, played with great dexterity and skill in the production of speech to convey rich linguistic and paralinguistic information. The project will advance fundamental understanding of how individuals differ in their speech articulation due to differences in the shape and size of their physical vocal instrument. Knowledge of how people differ in their speech production can help improve automatic speaker recognition, a technology important for national security. The project can also inform the design of technologies for robust speech-based access for all members of the population, including children, the elderly, and non-native speakers of a language. Results from the project can further assist in better understanding and treating disorders (e.g., cleft lip/palate), illnesses (e.g., head and neck cancer, apnea), and injuries that affect human speech articulation. The novel imaging data from 200 individuals, along with the associated tools, annotations, and interpretations created by the interdisciplinary team, will be shared broadly with the scientific community. The project will also provide a unique research training opportunity for students in integrated speech science and technology.

The overarching goal of this project is to advance scientific understanding of how vocal tract morphology and speech articulation interact to explain the variant and invariant aspects of speech signal properties across talkers. Of particular scientific interest are the articulatory strategies that individuals adopt to achieve phonetic equivalence despite structural differences in their vocal tracts. Equally of interest is which aspects of vocal tract morphology are reflected in the acoustic speech signal, how they are reflected, and whether those differences can be estimated from speech acoustics. A crucial part of this goal is to create forward and inverse computational models that relate vocal tract details to speech acoustics, in order to shed light on individual speaker differences and inform the design of robust speaker recognition technologies. This project goes beyond state-of-the-art methods by directly investigating the dynamic human vocal tract with novel imaging techniques and computational modeling, illuminating both inter-speaker variability in vocal tract structure and the strategies by which linguistic articulation is implemented. Using novel magnetic resonance imaging (MRI) techniques that we helped develop, which capture the entire moving vocal tract with superior spatial resolution (dynamic real-time 2D imaging with excellent temporal resolution, and accelerated volumetric 3D imaging), the project will gather and quantify spatio-temporal details of speech production from 160 native speakers of American English, covering the major dialect regions of North America, and from 40 non-native speakers. The experimental, theoretical, and methodological approaches investigating the interplay between structure (shape and size) and function (the dynamics of vocal tract shaping and its acoustic consequences) can lead to new theoretical advances, with improved phonetic characterizations of linguistic units that generalize across speakers.
These approaches also make it possible to explain individual-specific speech patterns, which can improve both the scientific understanding of speech production and robust automatic speaker recognition technology. By adopting novel speaker-dependent features, such technology could determine not only that two talkers are different but also how and why they differ, by analyzing biologically inspired details of structure and articulation.

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Application #: 1514544
Program Officer: Tatiana Korelsky
Budget Start: 2015-09-01
Budget End: 2020-08-31
Fiscal Year: 2015
Total Cost: $1,199,532
Organization Name: University of Southern California
City: Los Angeles
State: CA
Country: United States
Zip Code: 90089