Despite a sizeable literature on emotional speech and speech under stress, little is understood about how features in continuous speech vary with subtle, real-world-relevant changes in physiological state within any particular speaker. This EArly-concept Grant for Exploratory Research (EAGER) relates speech features to direct measures of physiological activation, rather than to categorical hand-annotated labels of emotion or state. The study collects and analyzes a corpus of speech and autonomic nervous system (ANS) sensor data to discover what changes occur in speech features when a person is exposed to different activation-relevant emotional, cognitive, and stress-related conditions. The broader significance and impact lie in the discovery of cues in speech that can be used to estimate changes in a speaker's physiological activation level when no sensors are available. Applications include health care (monitoring physical, mental, and cognitive states), education and learning (monitoring engagement), social interaction (monitoring activation level), and law enforcement/intelligence (monitoring behavioral changes of high-interest individuals).

In Phase 1 (Corpus Collection), the project creates a 40-subject corpus of time-aligned speech and physiological signals. Activation is measured using state-of-the-art methods to extract cardiovascular (ECG), blood pressure, respiration rate, and skin conductance signals. Each subject participates in five conditions: (1) neutral baseline; (2) emotional (description of emotionally salient pictures); (3) stressed (speaking task incentivized for accuracy and completion time); (4) cognitive load (speaking task with a visual distractor, incentivized for task completion and distractor task accuracy); and (5) computer-directed speech (task requiring perfect recognition from a speech recognizer). In Phase 2 (Analysis), sensor output is post-processed to calibrate the signals and identify changes in them. These changes are then compared to a range of automatically extracted features (based on acoustics, prosody, discourse patterns, and disfluency patterns) from the time-aligned speech. Analyses and machine learning experiments then examine which speech feature changes correlate with changes in sensor output, both within and across speakers. Results shed light on how information from natural continuous speech can be used to estimate changes in a speaker's physiological activation level in ongoing, subtle, and everyday contexts.
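
The abstract does not specify the statistics used in Phase 2. As one possible illustration only, the sketch below measures each speaker's change from their own neutral baseline for a speech feature and a sensor signal, then correlates those changes within each speaker and pooled across speakers. The field names (`pitch_hz`, `heart_rate_bpm`), the condition labels, the choice of Pearson correlation, and all numeric values are assumptions made for this example; they are not data or methods from the project.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (nan if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / sqrt(var_x * var_y)


def deltas_from_baseline(rows, feature, baseline="neutral"):
    """Per speaker, list each non-baseline value minus that speaker's baseline value."""
    base = {r["speaker"]: r[feature] for r in rows if r["condition"] == baseline}
    out = defaultdict(list)
    for r in rows:
        if r["condition"] != baseline:
            out[r["speaker"]].append(r[feature] - base[r["speaker"]])
    return dict(out)


def within_speaker_r(rows, speech_feature, sensor_feature):
    """Correlate speech and sensor changes separately for each speaker."""
    speech = deltas_from_baseline(rows, speech_feature)
    sensor = deltas_from_baseline(rows, sensor_feature)
    return {spk: pearson(speech[spk], sensor[spk]) for spk in speech}


def pooled_r(rows, speech_feature, sensor_feature):
    """Correlate speech and sensor changes pooled across all speakers."""
    speech = deltas_from_baseline(rows, speech_feature)
    sensor = deltas_from_baseline(rows, sensor_feature)
    speakers = sorted(speech)
    xs = [v for spk in speakers for v in speech[spk]]
    ys = [v for spk in speakers for v in sensor[spk]]
    return pearson(xs, ys)


# Synthetic, hypothetical values for two speakers (NOT study data).
example_rows = [
    {"speaker": "A", "condition": "neutral", "pitch_hz": 120.0, "heart_rate_bpm": 60.0},
    {"speaker": "A", "condition": "emotional", "pitch_hz": 126.0, "heart_rate_bpm": 63.0},
    {"speaker": "A", "condition": "stressed", "pitch_hz": 132.0, "heart_rate_bpm": 66.0},
    {"speaker": "A", "condition": "cognitive", "pitch_hz": 124.0, "heart_rate_bpm": 62.0},
    {"speaker": "B", "condition": "neutral", "pitch_hz": 200.0, "heart_rate_bpm": 70.0},
    {"speaker": "B", "condition": "emotional", "pitch_hz": 204.0, "heart_rate_bpm": 72.0},
    {"speaker": "B", "condition": "stressed", "pitch_hz": 210.0, "heart_rate_bpm": 75.0},
    {"speaker": "B", "condition": "cognitive", "pitch_hz": 202.0, "heart_rate_bpm": 71.0},
]

if __name__ == "__main__":
    print(within_speaker_r(example_rows, "pitch_hz", "heart_rate_bpm"))
    print(pooled_r(example_rows, "pitch_hz", "heart_rate_bpm"))
```

Taking differences from each speaker's own baseline removes between-speaker offsets (for example, habitual pitch or resting heart rate) before pooling, which is one simple way to separate within-speaker effects from across-speaker effects.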

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Type: Standard Grant (Standard)
Application #: 1449202
Program Officer: Tatiana Korelsky
Budget Start: 2014-08-01
Budget End: 2015-07-31
Fiscal Year: 2014
Total Cost: $49,990
Name: SRI International
City: Menlo Park
State: CA
Country: United States
Zip Code: 94025