Spoken language is much more than simply audible text. The way we produce words and phrases tells our hearers much about our mental state and the intentions that underlie our words and actions. While machines have become increasingly proficient at recognizing sentences in current spoken dialog systems, they are still very poor at detecting, inter alia, whether speakers are frustrated or confident, or whether they are trying to deceive or to convey helpful information to their hearers. Despite promising work in identifying verbal and non-verbal cues to emotion and intention in acted, laboratory speech, and early results identifying limited types of emotion in more natural settings, we still have only a limited understanding of which verbal cues (acoustic, prosodic, lexical, and syntactic) reliably signal these phenomena, and consequently we do not know how to recognize them automatically.

The PIs will first conduct a series of laboratory experiments to identify acoustic, prosodic, lexical, and syntactic cues to emotion and intention in elicited (non-acted) speech. In parallel, they will discover new features that may be useful in automatically identifying emotions and intentions such as deceptiveness, confidence, and frustration/anger, both by augmenting the labeling of available corpora and by labeling new corpora for these speaker states and intentions. They will then test these features on the laboratory recordings and identify any new features suggested by analysis of those recordings. The result should be a better understanding of which auditory cues characterize particular speaker states and intentions, and which of these provide reliable features for their automatic identification (a minimal sketch of such automatic identification follows).
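As a purely illustrative sketch of the kind of automatic identification described above (the proposal names no specific tools, features, or corpora), one might extract a few acoustic-prosodic cues from labeled utterances and cross-validate a simple classifier; librosa and scikit-learn here are assumed stand-ins for whatever feature extraction and learning components would actually be used:

```python
# Illustrative only: hypothetical labeled corpus of utterances (WAV files plus
# speaker-state labels such as "frustrated" vs. "neutral").
import numpy as np
import librosa
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score


def acoustic_prosodic_features(wav_path):
    """Extract a small vector of acoustic-prosodic cues from one utterance."""
    y, sr = librosa.load(wav_path, sr=None)
    f0, voiced_flag, _ = librosa.pyin(y, fmin=75, fmax=400, sr=sr)
    f0 = f0[~np.isnan(f0)]                 # keep voiced frames only
    rms = librosa.feature.rms(y=y)[0]      # frame-level energy
    return np.array([
        f0.mean() if f0.size else 0.0,     # mean pitch
        f0.std() if f0.size else 0.0,      # pitch variability
        rms.mean(),                        # mean energy
        rms.std(),                         # energy variability
        len(y) / sr,                       # utterance duration (seconds)
    ])


def evaluate(wav_paths, labels):
    """Cross-validate a simple speaker-state classifier on these cues."""
    X = np.vstack([acoustic_prosodic_features(p) for p in wav_paths])
    clf = RandomForestClassifier(n_estimators=200, random_state=0)
    return cross_val_score(clf, X, labels, cv=5).mean()
```

Such a baseline would only indicate whether the chosen cues carry any signal for a given speaker state; the research itself concerns which cues are reliable, not a particular classifier.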

From a practical point of view, automatically identifying speaker states and intentions should be of considerable benefit to interactive voice response systems, such as call center, over-the-phone banking, travel reservation, and tutorial systems. It should also provide useful information for speaker screening in a variety of applications that currently depend upon human assessment of speaker state and intention.

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Application #: 0325399
Program Officer: Ephraim P. Glinert
Budget Start: 2003-09-15
Budget End: 2010-08-31
Fiscal Year: 2003
Total Cost: $2,172,551
Name: Columbia University
City: New York
State: NY
Country: United States
Zip Code: 10027