The objective of this project is to develop a modeling framework that enables extensive use of prosodic information, such as pitch, duration, and energy characteristics, in a large class of applications that call for spoken language understanding. To this end, prosodic features are extracted from the speech signal over regions defined by automatically detectable events. The result is a variable-length sequence of vectors that are usually high-dimensional, mix discrete and continuous distributions, and contain undefined values. The project focuses on finding a transformation that maps this sequence of prosodic features to a single vector that adequately represents its important characteristics. The proposed transform projects the distribution of the features in a given sample onto a set of probability distributions, represented by dynamic Bayesian networks, in a predetermined dictionary.
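
As a rough illustration of how such a projection can turn a variable-length sequence into a single fixed-length vector, the sketch below scores a sequence against every model in a small dictionary and keeps one number per model. It is only a sketch under stated assumptions: diagonal Gaussians stand in for the dynamic Bayesian networks, the average per-frame log-likelihood stands in for the projection, and undefined values are encoded as NaN and skipped. None of these choices, nor the names `gaussian_loglik` and `project_onto_dictionary`, come from the project description.

```python
import numpy as np


def gaussian_loglik(frames, mean, var):
    """Per-frame log-likelihood under a diagonal Gaussian.

    Assumption: undefined feature values are encoded as NaN and are
    marginalized out, i.e. they contribute nothing to the frame's score.
    """
    defined = ~np.isnan(frames)
    diff = np.where(defined, frames - mean, 0.0)
    per_dim = -0.5 * (np.log(2.0 * np.pi * var) + diff ** 2 / var)
    return np.where(defined, per_dim, 0.0).sum(axis=1)


def project_onto_dictionary(frames, dictionary):
    """Map a (T, D) feature sequence to a vector with one entry per model.

    Each entry is the average per-frame log-likelihood under one dictionary
    model, so the output length depends on the dictionary, not on T.
    """
    return np.array(
        [gaussian_loglik(frames, mean, var).mean() for mean, var in dictionary]
    )


# Hypothetical two-model dictionary over 2-dimensional features.
dictionary = [
    (np.zeros(2), np.ones(2)),
    (np.full(2, 5.0), np.ones(2)),
]

# Two sequences of different lengths; the first has an undefined value.
short_seq = np.array([[0.1, np.nan], [-0.2, 0.3]])
long_seq = 5.0 + 0.1 * np.random.default_rng(0).standard_normal((7, 2))

short_vec = project_onto_dictionary(short_seq, dictionary)
long_vec = project_onto_dictionary(long_seq, dictionary)
```

Both `short_vec` and `long_vec` have the same length as the dictionary even though the input sequences differ in length and one contains an undefined value, which is the property that lets a standard fixed-dimension classifier consume them.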

The ultimate goal of the project is the creation of a general probabilistic model-based transform paradigm that can act robustly on complex feature sets. This work will therefore also contribute to other domains where features exhibit characteristics that are challenging for standard approaches. The tools and corpora developed during the project will be made available to the community. The results from this project will contribute to scientific knowledge on the use of prosodic information and increase the capabilities of spoken language understanding and dialog systems.

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Application #: 0710833
Program Officer: Tatiana D. Korelsky
Project Start:
Project End:
Budget Start: 2007-09-01
Budget End: 2009-08-31
Support Year:
Fiscal Year: 2007
Total Cost: $199,107
Indirect Cost:
Name: Stanford University
Department:
Type:
DUNS #:
City: Palo Alto
State: CA
Country: United States
Zip Code: 94304