SGER: Creation of a Goal-Oriented, Human-Machine Spoken Dialog Corpus

Hakkani-Tur, Dilek

Abstract

Annotated data sets are a necessity for data-driven speech and language processing approaches. Many speech and natural language processing tasks, such as automatic speech recognition, question answering, machine translation, part-of-speech tagging, parsing, named entity extraction, and semantic role labeling, have benefited significantly from shared tasks that benchmark algorithms and compare results on common data sets. The goal of this project is to build a goal-oriented, mixed-initiative spoken dialog system for conference services that users interact with in natural spoken language, and to make the dialogs collected from this system publicly available for research purposes. Users can call a phone number to learn about conference paper submission, the program, the venue, visa requirements, accommodation options and costs, and so on.

We take an iterative approach. The spoken dialog system (SDS) is first deployed for the IEEE Spoken Language Technology (SLT) Workshop, to be held in December 2006, and all of its components can then be improved using the data collected from this deployment. The improved system can in turn be used to collect further data for other conferences and workshops.

Given that data-driven approaches are becoming increasingly popular for many speech and language processing applications, we believe that such a corpus, annotated with system prompts, user utterance transcriptions, user intentions, overall task success, and similar information, would be a useful resource for dialog management, spoken language understanding, automatic speech recognition, and related tasks. In the future, these annotations could be extended with user emotion tags, disfluencies, syntactic and semantic parses, and so on.

Funding Agency

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Type: Standard Grant (Standard)
Application #: 0624389
Program Officer: Tatiana D. Korelsky

Budget Start: 2006-04-01
Budget End: 2007-09-30
Fiscal Year: 2006
Total Cost: $75,088

Institution

International Computer Science Institute, Berkeley, CA, United States