A set of small-scale human-computer dialogue corpora ("micro-corpora") is being transcribed, annotated, and distributed to researchers in the area of dialogue. The raw data for these corpora already exists and was generated using two existing systems (DiSCoH from AT&T Labs and ConQuest from Carnegie Mellon University). The goal of the distribution is to solicit feedback on the corpus composition and annotation that can best support research in human-machine dialogue interaction.

Feedback from this exercise is being collated and disseminated back to the community. Discussion of the outcome of this exercise at a workshop collocated with the 2007 HLT/NAACL conference provides the opportunity to develop a set of guidelines for the large-scale collection of such data. The workshop collocation makes attendance convenient for many researchers in the dialogue community. In addition, support from NSF allows the workshop to ensure broad participation by researchers from both North American and international centers. Discussions and documents generated in the feedback and solicitation process provide a basis for the preparation of a community-supported proposal to the NSF CRI:CRD program. The availability of systematically collected and annotated data supports progress in human-computer dialogue research, which in turn enables the development of more sophisticated and broadly accessible technologies for information access.

Agency
National Science Foundation (NSF)
Institute
Division of Computer and Network Systems (CNS)
Type
Standard Grant (Standard)
Application #
0709161
Program Officer
Tatiana D. Korelsky
Project Start
Project End
Budget Start
2007-04-01
Budget End
2008-12-31
Support Year
Fiscal Year
2007
Total Cost
$50,000
Indirect Cost
Name
Carnegie-Mellon University
Department
Type
DUNS #
City
Pittsburgh
State
PA
Country
United States
Zip Code
15213