Many deaf and hard of hearing students use real-time captioning to participate in education. Real-time captions are generally provided by skilled professional captionists (stenographers), who use specialized keyboards or software to keep up with natural speaking rates of up to 225 words per minute. But professional captionists are expensive and must be arranged in advance in blocks of at least an hour. Automatic speech recognition (ASR) is improving, but still suffers high error rates in real classrooms.

In this collaborative effort between the University of Rochester and the Rochester Institute of Technology, the PIs will address these issues by blending human- and machine-powered captioning to produce captions on demand, in real time, and at low cost. The PIs' approach is for multiple non-experts and ASR to collectively caption speech in under 5 seconds, aided by interfaces that encourage quick, incomplete captioning of live audio. Because non-experts cannot keep up with natural speaking rates, new algorithms will merge their incomplete captions in real time, as sketched below. (While the sequence alignment problem can be solved exactly with dynamic programming, existing approaches are too slow, are not robust to input error, and do not incorporate natural language semantics.) Systematically varying the saliency of the audio presented to each captionist will encourage complete coverage of the speech. Non-expert captions will also train ASR engines in real time, so that recognition can improve over the course of a lecture. (Traditional approaches to ASR training assume that training occurs offline.)

The quikCaption mobile application will embody these ideas and will be iteratively designed with deaf and hard of hearing students at the National Technical Institute for the Deaf (NTID) through design sessions, lab studies, and in-class deployments. Non-expert captionists can be drawn from a broad range of sources: volunteers willing to donate their time, classmates with relevant domain knowledge, or always-available paid workers. They may be local (in the classroom) or remote, and they may have experience from prior quikCaption sessions or may be novice crowd workers recruited on demand from existing marketplaces (e.g., Mechanical Turk). A flexible worker pool will allow real-time captions to be available on demand, at low cost, and for only as long as needed.
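To make the merging step concrete, the sketch below shows a highly simplified version of the dynamic-programming sequence alignment alluded to above: it merges just two partial captions of one sentence by aligning the words they share (longest-common-subsequence style) and interleaving the words each captionist caught alone. This is an illustration, not the project's algorithm; the proposed system must additionally handle many captionists, streaming input, timing, typos, and semantics, which is precisely why new algorithms are needed. The function names (`align`, `merge_partial_captions`) are hypothetical.

```python
# Illustrative sketch only: merge two incomplete captions of the same
# utterance with a classic dynamic-programming (LCS-style) alignment.

def align(a, b):
    """Return matched (i, j) index pairs between token lists a and b."""
    n, m = len(a), len(b)
    # dp[i][j] = length of the LCS of a[:i] and b[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    # Backtrack to recover the matched word positions.
    pairs, i, j = [], n, m
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return list(reversed(pairs))


def merge_partial_captions(cap_a, cap_b):
    """Merge two incomplete captions into one transcript (toy heuristic)."""
    a, b = cap_a.lower().split(), cap_b.lower().split()
    merged, ia, ib = [], 0, 0
    for i, j in align(a, b):
        merged.extend(a[ia:i])   # words only captionist A typed
        merged.extend(b[ib:j])   # words only captionist B typed
        merged.append(a[i])      # word both agree on (alignment anchor)
        ia, ib = i + 1, j + 1
    merged.extend(a[ia:])        # trailing words unique to A
    merged.extend(b[ib:])        # trailing words unique to B
    return " ".join(merged)


if __name__ == "__main__":
    # Each non-expert captures only part of the sentence.
    partial_a = "the mitochondria is the of the cell"
    partial_b = "mitochondria is powerhouse of the"
    print(merge_partial_captions(partial_a, partial_b))
    # -> "the mitochondria is the powerhouse of the cell"
```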
Broader Impacts: This research will dramatically improve education for deaf and hard of hearing students by enabling access to serendipitous opportunities, such as conversations after class or last-minute guest lectures for which no interpreter or captionist was arranged. Real-time captioning will also be useful in other settings such as school programs, artistic performances, and political events. Older hard of hearing adults typically prefer captioning and represent a sizable and growing population; hearing people may also benefit, because captioning is a first step toward automatic translation of aural speech. The algorithms developed in this project for real-time merging of incomplete natural language will likely be adaptable to other applications, such as collaborative translation or communication over noisy channels.