It is estimated that American Sign Language (ASL) is used by up to 2 million people in the United States. Yet many resources that users of spoken languages take for granted are not available to users of ASL, given its visual nature and its lack of a standard written form. For instance, when an ASL user encounters an unknown sign, looking it up in a dictionary is not an option. With existing ASL dictionaries one can easily find what sign corresponds to an English word, but not what English word (or, more generally, what meaning) corresponds to a given sign. Another example is searching for computer files or web pages using keywords, now a routine activity for computer users; at present, no equivalent of keyword search exists for ASL. ASL is not a written language, and the closest equivalent of a text document is a video sequence of ASL narration or communication. No tools are currently available for finding video segments in which specific signs occur, and the lack of such tools severely restricts content-based access to video libraries of ASL literature, lore, poems, performances, or courses.

The core goal of this research is to push toward making such resources available by advancing the state of the art in vision-based gesture recognition and retrieval. This poses challenging research problems in computer vision, machine learning, and database indexing. The effort will focus on the following: (1) developing methods for learning models of sign classes, given only a few training examples per sign, by using a decomposition of signs into phonological elements; (2) designing scalable indexing methods for video lexicons of gestural languages that achieve sign recognition at interactive speeds in the presence of thousands of classes; (3) creating indexing methods for spotting signs appearing in context in an ASL video database; (4) incorporating linguistic constraints to improve the performance of both lower-level vision modules, such as hand pose estimation and upper-body tracking, and higher-level learning and indexing modules; and (5) explicitly designing methods that can work with error-prone vision modules whose outputs are often inaccurate or ambiguous.

The PIs will create two demonstration systems: an ASL lexicon containing a comprehensive database of ASL signs, and a "Sign Language Google" that can search for specific signs in large databases of ASL video content. The systems will be trained and evaluated using thousands of video sequences of signs performed in isolation and in context by native ASL signers. This video data will also be valuable for studying co-articulation effects and context-dependent sign variations. The signs collected will include the full list of ASL signs appearing in the first three years of standard college ASL curricula.
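To make the lexicon-lookup idea concrete, the toy sketch below shows one possible way a query sign, described by phonological elements produced by a vision front end, could be matched against entries in a sign lexicon. This is an illustrative sketch only, not the project's actual method: the feature inventory (handshape, location, movement), the example entries, and all function names are assumptions introduced here for illustration.

```python
# Illustrative sketch (not the project's method): rank lexicon entries by how many
# phonological elements agree with a query sign. All feature labels and entries
# below are invented for illustration.
from dataclasses import dataclass

@dataclass
class SignEntry:
    gloss: str          # English gloss of the sign
    handshape: str      # e.g. "open-5", "flat-B"
    location: str       # e.g. "chin", "chest", "neutral-space"
    movement: str       # e.g. "contact", "circular", "downward"

# A tiny stand-in for a video lexicon; real entries would be learned from
# annotated video of native signers.
LEXICON = [
    SignEntry("MOTHER", "open-5", "chin", "contact"),
    SignEntry("FATHER", "open-5", "forehead", "contact"),
    SignEntry("PLEASE", "flat-B", "chest", "circular"),
]

def rank_candidates(query: SignEntry, lexicon=LEXICON, top_k=3):
    """Rank lexicon entries by agreement of phonological elements with the query.

    A real query would come from error-prone vision modules (hand pose estimation,
    upper-body tracking), so ties and partial matches are expected; here we simply
    count matching elements.
    """
    def score(entry):
        return sum(getattr(entry, f) == getattr(query, f)
                   for f in ("handshape", "location", "movement"))
    return sorted(lexicon, key=score, reverse=True)[:top_k]

if __name__ == "__main__":
    # A hypothetical unknown sign observed in video.
    unknown = SignEntry("?", "open-5", "chin", "contact")
    for entry in rank_candidates(unknown):
        print(entry.gloss)
```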
Broader Impacts: The methods developed in this project will enable sign-based search of ASL literature, lore, poems, performances, and courses from digital video libraries and DVDs, a capability that will have far-reaching implications for improving education, opportunities, and access for the deaf. These algorithms also aim to enable video-based queries of ASL lexicons, and eventually full-fledged dictionaries with metalinguistic information about signs and examples of usage. By enabling those learning ASL to "look up" a sign they do not know, this technology promises to transform the way students of ASL (both deaf and hearing), parents of deaf children, sign language interpreters, and linguists learn about signs they encounter. The algorithms developed in this effort may well lead to more robust ASL recognition systems that can handle natural signing with a large lexicon of signs, and the technology will also advance the state of the art in gesture recognition and synthesis systems. The large, linguistically annotated corpus of native ASL produced as part of this effort will itself be an important resource.