This project is aimed at advancing the state of the art in computer-based American Sign Language (ASL) recognition. To date, sign language recognition has focused primarily on detecting individual signs (words), which are articulated mainly with the arms and hands. This is a major limitation, given that critical linguistic information, including grammatical features such as negation, agreement, and question status, is conveyed through "non-manual" linguistic markings. These non-manual markings include facial expressions (such as raised or lowered eyebrows, varying gaze and eye aperture, wrinkling of the nose, and mouth movements) and gestures or periodic movements of the head (such as tilts, nods, and shakes). No system for sign language recognition or generation can succeed without properly modeling the linguistic information produced both manually and non-manually.
The fact that these critical non-manual behaviors occur in parallel with manual signing, and that they are temporally aligned with phrases rather than with individual signs, greatly complicates the task. Further problems arise from the difficulty of tracking the minute details of human facial movements from video, and from variation in the specific realizations (style) of manual signs and non-manual linguistic markings across individuals, just as individuals vary in how they produce a given spoken language. A comprehensive approach to ASL recognition thus requires the integration of information from multiple data sources with different spatial and temporal scales, the application of linguistic knowledge about both the manual and the non-manual aspects of ASL, and the modeling of interdependencies between activities in the manual and non-manual channels.
This collaborative project brings together the expertise of researchers in computer vision, linguistics, and recognition to achieve its goals. On the computer vision side, the principal investigators (PIs) will investigate the use of local free-form deformations and novel registration methods to enhance their existing face tracking software, so as to capture the minute details of facial movements and to improve the robustness of tracking. The tracking process yields a large number of facial parameters, which the researchers propose to reduce through nonlinear subspace manifold embedding. This embedding reduces the dimensionality of the parameter space and, more importantly, also separates style from content. Whereas style is specific to each signer, content captures the commonalities across all signers. Hence, by focusing on the content component, the PIs expect to overcome the variations across signers and perform signer-independent recognition.
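To make the embedding step concrete, the minimal Python sketch below illustrates one way such a pipeline could look: a nonlinear embedding (here scikit-learn's LocallyLinearEmbedding, standing in for the proposed subspace manifold embedding) reduces per-frame tracking parameters, and a simple SVD-based factorization then separates signer-specific structure ("style") from what is shared across signers ("content"). All names, shapes, and hyperparameters are illustrative assumptions, not the project's actual method.

    import numpy as np
    from sklearn.manifold import LocallyLinearEmbedding

    # Hypothetical input: one row of tracked facial parameters per video frame,
    # grouped by signer.  Shapes and signer names are placeholders.
    rng = np.random.default_rng(0)
    n_signers, n_frames, n_params = 4, 200, 60
    frames = {f"signer_{i}": rng.normal(size=(n_frames, n_params))
              for i in range(n_signers)}

    # Step 1: nonlinear dimensionality reduction of the raw tracking parameters.
    # LocallyLinearEmbedding stands in for the proposed manifold embedding.
    all_frames = np.vstack(list(frames.values()))
    embedding = LocallyLinearEmbedding(n_components=8, n_neighbors=12)
    low_dim = embedding.fit_transform(all_frames)        # (n_signers*n_frames, 8)

    # Step 2: a crude style/content split in the spirit of bilinear models.
    # Each signer's mean low-dimensional vector goes into one row of a matrix;
    # the SVD's right factor captures structure shared across signers (content),
    # while the left factor is signer-specific (style).
    per_signer = low_dim.reshape(n_signers, n_frames, -1).mean(axis=1)
    style, singular_values, content = np.linalg.svd(per_signer, full_matrices=False)
    print(content.shape)   # shared basis used for signer-independent recognition

In a real system the factorization would operate on whole trajectories rather than per-signer means; the sketch only shows where style and content would be pulled apart.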
On the recognition side, the researchers will combine linguistic knowledge about facial microactions with computational clustering approaches to develop the necessary statistical models for recognition. Initially, these will be based on Hidden Markov Models (HMMs), building on previous work by this research group and elsewhere; however, the power of HMMs to describe the dynamical aspects of human movements is limited. To overcome these limitations, the PIs will research the use of Switching Linear Dynamic Systems, augmented by Coupled Dynamic Bayesian Networks, to model and capture the interactions of the simultaneously occurring microactions. Linguists and computer scientists will collaborate in exploring the best ways to leverage information about the linguistic organization of ASL to improve recognition strategies.
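As a rough sketch of the HMM baseline only (the switching linear dynamic systems and coupled Bayesian networks are not shown), the Python fragment below trains one Gaussian HMM per non-manual marker class and classifies a new feature sequence by maximum log-likelihood. The class labels, feature dimensions, and hyperparameters are assumed for illustration and do not reflect the project's actual models.

    import numpy as np
    from hmmlearn.hmm import GaussianHMM

    rng = np.random.default_rng(1)

    def make_sequences(n_seq=20, length=40, dim=8):
        # Stand-in for annotated low-dimensional facial feature sequences
        # drawn from the corpus; real data would replace this.
        return [rng.normal(size=(length, dim)) for _ in range(n_seq)]

    training_data = {"negation": make_sequences(), "wh_question": make_sequences()}

    models = {}
    for label, seqs in training_data.items():
        X = np.vstack(seqs)                      # concatenated observations
        lengths = [len(s) for s in seqs]         # per-sequence lengths for hmmlearn
        hmm = GaussianHMM(n_components=3, covariance_type="diag", n_iter=50)
        hmm.fit(X, lengths)
        models[label] = hmm

    # Classify an unseen sequence by the class whose HMM assigns it the
    # highest log-likelihood.
    test_seq = rng.normal(size=(40, 8))
    predicted = max(models, key=lambda label: models[label].score(test_seq))
    print(predicted)

The per-class likelihood comparison is the standard isolated-recognition setup; the proposed switching and coupled models would replace the single-chain HMMs to capture interacting, simultaneously occurring microactions.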
This research will be performed on the existing linguistically annotated corpus of the National Center for Sign Language and Gesture Resources, as well as new data to be collected from 5-8 native ASL signers, which will also be annotated over the course of the project. The annotations will be used for the linguistic modeling; they also provide the "ground truth" for performing and validating the computer vision and recognition research.
Broader impact: The computer-based techniques developed for ASL can be extended to more general systems for sign language recognition and generation, as well as to the interpretation of other types of human movements, such as facial gesture recognition for human-computer interaction (HCI), surveillance, identity verification, interrogation, interviews, and medical diagnosis. The materials to be distributed will benefit researchers in linguistics, computer science, and other domains. There are immediate applications for primary and secondary education of the deaf and for the training of sign language interpreters. Improvements in multimedia (linguistic) information technology promise to offer expanded employment possibilities for the deaf, as well as improved access to vocational and post-secondary education. Finally, the project itself will educate, raise awareness among, and encourage deaf students by enabling them to work on cutting-edge research that directly affects them and their community.