American Sign Language (ASL) is a primary means of communication for 500,000 people in the United States and is a distinct language from English, conveyed through hand gestures, facial expressions, and body movements. Studies indicate that deaf children of deaf parents read better than deaf children of hearing parents, largely because of better parent-child communication when both are deaf. However, more than 80% of children who are deaf or hard of hearing are born to hearing parents. It is challenging for parents, teachers, and other people in a deaf child's life to learn ASL rapidly enough to support the child's visual language acquisition. Technology that could automatically recognize aspects of ASL signing and provide instant feedback to these ASL students would give them a time-flexible way to practice and improve their signing skills. The goal of this project, which involves an interdisciplinary team of researchers at three colleges within the City University of New York (CUNY) with expertise in computer vision, human-computer interaction, and Deaf and Hard of Hearing education, is to discover the most effective underlying technologies, user-interface design, and pedagogical use for an interactive tool that provides such immediate, automatic feedback to students of ASL.
Most prior work on ASL recognition has focused on identifying a small set of simple signs performed in isolation, and current technology is not sufficiently accurate on continuous signing of sentences with an unrestricted vocabulary. The PIs will develop technologies to fundamentally advance ASL partial recognition, that is, to identify linguistic and performance attributes of ASL without necessarily identifying the entire sequence of signs, and to automatically determine whether a performance is fluent or contains errors. The research will include five thrusts: (1) based on ASL linguistics and pedagogy, to identify a set of observable attributes indicating ASL fluency; (2) to discover new technologies for automatic detection of these ASL fluency attributes through the fusion of multimodal (facial expression, hand gesture, and body pose) and multisensory (RGB and depth video) information; (3) to collect and annotate a dataset of RGBD videos of ASL performed at varied levels of fluency by students and native signers; (4) to develop an interactive ASL learning tool that provides ASL students with immediate feedback about whether their signing is fluent; and (5) to evaluate the robustness of the new algorithms and the effectiveness of the ASL learning tool, including its educational benefits. The work will lead to advances in computer vision technologies for human behavior perception, to new understanding of user-interface design with ASL video, and to a revolutionary, cost-effective educational tool that helps ASL learners achieve fluency, using recognition technologies that are robust and accurate in the near term. Project outcomes will include a dataset of videos at varied fluency levels, which will be valuable for ASL linguists, instructors, students learning ASL, and computer vision researchers.
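To make the fusion idea in thrust (2) more concrete, the sketch below shows one common way multimodal and multisensory features could be combined: features from each stream (e.g., face, hand, body pose, depth) are encoded separately, concatenated, and mapped to per-attribute fluency scores. This is a minimal late-fusion sketch in PyTorch under assumed feature dimensions, modality names, and attribute counts; it is illustrative only and does not represent the project's actual architecture.

```python
import torch
import torch.nn as nn

class LateFusionAttributeClassifier(nn.Module):
    """Hypothetical late-fusion model: per-modality encoders produce
    embeddings that are concatenated and mapped to fluency-attribute scores."""

    def __init__(self, feat_dims, embed_dim=128, num_attributes=8):
        super().__init__()
        # One small encoder per modality; feat_dims maps modality name -> input size.
        self.encoders = nn.ModuleDict({
            name: nn.Sequential(nn.Linear(dim, embed_dim), nn.ReLU())
            for name, dim in feat_dims.items()
        })
        # Fused representation -> one score per fluency attribute.
        self.head = nn.Linear(embed_dim * len(feat_dims), num_attributes)

    def forward(self, features):
        # features: dict of modality name -> tensor of shape (batch, feat_dim)
        embeddings = [enc(features[name]) for name, enc in self.encoders.items()]
        fused = torch.cat(embeddings, dim=-1)
        return torch.sigmoid(self.head(fused))  # per-attribute probabilities

# Example usage with made-up feature sizes for four assumed modalities.
feat_dims = {"hand": 256, "face": 128, "pose": 75, "depth": 256}
model = LateFusionAttributeClassifier(feat_dims, num_attributes=8)
batch = {name: torch.randn(4, dim) for name, dim in feat_dims.items()}
scores = model(batch)  # shape: (4, 8)
```

A late-fusion design like this is only one option; the project could equally explore early fusion or temporal models over continuous signing, which this sketch does not attempt to capture.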