American Sign Language (ASL) is the primary means of communication for about 500,000 people in the United States. ASL is a distinct language from English; in fact, a majority of deaf U.S. high school graduates have only a fourth-grade (age 10) English reading level. Consequently, many deaf people find it difficult to read English text on computers, in TV captioning, or in other settings. Software to translate English text into an animation of a human character performing ASL would make more information and services accessible to deaf Americans. However, essential aspects of ASL are not yet modeled by modern computational linguistic software. Specifically, ASL signers associate entities under discussion with 3D locations around their bodies, and the movements of many types of ASL signs (pronouns, determiners, many noun phrases, many types of verbs, and others) change based on these locations. When do signers associate entities under discussion with locations in space? Where do they position them? How must ASL sign movements be modified based on their arrangement? Creating robust software to understand or generate ASL requires answers to questions such as these.

The PI's goal in this research is to discover techniques for generating ASL animations that automatically predict when to associate conversation topics with 3D locations, where to place those locations, and how they affect ASL sign movements. To these ends, he will create the first annotated corpus of ASL movement data from native signers (recorded in a motion-capture suit and gloves), annotate this corpus with features relating to the establishment of entity-representing locations in space, use machine learning approaches to analyze when and where these locations are established and how the 3D motion paths of signs are parameterized on them (illustrated in the sketch below), incorporate the resulting models into ASL generation software, and recruit native ASL signers to evaluate the 3D animations that result. This work will advance our linguistic knowledge of little-understood yet frequent ASL phenomena, laying the foundation for software that can produce a wide variety of ASL signs and constructions beyond the ability of current techniques to generate. It will in turn lead to ASL generation systems that produce higher-quality animations that are more grammatical and understandable, greatly benefiting accessibility applications for deaf users and ASL machine translation.
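To make the spatial parameterization concrete, the following sketch (in Python, with hypothetical names and coordinates; an illustration of the idea, not the project's actual model) shows how the motion path of a directional ASL verb such as GIVE could be computed from the 3D locations that the signer has associated with its subject and object:

    # Illustrative sketch only (hypothetical names and coordinates): a directional
    # ASL verb such as GIVE moves from the 3D location that the signer has
    # associated with its subject toward the location associated with its object.
    from dataclasses import dataclass, field
    from typing import Dict, List, Tuple

    Point3D = Tuple[float, float, float]

    @dataclass
    class SigningSpace:
        """Maps discourse entities to 3D locations in the space around the signer."""
        locations: Dict[str, Point3D] = field(default_factory=dict)

        def assign(self, entity: str, location: Point3D) -> None:
            # e.g., MOTHER established on the signer's right, DOCTOR on the left
            self.locations[entity] = location

    def directional_verb_path(space: SigningSpace, subject: str, obj: str,
                              steps: int = 10) -> List[Point3D]:
        """Return a motion path from the subject's location to the object's
        location (a stand-in for a path model learned from motion-capture data)."""
        sx, sy, sz = space.locations[subject]
        ox, oy, oz = space.locations[obj]
        return [(sx + (ox - sx) * t / (steps - 1),
                 sy + (oy - sy) * t / (steps - 1),
                 sz + (oz - sz) * t / (steps - 1)) for t in range(steps)]

    # Usage: in "MOTHER GIVE DOCTOR", the verb's path depends on where each
    # entity was placed in the signing space.
    space = SigningSpace()
    space.assign("MOTHER", (0.4, 1.2, 0.3))   # signer's right
    space.assign("DOCTOR", (-0.4, 1.2, 0.3))  # signer's left
    path = directional_verb_path(space, "MOTHER", "DOCTOR")

In the actual research, the mapping from referent locations to sign motion paths would be learned from the annotated motion-capture corpus rather than assumed to be a straight-line interpolation.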
Broader Impact: The ASL motion-capture corpus and an ASL generator that automatically handles spatial phenomena will enable more computational linguistic researchers to study ASL. This research also has applications for the sign languages used in other countries (most of which exhibit similar phenomena) and for the generation of animations of human gestures (to which the empirical techniques developed in this work will apply). The PI is committed to finding ways to encourage deaf high school students to pursue science careers and to creating Ph.D. research opportunities for deaf students. He will give presentations in ASL at local deaf high schools about computing research, offer summer research experiences for deaf high school students (who will use their ASL skills to annotate the corpus, help conduct evaluation studies, and inform the Deaf community about computing), recruit native ASL signers as Ph.D. and undergraduate researchers, and create courses on people-focused computer science research and careers (to attract diverse students to the field) and on assistive technology research (to interest and train Ph.D. students). These educational activities will be enabled by the PI's conversational ASL skills, by the research's relevance to deaf students, and by Queens College's proximity to five local high schools for deaf students.
Standardized testing has revealed that many deaf adults in the U.S. have lower levels of English literacy than their hearing peers; therefore, providing American Sign Language (ASL) on websites can make information and services more accessible. Unfortunately, video recordings of human signers are difficult to update when information changes, and they cannot support just-in-time generation of content from an online request. Thus, software is needed that can automatically synthesize understandable animations of a virtual human performing ASL from an easy-to-update script given as input. The challenge is for this software to select the details of such animations so that they are linguistically accurate, understandable, and acceptable to users. This can be accomplished by analyzing digital recordings of the movements of ASL signers to determine how they perform linguistically accurate movements. By building mathematical models of human movements, the researchers on this project have been able to create software for producing animations of ASL of higher quality than the previous state of the art.

To support this effort, the researchers created a corpus (a large collection of language recordings) of ASL by videotaping and recording native ASL signers wearing motion-capture equipment while they performed unscripted ASL sentences and stories. These digital recordings were then linguistically analyzed and annotated by native ASL signers (people who grew up using ASL) and ASL linguistic researchers, producing a valuable data collection for research on ASL linguistics, ASL animation synthesis, and other fields. Using this corpus as a data resource, the researchers next wrote mathematical software that identified statistical patterns in the collected data (a simplified illustration of this kind of pattern appears below). These statistical patterns were then incorporated into the software for automatically producing animations of ASL. In particular, the researchers focused on how to produce some especially challenging aspects of ASL: the way in which ASL verbs change their motion paths based on how the signer has set up 3D points around their body to represent entities under discussion.

To evaluate the quality of the animation software, experimental studies were conducted in which native ASL signers judged the quality and understandability of the animations produced by the software; often, multiple versions of the software were compared to determine the best approach. During this project, the researchers invented new methods for conducting precise experiments in which deaf research participants evaluate animations of ASL. In addition to publishing and giving research presentations on their linguistic corpus and animation software, the researchers also shared details about these new experimental methods. There was also a significant educational aspect to this project, including the creation of new courses on accessibility technology for people with disabilities and of new research opportunities for deaf high school students and undergraduate students. These students assisted with experiments, learned about the research process, checked for errors in the sign language recordings, produced videos and blogs about what they learned during the summer, and interacted with undergraduate and graduate students pursuing computer science degrees, encouraging them to think about their future careers and education in the sciences.
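As a simplified illustration of the kind of statistical pattern mentioned above (a minimal sketch with invented placeholder values, not the project's actual corpus data, features, or models), a least-squares fit could relate the 3D locations of a verb's subject and object referents to the recorded position of the signer's hand at the verb's midpoint, and then predict that position for a new arrangement of referents:

    # Illustrative sketch only: invented placeholder numbers, not the project's
    # corpus data or models. Each training example pairs the 3D locations of a
    # verb's subject and object referents (inputs) with the recorded position of
    # the signer's hand at the verb's midpoint (output).
    import numpy as np

    # Each row: subject location (x, y, z) followed by object location (x, y, z)
    referent_locations = np.array([
        [ 0.4, 1.2, 0.3, -0.4, 1.2, 0.3],
        [ 0.5, 1.1, 0.2, -0.3, 1.3, 0.4],
        [ 0.3, 1.3, 0.4, -0.5, 1.1, 0.2],
    ])

    # Corresponding hand position recorded at the verb's temporal midpoint
    recorded_midpoints = np.array([
        [ 0.00, 1.25, 0.45],
        [ 0.10, 1.22, 0.42],
        [-0.10, 1.21, 0.44],
    ])

    # Fit an affine model by least squares: midpoint ~ [referent features, 1] @ weights
    features = np.hstack([referent_locations, np.ones((len(referent_locations), 1))])
    weights, *_ = np.linalg.lstsq(features, recorded_midpoints, rcond=None)

    # Predict the hand's midpoint position for a new arrangement of referents
    new_arrangement = np.array([0.45, 1.15, 0.25, -0.35, 1.25, 0.35, 1.0])
    predicted_midpoint = new_arrangement @ weights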
The principal investigator on this project published an article describing how the high school and undergraduate research program for deaf and hard-of-hearing students was conducted, so that researchers at other universities could learn how to run similar programs. In addition, the scientific results of the research on ASL linguistics, ASL animation, and experimental evaluations of technology with deaf participants have been published in a variety of high-quality scientific journals and conferences. These publications include 4 book chapters, 9 peer-reviewed conference papers, 6 scientific journal articles, a doctoral dissertation by a student who worked on the project for five years, and other publications.