This project will develop and test psycholinguistic models of the relationship between first-language fluency, second-language competence, and second-language fluency. These models will be applied toward the automatic assessment of fluency in a second language. The project involves a unique collaboration between researchers with backgrounds in second-language pedagogy, testing methodology, linguistics, speech and language technology, and psychology, and is organized around a common set of data, namely oral presentations given by university students in third-year Mandarin classes. These student performances are videotaped, transcribed, and rated for fluency by trained raters according to a custom-designed and validated testing procedure. The same students will be recruited at the beginning of the semester to participate in psycholinguistic experiments to measure their first-language fluency, and related studies will be conducted during the course of the year. The results from expert rating of second-language fluency will be correlated with the psycholinguistic studies of first-language fluency. In parallel with this, the team will develop algorithms that automatically assign scores to a student's second-language performance, with the goal that these scores correlate with expert judgments. These algorithms will range from low-level signal processing methods that estimate such factors as syllable rate and pause duration, to Dynamic Bayesian Networks that combine information from a large number of sources to improve the performance of Automatic Speech Recognition on the data. The results of this work will be both a better understanding of what it means to be fluent in a second language and robust methods that will allow for objective automatic assessment of fluency.

Project Report

How can you speak a foreign language fluently? What does a native speaker listen to? Researchers at the University of Illinois have collected 180 hours of longitudinal conversational data from Chinese language classes in the US. Snippets extracted from this database were rated for second-language fluency using four radically different methods: (1) speech extracts were rated by trained second-language fluency experts; (2) extracts were rated by untrained native speakers of Mandarin Chinese, in web-based rating experiments conducted both in the US and in Taiwan; (3) automatic second-language fluency rating systems were trained, using methods related to automatic speech recognition; and finally, (4) twenty of the original students were recruited to participate in a carefully timed cartoon narration task, in which they narrated (in Chinese) the action occurring in a simple cartoon, while an automatic eye tracker logged all of their eye movements.
First impressions are important. Native speakers, without any formal fluency-rating expertise, can judge the fluency of a second-language learner within fifteen seconds. Their judgments are reliable (different raters agree), and they match the ratings generated by trained raters reasonably well. Automatic second-language fluency systems perform almost as well using fifteen-second speech samples as using fifty-second samples.
Besides rating the fluency of the second-language learners, raters were also asked questions like "Is the speaker a native speaker of Chinese?", "How strong is his/her accent?", "Are there a lot of disfluencies (uh's and um's)?", "Does he/she have good pronunciation, grammar and vocabulary?", and "Can the speech be easily understood?" Results were surprising.
For raters who currently live in the United States (whether trained or untrained), the term "fluency" refers to overall second-language proficiency, and is indistinguishable from the other rated measures including speech flow, phonological control, grammatical accuracy, lexical accuracy, communication skill, nativeness, and accent. On the other hand, raters who currently live in Taiwan are willing to call a speaker fluent even if that speaker has a perceptible non-native accent: the two measures are highly correlated, but distinguishable.
Experiments with automatic second-language fluency rating systems suggest that human raters depend primarily on measures of speech timing, even when they claim to be rating vocabulary size, grammatical competence, or the number of disfluencies. Three types of measures were tested: (1) automatic measures of vocabulary size were estimated based on the variety of words used by a talker during the rated speech sample; (2) automatic measures of disfluency rate were computed by detecting filled pauses (um and uh) and other types of disfluency; and (3) measures of timing included the number of syllables per second and the "phonation time ratio": literally, the demonstrated ability of the speaker to fill every second of the recording with some kind of speech sound. All three types of measure were significantly correlated with human-rated fluency measures, but the best predictors were all in the third category. Furthermore, once timing measures are included, adding information about vocabulary size or disfluencies gives no further boost in accuracy. The data suggest that the best way to sound fluent in a second language is to talk as much as you can.
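The three kinds of automatic measure described above can be illustrated with a minimal Python sketch. It is not the project's actual system: it assumes a time-aligned transcript is already available, and the `Token` record, the `fluency_measures` function, the `FILLED_PAUSES` set, and the toy sample are all hypothetical names and values introduced here. Counting syllables over lexical words only, and counting filled pauses as phonation, are likewise assumptions rather than definitions taken from the report.

```python
from dataclasses import dataclass

# Assumption: only the two filled pauses named in the report are detected.
FILLED_PAUSES = {"um", "uh"}


@dataclass
class Token:
    """One transcribed word with hypothetical time alignment."""
    word: str
    start: float    # start time in seconds
    end: float      # end time in seconds
    syllables: int  # syllable count of the word


def fluency_measures(tokens, total_duration):
    """Compute illustrative vocabulary, disfluency, and timing measures."""
    if total_duration <= 0:
        raise ValueError("total_duration must be positive")

    lexical = [t for t in tokens if t.word.lower() not in FILLED_PAUSES]
    filled_count = len(tokens) - len(lexical)

    # Time during which the speaker produces any speech sound,
    # filled pauses included (an assumption of this sketch).
    speaking_time = sum(t.end - t.start for t in tokens)

    return {
        # (1) vocabulary: variety of words used in the sample
        "distinct_words": len({t.word.lower() for t in lexical}),
        # (2) disfluency: filled pauses per minute of recording
        "filled_pauses_per_minute": 60.0 * filled_count / total_duration,
        # (3) timing: syllable rate and phonation time ratio
        "syllables_per_second": sum(t.syllables for t in lexical) / total_duration,
        "phonation_time_ratio": speaking_time / total_duration,
    }


if __name__ == "__main__":
    # Made-up five-second sample, for illustration only.
    sample = [
        Token("wo", 0.0, 0.5, 1),
        Token("um", 1.0, 1.5, 1),
        Token("xihuan", 2.0, 3.0, 2),
        Token("wo", 3.0, 3.5, 1),
    ]
    for name, value in fluency_measures(sample, total_duration=5.0).items():
        print(f"{name}: {value}")
```

In this sketch the two timing measures depend only on how much of the recording is filled with speech, which is why a speaker who talks more scores higher on both regardless of vocabulary.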
Researchers at the University of Illinois are currently exploring new ideas for second-language training programs, based on this research, that will enhance fluency (with real-time feedback to the user) by encouraging the learner to speak rapidly and confidently, consistently planning his or her speech a few words ahead.
Some of the human raters judged speech samples with video, some with audio only. The two types of measurement were different: apparently, video changes one's perceived fluency. Those who judged speech samples with video were asked to rate the fluency of the speaker's body movements, as well as the fluency of his or her speech. The two measures were correlated: apparently, body movement is part of our cognitive construct of "fluency." Future research will explore audiovisual second-language fluency ratings, in order to see if machines can learn to consider the fluency of one's hand and body gestures in the same way that people do.
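Statements in this report that two sets of ratings are "correlated" (for example speech fluency and body-movement fluency, or trained and untrained raters) can be made concrete with a correlation coefficient. The report does not say which statistic was used. The sketch below assumes Pearson's r, and `pearson_r` is a hypothetical helper name introduced here.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of ratings."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")

    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)

    # Sums of squared deviations and of cross-products.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)

    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant ratings")
    return sxy / math.sqrt(sxx * syy)
```

A value near 1 would mean that speakers rated high on one scale also tend to be rated high on the other. A value clearly below 1 leaves room for the two scales to be "highly correlated, but distinguishable," as reported for the raters in Taiwan.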

Agency
National Science Foundation (NSF)
Institute
Division of Information and Intelligent Systems (IIS)
Type
Standard Grant (Standard)
Application #
0623805
Program Officer
Ephraim P. Glinert
Project Start
Project End
Budget Start
2007-02-01
Budget End
2011-01-31
Support Year
Fiscal Year
2006
Total Cost
$710,781
Indirect Cost
Name
University of Illinois Urbana-Champaign
Department
Type
DUNS #
City
Champaign
State
IL
Country
United States
Zip Code
61820