Watching a speaker's face and lips provides powerful information for speech perception and language understanding. Visible speech is particularly effective when the auditory speech is degraded by noise, bandwidth filtering, or hearing impairment. The proposed research involves three main areas of inquiry on the use of visible information in speech perception. The first area involves the research and development of computer-animated facial displays. Synthetic visible speech has great potential for advancing our knowledge of the visible information in speech perception, how human perceivers use it, and how it is combined with auditory speech. However, a better model of speech articulation is needed, one that incorporates physical measurements from real speech and rules describing coarticulation between segments. Further work is proposed to increase the information available in the display and to improve the realism of the face. Standard tests of intelligibility will be used to assess the quality of the facial synthesis. The second area of inquiry is the measurement of facial and tongue movements during speech production, and the analysis of features used by human observers in visual-auditory speech perception. Systematic measurements of visible speech will be made using a computer-controlled video motion analyzer. These measurements will be used to control synthetic visual speech and will also be correlated with perceptual measures to identify which physical characteristics human observers actually use. The third area evaluates the contribution of facial information in general (and of various visual features in particular) to speech perception. Experimental studies with human observers will be carried out to assess the quality of the synthetic facial display and to better understand speech perception by eye and ear. Synthetic visible speech will allow the visual signal to be manipulated directly, an experimental capability central to the study of psychophysics and perception.
Although these three areas of inquiry address different problem domains in cognitive science and engineering, studying them simultaneously affords developments not feasible in separate investigations. The general hypotheses examined in this research are that 1) animated visual speech from synthetic talkers is a valuable communication medium, 2) research with this medium will contribute to our understanding of speech perception by ear and by eye, and 3) the research will have valuable applications for improving communication for deaf and hearing-impaired individuals, people in noisy environments, people in difficult language situations such as second-language learning, and human-machine interaction.

National Institutes of Health (NIH)
National Institute on Deafness and Other Communication Disorders (NIDCD)
Research Project (R01)
Study Section
Sensory Disorders and Language Study Section (CMS)
University of California Santa Cruz
Schools of Arts and Sciences
Santa Cruz
United States
Chen, Trevor H; Massaro, Dominic W (2008) Seeing pitch: visual information for lexical tones of Mandarin-Chinese. J Acoust Soc Am 123:2356-66
Massaro, Dominic W; Chen, Trevor H (2008) The motor theory of speech perception revisited. Psychon Bull Rev 15:453-7; discussion 458-62
Massaro, Dominic W; Bosseler, Alexis (2006) Read my lips: The importance of the face in a computer-animated tutor for vocabulary learning by children with autism. Autism 10:495-510
Massaro, Dominic W; Light, Joanna (2004) Using visible speech to train perception and production of speech for individuals with hearing loss. J Speech Lang Hear Res 47:304-20
Chen, Trevor H; Massaro, Dominic W (2004) Mandarin speech perception by ear and eye follows a universal principle. Percept Psychophys 66:820-36
Bosseler, Alexis; Massaro, Dominic W (2003) Development and evaluation of a computer-animated tutor for vocabulary and language learning in children with autism. J Autism Dev Disord 33:653-72
Srinivasan, Ravindra J; Massaro, Dominic W (2003) Perceiving prosody from the face and voice: distinguishing statements from echoic questions in English. Lang Speech 46:1-22
Massaro, D W; Cohen, M M; Campbell, C S et al. (2001) Bayes factor of model selection validates FLMP. Psychon Bull Rev 8:1-17
Massaro, D W; Cohen, M M (2000) Tests of auditory-visual integration efficiency within the framework of the fuzzy logical model of perception. J Acoust Soc Am 108:784-9
Massaro, D W; Cohen, M M (1999) Speech perception in perceivers with hearing loss: synergy of multiple modalities. J Speech Lang Hear Res 42:21-41

Showing the most recent 10 out of 17 publications