This is a three-year standard award. When people talk, they produce both acoustic and optical speech signals. When people with normal hearing and vision communicate face to face, they make use of both types of signals: being able to see as well as hear a talker can enhance speech understanding when the speech is noisy or otherwise difficult to comprehend. The goals of this multidisciplinary project are to quantitatively characterize optical speech signals, examine how optical speech characteristics relate to acoustic and physiologic speech characteristics, study several fundamental issues in human visual speech perception, and apply the knowledge obtained to optical speech synthesis. Across the entire project, the main questions we address are: (1) What speech information can perceivers obtain from seeing talkers? (2) How are optical and acoustic signals related to underlying speech articulations? (3) What are the perceptual and neurophysiologic bases for visual speech perception? and (4) Can we demonstrate the usefulness of this knowledge for synthesizing artificial talking faces?

A multi-talker database is being recorded for the project. Recordings include acoustic, optical (with retroreflectors affixed to the talkers' faces), and physiologic signals (Electromagnetic Midsagittal Articulography, EMA). Studies follow up on recent results in the literature showing high correlations between acoustic and optical speech measures, and between external (optical) and internal (physiologic) speech measures. They include perceptual experiments to determine how segmental and prosodic speech characteristics are perceived visually. We are also investigating the neurophysiologic bases of visual speech perception in deaf and hearing adults using electrophysiologic measures. Optical speech synthesis is being employed to (1) test our understanding of the cues that control visual perception of phonemes and prosody, and (2) investigate the neurophysiologic bases for human sensitivity to optical speech characteristics.
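As a minimal illustration of the kind of acoustic-optical correlation analysis referred to above (not the project's actual pipeline), the sketch below computes a Pearson correlation between a frame-level acoustic feature and a facial-marker trajectory. All data here are synthetic stand-ins; in practice the acoustic feature (e.g., RMS amplitude) and the optical feature (a retroreflector coordinate) would be extracted from the multi-talker recordings, and the variable names are hypothetical.

```python
# Illustrative sketch: correlating an acoustic feature trajectory with an
# optical (facial-marker) trajectory. Synthetic data only.
import numpy as np

rng = np.random.default_rng(0)
n_frames = 500  # hypothetical number of analysis frames for one utterance

# Synthetic articulatory driver shared by both streams, standing in for
# jaw/lip movement during speech.
articulation = np.cumsum(rng.normal(size=n_frames))

# Acoustic feature: frame-level RMS amplitude (synthetic, driven by the
# articulation signal plus noise).
acoustic_rms = 0.8 * articulation + rng.normal(scale=2.0, size=n_frames)

# Optical feature: vertical position of a lip retroreflector (synthetic).
lip_marker_y = 0.7 * articulation + rng.normal(scale=2.0, size=n_frames)

def pearson_r(x: np.ndarray, y: np.ndarray) -> float:
    """Pearson correlation between two equal-length feature trajectories."""
    x = (x - x.mean()) / x.std()
    y = (y - y.mean()) / y.std()
    return float(np.mean(x * y))

r = pearson_r(acoustic_rms, lip_marker_y)
print(f"acoustic-optical correlation: r = {r:.2f}")
```

With many acoustic and optical channels per talker, an analysis of this sort generalizes to multivariate methods (e.g., multilinear regression or canonical correlation) rather than single pairwise correlations.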
The project will impact engineering in the areas of speech synthesis and audiovisual automatic speech recognition, and it will extend understanding of human speech perception and its neurophysiologic bases in deaf and hearing individuals. Applications deriving from the project include second-language training, enhancement of speech transmission quality and recognition accuracy in environmental noise, efficient storage and transmission of optical speech information, stimulus control in audiovisual perceptual experiments, and communication enhancement for hearing-impaired people. The multidisciplinary team of principal investigators represents the fields of cognitive science, speech perception, linguistics, electrical engineering, and neurophysiology.