Previous research has revealed that our ability to understand speech depends not only on the sound coming from a talker's mouth, but also on the ability to watch the talker's mouth and face. For example, our ability to understand what a person is saying in a noisy environment such as an office or busy street is improved if we can watch the talker's face as well as listen to the voice. These findings were originally seen as having important implications for improving speech recognition in individuals with moderate hearing loss. More recently, research has shown that people with normal hearing combine auditory and visual information even when the auditory signal is clear and intact. The implication of these studies is that speech recognition depends not only on auditory information from the talker's voice but also on visual information from the talker's mouth and face. Clearly, speech recognition can occur without visual information, as the impact of the telephone and radio on society attests, so why do even normal-hearing individuals use visual information when it is available? Moreover, how do they acquire this capability? Is it learned, or is it part of the basic endowment that all infants have for dealing with spoken language?

The proposed studies will provide information bearing on these questions. We will examine how people with normal hearing combine auditory and visual information from a talker's face to recognize the individual sounds that make up words. We will investigate this issue under a variety of conditions, ranging from isolated syllables (such as "ba" and "da") to whole words in sentence contexts. Our goal is to provide additional information about how visual information is used to help recognize speech in normal listening situations.

The findings from this research should have practical as well as theoretical implications. For example, considerable effort is currently being devoted to improving the ability of computers to recognize speech, especially under noisy conditions such as the cockpit of an airplane. Recent approaches to computer speech recognition attempt to combine auditory and visual information to improve the computer's accuracy, and our results will help inform computer models for combining (or integrating) the two sources of information. With respect to theory, our results should improve our understanding of how the brain works, especially with regard to the understanding of spoken language. Finally, the stimuli and techniques developed in the proposed studies of adults can be used to determine how the ability to integrate auditory and visual information develops in young children. Such findings are important for understanding both normal spoken language development and language development in children who are blind or have moderate to severe hearing impairments.
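
For readers unfamiliar with audiovisual integration, the sketch below illustrates one simple combination rule often used in models of this kind: each modality assigns a degree of support to every candidate syllable, and the supports are multiplied and renormalized. This is only an illustrative assumption for exposition; the function and the numbers are hypothetical and are not the model proposed or tested in this project.

```python
def integrate(auditory, visual):
    """Combine per-syllable support from two modalities with a multiplicative rule.

    Illustrative sketch only: `auditory` and `visual` map candidate syllables to
    support values between 0 and 1; the combined values are renormalized so they
    sum to 1 and can be read as response probabilities.
    """
    combined = {syllable: auditory[syllable] * visual[syllable] for syllable in auditory}
    total = sum(combined.values())
    return {syllable: value / total for syllable, value in combined.items()}


# Hypothetical noisy token: audition weakly favors "ba", vision strongly favors "da".
auditory_support = {"ba": 0.6, "da": 0.4}
visual_support = {"ba": 0.1, "da": 0.9}
print(integrate(auditory_support, visual_support))
# -> {'ba': 0.14..., 'da': 0.86...}  (the visual evidence shifts the percept toward "da")
```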

Agency: National Science Foundation (NSF)
Institute: Division of Behavioral and Cognitive Sciences (BCS)
Application #: 9809013
Program Officer: Guy Van Orden
Budget Start: 1998-09-15
Budget End: 2002-02-28
Fiscal Year: 1998
Total Cost: $299,997
Name: University of Arizona
City: Tucson
State: AZ
Country: United States
Zip Code: 85721