Video-based Speech Enhancement for Persons with Vision and Hearing Loss

Tekin, Ender

Abstract

It is estimated that by 2030, people over the age of 65 will account for over 20% of the total population of the United States. Hearing and vision loss naturally accompanies the aging process. Persons with hearing loss can greatly improve their ability to comprehend speech by observing visual cues from a speaker, such as the shape of the lips and facial expression. However, persons with vision loss cannot make use of these visual cues and have a harder time understanding speech, especially in noisy environments. Furthermore, people with normal vision can use visual information to identify a speaker in a group, which allows them to focus on this person. This can greatly benefit a person with hearing loss who may be using a device such as a sound amplifier or a hearing aid. A user with vision loss, however, needs to be provided with this speaker information to make optimal use of such devices. We propose developing a prototype device that will clean the speech signal from a target speaker and improve speech comprehension for persons with hearing and vision loss in everyday situations. To accomplish this task, we need to harness the visual cues that have so far largely been ignored in the design of assistive technologies for persons with hearing loss.
Our first aim is to learn speaker-independent visual cues that are associated with the target speech signal, and to use these audio-visual cues to design speech enhancement algorithms that perform much better in noisy everyday environments than current methods, which utilize only the audio signal. We will use a video camera and computer vision methods to design advanced digital signal processing techniques that enhance the target speech signals recorded through a microphone.
Our second aim is to use the video and audio signals to detect and efficiently localize the visible speaker. The location of the speaker of interest can then be used to perform speaker separation efficiently, and can also be provided to the user. Finally, we aim to implement the developed algorithms on a portable prototype system. We will test the performance of this system and improve the user interface through user experiments in real-world situations as well as laboratory conditions. The end product will show the feasibility and importance of incorporating multiple modalities into sensory assistive devices, and set the stage for future research and development efforts.

Public Health Relevance

It is estimated that by 2030, more than one in five people in the United States will be over the age of 65. Age-related hearing and vision loss is considered a natural consequence of the aging process, yet current assistive technology approaches do little to address this type of sensory loss. The proposed research will test the feasibility of incorporating visual information in hearing aids, which is expected to improve speech perception for persons with hearing and vision loss in everyday situations, greatly enhancing their ability to lead independent lives, remain employable, and maintain active participation in society.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Eye Institute (NEI)
Type: Exploratory/Developmental Grants (R21)
Project #: 1R21EY022200-01A1
Application #: 8443624
Study Section: Special Emphasis Panel (BNVT)
Program Officer: Wiggs, Cheri

Project Start: 2013-06-01
Project End: 2015-05-31
Budget Start: 2013-06-01
Budget End: 2014-05-31
Support Year: 1
Fiscal Year: 2013
Total Cost: $198,801
Indirect Cost: $73,801

Institution

Name: Smith-Kettlewell Eye Research Institute
DUNS #: 073121105

City: San Francisco
State: CA
Country: United States
Zip Code: 94115

Related projects


NIH 2014 R21 EY	Video-based Speech Enhancement for Persons with Vision and Hearing Loss Tekin, Ender / Smith-Kettlewell Eye Research Institute
NIH 2013 R21 EY	Video-based Speech Enhancement for Persons with Vision and Hearing Loss Tekin, Ender / Smith-Kettlewell Eye Research Institute	$198,801

Publications

Tekin, Ender; Coughlan, James M; Simon, Helen J (2014) An Investigation Into Incorporating Visual Information in Audio Processing. Comput Help People Spec Needs 8547:437-440
