Multimodal Speech Translation for Assistive Communication

Patel, Rupal

Abstract

Dysarthria, a neuromotor speech disorder impacting over 4 million Americans, is often so severe that speech is rendered unintelligible, requiring the use of augmentative and/or alternative communication (AAC) devices. These devices are dated, cumbersome and bulky. Rather than engaging in face-to-face interaction, AAC users spend a disproportionate amount of time navigating through menus of letters/icons to compose a message, which can then be spoken aloud by an integrated text-to-speech synthesis system. Thus, AAC interactions are slow, effortful, and unnatural, and they often hinder rather than support social, educational and vocational opportunities. In fact, many AAC users continue to vocalize with familiar caregivers, implying that consistent patterns must underlie dysarthric productions. It is these imprecise yet consistent productions that we propose to capture via multimodal sensors and classify using pattern recognition algorithms for speech translation. While automatic speech recognition is a viable technology for neurologically intact speakers or those with mild impairments, it fails in acoustically harsh speaking contexts and for those with more severe dysarthria. Instead, we focus on multimodal (lingual kinematic and acoustic; LinKA) representations of speech, as they provide redundant and complementary channels of input for improved disambiguation. While other approaches have used computer vision, ultrasound imaging and electromyography to simultaneously estimate articulatory and acoustic parameters of speech, they are limited in portability, cost, and application to clinical settings. The current proposal leverages a novel, lightweight, wearable and low-cost array of magnetic sensors near the cheeks that can recognize the magnetic field patterns generated by a small magnetic tracer placed on the tongue to capture lingual kinematics during speech. Coupling tongue movements with the acoustic signal, captured via microphones mounted on the same headset, provides a multidimensional representation of speech that can then be translated into clear, understandable speech for a new generation of wearable, speech-driven AAC devices. The proposed work will optimize the efficiency and robustness of lingual-kinematic and acoustic sensing for mobile speech translation (Aim 1), yield a standardized implementation protocol for training and independent use of the LinKA system (Aim 2), and culminate in a 2-week field test of the LinKA translator with 12 potential users with speech impairment (Aim 3). The current proposal is a first and essential step toward a low-cost, wearable, personalized communication enhancement system that can broaden communication opportunities and networks for individuals with speech impairment and thereby increase communication participation, independence and overall quality of life.

Public Health Relevance

Neuromotor speech disorders limit communication opportunities and access to social, educational and employment activities for nearly 4 million Americans. The LinKA (Lingual Kinematic and Acoustic) system is the first low-cost, wireless and wearable technology to simultaneously capture tongue movements and the corresponding acoustics of speech. Coupling multimodal speech detection with sophisticated pattern recognition algorithms and an intelligent user interface, we propose to develop the LinKA Translator, an enabling technology that would disambiguate disordered productions and translate them into clear, understandable speech to broaden communication networks for individuals with speech disorders and thereby increase quality of life and independence.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Institute of Biomedical Imaging and Bioengineering (NIBIB)
Type: Exploratory/Developmental Grants (R21)
Project #: 1R21EB018764-01
Application #: 8737379
Study Section: Special Emphasis Panel (ZRG1-SBIB-Q (80))
Program Officer: Lash, Tiffani Bailey

Project Start: 2014-08-15
Project End: 2016-07-31
Budget Start: 2014-08-15
Budget End: 2015-07-31
Support Year: 1
Fiscal Year: 2014
Total Cost: $194,951
Indirect Cost: $48,352

Institution

Name: Northeastern University
Department: Other Health Professions
Type: Schools of Allied Health Professions
DUNS #: 001423631

City: Boston
State: MA
Country: United States
Zip Code: 02115

Related projects

NIH 2015 R21 EB: Multimodal Speech Translation for Assistive Communication. Patel, Rupal / Northeastern University. $217,880
NIH 2014 R21 EB: Multimodal Speech Translation for Assistive Communication. Patel, Rupal / Northeastern University. $194,951

Publications

Lu, Jun; Yang, Zhongtao; Okkelberg, Klaus Z et al. (2018) Joint Magnetic Calibration and Localization Based on Expectation Maximization for Tongue Tracking. IEEE Trans Biomed Eng 65:52-63

Sebkhi, Nordine; Desai, Dhyey; Islam, Mohammad et al. (2017) Multimodal Speech Capture System for Speech Rehabilitation and Learning. IEEE Trans Biomed Eng 64:2639-2649
