Dysarthria, a neuromotor speech disorder impacting over 4 million Americans, is often so severe that speech is rendered unintelligible, requiring the use of augmentative and/or alternative communication (AAC) devices. These devices are dated, cumbersome and bulky. Rather than engaging in face-to-face interaction, AAC users spend a disproportionate amount of time navigating through menus of letters/icons to compose a message, which can then be spoken aloud by an integrated text-to-speech synthesis system. Thus, AAC interactions are slow, effortful, and unnatural, and often hinder rather than support social, educational and vocational opportunities. In fact, many AAC users continue to vocalize with familiar caregivers, implying that consistent patterns must underlie dysarthric productions. It is these imprecise yet consistent productions that we propose to capture via multimodal sensors and classify using pattern recognition algorithms for speech translation. While automatic speech recognition is a viable technology for neurologically intact speakers or those with mild impairments, it fails in acoustically harsh speaking contexts and for those with more severe dysarthria. Instead, we focus on multimodal (lingual kinematic and acoustic; LinKA) representations of speech, as they provide redundant and complementary channels of input for improved disambiguation. While other approaches have used computer vision, ultrasound imaging and electromyography to simultaneously estimate articulatory and acoustic parameters of speech, they are limited in portability, cost, and application to clinical settings. The current proposal leverages a novel, lightweight, wearable and low-cost array of magnetic sensors near the cheeks that can recognize the magnetic field patterns generated by a small magnetic tracer placed on the tongue to capture lingual kinematics during speech.
Coupling tongue movements with the acoustic signal, captured via microphones mounted on the same headset, provides a multidimensional representation of speech that can then be translated into clear, understandable speech for a new generation of wearable, speech-driven AAC devices. The proposed work will optimize the efficiency and robustness of lingual-kinematic and acoustic sensing for mobile speech translation (Aim 1), yield a standardized implementation protocol for training and independent use of the LinKA system (Aim 2), and culminate in a 2-week field test of the LinKA translator with 12 potential users with speech impairment (Aim 3). The current proposal is a first and essential step toward a low-cost, wearable, personalized communication enhancement system that can broaden communication opportunities and networks for individuals with speech impairment and thereby increase communication participation, independence and overall quality of life.
Neuromotor speech disorders limit communication opportunities and access to social, educational and employment activities for nearly 4 million Americans. The LinKA (Lingual Kinematic and Acoustic) system is the first low-cost, wireless and wearable technology to simultaneously capture tongue movements and the corresponding acoustics of speech. Coupling multimodal speech detection with sophisticated pattern recognition algorithms and an intelligent user interface, we propose to develop the LinKA Translator, an enabling technology that would disambiguate disordered productions and translate them into clear, understandable speech to broaden communication networks for individuals with speech disorders and thereby increase quality of life and independence.
Lu, Jun; Yang, Zhongtao; Okkelberg, Klaus Z et al. (2018) Joint Magnetic Calibration and Localization Based on Expectation Maximization for Tongue Tracking. IEEE Trans Biomed Eng 65:52-63
Sebkhi, Nordine; Desai, Dhyey; Islam, Mohammad et al. (2017) Multimodal Speech Capture System for Speech Rehabilitation and Learning. IEEE Trans Biomed Eng 64:2639-2649