Nearly 7.5 million people live without the ability to vocalize effectively. Existing augmentative and alternative communication (AAC) technology provides some function for these individuals, typically by converting physical gestures, eye movements or text into words that can be acoustically synthesized or visually displayed. However, a key limitation of these devices is that they do not involve natural mechanisms of speech production and therefore can be less intuitive as substitutes for the human vocal system. Consequently, they can suffer from lexical ambiguity, lack of emotional expression, and difficulty in conveying intent. There remains an unmet need to restore the natural mechanisms of speech production for the vocally impaired. To meet this need, we propose to develop a first-of-its-kind AAC system that restores personalized, prosodic, near-real-time vocalization based on surface electromyographic (sEMG) signals produced during subvocal (i.e., silently mouthed) speech. In Phase I, we demonstrated the ability to recognize orthographic content and categorize emphatic stress between phrases subvocalized by control (n=4) and post-laryngectomy (n=4) participants, with a 96.3% word recognition rate and a 91.2% emphatic stress discrimination rate, respectively. Subvocal speech corpus transcripts were synthesized into prosodic speech using personalized, digital voices unique to each participant, then evaluated by naïve listeners (n=12). Listeners consistently rated our sEMG-based digital voice as having greater intelligibility, acceptability, emphasis discriminability and vocal affinity than the state-of-the-art electrolarynx (EL) speech aid used by laryngectomees.
Having achieved these capabilities with lengthy post-processing of single phrases, we now aim to advance this technology in Phase II by solving the more fundamental challenges of transcribing prosodic speech and tracking variations in intonation and timing in near real-time to restore conversational interactions in everyday life. To achieve this goal, our team of engineers at Altec Inc. is partnering with the world's leading provider of personalized digitized voice for AAC (VocaliD, Inc.) and world-class laryngeal cancer clinical experts (Massachusetts General Hospital) to develop algorithms for transcribing prosodic speech and tracking variations in intonation and timing throughout narratives, monologues and conversations (Aim 1); design the MyoVoice™ system for near-real-time mobile use (Aim 2); and evaluate the prototype system for conversational efficacy (Aim 3). Our milestone is to demonstrate within-subject improvements in ease of use, functional efficacy, and social reception among post-laryngectomy participants using our sEMG-based digital voice when compared to their typical EL speech aid. The final deliverable will consist of a single 4-contact sensor veneer and cross-platform, near-real-time mobile software that can operate on an AAC tablet or mobile device. Once the device is commercialized, our vision is for a person facing the devastating need to undergo laryngectomy to have their voice banked and subvocal models trained so that, immediately following surgery, they can receive a custom MyoVoice™ system to restore their original voice.
This project will deliver a speech augmentative and alternative communication (AAC) system that uses noninvasive surface electromyographic (sEMG) signals from speech articulator muscles to restore the ability to communicate for those with vocal impairments that resulted from surgical treatment of laryngeal and oropharyngeal cancers. The MyoVoice™ system will improve upon current AAC technologies by offering a noninvasive, hands-free device that synthesizes a personalized, prosodic voice in near real-time based entirely on the sEMG content of prosodic subvocal (silently mouthed) speech. Restoring one's personalized and expressive voice following such a loss will not only have a profound impact on the quality of life and outlook for cancer survivors but will lead to alternative methods for interfacing with computers and machines by people with other speech disorders or limited motor function.