According to the National Institute on Deafness and Other Communication Disorders (NIDCD), 6 to 8 million people in the United States have some form of speech or communication disorder. Speech perception requires a listener to map variable acoustic signals onto a finite set of phonological categories known as phonemes, and to integrate those categories over time to form larger linguistic units such as syllables and words. It remains unclear where these speech features are encoded and what cortical computations derive them from the acoustic signal. A better understanding of which neural circuits are involved, how they are organized, and what computations they perform to support speech comprehension is critical for developing a detailed neurobiological model of speech perception. The major aim of this proposal is to use a joint framework to study the encoding of acoustic and linguistic features and the computational underpinnings of natural speech processing, using invasive surface and depth electrodes implanted in human neurosurgical patients. To study the cortical organization of acoustic features, we will characterize their encoding and anatomical organization in auditory cortical regions. To study the cortical organization of linguistic features, we will measure the encoding of phonetic, phonotactic, and semantic information using multivariate linear regression. To understand the underlying computational mechanisms, we will train convolutional neural network models to predict the neural responses to speech and use a novel method to express their computation as a set of linear transforms. By interpreting these models, we will uncover nonlinear computations used in different auditory areas and relate them to the encoding of acoustic and linguistic features. These complementary analyses will extend our knowledge of speech processing in the human auditory cortex and lead to new hypotheses about the mechanisms of various speech and language disorders.
Together, the proposed research will greatly improve the current models of cortical speech processing, which are of great interest in many disciplines including neurolinguistics, speech pathology, speech prostheses, and speech technologies.
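As an illustration of the multivariate linear regression approach named in the abstract, the following is a minimal sketch of a linear encoding model that predicts electrode responses from time-lagged stimulus features. It is not the proposal's actual pipeline: the time-lag design, the function names (`lagged_design`, `fit_encoding_model`, `prediction_correlation`), the train/test split, and the synthetic data are all assumptions made for the example.

```python
import numpy as np

def lagged_design(features, n_lags):
    """Stack time-lagged copies of features (time x n_features) into a design matrix."""
    n_times, n_feats = features.shape
    X = np.zeros((n_times, n_feats * n_lags))
    for lag in range(n_lags):
        # Row t of this block holds the features from `lag` samples earlier.
        X[lag:, lag * n_feats:(lag + 1) * n_feats] = features[:n_times - lag]
    return X

def fit_encoding_model(features, responses, n_lags):
    """Least-squares weights mapping lagged features to all electrodes at once."""
    X = lagged_design(features, n_lags)
    weights, *_ = np.linalg.lstsq(X, responses, rcond=None)
    return weights

def prediction_correlation(features, responses, weights, n_lags):
    """Pearson correlation between predicted and recorded responses, per electrode."""
    pred = lagged_design(features, n_lags) @ weights
    pred_c = pred - pred.mean(axis=0)
    resp_c = responses - responses.mean(axis=0)
    return (pred_c * resp_c).sum(axis=0) / (
        np.linalg.norm(pred_c, axis=0) * np.linalg.norm(resp_c, axis=0))

# Synthetic stand-in data (hypothetical): 5 stimulus features, 3 electrodes.
rng = np.random.default_rng(0)
n_times, n_feats, n_elec, n_lags = 2000, 5, 3, 4
features = rng.standard_normal((n_times, n_feats))
true_w = rng.standard_normal((n_feats * n_lags, n_elec))
responses = (lagged_design(features, n_lags) @ true_w
             + 0.1 * rng.standard_normal((n_times, n_elec)))

# Fit on the first part of the recording, evaluate on the held-out remainder.
train, test = slice(0, 1500), slice(1500, None)
weights = fit_encoding_model(features[train], responses[train], n_lags)
scores = prediction_correlation(features[test], responses[test], weights, n_lags)
```

In such a sketch, the columns of `features` could stand for acoustic or linguistic predictors (for example, phonetic labels), and the held-out correlation in `scores` gives one per-electrode measure of how well those predictors account for the response.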
Speech and language disorders are major health issues. Speech perception requires a listener to compute linguistic units from variable acoustic signals, and where and how these computations happen in the human auditory cortex remains speculative. Using invasive human electrophysiology, we propose to study the neural encoding and computational underpinnings of the acoustic and linguistic features that enable speech perception; investigating this process in various auditory areas at high resolution will extend our knowledge of human speech perception and produce new insights into the mechanisms of speech and language disorders.
Khalighinejad, Bahar; Nagamine, Tasha; Mehta, Ashesh et al. (2017) NAPLib: An Open Source Toolbox for Real-Time and Offline Neural Acoustic Processing. Proc IEEE Int Conf Acoust Speech Signal Process 2017:846-850
O'Sullivan, James; Chen, Zhuo; Herrero, Jose et al. (2017) Neural decoding of attentional selection in multi-speaker environments without access to clean sources. J Neural Eng 14:056001
Khalighinejad, Bahar; Cruzatto da Silva, Guilherme; Mesgarani, Nima (2017) Dynamic Encoding of Acoustic Features in Neural Responses to Continuous Speech. J Neurosci 37:2176-2185
Chen, Zhuo; Luo, Yi; Mesgarani, Nima (2017) Deep Attractor Network for Single-Microphone Speaker Separation. Proc IEEE Int Conf Acoust Speech Signal Process 2017:246-250
Luo, Yi; Chen, Zhuo; Hershey, John R et al. (2017) Deep Clustering and Conventional Networks for Music Separation: Stronger Together. Proc IEEE Int Conf Acoust Speech Signal Process 2017:61-65
Yildiz, Izzet B; Mesgarani, Nima; Deneve, Sophie (2016) Predictive Ensemble Decoding of Acoustical Features Explains Context-Dependent Receptive Fields. J Neurosci 36:12338-12350
Moses, David A; Mesgarani, Nima; Leonard, Matthew K et al. (2016) Neural speech recognition: continuous phoneme decoding using spatiotemporal representations of human cortical activity. J Neural Eng 13:056004
Räsänen, Okko; Nagamine, Tasha; Mesgarani, Nima (2016) Analyzing Distributional Learning of Phonemic Categories in Unsupervised Deep Neural Networks. Cogsci 2016:1757-1762
Khalighinejad, Bahar; Long, Laura Kathleen; Mesgarani, Nima (2016) Designing a hands-on brain computer interface laboratory course. Conf Proc IEEE Eng Med Biol Soc 2016:3010-3014
Hullett, Patrick W; Hamilton, Liberty S; Mesgarani, Nima et al. (2016) Human Superior Temporal Gyrus Organization of Spectrotemporal Modulation Tuning Derived from Speech Stimuli. J Neurosci 36:2014-2026