In crisis response domains, emotional manifestations are complex and extreme emotions are common. While speech technologies have made significant progress over the years, recognizing and understanding emotional speech in noisy environments remains a major challenge. Understanding such language is daunting given fragmented and ungrammatical utterances, compounded by errors from automatic speech recognition (ASR). Furthermore, there has been little analysis of the relationship between emotion detection and language understanding, which have traditionally been treated as parallel, independent tasks. Even when the output of one of these tasks is used as an input feature to the other, training typically uses the "true" class labels rather than the "estimated" classes that would be available in a real deployment, resulting in a mismatch and degraded performance. This project attempts to overcome this limitation by analyzing the degree of dependency between emotion and intent and by investigating joint classification methods via multitask learning for language understanding and emotion detection.
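To make the multitask idea concrete, the following is a minimal sketch, not the project's actual architecture, of joint emotion/intent classification: a shared utterance encoder feeds two task-specific heads, and both losses are optimized together so the two tasks share a representation rather than passing gold labels from one to the other. All names and hyperparameters here are illustrative assumptions, and PyTorch is assumed only for convenience.

```python
import torch
import torch.nn as nn

class JointEmotionIntentModel(nn.Module):
    """Hypothetical multitask model: shared encoder, two classification heads."""
    def __init__(self, vocab_size, embed_dim=128, hidden_dim=256,
                 num_emotions=4, num_intents=10):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.encoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.emotion_head = nn.Linear(hidden_dim, num_emotions)
        self.intent_head = nn.Linear(hidden_dim, num_intents)

    def forward(self, token_ids):
        x = self.embed(token_ids)
        _, (h, _) = self.encoder(x)    # final hidden state as the utterance vector
        shared = h[-1]                 # representation shared by both tasks
        return self.emotion_head(shared), self.intent_head(shared)

# Joint training step: both task losses back-propagate through the shared encoder.
model = JointEmotionIntentModel(vocab_size=5000)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

tokens = torch.randint(0, 5000, (8, 20))       # dummy batch of 8 utterances
emotion_labels = torch.randint(0, 4, (8,))
intent_labels = torch.randint(0, 10, (8,))

emotion_logits, intent_logits = model(tokens)
loss = loss_fn(emotion_logits, emotion_labels) + loss_fn(intent_logits, intent_labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Because neither head consumes the other's gold labels at training time, there is no train/test mismatch of the kind described above; any dependency between emotion and intent is captured through the shared encoder.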
The primary intellectual merit of the project is integrated research toward an end-to-end information processing system that has the potential to significantly impact the crisis response process. For speech processing, communication between individuals and emergency dispatch personnel, as well as communication among responders during a response, poses a major challenge, since callers are typically highly emotional and the language they use reflects that. Processing such speech requires substantial research, and this project is a first step in that direction.