This CAREER project focuses on voice source extraction directly from recorded speech signals to enhance the acoustical models used in speaker affect analysis. The project has four primary research goals. The first is the creation of new techniques for estimating the voice source from the acoustic speech signal alone, without auxiliary devices such as electroglottographs (EGGs). This is accomplished by developing metrics for assessing glottal quality and selecting the estimates that best approximate the voice source given the available speech data. The second goal is the automated parameterization of the voice source in both the time and frequency domains, allowing the shape and characteristics of the voice source to be quantified for analysis. The third goal is to integrate the parameterized voice source into an acoustical framework for describing the characteristics of a speaker's affective/emotional state; databases representing speaker emotion, deception, and stress are under analysis to incorporate voice source features not previously available. The final goal is the creation and dissemination of a robust set of voice source extraction and analysis tools to the scientific community, encouraging and enabling the analysis of voice source parameters in all forms of speech research. The project contributes to speech analysis applications by advancing the theoretical understanding of how to assess the quality of voice source estimates and by providing tools that enhance acoustical models of the voice. Additionally, the project supports the design of educational demonstrations for outreach activities focused on improving community exposure to science and engineering research and increasing the participation of underrepresented groups.
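To make the extraction problem concrete, the sketch below illustrates one classical baseline for voice source estimation: linear-prediction (LPC) inverse filtering followed by integration of the residual. This is a minimal illustration, not the project's algorithm; the LPC order rule of thumb, frame length, leak factor, and the synthetic test signal are all illustrative assumptions, and only standard NumPy/SciPy calls are used.

```python
# Minimal sketch: LPC-based glottal inverse filtering (a classical
# baseline, not this project's method). Assumes NumPy and SciPy.
import numpy as np
from scipy.linalg import solve_toeplitz
from scipy.signal import lfilter

def lpc(frame, order):
    """Autocorrelation-method linear prediction via a Toeplitz solve."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    r[0] *= 1.000001  # tiny regularization for numerical stability
    a = solve_toeplitz((r[:order], r[:order]), r[1:order + 1])
    return np.concatenate(([1.0], -a))  # A(z) = 1 - sum_k a_k z^-k

def glottal_flow_estimate(frame, fs, order=None, leak=0.99):
    """Inverse-filter a voiced frame with an LPC vocal-tract estimate,
    then leaky-integrate the residual to approximate glottal flow."""
    if order is None:
        order = int(fs / 1000) + 2           # rule-of-thumb LPC order
    A = lpc(frame * np.hamming(len(frame)), order)
    dglottal = lfilter(A, [1.0], frame)      # glottal derivative estimate
    return lfilter([1.0], [1.0, -leak], dglottal)  # leaky integrator

# Demo on a synthetic vowel-like frame: a 100 Hz impulse train driving
# a crude two-pole resonator standing in for the vocal tract.
fs = 16000
n = np.arange(int(0.030 * fs))
excitation = (n % (fs // 100) == 0).astype(float)
speech = lfilter([1.0], [1.0, -1.6, 0.9], excitation)
flow = glottal_flow_estimate(speech, fs)
```

Choosing among competing estimates of this kind, guided by glottal quality metrics, is where the first research goal goes beyond such a baseline.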

Project Report

The core goal of this project has been to investigate technology that can analyze the human state beyond the most basic objective measures. Current technology in practice can extract the content of speech (speech recognition) and determine who is speaking (facial recognition), but it cannot analyze the affective state of the person. Such analysis can be important in law enforcement, health monitoring, and general customer service applications. The focus of this work has been on the extraction and integration of voice source features (related to vocal fold motion) in the analysis and classification of emotion in speech.

Intellectual Merit

The work in this project has produced several outcomes relevant to the analysis of emotion in speech, including: 1) methodologies for the extraction of voice source information, 2) analysis of pertinent features related to various states of affect (including various emotions and deception), and 3) strategies for performing cross-database training and testing, with the aim of building a stable emotion classification model (a minimal sketch of this strategy appears at the end of this report).

Broader Impact

All results of the project have been, and continue to be, disseminated through publications and presentations. The algorithms developed during the course of the project are freely available upon request. Additionally, the PI has worked extensively in STEM outreach, producing a number of MATLAB-based demonstrations and presentations on signal processing concepts that are also freely available upon request.
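As a hypothetical illustration of the cross-database strategy in outcome 3, the sketch below trains a classifier on features from one corpus and evaluates it on another, standardizing each corpus with its own statistics, a common way to reduce channel and recording mismatch between databases. The random feature matrices stand in for real voice source features, and the classifier and its settings are illustrative assumptions, not the project's model.

```python
# Hypothetical cross-corpus emotion classification sketch (assumes
# scikit-learn); random matrices stand in for real voice source features.
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_a = rng.normal(size=(200, 24))               # "corpus A" features
y_a = rng.integers(0, 4, size=200)             # four emotion classes
X_b = rng.normal(loc=0.5, size=(120, 24))      # "corpus B", shifted stats
y_b = rng.integers(0, 4, size=120)

# Normalize each corpus with its own statistics so that systematic
# differences between databases are partially removed before training.
X_a = StandardScaler().fit_transform(X_a)
X_b = StandardScaler().fit_transform(X_b)

clf = SVC(kernel="rbf", C=1.0).fit(X_a, y_a)   # train on corpus A only
print("cross-corpus accuracy:", accuracy_score(y_b, clf.predict(X_b)))
# With random placeholder features, accuracy will be near chance (~0.25).
```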

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Application #: 0545772
Program Officer: Tatiana D. Korelsky
Project Start:
Project End:
Budget Start: 2006-02-01
Budget End: 2012-09-30
Support Year:
Fiscal Year: 2005
Total Cost: $529,911
Indirect Cost:
Name: Georgia Tech Research Corporation
Department:
Type:
DUNS #:
City: Atlanta
State: GA
Country: United States
Zip Code: 30332