The objective of this research is to advance our understanding of prosodic relationships, and of their synchronization, between verbal and nonverbal modes of communication. Prosody and kinesics (hand gestures, head nods, facial expressions, and posture) play a crucial role in everyday communication, both by adding expressiveness and by structuring information. Although there are different types and levels of synchronization across these modalities, their exact mapping remains unclear. Whereas cross-modal synchronization between verbal and nonverbal modalities has to date been explored mostly at the semantic and discourse levels, in this work the PI will focus on the interplay between them at various levels of granularity. In particular, co-analysis of speech (using language and discourse models) together with kinesics will be used to uncover prosodic correspondences, which in turn will be used to develop novel algorithms for modeling dialog acts, emotions, and other kinesic behaviors. The two primary goals of the project are (1) to develop computational methods and software tools to iteratively uncover prosodic relationships between nonverbal and verbal behaviors, and (2) to use the derived prosodic relationships and their synchronizations to develop novel computational methods and software tools for the robust recognition of gestures, facial expressions, emotions, head nods, and dialog acts. The PI thereby hopes to lay the foundation for a framework for the co-analysis of multimodal articulations, yielding a deeper understanding of (a) how the nucleus of an utterance and visual prosody interact to render the intent of the utterance, and (b) how synchronization with other modalities affects the production of multimodal co-articulation. To further improve robustness and recognition accuracy, a set of classifiers will be designed and fused, taking into account the diversity among them. Systematic methods will be developed to evaluate the classifiers using various performance metrics (e.g., precision, recall, F-measure), graphical analyses, and measure functions. The outcomes of the research, together with the PI's prior work, will ultimately enable the development of a perceptual interface for AutoTutor (an artificially intelligent web-based tutoring system), providing a natural means to interact with multimedia content for instruction.
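To make the evaluation and fusion plan concrete, below is a minimal sketch of majority-vote classifier fusion together with the standard precision, recall, and F-measure computations. The function names and the toy head-nod labels are illustrative assumptions, not the project's actual implementation.

    import numpy as np

    def majority_vote(predictions):
        """Fuse label predictions from several classifiers by majority vote.
        predictions: (n_classifiers, n_samples) array of integer labels.
        Ties go to the lowest label (argmax convention)."""
        predictions = np.asarray(predictions)
        n_labels = predictions.max() + 1
        # For each sample (column), count how many classifiers chose each label.
        votes = np.apply_along_axis(
            lambda col: np.bincount(col, minlength=n_labels), 0, predictions)
        return votes.argmax(axis=0)

    def precision_recall_f1(y_true, y_pred, positive=1):
        """Precision, recall, and F-measure for one class of interest."""
        y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
        tp = np.sum((y_pred == positive) & (y_true == positive))
        fp = np.sum((y_pred == positive) & (y_true != positive))
        fn = np.sum((y_pred != positive) & (y_true == positive))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        return precision, recall, f1

    # Toy example: three classifiers labeling ten frames (1 = head nod, 0 = none).
    y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0, 1, 0])
    preds = np.array([[1, 0, 1, 0, 0, 0, 1, 1, 1, 0],
                      [1, 0, 0, 0, 0, 1, 1, 0, 1, 0],
                      [0, 0, 1, 1, 0, 0, 1, 0, 0, 0]])
    fused = majority_vote(preds)
    print("precision=%.2f recall=%.2f F1=%.2f" % precision_recall_f1(y_true, fused))

In this toy run the fused vote corrects several errors made by individual classifiers but still misses one nod (recall 0.80), illustrating how per-class metrics expose trade-offs that a single accuracy number would hide.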

Broader Impact: The results of this research will have a profound impact on the understanding and tracking of multimodal communication in humans and agents. The interplay between the complementary modalities, and the prosodic manifestations of their synchronization, will also broaden the understanding of multi-channel communication in cognitive science, discourse processing, linguistics, and human-machine interaction, enabling the development of innovative applications such as collaborative environments for agents and humans and assistive technologies for elderly and disabled people. The long-term vision of the proposed research is to develop a perceptual interface for web-based tutoring systems such as AutoTutor. Use of an enhanced artificially intelligent web-based tutor offers significant opportunities for improving the math and science preparation of incoming engineering and science undergraduates from the Memphis City Schools and other regional or national clients. The PI will also create an online collaborative learning environment, using newer frameworks such as Web 2.0, to organize a massive amount of digital content so that communities of learners can effectively share and co-manage the information. The software and databases developed as part of this project will be made available to other researchers through the project website.

Project Report

This report highlights the major activities, specific objectives, significant results, and key achievements of the 2013-2014 project year. A number of new initiatives were undertaken to solve some of the critical problems in meeting the stated objectives. In particular, we have worked to: (i) develop assistive technology for people who are blind or visually impaired; (ii) model cognitive ability-demand gaps in collaborative sense-making to develop assistive technology solutions; (iii) build a semi-automated system to annotate a large audio-visual database; (iv) develop a software system to co-analyze features from both audio and video data; (v) develop modality- and task-independent measures of cognitive load for assistive technology; and (vi) build a tool for the integration and mining of big data in biomedicine. Findings of this research are being published in well-reputed journals and conferences, and many of the research outcomes feed directly into the NSF REESE proposal entitled "Contextual Research-Empirical Research: Detecting, Tracking, and Modeling Cognitive, Affective, and Meta-cognitive Regulatory Processes to Optimize Learning with MetaTutor," on which Dr. Yeasin is Co-PI. During the reporting period (05/01/2013 to 04/30/2014), several new initiatives built on the prior years' research accomplishments, and the PI and the research team made a significant effort in developing assistive technology solutions for people who are blind or visually impaired. Research during this period produced: (i) five (5) journal papers, (ii) ten (10) peer-reviewed conference papers, (iii) two (2) journal abstracts, (iv) two (2) Ph.D. dissertations, and (v) two (2) M.S. theses.

List of publications during the reporting period (May 2013 - April 2014):

Journal Papers:
1. A. K. M. Mahbubur Rahman, A. S. M. Iftekhar Anam, and Mohammed Yeasin, "EmoAssist: Emotion Enabled Assistive Tool to Enhance Dyadic Conversation for the Visually Impaired," IEEE Transactions on Affective Computing, 2014 (in press).
2. A. K. M. Mahbubur Rahman and Mohammed Yeasin, "A Unified Framework for Dividing and Predicting a Large Set of Action Units," IEEE Transactions on Affective Computing, 2014 (in press).
3. M. Iftekhar Tanveer, A. S. M. Iftekhar Anam, and Mohammed Yeasin, "Designing a Technology for the Blind Users to Understand Others' Facial Expressions," ACM Transactions on Accessible Computing (TACCESS), 2014 (in press).
4. G. Hossain and M. Yeasin, "Cognitive Ability-Demand Gap Analysis with Latent Response Models," IEEE Access, DOI: 10.1109/ACCESS.2014.2339328, 2014.
5. G. Hossain and M. Yeasin, "Assistive Thinking: Integrating System Dynamics into Design Thinking Approach," IEEE Systems Journal, 2014 (in press).

Conference Papers:
1. A. S. M. Iftekhar Anam, Shahinur Alam, and Mohammed Yeasin, "Expression: A Dyadic Conversation Aid Using Google Glass for People with Visual Impairments," to appear in ACM UbiComp 2014 Adjunct Publication, Seattle, WA, Sept. 13-17, 2014.
2. M. Iftekhar Tanveer, A. S. M. Iftekhar Anam, Mohammed Yeasin, and Majid Khan, "Do You See What I See? Designing a Sensory Substitution Device to Access Non-verbal Modes of Communication," in Proceedings of the 15th International ACM SIGACCESS Conference on Computers and Accessibility (ASSETS '13), Bellevue, WA, 2013.
3. A. K. M. M. Rahman, A. S. M. I. Anam, M. I. Tanveer, and M. Yeasin, "EmoAssist: A Real-time Social Interaction Tool to Assist the Visually Impaired," in Proceedings of the 15th International Conference on Human-Computer Interaction (HCII 2013), Las Vegas, NV, 2013.
4. G. Hossain and M. Yeasin, "Understanding Effects of Cognitive Load from Pupillary Responses Using Hilbert Analytic Phase," in Proc. IEEE Workshop on Vision Meets Cognition (in conjunction with CVPR 2014), June 23, 2014 (the core phase-extraction step is sketched after this list).
5. G. Hossain and M. Yeasin, "Assistive Thinking: A New Approach in Assistive Technology Design in Disability Management," IEEE 1st International Conference on Technology for Helping People with Special Needs (ICTHP-2013), Riyadh, Saudi Arabia, Feb. 18-20, 2013.
6. G. Hossain and M. Yeasin, "Collaboration Gaps in Disabilities Sense-making: Deaf and Blind Communication Perspective," in Proceedings of the ACM Conference on Computer Supported Cooperative Work (CSCW) CIS Workshop, San Antonio, TX, Feb. 24, 2013.
7. P. Bashivan, Gavin M. Bidelman, and Mohammed Yeasin, "Predicting Working Memory Capacity Using Spectro-temporal Characteristics of the Oscillatory EEG," Cognitive Neuroscience Society (CNS) Annual Meeting, April 2014.
8. P. Bashivan, Gavin M. Bidelman, and Mohammed Yeasin, "Neural Correlates of Visual Working Memory Load Through Unsupervised Spatial Filtering of EEG," in Proceedings of the 3rd NIPS Workshop on Machine Learning and Interpretation in Neuroimaging (MLINI13), 2013.
9. V. Abedi, M. Yeasin, and R. Zand, "Context-Sensitive Use of Bioinformatics Tools with Complementary Functionalities for Generation of Relevant Hypothesis," International Conference on Computational Advances in Bio and Medical Sciences (ICCABS), Miami, FL, June 12-14, 2014.
10. V. Abedi, M. Yeasin, and R. Zand, "ARIANA: Adaptive Robust and Integrative Analysis for Finding Novel Associations," International Conference on Advances in Big Data Analytics, Las Vegas, NV, July 2014.

The NSF CAREER project has made it possible to recruit, train, and retain twenty-three (23) students (18 graduate students and 5 undergraduates), of whom eleven (11) were from underrepresented groups. The PI made a significant effort to train students to conduct research and to build a collaboration with Clovernook (a learning center for people who are blind or visually impaired) to develop assistive solutions.
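For reference, the phase-extraction step underlying the pupillary-response paper above (Hilbert analytic phase) can be sketched as follows. This is a minimal illustration of the general technique, not the authors' code; the helper name, the synthetic pupil signal, and the 60 Hz sampling rate are all assumptions.

    import numpy as np
    from scipy.signal import hilbert

    def analytic_phase(signal, fs):
        """Instantaneous phase and frequency of a 1-D signal via the
        Hilbert transform (hypothetical helper; fs = sampling rate in Hz)."""
        centered = signal - np.mean(signal)        # remove DC offset first
        analytic = hilbert(centered)               # x(t) + i * H[x](t)
        phase = np.unwrap(np.angle(analytic))      # instantaneous phase, radians
        inst_freq = np.diff(phase) * fs / (2 * np.pi)  # instantaneous frequency, Hz
        return phase, inst_freq

    # Synthetic example: a 0.5 Hz pupil-diameter oscillation sampled at 60 Hz.
    fs = 60.0
    t = np.arange(0, 10, 1 / fs)
    pupil = 3.0 + 0.2 * np.sin(2 * np.pi * 0.5 * t)   # diameter in mm (made up)
    phase, inst_freq = analytic_phase(pupil, fs)
    print("mean instantaneous frequency: %.2f Hz" % inst_freq.mean())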

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Application #: 0746790
Program Officer: Ephraim P. Glinert
Budget Start: 2008-05-01
Budget End: 2014-04-30
Fiscal Year: 2007
Total Cost: $494,919
Name: University of Memphis
City: Memphis
State: TN
Country: United States
Zip Code: 38152