With advances in data collection techniques, large amounts of multimodal data collected from multiple sources are widely available, including text, images, video, and audio. Such multimodal data can provide complementary information that reveals the fundamental characteristics of applications. Thus, multimodal machine learning, which builds models that process and relate information from multiple modalities, has become an active research area. Extensive work has been devoted to combining different modalities, learning joint representations that exploit the complementarity and redundancy of multiple modalities, and fusing information to perform prediction tasks. However, effectively integrating and analyzing multimodal data remains a challenging problem, especially when the data is incomplete. Missing modalities are a common issue in real-world multimodal data, caused by factors such as sensor damage, data corruption, and human error in recording. Such incomplete data imposes significant challenges on multimodal machine learning. This project develops a multimodal machine learning framework that formulates a new fundamental structure to facilitate complex information extraction and integration from multimodal data with incomplete modalities for classification and prediction. The project has significant potential to advance the theory and practice of multimodal machine learning, with strong implications in targeted domains such as Information Systems, Engineering, and Biomedicine.

Technically, this project develops a framework that exploits the powerful representational capabilities of graph-based structures to model the complex interactions among heterogeneous datasets, promoting deep information fusion and sustainable data analysis. The proposed multimodal machine learning approach formulates a new fundamental framework that enables complex information extraction and integration from multimodal data with incomplete modalities. A multi-level hypergraph structure is designed to model multimodal data with incompleteness, and a multistage data fusion framework is developed to enable a transductive learning process through which all heterogeneous data points, under different missing-data conditions, are projected into the same embedding space and multiple modalities are fused along the way. The proposed method models complex intra- and inter-modality relationships, extracts complementary multimodal information, and fuses information from multiple subspaces into a unified representation. The proposed research develops a unique strategy for learning on data with incomplete modalities, without data deletion or data imputation, so that information from incomplete samples can be effectively included in the learning process even when one or more modalities are missing. Moreover, the proposed interpretation method utilizes the rich semantics learned from the data to explain the model behavior behind a particular prediction decision. This increases the transparency of the proposed approach and also contributes to explainable machine learning.
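To make the idea concrete, the following is a minimal illustrative sketch, not the project's actual method: it shows one way to project samples with missing modalities into a shared embedding space using masked modality encoders (no deletion or imputation) and a simple hypergraph-style propagation step in which samples sharing an available modality exchange information. All module names, dimensions, and the hyperedge construction here are assumptions made for illustration only.

```python
import torch
import torch.nn as nn


class MaskedMultimodalEncoder(nn.Module):
    """Project each available modality into a shared embedding space;
    missing modalities are skipped rather than deleted or imputed."""

    def __init__(self, modality_dims, embed_dim=64):
        super().__init__()
        self.encoders = nn.ModuleList([nn.Linear(d, embed_dim) for d in modality_dims])

    def forward(self, inputs, masks):
        # inputs: list of (n_samples, dim_m) tensors, one per modality
        # masks:  (n_samples, n_modalities) binary availability matrix
        embeds = []
        for m, enc in enumerate(self.encoders):
            z = torch.relu(enc(inputs[m])) * masks[:, m:m + 1]  # zero out missing entries
            embeds.append(z)
        # average only over the modalities that are actually present
        denom = masks.sum(dim=1, keepdim=True).clamp(min=1.0)
        return torch.stack(embeds, dim=0).sum(dim=0) / denom


def hypergraph_smooth(X, H):
    """One propagation step on a sample-level hypergraph: samples joined by a
    shared hyperedge (here, 'has modality m') exchange embedding information."""
    Dv = H.sum(dim=1).clamp(min=1.0)    # vertex (sample) degrees
    De = H.sum(dim=0).clamp(min=1.0)    # hyperedge degrees
    A = (H / De) @ H.t()                # vertex-to-vertex weights via hyperedges
    return (A @ X) / Dv.unsqueeze(1)


# toy usage: 5 samples, two modalities (dims 10 and 20); some samples miss a modality
x_text, x_img = torch.randn(5, 10), torch.randn(5, 20)
masks = torch.tensor([[1., 1.], [1., 0.], [1., 1.], [0., 1.], [1., 1.]])
H = masks.clone()                       # one hyperedge per modality: members = samples having it
Z = MaskedMultimodalEncoder([10, 20])([x_text, x_img], masks)
Z = hypergraph_smooth(Z, H)             # fused, smoothed shared-space embeddings
print(Z.shape)                          # torch.Size([5, 64])
```

In this sketch, samples with a missing modality still contribute whatever modalities they do have, and the hypergraph step lets their embeddings borrow strength from samples that share an available modality, which is the transductive, deletion-free and imputation-free behavior described above.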

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Project Start:
Project End:
Budget Start: 2020-10-01
Budget End: 2023-09-30
Support Year:
Fiscal Year: 2020
Total Cost: $500,000
Indirect Cost:
Name: University of Virginia
Department:
Type:
DUNS #:
City: Charlottesville
State: VA
Country: United States
Zip Code: 22904