Sequential data are ubiquitous in domains including healthcare, cybersecurity, social science, and online search and recommendation systems. Deep learning techniques have recently achieved considerable success in sequential data analysis tasks such as sequential prediction, text understanding, and time series classification and clustering. However, the sequential data now generated across these domains are becoming ever more massive, complex, and domain-specific. When handling such complex, multi-domain sequential data, existing deep learning methods are limited in their ability to capture long-term dependencies and to generalize across domains. To bridge this gap, this project will develop principled algorithms and methodologies that handle data heterogeneity and long-term dependencies when analyzing complex, multi-domain sequential data. The proposed framework will be generic across types of sequential data, including human language, time series, and trajectory data. It will open new possibilities for applying deep learning to more challenging sequential data analysis applications in social network analysis, clinical care, smart transportation, text mining, and natural language processing.

The proposed framework builds on the widely used transformer architecture and incorporates multi-domain adaptation to handle data heterogeneity. Specifically, this project aims to develop (i) unsupervised learning techniques for transformer-based multi-domain point process analysis, (ii) transformer-based metric learning techniques that enable large-scale, multi-domain time series analysis, and (iii) techniques for robust and efficient domain transfer learning over pre-trained transformers. By addressing the computational and statistical challenges of these problems, the developed techniques will combine computational efficiency with the modeling flexibility needed to capture long-term dependencies and data heterogeneity. The proposed research will also deliver open-source software in the form of easy-to-use libraries that help researchers and practitioners in related fields analyze complex sequential data.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Application #: 2008334
Program Officer: Wei-Shinn Ku
Project Start:
Project End:
Budget Start: 2020-10-01
Budget End: 2023-09-30
Support Year:
Fiscal Year: 2020
Total Cost: $327,169
Indirect Cost:
Name: Georgia Tech Research Corporation
Department:
Type:
DUNS #:
City: Atlanta
State: GA
Country: United States
Zip Code: 30332