Discrete Time Event Sequences (DTES) are ordered event sequences in which each event carries a concrete timestamp. DTES are ubiquitous in daily life; one representative example is patient electronic health records. Computational modeling of DTES can reveal the hidden mechanisms by which events evolve and can improve performance on endpoint analytical tasks such as sequence forecasting and grouping. Conventional approaches for analyzing DTES are typically based on strong statistical assumptions and may not work well in practice. Motivated by the recent empirical success of deep learning methods in various application domains, the objective of this project is to develop interpretable deep learning approaches for modeling DTES. This project validates the utility of the developed algorithms in various medical applications. It incorporates the resulting research outcomes into curriculum development and courses, to train a new generation of machine learning and data mining practitioners. In addition, special training opportunities are provided to high school students and community college students to broaden education in modern data analysis techniques.
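
As a minimal illustration of the DTES definition above, the following Python sketch represents a sequence as a list of timestamped events, orders it by time, and computes the gaps between consecutive events. The `Event` class, the helper functions, and the three-event patient record are hypothetical and introduced only for illustration; the abstract does not specify any particular data representation.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass(frozen=True)
class Event:
    """One timestamped event, e.g. something recorded during a hospital stay."""
    timestamp: datetime
    code: str  # hypothetical event label, not taken from the source


def to_sequence(events: List[Event]) -> List[Event]:
    """Return the events ordered by timestamp (sorted() is stable for ties)."""
    return sorted(events, key=lambda e: e.timestamp)


def gaps_in_days(sequence: List[Event]) -> List[float]:
    """Elapsed time in days between consecutive events of an ordered sequence."""
    return [
        (later.timestamp - earlier.timestamp).total_seconds() / 86400.0
        for earlier, later in zip(sequence, sequence[1:])
    ]


# Hypothetical patient record with three events, given out of order.
record = [
    Event(datetime(2020, 3, 1), "readmission"),
    Event(datetime(2020, 1, 1), "admission"),
    Event(datetime(2020, 1, 11), "discharge"),
]
sequence = to_sequence(record)
gaps = gaps_in_days(sequence)
```

The irregular gaps between events are what distinguish such sequences from evenly sampled time series, which is one reason they may call for dedicated models.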

This project consists of three synergistic research thrusts. First, it develops a series of approaches for integrating external domain knowledge into the modeling process. This ensures that the learned models align well with the domain knowledge and at the same time provides effective regularization to avoid overfitting. Second, it devises approaches based on mimic learning and pattern dissection to interpret the knowledge hidden in the learned models. This makes the learned models much more practical and reusable. Third, effective model and data sharing mechanisms are developed to transfer knowledge across similar learning tasks. This maximizes the utilization of the available samples for each task by leveraging the relationships among tasks. Two key problems in the medical domain, hospital readmission and disease phenotyping, are used as the target applications for validating the proposed approaches on several real-world, large-scale patient electronic health record data sets.
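
The abstract names mimic learning without specifying a formulation. The sketch below assumes one common reading: a simple, interpretable student model is fitted to the soft predictions of a complex teacher model, so that the student's structure can be inspected directly. The `teacher` function is a hypothetical stand-in for a trained deep model, and the one-split regression stump is a deliberately tiny student chosen for illustration; neither is taken from the project.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Stump = Tuple[int, float, float, float]  # (feature, threshold, mean_left, mean_right)


def teacher(x: Vector) -> float:
    """Hypothetical stand-in for a trained deep model: soft risk score in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-(3.0 * x[0] - 0.2 * x[1] - 1.5)))


def fit_stump(xs: List[Vector], targets: List[float]) -> Stump:
    """Fit a one-split regression stump to soft targets by least squares."""
    best = None
    best_err = float("inf")
    for j in range(len(xs[0])):
        values = sorted({x[j] for x in xs})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [y for x, y in zip(xs, targets) if x[j] <= t]
            right = [y for x, y in zip(xs, targets) if x[j] > t]
            mean_l = sum(left) / len(left)
            mean_r = sum(right) / len(right)
            err = sum((y - mean_l) ** 2 for y in left)
            err += sum((y - mean_r) ** 2 for y in right)
            if err < best_err:
                best_err, best = err, (j, t, mean_l, mean_r)
    if best is None:
        raise ValueError("need at least two distinct values in some feature")
    return best


def predict(stump: Stump, x: Vector) -> float:
    """Student prediction: the mean teacher score on the matching side of the split."""
    j, t, mean_l, mean_r = stump
    return mean_l if x[j] <= t else mean_r


# Hypothetical inputs on a small grid; the student never sees true labels,
# only the teacher's soft scores.
xs = [(a / 4.0, b / 4.0) for a in range(5) for b in range(5)]
soft_targets = [teacher(x) for x in xs]
stump = fit_stump(xs, soft_targets)
```

Here the fitted stump splits on feature 0, the input that dominates the teacher's score, which is the kind of readable summary a mimic model is meant to expose. A practical student would be richer (for example, a tree ensemble), but the training signal would be the same.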

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Application #: 1750326
Program Officer: Wei Ding
Project Start:
Project End:
Budget Start: 2018-07-01
Budget End: 2023-06-30
Support Year:
Fiscal Year: 2017
Total Cost: $425,920
Indirect Cost:
Name: Joan and Sanford I. Weill Medical College of Cornell University
Department:
Type:
DUNS #:
City: New York
State: NY
Country: United States
Zip Code: 10065