The long-term goal of this project is to improve the standard of care for patients and to build clinicians' trust in using advanced machine learning and artificial intelligence tools for computational healthcare. The national push for Electronic Health Records through the 2009 Health Information Technology for Economic and Clinical Health (HITECH) Act, together with recent advances in wearable sensor technologies, has resulted in an exponential surge in the volume, detail, and availability of digital health data. This provides an exciting opportunity for researchers, healthcare professionals, and patients alike to gain a richer, data-driven understanding of health and illness. However, unlike many other data types, healthcare data is inherently noisy, has missing values, and comes from multiple heterogeneous sources such as lab tests, doctors' notes, medical images, and monitor readings. These properties make it very challenging for most existing machine learning approaches and statistical models to discover meaningful disease patterns or to make robust predictions. To address these challenges, this project will develop and validate novel data-driven methods based on deep learning techniques to model the complex correlations and patterns present in healthcare data. In particular, the proposed methods will learn disease-specific and patient-specific feature patterns from heterogeneous and limited healthcare data. Their effectiveness will be demonstrated on challenging and important healthcare prediction tasks such as the early prediction of sepsis and the prediction of outcomes for Intensive Care Unit (ICU) patients.
This project will advocate a model-based, data-driven paradigm shift for computational healthcare and will focus on addressing the main challenges of analyzing healthcare data, namely heterogeneity and limited dataset size, by developing novel data-driven methods. The proposed methods will accelerate medical discovery and aid clinical decision making in several ways: (a) connecting and learning from disconnected, heterogeneous piles of healthcare data; (b) yielding new representations of illnesses and diseases; and (c) building clinicians' trust in data-driven models. The technical aims of the project are divided into three thrusts. The first thrust will focus on developing a novel deep learning framework to learn shared feature representations from heterogeneous healthcare data. Specifically, the researchers will employ machine learning approaches such as multi-view learning and correlation analysis to exploit the correlation structures present within and across different healthcare data sources. In addition, domain adaptation techniques based on adversarial training will be used to learn joint feature representations from multi-cohort patient populations. The second thrust will focus on feature learning from limited healthcare data by utilizing patient or task similarity networks and a few-shot learning framework. In particular, multi-task learning and embedding techniques will be used to learn feature representations from the limited data available for a specific patient cohort or healthcare task. In the third thrust, the model uncertainty of the proposed data-driven methods will be studied using ensembles and regularization techniques. The adequacy of the proposed solutions will be validated on real-world healthcare datasets for multiple clinically relevant prediction tasks.
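As a rough illustration of how ensembles can expose model uncertainty, and not a description of the project's actual methods, the spread of predictions across independently trained models can serve as an uncertainty signal. This minimal sketch assumes bootstrap-resampled ensembles (bagging) of one-feature logistic regression models trained on a hypothetical normalized risk score; the function names, the feature, and the toy data are all illustrative assumptions.

```python
import math
import random
from statistics import mean, pstdev


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(xs, ys, lr=0.1, epochs=200):
    """Fit a one-feature logistic regression (weight, bias) by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        gw = gb = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y  # gradient of log loss w.r.t. the logit
            gw += err * x
            gb += err
        w -= lr * gw / n
        b -= lr * gb / n
    return w, b


def bootstrap_ensemble(xs, ys, k=10, seed=0):
    """Train k members, each on a bootstrap resample (sampling with replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    members = []
    for _ in range(k):
        idx = [rng.randrange(n) for _ in range(n)]
        members.append(train_logistic([xs[i] for i in idx], [ys[i] for i in idx]))
    return members


def predict_with_uncertainty(members, x):
    """Return (mean predicted risk, spread across members) for a single input x."""
    probs = [sigmoid(w * x + b) for w, b in members]
    return mean(probs), pstdev(probs)


if __name__ == "__main__":
    # Hypothetical toy cohort: one normalized feature, binary outcome label.
    xs = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]
    ys = [0, 0, 0, 0, 1, 1, 1, 1]
    members = bootstrap_ensemble(xs, ys)
    for x in (-3.0, 0.0, 3.0):
        risk, spread = predict_with_uncertainty(members, x)
        print(f"x={x:+.1f}  risk={risk:.3f}  spread={spread:.3f}")
```

A large spread between members flags a prediction that may warrant caution. Bagging is only one common way to build such an ensemble; with deep models, independently initialized networks would typically replace the toy logistic regressions.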
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.