Latent variable models (LVMs), which extract hidden information such as topics, themes, or disease patterns from raw data, play an important role in electronic health record (EHR) management and applications. With the dramatic increase in the volume and complexity of EHR data, current LVMs face several new challenges: inadequacy in capturing rare patterns that exist in only a small number of patients in a population (also known as long-tail patterns), redundancy among the discovered patterns, and low computational efficiency. All of these seriously impair the value of EHR data in driving high-quality personalized medicine. There is a critical need to develop new methods that transform conventional LVMs into ones that circumvent these limitations, so that EHR data can be used more effectively and reliably in healthcare applications. This project addresses this need by developing a new class of techniques, "diversity-inducing machine learning models", which promote rare patterns and condense redundant patterns with high computational efficiency, enabling more effective pattern discovery and knowledge extraction from complex and heterogeneous (e.g., textual, image, and time-series) EHR data.

Specifically, this project contains the following research components: 1. Develop a new regularized LVM learning framework that encourages the basis of the latent space to adopt a more diversity-inducing geometry with less redundancy, thereby achieving long-tail pattern coverage and better interpretability in both Euclidean and Hilbert space settings. 2. Develop a diversity-promoting Bayesian LVM learning framework that enables efficient inference of posterior probability distributions to facilitate quantification of uncertainty and alleviate overfitting. 3. Theoretically analyze the diversity-inducing techniques proposed in 1 and 2 to understand how they affect the generalization error in supervised LVMs, the posterior contraction rate in unsupervised LVMs, and the information geometry of the distributions induced by LVMs. 4. Apply the diversified LVMs to healthcare applications. This project also provides rich opportunities for multidisciplinary education and research training at the undergraduate, graduate, and professional levels.
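
To make the idea of a diversity-inducing regularizer in component 1 more concrete, the following is a minimal illustrative sketch, not the project's actual method. It assumes that each latent component is a row vector and that redundancy can be measured by the mean squared cosine similarity between distinct components. The function name `pairwise_cosine_penalty` and the parameter `eps` are hypothetical and chosen only for this illustration.

```python
import numpy as np


def pairwise_cosine_penalty(components, eps=1e-12):
    """Mean squared cosine similarity between distinct rows of `components`.

    Hypothetical diversity penalty: it is 0 when all components are mutually
    orthogonal and 1 when all components point in the same direction.
    """
    components = np.asarray(components, dtype=float)
    k = components.shape[0]
    if k < 2:
        raise ValueError("need at least two components to measure redundancy")

    # Normalize each component to unit length (eps guards against zero rows).
    norms = np.linalg.norm(components, axis=1, keepdims=True)
    unit = components / np.maximum(norms, eps)

    # Gram matrix of cosine similarities; drop the diagonal (self-similarity).
    gram = unit @ unit.T
    off_diag = gram - np.diag(np.diag(gram))

    # Average over the k * (k - 1) ordered pairs of distinct components.
    return float(np.sum(off_diag ** 2) / (k * (k - 1)))


# Mutually orthogonal components: no redundancy.
diverse = np.eye(3)
# Three copies of the same direction: maximal redundancy.
redundant = np.tile([1.0, 2.0, 3.0], (3, 1))

penalty_diverse = pairwise_cosine_penalty(diverse)
penalty_redundant = pairwise_cosine_penalty(redundant)
```

Under these assumptions, adding such a penalty (scaled by a weight) to an LVM training objective would push components apart in angle, which is one simple way to discourage redundant patterns and leave room for rare ones.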

Agency
National Science Foundation (NSF)
Institute
Division of Information and Intelligent Systems (IIS)
Type
Standard Grant (Standard)
Application #
1617583
Program Officer
Sylvia Spengler
Project Start
Project End
Budget Start
2016-09-01
Budget End
2021-08-31
Support Year
Fiscal Year
2016
Total Cost
$499,361
Indirect Cost
Name
Carnegie-Mellon University
Department
Type
DUNS #
City
Pittsburgh
State
PA
Country
United States
Zip Code
15213