This project creates novel methods and tools for the analysis of large-scale Electronic Health Record (EHR) data. Models of disease, or phenotypes, are derived from a large collection of patient characteristics, as recorded in the EHR. To assess their value and robustness in a clinical application, the phenotypes are incorporated into a longitudinal patient record summarization system for clinicians at the point of patient care.

The research for this project contributes to two inter-related outcomes. (i) A probabilistic graphical model of a patient record, specifically a Latent Dirichlet Allocation (LDA) model of the patient phenotypes. The project investigates models that can handle the heterogeneous data types in the EHR along with their challenges, such as sparseness and artificial redundancy. For the models to be useful in the clinical world, they must be interpretable by humans, easily adaptable for EHR-driven applications, and clinically relevant. This is achieved by encoding prior clinical knowledge in the models and by learning automatically from clinicians' feedback. (ii) A patient record summarizer for clinicians at the point of patient care. The summarizer leverages the probabilistic patient model and learns new models of salience through the clinicians' interactions with the deployed summarizer, in essence learning the relevance of different patient phenotypes. In evaluating the phenome model and the summarizer, particular care is given to assessing their value in a real-world clinical setting, at the point of care.

The research builds on, and is translated into, deliverables that are robust and interoperable with the EHR of a large hospital in New York City. If successful, the availability of interpretable and actionable patient models could have a substantial impact on both EHR-driven research activities and patient care, through better tools for clinicians. Finally, the project introduces students in the field of medicine to STEM activities, while presenting exciting real-world applications to STEM students.

For further information see the project website at: http://people.dbmi.columbia.edu/noemie/phenosum

Budget Start: 2014-02-15
Budget End: 2020-01-31
Fiscal Year: 2013
Total Cost: $1,994,224

Name: Columbia University
City: New York
State: NY
Country: United States
Zip Code: 10027