The long-term goal of our ongoing project, "Discovering and applying knowledge in clinical databases," is to learn from data in the electronic health record (EHR) and to apply that knowledge to relevant problems. The increasing adoption of the EHR promises to provide data for clinical research and informatics research, but secondary use of the data has been limited. Challenges include the complexity, incompleteness, and inaccuracy of the record. We propose to study the EHR from an information-theoretic point of view, treating the EHR as a natural object worthy of study, and applying methods from non-linear time series analysis. Armed with a better understanding of the record, we hope to measure and account for data completeness and to improve interpretation and use of the data. We hypothesize that we can characterize an electronic health record using a formal information-theoretic framework, and that the measured properties can help answer informatics and clinical questions.
Our aims are to (1) develop an information-theoretic framework for characterizing the electronic health record, (2) use the information-theoretic framework to study EHR and sampling issues, and (3) use the framework and traditional data mining to answer clinical and informatics questions. We will approach the EHR as a complex time series and characterize the information in the record using univariate sequential mutual information (the degree to which observations of a variable predict future observations) and a network of pairwise mutual information among all variables, discrete and continuous. The result will be a measure of the predictability of the record and a set of associations among clinical features. We will use the predictability results to study the completeness of a patient's record, the appropriateness of a clinician's sampling rate, outlier data points, and changes in patient acuity. We will use predictability and associations to link narrative abstractions with their primary data, to interpret narrative modifiers, to cluster terms, to find associations (in the context of phenome-wide association studies), and to carry out exploratory analyses of defining phenotype profiles and of mutual information-based surveillance.
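To make the notion of univariate sequential mutual information concrete, the following is a minimal sketch, not the project's actual method. It assumes an evenly sampled series (real EHR data are irregularly sampled), equal-width binning of continuous values, and a simple plug-in estimator, which is known to be biased for small samples. The function names `mutual_information`, `discretize`, and `lagged_mutual_information` are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits from paired discrete samples."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n == 0:
        return 0.0
    joint = Counter(zip(xs, ys))  # counts of each (x, y) pair
    px = Counter(xs)              # marginal counts of x
    py = Counter(ys)              # marginal counts of y
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log2(p(x,y) / (p(x) p(y))), written in terms of counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


def discretize(values, n_bins):
    """Map continuous values to equal-width bin indices 0..n_bins-1 (an assumption)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)  # constant series: a single bin
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def lagged_mutual_information(series, lag, n_bins=4):
    """Mutual information between a series and itself shifted by `lag` steps."""
    if lag < 1 or lag >= len(series):
        raise ValueError("lag must satisfy 1 <= lag < len(series)")
    binned = discretize(series, n_bins)
    # Pair each observation with the one `lag` steps later.
    return mutual_information(binned[:-lag], binned[lag:])


if __name__ == "__main__":
    # Illustrative synthetic values only, not clinical data.
    periodic = [0.0, 1.0] * 20
    print(lagged_mutual_information(periodic, lag=1, n_bins=2))
```

The same `mutual_information` function could in principle be applied to every pair of discretized variables to build a pairwise association network, though estimator choice and handling of irregular sampling would matter considerably in practice.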

Public Health Relevance

This project studies the electronic health record, looking at properties like predictability and association, in order to learn whether records are complete and accurate and to learn how health data are organized. This, in turn, should lead to improved reuse of the data for purposes such as clinical research and informatics research.

National Institutes of Health (NIH)
National Library of Medicine (NLM)
Research Project (R01)
Study Section
Biomedical Library and Informatics Review Committee (BLR)
Program Officer
Sim, Hua-Chuan
Columbia University (N.Y.)
Internal Medicine/Medicine
Schools of Medicine
New York
United States
Green, Robert A; Hripcsak, George; Salmasian, Hojjat et al. (2015) Intercepting wrong-patient orders in a computerized provider order entry system. Ann Emerg Med 65:679-686.e1
Overby, Casey Lynnette; Hripcsak, George; Shen, Yufeng (2014) Estimating heritability of drug-induced liver injury from common variants and implications for future study designs. Sci Rep 4:5762
Walsh, Colin; Hripcsak, George (2014) The effects of data sources, cohort selection, and outcome definition on a predictive model of risk of thirty-day hospital readmissions. J Biomed Inform 52:418-26
Albers, D J; Elhadad, Noémie; Tabak, E et al. (2014) Dynamical phenotyping: using temporal analysis of clinically collected physiologic data to stratify populations. PLoS One 9:e96443
Claassen, Jan; Albers, David; Schmidt, J Michael et al. (2014) Nonconvulsive seizures in subarachnoid hemorrhage link inflammation and outcome. Ann Neurol 75:771-81
Pivovarov, Rimma; Albers, David J; Sepulveda, Jorge L et al. (2014) Identifying and mitigating biases in EHR laboratory tests. J Biomed Inform 51:24-34
Pivovarov, Rimma; Albers, David J; Hripcsak, George et al. (2014) Temporal trends of hemoglobin A1c testing. J Am Med Inform Assoc 21:1038-44
Li, Ying; Salmasian, Hojjat; Vilar, Santiago et al. (2014) A method for controlling complex confounding effects in the detection of adverse drug reactions using electronic health records. J Am Med Inform Assoc 21:308-14
Weng, C; Li, Y; Ryan, P et al. (2014) A distribution-based method for assessing the differences between clinical trial target populations and patient populations in electronic health records. Appl Clin Inform 5:463-79
Boland, Mary Regina; Hripcsak, George; Albers, David J et al. (2013) Discovering medical conditions associated with periodontitis using linked electronic health records. J Clin Periodontol 40:474-82

Showing the most recent 10 out of 70 publications