Discovering and Applying Knowledge in Clinical Databases

Hripcsak, George

Abstract

The long-term goal of our ongoing project, "Discovering and applying knowledge in clinical databases," is to learn from data in the electronic health record (EHR) and to apply that knowledge to understand and improve health. Because the EHR captures human health so broadly, it greatly amplifies our ability to carry out observational research, opening the possibility of covering emerging problems, diverse populations, rare diseases, and chronic diseases in long-term longitudinal studies. Unfortunately, the strength of EHR data, its breadth and flexible nature, also imposes additional challenges. We have found that the biggest challenge comes from the inaccuracy, incompleteness, complexity, and resulting bias inherent in the recording of the health care process. We previously showed that health care process bias can be strong enough that simple use of the data creates signals implying the opposite of what we know to be true. One of the most important factors is sparse, irregular sampling. We found that sampling bias can be reduced by reparameterizing time, and that prediction techniques such as data assimilation, which can accommodate EHR-specific data and resist their biases, can be applied to EHR data to produce good estimates of glucose and HbA1c. The previous cycle of this project produced 75 publications. We propose to develop methods to accommodate health care process bias, drawing on knowledge engineering and our experience with health care process bias as well as on advanced statistical techniques that employ dynamical models and latent variables. We hypothesize that heuristics and models combined with knowledge can improve our ability to generate inferences and learn phenotypes despite health care process bias.
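
The abstract reports that reparameterizing time reduced sampling bias but does not say how. One common reading, which we assume here, is to replace clock time (hours since the first measurement) with sequence time (the measurement's index), so that the spacing between consecutive samples becomes uniform regardless of how densely tests were ordered. The record layout, the function names `reparameterize` and `gaps`, and the glucose values are hypothetical; this is a sketch, not the project's implementation.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

# Hypothetical record layout: (charted time, lab value)
Measurement = Tuple[datetime, float]


def reparameterize(series: List[Measurement], mode: str) -> List[Tuple[float, float]]:
    """Return (time, value) pairs in chronological order.

    mode="clock":    time = hours elapsed since the first measurement
    mode="sequence": time = measurement index 0, 1, 2, ...
    """
    ordered = sorted(series, key=lambda m: m[0])
    if mode == "clock":
        t0 = ordered[0][0]
        return [((t - t0).total_seconds() / 3600.0, v) for t, v in ordered]
    if mode == "sequence":
        return [(float(i), v) for i, (_, v) in enumerate(ordered)]
    raise ValueError(f"unknown mode: {mode!r}")


def gaps(pairs: List[Tuple[float, float]]) -> List[float]:
    """Spacing between consecutive time points."""
    times = [t for t, _ in pairs]
    return [b - a for a, b in zip(times, times[1:])]


# Illustrative (made-up) glucose values in mg/dL: three readings an hour
# apart, then one reading three days after the first. Input is unsorted.
start = datetime(2020, 3, 1, 8, 0)
glucose = [
    (start + timedelta(days=3), 105.0),
    (start, 110.0),
    (start + timedelta(hours=2), 150.0),
    (start + timedelta(hours=1), 180.0),
]

print(gaps(reparameterize(glucose, "clock")))     # [1.0, 1.0, 70.0]
print(gaps(reparameterize(glucose, "sequence")))  # [1.0, 1.0, 1.0]
```

Sequence time discards the actual spacing, so an analysis on the reparameterized series answers a question about successive measurements rather than about elapsed hours.
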
Our aims are as follows: (1) Taking a knowledge engineering approach, study the effect of preprocessing and analytic choices on reducing health care process bias, and use machine learning techniques to learn more about health care process bias. (2) Taking a more empirical approach, use dynamic latent factor modeling and variational inference to accommodate health care process bias, learning how a patient's health state and health processes affect censoring and exploiting information from many variables at once. (3) Use data assimilation and mechanistic models to learn otherwise unmeasurable physiologic phenotypes despite the irregular, sparse sampling typical of electronic health records. (4) Use the developed models and generated phenotypes to answer clinical questions, and disseminate the results.
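
The abstract and Aim 3 describe using data assimilation to estimate glucose from sparse, irregularly sampled EHR data. The project's mechanistic models are not specified here, so this sketch swaps in the simplest stand-in that can be stated precisely: a scalar Kalman filter over a mean-reverting (Ornstein-Uhlenbeck) glucose state observed with Gaussian noise. The prediction step uses the true gap between measurements, so sparse sampling widens the uncertainty instead of being ignored. The model, the functions `ou_predict` and `assimilate`, the parameters `mu`, `tau`, `sigma`, and `r`, and the data are hypothetical illustrations.

```python
import math
from typing import List, Tuple

# Illustrative stand-in, NOT the project's mechanistic glucose model.


def ou_predict(x: float, p: float, dt: float,
               mu: float, tau: float, sigma: float) -> Tuple[float, float]:
    """Propagate mean x and variance p forward by dt hours under an
    Ornstein-Uhlenbeck model with long-run mean mu, relaxation time tau,
    and stationary standard deviation sigma (all assumed values)."""
    a = math.exp(-dt / tau)
    return mu + a * (x - mu), a * a * p + sigma ** 2 * (1.0 - a * a)


def assimilate(times: List[float], obs: List[float], mu: float, tau: float,
               sigma: float, r: float) -> List[Tuple[float, float]]:
    """Scalar Kalman filter over irregularly spaced measurements.

    times: measurement times in hours, non-decreasing
    obs:   measured values
    r:     measurement-noise variance
    Returns the posterior (mean, variance) after each measurement.
    """
    x, p = mu, sigma ** 2  # start from the stationary prior
    t_prev = times[0]
    posterior = []
    for t, y in zip(times, obs):
        # Predict across the actual gap, however long it is.
        x, p = ou_predict(x, p, t - t_prev, mu, tau, sigma)
        # Update: weight prediction and measurement by their variances.
        k = p / (p + r)
        x = x + k * (y - x)
        p = (1.0 - k) * p
        posterior.append((x, p))
        t_prev = t
    return posterior


# Made-up glucose series (mg/dL) with irregular timing typical of a sparse record.
est = assimilate([0.0, 1.0, 2.0, 72.0], [110.0, 180.0, 150.0, 105.0],
                 mu=120.0, tau=6.0, sigma=30.0, r=100.0)
for mean, var in est:
    print(f"{mean:6.1f} +/- {math.sqrt(var):4.1f}")
```

Over a long gap the prediction relaxes toward `mu` and its variance grows toward `sigma**2`, which is how this toy model expresses that a value charted days ago says little about glucose now. A mechanistic physiologic model would replace `ou_predict` with that model's own dynamics.
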

Public Health Relevance

This project studies the biases that health care processes bring to electronic health record data, and it develops methods to overcome those biases to improve reuse of the data for purposes such as clinical research and quality improvement.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Library of Medicine (NLM)
Type: Research Project (R01)
Project #: 5R01LM006910-20
Application #: 9873996
Study Section: Special Emphasis Panel (ZLM1)
Program Officer: Sim, Hua-Chuan

Project Start: 2000-04-01
Project End: 2024-02-28
Budget Start: 2020-03-01
Budget End: 2021-02-28
Support Year: 20
Fiscal Year: 2020

Institution

Name: Columbia University (N.Y.)
Department: Internal Medicine/Medicine
Type: Schools of Medicine
DUNS #: 621889815

City: New York
State: NY
Country: United States
Zip Code: 10032

Publications

Vilar, Santiago; Friedman, Carol; Hripcsak, George (2018) Detection of drug-drug interactions through data mining studies using clinical sources, scientific literature and social media. Brief Bioinform 19:863-877

Ta, Casey N; Dumontier, Michel; Hripcsak, George et al. (2018) Columbia Open Health Data, clinical concept prevalence and co-occurrence from electronic health records. Sci Data 5:180273

Grossman, Lisa V; Mitchell, Elliot G; Hripcsak, George et al. (2018) A method for harmonization of clinical abbreviation and acronym sense inventories. J Biomed Inform 88:62-69

Schuemie, Martijn J; Hripcsak, George; Ryan, Patrick B et al. (2018) Empirical confidence interval calibration for population-level effect estimation studies in observational healthcare data. Proc Natl Acad Sci U S A 115:2571-2577

Levine, Matthew E; Albers, David J; Hripcsak, George (2018) Methodological variations in lagged regression for detecting physiologic drug effects in EHR data. J Biomed Inform 86:149-159

Albers, D J; Elhadad, N; Claassen, J et al. (2018) Estimating summary statistics for electronic health record laboratory data for use in high-throughput phenotyping algorithms. J Biomed Inform 78:87-101

Polubriaginof, Fernanda C G; Vanguri, Rami; Quinnies, Kayla et al. (2018) Disease Heritability Inferred from Familial Relationships Reported in Medical Records. Cell 173:1692-1704.e11

Sottile, Peter D; Albers, David; Higgins, Carrie et al. (2018) The Association Between Ventilator Dyssynchrony, Delivered Tidal Volume, and Sedation Using a Novel Automated Ventilator Dyssynchrony Detection Algorithm. Crit Care Med 46:e151-e157

Schuemie, Martijn J; Ryan, Patrick B; Hripcsak, George et al. (2018) Improving reproducibility by using high-throughput observational studies with empirical calibration. Philos Trans A Math Phys Eng Sci 376:

Sottile, Peter D; Albers, David; Moss, Marc M (2018) Neuromuscular blockade is associated with the attenuation of biomarkers of epithelial and endothelial injury in patients with moderate-to-severe acute respiratory distress syndrome. Crit Care 22:63

Showing the most recent 10 out of 120 publications