The ongoing goal of our project, "Discovering and applying knowledge in clinical databases," is to develop and apply methods to exploit electronic medical record data for decision support, with an emphasis on narrative data. Since the inception of our project as an R29 in 1994, we have been developing methods for preparing raw electronic medical record data, applying and evaluating natural language processing, developing data mining techniques including machine learning, and putting the results to use for clinical care and research.

In this competing continuation, we propose to address the temporal information in the electronic medical record and to apply natural language processing and temporal processing to the task of syndromic surveillance in collaboration with the New York City Department of Health and Mental Hygiene (NYC DOHMH).

We have begun work on a temporal processing system. It extracts temporal assertions stated in narrative reports, uses the MedLEE natural language processor to parse the non-temporal information, infers implicit temporal assertions based on a knowledge base, and produces the information in the form of a simple temporal constraint satisfaction problem. The latter can be used to answer questions about the time of events and the temporal relation between pairs of events. We propose to complete the system, expand the knowledge base, speed computation, address the uncertainty of temporal assertions, incorporate temporal information from structured data, and evaluate the system.

NYC DOHMH has a mature syndromic surveillance system that watches over almost eight million persons, and it has as-yet unexploited data sources in the form of narrative and structured electronic medical records. We propose to apply natural language processing and our proposed temporal processing to convert the data to a form appropriate for surveillance. We will evaluate the incremental benefit of structured data, narrative data, and temporally processed narrative data.
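
To make the idea of a simple temporal constraint satisfaction problem concrete, the following is a minimal sketch, not the project's implementation. It assumes each temporal assertion has already been reduced to a bound of the form lo <= time[b] - time[a] <= hi (in days), and it tightens those bounds with the standard Floyd-Warshall all-pairs shortest-path procedure. The event names, the function names (`solve_stp`, `interval`, `relation`), and the choice of days as the unit are all hypothetical and chosen only for illustration.

```python
import math


def solve_stp(events, constraints):
    """Tighten a simple temporal problem with Floyd-Warshall.

    constraints: iterable of (a, b, lo, hi), meaning lo <= time[b] - time[a] <= hi.
    Returns d, where d[a][b] is the tightest upper bound on time[b] - time[a],
    or None when the constraints contradict each other.
    """
    # Distance graph: no known bound is represented as +infinity.
    d = {a: {b: (0 if a == b else math.inf) for b in events} for a in events}
    for a, b, lo, hi in constraints:
        d[a][b] = min(d[a][b], hi)    # time[b] - time[a] <= hi
        d[b][a] = min(d[b][a], -lo)   # time[a] - time[b] <= -lo
    # All-pairs shortest paths propagate the implied bounds.
    for k in events:
        for i in events:
            for j in events:
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    # A negative cycle means no assignment of times satisfies every bound.
    if any(d[e][e] < 0 for e in events):
        return None
    return d


def interval(d, a, b):
    """Tightest (lo, hi) bounds on time[b] - time[a]."""
    return (-d[b][a], d[a][b])


def relation(d, a, b):
    """Qualitative relation of event a to event b implied by the bounds."""
    lo, hi = interval(d, a, b)
    if lo > 0:
        return "before"
    if hi < 0:
        return "after"
    if lo == 0 and hi == 0:
        return "simultaneous"
    return "undetermined"


# Hypothetical assertions, as might be read from a narrative note:
#   "fever began 2 to 3 days before admission"
#   "cough started within a day after the fever"
events = ["fever_onset", "cough_onset", "admission"]
constraints = [
    ("fever_onset", "admission", 2, 3),
    ("fever_onset", "cough_onset", 0, 1),
]
d = solve_stp(events, constraints)
print(interval(d, "cough_onset", "admission"))   # bounds on admission - cough_onset
print(relation(d, "cough_onset", "admission"))
```

Neither assertion mentions cough and admission together, yet the tightened network bounds the gap between them, which is the kind of question about pairs of events that the abstract describes. Handling the uncertainty of temporal assertions, which the proposal also mentions, is outside the scope of this sketch.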

National Institutes of Health (NIH)
National Library of Medicine (NLM)
Research Project (R01)
Study Section
Biomedical Library and Informatics Review Committee (BLR)
Program Officer
Sim, Hua-Chuan
Columbia University (N.Y.)
Internal Medicine/Medicine
Schools of Medicine
New York
United States
Polubriaginof, Fernanda C G; Vanguri, Rami; Quinnies, Kayla et al. (2018) Disease Heritability Inferred from Familial Relationships Reported in Medical Records. Cell 173:1692-1704.e11
Sottile, Peter D; Albers, David; Higgins, Carrie et al. (2018) The Association Between Ventilator Dyssynchrony, Delivered Tidal Volume, and Sedation Using a Novel Automated Ventilator Dyssynchrony Detection Algorithm. Crit Care Med 46:e151-e157
Schuemie, Martijn J; Ryan, Patrick B; Hripcsak, George et al. (2018) Improving reproducibility by using high-throughput observational studies with empirical calibration. Philos Trans A Math Phys Eng Sci 376:
Sottile, Peter D; Albers, David; Moss, Marc M (2018) Neuromuscular blockade is associated with the attenuation of biomarkers of epithelial and endothelial injury in patients with moderate-to-severe acute respiratory distress syndrome. Crit Care 22:63
Vilar, Santiago; Friedman, Carol; Hripcsak, George (2018) Detection of drug-drug interactions through data mining studies using clinical sources, scientific literature and social media. Brief Bioinform 19:863-877
Ta, Casey N; Dumontier, Michel; Hripcsak, George et al. (2018) Columbia Open Health Data, clinical concept prevalence and co-occurrence from electronic health records. Sci Data 5:180273
Grossman, Lisa V; Mitchell, Elliot G; Hripcsak, George et al. (2018) A method for harmonization of clinical abbreviation and acronym sense inventories. J Biomed Inform 88:62-69
Schuemie, Martijn J; Hripcsak, George; Ryan, Patrick B et al. (2018) Empirical confidence interval calibration for population-level effect estimation studies in observational healthcare data. Proc Natl Acad Sci U S A 115:2571-2577
Levine, Matthew E; Albers, David J; Hripcsak, George (2018) Methodological variations in lagged regression for detecting physiologic drug effects in EHR data. J Biomed Inform 86:149-159
Albers, D J; Elhadad, N; Claassen, J et al. (2018) Estimating summary statistics for electronic health record laboratory data for use in high-throughput phenotyping algorithms. J Biomed Inform 78:87-101

Showing the most recent 10 out of 120 publications