Discovering and Applying Knowledge in Clinical Databases

Hripcsak, George

Abstract

With the advent of improved clinical information system products (e.g., ambulatory systems, order entry systems), improved data entry technologies (e.g., speech recognition, text processing techniques), and further adoption of data interchange standards, more institutions are generating electronic medical records, and these records will expand in breadth, depth, and degree of coding in the future. The records are used mainly for individual patient care; their use for clinical research and quality functions has lagged behind. Major challenges include the wide range of complex data as well as missing and inaccurate data. We propose to continue our work to develop and test methods to mine a clinical data repository. A special emphasis will be to exploit the vast amount of information in the repository (latent associations and knowledge) and to use computer-intensive techniques and advances in data representation and manipulation to better interpret what is in the database and to overcome the challenges of complex, missing, and inaccurate data. We hypothesize that data mining techniques can be applied to a repository to generate accurate clinical interpretations. We further hypothesize that associations latent in a clinically rich repository can be used to improve the classification of cases in that repository.
We aim to develop methods to prepare data for mining; to characterize the information in the clinical data repository; to develop similarity measures based on manipulation of natural language processor output and on information retrieval techniques; to apply nearest-neighbor techniques and case-based reasoning to improve classification; to develop a statistically based method to improve classification of cases with incomplete or inaccurate data; and to apply our methods to real clinical research questions and carry out additional data mining research. The researchers in the Department of Medical Informatics at Columbia University are uniquely positioned to carry out this research, given the experience of the team (data mining, statistics, health data organization, health knowledge representation, natural language processing), the availability of a repository of 13 years of data on 2 million patients, and the availability of a natural language processor called MedLEE to convert millions of narrative reports into richly coded clinical data.
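
As a rough illustration of the nearest-neighbor aim, the sketch below classifies a case by majority vote among the stored cases most similar to it. It is not the project's method: the cosine similarity over bags of coded findings, the function names `cosine` and `knn_classify`, and the toy findings and labels are all assumptions made for illustration, and real natural language processor output would be far richer than a list of terms.

```python
from collections import Counter
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two bags of coded findings (Counters)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    # An empty case has no direction, so treat it as dissimilar to everything.
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_classify(query, cases, k=3):
    """Label a query case by majority vote among its k most similar cases.

    `cases` is a list of (findings, label) pairs, where findings is a Counter.
    """
    ranked = sorted(cases, key=lambda c: cosine(query, c[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical coded findings, invented for this example only.
cases = [
    (Counter(["cough", "fever", "infiltrate"]), "pneumonia"),
    (Counter(["fever", "infiltrate", "consolidation"]), "pneumonia"),
    (Counter(["edema", "cardiomegaly", "effusion"]), "chf"),
    (Counter(["cardiomegaly", "effusion"]), "chf"),
]

# A sparse query: findings absent from the record simply contribute nothing.
query = Counter(["infiltrate", "fever"])
predicted = knn_classify(query, cases, k=3)
```

Because similarity is computed only over the findings that are present, a case with an incomplete record can still be compared with its neighbors, although a sparse record gives the vote less evidence to work with.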

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Library of Medicine (NLM)
Type: Research Project (R01)
Project #: 5R01LM006910-05
Application #: 6754395
Study Section: Special Emphasis Panel (ZLM1-MMR-W (J2))
Program Officer: Sim, Hua-Chuan

Project Start: 2003-06-01
Project End: 2006-05-31
Budget Start: 2004-06-01
Budget End: 2005-05-31
Support Year: 5
Fiscal Year: 2004
Total Cost: $380,979
Indirect Cost

Institution

Name: Columbia University (N.Y.)
Department: Internal Medicine/Medicine
Type: Schools of Medicine
DUNS #: 621889815

City: New York
State: NY
Country: United States
Zip Code: 10032

Publications

Polubriaginof, Fernanda C G; Vanguri, Rami; Quinnies, Kayla et al. (2018) Disease Heritability Inferred from Familial Relationships Reported in Medical Records. Cell 173:1692-1704.e11

Sottile, Peter D; Albers, David; Higgins, Carrie et al. (2018) The Association Between Ventilator Dyssynchrony, Delivered Tidal Volume, and Sedation Using a Novel Automated Ventilator Dyssynchrony Detection Algorithm. Crit Care Med 46:e151-e157

Schuemie, Martijn J; Ryan, Patrick B; Hripcsak, George et al. (2018) Improving reproducibility by using high-throughput observational studies with empirical calibration. Philos Trans A Math Phys Eng Sci 376:

Sottile, Peter D; Albers, David; Moss, Marc M (2018) Neuromuscular blockade is associated with the attenuation of biomarkers of epithelial and endothelial injury in patients with moderate-to-severe acute respiratory distress syndrome. Crit Care 22:63

Vilar, Santiago; Friedman, Carol; Hripcsak, George (2018) Detection of drug-drug interactions through data mining studies using clinical sources, scientific literature and social media. Brief Bioinform 19:863-877

Ta, Casey N; Dumontier, Michel; Hripcsak, George et al. (2018) Columbia Open Health Data, clinical concept prevalence and co-occurrence from electronic health records. Sci Data 5:180273

Grossman, Lisa V; Mitchell, Elliot G; Hripcsak, George et al. (2018) A method for harmonization of clinical abbreviation and acronym sense inventories. J Biomed Inform 88:62-69

Schuemie, Martijn J; Hripcsak, George; Ryan, Patrick B et al. (2018) Empirical confidence interval calibration for population-level effect estimation studies in observational healthcare data. Proc Natl Acad Sci U S A 115:2571-2577

Levine, Matthew E; Albers, David J; Hripcsak, George (2018) Methodological variations in lagged regression for detecting physiologic drug effects in EHR data. J Biomed Inform 86:149-159

Albers, D J; Elhadad, N; Claassen, J et al. (2018) Estimating summary statistics for electronic health record laboratory data for use in high-throughput phenotyping algorithms. J Biomed Inform 78:87-101

Showing the most recent 10 out of 120 publications
