Challenge Area and Specific Challenge Topic: This application addresses broad Challenge Area (10) Information Technology for Processing Health Care Data and specific Challenge Topic, 10-RR-101*: Information Technology Demonstration Projects Facilitating Secondary Use of Healthcare Data for Research. Clinical Data Warehouses (CDWs) archive data from electronic medical records (EMRs). Unlike EMRs, which are designed to store and retrieve data by patient (e.g., all data about John Smith), CDWs support queries across patients (e.g., percentage of patients on vs. off aspirin who develop unstable angina). CDWs are critical components of an infrastructure that enables reuse of healthcare data for research. As such, they are important enablers of comparative effectiveness research (CER). However, simply transferring healthcare data from EMRs to a CDW is not sufficient. Healthcare data, unlike clinical trial data, are not collected with a research question in mind. Thus, they may be poorly structured (e.g., free-text list of diagnoses, not a list of ICD9 terms) and contain protected health information (e.g., names, addresses) or identifying phrases such as "senator with lymphoma." Our unifying hypothesis is that concept-level approaches can be applied to CDWs to bring meaning to vast amounts of healthcare data while protecting subject privacy. To test this hypothesis, we will: 1) adapt and evaluate our novel indexing system (based on graph analysis, a modification of Google's PageRank algorithm) to improve concept extraction from clinical text, 2) evaluate the privacy afforded to "subjects" by working with clinical text at the concept level, and 3) adapt and evaluate existing visualization techniques to visualize relationships among concept-level healthcare data, thereby facilitating exploratory data analysis by biomedical researchers. Although these aims build on each other, every individual aim can succeed even if the others fail.
At the conclusion of this project we will have developed and evaluated novel concept-extraction algorithms for CER using healthcare data. We will have determined the privacy implications of working with concept-level data and developed interactive visualizations for concept-level browsing of large healthcare data sets.

Public Health Relevance

Many organizations are building clinical data warehouses to enable comparative effectiveness research. However, simply loading data from electronic medical records into clinical data warehouses is not enough. To enable reuse of healthcare data for research, we will develop new ways to access and visualize clinical data within data warehouses. Specifically, we will develop new ways to extract concepts from unstructured text, visualize large data sets to quickly see patterns and determine the privacy implications of our methods.

Agency
National Institutes of Health (NIH)
Institute
National Center for Research Resources (NCRR)
Type
NIH Challenge Grants and Partnerships Program (RC1)
Project #
1RC1RR028254-01
Application #
7815047
Study Section
Special Emphasis Panel (ZRG1-HDM-A (58))
Program Officer
Filart, Rosemarie
Project Start
2009-09-24
Project End
2011-08-31
Budget Start
2009-09-24
Budget End
2010-08-31
Support Year
1
Fiscal Year
2009
Total Cost
$471,398
Indirect Cost
Name
University of Texas Health Science Center Houston
Department
Type
Schools of Allied Health Profes
DUNS #
800771594
City
Houston
State
TX
Country
United States
Zip Code
77225
Joffe, Erel; Byrne, Michael J; Reeder, Phillip et al. (2014) A benchmark comparison of deterministic and probabilistic methods for defining manual review datasets in duplicate records reconciliation. J Am Med Inform Assoc 21:97-104
Johnson, Todd R; Markowitz, Eliz; Bernstam, Elmer V et al. (2013) SYFSA: a framework for systematic yet flexible systems analysis. J Biomed Inform 46:665-75
McCoy, Allison B; Wright, Adam; Kahn, Michael G et al. (2013) Matching identifiers in electronic health records: implications for duplicate records and patient safety. BMJ Qual Saf 22:219-24
Jonnalagadda, Siddhartha; Cohen, Trevor; Wu, Stephen et al. (2013) Using empirically constructed lexical resources for named entity recognition. Biomed Inform Insights 6:17-27
Jonnalagadda, Siddhartha; Cohen, Trevor; Wu, Stephen et al. (2012) Enhancing clinical concept extraction with distributional semantics. J Biomed Inform 45:129-40
Joffe, Erel; Bearden, Charles F; Byrne, Michael J et al. (2012) Duplicate patient records--implication for missed laboratory results. AMIA Annu Symp Proc 2012:1269-75
Goodwin, J Caleb; Johnson, Todd R; Cohen, Trevor et al. (2012) Predicting biomedical document access as a function of past use. J Am Med Inform Assoc 19:473-8
Joffe, Erel; Havakuk, Ofer; Herskovic, Jorge R et al. (2012) Collaborative knowledge acquisition for the design of context-aware alert systems. J Am Med Inform Assoc 19:988-94
Herskovic, Jorge R; Cohen, Trevor; Subramanian, Devika et al. (2011) MEDRank: using graph-based concept ranking to index biomedical texts. Int J Med Inform 80:431-41
Silva, Pamela A Bozzo; Bernstam, Elmer V; Markowitz, Eliz et al. (2011) Automated medication reconciliation and complexity of care transitions. AMIA Annu Symp Proc 2011:1252-60

Showing the most recent 10 out of 11 publications