A critical element of translating science into practice is the ability to find patient populations for clinical research. Many comparative effectiveness studies rely on administrative data to select relevant patients, but the limitations of administrative data are well known. Much of the information critical for clinical research is locked in free-text dictated reports, such as history and physical examinations and radiology reports. Data repositories, such as the Medical Archival Retrieval System (MARS) at the University of Pittsburgh, are useful for identifying supersets of patients for clinical research studies through indexed word searches. However, simple text-based queries are also limited in their effectiveness, and researchers are often left reading through hundreds or thousands of reports to filter out false-positive cases. Current processes are time-consuming and extraordinarily expensive, and they lead to long delays between the development of a testable hypothesis and the ability to share findings with the medical community at large. A potential solution to this problem is pre-annotating de-identified clinical reports to support more intelligent and sophisticated retrieval and review. Clinical reports are rich in meaning and structure and can be annotated at many different levels using natural language processing (NLP) technology. It is not clear, however, what types of annotations would be most helpful to a clinical researcher, nor is it clear how best to display the annotations to assist manual review of reports. The annotation schema used by an NLP system and the user interface that helps researchers retrieve data for retrospective studies are interdependent. In this proposal, we will interactively revise an NLP annotation schema and explore various methods for displaying annotations, based on feedback from users reviewing patient data for specific research studies.
We hypothesize that an interactive search application that relies on NLP-annotated clinical text will increase the accuracy and efficiency of finding patients for clinical research studies, and will support visualization techniques that improve a researcher's ability to review patient data.

Public Health Relevance

In this proposal, we will develop a novel review application that facilitates translational research based on secondary use of EHR data by helping researchers more efficiently find retrospective populations of patients for clinical research studies. The application will rely both on multi-layered annotation of the textual data using natural language processing and on coordinated views of the patient data.

Agency
National Institutes of Health (NIH)
Institute
National Library of Medicine (NLM)
Type
Research Project (R01)
Project #
5R01LM010964-03
Application #
8520393
Study Section
Special Emphasis Panel (ZLM1-ZH-C (01))
Program Officer
Sim, Hua-Chuan
Project Start
2011-09-30
Project End
2015-08-31
Budget Start
2013-09-30
Budget End
2014-08-31
Support Year
3
Fiscal Year
2013
Total Cost
$542,645
Indirect Cost
$112,610
Name
University of California San Diego
Department
Internal Medicine/Medicine
Type
Schools of Medicine
DUNS #
804355790
City
La Jolla
State
CA
Country
United States
Zip Code
92093
Publications
Velupillai, Sumithra; Skeppstedt, Maria; Kvist, Maria et al. (2014) Cue-based assertion classification for Swedish clinical text--developing a lexicon for pyConTextSwe. Artif Intell Med 61:137-44
Chapman, Wendy W; Hillert, Dieter; Velupillai, Sumithra et al. (2013) Extending the NegEx lexicon for multiple languages. Stud Health Technol Inform 192:677-81