A critical element of translating science into practice is the ability to find patient populations for clinical research. Many studies rely on administrative data to select relevant patients for studies of comparative effectiveness, but the limitations of administrative data are well known. Much of the information critical for clinical research is locked in free-text dictated reports, such as history and physical exams and radiology reports. Data repositories, such as the Medical Archival Retrieval System (MARS) at the University of Pittsburgh, are useful for identifying supersets of patients for clinical research studies through indexed word searches. However, simple text-based queries are also limited in their effectiveness, and researchers are often left reading through hundreds or thousands of reports to filter out false-positive cases. Current processes are time-consuming and extraordinarily expensive. They lead to long delays between the development of a testable hypothesis and the ability to share findings with the medical community at large. A potential solution to this problem is pre-annotating de-identified clinical reports to facilitate more intelligent and sophisticated retrieval and review. Clinical reports are rich in meaning and structure and can be annotated at many different levels using natural language processing (NLP) technology. It is not clear, however, what types of annotations would be most helpful to a clinical researcher, nor is it clear how to display the annotations to best assist manual review of reports. The annotation schema used by an NLP system and the user interface that helps researchers retrieve data for retrospective studies are interdependent. In this proposal, we will interactively revise an NLP annotation schema and explore various methods for annotation display, based on feedback from users reviewing patient data for specific research studies.
We hypothesize that an interactive search application that relies on NLP-annotated clinical text will increase the accuracy and efficiency of finding patients for clinical research studies and will support visualization techniques for viewing the data in a way that improves a researcher's ability to review patient data.

Public Health Relevance

We will develop a novel review application for this proposal that will facilitate translational research from secondary use of EHR data by assisting researchers in more efficiently finding retrospective populations of patients for clinical research studies. The application will rely both on multi-layered annotation of the textual data, using natural language processing, and on coordinated views of the patient data.

National Institutes of Health (NIH)
National Library of Medicine (NLM)
Research Project (R01)
Study Section: Special Emphasis Panel (ZLM1-ZH-C (01))
Program Officer: Sim, Hua-Chuan
University of Utah
Salt Lake City
United States
Scuba, William; Tharp, Melissa; Mowery, Danielle et al. (2016) Knowledge Author: facilitating user-driven, domain content development to support clinical information extraction. J Biomed Semantics 7:42
Conway, Mike; Khojoyan, Artem; Fana, Fariba et al. (2016) Developing a web-based SKOS editor. J Biomed Semantics 7:5
Mowery, Danielle L; Chapman, Brian E; Conway, Mike et al. (2016) Extracting a stroke phenotype risk factor from Veteran Health Administration clinical reports: an information content analysis. J Biomed Semantics 7:26
Velupillai, S; Mowery, D; South, B R et al. (2015) Recent Advances in Clinical Natural Language Processing in Support of Semantic Analysis. Yearb Med Inform 10:183-93
Velupillai, Sumithra; Mowery, Danielle L; Abdelrahman, Samir et al. (2015) Towards a Generalizable Time Expression Model for Temporal Reasoning in Clinical Notes. AMIA Annu Symp Proc 2015:1252-9
Velupillai, Sumithra; Skeppstedt, Maria; Kvist, Maria et al. (2014) Cue-based assertion classification for Swedish clinical text--developing a lexicon for pyConTextSwe. Artif Intell Med 61:137-44
Chapman, Wendy W; Hillert, Dieter; Velupillai, Sumithra et al. (2013) Extending the NegEx lexicon for multiple languages. Stud Health Technol Inform 192:677-81