Narratives of electronic health records (EHRs) contain useful information that is difficult to automatically extract, index, search, or interpret. Clinical natural language processing (NLP) technologies for automatic extraction, indexing, searching, and interpretation of EHRs are in development; however, due to privacy concerns related to EHRs, such technologies are usually developed by teams that have privileged access to EHRs in a specific institution. Technologies that are tailored to a specific set of data from a given institution generate inspiring results on that data; however, they can fail to generalize to similar data from other institutions and even from other departments of the same institution. Therefore, learning from these technologies and building on them becomes difficult. In order to improve NLP in EHRs, there is a need for head-to-head comparison of approaches that can address a given task on the same data set. Shared tasks provide one way of conducting systematic head-to-head comparisons. This proposal describes a series of shared-task challenges and conferences, spread over a five-year period, that promote the development and evaluation of cutting-edge clinical NLP systems by distributing de-identified EHRs to the broad research community, under data use agreements, so that:
* the state of the art in clinical NLP technologies can be identified and advanced,
* a set of technologies that enable the use of the information contained in EHR narratives becomes available, and
* the information from EHR narratives can be made more accessible, for example, for clinical and medical research.
The scientific activities supporting the organization of the shared-task challenges are sponsored in part by Informatics for Integrating Biology and the Bedside (i2b2), grant number U54-LM008748, PI: Kohane.
This proposal aims to organize a series of workshops, conference proceedings, and journal special issues that will accompany the shared-task challenges in order to disseminate the knowledge generated by the challenges.

Public Health Relevance

This proposal will address two main challenges related to the use of clinical narratives for research: the availability of clinical records for research, and the identification of the state of the art in clinical natural language processing (NLP) technologies, so that the state of the art can be pushed forward and future work can build on the past. Progress in clinical NLP will improve access to electronic health records for research and for clinical applications, benefiting healthcare and public health.

National Institutes of Health (NIH)
National Library of Medicine (NLM)
Conference (R13)
Study Section
Biomedical Library and Informatics Review Committee (BLR)
Program Officer
Sim, Hua-Chuan
State University of New York at Albany
Schools of Arts and Sciences
United States
Chasin, Rachel; Rumshisky, Anna; Uzuner, Ozlem et al. (2014) Word sense disambiguation in the clinical domain: a comparison of knowledge-rich and knowledge-poor unsupervised methods. J Am Med Inform Assoc 21:842-9
Roberts, Kirk; Rink, Bryan; Harabagiu, Sanda M (2013) A flexible framework for recognizing events, temporal expressions, and temporal relations in clinical text. J Am Med Inform Assoc 20:867-75
Grouin, Cyril; Grabar, Natalia; Hamon, Thierry et al. (2013) Eventual situations for timeline extraction from clinical reports. J Am Med Inform Assoc 20:820-7
Kovacevic, Aleksandar; Dehghan, Azad; Filannino, Michele et al. (2013) Combining rules and machine learning for extraction of temporal expressions and events from clinical narratives. J Am Med Inform Assoc 20:859-66
Sun, Weiyi; Rumshisky, Anna; Uzuner, Ozlem (2013) Annotating temporal information in clinical narratives. J Biomed Inform 46 Suppl:S5-12
D'Souza, Jennifer; Ng, Vincent (2013) Classifying temporal relations in clinical data: a hybrid, knowledge-rich approach. J Biomed Inform 46 Suppl:S29-39
Sun, Weiyi; Rumshisky, Anna; Uzuner, Ozlem (2013) Evaluating temporal relations in clinical text: 2012 i2b2 Challenge. J Am Med Inform Assoc 20:806-13
Sun, Weiyi; Rumshisky, Anna; Uzuner, Ozlem (2013) Temporal reasoning over clinical text: the state of the art. J Am Med Inform Assoc 20:814-9
Tang, Buzhou; Wu, Yonghui; Jiang, Min et al. (2013) A hybrid system for temporal information extraction from clinical text. J Am Med Inform Assoc 20:828-35
Sohn, Sunghwan; Wagholikar, Kavishwar B; Li, Dingcheng et al. (2013) Comprehensive temporal information detection from clinical text: medical events, time, and TLINK identification. J Am Med Inform Assoc 20:836-42