Annotation, development and evaluation for clinical information extraction

Chapman, Wendy; Elhadad, Noemie; Savova, Guergana

Abstract

Much of the clinical information required for accurate clinical research, active decision support, and broad-coverage surveillance is locked in text files in an electronic medical record (EMR). The only feasible way to leverage this information for translational science is to extract and encode the information using natural language processing (NLP). Over the last two decades, several research groups have developed NLP tools for clinical notes, but a major bottleneck preventing progress in clinical NLP is the lack of standard, annotated data sets for training and evaluating NLP applications. Without these standards, individual NLP applications abound without the ability to train different algorithms on standard annotations, share and integrate NLP modules, or compare performance. We propose to develop standards and infrastructure that can enable technology to extract scientific information from textual medical records, and we propose the research as a collaborative effort involving NLP experts across the U.S. To accomplish this goal, we will address three specific aims:
Aim 1 : Extend existing standards and develop new consensus standards for annotating clinical text in a way that is interoperable, extensible, and usable.
Aim 2 : Apply existing methods and tools, and develop new methods and tools where necessary for manually annotating a set of publicly available clinical texts in a way that is efficient and accurate.
Aim 3 : Develop a publicly available toolkit for automatically annotating clinical text and perform a shared evaluation to evaluate the toolkit, using evaluation metrics that are multidimensional and flexible.

Public Health Relevance

In this project, we will develop a publicly available corpus of annotated clinical texts for NLP research. We will experiment with methods for increasing the efficiency of annotation and will annotate de-identified reports of nine types for linguistic and clinical information. In addition, we will create an NLP toolkit that can be shared and will evaluate it against other NLP systems in a shared task evaluation with the community.

Funding Agency

Agency: National Institute of Health (NIH)
Institute: National Institute of General Medical Sciences (NIGMS)
Type: Research Project (R01)
Project #: 5R01GM090187-04
Application #: 8288078
Study Section: Biomedical Computing and Health Informatics Study Section (BCHI)
Program Officer: Somers, Scott D

Project Start: 2010-09-01
Project End: 2014-06-30
Budget Start: 2012-07-01
Budget End: 2013-06-30
Support Year: 4
Fiscal Year: 2012
Total Cost: $663,130
Indirect Cost: $68,504

Institution

Name: University of California San Diego
Department: Biostatistics & Other Math Sci
Type: Schools of Arts and Sciences
DUNS #: 804355790

City: La Jolla
State: CA
Country: United States
Zip Code: 92093

Related projects


NIH 2013 R01 GM	Annotation, development and evaluation for clinical information extraction Chapman, Wendy W.; Elhadad, Noemie; Savova, Guergana K. / University of California San Diego	$634,107
NIH 2012 R01 GM	Annotation, development and evaluation for clinical information extraction Chapman, Wendy W.; Elhadad, Noemie; Savova, Guergana K. / University of California San Diego	$663,130
NIH 2011 R01 GM	Annotation, development and evaluation for clinical information extraction Chapman, Wendy W.; Elhadad, Noemie; Savova, Guergana K. / University of California San Diego	$664,617
NIH 2010 R01 GM	Annotation, development and evaluation for clinical information extraction Chapman, Wendy W.; Elhadad, Noemie; Savova, Guergana K. / University of Pittsburgh	$1
NIH 2010 R01 GM	Annotation, development and evaluation for clinical information extraction Chapman, Wendy W.; Elhadad, Noemie; Savova, Guergana K. / University of California San Diego	$642,650

Publications

Suominen, Hanna; Kelly, Liadh; Goeuriot, Lorraine (2018) Scholarly Influence of the Conference and Labs of the Evaluation Forum eHealth Initiative: Review and Bibliometric Study of the 2012 to 2017 Outcomes. JMIR Res Protoc 7:e10961

Mowery, Danielle L; Chapman, Brian E; Conway, Mike et al. (2016) Extracting a stroke phenotype risk factor from Veteran Health Administration clinical reports: an information content analysis. J Biomed Semantics 7:26

Mowery, Danielle L; South, Brett R; Christensen, Lee et al. (2016) Normalizing acronyms and abbreviations to aid patient understanding of clinical texts: ShARe/CLEF eHealth Challenge 2013, Task 2. J Biomed Semantics 7:43

Pradhan, Sameer; Elhadad, Noémie; South, Brett R et al. (2015) Evaluating the state of the art in disorder recognition and normalization of the clinical narrative. J Am Med Inform Assoc 22:143-54

Dligach, Dmitriy; Miller, Timothy; Savova, Guergana K (2015) Semi-supervised Learning for Phenotyping Tasks. AMIA Annu Symp Proc 2015:502-11

South, Brett R; Mowery, Danielle; Suo, Ying et al. (2014) Evaluating the effects of machine pre-annotation and an interactive annotation interface on manual de-identification of clinical text. J Biomed Inform 50:162-72

Dligach, Dmitriy; Bethard, Steven; Becker, Lee et al. (2014) Discovering body site and severity modifiers in clinical texts. J Am Med Inform Assoc 21:448-54

Chapman, Wendy W; Hillert, Dieter; Velupillai, Sumithra et al. (2013) Extending the NegEx lexicon for multiple languages. Stud Health Technol Inform 192:677-81

Dublin, Sascha; Baldwin, Eric; Walker, Rod L et al. (2013) Natural Language Processing to identify pneumonia from radiology reports. Pharmacoepidemiol Drug Saf 22:834-41

Zhang, Shaodian; Elhadad, Noémie (2013) Unsupervised biomedical named entity recognition: experiments with clinical and biological texts. J Biomed Inform 46:1088-98

Showing the most recent 10 out of 11 publications

Comments

Be the first to comment on Wendy Chapman's grant

Recent in Grantomics:

Recently viewed grants:

Recently added grants: