NLM's Software Application to De-identify Clinical Text Documents

Kayaalp, Mehmet

Abstract

Clinical text documents contain a rich set of clinical knowledge that is invaluable for clinical research. Unfortunately, they remain a largely untapped resource since disseminating such data as-is would jeopardize the privacy of patients and reveal protected health information. Computational de-identification is a means to overcome this problem. It involves processing clinical text documents using natural language processing (NLP) tools and techniques, recognizing patient-related individually identifiable information (e.g., names, addresses, and telephone and social security numbers) in the text, and redacting only those identifiers. In this way, patient privacy is protected and clinical knowledge is preserved. Without computational tools, de-identification places a heavy burden on clinicians' shoulders, but it is a necessary step for protecting patient privacy as mandated by both the Privacy Rule of the Health Insurance Portability and Accountability Act (HIPAA) and the Privacy Act of 1974. After exploring existing de-identification tools, the U.S. National Library of Medicine (NLM) is developing new software that is capable of de-identifying many kinds of clinical text documents with high accuracy. The software design uses a number of deterministic and probabilistic pattern recognition algorithms and various computational linguistic methods. We are using many large datasets of names, addresses, and organizations, all of which have the potential to identify patients, in order to find and remove such content from the text. The application accepts documents in plain text or in HL7 format. If documents are provided in HL7 format, the application uses the patient-related information embedded in various HL7 segments and fields to find and remove that information, including variants with typographical errors and misspellings, from the body of the text with high accuracy.
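As a rough illustration of the approach described above, the following Python sketch combines deterministic patterns with approximate matching of names taken from an HL7 message. It is not NLM's implementation: the function names, the simplified U.S. phone and SSN patterns, the similarity threshold of 0.85, and the sample message are all assumptions made for this example. The only HL7 detail relied on is that PID-5 carries the patient name with components separated by "^".

```python
import re
from difflib import SequenceMatcher

# Deterministic patterns for two identifier types mentioned in the abstract.
# These simplified U.S. formats are illustrative assumptions.
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-. ]\d{3}[-. ]\d{4}\b"),
}


def patient_names(hl7_message):
    """Collect name components from PID-5 (patient name) of an HL7 v2 message."""
    names = []
    for segment in re.split(r"[\r\n]+", hl7_message):
        fields = segment.split("|")
        if fields[0] == "PID" and len(fields) > 5:
            # Components are separated by "^"; skip initials and empty parts.
            names += [part.lower() for part in fields[5].split("^") if len(part) > 1]
    return names


def redact(text, names, threshold=0.85):
    """Replace near-matches of known names, then pattern-based identifiers."""
    def replace_word(match):
        word = match.group(0).lower()
        for name in names:
            # A similarity ratio tolerates small typos and misspellings.
            if SequenceMatcher(None, word, name).ratio() >= threshold:
                return "[NAME]"
        return match.group(0)

    text = re.sub(r"[A-Za-z]+", replace_word, text)
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


# Hypothetical message and note, invented for this example.
message = "MSH|^~\\&|LAB|HOSP\rPID|1||12345^^^HOSP^MR||JOHNSON^MARY^A||19600101|F"
note = "Mrs. Jonson (SSN 123-45-6789) called from 301-555-0100 about her visit."
print(redact(note, patient_names(message)))
# prints: Mrs. [NAME] (SSN [SSN]) called from [PHONE] about her visit.
```

The misspelled "Jonson" is caught because the name from the PID segment is compared by similarity rather than by exact match. A production system would need far broader patterns, name dictionaries, and context-sensitive (probabilistic) models, as the abstract indicates.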
The application software includes an editor for visualization and markup called the Visual Tagging Tool (VTT). Although designed specifically for tagging identifiers that contain personally identifiable protected health information, VTT will be made publicly available to the greater NLP community for broader lexical tagging and text annotation. We are beginning a series of studies to assess de-identification performance on a large corpus of tagged clinical documents. Preliminary results suggest that computational de-identification methods may attain at least 99% sensitivity and 99% specificity across a large spectrum of identifiers containing personally identifiable information.
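For readers unfamiliar with the two measures named above, sensitivity is TP / (TP + FN) and specificity is TN / (TN + FP), where a "positive" is a token that is an identifier. The sketch below computes both from token-level labels; the function name and the toy labels are assumptions for illustration only and are not data from this project.

```python
def sensitivity_specificity(gold, predicted):
    """Compute (sensitivity, specificity) from parallel lists of booleans.

    gold[i] is True when token i is an identifier in the tagged reference;
    predicted[i] is True when the de-identifier redacted token i.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    pairs = list(zip(gold, predicted))
    tp = sum(g and p for g, p in pairs)          # identifiers redacted
    fn = sum(g and not p for g, p in pairs)      # identifiers missed
    tn = sum(not g and not p for g, p in pairs)  # clinical text preserved
    fp = sum(not g and p for g, p in pairs)      # clinical text over-redacted
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one identifier and one non-identifier token")
    return tp / (tp + fn), tn / (tn + fp)


# Toy labels, invented for this example.
gold = [True, True, False, False, False, True]
predicted = [True, False, False, True, False, True]
sens, spec = sensitivity_specificity(gold, predicted)
print(f"sensitivity={sens:.3f} specificity={spec:.3f}")
```

High sensitivity corresponds to protecting patient privacy (few missed identifiers), while high specificity corresponds to preserving clinical knowledge (few non-identifiers redacted).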

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Library of Medicine (NLM)
Type: Investigator-Initiated Intramural Research Projects (ZIA)
Project #: 1ZIALM010002-01
Application #: 8158053
Support Year: 1
Fiscal Year: 2010
Total Cost: $317,141

Institution

Name: National Library of Medicine

Related projects


NIH 2019 ZIA LM	NLM Scrubber: NLM's Software Application to De-identify Clinical Text Documents Kayaalp, Mehmet / National Library of Medicine
NIH 2018 ZIA LM	NLM Scrubber: NLM's Software Application to De-identify Clinical Text Documents Kayaalp, Mehmet / National Library of Medicine
NIH 2017 ZIA LM	NLM Scrubber: NLM's Software Application to De-identify Clinical Text Documents Kayaalp, Mehmet / National Library of Medicine
NIH 2016 ZIA LM	NLM Scrubber: NLM's Software Application to De-identify Clinical Text Documents Kayaalp, Mehmet / National Library of Medicine
NIH 2015 ZIA LM	NLM Scrubber: NLM's Software Application to De-identify Clinical Text Documents Kayaalp, Mehmet / National Library of Medicine
NIH 2014 ZIA LM	NLM's Software Application to De-identify Clinical Text Documents Kayaalp, Mehmet / National Library of Medicine
NIH 2013 ZIA LM	NLM's Software Application to De-identify Clinical Text Documents Kayaalp, Mehmet / National Library of Medicine	$379,288
NIH 2012 ZIA LM	NLM's Software Application to De-identify Clinical Text Documents Kayaalp, Mehmet / National Library of Medicine	$349,273
NIH 2011 ZIA LM	NLM's Software Application to De-identify Clinical Text Documents Kayaalp, Mehmet / National Library of Medicine	$333,200
NIH 2010 ZIA LM	NLM's Software Application to De-identify Clinical Text Documents Kayaalp, Mehmet / National Library of Medicine	$317,141

Publications

Kayaalp, Mehmet (2018) Patient Privacy in the Era of Big Data. Balkan Med J 35:8-17

Kayaalp, Mehmet; Browne, Allen C; Sagan, Pamela et al. (2015) Challenges and Insights in Using HIPAA Privacy Rule for Clinical Text Annotation. AMIA Annu Symp Proc 2015:707-16

Browne, Allen C; Kayaalp, Mehmet; Dodd, Zeyno A et al. (2014) The Challenges of Creating a Gold Standard for De-identification Research. AMIA Annu Symp Proc 2014:353-8

Huser, Vojtech; Kayaalp, Mehmet; Dodd, Zeyno A et al. (2014) Piloting a deceased subject integrated data repository and protecting privacy of relatives. AMIA Annu Symp Proc 2014:719-28

Kayaalp, Mehmet; Browne, Allen C; Dodd, Zeyno A et al. (2014) De-identification of Address, Date, and Alphanumeric Identifiers in Narrative Clinical Reports. AMIA Annu Symp Proc 2014:767-76

Kang, Yanna Shen; Kayaalp, Mehmet (2013) Extracting laboratory test information from biomedical text. J Pathol Inform 4:23

Kayaalp, Mehmet; Browne, Allen C; Callaghan, Fiona M et al. (2013) The pattern of name tokens in narrative clinical text and a comparison of five systems for redacting them. J Am Med Inform Assoc :

Fung, Kin Wah; Kayaalp, Mehmet; Callaghan, Fiona et al. (2013) Comparison of electronic pharmacy prescription records with manually collected medication histories in an emergency department. Ann Emerg Med 62:205-11
