NLM Scrubber: NLM's Software Application to De-identify Clinical Text Documents

Kayaalp, Mehmet

Abstract

Narrative clinical reports contain a rich set of clinical knowledge that could be invaluable for clinical research. However, they may also contain personally identifiable information (PII) that causes those reports to be classified as protected health information (PHI), which is associated with use restrictions and risks to privacy. Computational de-identification seeks to remove all instances of PII from such narrative text in order to produce de-identified documents, which would no longer be classified as PHI and can be used in research with fewer constraints and with almost no risk to privacy. Computational de-identification uses pattern recognition and computational linguistic methods to recognize words and other alphanumeric tokens denoting PII (e.g., names, addresses, and telephone and social security numbers) in the text, and redacts them. In this way, patient privacy is protected and clinical knowledge is preserved. After exploring existing de-identification tools, the U.S. National Library of Medicine (NLM) began developing a new software application called NLM Scrubber, which is capable of de-identifying many types of clinical reports with high accuracy. The software design is based on both deterministic and probabilistic pattern recognition and on computational linguistic methods that use large dictionaries of personal names, addresses, and organizations. The application accepts narrative reports in plain text or in HL7 format. When the input reports are formatted as HL7 messages, the application uses the patient information embedded in the HL7 segments to find the same information in the text portion of the HL7 message. In November 2014, we released the first beta version of NLM Scrubber, which is freely downloadable from https://scrubber.nlm.nih.gov. NLM Scrubber performs quite well at detecting words and other alphanumeric tokens containing PII in dictated reports.
Our focus is on extending this work to further improve NLM Scrubber's de-identification performance across a large spectrum of identifiers and additional report types. NLM Scrubber will be used to de-identify the entire Biomedical Translational Research Information System (BTRIS) repository of clinical narrative reports at NIH, as well as the narrative pathology reports in the Surveillance, Epidemiology, and End Results (SEER) database maintained by the National Cancer Institute.
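
The approach outlined in the abstract (deterministic patterns, name dictionaries, and patient details taken from HL7 segments) can be illustrated with a minimal sketch. This is not NLM Scrubber's implementation: the function names, the tiny dictionary, the two regular expressions, and the sample message are all hypothetical, and the probabilistic and linguistic components are not shown. The sketch assumes the patient name is carried in the PID-5 field of an HL7 v2 message.

```python
import re

# Hypothetical stand-in for the large name dictionaries mentioned in the abstract.
NAME_DICTIONARY = {"john", "mary", "smith"}

# Illustrative deterministic patterns for two identifier types (US formats assumed).
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
}


def names_from_pid(hl7_message):
    """Collect lowercase name tokens from the PID-5 (patient name) field."""
    names = set()
    for segment in hl7_message.splitlines():
        fields = segment.split("|")
        if fields[0] == "PID" and len(fields) > 5:
            for component in fields[5].split("^"):
                if len(component) > 1:  # skip empty parts and single initials
                    names.add(component.lower())
    return names


def redact(text, extra_names=frozenset()):
    """Replace pattern matches and known name tokens with bracketed labels."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    known = NAME_DICTIONARY | set(extra_names)

    def replace_token(match):
        word = match.group(0)
        return "[NAME]" if word.lower() in known else word

    return re.sub(r"[A-Za-z]+", replace_token, text)


# Hypothetical HL7 v2 message: a PID segment and one narrative OBX segment.
message = (
    "PID|1||12345^^^MRN||DOE^JANE^Q\n"
    "OBX|1|TX|||Jane Doe, SSN 123-45-6789, saw Dr. Smith."
)
narrative = message.splitlines()[1].split("|")[5]  # OBX-5 holds the text here
print(redact(narrative, names_from_pid(message)))
```

A pure dictionary lookup like this would also redact ordinary words that happen to match a name, which is one reason a combination of deterministic and probabilistic methods, as the abstract describes, is needed in practice.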

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Library of Medicine (NLM)
Type: Investigator-Initiated Intramural Research Projects (ZIA)
Project #: 1ZIALM010002-08
Application #: 9554455
Support Year: 8
Fiscal Year: 2017

Institution

Name: National Library of Medicine

Related projects


NIH 2019 ZIA LM	NLM Scrubber: NLM's Software Application to De-identify Clinical Text Documents Kayaalp, Mehmet / National Library of Medicine
NIH 2018 ZIA LM	NLM Scrubber: NLM's Software Application to De-identify Clinical Text Documents Kayaalp, Mehmet / National Library of Medicine
NIH 2017 ZIA LM	NLM Scrubber: NLM's Software Application to De-identify Clinical Text Documents Kayaalp, Mehmet / National Library of Medicine
NIH 2016 ZIA LM	NLM Scrubber: NLM's Software Application to De-identify Clinical Text Documents Kayaalp, Mehmet / National Library of Medicine
NIH 2015 ZIA LM	NLM Scrubber: NLM's Software Application to De-identify Clinical Text Documents Kayaalp, Mehmet / National Library of Medicine
NIH 2014 ZIA LM	NLM's Software Application to De-identify Clinical Text Documents Kayaalp, Mehmet / National Library of Medicine
NIH 2013 ZIA LM	NLM's Software Application to De-identify Clinical Text Documents Kayaalp, Mehmet / National Library of Medicine	$379,288
NIH 2012 ZIA LM	NLM's Software Application to De-identify Clinical Text Documents Kayaalp, Mehmet / National Library of Medicine	$349,273
NIH 2011 ZIA LM	NLM's Software Application to De-identify Clinical Text Documents Kayaalp, Mehmet / National Library of Medicine	$333,200
NIH 2010 ZIA LM	NLM's Software Application to De-identify Clinical Text Documents Kayaalp, Mehmet / National Library of Medicine	$317,141

Publications

Kayaalp, Mehmet (2018) Patient Privacy in the Era of Big Data. Balkan Med J 35:8-17

Kayaalp, Mehmet; Browne, Allen C; Sagan, Pamela et al. (2015) Challenges and Insights in Using HIPAA Privacy Rule for Clinical Text Annotation. AMIA Annu Symp Proc 2015:707-16

Browne, Allen C; Kayaalp, Mehmet; Dodd, Zeyno A et al. (2014) The Challenges of Creating a Gold Standard for De-identification Research. AMIA Annu Symp Proc 2014:353-8

Huser, Vojtech; Kayaalp, Mehmet; Dodd, Zeyno A et al. (2014) Piloting a deceased subject integrated data repository and protecting privacy of relatives. AMIA Annu Symp Proc 2014:719-28

Kayaalp, Mehmet; Browne, Allen C; Dodd, Zeyno A et al. (2014) De-identification of Address, Date, and Alphanumeric Identifiers in Narrative Clinical Reports. AMIA Annu Symp Proc 2014:767-76

Kang, Yanna Shen; Kayaalp, Mehmet (2013) Extracting laboratory test information from biomedical text. J Pathol Inform 4:23

Kayaalp, Mehmet; Browne, Allen C; Callaghan, Fiona M et al. (2013) The pattern of name tokens in narrative clinical text and a comparison of five systems for redacting them. J Am Med Inform Assoc :

Fung, Kin Wah; Kayaalp, Mehmet; Callaghan, Fiona et al. (2013) Comparison of electronic pharmacy prescription records with manually collected medication histories in an emergency department. Ann Emerg Med 62:205-11
