The goals of our project are as follows:

1. Create a corpus of temporally annotated data. Under the supervision of our consultants Dr. Frank Sacks, Dr. Vincent Carey, and two Registered Nurses, we will create a gold-standard annotation of events and temporal information within patient narratives from de-identified Electronic Health Record (EHR) data using the CLEF and TimeML guidelines. We will use the framework of the Brandeis Annotation Tool, a system we have designed to facilitate the quick construction of accurately annotated corpora against a specified guideline. Extensions to the current event library and lexicon with medical event references will be made during the annotation process, under the guidance of the Registered Nurses.

2. Adapt the TARSQI Toolkit (TTK) to targeted temporal properties and relations in the EHR domain. We will use the TARSQI Toolkit, a robust set of temporal processing algorithms we have designed for parsing natural language text, to automatically annotate the events and temporal information in EHR data. Combined with the Brandeis AcroMed Medical Abbreviation Server and the terms introduced in Aim 1, we will employ the Specialist Lexicon and other medical resources to extend the toolkit's capabilities for recognizing and interpreting medical event information. Algorithms for identifying events, temporal expressions, and event anchorings and orderings will be trained against the gold standard created in Aim 1, and tested against held-out data.

3. Create a cross-document temporal database of medical events. Using the recognition algorithms introduced in Aim 2, we will create a searchable, temporally ordered database of medical events such as diseases, symptoms, surgeries/interventions, and test results. Events referred to multiple times in the data will be merged using a constraint-satisfaction analysis in order to create a more coherent narrative for a single patient over multiple records.
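
The merging and ordering step described in Aim 3 can be illustrated with a small sketch. This is not the TARSQI Toolkit's algorithm or the project's constraint-satisfaction analysis. It is a simplified stand-in that assumes only two kinds of input: pairs of mentions judged to refer to the same event, and pairs related by a BEFORE-style ordering. The function name `merge_and_order` and the mention identifiers are hypothetical. Coreferent mentions are merged with a union-find structure, the merged events are ordered with a topological sort, and contradictory constraints are reported as an error.

```python
from collections import defaultdict, deque


def merge_and_order(mentions, same_event, before):
    """Merge coreferent event mentions and order the merged events.

    mentions:   list of mention ids (hypothetical identifiers)
    same_event: pairs (a, b) of mentions assumed to denote one event
    before:     pairs (a, b) meaning mention a precedes mention b
    Returns a list of merged events (each a sorted list of mention ids)
    in one order consistent with the constraints.
    Raises ValueError if the constraints contradict each other.
    """
    parent = {m: m for m in mentions}

    def find(m):
        # Follow parent links to the group representative,
        # shortening the path along the way.
        while parent[m] != m:
            parent[m] = parent[parent[m]]
            m = parent[m]
        return m

    # Merge mentions that refer to the same event.
    for a, b in same_event:
        parent[find(a)] = find(b)

    groups = defaultdict(list)
    for m in mentions:
        groups[find(m)].append(m)

    # Lift the ordering constraints from mentions to merged events.
    successors = defaultdict(set)
    indegree = {root: 0 for root in groups}
    for a, b in before:
        ra, rb = find(a), find(b)
        if ra == rb:
            raise ValueError(f"{a} and {b} are merged but ordered")
        if rb not in successors[ra]:
            successors[ra].add(rb)
            indegree[rb] += 1

    # Kahn's algorithm: repeatedly emit events with no unplaced predecessor.
    queue = deque(sorted(r for r, d in indegree.items() if d == 0))
    ordered = []
    while queue:
        root = queue.popleft()
        ordered.append(sorted(groups[root]))
        for nxt in sorted(successors[root]):
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)

    if len(ordered) != len(groups):
        raise ValueError("ordering constraints contain a cycle")
    return ordered


if __name__ == "__main__":
    timeline = merge_and_order(
        mentions=["admit_note1", "surgery_note1",
                  "surgery_note2", "discharge_note2"],
        same_event=[("surgery_note1", "surgery_note2")],
        before=[("admit_note1", "surgery_note1"),
                ("surgery_note2", "discharge_note2")],
    )
    for event in timeline:
        print(event)
```

In this toy input, the surgery is mentioned in two notes. Merging the two mentions lets the ordering from the first note (admission before surgery) combine with the ordering from the second (surgery before discharge) into a single patient timeline. A fuller treatment would handle the other temporal relation types and partial or uncertain orderings, which this sketch does not attempt.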

Public Health Relevance

It is becoming increasingly common for medical researchers to use Electronic Health Records (EHRs) as a primary source of data for researching correlations between various medical issues and concepts. However, EHRs typically contain unstructured text, making them difficult to mine. This research will create a database of temporal orderings from events extracted from EHR patient narratives, using algorithms previously applied to news articles.

Agency
National Institutes of Health (NIH)
Institute
National Library of Medicine (NLM)
Type
Exploratory/Developmental Grants (R21)
Project #
5R21LM009633-02
Application #
7941063
Study Section
Biomedical Library and Informatics Review Committee (BLR)
Program Officer
Sim, Hua-Chuan
Project Start
2009-09-30
Project End
2012-09-29
Budget Start
2010-09-30
Budget End
2012-09-29
Support Year
2
Fiscal Year
2010
Total Cost
$175,973
Indirect Cost
Name
Brandeis University
Department
Biostatistics & Other Math Sci
Type
Schools of Arts and Sciences
DUNS #
616845814
City
Waltham
State
MA
Country
United States
Zip Code
02454