Medical errors have been shown to be the third leading cause of death in the United States. The Institute of Medicine and several state legislatures have recommended the use of patient safety event reporting systems (PSRS) to better understand and improve safety hazards. Numerous healthcare providers have adopted these systems, which provide a framework for healthcare provlder staff to report patient safety events. Public databases like MAUDE and VAERS have also been created to collect and trend safety events across healthcare systems. A patient safety event (PSE) report generally consists of both structured and unstructured data elements. Structured data are pre-defined, fixed fields that solicit specific information about the event. The unstructured data fields generally include a free text field where the reporter can enter a text description of the event. The text descriptions are often a rich data source in that the reporter ls not constrained to limited categories or selection options and is able to freely descrlbe the details of the event. The goal of this project is to develop novel statistical methods to analyze unstructured text like patient safety event reports arising in healthcare, which can lead to significant improvements to patient safety and enable timely intervention strategies. We address three problems: (a) Building realistic and meaningful baseline models for near misses, and detecting systematic deterioration of adverse outcomes relative to such baselines; (b) Understanding critical factors that lead to near misses & quantifying severity of outcomes; and (c) ldentifylng document groups of interest. We will use novel statistical approaches that combine Natural Language Processing with Statistical Process Monitoring, Statistical Networks Analysis, and Spatio-temporal Modeling to build a generalizable toolbox that can address these issues in healthcare. An important advantage of our research team is the involvement of healthcare domain e~perts and access to frontline staff, and we will leverage this strength to develop our algorithms. A key feature of our work is the generalizability of our methods, which will be applicable to biomedical documents arising across a remarkable variety of areas, such as patient safety and equipment malfunction reports, electronic health records, adverse drug or vaccine reports, etc. We will also release open source software via R packages & GitHub, which will enable healthcare staff and researchers to execute our methods on their datasets.

Public Health Relevance

}: Estimates of preventable adverse events in healthcare are staggering, despite the frequently cited Institute of Medicine (IOM) report that first brought attention to the problem over ten years ago. Identifying temporal trends and patterns in the data is particularly important to improving patient safety and patient care. Using our algorithms to effectively analyze documents from reporting systems has the potential to dramatically improve the safety and quality of care by exposing possible weaknesses in the care process.

National Institute of Health (NIH)
National Library of Medicine (NLM)
Research Project (R01)
Project #
Application #
Study Section
Special Emphasis Panel (ZLM1)
Program Officer
Ye, Jane
Project Start
Project End
Budget Start
Budget End
Support Year
Fiscal Year
Total Cost
Indirect Cost
Virginia Polytechnic Institute and State University
Biostatistics & Other Math Sci
Schools of Arts and Sciences
United States
Zip Code