Primary care physicians (PCPs) are responsible for reviewing and understanding a wide spectrum of a patient's medical history in order to make informed decisions about care. However, several factors impede this process, including the increasing complexity and number of diagnostic tests and treatments, health information exchange standards that may add more information to the medical record, and the need to see more patients in less time. These obstacles can inhibit dialogue between patients and providers and may even lead to medical errors. New methods are needed to expedite a healthcare provider's understanding of a patient's medical history by summarizing key information. The use of topic models to summarize large, unstructured data collections is a growing area of research; to date, however, little work has been done on adapting these models to the clinical reporting environment. This proposal seeks to develop a topic model and an accompanying visualization system for automatically summarizing medical records to support PCPs.
Two specific aims guide the proposed work: 1) to create a topic model of free-text clinical documents that integrates contextual patient- and document-level data and discovers multi-word concepts; and 2) to use the proposed model to drive a web application with concept-, source-, and time-oriented views for automatically summarizing patient records. The proposed model's innovation lies in its unique adaptation to clinical records through the incorporation of demographic and discrete data (e.g., lab results), which influence the discovery of topics in documents and allow the model to adapt to each patient's specific history. As a test bed for this project, we will gather medical records coded with myocardial infarction (MI), breast cancer, or liver cirrhosis, as these patients span a spectrum of clinical complexity. We estimate that 68,539 patient records will be included in this study. The topic model will be integrated into a web-based visualization that displays clinically pertinent topics over time, along with other relevant clinical data. PCPs will evaluate this visualization to gauge its utility in supporting the review of medical histories. This R21 proposal breaks new ground in the use of topic models for clinical data and will open future avenues of research into new applications of the proposed model.

Public Health Relevance

Primary care physicians are critical to the task of patient care. Underpinning this task is the time-intensive process of understanding complex interactions between past medical conditions and treatments and the current problems of each patient. The focus of this research proposal is the development of an automatic summarization system to expedite the review of a patient's medical history. Through future studies, such a system may enable increased patient-provider dialogue and improved clinical workflow.

National Institutes of Health (NIH)
National Library of Medicine (NLM)
Exploratory/Developmental Grants (R21)
Study Section: Biomedical Library and Informatics Review Committee (BLR)
Program Officer: Sim, Hua-Chuan
University of California Los Angeles
Schools of Medicine
Los Angeles
United States
Speier, William; Ong, Michael K; Arnold, Corey W (2016) Using phrases and document metadata to improve topic modeling of clinical reports. J Biomed Inform 61:260-6
Arnold, Corey W; Oh, Andrea; Chen, Shawn et al. (2016) Evaluating topic model interpretability from a primary care physician perspective. Comput Methods Programs Biomed 124:67-75