The proposed research will further develop and evaluate a probabilistic approach to disease surveillance. In this approach, a probabilistic case detection system (CDS) uses Bayesian diagnostic networks to compute, for every patient in a monitored population, the likelihood of that patient's findings under each of a set of infectious diseases. CDS computes these likelihoods from data in electronic medical records (EMRs), including information extracted from free-text reports by natural language processing. CDS makes these likelihoods available to a probabilistic outbreak detection and characterization component (ODCS). ODCS also uses a Bayesian approach to compute, given the information from CDS, the probability that an outbreak is ongoing for each of a set of infectious diseases of interest. It also computes probability distributions over the current and future size of a detected outbreak and over other characteristics, such as incubation period, that public health officials use when responding to an outbreak. The proposed research will extend the approach, which we have already developed and evaluated for influenza, to six additional respiratory infectious diseases. The research will also extend the capabilities of ODCS to use non-EMR data, detect an unknown disease, and detect and characterize concurrent outbreaks. The planned evaluations will measure the accuracy of both CDS and ODCS using historical surveillance data from two regions and simulated outbreak data, which we will create by adding outbreak cases generated by an agent-based epidemic simulator to real baseline surveillance data from non-outbreak periods. The innovation advanced by this research is a novel, integrated Bayesian approach to the early and accurate detection of cases of diseases that threaten health and to the detection and characterization of outbreaks of diseases that threaten public health.
The proposed approach has significant potential to improve the information available to public health officials and physicians, which can be expected to improve clinical and public health decision making, and ultimately to improve population health.
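As an illustration only, the way per-patient likelihoods from a case detection system can feed a Bayesian outbreak probability can be sketched in a few lines of Python. This is a deliberately simplified two-hypothesis model (outbreak versus no outbreak, each implying a fixed disease prevalence among monitored patients), not the model used by CDS or ODCS. The function name `outbreak_posterior`, its parameters, and all numeric values are hypothetical.

```python
import math


def outbreak_posterior(patient_likelihoods, prior_outbreak,
                       prev_outbreak, prev_baseline):
    """Posterior probability of an outbreak under a toy two-hypothesis model.

    patient_likelihoods: list of (P(findings | disease), P(findings | no disease))
        pairs, one per patient, standing in for case-detection output.
    prior_outbreak: prior probability that an outbreak is ongoing.
    prev_outbreak / prev_baseline: assumed disease prevalence among monitored
        patients with and without an outbreak (illustrative assumptions).
    """
    def log_evidence(prevalence):
        # Marginalize each patient's unknown disease status, then combine
        # patients by summing logs (patients are assumed independent).
        return sum(
            math.log(prevalence * l_dis + (1.0 - prevalence) * l_no)
            for l_dis, l_no in patient_likelihoods
        )

    log_out = math.log(prior_outbreak) + log_evidence(prev_outbreak)
    log_base = math.log(1.0 - prior_outbreak) + log_evidence(prev_baseline)

    # Normalize in a numerically stable way (subtract the larger log weight).
    top = max(log_out, log_base)
    w_out = math.exp(log_out - top)
    w_base = math.exp(log_base - top)
    return w_out / (w_out + w_base)


# Hypothetical example: ten patients whose findings favor the disease.
likelihoods = [(0.9, 0.1)] * 10
posterior = outbreak_posterior(likelihoods, prior_outbreak=0.1,
                               prev_outbreak=0.2, prev_baseline=0.02)
```

In this sketch, findings that are equally likely with or without the disease leave the outbreak probability at its prior, while findings that favor the disease raise it. A realistic system would replace the two fixed prevalences with a distribution over epidemic models and would track evidence over time.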

Public Health Relevance

The proposed research will improve the ability of public health officials and physicians to estimate the current incidence of influenza and other infectious diseases and to predict the future course of epidemics of those diseases. The improved information will better support decisions made by health departments to control epidemics, which is expected to reduce morbidity and mortality from epidemic diseases.

National Institutes of Health (NIH)
National Library of Medicine (NLM)
Research Project (R01)
Study Section
Special Emphasis Panel (ZLM1)
Program Officer
Sim, Hua-Chuan
University of Pittsburgh
Schools of Medicine
United States
Cooper, Gregory F; Villamarin, Ricardo; Rich Tsui, Fu-Chiang et al. (2015) A method for detecting and characterizing outbreaks of infectious disease from clinical reports. J Biomed Inform 53:15-26
López Pineda, Arturo; Ye, Ye; Visweswaran, Shyam et al. (2015) Comparison of machine learning classifiers for influenza detection from emergency department free-text reports. J Biomed Inform 58:60-9
Ye, Ye; Tsui, Fuchiang Rich; Wagner, Michael et al. (2014) Influenza detection from emergency department reports using natural language processing and Bayesian network classifiers. J Am Med Inform Assoc 21:815-23