Our interdisciplinary research team will develop algorithms to accelerate the detection of respiratory virus outbreaks at an unprecedented local scale in US cities. We propose to advance outbreak detection by combining machine learning data integration methods and spatial models of disease transmission. The dynamic models that will be developed will provide mechanistic engines for distinguishing typical from atypical disease trends and the optimization methods evaluate the informativeness of data sources to achieve specified public health goals through the rapid evaluation of diverse input data sources. Working with local healthcare and public health leaders, we will translate the algorithms into user-friendly online tools to support preparedness plans and decision-making. Our proposed research is organized around three major aims.
In Aim 1, we will apply machine learning and signal processing methods to build systems that track the earliest indicators of emerging outbreaks within seven US cities. We will evaluate non-clinical data reflecting early and mild symptoms as well as clinical data covering underserved communities and geographic and demographic hotspots for viral emergence.
In Aim 2, we will develop sub-city scale models reflecting the syndemics of co-circulating respiratory viruses and chronic respiratory diseases (CRD) that can exacerbate viral infections. We will infer viral transmission rates and socio-environmental risk cofactors by fitting the model to respiratory disease data extracted from millions of electronic health records (EHRs) for the last nine years. We will then partner with clinical and EHR experts to translate our models into the first outbreak detection system for severe respiratory viruses that incorporates EHR data on CRDs. Using machine learning techniques, we will further integrate other surveillance, environmental, behavioral and internet predictor data sources to maximize the accuracy, sensitivity, speed and population coverage of our algorithms.
In Aim 3, we will develop an open-access Python toolkit to facilitate the integration of next generation data into outbreak surveillance models. This project will produce practical early warning algorithms for detecting emerging viral threats at high spatiotemporal resolution in several US cities, elucidate socio-geographic gaps in current surveillance systems and hotspots for viral emergence, and provide a robust design framework for extrapolating these algorithms to other US cities.

Public Health Relevance

We will develop innovative algorithms for detecting emerging respiratory viruses within US cities. To do so, we will model the syndemic dynamics of respiratory viruses and chronic respiratory diseases and apply machine learning to combine geospatial data that track early indicators of emerging threats. Working with local public health and healthcare collaborators, we will translate this research into practical tools for addressing socio- geographic gaps in surveillance and accelerating the detection, prevention and mitigation of severe outbreaks.

National Institute of Health (NIH)
National Institute of Allergy and Infectious Diseases (NIAID)
Research Project (R01)
Project #
Application #
Study Section
Infectious Diseases, Reproductive Health, Asthma and Pulmonary Conditions Study Section (IRAP)
Program Officer
Kim, Sonnie
Project Start
Project End
Budget Start
Budget End
Support Year
Fiscal Year
Total Cost
Indirect Cost
Yale University
Public Health & Prev Medicine
Schools of Medicine
New Haven
United States
Zip Code