Infectious disease surveillance programs provide public health workers with important information for predicting and understanding the emergence of epidemics, allowing for timely allocation of the resources needed to contain them. Collecting molecular sequence information from disease agents is becoming widespread, especially in surveillance of infectious diseases caused by RNA viruses, such as influenza. Phylodynamics is an emerging statistical framework that allows epidemiologists to harness the information present in disease agent sequences in order to shed light on the spatio-temporal population dynamics of these agents. Although sophisticated Bayesian inferential tools for phylodynamics have emerged in the last decade, these tools concentrate on sequence data alone, failing to integrate other sources of information (e.g., incidence time series data) into the phylodynamic framework. We claim that integrating multiple sources of information will make phylodynamic inference more precise, allowing for sharper predictions of disease dynamics and for statistical testing of scientific hypotheses. To test this assertion, we propose a series of new statistical methods for integrating multiple sources of information into Bayesian infectious disease phylodynamics. We will start by developing a new Bayesian method for estimating population dynamics directly from genomic data that combines the coalescent process, a powerful tool from population genetics, with modern Gaussian process-based Bayesian nonparametric inference (Aim 1). Our preliminary results show that the new method is more accurate than state-of-the-art Bayesian phylodynamic methods. Moreover, the proposed Gaussian process framework will free us from the drawbacks of the current methodology and will allow us to extend this approach to estimate correlations between population size fluctuations and other time-varying variables of interest (Aim 2).
This extension is significant because estimating such correlations is of paramount importance to infectious disease epidemiologists and because no current phylodynamic method is capable of such estimation. We will also develop a new model to confront the currently ignored dependence of the times at which disease agent sequences are sampled on the disease dynamics (Aim 2). Explicit modeling of these sampling times should improve both the accuracy and the precision of phylodynamic inference. In all our modeling efforts, we will pay close attention to the computational feasibility of the proposed methods by designing efficient Markov chain Monte Carlo algorithms to perform Bayesian inference. To test our new methodology, we will analyze benchmark infectious disease data sets, where available external information about disease dynamics will help us validate our methods. In addition, we will mine publicly available databases in order to perform novel data analyses using our newly developed methodology (Aim 3). One of the main deliverables of this research will be open source software implementing the proposed new Bayesian phylodynamic methods for integrating infectious disease sequence data with other sources of information.
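
As a rough illustration of the coalescent component mentioned above, the following Python sketch evaluates the standard coalescent log-likelihood of a set of coalescent times under a piecewise-constant effective population size. It is not the proposed Gaussian process method: it assumes all sequences are sampled at the same time (time 0, measured backward), and the function names, the `grid` breakpoints, and the `ne_values` trajectory are hypothetical choices made only for this example. A nonparametric approach along the lines described would place a prior on the population size trajectory instead of fixing it.

```python
import bisect
import math


def ne_at(t, grid, ne_values):
    """Effective population size at time t (hypothetical helper).

    grid[i] is the start of the i-th interval (grid[0] must be 0.0);
    ne_values[i] applies on [grid[i], grid[i+1]), and the last value
    extends to infinity.
    """
    idx = bisect.bisect_right(grid, t) - 1
    return ne_values[max(idx, 0)]


def inv_ne_integral(a, b, grid, ne_values):
    """Integral of 1 / Ne(t) over [a, b] for piecewise-constant Ne."""
    total = 0.0
    for i, ne in enumerate(ne_values):
        lo = grid[i]
        hi = grid[i + 1] if i + 1 < len(grid) else math.inf
        left, right = max(a, lo), min(b, hi)
        if right > left:
            total += (right - left) / ne
    return total


def coalescent_log_likelihood(coal_times, grid, ne_values):
    """Log-density of coalescent times given Ne(t).

    coal_times: increasing times of the n - 1 coalescent events for
    n lineages, all sampled at time 0 (an assumption of this sketch).
    """
    n = len(coal_times) + 1
    loglik = 0.0
    prev = 0.0
    for k, t in zip(range(n, 1, -1), coal_times):
        pairs = k * (k - 1) / 2.0  # number of lineage pairs while k lineages remain
        # Density of a coalescent event at time t ...
        loglik += math.log(pairs / ne_at(t, grid, ne_values))
        # ... times the probability of no event on (prev, t).
        loglik -= pairs * inv_ne_integral(prev, t, grid, ne_values)
        prev = t
    return loglik


if __name__ == "__main__":
    # Three lineages, constant Ne = 2 (illustrative values only).
    print(coalescent_log_likelihood([0.5, 1.5], [0.0], [2.0]))
```

With two lineages and a constant population size of 1, the function reduces to the log-density of a unit-rate exponential waiting time, which gives a quick sanity check on the implementation.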

Public Health Relevance

Monitoring infectious disease dynamics is important for detecting infectious disease epidemics early and for organizing a timely public health response to them. Disease agent sequence data is becoming an important source of information in infectious disease surveillance programs. We propose a series of new statistical methods for analyzing such sequence data. This new statistical methodology will enable epidemiologists to elucidate the population dynamics of infectious disease agents and to integrate sequence data with other data collected during infectious disease surveillance programs.

National Institutes of Health (NIH)
National Institute of Allergy and Infectious Diseases (NIAID)
Research Project (R01)
Study Section
Biostatistical Methods and Research Design Study Section (BMRD)
Program Officer
Gezmu, Misrak
University of Washington
Biostatistics & Other Math Sci
Schools of Arts and Sciences
United States
Dudas, Gytis; Carvalho, Luiz Max; Bedford, Trevor et al. (2017) Virus genomes reveal factors that spread and sustained the Ebola epidemic. Nature 544:309-315
Holbrook, Andrew; Vandenberg-Rodes, Alexander; Fortin, Norbert et al. (2017) A Bayesian supervised dual-dimensionality reduction model for simultaneous decoding of LFP and spike train signals. Stat (Int Stat Inst) 6:53-67
Karcher, Michael D; Palacios, Julia A; Lan, Shiwei et al. (2017) phylodyn: an R package for phylodynamic simulation and inference. Mol Ecol Resour 17:96-100
Zhang, Cheng; Shahbaba, Babak; Zhao, Hongkai (2017) Hamiltonian Monte Carlo acceleration using surrogate functions with random bases. Stat Comput 27:1473-1490
Baele, Guy; Suchard, Marc A; Rambaut, Andrew et al. (2017) Emerging Concepts of Data Integration in Pathogen Phylodynamics. Syst Biol 66:e47-e65
Vrancken, Bram; Suchard, Marc A; Lemey, Philippe (2017) Accurate quantification of within- and between-host HBV evolutionary rates requires explicit transmission chain modelling. Virus Evol 3:vex028
Ho, Lam Si Tung; Xu, Jason; Crawford, Forrest W et al. (2017) Birth/birth-death processes and their computable transition probabilities with biological applications. J Math Biol :
Bielejec, Filip; Baele, Guy; Rodrigo, Allen G et al. (2016) Identifying predictors of time-inhomogeneous viral evolutionary processes. Virus Evol 2:vew023
Worobey, Michael; Watts, Thomas D; McKay, Richard A et al. (2016) 1970s and 'Patient 0' HIV-1 genomes illuminate early HIV/AIDS history in North America. Nature 539:98-101
Baele, Guy; Lemey, Philippe; Suchard, Marc A (2016) Genealogical Working Distributions for Bayesian Model Testing with Phylogenetic Uncertainty. Syst Biol 65:250-64

Showing the most recent 10 out of 45 publications