Infectious disease surveillance programs provide public health workers with important information for predicting and understanding the emergence of epidemics, allowing for timely allocation of the resources needed to contain them. Collecting molecular sequence information from disease agents is becoming widespread, especially in surveillance of infectious diseases caused by RNA viruses, such as influenza. Phylodynamics is an emerging statistical framework that allows epidemiologists to harness the information present in disease agent sequences in order to shed light on the spatio-temporal population dynamics of these agents. Although sophisticated Bayesian inferential tools for phylodynamics have emerged in the last decade, these tools concentrate on sequence data alone, failing to integrate other sources of information (e.g., incidence time series data) into the phylodynamic framework. We claim that integrating multiple sources of information will make phylodynamic inference more precise, allowing for sharper predictions of disease dynamics and for statistical testing of scientific hypotheses. To test this assertion, we propose a series of new statistical methods for integrating multiple sources of information into Bayesian infectious disease phylodynamics. We will start by developing a new Bayesian method for estimating population dynamics directly from genomic data that combines the coalescent process, a powerful tool from population genetics, with modern Gaussian process-based Bayesian nonparametric inference (Aim 1). Our preliminary results show that the new method is more accurate than state-of-the-art Bayesian phylodynamic methods. Moreover, the proposed Gaussian process framework will free us from the drawbacks of the current methodology and will allow us to extend this approach to estimate correlations between population size fluctuations and other time-varying variables of interest (Aim 2).
This extension is significant because estimating such correlations is of paramount importance to infectious disease epidemiologists and because no current phylodynamic method is capable of such estimation. We will also develop a new model to account for the currently ignored dependence of the times at which disease agent sequences are sampled on the disease dynamics (Aim 2). Explicit modeling of these sampling times should improve both the accuracy and the precision of phylodynamic inference. In all our modeling efforts, we will pay close attention to the computational feasibility of the proposed methods by designing efficient Markov chain Monte Carlo algorithms to perform Bayesian inference. To test our new methodology, we will analyze benchmark infectious disease data sets, where available external information about disease dynamics will help us validate our methods. In addition, we will mine publicly available databases in order to perform novel data analyses using our newly developed methodology (Aim 3). One of the main deliverables of this research will be open source software implementing the proposed new Bayesian phylodynamic methods for integrating infectious disease sequence data with other sources of information.

Public Health Relevance

Monitoring infectious disease dynamics is important for timely detection of infectious disease epidemics and for organizing a timely public health response to these epidemics. Disease agent sequence data is becoming an important source of information in infectious disease surveillance programs. We propose a series of new statistical methods for analyzing such sequence data. This new statistical methodology will enable epidemiologists to elucidate the population dynamics of infectious disease agents and to integrate sequence data with other data collected during infectious disease surveillance programs.

Agency: National Institutes of Health (NIH)
Institute: National Institute of Allergy and Infectious Diseases (NIAID)
Type: Research Project (R01)
Project #: 5R01AI107034-02
Application #: 8664800
Study Section: Biostatistical Methods and Research Design Study Section (BMRD)
Program Officer: Gezmu, Misrak
Project Start: 2013-05-22
Project End: 2018-04-30
Budget Start: 2014-05-01
Budget End: 2015-04-30
Support Year: 2
Fiscal Year: 2014
Total Cost: $346,673
Indirect Cost: $37,316

Name: University of Washington
Department: Biostatistics & Other Math Sci
Type: Schools of Arts and Sciences
DUNS #: 605799469
City: Seattle
State: WA
Country: United States
Zip Code: 98195

Suchard, Marc A; Lemey, Philippe; Baele, Guy et al. (2018) Bayesian phylogenetic and phylodynamic data integration using BEAST 1.10. Virus Evol 4:vey016
Ho, Lam Si Tung; Xu, Jason; Crawford, Forrest W et al. (2018) Birth/birth-death processes and their computable transition probabilities with biological applications. J Math Biol 76:911-944
Dinh, Vu; Tung Ho, Lam Si; Suchard, Marc A et al. (2018) Consistency and convergence rate of phylogenetic inference via regularization. Ann Stat 46:1481-1512
Tolkoff, Max R; Alfaro, Michael E; Baele, Guy et al. (2018) Phylogenetic Factor Analysis. Syst Biol 67:384-399
Crawford, Forrest W; Ho, Lam Si Tung; Suchard, Marc A (2018) Computational methods for birth-death processes. Wiley Interdiscip Rev Comput Stat 10:
Rambaut, Andrew; Drummond, Alexei J; Xie, Dong et al. (2018) Posterior Summarization in Bayesian Phylogenetics Using Tracer 1.7. Syst Biol 67:901-904
Cybis, Gabriela B; Sinsheimer, Janet S; Bedford, Trevor et al. (2018) Bayesian nonparametric clustering in phylogenetics: modeling antigenic evolution in influenza. Stat Med 37:195-206
Faulkner, James R; Minin, Vladimir N (2018) Locally Adaptive Smoothing with Markov Random Fields and Shrinkage Priors. Bayesian Anal 13:225-252
Holbrook, Andrew; Vandenberg-Rodes, Alexander; Fortin, Norbert et al. (2017) A Bayesian supervised dual-dimensionality reduction model for simultaneous decoding of LFP and spike train signals. Stat (Int Stat Inst) 6:53-67
Dudas, Gytis; Carvalho, Luiz Max; Bedford, Trevor et al. (2017) Virus genomes reveal factors that spread and sustained the Ebola epidemic. Nature 544:309-315

Showing the most recent 10 out of 53 publications