Infectious disease surveillance programs provide public health workers with important information for predicting and understanding the emergence of epidemics, allowing timely allocation of the resources needed to contain them. Collecting molecular sequence data from disease agents is becoming widespread, especially in surveillance of infectious diseases caused by RNA viruses such as influenza. Phylodynamics is an emerging statistical framework that allows epidemiologists to harness the information in disease agent sequences to shed light on the spatio-temporal population dynamics of these agents. Although sophisticated Bayesian inferential tools for phylodynamics have emerged in the last decade, these tools concentrate on sequence data alone and fail to integrate other sources of information (e.g., incidence time series data) into the phylodynamic framework. We claim that integrating multiple sources of information will make phylodynamic inference more precise, allowing sharper predictions of disease dynamics and statistical testing of scientific hypotheses. To test this assertion, we propose a series of new statistical methods for integrating multiple sources of information into Bayesian infectious disease phylodynamics. We will start by developing a new Bayesian method for estimating population dynamics directly from genomic data that combines the coalescent process, a powerful tool from population genetics, with modern Gaussian process-based Bayesian nonparametric inference (Aim 1). Our preliminary results show that the new method is more accurate than state-of-the-art Bayesian phylodynamic methods. Moreover, the proposed Gaussian process framework will free us from drawbacks of the current methodology and will allow us to extend this approach to estimate correlations between population size fluctuations and other time-varying variables of interest (Aim 2).
This extension is significant because estimating such correlations is of paramount importance to infectious disease epidemiologists and because no current phylodynamic method can perform such estimation. We will also develop a new model that accounts for the currently ignored dependence of the times at which disease agent sequences are sampled on the disease dynamics (Aim 2). Explicit modeling of these sampling times should improve both the accuracy and the precision of phylodynamic inference. In all our modeling efforts, we will pay close attention to the computational feasibility of the proposed methods by designing efficient Markov chain Monte Carlo algorithms to perform Bayesian inference. To test our new methodology, we will analyze benchmark infectious disease data sets for which available external information about disease dynamics will help us validate our methods. In addition, we will mine publicly available databases to perform novel data analyses using our newly developed methodology (Aim 3). One of the main deliverables of this research will be open-source software implementing the proposed Bayesian phylodynamic methods for integrating infectious disease sequence data with other sources of information.
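The likelihood calculations underlying Aims 1 and 2 can be illustrated with a small sketch. It is not the project's implementation and makes several assumptions: the effective population size Ne(t) is taken to be piecewise constant on a user-supplied time grid (a simplification standing in for a Gaussian process trajectory); times run backward from the most recent sample; and, for the sampling-time model, sampling times are assumed (as one possible choice, not necessarily the model proposed here) to follow an inhomogeneous Poisson process with intensity exp(β0)·Ne(t)^β1. All function and parameter names are hypothetical. In a full Bayesian analysis, log Ne(t) would receive a Gaussian process prior, and these log-likelihoods would be evaluated repeatedly inside an MCMC sampler.

```python
import math
from bisect import bisect_right


def _segment(t, grid):
    """Index j of the grid cell [grid[j], grid[j+1]) containing t (assumes grid[0] == 0)."""
    return bisect_right(grid, t) - 1


def _integral(a, b, grid, ne, f):
    """Integral of f(Ne(t)) over [a, b] for a piecewise-constant Ne; last cell extends to infinity."""
    total, t = 0.0, a
    while t < b:
        j = _segment(t, grid)
        cell_end = grid[j + 1] if j + 1 < len(grid) else math.inf
        seg_end = min(b, cell_end)
        total += (seg_end - t) * f(ne[j])
        t = seg_end
    return total


def coalescent_loglik(sample_times, coal_times, grid, ne):
    """Coalescent log-likelihood of a genealogy given piecewise-constant Ne(t).

    With k lineages present, coalescences occur at rate C(k, 2) / Ne(t).
    Samples may be heterochronous (sampled at different times).
    """
    events = [(s, 1) for s in sample_times] + [(c, -1) for c in coal_times]
    events.sort(key=lambda e: (e[0], -e[1]))  # at tied times, add samples first
    loglik, k, t_prev = 0.0, 0, events[0][0]
    for t, delta in events:
        pairs = k * (k - 1) / 2.0
        if k >= 2:
            # Probability of no coalescence between the previous event and t.
            loglik -= pairs * _integral(t_prev, t, grid, ne, lambda n: 1.0 / n)
        if delta == -1:
            # Density of a coalescence at time t with k lineages.
            loglik += math.log(pairs / ne[_segment(t, grid)])
        k += delta
        t_prev = t
    return loglik


def sampling_loglik(sample_times, t_max, grid, ne, beta0, beta1):
    """Poisson-process log-likelihood of sampling times on [0, t_max] under the
    ASSUMED intensity lambda(t) = exp(beta0) * Ne(t) ** beta1."""
    def rate(n):
        return math.exp(beta0) * n ** beta1

    loglik = sum(math.log(rate(ne[_segment(s, grid)])) for s in sample_times)
    return loglik - _integral(0.0, t_max, grid, ne, rate)


if __name__ == "__main__":
    # Hypothetical trajectory: Ne = 1 on [0, 1) and Ne = 4 afterwards.
    grid, ne = [0.0, 1.0], [1.0, 4.0]
    samples, coals = [0.0, 0.0, 0.5], [1.5, 2.5]
    joint = (coalescent_loglik(samples, coals, grid, ne)
             + sampling_loglik(samples, 2.5, grid, ne, beta0=0.0, beta1=1.0))
    print(joint)
```

Adding the two log-likelihoods gives a joint model in which Ne(t) informs both the genealogy and the sampling times. Setting β1 = 0 makes the sampling intensity independent of Ne(t), so the sampling term no longer carries information about population size and the analysis reduces to a coalescent-only one.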
Monitoring infectious disease dynamics is important for timely detection of infectious disease epidemics and for organizing a timely public health response to them. Disease agent sequence data are becoming an important source of information in infectious disease surveillance programs. We propose a series of new statistical methods for analyzing such sequence data. This new statistical methodology will enable epidemiologists to elucidate the population dynamics of infectious disease agents and to integrate sequence data with other data collected during infectious disease surveillance programs.
Publications (the 10 most recent of 53):
Suchard, Marc A; Lemey, Philippe; Baele, Guy et al. (2018) Bayesian phylogenetic and phylodynamic data integration using BEAST 1.10. Virus Evol 4:vey016
Ho, Lam Si Tung; Xu, Jason; Crawford, Forrest W et al. (2018) Birth/birth-death processes and their computable transition probabilities with biological applications. J Math Biol 76:911-944
Dinh, Vu; Tung Ho, Lam Si; Suchard, Marc A et al. (2018) Consistency and convergence rate of phylogenetic inference via regularization. Ann Stat 46:1481-1512
Tolkoff, Max R; Alfaro, Michael E; Baele, Guy et al. (2018) Phylogenetic Factor Analysis. Syst Biol 67:384-399
Crawford, Forrest W; Ho, Lam Si Tung; Suchard, Marc A (2018) Computational methods for birth-death processes. Wiley Interdiscip Rev Comput Stat 10:
Rambaut, Andrew; Drummond, Alexei J; Xie, Dong et al. (2018) Posterior Summarization in Bayesian Phylogenetics Using Tracer 1.7. Syst Biol 67:901-904
Cybis, Gabriela B; Sinsheimer, Janet S; Bedford, Trevor et al. (2018) Bayesian nonparametric clustering in phylogenetics: modeling antigenic evolution in influenza. Stat Med 37:195-206
Faulkner, James R; Minin, Vladimir N (2018) Locally Adaptive Smoothing with Markov Random Fields and Shrinkage Priors. Bayesian Anal 13:225-252
Holbrook, Andrew; Vandenberg-Rodes, Alexander; Fortin, Norbert et al. (2017) A Bayesian supervised dual-dimensionality reduction model for simultaneous decoding of LFP and spike train signals. Stat (Int Stat Inst) 6:53-67
Dudas, Gytis; Carvalho, Luiz Max; Bedford, Trevor et al. (2017) Virus genomes reveal factors that spread and sustained the Ebola epidemic. Nature 544:309-315