Beginning with John Snow's investigations of cholera epidemics, understanding and preventing infectious disease transmission has been one of the fundamental goals of epidemiology. Whole-genome sequences from viruses and bacteria are a promising new source of information about disease transmission, but current statistical methods are unable to incorporate these data into the analysis of transmission in households and other close-contact groups. The long-term goal is to develop statistical and epidemiologic methods that use high-resolution transmission data and genetic sequence data to inform rapid and effective public health responses to emerging infections. The goal of the proposed research is to develop flexible and robust regression models for infectious disease transmission data that can incorporate pathogen genetic sequences. These will be based on a recently-developed semiparametric regression model that can estimate parameters crucial to mathematical models of epidemics and the design of interventions, including hazard ratios for covariate effects on infectiousness and susceptibility and baseline hazards of transmission in infectious-susceptible pairs. To make it a more practical tool for infectious disease epidemiology, this model will be extended to account for external sources of infection, missing data, and small samples. The partial likelihood for this model is a sum over the set of transmission trees consistent with the epidemiologic data on person, place, and time. Since a phylogeny linking pathogen samples from infected individuals constrains the set of possible transmission trees, pathogen genetic sequence data can be combined with epidemiologic data to obtain more efficient estimates of transmission parameters. Epidemiologic and genetic data will be combined by developing algorithms to find the set of transmission trees simultaneously consistent with both.
These algorithms will be incorporated into Markov chain Monte Carlo or sequential Monte Carlo estimation procedures that will account for missing data and phylogenetic uncertainty. These methods will serve as a theoretical basis for the development of efficient case-control and case-cohort study designs for outbreak investigations and vaccine trials. The proposed research is innovative because it synthesizes survival analysis and statistical genetics to analyze infectious disease transmission data. It is significant because it will improve the collection and analysis of data and the evaluation of interventions in epidemics, allowing more effective control of emerging infections.
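The idea of summing over transmission trees, and of a phylogeny shrinking that set, can be illustrated with a toy sketch. It is not the proposed model's partial likelihood: the case labels, infection times, pairwise weights, and the excluded pair are all hypothetical, and each tree is scored simply as the product of its edge weights. A tree assigns one infector to every non-index case, chosen from the cases infected earlier.

```python
from itertools import product
from math import prod

# Hypothetical toy outbreak: infection times for four cases (illustrative only).
infection_time = {"A": 0.0, "B": 2.0, "C": 3.0, "D": 5.0}

# Hypothetical weights for transmission from i to j. A pair is absent
# when i could not have infected j (for example, no contact).
weight = {
    ("A", "B"): 0.8,
    ("A", "C"): 0.5,
    ("B", "C"): 0.7,
    ("B", "D"): 0.4,
    ("C", "D"): 0.6,
}

def possible_infectors(j):
    """Cases infected before j that have a weight for infecting j."""
    return [i for i in infection_time
            if infection_time[i] < infection_time[j] and (i, j) in weight]

def transmission_trees():
    """Yield each tree as a dict mapping a non-index case to its infector."""
    cases = [j for j in infection_time if possible_infectors(j)]
    for infectors in product(*(possible_infectors(j) for j in cases)):
        yield dict(zip(cases, infectors))

def sum_over_trees(allowed=lambda tree: True):
    """Sum the product of edge weights over all trees passing `allowed`."""
    return sum(prod(weight[(i, j)] for j, i in tree.items())
               for tree in transmission_trees() if allowed(tree))

# All trees consistent with the (toy) epidemiologic data.
unconstrained = sum_over_trees()

# Assumed genetic constraint: suppose a phylogeny rules out C infecting D.
constrained = sum_over_trees(lambda tree: tree["D"] != "C")
```

Without constraints the sum factorizes into a product over cases of the summed weights of their possible infectors, so enumeration is unnecessary in this toy setting. A constraint that couples the infector choices of several cases would break that factorization, which is one reason dedicated algorithms for finding the consistent set of trees are useful.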

Public Health Relevance

The proposed research will synthesize survival analysis and statistical genetics to develop novel regression models and study designs for infectious disease transmission data. Incorporating pathogen sequence data into the statistical analysis of transmission in households and other groups of close contacts will allow a more detailed scientific understanding of transmission and more effective public health responses to emerging infections.

National Institutes of Health (NIH)
National Institute of Allergy and Infectious Diseases (NIAID)
Research Project (R01)
Study Section
Biostatistical Methods and Research Design Study Section (BMRD)
Program Officer
Gezmu, Misrak
Ohio State University
Biostatistics & Other Math Sci
Schools of Public Health
United States
Kenah, Eben; Britton, Tom; Halloran, M Elizabeth et al. (2016) Molecular Infectious Disease Epidemiology: Survival Analysis and Algorithms Linking Phylogenies to Transmission Trees. PLoS Comput Biol 12:e1004869