Overview: Highly mutable RNA viruses, such as human immunodeficiency virus and hepatitis C virus, are major causes of morbidity and mortality worldwide. The hallmark of RNA viruses is their extremely high genetic diversity, which allows them to rapidly establish new infections, escape the host's immune system and develop drug resistance. The emergence of next-generation sequencing technologies promises to revolutionize virology and epidemiology by making it possible to sample and characterize millions of intra-host viral variants in thousands of infected individuals. However, our understanding of the mechanisms of disease spread and viral evolution is still limited by the lack of computational methods for processing, integration and analysis of biomedical big data. The overarching goal of this project is to develop a comprehensive family of innovative algorithms and models for describing, analyzing, understanding and predicting complex multidimensional non-linear disease dynamics.

Intellectual Merit: The proposed research will be conducted by an interdisciplinary team of biologists, mathematicians, molecular epidemiologists and computer scientists with extensive expertise in the areas relevant to the project. The project will target highly important epidemiological and biomedical problems, including the development of efficient and scalable computational methods for surveillance of disease spread, the modeling of epidemiological dynamics by incorporating intra-host and inter-host evolutionary dynamics into a single framework, and the design of computational tools that let health care professionals use the results of data analysis. The proposed algorithms and models will be validated using massive molecular and epidemiological data generated by project collaborators from CDC and Georgia Tech, as well as data available from public sources. The algorithms will be distributed to researchers and health care workers as free open-source packages and cloud-based online tools.
In particular, they will be incorporated into the Global Hepatitis Outbreak and Surveillance Technology, a web-based data analysis system currently being developed at CDC. Research findings will be broadly disseminated via journal publications and conference presentations, including the International Symposium on Bioinformatics Research and Applications and the Workshop on Computational Advances in Molecular Epidemiology organized by the PIs.

Public Health Relevance

By providing efficient big data analysis algorithms and new data-driven knowledge about epidemiological dynamics, the project will empower epidemiologists, biomedical researchers and public health practitioners worldwide by facilitating the development of global disease surveillance and treatment programs, improving the capacity to investigate viral outbreaks, and providing novel means for developing effective public health intervention and disease eradication strategies. The developed methods will be applicable and adaptable to a wide range of pathogens infecting humans, animals and plants.

National Institutes of Health (NIH)
National Institute of Biomedical Imaging and Bioengineering (NIBIB)
Research Project (R01)
Study Section: Special Emphasis Panel (ZRG1)
Program Officer: Peng, Grace
Georgia Institute of Technology
Schools of Arts and Sciences
United States
Mandric, Igor; Knyazev, Sergey; Zelikovsky, Alex (2018) Repeat-aware evaluation of scaffolding tools. Bioinformatics 34:2530-2537
Tsyvina, Viachaslau; Campo, David S; Sims, Seth et al. (2018) Fast estimation of genetic relatedness between members of heterogeneous populations of closely related genomic variants. BMC Bioinformatics 19:360
Skums, Pavel; Zelikovsky, Alex; Singh, Rahul et al. (2018) QUENTIN: reconstruction of disease transmissions from viral quasispecies genomic data. Bioinformatics 34:163-170
Audano, Peter A; Ravishankar, Shashidhar; Vannberg, Fredrik O (2018) Mapping-free variant calling using haplotype reconstruction from k-mer frequencies. Bioinformatics 34:1659-1665
Batool, Maliha; Caoili, Salvador Eugenio C; Dangott, Lawrence J et al. (2018) Identification of Surface Epitopes Associated with Protection against Highly Immune-Evasive VlsE-Expressing Lyme Disease Spirochetes. Infect Immun 86:
Longmire, Atkinson G; Sims, Seth; Rytsareva, Inna et al. (2017) GHOST: global hepatitis outbreak and surveillance technology. BMC Genomics 18:916
Glebova, Olga; Knyazev, Sergey; Melnyk, Andrew et al. (2017) Inference of genetic relatedness between viral quasispecies from sequencing data. BMC Genomics 18:918