Many of the most important human pathogens, including the malaria parasite, HIV, and the pneumococcus, are characterized by extensive genetic diversity generated by recombination. To design effective vaccines and long-lasting drug regimens, it is critical that we understand how this diversity relates to the epidemiological dynamics of disease in human populations. While high-throughput sequencing techniques are generating vast volumes of genomic sequence data from these pathogens, the analytical tools capable of making sense of these data are severely limited. One of the most pressing problems is the lack of tools that can deal with these pathogens' high rates of recombination, particularly at the scale of the datasets now being generated. This project represents an interdisciplinary collaboration between computer scientists and malaria biologists, bringing together deep expertise in network analysis, computational methods, epidemiology, evolutionary biology and malaria, to develop a new suite of scalable, general computational tools for visualizing and analyzing recombinant gene sequences, using the malaria parasite as a case study. Drawing on recent advances in the field of network science, the project will develop novel methods for accurately inferring "recombination networks" from sequence data, automatically identifying statistically significant "clusters" in these networks, and testing their epidemiological significance. Our approach focuses on alignment-free analysis methods, which naturally accommodate the recombinant shuffling of sequences, allow for the analysis of structural features of the relationships between genes, and provide insights into the effects of recombination on their evolution.
In addition to answering important biological and epidemiological questions, this project will produce a novel open-source software platform that will enable researchers to analyze recombinant sequence data from a wide variety of important human pathogens.

Public Health Relevance

Many important pathogens of humans are characterized by significant genetic diversity generated by high rates of genetic exchange, and this allows them to evolve rapidly in the face of biomedical interventions. Currently, the design of vaccines and drugs and an understanding of their likely impacts on public health are hampered by a lack of computational tools that take these high rates of genetic exchange into account. This project aims to develop new computational methods to analyze large datasets of diverse gene sequence data, drawing on recent methodological advances in network science and using the human malaria parasite - an important global pathogen affecting 40% of the world's population - as a case study.

Agency: National Institutes of Health (NIH)
Institute: National Institute of General Medical Sciences (NIGMS)
Type: Exploratory/Developmental Grants (R21)
Project #: 1R21GM100207-01A1
Application #: 8442788
Study Section: Modeling and Analysis of Biological Systems Study Section (MABS)
Program Officer: Eckstrand, Irene A
Project Start: 2013-02-01
Project End: 2015-01-31
Budget Start: 2013-02-01
Budget End: 2014-01-31
Support Year: 1
Fiscal Year: 2013
Total Cost: $209,334
Indirect Cost: $38,153

Name: Harvard University
Department: Public Health & Prev Medicine
Type: Schools of Public Health
DUNS #: 149617367
City: Boston
State: MA
Country: United States
Zip Code: 02115

Larremore, Daniel B; Clauset, Aaron; Jacobs, Abigail Z (2014) Efficiently inferring community structure in bipartite networks. Phys Rev E Stat Nonlin Soft Matter Phys 90:012805
Larremore, Daniel B; Shew, Woodrow L; Ott, Edward et al. (2014) Inhibition causes ceaseless dynamics in networks of excitable nodes. Phys Rev Lett 112:138103
Larremore, Daniel B; Clauset, Aaron; Buckee, Caroline O (2013) A network approach to analyzing highly recombinant malaria parasite genes. PLoS Comput Biol 9:e1003268