Many of the most important human pathogens, including the malaria parasite, HIV, and the pneumococcus, are characterized by extensive genetic diversity generated by recombination. To design effective vaccines and long-lasting drug regimens, it is critical that we understand how this diversity relates to the epidemiological dynamics of disease in human populations. While high-throughput sequencing techniques are generating vast volumes of genomic sequence data from these pathogens, the analytical tools capable of making sense of these data are severely limited. One of the most pressing problems is the lack of tools that can handle these pathogens' high rates of recombination at the scale of the datasets now being generated. This project represents an interdisciplinary collaboration between computer scientists and malaria biologists, bringing together deep expertise in network analysis, computational methods, epidemiology, evolutionary biology, and malaria to develop a new suite of scalable, general computational tools for visualizing and analyzing recombinant gene sequences, using the malaria parasite as a case study. Drawing on recent advances in network science, the project will develop novel methods for accurately inferring "recombination networks" from sequence data, automatically identifying statistically significant "clusters" in these networks, and testing their epidemiological significance. Our approach focuses on alignment-free analysis methods, which naturally accommodate the recombinant shuffling of sequences, allow for the analysis of structural features of the relationships between genes, and provide insights into the effects of recombination on their evolution.
In addition to answering important biological and epidemiological questions, this project will produce a novel open-source software platform that will enable researchers to analyze recombinant sequence data from a wide variety of important human pathogens.
Many important pathogens of humans are characterized by significant genetic diversity generated by high rates of genetic exchange, and this allows them to evolve rapidly in the face of biomedical interventions. Currently, the design of vaccines and drugs and an understanding of their likely impacts on public health are hampered by a lack of computational tools that take these high rates of genetic exchange into account. This project aims to develop new computational methods to analyze large datasets of diverse gene sequence data, drawing on recent methodological advances in network science and using the human malaria parasite - an important global pathogen affecting 40% of the world's population - as a case study.
Larremore, Daniel B; Clauset, Aaron; Jacobs, Abigail Z (2014) Efficiently inferring community structure in bipartite networks. Phys Rev E Stat Nonlin Soft Matter Phys 90:012805
Larremore, Daniel B; Shew, Woodrow L; Ott, Edward et al. (2014) Inhibition causes ceaseless dynamics in networks of excitable nodes. Phys Rev Lett 112:138103
Larremore, Daniel B; Clauset, Aaron; Buckee, Caroline O (2013) A network approach to analyzing highly recombinant malaria parasite genes. PLoS Comput Biol 9:e1003268