Recent advances in genome sequencing technologies have enabled the sequencing of bacteria directly from the environment, providing a broader view of bacterial diversity than was previously possible. Studies of environmental samples have revealed complex communities containing many previously unknown species and have uncovered extensive genetic variation even among closely related strains. Characterizing this genomic variation is critical in studies of microbial ecology and evolution, yet currently available computational tools, originally developed for the study of single organisms, are ill-suited for this task.
This proposal aims to develop the theoretical and computational infrastructure for the study of genomic variation within mixtures of organisms. The proposed research relies on both theoretical and empirical analyses of the structure of genome assembly graphs in order to characterize graph signatures that are correlated with intra- and inter-species polymorphisms. A particular focus is placed on understanding and using the information provided by next-generation sequencing technologies as well as other high-throughput experimental techniques. The proposed work provides critical analysis tools to help biologists explore genetic variation within the environment.
Additional information about this project is available at www.cbcb.umd.edu/research/Genomic_Variation.shtml.
This project covered the development of algorithms for analyzing genome assembly graphs, with the goal of uncovering signatures of genomic variation such as those found in a mixture of organisms, some of which contain specific genes important for their adaptation to the environment. The project has resulted in over 15 publications in peer-reviewed journals and conferences, and the related research has contributed to a better understanding of genome assembly and its limitations. In addition, several software packages were developed and released as open source to the community, including metAMOS, a novel metagenomic assembly pipeline and the only tool that can discover genomic variation in metagenomic data. This project has also contributed, directly and indirectly, to the training of six graduate students, several of whom have graduated and gone on to academic and industry positions. The award further allowed us to initiate a summer internship program, which is still ongoing and has trained over 20 undergraduate and high school students. In summary, our project has had a direct impact on our field, both through the development of new ideas, algorithms, and software and through the tools it has made available to biologists. Our work has also had a significant impact on the training of the next generation of scientists at several levels of their academic careers, from high school to post-graduate studies.