Characterization of environmental bacteria, the vast majority of which elude cultivation, is necessary for many applications, including detection of biological warfare agents and other biothreats as well as discovery of rare novel species. Recent advances in DNA amplification technology have enabled whole-genome sequencing directly from individual cells without requiring growth in culture. These methods amplify femtograms of DNA extracted from a single bacterial cell into the micrograms of DNA needed for current sequencing platforms. Based on these advances, the PI and colleagues developed a specialized software tool called Euler+Velvet-SC for assembling sequencing reads from single cells and applied it to the assembly of two known genomes, E. coli and S. aureus, and an unknown marine genome, SAR324 Deltaproteobacterium. These draft de novo single-cell assemblies, with no effort to close gaps or resolve repeats, identify more than 90% of genes. However, a bacterial community may contain millions of cells. There are high-throughput robotic solutions for amplification of every bacterial genome, but naive deep sequencing of every amplified genome is prohibitively time-consuming and costly. Moreover, there are often biological replicates in the environment whose deep sequencing would be redundant. Current cell sorting methods have limited capability in detecting replicates; therefore, efficient adaptive or compressive sequencing strategies that capture almost all species in a bacterial community are needed. The goal is to assemble the genomes of all species in a sample with minimal asymptotic sequencing effort. This project will develop adaptive algorithms that iteratively determine the proportion of sequencing from each single cell, co-assemble the sequencing datasets, and compare and cluster them to identify species. In each iteration, the belief state is updated based on new sequencing data, and the process ends once the belief state reaches a steady state.
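The iterative loop described above can be illustrated with a toy simulation. The following Python sketch is not the project's algorithm; it is a minimal illustration under strong assumptions that are all hypothetical: genomes are modeled as sets of integer "k-mers", species share no k-mers, replicate cells of a species have identical genomes, the belief state is simply the current partition of cells into clusters together with their pooled reads, and "steady state" means the partition did not change between two iterations. The function names, the Jaccard threshold, and the batch size are illustrative choices.

```python
import random

def make_community(species_sizes, kmers_per_genome=100):
    """Toy community (assumption): species s owns a disjoint range of integer
    'k-mers', and every replicate cell of that species has the same genome."""
    cells = []
    for s, n_cells in enumerate(species_sizes):
        genome = list(range(s * kmers_per_genome, (s + 1) * kmers_per_genome))
        cells.extend(list(genome) for _ in range(n_cells))
    return cells  # the list index serves as the cell id

def sequence(genome, n_reads, rng):
    """Simulate one sequencing batch as a random subset of the cell's k-mers."""
    return set(rng.sample(genome, n_reads))

def jaccard(a, b):
    """Jaccard similarity of two k-mer sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def merge_clusters(clusters, threshold):
    """Merge clusters whose pooled k-mers are similar (a stand-in for
    co-assembly followed by comparison and clustering)."""
    merged = []
    for members, kmers in clusters:
        members, kmers = set(members), set(kmers)
        keep = []
        for other_members, other_kmers in merged:
            if jaccard(kmers, other_kmers) >= threshold:
                members |= other_members   # same putative species: pool cells
                kmers |= other_kmers       # and pool their reads
            else:
                keep.append((other_members, other_kmers))
        keep.append((members, kmers))
        merged = keep
    return merged

def adaptive_sequencing(cells, reads_per_batch=60, threshold=0.1,
                        max_iters=10, seed=0):
    """Iterate: sequence one batch per cluster, update the belief state,
    and stop when the partition of cells no longer changes."""
    rng = random.Random(seed)
    # Initial belief: every cell is its own cluster with no reads yet.
    clusters = [({i}, set()) for i in range(len(cells))]
    partition, batches, iteration = [], 0, 0
    for iteration in range(1, max_iters + 1):
        sequenced = []
        for members, kmers in clusters:
            rep = min(members)  # one representative cell per cluster
            reads = sequence(cells[rep], reads_per_batch, rng)
            sequenced.append((members, kmers | reads))
            batches += 1
        new_clusters = merge_clusters(sequenced, threshold)
        partition = sorted(sorted(m) for m, _ in new_clusters)
        steady = partition == sorted(sorted(m) for m, _ in clusters)
        clusters = new_clusters
        if steady:
            break
    return partition, batches, iteration

if __name__ == "__main__":
    community = make_community([3, 2, 4])  # 9 cells from 3 species
    print(adaptive_sequencing(community))
```

In this toy setting the first pass sequences every cell once, after which replicate cells are pooled and each later pass costs one batch per cluster rather than one per cell. Real single-cell data would violate most of the assumptions above (uneven amplification, shared sequence between species, sequencing errors), so the sketch only conveys the control flow of the adaptive loop.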
This project is expected to enable high-throughput environmental research at single-cell resolution, which will have a significant impact on renewable energy and public health, both among top national priorities. Women-in-bioinformatics events will increase the participation of women in computer science. This project will provide Michigan Louis Stokes Alliance for Minority Participation (MI-LSAMP) minority scholars with research experiences through the Summer Research Academy (SURA), an effort targeted primarily to first- and second-year undergraduate students at Wayne State. A demonstration of genomics concepts will be offered annually to middle-school students who participate in the Computer Science summer camp on the Wayne State University campus. Additionally, the results will be disseminated through educational videos posted on YouTube, articles added to Wikipedia, and regular scientific communications in journals and conferences. The open-source tools developed in this project are expected to enhance the national education and research infrastructure.