Less than 15 years after the first complete sequencing of a bacterial genome, sequence analysis has become an integral part of nearly all research areas in biology. Sequencing costs have recently dropped sharply thanks to affordable second-generation sequencing technology, leading to the establishment of a growing number of small genome sequencing facilities. Despite the increased rate of sequence generation, there has not been a commensurate increase in access to the computational resources needed for high-quality sequence processing and analysis. In particular, some investigators new to the field who are now obtaining next-generation sequencing platforms may be insufficiently prepared to take full advantage of their own high-throughput sequencing devices.
This project by Drs. W. Florian Fricke and Owen White (Institute for Genome Sciences, University of Maryland School of Medicine) is making state-of-the-art sequence analysis software more accessible to researchers without extensive bioinformatics resources. The project is developing a portable, stand-alone software package, based on a virtual machine (VM), that incorporates readily available, open source tools for genome analysis. The VM design provides two main advantages: it allows users to 1) circumvent complex software installations and 2) avoid the performance bottlenecks of local computing networks. The VM package will include fully operational bioinformatics pipelines within a single executable file that is compatible with all computer operating systems and makes further software installations unnecessary. The processing of large sequence datasets can be outsourced to large distributed computing networks called compute clouds.
The analysis protocols provided with the VM package will replicate and extend established bioinformatics protocols. They will include tools for whole-genome and metagenome annotation and comparative analysis, covering sequence assembly, gene prediction, functional annotation, metabolic pathway reconstruction, and phylogenetic classification. The availability of this open source, cloud-enabled VM package will make microbial genome sequencing more usable for a broad user community. The VM package will be released as a work in progress, with at least two trial and four production releases. It will be available for download as an open source software tool through the project webpage (http://clovr.igs.umaryland.edu/).
The VM package is being extensively advertised and documented through publications, conference presentations, the project website, and an online blog. In addition, an online seminar (webinar) is being offered to teach the basics of microbial genome analysis using a test set of sequence data distributed together with the VM package. Genome analysis can provide significant benefits to many areas of microbial research. The release of next-generation sequencing technologies promotes a new model of affordable, decentralized microbial sequence analysis with benefits for the entire scientific community. This portable, open source microbial sequence analysis package is contributing to the success of this model.