Next-generation sequencing (NGS) has revolutionized how research is carried out in many areas of biology by allowing researchers to sequence genomes and transcriptomes on a routine basis. However, the analysis of NGS data remains a major obstacle to the efficient use of the technology. While substantial effort has been invested in the development of software dedicated to the individual analysis steps of NGS experiments, insufficient resources are currently available for integrating the individual software components into automated workflows capable of running the analysis of most types of NGS applications from start to finish in a time-efficient and reproducible manner. This Development project will address this need by enhancing systemPipeR, a popular R/Bioconductor software package. The project will have a significant impact on a wide range of scientific communities and on society at large by allowing researchers to translate NGS data into biologically relevant knowledge in a time-efficient and reproducible manner. This will accelerate many research and discovery projects in academia and industry, where NGS technologies play an important role. Extensive educational resources for interdisciplinary training at the intersection of genome and computational biology will be provided. Training will be offered to scientists, postdoctoral researchers, and graduate and undergraduate students. Members of underrepresented groups will participate in all aspects of this project, which will support diversity. Extensive online tutorials will be provided to maximize the educational outreach of the activities.
The specific aims of this project are: (AIM 1) Enhancements to systemPipeR's user interface and workflow design framework will greatly simplify the process of running workflows, generating automated reports, and designing new workflows. They will also improve user-friendliness to make systemPipeR equally useful for R and non-R users, including biologists without expert knowledge in bioinformatics. The execution plan of this aim includes the design of a central workflow control user interface and the adoption of new community standards to further increase the reproducibility of analysis workflows. (AIM 2) Automated analysis workflows will be developed for a wide range of additional NGS applications. Most of these workflows will be designed in collaboration with experts in the corresponding NGS application areas. Suggestions from the community will be incorporated as well. Sample templates will be provided for the supported NGS applications so that workflow instances, fully populated with all input data and environment settings, can be created with a single command. (AIM 3) The project will also have a strong focus on community integration and on performance evaluations provided by its current and future users. This includes options for users to contribute code or entire workflows, and extensive training of the target audience to analyze NGS data with systemPipeR and related resources. The URL of the systemPipeR project website is: http://girke.bioinformatics.ucr.edu/systemPipeR.