Modern biomedical research increasingly relies on genome-scale data from next-generation sequencing platforms, including Illumina HiSeq and MiSeq instruments and Pacific Biosciences SMRT sequencers. These platforms make it possible for individual labs to quickly and cheaply generate vast amounts of genomic and transcriptomic data from de novo sequencing, resequencing, ChIP-seq, mRNA-seq, and allelotyping experiments. Although biologists can now generate large data sets quickly, they are rarely trained in the computational and statistical techniques needed to make sense of these data. As a result, many researchers must rely on others (often computational scientists with little biological training) to design and implement appropriate data reduction and data mining techniques. Moreover, most institutions lack the substantial computational resources needed to run these analyses. We will continue to help bridge this gap with a short, intensive two-week summer course that teaches biomedical researchers to (1) run analyses on remote UNIX servers hosted in the Amazon Web Services cloud; (2) perform mapping and assembly on large short-read data sets; (3) tackle specific biological problems with existing short-read data; and (4) design computational pipelines capable of addressing their own research questions. We will accomplish this through in-depth, hands-on practical training in the relevant techniques. Our experience, confirmed by course assessments, is that this practical training substantially improves participants' basic computational sophistication. We believe that, in the long term, the researchers trained by our course and by similar courses will contribute to a significant improvement in the general area of data-driven biology.
This short course will continue to help train the current and next generations of independent biomedical researchers in basic computational thinking and practice, and to teach them how to use scalable Internet computing resources for their own research. We will also continue to develop and extend our extensive course materials, which are freely available online and widely used. Our end goal is to increase the efficiency and sophistication with which biomedical researchers use novel sequencing technologies. For this renewal, we propose to continue offering the course at low cost; expand our coverage of RNA-seq; address student needs by expanding the available materials for learning programming and UNIX; and significantly expand our statistics component.
Many biomedical researchers are not trained in the computational tools that would help them make use of genomic and other large-scale biological data. We propose to continue teaching a two-week short course for advanced researchers that will train them to take advantage of sequence data in their research.