Modern biomedical research is increasingly making use of genome-scale data from next-generation sequencing platforms, including Illumina HiSeq and MiSeq machines and Pacific Biosciences SMRT. These platforms make it possible for individual labs to quickly and cheaply generate vast amounts of genomic and transcriptomic data from de novo sequencing, resequencing, ChIP-seq, mRNA-seq, and allelotyping experiments. Despite this ability to quickly generate large data sets, biologists are rarely trained in the computational and statistical techniques necessary to make sense of this data. Thus, many researchers must rely on others - often computational scientists with little biological training - to design and implement appropriate data reduction and data mining techniques. Moreover, most institutions do not have access to the substantial computational resources necessary to run these analyses. We will continue to help bridge this gap with an intensive two-week summer course that teaches biomedical researchers to (1) run analyses on remote UNIX servers hosted in the Amazon Web Services "cloud"; (2) perform mapping and assembly on large short-read data sets; (3) tackle specific biological problems with existing short-read data; and (4) design computational pipelines capable of addressing their own research questions. This will be accomplished through in-depth, hands-on practical training in the relevant techniques. Our experience, confirmed by assessment, is that this practical training leads to a substantial improvement in the basic computational sophistication of participants. We believe that in the long term our cadre and those of other courses will contribute to a significant improvement in the general area of data-driven biology.
This short course will continue to help train the current and next generation of independent biomedical researchers in basic computational thinking and procedure, and will teach them how to make use of scalable Internet computing resources for their own research. Moreover, we will continue to develop and extend our extensive online materials, which are freely available and widely used. Our end goal is to increase the efficiency and sophistication with which biomedical researchers make use of novel sequencing technologies. For this renewal, we propose to continue offering the course at a low cost; expand our RNAseq discussion; address student needs by expanding the available materials for learning programming and UNIX; and increase our statistics component significantly.

Public Health Relevance

Many biomedical researchers are not trained in computational tools that would help them make use of genomic and other bioinformatics data. We propose to continue teaching a two-week short course for advanced researchers that will help train them to take advantage of sequence data in their research.

Agency: National Institutes of Health (NIH)
Type: Education Projects (R25)
Project #: 5R25HG006243-04
Application #: 8728301
Study Section: Ethical, Legal, Social Implications Review Committee (GNOM)
Program Officer: Wellington, Christopher
Project Start:
Project End:
Budget Start:
Budget End:
Support Year: 4
Fiscal Year: 2014
Total Cost:
Indirect Cost:

Name: Michigan State University
Department: Biostatistics & Other Math Sci
Type: Biomed Engr/Col Engr/Engr Sta
DUNS #:
City: East Lansing
State: MI
Country: United States
Zip Code: 48824