Current research in biology is producing increasingly large and complex sets of data. These data could represent, for example, DNA sequences, images of the brain, or the number of species in an ecosystem. In each case, unlocking the information within these big data sets requires sophisticated mathematical and computational approaches. The standard curriculum for graduate students in biological sciences was designed well before this data deluge. As a consequence, today's graduate students are not being adequately trained for their future careers. At the same time, there are growing concerns that scientists are sometimes unable to reproduce published findings. This inability often results from poor data analysis strategies. The future success of the US biological research mission hinges on training students to use data analysis approaches that are both rigorous and reproducible. This National Science Foundation Research Traineeship (NRT) award in the Innovations in Graduate Education (IGE) Track to the University of Chicago seeks to meet this need by developing a new and effective approach to the training of early-stage graduate students in the quantitative analysis of biological data.
The overarching goal of this program is to teach students to critically evaluate quantitative analysis methods in the scientific literature, and to acquire good programming habits that support reproducibility and rigor in their own research. An interdisciplinary team of quantitative biologists will direct and lead the program, exposing students to faculty who can advise them in future work. The training program begins with an intensive, week-long residential boot camp that brings together students across diverse sub-fields of biology to promote teamwork and prepare them for interdisciplinary research. The boot camp includes introductory tutorials in computer programming, statistics, and modeling in modern biology, as well as more advanced tutorials in statistical approaches to large data sets and practical lessons in organizing and sharing code and data. The boot camp is capped off with a series of workshops in which students apply what they have learned to real biological data spanning a wide range of fields. A subsequent on-campus course reviews and builds on these concepts, and integrates training in rigor and reproducibility with concepts of responsible research. We hypothesize that this program will produce trainees who are well-prepared for the future scientific workforce. We will evaluate the impact of this intervention through quizzes, surveys, and targeted interviews. All teaching materials and data sets used in the workshops will be shared online so that any university can implement a similar training module on its own campus.
The NSF Research Traineeship (NRT) Program is designed to encourage the development and implementation of bold, new, potentially transformative models for STEM graduate education training. The Innovations in Graduate Education Track is dedicated solely to piloting, testing, and evaluating novel, innovative, and potentially transformative approaches to graduate education.