Enormous datasets have become a major foundation for biological discovery. As one example, the complete DNA codes of thousands of species of bacteria, plants, and animals have been sequenced over the past 20 years, reshaping the course of fields as diverse as biotechnology, ecology and evolutionary biology, genetic counseling, forensics, and medicine. However, providing comprehensive data-science training for the bioscience workforce has been challenged by the interdisciplinary nature of the field. This National Science Foundation Research Traineeship (NRT) award to the University of Colorado Boulder will address this need by producing scientists who are skilled at acquiring large datasets, writing code to interrogate them, modeling the inherent biological principles, and collaborating effectively to apply knowledge across a range of domains. The project anticipates providing hands-on, personalized training to 40 PhD students, including 22 funded trainees, from 12 fields of study, among them computer science, applied math, physics, engineering, and multiple biological disciplines. The program will foster an open, interdisciplinary, and diverse community of researchers. The trainees will also engage industrial and academic partners to strengthen local outreach while they enhance collaborative data-science research.

Trainees will tackle interdisciplinary research themes that require harnessing complex genomic, RNA science, proteomic, ecological, and social science datasets. They will learn data-driven approaches (data measurements, manipulations, visualizations), computational approaches (automation and simulation), and scientific approaches (causality and inference). The program will include modular curricular elements, cross-discipline laboratory rotations, and a team practicum. The technical data-science curriculum will be complemented by training in interdisciplinary collaboration, including leadership, ethics, collaborative platforms, and cross-discipline communication. The curriculum is tailored to students' individual backgrounds and technical knowledge, and it is built to transition students from mentees and participants to mentors and collaborative research leaders as they advance in their graduate careers. NRT-funded trainees will be co-advised, with faculty advisors trained in effective co-mentorship. The overall goal of the Integrated Data Science Traineeship is to train each graduate student to be a data producer, a data modeler, and a data collaborator, proficient in the complete life cycle that is essential to generate and understand complex biological data.

The NSF Research Traineeship (NRT) Program is designed to encourage the development and implementation of bold, new, and potentially transformative models for STEM graduate education training. The program is dedicated to effective training of STEM graduate students in high-priority interdisciplinary or convergent research areas through comprehensive traineeship models that are innovative, evidence-based, and aligned with changing workforce and research needs.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Agency: National Science Foundation (NSF)
Institute: Division of Graduate Education (DGE)
Type: Standard Grant (Standard)
Application #: 2022138
Program Officer: John Weishampel
Budget Start: 2020-09-01
Budget End: 2025-08-31
Fiscal Year: 2020
Total Cost: $3,000,000

Name: University of Colorado at Boulder
City: Boulder
State: CO
Country: United States
Zip Code: 80303