The aim of this proposal is to develop, pilot test, and deliver a new open-access modular educational curriculum, Enabling Data Science in Biology (eDSB). eDSB will cover the concepts, approaches, and requirements for developing and managing the full data pipeline of a curated public archive of biological experimental data contributed by a large community of data providers. The Program Director and Program Faculty responsible for developing and delivering the eDSB curriculum are experienced structural biologists, data scientists, and educators drawn from the senior ranks of the Research Collaboratory for Structural Bioinformatics (RCSB). The RCSB develops and manages a number of data resources, including the Protein Data Bank (PDB, with international partners), EMDataBank (with international partners), the Structural Biology Knowledgebase, and the Nucleic Acid Database. It also develops educational materials and curricula that promote use of these resources. The RCSB built the infrastructure for these data resources and has successfully managed them over the past 20 years, a period of rapid expansion in Structural Biology. eDSB will offer best-practice recommendations for data resource management based on the extensive experience accumulated by the RCSB team, which is highly motivated to transfer its knowledge to new data resource builders and providers. The intended eDSB audience includes librarians and information specialists, who will be able to use the materials as a basis for the training and services their organizations offer, and scientists engaged in self-instruction. In addition, the eDSB curriculum will allow the RCSB and its international partners to catalyze the formation of a proposed federated system of model and data archives that will accelerate progress in Integrative Structural Biology. The eDSB curriculum will be divided into eight Modules that can either be studied separately or assembled into a complete open online course.

Public Health Relevance

The Protein Data Bank (PDB) is a public data archive that has yielded powerful advances in diagnostics and treatments for a wide range of diseases. This project will create an educational curriculum and course, based on PDB management practices, that will train information and research scientists to create and maintain new data archives that interoperate with one another and enable further advances relevant to human health and disease.

National Institutes of Health (NIH)
National Library of Medicine (NLM)
Education Projects (R25)
Study Section: Special Emphasis Panel (ZRG1)
Program Officer: Ye, Jane
Rutgers University
Schools of Arts and Sciences
United States