An award is made to Rutgers, the State University of New Jersey, to manage the Protein Data Bank (PDB), an international repository and primary source for information about the structure of biological macromolecules. The PDB is a key research resource and is central to our understanding of living systems. It serves a broad community of experimental, structural, and computational scientists, and educators at all levels. PDB access is provided through primary web and ftp sites (www.pdb.org, ftp.pdb.org) or via multiple mirror sites distributed worldwide. The goal of the Research Collaboratory for Structural Bioinformatics (RCSB) in managing the PDB is to provide a single, searchable archive of accurate and well-annotated data on experimentally determined macromolecular structures. Systems developed by RCSB are currently being used for data processing, archiving, distribution, and query, as well as maintenance of the physical archive. In addition to continued operation of the primary data archiving, ingestion, and dissemination functions, the project will implement several enhancements to the existing infrastructure. First, it will introduce view interfaces to the PDB data aligned to specific user community interests such as model organisms, protein families, or ligand binding sites. Second, it will contribute to standards for quantitative representation of structural biology data via advanced data formats, metadata standards, software design, and data dissemination. Third, it will build upon the strengths of the international wwPDB partnership by introducing integrated data processing frameworks to improve the efficiency and automation of data curation across the partnership. Finally, the RCSB will continue to work with diverse communities to ensure the PDB resource best serves the interests of science, medicine, and education.
The result of this effort will be a significant improvement in the utility and value of the Protein Data Bank for specialists and non-specialists alike.

Mechanisms for increasing the broader impacts of this work include the following: a series of workshops and meeting sessions, active participation in scientific meetings, a regular newsletter, the ongoing use of a help desk to increase access to and the utility of the PDB for the specialist and non-specialist research community, and the use of focus groups to ensure maximum usability of PDB data. Participation in graduate and undergraduate coursework design and implementation and in structured research opportunities is being used to increase the impact of the PDB on university-level education. Participation in teacher training workshops and a collaborative effort to launch a teacher development program provide opportunities to bring PDB resources to the K-12 environment. The general public is being made aware of PDB efforts through a traveling art exhibit and through frequent news releases and informational programs. Online activities like the Molecular Anatomy Project and Molecule of the Month will reach audiences nationally and globally. Collectively, these efforts will extend the benefits of PDB resources and activities to the broadest possible community.

Project Report

The RCSB Protein Data Bank (http://rcsb.org) provides a global resource for the advancement of research and education in biology and medicine by curating, integrating, and disseminating biological macromolecular structural information in the context of function, biological processes, evolution, pathways, and disease states. At its core is the Protein Data Bank (PDB) archive, a key repository of information describing proteins, nucleic acids, and other important biological macromolecular machines that help inform our understanding of biology and medicine. Since 1971, the PDB has provided free access to these important structural data. Along with our Worldwide PDB (wwPDB) collaborators (Protein Data Bank Europe, Protein Data Bank Japan, and BioMagResBank (US)), the RCSB PDB curates, annotates, and makes publicly available the PDB data deposited by scientists around the globe. RCSB PDB then promotes and facilitates access to these data through a rich online resource with searching, reporting, and visualization tools for researchers, teachers, and students in a variety of biology-focused fields: molecular biology, structural biology, computational biology, pharmacology, and others. This information is then streamlined for students, teachers, and the general public at the educational website PDB-101. The RCSB PDB is jointly managed at Rutgers, The State University of New Jersey and the University of California, San Diego.

Data Deposition, Processing, and Annotation

During the past five years (2009-2013), more than 36,000 structures were submitted to the PDB archive. wwPDB annotators carefully validate and curate each entry. Data are released into the public archive weekly. The PDB archive will reach a milestone of 100,000 entries by the summer of 2014. The structures being deposited are increasingly large and complex. New experimental methods, and combinations of methods, are also being used to study these structures. These trends all pose challenges to how data are annotated and represented in the PDB. During this report period, the wwPDB developed the next generation of data processing software, which provides support for increasing complexity and for different experimental methods. Since it will be in use at all wwPDB sites, it supports greater processing uniformity and improved load balancing. This new system, currently in beta testing, will maximize the efficiency and effectiveness of data handling and support for the scientific community going forward.

Data Access, Query, and Reporting

RCSB PDB services and PDB data are freely available online. As the archive keeper for the wwPDB, the RCSB PDB maintains the PDB archive at ftp://ftp.wwpdb.org. During this report period, more than 26,400 coordinate files and related experimental data files were released. During 2013, PDB data were downloaded from the wwPDB ftp site at ftp://ftp.wwpdb.org and the RCSB PDB website 312,881,488 times. PDB data are also downloaded from wwPDB member FTP sites and websites at PDBe and PDBj, for total traffic of approximately 434,800,000 hits. In 2013, RCSB.org was accessed each month, on average, by about 319,000 unique visitors from about 190 countries, who transferred 1,924 GB of data from the website. These users access tools for query, reporting, and visualization that are integrated with a database containing data from the PDB archive, data and links from external resources, and pre-calculated data. Searches range from quick queries (author name, ID) to more complicated biological questions. Improvements that support the internal infrastructure and automate tasks are made regularly to improve performance and reliability.

Education and Outreach

The RCSB PDB supports a variety of audiences. Our outreach efforts aim to inform users about the RCSB PDB while collecting feedback to help develop a powerful resource for science, medicine, and education. Users include biologists from a variety of specialties, scientists from other disciplines, students and educators at all levels, authors and illustrators, and the general public. While the website serves as the primary tool for outreach, staff interact directly with users at international meetings, workshops, presentations, festivals, and more. The RCSB PDB also has an active presence on Twitter (@buildmodels) and Facebook (RCSBPDB). At Rutgers and UCSD, RCSB PDB leaders teach graduate and undergraduate students how to understand and visualize PDB data in the context of biology. Other programs focus on working with students and teachers in middle and high school, such as the recent program Working Together to Visualize, which trained NJ high school teachers. In addition, PDB-101 (http://rcsb.org/pdb-101) is an educational view of the RCSB PDB that packages together resources that promote exploration in the world of proteins and nucleic acids for teachers, students, and the general public. It hosts a regular Molecule of the Month column that describes important biological molecules and how their function relates to their structure; a browser to explore structures starting from a high-level biological focus; and educational resources and posters. It also hosts downloadable PDFs that can be used to build 3D paper models of DNA, transfer RNA, HIV capsid, and dengue virus.

Agency: National Science Foundation (NSF)
Institute: Division of Biological Infrastructure (DBI)
Type: Cooperative Agreement (Coop)
Application #: 0829586
Program Officer: Peter H. McCartney
Project Start:
Project End:
Budget Start: 2009-03-01
Budget End: 2014-02-28
Support Year:
Fiscal Year: 2008
Total Cost: $32,076,399
Indirect Cost:

Name: Rutgers University
Department:
Type:
DUNS #:
City: New Brunswick
State: NJ
Country: United States
Zip Code: 08901