Structural biology -- the determination and analysis of macromolecular structures in relation to their biological functions -- rests not only on its core experimental methodologies but also on the computational approaches that make it possible to collect and interpret the relevant data. A primary objective of the SBGrid Research Coordination Network (RCN) is to provide a uniform level of computing grid access and software availability to all structural-biology groups, including those in geographically underrepresented communities and smaller universities. The RCN will consolidate the high-performance computing resources of a large number of participating laboratories and build a bridge between the structural biology and the physical-sciences communities, through federation with Open Science Grid. The award will enhance the capacity of individual laboratories to solve challenging and ambitious structural biology problems, particularly ones that require use of multiple experimental technologies. The network will adapt a number of structural-biology applications to take advantage of grid resources and contemporary hardware technologies. It will encourage development of parallelized, grid-compatible software in affiliated laboratories, and will coordinate their installation, testing, and distribution. In this connection, the RCN will also support short courses in computer programming for structural biologists. By enabling access to a wide array of institutions and investigators, this RCN will have significant broader impacts.

Project Report

Structural biology -- the determination and analysis of macromolecular structures in relation to their biological functions -- rests not only on its core experimental methodologies (especially X-ray crystallography, nuclear magnetic resonance spectroscopy, and electron microscopy) but also on the computational approaches that make it possible to collect and interpret the relevant data. SBGrid is an open consortium of 245 research groups that cooperate to create and maintain a homogeneous structural-biology computing infrastructure. SBGrid operates out of Piotr Sliz's research group in the Department of Biological Chemistry and Molecular Pharmacology at Harvard Medical School. Ongoing SBGrid operations are funded through yearly membership fees from participating research groups. Reliance on membership fees promotes the sustainability of the collaboration, independent of the typical five-year funding cycle, and helps ensure that SBGrid management remains responsive to community needs (Morin et al., eLife, 2013). Exploitation of new resources and exploratory development can be funded through competitive grant applications. In 2008, with support from the National Science Foundation, SBGrid established a Research Coordination Network. Specific aims of this activity included: (1) integration of the SBGrid computing resources with the opportunistic resources of the Open Science Grid (OSG); (2) development of a Science Portal to provide a standard framework that can support specialized, computing-intensive structure-determination calculations; and (3) outreach and educational activities focused on broadening access to computational resources and creating new computational tools. The goals defined in the initial proposal were accomplished. In 2009 SBGrid established an OSG Virtual Organization. Subsequently, we developed a robust job-scheduling system to support the submission of computational jobs from Harvard Medical School to computing facilities that are federated with OSG.
We have been using this system to complete up to 5,000,000 CPU hours per year, and once deployed, the system required fairly minimal oversight on our part. We also developed a Science Portal that can submit computations to OSG, as well as two grid-enabled web services. The Wide-Search Molecular Replacement service (WSMR; Stokes-Rees and Sliz, PNAS, 2010) provides a novel approach to macromolecular structure determination: it performs structure determination by comparing an experimental dataset against a database of 100,000 macromolecular structures. The Deformable Elastic Network service (DEN; O'Donovan et al., Acta Cryst D, 2011) can accelerate the refinement of macromolecular structures at low resolution. These services are accessible to members of the structural biology community, who can register on the Science Portal, upload their data, and submit computations to OSG. The RCN also pursued exploratory projects. To assist developers in selecting appropriate software licenses, we developed a set of guidelines and published them as a quick guide (Morin et al., PLoS Computational Biology, 2012). We also worked with a number of prominent software creators on a proposal that lays out concrete steps that funders, publishers and research institutions should adopt to support community-wide dissemination, sharing, and publication of scientist-created software and source code (Morin et al., Science, 2012). Moreover, in collaboration with synchrotron beamlines, we developed a prototype system for managing experimental data (Stokes-Rees et al., JSR, 2012). Such a system would complement the existing Protein Data Bank repository of macromolecular structures and would address the data-management challenges faced by the structural biology community. In addition to the infrastructure and methods-development activities, the RCN completed a number of educational activities geared towards students and structural biologists at large.
These activities included workshops on animating scientific data, Mac OS X development, Python programming and structure-based lead discovery and optimization. The RCN also co-organized an EMBO course on Scientific Programming and Data Visualization for Structural Biology, held in Heidelberg, Germany. More recently, the RCN established a webinar series demonstrating the functionality of various structural biology applications and published the completed webinars on a YouTube channel. As part of our outreach program, we recruited several high school and undergraduate students to work with members of our team on various RCN projects, and we made computational resources available to support graduate projects. The activities pursued by the project had a broad impact on the scientific community. We have demonstrated that by collaborating nationally and internationally it is possible to establish a robust, cost-effective and sustainable research-computing infrastructure. We have also demonstrated that integrating computational resources from geographically dispersed academic institutions can create a powerful community resource that stimulates invention and discovery in the field of structural biology.

Agency: National Science Foundation (NSF)
Institute: Division of Molecular and Cellular Biosciences (MCB)
Application #: 0639193
Program Officer: David A. Rockcliffe
Budget Start: 2007-08-01
Budget End: 2013-07-31
Fiscal Year: 2006
Total Cost: $499,730
Name: Harvard University
City: Cambridge
State: MA
Country: United States
Zip Code: 02138