Structural genomics efforts aim to ultimately provide an experimental structure or good theoretical model of every tractable protein encoded by all sequenced genomes. Major efforts in this direction are now beginning, and the PRESAGE database aids coordination among groups and dissemination of results. PRESAGE records experimental structure determination underway (experimental annotations) and structural predictions and models (prediction annotations). As such, it provides a mechanism for coordination among different researchers without requiring centralization. It also aids dissemination of both experimental and computational structural genomics to a broad audience of biologists. PRESAGE was motivated by the need for scientific communication. Historically, structural biologists have often been reluctant to discuss projects underway, but this attitude can be disastrous when applied to large-scale projects. Already, there has been a duplication (and almost triplication) of effort in studying one protein, and another protein's structure has been solved because it was not apparent that its structure had already been accurately predicted. Early pre-releases of PRESAGE have shown that the American structural genomics community has been surprisingly receptive to sharing information about their experimental targets and results. Major international groups have also recently committed to submitting targets to the system. We propose to reengineer PRESAGE from a "proof-of-concept" prototype into a robust and reliable system. Most significantly, this will involve rewriting the whole database access system, which comprises most of PRESAGE except the user interface. In addition, several specific new services are planned, including customized systems for data collection (designed in collaboration with structural genomics researchers); family neighboring facilities; broader data collection; and flexible query systems with parseable output.
We hope that PRESAGE will thus grow as an international resource both for producers of structural genomics data and for all those biologists who can use these data on genomics and protein structure to aid their research.
Smith, Andrew; Chandonia, John-Marc; Brenner, Steven E (2006) ANDY: a general, fault-tolerant tool for database searching on computer clusters. Bioinformatics 22:618-20