The primary goal of this project is to revitalize the Structural Classification of Proteins (SCOP) and ASTRAL databases so that they better serve the needs of both current users and the larger scientific community. Both databases are carefully curated resources that are widely used by biologists to explore remote homologs of proteins of interest, and by computational biologists as a "gold standard" for benchmarking prediction algorithms. However, neither database has changed its basic design since its early releases (15 years ago in the case of SCOP). We will redesign the internals of both databases to account for aspects of protein evolution that were not appreciated when the databases were first created, such as metamorphic proteins and homologous proteins that have evolved different folds. We will also develop a unified interface to both databases that allows scientists to easily find and focus on proteins or families of interest, because the current hierarchical view grows increasingly unwieldy as more structures are added. Because SCOP curation has become a bottleneck given the large number of structures being solved today, we plan to build automated tools to assist in classification. We will also create interfaces that allow biologists to submit sequences or structures for automated classification using these tools. In many cases, this would enable structural biologists to gain insight into a protein's evolution or function before a newly solved structure is published.
The SCOP and ASTRAL databases are carefully curated resources that are widely used by biologists to explore the structure, function, and evolution of protein families of interest, and by computational biologists as a gold standard for benchmarking prediction algorithms. Protein structures are essential to many modern studies of pathogens and human diseases, and medical conditions are being rapidly linked to specific mutations. Although the flood of structural information threatens to overwhelm our current capacity for analysis, our proposed changes to the curation procedures for SCOP and ASTRAL, and improvements to the underlying structure of the databases, will allow these resources to continue to yield biological and medical insight for many years to come.
Fox, Naomi K; Brenner, Steven E; Chandonia, John-Marc (2014) SCOPe: Structural Classification of Proteins--extended, integrating SCOP and ASTRAL data and classification of new structures. Nucleic Acids Res 42:D304-9
Zhi, Degui; Shatsky, Maxim; Brenner, Steven E (2010) Alignment-free local structural search by writhe decomposition. Bioinformatics 26:1176-84
Shatsky, Maxim; Hall, Richard J; Brenner, Steven E et al. (2009) A method for the alignment of heterogeneous macromolecules from electron microscopy. J Struct Biol 166:67-78
Kim, Sung-Hou; Shin, Dong-Hae; Kim, Rosalind et al. (2008) Structural genomics of minimal organisms: pipeline and results. Methods Mol Biol 426:475-96
Andreeva, Antonina; Howorth, Dave; Chandonia, John-Marc et al. (2008) Data growth and its impact on the SCOP database: new developments. Nucleic Acids Res 36:D419-25
Yooseph, Shibu; Sutton, Granger; Rusch, Douglas B et al. (2007) The Sorcerer II Global Ocean Sampling expedition: expanding the universe of protein families. PLoS Biol 5:e16
Chandonia, John-Marc (2007) StrBioLib: a Java library for development of custom computational structural biology applications. Bioinformatics 23:2018-20
Lareau, Liana F; Brooks, Angela N; Soergel, David A W et al. (2007) The coupling of alternative splicing and nonsense-mediated mRNA decay. Adv Exp Med Biol 623:190-211
Shin, Dong Hae; Hou, Jingtong; Chandonia, John-Marc et al. (2007) Structure-based inference of molecular functions of proteins of unknown function from Berkeley Structural Genomics Center. J Struct Funct Genomics 8:99-105
Chandonia, John-Marc; Brenner, Steven E (2006) The impact of structural genomics: expectations and outcomes. Science 311:347-51