Because crystallography plays a key role in structural biology, the difficulty of crystallizing proteins remains a significant bottleneck for a broad range of research programs. We are developing a data-mining framework, PROSPERO, that predicts crystallization success and crystal diffraction quality from specific protein properties that can be characterized before large-scale crystallization trials. This will increase the efficiency and overall success rate of diffraction studies, both in individual research programs and in genome-scale projects. PROSPERO will perform a meta-analysis of many individual predictors built with statistical and machine-learning methods. A key feature of this framework is that it can dynamically re-estimate success and failure rates from the current contents of the underlying database and from the physical characterization data supplied by individual users. The design will be modular: we will define a standard set of application programming interfaces (APIs) for supplying new categories of data to the core data storage, meta-analysis, and prediction components. This will allow PROSPERO to be tailored to individual research programs and to target-specific physical properties, and to incorporate new physical characterization techniques. Our long-term goal is to grow a user community that benefits from the continually improving predictions made by a central PROSPERO web server, contributes new input modules based on data produced by standard laboratory protocols and apparatus, and helps populate the underlying database of results used for prediction.
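The abstract describes three linked ideas: pluggable input modules for new kinds of characterization data, success rates re-estimated from whatever the shared database currently holds, and a meta-analysis that combines several predictors. The sketch below illustrates how those pieces could fit together under stated assumptions. The class `OutcomeDatabase`, the discretizer-style module interface, the example module name `dls_pdi`, add-one (Laplace) smoothing, and log-odds averaging are all hypothetical choices for illustration. They are not PROSPERO's actual API or the hybrid method reported by Zucker et al. (2010).

```python
from dataclasses import dataclass, field
from math import exp, log
from typing import Callable, Dict, List

# Hypothetical input-module interface: each module maps one raw physical
# measurement to a discrete category (e.g. "low" / "high" polydispersity).
Discretizer = Callable[[float], str]


@dataclass
class OutcomeDatabase:
    """Toy stand-in for a shared database of past crystallization trials."""
    modules: Dict[str, Discretizer] = field(default_factory=dict)
    records: List[Dict[str, object]] = field(default_factory=list)

    def register_module(self, name: str, discretize: Discretizer) -> None:
        # Adding a module is how a new characterization technique plugs in.
        self.modules[name] = discretize

    def add_result(self, measurements: Dict[str, float], crystallized: bool) -> None:
        # Store the discretized categories together with the observed outcome.
        row: Dict[str, object] = {
            name: self.modules[name](value) for name, value in measurements.items()
        }
        row["crystallized"] = crystallized
        self.records.append(row)

    def success_rate(self, module: str, category: str) -> float:
        # Recomputed from the current records on every call, so newly
        # contributed results change the estimate immediately.
        # Add-one smoothing keeps rare categories away from 0 or 1.
        matching = [r for r in self.records if r.get(module) == category]
        successes = sum(1 for r in matching if r["crystallized"])
        return (successes + 1) / (len(matching) + 2)

    def predict(self, measurements: Dict[str, float]) -> float:
        # Simple meta-analysis (an assumption): average the per-module
        # log-odds, using only the modules the user supplied data for.
        if not measurements:
            raise ValueError("at least one measurement is required")
        logits = []
        for name, value in measurements.items():
            p = self.success_rate(name, self.modules[name](value))
            logits.append(log(p / (1 - p)))
        mean_logit = sum(logits) / len(logits)
        return 1 / (1 + exp(-mean_logit))


if __name__ == "__main__":
    db = OutcomeDatabase()
    db.register_module("dls_pdi", lambda pdi: "low" if pdi < 0.2 else "high")
    db.add_result({"dls_pdi": 0.10}, True)
    db.add_result({"dls_pdi": 0.15}, True)
    db.add_result({"dls_pdi": 0.40}, False)
    print(db.success_rate("dls_pdi", "low"))  # 0.75
```

Because each estimate is computed from the stored records at query time rather than cached, contributing one more trial result is enough to update later predictions; averaging log-odds is only one of many ways to combine predictors.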

Public Health Relevance

X-ray crystallography is a core technique in fundamental research programs that seek to understand disease mechanisms through the three-dimensional structures of individual proteins, of large multi-protein complexes, and of larger assemblies of proteins and nucleic acids that form key components of the cell. It is also a core technique in highly targeted research programs such as the design of new drugs. This work will increase the efficiency and overall success rate of these research programs by easing a key bottleneck: the difficulty of obtaining high-quality crystals of the biological entity under study.

Agency
National Institutes of Health (NIH)
Institute
National Institute of General Medical Sciences (NIGMS)
Type
Exploratory/Developmental Grants (R21)
Project #
1R21GM088518-01
Application #
7707893
Study Section
Macromolecular Structure and Function D Study Section (MSFD)
Program Officer
Edmonds, Charles G
Project Start
2009-08-01
Project End
2011-07-31
Budget Start
2009-08-01
Budget End
2010-07-31
Support Year
1
Fiscal Year
2009
Total Cost
$195,000
Indirect Cost
Name
University of Washington
Department
Biochemistry
Type
Schools of Medicine
DUNS #
605799469
City
Seattle
State
WA
Country
United States
Zip Code
98195
Zucker, Frank H; Kim, Hae Young; Merritt, Ethan A (2012) PROSPERO: online prediction of crystallographic success from experimental results and sequence. J Appl Crystallogr 45:598-602
Zucker, Frank H; Stewart, Christine; dela Rosa, Jaclyn et al. (2010) Prediction of protein crystallization outcome using a hybrid method. J Struct Biol 171:64-73