Macromolecular crystal structures are highly reliable scientific information, but they still often contain local errors that undermine their crucial role in applications such as understanding enzymatic reactions, molecular machines, drug design, or subtle details of molecular specificity. The criteria and services of the MolProbity web site and related resources are considered by the crystallographic community to be the current best and most constructively critical system for validating and improving macromolecular structure accuracy. All four model-validation criteria on the prominent new summary plots in wwPDB web pages and reports for journal referees are from MolProbity (clashscore, Ramachandran, rotamer, and RNA backbone). MolProbity is now fully integrated into the widely used Phenix crystallography software system, and many of the criteria and underlying programs are part of interactive rebuilding (Coot) and of combined validation servers run by structural genomics or industrial groups. With this vote of confidence and the resulting wide usage, new deposits to the PDB now score >50% better than before 2003 on MolProbity's unique diagnostic of all-atom clashscore. Lowering clashscore really matters, because each clash is a physically impossible error in the model, and until it is successfully corrected one cannot know whether the causative error was trivial or major. All of this puts us in a position of challenging responsibility, since false positives pollute the database and hurt science, while false negatives are unfair criticism. Doing this job truly well will require extreme attention to detail, responsiveness to changing needs, and innovative further development: the goals of this proposal. Better accuracy at low resolution is an urgent need as atomic-level molecular structure reaches upward in size, complexity, and dynamics to cell biology, control-system interactions, and molecular machines.
We propose to develop transformative new protocols tuned for low resolution, such as diagnosis of sequence misalignments, of disguised secondary structures and motifs, and of 7-D RNA backbone conformers from the most reliable model features. Now that automated software for structural biology has become increasingly proficient and more blindly relied upon, we are catching new sources of systematic error or bias. These are unexpected consequences of reasonable coding choices, or of data ambiguities unrecognized by users, and can potentially affect thousands of structures and the conclusions drawn from them. For each such problem, we propose to identify it, flag its outliers, determine the cause, and then find a means of fix-up, preferably by avoiding the problem in the first place. The worldwide Protein Data Bank (wwPDB), the single repository for protein and nucleic acid 3D structures, is in the process of enhancing its official structure validation for both X-ray and NMR. That demands ongoing changes and coordination for MolProbity, and the provision of easily accessible guidance on user interpretation of validation results, especially for journal referees.

Public Health Relevance

The 3D crystal structures of proteins and nucleic acids, and of their complexes that form big dynamic 'molecular machines', provide transformative information both for fundamental biology and for modern biomedicine. Validation and improvement of the many thousands of experimental 3D structures produced each year is one crucial component of that research effort, especially for its assurance of reliability to the many end-users. Over the last decade, general adoption by the international structural biology community (including the worldwide Protein Data Bank) has defined our MolProbity web service as the current state of the art and the most helpful form of model validation -- handing us a complex and pivotal responsibility that we need and want to fulfill robustly, creatively, and fairly.

Agency: National Institutes of Health (NIH)
Institute: National Institute of General Medical Sciences (NIGMS)
Type: Research Project (R01)
Project #: 5R01GM073919-11
Application #: 9339700
Study Section: Biodata Management and Analysis Study Section (BDMA)
Program Officer: Smith, Ward
Project Start: 2006-07-01
Project End: 2019-08-31
Budget Start: 2017-09-01
Budget End: 2018-08-31
Support Year: 11
Fiscal Year: 2017
Total Cost:
Indirect Cost:

Name: Duke University
Department: Biochemistry
Type: Schools of Medicine
DUNS #: 044387793
City: Durham
State: NC
Country: United States
Zip Code: 27705

Richardson, Jane S; Williams, Christopher J; Hintze, Bradley J et al. (2018) Model validation: local diagnosis, correction and when to quit. Acta Crystallogr D Struct Biol 74:132-142
Williams, Christopher J; Headd, Jeffrey J; Moriarty, Nigel W et al. (2018) MolProbity: More and better reference data for improved all-atom structure validation. Protein Sci 27:293-315
Richardson, Jane S; Williams, Christopher J; Videau, Lizbeth L et al. (2018) Assessment of detailed conformations suggests strategies for improving cryoEM models: Helix at lower resolution, ensembles, pre-refinement fixups, and validation at multi-residue length scale. J Struct Biol 204:301-312
Hintze, Bradley J; Richardson, Jane S; Richardson, David C (2017) Mismodeled purines: implicit alternates and hidden Hoogsteens. Acta Crystallogr D Struct Biol 73:852-859
Richardson, Jane S; Videau, Lizbeth L; Williams, Christopher J et al. (2017) Broad Analysis of Vicinal Disulfides: Occurrences, Conformations with Cis or with Trans Peptides, and Functional Roles Including Sugar Binding. J Mol Biol 429:1321-1335
Jou, Jonathan D; Jain, Swati; Georgiev, Ivelin S et al. (2016) BWM*: A Novel, Provable, Ensemble-based Dynamic Programming Algorithm for Sparse Approximations of Computational Protein Design. J Comput Biol 23:413-424
Hintze, Bradley J; Lewis, Steven M; Richardson, Jane S et al. (2016) Molprobity's ultimate rotamer-library distributions for model validation. Proteins 84:1177-1189
Jain, Swati; Richardson, David C; Richardson, Jane S (2015) Computational Methods for RNA Structure Validation and Improvement. Methods Enzymol 558:181-212
Zhou, Huiqing; Hintze, Bradley J; Kimsey, Isaac J et al. (2015) New insights into Hoogsteen base pairs in DNA duplexes from a structure-based survey. Nucleic Acids Res 43:3420-3433
Kapral, Gary J; Jain, Swati; Noeske, Jonas et al. (2014) New tools provide a second look at HDV ribozyme structure, dynamics and cleavage. Nucleic Acids Res 42:12833-12846

Showing the most recent 10 out of 21 publications