Macromolecular crystal structures are highly reliable scientific information, but they often still contain local errors that undermine their crucial role in applications such as understanding enzymatic reactions, molecular machines, drug design, or subtle details of molecular specificity. The criteria and services of the MolProbity web site and related resources are considered by the crystallographic community to be the current best and most constructively critical system for validating and improving macromolecular structure accuracy. All four model-validation criteria on the prominent new summary plots in wwPDB web pages and reports for journal referees are from MolProbity (clashscore, Ramachandran, rotamer, and RNA backbone). MolProbity is now fully integrated into the widely used Phenix crystallography software system, and many of the criteria and underlying programs are part of interactive rebuilding (Coot) and of combined validation servers run by structural genomics or industrial groups. With this vote of confidence and the resulting wide usage, new deposits to the PDB now score more than 50% better than those before 2003 on MolProbity's unique diagnostic of all-atom clashscore. Lowering clashscore really matters, because each clash is a physically impossible error in the model, and until it is successfully corrected one cannot know whether the causative error was trivial or major. All of this puts us in a position of challenging responsibility, since false positives pollute the database and hurt science, while false negatives are unfair criticism. Doing this job truly well will require extreme attention to detail, responsiveness to changing needs, and innovative further development, which are the goals of this proposal. Better accuracy at low resolution is an urgent need as atomic-level molecular structure reaches upward in size, complexity, and dynamics to cell biology, control-system interactions, and molecular machines.
We propose to develop transformative new protocols tuned for low resolution, such as diagnosis of sequence misalignments, of disguised secondary structures and motifs, and of 7-D RNA backbone conformers from the most reliable model features. Now that automated software for structural biology has become increasingly proficient and more blindly relied upon, we are catching new sources of systematic error or bias. These are unexpected consequences of reasonable coding choices, or of data ambiguities unrecognized by users, and can potentially affect thousands of structures and the conclusions drawn from them. For each, we propose to identify the problem, flag its outliers, determine the cause, and then find a way to fix it, preferably by avoiding it in the first place. The worldwide Protein Data Bank (wwPDB), the single repository for protein and nucleic acid 3D structures, is in the process of enhancing its official structure validation for both X-ray and NMR. That demands ongoing changes and coordination for MolProbity, and the provision of easily accessible guidance on user interpretation of validation results, especially for journal referees.
The 3D crystal structures of proteins and nucleic acids, and of their complexes that form big dynamic 'molecular machines', provide transformative information both for fundamental biology and for modern biomedicine. Validation and improvement of the many thousands of experimental 3D structures produced each year is one crucial component of that research effort, especially for its assurance of reliability to the many end-users. Over the last decade, general adoption by the international structural biology community (including the worldwide Protein Data Bank) has defined our MolProbity web service as the current state of the art and the most helpful form of model validation -- handing us a complex and pivotal responsibility that we need and want to fulfill robustly, creatively, and fairly.