A detailed and accurate understanding of protein structure is a cornerstone of modern biomedical research, and an explicit goal of the NIH is to define the structures of all proteins, either by accurate experimental determination or by comparative model-building. The most successful structure prediction approaches employ empirical, knowledge-based energy terms derived from features of known protein structures, most notably single-residue φ,ψ distributions, backbone-dependent side-chain rotamer preferences, and tight-packing criteria. One known unrealistic feature of these prediction programs is their assumption of a fixed ideal geometry for the backbone. The driving hypothesis behind this proposal is that there exists a largely unappreciated but real, systematic, significant, and pervasive variation in backbone bond angles and peptide planarity that occurs as a function of the backbone torsion angles, and that accounting properly for this variation will be required for comparative models to reach the quality of X-ray crystal structures. The overall goal of this work is to generate accurate empirical values for this covalent variation that will lead to tangible improvements in the accuracy of structures produced by comparative modeling and de novo structure prediction, as well as by X-ray crystallography. We propose to achieve this goal by pursuing three specific aims: 1) to design, develop, and make available a flexibly searchable database containing bond lengths, bond angles, and torsion angles for all structures determined at better than 1.75 Å resolution (currently ~500,000 residues); 2) to use conventional query-based and modern machine-learning approaches to derive accurate empirical information from the database about the systematic correlation of local conformation with variations in covalent geometry; and 3) to create a modular, conformation-dependent library of expected covalent geometry and to facilitate its incorporation into leading applications for comparative and crystallographic protein structure modeling.

With the dramatically increased number of ultrahigh-resolution crystal structures now known, the time is ripe to construct this Protein Geometry Database, which will provide facile access to a massive trove of reliable and detailed empirical information about protein structure. To be done well, this work will require painstaking attention to detail and an intimate familiarity with the limitations of crystallographic refinement and the principles of protein structure. Dr. Karplus is well suited to lead this work: he has a track record of more than 20 years of high-quality crystallographic structure determinations, combined with more general contributions to the understanding of protein structure, among them the pioneering characterization of conformation-dependent variations in covalent geometry that serves as this project's foundation. Collaborations with world-leading groups in structure prediction, in crystallographic refinement and structure validation, and in knowledge-based library development will ensure rapid and effective translation of the gleaned information into improvements in protein modeling.

Public Health Relevance

Proteins are responsible for carrying out most of the processes of life, and their function depends exquisitely on their structure, down to the tiniest details. For this reason, determining accurate protein structures is a cornerstone of modern biomedical research. This work aims to bring about a universal improvement in the accuracy with which protein structures can be built.

National Institutes of Health (NIH)
National Institute of General Medical Sciences (NIGMS)
Research Project (R01)
Study Section: Special Emphasis Panel (ZRG1-CB-M (90))
Program Officer: Hagan, Ann A
Oregon State University
Schools of Arts and Sciences
United States
Hollingsworth, Scott A; Lewis, Matthew C; Karplus, P Andrew (2016) Beyond basins: φ,ψ preferences of a residue depend heavily on the φ,ψ values of its neighbors. Protein Sci 25:1757-62
Sharaf, Naima G; Brereton, Andrew E; Byeon, In-Ja L et al. (2016) NMR structure of the HIV-1 reverse transcriptase thumb subdomain. J Biomol NMR 66:273-280
Moriarty, Nigel W; Tronrud, Dale E; Adams, Paul D et al. (2016) A new default restraint library for the protein backbone in Phenix: a conformation-dependent geometry goes mainstream. Acta Crystallogr D Struct Biol 72:176-9
Brereton, Andrew E; Karplus, P Andrew (2016) On the reliability of peptide nonplanarity seen in ultra-high resolution crystal structures. Protein Sci 25:926-32
Li, Wenlin; Kinch, Lisa N; Karplus, P Andrew et al. (2015) ChSeq: A database of chameleon sequences. Protein Sci 24:1075-86
Brereton, Andrew E; Karplus, P Andrew (2015) Native proteins trap high-energy transit conformations. Sci Adv 1:e1501188
Karplus, P Andrew; Diederichs, Kay (2015) Assessing and maximizing data quality in macromolecular crystallography. Curr Opin Struct Biol 34:60-8
Clark, Sarah A; Tronrud, Dale E; Karplus, P Andrew (2015) Residue-level global and local ensemble-ensemble comparisons of protein domains. Protein Sci 24:1528-42
Moriarty, Nigel W; Tronrud, Dale E; Adams, Paul D et al. (2014) Conformation-dependent backbone geometry restraints set a new standard for protein crystallographic refinement. FEBS J 281:4061-71
Diederichs, K; Karplus, P A (2013) Better models by discarding data? Acta Crystallogr D Biol Crystallogr 69:1215-22

The 10 most recent of 24 publications are listed above.