Complexes of proteins and nucleic acids are the macromolecular machines at the heart of modern structural biology. We believe that structures of large macromolecular complexes can be solved with less experimental data and at higher throughput. Structures are still solved using methods invented decades ago, and model dependency causes severe problems for large systems solved at low resolution. Preliminary studies done during the previous funding period show that a way forward is to build a very large number of different models and then test these models directly against the experimental data. This approach has allowed us to assign sequence to a known backbone using much less data than is the norm. Preliminary results show that, with suitable built-in statistical controls, this unbiased approach works well both for low-resolution X-ray data and for mass spectrometry with a small number of experimental cross-links. Our approach is innovative: it determined the detailed atomic structure of chaperonin CCT/TRiC, a 950-kilodalton, 8-gene quasi-degenerate system that could not be solved by conventional methods of cryo-EM or X-ray crystallography. Driven by the central hypothesis that "unbiased methods solve structures with less information and at higher throughput", we have three specific aims:
1. Facilitate structure determination by cross-linking and mass spectrometry (XL+MS). With optimized protocols, XL+MS will be applied to the PIC, RIG-I and RdRp systems studied by colleagues at Stanford.
2. Determine and refine the spatial arrangement of macromolecular domains and subunits with cryo-electron microscopy (cryo-EM). After calibration on open-form chaperonin CCT, the methods will be applied to the systems above to fit mass spectrometry and cryo-EM data simultaneously.
3. Position side chains with R-value exploration of low-resolution X-ray data. All-atom combinatorial homology models will be generated using best practices consistent with the need to generate millions of models. The fit between the X-ray data calculated from each model and the observed data (the R-value) will be optimized in an attempt to assign amino acids not seen in low-resolution structures to backbone C-alpha positions.
Given the central role of structural biology in medical science, our work, if successful, could produce useful structures at higher throughput. Because the approach relies heavily on computational resources, which continue to drop exponentially in cost, these results would be obtained with fewer resources and in less time. Our work would also advance detailed functional and biological studies that are hampered by lack of confidence in side chain positions. The positive impact could be broader, in that other problems in structural and systems biology could benefit from the key principle of our approach: eliminate bias by examining millions of possible models that are all equivalent and built to the same consistent specifications. This set of structures then provides a statistical sanity check, showing how much better the best model is than the next best one.
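The selection principle described in the abstract (score every candidate model against the same data, then ask how far the best model stands out from the rest) can be illustrated with a minimal sketch. This is not the project's implementation. The function names `r_value`, `rank_models` and `best_vs_rest` are hypothetical, the amplitudes are assumed to be on a common scale already, and the z-score is only one possible choice of statistical control. The R-value here is the conventional crystallographic residual, sum(||Fobs| - |Fcalc||) / sum(|Fobs|).

```python
import statistics


def r_value(f_obs, f_calc):
    """Conventional R-value between observed and calculated amplitudes.

    Assumes both lists cover the same reflections in the same order and
    that the calculated amplitudes are already scaled to the observed ones.
    """
    if len(f_obs) != len(f_calc):
        raise ValueError("amplitude lists must have the same length")
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    return numerator / denominator


def rank_models(f_obs, models):
    """Score every model against the same data; lowest R-value first.

    `models` maps a hypothetical model name to its calculated amplitudes.
    """
    scored = [(name, r_value(f_obs, f_calc)) for name, f_calc in models.items()]
    return sorted(scored, key=lambda item: item[1])


def best_vs_rest(ranked):
    """Return (gap, z): how far the best model stands out from the others.

    gap is the R-value difference between the two top-ranked models.
    z is the distance of the best R-value from the mean of the remaining
    models, in units of their sample standard deviation (needs >= 3 models).
    """
    if len(ranked) < 3:
        raise ValueError("need at least three models for a spread estimate")
    scores = [score for _, score in ranked]
    best, rest = scores[0], scores[1:]
    gap = rest[0] - best
    spread = statistics.stdev(rest)
    z = (statistics.mean(rest) - best) / spread if spread > 0 else float("inf")
    return gap, z


# Toy data only: four made-up models scored against four made-up reflections.
f_obs = [10.0, 20.0, 30.0, 40.0]
models = {
    "a": [10.0, 20.0, 30.0, 40.0],
    "b": [12.0, 18.0, 33.0, 37.0],
    "c": [20.0, 10.0, 40.0, 30.0],
    "d": [15.0, 25.0, 20.0, 45.0],
}
ranked = rank_models(f_obs, models)
gap, z = best_vs_rest(ranked)
print(ranked[0][0], round(gap, 3), round(z, 3))
```

A large gap and z-score suggest that the top model is clearly preferred by the data; a small gap suggests that the data cannot distinguish the leading candidates. In a real application the same comparison would run over millions of models built to identical specifications, which is what makes the scores comparable.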

Public Health Relevance

Complexes of protein and nucleic acid molecules are the major machines of living cells, which gives them clear biomedical importance as drug targets and key cellular control points. Solving their structures is hard. Our approach uses computers to generate millions of atomic models and tests them against data from cross-linking mass spectrometry, cryo-electron microscopy and X-ray crystallography. Examining many models consistently and at the same time avoids the bias of a favorite model and provides a sound statistical framework. We aim to establish that X-ray structures can be solved at low resolution to reveal atomic detail missed by conventional methods. We expect to be able to determine structures of macromolecular complexes using less data and at higher throughput.

National Institutes of Health (NIH)
National Institute of General Medical Sciences (NIGMS)
Research Project (R01)
Study Section
Macromolecular Structure and Function D Study Section (MSFD)
Program Officer
Edmonds, Charles G
Stanford University
Schools of Medicine
United States
Scaiewicz, Andrea; Levitt, Michael (2015) The language of the protein universe. Curr Opin Genet Dev 35:50-6
Yanover, Chen; Vanetik, Natalia; Levitt, Michael et al. (2014) Redundancy-weighting for better inference of protein structural features. Bioinformatics 30:2295-301
Silva, Daniel-Adriano; Weiss, Dahlia R; Pardo Avila, Fátima et al. (2014) Millisecond dynamics of RNA polymerase II translocation at atomic resolution. Proc Natl Acad Sci U S A 111:7665-70
Khoury, George A; Liwo, Adam; Khatib, Firas et al. (2014) WeFold: a coopetition for protein structure prediction. Proteins 82:1850-68
Schröder, Gunnar F; Levitt, Michael; Brunger, Axel T (2014) Deformable elastic network refinement for low-resolution macromolecular crystallography. Acta Crystallogr D Biol Crystallogr 70:2241-55
Levitt, Michael (2014) Birth and future of multiscale modeling for macromolecular systems (Nobel Lecture). Angew Chem Int Ed Engl 53:10006-18
Minary, Peter; Levitt, Michael (2014) Training-free atomistic prediction of nucleosome occupancy. Proc Natl Acad Sci U S A 111:6293-8
Kalisman, Nir; Schröder, Gunnar F; Levitt, Michael (2013) The crystal structures of the eukaryotic chaperonin CCT reveal its functional partitioning. Structure 21:540-9
Kolodny, Rachel; Pereyaslavets, Leonid; Samson, Abraham O et al. (2013) On the universe of protein folds. Annu Rev Biophys 42:559-82
Murakami, Kenji; Elmlund, Hans; Kalisman, Nir et al. (2013) Architecture of an RNA polymerase II transcription pre-initiation complex. Science 342:1238724

Showing the most recent 10 out of 47 publications