Modeling the three-dimensional structure of protein molecules is of clear biomedical importance, driven by two powerful forces. First is the realization that proteins carry out almost all essential functional and structural tasks in living systems by virtue of their folded shape; almost all drugs depend on a small molecule inhibiting a malfunctioning protein through shape complementarity in three dimensions. Second is the rapid determination of genomic protein sequence data, which has doubled in the past 28 months, complemented by equally rapid determination of novel protein structural data; structural coverage of sequences (the percentage of sequences with some structural information) is over 50% and is increasing thanks to structural genomics initiatives. This proposal continues previous aims by developing and improving methods for accurate homology modeling (where the structure of a related sequence is known). Current aims extend to the general problem of ab initio structure prediction (where no structure of any related sequence is known). Such extension is possible due to recent progress and the realization that homology modeling and ab initio structure prediction share a common philosophy rooted in the decoy / discriminate paradigm we pioneered in 1995. Specifically, both protein modeling and structure prediction have four inter-related stages: (a) formulation of energy functions, (b) application of move sets, (c) generation of decoy structures, and (d) assessment of predicted structures. These four stages are iterated to improve both decoys and energy functions so as to obtain ever better predicted structures. Analysis of experimentally determined sequences and structures goes hand in hand with this planned modeling to give us an overview of the extent of the problem and the progress made in the field. Having been drawn to such analysis in the previous funding period, we expect to continue this activity with particular focus on the 'dark matter', those sequences for which we have the least information.
We are well aware that these are ambitious aims but are encouraged by recent progress. Our methodology uses knowledge-based or statistical energy functions, but our philosophy is firmly rooted in the physical nature of the systems. As such, our work will have far-reaching applications to theoretical studies of molecular function, including ligand binding modeling, protein-protein interaction modeling, and more general simulation of protein function. Our five specific aims are: (1) better knowledge-based energy functions, (2) general and novel move sets, (3) decoy generation by uniform sampling and powerful search, (4) assessment of structures to reveal deficiencies, and (5) analysis of uncharacterized sequences in terms of clustering sequence domains into new families. Achieving these aims will advance our fundamental understanding of molecular structure: predicted molecular structure can guide experiments and lead to further understanding of molecular mechanisms.
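
As a rough illustration of how the four stages named above (energy function, move set, decoy generation, assessment) fit together, the following is a minimal toy sketch and not the method used in this project. Everything in it is an assumption made for illustration: a "structure" is reduced to a short list of angles, `PREFERRED` stands in for statistics that would be derived from known structures, the energy is a simple cosine penalty, and decoys are generated by plain Metropolis Monte Carlo.

```python
import math
import random

# Hypothetical toy model: a "structure" is a list of angles in radians.
# PREFERRED is an assumed stand-in for statistics from known structures.
PREFERRED = [-1.0, -1.0, 2.0, 2.0, -1.0]


def energy(conf):
    """(a) Energy function: toy knowledge-based score, lower is better."""
    return sum(1.0 - math.cos(a - p) for a, p in zip(conf, PREFERRED))


def move(conf, rng, step=0.5):
    """(b) Move set: perturb one randomly chosen angle."""
    new = list(conf)
    i = rng.randrange(len(new))
    new[i] += rng.uniform(-step, step)
    return new


def generate_decoys(start, n_steps, rng, temperature=0.2):
    """(c) Decoy generation: Metropolis Monte Carlo, keeping accepted states."""
    current, e_cur = list(start), energy(start)
    decoys = [(e_cur, current)]
    for _ in range(n_steps):
        cand = move(current, rng)
        e_cand = energy(cand)
        # Always accept downhill moves; accept uphill moves with Boltzmann probability.
        if e_cand <= e_cur or rng.random() < math.exp((e_cur - e_cand) / temperature):
            current, e_cur = cand, e_cand
            decoys.append((e_cur, current))
    return decoys


def angular_rmsd(conf, reference):
    """(d) Assessment: root-mean-square angular deviation from a reference."""
    sq = [math.atan2(math.sin(a - r), math.cos(a - r)) ** 2
          for a, r in zip(conf, reference)]
    return math.sqrt(sum(sq) / len(sq))


rng = random.Random(0)                      # fixed seed for repeatability
start = [0.0] * len(PREFERRED)
decoys = generate_decoys(start, 2000, rng)
best_energy, best_conf = min(decoys, key=lambda d: d[0])
print(f"decoys kept: {len(decoys)}, best energy: {best_energy:.3f}")
print(f"deviation from reference: {angular_rmsd(best_conf, PREFERRED):.3f}")
```

In this toy the reference used for assessment is the same `PREFERRED` list that defines the energy, so low energy trivially implies low deviation. In a realistic setting the reference would be an experimentally determined structure, and cases where low-energy decoys are far from it are what would reveal deficiencies in the energy function and drive the next iteration.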

Public Health Relevance

Modeling three-dimensional structures of protein molecules is of clear biomedical importance for two reasons: (1) proteins carry out almost all essential functional and structural tasks in living systems by virtue of their folded shape (almost all drugs depend on a small molecule binding to and inhibiting a malfunctioning protein through shape complementarity in three dimensions); and (2) genomic protein sequence data are growing rapidly, doubling in the past 28 months. This proposal continues previous aims by developing improved methods for accurate homology modeling (where the structure of a related sequence is known) and also extends the aims to the general problem of ab initio structure prediction (where no structure of any related sequence is known).

Agency: National Institutes of Health (NIH)
Institute: National Institute of General Medical Sciences (NIGMS)
Type: Research Project (R01)
Project #: 5R01GM063817-12
Application #: 8312540
Study Section: Macromolecular Structure and Function D Study Section (MSFD)
Program Officer: Edmonds, Charles G
Project Start: 2001-08-01
Project End: 2013-07-31
Budget Start: 2012-08-01
Budget End: 2013-07-31
Support Year: 12
Fiscal Year: 2012
Total Cost: $331,748
Indirect Cost: $124,405

Institution Name: Stanford University
Department: Biology
Institution Type: Schools of Medicine
DUNS #: 009214214
City: Stanford
State: CA
Country: United States
Zip Code: 94305

Scaiewicz, Andrea; Levitt, Michael (2015) The language of the protein universe. Curr Opin Genet Dev 35:50-6
Yanover, Chen; Vanetik, Natalia; Levitt, Michael et al. (2014) Redundancy-weighting for better inference of protein structural features. Bioinformatics 30:2295-301
Khoury, George A; Liwo, Adam; Khatib, Firas et al. (2014) WeFold: a coopetition for protein structure prediction. Proteins 82:1850-68
Schröder, Gunnar F; Levitt, Michael; Brunger, Axel T (2014) Deformable elastic network refinement for low-resolution macromolecular crystallography. Acta Crystallogr D Biol Crystallogr 70:2241-55
Silva, Daniel-Adriano; Weiss, Dahlia R; Pardo Avila, Fátima et al. (2014) Millisecond dynamics of RNA polymerase II translocation at atomic resolution. Proc Natl Acad Sci U S A 111:7665-70
Minary, Peter; Levitt, Michael (2014) Training-free atomistic prediction of nucleosome occupancy. Proc Natl Acad Sci U S A 111:6293-8
Levitt, Michael (2014) Birth and future of multiscale modeling for macromolecular systems (Nobel Lecture). Angew Chem Int Ed Engl 53:10006-18
Kalisman, Nir; Schröder, Gunnar F; Levitt, Michael (2013) The crystal structures of the eukaryotic chaperonin CCT reveal its functional partitioning. Structure 21:540-9
Murakami, Kenji; Elmlund, Hans; Kalisman, Nir et al. (2013) Architecture of an RNA polymerase II transcription pre-initiation complex. Science 342:1238724
Kolodny, Rachel; Pereyaslavets, Leonid; Samson, Abraham O et al. (2013) On the universe of protein folds. Annu Rev Biophys 42:559-82

Showing the most recent 10 out of 47 publications