The Protein Data Bank (PDB) is the single global archive of three-dimensional (3D) structures of large biological molecules. Despite a steady increase in its holdings, the growth of the PDB is far outstripped by the growth in the available protein sequence data. Resources like Genome3D (genome3d.eu), funded by the BBSRC, aim to fill the gap in structure coverage of the protein sequence space with reliable predictions of structures. These resources largely model proteins that are closely related to a protein of known structure. The Rosetta method for predicting protein structures, a world-leading approach developed by the Baker lab in the USA, was recently enhanced with information derived from evolutionary analyses of protein sequence data, yielding reliable models even for cases where sequence identity between the model and the available experimental structures is very low. This project will integrate Rosetta models into Genome3D to expand the coverage of structural data for organisms important for health and food security. It will also enrich both the experimentally determined and computationally predicted structures with valuable functional annotations, such as information pertaining to surface interfaces, a key ingredient in understanding how proteins interact with each other and with other biological molecules. By focusing on proteins dissimilar to those with known structures, this portal will help fill the gaps in structure coverage of the protein sequence space and will make structure data much more readily available and accessible. Finally, novel visualization tools integrating the presentation of the predicted and experimentally determined structures will be developed, maintaining a clear distinction between what is predicted and what is experimentally determined.

The expanded set of 3D models derived from this project will in turn help to expand the coverage of sequence space even further, since these models can be used to guide the experimental determination of protein structures by powerful new structural biology techniques like cryo-electron microscopy (cryo-EM). This project will also endeavor, where possible, to improve the assembly of individual protein structures into macromolecular complexes, which can be analyzed to determine their biological role. Scientists in both academia and industry will benefit from access to such an integrated portal, which will assist them in designing new medicines, understanding the mechanisms of disease, or designing proteins with novel properties. Recent advances in electron microscopy allow near-routine determination of the structures of large molecular machines, but interpreting the experimental results requires a large repertoire of "building blocks", a need that the new portal will partially address by providing expanded domain structure libraries. The portal will also offer programmatic access to the assembled data, benefiting power users such as software developers and maintainers of other resources.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Agency: National Science Foundation (NSF)
Institute: Division of Biological Infrastructure (DBI)
Type: Standard Grant (Standard)
Application #: 1937533
Program Officer: Peter McCartney
Project Start:
Project End:
Budget Start: 2019-08-01
Budget End: 2022-07-31
Support Year:
Fiscal Year: 2019
Total Cost: $473,564
Indirect Cost:

Name: University of Washington
Department:
Type:
DUNS #:
City: Seattle
State: WA
Country: United States
Zip Code: 98195