Structural biology has entered an era of computational modeling. Computational models serve as vehicles for studying the structure and dynamics of complex biological macromolecules, such as proteins, to better understand the properties and mechanisms of cells. Because it is efficient and scalable, computational protein modeling can be applied on a genome-wide scale to predict atomic-level three-dimensional protein structures from sequences when experimental structure determination is not feasible or practical. However, computational models often fall short of the biologically relevant accuracy of experimentally determined structures, the so-called native states. Computational structure refinement aims to improve these moderately accurate protein models by driving them towards experimental quality. Yet structure refinement methods often fail to bring models close enough to the native state and, worse, sometimes drive them away from it. This project will develop novel computational and data-driven methods to substantially improve protein structure refinement, bringing protein models closer to their native states. An open access bioinformatics research infrastructure will be developed and publicly disseminated, advancing basic biological research. Additionally, this interdisciplinary project is committed to enriching knowledge in biomolecular simulation and refinement, benefiting researchers and students in multiple communities at the interface of computing and biology.

This project aims to address the dual barriers of sampling and scoring in structure refinement by exploiting the reciprocal coupling of data-driven sampling and deep learning-based scoring. Specifically, new data-driven sampling methods guided by residue-specific and inter-residue restraints with generalized ensemble search will be developed to bias conformational sampling towards the native state. Additionally, novel side-chain-oriented high- and intermediate-resolution scoring functions powered by deep learning will be formulated to significantly improve the recognition of native-like conformations. An open access bioinformatics cyberinfrastructure for structure refinement will be developed and deployed by integrating the new sampling and scoring methods, enabling the worldwide community of life science researchers to apply these advanced refinement protocols and thereby multiplying the impact of the project on basic biological research. The project facilitates simulation-based learning through the development of PolyFold, a visual simulator for interactive protein structure manipulation and refinement, with an inclusive commitment to engaging the general public in science and technology. Results of this project, including the open access bioinformatics research and educational resources, can be found at www.eng.auburn.edu/~dzb0050/.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Agency
National Science Foundation (NSF)
Institute
Division of Biological Infrastructure (DBI)
Application #
1942692
Program Officer
Jean Gao
Project Start
Project End
Budget Start
2020-08-01
Budget End
2025-07-31
Support Year
Fiscal Year
2019
Total Cost
$447,827
Indirect Cost
Name
Auburn University
Department
Type
DUNS #
City
Auburn
State
AL
Country
United States
Zip Code
36832