Open data-driven infrastructure for building biomolecular force fields for predictive biophysics and drug design

Shirts, Michael; Chodera, John

Abstract

The study of biomolecular interactions and the design of new therapeutics require accurate physical models of the atomistic interactions between small molecules and biological macromolecules. Over the last few decades, molecular mechanics force fields have demonstrated the potential that physical models hold for quantitative biophysical modeling and predictive molecular design. However, a significant technology gap exists in our ability to build force fields that achieve high accuracy, can be systematically improved in a statistically robust manner, can be extended to new areas of chemistry, can model post-translational and covalent modifications, can quantify systematic errors in predictions, and can be broadly applied across high-performance software packages. In this project, we aim to bridge this technology gap to enable new generations of accurate quantitative biomolecular modeling and (bio)molecular design for chemical biology and drug discovery.
In Aim 1, we will produce a modern, open infrastructure to enable practitioners to rapidly and conveniently construct and employ accurate and statistically robust physical force fields via automated machine learning methods.
In Aim 2, we will construct open, machine-readable experimental and quantum chemical datasets that will accelerate next-generation force field development.
In Aim 3, we will develop statistically robust Bayesian inference techniques to enable the automated construction of type assignment schemes that avoid overfitting and the selection of physical functional forms that are statistically justified by the data. This approach will also allow the systematic error in predicted properties arising from uncertainty in parameters or functional form choices (generally the dominant source of error) to be quantified with little added expense.
In Aim 4, we will integrate and apply this infrastructure to produce open, transferable, self-consistent force fields that achieve high accuracy and broad coverage for modeling small molecule interactions with biomolecules (including unnatural amino or nucleic acids and covalent modifications by organic molecules), with the ultimate goal of covering all major biomolecules. This research is significant in that the technology developed in this project has the potential to radically transform the study of biomolecular phenomena by providing highly accurate force fields with exceptionally broad chemical coverage via fully consistent parameterization of organic (bio)molecules. In addition, we will produce new tools to automate force field creation and tailoring to specific problem domains, quantify the systematic error in predictions, and identify new data for improving force field accuracy. This will greatly improve our ability to study diverse biophysical processes at the molecular level, and to rationally design new small-molecule, protein, and nucleic acid therapeutics. This approach will bring statistical rigor to force field construction and application by providing a means to make data-driven decisions, and will enhance reproducibility by building the field on a fully open infrastructure and open datasets.

Public Health Relevance

Scientists use computer simulations of proteins, DNA, and RNA, at atomic detail, to learn how these molecules of life do their jobs. They also use simulations to help design new medications: compounds that can bind and influence the behavior of these molecules of life, and thereby block diseases at the molecular level. We aim to greatly increase the utility of all of these simulations by improving the accuracy of the formulas they use to compute the forces acting between atoms.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Institute of General Medical Sciences (NIGMS)
Type: Research Project (R01)
Project #: 1R01GM132386-01A1
Application #: 9887804
Study Section: Macromolecular Structure and Function D Study Section (MSFD)
Program Officer: Lyster, Peter

Project Start: 2020-03-01
Project End: 2024-02-29
Budget Start: 2020-03-01
Budget End: 2021-02-28
Support Year: 1
Fiscal Year: 2020
Total Cost
Indirect Cost

Institution

Name: University of Colorado at Boulder
Department: Engineering (All Types)
Type: Biomed Engr/Col Engr/Engr Sta
DUNS #: 007431505

City: Boulder
State: CO
Country: United States
Zip Code: 80303

Related projects


NIH 2021 R01 GM	Open data-driven infrastructure for building biomolecular force fields for predictive biophysics and drug design Shirts, Michael R.; Chodera, John Damon / University of Colorado at Boulder
NIH 2020 R01 GM	Open data-driven infrastructure for building biomolecular force fields for predictive biophysics and drug design Shirts, Michael R.; Chodera, John Damon / University of Colorado at Boulder
NIH 2020 R01 GM	Open Data-driven Infrastructure for Building Biomolecular Force Field for Predictive Biophysics and Drug Design Shirts, Michael R.; Chodera, John Damon / University of Colorado at Boulder
