The study of biomolecular interactions and the design of new therapeutics require accurate physical models of the atomistic interactions between small molecules and biological macromolecules. Over the last few decades, molecular mechanics force fields have demonstrated the potential that physical models hold for quantitative biophysical modeling and predictive molecular design. However, a significant technology gap exists in our ability to build force fields that achieve high accuracy, can be systematically improved in a statistically robust manner, can be extended to new areas of chemistry, can model post-translational and covalent modifications, are able to quantify systematic errors in predictions, and can be broadly applied across high-performance software packages. In this project, we aim to bridge this technology gap to enable new generations of accurate quantitative biomolecular modeling and (bio)molecular design for chemical biology and drug discovery.
In Aim 1, we will produce a modern, open infrastructure to enable practitioners to rapidly and conveniently construct and employ accurate and statistically robust physical force fields via automated machine learning methods.
In Aim 2, we will construct open, machine-readable experimental and quantum chemical datasets that will accelerate next-generation force field development.
In Aim 3, we will develop statistically robust Bayesian inference techniques to enable the automated construction of type assignment schemes that avoid overfitting and the selection of physical functional forms statistically justified by the data. This approach will also allow the systematic error in predicted properties arising from uncertainty in parameters or functional form choices (generally the dominant source of error) to be quantified with little added expense.
In Aim 4, we will integrate and apply this infrastructure to produce open, transferable, self-consistent force fields that achieve high accuracy and broad coverage for modeling small molecule interactions with biomolecules (including unnatural amino or nucleic acids and covalent modifications by organic molecules), with the ultimate goal of covering all major biomolecules. This research is significant in that the technology developed in this project has the potential to radically transform the study of biomolecular phenomena by providing highly accurate force fields with exceptionally broad chemical coverage via fully consistent parameterization of organic (bio)molecules. In addition, we will produce new tools to automate force field creation and tailoring to specific problem domains, quantify the systematic error in predictions, and identify new data for improving force field accuracy. This will greatly improve our ability to study diverse biophysical processes at the molecular level, and to rationally design new small-molecule, protein, and nucleic acid therapeutics. This approach will bring statistical rigor to force field construction and application by providing a means to make data-driven decisions, and its fully open infrastructure and datasets will enhance reproducibility, enabling the field to become a rigorous and reproducible science.
Scientists use computer simulations of proteins, DNA, and RNA, at atomic detail, to learn how these molecules of life do their jobs. They also use simulations to help design new medications: compounds that can bind and influence the behavior of these molecules of life, and thereby block diseases at the molecular level. We aim to greatly increase the utility of all of these simulations by improving the accuracy of the formulas they use to compute the forces acting between atoms.