Molecular simulation is a powerful tool to predict the properties of biomolecules, interpret biophysical experiments, and design small molecules or biomolecules with therapeutic utility. However, a number of obstacles have impeded the development of quantitative, cloud-scale research work?ows involving biomolecular simulation. Two main ob- stacles are the insuf?cient accuracy of current atomistic models for biomolecules and small molecule therapeutics and the lack of interoperability in simulation toolchains used in both academic and industrial biomolecular research. Our original R01, ?Open Data-driven Infrastructure for Building Biomolecular Force Fields for Predictive Bio- physics and Drug Design,? seeks to solve the ?rst problem. It helps fund our effort, the Open Force Field Initiative (https://openforce?eld.org) to develop open, extensible, and shared software and data infrastructure, implementing statistically robust methods of parameterizing force ?elds and choosing new force ?elds in a statistically sound manner. This work is designed to create not just a new generation of force ?elds, but an open technology to continue advancing force ?eld science. However, even with improved molecular models, putting together complete work?ows of biomolecular simulations involves interfacing substantial numbers of different tools. However the majority of the existing molecular simulation work?ows are mutually incompatible, with differing representations of the molecular models. The Open Force Field Initiative effort already includes the development of molecular data structures that we can ex- port into existing molecular simulation tools. We propose to extend the existing scope of our R01 to create an extensible common molecular simulation representation and translators to and from this representation. Such a set of tools will immediately make it signi?cantly easier to combine the disparate work?ows developed for different sets of molecular simulation tools. Researchers will be able to set up and build the biophysical simulations using their usual tools, but run and analyze them with currently incompatible tools, enabling better matching of computational resources and methods to problems. It will help avoid trapping in a single software framework, and enable combinations of functionalities previously impossible without substantial developer time and effort. We will (Aim 1) work with partners to generalize our modular, extensible object model for representing parameterized biomolecular systems in a manner that accommodates the force ?eld terms currently supported by most popular biomolecular simulation packages. We will engineer it to be extensible to advanced interaction forms, such as polarizability and other multibody terms, and machine learning models for intermolecular forces. We will (Aim 2): enable easy conversion between components of molecular simulation work?ows by allowing other molecular simulation packages to easily store their representations in this data model, developing converters that can import/export this object model to multiple popular ?le formats, focusing initially on OpenMM, AMBER, CHARMM, and GROMACS. We will demonstrate the utility of this interface in cloud-ready work?ows.
Scientists use computer simulations of proteins, DNA, and RNA, at atomic detail, to learn how these molecules of life carry out their functions and to design new medications. We aim to greatly increase the utility of all of these simulations by improving the accuracy of the formulas they use to compute the forces acting between atoms. This supplement will make it much easier for molecular simulation work?ows to interoperate with each other in large-scale work?ows.