As molecular-based computer simulations of both naturally occurring and man-made (synthetic) materials become increasingly used to predict their properties, the reproducibility of these simulations becomes an increasingly important issue. These simulations are complex, require large amounts of computer time, and are usually performed manually, i.e., assembled one at a time from all the components that go into such a simulation, including the models for how molecules interact with each other (known as forcefields). In addition, there has been much interest in being able to perform such computational simulations on large sets of different but related systems in order to screen for desirable properties, leading to the discovery of new materials and their incorporation into applications twice as rapidly and at half the cost of existing, primarily experimental, methods. This ambition is the basis for the national Materials Genome Initiative (MGI), making reproducibility even more important. In this project, nine research groups from eight universities are combining their expertise to create a software environment, called the Molecular Simulation Design Framework (MoSDeF), that will enable the automation of molecular-based computer simulations of soft materials (such as fluids, polymers, and biological systems) and will enable MGI-style screening of such systems. MoSDeF is open source, and its use will enable reproducibility in molecular-based computer simulations, because all simulation steps, all input data, and all codes used will be publicly accessible, so that anyone can reproduce a published simulation. MoSDeF will also contribute to reproducibility through standardization and by maintaining the provenance of forcefields, which are one of the most common sources of irreproducibility in molecular-based simulations.
Reproducibility in scientific research has become a prominent issue. Computational scientists, along with the rest of the scientific community, are grappling with the central question: How can a study be performed and published in such a way that it can be replicated by others? Answering this question is essential to the scientific enterprise and increasingly urgent, as reproducibility issues faced in small-scale studies will only be compounded as researchers look to harness ever-expanding computational power to perform large-scale Materials Genome Initiative (MGI) inspired screening studies, thus growing the number of simulations by orders of magnitude. Addressing the issues of reproducibility in soft matter simulation is particularly challenging, given the complexity of the simulation inputs and workflows, and the all-too-common usage of closed-source software. In this proposal, nine leading research groups (from Vanderbilt, U Michigan, Notre Dame U, U Delaware, Boise State U, U Houston, Wayne State U, and U Minnesota), representing a broad range of expertise and an equally broad range of science applications, simulation codes, algorithms, and analysis tools, along with computer scientists from Vanderbilt's Institute for Software Integrated Systems (ISIS), are committing to invest their expertise and capabilities to transform the mindset of molecular simulationists to perform and publish their simulations in such a way as to be Transparent, Reproducible, Usable by others, and Extensible (TRUE). Most of the investigators are recent or current holders of grants from the software program (i.e., S2I2, SSI or SSE grants); thus, the project builds upon, and brings synergy to, an existing large investment in molecular simulation software by NSF. To drive the community towards performing simulations that are TRUE, new software tools to facilitate best practices will be developed.
Specifically, this will be achieved by expanding the capabilities of the open-source Molecular Simulation Design Framework (MoSDeF), which was initiated at Vanderbilt with support from two NSF grants. MoSDeF is a modular, scriptable Python framework that includes modules for programmatic system construction, encoding and applying force field usage rules, and workflow management, allowing the exact procedures used to set up and perform a simulation to be captured, version-controlled, and preserved. Continued development of the existing MoSDeF modules will be performed to support a wider range of force fields, molecular models, and open-source simulation engines. The creation of a plugin architecture for community extension, and the development of new modules for force field optimization, free energy calculations, and screening, will further allow MoSDeF to achieve these goals.
This project is supported by the Office of Advanced Cyberinfrastructure in the Directorate for Computer & Information Science & Engineering, and by the Division of Materials Research and the Division of Chemistry in the Directorate for Mathematical and Physical Sciences.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.