Software tools for reproducibly building biomodels

Karr, Jonathan

Abstract

Despite substantial effort, we cannot comprehensively predict the behavior of biological systems. Consequently, we cannot explain how genotype influences phenotype, design cells, or treat many diseases. Improved dynamical models are needed to understand biology and accelerate bioengineering and medicine. Model building is one of the bottlenecks to better models because existing model-building tools require extensive manual input and obscure the data and assumptions behind models. As a result, authors cannot precisely describe how they constructed models, readers cannot review this information, and models cannot be reproduced. This makes it hard to understand and extend models and, in turn, to build accurate models. Recently, we piloted a method for transparently and reproducibly building whole-cell models from diverse genomic and other data. Further work is needed to extend and generalize this method to other domains. We will develop the first software tool for reproducibly and transparently building SBML-compatible dynamical biochemical models of intracellular pathways. The tool will include modules for aggregating model input data, organizing this data for model design, and designing models from this data. The tool will make model building reproducible by tracking every data source and assumption. We will use biochemical models as a test bed for developing broadly applicable methods for reproducibly building biomodels. This approach will allow us to leverage the large amount of data available to build biochemical models, concretely test our ideas, and integrate our tool into the center's reproducible biochemical modeling workflow. To enable future support for other domains, such as multiscale modeling, electrophysiology, and ecology, we will make our tool as modular and extensible as possible. To ensure that our tool advances biomodeling, we will develop it in conjunction with several collaborative projects (CPs) that aim to develop whole-cell models of bacteria and human cells.
These CPs will push us to develop practical tools for constructing models, and we will pull the CPs to construct models that are understandable, reusable, and extensible. To help researchers use our software, we will work with TR&Ds 2 and 3 to integrate our software into a reproducible modeling workflow. We will also extensively document our software and distribute it as open source. In addition, as part of the Training and Dissemination Core, we will develop tutorials and organize workshops. We anticipate that our tool will help researchers build more predictive models. In turn, we anticipate that these models will help scientists discover new biology by enabling them to perform unprecedented in silico experiments with complete control, infinite resolution, and unlimited scope; help physicians interpret personal genomic data and personalize therapy; and help bioengineers rationally design microorganisms for a wide range of industrial and medical applications such as detecting disease and synthesizing drugs.

Public Health Relevance

TECHNOLOGY RESEARCH AND DEVELOPMENT 1: PROJECT NARRATIVE

Despite extensive effort, we do not have mechanistic dynamical models that predict phenotype from genotype. This project will develop the first software tools for transparently and reproducibly constructing dynamical biomodels. The tools will help researchers develop better models, including large whole-cell models, that could help scientists understand biology; help physicians interpret personal genomic data and personalize therapy; and help bioengineers design microorganisms for a wide range of industrial and medical applications such as detecting disease and synthesizing drugs.