Over the past decade, systems and synthetic biology approaches have provided novel mechanisms to enhance the production of diverse chemicals and biofuels from renewable resources in laboratory settings. However, it is still rare for synthetically modified strains to meet the production requirements for commercialization. Strain development remains caught in a tedious and costly design-build-test-learn cycle because existing modeling approaches fail to capture the complicated metabolic responses of such engineered cells. This proposal will explore an alternative, data-driven approach that has the potential to predict the productivity of synthetic organisms by leveraging the vast array of microbial cell factory publications. Using Artificial Intelligence approaches such as Machine Learning and Knowledge Representation, one can abstract "previous lessons" hidden in published data to facilitate a priori estimation of the metabolic output of engineered hosts, given a set of specific genetic instructions and fermentation growth conditions. The resulting platform can assist current constraint-based models in designing the most effective strategies for producing value-added chemicals. On the educational front, this proposal will offer educational and research training opportunities in synthetic biology, computer programming, and artificial intelligence for graduate students, providing them with a non-conventional career pathway.

Synthetic biology relies on extensive genetic modification and pathway engineering, which often result in unexpected physiological changes or metabolic shifts that reduce the productivity and stability of the hosts. The investigators have conceived a creative, multidisciplinary approach that relies on artificial intelligence-inspired methods for predicting the performance of two distinct unicellular cell factories (Escherichia coli and Saccharomyces cerevisiae). These platforms can be used to quantify the factors that govern microbial productivity (yield, titer, and growth rate), including the type and availability of metabolic precursors; the elements that constitute a biosynthetic pathway; fermentation conditions; and the specific genetic modifications made to optimize the system. By extracting and classifying information from referenced publications from the last 20 years, one can construct a "knowledge base" containing sufficient samples of bio-production assemblies. This information will then inform the building of cellular factories, using supervised machine learning and non-monotonic logic programming to estimate the productivity of hosts. The data-driven platform will also be integrated into genome-scale models to project physiological changes in specific mutant strains. This novel approach will reduce the need for costly design-build-test bench work. Key outcomes from this project include: (1) a database to standardize synthetic biology studies, (2) machine learning models to recognize lessons and patterns hidden in published data, and (3) integration of machine learning with flux balance models, leading to the design of strains with a high chance of success in industrial settings.
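As a rough illustration of the supervised-learning idea described above, the following minimal sketch estimates the yield of a new strain design from the most similar entries in a small knowledge base. It uses k-nearest-neighbour regression purely as a stand-in; the abstract does not state which learning algorithm the investigators use. The `Record` fields, the feature encoding, and every number in `TOY_KB` are hypothetical placeholders invented for this example, not values taken from any publication.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass(frozen=True)
class Record:
    """One hypothetical knowledge-base entry describing a published strain."""
    host: str              # "E. coli" or "S. cerevisiae"
    pathway_steps: int     # enzymatic steps in the engineered pathway
    genes_modified: int    # number of genetic modifications
    aerobic: bool          # fermentation condition
    product_yield: float   # reported yield (placeholder units)


def encode(host, pathway_steps, genes_modified, aerobic):
    """Map a strain design to a numeric feature vector (assumed encoding)."""
    return (
        1.0 if host == "S. cerevisiae" else 0.0,
        pathway_steps / 10.0,
        genes_modified / 10.0,
        1.0 if aerobic else 0.0,
    )


def features(record):
    """Feature vector of an existing knowledge-base record."""
    return encode(record.host, record.pathway_steps,
                  record.genes_modified, record.aerobic)


def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def predict_yield(knowledge_base, query, k=3):
    """Average the yields of the k records closest to the query design."""
    ranked = sorted(knowledge_base, key=lambda r: distance(features(r), query))
    nearest = ranked[:k]
    return sum(r.product_yield for r in nearest) / len(nearest)


# Made-up placeholder records, NOT data from the literature.
TOY_KB = [
    Record("E. coli", 3, 2, True, 0.30),
    Record("E. coli", 8, 6, True, 0.10),
    Record("S. cerevisiae", 3, 2, False, 0.20),
    Record("S. cerevisiae", 9, 7, False, 0.05),
]

if __name__ == "__main__":
    design = encode("E. coli", 4, 3, True)
    print(predict_yield(TOY_KB, design, k=2))
```

In a real setting the feature vector would need to cover the factors listed above (precursors, pathway elements, fermentation conditions, genetic modifications), and the estimate could then be passed to a genome-scale model as an additional constraint.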

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Budget Start: 2017-10-05
Budget End: 2021-07-31
Fiscal Year: 2018
Total Cost: $230,672
Name: Iowa State University
City: Ames
State: IA
Country: United States
Zip Code: 50011