The scientific challenge for this project is to accelerate discovery and exploration of the synthetic biology design space. Many parts used in synthetic biology come from, or are initially tested in, a simple bacterium, E. coli. However, many potential applications in energy, agriculture, materials, and health require other bacteria or higher-level organisms such as yeast. Currently, researchers rely on trial and error because they cannot find reliable information about prior experiments with a given part of interest, and this process simply cannot scale. Achieving scale requires harnessing a wide range of data so that researchers can gauge their confidence that a design will succeed. The quantity of data and the exponential growth in publications in this field are creating a tipping point, but these data are not readily accessible to practitioners. To address this challenge, our multidisciplinary team of biological engineers, machine learning experts, data scientists, library scientists, and social scientists will build a knowledge system that integrates disparate data and publication repositories. This system will deliver effective and efficient access to collectively available information, enabling expedited, knowledge-based synthetic biology design research.

This project will develop an open and integrated synthetic biology knowledge system (SBKS) that leverages existing data repositories and publications to create a single interface that transforms the way researchers access this information. A federated approach will provide access to up-to-date information across multiple, heterogeneous sources. New machine learning methods will be developed to automatically generate ontology annotations, creating connections between data in various repositories and information extracted from publications. Provenance will be tracked for each entity in SBKS. Newly developed methods will then use this provenance to assess bias and assign confidence scores to the knowledge returned for each entity. An intuitive, natural-language-based interface with visualization functionality will let users easily access and explore SBKS contents. Because ethics is necessarily part of synthetic biology research, text sources discussing ethical concerns in synthetic biology will also be incorporated to inform researchers about ethical debates relevant to their search queries. Finally, to test the SBKS API, a new genetic design tool, Kimera, will be developed that leverages the knowledge in SBKS to produce better designs. The proposed SBKS will accelerate discovery and innovation by enabling researchers to learn from others' past experiences. It will also help them focus valuable experimental time on designs that are more likely to work when transformed into a new organism. This research thus has the potential to produce transformative outcomes in synthetic biology by leveraging data science to improve the field's epistemic culture. For more information, please see https://synbioks.github.io.

This project is part of the National Science Foundation's Harnessing the Data Revolution (HDR) Big Idea activity and is jointly supported by HDR and the Division of Biological Infrastructure within the NSF Directorate for Biological Sciences.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Agency: National Science Foundation (NSF)
Institute: Division of Advanced CyberInfrastructure (ACI)
Type: Standard Grant (Standard)
Application #: 1939885
Program Officer: Peter McCartney
Budget Start: 2019-10-01
Budget End: 2021-09-30
Fiscal Year: 2019
Total Cost: $271,519
Name: University of California San Diego
City: La Jolla
State: CA
Country: United States
Zip Code: 92093