The goal of this project is to create novel methods that help researchers gather and integrate existing datasets in order to better inform the design of future experiments. Ideally, new experiments include some replication, fill in gaps, and produce new knowledge, but this balance is hard to achieve when existing datasets are stored in different locations or organized in very different ways. To achieve the goal, two aims are proposed: 1) develop methods for creating cohesive biological datasets from public experiments such that they are suitable for training computational models; 2) develop methods that indicate which experimental conditions should be used to collect new datasets so that they are the most likely to yield important information about the study organism's biological properties, such as structure and behavior. Achieving this goal will enable important rules of life for organisms to be understood more efficiently and economically, by focusing on the experiments that give the most value for the funds spent.

This exploratory project will focus on data arising from genome-wide transcriptional profiling methods (e.g. microarrays, RNA-Seq), building a computational foundation for later expansion. First, data processing techniques for creating integrated compendia will be assessed, in order to select the best method for building training datasets for machine learning methods. Second, data-driven computational models will be trained on the data compendia and evaluated for success in describing microbial behavior. Third, given the normalized compendia (in the transcriptomics data space), an optimal experimental design methodology will be prototyped to recommend the best set of experiments to perform to yield the complete set of data needed to fit and test the biological model. The experimental design methodology will be benchmarked using synthetic data, and then evaluated by exploring the effect of design-recommended combinations of antibiotics and antiseptics (10 in all) on microbial behavior. These outcomes will be compared to those of experiments designed by currently used methods. Success metrics will focus on how quickly the required information in the experimental space is gathered and what level of uncertainty in a model remains after each experiment is completed.
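The abstract does not name a specific experimental design algorithm, so the following is only an illustrative sketch of the general idea: pick experiments one at a time so that model uncertainty drops as quickly as possible. It assumes a simple Bayesian linear model in which each of the 10 agents has an additive effect, and it treats single agents and pairs as the candidate conditions. The function names `candidate_designs` and `greedy_design`, and the prior and noise variances, are hypothetical choices made for this example.

```python
import itertools
import numpy as np


def candidate_designs(n_agents=10, max_combo=2):
    """Binary design matrix: one row per single agent or combination of agents."""
    rows = []
    for k in range(1, max_combo + 1):
        for combo in itertools.combinations(range(n_agents), k):
            x = np.zeros(n_agents)
            x[list(combo)] = 1.0
            rows.append(x)
    return np.array(rows)


def greedy_design(X, n_experiments, prior_var=1.0, noise_var=0.1):
    """Greedily choose the candidate whose outcome the model is least sure about.

    Assumes a linear-Gaussian model, so the posterior covariance depends only
    on which experiments are run, not on the measured responses.
    """
    cov = prior_var * np.eye(X.shape[1])      # covariance of the effect weights
    available = np.ones(len(X), dtype=bool)   # candidates not yet selected
    chosen, remaining = [], []
    for _ in range(n_experiments):
        # Predictive variance x^T cov x for every candidate experiment.
        pred_var = np.einsum("ij,jk,ik->i", X, cov, X)
        pred_var = np.where(available, pred_var, -np.inf)
        i = int(np.argmax(pred_var))
        x = X[i]
        # Rank-one Bayesian update of the covariance (Sherman-Morrison).
        cov_x = cov @ x
        cov = cov - np.outer(cov_x, cov_x) / (noise_var + x @ cov_x)
        available[i] = False
        chosen.append(i)
        remaining.append(float(np.trace(cov)))  # total remaining uncertainty
    return chosen, remaining


X = candidate_designs()
chosen, remaining = greedy_design(X, n_experiments=8)
for step, (idx, unc) in enumerate(zip(chosen, remaining), start=1):
    agents = np.flatnonzero(X[idx]).tolist()
    print(f"experiment {step}: agents {agents}, remaining uncertainty {unc:.3f}")
```

The `remaining` list corresponds loosely to the success metric described above: the level of model uncertainty left after each experiment. A baseline design, such as random selection, could be run through the same loop to compare how quickly each approach reduces that value.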

Agency: National Science Foundation (NSF)
Institute: Division of Biological Infrastructure (DBI)
Type: Standard Grant (Standard)
Application #: 1743101
Program Officer: Peter McCartney
Project Start:
Project End:
Budget Start: 2017-09-15
Budget End: 2020-08-31
Support Year:
Fiscal Year: 2017
Total Cost: $300,000
Indirect Cost:

Name: University of California Davis
Department:
Type:
DUNS #:
City: Davis
State: CA
Country: United States
Zip Code: 95618