While this is the age of big data, it remains an open question whether more data translates to more knowledge. Particularly when generating data is expensive or time-consuming, as is often the case with clinical trials and biomolecular experiments, the problem of identifying information-rich data becomes crucial for creating models that can reliably predict the outcome of future experiments. Few results have been published on the amount of data necessary, and currently there are no methods for generating specific data sets that would unambiguously identify a predictive model. This research project addresses fundamental mathematical and computational questions in data selection. The theoretical results will advance the fields of design of experiments and network inference through the determination of criteria for selecting data sets that uniquely identify models. The algorithms under development will serve as a guide for experimentalists in determining the data that are needed to identify the structure of a network of interest. Such knowledge has the potential to drastically reduce the resources wasted on collecting too much data with too little information. Graduate students will participate at the appropriate level in each component of the project. Such an experience will provide possible topics for M.S. theses or Ph.D. dissertations and will very likely inspire career-long involvement of the participants in the STEM disciplines.

As a first step towards developing a complete theory, the PIs will focus on models described by finite-valued nonlinear polynomial functions. Finite-state multivariate polynomial functions have been used successfully to model complex networks from discretized data; however, few results have been published on the amount of data necessary for such models, and most of those apply to Boolean models only. The PIs will address the minimality and specificity of the data needed to uniquely identify discrete polynomial models by developing the appropriate theory, implementing the theoretical results as algorithms, and applying the algorithms to important physical systems. The proposed work will also increase the utility of polynomial dynamical systems as models of complex networks by establishing the minimal amount of data required for unique model identification.
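The identification question can be illustrated with a small brute-force sketch. It is not the PIs' method: the field size, the number of variables, the `target` function, and all function names below are assumptions chosen only for illustration. The sketch enumerates every polynomial in two variables over the finite field with three elements (exponents below 3 in each variable) and counts how many agree with a given set of input-output observations.

```python
from itertools import product

P = 3  # assumed field size (a prime), so arithmetic is done mod P
N = 2  # assumed number of variables

# Exponent vectors of all monomials x^a * y^b with 0 <= a, b < P.
EXPONENTS = list(product(range(P), repeat=N))


def evaluate(coeffs, point):
    """Evaluate the polynomial with the given coefficients at a point, mod P."""
    total = 0
    for c, exps in zip(coeffs, EXPONENTS):
        term = c
        for x, e in zip(point, exps):
            term *= pow(x, e, P)  # pow(0, 0, P) is 1, as needed for constants
        total += term
    return total % P


def count_consistent_models(data):
    """Count coefficient vectors whose polynomial matches every observation."""
    count = 0
    for coeffs in product(range(P), repeat=len(EXPONENTS)):
        if all(evaluate(coeffs, pt) == out for pt, out in data.items()):
            count += 1
    return count


def target(point):
    """Hypothetical 'true' model used only to generate example data."""
    x, y = point
    return (x * y + 2 * x + 1) % P


all_points = list(product(range(P), repeat=N))
partial = {pt: target(pt) for pt in all_points[:7]}  # 7 of the 9 inputs
full = {pt: target(pt) for pt in all_points}         # all 9 inputs

print(count_consistent_models(partial))  # 9 models fit the partial data
print(count_consistent_models(full))     # 1 model fits the full data
```

With no restriction on the model class, the data single out one polynomial only when every input has been observed. Restricting the class, for example by fixing which variables or monomials may appear, is one way in which fewer, well-chosen data points could suffice.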

Agency: National Science Foundation (NSF)
Institute: Division of Mathematical Sciences (DMS)
Type: Standard Grant (Standard)
Application #: 1419023
Program Officer: Leland Jameson
Project Start:
Project End:
Budget Start: 2015-01-01
Budget End: 2017-12-31
Support Year:
Fiscal Year: 2014
Total Cost: $100,000
Indirect Cost:

Name: Southern Methodist University
Department:
Type:
DUNS #:
City: Dallas
State: TX
Country: United States
Zip Code: 75275