Across science, engineering, medicine, and business, we face a deluge of data coming from sensors, from simulations, or from the activities of myriads of individuals on the Internet. Furthermore, the data sets we collect are frequently highly inter-correlated, reflecting information about the same or related entities in the world, or echoing semantically important repetitions, symmetries, or hierarchical structures common to both man-made and natural objects. This project will assist scientists and engineers working with correlated data sets in getting the most information and value out of their data. Key to the approach is the idea of joint data analysis: the notion that each piece of data is best understood not in isolation but in the context provided by its peers and partners in a collection of related data sets, using the web of relationships referred to above. The central aim is to complement the social networks of scientists and engineers as they exist today with parallel networks that interlink the data on which they base their work, using domain-specific semantic links and mechanisms that allow algorithmic transport of information between data used by scientists working in the same domain. The resulting system amplifies scientific insights by allowing one scientist's observation on one piece of data to be transported automatically to other relevant data sets and aggregated there. It also enables the automated discovery of shared structures or common abstractions that can inform multiple data sets.

To accomplish this joint analysis, the project interconnects data sets into networks along which information can be transported and aggregated. These data set links are based on efficient matching algorithms using domain-specific features. In this setting, the matchings or maps are used not to estimate distances or similarities but to build operators that can transport information between different data sets. The research team will exploit a functional analytic framework that encodes information as functions over the data and leads to linear operators for mapping, enabling the use of many powerful tools from linear algebra and optimization. Drawing inspiration from homological algebra, the team will join multiple related data sets into networks connected through these operators in a way that allows information transport, correction, and aggregation, with the ultimate goal of using the "wisdom of the collection" to provide as much information as possible for specific data sets to specific scientists.
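
The idea of encoding information as functions over the data and moving it with linear operators can be illustrated with a small sketch. This is not the project's implementation: the data sets, their sizes, the matchings, and the helper names (`transport_operator`, `apply_operator`, `compose`) are all hypothetical, and a plain item-to-item matching stands in for whatever domain-specific matching algorithm would be used in practice.

```python
# Minimal sketch, assuming each data set is a small indexed collection of items
# and each link between data sets is a given item-to-item matching (hypothetical).

def transport_operator(match, n_source):
    """Build the linear operator induced by a matching.

    match[j] is the index of the source item matched to target item j.
    The result is a matrix (list of rows) with one row per target item and
    one column per source item, where row j has a 1 in column match[j].
    """
    return [[1.0 if i == match[j] else 0.0 for i in range(n_source)]
            for j in range(len(match))]

def apply_operator(C, f):
    """Matrix-vector product: transport the function f through operator C."""
    return [sum(c * x for c, x in zip(row, f)) for row in C]

def compose(C2, C1):
    """Matrix product C2 * C1: transport through C1 first, then through C2."""
    cols = list(zip(*C1))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in C2]

# Three hypothetical data sets: A has 4 items, B has 3, C has 2.
match_B_in_A = [0, 2, 3]   # B item j corresponds to A item match_B_in_A[j]
match_C_in_B = [1, 2]      # C item k corresponds to B item match_C_in_B[k]

C_AB = transport_operator(match_B_in_A, n_source=4)   # functions on A -> functions on B
C_BC = transport_operator(match_C_in_B, n_source=3)   # functions on B -> functions on C

# An observation on A encoded as a function over its items: item 2 is flagged.
f_A = [0.0, 0.0, 1.0, 0.0]

f_B = apply_operator(C_AB, f_A)                  # the flag as seen on B
f_C = apply_operator(compose(C_BC, C_AB), f_A)   # the flag carried along A -> B -> C
```

Because the links are matrices, following a path through the network is just a matrix product, which is what makes tools from linear algebra and optimization applicable to the whole collection.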

Agency: National Science Foundation (NSF)
Institute: Division of Mathematical Sciences (DMS)
Type: Standard Grant (Standard)
Application #: 1521608
Program Officer: Christopher Stark
Project Start:
Project End:
Budget Start: 2015-09-15
Budget End: 2018-08-31
Support Year:
Fiscal Year: 2015
Total Cost: $140,000
Indirect Cost:
Name: Stanford University
Department:
Type:
DUNS #:
City: Stanford
State: CA
Country: United States
Zip Code: 94305