Big Data often results from multiple sources, yielding collections that contain multiple, often partial, "views" of the same object, space, or phenomenon from different observers. Extracting information robustly from such data calls for a joint analysis of the entire collection of data sets. The project is developing a novel geometric framework for modeling, structure detection, and information extraction from a collection of large related data sets, with an emphasis on the relationships between data. While this approach clearly applies to data with an explicit geometric character (e.g., objects in images), the work is also applied to data sets as diverse as computer networks (identifying common structure in subnets) and Massive Open Online Course homework data (automatically carrying grader annotations to similar problems in other students' homework).
The novel framework is based on the construction of maps between the objects under consideration (point clouds, graphs, images, etc.), and on the analysis of the networks of maps that result as a way of extracting information, generating latent models for the data, and transporting or inferring functional/semantic information. These tasks define a new field of map processing between data sets and require new tool sets drawing on ideas from functional analysis, non-convex optimization, and homological algebra in mathematics, and from geometric algorithms, machine learning, optimization, and approximation algorithms in computer science. Sophisticated algorithmic techniques for attacking the large-scale non-linear optimization problems that emerge within the framework will also be investigated.
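One way to make the idea of analyzing a network of maps concrete is cycle consistency: composing maps around a closed loop in the network should return each element to itself, and deviations flag unreliable correspondences. The following is a minimal illustrative sketch, not the project's actual algorithm; the toy data sets, map names, and the dictionary representation of maps are all hypothetical.

```python
# Hypothetical sketch: maps between data sets represented as dicts,
# with cycle consistency used to score a loop of maps in the network.

def compose(f, g):
    """Return the map (g after f): first apply f, then g."""
    return {x: g[f[x]] for x in f}

def cycle_consistency(maps, elements):
    """Fraction of elements mapped back to themselves when the
    maps are composed in order around a closed cycle."""
    cycle = maps[0]
    for m in maps[1:]:
        cycle = compose(cycle, m)
    fixed = sum(1 for x in elements if cycle[x] == x)
    return fixed / len(elements)

# Three toy data sets A, B, C (elements 0, 1, 2) with maps
# A->B, B->C, C->A forming a cycle in the map network.
f_ab = {0: 1, 1: 2, 2: 0}
f_bc = {0: 2, 1: 0, 2: 1}
f_ca = {0: 0, 1: 1, 2: 2}   # consistent closing map
f_ca_bad = {0: 1, 1: 0, 2: 2}  # corrupted correspondence

print(cycle_consistency([f_ab, f_bc, f_ca], [0, 1, 2]))      # 1.0
print(cycle_consistency([f_ab, f_bc, f_ca_bad], [0, 1, 2]))  # ~0.33
```

In the project's setting the maps are far richer (correspondences between point clouds, graphs, or images) and the consistency of many overlapping cycles is optimized jointly, but the same principle lets the network of maps itself certify or repair individual maps.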