It is surprising how often geometric abstractions help us understand large systems: molecules become balls and sticks, complex fluid or combustion simulations are shown as contours or isosurfaces, and movies become points in a high-dimensional space so that recommendations can be based on which other points lie near one's favorite movies. Computational Geometry, which develops efficient computer algorithms for problems stated in geometric terms, can thus play a central role in data analytics. Traditionally, the focus in Computational Geometry has been on exact algorithms with guaranteed performance on all possible inputs, including worst-case inputs.

This project recognizes that many practical data analysis tasks do not generate worst-case instances. It seeks to identify structural aspects of given problems that allow existing or new algorithms to achieve better guarantees on realistic inputs than the worst-case bounds, often through approximation, probabilistic analysis, parameterized complexity, or output sensitivity.
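To make one of these notions concrete: an output-sensitive algorithm's cost scales with the size of the answer rather than with a single worst-case bound over all inputs. A classic example (used here purely as an illustration, not a result of this project) is Jarvis's gift-wrapping construction of the planar convex hull, which runs in O(nh) time for n input points and h hull vertices. A minimal Python sketch, assuming points in general position:

```python
def cross(o, a, b):
    """Signed area of triangle (o, a, b): positive for a counterclockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def jarvis_march(pts):
    """Output-sensitive 2-D convex hull (gift wrapping), O(n*h) time.

    n = number of input points, h = number of hull vertices, so the
    running time adapts to the output size rather than to a single
    worst-case bound.  Assumes points in general position.
    """
    start = min(range(len(pts)), key=lambda i: pts[i])  # leftmost point is on the hull
    hull = [start]
    while True:
        p = hull[-1]
        q = (p + 1) % len(pts)  # any candidate other than p
        for r in range(len(pts)):
            # If r lies clockwise of the segment p -> q, wrap to r instead.
            if r != p and cross(pts[p], pts[q], pts[r]) < 0:
                q = r
        if q == start:  # wrapped all the way around
            break
        hull.append(q)
    return [pts[i] for i in hull]

print(jarvis_march([(0, 0), (2, 0), (1, 1), (2, 2), (0, 2)]))
# [(0, 0), (2, 0), (2, 2), (0, 2)] -- the interior point (1, 1) never appears
```

When the hull is small (h much less than n), this beats the Theta(n log n) worst-case-optimal hull algorithms; that kind of instance-dependent guarantee is exactly what the project pursues.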

Understanding the huge volume of data from a combustion simulation run on a supercomputer gives a three-dimensional example: contour trees, a data structure that summarizes interactions between density or temperature isosurfaces in a simulation, take more than linear time to compute in the worst case, but by parameterizing on tree shape one can show that balanced trees can be computed in linear time. Machine learning and clustering problems, such as recommendation systems, give higher-dimensional examples in which one wants to extract a smaller, lower-dimensional representation of the input while preserving some feature of interest. A geometric form of this problem is known as extracting a coreset; in the worst case the coreset size can be exponential in the dimension. On real inputs, however, there is often hidden low-dimensional structure; rather than designing an algorithm whose running time depends on the worst-case coreset size, the running time should adapt to the size required by the given instance.
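To illustrate instance-adaptive coreset size, here is a minimal Python sketch. The function name adaptive_coreset, the threshold eps, and the greedy farthest-point rule (in the style of Gonzalez's classic k-center heuristic) are illustrative assumptions, not the project's method. The sketch keeps sampling the point farthest from the current sample until every input point is within eps of some sample, so the number of points retained is governed by the instance's intrinsic, rather than ambient, dimension:

```python
import numpy as np

def adaptive_coreset(points, eps):
    """Greedy farthest-point sampling (Gonzalez-style sketch).

    Repeatedly adds the point farthest from the current sample until
    every input point lies within eps of some sampled point.  The
    sample size is not fixed in advance: inputs with hidden
    low-dimensional structure are covered by far fewer points than
    the exponential-in-ambient-dimension worst case.
    """
    chosen = [0]  # start from an arbitrary point
    dist = np.linalg.norm(points - points[0], axis=1)  # distance to nearest sample
    while dist.max() > eps:
        far = int(dist.argmax())  # farthest still-uncovered point
        chosen.append(far)
        dist = np.minimum(dist, np.linalg.norm(points - points[far], axis=1))
    return points[chosen]

# Toy instance: 10,000 points near a 1-D curve embedded in R^50.
rng = np.random.default_rng(0)
t = rng.random(10_000)
circle = np.stack([np.cos(2 * np.pi * t), np.sin(2 * np.pi * t)], axis=1)
cloud = np.hstack([circle, 0.001 * rng.standard_normal((10_000, 48))])
print(len(adaptive_coreset(cloud, eps=0.1)))  # a few dozen points, despite ambient dimension 50
```

The running time of this sketch depends on the number of points it selects, so on inputs like the toy instance above it does far less work than a worst-case bound would predict.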

Advancing non-worst-case analysis techniques helps bridge the gap between theory and practice, as there is often a disconnect between the running times predicted by worst-case analysis and those observed on real data sets. The investigator will incorporate non-worst-case analysis techniques into his course curricula, as such techniques are essential yet largely absent from standard algorithms courses. This project will also support student research on this topic at both the graduate and undergraduate levels.

Agency: National Science Foundation (NSF)
Institute: Division of Computing and Communication Foundations (CCF)
Type: Standard Grant (Standard)
Application #: 1566137
Program Officer: Joseph Maurice Rojas
Budget Start: 2016-05-01
Budget End: 2019-10-31
Fiscal Year: 2015
Total Cost: $161,277
Name: University of Texas at Dallas
City: Richardson
State: TX
Country: United States
Zip Code: 75080