It is surprising how often geometric abstractions help us understand large systems: molecules become balls and sticks, complex fluid or combustion simulations are shown as contours or isosurfaces, and movies become points in a high-dimensional space so that recommendations can be based on which other points lie near one's favorite movies. Computational Geometry, which develops efficient computer algorithms for problems stated in geometric terms, can thus play a central role in data analytics. Traditionally, the focus in Computational Geometry has been on exact algorithms with guaranteed performance on all possible inputs, including worst-case inputs.
This project recognizes that many practical data analysis tasks do not generate worst-case instances. It seeks to identify structural aspects of given problems that allow existing or new algorithms to achieve guarantees better than worst-case bounds on realistic inputs, often by using approximation, probabilistic analysis, parameterized complexity, or output sensitivity.
Understanding the huge volume of data from a combustion simulation run on a supercomputer provides a three-dimensional example: the contour tree, a data structure that summarizes interactions between density or temperature isosurfaces in a simulation, takes more than linear time to compute in the worst case, but by parameterizing on tree shape one can show that balanced trees can be computed in linear time. Machine learning and clustering problems, such as recommendation systems, provide higher-dimensional examples in which one wishes to extract a smaller, lower-dimensional representation of the input while preserving some feature of interest. A geometric form of this problem is known as extracting a coreset; in the worst case the coreset size can be exponential in the dimension. Real inputs, however, often have hidden low-dimensional structure; rather than designing an algorithm whose running time depends on the worst-case coreset size, the running time should adapt to the size required by the given instance. Both ideas are illustrated by the sketches below.
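To make the first example concrete, the following is a minimal sketch of computing the join tree (one half of the contour tree) of a scalar field on a graph, sweeping vertices from high to low value with union-find. This is a standard construction, not the project's parameterized algorithm; the function name, the adjacency-list input format, and the assumption of distinct values are ours for illustration.

```python
def join_tree(values, adj):
    """Sketch of a join-tree computation (not the project's algorithm).
    values: scalar value per vertex (assumed distinct); adj: adjacency lists.
    Returns edges (u, v): the component whose latest node is u merges at v."""
    n = len(values)
    parent = list(range(n))                 # union-find parent pointers

    def find(x):                            # find root with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    order = sorted(range(n), key=lambda v: -values[v])  # sweep high to low
    latest = {}                             # component root -> last node added
    edges = []
    for v in order:
        for u in adj[v]:
            if values[u] > values[v]:       # neighbor already swept
                ru = find(u)
                if ru != find(v):           # two components join at v
                    edges.append((latest[ru], v))
                    parent[ru] = find(v)    # union into v's component
        latest[find(v)] = v
    return edges

# Tiny usage example: a path 0-1-2 with a valley at vertex 1.
print(join_tree([3, 1, 2], [[1], [0, 2], [1]]))  # [(0, 1), (2, 1)]
```

The sweep touches each edge a constant number of times, so the cost is dominated by the initial sort; the non-worst-case question mentioned above is when this can be improved to linear time, for instance for balanced tree shapes.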
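For the coreset example, here is a minimal sketch of an instance-adaptive construction via a Gonzalez-style farthest-first traversal: points are added until every input point lies within distance eps of the selected set, so both the coreset size and the running time adapt to the structure of the given instance. This illustrates the adaptive idea only and is not the project's construction; `adaptive_coreset`, the `eps` parameter, and the tuple-based point format are assumptions for illustration.

```python
import random

def adaptive_coreset(points, eps):
    """Sketch: greedily select points until all of R^d input is eps-covered.
    points: list of coordinate tuples; returns indices of selected points."""
    def dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

    n = len(points)
    centers = [random.randrange(n)]         # arbitrary first center
    d = [dist(points[i], points[centers[0]]) for i in range(n)]
    while True:
        far = max(range(n), key=lambda i: d[i])  # farthest uncovered point
        if d[far] <= eps:                   # everything eps-covered: stop
            break
        centers.append(far)
        for i in range(n):                  # update distance to nearest center
            d[i] = min(d[i], dist(points[i], points[far]))
    return centers
```

The loop runs once per selected point, so the total work is proportional to n times the output size, an output-sensitive bound in the spirit described above: an instance with hidden low-dimensional structure yields a small selection and a fast run, regardless of the worst-case size.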
Advancing non-worst-case analysis techniques helps bridge the gap between theory and practice, as there is often a disconnect between the running times predicted by worst-case analysis and those observed on real data sets. The investigator will incorporate non-worst-case analysis techniques into his course curricula, as such techniques are essential yet largely absent from standard algorithms courses. The project will also support student research on this topic at both the graduate and undergraduate levels.