This proposal studies statistics of large data sets. Such data sets typically arise in applied and foundational probability theory and are the starting point for questions in (topological) data analysis, network theory, and surface estimation. How does one use a data set to infer properties of a deterministic surface? How does one use a data set to draw conclusions about the objects generating the data? What are the size, dimension, and entropy of the object producing the data? Does the data have `extreme' points and outliers, and if so, how many? What are the geometric properties of random networks that model real-world phenomena? Are the networks connected, and are they `efficient'? Up to now, the study of large data sets has largely assumed spatial independence of the underlying point sets, and even in this setting the questions are as challenging as they are important. We propose to study such models, as well as the more realistic models in which points are not assumed to be independent. These models encompass structures frequently encountered in physics, computer science, and operations research.
Questions arising in stochastic geometry and applied geometric probability are often understood in terms of the behavior of statistics of large random geometric structures, where `large' means that the randomness involves a growing number of random variables. Problems concerning these structures require understanding the behavior of spatially dependent terms having short range interactions but complicated long range dependence. Random geometric structures arise in diverse settings and include these fundamental examples: (i) point processes of dependent points, including those with determinantal, Gibbsian, or Markov random field structure, zeros of Gaussian analytic functions, and zeros of solutions of the stochastic Burgers' equation; (ii) simplicial complexes in topological data analysis; (iii) geometric networks and geometric graphs on random vertex sets, including those arising in data fusion networks and nearest neighbor graphs used in discerning the intrinsic dimension and entropy of data clouds; (iv) random surfaces which consistently estimate a target surface; (v) random polytopes generated by random data, whose properties for a large collection of generating random variables are of interest in convex geometry, average complexity of algorithms, optimization, and extreme statistics; and (vi) spatial birth-growth models and random sequential adsorption models. This proposal studies statistics of the above-mentioned large structures. With large input, one may reasonably draw conclusions about the typical or average behavior of many statistics of interest. This includes finding mean and variance asymptotics as well as central limit theorems.
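As an illustration of what mean and variance asymptotics and a central limit theorem can look like, the display below gives one common formulation. It is a sketch under stated assumptions rather than a result claimed by the proposal: the symbols $H_n$ (a statistic of a structure built on input of size $n$), $\mu$, and $\sigma^2$ are notation introduced here, and the linear scaling in $n$ is an assumption typical of statistics built from short range interactions. Other structures, random polytopes among them, generally call for different normalizations.

```latex
% Illustrative only: H_n, \mu, \sigma^2 and the linear scaling in n
% are assumptions introduced for this sketch, not taken from the proposal.
\[
  \lim_{n \to \infty} \frac{\mathbb{E}\, H_n}{n} = \mu ,
  \qquad
  \lim_{n \to \infty} \frac{\operatorname{Var} H_n}{n} = \sigma^2 ,
\]
\[
  \frac{H_n - \mathbb{E}\, H_n}{\sqrt{\operatorname{Var} H_n}}
  \xrightarrow{\;d\;} N(0,1)
  \quad \text{as } n \to \infty, \text{ provided } \sigma^2 > 0 .
\]
```

Here the first line expresses mean and variance asymptotics, and the second expresses asymptotic normality of the centered and rescaled statistic.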