Large and complex data, increasingly collected across the sciences and by companies, pose novel challenges for statistical analysis. Specific applications that motivate this research come from brain imaging, genomics, and the social sciences. To make sense of such data and extract relevant features, statistical methodology suited to the analysis of large samples of complex random objects is needed. Examples of such objects include networks, distribution functions, and covariance matrices. A key challenge is that common algebraic operations, such as sums or differences, are not defined for these objects. Objects are also often observed repeatedly over time, in which case quantifying their time dynamics is of interest. In this project, statistical methodology that addresses these basic data-analytic needs is developed under minimal assumptions, together with its theoretical foundations and computational implementations. This methodology is expected to lead to new insights by quantifying phenomena such as changes in mortality or income distributions over calendar years, or changes in brain connectivity networks with aging, which may allow researchers to distinguish normal from pathological aging processes. Procedures are also developed to test for significant differences between groups of random objects, for example, to compare mortality distributions across countries and to identify clusters among them. The methodology is based on delicate extensions of basic statistical notions, such as the population and sample mean, variance, regression, and analysis of variance, to more complex spaces of random objects.

Over the past decade, functional data analysis has advanced rapidly, including the development of methods for functional regression. However, this methodology is largely limited to Hilbert-space-valued random variables, such as square-integrable random functions, which restricts its applicability. This research is motivated by the increasing prevalence of examples where random objects do not lie in a Hilbert space. Key objects of interest are distributions, networks, and covariance matrices, as well as general metric-space-valued random objects. Core concepts that will be applied to these random objects, and appropriately extended, include the Fréchet mean, Fréchet variance, and Fréchet regression. For longitudinally observed random objects, the notion of a general Fréchet integral will serve to quantify projections in general spaces, and such projections will be studied for their use in representing time-varying random objects. Because the tools to be developed rely only on distances, they are suitable for general metric-space-valued objects. For special classes of objects, such as distributions, additional characterizations, including manifold representations and Wasserstein covariance, will also be developed and illustrated in applications.
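
As an illustration of how a mean can be defined through distances alone, the sketch below assumes, for illustration only and not as the project's method, that the objects are one-dimensional distributions equipped with the 2-Wasserstein distance. For objects Y_1, ..., Y_n in a metric space with distance d, a Fréchet mean is a minimizer over the space of the average squared distance (1/n) Σ d²(ω, Y_i), and the Fréchet variance is the minimal value of that objective. In this special case, the 2-Wasserstein distance equals the L² distance between quantile functions, so the Fréchet mean is the distribution whose quantile function is the pointwise average of the individual quantile functions. The helper names (`quantile_function`, `wasserstein2`, `frechet_objective`, `wasserstein_frechet_mean`) and the 200-point grid are hypothetical choices, and integrals over (0, 1) are approximated on a midpoint grid.

```python
import numpy as np

# Probability levels at which quantile functions are evaluated
# (midpoint grid on (0, 1); the size 200 is an arbitrary, hypothetical choice).
GRID = (np.arange(200) + 0.5) / 200


def quantile_function(sample):
    """Represent a 1-D distribution by its empirical quantile function on GRID."""
    return np.quantile(np.asarray(sample, dtype=float), GRID)


def wasserstein2(q1, q2):
    """Approximate 2-Wasserstein distance between two 1-D distributions:
    the L2 distance between their quantile functions, averaged over GRID."""
    return float(np.sqrt(np.mean((q1 - q2) ** 2)))


def frechet_objective(center, objects, dist):
    """Average squared distance from `center` to `objects`.
    Only the distance function `dist` is used, so this applies in any metric space."""
    return float(np.mean([dist(center, obj) ** 2 for obj in objects]))


def wasserstein_frechet_mean(quantiles):
    """Fréchet mean in 1-D Wasserstein space: the pointwise average of the
    quantile functions (still nondecreasing, hence a valid quantile function)."""
    return np.mean(np.stack(quantiles), axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Three toy samples from normal distributions with different locations and scales.
    params = [(0.0, 1.0), (2.0, 0.5), (-1.0, 2.0)]
    samples = [rng.normal(loc, scale, size=500) for loc, scale in params]
    qs = [quantile_function(x) for x in samples]
    mean_q = wasserstein_frechet_mean(qs)
    # Fréchet variance: the value of the Fréchet objective at the Fréchet mean.
    print("Frechet variance:", frechet_objective(mean_q, qs, wasserstein2))
```

The objective uses nothing but the distance passed to it, so `frechet_objective` could score candidate centers for networks or covariance matrices as well; only the closed-form minimizer `wasserstein_frechet_mean` is specific to one-dimensional Wasserstein space.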

Agency: National Science Foundation (NSF)
Division: Division of Mathematical Sciences (DMS)
Application #: 1712864
Program Officer: Gabor Szekely
Budget Start: 2017-08-01
Budget End: 2020-07-31
Fiscal Year: 2017
Total Cost: $150,000
Organization: University of California Davis
City: Davis
State: CA
Country: United States
Zip Code: 95618