Big data are increasingly encountered across society and the sciences, and their complexity and size pose novel challenges for statistical analysis. Challenging data analysis tasks that motivate this research originate in brain imaging, genomics, the social sciences, and many other areas of current interest. Making sense of such data and extracting relevant information requires principled statistical methodology suitable for the analysis of large samples of complex data. Examples include networks or age-at-death distributions, for which common algebraic operations such as sums or differences are not defined. In many instances such data objects may also be repeatedly observed over time, and the quantification of their time dynamics is then of great interest. For example, one might be interested in determining whether sudden changes occur and where they are located in time. Statistical methodology will be developed that addresses these data analytic needs, along with theory and efficient computational implementations. This new methodology is expected to lead to substantial new insights. For example, it will be possible to quantify phenomena such as changes in temperature, mortality, or income distributions over calendar years, or changes in brain connectivity networks as a function of age, which will aid in distinguishing normal and pathological brain aging. The new methodology will also make it possible to detect differences between groups of complex data, for example between the mortality distributions of countries, including the identification of clusters. The project also provides research training opportunities for undergraduate and graduate students.

The focus of this research is the development of statistical methods and theory for random objects, i.e., metric space valued random variables, including object-valued functional and longitudinal data. Due to the lack of Euclidean structure, existing methods from high-dimensional and functional data analysis are generally not applicable to metric space valued random objects, which motivates the development of novel approaches. Major lines of inquiry will be regression and change-point models for random objects on the one hand and methods for trajectories of random objects, including complex functional data, on the other. New regression and change-point models to be studied include distributions as predictors; regression models for point processes; inference and single index modeling for Fréchet regression; and change-point analysis for sequences of object data under various scenarios. For object-valued functional data, an emphasis will be the development of time warping models for random objects and of models for longitudinal random objects in various spaces, including the case where the data are only sparsely and irregularly observed in time. Tools and theory for principled statistical analysis of random objects will rely on empirical process theory for M-estimators in metric spaces, U-statistics, and related approaches. These developments will lead to the creation of a toolbox suitable for the analysis of object data, along with associated freely available software.
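A central notion behind such M-estimation in metric spaces is the Fréchet mean, which replaces the ordinary average (undefined when sums of objects are not available) by the minimizer of the summed squared metric distances to the data objects. The toy sketch below, which is illustrative and not part of the project, uses one-dimensional probability distributions as random objects under the 2-Wasserstein metric; for sorted samples of equal size, this metric reduces to a Euclidean distance between empirical quantile functions, and the Fréchet mean is their pointwise average (the Wasserstein barycenter). All names and data here are hypothetical.

```python
import math
import random

random.seed(0)

# Hypothetical toy data: each "random object" is a 1-D distribution,
# represented by an equal-size sorted sample, which serves as a
# discrete approximation of its quantile function.
n = 500
dists = [sorted(random.gauss(mu, 1.0) for _ in range(n))
         for mu in (0.0, 1.0, 2.0)]

def w2(q1, q2):
    """Approximate 2-Wasserstein distance between two sorted samples
    of equal size (L2 distance between empirical quantile functions)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(q1, q2)) / len(q1))

def frechet_cost(candidate, objects):
    """Sum of squared distances: the criterion the Frechet mean minimizes."""
    return sum(w2(candidate, q) ** 2 for q in objects)

# For 1-D distributions under the 2-Wasserstein metric, the Frechet
# mean (Wasserstein barycenter) is the pointwise average of the
# quantile functions, so no numerical optimization is needed here.
barycenter = [sum(vals) / len(vals) for vals in zip(*dists)]

# The barycenter attains a lower Frechet criterion than any data object.
assert frechet_cost(barycenter, dists) <= min(
    frechet_cost(q, dists) for q in dists)
```

In general metric spaces the minimizer has no closed form and must be found by optimization over the space, which is where the empirical process theory for M-estimators mentioned above comes in.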

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Agency: National Science Foundation (NSF)
Institute: Division of Mathematical Sciences (DMS)
Type: Standard Grant
Application #: 2014626
Program Officer: Yong Zeng
Budget Start: 2020-07-01
Budget End: 2023-06-30
Fiscal Year: 2020
Total Cost: $300,000
Name: University of California Davis
City: Davis
State: CA
Country: United States
Zip Code: 95618