Technological advances allow massive data sets to be collected in the study of complex phenomena over time and/or space in many fields. Many of these data sets are sequences of high-dimensional or non-Euclidean measurements, for which change-point analysis is a crucial early step in understanding the data. Segmentation, or offline change-point analysis, divides the data into homogeneous temporal or spatial segments, making subsequent analysis easier; its online counterpart detects changes in sequentially observed data, allowing for real-time anomaly detection. Traditional change-point analyses focus primarily on univariate measurements. There is some literature on multivariate data, but very little on object data. This project considers both offline and online change-point analysis for multivariate and object data, with applications such as the temporal analysis of multiple sensor systems, images, and social networks.

The proposed methods and corresponding theory build on previous work of the PI, which adapts nonparametric graph-based two-sample tests to the segmentation problem. The PI has shown that the graph-based approach scales flexibly to high-dimensional and object data, and allows for a universal analytic permutation p-value approximation that is decoupled from application-specific modeling. Despite this recent development, many challenges remain. This project identifies these challenges, formulates them into approachable frameworks, and develops appropriate methods and theoretical treatments. In particular, this project will (1) study more sensitive distance-based tests of equality of distributions in high-dimensional or non-Euclidean spaces, and adapt them to the change-point testing and estimation problem, resulting in more sensitive and accurate detection of general changes; (2) address methodological and theoretical issues in extending the nonparametric graph-based framework from the offline case to the online scenario; and (3) extend graph-based segmentation and online detection to a circular block permutation framework, enabling them to work for multivariate and object data with weak local dependence.
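
As an illustration of the graph-based idea described above, the following is a minimal sketch only, not the PI's implementation. It assumes a Euclidean minimum spanning tree as the similarity graph, counts the edges that join observations on opposite sides of each candidate split, and calibrates the scan with a plain Monte Carlo permutation null rather than the analytic p-value approximation or the circular block permutation scheme mentioned in the abstract. The function names (`mst_edges`, `cross_counts`, `scan`), the `margin` and `n_perm` parameters, and the synthetic data are all hypothetical.

```python
import math
import random


def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph; returns n-1 edges."""
    n = len(points)
    in_tree = [False] * n
    best_dist = [math.inf] * n
    best_from = [-1] * n
    best_dist[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the closest vertex that is not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]),
                key=best_dist.__getitem__)
        in_tree[u] = True
        if best_from[u] >= 0:
            edges.append((best_from[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best_dist[v]:
                    best_dist[v], best_from[v] = d, u
    return edges


def cross_counts(edges, order, n):
    """R(t) for t = 1..n-1: number of edges joining the first t
    observations to the remaining n-t. order[i] is the 0-based time
    position assigned to observation i."""
    diff = [0] * (n + 1)
    for i, j in edges:
        a, b = sorted((order[i], order[j]))
        diff[a + 1] += 1  # the edge crosses every split t in a+1..b
        diff[b + 1] -= 1
    counts, running = [], 0
    for t in range(1, n):
        running += diff[t]
        counts.append(running)
    return counts


def scan(points, n_perm=200, margin=5, seed=0):
    """Return (estimated change point, max standardized statistic, p-value)."""
    rng = random.Random(seed)
    n = len(points)
    edges = mst_edges(points)
    observed = cross_counts(edges, list(range(n)), n)

    # Permutation null: shuffle the time order, keep the graph fixed.
    perms = []
    for _ in range(n_perm):
        order = list(range(n))
        rng.shuffle(order)
        perms.append(cross_counts(edges, order, n))

    ts = range(margin, n - margin + 1)
    mean = {t: sum(p[t - 1] for p in perms) / n_perm for t in ts}
    sd = {t: max(1e-12, math.sqrt(
        sum((p[t - 1] - mean[t]) ** 2 for p in perms) / (n_perm - 1)))
        for t in ts}

    def max_z(counts):
        # Few cross edges relative to the null suggests a change at t.
        return max(((mean[t] - counts[t - 1]) / sd[t], t) for t in ts)

    z_obs, t_hat = max_z(observed)
    exceed = sum(max_z(p)[0] >= z_obs for p in perms)
    return t_hat, z_obs, (exceed + 1) / (n_perm + 1)


# Synthetic example (hypothetical data): a mean shift after observation 40.
data_rng = random.Random(1)
points = ([[data_rng.gauss(0, 1) for _ in range(5)] for _ in range(40)]
          + [[data_rng.gauss(5, 1) for _ in range(5)] for _ in range(40)])
t_hat, z_obs, p_value = scan(points)
print(t_hat, round(z_obs, 2), round(p_value, 4))
```

Because the statistic depends on the data only through the similarity graph, the same scan applies to object data once a pairwise distance is available; only `mst_edges` would need a different distance function. Reusing the same permutations for both standardization and the p-value keeps the sketch short, at the cost of a small bias that a careful implementation would avoid.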

Agency: National Science Foundation (NSF)
Institute: Division of Mathematical Sciences (DMS)
Type: Standard Grant (Standard)
Application #: 1513653
Program Officer: Gabor Szekely
Project Start:
Project End:
Budget Start: 2015-07-01
Budget End: 2019-06-30
Support Year:
Fiscal Year: 2015
Total Cost: $341,765
Indirect Cost:
Name: University of California Davis
Department:
Type:
DUNS #:
City: Davis
State: CA
Country: United States
Zip Code: 95618