Time series data, whether analyzed offline or online, often contain change points caused by the dynamic behavior of the monitored systems. Identifying change points in offline time series data makes parameter estimation and statistical inference more efficient, because homogeneous observations can be pooled. Detecting change points in online time series data provides timely snapshots of the monitored system and allows for real-time anomaly detection. Despite the importance of this task, methods for detecting change points in large-scale offline and online time series data are scarce. This is because a large number of parameters cannot be estimated accurately from a limited number of observations, and parametric models do not fully capture the many aspects of data dependence. This project will develop new non-parametric change-point detection methods that incorporate both spatial and temporal dependence without imposing restrictive structural assumptions on large-scale time series data. The proposed methods will apply to a wide range of problems, including identifying significant genes associated with certain diseases, studying dynamic functional connectivity in resting-state functional magnetic resonance imaging data, and detecting abrupt events on social networking platforms, such as the dissociation of existing communities or the formation of new ones. This project will integrate research and education by involving students at different levels, including those from underrepresented groups, and by training pre-college and high school teachers to improve their knowledge of statistics through newly developed courses. The developed methods will be disseminated to biomedical and social scientists through interdisciplinary collaborations and the analysis of first-hand datasets.

This project will develop a general factor model framework for the spatial and temporal dependence of large-scale time series data. Building on this framework, the project will provide hypothesis tests and offline change-point estimators for specific parameters, including the population mean and covariance matrix. The proposed methods can be readily modified to combine the advantages of sum-of-squares-norm and max-norm statistics for hypothesis testing. They can also be extended from regular binary segmentation to other popular change-point estimation methods, such as circular binary segmentation and wild binary segmentation. This project will also provide new stopping rules for online change-point detection in large-scale time series data. An explicit expression for the average run length (ARL) will be derived, so that the threshold in a stopping rule can be obtained without running time-consuming Monte Carlo simulations. The proposed research will also derive an upper bound for the expected detection delay (EDD), whose expression makes explicit the impact of data dimensionality and dependence. This project will extend current knowledge about change-point detection. For offline change-point detection, the PI will study whether a change point near the boundary can be estimated in high-dimensional settings. For online change-point detection, the stopping rule based on the sum-of-squares-norm statistic will be compared with the one based on the max-norm statistic through the derived ARLs and EDDs.
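As a rough illustration of the two aggregation schemes mentioned above, the following Python sketch scans a multivariate series for a single change in the mean. It is not the project's method: it uses a textbook CUSUM-type contrast, assumes independent coordinates with unit variance, and ignores the spatial and temporal dependence that the proposed factor model is meant to handle. The function names (`cusum_vector`, `scan`, `sum_of_squares`, `max_abs`) and the simulated data are hypothetical.

```python
import math
import random


def cusum_vector(data, t):
    """Per-coordinate scaled difference of means before and after split t."""
    n, p = len(data), len(data[0])
    scale = math.sqrt(t * (n - t) / n)
    out = []
    for j in range(p):
        left = sum(row[j] for row in data[:t]) / t
        right = sum(row[j] for row in data[t:]) / (n - t)
        out.append(scale * (left - right))
    return out


def sum_of_squares(v):
    """Aggregate across coordinates by the squared Euclidean norm."""
    return sum(x * x for x in v)


def max_abs(v):
    """Aggregate across coordinates by the largest absolute entry."""
    return max(abs(x) for x in v)


def scan(data, aggregate, trim=1):
    """Return (split, statistic) maximizing the aggregated contrast.

    A split t means the first t rows form the left segment. `trim` keeps
    at least that many rows on each side (an assumption of this sketch).
    """
    n = len(data)
    best_t, best_stat = None, float("-inf")
    for t in range(trim, n - trim + 1):
        stat = aggregate(cusum_vector(data, t))
        if stat > best_stat:
            best_t, best_stat = t, stat
    return best_t, best_stat


# Simulated example (assumed setup): 60 time points, 20 coordinates,
# and a mean shift in only the first two coordinates after time 30.
rng = random.Random(0)
series = []
for i in range(60):
    row = [rng.gauss(0.0, 1.0) for _ in range(20)]
    if i >= 30:
        row[0] += 3.0
        row[1] += 3.0
    series.append(row)

print("sum-of-squares estimate:", scan(series, sum_of_squares, trim=5))
print("max-norm estimate:", scan(series, max_abs, trim=5))
```

Regular binary segmentation would apply `scan` recursively to the segments on either side of the estimated split, stopping once the statistic falls below a chosen threshold. Roughly speaking, the sum-of-squares aggregate tends to favor changes spread over many coordinates, while the max aggregate tends to favor changes concentrated in a few.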

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Agency: National Science Foundation (NSF)
Institute: Division of Mathematical Sciences (DMS)
Application #: 1916239
Program Officer: Huixia Wang
Project Start:
Project End:
Budget Start: 2019-09-01
Budget End: 2022-08-31
Support Year:
Fiscal Year: 2019
Total Cost: $67,033
Indirect Cost:
Name: Kent State University
Department:
Type:
DUNS #:
City: Kent
State: OH
Country: United States
Zip Code: 44242