ATD: Detection of Clusters in Distributed Systems of Information under Dependence

Arias-Castro, Ery; Politis, Dimitris; Meyer, David

Abstract

The first part of the proposal deals with detecting correlations. A special case is the problem of testing whether the covariance matrix of a multivariate population is the identity matrix. The investigators will reconsider this classical problem assuming that, under the alternative, only a small fraction of the variables are correlated. The second part of the proposal focuses on determining which correlation structures are possible for Bernoulli random variables. The investigators will study the algebraic constraints on correlations between Bernoulli random variables when the correlation structure is further restricted by homogeneity (stationarity) and isotropy, in the case of long-range dependence. The third part applies resampling techniques to change-point analysis of time series and random fields. This is particularly important in detection situations where the underlying distribution of the data is unknown. A recently introduced method for bootstrapping time series, the linear process bootstrap, holds some promise and appears extendable to random fields.
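
As a small illustration of why the second part is a genuine question: even for just two Bernoulli variables, not every correlation value is attainable, because the joint probability P(X = 1, Y = 1) must respect the Fréchet bounds. The sketch below computes the resulting feasible range. It is a minimal two-variable example only; the function name is hypothetical, and it does not address the homogeneous or isotropic multivariate setting the proposal targets.

```python
import math


def bernoulli_corr_bounds(p, q):
    """Feasible correlation range for X ~ Bernoulli(p) and Y ~ Bernoulli(q).

    Hypothetical helper for illustration; not taken from the proposal.
    """
    if not (0 < p < 1 and 0 < q < 1):
        raise ValueError("p and q must lie strictly between 0 and 1")
    # Frechet bounds on the joint probability r = P(X = 1, Y = 1).
    r_min = max(0.0, p + q - 1.0)
    r_max = min(p, q)
    # Correlation is (r - p*q) divided by the product of standard deviations.
    scale = math.sqrt(p * (1 - p) * q * (1 - q))
    return (r_min - p * q) / scale, (r_max - p * q) / scale


if __name__ == "__main__":
    print(bernoulli_corr_bounds(0.5, 0.5))
    print(bernoulli_corr_bounds(0.1, 0.5))
```

With equal success probabilities of 0.5 the full range from -1 to 1 is attainable, whereas with probabilities 0.1 and 0.5 the correlation is confined to roughly -1/3 to 1/3. Constraints of this kind become considerably more intricate once many variables and a prescribed dependence structure are involved.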

Anomaly detection is central to surveillance, where the first step is to detect the presence of an anomaly. Applications include detecting vehicles that transport hazardous radioactive or bioactive materials, target tracking, and recognizing man-made objects in satellite images. Beyond surveillance, similar detection problems arise in many other areas, such as detecting fires from satellite images, flu outbreaks in an urban area, or tumors in medical imaging. The investigators will focus on detecting unusual dependencies in the data, on modeling such dependencies, and on engineering new ways of calibrating detectors based on carefully designed simulations.

Project Report

Intellectual Merit

Several projects partially supported by this award address a wide array of situations, both real and stylized, involving dependencies (correlations) in data. Such dependencies are common, and understanding, learning, and detecting them is relevant to a number of applications in signal and image processing, as well as in other areas such as environmental monitoring and socio-economic data in the form of (often multiple) time series.

In more detail, one of the problems addressed is that of detecting correlations in data, for example, between several information streams. In a stylized setting (Gaussian assumption), optimal methodology is developed and analyzed. An interesting tradeoff between power and computational feasibility is uncovered and partially addressed.

Another thrust is towards a better understanding of dependencies between categorical variables. In particular, data collected from spatially arranged sensors often have positive dependencies between nearby readings. One project addresses how to quantify associations between two categorical variables in a way that compensates for these spatial dependencies.

Other projects focus on developing tools and theory for the resampling of time series and spatial data. Resampling is a general approach to statistical inference that does not rely on overly restrictive (and often unverifiable) model assumptions; an example of a typically inappropriate assumption is the usual textbook assumption that the data have a Gaussian (bell-shaped) distribution. When the data are observed over different spatial locations and/or over time, the aforementioned difficulty of dependence (i.e., correlation) enters. An additional difficulty arises when these spatial locations and/or time events are unevenly spaced. In such a case of irregularly observed data, the spatial/time locations are typically modeled as the outcome of a random point process; at each of these points, a measurement ("mark") is observed, leading to the notion of a "marked point process". Methodology for the statistical analysis of data from a marked point process is addressed, including the possibility of resampling in this context.

Broader Impact

The various projects partially supported by this award are motivated by data from such diverse fields as signal and image processing, environmental monitoring, and socio-economic analysis. Consequently, these contributions are likely to have an impact on areas such as satellite imaging, ecology, analysis of socio-economic data, health (e.g., monitoring of epidemics), and telecommunications. Impacts on science and technology have deep repercussions on society, and the areas that could potentially be affected by the research supported by this award are numerous and varied, as described above. In practice, this could mean more efficient ways of detecting epidemics or fires from satellite images, or more accurate analyses of socio-economic data, leading to more effective policies and more efficient use of taxpayer money.

The projects supported by this grant had a concrete impact on the education of a younger generation of scientists. Specifically, one postdoc, several graduate (PhD) students, and two undergraduate students were involved in these projects. Two of the PhD students were partially supported by this award.

Funding Agency

Agency: National Science Foundation (NSF)
Institute: Division of Mathematical Sciences (DMS)
Type: Standard Grant (Standard)
Application #: 1120888
Program Officer: Leland Jameson

Project Start
Project End
Budget Start: 2012-03-01
Budget End: 2014-02-28
Support Year
Fiscal Year: 2011
Total Cost: $99,873
Indirect Cost

University of California San Diego, La Jolla, CA, United States
