The project will study and compare various methods for detecting clusters or "hot spots". It will establish rigorous results about the average likelihood ratio statistic, which has recently been claimed, on empirical grounds, to be superior to the scan statistic. The investigator will derive guidelines for deciding when one statistic is preferable to the other. The project will also examine how various approximation schemes affect the performance of the average likelihood ratio statistic, both in terms of power and in terms of computational complexity.
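
To make the two statistics concrete, here is a minimal Python sketch for a one-dimensional sequence. It rests on assumptions not stated in the abstract: independent N(0,1) noise under the null, a cluster that raises the mean on one contiguous interval, and a known elevation mu for the likelihood ratio. The scan takes the maximum standardized interval sum; the average likelihood ratio averages the Gaussian likelihood ratios exp(mu*S - |I|*mu^2/2) over all intervals I. The function names are hypothetical, and the brute-force loops over all O(n^2) intervals deliberately ignore the computational questions the project studies.

```python
import math


def interval_sums(x):
    """Prefix sums, so that the sum over x[i:j] equals prefix[j] - prefix[i]."""
    prefix = [0.0]
    for v in x:
        prefix.append(prefix[-1] + v)
    return prefix


def scan_statistic(x):
    """Maximum over all intervals x[i:j] of the standardized sum S / sqrt(j - i)."""
    p = interval_sums(x)
    n = len(x)
    return max((p[j] - p[i]) / math.sqrt(j - i)
               for i in range(n) for j in range(i + 1, n + 1))


def average_likelihood_ratio(x, mu):
    """Average over all intervals x[i:j] of exp(mu*S - (j - i)*mu^2/2).

    Each term is the likelihood ratio of "mean mu on x[i:j], 0 elsewhere"
    against the N(0,1) null; mu is assumed known in this sketch.
    """
    p = interval_sums(x)
    n = len(x)
    total, count = 0.0, 0
    for i in range(n):
        for j in range(i + 1, n + 1):
            total += math.exp(mu * (p[j] - p[i]) - (j - i) * mu * mu / 2)
            count += 1
    return total / count
```

For x = [0.1, -0.3, 2.5, 2.8, 2.2, -0.1, 0.4], the largest standardized sum comes from x[2:5], so scan_statistic(x) equals 7.5/sqrt(3), about 4.33. The scan keeps only the single best interval, whereas the average likelihood ratio pools evidence from every interval, which is one way to see why the two can differ in power for different kinds of clusters.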

The problem of detecting spatial clusters or "hot spots" has received considerable attention in recent years, owing to important emerging applications in areas such as biosurveillance, the detection of radioactive materials, and the detection of illicit container shipments. Recent empirical findings suggest that the statistic commonly used for these purposes, the scan statistic, is suboptimal and can be improved upon by a different criterion, the average likelihood ratio. This project will perform a rigorous mathematical investigation of this empirical finding and will derive guidelines for deciding in which cases one methodology is preferable to the other.

Project Report

The project investigated several statistical methodologies for detecting rare events, such as evidence of a bioterrorism attack or certain patterns in DNA sequences. The focus was on the theoretical properties of these procedures, namely whether they possess optimal detection properties, and on computational issues, namely whether the procedures can be evaluated quickly, which is essential given the very large data sets they operate on. One finding of the project is that the scan, a popular technique commonly used for this type of problem, does not possess optimal detection power for certain events. However, it was also shown that a simple modification of the scan leads to optimal detection. The project then investigated another detection methodology, the average likelihood ratio. It was shown that this methodology is also not optimal, but that the shortcoming arises for different types of events than in the case of the scan. Interestingly, optimality of the average likelihood ratio can be restored in a way that also yields a computationally efficient procedure, i.e. one that can be computed in a running time known to be essentially optimal. The project also investigated computationally optimal procedures for the scan. Unlike for the average likelihood ratio, optimal detection and optimal computation turn out not to be intrinsically linked for the scan, so they have to be addressed separately. The results of the project show how to do this, and the resulting methods are straightforward and easy to implement.
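
The report mentions both a simple modification of the scan and the separate treatment of computation. As an illustration only, the Python sketch below assumes one modification discussed in the scan-statistics literature, subtracting a length-dependent penalty sqrt(2 log(e n / |I|)) from each standardized interval sum, and evaluates it on a hypothetical sparse grid of intervals (dyadic lengths, left endpoints spaced by length // d). Neither the penalty nor the grid is claimed to be the project's exact construction; the functions and the parameter d are invented for this example.

```python
import math


def penalized_scan(x, intervals):
    """Maximum over the given (i, j) of S/sqrt(len) - sqrt(2*log(e*n/len)).

    The length penalty is an ASSUMED form, used here for illustration; it
    down-weights short intervals, of which there are many more.
    """
    n = len(x)
    prefix = [0.0]
    for v in x:
        prefix.append(prefix[-1] + v)
    best = -math.inf
    for i, j in intervals:
        length = j - i
        z = (prefix[j] - prefix[i]) / math.sqrt(length)
        best = max(best, z - math.sqrt(2 * math.log(math.e * n / length)))
    return best


def all_intervals(n):
    """Every interval x[i:j] with 0 <= i < j <= n: n*(n+1)/2 of them."""
    return [(i, j) for i in range(n) for j in range(i + 1, n + 1)]


def dyadic_grid(n, d=4):
    """HYPOTHETICAL sparse approximating set: lengths 1, 2, 4, ... up to n,
    with left endpoints spaced by max(1, length // d)."""
    out = []
    length = 1
    while length <= n:
        step = max(1, length // d)
        for i in range(0, n - length + 1, step):
            out.append((i, i + length))
        length *= 2
    return out
```

For n = 1024 this grid contains 4,064 intervals, compared with 524,800 in the full set. Because the grid is a subset of all intervals, the penalized scan computed on it can never exceed the full-set value; how much detection power such a coarsening gives up in exchange for speed is the kind of trade-off the project examines.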

Agency: National Science Foundation (NSF)
Institute: Division of Mathematical Sciences (DMS)
Application #: 1007722
Program Officer: Gabor Szekely
Budget Start: 2010-07-01
Budget End: 2014-06-30
Fiscal Year: 2010
Total Cost: $256,601
Organization: Stanford University
City: Stanford
State: CA
Country: United States
Zip Code: 94305