While modern technologies generate multiple, interconnected sources of data that can be observed in nearly real time, physical constraints and budget limitations require efficient sampling of these data streams. Incorporating such constraints into the design of threat detection systems has the potential to yield significant savings in resources. The goal of this project is to develop fundamental statistical theory and methods for the efficient, real-time sampling of networks and for the subsequent detection, identification, and prevention of threats of various kinds, such as terrorist activities and credit fraud. The developed methodologies will be tested on real-world data, in which the underlying community structure in times of crisis will be discovered using cell-phone call records. The theoretical knowledge gained in this project will be incorporated into graduate-level courses that cover adaptive experimental design and sequential detection. Two graduate students will contribute significantly to this research. The project will make every effort to include qualified students from underrepresented groups in these research activities.
This research will address two fundamental questions: 1) how to detect and identify, in real time, anomalous clusters in network data subject to sampling constraints, and 2) how to efficiently allocate limited resources in order to delay or prevent the realization of threats from the identified anomalous clusters. The mathematical formulation of these questions leads to novel problems in network-based adaptive experimental design and sequential detection, whose solutions require the creative combination of tools from various fields, such as statistical inference, sequential analysis, and information theory. The theory and methods developed in this work will guide the development of threat detection algorithms and will be tested in concrete applications, such as social networks in times of crisis, which will be discovered from cell-phone call data. Overall, this is a multidisciplinary proposal, spanning the social sciences, statistics, and engineering, whose goal is to obtain an arsenal of efficient network sampling schemes and novel threat detection algorithms, grounded in a strong theoretical foundation.