Sampled traffic data is increasingly used as input to anomaly
detection systems, because high link speeds make it infeasible to
examine every packet. This raises an important question: does
sampling degrade the accuracy and effectiveness of anomaly
detection, and if so, how can this effect be mitigated?
Intellectual Merit: This project systematically studies this question
from three angles, illustrated with brief sketches below. First, we
will identify traffic features that are critical to a wide range of
anomaly detection schemes and quantify how much various sampling
schemes distort them. Second, we will design new sampling or
measurement techniques that preserve enough accuracy to support
effective anomaly detection while remaining cost-effective and
lightweight. Third, we will study how to correlate the NetFlow
samples obtained at edge routers with the information-rich summaries
produced by existing data streaming algorithms, yielding much better
anomaly detection than pure sampling. The new scientific knowledge
gained through this research will provide much better technologies
for monitoring large, high-speed networks for anomalous behavior.
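To make the first angle concrete, the sketch below is a purely
illustrative example, not part of the proposed work: the synthetic
traffic mix, the 1-in-100 sampling rate, and the choice of flow-size
entropy as the feature are all our own assumptions. It shows one way
uniform packet sampling can distort a feature that anomaly detectors
rely on: most small flows disappear entirely, reshaping the flow-size
distribution and its entropy.

```python
import math
import random

random.seed(0)

def histogram(sizes):
    """Histogram of flow sizes: size -> number of flows of that size."""
    h = {}
    for s in sizes:
        h[s] = h.get(s, 0) + 1
    return h

def entropy(hist):
    """Shannon entropy (bits) of a flow-size histogram."""
    total = sum(hist.values())
    return -sum((c / total) * math.log2(c / total) for c in hist.values())

# Synthetic traffic: many small ("mice") flows plus a few large
# ("elephant") flows; real traffic is similarly heavy-tailed.
flow_sizes = [random.randint(1, 5) for _ in range(10_000)] + [50_000] * 10

SAMPLE_RATE = 0.01  # hypothetical 1-in-100 packet sampling, as sampled
                    # NetFlow might use on a high-speed link

# Each packet of a flow survives sampling independently with
# probability SAMPLE_RATE; flows with no sampled packet vanish.
sampled = [sum(1 for _ in range(size) if random.random() < SAMPLE_RATE)
           for size in flow_sizes]
sampled = [s for s in sampled if s > 0]

print(f"flows observed:    {len(flow_sizes)} -> {len(sampled)}")
print(f"flow-size entropy: {entropy(histogram(flow_sizes)):.2f} -> "
      f"{entropy(histogram(sampled)):.2f} bits")
```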
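Similarly, for the third angle, the following minimal Count-Min sketch
is a hedged illustration of one well-known data streaming algorithm;
the class name, table dimensions, and flow keys are hypothetical, and
the proposal does not prescribe this particular algorithm. It shows
how such algorithms summarize per-flow packet counts in small, fixed
memory, producing the kind of information-rich data mentioned above.

```python
import hashlib

class CountMinSketch:
    """Fixed-memory summary of per-key counts with one-sided error."""

    def __init__(self, width=1024, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, row, key):
        # Derive an independent hash per row from a keyed digest.
        h = hashlib.sha256(f"{row}:{key}".encode()).digest()
        return int.from_bytes(h[:8], "big") % self.width

    def update(self, key, count=1):
        # Increment one counter per row; collisions only inflate counts.
        for row in range(self.depth):
            self.table[row][self._index(row, key)] += count

    def estimate(self, key):
        # The minimum over rows is an over-estimate of the true count.
        return min(self.table[row][self._index(row, key)]
                   for row in range(self.depth))

# Usage: record every packet's flow key (e.g., a 5-tuple) on the fast
# path, then query suspected heavy hitters afterwards.
cms = CountMinSketch()
for pkt_flow in ["10.0.0.1->10.0.0.2"] * 500 + ["10.0.0.3->10.0.0.4"] * 3:
    cms.update(pkt_flow)
print(cms.estimate("10.0.0.1->10.0.0.2"))  # >= 500, typically exactly 500
```

Because every packet updates the sketch, its counts are exact up to
hash collisions, whereas sampled NetFlow sees only a random subset of
packets; correlating the two sources is precisely the opportunity the
third angle targets.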
Broader Impact: The results will be broadly disseminated through
publications, invited talks and tutorials, and the open-sourcing of
software developed for this project. The PIs' collaboration with
tier-1 ISPs will facilitate the transfer of technology from the
research environment to the management of production networks.
Research results will be incorporated into information security
curricula. Both PIs have been actively engaging underrepresented
groups in research and higher education and will continue and expand
these efforts.