This project develops the Datapository, a shared network measurement analysis and storage infrastructure that aims to provide a common platform of data analysis and management tools. The instrument serves as a research platform for creating a larger-scale, publicly accessible measurement analysis and storage infrastructure. Collecting and analyzing data from real deployments, and running experiments driven by such data, is a critical challenge for the networking community. By building a shared infrastructure from off-the-shelf components, this work aims to reduce the substantial administration time and cost of managing the large amounts of data that researchers need, and consequently to facilitate the following research efforts:
- Creating Internet-scale forensic analysis architectures,
- Understanding and improving Internet routing,
- Designing and evaluating highly available network architectures,
- Evaluating novel data transfer architectures,
- Testing worm and intrusion detection algorithms on large network trace collections, and
- Enabling several educational outreach projects.
These efforts face significant challenges in data management, organization, and analysis, and they require substantial hardware and software infrastructure to store and analyze terabytes of network measurement data. The Datapository effort includes database configuration and setup, schema optimization, data organization and classification, data distribution, hardware and operating system configuration, and the creation of code to perform basic filtering and processing of the data. The measurement and analysis infrastructure will serve as the base prototype for the development of a large-scale, publicly accessible network data repository.