Scientific network research relies heavily on sound, empirical analysis of real-world network traffic. It is often not possible to robustly validate a proposed mechanism, enhancement, or new service without understanding how it will interact with real networks and real users. Yet obtaining the necessary raw measurement data, in particular packet traces including payload, can prove exceedingly difficult. Simply put, the lack of public access to current, representative datasets significantly hinders the progress of our scientific field. Not having appropriate traces for a study can stall the most promising research. There have been extensive efforts by the community at large to change the status quo by providing collections of public network traces. However, the community's major push to encourage institutions to release anonymized data has achieved only very limited success. The risks involved with any release still outweigh the potential benefits in almost all environments. The lack of significant progress in this direction, despite extensive efforts, is an undeniable indication that the community needs a new approach. Towards this end, the PIs are systematically developing a scheme that has been used informally numerous times over two decades of network research: rather than bringing the data to the experimenter, bring the experiment to the data. Past studies have packaged up an analysis for execution by somebody external who had the privileges to access network traffic out of our reach. These people crunched the traffic with our scripts and then manually verified that the output did not leak any sensitive information before passing it on to us. The PIs are establishing such mediated trace analysis as a standard approach to empirical network research. The aim is to formalize the process sufficiently that researchers can tap into a potentially broad pool of providers willing to mediate access to traces for research studies.
Several large-scale network environments have already confirmed to us that they consider this model a feasible approach, and are willing to participate. The main challenge to overcome is the burden the process imposes on trace providers and on the research "development cycle". The basic tenet is that it is possible to greatly improve many of the tedious mediation steps by devising a systematic framework that accounts for the legitimate concerns of providers while reducing their effort to such a degree that it becomes practical for them to provide mediated trace analysis on a routine basis.
The key challenge is to automate the common steps of the mediation process without compromising the core requirement of the trace provider maintaining thorough control over the process. Starting with carefully examining the threats that arise, the PIs are devising a formal framework for trace mediations that will include a computational model specifically tailored to the process's unique requirements, along with a powerful suite of tools to provide extensive support for the different elements of the undertaking.
The mediation approach has the potential to broadly improve how scientists tackle network measurement studies, both opening up access to a far greater range of empirical data than is currently viable, and instilling a greater degree of scientific rigor into the process of conducting such research. By making empirical data available to many more teams of researchers than occurs today, this work will significantly broaden efforts within the field.