CRI-CI-ADDO-EN: National File System Trace Repository

One of the most effective ways to study computer file systems is through the use of traces, which are records of the actual operations that a computer performed on a hard disk or a set of files. For example, a trace might show that a user launched a word processor on a document, saved it three times over a period of 20 minutes, used a Web browser to retrieve an image from the Internet, inserted that image into the document, and saved it a fourth time. By analyzing traces of this sort, researchers can determine how to design file systems to provide users with optimal performance and reliability. In addition, researchers can replay a trace to reproduce the activity generated by a live user, allowing them to test new designs without the expense and difficulty of experimenting with real subjects.

However, collecting traces is itself a challenging activity, so once a researcher has acquired a trace, it is desirable to share it with others. Trace sharing also allows researchers in different groups to test their systems under consistent, reproducible conditions, so that competing ideas can be compared and evaluated fairly. Unfortunately, there has never been a reliable source of traces. As a result, researchers have often been forced to test their systems with ad hoc methods, or with outdated traces that are not representative of modern computer systems.

To alleviate these difficulties, we are building a national repository of file system traces. The repository is designed so that it can eventually hold every trace that has ever been collected, both historical and modern, and so that it will be easy for researchers to upload, download, share, analyze, and replay any trace in the collection. A standardized format will make traces easy to manipulate, and software tools for that purpose will be made available to researchers.

In the current project, NSF has provided seed funding to allow Harvey Mudd College and its collaborators to create a prototype of the repository, demonstrate its feasibility, and show that researchers will find it useful.

Agency: National Science Foundation (NSF)
Institute: Division of Computer and Network Systems (CNS)
Type: Standard Grant (Standard)
Application #: 0855238
Program Officer: Krishna Kant
Budget Start: 2009-09-15
Budget End: 2013-08-31
Fiscal Year: 2008
Total Cost: $100,000
Name: Harvey Mudd College
City: Claremont
State: CA
Country: United States
Zip Code: 91711