Large-scale applications distributed across thousands of machines over the wide-area network are difficult to build and maintain. To share data across machines, many applications rely on specialized storage and data transfer tools such as a DHT, scp, or GridFTP. Although file systems have become a common building block for cluster applications, it remains unclear whether they can provide similar benefits to wide-area distributed applications. This research describes a novel wide-area file system, WheelFS, that lets developers of distributed applications use a generic file system interface to store and share application data easily among wide-area machines.
Two new approaches make WheelFS attractive for use by distributed applications. First, WheelFS provides semantic cues that let application developers express the desired tradeoffs among failure resilience, data consistency, and file system performance at the granularity of individual files and directories. Second, WheelFS optimizes wide-area data transfer by writing application data to local disks and reading a cached copy from a nearby machine whenever possible.
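As a rough illustration of what expressing tradeoffs "at the granularity of individual files and directories" can look like, the following Python sketch resolves a per-path policy in which a file inherits settings from its enclosing directories and may override them. It is not the WheelFS interface: the `Policy` fields, the `overrides` table, and the `effective_policy` helper are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass, replace
from pathlib import PurePosixPath


@dataclass(frozen=True)
class Policy:
    # All fields are hypothetical stand-ins for the kinds of tradeoffs
    # named in the text (consistency, failure resilience, performance).
    consistency: str = "strict"
    replicas: int = 3
    prefer_nearby_cache: bool = False


DEFAULT = Policy()


def effective_policy(path, overrides):
    """Resolve the policy for `path`.

    Overrides are applied from the root down to the path itself, so a
    setting on a file wins over one on its directory, which in turn wins
    over one on a more distant ancestor.
    """
    p = PurePosixPath(path)
    # p.parents runs from the nearest ancestor up to the root; reverse it.
    chain = list(p.parents)[::-1] + [p]
    policy = DEFAULT
    for node in chain:
        fields = overrides.get(str(node))
        if fields:
            policy = replace(policy, **fields)
    return policy


# Hypothetical settings: a cache directory relaxes consistency and prefers
# nearby cached copies, while one file inside it demands strict consistency.
overrides = {
    "/cache": {"consistency": "eventual", "prefer_nearby_cache": True},
    "/cache/index.db": {"consistency": "strict"},
}

if __name__ == "__main__":
    for path in ("/cache/page.html", "/cache/index.db", "/data/results.csv"):
        print(path, effective_policy(path, overrides))
```

In this sketch, a directory-level setting covers everything beneath it, and a single file can still opt back into stricter behavior without affecting its siblings.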
This project demonstrates the usefulness of WheelFS through the experience of building a number of distributed applications on top of it: a cooperative web cache, a data-intensive Grid application, a distributed digital library, and a PlanetLab measurement utility.