CSR: Medium: Collaborative Research: Facets: Exploring Semantic Equivalence of Files to Improve Storage Systems

Reiher, Peter

Abstract

The focus of the proposal is on finding semantically equivalent files in an efficient and scalable manner. If two files are equivalent, they are candidates for optimizations that reduce storage costs, increase performance, and generally improve the system. Traditionally, two files are considered equivalent only if they are byte-by-byte identical (byte equivalence). However, this team's preliminary research shows that many other files are essentially equivalent even though the bytes they contain differ. This proposal will investigate how to find such cases and perform optimizations that make use of semantic equivalence rather than byte equivalence.
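The distinction between byte equivalence and semantic equivalence can be illustrated with a minimal sketch. It is not the Facets design, which the abstract does not specify; it assumes, purely for illustration, that the files are JSON documents and that two such documents count as semantically equivalent when they parse to the same data. The function names `byte_digest` and `semantic_digest` and the sample file contents are hypothetical.

```python
import hashlib
import json


def byte_digest(data: bytes) -> str:
    """Digest of the raw bytes; equal digests indicate byte equivalence."""
    return hashlib.sha256(data).hexdigest()


def semantic_digest(data: bytes) -> str:
    """Digest of a canonical form (assumption: the file holds JSON).

    The content is parsed and re-serialized with sorted keys and fixed
    separators, so differences in whitespace and key order disappear.
    """
    canonical = json.dumps(json.loads(data), sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Two hypothetical files holding the same data with different layout.
file_a = b'{"title": "Facets", "year": 2011}'
file_b = b'{\n  "year": 2011,\n  "title": "Facets"\n}\n'

byte_equal = byte_digest(file_a) == byte_digest(file_b)
semantic_equal = semantic_digest(file_a) == semantic_digest(file_b)
print(byte_equal, semantic_equal)  # prints: False True
```

A traditional byte-level comparison treats these two files as unrelated, while the canonical-form comparison groups them. Other file types (images at different fidelities, documents in different formats) would need their own notion of a canonical form, which this sketch does not attempt.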

This project will design and implement a framework, Facets, which explores new capabilities by applying optimizations to files that are essentially transformed versions of each other. Many optimizations and improvements can be applied to semantically equivalent files, including:

- Ensuring that the security of semantically equivalent files is preserved
- Easing backup and maintenance of semantically equivalent files in various formats, fidelities, and versions
- Using semantically equivalent files to improve performance, reliability, and availability
- Regenerating semantically equivalent files to speed up recovery and network transfer
- Selecting which semantically equivalent files to access according to performance or energy constraints

This team's preliminary research shows that only 5% of files on a typical user's machine are original content. The rest are copies of files from elsewhere or various derivatives of original content. While leveraging this observation is not trivial, many significant improvements are possible if one can find these relationships and make proper use of them. These improvements include enhanced security, more efficient backup and restoration, better file caching, more intelligent tradeoffs between performance and power use, and a host of other possibilities.

Broader Impacts: The code and techniques developed will be released in open source form. The team will take steps (such as applying for supplemental REU grants) to involve undergraduates in the research. They will give talks and recruit at Hispanic-serving institutions. Materials and concepts from the research will be incorporated into classes taught by the principal investigators at their institutions.

Funding Agency

Agency: National Science Foundation (NSF)
Institute: Division of Computer and Network Systems (CNS)
Application #: 1065127
Program Officer: Anita J. LaSalle

Budget Start: 2011-08-15
Budget End: 2016-07-31
Fiscal Year: 2010
Total Cost: $349,994

University of California Los Angeles, Los Angeles, CA, United States