This Small Business Innovation Research (SBIR) Phase I project will develop a congruence-based encoding scheme for structured information such as XML documents. It employs a new labeling scheme that assigns numeric labels to information components, such as the elements of an XML document, and these labels fully encode the hierarchical structure of the content. The labels alone are sufficient to determine whether one component is an "ancestor" of another in the hierarchy (a term borrowed from genealogy). The encoding scheme is based on the number-theoretic concept of congruence. The research objectives are (1) to design efficient algorithms that produce optimized labels with low storage overhead and fast processing, (2) to characterize the encoding scheme in relation to alternative methods, and (3) to explore its applications in electronic publishing and digital preservation. The research proceeds in a sequence of steps: algorithm and software design, analysis of storage overhead and processing speed, encoding of different media types, and design of an archival file format. The congruence-based encoding scheme is expected to offer an efficient and elegant solution to many problems in information organization and management.
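
The abstract does not say how the congruence-based labels are constructed. As a rough illustration only, the following Python sketch uses the well-known prime-number labeling technique, which works in the same spirit but is not necessarily the project's method. Each node receives its own prime, its label is that prime times its parent's label, and a node A is a proper ancestor of node D exactly when label(D) ≡ 0 (mod label(A)). The names `Node`, `assign_labels`, and `is_ancestor` are hypothetical.

```python
"""Sketch of a divisibility (congruence) labeling for a document tree.

Assumption: this follows the generic prime-number labeling idea,
not necessarily the scheme developed in this project.
"""
from __future__ import annotations

from dataclasses import dataclass, field
from itertools import count


def primes():
    """Yield 2, 3, 5, 7, ... by trial division (adequate for small trees)."""
    found: list[int] = []
    for n in count(2):
        if all(n % p for p in found if p * p <= n):
            found.append(n)
            yield n


@dataclass
class Node:
    tag: str
    children: list[Node] = field(default_factory=list)
    label: int = 0  # filled in by assign_labels


def assign_labels(root: Node) -> None:
    """Label each node with a fresh prime times its parent's label."""
    gen = primes()
    stack = [(root, 1)]  # (node, parent's label); the root's parent label is 1
    while stack:
        node, parent_label = stack.pop()
        node.label = parent_label * next(gen)
        stack.extend((child, node.label) for child in node.children)


def is_ancestor(a: Node, d: Node) -> bool:
    """True if a is a proper ancestor of d, i.e. d.label ≡ 0 (mod a.label)."""
    return d.label != a.label and d.label % a.label == 0


# Usage: a tiny XML-like tree <book><chapter><title/><para/></chapter><chapter><para/></chapter></book>
book = Node("book", [
    Node("chapter", [Node("title"), Node("para")]),
    Node("chapter", [Node("para")]),
])
assign_labels(book)
ch1, ch2 = book.children
title = ch1.children[0]
print(book.label, ch1.label, title.label)  # 2 14 182
print(is_ancestor(book, title))  # True
print(is_ancestor(ch1, title))   # True
print(is_ancestor(ch2, title))   # False
```

Because all self-primes are distinct, label(A) divides label(D) only when A lies on the path from the root to D, so the ancestor test needs nothing but the two labels. In this sketch each label is a product of every prime on its root-to-node path, so labels grow quickly with depth, which illustrates why storage overhead is a design concern.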

This encoding scheme has many important applications and significant commercial potential. It can be used in back-end services for XML documents, or as a substitute for XML text encoding where storage and processing efficiency are critical. It can be used for digital object packaging, such as packaging for electronic books, and in the construction of media-independent archival formats for digital preservation. In addition, the congruence-based encoding scheme contributes to the understanding of information organization and enables further discoveries in information-based technologies. For example, it can be used to assign class codes in classification systems such as the Library of Congress Classification System and the North American Industry Classification System. Furthermore, it can be used to design identification numbers, such as credit card numbers and currency serial numbers, that contain structured features for identification and security.
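
Existing card numbers already carry one simple congruence-based feature: the Luhn check digit (specified in ISO/IEC 7812-1) is chosen so that a weighted digit sum is ≡ 0 (mod 10). The Python sketch below shows that existing industry check only to illustrate the kind of structural feature an identification number can contain; it is not the project's scheme, and the function names are hypothetical.

```python
def luhn_checksum(number: str) -> int:
    """Return the Luhn weighted digit sum of `number`, modulo 10."""
    total = 0
    # Walk the digits right to left, doubling every second one.
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9  # equivalent to adding the two digits of d
        total += d
    return total % 10


def luhn_valid(number: str) -> bool:
    """A number passes when its weighted sum is congruent to 0 (mod 10)."""
    return number.isdigit() and luhn_checksum(number) == 0


def luhn_check_digit(payload: str) -> str:
    """Digit to append to `payload` so that the result passes the check."""
    # Appending "0" puts the payload digits in their final doubling positions.
    return str((10 - luhn_checksum(payload + "0")) % 10)


print(luhn_valid("79927398713"))       # True
print(luhn_valid("79927398710"))       # False
print(luhn_check_digit("7992739871"))  # 3
```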

Budget Start: 2005-01-01
Budget End: 2005-06-30
Fiscal Year: 2004
Total Cost: $99,649
Name: Rich E Books Company
City: Beaverton
State: OR
Country: United States
Zip Code: 97007