Research is proposed to investigate how copies of a data object in a distributed database system should be placed on the nodes of a network, in order to maximize data availability and reliability and reduce communication cost. This work is an extension of work that has been done by the investigator and colleagues in the more specific cases of ring, tree, and hypercube networks. Although the general problem cannot be efficiently solved for arbitrary networks, many solvable, open sub-problems remain. There are two related questions that must be studied. The first is, "Where should the copies of data objects be placed (i.e., at what sites of the distributed database), in order to maximize the probability of a successful read-only, write-only, or read-write transaction? A ring, tree, hypercube, or other well-structured network, such as a toroidal mesh, can be studied to suggest solutions to the more general problem of optimal placement of copies on an arbitrary network. Another question primary to the solution of the placement problem is, "What is the optimal number of copies of a data item to maximize the probability of a successful transaction or minimize communication cost?" In fact, these two questions must be studied together. This research will explore the more general question of how many copies of a particular data item should be placed on the network and where they should be placed. The research will use combinatorics, probability theory, linear algebra, and database concepts to study these questions, in order to determine how to maximize the availability of data requested by transactions, where the objects requested by the transactions and the sites at which the transactions originate are defined by two (usually different) probability distributions. The two major thrusts of this work will be to utilize the concept of eigenvalues of the adjacency matrix of a graph to study data availability on general networks and to employ the conjecture that well- structured networks (rings,trees, hypercubes, etc.) can be used to suggest heuristics for general networks. The primary applications of the work proposed are in the area of information retrieval, specifically digital libraries and remote sensed data and mobile computing. The solution to these problems will decrease the average time required to access information in a distributed database and increase the probability that a successful transaction occurs when links in the network fail.

Agency
National Science Foundation (NSF)
Institute
Division of Information and Intelligent Systems (IIS)
Application #
9408392
Program Officer
Sanya N. Clark
Project Start
Project End
Budget Start
1994-08-01
Budget End
1995-07-31
Support Year
Fiscal Year
1994
Total Cost
$18,383
Indirect Cost
Name
Indiana University
Department
Type
DUNS #
City
Bloomington
State
IN
Country
United States
Zip Code
47401