This is a collaborative proposal between The University of New Mexico (UNM) and The University of Tennessee-Knoxville (UTK) for the modeling, optimization, and testing of innovative load-balancing strategies in large-scale distributed-computing systems consisting of geographically distant computational elements (CEs).
Intellectual Merit: With the emergence of large-scale distributed-computing systems that utilize a shared communication medium between the CEs, there is a need to understand accurately how delays in information transport affect the functionality and control of these systems. Such distributed systems may include networks of mobile CEs, CEs that are connected through the Internet, or CEs that are distributed over different states or countries (representing various databases, for example). Because of the physical distance between nodes in large-scale distributed systems, communication and load-transfer activity among the nodes is subject to non-negligible random delays. This behavior is unlike what is ordinarily assumed in localized distributed systems, in which the constituent CEs are in close proximity to one another and benefit from a dedicated, fast communication medium. Both intuition and our Monte Carlo simulations indicate that the presence of such communication and load-transfer delays in distributed systems can lead to the failure of traditional load-balancing algorithms. Thus, a delay-inclusive analytical framework is needed for the dynamics of task computing, and it is within such a framework that the development and optimization of delay-inclusive load-balancing policies can be realized.
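The failure mode described above can be illustrated with a deliberately simplified sketch. It is not the Monte Carlo simulation or the queuing framework referred to in this proposal: it is a deterministic toy model with two CEs, no task processing, and a fixed delay, and the function name `simulate` and all of its parameters are hypothetical. Each node compares its current load with a stale view of the other node's load and ships half of the apparent excess, which arrives only after the same delay.

```python
from collections import deque


def simulate(delay, steps=50, initial=(100, 0)):
    """Toy two-node load balancer (illustrative assumption, not the proposed model).

    Returns (total tasks moved, final loads). Tasks are never processed here,
    so `moved` measures only load-transfer traffic.
    """
    load = list(initial)
    # history[0] is each node's load as it was `delay` steps ago.
    history = deque([tuple(initial)] * (delay + 1), maxlen=delay + 1)
    # One pending-arrivals slot per step of transfer delay, indexed by destination.
    in_transit = deque([[0, 0] for _ in range(delay)])
    moved = 0
    for _ in range(steps):
        if delay > 0:
            arriving = in_transit.popleft()
            load[0] += arriving[0]
            load[1] += arriving[1]
        history.append(tuple(load))
        stale = history[0]
        sent = [0, 0]  # sent[j] = tasks dispatched to node j in this step
        for i in (0, 1):
            j = 1 - i
            excess = load[i] - stale[j]  # own load vs. delayed view of the peer
            if excess > 0:
                amount = excess // 2
                load[i] -= amount
                sent[j] += amount
                moved += amount
        if delay > 0:
            in_transit.append(sent)
        else:
            load[0] += sent[0]
            load[1] += sent[1]
    return moved, tuple(load)


if __name__ == "__main__":
    print("no delay:     ", simulate(0))
    print("delay 2 steps:", simulate(2))
```

Without delay, the policy moves 50 tasks once and stops. With a delay of two steps, the sender keeps acting on outdated information and the receiver later ships tasks back, so the same policy moves more than twice as many tasks. This is the kind of wasted transfer activity that a delay-inclusive policy is meant to avoid.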
The objectives of this program are to develop a general analytical framework for modeling the stochastic dynamics of distributed systems subject to delay, and to use that framework to develop load-balancing strategies that mitigate the performance degradation or failure caused by communication and load-transfer delays. The modeling is developed within a novel, regeneration-based queuing framework, and the load-balancing optimization will be carried out by means of statistical learning and stochastic prediction. The load-balancing strategies developed in this program will be tested in a physical distributed-system environment with realistic delays. To this end, a distributed-computing test-bed will be developed and deployed, connecting existing data-searching computers at UTK with a miniature clone system to be developed at UNM.
Broader Impact: The work proposed here is motivated by (but not limited to) practical, pressing issues arising in the current work done by the Co-PIs (at UTK) on The Federal Bureau of Investigation (FBI) National DNA Index System (NDIS) and its Combined DNA Index System (CODIS) software. The projected growth of the NDIS database and of the demand for searches of its contents necessitates migration to a parallel computing platform, and potentially to large-scale distributed systems, in which the database is distributed or duplicated over geographically distant centers that are connected by a bandwidth-limited shared communication medium. The outcomes of this program will benefit not only the above systems but also a broad range of public, private, and government database systems that perform searches over distributed sites. The proposed research will also be used to improve the operation of large-scale virtual laboratories. The Electrical and Computer Engineering Department at UNM has created an efficient platform for the remote control of instruments and simulations over the Internet. This approach is applicable to both industry and distance education, and it is currently operational and accessible to designated users. However, the platform has not been tested with a large number of users in a real, distributed environment. The methodology developed in this research will be applied to enhance the performance of the existing educational platform and to scale up its reach to a large network of institutions worldwide. This activity will also provide substantial training opportunities in state-of-the-art information technology for graduate and undergraduate students.