NeTS: Small: Network Failure Recovery Employing Network Coding

Ayanoglu, Ender

Abstract

Facilities used in long-distance communications networks incorporate mechanisms to overcome failures, however, these result in the use of spare capacity (redundancy) to achieve relatively short restoration times (about 50 ms). Therefore, some providers have implemented proprietary network failure restoration techniques with a smaller degree of redundancy and with a larger restoration time. Restoration from network failures is particularly difficult for networks that employ routers and the Internet Protocol. This research is applying using network coding techniques for hitless (transparent to the user) restoration of services in the networks of future. The research is investigating topology and code design algorithms for actual networks, to understand their performance, to compare restoration time and extra capacity requirements of the new approach with conventional techniques, and to discover new protocols to implement the new technique. In this work, failure recovery design using network coding will be implemented in two steps. In the first step, for a given network, a failure recovery network will be designed using a linear program. In the second step, the needed Galois field size will be determined. The new hitless restoration technology will be compared with conventional failure recovery techniques using disjoint paths in terms of spare capacity, restoration speed, and operation complexity. The new technique will also enable packet loss recovery in the network, as opposed to on an end-to-end basis as performed conventionally. A special implementation for wireless networks is also under development. The successful outcome of this research can have a broad impact on future networks.

Project Report

In this project, a number of algorithms were developed for recovery from link failures in communication networks of arbitrary topology. These algorithms employ diversity coding, or network coding. They do not require any feedback signaling. As a result, they are very fast. We have calculated the speed of recovery with our algorithm to be about 30 microseconds in long distance networks. Conventional algorithms for this goal achieve recovery in tens of milliseconds in the same kind of networks. Therefore, there are about two-three orders of magnitude improvement with our algorithms against conventional techniques. In addition to speed, such algorithms are measured on the basis of spare capacity requirement. This is typically measured in terms of fiber miles and the goal is to achieve small extra capacity. We have shown that although our algorithms result in orders of magnitude improvement in the speed of recovery, they only pose a modest increase in the spare capacity needed. The algorithms developed are as follows. The first is a heuristic algorithm which initially showed that the increase in extra capacity remains limited. The second is an algorithm we call Diversity Coding Tree. This algorithm is based on the principles of Linear Programming (LP). For a given network topology and a traffic demand matrix, it calculates an optimum solution for network failure recovery against link failures. These types of failures occur the most often and therefore our emphasis has been on them. With this algorithm, we can achieve the stated goal of sub-millisecond recovery with only a modest increase in spare capacity. We have also shown that the required synchronization and buffering can be achieved in a simple manner. These two algorithms treat network links as unidirectional. We have also developed a version of the algorithm called Coded Path Protection. With this technique we treat the network link as bidirectional. We begin with an existing solution for Shared Path Protection, which is one of the conventional algorithms. Coded Path Protection uses LP to convert this solution to a diversity coding based solution with only a minimal increase in spare capacity. However, the bidirectional nature of this algorithm results is significant increase in restoration time. Therefore, we concentrated on the improvement of the Diversity Coding Tree algorithm. We made significant improvements. First, we designed a version of the algorithm where coding is performed in nonsystematic manner, in other words, coding is carried on the primary paths as well as protection paths. This reduces the extra capacity requirement. In addition, we used basic techniques from LP implementation to reduce the complexity of the design algorithm. This most recent development enables us to complete the design of large networks by using LP up to the optimum final solution on a regular personal computer within a short time. In the practice of LP, there is a number that the solution calculates with respect to the optimum solution, known as the optimality gap. By using these techniques, we know that the LP solution reaches zero optimality gap. We have published two workshop papers, have published or received acceptance of six conference papers, have received acceptance of a journal paper (available on IEEE Xplore as an early access paper). We are pursuing other journal papers.

Funding Agency

Agency: National Science Foundation (NSF)
Institute: Division of Computer and Network Systems (CNS)
Application #: 0917176
Program Officer: Joseph Lyles

Project Start
Project End
Budget Start: 2009-08-01
Budget End: 2013-07-31
Support Year
Fiscal Year: 2009
Total Cost: $350,000
Indirect Cost

NeTS: Small: Network Failure Recovery Employing Network Coding
Ayanoglu, Ender
University of California Irvine, Irvine, CA, United States

Abstract

Project Report

Funding Agency

Institution

Comments

Recent in Grantomics:

Recently viewed grants:

Recently added grants:

Abstract

Project Report

Funding Agency

Institution

Comments