Distributed systems are collections of computational devices that work together to solve problems. These systems play an increasingly important role in modern society - from data centers serving popular websites to GPU clusters tackling deep scientific problems. A key obstacle in designing these systems is that due to many different unpredictable factors individual devices might behave in a faulty manner. Standard strategies to cope with this possibility assume that only a bounded fraction of the devices might suffer from faults (which can be catastrophic), while the rest execute perfectly. This project, by contrast, explores novel modeling and analysis techniques for studying systems in which every device might be faulty to some degree, but the faults are assumed less severe than the typical definitions. By doing so, it tackles the following fundamental question: when and how is it possible for a collection of locally unreliable devices to work together to reliably solve a global problem? Answers to this question can support the development of more cost effective and robust distributed systems. They will also provide potential insight into coordination problems in nature where the underlying computational "devices" (be they cells, ants, or bees) execute much less precisely than digital processors. In addition to these large scale impacts, this project will also have a local impact in the classroom. The PI will include insights from this work into a course that includes a module on novel ideas in the theory of distributed systems, and the project will help fund a graduate student to work on the topic.

In more detail, the canonical approach to formally modeling a distributed system is to represent the distributed processes as state machines that interact through shared objects or network channels. This project describes local faults (called "computational noise" in the following) as state machine transition functions that might probabilistically deviate from their specification. There are many different ways to instantiate this general idea. This project will investigate two main approaches: one which describes noise by allowing bounded adversarial modifications to the system's state machines, and another which describes noise as injections of random offsets to the underlying multi-dimensional vector of property values describing each device's current configuration. The project will apply these two approaches to two representative and well-studied problems: symmetry breaking on a shared channel and threshold detection with population protocols. The goal is to produce new results of immediate application to existing areas of computer science research, as well as to develop a foundation of models and tools on which a long-term investigation of this topic can be built.

Project Start
Project End
Budget Start
2016-09-01
Budget End
2018-08-31
Support Year
Fiscal Year
2016
Total Cost
$72,322
Indirect Cost
Name
Georgetown University
Department
Type
DUNS #
City
Washington
State
DC
Country
United States
Zip Code
20057