This award is funded under the American Recovery and Reinvestment Act of 2009 (Public Law 111-5).
Parallel, distributed, and Internet-based computing, communication, and information systems are heterogeneous mixtures of machines and networks. They frequently experience degraded performance due to uncertainties, such as unexpected machine failures, changes in system workload, or inaccurate estimates of system parameters. It is important for system performance to be robust against uncertainties. What does it mean for a computer system to be "robust"? How can robustness be described? How does one determine if a claim of robustness is true, or if a system will fail? How can one decide which of two systems is more robust? These are the types of issues we address in this project, with our team of faculty, graduate students, and undergraduate students from Colorado State University and the University of Colorado, and colleagues in industry (DigitalGlobe) and a national laboratory (NCAR, National Center for Atmospheric Research). We are designing models, metrics, mathematical and algorithmic tools, and strategies for (1) deriving system resource management schemes that are robust, and (2) quantifying the probability of meeting performance requirements given uncertainties. We are validating our research by working with DigitalGlobe, which supplies images to Google Maps and Microsoft Virtual Earth, and NCAR, whose research activities include the prediction of severe and catastrophic weather. The robustness concepts being developed have broad applicability, and will significantly contribute to meeting national needs to build and maintain robust information technology infrastructures.