This project aims to develop an information-theoretic approach to communication-constrained statistical learning problems involving multiple learning agents located at the nodes of a large network. This approach will build on the recently introduced coordination paradigm within network information theory, which frames multiterminal problems in terms of the optimal use of communication resources to establish desired statistical correlations among the nodes of a network. The main theoretical goal is to explicitly identify the effect of bandwidth limitations, losses, delays, and lack of central coordination on the performance of statistical learning algorithms over networks. The project will systematically explore the fundamental limits of learning in multiterminal settings and design efficiently implementable and robust coding/decoding schemes. The theory developed under this project will be a novel synthesis of probabilistic techniques from machine learning (such as empirical process theory) with techniques from multiterminal information theory (such as distributed lossy source coding).

As a broader impact, this project will provide key enabling technologies for large-scale, distributed applications of machine learning in such domains as smart grids, health-care informatics, transportation networks, and cybersecurity. Statistical machine learning is emerging as a dominant paradigm for making accurate predictions from empirical observations in the presence of significant model uncertainty. However, most research in this field has taken place in isolation from the realities of complex networks and their attendant limitations on information transmission and processing: it is frequently assumed that the data needed for learning are available instantly, with arbitrary precision, and at a single location. Because the data fed to machine learning algorithms are increasingly generated, exchanged, stored, and processed over large-scale networks, there is a pressing need to dispense with this assumption and take network effects into account. The theory and algorithms developed in this project will ensure that the relevant data are delivered over the network to the right decision-makers, and that the decisions made on the basis of the received information are accurate. The research component of the project is tightly integrated with an education and outreach plan, including the development and teaching of new machine learning courses aimed specifically at engineering students.

Agency: National Science Foundation (NSF)
Institute: Division of Computing and Communication Foundations (CCF)
Application #: 1254041
Program Officer: Phillip Regalia
Budget Start: 2013-02-01
Budget End: 2022-01-31
Fiscal Year: 2012
Total Cost: $518,435
Name: University of Illinois Urbana-Champaign
City: Champaign
State: IL
Country: United States
Zip Code: 61820