In a world increasingly shaped by data-driven machine learning (ML), an emerging challenge is that data are often collected and stored in a distributed manner, across multiple datacenters or devices. At the same time, security and privacy concerns mean there is often little trust between data owners. Federated ML addresses this by enabling ML over distributed data while avoiding the transfer of private data from distributed devices to a central datacenter. Towards the goal of democratizing ML, this project will design and implement new techniques to make federated ML secure and private. Of particular interest are new system designs that enable federated ML on devices with limited computational power or communication bandwidth, e.g., smartphones, smart health monitors, and smartwatches. The ideas, software, and results of this project will directly impact industry and real-world applications. The project also includes curriculum development for federated ML and plans to involve graduate students from underrepresented groups.

This project creates a transformative new direction for federated machine learning (ML) research by enabling ML on devices that are untrusted or resource-limited, across organizations and for users who wish to keep their data private. The work spans theoretical foundations, systems design, implementation, and integration with popular ML software. Concretely, the project tackles three challenges in federated ML. The first is fault-tolerant ML: new techniques to perform ML when workers behave in arbitrarily malicious ways (so-called Byzantine failures). In particular, the project will show that by leveraging the natural noise tolerance of ML, it is possible to tolerate significantly more Byzantine workers than the traditional distributed computing literature would suggest. The second is privacy-preserving ML: algorithms in which workers inject noise to protect the privacy of participants' data while still yielding correct and fast ML at the global level. The third is resource-constrained ML scheduling: new techniques that allow large neural network models to run across multiple devices with limited memory. In addition to developing the algorithmic and theoretical frameworks for these directions, the project will build and release open-source software.
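As a rough illustration of the first two challenges, the minimal sketch below shows a coordinate-wise-median aggregation step that tolerates a minority of Byzantine worker updates, together with Gaussian noise added at each worker as a stand-in for privacy-preserving perturbation. The function names (`perturb_update`, `robust_aggregate`), the noise scale `sigma`, and the median rule itself are illustrative assumptions, not the project's actual algorithms.

```python
import numpy as np

def perturb_update(gradient, sigma=0.1, rng=None):
    """Add Gaussian noise to a worker's update before sharing it.

    Stand-in for the privacy-preserving perturbation the abstract
    alludes to; sigma and the mechanism are illustrative assumptions.
    """
    rng = np.random.default_rng() if rng is None else rng
    return gradient + rng.normal(0.0, sigma, size=gradient.shape)

def robust_aggregate(worker_updates):
    """Coordinate-wise median of worker updates.

    A classic Byzantine-robust aggregation rule: as long as fewer than
    half of the workers send arbitrary (malicious) vectors, each output
    coordinate stays within the range of the honest workers' values.
    """
    return np.median(np.stack(worker_updates, axis=0), axis=0)

# Toy usage: 5 honest workers and 2 Byzantine workers on a 3-dimensional model.
rng = np.random.default_rng(0)
honest = [perturb_update(np.array([1.0, -2.0, 0.5]), rng=rng) for _ in range(5)]
byzantine = [np.array([1e6, -1e6, 1e6]) for _ in range(2)]
aggregated = robust_aggregate(honest + byzantine)
print(aggregated)  # stays close to [1.0, -2.0, 0.5] despite the outliers
```

In this toy example the median ignores the two outlier updates entirely, whereas a plain average would be dominated by them; this is the basic intuition behind Byzantine-tolerant aggregation that the project builds on and extends.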

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Budget Start: 2019-08-01
Budget End: 2022-07-31
Fiscal Year: 2019
Total Cost: $450,000
Name: University of Illinois Urbana-Champaign
City: Champaign
State: IL
Country: United States
Zip Code: 61820