Recent years have witnessed tremendous growth in the need for distributed information processing, where data are collected, stored, and processed in and through a large network of distributed agents. Examples include training better machine intelligence using mobile devices to improve user experiences, monitoring environments using a network of sensors to reduce and mitigate wildfires, and improving the coordination of unmanned aerial vehicles in surveillance. This project embarks on a new framework to address emerging, previously unaddressed challenges in processing ever-growing large-scale datasets in a resource-efficient manner. The investigators will actively recruit and train students with diverse backgrounds, including underrepresented minorities and women in STEM, through long-term mentoring and outreach activities.

This project will substantially advance the algorithmic practice of statistical learning and inference from high-dimensional distributed data by developing resource-efficient distributed statistical inference algorithms in network environments that provably achieve the optimal trade-offs among statistical error, computation, and communication costs, thereby enabling scalable processing of decentralized and heterogeneous data. Calling for a tight integration of high-dimensional statistical inference and large-scale decentralized optimization, this project consists of three major thrusts: (i) develop resource-efficient decentralized statistical estimation algorithms with provable convergence guarantees; (ii) develop systematic treatments to ensure algorithmic convergence even in the presence of highly unbalanced and heterogeneous data, as well as promote diversity for heterogeneous agents using graph regularization; and (iii) promote optimal and adaptive early stopping criteria for decentralized nonparametric estimation. The tools and techniques developed herein will further foster the interplay between a broad range of fields, including high-dimensional statistics, large-scale optimization, statistical signal processing, and machine learning.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Carnegie-Mellon University
United States