One of the grand challenges in Artificial Intelligence (AI) and Machine Learning (ML) is building intelligent systems that can learn from data in real time. To learn from streaming data, there is need for novel approaches in online optimization and prediction. Current methods assume sequential availability of gradients (or loss), posing a practical hurdle in implementation. We propose two approaches to address this gap using model-based learning. These approaches are aimed at respectively exploiting, a distributed computing architecture (to divide the required computational effort) or a communications network (to efficiently aggregate disparate data). The collaborative online optimization algorithms and theoretic extensions introduced in this work have a broad range of applications domains such as speech recognition and computer vision, autonomous vehicles, transportation, neuroscience, and business analytics.
Most of classical ML algorithms have been developed under the assumption that data sets are already available in batch form. Transitioning from offline to online learning faces a major practical hurdle in many application domains where the closed-form of the objective function is unknown to the learner. When dealing with streaming data, this black-box property leads to a natural trade-off between delays (due to data or computation) and the speed and accuracy with which a model can be identified. A distributed computing architecture provides a way to reduce delays to obtain reasonably accurate models in the necessary timescale. We propose to study fast distributed asynchronous stochastic gradient approaches for online learning in which coordination between multiple workers (processors) interacting asynchronously is carefully engineered. Improved accuracy and speed may also be jointly achieved by a network of learners receiving different streams of data. Thus, we also consider decentralized models of online learning with multiple learning agents that communicate over a network. With the ability to share predictions or estimates with other agents in a network, the collective can aggregate disparate information in a way to outperform (in terms of accuracy and speed) any individually identified model. Finally, we consider the case in which data streams have graph structure. Streaming graph structure data arises in diverse application domains such as transportation networks, social networks and other networks found in biology, where the graph captures the correlation in data. The proposal includes the development of a new graduate course aimed at providing engineering students with working knowledge on state-of-the-art distributed online optimization techniques.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.