The goal of this project is to combine advanced tools from multigrid (MG) methods and machine learning (ML) to develop a novel class of numerical techniques targeting the data-intensive applications emerging in the physical, biological, and social sciences. Multigrid methods, including both geometric and algebraic multigrid (GMG and AMG) methods, are effective tools for solving linear as well as nonlinear algebraic systems of equations arising in scientific and engineering computing. At the same time, ML techniques have advanced significantly, especially convolutional neural networks (CNN), which have been applied successfully in many areas such as image classification and processing. The proposed project will explore the resemblances and differences between these two technologies so that more efficient multigrid methods as well as more efficient deep learning models can be developed. The rich existing theory of multigrid methods is expected to shed new light on the theoretical understanding of deep neural networks, whereas the numerous empirical techniques used in the vast and ever-growing deep learning literature can be used to design general multigrid methods with a wider range of applications. This interdisciplinary research project is expected to have a direct impact on both the scientific computing community and the artificial intelligence industry.
More specifically, MG and CNN are similar in their use of a multilevel hierarchy and in many of their technical components, such as smoothers (MG) versus convolutions (CNN) and restriction (MG) versus convolution with stride (CNN). But they also have some major differences: a CNN has multiple channels of convolutions to be trained, whereas MG often has a single smoother given a priori. These relationships motivate the design of new multigrid methods with more general smoothers and restrictions that are subject to training in different ways; as a result, multigrid methods will become more adaptive and robust in their application to different practical problems. The well-understood MG structure and theory can also be adapted to understand and improve existing deep learning models such as residual neural networks. Furthermore, multilevel iterative techniques used in MG will be investigated to speed up the stochastic gradient descent method, which is now the standard training algorithm for most deep neural networks.
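The correspondence between MG components and CNN operations can be made concrete with a minimal sketch. It assumes a 1D Poisson model problem with zero Dirichlet boundaries, which is not taken from the project itself, and all function names (`apply_A`, `smooth`, `restrict`, `prolong`, `v_cycle`) are illustrative. The smoother is written as a convolution with a fixed kernel and the restriction as a convolution followed by a stride of 2. These kernels are fixed a priori here, as in classical MG; they are the quantities that a trainable variant might learn instead.

```python
import numpy as np

def apply_A(u, h):
    """Apply the 1D Laplacian stencil [-1, 2, -1] / h^2 as a convolution.
    Zero padding in mode="same" plays the role of zero Dirichlet boundaries."""
    return np.convolve(u, np.array([-1.0, 2.0, -1.0]), mode="same") / h**2

def smooth(u, f, h, omega=2.0 / 3.0):
    """One weighted-Jacobi sweep: u <- u + omega * D^{-1} (f - A u), with D = 2 / h^2."""
    return u + omega * (h**2 / 2.0) * (f - apply_A(u, h))

def restrict(r):
    """Full-weighting restriction: convolve with [1/4, 1/2, 1/4], then keep
    every second interior point (a convolution with stride 2)."""
    return np.convolve(r, np.array([0.25, 0.5, 0.25]), mode="same")[1::2]

def prolong(e_c, n_fine):
    """Linear interpolation of a coarse-grid correction onto the fine grid."""
    e = np.zeros(n_fine)
    e[1::2] = e_c  # coarse points coincide with odd fine indices
    left = np.concatenate(([0.0], e_c))
    right = np.concatenate((e_c, [0.0]))
    e[0::2] = 0.5 * (left + right)  # in-between points are averaged
    return e

def v_cycle(u, f, h):
    """One V-cycle with one pre- and one post-smoothing sweep.
    Assumes the number of interior points has the form 2^k - 1."""
    n = u.size
    if n <= 3:
        # Coarsest level: assemble the small matrix and solve directly.
        A = (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2
        return np.linalg.solve(A, f)
    u = smooth(u, f, h)                               # pre-smoothing
    r_c = restrict(f - apply_A(u, h))                 # restrict the residual
    e_c = v_cycle(np.zeros_like(r_c), r_c, 2.0 * h)   # coarse-grid correction
    u = u + prolong(e_c, n)                           # interpolate and correct
    return smooth(u, f, h)                            # post-smoothing
```

Calling `v_cycle` repeatedly on a grid with, for example, 63 interior points and `h = 1/64` should drive the residual `f - apply_A(u, h)` toward zero. Replacing the fixed kernels in `smooth` and `restrict` with several learned kernels would move this sketch toward the multi-channel CNN setting described above.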
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.