The objective of this research is to develop a new paradigm of deep learning machines: those with a feedback structure. Feedback brings to a computing node current or past information from neighboring or larger receptive fields of other computing nodes in the same or higher layers, which helps form better local representations or features. Such information is required for processing dynamical data and for maximizing generalization on static data. The approach of this research is to select or design deep and recurrent architectures, develop generative and discriminative learning techniques, and integrate the risk-averting method of convexifying training criteria into the training of recurrent deep learning machines.

Intellectual Merit: Recurrent neural networks are irreplaceable for applications involving dynamical data and are fundamentally better than feedforward networks even on static data. However, the difficulty of training recurrent networks has stifled their development and understanding. The proposed research is expected to help remove this difficulty, bring forth the full power of recurrent neural networks, and boost interest in neural networks in general, which have unfortunately and undeservedly fallen out of favor in recent years.

Broader Impact: Recurrent deep learning machines are powerful for static and dynamical classification and regression, including image and video recognition, analysis, and compression; nonlinear system identification and control; signal processing and filtering; and critical system health/fault monitoring and detection. Therefore, the proposed work will contribute greatly to medical instrumentation, computer/robot/information technology, wireless telecommunication, national defense, and homeland security. Recurrent deep learning machines will become an important component of graduate education in engineering and computer science.

Project Report

A method of convexifying the error criterion was proposed for data fitting but had difficulties with overflow in computer arithmetic and with selection of the risk-sensitivity index λ. The efforts in this project to resolve these difficulties resulted in the following methods, which are believed to have solved the local-minimum problem for all practical purposes:

1. The NRAE method (at a fixed value of λ): The NRAE training method was numerically tested for a large number of values of λ in the range 10?–10¹¹. The success rate of the method is about 50% for λ in the range 10?–10? and increases to about 75% as λ increases to the range 10¹?–10¹¹. The method fails to work when λ exceeds 10¹¹. (A hedged sketch of the NRAE criterion appears after this report.) The method and numerical results were reported in "Overcoming the Local-Minimum Problem in Training Multilayer Perceptrons with the NRAE Training Method", Advances in Neural Networks - ISNN 2012, J. Wang, G. G. Yen, and M. M. Polycarpou (Eds.), pp. 440-447, Springer-Verlag Berlin Heidelberg, 2012 (James Ting-Ho Lo, Yichuan Gui and Yun Peng).

2. The NRAE-MSE method: To improve the success rate of the NRAE training method, we developed the NRAE-MSE training method. For a value of λ in the range 10?–10¹¹, we train the MLP with the NRAE method, but from time to time we take excursions from training with C_λ(w) to training the MLP with the MSE criterion Q(w). Once Q(w) ≈ 0 is reached in such an excursion, the training is stopped and declared a success. This method, called the NRAE-MSE training method, achieved a 100% success rate in all the numerical tests conducted. The method and numerical results were reported in "Overcoming the Local-Minimum Problem in Training Multilayer Perceptrons by the NRAE-MSE Training Method", Advances in Neural Networks - ISNN 2013, Chengan Guo, Zeng-Guang Hou, Zhigang Zeng (Eds.), pp. 440-447, Springer-Verlag Berlin Heidelberg, 2013 (James Ting-Ho Lo, Yichuan Gui and Yun Peng). The NRAE and NRAE-MSE methods are described in detail in the journal paper "The Normalized Risk-Averting Error Criterion for Avoiding Nonglobal Local Minima in Training Neural Networks", to appear in Neurocomputing and available online at http://dx.doi.org/10.1016/j.neucom.2013.11.056 (James Ting-Ho Lo, Yichuan Gui and Yun Peng).

3. The GDC (gradual deconvexification) method: It was found that the greater λ is, the flatter the landscape of the NRAE is. At a very large λ, training often stagnates, which can be misinterpreted as falling into a local minimum. This observation motivated an alternative method: we start the training with a large λ, say 5,000, and gradually decrease it by a certain percentage, say 10%, whenever the training error is not reduced enough after a certain number of epochs, continuing down to 0. This method, called the GDC method, has the advantage of a very small chance of falling into a nonglobal local minimum at large values of λ while continuing to improve the training error as λ decreases. (A sketch of this schedule also appears after this report.) Our tests show that the GDC method works 100% of the time on all of our test examples. The GDC method and numerical results were reported in "Overcoming the Local-Minimum Problem in Training Multilayer Perceptrons by Gradual Deconvexification", Proceedings of the International Joint Conference on Neural Networks, pp. 635-640, Dallas, Texas, USA, August 4-9, 2013 (James Ting-Ho Lo, Yichuan Gui and Yun Peng).

4. The GDC pairwise training method: In the above three training methods, the BFGS optimization method was used. When the dimension of the feature vector input to the MLP is large, the BFGS method requires a very large amount of memory. Hence, the GDC pairwise training method, which uses a pairwise gradient-descent optimization method, was developed. The method was tested on recognizing handwritten digits in the well-known MNIST dataset. For comparison, we selected the MLP(784:300:10) from Yann LeCun's list of neural networks trained on the MNIST dataset without added distorted data. Five training sessions with different random initialization seeds were conducted. The resulting five MLPs yielded test error rates of 2.61%, 2.67%, 2.7%, 2.73%, and 2.88%, respectively. These five test error rates are significantly better than those of MLPs with the same or larger architectures trained on the same MNIST dataset without added distortion in LeCun's list. There are better test error rates in the list, but they belong to learning machines of a different paradigm (e.g., convolutional networks), to larger MLP architectures, or to machines trained on different data (MNIST with added distorted data). The GDC pairwise method and numerical results were reported in "A Pairwise Algorithm for Training Multilayer Perceptrons with the Normalized Risk-Averting Error Criterion", Proceedings of the International Joint Conference on Neural Networks, Beijing, China, July 4-9, 2014 (Yichuan Gui, James Ting-Ho Lo and Yun Peng).
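The following is a minimal sketch of the NRAE criterion referenced in items 1 and 2, assuming it has the form C_λ(w) = (1/λ) ln((1/K) Σ_k exp(λ e_k(w))), where e_k(w) is the squared error on the k-th of K training samples, so that C_λ(w) reduces to the MSE criterion Q(w) as λ → 0. The model interface, the function names, and the use of a log-sum-exp routine to avoid overflow of exp(λ e_k) at large λ are illustrative assumptions, not the authors' implementation.

```python
# A sketch of the NRAE criterion, assuming the form
#   C_lambda(w) = (1/lambda) * ln( (1/K) * sum_k exp(lambda * e_k(w)) ),
# where e_k(w) is the squared error on the k-th training sample.
# Evaluating it through log-sum-exp avoids overflow of exp(lambda * e_k)
# at large lambda; all names below are illustrative, not the authors' code.
import numpy as np
from scipy.special import logsumexp


def squared_errors(w, X, y, model):
    """Per-sample squared errors e_k(w) for a generic model(X, w) -> predictions."""
    return (model(X, w) - y) ** 2


def nrae(w, X, y, model, lam):
    """Normalized risk-averting error C_lambda(w); approaches the MSE as lam -> 0."""
    e = squared_errors(w, X, y, model)
    return (logsumexp(lam * e) - np.log(e.size)) / lam


def mse(w, X, y, model):
    """Mean squared error Q(w), used for the excursions in the NRAE-MSE method."""
    return squared_errors(w, X, y, model).mean()
```

As λ grows, C_λ(w) increasingly weights the largest per-sample errors; this is what convexifies the criterion but also flattens its landscape, the observation reported in item 3 that motivates the GDC schedule sketched next.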
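Below is a self-contained toy sketch of the GDC schedule of item 3: train on the NRAE criterion starting from a large λ and cut λ by 10% whenever the training error is not reduced enough after a block of epochs. The toy regression problem, the tiny MLP, the finite-difference gradients, the learning rate, and the improvement threshold are assumptions made only for illustration; the authors' implementations used BFGS or pairwise gradient descent.

```python
# Illustrative sketch of gradual deconvexification (GDC); not the authors' code.
import numpy as np
from scipy.special import logsumexp

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(64, 1))
y = np.sin(3.0 * X).ravel()                      # toy regression target (assumption)

def predict(X, w):
    """Tiny one-hidden-layer MLP with 8 tanh units; w is a flat parameter vector."""
    W1, b1 = w[:8].reshape(1, 8), w[8:16]
    W2, b2 = w[16:24], w[24]
    return np.tanh(X @ W1 + b1) @ W2 + b2

def nrae(w, lam):
    """NRAE criterion evaluated via log-sum-exp to avoid overflow at large lambda."""
    e = (predict(X, w) - y) ** 2
    return (logsumexp(lam * e) - np.log(e.size)) / lam

def mse(w):
    return ((predict(X, w) - y) ** 2).mean()

def num_grad(f, w, h=1e-5):
    """Central-difference gradient; crude, but keeps the sketch short."""
    g = np.zeros_like(w)
    for i in range(w.size):
        wp, wm = w.copy(), w.copy()
        wp[i] += h
        wm[i] -= h
        g[i] = (f(wp) - f(wm)) / (2.0 * h)
    return g

w = rng.normal(scale=0.3, size=25)
lam, lr = 5000.0, 0.05                           # large initial lambda, as in item 3
prev = mse(w)
while lam > 1.0:                                 # deconvexify lambda toward 0 (i.e., toward the MSE)
    for _ in range(50):                          # a block of epochs at the current lambda
        w -= lr * num_grad(lambda v: nrae(v, lam), w)
    if prev - mse(w) < 1e-3:                     # error not reduced enough: shrink lambda by 10%
        lam *= 0.9
    prev = mse(w)
print("final training MSE:", mse(w))             # stopped near lambda = 1, where NRAE already behaves like the MSE
```

An NRAE-MSE variant of the same loop (item 2) would keep λ fixed, periodically copy the current weights, run a short MSE excursion from that copy, and stop as soon as Q(w) ≈ 0 is reached in an excursion.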

Project Start:
Project End:
Budget Start: 2010-09-01
Budget End: 2014-08-31
Support Year:
Fiscal Year: 2010
Total Cost: $295,151
Indirect Cost:
Name: University of Maryland Baltimore County
Department:
Type:
DUNS #:
City: Baltimore
State: MD
Country: United States
Zip Code: 21250