Intelligent processing of complex signals such as images or sound is often performed by a parameterized, nested hierarchy of simple nonlinear processing layers, as in a deep neural net, an object recognition cascade or a speech front-end. Joint estimation of the parameters of all the layers and selection of an optimal architecture is a difficult nonconvex optimization problem that is hard to parallelize and requires significant human expert effort, which leads to suboptimal systems in practice. This research will develop a new, general mathematical strategy to learn the parameters and, to some extent, the architecture of nested systems, called the method of auxiliary coordinates (MAC). MAC has provable convergence, is easy to implement by reusing existing algorithms for single layers, applies even when parameter derivatives are not available, easily makes use of parallel architectures, and often provides reasonable models within a few iterations. The PI's research and teaching will introduce undergraduate students to the design of nested machine learning systems, and provide computer science graduate students with skills in optimization, in support of specific areas (machine learning, computer vision/speech, etc.). The PI will broadly disseminate the results of the research by publishing papers in optimization, machine learning, computer vision and speech. The PI will make Matlab and C/C++ code available for the algorithms developed and use it as a teaching aid in his courses on optimization and machine learning. The PI will strive to involve a diverse population of students in the research.
By reducing runtime and human effort, MAC could greatly facilitate the practical design and estimation of complex models currently being developed in data-rich disciplines such as machine learning, computer vision and speech, as well as in other areas of engineering and science. It could also obviate the construction of hand-crafted features in trees and other classifiers. It may bring a wide-ranging and timely benefit to society given that serial computation is reaching a plateau, cloud computing is becoming a commodity, and intelligent data processing is finding its way into mainstream devices (phones, cameras, etc.) thanks to increases in computational power and data availability. The research will develop MAC along two aims. Optimization: The PI will explore different ways to define auxiliary coordinates; different algorithms to solve the MAC-constrained problem; and efficient W steps (over the layer parameters) and Z steps (over the auxiliary coordinates). The PI will also investigate the convergence properties of these approaches and how well they parallelize. Machine learning: The PI will apply MAC-based optimization to several existing and new nested models: deep neural nets; best-subset feature selection; learning features for decision trees; dictionary and classifier learning; parametric embeddings; learning hash functions; distal learning; model adaptation; and others. The evaluation plan includes standard benchmarks and articulatory speech modeling tasks. The research is interdisciplinary and potentially transformative: it opens many opportunities for research in optimization and machine learning, promotes thinking in terms of MAC formulations when designing learning systems, and could replace or complement the backpropagation algorithm in learning nested systems in both the serial and parallel settings. The models developed for articulatory speech will also improve our understanding of speech production.
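
To make the alternation between W and Z steps concrete, the following is a minimal sketch, not the project's code. It assumes a two-layer model y ≈ W2 tanh(W1 x) with squared error, one auxiliary coordinate vector z_n per training point standing in for the hidden-layer output, and a quadratic penalty with weight mu on the constraints z_n = tanh(W1 x_n). The function names, the penalty schedule and the backtracking gradient fit for the hidden layer are illustrative assumptions.

```python
import numpy as np

def penalized_objective(X, Y, W1, W2, Z, mu):
    """0.5*sum||y_n - W2 z_n||^2 + 0.5*mu*sum||z_n - tanh(W1 x_n)||^2."""
    fit = 0.5 * np.sum((Y - Z @ W2.T) ** 2)
    penalty = 0.5 * mu * np.sum((Z - np.tanh(X @ W1.T)) ** 2)
    return fit + penalty

def w_step(X, Y, W1, Z, n_grad=20):
    """Fit each layer separately with Z held fixed (assumed solvers)."""
    # Output layer: linear least squares of Y on Z.
    W2 = np.linalg.lstsq(Z, Y, rcond=None)[0].T

    # Hidden layer: fit tanh(W1 x) to Z by gradient descent with backtracking,
    # so the hidden-layer loss never increases.
    def loss(W):
        return 0.5 * np.sum((Z - np.tanh(X @ W.T)) ** 2)

    for _ in range(n_grad):
        A = np.tanh(X @ W1.T)
        grad = -((Z - A) * (1.0 - A ** 2)).T @ X
        current = loss(W1)
        step = 1.0
        while step > 1e-10 and loss(W1 - step * grad) >= current:
            step *= 0.5
        if step <= 1e-10:
            break  # no decreasing step found; keep W1 as it is
        W1 = W1 - step * grad
    return W1, W2

def z_step(X, Y, W1, W2, mu):
    """Exact minimizer over each z_n with the weights held fixed.

    Each point solves (W2^T W2 + mu I) z_n = W2^T y_n + mu tanh(W1 x_n);
    the points are independent, which is what makes this step parallel.
    """
    A = np.tanh(X @ W1.T)
    M = W2.T @ W2 + mu * np.eye(W2.shape[1])
    return np.linalg.solve(M, (Y @ W2 + mu * A).T).T

def mac_fit(X, Y, n_hidden, mus=(1.0, 10.0, 100.0), seed=0):
    """Alternate W and Z steps while increasing mu (illustrative schedule)."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(size=(n_hidden, X.shape[1]))
    Z = np.tanh(X @ W1.T)  # start with the constraints satisfied
    W2 = np.zeros((Y.shape[1], n_hidden))
    for mu in mus:
        W1, W2 = w_step(X, Y, W1, Z)
        Z = z_step(X, Y, W1, W2, mu)
    return W1, W2, Z
```

In this sketch the W step reduces to one independent fit per layer and the Z step to one small independent problem per data point, which is the decoupling the abstract refers to when it mentions reusing single-layer algorithms and exploiting parallel architectures.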