Data is being generated at a tremendous rate in diverse applications, such as health care, genomics, energy management and social network analysis. Indeed, the recent moniker of Big Data emphasizes that massive volumes of data are now ubiquitous. There is thus a great need for scalable and sophisticated methods for analyzing these data sets. This project addresses one aspect of this challenge: developing scalable, state-of-the-art numerical methods for modern problems that arise in machine learning.
The project will develop divide-and-conquer methods for representative, concrete problems that arise in contemporary applications. These include (a) classification: kernel support vector machines, (b) regression: kernel regression and high-dimensional sparse approximation, (c) structure learning: graphical model estimation, (d) spectral approximation: multi-scale SVD computation, and (e) missing value estimation: matrix factorization. The project will develop specialized algorithms for each of these problems, in particular tailored ways of dividing the problem into subproblems, solving the subproblems, and finally combining the subproblem solutions. In doing so, general principles for applying the divide-and-conquer approach to other problems in large-scale machine learning will be uncovered. The project will produce software for large-scale data analysis that is efficient on modern multi-core computers. The impact of the new algorithms on various application areas, such as bioinformatics and network analysis, will be studied. Within computer science and applied mathematics, the project will have a broad impact on research in a variety of disciplines, including numerical analysis, numerical optimization, statistics, machine learning, data mining and parallel computing.
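To make the divide-solve-combine pattern concrete, the following is a minimal sketch of one well-known instance, divide-and-conquer kernel ridge regression: the data are randomly partitioned, a local kernel regressor is solved on each part, and the local predictors are averaged. This is an illustration of the general strategy, not the project's actual algorithms; all function names and parameter values here are hypothetical.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian (RBF) kernel matrix between rows of A and B."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def krr_fit(X, y, lam=1e-2, gamma=1.0):
    """Solve one subproblem: (K + lam*I) alpha = y on a data subset."""
    K = rbf_kernel(X, X, gamma)
    alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)
    return X, alpha

def dc_krr(X, y, n_parts=4, lam=1e-2, gamma=1.0, seed=0):
    """Divide: random partition; solve: local KRR; combine: average predictors."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    models = [krr_fit(X[p], y[p], lam, gamma)
              for p in np.array_split(idx, n_parts)]
    def predict(Xnew):
        preds = [rbf_kernel(Xnew, Xi, gamma) @ a for Xi, a in models]
        return np.mean(preds, axis=0)
    return predict

# Toy usage: recover a smooth function from 400 samples split 4 ways.
X = np.linspace(0.0, 1.0, 400)[:, None]
y = np.sin(2 * np.pi * X[:, 0])
predict = dc_krr(X, y, n_parts=4, lam=1e-3, gamma=50.0)
err = np.abs(predict(X) - y).mean()
```

Because each subproblem involves only n/m points, the cubic cost of the kernel solve drops from O(n^3) to m·O((n/m)^3), and the m solves are independent, so they map naturally onto multi-core hardware.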