Learning concise and informative representations of high-dimensional data is a prerequisite for successful modern data analytics. However, recent years have seen many non-standard data regimes that pose new challenges for representation learning. In the first scenario, data are decentralized: they are scattered across locations between which communication is highly restricted. This is common for international companies that collect data worldwide but cannot aggregate them because of constraints on network bandwidth or legal policies. In the second scenario, data exhibit significant temporal dependence, as seen in stock prices, traffic flow, and clinical trials. This project will develop novel statistical methods with theoretical guarantees to handle these modern data regimes. It also aims to train the next generation of data scientists to work in these important problem settings.

The principal investigator (PI) will develop novel methods and theory for subspace and representation learning with decentralized and dependent data. For decentralized data, the PI plans to design and study a new methodological framework for distributed estimation of a general latent variable model. This framework requires only one round of communication of model parameters, adapts to a wide range of complex latent variable models (including those based on deep neural nets), and has been shown to yield superior numerical performance over existing approaches. A second, more specific setup that the PI will consider is distributed estimation of singular spaces, with applications to spectral clustering. For dependent data, the PI will focus on learning the top singular space of a low-rank Markov transition kernel to perform state compression and dimension reduction. The PI plans to solve this problem by maximizing the log-likelihood subject to either a nuclear-norm penalty or a rank constraint. The statistical rate of the resulting M-estimator will be explicitly derived, and new optimization algorithms with convergence guarantees will be developed to compute the estimator.
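
To make the idea of one-round distributed estimation of a singular space concrete, the following is a minimal sketch of one common scheme from the distributed PCA literature: each machine computes its local top-k right singular vectors, sends only that basis to a central node, and the central node averages the resulting projection matrices and takes their top-k eigenvectors. This is an illustration under stated assumptions, not necessarily the framework proposed in this project; the function names, the simulated data model, and all parameter values are hypothetical.

```python
import numpy as np

def local_top_subspace(X, k):
    """Top-k right singular vectors of a local data matrix X (n_i x d)."""
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return vt[:k].T  # d x k matrix with orthonormal columns

def aggregate_subspaces(local_bases, k):
    """Average the local projection matrices, then take the top-k eigenvectors."""
    d = local_bases[0].shape[0]
    avg_proj = np.zeros((d, d))
    for V in local_bases:
        avg_proj += V @ V.T
    avg_proj /= len(local_bases)
    # eigh returns eigenvalues in ascending order, so the last k columns
    # span the leading eigenspace of the averaged projection.
    _, eigvecs = np.linalg.eigh(avg_proj)
    return eigvecs[:, -k:]

def subspace_distance(U, V):
    """Spectral norm of the difference between the two projection matrices."""
    return np.linalg.norm(U @ U.T - V @ V.T, 2)

def demo(seed=0, d=20, k=2, machines=10, n_per_machine=200):
    """Simulate decentralized data sharing one k-dimensional signal subspace."""
    rng = np.random.default_rng(seed)
    # Hypothetical ground truth: a random k-dimensional subspace of R^d.
    true_basis, _ = np.linalg.qr(rng.standard_normal((d, k)))
    local_bases = []
    for _ in range(machines):
        scores = 3.0 * rng.standard_normal((n_per_machine, k))
        noise = 0.5 * rng.standard_normal((n_per_machine, d))
        X = scores @ true_basis.T + noise
        # Only this d x k basis leaves the machine (one round of communication).
        local_bases.append(local_top_subspace(X, k))
    estimate = aggregate_subspaces(local_bases, k)
    return subspace_distance(estimate, true_basis), estimate

if __name__ == "__main__":
    dist, _ = demo()
    print(f"distance to true subspace: {dist:.4f}")
```

Averaging projection matrices rather than the bases themselves sidesteps the fact that each local basis is identified only up to rotation, which is why the aggregation step works with V V^T instead of V.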

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Agency: National Science Foundation (NSF)
Institute: Division of Mathematical Sciences (DMS)
Application #: 2015366
Program Officer: Yong Zeng
Project Start:
Project End:
Budget Start: 2020-07-01
Budget End: 2023-06-30
Support Year:
Fiscal Year: 2020
Total Cost: $71,537
Indirect Cost:
Name: Regents of the University of Michigan - Ann Arbor
Department:
Type:
DUNS #:
City: Ann Arbor
State: MI
Country: United States
Zip Code: 48109