Unsupervised learning techniques have been widely used in real-world applications such as searching, ranking, recommender systems, social networks, online advertisements, online transportation, virtual assistants, and so on. In many practical applications of unsupervised learning methodology, nonconvex optimization based estimation methods are convenient due to their scalability to large data. This scalability is crucial in various applications such as computer vision and natural language processing. However, nonconvex optimization is computationally unstable or even infeasible in general due to unfavorable local minima that may exist. In consequence, statistical properties for global minimum based estimates may not provide meaningful guidelines for practitioners. This project aims to study the statistical properties of local minimum based estimates for nonconvex optimization methods in a range of unsupervised learning problems. The proposed research projects are significant in identifying reliable nonconvex frameworks by understanding the trade-off between computational feasibility and statistical efficiency. A new platform will be provided for collaborations across different fields such as statistics, mathematics and computer science. The activity is planned to engage female and underrepresented minority students in the study in Science, Technology, Engineering and Mathematics (STEM) fields through both theoretical and computational training and hands-on data analysis.
Since nonconvex optimization methods are known to be adaptive to missing data and various parameterizations, the proposed projects are focused on the statistical analysis for low-rank factorization based unsupervised learning, such as matrix completion, robust PCA, pairwise ranking and network representation. Recent developments in landscape analysis for nonconvex low-rank factorization reveal that there could be no spurious local minima if (i) the ground truth satisfies strong structural assumptions; (ii) model mismatching is not permitted; (iii) the sample size is large. In contrast, the proposed project is focused on conducting the landscape analysis in more general settings: First, model-free frameworks will be proposed to study the geometric properties of nonconvex optimization without requiring structural assumptions on the data or exact model matching; Second, the effects of model mismatching on the landscape of the nonconvex objective functions will be analyzed; Third, statistical efficiencies for local minima based estimates and their relationship with the intrinsic dimensions and sample sizes will be established; Fourth, algebraic structures of general parameterized low-rank factorization will be exploited in order to establish a unified landscape analysis for a broad class of unsupervised learning problems. Moreover, in order to test the empirical behavior of the proposed methods, the activity is planned to identify appropriate benchmark datasets in learning-to-rank, predictive network analysis and recommendation systems in order to compare the empirical performances of the proposed approaches with baseline methods in the literature.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.