High-dimensional arrays arise commonly in modern scientific and technological research and have become a central topic in statistics and data science. Areas such as genetics, microbiome studies, brain imaging, and hyperspectral imaging yield large amounts of high-dimensional array data, while in other areas data can be recast into high-dimensional array form to facilitate analysis. In these situations, the target parameter is often high-dimensional or high-order, but the important information may lie in dimension-reduced subspaces induced by various structural conditions. Exploiting these subspaces efficiently poses significant statistical and computational challenges. This project aims to address these challenges from the perspective of subspace learning. By taking into account dimension-reduced and low-order subspaces, the PI aims to address a series of statistical and machine learning questions by developing new methodologies and theories with statistical and computational advantages.
This project will progress along three major directions: (i) fast estimation and inference for high-dimensional arrays via important subspace sketching; (ii) high-order clustering with theoretical guarantees; and (iii) ultrahigh-order tensor singular value decomposition via a tensor-train parameterization. The research will be applicable to a variety of topics involving high-dimensional matrix and tensor data, such as genetics and genomics, reinforcement learning, neuroimaging analysis, materials science, and recommender design. The PI will also develop user-friendly software packages for the new algorithms and make them available for public use. The PI is committed to training students, especially those from groups underrepresented in STEM, through involvement in the research project.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.