With the recent surge of applications involving large-scale data comes a critical need to develop efficient, robust, and practical methods for data analysis. With more and more applications having multi-modal data (data coming from many distinct sources and of different types, often having a temporal component), the need for mathematical developments to handle and understand this data is critical. The key mathematical object at the heart of such study is the tensor, a multi-dimensional array that can be viewed as an algebraic extension of the common notion of a mathematical matrix. The mathematics of tensors has received a lot of recent attention; however, there are still many key lacunae in the scientific understanding of these objects as well as their use in modern data analytic techniques. The project focuses on the development of computationally feasible methods to detect patterns within such tensor data as well as the geometric properties of tensors that can be used for compression. The team will partner with the California Innocence Project (CIP), a nonprofit whose goal is to free innocent persons who have been convicted of a crime. The data involved are inmate letters and case files, all of which are highly multi-modal since they include, for example, court documents, interview testimonials, forensic data, images, and more. The goals in this context will be to facilitate the assessment procedure that CIP uses to decide which cases are likely to be successful, what commonalities they share, and which populations may need more attention. The partnership with CIP will serve not only as a means of direct societal impact but also as a feedback mechanism to test and validate the developed mathematical approaches.
We focus on two technical thrusts. The first thrust is centered around methods to detect patterns in tensor data without impossible unfoldings, in an online setting, and allowing for topic structures. The second thrust focuses on dimension reduction, developing geometry-preserving reduction maps that act on tensors and map to tensors, along with related important methods that utilize such maps. The first thrust will go beyond existing research in several ways. First, it will provide much-improved topic detection in dynamic applications. Second, it will establish provable convergence of features in the stochastic online setting. Third, it will offer improved topic structures using a deep model. The second thrust focuses on the mathematics of tensor dimension reduction and will provide provable guarantees for such reductions, along with analysis of the related algorithms. Such practical techniques and understanding simply do not yet exist for true tensor data. The proposed research program will therefore further mathematical understanding of tensor geometries while also providing practical approaches that can be used in any field needing to analyze multi-modal data. The transition of these results to society will be facilitated through connections between the PI and nonprofits, including the California Innocence Project. The project also includes a novel outreach and educational component, including integration of high school student programs, community change programs, teachers, and future teachers in summer workshops and events throughout the year.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.