The project aims to develop efficient algorithms for big data applications by exploiting the rich structure of the data. Big data often takes the form of higher-dimensional arrays: one-dimensional arrays are vectors, two-dimensional arrays are matrices, and arrays of dimension three or more are called tensors, which have a more complex structure. Among the challenges the research team will address are noise removal, recovery of missing data by inference, and data-size reduction by distilling relevant information. The resulting methods will be applied to the early detection of sepsis, which contributes to the deaths of 200,000 people in the United States every year.
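The hierarchy of arrays described above can be illustrated concretely. The following sketch (using NumPy, purely as an illustration and not part of the project itself) shows a vector, a matrix, and a three-dimensional tensor distinguished by their number of dimensions:

```python
import numpy as np

# A vector (1-D array), a matrix (2-D array), and a tensor (3-D array).
vector = np.arange(4)                    # shape (4,)
matrix = np.arange(12).reshape(3, 4)     # shape (3, 4)
tensor = np.arange(24).reshape(2, 3, 4)  # shape (2, 3, 4)

print(vector.ndim, matrix.ndim, tensor.ndim)  # 1 2 3
```

Each added dimension multiplies the number of entries, which is one reason tensor data demands the efficient, scalable algorithms the project targets.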
Most algorithms for low-rank structured tensors are based on the Canonical Polyadic (CP) decomposition, also known as Parafac or Candecomp. These algorithms may converge slowly, can be numerically unstable, and are difficult to scale to big data. The theoretical framework used for tensor decompositions is based on an algebraic graphical calculus that employs colored Brauer diagrams to obtain explicit formulas. The graphical calculus will also be used to give accurate estimates of the computational and memory complexity of these algorithms. The investigators will design efficient, numerically stable, and computationally feasible algorithms for crucial tensor operations that are widely applicable to big data. Specifically, the project is expected to create fast, scalable, and reliable algorithms for tensor analysis and apply them to crucial big data tasks, including noise removal, dimension reduction, imputation of missing data, and classification, in a variety of applications with structured data.
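For context, the standard approach the abstract refers to can be sketched as follows. This is a minimal alternating-least-squares (ALS) fit of a CP model, written in NumPy as a generic baseline; it is not the project's proposed method, and the function names (`khatri_rao`, `cp_als`) are illustrative choices. ALS of this kind exhibits exactly the slow convergence and scaling issues the project aims to overcome:

```python
import numpy as np

def khatri_rao(B, C):
    """Column-wise Kronecker product of B (J x R) and C (K x R); result is (J*K, R)."""
    J, R = B.shape
    K, _ = C.shape
    return (B[:, None, :] * C[None, :, :]).reshape(J * K, R)

def cp_als(X, rank, n_iter=500, seed=0):
    """Fit a rank-R CP model X[i,j,k] ~ sum_r A[i,r]*B[j,r]*C[k,r] by alternating least squares."""
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    A = rng.standard_normal((I, rank))
    B = rng.standard_normal((J, rank))
    C = rng.standard_normal((K, rank))
    # Matricizations (unfoldings) of X along each of the three modes.
    X1 = X.reshape(I, J * K)
    X2 = np.moveaxis(X, 1, 0).reshape(J, I * K)
    X3 = np.moveaxis(X, 2, 0).reshape(K, I * J)
    for _ in range(n_iter):
        # Each step solves a linear least-squares problem for one factor,
        # holding the other two fixed.
        A = X1 @ khatri_rao(B, C) @ np.linalg.pinv((B.T @ B) * (C.T @ C))
        B = X2 @ khatri_rao(A, C) @ np.linalg.pinv((A.T @ A) * (C.T @ C))
        C = X3 @ khatri_rao(A, B) @ np.linalg.pinv((A.T @ A) * (B.T @ B))
    return A, B, C
```

On a synthetic tensor built from known rank-2 factors (e.g. `X = np.einsum('ir,jr,kr->ijk', A0, B0, C0)`), `cp_als(X, rank=2)` recovers an accurate reconstruction, but each sweep touches every unfolding of the full tensor, which is what makes naive ALS hard to scale to big data.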
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.