The "curse of dimensionality" has severely limited human's capability of handling high-dimensional data in many application domains. However, most useful data in existing human knowledge database has certain compressible features. This project focuses on the mathematical description of these compressible features, and develops a novel hierarchical modeling technique to extract these features from high-dimensional datasets in science and engineering applications and process the compressed information efficiently on a hierarchical tree structure. The investigators will develop fast algorithms for high-dimensional integrations involving a truncated multivariate normal distribution that targets the analysis of medical datasets. The techniques developed from this project will provide the scientific community a very powerful tool to handle high-dimensional datasets and at the same time foster the training of researchers with interdisciplinary knowledge.

Multivariate Gaussian distribution is one of the most important continuous distributions in statistics. If some components are restricted to an interval, either finite or semi-finite, it is referred to as the truncated multivariate normal (TMVN) distribution. Many statistical algorithms rely on the evaluations of the probabilities and expectations with respect to a TMVN, especially in the expectation-maximization (EM) type algorithms. Direct computation of the desired expectation is very challenging. A commonly used alternative approach is based on the Monte Carlo simulation by drawing random samples from the corresponding TMVN distribution. However, it is equally challenging to simulate from a TMVN distribution in high dimensional cases. This project will develop new hierarchical algorithms to efficiently compute very high dimensional TMVN probabilities and expectations. The core ideas include the hierarchical data clustering, low-rank and low-dimensional features extraction, and their efficient processing on the hierarchical tree structures. The resulting algorithm can compute the expectations with respect to a class of p-dimensional TMVN distributions in asymptotically optimal O(p) operations in high dimensional cases, which can also be used to tighten the likelihood ratio bound of the target TMVN distribution in the acceptance-rejection method to achieve the highest acceptance probability while avoiding the burn-in period of some competitive algorithms such as the Metropolis-Hastings algorithm. The hierarchical nature of the algorithm allows easy adoption of the recent progress in adaptive, dynamic, and asynchronous runtime systems to efficiently utilize the computing resources at scale. The project provides advanced numerical tools to accelerate the computations and improve the applicability of the EM algorithms to handle large and complex biomedical data, with target applications aimed at improving public health.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Agency
National Science Foundation (NSF)
Institute
Division of Mathematical Sciences (DMS)
Type
Standard Grant (Standard)
Application #
1821142
Program Officer
Christopher Stark
Project Start
Project End
Budget Start
2018-08-15
Budget End
2021-07-31
Support Year
Fiscal Year
2018
Total Cost
$50,000
Indirect Cost
Name
Indiana University
Department
Type
DUNS #
City
Bloomington
State
IN
Country
United States
Zip Code
47401