The investigator and his colleagues propose to develop a new class of statistical tools, called envelopes, for studying multivariate data. Enveloping is based on novel parameterizations that use reducing subspaces to link a location matrix L with a dispersion matrix D. For instance, the outer envelope is the smallest reducing subspace of D that contains the span of L, while the inner envelope is the largest reducing subspace of D that is contained within the span of L. In multivariate linear regression, the maximum likelihood estimator of the coefficient matrix L under an envelope model can be substantially less variable than the maximum likelihood estimator under the classical normal model, particularly when the mean function varies in directions orthogonal to the directions of maximum variation of the dispersion matrix. Similar results are expected to hold in other multivariate areas, such as discriminant analysis and functional data analysis. Enveloping is a new paradigm for addressing multivariate statistical problems; it has the potential to facilitate interpretation, to improve analyses that might otherwise be tenuous, and to produce very large gains in efficiency relative to standard methods.
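
As an illustration of the outer and inner envelope definitions (not the investigators' own software), the following Python sketch computes both envelopes for a symmetric D. It relies on the fact that the reducing subspaces of a symmetric matrix are exactly the sums of subspaces of its eigenspaces E_i. Consequently, the outer envelope is the sum of the projections P_i span(L), and the inner envelope is the sum of the intersections of each E_i with span(L). The function names, the use of NumPy, and the numerical threshold `tol` (used both to decide when eigenvalues coincide and when a singular value counts as zero) are assumptions of this sketch. With estimated matrices, eigenvalues are rarely exactly equal, so in practice the result is sensitive to `tol`.

```python
import numpy as np

def eigenspaces(D, tol=1e-8):
    """Orthonormal bases of the eigenspaces of a symmetric matrix D.

    Adjacent eigenvalues closer than `tol` (an assumed threshold) are
    treated as equal, so their eigenvectors share one eigenspace.
    """
    vals, vecs = np.linalg.eigh(D)          # ascending eigenvalues, orthonormal eigenvectors
    spaces, start = [], 0
    for j in range(1, len(vals) + 1):
        if j == len(vals) or vals[j] - vals[j - 1] > tol:
            spaces.append(vecs[:, start:j])
            start = j
    return spaces

def orth(A, tol=1e-8):
    """Orthonormal basis (as columns) for the column span of A."""
    if A.shape[1] == 0:
        return np.zeros((A.shape[0], 0))
    U, s, _ = np.linalg.svd(A, full_matrices=False)
    return U[:, s > tol]

def outer_envelope(D, L, tol=1e-8):
    """Smallest reducing subspace of D containing span(L): the sum of P_i span(L)."""
    # For an eigenspace with basis V, V @ orth(V.T @ L) spans P_i span(L).
    blocks = [V @ orth(V.T @ L, tol) for V in eigenspaces(D, tol)]
    return np.hstack(blocks)

def inner_envelope(D, L, tol=1e-8):
    """Largest reducing subspace of D inside span(L): the sum of E_i intersected with span(L)."""
    Q = orth(L, tol)
    R = np.eye(D.shape[0]) - Q @ Q.T        # projection onto the orthogonal complement of span(L)
    blocks = []
    for V in eigenspaces(D, tol):
        _, s, Vh = np.linalg.svd(R @ V)     # V has k <= p columns, so len(s) == k
        blocks.append(V @ Vh[s <= tol].T)   # directions of E_i that lie in span(L)
    return np.hstack(blocks)

D = np.diag([1.0, 2.0, 3.0])
L = np.array([[1.0], [1.0], [0.0]])
print(outer_envelope(D, L).shape[1])        # 2
print(inner_envelope(D, L).shape[1])        # 0
```

With D = diag(1, 2, 3), span(L) mixes two distinct eigenvectors. The outer envelope therefore needs both eigenvectors, while no nonzero reducing subspace fits inside span(L). Changing D to diag(1, 1, 3) merges those two eigenvectors into one eigenspace, and both envelopes then collapse to span(L) itself. This shows how the two definitions bracket span(L) from outside and from inside.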

Technological advances in many scientific fields have produced configurations of multivariate data that strain or exceed the capabilities of standard statistical theory and methods. More than ever before, understanding experimental evidence and exploring scientific hypotheses require methods that can meaningfully analyze contemporary data. This is particularly true in the life sciences, where the ability to extract the relevant information from a complex body of data is paramount. The investigator and his colleagues plan to study a new class of multivariate statistical methods capable of efficiently extracting the information relevant to a given purpose from complex data. For instance, the overarching goal in tissue engineering is to be able to replace damaged human connective tissue with viable tissue patches fabricated in vitro. Current technology has not reached this goal because tissues grown in vitro lack adequate mechanical integrity for in vivo applications. The mechanical integrity of tissues is controlled by a network of several hundred intercellular signaling proteins that shape long-term tissue growth and that can be measured by mass spectrometry. The statistical objective here is to identify the most important external stimuli and to extract the relevant information by reducing the signaling proteins to a few key protein indices that can be monitored during in vitro growth and directed by those stimuli.

Project Report

Statistical methods are often used to study data and draw conclusions that inform policy decisions affecting society. Precise statistical conclusions, such as those about the efficacy of a new drug, contain more useful information than relatively imprecise ones. Over the past decades, the computing revolution has produced an unprecedented capacity for data processing and storage, a capacity continually motivated by, and in turn followed by, technical and experimental advances in a number of research fields, including data mining, finance, marketing, precision agriculture, imaging, bioinformatics and biomedical engineering. Statistical tools for analyzing complex multivariate data play a crucial role in fostering contemporary scientific understanding. New statistical models and methods are essential for eliminating redundancy, identifying informational cores and improving efficiency in the analysis of complex data sets. The overarching goal of this grant was to develop a nascent statistical tool, called an envelope, for analyzing multivariate data. Envelopes can substantially reduce the uncertainty in conclusions by separating the information in the data that is material to the goals of the analysis from the information that is immaterial. Specific goals included refining the methodology itself and, since multivariate data come in various forms depending on the context, extending the theory to encompass a wide variety of statistical settings. The accuracy gained by using envelopes depends on the variation in the immaterial part of the data: if the immaterial information is highly variable, then considerably more confidence may be placed in the results of an envelope analysis than in those of a standard analysis. The goals of this grant were met.
In a series of 18 articles published in the scientific literature, we demonstrated both theoretically and in practice that envelope methods can far surpass standard methods in their ability to extract material information from multivariate data. We also developed, and made freely available to the public, a computer program for implementing envelope methods.
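
To make the dependence on the immaterial variation concrete, here is a small Monte Carlo sketch in Python. It makes strong simplifying assumptions: one centered predictor, three responses, a material direction Γ = e1 that is treated as known, and independent errors with covariance Σ = ΓΩΓᵀ + Γ₀Ω₀Γ₀ᵀ, where Ω = 1 and Ω₀ = s²I for an immaterial standard deviation s. The "envelope" estimator here is a hypothetical oracle that projects the ordinary least squares slope onto span(Γ). It is not the envelope maximum likelihood estimator, which must also estimate Γ. Given the predictor values, the oracle's covariance is ΓΩΓᵀ instead of Σ (both divided by the predictor's sum of squares), so its advantage grows with s.

```python
import numpy as np

def simulate(n=50, reps=2000, immaterial_sd=10.0, seed=0):
    """Total Monte Carlo variance of OLS vs. an oracle envelope estimator (Gamma known)."""
    rng = np.random.default_rng(seed)
    gamma = np.array([1.0, 0.0, 0.0])                   # hypothetical material direction
    beta = 2.0 * gamma                                  # true coefficients lie in span(gamma)
    sd = np.array([1.0, immaterial_sd, immaterial_sd])  # error sd along e1, e2, e3
    P = np.outer(gamma, gamma)                          # projection onto span(gamma)
    ols, env = [], []
    for _ in range(reps):
        x = rng.standard_normal(n)
        x -= x.mean()                                   # centering absorbs the intercept
        Y = np.outer(x, beta) + rng.standard_normal((n, 3)) * sd
        b = Y.T @ x / (x @ x)                           # OLS slope for each response
        ols.append(b)
        env.append(P @ b)                               # oracle: keep only the material part
    # Total variance = sum of the componentwise Monte Carlo variances.
    return np.var(ols, axis=0).sum(), np.var(env, axis=0).sum()

std_var, env_var = simulate()
print(std_var / env_var)
```

With these settings, the printed ratio should land roughly near tr(Σ)/tr(Ω) = 201. Setting `immaterial_sd=1.0` shrinks it toward 3, which matches the statement that the gain from envelopes depends on how variable the immaterial part of the data is.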

Agency: National Science Foundation (NSF)
Institute: Division of Mathematical Sciences (DMS)
Application #: 1007547
Program Officer: Gabor Szekely
Budget Start: 2010-08-15
Budget End: 2014-07-31
Fiscal Year: 2010
Total Cost: $309,922
Name: University of Minnesota Twin Cities
City: Minneapolis
State: MN
Country: United States
Zip Code: 55455