Multivariate linear regression (MLR) is a paradigm for studying the relationship between two groups of variables: the predictors and the responses. It is broadly applied in many disciplines to explain the dependence of the responses on the predictors or to predict future outcomes. With the development of technology for data collection and measurement, many contemporary problems involve high-dimensional datasets. In such datasets, a considerable amount of the response information may be redundant or irrelevant. This redundant or irrelevant part of the data adds variation to the MLR estimates, making estimation inefficient. To address this problem, a new class of models called envelopes was introduced by Cook et al. (2010). Envelope models use dimension-reduction techniques to identify and extract the relevant information in the data, so that estimation is based only on the relevant part. The envelope class is still in its infancy, however: it places restrictions on the data structure, and its advantages cannot always be realized. This doctoral dissertation research project will study the envelope class and bring it to maturity, making it more flexible and achieving further efficiency gains by enriching the class with new models and methods. New models that address scale invariance, heteroscedasticity, and small sample sizes will be developed, as will new models that lead to efficiency gains beyond those of the current models. These extensions will make minimal assumptions about data structure and will extend the applicability and power of the enveloping idea, making it more appealing.

This research will result in more efficient data analysis methods for sociology, economics, genetics, and many other disciplines in science and engineering. These methods are expected to achieve the same accuracy in analysis with a smaller sample size, making experiments and the data collection process shorter, easier, and less expensive. The project will also link existing statistical tools, such as dimension-reduction techniques and methods for estimating large covariance matrices, to the field of MLR in a novel way that opens new frontiers for their application. User-friendly software that implements the new methodology will be developed. As a Doctoral Dissertation Research Improvement award, this grant provides support to enable a promising student to establish a strong, independent research career.

Project Report

Statistical methods for analyzing complex data composed of many variables play crucial roles in fostering contemporary scientific understanding and in informing policy decisions that can impact society. The best methods strive to eliminate redundancy, identify informational cores, and improve efficiency in the analysis of complex data. A class of nascent statistical methods called envelopes has the potential to excel at these tasks and to produce precise statistical conclusions by separating information in the data that is material to the specific goals of the analysis from that which is immaterial. This grant supported the development of statistical software that implements the original envelope methodology and allied methods, providing for complete multivariate statistical analyses using envelopes. The software, written as a MATLAB toolbox, is available to the public at https://code.google.com/p/envlp/ and will serve as a platform for making new advances in envelope methodology accessible to the scientific community. It is modularized and user-friendly, so investigators can easily test its capabilities, and it offers a platform for adding new methods and functions. It can also be used in graduate and undergraduate courses on multivariate statistics. The website contains background documentation, descriptions of all toolbox functions, and examples that reproduce published results.

Agency: National Science Foundation (NSF)
Institute: Division of Social and Economic Sciences (SES)
Type: Standard Grant (Standard)
Application #: 1156026
Program Officer: Cheryl Eavey
Budget Start: 2012-05-15
Budget End: 2013-04-30
Fiscal Year: 2011
Total Cost: $4,000
Name: University of Minnesota Twin Cities
City: Minneapolis
State: MN
Country: United States
Zip Code: 55455