This proposal considers multiple regression procedures for analyzing the relationship between a response variable and a vector of covariates. With high dimensional data, the sparsity of data in regions of the sample space makes estimation of nonparametric curves and surfaces virtually impossible. The proposal introduces an approach that deals with this dilemma by choosing procedures whose relative variability (noise) is minimized. This is accomplished by abandoning the goal of estimating the true underlying curves and instead introducing measures of dependence that may be able to identify important relationships between variables. These dependence measures are expressed in terms of tuning parameters that are chosen to maximize a signal to noise ratio. More precisely, the tuning parameter is a vector that gives the window sizes of local regions in the covariate space, within which the response variable is fitted to the covariates by local parametric models. The signal is a local estimate of a dependence parameter, which depends on the window size, and the noise is the standard error (SE, an estimate of the standard deviation) of this estimate. Choosing the window size to maximize a signal to noise ratio lifts the curse of dimensionality because in regions where data are sparse the SE is very large, so windows confined to such regions yield a small signal to noise ratio. The approach includes model selection, in which variables whose signals are insignificant compared to their SEs are eliminated.
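
The window selection described above can be illustrated with a minimal sketch. It is not the proposal's actual procedure: it assumes a single covariate, a local linear fit, the local slope as the dependence parameter, and a small grid of candidate window sizes. The function names `local_slope_snr` and `best_window` and the `min_points` cutoff are hypothetical choices made for illustration only.

```python
import numpy as np

def local_slope_snr(x, y, x0, h, min_points=4):
    """Local linear fit of y on x using points with |x - x0| <= h.

    Returns (slope, se, snr). The slope plays the role of the "signal"
    and its standard error the "noise" (illustrative assumption).
    Windows with too few points get an infinite SE and a ratio of 0.
    """
    mask = np.abs(x - x0) <= h
    n = int(mask.sum())
    if n < min_points:
        return np.nan, np.inf, 0.0
    xc = x[mask] - x0
    yw = y[mask]
    sxx = np.sum((xc - xc.mean()) ** 2)
    if sxx == 0.0:
        # All covariate values coincide: the slope is not identifiable.
        return np.nan, np.inf, 0.0
    design = np.column_stack([np.ones(n), xc])
    beta, *_ = np.linalg.lstsq(design, yw, rcond=None)
    resid = yw - design @ beta
    sigma2 = resid @ resid / (n - 2)   # residual variance estimate
    se = np.sqrt(sigma2 / sxx)         # standard error of the slope
    slope = float(beta[1])
    snr = abs(slope) / se if se > 0 else np.inf
    return slope, float(se), float(snr)

def best_window(x, y, x0, candidates):
    """Return (h, slope, se, snr) for the candidate h with the largest ratio."""
    results = [(h,) + local_slope_snr(x, y, x0, h) for h in candidates]
    return max(results, key=lambda r: r[3])
```

In this sketch a window that falls in a sparse region is never selected, because its SE is treated as infinite. With a vector of covariates, the window size would become a vector and the same ratio would be maximized over it; that extension is not shown here.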

It is proposed to develop procedures for determining relationships between factors in studies that involve a large number of factors and complex relationships. The proposed methodology is generally applicable, without restrictive conditions. It supports the discovery of key relationships in studies where a great number of factors need to be screened for relevance. The dimension reduction and discovery techniques in this proposal are useful in a variety of scientific and engineering contexts, including genetics and bioinformatics. In summary, it is proposed to develop methods that will be useful in studies involving the large and complex data sets that are common in many areas, including studies of health, the environment, and finance.

Agency: National Science Foundation (NSF)
Institute: Division of Mathematical Sciences (DMS)
Application #: 0604931
Program Officer: Gabor J. Szekely
Project Start:
Project End:
Budget Start: 2006-07-15
Budget End: 2010-06-30
Support Year:
Fiscal Year: 2006
Total Cost: $140,001
Indirect Cost:
Name: University of Wisconsin Madison
Department:
Type:
DUNS #:
City: Madison
State: WI
Country: United States
Zip Code: 53715