Estimating curves and surfaces such as the conditional mean is virtually impossible with high-dimensional data. The investigators approach this dilemma by focusing instead on estimating measures of dependence between a response Y and a set of covariates. These measures take the form of signal divided by noise, and they are used both to select the subset of covariates to include in the model and to choose tuning parameters. The signal measures the strength of the dependence of Y on a set of covariates. The noise is a standard error, that is, an estimate of the standard deviation of the estimated signal. It will be large when too many variables are included in the model and when the tuning parameter sacrifices precision for smaller bias. Choosing variables and tuning parameters that maximize the signal-to-noise ratio leads to procedures that converge at the traditional root-n rate. The investigators use asymptotic and Monte Carlo methods to investigate the properties of such procedures. They also relate these procedures to traditional model and tuning parameter selection procedures and compare the two approaches.
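The selection idea can be illustrated with a minimal sketch. It is not the investigators' estimator: it assumes, purely for illustration, that the signal is the fraction of the variance of Y explained by a least-squares fit, that the noise is a bootstrap standard error of that estimate, and that candidate subsets are searched exhaustively up to a small size. The function names (`signal`, `signal_to_noise`, `select_subset`) and the simulated data are hypothetical.

```python
import itertools

import numpy as np


def signal(X, y):
    """Assumed signal: share of Var(y) explained by a least-squares fit on X."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return 1.0 - resid.var() / y.var()


def signal_to_noise(X, y, n_boot=200, seed=0):
    """Estimated signal divided by its bootstrap standard error (assumed noise)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    boots = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample rows with replacement
        boots[b] = signal(X[idx], y[idx])
    return signal(X, y) / boots.std(ddof=1)


def select_subset(X, y, max_size=2, seed=0):
    """Return the covariate subset with the largest signal-to-noise ratio."""
    best_subset, best_ratio = None, -np.inf
    for k in range(1, max_size + 1):
        for subset in itertools.combinations(range(X.shape[1]), k):
            ratio = signal_to_noise(X[:, list(subset)], y, seed=seed)
            if ratio > best_ratio:
                best_subset, best_ratio = subset, ratio
    return best_subset, best_ratio


# Simulated data (an assumption): only column 0 carries signal.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))
y = 2.0 * X[:, 0] + rng.normal(size=200)

best_subset, best_ratio = select_subset(X, y)
print(best_subset, round(best_ratio, 2))
```

The same seed is reused for every subset so that all candidates are compared on identical bootstrap resamples. An exhaustive search is feasible only for a handful of covariates; with many variables a greedy or stepwise search would be needed instead.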

The last few years have seen the establishment of large databases containing a large number of variables that are to be related and compared. A good example is the data produced by the Human Genome Project, where a great number of genes need to be considered as possible contributors to a certain disease. Dealing with a large number of variables is difficult because many of them contribute variability (noise) that may drown out potentially interesting relationships (signals). The investigators approach this problem by using general, flexible model equations to represent important relationships between variables. Variables and aspects of the equations are then selected to maximize the ratio of signal to noise. This procedure automatically weeds out the variables that contribute mostly noise and selects an equation that emphasizes the signals present in the data.

Agency: National Science Foundation (NSF)
Institute: Division of Mathematical Sciences (DMS)
Type: Standard Grant (Standard)
Application #: 0505651
Program Officer: Grace Yang
Project Start:
Project End:
Budget Start: 2005-07-01
Budget End: 2007-06-30
Support Year:
Fiscal Year: 2005
Total Cost: $50,000
Indirect Cost:

Name: University of Wisconsin Madison
Department:
Type:
DUNS #:
City: Madison
State: WI
Country: United States
Zip Code: 53715