Proposal: DMS 9504924
PI: Leon Gleser
Institution: U. of Pittsburgh
Title: STATISTICAL THEORY AND METHODS FOR ERRORS-IN-VARIABLES REGRESSION AND OTHER MULTIVARIATE INFERENCE PROBLEMS
Abstract: This research compares the mean squared error risks of various estimators of regression slopes that correct for bias due to measurement error in the predictor variables. Included in these comparisons is the naive (uncorrected) least squares estimator that ignores measurement error. Emphasis is placed on bias correction that makes use of knowledge about the repeatability (reliability) of the vector of measurements on the predictors to "correct the slope estimates for attenuation." Various ways of obtaining and analyzing information concerning the reliability of the vector of measured predictors are considered, including the use of reliability studies of individual components and/or subvectors of the predictor vector and the elicitation and use of expert opinion. These separate sources of information are combined to yield frequentist or Bayes point estimators of the reliability matrix of the measured predictors; this matrix is used to correct least squares regression slopes for attenuation. Methods for forming confidence (or credible) intervals for slopes, and also for components of the reliability matrix, are derived and compared. Computational algorithms for finding likelihoods and posterior distributions in both these problems, and also in multivariate growth curve and seemingly unrelated regression models, are written and tested on real and simulated data.
Much scientific research on complex systems or organisms is concerned with relationships among quantities. For example, the number of successfully hatched eggs of a certain species of marshland wildfowl might be related to the concentrations of one or more pesticides in the marsh.
When the rates of response or change of a variable (such as the number of successfully hatched eggs) to changes in other predictor variables (such as various pesticide concentrations) are estimated by conventional statistical methods, it is assumed that the predictor variables are measured exactly. Unfortunately, environmental, biological or psychological variables are rarely measured exactly. If predictor variables are measured with error, the conventional estimates of response rates are biased. To correct for such bias, errors-in-variables regression models assume that each measurement is the sum of the true value of the quantity being measured and a random error. In this case, the proportion of the unit-to-unit variability of a measurement that is due to the variability over units of the true value, called the reliability of the measurement, can be used to correct the bias of the conventional estimates of response rates. When several variables are simultaneously used as predictors of another, more than the reliability of each individual predictor is required for this purpose; it is the reliability of the measurements of the predictors as a whole, or ensemble, that must be ascertained. Because the particular collection of predictors may not have been used before, information about the reliability of the predictors as an ensemble must be pieced together from a variety of sources. The investigator's research is concerned with how best to obtain and combine information from data summaries of prior studies that used some (but not necessarily all) of the predictor variables, data from the current study, and expert opinion to obtain the required reliability information and correct bias in the conventional estimates of response rates. Also studied are ways to determine and summarize the accuracy of the resulting estimates.
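The correction for attenuation described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the investigator's method: it assumes the additive model W = X + U with independent normal errors, takes the reliability matrix to be Lambda = (Sigma_x + Sigma_u)^{-1} Sigma_x, and treats Lambda as known, whereas the proposal is about estimating it from several sources. The function names, covariance values and sample size are hypothetical choices made for illustration.

```python
import numpy as np


def reliability_matrix(sigma_x, sigma_u):
    """Assumed form: Lambda = (Sigma_x + Sigma_u)^{-1} Sigma_x."""
    return np.linalg.solve(sigma_x + sigma_u, sigma_x)


def correct_for_attenuation(beta_naive, reliability):
    """Solve Lambda @ beta = beta_naive for the corrected slopes."""
    return np.linalg.solve(reliability, beta_naive)


# Hypothetical simulation settings (not taken from the proposal).
rng = np.random.default_rng(0)
n = 20_000
beta_true = np.array([2.0, -1.0])
sigma_x = np.array([[1.0, 0.5], [0.5, 1.0]])  # covariance of true predictors
sigma_u = np.diag([0.5, 0.25])                # covariance of measurement errors

# True predictors, error-prone measurements, and the response.
x_true = rng.multivariate_normal(np.zeros(2), sigma_x, size=n)
w = x_true + rng.multivariate_normal(np.zeros(2), sigma_u, size=n)
y = x_true @ beta_true + rng.normal(size=n)

# Naive least squares: regress y on the measured predictors, ignoring the error.
w_centered = w - w.mean(axis=0)
y_centered = y - y.mean()
beta_naive = np.linalg.lstsq(w_centered, y_centered, rcond=None)[0]

# Correct the naive slopes using the (here assumed known) reliability matrix.
lam = reliability_matrix(sigma_x, sigma_u)
beta_corrected = correct_for_attenuation(beta_naive, lam)

print("true slopes:     ", beta_true)
print("naive slopes:    ", beta_naive)
print("corrected slopes:", beta_corrected)
```

Because the two predictors are correlated, the off-diagonal entries of Lambda are nonzero, so dividing each naive slope by the reliability of its own predictor would not remove the bias. This is the sense in which the reliability of the predictors as an ensemble is needed.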
Insights and methodology from this research have broad applicability to questions of combining information from several small studies concerning the interrelationships among variables, not all of which appear in every study. One of the products of the research is computer software that allows information to be combined from several studies on the same subjects (or environmental locations) and then displayed so as to give the relative likelihoods of various statistical models in the light of the evidence presented by the data itself.

Agency
National Science Foundation (NSF)
Institute
Division of Mathematical Sciences (DMS)
Application #
9504924
Program Officer
Gabor J. Szekely
Budget Start
1995-07-15
Budget End
1999-06-30
Fiscal Year
1995
Total Cost
$135,000
Name
University of Pittsburgh
City
Pittsburgh
State
PA
Country
United States
Zip Code
15213