Portnoy 9703758: Traditional statistical linear modeling seeks to explain a response variable (e.g., wages) in terms of predictor variables (e.g., education, job training, social characteristics, etc.). The classical approach uses "least squares" to estimate the mean of the response conditional on the predictors. It is often found, however, that responses in the upper tail of the distribution depend on the predictors very differently than responses in the middle or lower part, and this variability is completely lost by the classical approach. Modern regression quantile methods have therefore become increasingly popular. These methods estimate the conditional quantiles (percentiles) of the response in terms of the predictors. For example, it has been found that high wages depend much more strongly on education than lower wages: not only are high wages associated with more education, but the rate of return on education is significantly higher for high wage earners than for lower wage earners. Similarly, heavy electricity consumers during the summer show a much greater difference between daytime peak use and nighttime use than do lighter users (presumably because of air conditioners), and the length of long hospital stays depends more strongly on the severity of the disease than does the length of shorter stays.

This ability to distinguish models on the basis of the size of the response is finding extensive application in economics, the social sciences, biostatistics, and other areas. Inevitably, continued successful diffusion of these methods is linked closely to the availability of convenient and efficient software. For modestly large problems, existing algorithms require computational effort comparable to least squares; as problem size grows, however, their computational burden becomes heavy. For the large problems common in empirical labor economics, large biomedical surveys, and other areas, new and more efficient computational techniques would be highly desirable.

Recent research therefore proposes a two-pronged attack that has been shown to yield dramatic improvements in computational efficiency. One prong is the use of recently developed interior point methods for linear programming. The second is a form of stochastic preprocessing that can drastically reduce the effective size of most statistical regression quantile problems. Together, these ideas appear capable of bringing the computational effort down to the level of least squares for problems with sample sizes up to several million observations. For even larger problems, theoretical evidence indicates that regression quantile methods can actually be faster than least squares. Practical realization of this theoretical improvement would have significant consequences for high performance computation on large and massive data sets.
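To make the linear programming connection concrete, the following is a minimal sketch (not the project's algorithm) of how a single regression quantile can be computed: the standard "check function" objective for the tau-th quantile is recast as a linear program and handed to an off-the-shelf interior point solver, here SciPy's HiGHS IPM. The function name rq_fit and the simulated wage/education data are illustrative assumptions, not part of the award.

import numpy as np
from scipy.optimize import linprog

def rq_fit(X, y, tau=0.5):
    # Estimate the tau-th regression quantile by minimizing
    #   sum_i rho_tau(y_i - x_i'beta),  rho_tau(u) = u * (tau - 1{u < 0}),
    # written as the equivalent linear program
    #   min  tau * 1'u + (1 - tau) * 1'v
    #   s.t. X beta + u - v = y,  u >= 0, v >= 0, beta free.
    n, p = X.shape
    c = np.concatenate([np.zeros(p), tau * np.ones(n), (1.0 - tau) * np.ones(n)])
    A_eq = np.hstack([X, np.eye(n), -np.eye(n)])
    bounds = [(None, None)] * p + [(0, None)] * (2 * n)
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs-ipm")
    return res.x[:p]

# Toy illustration: simulated "wages" whose spread grows with education,
# so the slope at the 0.9 quantile exceeds the slope at the median.
rng = np.random.default_rng(0)
educ = rng.uniform(8.0, 20.0, size=500)
wage = 2.0 + 0.5 * educ + 0.2 * educ * rng.standard_normal(500)
X = np.column_stack([np.ones_like(educ), educ])
print("tau = 0.5 (intercept, slope):", rq_fit(X, wage, tau=0.5))
print("tau = 0.9 (intercept, slope):", rq_fit(X, wage, tau=0.9))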

Agency: National Science Foundation (NSF)
Institute: Division of Mathematical Sciences (DMS)
Application #: 9703758
Program Officer: John Stufken
Project Start:
Project End:
Budget Start: 1997-07-01
Budget End: 2002-06-30
Support Year:
Fiscal Year: 1997
Total Cost: $172,402
Indirect Cost:
Name: University of Illinois Urbana-Champaign
Department:
Type:
DUNS #:
City: Champaign
State: IL
Country: United States
Zip Code: 61820