The research objective of this project is to develop efficient regression algorithms for fitting models that will guide subsequent decisions. Linear regression algorithms will be designed for static models that support repeated decisions as well as for linear time series models that support dynamic decisions. Logistic regression algorithms will be designed to accommodate discrete choice data. In addition to data and a model specification, the algorithms will take decision objectives as input.

If successful, the algorithms resulting from this research will yield better decisions than standard regression algorithms when the selected features do not perfectly capture the relationships present in the systems that generate the data.

Project Report

A common practice in decision analytics is to build a model from historical data using a statistical learning method, such as linear regression, logistic regression, or principal component analysis, and then to optimize decisions using the resulting model. This process typically treats the estimation of model parameters separately from their use in making decisions. In linear regression, for example, estimation is carried out via ordinary least squares without consideration of the decision objective, and the resulting model is then used to optimize decisions. When the regression model does not perfectly capture the relationships that generated the data, it is often beneficial to account for the decision objective in the estimation process.

This project has produced new approaches to linear regression and principal component analysis that treat model estimation and decision making in an integrated manner. Extensive computational results demonstrate the benefits to decision quality. The work on directed linear regression has appeared in the proceedings of the conference on Neural Information Processing Systems (NIPS), while the work on directed principal component analysis is reported in a paper under review for publication in the journal Operations Research. The latter work required mathematical and algorithmic developments, which as a by-product led to a new estimation method for factor models that improves on prior art for a broad range of pure estimation problems. This method is reported in the paper "Learning a Factor Model via Regularized PCA," which has appeared in the journal Machine Learning.

The body of work described above has produced tools that may benefit a wide variety of application domains in which large quantities of data are available and used to inform decisions. Examples include electronically mediated transaction systems (e.g., search engines, recommendation systems, pricing and revenue management), health care, smart electric grids, and financial markets. It also serves as a starting point for further research on models and methods for integrating estimation and decision making in value-generating ways.
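
The conventional two-stage practice described above (estimate by ordinary least squares, then optimize against the fitted model) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the reward function sin(3x), the quadratic feature map, the noise level, and the helper names are hypothetical and are not taken from the project. The sketch shows only the baseline pipeline and how decision quality can be measured as regret; it does not implement the directed regression method developed in this project.

```python
import numpy as np


def true_reward(x):
    # Hypothetical data-generating relationship (an assumption, not from the project).
    return np.sin(3.0 * x)


def features(x):
    # Deliberately misspecified model: a quadratic cannot represent sin(3x) exactly.
    x = np.asarray(x, dtype=float)
    return np.column_stack([np.ones_like(x), x, x ** 2])


def fit_ols(x, y):
    # Stage 1: ordinary least squares, with no knowledge of the decision objective.
    coef, *_ = np.linalg.lstsq(features(x), y, rcond=None)
    return coef


def best_decision(coef, grid):
    # Stage 2: pick the candidate decision with the highest predicted reward.
    predicted = features(grid) @ coef
    return grid[np.argmax(predicted)]


rng = np.random.default_rng(0)
x_hist = rng.uniform(0.0, 2.0, size=500)  # historical decisions
y_hist = true_reward(x_hist) + 0.1 * rng.standard_normal(500)  # noisy observed rewards

grid = np.linspace(0.0, 2.0, 201)  # candidate decisions
coef = fit_ols(x_hist, y_hist)
x_model = best_decision(coef, grid)  # decision chosen using the fitted model
x_star = grid[np.argmax(true_reward(grid))]  # best decision under the true reward

# Regret: true reward forgone by trusting the misspecified model.
regret = true_reward(x_star) - true_reward(x_model)
print(f"model decision={x_model:.2f}, best decision={x_star:.2f}, regret={regret:.3f}")
```

Regret is nonnegative by construction, since the best decision is computed on the same grid. Any gap between the two decisions comes from the combination of misspecification and a fitting criterion that weights all historical data equally rather than according to its relevance to the decision.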

Budget Start: 2010-05-01
Budget End: 2013-04-30
Fiscal Year: 2009
Total Cost: $330,000
Institution: Stanford University
City: Palo Alto
State: CA
Country: United States
Zip Code: 94304