Predictive models that generate estimated probabilities for medical outcomes have become widely used in health services research, in health policy, and, increasingly, in the assessment of health care and in real-time decision support. Logistic regression models for medical events are central to most probabilistic predictive clinical decision aids and are fundamental to comparative analyses of medical care based on risk-adjusted events. In such applications, inaccurate assessment of patient risk can have significant health care and health policy implications. Newer computer-based modeling techniques, including generalized additive models, classification trees, and neural networks, can potentially capture information that regression methods miss or misrepresent. However, these methods use very local information in model construction and may be overfit to the sample data, and thus may not transport well to new settings. In years 1-3, we investigated the relative accuracy of predictions made by these modeling methods under a variety of data structures, including the presence of outliers and missing data. For many of these data structures, we found that the more "local" procedures frequently did not generalize to new test data as well as traditional regression methods. However, our results suggest that as sample size and data complexity increase, the performance of these procedures may improve substantially. Thus, to test these findings under more general conditions, we now propose two additional years of research to 1) rigorously assess the relative predictive performance and transportability of other innovative modeling methods and of original hybrid model construction methods; 2) systematically investigate the relative predictive performance and model transportability of modeling methods applied to large and complex data structures; and 3) explore and assess procedures for handling outliers and missing data for classification trees and neural networks. Completion of the proposed work will yield the first systematic exploration of the factors affecting the predictive performance of the major modeling methods used to predict medical outcomes, and of the comparative performance of models constructed by these methods on the extremely large data sets that are becoming increasingly available to researchers.
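The core comparison described above, fitting a traditional regression model and a more "local" method on development data and then checking how well each transports to held-out test data, can be sketched as follows. This is a minimal illustration only, not the study's protocol: it assumes scikit-learn, a simulated binary outcome, and AUC as the performance measure, all of which are illustrative choices rather than details from the proposal.

```python
# Minimal sketch (illustrative, not from the proposal) of comparing the
# out-of-sample ("transportability") performance of logistic regression
# against a more "local" method such as a classification tree.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import roc_auc_score

# Simulated binary medical outcome with a mix of informative and noisy predictors.
X, y = make_classification(n_samples=5000, n_features=20,
                           n_informative=8, random_state=0)

# Hold out a "new setting" test sample to gauge how well each model transports.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "classification tree": DecisionTreeClassifier(min_samples_leaf=20,
                                                  random_state=0),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    # Compare discrimination on training vs. held-out data; a large gap
    # suggests the model is overfit to the development sample.
    auc_train = roc_auc_score(y_train, model.predict_proba(X_train)[:, 1])
    auc_test = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    print(f"{name}: train AUC={auc_train:.3f}, test AUC={auc_test:.3f}")
```

A method whose test AUC falls well below its training AUC is exploiting sample-specific structure; comparing that gap across methods and data structures is one way to operationalize the "transportability" question posed in the aims.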

Agency: National Institutes of Health (NIH)
Institute: National Library of Medicine (NLM)
Type: Research Project (R01)
Project #: 5R01LM005607-05
Application #: 6185210
Study Section: Biomedical Library and Informatics Review Committee (BLR)
Program Officer: Florance, Valerie
Project Start: 1995-01-01
Project End: 2002-08-31
Budget Start: 2000-09-01
Budget End: 2002-08-31
Support Year: 5
Fiscal Year: 2000
Total Cost: $270,618
Indirect Cost:
Name: Tufts University
Department:
Type:
DUNS #: 079532263
City: Boston
State: MA
Country: United States
Zip Code: 02111