Improving the accuracy and scope of categorical regression models would be invaluable to the alcohol-related and health care research communities. Epidemiological models with better statistical significance and predictive performance enhance the researcher's ability to identify new patterns of alcohol-related symptoms and improve the analysis of medical or psychiatric conditions in existing databases. Our Phase I theoretical research and empirical results demonstrated that substantial performance improvements in categorical regression models can be achieved by using maximum likelihood (ML) methods to improve data representation (recoding) schemes for user-selected explanatory variables (continuous or nominal). A new statistical test for comparing competing recoding strategies and robust methods for standard error estimation in categorical regression models were also developed and evaluated. During Phase II, Martingale Research will develop software that incorporates the Phase I findings and uses techniques from pattern recognition, classical statistics, and nonlinear optimization to exploit structural relationships between explanatory variables and categorical outcomes in a principled manner. Phase II research will also demonstrate that ML recoding yields new information about explanatory variables and improves the reliability and validity of categorical regression models. The advanced statistical software package will be delivered to NIH/NIAAA for use in epidemiological and health-related research.
Martingale Research Corporation intends to incorporate ML recoding for categorical regression analysis into a commercial software package. The software is designed to improve the overall performance of categorical models that explain data frequently encountered in the health care field. The approach applies equally to other industries that use categorical modeling for financial prediction, risk analysis, and information interpretation and management.