New Methods to reduce Bias and Mean Square Error of Maximum Likelihood Estimators

Senchaudhuri, Pralay

Abstract

Categorical outcomes are ubiquitous in biomedical research, and generalized linear models (GLMs) are the most widely applied methodology for testing associations between categorical variables and fixed investigative factors. Logistic regression in particular is the most frequently used model for binary data and has widespread applicability in the health, behavioral, and physical sciences. King and Ryan (2002) reported that 2,770 research papers published in 1999 had "logistic regression" in the title or among the keywords. King and Zeng (2001) referred to maximum likelihood estimation in logistic regression as "the nearly universal method". Maximum likelihood estimates (MLEs) for logistic regression rely on large-sample approximations, which are reliable when the sample is large and the proportion of responses is neither too small nor too large. However, it has been known for several years that MLEs are unreliable for small, sparse, or unbalanced datasets, where "unbalanced" means that the numbers of zeros and ones in the response variable differ considerably. Recent research has suggested a flexible way to correct MLE bias and improve performance using a penalized likelihood approach, but the underlying theory has not been fully applied and implemented for practical use. In this project, we will extend the work on logistic regression begun in Phase I by (1) implementing the bias-correction approach for a variety of other GLMs, including Poisson, multinomial, and negative binomial regression, and for censored survival data; (2) providing new diagnostic procedures that identify potential problems with near separability and MLE bias; (3) implementing and evaluating an exact target estimation approach for bias correction in logistic regression; (4) improving the computational algorithms required for Aims 1-3; and (5) implementing the procedures in a SAS PROC.
Given the ubiquity of categorical regression in public health and biomedical research, the final product of this effort will provide a critical intermediate alternative when analyzing data for which standard large-sample methods are unreliable and small-sample exact methods are infeasible.
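The abstract does not name the penalized-likelihood method it builds on. One well-known bias correction of this kind is Firth's (1993) modified score for logistic regression, which adds h_i(1/2 - π_i) to each observation's residual y_i - π_i, where π_i is the fitted probability and h_i is the i-th diagonal element of the weighted hat matrix. The following is a minimal pure-Python sketch of that technique, assuming it is representative of the approach described; it is not the project's own implementation, and the names `firth_logistic`, `_solve`, and `_inverse` are hypothetical. It uses the common iteration β ← β + (X'WX)^{-1} U*(β) without step-halving, so it is only meant for small, well-conditioned illustrations.

```python
import math


def _solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting (small systems only)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def _inverse(a):
    """Invert a small square matrix one column at a time."""
    n = len(a)
    cols = [_solve(a, [1.0 if i == j else 0.0 for i in range(n)]) for j in range(n)]
    return [[cols[j][i] for j in range(n)] for i in range(n)]


def firth_logistic(X, y, firth=True, max_iter=100, tol=1e-8):
    """Logistic regression by Fisher scoring; firth=True uses Firth's modified score.

    X is a list of rows (include a column of 1s for the intercept); y is a list of 0/1.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(max_iter):
        pi = [1.0 / (1.0 + math.exp(-sum(a * b for a, b in zip(row, beta)))) for row in X]
        w = [q * (1.0 - q) for q in pi]
        # Fisher information X'WX and its inverse
        info = [[sum(w[i] * X[i][j] * X[i][k] for i in range(n)) for k in range(p)]
                for j in range(p)]
        inv = _inverse(info)
        # Diagonal of the weighted hat matrix: h_i = w_i * x_i' (X'WX)^{-1} x_i
        h = [w[i] * sum(X[i][j] * inv[j][k] * X[i][k] for j in range(p) for k in range(p))
             for i in range(n)]
        # Score residuals; Firth's correction adds h_i * (1/2 - pi_i)
        resid = [y[i] - pi[i] + (h[i] * (0.5 - pi[i]) if firth else 0.0) for i in range(n)]
        score = [sum(resid[i] * X[i][j] for i in range(n)) for j in range(p)]
        step = [sum(inv[j][k] * score[k] for k in range(p)) for j in range(p)]
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


if __name__ == "__main__":
    # Completely separated toy data (hypothetical): x = 1 exactly when y = 1
    X = [[1.0, 0.0] for _ in range(3)] + [[1.0, 1.0] for _ in range(3)]
    y = [0, 0, 0, 1, 1, 1]
    print("MLE after 25 iterations:", firth_logistic(X, y, firth=False, max_iter=25))
    print("Firth estimate:", firth_logistic(X, y))
```

In this toy example the data are completely separated, so the ordinary MLE slope keeps growing as Fisher scoring continues, whereas the Firth estimate stays finite. With a single binary covariate, Firth's estimate coincides with adding 1/2 to each cell of the 2×2 table, so the fitted intercept should be log(1/7) ≈ -1.946 and the slope 2 log 7 ≈ 3.892.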

Public Health Relevance

Generalized linear models for categorical data (such as logistic regression) have widespread applicability in the health sciences. Maximum likelihood, the nearly universal method for computing estimates in generalized linear regression models, is known to have high bias and mean square error for small, sparse, or unbalanced datasets. We propose to develop commercial software that incorporates several new methods with lower bias and mean square error in logistic regression, other generalized linear models, and Cox proportional hazards models.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Institute of General Medical Sciences (NIGMS)
Type: Small Business Innovation Research Grants (SBIR) - Phase II (R44)
Project #: 9R44GM104597-02A1
Application #: 8394896
Study Section: Special Emphasis Panel (ZRG1-HDM-R (11))
Program Officer: Swain, Amy L

Project Start: 2009-07-01
Project End: 2014-07-31
Budget Start: 2012-09-01
Budget End: 2013-07-31
Support Year: 2
Fiscal Year: 2012
Total Cost: $481,981

Institution

Name: Cytel, Inc
DUNS #: 183012277

City: Cambridge
State: MA
Country: United States
Zip Code: 02139

Related projects


NIH 2013 R44 GM	New Methods to reduce Bias and Mean Square Error of Maximum Likelihood Estimators Senchaudhuri, Pralay / Cytel, Inc	$514,659
NIH 2012 R44 GM	New Methods to reduce Bias and Mean Square Error of Maximum Likelihood Estimators Senchaudhuri, Pralay / Cytel, Inc	$481,981
