The goal of this project is to analyze some existing statistical methods in public health, import relevant ones from other fields, and develop new ones to facilitate the analysis of available data. Opportunities appear ripe in at least three areas.

First, this project will develop models to ensure that estimates of epidemiological quantities are consistent within diseases, across measures of incidence, prevalence, mortality, and disability, as well as across diseases. A statistical model will be added to the existing approach, which is based solely on deterministic algorithms. Compositional data models will also be used to ensure logical consistency among the variables that measure deaths by cause, so that cause-specific estimates cohere with one another and with the total.

Second, this study will examine logistic regression, the most commonly used method in epidemiology and much of public health. Although this appears to be unknown in the applied literature, when samples contain fewer than about 2,000 observations and there are more zeros than ones, logistic regression estimates are biased in predictable directions, and the bias is correctable. The bias is large enough to change the substantive conclusions drawn from an analysis. Monte Carlo, analytical, and empirical approaches to the problem are proposed. More sophisticated models for binary dependent variables will also be considered, since such models have been shown in other fields to perform far better than logistic regression.

The final component of the study will extend methods for ecological inference (the estimation of individual-level relationships when only aggregate data are available) to the types of data and problems common in public health. This methodology is intended to improve estimates of health characteristics in population subgroups; for example, ecological inference may be used to compare health status in urban and rural populations where such data are not directly available.
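To make the ecological inference problem concrete, the sketch below computes the classical deterministic "method of bounds" (Duncan and Davis), a common starting point for ecological inference, for the urban/rural example. It is an illustration under stated assumptions, not the methodology this project will develop. It assumes each area reports only `x`, the fraction of its population that is rural, and `t`, the fraction of its whole population in poor health. The accounting identity `t = b*x + w*(1 - x)`, with unknown rural rate `b` and urban rate `w` both in [0, 1], then confines `b` to an interval. The names `Bounds`, `group_rate_bounds`, and `aggregate_bounds` are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Bounds:
    lower: float
    upper: float


def group_rate_bounds(x: float, t: float) -> Bounds:
    """Bounds on the outcome rate b inside a subgroup of one area.

    x: fraction of the area's population in the subgroup (e.g. rural), 0 < x <= 1.
    t: fraction of the area's whole population with the outcome, 0 <= t <= 1.
    Solving t = b*x + w*(1 - x) for b and letting the other group's rate w
    range over [0, 1] gives b in [(t - (1 - x)) / x, t / x], clipped to [0, 1].
    """
    if not 0.0 < x <= 1.0:
        raise ValueError("x must lie in (0, 1]")
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    lower = max(0.0, (t - (1.0 - x)) / x)
    upper = min(1.0, t / x)
    return Bounds(lower, upper)


def aggregate_bounds(areas: Iterable[Tuple[float, float, float]]) -> Bounds:
    """Bounds on the population-wide subgroup rate from (population, x, t) tuples.

    The overall subgroup rate is a mean of area-level rates weighted by the
    subgroup head count (population * x), so its bounds are the same weighted
    means of the area-level bounds.
    """
    total = lo = hi = 0.0
    for population, x, t in areas:
        b = group_rate_bounds(x, t)
        weight = population * x  # number of subgroup members in this area
        total += weight
        lo += weight * b.lower
        hi += weight * b.upper
    if total == 0.0:
        raise ValueError("no subgroup population in the supplied areas")
    return Bounds(lo / total, hi / total)
```

For instance, an area that is 80% rural with 90% of residents in poor health bounds the rural rate to roughly [0.875, 1.0], while an area that is 40% rural with 30% in poor health only bounds it to [0, 0.75]. Bounds are tight where the subgroup dominates and wide elsewhere, which is why ecological inference methods typically combine them with a statistical model across areas rather than relying on bounds alone.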