The proposed project will develop and evaluate statistical methods for the analysis of multilevel data when the response is a binary attribute or a count of events. These methods are needed because many outcome variables of interest to social scientists and epidemiologists are binary or discrete counts, and because practically all social and health surveys rely on multistage clustered samples. Ignoring clustering not only leads to overly optimistic estimates of precision but can also induce serious biases in the parameter estimates themselves. Moreover, there is growing interest in understanding family and community effects on health outcomes; in such studies the hierarchical structure of the data is itself of primary interest.
The specific aims of this project include (1) developing and evaluating exact maximum likelihood procedures, based on numerical integration, for two- and three-level variance-component models, in which the level of the response varies across contexts; (2) developing and assessing improved approximate estimation procedures for more complex random-coefficient models, in which the effects of observed covariates, and not only the level of the response, vary across contexts, a setting where numerical integration has so far proved intractable; and (3) assessing the performance of recent Bayesian procedures that avoid numerical integration altogether by drawing samples from the posterior distribution of the parameters with Gibbs sampling. The study will use extensive Monte Carlo simulation to establish the properties of these estimators in realistic demographic and health settings. The computing tools developed as part of this project will be made freely available to the research community to allow routine estimation of community, family and other clustering effects on a variety of health outcomes.