Longitudinal and cluster study designs with binary data arise frequently in epidemiologic studies [e.g., of cardiovascular disease (CVD) and acquired immune deficiency syndrome (AIDS)]. Longitudinal data are particularly useful for assessing change in health status over time and the factors that determine those changes; cluster designs are usually the only feasible way to gather large probability samples. However, analyses of longitudinal and cluster binary data frequently ignore the study design, using either standard methods, which incorrectly assume statistical independence, or cross-sectional methods, which do not take advantage of the longitudinal structure. There has been little comparison of statistical approaches for the analysis of longitudinal and clustered binary data. This research will examine the usefulness of existing statistical approaches by evaluating alternative methodologies on several CVD and AIDS data sets. These analyses will focus on the interpretation of the covariate effects measured by the different approaches. We will also develop statistical theory to relate the parameters of alternative approaches. Specifically, we propose to: 1) determine which of the existing approaches provide the most precise estimates of covariate effects; 2) investigate the sensitivity of each approach to violations of its underlying statistical assumptions; 3) develop new statistical methods that provide efficient estimates of covariate effects and are less sensitive (more robust) to violations of assumptions; and 4) develop new statistical methods to analyze data gathered under response-selective sampling plans, as well as multivariate time series with binary responses.
The research questions to be studied by this project have been motivated by statistical problems arising in studies involving the investigators, and they address many of the issues raised by the 1986 NHLBI workshop on the state of the art of methods for longitudinal data analysis. The results of this research will allow epidemiologists to take advantage of the benefits of longitudinal and cluster study designs, to directly estimate the effects of risk factor change on health status, and to avoid inappropriate analyses or incorrect interpretations. Findings will help to clarify the substantive questions that epidemiologists can address with each approach, as well as why the covariate effects measured by different approaches may differ. The results will provide clear guidelines on the advantages and disadvantages of alternative approaches for longitudinal or clustered binary data.
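As one illustration of why covariate effects measured by different approaches can differ (this example is not taken from the project itself), the sketch below compares a cluster-specific random-intercept logistic model with the population-averaged (marginal) log-odds ratio it implies. It assumes a random intercept b ~ N(0, σ²), averages the conditional probabilities over b by trapezoid-rule integration, and compares the result with the widely used approximation β_marginal ≈ β_cluster / √(1 + c²σ²), with c = 16√3/(15π). All function names are hypothetical and chosen only for illustration.

```python
import math


def expit(z: float) -> float:
    """Inverse logit, computed stably for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def marginal_prob(eta: float, sigma: float, n_grid: int = 4001) -> float:
    """Population-averaged P(Y=1): integral of expit(eta + b) * N(b; 0, sigma^2) db.

    Hypothetical helper; uses the trapezoid rule on [-8*sigma, 8*sigma].
    """
    if sigma == 0:
        return expit(eta)
    lo, hi = -8.0 * sigma, 8.0 * sigma
    h = (hi - lo) / (n_grid - 1)
    total = 0.0
    for i in range(n_grid):
        b = lo + i * h
        density = math.exp(-0.5 * (b / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))
        weight = 0.5 if i in (0, n_grid - 1) else 1.0  # trapezoid end weights
        total += weight * expit(eta + b) * density
    return total * h


def marginal_log_odds_ratio(beta0: float, beta1: float, sigma: float) -> float:
    """Marginal log-odds ratio for x=1 vs x=0 implied by the cluster-specific model
    logit P(Y=1 | x, b) = beta0 + beta1 * x + b."""
    p0 = marginal_prob(beta0, sigma)
    p1 = marginal_prob(beta0 + beta1, sigma)
    return logit(p1) - logit(p0)


def approx_marginal_beta(beta1: float, sigma: float) -> float:
    """Closed-form attenuation approximation (assumed, not from the source)."""
    c = 16.0 * math.sqrt(3.0) / (15.0 * math.pi)
    return beta1 / math.sqrt(1.0 + c ** 2 * sigma ** 2)


if __name__ == "__main__":
    for s in (0.0, 1.0, 2.0):
        print(s, marginal_log_odds_ratio(0.0, 1.0, s), approx_marginal_beta(1.0, s))
```

For example, with β0 = 0, β1 = 1 and σ = 2 the marginal log-odds ratio comes out noticeably smaller than the cluster-specific value of 1, and close to the approximation's value of about 0.65; with σ = 0 (no clustering) the two coincide. Neither number is "wrong": they answer different substantive questions about individuals within a cluster versus the population as a whole.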

National Institutes of Health (NIH)
National Institute of Allergy and Infectious Diseases (NIAID)
First Independent Research Support & Transition (FIRST) Awards (R29)
Study Section: Special Emphasis Panel (SSS (C))
University of California San Francisco
Schools of Medicine
San Francisco
United States
Segal, M R; Neuhaus, J M (1993) Robust inference for multivariate survival data. Stat Med 12:1019-31