The overarching goal of the proposed research is to develop practical modeling tools - including exact regression procedures - for small or sparse samples of correlated categorical data. Such outcomes are common in biomedical research, especially in areas such as genetics, ophthalmology, and teratology. Correlated categorical data arise wherever multiple outcomes are measured on an individual over time, or on several different individuals who share common genetic or environmental exposures. A large body of methods has been developed for analyzing correlated categorical outcomes, but these methods conventionally rely on large-sample distributional assumptions (e.g., approximate normality) to justify their inferences. When faced with a small or sparse sample of categorical data, investigators have few viable analytic options, and none that allow for exact inferences with regard to estimation. Our proposed work will fill this gap, building on critical recent developments in both appropriate models and computational technology. During Phase I of this project, we will accomplish this by (1) developing an analogue to conditional logistic regression for correlated categorical data; (2) constructing an efficient network graphical algorithm for rapid computation of the exact distribution in Aim 1; and (3) investigating the feasibility of incorporating these procedures into a SAS PROC. We plan to expand this work in Phase II by incorporating our new tools as a module in the LogXact software package; extending the exact regression procedure to accommodate Poisson and polytomous regression for correlated data; and significantly improving the computational efficiency of these new tools through efficient Monte Carlo sampling and parallel processing. We will also create a module for a SAS PROC, making these methods as widely available as possible to researchers and analysts.
Many biomedical and public health studies make observations that are correlated or related (e.g., when individuals are measured repeatedly over time, or when subjects are sampled from the same family or group). When such samples are small, conventional statistical methods that account for this correlation may be inaccurate. This project will develop new software tools to help investigators more accurately analyze data from studies that involve small samples of correlated data.