Correlated data are very common in health studies. Such data arise from longitudinal studies, community panel surveys, genetic family studies, and spatial studies. Typically, linear mixed-effects models are used for continuous responses, and generalized linear mixed models are applied to non-Gaussian data. In addition to these likelihood approaches, quasi-likelihood methods based on generalized estimating equations (GEE) are often used when a distributional assumption is unrealistic or difficult to specify. We propose to extend these methods to handle situations in which a covariate effect is non-linear or is difficult to model parametrically. The approach is similar to generalized additive models, in which a smooth curve describes the effect of a covariate on a univariate outcome. The goal of this study is to develop statistical software for correlated data in two areas. The first is spline smoothing methods for generalized additive mixed models, which combine the smoothing-based semiparametric methods of generalized additive models with mixed-effects modeling for correlated data. The second is semiparametric GEE methods, which extend GEE for correlated data with kernel smoothing to model non-linear effects on health outcomes. The research includes statistical methods, algorithm development, and application to real health problems. The study requires analytic development of innovative semiparametric statistical methods and algorithm development for computationally intensive methods. Currently, there is no software for these areas.
The aim is to overcome this deficiency and extend the benefits of smoothing methods to the modeling of non-linear covariate effects. The result is a software package, SmoothEffect, for handling correlated data. A comprehensive case study guidebook using problems from longitudinal studies and other settings will accompany the software. Technical reports and simulation studies will also be developed. This study will develop flexible statistical smoothing methods and software for analyzing correlated or clustered data such as longitudinal data, panel surveys, or spatial data. The focus is on clustered data in which records from the same experimental unit are related and the effect of a predictor on a health outcome follows a non-linear smooth curve that is not easy to parameterize.