Data collected in many social sciences are often characterized by multilevel or nested structures in which a lower-level unit belongs to one and only one higher-level unit; for instance, students attend one and only one school, and patients are treated by one and only one health care provider. Conventional multilevel modeling for nested data is now well understood and frequently applied in different research areas. However, many data structures are multilevel but do not qualify as nested: students may attend more than one school, and patients may be treated by more than one health care provider. Cross-classified, multiple-membership (CCMM) modeling, a general statistical framework for modeling multilevel, nonnested data, was set forth by Browne, Goldstein, and Rasbash (2001). It has a wide range of potential applications in many research areas, including education, health research and epidemiology, sociology, and human genetics. Though applications of CCMM modeling have started to appear in the literature, its statistical aspects have not been investigated extensively, and practical experience with model building is still very limited. This dissertation research will evaluate the estimation performance of CCMM modeling and investigate the consequences of ignoring multilevel, nonnested data structures using both real data analyses and Monte Carlo simulation. CCMM modeling will be applied to the Early Childhood Longitudinal Study Kindergarten Cohort (ECLS-K) data to model reading and mathematics growth from kindergarten to fifth grade after incorporating student mobility. Guided by the real data analyses, a comprehensive Monte Carlo simulation will be conducted to evaluate the estimation performance of CCMM modeling and the consequences of ignoring CCMM data structures under manipulated data conditions that emulate real data structures. User-friendly computer code and step-by-step tutorials will be written to facilitate the use of CCMM modeling in applied research.
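
For orientation, the multiple-membership part of such a model is commonly written in a form like the one below. This is a generic sketch, not the specification used in this project: the symbols (y_i, x_i, beta, u_j, w_{ij}, S(i)) are notation assumed here for illustration, with S(i) denoting the set of schools that student i attended.

```latex
% Generic two-level multiple-membership model (illustrative notation only)
\begin{aligned}
y_i &= \mathbf{x}_i^{\top}\boldsymbol{\beta}
       + \sum_{j \in S(i)} w_{ij}\, u_j + e_i, \\
u_j &\sim N(0, \sigma_u^2), \qquad e_i \sim N(0, \sigma_e^2), \\
\sum_{j \in S(i)} w_{ij} &= 1 \quad \text{for every student } i .
\end{aligned}
```

When every student attends exactly one school, S(i) contains a single school with weight 1 and the expression reduces to the conventional nested random-intercept model.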

This research includes not only a state-of-the-art review of advanced statistical modeling for complex data structures and their applications, but also the first systematic investigation of the statistical performance of CCMM modeling for multilevel nonnested data using Bayesian estimation. It will demonstrate the flexibility of CCMM modeling in analyzing multilevel nonnested data and advance scientific knowledge regarding appropriate modeling of complex data in real research settings. The real data analyses with ECLS-K will show the applicability of CCMM modeling in applied research and help educators and researchers better understand how student mobility affects early reading and mathematics development. The Monte Carlo simulation study will provide evidence regarding the statistical performance of CCMM modeling and methodological guidance on CCMM model building for the research community, which eventually will facilitate translating innovative quantitative research methods into rigorous applied research in the social sciences and beyond. As a Doctoral Dissertation Research Improvement award, this grant provides support to enable a promising student to establish a strong, independent research career.

Project Report

Multilevel data structures are common in all realms of the social sciences as well as many other types of research. The development of powerful statistical packages has made multilevel data analysis more accessible to researchers. Conventional multilevel models for purely nested data structures are now well understood by the research community and frequently used in applied research. However, as applications of multilevel modeling increase, researchers have come to realize that data collected in many social settings are multilevel but often do not qualify as purely nested structures. Therefore, there is a strong need for statistical models that can appropriately handle complex multilevel nonnested data structures. In the context of student mobility in educational research, this doctoral dissertation research proposed a reparameterized multiple membership model for multilevel nonnested longitudinal data. The proposed model assumes that school effects on student growth rates are cumulative over time and, therefore, weights the effects of all schools attended by mobile students when modeling student growth rates. Its assumption is more tenable than that of the existing cross-classified approach to student mobility, and it is more parsimonious than the existing cross-classified multiple membership approach to student mobility. In addition to student mobility, the proposed model can potentially be applied to other research situations that involve changing higher-order membership over time. Three studies were conducted to evaluate the statistical performance of the proposed reparameterized multiple membership model. In the first study, the Early Childhood Longitudinal Study Kindergarten Cohort data were analyzed to compare different approaches to modeling student mobility and achievement growth in longitudinal studies. Guided by the real data analyses, two Monte Carlo simulation studies were then conducted to evaluate the estimation performance of the proposed model under simulation conditions that emulate student mobility in educational research. Results indicated that Bayesian Markov chain Monte Carlo (MCMC) estimation can successfully recover the parameters of the reparameterized multiple membership model under the simulation conditions examined. The intellectual merit of this doctoral dissertation research is twofold: (1) it provides a comprehensive review of advanced statistical modeling for complex multilevel data structures and their applications, and (2) it proposes a reparameterized multiple membership model, a flexible and parsimonious approach to analyzing multilevel nonnested longitudinal data. This research not only focused on statistical aspects of modeling multilevel nonnested data, but also provided the research community with empirical examples and methodological instruction. It will have a broader impact by advocating advanced statistical modeling and translating innovative quantitative research methodology into rigorous applied research in the social sciences and beyond.
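
The idea of weighting the effects of all schools a mobile student attended can be illustrated with a small sketch. The report does not state how the weights were chosen, so the example below assumes, purely for illustration, that each school's weight is the share of measurement waves the student spent there. The function names, student and school labels, and effect values are all hypothetical.

```python
from collections import Counter


def membership_weights(attendance):
    """Return per-student school weights that sum to 1.

    attendance maps a student id to the list of schools attended,
    one entry per measurement wave. ASSUMPTION: weights are proportional
    to the number of waves spent in each school; the source does not
    specify the weighting scheme.
    """
    weights = {}
    for student, schools in attendance.items():
        counts = Counter(schools)
        total = len(schools)
        weights[student] = {school: n / total for school, n in counts.items()}
    return weights


def weighted_school_effect(student_weights, school_effects):
    """Combine school effects into one value using the student's weights."""
    return sum(w * school_effects[school] for school, w in student_weights.items())


# Hypothetical attendance histories over four waves.
attendance = {
    "stayer": ["A", "A", "A", "A"],   # never moved
    "mover": ["A", "B", "B", "C"],    # attended three schools
}
# Hypothetical school effects on growth rate.
school_effects = {"A": 0.2, "B": -0.1, "C": 0.4}

weights = membership_weights(attendance)
for student, w in weights.items():
    print(student, w, weighted_school_effect(w, school_effects))
```

A student who never moves gets a single weight of 1, so the weighted effect is just that school's effect, which matches the purely nested case.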

Agency: National Science Foundation (NSF)
Institute: Division of Social and Economic Sciences (SES)
Type: Standard Grant (Standard)
Application #: 1154165
Program Officer: Cheryl Eavey
Project Start: (not listed)
Project End: (not listed)
Budget Start: 2012-05-15
Budget End: 2013-04-30
Support Year: (not listed)
Fiscal Year: 2011
Total Cost: $1,500
Indirect Cost: (not listed)
Name: University of Cincinnati
Department: (not listed)
Type: (not listed)
DUNS #: (not listed)
City: Cincinnati
State: OH
Country: United States
Zip Code: 45221