Most research questions in social science involve complex interactions between individual behaviors and their social contexts. Linear and generalized linear mixed effect models have been widely used to study these interactions. Social science applications of mixed models often start with a large number of candidate variables, and researchers select an appropriate model by assessing the significance of each variable. Variable selection is therefore an integral part of mixed effect modeling in social science applications. However, because of the large number of parameters, traditional variable selection procedures, such as those based on AIC and BIC, are computationally infeasible. Fan and Li (2001) proposed a class of variable selection procedures via nonconcave penalized likelihood, built on the smoothly clipped absolute deviation (SCAD) penalty. The SCAD penalty has an oracle property: with a properly chosen tuning parameter, the resulting estimator performs asymptotically as well as if the true submodel were known in advance. The investigators propose to extend the ideas of Fan and Li and to study variable selection procedures via SCAD for linear and generalized linear mixed effect models. The work contributes to the estimation and computation of mixed effect models and also adds to their theoretical understanding. The investigators plan to develop variable selection tools that allow applied researchers to select variables and estimate parameters simultaneously within the framework of mixed effect models.
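For readers unfamiliar with SCAD, the penalty of Fan and Li (2001) can be written down in a few lines. The following is a minimal illustrative sketch for a single scalar coefficient, not the investigators' procedure for mixed effect models. The function names `scad_penalty` and `scad_derivative` are hypothetical and chosen here for illustration; the default a = 3.7 is the value suggested by Fan and Li.

```python
def scad_penalty(theta, lam, a=3.7):
    """SCAD penalty p_lam(theta) of Fan and Li (2001) for one coefficient.

    Hypothetical helper for illustration; requires lam > 0 and a > 2.
    """
    if lam <= 0 or a <= 2:
        raise ValueError("requires lam > 0 and a > 2")
    t = abs(theta)
    if t <= lam:
        # Small coefficients: same as the L1 (lasso) penalty.
        return lam * t
    if t <= a * lam:
        # Intermediate coefficients: quadratic transition piece.
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    # Large coefficients: constant penalty, so no further shrinkage.
    return lam * lam * (a + 1) / 2


def scad_derivative(theta, lam, a=3.7):
    """Derivative of the SCAD penalty with respect to |theta|, for |theta| > 0."""
    t = abs(theta)
    if t <= lam:
        return lam
    return max(a * lam - t, 0.0) / (a - 1)
```

Because the penalty is flat beyond a·λ, large coefficients are left essentially unpenalized while small ones are shrunk toward zero as under an L1 penalty. This is the feature that lets selection and estimation happen in a single step.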
Measuring students' achievement and its determinants has been one of the central interests in educational research. In this area, data and research questions are usually hierarchical by nature. In the typical data structure seen in educational assessment, students are nested within schools; sometimes more levels are involved, such as schools nested within geographical areas. It is important to assess how factors such as the type of curriculum and the availability of resources affect individual students' academic performance. For example, is a school's financial program important? Does school policy on parental involvement play a critical role in students' assessment results? Such features of educational research have made hierarchical linear and generalized linear models the most important statistical tools in this field. Many such models start with a large array of explanatory variables (the National Assessment of Educational Progress (NAEP) has hundreds of variables at both the teacher and school levels), and researchers are particularly interested in finding the significant variables and estimating how important they are. The techniques developed in this proposal will identify which teacher-level and school-level variables are important and estimate the degree of their importance.