This project is concerned with statistical models whose parameter spaces have singularities. The investigator studies how singularities affect the behavior of existing statistical methods and develops new techniques for adequate assessment of statistical significance. The focus is on algebraic statistical models, that is, models that have (semi-)algebraic sets as parameter spaces. The class of algebraic models comprises many of the singular models employed in practice and can be studied using tools from computational algebraic geometry. Importantly, the well-behaved local geometry of semi-algebraic sets makes it possible to obtain general results without having to assume regularity conditions that are difficult to verify. The statistical techniques under study include classical procedures from likelihood inference, such as likelihood ratio and Wald tests, as well as information criteria.
Modern scientific studies often require analysis of data on several jointly observed variables. Statistical models of dependence relationships among the different variables are often formulated using additional variables that are not observable (or hidden). A common feature of hidden variable models is that their statistical properties are not entirely understood, because they lack smoothness properties and are therefore irregular. This is the primary motivation for this project, which develops theory and methods that bear on problems such as determining the number and type of unobserved variables to be included in a statistical model. Such problems arise in particular in the social sciences, where key concepts such as intelligence are not directly observable, and in computational biology, where hidden variables are employed, for example, when DNA of present-day species is used to validate evolutionary theories that involve extinct species. More broadly, the work is relevant for any study, medical or otherwise, in which the existence of influential unobserved variables cannot be excluded.
Modern scientific studies often require analysis of multivariate data that concern several jointly observed variables. Many statistical models for the dependence relationships among the different variables are formulated using additional variables that are not observable, that is, hidden or latent. Examples include the factor analysis and structural equation models that form key methodology in the social sciences. A common feature of such hidden variable models is that their statistical properties are not entirely understood because the models' parameter spaces lack the smoothness properties that underlie classical results. This project aimed to shed new light on the properties of singular statistical models by exploiting the algebraic structure that is inherent in many of these models. Particular attention was given to models that fall within the general framework of so-called graphical modeling. The outcomes of the project include results on model selection and on the behavior of statistical tests in large-sample limits. Moreover, new state-of-the-art criteria were developed for identifiability and, thus, estimability of parameters. The research led to collaborations between statisticians and algebraists. Several Ph.D. students as well as one undergraduate student were involved in specific parts of the project. The outcomes were disseminated through publications in statistical journals, a monograph and an extension package for the R project for statistical computing.