The proposed research project develops models for multivariate categorical data by mimicking Gaussian models whose desired structure can be captured by the non-parametric concept of conditional independence. This approach has a long history: graphical log-linear models, for example, can be induced in this way by Gaussian models defined by zero constraints on the inverse covariance matrix. The project seeks to greatly extend the scope of the approach. It is proposed to define and study marginal independence models for contingency tables, discrete-valued time series with a moving-average-like dependence structure, seemingly unrelated regressions with discrete response variables, and discrete graphical models based on the recently introduced AMP chain graphs and ancestral graphs. The main objectives of the study are the development of parameterizations, the construction and implementation of efficient algorithms for maximum likelihood estimation, and the investigation of procedures for model selection. The project will place particular emphasis on employing modern tools from computational algebra to analyze the structure of parameter spaces and the properties of likelihood functions.
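The correspondence between Gaussian and discrete models mentioned above can be illustrated with a small sketch, assuming three binary variables and hypothetical parameter values chosen only for illustration. In the Gaussian case, a zero entry in the inverse covariance (concentration) matrix K makes the corresponding partial correlation vanish. In the discrete analogue, omitting the X1 by X2 interaction term from a log-linear model makes the odds ratio between X1 and X2 equal to one within every level of X3, so X1 and X2 are conditionally independent given X3. The function names and numbers below are illustrative and do not represent the project's own methods.

```python
import itertools
import math

# Gaussian side: hypothetical positive definite concentration matrix K
# with K[0][1] == 0, i.e. no edge between X1 and X2.
K = [[2.0, 0.0, 0.7],
     [0.0, 1.5, -0.4],
     [0.7, -0.4, 1.8]]

def partial_correlation(K, i, j):
    """Partial correlation of X_i and X_j given all other variables."""
    return -K[i][j] / math.sqrt(K[i][i] * K[j][j])

# Discrete side: hypothetical log-linear parameters for binary X1, X2, X3.
MAIN = {0: 0.3, 1: -0.2, 2: 0.5}      # main effects
PAIR = {(0, 2): 0.8, (1, 2): -0.6}    # no (0, 1) term mirrors K[0][1] == 0

def loglinear_table(main, pair):
    """Joint probabilities p(x1, x2, x3) of a log-linear model on {0, 1}^3."""
    weights = {}
    for x in itertools.product((0, 1), repeat=3):
        log_w = sum(theta * x[i] for i, theta in main.items())
        log_w += sum(theta * x[i] * x[j] for (i, j), theta in pair.items())
        weights[x] = math.exp(log_w)
    total = sum(weights.values())
    return {x: w / total for x, w in weights.items()}

def conditional_odds_ratio(p, x3):
    """Odds ratio between X1 and X2 within the slice X3 = x3."""
    return (p[(1, 1, x3)] * p[(0, 0, x3)]) / (p[(1, 0, x3)] * p[(0, 1, x3)])

p = loglinear_table(MAIN, PAIR)
for x3 in (0, 1):
    # Equal to 1 up to rounding: X1 and X2 are independent given X3.
    print(x3, conditional_odds_ratio(p, x3))
```

Adding a term such as `(0, 1): theta` to `PAIR` restores the X1 by X2 edge, and the conditional odds ratio then becomes `exp(theta)` in both slices.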
Multivariate statistical models seek to describe the complex relationships among a large set of variables. A particular class of such models, called graphical models, has found widespread application in fields such as artificial intelligence, bioinformatics, biology, epidemiology, and speech recognition. The models proposed in the project extend the realm of graphical models, and it is anticipated that they will be applied in many of these fields. Moreover, the proposed methodology will provide new tools for the analysis of data of public interest, such as census data. The researchers also plan to make their software tools freely available as part of R, a larger open-source statistical software system.