This research focuses on methods of classification in multivariate statistics, studying theoretical, practical, and computational aspects. The investigator explores finite mixture models with an "improper" component, i.e., a component in which each data point has the same density value. This leads to a flexible class of estimators that depend on the setting of a tuning parameter. At the two extreme values of the tuning parameter, the resulting methods of estimation correspond to fully parametric maximum likelihood and to nonparametric likelihood (the empirical distribution function), respectively. The estimators are useful for contaminated data, and their performance is compared with that of traditional robust estimators. The main computational tool is an application of the EM algorithm. In another problem related to finite mixture analysis, the investigator studies the question of dimensionality: if interest focuses on a subset of the variables measured, should one use only that subset for estimating the parameters of the mixture, or should one use the remaining variables ("covariates") as well? In addition, this research develops the asymptotic distribution theory for maximum likelihood estimators in multivariate models that are usually regarded as intractable by conventional methods (common canonical variates, partial common principal components, the discrimination subspace model, and others), and investigates the iterative computational methods needed for estimating their parameters.
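The idea of a mixture with an "improper" component fitted by EM can be illustrated with a minimal sketch. It is not the investigator's implementation: it assumes a single univariate normal component, and it assumes that the tuning parameter is the constant density value `c` of the improper component, held fixed during fitting. The names `normal_pdf` and `em_improper_mixture`, the starting value for `pi`, and the toy data are hypothetical.

```python
import math


def normal_pdf(x, mu, sigma2):
    """Density of a univariate normal with mean mu and variance sigma2."""
    return math.exp(-0.5 * (x - mu) ** 2 / sigma2) / math.sqrt(2.0 * math.pi * sigma2)


def em_improper_mixture(data, c, n_iter=200):
    """EM sketch for the density pi * N(mu, sigma2) + (1 - pi) * c.

    Assumption: c >= 0 is a fixed constant (the tuning parameter here);
    only pi, mu and sigma2 are updated.
    """
    n = len(data)
    # Start from the ordinary sample mean and (ML) variance.
    mu = sum(data) / n
    sigma2 = sum((x - mu) ** 2 for x in data) / n
    pi = 0.9  # hypothetical starting value for the mixing proportion
    w = [1.0] * n
    for _ in range(n_iter):
        # E-step: posterior probability that each point belongs
        # to the normal (proper) component.
        w = []
        for x in data:
            num = pi * normal_pdf(x, mu, sigma2)
            w.append(num / (num + (1.0 - pi) * c))
        # M-step: weighted updates of the mixing proportion, mean and variance.
        sw = sum(w)
        pi = sw / n
        mu = sum(wi * x for wi, x in zip(w, data)) / sw
        sigma2 = sum(wi * (x - mu) ** 2 for wi, x in zip(w, data)) / sw
    return pi, mu, sigma2, w


# Toy data: ten well-behaved points and one gross outlier.
data = [-1.2, -0.5, 0.1, 0.3, 0.8, 1.1, -0.3, 0.6, -0.9, 0.2, 25.0]
pi_hat, mu_hat, sigma2_hat, weights = em_improper_mixture(data, c=0.01)
```

In this sketch, the outlier receives a weight close to zero when `c` is positive, so it has almost no influence on the estimated mean and variance. Setting `c = 0` gives every point weight one and reproduces the ordinary maximum likelihood estimates, which is consistent with one of the two extremes described above.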

Methods of classification play an increasingly important role in areas such as remote sensing, pattern and speech recognition, and taxonomy. The investigator studies methods of estimation and computation in multivariate situations, i.e., when many variables are measured on the same objects. In particular, the finite mixture model used in this research makes it possible to improve statistical methodology in situations where the data are distorted by errors and outliers. Efficient computational methods developed in this research make this methodology usable in practice and applicable to problems in many areas, including biotechnology and environmental sciences. Further research topics involve models of allometric growth in biology, improved estimation in unsupervised methods of classification, the development of efficient computational algorithms for recently created multivariate methods of data analysis, and the analysis of periodic phenomena in biology.

Agency: National Science Foundation (NSF)
Institute: Division of Mathematical Sciences (DMS)
Type: Standard Grant (Standard)
Application #: 9802522
Program Officer: Joseph M. Rosenblatt
Project Start:
Project End:
Budget Start: 1998-08-01
Budget End: 2001-07-31
Support Year:
Fiscal Year: 1998
Total Cost: $64,205
Indirect Cost:

Name: Indiana University
Department:
Type:
DUNS #:
City: Bloomington
State: IN
Country: United States
Zip Code: 47401