This project focuses on the development of data representation strategies when the available data are given in (or have been first reduced to) the form of numerically specified proximity information between the objects from a set under study. The representation methods to be emphasized all involve discrete structural models that have at their basis the requirement for identifying some optimal form of object grouping, partition, or object sequence. The fitting of these combinatorially based models can be approached through a general dynamic programming paradigm (GDPP) that allows the construction of recursive optimization routines to solve the wide range of combinatorial tasks that underlie the various representation structures of interest.

This research program will develop and extend the GDPP in four broad topic areas: (a) the placement and evaluation of the results obtained from the use of the GDPP within a statistical context; this will include defining the diagnostic influence of particular objects and the (lack of) uniqueness that may be present in the identification of a specific globally optimal result; (b) the incorporation of alternative measures of fit to be optimized; this will include alternatives to traditional least-squares loss functions and the incorporation of optimal scaling either at the level of the given proximity information, or earlier and before the proximity calculations are taken; (c) generalizations of the formal idea of an ultrametric which characterizes the structure fitted by any hierarchical clustering technique; this will include alternatives allowing partitions in a hierarchy to be less than perfectly nested, or that weaken in other ways the condition of hierarchical representation but which still allow convenient graphical representation; (d) evaluating heuristic extensions of the GDPP when faced with object sets too large to guarantee, within reasonable computational limits, absolute optimality for the identified structures; this will include comparisons to several other widely used heuristic combinatorial optimization approaches, such as iterative (local) improvement methods.

Agency
National Science Foundation (NSF)
Institute
Division of Social and Economic Sciences (SES)
Type
Standard Grant (Standard)
Application #
9814007
Program Officer
Cheryl L. Eavey
Project Start
Project End
Budget Start
2000-01-15
Budget End
2002-12-31
Support Year
Fiscal Year
1998
Total Cost
$130,000
Indirect Cost
Name
University of Illinois Urbana-Champaign
Department
Type
DUNS #
City
Champaign
State
IL
Country
United States
Zip Code
61820