This project is driven by recent applications of mathematical programming to machine learning, data mining, medical diagnosis and prognosis and include the following problems: FEATURE SELECTION: This is the problem of discriminating between two finite sets in n-dimensional feature space by a separating plane that utilizes as few of the features as possible. A step function that appears in the objective of a linearly-constrained mathematical programming reformulation is approximated by a concave exponential on the nonnegative real line. This leads to a very fast iterative linear-programming-based algorithm that terminates in a finite number of steps. Extensions to minimum- support solutions of linear systems and complementarity problems and neural networks with minimal number of hidden units will be investigated. CLUSTERING: The clustering problem of determining k centers in n- space such that the sum of distances of each of m given points to its nearest center is minimized. A 1-norm distance is used in order to formulate the problem as that of minimizing a piecewise- linear concave function on a polyhedral set. A finite successive linear approximation algorithm terminates very quickly at a stationary point which, for breast cancer medical databases, is extremely useful as a mining tool for distinct survival curves. Extensions to clustering around other geometric entities such as planes or convex sets, clustering with feature selection and applications to knowledge discovery in medical databases will be pursued. ROBUST REPRESENTATION: Relations within a database will be modeled in a manner that remains valid when the database changes moderately. A simple linear model shows that if a sufficiently small error is PURPOSELY TOLERATED in constructing the model, then for a broad class of perturbations the model will be a more accurate than one obtained by a conventional zero error tolerance. The effect of feature selection within tolerant training, as well its use to improve generalization in neural networks will be studied.

Agency
National Science Foundation (NSF)
Institute
Division of Computer and Communication Foundations (CCF)
Type
Standard Grant (Standard)
Application #
9729842
Program Officer
William Randolph Franklin
Project Start
Project End
Budget Start
1998-07-01
Budget End
2002-06-30
Support Year
Fiscal Year
1997
Total Cost
$236,134
Indirect Cost
Name
University of Wisconsin Madison
Department
Type
DUNS #
City
Madison
State
WI
Country
United States
Zip Code
53715