Traditional approaches to pattern recognition require access to labeled training data, consisting of known instances of each class under consideration. Such methods are known as "supervised" because the training labels are assumed to be correct. In many pattern recognition applications, however, precise label information is difficult or impossible to obtain. This research examines classification tasks involving contaminated training data, wherein the training examples for some or all classes of interest are mixed with examples of other classes. Applications include document classification, nuclear nonproliferation, network intrusion detection, drug design, and image and video annotation. When standard classification algorithms are applied in these settings, suboptimal classifiers result. Unfortunately, no satisfactory theoretical or methodological framework currently addresses these contaminated-data classification problems in a unified way.

To address this shortcoming, this research develops a novel framework for the decontamination of mutually contaminated probability distributions, together with associated estimation and classification methods. The decontamination strategy involves projecting each observed distribution onto the convex hull of the other observed distributions, and recovering the true distribution of interest from the residual of this projection. The projection is taken with respect to a statistical distance known as the separation distance. Under suitable conditions on the amount of contamination and on the purity of the underlying class distributions, this procedure recovers the class distributions, which in turn facilitates the design of optimal classifiers. The project examines in detail the problems of classification with noisy labels, anomaly detection, crowdsourcing, semi-supervised learning, domain adaptation, transfer learning, multiple instance learning, and learning from partial labels.
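The projection-and-residual strategy described above can be illustrated with a minimal sketch. It is not the project's method; it assumes a simplified two-class setting on a finite sample space, where the convex hull of the "other" observed distributions reduces to a single distribution. The contamination model, the function names (`max_mixture_proportion`, `residual`), and all numeric values are hypothetical choices made for illustration. For discrete distributions f and h, the separation distance is 1 - min_x f(x)/h(x), so the projection amounts to finding the largest weight kappa for which f - kappa*h remains nonnegative.

```python
import numpy as np


def max_mixture_proportion(f, h):
    """Largest kappa such that f - kappa * h >= 0 entrywise.

    Equals 1 minus the separation distance between f and h on a
    finite sample space (hypothetical helper, for illustration only).
    """
    support = h > 0
    return float(np.min(f[support] / h[support]))


def residual(f, h):
    """Remove as much of h from f as possible and renormalize the remainder."""
    kappa = max_mixture_proportion(f, h)
    if kappa >= 1.0:
        raise ValueError("f and h coincide; nothing is left after projection")
    rest = np.clip(f - kappa * h, 0.0, None)  # guard against tiny negative round-off
    return rest / (1.0 - kappa), kappa


# Assumed true class distributions. Each puts zero mass on a point where the
# other has positive mass, a simple form of the "purity" condition.
p0 = np.array([0.5, 0.3, 0.2, 0.0])
p1 = np.array([0.0, 0.2, 0.3, 0.5])

# Assumed contamination proportions, with pi0 + pi1 < 1.
pi0, pi1 = 0.2, 0.3

# Observed (mutually contaminated) distributions.
obs0 = (1 - pi0) * p0 + pi0 * p1
obs1 = (1 - pi1) * p1 + pi1 * p0

# Decontaminate each observed distribution against the other.
rec0, kappa0 = residual(obs0, obs1)
rec1, kappa1 = residual(obs1, obs0)

print("recovered class 0:", np.round(rec0, 4))
print("recovered class 1:", np.round(rec1, 4))
```

In this toy setting the residuals coincide with the assumed class distributions, and the weights satisfy kappa0 = pi0 / (1 - pi1) and kappa1 = pi1 / (1 - pi0). With more than two classes, or with distributions estimated from samples, the projection becomes an optimization over the convex hull and would need an estimator in place of the exact ratio used here.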

Project Start:
Project End:
Budget Start: 2014-08-01
Budget End: 2018-07-31
Support Year:
Fiscal Year: 2014
Total Cost: $498,210
Indirect Cost:
Name: Regents of the University of Michigan - Ann Arbor
Department:
Type:
DUNS #:
City: Ann Arbor
State: MI
Country: United States
Zip Code: 48109