Supervised machine learning approaches to building predictive models from data traditionally rely on labeled samples. However, in many real-world applications, samples are either unlabeled or imprecisely labeled, i.e., labels are often ambiguous, conflicting, or incomplete. This presents the problem of learning predictive models under imprecise supervision.
This project aims to develop effective algorithms to address three different scenarios that lead to imprecise supervision: (1) multiple labelers with varying expertise are employed to annotate samples; (2) annotated labels are associated with a set of samples instead of an individual sample; (3) annotations are derived by modeling multiple expert assessments. The project introduces a general framework, based on bi-convex programming, minimax optimization, and multi-objective optimization, for learning predictive models from imprecisely labeled data. The resulting algorithms will be evaluated on a number of real-world applications.
Broader Impacts: The results of this research are likely to impact a range of biomedical applications, including medical image labeling, longitudinal behavioral studies, genomics, and drug safety. The project offers enhanced opportunities for curriculum development and research-based advanced training of graduate and undergraduate students in machine learning and its applications. Dissemination of open source software implementations of the algorithms resulting from the project also contributes to the project's broader impact. Additional information about the project can be found at: www.labhealthinfo.uconn.edu/home/MachineLearning.jsp