Data clustering is a widely used tool for organizing data into coherent groups that correspond to the underlying structure of the data. In many applications, incorporating domain knowledge into clustering can enhance both the quality and the utility of the clustering results. Unfortunately, users who are not data mining experts currently lack effective means of providing such input to guide clustering. Against this background, Dr. Xiaoli Fern of Oregon State University seeks to develop a novel class of algorithms that use active learning strategies to interactively elicit information from users to drive clustering.

An important aim of this work is to identify types of input, e.g., must-link and cannot-link constraints, that are both informative and easy to elicit interactively from users, and that improve the quality and utility of the clustering results. The study is driven by and evaluated on exploratory data analysis tasks that arise in several application domains: (1) ecosystem informatics, e.g., exploratory analysis of in-field bird recordings; (2) human-computer interaction (HCI), e.g., analysis of HCI data to understand user behavior; and (3) plant genomics. The work is carried out in collaboration with scientists with expertise in each of these domains.
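To make the notion of pairwise constraints concrete, the following is a minimal sketch, not the project's algorithm. It assumes a must-link constraint says two items belong in the same cluster and a cannot-link constraint says they belong in different clusters, and it counts how many constraints of each kind a given cluster assignment violates. The function name, the item identifiers (rec1 to rec4), and the labels are hypothetical and chosen only for illustration.

```python
from typing import Dict, Hashable, List, Tuple

# A constraint is an unordered pair of item identifiers (hypothetical type alias).
Pair = Tuple[Hashable, Hashable]


def count_violations(
    labels: Dict[Hashable, int],
    must_link: List[Pair],
    cannot_link: List[Pair],
) -> Tuple[int, int]:
    """Return (must-link violations, cannot-link violations) for a clustering.

    A must-link pair is violated when its two items have different labels.
    A cannot-link pair is violated when its two items share a label.
    """
    ml_violations = sum(1 for a, b in must_link if labels[a] != labels[b])
    cl_violations = sum(1 for a, b in cannot_link if labels[a] == labels[b])
    return ml_violations, cl_violations


# Illustrative data only: four hypothetical recordings assigned to two clusters.
labels = {"rec1": 0, "rec2": 0, "rec3": 1, "rec4": 1}
must_link = [("rec1", "rec2"), ("rec2", "rec3")]
cannot_link = [("rec1", "rec4"), ("rec3", "rec4")]

print(count_violations(labels, must_link, cannot_link))  # prints (1, 1)
```

In an interactive setting, a count like this could serve as one signal of how well a candidate clustering agrees with the feedback a user has given so far; how the constraints are selected and used to drive clustering is the subject of the research itself and is not shown here.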

Improved tools for interactive exploratory data analysis benefit a broad range of applications, including most areas of science, where such analysis plays an increasingly important role in extracting knowledge from data. For example, in ecological informatics, such tools can help scientists better understand the impact of environmental changes on bird species, which in turn can help develop better methods for managing ecosystems. Research-based education and training opportunities offered by this project help prepare a new generation of researchers and practitioners in exploratory data analysis, as well as in the emerging area of Ecosystem Informatics at Oregon State University. Dr. Fern's outreach efforts aim to draw female undergraduates and K-12 students from under-represented groups to careers in computer science and engineering. Further information on this project can be found at http://web.engr.oregonstate.edu/~xfern/CAREER

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Application #: 1055113
Program Officer: Sylvia Spengler
Project Start:
Project End:
Budget Start: 2011-01-15
Budget End: 2016-12-31
Support Year:
Fiscal Year: 2010
Total Cost: $565,000
Indirect Cost:
Name: Oregon State University
Department:
Type:
DUNS #:
City: Corvallis
State: OR
Country: United States
Zip Code: 97331