Across many scientific disciplines, the availability of very large amounts of data is creating a paradigm shift. The goal of this project is to develop Scolopax, a tool for finding interesting patterns in classification and prediction models trained on large, high-dimensional data. These patterns can capture previously unknown relationships between the variables of a complex process and are therefore essential for exploratory analysis and scientific discovery.

This project explores several research directions to lay the foundations for Scolopax: (1) Design of a new query language that can express all common pattern search preferences. (2) Algorithms for learning a query, so that even non-technical users can formulate non-trivial queries through an interactive process. (3) New rewrite rules and efficient data management approaches to automatically transform queries into fast implementations on a cluster or in the cloud. (4) Design of new semi-parametric data mining techniques that are amenable to scalable training, evaluation, and pattern confidence computation.

User-friendly query-writing functionality makes Scolopax accessible to scientists and citizen scientists alike. Its planned deployment through popular websites, e.g., those hosted by the Cornell Lab of Ornithology, has the potential to enable new scientific discoveries. By letting citizen scientists not only contribute data but also make their own discoveries using that data, Scolopax also serves as an important enabler and motivator for outreach programs and greater involvement of citizen scientists. For further information, see the project website: www.ccs.neu.edu/home/mirek/Projects/Scolopax

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Application #: 1017793
Program Officer: Sylvia Spengler
Budget Start: 2010-08-15
Budget End: 2014-07-31
Fiscal Year: 2010
Total Cost: $499,232

Name: Northeastern University
City: Boston
State: MA
Country: United States
Zip Code: 02115