Very few real-life phenomena are ever as simple as "A causes B", a bivariate relationship. Take economic forecasting, for example: it is a function of unemployment, consumer confidence, inflation, interest rates, and many other factors. No single variable can by itself predict the state of the economy over the next few months. The same is true in the study of global warming, in the derivation of gene interactions, in the analysis of customer recommendation systems, and so on. Multivariate relationships are ubiquitous and have always existed. However, with the growth in sensor technology, whatever the data collection mechanism might be (electronic media, physical devices, etc.), we now have a wealth of data available to study in many domains, small and large. Currently, automated and unsupervised methods often fail once the number of variables (dimensions) grows beyond a dozen or even fewer; hence visualization techniques for user-assisted analysis play an important role. Responding to this need, this exploratory project develops a novel framework that makes high-dimensional (multivariate) data visualization more accessible to all. It couples powerful data analysis with an intuitive exploration and way-finding paradigm, akin to a tourist map, to help users navigate high-dimensional data spaces with ease.

The overall goal of the project is to facilitate intuitive navigation and exploration of high-dimensional data spaces, improving comprehensibility and reducing unnecessary complexity. This is achieved by: (1) unrolling the high-dimensional space into a landscape map; (2) enabling users to navigate the map and local subspaces of the data via an interactive data projection utility controlled by a touchpad interface; (3) allowing users to insert interesting observations (i.e., data projections) into this map; (4) augmenting the map with background overlays depicting informative globally defined data; and (5) conveying the data within a level-of-detail illustrative visualization framework. The system is evaluated and refined via formal user studies, both with domain scientists in interviews and in a crowd-sourced setting over the web.

This novel information visualization approach will support both scientists and casual users in exploring high-dimensional data spaces through an intuitive navigation paradigm. The project webpage (www.cs.sunysb.edu/~mueller/TripAdvisorND) will be used to disseminate results, including data analysis capabilities within a web-enabled version of the software, and to invite participation in evaluation studies. This exploratory research project provides a rich research and educational experience to students.

Project Report

Project director: Klaus Mueller, PhD, Computer Science Department, Stony Brook University

Motivation: The growth of digital data is tremendous. Any aspect of life and matter is being recorded and stored on cheap disks, whether in the cloud, in businesses, or in research labs. We can now afford to explore very complex multivariate (also called high-dimensional) relationships, with many variables playing a part. But for this we need powerful tools that allow us to be creative, to sculpt this intricate insight from the raw block of data. High-quality visual feedback plays a decisive role here. Yet, when one examines the visualization suites of modern data analysis software, there is very little support for observing these multivariate relationships as a whole. This is a severe drawback, since automated and unsupervised methods often fail once the number of variables (dimensions) grows beyond a dozen or even fewer, and so visual support for user-assisted analysis plays an important role.

Achievements: Funded by the grant, we have devised a framework that makes high-dimensional data visualization more accessible to all. It couples powerful data analysis with an intuitive exploration and way-finding paradigm – a map – to help users navigate high-dimensional data spaces with ease. Specifically, we have developed two types of map-based navigation mechanisms, mainly distinguished by the types of objects (or sights) that populate the map. In the first map, the "sights" are interesting data patterns that may indicate trends and relationships, for example, which environmental conditions may cause a warming of the earth’s atmosphere. Here it is often important to see how certain data patterns are related. To navigate this landscape, our framework compares high-dimensional space navigation with a sightseeing trip.
It decomposes the data exploration activity into five major tasks: 1) Identify the sights: use the map to identify the sights of interest and their location; 2) Plan the trip: connect the sights of interest along a specifiable path; 3) Go on the trip: travel along the route; 4) Hop off the bus: experience the location, look around, zoom into detail; and 5) Orient and localize: regain bearings in the map. We have designed intuitive and interactive tools for all of these tasks, covering both global navigation within the map and local exploration of the data distributions.

In the second map, the sights are the data variables (dimensions) themselves. They form nodes that are connected by a network of edges representing the strength of association – the correlation – between the dimensions. A user then interactively specifies nodes and edges to visit, and the system computes an optimal route, which can be further edited and manipulated in ways similar to the very popular routing tool in Google Maps. The order of variables along the route can be used for side-by-side comparisons of the data patterns of adjacent variables in a different display, called parallel coordinates.

This framework can serve both as a data exploration environment and as an interactive presentation platform to demonstrate, explain, and justify any identified relationships to others. A webpage (www.cs.sunysb.edu/~mueller/TripAdvisorND) offers more detail on the research achieved by way of this funding. Free web-downloadable software will also soon be available.

Demonstration example: Assume a (fictitious) company that would like to analyze the sales strategies of three sales teams, labeled red, green and blue. Each team has 300 sales people. The variables of interest are: the number of initial leads generated (#Leads), the number of initial leads won (#LeadsWon), the cost expended per won lead (cost/wonLead), and finally the number of concrete sales opportunities (#Opps) generated from these won leads.
In the figure, on the left the user has zoomed into the landscape of variables and manually specified a route that seems to capture what is going on – the strategic model of winning the most customers. On the right we see a linked parallel coordinate display with an axis (variable) ordering according to this route. This plot reveals that while the blue team generates and wins fewer initial leads (#Leads, #LeadsWon), it expends more funds on each such lead (cost/wonLead), and this allows the team to transform these leads into concrete sales opportunities (#Opps). The other teams take a shallower approach: they generate lots of leads but do not spend much money on each, and so do not convert them into opportunities. The results show that this is not a good strategy. Furthermore, there is also much more variation in the red and green sales teams, visible as thicker bands. There appear to be a few good sales people (upper portion of the bands) but also quite a few ineffective ones (lower portions) whom one might want to train or eliminate.
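
The second map described above (variables as nodes, correlation strength as edges, a route that fixes the axis order of a parallel coordinates plot) can be illustrated with a minimal Python sketch. This is not the project's software: the function names are hypothetical, the numbers are invented purely for illustration, and a simple greedy nearest-neighbor heuristic stands in for the optimal route computation mentioned in the report.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    # A constant column has no defined correlation; treat it as 0.
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def correlation_edges(columns):
    """Map each unordered pair of variables to |correlation| (edge strength)."""
    names = list(columns)
    return {
        frozenset((a, b)): abs(pearson(columns[a], columns[b]))
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


def greedy_route(columns, start):
    """Order the variables by repeatedly moving to the most strongly
    correlated unvisited variable. This is a heuristic stand-in
    (assumption), not an optimal route."""
    edges = correlation_edges(columns)
    route = [start]
    remaining = set(columns) - {start}
    while remaining:
        here = route[-1]
        # sorted() makes tie-breaking deterministic.
        nxt = max(sorted(remaining), key=lambda v: edges[frozenset((here, v))])
        route.append(nxt)
        remaining.remove(nxt)
    return route


if __name__ == "__main__":
    # Invented values for four sales people; variable names follow the
    # demonstration example, the numbers do not come from the project.
    sales = {
        "#Leads": [90, 80, 30, 25],
        "#LeadsWon": [40, 35, 20, 18],
        "cost/wonLead": [5, 6, 20, 22],
        "#Opps": [4, 5, 15, 16],
    }
    print(greedy_route(sales, "#Leads"))
```

The resulting list could then be used as the left-to-right axis order of a parallel coordinates display, so that strongly associated variables end up side by side.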

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Type: Standard Grant (Standard)
Application #: 1050477
Program Officer: Maria Zemankova
Budget Start: 2010-09-01
Budget End: 2012-08-31
Fiscal Year: 2010
Total Cost: $103,001

Name: State University New York Stony Brook
City: Stony Brook
State: NY
Country: United States
Zip Code: 11794