This is a collaborative research effort bringing together the expertise of Lise Getoor, University of Maryland, College Park (0937094); Alex Pang, University of California, Santa Cruz (0937073); and Lisa Singh, Georgetown University (0937070).

In today's linked world, graphs and networks abound. There are communication networks, social networks, financial transaction networks, gene regulatory networks, disease transmission networks, ecological food networks, sensor networks, and more. Observational data describing these networks can often be obtained; unfortunately, this graph data is usually noisy and uncertain. In this research, we propose a formalism that allows us to capture and reason about the inherent uncertainty and imprecision in an underlying graph. We begin by proposing probabilistic similarity logic (PSL), a simple yet powerful language for describing problems that require probabilistic reasoning about similarity in networked data. We also introduce the notion of visual comparative analysis of PSL models derived from different evidence and assumptions, and illustrate its utility for the analysis of graphs and networks.

Dealing with noise and uncertainty in complex domains and conducting comparative analytics are core capabilities required for the Foundations of Data and Visual Analytics (FODAVA) mission. This research focuses on integrating representation, comparative analysis, and visualization methods into an open source toolkit for PSL models. In addition to producing the toolkit, the research team is working with researchers in a variety of interdisciplinary domains to validate the utility of the approach, and is developing tutorial and training materials for the tools.

The key broader impact of the work is that methods for reasoning about sources of noise and uncertainty in graphs, and for understanding their impact on results, are general and fundamental to the intelligent analysis of today's rich information sources. Results, including open source software, will be distributed via the project Web site (www.cs.umd.edu/projects/linqs/fodava/).

Project Report

In today's linked world, graphs and networks are everywhere: communication networks, social networks, disease transmission networks, ecological networks, and more. While data about these networks can often be obtained, the graph data is generally noisy and uncertain. In this research, we developed complementary approaches for comparing and understanding graphs that contain uncertainty. First, we developed a mathematical framework called Probabilistic Soft Logic (PSL) that incorporates a modeling language for capturing and reasoning about uncertainty in graphs. PSL allows users to easily define templates for graphical models called hinge-loss Markov random fields (HL-MRFs). We designed new scalable algorithms for inference and learning in HL-MRFs, including three different learning algorithms, each with its own objectives and advantages. On two fundamental graph problems, node labeling and link prediction, we showed that HL-MRFs can outperform their ubiquitous discrete counterparts, Markov random fields, in one percent of the running time. To complement our work on PSL, we developed novel ways of identifying and visualizing graphs with uncertainty. For example, we developed a query language designed specifically for comparing uncertain graphs, where uncertainty can exist at the graph element level, the subgraph level, or the graph level. This query language contains a number of operators, including neighborhood operators and semantic path similarity operators. Its implementation uses a novel service-oriented architecture that allows simple operators to be used as building blocks for more complex ones. We also developed a number of novel ways of visualizing graphs with uncertainty. These include layout techniques such as bullseye and comparative column displays, extensions to parallel coordinates and scatter plots, and a Voronoi tessellation layout that makes use of otherwise empty space.
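To make the link between PSL rules and HL-MRFs concrete, here is a minimal sketch, not the project's implementation. It assumes the Łukasiewicz relaxation commonly used in PSL: the soft conjunction of truth values a and b is max(0, a + b - 1), and a weighted ground rule body -> head contributes a hinge-loss potential w * max(0, body - head)^p with p in {1, 2}. The rule, the predicate names (`Friends`, `Votes`), the weights, the negative prior, and the helper names are hypothetical. MAP inference here uses plain projected gradient descent, a simple stand-in that is not the scalable inference algorithm developed in the project.

```python
"""Minimal sketch: one PSL-style rule grounded into a hinge-loss MRF.

Hypothetical weighted rule (illustration only, not from the project):
    2.0 : Friends(A, B) & Votes(A, P) -> Votes(B, P)    (squared hinge)
"""
from typing import List, Tuple

# A ground potential (weight, coeffs, const, p) stands for
#   weight * max(0, coeffs . y + const) ** p
# over the vector y of unknown truth values in [0, 1].
Potential = Tuple[float, List[float], float, int]


def luk_and(a: float, b: float) -> float:
    """Lukasiewicz t-norm: soft conjunction of two truth values in [0, 1]."""
    return max(0.0, a + b - 1.0)


def linear_part(coeffs: List[float], const: float, y: List[float]) -> float:
    """Evaluate the linear function inside a hinge."""
    return sum(c * v for c, v in zip(coeffs, y)) + const


def energy(potentials: List[Potential], y: List[float]) -> float:
    """HL-MRF energy; the model's density is proportional to exp(-energy)."""
    return sum(w * max(0.0, linear_part(a, c, y)) ** p
               for w, a, c, p in potentials)


def map_inference(potentials: List[Potential], n_vars: int,
                  lr: float = 0.05, iters: int = 2000) -> List[float]:
    """Approximate MAP state by projected gradient descent on [0, 1]^n.

    A fixed step size suits squared hinges (p = 2); linear hinges (p = 1)
    are non-smooth and would need a decaying step or a dedicated solver.
    """
    y = [0.5] * n_vars
    for _ in range(iters):
        grad = [0.0] * n_vars
        for w, a, c, p in potentials:
            lin = linear_part(a, c, y)
            if lin > 0.0:
                # Gradient of w * lin**p (a subgradient when p == 1).
                for i, coeff in enumerate(a):
                    grad[i] += w * p * lin ** (p - 1) * coeff
        # Take a gradient step, then clip back into the unit hypercube.
        y = [min(1.0, max(0.0, v - lr * g)) for v, g in zip(y, grad)]
    return y


# Observed evidence: Friends(alice, bob) = 0.9 and Votes(alice, dem) = 0.8.
body = luk_and(0.9, 0.8)  # soft truth of the rule body, about 0.7
# One unknown, y[0] = Votes(bob, dem).
potentials: List[Potential] = [
    (2.0, [-1.0], body, 2),  # rule: 2.0 * max(0, body - y0)^2
    (1.0, [1.0], 0.0, 2),    # hypothetical negative prior: 1.0 * max(0, y0)^2
]
y = map_inference(potentials, n_vars=1)
print(round(y[0], 4))  # prints 0.4667, the closed-form optimum 2 * body / 3
```

Raising the rule weight relative to the prior pulls the MAP value of `Votes(bob, dem)` toward the body's truth value of about 0.7. Because each hinge-loss potential is convex in y, MAP inference in this sketch is a convex optimization problem, which is one reason HL-MRF inference can scale.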
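As an illustration of what a neighborhood operator might compute when uncertainty sits at the graph element level, the sketch below assumes each edge carries an independent existence probability and returns every node within k hops whose most probable connecting path has probability at least a threshold. The function names, the independence assumption, and the max-product path semantics are hypothetical choices for this example; they are not the syntax or semantics of the project's SQL-based query language.

```python
"""Hypothetical k-hop neighborhood operator over an uncertain graph."""
from typing import Dict, List, Tuple

# Adjacency list: node -> [(neighbor, edge existence probability)].
UncertainGraph = Dict[str, List[Tuple[str, float]]]


def add_edge(g: UncertainGraph, u: str, v: str, p: float) -> None:
    """Add an undirected edge that exists with probability p."""
    g.setdefault(u, []).append((v, p))
    g.setdefault(v, []).append((u, p))


def probable_neighborhood(g: UncertainGraph, source: str, k: int,
                          threshold: float) -> Dict[str, float]:
    """Nodes within k hops of source whose best path probability >= threshold.

    A path's probability is the product of its edge probabilities, which
    assumes edges exist independently. Each round extends paths by one hop
    (Bellman-Ford style), so after k rounds best[v] is the maximum over all
    paths with at most k edges.
    """
    best: Dict[str, float] = {source: 1.0}
    for _ in range(k):
        updated = dict(best)
        for u, p_u in best.items():
            for v, p_edge in g.get(u, []):
                candidate = p_u * p_edge
                if candidate > updated.get(v, 0.0):
                    updated[v] = candidate
        best = updated
    return {v: p for v, p in best.items() if v != source and p >= threshold}


g: UncertainGraph = {}
add_edge(g, "a", "b", 0.9)
add_edge(g, "b", "c", 0.8)
add_edge(g, "a", "c", 0.5)
add_edge(g, "c", "d", 0.9)

near = probable_neighborhood(g, "a", k=2, threshold=0.6)
print(sorted(near))  # prints ['b', 'c']; c is reached via b with 0.9 * 0.8 = 0.72
```

Raising `k` to 3 also admits `d` (0.72 * 0.9 = 0.648). Other semantics, such as the probability that at least one connecting path exists, are not modeled here.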
To produce layouts that capture both graph structure (e.g., edge connectivity) and the similarity of node attribute values, we developed a new transfer function interface that allows the user to emphasize one property or the other. We also developed visual elements to complement the textual presentation of diagnostic test accuracy, with the goal of helping lay users understand and reason with uncertainty in order to make better decisions. Finally, we developed two visual analytic tools, G-Pare and Invenio-Workflow. G-Pare focuses on comparing the outputs of machine learning algorithms that predict node labels in a graph, e.g., predicting the political affiliations of people in a social network or the sex of animals observed in the field. G-Pare provides several views that give the user a global overview of the algorithms' outputs, as well as focused views of subsets of nodes. This visual analytic tool allows algorithm developers to analyze where the predictions of two algorithms agree and disagree. Invenio-Workflow is a more general visual analytic tool for graphs that incorporates our uncertain graph query language and other descriptive data mining algorithms, e.g., clustering. It emphasizes the data exploration process, helping scientists who are less familiar with data exploration set up workflows to better understand and compare graph data sets. One application of these algorithms, tools, and operators that we investigated was supporting observational scientists who study animal societies. Specifically, we worked with scientists on the Shark Bay dolphin research project to investigate questions about changes in animal sociality over time, similarities between local animal communities and the general animal population, differences among animal subgroups across locations, and observation bias across researchers, among others.
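One simple way a single control could trade off graph structure against attribute similarity is to blend two normalized distance matrices with a weight alpha, then pass the result to any distance-based layout. The sketch below assumes hop distance for structure, Euclidean distance for attributes, and a linear blend; the project's transfer function interface may map these properties differently, so `blended_distances`, `alpha`, and the other helper names are hypothetical.

```python
"""Hypothetical linear 'transfer function' for layout distances."""
import math
from collections import deque
from typing import Dict, List

Matrix = List[List[float]]


def hop_distances(adj: Dict[int, List[int]], n: int) -> Matrix:
    """All-pairs shortest-path hop counts via BFS; unreachable pairs get n."""
    dist = [[float(n)] * n for _ in range(n)]
    for s in range(n):
        dist[s][s] = 0.0
        seen = {s}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj.get(u, []):
                if v not in seen:
                    seen.add(v)
                    dist[s][v] = dist[s][u] + 1.0
                    queue.append(v)
    return dist


def attribute_distances(attrs: List[List[float]]) -> Matrix:
    """Euclidean distance between node attribute vectors."""
    return [[math.dist(a, b) for b in attrs] for a in attrs]


def normalize(d: Matrix) -> Matrix:
    """Scale a non-negative distance matrix into [0, 1]."""
    peak = max(max(row) for row in d)
    return [[x / peak if peak > 0 else 0.0 for x in row] for row in d]


def blended_distances(adj: Dict[int, List[int]], attrs: List[List[float]],
                      alpha: float) -> Matrix:
    """alpha = 1 emphasizes graph structure; alpha = 0 emphasizes attributes."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    n = len(attrs)
    s = normalize(hop_distances(adj, n))
    a = normalize(attribute_distances(attrs))
    return [[alpha * s[i][j] + (1.0 - alpha) * a[i][j] for j in range(n)]
            for i in range(n)]


# Path graph 0 - 1 - 2 where nodes 0 and 2 have nearly identical attributes.
adj = {0: [1], 1: [0, 2], 2: [1]}
attrs = [[0.0], [5.0], [0.2]]
for alpha in (1.0, 0.5, 0.0):
    d = blended_distances(adj, attrs, alpha)
    print(alpha, round(d[0][1], 2), round(d[0][2], 2))
```

With alpha = 1, node 0 is closer to its graph neighbor 1; with alpha = 0, it is closer to node 2, which shares its attribute value. The blended matrix could then be handed to a layout such as classical multidimensional scaling.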
Outcomes Summary:
- Development of open-source PSL algorithms
- Development of visual analytic software (G-Pare)
- Development of an open-source visual analytic tool (Invenio-Workflow)
- Development of techniques for displaying and visually comparing attribute uncertainties
- A query language for uncertain graph comparison that is based on SQL
- Application of tools and techniques to other domains, including the Shark Bay Dolphin research project
Along with the four software toolkits and applications, we have produced over 20 scholarly, peer-reviewed publications, including one that received a best student paper award, in the areas of visual analytics, visualization, machine learning, data mining, and databases. We have also given talks in the data mining and machine learning communities about our visual analytics research to help spearhead additional research in those sub-disciplines. All software is available open-source. Additional information is available through our project page at: http://avis.soe.ucsc.edu/fodava/index.htm.

Budget Start: 2009-09-15
Budget End: 2014-08-31
Fiscal Year: 2009
Total Cost: $199,308
Name: Georgetown University
City: Washington
State: DC
Country: United States
Zip Code: 20057