In many areas of professional and social life, people form communities based on some obvious feature, so that members tend to be similar in some aspect of interest or behavior. However, as the number of groups grows, new dimensions of similarity are introduced. This project will focus on ways to understand communities that differ along many dimensions and to present that understanding in a form that clarifies the community structure. The modern Web provides a number of alternative data sources for discovering communities, with activities ranging from blog comments to tagging of resources. These data can be used to discover and model communities reliably and to present complex data in clear and efficient ways that decision makers and other stakeholders can readily understand. The visualizations will also be interactive, permitting exploration of the data. Knowledge of these communities can in turn support recommendation approaches, such as identifying mentors or developing broad scientific collaborations.
The purpose of this project was to explore a range of promising approaches to elicit and visualize latent communities based on the various kinds of data about individuals that are now available on the Social Web. While community discovery and visualization have been explored in the past, the Social Web has introduced a number of new challenges not addressed by previous research. Most importantly, prior research primarily explored social data based on a single type of connection between users (such as e-mail contacts, co-authorship, or communication in discussion forums). The growing amount of information available on the Web, and specifically in social systems, makes it possible to discover various kinds of connections and various dimensions of similarity between users. These data provide evidence about multiple types of latent groups and sub-communities within the larger user community. New approaches are required to reliably find and visualize these ill-defined, frequently overlapping groups. Recognizing the inherently noisy nature of Social Web data, we approached community discovery and modeling as a human-computer collaboration problem. In this context, the community discovery and intelligent clustering algorithms are not expected to produce definitive answers. Instead, their goal is to provide data for interactive exploration and decision-making by the user. The computer and the human communicate through interactive visualizations. Using these visualizations, humans should be able to explore and manipulate the results delivered by the algorithms. In turn, the algorithms use these manipulations to produce results informed by both human and artificial intelligence. Our study was designed as a two-stage project.
This report summarizes the results of the second stage, which focused on developing and evaluating a framework for community mining using multi-dimensional social data, as well as on developing highly interactive tools for community exploration and visualization. The primary practical result of the project was FAVNet, an innovative community mining and visualization tool that accepts diverse multi-layer connection data and offers a range of options for interactive exploration. We used this tool extensively throughout the project. With it, we performed several large-scale studies that employed social networking data collected during the first stage of the project and demonstrated that FAVNet can be used successfully for community mining with multi-layer data. In one study, we established that fusing several dimensions of data is important for deriving a more precise structure of the explored communities; this demonstration can be considered the main empirical result of the project. From a research perspective, the most important results are the two novel approaches to interactive exploration and visualization of multi-layer communities that we developed during the second stage. The first approach introduced a POI-based interactive visualization for exploring multi-layer communities. Whereas the traditional node-link approach to community visualization is not interactive and does not allow one to grasp the structure of multi-layer communities, the POI approach offers an alternative way to view multiple community layers with a substantial level of interactivity. We developed and explored two alternative implementations of this approach, one of which was embedded into FAVNet.
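The report does not specify how FAVNet fuses connection layers. Purely as an illustration of what "fusing several dimensions of data" can mean, the sketch below assumes one common scheme: a weighted sum of per-layer edge strengths, followed by a simple threshold-plus-connected-components step to read off communities. The functions `fuse_layers` and `communities`, the layer names, the weights, and the threshold are all hypothetical and are not FAVNet's actual method.

```python
from collections import defaultdict

# Hypothetical data model: each layer maps an undirected user pair to a
# connection strength in [0, 1]. Layer names and weights are illustrative only.
Edges = dict[tuple[str, str], float]


def fuse_layers(layers: dict[str, Edges], weights: dict[str, float]) -> Edges:
    """Combine layers by a weighted sum of edge strengths (assumed fusion rule)."""
    fused: Edges = defaultdict(float)
    for name, edges in layers.items():
        w = weights.get(name, 0.0)  # layers without a weight contribute nothing
        for (u, v), strength in edges.items():
            fused[tuple(sorted((u, v)))] += w * strength
    return dict(fused)


def communities(fused: Edges, threshold: float) -> list[list[str]]:
    """Connected components over edges whose fused weight reaches the threshold."""
    parent: dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (u, v), w in fused.items():
        ru, rv = find(u), find(v)  # registers both users even if the edge is weak
        if w >= threshold:
            parent[ru] = rv
    groups: dict[str, list[str]] = defaultdict(list)
    for node in parent:
        groups[find(node)].append(node)
    return sorted(sorted(g) for g in groups.values())


# Toy example with made-up users and layers.
layers = {
    "coauthorship": {("ann", "bob"): 1.0, ("cat", "dan"): 1.0},
    "tagging": {("bob", "cat"): 0.4, ("ann", "bob"): 0.6},
    "comments": {("bob", "cat"): 0.5, ("eve", "dan"): 0.9},
}
weights = {"coauthorship": 0.5, "tagging": 0.3, "comments": 0.2}
print(communities(fuse_layers(layers, weights), threshold=0.3))
print(communities(fuse_layers(layers, {"tagging": 1.0}), threshold=0.3))
```

On this toy data, the fused view groups `ann` with `bob` and `cat` with `dan`, while the tagging layer alone groups `ann`, `bob`, and `cat` together. The point is only that different layer combinations can yield different community structures; which structure is "more precise" depends on ground truth that this sketch does not model.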
The second interactive exploration and visualization approach introduced a novel way to engage an expert user in an extended interaction with a cluster-based community visualization in order to elicit critical information about how the community is organized. The information the user contributes during this exploration is consumed by a novel interactive clustering algorithm, which immediately updates the community structure to fit the information provided by the expert and passes the updated structure back to the visualization component. This highly interactive process yields a more reliable depiction of the community's structure.
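The report does not describe the interactive clustering algorithm or the form of the expert's input. As a hedged illustration of the feedback loop, the sketch below assumes that expert feedback arrives as must-link and cannot-link pairs and substitutes a generic technique, constraint-respecting greedy agglomerative clustering: expert must-link pairs are merged first, then pairs are merged in order of decreasing similarity above a threshold, skipping any merge that would place a cannot-link pair in the same cluster. The function `constrained_clusters`, the similarity values, and the threshold are hypothetical.

```python
# Hypothetical sketch: a generic constrained clustering step, not the project's algorithm.


def constrained_clusters(sim, nodes, threshold, must_link=(), cannot_link=()):
    """Greedy single-link merging that honors expert constraints.

    sim: dict mapping frozenset({u, v}) -> similarity in [0, 1].
    must_link / cannot_link: iterables of (u, v) pairs supplied by the expert.
    """
    cluster = {n: {n} for n in nodes}  # node -> the set object it currently belongs to
    forbidden = [frozenset(p) for p in cannot_link]

    def violates(a, b):
        # True if merging sets a and b would put some cannot-link pair together.
        merged = a | b
        return any(p <= merged for p in forbidden)

    def merge(u, v):
        a, b = cluster[u], cluster[v]
        if a is b or violates(a, b):
            return
        a |= b  # grow a in place, then repoint b's members to it
        for n in b:
            cluster[n] = a

    # 1) Expert must-link pairs are applied first (skipped if they conflict with cannot-links).
    for u, v in must_link:
        merge(u, v)
    # 2) Algorithmic evidence next, strongest similarities first.
    for pair, s in sorted(sim.items(), key=lambda kv: -kv[1]):
        if s < threshold:
            break
        u, v = tuple(pair)
        merge(u, v)
    unique = {id(c): c for c in cluster.values()}
    return sorted(sorted(c) for c in unique.values())


nodes = ["ann", "bob", "cat", "dan"]
sim = {
    frozenset({"ann", "bob"}): 0.9,
    frozenset({"bob", "cat"}): 0.7,
    frozenset({"cat", "dan"}): 0.2,
}
# Round 1: purely algorithmic result shown to the expert.
print(constrained_clusters(sim, nodes, threshold=0.5))
# Round 2: the expert separates ann from cat and ties cat to dan; the clustering is recomputed.
print(constrained_clusters(sim, nodes, threshold=0.5,
                           must_link=[("cat", "dan")], cannot_link=[("ann", "cat")]))
```

In the first round, the algorithm groups `ann`, `bob`, and `cat`; after the expert's feedback, the recomputed clusters become `ann`/`bob` and `cat`/`dan`. Re-running the whole clustering on each feedback round keeps the sketch simple; an incremental update would be needed for large communities, but that is beyond what this illustration assumes.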