The objective of this project is to develop in-depth understanding of the nature, underlying models, and dynamics of Social Network Services (SNS) with millions and even billions of users. Methods for the analysis of massive SNS graphs using signal processing and information-theoretic techniques are being designed to answer fundamental questions regarding SNS. The goal is to develop new approaches for (i) identifying influential nodes in SNS networks, and understanding how these nodes evolve and interact over time, (ii) inferring people?s social graphs from SNS graphs, and (iii) inferring properties of an SNS network from analyzing another SNS graph. To achieve these objectives, signal-processing inspired and information/graph theoretic approaches are being employed for: (a) projecting information contained in massive SNS graphs onto compact representations; and (b) understanding how different graphs (e.g., representing different time-instances of a social network service or representing different services) relate to each other.
This research enables a spectrum of novel applications that are currently impossible. A deeper understanding of the structure of social networks and how that structure evolves can be applied to a variety of social issues. The project is interdisciplinary in nature and it bridges several different communities including electrical engineering, computer science, and social science; and it fosters interaction and communication among them. To promote education and learning, this project actively engages high school, undergraduate, and graduate students, especially students from under-represented minorities.
Transitivity-based Community Detection Social networks are characterized by the basic attribute of having a large number of "triangles", which is a manifestation of the fundamental observation that if someone has two friends then these two individuals are likely to be friends of each other. This attribute is captured analytically through what is known as "transitivity", which is a measure of the number of triangles in a social network graph. Transitivity is also known as the (global) clustering coefficient of a graph. Communities are often modeled as cliques, corresponding to dense graphical structures with maximum connectivity. A clique represents a social network where everyone knows everyone else. However, in reality, a typical social network does not represent a perfect clique. In other words, intra-community connectivity can diminish, making the detection of communities even more difficult. During the past years, there have been extensive efforts in studying community structure of social networks.In these studies, high transitivity — or high clustering coefficient — has been known as one of the main attributes of social communities. Clearly, using transitivity for finding communities requires a new kernel for spectral graph clustering which was the focus of this work. This work extended our previous efforts in employing transitivity attributes of graphs for social network analysis. Specifically, here we focused on the problem of network community detection, which is still among the most fundamental problems in social network analysis. We propose spectral analysis of the transitivity gradient matrix and compare our framework to the modularity based community detection, which represents the leading approach for community detection. Previously, we showed that the transitivity attributes of social networks can be analyzed within the resolution of individual links, helping analysts in bridging concepts from the micro- and macro- levels of social network analysis. Under this more recent work, we showed that for the problem of network community detection, a key advantage of using transitivity is that it quantifies the degree of community structure independent of the number and sizes of clusters or communities within the network. We employ a Gaussian mixture model in the spectral domain of the proposed transitivity gradient matrix for modeling communities. Performance of the proposed method is compared to the state-of-the-art modularity based community detection over randomly generated networks with social network characteristics such as scale-free degree distributions and high clustering coefficients. We showed that the proposed transitivity based approach for community detection can outperform the state of the art modularity based community detection. Denoising of Social Network Graphs Social networks can contain a great deal of "noise" that can impact the analysis of such networks in a very negative way. For example, massive social networks, such as Facebook or Linkedin, may contain a very large number of links that do not represent true friendships or professional relationships. This highlights the need for robust new approaches to "denoise" social networks. Under this work, we developed a Partial Differential Equation (PDE) based diffusion framework for social network graphs denoising. In particular, we applied diffusion onto network graphs leading to an efficient algorithm for removing anomalous structures from the network topology. The driving force for the diffusion process in our work is the clustering propensity that exists in real social networks. The proposed diffusion enhances the boundaries of communities of which makes it a suitable step prior to community detection. Consequently, we studied the impact of the proposed denoising framework on the problem of community detection. We showed clear improvements in the accuracy of community detection due to the proposed denoising approach. As part of this work, we developed a novel and efficient method for processing and denoising the complex structure of social networks. Our method works by removing the set of sparse bridges between well-connected network subgraphs and strengthening the interior structure of the network communities. The proposed diffusion-based graph denoising algorithm can be used prior to community detection to enhance the detection performance or directly used to partition the graph into fully-connected partitions as a standalone partitioning tool. Some advantages of the proposed diffusion-based graph partitioning is its low-complexity and handling of multi-community networks with variable community sizes.