This project develops novel computational approaches and analytical tools to meet the challenges and opportunities for social network analysis brought by the availability of large-scale longitudinal data generated by the usage patterns of modern communication devices, such as cell phones. This type of data has several key advantages: it is statistically extensive (coming from millions of users), purely observational (free of any bias induced by obtrusive measurements), and longitudinal (spanning several years). The extent and longitudinal character of such data bring challenges that can only be tackled by an orchestrated multidisciplinary approach drawing on social science, physics methods developed for large-scale interacting particle systems, mathematical statistics and data analysis, and computer science methods for data mining and agent-based modeling.
The project will focus in particular on generating 1) novel computational and analytic methods for both cross-sectional and longitudinal analysis of large-scale social network data, based on advanced nonlinear time-series methods, community detection algorithms, and probabilistic relational models; 2) stochastic mathematical models for network behavior coupled across several levels of analysis, including node, dyad, triad, and group levels; and 3) a data-driven stochastic individual-based simulation (SIBS) framework with predictive capability for macro-level system behavior, implementing the dynamic models from 2). The SIBS design will allow it to be used as a multilevel hypothesis-generation framework for social dynamics, and as an application, it will be employed to uncover the modalities for efficient, targeted spread of information in large-scale dynamic social networks.
The suite of methods developed and the SIBS, with its design transparency and parallelism, provide data-driven feature-extraction tools for addressing social science questions and for aiding mechanism design and decision-making in practical situations. In particular, they are expected to directly impact applications in both commercial (product delivery, health-care services, etc.) and non-commercial (urban planning, emergency alert systems, etc.) domains.
In today’s increasingly interconnected world, the use of communication technologies by the majority of the population generates massive amounts of digital tracks that can be exploited to learn about social networks and human behavior. In this project, we have exploited such a longitudinal dataset involving more than one billion communication events between more than 7 million users over a period of 65 days. The analysis of this real-world social-network dataset presented several challenges, both theoretical and algorithmic. Starting from the dyadic level, we have used a large number of graph-theoretical measures to characterize massive social networks and to discover the ways in which they differ from other real-world massive networks (e.g., infrastructure networks). With this study we have been able to confirm previous social network theories and to improve or develop others. In particular, we confirmed both the degree-assortative and strongly transitive nature of social networks, and have shown that the typical neighborhood size indeed supports Dunbar’s theory. We presented novel and more accurate weighted measures to characterize communication reciprocity and connected them with nodal, dyadic, and embeddedness features including gregariousness, degree, degree-assortativity, and clustering. Using Taylor’s fluctuation scaling analysis, we have shown that while there is considerable variability and specificity in the small subgraphs embedding a node or a dyad, at larger scales (beyond 3 hops) network neighborhoods become increasingly uncorrelated. Centrality measures (betweenness, load, etc.) are used to identify key substructures within social networks and provide a decomposition of a social network into subgraphs of varying structural importance.
We have extended this notion by using influence or range limitation, and have shown that centrality measures obey universal scaling laws in large networks (and hence in social networks as well), akin to a central limit theorem. We exploited this property to provide a fast algorithm for measuring centralities in large networks, where the original algorithms would be computationally infeasible, and we extended our method to weighted networks as well. Using data mining and machine learning methods, we have developed the mathematics and algorithms for both unsupervised and supervised (classification) link prediction in social networks. Link prediction has a wide range of applications, from inferring hidden links to charting the most likely structures that a network will evolve into over time. Our methods, validated across several datasets including cell-phone records, include both single- and multi-relational link prediction and have been extended to heterogeneous and multi-layer networks as well. The latter extension exploits the notion of transfer learning, that is, using information across the different types of social networks that individuals are part of. These methods can shed light on the intrinsic mechanisms by which links form and influence one another across different social networks. We have also developed a novel and efficient method for community detection based on random walks. The ultimate test of our understanding of a social network’s structure and dynamics is our ability to model its behavior and predict features that have not been introduced a priori into the model, but result from the mechanisms on which the generative model is based. By adopting the maximum entropy principle developed within physics (Jaynes), we have designed a Stochastic Individual-Based Simulation (SIBS) for social networks.
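As an illustration only, the idea of range limitation mentioned above can be sketched for betweenness centrality: only shortest paths of at most a given length contribute to a node's score. The sketch below is not the project's algorithm and does not implement the scaling-law extrapolation. It is a plain breadth-first (Brandes-style) computation truncated at a cutoff, written in Python, where the function name, the adjacency dictionary `adj`, and the parameter `max_range` are assumptions made for this example, and the graph is assumed to be undirected and unweighted.

```python
from collections import deque


def range_limited_betweenness(adj, max_range):
    """Betweenness counting only shortest paths of length <= max_range.

    adj is a hypothetical adjacency dict {node: [neighbors]} for an
    undirected, unweighted graph. Scores are unnormalized.
    """
    bc = {v: 0.0 for v in adj}
    for s in adj:
        dist = {s: 0}      # hop distance from the source s
        sigma = {s: 1}     # number of shortest paths from s
        preds = {s: []}    # predecessors on shortest paths from s
        order = []         # nodes in order of non-decreasing distance
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            if dist[v] == max_range:
                continue   # do not explore beyond the range limit
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    sigma[w] = 0
                    preds[w] = []
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Back-propagate pair dependencies from the farthest nodes inward.
        delta = {v: 0.0 for v in order}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: b / 2.0 for v, b in bc.items()}
```

On a five-node path 0-1-2-3-4 with `max_range=2`, each interior node receives a score of 1.0, whereas the unrestricted scores (`max_range=4`) are 3.0, 4.0, and 3.0. Each source explores only its neighborhood within `max_range` hops, which is what makes small ranges cheap on large networks.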
We have shown that the main mechanism behind the assortative nature of social networks is the tendency of individuals towards reciprocity in social communications. Using a single fitting parameter (the inverse social temperature), our generative model is able to capture all global structural properties of the real-world social network inferred from cell-phone logs. Broader and translational impact: Many of the algorithms, methods, and theories have already been translated and applied by our team to real-world networks in other domains, such as climate interaction networks, health-care (biomedical informatics) networks, privacy-preserving social-network-based medical recommendation systems, international agro-food trade networks, and brain neuronal networks.
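For reference, the maximum entropy construction behind the generative model described above takes the following generic form. Maximizing the Shannon entropy over network configurations G, subject to normalization and a fixed expected value of some cost function H(G), yields a Gibbs distribution in which the Lagrange multiplier β plays the role of an inverse temperature. Here H(G) is a placeholder: the specific function used in the SIBS is not given in this summary, and β is assumed to correspond to the inverse social temperature.

```latex
\[
  P(G) = \frac{e^{-\beta H(G)}}{Z(\beta)},
  \qquad
  Z(\beta) = \sum_{G'} e^{-\beta H(G')}
\]
```

In this form, a large β concentrates probability on configurations with low H(G), while β close to zero approaches a uniform distribution over configurations.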