With hundreds of millions of users worldwide, social networks provide incredible opportunities for social interactions, entertainment, learning, and political and social change. Hence, there is a growing interest in understanding information diffusion over online social networks. Because many social interactions currently take place in online networks, social scientists have access to unprecedented amounts of information about social interaction. Prior to the advent of such online networks, these investigations required resource-intensive activities such as random trials, surveys, and manual data collection to gather even small data sets. Now, massive amounts of information about social networks and social interactions are recorded. This wealth of data can allow social scientists to study social interactions on a scale and at a level of detail that has never before been possible. However, social scientists are not traditionally trained in techniques to deal with the massive amounts of data produced by online social networks. Computer scientists who specialize in databases and knowledge discovery have experience with querying and analyzing enormous amounts of data in a scalable fashion, but they may not be aware of the types of information that are most relevant to understanding social processes. Moreover, it is often necessary to alter theories and models developed in research on traditional social networks to incorporate new features of interactions in online networks, or even to develop entirely new models of social processes. Creating and validating these new models requires familiarity with social science techniques for modeling social interactions, as well as knowledge of techniques for modeling and analyzing complex networks that can scale to millions or even billions of users.
The project brings together an interdisciplinary team of computer scientists with expertise in databases and data mining, network modeling and analysis, and social media, led by Dr. Divyakant Agrawal at the University of California-Santa Barbara, to develop computational approaches to model and predict a number of important phenomena in social networks, such as information diffusion and opinion formation. The team is developing new algorithms and analytical and computational tools that can effectively cope with the massive size of social networks and the data produced within such networks. These tools are being designed to account for the complex nature of human behavior by incorporating the spatial, temporal, and relationship-based aspects of social interactions. The long-term goals of this project are to develop tools that help better understand social interactions in online networks, to develop reliable and scalable models to predict the outcomes of such social processes, and to create applications that can shape such outcomes. The project advances the current state of the art in querying and analysis of massive datasets, modeling and analysis of complex networks, and analysis of social media and social interactions, including, in particular, the interplay between multiple simultaneous information diffusion processes. This is a high-risk, potentially high-payoff research effort due in part to the challenges associated with obtaining and analyzing data from social networks and social media. In order to model social network entities and interactions, the research team needs access to datasets from online social networks to build, verify, and validate models. Similarly, discovering information, knowledge, and user behavior in online social networks requires access to social network datasets. In general, acquiring such datasets is a significant challenge due to privacy issues and the proprietary nature of many social network sites.
This Early Concept Grant for Exploratory Research (EAGER) project provides a rich set of data repositories and evaluation metrics for conducting large-scale investigations involving social networks and social media. The project addresses the data challenge by assembling large data sets from weblog postings and Twitter messages from millions of users, as well as appropriately anonymized data that capture the interactions between participants in social networks such as Facebook. The broad dissemination of the resulting datasets and tools will lower the barrier to (and reduce the risk associated with) entry into Social Informatics and Computational Social Sciences for researchers with diverse backgrounds and expertise. The resulting datasets and tools are likely to stimulate fundamental advances in several subdisciplines within Computer Science, including algorithm design, network modeling and analysis, and data mining and knowledge discovery (among others). The project provides unique opportunities for broadening the participation of underrepresented minorities and women in Computer and Information Sciences, especially those motivated by real-world applications in the social sciences (e.g., understanding social interactions). The results of the project will be disseminated through the project web pages at http://cs.ucsb.edu/~dsl/?q=content/data-driven-framework-analyzing-user-interactions-social-media.