The Problem: Large-scale social media and social interaction data, such as tweets, blogs, and discussion forums, are becoming increasingly available. The patterns of information diffusion across social networks and social media are generally hidden. Modeling these historical social interactions holds great promise for understanding and optimizing information diffusion in social networks. Such models also have practical impact, such as promoting activity in health care discussion forums and accelerating the dissemination of ideas in scientific communities. However, most previous approaches to social network analysis focus on qualitative, macroscopic, explanatory analysis of network behavior rather than on quantitative, microscopic, predictive models, and it is difficult to use such approaches for subsequent optimization and management of information diffusion. Thus there is a great need for a robust, predictive modeling framework that leverages large-scale historical social interaction data and can adapt to the complexity and heterogeneity of social interactions.
Aim: The goal of this project is to develop a set of robust machine learning methods for modeling and optimizing information diffusion processes based on complex and noisy interaction data. The project consists of a pipeline of four components: (i) develop a novel probabilistic framework for modeling and reasoning about cascades of events in social networks; (ii) develop nonparametric kernel methods to capture the complexity and heterogeneity of social interactions; (iii) develop efficient online and batch optimization algorithms for estimating the diffusion models from large datasets; and (iv) optimize information diffusion and promote social interaction using the predictions of the estimated models.
Technical Innovation and Merit: We will make novel use, in the social network context, of event history analysis, which is typically used for medical data analysis.
This provides us with a principled and overarching framework for addressing all four components of the project. The combination of event history analysis and kernel methods also reveals a connection between the information diffusion modeling problem and the group lasso statistical estimation problem. This connection allows us to bring recently developed sparse recovery theory to bear on social network problems, such as the discovery of information diffusion channels, and to formally study the conditions and statistical guarantees for such recovery.
Our proposed research has wide-ranging applications in health discussion forums; TuDiabetes, which is operated by the Diabetes Hands Foundation, will serve as a testbed. This project has the potential to improve the engagement of people in the discussion forum and to foster greater social good for diabetes patients. The proposed research also brings together several research areas, such as event history analysis, kernel methods, graphical models, and sparse recovery theory, to study social network problems.