The Problem: Large-scale social media and social interaction data, such as tweets, blogs, and discussion forums, are becoming increasingly available. The patterns of information diffusion across social networks and social media are generally hidden. Modeling these historical social interactions holds great promise for understanding and optimizing information diffusion in social networks. Such models also have practical impact, such as promoting activity in health care discussion forums and accelerating the dissemination of ideas in scientific communities. However, most previous approaches to social network analysis focus on qualitative, macroscopic explanatory analysis of network behavior rather than quantitative, microscopic predictive models, which makes them difficult to use for subsequent optimization and management of information diffusion. Thus there is a great need for a robust, predictive modeling framework that leverages large-scale historical social interaction data and can adapt to the complexity and heterogeneity of social interactions.

Aim: The goal of this project is to develop a set of robust machine learning methods for modeling and optimizing information diffusion processes based on complex and noisy interaction data. The project consists of a pipeline of four components: (i) develop a novel probabilistic framework for modeling and reasoning about cascades of events in social networks; (ii) develop nonparametric kernel methods to capture the complexity and heterogeneity of social interaction; (iii) develop efficient online/batch optimization algorithms for estimating the diffusion models from large datasets; and (iv) optimize information diffusion and promote social interaction using the predictions of the estimated models.

Technical Innovation and Merit: We will make novel use of event history analysis, typically used for medical data analysis, in the social network context. This provides a principled, overarching framework for addressing all four components of the project. The combination of event history analysis and kernel methods also reveals a connection between the information diffusion modeling problem and the group lasso statistical estimation problem. This allows us to bring recently developed sparse recovery theory to social network problems such as the discovery of information diffusion channels, and to formally study the conditions and statistical guarantees for such recovery.
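
To make the event history view of a cascade concrete, the following minimal sketch scores one observed cascade with a survival-style log-likelihood. It is an illustration only, not the project's model: it assumes a constant (exponential) hazard for each ordered pair of users, whereas the abstract proposes nonparametric kernel methods, and the names `cascade_log_likelihood`, `times`, `rates`, and `horizon` are hypothetical.

```python
import math


def cascade_log_likelihood(times, rates, horizon):
    """Log-likelihood of one cascade under constant pairwise hazards.

    times:   dict node -> event time (nodes not listed never had an event)
    rates:   dict (j, i) -> hazard rate from j to i; missing pairs count as 0
    horizon: end of the observation window

    Assumption (not from the source): exponential transmission times,
    so the hazard from j to i is the constant rates[(j, i)].
    """
    nodes = {n for pair in rates for n in pair} | set(times)
    source_time = min(times.values())
    total = 0.0
    for i in nodes:
        t_i = times.get(i)
        if t_i is not None and t_i == source_time:
            continue  # the cascade source is conditioned on, not modeled
        # Node i is "at risk" until its own event, or until the horizon.
        end = t_i if t_i is not None else horizon
        hazard_sum = 0.0
        for j, t_j in times.items():
            if j == i or t_j >= end:
                continue
            rate = rates.get((j, i), 0.0)
            total -= rate * (end - t_j)  # log-survival against exposure from j
            hazard_sum += rate
        if t_i is not None:
            if hazard_sum <= 0.0:
                return float("-inf")  # an event with no possible parent
            total += math.log(hazard_sum)  # log of the total hazard at t_i
    return total


# Toy cascade: "a" starts at time 0, "b" follows at time 1, "c" never reacts.
toy_times = {"a": 0.0, "b": 1.0}
toy_rates = {("a", "b"): 2.0, ("a", "c"): 0.5, ("b", "c"): 1.0}
toy_ll = cascade_log_likelihood(toy_times, toy_rates, horizon=3.0)
```

In this formulation each user's incoming rates form a natural group of parameters, which is one way to see how a group-sparsity penalty could enter the estimation problem; the sketch itself applies no penalty.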

Public Health Relevance

Our proposed research has wide-ranging applications in health discussion forums; TuDiabetes, which is operated by the Diabetes Hands Foundation, will serve as a testbed. This project has the potential to improve the engagement of people in the discussion forum and foster greater social good for diabetes patients. The proposed research also brings together several research areas, such as event history analysis, kernel methods, graphical models, and sparse recovery theory, to study social network problems.

National Institutes of Health (NIH)
National Institute of General Medical Sciences (NIGMS)
Research Project (R01)
Study Section
Special Emphasis Panel (ZRG1-BST-N (52))
Program Officer
Marcus, Stephen
Georgia Institute of Technology
Other Domestic Higher Education
United States
Yan, Junchi; Cho, Minsu; Zha, Hongyuan et al. (2016) Multi-Graph Matching via Affinity Optimization with Graduated Consistency Regularization. IEEE Trans Pattern Anal Mach Intell 38:1228-42
Perry, Thomas Ernest; Zha, Hongyuan; Zhou, Ke et al. (2014) Supervised embedding of textual predictors with applications in clinical diagnostics for pediatric cardiology. J Am Med Inform Assoc 21:e136-42
Du, Nan; Liang, Yingyu; Balcan, Maria-Florina et al. (2014) Learning Time-Varying Coverage Functions. Adv Neural Inf Process Syst 2014:
Daneshmand, Hadi; Gomez-Rodriguez, Manuel; Song, Le et al. (2014) Estimating Diffusion Network Structures: Recovery Conditions, Sample Complexity & Soft-thresholding Algorithm. Proc Int Conf Mach Learn 2014:793-801
Du, Nan; Liang, Yingyu; Balcan, Maria-Florina et al. (2014) Influence Function Learning in Information Diffusion Networks. Proc Int Conf Mach Learn 2014:2016-2024
Farajtabar, Mehrdad; Du, Nan; Rodriguez, Manuel Gomez et al. (2014) Shaping Social Activity by Incentivizing Users. Adv Neural Inf Process Syst 27:
Du, Nan; Song, Le; Gomez-Rodriguez, Manuel et al. (2013) Scalable Influence Estimation in Continuous-Time Diffusion Networks. Adv Neural Inf Process Syst 26:3147-3155
Li, Liangda; Zha, Hongyuan (2013) Dyadic Event Attribution in Social Networks with Mixtures of Hawkes Processes. Proc ACM Int Conf Inf Knowl Manag :1667-1672