Understanding and analyzing how our world is connected is a critical but new challenge, driven by technological advances in personal computers, mobile devices, and local and global Internet connectivity. Most current methods for social media analysis, inference, and understanding are based on textual data. However, images make up an increasingly large proportion of social media data. Hence, there is an urgent need for tools that can effectively use image data to extract important information and infer the patterns and activities of people, communities, and society at large.

This project combines advances in computer vision, machine learning, and social networks in novel ways to develop new methods for understanding and analyzing large-scale social media data. It pursues four interrelated aims: (i) establishing a large-scale visual concept ontology and structures for the web-image world via crowdsourcing, taxonomy induction, and nonparametric learning methods; (ii) understanding activity in social networks by analyzing image content in the context of large-scale, connected social media; (iii) inferring the structure of social networks and communities from image content and the activity of individuals in social networks; and (iv) discovering and analyzing dynamic social media trends.

Anticipated products of this research include new tools for analyzing and modeling socially generated content, with special emphasis on image data. The resulting methods may provide useful insights that characterize users, communities, and societies across a broad range of applications. The project offers research-based advanced training opportunities for graduate and undergraduate students and involves the development of new courses on related topics at both Stanford University and Carnegie Mellon University.

Project Report

This proposal focuses on the problem of using large-scale image and video data to analyze and understand online social media. Specifically, our proposal aims to: 1) establish a large-scale visual concept ontology and structures for the web-image world; 2) understand nodal activity by analyzing image content in the context of large-scale, connected social media; 3) make link and role predictions to infer social networks and communities from image content; and 4) discover and analyze dynamic trends in social media using images. Supported by this award, the Statistical Artificial Intelligence and Integrative Genomics Lab (SAILING Lab) at Carnegie Mellon University, led by PI Professor Eric Xing, has carried out a comprehensive, in-depth investigation of these problems through new, mathematically well-founded machine learning models and algorithms, as well as software systems for visual analysis.

This research has led to a significant array of scientific, educational, and practical outcomes:

1) It has produced a large body of scientific results, including new algorithms for object detection in images based on submodular formulations, unusual event detection in video based on dynamic sparse coding, and image trend discovery based on time-varying Bayesian networks, as well as new publicly available software for processing large-scale image and video data. These results have led to about 15 publications in top peer-reviewed conferences in computer vision (e.g., CVPR, ICCV, ECCV), data mining (e.g., KDD, WSDM, WWW), and machine learning (e.g., NIPS).

2) It laid the computer vision foundation for a vibrant research lab, the SAILING Lab at CMU, which now hosts about 15-20 Ph.D. students and many postdocs at any time. In particular, this grant has fully or partially supported the work of about 3 Ph.D. students, who after completing their training at CMU are going on to become either faculty at leading institutions worldwide, such as Seoul National University, or entrepreneurs in computer vision startups, such as PanOptus Inc.

3) It has resulted in a large collection of open-source software available to the research community. The results have also been disseminated through tutorials and keynote lectures at research workshops, conferences, and university colloquia.

Our work has also generated significant public awareness and has been widely covered by the news media. Bin Zhao's video summarization work, called LiveLight, has been covered by CMU News, CBSNEWS, WIRED.CO.UK, VOA, WESA, DISCOVER, and many others. Gunhee Kim's work on Visualizing Brand Associations from Web Community Photos has been covered by CMU News, ScienceDaily, The Register, POP City, Futurity, and others.

Most previous research on image understanding has focused on medium-scale data sets involving a few thousand objects from dozens of categories. There is now a growing consensus that general-purpose object recognizers and large-scale image and video summarizers are needed to digest the explosion of video information. The sheer volume of video data and the diversity of its content present unique challenges for machine learning and computer vision. Our work has paved the way for intelligent systems able to unlock the massive content of visual data.

In parallel with the research plan, we proposed an education agenda to promote close interaction among computer vision, machine learning, and social media research. This goal has been well achieved so far.
The methodological advances resulting from this grant have been integrated into three courses taught at CMU: Graduate Machine Learning, Advanced Machine Learning, and Graphical Models. Over the years, these courses have been attended by hundreds of graduate students and have had a far-reaching influence on the students' research and on later instructors of these courses. All three courses, originally developed by PI Eric Xing at CMU, have now become required courses in CMU graduate programs. In addition to basic research, we have pursued technology transfer and commercialization of the resulting technology. Gunhee Kim's work on image storyline summarization is now being used by the Disney Lab to develop a theme park tour guide. Bin Zhao's LiveLight system for automatic video summarization is being commercialized through a startup called PanOptus, which will provide intelligent systems that automatically summarize long, tedious consumer videos from mobile or wearable devices, surveillance videos from public and private surveillance systems, and military intelligence videos from drones. We are seeking further funding from NSF through the SBIR program. In summary, with funding from this NSF award, we have achieved the goals set out in our proposal and have made satisfactory contributions to scientific discovery, methodology development, tool production, and educational outreach. We thank NSF for its strong support throughout the duration of this project.

National Science Foundation (NSF)
Division of Information and Intelligent Systems (IIS)
Standard Grant
Program Officer: Vasant G. Honavar
Carnegie-Mellon University, United States