The Social Web is changing the way people create and use information. Sites such as Flickr, Del.icio.us, and Digg enable users to publish and organize content and to participate in communities. The information users create while interacting with content and with each other is called social metadata. Tags, one example of social metadata, were introduced as a means for individuals to organize their content by assigning freely chosen keywords to it. Some Social Web sites now also allow users to organize content hierarchically. The photo-sharing site Flickr, for example, allows users to group related photos in sets, and related sets in collections. Although social metadata lacks formal structure, it captures the collective knowledge of the community. Once extracted from the traces left by many users, such collective knowledge will add a rich semantic layer to the content of the Social Web. This project will develop a probabilistic framework that combines diverse types of social metadata to construct a common concept hierarchy. In addition, the methods developed by the project will use social relations, in the form of community participation, to discover community-specific vocabularies and concepts, and to identify facets of multi-dimensional concepts.

In the future, Social Web sites and data management tools will allow users to express ever richer types of knowledge, including complex predicates and semantic relations. The ability to aggregate individually expressed knowledge into a unified whole will transform the way people use information. Global concept hierarchies, for instance, can help users visualize how their content relates to that of others and allow for more efficient browsing, search and discovery. By linking content to a common concept hierarchy, the methods developed by the project could also be used to integrate disparate data and align it across domains. The proposed work, therefore, addresses one of the more important emerging questions in AI, namely, how to harness the power of collective intelligence.

For further information about this project, please see the project Web site at www.isi.edu/~lerman/projects/folksonomy.html.

Project Report

The Social Web has revolutionized the production of knowledge. On sites like Wikipedia, Twitter, YouTube, and Flickr, users generate and publish content, annotate it with descriptive keywords, and interact with others. Although the social metadata contained in these keywords lacks formal structure, it captures the collective "folk" knowledge of the community. NSF-funded researchers from the University of Southern California (USC) Information Sciences Institute and the University of Maryland (UMD) have developed computational methods to extract such "folk" knowledge from the traces of online activity left by many users. Their methods merge the personal fragments of knowledge that users of the photo-sharing site Flickr create to organize their own photos into a common, deeper taxonomy of concepts. Once mined from data created by many users, such knowledge will add a rich semantic layer to user-generated content on the Social Web. It will also help people visualize how their content relates to that of others and allow for more efficient browsing and knowledge discovery.

The USC and UMD researchers worked with social metadata from Flickr. The site allows users to upload photos, tag them with descriptive labels, and organize them within personal directories. Although the site itself does not impose constraints on how these directories are created and used, individuals generally employ them to represent intuitive relations between concepts, for example by creating a folder "people" with sub-folders "family" and "friends". The figure above shows two personal directories created by one user: one for the places in Africa he traveled to, and the other for holiday photos. The research team developed a computational method to automatically learn taxonomies of concepts, which they call folksonomies, from thousands of such personal directories. Their method extends a powerful distributed inference algorithm to concurrently combine many small structures (user-generated directories) into a larger, more comprehensive structure (a communal folksonomy).

The researchers showed that their method learns accurate and complete folksonomies. The figure above shows one such folksonomy, of places related to Africa, which was learned automatically by their method. This folksonomy is more complete than the one specified by any individual user, and it includes knowledge that may not be found in a knowledge base created by experts. We learn, for example, that people see "South Africa" both as a place that contains other places, such as "Cape Town" and "Soweto", and as a destination for observing "Rhinos", "Lions" and "Antelopes". The researchers have also developed computational methods that leverage the diversity of expertise among users to learn more comprehensive and accurate folksonomies, as well as novel approaches to validating learned folksonomies. Their research shows the potential to unleash structured knowledge from the massive amount of user-generated content.
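
The idea of combining many small personal directories into one communal hierarchy can be illustrated with a toy sketch. The code below is not the researchers' distributed inference method, which the report does not specify in detail; it swaps in a much simpler majority vote over parent-child edges. The function name `merge_directories`, the `min_support` threshold, and the sample directories are all hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict


def merge_directories(directories, min_support=2):
    """Merge personal (parent, child) directory edges by simple voting.

    A stand-in illustration only, not the project's inference algorithm.
    `directories` holds one list of (parent, child) edges per user.
    """
    # Count how many users expressed each parent -> child relation.
    votes = Counter()
    for edges in directories:
        normalized = {(p.lower(), c.lower()) for p, c in edges}
        for edge in normalized:
            votes[edge] += 1

    # For every child keep the single best-supported parent.
    best = {}
    for (parent, child), count in votes.items():
        if count < min_support:
            continue  # too few users agree on this relation
        if votes[(child, parent)] >= count:
            continue  # the reversed relation is at least as popular
        if child not in best or count > best[child][1]:
            best[child] = (parent, count)

    # Assemble the parent -> sorted children mapping.
    tree = defaultdict(list)
    for child, (parent, _) in best.items():
        tree[parent].append(child)
    return {parent: sorted(children) for parent, children in tree.items()}


# Hypothetical personal directories from three users.
example_dirs = [
    [("Africa", "South Africa"), ("South Africa", "Cape Town")],
    [("Africa", "South Africa"), ("South Africa", "Rhinos")],
    [("South Africa", "Cape Town"), ("South Africa", "Rhinos"),
     ("Africa", "Kenya")],
]

folksonomy = merge_directories(example_dirs)
print(folksonomy)
```

In this sketch a relation survives only if at least `min_support` users express it, so the single-user edge from "Africa" to "Kenya" is dropped. Simple voting of this kind does not rule out longer cycles or resolve ambiguous concept names, which is part of why a more principled inference approach is needed for real data.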

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Type: Standard Grant (Standard)
Application #: 0812677
Program Officer: Vasant G. Honavar
Project Start:
Project End:
Budget Start: 2008-09-01
Budget End: 2012-08-31
Support Year:
Fiscal Year: 2008
Total Cost: $482,000
Indirect Cost:

Name: University of Southern California
Department:
Type:
DUNS #:
City: Los Angeles
State: CA
Country: United States
Zip Code: 90089