This project will investigate a new generation of provenance-rich social knowledge collection systems that will greatly improve the ability of people to create online communities of interest and share information. The research will transform the state of the art in social content collection in several important ways. First, social knowledge collection systems will be augmented to help contributors structure factual content, so that information can be aggregated to answer simple but reasonably interesting factual queries. We will build on a semantic wiki framework to allow users to create structured factual content as object-property-value triples. The research will not assume pre-defined ontologies, but will instead develop algorithms that analyze current content and suggest opportunities for structuring contributions so they can be aggregated to answer simple queries. Second, the systems will include detailed provenance records that reflect how the content was created, allowing contributors to enter alternative viewpoints and enabling consumers to make quality and trust judgments. The research will develop algorithms that derive trust metrics from the provenance records and that allow users to define views on the content based on provenance criteria. It will create novel approaches to propagate trust across content topics and categories, complementing existing algorithms that propagate trust in social networks. Third, the systems will proactively guide contributors to invest effort where it is most needed, by developing novel algorithms to detect knowledge gaps and by allowing users to define queries that will be used to drive further contributions.
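As a rough illustration of the ideas in this abstract, the sketch below stores object-property-value triples together with a minimal provenance record, answers a simple aggregate query, lists alternative viewpoints, and flags possible knowledge gaps. It is not the project's implementation: the class and function names, the sample data, and the gap heuristic (a property used by at least half of all objects is expected on every object) are all assumptions made for this example, and trust metrics are not covered.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """One structured assertion plus a minimal provenance record."""
    obj: str          # the thing being described
    prop: str         # the property asserted about it
    value: str        # the asserted value
    contributor: str  # provenance: who entered the assertion


def query(triples, prop, value):
    """Return the sorted objects for which `prop` was asserted to equal `value`."""
    return sorted({t.obj for t in triples if t.prop == prop and t.value == value})


def alternative_viewpoints(triples):
    """Map (obj, prop) to {value: contributors} wherever asserted values differ."""
    seen = defaultdict(lambda: defaultdict(set))
    for t in triples:
        seen[(t.obj, t.prop)][t.value].add(t.contributor)
    return {key: {v: sorted(c) for v, c in values.items()}
            for key, values in seen.items() if len(values) > 1}


def knowledge_gaps(triples, min_share=0.5):
    """Suggest missing (obj, prop) pairs using a hypothetical heuristic:
    a property used by at least `min_share` of objects is expected on all."""
    props_by_obj = defaultdict(set)
    for t in triples:
        props_by_obj[t.obj].add(t.prop)
    usage = defaultdict(int)
    for props in props_by_obj.values():
        for p in props:
            usage[p] += 1
    total = len(props_by_obj)
    common = {p for p, count in usage.items() if count / total >= min_share}
    return sorted((o, p) for o, props in props_by_obj.items() for p in common - props)


# Made-up sample content; none of it comes from the project.
triples = [
    Triple("dataset_a", "format", "csv", "alice"),
    Triple("dataset_a", "region", "pacific", "alice"),
    Triple("dataset_a", "region", "atlantic", "bob"),
    Triple("dataset_b", "format", "csv", "carol"),
    Triple("dataset_c", "format", "netcdf", "bob"),
    Triple("dataset_c", "region", "pacific", "carol"),
]

print(query(triples, "format", "csv"))
print(alternative_viewpoints(triples))
print(knowledge_gaps(triples))
```

Keeping the contributor on every triple is what lets conflicting values coexist as alternative viewpoints rather than overwriting one another, and the same records could later feed a trust metric or a provenance-based view.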

This work has the potential for broader impact in the many areas where social content collection is already widely used, not only in scientific communities but also in societal domains such as citizen participation in local communities, health, and governance. All of these communities would benefit from further structure, provenance models, and guided knowledge collection. Despite their popularity, social content collection sites currently have important limitations. First, because the content has very little structure, they cannot aggregate information or answer many simple questions. Second, contributors have uneven expertise and skills, so the quality of the content varies widely, yet consumers get no assistance in telling the valuable from the dubious. Third, these sites depend on the initiative of contributors to figure out how the content needs to grow, and there is no systematic analysis to expose knowledge gaps and guide contributors proactively. This research project addresses all three of these issues.

Project Report

This project investigates a new generation of provenance-rich social knowledge collection systems that greatly improve the ability of people to create online communities of interest and share information. The research is transforming the state of the art in social content collection in several important ways. First, we have augmented social knowledge collection systems to help contributors structure factual content, so that information can be aggregated to answer simple but reasonably interesting factual queries. Second, these systems will be extended to include detailed provenance records that reflect how the content was created, allowing contributors to enter alternative viewpoints and enabling consumers to make quality and trust judgments. Third, the systems will proactively guide contributors to invest effort where it is most needed, by developing novel algorithms to detect knowledge gaps and by allowing users to define queries that will be used to drive further contributions. We have investigated the use of these results in science. One application allows scientists to formulate and resolve science tasks through an open framework that facilitates ad-hoc participation and entices collaborators with attractive science goals. Another application allows scientists to collaborate on creating useful metadata to describe and aggregate datasets, which can improve scientific data sharing.

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Type: Standard Grant (Standard)
Application #: 1117281
Program Officer: William Bainbridge
Project Start:
Project End:
Budget Start: 2011-09-01
Budget End: 2014-08-31
Support Year:
Fiscal Year: 2011
Total Cost: $490,000
Indirect Cost:
Name: University of Southern California
Department:
Type:
DUNS #:
City: Los Angeles
State: CA
Country: United States
Zip Code: 90089