This is a study of tagging, the assignment of labels to information objects by users, and of the "folksonomy" categorization systems that can result. By the 19th century, increasing amounts of information were being published, and it was clear that efficient methods of organization were needed for the information to be accessible. In response, categorization schemes such as the Library of Congress Classification and the Dewey Decimal System were invented. The overall information dissemination system had clear roles and divisions of labor: editors decided what got published, information professionals categorized published works, and most people simply consumed the results. The Internet has toppled this traditional approach. There is no publication barrier, so orders of magnitude more information is available online, and information professionals cannot keep up. However, new technologies have arisen that work in this context, notably tagging. Any user can associate tags with items such as documents, movies, or photos, and the tags serve as keys for retrieval. Since tags can be created by any user, the number of tags contributed scales with a community's size; thus, tagging works at Internet scale. Tagging also lets users represent their own perspectives, which aids retrieval.
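
As an illustration of how tags can serve as retrieval keys, the following is a minimal sketch, not the project's software. The class name `TagIndex`, its methods, and the choice to normalize tags to lowercase are all assumptions made for this example.

```python
from collections import defaultdict


class TagIndex:
    """Hypothetical in-memory tag store: tag -> item -> users who applied it."""

    def __init__(self):
        self._users_by_tag_and_item = defaultdict(lambda: defaultdict(set))

    @staticmethod
    def _normalize(tag):
        # Assumption: tags are compared case-insensitively, ignoring outer spaces.
        return tag.strip().lower()

    def tag(self, user, item, tag):
        """Record that `user` applied `tag` to `item`."""
        self._users_by_tag_and_item[self._normalize(tag)][item].add(user)

    def find(self, *tags):
        """Return the set of items that carry every one of the given tags."""
        if not tags:
            return set()
        item_sets = [
            set(self._users_by_tag_and_item.get(self._normalize(t), {}))
            for t in tags
        ]
        return set.intersection(*item_sets)


index = TagIndex()
index.tag("alice", "Blade Runner", "sci-fi")
index.tag("bob", "Blade Runner", "Noir")
index.tag("bob", "Alien", "sci-fi")
print(index.find("sci-fi", "noir"))
```

Because any user may call `tag`, the index grows with the community rather than with the number of professional catalogers, which is the scaling property described above.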

However, tagging is a young technology, with significant challenges and unmet potential. Individual tags are often of poor quality, and many tagging systems are globally incoherent. Empirical evaluations of tagging systems in use are few, and formal comparisons with traditional approaches have not been done. Tagging applications have been limited mainly to search. This project addresses these challenges. It will develop a firmer scientific understanding of the strengths and weaknesses of tagging as a categorization method, and it will explore the potential of tagging to enable powerful applications beyond information retrieval. The project consists of three main research activities: (1) creating a set of metrics to quantify the value of a categorization structure, and using these metrics in formal and empirical comparisons of tagging systems with traditional categorizations; (2) designing mixed-initiative interaction techniques that let computational agents and people detect, evaluate, and resolve problems in tagging systems; (3) developing novel tag-based applications that let users express their preferences and navigate complex information spaces.

This research will create both information-theoretic and usage-based metrics to measure the value of a categorization structure. Studies will be done to show relations between the two types of metrics, letting designers predict, for example, how many tags per item are required for effective user search. Systematic cost-benefit comparisons of tagging systems with traditional expert categorizations will be done, thus providing empirical data to a debate that has been characterized by heated conjecture. The utility and generality of a set of mixed-initiative interaction techniques and novel applications will be established by (a) implementing them on multiple platforms, and (b) evaluating them in careful field experiments.
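
The abstract does not specify which information-theoretic metrics will be used. As one hedged illustration of the general idea, the sketch below computes the Shannon entropy of a tag distribution: a vocabulary in which every item carries the same tag discriminates nothing (0 bits), while evenly used tags carry more information per application. The function name `tag_entropy` and the choice of entropy itself are assumptions for this example, not the project's metrics.

```python
import math
from collections import Counter


def tag_entropy(tag_applications):
    """Shannon entropy, in bits, of the empirical tag distribution.

    `tag_applications` is an iterable with one entry per tag application,
    for example ["sci-fi", "noir", "sci-fi"]. Illustrative assumption only.
    """
    counts = Counter(tag_applications)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    entropy = 0.0
    for count in counts.values():
        p = count / total
        entropy -= p * math.log2(p)
    return entropy


print(tag_entropy(["sci-fi", "noir", "sci-fi", "noir"]))
```

A usage-based counterpart would be measured from logs instead, for instance how often a tag search ends in a click, and the studies described above would relate the two kinds of measurement.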

Improving the effectiveness of tagging will help millions of users find the information, products, and services they seek. More directly, the techniques of this project will be implemented in four working online communities, serving movie viewers, cyclists, ethics researchers, and politically interested citizens. Collectively these sites have tens of thousands of users, all of whom will benefit directly. Many students will be trained, learning multiple research methods and gaining valuable experience with real online communities. Finally, the software will be developed under an open source license and datasets will be published, supporting the work of other researchers and web site developers.

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Type: Standard Grant (Standard)
Application #: 0964695
Program Officer: William Bainbridge
Project Start:
Project End:
Budget Start: 2010-04-15
Budget End: 2014-03-31
Support Year:
Fiscal Year: 2009
Total Cost: $981,788
Indirect Cost:
Name: University of Minnesota Twin Cities
Department:
Type:
DUNS #:
City: Minneapolis
State: MN
Country: United States
Zip Code: 55455