Online Question-Answering Sites (Q&A sites, for short) are a new and rapidly growing piece of the community-created content landscape. From Knowledge iN in South Korea to Microsoft's Live QnA in the U.S., to Yahoo! Answers in 26 countries world-wide, users are flocking to sites where they can post questions and get answers. Yahoo! Answers alone attracts about 18 million unique visitors monthly and has accumulated over 400 million answers since its launch in 2005. Not only do these sites meet individual needs for information, the content they generate is an important source for online search and knowledge discovery. However, a casual browse through any one site reveals significant noise in the signal. Too many questions receive sarcastic or even insulting answers. Too often, it seems, users re-ask a question rather than finding value in the answers already posed.

Because the dramatic growth of Q&A sites is so recent, there has been little opportunity to empirically investigate them as a new information resource and as a new type of social space. This project examines five Q&A sites using a variety of observational and experimental methods to gain an understanding of how users interact in and with these sites while also developing tools to help users better meet their goals. Specifically, this project will: 1.) identify structures or properties in questions posed on Q&A sites that affect response characteristics such as quantity, quality, and timeliness; and explore the use of templates, critics, and bots to help question-askers obtain better responses; 2.) identify structures or properties in response threads that suggest high risk of failure; and explore the use of bots to intervene and "rescue" derailed Q&A threads; 3.) understand the lifecycle of Q&A site participants, including their social interaction on the sites; and develop tools to help support user integration into online Q&A communities.

Given their importance as an information resource and social phenomenon, understanding online Q&A sites has intellectual merit in its own right. In addition, this work advances an important line of research on online and computer-mediated communications that has helped rhetoric and communications experts contribute to the design of more effective online communication tools. It will also be the first work to study online Q&A from both a social and an information resource perspective, giving new insight into the nature of online voluntary knowledge creation.

Broader Impacts: Online question-answering sites have become an important source of information and advice for individuals and businesses, as well as an important source of content for web search engines. By understanding these sites and developing tools to support their continued successful operation, we will help Q&A community designers understand approaches that can promote both beneficial social experiences for users and the construction of valuable community-contributed repositories of knowledge.

Project Report

This project was inspired by the wide variety of outcomes in existing online social question-answering communities, which range from broad-interest communities such as Yahoo! Answers and Ask Metafilter to narrower topical communities such as StackOverflow (for programming) and TurboTax Live Community (for tax preparation). While we and others had done initial experimental research showing that these communities are capable of producing high-quality answers, even to challenging questions, we did not know enough about how they work. In particular, we did not know what motivates the most valuable community members to contribute substantial time and effort to helping others through question-answering and, in the process, to creating an archive of searchable information that others later use.

Our approach was two-pronged. Through a collaboration between computer scientists and rhetoricians, we were able to study the communication and behaviors of online question asking while also developing computational tools to identify promising contributors, to identify interesting question-asking situations, and the like. Among the results of this interdisciplinary collaboration were the following outcomes:

* Development and validation of the first comprehensive taxonomy of question types asked online. This six-category taxonomy is both interesting in its own right and useful for processing questions. We have shown empirically that different categories of questions inspire different degrees of response, produce different levels of reusable content, and fit with or are out of place in different communities. We have also developed classifiers for parts of this taxonomy.
* Development of a suite of data mining and machine learning tools for identifying promising participants in a question-answering community, specifically people who have high potential to become top-level question answerers. We experimented with a variety of techniques for identifying such people (ratings of initial answers, frequency of answering, etc.), but eventually developed a new measure that outperformed the rest: high performers are particularly good at directing their effort to where it is useful (i.e., answering questions that need answers, rather than piling another answer onto the most-answered ones).
* A lifecycle study of high contributors, bringing together a mix of survey research, interviews, and artifact analysis to understand how top contributors behave and why they behave that way. Among the interesting findings is that top contributors are far from uniform: there are different roles that people settle into, each of them useful and based on different skills and interests.
* Along the way, we developed a number of useful computational tools. We developed classifiers to identify the likely lifespan of a question (i.e., how long its answers will remain good). Such a classifier can distinguish a question such as "How many Japanese civilians were killed in WWII?", which has a long lifespan, from "Who are the Twins playing this evening?", which has a short one. We also experimented with different techniques for entering items from restricted sets (e.g., movies) to find the ones that are most successful without distracting users from their underlying question or answer.

Along the way, we are proud to have conducted research in a manner designed to educate as many students as possible. We trained five doctoral students and twelve undergraduates in research, exposing them to the excitement and challenges of interdisciplinary research. All the doctoral students and many of the undergraduates succeeded to the point of publishing peer-reviewed research papers, and many had the experience of making oral presentations of their research.
Finally, we engaged in outreach to the broader research community, participating in a variety of panels and workshops and publishing broad-interest articles to help raise awareness of the potential of Q&A systems beyond our research and into practice.
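
The report describes the effort-direction measure only in words (high performers answer questions that still need answers rather than adding to the most-answered ones) and does not give a formula. As a rough illustration only, the idea might be sketched as the fraction of a user's answers that went to questions with few prior answers. The function name, the `max_prior_answers` threshold, and the sample log below are all hypothetical and are not taken from the project.

```python
from collections import defaultdict


def effort_direction_scores(answer_log, max_prior_answers=1):
    """Return, per user, the share of their answers posted to questions
    that had at most `max_prior_answers` answers at the time.

    `answer_log` is a chronologically ordered list of (user, question_id)
    pairs. This is an assumed, simplified stand-in for the measure the
    report mentions, not the project's actual definition.
    """
    prior_counts = defaultdict(int)  # question_id -> answers seen so far
    needed = defaultdict(int)        # user -> answers to under-answered questions
    total = defaultdict(int)         # user -> all answers by that user

    for user, question_id in answer_log:
        if prior_counts[question_id] <= max_prior_answers:
            needed[user] += 1
        total[user] += 1
        prior_counts[question_id] += 1

    return {user: needed[user] / total[user] for user in total}


# Hypothetical sample data: "cat" keeps adding answers to the crowded q1.
sample_log = [
    ("ann", "q1"), ("bob", "q1"), ("cat", "q1"),
    ("ann", "q2"), ("cat", "q1"), ("cat", "q3"),
]
scores = effort_direction_scores(sample_log)
for user in sorted(scores):
    print(user, round(scores[user], 2))
```

On this toy log, users who answer questions with few existing answers score near 1, while a user who mostly adds to an already well-answered question scores lower. A real analysis would also need to choose the threshold and account for answer quality.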

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Type: Standard Grant (Standard)
Application #: 0812148
Program Officer: Kevin Crowston
Project Start:
Project End:
Budget Start: 2008-09-01
Budget End: 2013-02-28
Support Year:
Fiscal Year: 2008
Total Cost: $483,531
Indirect Cost:
Name: University of Minnesota Twin Cities
Department:
Type:
DUNS #:
City: Minneapolis
State: MN
Country: United States
Zip Code: 55455