"Crowdsourcing" is the idea of using the "wisdom of crowds", that is, combining large numbers of judgments by non-experts, to produce reliable answers to complex problems. In natural language processing (NLP), one such complex task is annotating sentences to show what events they express and which parts of each sentence express which participants. For example, the sentence "Maria rides the bus from home to her office" should be recognized as a Ride_vehicle event, with "Maria" as the Mover, "the bus" as the Vehicle, "from home" as the Source and "to her office" as the Goal. NLP systems should also be able to recognize the same event, with the same participants, in the sentence "Maria's bus ride from home to her office takes 40 minutes", but most current systems cannot.
FrameNet (http://framenet.icsi.berkeley.edu) is building a lexical database of hundreds of event types (called "semantic frames"), together with annotated example sentences for each, which can be used to train NLP systems. But expert annotation of sentences is slow and expensive. This project is testing whether crowdsourcing can speed up the creation of such databases by comparing two crowdsourcing techniques to see which works better for these tasks: (1) online games in which players compete to annotate quickly and accurately (similar to the "Verbosity" game), and (2) paying people small amounts of money to complete such tasks through Amazon's "Mechanical Turk" (www.mturk.com). If successful, these techniques could be used to build better databases for new NLP systems that really understand "who did what to whom", thus improving question answering and web search.
Background: Linguists and computer scientists at the International Computer Science Institute have been developing the FrameNet database since 1997 (http://framenet.icsi.berkeley.edu). It is designed to be useful both to humans, as an online dictionary, and as part of software systems that will "understand" ordinary language far more deeply than current systems do. This project studied ways to make developing FrameNet faster and less expensive. Human beings understand words largely by relating them to common situations, as in this example: She tossed the letter across the table to Jerry. You immediately know that the sentence can be broken into parts (She / tossed / the letter / across the table / to Jerry), and that the main idea comes from the verb "toss". "She" is the one who does the tossing, "the letter" is the thing that gets tossed, "across the table" describes the path the letter followed, and "to Jerry" tells where the letter ended up. FrameNet analyzes this sentence as an example of the Cause_motion frame and names the roles of its parts like this: [Agent She] TOSSED [Theme the letter] [Path across the table] [Goal to Jerry]. The FrameNet team has developed more than 1,110 definitions of such common situations (called semantic frames), ranging from getting a job, to being arrested, to changing leadership in an organization. They have defined the set of roles for each frame and manually labeled almost 200,000 example sentences with those roles, as in the bracketed example above. Other researchers, working independently, have used machine learning techniques to train classifiers on the FrameNet labeled data and have built software systems that can automatically label sentences in all sorts of text. These automatic semantic role labeling (ASRL) systems make it possible for computers to extract information and reason about the events and relations described in ordinary text such as news websites and blogs.
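To make the bracketed notation concrete, here is a minimal Python sketch of how one annotated sentence might be stored as labeled character spans and printed in the style above. The Span class and the span_of and to_brackets helpers are hypothetical illustrations, not FrameNet's actual data format or software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """A labeled stretch of a sentence, as character offsets [start, end)."""
    start: int
    end: int
    label: str  # a role name such as "Agent", or "TARGET" for the frame-evoking word


def span_of(sentence: str, phrase: str, label: str) -> Span:
    """Hypothetical helper: label the first occurrence of phrase in sentence."""
    start = sentence.find(phrase)
    if start < 0:
        raise ValueError(f"{phrase!r} not found in sentence")
    return Span(start, start + len(phrase), label)


def to_brackets(sentence: str, spans: list[Span]) -> str:
    """Render spans left to right: roles as [Role text], the target upper-cased.

    Text not covered by any span (here, the final period) is omitted.
    """
    parts = []
    for span in sorted(spans, key=lambda s: s.start):
        text = sentence[span.start:span.end]
        parts.append(text.upper() if span.label == "TARGET" else f"[{span.label} {text}]")
    return " ".join(parts)


# The Cause_motion example from the text.
sentence = "She tossed the letter across the table to Jerry."
spans = [
    span_of(sentence, "She", "Agent"),
    span_of(sentence, "tossed", "TARGET"),
    span_of(sentence, "the letter", "Theme"),
    span_of(sentence, "across the table", "Path"),
    span_of(sentence, "to Jerry", "Goal"),
]
print(to_brackets(sentence, spans))
# [Agent She] TOSSED [Theme the letter] [Path across the table] [Goal to Jerry]
```

Storing offsets rather than copied strings keeps each role tied to an exact position in the sentence, which is the kind of information a role-labeling classifier would be trained on.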
ASRL systems based on FrameNet are currently being used for tasks such as recognizing different types of events in military texts. Accurate, automatic semantic role labeling is a vital step toward helping computers really interpret natural language.
Approach: The process of defining frames and manually labeling text, however, is time-consuming and requires experienced staff. The purpose of this EAGER project was to find out whether some parts of this process could be done more quickly and cheaply using crowdsourcing. The basic approach is to break the problem into small, simple tasks that can be done quickly and easily, use Amazon Mechanical Turk (AMT) to distribute these tasks to many workers, collect many responses to each task, and combine them to get the result. This approach is known to work well for tasks such as describing pictures in a few words or classifying music tracks as cheerful or sad. Defining frames and connecting them to existing frames is relatively complicated, so we decided to concentrate on labeling sentences, using sentences in which one word had already been automatically marked as the target. In many cases, the target word has more than one sense; in FrameNet terms, it could belong to one of several different frames. So the first step was to ask the AMT workers to look at ambiguous sentences and decide which frame the target word fits into. For example: (1) We HEADED along the ridge for about three miles. (2) George Yeo was appointed to HEAD the new Ministry for Information and Arts... (3) Most single parent families are HEADED by women; just 10 per cent are single fathers. Clearly, (2) and (3) are similar, while (1) is different. (1) is an instance of the Self_motion frame, which includes head along with dozens of other verbs such as march, saunter, waddle, and walk. (2) and (3) are instances of head in the Leadership frame, which includes verbs such as administer, command, and rule, and nouns such as king, chief, and CEO.
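The report does not say exactly how the many worker responses to each task were combined. As one common option, assumed here rather than taken from the project, the sketch below takes a simple majority vote over the frame labels that workers gave a sentence and accepts the winner only when agreement passes a threshold. The aggregate function, the 0.6 threshold, and the worker responses are all hypothetical.

```python
from collections import Counter
from typing import Optional


def aggregate(labels: list[str], min_agreement: float = 0.6) -> Optional[str]:
    """Majority vote over worker labels for one sentence.

    Returns the most frequent frame label if its share of the votes is at least
    min_agreement (an assumed threshold); otherwise returns None so the sentence
    can be set aside for expert review.
    """
    if not labels:
        return None
    label, count = Counter(labels).most_common(1)[0]
    return label if count / len(labels) >= min_agreement else None


# Hypothetical responses from five workers for two of the example sentences.
responses = {
    "(1)": ["Self_motion", "Self_motion", "Self_motion", "Leadership", "Self_motion"],
    "(3)": ["Leadership", "Leadership", "Self_motion", "Leadership", "Leadership"],
}
for sentence_id, labels in responses.items():
    print(sentence_id, aggregate(labels))
# (1) Self_motion
# (3) Leadership
```

With a threshold above 0.5, a tied vote can never be accepted, so evenly split sentences fall through to None instead of being labeled arbitrarily.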
Results: We tried several ways of presenting this choice to the workers. In the final version, we displayed one randomly chosen example sentence for each frame (like (1) and (2) above) and asked the workers to group each new sentence (like (3)) with one of them. Using this system, AMT workers were able to classify hundreds of sentences into frames quickly and accurately. Unfortunately, we have not yet been able to build a good system to crowdsource the remaining subtask, role labeling. We hope to solve this problem by simplifying the labeling task and gradually building up a group of experienced, returning workers. If we can find a way to expand the FrameNet database efficiently, we expect that ASRL and natural language understanding systems will improve significantly, and we will be closer to the day when computers can scan a sentence and learn who did what to whom. (See diagram for a more complex example.)
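The final task design described above can be sketched in Python under several assumptions. FrameChoiceTask and build_task are hypothetical names; exemplars is assumed to map each candidate frame for the target word to sentences already labeled with that frame; and shuffling the options and hiding frame names from the worker are added assumptions that the report does not mention.

```python
import random
from dataclasses import dataclass


@dataclass
class FrameChoiceTask:
    """One micro-task: a new sentence plus one exemplar sentence per candidate frame."""
    new_sentence: str
    options: list[tuple[str, str]]  # (frame name, exemplar); frame names are not shown

    def prompt(self) -> str:
        """Text shown to the worker: the new sentence and numbered exemplars."""
        lines = ["Which sentence uses the capitalized word in the same sense as this one?",
                 f"  {self.new_sentence}"]
        for i, (_, exemplar) in enumerate(self.options, start=1):
            lines.append(f"{i}. {exemplar}")
        return "\n".join(lines)

    def frame_for(self, choice: int) -> str:
        """Map the worker's 1-based answer back to a frame name."""
        return self.options[choice - 1][0]


def build_task(new_sentence: str, exemplars: dict[str, list[str]],
               rng: random.Random) -> FrameChoiceTask:
    """Pick one random exemplar for each candidate frame, then shuffle the options."""
    options = [(frame, rng.choice(sentences)) for frame, sentences in exemplars.items()]
    rng.shuffle(options)  # assumption: randomize order to reduce position bias
    return FrameChoiceTask(new_sentence, options)


# Candidate frames for "head", using the example sentences from the text.
exemplars = {
    "Self_motion": ["We HEADED along the ridge for about three miles."],
    "Leadership": ["George Yeo was appointed to HEAD the new Ministry for Information and Arts..."],
}
task = build_task(
    "Most single parent families are HEADED by women; just 10 per cent are single fathers.",
    exemplars,
    random.Random(0),
)
print(task.prompt())
```

A worker's numbered answer is turned back into a frame label with task.frame_for(choice), so each response becomes one vote for a frame that can then be combined with the other workers' votes.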