Location-based services are rapidly gaining traction in the online world, as they allow highly personalized services and easier retrieval and organization of multimedia. However, such services require accurate geolocation information (geo-tags) to be associated with the multimedia data, e.g., videos, and only a small fraction of available video data is geo-tagged. Hence, there is growing interest in systems that automatically estimate the geolocation of a video that does not include geolocation metadata. While machine learning offers a potential approach to training automatic location estimators, it requires a standardized training corpus of geo-tagged videos. Automatic collection of videos introduces a bias toward videos that are easily processed by machines and toward geographical locations that are over-represented in current corpora. Hence, there is a need for carefully curated standard data sets.

This EArly-concept Grants for Exploratory Research (EAGER) project explores a novel, somewhat high-risk approach to collecting such an annotated training corpus of geo-tagged videos using Mechanical Turk (www.mturk.com), a "marketplace for work" for engaging workers with the desired expertise from around the world on a specific task, in this case, participating in a game that involves annotating videos with geolocation metadata, e.g., GPS coordinates. The user interface for the game will allow participants to estimate the location of a video by clicking on a map. The knowledge gained from this EAGER project will set the stage for more comprehensive geo-tagged multimedia data collection efforts. The resulting data sets and benchmarks will be made available to the research community to enable detailed and systematic comparative analysis of alternative methods (e.g., machine learning algorithms for predicting geolocation information from videos).
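To make the evaluation concrete: a map click yields a latitude/longitude pair, which can be scored against the video's ground-truth geo-tag by great-circle distance. The following is a minimal sketch of such scoring; the function name and coordinates are illustrative assumptions, not taken from the project's actual implementation.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# Hypothetical example: a participant clicks near the Brandenburg Gate
# while the video's ground-truth geo-tag is the Reichstag.
click = (52.5163, 13.3777)   # participant's map click (lat, lon)
truth = (52.5186, 13.3762)   # ground-truth geo-tag
print(f"annotation error: {haversine_km(*click, *truth):.2f} km")  # roughly 0.28 km
```

A distance-based error measure like this is what makes it possible both to grade annotators and to compare human and automatic estimators on the same scale.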

The availability of standardized geo-tagged multimedia data sets will help drive advances in machine learning techniques for geolocation prediction. The resulting advances in geo-tagging multimedia data would enable intelligent location-based services in a variety of domains, including law enforcement, personalized and location-aware media retrieval, and applications such as journalistic and criminal investigations.

Project Report

Summary

Crowdsourcing is "the practice of obtaining needed services, ideas, or content by soliciting contributions from a large group of people, and especially from an online community, rather than from traditional employees or suppliers" (Wikipedia). Traditionally, crowdsourcing is used for tasks that are easy for humans but hard for programs because the technology is not there yet. In this project, the higher-level question was whether crowdsourcing services such as Amazon Mechanical Turk could be used to solve tasks that are hard both for machines (because of the immaturity of the technology) and for humans (because they are not day-to-day, straightforward problem solving). Using the example of location estimation (given a video, what location does it show?), we conducted a study to find out how to qualify and select human crowdsourcers to perform hard, non-intuitive tasks.

The findings were as follows: using a well-written tutorial, a qualification task, and an empirically established qualification threshold (based on both accuracy and time spent on the task), it is possible to reach a crowdsourced annotation accuracy of the same quality as that of a set of locally hired experts. Skilled crowdsourcers have higher compensation expectations, but these are still less than 50% of those of local expert annotators.

Using our findings, we were able to establish a human baseline for location estimation. Current state-of-the-art automatic location estimation falls in about the same accuracy range as skilled human annotation. In other words, current automatic systems would most likely pass a Turing-like test, even though there is still huge potential for improvement.

Conclusion: It is possible to collect training videos for location estimation on Mechanical Turk, especially when redundancy is used in addition to strict qualification (a minimal sketch of such a pipeline follows this report). This will make it possible to extend the task of automatic location estimation to areas of the globe that are currently not well covered by geo-tagged training videos.

Broader Impacts

We developed a new methodology for educating, qualifying, and selecting crowdsourcing workers to perform high-quality, non-intuitive annotations for artificial intelligence training. This will advance artificial intelligence by providing new options for creating machine learning training data. Furthermore, it might enable a novel way to combine artificial intelligence with human intelligence on tasks that are hard for both humans and machines to solve. Academia, industry, and government will benefit from this new realm of possibilities for content analysis, which is especially applicable at large scale.
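As referenced in the conclusion above, the qualification-plus-redundancy scheme can be sketched in a few lines. All thresholds, field names, and numbers below are hypothetical stand-ins; the study's actual threshold was derived empirically from accuracy and time on task.

```python
from statistics import median

# Hypothetical per-worker results from the qualification task:
# 'accuracy' is the fraction of videos placed acceptably close to the
# ground truth, 'secs' is the median time spent per video.
candidates = [
    {"worker": "A", "accuracy": 0.82, "secs": 95},
    {"worker": "B", "accuracy": 0.90, "secs": 12},   # suspiciously fast
    {"worker": "C", "accuracy": 0.45, "secs": 140},  # too inaccurate
    {"worker": "D", "accuracy": 0.78, "secs": 110},
]

# Illustrative thresholds on both accuracy and time on task.
MIN_ACCURACY, MIN_SECS = 0.70, 30
qualified = [c for c in candidates
             if c["accuracy"] >= MIN_ACCURACY and c["secs"] >= MIN_SECS]
print([c["worker"] for c in qualified])  # ['A', 'D']

# Redundancy: several qualified workers annotate the same video, and a
# robust aggregate of their clicks (here, the component-wise median)
# is taken as the consensus location.
def consensus(points):
    lats, lons = zip(*points)
    return (median(lats), median(lons))

clicks = [(37.871, -122.273), (37.869, -122.270), (37.905, -122.300)]
print(consensus(clicks))  # (37.871, -122.273); the outlier click is damped
```

The component-wise median is just one plausible consensus rule; the project report does not prescribe a specific aggregator, only that redundancy be combined with strict qualification.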

Agency
National Science Foundation (NSF)
Institute
Division of Information and Intelligent Systems (IIS)
Type
Standard Grant (Standard)
Application #
1138599
Program Officer
Sylvia Spengler
Budget Start
2011-09-01
Budget End
2013-08-31
Fiscal Year
2011
Total Cost
$50,000
Name
International Computer Science Institute
City
Berkeley
State
CA
Country
United States
Zip Code
94704