Because both information and connectivity are more available today than ever before thanks to digital technologies, questions can now be addressed by enlisting massive pools of human participants to compensate for the limitations of computer computation. This is especially relevant in visual analytics, where human intuition remains far superior to existing computer object recognition algorithms. While algorithms are limited by pre-labeling requirements, humans can perceive subtle variations and nuances to identify and classify unexpected objects. These tasks, however, are often too massive in scale for a single human to accomplish. Distributing such a task over a massive network not only succeeds in categorizing the data, but also generates massive quantities of human-generated labels (training data) that can potentially teach computer vision algorithms to mimic human perception in distinguishing the normal from the abnormal.

This exploratory project will combine collective human visual perception with machine learning and object recognition, through a study of 1.25 million crowd-sourced inputs provided by over 6,000 volunteers labeling satellite imagery in a search for anomalies in northern Mongolia. These data, collected from June 2010 to the present via an online platform developed by the PI in collaboration with National Geographic Digital Media, afford an ideal "case study" environment in which to investigate the nature of crowd-generated data and methods that distill the wide variability of human input into computational algorithms. The online participants, excited by the potential of discovering the tomb of Genghis Khan, examined massive amounts of ultra-high resolution multispectral satellite imagery to label loosely defined anomalies into various categories. Trends that emerged from the massive volume of labels represent a collective human perspective on what the images contain. A team led by the PI traveled to Mongolia to ground-truth areas of high user input convergence. The resulting ground-truthed anomalies provide a unique opportunity to both accurately measure the quality of human/automated analysis and investigate the effect of supplementing noisy crowd-sourced data sets with small pools of absolute data in machine learning. In the current project the PI will develop a framework for applying and evaluating the following three research phases designed to study the nature of large-scale human-generated data for integration into supervised learning algorithms:

1. Consensus Clustering - Tag evaluation mechanisms based upon the volume and consistency of neighboring tags and the demonstrated ability of the individuals creating those tags. Unsupervised methods for "merging" labels will also be applied to extended anomalies such as roads and rivers (a sketch of this phase follows the list below).

2. Feature Vector Extraction - Both the type of features (e.g., color, luminance, edges and gradients, scale, orientation, etc.) and the extent of the neighborhoods (e.g., local, wide, and global) required to detect anomalies are unknown a priori. Thus, the aim is to determine a feature set sufficiently diverse to capture all relevant cues within the image (see the second sketch below).

3. Machine Learning - Dominant features representative of, and excluded from, pixel groups of given categories will be determined from the results of Phase 2 above (see the third sketch below).
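Sketch 1: The proposal does not specify an implementation for consensus clustering; as a rough illustration, a density-based pass such as scikit-learn's DBSCAN could group nearby volunteer tags into consensus clusters. The eps radius, min_samples, and tag coordinates below are purely hypothetical values, not the project's settings:

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Hypothetical volunteer tags: (x, y) pixel coordinates in a satellite image tile.
tags_xy = np.array([
    [102.0, 340.5], [103.1, 341.0], [101.7, 339.8],   # three tags on one anomaly
    [560.2, 88.4],  [559.9, 90.1],                    # two tags on another
    [900.0, 412.0],                                   # isolated tag: likely noise
])

# eps (agreement radius) and min_samples (minimum agreeing taggers) are illustrative.
clustering = DBSCAN(eps=5.0, min_samples=2).fit(tags_xy)

# Label -1 marks tags with no nearby agreement; other labels are consensus groups.
for label in sorted(set(clustering.labels_)):
    members = tags_xy[clustering.labels_ == label]
    if label == -1:
        print(f"noise: {len(members)} tag(s)")
    else:
        centroid = members.mean(axis=0)
        print(f"cluster {label}: {len(members)} tags, centroid {centroid.round(1)}")
```

The per-individual ability weighting named in the phase description could be approximated by passing a sample_weight array to fit, though the project's actual weighting scheme is not described here.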
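Sketch 2: For feature vector extraction, one hedged illustration is to stack simple per-pixel cues (luminance, gradient magnitude) with local averages at several neighborhood scales; the specific cues and window sizes are illustrative choices, not the project's:

```python
import numpy as np
from scipy import ndimage

def feature_stack(image, scales=(1, 4, 16)):
    """Stack simple per-pixel cues at several neighborhood scales.

    image: 2-D float array (single-band luminance); scales are illustrative
    stand-ins for the "local, wide and global" neighborhoods in Phase 2.
    Returns an (H, W, n_features) array.
    """
    feats = [image]                                   # raw luminance
    gx = ndimage.sobel(image, axis=1)
    gy = ndimage.sobel(image, axis=0)
    feats.append(np.hypot(gx, gy))                    # edge/gradient magnitude
    for s in scales:                                  # growing spatial context
        feats.append(ndimage.uniform_filter(image, size=2 * s + 1))
    return np.stack(feats, axis=-1)

# Example: features for a random 64x64 array standing in for satellite pixels.
fv = feature_stack(np.random.rand(64, 64))
print(fv.shape)  # (64, 64, 5)
```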
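Sketch 3: For the machine learning phase, a classifier could then be trained on such feature vectors using labels derived from the consensus clusters. The random forest, synthetic data, and 80/20 split below are stand-ins, not the designs of the project's actual solutions:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Hypothetical training set: per-pixel feature vectors (e.g., from feature_stack)
# with binary labels derived from consensus clusters (1 = anomaly, 0 = background).
rng = np.random.default_rng(1)
X = rng.random((5000, 5))                  # 5 features per pixel, as in Sketch 2
y = (X[:, 1] + X[:, 3] > 1.2).astype(int)  # synthetic stand-in for consensus labels

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

# A learning curve could be traced by refitting on growing subsets of the
# crowd-labeled data and scoring agreement with the held-out consensus.
print(f"held-out agreement: {clf.score(X_test, y_test):.3f}")
```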

Broader Impacts: In this exploratory study the PI will lay the foundation for new machine/human collaborative opportunities drawn from the resource of the crowd. Understanding the bonds between human and computer intelligence will have a profound impact on many branches of science. Thus, concepts developed in this effort may ultimately prove transformative by enabling crowdsourcing to migrate from a project-based tool for distributed analytics into a portal bridging collective human perception and machine learning.

Project Report

The emergence of crowdsourcing (distributed human computation) in science is the result of: (a) the ever-increasing size and scale of the data we collect, and (b) the increased network capacity required to perform these collaborative efforts. While there have been exciting examples of scalable human analytics, it would be a challenge for human computation alone to analyze every image on the web, every galaxy in the sky, or every cell in the human body. Yet we try, and as a result we have begun to produce extensive repositories of human-labeled data that represent a collective visual perceptual reasoning still beyond the reach of automated systems. This research aimed to explore frameworks for machine learning that could efficiently learn from these noisy human-generated labels to extend the power of crowdsourced human perception. A data set of over 2 million labels, collected from a National Geographic sponsored survey of ultra-high resolution satellite imagery for archaeological structures, was used in this study.

We explored two mathematical approaches to defining consensus among the crowd within the geospatial framework of the data: (a) density-based clustering, and (b) kernel density estimation. Both approaches showed a dependency upon the characteristics and dimensions of the features to be identified. For non-linear features, kernel density estimation defined consensus with higher accuracy against the ground truth. For linear features, a triadic clustering approach facilitated a process to "connect the dots" for road segmentation, mapping rural non-paved roads across a barren landscape from sparse human-generated point data.

An open innovation public coding challenge was created with TopCoder and the NASA Tournament Laboratory (NTL) to generate a broad range of efficient algorithms to learn from these consensus clusters. A total of 395 unique solutions were submitted within a highly structured framework, yielding 20 solutions that showed positive learning curves. As expected, in each case the learning curves showed higher accuracy when seeking objects with greater visual familiarity (algorithms performed better on modern structure tags than on ancient structure tags). Overall, however, the results validated the initial framework of machine learning from crowdsourced visual perception, with the most successful algorithms reaching sufficient optimization (high agreement with the crowd consensus when defining points of interest) after fewer than 10K independent human inputs. This is a significant increase in efficiency over the 2.3 million tags required to survey the same area without the human/machine learning framework.

While this study focused on a satellite imagery based archaeological survey as its case study, there exists a broad range of scientific and industry challenges where scalable human perception networks may help us dive into the unknown and extract the unexpected. This is exemplified by the many subsequent crowdsourced satellite remote sensing efforts that have extended from this initial experiment, with applications ranging from humanitarian monitoring to disaster assessment. Specific examples include the crowdsourced search-and-rescue effort for the lost schooner Nina in the Tasman Sea, and the coming data analytics changes associated with NASA's Cassini mission, where algorithms from this research will be applied.
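The report does not give implementation details for the kernel density estimation consensus step; the following minimal sketch assumes SciPy's gaussian_kde with its default bandwidth and an arbitrary density cutoff, run on synthetic tag coordinates in place of the real crowd labels:

```python
import numpy as np
from scipy.stats import gaussian_kde

# Hypothetical tag coordinates (x, y); a real run would use the 2M+ crowd labels.
rng = np.random.default_rng(0)
tags = np.concatenate([
    rng.normal(loc=(100, 340), scale=2.0, size=(50, 2)),   # strong agreement
    rng.normal(loc=(560, 90),  scale=2.0, size=(30, 2)),   # moderate agreement
    rng.uniform(0, 1000, size=(20, 2)),                    # scattered noise
])

kde = gaussian_kde(tags.T)          # default (Scott's rule) bandwidth

# Evaluate the density surface on a coarse grid and keep high-consensus cells.
xs, ys = np.meshgrid(np.arange(0, 1000, 10), np.arange(0, 1000, 10))
density = kde(np.vstack([xs.ravel(), ys.ravel()])).reshape(xs.shape)

threshold = density.mean() + 3 * density.std()   # illustrative cutoff
peaks = np.argwhere(density > threshold)
print(f"{len(peaks)} grid cells exceed the consensus threshold")
```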
Future research should focus on understanding the impact and limitations created by the diversity of input quality across the crowd.

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Type: Standard Grant (Standard)
Application #: 1145291
Program Officer: Ephraim Glinert
Budget Start: 2011-08-01
Budget End: 2013-07-31
Fiscal Year: 2011
Total Cost: $66,002
Name: University of California San Diego
City: La Jolla
State: CA
Country: United States
Zip Code: 92093