Data analytics, the automatic discovery of patterns in large datasets, is an integral part of contemporary digital practice. Owing to their large scale, broad scope, and unprecedented granularity, such data are intractable to manual analysis, so data mining algorithms do work that humans cannot. Still, data analytics depends on human labor at every step, for example in deciding what data to collect, pre-processing the data to make them algorithm-ready, and making sense of the results. This research addresses the related technical and societal challenges by identifying, tracking, and analyzing the multiple forms of human labor involved in the practice of data analytics, and by using this analysis to develop new methods for data analytics research and training. By articulating work practices that have previously been taught largely through apprenticeship, this work expands the reach of data analytics beyond those with direct connections to existing researchers. It increases the transparency and accountability of data analysis by making clear how results are developed and by developing techniques to communicate them better. It supports a better fit between data analysis and domain contexts, and it demonstrates good practices for integrating social and technical research. The key question this project will answer is how people and machines can work together more effectively to make sense of large-scale data.

Through a collaboration between sociologists of technology and data scientists, this research will identify and address invisible labor at three stages in the analytics process: (1) Conceptualization: How is a problem conceptualized and translated into a machine-solvable data analytic problem? (2) Pre-processing: How are data collected, cleaned, and made algorithm-ready? (3) Post-processing: How are the results of data analysis contextualized, represented, and made sense of, both individually and publicly? This research answers these questions by analyzing the uptake of data analytics in the digital humanities. This is a useful site for surfacing questions of human labor because data analytics is a powerful potential tool for the humanities but does not map directly onto traditional research methods in this field. Thus, mapping problems onto data analytics and translating the results of data analytics into meaningful arguments for the target domain require more explicit articulation than is the case in more "data-native" disciplines. This research develops implications for the practice of data analysis in four areas: (1) designing new curricula for training in data analysis; (2) developing software systems and research methods that better address and support human labor; (3) exploring new ways to make the process of data analysis transparent, accountable, and communicable; and (4) creating a nuanced sociological understanding of the practice of data analysis.

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Application #: 1526155
Program Officer: William Bainbridge
Budget Start: 2015-09-01
Budget End: 2019-08-31
Fiscal Year: 2015
Total Cost: $500,000
Name: Cornell University
City: Ithaca
State: NY
Country: United States
Zip Code: 14850