This project will study the situated practices, human assumptions, and organizational routines that transform "little data" into mineable stores of "big data" harnessed for measures and metrics. Our current knowledge of the origins of big data, and of what goes into collecting, curating, manipulating, and deploying these huge information resources, is limited. We know even less about the social and cultural implications of these activities, particularly in non-academic contexts. A growing number of scholars urge critical interrogation of the methods, analytical assumptions, and underlying biases of big data science. Before we can comprehend the social and political implications of big datasets, we need a nuanced understanding of the situated practices through which they are assembled and manipulated. Such an understanding is especially necessary if we are to evaluate the quality of scientific results derived from analyzing and manipulating these datasets.
The research will be carried out through a multi-sited ethnography of obstetrical data production in healthcare, an area where big data and associated metrics are both important and problematic. First, it will examine the situated practices and lived experience of creating the massive amounts of information that come to form the datasets. Second, it will trace how results emerge through automated measures and algorithms and how those results affect the very environments they are supposed to reflect. The research thus spans the entire data lifecycle. It will investigate how information is collected by practitioners, clerks, and coders and transformed into local repositories of supposedly "clean" data, which performance improvement specialists then manipulate. It will then trace how this information is transferred to and further refined in a statewide data center and deployed by a major quality improvement organization. Finally, the research will follow the aggregated data back to the local hospitals and assess how data visualizations and performance measures affect local decision-making and how hospitals function.
The broader impacts of this project include both near- and long-term benefits. In the short term, this research will benefit individuals and organizations grappling with how to organize local resources to produce and deploy big data in service of management and performance improvement goals. In the long term, it will generate foundational conceptual models that inform design recommendations and practice guidelines addressing the social, ethical, and political implications of creating and using big data.