Researchers now have access to richer and more detailed behavioral data than ever before. For example, when studying how children learn to walk, researchers can collect eye-tracking data from miniature head-mounted cameras recording the infant's eye movements and field of view, making it possible to see exactly where the child looks while navigating through the environment. Simultaneously, researchers can collect high-speed motion-tracking data detailing the trajectories of the child's limb movements and video data about the child's path relative to caregivers and obstacles, interactions with people, objects, and surfaces, and affective responses while walking, falling, and interacting. Despite the widespread availability of video and other recording technologies, behavioral researchers typically settle for analyzing only one variable in one stream of data, rather than seeking relations among multiple variables across multiple data streams. Powerful data analysis tools and sophisticated data management practices are needed to integrate different kinds of data and relate them to each other, yet few researchers have such tools and practices. In addition, researchers usually work in isolation, seldom sharing data that might illuminate others' research. Without richer analyses and data sharing, theoretical progress in developmental psychology and other fields of behavioral science is hampered. The purpose of this workshop is to delve into the conceptual, technical, and management issues that, when resolved, will allow researchers to perform richer analyses across large, shared data sets. The workshop will focus in part on the future development of an emerging open-source software tool, OpenSHAPA, and will explore how OpenSHAPA might be extended to encompass new data exploration and visualization tools and promote data management and data sharing.
Twenty-two researchers will participate in the workshop, representing the fields of cognitive, perceptual, social, language, and motor development, human-computer interaction, visual analytics, computer science, eResearch, cognitive science, and human factors. Collectively, the invited researchers have experience with different aspects of the problem of exploring rich behavioral data, such as visualizing massive data sets, conducting innovative data analyses, integrating multiple data streams, acting as custodians of shared data sets, and creating eResearch communities and data management tools.
The outcomes from the workshop will help to improve the quality of behavioral science. First, findings from the workshop will have an immediate impact on further development of the OpenSHAPA tool, where development is shared across a burgeoning community of users. Possible directions include changing the architecture to prepare for expansion of data management and data sharing capabilities, building links to existing software, creating libraries of scripts for users to manage data in standardized ways, creating web-based user guides and best practices, expanding user forums, and providing efficient technical support. Research community members can freely adopt OpenSHAPA, expand their current use of it, or build bridges between it and other open-source tools, and will bring new users into the community of current users and developers. Second, the richer data analysis that results should support richer theoretical insights. Better data management practices will support more reliable and replicable research, and will better preserve data for future use within and across laboratories. A community of open data sharing practices will lead to greater transparency and efficiency in research and teaching by allowing researchers to inspect each other's data sets and analyses, thereby reducing puzzling failures to replicate, generating new hypotheses, and exposing students to original footage of tasks and findings.
Data Coding, Analysis, Archiving, and Sharing for Open Collaboration: From OpenSHAPA to Open Data Sharing
Project Outcome Report
NSF Workshop held 15-16 September 2011
National Science Foundation Headquarters, Arlington, VA
Supported by NSF Award #1139702
Karen E. Adolph, New York University
Penelope M. Sanderson, The University of Queensland
On 15-16 September 2011, Karen Adolph and Penelope Sanderson hosted a workshop, "Data Coding, Analysis, Archiving, and Sharing for Open Collaboration: From OpenSHAPA to Open Data Sharing." Participants were 35 researchers in developmental science, educational research, computer science, and cognitive science, and program officers from the National Science Foundation, the National Institute of Child Health and Human Development, and the Institute of Education Sciences. Discussions focused on the promises and pitfalls of open sharing of video data in developmental science. Presentations from workshop participants are posted at http://databrary.org.
Participants agreed that the barriers to open video data sharing are surmountable, the potential benefits are many, and the timing is right to proceed with a video data sharing initiative in developmental science. Rather than limiting the contents of a shared data library to a particular domain, data should represent the diversity of work in developmental science. To make shared videos maximally useful, deposits should include relevant metadata and supporting materials, including codebooks, coding spreadsheets, and manuscripts. Video, the most common medium for recording behavior, has unique challenges and virtues. Video cannot be made anonymous without compromising its value, so a system to share video must address privacy concerns and ensure participants’ consent. Tools for video coding and analyses should be free and open source.
Sharing digital video requires substantial storage capacity, powerful search and streaming tools, and significant computational resources for transcoding videos into common, preservable formats. Despite these challenges, effective data sharing can transform discovery. More rapid progress will be made when researchers can mine existing data sets to address issues beyond those examined in the original research; when researchers can point readers and reviewers to raw video data that illustrate procedures and findings; when users can browse for exemplars to stimulate new work, gather preliminary data, expand samples, run replications, examine cohort effects, and assess effects of geographic location or population by using data in a shared archive; and when instructors can search for suitable examples to illustrate methods and findings to their students. Data sharing conserves research funds by avoiding unnecessary duplication and supporting more investigators. Data contributors will receive more attention and citations from users, and their data and tools will survive in usable form beyond their lifetimes.
Since the workshop, we have made substantial progress toward meeting these goals. We added expertise to our leadership team.
We submitted proposals to NSF and NIH to fund a large-scale video sharing project consisting of a five-year plan to (1) transform the culture of developmental science by building a community of researchers committed to open video data sharing; (2) expand the free, open-source video coding software, OpenSHAPA, to enable coding, exploring, and analyzing video; (3) build a data management system to support data sharing within labs, among collaborators, and in the Databrary repository; (4) create participant permissions and contributor/user standards that enable open sharing of video data while limiting access to authorized users and ensuring participant confidentiality; and (5) create a web-based Databrary repository for open sharing and preservation of video data. The NSF proposal received funding in September 2012 (NSF BCS 1238599). We convened an advisory board of experts in developmental science and data sharing. We secured letters of support from 100+ developmental scientists to work with Databrary and their local IRBs to gain permission to share video data. We also drafted participant permission templates, contributor agreements, and user agreements and have begun to evaluate them with input from other scientists. In short, we are creating the system envisioned by the workshop participants.
Intellectual Merit. By creating tools for open video data sharing, we expect to deepen insights and accelerate the pace of discovery in developmental science. The intellectual merit of this workshop is that it enabled the development of a successful proposal aimed at achieving these ends.
Broader Impacts. The entire behavioral science community can benefit from Databrary, OpenSHAPA, and lab management tools, and we expect that insights that emerge can facilitate open data sharing and better data management practices in other research communities.
Moreover, through our community building, we will train a new generation of developmental scientists who will be empowered with an unprecedented set of tools for discovery. Finally, since some data sets will be available for public viewing, we will raise the profile of developmental science and bolster interest in and support for scientific research among the public at large.