There are currently thousands of scientists creating millions of data sets describing an increasingly diverse matrix of social and physical phenomena. This rapid increase in both the amount and the diversity of data implies a corresponding increase in the potential of data to empower important new collaborative research initiatives. However, the sheer volume and diversity of data present a new set of challenges in locating all of the data relevant to a particular line of research. Taking full advantage of the unique data managed by the "long tail of science" requires new tools specifically created to assist scientists in their search for relevant data sets.
DataBridge is an e-science collaboration environment designed for exploring a rich set of sociometric tools and the corresponding space of relevance algorithms, and for adapting them to define semantic bridges that link large numbers of diverse data sets into a sociometric network. Data from several large NSF-funded projects will be analyzed to develop relevance-based data discovery methods. Sociometric network analysis (SNA) algorithms will be used to explore the space of relevancy (the different ways data sets can be related to one another): by metadata and ontology, by pattern analysis and feature extraction, and via human connections. By linking data, human interactions, and usage methods and practices, rich models of social networks interconnecting massive long-tail science data can be created that enhance scientific collaboration and discovery.
DataBridge supports advances in science and engineering by directly enabling and improving discovery of relevant scientific data across large, distributed, and diverse collections. The system will also provide an easy means of publishing data to DataBridge and will incentivize data producers to do so by enabling collaboration and citation.
The design will be domain-agnostic, highly extensible, and adaptive, supporting the inclusion of new relevance algorithms and indexing techniques. DataBridge will be distributed under an open source license, enabling wider use and crowd-sourced improvements of the technology. The concepts developed in the project, namely semantically linking data through sociometric network analysis, will also have an impact on non-scientific data collections and will improve access to and discovery of information over the web.
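
As a rough illustration of the metadata-based relevance described above, the following minimal sketch links data sets whose keyword metadata overlap. It is not DataBridge's actual method: the Jaccard measure, the threshold value, the function names, and the sample data sets are all assumptions chosen for illustration.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two keyword sets: |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_relevance_network(metadata, threshold=0.25):
    """Return {dataset_id: {neighbor_id: score}} for pairs scoring >= threshold.

    `metadata` maps a data set identifier to its set of keywords.
    The threshold is an arbitrary illustrative cutoff.
    """
    network = {name: {} for name in metadata}
    for left, right in combinations(sorted(metadata), 2):
        score = jaccard(metadata[left], metadata[right])
        if score >= threshold:
            # Relevance is symmetric here, so store the edge in both directions.
            network[left][right] = score
            network[right][left] = score
    return network


# Hypothetical data sets and keywords, invented for this example.
datasets = {
    "survey_a": {"voting", "census", "county"},
    "survey_b": {"voting", "county", "turnout"},
    "ocean_c": {"salinity", "temperature"},
}
network = build_relevance_network(datasets)
```

In this toy input the two survey data sets share two of four distinct keywords and become neighbors, while the oceanographic data set stays unlinked. Under the same assumptions, a different relevance algorithm could be swapped in by replacing `jaccard` with another pairwise scoring function.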

Agency: National Science Foundation (NSF)
Institute: Division of Advanced CyberInfrastructure (ACI)
Type: Standard Grant (Standard)
Application #: 1247652
Program Officer: Robert Chadduck
Project Start:
Project End:
Budget Start: 2012-11-01
Budget End: 2015-10-31
Support Year:
Fiscal Year: 2012
Total Cost: $857,981
Indirect Cost:
Name: University of North Carolina Chapel Hill
Department:
Type:
DUNS #:
City: Chapel Hill
State: NC
Country: United States
Zip Code: 27599