This CAREER project develops, implements, and validates Orchestra, a general middleware layer for collaborative data sharing: the effective exchange of data and updates among loose confederations of scientific collaborators who may disagree about their preferred schema, or even about which data items or updates are correct. The proposed methodology is "peer-to-peer" and "bottom-up" in style: it provides mechanisms for rapidly joining a data sharing confederation by mapping a new data source to existing schemas within the confederation and then specifying which updates are to be trusted and accepted. The primary means of data exchange is the reconciliation of modifications made locally with trusted updates made elsewhere; operations from elsewhere can always be overridden locally. Additionally, the research develops mechanisms for querying globally across all data sources and schemas. The work naturally extends techniques from data integration and peer data management.
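As a rough illustration of the reconciliation idea described above, the following minimal sketch merges updates published by other peers into a local view, accepting only those from trusted sources and letting local modifications override them. It is not Orchestra's actual interface: the `Update` record, the `reconcile` function, the peer names, and the flat key/value data model are all assumptions made for this example, and schema mappings between peers are omitted entirely.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Update:
    """A hypothetical update record; the field names are illustrative only."""
    source: str  # peer that published the update
    key: str     # identifier of the data item being modified
    value: str   # value proposed for that item


def reconcile(local_edits, remote_updates, trusted_sources):
    """Build a local view from trusted remote updates and local edits.

    Remote updates are applied in order and only if their source is
    trusted; local edits are applied last, so they always take precedence.
    """
    merged = {}
    for update in remote_updates:
        if update.source in trusted_sources:
            merged[update.key] = update.value
    merged.update(local_edits)  # local modifications override remote ones
    return merged


# Hypothetical data: peerA is trusted, peerB is not.
local = {"gene42": "kinase"}
remote = [
    Update("peerA", "gene42", "phosphatase"),
    Update("peerA", "gene7", "transporter"),
    Update("peerB", "gene9", "unknown"),
]
result = reconcile(local, remote, trusted_sources={"peerA"})
```

In this sketch the local view keeps the local value for `gene42`, accepts `gene7` from the trusted peer, and ignores `gene9` from the untrusted one. A real system would also need to translate updates across differing schemas and handle conflicts between trusted peers, which this example does not attempt.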
Broader Impact: This work enables greater data sharing in the scientific community, especially in bioinformatics: it focuses on the use, derivation, and sharing of information in bioinformatics databases and warehouses, and it will be evaluated in such applications. The developed system will be disseminated on the Web and promoted through seminars delivered at the Penn Center for Bioinformatics and the regional Greater Philadelphia Bioinformatics Alliance, as well as at other scientific forums. The research project will be integrated into an educational program (graduate and undergraduate) that teaches data management in a broader context, covering data integration and exchange as well as traditional databases.