Scientific community databases fulfill an important research need by offering curated information to audiences with shared basic and applied research goals. They serve as clearinghouses for community information and communication. Researchers need easy-to-use analytical workflows in an accessible and familiar location, but community databases typically lack the infrastructure to support them, and more data exchange is needed between sites. Additionally, community databases need to incorporate results from analytical workflows easily for public dissemination and to transfer large datasets quickly between computational resources and the database. Tripal, an open-source toolkit for constructing online genomic and genetic databases, is uniquely positioned to address these challenges because it has been adopted by multiple community databases and thus provides a common infrastructure.
This project creates Tripal Gateway: a set of modules (extensions) to be incorporated into Tripal to foster greater data dissemination, collaboration, and research. The team develops three modules that integrate Tripal with Galaxy (a popular workflow system), interconnect Tripal sites for data sharing, and use emerging technologies for faster data exchange:
- Tripal Galaxy: a module that integrates Galaxy workflows into a Tripal site, providing both next-generation analytical workflows and seamless transfer of results into the community database.
- Tripal Exchange: a module that provides cross-site querying, enabling collation and viewing of data from multiple sites and integration of that data into workflows.
- Tripal SDN: a module that incorporates software-defined networking (SDN) technology, providing mechanisms to improve the speed of data exchange.
These new modules are developed, implemented, and tested in conjunction with six data sites (the Citrus Genome Database, Cool Season Food Legumes, CottonGen, the Genome Database for Rosaceae, Hardwood Genomics, and TreeGenes). Integration of Tripal Gateway is also anticipated for four additional databases (GrainGenes, KnowPulse, LegumeInfo, and PeanutBase). After implementation, this effort will interlink and allow cross-querying across a major Arabidopsis resource, four legume genomics sites, the primary cotton community site, GrainGenes, and four tree genomics sites covering fruit trees and forest trees. Implementing Tripal Gateway in the community databases that serve these extensive research communities will support basic and applied research that is both crop-specific and broadly useful across crop agriculture.