The University of Wisconsin is awarded a grant for a workshop entitled "Cyberinfrastructure and the Dimensions in Biodiversity - Planning for Success," which will gather community input to define a vision, technical requirements, procedures, and approaches for the development of cyberinfrastructure (CI) supporting integrative research in the biodiversity sciences. The workshop will convene an interdisciplinary group of scientists with CI expertise representing biology (including genetics, genomics and metagenomics, taxonomy and systematics, and ecology), earth sciences, informatics, and computer science, along with representatives of government agencies and non-governmental organizations. The focus will be on identifying the requirements for software, middleware (as well as underlying standards and frameworks), computational capability, and other CI that will be needed or leveraged to support data-intensive research in an interoperable and integrated information environment. The outcomes of the workshop are intended to provide a set of community-derived recommendations and lead to the development of further strategy, with the ultimate goal of improving the support, usability, and sustainability of CI resources in the biodiversity research community, one that is nimble, adaptable, and responsive to inevitable changes in the research and IT landscapes.
The issues to be covered in this workshop address a broad cross-section of research in biological diversity. The discussion of CI in this context also has potential applicability to other data-intensive sciences. Ultimately, biodiversity information made available through an interoperable and integrated information infrastructure is likely to achieve broader applicability, use, and re-use.
The NSF ‘Dimensions of Biodiversity’ (DoB) program recognizes that emerging technologies in computing and cyberinfrastructure are revolutionizing our ability to investigate the broad-scale patterns and processes underlying biodiversity. This report documents the outcomes of an NSF-sponsored workshop (DBI-1047800), held in Madison, WI, on October 13-15, 2010, that was charged with identifying the aspects of cyberinfrastructure necessary for supporting successful research in the Dimensions of Biodiversity program. Workshop participants represented a broad spectrum of disciplines, ranging from cyberinfrastructure, informatics, and computer science to biodiversity, biology, and environmental science, and provided excellent insights into the major cyberinfrastructure needs of biodiversity science. For CI developments in general, participants emphasized the importance of striking a balance between satisfying the specific needs of DoB projects and leveraging existing developments in the larger biodiversity informatics community. The former will assure researcher buy-in but promote business-as-usual approaches, while the latter may lead to more cumbersome requirements for documentation, annotation, and best practices but will improve interoperability and sustainability. This balance should provide an optimal return on investment by promoting an economic ecosystem for collaboration and dissemination of data, tools, and computations while reusing existing technologies, leveraging developments in the larger community, and determining the needs that are specific to DoB. A focused, mission-oriented, and coordinated effort, carried out in collaboration between DoB researchers and CI experts, is needed to develop a specific road map for identifying, selecting, and/or developing the resources and analytical tools necessary for biodiversity research.
Specific issues identified by this workshop are:
- A governance / management / operational structure that ensures coordination of CI developments and agreement on standards (in particular a taxon concept standard) and best practices. A coordinated effort with input from DoB projects can improve design and coding quality, leading to broader applicability of fewer and better tools. Standards and best practices need to be agreed upon in a community process that ensures participant buy-in.
- Leveraging existing initiatives (DataNet, iPlant, etc.).
- A sustainable data and tools repository that provides state-of-the-art curation approaches, making data and tools discoverable and easy to access, integrate, and use.
- Repeatability of analyses, which will require publication and extensive documentation of data and analytical procedures, preferably in a standardized form that allows others to repeat the analyses.
- Access to advanced computational cycles. Although the computing power is available, it is currently not easily accessible to most scientists; standardized procedures need to be developed that make the underlying hardware transparent to the user.
- Workforce training on several levels (undergraduate, graduate, and post-graduate), so that scientists know what tools are available, use them effectively, and can communicate requirements to software developers.
- CI sustainability, which needs to be achieved in two areas. First, best-of-breed CI developments need to be identified for further maintenance in a community-supported process. Second, negotiations with universities, libraries, and professional societies need to ensure financial support.
The most challenging of these will, in fact, be to determine the governance / management / operational structure that can deliver CI resources in a way that best meets the needs of a scientific community in which past efforts at coordination have frequently been unsuccessful.
As this issue influences several of the others mentioned, it is also the most important one. Examples of more successful organizations include the US-LTER program, UNIDATA, NCBI, ESIP, and EOL.