Scientific specimens, typically found in museum collections, serve as the anchor for an expanding array of information that grows and changes over time. This information, about specimens and the species that the specimens represent, is often scattered geographically across institutions and across independent computer systems, making it difficult to access or synthesize. The goal of this project is to develop a two-way system of linking and tracking scientific specimens and specimen-related data across biological collections, and to make this system widely available to the scientific community and the public. This system would employ globally unique identifiers, or GUIDs, to tag and update information associated with specimens, allowing communication between end users and collections. This project will improve data quality and quantity for non-scientists and scientists, and will actively engage user communities through training workshops, summer student internships, and community BioBlitz enhancements.
The ability to integrate specimen data and associated information across biological collections will enable critical studies related to systematics, biogeography, and changing species distributions. These in turn have implications for climate change, changing land use, and other questions key to understanding the past, placing changes in an historical context, and predicting the future of species and environments. This project is part of a 10-year effort to digitize and mobilize the scientific information associated with biological specimens held in U.S. research collections. The images and digitized data from this project will be integrated into the online national resource as outlined in the community strategic plan available at http://digbiocol.files.wordpress.com/2010/05/digistratplanfinaldraft.pdf.
The BiSciCol (Biological Science Collections) project supported the development of a framework for tracking information related to biodiversity data as it flows along divergent paths from the initial collection of biological specimens and observations, through to the eventual application of this information to scientific and social problems. The role of Bishop Museum in this collaborative project was twofold: first, to provide integration of biological science collection data with taxonomic data and services developed by the Global Names Architecture (GNA) infrastructure (as part of a separate NSF-funded collaborative project, DBI-1062441); and second, to develop a field-based data management system that allows researchers collecting data on the occurrence of organisms during field expeditions to capture critical biodiversity information with a high degree of accuracy, completeness, and efficiency. Through this project, we developed powerful services and APIs to allow biological collections data to be integrated with the Global Names Usage Bank (GNUB, a core component of the GNA). Whereas most existing biological collections databases create their own internal taxonomic authority files for the scientific names applied to specimens, images, and other occurrence records, GNUB provides a comprehensive system for managing taxonomic information on a global scale. It serves as the data infrastructure behind ZooBank (the official registry of scientific names of animals under the International Commission on Zoological Nomenclature; ICZN), and is changing the way biodiversity data is cross-linked through a series of data components and data services. It includes features for tracking historical usage of taxon names, robust information on associated literature citations, and cross-links to more than half a million external database records to link biodiversity information together.
One of the critical features of GNUB that was developed in part with support through this project is a Meta-Authority-based "taxonomic translation" system that allows the conversion (in real time) of myriad spellings, combinations, and synonym names linked to specimen data into a single "corrected" taxonomy from the perspective of a particular "Meta-Authority" (e.g., the Integrated Taxonomic Information System [ITIS], or a local institution or individual taxonomist). Funding from the BiSciCol project enabled us to demonstrate how integration of specimen data with the Global Names Architecture can dramatically improve the quality and utility of occurrence-based biodiversity data. Also through this project, we developed, implemented, and tested a robust set of software tools that allow researchers to capture digital information about collected specimens and observations, and associated tissue samples, images, event and locality data, and other related information in the field, during the course of an expedition. Part of this process involves the generation of persistent, globally unique identifiers, which are critical for the function of the BiSciCol infrastructure. These software tools were tested over the course of three separate expeditions (one to the Philippines, one to Pohnpei in Micronesia, and one on a NOAA cruise to the Papahānaumokuākea Marine National Monument in the Northwestern Hawaiian Islands). This system proved highly effective for managing information about specimens, visual observations, in-situ images, extracted tissue samples, processed specimen images, and all associated metadata (event, locality, agents, etc.) as they were gathered, in real time during the expedition (Figure 2).
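The "taxonomic translation" idea described earlier in this paragraph can be illustrated with a minimal Python sketch. It is not the GNUB implementation or its API: the lookup tables, the function names `normalize` and `translate`, the authority labels, and the placeholder names ("Aus bus" and so on) are all invented for illustration. The sketch assumes that every spelling, combination, or synonym has already been tied to a shared taxon concept, and that each Meta-Authority records which name it prefers for that concept.

```python
from typing import Optional

# All names and identifiers below are invented placeholders, not GNUB records.
# Every known spelling, combination, or synonym points at one taxon concept.
NAME_TO_CONCEPT = {
    "aus bus": "concept-1",
    "aus buss": "concept-1",   # misspelling of the same name
    "xus bus": "concept-1",    # same species placed in another genus
    "aus cus": "concept-2",
}

# Each (hypothetical) Meta-Authority chooses its preferred name per concept.
PREFERRED_NAME = {
    "authority-a": {"concept-1": "Aus bus", "concept-2": "Aus cus"},
    "authority-b": {"concept-1": "Xus bus"},
}


def normalize(name: str) -> str:
    """Lower-case the name and collapse runs of whitespace."""
    return " ".join(name.lower().split())


def translate(name: str, authority: str) -> Optional[str]:
    """Return the authority's preferred name, or None if it has no opinion."""
    concept = NAME_TO_CONCEPT.get(normalize(name))
    if concept is None:
        return None  # the name string is not in the index at all
    return PREFERRED_NAME.get(authority, {}).get(concept)


if __name__ == "__main__":
    for label in ("Aus  buss", "Xus bus", "Aus cus"):
        print(label, "->", translate(label, "authority-a"))
```

The same specimen label resolves to a different "corrected" name depending on which authority is asked, which is the point of keeping the preference table separate from the name index.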
This new system allowed for much higher accuracy and completeness of captured data, particularly in terms of cross-linking related information on images, tissue samples, taxonomic identifications, and other associated data, and dramatically improved efficiency by reducing the overall time required to capture the information digitally in real time. Importantly, information on tissue samples (Figure 3), in-situ images (Figure 4) and processed specimen images (Figure 5) is integrated at the time the information is originally created, ensuring correct and complete data capture. Through the work supported by this NSF grant, we have greatly improved the overall methods by which information related to biological science collections data is captured and cross-linked, ensuring more accurate and accessible data for broad-scale analysis and use in establishing conservation priorities. It is our sincere hope and expectation that a better coordination of biodiversity data through infrastructures developed for the BiSciCol project will ultimately lead to a more coordinated and empowered biodiversity research landscape, elevating this area of research to the scale of funding currently enjoyed by other scientific disciplines such as astronomy and physics. This will represent a critical transformation of high relevance and importance to future humanity, as we enter a period of climate change and increasing rates of species extinction at the same time that technology is allowing us to decipher the four billion years of valuable information encoded in the global biodiversity genome.
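The practice of cross-linking tissue samples and images to their specimen at the moment each record is created, described above, can be sketched in a few lines of Python. This is an illustrative sketch only, not the field software developed for the project: the `Record` and `FieldSession` names are hypothetical, and the use of random version-4 UUIDs as the globally unique identifiers is an assumption, since the identifier scheme is not specified here.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class Record:
    """A specimen, tissue sample, image, or other captured item."""
    kind: str
    # Assumption: a random UUID stands in for the persistent identifier.
    guid: str = field(default_factory=lambda: str(uuid.uuid4()))
    links: Set[str] = field(default_factory=set)


class FieldSession:
    """Holds the records captured during one (hypothetical) expedition."""

    def __init__(self) -> None:
        self.records: Dict[str, Record] = {}

    def add(self, kind: str, linked_to: Optional[str] = None) -> Record:
        """Create a record and, if requested, link it in both directions."""
        parent = None
        if linked_to is not None:
            # Look the parent up first so a bad GUID leaves no orphan record.
            parent = self.records[linked_to]
        record = Record(kind)
        self.records[record.guid] = record
        if parent is not None:
            record.links.add(parent.guid)
            parent.links.add(record.guid)
        return record


if __name__ == "__main__":
    session = FieldSession()
    specimen = session.add("specimen")
    session.add("tissue sample", linked_to=specimen.guid)
    session.add("in-situ image", linked_to=specimen.guid)
    print(specimen.guid, "is linked to", len(specimen.links), "records")
```

Because the link is written when the derived record is created, there is no later reconciliation step in which a tissue sample or image could be matched to the wrong specimen.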