Proposal Number: IIS-0241229 Principal Investigator: Thomas Moritz
This international collaborative project proposes to design and test approaches to the mark-up and extraction of scientific data from a corpus drawn from the biosystematics literature of entomology (ants) and to develop a set of applications based on an ontology for this topical area. Ontologies for natural history information are particularly complex because of the diversity of the source material and of the selected descriptors. The project will build on digital library work underway at the American Museum of Natural History, on biological informatics at Ohio State University, and on computer science at Universität Magdeburg in Germany. The project is jointly sponsored by the Deutsche Forschungsgemeinschaft (DFG), the German Research Foundation.
Biosystematics is the science that provides the definitional foundations for organismic biology and for the applied science of biodiversity conservation. Within zoological systematics, insects are a uniquely important group of organisms with major impacts on human health and economics. Ants constitute a particularly important group of social insects, accounting for a major part of the biomass in tropical rainforests and displaying remarkable diversity and behavioral variation. This proposal is designed to test approaches to the mark-up and extraction of scientific data from a corpus of texts drawn from the biosystematics literature of entomology (ants) and to develop a set of applications, including powerful search and retrieval strategies, to operate on this corpus. Specifically, approaches to automated XML mark-up will be tested using the hypothesized NMNH Taxon-X Schema (derived from the implicit structure of scientific publications in biological systematics and already under development at AMNH). The corpus of marked-up literature will then be used to explore the automated extraction and application of embedded scientific data, including scientific names, morphological characters, species distribution data (for plotting in GIS and for contribution to the emerging world database of biological diversity), and collection locales and events for inclusion in a gazetteer of collecting events.
Accomplishing these goals will increase worldwide access to, and the usefulness of, an extensive body of scientific literature that is now generally restricted to users who have access to major research libraries. The project will also stimulate interdisciplinary interaction between computer scientists and domain specialists in a field that stands to benefit greatly from IT research. The core work of this project will contribute to global efforts to design and implement similar, general-purpose ontological services capable of supporting biological and conservation education at all levels. The project can contribute to research and education in biology by prototyping a model that extends the practical usefulness of the enormous body of legacy literature in biosystematics and contributes to the completion and continuing updating of international biodiversity databases. Making the corpus of literature available on the Web in the context of the Commons philosophy, together with the methodology and protocols for managing it, can create a new model for international access to this literature, offering access to biosystematics libraries to those who have never before had such resources available.