A collaborative award has been made to the University of Arizona and the University of California at Davis to develop novel ways of tying scientific names directly to published biological characteristics of organisms, and to implement a new user-friendly program, the Explorer of Taxon Concepts (ETC), to assist with the disambiguation of the scientific names of species at all taxonomic ranks. Prototypes from several successful NSF-funded projects are integrated through ETC to enable: (i) text-mining extraction of taxonomic knowledge from scientific literature, (ii) analysis and integration of this knowledge using logic-based reasoning and information-theoretic methods, and (iii) visualization of the results. The results shed light on similarities and differences among various scientists' understanding of a particular species, as well as on relations between the terminology used by different scientists, allowing for more accurate integration of data gathered by different investigators. One component of the ETC project is computer science research aimed at a novel integration of state-of-the-art logic inference and information-theoretic approaches to taxonomic science.
Scientific names are the primary identifiers for organisms and the anchor for the communication and comparison of biological knowledge. However, experts constantly revise the definitions of taxa, which makes interpreting the names through time challenging. This project will produce ETC software and demonstrate its use on descriptive scientific literature from the Rosaceae (the rose family) and Apoidea (the bee superfamily) to facilitate research into critical pollination systems. These pollination systems are currently of great concern because global reductions in bee populations have the potential to reduce the yield of many staple food crops. Each of ETC's components adds scientific knowledge value to its inputs, making its outputs useful in many other biodiversity information applications. Character and anatomy ontologies built and enhanced by the ETC project will benefit all knowledge-based applications in biology. The project adopts the following strategies to broaden its accessibility. The integration of ETC components with existing biological computing infrastructure such as DataONE and iPlant will make the tools broadly available. The partnership with iPlant's successful Education, Outreach, and Training (EOT) group will document the software for instructional use and encourage its adoption in the classroom. Components of the research and the final products will also be packaged into learning modules for college- and graduate-level courses at the University of Arizona, the University of California at Davis, and other universities. Project outcomes will be accessible via the link provided at: http://sirls.arizona.edu/node/684.