Core Database Technologies to enable the Integration of AToL Information

Cellinese, Nico; Beaman, Reed

Abstract

Assembling the Tree of Life (AToL) is a large-scale collaborative research effort sponsored by the National Science Foundation to reconstruct the evolutionary origins of all living things. Currently, 31 projects involving more than 150 PIs are underway, generating novel data from studies of bacteria, microbial eukaryotes, vertebrates, flowering plants, and many more. The data being generated by these projects include, but are not limited to: (i) specimens and their provenance, including collection information, voucher deposition, etc.; (ii) phenotypic descriptions and their provenance; (iii) genotypic descriptions and their provenance; (iv) interpretation of the primary measurements, including homology; (v) estimates of phylogenies and the methods employed; and (vi) post-tree analyses such as character evolution hypotheses. While data collection, storage, and dissemination within each project are well coordinated, there is a critical need to develop the infrastructure to integrate all AToL data sources, allowing the individual efforts to become multipliers for global hypotheses. Furthermore, as the projects continue to expand and address diverse corners of the Tree of Life, efficient project management will be greatly aided by workflow and data management tools targeted at the AToL problem domain. The project will develop new, compact, abstract data models for phylogenetics, leveraging use cases from a broad survey of empirical projects. The integration system will develop novel mappings between different phylogenetic data domains and allow individual projects to join a network of integrated databases incrementally. The data provenance system, which allows tracking of how each data object was created, will be unique to systematics data management.
The provenance system will not only allow tracking of the decisions made in producing a particular tree or a particular column of a data matrix, but will also allow tracking of alternative data lineages so that, for example, different opinions on character homology can be recorded. The results of the research will be delivered as robust software tools that can be used by the entire evolutionary biology community. The study will develop a community-based formal model of the data objects used in systematics, primarily through a continuing set of workshops. This activity will not only develop new data management tools, but will also have the effect of synthesizing disparate views of the phylogenetics research domains. The results will be extensible to other domains of evolutionary biology, thereby contributing to the broader mission of evolutionary synthesis. The project will also provide training for the general systematics community in the latest database technologies. Finally, by leveraging existing outreach efforts at the Penn Center for Bioinformatics, the project will link to other biological database efforts in genomics and the biomedical sciences, disseminating phylogenetic information to the broad biomedical research community.

Funding Agency

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Type: Standard Grant (Standard)
Application #: 0840702
Program Officer: Sylvia J. Spengler

Budget Start: 2008-07-15
Budget End: 2010-09-30
Fiscal Year: 2008
Total Cost: $201,215

Institution

University of Florida, Gainesville, FL, United States
