The AToL (Assembling the Tree of Life ) is a large-scale collaborative research effort sponsored by the National Science Foundation to reconstruct the evolutionary origins of all living things. Currently 31 projects involving 150+ PIs are underway generating novel data including studies of bacteria, microbial eukaryotes, vertebrates, flowering plants and many more. The data being generated by these projects include and are not limited to: (i) Specimens and their provenance including collection information, voucher deposition, etc.; (ii) Phenotypic descriptions and their provenance; (iii) Genotypic descriptions and their provenance; (iv) Interpretation of the primary measurements including homology ; (v) Estimates of phylogenies and methods employed; and (vi) Post-tree analyses such as character evolution hypotheses. While the data collection, storage, and dissemination within each projects are well coordinated, there is a critical need to develop the infrastructure to integrate all ATOL data sources, allowing the individual efforts to become multipliers for global hypotheses. Furthermore, as the projects continue to expand and address diverse corners of the Tree of Life, efficient project management will be greatly aided by workflow and data management tools targeted towards the ATOL problem domain. The project will develop new, compact, abstract data models for phylogenetics, leveraging use cases from a broad survey of empirical projects. The integration system will develop novel mappings between different phylogenetic data domains, and allow individual projects to join a network of integrated databases in an incremental manner. The data provenance system, which allows tracking of how each data object was created, will be unique to systematics data management. The provenance system will not only allow tracking of what kinds of decisions were made in producing a particular tree or a particular column of a data matrix, but will also allow tracking of alternative data lineages such that, for example, different opinions on character homology might be tracked. The results of the research will be delivered in robust software tools that can be used by the entire evolutionary biology community. The study will develop a community-based formal model of data objects used in systematics, primarily through a continuing set of workshops. This activity will not only develop new data management tools, but will also have the effect of synthesizing disparate views of the phylogenetics research domains. The results of the system will be extensible to other domains of evolutionary biology, thereby contributing to the broader mission of evolutionary synthesis. The project will also provide training for the general systematics community in latest database technologies. Finally, by leveraging existing outreach efforts at the Penn Center for Bioinformatics, the project will link to other biological database efforts in genomics and biomedical sciences, disseminating phylogenetic information to the broad biomedical research community.

Agency
National Science Foundation (NSF)
Institute
Division of Information and Intelligent Systems (IIS)
Type
Standard Grant (Standard)
Application #
0629846
Program Officer
Sylvia J. Spengler
Project Start
Project End
Budget Start
2006-10-01
Budget End
2011-09-30
Support Year
Fiscal Year
2006
Total Cost
$1,216,155
Indirect Cost
Name
University of Pennsylvania
Department
Type
DUNS #
City
Philadelphia
State
PA
Country
United States
Zip Code
19104