Phylogenetic trees, which depict the genealogical relationships among organisms, are a key tool for organizing and analyzing information about biological diversity. Many researchers in comparative biology use trees, and demand for them is high. The proposed research will take advantage of the phenomenal breadth of data in the GenBank molecular sequence database (currently including sequences from 185,000 species, or some 10% of all species known to science) to build an electronic repository of one billion phylogenetic trees. Once the repository is built, the research will construct simple search and retrieval tools that match its trees to any list of species a user submits as a query.
The primary impact of this research will be outside of the phylogenetics research community. Users of phylogenetic trees work in most areas of modern biology and include epidemiologists, genomicists, functional morphologists, conservation biologists, and community ecologists, to name a few. A repository of molecular phylogenies built with a consistent methodology, and satisfying prescribed minimum levels of statistical confidence, will give users the consistency needed for strong inferences within and between comparative biological studies.