This collaborative project aims to establish a national computational resource that will move the research community much closer to realizing the goal of the Tree of Life initiative, namely, reconstructing the evolutionary history of all organisms. This goal is the computational Grand Challenge of evolutionary biology. Current methods are limited to problems several orders of magnitude smaller, and they fail to provide sufficient accuracy at the high end of their range.
The planned resource will be designed as an incubator to promote the development of new ideas for this enormously challenging computational task; it will create a forum for experimentalists, computational biologists, and computer scientists to share data, compare methods, and analyze results, thereby speeding up tool development while also sustaining current biological research projects.
The resource will consist of a large computational platform, a collection of interoperable high-performance software for phylogenetic analysis, and a large database of real and simulated datasets together with their analyses; it will be accessible to developers, researchers, and educators through any Web browser. The software, freely available in source form, will be usable at scales ranging from laptops to high-performance, Grid-enabled compute engines such as this project's platform, and will be packaged for compatibility with currently popular tools. To build this resource, this collaborative project will support research programs in phyloinformatics (databases that store multilevel data with detailed annotations and support complex, tree-oriented queries); in optimization algorithms, Bayesian inference, and symbolic manipulation for phylogeny reconstruction; and in simulation of branching evolution at the genomic level, all within the context of a virtual collaborative center.
Biology, and phylogeny in particular, has been almost completely redefined by modern information technology, in terms of both data acquisition and analysis. Phylogeneticists have formulated specific models and questions that can now be addressed using recent advances in database technology and optimization algorithms. The time is thus exactly right for a close collaboration of biologists and computer scientists to address the IT issues in phylogenetics, many of which call for novel approaches because of their combined combinatorial difficulty and overall scale. To provide the required breadth and depth, the project research team brings together computer scientists working in databases, algorithm design, algorithm engineering, and high-performance computing; evolutionary biologists and systematists; bioinformaticians; and biostatisticians. Its members have a history of successful collaboration and a record of fundamental contributions.
This project will bring together researchers from many areas and foster new types of collaborations and new styles of research in computational biology; moreover, the interaction of algorithms, databases, modeling, and biology will give new impetus and direction to each of these areas. It will help create the computational infrastructure that the research community will use over the coming decades, as more whole genomes are sequenced and enough data are collected to attempt the inference of the Tree of Life. The project will help evolutionary biologists understand the mechanisms of evolution and the relationships among the evolution, structure, and function of biomolecules, and will help them address a host of other research problems in biology, eventually leading to major progress in ecology, pharmaceutics, forensics, and security.
The project will publicize evolution, genomics, and bioinformatics through informal education programs at museum partners of the collaborating institutions. It will also motivate high-school students and college undergraduates to pursue careers in bioinformatics. The project provides an extraordinary opportunity to train undergraduate and graduate students, as well as postdoctoral researchers, in one of the most exciting interdisciplinary areas in science. The collaborating institutions serve large numbers of people from underrepresented groups and are committed to increasing their participation in research.