The number of species with sequenced genomes is rising rapidly and will continue to do so, with projects underway to sequence all eukaryotic species in the UK (Darwin Tree of Life project) and on the planet (Earth Biogenome Project). To make sense of assembled genome data, important features, such as protein-coding and non-coding genes, need to be identified and described; this general process is called annotation. Despite major advances in methods to automatically annotate genomes, the most accurate annotations require human assessment. However, prohibitive cost usually prevents manual annotation (with curated updates) from being performed on individual species. A scalable alternative is to direct manual effort towards reference datasets and to harvest contributions from the broader research community. The resulting high-quality annotations can then be projected across species based on inferred homology. It is essential that the software used for annotation is fast, flexible and easy to use for different communities of annotators (professional curators, bench biologists, or curious non-experts). Of the currently available software platforms for annotating genomes, Artemis and Apollo are the two most popular and have been in wide use for 20 years. Artemis, developed at the Sanger Institute, has been used primarily for viewing, annotating and analysing the genomes of prokaryotic and eukaryotic microbes.

A major strength of Artemis is its companion, the 'Artemis Comparison Tool' (ACT), which allows gene structures to be created or edited in the context of discovering and exploring genome conservation. A major limitation of both Artemis and ACT is that the software performs poorly on sequences larger than a few tens of megabases. Like Artemis, Apollo started as a desktop tool, but it was redesigned as a web-based tool and now runs on a shared server so that multiple users can browse and create annotations across the same genome simultaneously. Apollo comfortably handles genomes of any size and scales well with multiple concurrent users. This project will integrate the best of Artemis and Apollo to create a single, higher-performance annotation platform. The new Apollo will benefit from a modern and modular architecture, supporting collaborative development and improved sustainability. Apollo will also be enhanced with new data interfaces, developed in collaboration with the EMBL-EBI group, so that genome comparison data can be accessed across servers and annotation can be performed in the context of exploring synteny. The new generation of annotation tool will replace the existing Artemis and Apollo projects and be integrated into major genome annotation projects, while retaining its usability for individual small-scale users.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Agency: National Science Foundation (NSF)
Institute: Division of Biological Infrastructure (DBI)
Application #: 2031120
Program Officer: Peter McCartney
Project Start:
Project End:
Budget Start: 2020-08-01
Budget End: 2023-07-31
Support Year:
Fiscal Year: 2020
Total Cost: $342,818
Indirect Cost:
Name: University of California Berkeley
Department:
Type:
DUNS #:
City: Berkeley
State: CA
Country: United States
Zip Code: 94710