The cost of sequencing a genome has fallen dramatically in the last decade, and as a natural consequence an ever-growing number of researchers are sequencing more and more new genomes, both within populations and across species. Each of these researchers is collecting genomic information on a regular basis, but their ability to collaborate and share information and expertise can be limited by the absence of supporting tools. To address this need, we have developed Apollo, an easy-to-use web-based environment that empowers distributed researchers to interactively explore and refine accurate genomic structural annotations via informative visualizations. We now propose to extend this tool with the goal of fully embedding genomic "crowdsourcing" into the research lifecycle, scaling up the volume and utility of data that can be processed by the system. We will incorporate more tools for profiling annotations (e.g. protein motifs, multiple sequence alignments, protein family placement, and inferred function) by integrating Apollo with external tools and services, such as Galaxy, InterPro, and track hubs, and we will provide a broader range of automatic checks and quality measures prior to submission. Apollo will serve both as an editing environment and as a collaborative communications center, which will dramatically increase the amount of biological information that can be used in the analysis of genome-scale human datasets, generating additional insights into human disease risk, progression, and potential therapies. Critically, all contributions from scientists working in this collaborative research environment will be individually recognized (using ORCIDs) to ensure that due credit is given and that the provenance of each annotated genomic feature is available. To realize this vision, we have outlined a series of specific aims, supported by detailed technical plans.
We will implement a more streamlined, scalable setup procedure to ease installation and deployment for newly sequenced organisms. We will provide a standardized API that serves as a platform for integrating new capabilities and workflows tailored to individual and community needs. We will develop a stand-alone validation package to expedite merging revised gene sets with prior versions and to provide real-time quality control during the annotation process. We will implement an annotation-by-annotation messaging system to enable curators to explain their decisions and discuss their reasoning with others. We will implement a provenance system to provide scientific credit to contributing researchers. We will introduce support for the co-curation of multiple related genomes and make better use of evolutionary information from homology searches, to improve annotation consistency and save curator time. We will enable Apollo to function both as a "Track Server", to dynamically share new annotations via the Ensembl browser (or others), and reciprocally as a "Track Client", to display tracks pulled from either the EBI or UCSC hubs. Lastly, we will engage with the community to obtain feedback for enhancements, and reciprocally provide training and documentation.

Public Health Relevance

Sequencing a genome is an important step towards understanding how genes work together to direct the growth, development, and maintenance of an organism, and it can provide insights and suggest strategies for improving human health and the environment we live in. But this is true only if the sequences of these genomes are well annotated with functional information. The Apollo Genome Editor's real-time collaborative environment will mobilize more researchers, expanding their opportunities to work together to refine the accuracy, coverage, and precision of genome annotations, and to make them widely available through centralized, public, searchable online resources.

Agency: National Institutes of Health (NIH)
Institute: National Institute of General Medical Sciences (NIGMS)
Type: Research Project (R01)
Project #: 2R01GM080203-11A1
Application #: 9596385
Study Section: Biodata Management and Analysis Study Section (BDMA)
Program Officer: Ravichandran, Veerasamy
Project Start: 2007-08-01
Project End: 2022-05-31
Budget Start: 2018-09-10
Budget End: 2019-05-31
Support Year: 11
Fiscal Year: 2018
Total Cost:
Indirect Cost:
Name: University of California Berkeley
Department: Biomedical Engineering
Type: Biomed Engr/Col Engr/Engr Sta
DUNS #: 124726725
City: Berkeley
State: CA
Country: United States
Zip Code: 94704
Schoville, Sean D; Chen, Yolanda H; Andersson, Martin N et al. (2018) A model species for agricultural pest genomics: the genome of the Colorado potato beetle, Leptinotarsa decemlineata (Coleoptera: Chrysomelidae). Sci Rep 8:1931
Harper, Lisa; Campbell, Jacqueline; Cannon, Ethalinda K S et al. (2018) AgBioData consortium recommendations for sustainable genomics and genetics databases for agriculture. Database (Oxford) 2018:
Poynton, Helen C; Hasenbein, Simone; Benoit, Joshua B et al. (2018) The Toxicogenome of Hyalella azteca: A Model for Sediment Ecotoxicology and Evolutionary Toxicology. Environ Sci Technol 52:6009-6022
Papanicolaou, Alexie; Schetelig, Marc F; Arensburger, Peter et al. (2017) Erratum to: The whole genome sequence of the Mediterranean fruit fly, Ceratitis capitata (Wiedemann), reveals insights into the biology and adaptive evolution of a highly invasive pest species. Genome Biol 18:11
Putman, Tim E; Lelong, Sebastien; Burgstaller-Muehlbacher, Sebastian et al. (2017) WikiGenomes: an open web application for community consumption and curation of gene annotation data in Wikidata. Database (Oxford) 2017:
Buels, Robert; Yao, Eric; Diesh, Colin M et al. (2016) JBrowse: a dynamic web platform for genome visualization and analysis. Genome Biol 17:66
Lee, Eduardo; Helt, Gregg A; Reese, Justin T et al. (2013) Web Apollo: a web-based genomic annotation editing platform. Genome Biol 14:R93
Lee, Ed; Harris, Nomi; Gibson, Mark et al. (2009) Apollo: a community resource for genome annotation editing. Bioinformatics 25:1836-7