This supplemental proposal knits together several bioinformatics visualization tools in the service of SARS-CoV-2 genome analysis. The core of the proposal is a newly prototyped JavaScript viewer, ABrowse, that renders multiple sequence alignments, navigable by phylogenetic trees and integrated with protein structure views, all in a single embeddable component. ABrowse is currently used to render the Pfam SARS-CoV-2 special release: a collection of 40 protein domains from the coronavirus genome, along with PDB structures. (ABrowse is also a candidate for Pfam's future default viewer, as noted in the letters of support.) We propose to accelerate ABrowse development for use during the COVID-19 pandemic, specifically targeting the scaling, performance, and integration issues that are most relevant to scientists studying the virus. Chief among these is scaling ABrowse to handle millions of protein sequences (and/or SARS-CoV-2 genome sequences) by means of a new, compressed storage format suitable for random-access, user-driven exploration of very large trees (and alignments) over the web. Beyond scaling, we also address integration by developing plugins that let ABrowse run within JBrowse (the genome browser that is the focus of the project to which this is a supplemental proposal) as well as Auspice (the web dashboard of NextStrain, the phylogenetic genome alignment and annotation package that is widely used for COVID-19 analysis). We also propose several user interface enhancements to make ABrowse more useful as a navigation tool for COVID-19 data.

Public Health Relevance

We develop a new web application for integrative browsing of SARS-CoV-2 genome sequences, protein alignments, structures, and phylogenetic trees. The web app will scale to millions of genome-length sequences and will be integrated with the JBrowse genome browser and the NextStrain pathogen visualization platform. The app is built exclusively using dynamic HTML and JavaScript, so that it can be served from cloud storage with negligible CPU load.

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Research Project (R01)
Project #: 3R01HG004483-12S1
Application #: 10162044
Study Section: Special Emphasis Panel (ZHG1)
Program Officer: Wellington, Christopher
Project Start: 2020-09-04
Project End: 2021-06-30
Budget Start: 2020-09-04
Budget End: 2021-06-30
Support Year: 12
Fiscal Year: 2020
Total Cost:
Indirect Cost:

Name: University of California Berkeley
Department: Biomedical Engineering
Type: Biomed Engr/Col Engr/Engr Sta
DUNS #: 124726725
City: Berkeley
State: CA
Country: United States
Zip Code: 94710
Schoville, Sean D; Chen, Yolanda H; Andersson, Martin N et al. (2018) A model species for agricultural pest genomics: the genome of the Colorado potato beetle, Leptinotarsa decemlineata (Coleoptera: Chrysomelidae). Sci Rep 8:1931
Harper, Lisa; Campbell, Jacqueline; Cannon, Ethalinda K S et al. (2018) AgBioData consortium recommendations for sustainable genomics and genetics databases for agriculture. Database (Oxford) 2018:
Poynton, Helen C; Hasenbein, Simone; Benoit, Joshua B et al. (2018) The Toxicogenome of Hyalella azteca: A Model for Sediment Ecotoxicology and Evolutionary Toxicology. Environ Sci Technol 52:6009-6022
Holmes, Ian H (2017) Historian: accurate reconstruction of ancestral sequences and evolutionary rates. Bioinformatics 33:1227-1229
Holmes, Ian H (2017) Solving the master equation for Indels. BMC Bioinformatics 18:255
Papanicolaou, Alexie; Schetelig, Marc F; Arensburger, Peter et al. (2017) Erratum to: The whole genome sequence of the Mediterranean fruit fly, Ceratitis capitata (Wiedemann), reveals insights into the biology and adaptive evolution of a highly invasive pest species. Genome Biol 18:11
Putman, Tim E; Lelong, Sebastien; Burgstaller-Muehlbacher, Sebastian et al. (2017) WikiGenomes: an open web application for community consumption and curation of gene annotation data in Wikidata. Database (Oxford) 2017:
Buels, Robert; Yao, Eric; Diesh, Colin M et al. (2016) JBrowse: a dynamic web platform for genome visualization and analysis. Genome Biol 17:66
De Maio, Nicola; Holmes, Ian; Schlötterer, Christian et al. (2013) Estimating empirical codon hidden Markov models. Mol Biol Evol 30:725-36
Westesson, Oscar; Skinner, Mitchell; Holmes, Ian (2013) Visualizing next-generation sequencing data with JBrowse. Brief Bioinform 14:172-7
