Computational Component 2: Improved Annotation of Echinoderm Genomic Features
SUMMARY
Echinobase serves as a centralized resource for the most complete and accurate genome assemblies and annotations for species in the phylum Echinodermata. The effectiveness of the echinoderm research community depends directly on the quality of the genome assembly and annotation data hosted on Echinobase. A recent survey identified improvements to genome assemblies and gene annotations, including capturing gene orthologs, as top priorities for the Echinobase user community. Assembling echinoderm genomes has historically been challenging given the relatively high levels of polymorphism and repetitive sequence common in these genomes. Recent advances in technologies and algorithms for genome sequencing and assembly permit significant improvements to the contiguity and accuracy of the echinoderm genomes in our care, and this component describes how these improvements will be achieved. Upgraded genome assemblies, along with the deluge of echinoderm transcriptome datasets, warrant the re-annotation of gene models and enable improved identification of orthologous relationships, both among echinoderms and with species outside the phylum. This is a significant enhancement of Echinobase because the orthologs identified will place echinoderm research in context for a broader community of developmental biologists. Echinobase will also serve as a venue for easily synthesizing information obtained from high-throughput assays of echinoderm development, including transcriptome and functional genomics datasets, along with annotations of noncoding regulatory elements throughout the genome. These efforts will also generate recommendations for optimal approaches to the assembly of large polymorphic genomes, which remains a difficult problem, and for bioinformatic analyses involving echinoderm genomes.
This will in turn enhance the capabilities of the echinoderm community to make genomic inquiries, as well as increase the overall impact of this research. The types of data housed in Echinobase are most impactful when considered as an ensemble rather than in isolation; integrating signals from datasets generated by multiple research groups, using orthogonal approaches, and across species magnifies the significance of any trends identified in a single dataset. This is especially the case for gene regulatory network (GRN) studies, which must draw on all these types of evidence to support nodes and linkages. By enabling GRN studies, this component therefore supports the most impactful research in developmental biology for this phylum. Efforts to annotate and display these data across multiple types of datasets and multiple species are unlikely to be undertaken by individual investigators; only a focused effort as proposed here, as part of the overall Echinobase project, can provide this impactful resource.

Agency
National Institutes of Health (NIH)
Institute
Eunice Kennedy Shriver National Institute of Child Health & Human Development (NICHD)
Type
Biotechnology Resource Grants (P41)
Project #
5P41HD095831-02
Application #
9789705
Study Section
Special Emphasis Panel (ZHD1)
Project Start
Project End
Budget Start
2019-07-01
Budget End
2020-06-30
Support Year
2
Fiscal Year
2019
Total Cost
Indirect Cost
Name
Carnegie-Mellon University
Department
Type
DUNS #
052184116
City
Pittsburgh
State
PA
Country
United States
Zip Code
15213