Annotation and curation of large-scale spatial gene expression data for sea urchins

Spatial gene expression data are an invaluable source of information, as they provide a simultaneous assessment of transcript distribution over a large field of cells or even throughout entire embryos. In most species, the acquisition of spatial expression data remains a slow process, which is why large-scale collections of publicly accessible spatial gene expression data are available for only a few species. However, given the importance of gene expression data for developmental biology, in particular for the analysis of gene regulatory networks, these collections are widely used and benefit larger communities. The extensive analysis of gene regulatory networks in the sea urchin embryo has over the last two decades produced a large set of spatial gene expression data that unfortunately remains accessible only through individual publications. In addition, our lab has in the last few years conducted a systematic analysis of the spatial expression of regulatory genes during the first three days of sea urchin development. This analysis includes almost all genes encoding known transcription factors, approximately 350 regulatory genes in total. For every gene, spatial expression was analyzed at five stages during the first 72 h of sea urchin development by whole-mount in situ hybridization, and dozens to hundreds of microscopy images were acquired for each sample to capture different embryo orientations and different focal depths. The result is >220,000 images that, without proper curation and annotation, will remain difficult to access for the broader community, which includes scientists working with echinoderms and an increasing number of scientists interested in comparative developmental biology.
In this project, we will curate the existing set of expression data by collecting images of stained embryos from the newly generated dataset, and by selecting and processing for each gene and stage a small number of representative images for inclusion in a database. We will also include data from past research projects that focused mainly on the expression of regulatory genes during pre-gastrular development. We will complete the ongoing annotation of observed spatial gene expression patterns in order to enhance the accessibility of the spatial expression data. Furthermore, this project will develop a controlled vocabulary that will facilitate the consistent description of spatial gene expression domains that so far are characterized only at the molecular level, by expression of regulatory genes. Finally, we will use these data to generate a publicly accessible database of spatial expression data for sea urchins, which will display microscopy images of stained embryos along with detailed annotations of expression patterns, searchable by gene name and developmental stage.

Public Health Relevance

The present project contributes to a larger research effort to decipher the genomic control of development. In recent years it has become clear that the regulation of gene expression is a crucial determinant of development and health, and that the disruption of these control mechanisms is the cause of numerous serious diseases, including developmental defects, physiological disorders and cancer. A thorough understanding of how these control mechanisms work is essential for identifying and treating any malfunctions. This is a basic research proposal with the objective of enhancing access to gene expression data and thus facilitating research on genomic control mechanisms.

Agency
National Institutes of Health (NIH)
Institute
Eunice Kennedy Shriver National Institute of Child Health & Human Development (NICHD)
Type
Small Research Grants (R03)
Project #
1R03HD094047-01
Application #
9437286
Study Section
National Institute of Child Health and Human Development Initial Review Group (CHHD)
Program Officer
Coulombe, James N
Project Start
2018-06-15
Project End
2020-05-31
Budget Start
2018-06-15
Budget End
2019-05-31
Support Year
1
Fiscal Year
2018
Total Cost
Indirect Cost
Name
California Institute of Technology
Department
Type
Schools of Arts and Sciences
DUNS #
009584210
City
Pasadena
State
CA
Country
United States
Zip Code
91125