The fruit fly, Drosophila melanogaster, has been fundamental to the study of genetics for the last century. It is the model organism of choice in many areas of research because it allows genetics to be studied in the laboratory and the findings applied to human genetics. Its value as a model rests on two factors. First, its genetic code can be manipulated relatively easily in the laboratory, and this, coupled with a short life cycle, provides a means by which the function of a gene or pathway can be studied rapidly. Second, the vast majority of the fundamental biochemical mechanisms and pathways are conserved between flies and humans. Indeed, 75% of the genes that cause human disease are found in the fly, and thus data collected in the fly can provide insights into the same processes in humans. A new technology, single-cell RNA sequencing (scRNA-seq), reveals which genes are switched on, or most active, in a single cell. For the fly community, this provides the ability to quickly map clusters of cells and cell types to the whole anatomy and to link these to both phenotype and function. The increasing number of scRNA-seq datasets from different species has led to the development of the Single Cell Expression Atlas (scEA), a web portal that enables users to visualize and interpret these data more easily. It is anticipated that the amount of fly single-cell data will increase from 10 datasets to ~100 in 2020, with a further two-fold increase in 2021. Key to the scientific exploitation of these data will be the ability of users not only to analyze the fly data effectively but also to examine the interconnections between fly data and human or mouse datasets.

This project will provide the means by which fly datasets can be easily interpreted and linked to mouse and human datasets via scEA. The scEA currently hosts scRNA-seq data for over 500K assays, including data for the Human Cell Atlas (HCA) and Mouse Cell Atlas (MCA), amongst others. Analysis pipelines will be developed to combine the available and emerging datasets, alongside the computational infrastructure necessary to host the Fly Cell Atlas (FCA) datasets. The scEA will provide users with an easy-to-navigate web service with exploratory querying capability, as well as data download for further analysis. The service will be fully integrated with the established fly resources: FlyBase, Virtual Fly Brain and the Drosophila Resources at Harvard University. This project will also develop a process for annotating the datasets. Annotation adds scientific information to the data, which gives the user a greater level of biological understanding and so aids interpretation and analysis. The annotation will expand on the existing FlyBase anatomy ontology, a structure of controlled vocabularies used to describe the anatomy of the fly; this will ensure full compatibility across new and existing resources. The scEA will develop and provide the means by which the data can be easily visualized and mined for cell types, while also giving the fly community the ability to contribute their scientific expertise to the annotation. The scEA user interface will be further developed to provide greater cross-species query capability: the resulting FCA will be linked within scEA to the HCA, MCA and any further datasets, enabling cross-species comparisons that will aid the discovery of novel biological insights.
This project aims to provide the fly community with practical solutions for connecting, reusing and reanalyzing datasets, and so will close the gap in translating biological discoveries in model organisms, such as the fruit fly, to humans and vice versa. The project will make the results of this comparative analysis rapidly available to the growing user community.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Agency: National Science Foundation (NSF)
Institute: Division of Biological Infrastructure (DBI)
Application #: 2035515
Program Officer: Peter McCartney
Budget Start: 2020-08-01
Budget End: 2024-07-31
Fiscal Year: 2020
Total Cost: $209,672
Name: Harvard University
City: Cambridge
State: MA
Country: United States
Zip Code: 02138