Recent advances in high-throughput bio-imaging technologies are enabling scientists to capture the spatiotemporal patterns of gene expression in cells, organs and individuals, in an effort to build a more comprehensive picture of genome function. Today, images capturing gene expression and protein localization patterns have unprecedented spatial resolution, resulting in high-quality maps (expression images) in model organisms. However, the computational biology of gene expression patterns lags far behind genome informatics. Automated and efficient tools for analyzing expression images are a prerequisite for generating biological insights into gene functions, interactions and networks for the next generation of scientists. This project focuses on the development of novel tools and techniques for large-scale biological annotation and comparative analysis of gene expression patterns in the early development of a model organism (Drosophila melanogaster, the fruit fly). This choice is important because the fruit fly is a canonical model organism for understanding the development of humans and other animals. The availability of more than 100,000 images that capture gene expression patterns in developing fruit fly embryos provides an opportunity to examine the similarities and differences in the expression of fruit fly genes, many of which show very high sequence and biochemical function similarity to human proteins. The three primary aims of the proposed research are as follows:

(a) Develop machine learning methods that use the existing, but coarse, stage knowledge in training and produce refined and new stage information for existing and future images. Knowledge of the precise developmental stage is important because it enables biologically meaningful mining of genes with similar spatial patterns, calculation of the developmental trajectories of gene expression, stage-sensitive textual annotation of the expression captured in images, and construction of genome-wide expression pattern maps at critical junctures in development.

(b) Develop machine learning methods to describe expression patterns in words by using the existing controlled vocabulary. These descriptions will enable the use of efficient text-mining tools to identify genes expressed in similar organs and their precursors. They will also support better comparison of expression patterns across species, because many efforts in the scientific community relate organism-specific controlled vocabularies to each other.

(c) Develop transfer learning techniques for stage and text annotation that can be applied to images generated by future techniques. This is important because machine learning approaches traditionally assume that the training data and test data are drawn from the same distribution. However, new bio-imaging techniques are producing higher-resolution data with substantially different color and intensity distributions, so robust methods that apply across techniques are needed.
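The distribution mismatch described in aim (c) can be illustrated with a small sketch. The example below is not the project's method. It only shows one simple, generic way to reduce an intensity shift between two imaging techniques: quantile (histogram) matching of a new image against a reference image. The function name match_intensity_distribution and the synthetic "old" and "new" images are hypothetical and exist only for illustration.

```python
import numpy as np


def match_intensity_distribution(source, reference):
    """Remap source intensities so their empirical CDF follows the reference.

    Hypothetical helper for illustration only; a real transfer learning
    pipeline would involve far more than this preprocessing step.
    """
    src_flat = source.ravel()
    ref_flat = reference.ravel()

    # Sorted unique intensities, plus the index needed to rebuild the image.
    src_values, src_index, src_counts = np.unique(
        src_flat, return_inverse=True, return_counts=True
    )
    ref_values, ref_counts = np.unique(ref_flat, return_counts=True)

    # Empirical cumulative distribution at each unique intensity.
    src_cdf = np.cumsum(src_counts) / src_flat.size
    ref_cdf = np.cumsum(ref_counts) / ref_flat.size

    # For each source quantile, take the reference intensity at that quantile.
    matched_values = np.interp(src_cdf, ref_cdf, ref_values)
    return matched_values[src_index].reshape(source.shape)


# Synthetic stand-ins for images from an older and a newer technique
# (assumed data, not drawn from any real image collection).
rng = np.random.default_rng(0)
old_image = rng.normal(0.3, 0.05, size=(64, 64))
new_image = rng.normal(0.6, 0.10, size=(64, 64))

adapted = match_intensity_distribution(new_image, old_image)
print("mean before:", new_image.mean(), "after:", adapted.mean())
```

Because the mapping is monotone, the relative ordering of pixel intensities within the new image is preserved while its overall intensity distribution is moved toward that of the reference. A classifier trained on the older images would then see inputs closer to its training distribution, although this kind of normalization alone does not address differences in resolution or color.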

Public Health Relevance

This project focuses on the analysis of gene expression patterns in the early development of Drosophila melanogaster. This fruit fly is a canonical model organism for understanding the development of humans and other animals. The proposed research will explicate the function and interconnection of animal genes and lead to a better understanding of human diseases.

Agency: National Institutes of Health (NIH)
Institute: National Library of Medicine (NLM)
Type: Research Project (R01)
Project #: 5R01LM010730-03
Application #: 8502758
Study Section: Biomedical Library and Informatics Review Committee (BLR)
Program Officer: Ye, Jane
Project Start: 2011-08-15
Project End: 2014-07-31
Budget Start: 2013-08-01
Budget End: 2014-07-31
Support Year: 3
Fiscal Year: 2013
Total Cost: $291,834
Indirect Cost: $95,736

Name: Arizona State University-Tempe Campus
Department: Genetics
Type: Organized Research Units
DUNS #: 943360412
City: Tempe
State: AZ
Country: United States
Zip Code: 85287

Liu, Yashu; Nie, Zhi; Zhou, Jiayu et al. (2014) Sparse generalized functional linear model for predicting remission status of depression patients. Pac Symp Biocomput :364-75
Stecher, Glen; Liu, Li; Sanderford, Maxwell et al. (2014) MEGA-MD: molecular evolutionary genetics analysis software with mutational diagnosis of amino acid variation. Bioinformatics 30:1305-7
Montiel, Ivan; Konikoff, Charlotte; Braun, Bremen et al. (2014) myFX: a turn-key software for laboratory desktops to analyze spatial patterns of gene expression in Drosophila embryos. Bioinformatics 30:1319-21
Dubey, Rashmi; Zhou, Jiayu; Wang, Yalin et al. (2014) Analysis of sampling techniques for imbalanced data: An n = 648 ADNI study. Neuroimage 87:220-41
Xiang, Shuo; Yuan, Lei; Fan, Wei et al. (2014) Bi-level multi-source learning for heterogeneous block-wise missing data. Neuroimage 102 Pt 1:192-206
Yuan, Lei; Pan, Cheng; Ji, Shuiwang et al. (2014) Automated annotation of developmental stages of Drosophila embryos in images containing spatial patterns of expression. Bioinformatics 30:266-73
Chen, Jianhui; Tang, Lei; Liu, Jun et al. (2013) A convex formulation for learning a shared predictive structure from multiple tasks. IEEE Trans Pattern Anal Mach Intell 35:1025-38
Zhou, Jiayu; Liu, Jun; Narayan, Vaibhav A et al. (2013) Modeling disease progression via multi-task learning. Neuroimage 78:233-48
Yuan, Lei; Wang, Yalin; Thompson, Paul M et al. (2012) Multi-source feature learning for joint analysis of incomplete multiple heterogeneous neuroimaging data. Neuroimage 61:622-32