The cellular signal transduction system (CSTS) plays a fundamental role in maintaining cellular homeostasis, and perturbations of the CSTS lead to diseases such as cancers and diabetes. Most cellular signaling pathways ultimately regulate gene expression, so gene expression can serve as a universal bioassay reflecting the state of the CSTS. However, the gene expression profile of a cell reflects a mixture of responses to all active signaling pathways, which makes it challenging to deconvolute the signals embedded in gene expression data. With the advent of high-throughput biotechnology, major public databases now host the results of millions of gene expression assays. These assays were collected across diverse diseases, as well as under experimental conditions that biologists designed to probe almost every aspect of the CSTS. This wealth of big data provides unprecedented opportunities to investigate the CSTS under physiological and pathological conditions. It also poses unprecedented challenges: how to reveal signals from convoluted data and turn big data into knowledge. We hypothesize that, given a sufficiently large compendium of gene expression data collected under diverse conditions in which different parts of the CSTS are perturbed (either by designed experiments or by disease), current state-of-the-art deep learning models can reveal the cellular signals embedded in the data and infer how those signals are organized. In this project, we will compile a comprehensive compendium of human gene expression data and then employ modern deep learning algorithms and supercomputers to mine the data.
We aim to reveal the major cellular signals that regulate gene expression under physiological and pathological conditions and to infer the organization of these signals in the human CSTS. By combining the identified signals with genomic alteration data and drug response data, we further aim, in a pathway-centered manner, to identify pathways underlying diseases such as cancers, to use genomic data to predict the drug sensitivity of cancer cell lines, and to predict patient clinical outcomes.
This project aims to develop and apply novel "deep learning" algorithms to mine a comprehensive compendium of genome-scale data in order to reveal the major signals regulating gene expression under physiological and pathological conditions. This knowledge can be further applied to understand the disease mechanisms of cancers and to guide personalized treatment of cancer patients.