Many chronic diseases are complex and highly heterogeneous. They can be affected by multiple genes in combination with lifestyle and environmental factors, and patients with the same disease can be divided into subgroups, e.g., cancer subtypes or stages of Alzheimer's disease (AD). Genome-wide gene expression data can be used to investigate the molecular signatures of these diseases, which may help researchers understand disease etiology and guide precise treatments. Graphical models are powerful tools for estimating complex network interactions among a large number of genes. The primary goal of this project is to develop biostatistical and machine learning methods for estimating graphical models, particularly directed graphical models, from gene expression data. To this end, the project comprises three research aims. (1) Given known disease subtypes, Aim 1 develops novel techniques to jointly estimate multiple undirected/directed graphical models, with one model per subtype. (2) Aim 2 considers the situation where disease subtypes are not defined a priori. We propose to identify disease subtypes by clustering gene expression profiles and to incorporate the clustering uncertainty into the joint estimation of multiple directed graphical models developed in Aim 1. (3) Recent single-cell RNA sequencing (scRNA-seq) technology enables researchers to profile many individual cells from the same patient. Aim 3 focuses on estimating multiple directed graphical models (e.g., one for each tumor subclone or each type of brain cell) from the scRNA-seq data of a single patient.
The effectiveness of the proposed graphical model estimation methods will be demonstrated through analyses of cancer and AD data. The research results have great potential to offer new insights into the understanding and precise treatment of these diseases. Furthermore, these methods are general enough to be applied to omic data from other diseases. The research team will disseminate the results through computationally efficient and user-friendly software packages, research publications, and academic presentations, as well as through collaborations with experts in cancer research and neurological diseases.
The research to be carried out in this project will provide biomedical researchers with new tools for building graphical models to analyze large and complex omic data. These tools can be used to prioritize drug targets or to assess the consequences of drugs with particular molecular targets, and are thus of great value for the research and practice of precision medicine.