Many chronic diseases are complex and highly heterogeneous. They can be affected by multiple genes in combination with lifestyle and environmental factors, and patients with the same disease can be divided into subgroups, e.g., cancer subtypes or stages of Alzheimer's disease (AD). Genome-wide gene expression data can be used to investigate the molecular signatures of these diseases, which may help elucidate disease etiology and guide precision treatment. Graphical models are powerful tools for estimating complex network interactions among a large number of genes. The primary goal of this project is to develop biostatistical and machine learning methods that estimate such graphical models, particularly directed ones, from gene expression data. To this end, the project comprises the following research activities: (1) Given known disease subtypes, Aim 1 develops novel techniques to jointly estimate multiple undirected/directed graphical models, one model per subtype. (2) Aim 2 considers the situation where disease subtypes are not defined a priori. We propose to identify disease subtypes by clustering gene expression profiles and to incorporate the uncertainty of the clustering into the estimation of the multiple directed graphical models developed in Aim 1. (3) Recent single-cell RNA sequencing (scRNA-seq) technology enables researchers to profile multiple cells from the same patient. Aim 3 focuses on estimating multiple directed graphical models (e.g., for multiple subclones of tumor cells, or multiple types of brain cells) using the scRNA-seq data of one patient.

The effectiveness of the proposed graphical model estimation methods will be demonstrated through analyses of cancer and AD data. The research results have great potential to offer new insights into the understanding and precise treatment of these diseases. Furthermore, these methods are general enough to be applied to omic data from other diseases as well. The research team will disseminate computationally efficient and user-friendly software packages, research publications and academic presentations, and will collaborate with experts in cancer research and neurological diseases.
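The abstract does not specify the project's estimators. As a rough point of reference only, the following minimal sketch (an assumption, not the project's method) fits a separate sparse undirected Gaussian graphical model per subtype using the graphical lasso from scikit-learn. Known subtypes (the Aim 1 setting) enter as 0/1 membership weights, and clustering uncertainty (the Aim 2 setting) enters as soft Gaussian-mixture membership probabilities, so ambiguous samples contribute partially to several networks. Unlike the joint estimators proposed here, this baseline shares no information across subtypes and estimates no directed edges. The helper names `weighted_covariance`, `subtype_networks` and `edges`, the penalty `alpha=0.1`, and the simulated data are all hypothetical.

```python
import numpy as np
from sklearn.covariance import graphical_lasso
from sklearn.mixture import GaussianMixture


def weighted_covariance(X, w):
    """Weighted empirical covariance of the rows of X (hypothetical helper)."""
    w = w / w.sum()                  # normalize membership weights
    mu = w @ X                       # weighted mean, shape (p,)
    Xc = X - mu
    return (Xc * w[:, None]).T @ Xc  # sum_i w_i (x_i - mu)(x_i - mu)^T


def subtype_networks(X, weights, alpha=0.1):
    """One sparse precision matrix (undirected network) per subtype.

    weights[i, k] is the membership weight of sample i in subtype k:
    0/1 when subtypes are known, posterior probabilities when they come
    from clustering. Subtypes are fitted independently (no joint penalty).
    """
    precisions = []
    for k in range(weights.shape[1]):
        S = weighted_covariance(X, weights[:, k])
        _, precision = graphical_lasso(S, alpha=alpha)
        precisions.append(precision)
    return precisions


def edges(precision, tol=1e-8):
    """Gene pairs (i, j), i < j, with a nonzero partial correlation."""
    p = precision.shape[0]
    return {(i, j) for i in range(p) for j in range(i + 1, p)
            if abs(precision[i, j]) > tol}


# Illustrative simulated data: 5 genes, two subtypes whose networks differ
# (genes 0-1 linked in subtype A, genes 2-3 linked in subtype B).
rng = np.random.default_rng(0)
cov_a, cov_b = np.eye(5), np.eye(5)
cov_a[0, 1] = cov_a[1, 0] = 0.7
cov_b[2, 3] = cov_b[3, 2] = 0.7
X = np.vstack([rng.multivariate_normal(np.zeros(5), cov_a, size=200),
               rng.multivariate_normal(np.full(5, 3.0), cov_b, size=200)])

# Known subtypes: hard 0/1 membership weights.
hard = np.zeros((400, 2))
hard[:200, 0] = 1.0
hard[200:, 1] = 1.0
known_nets = subtype_networks(X, hard)

# Unknown subtypes: soft memberships from a two-component Gaussian mixture.
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
soft_nets = subtype_networks(X, gmm.predict_proba(X))

for name, nets in [("known", known_nets), ("soft", soft_nets)]:
    print(name, [sorted(edges(P)) for P in nets])
```

Weighting each sample's contribution to a subtype-specific covariance is one simple way to propagate clustering uncertainty; the mixture component order is arbitrary, so soft-weighted networks must be matched to subtypes after fitting.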

Public Health Relevance

The research to be carried out in this project will provide biomedical researchers with new tools to build graphical models for analyzing large and complex omic data. These tools can be used to prioritize drug targets or to assess the consequences of drugs that act on particular molecular targets, and are thus of great value for the research and practice of precision medicine.

Agency
National Institutes of Health (NIH)
Institute
National Institute of General Medical Sciences (NIGMS)
Type
Research Project (R01)
Project #
1R01GM126550-01
Application #
9459529
Study Section
Special Emphasis Panel (ZGM1)
Program Officer
Resat, Haluk
Project Start
2017-08-15
Project End
2021-07-31
Budget Start
2017-08-15
Budget End
2018-07-31
Support Year
1
Fiscal Year
2017
Total Cost
Indirect Cost
Name
University of North Carolina Chapel Hill
Department
Biostatistics & Other Math Sci
Type
Schools of Arts and Sciences
DUNS #
608195277
City
Chapel Hill
State
NC
Country
United States
Zip Code
27599
Yu-Feng Liu, Leo; Liu, Yufeng; Zhu, Hongtu et al. (2018) SMAC: Spatial multi-category angle-based classifier for high-dimensional neuroimaging data. Neuroimage 175:230-245
Fu, Sheng; Zhang, Sanguo; Liu, Yufeng (2018) Adaptively weighted large-margin angle-based classifiers. J Multivar Anal 166:282-299
Sun, Wei; Bunn, Paul; Jin, Chong et al. (2018) The association between copy number aberration, DNA methylation and gene expression in tumor samples. Nucleic Acids Res 46:3009-3018
Liu, Yang; He, Qianchan; Sun, Wei (2018) Association analysis using somatic mutations. PLoS Genet 14:e1007746
Zhang, Chong; Pham, Minh; Fu, Sheng et al. (2018) Robust Multicategory Support Vector Machines using Difference Convex Algorithm. Math Program 169:277-305
Zhao, Junlong; Yu, Guan; Liu, Yufeng (2018) Assessing robustness of classification using angular breakdown point. Ann Stat 46:3362-3389
Chen, Jingxiang; Fu, Haoda; He, Xuanyao et al. (2018) Estimating individualized treatment rules for ordinal treatments. Biometrics 74:924-933
Wang, WeiBo; Sun, Wei; Wang, Wei et al. (2018) A randomized approach to speed up the analysis of large-scale read-count data in the application of CNV detection. BMC Bioinformatics 19:74
Chen, Jingxiang; Zhang, Chong; Kosorok, Michael R et al. (2018) Double Sparsity Kernel Learning with Automatic Variable Selection and Data Extraction. Stat Interface 11:401-420