Tumors are complex biological systems. No single molecular approach fully elucidates tumor behavior, so analysis is needed at multiple levels, including genomics and proteomics. Different types of data are therefore now collected at a genome-wide scale from numerous sources, including DNA copy number alterations, mRNA expression, protein expression measurements, and many others. However, the full value of the biomedical information in these studies cannot be realized without effective statistical and computational methods. Thus, the long-term goal of this research is to develop innovative methods that jointly model these different types of data to help uncover the large-scale organization of interactions among genes and proteins. To tackle this challenge, this proposal begins in Aim 1 by developing new statistical and computational methods for identifying DNA/RNA/protein interactions. We propose to use tools developed for graphical models to study conditional dependencies among genes/proteins through various measures of conditional correlation.
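To illustrate what a conditional dependency means in this setting, the following minimal sketch compares the marginal correlation of two simulated expression profiles with their partial correlation given a shared copy number signal. It is an illustration only, not the methodology of this proposal: the data are synthetic, the variable names (`copy_number`, `gene_a`, `gene_b`) are hypothetical, and the first-order partial correlation formula is used in place of any sparse graphical model estimator.

```python
import math
import random


def pearson(a, b):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def partial_corr(x, y, z):
    """First-order partial correlation of x and y given a single variable z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Synthetic data (assumption): both genes are driven by the same copy number
# signal and have no direct interaction with each other.
rng = random.Random(0)
n_samples = 2000
copy_number = [rng.gauss(0, 1) for _ in range(n_samples)]
gene_a = [c + 0.5 * rng.gauss(0, 1) for c in copy_number]
gene_b = [c + 0.5 * rng.gauss(0, 1) for c in copy_number]

marginal = pearson(gene_a, gene_b)
conditional = partial_corr(gene_a, gene_b, copy_number)

print(f"marginal correlation:          {marginal:.3f}")
print(f"partial correlation given DNA: {conditional:.3f}")
```

In this simulation the two genes appear strongly correlated marginally, yet the partial correlation is close to zero once the shared copy number driver is conditioned on. This is the distinction that motivates studying conditional rather than marginal dependencies when inferring interaction networks.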
Aim 2 proposes novel approaches to integrate the interaction network with disease phenotypes to improve biomarker identification and clinical outcome prediction. We will derive modules of genes/proteins that are associated with disease initiation/progression, and use boosting procedures to incorporate the module information into the predictive models. In both aims, sparse regression techniques together with appropriate smoothness regularization will be used to handle the high dimensionality and to account for local correlation. The proposal uses two breast cancer studies as motivating examples, but the tools developed here can be generalized to other diseases. Success of this research will result in substantially improved statistical methods for large-scale integration studies, and will thus help to increase mechanistic understanding of the contribution of genomic/proteomic alterations to tumor growth and progression, as well as facilitate the development of more effective molecular diagnostic and prognostic tests. Data from the two breast cancer studies will be used together with extensive simulation experiments to test and refine the methodology for real-world application.
Fu, Rong; Wang, Pei; Ma, Weiping et al. (2017) A statistical method for detecting differentially expressed SNVs based on next-generation RNA-seq data. Biometrics 73:42-51
Zhou, Yan; Wang, Pei; Wang, Xianlong et al. (2017) Sparse multivariate factor analysis regression models and its applications to integrative genomics analysis. Genet Epidemiol 41:70-80
Petralia, Francesca; Song, Won-Min; Tu, Zhidong et al. (2016) New Method for Joint Network Analysis Reveals Common and Different Coexpression Patterns among Genes and Proteins in Breast Cancer. J Proteome Res 15:743-54
Wang, Xianlong; Qin, Li; Zhang, Hexin et al. (2015) A regularized multivariate regression approach for eQTL analysis. Stat Biosci 7:129-146
Danaher, P; Paul, D; Wang, P (2015) Covariance-based analyses of biological pathways. Biometrika 102:533-544
Petralia, Francesca; Wang, Pei; Yang, Jialiang et al. (2015) Integrative random forest for gene regulatory network inference. Bioinformatics 31:i197-205
Teixeira, Leonardo K; Wang, Xianlong; Li, Yongjiang et al. (2015) Cyclin E deregulation promotes loss of specific genomic regions. Curr Biol 25:1327-33
Danaher, Patrick; Wang, Pei; Witten, Daniela M (2014) The joint graphical lasso for inverse covariance estimation across multiple classes. J R Stat Soc Series B Stat Methodol 76:373-397
Hu, Jie Kate; Wang, Xianlong; Wang, Pei (2014) Testing gene-gene interactions in genome wide association studies. Genet Epidemiol 38:123-34
Cheng, Jie; Levina, Elizaveta; Wang, Pei et al. (2014) A sparse Ising model with covariates. Biometrics 70:943-53
Showing the most recent 10 out of 28 publications