Tumors are complex biological systems. No single molecular approach fully elucidates tumor behavior, so analysis is needed at multiple levels, including genomics and proteomics. Data of different types are therefore now collected from numerous sources at a genome-wide scale, including DNA copy number alterations, mRNA expression, protein expression, and many others. However, the full value of the biomedical information in these studies cannot be realized without effective statistical and computational methods. The long-term goal of this research is thus to develop innovative methods that jointly model these different types of data to help uncover the large-scale organization of interactions among genes and proteins. To tackle this challenge, Aim 1 develops new statistical and computational methods for identifying DNA/RNA/protein interactions. We propose to use tools developed for graphical models to study conditional dependencies among genes/proteins through various conditional correlations.
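As an illustration of what a conditional (partial) correlation captures, the sketch below simulates a three-variable chain and compares marginal with partial correlations. It is a minimal example under stated assumptions, not the methodology of this proposal: it inverts the plain sample covariance matrix, which is only feasible when there are many more samples than variables, whereas the proposal targets the high-dimensional setting with sparse regression. The function name `partial_correlations` and the simulated variables are hypothetical.

```python
import numpy as np


def partial_correlations(data):
    """Return the matrix of partial correlations for an (n_samples, n_features) array.

    Uses the identity rho_ij = -omega_ij / sqrt(omega_ii * omega_jj), where omega
    is the inverse covariance (precision) matrix. Assumes n_samples > n_features
    so that the sample covariance is invertible.
    """
    cov = np.cov(data, rowvar=False)
    precision = np.linalg.inv(cov)
    scale = np.sqrt(np.diag(precision))
    pcor = -precision / np.outer(scale, scale)
    np.fill_diagonal(pcor, 1.0)
    return pcor


# Simulated chain x -> y -> z: x and z are correlated only through y.
rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)
y = x + rng.normal(size=n)
z = y + rng.normal(size=n)
data = np.column_stack([x, y, z])

marginal = np.corrcoef(data, rowvar=False)
partial = partial_correlations(data)

print("marginal corr(x, z):", round(marginal[0, 2], 3))
print("partial corr(x, z | y):", round(partial[0, 2], 3))
```

In this simulation the marginal correlation between x and z is substantial, while their partial correlation given y is close to zero, which is the sense in which conditional correlations separate direct from indirect dependencies.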
Aim 2 proposes novel approaches for integrating the interaction network with disease phenotypes to improve biomarker identification and clinical outcome prediction. We will derive modules of genes/proteins that are associated with disease initiation/progression and use boosting procedures to incorporate the module information into the predictive models. In both aims, sparse regression techniques together with appropriate smooth regularization will be used to handle the high dimensionality and to account for local correlation. The proposal uses two breast cancer studies as motivating examples, but the tools developed here should generalize well to other diseases. Success of this research will result in substantially improved statistical methods for large-scale integration studies, and will thus help to increase mechanistic understanding of the contribution of genomic/proteomic alterations to tumor growth and progression and facilitate the development of more effective molecular diagnostic and prognostic tests. Data from the two breast cancer studies will be used together with extensive simulation experiments to test and refine the methodology for real-world application.
Hu, Jie Kate; Wang, Xianlong; Wang, Pei (2014) Testing gene-gene interactions in genome wide association studies. Genet Epidemiol 38:123-34
Danaher, Patrick; Wang, Pei; Witten, Daniela M (2014) The joint graphical lasso for inverse covariance estimation across multiple classes. J R Stat Soc Series B Stat Methodol 76:373-397
Chen, Lin S; Prentice, Ross L; Wang, Pei (2014) A penalized EM algorithm incorporating missing data mechanism for Gaussian parameter estimation. Biometrics 70:312-22
Wang, Pei; Chao, Dennis L; Hsu, Li (2011) Learning oncogenic pathways from binary genomic instability data. Biometrics 67:164-73
Peng, Jie; Wang, Pei; Zhou, Nengfeng et al. (2009) Partial Correlation Estimation by Joint Sparse Regression Models. J Am Stat Assoc 104:735-746