The long-term objective of this project is to develop powerful and computationally efficient statistical methods for modeling high-dimensional genomic data, motivated by important biological problems and experiments.
The specific aims of the current project are to develop novel survival analysis methods that model the heterogeneity in both patients and biomarkers in genomic studies, and to develop robust survival analysis methods for high-dimensional genomic data. The proposed methods hinge on a novel integration of methods in high-dimensional data analysis, theory in statistical learning, and methods in human genomics. The project will also investigate the robustness, power, and efficiency of these methods and compare them with existing methods. Results from applying the methods to studies of ovarian cancer, lung cancer, and brain cancer will help ensure that maximal information is obtained from the high-throughput experiments conducted by our collaborators as well as from publicly available data. Software will be made available through Bioconductor to ensure that the scientific community benefits from the methods developed.

Public Health Relevance

Over the last decade, advanced laboratory techniques have had a profound impact on genomic research; however, the development of corresponding statistical methods to analyze the data has not kept pace. This project aims to develop, evaluate, and disseminate powerful and computationally efficient statistical methods to model the heterogeneity in both patients and biomarkers in genomic studies. We believe our proposed methods can help the scientific community turn valuable high-throughput measurements into meaningful results.

National Institutes of Health (NIH)
National Human Genome Research Institute (NHGRI)
Research Project (R01)
Study Section: Biostatistical Methods and Research Design Study Section (BMRD)
Program Officer: Ramos, Erin
University of Wisconsin-Madison
Biostatistics & Other Math Sci
Schools of Medicine
United States
Li, Quefeng; Yu, Menggang; Wang, Sijian (2017) A Statistical Framework for Pathway and Gene Identification from Integrative Analysis. J Multivar Anal 156:1-17
Kong, Jing; Wang, Sijian; Wahba, Grace (2015) Using distance covariance for improved variable selection with application to learning genetic risk models. Stat Med 34:1708-20
Geng, Zhigeng; Wang, Sijian; Yu, Menggang et al. (2015) Group variable selection via convex log-exp-sum penalty with application to a breast cancer survivor study. Biometrics 71:53-62
Eng, Kevin H; Hanlon, Bret M; Bradley, William H et al. (2015) Prognostic factors modifying the treatment-free interval in recurrent ovarian cancer. Gynecol Oncol 139:228-35
Xu, Yaoyao; Yu, Menggang; Zhao, Ying-Qi et al. (2015) Regularized outcome weighted subgroup identification for differential treatment effects. Biometrics 71:645-53
Xiong, Lie; Kuan, Pei-Fen; Tian, Jianan et al. (2015) Multivariate Boosting for Integrative Analysis of High-Dimensional Cancer Genomic Data. Cancer Inform 13:123-31