The long-term objective of this project is to develop powerful and computationally efficient statistical methods for identifying genes underlying complex genetic diseases in humans.
The specific aim of this project is to continue developing survival models that incorporate age-of-onset data, environmental covariates, gene-environment interactions, and multiple disease loci into family-based association analysis, joint linkage and linkage disequilibrium analysis, and multipoint multi-trait-locus linkage analysis of complex human diseases. The proposed methods build on our current methods and hinge on a novel integration of multivariate survival analysis with methods in modern human genetics. The focus will be on developing survival models for: (1) incorporating age of onset and environmental risk factors into genetic association studies using a linkage disequilibrium-based Cox model for family data of any size; (2) joint analysis of linkage and linkage disequilibrium for age-of-onset data from nuclear families; and (3) multipoint multi-trait-locus linkage tests that incorporate age of onset and environmental covariates using the additive genetic frailty model. The project will also investigate the power and efficiency of these methods and compare them with existing methods. In addition, the project will develop practical computer programs that implement the proposed methods and will evaluate their performance through extensive simulations and application to real data on HLA-associated diseases, including type 1 diabetes, rheumatoid arthritis, celiac disease, narcolepsy, and ankylosing spondylitis. The proposed work will contribute statistical methodology both to gene mapping for complex diseases and to multivariate survival analysis, offer insight into each of the clinical areas represented by the data sets used to evaluate these new methods, and facilitate the eventual identification of genes involved in these complex diseases.
All programs developed under this grant, along with detailed documentation, will be made freely available to interested researchers via the World Wide Web.

National Institutes of Health (NIH)
National Institute of Environmental Health Sciences (NIEHS)
Research Project (R01)
Study Section
Genome Study Section (GNM)
Program Officer
Mcallister, Kimberly A
University of Pennsylvania
Biostatistics & Other Math Sci
Schools of Medicine
United States
Yin, Jianxin; Li, Hongzhe (2013) Adjusting for High-dimensional Covariates in Sparse Precision Matrix Estimation by ℓ1-Penalization. J Multivar Anal 116:365-381
Vardhanabhuti, Saran; Li, Mingyao; Li, Hongzhe (2013) A Hierarchical Bayesian Model for Estimating and Inferring Differential Isoform Expression for Multi-Sample RNA-Seq Data. Stat Biosci 5:119-137
Yin, Jianxin; Li, Hongzhe (2012) Model Selection and Estimation in the Matrix Normal Graphical Model. J Multivar Anal 107:119-140
Kalli, Anastasia; Hess, Sonja (2012) Effect of mass spectrometric parameters on peptide and protein identification rates for shotgun proteomic experiments on an LTQ-orbitrap mass analyzer. Proteomics 12:21-31
Daye, Z John; Xie, Jichun; Li, Hongzhe (2012) A Sparse Structured Shrinkage Estimator for Nonparametric Varying-Coefficient Model with an Application in Genomics. J Comput Graph Stat 21:110-133
Daye, Z John; Li, Hongzhe; Wei, Zhi (2012) A powerful test for multiple rare variants association studies that incorporates sequencing qualities. Nucleic Acids Res 40:e60
Sun, Hokeun; Li, Hongzhe (2012) Robust Gaussian graphical modeling via l1 penalization. Biometrics 68:1197-206
Cai, T Tony; Jeng, X Jessie; Li, Hongzhe (2012) Robust Detection and Identification of Sparse Segments in Ultra-High Dimensional Data Analysis. J R Stat Soc Series B Stat Methodol 74:773-797
Nguyen, Le B; Diskin, Sharon J; Capasso, Mario et al. (2011) Phenotype restricted genome-wide association study using a gene-centric approach identifies three low-risk neuroblastoma susceptibility Loci. PLoS Genet 7:e1002026
Xie, Jichun; Cai, T Tony; Li, Hongzhe (2011) Sample size and power analysis for sparse signal recovery in genome-wide association studies. Biometrika 98:273-290

Showing the most recent 10 out of 52 publications