The broad, long-term objective of this project concerns the development of novel statistical methods and computational tools for statistical and probabilistic modeling of genomic data motivated by important biological questions and experiments.
The specific aim of the current project is to develop new statistical models and methods for the analysis of genomic data with graphical structures, focusing on methods for analyzing genetic pathways and networks. These include nonparametric pathway-smooth tests for two-sample and analysis of variance problems, which identify pathways with perturbed activity between two or more experimental conditions, and group Lasso and group threshold gradient descent regularized estimation procedures for pathway-smoothed generalized linear models, Cox proportional hazards models and accelerated failure time models, which identify pathways related to various clinical phenotypes. These methods hinge on a novel integration of spectral graph theory, nonparametric methods for the analysis of multivariate data and regularized estimation methods from statistical learning. The new methods can be applied to different types of genomic data and are intended to facilitate the identification of genes and biological pathways underlying various complex human diseases and complex biological processes. The project will also investigate the robustness, power and efficiency of these methods and compare them with existing methods. In addition, this project will develop practical and feasible computer programs to implement the proposed methods and will evaluate the performance of these methods through application to real data from microarray gene expression studies of human heart failure, cardiac allograft rejection and neuroblastoma. The proposed work will contribute statistical methodology for modeling genomic data with graphical structures, for studying complex phenotypes and biological systems, and for high-dimensional data analysis, and will offer insight into each of the clinical areas represented by the data sets used to evaluate these new methods.
All programs developed under this grant and detailed documentation will be made available free-of-charge to interested researchers via the World Wide Web.

Agency
National Institutes of Health (NIH)
Institute
National Cancer Institute (NCI)
Type
Research Project (R01)
Project #
5R01CA127334-03
Application #
7599555
Study Section
Biostatistical Methods and Research Design Study Section (BMRD)
Program Officer
Li, Jerry
Project Start
2007-07-01
Project End
2011-04-30
Budget Start
2009-05-01
Budget End
2010-04-30
Support Year
3
Fiscal Year
2009
Total Cost
$290,671
Indirect Cost
Name
University of Pennsylvania
Department
Biostatistics & Other Math Sci
Type
Schools of Medicine
DUNS #
042250712
City
Philadelphia
State
PA
Country
United States
Zip Code
19104
Vajravelu, Ravy K; Scott, Frank I; Mamtani, Ronac et al. (2018) Medication class enrichment analysis: a novel algorithm to analyze multiple pharmacologic exposures simultaneously using electronic health record data. J Am Med Inform Assoc 25:780-789
Xia, Yin; Cai, Tianxi; Cai, T Tony (2018) Multiple Testing of Submatrices of a Precision Matrix with Applications to Identification of Between Pathway Interactions. J Am Stat Assoc 113:328-339
Sohn, Michael B; Li, Hongzhe (2018) A GLM-based latent variable ordination method for microbiome samples. Biometrics 74:448-457
Chen, Eric Z; Bushman, Frederic D; Li, Hongzhe (2017) A Model-Based Approach For Species Abundance Quantification Based On Shotgun Metagenomic Data. Stat Biosci 9:13-27
Shi, Pixu; Li, Hongzhe (2017) A model for paired-multinomial data and its application to analysis of data on a taxonomic tree. Biometrics 73:1266-1278
Zhao, Sihai Dave; Cai, T Tony; Cappola, Thomas P et al. (2017) Sparse simultaneous signal detection for identifying genetically controlled disease genes. J Am Stat Assoc 112:1032-1046
Liao, Katherine P; Sparks, Jeffrey A; Hejblum, Boris P et al. (2017) Phenome-Wide Association Study of Autoantibodies to Citrullinated and Noncitrullinated Epitopes in Rheumatoid Arthritis. Arthritis Rheumatol 69:742-749
Zhao, Sihai Dave; Cai, T Tony; Li, Hongzhe (2017) Optimal detection of weak positive latent dependence between two sequences of multiple tests. J Multivar Anal 160:169-184
Cai, Tianxi; Cai, T Tony; Zhang, Anru (2016) Structured Matrix Completion with Applications to Genomic Data Integration. J Am Stat Assoc 111:621-633
Cai, T Tony; Liu, Weidong (2016) Large-Scale Multiple Testing of Correlations. J Am Stat Assoc 111:229-240

Showing the most recent 10 out of 63 publications