Methods for genomic data with graphical structures

Li, Hongzhe

Abstract

The broad, long-term objective of this project is to develop novel statistical methods and computational tools for statistical and probabilistic modeling of large-scale genomic data of multiple types, motivated by important biological questions and experiments. New high-throughput technologies and next-generation sequencing are generating various types of very high-dimensional genomic and proteomic data and metadata (e.g., network and pathway databases), with the goal of obtaining a systems-level understanding of various complex phenotypes. As the amount and complexity of the data increase and the questions being addressed become more sophisticated, valid statistical and biological inference requires analysis methods that can integrate these genomic data while incorporating information about gene function and pathways into the analysis of numerical vector/matrix data.
The specific aims of the current project are to develop new statistical models and methods for integrative analysis of genomic data in the context of pathways and networks. Motivated by the analysis of genetic genomics data and diverse cancer genomic data, the first aim is to develop novel statistical methods for estimating a genotype-adjusted precision matrix for a set of genes at the transcriptional level. The resulting regression coefficient matrix and sparse precision matrix provide important information on gene regulation after the cis- and trans-genetic effects on gene expression are adjusted for.
The second aim is to develop high-dimensional instrumental variable regression for eQTL data analysis in order to identify potential causal genes for a phenotype, with genome-wide genotypes serving as instrumental variables.
Aims 3 and 4 propose a set of new methods for gene set enrichment analysis, including methods for gene-set analysis by testing homogeneity of covariance matrices and a class of multivariate random-set methods for integrative analysis of diverse genomic data. These methods hinge on novel integration of methods for high-dimensional regression and high-dimensional covariance matrix estimation, and on novel incorporation of prior functional gene sets and pathways. The new methods can be applied to different types of genomic data and should help identify genes, their complex interactions, and the biological pathways underlying various complex human diseases. The work proposed here will contribute statistical methodology for modeling high-dimensional genomic data and for studying complex phenotypes and biological systems, and will offer insights into each of the biological areas represented by the various data sets. All programs developed under this grant, together with detailed documentation, will be made available free of charge to interested researchers.

Public Health Relevance

This project aims to develop powerful statistical and computational methods for integrative analysis of diverse genomic data. The novel statistical methods are expected to yield more insight into how genomic perturbation and pathway dysfunction can lead to the development of complex diseases such as neuroblastoma and human heart failure.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Cancer Institute (NCI)
Type: Research Project (R01)
Project #: 5R01CA127334-08
Application #: 8851994
Study Section: Special Emphasis Panel (ZRG1)
Program Officer: Li, Jerry

Project Start: 2007-07-01
Project End: 2016-06-30
Budget Start: 2015-07-01
Budget End: 2016-06-30
Support Year: 8
Fiscal Year: 2015
Total Cost
Indirect Cost

Institution

Name: University of Pennsylvania
Department: Biostatistics & Other Math Sci
Type: Schools of Medicine
DUNS #: 042250712

City: Philadelphia
State: PA
Country: United States
Zip Code: 19104

Related projects


NIH 2016 R01 CA	Methods for genomic data with graphical structures Li, Hongzhe / University of Pennsylvania	$293,687
NIH 2015 R01 CA	Methods for genomic data with graphical structures Li, Hongzhe / University of Pennsylvania
NIH 2014 R01 CA	Methods for genomic data with graphical structures Li, Hongzhe / University of Pennsylvania
NIH 2013 R01 CA	Methods for genomic data with graphical structures Li, Hongzhe / University of Pennsylvania	$281,354
NIH 2012 R01 CA	Methods for genomic data with graphical structures Li, Hongzhe / University of Pennsylvania	$304,000
NIH 2010 R01 CA	Methods for genomic data with graphical structures Li, Hongzhe / University of Pennsylvania	$289,814
NIH 2009 R01 CA	Methods for genomic data with graphical structures Li, Hongzhe / University of Pennsylvania	$290,671
NIH 2008 R01 CA	Methods for genomic data with graphical structures Li, Hongzhe / University of Pennsylvania	$291,451
NIH 2007 R01 CA	Methods for genomic data with graphical structures Li, Hongzhe / University of Pennsylvania	$292,160

Publications

Vajravelu, Ravy K; Scott, Frank I; Mamtani, Ronac et al. (2018) Medication class enrichment analysis: a novel algorithm to analyze multiple pharmacologic exposures simultaneously using electronic health record data. J Am Med Inform Assoc 25:780-789

Xia, Yin; Cai, Tianxi; Cai, T Tony (2018) Multiple Testing of Submatrices of a Precision Matrix with Applications to Identification of Between Pathway Interactions. J Am Stat Assoc 113:328-339

B Sohn, Michael; Li, Hongzhe (2018) A GLM-based latent variable ordination method for microbiome samples. Biometrics 74:448-457

Chen, Eric Z; Bushman, Frederic D; Li, Hongzhe (2017) A Model-Based Approach For Species Abundance Quantification Based On Shotgun Metagenomic Data. Stat Biosci 9:13-27

Shi, Pixu; Li, Hongzhe (2017) A model for paired-multinomial data and its application to analysis of data on a taxonomic tree. Biometrics 73:1266-1278

Zhao, Sihai Dave; Cai, T Tony; Cappola, Thomas P et al. (2017) Sparse simultaneous signal detection for identifying genetically controlled disease genes. J Am Stat Assoc 112:1032-1046

Liao, Katherine P; Sparks, Jeffrey A; Hejblum, Boris P et al. (2017) Phenome-Wide Association Study of Autoantibodies to Citrullinated and Noncitrullinated Epitopes in Rheumatoid Arthritis. Arthritis Rheumatol 69:742-749

Zhao, Sihai Dave; Cai, T Tony; Li, Hongzhe (2017) Optimal detection of weak positive latent dependence between two sequences of multiple tests. J Multivar Anal 160:169-184

Chen, Eric Z; Li, Hongzhe (2016) A two-part mixed-effects model for analyzing longitudinal microbiome compositional data. Bioinformatics 32:2611-7

Cai, T Tony; Li, Hongzhe; Liu, Weidong et al. (2016) Joint Estimation of Multiple High-dimensional Precision Matrices. Stat Sin 26:445-464

Showing the most recent 10 out of 63 publications
