Statistical Methods for Next-Generation Sequence Data

Lee, Hongzhe

Abstract

The broad, long-term objective of this project is to develop novel statistical methods and computational tools for the statistical and probabilistic modeling of large-scale next-generation sequence (NGS) data, motivated by important biological questions and experiments.
The specific aim of the current project is to develop new statistical models and computational methods for the analysis of NGS data, focusing on: robust methods for discovering copy number variants (CNVs) in germline DNA; a general log-linear model for identifying alternative exon usage in one- and multi-sample RNA-seq data that allows for non-uniform short-read sequencing rates; novel nonparametric statistical methods for identifying histone modification sites from chromatin immunoprecipitation and high-throughput sequencing (ChIP-seq) data; and novel methods for the analysis of metagenomic data from human microbiome studies. These problems are all motivated by the PI's close collaborations with Penn investigators. The methods hinge on a novel integration of biological insights with methods for high-dimensional data analysis, including detection and identification of sparse structured signals, wavelet-based nonparametric regression and nonparametric hypothesis testing, and penalized regression analysis for tree-structured covariates. The new methods can be applied to different types of NGS data and will ideally facilitate the identification of genes and biological pathways underlying various complex human diseases and biological processes. The project will also investigate the robustness, power, and efficiency of these methods and compare them with existing methods. In addition, the project will develop practical computer programs that implement the proposed methods and will evaluate their performance through applications to NGS data sets related to CNV and RNA-seq analysis in African populations, the link between peroxisome proliferator-activated receptor gamma (PPARγ) and adipose differentiation and insulin resistance, and the effects of diet on the human microbiome.
The work proposed here will contribute statistical methodology for modeling ultra-high-dimensional next-generation sequence data and for studying complex phenotypes and biological systems, and will offer insights into each of the biological areas represented by the various data sets. All programs developed under this grant, along with detailed documentation, will be made available free of charge to interested researchers.

Public Health Relevance

This project aims to develop powerful statistical and computational methods for the analysis of next-generation sequence data, which have enabled comprehensive analysis of genomes, transcriptomes, interactomes, and microbiomes. The novel statistical methods are expected to yield more insight into how genomic and metagenomic variations can lead to the development of complex phenotypes, such as cardiovascular phenotypes and insulin resistance, and a better understanding of genetic structural variation in African populations.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Institute of General Medical Sciences (NIGMS)
Type: Research Project (R01)
Project #: 1R01GM097505-01A1
Application #: 8237259
Study Section: Special Emphasis Panel (ZRG1-HDM-S (02))
Program Officer: Brazhnik, Paul

Project Start: 2012-07-01
Project End: 2016-03-31
Budget Start: 2012-07-01
Budget End: 2013-03-31
Support Year: 1
Fiscal Year: 2012
Total Cost: $304,250
Indirect Cost: $109,250

Institution

Name: University of Pennsylvania
Department: Biostatistics & Other Math Sci
Type: Schools of Medicine
DUNS #: 042250712

City: Philadelphia
State: PA
Country: United States
Zip Code: 19104

Related projects


NIH 2015 R01 GM	Statistical Methods for Next-Generation Sequence Data Lee, Hongzhe / University of Pennsylvania	$303,282
NIH 2015 R01 GM	Statistical Methods for Next-Generation Sequence Data Lee, Hongzhe / University of Pennsylvania
NIH 2014 R01 GM	Statistical Methods for Next-Generation Sequence Data Lee, Hongzhe / University of Pennsylvania	$303,617
NIH 2013 R01 GM	Statistical Methods for Next-Generation Sequence Data Lee, Hongzhe / University of Pennsylvania	$293,302
NIH 2012 R01 GM	Statistical Methods for Next-Generation Sequence Data Lee, Hongzhe / University of Pennsylvania	$304,250

Publications

Sohn, Michael B; Li, Hongzhe (2018) A GLM-based latent variable ordination method for microbiome samples. Biometrics 74:448-457

Chen, Eric Z; Bushman, Frederic D; Li, Hongzhe (2017) A Model-Based Approach For Species Abundance Quantification Based On Shotgun Metagenomic Data. Stat Biosci 9:13-27

Shi, Pixu; Li, Hongzhe (2017) A model for paired-multinomial data and its application to analysis of data on a taxonomic tree. Biometrics 73:1266-1278

Zhao, Sihai Dave; Cai, T Tony; Cappola, Thomas P et al. (2017) Sparse simultaneous signal detection for identifying genetically controlled disease genes. J Am Stat Assoc 112:1032-1046

Zhao, Sihai Dave; Cai, T Tony; Li, Hongzhe (2017) Optimal detection of weak positive latent dependence between two sequences of multiple tests. J Multivar Anal 160:169-184

Chen, Eric Z; Li, Hongzhe (2016) A two-part mixed-effects model for analyzing longitudinal microbiome compositional data. Bioinformatics 32:2611-2617

Cai, T Tony; Li, Hongzhe; Liu, Weidong et al. (2016) Joint Estimation of Multiple High-dimensional Precision Matrices. Stat Sin 26:445-464

Zhao, Ni; Chen, Jun; Carroll, Ian M et al. (2015) Testing in Microbiome-Profiling Studies with MiRKAT, the Microbiome Regression-Based Kernel Association Test. Am J Hum Genet 96:797-807

Kelly, Brendan J; Gross, Robert; Bittinger, Kyle et al. (2015) Power and sample-size estimation for microbiome studies using pairwise distances and PERMANOVA. Bioinformatics 31:2461-2468

Jia, Cheng; Hu, Yu; Liu, Yichuan et al. (2015) Mapping Splicing Quantitative Trait Loci in RNA-Seq. Cancer Inform 14:45-53

Showing the most recent 10 out of 35 publications
