Methods for Integrative Genomic Data Analysis

Lee, Hongzhe

Abstract

The broad, long-term objective of this project is to develop novel statistical methods, theory, and computational tools for modeling large-scale, high-dimensional genomic data of multiple types, motivated by important biological questions and experiments. New high-throughput technologies and next-generation sequencing are generating various types of very high-dimensional genetic, genomic, epigenomic, and metabolomic data, with the goal of an integrative understanding of complex phenotypes. As the types and complexity of the data increase and the questions being addressed become more sophisticated, valid statistical and biological inference requires methods that can both integrate these genomic data and incorporate information about gene function and pathways.
The specific aims of the current project are to develop new statistical models and methods for causal integrative analysis of eQTL data with genome-wide association study (GWAS) data in order to identify possible causal genes and pathways for disease phenotypes. Motivated by the analysis of diverse genomic data, Aim 1 is to develop novel causal mediation analysis methods that identify the genes mediating the effects of genetic variants on disease phenotypes by constructing gene regulatory networks from eQTL data.
Aim 2 is to develop high-dimensional instrumental variables (HDIV) regression models in order to identify the phenotype-causing genes using eQTLs as possible instrumental variables.
Aim 3 is to develop methods for estimating the genetic relatedness between disease phenotypes and gene expression in order to identify possible disease-causing genes and biological pathways. Finally, Aim 4 is to develop statistical methods that can effectively integrate GTEx data with GWAS summary statistics in order to identify possible causal disease genes and pathways. These methods rest on a novel integration of techniques for multiple related high-dimensional regressions and high-dimensional causal inference. The new methods can be applied to different types of genomic data and are intended to facilitate the identification of genes, their complex interactions, and the biological pathways underlying various complex human diseases. The work proposed here will contribute statistical methodology and theory for modeling high-dimensional genomic data and for studying complex phenotypes and biological systems, and will offer insights into each of the biological areas represented by the various data sets, including Alzheimer's disease, cardiometabolic syndrome, and chronic kidney disease. All algorithms and software tools developed under this grant, together with detailed documentation, will be made available free of charge to interested researchers.
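
As a point of reference for the instrumental-variable idea behind Aim 2, the sketch below runs classical two-stage least squares on simulated data. It is not the proposed HDIV method: it assumes a single gene, a handful of valid instruments, and no high-dimensional penalization. All names, sample sizes, and effect sizes (`two_stage_least_squares`, `ols_slope`, `true_effect`, and so on) are hypothetical choices made for illustration.

```python
import numpy as np

def ols_slope(x, y):
    """Slope from an ordinary least-squares fit of y on x with an intercept."""
    design = np.column_stack([np.ones(len(x)), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]

def two_stage_least_squares(Z, x, y):
    """Estimate the effect of exposure x on outcome y using instruments Z."""
    Z1 = np.column_stack([np.ones(len(x)), Z])
    # Stage 1: project the exposure (expression) onto the instruments (eQTLs).
    gamma, *_ = np.linalg.lstsq(Z1, x, rcond=None)
    x_hat = Z1 @ gamma
    # Stage 2: regress the outcome (phenotype) on the fitted exposure.
    return ols_slope(x_hat, y)

# Simulated data (all values are illustrative assumptions).
rng = np.random.default_rng(0)
n, p = 20000, 5
Z = rng.binomial(2, 0.3, size=(n, p)).astype(float)  # genotype dosages 0/1/2
u = rng.normal(size=n)                               # unobserved confounder
x = Z @ np.full(p, 0.5) + u + rng.normal(size=n)     # gene expression
true_effect = 1.0
y = true_effect * x + 2.0 * u + rng.normal(size=n)   # phenotype

beta_ols = ols_slope(x, y)                    # biased by the confounder u
beta_iv = two_stage_least_squares(Z, x, y)    # uses only variation driven by Z
print(f"OLS: {beta_ols:.2f}  2SLS: {beta_iv:.2f}  truth: {true_effect}")
```

Because the simulated confounder affects both expression and phenotype, the plain regression overstates the effect, while the two-stage estimate uses only the genotype-driven variation in expression. The high-dimensional setting described in the aim, with many genes and many candidate instruments, would need regularization and validity checks that this sketch omits.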

Public Health Relevance

This project aims to develop powerful statistical and computational methods for integrative analysis of diverse genomic data. The novel statistical methods are expected to yield more insight into how genomic perturbation and pathway dysfunction can lead to the development of complex diseases such as Alzheimer's disease, cardiometabolic syndrome, and chronic kidney disease.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Institute of General Medical Sciences (NIGMS)
Type: Research Project (R01)
Project #: 5R01GM129781-02
Application #: 9752369
Study Section: Biostatistical Methods and Research Design Study Section (BMRD)
Program Officer: Brazhnik, Paul

Project Start: 2018-09-01
Project End: 2022-06-30
Budget Start: 2019-07-01
Budget End: 2020-06-30
Support Year: 2
Fiscal Year: 2019

Institution

Name: University of Pennsylvania
Department: Biostatistics & Other Math Sci
Type: Schools of Medicine
DUNS #: 042250712

City: Philadelphia
State: PA
Country: United States
Zip Code: 19104

Related projects


NIH 2020 R01 GM	Methods for Integrative Genomic Data Analysis Lee, Hongzhe / University of Pennsylvania
NIH 2019 R01 GM	Methods for Integrative Genomic Data Analysis Lee, Hongzhe / University of Pennsylvania
NIH 2018 R01 GM	Methods for Integrative Genomic Data Analysis Lee, Hongzhe / University of Pennsylvania

Publications

Gao, Yuan; Li, Hongzhe (2018) Quantifying and comparing bacterial growth dynamics in multiple metagenomic samples. Nat Methods 15:1041-1044
