Statistical analysis of large genomic data sets

Zhu, Xiaofeng

Abstract

Heritability analysis in the largest whole genome sequence (WGS) dataset, the NHLBI Trans-omics for Precision Medicine Whole Genome Sequencing Program (TOPMed), strongly suggested that "missing heritability" can be attributed to rare variants that are not well captured by array-based genotyping. Large genome-wide association studies (GWAS), complemented by WGS studies, will be a cost-efficient strategy to identify genetic variants and understand the genetic architecture of complex traits. Multiple large biobanks with SNP-array and whole genome sequencing data, such as TOPMed, provide an unprecedented but challenging opportunity to understand the genetic mechanisms underlying complex diseases. We have identified three pressing challenges in utilizing large GWAS and WGS datasets and propose the following four specific aims to meet them: 1) Differentiate horizontal pleiotropy from mediation using GWAS summary statistics and apply the methods to publicly available data. 2) Prioritize genetic variants sensitive to interactions, and estimate the overall contribution of interactions to a phenotype. 3) Incorporate family linkage/local ancestry to identify genetic variants in the TOPMed whole genome sequencing data. 4) Develop corresponding software that will be made publicly available. We will apply our new analytic methods to TOPMed WGS data, UK Biobank data and many existing GWAS summary statistics. Our data analysis will focus on blood pressure, obesity and sleep disorders, and their effects on disease outcomes such as cardiovascular disease, diabetes, heart failure and dementia.

Public Health Relevance

A large amount of genetic data on complex traits, such as blood pressure, obesity and sleep disorders, has been accumulated. However, the genetic architecture of these traits is still poorly understood, and the knowledge gained has had limited use in clinical application. We propose to develop novel statistical methods and software tools for four purposes: analyzing multiple correlated traits using available summary statistics to address the causal relationships among such traits; prioritizing genetic variants sensitive to environmental interaction effects; estimating the overall contribution of interactions to a particular trait; and detecting causal rare genetic variants from whole genome sequencing data. We will apply the new methods to large available genome-wide summary statistics, the UK Biobank data and the whole genome sequencing data from the NHLBI Trans-omics for Precision Medicine (TOPMed) program. We will focus on genes that predispose to blood pressure, obesity and sleep disorders, and their effects on other disease outcomes, including cardiovascular disease, diabetes, heart failure and dementia.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Research Project (R01)
Project #: 5R01HG011052-02
Application #: 10161804
Study Section: Biodata Management and Analysis Study Section (BDMA)
Program Officer: Sofia, Heidi J

Project Start: 2020-05-08
Project End: 2024-02-29
Budget Start: 2021-03-01
Budget End: 2022-02-28
Support Year: 2
Fiscal Year: 2021

Institution

Name: Case Western Reserve University
Department: Public Health & Prev Medicine
Type: Schools of Medicine
DUNS #: 077758407

City: Cleveland
State: OH
Country: United States
Zip Code: 44106

Related projects


NIH 2021 R01 HG	Statistical analysis of large genomic data sets Zhu, Xiaofeng / Case Western Reserve University
NIH 2020 R01 HG	Statistical analysis of large genomic data sets Zhu, Xiaofeng / Case Western Reserve University
