Vast amounts of whole genome sequence and imputed sequence data are being generated for many complex traits and diseases. Most studies, e.g., UK10K and the National Heart, Lung, and Blood Institute Exome Sequencing Project, have concentrated on detecting main effects. Pleiotropy, although an important phenomenon in genetic etiology, has not been adequately studied, and few methods exist to detect pleiotropy for rare and imputed variants. Additionally, although pleiotropic loci have been reported, it has been difficult to elucidate whether these effects underlie disease etiology or are false positives. We will tackle this problem using a multi-pronged approach that combines pleiotropic association testing, estimation of tissue-specific disease heritability, and detection of tissue-specific pleiotropy. To meet the goals of this study we will use omics data, implement previously developed methods, and extend existing methods to analyze imputed and rare variants. To enable discoveries for a wide variety of complex diseases and traits, e.g., asthma, type 2 diabetes, adiposity, and lipids, and to demonstrate that these methods are an effective approach to studying pleiotropy, data from the UK Biobank (500,000 study subjects) will be analyzed. A split-sample design will be employed in which 350,000 subjects (Release 2) are used for Discovery and 150,000 subjects (Release 1) for Replication. Secondary replication and fine mapping will be performed using TOPMed data, which will include >150,000 individuals with whole genome sequence data, of whom 26% are African-American, 10% Hispanic, and 7% Asian. All methods will be implemented in our SEQSpark software, which uses parallel processing to make it feasible to analyze hundreds of thousands of samples efficiently.
Not only is this study expected to improve our understanding of the genetic etiology of complex diseases and traits, but it also has high public health significance: understanding pleiotropic effects will improve our ability to estimate genetic risk and, because of shared genetic architecture, provide insight into drug targets for developing treatments for multiple diseases. The framework and software developed in this proposal will be made available to the scientific community to apply to other large datasets for the identification of pleiotropic loci beyond the phenotypes described here.

Public Health Relevance

Using omics data, we will elucidate pleiotropic variants that play a role in a number of common traits and diseases of high public health significance, including asthma, adiposity, type 2 diabetes, and blood lipid profiles. We will accomplish these goals using a large population sample of 500,000 individuals with genome-wide imputed sequence data, >150,000 subjects with whole genome sequence (WGS) data, and ~950 subjects with both WGS data and gene expression data from 53 tissues. We will use statistical methods to detect pleiotropic effects and perform biological validation to better understand the role pleiotropy plays in complex disease etiology.

National Institutes of Health (NIH)
National Heart, Lung, and Blood Institute (NHLBI)
Research Project (R01)
Study Section
Infectious Diseases, Reproductive Health, Asthma and Pulmonary Conditions Study Section (IRAP)
Program Officer
Gan, Weiniu
Yale University
Public Health & Prev Medicine
Schools of Medicine
New Haven
United States