In response to PA-17-088, "Secondary Analyses of Existing Cohorts, Data Sets and Stored Biospecimens to Address Clinical Aging Research Questions (R01)", we propose integrating existing GWAS summary data of Alzheimer's disease (AD) with existing proteomic and metabolomic quantitative trait locus (pQTL/mQTL) data to identify proteins and metabolites putatively causal to AD. The overarching goal is to both boost statistical power and enhance interpretability for causal inference in the post-GWAS era by leveraging many published large-scale GWAS summary association datasets and omic data. An emerging and increasingly influential approach, the transcriptome-wide association study (TWAS), integrates GWAS summary data with gene expression (or eQTL) data. It aims to improve on current GWAS practice by not only increasing statistical power to identify more genetic variants associated with GWAS traits, but also linking the (non-coding) genetic variants to their target genes, thus yielding insights into the genetic basis of common diseases and complex traits. In practice, however, TWAS may fail to identify true causal genes while giving false positives when its modeling assumptions are violated (e.g. owing to linkage disequilibrium (LD) or horizontal pleiotropy of SNPs). We first propose three new methods to check for possible violations of modeling assumptions in TWAS, then propose two approaches that are more robust and powerful than standard TWAS. Next, we extend TWAS to xWAS to integrate GWAS with proteomic and metabolomic traits (i.e. pQTL and mQTL) and so identify (putatively) causal proteins and metabolites, analogous to detecting causal genes/transcripts in TWAS.
We apply the new (and existing) methods to integrate large-scale GWAS summary data of AD and atrial fibrillation (AF) with pQTL and mQTL data to identify putatively causal proteins and metabolites for AD and AF respectively, and to investigate whether AF is causal to AD, thus not only advancing our understanding of the etiology of AD and AF, but also possibly offering modifiable targets for interventions on the two devastating diseases. Finally, we will develop and disseminate publicly available software implementing the proposed analysis methods, e.g. as R packages, to facilitate their wide use by the scientific community.
This proposed research is expected to not only advance statistical analysis for causal inference to identify novel and putatively causal proteins and metabolites for Alzheimer's disease (AD) and atrial fibrillation (AF), but also contribute valuable computational tools to the elucidation of genetic components of common diseases, thus facilitating their prevention, early diagnosis and therapeutic development.