Incorporating Local Haplotype Sharing to Detect Genetic Associations

Guan, Yongtao

Abstract

This proposal aims to develop statistical models and computational methods that quantify the degree of local haplotype sharing between two individuals at an arbitrary marker. It also aims to provide an in-depth understanding of how haplotypes affect disease phenotypes directly, how they serve as a genetic background on which polymorphic sites affect phenotypes differentially, and how haplotype background serves as a medium for rare variants to aggregate and affect phenotypes. An investigation into these problems will provide insights into the etiology of complex traits and computational tools for disease association mapping, a strategic goal in which NIH has invested heavily, and will shed light on the lingering puzzle of the missing heritability. Our haplotype method reinvents haplotype association mapping and provides several benefits: no phasing requirement, no sliding-window requirement, the ability to work directly with next-generation sequencing data, and enhanced interpretability of association findings. Because each SNP serves as a core SNP for its local haplotypes, our haplotype method performs the same number of tests as single-SNP analysis.

Detecting genetic associations while accounting for haplotype backgrounds at each marker will shift the paradigm for large-scale genetic association studies. The single-marker test assumes that an allele has the same effect, independent of its haplotype background. Our fundamental assumption is that, depending on its local haplotype background, an allele can have a positive effect, zero effect, or a negative effect on a phenotype (for example, due to local epistatic interactions). When all individuals share the same local haplotype background, our assumption reduces to the conventional assumption of homogeneous effect; when individuals have different local haplotype backgrounds, our assumption generates more power. For example, when an allele has a large effect on a particular haplotype background and zero effect otherwise, traditional analysis, which ignores the haplotype background, can fail to detect the association because the signal is diluted by individuals with other haplotype backgrounds. On the other hand, if correctly quantified, haplotype background can control and reduce the noise introduced by those individuals.

Aggregating rare variants within an LD block makes the aggregation approach applicable to whole-genome sequencing data. Current methods aggregate rare variants based on gene annotation and are difficult to extend to whole-genome sequencing data. Our method can quantify LD blocks, allowing rare variants to be aggregated within an LD block. This not only avoids arbitrariness in aggregating variants but also helps in interpreting associations. In addition, current methods aggregate rare variants while ignoring the variants' haplotype background, which inevitably loses power. An extreme example is the analysis of sequencing data from admixed samples, where ignoring the haplotype background is equivalent to not controlling for local ancestry. Thus, we propose methods to aggregate rare variants according to their haplotype background.
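The dilution argument in the abstract can be illustrated with a small simulation. The sketch below is a toy model, not the proposed method: it assumes a single binary allele, a single binary "causal background" indicator drawn independently of the allele, and a unit-variance Gaussian phenotype. All names and parameter values (`simulate`, `ols_slope`, `allele_freq`, `background_freq`, `beta`) are hypothetical and chosen only for illustration.

```python
import random
import statistics


def ols_slope(x, y):
    """Least-squares slope of y regressed on x (with intercept)."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def simulate(n=20000, allele_freq=0.3, background_freq=0.2, beta=1.0, seed=1):
    """Toy data: the allele shifts the phenotype by beta only on background 1.

    Assumption: allele and background are independent binary draws.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        allele = 1 if rng.random() < allele_freq else 0
        background = 1 if rng.random() < background_freq else 0
        phenotype = beta * allele * background + rng.gauss(0.0, 1.0)
        rows.append((allele, background, phenotype))
    return rows


rows = simulate()

# Single-marker view: regress phenotype on allele, ignoring the background.
marginal = ols_slope([a for a, _, _ in rows], [y for _, _, y in rows])

# Background-aware view: same regression restricted to the causal background.
on_bg = [(a, y) for a, b, y in rows if b == 1]
conditional = ols_slope([a for a, _ in on_bg], [y for _, y in on_bg])

print(f"marginal slope:    {marginal:.2f}")
print(f"conditional slope: {conditional:.2f}")
```

Under these assumptions the marginal slope is, in expectation, `beta * background_freq` (0.2 with the defaults), while the slope within the causal background is `beta` itself (1.0). The single-marker estimate is therefore shrunk by the fraction of individuals carrying the relevant background, which is the dilution the abstract describes.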

Public Health Relevance

This project provides novel statistical methods and computational tools for analyzing existing SNP array data sets and upcoming exome and whole-genome sequencing data sets, increasing the number of association findings and adding value to current and future investments. Incorporating local haplotype sharing to detect genetic associations has the potential to detect novel associations, regions that harbor allelic heterogeneity, and associations that have large conditional effect sizes. Thus, our methods are extremely valuable for understanding disease etiology and pinpointing causal variants. Together, these will have a profound impact on our ability to produce better treatments, better prevention, and improved healthcare.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Research Project (R01)
Project #: 7R01HG008157-05
Application #: 9793542
Study Section: Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer: Brooks, Lisa

Project Start: 2018-08-01
Project End: 2019-02-28
Budget Start: 2018-08-01
Budget End: 2019-02-28
Support Year: 5
Fiscal Year: 2018

Institution

Name: Duke University
DUNS #: 044387793

City: Durham
State: NC
Country: United States
Zip Code: 27705

Related projects


NIH 2018 R01 HG	Incorporating Local Haplotype Sharing to Detect Genetic Associations Guan, Yongtao / Baylor College of Medicine
NIH 2018 R01 HG	Incorporating Local Haplotype Sharing to Detect Genetic Associations Guan, Yongtao / Duke University
NIH 2017 R01 HG	Incorporating Local Haplotype Sharing to Detect Genetic Associations Guan, Yongtao / Baylor College of Medicine	$336,115
NIH 2016 R01 HG	Incorporating Local Haplotype Sharing to Detect Genetic Associations Guan, Yongtao / Baylor College of Medicine	$372,949
NIH 2015 R01 HG	Incorporating Local Haplotype Sharing to Detect Genetic Associations Guan, Yongtao / Baylor College of Medicine

Publications

Zhou, Quan; Guan, Yongtao (2018) On the Null Distribution of Bayes Factors in Linear Regression. J Am Stat Assoc 113:1362-1371

Zhou, Quan; Zhao, Liang; Guan, Yongtao (2016) Strong Selection at MHC in Mexicans since Admixture. PLoS Genet 12:e1005847

Qi, Hongjian; Dong, Chengliang; Chung, Wendy K et al. (2016) Deep Genetic Connection Between Cancer and Developmental Disorders. Hum Mutat 37:1042-50
