Project Description: Genomewide association studies (GWAS) have identified more than 4000 genetic loci for a wide range of human traits, yet a large proportion of heritability remains unexplained. In the post-GWAS era, geneticists are exploiting massively parallel sequencing technologies to study less common (minor allele frequency [MAF] 0.5–5%) and rare (MAF <0.5%) variants, hereafter referred to together as rare variants for brevity. Meanwhile, multiethnic GWAS, recognized as potentially more powerful for gene discovery and fine mapping, are receiving increasing attention from the genetics community. Among multiethnic populations, admixed populations such as African Americans and Hispanic Americans are particularly attractive because together they comprise more than 20% of the US population. These admixed populations offer a unique opportunity for gene mapping because admixture linkage disequilibrium (LD) can be used to search for genes underlying diseases whose prevalence differs strikingly across populations. However, little methodological work for admixed populations can accommodate post-GWAS data. Methods lag in at least three major areas. First, there are few, if any, genotype imputation methods that are tailored to admixed samples and that can accommodate both the ever-increasing public resources and the typical mix of genotyping and sequencing data among study samples. Imputation will continue to play an essential role because sequencing will remain cost-prohibitive for large GWAS sample collections. Second, there has been no published work on practical issues in rare variant imputation in admixed populations. Third, despite the recent rich literature on statistical methods for rare variant association analysis in relatively homogeneous populations, the field needs methods that can efficiently analyze rare variants in admixed samples, particularly with imputed or partially imputed data.
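The variant classes defined above depend only on the minor allele frequency. As a minimal sketch, assuming diploid genotypes coded as 0, 1, or 2 copies of the alternate allele, the classification can be written as follows. The function names are hypothetical, and counting a MAF of exactly 5% as less common (rather than common) is an assumed boundary convention, since the ranges above do not specify it.

```python
from typing import Sequence


def minor_allele_frequency(genotypes: Sequence[int]) -> float:
    """Return the MAF from diploid genotypes coded as 0, 1 or 2 alternate alleles."""
    if not genotypes:
        raise ValueError("at least one genotype is required")
    alt_freq = sum(genotypes) / (2 * len(genotypes))
    return min(alt_freq, 1.0 - alt_freq)


def frequency_class(maf: float) -> str:
    """Label a variant using the 0.5% and 5% MAF thresholds quoted above.

    Assumption: exactly 0.5% and exactly 5% both count as "less common".
    """
    if maf == 0.0:
        return "monomorphic"  # no minor allele observed in this sample
    if maf < 0.005:
        return "rare"
    if maf <= 0.05:
        return "less common"
    return "common"
```

For 200 individuals with a single heterozygous carrier, the MAF is 1/400 = 0.25% ("rare"); with three heterozygous carriers it is 3/400 = 0.75% ("less common"). Under the convention above, both would be analyzed together as rare variants.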
In this application, we propose the following aims to fill these gaps: (1) develop efficient hidden Markov model (HMM) and singular value decomposition (SVD) based methods for haplotype-to-haplotype imputation in admixed populations; (2) assess the quality of rare variant imputation in admixed populations and provide practical guidelines for it; (3) develop a robust statistical test for the analysis of rare variants in admixed populations; and (4) develop, distribute, and support freely available software packages for the methods developed in this project.
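Aim 1 builds on hidden Markov models for haplotype imputation. As a minimal sketch of that general idea only, the function below implements a generic Li and Stephens style haplotype-copying HMM for a single phased target haplotype: the hidden state is the reference haplotype being copied, a uniform switch rate stands in for recombination, and a fixed error rate models allele mismatches. This is not the method proposed in this project. It ignores local ancestry, unphased genotype data, and the SVD component, and the names `li_stephens_impute`, `switch_rate`, and `error_rate` are hypothetical.

```python
from typing import List, Optional, Sequence


def li_stephens_impute(
    reference: Sequence[Sequence[int]],
    target: Sequence[Optional[int]],
    switch_rate: float = 0.01,
    error_rate: float = 0.01,
) -> List[float]:
    """Impute allele dosages for one phased target haplotype (illustrative sketch).

    reference: K reference haplotypes, each a sequence of M alleles coded 0/1.
    target: M alleles coded 0/1, or None where the marker is untyped.
    Returns the posterior expected allele (a dosage in [0, 1]) at every marker.
    """
    k, m = len(reference), len(target)
    if k == 0 or m == 0 or any(len(h) != m for h in reference):
        raise ValueError("reference must be K haplotypes of the target's length")
    if not (0.0 <= switch_rate <= 1.0 and 0.0 < error_rate < 1.0):
        raise ValueError("switch_rate must be in [0, 1] and error_rate in (0, 1)")

    def emission(state: int, marker: int) -> float:
        # Untyped markers carry no information; typed markers match the copied
        # reference allele with probability 1 - error_rate.
        observed = target[marker]
        if observed is None:
            return 1.0
        return 1.0 - error_rate if reference[state][marker] == observed else error_rate

    # Forward pass. Each column is rescaled to sum to 1, so the probability of
    # arriving at a given state through a uniform switch is switch_rate / k.
    forward: List[List[float]] = []
    alpha = [emission(s, 0) / k for s in range(k)]
    for marker in range(m):
        if marker > 0:
            prev = forward[-1]
            alpha = [
                ((1.0 - switch_rate) * prev[s] + switch_rate / k) * emission(s, marker)
                for s in range(k)
            ]
        total = sum(alpha)
        forward.append([a / total for a in alpha])

    # Backward pass (all ones at the last marker), also rescaled per marker.
    backward: List[List[float]] = [[1.0] * k for _ in range(m)]
    for marker in range(m - 2, -1, -1):
        weighted = [emission(s, marker + 1) * backward[marker + 1][s] for s in range(k)]
        shared = switch_rate / k * sum(weighted)
        beta = [(1.0 - switch_rate) * w + shared for w in weighted]
        total = sum(beta)
        backward[marker] = [b / total for b in beta]

    # Posterior copying probabilities give the expected allele at each marker.
    dosages: List[float] = []
    for marker in range(m):
        posterior = [forward[marker][s] * backward[marker][s] for s in range(k)]
        norm = sum(posterior)
        dosage = sum(p * reference[s][marker] for s, p in enumerate(posterior))
        dosages.append(dosage / norm)
    return dosages
```

For example, with `reference = [[0, 0, 0, 0], [1, 1, 1, 1]]` and `target = [0, None, 0, 0]`, the dosage imputed at the untyped second marker is close to 0, because the flanking typed alleles all point to the first reference haplotype. Rescaling each column keeps the forward and backward values from underflowing on long haplotypes; because each scale factor is constant within a marker, it cancels when the posterior is normalized.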

Public Health Relevance

Genomewide association studies (GWAS) have identified more than 4000 genetic loci for a wide range of human traits, yet a large proportion of heritability remains unexplained. In the post-GWAS era, geneticists are exploiting massively parallel sequencing technologies to study less common (minor allele frequency [MAF] 0.5–5%) and rare (MAF <0.5%) variants, hereafter referred to together as rare variants for brevity. Meanwhile, multiethnic GWAS, recognized as potentially more powerful for gene discovery and fine mapping, are receiving increasing attention from the genetics community. Among multiethnic populations, admixed populations such as African Americans and Hispanic Americans are particularly attractive because together they comprise more than 20% of the US population. These admixed populations offer a unique opportunity for gene mapping because admixture linkage disequilibrium (LD) can be used to search for genes underlying diseases whose prevalence differs strikingly across populations. However, little methodological work for admixed populations can accommodate post-GWAS data. In this application, we will fill methodological and practical gaps in the genetic analysis of rare variants in admixed populations.

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Research Project (R01)
Project #: 5R01HG006703-03
Application #: 8634810
Study Section: Special Emphasis Panel (ZRG1-GGG-C (02))
Program Officer: Brooks, Lisa
Project Start: 2012-05-16
Project End: 2015-02-28
Budget Start: 2014-03-01
Budget End: 2015-02-28
Support Year: 3
Fiscal Year: 2014
Total Cost: $308,717
Indirect Cost: $85,855
Organization Name: University of North Carolina at Chapel Hill
Department: Genetics
Organization Type: Schools of Medicine
DUNS #: 608195277
City: Chapel Hill
State: NC
Country: United States
Zip Code: 27599
Publications

Duan, Qing; Xu, Zheng; Raffield, Laura M et al. (2018) A robust and powerful two-step testing procedure for local ancestry adjusted allelic association analysis in admixed populations. Genet Epidemiol 42:288-302
Luo, Yiwen; Maity, Arnab; Wu, Michael C et al. (2018) On the substructure controls in rare variant analysis: Principal components or variance components? Genet Epidemiol 42:276-287
Ju, Chelsea J-T; Zhao, Zhuangtian; Wang, Wei (2017) Efficient Approach to Correct Read Alignment for Pseudogene Abundance Estimates. IEEE/ACM Trans Comput Biol Bioinform 14:522-533
Hui, Daniel; Fang, Zhou; Lin, Jerome et al. (2017) LAIT: a local ancestry inference toolkit. BMC Genet 18:83
Raffield, Laura M; Zakai, Neil A; Duan, Qing et al. (2017) D-Dimer in African Americans: Whole Genome Sequence Analysis and Relationship to Cardiovascular Disease Risk in the Jackson Heart Study. Arterioscler Thromb Vasc Biol 37:2220-2227
Martin, Joshua S; Xu, Zheng; Reiner, Alex P et al. (2017) HUGIn: Hi-C Unifying Genomic Interrogator. Bioinformatics 33:3793-3795
Cannon, Maren E; Duan, Qing; Wu, Ying et al. (2017) Trans-ancestry Fine Mapping and Molecular Assays Identify Regulatory Variants at the ANGPTL8 HDL-C GWAS Locus. G3 (Bethesda) 7:3217-3227
Zhang, Guosheng; Huang, Kuan-Chieh; Xu, Zheng et al. (2016) Across-Platform Imputation of DNA Methylation Levels Incorporating Nonlocal Information Using Penalized Functional Regression. Genet Epidemiol 40:333-40
Lange, Ethan M; Ribado, Jessica V; Zuhlke, Kimberly A et al. (2016) Assessing the Cumulative Contribution of New and Established Common Genetic Risk Factors to Early-Onset Prostate Cancer. Cancer Epidemiol Biomarkers Prev 25:766-72
Xu, Zheng; Zhang, Guosheng; Wu, Cong et al. (2016) FastHiC: a fast and accurate algorithm to detect long-range chromosomal interactions from Hi-C data. Bioinformatics 32:2692-5

The 10 most recent of the project's 42 publications are listed above.