Project Description: Genomewide association studies (GWAS) have identified >4000 genetic loci for a wide range of human traits, but a large proportion of heritability remains unexplained. In the post-GWAS era, geneticists are exploiting massively parallel sequencing technologies to study less common (minor allele frequency [MAF] 0.5-5%) and rare (MAF <0.5%) variants, hereafter together referred to as rare variants for brevity. Meanwhile, multiethnic GWAS, recognized as potentially more powerful for gene discovery and fine mapping, are receiving increasing attention from the genetics community. Among multiethnic populations, admixed populations such as African Americans and Hispanic Americans are particularly attractive because they comprise more than 20% of the US population. These admixed populations offer a unique opportunity for gene mapping because one can utilize admixture linkage disequilibrium (LD) to search for genes underlying diseases that differ strikingly in prevalence across populations. However, little methodological work exists for admixed populations that can accommodate post-GWAS data. The methodological work lags in at least three major areas. First, there are few, if any, genotype imputation methods that are tailored to admixed samples and can accommodate both the ever-increasing public resources and the typical mixture of genotyping and sequencing data among the study samples. Imputation will continue to play an essential role because sequencing will remain cost-prohibitive for large GWAS sample collections. Second, there has been no published work on practical issues regarding rare variant imputation in admixed populations. Third, despite the recent rich literature on statistical methods for rare variant association analysis in relatively homogeneous populations, the field needs methods that can efficiently analyze rare variants in admixed samples, particularly with imputed or partially imputed data.
In this application, we propose the following aims to fill the above gaps: 1) Develop efficient hidden Markov model and singular value decomposition based methods for haplotype-to-haplotype imputation in admixed populations; 2) Assess the quality of, and provide practical guidelines on, rare variant imputation in admixed populations; 3) Develop a robust statistical test for the analysis of rare variants in admixed populations; and 4) Develop, distribute, and support freely available software packages for the methods developed in this project.

Public Health Relevance

Genomewide association studies (GWAS) have identified >4000 genetic loci for a wide range of human traits, but a large proportion of heritability remains unexplained. In the post-GWAS era, geneticists are exploiting massively parallel sequencing technologies to study less common (minor allele frequency [MAF] 0.5-5%) and rare (MAF <0.5%) variants, hereafter together referred to as rare variants for brevity. Meanwhile, multiethnic GWAS, recognized as potentially more powerful for gene discovery and fine mapping, are receiving increasing attention from the genetics community. Among multiethnic populations, admixed populations such as African Americans and Hispanic Americans are particularly attractive because they comprise more than 20% of the US population. These admixed populations offer a unique opportunity for gene mapping because one can utilize admixture linkage disequilibrium (LD) to search for genes underlying diseases that differ strikingly in prevalence across populations. However, little methodological work exists for admixed populations that can accommodate post-GWAS data. In this application, we will fill methodological and practical gaps in the genetic analysis of rare variants in admixed populations.

Agency
National Institutes of Health (NIH)
Institute
National Human Genome Research Institute (NHGRI)
Type
Research Project (R01)
Project #
5R01HG006703-02
Application #
8470204
Study Section
Special Emphasis Panel (ZRG1-GGG-C (02))
Program Officer
Brooks, Lisa
Project Start
2012-05-16
Project End
2015-02-28
Budget Start
2013-03-01
Budget End
2014-02-28
Support Year
2
Fiscal Year
2013
Total Cost
$302,119
Indirect Cost
$83,665
Name
University of North Carolina Chapel Hill
Department
Genetics
Type
Schools of Medicine
DUNS #
608195277
City
Chapel Hill
State
NC
Country
United States
Zip Code
27599
Huang, Kuan-Chieh; Sun, Wei; Wu, Ying et al. (2014) Association studies with imputed variants using expectation-maximization likelihood-ratio tests. PLoS One 9:e110679
Cheng, Wei; Zhang, Xiang; Guo, Zhishan et al. (2014) Graph-regularized dual Lasso for robust eQTL mapping. Bioinformatics 30:i139-48
Mazrouee, Sepideh; Wang, Wei (2014) FastHap: fast and accurate single individual haplotype reconstruction using fuzzy conflict graphs. Bioinformatics 30:i371-8
Yan, Song; Li, Yun (2014) BETASEQ: a powerful novel method to control type-I error inflation in partially sequenced data for rare variant association testing. Bioinformatics 30:480-7
Zhang, Zhaojun; Wang, Wei (2014) RNA-Skim: a rapid method for RNA-Seq quantification at transcript level. Bioinformatics 30:i283-i292
Bizon, Chris; Spiegel, Michael; Chasse, Scott A et al. (2014) Variant calling in low-coverage whole genome sequencing of a Native American population sample. BMC Genomics 15:85
Kang, Jian; Huang, Kuan-Chieh; Xu, Zheng et al. (2013) AbCD: arbitrary coverage design for sequencing-based genetic studies. Bioinformatics 29:799-801
Byrnes, Andrea E; Wu, Michael C; Wright, Fred A et al. (2013) The value of statistical or bioinformatics annotation for rare variant association with quantitative trait. Genet Epidemiol 37:666-74
Mao, Xianyun; Li, Yun; Liu, Yichuan et al. (2013) Testing genetic association with rare variants in admixed populations. Genet Epidemiol 37:38-47
Huang, Jie; Liu, Eric Y; Welch, Ryan et al. (2013) WikiGWA: an open platform for collecting and using genome-wide association results. Eur J Hum Genet 21:471-3

Showing the most recent 10 out of 12 publications