Project Description: Genome-wide association studies (GWAS) have identified more than 4,000 genetic loci for a wide range of human traits, yet a large proportion of heritability remains unexplained. In the post-GWAS era, geneticists are exploiting massively parallel sequencing technologies to study less common (minor allele frequency [MAF] 0.5-5%) and rare (MAF < 0.5%) variants, hereafter referred to collectively as rare variants for brevity. Meanwhile, multiethnic GWAS, which are recognized as potentially more powerful for gene discovery and fine mapping, are receiving increasing attention from the genetics community. Among multiethnic populations, admixed populations such as African Americans and Hispanic Americans are particularly attractive because together they comprise more than 20% of the US population. These admixed populations offer a unique opportunity for gene mapping because admixture linkage disequilibrium (LD) can be used to search for genes underlying diseases whose prevalence differs strikingly across populations. However, little methodological work for admixed populations can accommodate post-GWAS data. Methodological work lags in at least three major areas. First, there are few, if any, genotype imputation methods that are tailored to admixed samples and that can accommodate both the ever-increasing public reference resources and the typical mixture of genotyping and sequencing data among study samples. Imputation will continue to play an essential role because sequencing will remain cost-prohibitive for large GWAS sample collections. Second, there has been no published work on practical issues in rare variant imputation in admixed populations. Third, despite the recent rich literature on statistical methods for rare variant association analysis in relatively homogeneous populations, the field needs methods that can efficiently analyze rare variants in admixed samples, particularly with imputed or partially imputed data.
In this application, we propose the following aims to fill these gaps: (1) develop efficient hidden Markov model (HMM) and singular value decomposition (SVD) based methods for haplotype-to-haplotype imputation in admixed populations; (2) assess the quality of rare variant imputation in admixed populations and provide practical guidelines for it; (3) develop a robust statistical test for the analysis of rare variants in admixed populations; and (4) develop, distribute, and support freely available software packages for the methods developed in this project.
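The MAF categories defined above (less common: 0.5-5%; rare: below 0.5%) can be made concrete with a small sketch. It is illustrative only and not part of the project's software: it assumes diploid genotypes coded as 0, 1 or 2 copies of the alternate allele, treats the 5% upper bound as inclusive, and labels variants with MAF 0 as monomorphic; the function names are hypothetical.

```python
from typing import Sequence


def minor_allele_frequency(genotypes: Sequence[int]) -> float:
    """MAF from diploid genotypes coded 0, 1 or 2 (alternate allele copies).

    Assumption: complete genotypes with no missing calls.
    """
    if not genotypes:
        raise ValueError("need at least one genotype")
    if any(g not in (0, 1, 2) for g in genotypes):
        raise ValueError("genotypes must be coded 0, 1 or 2")
    # Alternate allele frequency = alternate allele count / total allele count.
    alt_freq = sum(genotypes) / (2 * len(genotypes))
    # The minor allele is whichever allele is less frequent.
    return min(alt_freq, 1.0 - alt_freq)


def classify_variant(maf: float) -> str:
    """Bin a variant by MAF using the thresholds stated in the abstract.

    Assumptions: MAF == 0 is reported as monomorphic, and 5% is inclusive.
    """
    if maf == 0.0:
        return "monomorphic"
    if maf < 0.005:
        return "rare"
    if maf <= 0.05:
        return "less common"
    return "common"
```

For example, 6 heterozygotes among 1,000 individuals give an MAF of 6/2000 = 0.3%, so the variant is classed as rare; 60 heterozygotes give 3%, which falls in the less common bin. Both bins together correspond to what the abstract calls "rare variants for brevity".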

Public Health Relevance

Genome-wide association studies (GWAS) have identified more than 4,000 genetic loci for a wide range of human traits, yet a large proportion of heritability remains unexplained. In the post-GWAS era, geneticists are exploiting massively parallel sequencing technologies to study less common (minor allele frequency [MAF] 0.5-5%) and rare (MAF < 0.5%) variants, hereafter referred to collectively as rare variants for brevity. Meanwhile, multiethnic GWAS, which are recognized as potentially more powerful for gene discovery and fine mapping, are receiving increasing attention from the genetics community. Among multiethnic populations, admixed populations such as African Americans and Hispanic Americans are particularly attractive because together they comprise more than 20% of the US population. These admixed populations offer a unique opportunity for gene mapping because admixture linkage disequilibrium (LD) can be used to search for genes underlying diseases whose prevalence differs strikingly across populations. However, little methodological work for admixed populations can accommodate post-GWAS data. In this application, we will fill methodological and practical gaps in the genetic analysis of rare variants in admixed populations.

Agency
National Institutes of Health (NIH)
Institute
National Human Genome Research Institute (NHGRI)
Type
Research Project (R01)
Project #
5R01HG006703-03
Application #
8634810
Study Section
Special Emphasis Panel (ZRG1-GGG-C (02))
Program Officer
Brooks, Lisa
Project Start
2012-05-16
Project End
2015-02-28
Budget Start
2014-03-01
Budget End
2015-02-28
Support Year
3
Fiscal Year
2014
Total Cost
$308,717
Indirect Cost
$85,855
Name
University of North Carolina Chapel Hill
Department
Genetics
Type
Schools of Medicine
DUNS #
608195277
City
Chapel Hill
State
NC
Country
United States
Zip Code
27599
Zhang, Guosheng; Huang, Kuan-Chieh; Xu, Zheng et al. (2016) Across-Platform Imputation of DNA Methylation Levels Incorporating Nonlocal Information Using Penalized Functional Regression. Genet Epidemiol 40:333-40
Xu, Zheng; Zhang, Guosheng; Duan, Qing et al. (2016) HiView: an integrative genome browser to leverage Hi-C results for the interpretation of GWAS variants. BMC Res Notes 9:159
Lange, Ethan M; Ribado, Jessica V; Zuhlke, Kimberly A et al. (2016) Assessing the Cumulative Contribution of New and Established Common Genetic Risk Factors to Early-Onset Prostate Cancer. Cancer Epidemiol Biomarkers Prev 25:766-72
Fan, Ruzong; Wang, Yifan; Chiu, Chi-Yang et al. (2016) Meta-analysis of Complex Diseases at Gene Level with Generalized Functional Linear Models. Genetics 202:457-70
Hu, Yi-Juan; Li, Yun; Auer, Paul L et al. (2015) Integrative analysis of sequencing and array genotype data for discovering disease associations with rare mutations. Proc Natl Acad Sci U S A 112:1019-24
Wang, Xuexia; Zhang, Shuanglin; Li, Yun et al. (2015) A powerful approach to test an optimally weighted combination of rare variants in admixed populations. Genet Epidemiol 39:294-305
Cheng, Wei; Shi, Yu; Zhang, Xiang et al. (2015) Fast and robust group-wise eQTL mapping using sparse graphical models. BMC Bioinformatics 16:2
Li, Jin; Lange, Leslie A; Duan, Qing et al. (2015) Genome-wide admixture and association study of serum iron, ferritin, transferrin saturation and total iron binding capacity in African Americans. Hum Mol Genet 24:572-81
Fan, Ruzong; Wang, Yifan; Boehnke, Michael et al. (2015) Gene Level Meta-Analysis of Quantitative Traits by Functional Linear Models. Genetics 200:1089-104
Wang, WeiBo; Wang, Wei; Sun, Wei et al. (2015) Allele-specific copy-number discovery from whole-genome and whole-exome sequencing. Nucleic Acids Res 43:e90

The list above shows the 10 most recent of the 31 publications associated with this project.