African Americans and Hispanic Americans represent the two largest racial minority groups in the United States, comprising ~28% of the U.S. population. Both populations are recently admixed and have inherited ancestry from more than one continent. Admixed populations offer a unique opportunity for mapping disease genes that have large allele frequency differences between ancestral populations. Admixture mapping has recently become one of the main approaches for gene mapping studies in admixed populations, but it has substantially lower resolution than association analysis. With the increasing availability of large volumes of high-density SNP genotyping data generated in genome-wide association studies (GWAS), analysis is now moving towards SNP association. However, currently available association methods either inflate type I error rates or lose statistical power when applied to admixed populations, and powerful statistical methods for these populations remain underdeveloped. In this application, we will develop powerful statistical methods targeted at the analysis of genetic data generated from admixed populations. All proposed aims are motivated by problems that arise in genetic studies on which we are currently working. We have access to 12 large-scale candidate gene and GWAS datasets including 19,135 African Americans and 2,002 Hispanic Americans. These datasets, together with the publicly available data from the 1000 Genomes Project, provide an ideal basis to guide the methodological research in this project. To facilitate gene mapping studies in admixed populations, we propose the following aims:
1) Develop a unified statistical framework for genetic association analysis of unrelated individuals and family data sampled from admixed populations.
2) Develop statistical methods to identify SNPs that can explain an admixture mapping signal.
3) Develop statistical methods for association analysis of copy number variations in admixed populations.
4) Develop statistical methods for analysis of secondary phenotypes in a case-control GWAS in admixed populations.
5) Develop, distribute and support freely available software packages for methods proposed in this application.
The methods will be evaluated through analytical approaches, computer simulations and applications to multiple real datasets.

Public Health Relevance

African Americans and Hispanic Americans represent the two largest racial minority groups in the U.S., comprising ~28% of the U.S. population. However, genome-wide association studies (GWAS), which scan markers across the whole genome to find disease susceptibility genes, have so far focused primarily on European Americans. Methods for the analysis of admixed populations such as African Americans and Hispanic Americans are underdeveloped due to the complexities posed by population admixture. In this application, we will develop a suite of statistical and computational tools targeted at the analysis of genetic data generated from admixed populations.

Agency
National Institutes of Health (NIH)
Institute
National Human Genome Research Institute (NHGRI)
Type
Research Project (R01)
Project #
5R01HG005854-02
Application #
8139935
Study Section
Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer
Brooks, Lisa
Project Start
2010-09-09
Project End
2013-06-30
Budget Start
2011-07-01
Budget End
2012-06-30
Support Year
2
Fiscal Year
2011
Total Cost
$330,988
Indirect Cost
Name
University of Pennsylvania
Department
Biostatistics & Other Math Sci
Type
Schools of Medicine
DUNS #
042250712
City
Philadelphia
State
PA
Country
United States
Zip Code
19104
Wang, Xuexia; Zhang, Shuanglin; Li, Yun et al. (2015) A powerful approach to test an optimally weighted combination of rare variants in admixed populations. Genet Epidemiol 39:294-305
Liu, Yichuan; Morley, Michael; Brandimarto, Jeffrey et al. (2015) RNA-Seq identifies novel myocardial gene expression signatures of heart failure. Genomics 105:83-9
Xu, Zheng; Duan, Qing; Yan, Song et al. (2015) DISSCO: direct imputation of summary statistics allowing covariates. Bioinformatics 31:2434-42
Liu, Yichuan; Ferguson, Jane F; Xue, Chenyi et al. (2014) Tissue-specific RNA-Seq in human evoked inflammation identifies blood and adipose LincRNA signatures of cardiometabolic diseases. Arterioscler Thromb Vasc Biol 34:902-12
Hu, Yu; Liu, Yichuan; Mao, Xianyun et al. (2014) PennSeq: accurate isoform-specific gene expression quantification in RNA-Seq by modeling non-uniform read distribution. Nucleic Acids Res 42:e20
Byrnes, Andrea E; Wu, Michael C; Wright, Fred A et al. (2013) The value of statistical or bioinformatics annotation for rare variant association with quantitative trait. Genet Epidemiol 37:666-74
Liu, Eric Yi; Li, Mingyao; Wang, Wei et al. (2013) MaCH-admix: genotype imputation for admixed populations. Genet Epidemiol 37:25-37
Mao, Xianyun; Li, Yun; Liu, Yichuan et al. (2013) Testing genetic association with rare variants in admixed populations. Genet Epidemiol 37:38-47
Liu, Yichuan; Ferguson, Jane F; Xue, Chenyi et al. (2013) Evaluating the impact of sequencing depth on transcriptome profiling in human adipose. PLoS One 8:e66883
Chen, Hua Yun; Reilly, Muredach P; Li, Mingyao (2013) Semiparametric odds ratio model for case-control and matched case-control designs. Stat Med 32:3126-42

Showing the most recent 10 out of 18 publications