African Americans and Hispanic Americans represent the two largest racial minority groups in the United States, comprising ~28% of the U.S. population. Both populations are recently admixed and have inherited ancestry from more than one continent. Admixed populations offer a unique opportunity for mapping disease genes that have large allele frequency differences between ancestral populations. Recently, admixture mapping has become one of the main approaches for gene mapping studies in admixed populations. However, admixture mapping has substantially lower resolution than association analysis. With the increasing availability of large volumes of high-density SNP genotyping data generated in genome-wide association studies (GWAS), the analysis is now moving towards SNP association. However, currently available association methods either inflate type I error rates or reduce statistical power when applied to admixed populations, and powerful statistical methods for admixed populations remain underdeveloped. In this application, we will develop powerful statistical methods that are targeted for the analysis of genetic data generated from admixed populations. All proposed aims are motivated by problems that arise in genetic studies on which we are currently working. We have access to 12 large-scale candidate gene and GWAS datasets including 19,135 African Americans and 2,002 Hispanic Americans. These datasets, together with the publicly available data from the 1000 Genomes Project, provide an ideal basis to guide the methodological research in this project. To facilitate gene mapping studies in admixed populations, we propose the following aims: 1) Develop a unified statistical framework for genetic association analysis of unrelated individuals and family data sampled from admixed populations. 2) Develop statistical methods to identify SNPs that can explain an admixture mapping signal. 3) Develop statistical methods for association analysis of copy number variations in admixed populations. 4) Develop statistical methods for analysis of secondary phenotypes in a case-control GWAS in admixed populations. 5) Develop, distribute and support freely available software packages for methods proposed in this application. The methods will be evaluated through analytical approaches, computer simulations and applications to multiple real datasets.

Public Health Relevance

African Americans and Hispanic Americans represent the two largest racial minority groups in the U.S., comprising ~28% of the U.S. population. However, current genome-wide association studies (GWAS), which scan markers across the whole genome to find disease susceptibility genes, have focused primarily on European Americans. Methods for the analysis of admixed populations such as African Americans and Hispanic Americans are underdeveloped due to the complexities posed by population admixture. In this application, we will develop a suite of statistical and computational tools that are targeted for the analysis of genetic data generated from admixed populations.

National Institutes of Health (NIH)
National Human Genome Research Institute (NHGRI)
Research Project (R01)
Study Section
Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer
Brooks, Lisa
University of Pennsylvania
Biostatistics & Other Math Sci
Schools of Medicine
United States
Hu, Yu; Liu, Yichuan; Mao, Xianyun et al. (2014) PennSeq: accurate isoform-specific gene expression quantification in RNA-Seq by modeling non-uniform read distribution. Nucleic Acids Res 42:e20
Liu, Yichuan; Ferguson, Jane F; Xue, Chenyi et al. (2014) Tissue-specific RNA-Seq in human evoked inflammation identifies blood and adipose LincRNA signatures of cardiometabolic diseases. Arterioscler Thromb Vasc Biol 34:902-12
Mao, Xianyun; Li, Yun; Liu, Yichuan et al. (2013) Testing genetic association with rare variants in admixed populations. Genet Epidemiol 37:38-47
Liu, Yichuan; Ferguson, Jane F; Xue, Chenyi et al. (2013) Evaluating the impact of sequencing depth on transcriptome profiling in human adipose. PLoS One 8:e66883
Liu, Eric Yi; Li, Mingyao; Wang, Wei et al. (2013) MaCH-admix: genotype imputation for admixed populations. Genet Epidemiol 37:25-37
Byrnes, Andrea E; Wu, Michael C; Wright, Fred A et al. (2013) The value of statistical or bioinformatics annotation for rare variant association with quantitative trait. Genet Epidemiol 37:666-74
Chen, Hua Yun; Reilly, Muredach P; Li, Mingyao (2013) Semiparametric odds ratio model for case-control and matched case-control designs. Stat Med 32:3126-42
Li, Mingyao; Wang, Isabel X; Li, Yun et al. (2011) Widespread RNA and DNA sequence differences in the human transcriptome. Science 333:53-8
Chen, Hua Yun; Li, Mingyao (2011) Improving power and robustness for detecting genetic association with extreme-value sampling design. Genet Epidemiol 35:823-30
Feng, Tao; Elston, Robert C; Zhu, Xiaofeng (2011) Detecting rare and common variants for complex traits: sibpair and odds ratio weighted sum statistics (SPWSS, ORWSS). Genet Epidemiol 35:398-409
