Population structure and admixture are key confounders in genome-wide association and medical resequencing studies. In particular, accounting for differences in ancestry between cases and controls, in terms of both genomic and geographic location, is critical for proper analysis and interpretation of studies with multi- and trans-ethnic samples. Genomic studies of Hispanics/Latinos, the largest and fastest-growing minority group in the US, reveal that they are a highly heterogeneous admixed group. Individuals and populations vary greatly in their proportions of African, European, and Native American ancestry. Moreover, while Mexican populations have been characterized genomically to some extent, populations from the Caribbean and South America have been largely underrepresented in genetic studies. Knowledge of the complex genetic structure underlying US Hispanic/Latino and Caribbean populations is therefore essential for ensuring robust genotype-phenotype associations and for understanding the medical relevance of associated variants across diverse populations in the US and throughout the Americas. In addition, because much is known about African and European migrations into the Americas over the past 500 years, population genetic studies of Hispanics/Latinos serve as an excellent model for developing novel algorithms and approaches to characterize the fine-scale genetic structure of admixed populations in general. This project will extend current studies of population genetic structure in US Hispanics/Latinos by densely genotyping 180 parent-offspring triads and sequencing the genomes of 30 triads from six US populations of Caribbean descent: Puerto Rican, Cuban, Dominican, Haitian, Honduran, and Colombian.
We will combine the SNP, CNV, and whole-genome sequence (WGS) data with other publicly available genomic resources, including the International HapMap Project and the 1000 Genomes Project, to understand the complex genetic architecture of Hispanic/Latino populations in the US. We will accomplish this goal through the following specific aims: 1) generate dense SNP genotype data for our sample of 180 triads using the Affymetrix 6.0 whole-genome SNP chip (~1 million SNPs and CNVs); 2) generate high-coverage WGS data and sequence the complete genomes of 30 triads (5 from each of the 6 populations) to at least 20X coverage; 3) characterize population structure and admixture in our US Hispanic/Latino triads based on SNP genotype and WGS data, including comparison with HapMap and 1000 Genomes data; and 4) assess and account for the impact of substructure on disease-association tests in order to improve the next generation of trans- and multi-ethnic medical genomic studies. Our project is highly significant because it will provide immediate insights and new statistical methods to improve study design and genetic analysis for medical genomic studies in Hispanics/Latinos, in other complex admixed groups, and in multi- and trans-ethnic cohorts.