The NHGRI Genome Sequencing Program (GSP) will identify genomic variants relevant to health and disease by sequencing the genomes of over 225,000 participants across a multitude of diseases. The GSP will also serve as a pilot for the Precision Medicine Initiative, which aims to enroll and sequence more than a million people representative of U.S. ethnic diversity. Here, we propose a GSP analysis center focused on Multi- and Trans-ethnic Mapping of Mendelian and Complex Diseases. There is growing recognition of the substantial scientific advantages, as well as the public health importance, of conducting biomedical research across ethnically diverse cohorts. We propose to develop scalable methods that incorporate ancestry to optimize medical genomic study design and improve power for uncovering the role of common and rare variants in disease. Achieving this goal requires expertise across diverse domains, including medical and population genomics, algorithm development for complex disease mapping, and management of large-scale databases. We have assembled a world-class team of medical and population geneticists, computer scientists, statisticians, and clinicians with leading expertise in developing novel and scalable strategies for characterizing sequence variants and their role in disease. Importantly, our group has been at the forefront of developing resources, study designs, and methods to enable genomic research in U.S. minority populations. Our project has three main objectives. First, we will develop an Automated Scalable Ancestry Pipeline (ASAP) for common disease mapping in diverse populations. ASAP will improve the computational efficiency of existing state-of-the-art methods for ancestry inference and develop important extensions to linear mixed models (LMMs) and other mapping strategies that leverage local and global ancestry. 
We will also develop methods to refine phenotypes, define endpoints, and identify common controls for disease studies. Second, we will develop tools and resources for trans- and multi-population rare variant discovery that incorporate patterns of local and sub-continental ancestry. We will also develop machine-learning tools for variant annotation that leverage ancestral information, patterns of sequence evolution, and protein structure in a unified framework. Furthermore, we will incorporate population-specific patterns of cellular phenotypes to improve functional prediction algorithms for non-coding and coding variants. Third, we will disseminate our results through web-based resources that empower the biomedical research community. We will augment existing resources, including ClinGen, by annotating and characterizing pathogenic variants across diverse populations. We will develop a secure web server that allows sharing of summary statistics and analysis pipelines to enable discovery, fine-mapping, and functional prediction of genetic variants. Our team has ample experience with NIH-funded consortia and is dedicated to meeting the overall GSP project goals through collaborative work with NHGRI leadership and other funded investigators.
The goal of our project is to accelerate the discovery of DNA variation relevant to health and disease by analyzing data from over 225,000 ethnically and racially diverse patients who will undergo genome sequencing. Of particular importance is ensuring that we have powerful statistical methods for analyzing data from underserved groups, including U.S. minority populations. Achieving this goal requires expertise across many domains, including medical and population genomics, algorithm development for disease mapping, and large-scale databases.
Wojcik, Genevieve L; Fuchsberger, Christian; Taliun, Daniel et al. (2018) Imputation-Aware Tag SNP Selection To Improve Power for Large-Scale, Multi-ethnic Association Studies. G3 (Bethesda) 8:3255-3267
DeBoever, Christopher; Tanigawa, Yosuke; Lindholm, Malene E et al. (2018) Medical relevance of protein-truncating variants across 337,205 individuals in the UK Biobank study. Nat Commun 9:1612
Park, Danny S; Eskin, Itamar; Kang, Eun Yong et al. (2018) An ancestry-based approach for detecting interactions. Genet Epidemiol 42:49-63
Kernohan, Kristin D; Frésard, Laure; Zappala, Zachary et al. (2017) Whole-transcriptome sequencing in blood provides a diagnosis of spinal muscular atrophy with progressive myoclonic epilepsy. Hum Mutat 38:611-614
Zaitlen, Noah; Huntsman, Scott; Hu, Donglei et al. (2017) The Effects of Migration and Assortative Mating on Admixture Linkage Disequilibrium. Genetics 205:375-383
Li, Xin; Kim, Yungil; Tsang, Emily K et al. (2017) The impact of rare variation on gene expression across tissues. Nature 550:239-243
Aschard, Hugues; Guillemot, Vincent; Vilhjalmsson, Bjarni et al. (2017) Covariate selection for association screening in multiphenotype genetic studies. Nat Genet 49:1789-1795
McAllister, Kimberly; Mechanic, Leah E; Amos, Christopher et al. (2017) Current Challenges and New Opportunities for Gene-Environment Interaction Studies of Complex Diseases. Am J Epidemiol 186:753-761
Pala, Mauro; Zappala, Zachary; Marongiu, Mara et al. (2017) Population- and individual-specific regulatory variation in Sardinia. Nat Genet 49:700-707
Ritchie, Marylyn D; Davis, Joe R; Aschard, Hugues et al. (2017) Incorporation of Biological Knowledge Into the Study of Gene-Environment Interactions. Am J Epidemiol 186:771-777