The NHGRI Genome Sequencing Program (GSP) will identify genomic variants relevant to health and disease by sequencing the genomes of more than 225,000 participants across a multitude of diseases. The GSP will also serve as a pilot for the Precision Medicine Initiative, which aims to enroll and sequence more than a million people representative of U.S. ethnic diversity. Here, we propose a GSP analysis center focused on Multi- and Trans-ethnic Mapping of Mendelian and Complex Diseases. There is growing recognition of the substantial scientific advantages, as well as the public health importance, of conducting biomedical research across ethnically diverse cohorts. We propose to develop scalable methods that incorporate ancestry to optimize medical genomic study design and improve power for uncovering the role of common and rare variants in disease. Achieving this goal requires expertise across diverse domains, including medical and population genomics, algorithm development for complex disease mapping, and management of large-scale databases. We have assembled a world-class team of medical and population geneticists, computer scientists, statisticians and clinicians, with leading expertise in the development of novel and scalable strategies for characterizing sequence variants and their role in disease. Importantly, our group has been at the forefront of developing resources, study designs and methods to enable genomic research in U.S. minority populations. Our project has three main objectives. First, we will develop an Automated Scalable Ancestry Pipeline (ASAP) for common disease mapping in diverse populations. ASAP will improve the computational efficiency of existing state-of-the-art methods for ancestry inference and develop important extensions to linear mixed models (LMMs) and other mapping strategies leveraging local and global ancestry.
We will also develop methods to refine phenotypes, identify common controls for disease studies, and define endpoints. Second, we will develop tools and resources for trans- and multi-population rare variant discovery that incorporate patterns of local and sub-continental ancestry. We will also develop machine-learning tools for variant annotation that leverage ancestral information, patterns of sequence evolution, and protein structure in a unified framework. Furthermore, we will incorporate population-specific patterns of cellular phenotypes to improve functional prediction algorithms for non-coding and coding variants. Lastly, we will disseminate our results through web-based resources that empower the biomedical research community. We will augment existing resources, including ClinGen, by annotating and characterizing pathogenic variants across diverse populations. We will develop a secure web server that allows sharing of summary statistics and analysis pipelines to enable discovery, fine-mapping and functional prediction of genetic variants. Our team has ample experience with NIH-funded consortia and is dedicated to meeting the overall GSP project goals through collaborative work with NHGRI leadership and other funded investigators.

Public Health Relevance

The goal of our project is to accelerate the discovery of DNA variation relevant to health and disease by analyzing data from over 225,000 ethnically and racially diverse patients who will undergo genome sequencing. Of particular importance is ensuring that we have powerful statistical methods for analyzing data from underserved groups, including U.S. minority populations. Achieving this goal requires expertise across many domains, including medical and population genomics, algorithm development for disease mapping, and large-scale databases.

Agency
National Institutes of Health (NIH)
Institute
National Human Genome Research Institute (NHGRI)
Type
Research Project--Cooperative Agreements (U01)
Project #
1U01HG009080-01
Application #
9132536
Study Section
Special Emphasis Panel (ZHG1)
Program Officer
Felsenfeld, Adam
Project Start
2016-05-02
Project End
2020-03-31
Budget Start
2016-05-02
Budget End
2017-03-31
Support Year
1
Fiscal Year
2016
Total Cost
Indirect Cost
Name
Stanford University
Department
Genetics
Type
Schools of Medicine
DUNS #
009214214
City
Stanford
State
CA
Country
United States
Zip Code
94304
Wojcik, Genevieve L; Fuchsberger, Christian; Taliun, Daniel et al. (2018) Imputation-Aware Tag SNP Selection To Improve Power for Large-Scale, Multi-ethnic Association Studies. G3 (Bethesda) 8:3255-3267
DeBoever, Christopher; Tanigawa, Yosuke; Lindholm, Malene E et al. (2018) Medical relevance of protein-truncating variants across 337,205 individuals in the UK Biobank study. Nat Commun 9:1612
Park, Danny S; Eskin, Itamar; Kang, Eun Yong et al. (2018) An ancestry-based approach for detecting interactions. Genet Epidemiol 42:49-63
Martin, Alicia R; Gignoux, Christopher R; Walters, Raymond K et al. (2017) Human Demographic History Impacts Genetic Risk Prediction across Diverse Populations. Am J Hum Genet 100:635-649
Kernohan, Kristin D; Frésard, Laure; Zappala, Zachary et al. (2017) Whole-transcriptome sequencing in blood provides a diagnosis of spinal muscular atrophy with progressive myoclonic epilepsy. Hum Mutat 38:611-614
Zaitlen, Noah; Huntsman, Scott; Hu, Donglei et al. (2017) The Effects of Migration and Assortative Mating on Admixture Linkage Disequilibrium. Genetics 205:375-383
Li, Xin; Kim, Yungil; Tsang, Emily K et al. (2017) The impact of rare variation on gene expression across tissues. Nature 550:239-243
Aschard, Hugues; Guillemot, Vincent; Vilhjalmsson, Bjarni et al. (2017) Covariate selection for association screening in multiphenotype genetic studies. Nat Genet 49:1789-1795
McAllister, Kimberly; Mechanic, Leah E; Amos, Christopher et al. (2017) Current Challenges and New Opportunities for Gene-Environment Interaction Studies of Complex Diseases. Am J Epidemiol 186:753-761
Pala, Mauro; Zappala, Zachary; Marongiu, Mara et al. (2017) Population- and individual-specific regulatory variation in Sardinia. Nat Genet 49:700-707

Showing the most recent 10 out of 12 publications