There is increasing recognition that rare, non-coding, and structural genomic variations are all important contributors to disease risk. We propose to create a graphical map of all the variations found by the TOPMed consortium - ultimately incorporating information from 100,000 genomes - that will allow unbiased analysis of all forms of variation in concert. The proof of the utility of this map will be a partially phased diploid reconstruction of each of the TOPMed genomes, compactly represented as a pair of paths for each chromosome in a global genome graph reference. To demonstrate how such a genome graph can be transformative for integrative analysis, we will build the first population gene annotation, defined as the set of all the splice isoforms of genes being expressed, including their underlying haplotypes, within a comprehensive sampling of a population. A population gene annotation can be used to study the association between genetic variation and isoform expression in a statistically meaningful way. For example, given a population annotation, one could ask: "How does a given genetic variant, which may be in a non-coding region, affect the expression of a given isoform?" or "Which variants are associated with high expression of a given isoform in a particular disease state?" Such integrative analysis questions are typically difficult or impossible to answer with current representations. Hemoglobin disorders caused by genetic variants affecting the production or structure of the alpha and beta globin proteins are the most common inherited blood disorders, affecting millions of individuals worldwide. To demonstrate the power of the graphical approach, we will create the most comprehensive map to date of genetic variants in the alpha and beta globin loci and associated regulatory genes.
We will show how combining this map with the phenotype data available from projects such as the Jackson Heart Study and the Women's Health Initiative, and with RNA-Seq data from TOPMed and other projects such as the Genotype-Tissue Expression (GTEx) project (gtexportal.org), allows us to identify novel candidates for causal variants and to provide evidence that other rare variants are benign. This demonstration project will drive the research and improve both speed and precision in the diagnosis of hemoglobin disorders. It will provide a convincing demonstration of the value of this type of integrated approach to the analysis of all forms of genetic variation. As these hemoglobin disorders disproportionately affect certain genetic subpopulations, this study will show how the use of a more comprehensive reference structure, tunable to specific ethnic subpopulations, can reduce the potential biases that occur when relying on a single reference genome.
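The idea of representing a diploid genome as a pair of paths through a genome graph can be illustrated with a toy example. The sketch below is not the project's implementation: the `GenomeGraph` class, the node identifiers, and the sequences are all hypothetical, chosen only to show one SNP site and one deletion site on a single short "chromosome".

```python
from dataclasses import dataclass, field


@dataclass
class GenomeGraph:
    """Toy sequence graph (hypothetical): node id -> sequence, plus directed edges."""
    nodes: dict = field(default_factory=dict)
    edges: set = field(default_factory=set)

    def add_node(self, node_id, seq):
        self.nodes[node_id] = seq

    def add_edge(self, src, dst):
        self.edges.add((src, dst))

    def spell(self, path):
        """Return the sequence spelled by a path, checking each step is an edge."""
        for a, b in zip(path, path[1:]):
            if (a, b) not in self.edges:
                raise ValueError(f"no edge {a}->{b}")
        return "".join(self.nodes[n] for n in path)


def build_toy_graph():
    g = GenomeGraph()
    # Nodes 2 and 3 are alternative alleles of a SNP; node 5 can be skipped (a deletion).
    for node_id, seq in [(1, "ACG"), (2, "T"), (3, "C"),
                         (4, "GGA"), (5, "TTT"), (6, "CA")]:
        g.add_node(node_id, seq)
    for edge in [(1, 2), (1, 3), (2, 4), (3, 4), (4, 5), (4, 6), (5, 6)]:
        g.add_edge(*edge)
    return g


graph = build_toy_graph()

# One individual's chromosome: a pair of paths, one per haplotype.
haplotype_a = [1, 2, 4, 5, 6]
haplotype_b = [1, 3, 4, 6]

seq_a = graph.spell(haplotype_a)
seq_b = graph.spell(haplotype_b)

# Nodes visited by exactly one haplotype mark the heterozygous sites.
heterozygous_nodes = set(haplotype_a) ^ set(haplotype_b)
```

Storing two short lists of node identifiers per chromosome, rather than two full sequences, is what makes this representation compact when many genomes share one graph.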

Public Health Relevance

The project will build a graphical map of the genomes and transcriptomes of the TOPMed consortium. This map will form a foundational, integrative resource for the wealth of data, allowing all forms of variation and expression data to be analyzed in concert. As a driving demonstration, it will be used to improve the detection, classification and curation of variants associated with thalassemia and other hemoglobin disorders, thereby increasing our ability to rapidly and precisely diagnose blood disorders from genomic data.

Agency: National Institutes of Health (NIH)
Institute: National Heart, Lung, and Blood Institute (NHLBI)
Type: Research Project--Cooperative Agreements (U01)
Project #: 1U01HL137183-01
Application #: 9312561
Study Section: Special Emphasis Panel (ZHL1-CSR-Q (F1))
Program Officer: Gan, Weiniu
Project Start: 2017-04-15
Project End: 2020-03-31
Budget Start: 2017-04-15
Budget End: 2018-03-31
Support Year: 1
Fiscal Year: 2017
Total Cost: $630,569
Indirect Cost: $211,478

Name: University of California Santa Cruz
Department: Engineering (All Types)
Type: Schools of Engineering
DUNS #: 125084723
City: Santa Cruz
State: CA
Country: United States
Zip Code: 95064

Paten, Benedict; Eizenga, Jordan M; Rosen, Yohei M et al. (2018) Superbubbles, Ultrabubbles, and Cacti. J Comput Biol 25:649-663
Kolmogorov, Mikhail; Armstrong, Joel; Raney, Brian J et al. (2018) Chromosome assembly of large and complex genomes using multiple references. Genome Res 28:1720-1732
Novak, Adam M; Garrison, Erik; Paten, Benedict (2017) A graph extension of the positional Burrows-Wheeler transform and its applications. Algorithms Mol Biol 12:18