The goal of our Center for Human Reference Genome Diversity is to generate genome assemblies that are as error-free, gapless, complete, and correctly haplotype-phased as possible from a set of 350 individuals who together capture the full extent of human diversity.
We aim to capture >99% of allelic variants with >1% allele frequency, and to provide these genomes as a resource to the international community to enable genomic medicine and research addressing fundamental unanswered questions in biology and disease. We will employ a multi-platform approach using cutting-edge long-read and linked-read technologies to obtain the highest-quality phased genomes.
Aim 1 will focus on sample collection and procuring cell lines from at least 350 individuals with a specific emphasis on filling in gaps in human diversity.
Aim 2 will generate highly contiguous chromosome-level assemblies that are over 99% haplotype-phased for at least 700 haploid genomes from 350 diploid samples.
Aim 3 will finish these genomes to be gapless from telomere to telomere (T2T) for each chromosome.
Aim 4 will evaluate the genomes for accuracy and completeness and perform initial variant calling to assess the level of human diversity.
We will use a novel combination of technologies, sequencing strategies, and algorithms that we and others developed to produce the highest-quality and most complete genome assemblies to date. Our effort will specifically target regions that have been excluded by other efforts, including segmental duplications, centromeres, and acrocentric DNA. To achieve these aims we have assembled an exceptional team consisting of leaders from around the world in consent ethics, sample collection, sample extraction, and high-quality genome sequencing, assembly, finishing, and evaluation. The team also has expertise in using genomic technologies to address a broad range of scientific questions, and so is highly cognizant of the practical needs of biomedical researchers who will use this resource. The high-quality genomes produced will be passed to the Human Genome Reference Center (HGRC) and Genome Reference Representation (GRR) groups for curation and release. The result will be a pan-human genome reference, representing important human diversity not present in the current reference genome. The data we generate will enable a fundamental shift in human genetics, fostering new discoveries from the single-nucleotide to chromosomal levels and revealing a more accurate and global view of the human population.
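
As a rough illustration of how the sample size relates to the stated allele-frequency target, the sketch below computes the chance that an allele at 1% frequency appears at least once among 700 haploid genomes. This is not the project's own power analysis: the function names are hypothetical, and the calculation assumes every haplotype is an independent draw from a single well-mixed population, which real human populations are not.

```python
import math


def detection_probability(allele_freq: float, n_haplotypes: int) -> float:
    """Probability that an allele is seen at least once among n sampled haplotypes.

    Assumes each haplotype is an independent draw from one well-mixed
    population, a simplification made for illustration only.
    """
    return 1.0 - (1.0 - allele_freq) ** n_haplotypes


def haplotypes_needed(allele_freq: float, target_prob: float) -> int:
    """Smallest number of haplotypes giving at least target_prob of detection."""
    return math.ceil(math.log(1.0 - target_prob) / math.log(1.0 - allele_freq))


if __name__ == "__main__":
    n_haplotypes = 2 * 350  # 350 diploid samples yield 700 haploid genomes
    p = detection_probability(0.01, n_haplotypes)
    print(f"P(seen at least once | AF=1%, n={n_haplotypes}) = {p:.4f}")
    print("Haplotypes needed for 99% detection at AF=1%:",
          haplotypes_needed(0.01, 0.99))
```

Under these simplifying assumptions the detection probability for a 1% allele is about 0.999. Population structure means an allele common in one group can be far rarer in a pooled sample, so the independence assumption is optimistic for unevenly sampled cohorts.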

Public Health Relevance

Expanding the human reference genome data to include a large panel of individuals from diverse subpopulations is expected to provide a new foundational reference resource for human genetics and biomedicine that captures the full breadth of human diversity. Here we propose to deploy new, third-generation sequencing technologies to generate high-quality genomes for this new reference with low production time and cost.

Agency
National Institutes of Health (NIH)
Institute
National Human Genome Research Institute (NHGRI)
Type
Research Project--Cooperative Agreements (U01)
Project #
5U01HG010971-02
Application #
10020424
Study Section
Special Emphasis Panel (ZHG1)
Program Officer
Felsenfeld, Adam
Project Start
2019-09-18
Project End
2024-07-31
Budget Start
2020-08-01
Budget End
2021-07-31
Support Year
2
Fiscal Year
2020
Total Cost
Indirect Cost
Name
University of California Santa Cruz
Department
Engineering (All Types)
Type
Biomed Engr/Col Engr/Engr Sta
DUNS #
125084723
City
Santa Cruz
State
CA
Country
United States
Zip Code
95064