Approaches to complete the human genome will benefit from careful, benchmarked advances that demonstrate the capability to fully assemble and phase diploid chromosomes. The remaining unresolved regions in our high-resolution genomic maps are known to contain long tracts of repeats. The long-term objective of our research is to develop new experimental methods to complete chromosome-scale assemblies in order to study the sequence organization, structural diversity, and disease impact of these novel sequences. In our first aim, we demonstrate the use of new approaches to generate the first telomere-to-telomere phased assembly of a human genome using effectively haploid complete hydatidiform moles (CHMs), and demonstrate the ability to scale these methods to a panel of CHMs. In our second aim, we focus on methods for validating repeat assemblies to improve the structural and base-level accuracy of our assemblies. In our third aim, we harden our haplotype phasing method using high-coverage, ultra-long data from diploid genomes to guide phased chromosome assemblies. We propose to optimize a new, cost-effective method of improving high-quality reference genomes to reach complete, telomere-to-telomere genome assemblies. This research has the additional benefit of adding new sequence to the human genome, enabling systematic exploration of genetic variation in regions frequently overlooked in disease association and functional studies.

Public Health Relevance

The objective of the proposed study is to develop innovative methods to reach complete, telomere-to-telomere phased assemblies of human genomes. To reach this milestone, we propose technological innovation in both long-read sequencing and novel repeat assembly strategies. Fundamental to the success of this project will be an equal emphasis on rigorous assessment of assembly structure and base-level accuracy.

Agency
National Institutes of Health (NIH)
Institute
National Human Genome Research Institute (NHGRI)
Type
Research Project (R01)
Project #
1R01HG011274-01
Application #
10034316
Study Section
Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer
Smith, Michael
Project Start
2020-09-14
Project End
2025-06-30
Budget Start
2020-09-14
Budget End
2021-06-30
Support Year
1
Fiscal Year
2020
Total Cost
Indirect Cost
Name
University of California Santa Cruz
Department
Engineering (All Types)
Type
Biomed Engr/Col Engr/Engr Sta
DUNS #
125084723
City
Santa Cruz
State
CA
Country
United States
Zip Code
95064