Approaches to complete the human genome will benefit from careful, benchmarked advances that demonstrate the capability to fully assemble and phase diploid chromosomes. The remaining unresolved regions in our high-resolution genomic maps are known to contain long tracts of repeats. The long-term objective of our research is to develop new experimental methods to complete chromosome-scale assemblies in order to study the sequence organization, structural diversity, and disease impact of these novel sequences. The goal of this proposal is to develop sequencing methods that increase the sequence throughput of reads that are hundreds of kilobases in length in order to improve consensus base accuracy, a necessary step to advance assembly efforts into the remaining gapped regions. In our first aim, we demonstrate the use of our approach to generate the first telomere-to-telomere phased assembly of a human chromosome. We hypothesize that this work will be critical to completing other chromosome reference assemblies. In our second aim, we present a new approach to target and enrich for long reads in repetitive DNAs that were misrepresented or missing completely in previous reference assemblies. This approach is expected to provide a new cost-effective method of studying genetic variation in these highly variable regions across a large number of individuals. This research has the additional benefit of adding new sequence to the human genome, enabling systematic exploration of genetic variation in regions frequently overlooked in disease-association studies.

Public Health Relevance

The objective of the proposed study is to enable high-throughput sequencing of long reads with high consensus base accuracy, which is necessary to dramatically improve phased, linear assemblies of repeat-rich regions. The technologies developed will guide efforts to complete chromosome-scale assemblies and will institute new studies of sequence variation in repetitive DNAs.

Agency
National Institutes of Health (NIH)
Institute
National Human Genome Research Institute (NHGRI)
Type
Exploratory/Developmental Grants (R21)
Project #
5R21HG010548-02
Application #
9920185
Study Section
Special Emphasis Panel (ZHG1)
Program Officer
Smith, Michael
Project Start
2019-04-23
Project End
2021-03-31
Budget Start
2020-04-01
Budget End
2021-03-31
Support Year
2
Fiscal Year
2020
Total Cost
Indirect Cost
Name
University of California Santa Cruz
Department
Engineering (All Types)
Type
Biomed Engr/Col Engr/Engr Sta
DUNS #
125084723
City
Santa Cruz
State
CA
Country
United States
Zip Code
95064