Single-molecule sequence assembly and analysis

Phillippy, Adam

Abstract

A major highlight of the past year was the successful completion of the first human chromosome ever to be sequenced and assembled from end to end without gaps. The project built on nanopore sequencing data generated and described in the 2018 annual report. This year, in collaboration primarily with Karen Miga at UC Santa Cruz, we led the complete assembly of a human X chromosome using the ultra-long read sequencing technique reported in 2018 and the Canu software reported in 2017. Achieving a complete X chromosome required further refinement of our assembly approach and the development of novel tools for improving the accuracy of highly repetitive regions of the genome, such as the centromeric satellite arrays. One of these new tools is a highly accurate single-molecule sequencing strategy, to whose development we contributed (ref 12). Successful reconstruction of the first complete human chromosome is a milestone achievement, and in the coming year we will work toward publishing this result and completing additional chromosomes. Both the data and the methods developed for this project put us on a path to close all remaining gaps in the human reference genome in the coming years.

Moving beyond a single reference genome and toward a pan-genome for all humans, we began a new project with collaborators at UC Santa Cruz to sequence multiple diploid human genomes using the trio-binning approach first described in the 2018 annual report and published in this reporting year (ref 9). This year we selected our initial 10 human samples and began sequencing on both PacBio and Nanopore platforms. A preprint describing the initial nanopore sequencing and assembly of these samples was released, and work continues to integrate the additional data types and bring these genomes up to a reference-grade quality standard.
In the coming years we plan to use these data to construct a reference database of complete human haplotypes that accurately captures complex structural variation from across the human population.

Beyond the human genome, the Genome Informatics Section (GIS) is also generating high-quality reference genomes for all (approximately 250) extant vertebrate orders in collaboration with the Vertebrate Genomes Project. Building upon the 16 vertebrate genomes announced last year, we have now completed nearly 100 vertebrate genomes via this project and related efforts. These genomes include those of the Canada lynx, platypus, kakapo, yak, cow, whale shark, goldfish (ref 2), pig, and many others. They will enable powerful comparative genomics studies that will help reveal the function of vertebrate genomes and enhance our understanding of the human genome.

In addition to vertebrates, we have completed the genomes of several invertebrates with significant public health impact, including the mosquitoes Aedes aegypti (ref 10), Aedes albopictus (in draft), and Anopheles funestus (ref 5), which are vectors of important diseases such as malaria, Zika, West Nile, yellow fever, dengue, and chikungunya. Because they are highly repetitive, mosquito genomes present a difficult assembly challenge that will drive further improvements to our methods.

Lastly, the GIS continues to develop new methods for the alignment and real-time analysis of long-read sequencing data. This year we published an analysis of public microbial genome databases (ref 11), a new tool for accurate HLA typing from long reads (ref 4), and tools for aligning and assigning metagenomic reads to their source genome (refs 3 and 6), and we assisted the USDA in the analysis of the complete cow rumen metagenome (ref 1).
We also applied our tools to the study of microbial speciation and produced evidence that a bacterial species boundary does exist and can be detected and measured using genomic tools we developed (ref 7).

In addition to the 12 papers we formally published this year, the section has submitted 12 preprints to bioRxiv that are currently undergoing peer review. These preprints include some of the genomes mentioned above as well as new methods for comparative genomics, genome assembly, and metagenomics.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Investigator-Initiated Intramural Research Projects (ZIA)
Project #: 1ZIAHG200398-04
Application #: 10022463
Support Year: 4
Fiscal Year: 2019

Institution

Name: National Human Genome Research Institute

Related projects


NIH 2019 ZIA HG	Single-molecule sequence assembly and analysis Phillippy, Adam / National Human Genome Research Institute
NIH 2018 ZIA HG	Single-molecule sequence assembly and analysis Phillippy, Adam / Human Genome Research
NIH 2017 ZIA HG	Single-molecule sequence assembly and analysis Phillippy, Adam / Human Genome Research
NIH 2016 ZIA HG	Single-molecule sequence assembly and analysis Phillippy, Adam / Human Genome Research

Publications

Jain, Chirag; Dilthey, Alexander; Koren, Sergey et al. (2018) A Fast Approximate Algorithm for Mapping Long Reads to Large Reference Databases. J Comput Biol 25:766-779

Kim, Jung-Hyun; Dilthey, Alexander T; Nagaraja, Ramaiah et al. (2018) Variation in human chromosome 21 ribosomal RNA genes characterized by TAR cloning and long-read sequencing. Nucleic Acids Res 46:6712-6725

Miller, Jason R; Koren, Sergey; Dilley, Kari A et al. (2018) Analysis of the Aedes albopictus C6/36 genome provides insight into cell line utility for viral propagation. Gigascience 7:1-13

Marçais, Guillaume; Delcher, Arthur L; Phillippy, Adam M et al. (2018) MUMmer4: A fast and versatile genome alignment system. PLoS Comput Biol 14:e1005944

Jain, Miten; Koren, Sergey; Miga, Karen H et al. (2018) Nanopore sequencing and assembly of a human genome with ultra-long reads. Nat Biotechnol 36:338-345

Koren, Sergey; Walenz, Brian P; Berlin, Konstantin et al. (2017) Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Res 27:722-736

Venkateswaran, Kasthuri; Checinska Sielaff, Aleksandra; Ratnayake, Shashikala et al. (2017) Draft Genome Sequences from a Novel Clade of Bacillus cereus Sensu Lato Strains, Isolated from the International Space Station. Genome Announc 5:

Schwartz, John C; Gibson, Mark S; Heimeier, Dorothea et al. (2017) The evolution of the natural killer complex; a comparison between mammals using new high-quality genome assemblies and targeted annotation. Immunogenetics 69:255-269

Bickhart, Derek M; Rosen, Benjamin D; Koren, Sergey et al. (2017) Single-molecule sequencing and chromatin conformation capture enable de novo reference assembly of the domestic goat genome. Nat Genet 49:643-650

Phillippy, Adam M (2017) New advances in sequence assembly. Genome Res 27:xi-xiii

Showing the most recent 10 out of 18 publications
