Computational Gene Modeling and Genome Sequence Assembly

Salzberg, Steven

Abstract

New developments in DNA sequencing technology have spurred a tremendous increase in the use of sequencing to answer fundamental questions in biology and medicine. Whole-genome sequencing (WGS) is being used to study cancer, to discover disease-causing gene variants in patient genomes, and to study human genetic diversity. Numerous WGS projects are being launched for species whose genomes have not yet been sequenced. Sequencing of messenger RNA through RNA-seq has led to an explosion of projects to characterize transcribed genes in multiple cell types and in many species, and simultaneously to discover new genes and new splice variants of known genes. These sequencing-based studies generate enormous amounts of data, which in turn require sophisticated, efficient, and innovative new algorithms that will make it possible to assemble these genomes and identify their gene content. We propose to develop new cloud-computing-based assembly algorithms to assemble genomes from short reads generated by the latest sequencing technologies. In parallel, we will continue to improve our existing assemblers, extending them to handle new and diverse data types, including "3rd-generation" sequences. We will also reach out to outside groups to help them assemble novel species, modifying our software as needed and continuing to push the limits of assembly technology. One of the most exciting recent technology developments in the gene-finding arena is RNA-seq, a new protocol for capturing and sequencing the mRNA in a cell. This technique is well on its way to replacing both conventional EST sequencing as a method for capturing transcribed protein-coding genes, and microarray hybridization experiments for measuring transcript levels. We propose to develop new algorithms to take advantage of the flood of new RNA-seq data that has begun to appear.
We have already developed two new algorithms, TopHat and Cufflinks, for RNA-seq analysis, which are the first to be able to discover previously unknown splice sites and isoforms. These tools, enhanced with new features to handle a wider variety of sequence data, form the basis of our plans to develop integrated gene finders that can identify novel genes, novel isoforms of known genes, and fusion genes, and to include these methods in a genome annotation pipeline.

Public Health Relevance

Many biomedical researchers are now using large-scale DNA sequencing to study human disease and to understand human biology. The analysis of these new types of sequence data requires highly sophisticated software that can assemble millions or billions of DNA fragments to reconstruct a genome, and that can then identify genes in the assembled sequence. This project will develop new algorithms and software that will help researchers use the latest DNA sequencing technology to sequence, assemble, and find genes in human genomes as well as the genomes of many other species.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Research Project (R01)
Project #: 5R01HG006677-14
Application #: 8530261
Study Section: Biomedical Library and Informatics Review Committee (BLR)
Program Officer: Bonazzi, Vivien

Project Start: 1999-09-01
Project End: 2014-08-31
Budget Start: 2013-09-01
Budget End: 2014-08-31
Support Year: 14
Fiscal Year: 2013
Total Cost: $575,512
Indirect Cost: $192,844

Institution

Name: Johns Hopkins University
Department: Genetics
Type: Schools of Medicine
DUNS #: 001910777

City: Baltimore
State: MD
Country: United States
Zip Code: 21218

Related projects


NIH 2021 R01 HG	Computational Methods for Genome Assembly, Transcript Assembly, and Gene Discovery	Salzberg, Steven L. / Johns Hopkins University
NIH 2020 R01 HG	Computational Methods for Genome Assembly, Transcript Assembly, and Gene Discovery	Salzberg, Steven L. / Johns Hopkins University
NIH 2018 R01 HG	Computational Methods for Genome Assembly, Transcript Assembly, and Variant Discovery	Salzberg, Steven L. / Johns Hopkins University
NIH 2017 R01 HG	Computational Methods for Genome Assembly, Transcript Assembly, and Variant Discovery	Salzberg, Steven L. / Johns Hopkins University
NIH 2016 R01 HG	Computational Methods for Genome Assembly, Transcript Assembly, and Variant Discovery	Salzberg, Steven L. / Johns Hopkins University
NIH 2015 R01 HG	Computational Methods for Genome Assembly, Transcript Assembly, and Variant Discovery	Salzberg, Steven L. / Johns Hopkins University	$600,000
NIH 2013 R01 HG	Computational Gene Modeling and Genome Sequence Assembly	Salzberg, Steven L. / Johns Hopkins University	$575,512
NIH 2012 R01 HG	Computational Gene Modeling and Genome Sequence Assembly	Salzberg, Steven L. / Johns Hopkins University	$595,227
NIH 2011 R01 HG	Computational Gene Modeling and Genome Sequence Assembly	Salzberg, Steven L. / Johns Hopkins University	$712,968

Publications

Simner, Patricia J; Antar, Annukka A R; Hao, Stephanie et al. (2018) Antibiotic pressure on the acquisition and loss of antibiotic resistance genes in Klebsiella pneumoniae. J Antimicrob Chemother :

Gómez-Romero, Laura; Palacios-Flores, Kim; Reyes, José et al. (2018) Precise detection of de novo single nucleotide variants in human genomes. Proc Natl Acad Sci U S A 115:5516-5521

Li, Zhigang; Breitwieser, Florian P; Lu, Jennifer et al. (2018) Identifying Corneal Infections in Formalin-Fixed Specimens Using Next Generation Sequencing. Invest Ophthalmol Vis Sci 59:280-288

Nattestad, Maria; Goodwin, Sara; Ng, Karen et al. (2018) Complex rearrangements and oncogene amplifications revealed by long-read DNA and RNA sequencing of a breast cancer cell line. Genome Res 28:1126-1135

Salzberg, Steven L (2018) Open questions: How many genes do we have? BMC Biol 16:94

Fang, Han; Huang, Yi-Fei; Radhakrishnan, Aditya et al. (2018) Scikit-ribo Enables Accurate Estimation and Robust Modeling of Translation Dynamics at Codon Resolution. Cell Syst 6:180-191.e4

Pertea, Mihaela; Shumate, Alaina; Pertea, Geo et al. (2018) CHESS: a new human gene catalog curated from thousands of large-scale RNA sequencing experiments reveals extensive transcriptional noise. Genome Biol 19:208

El-Diwany, Ramy; Soliman, Mary; Sugawara, Sho et al. (2018) CMPK2 and BCL-G are associated with type 1 interferon-induced HIV restriction in humans. Sci Adv 4:eaat0843

Breitwieser, F P; Baker, D N; Salzberg, S L (2018) KrakenUniq: confident and fast metagenomics classification using unique k-mer counts. Genome Biol 19:198

Sedlazeck, Fritz J; Rescheneder, Philipp; Smolka, Moritz et al. (2018) Accurate detection of complex structural variations using single-molecule sequencing. Nat Methods 15:461-468

Showing the most recent 10 out of 88 publications
