This project aims to develop an efficient and accurate new computational method for identifying novel transcripts and their expression levels. Transcriptome assembly and gene expression profiling are key components in a vast range of biological experiments today, playing a central role in unraveling the complexity of cell types, cell differentiation, responses to stress, and myriad other conditions. Although transcript assemblers have been developed previously, most of them perform poorly on real, large-scale RNA sequencing data sets, severely limiting their impact. To produce better transcript models, an innovative new method will be developed, combining ideas from several scientific disciplines. By ensuring that this method works on the very large data sets that are routinely produced by modern next-generation sequencing instruments, this project will have an impact on a wide range of studies across the spectrum of eukaryotic species. It will also enhance the research infrastructure by providing free, open-source software that can be reused by other scientists for commercial, educational, or basic research endeavors.

This new method uses an optimization technique known as maximum flow in a specially constructed flow network to determine gene expression levels, and it does this while simultaneously assembling each splice variant of a gene. It also incorporates techniques from whole-genome assembly, which have the potential to dramatically improve detection of alternative splice variants. By using pre-assembled reads, the computational load and memory requirements associated with transcriptome assembly will be greatly reduced, as many of the short reads will be combined into longer contigs that span multiple exons. Furthermore, the new method will address a critical need for a transcriptome assembly method that is able to handle the numerous gaps present in draft genomes, and to produce better-assembled transcripts by stitching together portions of transcripts situated on multiple fragments of the genome. The results of this project will be disseminated at http://ccb.jhu.edu.
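To make the maximum-flow idea concrete, the following is a minimal sketch in Python, not the project's actual network construction, which the abstract does not specify. It assumes a toy splice graph in which nodes are exons, edges are splice junctions, and edge capacities are hypothetical read-coverage values; the function name `max_flow`, the node labels, and all numbers are illustrative assumptions. The solver is a standard Edmonds-Karp implementation (shortest augmenting paths found by breadth-first search).

```python
from collections import defaultdict, deque


def max_flow(capacity, source, sink):
    """Edmonds-Karp maximum flow; returns (flow value, per-edge flow)."""
    # Residual capacities, with a zero-capacity reverse edge for each edge.
    residual = defaultdict(lambda: defaultdict(int))
    for u, nbrs in capacity.items():
        for v, c in nbrs.items():
            residual[u][v] += c
            residual[v][u] += 0

    total = 0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, c in residual[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            break  # no augmenting path left, so the flow is maximum

        # Bottleneck capacity along the path found.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]

        # Push the bottleneck amount along the path.
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        total += bottleneck

    # Flow on each original edge = original capacity - remaining residual.
    flow = {(u, v): c - residual[u][v]
            for u, nbrs in capacity.items() for v, c in nbrs.items()}
    return total, flow


# Hypothetical splice graph: exon e2 is either included or skipped.
# Capacities stand in for read coverage on each junction (assumed values).
toy_graph = {
    "s": {"e1": 10},
    "e1": {"e2": 6, "e3": 4},   # e1->e3 is the exon-skipping junction
    "e2": {"e3": 6},
    "e3": {"t": 10},
}

total, flow = max_flow(toy_graph, "s", "t")
print(total)               # 10
print(flow[("e1", "e2")])  # 6, flow through the variant that includes e2
print(flow[("e1", "e3")])  # 4, flow through the variant that skips e2
```

In this toy setting, the flow routed along each source-to-sink path can be read as a rough abundance for the corresponding splice variant (here, 6 units for the variant e1-e2-e3 and 4 for e1-e3). How a real network would be built from pre-assembled reads and draft-genome fragments is beyond what this sketch attempts.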

Agency: National Science Foundation (NSF)
Institute: Division of Biological Infrastructure (DBI)
Type: Standard Grant (Standard)
Application #: 1458178
Program Officer: Peter McCartney
Project Start:
Project End:
Budget Start: 2015-06-01
Budget End: 2019-05-31
Support Year:
Fiscal Year: 2014
Total Cost: $662,839
Indirect Cost:
Name: Johns Hopkins University
Department:
Type:
DUNS #:
City: Baltimore
State: MD
Country: United States
Zip Code: 21218