This project aims to improve the efficiency and accuracy of StringTie, a computational method for transcript identification and quantification. Owing to its unparalleled speed and accuracy, StringTie has become one of the leading tools in the field and has a rapidly growing user base. Identifying the transcripts expressed by a cell is a critical step in studying cell development, disease, the response to infection, specific gene pathways, and much more. By producing more accurate gene and transcript models and expression estimates, StringTie will therefore benefit many areas of research that study biological diversity. The software developed during this project will be made available under an open-source license, thereby enhancing the research infrastructure of the United States by enabling broad reuse of the code base by other scientists investigating similar research topics.

The most significant outcome of this project will be to extend StringTie's usability to a larger community of scientists interested in eukaryotic gene annotation through the addition of a de novo assembly method that combines genome assembly technology with sequencing coverage information and optimization techniques. Because it solves a maximum flow problem on a splicing graph built directly from uniquely assembled reads, this new assembly method has the potential to reduce the false positives typically associated with de Bruijn graph approaches. Two additional features will further improve the accuracy of transcriptome assembly: one will enable StringTie to efficiently handle the long reads typically produced by third-generation sequencing technologies, and the other will incorporate open reading frame annotation as information describing the assembled transcripts. Both additions have the potential to significantly improve the transcript structures inferred from short-read RNA-sequencing data. The results of this project will be disseminated via scientific publications and the StringTie website: http://ccb.jhu.edu/software/stringtie.
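To make the phrase "solving a maximum flow problem on a splicing graph" concrete, the sketch below runs a generic Edmonds-Karp maximum-flow computation on a small, hypothetical splicing graph. It is an illustration only, not StringTie's implementation: the node names (`source`, `exon1`, ..., `sink`), the coverage numbers used as edge capacities, and the function name `max_flow` are all assumptions made for this example. Exons are nodes, splice junctions are edges, and each edge's capacity stands in for the read coverage supporting that junction. The resulting flow value is the largest amount of coverage that can be routed along source-to-sink paths (candidate transcripts) without exceeding any edge's support.

```python
from collections import deque


def max_flow(capacity, source, sink):
    """Edmonds-Karp maximum flow: repeatedly augment along shortest (BFS) paths.

    `capacity` maps each node to a dict {neighbor: capacity}. The input is not
    modified; a separate residual graph is built internally.
    """
    # Residual graph: copy forward capacities, then add zero-capacity reverse edges.
    residual = {u: dict(vs) for u, vs in capacity.items()}
    residual.setdefault(source, {})
    for u, vs in capacity.items():
        for v in vs:
            residual.setdefault(v, {}).setdefault(u, 0)

    total = 0
    while True:
        # Breadth-first search for a source-to-sink path with spare capacity.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return total  # no augmenting path left: flow is maximal

        # Bottleneck = smallest residual capacity along the path found.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            u = parent[v]
            bottleneck = min(bottleneck, residual[u][v])
            v = u

        # Push the bottleneck amount along the path and update reverse edges.
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        total += bottleneck


# Hypothetical splicing graph; capacities are made-up junction read coverages.
splice_graph = {
    "source": {"exon1": 10},
    "exon1": {"exon2": 6, "exon3": 4},  # exon1 -> exon3 models an exon-skipping junction
    "exon2": {"exon3": 5},
    "exon3": {"sink": 9},
}

print(max_flow(splice_graph, "source", "sink"))  # prints 9
```

Here the flow is capped at 9 by the `exon3 -> sink` edge; it is split between the exon-skipping path (4) and the path through `exon2` (5). Edmonds-Karp is used only because it is short and dependency-free; a library routine such as NetworkX's `maximum_flow` would serve equally well for experimentation.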

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Agency: National Science Foundation (NSF)
Institute: Division of Biological Infrastructure (DBI)
Type: Standard Grant (Standard)
Application #: 1759518
Program Officer: Peter McCartney
Budget Start: 2018-07-15
Budget End: 2021-06-30
Fiscal Year: 2017
Total Cost: $993,407

Name: Johns Hopkins University
City: Baltimore
State: MD
Country: United States
Zip Code: 21218