This project aims to improve the efficiency and accuracy of StringTie, a computational method for transcript identification and quantification. Owing to its unparalleled speed and accuracy, StringTie has become one of the leading tools in the field and has a rapidly growing user base. Identifying the transcripts expressed by a cell is a critical step in studying cell development, disease, the response to infection, specific gene pathways, and much more. By producing better models and expression levels for genes and transcripts, StringTie will therefore have an impact on many areas of research that study biological diversity. The software developed during this project will be made available under an open-source license, enhancing the research infrastructure of the US by enabling other scientists investigating similar research topics to reuse the code base broadly.
The most significant outcome of this project will extend StringTie's usability to a larger community of scientists interested in eukaryotic gene annotation through the addition of a de novo assembly method that combines genome assembly technology with sequencing coverage information and optimization techniques. By solving a maximum flow problem on a splicing graph built directly from uniquely assembled reads, this new assembly method has the potential to reduce the false positives typically associated with methods that use a de Bruijn graph approach. Two additional features will further improve the accuracy of transcriptome assembly: one will enable StringTie to efficiently handle the long reads typically produced by third-generation sequencing technologies, and the other will incorporate annotation of open reading frames as information describing the assembled transcripts. Both additions have the potential to significantly improve the transcript structures inferred from short-read RNA-sequencing data. The results of this project will be disseminated via scientific publications and the StringTie website: http://ccb.jhu.edu/software/stringtie.
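The maximum flow formulation mentioned above can be illustrated with a toy example. The sketch below is not StringTie's implementation, whose details are not given here; it is a generic Edmonds-Karp maximum flow routine applied to a hypothetical splicing graph. The node names (`exon1`, `exon2`, `exon3`) and the coverage values used as edge capacities are assumptions made purely for illustration.

```python
from collections import deque


def max_flow(capacity, source, sink):
    """Generic Edmonds-Karp maximum flow on a dict-of-dicts capacity graph."""
    # Build residual capacities, adding zero-capacity reverse edges.
    residual = {source: {}, sink: {}}
    for u, edges in capacity.items():
        for v, cap in edges.items():
            residual.setdefault(u, {})
            residual.setdefault(v, {})
            residual[u][v] = residual[u].get(v, 0) + cap
            residual[v].setdefault(u, 0)

    total = 0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return total  # no augmenting path is left

        # Find the bottleneck capacity along the path.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]

        # Push the bottleneck flow along the path.
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        total += bottleneck


# Hypothetical splicing graph: nodes are exons, capacities are read coverage.
splice_graph = {
    "source": {"exon1": 30},
    "exon1": {"exon2": 20, "exon3": 10},  # exon1->exon3 skips exon2
    "exon2": {"exon3": 20},
    "exon3": {"sink": 25},
}

flow = max_flow(splice_graph, "source", "sink")
print(flow)
```

In this toy graph the flow is limited by the coverage leaving `exon3`, so the total flow equals that edge's capacity. A transcript assembler built on this idea would interpret flow along a path as the expression supported for the corresponding transcript, but how coverage is turned into capacities is a design choice this sketch does not address.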
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.