Novel algorithm development, user support and maintenance for STAR

Dobin, Alexander

Abstract

Sequencing of transcribed RNA molecules (RNA-seq) is an invaluable tool for studying cell transcriptomes at high resolution and depth. STAR is a popular RNA-seq analysis suite that combines high accuracy and ultra-fast mapping speed with a rich collection of built-in features and tools. STAR is used by hundreds of researchers, including several major consortia and institutions. We propose to significantly enhance and expand STAR capabilities in the following important areas. 1. Develop novel algorithms and tools integrated directly into STAR. RNA-seq analyses require combining multiple tools into "processing pipelines", which is a demanding task owing to bottlenecks and compatibility issues.
We aim to overcome these impediments by integrating novel tools directly into the STAR software: (i) mapping of RNA-seq reads to personal genomes, utilizing genotype information to produce more accurate allele-aware alignments and thus increasing the precision of personal genomics analyses; (ii) mapping of long RNA reads from emerging sequencing technologies such as PacBio and Oxford Nanopore. 2. Increase accuracy and speed of the core mapping algorithm. New applications, such as personal genomics, require significant improvements in mapping accuracy. We will enhance the STAR mapping algorithm with (i) spliced seed extension through mismatches/indels; and (ii) limited local alignment of the read ends. The tremendous increase in sequencing throughput has put a significant emphasis on the efficiency of computational algorithms. To keep up with the increasing sequencing throughput, we will boost the STAR algorithm with (i) vectorization of query-text comparisons using SIMD/SSE instructions; (ii) dynamic programming for seed stitching. The improvements in accuracy and speed will be validated on both simulated and real RNA-seq data. Mapping accuracy depends strongly on choosing the best mapping parameters for a particular dataset. We will devise automated parameter optimization procedures to eliminate guesswork in parameter selection. 3. Enhance user-friendliness, user support/education, and software maintenance. User-friendliness is crucial for making bioinformatics software useful to the broadest audience.
We aim to significantly enhance users' experience by developing STAR web user interfaces for both pre-run data input and post-run exploration of results. To enable STAR analysis in the cloud, we will create STAR virtual machines on the popular Amazon and Google cloud computing services, and develop Hadoop-based tools for distributed processing of big datasets. We will also expand user support and education, and continue to implement user-requested features and debug user-reported issues.
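
As an illustration of the "dynamic programming for seed stitching" item in aim 2, the following is a minimal sketch of chaining collinear seeds with a simple O(n^2) dynamic program. It is not STAR's actual implementation: the seed representation, the function name `stitch_seeds`, and the scoring (matched bases minus a per-base penalty for unaligned read bases between seeds) are assumptions made only for this example.

```python
from typing import List, Tuple

# Hypothetical seed representation: (read_start, genome_start, length).
Seed = Tuple[int, int, int]


def stitch_seeds(seeds: List[Seed], gap_penalty: float = 0.1) -> Tuple[float, List[Seed]]:
    """Return (score, chain) for the best-scoring collinear chain of seeds.

    Toy scoring (an assumption, not STAR's scoring scheme): each seed
    contributes its length; every unaligned read base between two
    consecutive seeds costs `gap_penalty`. Any non-negative genome gap
    is allowed, so a chain may span an intron.
    """
    if not seeds:
        return 0.0, []

    seeds = sorted(seeds)                    # order by read position
    n = len(seeds)
    best = [float(s[2]) for s in seeds]      # best chain score ending at seed j
    prev = [-1] * n                          # back-pointer for traceback

    for j in range(n):
        rj, gj, lj = seeds[j]
        for i in range(j):
            ri, gi, li = seeds[i]
            read_gap = rj - (ri + li)
            genome_gap = gj - (gi + li)
            # Seeds must be collinear and non-overlapping in read and genome.
            if read_gap < 0 or genome_gap < 0:
                continue
            candidate = best[i] + lj - gap_penalty * read_gap
            if candidate > best[j]:
                best[j] = candidate
                prev[j] = i

    # Trace back from the highest-scoring end seed.
    end = max(range(n), key=lambda k: best[k])
    chain = []
    k = end
    while k != -1:
        chain.append(seeds[k])
        k = prev[k]
    chain.reverse()
    return best[end], chain


if __name__ == "__main__":
    example = [(0, 100, 20), (20, 5120, 30), (10, 9000, 15), (55, 5155, 20)]
    score, chain = stitch_seeds(example)
    print(score, chain)
```

In this example the seed at genome position 9000 is left out because it cannot be placed collinearly with the others, while the jump from position 120 to 5120 is accepted as a splice-like gap. A production aligner would score genome gaps, mismatches, and splice-junction motifs explicitly.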

Public Health Relevance

Sequencing of transcribed RNA molecules (RNA-seq) provides invaluable insight into gene expression and function, which directly affect clinically important processes such as development, disease susceptibility, and therapy/drug responses. The goal of this project is to significantly enhance the capabilities of our RNA-seq analysis suite STAR, turning it into a one-stop solution for the majority of RNA-seq analyses. These enhancements, in conjunction with continued user support and software maintenance, will benefit hundreds of medical researchers using RNA-seq to develop better diagnostics and treatments for major diseases.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Research Project (R01)
Project #: 5R01HG009318-04
Application #: 9932464
Study Section: Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer: Sen, Shurjo Kumar

Project Start: 2017-08-18
Project End: 2022-05-31
Budget Start: 2020-06-01
Budget End: 2021-05-31
Support Year: 4
Fiscal Year: 2020

Institution

Name: Cold Spring Harbor Laboratory
DUNS #: 065968786

City: Cold Spring Harbor
State: NY
Country: United States
Zip Code: 11724

Related projects


NIH 2020 R01 HG	Novel algorithm development, user support and maintenance for STAR Dobin, Alexander / Cold Spring Harbor Laboratory
NIH 2019 R01 HG	Novel algorithm development, user support and maintenance for STAR Dobin, Alexander / Cold Spring Harbor Laboratory
NIH 2018 R01 HG	Novel algorithm development, user support and maintenance for STAR Dobin, Alexander / Cold Spring Harbor Laboratory
NIH 2017 R01 HG	Novel algorithm development, user support and maintenance for STAR Dobin, Alexander / Cold Spring Harbor Laboratory
