Current methods for RNA-seq library preparation attempt to sample every mRNA molecule uniformly along its length, ideally with sufficient overlap between reads to allow de novo reassembly of the mRNA sequences from which they derive or, alternatively, to allow the mRNA sequence to be inferred by alignment to reference sequences. Genes that encode mRNAs in multiple isoforms present a challenge: even given a complete set of short sequence reads spanning every exon and splice junction, certain alternative isoform models cannot be deconvoluted from data of this nature. This confounding situation arises when more than one isoform model can explain the observed frequencies of exon and junction reads, and it is mathematically unavoidable: short sequence reads do not contain the information needed to unambiguously identify the correct isoform model for certain common splicing patterns.

We propose to test a method that preserves the information needed to reconstruct isoform models. In this method, we generate a small barcoded collection of overlapping sequence reads for each individual mRNA molecule, such that all reads from the same mRNA molecule carry the same barcode, while other transcripts from the same gene are each associated with a different molecule-specific barcode. Contigs are assembled by aligning the gene-derived sequences that share a barcode. Barcoding will be done by random-primed synthesis of cDNA in an emulsion format, using beads that each carry random primers flanked by a bead-specific barcode. A novelty of this proposal is a method for generating a bead library in which each bead carries only one barcode but the overall complexity of barcodes across the library is very high. These beads are used to generate barcoded random primers in emulsion droplets that also contain cDNA, such that the multiple randomly primed products in a droplet all contain the same barcode.

In principle, this method produces molecule-specific collections of barcoded cDNAs which, upon high-throughput sequencing, can be aligned to reveal the structural details of mRNA isoforms on a molecule-by-molecule basis. This approach would solve the isoform model identifiability problem.
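
The identifiability problem can be illustrated with a toy calculation. The sketch below is not part of the proposed method. It assumes a hypothetical five-exon gene in which exons E2 and E4 are independently skipped, and it assumes that a short read can report at most one exon or one exon-exon junction. The exon names, molecule counts, and function names are invented for illustration.

```python
from collections import Counter

def features(isoform):
    """Return the exons and adjacent-exon junctions of one isoform.

    Assumption: these are the only features a short read can observe.
    """
    exons = list(isoform)
    junctions = list(zip(isoform, isoform[1:]))
    return exons + junctions

def expected_profile(mixture):
    """Sum exon and junction coverage over a mixture of isoforms.

    `mixture` maps an isoform (a tuple of exon names) to a molecule count.
    """
    profile = Counter()
    for isoform, molecules in mixture.items():
        for feature in features(isoform):
            profile[feature] += molecules
    return profile

# Four hypothetical isoforms of a five-exon gene.
both_included = ("E1", "E2", "E3", "E4", "E5")
both_skipped = ("E1", "E3", "E5")
only_e2 = ("E1", "E2", "E3", "E5")
only_e4 = ("E1", "E3", "E4", "E5")

# Two different isoform populations, 100 molecules each.
mixture_a = {both_included: 50, both_skipped: 50}
mixture_b = {only_e2: 50, only_e4: 50}

# The populations differ, yet every exon and junction count is identical.
profiles_match = expected_profile(mixture_a) == expected_profile(mixture_b)
print(profiles_match)
```

Under these assumptions, the two populations share no isoform, yet they produce the same exon and junction profile. Coverage data of this kind therefore cannot distinguish them.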

Public Health Relevance

High-throughput sequencing methods that generate short sequence reads cannot reliably describe or quantify the different mRNA isoforms that arise through alternative splicing of genes with five or more exons. The method we propose solves this problem by tagging every subsequence derived from a single mRNA molecule with the same molecule-specific barcode.
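
As a minimal sketch of how shared barcodes restore molecule-level structure, the example below groups reads by barcode and collects the exons observed for each molecule. It makes several assumptions: the reads are already aligned and reduced to the exon numbers they cover, no two molecules share a barcode, and each molecule is fully covered. The barcode strings, read records, and function name are hypothetical.

```python
from collections import Counter, defaultdict

def isoforms_by_molecule(reads):
    """Group reads by barcode and return the exon chain seen per molecule.

    `reads` is an iterable of (barcode, exons_covered) pairs, where
    `exons_covered` holds the integer exon numbers a read aligns to.
    """
    exons_seen = defaultdict(set)
    for barcode, exons in reads:
        exons_seen[barcode].update(exons)
    return {bc: tuple(sorted(ex)) for bc, ex in exons_seen.items()}

# Hypothetical aligned reads from two molecules of the same gene.
reads = [
    ("AAGT", (1, 2)), ("AAGT", (2, 3)), ("AAGT", (3, 4)), ("AAGT", (4, 5)),
    ("CCTA", (1, 3)), ("CCTA", (3, 5)),
]

molecules = isoforms_by_molecule(reads)
isoform_counts = Counter(molecules.values())
print(molecules)
print(isoform_counts)
```

Because each exon chain is tied to one barcode, isoforms can be counted molecule by molecule instead of being inferred from pooled exon and junction frequencies.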

Agency
National Institutes of Health (NIH)
Institute
National Institute of General Medical Sciences (NIGMS)
Type
Small Business Innovation Research Grants (SBIR) - Phase I (R43)
Project #
1R43GM130245-01
Application #
9622969
Study Section
Special Emphasis Panel (ZRG1)
Program Officer
Ravichandran, Veerasamy
Project Start
2018-09-01
Project End
2019-02-28
Budget Start
2018-09-01
Budget End
2019-02-28
Support Year
1
Fiscal Year
2018
Total Cost
Indirect Cost
Name
Vlp Biotech, Inc.
Department
Type
DUNS #
160242579
City
San Diego
State
CA
Country
United States
Zip Code
92121