The objective of this project is cost-effective and accurate sequencing of RNA of any length, composed of both canonical and modified bases, without conversion to cDNA and without amplification. The ultimate goal is to sequence the transcriptome and to determine, in a time-sensitive manner, the relative distribution of its components. Such an accomplishment will directly impact the prevention, diagnosis, and cure of disease and materialize the promise of personalized medicine. Current methods, such as Illumina's RNA-Seq and the single-molecule approaches of Pacific Biosciences and Oxford Nanopore Technologies, still lag behind in many of the critical attributes mentioned above. The unresolved issue with nanopore-based sequencing is that each level in the ion current vs. time recording does not correspond to a single nucleobase, but to a short sequence of 4 or more bases. The problem, partially resolved with sophisticated algorithms and machine learning, appears intractable for RNA that includes numerous post-transcriptional base modifications. As an illustration, if a nanopore reads a sequence of 4 bases at a time and the RNA to be sequenced contains 8 different nucleobases (4 canonical and 4 modified), then 8^4 = 4,096 signals need to be discriminated within an ion current range of 20 to 40 pA with a standard deviation of 1 pA; this is an impossible computational task. However, if the nanopore could sense one base at a time and yield a distinct ion current for each base, there would be only 8 different signals to distinguish, a much simpler task. Our own published results indicate that oligodeoxynucleotides conjugated with a pyrimidine-specific tag (Osmium tetroxide 2,2'-bipyrimidine, or OsBp) yield enzyme-free, slow and readable translocation via α-Hemolysin, and distinct ion current levels for intact, T(OsBp), and C(OsBp) bases, suggesting that a single tag can yield sequencing information on purine, T, and C.
The latter leads to the conjecture that a second, purine-specific label would allow identification of all four canonical bases. Furthermore, each tag has intrinsic selectivity for one base over another, which will provide a handle for additional discrimination among the modified bases. In this Phase I proposal we aim to demonstrate (i) near 100% labeling (true positives) with 0% internucleotide bond cleavage and 0% false positives for RNA(OsBp), as we have already shown for DNA(OsBp), (ii) comparable labeling attributes for a purine-specific tag, and (iii) readable translocation with single pyrimidine base discrimination for RNA(OsBp). Success in these efforts will lead to single-base discrimination and sequencing of RNA, including a number of post-transcriptionally modified bases, and pave the way for sequencing the transcriptome.
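The signal-count illustration in the abstract can be checked with a short calculation. The sketch below is only illustrative: the function names are hypothetical, and the assumption that current levels are spread evenly across the 20 to 40 pA range is a simplification introduced here, not a claim from the proposal. It counts the distinct sequences a pore must resolve when it senses a window of several bases, and compares the average spacing between levels with the 1 pA standard deviation.

```python
def kmer_signal_count(alphabet_size: int, window: int) -> int:
    """Number of distinct base sequences seen by a pore that reads
    `window` bases at a time from `alphabet_size` possible nucleobases."""
    return alphabet_size ** window


def mean_level_spacing_pa(n_levels: int,
                          low_pa: float = 20.0,
                          high_pa: float = 40.0) -> float:
    """Average gap between adjacent current levels, ASSUMING the levels
    are spread evenly over [low_pa, high_pa] (a simplification)."""
    if n_levels < 2:
        raise ValueError("need at least two levels to define a spacing")
    return (high_pa - low_pa) / (n_levels - 1)


if __name__ == "__main__":
    noise_sd_pa = 1.0  # standard deviation quoted in the abstract
    # (4-base window, 8 nucleobases) versus (single base, 8 nucleobases)
    for window in (4, 1):
        n = kmer_signal_count(alphabet_size=8, window=window)
        spacing = mean_level_spacing_pa(n)
        print(f"window={window}: {n} levels, "
              f"mean spacing {spacing:.4f} pA, "
              f"spacing/SD = {spacing / noise_sd_pa:.4f}")
```

Under these assumptions, a 4-base window over 8 nucleobases gives 8^4 = 4,096 levels spaced far more closely than the 1 pA noise, whereas single-base sensing gives 8 levels spaced several times wider than the noise.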
Advances in personalized medicine for the diagnosis and treatment of disease require sequencing the RNA transcriptome with technologies that are currently unavailable. Nanopore systems that exhibit single-base discrimination, like the one addressed in this proposal, will allow the transcriptome to be sequenced in an accurate, timely, and cost-effective manner.