Cost-effective and accurate sequencing of RNA of any length, composed of both canonical and modified bases, without conversion to cDNA and without amplification, is the objective of this project. The ultimate goal is to sequence the transcriptome and to determine, in a time-sensitive manner, the relative distribution of its components. Such an accomplishment will directly impact the prevention, diagnosis, and cure of disease, and materialize the promise of personalized medicine. Current methods, such as Illumina's RNA-Seq and the single-molecule approaches of Pacific Biosciences and Oxford Nanopore Technologies (ONT), still lag behind in many of the critical attributes mentioned above. Nanopore-based sequencing has made remarkable strides in the last few years, and its ability to provide results in the field cannot be overstated. The unresolved issue with nanopore-based sequencing is that the ion current vs. time recording does not report on a single nucleobase, but on a short sequence of 4 to 5 bases. The problem, partially resolved by sophisticated algorithms and matching to a reference, appears intractable for de novo sequencing of RNA, which is known to include numerous post-transcriptional base modifications. As an example, for a nanopore that senses a sequence of 4 bases and a specific RNA with a total of 6 different nucleobases (4 canonical and 2 modified), sequencing will require discriminating 6^4 = 1296 distinct signals within an ion current range of 70 to 150 pA with a standard deviation of 2 pA, a practically impossible computational task. In contrast, if the nanopore could sense, say, two bases at a time and yield a distinct ion current for each doublet, there would be only 6^2 = 36 different signals to distinguish, a computationally simple task.
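The counting argument above rests on a simple rule: a pore that senses k bases at a time, reading RNA built from n distinguishable nucleobases, can in principle produce up to n^k distinct signals. A minimal Python sketch of that count follows. The function names and the single-letter codes "X" and "Y" for the two modified bases are illustrative assumptions, not part of the proposal.

```python
from itertools import product


def count_kmer_signals(alphabet_size: int, window: int) -> int:
    """Upper bound on distinct signals: one per possible k-mer (n ** k)."""
    if alphabet_size < 1 or window < 1:
        raise ValueError("alphabet_size and window must be positive")
    return alphabet_size ** window


def enumerate_kmers(alphabet, window):
    """List every k-mer explicitly; useful as a cross-check on the count."""
    return ["".join(p) for p in product(alphabet, repeat=window)]


# Hypothetical alphabet: 4 canonical bases plus 2 modified bases,
# written here as placeholder codes "X" and "Y".
ALPHABET = ["A", "C", "G", "U", "X", "Y"]

if __name__ == "__main__":
    for window in (4, 2):
        n_signals = count_kmer_signals(len(ALPHABET), window)
        # The explicit enumeration must agree with the closed-form count.
        assert n_signals == len(enumerate_kmers(ALPHABET, window))
        print(f"{window}-base sensing, {len(ALPHABET)} bases: {n_signals} signals")
```

Because the count grows exponentially with the sensing window, shrinking the window from four bases to two reduces the number of signals far more than removing a modified base from the alphabet would.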
Published results, obtained in collaboration with Northeastern University in Boston and the University of Utah in Salt Lake City, clearly indicate that oligodeoxynucleotides tagged with a pyrimidine-specific label (osmium tetroxide 2,2'-bipyridine (OsBp)) yield enzyme-free, slow and readable translocation via a solid-state pore as well as via a protein pore, α-Hemolysin. Distinct ion current levels were observed for intact, dT(OsBp), and dC(OsBp) bases, suggesting that a single tag can yield sequencing information on deoxypurine, dT, and dC. Preliminary results at Yenos, under a Phase I R43 SBIR 2018 grant, using synthetic RNAs and the MinION device from ONT, extend the above findings to RNA and suggest a two-nucleobase sensing regime for RNA(OsBp). Here the proposition is made that the presence of a second, guanosine-selective label will facilitate identification of all four canonical bases and, likely, extend identification to many modified RNA bases. Most importantly, the intrinsic selectivity of a label for one base over another will provide a handle for additional discrimination among the modified bases. Labeling is inexpensive, takes a couple of hours at room temperature, requires only simple mixing of the sample with the label followed by a 5-minute validated purification step, and could be accomplished using a kit. In this Phase I proposal we aim to demonstrate:
• Near 100% labeling (true positives), practically negligible internucleotide bond cleavage, and negligible false positives for RNA that is osmylated at the pyrimidines as well as platinated at guanosine (RNA-OsBp/Pt).
• Readable translocation via the MinION, or an alternative nanopore platform, with a two-base sensing regime for RNA-OsBp/Pt.
• Profiling/sequencing of 20 to 70 nucleotide long synthetic RNAs in a mixture, as a proof-of-principle for a miRNA profile assay and for the extension of the technology to medium-size RNAs.
Success in these efforts may lead to the development of an inexpensive, non-invasive medical diagnostic test for a broad range of disease and well-being conditions, i.e., a miRNA profile assay from biological fluids. A nanopore-based miRNA panel assay will also pave the way for sequencing the transcriptome, including identification of post-transcriptionally modified bases, without the need for sample amplification, sample library preparation, or cDNA synthesis.
Advances in personalized medicine for the diagnosis and treatment of disease require sequencing the RNA transcriptome with technologies that are cheap, fast, and accurate. Current technologies have made big strides, but still lag behind in accuracy and/or expense, take weeks from the time of sample collection to medical results, and leave post-transcriptionally modified RNA bases unidentified. Nanopore-based platforms/devices that exhibit two-base sensing, like the one addressed in this proposal, will improve sequencing of the transcriptome by simplifying sample processing, by identifying all the nucleobases, canonical or modified, by including all RNAs, i.e., short, long, and ones with no poly(A) tail, and by improving accuracy from the current 85% to over 90%.