Methods for in-depth characterization of transcriptomes and quantification of transcript levels have emerged as valuable tools for understanding cellular physiology and human disease biology, and are beginning to be used in various clinical diagnostic applications. Current methods, however, typically require RNA to be converted to cDNA before measurement, a step that has been shown to introduce many biases and artifacts. To best characterize the "true" transcriptome, we propose the application of single-molecule true Direct RNA Sequencing (tDRS), in which RNA is sequenced without prior conversion to cDNA. The benefits of tDRS include the ability to use minute quantities of RNA (e.g., on the order of picograms) with minimal or no sample manipulation, the ability to analyze short RNAs, which pose unique challenges for cDNA-based approaches, and the ability to perform these analyses in a low-cost, high-throughput manner. This application proposes strategies to adapt tDRS to multiple transcriptome analysis methods routinely used by the research and medical communities. Combined with strategies for incremental improvements in read length, throughput and error rate, this work will make substantial progress toward the ultimate goal of a bias-free view of transcriptomes. We will apply this technology to diverse RNA samples, including those from historical formalin-fixed, paraffin-embedded (FFPE) tissue specimens, leading to revolutionary advances in the understanding of biological and disease pathways. Our research plan brings together Helicos researchers and world experts in RNA biology, genomics and medicine to achieve this goal.
Our specific aims are:
Specific Aim 1. Develop transcriptome analysis methods and tools for single-molecule Direct RNA Sequencing of standard samples. Samples, including yeast and ENCODE cell line RNAs, will be analyzed with unprecedented detection and quantitation performance. We will develop: 1) prep-free gene expression, 2) whole transcriptome Direct RNA Sequencing, and 3) paired-end/paired-read Direct RNA Sequencing.
Specific Aim 2. Develop tools to process RNA from FFPE tissue samples with single-molecule Direct RNA Sequencing, and apply these tools to sequence influenza virus genomes from circa 1918. Short RNA species, including the fragmented RNA typical of FFPE tissue, cannot be satisfactorily analyzed with cDNA-based methodologies but can be analyzed with tDRS. Our goals include: 1) tailing and sequencing minute quantities of FFPE tissue RNA, 2) developing FFPE tissue RNA quantification methods, 3) reducing rRNA in fragmented RNA samples, and 4) analyzing the sequences of archival influenza virus genomes from circa 1918 autopsy tissues.
Specific Aim 3. Optimize chemical and enzymatic reagents for single-molecule Direct RNA Sequencing. Short-term and long-term improvements in tDRS performance will enable longer read lengths, higher throughput and lower error rates. We will improve tDRS by: 1) screening and synthesizing modified nucleotides for superior performance, and 2) screening and mutagenizing polymerases for optimal behavior.

Public Health Relevance

In 2003, the Human Genome Project (HGP) announced an essentially complete sequence of the human genome. This unprecedented scientific achievement was the result of 13 years of effort by an international coalition of scientists and some $3 billion in U.S. government funding. The project has facilitated research by providing a reference framework for the genome, which is now being used to investigate the biological mechanisms underlying human disease. The HGP was followed closely by the ENCODE Consortium, which characterized the genomic architecture of a fraction of the human genome, revealing for the first time the dynamic state and unexpected complexity of the transcriptome. Technological advances now enable sequencing of genomes and studies of the transcriptome at a fraction of the time and cost. Yet transcriptome studies to date have relied on indirect measurements obtained by sequencing complementary DNA (cDNA). Conversion of RNA into cDNA and the subsequent sample manipulation steps have been shown to introduce many biases and artifacts. This groundbreaking project is designed to improve Helicos' single-molecule true Direct RNA Sequencing technology, a new technology that allows unprecedented characterization of transcriptomes and quantification of RNA transcript levels that more closely reflect the true biology of the cell. The proposed research extends this technology to gene expression and whole transcriptome analyses and to the analysis of RNA obtained from formalin-fixed, paraffin-embedded tissues, which offers new opportunities in cancer research. The ability to use low-picogram quantities of RNA will enable research into rare and limiting cell types, including circulating tumor cells, without the complications of sample manipulation and amplification steps. It will also allow archival samples to be analyzed with unprecedented efficiency, which will be particularly useful for understanding the pathogenesis of the 1918 influenza virus.
This high-resolution, unbiased view of the transcriptome will open new research avenues, lead to a better understanding of the biological mechanisms underlying disease states such as cancer, heart disease and diabetes, and ultimately aid in identifying revolutionary new ways to diagnose, treat and prevent human disease.

National Institutes of Health (NIH)
National Human Genome Research Institute (NHGRI)
Research Project (R01)
Study Section: Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer: Schloss, Jeffery
Helicos BioSciences Corporation
United States