While RNA-Seq experiments based on Second Generation Sequencing (SGS) short reads have enabled remarkable advances in our ability to analyze the transcriptome, a few fundamental problems remain unsolved due to the high complexity of the genome and the inability of short reads to identify combinatorial genomic events. Third Generation Sequencing (TGS), including PacBio sequencing and Oxford Nanopore Technologies (ONT) sequencing, provides much longer reads (1-100 kb) and has the potential to overcome these problems. However, the current strategy of using only PacBio data is costly and laborious, and is not practical for mid-size labs. Hybrid sequencing ("Hybrid-Seq"), which integrates TGS and SGS data, has emerged as an approach to address the limitations associated with analysis of short SGS reads and the error rate of TGS reads. However, tools to analyze Hybrid-Seq transcriptome data are not currently available because the majority of methodological developments have focused on Hybrid-Seq genomic data. To improve our understanding of transcriptome complexity, we will develop a comprehensive Hybrid-Seq platform of novel statistical and computational methods to analyze TGS long reads with the aid of SGS short reads, and to identify gene isoforms, fusion transcripts and allele-specific expression (ASE). The proposed studies build on our published and preliminary work, in which we developed methods for error correction of TGS data and detection of novel gene isoforms and applied them to Hybrid-Seq transcriptome data from human embryonic stem cells (hESCs).
In Aim 1, we will develop computational and statistical approaches to identify and quantify gene isoforms.
In Aim 2, we will develop computational methods to discover fusion transcripts.
In Aim 3, we will determine the haplotypes of gene alleles and quantify ASE using Hybrid-Seq data. The methods developed in this proposal will be integrated into a software platform for analysis of Hybrid-Seq transcriptome data. This user-friendly bioinformatics platform will provide an unprecedented opportunity for comprehensive transcriptome profiling, with broad applicability and higher resolution. In addition, these tools will enable more researchers to apply Hybrid-Seq to their transcriptome studies.

Public Health Relevance

The Hybrid-Seq strategy combines the strengths of Third Generation Sequencing and Second Generation Sequencing and overcomes the weaknesses of both techniques. Our data analysis platform will provide a set of robust and convenient tools to fully analyze Hybrid-Seq transcriptome data, so that this cutting-edge technology becomes affordable and feasible for biomedical research laboratories of all sizes. The analysis of Hybrid-Seq data can identify the real products of functional genes, providing a solid foundation for human transcriptome research.

Agency
National Institutes of Health (NIH)
Institute
National Human Genome Research Institute (NHGRI)
Type
Research Project (R01)
Project #
5R01HG008759-05
Application #
9733297
Study Section
Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer
Pillai, Ajay
Project Start
2016-09-09
Project End
2021-06-30
Budget Start
2019-07-01
Budget End
2020-06-30
Support Year
5
Fiscal Year
2019
Total Cost
Indirect Cost
Name
Ohio State University
Department
Type
DUNS #
832127323
City
Columbus
State
OH
Country
United States
Zip Code
43210
Fu, Shuhua; Ma, Yingke; Yao, Hui et al. (2018) IDP-denovo: de novo transcriptome assembly and isoform annotation by hybrid sequencing. Bioinformatics 34:2168-2176
Deonovic, Benjamin; Wang, Yunhao; Weirather, Jason et al. (2017) IDP-ASE: haplotyping and quantifying allele-specific expression at the gene and gene isoform level by hybrid sequencing. Nucleic Acids Res 45:e32
He, Liu; Fu, Shuhua; Xu, Zhichao et al. (2017) Hybrid Sequencing of Full-Length cDNA Transcripts of Stems and Leaves in Dendrobium officinale. Genes (Basel) 8:
Weirather, Jason L; de Cesare, Mariateresa; Wang, Yunhao et al. (2017) Comprehensive comparison of Pacific Biosciences and Oxford Nanopore Technologies and their applications to transcriptome analysis. F1000Res 6:100
Cook, Daniel P; Adam, Ryan J; Zarei, Keyan et al. (2017) CF airway smooth muscle transcriptome reveals a role for PYK2. JCI Insight 2:
Sahraeian, Sayed Mohammad Ebrahim; Mohiyuddin, Marghoob; Sebra, Robert et al. (2017) Gaining comprehensive biological insight into the transcriptome by performing a broad-spectrum RNA-seq analysis. Nat Commun 8:59
Wong, Wing Tak; Matrone, Gianfranco; Tian, XiaoYu et al. (2017) Discovery of novel determinants of endothelial lineage using chimeric heterokaryons. Elife 6: