Bioinformatics platform for Hybrid-Seq transcriptome data analysis

Au, Kin Fai

Abstract

While RNA-Seq experiments based on Second Generation Sequencing (SGS) short reads have enabled remarkable advances in our ability to analyze the transcriptome, a few fundamental problems remain unsolved due to the high complexity of the genome and the inability to identify combinatorial genomic events. Third Generation Sequencing (TGS), including PacBio and Oxford Nanopore Technologies (ONT) sequencing, which provide much longer reads (1-100 kb), has the potential to overcome these problems. However, the costly and laborious strategy of using PacBio data alone is not practical for mid-size labs. Hybrid sequencing ("Hybrid-Seq"), which integrates TGS and SGS data, has emerged as an approach to address both the limitations of short SGS reads and the error rate of TGS reads. However, tools to analyze Hybrid-Seq transcriptome data are not currently available because the majority of methodological developments have focused on Hybrid-Seq genomic data. In order to improve our understanding of transcriptome complexity, we will develop a comprehensive Hybrid-Seq platform of novel statistical and computational methods to analyze TGS long reads with the aid of SGS short reads, and to identify gene isoforms, fusion transcripts and allele-specific expression (ASE). The proposed studies build on our published and preliminary work, in which we developed methods for error correction of TGS data and detection of novel gene isoforms and applied them to Hybrid-Seq transcriptome data from human embryonic stem cells (hESCs).
In Aim 1, we will develop computational and statistical approaches to identify and quantify gene isoforms.
In Aim 2, we will develop computational methods to discover fusion transcripts.
In Aim 3, we will determine the haplotypes of gene alleles and quantify ASE using Hybrid-Seq data. The methods developed in this proposal will be integrated into a software platform for analysis of Hybrid-Seq transcriptome data. This user-friendly bioinformatics platform will have important positive impacts by providing an unprecedented opportunity for comprehensive transcriptome profiling, with broad applicability and higher resolution. In addition, these tools will enable more researchers to apply Hybrid-Seq to their transcriptome studies.

Public Health Relevance

The Hybrid-Seq strategy combines the strengths of Third Generation Sequencing and Second Generation Sequencing and overcomes the weaknesses of both techniques. Our data analysis platform will provide a set of robust and easy-to-use tools to fully analyze Hybrid-Seq transcriptome data, so that this cutting-edge technology becomes affordable and feasible for biomedical research laboratories of all sizes. The analysis of Hybrid-Seq data can identify real products of functional genes, providing a solid foundation for human transcriptome research.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Research Project (R01)
Project #: 5R01HG008759-05
Application #: 9733297
Study Section: Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer: Pillai, Ajay

Project Start: 2016-09-09
Project End: 2021-06-30
Budget Start: 2019-07-01
Budget End: 2020-06-30
Support Year: 5
Fiscal Year: 2019

Institution

Name: Ohio State University
DUNS #: 832127323

City: Columbus
State: OH
Country: United States
Zip Code: 43210

Related projects


NIH 2020 R01 HG	Bioinformatics platform for Hybrid-Seq transcriptome data analysis Au, Kin Fai / Ohio State University
NIH 2019 R01 HG	Bioinformatics platform for Hybrid-Seq transcriptome data analysis Au, Kin Fai / Ohio State University
NIH 2018 R01 HG	Bioinformatics platform for Hybrid-Seq transcriptome data analysis Au, Kin Fai / University of Iowa
NIH 2018 R01 HG	Bioinformatics platform for Hybrid-Seq transcriptome data analysis Au, Kin Fai / Ohio State University
NIH 2017 R01 HG	Bioinformatics platform for Hybrid-Seq transcriptome data analysis Au, Kin Fai / University of Iowa
NIH 2016 R01 HG	Bioinformatics platform for Hybrid-Seq transcriptome data analysis Au, Kin Fai / University of Iowa	$388,851

Publications

Fu, Shuhua; Ma, Yingke; Yao, Hui et al. (2018) IDP-denovo: de novo transcriptome assembly and isoform annotation by hybrid sequencing. Bioinformatics 34:2168-2176

Deonovic, Benjamin; Wang, Yunhao; Weirather, Jason et al. (2017) IDP-ASE: haplotyping and quantifying allele-specific expression at the gene and gene isoform level by hybrid sequencing. Nucleic Acids Res 45:e32

He, Liu; Fu, Shuhua; Xu, Zhichao et al. (2017) Hybrid Sequencing of Full-Length cDNA Transcripts of Stems and Leaves in Dendrobium officinale. Genes (Basel) 8:

Weirather, Jason L; de Cesare, Mariateresa; Wang, Yunhao et al. (2017) Comprehensive comparison of Pacific Biosciences and Oxford Nanopore Technologies and their applications to transcriptome analysis. F1000Res 6:100

Cook, Daniel P; Adam, Ryan J; Zarei, Keyan et al. (2017) CF airway smooth muscle transcriptome reveals a role for PYK2. JCI Insight 2:

Sahraeian, Sayed Mohammad Ebrahim; Mohiyuddin, Marghoob; Sebra, Robert et al. (2017) Gaining comprehensive biological insight into the transcriptome by performing a broad-spectrum RNA-seq analysis. Nat Commun 8:59

Wong, Wing Tak; Matrone, Gianfranco; Tian, XiaoYu et al. (2017) Discovery of novel determinants of endothelial lineage using chimeric heterokaryons. Elife 6:
