While RNA-Seq experiments based on Second Generation Sequencing (SGS) short reads have enabled remarkable advances in our ability to analyze the transcriptome, a few fundamental problems remain unsolved because of the high complexity of the genome and the inability of short reads to identify combinatorial genomic events. Third Generation Sequencing (TGS), including PacBio and Oxford Nanopore Technologies (ONT) sequencing, provides much longer reads (1-100 kb) and has the potential to overcome these problems. However, relying on PacBio data alone is costly and laborious, and is not practical for mid-size labs. Hybrid sequencing ("Hybrid-Seq"), which integrates TGS and SGS data, has emerged as an approach to address both the limitations of short SGS reads and the high error rate of TGS reads. However, tools to analyze Hybrid-Seq transcriptome data are not currently available, because the majority of methodological developments have focused on Hybrid-Seq genomic data. To improve our understanding of transcriptome complexity, we will develop a comprehensive Hybrid-Seq platform of novel statistical and computational methods that analyze TGS long reads with the aid of SGS short reads in order to identify gene isoforms, fusion transcripts and allele-specific expression (ASE). The proposed studies build on our published and preliminary work, in which we developed methods for TGS error correction and for detection of novel gene isoforms and applied them to Hybrid-Seq transcriptome data from human embryonic stem cells (hESCs).
In Aim 1, we will develop computational and statistical approaches to identify and quantify gene isoforms.
In Aim 2, we will develop computational methods to discover fusion transcripts.
In Aim 3, we will determine the haplotypes of gene alleles and quantify ASE using Hybrid-Seq data. The methods developed in this proposal will be integrated into a software platform for analysis of Hybrid-Seq transcriptome data. This user-friendly bioinformatics platform will provide an unprecedented opportunity for comprehensive transcriptome profiling, with broad applicability and higher resolution. In addition, these tools will enable more researchers to apply Hybrid-Seq to their transcriptome studies.

Public Health Relevance

The Hybrid-Seq strategy combines the strengths of Third Generation Sequencing and Second Generation Sequencing and overcomes the weaknesses of both techniques. Our data analysis platform will provide a set of robust, easy-to-use tools to fully analyze Hybrid-Seq transcriptome data, so that this cutting-edge technology becomes affordable and feasible for biomedical research laboratories of all sizes. The analysis of Hybrid-Seq data can identify the real products of functional genes, providing a solid foundation for human transcriptome research.

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Research Project (R01)
Project #: 1R01HG008759-01A1
Application #: 9176845
Study Section: Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer: Pillai, Ajay
Project Start: 2016-09-09
Project End: 2021-06-30
Budget Start: 2016-09-09
Budget End: 2017-06-30
Support Year: 1
Fiscal Year: 2016
Total Cost: $388,851
Indirect Cost: $133,867

Name: University of Iowa
Department: Internal Medicine/Medicine
Type: Schools of Medicine
DUNS #: 062761671
City: Iowa City
State: IA
Country: United States
Zip Code: 52246