The rapid development of third-generation long-read sequencing (LRS) platforms such as PacBio and Oxford Nanopore Technologies (ONT) has enabled increasingly precise and higher-throughput sequencing of transcripts. Long reads can produce full-length transcript sequences, overcoming much of the uncertainty of short-read methods in accurately defining transcripts, particularly for genes with alternative splicing (more than 90% of human genes), which short-read sequencing has thus far struggled to resolve. LRS is therefore the natural choice for studying the expression of transcript variants and the role of alternative isoforms in disease and development. While the first iterations of the long-read technologies did not produce enough reads to quantify more than the most highly expressed transcripts, the current sequencing depth of up to 8 million reads per SMRT cell on the Sequel 2 platform promises reliable quantification of more modestly expressed genes. Significant yield increases have also been reported for Nanopore. This suggests that LRS may have reached sufficient throughput to enable accurate quantification of gene expression and differential expression analyses. LRS transcriptomics data have, however, specific properties that are absent from other transcriptomics technologies, such as partial matches to reference transcript models. Specific methods for quantification and statistical analysis therefore need to be developed. In this project, we aim to characterize in detail the data distribution in long-read data, propose strategies to deal with their particular read uncertainty issues, and develop new strategies for differential expression analysis. The overarching goal is to create the analytical framework to fully leverage LRS technologies for the study of isoform dynamics in relation to biomedically relevant questions.

Public Health Relevance

The goal of this project is to develop the SQANTI-QDE software, the first integrated framework for the management of long-read sequencing Iso-Seq experiments. SQANTI-QDE will provide, in one tool, functionality for the annotation and processing of multiple samples, improved definition of bona fide transcripts, quantification of transcript expression, flexible creation of count matrices, data normalization, and differential expression and isoform usage analysis. Highly replicated, deeply sequenced long-read transcriptomics datasets will be created as part of this project.

Agency
National Institutes of Health (NIH)
Institute
National Human Genome Research Institute (NHGRI)
Type
Exploratory/Developmental Grants (R21)
Project #
1R21HG011280-01
Application #
10041221
Study Section
Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer
Gilchrist, Daniel A
Project Start
2020-09-01
Project End
2022-08-30
Budget Start
2020-09-01
Budget End
2022-08-30
Support Year
1
Fiscal Year
2020
Total Cost
Indirect Cost
Name
University of Florida
Department
Microbiology/Immun/Virology
Type
Earth Sciences/Resources
DUNS #
969663814
City
Gainesville
State
FL
Country
United States
Zip Code
32611