The ability to sequence cellular RNA molecules has become an essential tool in clinical practice and biomedical research because it provides critical information on gene expression and variation. RNA sequencing technology is made possible by a unique type of enzyme: the reverse transcriptase (RT), which copies RNA molecules into DNA strands that can then be amplified by PCR. Unfortunately, conventional RTs are not highly processive; they make only short copies of RNA templates (short reads), which are then computationally stitched together using a reference genome to infer the sequence of intact RNA molecules. During this process, information on the relative prevalence of, and linkage between, distal mutations, RNA editing sites and alternative splice sites within individual transcripts is lost. Many important cellular and viral RNAs are long (>1,000 nt), so accurate end-to-end sequencing will be required to conduct meaningful studies of their function and diversification. We recently discovered an ultra-processive RT from a eubacterial group II intron (the E.r. RT). Even without extensive optimization, the E.r. RT copies highly structured viral transcripts longer than 9 kb. Our goal is to optimize and enhance the E.r. RT to produce a robust reagent that is broadly suitable for diverse biotechnology applications (Aim 1). We will then use it to address two areas of major unmet need. The E.r. RT will be incorporated into next-generation sequencing (NGS) pipelines for monitoring viral evolution and drug resistance in HIV-infected patients (Aim 2). It will then be optimized to generate full-length cDNA libraries from known but complex mixtures of RNA molecules and used for whole-transcriptome sequencing to monitor the tissue specificity of alternative splicing in Drosophila (Aim 3). By performing biochemical optimization in parallel with real-world applications of the E.r. RT, we aim to create a powerful new RT reagent that fundamentally improves NGS, making it possible to study any mixture of RNA transcripts and enabling genomic phasing and linkage analyses.

Public Health Relevance

Long RNA molecules are abundant in human cells, and they play an important role in the lifecycle of many viruses. Sequence changes in long RNAs play a key role in human development and disease; however, we lack the tools to monitor these changes accurately and to understand their impact on cellular function. To meet this need, we aim to develop a powerful new reverse transcriptase enzyme for accurate end-to-end sequencing of long RNA molecules.

Agency
National Institutes of Health (NIH)
Institute
National Human Genome Research Institute (NHGRI)
Type
Research Project (R01)
Project #
5R01HG009622-03
Application #
9743858
Study Section
Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer
Smith, Michael
Project Start
2017-09-01
Project End
2021-06-30
Budget Start
2019-07-01
Budget End
2021-06-30
Support Year
3
Fiscal Year
2019
Total Cost
Indirect Cost
Name
Yale University
Department
Biochemistry
Type
Schools of Arts and Sciences
DUNS #
043207562
City
New Haven
State
CT
Country
United States
Zip Code
06520