MicroRNAs (miRNAs) are a class of small (18-24 nucleotide) RNAs that are essential regulators of gene expression; they act within the RNA-induced silencing complex (RISC) to bind mRNAs and suppress translation. Alterations in miRNA expression have been shown to disrupt entire cellular pathways, substantially contributing to a variety of human diseases. Despite nearly 25 years of research, miRNAs remain difficult to measure due to their short length, relatively small number, sequence similarity, and the difficulty of isolating them from other small RNA fragments. While qPCR- and microarray-based miRNA assays are still widely used, the majority of recent studies use small RNA-seq (sRNA-seq) because it allows for the quantification of isomiRs (miRNA isoforms) and the possibility of identifying novel miRNAs. Methods for processing reads generated by sRNA-seq globally distinguish between miRNA reads and those from other small RNAs, but do not necessarily capture the full spectrum of miRNA variation. Subsequent statistical analyses of processed sRNA-seq data are still performed using methods developed for mRNA-seq data, even though sRNA-seq data violate several of the assumptions of these methods. Specifically, methods for mRNA-seq data assume approximate independence between feature counts; however, the small total number of miRNAs and the presence of a small number of very highly expressed miRNAs result in a lack of independence between miRNA counts. Additionally, normalization methods for mRNA-seq data assume either that the overall level of transcription is constant across samples or that an equal number of features are over- and under-expressed when comparing any two samples, neither of which holds for sRNA-seq data. The development of statistical methods that address the challenges of sRNA-seq data represents a critical need for miRNA research.
Our long-term goal is to advance miRNA research by developing statistical methods that are tailored to the specific complexities of miRNA expression data. The overall objective of this application is to improve the analysis of sRNA-seq data by developing statistical methods that account for challenges specific to sRNA-seq data and outperform methods designed for mRNA-seq data. This addresses an urgent need for statistical methods to appropriately analyze sRNA-seq data, which are now routinely generated by large consortia such as TCGA and FANTOM. The rationale that underlies the proposed research is that methods that explicitly address the challenges inherent in measuring miRNAs are necessary to fully elucidate the role miRNAs play in many human disease processes.
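
The dependence between miRNA counts described above can be illustrated with a small, purely hypothetical calculation. The sketch below is not part of the proposed methods; it assumes five made-up miRNAs, invented abundances, and simple total-count (counts-per-million) scaling as a stand-in for an mRNA-seq-style normalization. Only the dominant miRNA truly changes between the two samples, yet the others appear to change after scaling.

```python
def cpm(counts):
    """Scale raw counts to counts per million (total-count normalization)."""
    total = sum(counts)
    return [c * 1_000_000 / total for c in counts]


# Hypothetical abundances for five miRNAs in two samples (invented numbers).
# Only the first, dominant miRNA truly differs: it is 3x higher in sample B.
sample_a = [9000, 250, 250, 250, 250]
sample_b = [27000, 250, 250, 250, 250]

cpm_a = cpm(sample_a)
cpm_b = cpm(sample_b)

# Apparent fold change (B relative to A) for each miRNA after scaling.
fold_changes = [b / a for a, b in zip(cpm_a, cpm_b)]

for i, fc in enumerate(fold_changes):
    print(f"miRNA {i + 1}: apparent fold change = {fc:.2f}")
```

Under these assumed numbers, the four unchanged miRNAs show an apparent fold change of about 0.36, while the true 3-fold increase in the dominant miRNA is compressed to about 1.07. Because one feature accounts for most of the reads, its change shifts every other scaled value, which is the kind of lack of independence that normalization methods built for mRNA-seq data do not anticipate.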

Public Health Relevance

MicroRNAs are essential regulators of gene expression, alterations in which have been shown to disrupt entire cellular pathways, substantially contributing to a variety of human diseases. Statistical methods that explicitly address the challenges inherent in measuring microRNA expression are necessary to fully elucidate the role microRNAs play in many human disease processes.

Agency
National Institutes of Health (NIH)
Institute
National Institute of General Medical Sciences (NIGMS)
Type
Research Project (R01)
Project #
1R01GM139928-01
Application #
10092662
Study Section
Biostatistical Methods and Research Design Study Section (BMRD)
Program Officer
Brazhnik, Paul
Project Start
2020-09-11
Project End
2025-06-30
Budget Start
2020-09-11
Budget End
2021-06-30
Support Year
1
Fiscal Year
2020
Total Cost
Indirect Cost
Name
University of Rochester
Department
Type
DUNS #
041294109
City
Rochester
State
NY
Country
United States
Zip Code
14627