Various array normalization methods have been developed for gene expression microarrays. Most of these methods assume that few markers are differentially expressed between sample groups, or that differential expression is symmetric. There has been no systematic study of how these methods perform when normalizing microRNA expression arrays that use heterogeneous samples such as tumors. MicroRNA arrays contain only a few hundred microRNAs, and a relatively large proportion of them are likely to be differentially expressed between diverse tumor groups. Assessing normalization methods in this setting is difficult because there is no benchmark dataset free of confounding array effects. We propose to design and generate such benchmark datasets, to perform a systematic assessment of normalization methods with particular emphasis on their utility for detecting differentially expressed markers, and to use the benchmark data to derive statistical models that account for the heterogeneity inherent in tumor samples.

Public Health Relevance

Microarrays are widely used in cancer research. A critical step in processing microarray data is normalizing the arrays so that measurements from different arrays are comparable. There is a great need to evaluate how statistical methods for array normalization behave when they are applied to microRNA arrays that use heterogeneous samples such as tumors.

National Institutes of Health (NIH)
National Cancer Institute (NCI)
Research Project (R01)
Study Section
Cancer Biomarkers Study Section (CBSS)
Program Officer
Dunn, Michelle C
Sloan-Kettering Institute for Cancer Research
New York
United States
Qin, Li-Xuan; Huang, Huei-Chung; Villafania, Liliana et al. (2018) A pair of datasets for microRNA expression profiling to examine the use of careful study design for assigning arrays to samples. Sci Data 5:180084
Qin, Li-Xuan; Tuschl, Thomas; Singer, Samuel (2016) Empirical insights into the stochasticity of small RNA sequencing. Sci Rep 6:24061
Qin, Li-Xuan; Huang, Huei-Chung; Begg, Colin B (2016) Cautionary Note on Using Cross-Validation for Molecular Classification. J Clin Oncol 34:3931-3938
Qin, Li-Xuan; Levine, Douglas A (2016) Study design and data analysis considerations for the discovery of prognostic molecular biomarkers: a case study of progression free survival in advanced serous ovarian cancer. BMC Med Genomics 9:27
Barlin, Joyce N; Zhou, Qin C; Leitao, Mario M et al. (2015) Molecular subtypes of uterine leiomyosarcoma and correlation with clinical outcome. Neoplasia 17:183-9
Huang, Huei-Chung; Niu, Yi; Qin, Li-Xuan (2015) Differential Expression Analysis for RNA-Seq: An Overview of Statistical Methods and Computational Software. Cancer Inform 14:57-67
Shi, Jiejun; Qin, Li-Xuan (2014) CORM: An R Package Implementing the Clustering of Regression Models Method for Gene Clustering. Cancer Inform 13:11-3
Qin, Li-Xuan; Huang, Huei-Chung; Zhou, Qin (2014) Preprocessing Steps for Agilent MicroRNA Arrays: Does the Order Matter? Cancer Inform 13:105-9
Qin, Li-Xuan; Breeden, Linda; Self, Steven G (2014) Finding gene clusters for a replicated time course study. BMC Res Notes 7:60
Qin, Li-Xuan; Zhou, Qin; Bogomolniy, Faina et al. (2014) Blocking and randomization to improve molecular biomarker discovery. Clin Cancer Res 20:3371-8
