Next-generation sequencing (NGS) has brought genome-wide coverage, dense resolution and high-throughput capability to many types of omics analyses, including gene expression, methylation, fusion gene, somatic mutation and many others. With dropping costs, the technology is gaining popularity. Experimental expenses, however, remain significant, and power calculation tools are essential to adequately design and guide an NGS analysis. Unlike power calculation in traditional experiments or microarrays, power calculation in NGS requires simultaneous consideration of sample size and sequencing depth, and count data also bring statistical challenges. We propose the following aims in this proposal: (1a) Develop power calculation tools for differential expression analysis from RNA-seq experiments. Optimal sample size and sequencing depth are jointly determined by the power function and budget constraints. (1b) Develop power calculation tools for differential methylation in methyl-seq experiments. (2a) Develop power calculation tools for fusion gene detection in cancer using RNA-seq. Identify the sample size and sequencing depth needed for fusion genes with low prevalence and low allelic fraction. (2b) Perform additional ultra-deep sequencing in the preliminary prostate study to identify additional low-allelic-fraction, prognosis-predictive fusion genes. Successful completion of these aims will provide state-of-the-art power calculation tools for the fast-growing body of projects that use NGS technology for candidate marker and fusion gene detection.
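To illustrate the trade-off described in aim (1a), the sketch below searches for the sample size and sequencing depth that maximize approximate power under a fixed budget. It is not the method proposed here: it assumes a single gene, negative-binomial counts, and a common normal approximation to a Wald test on the log fold change, in which the variance of the estimated log fold change is roughly (1/mu0 + 1/mu1 + 2 * dispersion) / n. All function names, parameters and default values (`base_rate`, `dispersion`, the cost figures, the depth grid) are hypothetical choices made for illustration.

```python
from math import log, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def approx_power(n_per_group, depth_millions, fold_change,
                 base_rate=10.0, dispersion=0.1, alpha=0.05):
    """Approximate power to detect one differentially expressed gene.

    Assumptions (illustrative only):
      - counts are negative binomial with a shared dispersion,
      - base_rate is the expected control-group count per million reads,
      - a two-sided Wald test on the log fold change is used, and the
        negligible rejection probability in the wrong tail is ignored.
    """
    mu0 = base_rate * depth_millions      # expected count, control group
    mu1 = mu0 * fold_change               # expected count, treated group
    # Approximate standard error of the estimated log fold change.
    se = sqrt((1.0 / mu0 + 1.0 / mu1 + 2.0 * dispersion) / n_per_group)
    z_crit = _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    return _STD_NORMAL.cdf(abs(log(fold_change)) / se - z_crit)


def best_design(budget, cost_per_sample, cost_per_million_reads, fold_change,
                depths=(5, 10, 20, 40, 80), **power_kwargs):
    """Return (n_per_group, depth_millions, power) maximizing power in budget.

    Power increases with n, so for each candidate depth only the largest
    affordable n per group needs to be evaluated. Returns None if no
    design with at least 2 samples per group fits the budget.
    """
    best = None
    for depth in depths:
        per_sample = cost_per_sample + cost_per_million_reads * depth
        n = int(budget // (2 * per_sample))   # two groups of equal size
        if n < 2:
            continue
        power = approx_power(n, depth, fold_change, **power_kwargs)
        if best is None or power > best[2]:
            best = (n, depth, power)
    return best


if __name__ == "__main__":
    # Hypothetical costs: 200 per library, 10 per million reads.
    print(best_design(budget=10000, cost_per_sample=200,
                      cost_per_million_reads=10, fold_change=1.5))
```

Under this approximation the dispersion term does not shrink with depth, so beyond a moderate depth extra budget is usually better spent on more samples than on more reads per sample. A realistic tool would also need to handle many genes at once, multiple-testing control, and the distribution of expression levels across genes.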

Public Health Relevance

Next-generation sequencing studies have been widely conducted for biomarker detection, including differentially expressed genes, differentially methylated genes and fusion genes. These studies are costly and raise considerable experimental design issues. We will develop power calculation tools to provide practical guidance for planning these studies.

National Institutes of Health (NIH)
National Cancer Institute (NCI)
Research Project (R01)
Study Section
Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer
Li, Jerry
University of Pittsburgh
Biostatistics & Other Math Sci
Graduate Schools
United States
Kim, Sunghwan; Oesterreich, Steffi; Kim, Seyoung et al. (2017) Integrative clustering of multi-level omics data for disease subtype discovery using sequential double regularization. Biostatistics 18:165-179
Huo, Zhiguang; Tseng, George (2017) Integrative Sparse K-Means With Overlapping Group Lasso in Genomic Applications for Disease Subtype Discovery. Ann Appl Stat 11:1011-1039
French, Leon; Ma, TianZhou; Oh, Hyunjung et al. (2017) Age-Related Gene Expression in the Frontal Cortex Suggests Synaptic Function Changes in Specific Inhibitory Neuron Subtypes. Front Aging Neurosci 9:162
Ma, Tianzhou; Liang, Faming; Tseng, George (2017) Biomarker detection and categorization in ribonucleic acid sequencing meta-analysis using Bayesian hierarchical models. J R Stat Soc Ser C Appl Stat 66:847-867
Huo, Zhiguang; Ding, Ying; Liu, Silvia et al. (2016) Meta-analytic framework for sparse K-means to identify disease subtypes in multiple transcriptomic studies. J Am Stat Assoc 111:27-42
Richardson, Sylvia; Tseng, George C; Sun, Wei (2016) Statistical Methods in Integrative Genomics. Annu Rev Stat Appl 3:181-209
Liu, Silvia; Tsai, Wei-Hsiang; Ding, Ying et al. (2016) Comprehensive evaluation of fusion transcript detection algorithms and a meta-caller to combine top performing methods in paired-end RNA-seq data. Nucleic Acids Res 44:e47
Luo, Jian-Hua; Liu, Silvia; Zuo, Ze-Hua et al. (2015) Discovery and Classification of Fusion Transcripts in Prostate Cancer and Normal Prostate Tissue. Am J Pathol 185:1834-45