Next-generation sequencing (NGS) has brought genome-wide coverage, dense resolution and high-throughput capability to many types of omics analyses, including gene expression, methylation, fusion gene and somatic mutation analysis, among others. As costs drop, the technology is gaining popularity. Experimental expenses nevertheless remain significant, and power calculation tools are essential to design and guide an NGS analysis adequately. Unlike power calculation for traditional experiments or microarrays, power calculation for NGS requires simultaneous consideration of sample size and sequencing depth, and count data bring additional statistical challenges. We propose the following aims: (1a) Develop power calculation tools for differential expression analysis in RNA-seq experiments. Optimal sample size and sequencing depth are jointly determined by the power function and budget constraints. (1b) Develop power calculation tools for differential methylation analysis in methyl-seq experiments. (2a) Develop power calculation tools for fusion gene detection in cancer using RNA-seq. Identify the sample size and sequencing depth needed to detect fusion genes with low prevalence and low allelic fraction. (2b) Perform additional ultra-deep sequencing in the preliminary prostate study to identify additional fusion genes that have low allelic fraction and are predictive of prognosis. Successful completion of these aims will provide state-of-the-art power calculation tools for the fast-growing number of projects that use NGS technology for candidate marker and fusion gene detection.
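As a rough illustration of the trade-off in aim (1a), the sketch below searches over sequencing depths for the design that maximizes approximate power under a fixed budget. It is not the method proposed here: the power function is a simple Wald-test approximation for a single gene with negative binomial counts, and every name, cost and default value (`approx_power`, `best_design`, `reads_per_million`, `dispersion`, `alpha`, the cost figures) is a hypothetical assumption chosen only for illustration.

```python
from math import exp, log, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def approx_power(n_per_group, depth_millions, log_fold_change,
                 reads_per_million=5.0, dispersion=0.1, alpha=0.001):
    """Approximate power to detect one gene's log fold change.

    Assumption (illustrative only): counts are negative binomial with
    mean mu and variance mu + dispersion * mu**2, so the delta-method
    variance of a group's log mean is (1/mu + dispersion) / n.
    """
    mu1 = depth_millions * reads_per_million   # expected count, group 1
    mu2 = mu1 * exp(log_fold_change)           # expected count, group 2
    se = sqrt(((1 / mu1 + dispersion) + (1 / mu2 + dispersion)) / n_per_group)
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    # Power of a two-sided Wald test, ignoring the negligible opposite tail.
    return _STD_NORMAL.cdf(abs(log_fold_change) / se - z_crit)


def best_design(budget, cost_per_sample, cost_per_million_reads,
                depths, log_fold_change, **power_kwargs):
    """Return (n_per_group, depth_millions, power) maximizing power, or None.

    For each candidate depth, the largest affordable two-group sample
    size is used, since power is non-decreasing in sample size.
    """
    best = None
    for depth in depths:
        cost_one = cost_per_sample + cost_per_million_reads * depth
        n = int(budget // (2 * cost_one))      # samples per group
        if n < 2:
            continue                           # cannot afford this depth
        power = approx_power(n, depth, log_fold_change, **power_kwargs)
        if best is None or power > best[2]:
            best = (n, depth, power)
    return best


# Hypothetical costs: 200 per sample for library preparation, 10 per million reads.
design = best_design(budget=20000, cost_per_sample=200,
                     cost_per_million_reads=10,
                     depths=[5, 10, 20, 40], log_fold_change=log(2))
print(design)
```

Under these assumptions, deeper sequencing shrinks only the `1/mu` term of the variance while the dispersion term is unaffected, so past a certain depth the budget is better spent on additional samples. A realistic tool would also need to account for many genes tested at once and for gene-specific expression levels and dispersions.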
Next-generation sequencing studies are widely conducted for biomarker detection, including detection of differentially expressed genes, differentially methylated genes and fusion genes. These studies are costly and raise considerable experimental design issues. We will develop power calculation tools that provide practical guidance for planning them.
Kim, Sunghwan; Oesterreich, Steffi; Kim, Seyoung et al. (2017) Integrative clustering of multi-level omics data for disease subtype discovery using sequential double regularization. Biostatistics 18:165-179
Huo, Zhiguang; Tseng, George (2017) Integrative Sparse K-Means With Overlapping Group Lasso in Genomic Applications for Disease Subtype Discovery. Ann Appl Stat 11:1011-1039
French, Leon; Ma, TianZhou; Oh, Hyunjung et al. (2017) Age-Related Gene Expression in the Frontal Cortex Suggests Synaptic Function Changes in Specific Inhibitory Neuron Subtypes. Front Aging Neurosci 9:162
Ma, Tianzhou; Liang, Faming; Tseng, George (2017) Biomarker detection and categorization in ribonucleic acid sequencing meta-analysis using Bayesian hierarchical models. J R Stat Soc Ser C Appl Stat 66:847-867
Huo, Zhiguang; Ding, Ying; Liu, Silvia et al. (2016) Meta-analytic framework for sparse K-means to identify disease subtypes in multiple transcriptomic studies. J Am Stat Assoc 111:27-42
Richardson, Sylvia; Tseng, George C; Sun, Wei (2016) Statistical Methods in Integrative Genomics. Annu Rev Stat Appl 3:181-209
Liu, Silvia; Tsai, Wei-Hsiang; Ding, Ying et al. (2016) Comprehensive evaluation of fusion transcript detection algorithms and a meta-caller to combine top performing methods in paired-end RNA-seq data. Nucleic Acids Res 44:e47
Luo, Jian-Hua; Liu, Silvia; Zuo, Ze-Hua et al. (2015) Discovery and Classification of Fusion Transcripts in Prostate Cancer and Normal Prostate Tissue. Am J Pathol 185:1834-45