Next-generation sequencing (NGS) has brought genome-wide coverage, dense resolution and high-throughput capability to many types of omics analysis, including gene expression, methylation, fusion gene detection, somatic mutation detection and others. As costs drop, the technology is gaining popularity. Experimental expenses nevertheless remain significant, and power calculation tools are essential for adequately designing and guiding an NGS analysis. Unlike power calculation for traditional experiments or microarrays, power calculation for NGS requires simultaneous consideration of sample size and sequencing depth, and count data bring additional statistical challenges. We propose the following aims: (1a) Develop power calculation tools for differential expression analysis in RNA-seq experiments, where the optimal sample size and sequencing depth are jointly determined by the power function and budget constraints. (1b) Develop power calculation tools for differential methylation analysis in methyl-seq experiments. (2a) Develop power calculation tools for fusion gene detection in cancer using RNA-seq, identifying the sample size and sequencing depth needed to detect fusion genes with low prevalence and low allelic fraction. (2b) Perform additional ultra-deep sequencing in the preliminary prostate study to identify additional low-allelic-fraction, prognosis-predictive fusion genes. Successful completion of these aims will provide state-of-the-art power calculation tools for the fast-growing number of projects that use NGS technology for candidate marker and fusion gene detection.
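
As a rough illustration of how sample size and sequencing depth can be traded off under a budget, the following is a minimal sketch and not the method proposed in this project. It assumes a single gene with negative binomial counts, a two-group comparison, and a normal-approximation Wald test on the log fold change; multiple testing across genes is ignored. The function names (`wald_power`, `best_design`), the cost model, and all parameter names are hypothetical.

```python
from math import log, sqrt
from statistics import NormalDist


def wald_power(n_per_group, mean_count, dispersion, fold_change, alpha=0.05):
    """Approximate two-sided Wald-test power for one gene.

    Assumes negative binomial counts with variance mu + dispersion * mu^2,
    so the delta-method variance of a log sample mean is
    (1/mu + dispersion) / n. All names here are illustrative assumptions.
    """
    mu_control = mean_count
    mu_case = mean_count * fold_change
    se = sqrt((1 / mu_control + dispersion) / n_per_group
              + (1 / mu_case + dispersion) / n_per_group)
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = abs(log(fold_change)) / se
    # Probability that the test statistic falls in either rejection tail.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


def best_design(budget, cost_per_sample, cost_per_million_reads,
                mean_count_per_million, dispersion, fold_change,
                depths=(5, 10, 20, 40, 80)):
    """Grid search over sequencing depth (millions of reads per sample).

    For each depth, spend the whole budget on as many samples as it allows
    (two equal groups) and keep the design with the highest power.
    Returns (power, n_per_group, depth), or None if nothing is affordable.
    """
    best = None
    for depth in depths:
        per_sample = cost_per_sample + cost_per_million_reads * depth
        n_per_group = int(budget // (2 * per_sample))
        if n_per_group < 2:
            continue  # too few replicates to estimate variability
        # Expected count is assumed to scale linearly with depth.
        power = wald_power(n_per_group, mean_count_per_million * depth,
                           dispersion, fold_change)
        if best is None or power > best[0]:
            best = (power, n_per_group, depth)
    return best


if __name__ == "__main__":
    design = best_design(budget=10000, cost_per_sample=200,
                         cost_per_million_reads=10,
                         mean_count_per_million=1.0,
                         dispersion=0.1, fold_change=2.0)
    print(design)
```

In this simplified model, deeper sequencing shrinks only the Poisson-like `1/mu` part of the variance, while the dispersion term can be reduced only by adding samples, which is why the two quantities have to be chosen jointly rather than one at a time.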

Public Health Relevance

Next-generation sequencing studies are widely conducted for biomarker detection, including the detection of differentially expressed genes, differentially methylated genes and fusion genes. These studies are costly and raise considerable experimental design issues. We will develop power calculation tools that provide practical guidance for planning them.

Agency: National Institutes of Health (NIH)
Institute: National Cancer Institute (NCI)
Type: Research Project (R01)
Project #: 1R01CA190766-01A1
Application #: 8963892
Study Section: Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer: Li, Jerry
Project Start: 2015-07-07
Project End: 2019-06-30
Budget Start: 2015-07-07
Budget End: 2016-06-30
Support Year: 1
Fiscal Year: 2015
Total Cost: $225,022
Indirect Cost: $78,904

Name: University of Pittsburgh
Department: Biostatistics & Other Math Sci
Type: Other Domestic Higher Education
DUNS #: 004514360
City: Pittsburgh
State: PA
Country: United States
Zip Code: 15213
Huo, Zhiguang; Tseng, George (2017) Integrative Sparse K-Means With Overlapping Group Lasso in Genomic Applications for Disease Subtype Discovery. Ann Appl Stat 11:1011-1039
Ma, Tianzhou; Liang, Faming; Tseng, George (2017) Biomarker detection and categorization in ribonucleic acid sequencing meta-analysis using Bayesian hierarchical models. J R Stat Soc Ser C Appl Stat 66:847-867
Liu, Silvia; Tsai, Wei-Hsiang; Ding, Ying et al. (2016) Comprehensive evaluation of fusion transcript detection algorithms and a meta-caller to combine top performing methods in paired-end RNA-seq data. Nucleic Acids Res 44:e47
Huo, Zhiguang; Ding, Ying; Liu, Silvia et al. (2016) Meta-analytic framework for sparse K-means to identify disease subtypes in multiple transcriptomic studies. J Am Stat Assoc 111:27-42
Richardson, Sylvia; Tseng, George C; Sun, Wei (2016) Statistical Methods in Integrative Genomics. Annu Rev Stat Appl 3:181-209
Luo, Jian-Hua; Liu, Silvia; Zuo, Ze-Hua et al. (2015) Discovery and Classification of Fusion Transcripts in Prostate Cancer and Normal Prostate Tissue. Am J Pathol 185:1834-45