The increased availability of data and accessibility of computational tools in recent years have produced an unprecedented upsurge of scientific studies driven by statistical analysis. Limitations inherent to statistics constrain the reliability of conclusions drawn from data, so misuse of statistical methods is a growing concern. We have been developing tools for assessing the predictability of common measures of statistical significance of research findings. These methods operate on test statistics or P-values as summaries of data and incorporate external or prior information to quantify uncertainty in statistics or parameters of interest, such as P-values or disease risk. In particular, we proposed Bayesian intervals for predicting P-value variability in replication studies. These intervals are resistant to selection bias, and their endpoints are directly interpretable as probabilistic bounds for replication P-values. They give researchers a quantitative assessment of what to expect if they were to repeat their statistical analysis using an independent confirmatory sample.

In related work, completed and submitted for publication and available at https://arxiv.org/abs/1806.04251 and https://arxiv.org/abs/1802.04321, we:
1. Introduce a new measure of disease risk (g'), which ranges from -1 to 1, and offer a straightforward statistical method that enables posterior inference both for g' and for traditional effect size measures, such as the logarithm of the odds ratio. A great advantage of our proposal is that it can be implemented without access to the actual data, because only commonly published summaries need to be known, such as the value of a test statistic and the associated standard error.
2. Introduce new methods to combine top-ranking statistical associations.
These methods can be used in observational studies to detect the aggregated effect of multiple weak predictors on complex disorders. They are also being applied in a collaborative project with Dr. Gordenin's group to explore patterns of somatic mutations in cancer genomes. Both practical applications and methodological extensions of methods based on top-ranking statistics are hindered by their computational complexity. In the course of this work, we derived the exact distribution of the rank truncated product (RTP) in a form that reduces its evaluation to a single line of R code. Previously published papers that tackled the RTP distribution (including our work in Zaykin, 2007) resulted in page-long mathematical formulas involving multiple integrals. Further, we (a) suggested an efficient adaptive method that does not require time-consuming computer simulations; (b) developed extensions for combining correlated effects, with a substantial gain in power compared to previously published methods; and (c) proposed a highly promising combination statistic that captures the main features of the RTP but has higher power and can be implemented using elementary R code.
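The summary above does not reproduce the single-line R expression for the RTP distribution. To illustrate how the null distribution of the RTP (the product of the k smallest of L P-values) can be evaluated without multiple integrals, here is a minimal Python sketch. It is not necessarily the authors' formula. It rests on a standard order-statistics argument: conditional on the (k+1)-th smallest P-value u, which follows a Beta(k+1, L-k) distribution under the null, the k smallest P-values are independent uniform on (0, u), so the negative log of their rescaled product follows a Gamma(k, 1) distribution. The function names, the midpoint-rule integration, and the grid size are assumptions made for this illustration, and independence of the P-values is assumed throughout.

```python
import math
import random


def rtp_statistic(pvalues, k):
    """Rank truncated product: product of the k smallest P-values."""
    return math.prod(sorted(pvalues)[:k])


def erlang_sf(x, k):
    """P(Gamma(k, 1) > x) for integer k >= 1; equals 1 when x <= 0."""
    if x <= 0.0:
        return 1.0
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= x / j
        total += term
    return math.exp(-x) * total


def rtp_pvalue(w, k, L, grid=20000):
    """P(RTP <= w) for L independent uniform P-values, with 1 <= k < L, 0 < w.

    Averages the conditional Gamma tail probability over the
    Beta(k + 1, L - k) density of the (k+1)-th smallest P-value,
    using a midpoint rule on (0, 1). Illustrative sketch only.
    """
    log_norm = math.lgamma(L + 1) - math.lgamma(k + 1) - math.lgamma(L - k)
    total = 0.0
    for i in range(grid):
        u = (i + 0.5) / grid
        log_density = (log_norm + k * math.log(u)
                       + (L - k - 1) * math.log(1.0 - u))
        tail = erlang_sf(k * math.log(u) - math.log(w), k)
        total += math.exp(log_density) * tail
    return total / grid


def rtp_pvalue_mc(w, k, L, n_sim=20000, seed=1):
    """Monte Carlo cross-check of rtp_pvalue under the same null."""
    rng = random.Random(seed)
    hits = sum(
        rtp_statistic([rng.random() for _ in range(L)], k) <= w
        for _ in range(n_sim)
    )
    return hits / n_sim
```

For example, `rtp_pvalue(rtp_statistic(pvalues, 2), 2, len(pvalues))` would give a combined P-value for the two strongest signals under these assumptions. For k = 1 and L = 2 the integral reduces to 1 - (1 - w)^2, the distribution of the minimum of two uniform P-values, which gives a quick sanity check.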