Increased availability of data and accessibility of computational tools in recent years have created an unprecedented upsurge of scientific studies driven by statistical analysis. Limitations inherent to statistics impose constraints on the reliability of conclusions drawn from data, so misuse of statistical methods is a growing concern. We have been developing tools for assessing the predictability of common measures of statistical significance of research findings. These methods operate on test statistics or P-values as summaries of data and also incorporate external or prior information to make inferences about uncertainty in statistics or parameters of interest, such as P-values or risk of disease. In particular, we proposed Bayesian intervals for predicting P-value variability in replication studies; these intervals are resistant to selection bias and have endpoints that are directly interpretable as probabilistic bounds for replication P-values. Our intervals give researchers a quantitative assessment of what they may expect if they repeated their statistical analysis using an independent confirmatory sample. In related work, which is completed, submitted for publication, and available at https://arxiv.org/abs/1806.04251 and https://arxiv.org/abs/1802.04321, we: 1. Introduce a new measure of disease risk (g'), which ranges from -1 to 1, and offer a straightforward statistical method that enables posterior inference for both g' and traditional effect size measures, such as the logarithm of the odds ratio. A great advantage of our proposal is that it can be implemented without access to the actual data, because only commonly published summaries of data need to be known, such as the value of a test statistic and the associated standard error. 2. Introduce new methods to combine top-ranking statistical associations.
These methods can be used in observational studies to detect an aggregated effect of multiple weak predictors on complex disorders. They are also being applied in a collaborative project with Dr. Gordenin's group to explore patterns of somatic mutations in cancer genomes. Practical applications and methodological extensions of methods based on top-ranking statistics are hindered by their computational complexity. In the course of this work we derived the exact distribution of the rank truncated product (RTP), which reduces its evaluation to a single line of R code. Previously published papers that tackled the RTP distribution (including our work in Zaykin, 2007) resulted in page-long mathematical formulas that involved multiple integrals. Further, we (a) suggested an efficient adaptive method that does not require time-consuming computer simulations; (b) developed extensions for combining correlated effects with a substantial gain in power compared to previously published methods; and (c) proposed a highly promising combination statistic that captures the main features of RTP but has higher power and can be implemented using elementary R code.
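
To make the RTP idea concrete, the following is a minimal sketch, not the exact distribution derived in this project. It assumes the usual definition of the RTP statistic as the product of the k smallest of L P-values, and it approximates the combined P-value by simulating independent Uniform(0,1) P-values under the null hypothesis, which is the kind of simulation the exact result is meant to replace. The function names `rtp_statistic` and `rtp_p_value_mc` are hypothetical, and Python is used here purely for illustration.

```python
import math
import random


def rtp_statistic(p_values, k):
    """Return the log of the product of the k smallest P-values.

    Working on the log scale avoids numerical underflow.
    Assumes every P-value is strictly greater than 0.
    """
    if not 1 <= k <= len(p_values):
        raise ValueError("k must be between 1 and the number of P-values")
    smallest = sorted(p_values)[:k]
    return sum(math.log(p) for p in smallest)


def rtp_p_value_mc(p_values, k, n_sim=20000, seed=1):
    """Monte Carlo combined P-value for the RTP statistic.

    Assumes the L tests are independent, so that null P-values
    are independent Uniform(0,1) draws.
    """
    rng = random.Random(seed)
    observed = rtp_statistic(p_values, k)
    n_tests = len(p_values)
    hits = 0
    for _ in range(n_sim):
        # 1 - random() lies in (0, 1], so the logarithm is always defined.
        null_p = [1.0 - rng.random() for _ in range(n_tests)]
        # A smaller product means stronger evidence against the null.
        if rtp_statistic(null_p, k) <= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_sim + 1)


if __name__ == "__main__":
    example = [0.01, 0.5, 0.9, 0.7, 0.3]
    print(rtp_p_value_mc(example, k=2))
```

With k = 1 the statistic reduces to the minimum P-value, whose combined P-value under independence is 1 - (1 - p_min)^L, so that case can serve as a sanity check for the simulation. Correlated tests would violate the independence assumption used here.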

Support Year: 14
Fiscal Year: 2018
Name: U.S. National Institute of Environmental Health Sciences
Dong, Jing; Wyss, Annah; Yang, Jingyun et al. (2017) Genome-Wide Association Analysis of the Sense of Smell in U.S. Older Adults: Identification of Novel Risk Loci in African-Americans and European-Americans. Mol Neurobiol 54:8021-8032
Shi, Min; O'Brien, Katie M; Sandler, Dale P et al. (2017) Previous GWAS hits in relation to young-onset breast cancer. Breast Cancer Res Treat 161:333-344
Martin, Loren J; Smith, Shad B; Khoutorsky, Arkady et al. (2017) Epiregulin and EGFR interactions are involved in pain processing. J Clin Invest 127:3353-3366
Vsevolozhskaya, Olga; Ruiz, Gabriel; Zaykin, Dmitri (2017) Bayesian prediction intervals for assessing P-value variability in prospective replication studies. Transl Psychiatry 7:1271
Vsevolozhskaya, Olga A; Kuo, Chia-Ling; Ruiz, Gabriel et al. (2017) The more you test, the more you find: The smallest P-values become increasingly enriched with real findings as more tests are conducted. Genet Epidemiol 41:726-743
O'Brien, Katie M; Shi, Min; Sandler, Dale P et al. (2016) A family-based, genome-wide association study of young-onset breast cancer: inherited variants and maternally mediated effects. Eur J Hum Genet 24:1316-23
Vsevolozhskaya, Olga A; Zaykin, Dmitri V; Barondess, David A et al. (2016) Uncovering Local Trends in Genetic Effects of Multiple Phenotypes via Functional Linear Models. Genet Epidemiol 40:210-221
Vsevolozhskaya, Olga A; Greenwood, Mark C; Powell, Scott L et al. (2015) Resampling-based multiple comparison procedure with application to point-wise testing with functional data. Environ Ecol Stat 22:45-59
Meloto, Carolina B; Segall, Samantha K; Smith, Shad et al. (2015) COMT gene locus: new functional variants. Pain 156:2072-83
Weinberg, Clarice R; Zaykin, Dmitri (2015) Response. J Natl Cancer Inst 107:
