Statistical, population genetics and genetic epidemiology

Zaykin, Dmitri

Abstract

Increased availability of data and accessibility of computational tools in recent years have created an unprecedented upsurge of scientific studies driven by statistical analysis. Limitations inherent to statistics impose constraints on the reliability of conclusions drawn from data, so misuse of statistical methods is a growing concern. We have been developing tools for assessing the predictability of common measures of statistical significance of research findings. These methods operate on test statistics or P-values as summaries of data and also incorporate external or prior information for making inference about uncertainty in statistics or parameters of interest, such as P-values or risk of disease. In particular, we proposed Bayesian intervals for predicting P-value variability in replication studies. These intervals are resistant to selection bias, and their endpoints are directly interpretable as probabilistic bounds for replication P-values. They give researchers a quantitative assessment of what to expect if they repeated their statistical analysis on an independent confirmatory sample.

In related work, which is completed, submitted for publication, and available at https://arxiv.org/abs/1806.04251 and https://arxiv.org/abs/1802.04321, we:

1. Introduce a new measure of disease risk (g'), which ranges from -1 to 1, and offer a straightforward statistical method that enables posterior inference both for g' and for traditional effect size measures, such as the logarithm of the odds ratio. A great advantage of our proposal is that it can be implemented without access to the actual data, because only commonly published summaries need to be known, such as the value of a test statistic and the associated standard error.
2. Introduce new methods to combine top-ranking statistical associations.
These methods can be used in observational studies to detect an aggregated effect of multiple weak predictors on complex disorders. They are also being applied in a collaborative project with Dr. Gordenin's group to explore patterns of somatic mutations in cancer genomes. Practical applications and methodological extensions of methods based on top-ranking statistics are hindered by their computational complexity. In the course of this work we derived the exact distribution of the rank truncated product (RTP), which simplifies its evaluation to a single line of R code. Previously published treatments of the RTP distribution (including our work in Zaykin, 2007) produced page-long mathematical formulas involving multiple integrals. Further, we (a) suggested an efficient adaptive method that does not require time-consuming computer simulations; (b) developed extensions for combining correlated effects, with a substantial gain in power compared to previously published methods; and (c) proposed a highly promising combination statistic that captures the main features of the RTP but has higher power and can be implemented with elementary R code.
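
For orientation, the RTP statistic itself is the product of the k smallest of L P-values. The exact distribution mentioned above is not reproduced on this page, so the following Python sketch is not the authors' one-line formula. It is a simulation baseline only, and it assumes the L P-values are independent and uniform under the null hypothesis. The function names (`rtp_statistic`, `rtp_pvalue_mc`) and the parameters `n_sim` and `seed` are illustrative choices, not taken from the cited work.

```python
import numpy as np


def rtp_statistic(pvalues, k):
    """Return the log of the product of the k smallest P-values."""
    p = np.sort(np.asarray(pvalues, dtype=float))
    return np.log(p[:k]).sum()


def rtp_pvalue_mc(pvalues, k, n_sim=100_000, seed=0):
    """Monte Carlo combined P-value for the rank truncated product.

    Assumption: under the null, the L P-values are independent Uniform(0, 1).
    This is a simulation baseline, not the exact distribution.
    """
    pvalues = np.asarray(pvalues, dtype=float)
    n_tests = pvalues.size
    if not 1 <= k <= n_tests:
        raise ValueError("k must be between 1 and the number of P-values")

    observed = rtp_statistic(pvalues, k)

    rng = np.random.default_rng(seed)
    null_p = rng.uniform(size=(n_sim, n_tests))
    # np.partition moves the k smallest values of each row into the
    # first k columns (in arbitrary order), which is all the product needs.
    smallest = np.partition(null_p, k - 1, axis=1)[:, :k]
    null_stats = np.log(smallest).sum(axis=1)

    # Small products are extreme; the add-one correction keeps the
    # estimate strictly positive.
    return (1 + np.count_nonzero(null_stats <= observed)) / (n_sim + 1)


if __name__ == "__main__":
    example = [0.001, 0.004, 0.03, 0.2, 0.45, 0.6, 0.72, 0.9]
    print(rtp_pvalue_mc(example, k=3))
```

Two special cases give a quick sanity check on such a simulation: with k = 1 the combined P-value should approach 1 - (1 - p_min)^L, and with k = L the statistic reduces to Fisher's combined probability test. The memory cost grows as `n_sim` times L, which is one practical reason a closed-form evaluation is attractive.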

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Institute of Environmental Health Sciences (NIEHS)
Type: Investigator-Initiated Intramural Research Projects (ZIA)
Project #: 1ZIAES101866-14
Application #: 9785213
Support Year: 14
Fiscal Year: 2018

Institution

Name: U.S. National Inst of Environ Hlth Scis

Publications

Martin, Loren J; Smith, Shad B; Khoutorsky, Arkady et al. (2017) Epiregulin and EGFR interactions are involved in pain processing. J Clin Invest 127:3353-3366

Vsevolozhskaya, Olga; Ruiz, Gabriel; Zaykin, Dmitri (2017) Bayesian prediction intervals for assessing P-value variability in prospective replication studies. Transl Psychiatry 7:1271

Vsevolozhskaya, Olga A; Kuo, Chia-Ling; Ruiz, Gabriel et al. (2017) The more you test, the more you find: The smallest P-values become increasingly enriched with real findings as more tests are conducted. Genet Epidemiol 41:726-743

Dong, Jing; Wyss, Annah; Yang, Jingyun et al. (2017) Genome-Wide Association Analysis of the Sense of Smell in U.S. Older Adults: Identification of Novel Risk Loci in African-Americans and European-Americans. Mol Neurobiol 54:8021-8032

Shi, Min; O'Brien, Katie M; Sandler, Dale P et al. (2017) Previous GWAS hits in relation to young-onset breast cancer. Breast Cancer Res Treat 161:333-344

O'Brien, Katie M; Shi, Min; Sandler, Dale P et al. (2016) A family-based, genome-wide association study of young-onset breast cancer: inherited variants and maternally mediated effects. Eur J Hum Genet 24:1316-23

Vsevolozhskaya, Olga A; Zaykin, Dmitri V; Barondess, David A et al. (2016) Uncovering Local Trends in Genetic Effects of Multiple Phenotypes via Functional Linear Models. Genet Epidemiol 40:210-221

Weinberg, Clarice R; Zaykin, Dmitri (2015) Response. J Natl Cancer Inst 107:

Wieskopf, Jeffrey S; Mathur, Jayanti; Limapichat, Walrati et al. (2015) The nicotinic α6 subunit gene determines variability in chronic pain sensitivity via cross-inhibition of P2X2/3 receptors. Sci Transl Med 7:287ra72

Kuo, Chia-Ling; Vsevolozhskaya, Olga A; Zaykin, Dmitri V (2015) Assessing the Probability that a Finding Is Genuine for Large-Scale Genetic Association Studies. PLoS One 10:e0124107

Showing the most recent 10 out of 29 publications
