Statistical, population genetics and genetic epidemiology

Zaykin, Dmitri

Abstract

Our current research is motivated by the problem of finding heritable components of common human diseases. Inheritance of common diseases reflects the complex interplay between genetic factors and environmental exposures. A formal way of characterizing the relative importance of genetic components is as the proportion of total phenotypic variability that is due to heritable variation. Identification of variants conferring disease risk has important public health implications. Risk variants discovered by genetic studies may reveal new therapeutic targets for common diseases. Close monitoring of carriers of genetic variants associated with drug efficacy and side effects can be part of a strategy for successful disease management. Carriers of risk variants may respond in a specific way to environmental exposures. To this end, we develop statistical approaches for associating single genetic variants, multiple variants, and haplotypes with disease-related phenotypes.
Last year, we proposed to describe probabilistically the location of true (genuine) signals in the list of all results sorted by a measure of statistical significance: a true signal has a particular probability of ranking first, second, third, and so on. This defines the distribution of ranks for true signals. This approach has immediate practical applicability. First, it allows researchers to prioritize the statistical results of a large-scale study. Second, at the study design stage, one can calculate the sample size sufficient for a high proportion of true signals to aggregate among a specified number of top hits. Third, for a given sample size, the number of top hits to carry forward into the replication stage can be determined. We continue to advance this research direction. In ongoing research, we address the typical situation of designing a replication study based on top hits selected from a large-scale discovery-stage study.
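The rank distribution of a true signal described above can be illustrated with a small Monte Carlo sketch. This is not the project's software: it assumes a single true signal whose test statistic is normal with mean `effect_z` and unit variance, independent standard normal null statistics, and two-sided testing. The function name and all parameter values are hypothetical.

```python
import random

def true_signal_rank_probs(n_tests, effect_z, n_sims=2000, seed=0):
    """Estimate P(true signal has rank k) for k = 1..n_tests by simulation.

    Assumption: one true signal with statistic N(effect_z, 1) and
    n_tests - 1 independent null statistics N(0, 1); results are sorted
    by absolute value (two-sided significance).
    """
    rng = random.Random(seed)
    counts = [0] * n_tests
    for _ in range(n_sims):
        true_stat = abs(rng.gauss(effect_z, 1.0))
        # Number of null results that look more significant than the true one.
        n_better = sum(
            abs(rng.gauss(0.0, 1.0)) > true_stat for _ in range(n_tests - 1)
        )
        counts[n_better] += 1  # rank = n_better + 1, stored at index n_better
    return [c / n_sims for c in counts]

probs = true_signal_rank_probs(n_tests=200, effect_z=3.5)
print(f"P(rank 1) ~ {probs[0]:.3f}")
print(f"P(rank <= 10) ~ {sum(probs[:10]):.3f}")
```

Summing the first k entries gives the estimated probability that the true signal falls among the top k hits, which is the quantity one would compare across candidate sample sizes or numbers of hits carried into replication.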
The number of top hits is determined in such a way that the proportion of true associations is controlled at a desired level. In related research, we design models for the distribution of P‑values among non-associated variants. Knowledge of this distribution is essential for estimating the proportion of genuine signals. We propose a novel method, "generalized genomic control", which extends methods of correcting for population stratification in important ways. Our method operates on P‑values and is therefore not tied to a particular statistical test. P‑values from sophisticated approaches that differ from the simple chi-square tests for which genomic control is available can be employed and adjusted. The amount of distortion due to population stratification is not the same for different loci, and our method allows separate adjustments for different subsets of tests.
In a related research direction, we devised methods for estimating the "effect size" distribution in large-scale studies. Such a distribution is of general interest; for example, for a disease phenotype it provides an approximate number of loci in the genome that carry a certain relative risk. We are also specifically interested in this distribution because its characterization leads to more precise estimates of the proportion of genuine signals among the top hits of a study.
The research directions just described take advantage of information contained in measures of statistical significance (P‑values). Collectively, the P‑values of many tests considered at once provide information over and above that contained in any particular test statistic. A closely related topic of our research is the problem of combining many P‑values. The P‑value combination methods we developed were motivated by statistical genetics problems, yet they have proved useful outside statistical genetics applications.
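As an illustration of combining many P-values, the sketch below uses a rank-truncated product: the product of the k smallest P-values, with its null distribution estimated by simulation. The abstract does not say which combination statistics the project uses, so this choice is an assumption, as are the function name, the independence of tests under the null, and the example inputs.

```python
import math
import random

def rank_truncated_product_pvalue(pvalues, k, n_sims=5000, seed=0):
    """Combined P-value based on the product of the k smallest P-values.

    Assumption: under the null hypothesis the individual P-values are
    independent and uniform on (0, 1]; the null distribution of the
    statistic is approximated by Monte Carlo simulation.
    """
    n = len(pvalues)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of P-values")
    rng = random.Random(seed)

    def log_product_of_smallest(ps):
        # Work on the log scale to avoid underflow of the raw product.
        return sum(math.log(p) for p in sorted(ps)[:k])

    observed = log_product_of_smallest(pvalues)
    hits = 0
    for _ in range(n_sims):
        # 1 - random() lies in (0, 1], which keeps log() finite.
        null_ps = [1.0 - rng.random() for _ in range(n)]
        if log_product_of_smallest(null_ps) <= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_sims + 1)

# Hypothetical inputs: three strong signals among 50 tests, and a null-like set.
strong = [1e-6, 1e-5, 1e-4] + [0.5] * 47
null_like = [(i + 1) / 51 for i in range(50)]
p_strong = rank_truncated_product_pvalue(strong, k=3)
p_null = rank_truncated_product_pvalue(null_like, k=3)
print(p_strong, p_null)
```

Truncating at k focuses the statistic on collective evidence from the top hits rather than on all tests. Correlated tests, such as loci in linkage disequilibrium, would need a null simulation that preserves the correlation, which this sketch does not attempt.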
These methods are useful in studies with multiple tests where the focus is on collective evidence from "top hits". P‑values may originate from association tests for different genetic loci within a particular study; alternatively, they can correspond to association signals for the same locus across different studies. An active area of our research is the development and application of methods for combining heterogeneous signals across studies of related phenotypes.
We continue collaborative studies that involve researchers from NIH and from UNC's Center for Neurosensory Disorders. Our collaborative work strives to incorporate well-defined analytical components: methods that we develop for a particular collaborative project and applications of our current methodological work. For practical use, we developed software that estimates ranking probabilities and the number of tests that would contain true associations with a specified probability, and that supports planning of the discovery and replication stages in large-scale studies. Using this software, we guided a replication study for a genetic association scan of fibromyalgia. The study identified several genetic loci that may be involved in the pathophysiology of fibromyalgia and that represent potential targets for therapeutic action.
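The idea of correcting for population stratification directly from P-values, mentioned in the abstract, can be sketched with classic genomic control. This is not the "generalized genomic control" method itself, whose details the abstract does not give. The sketch assumes each P-value comes from a two-sided test that is equivalent to a 1-df chi-square test, and the function name and simulated inputs are hypothetical.

```python
import random
from statistics import NormalDist, median

CHI2_1DF_MEDIAN = 0.4549364  # median of a chi-square distribution with 1 df

def genomic_control_from_pvalues(pvalues):
    """Classic genomic control applied to P-values.

    Assumption: each P-value corresponds to a 1-df chi-square statistic,
    so chi2 = z**2, where z is the normal quantile at p / 2.
    Returns the inflation factor and the adjusted P-values.
    """
    norm = NormalDist()
    stats = [norm.inv_cdf(p / 2.0) ** 2 for p in pvalues]
    # Inflation factor: observed median over the null median, floored at 1
    # so that P-values are never made more significant.
    lam = max(1.0, median(stats) / CHI2_1DF_MEDIAN)
    adjusted = [2.0 * norm.cdf(-((s / lam) ** 0.5)) for s in stats]
    return lam, adjusted

# Hypothetical inflated study: null z statistics with standard deviation 1.2.
rng = random.Random(1)
inflated = [2.0 * NormalDist().cdf(-abs(rng.gauss(0.0, 1.2))) for _ in range(5000)]
lam, adj = genomic_control_from_pvalues(inflated)
print(f"estimated inflation factor ~ {lam:.2f}")
```

Running the function separately on different subsets of tests would give subset-specific inflation factors, in the spirit of the separate adjustments the abstract describes.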

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Institute of Environmental Health Sciences (NIEHS)
Type: Investigator-Initiated Intramural Research Projects (ZIA)
Project #: 1ZIAES101866-08
Application #: 8553776
Support Year: 8
Fiscal Year: 2012
Total Cost: $638,093

Institution

Name: National Institute of Environmental Health Sciences

Publications

Martin, Loren J; Smith, Shad B; Khoutorsky, Arkady et al. (2017) Epiregulin and EGFR interactions are involved in pain processing. J Clin Invest 127:3353-3366

Vsevolozhskaya, Olga; Ruiz, Gabriel; Zaykin, Dmitri (2017) Bayesian prediction intervals for assessing P-value variability in prospective replication studies. Transl Psychiatry 7:1271

Vsevolozhskaya, Olga A; Kuo, Chia-Ling; Ruiz, Gabriel et al. (2017) The more you test, the more you find: The smallest P-values become increasingly enriched with real findings as more tests are conducted. Genet Epidemiol 41:726-743

Dong, Jing; Wyss, Annah; Yang, Jingyun et al. (2017) Genome-Wide Association Analysis of the Sense of Smell in U.S. Older Adults: Identification of Novel Risk Loci in African-Americans and European-Americans. Mol Neurobiol 54:8021-8032

Shi, Min; O'Brien, Katie M; Sandler, Dale P et al. (2017) Previous GWAS hits in relation to young-onset breast cancer. Breast Cancer Res Treat 161:333-344

O'Brien, Katie M; Shi, Min; Sandler, Dale P et al. (2016) A family-based, genome-wide association study of young-onset breast cancer: inherited variants and maternally mediated effects. Eur J Hum Genet 24:1316-23

Vsevolozhskaya, Olga A; Zaykin, Dmitri V; Barondess, David A et al. (2016) Uncovering Local Trends in Genetic Effects of Multiple Phenotypes via Functional Linear Models. Genet Epidemiol 40:210-221

Vsevolozhskaya, Olga A; Greenwood, Mark C; Powell, Scott L et al. (2015) Resampling-based multiple comparison procedure with application to point-wise testing with functional data. Environ Ecol Stat 22:45-59

Meloto, Carolina B; Segall, Samantha K; Smith, Shad et al. (2015) COMT gene locus: new functional variants. Pain 156:2072-83

Weinberg, Clarice R; Zaykin, Dmitri (2015) Response. J Natl Cancer Inst 107:

Showing the most recent 10 out of 29 publications
