Our current research is motivated by the problem of finding heritable components of common human diseases. Inheritance of common diseases reflects the complex interplay between genetic factors and environmental exposures. A formal way of characterizing the relative importance of genetic components is as the proportion of total phenotypic variability that is due to heritable variation. Identification of variants conferring disease risk has important public health implications. Risk variants discovered by genetic studies may reveal new therapeutic targets for common diseases. Close monitoring of carriers of genetic variants associated with drug efficacy and side effects can be part of a strategy for successful disease management. Carriers of risk variants may respond in a specific way to environmental exposures. To this end, we develop statistical approaches for associating single genetic variants, multiple variants, and haplotypes with disease-related phenotypes. Last year, we proposed to describe probabilistically the location of true (genuine) signals in the list of all results sorted by a measure of statistical significance: a true signal has a particular probability of ranking first, second, third, and so on. This defines the distribution of ranks for true signals. This approach has immediate practical applicability. First, it allows researchers to prioritize the statistical results of a large-scale study. Second, the sample size sufficient for a high proportion of true signals to aggregate among a specified number of top hits can be calculated at the study design stage. Third, for a given sample size, the number of top hits to carry forward into the replication stage can be determined. We continue to advance this research direction. In ongoing research, we address the typical situation of designing a replication study based on top hits selected from a large-scale, discovery-stage study.
The number of top hits is determined in such a way that the proportion of true associations is controlled at a desired level. In related research, we design models for the distribution of P‑values among non-associated variants. Knowledge of this distribution is essential for estimating the proportion of genuine signals. We propose a novel method, the "generalized genomic control", which extends methods of correcting for population stratification in important ways. Our method operates on P‑values and is therefore not tied to a particular statistical test. P‑values from sophisticated approaches, which differ from the simple chi-square tests for which genomic control is available, can be employed and adjusted. It is understood that the amount of distortion due to population stratification is not the same for different loci, and our method allows separate adjustments for different subsets of tests. In a related research direction, we devised methods for estimating the "effect size" distribution in large-scale studies. Such a distribution is of general interest; for example, for a disease phenotype, it provides an approximate number of loci in the genome that carry a certain relative risk. We are also specifically interested in that distribution because its characterization leads to more precise estimates of the proportion of genuine signals among the top hits of a study. The research directions just described take advantage of information contained in measures of statistical significance (P‑values). Collectively, P‑values from many tests considered at once provide information over and above that contained in any particular test statistic. A closely related topic of our research is the problem of combining many P‑values. The P‑value combination methods we developed were motivated by statistical genetics problems, yet they have proved useful outside statistical genetics applications.
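As a point of reference for what combining P‑values means, the sketch below implements Fisher's classical combination test. This is a standard textbook baseline and is not claimed to be one of the methods developed in this project. It assumes independent P‑values; the function name `fisher_combined_pvalue` is hypothetical and chosen only for illustration.

```python
import math


def fisher_combined_pvalue(pvalues):
    """Combine independent P-values with Fisher's classical method.

    Illustrative baseline only (assumes independence); not the project's
    own combination methods.
    """
    if not pvalues:
        raise ValueError("at least one P-value is required")
    if any(p <= 0.0 or p > 1.0 for p in pvalues):
        raise ValueError("P-values must lie in (0, 1]")

    k = len(pvalues)
    # Fisher's statistic is X = -2 * sum(ln p); we work with X / 2.
    half_stat = -sum(math.log(p) for p in pvalues)

    # Under the null, X follows a chi-square distribution with 2k degrees
    # of freedom. For even degrees of freedom the upper tail has the
    # closed form exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!.
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return min(1.0, math.exp(-half_stat) * total)


if __name__ == "__main__":
    # Hypothetical P-values for the same locus from three studies.
    print(fisher_combined_pvalue([0.04, 0.10, 0.07]))
```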
These methods are useful in studies with multiple tests where the focus is on collective evidence from "top hits". P‑values may originate from association tests for different genetic loci within a particular study; alternatively, they can correspond to association signals for the same locus among different studies. An active area of our research is the development and application of methods for combining heterogeneous signals across studies of related phenotypes. We continue collaborative studies that involve researchers from NIH and from UNC's Center for Neurosensory Disorders. Our collaborative work strives to incorporate well-defined analytical components: methods that we develop for a particular collaborative project and applications of our current methodological work. For practical use, we developed software that allows one to estimate ranking probabilities and the number of tests that would contain true associations with a specified probability, as well as to plan discovery and replication stages in large-scale studies. Using this software, we guided a replication study for a genetic association scan of fibromyalgia. The study identified several genetic loci that may be involved in the pathophysiology of fibromyalgia and represent potential targets for therapeutic action.
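The idea of a ranking probability can be illustrated with a small simulation. The sketch below is not the project's software; it rests on a deliberately simplified model that is our assumption: one true signal whose test statistic is normal with mean `mu` and unit variance, `m - 1` independent null statistics that are standard normal, and ranking by two-sided P‑value. Given the true signal's P‑value p, the number of null tests that outrank it is binomial with parameters m - 1 and p, so the probability of landing among the top `k` results is a binomial tail averaged over random draws of the true statistic. All function and parameter names are hypothetical.

```python
import math
import random


def binom_cdf(j_max, n, p):
    """P(Binomial(n, p) <= j_max), computed in log space for stability."""
    if j_max >= n or p <= 0.0:
        return 1.0
    if j_max < 0:
        return 0.0
    if p >= 1.0:
        return 0.0  # all n trials succeed, and j_max < n here
    log_p = math.log(p)
    log_q = math.log1p(-p)
    total = 0.0
    for j in range(j_max + 1):
        log_term = (math.lgamma(n + 1) - math.lgamma(j + 1)
                    - math.lgamma(n - j + 1) + j * log_p + (n - j) * log_q)
        total += math.exp(log_term)
    return min(1.0, total)


def prob_rank_within_top(k, m, mu, n_draws=20000, seed=0):
    """Estimate P(the true signal ranks among the top k of m tests).

    Assumed model (illustrative): one true signal with statistic
    N(mu, 1), m - 1 independent N(0, 1) nulls, two-sided ranking.
    """
    rng = random.Random(seed)
    n_null = m - 1
    total = 0.0
    for _ in range(n_draws):
        t = rng.gauss(mu, 1.0)
        # Two-sided P-value of the true signal's statistic.
        p = math.erfc(abs(t) / math.sqrt(2.0))
        # The signal is in the top k if at most k - 1 nulls beat it.
        total += binom_cdf(k - 1, n_null, p)
    return total / n_draws


if __name__ == "__main__":
    for mu in (2.0, 3.0, 4.0, 5.0):
        print(mu, round(prob_rank_within_top(k=10, m=1000, mu=mu), 3))
```

Under this model, increasing `mu` (for example, through a larger sample size) raises the probability that the true signal appears among the top `k` results, which is the kind of trade-off that matters when choosing how many top hits to carry into replication.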