The genetic makeup of a person can be thought of as shaping their propensity to complex diseases, while environmental factors trigger onset of diseases and, together with genetic factors, can modify their progression. My group's research reflects our continuing involvement in the design and analysis of large-scale genetic and genomic studies. We continue to devise methodology that is useful not only for genetic applications but also for the analysis of other kinds of multidimensional data in which many statistical hypotheses are evaluated simultaneously.
1. Strategies for design of large-scale studies and methods for analysis of top-ranking signals in high-dimensional data. Studies with large numbers of genetic association tests, such as genome-wide association and sequencing studies, are commonly aimed at human diseases that are already known to have heritable components. These studies therefore contain truly associated signals with high likelihood, so testing a family-wise null hypothesis is of little interest. The question is not "if" the data contain genuine signals, but "where" these signals are located among the multitude of tested variants. A particular signal can empirically rank first, second, and so on, and these possible ranks have different probabilities. These ranking probabilities can be evaluated and used to predict the number of true discoveries expected in a study. We continue statistical research pursuing efficient ways to estimate and utilize probabilistic rankings of true signals.
2. Using genomics to better separate the wheat from the chaff. This research aims to develop approaches that exploit expanding genomic data sources, including next-generation sequencing data.
We have begun to design statistical approaches for analyzing somatic mutagenesis in cancer cell line populations, utilizing allelic fractions of different types of mutations and patterns of mutagenesis. This project is starting to produce preliminary results that enable us to pinpoint mutations that, with high probability, occurred simultaneously, based on similarity of allelic fractions, mutation signatures, and co-localization within the cancer genome.
3. Statistical methods for detecting genetic associations with disease for next-generation genomic data and rare variants. Our continuing research on the design of methods for mapping genetic determinants of disease is being extended to accommodate whole-genome sequencing data. One advantage of the sequencing approach is that new rare and low-frequency variants can be assessed, including those that are chiefly carried by subjects with the condition under study. Statistical approaches for association of rare variants are being rapidly developed, despite a major statistical challenge: low power of association tests at each particular rare variant. Improved statistical methods are needed for pooling information across both rare and common variants within genetic regions. We are developing methods based on the functional analysis of variance framework. In contrast with traditional analysis of variance (ANOVA), where fixed group means are compared, functional analysis of variance (FANOVA) compares varying functions where, for example, a group's function may depend on time. In the genetic context, the genomic position of a variant within a gene plays the role of time, i.e., it serves as the argument of the function. The function-valued approaches have a number of attractive features, including smoothing capabilities and an inherent ability to account for linkage disequilibrium among genetic variants. FANOVA appears to be a powerful approach.
Specifically, we compared the performance of our extension of the FANOVA approach with that of other published approaches. We found that FANOVA had considerably greater power than competing methods under the disease models studied, including those where most of the rare variants are deleterious as well as those with a mix of protective and deleterious variants. We will continue to improve the functional approach to enhance its flexibility and power.
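The idea of treating genomic position as the argument of a group-level function can be illustrated with a minimal sketch. It is not the extension of FANOVA described above, whose details are not given here. As a stand-in, it smooths the case-minus-control difference in mean allele dosage across positions with a Gaussian kernel, summarizes the smoothed curve by its mean squared height, and calibrates that statistic by permuting case/control labels. All function names, the bandwidth, the evaluation grid, and the assumption that positions are rescaled to [0, 1] are hypothetical choices made for illustration.

```python
import math
import random

def smooth_curve(positions, values, grid, bandwidth):
    """Gaussian-kernel (Nadaraya-Watson) smoother evaluated on a grid."""
    curve = []
    for g in grid:
        weights = [math.exp(-0.5 * ((p - g) / bandwidth) ** 2) for p in positions]
        total = sum(weights)
        curve.append(sum(w * v for w, v in zip(weights, values)) / total)
    return curve

def mean_dosage(genotypes, rows):
    """Per-variant mean allele dosage over the selected subjects."""
    n_variants = len(genotypes[0])
    return [sum(genotypes[i][j] for i in rows) / len(rows) for j in range(n_variants)]

def curve_distance(genotypes, labels, positions, grid, bandwidth):
    """Mean squared smoothed difference between case and control dosage curves."""
    cases = [i for i, y in enumerate(labels) if y == 1]
    controls = [i for i, y in enumerate(labels) if y == 0]
    diff = [a - b for a, b in zip(mean_dosage(genotypes, cases),
                                  mean_dosage(genotypes, controls))]
    # Squaring keeps protective and deleterious effects from cancelling.
    smoothed = smooth_curve(positions, diff, grid, bandwidth)
    return sum(d * d for d in smoothed) / len(grid)

def permutation_pvalue(genotypes, labels, positions,
                       n_perm=200, bandwidth=0.1, seed=0):
    """Permutation p-value for the region-level statistic.

    genotypes: list of per-subject lists of allele dosages (0, 1, or 2).
    labels:    1 for cases, 0 for controls (both groups must be non-empty).
    positions: variant positions, assumed rescaled to [0, 1].
    """
    rng = random.Random(seed)
    grid = [k / 20 for k in range(21)]
    observed = curve_distance(genotypes, labels, positions, grid, bandwidth)
    shuffled = list(labels)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if curve_distance(genotypes, shuffled, positions, grid, bandwidth) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)
```

Because the statistic pools information across neighboring variants before testing, a cluster of individually weak rare-variant signals can contribute jointly. A full functional approach would typically use basis expansions and model-based inference rather than this kernel-and-permutation shortcut.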

Support Year: 10
Fiscal Year: 2014
Name: U.S. National Inst of Environ Hlth Scis
Martin, Loren J; Smith, Shad B; Khoutorsky, Arkady et al. (2017) Epiregulin and EGFR interactions are involved in pain processing. J Clin Invest 127:3353-3366
Vsevolozhskaya, Olga; Ruiz, Gabriel; Zaykin, Dmitri (2017) Bayesian prediction intervals for assessing P-value variability in prospective replication studies. Transl Psychiatry 7:1271
Vsevolozhskaya, Olga A; Kuo, Chia-Ling; Ruiz, Gabriel et al. (2017) The more you test, the more you find: The smallest P-values become increasingly enriched with real findings as more tests are conducted. Genet Epidemiol 41:726-743
Dong, Jing; Wyss, Annah; Yang, Jingyun et al. (2017) Genome-Wide Association Analysis of the Sense of Smell in U.S. Older Adults: Identification of Novel Risk Loci in African-Americans and European-Americans. Mol Neurobiol 54:8021-8032
Shi, Min; O'Brien, Katie M; Sandler, Dale P et al. (2017) Previous GWAS hits in relation to young-onset breast cancer. Breast Cancer Res Treat 161:333-344
O'Brien, Katie M; Shi, Min; Sandler, Dale P et al. (2016) A family-based, genome-wide association study of young-onset breast cancer: inherited variants and maternally mediated effects. Eur J Hum Genet 24:1316-23
Vsevolozhskaya, Olga A; Zaykin, Dmitri V; Barondess, David A et al. (2016) Uncovering Local Trends in Genetic Effects of Multiple Phenotypes via Functional Linear Models. Genet Epidemiol 40:210-221
Vsevolozhskaya, Olga A; Greenwood, Mark C; Powell, Scott L et al. (2015) Resampling-based multiple comparison procedure with application to point-wise testing with functional data. Environ Ecol Stat 22:45-59
Meloto, Carolina B; Segall, Samantha K; Smith, Shad et al. (2015) COMT gene locus: new functional variants. Pain 156:2072-83
Weinberg, Clarice R; Zaykin, Dmitri (2015) Response. J Natl Cancer Inst 107:

Showing the most recent 10 out of 29 publications