The genetic makeup of a person can be thought of as shaping their propensity to complex diseases, while environmental factors trigger disease onset and, together with genetic factors, can modify disease progression. The research of my group reflects our continuing involvement in the design and analysis of large-scale genetic and genomic studies. We continue to devise methodology that is useful not only for genetic applications but also for the analysis of other kinds of multidimensional data in which many statistical hypotheses are evaluated simultaneously.
1. Strategies for design of large-scale studies and methods for analysis of top-ranking signals in high-dimensional data. Studies with large numbers of genetic association tests, such as genome-wide association and sequencing studies, are commonly aimed at human diseases that are already known to have heritable components. These studies therefore contain truly associated signals with high likelihood, which implies that testing a family-wise null hypothesis is of little interest. The question is not "if" the data contain genuine signals, but "where" these signals are located among the multitude of tested variants. A particular signal can empirically rank first, second, and so on, and these possible ranks have different probabilities. These ranking probabilities can be evaluated and used to predict the number of true discoveries expected in a study. We continue statistical research pursuing efficient ways to estimate and utilize probabilistic rankings of true signals.
2. Using genomics to better separate the wheat from the chaff. This research aims to develop approaches that exploit expanding genomic data sources, including next-generation sequencing data.
We have begun to design statistical approaches for analyzing somatic mutagenesis in cancer cell line populations, utilizing allelic fractions of different types of mutations and patterns of mutagenesis. This project is starting to produce preliminary results that enable us to pinpoint mutations that, with high probability, occurred simultaneously, based on similarity of allelic fractions, mutation signatures, and co-localization within the cancer genome.
3. Statistical methods for detecting genetic associations with disease for next-generation genomic data and rare variants. Our continuing research on methods for mapping genetic determinants of disease is being extended to accommodate whole-genome sequencing data. One advantage of the sequencing approach is that new rare and low-frequency variants can be assessed, including those that are chiefly carried by subjects with the condition under study. Statistical approaches for association of rare variants are being rapidly developed, despite a major statistical challenge: low power of association tests at each particular rare variant. Improved statistical methods are needed for pooling information across both rare and common variants within genetic regions. We are developing methods based on the functional analysis of variance framework. In contrast with traditional analysis of variance (ANOVA), where fixed group means are compared, functional analysis of variance (FANOVA) compares varying functions where, for example, a group's function may depend on time. In the genetic context, the genomic position of a variant within a gene plays the role of time, i.e., it serves as the argument of the function. Function-valued approaches have a number of attractive features, including smoothing capabilities and an inherent ability to account for linkage disequilibrium among genetic variants. FANOVA appears to be a powerful approach.
Specifically, we compared the performance of our extension of the FANOVA approach with that of other published approaches. We found that FANOVA had considerably greater power than competing methods under the disease models studied, including those where most of the rare variants are deleterious as well as those with a mix of protective and deleterious variants. We will further improve the functional approach to enhance its flexibility and power.
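To make the function-valued idea concrete, the following is a minimal sketch and not our FANOVA extension itself. It treats each subject's genotypes across a region as a curve indexed by genomic position, smooths the curve by projecting it onto a small Fourier basis, and compares the mean curves of cases and controls with a permutation test. The function names, the choice of a Fourier basis, the number of basis functions, and the sum-of-squares statistic are all assumptions made for illustration.

```python
import numpy as np


def basis_matrix(positions, n_basis=5):
    """Evaluate a Fourier basis at variant positions rescaled to [0, 1].

    Illustrative choice only; any smooth basis could be substituted.
    """
    positions = np.asarray(positions, dtype=float)
    t = (positions - positions.min()) / (positions.max() - positions.min())
    cols = [np.ones_like(t)]
    k = 1
    while len(cols) < n_basis:
        cols.append(np.sin(2 * np.pi * k * t))
        if len(cols) < n_basis:
            cols.append(np.cos(2 * np.pi * k * t))
        k += 1
    return np.column_stack(cols)  # shape: (n_variants, n_basis)


def subject_coefficients(genotypes, basis):
    """Least-squares projection of each subject's genotype profile onto the basis.

    genotypes has shape (n_subjects, n_variants); the result has shape
    (n_basis, n_subjects).
    """
    coefs, *_ = np.linalg.lstsq(basis, np.asarray(genotypes, dtype=float).T, rcond=None)
    return coefs


def region_statistic(coefs, is_case):
    """Sum of squared differences between case and control mean coefficients."""
    diff = coefs[:, is_case].mean(axis=1) - coefs[:, ~is_case].mean(axis=1)
    return float(np.sum(diff ** 2))


def permutation_pvalue(genotypes, is_case, positions, n_basis=5, n_perm=999, seed=0):
    """Region-level permutation p-value for a case/control difference in mean curves."""
    rng = np.random.default_rng(seed)
    is_case = np.asarray(is_case, dtype=bool)
    coefs = subject_coefficients(genotypes, basis_matrix(positions, n_basis))
    observed = region_statistic(coefs, is_case)
    exceed = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(is_case)  # shuffle case/control labels
        if region_statistic(coefs, shuffled) >= observed:
            exceed += 1
    # The +1 terms count the observed labeling as one of the permutations.
    return (exceed + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Simulated data: 200 subjects, 30 mostly rare variants, no true association.
    rng = np.random.default_rng(42)
    positions = np.sort(rng.uniform(0, 50_000, size=30))
    genotypes = rng.binomial(2, 0.02, size=(200, 30))
    is_case = np.arange(200) < 100
    print(permutation_pvalue(genotypes, is_case, positions))
```

Because the basis has far fewer terms than there are variants, information is pooled across neighboring rare and common variants rather than tested one variant at a time, which is the motivation stated above. A practical analysis would also need covariate adjustment and a principled choice of basis and smoothing, which this sketch omits.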