The main theme of this research is haplotype, multilocus, and general genetic association methods, along with statistical issues that arise in large-scale genetic data analysis, such as genome-wide association scans. We have been developing methods to characterize and combine genetic association signals within and between studies, and have researched approaches to the design of the discovery and replication phases of analysis.

In recent years, genome-wide association studies have uncovered a large number of susceptibility variants. Without replication, large-scale studies provide only tentative evidence of association, and follow-up studies focusing on top hits are required to establish their validity. The number of top hits to carry forward into the replication step is often determined ad hoc. We developed a novel statistical approach based on controlling the proportion of genuine associations among a specified number of top hits. This approach is useful for designing large-scale studies and for selecting promising results for follow-up. We derived accurate approximations for the probabilities governing the rankings of true positives in a study. Using these approximations, we were able to accommodate genomic linkage disequilibrium and evaluate its influence on the rankings of true positives. Based on our approach, a study-specific proportion of true associations among top hits can be estimated from P-values, and its expected value can be predicted for study design applications.

While we primarily work with samples of unrelated individuals, some of our research has also investigated methods for estimating relative risk for imprecisely scored genotypes using family data (in collaboration with Drs. Weinberg and London). Ongoing research includes the development of statistical approaches to estimate the distribution of effect sizes using information extracted from the top association signals in a large-scale study. Characterizing this distribution will be useful in a variety of statistical genetics applications, including estimation of false discovery rates and of the individual probability that an association signal is a true positive.
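The quantity at the center of this approach, the proportion of genuine associations among a fixed number of top hits, can be illustrated with a small simulation. The sketch below is not the analytical approximation described above. It is a brute-force Monte Carlo stand-in that assumes independent tests (no linkage disequilibrium), normally distributed test statistics, and a single shared effect size for all true signals. The function name and all parameters are hypothetical and chosen only for illustration.

```python
import random


def simulate_top_hit_proportion(n_tests=2000, n_true=20, effect=3.0,
                                top_k=20, n_reps=50, seed=1):
    """Monte Carlo estimate of the expected proportion of true
    associations among the top_k ranked results.

    Assumptions (illustrative only): tests are independent, null
    statistics are N(0, 1), and every true signal is N(effect, 1).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_reps):
        # Absolute Z statistics; ranking by |Z| is equivalent to
        # ranking by two-sided P-value under these assumptions.
        stats = [(abs(rng.gauss(effect, 1.0)), True) for _ in range(n_true)]
        stats += [(abs(rng.gauss(0.0, 1.0)), False)
                  for _ in range(n_tests - n_true)]
        stats.sort(key=lambda pair: pair[0], reverse=True)
        n_genuine = sum(1 for _, is_true in stats[:top_k] if is_true)
        total += n_genuine / top_k
    return total / n_reps


if __name__ == "__main__":
    # Compare how the expected proportion changes with the list length.
    for k in (5, 20, 100):
        prop = simulate_top_hit_proportion(top_k=k)
        print(f"top {k:>3}: estimated proportion of true hits = {prop:.2f}")
```

Under these assumptions, varying `top_k` for a given study size and effect size shows the trade-off a replication design has to balance: a longer list of top hits captures more true signals but dilutes the proportion of them that are genuine.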