The goal of this proposal is to develop improved methods for statistical inference from data arising in genomic studies, specifically from microarray platforms. Statistical algorithms, particularly those based on Markov chain Monte Carlo (MCMC), have become widely used in data analysis in all fields. They are especially prevalent in genomic studies, in part because of the enormous amount of data collected and the ability of these algorithms to handle complex models. We address three specific aims:
Specific Aim 1: Develop missing data methods applicable to SNP association genetics. In studies that seek to associate a quantitative trait with SNPs, it is typical to collect information on a large number of SNPs. Because this information is typically incomplete, we must deal with missing data, which causes two difficulties: (i) accurate modeling must take into account the SNP correlation structure, which causes problems for standard missing data methods, and (ii) the large number of SNPs brings computational and statistical problems. We are developing a Gibbs sampler that shows great promise in allowing efficient estimation of SNP effects in these problems.
Specific Aim 2: Clustering and classification methods for time-course microarray data. We continue our development of clustering methods for time-course data based on Bayesian hierarchical models and a Metropolis-Hastings search algorithm, with the specific goal of developing a new classifier that associates clusters, or gene patterns, with clinical outcomes.
Specific Aim 3: Testing for the existence of clusters. Although there are many methods for clustering data, there are few methods for assessing whether the clusters are significant. We propose a Bayesian model selection methodology to derive a test for the existence of clusters. As many phenotypes show quantitative variation, detection of clusters is a preliminary step that would suggest further genomic analysis to determine the existence of SNPs controlling the observed quantitative traits.
The methods to be developed are motivated by a number of studies that promise to have an impact on disease management. In particular, we plan to apply our missing data methods to a SNP discovery data set from lupus patients to find associations between SNPs and disease status. Our gene-based classifier can aid physicians in managing the treatment of trauma patients, and the proposed cluster test can provide a screening tool to identify data with possible genetic associations.
León-Novelo, Luis G; Müller, Peter; Arap, Wadih et al. (2013) Bayesian decision theoretic multiple comparison procedures: an application to phage display data. Biom J 55:478-89
León-Novelo, Luis G; Müller, Peter; Arap, Wadih et al. (2013) Semiparametric Bayesian inference for phage display data. Biometrics 69:174-83
León-Novelo, Luis; Kemppainen, Kaisa M; Ardissone, Alexandria et al. (2013) Two applications of permutation tests in biostatistics. Bol Soc Mat Mex 19:255-266
Graze, R M; Novelo, L L; Amin, V et al. (2012) Allelic imbalance in Drosophila hybrid heads: exons, isoforms, and evolution. Mol Biol Evol 29:1521-32
Leon-Novelo, Luis; Moreno, Elias; Casella, George (2012) Objective Bayes model selection in probit models. Stat Med 31:353-65
Yang, Jie; Casella, George; McIntyre, Lauren M (2011) Generalized shrinkage F-like statistics for testing an interaction term in gene expression analysis in the presence of heteroscedasticity. BMC Bioinformatics 12:427
Giongo, Adriana; Crabb, David B; Davis-Richardson, Austin G et al. (2010) PANGEA: pipeline for analysis of next generation amplicons. ISME J 4:852-61
Joo, Yongsung; Casella, G; Hobert, J (2010) Bayesian model-based tight clustering for time course data. Comput Stat 25:17-38
Verhoeven, Koen J F; Casella, George; McIntyre, Lauren M (2010) Epistasis: obstacle or advantage for mapping complex traits? PLoS One 5:e12264
Fuentes, Claudio; Casella, George (2009) Testing for the existence of clusters. Sort (Barc) 33:115-157