Statistical Algorithms for Valid Inferences in Genomic Data

Young, Linda

Abstract

The goal of this proposal is to develop improved methods for statistical inference from data arising in genomic studies, specifically from microarray platforms. Statistical algorithms, particularly those based on Markov chain Monte Carlo (MCMC), have become widely used in data analysis in all fields. In applications to genomic studies they have become particularly prevalent, in part because of the enormous amount of data collected and the ability of these algorithms to handle complex models. We address three specific aims:
Specific Aim 1: Develop missing data methods applicable to SNP association genetics. In this process, where one is looking to associate a quantitative trait with SNPs, it is typical to get information on a large number of SNPs. As the information is typically not complete, we must deal with missing data, which causes two difficulties: (i) accurate modeling must take into account the SNP correlation structure, which causes problems for standard missing data methods, and (ii) the large number of SNPs brings along computational and statistical problems. We are developing a Gibbs sampler that shows great promise in allowing efficient estimation of SNP effects in these problems.
Specific Aim 2: Clustering and classification methods for time-course microarray data. We continue our development of clustering methods for time-course data based on Bayesian hierarchical models and a Metropolis-Hastings search algorithm, with the specific goal of developing a new classifier that associates clusters, or gene patterns, with clinical outcomes.
Specific Aim 3: Testing for the existence of clusters. Although there are many methods for clustering data, there are few methods for assessing whether the clusters are significant. We propose a Bayesian model selection methodology to derive a test for the existence of clusters. As many phenotypes show quantitative variation, detection of clusters is a preliminary step that would suggest further genomic analysis to determine the existence of SNPs controlling the observed quantitative traits.
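
To make the idea behind Specific Aim 1 concrete, the following is a minimal sketch of a Gibbs sampler that treats missing genotypes as unknowns and updates them alongside the SNP effect. It is not the method developed under this grant. It assumes a single SNP coded 0/1/2, a simple linear model for the trait, genotype frequencies that are known in advance, flat priors on the intercept and effect, and a prior on the error variance proportional to its reciprocal. It ignores the SNP correlation structure that the abstract identifies as the main difficulty. The function name `gibbs_snp_effect` and all of its parameters are hypothetical.

```python
import math
import random


def gibbs_snp_effect(y, g, geno_freq, n_iter=2000, burn_in=500, seed=0):
    """Toy Gibbs sampler for y_i = mu + beta * g_i + e_i, e_i ~ N(0, sigma2).

    g holds genotypes coded 0/1/2, with None marking a missing call.
    geno_freq holds assumed-known, strictly positive prior probabilities
    for genotypes 0, 1 and 2. Returns the posterior mean of beta.
    """
    if not any(gi for gi in g if gi is not None):
        raise ValueError("need at least one observed non-zero genotype")
    rng = random.Random(seed)
    n = len(y)
    missing = [i for i, gi in enumerate(g) if gi is None]
    # Initialise each missing genotype with a draw from the prior frequencies.
    g = [gi if gi is not None else rng.choices((0, 1, 2), weights=geno_freq)[0]
         for gi in g]
    mu, beta, sigma2 = sum(y) / n, 0.0, 1.0
    beta_draws = []
    for it in range(n_iter):
        # mu | rest: normal, centred on the mean residual after removing beta * g.
        mean_mu = sum(yi - beta * gi for yi, gi in zip(y, g)) / n
        mu = rng.gauss(mean_mu, math.sqrt(sigma2 / n))
        # beta | rest: normal, from regressing (y - mu) on g through the origin.
        sgg = sum(gi * gi for gi in g)
        mean_beta = sum(gi * (yi - mu) for yi, gi in zip(y, g)) / sgg
        beta = rng.gauss(mean_beta, math.sqrt(sigma2 / sgg))
        # sigma2 | rest: inverse gamma with shape n/2 and scale SSE/2.
        sse = sum((yi - mu - beta * gi) ** 2 for yi, gi in zip(y, g))
        sigma2 = 1.0 / rng.gammavariate(n / 2.0, 2.0 / sse)
        # Each missing g_i | rest: discrete, prior frequency times likelihood.
        for i in missing:
            logw = [math.log(p) - (y[i] - mu - beta * k) ** 2 / (2.0 * sigma2)
                    for k, p in enumerate(geno_freq)]
            top = max(logw)  # subtract the maximum to avoid underflow
            weights = [math.exp(lw - top) for lw in logw]
            g[i] = rng.choices((0, 1, 2), weights=weights)[0]
        if it >= burn_in:
            beta_draws.append(beta)
    return sum(beta_draws) / len(beta_draws)
```

Calling `gibbs_snp_effect(y, g, (0.25, 0.5, 0.25))` with a list of trait values `y` and a genotype list `g` containing `None` for missing calls returns an estimate of the SNP effect. Extending such a sketch to many correlated SNPs would require a joint model for the genotypes, which is where the difficulties described in Aim 1 arise.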

Public Health Relevance

The methods that will be developed are motivated by a number of studies that promise to have an impact on disease management. In particular, we look to apply our missing data methods to a SNP discovery data set from lupus patients to find associations between SNPs and disease status, and our gene-based classifier can aid physicians in managing the treatment of trauma patients. The proposed cluster test can provide a screening tool to identify data with possible genetic associations, which can then be followed up with further genomic analysis.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Institute of General Medical Sciences (NIGMS)
Type: Research Project (R01)
Project #: 5R01GM081704-03
Application #: 7924856
Study Section: Biostatistical Methods and Research Design Study Section (BMRD)
Program Officer: Brazhnik, Paul

Project Start: 2008-09-01
Project End: 2013-02-28
Budget Start: 2010-09-01
Budget End: 2013-02-28
Support Year: 3
Fiscal Year: 2010
Total Cost: $168,864
Indirect Cost

Institution

Name: University of Florida
Department: Biostatistics & Other Math Sci
Type: Schools of Arts and Sciences
DUNS #: 969663814

City: Gainesville
State: FL
Country: United States
Zip Code: 32611

Related projects


NIH 2010 R01 GM	Statistical Algorithms for Valid Inferences in Genomic Data	Young, Linda J. / University of Florida	$168,864
NIH 2009 R01 GM	Statistical Algorithms for Valid Inferences in Genomic Data	Casella, George / University of Florida	$145,105
NIH 2008 R01 GM	Statistical Algorithms for Valid Inferences in Genomic Data	Casella, George / University of Florida	$145,105

Publications

León-Novelo, Luis G; Müller, Peter; Arap, Wadih et al. (2013) Bayesian decision theoretic multiple comparison procedures: an application to phage display data. Biom J 55:478-89

León-Novelo, Luis G; Müller, Peter; Arap, Wadih et al. (2013) Semiparametric Bayesian inference for phage display data. Biometrics 69:174-83

León-Novelo, Luis; Kemppainen, Kaisa M; Ardissone, Alexandria et al. (2013) Two applications of permutation tests in biostatistics. Bol Soc Mat Mex 19:255-266

Graze, R M; Novelo, L L; Amin, V et al. (2012) Allelic imbalance in Drosophila hybrid heads: exons, isoforms, and evolution. Mol Biol Evol 29:1521-32

León-Novelo, Luis; Moreno, Elias; Casella, George (2012) Objective Bayes model selection in probit models. Stat Med 31:353-65

Yang, Jie; Casella, George; McIntyre, Lauren M (2011) Generalized shrinkage F-like statistics for testing an interaction term in gene expression analysis in the presence of heteroscedasticity. BMC Bioinformatics 12:427

Joo, Yongsung; Casella, G; Hobert, J (2010) Bayesian model-based tight clustering for time course data. Comput Stat 25:17-38

Verhoeven, Koen J F; Casella, George; McIntyre, Lauren M (2010) Epistasis: obstacle or advantage for mapping complex traits? PLoS One 5:e12264

Giongo, Adriana; Crabb, David B; Davis-Richardson, Austin G et al. (2010) PANGEA: pipeline for analysis of next generation amplicons. ISME J 4:852-61

Fuentes, Claudio; Casella, George (2009) Testing for the existence of clusters. Sort (Barc) 33:115-157

Showing the most recent 10 out of 12 publications
