This project has two distinct parts, each motivated by problems of inference in genomic experiments. The first problem arises because, typically, thousands of genes are screened and a smaller number are selected for further study. Statistical inference must take this selection mechanism into account; otherwise the actual confidence coefficient is smaller than the nominal level and approaches zero as the number of genes increases. The goal is to construct valid frequentist confidence intervals for the means of the selected populations, providing a confidence-interval alternative to the False Discovery Rate. The second problem concerns inference under model uncertainty, where the goal is to account for the variability induced by the collection of models. Here a Bayesian approach is taken, with three aims: to construct intervals that account for model uncertainty, to investigate the impact of the choice of priors on the model space, and to construct new search algorithms that take advantage of parallel processing and can be used when there are more covariates than observations.
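The loss of coverage described for the first problem can be illustrated with a small sketch. It is a toy setup chosen for illustration, not the project's procedure: it assumes k genes whose true means are all 0, one independent N(0, 1) observation per gene, selection of the gene with the largest observation, and a naive interval of the form x_max ± z that ignores the selection. The function names `naive_coverage_exact` and `naive_coverage_sim` are hypothetical.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution, N(0, 1)


def naive_coverage_exact(k, level=0.95):
    """Coverage of the naive interval x_max +/- z for the selected mean.

    Assumes (toy setup) all k true means are 0 and X_i ~ N(0, 1) independently,
    so the interval covers the true mean 0 exactly when -z <= max X_i <= z.
    """
    z = STD_NORMAL.inv_cdf(0.5 + level / 2)
    # P(max <= z) - P(max < -z) = Phi(z)^k - Phi(-z)^k
    return STD_NORMAL.cdf(z) ** k - STD_NORMAL.cdf(-z) ** k


def naive_coverage_sim(k, level=0.95, reps=5000, seed=0):
    """Monte Carlo estimate of the same coverage, as a check on the formula."""
    rng = random.Random(seed)
    z = STD_NORMAL.inv_cdf(0.5 + level / 2)
    hits = 0
    for _ in range(reps):
        # Select the gene with the largest observation, ignoring the selection.
        x_max = max(rng.gauss(0.0, 1.0) for _ in range(k))
        hits += abs(x_max) <= z  # true mean of every gene is 0
    return hits / reps


if __name__ == "__main__":
    for k in (1, 10, 100, 1000):
        print(k, naive_coverage_exact(k))
```

Under these assumptions the coverage equals the nominal level only when k = 1 and shrinks toward zero as k grows, which is the behavior the abstract describes. Other configurations of the true means would give different numbers.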

The work will have impact in both genomic studies and high-performance computing. First, for inference from genomic studies, a valid statistical procedure to screen results will be provided. Ensuring that the inferences are valid is of crucial importance, as illustrated by a recent NY Times article describing a genomic disease therapy that was found to be useless because of faulty statistical inference ("How Bright Promise in Cancer Testing Fell Apart", NY Times, July 7, 2011). Second, parallel processing algorithms using high-performance computing will be developed. These algorithms take advantage of the abundance of processors typically available and split the large genomic selection problem across them. As a result, answers from these statistical procedures can be available in real time and thus be relevant in a clinical setting.

Agency: National Science Foundation (NSF)
Institute: Division of Mathematical Sciences (DMS)
Type: Standard Grant (Standard)
Application #: 1105127
Program Officer: Gabor Szekely
Budget Start: 2011-09-15
Budget End: 2015-08-31
Fiscal Year: 2011
Total Cost: $172,353
Name: University of Florida
City: Gainesville
State: FL
Country: United States
Zip Code: 32611