This project has two distinct parts, each motivated by problems of inference in genomic experiments. The first problem arises because, typically, thousands of genes are screened and a smaller number are selected for further study. Statistical inference must take this selection mechanism into account; otherwise the actual confidence coefficient falls below the nominal level and approaches zero as the number of genes increases. The goal is to construct valid frequentist confidence intervals for the means of the selected populations, providing a confidence-interval alternative to the False Discovery Rate. The second problem deals with inference under model uncertainty, where the goal is to account for the variability induced by the collection of models. Here a Bayesian approach is taken, with three aims: to construct intervals that account for model uncertainty, to investigate the impact of the choice of priors on the model space, and to develop new search algorithms that take advantage of parallel processing and remain usable when there are more covariates than observations.
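The selection effect described above can be made concrete with a small simulation. The sketch below (illustrative only; the function names are not part of the project) screens `n_genes` independent standard-normal measurements whose true means are all zero, selects the largest, and checks how often the naive 95% interval around the selected value covers its true mean. As the number of screened genes grows, the empirical coverage drops well below the nominal 95%.

```python
import random

def coverage_after_selection(n_genes, n_reps=5000, z=1.96, seed=1):
    """Empirical coverage of the naive 95% interval x_max +/- z for the
    mean of the gene selected as largest, when every true mean is 0."""
    rng = random.Random(seed)
    covered = 0
    for _ in range(n_reps):
        # screen n_genes measurements and keep the most extreme one
        x_max = max(rng.gauss(0.0, 1.0) for _ in range(n_genes))
        # the selected population's true mean is still 0
        if x_max - z <= 0.0 <= x_max + z:
            covered += 1
    return covered / n_reps

for k in (1, 10, 100):
    print(k, round(coverage_after_selection(k), 3))
```

With a single gene there is no selection and the interval behaves as advertised; selecting the maximum of 100 genes leaves the same interval covering the true mean only a small fraction of the time, which is the undercoverage the proposed selection-adjusted intervals are meant to repair.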
The work will have impact in both genomic studies and high performance computing. First, for inference from genomic studies, a valid statistical procedure to screen results will be provided. Ensuring that the inferences are valid is of crucial importance, as illustrated by a recent NY Times article in which a genomic disease therapy was found to be useless because of faulty statistical inference (``How Bright Promise in Cancer Testing Fell Apart,'' NY Times, July 7, 2011). Second, parallel processing algorithms, using high performance computing, will be developed. These algorithms take advantage of the abundance of processors typically available, splitting the large genomic selection problem across the many processors. As a result, answers from these statistical procedures can be available in real time, and thus be relevant in a clinical setting.
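Because the screening step scores each gene independently, the split across processors described above is embarrassingly parallel. A minimal sketch of the idea, assuming a hypothetical per-gene score (a one-sample t-type statistic standing in for whatever test the real procedure uses):

```python
import statistics
from concurrent.futures import ProcessPoolExecutor

def gene_score(expr):
    """Screening statistic for one gene: a one-sample t-type score.
    Hypothetical stand-in for the project's actual test statistic."""
    m = statistics.fmean(expr)
    s = statistics.stdev(expr)
    return m / (s / len(expr) ** 0.5)

def screen_genes(genes, workers=4):
    """Score every gene, farming chunks of the gene list out to worker
    processes so wall-clock time shrinks with the processor count."""
    chunk = max(1, len(genes) // workers)
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(gene_score, genes, chunksize=chunk))

if __name__ == "__main__":
    genes = [[0.1, 0.3, -0.2, 0.5, 0.2], [1.0, 1.2, 0.8, 1.1, 0.9]]
    print(screen_genes(genes, workers=2))
```

The parallel and serial paths compute identical scores; the only design choice is the chunk size, which trades scheduling overhead against load balance across processors.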