The overall objective of this research proposal is to develop new Bayesian methodologies for the analysis of data that arise in genomics. Of particular interest are situations where a large number of variables is available and selection of a predictive subset is one of the goals. The theoretical developments we propose are motivated by a variety of studies, some conducted by our biomedical collaborators, using DNA microarray technologies. One of the goals of this project is to contribute novel theoretical developments in variable and feature selection in statistics. Another goal is to provide the biomedical community with sound methods for the analysis of high-dimensional data. The identification of important biomarkers will provide a better understanding of the molecular mechanisms involved in specific diseases, and will in turn improve diagnosis, drug development, and treatment of patients.
The specific aims of our proposed research are:
1. Clustering of High-Dimensional Data: We will develop novel Bayesian methods for simultaneously clustering experimental units and identifying the variables that best discriminate the different groups.
2. Analysis of High-Dimensional Data with Censored Survival Outcomes: We will investigate novel methods for variable selection in parametric survival models. The methods will lead to estimates of survival and to the identification of the predictive variables.
3. Application to Microarray Studies: We will apply the methods of Specific Aims #1 and #2 to a series of biomedical studies involving microarray data. These include studies on rheumatoid arthritis and osteoarthritis, and on adult acute lymphoblastic leukemia.
4. Application to Proteomic Data: We will adapt our methodologies to the problem of extracting important features in proteomics data, incorporating wavelet-based dimension reduction techniques.
5. Software Development: We will develop statistical software and make it available to the public.