In the past few years, the amount of data available to biomedical research has increased dramatically. One example is the recent advance of high-throughput biotechnologies, which makes genome-wide gene expression data accessible. To address biomedical questions at the molecular level, it is essential to extract the relevant information from massive data with complex structure. This calls for advanced methods of statistical prediction and inference, especially in genomic discovery and prediction, where the statistical uncertainty involved in the discovery process is high. The proposed research focuses on developing mixture model-based and large margin approaches to semisupervised and unsupervised learning, motivated by biomedical studies in gene discovery and prediction. In particular, we propose to investigate how to improve the accuracy and efficiency of mixture model-based and large margin learning systems in generalization. In addition, we will develop innovative methods that take sparseness structure and the grouping effect into account to battle the curse of dimensionality, and blend them with the new learning tools. A number of technical issues will be investigated, including: a) developing model selection criteria and performing automatic feature selection, especially when the number of features greatly exceeds the number of samples; b) developing large margin approaches for multi-class learning, with most effort directed towards sparse as well as structured learning; c) implementing efficient computation for real-time applications; d) analyzing two biological datasets for i) gene function discovery and prediction for E. coli, and ii) new class discovery and prediction for BOEC samples; and e) developing public-domain software. Furthermore, computational strategies will be explored based on global optimization techniques, particularly convex programming and difference convex programming.