Contemporary data-driven problems in science and engineering require statistical methods that remain computationally feasible without compromising statistical accuracy. Data quality, particularly heterogeneity in data measurements, is a critical factor affecting statistical accuracy in the analysis of large datasets. This project will explore and demonstrate the impact and feasibility of simultaneously improving computational and statistical performance for Big Data problems with massive datasets. The research will advance the state of knowledge in predictive statistical learning with Big Data and will be valuable in applications such as financial risk management, commercial operations that employ recommender systems, biology, and image analysis.
A key idea motivating this project is that certain refined ensemble methods, combined with random projections, can enable fast analysis of massive data while simultaneously enhancing statistical performance. Specifically, the aims of the project are to: (1) Develop new classification methods based on random projections and the random forest. With appropriately defined projections, the proposed method is shown to improve statistical accuracy for massive datasets that contain a large number of irrelevant, noisy measurements. The theoretical properties of this method will be analyzed, and an adaptive version of the algorithm will be developed to optimize the gains in computational and statistical efficiency; (2) Propose boosting algorithms with random projections. The statistical properties, practical performance, and implementation of the proposed randomly projected boosting algorithms will be investigated; (3) Develop classification methods for data with heterogeneities. A classification method that uses the weighted bootstrap and ensemble learning will be developed to handle heterogeneity or covariate shift in the measurements of large datasets. Random projections will then be applied to improve the proposed method for high-dimensional datasets.
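The general idea behind aim (1) can be illustrated with a minimal sketch. It is not the project's method: a nearest-centroid rule stands in for the trees of a random forest, the projections are plain Gaussian matrices rather than the "appropriate projections" the project proposes, and every name and default (`fit_ensemble`, `predict`, `k`, `n_members`) is hypothetical. Each ensemble member draws its own random projection and bootstrap sample, fits a base classifier in the projected space, and the members vote.

```python
import random


def random_projection(d, k, rng):
    """Draw a k x d Gaussian matrix with entries scaled by 1/sqrt(k)."""
    scale = 1.0 / k ** 0.5
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(d)] for _ in range(k)]


def project(P, x):
    """Map a d-dimensional point x to k dimensions via the matrix P."""
    return [sum(p * v for p, v in zip(row, x)) for row in P]


def fit_centroids(Z, y):
    """Base learner (a stand-in for a tree): one mean vector per class."""
    sums, counts = {}, {}
    for z, label in zip(Z, y):
        if label not in sums:
            sums[label] = [0.0] * len(z)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], z)]
        counts[label] += 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def predict_centroid(centroids, z):
    """Return the class whose centroid is closest to z."""
    return min(
        centroids,
        key=lambda c: sum((a - b) ** 2 for a, b in zip(centroids[c], z)),
    )


def fit_ensemble(X, y, k=3, n_members=25, seed=0):
    """Fit n_members base learners, each on its own projection and bootstrap."""
    rng = random.Random(seed)
    d = len(X[0])
    members = []
    for _ in range(n_members):
        P = random_projection(d, k, rng)
        idx = [rng.randrange(len(X)) for _ in X]  # bootstrap resample
        Z = [project(P, X[i]) for i in idx]
        members.append((P, fit_centroids(Z, [y[i] for i in idx])))
    return members


def predict(members, x):
    """Classify x by majority vote over the ensemble members."""
    votes = {}
    for P, centroids in members:
        label = predict_centroid(centroids, project(P, x))
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)
```

Calling `fit_ensemble(X, y)` on a list of feature vectors and labels returns the fitted members, and `predict(members, x)` classifies a new point. Because each member works in only `k` dimensions, the cost per member scales with `k` rather than with the original dimension once the data are projected, which is the computational gain the paragraph above refers to.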