BIGDATA: Collaborative Research: F: IA: Statistical Learning for Big Data with Random Projections

Tang, Cheng Yong

Abstract

Contemporary data-driven science and engineering problems require statistical methods that are computationally feasible without compromising statistical accuracy. Data quality, particularly heterogeneity in data measurements, is a critical factor affecting statistical accuracy in the analysis of large datasets. This project will explore and demonstrate the impact and feasibility of simultaneously improving computational and statistical performance for Big Data problems with massive datasets. The research will advance the state of knowledge in predictive statistical learning with Big Data and will be valuable in applications such as financial risk management, commercial operations employing recommender systems, biology, and image analysis.

A key idea motivating this project is that certain refined ensemble methods, combined with random projections, can simultaneously enable fast analysis of massive data and enhance statistical performance. Specifically, the aims of the project are: (1) Develop new classification methods based on random projections and the random forest. By defining appropriate projections, the proposed method is shown to improve statistical accuracy for massive datasets with a large number of irrelevant noisy measurements. The theoretical properties of this method will be analyzed, and an adaptive version of the algorithm will be developed to optimize the computational and statistical efficiency gains; (2) Propose boosting algorithms with random projections. The statistical properties, practical performance, and implementation of the proposed randomly projected boosting algorithms will be investigated; (3) Develop classification methods for heterogeneous data. A classification method that uses the weighted bootstrap and ensemble learning to handle heterogeneity or covariate shift in measurements in large datasets will be developed. The random projection method will be applied to improve the proposed method for high-dimensional datasets.
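As a rough illustration of the general idea behind aim (1), the sketch below builds an ensemble in which each member sees a bootstrap sample of the data through its own Gaussian random projection, and predictions are combined by majority vote. It is not the project's method: the base learner is a simple nearest-centroid rule standing in for the trees of a random forest, and all function names and default parameters (`fit_ensemble`, `n_members`, `n_components`, and so on) are hypothetical choices made for this example.

```python
import random
from collections import Counter


def random_projection_matrix(n_features, n_components, rng):
    """Gaussian projection matrix with entries drawn from N(0, 1/n_components)."""
    scale = 1.0 / n_components ** 0.5
    return [[rng.gauss(0.0, scale) for _ in range(n_features)]
            for _ in range(n_components)]


def project(matrix, x):
    """Map a feature vector x to the lower-dimensional space."""
    return [sum(w * v for w, v in zip(row, x)) for row in matrix]


def fit_centroids(X, y):
    """Per-class mean vectors (stand-in base learner, not a decision tree)."""
    sums, counts = {}, Counter(y)
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
    return {label: [s / counts[label] for s in acc]
            for label, acc in sums.items()}


def nearest_centroid(centroids, x):
    """Label of the class centroid closest to x in squared Euclidean distance."""
    return min(centroids,
               key=lambda c: sum((a - b) ** 2 for a, b in zip(centroids[c], x)))


def fit_ensemble(X, y, n_members=25, n_components=5, seed=0):
    """Fit one base learner per (random projection, bootstrap sample) pair."""
    rng = random.Random(seed)
    n = len(X)
    members = []
    for _ in range(n_members):
        R = random_projection_matrix(len(X[0]), n_components, rng)
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap resample
        Xp = [project(R, X[i]) for i in idx]
        yp = [y[i] for i in idx]
        members.append((R, fit_centroids(Xp, yp)))
    return members


def predict(members, x):
    """Majority vote over the ensemble members."""
    votes = Counter(nearest_centroid(c, project(R, x)) for R, c in members)
    return votes.most_common(1)[0][0]
```

Each member works in `n_components` dimensions rather than the full feature space, which is where the computational saving would come from; the vote across many independent projections is what is meant to recover accuracy. Whether and how the projections should be chosen adaptively is exactly the kind of question the project proposes to study and is not addressed here.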

Funding Agency

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Type: Standard Grant (Standard)
Application #: 1546087
Program Officer: Victor Roytburd

Budget Start: 2015-09-01
Budget End: 2019-08-31
Fiscal Year: 2015
Total Cost: $161,155

Institution

Temple University, Philadelphia, PA, United States
