III: Medium: Collaborative Research: Scaling Machine Learning to Massive Datasets---A Logic Based Approach

Condie, Tyson; Zaniolo, Carlo

Abstract

Machine learning (ML) algorithms have become ubiquitous across applications as diverse as science, engineering, business, finance, education and healthcare. However, developing ML software that scales to massive datasets and is also easy to use remains a challenge, in part because building an ML tool currently requires implementing a deep software stack, from the runtime (i.e., how an ML algorithm is executed) to the API exposed to users.

This project aims to develop DeML, a system to support the authoring and execution of ML tools. Specifically, DeML would allow ML algorithms to be formulated as declarative queries over the training dataset. DeML optimizes the execution of a query over a computing platform (e.g., Amazon EC2 or SQL Azure), taking into account the characteristics of the algorithm, the data, and the available computational resources. Adoption of DeML would greatly reduce the effort required to develop scalable implementations of ML algorithms. The project is organized around three thrusts: (i) development of a declarative query language based on extensions of Datalog; (ii) analysis of the runtime of DeML queries; (iii) optimization of the dataflow of DeML queries based on the characteristics of data sources and the capabilities of the underlying execution platform. The resulting open source DeML prototype implementation will be made freely available to the community through the project web page at http://deml.cs.ucla.edu.
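To illustrate what formulating an ML algorithm as a query over the training dataset might look like, the following is a minimal Python sketch, not DeML syntax and not the project's implementation. It assumes a toy training relation of (x, y) rows and expresses batch gradient descent for linear regression as a repeated aggregate over that relation, loosely analogous to a recursive Datalog rule with aggregation. The names `gradient_query` and `fit`, the learning rate, and the step count are all hypothetical choices made for this example.

```python
from typing import List, Tuple

# Hypothetical training relation: each row (x, y) plays the role of a fact
# in a Datalog-style table. This is an illustration, not DeML syntax.
Training = List[Tuple[float, float]]


def gradient_query(data: Training, w: float, b: float) -> Tuple[float, float]:
    """One 'query' over the training relation: the average of per-row
    gradient contributions for the squared-error loss of y ~ w*x + b."""
    n = len(data)
    grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
    return grad_w, grad_b


def fit(data: Training, lr: float = 0.05, steps: int = 2000) -> Tuple[float, float]:
    """Re-evaluate the query for a fixed number of steps, loosely analogous to
    a recursive rule: model(t+1) derived from model(t) and an aggregate over
    the training relation."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        gw, gb = gradient_query(data, w, b)
        w, b = w - lr * gw, b - lr * gb
    return w, b


# Toy data generated from y = 2x + 1 (assumed for the example).
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
w, b = fit(data)
print(f"w={w:.4f}, b={b:.4f}")
```

On this toy data the fitted parameters should approach w = 2 and b = 1. The point of the sketch is only that each iteration is a pure aggregate over the data, which is the kind of structure a query optimizer could in principle partition and schedule across a cluster.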

The availability of DeML could greatly lower the effort needed to author scalable implementations of ML algorithms for the analysis of massive datasets, which in turn would increase the availability of such tools to the broader community. Experience gained by implementing and deploying ML algorithms at scale over modern cloud computing platforms could help inform critical design choices in the development of future cloud computing platforms for big data analytics, and hence impact a broad range of scientific, engineering, national security, healthcare and business applications of big data analytics. The project offers enhanced opportunities for research-based advanced training in databases, machine learning, and cloud computing for graduate and undergraduate students, including members of groups that are currently under-represented in computer science.

Funding Agency

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Application #: 1302698
Program Officer: Sylvia Spengler

Project Start
Project End
Budget Start: 2013-09-01
Budget End: 2017-08-31
Support Year
Fiscal Year: 2013
Total Cost: $667,000
Indirect Cost

Institution: University of California Los Angeles, Los Angeles, CA, United States
