DC: Large: Collaborative Research: ASTERIX: A Highly Scalable Parallel Platform for Semistructured Data Management and Analysis

Tsotras, Vassilis

Abstract

The evolution of the "human Web," powered by HTML and HTTP, has revolutionized the way that people find information, buy things, communicate, and collaborate. Web services and semi-structured data formats are having a similar impact on the "machine Web." XML is enriching our ability to find and interchange information today; industry verticals have created XML-based data exchange standards; and XML backbones have gained adoption in support of service-oriented architectures and software-as-a-service initiatives. Other semi-structured formats, like JSON, are playing similar roles, and XML is increasingly being used for its original purpose of semantic document markup. As a result, the world will soon be awash in a sea of semi-structured information.

The ASTERIX project is developing new technologies for ingesting, storing, managing, indexing, querying, analyzing, and subscribing to vast quantities of semi-structured information. The project combines ideas from three distinct areas (semi-structured data, parallel databases, and data-intensive computing) to create a next-generation, open source software platform that scales by running on large, shared-nothing computing clusters. ASTERIX targets a broad spectrum of semi-structured information, from "data" use cases, where information is well-tagged and highly regular, to "content" use cases, where data is irregular and much of each datum is textual. ASTERIX is taking an open stance on data formats and addressing research issues including highly scalable data storage and indexing, semi-structured query processing on very large clusters, and merging parallel database techniques with today's data-intensive computing techniques to support performant yet declarative solutions to the problem of analyzing semi-structured information.

Project website: http://asterix.ics.uci.edu/

Funding Agency

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Type: Standard Grant (Standard)
Application #: 0910859
Program Officer: Frank Olken

Budget Start: 2009-08-15
Budget End: 2013-07-31
Fiscal Year: 2009
Total Cost: $429,261

Institution: University of California Riverside, Riverside, CA, United States
