The evolution of the "human Web," powered by HTML and HTTP, has revolutionized the way that people find information, buy things, communicate, and collaborate. Web services and semi-structured data formats are having a similar impact on the "machine Web." XML is enriching our ability to find and interchange information today; industry verticals have created XML-based data exchange standards; and XML backbones have gained adoption in support of service-oriented architectures and software-as-a-service initiatives. Other semi-structured formats, like JSON, are playing similar roles, and XML is increasingly being used for its original purpose of semantic document markup. As a result, the world will soon be awash in a sea of semi-structured information.

The ASTERIX project is developing new technologies for ingesting, storing, managing, indexing, querying, analyzing, and subscribing to vast quantities of semi-structured information. The project combines ideas from three distinct areas (semi-structured data, parallel databases, and data-intensive computing) to create a next-generation, open source software platform that scales by running on large, shared-nothing computing clusters. ASTERIX targets a wide range of semi-structured information, from "data" use cases, where information is well-tagged and highly regular, to "content" use cases, where data is irregular and much of each datum is textual. ASTERIX takes an open stance on data formats and addresses research issues including highly scalable data storage and indexing, semi-structured query processing on very large clusters, and merging parallel database techniques with today's data-intensive computing techniques to support performant yet declarative solutions to the problem of analyzing semi-structured information.

University of California San Diego, La Jolla, United States