SGER: Data Analytics over Hidden Databases

Das, Gautam; Zhang, Nan

Abstract

Structured hidden databases are prevalent on the Web. They provide restricted, form-like search interfaces: a user specifies desired attribute values of the sought-after tuples, and the system responds by returning a few (e.g., top-k) tuples that satisfy the selection conditions, sorted by a suitable ranking function. Although these search interfaces are designed with focused search queries in mind, for certain applications it may be advantageous to infer more aggregated views of the data from the returned results of search queries. Such aggregated information will facilitate learning data distributions or building mining models, which can then be used to power and optimize a multitude of emerging data analytical applications.
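As a rough illustration of the interface described above, the following Python sketch simulates a form-like search that returns only the top-k matching tuples. Everything in it is a hypothetical stand-in and not part of the project: the `CARS` table, the `search` function, the choice of price as the ranking function, and the value k = 2 are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

# Made-up table standing in for a hidden database (assumption for illustration).
CARS: List[Dict[str, object]] = [
    {"make": "Honda", "color": "red", "price": 9000},
    {"make": "Honda", "color": "blue", "price": 12000},
    {"make": "Honda", "color": "red", "price": 15000},
    {"make": "Toyota", "color": "blue", "price": 11000},
]

K = 2  # hypothetical limit on how many tuples the interface returns


def search(conditions: Dict[str, object], k: int = K) -> Tuple[List[Dict[str, object]], bool]:
    """Simulate a form-like search interface.

    Returns the top-k tuples matching all attribute conditions, ranked by
    ascending price (an assumed ranking function), plus a flag telling
    whether more than k tuples matched and some were therefore hidden.
    """
    matches = [t for t in CARS if all(t[a] == v for a, v in conditions.items())]
    matches.sort(key=lambda t: t["price"])
    return matches[:k], len(matches) > k


if __name__ == "__main__":
    results, overflow = search({"make": "Honda"})
    print(len(results), overflow)
```

In this toy setting, a query for `make = Honda` matches three tuples but shows only two of them, so a client that sees only the returned tuples cannot directly compute an aggregate such as the number of Hondas. This is the kind of gap that inferring aggregated views from search results is meant to close.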

This research involves developing effective techniques for performing data analytics, especially sampling, over hidden structured databases via their public interfaces. The outcomes include efficient algorithms for sampling hidden databases with a heterogeneous mix of data types, achievability results for sampling different types of search interfaces, and a prototype toolset that demonstrates the sampling of real-world hidden databases. The ability to pose high-level analytical queries over hidden databases is needed by knowledge workers in a wide variety of corporations, governments, and security agencies. Parts of this project will be integrated into teaching and carried out by students as part of advanced class projects, which may attract motivated students to pursue doctoral degrees. The project Web site (http://dbxlab.uta.edu/dataAnalytics.html) will be used to disseminate results.

Funding Agency

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Type: Standard Grant (Standard)
Application #: 0845644
Program Officer: Maria Zemankova

Budget Start: 2008-09-01
Budget End: 2010-08-31
Fiscal Year: 2008
Total Cost: $136,001

Institution: University of Texas at Arlington, Arlington, TX, United States