Structured hidden databases are prevalent on the Web. They expose restricted, form-like search interfaces: a user specifies the desired attribute values of the sought-after tuples, and the system returns a few (e.g., top-k) tuples that satisfy the selection conditions, sorted by a suitable ranking function. Although these search interfaces are designed with focused search queries in mind, certain applications benefit from inferring more aggregated views of the data from the results that search queries return. Such aggregate information facilitates learning data distributions or building mining models, which can in turn power and optimize a multitude of emerging data analytics applications.
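The interface behavior described above can be illustrated with a minimal sketch. It models only the top-k search interface, not any sampling technique from this project. The function name `top_k_search`, the toy `cars` table, and the `cheapest_first` ranking function are all hypothetical and chosen purely for illustration.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


def top_k_search(
    table: List[Row],
    conditions: Row,
    k: int,
    score: Callable[[Row], float],
) -> List[Row]:
    """Return at most k rows matching every condition, highest score first."""
    # Keep only the rows whose attributes equal all requested values.
    matches = [
        row for row in table
        if all(row.get(attr) == value for attr, value in conditions.items())
    ]
    # Rank the matches, then truncate: the caller never sees the remainder.
    matches.sort(key=score, reverse=True)
    return matches[:k]


# Hypothetical toy data standing in for a hidden database.
cars = [
    {"make": "Honda", "color": "red", "price": 9000},
    {"make": "Honda", "color": "blue", "price": 12000},
    {"make": "Honda", "color": "red", "price": 15000},
    {"make": "Ford", "color": "red", "price": 8000},
]


def cheapest_first(row: Row) -> float:
    """Assumed ranking function: a lower price yields a higher score."""
    return -row["price"]


results = top_k_search(cars, {"make": "Honda"}, k=2, score=cheapest_first)
print([row["price"] for row in results])
```

With `k=2`, the query for Hondas returns only two of the three matching rows, so an aggregate such as the average Honda price cannot be read directly from a single response. This truncation is what makes aggregate inference over such interfaces nontrivial.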

This research develops effective techniques for performing data analytics, especially sampling, over hidden structured databases via their public interfaces. The outcomes include efficient algorithms for sampling hidden databases with a heterogeneous mix of data types, achievability results for sampling through different types of search interfaces, and a prototype toolset that demonstrates the sampling of real-world hidden databases. The ability to pose high-level analytical queries over hidden databases is needed by knowledge workers in a wide variety of corporations, governments, and security agencies. Parts of this project will be integrated into teaching and carried out by students as part of advanced class projects, which may attract motivated students to pursue doctoral degrees. The project Web site (http://dbxlab.uta.edu/dataAnalytics.html) will be used to disseminate results.

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Type: Standard Grant (Standard)
Application #: 0845644
Program Officer: Maria Zemankova
Budget Start: 2008-09-01
Budget End: 2010-08-31
Fiscal Year: 2008
Total Cost: $136,001
Name: University of Texas at Arlington
City: Arlington
State: TX
Country: United States
Zip Code: 76019