Today's world is characterized by a deluge of information that must be extracted efficiently from a variety of data sources. There is a great need for ways to process these data quickly without compromising accuracy. The need is even more pressing in natural language applications related to national security (e.g., text classification, language identification), where the data may be noisy or very diverse. This research investigates new ways of performing denoising, dimensionality reduction, and structure extraction from data efficiently; the ultimate goal is to significantly improve upon the state of the art in these applications.

The research centers on Integrated Sensing and Processing Decision Trees (ISPDTs), which are inherently suited to processing high-dimensional data. Their main characteristic is that they perform dimensionality reduction and clustering (or classification) jointly, with the ultimate goal of optimizing a desired objective function. Preliminary experiments have shown that ISPDTs are very efficient at revealing structure and interesting statistical connections between text documents, with performance that surpasses the state of the art. Their strength lies in their adaptability: they can be trained to match the characteristics of the data in a variety of ways.
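
The idea of reducing dimensionality and clustering jointly inside a tree can be illustrated with a toy sketch. This is not the ISPDT algorithm itself, whose details the abstract does not give. It assumes, purely for illustration, that each node computes a local PCA on its own data and then splits that data with 2-means in the reduced space; the function names (local_pca, two_means, build_tree, leaves) and all parameters are hypothetical.

```python
import numpy as np


def local_pca(X, n_components):
    """Project X onto its top principal directions, computed from this node's data only."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:n_components].T


def two_means(Z, n_iter=20):
    """Plain 2-means in the reduced space, seeded with the extremes of the first component."""
    centers = Z[[np.argmin(Z[:, 0]), np.argmax(Z[:, 0])]].copy()
    labels = np.zeros(len(Z), dtype=int)
    for _ in range(n_iter):
        dists = np.linalg.norm(Z[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for k in range(2):
            if np.any(labels == k):
                centers[k] = Z[labels == k].mean(axis=0)
    return labels


def build_tree(X, indices=None, depth=0, max_depth=2, min_size=4, n_components=2):
    """Recursively split the rows of X; each node reduces and clusters its own subset."""
    if indices is None:
        indices = np.arange(len(X))
    node = {"indices": indices, "children": []}
    if depth >= max_depth or len(indices) < min_size:
        return node  # leaf: depth limit reached or too few points to split
    Z = local_pca(X[indices], min(n_components, X.shape[1]))
    labels = two_means(Z)
    if labels.min() == labels.max():
        return node  # leaf: the split put every point on one side
    for k in range(2):
        child = build_tree(X, indices[labels == k], depth + 1,
                           max_depth, min_size, n_components)
        node["children"].append(child)
    return node


def leaves(node):
    """Return the row indices held by each leaf, left to right."""
    if not node["children"]:
        return [node["indices"]]
    return [leaf for child in node["children"] for leaf in leaves(child)]
```

Calling build_tree on a document-by-feature matrix (for example, term counts) and then leaves on the result yields one group of row indices per leaf. Because the projection is recomputed at every node, each split uses a low-dimensional view adapted to the subset of data it sees, which is the general property the paragraph above attributes to ISPDTs.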

Project Start:
Project End:
Budget Start: 2007-09-01
Budget End: 2013-02-28
Support Year:
Fiscal Year: 2007
Total Cost: $299,996
Indirect Cost:
Name: Johns Hopkins University
Department:
Type:
DUNS #:
City: Baltimore
State: MD
Country: United States
Zip Code: 21218