III-COR: Collaborative Research: Mining Biomedical and Network Data Using Tensors

Christos Faloutsos (christos@cs.cmu.edu), CMU
Vasileios Megalooikonomou (vasilis@cis.temple.edu), Temple Univ.

Given a large collection of functional Magnetic Resonance (fMR) images over time, how can one find patterns and correlations? Similarly, given a never-ending stream of network traffic information, how can one monitor for anomalies, intrusions, and potential failures? The main idea behind this proposal is to treat both problems using the theory of tensors. Despite the seemingly wide differences between the two settings, both boil down to finding patterns in multidimensional arrays, sparse or dense. Tensors generalize matrices to more than two dimensions and correspond roughly to the "DataCubes" of data mining. Matrix analysis and decompositions are part of the standard toolbox for data mining, providing methods for dimensionality reduction, pattern discovery, and "hidden variable" discovery. Extending these tools to higher dimensionalities is valuable, and tensors provide the means to do so. However, tensor tools have not yet been put to use in large-volume data mining, and doing so is the main contribution of this proposal. The investigators propose (a) to design tensor decomposition algorithms that scale to large datasets, with special attention to sparse datasets and to never-ending streams of data, and (b) to apply them to two driving applications: fMRI data analysis and network data analysis.
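
As a rough illustration of what a tensor decomposition computes, the sketch below fits a CP (PARAFAC) model to a small dense three-way array using alternating least squares. The proposal does not say which decomposition the investigators have in mind, so the choice of CP, the helper names (khatri_rao, unfold, cp_als, reconstruct), and the synthetic data are all assumptions made for illustration. The sketch also ignores the sparse and streaming settings that the proposal targets.

```python
import numpy as np

def khatri_rao(a, b):
    """Column-wise Kronecker product of a (I x R) and b (J x R), giving (I*J x R)."""
    r = a.shape[1]
    return (a[:, None, :] * b[None, :, :]).reshape(-1, r)

def unfold(x, mode):
    """Matricize a tensor so that rows index `mode` and columns index the rest."""
    return np.moveaxis(x, mode, 0).reshape(x.shape[mode], -1)

def cp_als(x, rank, n_iter=200, seed=0):
    """Fit a rank-`rank` CP model to a dense 3-way array by alternating least squares."""
    rng = np.random.default_rng(seed)
    factors = [rng.standard_normal((dim, rank)) for dim in x.shape]
    for _ in range(n_iter):
        for mode in range(3):
            # Hold the other two factor matrices fixed and solve for this one.
            first, second = [factors[m] for m in range(3) if m != mode]
            kr = khatri_rao(first, second)
            gram = (first.T @ first) * (second.T @ second)
            factors[mode] = unfold(x, mode) @ kr @ np.linalg.pinv(gram)
    return factors

def reconstruct(factors):
    """Rebuild the 3-way array as a sum of rank-one terms."""
    a, b, c = factors
    return np.einsum('ir,jr,kr->ijk', a, b, c)

# Synthetic example (hypothetical data): an exactly rank-2 array of shape 6 x 5 x 4.
rng = np.random.default_rng(1)
true_factors = [rng.standard_normal((n, 2)) for n in (6, 5, 4)]
x = reconstruct(true_factors)
estimated = cp_als(x, rank=2)
rel_err = np.linalg.norm(x - reconstruct(estimated)) / np.linalg.norm(x)
print(f"relative reconstruction error: {rel_err:.2e}")
```

Each column of a factor matrix plays the role of a "hidden variable" along one mode, in the same way that singular vectors do for a matrix. For an fMRI array one might arrange the modes as voxel, time, and subject, though that layout is only one possible choice.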

The investigators propose to analyze large volumes of fMRI data through the following sub-tasks: clustering voxels with similar behavior over time, either for a given subject and/or task or across subjects and/or tasks; classifying patterns of brain activity; and detecting lag correlations and spatio-temporal patterns among fMRI time sequences. The investigators also propose to perform the following inter-related tasks on multiple gigabytes of network flow data: anomaly detection, pattern discovery, and compression.
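
To make the notion of a lag correlation concrete, the sketch below searches for the time shift at which two equal-length sequences are most strongly correlated. It is a brute-force illustration on synthetic data, not the method proposed by the investigators; the function name best_lag and its parameters are hypothetical.

```python
import numpy as np

def best_lag(x, y, max_lag):
    """Return (lag, corr) for the shift with the highest Pearson correlation.

    A positive lag means y trails x by `lag` samples, i.e. y[t + lag] tracks x[t].
    Assumes x and y are 1-D arrays of equal length, longer than max_lag + 1.
    """
    best = (0, -np.inf)
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = x[:len(x) - lag], y[lag:]
        else:
            a, b = x[-lag:], y[:len(y) + lag]
        corr = np.corrcoef(a, b)[0, 1]
        if corr > best[1]:
            best = (lag, corr)
    return best

# Synthetic example (hypothetical data): y is x delayed by 5 samples.
rng = np.random.default_rng(0)
x = rng.standard_normal(200)
y = np.roll(x, 5)
lag, corr = best_lag(x, y, max_lag=10)
print(f"best lag: {lag}, correlation: {corr:.3f}")
```

Running this over every pair of voxel time sequences would be quadratic in the number of voxels, which is one reason scalable methods matter at the data volumes described above.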

Both of these applications are important: the first for medicine and health management, the second for computer and national security. Analysis of fMRI data can help in understanding how the brain functions, which parts of the brain collaborate with which other parts, and whether there are variations across subjects and across task-related activities. In the network traffic monitoring setting, fast detection of anomalies is important for spotting malware, port-scanning attempts, and plain non-malicious failures.

The educational goals include incorporating the research findings into advanced graduate courses at CMU (15-826) and at Temple (9664, 9665), and proposing tutorials at leading conferences for database, data mining, and bioinformatics audiences.

For further information see the web page: http://knight.cis.temple.edu/~vasilis/research/tensors.html

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Type: Standard Grant (Standard)
Application #: 0705359
Program Officer: Vasant G. Honavar
Project Start:
Project End:
Budget Start: 2007-09-15
Budget End: 2010-08-31
Support Year:
Fiscal Year: 2007
Total Cost: $307,985
Indirect Cost:
Name: Carnegie-Mellon University
Department:
Type:
DUNS #:
City: Pittsburgh
State: PA
Country: United States
Zip Code: 15213