BIGDATA: Mid-Scale: DA : Collaborative Research Big Tensor Mining Theory

Faloutsos, Christos

Abstract

Given triplets of facts (subject-verb-object), like ('Washington', 'is the capital of', 'USA'), can we find patterns, new objects, new verbs, and anomalies? Can we correlate them with brain scans to discover which parts of the brain get activated by, say, tool-like nouns ('hammer') or action-like verbs ('run')? We propose three research thrusts: i) motivating applications: real-world tera-scale applications, including 'read-the-web', NeuroSemantics, and large recommendation systems; ii) new theory and methods for big sparse tensor and coupled tensor/matrix factorization; and iii) scalability to tera- and peta-byte data, using the map-reduce paradigm and extending it to multi-core settings.

Intellectual Merit. We will be the first to address scalability issues for tensor and coupled tensor/matrix factorizations. We will leverage our Pegasus mining system, which runs on Hadoop, and we will carefully exploit sparsity to avoid intermediate data explosion. On the theory/methods side, we propose a brand new multi-way compressed sensing framework for tensors and coupled tensor/matrix data that (a) dramatically reduces complexity and memory requirements, (b) is amenable to map-reduce and multi-core computation, and (c) allows principled imputation of missing values. Heavily motivated by the above applications and shaped by their needs, our principled and scalable algorithms will enable new discoveries.

Education: Graduate students in CS and ECE will be trained in a cross-disciplinary topic at the confluence of the two disciplines. The PIs have a record of successful collaboration, and the co-PIs bring together complementary strengths in theory, applications, and software development (METIS). They have also routinely involved female Ph.D. students and undergraduates in their research.
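To make the idea of exploiting sparsity concrete: subject-verb-object triplets can be stored as the nonzeros of a sparse 3-way tensor. The core step of CP (PARAFAC) factorization by alternating least squares is the matricized-tensor-times-Khatri-Rao product (MTTKRP). That step can be computed directly from the nonzeros instead of materializing the dense Khatri-Rao product, which is one source of intermediate data explosion. The sketch below is a single-machine NumPy illustration under these assumptions. The function names `mttkrp` and `cp_als`, the toy triplets, and the choice of plain CP-ALS are hypothetical; this is not the Pegasus implementation or the proposed compressed-sensing method.

```python
import numpy as np

def mttkrp(coords, vals, factors, mode):
    """Sparse MTTKRP for a 3-way tensor given as (coords, vals) nonzeros.

    For mode 0 this returns M[i, r] = sum_{j,k} x_ijk * B[j, r] * C[k, r],
    touching only the nonzeros, so the (J*K x R) Khatri-Rao product is never built.
    """
    rank = factors[0].shape[1]
    out = np.zeros((factors[mode].shape[0], rank))
    m1, m2 = [m for m in range(3) if m != mode]
    # One row of contributions per nonzero: x * (row of factor m1) * (row of factor m2).
    contrib = vals[:, None] * factors[m1][coords[:, m1]] * factors[m2][coords[:, m2]]
    # Scatter-add each contribution into the output row given by its mode index.
    np.add.at(out, coords[:, mode], contrib)
    return out

def cp_als(coords, vals, shape, rank, n_iter=50, seed=0):
    """Plain CP-ALS on a sparse 3-way tensor (hypothetical helper for illustration)."""
    rng = np.random.default_rng(seed)
    factors = [rng.random((n, rank)) for n in shape]
    for _ in range(n_iter):
        for mode in range(3):
            m1, m2 = [m for m in range(3) if m != mode]
            # Gram matrix of the Khatri-Rao product = Hadamard product of factor Grams.
            gram = (factors[m1].T @ factors[m1]) * (factors[m2].T @ factors[m2])
            factors[mode] = mttkrp(coords, vals, factors, mode) @ np.linalg.pinv(gram)
    return factors

# Toy, made-up triplets: subjects x verbs x objects.
triplets = [("Washington", "is the capital of", "USA"),
            ("Paris", "is the capital of", "France"),
            ("Berlin", "is the capital of", "Germany"),
            ("Paris", "is located in", "France")]
vocabs = [{}, {}, {}]
coords = np.array([[vocabs[m].setdefault(t[m], len(vocabs[m])) for m in range(3)]
                   for t in triplets])
vals = np.ones(len(triplets))
shape = tuple(len(v) for v in vocabs)
A, B, C = cp_als(coords, vals, shape, rank=2)
```

Each nonzero contributes to exactly one output row, independently of the others. In principle, the per-nonzero work can therefore be partitioned across mappers or cores and summed by row index, which is the property a map-reduce or multi-core version would rely on.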

Public Health Relevance

The proposed research seeks to develop a better understanding of human language processing by relating fMRI and MEG human brain activity during reading to very large-scale corpus statistics of the words and phrases being read. Focusing on scalability, we will study large datasets that are beyond the capabilities of typical current methods.
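Relating brain activity to corpus statistics is the kind of problem that the coupled tensor/matrix factorization named in the abstract addresses. For example, a hypothetical noun × verb × context co-occurrence tensor X and a noun × voxel brain-activity matrix Y can share a single noun factor A, so that X ≈ [[A, B, C]] and Y ≈ A Vᵀ. The minimal dense NumPy sketch below fits this model with plain alternating least squares. The function name `cmtf_als`, the unweighted sum of the two squared errors, and the data layout are assumptions for illustration, not the proposal's method.

```python
import numpy as np

def cmtf_als(X, Y, rank, n_iter=100, seed=0):
    """Coupled CP / matrix factorization by ALS (hypothetical sketch).

    Minimizes ||X - [[A, B, C]]||^2 + ||Y - A V^T||^2, with A shared by both terms.
    """
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    A, B, C = (rng.random((n, rank)) for n in (I, J, K))
    V = rng.random((Y.shape[1], rank))
    for _ in range(n_iter):
        # Shared factor: normal equations of the stacked least-squares problem,
        # A [(B^T B * C^T C) + V^T V] = X_(1)(C kr B) + Y V.
        lhs = (B.T @ B) * (C.T @ C) + V.T @ V
        rhs = np.einsum('ijk,jr,kr->ir', X, B, C) + Y @ V
        A = rhs @ np.linalg.pinv(lhs)
        # Tensor-only factors: ordinary CP-ALS updates.
        B = np.einsum('ijk,ir,kr->jr', X, A, C) @ np.linalg.pinv((A.T @ A) * (C.T @ C))
        C = np.einsum('ijk,ir,jr->kr', X, A, B) @ np.linalg.pinv((A.T @ A) * (B.T @ B))
        # Matrix-only factor: least squares for Y ~ A V^T.
        V = Y.T @ A @ np.linalg.pinv(A.T @ A)
    return A, B, C, V
```

The coupling enters only through the update of A. Because A appears in both error terms, its normal-equation matrix is the sum of the tensor Gram term and VᵀV, so information from the brain-activity matrix and from the corpus tensor jointly determines the shared noun factor.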

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Institute of General Medical Sciences (NIGMS)
Type: Research Project (R01)
Project #: 1R01GM108339-01
Application #: 8599832
Study Section: Special Emphasis Panel (ZRG1-BST-N (52))
Program Officer: Lyster, Peter

Project Start: 2013-09-01
Project End: 2014-08-31
Budget Start: 2013-09-01
Budget End: 2014-08-31
Support Year: 1
Fiscal Year: 2013
Total Cost: $149,797
Indirect Cost: $36,825

Institution

Carnegie-Mellon University, Pittsburgh, PA, United States
