Given triplets of facts (subject-verb-object), like ('Washington', 'is the capital of', 'USA'), can we find patterns, new objects, new verbs, and anomalies? Can we correlate them with brain scans to discover which parts of the brain are activated by, say, tool-like nouns ('hammer') or action-like verbs ('run')? We propose three research thrusts: i) Motivating Applications: real-world tera-scale applications, including 'read-the-web', NeuroSemantics, and large recommendation systems; ii) new theory and methods for big sparse tensor and coupled tensor/matrix factorization; and iii) scalability to terabyte- and petabyte-scale data, using the map-reduce paradigm and extending it to multi-core settings. Intellectual Merit: We will be the first to address scalability issues for tensor and coupled tensor/matrix factorizations. We will leverage our Pegasus mining system, which runs on Hadoop, and we will carefully exploit sparsity to avoid intermediate data explosion. On the theory/methods side, we propose a brand new multi-way compressed sensing framework for tensors and coupled tensor/matrix data that (a) dramatically reduces complexity and memory requirements, (b) is amenable to map-reduce and multi-core computation, and (c) allows principled imputation of missing values. Heavily motivated by the above applications and influenced by their needs, our principled and scalable algorithms will enable new discoveries. Education: Graduate students in CS and ECE will be trained in a cross-disciplinary topic at the confluence of the two disciplines. The PIs have a record of successful collaboration, and the co-PIs bring together complementary strengths in theory, applications, and software development (METIS). They have also routinely involved female Ph.D. students and undergraduates in their research.

Public Health Relevance

The proposed research seeks to develop a better understanding of human language processing by relating fMRI and MEG recordings of human brain activity during reading to very large-scale corpus statistics of the words and phrases being read. Focusing on scalability, we will study large datasets that are beyond the capabilities of typical current methods.

Agency: National Institutes of Health (NIH)
Institute: National Institute of General Medical Sciences (NIGMS)
Type: Research Project (R01)
Project #: 1R01GM108339-01
Application #: 8599832
Study Section: Special Emphasis Panel (ZRG1-BST-N (52))
Program Officer: Lyster, Peter
Project Start: 2013-09-01
Project End: 2014-08-31
Budget Start: 2013-09-01
Budget End: 2014-08-31
Support Year: 1
Fiscal Year: 2013
Total Cost: $149,797
Indirect Cost: $36,825
Name: Carnegie-Mellon University
Department: Biostatistics & Other Math Sci
Type: Schools of Arts and Sciences
DUNS #: 052184116
City: Pittsburgh
State: PA
Country: United States
Zip Code: 15213