Given triplets of facts (subject-verb-object), like ('Washington', 'is the capital of', 'USA'), can we find patterns, new objects, new verbs, and anomalies? Can we correlate them with brain scans, to discover which parts of the brain are activated, say, by tool-like nouns ('hammer') or action-like verbs ('run')? We propose three research thrusts: i) Motivating Applications: real-world tera-scale applications, including 'read-the-web', NeuroSemantics, and large recommendation systems; ii) new theory and methods for big sparse tensor and coupled tensor/matrix factorization; and iii) scalability to tera- and peta-byte data, using the map-reduce paradigm and extending it to multi-core settings.

Intellectual Merit. We will be the first to address scalability issues for tensor and coupled tensor/matrix factorizations. We will leverage our Pegasus mining system, which runs on Hadoop, and we will carefully exploit sparsity to avoid intermediate data explosion. On the theory/methods side, we propose a brand-new multi-way compressed sensing framework for tensors and coupled tensor/matrix data that (a) dramatically reduces complexity and memory requirements, (b) is amenable to map-reduce and multi-core computation, and (c) allows principled imputation of missing values. Heavily motivated by the above applications and influenced by their needs, our principled and scalable algorithms will enable new discoveries.

Education. Graduate students in CS and ECE will be trained in a cross-disciplinary topic at the confluence of the two disciplines. The PIs have a record of successful collaboration, and the co-PIs bring together complementary strengths in theory, applications, and software development (METIS). They have also routinely involved female Ph.D. students and undergraduates in their research.
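To make the factorization idea concrete, the sketch below encodes a handful of hypothetical (subject, verb, object) triplets as a small 3-way binary tensor and fits a rank-3 CP decomposition via alternating least squares. This is a minimal dense NumPy illustration under assumed toy data, not the proposed sparse, map-reduce-scale implementation; all names and triplets are made up for the example.

```python
import numpy as np

# Hypothetical toy data; real corpora would be tera-scale and sparse.
subjects = ["Washington", "Paris", "hammer"]
verbs = ["is_capital_of", "is_used_for"]
objects = ["USA", "France", "nailing"]
triplets = [("Washington", "is_capital_of", "USA"),
            ("Paris", "is_capital_of", "France"),
            ("hammer", "is_used_for", "nailing")]

# Encode the triplets as a 3-way binary tensor X[subject, verb, object].
X = np.zeros((len(subjects), len(verbs), len(objects)))
for s, v, o in triplets:
    X[subjects.index(s), verbs.index(v), objects.index(o)] = 1.0

def unfold(T, mode):
    """Matricize T along `mode` (rows indexed by that mode)."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def khatri_rao(U, V):
    """Column-wise Khatri-Rao product (Kronecker product per column)."""
    return np.stack([np.kron(U[:, j], V[:, j]) for j in range(U.shape[1])],
                    axis=1)

def cp_als(X, rank, iters=100, seed=0):
    """Minimal rank-`rank` CP decomposition of a 3-way tensor via ALS."""
    rng = np.random.default_rng(seed)
    A, B, C = (rng.standard_normal((n, rank)) for n in X.shape)
    for _ in range(iters):
        # Each factor is the least-squares solution with the others fixed.
        A = unfold(X, 0) @ np.linalg.pinv(khatri_rao(B, C).T)
        B = unfold(X, 1) @ np.linalg.pinv(khatri_rao(A, C).T)
        C = unfold(X, 2) @ np.linalg.pinv(khatri_rao(A, B).T)
    return A, B, C

A, B, C = cp_als(X, rank=3)
Xhat = np.einsum("ir,jr,kr->ijk", A, B, C)  # reconstruct from the factors
rel_err = np.linalg.norm(X - Xhat) / np.linalg.norm(X)
```

Rows of the factor matrices give low-dimensional embeddings of subjects, verbs, and objects; large reconstructed entries at unobserved positions suggest candidate new facts, while poorly fit observed entries flag anomalies.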
The proposed research seeks to develop a better understanding of human language processing by relating fMRI and MEG recordings of human brain activity during reading to very-large-scale corpus statistics of the words and phrases being read. Focusing on scalability, we will study large datasets that lie beyond the capabilities of typical current methods.