A knowledge repository is a machine-readable structure that stores knowledge about various entities (e.g., organizations, events, genes) and facilitates efficient information seeking. In many domains, knowledge varies with context, and the flat structure commonly adopted by existing knowledge repositories cannot capture the complicated knowledge associated with different contexts. To make knowledge resources more findable, accessible, interoperable, and reusable (FAIR), this project plans to conceptualize a new structure, the Knowledge Hypercube, for organizing and retrieving knowledge in support of complex applications in various domains. A knowledge hypercube organizes knowledge with respect to selected important dimensions (e.g., time, locations, conditions), and thus allows people to easily access knowledge in any context, encapsulate distinctive entities and facts, and conduct cross-dimensional comparison and inference. This project impacts how people find and use knowledge, advances knowledge-based data analytics approaches, and benefits a wide range of domains that have vast literatures and unsolved complex tasks by building a bridge between them. Knowledge hypercubes can also support educational innovation and contribute to educational tasks such as knowledge tracing.
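As a rough illustration of the structure described above, the following minimal sketch stores facts in cells addressed by one value per dimension, then supports per-context lookup and cross-dimensional comparison. It is not the project's design: the class name `KnowledgeHypercube`, the methods `add_fact`, `cell`, `select`, and `compare`, the triple representation of a fact, and all sample data are assumptions made purely for illustration.

```python
from collections import defaultdict


class KnowledgeHypercube:
    """Hypothetical sketch: facts grouped into cells keyed by dimension values."""

    def __init__(self, dimensions):
        # Ordered dimension names, e.g. ("time", "location").
        self.dimensions = tuple(dimensions)
        # Maps a coordinate tuple (one value per dimension) to a set of facts.
        self.cells = defaultdict(set)

    def _coord(self, context):
        # A full context must name every dimension exactly once.
        if set(context) != set(self.dimensions):
            raise ValueError(f"context must specify exactly {self.dimensions}")
        return tuple(context[d] for d in self.dimensions)

    def add_fact(self, fact, **context):
        # A fact is assumed to be a (subject, relation, object) triple.
        self.cells[self._coord(context)].add(fact)

    def cell(self, **context):
        # Facts stored in the single cell identified by a full context.
        return set(self.cells.get(self._coord(context), set()))

    def select(self, **fixed):
        # Union of facts over all cells that match the fixed dimension values.
        unknown = set(fixed) - set(self.dimensions)
        if unknown:
            raise ValueError(f"unknown dimensions: {sorted(unknown)}")
        index = {d: i for i, d in enumerate(self.dimensions)}
        facts = set()
        for coord, cell_facts in self.cells.items():
            if all(coord[index[d]] == v for d, v in fixed.items()):
                facts |= cell_facts
        return facts

    def compare(self, dimension, a, b, **fixed):
        # Facts found only under value a, and only under value b, of one dimension.
        facts_a = self.select(**{**fixed, dimension: a})
        facts_b = self.select(**{**fixed, dimension: b})
        return facts_a - facts_b, facts_b - facts_a


# Illustrative data only; entity and relation names are made up.
cube = KnowledgeHypercube(["time", "location"])
cube.add_fact(("org_a", "partner_of", "org_b"), time="2020", location="US")
cube.add_fact(("org_a", "partner_of", "org_c"), time="2021", location="US")
cube.add_fact(("org_a", "partner_of", "org_b"), time="2021", location="EU")

only_2020, only_2021 = cube.compare("time", "2020", "2021")
print(only_2020, only_2021)
```

A flat store would keep only the triples themselves; keying each cell by its dimension values is what makes it possible to ask which facts hold in one context but not in another.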
The major objective of this proposal is to form a paradigm for mining knowledge hypercubes from massive collections of text documents and leveraging such hypercubes for complex exploration and prediction tasks. To meet this goal, this project tackles a series of technical challenges. First, to automatically construct a knowledge hypercube from massive texts, innovative weakly supervised approaches are designed to organize text documents based on the hypercube structure, extract open entity and relationship information, and organize cell-specific and cross-cell knowledge in a multi-dimensional manner. Second, novel refinement approaches are developed to automatically verify the information quality within and across cells in knowledge hypercubes by cross-checking within the hypercubes and against external information. Third, knowledge hypercubes motivate the development of new discovery and learning tasks. In particular, the project introduces an automatic knowledge search pipeline that leverages knowledge hypercubes for downstream prediction tasks, and a hypothesis generation approach for scoring unknown associations between concepts. The planned paradigm is realized in two specific domains (i.e., biomedical and news events), demonstrating the power of knowledge hypercubes to enable new insights into these domains.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.