Scientific disciplines have been generating huge volumes of research publications, which are of tremendous value but far exceed researchers' capacity to digest and analyze. There is a critical need to transform research text automatically, with the help of widely available general knowledge bases, into structured information networks. Advanced search and analytics tools can then be built on these networks to help researchers and practitioners quickly locate knowledge, make inferences, and even generate new scientific hypotheses.

This project aims to develop a new data-to-network-to-knowledge (D2N2K) paradigm that transforms massive, unstructured but interconnected research text data into actionable knowledge by integrating semi-structured and unstructured data. First, organized heterogeneous information networks (hereafter called StructNet) are constructed; then, powerful mining mechanisms are developed on these organized networks. With a focus on the biomedical sciences, the project investigates the principles, methodologies, and algorithms for (i) constructing relatively structured heterogeneous information networks (called MediNet) by mining biomedical research corpora via attribute extraction, relation typing, and claim mining, and (ii) exploring and mining the resulting networks via graph OLAP and task-guided embedding. The project develops an extensible framework to facilitate literature-based scientific research. The study of MediNet construction and exploration not only impacts biomedical research but also consolidates the data-to-network-to-knowledge methodology, which can be readily transferred to other domains to automatically transform their massive unstructured text data into structured, actionable knowledge.

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Application #: 1705169
Program Officer: Wei Ding
Budget Start: 2017-07-01
Budget End: 2021-06-30
Fiscal Year: 2017
Total Cost: $411,730
Name: University of California Los Angeles
City: Los Angeles
State: CA
Country: United States
Zip Code: 90095