Textual information is growing at an astounding pace, creating an enormous challenge for analysts trying to discover the valuable information buried within it. This underlying knowledge includes new, non-trivial trends, patterns, and associations among entities of interest, such as associations between genes, proteins, and diseases, connections between different places, or commonalities among people. The goal of this research is to explore automated solutions for sifting through these extensive document collections to detect interesting links and hidden information that connect facts, propositions, or hypotheses. In addition, the project will provide a more comprehensive view of the discovered knowledge by generating an in-depth yet concise cross-document summary that explains the underlying meaning of each connection, along with relevant links and explanations acquired from the Wikipedia knowledge base, which serves as the primary means of complementing or enhancing existing information in text collections. The project will impact many areas, such as homeland security, aviation safety, and biomedical and healthcare applications. The techniques have the potential to expose new information in large document collections and to provide a multi-view perspective on discovered hypotheses by integrating domain knowledge with relevant information acquired from Wikipedia. The project will offer research-based education and training opportunities to prepare students at all levels in information analysis and discovery. Specific attention will also be paid to promoting the participation of underrepresented groups in the research efforts.

This project explores a novel framework for textual knowledge representation, integration, and mining that covers the following areas: (i) automatic construction of graphical frameworks for entity relationship discovery, a new representation conducive to fine-grained information search and discovery; (ii) effective integration of information from multiple sources, including knowledge contained in representative data collections, domain-specific knowledge (e.g., domain ontologies), and world knowledge (e.g., lexical resources such as WordNet and large-scale knowledge repositories such as Wikipedia); (iii) new discovery algorithms and tools that identify hidden connections among entities; (iv) enhancement of domain modeling by enabling automatic ontology-driven scenario detection and topic-level modeling; and (v) interactive visualization tools for the graphical framework and the discovered hypotheses. This research proposes that next-generation search tools require the capability to integrate information from multiple interrelated units and to combine various evidence sources, a capability that will make fundamental advances in the current state of the art for information search and discovery. A combination of techniques from Natural Language Processing (NLP), Information Extraction (IE), Information Retrieval (IR), Data Mining, Machine Learning, and the Semantic Web will be explored to attack critical information discovery problems. For further information, see the project web site: www.cs.ndsu.nodak.edu/~wjin/WSD-RelMiner.

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Application #: 1452898
Program Officer: James French
Project Start:
Project End:
Budget Start: 2015-02-01
Budget End: 2017-04-30
Support Year:
Fiscal Year: 2014
Total Cost: $249,216
Indirect Cost:
Name: North Dakota State University Fargo
Department:
Type:
DUNS #:
City: Fargo
State: ND
Country: United States
Zip Code: 58108