Knowledge bases are central to making effective use of the information in the large and growing volume of digital data on the Web. These technologies have begun to transform Web search from keyword matching into a process of discovery, learning, and creativity, all of which are crucial to knowledge discovery. Unfortunately, searching for information remains inherently difficult in significant portions of the Web, such as the Scholarly Web, which contains many millions of scientific documents. For example, PubMed has over 20 million documents, and Google Scholar is estimated to have more than 100 million. Open-access digital libraries such as CiteSeerX, which acquire freely available research articles from the Web, are also seeing their document collections grow. Despite notable advances in scholarly search portals, semantic search technologies that "understand" complex concepts and their relations, and that can systematically satisfy users' intricate information needs, have yet to be investigated on the Scholarly Web. The goal of this project is to design solutions that make information more accessible and comprehensible to Scholarly Web users in particular, and to Web users in general, and that help them discover knowledge more effectively and efficiently. The approach is to develop an integrated framework focused on the extraction and use of scholarly knowledge graphs in online scholarly environments. Educationally, this work will involve: training graduate, undergraduate, and high-school students, with particular encouragement for the participation of women and underrepresented groups in the research; curriculum development and integration of research into courses taught by the PI; exposure of students to industry and international experiences; and education for the general public.

The project will target the following research objectives: (1) explore the construction of scholarly knowledge graphs that combine evidence from multiple resources in an open information extraction framework; (2) design and develop novel algorithms for detecting and analyzing interesting and previously unknown connections between concepts, in order to facilitate knowledge discovery on the Scholarly Web; and (3) investigate the utility of scholarly knowledge graphs in a question answering system. The results of this research will be integrated into the CiteSeerX digital library (http://citeseerx.ist.psu.edu). The software, tools, and benchmark datasets developed during the course of this project will be made publicly available. All findings will be shared with the research community through publications in academic journals and presentations at Information Retrieval, Text Mining, and Natural Language Processing conferences. For further information, see the project web page: www.cse.unt.edu/~ccaragea/skg.html.

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Application #: 1914575
Program Officer: Sylvia Spengler
Budget Start: 2018-08-26
Budget End: 2022-05-31
Fiscal Year: 2019
Total Cost: $395,379
Organization: University of Illinois at Chicago
City: Chicago
State: IL
Country: United States
Zip Code: 60612