Although big data has had a huge impact in several areas, this impact is limited by the high cost and poor quality of unstructured-data analysis and by the cost of integrating data of multiple types. Lowering these costs will bring the benefits of big-data-based research to many new areas. Against this background, this project aims to develop machine-learning methods that read, analyze, and integrate web-scale collections of text and other data. The project is expected to yield fundamental advances in data integration, machine learning, natural language understanding, and automated inference.

The project includes research thrusts in (1) robust semi-supervised bootstrap learning algorithms that can cope with ambiguity in text, (2) algorithms for detecting and aligning the schemas implicit in semi-structured sources relative to a shared common ontology, (3) NLP algorithms that perform deeper analysis on text to extract infrequently mentioned yet important facts, and (4) targeted reading agents capable of pursuing specific queries or conjectures based on the scientist's current focus.

Anticipated results of the project include fundamental advances in each of the research thrusts and their synergistic integration into a software system (NESSIE) designed to help scientists explore scientific hypotheses in their respective domains of interest by supporting targeted extraction of knowledge from large collections of text in relevant areas.

Broader impacts of the research include advanced techniques for extracting and organizing structured knowledge from text and for integrating the learned information with existing structured knowledge in multiple domains. Additional broader impacts of the research include enhanced opportunities for advanced research-based training of graduate students. The software and data resulting from the research will be made freely available to the larger scientific community.

Agency
National Science Foundation (NSF)
Institute
Division of Information and Intelligent Systems (IIS)
Type
Standard Grant (Standard)
Application #
1250956
Program Officer
Sylvia Spengler
Project Start
Project End
Budget Start
2013-08-01
Budget End
2017-05-31
Support Year
Fiscal Year
2012
Total Cost
$548,417
Indirect Cost
Name
Carnegie-Mellon University
Department
Type
DUNS #
City
Pittsburgh
State
PA
Country
United States
Zip Code
15213