Scientists, policy makers, and students have never had access to more electronic information than they do today, and this abundance gives scientists a unique opportunity to identify candidate solutions to grand challenges. The goal of this research is to develop new text mining methods that are consistent with the manual processes that experts currently use to resolve contradictory and redundant evidence. Both discovery and synthesis are difficult activities even for people, so the team plans a socio-technical strategy to achieve this goal. The project includes a longitudinal study of the manual discovery and synthesis behaviors of a diverse network of faculty, policy makers, and students from UNC and the Research Triangle Park. Most of the effort will go toward advancing natural language processing methods that automatically identify concepts and relationships, detect entailment and paraphrasing, and generate multi-document summaries. Lastly, the team will conduct a series of qualitative and quantitative studies that accurately reflect the degree to which text mining methods assist in discovery and synthesis activities. This project will advance language processing methods that detect concepts and relationships, recognize paraphrases and entailment, and generate multi-document summaries; provide the natural language community with a collection of gold standards that reflect diverse and realistic information needs; train the next generation of scientists to explore complex research that spans disciplines; and promote the "human side of discovery" via a sponsored workshop. The socio-technical solution to text mining proposed in this project will ensure broad impact of the subsequent text mining theory and tools.
This project will accelerate scientific discovery by enabling experts to follow connections between disciplines, and it will accelerate policy development by reducing the time required to resolve seemingly redundant and contradictory evidence within a discipline. Involving policy champions from the Environmental Protection Agency and the Cecil G. Sheps Center for Health Services will ensure that the theory and technology produced by this research are consistent with the complex environment in which discovery and synthesis take place. Claim Jumper will accelerate their existing policy efforts, but more importantly, tools from this project will enable studies that are not feasible with manual methods.

National Science Foundation (NSF)
Division of Information and Intelligent Systems (IIS)
Standard Grant (Standard)
Program Officer
Sylvia J. Spengler
University of Illinois Urbana-Champaign
United States