This project seeks dramatically improved access to, and dissemination of, scientific information. Working with cooperating scientific users, it exploits synergies among three important innovations: (1) Adaptive, domain-specific automatic derivation of topical representations. These topics describe both the documents in the collection and the interests of the users during particular searches. The topics support mechanisms for collaborative recommendation and for exploring the precise contours of each user's need. (2) Recognition that a combination or set of several items, taken together, may be worth much more (or perhaps much less) than the sum of the values of the individual items. The arXiv experimental system (arXiv_XS) uses topics and user feedback to model the complexity of the user's needs and interests. (3) Building on these innovations, the system can probe the user's interests by selecting items for which the user's feedback greatly improves the system's model of that user and his or her search. This "exploration" is designed to improve the system's performance with minimal degradation of the current search. All these innovations are studied together, using complex experimental design and statistical analysis; users may also volunteer to be interviewed by the researchers to provide richer information about their experiences with the system. Researchers from Rutgers, Cornell and Princeton lead the project.
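The second innovation, that a set of items need not be worth the sum of its parts, can be illustrated with a small sketch. This is not the arXiv_XS model, which the abstract does not specify; it assumes a simple topic-coverage scoring rule, and the document names, topic names, and weights are all hypothetical. It shows only the "worth less than the sum" case (redundancy); items that are worth more together would need a different scoring rule.

```python
from typing import Dict, Iterable

# Hypothetical topic weights per document (illustrative values, not project data).
DOC_TOPICS: Dict[str, Dict[str, float]] = {
    "paper_a": {"lattice_qcd": 0.9, "monte_carlo": 0.1},
    "paper_b": {"lattice_qcd": 0.8, "monte_carlo": 0.2},
    "paper_c": {"monte_carlo": 0.7, "bayesian_stats": 0.3},
}


def item_value(doc: str, interest: Dict[str, float]) -> float:
    """Value of one document alone: interest-weighted sum of its topic weights."""
    return sum(interest.get(topic, 0.0) * w for topic, w in DOC_TOPICS[doc].items())


def set_value(docs: Iterable[str], interest: Dict[str, float]) -> float:
    """Assumed coverage rule: each topic counts once, at its best coverage in the set."""
    docs = list(docs)
    total = 0.0
    for topic, need in interest.items():
        best = max((DOC_TOPICS[d].get(topic, 0.0) for d in docs), default=0.0)
        total += need * best
    return total


# Hypothetical user interest, split evenly across two topics.
interest = {"lattice_qcd": 0.5, "monte_carlo": 0.5}

redundant = set_value(["paper_a", "paper_b"], interest)      # two near-duplicate papers
complementary = set_value(["paper_a", "paper_c"], interest)  # papers on different topics
sum_of_parts = item_value("paper_a", interest) + item_value("paper_b", interest)
```

Under this assumed rule, the two overlapping papers are worth far less as a set than `sum_of_parts`, while the complementary pair scores higher even though `paper_c` is the weakest item on its own.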

This exploratory project focuses on the following tasks: (1) develop a richly instrumented, voluntary alternative interface to the arXiv, with suitable IRB consent materials, supporting active user feedback in the research process as users search; (2) implement three specific innovative technologies (topics, sets, probes); (3) study their impact on system effectiveness, using experimental design and well-defined performance measures; (4) collect rich user assessments through telephone and online interviews; (5) assess scalability with respect to the size of the collection and the size of the "communities of interest" that define the topical user models; (6) seek relationships with other domain-specific archives for potential future studies. If successful, this research will refute the perception that improving access to and dissemination of scientific literature requires massive techniques adapted from commercial models for recommender systems and crowd-sourcing. This research will also add to knowledge of experimental design, user modeling, and the study of active learning and exploratory system designs.

This research will accelerate the production and sharing of scientific information, initially at the arXiv and subsequently wherever these innovations are implemented. The research aims to enable researchers who never meet each other to form an "invisible college" by enriching the arXiv system's understanding of all of its users. The project entails some risks, as users may be unwilling to share information about their research interests. While malevolent persons might seek to spam the system by falsely marking information as useful, it is anticipated that scientific communities will generate far less spam than does the world at large. Results of the research will be made available to other researchers and incorporated in courses at all three universities. The Web site (http://arxiv_xs.rutgers.edu) is used to disseminate information and results from this project.

Project Report

This project has built a foundation for studying novel and powerful approaches to help scientists cope with the deluge of scientific reports. Working with the arXiv repository at Cornell University, the investigators have developed approaches to some of the most subtle problems in finding information. The project continues as an NSF-funded Big Data project. The researchers have already built a research interface to the arXiv system that lets scientists record their assessments of articles and that will use that information to suggest other valuable reports, reinforcing or complementing the standard response of the arXiv system. The research has built a very rich repository of information about past use of the arXiv, which reveals relations among distinct scientific fields. Interested scientists may explore the new interface at http://my.arxiv.org, and may invite the system to develop personalized recommendations for them. In addition to producing over a dozen scientific publications, the researchers have extended the ideas developed in this project through collaborations with scientists in fields as diverse as oncology and political science. The project has built a large database containing anonymized actions that users have taken over the years. This will be a rich resource for other scholars. In addition, co-PI Paul Ginsparg has prepared a presentation for a general audience that explains the role and principles of the arXiv and increases public understanding of the processes of scientific scholarship. That presentation is available at the Aspen web site (http://vod.grassrootstv.org/cablecast/public/Show.aspx?ChannelID=1&ShowID=11785). Paul Ginsparg was named a White House Champion of Change. The project web site is: http://arxiv_xs.rutgers.edu/

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Type: Standard Grant (Standard)
Application #: 1142251
Program Officer: Maria Zemankova
Project Start:
Project End:
Budget Start: 2011-08-15
Budget End: 2013-07-31
Support Year:
Fiscal Year: 2011
Total Cost: $299,501
Indirect Cost:
Name: Rutgers University
Department:
Type:
DUNS #:
City: Piscataway
State: NJ
Country: United States
Zip Code: 08854