This Small Business Innovation Research (SBIR) Phase I project will assess the feasibility of creating a new kind of collaboration-enabled information retrieval application that can handle multiple sources of information, including real-time sources whose data changes frequently. The proposed application will use application-programming interface (API) adaptors to access diverse sources of information and will provide a repository where collaborators can share not only query results but also the documents referenced by those results. Research objectives will include devising a procedure whereby a collaborator can resume the browsing of a result set initiated by a different collaborator even if the result set has changed in the meantime, and designing a cryptographic protocol that allows a user to cause the direct transfer of a document from an information source to the repository, with authentication of the user to the source and to the repository, and authentication of the origin of the document to the repository.
By handling multiple and diverse sources of information, the proposed application will provide effective support for joint information retrieval efforts, filling a need that has been repeatedly documented by information scientists, and increasing productivity. The proposed innovation applies to both Web search and information retrieval from databases, and the potential market for the proposed application is broad.
The overall goal of this multi-phase project is to create a new kind of collaboration-enabled information retrieval application that allows people to work together on an information retrieval project using diverse sources of information that change frequently (in "real time"), and to gather the documents they retrieve in a shared repository. This is a report on Phase I of the project. The goal of Phase I was to carry out preliminary research to demonstrate feasibility. We achieved that goal, and went beyond it by obtaining two important practical results. First, we invented a method for browsing real-time search results. When search results change rapidly, it is difficult to keep track of which results have been seen, and it is easy to miss results. For example, results that jump from page 2 of a result set to page 1 as the user advances from page 1 to page 2 may be missed. Some real-time search engines, such as Google/Realtime and Twitter Search, show results ordered by recency, most recent first, in a fast-scrolling display. This is amusing to watch, but not useful as a practical information retrieval tool. (They also show three old results that change less frequently, but that is not a useful information retrieval tool either.) Our invention, on the other hand, provides an effective method of browsing results ranked by a balanced combination of recency and relevance, without missing results or losing track of which results have already been seen. We will shortly make it available to the public in our multi-search engine Noflail Search, which can be found at noflail.com. Second, we designed a cryptographic protocol, which we called PKAuth, for the secure transfer of documents from a confidential or proprietary information source to the shared repository.
A similar protocol, called OAuth, already existed, but it is not suitable for our purposes because it requires prior registration of the information retrieval application with every confidential or proprietary information source that the collaborators may want to use. As it turned out, PKAuth serves a much broader purpose than facilitating collaborative information retrieval from confidential or proprietary sources, and provides an important benefit for the World Wide Web. More and more Web applications delegate authentication of their users to social sites such as Facebook or Twitter. After the user has logged in with a social site identity, the application has access to the social context of the user and can publish updates on the social site on behalf of the user. OAuth is used to implement social login, but OAuth requires registration of the Web application with the social site. This means that, if the trend towards social login continues and social login becomes the de facto user authentication standard on the Web, all Web applications will have to register with the dominant social site (currently Facebook) just to be able to authenticate their users. The dominant social site will then have the power to disable any Web application by revoking its registration. This would clearly be an undesirable situation for all parties involved, including the dominant social site, which would no doubt face regulation by many governments. A replacement for OAuth that does not require registration is therefore needed, and PKAuth fits the bill. Whereas OAuth uses the registration process to provide the social site with information about the application that the social site later uses to authenticate the application and identify the application to the user, PKAuth uses the application's ordinary digital certificate and associated private key for that purpose. Additional information can be found on our Web site at pomcor.com.
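The missed-result problem mentioned in the discussion of real-time browsing can be made concrete with a small sketch. It is not the method invented in Phase I, which this report does not specify. It only reproduces the page 2 to page 1 jump described there and contrasts offset-based paging with a naive alternative that remembers which results have been seen. All names (`Result`, `score`, `page_by_offset`, `next_unseen`) and the scores themselves are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    doc_id: str
    score: float  # hypothetical combined recency/relevance score


def rank(results):
    """Order results by descending score."""
    return sorted(results, key=lambda r: r.score, reverse=True)


def page_by_offset(ranked, page, size):
    """Conventional paging: slice the current ranking by position."""
    start = page * size
    return ranked[start:start + size]


def next_unseen(ranked, seen, size):
    """Naive alternative: return the top results not yet shown, and record them."""
    batch = [r for r in ranked if r.doc_id not in seen][:size]
    seen.update(r.doc_id for r in batch)
    return batch


# Ranking when the user opens page 1.
snapshot1 = rank([Result("a", 0.9), Result("b", 0.8),
                  Result("c", 0.7), Result("d", 0.6)])
# Ranking when the user advances to page 2: "d" has jumped to the top.
snapshot2 = rank([Result("d", 0.95), Result("a", 0.9),
                  Result("b", 0.8), Result("c", 0.7)])

# Offset paging shows a, b and then b, c: "d" is missed and "b" is repeated.
offset_page1 = page_by_offset(snapshot1, 0, 2)
offset_page2 = page_by_offset(snapshot2, 1, 2)

# Seen-tracking shows a, b and then d, c: nothing is missed or repeated.
seen = set()
first = next_unseen(snapshot1, seen, 2)
second = next_unseen(snapshot2, seen, 2)
```

Because the `seen` set is plain data, it could in principle be stored alongside a shared result set so that a second collaborator resumes where the first stopped. That is an assumption about one possible design, not a description of the project's implementation.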