Helping users find relevant information is a problem vital to the functioning of today's information-based societies, so it is no surprise that millions of people worldwide use search engine technologies every day. Although existing search technologies work well, there is still considerable room for improvement. Search engine innovation is driven by the ability to rapidly and repeatedly measure the quality of the results produced by a given system. This type of measurement typically requires some form of human input. For example, a human expert may be hired to assess the relevance of search results, or the search engine may log user interactions, such as the queries entered and the results clicked. Once a sufficiently large amount of data has been collected, it can be used to accurately measure search engine quality, and to improve existing search engines through a process known as "tuning" or "training". However, gathering this information at scale requires significant human effort or computational resources, so sustained innovation comes only at a steep cost.
Techniques for constructing large information retrieval test collections that require no human effort are the primary focus of this research. The starting point is the observation that the Web contains a large number of implicit relevance signals. The simplest example is the hyperlink, which can be interpreted as the source author's acknowledgment that the target page is relevant. The research investigates the hypothesis that such implicit relevance signals can be mined and aggregated in a completely unsupervised manner to create test collections without any human effort. Rather than relying on human-curated information, these signals are mined from the Web to automatically construct large, reusable test collections for a variety of search tasks, including Web search, news search, and enterprise search. The automatically generated test collections are evaluated in two ways. First, they are evaluated on how accurately they measure the quality of search systems relative to human-generated test collections. Second, search engines tuned using the automated test collections are compared against engines tuned using manually constructed collections.
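To make the mining-and-validation pipeline concrete, the sketch below follows one plausible realization of the hyperlink signal: anchor text is treated as a pseudo-query, the link's target is treated as an implicitly judged relevant document, and the resulting collection is validated by checking whether it ranks a set of systems the same way a human-built collection does. All data, thresholds, and identifiers here are illustrative assumptions, not details of the project's actual methods.

```python
# A minimal sketch (not the project's actual pipeline): anchor text on a
# hyperlink is treated as a pseudo-query, and the link's target page is
# treated as a document the source author implicitly judged relevant.
# Aggregating many such pairs yields automatic relevance judgments
# ("qrels") with no human effort. All data and thresholds are toy values.

from collections import defaultdict
from scipy.stats import kendalltau  # rank correlation between system orderings

# Toy Web crawl: (source_page, anchor_text, target_page) triples.
links = [
    ("a.com/1", "python tutorial", "docs.org/python"),
    ("b.com/2", "python tutorial", "docs.org/python"),
    ("c.com/3", "python tutorial", "spam.net/py"),
    ("d.com/4", "enterprise search", "vendor.com/search"),
]

# Step 1: aggregate the implicit signals. Each distinct anchor text
# becomes a query; each target accumulates one "vote" per inbound link.
votes = defaultdict(lambda: defaultdict(int))
for _source, anchor, target in links:
    votes[anchor][target] += 1

# Step 2: unsupervised filtering. Requiring endorsements from multiple
# independent sources is one simple guard against noisy or spammy links.
MIN_VOTES = 2
qrels = {query: {doc for doc, n in targets.items() if n >= MIN_VOTES}
         for query, targets in votes.items()}

def precision_at_k(ranked_docs, relevant, k=10):
    """Fraction of the top-k results marked relevant by the qrels."""
    top = ranked_docs[:k]
    return sum(doc in relevant for doc in top) / max(len(top), 1)

def mean_precision(run, qrels, k=10):
    """Average precision@k of one system's run over all auto-queries."""
    scores = [precision_at_k(run.get(q, []), rel, k)
              for q, rel in qrels.items() if rel]
    return sum(scores) / len(scores)

# Score a hypothetical system's ranked results against the automatic qrels.
run = {"python tutorial": ["docs.org/python", "spam.net/py"]}
print(f"mean P@10 on the automatic collection: {mean_precision(run, qrels):.2f}")

# Step 3: validate the collection as the text describes -- score a set of
# systems with the automatic qrels and with a human-built collection, then
# check whether the two rankings of systems agree. A Kendall's tau near 1
# means the automatic collection orders systems the way humans would.
auto_scores = [0.42, 0.35, 0.58]    # hypothetical per-system scores
manual_scores = [0.45, 0.30, 0.61]  # hypothetical human-qrels scores
tau, _ = kendalltau(auto_scores, manual_scores)
print(f"system-ranking agreement (Kendall's tau): {tau:.2f}")
```

The rank-correlation check in the last step mirrors the first evaluation criterion above: an automatic collection is useful to the extent that it reproduces the ordering of systems that human judgments would produce.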
The broader impact of this project derives from the free distribution of the automatically constructed test collections to the research community. The increased availability of data for systematically evaluating and tuning search engines is expected to advance search technology in both industrial and academic settings. Additional broader impact is expected from the integration of research and education at the graduate and undergraduate levels, and from engaging women and students from underrepresented groups through outreach programs.