Information retrieval (IR) performance is typically measured in terms of relevance: every document is judged to be either relevant or non-relevant to a particular query. Furthermore, more relevant documents are expected to receive a higher rank than less relevant ones. However, having users determine relevance and rank directly is not practical. It is therefore crucial to develop evaluation metrics and ranking functions that can be derived automatically from judgment data and user behavior data, rather than from ad-hoc heuristics. This exploratory project investigates machine learning approaches for constructing evaluation metrics for Web search and information retrieval that take into account important dimensions beyond relevance, such as diversity, balance, and coverage.
The approach is based on fundamentally extending the popular evaluation metric Discounted Cumulated Gain (DCG). The research focuses on developing optimization methods for learning DCG that can incorporate the degree of difference in pair-wise comparisons of ranking lists. The project also explores machine learning methods that can learn DCG in the more realistic scenario where relevance grades are not readily available, and investigates nonlinear utility functions as evaluation metrics that accurately capture the quality of search result sets in terms of relevance, diversity, coverage, balance, and novelty.
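For reference, DCG is commonly written in the parameterized form below; this notation is standard in the literature and is our assumption about the setup, not quoted from the project.

```latex
\mathrm{DCG}@k = \sum_{i=1}^{k} c_i \, g_{r_i},
\qquad c_i = \frac{1}{\log_2(i+1)}, \quad g_r = 2^{r} - 1,
```

where r_i is the relevance grade of the document at rank i, c_i is the position discount, and g_r is the gain assigned to grade r. For example, with grades (2, 1, 0) at ranks 1 through 3 and the standard choices above, DCG@3 = 3 + 1/log2(3) + 0 ≈ 3.63. "Learning DCG" then amounts to estimating the discounts and gains from preference data rather than fixing them a priori.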
The project has a number of broad impacts. The research results are expected to provide foundations for further research on evaluation metrics. Active collaborations with industry leaders in Web search will enable the resulting methods to have real impact on the performance of search engines as well as large IR systems. Improving the quality of search results will have significant impacts on satisfying people's information needs as well as on their quality of life in general. The research topics lie at the interface of information retrieval and machine learning applications, providing an ideal setting for training undergraduate and graduate students in the emerging interdisciplinary field of Web science and engineering. The project Web site (www.cc.gatech.edu/~zha/metrics.html) will be used for results dissemination.
In this setting, each document is relevant to a particular query to some degree rather than simply relevant or non-relevant, and more relevant documents are expected to receive a higher rank than less relevant ones. The project developed machine learning approaches based on quadratic programming for constructing such evaluation metrics for Web search, learning DCG from pair-wise comparisons of ranking lists, including the more realistic scenarios where relevance grades are not readily available. Based on both simulation data and Yahoo!'s side-by-side comparison data of search result sets, we conclude that the proposed method can learn the parameter sets for (n)DCG with high accuracy. The algorithm is robust in the sense that it can tolerate a certain level of error in the preference data, and its performance degrades gracefully as the comparison data become noisier. We also observed that the active learning method obtains better prediction accuracy with less labeled training data than training sets of randomly selected ranking pairs.
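The abstract does not spell out the optimization itself. As a rough sketch of the idea, the code below learns per-grade gain values from pairwise preferences over ranking lists, using a hinge-loss subgradient method as a simple surrogate for the quadratic program, with fixed logarithmic discounts. The grade range, margin, and hyperparameters are illustrative assumptions, not the project's actual settings.

```python
import numpy as np

NUM_GRADES = 5  # relevance grades 0..4 (assumed, not from the project)

def discounts(k):
    """Standard DCG position discounts 1/log2(i+1) for ranks 1..k."""
    return 1.0 / np.log2(np.arange(1, k + 1) + 1)

def features(grades):
    """Map a ranked list of grades to a per-grade discount-sum vector,
    so that DCG(list) = features(list) @ gains."""
    phi = np.zeros(NUM_GRADES)
    for disc, g in zip(discounts(len(grades)), grades):
        phi[g] += disc
    return phi

def learn_gains(preferences, margin=1.0, lam=0.01, lr=0.05, epochs=200):
    """Learn a gain vector from preferences [(better_list, worse_list), ...]
    by subgradient descent on a regularized hinge loss, a stand-in for
    the quadratic program described in the text."""
    gains = np.zeros(NUM_GRADES)
    for _ in range(epochs):
        for better, worse in preferences:
            diff = features(better) - features(worse)
            # Hinge: penalize when DCG(better) - DCG(worse) < margin.
            if diff @ gains < margin:
                gains += lr * diff
            gains -= lr * lam * gains  # L2 regularization
    return gains

# Toy preference data: lists that put high grades first should win.
prefs = [([4, 3, 2, 1, 0], [0, 1, 2, 3, 4]),
         ([3, 3, 1, 0, 0], [1, 0, 3, 3, 0]),
         ([4, 2, 2, 0, 1], [1, 2, 2, 0, 4])]
print(learn_gains(prefs))  # learned gains should increase with grade
```

With enough consistent preferences, the learned gains increase with relevance grade, recovering a DCG-like metric; noisy or contradictory preferences only add hinge violations rather than breaking the fit, which is consistent with the graceful degradation reported above.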