The problem of ranking objects occupies a central place in key technologies such as web search and recommendation systems. These technologies have a tremendous daily impact on the lives of millions of people. Moreover, the enormous scale of data on the web makes the use of machine learning especially attractive in constructing ranking algorithms. A huge amount of research effort has been devoted to developing efficient ranking algorithms that can deal with a variety of data sets encountered in web search and recommendation systems.
This project develops a unifying mathematical theory that will provide a basis for understanding and categorizing existing algorithms and, more importantly, lead to deeper insights and new algorithms for the problem of learning to rank. The investigators also apply ranking algorithms to new domains. For example, ranking chemical reactions based on their plausibility will help chemists discover much-needed reaction bases for technologies such as carbon dioxide reduction and conversion of natural gas into gasoline.
Fundamental advances in the statistical theory of ranking will be incorporated into undergraduate and graduate courses. Data sets and software developed will be made freely available to the scientific community. The investigators will also organize a workshop with a focus on interdisciplinary participation and involvement of under-represented groups in computer science and statistics.
The primary technical challenge in developing statistical ranking theory is the absence of a universally agreed-upon loss function for ranking. This is in contrast to classic machine learning problems such as classification and regression, where there are only a few natural possibilities for the loss function and these are well understood theoretically. The project addresses this gap by investigating how different loss functions for ranking affect fundamental theoretical properties such as learnability, and by creating a theory of convex surrogates that is applicable when loss functions abound. The project also re-examines the existing statistical literature on ranking through a computational lens. This will enable the development of flexible and efficient plug-in decision rules that model the conditional probability of labels given inputs.
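As a minimal illustration of why the choice of loss matters, the sketch below scores one ranked list under two different ranking losses, evaluates a convex pairwise surrogate, and applies a simple plug-in rule. It is not the project's method: the function names, the particular losses (1 - NDCG and pairwise disagreement), the hinge surrogate, and the choice to score items by expected relevance are all assumptions made for illustration.

```python
import math

def dcg(relevances):
    """Discounted cumulative gain of relevance labels given in ranked order."""
    return sum((2 ** r - 1) / math.log2(i + 2) for i, r in enumerate(relevances))

def ndcg_loss(scores, relevances):
    """One candidate ranking loss: 1 - NDCG of the ranking induced by the scores."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ideal = dcg(sorted(relevances, reverse=True))
    if ideal == 0:
        return 0.0
    return 1.0 - dcg([relevances[i] for i in order]) / ideal

def _ordered_pairs(relevances):
    """All pairs (i, j) where item i is strictly more relevant than item j."""
    n = len(relevances)
    return [(i, j) for i in range(n) for j in range(n) if relevances[i] > relevances[j]]

def pairwise_disagreement(scores, relevances):
    """Another candidate loss: fraction of such pairs that the scores misorder."""
    pairs = _ordered_pairs(relevances)
    if not pairs:
        return 0.0
    return sum(1 for i, j in pairs if scores[i] <= scores[j]) / len(pairs)

def pairwise_hinge(scores, relevances):
    """Convex surrogate (hypothetical choice): mean hinge loss on score gaps.
    Each misordered pair contributes at least 1, so this upper-bounds
    pairwise_disagreement."""
    pairs = _ordered_pairs(relevances)
    if not pairs:
        return 0.0
    return sum(max(0.0, 1.0 - (scores[i] - scores[j])) for i, j in pairs) / len(pairs)

def plug_in_scores(label_probs):
    """Plug-in rule (illustrative assumption): score each item by its expected
    relevance under an estimated conditional distribution P(label | input),
    where label_probs[k][r] is the estimated probability of label r for item k."""
    return [sum(r * p for r, p in enumerate(probs)) for probs in label_probs]

if __name__ == "__main__":
    relevances = [2, 0, 1]       # true relevance labels for three items
    scores = [0.2, 0.9, 0.5]     # scores that rank the items in reverse order
    print(ndcg_loss(scores, relevances))
    print(pairwise_disagreement(scores, relevances))
    print(pairwise_hinge(scores, relevances))
```

The two losses are computed from the same scores and labels but penalize mistakes differently (NDCG weights errors near the top of the list more heavily), which is one reason a single surrogate theory for all ranking losses is nontrivial.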
By incorporating the results of this research into courses and survey articles, the PIs help train a new generation of machine learning researchers and practitioners who will view ranking as a learning problem on par with classification and regression in mathematical depth as well as practical importance. Theoretical guidance for practitioners formulating new ranking algorithms will improve widely used web applications such as search and recommendation.