This is a collaborative research project combining the expertise of Ashish Goel, Stanford University (IIS-0904325) and Sanjeev Khanna, University of Pennsylvania (IIS-0904314).
Traditionally, content has been generated by a limited number of publishers (such as publishing houses, music companies, and newspapers), and its quality has then been evaluated by professional editors and reviewers. In recent years, however, individuals have become mass producers of content, generating images, blogs, opinions, and recommendations in a decentralized manner. This content is then discovered and consumed by other users, and the sheer volume of available content makes centralized review infeasible. Consequently, there is a need to utilize user feedback, both explicit and implicit, in order to provide optimum rankings and recommendations to Internet users. The same broad problem occurs in online advertising, automatic moderation of discussion boards, and automated inference of user preferences on social networks. In addition to being very large, user activity data on the Internet is also typically very sparse, since each user performs only a small share of the possible actions (e.g., searches for a small fraction of keywords, or reviews or purchases a small fraction of products).
This project aims to design algorithms and optimization techniques to effectively utilize such data. The sparse data is treated as a "prior belief" on user preferences. The project also aims to design economic incentives, robust to manipulation, for obtaining useful and corrective data. The two parts of this research interact strongly with each other, since the algorithmic component can identify valuable pieces of additional information to acquire. Together, these two parts can help users derive optimum value from Internet data.
Results of this project will improve search engine performance and facilitate web applications that employ user feedback. The project Web site (www.stanford.edu/~ashishg/sparse_opt.html) will be used to disseminate results.