Scientific research is increasingly shifting away from the tradition of individual-based work toward more collaborative models built around online scientific communities. One prominent example is nanoHUB.org, powered by the HUBzero platform. nanoHUB has been well received by the nanotechnology community and has attracted more than 90,000 active users by providing thousands of resources such as simulation tools, teaching materials, and publications. The rapid growth of information in online scientific communities demands intelligent agents that can identify the resources most valuable to each user. Existing information recommendation solutions are not adequate for online scientific communities. For example, users in these communities undertake different types of tasks (e.g., seeking teaching materials or conducting experiments for dissertation work) and require recommendations that distinguish among those tasks, which existing solutions do not provide. Furthermore, a substantial amount of the information available from users of online scientific communities is implicit feedback (e.g., click-through data), whereas most existing recommendation solutions focus on explicit feedback (e.g., user ratings of movies).

The proposed research seeks to overcome the limitations of existing recommendation solutions with a new integrated information recommendation framework for online scientific communities. The research thrusts are: (1) Task-Specific Recommendation: estimate the tasks users are likely undertaking and incorporate those estimates into the recommendation process; (2) Intelligent Hybrid Recommendation: integrate collaborative and content-based recommendation techniques within a single model that intelligently tunes the weights of content-based information and collaborative usage information; (3) Pairwise Comparison Approach for Implicit Feedback: model users' implicit feedback on recommended resources in a probabilistic model built on a natural assumption of pairwise comparison; (4) System Development and Evaluation: integrate the proposed algorithms into the HUBzero platform. The research results will be evaluated in carefully designed user studies as well as in real-world operational environments (i.e., nanoHUB).

The proposed research will yield substantial benefits in broad areas. The information recommendation tool will be incorporated into nanoHUB to benefit its large user base. The source code of the proposed algorithms will be released with the HUBzero platform to enable further advances in information recommendation. The proposed recommendation solutions can be adapted for use in general-purpose social network applications such as LinkedIn and Facebook. Some research topics will be integrated into the courses that the PIs teach. The PIs will encourage the involvement of underrepresented students in the research project.

Project Report

A recommender system (often built on collaborative filtering) recommends important information items to users based on an analysis of their previous behavior. Recommender systems have a wide range of applications in e-commerce, education, and biomedicine. Our work focuses on recommendation in online scientific communities. Several issues in existing research limit the usability of recommender systems in such communities. Most existing research uses explicit feedback (e.g., movie ratings) to build user or item profiles. Online scientific communities hold rich information about users and items that existing research generally does not fully exploit. Furthermore, recommendation efficiency is an important issue in large-scale applications, yet most existing research does not consider it. The outcome of this research project substantially advances the state of the art of recommender systems for online scientific communities by (1) utilizing rich information about users and items for more accurate recommendation, (2) utilizing users' implicit feedback for recommendation, and (3) designing novel semantic hashing techniques to improve recommendation efficiency.

In our work, we investigate the task of recommendation in online scientific communities. In particular, our study is based on nanoHUB, hosted by Purdue University. nanoHUB is an online scientific community for research, education, and collaboration in nanotechnology. It comprises numerous resources and has an active user base. These resources include lectures, seminars, tutorials, publications, events, and so on. The task is to recommend relevant resources to users. Scientific communities such as nanoHUB exhibit two characteristics: (1) Very rich information exists about resources and users. Most resources contain detailed information such as titles, abstracts, and tags.
Many registered users also provide detailed profiles of themselves, including research interests, education, and affiliation. This information is highly indicative for recommendation and thus needs to be taken into consideration. (2) Users in scientific communities tend not to give explicit ratings to resources, even though they have clear preferences in mind. Only implicit user feedback exists, such as user clicks on resources. These two characteristics also appear in many other real-world applications, but they are more prominent in online scientific communities such as nanoHUB.

Our work proposes matrix co-factorization techniques to incorporate rich user and resource information into recommendation with implicit feedback. In particular, the user information matrix is decomposed into a subspace shared with the implicit feedback matrix, and the item information matrix is handled in the same way. In other words, the subspaces of multiple related matrices are learned jointly by sharing information between the matrices. To reflect the confidence level of the implicit feedback, the binary elements in the implicit feedback matrix are weighted according to the frequency of the feedback or the user-resource content similarity. Experiments on nanoHUB show that the proposed method can effectively improve recommendation performance.

Recommender systems usually need to compare a large number of items before a user's most preferred ones can be found. This process can be very costly if recommendations are made frequently on large-scale datasets. In our work, a novel hashing algorithm, named Preference Preserving Hashing (PPH), is proposed to speed up recommendation. Hashing has been widely used in large-scale similarity search (e.g., similar image search), and search with binary hash codes is significantly faster than search with real-valued features.
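To make the speed argument concrete, the following is a minimal sketch of similarity search over binary hash codes, not code from the project. It assumes codes are stored as packed bytes and compared with Hamming distance (XOR followed by a bit count); the helper names `pack_codes`, `hamming_distances`, and `top_k` are hypothetical and chosen only for illustration.

```python
import numpy as np

# Lookup table: number of set bits for every possible byte value.
POPCOUNT = np.array([bin(v).count("1") for v in range(256)], dtype=np.uint8)

def pack_codes(bits):
    """Pack an (n, b) array of 0/1 bits into (n, ceil(b / 8)) bytes."""
    return np.packbits(np.asarray(bits, dtype=np.uint8), axis=1)

def hamming_distances(query_code, codes):
    """Hamming distance between one packed code and each row of `codes`."""
    xor = np.bitwise_xor(codes, query_code)  # differing bits become 1
    return POPCOUNT[xor].sum(axis=1)         # count differing bits per row

def top_k(query_code, codes, k):
    """Indices and distances of the k codes closest to the query."""
    d = hamming_distances(query_code, codes)
    order = np.argsort(d, kind="stable")[:k]
    return order, d[order]

# Toy data (illustrative only): three 8-bit item codes and one query code.
item_bits = [
    [1, 0, 1, 1, 0, 0, 1, 0],
    [1, 0, 1, 1, 0, 0, 1, 1],
    [0, 1, 0, 0, 1, 1, 0, 1],
]
item_codes = pack_codes(item_bits)
query_code = pack_codes([[1, 0, 1, 1, 0, 0, 1, 0]])[0]

idx, dist = top_k(query_code, item_codes, k=2)
print(idx, dist)
```

Each comparison touches only a few bytes and uses integer operations, which is where the speed advantage over floating-point feature comparison comes from in this sketch.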
However, one challenge of applying hashing to recommendation is that recommendation concerns users' preferences over items rather than user-item similarities. To address this challenge, PPH contains two novel components that work with the popular matrix factorization (MF) algorithm. In MF, a user's preference for an item is calculated as the inner product of the learned real-valued user and item features. The first component of PPH constrains the learning process so that users' preferences can be well approximated by user-item similarities. The second component, a novel quantization algorithm, generates binary hash codes from the learned real-valued user and item features. Finally, recommendation is performed efficiently via fast hash code search. Experiments on three real-world datasets show that recommendation with the proposed PPH algorithm can be hundreds of times faster than the original MF with real-valued features, and that its recommendation accuracy is significantly better than that of previous work on hashing for recommendation.
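The pipeline described above can be illustrated with a rough sketch. This is not the PPH algorithm itself: it assumes that the learning constraint amounts to giving all feature vectors the same norm (so inner-product ranking matches cosine-similarity ranking), and it substitutes plain sign thresholding for the report's quantization algorithm. The function names and the toy vectors are hypothetical.

```python
import numpy as np

def normalize_rows(x, radius=1.0):
    """Rescale every row to the same norm (an assumed stand-in for the
    norm constraint; with equal norms, inner product is proportional
    to cosine similarity)."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return radius * x / np.maximum(norms, 1e-12)

def sign_codes(x):
    """Simple sign quantization: 1 where a feature is >= 0, else 0.
    This is a swapped-in baseline, not the PPH quantizer."""
    return (x >= 0).astype(np.uint8)

def rank_by_hamming(user_code, item_codes):
    """Item indices ordered from most to least similar binary code."""
    dist = np.count_nonzero(item_codes != user_code, axis=1)
    return np.argsort(dist, kind="stable")

# Toy real-valued features, as if learned by matrix factorization.
user = normalize_rows(np.array([[1.0, -2.0, 3.0, -4.0]]))
items = normalize_rows(np.array([
    [2.0, -1.0, 1.0, -3.0],
    [-1.0, 2.0, -3.0, 4.0],
    [1.0, 1.0, 1.0, -1.0],
]))

# Exact preference ranking: inner products, highest first.
mf_ranking = np.argsort(-(items @ user[0]), kind="stable")

# Approximate ranking: Hamming distance between binary codes.
hash_ranking = rank_by_hamming(sign_codes(user)[0], sign_codes(items))

print(mf_ranking, hash_ranking)
```

On this toy input the two rankings agree; on real data the binary codes only approximate the inner-product ranking, which is the accuracy trade-off that a better quantization step aims to reduce.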

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Type: Standard Grant (Standard)
Application #: 1017837
Program Officer: Sylvia Spengler
Budget Start: 2010-09-01
Budget End: 2014-08-31
Fiscal Year: 2010
Total Cost: $498,431

Name: Purdue University
City: West Lafayette
State: IN
Country: United States
Zip Code: 47907