2015 RecommenderSystemsandLinkedOpen

From GM-RKB

Subject Headings: Semantics-Aware Recommendation System; Linked Open Data.

Notes

Cited By

Quotes

Abstract

The World Wide Web is moving from a Web of hyper-linked documents to a Web of linked data. Thanks to the Semantic Web technological stack and to the more recent Linked Open Data (LOD) initiative, a vast amount of RDF data has been published in freely accessible datasets connected with each other to form the so-called LOD cloud. As of today, we have tons of RDF data available in the Web of Data, but only a few applications really exploit their potential power. The availability of such data is certainly an opportunity to feed personalized information access tools such as recommender systems. We present an overview of recommender systems and we sketch how to use Linked Open Data to build a new generation of semantics-aware recommendation engines.

1 Introduction

The recent emergence of social networks and pervasive mobile devices has contributed to the publication of a massive amount of information on the Web. We have entered an era of Information Overload: more information is produced than we can really consume and process. Just to give an idea of what this means in practice, we know that in just one minute about 694,445 searches are performed on Google, more than 6,600 pictures are uploaded to Flickr, about 13,000 hours of music are streamed by the personalized Internet radio provider Pandora, and so on.

2 Recommender Systems

Recommender Systems (RSs) are software tools and techniques providing suggestions for items to be of use to a user [49]. Such suggestions can relate to different decision-making processes, such as what users to connect to in a social network, what product to buy, what music to listen to, or what movie to watch. Products, music and movies are all examples of items in specific recommendation scenarios. Nowadays, almost every online service has a recommendation feature. Pandora[1], Netflix[2], LinkedIn[3] and many others use recommendation functionalities in their systems to engage the users and offer them a better service. The main aim of RSs is to help users satisfy their information needs when dealing with huge information spaces. To achieve this, RSs try to select the subset of items which best match the users' preferences and tastes. Among the several definitions given in the literature, we report the one proposed by [15], which says: the recommender system term indicates any system that produces individualized recommendations as output or has the effect of guiding the user in a personalized way to interesting or useful objects in a large space of possible options.


Fig. 1. Example of Information Overload scenario.

Figure 1 depicts an example of a typical Information Overload scenario, where the user is exposed to a set of movies and does not know which one to select. If we transfer this example to real situations where the user is overwhelmed with thousands or millions of items, it is easy to imagine that it is very hard for her to make the right choice without any assistance.

2.2 Users, Items and Ratings

As described in the formal definition of the recommendation problem, at the base of each RS there are three essential elements: users, items and ratings. Usually such information is represented all together by means of a user-item ratings matrix. The ratings matrix consists of a table where each row represents a user, each column represents a specific item, and each entry represents the rating given by the user to the particular item. In practice, such a matrix is usually very sparse because users rate only a small portion of the items. Figure 2 shows an example of a user-item ratings matrix in a movie RS where users express their preferences for the items (movies) by using a five-point rating scale. The items with a question mark (unknown rating) are unseen for the corresponding user.


Fig. 2. Example of user-item ratings matrix in a movie recommendation scenario.
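
As a rough illustration of the ratings matrix described above, the following Python sketch stores only the observed ratings in a dictionary of dictionaries, so the unknown entries (the question marks) take no space. The user Bob and all rating values are invented for the example; only the names Alice, John, Argo, Righteous Kill and Heat come from the running example.

```python
# Sparse user-item ratings matrix: only observed ratings are stored.
# "Bob" and every rating value below are hypothetical illustration data.
ratings = {
    "Alice": {"Argo": 5, "Righteous Kill": 2},
    "John": {"Argo": 5, "Righteous Kill": 1, "Heat": 5},
    "Bob": {"Heat": 2, "Righteous Kill": 4},
}


def get_rating(ratings, user, item):
    """Return the stored rating, or None for an unknown ('?') entry."""
    return ratings.get(user, {}).get(item)


def sparsity(ratings):
    """Fraction of user-item cells that carry no rating."""
    items = {item for user_ratings in ratings.values() for item in user_ratings}
    total_cells = len(ratings) * len(items)
    observed = sum(len(user_ratings) for user_ratings in ratings.values())
    return 1 - observed / total_cells


alice_heat = get_rating(ratings, "Alice", "Heat")  # unknown entry
matrix_sparsity = sparsity(ratings)
```

In a real system the matrix would be far larger and far sparser than this toy case, which is why a sparse representation is a natural choice.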

Users

Users are those actors of the system who are provided with recommendations. Users can be represented in different ways depending on the recommendation techniques used to compute recommendations. In order to provide personalized recommendations the system has to model and maintain information about their preferences. In a content-based RS users' preferences can be represented in a more transparent way by means of attribute/term vectors in a heuristic-based approach, by means of a model in a model-based approach, or by means of knowledge representation tools (ontologies, rules, etc.).

Items

Item is the general term used to denote the resource the system recommends to users. Items may be characterized by their complexity and their value or utility [49]. Examples of items with low complexity and value are news, Web pages, books and movies, while examples of more complex and higher-value items range from mobile phones and laptops to financial services, jobs and travel. Depending on the system and the recommendation technique, the item content can be more or less structured and complex. It can range from just a numeric ID in a collaborative filtering system, to a bag of keywords or a set of attribute-value pairs in a content-based system, up to an ontology-based description in systems using a domain ontology.

Ratings

The most important thing RSs rely on is the availability of up-to-date information about users' preferences in the form of users' feedback. Depending on the way such information is collected, users' feedback can be classified as explicit or implicit. In the former case feedback comes in the form of ratings. The user is asked to provide her opinion about an item on a rating scale which can be numerical (e.g. 1-5 stars), ordinal (strongly agree, agree, neutral, disagree, strongly disagree) or binary (like/dislike). Although the explicit feedback case is more common in the literature, mostly due to the availability of many datasets with ratings, in practice it is more common for the system to gather implicit feedback from the user. A system can infer the user's preferences by monitoring her behaviour without bothering her.

From Rating Prediction to Ranking

In the formulation of the recommendation problem given above, the system is mainly seen as a predictive system, in that the main goal is to accurately predict ratings. This problem is known as the rating prediction task. However, the ultimate goal of the system in most situations is to provide the user with a ranked list of recommendations, namely top-N recommendations. As pointed out by [20], in many commercial systems the best bet recommendations are shown, but the predicted rating values are not. This is usually referred to as a top-N recommendation task, where the goal of the recommender system is to find a few specific items which are supposed to be most appealing to the user. Other researchers [47] have referred to this task using a different terminology, namely item recommendation task, that is, the task of predicting a personalized ranking on a set of items.
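
The step from rating prediction to a top-N list can be sketched in a few lines of Python. This is only a minimal illustration: it assumes some predictor has already produced a score for each item, and the function name `top_n`, the titles Cars and Casino, and all score values are hypothetical.

```python
import heapq


def top_n(predicted, already_rated, n=2):
    """Return the n unseen items with the highest predicted score."""
    candidates = {item: score for item, score in predicted.items()
                  if item not in already_rated}
    # Only the ranking is returned; the predicted values stay hidden,
    # as in the commercial systems mentioned in the text.
    return heapq.nlargest(n, candidates, key=candidates.get)


# Hypothetical predicted ratings for the target user Alice.
predicted_for_alice = {"Heat": 4.6, "Argo": 4.9, "Cars": 2.1, "Casino": 3.8}
recommendations = top_n(predicted_for_alice, already_rated={"Argo"}, n=2)
```

Here Argo is filtered out because Alice has already rated it, and the remaining items are ordered by score.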

2.3 Recommendation Techniques

Depending on the way the utility function is estimated and on the availability of additional data (for example, about the characteristics of items), there are different types of recommendation techniques. The two main ones are collaborative filtering and content-based. Besides these two, there are also other approaches such as knowledge-based, demographic and community-based, just to cite a few. A complete list of techniques is given in [16] and in [49]. An important class of recommender systems often used in real systems is that of hybrid recommenders [15], which combine different strategies to improve on their separate performance and obtain higher recommendation quality.

Collaborative Filtering Recommendation

Collaborative Filtering is the process of filtering or evaluating items using the opinions of other people [52]. In this approach personalized recommendations for a target user are generated using opinions of users having similar tastes to those of the target user [48]. The main assumption in this approach is that users with similar preferences in the past will have similar preferences in the future.

Fig. 3. Illustration of a CF-based recommender system.

Differently from any other technique, the only input data that CF-RSs need is the user-item ratings matrix. Figure 3 shows a simple example of a collaborative filtering case corresponding to the user-item ratings matrix depicted in Figure 2. If we consider Alice as the target user, as said before, recommendations are generated considering the ratings given by other users with similar tastes. In this particular case, John and Alice have similar tastes because they both rated Argo and Righteous Kill similarly. The system can exploit John's ratings for estimating Alice's unknown ratings. The basic intuition behind this method is that since John really likes Heat, Alice may like it too.

According to [12] there are two main types of collaborative filtering methods: memory-based and model-based. Memory-based CF uses a particular type of Machine Learning method, the k-nearest neighbours (k-NN) algorithm. The main property of this approach is that it does not require any preliminary model building phase, because predictions are made by aggregating the ratings of the closest neighbours. On the contrary, model-based techniques first learn a predictive model which is then used to make predictions.

Memory-based approaches can be classified as either user-based or item-based. The user-based approach consists of predicting the relevance of an item for the target user by a linear combination of her neighbours' ratings, weighted by the similarity between the target user and such neighbours. One of the first implementations of this approach is the one presented in [48], which considers the rating deviations from the user's and the neighbours' rating means ($\bar{r}_u$). The prediction for the active user u and target item i is computed as:

$$r_{u,i} = \bar{r}_u + \frac{\sum_{j=1}^{K} (r_{u_j,i} - \bar{r}_{u_j}) \cdot w_{u,u_j}}{\sum_{j=1}^{K} w_{u,u_j}}$$

where $K$ is the number of neighbors for user $u$ and $w_{u,u_j}$ is the similarity weight between the active user $u$ and neighbor $u_j$, defined by the Pearson correlation coefficient:

$$w_{u,u_j} = \frac{\sum_{i} (r_{u,i} - \bar{r}_u) \cdot (r_{u_j,i} - \bar{r}_{u_j})}{\sqrt{\sum_{i} (r_{u,i} - \bar{r}_u)^2} \cdot \sqrt{\sum_{i} (r_{u_j,i} - \bar{r}_{u_j})^2}}$$
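
The user-based prediction and the Pearson weight can be sketched in Python as follows. This is a minimal illustration under several assumptions that are not in the source: sums run over the items both users have rated, each user's mean is taken over all of her ratings, and only positively correlated neighbours are kept so that the normalising sum of weights stays positive. The user Bob and all rating values are invented.

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def pearson(ratings, u, v):
    """Pearson weight between users u and v over their co-rated items."""
    common = set(ratings[u]) & set(ratings[v])
    if len(common) < 2:
        return 0.0
    mu_u, mu_v = mean(ratings[u].values()), mean(ratings[v].values())
    num = sum((ratings[u][i] - mu_u) * (ratings[v][i] - mu_v) for i in common)
    den_u = sqrt(sum((ratings[u][i] - mu_u) ** 2 for i in common))
    den_v = sqrt(sum((ratings[v][i] - mu_v) ** 2 for i in common))
    if den_u == 0 or den_v == 0:
        return 0.0
    return num / (den_u * den_v)


def predict(ratings, u, item, k=2):
    """Predict u's rating for item from the k most similar neighbours."""
    mu_u = mean(ratings[u].values())
    neighbours = [(pearson(ratings, u, v), v) for v in ratings
                  if v != u and item in ratings[v]]
    # Assumption of this sketch: drop neighbours with non-positive weight.
    neighbours = sorted((n for n in neighbours if n[0] > 0), reverse=True)[:k]
    norm = sum(w for w, _ in neighbours)
    if norm == 0:
        return mu_u  # no usable neighbour: fall back to the user's mean
    deviation = sum(w * (ratings[v][item] - mean(ratings[v].values()))
                    for w, v in neighbours)
    return mu_u + deviation / norm


# Hypothetical ratings; only the names of Alice, John and the movies
# come from the running example.
ratings = {
    "Alice": {"Argo": 5, "Righteous Kill": 2},
    "John": {"Argo": 5, "Righteous Kill": 1, "Heat": 5},
    "Bob": {"Argo": 2, "Righteous Kill": 5, "Heat": 1},
}
alice_heat = predict(ratings, "Alice", "Heat")
```

With these numbers John is strongly correlated with Alice while Bob is negatively correlated, so the prediction for Heat is Alice's mean shifted upwards by John's deviation from his own mean.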

For a more detailed list of similarity measures and aggregation functions please refer to [2]. The item-based CF approach uses the same correlation-based or cosine-based techniques to compute similarities between items instead of users. The idea is to derive a notion of item similarity from user rating or purchase behavior and recommend items similar to those the user has already said they like. In [23] this idea has been applied to compute top-N item recommendations in e-commerce scenarios.
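
A cosine-based variant of the item-based idea might look like the sketch below. It is not the method of [23]; it only illustrates how item similarity can be derived from the rating columns. Missing ratings are treated as zeros, which is an assumption of the sketch, and the user Bob and all rating values are invented.

```python
from math import sqrt


def item_vectors(ratings):
    """Invert user -> item ratings into item -> user ratings (the columns)."""
    vectors = {}
    for user, user_ratings in ratings.items():
        for item, rating in user_ratings.items():
            vectors.setdefault(item, {})[user] = rating
    return vectors


def cosine(a, b):
    """Cosine similarity of two sparse vectors; missing entries count as 0."""
    common = set(a) & set(b)
    num = sum(a[u] * b[u] for u in common)
    den = (sqrt(sum(x * x for x in a.values()))
           * sqrt(sum(x * x for x in b.values())))
    return num / den if den else 0.0


def similar_items(ratings, liked_item, n=1):
    """Return the n items whose rating columns are closest to liked_item."""
    vectors = item_vectors(ratings)
    scores = {other: cosine(vectors[liked_item], vec)
              for other, vec in vectors.items() if other != liked_item}
    return sorted(scores, key=scores.get, reverse=True)[:n]


# Hypothetical ratings for illustration only.
ratings = {
    "Alice": {"Argo": 5, "Righteous Kill": 2},
    "John": {"Argo": 5, "Righteous Kill": 1, "Heat": 5},
    "Bob": {"Argo": 2, "Righteous Kill": 5, "Heat": 1},
}
closest_to_argo = similar_items(ratings, "Argo")
```

Because item-item similarities depend only on the ratings matrix, they can be computed offline and reused for every user who liked a given item.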

While at the beginning most of the research in this area focused on memory-based approaches, in recent years more attention has been paid to model-based techniques, particularly after the Netflix competition, which showed that model-based techniques have higher accuracy [32]. The most widely adopted model-based approaches are matrix factorization or latent factor models [33], which apply some form of dimensionality reduction to the user-item ratings matrix to map both users and items into a joint lower-dimensional latent factor space.

Even if collaborative filtering is the most widely adopted approach, it can suffer from several drawbacks. First of all, to work properly it needs enough rating data to find meaningful correlations among items or users. This is mainly known as the sparsity or cold-start problem [53]. In relation to that, there are two specific issues: the new user problem and the new item problem. When a new user enters the system, until she has rated a sufficient number of items the system is unable to compute reliable similarities with other users. When a new item is added to the catalog, there is no way to recommend it until ratings for it are obtained. A typical way to tackle such cold-start problems is to combine collaborative filtering with content-based approaches. Another problem of CF is the so-called grey sheep problem, that is, the inability of the system to properly treat users with very unusual preferences, since the system is unable to find other similar users.

Content-based Recommendation

Content-based RSs recommend an item to a user based upon a description of the item and a profile of the user's interests [46]. Briefly, the basic process performed by a content-based recommender consists in matching up the attributes of a user profile, in which preferences and interests are stored, with the attributes of a content object (item) [36].


Fig. 4. Illustration of a content-based RS.

Differently from collaborative filtering, this recommendation approach relies on the availability of content features describing the items. Such features can be extracted from unstructured or semi-structured item descriptions by using proper Natural Language Processing (NLP) techniques, or can be obtained from structured data, as in the case of tabular data in a relational database. A high-level architecture of a content-based RS is presented in [36]. Figure 4 shows an example of the content-based approach with reference to the user Alice. As we can see, differently from the CF case, in this approach movies are provided with attributes, such as actors, genres, etc. The other difference is that only the target user is considered in the recommendation process. The basic intuition behind this approach is that since Alice likes Argo she might like Heat, because they both belong to the Drama genre.

Fig. 5. Example of model-based CB-RS.

There are two main content-based recommendation approaches: heuristic-based and model-based.

Approaches using heuristic functions have their roots in Information Retrieval and Information Filtering. Items are recommended based on a comparison between their content and a user profile. The idea is to represent both items and users using typical IR techniques [6], e.g. vectors of terms, and compute a match between their representations. The user profile consists of a vector of terms built from the analysis of the items liked by the user. A typical approach is to use the Vector Space Model (VSM) [5], where items and user profiles can be represented as weighted vectors computed using the tf-idf formula [5]. The match between item and user profile vectors can be computed using cosine similarity, and finally the items most similar to the user profile are recommended.
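
The heuristic-based pipeline described above (tf-idf vectors, a profile built from liked items, cosine matching) can be sketched as follows. The tf-idf variant used here (term frequency normalised by document length, times the log of the inverse document frequency), the choice of summing liked-item vectors into the profile, the title Cars and all descriptor terms except the Drama genre of Argo and Heat are assumptions made for the example.

```python
from collections import Counter
from math import log, sqrt


def tfidf_vectors(docs):
    """docs maps item -> list of terms; returns item -> {term: weight}."""
    n_docs = len(docs)
    doc_freq = Counter(t for terms in docs.values() for t in set(terms))
    vectors = {}
    for item, terms in docs.items():
        term_freq = Counter(terms)
        vectors[item] = {t: (c / len(terms)) * log(n_docs / doc_freq[t])
                         for t, c in term_freq.items()}
    return vectors


def cosine(a, b):
    num = sum(w * b.get(t, 0.0) for t, w in a.items())
    den = (sqrt(sum(w * w for w in a.values()))
           * sqrt(sum(w * w for w in b.values())))
    return num / den if den else 0.0


def build_profile(vectors, liked):
    """User profile: sum of the tf-idf vectors of the liked items."""
    profile = Counter()
    for item in liked:
        for term, weight in vectors[item].items():
            profile[term] += weight
    return dict(profile)


def recommend(docs, liked, n=1):
    vectors = tfidf_vectors(docs)
    profile = build_profile(vectors, liked)
    scores = {item: cosine(profile, vec)
              for item, vec in vectors.items() if item not in liked}
    return sorted(scores, key=scores.get, reverse=True)[:n]


# Hypothetical item descriptions.
docs = {
    "Argo": ["drama", "thriller", "cia"],
    "Heat": ["drama", "crime", "heist"],
    "Cars": ["animation", "comedy", "racing"],
}
for_alice = recommend(docs, liked=["Argo"])
```

Heat is ranked first for Alice because it shares the term "drama" with her profile, while Cars shares no term at all and gets a similarity of zero.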

Model-based approaches [45] use Machine Learning techniques to learn a model of the user's preferences by analyzing the content characteristics of the items the user rated. Specifically, a regression or classification model is learnt from a collection of items for which the user's past ratings are available. The training set consists of item feature vectors labelled with ratings. The learnt user model can then be used for estimating the unknown ratings. This process is usually done for each user separately.
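
As one possible instance of such a per-user model, the sketch below fits a small logistic regression on binary item features with plain stochastic gradient ascent. The choice of logistic regression, the hyperparameters, the like/dislike labels and the feature sets are all assumptions of this example; the source only states that some regression or classification model is learnt.

```python
from math import exp


def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))


def train_user_model(examples, epochs=200, lr=0.5):
    """Fit a logistic regression for a single user.

    examples: list of (set of item features, label), label 1 = liked,
    0 = disliked. Returns (weights per feature, bias).
    """
    weights = {f: 0.0 for features, _ in examples for f in features}
    bias = 0.0
    for _ in range(epochs):
        for features, label in examples:
            p = sigmoid(bias + sum(weights[f] for f in features))
            error = label - p  # gradient of the log-likelihood
            bias += lr * error
            for f in features:
                weights[f] += lr * error
    return weights, bias


def predict_like(model, features):
    """Estimated probability that the user likes an item."""
    weights, bias = model
    # Features never seen during training contribute nothing.
    return sigmoid(bias + sum(weights.get(f, 0.0) for f in features))


# Hypothetical training set for one user: feature sets of rated items.
alice_examples = [
    ({"drama", "thriller"}, 1),
    ({"drama"}, 1),
    ({"comedy"}, 0),
    ({"comedy", "animation"}, 0),
]
alice_model = train_user_model(alice_examples)
p_heat = predict_like(alice_model, {"drama", "crime"})
p_comedy = predict_like(alice_model, {"comedy"})
```

A separate model of this kind would be trained for each user, and the learnt weights give a partially interpretable view of which features drive that user's preferences.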

Differently from the heuristic-based case, where the user model can be seen as an explicit representation of the user's preferences (a vector containing the terms most preferred by the user), in this case the user profile is represented as a function obtained by means of an inductive learning process. Such a function can be a complete black box or have a more interpretable form, depending on the machine learning algorithm adopted.

A possible limitation of model-based approaches with respect to heuristic-based ones is that the learning algorithm does not build a model with acceptable accuracy until it sees a relatively large number of examples (e.g. 50) [61].

Content-based methods can have several limitations. Perhaps the main one is content overspecialization, which consists in the inability of the system to recommend relevant items which are different from the ones the user already knows. Related to this issue, there is also the portfolio effect problem, consisting in the redundancy and low diversity among the items in the recommendation lists.

Another limitation affecting CB systems is limited content analysis. The quality of CB recommendations depends on the availability and quality of the features extracted from the items' content. For a complete and detailed description of content-based techniques for recommendations please refer to [36,46].

Knowledge-based Recommendation

References

Author: Tommaso Di Noia, Vito Claudio Ostuni
Title: Recommender Systems and Linked Open Data
DOI: 10.1007/978-3-319-21768-0_4
Year: 2015