Summary of a Recommender Systems Survey paper

Recommender Systems Survey (Summary)

Changsung Moon

CONTENTS

1. Foundations
1-1. Fundamentals
1-2. Cold-start
1-3. Similarity Measures

2. Hybrid CBF/CF
2-1. Challenges of CBF and CF
2-2. Hybrid Approaches
2-3. Social Filtering

3. Trends
3-1. Introduction
3-2. Location-aware RS
3-3. Bio-inspired approaches
3-4. Conclusions

4. References

1. RS Foundations
1-1. Fundamentals

Process is based on the following considerations

Type of data: ratings, features, content, social relationship, location-aware info

Filtering algorithm: demographic, content-based, collaborative, social-based, context-aware, hybrid

Model: memory-based, model-based

Employed tech: probabilistic approaches, Bayesian networks, nearest neighbors algorithm, neural networks, genetic algorithms, fuzzy models, SVD

Objective sought: predictions, top N recommendations

The rest: sparsity level, performance of the system, desired quality of results


Filtering algorithms

Content-based filtering
- Based on info about the item itself, usually keywords or phrases occurring in the item
- Similarity btw two content items is measured by the similarity of their term vectors
- User's profile can be developed by analyzing the set of content the user interacted with
- Enables you to compute the similarities btw a user and an item

Collaborative filtering
- Based on interactions of users
- Users rate items, and CF finds patterns in the way items have been rated by the user and other users to find additional items of interest for a user
- Two main approaches: memory-based and model-based

Demographic filtering
- Assumes that users with common personal attributes (sex, age, country, etc.) have common preferences
- Match a user's metadata to that of other similar users and recommend items liked by them
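The content-based idea of comparing term vectors can be illustrated with a minimal sketch. It assumes a plain bag-of-words term-frequency vector and cosine similarity; the item texts and all function names are hypothetical and do not come from the survey.

```python
import math
from collections import Counter

def term_vector(text):
    """Bag-of-words term-frequency vector for an item description."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def user_profile(item_vectors):
    """User profile: sum of the term vectors of items the user interacted with."""
    profile = Counter()
    for vec in item_vectors:
        profile.update(vec)
    return profile

# Hypothetical item descriptions (assumption, for illustration only).
items = {
    "a": term_vector("space opera with robots"),
    "b": term_vector("robots and space battles"),
    "c": term_vector("romantic comedy in paris"),
}

# The user has interacted with item "a"; rank the remaining items
# by the similarity btw the user profile and each item.
profile = user_profile([items["a"]])
ranked = sorted(("b", "c"), key=lambda i: cosine(profile, items[i]), reverse=True)
```

Real systems usually weight terms (e.g., TF-IDF) rather than using raw counts; raw counts keep the sketch short.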


Two main approaches in Collaborative Filtering (CF)

Memory-based
- Use the matrix of user ratings for items of the entire database to find users that are similar to the active user, and use their preferences to predict ratings for the active user
- Advantage
  - Quality of predictions is rather good
  - Relatively simple algorithm to implement for any situation
  - New data can be added easily and incrementally
  - Need not consider content of items
- Disadvantage
  - It depends on human ratings
  - Performance decreases when data gets sparse
  - Limited scalability: has problems with large datasets

Model-based
- Find patterns based on training data, and these are used to make predictions for real data
- Extract some info from the dataset, and use that as a "model" to make recommendations without having to use the complete dataset every time
- Advantage
  - Handle sparsity better than memory-based ones
  - Scalable with large datasets
  - Improve prediction speed
- Disadvantage
  - Expensive model building
  - Can lose useful info due to reduction models
- Approaches
  - Linear algebra, Probabilistic methods, Neural networks, Clustering, Latent classes, and so on
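The memory-based approach can be sketched as a user-based nearest-neighbor predictor. This is one common formulation (Pearson similarity over co-rated items, mean-centered weighted average), not necessarily the exact variant discussed in the survey; the toy ratings and function names are assumptions.

```python
import math

# Toy user-item rating matrix (hypothetical data).
ratings = {
    "ann": {"i1": 5, "i2": 3, "i3": 4},
    "bob": {"i1": 4, "i2": 2, "i3": 5, "i4": 4},
    "cat": {"i1": 1, "i2": 5, "i4": 2},
}

def mean_rating(user):
    values = ratings[user].values()
    return sum(values) / len(values)

def pearson(u, v):
    """Pearson correlation over the items rated by both users."""
    common = set(ratings[u]) & set(ratings[v])
    if len(common) < 2:
        return 0.0
    mu = sum(ratings[u][i] for i in common) / len(common)
    mv = sum(ratings[v][i] for i in common) / len(common)
    num = sum((ratings[u][i] - mu) * (ratings[v][i] - mv) for i in common)
    den = math.sqrt(sum((ratings[u][i] - mu) ** 2 for i in common)) * \
          math.sqrt(sum((ratings[v][i] - mv) ** 2 for i in common))
    return num / den if den else 0.0

def predict(user, item, k=2):
    """Mean-centered weighted average over the k most similar raters of item."""
    neighbors = [(pearson(user, v), v) for v in ratings
                 if v != user and item in ratings[v]]
    # Keep only positively correlated neighbors, most similar first.
    neighbors = sorted((n for n in neighbors if n[0] > 0), reverse=True)[:k]
    if not neighbors:
        return mean_rating(user)  # fall back to the user's own mean
    num = sum(sim * (ratings[v][item] - mean_rating(v)) for sim, v in neighbors)
    den = sum(abs(sim) for sim, _ in neighbors)
    return mean_rating(user) + num / den

prediction = predict("ann", "i4")
```

Every prediction scans the whole rating matrix, which is why this family gets slower as the dataset grows and less reliable as it gets sparse.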

1. RS Foundations
1-2. Cold-start

Cold-start problem

Cold-start

New items and new users can cause the cold-start problem, as there will be insufficient data on these new entries for CF to work accurately

Hybrid Filtering Researches

Leung et al. [135]
- cross-level association rules to integrate content info about domain items

Kim et al. [118]
- use collaborative tagging by crawling the delicious site

Weng et al. [228]
- combine implicit relations btw users' item preferences and additional taxonomic preferences

Loh et al. [140]
- represent users' profiles with info extracted from users' scientific publications

Martinez et al. [148]
- hybrid RS which combines CF with a knowledge-based one

Chen and He [56]
- a number of common terms / term frequency (NCT/TF) CF based on demographic vector

Saranya and Atsuhiro [199]
- utilize latent features extracted from items

Park et al. [173]
- use filterbots, surrogate users that rate items based only on user or item attributes
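The filterbot idea can be illustrated with a small sketch: surrogate "users" rate every item from its attributes alone, so a brand-new item already has ratings that CF can work with. The genre-based bot, the catalogue layout and the rating values below are assumptions for illustration, not the method of Park et al.

```python
# Hypothetical item catalogue; the attribute names are assumptions.
catalogue = {
    "old_movie": {"genres": {"scifi"}},
    "new_movie": {"genres": {"scifi", "action"}},  # no human ratings yet
}

def make_genre_bot(genre, high=5, low=1):
    """Return a surrogate user that rates an item from one attribute only."""
    def bot(item):
        return high if genre in item["genres"] else low
    return bot

def add_filterbots(ratings, catalogue, genres):
    """Return a copy of the ratings with one bot 'user' per genre added."""
    augmented = {user: dict(r) for user, r in ratings.items()}
    for genre in genres:
        bot = make_genre_bot(genre)
        augmented["bot:" + genre] = {
            item_id: bot(item) for item_id, item in catalogue.items()
        }
    return augmented

human_ratings = {"ann": {"old_movie": 5}}
augmented = add_filterbots(human_ratings, catalogue, ["scifi", "action", "romance"])

# The new item now has raters, so item-to-item similarities can be computed.
raters_of_new = [u for u, r in augmented.items() if "new_movie" in r]
```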

1. RS Foundations
1-3. Similarity Measures

Similarity Measures (SM)

Memory-based

Traditional
- Pearson correlation, Cosine, Euclidean, Adjusted cosine, Constrained correlation, Mean Squared Differences

Researches
Bobadilla et al. [31]
- Jaccard Mean Squared Differences: use non-numerical info besides using numerical info from ratings
Ortega et al. [169]
- use Pareto dominance to eliminate less representative users from the k-neighbor selection process
Bobadilla et al. [35]
- SING (singularities): use info contained in votes of all users, instead of restricting it to ratings of the two users compared or the two items compared

Model-based

Advantage
- Increase in accuracy, in performance (time consumed) or in both

Disadvantage
- Model must be regularly updated in order to consider the most recently entered set of ratings

Researches
Bobadilla et al. [33]
- GEN: use genetic algorithms

Deal with cold-start

Researches
Ahn [6]
- PIP: heuristic SM
Heung-Nam et al. [98]
- UERROR: first predict actual ratings and subsequently identify prediction errors for each user
Bobadilla et al. [36]
- NCS: based on neural learning (model-based CF) and adapted for new user cold-start situations

• (user to user) similarity btw pairs of users: compare ratings of all the items rated by two users
• (item to item) similarity btw pairs of items: compare ratings of all users who have rated two items
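Some of the measures named above can be sketched for the user-to-user case as follows. Ratings are assumed to lie on a 1 to 5 scale, and the combined measure is written as Jaccard × (1 − MSD) on normalized ratings, which is the usual statement of Jaccard Mean Squared Differences; treat the exact normalization as an assumption of this sketch.

```python
import math

def co_rated(ru, rv):
    """Items rated by both users."""
    return set(ru) & set(rv)

def cosine_sim(ru, rv):
    """Cosine similarity over co-rated items."""
    common = co_rated(ru, rv)
    dot = sum(ru[i] * rv[i] for i in common)
    nu = math.sqrt(sum(ru[i] ** 2 for i in common))
    nv = math.sqrt(sum(rv[i] ** 2 for i in common))
    return dot / (nu * nv) if nu and nv else 0.0

def msd(ru, rv, r_min=1, r_max=5):
    """Mean squared difference of ratings normalized to [0, 1]."""
    common = co_rated(ru, rv)
    if not common:
        return 1.0  # assumption: no overlap is treated as maximal difference
    span = r_max - r_min
    return sum(((ru[i] - rv[i]) / span) ** 2 for i in common) / len(common)

def jaccard(ru, rv):
    """Share of co-rated items among all items rated by either user."""
    union = set(ru) | set(rv)
    return len(co_rated(ru, rv)) / len(union) if union else 0.0

def jmsd(ru, rv, r_min=1, r_max=5):
    """Combine the non-numerical (Jaccard) and numerical (MSD) info."""
    return jaccard(ru, rv) * (1.0 - msd(ru, rv, r_min, r_max))

u = {"a": 5, "b": 3, "c": 4}
v = {"a": 4, "b": 3, "d": 2}
similarity = jmsd(u, v)
```

The same functions give item-to-item similarity when each dictionary maps users to the ratings they gave one item.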

2. Hybrid CBF / CF
2-1. Challenges

Challenges of CBF and CF

CBF

Cannot predict quality of item
- How popular is the item? How much will a user like the item?
- Difficult to acquire feedback from users because, with CBF, users do not typically rate items

Limited content analysis
- In certain domains (e.g., music, blogs, and videos), it is a complicated task to generate the attributes for items

Overspecialization
- Users only receive recommendations for items that are very similar to items they liked or preferred

CF

Data sparsity
- Many commercial RSs are based on large datasets. As a result, the user-item matrix used for CF could be extremely large and sparse
- Researches
  - Dimensionality reduction techniques [202]
  - The reduction methods are based on Matrix Factorization
  - combine model-based tech Latent Semantic Index (LSI) and reduction method Singular Value Decomposition (SVD)

Cold-start problem
- See the slide "1-2. Cold-start"

Synonyms
- Same or very similar items having different names or entries
- Topic Modeling (like the Latent Dirichlet Allocation tech) could solve this by grouping different words belonging to the same topic

Shilling attacks
- People may give positive ratings for their own items and negative ratings for their competitors
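The matrix-factorization answer to data sparsity can be illustrated with a small sketch. Note that it trains low-rank factors by stochastic gradient descent on the observed ratings only, rather than computing an exact SVD; the hyperparameters, data and function names are assumptions chosen for a toy example.

```python
import math
import random

def factorize(ratings, n_users, n_items, k=2, lr=0.01, reg=0.02,
              epochs=5000, seed=0):
    """Learn user factors P and item factors Q so that P[u] . Q[i] ~ r(u, i)."""
    rng = random.Random(seed)
    P = [[rng.uniform(0.1, 0.5) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(0.1, 0.5) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for (u, i), r in ratings.items():
            err = r - sum(P[u][f] * Q[i][f] for f in range(k))
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step with L2 regularization on both factors.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q

def predict(P, Q, u, i):
    return sum(pf * qf for pf, qf in zip(P[u], Q[i]))

# Sparse toy matrix: (user, item) -> rating; three cells are unobserved.
observed = {
    (0, 0): 5, (0, 2): 1, (0, 3): 1,
    (1, 0): 5, (1, 1): 5, (1, 2): 1,
    (2, 0): 1, (2, 1): 1, (2, 2): 5, (2, 3): 5,
    (3, 1): 1, (3, 2): 5, (3, 3): 5,
}
P, Q = factorize(observed, n_users=4, n_items=4)

# Training error, and a prediction for an unobserved cell.
rmse = math.sqrt(sum((r - predict(P, Q, u, i)) ** 2
                     for (u, i), r in observed.items()) / len(observed))
missing = predict(P, Q, 0, 1)
```

Only k numbers per user and per item are stored, which is how this family copes with large, sparse matrices at the cost of an expensive training step.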

2. Hybrid CBF / CF
2-2. Hybrid Approaches

Methods, Advantages and Trends

Methods
- Calculate CBF and CF separately and subsequently combine them
- Incorporate CBF characteristics into CF
- Construct a unified model with both CBF and CF characteristics
- Incorporate CF characteristics into CBF

Advantages

CF solves CBF's problems
- It can function in any domain
- It is less affected by overspecialization
- It acquires feedback from users

CBF adds qualities to CF
- Improvement to quality of the predictions, because they are calculated with more info, and reduced impact from cold-start and sparsity problems

Trend in CBF

Add social info to item attributes
- Tag RS: attempts to provide personalized item recommendations to users through the most representative tags

Use of tags in the recommendation process
- increases capacity of traditional RS
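The method "calculate CBF and CF separately and subsequently combine them" can be sketched as a weighted blend of two score dictionaries. The weight `alpha`, the fallback rule for items scored by only one filter, and the example scores are all assumptions of this sketch.

```python
def weighted_hybrid(cbf_scores, cf_scores, alpha=0.5):
    """Blend CBF and CF scores; alpha is the weight given to CBF."""
    combined = {}
    for item in set(cbf_scores) | set(cf_scores):
        if item in cbf_scores and item in cf_scores:
            combined[item] = alpha * cbf_scores[item] + (1 - alpha) * cf_scores[item]
        elif item in cbf_scores:
            # Cold-start item: CF has no ratings for it, so rely on CBF alone.
            combined[item] = cbf_scores[item]
        else:
            combined[item] = cf_scores[item]
    return combined

# Hypothetical scores in [0, 1] from two independently run filters.
cbf = {"a": 0.9, "b": 0.2, "new": 0.8}
cf = {"a": 0.4, "b": 0.6}

scores = weighted_hybrid(cbf, cf, alpha=0.5)
top_n = sorted(scores, key=scores.get, reverse=True)[:2]
```

The fallback branch shows how the content side reduces the impact of cold-start: the unrated item still receives a score.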

2. Hybrid CBF / CF
2-3. Social Filtering

Current Researches

Improvement in RS

Most research work aims to improve the recommendations made by referring to the extra info provided by the social info used

Researches
• Woerndl and Groh [231]
- use social networks to enhance CF
• Arazy et al. [13]
- use data from online social networks and electronic communication tools
• Xin et al. [233]
- exploit learners' note-taking activity to enrich and extend the user profile
• Bonhard and Sasse [41]
- similarity and familiarity btw the user and persons who have rated the items can aid decision making
• Fengkun and Hong [75]
- incorporate users' preference ratings and their social relationships into CF
• Carmagnola et al. [52]
- recommending content in social RS based on social network structure and influence relationships among users
• Ramaswamy et al. [189]
- analyze info such as address books to estimate level of social affinity

Create or enable RS

Use social info to create or enable RS

Researches
Siersdorfer and Sergei [210]
- predict utility of items, users or groups based on the multi-dimensional social environment of a given user
- mine the rich set of structures and social relationships that folksonomies provide
Li and Chen [137]
- blog recommendation that combines trust model, social relation and semantic analysis
Jason [111]
- discover social networks between mobile users
Jyun and Chui [115]
- use trading relationships to calculate level of recommendation for trusted online auction sellers
Dell'amico and Capra [69]
- users' trustworthiness is measured by two criteria: taste similarity and social ties

Trust and Reputation

User trust
- calculate credibility of users through info of the rest of users or social network

Item reputation
- calculate reputation of items through feedback of users or by studying how users work with these items

Researches
Yuan et al. [239]
- choose trust-aware RS to demonstrate advantages by making use of the small-world nature of the trust network
Li and Kao [138]
- RS based on trust of social networks to enhance the quality of peer production services
Ma et al. [145]
- probabilistic factor analysis framework, combining ratings and trusted friends
- this framework can be applied to a pure user-item rating matrix
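The user-trust idea (weighting other users by their credibility) can be illustrated with a minimal sketch: a rating is predicted as the trust-weighted average of the ratings given by the people a user trusts. This is a generic illustration, not the method of any paper cited above; the trust values and names are hypothetical.

```python
def trust_weighted_rating(user, item, ratings, trust):
    """Average the ratings of trusted users, weighted by trust in [0, 1]."""
    num = 0.0
    den = 0.0
    for other, t in trust.get(user, {}).items():
        r = ratings.get(other, {}).get(item)
        if r is not None and t > 0:
            num += t * r
            den += t
    # None signals that no trusted user has rated the item.
    return num / den if den else None

# Hypothetical trust network: trust[u][v] is how much u trusts v.
trust = {"u": {"a": 0.9, "b": 0.1}}
ratings = {"a": {"x": 5}, "b": {"x": 1, "y": 3}}

pred_x = trust_weighted_rating("u", "x", ratings, trust)
pred_z = trust_weighted_rating("u", "z", ratings, trust)
```

In a trust-aware RS the trust values would replace or complement the rating-based similarity used to pick neighbors.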

3. Trends
3-1. Introduction

Recommender systems trends

Trends

Shilling attack
- generate many positive ratings for a product

Privacy and security
- tradeoffs between accuracy and privacy

Knowledge-based filtering
- use knowledge about users and products to generate recommendations, reasoning about what products meet the user's requirements

Hybrid approach
- use current databases to simultaneously incorporate memory-based, social and content-based info

Workflow
- user model is based on "users-roles-tasks reference information"

Collection of implicit info
- access to web sites, food purchased, use of public transport systems, etc.

Peer-to-peer (P2P) networks
- user info is based on distributed info

Incorporation of different types of info
- e.g., explicit ratings, social relations, user contents, locations, use trends, knowledge-based info

3. Trends
3-2. Location-aware RS

Location-aware recommender systems

Geographic CF RSs

RS
- Traditional RS without using geographical info

RS + G
- Traditional RS which provides the item's geographical position
- Geographic info does not play a part in the recommendation process

GRS
- Geographic RS
- Ratings are made in a traditional way, whilst recommendations are made by considering the geographical position of the user

GRS+
- Users establish ratings on items by weighting the distance between them and the items rated
- Travel GPS traces can be reinforced with social information based on friends

Researches

Martinez et al. [149]
- examples of the RS + G group

Schlieder [205]
- modeling collaborative semantics of geographic folksonomies based on analysis of tags that users assign to composite objects

Wan-Shiou et al. [225]
- hybrid content-based/geographic RS that analyzes a customer's history and position so vendor info can be ranked according to the match with the preferences of a customer

Matyas and Schlieder [152]
- users' ratings are derived from the photos they have downloaded from and uploaded to the same Web site (the photos have a GPS position associated with them)
- after this, a search of k-neighborhoods based on this data is carried out
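The GRS case (ratings made in the traditional way, recommendations made by considering the user's position) can be sketched as a distance filter applied to already computed predictions. The haversine formula is standard; the radius, coordinates and item names are assumptions for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometers."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def geo_recommend(predictions, user_pos, item_pos, max_km=5.0):
    """Keep items within max_km of the user, best predicted rating first."""
    nearby = [
        item for item in predictions
        if haversine_km(user_pos[0], user_pos[1], *item_pos[item]) <= max_km
    ]
    return sorted(nearby, key=lambda item: predictions[item], reverse=True)

# Predicted ratings from any traditional RS (hypothetical values).
predictions = {"cafe": 3.5, "museum": 4.5, "far_resort": 5.0}
item_pos = {"cafe": (0.0, 0.01), "museum": (0.0, 0.02), "far_resort": (1.0, 0.0)}

recommended = geo_recommend(predictions, user_pos=(0.0, 0.0), item_pos=item_pos)
```

A GRS+ variant would instead fold the distance into the rating itself, for example by weighting each rating by how far the user was from the item when rating it.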

3. Trends
3-3. Bio-inspired approaches

Bio-inspired approaches (Model-based RS)

Genetic Algorithms (GA)

GA have mainly been used in two aspects

Clustering
- use common genetic clustering algorithms such as GA-based K-means

Hybrid user models
- chromosome structure can contain demographic characteristics and/or those related to content-based filtering

Researches
• Dao et al. [68]
- Model-based CF using GA for location-based advertisement
• Bobadilla et al. [33]
- use GA to create a similarity metric, weighting a set of very simple similarity measures
• Hwang et al. [106]
- GA to learn personal preferences of customers

Neural Networks (NN)

Focus on hybrid RS, in which NNs are used to learn users' profiles; NNs have also been used in the clustering processes of some RS

Researches
Ren et al. [192]
- use the Widrow-Hoff [229] algorithm to learn each user's profile from contents of rated items
Christakou and Stafylopatis [62]
- use combination of CBF / CF RS
Lee and Woo [133]
- all users are segmented by demographic characteristics, and users in each segment are clustered according to preference of items using a Self-Organizing Map (SOM) NN
- Kohonen's SOMs are a type of unsupervised learning
Huang et al. [103]
- train a back-propagation NN for generating association rules that are mined from a transactional DB
Roh et al. [193]
- combine CF with SOM and Case Based Reasoning (CBR) by changing the unsupervised clustering problem into a supervised user preference reasoning problem
Sevarac et al. [207]
- use Neuro-fuzzy inference to create pedagogical rules in e-learning
Bobadilla et al. [36]
- new cold-start similarity measure has been perfected using optimization based on neural learning
Acilar and Arslan [2]
- CF based on Artificial immune network algorithm (aiNet)
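Learning a user's profile from the contents of rated items with the Widrow-Hoff rule can be sketched as a linear model updated one rating at a time. The update `w += lr * (rating - w . x) * x` is the standard Widrow-Hoff (least mean squares) rule; the binary content features, learning rate and ratings are assumptions, and this is not a reproduction of the setup in Ren et al.

```python
def widrow_hoff(samples, n_features, lr=0.1, epochs=2000):
    """Fit profile weights w so that w . x approximates the user's rating."""
    w = [0.0] * n_features
    for _ in range(epochs):
        for x, rating in samples:
            pred = sum(wi * xi for wi, xi in zip(w, x))
            err = rating - pred
            # Widrow-Hoff update: move w along x in proportion to the error.
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
    return w

def score(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

# Hypothetical content features per item: [bias, is_action, is_romance].
rated_items = [
    ([1, 1, 0], 5),  # action film, rated 5
    ([1, 0, 1], 2),  # romance film, rated 2
    ([1, 1, 1], 4),  # action romance, rated 4
]

profile = widrow_hoff(rated_items, n_features=3)

# Score an unseen item that is neither action nor romance.
unseen = score(profile, [1, 0, 0])
```

The learned weights act as the user's content profile: a positive weight marks a feature the user tends to rate highly, a negative one a feature the user tends to rate low.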

3. Trends
3-4. Conclusions

Generations of RS

1st Generation
- Use traditional websites to collect info from
  - Content-based data from purchased or used products
  - Demographic data collected in users' records
  - Memory-based data collected from users' item preferences
- Focus on improving accuracy through filtering

2nd Generation
- Extensively use web 2.0 by gathering social info

3rd Generation
- Will use web 3.0 through info provided by integrated devices on the Internet
- Incorporate location info into existing recommendation algorithms

Future Research

Advancing existing methods and algorithms to improve quality of RS

New lines of research
- Proper combination of existing recommendation methods that use different types of available information
- Getting maximum use of the individual potential of various sensors and devices on the Internet of Things
- Acquisition and integration of trends related to habits, consumption and tastes of individual users
- Data mining from RS databases for non-recommendation uses (e.g., market research, general trends, visualization of differential characteristics of demographic groups)
- Enabling security and privacy for the RS process
- New evaluation measures and developing a standard for non-standardized measures
- Designing flexible frameworks for automated analysis of heterogeneous data

4. References

References

[1] J. Bobadilla, F. Ortega, A. Hernando and A. Gutierrez, “Recommender Systems Survey,” Knowledge Based Systems, Vol. 26, 2013, pp. 109-132.

[2] Book: Collective Intelligence in Action

[3] en.wikipedia.org/wiki/Collaborative_filtering

[4] www.cs.carleton.edu/cs_comps/0607/recommend/recommender/memorybased.html

[5] www.cs.carleton.edu/cs_comps/0607/recommend/recommender/modelbased.html
