This is a summary of the following paper: J. Bobadilla, F. Ortega, A. Hernando and A. Gutierrez, “Recommender Systems Survey,” Knowledge Based Systems, Vol. 26, 2013, pp. 109-132.
Recommender Systems Survey (Summary)
Changsung Moon
CONTENTS
1. Foundations
1-1. Fundamentals
1-2. Cold-start
1-3. Similarity Measures
2. Hybrid CBF/CF
2-1. Challenges of CBF and CF
2-2. Hybrid Approaches
2-3. Social Filtering
3. Trends
3-1. Introduction
3-2. Location-aware RS
3-3. Bio-inspired approaches
3-4. Conclusions
4. References
1. RS Foundations: 1-1. Fundamentals
The recommendation process is based on the following considerations:
- Type of data: ratings, features, content, social relationships, location-aware info
- Filtering algorithm: demographic, content-based, collaborative, social-based, context-aware, hybrid
- Model: memory-based, model-based
- Employed techniques: probabilistic approaches, Bayesian networks, nearest neighbors algorithm, neural networks, genetic algorithms, fuzzy models, SVD
- The rest: sparsity level, performance of the system, desired quality of results, and the objective sought (predictions or top N recommendations)
Filtering algorithms

Content-based filtering
- Based on info about the item itself, usually keywords or phrases occurring in the item
- Similarity between two content items is measured by the similarity of their term vectors
- A user's profile can be developed by analyzing the set of content the user has interacted with
- This makes it possible to compute the similarity between a user and an item

Collaborative filtering
- Based on interactions of users
- Users rate items, and CF finds patterns in the way items have been rated by the user and other users to find additional items of interest for a user
- Two main approaches: memory-based and model-based

Demographic filtering
- Assumes that users with common personal attributes (sex, age, country, etc.) have common preferences
- Matches a user's metadata to that of other similar users and recommends items liked by them
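The content-based idea above (term vectors, a user profile built from viewed content, and user-to-item similarity) can be illustrated with a minimal sketch. It assumes a plain bag-of-words term vector and cosine similarity; the function names and the toy catalogue are hypothetical and do not come from the survey.

```python
from collections import Counter
from math import sqrt


def term_vector(text):
    """Bag-of-words term vector: lowercase token -> count."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def user_profile(texts):
    """Profile = sum of the term vectors of the items the user interacted with."""
    profile = Counter()
    for text in texts:
        profile.update(term_vector(text))
    return profile


# Hypothetical toy catalogue (assumption, for illustration only).
items = {
    "a": "jazz piano trio live recording",
    "b": "rock guitar live concert",
    "c": "jazz saxophone quartet studio recording",
}

# The user has interacted with item "a"; score the remaining items.
profile = user_profile([items["a"]])
scores = {k: cosine(profile, term_vector(v)) for k, v in items.items() if k != "a"}
best = max(scores, key=scores.get)
```

In practice the raw counts would usually be replaced by a weighting such as TF-IDF; raw counts are used here only to keep the sketch short.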
Two main approaches in Collaborative Filtering (CF)

Memory-based
- Uses the matrix of user ratings for items across the entire database to find users that are similar to the active user, and uses their preferences to predict ratings for the active user
- Advantages:
  - Quality of predictions is rather good
  - Relatively simple algorithm to implement for any situation
  - New data can be added easily and incrementally
  - Need not consider the content of items
- Disadvantages:
  - Depends on human ratings
  - Performance decreases when data gets sparse
  - Scales poorly and has problems with large datasets
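As a rough illustration of the memory-based approach, the sketch below predicts a rating from the k most similar users. It assumes Pearson correlation computed around each user's overall mean and a mean-centred weighted average for the prediction, which is one common variant and not necessarily the formulation used in the survey. The ratings and all names are hypothetical.

```python
from math import sqrt

# Hypothetical toy ratings (user -> {item: rating}); assumption for illustration.
ratings = {
    "ann": {"i1": 5, "i2": 3, "i3": 4},
    "bob": {"i1": 4, "i2": 2, "i3": 5, "i4": 4},
    "cat": {"i1": 1, "i2": 5, "i4": 2},
    "dan": {"i1": 5, "i2": 3, "i3": 5, "i4": 5},
}


def mean(user):
    """Mean of all ratings given by a user."""
    vals = ratings[user].values()
    return sum(vals) / len(vals)


def pearson(u, v):
    """Pearson correlation over co-rated items, centred on each user's mean."""
    common = ratings[u].keys() & ratings[v].keys()
    if len(common) < 2:
        return 0.0
    mu, mv = mean(u), mean(v)
    num = sum((ratings[u][i] - mu) * (ratings[v][i] - mv) for i in common)
    du = sqrt(sum((ratings[u][i] - mu) ** 2 for i in common))
    dv = sqrt(sum((ratings[v][i] - mv) ** 2 for i in common))
    if du == 0 or dv == 0:
        return 0.0
    return num / (du * dv)


def predict(user, item, k=2):
    """Predict a rating from the k most similar users who rated the item."""
    candidates = sorted(
        ((pearson(user, v), v) for v in ratings if v != user and item in ratings[v]),
        reverse=True,
    )[:k]
    # Keep only positively correlated neighbours.
    neighbours = [(s, v) for s, v in candidates if s > 0]
    if not neighbours:
        return mean(user)  # fall back to the user's own mean
    num = sum(s * (ratings[v][item] - mean(v)) for s, v in neighbours)
    den = sum(s for s, _ in neighbours)
    return mean(user) + num / den


prediction = predict("ann", "i4")
```

Every prediction scans the whole rating table, which is the scalability problem listed among the disadvantages.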
Model-based
- Finds patterns in training data and uses them to make predictions for real data
- Extracts some info from the dataset and uses that as a "model" to make recommendations without having to use the complete dataset every time
- Advantages:
  - Handles sparsity better than memory-based approaches
  - Scalable with large datasets
  - Improves prediction speed
- Disadvantages:
  - Expensive model building
  - Can lose useful info because the model reduces the data
- Approaches: linear algebra, probabilistic methods, neural networks, clustering, latent classes, and so on
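To make the "build a model once, then predict without the full dataset" idea concrete, here is a minimal sketch of one linear-algebra approach: a latent factor model trained with stochastic gradient descent. This technique is chosen here for illustration and is not claimed to be the survey's method. The class name, hyperparameters, and toy data are all assumptions.

```python
import random
from math import sqrt

# Hypothetical toy data: (user index, item index, rating); assumption.
RATINGS = [
    (0, 0, 5), (0, 1, 3), (0, 2, 4),
    (1, 0, 4), (1, 1, 2), (1, 3, 4),
    (2, 0, 1), (2, 1, 5), (2, 3, 2),
    (3, 1, 3), (3, 2, 5), (3, 3, 5),
]


class FactorModel:
    """Rating ~ global mean + dot(user factors, item factors)."""

    def __init__(self, n_users, n_items, k=2, seed=0):
        rng = random.Random(seed)
        self.mu = 0.0
        self.P = [[rng.uniform(-0.5, 0.5) for _ in range(k)] for _ in range(n_users)]
        self.Q = [[rng.uniform(-0.5, 0.5) for _ in range(k)] for _ in range(n_items)]

    def predict(self, u, i):
        return self.mu + sum(p * q for p, q in zip(self.P[u], self.Q[i]))

    def rmse(self, data):
        return sqrt(sum((r - self.predict(u, i)) ** 2 for u, i, r in data) / len(data))

    def fit(self, data, epochs=300, lr=0.02, reg=0.02):
        """Expensive step: learn the factors from the known ratings."""
        self.mu = sum(r for _, _, r in data) / len(data)
        for _ in range(epochs):
            for u, i, r in data:
                err = r - self.predict(u, i)
                for f in range(len(self.P[u])):
                    pu, qi = self.P[u][f], self.Q[i][f]
                    self.P[u][f] += lr * (err * qi - reg * pu)
                    self.Q[i][f] += lr * (err * pu - reg * qi)
        return self


model = FactorModel(n_users=4, n_items=4)
rmse_before = model.rmse(RATINGS)
model.fit(RATINGS)
rmse_after = model.rmse(RATINGS)
unseen = model.predict(0, 3)  # cheap lookup once the model is built
```

After `fit`, a prediction only needs the two small factor vectors. That reflects the speed advantage listed above, at the cost of the model-building step and of whatever information the low-rank model discards.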
1. RS Foundations: 1-2. Cold-start
Cold-start problem
New items and new users can cause the cold-start problem, because there is insufficient data on these new entries for CF to work accurately.

Hybrid filtering research
- Leung et al. [135]: cross-level association rules to integrate content info about domain items
- Kim et al. [118]: use collaborative tagging obtained by crawling the Delicious site
- Weng et al. [228]: combine implicit relations between users' item preferences and additional taxonomic preferences
- Loh et al. [140]: represent users' profiles with info extracted from their scientific publications
- Martinez et al. [148]: hybrid RS that combines CF with a knowledge-based RS
- Chen and He [56]: number of common terms / term frequency (NCT/TF) CF based on a demographic vector
- Saranya and Atsuhiro [199]: utilize latent features extracted from items
- Park et al. [173]: use filterbots, surrogate users that rate items based only on user or item attributes
1. RS Foundations: 1-3. Similarity Measures
Similarity Measures (SM)
- User to user: similarity between pairs of users, comparing the ratings of all the items rated by both users
- Item to item: similarity between pairs of items, comparing the ratings of all users who have rated both items

Memory-based
- Traditional: Pearson correlation, cosine, Euclidean, adjusted cosine, constrained correlation, Mean Squared Differences
- Research:
  - Bobadilla et al. [31], Jaccard Mean Squared Differences: uses non-numerical info in addition to the numerical info from ratings
  - Ortega et al. [169]: use Pareto dominance to eliminate less representative users from the k-neighbor selection process
  - Bobadilla et al. [35], SING (singularities): uses info contained in the votes of all users, instead of restricting it to the ratings of the two users or two items being compared

Model-based
- Advantage: increase in accuracy, in performance (time consumed), or in both
- Disadvantage: the model must be regularly updated in order to consider the most recently entered set of ratings
- Research:
  - Bobadilla et al. [33], GEN: uses genetic algorithms

Dealing with cold-start
- Research:
  - Ahn [6], PIP: heuristic SM
  - Heung-Nam et al. [98], UERROR: first predicts actual ratings and subsequently identifies the prediction errors for each user
  - Bobadilla et al. [36], NCS: based on neural learning (model-based CF) and adapted for new user cold-start situations
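A few of the traditional measures named above can be sketched for a pair of users as follows. The sketch assumes ratings on a 1 to 5 scale and a JMSD defined as the Jaccard index multiplied by one minus the normalized mean squared difference. That is my reading of the measure and should be checked against Bobadilla et al. [31]. The function names and sample ratings are hypothetical.

```python
from math import sqrt


def common_items(x, y):
    """Items rated by both users, in a stable order."""
    return sorted(x.keys() & y.keys())


def pearson(x, y):
    """Pearson correlation over co-rated items."""
    items = common_items(x, y)
    n = len(items)
    if n < 2:
        return 0.0
    mx = sum(x[i] for i in items) / n
    my = sum(y[i] for i in items) / n
    num = sum((x[i] - mx) * (y[i] - my) for i in items)
    den = sqrt(sum((x[i] - mx) ** 2 for i in items)) * sqrt(
        sum((y[i] - my) ** 2 for i in items)
    )
    return num / den if den else 0.0


def cosine(x, y):
    """Cosine of the angle between the co-rated rating vectors."""
    items = common_items(x, y)
    num = sum(x[i] * y[i] for i in items)
    den = sqrt(sum(x[i] ** 2 for i in items)) * sqrt(sum(y[i] ** 2 for i in items))
    return num / den if den else 0.0


def msd(x, y, r_min=1, r_max=5):
    """Mean squared difference on ratings scaled to [0, 1]; 0 means identical."""
    items = common_items(x, y)
    if not items:
        return 1.0  # assumption: no overlap is treated as maximal difference
    span = r_max - r_min
    return sum(((x[i] - y[i]) / span) ** 2 for i in items) / len(items)


def jaccard(x, y):
    """Share of co-rated items among all items rated by either user."""
    union = x.keys() | y.keys()
    return len(x.keys() & y.keys()) / len(union) if union else 0.0


def jmsd(x, y):
    """Assumed JMSD form: Jaccard * (1 - MSD)."""
    return jaccard(x, y) * (1 - msd(x, y))


# Hypothetical ratings of two users (item -> rating).
u = {"a": 5, "b": 3, "c": 4}
v = {"a": 4, "b": 2, "c": 3, "d": 5}
```

The same functions give item-to-item similarity if each dictionary maps users to the ratings received by one item.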
2. Hybrid CBF / CF: 2-1. Challenges
Challenges of CBF and CF

CBF
- Cannot predict the quality of an item
  - How popular is the item? How much will a user like it?
  - It is difficult to acquire feedback from users because, with CBF, users do not typically rate items
- Limited content analysis
  - In certain domains (e.g., music, blogs, and videos), generating the attributes for items is a complicated task
- Overspecialization
  - Users only receive recommendations for items that are very similar to items they liked or preferred

CF
- Data sparsity
  - Many commercial RSs are based on large datasets. As a result, the user-item matrix used for CF can be extremely large and sparse
  - Research: dimensionality reduction techniques [202]. The reduction methods are based on Matrix Factorization and combine the model-based technique Latent Semantic Index (LSI) with the reduction method Singular Value Decomposition (SVD)
- Cold-start problem
  - See slide 1-2, "Cold-start"
- Synonyms
  - The same or very similar items have different names or entries
  - Topic modeling (such as the Latent Dirichlet Allocation technique) could solve this by grouping different words belonging to the same topic
- Shilling attacks
  - People may give positive ratings to their own items and negative ratings to their competitors' items
2. Hybrid CBF / CF: 2-2. Hybrid Approaches
Methods, Advantages and Trends

Methods
- Calculate CBF and CF separately and subsequently combine them
- Incorporate CBF characteristics into CF
- Construct a unified model with both CBF and CF characteristics
- Incorporate CF characteristics into CBF

Advantages
- CF solves CBF's problems
  - It can function in any domain
  - It is less affected by overspecialization
  - It acquires feedback from users
- CBF adds qualities to CF
  - Improved prediction quality, because predictions are calculated with more info
  - Reduced impact from the cold-start and sparsity problems

Trend in CBF
- Add social info to item attributes
- Tag RS: attempt to provide personalized item recommendations to users through the most representative tags
- Use of tags in the recommendation process increases the capacity of traditional RS
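The first method, calculating CBF and CF separately and then combining them, can be sketched as a simple linear blend. The weight `alpha`, the fallback rule for items scored by only one source, and the sample scores are assumptions made for illustration, not details from the survey.

```python
def weighted_hybrid(cbf_scores, cf_scores, alpha=0.5):
    """Blend two score dictionaries (item -> score in [0, 1]).

    alpha is the weight given to the content-based score. An item scored
    by only one source keeps that single score, so a new item with no
    ratings can still be ranked from its content.
    """
    combined = {}
    for item in cbf_scores.keys() | cf_scores.keys():
        if item in cbf_scores and item in cf_scores:
            combined[item] = alpha * cbf_scores[item] + (1 - alpha) * cf_scores[item]
        elif item in cbf_scores:
            combined[item] = cbf_scores[item]
        else:
            combined[item] = cf_scores[item]
    return combined


def top_n(scores, n):
    """Return the n best items, breaking ties by item name."""
    return sorted(scores, key=lambda item: (-scores[item], item))[:n]


# Hypothetical scores produced by two separate recommenders.
cbf = {"a": 0.9, "b": 0.2, "c": 0.6}  # "c" is new: no CF score yet
cf = {"a": 0.4, "b": 0.8, "d": 0.7}

combined = weighted_hybrid(cbf, cf, alpha=0.5)
recommended = top_n(combined, 2)
```

Item "c" has no collaborative score but is still ranked, which is one way the CBF side reduces the impact of cold-start.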
2. Hybrid CBF / CF: 2-3. Social Filtering
Current Research

Improvement in RS
- Most research aims to improve the recommendations made by drawing on the extra info that social info provides
- Research:
  - Woerndl and Groh [231]: use social networks to enhance CF
  - Arazy et al. [13]: use data from online social networks and electronic communication tools
  - Xin et al. [233]: exploit learners' note-taking activity to enrich and extend the user profile
  - Bonhard and Sasse [41]: similarity and familiarity between the user and the persons who have rated the items can aid decision making
  - Fengkun and Hong [75]: incorporate users' preference ratings and their social relationships into CF
  - Carmagnola et al. [52]: recommend content in a social RS based on social network structure and influence relationships among users
  - Ramaswamy et al. [189]: analyze info such as address books to estimate the level of social affinity

Create or enable RS
- Use social info to create or enable RS
- Research:
  - Siersdorfer and Sergei [210]: predict the utility of items, users or groups based on the multi-dimensional social environment of a given user; mine the rich set of structures and social relationships that folksonomies provide
  - Li and Chen [137]: blog recommendation that combines a trust model, social relations and semantic analysis
  - Jason [111]: discover social networks between mobile users
  - Jyun and Chui [115]: use trading relationships to calculate the level of recommendation for trusted online auction sellers
  - Dell'amico and Capra [69]: measure users' trustworthiness using two criteria, taste similarity and social ties

Trust and Reputation
- User trust: calculate the credibility of users through the info of the rest of the users or the social network
- Item reputation: calculate the reputation of items through user feedback or by studying how users work with these items
- Research:
  - Yuan et al. [239]: choose a trust-aware RS to demonstrate the advantages of making use of the small-world nature of the trust network
  - Li and Kao [138]: RS based on the trust of social networks to enhance the quality of peer production services
  - Ma et al. [145]: probabilistic factor analysis framework combining ratings and trusted friends; this framework can be applied to a pure user-item rating matrix
3. Trends: 3-1. Introduction
Recommender systems trends

Trends
- Shilling attack: generate many positive ratings for a product
- Privacy and security: tradeoffs between accuracy and privacy
- Knowledge-based filtering: use knowledge about users and products to generate recommendations, reasoning about what products meet the user's requirements
- Hybrid approach: use current databases to simultaneously incorporate memory-based, social and content-based info
- Workflow: the user model is based on "users-roles-tasks reference information"
- Collection of implicit info: access to web sites, food purchased, use of public transport systems, etc.
- Peer-to-peer (P2P) networks: user info is based on distributed info
- Incorporation of different types of info: e.g., explicit ratings, social relations, user contents, locations, use trends, knowledge-based info
3. Trends: 3-2. Location-aware RS
Location-aware recommender systems

Geographic CF RSs
- RS: traditional RS that does not use geographical info
- RS + G: traditional RS that also provides the item's geographical position; geographic info does not play a part in the recommendation process
- GRS: geographic RS; ratings are made in the traditional way, whilst recommendations are made by considering the geographical position of the user
- GRS+: users establish ratings on items by weighting the distance between them and the items rated; travel GPS traces can be reinforced with social information based on friends

Research
- Martinez et al. [149]: examples of the RS + G group
- Schlieder [205]: modeling the collaborative semantics of geographic folksonomies based on an analysis of the tags that users assign to composite objects
- Wan-Shiou et al. [225]: hybrid content-based/geographic RS that analyzes a customer's history and position so that vendor info can be ranked according to the match with the customer's preferences
- Matyas and Schlieder [152]: users' ratings are derived from photos they have downloaded and uploaded to the same website (the photos have a GPS position associated with them); a k-neighborhood search based on this data is then carried out
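The GRS idea above, where ratings are predicted in the traditional way and the user's position then shapes the recommendation, can be sketched as a distance-weighted re-ranking. The exponential decay, the `scale_km` parameter, and the sample coordinates are assumptions chosen for illustration. The survey does not prescribe a particular weighting.

```python
from math import asin, cos, exp, radians, sin, sqrt


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometres."""
    earth_radius_km = 6371.0
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * earth_radius_km * asin(sqrt(a))


def distance_weight(d_km, scale_km=5.0):
    """Assumed decay: weight 1 at distance 0, falling off exponentially."""
    return exp(-d_km / scale_km)


def rank_by_location(user_pos, candidates, scale_km=5.0):
    """Re-rank (item, predicted_rating, (lat, lon)) tuples by rating * weight."""
    scored = [
        (rating * distance_weight(haversine_km(*user_pos, *pos), scale_km), item)
        for item, rating, pos in candidates
    ]
    return [item for _, item in sorted(scored, reverse=True)]


# Hypothetical user position and candidate items.
user = (40.4168, -3.7038)
candidates = [
    ("far", 5.0, (41.3874, 2.1686)),    # higher rating, hundreds of km away
    ("near", 4.0, (40.4170, -3.7040)),  # lower rating, a few metres away
]
ranking = rank_by_location(user, candidates)
```

A GRS+ variant would apply a similar weight when the ratings themselves are collected rather than at ranking time.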
3. Trends: 3-3. Bio-inspired approaches
Bio-inspired approaches (Model-based RS)

Genetic Algorithms (GA)
- GA have mainly been used in two aspects:
  - Clustering: use common genetic clustering algorithms such as GA-based K-means
  - Hybrid user models: the chromosome structure can contain demographic characteristics and/or those related to content-based filtering
- Research:
  - Dao et al. [68]: model-based CF using GA for location-based advertisement
  - Bobadilla et al. [33]: use GA to create a similarity metric, weighting a set of very simple similarity measures
  - Hwang et al. [106]: GA to learn the personal preferences of customers

Neural Networks (NN)
- Research focuses on hybrid RS, in which NNs are used to learn users' profiles; NNs have also been used in the clustering processes of some RS
- Research:
  - Ren et al. [192]: use the Widrow-Hoff [229] algorithm to learn each user's profile from the contents of rated items
  - Christakou and Stafylopatis [62]: use a combination of CBF / CF RS
  - Lee and Woo [133]: all users are segmented by demographic characteristics, and users in each segment are clustered according to item preference using a Self-Organizing Map (SOM) NN; Kohonen's SOMs are a type of unsupervised learning
  - Huang et al. [103]: train a back-propagation NN to generate association rules that are mined from a transactional DB
  - Roh et al. [193]: combine CF with SOM and Case Based Reasoning (CBR) by changing the unsupervised clustering problem into a supervised user preference reasoning problem
  - Sevarac et al. [207]: use neuro-fuzzy inference to create pedagogical rules in e-learning
  - Bobadilla et al. [36]: a new cold-start similarity measure has been perfected using optimization based on neural learning
  - Acilar and Arslan [2]: CF based on an artificial immune network algorithm (aiNet)
3. Trends: 3-4. Conclusions
Generations of RS

1st Generation
- Use traditional websites to collect info from:
  - Content-based data from purchased or used products
  - Demographic data collected in users' records
  - Memory-based data collected from users' item preferences
- Focus on improving accuracy through filtering

2nd Generation
- Extensively use web 2.0 by gathering social info

3rd Generation
- Will use web 3.0 through info provided by integrated devices on the Internet
- Incorporate location info into existing recommendation algorithms

Future Research
- Advancing existing methods and algorithms to improve the quality of RS
- New lines of research:
  - Proper combination of existing recommendation methods that use different types of available information
  - Getting maximum use of the individual potential of various sensors and devices on the Internet of Things
  - Acquisition and integration of trends related to the habits, consumption and tastes of individual users
  - Data mining from RS databases for non-recommendation uses (e.g., market research, general trends, visualization of differential characteristics of demographic groups)
  - Enabling security and privacy for the RS process
  - New evaluation measures and developing a standard for non-standardized measures
  - Designing flexible frameworks for automated analysis of heterogeneous data
4. References
[1] J. Bobadilla, F. Ortega, A. Hernando and A. Gutierrez, “Recommender Systems Survey,” Knowledge Based Systems, Vol. 26, 2013, pp. 109-132.
[2] Book: Collective Intelligence in Action
[3] en.wikipedia.org/wiki/Collaborative_filtering
[4] www.cs.carleton.edu/cs_comps/0607/recommend/recommender/memorybased.html
[5] www.cs.carleton.edu/cs_comps/0607/recommend/recommender/modelbased.html