Recommending Twitter Users to Follow Using Content and Collaborative Filtering Approaches
John Hannon, Mike Bennett, Barry Smyth CLARITY Centre for Sensor Web Technologies
University College Dublin
Outline
1. Problem
2. Related work & Innovation
3. Method & Experiment
4. Result & Analysis
Problem
• The paper addresses an important
recommendation problem: for a given user, UT,
which other users might be recommended
as followers/followees, based on a large
dataset of Twitter users and their tweets.
• The motivation of the paper is to demonstrate
the potential for effective and efficient followee
recommendation.
Related Work
• Analysis of Twitter's real-time data.
– Kwak et al.: reciprocity and homophily among Twitter users,
information diffusion.
• User-generated content, such as reviews, is used as an additional source in recommender systems.
– User-generated movie reviews from IMDb are used as part of
a movie recommender system.
• Research to help users find and connect with people online.
– Information such as co-authorship is used to identify
similar users.
– Freyne and Geyer et al. have done much work on
relationship building.
– Making recommendations to new users during their sign-up
process.
– Recommending topics for self-descriptions in online user profiles.
Innovation
• Twitter’s potential as a powerful source of
profiling data. This is a novel take on
profiling and recommendation in itself.
• Focus on noisy, unstructured
micro-blogging data.
• The novel contribution of the paper is that,
noisy as Twitter data is, it can still provide
a useful recommendation signal.
Twittomender
Approach
• How users are profiled
– Content-based techniques which rely on the
content of tweets.
– Collaborative filtering approaches based on
the followees and followers of users.
• How these profiles can be used to
suggest interesting users to follow.
– The Lucene platform is used to develop the
framework.
Profiling Users on Twitter
• 5 basic profiling strategies:
(1) Representing users by their own tweets
(tweets(UT));
(2) By the tweets of their followees
(followeetweets(UT));
(3) By the tweets of their followers
(followertweets(UT));
(4) By the ids of their followees (followees(UT));
(5) By the ids of their followers (followers(UT)).
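The five profile sources above can be sketched in a few lines. This is only an illustration on a hypothetical toy dataset: the `tweets` and `follows` dictionaries and all function names are assumptions, not part of the original system.

```python
# Hypothetical toy data (assumption): user id -> list of that user's tweets.
tweets = {
    "alice": ["lucene indexing tips", "tf idf explained"],
    "bob": ["football tonight"],
    "carol": ["recommender systems and lucene"],
}
# Hypothetical follow graph (assumption): follows[u] = users that u follows.
follows = {"alice": {"bob", "carol"}, "bob": {"alice"}, "carol": {"alice"}}


def followees(u):
    """Strategy 4: ids of the users that u follows."""
    return set(follows.get(u, set()))


def followers(u):
    """Strategy 5: ids of the users that follow u."""
    return {v for v, out in follows.items() if u in out}


def own_tweets(u):
    """Strategy 1: u's own tweets."""
    return list(tweets.get(u, []))


def followee_tweets(u):
    """Strategy 2: tweets written by u's followees."""
    return [t for v in sorted(followees(u)) for t in tweets.get(v, [])]


def follower_tweets(u):
    """Strategy 3: tweets written by u's followers."""
    return [t for v in sorted(followers(u)) for t in tweets.get(v, [])]
```

Each function returns a bag of items (tweets or user ids) that can later be treated as the "document" describing the user.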
Indexing & Recommendation
• Using Lucene's indexing features we can represent
each user, UT, as a weighted term vector, profile(UT,
source).
• profile(UT, source) = {w1,…,wn}
• Term weighting function: TF-IDF
• Query-based retrieval and profile-based
recommendation are then implemented using Lucene's
standard retrieval function, with the target user's profile
document serving as the search query in the case of
the latter.
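To make the idea of profile-based retrieval concrete, here is a minimal sketch in plain Python. It does not use Lucene and does not reproduce Lucene's scoring formula; it assumes a simple TF-IDF weighting (term count times log(N/df)) and cosine similarity, and the function names are hypothetical. Terms can be words from tweets or user ids from follower/followee lists.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """docs: dict user -> list of terms. Returns dict user -> {term: weight}."""
    n = len(docs)
    df = Counter()
    for terms in docs.values():
        df.update(set(terms))  # document frequency: count each term once per user
    vectors = {}
    for user, terms in docs.items():
        tf = Counter(terms)
        vectors[user] = {t: c * math.log(n / df[t]) for t, c in tf.items()}
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def recommend(target, vectors, k=10):
    """Use the target user's profile as the query; return up to k similar users."""
    query = vectors[target]
    scored = [(cosine(query, v), u) for u, v in vectors.items() if u != target]
    scored.sort(key=lambda s: (-s[0], s[1]))  # best score first, ties by user id
    return [u for score, u in scored[:k] if score > 0]
```

For example, with `docs = {"alice": ["lucene", "search", "lucene"], "bob": ["football", "match"], "carol": ["lucene", "recommender"]}`, calling `recommend("alice", tfidf_vectors(docs))` returns only `carol`, the one user who shares a term with the query profile.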
Experiment: dataset
• 20,000 users were imported directly using the
Twitter API to form the dataset. The dataset is split into
two sets of users: one containing 1,000 users
to act as test users, and a larger training set of
19,000 users.
9 different recommendation strategies
• S1: tweets(UT)
• S2: followeetweets(UT)
• S3: followertweets(UT)
• S4: tweets(UT), followeetweets(UT), followertweets(UT)
• S5: followees(UT)
• S6: followers(UT)
• S7: followees(UT), followers(UT)
• S8: the scoring function is based on a combination of
content and collaborative strategies S1 and S6;
• S9: the scoring function is based on the position of the
user in each of the recommendation lists.
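The slides do not give the exact scoring function for the position-based strategy. As one possible reading, the sketch below merges several ranked recommendation lists with a Borda-style count, where a user earns more points the nearer the top of each list they appear. Both the scoring rule and the name `position_ensemble` are assumptions made for illustration.

```python
def position_ensemble(lists, k=10):
    """Merge ranked lists; a user at position pos in a list of length n earns n - pos points."""
    scores = {}
    for ranked in lists:
        n = len(ranked)
        for pos, user in enumerate(ranked):
            scores[user] = scores.get(user, 0) + (n - pos)
    # Highest combined score first; ties broken by user id for determinism.
    ordered = sorted(scores, key=lambda u: (-scores[u], u))
    return ordered[:k]
```

A user who ranks near the top of several lists ends up ahead of a user who tops only one of them.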
Recommendation Precision
• Our basic measure of recommendation performance is
the average percentage overlap between a given
recommendation list and the target user's actual
followees list.
• We can also see that relevant recommendations tend to
be clustered towards the top of recommendation lists,
since the precision of all strategies is seen to decline
with increasing recommendation-list size. Interestingly,
the collaborative strategies perform better than the
content strategies.
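The overlap measure described above can be sketched as precision at a given list size, averaged over test users. The function names and the input layout are assumptions; the slides do not say how the measure was computed in practice.

```python
def precision_at_k(recommended, actual_followees, k):
    """Fraction of the top-k recommended users that the target user really follows."""
    top = recommended[:k]
    if not top:
        return 0.0
    hits = sum(1 for u in top if u in actual_followees)
    return hits / len(top)


def mean_precision_at_k(recs_by_user, followees_by_user, k):
    """Average precision at k over all test users that have a recommendation list."""
    values = [
        precision_at_k(recs, followees_by_user.get(user, set()), k)
        for user, recs in recs_by_user.items()
    ]
    return sum(values) / len(values) if values else 0.0
```

Evaluating the same lists at several values of k shows whether precision falls as the list grows, which is the pattern the slide reports.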
Ranking Effectiveness
• The position of relevant recommendations is also an important consideration, especially since we know that users focus the lion's share of their attention on items at the top of results or recommendation-lists.
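One simple way to quantify ranking effectiveness is the average list position of the relevant recommendations, where a lower value means relevant users sit nearer the top. The sketch below is an assumption about how such a measure could be computed, not the paper's exact procedure, and the function name is hypothetical.

```python
def mean_relevant_position(recommended, actual_followees):
    """Average 1-based position of relevant users in a list, or None if there are none."""
    positions = [
        i for i, u in enumerate(recommended, start=1) if u in actual_followees
    ]
    return sum(positions) / len(positions) if positions else None
```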
A live-user trial
• Shortcoming of the off-line evaluation: it is unwise to discount the non-overlapping
recommendations as definitively not relevant to the target user.
• User recommendation: an average of 6.9 users per recommendation list.
• User search: an average of 4.9 of the suggested users per search.
Conclusion
Advantage:
• User-generated content is used as a source of profiling data.
• Tweets are not preprocessed.
My idea:
• A user's tweets should be preprocessed, for example by extracting tags
from them. The tags may be more important than the rest of the content.
• Other information, such as the groups a user joins, is also
worth taking into account.
• Users can be divided into celebrities and ordinary people. Different
strategies should be considered for different kinds of users.
• Barry Smyth: Centre Director. His research interests include personalization,
recommender systems, case-based reasoning, machine learning, and information retrieval.
• John Hannon: Ph.D. student.
• Mike Bennett: postdoctoral researcher and interaction designer.