Making Your Interests Follow You on Twitter Fabrizio Silvestri, ISTI - CNR, Pisa, Italy Joint Work with: Marco Pennacchiotti, eBay Inc., San Jose, USA Hossein Vahabi, IMT, Lucca, Italy Rossano Venturini, Dept. of Computer Science, University of Pisa, Italy CIKM 2012 - Maui, HI Tuesday, October 30, 2012


Page 1: Making Your Interests Follow You on Twitter

Making Your Interests Follow You on Twitter

Fabrizio Silvestri, ISTI - CNR, Pisa, Italy

Joint Work with:

Marco Pennacchiotti, eBay Inc., San Jose, USA

Hossein Vahabi, IMT, Lucca, Italy

Rossano Venturini, Dept. of Computer Science, University of Pisa, Italy

CIKM 2012 - Maui, HI Tuesday, October 30, 2012

Page 2: Making Your Interests Follow You on Twitter

Twitter Recommendations: Why?

• Social media are popular:

• In January 2012, Twitter was visited 2.5 billion times, more than double the number of visits six months earlier.

• More than 3,000 tweets per second: Information Overload?

• Information Hiding

CIKM 2012 - Maui, HI Tuesday, October 30, 2012

M. Pennacchiotti, F. Silvestri, H. Vahabi, R. Venturini

Page 3: Making Your Interests Follow You on Twitter

What is Twitter, a Social Network or a News Media?

• “We [...] classify the trending topics based on the active period and the tweets and show that the majority (over 85%) of topics are headline or persistent news in nature.”

• Twitter users want to be notified of news that interests them as soon as possible.

• What if I do not follow a person who tweets (or retweets) a piece of news that is interesting to me?

H. Kwak, C. Lee, H. Park, and S. Moon. What is Twitter, a social network or a news media? In Proceedings of WWW '10.


Page 4: Making Your Interests Follow You on Twitter

Related Work

• At SIGIR 2012 (notification sent but accepted papers not available at the time of CIKM submission):

• K. Chen, T. Chen, G. Zheng, O. Jin, E. Yao, and Y. Yu. Collaborative personalized tweet recommendation. In Proceedings of SIGIR '12.

• For them: “The goal of personalized tweet recommendation is to estimate the value of a tweet for each user.”

• They make the following assumptions:

• “Users’ retweeting actions reflect their personal judgement of informativeness and usefulness.”

• “Users who have retweeted similar statuses in the past are likely to retweet similar statuses in the future.”


Page 5: Making Your Interests Follow You on Twitter

Related Work

• Their solution is a collaborative tweet ranking model integrating topic-level features, social relations, and explicit features. Stochastic gradient descent is used for parameter estimation.

• How do they evaluate effectiveness?

• If a user retweets a tweet, it is relevant (1); otherwise it is not (0).

• P@n and MAP

Pros

• Within their evaluation schema, MAP reaches a value of 0.7627.

Cons

• For each tweet we need to extract a fairly large set of features (expensive to compute).

• It is more a retweet prediction method than a tweet recommendation method.

• It can suggest the “same” tweet over and over again.


Page 6: Making Your Interests Follow You on Twitter


Page 7: Making Your Interests Follow You on Twitter

Problem Setting

• Let T be a stream of tweets t_1, t_2, ...

• u is a Twitter user, and we “assume” we know the interestingness of a tweet t_i for u: I_u(t_i).

• We define two problems whose goal is to select S⊆T such that the overall interestingness, I_u(S), is maximized.


Page 8: Making Your Interests Follow You on Twitter

TweetRec Problem

• Given a user u and a positive integer k, we aim at finding a set S of k tweets in T maximizing the overall “interestingness”. More formally, we would like to find S⊆T with |S| = k such that I_u(S) is maximized.


Page 9: Making Your Interests Follow You on Twitter

TweetRec Problem

• Pros

• The optimal solution to TweetRec can be found in O(|T| log k) time.

• Cons

• Independence: “The interestingness of a tweet has no impact on the interestingness of other tweets.”

t_1 = “What’s new in Linux 3.2? #linux”

t_2 = “New features in Linux 3.2. #linux”

Would u be happy to see both of them in the “tweet to follow” area?
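The O(|T| log k) bound quoted above is what a single pass with a size-k min-heap achieves. The following is a minimal sketch of that idea, not the authors' implementation; the function name `top_k_tweets`, the `interestingness` callable, and the toy scores are assumptions made for illustration.

```python
import heapq


def top_k_tweets(stream, interestingness, k):
    """Return the k highest-scoring tweets of a stream, best first.

    A min-heap of at most k entries is maintained, so each tweet costs
    O(log k) and the whole pass costs O(|T| log k).
    """
    if k <= 0:
        return []
    heap = []  # entries are (score, arrival_index, tweet)
    for index, tweet in enumerate(stream):
        entry = (interestingness(tweet), index, tweet)
        if len(heap) < k:
            heapq.heappush(heap, entry)
        elif entry[0] > heap[0][0]:
            # The new tweet beats the weakest tweet kept so far.
            heapq.heapreplace(heap, entry)
    return [tweet for _, _, tweet in sorted(heap, reverse=True)]


# Toy scores (hypothetical values standing in for the user's interestingness).
scores = {"a": 0.2, "b": 0.9, "c": 0.5, "d": 0.7}
best_two = top_k_tweets(["a", "b", "c", "d"], scores.get, 2)
```

Because every tweet is scored independently, two near-duplicates such as the Linux 3.2 tweets would both be kept if both score highly, which is the drawback this slide points out.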


Page 10: Making Your Interests Follow You on Twitter

Interest Spanning TweetRec Problem

• Given a user u and a positive integer k, we aim at finding the k tweets t in T maximizing the overall interestingness. More formally, we would like to identify the set S⊆T with |S| = k such that F(S) is maximized, where F(S) accounts for the interestingness for user u of the informative content shared among all tweets t∈S.

Page 11: Making Your Interests Follow You on Twitter

Hardness Results

• Interest Spanning TweetRec is NP-Hard.

• Reduction from IndSet (Independent Set).

• F(S) is non-negative, monotone, and submodular.

• Theorem [Fisher, Nemhauser, and Wolsey, ’78]: For a non-negative, monotone submodular function F, let S be a set of size k obtained by selecting elements one at a time, each time choosing an element that provides the largest marginal increase in the function value. Let S* be a set that maximizes the value of F over all k-element sets. Then F(S) ≥ (1 − 1/e)F(S*).
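The greedy procedure in the theorem can be sketched in a few lines. The objective used here, `coverage`, is a hypothetical stand-in (the total weight of the distinct terms covered by the selected tweets); it is non-negative, monotone, and submodular, but it is not claimed to be the F(S) of the talk. Term weights and tweets are invented for illustration.

```python
def coverage(selected, term_weight):
    """Hypothetical F(S): total weight of the distinct terms covered by S."""
    covered = set()
    for tweet in selected:
        covered.update(tweet)
    return sum(term_weight.get(term, 0.0) for term in covered)


def greedy_select(tweets, k, objective):
    """Pick k tweets one at a time, always taking the largest marginal gain."""
    selected = []
    remaining = list(tweets)
    for _ in range(min(k, len(remaining))):
        base = objective(selected)
        best = max(remaining, key=lambda t: objective(selected + [t]) - base)
        selected.append(best)
        remaining.remove(best)
    return selected


# Tweets as sets of terms; weights are made-up interest scores per term.
weights = {"linux": 1.0, "3.2": 1.0, "new": 0.2, "features": 0.3,
           "nobel": 0.8, "prize": 0.5, "economics": 0.6}
tweet_1 = frozenset({"new", "linux", "3.2"})
tweet_2 = frozenset({"features", "linux", "3.2"})
tweet_3 = frozenset({"nobel", "prize", "economics"})

picked = greedy_select([tweet_1, tweet_2, tweet_3], 2,
                       lambda s: coverage(s, weights))
```

With these toy weights the second pick is the Nobel prize tweet rather than the other Linux tweet, because the terms shared by the two Linux tweets add no further value once one of them is selected.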

Page 12: Making Your Interests Follow You on Twitter

Estimating Interestingness

• So far we have assumed that I(S) is known for each subset S of T.

• Assumption 1: “The interests of a user u are implicitly expressed in his/her tweets.”

• It is therefore legitimate to compute I(t), i.e., the interestingness of a tweet t, as a linear combination of two text-based similarity scores: item-wise and pair-wise.

• Assumption 2: “The set of tweets written by a passive user u can be enriched with other carefully chosen tweets that have been posted by users that are highly authoritative and connected to u.”


Page 13: Making Your Interests Follow You on Twitter

Estimation Procedure

[Figure: a tweet is represented by its terms, {the, math, behind, this, year, nobel, prize, in, economics, gently, and, beautifully, explained, mathchat}, and by its pairs of terms, {<the, math>, <the, behind>, <the, this>, ...}. Terms and pairs of terms are shown for both Active Users and Passive Users.]
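The term set and the pairs of terms shown on this slide can be produced as follows. This is a sketch under one assumption: a pair is any unordered combination of two distinct terms from the same tweet, which matches the three example pairs on the slide but is not spelled out there. The helper name `terms_and_pairs` is hypothetical.

```python
from itertools import combinations


def terms_and_pairs(tokens):
    """Return the distinct terms of a tweet (in order) and all pairs of them."""
    terms = list(dict.fromkeys(tokens))  # drop repeats, keep first-seen order
    pairs = list(combinations(terms, 2))
    return terms, pairs


# Term list copied from the slide's example tweet.
example_tokens = ["the", "math", "behind", "this", "year", "nobel", "prize",
                  "in", "economics", "gently", "and", "beautifully",
                  "explained", "mathchat"]
terms, pairs = terms_and_pairs(example_tokens)
```

A tweet with n distinct terms yields n(n-1)/2 pairs, so the pair representation grows quadratically with tweet length; tweets are short enough for this to stay small.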

Page 14: Making Your Interests Follow You on Twitter

Experiment Settings

• Corpus of 182,000 tweets posted between Oct 30 and Nov 4, 2011.

• A large set of more than 14 million tweets was downloaded from Twitter using the Spritzer API, which provides access to a 1% random sample of all tweets. This set was then pruned to obtain our final corpus of informative, non-junk English tweets, on which we ran our experiments.

• In detail, the pruning process was as follows:

• We discarded all the tweets shorter than 30 characters and having fewer than 8 tokens (i.e., terms, hashtags, and usernames).

• We removed tweets containing fewer than 3 English nouns and more than 5 English stop-words (we used the NLTK toolkit).

• Finally, we discarded directed tweets (i.e., tweets starting with the @ symbol), which are usually personal in nature and therefore not interesting for recommendation purposes.
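Part of the pruning above can be sketched as a simple predicate. The sketch covers only the length rule and the directed-tweet rule; the noun and stop-word rule needs a part-of-speech tagger and is left out. Two assumptions are made and should be read as such: tokens are obtained by whitespace splitting, and the length rule is read literally (a tweet is dropped only when it is both shorter than 30 characters and has fewer than 8 tokens). The function name `keep_tweet` is hypothetical.

```python
def keep_tweet(text, min_chars=30, min_tokens=8):
    """Return True if a tweet survives the length and directed-tweet rules."""
    tokens = text.split()  # assumption: whitespace tokenization
    if text.startswith("@"):
        # Directed tweets are usually personal in nature.
        return False
    if len(text) < min_chars and len(tokens) < min_tokens:
        # Literal reading of the slide: both thresholds must be missed.
        return False
    return True


candidates = [
    "@bob see you later",
    "ok",
    "What is new in Linux 3.2? A quick tour of the features #linux",
]
kept = [text for text in candidates if keep_tweet(text)]
```

A stricter variant would replace `and` with `or` in the length check, discarding a tweet as soon as either threshold is missed.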


Page 15: Making Your Interests Follow You on Twitter

Evaluating intλ

• We evaluate (averaged over the 250 users):

• P@k. Precision at rank k is the fraction of correct tweets among the top-k tweets ranked by the method.

• S@k. Success at rank k is the probability of finding at least one correct tweet among the top-k ranked ones.

• MRR. Mean Reciprocal Rank is the mean over users of the reciprocal rank, i.e., the inverse of the position of the first correct tweet in the ranking produced by the method.
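The three measures can be computed per user as in the sketch below and then averaged over users. Function names and the toy ranking are illustrative; a reciprocal rank of 0 for a user with no correct tweet in the ranking is a common convention assumed here, not something stated on the slide.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k ranked tweets that are correct."""
    return sum(1 for tweet in ranked[:k] if tweet in relevant) / k


def success_at_k(ranked, relevant, k):
    """1.0 if at least one correct tweet appears in the top k, else 0.0."""
    return 1.0 if any(tweet in relevant for tweet in ranked[:k]) else 0.0


def reciprocal_rank(ranked, relevant):
    """Inverse of the position of the first correct tweet (0.0 if none)."""
    for position, tweet in enumerate(ranked, start=1):
        if tweet in relevant:
            return 1.0 / position
    return 0.0


def mean(values):
    """Average a per-user metric over all users."""
    return sum(values) / len(values)


# One user's ranking and her set of correct tweets (toy identifiers).
ranking = ["t3", "t7", "t1", "t9"]
correct = {"t1", "t9"}
p_at_4 = precision_at_k(ranking, correct, 4)
s_at_2 = success_at_k(ranking, correct, 2)
rr = reciprocal_rank(ranking, correct)
```

Averaging `success_at_k` over users gives the empirical probability that the slide calls S@k, and averaging `reciprocal_rank` gives MRR.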


Page 16: Making Your Interests Follow You on Twitter

Evaluating intλ

• Assumption: “A user is likely to find his own tweets more interesting than random tweets from other users.”

• We consider a collection of 250 users. For each user u, we extract 90% of the tweets in her timeline and use them to train u’s user model.

• The remaining 10% of the tweets from all users is used to test the model. For each user u, the tweets from her own 10% are considered “correct/relevant”.
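The protocol above can be sketched as follows. The slides do not say whether the 90/10 split is random or chronological; a seeded random split is assumed here, and all names (`split_timeline`, `build_evaluation_sets`) are hypothetical.

```python
import random


def split_timeline(tweets, train_fraction=0.9, seed=0):
    """Split one user's tweets into a training part and a held-out part."""
    shuffled = list(tweets)
    random.Random(seed).shuffle(shuffled)  # assumption: random split
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def build_evaluation_sets(timelines, train_fraction=0.9, seed=0):
    """Build per-user profiles, per-user relevant sets, and one shared test pool."""
    profiles, relevant, test_pool = {}, {}, []
    for user, tweets in timelines.items():
        train, held_out = split_timeline(tweets, train_fraction, seed)
        profiles[user] = train            # used to train the user's model
        relevant[user] = set(held_out)    # "correct" tweets for this user
        test_pool.extend(held_out)        # every user's held-out tweets
    return profiles, relevant, test_pool


# Two toy users with ten tweets each.
timelines = {
    "alice": [f"alice-{i}" for i in range(10)],
    "bob": [f"bob-{i}" for i in range(10)],
}
profiles, relevant, test_pool = build_evaluation_sets(timelines)
```

Each user's model is then asked to rank the whole test pool, and only the tweets in her own `relevant` set count as correct.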


Page 17: Making Your Interests Follow You on Twitter

Comparing intλ with baselines

• Cosine: cosine measure between tweets and user’s profile

• HashTags: cosine between tweets’ hashtags and users’ profile

• int0.9: our intλ metric with λ=0.9 (pairs are very important). The value λ=0.9 was optimized on the training set.
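To make the comparison concrete, here is a sketch of a cosine score over bags of terms and of a λ-weighted blend with a pair-based score. The exact form of intλ is not given in this transcript; the blend λ·pair + (1 − λ)·item and the use of cosine for both components are assumptions, chosen to be consistent with the remark that λ=0.9 makes pairs very important. All function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def bag_of_terms(tweets):
    """Count terms over a collection of tokenized tweets."""
    return Counter(term for tweet in tweets for term in tweet)


def bag_of_pairs(tweets):
    """Count unordered pairs of distinct terms co-occurring in a tweet."""
    return Counter(pair for tweet in tweets
                   for pair in combinations(sorted(set(tweet)), 2))


def int_lambda(tweet, profile_tweets, lam=0.9):
    """Assumed blend: lam * pair-wise score + (1 - lam) * item-wise score."""
    item = cosine(bag_of_terms([tweet]), bag_of_terms(profile_tweets))
    pair = cosine(bag_of_pairs([tweet]), bag_of_pairs(profile_tweets))
    return lam * pair + (1.0 - lam) * item


profile = [["linux", "kernel", "release"], ["linux", "kernel", "features"]]
on_topic = int_lambda(["linux", "kernel", "update"], profile)
off_topic = int_lambda(["nobel", "prize", "economics"], profile)
```

Setting `lam=0.0` reduces the score to a plain cosine between the tweet's terms and the user's profile.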

Page 18: Making Your Interests Follow You on Twitter

User Study

• The assessment is conducted by a group of 7 professional assessors who regularly post tweets (Active users, in our terminology), and a group of 5 professional assessors who use Twitter mainly for reading tweets (Passive users).

• For each assessor and each method, we generate the top-20 tweet suggestions. Each assessor is provided with a random combination of tweets selected by the different methods, and is asked to state his/her personal interest in each tweet, using the following scale:

• Excellent: if the tweet is “very interesting/very informative/very funny with respect to his/her interests”;

• Good: if the tweet is “interesting/informative/funny with respect to his/her interests”;

• Fair: if the tweet is “somewhat interesting, but no harm if he/she had skipped it”;

• Bad: if the tweet is “not interesting, and he/she would have preferred not to have it in his/her timeline.”


Page 19: Making Your Interests Follow You on Twitter

User Study

(Useful tweets are those judged as Excellent, Good, or Fair.)


Page 20: Making Your Interests Follow You on Twitter

List-based Evaluation of Interest Spanning TweetRec

• We proceed by randomly selecting 15 sets of 20 lists each.

• Assumption: “each list represents a specific user interest.”

• For each list we download 800 tweets.

• We then create 15 virtual users, one per set, each with 8,000 (= 20 × 400) tweets, by selecting 400 tweets per list.

• We use the remaining 400 tweets per list to build a single virtual stream of tweets. The resulting dataset consists of 15 virtual users, each spanning 20 different interests and having produced 8,000 tweets.
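The construction of the virtual users and of the virtual stream can be sketched as below. Two details are assumptions, since the slide does not specify them: the first 400 tweets of each list go to the user profile, and the 15 sets of lists are disjoint. The function name and the synthetic tweet identifiers are hypothetical.

```python
def build_virtual_users(list_sets, tweets_per_list, profile_size=400):
    """Build one virtual user per set of lists plus a single shared stream."""
    users, stream = [], []
    for lists in list_sets:
        profile = []
        for list_id in lists:
            tweets = tweets_per_list[list_id]
            profile.extend(tweets[:profile_size])  # assumption: first 400
            stream.extend(tweets[profile_size:])   # remaining tweets
        users.append(profile)
    return users, stream


# Synthetic stand-in data: 15 sets of 20 lists, 800 tweets per list.
list_sets = [[f"set{s}-list{l}" for l in range(20)] for s in range(15)]
tweets_per_list = {
    list_id: [f"{list_id}-tweet{i}" for i in range(800)]
    for lists in list_sets
    for list_id in lists
}
users, stream = build_virtual_users(list_sets, tweets_per_list)
```

Under these assumptions each virtual user holds 20 × 400 = 8,000 tweets, matching the figure on the slide.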


Page 21: Making Your Interests Follow You on Twitter

List-based Evaluation of Interest Spanning TweetRec


Page 22: Making Your Interests Follow You on Twitter

Interest Spanning TweetRec vs. TweetRec


Page 23: Making Your Interests Follow You on Twitter

Conclusion

• We presented two novel recommendation methods based on two problems:

• TweetRec and Interest Spanning TweetRec

• We beat the baseline by a large margin on all the metrics we tested.

• The Interest Spanning formulation has a positive impact on more than 30% of users.


Page 24: Making Your Interests Follow You on Twitter

Question Time

Page 25: Making Your Interests Follow You on Twitter

(Some) More Notation

• Given S⊆T, we want to maximize the overall interestingness F(S) of the content of tweets t∈S for the user u.

• Within this definition, “overall” means that duplicate information across tweets does not increase the value of the objective function F(S).

• We also define a term for the interestingness for user u of the informative content shared among all tweets t∈S.


Page 26: Making Your Interests Follow You on Twitter

Interest Spanning TweetRec is NP-Hard

• Reduction from IndSet.

• We want to decide whether a graph G=(V,E) has an IndSet of size k.

• In our setting we have:

• V=T, |V|=|T|=n

• I(v)=1/n, for each v in V

• For any set S⊆T, we define I(S) = |S|/n iff for each pair of vertices u,v in S, (u,v) is not in E; otherwise we define I(S) < |S|/n.

• Given k, the set S identified by a solution of Interest Spanning TweetRec has interestingness I(S) = k/n if and only if G has an independent set of size k. In this case, the nodes in S are exactly the nodes of one of these independent sets.
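The reduction can be checked on a small graph with the brute-force sketch below. It only illustrates the construction and runs in exponential time. The value assigned to non-independent sets, (|S| − 1)/n, is a hypothetical choice; the reduction only requires some value strictly below |S|/n. Function names are illustrative.

```python
from itertools import combinations


def make_interestingness(n, edges):
    """Build I(S) for the reduction from a graph with n vertices."""
    edge_set = {frozenset(edge) for edge in edges}

    def interestingness(subset):
        independent = all(frozenset(pair) not in edge_set
                          for pair in combinations(subset, 2))
        if independent:
            return len(subset) / n
        # Hypothetical penalty: any value strictly below |S|/n would do.
        return (len(subset) - 1) / n

    return interestingness


def has_independent_set(n, edges, k):
    """Decide IndSet by maximizing I(S) over all k-subsets (brute force)."""
    score = make_interestingness(n, edges)
    best = max(score(subset) for subset in combinations(range(n), k))
    return best == k / n


# Path graph 0 - 1 - 2 - 3: {0, 2} is independent, no 3 vertices are.
path_edges = [(0, 1), (1, 2), (2, 3)]
size_two = has_independent_set(4, path_edges, 2)
size_three = has_independent_set(4, path_edges, 3)
```

An exact solver for Interest Spanning TweetRec would therefore answer IndSet, which is why no polynomial-time exact algorithm is expected and the greedy approximation is used instead.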


Page 27: Making Your Interests Follow You on Twitter

Submodularity of F(S)

• We have to show that, given A⊆B⊆T and c∈T, it holds that F({c}∪A) - F(A) ≥ F({c}∪B) - F(B).

• By definition of F: [equation]. Therefore [equation] and [equation]. The thesis follows by observing that [equation], since A⊆B⊆T.

Theorem [Fisher, Nemhauser, and Wolsey, ’78]

For a non-negative, monotone submodular function F, let S be a set of size k obtained by selecting elements one at a time, each time choosing an element that provides the largest marginal increase in the function value. Let S* be a set that maximizes the value of F over all k-element sets. Then F(S) ≥ (1 − 1/e)F(S*)


Page 28: Making Your Interests Follow You on Twitter

Item-Wise Similarity


Page 29: Making Your Interests Follow You on Twitter

Pair-Wise Similarity


Page 30: Making Your Interests Follow You on Twitter

Set Similarity

[symbol] is the bag-of-words of the tweets in X, and [symbol] is the bag-of-pairs of terms for the tweets in X.


Page 31: Making Your Interests Follow You on Twitter

Enhancing tscore for Passive Users


Page 32: Making Your Interests Follow You on Twitter

Estimating authoritativeness

After tuning:

• A = 0.5

• α = 2

• β = 2000


Page 33: Making Your Interests Follow You on Twitter

Optimizing λ

λ = 0.9: pairs are far more valuable than single terms.


Page 34: Making Your Interests Follow You on Twitter

Manual Selection of Pairs of Users

• We manually selected 20 pairs of users having very similar interests. The selection was done by using the Twitter user recommender system.

• For each pair of users, we add all the tweets of the second user to the stream; the task consists of retrieving them using the set of tweets of the first user as the user profile.


Page 35: Making Your Interests Follow You on Twitter

User Study - III


Page 36: Making Your Interests Follow You on Twitter

Future Work

• First, we are interested in improving the precision of our methods by devising automatic strategies to filter out tweets that are uninteresting ‘status updates’.

• We also plan to improve the selection power of Interest Spanning TweetRec by providing tweet deduping at higher levels of semantics (e.g., adopting Textual Entailment Recognition techniques).

• Our recommendation system could work in several application scenarios, e.g., providing users with a more interesting timeline than the one currently available in Twitter.

• Finally, we plan to carry out simulation trials to test the computational cost of our methods when facing a real-time load of thousands of tweets per second.
