Khalid El-Arini, Carnegie Mellon University

Joint work with: Ulrich Paquet, Ralf Herbrich, Jurgen Van Gael, Blaise Agüera y Arcas

Transparent User Models for Personalization

Personalization is ubiquitous.

• YouTube: 72+ hours/minute of new video
• Facebook: 950 million+ users
• Twitter: 400+ million tweets/day
• Shopping:
  [1994]: 500K unique consumer goods sold in U.S.
  [2010]: Amazon alone offered 24 million.

Personalization is invaluable.

Keyword search is not enough.

Personalization is often wrong.

“Basil…is not a neo-Nazi. Lukas…is not a shadowy stalker. David…is not Korean. […] intent on giving them such labels.”

“there's just one way to change its mind: outfox it.”

- J. Zaslow, November 26, 2002

What recourse do we have?

Can we do better?

You behave like a vegan hipster

Vegan? Really? Why?

You:
• tweeted with #meatlessmonday
• follow @WholeFoods
• …

We propose an alternative.

Why am I getting this?

You behave like a Brooklyn hipster

Goal: Achieve transparency via interpretable user features, learned from user activity

Badges

Outline: Approach, Model, Experiments, Summary

1. Define a vocabulary of badges

Examples: Apple fanboy, vegan, runner, photographer

Rich, interpretable and explainable

2. Identify exemplars

How do I find vegans?

observed label

Take advantage of how users describe themselves
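As a minimal sketch of how exemplars might be found from self-descriptions, the snippet below checks a profile bio for label phrases. The badge vocabulary, the phrases, and the function name `observed_labels` are hypothetical and chosen for illustration; the slides do not specify how labels were matched.

```python
import re

# Hypothetical badge vocabulary: badge name -> label phrases to look for in a bio.
BADGE_LABELS = {
    "vegan": ["vegan"],
    "runner": ["runner", "marathoner"],
    "photographer": ["photographer"],
}

def observed_labels(bio, badge_labels=BADGE_LABELS):
    """Return the badges whose label phrase appears as a whole word in the bio."""
    text = bio.lower()
    found = set()
    for badge, phrases in badge_labels.items():
        for phrase in phrases:
            # \b keeps "vegan" from matching inside a longer word such as "veganism".
            if re.search(r"\b" + re.escape(phrase) + r"\b", text):
                found.add(badge)
                break
    return found
```

A bio such as "I hate vegan food" would still match, which is one reason to treat a profile label as a noisy observation rather than as ground truth.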

Most vegans don’t label themselves as “vegan” on Twitter…

we want to infer the attributes of these users

3. Model characteristic behavior
• Hashtags: #meatlessmonday
• Retweets: RT @WholeFoods
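To make the notion of an action concrete, here is a small sketch that pulls hashtags and retweet targets out of tweet text. The regular expressions, the lowercasing of hashtags, and the name `extract_actions` are assumptions for illustration, not the preprocessing used in the talk.

```python
import re

HASHTAG_RE = re.compile(r"#(\w+)")
RETWEET_RE = re.compile(r"\bRT @(\w+)")

def extract_actions(tweet):
    """Return the set of actions (hashtags and retweeted accounts) in one tweet."""
    actions = set()
    for tag in HASHTAG_RE.findall(tweet):
        # Assumption: hashtags are case-insensitive, so normalize to lowercase.
        actions.add("#" + tag.lower())
    for handle in RETWEET_RE.findall(tweet):
        actions.add("RT @" + handle)
    return actions
```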

Model

• We have no negative training examples: use a generative model.
• Actions can be explained by multiple badges, even for the same user: combine badges with a noisy-or.
• How do we deal with user corrections? Treat a correction as observing a latent variable.
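The noisy-or combination can be sketched as follows, assuming each cause (the background plus every badge that the user holds and that explains the action) triggers the action independently. The function and parameter names are hypothetical.

```python
def noisy_or(background_prob, badge_probs):
    """Probability that at least one independent cause triggers the action.

    background_prob: probability that the background alone explains the action.
    badge_probs: one probability per badge that is active for this user and action.
    """
    p_none = 1.0 - background_prob
    for p in badge_probs:
        p_none *= 1.0 - p
    return 1.0 - p_none
```

With no active badges the result reduces to the background probability, and every additional badge can only increase it.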

Model sketch

Plates:
• i = 1…B (B badges)
• u = 1…N (N users)
• j = 1…F (F actions)

Variables:
• b_i(u): Does user u have badge i?
• λ_i(u): Does user u have a label for badge i in his profile?
• a_j(u): Has user u performed action j?
• s_ij: Does badge i explain action j?
• φ_ij: What’s the probability that a user with badge i performs action j? (hyperparameters α_φ, β_φ)
• φ_bg: What is the background probability for each action?

Noisy-or: Can at least one of my badges (or the background) explain it?

Priors:
• Beta priors to control sparsity
• γ_i^T, γ_i^F (hyperparameters α_T, β_T, α_F, β_F): a Beta prior to encode low recall (e.g., 10%) and a Beta prior to encode high precision (e.g., 99.9%)
• η_i, ω_i (hyperparameters α_η, β_η, α_ω, β_ω)
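The generative story suggested by the model sketch can be illustrated with a forward sampler for a single user. This is a sketch under stated assumptions: the argument names `badge_prior`, `recall` and `false_label_rate` are hypothetical, and how they correspond to ω_i, γ_i^T and γ_i^F is one reading of the diagram, not something the slides state.

```python
import random

def sample_user(badge_prior, recall, false_label_rate, phi, phi_bg, s, rng):
    """Draw badges b, profile labels lam, and actions a for one user."""
    num_badges, num_actions = len(phi), len(phi_bg)
    # Assumption: each badge is held independently with its own prior probability.
    b = [rng.random() < badge_prior[i] for i in range(num_badges)]
    # A label shows up with probability recall[i] if the badge is held,
    # and with a (small) false-label probability otherwise.
    lam = [rng.random() < (recall[i] if b[i] else false_label_rate[i])
           for i in range(num_badges)]
    a = []
    for j in range(num_actions):
        # Noisy-or over the background and every held badge that explains action j.
        p_none = 1.0 - phi_bg[j]
        for i in range(num_badges):
            if b[i] and s[i][j]:
                p_none *= 1.0 - phi[i][j]
        a.append(rng.random() < 1.0 - p_none)
    return b, lam, a

# Usage: draw low recall values from a Beta(1, 9) prior, whose mean is 0.1.
rng = random.Random(0)
recall = [rng.betavariate(1.0, 9.0) for _ in range(2)]
b, lam, a = sample_user([0.2, 0.1], recall, [0.001, 0.001],
                        [[0.6, 0.0], [0.0, 0.4]], [0.01, 0.02],
                        [[True, False], [False, True]], rng)
```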

Inference

• Collapsed Gibbs sampler (with MH steps)

Variables shown: s_ij, φ_ij, φ_bg, b_i(u)
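To give a feel for what one Gibbs update of a badge indicator involves, the sketch below computes the conditional probability of b_i(u) = 1 for one user and resamples it. It is deliberately simplified: all other quantities are held fixed, nothing is collapsed, and there are no MH steps, so it is not the sampler from the talk. The names `prior`, `recall` and `false_rate` are hypothetical.

```python
import random

def action_prob(b, j, phi, phi_bg, s):
    """Noisy-or probability that a user with badge vector b performs action j."""
    p_none = 1.0 - phi_bg[j]
    for i, has_badge in enumerate(b):
        if has_badge and s[i][j]:
            p_none *= 1.0 - phi[i][j]
    return 1.0 - p_none

def badge_posterior(i, b, label_i, a, prior, recall, false_rate, phi, phi_bg, s):
    """P(b_i = 1 | everything else) for one user, enumerating both values of b_i."""
    weights = []
    for value in (False, True):
        trial = list(b)
        trial[i] = value
        w = prior if value else 1.0 - prior
        p_label = recall if value else false_rate
        w *= p_label if label_i else 1.0 - p_label
        for j, performed in enumerate(a):
            p = action_prob(trial, j, phi, phi_bg, s)
            w *= p if performed else 1.0 - p
        weights.append(w)
    # Assumes at least one weight is positive; with many actions,
    # work in log-space to avoid underflow.
    return weights[1] / (weights[0] + weights[1])

def resample_badge(i, b, label_i, a, prior, recall, false_rate, phi, phi_bg, s, rng):
    """One Gibbs update: overwrite b[i] with a draw from its conditional."""
    p = badge_posterior(i, b, label_i, a, prior, recall, false_rate, phi, phi_bg, s)
    b[i] = rng.random() < p
    return p
```

Under this view, a user correction would clamp b_i(u) to the corrected value and skip the resampling step, in line with treating the correction as an observed latent variable.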

You behave like a vegan hipster.

Experiments

Data description

• Start with 7 million Twitter users
• Manually define 31 sample badges by specifying labels
• Gather 2 million tweets from August 2011
• Recall: actions are hashtags and retweets

Remove infrequent actions and inactive users, leaving us with:

75,880 users
32,030 actions
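The filtering step could look roughly like the following sketch. The thresholds are parameters because the slides do not state the cutoffs that were used; `filter_sparse` and its argument names are hypothetical.

```python
from collections import Counter

def filter_sparse(user_actions, min_action_users, min_user_actions):
    """Drop rare actions, then drop users left with too few actions.

    user_actions: dict mapping a user id to the set of actions that user performed.
    """
    # Count how many distinct users performed each action.
    counts = Counter(action for actions in user_actions.values() for action in actions)
    kept_actions = {action for action, c in counts.items() if c >= min_action_users}
    filtered = {}
    for user, actions in user_actions.items():
        remaining = actions & kept_actions
        if len(remaining) >= min_user_actions:
            filtered[user] = remaining
    return filtered
```

This is a single pass; removing users can push further actions below the threshold, so a stricter variant would repeat the pass until nothing changes.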

Badge statistics

[Figure: bar chart over the 31 badges, vertical axis from 0 to 20,000; labeled badges: artist, photographer, country music fan, book worm]

Can we learn badges?

Vegetarian badge

Runner badge

Hacker badge

Manchester United badge


Do all badges look this good?

No, but most do.

Over-generalized: wine lover

Overwhelmed: Ruby on Rails

Can we just use the labels directly?

Inferred Apple fanboy badge

Self-described Apple fanboys

Comparative Analysis

• Compare to labeled LDA [Ramage+ 2009]
  – LDA extension where each document is labeled with multiple tags
  – One-to-one mapping between topics and tags
  – Document explained only by topics associated with its tags
• Hold out random 10% of labels, treat as ground truth, and try to predict them
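The hold-out protocol can be sketched as below: split the observed (user, badge) labels at random, then score each held-out label by its rank among a model's badge scores for that user. The helper names and the score dictionary format are assumptions for illustration.

```python
import random

def hold_out_labels(labels, fraction, rng):
    """Split a list of (user, badge) label pairs into (train, held_out)."""
    shuffled = list(labels)
    rng.shuffle(shuffled)
    n_held = int(round(fraction * len(shuffled)))
    return shuffled[n_held:], shuffled[:n_held]

def rank_of(badge, scores):
    """1-based rank of `badge` when badges are sorted by descending score.

    scores: dict mapping badge name to a model score for one user.
    """
    ordered = sorted(scores, key=scores.get, reverse=True)
    return ordered.index(badge) + 1
```

A rank of 1 means the held-out badge was the model's top prediction for that user, so lower ranks are better.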

Better predictive performance

[Figure: rank of held-out labels, with the “better” direction marked]

Better predictions for active users

Sparse badges

[Figure panels: Apple fanboy (badges); Apple fanboy (l-lda)]

Summary

Leveraged how users describe themselves to build interpretable user features

You behave like a vegan hipster

Empirically showed we can infer a user’s attributes from his behavior

Thank you

What recourse do we have?

Collaborative filtering

Content-based filtering

Can we do better?

Most vegans don’t label themselves as “vegan” on Twitter…

…but what about non-vegans?

“I drink too much and hate vegans.”
