Recommend User to Group in Flickr Zhe Zhao 4-29 2010


What I am going to present

• A problem seldom studied in social media recommendation:
  – Recommend Flickr Group to User
• Why this problem matters
• How to make use of meaningful information
  – A matrix factorization perspective on the problem
  – A Topic Model Based Solution
• Finally, something about implementation

Recommend User to Group

• Background:
  – User activity: users upload and favor photos, add contacts, and join groups, based on their interests and everyday life.

Recommend User to Group

• Our Problem:
  – Recommend relevant groups to a user
    • A user is relevant to a group when the topics and interests the group focuses on are similar to the user’s interests, as shown by the similarity between the content of the user’s photos and the photos in the group pool.

Recommend User to Group

• Related Work
  – Problem: one of the first few works to recommend Flickr groups to users using content, social relations and collaborative information.
  – Approaches:
    • Recommender systems.
    • Expert Finding.

Our Proposed Solution

• Intuition:
  – Find the user’s interests and the group’s topics/interests; similar interests indicate that the user is relevant to the group.
• Solution:
  – Latent interest dimensions can be found by matrix factorization and a graphical model.
• Considered information (interests are reflected in):
  – Users upload and favor photos.
  – Groups collect photos in a pool.
  – Users join groups.
  – Users add contacts.

Our Proposed Solution

• Modeling Interests via Matrix Factorization
  – Mining latent interests from the original feature space
  – Used information:
    • Users upload and favor photos.
    • Groups collect photos in a pool.
    • Users join groups.
    • Users add contacts.
• A probabilistic solution on equivalent graphical model
• Learning the model & Implementation

Modeling Interests via Matrix Factorization

[Figure: each photo of User u (Photo1, Photo2, …, Photot) is a vector (f1, f2, …, fd) in the feature space; stacking these vectors gives the matrix Cu.]

Cu ≈ F × Iu’ = MCu

• Each row of Iu represents the latent interests of the user in each photo.
• For n users:
  C1 ≈ F × I1’ = MC1
  C2 ≈ F × I2’ = MC2
  …
  Cn ≈ F × In’ = MCn

Modeling Interests via Matrix Factorization

[Figure: each photo in the pool of Group g (Photo1, Photo2, …, Photot) is a vector (f1, f2, …, fd) in the feature space; stacking these vectors gives the matrix Pg.]

Pg ≈ F × Tg’ = MPg

• Each row of Tg represents the latent topics of the group in each photo.
• For m groups:
  P1 ≈ F × T1’ = MP1
  P2 ≈ F × T2’ = MP2
  …
  Pm ≈ F × Tm’ = MPm
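A minimal sketch of the shape bookkeeping behind Cu ≈ F × Iu’ (the group case Pg ≈ F × Tg’ is identical). It assumes, which the slides do not state explicitly, that F is a d × k matrix whose columns are latent prototypes shared by all users and groups, and that Iu holds one row of k latent weights per photo, so that F × Iu’ is d × t with one column per photo. The helper names and the toy numbers are illustrative only.

```python
import math

def transpose(m):
    """Return the transpose of a list-of-rows matrix."""
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    """Multiply two list-of-rows matrices."""
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def reconstruction_error(c, f, i_u):
    """Frobenius norm of C - F x I_u', with C stored as d x t (one column per photo)."""
    approx = matmul(f, transpose(i_u))
    return math.sqrt(sum((cv - av) ** 2
                         for c_row, a_row in zip(c, approx)
                         for cv, av in zip(c_row, a_row)))

# Toy data (assumed): d = 3 features, k = 2 latent interests, t = 2 photos.
F = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, 1.0]]
I_u = [[1.0, 0.0],   # photo 1 expresses latent interest 0
       [0.0, 1.0]]   # photo 2 expresses latent interest 1
C_u = [[1.0, 0.1],
       [0.0, 0.9],
       [1.0, 1.0]]   # columns are the observed photo feature vectors

M_Cu = matmul(F, transpose(I_u))          # the approximation F x I_u'
err = reconstruction_error(C_u, F, I_u)   # how far C_u is from it
```

A small `err` means the latent weights in `I_u` explain the observed features well; fitting the model amounts to choosing F, Iu and Tg so that these errors stay small for every user and group.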

Modeling Interests via Matrix Factorization

[Figure: the relevance matrix R has one row per group (Group1, …, Groupm) and one column per user (User1, …, Usern); each row is a vector (v1, v2, …, vn).]

Rgu = |Cu ∩ Pg| / |Cu|
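The entry formula Rgu = |Cu ∩ Pg| / |Cu| can be sketched as follows, assuming Cu and Pg are treated as sets of photo IDs. The function names, the sample IDs, and the choice to return 0.0 for a user with no photos are assumptions made for illustration.

```python
def relevance(user_photos, group_pool):
    """R_gu = |C_u ∩ P_g| / |C_u|; returns 0.0 for a user with no photos (assumed convention)."""
    if not user_photos:
        return 0.0
    return len(user_photos & group_pool) / len(user_photos)

def build_relevance_matrix(users, groups):
    """Rows are groups and columns are users, matching the index order in R_gu."""
    return [[relevance(users[u], groups[g]) for u in sorted(users)]
            for g in sorted(groups)]

# Hypothetical photo IDs.
users = {"u1": {"p1", "p2", "p3", "p4"}, "u2": {"p5"}}
groups = {"g1": {"p1", "p2", "p9"}, "g2": {"p5", "p6"}}

R = build_relevance_matrix(users, groups)
```

Here user `u1` has two of four photos in the pool of `g1`, so that entry is 0.5, while the single photo of `u2` is in the pool of `g2`, giving 1.0.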

Modeling Interests via Matrix Factorization

[Figure: the same m × n relevance matrix R, with groups as rows and users as columns.]

R ≈ f(LT × LI’) = MTI ∝ LT × LI’

• Each row of LT represents the latent topics of a group.
• Each row of LI represents the latent interests of a user.

Modeling Interests via Matrix Factorization

• So far, our model can be written as:

R ≈ f(LT × LI’) = MTI ∝ LT × LI’
Cu ≈ F × Iu’ = MCu (one equation for each of the n users)
Pg ≈ F × Tg’ = MPg (one equation for each of the m groups)

  – Constraints from user contacts:
    • Minimize the sum of Dis(Iu1, Iu2), the Euclidean distance between Iu1 and Iu2, over all pairs where User u1 lists User u2 as a contact.
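The contact constraint can be sketched as a penalty term. This assumes each user is summarized by a single latent-interest vector of fixed length (the slides write Iu, which may differ in size between users, so this simplification is an assumption), and the names `contact_penalty`, `interests` and `contacts` are hypothetical.

```python
import math

def contact_penalty(interests, contacts):
    """Sum of Euclidean distances Dis(I_u1, I_u2) over all (u1, u2) contact pairs."""
    return sum(math.dist(interests[u1], interests[u2]) for u1, u2 in contacts)

# Hypothetical user-level latent interest vectors.
interests = {
    "alice": [0.4, 0.2, 0.1, 0.3],
    "bob":   [0.4, 0.2, 0.1, 0.3],   # identical to alice: contributes 0
    "carol": [0.0, 0.0, 1.0, 0.0],   # far from alice: contributes a large term
}
# (u1, u2) means u1 lists u2 as a contact.
contacts = [("alice", "bob"), ("alice", "carol")]

penalty = contact_penalty(interests, contacts)
```

Adding this penalty to the factorization objective pulls the latent interests of connected users toward each other.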

Modeling Interests via Matrix Factorization

• Used information:
  – Users upload and favor photos.
  – Groups collect photos in a pool.
  – Users join groups.
  – Users add contacts.

Our Proposed Solution

• Modeling Interests via Matrix Factorization
• A probabilistic solution on equivalent graphical model
  – Several assumptions
  – Equivalent graphical model
  – Calculating the joint probability
• Learning the model & Implementation

A probabilistic solution on equivalent graphical model

• Our proposed matrix-factorization model:

R ≈ f(LT × LI’) = MTI ∝ LT × LI’
Cu ≈ F × Iu’ = MCu (n equations)
Pg ≈ F × Tg’ = MPg (m equations)

• Rewrite the model in row and entry form:

rgu ≈ f(ltg × liu’) ∝ ltg × liu’ (m*n equations)
cui ≈ F × iui’ (Σu|Cu| equations)
pgj ≈ F × tgj’ (Σg|Pg| equations)

• Several assumptions
  – iui and tgj are hidden random variables.
  – ltg and liu are hidden random variables.

A probabilistic solution on equivalent graphical model

• The revised model: add Gaussian noise to the right-hand side of the equations

rgu = f(ltg × liu’) + ε, with f(ltg × liu’) ∝ ltg × liu’ (m*n equations)
cui = F × iui’ + εc (Σu|Cu| equations)
pgj = F × tgj’ + εp (Σg|Pg| equations)

• Several assumptions
  – iui and tgj are hidden random variables.
  – ltg and liu are hidden random variables.
  – rgu is a random variable based on ltg and liu.
  – cui and pgj are random variables based on (iui, F) and (tgj, F), respectively.
  – iui and tgj are based on liu and ltg, respectively.
• Distributions
  – rgu | ltg, liu ~ N(f(ltg × liu’), δI)
  – cui | iui, F ~ N(F × iui’, δcI)
  – pgj | tgj, F ~ N(F × tgj’, δpI)
  – iui | liu ~ Bernoulli (Multinomial, Exponential)
  – tgj | ltg ~ Bernoulli (Multinomial, Exponential)
  – liu ~ conjugate prior of iui | liu
  – ltg ~ conjugate prior of tgj | ltg
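The generative reading of cui = F × iui’ + εc can be sketched as below. The sketch picks the Multinomial option and draws iui as a one-hot vector from liu; the Bernoulli option, with several components switched on at once, would need a different sampler. It also takes the noise as isotropic with a standard deviation parameter, whereas the slides parameterize it as δc. The function names and toy values are assumptions.

```python
import random

def sample_one_hot(weights, rng):
    """Draw a one-hot indicator vector with P(component k = 1) = weights[k]."""
    k = rng.choices(range(len(weights)), weights=weights, k=1)[0]
    return [1 if j == k else 0 for j in range(len(weights))]

def sample_photo_feature(f, indicator, noise_std, rng):
    """Draw c = F x i' + eps, with eps ~ N(0, noise_std^2 I)."""
    return [sum(f_row[k] * indicator[k] for k in range(len(indicator)))
            + rng.gauss(0.0, noise_std)
            for f_row in f]

rng = random.Random(0)            # fixed seed so the sketch is repeatable
F = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, 1.0]]                  # d = 3 features, k = 2 latent interests (toy values)
l_iu = [0.7, 0.3]                 # user-level latent interest weights (toy values)

i_ui = sample_one_hot(l_iu, rng)                              # hidden interest of one photo
c_ui = sample_photo_feature(F, i_ui, noise_std=0.1, rng=rng)  # observed photo feature
```

The same two functions generate pgj from ltg and tgj on the group side.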

A probabilistic solution on equivalent graphical model

[Figure: worked example with four latent interests: Good color, Cute animal, Sony Camera, Politics.]

User u, liu = (0.4, 0.2, 0.1, 0.3):
  iu1 (Photo1) = (0, 1, 0, 0)
  iu2 (Photo2) = (1, 0, 1, 0)
  iu3 (Photo3) = (1, 1, 0, 1)
  iu4 (Photo4) = (1, 1, 1, 0)

Group g, ltg = (0.1, 0.2, 0.7, 0.0):
  tg1 (Photo1) = (0, 1, 0, 0)
  tg2 (Photo2) = (1, 0, 1, 0)
  tg3 (Photo3) = (0, 1, 0, 0)
  tg4 (Photo4) = (0, 1, 1, 0)

rgu ∝ 0.16
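As a rough check of this example, the sketch below takes the two vectors (0.4, 0.2, 0.1, 0.3) and (0.1, 0.2, 0.7, 0.0) from the slide and computes their plain inner product, assuming the relational function f is the identity. The slide does not spell out f or any normalization, so the result (about 0.15) is only expected to be near the printed value of 0.16, not equal to it.

```python
def inner_product(a, b):
    """Plain inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

l_iu = [0.4, 0.2, 0.1, 0.3]   # latent interests of user u (from the slide)
l_tg = [0.1, 0.2, 0.7, 0.0]   # latent topics of group g (from the slide)

# Assumption: f is the identity, so the score is just ltg x liu'.
score = inner_product(l_tg, l_iu)
```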

A probabilistic solution on equivalent graphical model

• Equivalent Graphical Model: Topic Model based Recommendation (TMR)

cui = F × iui’ + εc (Σu|Cu| equations)
pgj = F × tgj’ + εp (Σg|Pg| equations)
rgu = f(ltg × liu’) + ε, with f(ltg × liu’) ∝ ltg × liu’ (m*n equations)

Our Proposed Solution

• Modeling Interests via Matrix Factorization
• A probabilistic solution on equivalent graphical model
• Learning the model & Implementation
  – Gibbs sampling based
  – User recommendation for group

Learning the model & Implementation

• Our task:
  – Predict rgu for user u and group g
• Our method:
  – Gibbs sampling for the model
    • Sample each iui and tgj in the model
    • Choose rgu based on the pdf conditioned on iui and tgj

Learning the model & Implementation

• Gibbs sampling in our model
  – The joint probability of the model
  – Sampling based on equations

[Equations: not recoverable from the slide text]
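One Gibbs update for a single photo can be sketched as follows. This is not the sampling equation from the slides, which is not available in the text. It is a simplified illustration that assumes iui is one-hot (the Multinomial option), that the noise is isotropic Gaussian with variance `delta_c`, and that the coupling through rgu is ignored. Under those assumptions, P(iui = e_k | cui, F, liu) is proportional to liu[k] times the Gaussian density of cui around the k-th column of F.

```python
import math
import random

def conditional_probs(c, f, l_iu, delta_c):
    """P(i = e_k | c, F, l) for each k, proportional to l[k] * N(c; F[:, k], delta_c * I).

    Assumes at least one entry of l_iu is positive.
    """
    log_w = []
    for k, prior in enumerate(l_iu):
        sq = sum((c[j] - f[j][k]) ** 2 for j in range(len(c)))
        log_w.append(math.log(prior) - sq / (2.0 * delta_c) if prior > 0 else float("-inf"))
    top = max(log_w)                          # subtract the max for numerical stability
    w = [math.exp(v - top) for v in log_w]
    total = sum(w)
    return [v / total for v in w]

def gibbs_step(c, f, l_iu, delta_c, rng):
    """Resample the one-hot indicator of a photo from its conditional distribution."""
    probs = conditional_probs(c, f, l_iu, delta_c)
    k = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return [1 if j == k else 0 for j in range(len(probs))]

F = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, 1.0]]              # toy d x k matrix
c_ui = [1.0, 0.1, 1.0]        # observed feature, close to column 0 of F
probs = conditional_probs(c_ui, F, [0.5, 0.5], delta_c=0.01)
new_i_ui = gibbs_step(c_ui, F, [0.5, 0.5], delta_c=0.01, rng=random.Random(0))
```

A full sweep would apply `gibbs_step` to every iui and every tgj in turn, then repeat for the chosen number of iterations.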

Learning the model & Implementation

• Implementation
  – Data structure and preprocessing
    • Visual word extraction
      – Hierarchical clustering on a 100k subset yields 1019 centers
    • Filter out high- and low-frequency tags
      – Remove tags that appear in 90% of photos or fewer than 2 times, leaving 48733 tags
    • Build hash tables for users and photos and an inverted index for tags on a 30-group subset
    • Use a DBMS to store the 200-group dataset
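The tag filtering and the inverted index can be sketched as below. The exact thresholds are an interpretation: a tag is dropped if it appears in 90% or more of the photos or in fewer than 2 photos, and frequency is counted per photo. The function names and the toy data are hypothetical.

```python
from collections import Counter, defaultdict

def filter_tags(photo_tags, max_fraction=0.9, min_count=2):
    """Keep tags found in at least min_count photos and in fewer than max_fraction of all photos."""
    counts = Counter(tag for tags in photo_tags.values() for tag in set(tags))
    n_photos = len(photo_tags)
    return {t for t, c in counts.items()
            if c >= min_count and c < max_fraction * n_photos}

def build_inverted_index(photo_tags, kept_tags):
    """Map each kept tag to the set of photo IDs that carry it."""
    index = defaultdict(set)
    for photo_id, tags in photo_tags.items():
        for tag in tags:
            if tag in kept_tags:
                index[tag].add(photo_id)
    return dict(index)

# Toy data: "flickr" is on every photo, "cat" on three photos, "rare" on one.
photo_tags = {f"p{i}": ["flickr"] for i in range(10)}
for pid in ("p0", "p1", "p2"):
    photo_tags[pid].append("cat")
photo_tags["p3"].append("rare")

kept = filter_tags(photo_tags)
index = build_inverted_index(photo_tags, kept)
```

Only "cat" survives here: "flickr" is too frequent to be informative and "rare" is too infrequent.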

Learning the model & Implementation

• Implementation
  – Sampling:
    • 0. Randomly select 20% of the rgu matrix as the test set; use the rest as the training set.
    • 1. Take a subset of 5000 sampled photos and perform SVD to reduce the dimensionality of the tags (48733 -> 1000).
    • 2. Take a subset of 5000 sampled photos after that SVD and perform SVD again to get the prior μ in the model (2019 -> 10; the latent dimension is set to 10).
    • 3. Initialize iui for each photo of each user and tgj for each photo of each group.
    • 4. Perform sampling for 1000 iterations (currently, 1 iteration costs 22 s).
    • 5. Select the sampling result with the maximum joint probability.
    • 6. Predict rgu based on that result and the relational function.
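Steps 0 and 5 above can be sketched as follows. The sketch holds out 20% of the (group, user) entries of the rgu matrix and, given a list of sampler states, picks the one with the highest log joint probability. The function names, the seed, and the representation of a sampler state as a `(state, log_joint)` pair are assumptions; the SVD steps and the sampler itself are not shown.

```python
import random

def split_entries(num_groups, num_users, test_fraction=0.2, seed=0):
    """Randomly hold out a fraction of the (g, u) index pairs of R as the test set."""
    entries = [(g, u) for g in range(num_groups) for u in range(num_users)]
    rng = random.Random(seed)
    rng.shuffle(entries)
    n_test = int(round(test_fraction * len(entries)))
    return entries[n_test:], entries[:n_test]   # (training entries, test entries)

def best_sample(samples, log_joint):
    """Return the sampled state with the maximum (log) joint probability."""
    return max(samples, key=log_joint)

train_entries, test_entries = split_entries(num_groups=5, num_users=10)

# Hypothetical sampler outputs: (state label, log joint probability).
samples = [("iter_10", -12.5), ("iter_20", -3.0), ("iter_30", -7.1)]
best = best_sample(samples, log_joint=lambda s: s[1])
```

Splitting by entry rather than by user or group keeps every user and group represented in training, which the factorization needs in order to predict the held-out entries.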

Recent Works

• Problem in the Graphical Model
  – A photo feature is the sum of latent interest features
    • Not a good/proper fit for the feature
    • Note that, different from LDA:
      – LDA is a document-word model
      – TMR is a document-feature model
      – Different fitting schema
      – TMR is not a linking of two LDA models

Recent Works

• Problem in the Graphical Model
• Revised Model
  – Weighted TMR
    • Weighting parameters on the user/group level
  – Multiple(l)-interest TMR
    • Photo interest formed by multiple basic interests
  – Hierarchical LDA
    • Related work: Blei NIPS04 hierarchical LDA

Recent Works

• Problem in the Graphical Model
  – Other problems:
    • Multiple sources of features
      – tags & visual
    • Currently, user contacts are not considered.
      – Solution: refer to the Blei10 study on link prediction
    • Implementation problems:
      – Slow sampling speed
      – Noise & data structure building

To sum up

• Recommend User to Group
  – First work in social media sharing websites using content, social relations and collaborative information
  – Proposed solution:
    • Modeling the interests by matrix factorization
    • A probabilistic approach on an equivalent graphical model
    • Gibbs sampling based parameter tuning
  – Future work
    • Efficient implementation & experiment

Thank you!