What I am going to present
• A problem seldom studied in social media recommendation:
– Recommend Flickr groups to users
• Why this problem matters
• How to make use of meaningful information
– A matrix factorization perspective on the problem
– A topic model based solution
• Finally, something about the implementation
Recommend User to Group
• Background:
– User activity: upload and favorite photos, add contacts, and join groups, based on their interests and everyday life.
Recommend User to Group
• Our problem:
– Recommend relevant groups to users
• A user is relevant to a group when the topics and interests the group focuses on are similar to the user's interests, as shown by the content similarity between the user's photos and the photos in the group pool.
Recommend User to Group
• Related work
– Problem: among the first works to recommend Flickr groups to users using content, social relations, and collaborative information.
– Approaches:
• Recommender systems
• Expert finding
Our Proposed Solution
• Intuition:
– Find the user's interests and the group's topics/interests; similar interests indicate the user is relevant to the group.
• Solution:
– Latent interest dimensions can be found by matrix factorization and a graphical model.
• Considered information (interests are reflected in):
– Photos the user uploads and favorites
– Photos the group collects in its pool
– Which groups the user joins
– Which contacts the user adds
Our Proposed Solution
• Modeling interests via matrix factorization
– Mining latent interests from the original feature space
– Used information:
• Photos the user uploads and favorites
• Photos the group collects in its pool
• Which groups the user joins
• Which contacts the user adds
• A probabilistic solution on an equivalent graphical model
• Learning the model & implementation
Modeling Interests via Matrix Factorization
[Figure: user u's photos (Photo1, Photo2, …, Photot), each represented by a feature vector (f1, f2, …, fd) in the feature space, stacked row by row into the photo-feature matrix Cu]
Cu ≈ F × Iu' = MCu
Each row represents the latent interests of the user in one photo.
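The per-user factorization Cu ≈ F × Iu' can be sketched numerically. Below is a minimal numpy sketch that uses truncated SVD as one concrete way to obtain a rank-k factorization; the matrix sizes and data are stand-ins, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)

t, d, k = 6, 20, 3        # t photos, d raw features, k latent interests
C_u = rng.random((t, d))  # stand-in for user u's photo-feature matrix

# Rank-k factorization via truncated SVD: C_u ≈ I_u @ F, where row i of
# I_u holds the latent interests expressed by photo i, and the k rows
# of F form the shared basis of the latent-interest feature space.
U, s, Vt = np.linalg.svd(C_u, full_matrices=False)
I_u = U[:, :k] * s[:k]    # t x k: per-photo latent interests
F = Vt[:k]                # k x d: shared feature basis

print(I_u.shape, F.shape)  # (6, 3) (3, 20)
```

Any rank-k factorization method (NMF, probabilistic factorization) could play the same role; SVD is used here only because it is the simplest self-contained choice.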
Modeling Interests via Matrix Factorization
[Figure: the same photo-feature matrix construction, repeated for each user]
For n users:
C1 ≈ F × I1' = MC1
C2 ≈ F × I2' = MC2
…
Cn ≈ F × In' = MCn
Modeling Interests via Matrix Factorization
[Figure: group g's pool photos (Photo1, Photo2, …, Photot), each represented by a feature vector (f1, f2, …, fd), stacked into the photo-feature matrix Pg]
Pg ≈ F × Tg' = MPg
Each row represents the latent topics of the group in one photo.
Modeling Interests via Matrix Factorization
[Figure: the same construction, repeated for each group]
For m groups:
P1 ≈ F × T1' = MP1
P2 ≈ F × T2' = MP2
…
Pm ≈ F × Tm' = MPm
Modeling Interests via Matrix Factorization
[Figure: the group-user matrix R, with one row per group (Group1, …, Groupm) and one column per user (User1, …, Usern)]
Rgu = |Cu ∩ Pg| / |Cu|
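The entry Rgu = |Cu ∩ Pg| / |Cu| can be computed directly from photo-ID sets. A minimal sketch with hypothetical user and group IDs:

```python
# Hypothetical photo-ID sets standing in for C_u (a user's photos)
# and P_g (a group's pool).
user_photos = {
    "u1": {"p1", "p2", "p3", "p4"},
    "u2": {"p5", "p6"},
}
group_pools = {
    "g1": {"p1", "p2", "p9"},
    "g2": {"p5"},
}

def membership(C_u, P_g):
    """R_gu = |C_u ∩ P_g| / |C_u|: the share of the user's photos
    that appear in the group's pool."""
    return len(C_u & P_g) / len(C_u)

R = {(g, u): membership(C_u, P_g)
     for g, P_g in group_pools.items()
     for u, C_u in user_photos.items()}
print(R[("g1", "u1")])  # 0.5: 2 of u1's 4 photos are in g1's pool
```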
Modeling Interests via Matrix Factorization
[Figure: the group-user matrix R factorized as R ≈ LT × LI', where each row of LT holds the latent topics of a group and each row of LI holds the latent interests of a user]
R ≈ f(LT × LI') = MTI
Modeling Interests via Matrix Factorization
• Up to now, our model can be written as:
R ≈ f(LT × LI') = MTI
Cu ≈ F × Iu' = MCu, for n users
Pg ≈ F × Tg' = MPg, for m groups
Modeling Interests via Matrix Factorization
• Constraint from user contacts:
– Minimize the sum of Dis(Iu1, Iu2) = |Iu1 − Iu2|Euc over all pairs where user u1 lists user u2 as a contact.
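The contact constraint can be written as a penalty term added to the factorization objective. A minimal numpy sketch (the contact pairs and latent vectors here are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

n_users, k = 4, 3
LI = rng.random((n_users, k))        # one latent-interest row per user
contacts = [(0, 1), (1, 2), (0, 3)]  # hypothetical contact pairs (u1, u2)

def contact_penalty(LI, contacts):
    """Sum of Euclidean distances Dis(I_u1, I_u2) over contact pairs;
    minimizing it pulls contacts toward similar latent interests."""
    return sum(np.linalg.norm(LI[u1] - LI[u2]) for u1, u2 in contacts)

print(contact_penalty(LI, contacts) >= 0)  # True: a sum of distances
```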
Modeling Interests via Matrix Factorization
• Used information:
– Photos the user uploads and favorites
– Photos the group collects in its pool
– Which groups the user joins
– Which contacts the user adds
Our Proposed Solution
• Modeling interests via matrix factorization
• A probabilistic solution on an equivalent graphical model
– Several assumptions
– Equivalent graphical model
– Calculating the joint probability
• Learning the model & implementation
A probabilistic solution on an equivalent graphical model
• Several assumptions
• Our proposed matrix-factorization model:
R ≈ f(LT × LI') = MTI
Cu ≈ F × Iu' = MCu, for n users
Pg ≈ F × Tg' = MPg, for m groups
A probabilistic solution on an equivalent graphical model
• Rewrite the model in row and entry form:
rgu ≈ f(ltg × liu'), for the m×n entries of R
cui ≈ F × iui', for the Σu|Cu| user photos
pgj ≈ F × tgj', for the Σg|Pg| group photos
A probabilistic solution on an equivalent graphical model
• Several assumptions:
– iui and tgj are hidden random variables.
– ltg and liu are hidden random variables.
A probabilistic solution on an equivalent graphical model
• Add Gaussian noise to the right-hand side of each equation:
rgu = f(ltg × liu') + ε, for the m×n entries of R
cui = F × iui' + εc, for the Σu|Cu| user photos
pgj = F × tgj' + εp, for the Σg|Pg| group photos
A probabilistic solution on an equivalent graphical model
• Several assumptions:
– iui and tgj are hidden random variables.
– ltg and liu are hidden random variables.
– rgu are random variables depending on ltg and liu.
– cui and pgj are random variables depending on (iui, F) and (tgj, F), respectively.
– iui and tgj depend on liu and ltg.
A probabilistic solution on an equivalent graphical model
• Several assumptions:
– rgu | ltg, liu ~ N(f(ltg × liu'), δI)
– cui | iui, F ~ N(F × iui', δcI)
– pgj | tgj, F ~ N(F × tgj', δpI)
– iui | liu ~ Bernoulli (or Multinomial, Exponential)
– tgj | ltg ~ Bernoulli (or Multinomial, Exponential)
– liu ~ conjugate prior of iui | liu
– ltg ~ conjugate prior of tgj | ltg
The revised model:
rgu = f(ltg × liu') + ε, for the m×n entries of R
cui = F × iui' + εc, for the Σu|Cu| user photos
pgj = F × tgj' + εp, for the Σg|Pg| group photos
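Sampling once from the generative assumptions above can be sketched as follows, with f taken as the plain inner product and all noise scales set to 0.1; both choices are illustrative assumptions, since the slides leave f and the variances open. The latent vectors reuse the worked-example values from the next slide.

```python
import numpy as np

rng = np.random.default_rng(2)
k, d = 4, 8   # number of latent interests, raw feature dimension

# Illustrative parameters (f and noise scales are assumptions).
li_u = np.array([0.4, 0.2, 0.1, 0.3])   # user u's latent interests
lt_g = np.array([0.1, 0.2, 0.7, 0.0])   # group g's latent topics
F = rng.random((k, d))                  # latent-interest feature basis
sigma = 0.1

i_ui = rng.binomial(1, li_u)  # iui | liu ~ Bernoulli(liu)
t_gj = rng.binomial(1, lt_g)  # tgj | ltg ~ Bernoulli(ltg)

# cui | iui, F ~ N(F × iui', δcI): photo feature as a noisy sum of the
# bases of the interests the photo expresses.
c_ui = i_ui @ F + rng.normal(0, sigma, d)
# rgu | ltg, liu ~ N(f(ltg × liu'), δI), with f = inner product here.
r_gu = lt_g @ li_u + rng.normal(0, sigma)

print(c_ui.shape)  # (8,)
```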
A probabilistic solution on an equivalent graphical model
[Figure: a worked example with four latent interests: good color, cute animal, Sony camera, politics. Each of user u's photos has a binary interest-indicator vector iui (e.g. Photo1: 0,1,0,0), summarized by the latent interest vector liu = (0.4, 0.2, 0.1, 0.3); each of group g's photos has an indicator vector tgj, summarized by ltg = (0.1, 0.2, 0.7, 0.0). The predicted relevance rgu ≈ 0.16.]
A probabilistic solution on an equivalent graphical model
• Equivalent graphical model: Topic Model based Recommendation (TMR)
A probabilistic solution on an equivalent graphical model
• Equivalent graphical model
[Figure: the graphical model encoding rgu = f(ltg × liu') + ε, cui = F × iui' + εc, and pgj = F × tgj' + εp]
Our Proposed Solution
• Modeling interests via matrix factorization
• A probabilistic solution on an equivalent graphical model
• Learning the model & implementation
– Gibbs sampling based
– User recommendation for groups
Learning the model & Implementation
• Our task:
– Predict rgu for user u and group g
• Our method:
– Gibbs sampling for the model
• Sample each iui and tgj in the model
• Choose rgu from the pdf conditioned on iui and tgj
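The sampling idea can be illustrated with a toy Gibbs sweep over binary interest indicators, resampling each indicator from its conditional given the others. This is a sketch of the mechanism on a single photo, not the full model; f, the prior, and the data are invented for the demo.

```python
import numpy as np

rng = np.random.default_rng(3)
k, d = 3, 5
sigma = 0.3

# Invented toy instance: one photo feature c generated from binary
# interest indicators z_true under the basis F.
F = rng.random((k, d))
li = np.array([0.7, 0.2, 0.5])   # prior probability of each interest
z_true = np.array([1, 0, 1])
c = z_true @ F                   # noiseless observation, for the demo

def loglik(z):
    # Gaussian log-likelihood of the photo feature given indicators z
    return -np.sum((c - z @ F) ** 2) / (2 * sigma ** 2)

z = rng.binomial(1, li)          # initialize from the prior
for _ in range(200):             # Gibbs sweeps
    for j in range(k):
        # p(z_j = v | z_-j, c) ∝ prior(v) * exp(loglik with z_j = v)
        logp = np.empty(2)
        for v in (0, 1):
            z[j] = v
            prior = li[j] if v else 1.0 - li[j]
            logp[v] = np.log(prior) + loglik(z)
        p1 = 1.0 / (1.0 + np.exp(logp[0] - logp[1]))
        z[j] = rng.binomial(1, p1)

print(z)  # final sample of the interest indicators
```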
Learning the model & Implementation
• Gibbs sampling in our model
– The joint probability of the model
– Sampling based on the resulting equations
Learning the model & Implementation
• Implementation
– Data structures and preprocessing
• Visual word extraction
– Hierarchical clustering on a 100k-photo subset yields 1019 centers
• Filter out high- and low-frequency tags
– Remove tags appearing in 90% of photos or fewer than 2 times, leaving 48733 tags
• Build hash tables for users and photos, and an inverted index for tags, on a 30-group subset
• Use a DBMS to store the 200-group dataset
Learning the model & Implementation
• Implementation
– Sampling:
• 0. Randomly select 20% of the rgu matrix as the test set; use the rest as the training set.
• 1. Take a 5000-photo sample subset and perform SVD to reduce the tag dimensionality (48733 → 1000).
• 2. Take a 5000-photo sample subset after SVD and perform SVD to obtain the prior μ in the model (2019 → 10; the latent dimension is set to 10).
• 3. Initialize iui for each photo of each user and tgj for each photo of each group.
• 4. Perform sampling for 1000 iterations (currently, one iteration costs 22 s).
• 5. Select the sampling result with the maximum joint probability.
• 6. Predict rgu from that result and the relational function f.
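Step 1, SVD-based dimensionality reduction of the tag matrix, can be sketched at reduced scale (200 photos × 500 tags here instead of 5000 × 48733):

```python
import numpy as np

rng = np.random.default_rng(4)

# Scaled-down stand-in: 200 photos x 500 tags instead of 5000 x 48733.
n_photos, n_tags, k = 200, 500, 50
X = (rng.random((n_photos, n_tags)) < 0.02).astype(float)  # binary tag matrix

U, s, Vt = np.linalg.svd(X, full_matrices=False)
X_reduced = U[:, :k] * s[:k]  # n_photos x k dense reduced features
projector = Vt[:k].T          # n_tags x k: maps any photo's tag vector down

new_photo = (rng.random(n_tags) < 0.02).astype(float)
print(X_reduced.shape, (new_photo @ projector).shape)  # (200, 50) (50,)
```

Keeping the projector (the top right-singular vectors) is what lets photos outside the 5000-photo sample be mapped into the same reduced space.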
Recent Works
• Problem in the graphical model:
– A photo's feature is modeled as the sum of its latent interest features
• Not a good/proper fit for the feature
• Note the differences from LDA:
– LDA is a document-word model
– TMR is a document-feature model
– They use different fitting schemes
– TMR is not a chaining of two LDAs
Recent Works
• Problem in the graphical model
• Revised models:
– Weighted TMR
• Weighting parameters at the user/group level
– Multiple(l)-interest TMR
• A photo's interest is formed by multiple basic interests
– Hierarchical LDA
• Related work: Blei's NIPS'04 hierarchical LDA
Recent Works
• Problem in the graphical model
– Other problems:
• Multiple sources of features
– tags & visual
• User contacts are currently not considered
– Solution: refer to Blei's 2010 study on link prediction
• Implementation problems:
– Slow sampling speed
– Noise & data-structure building
To sum up
• Recommend users to groups
– Among the first works on social media sharing websites using content, social relations, and collaborative information
– Proposed solution:
• Modeling interests by matrix factorization
• A probabilistic approach on an equivalent graphical model
• Gibbs sampling based parameter tuning
– Future work:
• Efficient implementation & experiments
Thank you!