1
Effective Retrieval of Resources in Folksonomies Using a New Tag Similarity Measure
Date: 2012/10/11
Resource: CIKM’11
Advisor: Dr. Jia-Ling Koh
Speaker: I-Chih Chiu
2
Outline
• Introduction
• Description of the approach
  • Tag similarity computation
  • Tag expansion
  • Taming computational complexity
• Evaluation
• Conclusion
3
Introduction
• Social media applications
  • Videos, pictures, music, blogs, etc.
• Pre-defined taxonomies
• Social tagging
  • Informally defined
  • Continually changing
  • Ungoverned
• Finding content of interest has become a main challenge
4
Motivation
• Various classic metrics have been used to compute tag similarity
  • Cosine similarity, Jaccard coefficient, Pearson correlation
• Assumption: the underlying folksonomy is already dense
  • This assumption does not hold true
  • Most real-life folksonomies exhibit a power-law distribution of tag usage
• Using traditional metrics like cosine similarity would almost always yield close-to-zero values
5
Goal
• Propose an approach that transparently induces the creation of a dense folksonomy
  • Mutual reinforcement principle
• Automatically expand the user-selected tag set when the user
  • labels a new resource
  • submits a query to retrieve some resources
• Tag similarity computation
  • Cosine
  • Latent semantic indexing
  • SimRank
  • The novel approach
• Tag expansion
  • It can automatically expand the tag set chosen by the user.
7
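Cosine Similarity
• Co-occurrence
• Roughly 81% of resources were described by no more than 5 different tags (and roughly 58% by fewer than 3)
• The matrix TR is rather sparse

TR = [tag-resource matrix; image not preserved]

$$ s(t_i, t_j) = \frac{\langle tr(i), tr(j) \rangle}{\sqrt{\langle tr(i), tr(i) \rangle} \cdot \sqrt{\langle tr(j), tr(j) \rangle}} \tag{1} $$

$$ s(\text{user}, \text{system}) = \frac{2}{\sqrt{3} \cdot \sqrt{6}} = \frac{\sqrt{2}}{3} $$

$$ s(\text{time}, \text{EPS}) = \frac{0}{\sqrt{2} \cdot \sqrt{2}} = 0 $$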
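The cosine computation of equation (1) can be sketched in a few lines of Python. The rows of TR below are hypothetical: the original matrix is not available here, so the vectors were chosen only so that their inner products and norms match the two worked values on this slide.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length tag rows of TR."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # a tag that labels nothing is treated as similar to nothing
    return dot / (norm_u * norm_v)

# Hypothetical rows of TR (one column per resource); an assumption made
# for illustration, not the matrix shown on the original slide.
tr = {
    "user":   [0, 1, 1, 0, 1],
    "system": [0, 1, 1, 2, 0],
    "time":   [0, 1, 0, 0, 1],
    "EPS":    [0, 0, 1, 1, 0],
}

s_user_system = cosine(tr["user"], tr["system"])  # 2 / (sqrt(3) * sqrt(6))
s_time_eps = cosine(tr["time"], tr["EPS"])        # the tags share no resource
```

The second value illustrates the sparsity problem: two tags that never label the same resource get a similarity of exactly zero, however related they may be.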
8
Latent Semantic Indexing (1/2)
• Singular Value Decomposition (SVD)

$$ A = U \Sigma V^T \qquad A \approx U_k \Sigma_k V_k^T $$
9
Latent Semantic Indexing (2/2)
• Example query: q = “user interface”
• The reduced query vector q_k is then compared with every document vector in V_k using the cosine similarity.
• The computation of LSI on large matrices is very costly
• The tuning of the parameter k is complex and time-consuming
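A minimal sketch of the LSI query step, assuming NumPy is available. The function name `lsi_scores`, the toy matrix and the folding-in formula $q_k = \Sigma_k^{-1} U_k^T q$ (the standard LSI convention) are assumptions made for illustration; the slides do not give the matrix or the projection formula.

```python
import numpy as np

def lsi_scores(A, q, k):
    """Cosine similarity between a query and every document in rank-k LSI space.

    A is a term-by-document matrix, q a term vector, k the number of
    singular values kept.
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    U_k, s_k, V_k = U[:, :k], s[:k], Vt[:k, :].T  # rows of V_k are documents
    q_k = (U_k.T @ q) / s_k                       # fold the query into k dimensions
    norms = np.linalg.norm(V_k, axis=1) * np.linalg.norm(q_k)
    return (V_k @ q_k) / norms

# Hypothetical 4-term x 3-document matrix (assumption, not from the slides).
A = np.array([[1.0, 0.0, 0.0],
              [1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0]])
q = A[:, 0]                  # query identical to the first document
scores = lsi_scores(A, q, k=2)
```

Because the query equals the first document, its folded-in vector coincides with that document's row of V_k, so the first score is 1 up to rounding. The full SVD call is also where the cost mentioned above comes from on large matrices.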
10
SimRank (1/2)
• More suitable to the folksonomy domain are techniques that rely on the mutual reinforcement principle.
  • People are similar if they purchase similar items.
  • Items are similar if they are purchased by similar people.

$$ s(A, B) = \frac{C_1}{|O(A)|\,|O(B)|} \sum_{i=1}^{|O(A)|} \sum_{j=1}^{|O(B)|} s(O_i(A), O_j(B)) \quad \text{if } A \neq B \tag{2} $$

$$ s(c, d) = \frac{C_2}{|I(c)|\,|I(d)|} \sum_{i=1}^{|I(c)|} \sum_{j=1}^{|I(d)|} s(I_i(c), I_j(d)) \quad \text{if } c \neq d \tag{3} $$

$$ s(A, B) = \frac{0.8}{3 \cdot 3} \cdot (0.619 \cdot 6 + 1 + 1 + 0.437) = 0.547 $$

$$ s(\text{frosting}, \text{eggs}) = \frac{0.8}{2 \cdot 2} \cdot (1 + 1 + 0.547 \cdot 2) = 0.619 $$
11
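SimRank (2/2)
• Iteration

$$ R_{k+1}(a, b) = \frac{C}{|I(a)|\,|I(b)|} \sum_{i=1}^{|I(a)|} \sum_{j=1}^{|I(b)|} R_k(I_i(a), I_j(b)) \tag{4} $$

$$ R_0(a, b) = \begin{cases} 0 & \text{if } a \neq b \\ 1 & \text{if } a = b \end{cases} $$

$$ R_3(\text{Univ}, \text{ProfB}) = \frac{0.8}{1 \cdot 2} \sum_{i=1}^{1} \sum_{j=1}^{2} R_2(I_i(\text{Univ}), I_j(\text{ProfB})) = 0.128 $$

The terms of this sum are $R_2(\text{StudA}, \text{Univ})$ and $R_2(\text{StudA}, \text{StudB})$:

$$ R_2(\text{StudA}, \text{StudB}) = \frac{0.8}{1 \cdot 1} \sum_{i=1}^{1} \sum_{j=1}^{1} R_1(I_i(\text{StudA}), I_j(\text{StudB})) $$

$$ R_1(\text{ProfA}, \text{ProfB}) = \frac{0.8}{1 \cdot 2} \sum_{i=1}^{1} \sum_{j=1}^{2} R_0(I_i(\text{ProfA}), I_j(\text{ProfB})) $$

$$ R_0(\text{Univ}, \text{Univ}) = 1 \qquad R_0(\text{Univ}, \text{StudB}) = 0 $$

• SimRank does not consider the number of times a tag intervenes in labeling a resource
• SimRank does not distinguish between tags that have labeled exactly the same resource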
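The iteration of equation (4) can be sketched as follows. The in-neighbor lists are an assumption: they were chosen to be consistent with the set sizes in the worked example (|I(Univ)| = 1, |I(ProfB)| = 2, and so on), since the graph itself is not reproduced in this transcript. Setting the score to 0 when a node has no in-neighbors is likewise a common convention, not something the slides state.

```python
from itertools import product

def simrank(in_neighbors, C=0.8, iterations=3):
    """Iterative SimRank: returns a dict {(a, b): R_k(a, b)} after k iterations."""
    nodes = list(in_neighbors)
    # R_0 is 1 on the diagonal and 0 elsewhere.
    R = {(a, b): 1.0 if a == b else 0.0 for a, b in product(nodes, repeat=2)}
    for _ in range(iterations):
        R_next = {}
        for a, b in product(nodes, repeat=2):
            I_a, I_b = in_neighbors[a], in_neighbors[b]
            if a == b:
                R_next[(a, b)] = 1.0
            elif not I_a or not I_b:
                R_next[(a, b)] = 0.0  # convention for nodes without in-neighbors
            else:
                total = sum(R[(x, y)] for x in I_a for y in I_b)
                R_next[(a, b)] = C * total / (len(I_a) * len(I_b))
        R = R_next
    return R

# Hypothetical graph (assumption), given as node -> list of in-neighbors.
in_neighbors = {
    "Univ":  ["StudA"],
    "ProfA": ["Univ"],
    "ProfB": ["Univ", "StudB"],
    "StudA": ["ProfA"],
    "StudB": ["ProfB"],
}

R3 = simrank(in_neighbors, C=0.8, iterations=3)
r3_univ_profb = R3[("Univ", "ProfB")]
```

With this graph, three iterations give R_3(Univ, ProfB) = 0.4 · (0 + 0.32) = 0.128, the value reported on the slide.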
12
A Novel Similarity Metric(1/2)
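• Mutual reinforcement factor $\Psi_{ij}$
  • It gives more relevance to tags that labeled the very same resources than to those that labeled related (but not the very same) resources.
  • $\Psi_{ij}$ is equal to 1 if $i = j$, while it is equal to a constant (0.6 in the worked example for this metric) if $i \neq j$.
• Equations (5)-(9): [images not preserved; one of them is annotated "Cosine similarity"]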
13
A Novel Similarity Metric(2/2)
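TR = [tag-resource matrix; image not preserved]

$$ st^0(\text{human}, \text{interface}) = \frac{1}{\sqrt{2}\sqrt{2}} = \frac{1}{2} $$

How to compute the similarity:

$$ ST^k(t_a, t_b) = \sum_{i,j=1}^{n_r} TR_{ai} \cdot \Psi_{ij} \cdot sr^{k-1}(r_i, r_j) \cdot TR_{bj} $$

$$ ST^1(h, h) = 1 + 0.6 \cdot \frac{1}{\sqrt{18}} + 0.6 \cdot \frac{1}{\sqrt{18}} + 1 = 2.288 $$

$$ ST^1(i, i) = 1 + 0.6 \cdot \frac{1}{\sqrt{12}} + 0.6 \cdot \frac{1}{\sqrt{12}} + 1 = 2.348 $$

$$ ST^1(h, i) = 1.438 $$

$$ st^1(\text{human}, \text{interface}) = \frac{ST^1(h, i)}{\sqrt{ST^1(h, h)} \cdot \sqrt{ST^1(i, i)}} = \frac{1.438}{\sqrt{2.288} \cdot \sqrt{2.348}} = \frac{1.438}{2.318} = 0.62 $$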
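A minimal sketch of one step of this computation, using the double sum for ST^k and the cosine-style normalization shown above. Several inputs are assumptions: the two tag rows (each tag labels two resources and they share one, which matches st^0 = 1/2), and the resource similarity 0.2 between the second and third resource, which is a hypothetical value back-solved from the slide's ST^1(h, i) = 1.438 rather than a number stated in the source. The slide also rounds intermediate values, so the results agree only to about two decimals.

```python
import math

def raw_similarity(tr_a, tr_b, psi, sr_prev):
    """ST^k(t_a, t_b) = sum over i, j of TR_ai * Psi_ij * sr^(k-1)(r_i, r_j) * TR_bj."""
    n = len(tr_a)
    return sum(tr_a[i] * psi[i][j] * sr_prev[i][j] * tr_b[j]
               for i in range(n) for j in range(n))

def normalized_similarity(tr_a, tr_b, psi, sr_prev):
    """st^k(t_a, t_b): ST^k normalized in the same way as a cosine."""
    st_ab = raw_similarity(tr_a, tr_b, psi, sr_prev)
    st_aa = raw_similarity(tr_a, tr_a, psi, sr_prev)
    st_bb = raw_similarity(tr_b, tr_b, psi, sr_prev)
    return st_ab / (math.sqrt(st_aa) * math.sqrt(st_bb))

# Hypothetical tag rows over three resources (assumption).
human = [1, 1, 0]
interface = [1, 0, 1]

# Psi is 1 on the diagonal and 0.6 elsewhere, as in the worked example.
psi = [[1.0 if i == j else 0.6 for j in range(3)] for i in range(3)]

# sr^0 between resources: 1/sqrt(18) and 1/sqrt(12) appear on the slide;
# 0.2 is a back-solved, hypothetical value.
a, b, c = 1 / math.sqrt(18), 1 / math.sqrt(12), 0.2
sr0 = [[1.0, a, b],
       [a, 1.0, c],
       [b, c, 1.0]]

st_hh = raw_similarity(human, human, psi, sr0)
st_ii = raw_similarity(interface, interface, psi, sr0)
st1 = normalized_similarity(human, interface, psi, sr0)
```

Compared with the plain cosine value of 0.5, the score rises to roughly 0.62 because pairs of distinct but related resources now contribute, damped by the factor 0.6.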
14
Tag expansion (1/2)
• Key to the approach is the use of the previously computed tag similarities to automatically expand the tag set chosen by the user.

$$ SC(t_i, tSet) = \sum_{t_j \in tSet} sc(t_i, t_j) \tag{10} $$

• $tSet$ is the set of user-selected tags
• $t_j$ is a tag in $tSet$ and $t_i$ a tag not in $tSet$

$$ sc(t_i, t_j) = st(t_i, t_j) \cdot \log count(t_i) \cdot IRF(t_i) \tag{11} $$

• $st(t_i, t_j)$: the previously computed similarity
• $count(t_i)$: the number of times $t_i$ appears in the folksonomy (favors tags that are largely used)
• $IRF(t_i)$: the inverse resource frequency of $t_i$ (favors tags that are important)
15
Tag expansion (2/2)
• Assume $tSet$ = {tree, sea, sky}
• {sun, fruit}: tags not chosen by the user

$$ SC(t_i, tSet) = \sum_{t_j \in tSet} sc(t_i, t_j) $$

$$ sc(t_i, t_j) = st(t_i, t_j) \cdot \log count(t_i) \cdot IRF(t_i) $$

• Recommend the top k highest-scoring tags; users can decide which ones to use.
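The scoring and top-k recommendation can be sketched as below. All numbers (similarities, counts, IRF values) are hypothetical and only illustrate the ranking; the function name `expand_tags`, the natural logarithm, and the treatment of IRF as a precomputed lookup are assumptions, since the slides give neither the log base nor the IRF definition.

```python
import math

def expand_tags(t_set, candidates, st, count, irf, k=1):
    """Rank candidate tags t_i by SC(t_i, tSet) and return the top k as (tag, score)."""
    scored = []
    for t_i in candidates:
        weight = math.log(count[t_i]) * irf[t_i]  # log count(t_i) * IRF(t_i)
        score = sum(st.get((t_i, t_j), 0.0) * weight for t_j in t_set)
        scored.append((t_i, score))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

t_set = ["tree", "sea", "sky"]    # tags chosen by the user
candidates = ["sun", "fruit"]     # tags not chosen by the user

# Hypothetical inputs (assumptions for illustration only).
st = {
    ("sun", "tree"): 0.2, ("sun", "sea"): 0.6, ("sun", "sky"): 0.7,
    ("fruit", "tree"): 0.5, ("fruit", "sea"): 0.1, ("fruit", "sky"): 0.0,
}
count = {"sun": 50, "fruit": 40}
irf = {"sun": 1.0, "fruit": 1.2}

ranked = expand_tags(t_set, candidates, st, count, irf, k=2)
```

With these made-up inputs, "sun" ranks above "fruit" because it is similar to two of the three selected tags, even though "fruit" has the higher IRF.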
16
Computational complexity
• From a theoretical standpoint, the computation of each pairwise tag similarity may require an infinite number of iterations.
• This could make the similarity measure inapplicable in practical cases, because each iteration would require exactly […] computations.
18
Evaluation
• Is the approach able to increase the accuracy of searches?
• Does the approach scale to large folksonomies?
19
Datasets: Bibsonomy & CiteULike

                Bibsonomy   CiteULike
Bookmarks       648,924     2,281,609
Users           4,696       57,053
Papers          578,587     1,928,302
Distinct tags   147,076     401,620
20
Accuracy of User Searches
• The first experiment aimed at determining the ability of the approach to retrieve resources of relevance to the user querying the folksonomy.
• Tag expansion can yield better results

Figure 1: Retrieved Ratio on Bibsonomy and CiteULike
21
Scalability
• As previously pointed out, the highest cost of the approach lies in the computation of pairwise tag similarities.
• This result confirms that the similarity measure is scalable and well suited to large folksonomies.
(12)
23
Conclusion
• The authors proposed an approach that enables the effective retrieval of resources within folksonomies.
• The new tag similarity metric is used both when users label resources and when users query the folksonomy.
• Finally, the computational cost of the iterative approach is limited, as convergence is guaranteed and in practice reached after a handful of iterations.
24
Thanks for listening