Christian Körner 1 , Dominik Benz 2 , Andreas Hotho 3 , Markus Strohmaier 1 , Gerd Stumme 2 Stop thinking, start tagging: Tag Semantics arise from Collaborative Verbosity 1 Knowledge Management Institute and Know Center, Graz University of Technology, Austria 2 Knowledge and Data Engineering Group (KDE), University of Kassel, Germany 3 Data Mining and Information Retrieval Group University of Würzburg, Germany

Stop thinking, start tagging - Tag Semantics emerge from Collaborative Verbosity

DESCRIPTION

Recent research provides evidence for the presence of emergent semantics in collaborative tagging systems. While several methods have been proposed, little is known about the factors that influence the evolution of semantic structures in these systems. A natural hypothesis is that the quality of the emergent semantics depends on the pragmatics of tagging: Users with certain usage patterns might contribute more to the resulting semantics than others. In this work, we propose several measures which enable a pragmatic differentiation of taggers by their degree of contribution to emerging semantic structures. We distinguish between categorizers, who typically use a small set of tags as a replacement for hierarchical classification schemes, and describers, who annotate resources with a wealth of freely associated, descriptive keywords. To study our hypothesis, we apply semantic similarity measures to 64 different partitions of a real-world, large-scale folksonomy containing different ratios of categorizers and describers. Our results not only show that ‘verbose’ taggers are most useful for the emergence of tag semantics, but also that a subset containing only 40% of the most ‘verbose’ taggers can produce results that match and even outperform the semantic precision obtained from the whole dataset. Moreover, the results suggest that there exists a causal link between the pragmatics of tagging and resulting emergent semantics. This work is relevant for designers and analysts of tagging systems interested (i) in fostering the semantic development of their platforms, (ii) in identifying users introducing “semantic noise”, and (iii) in learning ontologies.

Page 1: Stop thinking, start tagging - Tag Semantics emerge from Collaborative Verbosity

Christian Körner 1, Dominik Benz 2, Andreas Hotho 3, Markus Strohmaier 1, Gerd Stumme 2

Stop thinking, start tagging: Tag Semantics arise from Collaborative Verbosity

1 Knowledge Management Institute and Know Center, Graz University of Technology, Austria

2 Knowledge and Data Engineering Group (KDE), University of Kassel, Germany

3 Data Mining and Information Retrieval Group, University of Würzburg, Germany

Page 2: Stop thinking, start tagging - Tag Semantics emerge from Collaborative Verbosity

30.04.2010 Körner, Benz et al.: Tag Semantics arise from Collaborative Verbosity @ WWW2010 2 / 20

Where do Semantics come from?

Semantically annotated content is the „fuel“ of the next generation World Wide Web – but where is the petrol station?

Expert-built: expensive

Evidence for emergent semantics in Web 2.0 data: built by the crowd!

Which factors influence emergence of semantics?

Do certain users contribute more than others?

Page 3: Stop thinking, start tagging - Tag Semantics emerge from Collaborative Verbosity

The Story

Emergent Tag Semantics

Pragmatics of tagging

Semantic Implications of Tagging Pragmatics

Conclusions

Page 4: Stop thinking, start tagging - Tag Semantics emerge from Collaborative Verbosity

Emergent Tag Semantics

tagging is a simple and intuitive way to organize all kinds of resources

uncontrolled vocabulary, tags are „just strings“

formal model: folksonomy F = (U, T, R, Y) with users U, tags T, resources R and tag assignments Y ⊆ U × T × R

evidence of emergent semantics: tag similarity measures can identify e.g. synonym tags (web2.0, web_two)
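As a minimal sketch of the formal model, a folksonomy can be held as a set of (user, tag, resource) triples from which U, T and R are obtained by projection. The sample assignments and the helper name `project` are invented for illustration and do not come from the slides.

```python
# Toy folksonomy: Y is a set of (user, tag, resource) triples.
# All users, tags and URLs below are invented sample data.
Y = {
    ("alice", "web2.0", "http://example.org/a"),
    ("alice", "design", "http://example.org/a"),
    ("bob", "web_two", "http://example.org/a"),
    ("bob", "programming", "http://example.org/b"),
}


def project(assignments, position):
    """Return the set of values found at one position of the triples."""
    return {triple[position] for triple in assignments}


U = project(Y, 0)  # users
T = project(Y, 1)  # tags
R = project(Y, 2)  # resources

# Every assignment lies in U x T x R by construction.
assert all(u in U and t in T and r in R for (u, t, r) in Y)
print(len(U), len(T), len(R), len(Y))
```

Storing Y explicitly and deriving the other three sets keeps the structure consistent: no user, tag or resource can exist without at least one assignment.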

Page 5: Stop thinking, start tagging - Tag Semantics emerge from Collaborative Verbosity

Tag Similarity Measures: Tag Context Similarity

Tag Context Similarity is a scalable and precise tag similarity measure [Cattuto2008,Markines2009]:

- Describe each tag as a context vector

- Each dimension of the vector space corresponds to another tag; the entry denotes the co-occurrence count

- Compute similar tags by cosine similarity

[Figure: example context vector for the tag JAVA; co-occurrence counts 5, 30, 1, 10, 50 over the dimensions design, software, blog, web, programming, …]

Will be used as indicator of emergent semantics!
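The three steps above can be sketched in a few lines of Python. This is an illustrative reading of the slide, not the authors' implementation: it assumes that co-occurrence is counted per post (one user tagging one resource), and the function names and sample posts are invented.

```python
import math
from collections import defaultdict
from itertools import permutations


def context_vectors(posts):
    """posts: iterable of tag sets, one set per (user, resource) post.
    Returns {tag: {other_tag: co-occurrence count}}."""
    vectors = defaultdict(lambda: defaultdict(int))
    for tags in posts:
        # Count every ordered pair of distinct tags within the same post.
        for a, b in permutations(set(tags), 2):
            vectors[a][b] += 1
    return {tag: dict(context) for tag, context in vectors.items()}


def cosine(v, w):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(count * w.get(tag, 0) for tag, count in v.items())
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    norm_w = math.sqrt(sum(c * c for c in w.values()))
    return dot / (norm_v * norm_w) if norm_v and norm_w else 0.0


def most_similar(tag, vectors):
    """Return the tag whose context vector is closest to that of `tag`."""
    candidates = [
        (cosine(vectors[tag], vectors[other]), other)
        for other in vectors
        if other != tag
    ]
    return max(candidates)[1] if candidates else None


# Invented sample posts: the two spelling variants never co-occur,
# but they appear in similar contexts.
posts = [
    {"web2.0", "ajax", "blog"},
    {"web_two", "ajax", "blog"},
    {"java", "programming"},
    {"web2.0", "blog"},
]
vectors = context_vectors(posts)
print(most_similar("web2.0", vectors))  # web_two
```

The example illustrates why context vectors can find synonyms: "web2.0" and "web_two" are never used together, yet both co-occur with "ajax" and "blog".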

Page 6: Stop thinking, start tagging - Tag Semantics emerge from Collaborative Verbosity

Assessing the Quality of Tag Semantics

[Figure: folksonomy tags (legend: = tag) are mapped to synsets (legend: = synset) in the WordNet hierarchy; for an example pair, TagCont(t,tsim) = 0.74 and JCN(t,tsim) = 3.68]

Average JCN(t,tsim) over all tags t: „Quality of semantics“
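For reference, the Jiang-Conrath (JCN) distance between two WordNet synsets is commonly defined through information content, and the slide's „quality of semantics“ is its average over tags. The symbols T' (the tags that can be mapped to WordNet) and lcs (least common subsumer in the hierarchy) are notation assumed here, not taken from the slides.

```latex
\mathrm{IC}(c) = -\log p(c), \qquad
\mathrm{JCN}(c_1, c_2) = \mathrm{IC}(c_1) + \mathrm{IC}(c_2)
  - 2\,\mathrm{IC}\bigl(\mathrm{lcs}(c_1, c_2)\bigr)

\mathrm{quality} = \frac{1}{|T'|} \sum_{t \in T'} \mathrm{JCN}\bigl(t, t_{\mathrm{sim}}\bigr)
```

Because JCN is a distance, a lower average indicates that the most similar tags found in the folksonomy are also close in WordNet.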

Page 7: Stop thinking, start tagging - Tag Semantics emerge from Collaborative Verbosity

The Story

Pragmatics of tagging

Semantic Implications of Tagging Pragmatics

Conclusions

Tag Similarity measures can capture emergent tag semantics

Page 8: Stop thinking, start tagging - Tag Semantics emerge from Collaborative Verbosity

Tagging motivation

Evidence of different ways HOW users tag (Tagging Pragmatics)

Broad distinction by tagging motivation [Strohmaier2009]:

[Figure: two example tag vocabularies, a free-form tag set (donuts, duff, marge, beer, bart, barty, Duff-beer) and a small hierarchy (bev; alc, nalc; beer, wine)]

„Categorizers“…

- use a small controlled tag vocabulary

- goal: „ontology-like“ categorization by tags, for later browsing

- tags as a replacement for folders

„Describers“…

- tag „verbosely“ with freely chosen words

- vocabulary not necessarily consistent (synonyms, spelling variants, …)

- goal: describe content, ease retrieval

Page 9: Stop thinking, start tagging - Tag Semantics emerge from Collaborative Verbosity

Tagging Pragmatics: Measures

How to distinguish between the two types of taggers? Intuition: Describers use an open set of many tags, Categorizers use a small set of controlled tags:

Vocabulary size:

Tag / Resource ratio:

Average # tags per post:

All three measures: high for Describers, low for Categorizers
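The formulas for the three measures did not survive in this transcript, so the following is only a plausible reading of their names: vocabulary size as the number of distinct tags of a user, tag / resource ratio as distinct tags divided by distinct resources, and tags per post as tag assignments divided by resources. The function name and sample data are invented.

```python
from collections import defaultdict


def pragmatic_measures(assignments):
    """assignments: iterable of (user, tag, resource) triples.
    Returns {user: {"vocab": ..., "trr": ..., "tpp": ...}}.
    The exact definitions are assumptions, see the text above."""
    tags = defaultdict(set)        # user -> distinct tags
    resources = defaultdict(set)   # user -> distinct resources
    pairs = defaultdict(set)       # user -> distinct (tag, resource) pairs
    for user, tag, resource in assignments:
        tags[user].add(tag)
        resources[user].add(resource)
        pairs[user].add((tag, resource))
    result = {}
    for user in tags:
        n_resources = len(resources[user])
        result[user] = {
            "vocab": len(tags[user]),
            "trr": len(tags[user]) / n_resources,
            "tpp": len(pairs[user]) / n_resources,
        }
    return result


# Invented sample: "dana" tags like a describer, "carl" like a categorizer.
sample = [
    ("dana", "a", "r1"), ("dana", "b", "r1"), ("dana", "c", "r1"),
    ("dana", "d", "r2"), ("dana", "e", "r2"), ("dana", "f", "r2"),
    ("carl", "x", "r1"), ("carl", "x", "r2"),
    ("carl", "x", "r3"), ("carl", "y", "r4"),
]
measures = pragmatic_measures(sample)
print(measures["dana"])
print(measures["carl"])
```

In this toy data the describer scores higher on all three measures, matching the intuition stated on the slide.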

Page 10: Stop thinking, start tagging - Tag Semantics emerge from Collaborative Verbosity

Tagging Pragmatics: Measures

Next intuition: Describers don't care about „abandoned“ tags, Categorizers do

Orphan ratio:

R(t): set of resources tagged by user u with tag t

Orphan ratio: high for Describers, low for Categorizers
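The orphan ratio formula is also missing from the transcript. A minimal sketch, assuming that a tag counts as an orphan when the user applied it to at most a small share of the resources carried by that user's most frequent tag: the `divisor` parameter (here 100, i.e. a 1% threshold) and the function name are assumptions, not values confirmed by the slides.

```python
import math
from collections import defaultdict


def orphan_ratio(user_assignments, divisor=100):
    """user_assignments: iterable of (tag, resource) pairs of ONE user.
    A tag is treated as an orphan if |R(t)| <= ceil(max_t |R(t)| / divisor).
    The threshold rule is an assumption made for this sketch."""
    resources_per_tag = defaultdict(set)   # tag -> R(t)
    for tag, resource in user_assignments:
        resources_per_tag[tag].add(resource)
    if not resources_per_tag:
        return 0.0
    usage = {tag: len(rs) for tag, rs in resources_per_tag.items()}
    threshold = math.ceil(max(usage.values()) / divisor)
    orphans = [tag for tag, count in usage.items() if count <= threshold]
    return len(orphans) / len(usage)


# Invented sample: two heavily used tags and two rarely used ones.
describer = (
    [("news", f"r{i}") for i in range(200)]
    + [("todo", f"r{i}") for i in range(50)]
    + [("misc1", "r0"), ("misc2", "r1"), ("misc2", "r2")]
)
# Invented sample: every tag is used equally often, so none is abandoned.
categorizer = [(tag, f"{tag}{i}") for tag in ("work", "home") for i in range(10)]

print(orphan_ratio(describer), orphan_ratio(categorizer))
```

Making the threshold relative to the user's own most frequent tag keeps the measure comparable between users with very different activity levels.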

Page 11: Stop thinking, start tagging - Tag Semantics emerge from Collaborative Verbosity

Tagging pragmatics: Limitations of measures

Real users: no „perfect“ Categorizers / Describers, but „mixed“ behaviour

Possibly influenced by user interfaces / recommenders

Measures are correlated

But: independent of semantics; measures capture usage patterns

Page 12: Stop thinking, start tagging - Tag Semantics emerge from Collaborative Verbosity

The Story

Semantic Implications of Tagging Pragmatics

Conclusions

Tag Similarity measures can capture emergent tag semantics

Measures of tagging pragmatics differentiate users by tagging motivation

Page 13: Stop thinking, start tagging - Tag Semantics emerge from Collaborative Verbosity

Influence of Tagging Pragmatics on Emergent Semantics

Idea: Can we learn the same (or even better) semantics from the folksonomy induced by a subset of describers / categorizers?

[Figure: users (legend: = user) ordered from Extreme Categorizers to Extreme Describers; the complete folksonomy is contrasted with a subset of 30% categorizers]

Page 14: Stop thinking, start tagging - Tag Semantics emerge from Collaborative Verbosity

Experimental setup

1. Apply pragmatic measures vocab, trr, tpp, orphan to each user

2. Systematically create „sub-folksonomies“ CFi / DFi by successively adding i % of Categorizers / Describers (i = 1,2,…,25,30,…,100)

3. Compute similar tags based on each subset (TagContext Sim.)

4. Assess (semantic) quality of similar tags by avg. JCN distance

[Figure: for example sub-folksonomies CF5 and DF20, TagCont(t,tsim) = … and JCN(t,tsim) = … are computed]
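Step 2 of the setup above can be sketched as ranking users by one pragmatic measure and keeping the assignments of the top i % from either end of the ranking. This is an illustrative sketch, assuming that a high score means describer-like behaviour; the function name, the rounding rule and the sample scores are invented.

```python
import math


def sub_folksonomy(assignments, scores, percent, describers=True):
    """Keep only the assignments of the top `percent` % of users.
    scores: {user: measure value}, where high means describer-like.
    describers=True builds DF_i, describers=False builds CF_i."""
    ranked = sorted(scores, key=scores.get, reverse=describers)
    # Rounding up is an assumption; the slides do not state a rule.
    keep = set(ranked[: math.ceil(len(ranked) * percent / 100)])
    return [(u, t, r) for (u, t, r) in assignments if u in keep]


# Step sizes from the slide: i = 1, 2, ..., 25, 30, ..., 100
STEPS = list(range(1, 26)) + list(range(30, 101, 5))

# Invented sample data: one assignment per user, scores e.g. from trr.
assignments = [
    ("a", "t1", "r1"), ("b", "t2", "r1"),
    ("c", "t3", "r2"), ("d", "t4", "r2"),
]
scores = {"a": 3.0, "b": 0.5, "c": 2.0, "d": 1.0}

df50 = sub_folksonomy(assignments, scores, 50, describers=True)
cf50 = sub_folksonomy(assignments, scores, 50, describers=False)
print(sorted(u for u, _, _ in df50), sorted(u for u, _, _ in cf50))
```

Steps 3 and 4 would then run the tag similarity computation and the JCN-based assessment on each returned subset.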

Page 15: Stop thinking, start tagging - Tag Semantics emerge from Collaborative Verbosity

Dataset

From social bookmarking site Delicious in 2006 (dataset ORIGINAL)

Two filtering steps (to make measures more meaningful):

- Restrict to top 10,000 tags (dataset FULL)

- Keep only users with > 100 resources (dataset MIN100RES)

dataset      |T|          |U|        |R|           |Y|
ORIGINAL     2,454,546    667,128    18,782,132    140,333,714
FULL         10,000       511,348    14,567,465    117,319,016
MIN100RES    9,944        100,363    12,125,176    96,298,409
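The two filtering steps could look roughly like the sketch below. It assumes that „top“ tags are ranked by their number of tag assignments and that a user's resources are counted after the tag filter; neither detail is stated on the slide, and the function name and sample data are invented.

```python
from collections import Counter, defaultdict


def filter_dataset(assignments, top_tags=10_000, min_resources=100):
    """Apply both filtering steps to (user, tag, resource) triples.
    Ranking tags by assignment count is an assumption of this sketch."""
    assignments = list(assignments)
    # Step 1: restrict to the most frequent tags (ORIGINAL -> FULL).
    tag_counts = Counter(tag for _, tag, _ in assignments)
    keep_tags = {tag for tag, _ in tag_counts.most_common(top_tags)}
    full = [(u, t, r) for (u, t, r) in assignments if t in keep_tags]
    # Step 2: keep users with more than min_resources resources
    # (FULL -> MIN100RES).
    resources = defaultdict(set)
    for user, _, resource in full:
        resources[user].add(resource)
    return [(u, t, r) for (u, t, r) in full
            if len(resources[u]) > min_resources]


# Invented sample, filtered with tiny thresholds for demonstration.
sample = [
    ("u1", "a", "r1"), ("u1", "a", "r2"), ("u1", "b", "r1"),
    ("u2", "a", "r1"), ("u2", "c", "r3"), ("u3", "b", "r9"),
]
filtered = filter_dataset(sample, top_tags=2, min_resources=1)
print(filtered)
```

In the toy run, tag "c" is dropped by the first step, and users "u2" and "u3" are dropped by the second because each is left with a single resource.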

Page 16: Stop thinking, start tagging - Tag Semantics emerge from Collaborative Verbosity

Results – adding Describers (DFi)

Almost all sub-folksonomies are better than randomly picked ones

40% of describers according to trr outperform complete data!

Optimal performance for 70% describers (trr)

[Figure: plot with x-axis „more describers“ and y-axis „better semantics“]

Page 17: Stop thinking, start tagging - Tag Semantics emerge from Collaborative Verbosity

Results – adding Categorizers (CFi)

Almost all sub-folksonomies are worse than randomly picked ones

Global optimum for 90% categorizers (tpp), i.e. after removing the 10% most extreme describers! (Spammers?)

[Figure: plot with x-axis „more categorizers“ and y-axis „better semantics“]

Page 18: Stop thinking, start tagging - Tag Semantics emerge from Collaborative Verbosity

The Story

Tag Similarity measures can capture emergent tag semantics

Measures of tagging pragmatics differentiate users by tagging motivation

Sub-folksonomies introduced by measures of pragmatics show different semantic qualities

Conclusions

Page 19: Stop thinking, start tagging - Tag Semantics emerge from Collaborative Verbosity

Summary & Conclusions

Introduction of measures of users' tagging motivation (Categorizers vs. Describers)

Evidence for causal link between tagging pragmatics (HOW people use tags) and tag semantics (WHAT tags mean)

„Mass matters“ for „wisdom of the crowd“, but composition of crowd makes a difference („Verbosity“ of describers in general better, but with a limitation)

Relevant for tag recommendation and ontology learning algorithms

Page 20: Stop thinking, start tagging - Tag Semantics emerge from Collaborative Verbosity

Guess which of the authors is a Categorizer

Page 21: Stop thinking, start tagging - Tag Semantics emerge from Collaborative Verbosity

Thanks for the attention! Questions? Be verbose

Tag Similarity measures can capture emergent tag semantics

Measures of tagging pragmatics differentiate users by tagging motivation

Sub-folksonomies introduced by measures of pragmatics show different semantic qualities

Evidence of causal link between pragmatics and semantics of tagging!

Page 22: Stop thinking, start tagging - Tag Semantics emerge from Collaborative Verbosity

References

[Cattuto2008] Ciro Cattuto, Dominik Benz, Andreas Hotho, Gerd Stumme: Semantic Grounding of Tag Relatedness in Social Bookmarking Systems. In: Proc. 7th Intl. Semantic Web Conference (2008), pp. 615-631

[Markines2009] Benjamin Markines, Ciro Cattuto, Filippo Menczer, Dominik Benz, Andreas Hotho, Gerd Stumme: Evaluating Similarity Measures for Emergent Semantics of Social Tagging. In: Proc. 18th Intl. World Wide Web Conference (2009), pp. 641-650

[Strohmaier2009] Markus Strohmaier, Christian Körner, Roman Kern: Why do users tag? Detecting users' motivation for tagging in social tagging systems. Technical Report, Knowledge Management Institute, Graz University of Technology (2009)