
Understanding Why Users Tag: A Survey of Tagging Motivation Literature and Results from an Empirical Study

Markus Strohmaier

[email protected]

Graz University of Technology

Inffeldgasse 21 8010 Graz Austria

Christian Körner

[email protected]

Graz University of Technology

Inffeldgasse 21 8010 Graz Austria

Roman Kern

[email protected]

Know Center

Inffeldgasse 21 8010 Graz Austria

Abstract

While recent progress has been achieved in understanding the structure and dynamics of social tagging systems, we know little about the underlying user motivations for tagging, and how they influence resulting folksonomies and tags. This paper addresses three issues related to this question: 1) What distinctions of user motivations are identified by previous research, and in what ways is user motivation amenable to quantitative analysis? 2) To what extent does tagging motivation vary across different social tagging systems? and 3) How does variability in user motivation influence resulting tags and folksonomies? In this paper, we present measures to detect whether a tagger is primarily motivated by categorizing or describing resources, and apply these measures to datasets from seven different tagging systems. Our results show that a) users' motivation for tagging varies not only across, but also within tagging systems, and that b) tag agreement among users who are motivated by categorizing resources is significantly lower than among users who are motivated by describing resources. Our findings are relevant for 1) the development of tag-based user interfaces, 2) the analysis of tag semantics, and 3) the design of search algorithms for social tagging systems.

Keywords: social tagging systems, tagging motivation, user motivation, user goals

1. Introduction

Tagging typically describes the voluntary activity of users who are annotating resources with terms - so called "tags" - freely chosen from an unbounded and uncontrolled vocabulary (cf. [1, 2]). Social tagging systems have been the subject of research for several years now. An influential paper by Mika [3] describes, for example, how social tagging systems can be used for ontology learning. Other work has studied the evolution of folksonomies over time, such as Chi and Mytkowicz [4]. Further studies focused on using tags for a variety of purposes, including the use of tags for web search [5, 6], ranking [7], taxonomy development [8], and other tasks.

However, our understanding of folksonomies is not yet (nor could it be) mature. A question that has recently attracted the interest of the semantic web community is whether the properties of tags in tagging systems and their usefulness for different purposes can be assumed to be a function of the taggers' motivation or intention behind tagging [9]. If this were the case, users' motivation for tagging would have broad implications for the design of useful algorithms, such as adapting tag recommendation techniques and user interfaces for social tagging systems. In order to assess the general usefulness of algorithms that aim to, for example, capture knowledge from folksonomies, we would need to know whether these algorithms produce similar results across different user populations driven by different motivations for tagging. Recent research already suggests that different tagging systems afford different motivations for tagging [9, 10]. Further work presents anecdotal evidence that even within the same tagging system, motivation for tagging may vary greatly between individual users [11]. Given these observations, it is interesting to study whether and how the analysis of user motivation for tagging is amenable to quantitative investigation, and whether folksonomies and the tag semantics they contain are influenced by different tagging motivations.

Tagging motivation remained largely elusive until the first studies on this subject were conducted in 2006. At that time, the work by Golder and Huberman [1] and Marlow and Naaman [2] made advances towards expanding our theoretical understanding of tagging motivation by identifying and classifying user motivation in tagging systems. Their work was followed by studies proposing generalizations, refinements and extensions to previous classifications [9]. An influential observation was made by Coates [12] and elaborated on and interpreted in [2] and [9]. This line of work suggests that a distinction between at least two fundamental types of user motivation for tagging is important:

Categorization vs. Description: On the one hand, users who are motivated by categorization view tagging as a means to categorize resources according to some shared high-level characteristics. These users tag because they want to construct and maintain a navigational aid to the resources for later browsing. On the other hand, users who are motivated by description view tagging as a means to accurately and precisely describe resources. These users tag because they want to produce annotations that are useful for later searching. This distinction has been found to be important because, for example, tags assigned by describers might be more useful for information retrieval (because these tags focus on the content of resources), as opposed to tags assigned by categorizers, which might be more useful to capture a rich variety of possible interpretations of a resource (because they focus on user-specific views on resources).

In this paper, we adopt the distinction between categorizers and describers to study the following three research questions: 1) How can we measure the motivation behind tagging? 2) To what extent does tagging motivation vary across different social tagging systems? and 3) How does tagging motivation influence resulting folksonomies?

The overall contribution of this paper is threefold: first, our work presents measures capable of detecting the tacit nature of tagging motivation in social tagging systems. Second, we apply and validate these measures in the context of seven different tagging systems, showing that tagging motivation varies significantly across and even within tagging systems. Third, we provide first empirical evidence that at least one property of tags ("tag agreement") is influenced by users' motivation for tagging. By analyzing the complete tagging histories of almost 4,000 users from seven different tagging systems, this work represents, to the best of our knowledge, the largest quantitative cross-domain study on tagging motivation to date.

This article is organized as follows: first we review the state of the art of research on tagging motivation. Then, we introduce and discuss a number of ways to make tagging motivation in social tagging systems amenable to quantitative analysis. This is followed by a presentation of results from an empirical study involving nine datasets from seven different social tagging systems. Subsequently, we evaluate our measures by a combination of different strategies. Finally, we present evidence for a relation between the motivation behind tagging and the level of agreement on tags among users of tagging systems.

2. Related Work

Table 1 gives a chronological overview of past research on user motivation in social tagging systems.


Table 1: Overview of Research on Users' Motivation for Tagging in Social Tagging Systems

Authors | Categories of Tagging Motivation | Detection | Evidence | Reasoning | Systems investigated | # of users | Resources per user
Coates 2005 | Categorization, description | Expert judgment | Anecdotal | Inductive | Weblog | 1 | N/A
Hammond et al. 2005 | Self/self, self/others, others/self, others/others | Expert judgment | Observation | Inductive | 9 different tagging systems | N/A | N/A
Golder et al. 2006 | What it is about, what it is, who owns it, refining categories, identifying qualities, self reference, task organizing | Expert judgment | Dataset | Inductive | Delicious | 229 | 300 (average)
Marlow et al. 2006 | Organizational, social, [and refinements] | Expert judgment | N/A | Deductive | Flickr | 10 (25,000) | 100 (minimum)
Xu et al. 2006 | Content-based, context-based, attribute-based, subjective, organizational | Expert judgment | N/A | Deductive | N/A | N/A | N/A
Sen et al. 2006 | Self-expression, organizing, learning, finding, decision support | Expert judgment | Prior experience | Deductive | MovieLens | 635 (3,366) | N/A
Wash et al. 2007 | Later retrieval, sharing, social recognition, [and others] | Expert judgment | Interviews (semistruct.) | Inductive | Delicious | 12 | 950 (average)
Ames et al. 2007 | Self/organization, self/communication, social/organization, social/communication | Expert judgment | Interviews (in-depth) | Inductive | Flickr, ZoneTag | 13 | N/A
Heckner et al. 2009 | Personal information management, resource sharing | Expert judgment | Survey (M. Turk) | Deductive | Flickr, Youtube, Delicious, Connotea | 142 | 20 and 5 (minimum)
Nov et al. 2009 | Enjoyment, commitment, self development, reputation | Expert judgment | Survey (e-mail) | Deductive | Flickr (PRO users only) | 422 | 2,848.5 (average)

The best known and most influential paper surveying social tagging systems is by Golder and Huberman [1], in which an early analysis of folksonomies is conducted. The authors investigate the structure of collaborative tagging systems and found regularities in user activity, tag frequencies, the tags used, and other aspects of two snapshots taken from the Delicious system. This work was the first to explore user behavior in these systems and showed that users in Delicious exhibit a variety of different behaviors. Hammond et al. [10] conducted one of the earliest analyses of folksonomic systems. The authors examined nine different social bookmarking platforms such as CiteULike, Connotea (http://www.connotea.org), Delicious, Flickr and others. In this work two dimensions of tagging methodology are differentiated: tag creation and tag usage. Coates [12] hypothesizes two distinct tagging approaches. The first approach treats tags as a replacement for folders; here, tags name the category to which the annotated resource belongs. The other approach simply applies tags that make sense to the user and characterize the resource in a detailed manner.

Marlow et al. [2] identify two high-level types of tagging motivation: organizational and social practices. The authors further elaborate a list of incentives by which users can be motivated. Examples are future retrieval, where tags are used to make the resource easier to find by the annotator himself, and contribution and sharing, in which keywords are used to create clusters of resources and make them retrievable by other users.

Xu et al. [13] created a taxonomy of tags for building a tag recommendation algorithm. Their five categories are: Content-based tags give insight into the content or the categories of an annotated object (e.g. names, brands, etc.). Context-based tags show the context in which the resource is stored (examples are location, time, etc.). Attribute tags describe properties of a resource. Subjective tags express the user's opinion of a given resource. Organizational tags allow a user to organize her library.

Sen et al. [14] introduce a user-centered model for vocabulary evolution to answer the question of how strongly a user's investment and habit, as well as the community of a system, influence tagging behavior. The authors combine the seven tag classes of [1] into three classes of their own: Factual tags describe properties (like place, time, etc.) of the movie. Subjective tags express users' opinions about the annotated movie. Personal tags are used to organize a user's library.

Further work by Wash et al. [11] focuses on the incentives users have when they utilize a social computing system. By analyzing interviews conducted with twelve users of the Delicious system, the authors found that the primary incentives were the reuse of tags that were previously assigned to resources, as well as the use of terms a user expects to search on.

Heckner et al. [15] perform a comparative study of four different tagging systems (YouTube, Connotea, Delicious and Flickr) and observe differences in tagging behavior for different digital resources. Among the general trends the authors identify: photos are tagged for content and location, videos are tagged for persons, and scientific articles are tagged for time and task.

Several interesting observations can be made: first, Table 1 shows that a vast variety of categories for tagging motivation has been proposed in the past. While these categories give us a broad and multifaceted picture of user motivations in tagging systems, the total set of categories suffers from a number of potential problems, including overlap and incompleteness of categories, as well as differences in terms of coverage and scope. A common understanding, or even a stable set of approaches, has yet to emerge. This suggests that research in this direction is still at an early stage. Second, we can observe that while earlier work focused on anecdotal evidence [12] and theory development [1], empirical validation has become increasingly important. This is evident in the recent focus on interview- and survey-based research [2, 9, 11, 16, 17]. At the same time, the number of users under investigation increased from a single user (in 2005) up to several hundred in later work (2009). Third, to the best of our knowledge, no quantitative approaches for measuring tagging motivation exist; all previous research requires expert judgment to assess the appropriate category of tagging motivation based on observations, datasets, interviews or surveys. In a departure from past research, this paper sets out to explore quantitative measures for tagging motivation. Additional papers focused on other aspects of tagging, such as the development of tag classification taxonomies [18] or experimental studies on latent influences during tagging [19, 20].

3. Tagging Motivation

Our study was driven by the desire to identify categories of tagging motivation that lend themselves to quantification. As discussed in Section 2, most existing categories of tagging motivation are rather abstract and high-level, and do not lend themselves naturally to quantification. For example, it is not obvious or self-evident how an automatic distinction between motivations such as self-expression [21], enjoyment [16] or social recognition [11] can be made. For these reasons, this paper focuses on a particularly promising distinction between categorizers and describers, inspired by Coates [12] and Heckner [9] and further refined and discussed in [22]. Several intuitions about this distinction (such as the kinds of tagging styles different types of users would adopt) make it a promising candidate for future investigations. Figure 1 illustrates this distinction with tag clouds [23] produced by a typical categorizer vs. a typical describer.

While the example on the left illustrates that some categorizers even use special characters to build a pseudo-taxonomy of tags (e.g. fashion_blog, fashion_brand, fashion_brand_bags), what we see here is an extreme example. What separates categorizers from describers is that categorizers use tags as categories (often from a controlled vocabulary), as opposed to descriptive labels for the resource being tagged. Details about this distinction are introduced next.

3.1. Using Tags to Categorize Resources

Users who are motivated by categorization engage in tagging because they want to construct and maintain a personal navigational aid to the resources (URLs, photos, etc.) being tagged.


a. Typical Categorizer b. Typical Describer

Figure 1: Examples of Tag Clouds Produced by Different Users: Categorizer (left) vs. Describer (right)

This implies developing a limited set of tags (or categories) that is rather stable over time. The tags that are assigned to resources are regarded as an investment into a structure, and changing this structure is typically regarded as costly by a categorizer (cf. [21]). Resources are assigned to tags whenever they share some common characteristic important to the mental model of the user (e.g. 'photos', 'trip to Vienna' or 'favorites'). Because the tags assigned by categorizers are very close to the mental models of users, they can act as suitable facilitators for navigation and browsing.

3.2. Using Tags to Describe Resources

Users who are motivated by description engage in tagging because they want to appropriately and relevantly describe the resources being tagged. This typically implies an open set of tags, with a rather dynamic and unlimited tag vocabulary. Tags are not viewed as an investment into a tag structure, and changing the structure continuously is not regarded as costly. The goal of tagging is to identify those tags that match the resource best. Because the tags assigned are very close to the content of the resources, they can be utilized for searching and retrieval.

Table 2: Differences Between Categorizers and Describers

 | Categorizer (C) | Describer (D)
Goal | later browsing | later retrieval
Change of vocab. | costly | cheap
Size of vocab. | limited | open
Tags | subjective | objective
Tag reuse | frequent | rare
Tag purpose | mimic taxonomy | descriptive labels

Table 2 illustrates key differences between the two identified types of tagging motivation. While these two categories represent an idealized distinction, tagging in the real world is likely to be motivated by a combination of both. A user might maintain a few categories while pursuing a description approach for the majority of resources and vice versa, or additional categories might be introduced over time. In addition, the distinction between categorizers and describers is not about the semantics of tagging; it is a distinction based on the motivation for tagging. One implication is that it would be perfectly plausible for the same tag (for example 'java') to be used by both describers and categorizers, and to serve both functions at the same time. In other words, the same tag might be used as a category or a descriptive label.

So the function of each tag is determined by pragmatics (how tags are used) rather than semantics (what they mean). In that sense, tagging motivation presents a new and additional perspective on studies of folksonomies. Taking this new perspective allows for studying whether tagging motivation has an impact on tag semantics. For example, it is reasonable to hypothesize that the tags produced by describers are more descriptive than tags produced by categorizers. If this were the case, studies focused on utilizing tags for information retrieval or ontology learning would benefit from knowledge about users' motivation for tagging.

4. Methodology and Setup

In social tagging systems, the structure of data captured by tagging can be characterized by a tripartite graph with hyper edges. The three disjoint, finite sets of such a graph correspond to 1) a set of users u ∈ U, 2) a set of objects or resources r ∈ R, and 3) a set of annotations or tags t ∈ T that are used by users U to annotate resources R.

A folksonomy is thus a tuple F := (U, T, R, Y) where Y is a ternary relation between those three sets, i.e. Y ⊆ U × T × R (cf. [3, 7, 24]). A personomy Fu of a user u ∈ U is the reduction of F to u [25].
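To make this data model concrete, the following minimal Python sketch represents the relation Y as a list of (user, tag, resource) triples and derives a personomy from it. The triples and the names Assignment and personomy are illustrative assumptions, not data or code from the paper.

```python
from collections import namedtuple

# One element of the ternary relation Y ⊆ U × T × R.
Assignment = namedtuple("Assignment", ["user", "tag", "resource"])

# A toy folksonomy, represented by its relation Y; the sets U, T and R
# are implied by the triples (hypothetical example data).
folksonomy = [
    Assignment("alice", "java", "http://example.org/tutorial"),
    Assignment("alice", "programming", "http://example.org/tutorial"),
    Assignment("bob", "java", "http://example.org/tutorial"),
    Assignment("bob", "coffee", "http://example.org/brewing"),
]

def personomy(folksonomy, user):
    """Reduction of a folksonomy F to the assignments of a single user u."""
    return [a for a in folksonomy if a.user == user]

F_alice = personomy(folksonomy, "alice")
T_u = {a.tag for a in F_alice}       # the user's tag vocabulary
R_u = {a.resource for a in F_alice}  # the user's tagged resources
```

All measures introduced in Section 5 operate on such per-user triples only, which is what makes them applicable without crawling an entire tagging system.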


To study whether and how we can measure tagging motivation, and to what extent tagging motivation varies across different social tagging systems, we develop a number of measures and apply them to a range of personomies that exhibit different characteristics. We assume that understanding how an extreme describer would behave over time in comparison to an extreme categorizer helps in assessing the usefulness of measures to characterize user motivation in tagging systems.

In this work, we adopt a combination of the following strategies to explore the usefulness of the introduced measures: first, we apply all measures to both synthetic and real-world tagging datasets. After that, we analyze the ability of the measures to capture predicted (synthetic) behavior. Finally, we relate our findings to results reported by previous work.

Assuming that the different motivations for tagging produce different personomies (different tagging behavior over time), we can use synthetic data from extreme categorizers and describers to find upper and lower bounds for the behavior that can be expected in real-world tagging systems.

4.1. Synthetic Personomy Data

To simulate the behavior of users who are mainly driven by description, data from the ESP game dataset (http://www.cs.cmu.edu/~biglou/resources/) was used. This dataset contains a large number of inter-subjectively validated, descriptive tags for pictures, which is useful to capture describer behavior.

To contrast this data with the behavior of users who are mainly driven by categorization, we crawled data from Flickr, but instead of using the tags we used information from users' photo sets. We consider each photo set to represent a tag assigned by a categorizer to all the photos that are contained within this set. The personomy then consists of all photos and the corresponding photo sets they are assigned to. We use these two synthetic datasets to simulate the behavior of 'artificial' taggers who are mainly motivated by description and categorization.

4.2. Real-World Personomy Data

In addition to the synthetic datasets, we also crawled popular tagging systems.


Table 3: Overview and Statistics of Social Tagging Datasets. Asterisks (*) indicate synthetic personomies of extreme categorization/description behavior.

Dataset | |U| | |T| | |R| | |Ru|min | |T|/|R|
ESP Game* | 290 | 29,834 | 99,942 | 1,000 | 0.2985
Flickr Sets* | 1,419 | 49,298 | 1,966,269 | 500 | 0.0250
Delicious | 896 | 184,746 | 1,089,653 | 1,000 | 0.1695
Flickr Tags | 456 | 216,936 | 965,419 | 1,000 | 0.2247
Bibsonomy Bookmarks | 84 | 29,176 | 93,309 | 500 | 0.3127
Bibsonomy Publications | 26 | 11,006 | 23,696 | 500 | 0.4645
CiteULike | 581 | 148,396 | 545,535 | 500 | 0.2720
Diigo Tags | 135 | 68,428 | 161,475 | 500 | 0.4238
Movielens | 99 | 9,983 | 7,078 | 500 | 1.4104

The datasets needed to be sufficiently large to enable us to observe tagging motivation across a large number of users, and they needed to be complete because we wanted to study a user's complete tagging history over time, from the user's first bookmark up to the most recent bookmarks. Because many of the tagging datasets available for research focus on sampling data on an aggregate level rather than capturing complete personomies, we had to acquire our own datasets.

Table 3 gives an overview of the datasets acquired for this research. To the best of our knowledge, the aggregate collection of these datasets itself represents the most diverse and largest dataset of complete and very large personomies (>500 tagged resources, recorded from the user's first bookmark on) to date. The restriction to a minimum of 500 tagged resources was set to capture users who have actively used the system. Because assessing tagging motivation for less active users is a harder problem, we defer it to future work. However, in Section 7 we will see that measuring below 500 tagged resources might be feasible as well.

Acquiring this kind of data in itself represents a challenge. To give an example, of about 3,600 users on Bibsonomy, only ~100 users had more than 500 annotated resources according to a database dump made available at the end of 2008. The number of large and complete personomies acquired for this work totaled 3,986 users. The data was stored in an XML-based data schema to capture personomies from different systems in a uniform manner, which allows easy and consistent processing of statistical calculations. In the following, we briefly describe the data acquisition strategies for selected datasets.


The other datasets were acquired in a similar fashion.

• ESP Game: Because temporal order can be assumed to be of little relevance for describers, a synthetic list of users was created by randomly selecting resources from the ESP game dataset in order to simulate personomy data. The same thresholds as for the Flickr and Bibsonomy datasets were used. A number of virtual user personomies were generated by randomly choosing sets of resources and corresponding descriptive tags from the available corpus of real-world image labeling data. An almost arbitrarily large number of personomies can be generated with this approach.

• Flickr: Using Flickr's web page that shows recently uploaded photos, a list consisting of several thousand user ids was gathered. Because on Flickr there is a photo limit of 200 for regular users (i.e. users without a premium account), only public accounts were considered during crawling. For all users in our list, we retrieved all stored photos that are publicly accessible. The lower bound of tagged resources in our dataset is 500; the upper bound is 4,000. For the second dataset, which focuses on Flickr sets data, the same thresholds were applied. For users that satisfied our requirements, the complete tagged photo stream and/or the photos belonging to photo sets were retrieved using the Flickr REST API.

• Delicious: Starting from a set of popular tags, a list consisting of several thousand usernames that have bookmarked any URLs with those tags was recursively gathered. Because public access to a user's stored bookmarks is limited to the most recent 4,000 entries, only users having fewer bookmarks were used for our analysis. The minimum number of resources for each user was 1,000. We acquired the complete personomy data for users satisfying these criteria.

• Bibsonomy: We used a database dump of about 3,600 Bibsonomy users (from the end of 2008) that was provided by the tagging research group at the University of Kassel. The lower and upper bound for both datasets (bookmarks and publications) are the same as for Flickr. The bookmark and publication datasets were extracted from the database dump and directly encoded into the XML-based personomy schema.

5. Measuring Tagging Motivation

In the following, we develop and apply different measures that aim to provide useful insights into the fabric of tagging motivation in social tagging systems. The measures introduced below focus on statistical aspects of users' personomies only, instead of using entire folksonomies. For each measure, the underlying intuition, a formalization, and the results of applying the measure to our datasets are presented next.

5.1. Growth of Tag Vocabulary - Mvocab

Intuition: Over time, the tag vocabulary (i.e. the set of all tags used in a user's personomy) of an ideal categorizer would be expected to reach a plateau. The rationale for this idea is that only a limited set of categories is of interest to a user mainly motivated by categorization. On the other hand, an ideal describer would not aim to limit the size of her tagging vocabulary, but would likely introduce new tags as required by the contents of the resources being tagged.

Description: A measure to distinguish between these two approaches would examine whether the cumulative tag vocabulary of a given user plateaus over time.

$$M_{vocab} = \frac{|T|}{|R|} \qquad (1)$$

Figure 2, left-most column, depicts the growth of the tag vocabulary over 3,000 bookmarks across the different tagging datasets used in our study. The difference between extreme describers (synthetic ESP game data in the top row) and extreme categorizers (synthetic Flickr sets data in the bottom row) is stark. Data from Delicious users (in the middle, left) shows that real-world tagging behavior lies in between the identified extremes.

Discussion: What can be observed from our data is that, for example, categorizers in tagging systems do not start with a fixed vocabulary, but introduce categories over time. For the first bookmarks of users, Mvocab exhibits only marginal differences between categorizers and describers, since little variation is visible on the y-axis at that point.


Figure 2: Overview of the introduced measures (from left to right: Mvocab, Mdesc, Mcat and Mcombined) over time for the twosynthetic datasets (top and bottom row) and the Delicious dataset (middle row). The synthetic datasets form approximateupper- and lower bounds for “real” tagging datasets.

For the other measures, this variation is reflected immediately in the figures. In addition, the tag ratio is influenced by the number of tags that users assign to resources, which can vary from user to user regardless of their tagging motivation. To address some of these issues, we next present more sophisticated measures that follow different intuitions about categorizers and describers.

5.2. Orphaned Tags - Mdesc

Intuition: Categorizers would have a low interest in introducing orphaned tags, i.e. tags (or categories) that are used for only a single resource and are then abandoned. Introducing categories of size one seems counterintuitive for categorizers, as it would prevent them from using their set of tags for browsing efficiently. On the other hand, describers would not refrain from introducing orphaned tags, but would rather actively assign a rich variety of different tags to resources so that re-finding becomes easier when searching for them later. Following this intuition, describers can be assumed to have a tendency to produce more orphaned tags than categorizers.

Description: The number of tags that are used only once within each personomy, in relation to the total number of tags, would be a simple measure to detect this difference in motivation. A more robust approach to measuring the influence of orphaned tags is described as follows:

$$M_{desc} = \frac{|\{t : |R(t)| \leq n\}|}{|T|}, \qquad n = \left\lceil \frac{|R(t_{max})|}{100} \right\rceil \qquad (2)$$

We introduce a measure Mdesc that aims to capture ideal describer behavior based on orphaned tags. In the formula, R(t) denotes the set of resources annotated with a tag t. The measure ranges from 0 to 1, where a high value reflects a large number of tags that are used rarely. The threshold n for rare tags is derived from the individual user's tagging style, where tmax is the most frequently used tag of the user. The Mdesc measure can be interpreted as the proportion of tags within the first percentile of the tag frequency histogram. It is more robust than a measure focusing only on tags with frequency 1, because it relates rare tags to the most popular tag of a given user. A high Mdesc value can be expected to indicate that the main motivation of a user is to describe the resources.
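The following sketch implements Eq. (2) on the per-user triples from Section 4. It is a minimal interpretation under one assumption: |R(t)| is computed as the number of distinct resources a tag annotates within the personomy.

```python
import math
from collections import defaultdict

def m_desc(assignments):
    """Share of rarely used ("orphaned") tags in a personomy (Eq. 2)."""
    resources_of = defaultdict(set)  # tag t -> R(t), the resources it annotates
    for a in assignments:
        resources_of[a.tag].add(a.resource)
    # Rarity threshold n, derived from the most frequently used tag t_max.
    n = math.ceil(max(len(r) for r in resources_of.values()) / 100)
    rare = sum(1 for r in resources_of.values() if len(r) <= n)
    return rare / len(resources_of)
```

For personomies whose most popular tag covers 100 or fewer resources, n evaluates to 1 and the measure reduces to the simple orphaned-tag ratio described above.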


The second column from the left in Figure 2 shows that Mdesc (orphaned tags) indeed captures a difference between extreme describers (ESP Game on top) and extreme categorizers (Flickr photo sets on the bottom). Again, data from Delicious shows a wide variety of different rates of orphaned tags, suggesting that user motivation varies significantly within this system.

Discussion: Mdesc ranges from 0 to 1, where 0 represents extreme categorizing behavior and 1 indicates extreme descriptive behavior. The resulting value gives insight into how a user re-uses tags in her library.

5.3. Tag Entropy - Mcat

Intuition: For categorizers, useful tags should be maximally discriminative with regard to the resources they are assigned to. This would allow categorizers to effectively use tags for navigation and browsing. This observation can be exploited to develop a measure for tagging motivation when viewing tagging as an encoding process, where entropy [26] can be considered a measure of the suitability of tags for this task. A categorizer would have a strong incentive to maintain high tag entropy (or information value) in her tag cloud. In other words, a categorizer would want the tag-frequency distribution to be as equally distributed as possible in order for it to be useful as a navigational aid; otherwise, tags would be of little use in browsing. A describer, on the other hand, would have little interest in maintaining high tag entropy, as tags are not used for navigation at all.

Description: In order to measure the suitability of tags for navigating resources, we develop an entropy-based measure for tagging motivation, using the set of tags and the set of resources as random variables to calculate conditional entropy. If a user employs tags to encode resources, the conditional entropy should reflect the effectiveness of this encoding process. The measure uses a user's tags Tu and her resources Ru as input:

$$H(R|T) = -\sum_{r \in R} \sum_{t \in T} p(r,t) \log_2 p(r|t) \qquad (3)$$

The joint probability p(r, t) depends on the distribution of tags over the resources. The conditional entropy can be interpreted as the uncertainty about the resource that remains after knowing the tags. It is measured in bits and is influenced by the number of resources and the tag vocabulary size. To account for individual differences between users, we propose a normalization of the conditional entropy so that only the encoding quality remains. As a normalization factor we calculate the conditional entropy Hopt(R|T) of an ideal categorizer, and relate it to the actual conditional entropy of the user at hand. Calculating Hopt(R|T) can be accomplished by modifying p(r, t) in a way that reflects a situation where all tags are equally discriminative, while at the same time keeping the average number of tags per resource the same as in the user's personomy. Based on this approach, we define a measure for tagging motivation by calculating the difference between the observed conditional entropy and the conditional entropy of an ideal categorizer, put in relation to the conditional entropy of the ideal categorizer:

$$M_{cat} = \frac{H(R|T) - H_{opt}(R|T)}{H_{opt}(R|T)} \qquad (4)$$

In Figure 2, the conditional tag entropy Mcat is shown in the second column from the right. It captures the differences between the synthetic, extreme categorizers and describers well, evident in the data from the ESP game (top) and Flickr sets (bottom). Again, real-world tagging behavior shows significant variation.

Discussion: Conditional tag entropy is 0 for users that behave exactly like an ideal categorizer, and can reach values greater than 1 for extreme non-categorizing users. The resulting value is thus inversely proportional to users' encoding effectiveness. The results presented in Figure 2 show that Delicious users exhibit traits of categorizers to varying extents, with some users showing scores that we calculated for ideal categorizers (lower part of the corresponding diagram). Further, it can be seen that the Mcat measure identifies users of the ESP game as extreme describers, whereas users of the Flickr sets dataset are found to be extreme categorizers. An interesting aspect is that the Delicious dataset includes users who score even higher than the extreme synthetic describers.

5.4. A Combination of Measures - Mcombined

The final proposed measure is a combination of both Mdesc and Mcat, because (i) these measures both capture intuitions about behavior that we expect to see in describers and categorizers, and (ii) they are not dependent on tagging styles (such as the number of tags per resource) that might vary across describers and categorizers.

$$M_{combined} = \frac{M_{desc} + M_{cat}}{2} \qquad (5)$$

The right-most column of Figure 2 depicts the temporal behavior of Mcombined for the two synthetic datasets and the Delicious users.
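Putting the sketches together, Mcombined is simply the arithmetic mean of Eqs. (2) and (4) for the same personomy. The toy personomy below is hypothetical data chosen so that tags are partly reused (for a personomy without any tag reuse, N = |T| and the Hopt normalization in m_cat degenerates to zero):

```python
# A toy personomy: 6 resources, tags partly reused (hypothetical data).
toy = [Assignment("u", tag, res) for res, tags in {
    "r1": ["python", "tutorial"], "r2": ["python"], "r3": ["python", "web"],
    "r4": ["web"], "r5": ["recipe"], "r6": ["python", "recipe"],
}.items() for tag in tags]

scores = {"M_vocab": m_vocab(toy), "M_desc": m_desc(toy), "M_cat": m_cat(toy)}
scores["M_combined"] = (scores["M_desc"] + scores["M_cat"]) / 2  # Eq. (5)
print(scores)
```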

5.5. Evaluation: Usefulness of Measures

The introduced measures have a number of useful properties: they are content-agnostic and language-independent, and they operate on the level of individual users. An advantage of content-agnostic measures is that they are applicable across different media (e.g. photos, music and text). Because the introduced measures are language-independent, they are applicable across different user populations (e.g. English vs. German). Because the measures operate on a personomy level, only the tagging records of individual users are required (as opposed to comparing a user's vocabulary to the vocabulary of an entire folksonomy).

Figure 3 depicts the Mdesc and Mcat measures for all tagging datasets at |Ru| = 500, i.e. at the point where all users have bookmarked exactly 500 resources. We can see that both measures identify synthetic describer behavior (ESP game, upper left) and synthetic categorizer behavior (Flickr photosets, lower right) as extreme behavior. We can use the synthetic data as points of reference for the analysis of real tagging data, which would be expected to lie in between these points of reference. The diagrams for the real-world datasets show that tagging motivation in fact mostly lies in between the identified extremes. The fact that the synthetic datasets act as approximate upper and lower bounds for real-world datasets is a first indication for the usefulness of the presented measures. We also calculated Pearson's correlation between Mdesc and Mcat. The results presented in Figure 3 are encouraging because the measures were independently developed based on different intuitions, and yet they are highly correlated.

Figure 4 presents a different visualization for selected datasets to illustrate how tagging motivation varies within and across social tagging systems. Each row shows the distribution of users of a particular dataset according to Mcombined, binned into 5 equal bins in the interval [0..1]. Users from the Diigo system tend to be more driven by description, whereas users from Movielens seem to use their tags in a categorizing manner. An interesting aspect that was not evaluated in our experiments is the influence of user interface characteristics on the behavior of users. From the diagram we can see that the profiles for the two synthetic datasets, the ESP Game (back row) and the Flickr Photosets (second back row), stand out drastically in comparison to all other datasets. While almost all users of the Flickr photoset dataset fall into the first bin, indicating that this bin is characteristic for categorizers, almost all users of the ESP Game fall into the fifth bin, indicating that this bin is characteristic for describers. Using this data as a point of reference, we can characterize tagging motivation in the profiles of the remaining datasets: the profiles of Flickr tags, Delicious, Diigo, Movielens and Bibsonomy largely lie in between these bounds, and thereby provide first empirical insights into the fabric of tagging motivation in those systems, illustrating a broad variety within these systems.

In addition, we want to evaluate whether individual users that were identified as extreme categorizers / extreme describers by Mcombined were also confirmed as such by human subjects. In our evaluation, we asked one human subject (who was not related to this research) to classify 40 exemplary tag clouds into two equally-sized piles: a categorizer and a describer pile. The 40 tag clouds were obtained from users in the Delicious dataset, where we selected the top 20 categorizers and the top 20 describers as identified by Mcombined. The inter-rater agreement kappa between the results of the human subject evaluation and Mcombined was 1.0. This means the human subject agrees that the top 20 describers and the top 20 categorizers (as identified by Mcombined) are good examples of extreme categorization / description behavior. The tag clouds illustrated earlier (cf. Figure 1) were actual examples of tag clouds used in this evaluation.

Finally, it is interesting to investigate the minimum number of tagging events that is necessary to approximate a user's motivation for tagging at later stages.


Figure 3: Mdesc and Mcat at |Ru| = 500 for 9 datasets, including the Pearson correlation between the two measures and the mean value for Mcombined. Each panel plots both measures with users sorted by Mcombined on the x-axis. The top-left and the lower-right panel show the reference datasets for describers and categorizers respectively (designated with an asterisk). The per-panel statistics are:

Dataset | mean Mdesc | mean Mcat | correlation | Mcombined
ESP Game* | 0.82 | 0.83 | 0.72 | 0.82
CiteULike | 0.62 | 0.70 | 0.81 | 0.66
Diigo Tags | 0.61 | 0.66 | 0.92 | 0.63
Delicious | 0.52 | 0.55 | 0.90 | 0.53
Bibsonomy Bookmarks | 0.52 | 0.56 | 0.77 | 0.54
Bibsonomy Publications | 0.61 | 0.69 | 0.85 | 0.65
Movielens | 0.50 | 0.46 | 0.72 | 0.48
Flickr | 0.47 | 0.47 | 0.91 | 0.47
Flickr Sets* | 0.055 | 0.12 | 0.72 | 0.086

Table 4: Classification results for <500 resources according to the Mcombined measure on the Delicious dataset. The identified user behavior at 500 resources was used as ground truth for the earlier steps.

Resources | 0 | 100 | 200 | 300 | 400
Correctly identified user behavior | 50 | 77 | 84 | 90 | 91
Incorrectly identified user behavior | 50 | 23 | 16 | 10 | 9

Figure 5 shows that for many users, Mcombined provides a good approximation of the user's tagging motivation, particularly in early phases of a user's tagging history. This knowledge might help to approximate users' motivation for tagging over time.


Figure 4: Mcombined at |Ru| = 500 for 9 different datasets ("Are tags used to categorize or to describe resources?"), binned into 5 equal bins in the interval [0.0 .. 1.0]; the y-axis shows the percentage of users per bin. Datasets shown: Flickr Tags, Bibsonomy Publications, CiteULike, Diigo, Delicious, Bibsonomy Bookmarks, MovieLens, ESP*, Flickr Sets*. The synthetic Flickr Sets dataset (back row) indicates extreme categorizer behavior, the synthetic ESP game dataset (second back row) indicates extreme describer behavior (designated by an asterisk).

to approximate users’ motivation for tagging overtime. In Table 4, we show the classification accu-racy of our approach using the classification of usersat 500 resources as our ground truth. Our resultsprovide first insights into how reducing the numberof required resources might influence classificationaccuracy. The results presented suggest that using100 instead of 500 tagged resources would intro-duce 23 false classifications. However, more workis required to address this question in a more com-prehensive way.

5.6. Other Potential Measures

The three introduced measures for tagging motivation were developed independently, using different intuitions. While the growth of the tag vocabulary is a measure that can be regarded as influenced by many latent user characteristics (e.g. the rate of tags per resource), Mdesc and Mcat are intended to normalize behavior and account for individual differences in tagging behavior not related to tagging motivation. Mdesc was motivated by the intention to identify describer behavior, while Mcat was motivated by the intention to identify categorizer behavior. For both Mdesc and Mcat, small values (close to 0) can be seen as indicators of tagging behavior mainly motivated by categorizing resources, and high values can be seen as an indicator for describers.

Other ways of measuring user motivation include tapping into the knowledge contained in the resources themselves or relating individual users' tags to the tags of the entire folksonomy. To give an example:

A conceivable measure to identify describers would calculate the overlap between the tag vocabulary and the words contained in a resource (e.g. a URL or scientific paper), as proposed by [27]. According to our theory, tags produced by describers would be expected to have a higher vocabulary overlap than the ones of categorizers. One problem with such an approach is that it would rely on resources being available in the form of textual data,


a. Mdesc

b. Mcat

c. Mcombined

Figure 5: Mdesc (a.), Mcat (b.) and Mcombined (c.) in the interval |Ru| = [0..500] for 100 random users obtained from the Delicious dataset. The 100 users are split into two equal halves at |Ru| = 500 according to the corresponding measure, with the upper half colored red (50 describers) and the lower half colored blue (50 categorizers). Due to our particular approach, all measures trivially exhibit perfect separation at |Ru| = 500, but Mcombined (c.) appears to exhibit faster convergence and better separability in early phases of a user's tagging history, especially for small |Ru| (such as |Ru| < 50).

which limits its use to tagging systems that organize textual resources and prohibits adoption for tagging systems that organize photos, audio or video data.

Another measure might tap into the vocabulary of folksonomies: if a user tends to use tags that are frequently used in the vocabulary of the entire social folksonomy, this might serve as an indicator that she is motivated by description, as she orients her vocabulary towards the crowd consensus. If she does not, this might serve as an indicator of categorization. Problems with such an approach include that a) it is language-dependent, and b) it requires global knowledge of the vocabulary of a folksonomy.

By operating on a content-agnostic, user-specific and language-independent level, the measures introduced in this paper can be applied to the tagging records of a single user,


without the requirement to crawl the entire tagging system or to have programmatic access to the resources that are being tagged.

6. Influence on Tag Agreement

Given that tagging motivation varies even within social tagging systems, we are now interested in whether such differences influence resulting folksonomies. We can assert an influence if folksonomies primarily populated by categorizers differed from folksonomies primarily populated by describers in some relevant regard. We investigate this question next.

In order to answer this question, we selected the Delicious dataset for further investigation. This was done for the following reasons: first, the dataset exhibits a broad range of diverse behavior with regard to tagging motivation. Second, Delicious is a broad folksonomy (cf. http://www.vanderwal.net/random/entrysel.php?blog=1635), allowing multiple users to annotate a single resource and therefore providing good coverage of multiply tagged documents. The other datasets either represent narrow folksonomies - such as Flickr - where most resources are tagged by a single user only, or do not exhibit sufficient overlap to conduct a meaningful analysis.

To explore whether tagging motivation influences the nature of tags, we split the Delicious user base into two groups of equal size by selecting a threshold of 0.5514 for the Mcombined measure: the first group is defined by users who score lower than the threshold. We refer to this group as Delicious categorizers from now on. The second group is defined as the complementary set of users, with an Mcombined greater than 0.5514. We refer to this group as Delicious describers. This allows us to generate two separate tag sets for each of the 500 most popular resources contained in our dataset, where one tag set is constructed by Delicious describers and the other by Delicious categorizers. For each tag set, we can calculate tag agreement, i.e. the number of tags that k percent of users agree on for a given resource.

We define agreement on tags in the following way:

$$T_a = \left\{ t \in T_r : \frac{|U_t|}{|U_r|} > k \right\} \qquad (6)$$


In this equation, Ut denotes the set of users that have annotated the resource with the tag t, and Ur represents the set of users that have the resource in their library. |Ta| describes the number of tags that k percent of users agree on. We restrict our analysis to |Ut| > 3 to avoid spuriously high values of tag agreement when only few users are involved. From a theoretical point of view, we can expect that describers would achieve a higher degree of agreement compared to categorizers, as describers focus on describing the content of resources. This is exactly what can be observed in our data.
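A minimal sketch of Eq. (6): given the tag sets of all users who bookmarked one resource, it returns the tags that more than a fraction k of those users agree on, ignoring tags used by 3 or fewer users. The input format (one set of tags per user) is an assumption of this sketch.

```python
from collections import Counter

def tag_agreement(user_tag_sets, k):
    """Tags that more than a share k of a resource's users agree on (Eq. 6)."""
    users_per_tag = Counter()            # tag t -> |U_t|
    for tags in user_tag_sets:
        users_per_tag.update(set(tags))  # count users, not tag assignments
    n_users = len(user_tag_sets)         # |U_r|
    return {t for t, u in users_per_tag.items()
            if u > 3 and u / n_users > k}

# Example: agreement at k = 0.3 among five users of one resource.
agreed = tag_agreement([{"java", "tutorial"}, {"java"}, {"java", "code"},
                        {"java", "tutorial"}, {"java"}], k=0.3)
# Only "java" survives: it is used by 5 > 3 users and 5/5 > 0.3.
```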

Table 5 shows that among the 500 most popular resources in our dataset, describers achieve a significantly higher tag agreement than categorizers ('describer wins' in the majority of cases). For k=30% we can see that describers win in 94.2% of cases (471 out of 500 resources), categorizers win in 1% of cases (5 resources) and 4.8% of cases (24 resources) are a tie. Other values of k produce similar results. We have conducted a similar analysis on the CiteULike dataset, which has a much smaller number of overlapping tag assignments and is therefore less reliable. The results from this analysis are somewhat conflicting - further research is required to study the effects more thoroughly. However, we believe that our work has some important implications: the findings suggest that, when considering the users' motivation for tagging, not all tags are equally meaningful.

Depending on the task, a researcher might benefit from focusing on data produced by describers or categorizers only. To give an example: for tasks where high tag agreement is relevant (e.g. tag-based search in non-textual resources), tags produced by describers might be more useful than tags produced by categorizers. In related work, we showed that the emergent semantics that can be extracted from folksonomies are influenced by our distinction between different motivations for tagging (categorizers vs. describers) [28].

7. Limitations

Our work focuses on understanding tagging motivation based on a distinction between two categories, i.e. categorization vs. description, and does not yet deal with other types of motivation, such as recognition, enjoyment, self-expression and other categories proposed in the literature (cf. [16, 17, 29, 18]).


Table 5: Tag agreement among Delicious describers and categorizers for the 500 most popular resources. For all k, describers produce more agreed tags than categorizers. Results for k > 70 are dominated by ties.

k | 10 | 20 | 30 | 40 | 50 | 60 | 70 | 80 | 90
Desc. wins | 378 | 463 | 471 | 452 | 380 | 286 | 172 | 74 | 23
Cat. wins | 56 | 12 | 5 | 7 | 5 | 3 | 4 | 4 | 0
Ties | 66 | 25 | 24 | 41 | 115 | 211 | 324 | 422 | 477

the degree to which other categories of tagging mo-tivation are amenable to quantitative analysis. Thiswork is the first attempt towards a viable way of au-tomatically and quantitatively understanding tag-ging motivation in social tagging systems.

Because we distinguish between describers and categorizers based on observing their tagging behavior over time, a minimum number of resources per user is required to observe sufficient differences. In our case, we have required a minimum |R_u| of 500 and 1,000 resources, respectively. In some tagging applications, this requirement is likely to be too high (e.g. youtube.com). However, from our data - especially from our analysis presented in Figure 5 - we expect that measuring tagging motivation for |R_u| < 500 is feasible as well, although the introduced measures might then produce results of lower accuracy, making the analysis of tagging motivation more challenging. While we regard this issue as important, we leave it to future work.
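
As an aside, the eligibility filter implied by this requirement amounts to no more than the following sketch, where user_resources is an assumed mapping from user ids to their sets of bookmarked resources:

    def eligible_users(user_resources, min_resources=500):
        # Keep only users with enough resources to estimate
        # tagging motivation reliably.
        return {u for u, rs in user_resources.items()
                if len(rs) >= min_resources}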

Furthermore, all analyses of "real" social tagging datasets conducted in this work focused on publicly available tagging data only - private bookmarks are not captured by our datasets. To what extent private bookmarks might influence investigations of tagging motivation remains an open question.

8. Contribution and Conclusions

The contribution of this paper is twofold: first, the paper presents the first thorough literature survey on the topic of tagging motivation. Second, we present a preliminary empirical study that introduces measures for approximating tagging motivation.

This paper presented results from a large empirical study involving data from seven different social tagging datasets. We introduced and applied an approach that aims to measure and detect the tacit nature of tagging motivation in social tagging systems. We evaluated the introduced measures with synthetic datasets of extreme behavior as points of reference, via a human subject study and via triangulation with previous findings. Based on a large sample of users, our results show that 1) tagging motivation of individuals varies not only across but also significantly within social tagging systems, and 2) users' motivation for tagging has an influence on resulting tags and folksonomies. By analyzing the tag sets produced by Delicious describers and Delicious categorizers in greater detail, we showed that agreement on tags among categorizers is significantly lower than agreement among describers. These findings have some important implications:

Usefulness of Tags: Our research shows that users motivated by categorization produce fewer descriptive tags, and that the tags they produce exhibit lower agreement among users for given resources. This provides further evidence that not all tags are equally useful for different tasks, such as search or recommendation. Rather the opposite seems to be the case: to be able to assess the usefulness of tags on a content-independent level, deeper insight into users' motivation for tagging is needed. The measures introduced in this paper aim to illuminate a path towards understanding user motivation for tagging in a quantitative, content-agnostic and language-independent way that is based on local data of individual users only. In related work, the introduced distinction between categorizers and describers was used to demonstrate that emergent semantics in folksonomies are influenced by the users' motivation for tagging [28]. However, the influence of tagging motivation on other areas of work, such as taxonomy learning from tagging systems [30], simulation of tagging processes [31] or tag recommendation [32], remains to be studied in future work.

Usage of Tagging Systems: While tags have traditionally been viewed as a way of freely describing resources, our analysis reveals that the motivation for tagging varies tremendously across different real-world social tagging systems such as Delicious, Bibsonomy and Flickr. Our analysis corroborates previous qualitative research and provides first quantitative evidence for this observation. Moreover, our data shows that even within the same tagging system the motivation for tagging varies strongly. Our research also highlights several opportunities for designers of social tagging systems to influence user behavior. While categorizers could benefit from tag recommenders that suggest tags based on their individual tag vocabulary, describers could benefit from tags that best describe the content of the resources. Offering users tag clouds to aid the navigation of their resources might represent a way to increase the proportion of categorizers, while offering more sophisticated search interfaces and algorithms might encourage users to focus on describing resources.

Studying the extent to which the motivation behind tagging influences other properties of social tagging systems, such as efficiency [4], structure [33], emergent semantics [34] or navigability [35], represents an interesting area for future research.

9. Acknowledgements

Thanks to Hans-Peter Grahsl for support in crawling the data sets and to Mark Kröll for comments on earlier versions of this paper. This work is funded by the FWF Austrian Science Fund Grant P20269 TransAgere. The Know-Center is funded within the Austrian COMET Program under the auspices of the Austrian Ministry of Transport, Innovation and Technology, the Austrian Ministry of Economics and Labor and by the State of Styria. COMET is managed by the Austrian Research Promotion Agency FFG.

References

[1] S. Golder, B. Huberman, Usage patterns of collaborative tagging systems, Journal of Information Science 32 (2) (2006) 198–208.

[2] C. Marlow, M. Naaman, D. Boyd, M. Davis, HT06, tagging paper, taxonomy, Flickr, academic article, to read, in: HYPERTEXT '06: Proceedings of the 17th Conference on Hypertext and Hypermedia, ACM, New York, NY, USA, 2006, pp. 31–40.

[3] P. Mika, Ontologies are us: A unified model of social networks and semantics, Web Semantics: Science, Services and Agents on the World Wide Web 5 (1) (2007) 5–15.

[4] E. Chi, T. Mytkowicz, Understanding the efficiency of social tagging systems using information theory, in: HYPERTEXT '08: Proceedings of the 19th ACM Conference on Hypertext and Hypermedia, ACM, New York, NY, USA, 2008, pp. 81–88.

[5] Y. Yanbe, A. Jatowt, S. Nakamura, K. Tanaka, Can social bookmarking enhance search in the web?, in: Proceedings of the 7th ACM/IEEE-CS Joint Conference on Digital Libraries, ACM, 2007, p. 116.

[6] P. Heymann, G. Koutrika, H. Garcia-Molina, Can social bookmarking improve web search?, in: WSDM '08: Proceedings of the Int'l Conference on Web Search and Web Data Mining, ACM, 2008, pp. 195–206.

[7] A. Hotho, R. Jäschke, C. Schmitz, G. Stumme, Information retrieval in folksonomies: Search and ranking, Lecture Notes in Computer Science 4011 (2006) 411.

[8] P. Heymann, H. Garcia-Molina, Collaborative creation of communal hierarchical taxonomies in social tagging systems, Tech. rep., Computer Science Department, Stanford University (April 2006). URL dbpubs.stanford.edu:8090/pub/2006-10

[9] M. Heckner, M. Heilemann, C. Wolff, Personal information management vs. resource sharing: Towards a model of information behaviour in social tagging systems, in: ICWSM '09: Int'l AAAI Conference on Weblogs and Social Media, San Jose, CA, USA, 2009.

[10] T. Hammond, T. Hannay, B. Lund, J. Scott, Social bookmarking tools (I), D-Lib Magazine 11 (4) (2005) 1082–9873.

[11] R. Wash, E. Rader, Public bookmarks and private benefits: An analysis of incentives in social computing, in: ASIS&T Annual Meeting, Citeseer, 2007.

[12] T. Coates, Two cultures of faux-onomies collide, http://www.plasticbag.org/archives/2005/06/two_cultures_of_fauxonomies_collide/ (2005). Last access: May 8, 2008.

[13] Z. Xu, Y. Fu, J. Mao, D. Su, Towards the semantic web: Collaborative tag suggestions, in: Collaborative Web Tagging Workshop at WWW2006, Edinburgh, Scotland, Citeseer, 2006.

[14] S. Sen, S. Lam, A. Rashid, D. Cosley, D. Frankowski, J. Osterhouse, F. Harper, J. Riedl, Tagging, communities, vocabulary, evolution, in: Proceedings of the 2006 20th Anniversary Conference on Computer Supported Cooperative Work, ACM, 2006, pp. 181–190.

[15] M. Heckner, T. Neubauer, C. Wolff, Tree, funny, to read, google: What are tags supposed to achieve? A comparative analysis of user keywords for different digital resource types, in: Proceedings of the 2008 ACM Workshop on Search in Social Media, SSM '08, ACM, New York, NY, USA, 2008, pp. 3–10. doi:10.1145/1458583.1458589.

[16] O. Nov, M. Naaman, C. Ye, Motivational, structural and tenure factors that impact online community photo sharing, in: ICWSM '09: Proceedings of the AAAI Int'l Conference on Weblogs and Social Media, 2009.

[17] M. Ames, M. Naaman, Why we tag: Motivations for annotation in mobile and online media, in: CHI '07: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, ACM, New York, NY, USA, 2007, pp. 971–980. URL http://portal.acm.org/citation.cfm?id=1240624.1240772

[18] K. Bischoff, C. S. Firan, W. Nejdl, R. Paiu, Can all tags be used for search?, in: CIKM '08: Proceedings of the 17th ACM Conference on Information and Knowledge Management, ACM, New York, NY, USA, 2008, pp. 193–202. URL http://portal.acm.org/citation.cfm?id=1458112

[19] E. Rader, R. Wash, Influences on tag choices in del.icio.us, in: CSCW '08: Proceedings of the ACM 2008 Conference on Computer Supported Cooperative Work, ACM, New York, NY, USA, 2008, pp. 239–248. URL http://portal.acm.org/citation.cfm?id=1460563.1460601

[20] C. Körner, R. Kern, H. P. Grahsl, M. Strohmaier, Of categorizers and describers: An evaluation of quantitative measures for tagging motivation, in: HT '10: Proceedings of the 21st ACM SIGWEB Conference on Hypertext and Hypermedia, ACM, Toronto, Canada, 2010.

[21] S. Sen, S. K. Lam, A. M. Rashid, D. Cosley, D. Frankowski, J. Osterhouse, F. M. Harper, J. Riedl, Tagging, communities, vocabulary, evolution, in: CSCW '06: Proceedings of the 2006 20th Anniversary Conference on Computer Supported Cooperative Work, ACM, New York, NY, USA, 2006, pp. 181–190. URL http://portal.acm.org/citation.cfm?id=1180904

[22] M. Strohmaier, C. Körner, R. Kern, Why do users tag? Detecting users' motivation for tagging in social tagging systems, in: ICWSM 2010: International AAAI Conference on Weblogs and Social Media, Washington, DC, USA, May 23-26, 2010.

[23] J. Sinclair, M. Cardew-Hall, The folksonomy tag cloud: When is it useful?, Journal of Information Science 34 (1) (2008) 15.

[24] R. Lambiotte, M. Ausloos, Collaborative tagging as a tripartite network, Lecture Notes in Computer Science 3993 (2006) 1114.

[25] A. Hotho, R. Jäschke, C. Schmitz, G. Stumme, BibSonomy: A social bookmark and publication sharing system, in: Proceedings of the Conceptual Structures Tool Interoperability Workshop at the 14th Int'l Conference on Conceptual Structures, Citeseer, 2006, pp. 87–102.

[26] T. M. Cover, J. A. Thomas, Elements of Information Theory, Wiley-Interscience, New York, NY, USA, 1991.

[27] U. Farooq, T. Kannampallil, Y. Song, C. Ganoe, J. Carroll, L. Giles, Evaluating tagging behavior in social bookmarking systems: Metrics and design heuristics, in: Proceedings of the 2007 Int'l ACM Conference on Supporting Group Work, ACM, 2007, pp. 351–360.

[28] C. Körner, D. Benz, M. Strohmaier, A. Hotho, G. Stumme, Stop thinking, start tagging - tag semantics emerge from collaborative verbosity, in: Proceedings of the 19th International World Wide Web Conference (WWW 2010), ACM, Raleigh, NC, USA, 2010.

[29] Z. Xu, Y. Fu, J. Mao, D. Su, Towards the semantic web: Collaborative tag suggestions, in: Proceedings of the Collaborative Web Tagging Workshop at the WWW 2006, Edinburgh, Scotland, 2006. URL http://.inf.unisi.ch/phd/mesnage/site/Readings/Readings.html

[30] T. Eda, M. Yoshikawa, T. Uchiyama, T. Uchiyama, The effectiveness of latent semantic analysis for building up a bottom-up taxonomy from folksonomy tags, World Wide Web 12 (4) (2009) 421–440. URL http://dx.doi.org/10.1007/s11280-009-0069-1

[31] P. A. Chirita, S. Costache, W. Nejdl, S. Handschuh, P-tag: Large scale automatic generation of personalized annotation tags for the web, in: WWW '07: Proceedings of the 16th International Conference on World Wide Web, ACM, New York, NY, USA, 2007, pp. 845–854. URL http://portal.acm.org/citation.cfm?id=1242686

[32] A. Shepitsen, J. Gemmell, B. Mobasher, R. Burke, Personalized recommendation in social tagging systems using hierarchical clustering, in: Proceedings of the 2008 ACM Conference on Recommender Systems, ACM, 2008, pp. 259–266.

[33] S. Golder, B. A. Huberman, Usage patterns of collaborative tagging systems, Journal of Information Science 32 (2) (2006) 198–208. URL http://jis.sagepub.com/cgi/doi/10.1177/0165551506062337

[34] C. Körner, D. Benz, A. Hotho, M. Strohmaier, G. Stumme, Stop thinking, start tagging: Tag semantics arise from collaborative verbosity, in: 19th International World Wide Web Conference (WWW 2010), Raleigh, NC, USA, April 26-30, ACM, 2010.

[35] D. Helic, C. Trattner, M. Strohmaier, K. Andrews, On the navigability of social tagging systems, in: The 2nd IEEE International Conference on Social Computing (SocialCom 2010), Aug 20-22, 2010, Minneapolis, Minnesota, USA, 2010.