1
Retrieval Effectiveness of Tagging Systems Isabella Peters, Laura Schumann, Jens Terliesner, Wolfgang G. Stock {isabella.peters | laura.schumann | jens.terliesner}@uni-duesseldorf.de Heinrich-Heine-University, Department of Information Science Universit ¨ atsstraße 1, 40225 D ¨ usseldorf, Germany Paper and full reference list at: http://tinyurl.com/retrieval-test Abstract Social tagging is a widespread activity for indexing user-generated content on Web services. This poster summarizes research on folksonomies and their retrieval effectiveness. A TREC-like retrieval test was conducted with tags and resources from the social bookmarking system delicious, which resulted in recall and precision values for tag-only searches. Moreover, several experimental tag- based databases (i.e., Power tags, Luhn tags) have been tested regarding their retrieval effectiveness. Test results show that folksonomies work best with short queries although recall values are high and precision values are low. Here, a search function “Power tags only” greatly enhances precision values. 1. Extract documents and tags I 1,989 resources consisting of the docsonomies of the delicious-bookmarks I collected from delicious.com in October 2010 I tags form the indexed databases for the retrieval test runs I each resource is tagged from at least 30 users I each resource was tagged with “folksonomy”,“seo” or “folksonomies” 2. Adjust tags I original tags (Information Retrieval) I tags unified, i.e. without special characters (informationretrieval) I tags unified and stemmed (informationretriev) Relevance Queries Tags All tags Power tags Luhn tags 4. Create information needs 55 information needs and search tasks have been created. They vary in their complexity as one can see in the following examples: simple lookup, complex lookup, exploratory search task. I Find a thesaurus I Find articles which present social bookmarking tools I Find articles which advise the combination of folksonomies and controlled vocabularies for indexing 5. Judge relevance of documents I 24 students and 9 people from staff of the department act as assessors I assessments are binary and are conducted manually I assessors had access to resources (i.e., websites). Tags were hidden for relevance assessments I if two assessors agreed in their decision the relevance judgment was saved immediately I this procedure results in 109,395 relevance judgments 6. Create queries I 25 students and 2 people from staff created queries for each information need 730 queries I Boolean operators and brackets are allowed (they can be used in delicious) I minimum number of query terms: 1, maximum number of query terms: 13 I average query length: 3 terms I experts from staff built queries for each search task with a maximum number of 5 query terms (simulation of real-world-users: they use 2 terms or less) 3. Extract Power tags and Luhn tags The algorithms developed for Power tag and Luhn tag extraction work differently and depend on the tag distribution of the docsonomy. 0 0,2 0,4 0,6 0,8 1 1,2 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 0,00 0,20 0,40 0,60 0,80 1,00 1,20 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 7. Retrieval and evaluation The figures A to F visualize the results of the retrieval test and display average recall values, average precision values and values for F-measure for conducted search runs. Figures A to C visualize the results of the expert searches, figures D to F show the results of simple lookup-search tasks requesting one query term. Figures G and H visualize the relative benefit or loss while using Power tags and Luhn tags with expert queries (G) and one word queries (H). Test results show that retrieval in folksonomies works best with very short queries. Here, a search function “Power tags only” greatly enhances retrieval effectiveness. mean average recall for database mean average recall for database mean average recall for database mean average recall for database mean average precision for database mean average precision for database mean average precision for database mean average precision for database F-measure F-measure F-measure F-measure 0 0,1 0,2 0,3 0,4 0,5 0,6 0,7 0,8 0,9 1 Original tags 55 information needs Expert queries all delicious tags restricted to Luhn tags restricted to Luhn tags and Power tags restricted to Power tags mean average recall for database mean average recall for database mean average recall for database mean average recall for database mean average precision for database mean average precision for database mean average precision for database mean average precision for database F-measure F-measure F-measure F-measure 0 0,1 0,2 0,3 0,4 0,5 0,6 0,7 0,8 0,9 1 Unified tags 55 information needs Expert queries all delicious tags, unified restricted to Luhn tags, unified restricted to Luhn tags and Power tags, unified restricted to Power tags, unified mean average recall for database mean average recall for database mean average recall for database mean average recall for database mean average precision for database mean average precision for database mean average precision for database mean average precision for database F-measure F-measure F-measure F-measure 0 0,1 0,2 0,3 0,4 0,5 0,6 0,7 0,8 0,9 1 Unified and stemmed tags 55 information needs Expert queries all delicious tags, unified and stemmed restricted to Luhn tags, unified and stemmed restricted to Luhn tags and Power tags, unified and stemmed restricted to Power tags, unified and stemmed Expert queries 0% 20% 40% 60% 80% 100% 120% 140% 160% original tags original tags restricted to Luhn tags original tags restricted to Power tags unified tags unified tags restricted to Luhn tags unified tags restricted to Power tags unified and stemmed tags unified and stemmed tags restricted to Luhn tags unified and stemmed tags restricted to Power tags Baseline = F-measure of delicious-tags for expert queries average recall for database average recall for database average recall for database average recall for database average precision for database average precision for database average precision for database average precision for database F-measure F-measure F-measure F-measure 0 0,1 0,2 0,3 0,4 0,5 0,6 0,7 0,8 0,9 1 Original tags 5 information needs One word query all delicious tags restricted to Luhn tags restricted to Luhn tags and Power tags restricted to Power tags average recall for database average recall for database average recall for database average recall for database average precision for database average precision for database average precision for database average precision for database F-measure F-measure F-measure F-measure 0 0,1 0,2 0,3 0,4 0,5 0,6 0,7 0,8 0,9 1 Unified tags 5 information needs One word query all delicious tags, unified restricted to Luhn tags, unified restricted to Luhn tags and Power tags, unified restricted to Power tags, unified average recall for database average recall for database average recall for database average recall for database average precision for database average precision for database average precision for database average precision for database F-measure F-measure F-measure F-measure 0 0,1 0,2 0,3 0,4 0,5 0,6 0,7 0,8 0,9 1 Unified and stemmed tags 5 information needs One word query all delicious tags, unified and stemmed restricted to Luhn tags, unified and stemmed restricted to Luhn tags and Power tags, unified and stemmed restricted to Power tags, unified and stemmed 0% 20% 40% 60% 80% 100% 120% 140% 160% original tags original tags restricted to Luhn tags original tags restricted to Power tags unified tags unified tags restricted to Luhn tags unified tags restricted to Power tags unified and stemmed tags unified and stemmed tags restricted to Luhn tags unified and stemmed tags restricted to Power tags Baseline = F-measure of delicious-tags for one word queries 74th Annual Meeting of the American Society for Information Science and Technology, October 9-13, 2011, New Orleans, Louisiana The research is funded by the DFG (STO 764/4-1). We thank students and staff for their valuable contribution.

Retrieval Effectiveness of Tagging Systems...Social tagging is a widespread activity for indexing user-generated content on Web services. This poster summarizes research on folksonomies

  • Upload
    others

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Retrieval Effectiveness of Tagging Systems...Social tagging is a widespread activity for indexing user-generated content on Web services. This poster summarizes research on folksonomies

Retrieval Effectiveness of Tagging SystemsIsabella Peters, Laura Schumann, Jens Terliesner, Wolfgang G. Stock{isabella.peters | laura.schumann | jens.terliesner}@uni-duesseldorf.de

Heinrich-Heine-University, Department of Information ScienceUniversitatsstraße 1, 40225 Dusseldorf, Germany

Paper and fullreference list at:

http://tinyurl.com/retrieval-test

Abstract

Social tagging is a widespread activity for indexing user-generatedcontent on Web services. This poster summarizes research onfolksonomies and their retrieval effectiveness. A TREC-like retrievaltest was conducted with tags and resources from the socialbookmarking system delicious, which resulted in recall and precisionvalues for tag-only searches. Moreover, several experimental tag-based databases (i.e., Power tags, Luhn tags) have been testedregarding their retrieval effectiveness. Test results show thatfolksonomies work best with short queries although recall values arehigh and precision values are low. Here, a search function “Powertags only” greatly enhances precision values.

1. Extract documents and tags

I 1,989 resources consisting of the docsonomies of thedelicious-bookmarks

I collected from delicious.com in October 2010

I tags form the indexed databases for the retrieval test runs

I each resource is tagged from at least 30 users

I each resource was tagged with “folksonomy”,“seo” or“folksonomies”

2. Adjust tags

I original tags (Information Retrieval)

I tags unified, i.e. without special characters (informationretrieval)

I tags unified and stemmed (informationretriev)

Relevance

Queries Tags

All tags

Pow

er tags

Luh

n tags

4. Create information needs

55 information needs and search tasks have been created. They vary in theircomplexity as one can see in the following examples: simple lookup, complexlookup, exploratory search task.

I Find a thesaurus

I Find articles which present social bookmarking tools

I Find articles which advise the combination of folksonomies and controlledvocabularies for indexing

5. Judge relevance of documents

I 24 students and 9 people from staff of the department act as assessors

I assessments are binary and are conducted manually

I assessors had access to resources (i.e., websites). Tags were hidden forrelevance assessments

I if two assessors agreed in their decision the relevance judgment was savedimmediately

I this procedure results in 109,395 relevance judgments

6. Create queries

I 25 students and 2 people from staff created queries for each information need→ 730 queries

I Boolean operators and brackets are allowed (they can be used in delicious)

I minimum number of query terms: 1, maximum number of query terms: 13

I average query length: 3 terms

I experts from staff built queries for each search task with a maximum number of5 query terms (simulation of real-world-users: they use 2 terms or less)

3. Extract Power tags and Luhn tags

The algorithms developed for Power tag and Luhn tag extractionwork differently and depend on the tag distribution of thedocsonomy.

0

0,2

0,4

0,6

0,8

1

1,2

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20

0,00

0,20

0,40

0,60

0,80

1,00

1,20

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20

7. Retrieval and evaluation

The figures A to F visualize the results of the retrieval test and display average recall values, average precision values and values for F-measure for conducted search runs. Figures A to C visualize the results of the expert searches, figures D to F showthe results of simple lookup-search tasks requesting one query term. Figures G and H visualize the relative benefit or loss while using Power tags and Luhn tags with expert queries (G) and one word queries (H).

Test results show that retrieval in folksonomies works best with very short queries. Here, a search function “Power tags only” greatly enhances retrieval effectiveness.

me

an a

vera

ge r

eca

ll fo

r d

atab

ase

me

an a

vera

ge r

eca

ll fo

r d

atab

ase

me

an a

vera

ge r

eca

ll fo

r d

atab

ase

me

an a

vera

ge r

eca

ll fo

r d

atab

ase

me

an a

vera

ge p

reci

sio

n f

or

dat

abas

e

me

an a

vera

ge p

reci

sio

n f

or

dat

abas

e

me

an a

vera

ge p

reci

sio

n f

or

dat

abas

e

me

an a

vera

ge p

reci

sio

n f

or

dat

abas

e

F-m

eas

ure

F-m

eas

ure

F-m

eas

ure

F-m

eas

ure

0

0,1

0,2

0,3

0,4

0,5

0,6

0,7

0,8

0,9

1

Original tags 55 information needs

Expert queries

all delicious tags restricted to Luhn tags restricted to Luhn tags and Power

tags restricted to Power tags

me

an a

vera

ge r

eca

ll fo

r d

atab

ase

me

an a

vera

ge r

eca

ll fo

r d

atab

ase

me

an a

vera

ge r

eca

ll fo

r d

atab

ase

me

an a

vera

ge r

eca

ll fo

r d

atab

ase

me

an a

vera

ge p

reci

sio

n f

or

dat

abas

e

me

an a

vera

ge p

reci

sio

n f

or

dat

abas

e

me

an a

vera

ge p

reci

sio

n f

or

dat

abas

e

me

an a

vera

ge p

reci

sio

n f

or

dat

abas

e

F-m

eas

ure

F-m

eas

ure

F-m

eas

ure

F-m

eas

ure

0

0,1

0,2

0,3

0,4

0,5

0,6

0,7

0,8

0,9

1

Unified tags 55 information needs

Expert queries

all delicious tags, unified restricted to Luhn tags, unified restricted to Luhn tags and Power

tags, unified

restricted to Power tags, unified

me

an a

vera

ge r

eca

ll fo

r d

atab

ase

me

an a

vera

ge r

eca

ll fo

r d

atab

ase

me

an a

vera

ge r

eca

ll fo

r d

atab

ase

me

an a

vera

ge r

eca

ll fo

r d

atab

ase

me

an a

vera

ge p

reci

sio

n f

or

dat

abas

e

me

an a

vera

ge p

reci

sio

n f

or

dat

abas

e

me

an a

vera

ge p

reci

sio

n f

or

dat

abas

e

me

an a

vera

ge p

reci

sio

n f

or

dat

abas

e

F-m

eas

ure

F-m

eas

ure

F-m

eas

ure

F-m

eas

ure

0

0,1

0,2

0,3

0,4

0,5

0,6

0,7

0,8

0,9

1

Unified and stemmed tags 55 information needs

Expert queries

all delicious tags, unified and stemmed

restricted to Luhn tags, unified and stemmed

restricted to Luhn tags and Power tags, unified and stemmed

restricted to Power tags, unified and stemmed

Expert queries

0%

20%

40%

60%

80%

100%

120%

140%

160%

original tags original tags

restricted to

Luhn tags

original tags

restricted to

Power tags

unified tags unified tags

restricted to

Luhn tags

unified tags

restricted to

Power tags

unified and

stemmed

tags

unified and

stemmed

tags

restricted to

Luhn tags

unified and

stemmed

tags

restricted to

Power tags

Baseline = F-measure of delicious-tags for expert queries

ave

rage

re

call

for

dat

abas

e

ave

rage

re

call

for

dat

abas

e

ave

rage

re

call

for

dat

abas

e

ave

rage

re

call

for

dat

abas

e

ave

rage

pre

cisi

on

fo

r d

atab

ase

ave

rage

pre

cisi

on

fo

r d

atab

ase

ave

rage

pre

cisi

on

fo

r d

atab

ase

ave

rage

pre

cisi

on

fo

r d

atab

ase

F-m

eas

ure

F-m

eas

ure

F-m

eas

ure

F-m

eas

ure

0

0,1

0,2

0,3

0,4

0,5

0,6

0,7

0,8

0,9

1

Original tags 5 information needs

One word query

all delicious tags restricted to Luhn tags restricted to Luhn tags and Power

tags restricted to Power tags

ave

rage

re

call

for

dat

abas

e

ave

rage

re

call

for

dat

abas

e

ave

rage

re

call

for

dat

abas

e

ave

rage

re

call

for

dat

abas

e

ave

rage

pre

cisi

on

fo

r d

atab

ase

ave

rage

pre

cisi

on

fo

r d

atab

ase

ave

rage

pre

cisi

on

fo

r d

atab

ase

ave

rage

pre

cisi

on

fo

r d

atab

ase

F-m

eas

ure

F-m

eas

ure

F-m

eas

ure

F-m

eas

ure

0

0,1

0,2

0,3

0,4

0,5

0,6

0,7

0,8

0,9

1

Unified tags 5 information needs

One word query

all delicious tags, unified restricted to Luhn tags, unified restricted to Luhn tags and Power

tags, unified

restricted to Power tags, unified

ave

rage

re

call

for

dat

abas

e

ave

rage

re

call

for

dat

abas

e

ave

rage

re

call

for

dat

abas

e

ave

rage

re

call

for

dat

abas

e

ave

rage

pre

cisi

on

fo

r d

atab

ase

ave

rage

pre

cisi

on

fo

r d

atab

ase

ave

rage

pre

cisi

on

fo

r d

atab

ase

ave

rage

pre

cisi

on

fo

r d

atab

ase

F-m

eas

ure

F-m

eas

ure

F-m

eas

ure

F-m

eas

ure

0

0,1

0,2

0,3

0,4

0,5

0,6

0,7

0,8

0,9

1

Unified and stemmed tags 5 information needs

One word query

all delicious tags, unified and stemmed

restricted to Luhn tags, unified and stemmed

restricted to Luhn tags and Power tags, unified and stemmed

restricted to Power tags, unified and stemmed

0%

20%

40%

60%

80%

100%

120%

140%

160%

original tags original tags

restricted to

Luhn tags

original tags

restricted to

Power tags

unified tags unified tags

restricted to

Luhn tags

unified tags

restricted to

Power tags

unified and

stemmed

tags

unified and

stemmed

tags

restricted to

Luhn tags

unified and

stemmed

tags

restricted to

Power tags

Baseline = F-measure of delicious-tags for one word queries

74th Annual Meeting of the American Society for Information Science and Technology, October 9-13, 2011, New Orleans, Louisiana The research is funded by the DFG (STO 764/4-1). We thank students and staff for their valuable contribution.