Constructing Folksonomies from User-Specified Relations on Flickr. Anon Plangprasopchok and Kristina Lerman (WWW 2009)

Constructing Folksonomies from User-Specified Relations on Flickr

Anon Plangprasopchok and Kristina Lerman (WWW 2009)

Motivation

[Diagram: users and Web content. Users consume, produce, annotate, organize, and discover content. The resulting annotation/metadata can be leveraged to organize, search, recommend, and categorize (hierarchical classification).]

Motivation

Goal: to induce category knowledge from social annotation produced by many users

• Metadata from an individual user may be too inaccurate and incomplete…

• Metadata from different users may complement each other and, in combination, become meaningful.

Folksonomy

• Original definition: classification emerging from the use of tags by users (Thomas Vander Wal)

• In this work: hidden classification hierarchies from annotation created by many users

Hierarchical Relations in Social Web

• Appear implicitly
– Tags: Insect, Grasshopper, Australian, Macro, Orthopteran (直翅類)

• Appear explicitly
– Folder (collection) → Sub folder (set) relations

Goal: to induce deeper hierarchies from this metadata

Outline

• Motivation

• Approaches

• Results

• Discussion

• Related work

Inducing Hierarchy from Tags: Existing approaches

• Graph based (Mika05)
– build a network of associated tags (node = tag, edge = co-occurrence of tags)
– suggests applying betweenness centrality and set theory to determine broader/narrower relations

• Hierarchical clustering (Brooks06; Heymann06+)
– tags that appear more frequently have higher centrality and are thus treated as more abstract

• Probabilistic subsumption (Sanderson99+, Schmitz06)
– x is broader than y if x subsumes y
– x subsumes y if p(x|y) > t and p(y|x) < t
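The subsumption rule above can be sketched in a few lines of Python. This is only an illustration: the function name `subsumption_relations`, the toy tag sets, and the threshold value `t=0.8` are assumptions and do not come from the slides. Probabilities are estimated from simple co-occurrence counts over tagged items.

```python
from collections import Counter
from itertools import permutations


def subsumption_relations(docs, t=0.8):
    """Return (x, y) pairs where x subsumes y, i.e. x is broader than y.

    docs is an iterable of tag collections (one per photo or item).
    The threshold t is a hypothetical default, not taken from the slides.
    """
    tag_count = Counter()
    pair_count = Counter()
    for tags in docs:
        tags = set(tags)
        tag_count.update(tags)
        # Count ordered co-occurrences so both (x, y) and (y, x) are tested.
        pair_count.update(permutations(sorted(tags), 2))

    relations = []
    for (x, y), n_xy in pair_count.items():
        p_x_given_y = n_xy / tag_count[y]
        p_y_given_x = n_xy / tag_count[x]
        if p_x_given_y > t and p_y_given_x < t:
            relations.append((x, y))
    return sorted(relations)


# Toy data (assumed): "insect" co-occurs with every "moth" and "butterfly" photo.
docs = [
    {"insect", "moth"},
    {"insect", "moth"},
    {"insect", "butterfly"},
    {"insect", "butterfly"},
    {"insect"},
]
broader_than = subsumption_relations(docs)
```

With this toy data, `insect` is reported as broader than both `moth` and `butterfly`, because every photo tagged with the narrower term also carries `insect`, but not the other way around.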

Inducing Hierarchy from Tags

• Some difficulties when using tags to induce hierarchy:

Notation: A → B (A is broader than B; hypernym relation)

Washington → United States

Car → Automobile

Insect → Hongkong

Color → Brazilian

The relations above were induced using the subsumption approach on tags [Sanderson99+, Schmitz06].

Issues illustrated: specificity vs. rarity; tags are from different facets*

Inducing Hierarchy from user-specified relations

• User-specified relations, e.g.,
– Flickr’s Collection-Set
– Delicious’ Bundle-Tag
– Bibsonomy’s Relation-Tag

• Key intuition: few people specify peculiar relations such as
– “automobile” → “car”, or
– “Washington” → “United States”

Simple Strategy

[Diagram: pipeline from Flickr collections and sets to a hierarchy]

Example: collection “The Netherlands - Holanda” and set “Blijdorp - Rotterdam”

Tokenize + stem → concept relations among terms such as netherland, holanda, blijdorp, rotterdam, countri

1. Remove “noisy” relations
– Conflict resolution
– Significance test

2. Link concepts & select path (example nodes: countri, netherland, holland, china, blijdorp, …)
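The tokenize-and-stem step can be sketched as follows. Several parts are assumptions made for illustration: `crude_stem` is a stand-in for a real stemmer (stems such as "countri" on the slide suggest a Porter-style stemmer, which this toy function does not reproduce), the stopword list is hypothetical, and pairing every collection term with every set term is an assumed reading of the diagram.

```python
import re
from itertools import product

# Hypothetical stopword list; the slides do not say which words are dropped.
STOPWORDS = {"the", "a", "an", "of", "and"}


def crude_stem(token):
    """Toy stand-in for a real stemmer: strip a plural 's' only."""
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token


def terms(name):
    """Lowercase, tokenize on letters, drop stopwords, and stem."""
    tokens = re.findall(r"[a-z]+", name.lower())
    return [crude_stem(tok) for tok in tokens if tok not in STOPWORDS]


def relations(collection_name, set_name):
    """Pair each collection term (broader) with each set term (narrower)."""
    return sorted(set(product(terms(collection_name), terms(set_name))))


rels = relations("The Netherlands - Holanda", "Blijdorp - Rotterdam")
for broader, narrower in rels:
    print(f"{broader} -> {narrower}")
```

For the example names on the slide this yields four candidate relations, from `netherland` and `holanda` to `blijdorp` and `rotterdam`, which are then passed to the noise-removal step.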

Remove noisy relations: 1st approach

• Conflict resolution (when both a → b and b → a appear)
– Relation conflicts occur because of noise
– Voting scheme: keep a → b (and discard b → a) if Nu(a→b) > 1 and Nu(a→b) > Nu(b→a)

[Diagram: insect → butterfly (10) vs. butterfly → insect (2)]
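A minimal sketch of the voting scheme, assuming that Nu(a→b) counts the distinct users who specified a → b. The function name `resolve_conflicts` is hypothetical, and relations without a reverse counterpart are kept unchanged here, which is an assumption since the slide only describes the conflicting case.

```python
def resolve_conflicts(user_counts):
    """Apply the voting rule to a dict {(a, b): number of users}.

    When both (a, b) and (b, a) exist, keep (a, b) only if it has more
    than one vote and strictly more votes than (b, a). On a tie, both
    directions are dropped.
    """
    kept = {}
    for (a, b), n_ab in user_counts.items():
        n_ba = user_counts.get((b, a))
        if n_ba is None:
            kept[(a, b)] = n_ab  # no conflict, keep as is (assumption)
        elif n_ab > 1 and n_ab > n_ba:
            kept[(a, b)] = n_ab
    return kept


# Counts from the slide's example, plus one non-conflicting relation (assumed).
counts = {
    ("insect", "butterfly"): 10,
    ("butterfly", "insect"): 2,
    ("anim", "bug"): 1,
}
clean = resolve_conflicts(counts)
```

Here `insect → butterfly` wins the vote 10 to 2, so the reverse relation is discarded.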

Remove noisy relations: 2nd approach

• Significance test
– Use a statistical significance test to decide whether a → b is significant
– Null hypothesis: the observed relation a → b was generated by chance, via the random, independent generation of the individual concepts a and b (according to the binomial distribution)
– Question: is “b” narrower than “a” by chance?

[Plot: # observations against # of a → b, split into an accept region and a reject region]
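One way to read the test is as a one-sided binomial test, sketched below. The details are assumptions rather than the authors' exact formulation: the chance probability of a → b is taken as the product of how often a appears as the broader term and how often b appears as the narrower term, and the significance level `alpha=0.05` is a hypothetical default.

```python
from math import exp, lgamma, log


def binomial_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p), assuming 0 < p < 1."""
    total = 0.0
    for i in range(k, n + 1):
        # Work in log space so large binomial coefficients do not overflow.
        log_pmf = (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
                   + i * log(p) + (n - i) * log(1 - p))
        total += exp(log_pmf)
    return min(total, 1.0)


def is_significant(n_ab, n_a, n_b, n_total, alpha=0.05):
    """Reject the null hypothesis that a -> b arose by chance.

    n_ab: observed count of a -> b
    n_a:  relations with a as the broader term
    n_b:  relations with b as the narrower term
    n_total: all observed relations
    """
    p_chance = (n_a / n_total) * (n_b / n_total)  # independence assumption
    return binomial_tail(n_ab, n_total, p_chance) < alpha


# Illustrative numbers only (not from the slides).
frequent = is_significant(n_ab=10, n_a=20, n_b=20, n_total=1000)
rare = is_significant(n_ab=1, n_a=20, n_b=20, n_total=1000)
```

With these illustrative counts, a relation seen 10 times is far above its chance expectation of 0.4 and is kept, while a relation seen once is not distinguishable from chance.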

Link concepts and select path

• Link concepts: assume that the same terms refer to the same concept.

[Diagram: relations that share the term anim (anim → bug, anim → insect) are linked into one graph]

• Select path: linking relations from many users can produce a spaghetti graph.

[Diagram: graph over anim (a), bug (b), insect (i), moth (m) with weighted edges a→b = 26, a→i = 72, a→m = 10, b→i = 1, b→m = 4, i→m = 18]

Network bottleneck idea: “the flow bottleneck is a minimum flow capacity among all relations in the path”

4 possible paths from anim to moth:
1) a→b→i→m [BN score = min(26, 1, 18) = 1]
2) a→i→m [BN score = min(72, 18) = 18]
3) a→m [BN score = min(10) = 10]
4) a→b→m [BN score = min(26, 4) = 4]
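The bottleneck scores above can be reproduced with a short sketch. The edge weights are the ones implied by the four BN scores on the slide; choosing the path with the largest bottleneck is an assumption about how the scores are used, and the function names are hypothetical. Enumerating all paths is only practical for small graphs like this one.

```python
def all_paths(graph, src, dst, path=None):
    """Yield every simple path from src to dst in a dict-of-dicts graph."""
    path = (path or []) + [src]
    if src == dst:
        yield path
        return
    for nxt in graph.get(src, {}):
        if nxt not in path:  # avoid revisiting nodes
            yield from all_paths(graph, nxt, dst, path)


def bottleneck(graph, path):
    """Minimum edge weight along the path (the BN score)."""
    return min(graph[u][v] for u, v in zip(path, path[1:]))


# Edge weights recovered from the BN scores on the slide.
graph = {
    "anim": {"bug": 26, "insect": 72, "moth": 10},
    "bug": {"insect": 1, "moth": 4},
    "insect": {"moth": 18},
}

scores = {tuple(p): bottleneck(graph, p) for p in all_paths(graph, "anim", "moth")}
best = max(scores, key=scores.get)  # assumed selection rule: largest bottleneck
```

On the slide's graph this selects anim → insect → moth, the path with BN score 18.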

Evaluation & Data Set

Contribution #2: Learning Concept Hierarchies

• Hypothesis: the approach that takes explicit relations into account can induce better hierarchies.

• “Better” means more consistent with hand-built hierarchies (ODP ver. 10/08)

• The baseline is the subsumption approach [Schmitz06]. Collection and set terms are used instead of tags to make the two approaches comparable.

Data set: data from 17 user groups devoted to wildlife and naturalist photography

21,792 of 39,922 users specify at least one collection

110,543 unique terms (cf. 166,153 unique terms in ODP), with 15,495 terms in common.

Evaluation methodology

ODP has many sub-hierarchies, so comparing all of them against the induced ones is impractical.

It is easier to compare after specifying a “root concept” and “leaf concepts”, i.e., selecting a certain subtree to compare.

[Diagram: relations (right after tokenization) → induce (remove noise + link) → induced hierarchy, compared against the reference hierarchy (ODP)]

Metrics

• Taxonomic Overlap [adapted from Maedche02+]
– measures the structural similarity between two trees
– for each node, determines how many ancestor and descendant nodes overlap with those in the reference tree

• Lexical Recall
– measures how well an approach discovers the concepts that exist in the reference hierarchy (coverage)
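Both metrics can be sketched on trees stored as child-to-parent dictionaries. This is a simplified reading, not the adapted measure used in the talk: the overlap for a node is taken as the Jaccard ratio of its ancestor and descendant sets (plus the node itself) in the two trees, averaged over shared nodes. The function names and the toy trees are assumptions.

```python
def ancestors(parent, node):
    """All nodes above `node` in a child -> parent tree."""
    found = set()
    while node in parent:
        node = parent[node]
        found.add(node)
    return found


def descendants(parent, node):
    """All nodes that have `node` among their ancestors."""
    return {n for n in parent if node in ancestors(parent, n)}


def nodes(parent):
    return set(parent) | set(parent.values())


def cotopy(parent, node):
    """The node together with its ancestors and descendants."""
    return ancestors(parent, node) | descendants(parent, node) | {node}


def taxonomic_overlap(induced, reference):
    """Average Jaccard overlap of cotopies over nodes present in both trees."""
    common = nodes(induced) & nodes(reference)
    if not common:
        return 0.0
    total = 0.0
    for n in common:
        a, b = cotopy(induced, n), cotopy(reference, n)
        total += len(a & b) / len(a | b)
    return total / len(common)


def lexical_recall(induced, reference):
    """Fraction of reference concepts that appear in the induced tree."""
    ref = nodes(reference)
    return len(nodes(induced) & ref) / len(ref)


# Toy trees (assumed), written as child -> parent.
reference = {"insect": "invertebr", "moth": "insect", "butterfli": "insect"}
induced = {"insect": "invertebr", "moth": "insect"}

overlap = taxonomic_overlap(induced, reference)
recall = lexical_recall(induced, reference)
```

In this toy case the induced tree misses one leaf, so lexical recall is 3 of 4 concepts, and the taxonomic overlap is below 1 only for the nodes whose descendant sets differ.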

Quantitative Results

• 32 root nodes were selected manually

• Taxonomic Overlap:
– 27 of them are better than those by subsumption
– 3 of them get a zero score in both approaches

• Lexical Recall:
– 28 of them are better than those by subsumption
– 2 of them get similar scores in both approaches
– for the rest, subsumption induces only the root node

• The proposed approach can induce deeper trees

The proposed approach induces hierarchies that are more consistent with ODP in almost all cases.

Sport hierarchy

Invertebrate hierarchy

Country hierarchy

Discussion

• Simple strategy to aggregate a large number of shallow relations specified by different users into a common, deeper hierarchy

• Induced hierarchies are more consistent with ODP

• Future work includes:
– Term ambiguity
– Relation types
– Global path
– Apply to other datasets

Thank You! & Questions?