
Transforming Tags to (Faceted) Tagsonomies

Marti Hearst, UC Berkeley School of Information

This Research Supported by NSF IIS-9984741.

Faceted Metadata in Search Interfaces, Marti Hearst: UC Berkeley SIMS

Focus: Search and Navigation of Large Collections

Image Collections

E-Government Sites

Example: the University of California Library Catalog

Shopping Sites

Digital Libraries

What do we want done differently?

• Organization of results
• Hints of where to go next
• Flexible ways to move around

• … How to structure the information?

Outline

• Motivation: support for browsing big collections
  – Focus on usability for a wide range of lay users

• Approach: flexible application of hierarchical faceted metadata
  – Advantages of the approach
  – Results of usability studies

• Automated Facet Creation
  – We have a nearly-automated algorithm that works well
  – I think it could greatly improve the organization of folksonomies

How to Structure Information for Search and Browsing?

• Hierarchy is too rigid

• KL-One is too complex

• Hierarchical faceted metadata:
  – A useful middle ground

What are facets?

• Sets of categories, each of which describes a different aspect of the objects in the collection.
• Each of these can be hierarchical.
• (Not necessarily mutually exclusive or exhaustive, but often that is a goal.)

Example facets: Time/Date, Topic, Role, Geo Region

Facet example: Recipes

Course: Main Course

Cooking Method: Stir-fry

Cuisine: Thai

Ingredient: Red Bell Pepper, Curry, Chicken

Example of Faceted Metadata: Categories for Biomedical Journal Articles

1. Anatomy [A]: Lung

2. Organisms [B]: Mouse

3. Diseases [C]: Cancer

4. Chemicals and Drugs [D]: Tamoxifen

Goal: assign labels from facets

Motivation

Description: 19th c. paint horse; saddle and hackamore; spurs; bandana on rider; old time cowboy hat; underchin thong; flying off.

Nature > Animal > Mammal > Horse

Occupations > Cowboy

Clothing > Hats > Cowboy Hat

Media > Engraving > Wood Eng.

Location > North America > America
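A hierarchical facet label can be stored as a path from the facet root down to a category, so that selecting a higher-level category (for example Nature > Animal) also matches items labeled deeper in that branch. The Python sketch below encodes the engraving's labels this way; the names `ENGRAVING`, `has_label`, and `matches` are illustrative assumptions, not part of the Flamenco system.

```python
from typing import FrozenSet, Iterable, Tuple

# A label is a path from a facet root (first element) down to a category.
Path = Tuple[str, ...]

# Labels for the engraving described above (hypothetical encoding).
ENGRAVING: FrozenSet[Path] = frozenset({
    ("Nature", "Animal", "Mammal", "Horse"),
    ("Occupations", "Cowboy"),
    ("Clothing", "Hats", "Cowboy Hat"),
    ("Media", "Engraving", "Wood Eng."),
    ("Location", "North America", "America"),
})


def has_label(item: FrozenSet[Path], selected: Path) -> bool:
    """True if some label of the item lies at or below the selected category."""
    return any(path[: len(selected)] == selected for path in item)


def matches(item: FrozenSet[Path], selections: Iterable[Path]) -> bool:
    """An item matches when it satisfies every selected category (AND across selections)."""
    return all(has_label(item, s) for s in selections)


if __name__ == "__main__":
    print(matches(ENGRAVING, [("Nature", "Animal"), ("Clothing", "Hats")]))  # True
    print(matches(ENGRAVING, [("Media", "Painting")]))  # False
```

Because each selection is checked independently, the same item is reached whatever order the user picks the facets in.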

By using facets, what are we not capturing?

The hat flew off; the bandana stayed on.

The thong is part of the hat.

The bandana is on the cowboy (not the horse). The saddle is on the horse (not the cowboy).

Hierarchical Faceted Metadata

• A simplification of knowledge representation

• Does not represent relationships directly

• BUT can be understood well by many people when browsing rich collections of information.

How to Put It in an Interface?

Some Challenges:

• Users don’t like new search interfaces.

• How to show lots of information without overwhelming or confusing?

A Solution (The Flamenco Project)

• Use proper HCI methods.

• Organize search results according to the faceted metadata so navigation looks similar throughout

– Easy to see where to go next and where you’ve been

– Avoids empty result sets

– Integrates seamlessly with keyword search
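One way to avoid empty result sets is to offer only those facet values that occur in the current result set, each shown with its item count; any value offered then leads to at least one item. The Python sketch below illustrates this under simplifying assumptions: flat (single-level) facet values, a hypothetical mini recipe collection, and a hypothetical function name `facet_counts`.

```python
from collections import Counter
from typing import Dict, List, Set

# Each item maps a facet name to the set of values it is labeled with.
Item = Dict[str, Set[str]]

# Hypothetical mini-collection for illustration only.
RECIPES: List[Item] = [
    {"Cuisine": {"Thai"}, "Ingredient": {"Chicken", "Curry"}},
    {"Cuisine": {"Thai"}, "Ingredient": {"Tofu"}},
    {"Cuisine": {"Italian"}, "Ingredient": {"Chicken"}},
]


def facet_counts(items: List[Item], selections: Dict[str, str], facet: str) -> Dict[str, int]:
    """Count values of `facet` among items that satisfy all current selections."""
    current = [
        item for item in items
        if all(value in item.get(f, set()) for f, value in selections.items())
    ]
    counts: Counter = Counter()
    for item in current:
        counts.update(item.get(facet, set()))
    # Only values with a nonzero count appear, so every offered link is non-empty.
    return dict(counts)


if __name__ == "__main__":
    print(facet_counts(RECIPES, {"Cuisine": "Thai"}, "Ingredient"))
```

After choosing Cuisine: Thai, the Ingredient facet would list Chicken, Curry, and Tofu with a count of 1 each, and never a value that yields zero recipes.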

Art History Images Collection

Information previews

• Use the metadata to show where to go next
  – More flexible than canned hyperlinks
  – Less complex than full search

• Help users see and return to previous steps

• Reduce mental work
  – Recognition over recall
  – Suggest alternatives

• More clicks are OK if and only if (J. Spool):
  – The “scent” of the target does not weaken
  – Users feel they are going towards, rather than away from, their target

What is Tricky About This?

• It is easy to do it poorly

• It is hard to avoid overwhelming the user
  – Most users prefer simplicity unless complexity really makes a difference
  – Small details matter

• It is hard to “make it flow”

Search Usability Design Goals

1. Strive for Consistency

2. Provide Shortcuts

3. Offer Informative Feedback

4. Design for Closure

5. Provide Simple Error Handling

6. Permit Easy Reversal of Actions

7. Support User Control

8. Reduce Short-term Memory Load

From Shneiderman, Byrd, & Croft, “Clarifying Search,” D-Lib Magazine, Jan. 1997. www.dlib.org

Usability Studies

• Usability studies done on 3 collections:
  – Recipes: 13,000 items
  – Architecture Images: 40,000 items
  – Fine Arts Images: 35,000 items

• Conclusions:
  – Users like and are successful with the dynamic faceted hierarchical metadata, especially for browsing tasks
  – Very positive results, in contrast with studies on earlier iterations

Post-Test Comparison

[Figure: bar chart comparing the Faceted and Baseline interfaces. Overall Assessment: More useful for your tasks; Easiest to use; Most flexible; More likely to result in dead ends; Helped you learn more; Overall preference. Which Interface Preferable For: Find images of roses; Find all works from a given period; Find pictures by 2 artists in same media.]

Advantages of the Approach

• Honors many of the most important usability design goals
  – User control
  – Provides context for results
  – Reduces short-term memory load
  – Allows easy reversal of actions
  – Provides consistent view

• Allows different people to add content without breaking things

• Can make use of standard technology

Advantages of the Approach

• Systematically integrates search results:
  – Reflect the structure of the info architecture
  – Retain the context of previous interactions

• Gives users control and flexibility
  – Over order of metadata use
  – Over when to navigate vs. when to search

• Allows integration with advanced methods
  – Collaborative filtering, predicting users’ preferences

Disadvantages

• Does not model relations explicitly

• Does it scale to millions of items?
  – Adaptively determine which facets to show for different combinations of items

• Requires faceted metadata!

Opportunities

• Creating hierarchical faceted categories
  – Assigning items to those categories
  – Adaptively adding new facets as data changes

• A new approach to personalization:
  – User-tailored facet combinations

• Create task-based search interfaces
  – Equate a task with a sequence of facet types

Creating Classifications from Data

• Most approaches are associational
  – AKA clustering, LSA, LDA, etc.
  – This leads to poor results when applied to text

• To derive facets, need a different angle
  – We have a simple approach based on WordNet

Clustering (The Hope)

Clustering (The Reality)

Example: Recipes (3500 docs)

Blei, Ng, & Jordan ’03 (Latent Dirichlet Allocation)

Sanderson & Croft ’99: Term Subsumption

Stoica & Hearst ’04: WordNet-based

Application to Photo Labeling

• ANLP class project, Fall ’04
  – Earlier version of code
  – Master’s students: Jeff Towle and Simon King

• Dataset: 1650 very short photo labels

• Procedure:
  – Students simply ran the code
  – Had to remove proper names
  – Re-ran the code; done!

Example Photos

“very scary x-mas tree”

“Hp presentation”

“chasing a cat in the dark”

“My cat”

• instrumentality, (112)
  – vehicle (26)
    • car (9)
    • bike (8)
    • vessel, watercraft (4)
      – mayflower (2)
      – ferry (1)
      – gig (1)
    • truck (3)
    • airplane (2)
  – device (20)
    • machine (7)
      – computer (4)
      – laptop (1)
      – sander (1)
  – container (16)
    • vessel (7)
      – bottle (5)
        » water_bottle (2)
        » jug (1)
        » pill_bottle (1)
      – bath (2)
      – bowl (1)
    • can (2)
    • backpack (1)
    • bumper (1)
    • empty (1)
    • salt_shaker (1)
  – furniture, piece of furniture, article of furniture (12)
    • seat (8)
      – bench (2)
      – chair (2)
      – couch (2)
      – lounge (1)
    • bed (4)
    • desk (1)

Associational techniques

• Pros:
  – Sometimes terms are grouped to get a general concept
    • Airline, airplane, pilots, flight

• Cons:
  – Highly unpredictable
  – Not comprehensive
    • Dollar and yen but no deutschmarks
    • Eastern but no other directions
  – Not uniform in subject matter
    • Mixing currencies with countries with timing
    • Mixing compass directions with airlines

Lexical Hierarchy-based

• Pros
  – Faceted and hierarchical
  – Consistent is-a hierarchies
  – Comprehensiveness more likely

• Cons
  – Doesn’t provide overall themes
    • Airlines, pilots, airplanes
  – Sometimes uses the wrong word sense
  – Sometimes the right term/hierarchy is not present
    • Has neither “dish type” nor “cuisine” for recipes
    • Specialized domains won’t work

Our Approach

• Leverage the structure of WordNet

[Diagram: Documents → Select terms → Get hypernym paths (WordNet) → Build tree → Compress tree]

1. Select Terms

• Select well-distributed terms from the collection (example terms: red, blue)
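The slides do not define “well distributed”. As an assumption for illustration only, the Python sketch below treats a term as well distributed when its document frequency falls between a minimum count and a maximum fraction of the collection; the thresholds `min_df` and `max_df_ratio` and the name `select_terms` are hypothetical.

```python
from collections import Counter
from typing import List


def select_terms(docs: List[str], min_df: int = 2, max_df_ratio: float = 0.5) -> List[str]:
    """Keep terms that occur in at least `min_df` documents but in no more
    than `max_df_ratio` of all documents (assumed notion of 'well distributed')."""
    df: Counter = Counter()
    for doc in docs:
        # Count each term once per document (document frequency, not term frequency).
        df.update(set(doc.lower().split()))
    upper = max_df_ratio * len(docs)
    return sorted(term for term, n in df.items() if min_df <= n <= upper)


if __name__ == "__main__":
    docs = ["red car", "blue car", "red bike", "blue bike", "car wash", "car park"]
    print(select_terms(docs))  # ['bike', 'blue', 'red']
```

Here “car” appears in 4 of 6 documents and is dropped as too common, while “wash” and “park” are dropped as too rare. A real system would presumably also keep only terms that WordNet knows, which this sketch omits.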

2. Get Hypernym Path

red: abstraction => property => visual property => color => chromatic color => red, redness

blue: abstraction => property => visual property => color => chromatic color => blue, blueness
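In WordNet a synset can have several hypernyms, so one term can have several paths (NLTK, for instance, returns them from `Synset.hypernym_paths()`). To stay self-contained, the Python sketch below replaces WordNet with a hypothetical single-parent map built from the red/blue example and walks it from a term up to the root; `HYPERNYM` and `hypernym_path` are illustrative names.

```python
from typing import Dict, List, Optional

# Hypothetical child -> parent map copied from the example above; a stand-in
# for WordNet, where a synset may in general have more than one parent.
HYPERNYM: Dict[str, Optional[str]] = {
    "red, redness": "chromatic color",
    "blue, blueness": "chromatic color",
    "chromatic color": "color",
    "color": "visual property",
    "visual property": "property",
    "property": "abstraction",
    "abstraction": None,  # root
}


def hypernym_path(synset: str) -> List[str]:
    """Return the path from the root down to `synset`."""
    path: List[str] = []
    node: Optional[str] = synset
    while node is not None:
        path.append(node)
        node = HYPERNYM[node]
    return path[::-1]  # collected leaf-first, so reverse to root-first


if __name__ == "__main__":
    print(" => ".join(hypernym_path("red, redness")))
    # abstraction => property => visual property => color => chromatic color => red, redness
```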

3. Build Tree

Merged tree for red and blue:

abstraction
  property
    visual property
      color
        chromatic color
          red, redness
          blue, blueness
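Building the tree amounts to inserting each root-first path into a trie, so that shared prefixes become shared nodes. The Python sketch below is one minimal way to do this with nested dictionaries; `PATHS`, `build_tree`, and `render` are illustrative names, and the paths are the two from the red/blue example.

```python
from typing import Dict, List

# A tree is a nested dict: label -> children.
Tree = Dict[str, "Tree"]

PATHS: List[List[str]] = [
    ["abstraction", "property", "visual property", "color", "chromatic color", "red, redness"],
    ["abstraction", "property", "visual property", "color", "chromatic color", "blue, blueness"],
]


def build_tree(paths: List[List[str]]) -> Tree:
    """Merge root-first paths into one tree; shared prefixes become shared nodes."""
    root: Tree = {}
    for path in paths:
        node = root
        for label in path:
            node = node.setdefault(label, {})
    return root


def render(tree: Tree, depth: int = 0) -> List[str]:
    """Indented text rendering, two spaces per level."""
    lines: List[str] = []
    for label, children in tree.items():
        lines.append("  " * depth + label)
        lines.extend(render(children, depth + 1))
    return lines


if __name__ == "__main__":
    print("\n".join(render(build_tree(PATHS))))
```

The printed tree has seven lines: five shared ancestors and the two leaves under “chromatic color”.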

4. Compress Tree

color
  chromatic color
    red, redness
    blue, blueness
    green, greenness

becomes

color
  chromatic color
    red
    blue
    green

4. Compress Tree (cont.)

color
  chromatic color
    red
    blue
    green

becomes

color
  red
  blue
  green
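The compression shown in the example can be approximated by three rules: drop overly general nodes from the top of the tree, shorten synonym lists such as “red, redness” to their first word, and splice out a node that is the only child of its parent. The Python sketch below applies all three in one pass; the stop list `GENERAL_ROOTS` and the exact splicing rule are hypothetical choices that reproduce this example, not necessarily the rules of the published algorithm.

```python
from typing import Dict

Tree = Dict[str, "Tree"]

# Hypothetical stop list of very general top-level nodes to remove.
GENERAL_ROOTS = {"abstraction", "property", "visual property"}


def strip_general_roots(tree: Tree) -> Tree:
    """While the tree has a single root that is on the stop list, descend past it."""
    while len(tree) == 1:
        ((root, children),) = tree.items()
        if root not in GENERAL_ROOTS:
            break
        tree = children
    return tree


def collapse(tree: Tree) -> Tree:
    """Shorten labels and splice out a lone child that has children of its own."""
    out: Tree = {}
    for label, children in tree.items():
        children = collapse(children)
        while len(children) == 1:
            ((_, grandchildren),) = children.items()
            if not grandchildren:  # keep a lone leaf: it is an actual term
                break
            children = grandchildren
        out[label.split(",")[0].strip()] = children  # "red, redness" -> "red"
    return out


TREE: Tree = {"abstraction": {"property": {"visual property": {"color": {
    "chromatic color": {"red, redness": {}, "blue, blueness": {}, "green, greenness": {}}}}}}}

if __name__ == "__main__":
    print(collapse(strip_general_roots(TREE)))
    # {'color': {'red': {}, 'blue': {}, 'green': {}}}
```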

Disambiguation

• Ambiguity in:
  – Word senses
  – Paths up the hypernym tree

Sense 1 for word “tuna”:
organism, being => plant, flora => vascular plant => succulent => cactus => tuna

Sense 2 for word “tuna”:
organism, being => fish => food fish => tuna
organism, being => fish => bony fish => spiny-finned fish => percoid fish => tuna

(Senses 1 and 2: 2 paths for the same word. Sense 2: 2 paths for the same sense.)

How to Select the Right Senses and Paths?

• First: build core tree
  – (1) Create paths for words with only one sense
  – (2) Use Domains
    • WordNet has 212 Domains: medicine, mathematics, biology, chemistry, linguistics, soccer, etc.
    • Automatically scan the collection to see which domains apply
    • The user selects which of the suggested domains to use, or may add their own
    • Paths for terms that match the selected domains are added to the core tree

• Then: add remaining terms to the core tree.
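The two-phase ordering can be sketched as follows, on a hypothetical sense inventory in which each sense carries a set of domain labels and a root-first hypernym path. Phase one keeps words with a single sense, plus words with exactly one sense in a selected domain. The slides do not say how the remaining words are attached; as an assumption, phase two picks the sense whose path shares the most nodes with the core tree. `SENSES` and `choose_paths` are illustrative names.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# A sense is (domain labels, root-first hypernym path ending in the word).
Sense = Tuple[FrozenSet[str], Tuple[str, ...]]

# Hypothetical sense inventory, loosely modeled on the examples in these slides.
SENSES: Dict[str, List[Sense]] = {
    "chicken": [(frozenset({"food"}), ("food", "meat", "poultry", "chicken"))],
    "salmon": [(frozenset(), ("organism", "animal", "fish", "food fish", "salmon"))],
    "dip": [
        (frozenset(), ("solid", "concave shape", "depression", "dip")),
        (frozenset(), ("shape", "space", "angle", "dip")),
        (frozenset({"food"}), ("food", "ingredient", "flavorer", "dip")),
    ],
    "tuna": [
        (frozenset(), ("organism", "plant", "vascular plant", "succulent", "cactus", "tuna")),
        (frozenset(), ("organism", "animal", "fish", "food fish", "tuna")),
    ],
}


def choose_paths(senses: Dict[str, List[Sense]], domains: Set[str]) -> Dict[str, Tuple[str, ...]]:
    chosen: Dict[str, Tuple[str, ...]] = {}
    # Phase 1: core tree from single-sense words and unambiguous domain matches.
    for word, options in senses.items():
        if len(options) == 1:
            chosen[word] = options[0][1]
            continue
        in_domain = [path for tags, path in options if tags & domains]
        if len(in_domain) == 1:
            chosen[word] = in_domain[0]
    # Phase 2 (assumed heuristic): attach each remaining word via the sense whose
    # path overlaps most with the core tree built in phase 1.
    core_nodes = {label for path in chosen.values() for label in path}
    for word, options in senses.items():
        if word not in chosen:
            best = max(options, key=lambda sense: len(core_nodes.intersection(sense[1])))
            chosen[word] = best[1]
    return chosen


if __name__ == "__main__":
    for word, path in choose_paths(SENSES, {"food"}).items():
        print(word, "=>", " => ".join(path))
```

With the “food” domain selected, “dip” resolves to its food sense in phase one, and “tuna” is attached in phase two through the fish path because it shares four nodes with the path for “salmon”.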

Using Domains

dip glosses:

Sense 1: A depression in an otherwise level surface

Sense 2: The angle that a magnetic needle makes with the horizon

Sense 3: Tasty mixture into which bite-size foods are dipped

dip hypernyms:

Sense 1: solid => concave shape => depression

Sense 2: shape, form => space => angle

Sense 3: food => ingredient, fixings => flavorer

Given domain “food”, choose sense 3
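A minimal way to apply a selected domain to the “dip” example, assuming for illustration that a sense belongs to a domain when the domain name appears as one of the synonyms along its hypernym path. The paths are copied from the slide; `DIP_HYPERNYMS` and `pick_sense` are hypothetical names.

```python
from typing import Dict, List, Optional

# Hypernym paths for "dip", copied from the slide (root first).
# A node may list synonyms, e.g. "shape, form".
DIP_HYPERNYMS: Dict[int, List[str]] = {
    1: ["solid", "concave shape", "depression"],
    2: ["shape, form", "space", "angle"],
    3: ["food", "ingredient, fixings", "flavorer"],
}


def pick_sense(paths: Dict[int, List[str]], domain: str) -> Optional[int]:
    """Return the single sense whose path mentions `domain`, or None if zero or several do."""
    hits = [
        sense for sense, path in paths.items()
        if any(domain in {s.strip() for s in node.split(",")} for node in path)
    ]
    return hits[0] if len(hits) == 1 else None


if __name__ == "__main__":
    print(pick_sense(DIP_HYPERNYMS, "food"))  # 3
```

Returning `None` when no sense or more than one sense matches leaves the word for a later step rather than guessing.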

Opportunities for Tagging

• New opportunity: tagging, folksonomies
  – (Flickr, del.icio.us)
  – People are creating facets in a decentralized manner
  – They are assigning multiple facets to items
  – This is done on a massive scale
  – This leads naturally to meaningful associations
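Tag co-occurrence is one simple way such associations could be surfaced: two tags are related if people often apply them to the same item. The Python sketch below counts co-occurring tags over hypothetical tagged photos; `related_tags` is an illustrative name, and this is not the method of any particular tagging site.

```python
from collections import Counter
from typing import List, Set


def related_tags(tagged_items: List[Set[str]], tag: str, k: int = 3) -> List[str]:
    """Return up to k tags that most often appear on the same items as `tag`."""
    co_counts: Counter = Counter()
    for tags in tagged_items:
        if tag in tags:
            co_counts.update(t for t in tags if t != tag)
    return [t for t, _ in co_counts.most_common(k)]


if __name__ == "__main__":
    photos = [{"cat", "pet", "kitten"}, {"cat", "pet"}, {"cat", "dark"}, {"dog", "pet"}]
    print(related_tags(photos, "cat", k=1))  # ['pet']
```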

http://www.airtightinteractive.com/projects/related_tag_browser/app/

This Doesn’t Solve Everything

• Harder to determine what’s related to more complex terms

• Still not good for finding a recipe using potatoes

Linking Metadata Into Tasks

• Old Yahoo restaurant guide combined:
  – Region
  – Topic (restaurants)
  – Related Information
    • Other attributes (cuisines)
    • Other topics related in place and time (movies)

Green: restaurants & attributes

Red: related in place & time

Yellow: geographic region

Other Possible Combinations

• Region + A&E
• City + Restaurant + Movies
• City + Weather
• City + Education: Schools
• Restaurants + Schools
• …

Creating Tasks from HFM

• Recipes Example:
  – Click Ingredient > Avocado
  – Click Dish > Salad
  – Implies the task “I want to make a Dish of type d with an Ingredient i that I have lying around”
  – Maybe users will prefer to select tasks like these over navigating through the metadata.
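The idea of equating a task with a sequence of facet types can be sketched as a template whose slots are facets; filling the slots produces ordinary facet selections. The Python below uses a hypothetical template `MAKE_DISH_WITH`, hypothetical recipes, and a hypothetical helper `run_task`.

```python
from typing import Dict, List, Sequence, Set

Item = Dict[str, Set[str]]

# Hypothetical task: "make a Dish of type d with an Ingredient i I have lying around".
MAKE_DISH_WITH = ("Ingredient", "Dish")

RECIPES: Dict[str, Item] = {
    "avocado salad": {"Ingredient": {"Avocado", "Lime"}, "Dish": {"Salad"}},
    "guacamole": {"Ingredient": {"Avocado", "Lime"}, "Dish": {"Dip"}},
    "caesar salad": {"Ingredient": {"Romaine"}, "Dish": {"Salad"}},
}


def run_task(items: Dict[str, Item], task: Sequence[str], values: Sequence[str]) -> List[str]:
    """Bind each facet slot in `task` to a value and return the matching item names."""
    if len(task) != len(values):
        raise ValueError("need exactly one value per facet in the task")
    selections = dict(zip(task, values))
    return sorted(
        name for name, item in items.items()
        if all(v in item.get(f, set()) for f, v in selections.items())
    )


if __name__ == "__main__":
    print(run_task(RECIPES, MAKE_DISH_WITH, ["Avocado", "Salad"]))  # ['avocado salad']
```

The template only fixes which facets are asked for and in what order; the query it produces is the same conjunction of facet selections a user would reach by clicking.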

Summary

• Flexible application of hierarchical faceted metadata is a proven approach for navigating large information collections.

– Midway in complexity between simple hierarchies and deep knowledge representation.

• Perhaps HFM is a good stepping stone to deeper semantic relations

– Currently in use on e-commerce sites; spreading to other domains

Opportunities

• Creating hierarchical faceted categories
  – Assigning items to those categories
  – Adaptively adding new facets as data changes

• A new approach to personalization:
  – User-tailored facet combinations

• Create task-based search interfaces
  – Equate a task with a sequence of facet types

• Make use of folksonomy data!

Acknowledgements

• Flamenco team
  – Brycen Chun
  – Ame Elliott
  – Jennifer English
  – Kevin Li
  – Rashmi Sinha
  – Emilia Stoica
  – Kirsten Swearingen
  – Ping Yee

• Thanks also to NSF (IIS-9984741)