Building Taxonomies, Part 3
Alice Redmond-Neal, Access Innovations, Inc.
Enterprise Search Summit, New York City, May 21, 2006
Copyright © 2006 Access Innovations, Inc.


Build a taxonomy – simple steps

• Get paper and pencil
  – Sharpen pencil
• Define subject field
• Collect terms
• Organize terms
• Fill in gaps
• Flesh out and interrelate terms

You’re done!

Define subject field

• Review representative collection of content
• Determine:
  – Core areas
  – Peripheral topics
• Scope can be modified later

[Diagram labels: Psychology, Education, Sociology, Law]

Before you go on: Build or buy?

• Survey existing thesaurus/taxonomy resources for your domain
• Test for
  – Scope
  – Depth
    • Make-or-break terms
  – Cost

Don’t reinvent the wheel!

Collect terms

• Your documents and databases
• Departmental terminology
• Textbooks and their indexes (indices)
• Book tables of contents and indexes
• Journal quarterly indexes
• Encyclopediae
• Lexicons, glossaries on the topic
• Web resources
• Users and experts
• Search logs

Gather terms from search logs

Beyond the Spider: The Accidental Thesaurus (Richard Wiggins, Information Today, Oct 2002)

• Top ~100 search terms from search logs
• Match to web site with appropriate answer
• Basis for favorites or best bets, presented at the top of results list (AKA behavior-based taxonomy)

Not a thesaurus or taxonomy, but still a useful source of terms.
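
The search-log step can be sketched in Python. This is only an illustration, assuming the log is already available as a list of raw query strings; the function name `top_search_terms` and the sample queries are hypothetical and do not come from the Wiggins article.

```python
from collections import Counter

def top_search_terms(queries, n=100):
    """Return the n most frequent queries after light normalization."""
    # Assumption: lowercasing and collapsing whitespace is enough to merge variants.
    normalized = (" ".join(q.lower().split()) for q in queries)
    counts = Counter(q for q in normalized if q)
    return counts.most_common(n)

# Hypothetical log lines, for illustration only.
sample_log = [
    "Parking permit", "parking  permit", "library hours",
    "Library Hours", "parking permit", "tuition",
]
candidates = top_search_terms(sample_log, n=2)
print(candidates)  # [('parking permit', 3), ('library hours', 2)]
```

Each high-frequency query is a candidate term (or a candidate NonPreferred entry point) to review by hand; the counts only rank candidates, they do not decide where a term belongs.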

Organize terms – roughly

• Sort terms into several major categories – logical groups of similar concepts as Top Terms
  – Identify core areas and peripheral topics
  – 10 – 20 to start
  – Consider moving proper names to authority files
• Result: loose collection of terms under several main headings
  – Rough and tentative – see how it fits as you go
  – Initial gap analysis
  – Add / modify / delete as needed

Labelling a concept – cognitive linguistics

• Most-used labels are middle in range from abstract to specific (relates to search)
• Linguistic universal – true across cultures
  – Unique beginner
  – Life form
  – Generic
  – Specific
  – Varietal

Example: Insurance → Health insurance → Group health insurance

Practical application?

Craft the Top Terms

• Toughest job and most important step!
• Dictates further organization
• Determines how browsers/searchers perceive the taxonomy
  – Coverage
  – Formality
• Establish the concept first, tweak the wording later

The term record

• Main Term (MT) = subject term, heading, node, category, descriptor, class (see Lexicographer’s lexicon)
• Top Term (TT)
• Broader Terms (BT)
• Narrower Terms (NT)
• Related Terms (RT)
  – See also (SA)
• Scope Note (SN)
• History (H)
• NonPreferred Term (NP)
  – Used for (UF), See (S)

[Diagram labels: TAXONOMY, THESAURUS]
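
One way to picture the term record is as a small data structure. The sketch below is an assumption-laden illustration, not a format prescribed by the slides or by any standard: the class name `TermRecord`, its field names, and the display order are all hypothetical. The sample values reuse the elections example from this deck.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class TermRecord:
    """One thesaurus term record; field names are illustrative."""
    main_term: str                                      # MT
    top_term: str = ""                                  # TT
    broader: List[str] = field(default_factory=list)    # BT
    narrower: List[str] = field(default_factory=list)   # NT
    related: List[str] = field(default_factory=list)    # RT
    scope_note: str = ""                                # SN
    history: str = ""                                   # H
    used_for: List[str] = field(default_factory=list)   # UF (NonPreferred Terms)

    def display(self) -> str:
        """Render the record as tagged lines, skipping empty fields."""
        lines = [self.main_term]
        if self.scope_note:
            lines.append(f"  SN {self.scope_note}")
        if self.top_term:
            lines.append(f"  TT {self.top_term}")
        for tag, values in (("UF", self.used_for), ("BT", self.broader),
                            ("NT", self.narrower), ("RT", self.related)):
            lines.extend(f"  {tag} {value}" for value in values)
        if self.history:
            lines.append(f"  H {self.history}")
        return "\n".join(lines)

record = TermRecord(
    main_term="Elections",
    top_term="Politics",
    broader=["Politics"],
    narrower=["Gubernatorial elections", "Mayoral elections", "Presidential elections"],
)
print(record.display())
```

A taxonomy-only record would fill in just the hierarchical fields; the remaining fields come into play when the vocabulary is developed into a thesaurus.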

Usefulness of a term – the “duh” factor

• Some terms are so basic for a domain that they have little or no value
  – “Sports” in Sports Illustrated
  – “Technology” in Technology Review
  – “Golf” in Golf Magazine
• How useful will the term be for indexing?
  – Apply to everything in the domain?
  – Distinguish important concepts?
  – If term is needed, specify limited use conditions in Scope Note

Hierarchy structures – variations on a theme

• Not pre-determined
  – Wines → type → variety → region → cost
  – Or Wines → cost → type …
• Varies by user group and needs
  – May have multiple views of same content
  – Standard alpha view or customized notation
• Affects information architecture, i.e. how web site functions
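
The idea of multiple views of the same content can be sketched as follows. This is a minimal illustration, assuming each item carries its attributes as plain key/value pairs; `build_view`, the `"items"` key, and the wine records are hypothetical.

```python
def build_view(records, levels):
    """Nest records under the given attribute order, outermost level first."""
    tree = {}
    for record in records:
        node = tree
        for level in levels:
            node = node.setdefault(record[level], {})
        # Leaf entries are collected under a reserved "items" key.
        node.setdefault("items", []).append(record["name"])
    return tree

# Hypothetical records, for illustration only.
WINES = [
    {"name": "Wine A", "type": "Red", "variety": "Merlot", "region": "Napa", "cost": "Under $20"},
    {"name": "Wine B", "type": "Red", "variety": "Pinot noir", "region": "Oregon", "cost": "Over $20"},
    {"name": "Wine C", "type": "White", "variety": "Chardonnay", "region": "Napa", "cost": "Under $20"},
]

by_type = build_view(WINES, ["type", "variety", "region", "cost"])
by_cost = build_view(WINES, ["cost", "type"])
print(sorted(by_cost))  # ['Over $20', 'Under $20']
```

The records never change; only the order of `levels` does, which is why the choice of hierarchy can follow the user group rather than the content.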

How do terms relate?

• Hierarchical relationships: parents and their children
• Equivalence relationships: aliases
• Associative relationships: cousins

[Diagram labels: TAXONOMY, THESAURUS]

Hierarchical relationships

• Broader Term represents the category
• Narrower Term represents the specific
• Three types:
  – Generic relationship (BTG/NTG)
  – Whole-part relationship (BTP/NTP)
  – Instance relationship (BTI/NTI)
• BTs/NTs have a reciprocal relationship
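
Reciprocity is easiest to keep when both directions are written in one step. The following is a sketch under that assumption; the `Hierarchy` class and its method names are hypothetical, and the terms come from the elections example in this deck.

```python
from collections import defaultdict

class Hierarchy:
    """Stores BT and NT links as a matched pair."""

    def __init__(self):
        self.broader = defaultdict(set)   # term -> its Broader Terms
        self.narrower = defaultdict(set)  # term -> its Narrower Terms

    def add(self, broader_term, narrower_term):
        """Record both directions together so a link is never one-sided."""
        self.narrower[broader_term].add(narrower_term)
        self.broader[narrower_term].add(broader_term)

    def is_reciprocal(self):
        """Check that every NT link has a matching BT link and vice versa."""
        down = {(bt, nt) for bt, nts in self.narrower.items() for nt in nts}
        up = {(bt, nt) for nt, bts in self.broader.items() for bt in bts}
        return down == up

h = Hierarchy()
h.add("Politics", "Elections")
h.add("Elections", "Presidential elections")
print(h.is_reciprocal())  # True
```

The same check is useful as an audit when a vocabulary has been edited by hand or imported from a file where BT and NT were entered separately.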

Broader to Narrower Terms

Politics
  Elections
    Gubernatorial elections
    Presidential elections
    Mayoral elections

[Diagram level labels: Generic, Specific, Varietal]

Hierarchy – Generic (genus-species) relationship

• Inheritance or inclusion – what’s true of the parent (BT) is true for all children (NTs)
• Applies to entities, actions, properties, agents – not just biological taxonomies

Value
  Cultural value
  Economic value
  Moral value
  Social value

Teachers
  Adult educators
  School teachers
  Special ed teachers
  Student teachers

Thinking
  Contemplation
  Divergent thinking
  Lateral thinking
  Reasoning

Generic relationship test – 1

• Both terms in same fundamental category
• “All-and-some” test
  – Rodents / Squirrels: SOME rodents are squirrels; ALL squirrels are rodents
  – Pests / Squirrels: SOME pests are squirrels; NOT ALL squirrels are pests

Generic relationship test – 2

[Diagram labels: Pests, Squirrels, Rodents]

ALL squirrels are rodents
x NOT ALL squirrels are pests
x NOT ALL pests are rodents
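
The all-and-some test maps neatly onto set membership. The sketch below assumes each concept can be represented by a set of example members; the function name and the member lists are hypothetical and chosen only to mirror the squirrel example.

```python
def passes_all_and_some(narrower, broader):
    """ALL members of `narrower` are in `broader`, and only SOME of `broader` are in `narrower`."""
    # A non-empty proper subset satisfies both halves of the test.
    return bool(narrower) and narrower < broader

# Hypothetical example members, for illustration only.
RODENTS = {"gray squirrel", "red squirrel", "house mouse", "brown rat"}
SQUIRRELS = {"gray squirrel", "red squirrel"}
PESTS = {"gray squirrel", "house mouse", "brown rat", "cockroach"}

print(passes_all_and_some(SQUIRRELS, RODENTS))  # True
print(passes_all_and_some(SQUIRRELS, PESTS))    # False
print(passes_all_and_some(PESTS, RODENTS))      # False
```

Only the first pair qualifies for a generic BT/NT link; the other two pairs overlap without one containing the other, so the test rules out a hierarchical link between them.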

Hierarchy – Whole-part relationship

• Also known as meronymy or partonomy
• Four types allowed in thesaurus standards
  – Body systems and organs
    • Ear → Middle ear
  – Geographical locations
    • Bernalillo County → Albuquerque
  – Fields of study
    • Geology → Physical geology
  – Hierarchical organizational/corporate/social/political structures
    • Diocese → Parish

Hierarchy – Instance relationship

• General category (common noun) = BT
• Individual example (proper noun) = NT

Seas
  Baltic Sea
  Caspian Sea
  Mediterranean Sea

New York museums
  Guggenheim Museum
  Museum of Modern Art
  Museum of Natural History

Essentially identical to “final node” in taxonomies.
Best practice: move long lists to an authority file.

Polyhierarchical relationship

• Term can logically fit under more than one Broader Term – can have Multiple Broader Terms (MBT)
• New to ANSI/NISO standards

Spoons
  Sporks
Forks
  Sporks

Nurses
  Nurse administrators
Health administrators
  Nurse administrators

Finance
  Accounting
Careers
  Accounting
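
With Multiple Broader Terms the structure is no longer a strict tree, so finding everything above a term means following every parent. A minimal sketch, assuming the links are held in a plain dictionary; `all_broader_terms` is a hypothetical helper, and the "Flatware" level is invented here purely to show a second step up.

```python
# term -> list of its Broader Terms (a term may have more than one)
BROADER = {
    "Sporks": ["Spoons", "Forks"],
    "Nurse administrators": ["Nurses", "Health administrators"],
    "Accounting": ["Finance", "Careers"],
    # Hypothetical extra level, not on the slide.
    "Spoons": ["Flatware"],
    "Forks": ["Flatware"],
}

def all_broader_terms(term, broader=BROADER):
    """Collect every ancestor of `term`, following each of its Broader Terms."""
    seen = set()
    stack = list(broader.get(term, []))
    while stack:
        current = stack.pop()
        if current not in seen:  # shared ancestors are visited only once
            seen.add(current)
            stack.extend(broader.get(current, []))
    return seen

print(sorted(all_broader_terms("Sporks")))  # ['Flatware', 'Forks', 'Spoons']
```

The `seen` set matters because two parents can share an ancestor; without it the shared ancestor would be processed once per path.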

Equivalence relationship

• Preferred Term
  – Thesaurus term and valid for indexing
  – Thesaurus notation: USE
• NonPreferred Term
  – Not valid for indexing
  – An alias or imposter
  – Entry point, directs user to Preferred Term
  – Thesaurus notation: UF or NPT

Spiders
  UF Arachnids

Plant pathology
  USE Phytopathology
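
The USE/UF pair behaves like a lookup table with a reverse index. The sketch below assumes equivalences are stored as (NonPreferred, Preferred) pairs; `resolve` and `used_for` are hypothetical helper names, and the two pairs are the examples above.

```python
# (NonPreferred Term, Preferred Term) pairs
EQUIVALENCES = [
    ("Arachnids", "Spiders"),
    ("Plant pathology", "Phytopathology"),
]

use_index = {nonpreferred.lower(): preferred for nonpreferred, preferred in EQUIVALENCES}
preferred_index = {preferred.lower(): preferred for _, preferred in EQUIVALENCES}

def resolve(entry):
    """Map any entry point to its Preferred Term, or None if it is unknown."""
    key = " ".join(entry.lower().split())
    return use_index.get(key) or preferred_index.get(key)

def used_for(preferred):
    """List the NonPreferred Terms that point to a Preferred Term (its UF entries)."""
    return [nonpreferred for nonpreferred, target in EQUIVALENCES if target == preferred]

print(resolve("arachnids"))        # Spiders
print(resolve("Plant pathology"))  # Phytopathology
print(used_for("Spiders"))         # ['Arachnids']
```

Only the value returned by `resolve` would be stored on an indexed document; the NonPreferred Term serves purely as an entry point.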

Equivalence – when to use

• Synonyms, slang, quasi-synonyms
• Scientific and trade names
  – Ibuprofen UF Motrin™
• Lexical variants
  – Fiber optics UF Fibre optics
  – Mouse UF Mice
• Upward posting of narrow concepts not specified in taxonomy or thesaurus
  – Social class UF Elite, Middle class, Working class

Get equivalent terms from search logs, brainstorming…

Associative relationship

• Related Terms (RTs) ~ cousins
• “…terms related conceptually but not hierarchically, and are not part of an equivalence set” (i.e. not synonyms)
  – Should siblings be Related Terms??
• Both terms are valid thesaurus terms for indexing, and have reciprocal relationship
• Expands user’s awareness, reflects thesaurus coverage of unanticipated areas
• Standards describe specific types (see Lexicon)

Sibling rivalry and facets

• Format and sense of sibling terms should be consistent
• If siblings don’t coexist well, separate them
• Subdivide large groups of terms into facets, mutually exclusive subcategories
• Growing demand with faceted navigation
• Facet examples
  – Properties, Materials, Agents, Actions, Influence
  – Objects, Styles and periods, Color, Shape (Art & Architecture Thesaurus)

Faceted classification

• Pharmaceuticals
  – (by action)
    • Anti-inflammatory agents…
  – (by chemical structure)
    • Alkaloids…
  – (by indication)
    • Pain…
  – (by use)
    • Immunosuppression…

Facet indicators (aka Node labels), not to be used for indexing

Faceting challenge

• Paint
  – Oil paint
  – High-gloss paint
  – Interior paint
  – Matte paint
  – Latex paint
  – Semi-gloss paint
  – Exterior paint

Propose facet indicators and subgroup these paint varieties into facets.

Do you agree?

• Paint
  – (by type)
    • Oil paint
    • Latex paint
  – (by use)
    • Interior paint
    • Exterior paint
  – (by surface)
    • High-gloss paint
    • Matte paint
    • Semi-gloss paint
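
The paint grouping can be written down as data, which makes two of the earlier points checkable: facets should be mutually exclusive, and facet indicators are not themselves indexing terms. This is a sketch only; the dictionary layout and both function names are assumptions.

```python
# Facet indicator (node label) -> terms grouped under it
PAINT_FACETS = {
    "(by type)": ["Oil paint", "Latex paint"],
    "(by use)": ["Interior paint", "Exterior paint"],
    "(by surface)": ["High-gloss paint", "Matte paint", "Semi-gloss paint"],
}

def indexable_terms(facets):
    """Terms valid for indexing: facet members only, never the node labels."""
    return sorted(term for members in facets.values() for term in members)

def facets_are_exclusive(facets):
    """True when no term appears under more than one facet."""
    all_terms = [term for members in facets.values() for term in members]
    return len(all_terms) == len(set(all_terms))

print(facets_are_exclusive(PAINT_FACETS))  # True
print(len(indexable_terms(PAINT_FACETS)))  # 7
```

A single can of paint can still be indexed with one term from each facet (for example an exterior, semi-gloss, latex paint); exclusivity applies to where each term sits, not to how many facets a document may draw on.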

Scope Notes (SN)

• Indicate meaning of the term in the context of this thesaurus, for this audience
  – Stress – Metal, Psychological, Physiological
• Indicate any restriction in meaning
• Indicate range of topics covered
• Provide direction for indexers; for terms often confused, may suggest an alternative term
• Use only as needed – not for every term
• Establish and stick with consistent format
• Be concise