Realizing the Full Potential of Taxonomies Content Strategy Workshops Vancouver, BC, July 12, 2013 Branka Kosovac, dotWit Consulting [email protected]


Page 1: Realizing the Full Potential of Taxonomies by Branka Kosovac

Realizing the Full Potential of Taxonomies

Content Strategy Workshops

Vancouver, BC, July 12, 2013

Branka Kosovac, dotWit Consulting

[email protected]

Page 6: Realizing the Full Potential of Taxonomies by Branka Kosovac

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <skos:Concept rdf:about="http://www.my.com/#canals">
    <skos:definition>A feature type category for places such as the Erie Canal</skos:definition>
    <skos:prefLabel>canals</skos:prefLabel>
    <skos:altLabel>canal bends</skos:altLabel>
    <skos:altLabel>canalized streams</skos:altLabel>
    <skos:altLabel>ditch mouths</skos:altLabel>
    <skos:altLabel>ditches</skos:altLabel>
    <skos:altLabel>drainage canals</skos:altLabel>
    <skos:broader rdf:resource="http://www.my.com/#hydrographic%20structures"/>
    <skos:related rdf:resource="http://www.my.com/#channels"/>
    <skos:related rdf:resource="http://www.my.com/#transportation%20features"/>
    <skos:related rdf:resource="http://www.my.com/#tunnels"/>
    <skos:scopeNote>Manmade waterway used by watercraft or for drainage, irrigation, mining, or water power</skos:scopeNote>
  </skos:Concept>
</rdf:RDF>
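A minimal sketch of how an application might read a SKOS concept like the one above, for example to expand a search for "ditches" to the preferred label "canals". It uses only Python's standard `xml.etree.ElementTree` rather than a dedicated RDF library. The XML is an abbreviated copy of the slide's record, and the function names `read_concept` and `search_terms` are hypothetical, introduced only for illustration.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the SKOS record from the slide.
SKOS_XML = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <skos:Concept rdf:about="http://www.my.com/#canals">
    <skos:prefLabel>canals</skos:prefLabel>
    <skos:altLabel>canal bends</skos:altLabel>
    <skos:altLabel>ditches</skos:altLabel>
    <skos:altLabel>drainage canals</skos:altLabel>
    <skos:broader rdf:resource="http://www.my.com/#hydrographic%20structures"/>
    <skos:related rdf:resource="http://www.my.com/#channels"/>
  </skos:Concept>
</rdf:RDF>"""

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}
# ElementTree stores namespaced attributes as "{namespace-uri}localname".
RDF_RESOURCE = "{%s}resource" % NS["rdf"]


def read_concept(xml_text):
    """Return labels and relationships of the first skos:Concept as a dict."""
    root = ET.fromstring(xml_text)
    concept = root.find("skos:Concept", NS)
    return {
        "pref_label": concept.findtext("skos:prefLabel", namespaces=NS),
        "alt_labels": [e.text for e in concept.findall("skos:altLabel", NS)],
        "broader": [e.get(RDF_RESOURCE) for e in concept.findall("skos:broader", NS)],
        "related": [e.get(RDF_RESOURCE) for e in concept.findall("skos:related", NS)],
    }


def search_terms(concept):
    """All labels a search engine could treat as equivalent for this concept."""
    return [concept["pref_label"]] + concept["alt_labels"]


canals = read_concept(SKOS_XML)
print(search_terms(canals))
```

For production use, a proper RDF parser is the safer choice, because RDF/XML allows several equivalent serializations that this element-by-element reading does not handle.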


Page 7: Realizing the Full Potential of Taxonomies by Branka Kosovac

<owl:Class rdf:ID="Wine">
  <rdfs:subClassOf rdf:resource="&food;PotableLiquid"/>
  <rdfs:subClassOf>
    <owl:Restriction>
      <owl:onProperty rdf:resource="#hasMaker" />
      <owl:cardinality rdf:datatype="&xsd;nonNegativeInteger">1</owl:cardinality>
    </owl:Restriction>
  </rdfs:subClassOf>
  <rdfs:subClassOf>
    <owl:Restriction>
      <owl:onProperty rdf:resource="#hasMaker" />
      <owl:allValuesFrom rdf:resource="#Winery" />
    </owl:Restriction>
  </rdfs:subClassOf>
  <rdfs:subClassOf>
    <owl:Restriction>
      <owl:onProperty rdf:resource="#madeFromGrape" />
      <owl:minCardinality rdf:datatype="&xsd;nonNegativeInteger">1</owl:minCardinality>
    </owl:Restriction>
  </rdfs:subClassOf>
  <rdfs:subClassOf>
    <owl:Restriction>
      <owl:onProperty rdf:resource="#hasBody" />
      <owl:cardinality rdf:datatype="&xsd;nonNegativeInteger">1</owl:cardinality>
    </owl:Restriction>
  </rdfs:subClassOf>
  <rdfs:subClassOf>
    <owl:Restriction>
      <owl:onProperty rdf:resource="#hasColor" />
      <owl:cardinality rdf:datatype="&xsd;nonNegativeInteger">1</owl:cardinality>
    </owl:Restriction>
  </rdfs:subClassOf>
  <rdfs:subClassOf>
    <owl:Restriction>
      <owl:onProperty rdf:resource="#locatedIn"/>
      <owl:someValuesFrom rdf:resource="&vin;Region"/>
    </owl:Restriction>
  </rdfs:subClassOf>
  <rdfs:label xml:lang="en">wine</rdfs:label>
  <rdfs:label xml:lang="fr">vin</rdfs:label>
</owl:Class>


Page 8: Realizing the Full Potential of Taxonomies by Branka Kosovac

Continuum from enumerations to ontologies

• Enumeration
• Classification (Scheme)
• Subject Headings
• Controlled Vocabulary
• Semantic Network
• Term Base
• Light Ontology
• Thesaurus
• Ontology
• Contextual Taxonomy
• Enterprise Taxonomy
• Business Taxonomy
• Tagging Taxonomy
• Navigation Taxonomy
• Profiling Taxonomy

Page 9: Realizing the Full Potential of Taxonomies by Branka Kosovac

Uses

• Accessing information
  – Browsing
    • Hierarchy
    • Filtering
    • Cross-navigation
  – Search
    • Full-text search
    • Advanced search
    • Faceted search
• Matching
  – Personalization/Targeting
  – Contextual advertising
  – Contextualization
  – Security
  – Content to person
  – Product to product
  – Person to person…
• Information management
  – Managing access
  – Managing display
  – Managing currency
  – …
• Integration & interoperability
• Analytics & visualization
• Mining & intelligence
• Natural language processing
• Terminology management
• eDiscovery
• …

Page 10: Realizing the Full Potential of Taxonomies by Branka Kosovac

How

• Infrastructure: Taxonomy; Schemas; Mappings; Standards
• Magic: Description/tagging, classification/filing, matching, search engine configuration…
  – Automated, manual, semi-automated
• UI: Navigation, search UI, search results, personalized/targeted/contextualized delivery…

Page 11: Realizing the Full Potential of Taxonomies by Branka Kosovac

Objects

• Documents
• Webpages
• Content components
• Digital assets
• Knowledge assets
• Marketing assets/resources
• Records
• Social content
• Products
• People profiles
• …

Scopes

• Subject domain
• Enterprise
• Intranet
• Website
• World Wide Web
• Catalogue
  – Single channel
  – Multi-channel
• Application
• …

Page 12: Realizing the Full Potential of Taxonomies by Branka Kosovac

Elements

Categories
• Descriptions
• Scope notes
• Formal definitions (for computer inference)
• Associated vocabulary (for auto-classification)

Labels
• Codes (language independent)
• Preferred
• Alternative
• Synonym rings
• Multilingual
• User-added keywords, hashtags (for social content)

Relationships
• Hierarchy: designed or organic
• Typed, named, formally defined
• Equivalence relationships
• Generic (is a kind of), partitive (is a part of), instance of (is an instance of)
• Associative (typed)
• Transitivity, reflexivity, symmetry

Page 13: Realizing the Full Potential of Taxonomies by Branka Kosovac

• Those that belong to the emperor
• Embalmed ones
• Those that are trained
• Suckling pigs
• Mermaids (or Sirens)
• Fabulous ones
• Stray dogs
• Those that are included in this classification
• Those that tremble as if they were mad
• Innumerable ones
• Those drawn with a very fine camel hair brush
• Et cetera
• Those that have just broken the flower vase
• Those that, at a distance, resemble flies

Taxonomy of Animals in Celestial Emporium of Benevolent Knowledge

From Jorge Luis Borges's essay "The Analytical Language of John Wilkins", 1942

Page 14: Realizing the Full Potential of Taxonomies by Branka Kosovac

KINGDOM | STRUCTURAL ORGANIZATION | METHOD OF NUTRITION
Monera | small, simple single prokaryotic cell (nucleus is not enclosed by a membrane); some form chains or mats | absorb food and/or photosynthesize
Protista | large, single eukaryotic cell (nucleus is enclosed by a membrane); some form chains or colonies | absorb, ingest, and/or photosynthesize food
Fungi | multicellular filamentous form with specialized eukaryotic cells | absorb food
Plantae | multicellular form with specialized eukaryotic cells; do not have their own means of locomotion | photosynthesize food
Animalia | multicellular form with specialized eukaryotic cells; have their own means of locomotion | ingest food

Definitions of Kingdom categories in the Linnaean Classification of Living Things

Page 15: Realizing the Full Potential of Taxonomies by Branka Kosovac

Linnaean Classification of Living Things: hierarchy for Homo sapiens
Images taken from: Encyclopaedia Britannica

KINGDOM | ANIMALIA | eukaryotic cells having cell membrane but lacking a cell wall, multicellular, heterotrophic
PHYLUM | CHORDATA | animals with a notochord, dorsal nerve cord, and pharyngeal gill slits, which may be vestigial
CLASS | MAMMALIA | warm-blooded vertebrates with hair and mammary glands which, in females, secrete milk to feed young
ORDER | PRIMATES | collar bone, eyes face forward, grasping hands with fingers, and two types of teeth: incisors and molars
FAMILY | HOMINIDAE | upright posture, large brain, stereoscopic vision, flat face, hands and feet have different specializations
GENUS | HOMO | s-curved spine
SPECIES | SAPIENS | high forehead, well-developed chin, skull bones thin

Other species shown in genus HOMO: HABILIS, ERECTUS

Page 16: Realizing the Full Potential of Taxonomies by Branka Kosovac

Classification theories

Aristotle’s categories
• Class definitions
• Membership based on shared characteristics: necessary and sufficient conditions
• Strong influence on Western thinking
• Not how the real world works, but what Western audiences expect

Prototype theory
• Categories based on prototypes
• Membership decided based on family resemblances
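To make the contrast between the two theories concrete, here is a small sketch. The "bird" category, its feature names, and the use of Jaccard overlap as a stand-in for family resemblance are all assumptions chosen for illustration; neither theory prescribes a particular formula.

```python
# Hypothetical features for a "bird" category; chosen only for illustration.
NECESSARY_AND_SUFFICIENT = {"has_feathers", "lays_eggs", "has_beak"}
PROTOTYPE = {"has_feathers", "lays_eggs", "has_beak", "flies", "sings", "small"}


def classical_member(features):
    """Aristotelian test: member only if every defining feature is present."""
    return NECESSARY_AND_SUFFICIENT <= features


def prototype_similarity(features):
    """Prototype test: graded membership as Jaccard overlap with the prototype."""
    return len(features & PROTOTYPE) / len(features | PROTOTYPE)


robin = {"has_feathers", "lays_eggs", "has_beak", "flies", "sings", "small"}
penguin = {"has_feathers", "lays_eggs", "has_beak", "swims"}

for name, feats in [("robin", robin), ("penguin", penguin)]:
    print(name, classical_member(feats), round(prototype_similarity(feats), 2))
```

Both animals pass the yes/no classical test, but the penguin scores lower against the prototype. That graded result is closer to how people tend to judge "typical" members of a category.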

Page 17: Realizing the Full Potential of Taxonomies by Branka Kosovac

Sometimes it’s easy

Page 18: Realizing the Full Potential of Taxonomies by Branka Kosovac

Sometimes it’s easy

• when there is a single clear distinguishing feature
• when there are well-established categories (someone of authority created them, e.g. state/province, zodiac sign, blood type, …)
• when you work at a “basic category” level
• when the collection is not too large and diverse
• when it’s single use
• when the audience is homogeneous

Example drop-down (“Select”): circle, square, triangle

Page 19: Realizing the Full Potential of Taxonomies by Branka Kosovac

Sometimes a bit less easy

Page 20: Realizing the Full Potential of Taxonomies by Branka Kosovac

Sometimes a bit less easy

Color: Blue, Red, Yellow
Shape: Circle, Square, Triangle
Size: Small, Medium, Big

But what if…

• Your technology does not support a faceted approach or polyhierarchy?
• These are physical objects:
  – Table linen you have to put into your drawer?
  – Earrings?
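The color/shape/size example can be sketched in code to show what is lost when facets are not supported. The item records and function names below are hypothetical. The second function shows the fallback: a single hierarchy that fixes one nesting order and needs one leaf per combination.

```python
from itertools import product

FACETS = {
    "color": ["blue", "red", "yellow"],
    "shape": ["circle", "square", "triangle"],
    "size": ["small", "medium", "big"],
}

# Hypothetical items, each tagged with one value per facet.
ITEMS = [
    {"id": 1, "color": "blue", "shape": "circle", "size": "small"},
    {"id": 2, "color": "red", "shape": "circle", "size": "big"},
    {"id": 3, "color": "blue", "shape": "square", "size": "small"},
]


def faceted_filter(items, **selected):
    """Keep items matching every selected facet value, in any order."""
    return [i for i in items if all(i[f] == v for f, v in selected.items())]


def single_hierarchy_paths(facets, order):
    """Enumerate the leaf paths needed if facets are nested in a fixed order."""
    return [" > ".join(combo) for combo in product(*(facets[f] for f in order))]


blue_small = faceted_filter(ITEMS, color="blue", size="small")
paths = single_hierarchy_paths(FACETS, ["color", "shape", "size"])
print([i["id"] for i in blue_small], len(paths), paths[0])
```

With three facets of three values each, the single hierarchy already needs 27 leaves, and a user who thinks "shape first" has to browse every color branch. Physical objects such as table linen face the same constraint: each item can sit in only one drawer.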

Page 21: Realizing the Full Potential of Taxonomies by Branka Kosovac

And sometimes…

Page 22: Realizing the Full Potential of Taxonomies by Branka Kosovac

When it gets complicated

• large and diverse collections

• multiple uses

• diverse user groups

• cultural differences

• cultural/political sensitivities

• no formal agreement/authoritative source

• emerging and volatile domains

• far from “basic categories”

• …

Page 23: Realizing the Full Potential of Taxonomies by Branka Kosovac

What to do then?

• There are some general (but not universal) rules
• and some tricks of the trade
• but above all: context, context, context…
  – external users vs. internal audience
  – human use vs. computer inference
  – impact of error
  – use scenarios
  – display constraints
  – supporting technology
  – costs…

Page 24: Realizing the Full Potential of Taxonomies by Branka Kosovac

Categories

• mutually exclusive

• collectively exhaustive

• clear grouping principle

• relevant grouping principle

• homogeneous peer categories

• pre-coordination vs. post-coordination

• compound concepts (“first aid” vs. “coal extraction”)
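The first two rules, mutually exclusive and collectively exhaustive, can be checked mechanically when category membership can be written as a test. The sketch below assumes exactly that; the age bands and the function name `check_categories` are hypothetical and include a deliberate overlap and a deliberate gap.

```python
def check_categories(items, categories):
    """Report items that fall in several categories or in none.

    `categories` maps a category name to a predicate over an item.
    """
    overlaps, gaps = {}, []
    for item in items:
        hits = [name for name, test in categories.items() if test(item)]
        if len(hits) > 1:
            overlaps[item] = hits      # violates mutual exclusivity
        elif not hits:
            gaps.append(item)          # violates collective exhaustiveness
    return overlaps, gaps


# Hypothetical age bands with a deliberate overlap at 18 and a gap above 64.
AGE_BANDS = {
    "child": lambda age: 0 <= age <= 18,
    "adult": lambda age: 18 <= age <= 64,
}

overlaps, gaps = check_categories([5, 18, 40, 70], AGE_BANDS)
print(overlaps, gaps)
```

Most real categories cannot be reduced to a predicate, so in practice this kind of check is usually run against a sample of already-tagged content rather than against definitions.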

Page 25: Realizing the Full Potential of Taxonomies by Branka Kosovac

Labels

• clear

• unambiguous

• informative

• brief

• suitable for audience

• consistently formatted

• grammatically parallel

• no abbreviations, jargon, concatenation
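A few of these label rules lend themselves to automated checks; clarity and audience fit still need human review. The sketch below is a rough linter under stated assumptions: a house style of sentence-case labels, a three-word limit for "brief", and all-caps tokens treated as possible abbreviations. All thresholds and names are hypothetical.

```python
import re


def lint_labels(labels, max_words=3):
    """Return (label, problem) pairs for a few mechanical label checks."""
    problems = []
    seen = set()
    for label in labels:
        key = label.casefold()
        if key in seen:
            problems.append((label, "duplicate label"))
        seen.add(key)
        if len(label.split()) > max_words:
            problems.append((label, "not brief"))
        if not label[:1].isupper():
            problems.append((label, "inconsistent capitalization"))
        if re.search(r"\b[A-Z]{2,}\b", label):
            problems.append((label, "possible abbreviation"))
    return problems


# Hypothetical labels for illustration.
LABELS = [
    "Human resources",
    "HR forms",
    "human resources",
    "Policies and procedures for staff",
]

for label, problem in lint_labels(LABELS):
    print(f"{label}: {problem}")
```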

Page 26: Realizing the Full Potential of Taxonomies by Branka Kosovac

Hierarchy

• consistent or varied depth?

• defined levels, typed relationships, or organic?

• polyhierarchy?

• lots of top level categories or deep hierarchy?

• transitive or not transitive?
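Two of these questions, polyhierarchy and transitivity, have direct consequences for how a system computes ancestors. The sketch below assumes a taxonomy stored as a mapping from each concept to its direct broader concepts; the instrument concepts and function names are hypothetical.

```python
# Hypothetical polyhierarchy: each concept maps to its direct broader concepts.
BROADER = {
    "electric guitars": ["guitars", "electric instruments"],
    "guitars": ["musical instruments"],
    "electric instruments": ["musical instruments"],
    "musical instruments": [],
}


def all_broader(concept, broader=BROADER):
    """Return every ancestor of `concept`, assuming `broader` is transitive."""
    seen = set()
    stack = list(broader.get(concept, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:          # also guards against accidental cycles
            seen.add(parent)
            stack.extend(broader.get(parent, []))
    return seen


def has_polyhierarchy(broader=BROADER):
    """True if any concept has more than one direct broader concept."""
    return any(len(parents) > 1 for parents in broader.values())


print(sorted(all_broader("electric guitars")), has_polyhierarchy())
```

If the hierarchy is declared non-transitive, only the direct parents in `BROADER` may be used, so a search on "musical instruments" would not automatically retrieve content tagged "electric guitars".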

Page 27: Realizing the Full Potential of Taxonomies by Branka Kosovac

Overall structure

• logical
• consistent
• well-balanced
• extensible
• fit for purpose (scenarios, business goals…)
• ordering logical and consistent
• top levels convey the scope
• no single-child categories
• no Other/Miscellaneous/General
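The last two rules, no single-child categories and no Other/Miscellaneous/General, are simple enough to check automatically. The sketch below assumes the taxonomy is held as a parent-to-children mapping; the sample tree and the function name `lint_structure` are hypothetical.

```python
CATCH_ALL_LABELS = {"other", "miscellaneous", "general"}

# Hypothetical taxonomy as a parent -> children mapping.
TREE = {
    "Products": ["Furniture", "Lighting", "Other"],
    "Furniture": ["Tables"],
    "Lighting": ["Lamps", "Bulbs"],
}


def lint_structure(tree):
    """Flag single-child categories and catch-all labels."""
    issues = []
    for parent, children in tree.items():
        if len(children) == 1:
            issues.append(f"single child: {parent} > {children[0]}")
        for child in children:
            if child.lower() in CATCH_ALL_LABELS:
                issues.append(f"catch-all label: {parent} > {child}")
    return issues


for issue in lint_structure(TREE):
    print(issue)
```

Qualities such as "logical", "well-balanced", and "fit for purpose" are judgment calls that a script like this cannot assess.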

Page 28: Realizing the Full Potential of Taxonomies by Branka Kosovac

Some techniques

• Standardize, but not more than necessary

• Consensus vs. mapping vs. standardized core and general rules

• Derivative local taxonomies—mix & match

• Scoped labels and/or relationships

• If future use is not known, follow general rules; define and document as much as possible

Page 29: Realizing the Full Potential of Taxonomies by Branka Kosovac

How to begin

• make sure you know what your taxonomy needs to do, now and in the future
  – user research, business requirements, vision, scenarios
• make sure you know all the constraints
  – tools, costs (including long-term maintenance), available expertise, organizational culture…
• promote and obtain high-level management support
• gather sources:
  – user warrant (search logs, social content, user research/feedback logs)
  – content warrant (your content, global content, your competitors’…)
  – existing metadata, folksonomies, glossaries, formal or informal taxonomies…
  – publicly available taxonomies: reuse, adapt, start from scratch (e.g. Linked Data, Taxonomy Warehouse)
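As one illustration of gathering user warrant, the sketch below counts normalized queries from a search log and keeps the frequent ones as candidate terms. The log entries are invented, and the normalization (lowercasing and collapsing whitespace) and the `min_count` threshold are assumptions; a real log would be read from a file and would likely need stemming or spelling clean-up.

```python
from collections import Counter

# Hypothetical search-log extract; a real log would be read from a file.
SEARCH_LOG = [
    "Vacation policy", "vacation policy", "holiday request",
    "expense report", "Expense Report ", "vacation policy", "parking",
]


def candidate_terms(queries, min_count=2):
    """Count normalized queries and keep those seen at least `min_count` times."""
    counts = Counter(" ".join(q.lower().split()) for q in queries)
    return [(term, n) for term, n in counts.most_common() if n >= min_count]


print(candidate_terms(SEARCH_LOG))
```

Here "vacation policy" and "holiday request" may well name the same concept. Deciding which becomes the preferred label and which an alternative label is a design decision the counts can inform but not make.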

Page 30: Realizing the Full Potential of Taxonomies by Branka Kosovac

How to develop

• Combination of:
  – Top down (domain modelling)
  – Bottom up (terminology clustering, open card sort)
• Design & Strategy
  – Metadata element set, associated facets/branches
  – Category/term properties, relationship types, hierarchy levels…
  – Sustainable maintenance strategy
  – Metrics
  – Roadmap
• Development
  – Know where to stop
• Validation & Testing
  – Throughout development and beyond

Page 31: Realizing the Full Potential of Taxonomies by Branka Kosovac

How to complete

• Documentation
  – Scope
  – Design
  – Maintenance guidelines
  – Implementation guidance
  – Use guidelines
• Deployment
  – Work with developers, UX designers, taggers and don’t give up until properly implemented
• Governance
  – Roles and responsibilities
  – Procedures

Page 32: Realizing the Full Potential of Taxonomies by Branka Kosovac

Exercises

• Exercise groups/topics

• Exercise tasks

– Describe vision (add context details as needed)

– Develop domain model

– High-level taxonomy design and strategy

– Develop key facet

– Record your considerations, sources, thought process