Taxonomy
Enterprise Content Management (ECM)
AIIM ECM Certificate programme
ECM Strategy
ECM Practitioner
ECM Specialist
Case Study
© AIIM | All rights reserved
ECM Practitioner Course Outline
Foundations, Tools & Instruments, Futures
1. Introduction
2. Technologies & Functionality
3. Information Architecture
4. Create & Capture
5. Metadata
6. Taxonomy
7. Security & Control
8. Process & Automation
9. Findability
10. Delivery & Presentation
11. Trends & Directions
Agenda
Defining taxonomies and classification
Subject-based classification
– Taxonomies
– Folksonomies
– Ontologies
– Thesaurus and Semantic networks
Business case for classification
Standards and guidelines
Classification challenges
Defining taxonomy (1)
Taxonomy is the science of classifying information
A taxonomy is a law for classifying information
Taxonomies are nearly ubiquitous, but poorly understood
Source: Dictionary.com
Defining taxonomy (2)
“In recent years, the business world has fallen in love with the term ‘taxonomies’. We use it specifically to refer to a hierarchical arrangement of categories within the user interface of a website or intranet.”
Source: Information Architecture for the World Wide Web (Louis Rosenfeld and Peter Morville, 2002)
AIIM website
[Figure: AIIM website, with four numbered callouts]
Understanding taxonomies
A taxonomy is a classification scheme
– Such as the way that an individual classifies the content of their e-mail inbox, a personal CD collection, or the contents of an iPod
A taxonomy is a knowledge map
– Reflects how its owner conceives a given body of content (a knowledge domain), for purposes of browsing, navigating, discovering, and sharing that information
A taxonomy is semantic
– Indicating the relationships between concepts, such as the relationship between a car and a steering wheel, in that the steering wheel is a “part of” a car
Source: Organising Knowledge (Patrick Lambe, 2007)
Category perspectives
Business function
Geo-political
Company focus vs. industry focus
Product or service
Business issues, conditions, events
Type/Source of content
Representations of taxonomies (1)
Lists
Trees
Hierarchies
Polyhierarchies
Matrices
Facets
System Maps
Source: Organising Knowledge (Patrick Lambe, 2007)
Representations of taxonomies (2)
Lists
Simple collection of related things. The relationship is defined by the purpose of the list.
Good when the domain is simple and the amount of content is small. Basic building blocks of all other taxonomical representations.
Examples: Country codes, types of diseases
Source: Organising Knowledge (Patrick Lambe, 2007)
Source: Wikipedia
Representations of taxonomies (3)
Trees
Represents a transition from general to more specific relationships, or from whole to part.
Good when a list gets to be too long, and “naturally” breaks into subcategories.
Examples: Yellow pages (phone directories)
Source: Organising Knowledge (Patrick Lambe, 2007)
Source: CoreFiling.com
Representations of taxonomies (4)
Hierarchies
A specific tree structure that has inclusiveness and consistency, and maintains the same “type” of relationship at each level. The “child” inherits all of the characteristics of the “parent”, and each child can only belong in one place in the taxonomy.
Works best with mature, formal, logical schemes
Examples: Military rank, biological classification, family genealogy
Source: Organising Knowledge (Patrick Lambe, 2007)
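The single-parent rule of a hierarchy can be sketched in a few lines of Python. This is a minimal illustration only: the class name `Hierarchy`, its methods, and the sample rank names are assumptions made for the example, not part of the course material.

```python
# Hypothetical sketch of a strict hierarchy: every node has exactly one parent.
class Hierarchy:
    def __init__(self):
        self.parent = {}  # child -> its single parent

    def add(self, child, parent):
        # A strict hierarchy forbids filing the same child in two places.
        if child in self.parent and self.parent[child] != parent:
            raise ValueError(f"{child!r} already belongs under {self.parent[child]!r}")
        self.parent[child] = parent

    def ancestors(self, node):
        # Walk upwards; the child inherits from every node on this path.
        path = []
        while node in self.parent:
            node = self.parent[node]
            path.append(node)
        return path


ranks = Hierarchy()
ranks.add("Officer", "Military rank")   # illustrative data only
ranks.add("Captain", "Officer")
print(ranks.ancestors("Captain"))
```

Trying to add "Captain" under a second parent raises an error, which is the "each child can only belong in one place" constraint in practice.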
Representations of taxonomies (5)
Polyhierarchies
Used when an item belongs in more than one place in the real world, and multiple organising principles are required. Provides “virtual linking” between hierarchies.
Example: a single collection of content concerning diseases can be organised/taxonomised via affected body part and causes
Source: Organising Knowledge (Patrick Lambe, 2007)
Source: Rosenfeld, Morville (2006)
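A polyhierarchy differs from a strict hierarchy only in that one item may have several parents. The sketch below assumes a simple in-memory mapping; the function name `file_under` and the disease and category labels are hypothetical and chosen to echo the example above.

```python
from collections import defaultdict

# Hypothetical sketch: a polyhierarchy lets one item sit under several parents.
parents = defaultdict(set)   # item -> all of its parent categories
children = defaultdict(set)  # category -> all items filed under it


def file_under(item, category):
    # Record the link in both directions so either view can be browsed.
    parents[item].add(category)
    children[category].add(item)


# Illustrative data only: one disease organised by body part and by cause.
file_under("Viral hepatitis", "Body part: Liver")
file_under("Viral hepatitis", "Cause: Virus")
file_under("Influenza", "Cause: Virus")

print(sorted(parents["Viral hepatitis"]))
print(sorted(children["Cause: Virus"]))
```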
Representations of taxonomies (6)
Matrices
Provides a 2- or 3-dimensional cross-linking of taxonomies, and an ability to provide differing views into the same body of content.
Example: The same content could be located based on project manager, project initiation, and/or affected standards
Source: Organising Knowledge (Patrick Lambe, 2007)
Representations of taxonomies (6)
Facets
A multi-dimensional taxonomy comprised of multiple tags, each tag representing an individual taxonomy; the content is thus categorised in multiple ways within a single interface.
Example: selecting wines based on characteristics such as type, price, varietals, regions, and appellations.
Source: Organising Knowledge (Patrick Lambe, 2007)
Source: wine.com
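Faceted selection can be illustrated as filtering on several independent attributes at once. The catalogue, the facet names and the helper functions below are all hypothetical; they are a sketch of the idea, not a description of how any particular site implements it.

```python
# Hypothetical catalogue; facet names loosely follow the wine example above.
wines = [
    {"name": "Wine A", "type": "red", "region": "Rhône", "price_band": "under 20"},
    {"name": "Wine B", "type": "white", "region": "Loire", "price_band": "under 20"},
    {"name": "Wine C", "type": "red", "region": "Loire", "price_band": "20 to 50"},
]


def filter_by_facets(items, **selected):
    # Keep only the items whose value matches every selected facet.
    return [item for item in items
            if all(item.get(facet) == value for facet, value in selected.items())]


def facet_counts(items, facet):
    # Count how many items fall under each value of one facet.
    counts = {}
    for item in items:
        counts[item[facet]] = counts.get(item[facet], 0) + 1
    return counts


reds = filter_by_facets(wines, type="red")
print([w["name"] for w in reds])
print(facet_counts(reds, "region"))
```

Each facet narrows the same body of content independently, so users can start from whichever characteristic matters most to them.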
Representations of taxonomies (7)
System maps
Visual representations of a domain of knowledge
Labelled to represent taxonomy categories
Example: A collection of medical content relating to the human nervous system is accessible via a diagram of the human body. Each component of that system is illustrated in context, and labelled appropriately.
Source: Organising Knowledge (Patrick Lambe, 2007)
Defining classification
Classification: “The systematic identification and arrangement of business activities and/or records into categories according to logically structured conventions, methods and procedural rules represented in a classification system”
Source: ISO 15489
What is classification?
In simple terms, it’s just grouping information together
Common examples of classification:
– Cars: by make, model, performance
– Food: tinned/fresh, type (meat, vegetable, grain)
– TV programmes: comedy, thriller, quiz show
– Clothes: adult/child, expensive/cheap, winter/summer
Dewey Decimal system
Used to classify information throughout the western world
Very Euro-centric
000 General & Bibliography
100 Philosophy & Psychology
200 Religion
300 Social Science
400 Languages & Linguistics
500 Sciences
600 Technology
700 Arts & Recreation
800 Literature
900 Geography & History
Chinese library classification
43,600 categories. Constantly expanding to meet the needs of a rapidly changing nation
Political considerations drive some organisation
1) Marxism, Leninism, Maoism & Deng Xiaoping Theory
2) Philosophy and Religion
3) Social Sciences
4) Politics and Law
5) Military Science
6) Economics
7) Culture, Science, Education, and Sports
8) Languages and Linguistics
9) Literature
10) Art
11) History and Geography
12) Natural Science
13) Mathematics, Physics and Chemistry
14) Astronomy and Geoscience
15) Life Sciences
16) Medicine and Health Sciences
17) Agricultural Sciences
18) Industrial Technology
19) Transportation
20) Aviation and Aerospace
21) Environmental Science
US Library of Congress
Used to categorise books published in the United States
Expanded categories emphasise USA-specific history and interests
A) General Works
B) Philosophy, Psychology, Religion
C) History: Auxiliary Sciences
D) History: General and Old World
E) History: United States
F) History: Western Hemisphere
G) Geography, Anthropology, Recreation
H) Social Science
I) Political Science
J) Law
K) Education
L) Music
M) Fine Arts
N) Literature & Languages
O) Science
P) Medicine
Q) Agriculture
R) Technology
S) Military Science
T) Naval Science
U) Bibliography & Library Science
What are classification schemes?
A classification scheme…
– Is the structure an organisation uses for organising, accessing/retrieving, storing and managing its information
– Can be used to classify records
A Business Classification Scheme (BCS) is a classification scheme based on an organisation’s business functions and activities
These are predominately used for Records Management purposes
Classification schemes: Types
[Figure: matrix of classification scheme types, with one combination marked “Generally preferred”]
Deployment: Keyword / thesaurus-based; Hierarchical / tree style
Principles of classification: Functional; Subject / thematic; Organisational
Hierarchical / tree style BCSs: Key
CLASS = C
FILE = F
RECORD = R
DOCUMENT = D
Schematic example: Hierarchical / tree BCS
[Figure: schematic tree of classes (C) nested over several levels, with files (F) at the lowest level]
Populated example: Hierarchical / tree BCS
[Figure: tree diagram with the levels Super Function, Function, Sub Function, Activity]
Innovation, Knowledge Transfer and Technical Infrastructure (super function)
  Innovation (function)
  Knowledge Transfer (function)
  Technical Infrastructure (function)
    Standards and Accreditation (sub function)
      Policy Management (activity)
      Infrastructure Support (activity)
    National Measurement System (sub function)
      Policy Management (activity)
    Civil Space Activity (sub function)
      Space Regulation (activity)
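A hierarchical BCS of this kind can be held as a nested structure, from which a full classification path is derived for each activity. The sketch below is illustrative only: the function name `classification_paths` is hypothetical, and placing the three sub functions under "Technical Infrastructure" is an assumption, since the slide layout does not make the parent explicit.

```python
# Nested dict mirroring the populated example. Placing the sub functions
# under "Technical Infrastructure" is an assumption made for illustration.
bcs = {
    "Innovation, Knowledge Transfer and Technical Infrastructure": {
        "Innovation": {},
        "Knowledge Transfer": {},
        "Technical Infrastructure": {
            "Standards and Accreditation": {
                "Policy Management": {},
                "Infrastructure Support": {},
            },
            "National Measurement System": {"Policy Management": {}},
            "Civil Space Activity": {"Space Regulation": {}},
        },
    }
}


def classification_paths(tree, prefix=()):
    # Yield the full path from the top of the scheme down to every leaf.
    for name, subtree in tree.items():
        path = prefix + (name,)
        if subtree:
            yield from classification_paths(subtree, path)
        else:
            yield " / ".join(path)


paths = list(classification_paths(bcs))
for p in paths:
    print(p)
```

The full path is what distinguishes the two "Policy Management" activities, which share a name but sit under different sub functions.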
Toward subject-based classification
It’s often valuable to create multiple classifications
– Users: Intended audience
– Content: Inherent subject matter
– Context: Temporal, organisational or political drivers
User-understood terms are critical
– Especially important for e-commerce
– People search Google for “cheap flights” 75x more than “low fares” (Source: Gerry McGovern)
– Who are the users? Scientists? Consumers?
Context matters
– Why this user with this content?
Source: Louis Rosenfeld LLC
Taxonomies in context
Source: Yahoo!
Hierarchies as implicit semantics
Divides information space into categories & subcategories, relating broader & narrower concepts via parent-child relationship
– Generic = Class-species: Species B (crow) is a member of Class A (Bird) & inherits characteristics of its parent
– Whole-Part = B is a part of A (e.g., Index Finger is part of Hand)
– Instance = B is an instance of A (e.g., Indian Ocean is an Ocean)
Differing views
Simple truth: People see (and label!) the world differently…
Sand trap, or bunker?
Personal taxonomy
• Personal classification of information
• E-mail folders: most common manifestation
• Can improve relevance and findability to an individual
– Some approaches enable personal classification in addition to “authorised” taxonomy
– Gmail and some other systems employ faceted classification as well
• From enterprise perspective, personal taxonomies can be quite problematic
– No interoperability, linguistic chaos
– Impossible to establish enterprise-wide standards and vocabularies
• When combined with peers, can become a “folksonomy”
Folksonomy
Collaborative tagging of content with minimal controls
Relevance between metadata and content may be determined by users in a democratic fashion
Clusters emerge and communities typically self-organise around them (“Wisdom of the crowd”)
Typically arise in Web-based communities where individuals share content, then create and use tags
Best used when there is a critical mass of taggers
Can be a useful “bottom-up” approach to developing taxonomies
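The "bottom-up" idea can be sketched as counting how many distinct people apply each tag and promoting only the tags with enough support. The data, the function name `popular_tags` and the threshold are hypothetical, and the normalisation (trimming and lower-casing) is a deliberately simple assumption.

```python
from collections import Counter

# Hypothetical tagging events: (user, item, tag).
taggings = [
    ("ann", "photo1", "sunset"),
    ("bob", "photo1", "sunset"),
    ("cat", "photo1", "beach"),
    ("ann", "photo2", "Beach"),
    ("bob", "photo2", "beach"),
]


def popular_tags(events, min_taggers=2):
    # Count distinct users per normalised tag, so that one prolific
    # tagger cannot create a cluster alone.
    users_by_tag = {}
    for user, _item, tag in events:
        users_by_tag.setdefault(tag.strip().lower(), set()).add(user)
    counts = Counter({tag: len(users) for tag, users in users_by_tag.items()})
    return [(tag, n) for tag, n in counts.most_common() if n >= min_taggers]


print(popular_tags(taggings))
```

Tags that clear the threshold are candidates for inclusion in a managed taxonomy; the rest remain personal labels.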
Folksonomy example
Source: flickr.com
What is an ontology?
Explicit specification or conceptualisation of a domain
– Often subsume thesauri, but employ richer semantic relationships among terms and attributes
– Apply rigid rules specifying terms and relationships
– Do more than just control vocabulary; are a knowledge representation
Semantic technologies are typically centered around ontologies
An ontology for salad would contain the structure for how it relates to everything, from ingredients to growers to the rodents that might eat it, and how a salad is different in Japan vs. Italy
Why develop an ontology?
To improve knowledge sharing and reuse, and make software more adaptable to an environment
– Share common understanding of the structure of information among people or software agents
– Enable reuse of domain knowledge
– Make domain assumptions explicit
– Separate domain knowledge from operational knowledge
– Analyse domain knowledge
Source: http://www.alphaworks.ibm.com/contentnr/introsemantics
The challenge of meaning
Meaning is a hard problem for machines and humans alike
– Same term can have multiple meanings
– Multiple terms can have the same meaning
– Ultimately meaning is contextual
Dublin Core designed to disambiguate at a fundamental level
– E.g., distinguishes definitively among “Creator”, “Contributor” and “Publisher”
But in the wild, it is much harder to achieve semantic agreement
Controlled vocabularies
Supporting tools based on collections of terms used to tag, track and describe content
For example, users may wish to organise content according to
– business sector
– geographical location
– product type
– organisation type
– policy topic
Allow content to be described using only 'official terms'
Controlled vocabularies: Types
Simple lists
– Lists of terms allowed to be used to describe an information resource
Synonym rings
– A 'ring' of connected terms, all treated as equivalent for searching
– Synonym rings can be used to link acronyms, variant spellings or scientific / popular terms
Thesaurus
– Hierarchical arrangement of broader and narrower meanings
Simple lists and synonym rings
Simple list of bovine diseases
– Anaplasmosis
– Babesiosis
– Bovine spongiform encephalopathy (BSE)
– Cysticercosis
Synonym ring for BSE
– BSE
– Mad cow disease
– Bovine spongiform encephalopathy
– Prion disease
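A synonym ring is typically applied at search time: whichever member of the ring the user types, the query is expanded to every member. The sketch below assumes a tiny in-memory document set and naive substring matching; the function names and document texts are hypothetical, and the ring terms are lower-cased for comparison.

```python
# The ring from the slide; all members are treated as equivalent for searching.
bse_ring = {"bse", "mad cow disease", "bovine spongiform encephalopathy", "prion disease"}
rings = [bse_ring]

# Hypothetical documents, for illustration only.
documents = {
    "doc1": "Guidance on bovine spongiform encephalopathy controls",
    "doc2": "BSE surveillance report",
    "doc3": "Anaplasmosis in cattle",
}


def expand_query(term):
    # Return every term in the ring that contains the query term,
    # or just the term itself when it belongs to no ring.
    key = term.strip().lower()
    for ring in rings:
        if key in ring:
            return set(ring)
    return {key}


def search(term):
    # Naive substring match; a real search engine would tokenise properly.
    terms = expand_query(term)
    return sorted(doc_id for doc_id, text in documents.items()
                  if any(t in text.lower() for t in terms))


print(search("Mad cow disease"))
```

Neither matching document contains the phrase the user typed, which is exactly the gap a synonym ring is meant to close.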
Thesaurus
A networked collection of controlled vocabulary terms, using associative relationships
Used to manage and identify the relationships among and between terms
E.g. Equal to, Related to, Opposite of
Some examples from a hypothetical domain
– Lettuce = Frisée (a.k.a. ‘a synonym ring’)
– Lettuce is a narrower type of Greens
– Coriander is related to Cilantro; but they are not equal
Useful to reconcile different lexicons across business units or functional groups
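The relationships above map onto the labels conventionally used in thesauri: BT (broader term), NT (narrower term) and RT (related term). The following is a minimal sketch of that structure; the class `Thesaurus` and its method names are hypothetical, and the terms are taken from the slide's own examples.

```python
from collections import defaultdict


# Hypothetical sketch. BT = broader term, NT = narrower term, RT = related term.
class Thesaurus:
    def __init__(self):
        # term -> relation label -> set of linked terms
        self.relations = defaultdict(lambda: defaultdict(set))

    def add_broader(self, narrower, broader):
        # Hierarchical links are reciprocal: BT one way, NT the other.
        self.relations[narrower]["BT"].add(broader)
        self.relations[broader]["NT"].add(narrower)

    def add_related(self, a, b):
        # Associative links are symmetric but do not imply equivalence.
        self.relations[a]["RT"].add(b)
        self.relations[b]["RT"].add(a)

    def lookup(self, term, relation):
        return sorted(self.relations[term][relation])


t = Thesaurus()
t.add_broader("Lettuce", "Greens")
t.add_related("Coriander", "Cilantro")
print(t.lookup("Greens", "NT"))
print(t.lookup("Cilantro", "RT"))
```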
Sample thesaurus
Ontologies and taxonomies and thesauri
How does this relate to Taxonomies and Thesauri?
– “We have all agreed to call this thing lettuce. Lettuce is a vegetable.”
There is a much larger potential pool of semantic information that a taxonomy may or may not contain:
– “Lettuce grows in the ground. Rabbits are a hazard to lettuce growers. Tomatoes and cucumbers are often eaten with lettuce, and the three of these things together make what is called a salad. But, a salad is not only defined by the collection of these three things… in Japan, a mixture of seaweed and sesame seeds is a salad. In the Midwestern United States, a collection of Jell-O and radishes is called a salad, and there is no lettuce involved.”
Benefits of classifying records (1)
Providing linkages between individual records which accumulate to provide a continuous record of activity
Ensuring records are named in a consistent manner over time
Assisting in the retrieval of all records relating to a particular function or activity
Determining security protection and access appropriate for sets of records
Allocating user permissions for access to, or action on, particular groups of records
Benefits of classifying records (2)
Distributing responsibility for management of particular sets of records
Distributing records for action
Determining appropriate retention periods and disposition actions for records
Standards and guidelines
ISO 15489 - the international standard for records management
MoReq2 - the Model Requirements for the Management Of Electronic Records
DIRKS - the Design and Implementation of Record-Keeping Systems methodology
ISO 2788 - Guidelines for the Establishment and Development of Monolingual Thesauri
Classification challenges (1)
Laborious and difficult to develop
– Tendency to over-analyse
May need more than one
Classification of content into a categorisation scheme is ongoing work
Classification challenges (2)
Categories need ongoing care and feeding (including thesauri, taxonomies, controlled vocabularies and ontologies)
– Content changes, context changes
– Vocabularies change
– Experience may breed new perspectives
What you have learned
How to leverage classification in general and taxonomies in particular as part of an ECM strategy
Different approaches to subject-based organisation schemes:
– Taxonomies
– Thesauri
– Semantic networks
– Ontologies
– Folksonomies
Managing classification challenges