Taxonomy Strategies LLC
May 14, 2007 Copyright 2007 Taxonomy Strategies LLC. All rights reserved.
Taxonomy 1-2-3
Enterprise Search Summit 2007
Tutorial
2Taxonomy Strategies LLC The business of organized information
Today’s agenda
9:00-9:05 5 min Introduction
9:05-9:10 5 min Warm-up exercise
9:10-9:35 25 min Building taxonomies
9:35-9:45 10 min Taxonomy exercise
9:45-10:05 20 min Taxonomy business case
10:05-10:20 15 min Taxonomy & search
10:20-10:35 15 min Coffee Break
10:35-11:05 30 min Taxonomy ROI
11:05-11:15 10 min ROI exercise
11:15-11:45 30 min Taxonomy governance
11:45-12:00 15 min Q&A
My taxonomy questions
Priority (1-5) Questions
Your title or role:
Your org or industry:
Your dept:
Your name: (optional)
Taxonomy Fundamentals: Agenda
Building taxonomies
Taxonomy business case
Taxonomy & search
Taxonomy ROI
Taxonomy maintenance
The Taxonomy problem: How to pick from > 5,000 faucets?
By:
Category
Price
Brand
Color/Finish
# Handles
Series Name
Water Filter?
Faucet Spray
Handle Shape
Soap Dispenser?
The main issue: What goes here?
When do the things in the list change?
How do we maintain the list?
What rules do we follow?
What's involved in creating a taxonomy?
Metadata Scheme. Data fields for describing content so that it can be found and used.
Vocabularies. Collections of terms that are used to specify some of the metadata properties.
Relationships between content, fields or terms (hierarchical, equivalence, & associative)
Some vocabularies are big & hierarchical, some are small and flat.
Application Profile. Formal representation of metadata & vocabularies.
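As a rough illustration of how these pieces fit together, the following Python sketch treats the application profile as a mapping from metadata fields to controlled vocabularies. The field names, vocabulary terms, and the `validate` helper are hypothetical examples, not part of any standard or of the original tutorial.

```python
# Hypothetical controlled vocabularies (small and flat in this sketch).
VOCABULARIES = {
    "content_type": {"Article", "Form", "Report"},
    "audience": {"General Public", "Researchers", "Students"},
}

# Hypothetical application profile: each metadata field is either free
# text (None) or bound to one of the vocabularies above.
PROFILE = {
    "title": None,
    "content_type": "content_type",
    "audience": "audience",
}


def validate(record):
    """Return a list of problems found in one metadata record."""
    problems = []
    for field, values in record.items():
        if field not in PROFILE:
            problems.append(f"unknown field: {field}")
            continue
        vocab_name = PROFILE[field]
        if vocab_name is None:
            continue  # free-text field, nothing to check
        for value in values:
            if value not in VOCABULARIES[vocab_name]:
                problems.append(f"{field}: '{value}' is not in vocabulary")
    return problems


record = {"title": ["Sun safety tips"], "audience": ["Students", "Kids"]}
print(validate(record))
```

Here "Kids" is flagged because it is absent from the sample audience vocabulary; a real profile would also record cardinality and relationships between terms.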
Seven phases of taxonomy development
Week: 1 2 3 4 5 6 7 8 9 10 11 12
1 Identify Objectives: Conduct interviews
2 Inventory Resources: Identify, gather & review resources
3 Specify Metadata: Define fields & purpose
4 Model Content: Define content chunks & XML DTDs
5 Specify Vocabularies: Compile controlled vocabularies
6 Specify Procedures: Develop workflow, rules & procedures
7 Test & Train: Manually tag small sample
Taxonomy design phases need to be iterated
Plan & Prototype
1 Identify Objectives: Interview core team and stakeholders
2 Inventory Resources: Identify, gather & review resources
3 Specify Metadata: Define fields & purpose
4 Model Content: Define content chunks & XML DTDs
5 Specify Vocabularies: Compile controlled vocabularies
6 Specify Procedures: Develop workflow rules & procedures
7 Test & Train: Manually tag small sample
Alpha Dev & Test
1 Identify Objectives: Review tagged samples, default procedures
2 Inventory Resources: Gather additional resources, if any
3 Specify Metadata: Revise if needed, bake into alpha CMS
4 Model Content: Revise if needed, bake into alpha CMS
5 Specify Vocabularies: Revise, use in alpha CMS
6 Specify Procedures: alpha workflows in CMS
7 Test & Train: Use alpha CMS to tag larger sample
Beta D&T
1 Identify Objectives: Interview alpha users
2 Inventory Resources: Gather additional sources, if any
3 Specify Metadata: Modify CMS for beta
4 Model Content: Modify CMS for beta
5 Specify Vocabularies: Revise, use in beta CMS
6 Specify Procedures: Modify & extend workflows
7 Test & Train: Use beta CMS to tag larger sample
Final D&T
1 Identify Objectives: Interview beta users
3 Specify Metadata: Modify for 1.0
4 Model Content: Modify for 1.0
5 Specify Vocabularies: Revise using team procedure
6 Specify Procedures: Finalize procedure materials
7 Test & Train: Finalize training materials & train staff
Licensing an existing taxonomy
See Factiva’s taxonomy
www.taxonomywarehouse.com
There are usually license fees, but these will be less than the effort to develop an equivalent taxonomy.
But pre-existing taxonomies rarely fit an organization’s needs and may require extensive customization.
Recommendation
Adopt a faceted approach.
Reuse existing (especially internal) vocabularies for as many of the facets as possible.
Plan on doing full-custom “Content Type” and “Topic” taxonomies.
Free sources for 8 common taxonomies
Taxonomy | Definition | Potential Sources
Organization | Organizational structure. | SP 800-87, U.S. Government Manual, Your organizational structure, etc.
Content Type | Structured list of the various types of content being managed or used. | Dublin Core Type Vocabulary, AGLS Document Type, Your records management policy, etc.
Industry | Broad market categories such as lines of business, life events, or industry codes. | SIC, NAICS, Your market segments, etc.
Location | Place of operations or constituencies. | FIPS 5-2, FIPS 55-3, ISO 3166, UN Statistics Div, US Postal Service, Your sales regions, etc.
Function | Functions and processes performed to accomplish mission and goals. | Federal Enterprise Architecture Business Reference Model, Enterprise ontology, Your business functions, etc.
Topic | Business topics relevant to your mission & goals. | Federal Register Thesaurus, NAL Agricultural Thesaurus, Your research areas, etc.
Audience | Subset of constituents to whom a piece of content is directed or intended to be used. | GEM, ERIC Thesaurus, IEEE LOM, Your psycho-graphics or personas, etc.
Products & Services | Names of products/programs & services. | ERP system, Your products and services, etc.
Typical product catalog: A-Z, then idiosyncratic categories
How to analyze existing product catalog categories: Principles and priorities
Preparing a product catalog for facet browsing (aka Guided Navigation) requires a category hierarchy and additional attributes.
Principles
1. Categories and subcategories that could be swapped are candidates for conversion to attributes.
2. Repeated lists of subcategories signal a possible need for an attribute.
3. The number of attributes should not exceed six or seven, so not all attribute candidates should be used.
• Avoid selecting strongly correlated attributes, such as “Weight” and “Shipping Weight”.
Priorities
1. Choose Categories that apply to many products, over those with few products.
2. Choose Attributes that apply to many Categories over those that apply only to very few categories.
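The second principle (repeated lists of subcategories) lends itself to a quick automated check. The sketch below is one possible way to do it in Python; the sample catalog and the `attribute_candidates` helper are invented for illustration.

```python
from collections import defaultdict

# Hypothetical catalog: category -> list of subcategories.
catalog = {
    "Kitchen Faucets": ["Chrome", "Brass", "Nickel"],
    "Bath Faucets": ["Chrome", "Brass", "Nickel"],
    "Shower Heads": ["Chrome", "Nickel", "Handheld"],
}


def attribute_candidates(catalog, min_categories=2):
    """Find subcategory names that repeat under several categories."""
    seen_under = defaultdict(set)
    for category, subcategories in catalog.items():
        for sub in subcategories:
            seen_under[sub].add(category)
    return {
        sub: sorted(cats)
        for sub, cats in seen_under.items()
        if len(cats) >= min_categories
    }


for sub, cats in sorted(attribute_candidates(catalog).items()):
    print(f"{sub}: repeated under {len(cats)} categories")
```

In this sample the finish names repeat across categories, which suggests a single "Finish" attribute; an editor would still need to confirm that the repeated names mean the same thing everywhere.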
Product categories example: Wireless carrier
Products
Accessories: Batteries, Cases, Chargers, Data, Hands-Free, Headsets, Miscellaneous
Content: Purchased, Subscription
Phones: Versatile Phones, Smart Devices, Basic Phones, Prepaid Phones, International Only Phones, Mobile Broadband Cards
Services: Conferencing, Internet / Data, Landline Phone, Network & Roaming, Relay Services, Solutions, Wireless Data
Product attributes example: Digital cameras in an electronics catalog
Types of attributes
Generic attributes
– Brand/Product Family/Model
– Price Range
– Usually Ships
Merchandising attributes
– Usage (E-mail, Internet Browsing, Programming, …)
– Segment (Home, Business, Education, Government …)
– Region & Country
– Most Popular
– New
– Related Products
Specialized attributes
– Capacity (Battery; Memory; MB; GB; BPS, …)
– Resolution (DPI; Megapixels; XGA, XGA, UXGA, …)
– Size (Display; Screen; ...)
– Standard (a, b, g, n, …; scsi, ata, sata, eide, …; dimm, simm, …)
– Type (Camera; Battery; Display; Printer; Server; Storage; Switch; …)
Example facets with item counts
Resolution: 3 Megapixels (4), 4 Megapixels (5), 5 Megapixels (27), 6-8 Megapixels (21)
Brand: Canon (15), Fuji (10), Kodak (17), Nikon (8), Olympus (9)
Type: Point & Shoot (25), Digital SLR (10), Packages (5)
Price Range: $100-250 (5), $250-500 (16), $500-1000 (19), More than $1000 (3)
Faceted taxonomy theory & practice
How many terms are needed to provide sufficient granularity? Not as many as you think!
Post-coordinate indexing allows several simple controlled vocabularies to be combined, rather than using a single large pre-coordinated vocabulary.
The power of faceted taxonomy
4 independent categories of 10 nodes each have the same discriminatory power as one hierarchy of 10,000 nodes (10^4).
Easier to maintain
Easier to tag by content authors
Can be easier to navigate
It’s more effective to increase the number of facets than to increase the number of terms per facet.
Audience: Advocacy, Contractors & Grantees, Environmental Professionals, Federal Facilities, General Public, Industry, Kids, Researchers & Scientists, Small Business, Students
Health: Advisory, Exposure, Food Safety, Health Assessment, Health Effect, Health Risk, Occupational Health, Pesticide Effects, Sun Protection, Toxicity
Substance: Allergen, Biological Contaminant, Carcinogen, Chemical, Explosive, Liquid Waste, Microorganism, Ozone, Pesticide, Radioactive Waste
Industry: Agriculture & Cattle, Automobile Repair, Chemical, Dry Cleaning, Electronics & Computer, Energy, Extractive Industries, Food Processing, Leather Tanning & Finishing, Metal Finishing
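The arithmetic behind the "4 facets of 10 nodes" claim can be checked with a few lines of Python. This is only a sketch: the term names are placeholders, and the facet names follow the slide's example.

```python
from itertools import product
from math import prod

# Four facets of ten terms each; the term names are placeholders.
facets = {
    "Audience": [f"audience-{i}" for i in range(10)],
    "Health": [f"health-{i}" for i in range(10)],
    "Substance": [f"substance-{i}" for i in range(10)],
    "Industry": [f"industry-{i}" for i in range(10)],
}

# Terms an editor has to maintain versus combinations a tagger can express.
terms_to_maintain = sum(len(terms) for terms in facets.values())
combinations = prod(len(terms) for terms in facets.values())

# Enumerating the post-coordinated combinations confirms the count.
assert combinations == sum(1 for _ in product(*facets.values()))

print(terms_to_maintain)  # 40
print(combinations)       # 10000
```

Adding a fifth facet of ten terms would raise the maintained terms to 50 but the combinations to 100,000, which is the sense in which adding facets beats adding terms per facet.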
Automatically created taxonomies
Documents can be ‘clustered’ based on similarities and differences.
Problems:
Typically only a single hierarchy
No overall plan
Results hard for people to navigate
What does “North” mean on this map?
Automatic taxonomy construction software
Software can scan large quantities of content and extract statistically significant words and phrases.
Example: Archive of 10 publications analyzed for topics related to “copyright.”
Software does a poor job of
– De-duplication.
– Turning significant words and phrases into a larger structure.
– Discriminating between “gold” and “garbage.”
Software is good for
– Getting an understanding of the key noun phrases in a large collection.
– Providing test cases for evaluating a taxonomy.
Source: Sample data courtesy of nStein.
Most popular flickr tags on 20 Feb 2007
http://www.flickr.com/photos/tags/
Sort flickr categories into 5 or fewer groups. Then label each group.
Taxonomy exercise—Facet grouping
Universal taxonomy facets
By location (spatially)
By time (chronologically)
By type (genre)
By physical properties (size, color, shape, etc.)
By subject (topic)
Richard Saul Wurman. Information Architects (1996)
Taxonomy exercise— Facet grouping
Location Time Type
Color Subject
Sort flickr categories into 5 or fewer groups. Then label each group.
Taxonomy Fundamentals: Agenda
Building taxonomies
Taxonomy business case
Taxonomy & search
Taxonomy ROI
Taxonomy maintenance
Business case and motivations for taxonomies
How are we going to use content, metadata, and taxonomies in applications to obtain business benefits?
What technology analysts have said: Add metadata to search on!
“Adding metadata to unstructured content allows it to be managed like structured content. Applications that use structured content work better.”
“Enriching content with structured metadata is critical for supporting search and personalized content delivery.”
“Content that has been adequately tagged with metadata can be leveraged in usage tracking, personalization and improved searching.”
“Better structure equals better access: Taxonomy serves as a framework for organizing the ever-growing and changing information within a company. The many dimensions of taxonomy can greatly facilitate Web site design, content management, and search engineering. If well done, taxonomy will allow for structured Web content, leading to improved information access.”
Fundamentals of taxonomy ROI
Tagging content using a taxonomy is a cost, not a benefit.
There is no benefit without exposing the tagged content to users in some way that cuts costs or improves revenues.
Putting taxonomy into operation requires UI changes and/or backend system changes, as well as data changes.
You need to determine those changes, and their costs, as part of the ROI.
Product utilization: Taxonomy compared to search
Conversion rate increases.
HomeDepot.com – Double digit increase.
1-800-Flowers.com – More than a 10% increase.
Otto Group (Kaleidoscope, Freemans, Grattan, and lookagain catalogs) – 130% increase.
Lift in average order size.
Product catalog: Taxonomy compared to search
Benefit: Increased conversion rate & revenue lift
Web sales net income $ 80,000,000
Increased conversion rate 30% $ 24,000,000
Order size lift 10% $ 8,000,000
Potential revenue increase per year $ 32,000,000
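The figures on this slide can be reproduced with a short calculation. The Python sketch below uses the slide's own numbers; the variable names are illustrative, and integer arithmetic is used to avoid floating-point rounding noise.

```python
web_sales = 80_000_000         # annual web sales figure from the slide
conversion_increase_pct = 30   # increased conversion rate
order_size_lift_pct = 10       # lift in average order size

conversion_gain = web_sales * conversion_increase_pct // 100
order_size_gain = web_sales * order_size_lift_pct // 100
potential_increase = conversion_gain + order_size_gain

print(f"${conversion_gain:,}")     # $24,000,000
print(f"${order_size_gain:,}")     # $8,000,000
print(f"${potential_increase:,}")  # $32,000,000
```

Note that the slide adds the two effects rather than compounding them, so the sketch does the same.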
Usability research: Taxonomy compared to search
“We found that users preferred a browsing oriented interface for a browsing task, and a direct search interface when they knew precisely what they wanted.”
Marti Hearst (and others)
“The category interface is superior to the list interface in both subjective and objective measures.”
Hao Chen & Susan Dumais
Usability research: Taxonomy compared to search
[Bar chart: median search time in seconds (axis 0 to 140) for the Category and List interfaces, with separate bars for targets in the top 20 results and not in the top 20 results. Callouts: Category is 36% faster; Category is 48% faster.]
Source: Chen & Dumais
Time saved: Taxonomy compared to search
1 hour per day searching x 36% faster = 22 minutes each day
22 minutes x 250 working days per year = 5500 minutes or 92 hours per year
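The same estimate as a small Python sketch, using the slide's inputs and its rounding to whole minutes and hours (variable names are illustrative):

```python
minutes_searching_per_day = 60   # 1 hour per day
speedup_pct = 36                 # category interface speed-up from the slide
working_days_per_year = 250

# 60 * 0.36 = 21.6, which the slide rounds to 22 minutes.
saved_per_day = round(minutes_searching_per_day * speedup_pct / 100)
saved_per_year_minutes = saved_per_day * working_days_per_year
saved_per_year_hours = round(saved_per_year_minutes / 60)

print(saved_per_day)            # 22
print(saved_per_year_minutes)   # 5500
print(saved_per_year_hours)     # 92
```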
Time saved: Taxonomy compared to search
Benefit: Increase service efficiency
Number of call center calls per month 50,000
Average cost per call $ 20
Call response costs per month $ 1,000,000
Total call response costs per year $12,000,000
Percentage of self-serviced calls due to improved information browsing 30%
Service costs savings per year $ 3,600,000
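A minimal Python sketch of the call center calculation, using the slide's inputs (the variable names are illustrative):

```python
calls_per_month = 50_000
cost_per_call = 20        # dollars
self_service_pct = 30     # calls avoided through improved browsing

monthly_cost = calls_per_month * cost_per_call
yearly_cost = monthly_cost * 12
yearly_savings = yearly_cost * self_service_pct // 100

print(f"${monthly_cost:,}")    # $1,000,000
print(f"${yearly_cost:,}")     # $12,000,000
print(f"${yearly_savings:,}")  # $3,600,000
```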
Trusted advisers: Taxonomy avoids costs
“The amount of time wasted in futile searching for vital information is enormous, leading to staggering costs …”
Sue Feldman,
Sun’s usability experts calculated that 21,000 employees were wasting an average of six minutes per day due to inconsistent intranet navigation structures. When lost time was multiplied by staff salaries, the estimated productivity loss exceeded $10M per year—about $500 per employee per year.
Jakob Nielsen, useit.com
[Pie chart of knowledge worker time: Searching, Creating, Communicating]
Knowledge workers spend up to 2.5 hours each day looking for information …
… But find what they are looking for only 40% of the time.
Source: Kit Sims Taylor
[Pie chart of knowledge worker time: Creating new content, Recreating existing content, Searching, Communicating. Labeled values: 25%, 8%]
Knowledge workers spend more time re-creating existing content than creating new content
Source: Kit Sims Taylor (cited by Sue Feldman in her original article)
Cost saved by not recreating content
Benefit: Increase in productivity
Number of employees 100
Average employee salary $ 80,000
Employee costs per year $8,000,000
Increase in productivity from not re-creating content 25%
Employee cost savings per year $2,000,000
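The productivity estimate follows the same pattern. The helper below is a hypothetical sketch that takes the slide's inputs as parameters so that other head counts or salaries can be tried.

```python
def productivity_savings(employees, average_salary, productivity_gain_pct):
    """Yearly savings from a percentage productivity gain (integer dollars)."""
    yearly_employee_cost = employees * average_salary
    return yearly_employee_cost * productivity_gain_pct // 100


# Inputs from this slide: 100 employees at $80,000 and a 25% gain.
print(f"${productivity_savings(100, 80_000, 25):,}")  # $2,000,000
```

With the $75,000 salary used on the later benefits assumptions slide, the same call yields $1,875,000.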
Business case summary
1. Classifications and classification-like schemes are being used to facilitate information seeking in the workplace, and on the web.
2. Users take advantage of (and prefer) this type of scheme (faceted navigation) when it is made available in the user interface.
3. Hierarchical or facet navigation can be guided by the User Interface.
4. Facet navigation is best combined with keyword searching. E.g., keyword search followed by faceted navigation of results.
Taxonomy Fundamentals: Agenda
Building taxonomies
Taxonomy business case
Taxonomy & search
Taxonomy ROI
Taxonomy maintenance
Do taxonomies actually improve search?
Input (Query) Side
“Search” using a small set of pre-defined values instead of trying to guess what word or words might have been used in the content.
Have synonyms mapped together so searches for “car” and “automobile” return the same things.
Output (Results) Side
Organize search results into groups of related items.
Sorting and filtering
Refining search results
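The query-side synonym mapping can be sketched in a few lines of Python. The synonym table, the tagged documents, and the `normalize` and `search` helpers are all hypothetical; a real search engine would configure this through its own synonym facility.

```python
# Hypothetical synonym ring: every variant maps to one preferred term.
SYNONYMS = {
    "car": "automobile",
    "auto": "automobile",
    "automobile": "automobile",
}

# Hypothetical tagged documents: id -> set of preferred terms.
DOCUMENTS = {
    "doc-1": {"automobile", "repair"},
    "doc-2": {"bicycle"},
}


def normalize(term):
    """Map a query word to its preferred term, or leave it unchanged."""
    term = term.strip().lower()
    return SYNONYMS.get(term, term)


def search(query):
    """Return ids of documents tagged with the normalized query term."""
    preferred = normalize(query)
    return sorted(doc_id for doc_id, tags in DOCUMENTS.items() if preferred in tags)


print(search("car"))         # ['doc-1']
print(search("Automobile"))  # ['doc-1']
```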
Finding information should not be about “Feeling Lucky”
Google search on “pcb” returns > 28M items
Taxonomy could suggest “polychlorinated biphenyls”
169,169 items
Categorized results
Refine search by clicking on categories
Taxonomy in action on the results side: www.CareerBuilder.com search on IT positions
By Category By Company By City By State
Typical search on “database”: List of ranked hits on www.oracle.com/prNavigator.jsp
Select item
Faceted search on “database”: Categorized results + Ranked list
Select item, or
Refine search by clicking on categories
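On the results side, categorized results amount to counting facet values across the hits and filtering when a user clicks a category. The Python sketch below illustrates the idea; the hits and the two helper functions are made up for the example.

```python
from collections import Counter

# Hypothetical search hits for "database", each tagged with facet values.
hits = [
    {"title": "Database Administrator", "category": "Information Technology", "state": "CA"},
    {"title": "Database Developer", "category": "Information Technology", "state": "NY"},
    {"title": "Database Marketing Analyst", "category": "Marketing", "state": "CA"},
]


def facet_counts(hits, facet):
    """Count how many hits carry each value of one facet."""
    return Counter(hit[facet] for hit in hits if facet in hit)


def refine(hits, facet, value):
    """Keep only the hits tagged with the chosen facet value."""
    return [hit for hit in hits if hit.get(facet) == value]


print(facet_counts(hits, "category"))
print([hit["title"] for hit in refine(hits, "state", "CA")])
```

The counts feed the category links shown beside the ranked list, and `refine` models what happens when one of those links is clicked.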
Taxonomy Fundamentals: Agenda
Building taxonomies
Taxonomy business case
Taxonomy & search
Taxonomy ROI
Taxonomy maintenance
Key Factors in ROI (Return on Investment)
Breadth “How many people will metadata affect?”
Repeatability “How many times a day will they use it?”
Cost/Benefit “Is this a costly effort with little or no benefits?”
Source: Todd Stephens, Dublin Core Global Corporate Circle
Some common taxonomy ROI scenarios
Product catalog
– Increased conversions
– Increased self-service & use
– Increased productivity
Customer support
– Cutting the cost of requests for information
– Increased web statistics (page hits)
– Higher ACSI (American Customer Satisfaction Index) score
Knowledge worker productivity
– Less time searching, more time working
– Avoiding re-creating information that already exists
Compliance
– Improved regulatory compliance
– Improved enforcement
– Higher PARS (Performance & Accountability Reports)
– FDIC, SOX, HIPAA, etc. compliance
How to estimate costs—Tagging
Taxonomy Facet | Hier? | Typical CV Size | Time/Value (min) | Avg # values/Item | $/Min | Cost/Element
Audience | N | 10 | 0.25 | 2 | $ 0.42 | $ 0.21
Content Type | N | 20 | 0.25 | 1 | $ 0.42 | $ 0.11
Organizational Unit | Y | 50 | 0.5 | 2 | $ 0.42 | $ 0.42
Products & Services | Y | 500 | 1.5 | 4 | $ 0.42 | $ 2.52
Geographic Region | Y | 100 | 0.5 | 2 | $ 0.42 | $ 0.42
Broad Topics | Y | 400 | 2 | 4 | $ 0.42 | $ 3.36
TOTALS | | 1080 | 5 | 15 | | $ 7.04
Inspired by: Ray Luoma, BAU Solutions
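Each cost per element in the table is consistent with time per value × average values per item × cost per minute. The Python sketch below recomputes the per-item tagging cost from the table's figures; the function name is illustrative.

```python
# (facet, minutes per value, average values per item) from the table.
FACETS = [
    ("Audience", 0.25, 2),
    ("Content Type", 0.25, 1),
    ("Organizational Unit", 0.5, 2),
    ("Products & Services", 1.5, 4),
    ("Geographic Region", 0.5, 2),
    ("Broad Topics", 2.0, 4),
]
COST_PER_MINUTE = 0.42  # dollars per minute of tagging time


def tagging_cost_per_item(facets, cost_per_minute):
    """Sum minutes x values x cost per minute over all facets."""
    return sum(minutes * values * cost_per_minute for _, minutes, values in facets)


cost = tagging_cost_per_item(FACETS, COST_PER_MINUTE)
print(f"${cost:.3f}")  # $7.035, shown as $7.04 in the table
```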
How to estimate costs—Assumptions
ASSUMPTIONS
Enterprise SW License $ 100,000
Maintenance/Support 15%
SW Implementation x 200%
Legacy Content Items 100,000
Content Growth Rate 15%
Tagging/Item $ 7.04
Enterprise Taxonomy $ 100,000
How to estimate costs—Total cost of ownership (TCO)
Description | Year 1 | Year 2 | Year 3 | Year 4 | Year 5
SW: Licenses | $ 100,000 | - | - | - | -
SW: Maintenance | - | $ 15,000 | $ 15,000 | $ 15,000 | $ 15,000
SW: Implementation | $ 200,000 | - | - | - | -
SW: App Tech Support | - | $ 30,000 | $ 30,000 | $ 30,000 | $ 30,000
Tagging: Legacy Content | $ 703,500 | - | - | - | -
Tagging: Ongoing | - | $ 105,525 | $ 121,354 | $ 139,557 | $ 160,490
Taxonomy: Creation | $ 100,000 | - | - | - | -
Taxonomy: Maintenance | - | $ 15,000 | $ 15,000 | $ 15,000 | $ 15,000
TOTAL | $ 1,103,500 | $ 165,525 | $ 181,354 | $ 199,557 | $ 220,490
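The yearly totals can be rebuilt from the assumptions slide. The Python sketch below makes two inferences that the slides do not state outright: the 15% maintenance/support rate is applied to the license, implementation, and taxonomy creation costs alike, and the unrounded tagging cost of $7.035 per item is used (the legacy figure of $703,500 implies this).

```python
LICENSE = 100_000
IMPLEMENTATION = 2 * LICENSE       # "SW Implementation x 200%"
TAXONOMY_CREATION = 100_000
MAINTENANCE_RATE = 0.15            # assumed to apply to all three costs above
LEGACY_ITEMS = 100_000
CONTENT_GROWTH = 0.15
TAGGING_PER_ITEM = 7.035           # unrounded; shown as $7.04 on the slide


def tco_by_year(years=5):
    """Return total cost of ownership for each year, rounded to dollars."""
    totals = []
    items = LEGACY_ITEMS
    one_time = LICENSE + IMPLEMENTATION + TAXONOMY_CREATION
    for year in range(1, years + 1):
        if year == 1:
            total = one_time + items * TAGGING_PER_ITEM
        else:
            new_items = items * CONTENT_GROWTH
            total = MAINTENANCE_RATE * one_time + new_items * TAGGING_PER_ITEM
            items += new_items
        totals.append(round(total))
    return totals


print(tco_by_year())  # [1103500, 165525, 181354, 199557, 220490]
```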
Benefits Assumptions
Productivity Assumptions
Employee costs per year (100 employees, $75,000 per year) $ 7,500,000
Increase in productivity (from not recreating content) 25%
Cost savings $ 1,875,000
Percentage realized in first year 10%
Service Efficiency Assumptions
Customer service calls cost/year $ 12,000,000
Efficiency (from customer self-service) 30%
Cost savings $ 3,600,000
Percentage realized in first year 10%
Sample ROI Calculations
Description | Year 1 | Year 2 | Year 3 | Year 4 | Year 5
Costs
Software Licenses/Maintenance | $ 100,000 | $ 15,000 | $ 15,000 | $ 15,000 | $ 15,000
Implementation/Support | $ 200,000 | $ 30,000 | $ 30,000 | $ 30,000 | $ 30,000
Taxonomy Creation/Maintenance | $ 100,000 | $ 15,000 | $ 15,000 | $ 15,000 | $ 15,000
Legacy/Ongoing Tagging | $ 703,500 | $ 105,525 | $ 121,354 | $ 139,557 | $ 160,490
Benefits
Productivity increases | $ - | $ 187,500 | $ 1,875,000 | $ 1,875,000 | $ 1,875,000
Service efficiency gains | $ - | $ 360,000 | $ 3,600,000 | $ 3,600,000 | $ 3,600,000
Yearly Net Benefits | $(1,103,500) | $ 381,975 | $ 5,293,646 | $ 5,275,443 | $ 5,254,510
Payback period: 1.1 Years until Benefits = Costs
Inspired by: Todd Stephens, Dublin Core Global Corporate Circle
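The yearly net benefits are simply benefits minus costs for each year. The Python sketch below recomputes them from the slide's figures and also tracks the cumulative total. The slide's 1.1-year payback figure depends on a counting convention it does not state, so the sketch only reports the year in which the cumulative total turns positive.

```python
# Yearly totals taken from the slide.
costs = [1_103_500, 165_525, 181_354, 199_557, 220_490]
productivity = [0, 187_500, 1_875_000, 1_875_000, 1_875_000]
service = [0, 360_000, 3_600_000, 3_600_000, 3_600_000]

net = [p + s - c for p, s, c in zip(productivity, service, costs)]

# Running total of net benefits across the five years.
cumulative = []
running = 0
for value in net:
    running += value
    cumulative.append(running)

# First year (1-based) in which cumulative net benefit is non-negative.
break_even_year = next(
    (year for year, total in enumerate(cumulative, start=1) if total >= 0),
    None,
)

print(net)              # [-1103500, 381975, 5293646, 5275443, 5254510]
print(break_even_year)  # 3
```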
ROI exercise—Why tag?
Tagging content using a taxonomy is a cost, not a benefit.
There is no benefit without exposing the tagged content to users in some way that cuts costs or improves revenues.
Putting taxonomy into operation requires UI changes and/or backend system changes, as well as data changes.
You need to determine those changes, and their costs, as part of the ROI.
List the top 5 benefits from tagging content. Then, rank the benefits by priority.
Priority (1-5) Questions
ROI exercise—Benefits from tagging content
Priority (1-5) Questions
List the top 5 benefits from tagging content.
Then, rank the benefits by priority.
Potential benefits from tagging content
1. Reduce information requests
2. Reduce cost per UU (unique user)
3. Expand to new audiences
4. Improve customer satisfaction
5. Improve performance & accountability
6. Increase number of successful website searches
7. Increase number of links (internal cross-cutting & external)
8. Reduce time to build websites
9. Increase metadata consistency & quality
10. Decrease time to create & publish marketing information
11. Improve e-commerce
12. Decrease product development lifecycle
Why implement a taxonomy?
Find relevant information quicker.
Discover information you didn’t know you had.
Avoid duplicate efforts to “reinvent the wheel.”
Learn from mistakes.
Create better quality work product.
Provide overview as well as details about a subject.
Demonstrate relationships between content.
Reduce complexity.
Taxonomy & Content Classification
Taxonomy Fundamentals: Agenda
Building taxonomies
Taxonomy business case
Taxonomy & search
Taxonomy ROI
Taxonomy maintenance
Taxonomy requires business processes
Taxonomies must change, gradually, over time if they are to remain relevant.
Maintenance processes need to be specified so that the changes are based on rational cost/benefit decisions.
Taxonomy change process overview
[Diagram: Taxonomy Governance Environment]
1: External controlled vocabularies (CVs) change on their own schedule
2: Taxonomy Team decides when to update CV snapshots
3: Team adds value via definitions, synonyms, classification rules, training materials, etc.
4: Updated versions of CVs published to consumers
CV Sources: Subject Codes, Expertise, Other Internal, External Standard, Internally Created
Taxonomy Tool: working copies of CVs (Taxonomy Facets)
CV Consumers: Site Search Tool, Portal, Working Papers, Web CMS, DAM, Tagging Tool, Search UI
NASA version of the same diagram: NASA Taxonomy Team; sources include Codes, NASA Competencies, CVs from other NASA Sources, External Standard Vocabularies, Internally Created CVs; consumers include Site Search Tool, Portal, Project Archives, DMS, Metatagging Tool, Search UI
CV = Controlled Vocabulary
Who should maintain the taxonomy?
The taxonomy (and metadata specification) should be produced by a cross-functional team which includes business, technical, information management, and content creation stakeholders.
The team should plan on maintaining the taxonomy as well as building it.
Maintenance will not (usually) be anyone’s full-time job. Exact mix of people on team will change.
It should be built in an iterative fashion, with more content and broader review for each iteration.
Taxonomy maintenance: Generic team charter
Taxonomy Team is responsible for maintaining:
The Taxonomy, a multi-faceted classification scheme.
Associated taxonomy materials, such as:
– Editorial Style Guides.
– Taxonomy Training Materials.
– Metadata Standard.
Team rules and procedures for change management.
Taxonomy Team will consider costs and benefits of suggested changes.
Taxonomy Team will:
Manage relationship between providers of source vocabularies and consumers of the Taxonomy.
Identify new opportunities for use of the Taxonomy across the enterprise to improve information management practices.
Promote awareness and use of the Taxonomy.
Taxonomy team: Generic roles
Business Lead
– Keeps committee on track with larger business objectives.
– Balances cost/benefit issues to decide appropriate levels of effort.
– Obtains needed resources if those on committee can’t accomplish a particular task.
Technical Specialist
– Estimates costs of proposed changes in terms of amount of data to be retagged, additional storage and processing burden, software changes, etc.
– Helps obtain data from various systems.
Taxonomy Specialist
– Suggests potential taxonomy changes based on analysis of query logs, indexer feedback.
– Makes edits to taxonomy, installs into system with aid of IT specialist.
Content Specialist
– Committee’s liaison to content creators.
– Estimates costs of proposed changes in terms of editorial process changes, additional or reduced workload, etc.
Content Owners
– Reality check on process change suggestions.
Where taxonomy changes come from
[Diagram: where taxonomy change requests originate. Components shown: End User (experience), Application UI, Application Logic, Firewall, Content, Taxonomy, Tagging Logic, Tagging UI, Tagging Staff, Taxonomy Editor, Taxonomy Team.]
Sources of change suggestions
Staff notes ‘missing’ concepts
Query log analysis
Requests from other parts of the organization
Team Considerations
1. Business goals.
2. Changes in user experience.
3. Retagging cost.
Recommendations by Editor
1. Small taxonomy changes (labels, synonyms)
2. Large taxonomy changes (retagging, application changes)
3. New “best bets” content.
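Query log analysis is one of the change sources that is easy to prototype. The Python sketch below is a simplified illustration: the taxonomy terms, the log entries, and the `missing_concepts` helper are hypothetical, and a real analysis would also handle multi-word queries and synonyms.

```python
from collections import Counter

# Hypothetical taxonomy labels and a small query log.
taxonomy_terms = {"pesticide", "ozone", "carcinogen", "toxicity"}
query_log = ["ozone", "radon", "Radon", "pesticide", "mold", "radon "]


def missing_concepts(queries, known_terms, min_count=2):
    """List frequent queries that match no taxonomy term, most common first."""
    counts = Counter(q.strip().lower() for q in queries)
    return [
        (term, count)
        for term, count in counts.most_common()
        if term not in known_terms and count >= min_count
    ]


print(missing_concepts(query_log, taxonomy_terms))  # [('radon', 3)]
```

The taxonomy editor can then weigh such candidates against the team considerations, such as retagging cost, before recommending a change.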
Taxonomy maintenance processes
Different organizations will need to consider their own change processes.
Organization 1: A custodian is responsible for the content, but checks facts with department heads before making changes.
Organization 2: Analysts suggest changes, editors approve, copyeditors verify consistency.
Organization 3: Marketing reps ask for a change, taxonomy editor makes demo, web representative approves it.
Change process MUST also consider cost of implementing the change:
Retagging data.
Reconfiguring auto-classifier.
Retraining staff.
Changes in user expectations.
Taxonomy maintenance workflow
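[Workflow diagram with lanes for Analyst, Editor, Copywriter, and Sys Admin. Steps: Suggest new name/category; Review new name; Copy edit new name; Add to enterprise Taxonomy (in the Taxonomy Tool). Two “Problem?” decision points with Yes/No branches follow the review and copy edit steps.]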
Sample taxonomy editor: Data Harmony
Hierarchy Browser
Standard Term Info
Taxonomy editing tools vendors
[Quadrant chart: Ability to Execute (low to high) against Completeness of Vision. Quadrant labels shown: Niche Players, Visionaries.]
Most popular taxonomy editor is MS Excel
An immature area: no vendors are in upper-right quadrant!
MultiTes is widely used, cheap with functionality
High functionality/high cost products ($100K+)
Taxonomy maturity model
Taxonomy governance processes must fit the organization.
As consultants, we notice different levels of maturity in the business processes around content management, taxonomy, and metadata.
Honestly assess your organization’s metadata maturity in order to design appropriate governance processes.
The following slides present results from a survey of metadata and taxonomy practices at 87 organizations. How does your organization compare?
2005 Maturity survey: Search practices
n=87. Response columns, in order: Not current practice, Being developed, In practice, Former practice, NA or Unknown
Search Box in standard place on all web pages. 20% (12) 11% (7) 62% (38) 2% (1) 5% (3)
Search engine indexes multiple repositories in addition to web sites. 25% (15) 21% (13) 44% (27) 2% (1) 8% (5)
Spell Checking. 31% (19) 18% (11) 38% (23) 0% (0) 13% (8)
Synonym Searching. 41% (25) 23% (14) 30% (18) 0% (0) 7% (4)
Search results grouped by date, location, or other factors in addition to simple relevance score. 37% (22) 20% (12) 37% (22) 0% (0) 7% (4)
Queries are logged and the logs are regularly examined 31% (19) 25% (15) 31% (19) 5% (3) 8% (5)
Common queries identified, 'best' pages for those queries are found, and search engine configured to return them at the top. (Best Bets) 46% (28) 25% (15) 21% (13) 0% (0) 8% (5)
Advanced computation of relevance based on data in addition to the text of the document. 43% (26) 16% (10) 25% (15) 0% (0) 16% (10)
A faceted search tool, such as Endeca, has been implemented for the organization's external site or product catalog search. 68% (41) 7% (4) 10% (6) 0% (0) 15% (9)
A faceted search tool, such as Endeca, has been implemented for the organization's internal website(s) or portal. 57% (34) 15% (9) 17% (10) 0% (0) 12% (7)
2005 Maturity survey: Metadata practices
n=87. Response columns, in order: Not current practice, Being developed, In practice, Former practice, NA or Unknown
Metadata standards are developed for the needs of each system with no overall attempt to unify them. 22% (13) 12% (7) 37% (22) 20% (12) 10% (6)
An Organization-wide metadata standard exists and new systems consider it during development. 37% (22) 37% (22) 20% (12) 0% (0) 7% (4)
The Organization-wide metadata standard is based on the Dublin Core. 52% (30) 16% (9) 21% (12) 0% (0) 12% (7)
Multiple repositories comply with metadata standard. 52% (31) 20% (12) 17% (10) 0% (0) 12% (7)
A Cataloging Policy document exists to teach people how to tag data in compliance with organizational metadata standard. 48% (29) 20% (12) 20% (12) 0% (0) 12% (7)
The Cataloging Policy document is revised periodically. 48% (29) 15% (9) 17% (10) 0% (0) 20% (12)
A centralized metadata repository exists to aggregate and unify metadata from disparate sources. 57% (34) 17% (10) 17% (10) 0% (0) 10% (6)
Metadata is manually entered into web forms. 15% (9) 12% (7) 61% (36) 3% (2) 8% (5)
Metadata is generated automatically by software. 38% (23) 18% (11) 27% (16) 2% (1) 15% (9)
Metadata is generated automatically, then reviewed manually for correction. 48% (29) 18% (11) 17% (10) 2% (1) 15% (9)
2005 Maturity survey: Taxonomy practices
Practice (n=87) | Not current practice | Being developed | In practice | Former practice | NA or Unknown
Org Chart Taxonomy - One based primarily on the structure of the organization. | 36% (21) | 10% (6) | 34% (20) | 5% (3) | 15% (9)
Products Taxonomy - One based primarily on the products and/or services offered by the organization. | 37% (22) | 10% (6) | 32% (19) | 5% (3) | 15% (9)
Content Types Taxonomy - One based primarily on the different types of documents. | 28% (16) | 21% (12) | 40% (23) | 5% (3) | 7% (4)
Topical Taxonomy - One based primarily on topics of interest to the site users. | 20% (12) | 36% (21) | 34% (20) | 3% (2) | 7% (4)
Faceted Taxonomy - One which uses several of the approaches above. | 32% (19) | 29% (17) | 34% (20) | 0% (0) | 5% (3)
The Taxonomy, or a portion of it, was licensed from an outside taxonomy vendor. | 75% (44) | 3% (2) | 14% (8) | 0% (0) | 8% (5)
The Taxonomy follows a written 'style guide' to ensure its consistency over time. | 47% (28) | 22% (13) | 20% (12) | 0% (0) | 10% (6)
The Taxonomy is maintained using a taxonomy editing tool other than MS Excel. | 35% (21) | 17% (10) | 40% (24) | 2% (1) | 7% (4)
The Taxonomy was validated on a representative sample of content during its development. | 28% (17) | 22% (13) | 33% (20) | 3% (2) | 13% (8)
A Roadmap for the future evolution of the Taxonomy has been developed. | 38% (23) | 40% (24) | 13% (8) | 0% (0) | 8% (5)
Questions?
Mike Lauruhn 415-378-2747
Donna Fritzsche 312-804-5629
Joseph A. Busch 415-377-7912
Ron Daniel Jr. 925-368-8371
Taxonomy 1-2-3: Webography (1)
H. Chen, S. Dumais. “Bringing order to the web: automatically categorizing search results.” Proceedings of CHI 2000. pp. 145-152. http://research.microsoft.com/copyright/accept.asp?path=http://research.microsoft.com/~sdumais/chi2001.pdf&pub=ACM
Sue Feldman. “The high cost of not finding information.” 13:3 KM World (March 2004) http://www.kmworld.com/publications/magazine/index.cfm?action=readarticle&Article_ID=1725&Publication_ID=108
P.R. Hagen. Must search stink? Forrester Research, June 2000.
K. Hall. Content tagging strategies. Giga Information Group, February 2001.
M. Hearst, A. Elliott, J. English, R. Sinha, K. Swearingen & K. Yee. “Finding the flow in website search.” 45 Communications of the ACM (Sept 2002) http://www.ischool.berkeley.edu/~hearst/papers/cacm02.pdf
J. Morrison. “How to create effective taxonomy.” ZDNet Asia, August 18 2004. http://www.zdnetasia.com/builder/program/dev/0,39045513,39190441,00.htm
Taxonomy 1-2-3: Webography (2)
Jakob Nielsen. Web Design and Development.
Eric T. Peterson. “Home Depot uses Endeca to consolidate search and navigation, dramatically increasing conversion: case study.” Jupiter Research (July 11, 2005) http://www.jupiterresearch.com/bin/item.pl/research:casestudy/79/id=96483/
S. Phillips, E. Maguire, C. Shilakes. Content management: The new data infrastructure–Convergence and divergence out of chaos. Merrill Lynch, June 2001.
K.S. Taylor. "The brief reign of the knowledge worker," 1998. http://online.bcc.ctc.edu/econ/kst/BriefReign/BRwebversion.htm
Taxonomy & content classification: market milestone report. Delphi Group, 2002. http://www.delphiweb.com/knowledgebase/documents/upload/pdf/2176.pdf?session=%5Bg_sid%5D
Taxonomy Warehouse. www.taxonomywarehouse.com
Richard Saul Wurman. Information Architects (1996)
Vendors Taxonomy Editing Tools URLs
Knowledge Workbench www.convera.com/solutions/retrievalware/KnowledgeWorkbench.aspx
Cuadra STAR/Thesaurus www.cuadra.com/products/thesaurus.html
Thesaurus Master www.dataharmony.com/products/tm.htm
Knowledge Engineering Workbench www.entrieva.com/entrieva/html_site/knowworkbench.htm
MetaTagger www.interwoven.com/products/content_intelligence/index.html
SmartDiscovery www.inxight.com/pdfs/Taxonomy_FinalWeb.pdf
MS Excel
Intelligent Topic Manager www.mondeca.com
MultiTes Pro www.multites.com
Taxonomy/Authority File Manager www.nstein.com/epub/ncm-taxonomy.asp
Protégé http://protege.stanford.edu/
SchemaServer www.schemalogic.com
Synaptica www.factiva.com/products/taxonomy/synaptica.asp?node=menuElem1511
Taxonomy Manager www.teragram.com/solutions/taxonomy.htm
Term Tree www.termtree.com.au
Enterprise Vocabulary Server www.webchoir.com/products/wvs.html
Designer www.wordmap.com/Enterprise/Taxonomy_and_metadata_management.html