Taxonomy Strategies LLC
April 11, 2005 Copyright 2005 Taxonomy Strategies LLC. All rights reserved.
4 Myths about Taxonomies
ITIMG – Industrial Technical Information Managers Group Meeting
Newport Beach, CA
TAXONOMY STRATEGIES LLC The business of organized information
Who I am
Over 25 years in the business of organized information
  Founder & Principal, Taxonomy Strategies
  Director, Solutions Architecture, Interwoven
  VP, Infoware, Metacode Technologies
  Program Manager, Getty Foundation
  Manager, Pricewaterhouse
  Assistant Director for Technical Services, Hampshire College
  Chief, Technical Services, Paul Weiss Rifkind Wharton & Garrison
Metadata & taxonomies community leadership
  President, American Society for Information Science & Technology
  Trustee, Dublin Core Metadata Initiative
  Co-Founder, Networked Knowledge Organization Systems/Services
  Adviser, National Research Council Computer Science and Telecommunications Board
  Reviewer, National Science Foundation Division of Information and Intelligent Systems
Recent & current projects
Government: Commodity Futures Trading Commission; Defense Intelligence Agency; ERIC; Federal Aviation Administration; Federal Reserve Bank of Atlanta; Forest Service; GSA Office of Citizen Services (www.firstgov.gov); Head Start; Infocomm Development Authority of Singapore; NASA (nasataxonomy.jpl.nasa.gov); Small Business Administration; Social Security Administration; USDA Economic Research Service; USDA e-Government Program (www.usda.gov)
Commercial: Allstate Insurance; Blue Shield of California; Debevoise & Plimpton; Halliburton; Hewlett Packard; Motorola; PeopleSoft; Pricewaterhouse Coopers; Siderean Software; Sprint; Time Inc.
Commercial subcontracts: Agency.com – Top financial services; Critical Mass – Fortune 50 retailer; Deloitte Consulting – Big credit card; Gistics/OTB – Direct selling giant
NGOs: CEN; IDEAlliance; IMF; OCLC
What I do
Organize Stuff
Agenda
Myth #1: The Web has changed everything
Myth #2: Taxonomies are monolithic hierarchies
Myth #3: Literary warrant
Myth #4: Knowledge workers
Finding information should not be about “Feeling Lucky”
Something is wrong with this picture
“…search is so fundamental that people should have been focusing on it all along. The reality of the situation is that there was a great assumption that search was actually working just fine.”
— Harley Manning, Research Director
Why doesn’t search work?
For search engines to work, they need better stuff to work on!
Otherwise it’s garbage in … and garbage out.
Correctly matching content with questions (regardless of the technology) requires better content to work on.
How to fix search … add metadata to search on
“Adding metadata to unstructured content allows it to be managed like structured content. Applications that use structured content work better.”
“Enriching content with structured metadata is critical for supporting search and personalized content delivery.”
“Content that has been adequately tagged with metadata can be leveraged in usage tracking, personalization and improved searching.”
What is metadata? Another view of Dublin Core
Asset metadata – Who, Where & When:
Title, Creator, Publisher, Contributor, Date, Type, Format, Identifier, Source, Language
Subject metadata – What & Why:
Subject, Description, Coverage
Relational metadata – Links between and to:
Relation
Use metadata – How can it be used:
Rights & Permissions
[Diagram arrow labels: Functionality; Difficult to Generate]
Better resource description = Better navigation & discovery
Dublin Core is a little more complicated
Elements: 1. Identifier 2. Title 3. Creator 4. Contributor 5. Publisher 6. Subject 7. Description 8. Coverage 9. Format 10. Type 11. Date 12. Relation 13. Source 14. Rights 15. Language
Refinements: Abstract, Access rights, Alternative, Audience, Available, Bibliographic citation, Conforms to, Created, Date accepted, Date copyrighted, Date submitted, Education level, Extent, Has format, Has part, Has version, Is format of, Is part of, Is referenced by, Is replaced by, Is required by, Issued, Is version of, License, Mediator, Medium, Modified, Provenance, References, Replaces, Requires, Rights holder, Spatial, Table of contents, Temporal, Valid
Encodings: Box, DCMIType, DDC, IMT, ISO3166, ISO639-2, LCC, LCSH, MESH, Period, Point, RFC1766, RFC3066, TGN, UDC, URI, W3CDTF
Types: Collection, Dataset, Event, Image, Interactive Resource, Moving Image, Physical Object, Service, Software, Sound, Still Image, Text
Metadata is a data model – A scheme for e-Forms
Element        Namespace            Source                  Purpose
Identifier     dc:identifier        System supplied         Basic accountability
Registrar      dc:creator           LDAP validated          Accountability & maintenance
Form Name      dc:title             User                    Text search, results display
Form Number    dcterms:alternative  User                    Text search, results display
Revision Date  dcterms:modified     User                    Filter or rank search results
Agency         dc:publisher         FIPS 95-2               Key index to retrieve & aggregate assets
Form Type      dc:type              Form Type vocabulary    Browse or group search results
Industry Code  us:naics             NAICS codes             Browse or group search results
Jurisdiction   dc:coverage          FIPS 5-2                Browse or group search results
Purpose        us:feabrm            FEA Business Ref Model  Browse or group search results
...            ...                  ...                     ...
Subject
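One way to read "metadata is a data model" is to hold the scheme itself as data and check records against it. The sketch below is only an illustration under that assumption: the `Element` class, the `missing_elements` helper and the sample record values are hypothetical, while the element names, namespaces, sources and purposes are copied from the e-Forms scheme on this slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    name: str       # label from the slide, e.g. "Form Name"
    namespace: str  # qualified property, e.g. "dc:title"
    source: str     # where the value comes from
    purpose: str    # what the element is used for


# Rows taken from the e-Forms scheme on the slide.
SCHEME = [
    Element("Identifier", "dc:identifier", "System supplied", "Basic accountability"),
    Element("Registrar", "dc:creator", "LDAP validated", "Accountability & maintenance"),
    Element("Form Name", "dc:title", "User", "Text search, results display"),
    Element("Form Number", "dcterms:alternative", "User", "Text search, results display"),
    Element("Revision Date", "dcterms:modified", "User", "Filter or rank search results"),
    Element("Agency", "dc:publisher", "FIPS 95-2", "Key index to retrieve & aggregate assets"),
    Element("Form Type", "dc:type", "Form Type vocabulary", "Browse or group search results"),
    Element("Industry Code", "us:naics", "NAICS codes", "Browse or group search results"),
    Element("Jurisdiction", "dc:coverage", "FIPS 5-2", "Browse or group search results"),
    Element("Purpose", "us:feabrm", "FEA Business Ref Model", "Browse or group search results"),
]


def missing_elements(record: dict) -> list:
    """Return the namespaces defined by the scheme that the record lacks."""
    return [e.namespace for e in SCHEME if e.namespace not in record]


# Hypothetical record with only two elements filled in.
record = {"dc:identifier": "form-0001", "dc:title": "Example form"}
missing = missing_elements(record)
print(missing)
```

Keeping the scheme as a list of rows means the same table can drive input forms, validation and search configuration.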
How is Dublin Core used in corporate environments?
[Bar chart. Axis labels as extracted: De facto, Simple, Access enabler, Compliance. Bar values as extracted: 57%, 43%, 43%, 29%.]
Base: 20 corporate information managers
Source: CEN/ISSS Workshop on Dublin Core – Guidance information for the deployment of Dublin Core metadata in Corporate Environments
Dublin Core framework for corporate use
Not just 15 elements
A framework to enable cross-resource exploration and use
Dublin Core is a framework for “integration metadata” at BellSouth
Agenda
Myth #1: The Web has changed everything
Myth #2: Taxonomies are monolithic hierarchies
Myth #3: Literary warrant
Myth #4: Knowledge workers
What is a taxonomy? Systematics view
Hierarchical classification of things into a tree structure
Linnaeus …
Kingdom > Phylum > Class > Order > Family > Genus > Species
Animalia > Chordata > Mammalia > Carnivora > Canidae > Canis > C. familiaris
UNSPSC …
Segment > Family > Class > Commodity
44-Office Equipment and Accessories and Supplies
  .12-Office Supplies
    .17-Writing Instruments
      .05-Mechanical pencils
      .06-Wooden pencils
      .07-Colored pencils
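The UNSPSC example reads as a path of two-digit levels (44, .12, .17, .05). The sketch below assumes such a path is written as one 8-digit code, for example `44121705` for mechanical pencils; the helper names `split_unspsc` and `ancestors` are hypothetical and only illustrate how a tree position can be recovered from the code itself.

```python
LEVELS = ("segment", "family", "class", "commodity")


def split_unspsc(code: str) -> dict:
    """Split an 8-digit code into its four two-digit levels."""
    if len(code) != 8 or not code.isdigit():
        raise ValueError("expected an 8-digit code, got %r" % code)
    return {level: code[i * 2:i * 2 + 2] for i, level in enumerate(LEVELS)}


def ancestors(code: str) -> list:
    """Return the codes of the enclosing nodes, broadest first (zero padded)."""
    split_unspsc(code)  # reuse the validation above
    return [code[:n].ljust(8, "0") for n in (2, 4, 6)]


parts = split_unspsc("44121705")
parents = ancestors("44121705")
print(parts)
print(parents)
```

Because every node's position is encoded in its identifier, walking up the hierarchy needs no lookup table, which is typical of a single monolithic tree.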
Taxonomic metadata – e-Forms example
[Diagram: each metadata element (facet) is paired with its taxonomy.]
Agency: 0001 Legislative; 1000 Judicial; 1100 Executive Office of Pres; 0003 Exec Depts (1200 Agriculture, 1300 Commerce, 9700 Defense, 9100 Education, 8900 Energy, 7500 HHS, 7000 DHS, 8600 HUD, 1400 Interior, 1500 Justice, 1600 Labor, 1900 State, 6900 Transport, 2000 Treasury, 3600 Veterans); Ind Agencies; Intl Orgs
Form Type: Application; Approval; Claim; Information request; Information submission; Instructions; Legal filing; Payment; Procurement; Renewal; Reservation; Service request; Test; Other input; Other transaction
Keyword Topic: Agriculture & food; Commerce; Communications; Education; Energy; Env pro; Foreign rels; Govt; Health & safety; Housing & comm dev; Labor; Law; Named grps; National def; Nat resources; Recreation; Sci & tech; Social pgms; Transport
Audience: All; General; Citizen; Business; Govt; Employee; Native American; Non-resident; Tourist; Special group
Industry Impact: 00 Generic; 11 Agriculture; 21 Mining; 22 Utilities; 23 Construct; 31-33 Manuf; 42 Wholesale; 44-45 Retail; 48-49 Trans; 51 Info; 52 Finance; 54 Profession; 55 Mgmt; 56 Support; 61 Education; 62 Health Care; 71 Arts; 72 Hospitality; 81 Other Services; 92 Public Admin
Jurisdiction: Federal; State +; Local +; Other +
BRM Impact: Citizen Srvcs; Social Srvs; Defense; Disasters; Econ Dev; Education; Energy; Env Mgmt; Law Enf; Judicial; Correctional; Health; Security; Income Sec; Intelligence; Intl Affairs; Nat Resour; Transport; Workforce; Science; Delivery; Support; Management
The power of taxonomy facets
4 independent categories of 10 nodes each have the same discriminatory power as one hierarchy of 10,000 nodes (10^4)
Easier to maintain
Can be easier to navigate
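The arithmetic behind this claim can be checked directly. This is a minimal sketch in which the facet names and values are placeholders (hypothetical), and only the sizes, 4 facets of 10 values each, come from the slide.

```python
from itertools import product
from math import prod

# Four hypothetical facets with ten placeholder values each.
facets = {
    "facet_%d" % i: ["value_%d_%d" % (i, j) for j in range(10)]
    for i in range(4)
}

sizes = [len(values) for values in facets.values()]

# Distinct combinations a piece of content can be assigned to.
combinations = prod(sizes)

# The same number, obtained by enumerating every combination.
enumerated = sum(1 for _ in product(*facets.values()))

# Terms that have to be maintained in the faceted design.
terms_to_maintain = sum(sizes)

print(combinations, enumerated, terms_to_maintain)
```

The faceted design maintains 40 terms, while a single hierarchy with the same discriminatory power needs 10,000 leaf nodes.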
Taxonomic metadata example: Form SS-4. Employer Identification Number (EIN)
Facet                Values
Agency               IRS
Content Type         Information Submission
Industry Impact      Generic
Jurisdiction         Federal
Programs & Services  Support Delivery of Services/General Government/Taxation Management
Keyword Topic        Commerce/Employment taxes
Audience             Business
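A faceted record like this one can be filtered facet by facet. The sketch below is an assumption about how such a record might be queried, not a description of any actual system: the `matches` helper is hypothetical, and it treats "/" in a value as a path separator so that a broader term also matches its narrower terms.

```python
# Facet values copied from the Form SS-4 example on the slide.
FORM_SS4 = {
    "Agency": "IRS",
    "Content Type": "Information Submission",
    "Industry Impact": "Generic",
    "Jurisdiction": "Federal",
    "Programs & Services": "Support Delivery of Services/General Government/Taxation Management",
    "Keyword Topic": "Commerce/Employment taxes",
    "Audience": "Business",
}


def matches(record: dict, filters: dict) -> bool:
    """True if every filter equals, or is a broader term of, the record's value."""
    for facet, wanted in filters.items():
        have_path = record.get(facet, "").split("/")
        wanted_path = wanted.split("/")
        if have_path[:len(wanted_path)] != wanted_path:
            return False
    return True


broad_hit = matches(FORM_SS4, {"Agency": "IRS", "Keyword Topic": "Commerce"})
exact_hit = matches(FORM_SS4, {"Keyword Topic": "Commerce/Employment taxes"})
miss = matches(FORM_SS4, {"Audience": "Citizen"})
print(broad_hit, exact_hit, miss)
```

Each filter narrows the result set independently, which is what lets several small taxonomies be combined at query time.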
Methods used to create & maintain metadata
[Bar chart. Axis labels as extracted: Forms, Distributed production, Centralized production, Not automated. Bar values as extracted: 71%, 57%, 43%, 43%.]
Base: 20 corporate information managers
Source: CEN/ISSS Workshop on Dublin Core – Guidance information for the deployment of Dublin Core metadata in Corporate Environments
Agenda
Myth #1: The Web has changed everything
Myth #2: Taxonomies are monolithic hierarchies
Myth #3: Literary warrant
Myth #4: Knowledge workers
Literary warrant
The “literature” on which a controlled vocabulary is based.
The “official names” of people, organizations, events, places, and things as they appear in published sources.
Type of Entity  Authoritative Sources
Author names    Title page
Places          US Board on Geographic Names, National Geospatial-Intelligence Agency, ISO 3166, UN Statistics Division
Subjects        Existing literature
Why vocabulary differences are necessary
Terminology is needed before “literature” establishes warrant.
Categories are needed for internal purposes such as sorting, analysis, and other ad hoc groupings.
Organizations, places, and other entities change over time.
Folksonomies: Emergent topics
Some vocabulary differences are necessary: Grouping
ISO 3166-1  UN Code  Internal Code  Name        Official Name
AUT         40       122            Austria     Republic of Austria
BEL         56       124            Belgium     Kingdom of Belgium
DNK         208      128            Denmark     Kingdom of Denmark
FRA         250      132            France      French Republic
DEU         276      134            Germany     Federal Republic of Germany
SMR         674      135            San Marino  Republic of San Marino
ITA         380      136            Italy       Italian Republic
LUX         442      137            Luxembourg  Grand Duchy of Luxembourg
…           …        …              …           …
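A table like this works as a crosswalk: the standard codes identify a country, while the internal code controls how it is ordered or grouped locally. The sketch below assumes that reading. The `internal_code` helper is hypothetical and the rows are the ones shown on the slide.

```python
# (ISO 3166-1, UN code, internal code, name, official name), as on the slide.
COUNTRIES = [
    ("AUT", 40, 122, "Austria", "Republic of Austria"),
    ("BEL", 56, 124, "Belgium", "Kingdom of Belgium"),
    ("DNK", 208, 128, "Denmark", "Kingdom of Denmark"),
    ("FRA", 250, 132, "France", "French Republic"),
    ("DEU", 276, 134, "Germany", "Federal Republic of Germany"),
    ("SMR", 674, 135, "San Marino", "Republic of San Marino"),
    ("ITA", 380, 136, "Italy", "Italian Republic"),
    ("LUX", 442, 137, "Luxembourg", "Grand Duchy of Luxembourg"),
]

BY_ISO = {row[0]: row for row in COUNTRIES}


def internal_code(iso_code: str) -> int:
    """Map a standard ISO 3166-1 code to the local internal code."""
    return BY_ISO[iso_code][2]


# The same rows ordered by each code system.
by_un = [row[3] for row in sorted(COUNTRIES, key=lambda r: r[1])]
by_internal = [row[3] for row in sorted(COUNTRIES, key=lambda r: r[2])]

print(internal_code("SMR"))
print(by_un)
print(by_internal)
```

Sorting by UN code puts San Marino last, while the internal code places it next to Italy, so the local vocabulary differs from the standard one without losing the link to it.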
Some vocabulary differences are necessary: Entities change over time
Name                   Part of                         Effective Dates  Entity Type
Serbia and Montenegro  Europe                          2003-            Independent state
Serbia and Montenegro  Federal Republic of Yugoslavia  1991-2003        Republic
Yugoslavia             Europe                          1929-1991        Independent state
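Effective dates let one vocabulary answer "what was this entity in a given year". The sketch below is a hypothetical illustration using the three rows on the slide. It assumes the end year is exclusive, because 1991 and 2003 each appear in two ranges, and it treats an open range such as "2003-" as having no end.

```python
# (name, part of, start year, end year or None if still current, entity type)
ENTITIES = [
    ("Serbia and Montenegro", "Europe", 2003, None, "Independent state"),
    ("Serbia and Montenegro", "Federal Republic of Yugoslavia", 1991, 2003, "Republic"),
    ("Yugoslavia", "Europe", 1929, 1991, "Independent state"),
]


def lookup(name: str, year: int):
    """Return (part_of, entity_type) valid for the name in that year, or None."""
    for entry_name, part_of, start, end, entity_type in ENTITIES:
        if entry_name == name and start <= year and (end is None or year < end):
            return (part_of, entity_type)
    return None


in_1995 = lookup("Serbia and Montenegro", 1995)
in_2004 = lookup("Serbia and Montenegro", 2004)
too_late = lookup("Yugoslavia", 2000)
print(in_1995, in_2004, too_late)
```

Keeping superseded rows instead of overwriting them lets older content stay correctly described under the name and status that applied when it was created.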
Sources for 7 common taxonomies
Organization
  Definition: Organizational structure.
  Potential sources: FIPS 95-2, U.S. Government Manual, Your organizational structure, etc.
Content Type
  Definition: Structured list of the various types of content being managed or used.
  Potential sources: DC Types, AGLS Document Type, AAT Information Forms, Records management policy, etc.
Industry
  Definition: Broad market categories such as lines of business, life events, or industry codes.
  Potential sources: FIPS 66, SIC, NAICS, etc.
Location
  Definition: Place of operations or constituencies.
  Potential sources: FIPS 5-2, FIPS 55-3, ISO 3166, UN Statistics Div, US Postal Service, etc.
Function
  Definition: Functions and processes performed to accomplish mission and goals.
  Potential sources: FEA Business Reference Model, Enterprise Ontology, AAT Functions, etc.
Topic
  Definition: Business topics relevant to your mission and goals.
  Potential sources: Federal Register Thesaurus, NAL Agricultural Thesaurus, LCSH, etc.
Audience
  Definition: Subset of constituents to whom a piece of content is directed or intended to be used.
  Potential sources: GEM, ERIC Thesaurus, IEEE LOM, etc.
Products and Services
  Definition: Names of products/programs & services.
  Potential sources: ERP system, Your products and services, etc.
How is Dublin Core extended?
[Bar chart. Axis labels as extracted: Doc Types, Products & Services, Roles, Inconsistent Encoding. Bar values as extracted: 100%, 86%, 57%, 57%.]
Base: 20 corporate information managers
Source: CEN/ISSS Workshop on Dublin Core – Guidance information for the deployment of Dublin Core metadata in Corporate Environments
Business process document types: Local document type lists are commonly invented
Oil & gas services company document types
analysis, appraisals, assessments, forecasts, predictions
agendas, plans, designs, schedules, workflow
applications, proposals, requests, requirements
permits, consents, approvals, rejections, certificates
work orders, correspondence
auditing, compliance, testing, inspections, operations reports
lessons learned, after-action reviews, meeting minutes, FAQs
policies, procedures, training manuals, standards, best practices
research notes, journal articles
newsletters, bulletins, press releases
ads, brochures, data sheets, technical notes, case studies, price lists
checklists, templates, forms, logos, branding
software, database forms
What controlled vocabularies are being used?
[Bar chart. Axis labels as extracted: ERP; LDAP; Business Process; ISO 3166 / Language Codes. Bar values as extracted: 57%, 29%, 14%, 43%.]
Base: 20 corporate information managers
Source: CEN/ISSS Workshop on Dublin Core – Guidance information for the deployment of Dublin Core metadata in Corporate Environments
Agenda
Myth #1: The Web has changed everything
Myth #2: Taxonomies are monolithic hierarchies
Myth #3: Literary warrant
Myth #4: Knowledge workers
[Pie chart segments: Searching, Creating, Communicating]
Knowledge workers spend up to 2.5 hours each day looking for information …
… But find what they are looking for only 40% of the time.
— Kit Sims Taylor
[Pie chart segments: Creating new content, Recreating existing content, Searching, Communicating. Values as extracted: 26%, 9%.]
Knowledge workers spend more time re-creating existing content than creating new content
— Kit Sims Taylor
High cost of not finding information
“The amount of time wasted in futile searching for vital information is enormous, leading to staggering costs …”
— Sue Feldman,
High cost of poor classification
Poor classification costs a 10,000-user organization $10M each year, about $1,000 per employee.
— Jakob Nielsen, useit.com
Opportunities and challenges
80% of enterprise data is unstructured.
Outputs from back office systems are documents: queries & reports.
Avoiding unnecessary recreation of content.
Enabling decision-making transparency.
Promulgating policies & guidelines.
Managing intellectual property.
Supporting products & services throughout their life cycle: development, marketing, sales & support.
Productivity, loyalty, and revenue have provided the ROI
Intranet has provided the best ROI
Intranet
Web/online customer sales
Web dev infrastructure
Middleware to link Web to ERP
e-billing/payment systems
Web/online business sales
Wireless Web access
Extranet/supply chain
e-marketplace/ portal
None
Joseph A. Busch
+ 415-377-7912
[email protected]
http://ww.taxonomystrategies.com