Content Categorization Tools Taxonomies & Technologies for Infrastructure Solutions Tom Reamy Chief Knowledge Architect KAPS Group Knowledge Architecture Professional Services http://www.kapsgroup.com

Content Categorization Tools
Taxonomies & Technologies for Infrastructure Solutions

Tom Reamy, Chief Knowledge Architect

KAPS Group

Knowledge Architecture Professional Services

http://www.kapsgroup.com

Agenda

KAPS Group & Categorization Research
The Answer is Taxonomy. What is the Problem?
Machine Categorization
– Companies, Methods, Directions
The Place of Taxonomy in the Enterprise
– Taxonomy as an infrastructure activity
– Foundation for Content Management, Search, Portals, Smart Applications

KAPS Group

KAPS Background
– Knowledge Architecture Consultants
– Organize and contextualize content, communities, and tasks
– Professional Services partner to Categorization Companies
Categorization research
– Evaluated 20+ companies
– More companies, more new technologies
– The answer is categorization, not Google

The Answer is Taxonomy. What is the Problem?

Professionals spend more time looking for information than using it
Professionals spend up to 2 hours a day searching
Corporate Intranets Survey
– Can’t find anything
– Search Stinks - Can’t find good content
– No good content

The Answer is Taxonomy. What is the Problem?

Infoglut: More information is generated every day in modern companies than is contained in our entire corpus from the Athenian golden age.

Quantity of information overwhelms our ability to present and classify it.

Search is not enough.
– Humans search concepts, not strings

A Modest Proposal: A Solution to Infoglut

Bury all new content for 2,500 years
Lose most new content in a library fire
Unless you can convince a group of monks that your content is worth copying, it gets tossed
Dark Ages Solution: Stop writing for a thousand years

Infoglut: A Really Radical Solution

Hire librarians, editors, information architects to categorize your content
OR
Develop technologies that:
– support and enhance the ability of authors and editors to characterize content
– enhance the ability of users to find content
AND
Create a hybrid human/automatic solution

New Technologies: Categorization Explosion

Autonomy, Semio, Verity, Inxight, Topical Net, Mohomine, LingoMotors, H5 Technologies, YellowBrix, Entopia, Bridgewell, MetaTagger, Applied Semantics, Sageware, SmartLogik, Inktomi/Quiver, Stratify, Vivisimo, Textology, Other - Tacit

04/19/23 Inxight Confidential

Auto-Categorization: Methods

– Semi-Automatic: Rules, If-Then
• Maximum precision & flexibility
– Catalog by Example: Bayesian, SVM, Neural
• Training Sets (5-500)
• Speed, Learning
– Statistical Clustering
• Set of Documents & Taxonomy Level
– Semantic Analysis & World Knowledge
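
The first two methods above can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's implementation: the rule set, the category names, and the training examples are all hypothetical, and the "catalog by example" half uses a plain multinomial naive Bayes with add-one smoothing as one possible Bayesian approach.

```python
import math
from collections import Counter, defaultdict

# Hypothetical if-then rules: a category fires when any of its keywords appears.
RULES = {
    "Legal": {"contract", "policy", "compliance"},
    "Product": {"feature", "release", "pricing"},
}

def tokenize(text):
    """Lowercase words with surrounding punctuation stripped."""
    words = (w.strip(".,;:!?()").lower() for w in text.split())
    return [w for w in words if w]

def categorize_by_rules(text):
    """Semi-automatic method: return every category whose rule fires."""
    words = set(tokenize(text))
    return sorted(cat for cat, keywords in RULES.items() if words & keywords)

class NaiveBayesCategorizer:
    """Catalog by example: learn word statistics from labeled documents."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # category -> word frequencies
        self.doc_counts = Counter()              # category -> number of examples
        self.vocabulary = set()

    def train(self, examples):
        for text, category in examples:
            words = tokenize(text)
            self.word_counts[category].update(words)
            self.doc_counts[category] += 1
            self.vocabulary.update(words)

    def categorize(self, text):
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        best_category, best_score = None, -math.inf
        for category in sorted(self.doc_counts):
            counts = self.word_counts[category]
            total_words = sum(counts.values())
            # Log prior plus smoothed log likelihood of each word.
            score = math.log(self.doc_counts[category] / total_docs)
            for word in tokenize(text):
                score += math.log((counts[word] + 1) / (total_words + vocab_size))
            if score > best_score:
                best_category, best_score = category, score
        return best_category

# Hypothetical training set; real systems would use far more examples.
TRAINING = [
    ("contract terms and compliance policy", "Legal"),
    ("legal policy for vendor contract", "Legal"),
    ("new feature in the product release", "Product"),
    ("pricing and feature comparison", "Product"),
]
model = NaiveBayesCategorizer()
model.train(TRAINING)
```

The rules give precise, explainable results but must be written by hand; the trained model needs only labeled examples but its decisions are harder to inspect, which is one reason the two are often combined.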

Origins of Auto-Categorization

News Feeds and Content providers
• Uniform content, size and structure
• Professional writers
• Simple or standard vocabulary
Corporate intranet
• Wildly varied content
• Mix of good, bad, and ugly writers
• Tower of Babel: Acronyms, special meanings

New Technologies: The Human Element

Automatic Categorization is Not
Humans are better, but not as consistent
– Bring outside contexts to the document
• Purpose, similar documents, common sense
– Understandable mistakes
Computers are faster and cheaper
– Faster yes, Cheaper?
– Cost of poorer quality categorization
• Intranet: 20,000 users taking 60 seconds longer = $20,000 a week
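
The slide gives the user count, the delay, and the weekly total, but not the rate behind it. As a rough check, the sketch below reproduces the $20,000 figure under two assumptions that are not in the original: a loaded labor cost of $60 per hour, and one slowed search per user per week.

```python
def weekly_search_cost(users, extra_seconds, hourly_rate, searches_per_week=1):
    """Weekly cost of the time lost when each search takes longer."""
    # Multiply first and divide last so the round numbers stay exact.
    return users * searches_per_week * extra_seconds * hourly_rate / 3600

# 20,000 users and 60 extra seconds come from the slide.
# The $60/hour rate and one affected search per week are assumptions.
cost = weekly_search_cost(users=20_000, extra_seconds=60, hourly_rate=60.0)
print(f"${cost:,.0f} per week")
```

With more than one slowed search per user per week, the same total is reached at a proportionally lower hourly rate.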

The Best Answer is Hybrid or Cyborg Categorization
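
One common way to build such a hybrid, sketched here as an assumption rather than as the approach the slides prescribe, is to accept automatic suggestions above a confidence threshold and queue the rest for a human. The `Suggestion` type, the file names, and the 0.8 threshold are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Suggestion:
    document: str
    category: str
    confidence: float  # 0.0 to 1.0, as reported by some automatic categorizer

def route(suggestions, threshold=0.8):
    """Split suggestions into auto-accepted ones and ones needing human review."""
    accepted, needs_review = [], []
    for s in suggestions:
        (accepted if s.confidence >= threshold else needs_review).append(s)
    return accepted, needs_review

# Hypothetical output of an automatic categorizer.
suggestions = [
    Suggestion("travel-policy.doc", "Legal / Policy", 0.95),
    Suggestion("q3-notes.doc", "Product", 0.55),
]
accepted, needs_review = route(suggestions)
```

Raising the threshold sends more documents to people and improves quality at the cost of speed; lowering it does the reverse.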

Summary

No clear leader in categorization
No one has it all.
Immature industry and pent up demand
No out of the box solutions: Support Distributed Hybrid
Look for
• Advanced Algorithms
• Clustering, Auto-Summarization, noun phrase extraction
• World Knowledge, import public & custom taxonomies
• Integration – rules, metadata, components & product
• CM, Search, Portals, Expertise, Collaboration, Applications

Location of Taxonomy in the Enterprise: An Infrastructure Activity

3 Infrastructures: Technological, Organizational, Intellectual

Technology
• $Millions and 1,000’s of people
Organizational
• Recognized Value
• Fundamental to business activity
Intellectual
• A couple of librarians
• No budget
• First to be laid off

Creating an Intellectual Infrastructure

Knowledge Audit / Knowledge Map
Knowledge Creating
– Innovation, Content Management, E-learning
Knowledge Sharing / Transmission
– Collaboration, Retrieval - content, experts
Knowledge Using
– Smart Applications, CRM, Portals
Knowledge Architecture People

Content Management and Taxonomy

Taxonomic Publishing Model
– Publish by Category, not web site
– Web Site the wrong unit of organization
Distributed Work Flow
• Collaborative Categorization and keywords by Subject Matter Experts, aided by software
Content Re-Organization
– Rich Web of Related Content
• Basic information + contexts
Content Re-Organization: Next Steps
– Document can be wrong unit of organization

Taxonomy and Search

Knowledge Retrieval: Information + Contexts
Information Retrieval: ProductName
– List of Documents, ranked by frequency of keyword
Knowledge Retrieval: ProductName
– Personal & Community & Historical Filters
– List of Documents – about product
– Categorized list:
• Features of Product
• Comparisons of Products
• Legal / Policy documents
• Activities associated with product
– Background Resources
• Glossaries, Communities
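
The contrast between the two result styles can be sketched as follows. The documents, their texts, and their category tags are invented for illustration, and the sketch covers only the keyword-ranked list and the categorized list; the personal, community, and historical filters are left out.

```python
import re
from collections import defaultdict

# Hypothetical documents; the category tags are assumed to come from a taxonomy.
DOCUMENTS = [
    {"title": "Widget feature overview", "category": "Features of Product",
     "text": "The Widget indexes content. Widget setup is simple."},
    {"title": "Widget compared with Gadget", "category": "Comparisons of Products",
     "text": "Widget and Gadget differ in price. Widget is faster, and Widget is cheaper."},
    {"title": "Widget license terms", "category": "Legal / Policy documents",
     "text": "License terms for the Widget."},
    {"title": "Cafeteria menu", "category": "Other",
     "text": "Soup and salad."},
]

def keyword_count(text, keyword):
    """Number of times the keyword appears as a whole word, ignoring case."""
    return re.findall(r"[a-z0-9]+", text.lower()).count(keyword.lower())

def information_retrieval(keyword):
    """Flat list of matching titles, ranked by keyword frequency."""
    scored = [(keyword_count(d["text"], keyword), d) for d in DOCUMENTS]
    hits = [(n, d) for n, d in scored if n > 0]
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [d["title"] for _, d in hits]

def knowledge_retrieval(keyword):
    """The same matches, grouped under their taxonomy categories."""
    category_of = {d["title"]: d["category"] for d in DOCUMENTS}
    grouped = defaultdict(list)
    for title in information_retrieval(keyword):
        grouped[category_of[title]].append(title)
    return dict(grouped)
```

Calling `information_retrieval("widget")` returns one ranked list, while `knowledge_retrieval("widget")` returns the same titles keyed by category, which is what lets a user go straight to comparisons or policy documents.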