Creating Atomic Content For Taxonomy / Content Database Driven Documents
Mark Cashman
atomic
adj. [from Gk. `atomos', indivisible] 1. Indivisible; cannot be split up.
Copyright © 2002 by Mark Cashman
The Documentation Problem
Regardless of what you are documenting, you face numerous challenges…
– A need for consistency – to achieve brand identity, clarity of presentation, or ease of navigation and understanding.
– The requirement to organize and link complex and interrelated content into a coherent whole.
– Controlling and leveraging the duplication of content across documents and projects to attain consistency, minimize effort, and maximize economy.
– Using the same material in a different order or combination for different applications.
– Multiple levels of detail or different perspectives on the same material to suit different audiences, media, or connection bandwidths.
The Many Uses Of Content
Multiple media
Various bandwidths / size constraints
Various levels of detail
Need for translations
Some Terms
Content
– Text, Image, Audio, Video
– Spreadsheet
– Database rows / select statement
– … anything
Metadata
– Data about content
– Name, Description, Author, Production Date, Licensing, Embargo, … anything
Ingest
– The process of putting content into a system and creating its metadata
Repurposing
– Converting content to a different medium or level of detail for a new use
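As a minimal sketch of how these terms fit together, assuming a simple in-memory store: the names `Metadata`, `ContentItem`, `Repository`, and `ingest` are illustrative and do not come from the presentation, and the field types are assumptions.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, Optional


@dataclass
class Metadata:
    """Data about content. Field names follow the slide; types are assumed."""
    name: str
    description: str
    author: str
    production_date: date
    licensing: str = "unrestricted"
    embargo_until: Optional[date] = None


@dataclass
class ContentItem:
    """Content can be text, an image, a database query, or anything else."""
    payload: bytes
    media_type: str  # e.g. "text/plain" or "image/jpeg"
    metadata: Metadata


@dataclass
class Repository:
    items: Dict[str, ContentItem] = field(default_factory=dict)

    def ingest(self, payload: bytes, media_type: str, metadata: Metadata) -> str:
        """Ingest: put content into the system and create its metadata."""
        item_id = f"item-{len(self.items) + 1}"
        self.items[item_id] = ContentItem(payload, media_type, metadata)
        return item_id


repo = Repository()
summit_id = repo.ingest(
    b"The summit ledge faces west over the valley.",
    "text/plain",
    Metadata("Summit ledge", "View from the summit ledge",
             "Example Author", date(2002, 1, 1)),
)
```

Keeping the metadata in its own structure means it can be searched or updated without touching the content payload.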
More Terms
Storage Format
– Raw text
– HTML
– XML
– Database text
– Postscript
– … anything
Delivery Medium
– Paper
– Web, high bandwidth, low bandwidth, wireless
– TV
Classification
– Identifying content as a member of a class
– Class has a name and a meaning and a relationship with other classes
Taxonomy
– A hierarchy or network of classifications
– Can be ad hoc or standardized
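The definitions of classification and taxonomy can be sketched as a small graph of named classes. This is only one possible representation; the `Taxonomy` class, its method names, and the sample classes are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Set


class Taxonomy:
    """A network of named classes; a hierarchy is the case of one parent each."""

    def __init__(self) -> None:
        self.meaning: Dict[str, str] = {}
        self.broader: Dict[str, Set[str]] = defaultdict(set)

    def add_class(self, name: str, meaning: str, *parents: str) -> None:
        """Register a class with its meaning and its related broader classes."""
        self.meaning[name] = meaning
        self.broader[name].update(parents)

    def ancestors(self, name: str) -> Set[str]:
        """All classes reachable by following 'broader' relationships."""
        seen: Set[str] = set()
        stack = list(self.broader[name])
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(self.broader[current])
        return seen


taxonomy = Taxonomy()
taxonomy.add_class("Trails", "Marked hiking routes")
taxonomy.add_class("Sights", "Points of interest")
taxonomy.add_class("Waterfalls", "Falls visible from a trail", "Sights", "Trails")
```

Because a class may have more than one broader class, the same structure covers both the hierarchy and the network cases.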
Content Workflow
Find out what you need. Get it from the right source – in-house or external. Bring it into a system so it will be available. Make sure it is right for the application and any legal or regulatory constraints. Index and add metadata to it so it can be found. Release it. Publish it. Be prepared to revise it.
[Workflow diagram. Stages shown: Requirement Definition, Sourcing, Assignment, Purchase, Creation, Editing, Approval, Indexing, Ingest, Integration, Release, Publish, Need for Revision]
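One way to enforce a workflow like this is a small state machine. The sketch below follows the order of the prose description rather than the diagram, whose exact arrows are not recoverable here; the `Stage` names, the `NEXT` table, and the loop from revision back to sourcing are assumptions.

```python
from enum import Enum


class Stage(Enum):
    REQUIREMENT_DEFINITION = "requirement definition"
    SOURCING = "sourcing"
    INGEST = "ingest"
    APPROVAL = "approval"
    INDEXING = "indexing"
    RELEASE = "release"
    PUBLISH = "publish"
    REVISION = "need for revision"


# Allowed transitions, following the order of the prose description.
# The loop from revision back to sourcing is an assumption.
NEXT = {
    Stage.REQUIREMENT_DEFINITION: {Stage.SOURCING},
    Stage.SOURCING: {Stage.INGEST},
    Stage.INGEST: {Stage.APPROVAL},
    Stage.APPROVAL: {Stage.INDEXING},
    Stage.INDEXING: {Stage.RELEASE},
    Stage.RELEASE: {Stage.PUBLISH},
    Stage.PUBLISH: {Stage.REVISION},
    Stage.REVISION: {Stage.SOURCING},
}


def advance(current: Stage, target: Stage) -> Stage:
    """Move to the target stage, or raise if the workflow does not allow it."""
    if target not in NEXT[current]:
        raise ValueError(f"cannot go from {current.value} to {target.value}")
    return target
```

For example, `advance(Stage.SOURCING, Stage.INGEST)` succeeds, while trying to publish content that has only been ingested raises `ValueError`.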
Implications for Content Management
Centralized storage and management of content to facilitate search and reuse.
Computer-based storage, indexing and metadata to provide rich search terms and rapid, context-sensitive recombination.
Delivery-media-independent content storage format.
Ability to apply translation and media specialization to content as needed.
Consistent and automatic updating of all accessible presentations of content when changes are made (may require republishing of print, broadcast and presentations).
Organization separated from content.
Implications for Content
Finest possible granularity.
Ability to be classified by hand or automatically into multiple rich categories.
Independent of other, related content items.
Multiple levels of detail for the same content item.
Multiple representations for the same content item with metadata allowing appropriate selection of representation based on destination media and other constraints.
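Selecting a representation from metadata might look like the following sketch. The `Representation` fields, the "largest that fits" rule, and the file names are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Representation:
    uri: str
    medium: str      # e.g. "web" or "print"
    size_bytes: int


def select_representation(
    candidates: List[Representation], medium: str, max_bytes: int
) -> Optional[Representation]:
    """Pick the largest representation that fits the medium and size budget."""
    fitting = [
        r for r in candidates
        if r.medium == medium and r.size_bytes <= max_bytes
    ]
    return max(fitting, key=lambda r: r.size_bytes, default=None)


ledge_photo = [
    Representation("ledge-full.jpg", "web", 400_000),
    Representation("ledge-small.jpg", "web", 40_000),
    Representation("ledge-print.tif", "print", 9_000_000),
]
```

With a 100,000-byte budget on the web, the small image is chosen; with no fitting candidate, the function returns `None` so the caller can fall back to a text description.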
Types of content organization
Sequential ordering (and reverse).
Context sensitive ordering.
Hierarchical linking.
Network linking.
Heterogeneous organization (several of the above).
Sequencing Content Navigation
Manual, fixed (books, papers).
Manual, embedded (HTML, indexes).
Manual, envelope (Search engines with manual classification, taxonomy driven websites with manual classification).
Automatic, embedded (Database driven HTML with keyword driven links).
Automatic, envelope (Search engines with keyword classification, taxonomy driven websites with keyword classification).
Repositories for Content
Source files in directory structure with metadata database.
Content database – content stored in or referenced by database.
Content and taxonomy database – content and classification stored in or referenced by database.
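A content and taxonomy database could be laid out roughly as below, using Python's built-in `sqlite3` module. The table and column names are assumptions, not the schema of any system described in this presentation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE content (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    description TEXT,
    body TEXT,           -- text content stored in the database
    external_uri TEXT    -- other content referenced by the database
);
CREATE TABLE classification (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE content_classification (
    content_id INTEGER NOT NULL REFERENCES content(id),
    classification_id INTEGER NOT NULL REFERENCES classification(id),
    PRIMARY KEY (content_id, classification_id)
);
""")

conn.execute(
    "INSERT INTO content (id, name, body) VALUES (?, ?, ?)",
    (1, "Summit ledge", "Open ledge facing west."),
)
conn.executemany(
    "INSERT INTO classification (id, name) VALUES (?, ?)",
    [(1, "Example Trail"), (2, "Views")],
)
conn.executemany(
    "INSERT INTO content_classification VALUES (?, ?)", [(1, 1), (1, 2)]
)


def items_in_class(class_name: str) -> list:
    """Return the names of all content items classified under class_name."""
    rows = conn.execute(
        """SELECT content.name FROM content
           JOIN content_classification AS cc ON cc.content_id = content.id
           JOIN classification ON classification.id = cc.classification_id
           WHERE classification.name = ? ORDER BY content.name""",
        (class_name,),
    )
    return [name for (name,) in rows]
```

The join table is what lets one content item appear under several classifications without being duplicated.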
An Example
New England Trail Review.com
– Network taxonomy driven
– Content / taxonomy database, text content internal, other content external
– Multiple representations for content
– Display templates control page content
– Look and feel table controls page colors and common text / images
Principles are universal
– Documentation of trails is sequential.
– Documentation of special sights is non-sequential.
– Common documentation elements for specific trails.
– Related material on a class and a content item level.
A Classified Item of Image Content
Metadata
– Name
– Description
– Type
Text Detail
Abstract Item
Content Item
Whole Content Classification
Sub element Classification
A Classified Item of Text Content
Metadata
Text Detail Is The Content
Multiple Uses
A Presentation Template
Driven by classification of content.
Flexible in accepting multiple items where appropriate.
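A presentation template of this kind can be sketched with the standard library's `string.Template`. The template text, the `render_class_page` function, and the sample item are hypothetical, and HTML escaping is left out for brevity.

```python
from string import Template
from typing import Dict, List

# A hypothetical display template for one class of content.
PAGE = Template("<h1>$title</h1>\n$items")
ITEM = Template('<img src="$uri" alt="$description">')


def render_class_page(title: str, items: List[Dict[str, str]]) -> str:
    """Render every item in a class; the template accepts one item or many."""
    rendered = "\n".join(ITEM.substitute(item) for item in items)
    return PAGE.substitute(title=title, items=rendered)


page = render_class_page(
    "Views",
    [{"uri": "ledge-small.jpg", "description": "Open ledge facing west"}],
)
```

The page is chosen by the class being displayed, and the item template is applied to however many items that class contains.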
Alternate Views Of Content 1
Full size images, paged, for high bandwidth connections
All images have description as the ALT text, for use by screen readers
Alternate Views Of Content 2
Small images, paged, for lower bandwidth connections
Entry point to lowest bandwidth, one full size image per page view
All images have description as the ALT text, for use by screen readers
Alternate Views Of Content 3
Single image per page, for lowest bandwidth
All images have description as the ALT text, for use by screen readers
What’s Wrong With This Content?
Use of sequencing words (“after”, “descending”, “soon”) prevents reversing the trail.
References to other steps on the trail make the item confusing if it is classified in an additional category.
Multiple Classifications For The Same Item 1
Location independent
Sequence independent
Relies on ability of reader to order the content by its sequence on the page.
Multiple Classifications For The Same Item 2
Works in the alternate context
Plays well when sequence doesn’t matter or when it does.
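One way to let the same item work in both contexts is to keep the sequence in the classification membership rather than in the text. In this sketch the `MEMBERS` table, the position numbers, and the item names are all illustrative assumptions.

```python
from typing import Dict, List, Optional, Tuple

# Each membership is (item name, position); position is None when the class
# has no inherent order. All names here are illustrative.
MEMBERS: Dict[str, List[Tuple[str, Optional[int]]]] = {
    "Example Trail": [("Brook crossing", 1), ("Summit ledge", 3), ("Old mill", 2)],
    "Views": [("Summit ledge", None)],
}


def items_for(class_name: str, reverse: bool = False) -> List[str]:
    """List a class's items, ordered by position when the class is sequential."""
    members = MEMBERS[class_name]
    if all(position is not None for _, position in members):
        members = sorted(members, key=lambda m: m[1], reverse=reverse)
    return [name for name, _ in members]
```

Calling `items_for("Example Trail", reverse=True)` walks the trail in the opposite direction without any change to the text of the items, which is only possible if the text itself contains no sequencing words.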
Evolution Of A Taxonomy
Taxonomies will change over time.
Content must be adaptable to new classifications.
Destroying a category can be dangerous if outsiders can bookmark based on a category. Think about your audience.
A purchased or standard taxonomy generates a tension between stability and flexibility.
Ad hoc categories will appear. They may or may not be justified, and can corrupt the taxonomy.
Library science and scientific taxonomists can and should help establish and evolve your content taxonomy.
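The bookmark risk can be reduced by retiring categories through redirects instead of deleting them outright. This is a sketch of that idea, not something the presentation prescribes; the `REDIRECTS` table and category names are hypothetical.

```python
from typing import Dict

# Retired category -> replacement category (hypothetical names).
REDIRECTS: Dict[str, str] = {
    "Scenic Spots": "Views",
    "Views": "Sights",
}


def resolve_category(name: str) -> str:
    """Follow redirects so old bookmarks land on the current category."""
    seen = set()
    while name in REDIRECTS:
        if name in seen:
            raise ValueError(f"redirect loop at {name!r}")
        seen.add(name)
        name = REDIRECTS[name]
    return name
```

A bookmark to a category that was renamed twice still resolves, and a category that was never retired resolves to itself.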
Atomic Content Databases For Knowledge Management
Classified content can be the core of a knowledge management system.
Atomic text fragments are low overhead for SMEs (Subject Matter Experts) to produce.
Atomic text and images can be extracted from existing documents through automatic processes, but may require SME intervention to atomize fully.
Newsgroups and email can be mined for atomic text.
Artificial intelligence classifiers may be able to generate initial taxonomies for large bodies of atomic text, but a human agent must also be involved.
Text Content Challenges
Must be independent of sequencing.
Must be independent of context to allow multiple classifications.
Must contain terms that justify each selected classification.
Must be short to maximize reuse.
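The requirement that text contain terms justifying each classification can be checked mechanically. The following is a minimal keyword-matching sketch; the `CLASS_TERMS` lists are invented for illustration, and a real system would likely need stemming and richer term lists.

```python
import re
from typing import Dict, Set

# Hypothetical keyword lists for each class.
CLASS_TERMS: Dict[str, Set[str]] = {
    "Views": {"view", "overlook", "ledge"},
    "Water": {"brook", "waterfall", "pond"},
}


def justified_classes(text: str) -> Set[str]:
    """Return the classes for which the text contains at least one term."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return {name for name, terms in CLASS_TERMS.items() if words & terms}
```

A fragment assigned to a class that does not appear in the result is a candidate for rewriting or reclassification.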
Text Content Guidelines
Never say “soon”, “later”, “before”, etc.
Use orientation-independent directions: “north” and “south” rather than “left” or “right”, “south-sloping” rather than “uphill” or “descending”.
Let the name state the primary context (such as where the image was photographed).
Use a description that would make sense to someone who might not be able to see well, so they can still use the images.
Think about every aspect of what the content refers to and write it with a view toward potential future classifications.
Keep it between one and three paragraphs in length.
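Some of these guidelines can be checked automatically before ingest. The sketch below flags sequencing and orientation-dependent words and counts paragraphs. The word lists start from the words named on these slides; "next", "then", "downhill", and "ascending" are assumed additions, and simple word matching will produce false positives (for example "right" in "right of way").

```python
import re
from typing import List

# Words taken from the guidelines, plus a few assumed extras.
SEQUENCING = {"soon", "later", "before", "after", "next", "then"}
ORIENTATION = {"left", "right", "uphill", "downhill", "descending", "ascending"}


def check_atomic_text(text: str) -> List[str]:
    """Return a list of guideline problems found in a text fragment."""
    problems = []
    words = set(re.findall(r"[a-z]+", text.lower()))
    for word in sorted(words & SEQUENCING):
        problems.append(f"sequencing word: {word}")
    for word in sorted(words & ORIENTATION):
        problems.append(f"orientation-dependent word: {word}")
    # Paragraphs are assumed to be separated by blank lines.
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    if not 1 <= len(paragraphs) <= 3:
        problems.append(f"{len(paragraphs)} paragraphs; expected one to three")
    return problems
```

A check like this only catches wording; whether a fragment is truly independent of context still needs a human reviewer.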
Purchased Content Challenges
Third party content producers do not think in terms of atomic content. Their content will be large grained.
Licensing restrictions may prevent “busting up” content for reuse.
Content may have internal links or links to other large grained content it depends on.
Content will not be written in a way that makes it easy to extract pieces and use them separately.
Purchased Content Guidelines
Select content based on how fine-grained it is.
Negotiate contracts to allow repurposing and splitting up of content.
Avoid heavily interlinked content, or accept the internal cost of turning links into classifications.
Negotiate contracts which allow modification of the content for atomic reuse.
Enabling Technologies
– Digital Asset Management Systems
– Search Engines
– Predefined and standard taxonomies
– XML / XSL
– Database Management Systems
– Workflow Systems
– Web Presentation Systems
Integration is required and expertise is not widespread.
Summing Up
Atomic content facilitates a wider range of reuse and repurposing than large grained content.
Context and delivery medium independence is important for maximal reuse and repurposing.
Databases for content and metadata are critical to the reuse of large bodies of atomic content.
Taxonomies can be created or purchased, and are also critical to reuse.
Ingest is the most expensive part of dealing with atomic content.
Training and breaking old habits are the hardest parts of creating atomic copy.
A variety of technologies exist to aid in supporting atomic content and taxonomy driven communication efforts, but integration is required.