Presented by Joe Gollner at Documentation and Training West, May 6-9, 2008 in Vancouver, BC. This workshop introduces a proven framework for discussing, designing, developing and deploying content management and processing solutions. With the growing demands being placed on organizations to provide better content that is precisely tailored to a user's needs, it has become essential that a new level of rigor and discipline be applied to how these types of solutions are built. This establishes the pressing need for "Content Engineering".
An Introduction to Content Engineering
Copyright © Stilo International plc 2008
Joe Gollner VP e-Publishing [email protected]
Introduction to Content Engineering: Topics
What is Content?
Content Engineering & the Content Processing Roadmap
The Business Context of Content Engineering
Aims:
Establish the nature of, and need for, Content Engineering
Define a rubric of terminology for the tools and techniques that constitute a practical working framework for discussing, designing, developing and deploying content management and processing systems
What is Content?
Content is how we Communicate
Content is the physical form of human communication
Content is meaningful because it entails context
Narrative Structures
Implied Associations
Associative Memory
Acquired Perspectives
Imperfect Expression
Associative Memory
Acquired Perspectives
Imperfect Interpretation
Content is typically serialized due to the ways we express, store and interpret information
Content Accumulates
Content exchanges are infinitely multi-directional
Content is innately complex and the rate of increase in complexity is unbounded
Content persists over time and can be transmitted over distances
The Document as the Popular Face of Content
The document has proven to be a powerful device for communicating and retaining content
While documents provide effective physical containers for content, they also lead to multiple modes of exchange and potential obsolescence
Content is Everywhere
This has been true since the dawn of civilization and its importance grows daily
Content populates an ecosystem where people receive, internalize, modify, create and share that content. Content connects everything.
What is Changing about Content?
Remarkably, a great deal
Why remarkably?
Content has been evolving for millennia
There have been large revolutions in the past
The emergence of writing
Gutenberg Printing Press
Telegraph / Telephone
Computer Telecommunications
The Evolution of the Web & the Convergence of Media
Changed historical forms for storing & exchanging content
Introduced new levels of immediacy, universality & persistence
Created new forms of content containers
Blog / Wiki / Podcasting / YouTube
Personal communication platforms (knowledge appliances)
Content Trends
Volumes are increasing
A study at Berkeley measured the amount of new information created in 2002 as 5 exabytes (equal to 37,000 Libraries of Congress)
Electronic accessibility is the norm
The same study found that over 92% of all new content was stored electronically
This content is coming online (the Deep Web)
Variety of formats is increasing
Video / Audio / Rich formats are gaining ground
More sophisticated formats emerging
XML-based specialist protocols
Shared application components
The Truth about Content
We are faced with:
Massively expanding content volumes
Diversifying venues for content delivery
Proliferating format varieties
Rising expectations of users
Escalating specialization of content
Evolving interconnectedness of content
Multiplying problems related to content security
Continuing lifecycle challenges (obsolescence remains a risk)
Increasing complexity of content (the reintegration of data & documents)
Growing recognition of the central importance of content
What Lies Ahead?
What are the biggest challenges you face today in managing and using content?
What do you suspect will be the biggest challenge you will be facing in the next five years?
What are the opportunities emerging to leverage content in your business?
An Essential Response: Content Engineering
Working Definition
The application of rigorous engineering discipline to the design, development and deployment of content management and processing systems
Distinguishing Features
Systematic approach
Progressive use of technology
Awareness of
Lifecycle considerations
Total cost of ownership
Solution scalability
Engineering and Content
Organizing work
Laying out work spaces
Sequencing of process steps
Optimizing tasks
Refining tools
Improving materials
Transferring results between stages
Sharing resources
Performing maintenance
Troubleshooting problems
Differential Analyzer – Vannevar Bush (1930s)
Content Engineering
Governing discipline
Goal-directed
Content Management
Protect Value
Content Processing
Enhance Value
People
Create Value
Planning
Designing
Authoring
Classifying
Content Management Components
Content Management
Control
Organize resources, access and lifecycle
Change
Facilitate the evolution of content and the associated services
Deploy
Enable the services the content makes possible
Control Change Deploy
Content Management and Content Processing
A Close Relationship
CM cannot exist without content processing services
Expanding CM services demands more processing
The sophistication of the processing functions increases more rapidly than management functions
Many CMS solutions are constrained by weak content processing capabilities
Content Processing Components
Content Processing
Convert
Transform
Publish
Key Focus in Content Engineering
Content Processing Components
Content Processing
Convert
Transform
Publish
Transformation
Breaks down into
Refactor
Relate
Collect
Resolve
Compile
Emphasis on leveraging efficient automation
The Content Processing Roadmap
                    ACQUIRE    ENRICH     DELIVER
CONTEXT             Import     Metadata   Select
Content Processing  Convert    Collect    Compile
CONTENT             Import     Manage     Select, Publish
Content Processing  Refactor   Relate     Resolve
CONNECTIONS         Import     Links      Select
Content Processing Roadmap
Simplified view of content processing systems
Presents the lifecycles of and interconnections between:
Context
Content
Connections
Introduces three lifecycle stages
Acquire
Enrich
Deliver
The Roadmap is an Activity Diagram
Represents what needs to happen & the relationships between them
Convert Content
[Figure: Content Processing Roadmap, Convert activity]
Converting Content
Conversion: changing the format of legacy content to make it increasingly suitable for efficient management, revision, reuse and publishing.
Conversion Process Template
[Figure: conversion process flow. Recoverable labels: Source Analysis; Source to Target Mapping; Subject Matter Experts Interaction; Legacy Source Content; Target XML Schema; Guidance; Existing Conversion Rules; Modify Conversion Process; Modified Conversion Rules; Manual Editing; Execute Conversion Process; Result Analysis; Identified Issues; Validation & Verification; Application Tests; 1 Example Set; 2 Sample Set 10%; 3 Complete Set 100%; Complete]
Refactor Content
[Figure: Content Processing Roadmap, Refactor activity]
Refactoring Content
Refactoring: restructuring content, without loss of meaning, to improve its suitability for management, maintenance and specifically reuse. Refactoring entails two activities: bursting & normalization
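The two refactoring activities can be pictured with a minimal sketch. It assumes that bursting means splitting a document into separately managed components and that normalization means storing identical components once and reusing them by reference; the slides do not define either term further. The element names (manual, section) and the functions burst and normalize are hypothetical and chosen only for illustration.

```python
import hashlib
import xml.etree.ElementTree as ET

# Hypothetical legacy document in which one section is repeated verbatim.
SOURCE = """<manual>
  <section id="s1"><title>Safety</title><p>Wear gloves.</p></section>
  <section id="s2"><title>Setup</title><p>Attach the base.</p></section>
  <section id="s3"><title>Safety</title><p>Wear gloves.</p></section>
</manual>"""


def burst(xml_text, unit="section"):
    """Bursting: split a document into (original id, serialized component) pairs."""
    root = ET.fromstring(xml_text)
    parts = []
    for element in root.iter(unit):
        original_id = element.attrib.pop("id")  # ids differ even when content repeats
        element.tail = None  # drop whitespace that follows the element
        parts.append((original_id, ET.tostring(element, encoding="unicode")))
    return parts


def normalize(parts):
    """Normalization: keep each distinct component once, keyed by a content hash."""
    store, references = {}, {}
    for original_id, text in parts:
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]
        store.setdefault(key, text)
        references[original_id] = key
    return store, references


store, references = normalize(burst(SOURCE))
```

Here sections s1 and s3 resolve to the same stored component, which is the reuse that refactoring aims for. Comparing serialized text is a deliberately naive equivalence test; a real solution would likely need a more tolerant comparison.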
Refactoring Strategies
Strategy needed to ensure adequate returns on investment
Approach must balance cost, risk, effort and time in a practical way
[Figure labels: Conversion Outputs; Compare Outputs]
Collect Metadata
[Figure: Content Processing Roadmap, Collect activity]
Collecting Metadata
Metadata: a set of data that provides information about other data.
Collecting Metadata: extracting, validating, integrating, supplementing, synchronizing and storing metadata from, and about, the content.
The Function of Metadata
Metadata is used to make the context of content explicit
Used to facilitate Control
Security
Limitation of rights
Orderly storage & retrieval
Discovery
Searching
Navigating
Exchange
Surprisingly important point
The boundary between metadata and content is never completely clear
Yale University Library
Sources of Metadata
Metadata can be supplied from an external source
System data
Captured when content is created / modified
Subject information
Declaring details about the subject matter
Keywords, short descriptions,…
Externally managed data about subject
Author contributions
Annotations, justifications, abstracts,…
Process context (critically important)
Relating content to business process events
Metadata can be extracted from the content
Specific aspects of the content are selected as valuable metadata
Often one of the more precise aspects of subject-specific markup
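The two sources of metadata listed above can be sketched as follows. This is an illustration under assumptions, not a prescribed design: the topic markup, the field names and the functions extract_metadata and collect_metadata are all hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Hypothetical topic with subject-specific markup.
TOPIC = """<topic id="t42">
  <title>Replacing the filter</title>
  <keywords><keyword>filter</keyword><keyword>maintenance</keyword></keywords>
  <body><p>Remove the old filter before fitting the new one.</p></body>
</topic>"""


def extract_metadata(xml_text):
    """Select specific aspects of the content itself as metadata."""
    root = ET.fromstring(xml_text)
    return {
        "id": root.get("id"),
        "title": root.findtext("title"),
        "keywords": [k.text for k in root.iter("keyword")],
    }


def collect_metadata(xml_text, system_data, process_context):
    """Merge externally supplied metadata with metadata extracted from content."""
    record = {"system": system_data, "process": process_context}
    record.update(extract_metadata(xml_text))
    return record


record = collect_metadata(
    TOPIC,
    system_data={"modified": datetime(2008, 5, 6, tzinfo=timezone.utc).isoformat()},
    process_context={"event": "maintenance-review"},
)
```

The process context is passed in rather than derived, reflecting the point that it comes from business process events outside the content.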
The Storage of Metadata
Useful Design Pattern: Detachable Metadata
Key metadata clustered into a document sub-component
Shareable amongst many uses
Incorporated into document when important to do so & only then
Ontologies, Taxonomies & Metadata
The Meaning of Metadata
Metadata categories and values relate content to aspects of an Ontology
The Ontology provides the context for metadata
Ontologies
Describe a domain of knowledge
Can be used as the basis of:
Taxonomies (classification schemes)
Link networks
Context driven navigational aids
Establish Relationships
[Figure: Content Processing Roadmap, Relate activity]
Establishing Relationships
Explicit Links (Actual)
Identifier Source Target Type
A1
A2
Implicit Links (Potential)
Identifier Source Target Type
B1
B2
Reuse Links (Physical)
Identifier Resource Request Condition
R1
R2
Links: the connections or relationships that represent a significant portion of the meaning and value of content
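The three link tables above suggest a simple record structure for links held outside the content. The sketch below is one possible reading: the column names come from the slide, while the class names, the sample values and the interpretation of Request as the requesting component are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """Explicit or implicit link: a typed connection from source to target."""
    identifier: str
    source: str
    target: str
    type: str
    explicit: bool = True  # False marks an implicit (potential) link


@dataclass(frozen=True)
class ReuseLink:
    """Reuse link: a resource pulled into a requesting component under a condition."""
    identifier: str
    resource: str
    request: str  # assumed here to name the component requesting the resource
    condition: str


# Hypothetical entries, one per link kind.
link_base = [
    Link("A1", "topic-install", "topic-safety", "see-also"),
    Link("B1", "topic-install", "topic-tools", "shared-keyword", explicit=False),
    ReuseLink("R1", "warning-gloves", "topic-install", "audience=technician"),
]

explicit_ids = [
    link.identifier
    for link in link_base
    if isinstance(link, Link) and link.explicit
]
```

Keeping these records apart from the content is what lets links be managed and processed in their own right.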
Relationship Considerations
Effective linking is central to content usability & value
Ability to provide content tailored to a specific user context depends on being able to facilitate immediate access to additional information
Linking is highly contextual
Not all relationships are relevant at the same time
How relationships are presented is format and media specific
Often leads to additional rendition requirements for content objects
Multiple renditions of graphics (thumbnail, low-res, high-res)
Links have become acknowledged as First-Class Objects
Subject to specific management and processing measures
Ideally expressed & managed separately from the content (overlays)
Associated with metadata & constituting important content metadata
Link Management
Increasingly important
Increasingly complex
Significant metadata
Significant processing
Leverages external storage of links & link metadata
Link generation becoming critical
Link Analysis:
Outbound Links: Intact or broken
Transclusions: Where used
Inbound Links: Track-back / Where cited
External Links: Network participation
[Figure labels: Link Analysis; Outbound Link; Transclusion Link; Inbound Link; Bidirectional External Link; Link Base]
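Two of the link analysis checks named above, outbound links that are intact or broken and inbound links showing where a component is cited, can be sketched against an external link base. The data and the function analyze_links are hypothetical; a real link base would carry link types and metadata as well.

```python
from collections import defaultdict

# Hypothetical link base: (source, target) pairs held outside the content.
LINKS = [
    ("intro", "setup"),
    ("setup", "safety"),
    ("setup", "missing-topic"),
    ("faq", "safety"),
]
KNOWN_COMPONENTS = {"intro", "setup", "safety", "faq"}


def analyze_links(links, known):
    """Report broken outbound links and a where-cited index of inbound links."""
    broken = [(source, target) for source, target in links if target not in known]
    where_cited = defaultdict(list)
    for source, target in links:
        if target in known:
            where_cited[target].append(source)
    return broken, dict(where_cited)


broken, where_cited = analyze_links(LINKS, KNOWN_COMPONENTS)
```

Because the links live in one place, both reports come from a single pass over the link base rather than from parsing every document.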
Deliver Content
[Figure: Content Processing Roadmap, Deliver stage]
Delivering Content
[Figure labels: Resolve; Compile; Publish]
Resolve: assemble content and instantiate applicable relationships
Compile: convert resolved content into a form suitable for rendition
Publish: render the content in the forms required by the context
The Goal: High Fidelity Automation
[Figure labels: Content; Deliver (Resolve, Compile, Publish); Print Publishing (PDF); Web Publishing (Portal / Portable)]
Delivery Processing
Assembling the inputs
Content requested
Supporting assets
Applicable stylesheets & rules
Resolve into a processable whole
Compile formattable content representations
Publish final formatted renditions
[Figure labels: Content; Assets; Templates; Rules; Output Plan (Map & View); Output Variants; Transformations; Resolve; Compile; Render; Publish; Output Print Products (PDF); Output Web Products (XHTML)]
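The resolve, compile and publish steps can be sketched as three small functions chained together. This is a toy illustration under assumptions: the component store, the output plan, the templates and all function names are hypothetical, and string templates stand in for real stylesheets.

```python
from string import Template

# Hypothetical inputs: content components, an output plan, and per-variant templates.
COMPONENTS = {
    "intro": "Welcome to the product.",
    "warning": "Disconnect power first.",
}
OUTPUT_PLAN = ["intro", "warning"]
TEMPLATES = {
    "xhtml": Template("<p>$text</p>"),
    "text": Template("* $text"),
}


def resolve(plan, components):
    """Resolve: assemble the requested content into one processable whole."""
    return [components[name] for name in plan]


def compile_content(resolved, template):
    """Compile: convert resolved content into a formattable representation."""
    return [template.substitute(text=item) for item in resolved]


def publish(compiled):
    """Publish: render the final product for one output variant."""
    return "\n".join(compiled)


def deliver(plan, components, variant):
    """Run the three delivery steps for the chosen output variant."""
    return publish(compile_content(resolve(plan, components), TEMPLATES[variant]))


web_product = deliver(OUTPUT_PLAN, COMPONENTS, "xhtml")
text_product = deliver(OUTPUT_PLAN, COMPONENTS, "text")
```

The same resolved content feeds both variants, so only the compile step changes per output.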
Content Processing & Validation
Validation
Essential capability
Enables consistent processing
Streamlines processes
Validation must be
Accurate
Manageable
Informative
Actionable
Pro-active
Continuously improving
Validate & Transform: Simple
Content Validation
DTD structural rules
Instance conformance
Content Transformation
Traditionally focused on arranging content for formatting
Supporting primarily structural manipulation
Validated Outputs
Inputs to rendition processes
HTML outputs
XML outputs
[Figure labels: Content; Schema Rules]
Validate & Transform: Complex
Content Validation & Verification
Schema structural rules
Rules governing content values
Instance conformance
Content Transformation
Continuous process of improvement
Parse, validate, align, verify…repeat
Manipulation of many content types
Validated Outputs
Inputs to rendition processes
HTML outputs
XML outputs
Data outputs for applications
[Figure labels: Instance; Structure Validation; Content Verification; Transformation Processing; Outputs]
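The difference between structure validation and content verification can be shown with a small sketch. It hand-codes one structural rule and one value rule instead of using a schema language, purely to stay self-contained; the rules, element names and function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical structural rule: each <part> needs these child elements.
REQUIRED_CHILDREN = {"part": ["number", "quantity"]}


def validate_structure(root):
    """Structure validation: every required child element must be present."""
    issues = []
    for parent_tag, children in REQUIRED_CHILDREN.items():
        for parent in root.iter(parent_tag):
            for child in children:
                if parent.find(child) is None:
                    issues.append(f"<{parent_tag}> is missing <{child}>")
    return issues


def verify_content(root):
    """Content verification: element values must satisfy a rule on their content."""
    issues = []
    for quantity in root.iter("quantity"):
        value = (quantity.text or "").strip()
        if not value.isdigit() or int(value) < 1:
            issues.append(f"quantity {value!r} is not a positive integer")
    return issues


def check(xml_text):
    """Parse, validate, then verify; return every issue found."""
    root = ET.fromstring(xml_text)
    return validate_structure(root) + verify_content(root)


issues = check(
    "<order><part><number>P-7</number><quantity>0</quantity></part>"
    "<part><number>P-9</number></part></order>"
)
```

Returning messages rather than a pass/fail flag is one way to keep validation informative and actionable.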
Complexity and the Cost of Quality
Complexity is inherent in the nature of content
Increasing content complexity increases the amount and sophistication of content processing tasks
Increases in content processing tasks result in a significant increase in the total cost of quality
Solution Architectures
Assembles components to provide integrated services
Technology selection & integration
Standards selection & integration
Multiple solution instances will exist
[Figure labels: Content Engineering; Solution Architectures; Content Management; Content Processing; Convert; Transform; Publish; Refactor; Collect; Compile; Relate; Resolve; Validate]
Solution Responsibilities
Essential Capabilities
Format interpretation
Content mapping
Bursting & normalization
Content validation
Metadata collection
Link extraction & creation
Management
Delivery
Assembly & transformation
Product rendition
[Figure labels: Content; Metadata & Links; Import; Content Processing; Publish]
Managing Solution Risk
Integration risk represents
The potential loss of services
The potential loss of assets
Integration risk increases with the increase in the number of technologies used to build a solution
System complexity
Can be managed
Ultimately limits solution affordability and even viability
Addressed in design selections
Solution Component Dependencies
[Figure labels: Media Sources; Process Rules; Schemas; Structure Maps; Content Files; Style Sheets; Processing Scripts; Document Templates; Data Sources; Import; Relationships; Log Reports; Quality Reports; Analysis Reports; Configuration Files]
Because all components within a solution evolve, their inter-dependencies require explicit description and management.
Technology Selection
Key Considerations
Solution context
Scored against requirements
Scoring scale
0 – No Fit
6 – Total Fit
Results weighed against acquisition cost
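The scoring approach above can be worked through with a small example. The slide gives only the 0 to 6 scale and the idea of weighing results against acquisition cost; the requirement weights, the candidate tools, the costs and the choice of fit per unit of cost as the comparison are all assumptions made for illustration.

```python
# Hypothetical requirement weights and candidate data, for illustration only.
WEIGHTS = {"validation": 3, "transformation": 2, "publishing": 1}
CANDIDATES = {
    "Tool A": {
        "scores": {"validation": 6, "transformation": 4, "publishing": 2},
        "cost": 40_000,
    },
    "Tool B": {
        "scores": {"validation": 3, "transformation": 5, "publishing": 6},
        "cost": 15_000,
    },
}
MAX_SCORE = 6  # scale from the slide: 0 = No Fit, 6 = Total Fit


def fit(scores, weights):
    """Weighted fit as a fraction of the best possible score (0.0 to 1.0)."""
    for value in scores.values():
        if not 0 <= value <= MAX_SCORE:
            raise ValueError(f"score {value} outside 0..{MAX_SCORE}")
    achieved = sum(weights[req] * scores[req] for req in weights)
    return achieved / (MAX_SCORE * sum(weights.values()))


def rank(candidates, weights):
    """Order candidates by fit per unit of acquisition cost, best first."""
    return sorted(
        candidates,
        key=lambda name: fit(candidates[name]["scores"], weights)
        / candidates[name]["cost"],
        reverse=True,
    )


fit_a = fit(CANDIDATES["Tool A"]["scores"], WEIGHTS)
fit_b = fit(CANDIDATES["Tool B"]["scores"], WEIGHTS)
ranking = rank(CANDIDATES, WEIGHTS)
```

With these made-up numbers Tool A fits the requirements better (28 of 36 weighted points against 25 of 36), yet Tool B ranks first once cost is taken into account, which is the kind of trade-off the weighing step is meant to expose.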
Technology Lifecycle Considerations
Solution context includes
Urgency
Complexity
Criticality
Constraints
Projected lifecycle
Expected lifespan
Rate of change
Influencing factors
[Figure: Measuring Overall Productivity over Time. Axis labels: Time; Complexity; Low; High]
Evaluating Standards as Potential Tools
Independence
From parochial interests, proprietary claims, external influences
Formality
Of creation, validation, approval & modification process
Stability
Of standard over time & the backward compatibility of changes
Completeness
Sufficiency for declared scope as well as availability of useful documentation & reference implementations
Adoption
Extent of support amongst tool vendors, authorities & users
Practicality
The extent to which all, or parts, of the standard can be deployed
Evaluating a Specialized Industry Standard
Scenario
Industry specification
Broad scope
Specialized stakeholder community
Continuously changing & expanding
Strategy
Implement where necessary
Address risk areas
Evaluating a Cross-Industry Standard
Scenario
Addressing widespread issues
Broad stakeholder community
Mature
Further capabilities emerging
Strategy
Plan for adoption
Consider for use in variety of areas
Content Solution Architecture Framework
[Figure: framework diagram. Recoverable labels: Controls; Enterprise; Programs; Domains; Specialized Models; Rules; Inputs; Document Sources; Ontology Sources; Data Sources; Active; External; Legacy; Content Architecture; Content Management; Content Processing; Outputs; Publishing Services; Discovery Services; Data Services; Web; Application; Integrate; Mechanisms; Users (Authors; Subject Matter Experts; Administrators; Information Architects; Developers); Tools (Content Authoring; Development Tools; Web Services); Resources (Budget; Personnel; Infrastructure)]
Content Architecture
Establishes working model of the knowledge domain
The knowledge that has informed the content
The knowledge being encapsulated in the solutions
Supports multiple solution instances
[Figure labels: Content Engineering; Content Architecture; Solution Architectures; Content Management; Content Processing; Convert; Transform; Publish; Refactor; Collect; Compile; Relate; Resolve; Validate]
The Central Role of the Content Architecture
[Figure labels: Content Architecture; Service Requirements; Discovery Requirements; Specialized Taxonomies; Specialized Information Types; Specialized Domains; Delivery Processes; Topic; Concept; Task; Reference; Description; Procedure; Data; Effectivity; Formatting; Annotation; Change]
Content Solution Design Principles
The nature of content demands an adaptable architecture
Technology components should be loosely-coupled
Content must always be available in its simplest self-describing form
Data stores should be replaceable by stored instances
True for content, metadata and links
Content processing events can be performed many ways
Simple methods must be present, sophisticated methods may be
All interfaces established as the exchange of validated content
Processing rules are, themselves, managed & processable content
Content Processing should be extensively leveraged
Content validation, analysis and reporting at every stage
Used to manage & optimize solution components to improve efficiency
Content Engineering Maturity Model
Modeled on the Software Engineering Institute's (SEI) Capability Maturity Model Integration (CMMI)
“managed” used instead of “quantitatively managed” for level 4
“repeated” used instead of “managed” for level 2
“reactive” used instead of “performed” for level 1
Objective
Follow software engineering in emphasizing the importance of formalization & quantitative methods for continuous improvement
Content Engineering Maturity Model
Level 5: Optimized
Level 4: Managed
Level 3: Defined
Level 2: Repeated
Level 1: Reactive
Level 0: Incomplete
CE Maturity Model: Level 0 Incomplete
Incomplete
Often the complete absence of a documented process
A process that is documented but not followed also qualifies
Features
New requirements addressed using legacy tools
Each solution seeks cost minimization
No persistent infrastructure
No improvement between projects
CE Maturity Model: Level 1 Reactive
Reactive
A process exists for specific goals
Sufficient for the needs of some projects
Not institutionalized and not integrated with institutional processes
Features
Not designed to handle new or changing requirements
Can result in multiple solutions each created as a reaction
CE Maturity Model: Level 2 Repeated
Repeated ☺
A managed process exists and is supported by basic infrastructure
Predictability can be achieved in process performance & projects
Reviews are conducted to identify & initiate improvements
Features
A common set of tools has been selected
Procedures exist for steps
Solution components documented
CE Maturity Model: Level 3 Defined
Defined (Unusual)
Standardization in processes established on an institutional level
Common tools & techniques used across processes & projects
Features
A single infrastructure used to support multiple processes & projects
Processes defined with reference to enterprise models
Interrelationships are known
CE Maturity Model: Level 4 Managed
Managed (Ideal)
Processes are managed using quantitative measurement
Automation is maximized in the execution of process steps
A single integrated & managed environment supports all processes
Features
Infrastructure components managed as content with automation used to adapt behaviour
High levels of quality sustained
CE Maturity Model: Level 5 Optimized
Optimized (Mythic)
Continuous orientation towards improvement
Continuous refactoring of solution and content to achieve efficiencies
Continuous identification & implementation of heightened standards
Features
Systematic analysis & correction of variations
Proactive identification of new products & services that can be offered
Industry innovation
General Observations
Content is inherently complex
Current trends have moved content to the center of attention
Content Engineering is an essential response
Provides the necessary discipline & the conceptual framework
Content has not typically received this level of attention in the past
Effective Content Processing is central to success
Content Management services are enabled by content processes
Adaptive content processing is essential for addressing change
Effective Content Solutions are designed to cover the complete content lifecycle and to reflect all stakeholder perspectives
The efficient management and processing of content remains an elusive goal for most organizations
Content Engineering and Business Value
The design of Content Solutions should
Continuously minimize the costs of acquiring, enriching, managing and delivering content
Continuously improve content quality through enrichment
Continuously increase the benefits realized through the delivery of content
Continuously reduce risks threatening content assets or the services being supported
Each of these represents an increase in value
Top Ten Secrets of Content Solution Success
Don’t underestimate your content or your business
Don’t underestimate the power of good automation
Choose an appropriate tool set and validate your choices
Don’t invest in content management technology too early
Carefully plan and execute migration activities
Take a “customer service” focus in delivering tangible benefits (new products / services) from your investments
Be demanding of your suppliers (expect quality)
Engage your stakeholders and “take control” of the solution
Leverage standards, don’t be enslaved by them
Be an active part of the community as a way to learn and as a way to share what you have learned
Discussion
Questions & Comments
Contact
Joe Gollner
VP e-Publishing Solutions
Stilo International
jgollner@stilo [email protected]