Sustainable XML for Publishing Applications: DITA Makes It Possible Eliot Kimber, Really Strategies, Inc. DocTrain East 2008

Presented by Eliot Kimber at Documentation and Training East 2008, October 29-November 1, 2008 in Burlington, MA. XML applications for publishers have largely failed to realize the full potential inherent in the technology. While larger publishers could make the investment necessary to realize a significant return on the use of XML technology, smaller enterprises could not, for a number of reasons but fundamentally because the startup and ongoing costs of ownership were too high. The DITA standard fundamentally changes the equation, bringing several unique features that together lower both the startup cost and the ongoing costs, making the use of XML much more affordable for publishers than it has ever been. At the same time, advances in supporting technologies important to publishers, such as improved support for XML in Adobe Creative Suite and Microsoft Office, powerful new XML search and retrieval systems such as MarkLogic, and a new generation of lower-cost XML editors, also serve to make the use of XML for publishing applications more attractive than it has ever been.

Sustainable XML for Publishing Applications:

DITA Makes It Possible

Eliot Kimber, Really Strategies, Inc.

DocTrain East 2008

Preliminaries

DITA for Publishers DocTrain East 2008

Who Is This Talk For?

Publishers who want to implement XML-based solutions

Publishers who have XML-based solutions that need to be enhanced, refined, or upgraded

Creators of XML-aware tools applicable to publishing use cases

Service providers who support the development and use of XML-based solutions for Publishers:
  Integrators
  Data conversion houses
  Consultants

About Me

Senior Concept Prover at Really Strategies Inc.

20+ years experience with generalized markup (GML, SGML, XML, etc.)

Career focus on large-scale hyperdocument creation and management

Focus for last 8+ years on Publishing use cases around XML-based publishing workflows

Active member of the DITA Technical Committee

Founding member of the XML Working Group

Long-time member of the XSL-FO Working Group

Co-editor of the ISO/IEC HyTime standard

Audience Survey

Who is here?
  Publishing for profit?
  Publishing as a cost?
  Technical Documentors?
  Service providers?
  People just interested in DITA?

DITA knowledge:
  No idea what DITA is?
  Know about DITA a little?
  Familiar with DITA concepts and details?
  Using DITA now or implementing DITA-based solution?

Brief Overview of DITA

What Is DITA?

OASIS Open Standard: Darwin Information Typing Architecture

An XML architecture standard for representing human-consumed information

Some distinguishing aspects of DITA as an XML architecture:
  Formal mechanism for controlled definition of new vocabularies (“specialization”)
  Optimized for information modularity (“topics”, “maps”) and blind interchange
  Standardized document type implementation design patterns
  Growing off-the-shelf processing infrastructure
  Sophisticated hyperlinking features (“relationship tables”)

Currently at version 1.1, version 1.2 in final stages of review and approval

Key DITA Concepts Briefly Explained

Topics
  Topic content is paragraphs and stuff
  Standalone units of information
  Topics may directly contain other topics

Maps
  Hierarchical sets of links to topics
  Establish organizational hierarchies for sets of topics
  May have many maps over the same topics
  Can impose metadata onto topics
  Can impose topic-to-topic hyperlinks (relationship tables)

Specialization
  New element types are “subclasses” of existing types
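
As a rough illustration of the relationship-table idea (not taken from the talk), the Python sketch below reads a small map and derives topic-to-topic links: each topic in a row is linked to the topics in the other cells of that row. The file names and the helper `related_links` are hypothetical, and the sketch ignores DITA attributes such as `linking` and `collection-type` that a real processor would honor.

```python
import xml.etree.ElementTree as ET

# Hypothetical map; the file names are invented for illustration.
MAP_XML = """
<map>
  <topicref href="topicA.dita"/>
  <topicref href="topicB.dita"/>
  <topicref href="topicC.dita"/>
  <reltable>
    <relrow>
      <relcell><topicref href="topicA.dita"/></relcell>
      <relcell>
        <topicref href="topicB.dita"/>
        <topicref href="topicC.dita"/>
      </relcell>
    </relrow>
  </reltable>
</map>
"""

def related_links(map_root):
    """Map each topic href to the hrefs found in the other cells of its row."""
    links = {}
    for row in map_root.iter("relrow"):
        # One list of hrefs per cell in this row.
        cells = [
            [ref.get("href") for ref in cell.iter("topicref")]
            for cell in row.findall("relcell")
        ]
        for index, cell in enumerate(cells):
            # Everything in the row except the topic's own cell.
            others = [
                href
                for other_index, other in enumerate(cells)
                if other_index != index
                for href in other
            ]
            for href in cell:
                links.setdefault(href, []).extend(others)
    return links

links = related_links(ET.fromstring(MAP_XML.strip()))
for source, targets in links.items():
    print(source, "->", targets)
```

The point of the sketch is that the links live in the map, not in the topics, so a different map can impose different links on the same topics.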

Oooh, A Picture

[Figure: diagram with a "Maps" side showing Map One and Map Two and a "Topics" side showing Topic A, Topic B, Topic C, Topic D, Topic E, and Topic F.]

Output Results

[Figure: Map One and Map Two, each run through a map-to-PDF process, yield two different outlines over the same topics.]

I. Topic C
  1.1 Topic B
    1.1.1 Topic A
    1.1.2 Topic D
  1.2 Topic F

I. Topic F
  1.1 Topic B
  1.2 Heading
    1.1.1 Topic A
    1.1.2 Topic E
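
The effect on this slide can be sketched in a few lines of Python: two maps arrange overlapping topics differently, and a single generic routine turns either map into a numbered outline. This is only an illustration under stated assumptions. The map markup below is one plausible reading of the two outlines, the file names are invented, the helper `outline` is hypothetical, and numbering is purely positional rather than copied from the slide.

```python
import xml.etree.ElementTree as ET

# Hypothetical maps over a shared pool of topics (file names are invented).
MAP_ONE = """
<map>
  <topicref href="topicC.dita" navtitle="Topic C">
    <topicref href="topicB.dita" navtitle="Topic B">
      <topicref href="topicA.dita" navtitle="Topic A"/>
      <topicref href="topicD.dita" navtitle="Topic D"/>
    </topicref>
    <topicref href="topicF.dita" navtitle="Topic F"/>
  </topicref>
</map>
"""

MAP_TWO = """
<map>
  <topicref href="topicF.dita" navtitle="Topic F">
    <topicref href="topicB.dita" navtitle="Topic B"/>
    <topichead navtitle="Heading">
      <topicref href="topicA.dita" navtitle="Topic A"/>
      <topicref href="topicE.dita" navtitle="Topic E"/>
    </topichead>
  </topicref>
</map>
"""

def outline(parent, prefix=""):
    """Return numbered outline lines for the topicref/topichead hierarchy."""
    lines = []
    entries = [child for child in parent if child.tag in ("topicref", "topichead")]
    for number, entry in enumerate(entries, start=1):
        label = f"{prefix}{number}"
        lines.append(f"{label} {entry.get('navtitle')}")
        # Recurse so nested references become sub-entries.
        lines.extend(outline(entry, prefix=label + "."))
    return lines

outline_one = outline(ET.fromstring(MAP_ONE.strip()))
outline_two = outline(ET.fromstring(MAP_TWO.strip()))
print("\n".join(outline_one))
print("\n".join(outline_two))
```

Topic A and Topic B appear in both outlines without being copied, which is the property the slide is demonstrating.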

Specialization

DITA standard defines a set of base element types: topic, map, section, paragraph, figure, table, phrase, data

All other elements based on these base types

Establishes a formal class hierarchy for all element types in any DITA document

Every element type maps back to some standard-defined base type

Declaration mechanism is formal and simple:
  Uses element attributes (“class=”)
  Can be processed by almost any XML tool, including CSS selectors
  Even works for DTD-less documents
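
A minimal sketch of how the class attribute lets generic code process specialized markup, assuming the class values are written out in the instance (the DTD-less case mentioned above). The element names `recipe`, `recipebody`, and `ingredient` and the helper `is_a` are invented for illustration; only the `topic/...` tokens correspond to base types defined by the standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical specialized topic. Each class value lists the base type first,
# then the more specialized types derived from it.
DOC = """
<recipe class="- topic/topic recipe/recipe ">
  <title class="- topic/title ">Pancakes</title>
  <body class="- topic/body recipe/recipebody ">
    <ingredient class="- topic/p recipe/ingredient ">2 eggs</ingredient>
    <p class="- topic/p ">Mix and fry.</p>
  </body>
</recipe>
"""

def is_a(element, base):
    """True if the element's class attribute contains the token `base`."""
    return base in element.get("class", "").split()

root = ET.fromstring(DOC.strip())

# Code written for the base paragraph type also picks up <ingredient>,
# because <ingredient> declares itself to be a kind of topic/p.
paragraphs = [el.text for el in root.iter() if is_a(el, "topic/p")]
print(paragraphs)
```

Matching on whole whitespace-separated tokens, rather than on element names, is what allows a tool that has never seen `ingredient` to treat it as a paragraph.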

DITA Compared to Other XML Options

Compare DITA With…

DocBook
  Book-focused (not inherently modular)
    Can use XInclude to manage information in modular fashion
    No facility comparable to DITA maps
  Mature standard
  Very large tag set reflecting union of wide set of requirements
  No formal vocabulary extension mechanism
  Blind interchange not really possible
  Deep off-the-shelf infrastructure

Compare DITA With…

NLM
  Optimized for journals, not books
  No formal vocabulary extension mechanism
  Little off-the-shelf infrastructure

Compare DITA With…

PRISM/PAM
  Essentially XHTML with sophisticated metadata
  Optimized for serialization, not authoring and archiving
  Little off-the-shelf processing infrastructure

Compare DITA With…

Custom XML application
  Expensive to develop and maintain
  Can be optimized for local requirements
  Processing infrastructure must be built from scratch
    Content management
    Authoring tool configuration and customization
    Publishing pipelines
    Interchange transforms
  No blind interchange possible

About That…XML Application Development Costs

Information requirements analysis is always required

Using a standard XML application still requires that you determine how to apply it to your requirements

All useful standard XML applications…
  …provide more stuff than you need
  …fail to provide some things specific to your requirements

Amount of analysis required reflects your business problem, not standard chosen

Thus: cost of analysis is essentially invariant regardless of implementation choice

Main variable is cost of system implementation:
  Implementation of XML document types (DTDs)
  Implementation of management and processing

XML System Cost Analysis

Three distinct cost domains:
  Initial system development
  Cost of use (training, skills required, cost of tools)
  Maintenance and refinement over long time scale

Ideal implementation base minimizes all costs:
  Low cost of acquisition and implementation
  Low cost of use: skills and knowledge common in user population, tools are appropriately priced
  Low cost of refinement, extension, interchange, management

Cost evaluated in terms of value:
  Short term: ability to meet immediate requirements with lowest initial cost
  Long term: ability to support new requirements with lowest cost of maintenance and extension

DITA Largely Meets the Ideal

Lowest possible cost of initial solution development
  Implementing custom doctypes very low cost
  Many off-the-shelf tools “just work” with little or no customization or configuration
  Large and growing body of use-case-specific DITA modules

Large and growing body of DITA knowledge
  Standard is well written
  Many service providers with solid DITA knowledge
  Growing body of published DITA how-to information

Controlled extension (“specialization”) means:
  Knowledge about one DITA application transfers to all other DITA applications
  Extensibility and interchange are optimized
  Implementations can optimize their own modularity and flexibility

My Assertion: DITA Is Almost Always Best Fit

DITA can be easily and practically applied to almost all documentation use cases (not just tech docs)

DITA’s unique features minimize initial cost of ownership and implementation

DITA’s unique features optimize interchange of content

DITA’s unique features maximize flexibility and stability of supporting tools

Therefore:
  DITA provides maximum value compared with other alternatives
  Main cost is acceptance of a few constraints that enable DITA’s value

DITA Myths Busted

DITA Myth One: DITA Is Only For Tech Docs

DITA is a layered, flexible standard

Originally driven by technical documentation requirements…

…but core features are completely generic

DITA has been used for:
  Government reports
  Financial standards
  Test preparation books
  Travel guides

No inherent restrictions on the kind of publications DITA will work well for

Forest for the Trees: It’s Still Just XML

DITA has lots of cool features, some quite sophisticated

This sophistication can be scary

But… it’s still just XML

You don’t have to use any particular feature of DITA

Users don’t necessarily need to know it’s DITA

If it being DITA-based doesn’t help at the moment, don’t talk about it

To the non-DITA-aware it looks like any other custom XML application

DITA Myth Two: DITA Requires Topic-Based Writing

DITA standard is optimized for modularity

But it does not require that content be stored or written as modules
  Use of DITA maps is entirely optional
  Topics can physically contain other topics
  An entire book could be marked up as a single XML document consisting of one root topic and many child topics
  Such a topic would be indistinguishable from any other similar XML document (e.g., an NLM article, a DocBook document)
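
To make the single-document case concrete, here is a small Python sketch, assuming a book held in one file as a root topic with nested child topics and no map at all. The content, the ids, and the helper `toc` are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical book stored as one XML document: a root topic containing
# chapter topics, one of which contains a further nested topic.
BOOK = """
<topic id="book">
  <title>A Short Novel</title>
  <body><p>Front matter.</p></body>
  <topic id="ch1">
    <title>Chapter One</title>
    <body><p>It began.</p></body>
    <topic id="ch1-scene1">
      <title>A Scene</title>
      <body><p>Something happened.</p></body>
    </topic>
  </topic>
  <topic id="ch2">
    <title>Chapter Two</title>
    <body><p>It ended.</p></body>
  </topic>
</topic>
"""

def toc(topic, depth=0):
    """Return (depth, title) pairs for a topic and its nested topics."""
    entries = [(depth, topic.findtext("title"))]
    for child in topic.findall("topic"):
        entries.extend(toc(child, depth + 1))
    return entries

entries = toc(ET.fromstring(BOOK.strip()))
for depth, title in entries:
    print("  " * depth + title)
```

Nothing in the sketch depends on modular storage: the hierarchy comes entirely from element nesting, as it would in a DocBook or NLM document.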

DITA Myth Three: DITA Is Hard

DITA has lots of features, some quite sophisticated

Making full use of all these features requires understanding those features, of course

But at its simplest, DITA is just like any other XML document type for publications: sections, paragraphs, lists, figures, tables, and inlines.

Thus, a DITA application need only be as sophisticated as you need it to be to satisfy your specific requirements

Complexity and “difficulty” of DITA is concentrated in the data processing requirements, not in authoring

Ability to easily define custom vocabularies means you can optimize markup names and structures to reflect local culture and practice

In Short: Why DITA? Why Not DITA?

DITA can be applied where any other applicable XML standard can be applied
  At lower absolute cost
  With greater flexibility
  With greater potential value

Cost of using DITA at worst no greater than using XML generally

So why not use it?

Yeah, But…

I’ve said a lot of stuff

What do you need from me to be convinced?

DITA As Applied to Publishing

Publishing-Specific Challenges

Existing vendor solutions and community knowledge focused on tech doc requirements

Many vendors don’t really get DITA

Many tools don’t yet fully support specialization

Many older tools limited by architecture and implementation choices made years ago

Many service providers still building understanding of DITA

Publishing requirement for high-quality composition always a challenge for any XML-based solution

Publishers have different business drivers from tech doc

Note to Vendors and Service Providers

Potential market for DITA as a publishing solution orders of magnitude larger than potential market for DITA as a tech doc solution

Many more publishers and units of publication than tech doc producers

Tech doc is a cost center

Publishing is a profit center

In many ways, the value of DITA to publishing is more compelling than it is for tech doc

Just saying…

Publishing: Open Toolkit Alone Won't Cut It

Pages are and will always be important

Need a path from DITA XML to publishing tools
  InDesign
  Quark
  Etc.

No technical barrier to a generic DITA-to-InDesign process

Products like Typefi could add significant value

Several advantages for Publishers:
  Uses existing layout design skills, tools, and methods
  Can be 100% automatic or include human tweaking
  Can leverage Toolkit for preprocessing

Where Could Publishers Go From Here?

DITA as specialized in the DITA spec not always appropriate for Publishers
  Too constrained in some areas
  Needs more ways to capture format intent
  May not match existing publishing practice or conventions well

Might be useful to define a separate publishing-specific specialization family rooted at DITA topic rather than at concept/task/reference

Can a novel be a single topic or a small set of topics? [Yes]

Does that help? Does it hurt?

Business Process Improvement Implications

Base cost of using XML is essentially unchanged
  Same challenges for legacy conversion
  Applying XML at end of process
  Using XML as input to revision process

Cost of developing initial markup design can be significantly lower

Can have more generic, reusable processing components

DITA encourages and enables small modules
  Makes recombination at small granularity possible and manageable
  Adapts well to delivery to portable devices

Business Improvement Implications (Cont.)

Incremental cost of DITA-based systems should go down over time as infrastructure accretes

Enables local optimization of markup without impeding interchange within an enterprise or across enterprises

Provides controlled, formal framework for defining common components used across parts of enterprise or communities of interest

Enables use of more sophisticated features as needed

Potential Information Economy Improvements

Reduce tight coupling between suppliers and consumers (aggregators, republishers, etc.)

No need to agree on rigid, overly-general standards to enable interchange

Supplier need not have full publishing infrastructure in order to supply high-quality content

Information consumers can apply a generic DITA processing infrastructure to content from many suppliers

Increases value of information in DITA form

Reduces impedance of interchange
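
One way a consumer could apply generic processing to content from many suppliers is to rewrite each specialized element back to its base type, using the first module/type token of its class attribute. The Python sketch below is a simplified illustration of that idea, not the full generalization procedure defined by the DITA specification. The `pubs` module, the element names `sidebar` and `pullquote`, and the helper `generalize` are invented.

```python
import xml.etree.ElementTree as ET

# Hypothetical supplier content using an invented "pubs" specialization.
SUPPLIER_XML = """
<sidebar class="- topic/section pubs/sidebar ">
  <title class="- topic/title ">Did You Know?</title>
  <pullquote class="- topic/p pubs/pullquote ">Maps are optional.</pullquote>
</sidebar>
"""

def generalize(element):
    """Rename each element to the base type named first in its class value."""
    tokens = element.get("class", "").split()
    # tokens[0] is the leading "-" or "+"; tokens[1] is the base module/type.
    if len(tokens) >= 2 and "/" in tokens[1]:
        element.tag = tokens[1].split("/", 1)[1]
    for child in element:
        generalize(child)
    return element

root = generalize(ET.fromstring(SUPPLIER_XML.strip()))
tags = [el.tag for el in root.iter()]
print(tags)
print(ET.tostring(root, encoding="unicode"))
```

The class attributes are left in place, so the supplier's more specific types remain recoverable after the consumer has processed the content as plain sections and paragraphs.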

Wrap Up

In Conclusion

DITA has lots of unique goodness of direct and compelling value to publishers

DITA can be used in simple ways to good effect with low cost of entry

DITA’s low cost and strong features represent a compelling value for almost all XML-based publishing use cases

Full DITA infrastructure still being developed
  Off-the-shelf DITA-to-InDesign processes
  Standard publishing-specific DITA modules
  Community knowledge of how best to apply DITA to publishing use cases

Questions?

Thank You

Eliot Kimber, Really Strategies

[email protected]