    Extreme Markup Languages 2003, Montréal, Québec, August 4-8, 2003

    Linking strategies

    Ari Nordström, Sörman Information & Media AB

    Abstract

    We need a linking strategy for creating good, single-source, cross-references

    that work as well on paper as in a web page. There is more to the problem of

    linking than the semantics of the link, because the text surrounding a link

    should probably be different, depending on the medium of publication. Paper-

    based links, for example, generally work best with generated text, while

    hyperlinks are hotspots naturally inserted in the document content. A supportive

    authoring environment is needed for writers of single-source material.


    Table of Contents

    1 Introduction
    2 The problem and some solutions
        2.1 One easy way out
        2.2 A more modern approach
        2.3 An (even more) modern approach
    3 A word about existential issues and uniqueness
    4 The problem revisited and expanded
        4.1 An approach to single-source writing
        4.2 Profiling links
        4.3 Purely XLink
        4.4 An entity-free solution
        4.5 A real-life example
    5 An aside on other types of links
        5.1 Fragment inclusions
        5.2 Images
        5.3 Profiling any of the above
    6 Tools of the trade
    7 Discipline!
    8 Conclusions
    Footnotes
    Acknowledgements
    Bibliography
    The Author


    1 Introduction

    I'm a big fan of FrameMaker, the desktop tool, and have been for many years. I've written thousands of pages with it, and it has rarely let me down. I love the tool, as long as I don't have to use it for SGML or XML content. To be completely honest, what always does it for me is FrameMaker's cross-referencing capabilities. It's very easy to create cross-references as an author, and it's equally easy to define new cross-referencing formats if you're a template developer. Technically, it's a marvel to behold.

    It's not particularly strange, then, that every new linking facility I use is measured against FrameMaker. Every feature, every way I can create, modify, or in other ways manipulate a cross-reference with a new tool is compared to FrameMaker. And you know what: in most cases, in spite of the apparent power of XLinks, FrameMaker wins. It's sheer magic to have the generated text (e.g., "See Section 5.1.3 on Page 38") appear, be updated, or traversed. It's a bit perverted, I know, but if that's what it takes, then fine; I'm going to continue doing it, in awe.

    Now, FrameMaker is a desktop tool, and while it's been marketed as a single-source publishing tool, it's heavily geared towards paper output. It's the nature of desktop publishing, really, and there's nothing wrong with focusing on paper publishing. But in these times of XML, markup, and all things X, we're moving away from paper, or at least expanding or extending from that particular field of publishing. Single-source publishing is hot, and so whatever tool you're going to pick, you'll have to think about how to best use your tool, not only for paper, but also simultaneously for other publishing media, such as CD-ROMs or the web.

    Cross-references, of course, become quite different once you leave paper, or otherwise expand your horizons. Page references, obviously, are the first thing to go; while there are "pages" on the Internet, they are not the kind of pages you can or should number. Your cross-references become hyperlinks, hot spots that traverse a link once clicked on.

    The question is, how do you create good, preferably single-source, cross-references that work just as well on paper as on a web page? Or more generically, what kind of linking strategy should you use to be able to write single-source and publish the results anywhere, without the awkwardness[1] so typical of today's rather limited reuse? And remember, we're not just talking about cross-references; the same strategies should apply to fragment inclusions, image referencing, or just about any link imaginable.

    NOTE: This is an area where FrameMaker falls a bit short; actually, it's a bit on the heavy side with the whole single-source concept, and there really isn't a very good way to create links that work in both media. What a disappointment for me! Fortunately, there's XML.

    2 The problem and some solutions

    A paper-based cross-reference is basically a page reference, perhaps with the target's title or number, or both, included. To a reader, it's just text similar to other content and just a static pointer, at least until IBM finishes its active paper project. Here's an example:

    Rabbits are outside the scope of this section. For detailed information about these lovable

    creatures, see Section 7, Rabbits, Page 91.

    In a reasonably well-implemented system, the word "Section" is generated and probably based on the target element type. "Rabbits," on the other hand, is fetched from the target element's title; it's reasonable to assume that it's the first child of a section element. The page reference is generated when producing the paper output, and the exact method of how to achieve it depends on the print process used.

    The source XML might look like this:


    Rabbits are outside the scope of this section. For detailed information about these lovable creatures, see .

    For now, let's simply assume that the ref element is an IDREF-based pointer. This particular link, then, requires that there's an element with an ID attribute with the value "id-rabbits" somewhere else (probably in the aforementioned section element) in the same physical XML file.
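
    As a rough illustration of what the generated text involves, here is a minimal Python sketch that resolves such a pointer. All element and attribute names (doc, section, title, p, ref, id, idref) are assumptions made for the example, not the markup used in this paper, and the page number is left out because it depends on the print process.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: every element and attribute name here is an assumption.
SOURCE = """
<doc>
  <section id="id-intro">
    <title>Introduction</title>
    <p>Rabbits are outside the scope of this section. For detailed
    information about these lovable creatures, see <ref idref="id-rabbits"/>.</p>
  </section>
  <section id="id-rabbits">
    <title>Rabbits</title>
    <p>Rabbits are lovable creatures.</p>
  </section>
</doc>
"""

def generated_text(root, ref):
    """Build the generated part of a paper cross-reference for one ref element."""
    targets = {el.get("id"): el for el in root.iter() if el.get("id")}
    target = targets[ref.get("idref")]        # KeyError signals a dangling pointer
    sections = list(root.iter("section"))
    number = sections.index(target) + 1       # node count among section elements
    label = target.tag.capitalize()           # "Section", from the element type
    title = target.findtext("title")          # fetched from the target's title
    return f"{label} {number}, {title}"

root = ET.fromstring(SOURCE)
ref = root.find(".//ref")
print(generated_text(root, ref))
```

    Note that ElementTree is a non-validating parser, so the sketch builds its own index of id values instead of relying on ID/IDREF validation.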

    An online reference, on the other hand, is a hyperlink that ideally is embedded in text. It's an active link, traversed when clicking on it. Now, if we made a direct online translation of the above markup, the result might look like this, with a hyperlink instead of the node count, generated text, and page reference:

    Rabbits are outside the scope of this section. For detailed information about these lovable

    creatures, see Rabbits.

    While this works, it's wordy and doesn't really look like a sentence that a writer would have produced, had she been able to write for online publishing only. It's clear that what we have here is a baddish case of single-source publishing. Preferable would be something like this:

    Rabbits are outside the scope of this section.

    Nice, clean, and well-adjusted to the online way of reading. But what if we preferred to exclude the link altogether when publishing online? If we simply instructed the conversion script to exclude the link, the result would be, well, less than adequate:

    Rabbits are outside the scope of this section. For detailed information about these lovable

    creatures, see .

    Whoops. I have a couple of manuals that contain sentences like this, and I'm subscribed to a number of news services on my PDA where references always seem to mysteriously disappear, so this is unfortunately not a rare occurrence. In almost every SGML or XML project I've participated in, where links have disappeared in one medium or another, the DTD or schema has been to blame, sans direct bugs in stylesheets. What we need is a design approach to creating DTDs (or schemas, for that matter) that makes conversion and other processing easier and more logical.

    2.1 One easy way out

    The problem with the above lies (mostly) in the scope of the linking element. A very common approach

    is to simply define a mixed content model, like this:

    In other words, there is a linking element but it's only there to mark the spot where you want your generated text, page reference, and so on. It does not identify the whole semantics of the link. In fact, the whole sentence

    For detailed information about these lovable creatures, see Section 7, Rabbits, Page 91.

    is part of the link semantics. It says what the link is for, and it can survive without the ref element no

    better than the ref element can survive without it. Consequently, we need a wrapper element for not

    only the linking element, but for the whole semantic link:

    ...

    A markup that handles the link correctly then causes the following markup in our example:

    Rabbits are outside the scope of this section. For detailed information about these lovable creatures, see .

    This means that if we don't need the link (and its surrounding sentence) online, we can simply instruct the conversion script to exclude the xref-wrapper element and its contents.
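
    A conversion step of this kind can be sketched in a few lines of Python. The xref-wrapper and ref element names come from the text; the p element and the idref attribute are assumptions, and a real conversion script would more likely be a stylesheet.

```python
import xml.etree.ElementTree as ET

# Hypothetical paragraph; p and idref are assumed names.
PARA = (
    '<p>Rabbits are outside the scope of this section. '
    '<xref-wrapper>For detailed information about these lovable creatures, '
    'see <ref idref="id-rabbits"/>.</xref-wrapper></p>'
)

def drop_wrappers(root, tag="xref-wrapper"):
    """Remove every wrapper element and its contents, keeping the text after it."""
    for parent in list(root.iter()):
        kept = None                      # last sibling that stays in the tree
        for child in list(parent):
            if child.tag == tag:
                tail = child.tail or ""  # text following the wrapper must survive
                if kept is None:
                    parent.text = (parent.text or "") + tail
                else:
                    kept.tail = (kept.tail or "") + tail
                parent.remove(child)
            else:
                kept = child
    return root

online = drop_wrappers(ET.fromstring(PARA))
print(ET.tostring(online, encoding="unicode"))
```

    Because the whole sentence sits inside the wrapper, nothing dangling is left behind when the wrapper goes.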


    But what if we need a hyperlink, like the one in our example above, where we removed the paper-based link and its paper-styled sentence and used a hyperlink element embedded in the first sentence instead? Well, I guess you can now see where this is going:

    Rabbits are outside the scope of this section. For detailed information about these lovable creatures, see .

    This solution is pretty neat because on paper, the xref-wrapper element does everything needed. The ref element is used to create a node count, fetch the target element's contents (i.e., the target's title), and create a page count. The hlink element does not receive any special treatment; its contents appear as ordinary text. Online, however, the xref-wrapper element is discarded, and the hlink element becomes a hypertext link.
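
    The two treatments can be put side by side in a small Python sketch. The hlink, ref, and xref-wrapper names come from the text; the p element, the href and idref attributes, and the GENERATED lookup table (a stand-in for whatever the print process would produce) are assumptions.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical single-source paragraph with both kinds of link.
PARA = ET.fromstring(
    '<p><hlink href="#id-rabbits">Rabbits</hlink> are outside the scope of '
    'this section. <xref-wrapper>For detailed information about these lovable '
    'creatures, see <ref idref="id-rabbits"/>.</xref-wrapper></p>'
)

# Stand-in for text that the print process would generate from the target.
GENERATED = {"id-rabbits": "Section 7, Rabbits, Page 91"}

def render(el, medium):
    """Render one element and its tail for medium 'paper' or 'online'."""
    tail = escape(el.tail or "")
    if el.tag == "xref-wrapper" and medium == "online":
        return tail                                   # wrapper and contents discarded
    if el.tag == "ref":
        return escape(GENERATED[el.get("idref")]) + tail
    inner = escape(el.text or "") + "".join(render(c, medium) for c in el)
    if el.tag == "hlink" and medium == "online":
        return f'<a href="{escape(el.get("href"))}">{inner}</a>' + tail
    return inner + tail                               # otherwise: ordinary text

print(render(PARA, "paper"))
print(render(PARA, "online"))
```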

    The downside to this approach, of course, is that whenever we need to create a hyperlink version of our paper-based link, we need to create the link twice. It's fairly logical, though, because while both links in the example serve the same purpose, they're not the same; they're not identical. It's reasonable to expect that they're created separately. After all, they don't have to point to the same target; online, a different target (or no target at all) may be required.

    There are other problems with our basic solution, too: for one thing, the ID/IDREF pair mechanism is clearly insufficient since it requires that both the source and the target are located in the same physical file. The ID/IDREF mechanism is largely a leftover from SGML, and not very useful when going online since it's probable that we'll use more than one file to achieve our goals. In fact, single-source document creation practically requires it.

    Let's refine what we have, then.

    2.2 A more modern approach

    As I pointed out, in my mind, the ID/IDREF mechanism is mostly a leftover from the SGML days. It's certainly a useful leftover for some applications, since any such link must be validated by an XML parser (well-formed XML documents lack the concept of the ID attribute type, a frequent source of criticism against XML 1.0). When moving to single-source publishing, XML fragments, and the like, this enforced validation is no longer practical since we cannot guarantee that the target resides in the same physical file as the source.

    XLink (see [XLink Recommendation]) solves this particular problem.[2] XLink is the W3C linking recommendation, intended from the very beginning to provide XML with a standardized way of expressing links, or more generically, relations between resources. We no longer have built-in link validation in the XML processor; on the other hand, such validation is often impractical.

    XLink comes in two flavors: simple and extended. Simple XLink is really a glorified HTML link. It has a source and a target, and the link information always resides in the source element. Simple XLink is expressed using attributes, which makes it very useful. Here's our ref element, expressed as a Simple XLink:

    xlink:href CDATA #IMPLIED
    xmlns:xlink CDATA #FIXED "http://www.w3.org/1999/xlink" >

    Obviously, there's more to XLinks, even the simple ones, but this will do just nicely for our purposes. So, applied on our basic single-source cross-referencing example, we get:

    Rabbits are outside the scope of this section. For detailed information about these lovable creatures, see .

    This is quite wordy, XML-wise, but essentially, it gives us what we need. However, it doesn't solve the problem of having to create the link twice. A fancy customization of the XML editor can probably help


    out, but we might benefit from a different linking model, especially if the paper and online targets are

    always, or nearly always, the same.
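
    To show what a processor has to do with a simple XLink, here is a minimal Python sketch that finds such links and splits each address into a document part and a fragment part. The xlink:href and xlink:type attributes and the namespace URI are from the XLink recommendation; the p element and the file name rodents.xml are made up for the example.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urldefrag

XLINK = "http://www.w3.org/1999/xlink"

# Hypothetical paragraph; the target now lives in another file.
PARA = ET.fromstring(
    f'<p xmlns:xlink="{XLINK}">For detailed information about these lovable '
    'creatures, see <ref xlink:type="simple" '
    'xlink:href="rodents.xml#id-rabbits"/>.</p>'
)

def link_targets(root):
    """Yield (document, fragment) for every simple XLink in the tree."""
    for el in root.iter():
        if el.get(f"{{{XLINK}}}type") == "simple":
            url, fragment = urldefrag(el.get(f"{{{XLINK}}}href", ""))
            yield url, fragment

print(list(link_targets(PARA)))
```

    Nothing here checks that rodents.xml or the fragment actually exists; that is exactly the validation we gave up when leaving ID/IDREF behind.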

    2.3 An (even more) modern approach

    Extended XLink is a way to create multi-ended links. Often, people interpret this as meaning links with multiple targets and envision some fancy application that offers a choice of link targets, perhaps available using a context menu (in the world of Windows applications, this would be a right-click menu). This is literally a half-truth since an extended XLink could just as easily consist of multiple sources but only a single target.[3]

    Another nice thing with extended XLink is that the links can be expressed out-of-line, that is,

    independently from the resources that participate in the actual link. Allow me to give a crude example:

    If I say "Tokyo is the capital of Japan," I've created a link (or rather, a relation) between the two. However, I'm not in Tokyo as I write this. I'm not even in Japan, which means that neither Tokyo nor Japan knows that I've created this particular relation (even though I suspect that some Japanese fellows are nevertheless aware of it anyway). I've expressed the whole relation out-of-line.

    But herein lies a problem: Both my source and my target locations need to be uniquely identified, with unique names. In my little (admittedly mediocre) example, neither Tokyo nor Japan is a truly unique name, even though most people will associate them with the same places that I do. But if they don't, then they'll probably not know it, and I, the creator of the link (relation), certainly won't know it either.

    In other words, my link is really not expressed well enough. See "A word about existential issues and uniqueness" for more on this.

    Now, getting back to our little cross-referencing example, if we want to express it using out-of-line extended XLinks, we can start out by identifying the source elements (and the target, or targets) using ID attribute values since the nice thing about them is that they at least ensure their uniqueness within a physical XML document. We can get by with something like:

    Rabbits are outside the scope of this section. For detailed information about these lovable creatures, see .

    Of course, we need a bit more than that to ensure the uniqueness of the ID attribute values, but for now, this will do. I'll get back to this whole uniqueness thing in "A word about existential issues and uniqueness," though.

    Note that the linking elements above no longer contain any explicit linking information; they're merely elements that happen to have identifier attributes. They no longer know that they participate in a link. This relation we can now define in a separate, unrelated, XML document, like this (leaving out some XLink attributes that, although required, aren't needed in this example):

    This is a basic out-of-line, multi-ended link. What it says is that the two source identifiers, identified using locator elements, point to (have a relation with) the same target, identified with another locator element. The links use a level of abstraction by identifying the participating resources with labels in the arc element, instead of direct xlink:href addresses. This is a very useful quality with extended XLink, because it makes it possible to create classes of links, which is more or less exactly what we need in our example.[4]
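
    How labels turn into actual traversals can be sketched in Python. The linkbase below is a hypothetical reconstruction: the xlink attributes are from the XLink recommendation, but the links container element, the file names, the ID values, and the label names "source" and "target" are all assumptions.

```python
import xml.etree.ElementTree as ET

XLINK = "http://www.w3.org/1999/xlink"

def q(name):
    """Qualify an attribute name with the XLink namespace."""
    return f"{{{XLINK}}}{name}"

# Hypothetical out-of-line linkbase: two sources, one target, one arc.
LINKBASE = ET.fromstring(f"""
<links xmlns:xlink="{XLINK}" xlink:type="extended">
  <locator xlink:type="locator" xlink:label="source"
           xlink:href="intro.xml#id-hlink-rabbits"/>
  <locator xlink:type="locator" xlink:label="source"
           xlink:href="intro.xml#id-ref-rabbits"/>
  <locator xlink:type="locator" xlink:label="target"
           xlink:href="rodents.xml#id-rabbits"/>
  <arc xlink:type="arc" xlink:from="source" xlink:to="target"/>
</links>
""")

def traversals(linkbase):
    """Expand every arc into (source href, target href) pairs via the labels."""
    by_label, arcs = {}, []
    for el in linkbase:
        kind = el.get(q("type"))
        if kind == "locator":
            by_label.setdefault(el.get(q("label")), []).append(el.get(q("href")))
        elif kind == "arc":
            arcs.append((el.get(q("from")), el.get(q("to"))))
    return [(src, dst)
            for start, end in arcs
            for src in by_label.get(start, [])
            for dst in by_label.get(end, [])]

for pair in traversals(LINKBASE):
    print(pair)
```

    Because both source locators share one label, a single arc covers the paper-based reference and the hyperlink alike.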


    For a programmer customizing an XML editor, it is fairly easy to implement something allowing such a

    link to be created, all at once instead of having to do the same link twice. The problem with such an

    implementation is simply a practical one: the application has no way of knowing which word or words in

    your document you intend to use as your hyperlink text. It can probably help you choose one, and insert

    the required markup, but it doesn't know which to pick.

    Obviously, you don't have to implement XLink to use the basic single-source linking strategy outlined here, but it does help since a lot of other properties we need are already in place. Indeed, with the exception of the very neatness[5] of the extended XLink solution with its multi-ended link, we could even do this in FrameMaker (+SGML).

    3 A word about existential issues and uniqueness

    The ID/IDREF approach has two things going for it: it guarantees that a) every ID attribute value used in a link target is unique, and b) the target exists. Leaving ID/IDREFs behind, we can guarantee neither. If we stick to links pointing to targets within the same physical file, we can fulfil a), but since b) is beyond us, a)'s usefulness is limited at best.

    The very concept of single-source publishing will sooner or later lead to multiple files that together make up the information unit we want, however, so sooner or later, with enough file fragments and links pointing all over the place, ID values will clash and we'll end up with a link that is no better than my Tokyo-Japan example above. Robust methods of creating unique ID values are required.

    Looking at ID attribute value creation from a practical point of view, we can safely say that the job shouldn't be left to humans. As an example, my footnotes in this paper have the ID attribute values ftnote-1, ftnote-2, ftnote-3, and so on. What's the probability that these ID attribute values are unique, even within this particular conference?

    If you let a bunch of writers handle ID creation all by themselves, this is also about as unique as they get. In a big document management system, with ID uniqueness guaranteed by writers, we can forget all about linking, fragment reuse, and all that other stuff that was the document management system's raison d'être to begin with.
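
    One way to take the job away from humans, sketched here in Python under stated assumptions, is to let the system assign identifiers built from random UUIDs. The "id-" prefix (an XML name cannot start with a digit), the id attribute name, and the element names are assumptions for the example; a real document management system may well use a different scheme.

```python
import uuid
import xml.etree.ElementTree as ET

def new_id(prefix="id-"):
    """Return an identifier that is a valid XML name and unique in practice."""
    return prefix + uuid.uuid4().hex

def assign_ids(root, tags=("section", "ftnote")):
    """Give every listed element an id unless it already has one."""
    for el in root.iter():
        if el.tag in tags and el.get("id") is None:
            el.set("id", new_id())
    return root

# Hypothetical fragment: one element already carries a hand-written id.
DOC = assign_ids(ET.fromstring(
    '<doc><section/><section id="ftnote-1"/><ftnote/></doc>'
))
IDS = [el.get("id") for el in DOC]
print(IDS)
```

    Existing values such as ftnote-1 are left alone in this sketch, so hand-made identifiers still need to be checked for clashes separately.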

    Oh, and what about document names? Well, pretty much the same applies. A unique document needs a unique name. Once we start to break down XML documents into fragments, each of the fragments needs one. And if we keep at it for a while, there are going to be huge numbers of fragments, each of which absolutely requires a unique name so we can point to (or from) it, regardless of whether it's for linking, or simply for locating and opening it in an editor.

    So allow me to generalize what I've said a bit: Every participating resource [in a link] must be uniquely identified!

    Does this guarantee that a link target exists? No, of course not. You can have a perfectly unique pointer to a non-existing target. Thus, in addition to unique resource names, we need to ensure that the resources exist, and continue to do so for as long as any link pointing at them exists. We don't necessarily need to validate the link at once (at times, this may not even be possible[6]), but there must be some way of ensuring that when used, the link is valid.
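
    A deferred check of this kind could look roughly like the following Python sketch. The REPOSITORY dictionary is a stand-in for a document store, and the file names, element names, and id attribute are assumptions.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urldefrag

# Stand-in for a document store: file name -> XML source.
REPOSITORY = {
    "rodents.xml": '<doc><section id="id-rabbits"><title>Rabbits</title></section></doc>',
}

def broken_links(hrefs, repository):
    """Return the hrefs whose document or fragment cannot be found."""
    broken = []
    for href in hrefs:
        name, fragment = urldefrag(href)
        source = repository.get(name)
        if source is None:
            broken.append(href)              # the document itself is missing
            continue
        ids = {el.get("id") for el in ET.fromstring(source).iter()}
        if fragment and fragment not in ids:
            broken.append(href)              # document exists, target does not
    return broken

LINKS = ["rodents.xml#id-rabbits", "rodents.xml#id-rats", "birds.xml#id-owls"]
print(broken_links(LINKS, REPOSITORY))
```

    Running such a check at publishing time, rather than at authoring time, is one way to validate links "when used".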

    Say, this does imply the use of a document management system, doesn't it?

    4 The problem revisited and expanded

    What if we needed to profile our link, make it conditional? For example, what if we were writing vehicle

    service information, with the document applying to two different models, and we needed a link that

    pointed to one place if discussing the first, and to another if discussing the other model? I know of more

    than one writer that would gladly resort to the lazy solution:

    For more information about the trimming options of the 1.8i engine, see Section 11.4 on Page 120.

    For more information about the trimming options of the 2.0T engine, see Section 11.5 on Page 128.

    Which is fine, albeit wordy, if you're discussing just two car models. But what if the document applied to a dozen of them, or twenty, or thirty? Single-source writing should take care of them all, without becoming


    needlessly wordy. Also, the very mention of specific models in the text limits what could otherwise be construed as applying to them all.

    4.1 An approach to single-source writing

    A couple of years ago, a colleague and I wrote extensive documentation for two DTDs we had created. One of them described a document collection, while the other focused on the individual documents in those collections. The DTDs shared a lot of elements and content models, obviously, but they were for very different purposes and, above all, for very different users, so they required their own documentation.

    We saw the individual document DTD's manuals as a subset of the collection DTD's documentation, so what we did was to mark up the differences with a role attribute, like this:

    The doc-coll Element...

    The doc Element...

    I'm sure you get the idea. A filter removed the document collection-specific sections to produce a

    document DTD-only manual. But we took the idea a bit further than that:

    This document is primarily intended for writers and editors using the &dtd_name; (Document Type Definitions), but anyone wishing to acquire a greater knowledge of the DTDs should find it useful.

    See how we used an entity instead of the DTD name? When producing a manual for the document collection DTD, the entity value was declared as "doc and doc-coll DTDs", but when extracting the parts applying to the individual document DTD, it was instead declared as "doc DTD". We had a number of other entities for this kind of thing, too; we quickly discovered that we needed to generalize the manual contents in some places.

    In keeping with this approach, we also used an inline wrapper element (wrap) to rebuild sentences to suit the context. In the above example, it was used to go between the singular and plural forms of words and phrases, but using this method, we were able to do much more than that.

    So, what does this have to do with profiling links? It's very simple, really. What I wanted to show above is an approach for single-source publishing. It isn't for the faint of heart, it requires a lot of discipline from the author, but it is a very simple way of generalizing, and at the same time helping to profile, content.
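
    The combination of a role filter and a profile-dependent entity can be sketched in Python as follows. The role attribute, the dtd_name entity, and its two values come from the text above; the manual, section, title, and p element names, the role values, and the PROFILES table are assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical single-source manual; the entity is declared per profile below.
BODY = """<manual>
  <p>This document is intended for users of the &dtd_name;.</p>
  <section role="doc-coll"><title>The doc-coll Element</title></section>
  <section role="doc"><title>The doc Element</title></section>
</manual>"""

# Assumed profile table: entity value and the roles kept in each manual.
PROFILES = {
    "doc-coll": {"dtd_name": "doc and doc-coll DTDs", "keep": {"doc", "doc-coll"}},
    "doc": {"dtd_name": "doc DTD", "keep": {"doc"}},
}

def build(profile):
    """Declare the entity for this profile, parse, and filter by role."""
    settings = PROFILES[profile]
    name = settings["dtd_name"]
    doctype = f'<!DOCTYPE manual [<!ENTITY dtd_name "{name}">]>\n'
    root = ET.fromstring(doctype + BODY)
    for parent in list(root.iter()):
        for child in list(parent):
            role = child.get("role")
            if role is not None and role not in settings["keep"]:
                parent.remove(child)       # not applicable in this profile
    return root

for profile in PROFILES:
    manual = build(profile)
    print(manual.findtext("p"), [s.findtext("title") for s in manual.iter("section")])
```

    Elements without a role attribute are treated as common to all profiles, which is where the author's discipline in writing generic text comes in.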

    4.2 Profiling links

    Going back to the problem of profiling links, let's use our original single-source cross-referencing example and pretend that instead of just discussing rabbits, we want to include other rodents. In our first rewrite, the rodent we know we have to include is a rat, but we also know that subsequent editions might well want to discuss others. Giving it a first try, we might get something along the lines of[7]:

    Rabbits Rats are outside the scope of this section. For detailed information about these lovable creatures, see


    An extended XLink solution is more attractive. After all, we are, in fact, discussing a kind of multi-ended link. Note that we're using an entity to handle the hyperlink contents (that is, the applicable rodents) in question:

    &rodents; are outside the scope of this section. For detailed information about these lovable creatures, see .

    The accompanying linkbase profiles the links:

    Of course, we still need to process the links to set them in context, rats or rabbits,[8] but this solution nevertheless represents a half-decent attempt at profiling. It's still markup-heavy, though, and we are using extended out-of-line XLinks, requiring lots of processing outside the current document. What if we only had simple XLink, for one reason or another?

    4.3 Purely XLink

    The XLink spec rather vaguely allows the definition of roles for link ends by using the xlink:role attribute. Originally, the role was designed as a descriptive property, a lot like the xlink:title attribute that is strictly intended for the human eye, but later revisions of the spec, as well as the final recommendation, instead defined it as a URI. Unfortunately, the XLink recommendation specifically refrains from defining a processing model for XLinks (as opposed to, for example, XInclude), which means that one has to be defined for any practical application of it.[9]

    So how can we use xlink:role? It is quite conceivable to give a link target a role and in that way profile it. For example, the role attribute could be used to identify a processing facility when publishing the link, that, depending on the publication format and media, and the currently applicable type of rodent, could process the link accordingly. The xlink:href attribute could in that case point out a general resource applicable to all possible rodents, while the xlink:role attribute would define the processing that is required in a particular context. Here's a purely XLink variant of our example, using simple XLink emulating multi-ended links:

    &rodents; are outside the scope of this section. For detailed information about these lovable creatures, see .

    NOTE: Instead of relative URLs, as in the previous examples, I've chosen to identify the target (and its role) by using a URN. In other words, rather than using addresses, I've used names.

    This, obviously, also requires some processing whenever the document is published, and can be a pain to

    present in the editing environment. It also requires a fairly complex XLink lookup mechanism.

    However, if the writing- and publishing-related rules are well established, and the writers are disciplined


    enough (and have the appropriate level of support from the authoring environment), it's a clean and generic solution. It also has the advantage of leaving the specifics of link targeting in the proper context to the publishing process, allowing, for example, the publisher to increase the number of different rodents in subsequent editions of the document.

    In this case, if more rodents were required in the document, simultaneously, the publishing process could, in addition to changing the general entity that handles the hyperlink contents, easily add the required paper-based references, plus any commas or other separators between them:

    Rabbits and rats are outside the scope of this section. For detailed information about these lovable

    creatures, see Section 7, Rabbits, Page 91 and Section 8, Rats, Page 104.
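
    The separator handling is simple enough to sketch in Python. The TARGETS table is a stand-in for whatever the publishing process would look up for each applicable rodent; the function names are made up for the example.

```python
def join_items(items):
    """Join items as 'a', 'a and b', or 'a, b and c'."""
    items = list(items)
    if len(items) <= 1:
        return "".join(items)
    return ", ".join(items[:-1]) + " and " + items[-1]

# Stand-in for generated target data, keyed by the applicable rodent.
TARGETS = {
    "rabbits": ("Section 7", "Rabbits", "Page 91"),
    "rats": ("Section 8", "Rats", "Page 104"),
}

def profiled_text(rodents):
    """Build the hyperlink content and the paper-based references for a profile."""
    names = join_items(rodents).capitalize()
    refs = join_items(", ".join(TARGETS[r]) for r in rodents)
    return (f"{names} are outside the scope of this section. "
            f"For detailed information about these lovable creatures, see {refs}.")

print(profiled_text(["rabbits"]))
print(profiled_text(["rabbits", "rats"]))
```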

    Obviously, the kind of single-source authoring depicted here is rather easy to break, for example, by using the singular form in a sentence ("A rabbit is a lovable creature ..."), by referencing the rodents explicitly instead of using an entity ("Rabbits and rats" instead of &rodents;), and so on. In the end, however, this is what single-source publishing is all about: it's about being able to customize content according to context without having to rewrite.

    4.4 An entity-free solution

    The use of general entities in "Purely XLink" is not necessarily a good idea. For one thing, entities require the presence of a DOCTYPE declaration, but they also limit the choice of tools because surprisingly many XSLT/DOM processors out there seem to have deficiencies in entity support.

    So let's just leave them out, shall we?

    are outside the scope of this section. For detailed information about these lovable creatures, see .

    Here we assume that the same document management system that keeps track of the required rodents when publishing also inserts the required content in the hlink element. In practical terms, the xlink:role attribute is what does it since it's already used for the purpose in the publishing process. The function's fairly easy to implement, too, if you already have that other publishing functionality, but it's a pain for the author since the hlink element is, in fact, empty. The least the system must be able to do is, therefore, to generate text in the XML editor to mark the spot and help the author visualize the context. The system can either provide the currently relevant list of rodents, or simply insert a generic standard text.

    There are a variety of techniques to achieve something workable here. Processing instructions are one worth mentioning since it's fairly easy to implement them as pseudo-elements where the user can enter content to make life easier during editing.

    NOTE: The processing instructions would, of course, be ignored by the publishing process.

    There is an easier alternative, though. Since we're risking more than just one or two types of rodents in a future version of the document, we should consider a switch mechanism in the DTD that allows us to turn off the hyperlink auto-generate feature and go back to writing the hyperlink text ourselves:

    Most of the rodents mentioned here are outside the scope of this section. For detailed information about these lovable creatures, see.

    Note the generate attribute that leaves the hyperlink content to the writer. This will work for as long as the hyperlink content isn't rodent-specific, even though the example itself is almost unforgivably crude.
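
    How a publishing step might honour such a switch can be sketched in Python. Only the hlink element, the generate attribute name, and the use of xlink:role come from the text; the attribute value "no", the function name fill_hlinks, and the role-to-text table are assumptions made for illustration, not the system described here.

```python
import xml.etree.ElementTree as ET

XLINK_NS = "http://www.w3.org/1999/xlink"
ROLE = f"{{{XLINK_NS}}}role"

# Hypothetical lookup table: the text the system generates for a role.
GENERATED_TEXT = {
    "rodents": "Rabbits and rats",
}


def fill_hlinks(root, generated_text=GENERATED_TEXT):
    """Insert generated text into hlink elements.

    Elements carrying generate="no" (an assumed attribute value) keep
    whatever the writer typed.
    """
    for hlink in root.iter("hlink"):
        if hlink.get("generate") == "no":
            continue  # the writer owns this hyperlink text
        role = hlink.get(ROLE)
        if role in generated_text:
            hlink.text = generated_text[role]
    return root


doc = ET.fromstring(
    f'<p xmlns:xlink="{XLINK_NS}">'
    '<hlink xlink:role="rodents"/> are outside the scope of this section. '
    'See <hlink xlink:role="rodents" generate="no">the rodent pages</hlink>.'
    "</p>"
)
fill_hlinks(doc)
links = doc.findall("hlink")
print(links[0].text)
print(links[1].text)
```

    The first link receives generated text, while the second keeps the writer's own wording.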


    The lesson here is that this is essentially a style guide issue; you can't rely on clever markup and implementation alone. By introducing automated hyperlink content generation, we could effectively be limiting, not expanding, the versatility of the system.

    4.5 A real-life example

    Remember the vehicle example I used to introduce this chapter? That problem is from real life, from a well-known automobile manufacturer. Their solution is to use a document management system to handle XML fragments, links, images, and so on. Everything is profiled according to context, and since there are surprisingly many variants of each vehicle model, the system must be able to handle a huge number of profiles. An author who modifies service information for a certain vehicle model and variant checks out the applicable resources (XML fragments, links, images, etc.) in that context, but since at least some of these fragments apply to other profiles as well, the author must take great care to keep the writing generic enough that nothing will break.

    When documents (collections of fragments, actually) are checked out, the links describing the relations between resources are handled as inline extended XLinks, like this [10]:

    NOTE: The inline links only exist when documents have been checked out; the links are handled very differently in the database, as link objects. [11]

    Profiles, contained in the xlink:arcrole and nevis:... attribute values, are handled in groups.

    The URNs that handle profiles are created by the document management system, which is why the profiles aren't human-readable. This example, however, demonstrates a real-life single-source document that utilizes the principles outlined in this whitepaper.

    5 An aside on other types of links

    We've mainly discussed cross-references so far, but obviously, the authoring of good single-source documents requires other types of links. Fragment inclusions are perhaps the most important, but images may be just as important. Also, online versions of documents might want to embed features that simply cannot be made available in paper documents. For example, online versions of vehicle service documentation may require embedded software instruments such as voltmeters or oscilloscopes to make a diagnostics procedure truly interactive. Let's have a look at some examples.

    5.1 Fragment inclusions

    The inclusion of fragments may be achieved using a number of techniques. Entities are one; in SGML solutions, they were often used for the inclusion of everything from boilerplate texts to whole sections. Here's a typical example of an external general entity, found in a DOCTYPE declaration:


    The entity would then be included in the document like this:

    It is important for your rabbit to regularly exercise.

    &rabbits-bite;...

    The rabbits-bite.xml file might look like this:

    Rabbits are very cautious animals, easily threatened.

    When lifting up the animal from its cage, it might attempt to bite you if you move too quickly or lift it in the wrong way.

    The entity approach is a proven concept and works well in many circumstances. An advantage is that a validating XML processor must include the text and parse the resulting file.

    If the rest of your DTD (or schema) uses XLink, the downside is that the entity mechanism is another linking mechanism, standardized as it may be. It is probably handled using different interfaces than the XLink cross-references, which almost invariably causes the application to be less user-friendly. Also, it can probably not share any software developed to handle various publishing requirements; it needs its own.

    An XLink solution might look like this:

    ...

    ...

    The same mechanism is now used for both fragment inclusions and cross-references. A well-designed application can use the same code for any required customizations, of course, perhaps with some functionality added to handle inclusions. The downside is that there's no built-in embedding or validation mechanism in XML for this. Therefore, a normalization process is required when publishing.

    As for other advantages, using XLink offers great benefits when profiling fragment inclusions according to context (see "The problem revisited and expanded"); the same profiling mechanisms can be used for every type of link, provided that the same linking mechanism is used.
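
    A normalization step of the kind mentioned above might look roughly like the following Python sketch. It is an assumption-laden illustration, not the process used in any system described here: the ref and warning element names, the in-memory FRAGMENTS store, and the rule that xlink:show="embed" marks an inclusion are all chosen for the example. It does not guard against circular inclusions.

```python
import xml.etree.ElementTree as ET

XLINK_NS = "http://www.w3.org/1999/xlink"
HREF = f"{{{XLINK_NS}}}href"
SHOW = f"{{{XLINK_NS}}}show"

# Stand-in for the file system or document management system.
FRAGMENTS = {
    "rabbits-bite.xml": "<warning><p>Rabbits are very cautious animals.</p></warning>",
}


def normalize(parent, load=FRAGMENTS.__getitem__):
    """Replace every embedding link below parent with its target fragment."""
    for index, child in enumerate(list(parent)):
        if child.get(SHOW) == "embed" and child.get(HREF) is not None:
            fragment = ET.fromstring(load(child.get(HREF)))
            normalize(fragment, load)   # fragments may include other fragments
            fragment.tail = child.tail  # keep any text that followed the link
            parent[index] = fragment
        else:
            normalize(child, load)
    return parent


doc = ET.fromstring(
    f'<section xmlns:xlink="{XLINK_NS}">'
    "<p>It is important for your rabbit to regularly exercise.</p>"
    '<ref xlink:href="rabbits-bite.xml" xlink:show="embed"/>'
    "</section>"
)
normalize(doc)
print([child.tag for child in doc])
```

    After normalization the document contains the fragment itself rather than the link, so it can be validated and formatted as a whole.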

    5.2 Images

    Images, like fragment inclusions, can be handled using entities, with a mechanism similar to fragment inclusions. Again, the downside is that this introduces an additional user interface, as well as additional customizing requirements (instead of simply using the XLink ones) if the cross-referencing system uses XLink. [12]

    XLink, on the other hand, can handle images, too [13]:

    xlink:href    CDATA                    #IMPLIED
    xlink:show    (embed|new|none)         #IMPLIED
    xlink:actuate (onLoad|onRequest|none)  #IMPLIED
    xmlns:xlink   CDATA                    #FIXED "http://www.w3.org/1999/xlink" >

    XLink offers several nice features here. The xlink:show and xlink:actuate attribute values, for example, may be changed depending on publishing context. When writing user manuals for software, for example, a screenshot is hardly required for the online help version since you'll probably have access to the real thing. And if XLink is used for images as well as fragment inclusions and cross-references, images can use the same profiling mechanisms (see "The problem revisited and expanded") as the other links, resulting in both cheaper and more robust code.
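
    Changing xlink:show according to publishing context could be sketched like this in Python. The image element name, the class="screenshot" marker, the context name "online-help", and the policy itself are assumptions for the sake of the example.

```python
import xml.etree.ElementTree as ET

XLINK_NS = "http://www.w3.org/1999/xlink"
SHOW = f"{{{XLINK_NS}}}show"


def apply_image_policy(root, context):
    """Set xlink:show on images according to the publishing context.

    Assumed policy: online help suppresses screenshots; every other
    image, in every context, is embedded.
    """
    for image in root.iter("image"):
        is_screenshot = image.get("class") == "screenshot"
        if context == "online-help" and is_screenshot:
            image.set(SHOW, "none")
        else:
            image.set(SHOW, "embed")
    return root


doc = ET.fromstring(
    f'<topic xmlns:xlink="{XLINK_NS}">'
    '<image class="screenshot" xlink:href="dialog.png"/>'
    '<image class="diagram" xlink:href="wiring.png"/>'
    "</topic>"
)
apply_image_policy(doc, "online-help")
print([image.get(SHOW) for image in doc.iter("image")])
```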


    5.3 Profiling any of the above

    Our rodents example is perhaps not the best for illustrating a profiling mechanism applicable to the types of links discussed, so let's instead use the vehicle service information example from the beginning of "The problem revisited and expanded". Let's assume that vehicles need to be profiled according to model, model year, and engine variant, and that any of these properties may be used for any type of link. A declaration of the necessary attributes would in that case benefit from parameterization, as follows:

    This parameter would then be invoked by any linking element that needs it (leaving out the XLink attributes for clarity):

    ...

    %profiles.att; >...

    The ref element, in this case, would be used for both cross-references and fragment inclusions. The context decides what the element is used for; we don't even need special attributes for this. If the ref element is found inline, within a wrapper (such as the xref-wrapper element), then it's a cross-reference. If found on block level or above, it's an inclusion.

    Obviously, this is hardly a revolution in DTD development; parameterization remains a sound, and necessary, part of any reasonably complex DTD. The point is simply that a consistent DTD design with one linking system instead of many can lead to a simpler DTD, as well as less and better code.
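
    The payoff of one linking element with shared profiling attributes can be illustrated with a short Python sketch. The attribute names model, year, and engine, the space-separated value lists, and the id attribute used for reporting are assumptions; only the ref and xref-wrapper element names and the inline-versus-block rule come from the text.

```python
import xml.etree.ElementTree as ET

PROFILE_ATTRS = ("model", "year", "engine")  # assumed attribute names


def matches(element, context):
    """True if every profile attribute present on element allows the context.

    A missing attribute means "applies to all"; a value is read as a
    space-separated list of allowed alternatives.
    """
    for name in PROFILE_ATTRS:
        value = element.get(name)
        if value is not None and context.get(name) not in value.split():
            return False
    return True


def classify_refs(root):
    """Label each ref as a cross-reference or an inclusion by its parent."""
    kinds = {}
    for parent in root.iter():
        for child in parent:
            if child.tag == "ref":
                inline = parent.tag == "xref-wrapper"
                kinds[child.get("id")] = "cross-reference" if inline else "inclusion"
    return kinds


doc = ET.fromstring(
    "<section>"
    '<ref id="r1" model="A" engine="diesel petrol"/>'
    '<p>See <xref-wrapper><ref id="r2" model="B"/></xref-wrapper>.</p>'
    "</section>"
)
context = {"model": "A", "year": "2003", "engine": "diesel"}
kinds = classify_refs(doc)
selected = [ref.get("id") for ref in doc.iter("ref") if matches(ref, context)]
print(kinds)
print(selected)
```

    The same matches function serves both kinds of link, which is the point of using one linking mechanism.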

    6 Tools of the trade

    It's one thing to write DTDs that use the kind of linking strategies I've discussed above, but something else entirely to actually implement them. The vast majority of XML (and certainly SGML) environments I've seen lack any kind of support for creating links. As a writer, you're expected to a) enter your own ID attribute values, and b) remember to put these values in corresponding referencing attributes elsewhere. It's funny; had we been discussing Word or FrameMaker, no one would ever have accepted this, but in the wonderful world of markup, everybody seems to think it's natural.

    I don't. It's not that hard to build a linking tool that supports the user, and far from impossible to create an ID attribute value generation mechanism. Figure 1, "XLink dialog", is a screenshot of a dialog used for creating Simple XLink cross-references that we've implemented in a number of environments.

    In its basic form, the XLink dialog and its underlying code aren't that fancy. You pick the element you want to point to, and the dialog lists the caption or title of that element. The assumption is made that the element you want to point at is a container element of some kind (a section, a figure, a table, ...) and the caption or title of that container is either the first or the last child containing #PCDATA, making the lookup functionality much easier to implement. It's something you want to keep in mind when writing DTDs, but the assumption holds true in a majority of cases anyway.
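
    The lookup assumption can be sketched in Python as follows. This is one possible reading of the rule (check the first child, then the last, and take whichever directly contains character data), not the code behind the dialog; the element names in the sample are illustrative.

```python
import xml.etree.ElementTree as ET


def has_character_data(element):
    """True if the element has direct, non-whitespace text content."""
    pieces = [element.text] + [child.tail for child in element]
    return any(piece and piece.strip() for piece in pieces)


def caption_of(container):
    """Return the caption or title of a container element, or None.

    Assumes the caption is the first or the last child, whichever
    directly contains character data.
    """
    children = list(container)
    if not children:
        return None
    for candidate in (children[0], children[-1]):
        if has_character_data(candidate):
            return "".join(candidate.itertext()).strip()
    return None


doc = ET.fromstring(
    "<doc>"
    "<section><title>Feeding <emph>your</emph> rabbit</title><p>Hay.</p></section>"
    "<figure><graphic/><caption>A rabbit</caption></figure>"
    "</doc>"
)
captions = [caption_of(element) for element in doc]
print(captions)
```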

    Vital here is that all environments we've built that use this XLink functionality also have an ID generation mechanism. The dialog will only display titles and captions of elements that have ID attribute values, so while there's nothing to filter out the element types that aren't allowed as link targets [14], if we only allow ID generation on elements that are allowed, we sort of get what we want anyway. This isn't an ideal solution (there are a number of reasons to include ID attribute values in other places than just allowed link targets), but an element type filtering layer does introduce an extra level of complexity as well as add an unwanted DTD dependency or, at the very least, some additional configuration requirements.
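
    An ID generation mechanism restricted to allowed link targets, together with the listing the dialog would build from it, might be sketched like this in Python. The LINKABLE set, the id attribute name, and the "id" prefix are assumptions for the example.

```python
import itertools
import xml.etree.ElementTree as ET

LINKABLE = {"section", "figure", "table"}  # assumed set of allowed targets


def generate_ids(root, prefix="id"):
    """Give every linkable element without an ID a new, unique one."""
    used = {el.get("id") for el in root.iter() if el.get("id")}
    counter = itertools.count(1)
    for element in root.iter():
        if element.tag in LINKABLE and element.get("id") is None:
            new_id = f"{prefix}{next(counter)}"
            while new_id in used:  # skip values already present in the document
                new_id = f"{prefix}{next(counter)}"
            element.set("id", new_id)
            used.add(new_id)
    return root


def link_targets(root):
    """List (id, tag) for every element the dialog would offer as a target."""
    return [(el.get("id"), el.tag) for el in root.iter() if el.get("id")]


doc = ET.fromstring(
    '<doc><section id="id1"><title>Rabbits</title><p>Text.</p></section>'
    "<figure><caption>A rabbit</caption></figure></doc>"
)
generate_ids(doc)
targets = link_targets(doc)
print(targets)
```

    Because only linkable elements receive IDs, the target list is filtered as a side effect, without a separate element type filter.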

    It's worth noting that this particular functionality was developed for smaller customers that cannot afford a heavy-duty document management system. They have everything they need stored on their file systems, and they rely very much on well-defined folder and subfolder structures to organize their documents and document fragments. A larger customer (for example, the one that paid for the extended XLink system I briefly outlined in "A real-life example"), with tens of thousands of reusable fragments and dozens of technical writers, needs more support from the user interface. The basic cross-referencing functionality, however, is almost identical to the one above; the difference is mainly that the extended XLink system also includes functionality (in the GUI, buttons that open subdialogs from the main XLink dialog) for profiling the links in the way I've described. And, of course, that most targets are handled as URNs and that there's a document management system to keep track of them all, allowing for far superior reuse facilities.

    7 Discipline!

    The strategies I've described in this paper force an often radically different approach to writing than would a conventional system. The writer needs to be constantly aware of the fact that her material will be published in a variety of contexts and on a variety of media. To succeed, it is vitally important to not limit the implementation to tools and user interfaces alone; a style guide for writers is essential and must be provided. A DTD without a style guide is of very limited value. Training is also important, as is an editorial function that ensures that all produced documents follow the applicable writing guidelines.

    When we implemented the real-life system described in "A real-life example", creating single-source cross-references as outlined here was one of the most difficult issues to grasp for writers and programmers alike. For example, during the project, I discovered that there were several mistakes in both implementation and writing style that had to be corrected. Here's a markup example I found (I've left out a lot of markup for clarity):

    For more information, .

    Figure 1: XLink dialog


    Now what's this? Well, apparently, no one could figure out why the xref wrapper was required in the first place, so writers simply contained the linking element (ref) directly in the wrapper. The stylesheet designers didn't really get the point either; what one got when printing the above was this:

    For more information, see Chapter 2 on Page 14.

    In other words, not only did the wrapper remain essentially unused (the writers thought it a nuisance to have to use two elements to create a link), but the link itself generated both the target information (Chapter 2 on Page 14) and a verb, see. The verb is a no-no in generated text in almost any system; the idea falls apart as soon as you need to write your linking sentence in any other way than the above. But perhaps more importantly, it also breaks because the text flow of the translated documents cannot be expected to follow the same construct as seen here. In other words, the verb see cannot always be expected to be located next to the link or even follow the same grammatical rules. At best, this approach requires a lot more work to make foreign-language stylesheets work; at worst, it just cannot be done.
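
    The division of labour argued for here (the stylesheet generates only the target description, the writer supplies the verb and the rest of the sentence) can be sketched in Python. The templates, labels, and function names are hypothetical, and the Swedish wording is illustrative only.

```python
# Hypothetical per-language templates for the generated target text.
TEMPLATES = {
    "en": "{label} {number} on page {page}",
    "sv": "{label} {number} på sidan {page}",
}
LABELS = {
    "en": {"chapter": "Chapter"},
    "sv": {"chapter": "kapitel"},
}


def target_text(kind, number, page, lang="en"):
    """Generate only the target description: no verb, no punctuation."""
    return TEMPLATES[lang].format(label=LABELS[lang][kind], number=number, page=page)


def render_sentence(before, target, after=""):
    """Combine the writer's own words with the generated target text."""
    return f"{before}{target}{after}"


generated = target_text("chapter", 2, 14)
print(render_sentence("For more information, see ", generated, "."))
print(render_sentence("", generated, " describes this in detail."))
```

    Because the generated text carries no verb, the same link works in both sentences, and a translator can place it wherever the target language requires.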

    This example may seem trivial but resulted in several difficulties:

    Writers were annoyed at having to use two elements whenever linking.

    Authors unaccustomed to the writing style could only see an incomplete sentence. There was no direct clue that the cross-reference generated text would add a verb to fill in the blanks.

    Stylesheet designers misinterpreted the DTD and the basic philosophy behind it and, therefore, created formatting that severely limited the usability of cross-references when writing.

    The stylesheet behavior drastically complicated the publishing of foreign-language documents, both because those stylesheets required more work (to account for the often large differences in grammatical constructs around that imperative verb) and because the translators got a document that did not contain all of the required text (the see verb).

    The markup also resulted in fragments that weren't single-source, since there was no way to remove the paper-based links.

    Luckily, we were able to correct these problems in time, before the erroneous markup was able to propagate through the system.

    This example serves well to illustrate the practical difficulties in implementing a single-source linking strategy. The task is not trivial and must not be underestimated. So, what can be done to avoid this kind of pitfall?

    Make sure that the implementors thoroughly understand the DTD.

    Provide a clear and to-the-point style guide for your DTD. Such a style guide is required reading for every writer, but also every implementor.

    Plan for training. The management often underestimates this one, especially if their writers already know what XML is.

    Provide a user support function. This is a full-time job requiring considerable diplomatic skills (not you; you are the DTD designer and, thus, the bad guy) and must be extended through not only the duration of the training period, but also through the first few weeks of production.

    Try to introduce an editorial function (a person that ensures that the guidelines are followed). This is a toughie; most technical writers, and certainly the vast majority of managers, fail to see themselves as being in the publishing business (where an editorial department is seen as a necessity) even though they should.

    And finally, exercise discipline!

    8 Conclusions

    This single-source linking thing is really not that difficult. There are some basic rules that must be followed, however, if the results are to be worth the trouble.


    Always use wrappers to identify semantics. The xref-wrapper in our rodent example is a good example of this.

    Consider using separate links for different publishing formats. For example, paper-based links generally work best with generated text, while hyperlinks are hotspots naturally inserted in document content.

    Use standard mechanisms for linking rather than reinventing the wheel. XLink may not be perfect, but it is a W3C recommendation and a de facto standard for XML.

    Use one linking mechanism instead of several. This will result in shared and, therefore, cheaper and more robust code, easier updates, and a better user environment.

    Devise a single-source authoring strategy. You will need strict authoring guidelines to ensure that your content is generic enough to handle different publishing media and different profiles or contexts. A good DTD without equally good authoring guidelines doesn't get you very far.

    Know when to split documents into fragments to optimize reuse. A strategy for this should result from an initial information analysis.

    See to it that the authoring environment offers enough support for the writers so that their task becomes manageable, for example, by introducing a user-friendly lookup mechanism for linking and a decent search facility in the document management system.

    Know when to stop! There's a limit to what markup can do without totally alienating the writers.

    Notes

    1. Or downright disaster; take your pick.

    2. Actually, any CDATA-based linking solves this problem.

    3. It's a one-fourth truth, actually, since you can have multiple targets and multiple sources, in addition to having either multiple sources or multiple targets. And, of course, one source and one target is another alternative.

    4. A class, in this case, is a link with two sources; both source locators are aliased using the same label, source. This label is then used to identify the from link end in the arc element that describes the link itself.

    5. Neatness? Is there such a word?

    6. Getting back again to our basic single-source cross-reference example, it is quite conceivable that if the online and paper versions of the link point to different targets, they may not both be available for validation at the same time, in the same context; we'll get back to this in the next section.

    7. I've taken the liberty of removing some attributes required by the XLink spec. Hope you don't mind.

    8. Which is why there's a role attribute in the arc elements.

    9. This lack of a defined processing model is also often seen as an advantage since it does not unnecessarily limit any possible implementations. Consequently, the recommendation must always be interpreted, and a processing model has to be defined. This means that while any conforming XLink implementation should be able to parse syntactically correct XLinks, they may not always behave in the way they were intended to.

    10. I've simplified the markup somewhat, since the real thing always contains various processing attributes that really aren't relevant for our example.

    11. Link objects are the document management system's way of expressing relations (in other words, links) between the various file resources in the database underneath. It's an abstraction layer that enables the document management system to treat links in the same way as it treats file resources (XML files, images, etc.). Consequently, the file resources are treated as file objects by the system. Both are objects to which identical sets of properties can be applied.


    12. And even if the cross-referencing system were your old trusty ID/IDREF, (at least) two sets of customizations would still be needed.

    13. I've deliberately left out some of the XLink features, for example, xlink:show and xlink:actuate.

    14. You can pick any element allowed by the DTD; that's how this particular dialog works. At the time, it was the easiest way of implementing the functionality.

    Acknowledgements

    My deepest thanks must go to my dear friend and colleague, Henrik Mårtensson, who first suggested the use of an xref wrapper to identify the whole semantic link, instead of just the reference. He's also a Desperate Perl Hacker, and the one I always turn to first to discuss my ideas.

    Thanks also to my friends and colleagues at Sörman Information & Media AB.

    Bibliography

    [XLink Recommendation] W3C. XLink Recommendation. http://www.w3.org/TR/xlink/.

    The Author

    Ari Nordström
    Sörman Information & Media AB
    Göteborg, 40274, Sweden

    Ari Nordström is an SGML/XML consultant at Sörman Information & Media AB. He is one of the designers of Ericsson's and Volvo's standard DTDs, as well as the designer of a number of DTDs and authoring systems for other companies. His most recent assignment is to design the DTD and some of the filters required for the exchange of information between Ford companies, such as Volvo, Mazda, and Ford. Ari Nordström spends some of his spare time projecting films at the Draken Cinema in Göteborg, Sweden, which should explain why he wants to automate cinemas using XML.

    Extreme Markup Languages 2003, Montréal, Québec, August 4-8, 2003

    This paper was formatted from XML source via XSL by Mulberry Technologies, Inc.
