DITA, XLIFF and Translation: Truths, Myths and Misconceptions


www.oasis-open.org

Andrzej Zydroń MBCS CITP

CTO, XML-INTL

Leaders in Translation Technology

September 2008

azydron@xml-intl.com

1– 2

DITA is cheaper to translate: Truths

Separation of content and format

Component based architecture

You only translate changed topics
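The "only translate changed topics" point can be sketched as a simple content comparison. This is a minimal illustration, not a method taken from the slides: the function names, the file names and the use of SHA-256 digests are all assumptions, and a real CMS would normally track changes itself.

```python
import hashlib


def fingerprint(topic_xml):
    """Return a stable SHA-256 digest of a topic's serialized content."""
    return hashlib.sha256(topic_xml.encode("utf-8")).hexdigest()


def topics_needing_translation(previous, current):
    """List topics that are new or whose content changed since the last round.

    Both arguments map a (hypothetical) topic file name to its XML text.
    """
    changed = []
    for name, xml_text in current.items():
        old = previous.get(name)
        if old is None or fingerprint(old) != fingerprint(xml_text):
            changed.append(name)
    return sorted(changed)


previous = {
    "install.dita": "<topic id='i'><title>Install</title></topic>",
    "intro.dita": "<topic id='a'><title>Intro</title></topic>",
}
current = {
    "install.dita": "<topic id='i'><title>Install</title></topic>",
    "intro.dita": "<topic id='a'><title>Introduction</title></topic>",
    "faq.dita": "<topic id='f'><title>FAQ</title></topic>",
}
print(topics_needing_translation(previous, current))  # ['faq.dita', 'intro.dita']
```

Only two of the three topics would be sent for translation here; the unchanged topic is skipped entirely.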

1– 3

Translation is the main cost of localization: Myths

1– 4

DITA Granularity: a double-edged sword

DITA without a CMS: you must be NUTS!

You DO NOT NEED a native XML CMS

Links, links, links, links...

Translate topics as soon as they are available

Increased project management costs

Necessitate web services based exchange

Need to establish long-term relationship with Localization Service Providers

1– 5

DITA Translation Pitfalls

DITA comes ready packed with some very dangerous options

Translatable acronyms

Beware the CONREF for it may TRIPLE your translation costs

Specialize if you dare

DITA Translation TC Best Practices
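As a rough aid for the CONREF warning, the sketch below lists every element in a topic that carries a `conref` attribute, so that reused content can be reviewed before it goes to translation. The sample topic and the function name are made up for illustration; the code only finds the references and does not estimate their cost.

```python
import xml.etree.ElementTree as ET

# Hypothetical topic used only for this illustration.
SAMPLE_TOPIC = (
    '<topic id="t1"><title>Setup</title><body>'
    '<p conref="shared.dita#shared/warning"/>'
    "<p>Plug in the device.</p>"
    '<note conref="shared.dita#shared/esd"/>'
    "</body></topic>"
)


def find_conrefs(dita_xml):
    """Return (element name, conref target) pairs in document order."""
    root = ET.fromstring(dita_xml)
    return [(el.tag, el.get("conref")) for el in root.iter() if el.get("conref")]


for tag, target in find_conrefs(SAMPLE_TOPIC):
    print(tag, "->", target)
```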

1– 6

DITA + XLIFF: Only part of the picture

[Diagram: the standards involved. XML 1.0; Unicode 5.0; XML vocabulary, e.g. DITA; xml:tm (author memory, translation memory); SRX; GMX; W3C ITS; Unicode TR29; XLIFF; TMX]

1– 7

OAXAL: OASIS Reference Architecture TC

[Diagram: OAXAL components. xml:tm; Unicode TR 29; SRX; W3C ITS; GMX-V; DITA/XML; TMX; XLIFF]

1– 8

OAXAL TC

http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=oaxal

1– 9

DocZone.com example: How does it work?

Steps shown in the workflow diagram:

Write/edit XML content

Create graphics, store in DocZone, link with XML content

Check content into DocZone server

Collaborative review and approval

Pre-translate content and alert translators

Vendor localizes XML content with DocZone translation tool

Publish to all output formats for all markets (PDF, HTML)

1– 10

OAXAL: Why is DITA + XLIFF not enough

Process Automation

50% of translation costs: process management

Matching

Strange commercial model for translation companies

Automation, automation, automation

1– 11

DITA + OAXAL: putting it together

[Diagram: DITA/XML plus xml:tm, Unicode TR 29, SRX and W3C ITS, applied to a DITA/XML document]

1– 12

xml:tm namespace

Example of the use of the tm namespace in an XML document:

<document xmlns:tm="urn:xml-intl-tm" te="9">
  <tm:tm>
    <section>
      <para>
        <tm:te id="e1">
          <tm:tu id="u1.1">Namespace is very flexible.</tm:tu>
          <tm:tu id="u1.2">It is very easy to use.</tm:tu>
        </tm:te>
      </para>
    </section>
  </tm:tm>
</document>
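To show how the text units in such a document can be read back, here is a minimal sketch using Python's standard XML parser. The sample repeats the slide's fragment; the closing tags for `section`, `tm:tm` and `document` are added here as an assumption so that it parses, and the function name is illustrative.

```python
import xml.etree.ElementTree as ET

TM_NS = "urn:xml-intl-tm"  # namespace URI as written on the slide

SAMPLE = """<document xmlns:tm="urn:xml-intl-tm" te="9"><tm:tm><section>
<para><tm:te id="e1">
<tm:tu id="u1.1">Namespace is very flexible.
</tm:tu><tm:tu id="u1.2">
It is very easy to use.</tm:tu>
</tm:te></para>
</section></tm:tm></document>"""


def text_units(xml_text):
    """Map each tm:tu id to its text, with whitespace normalized."""
    root = ET.fromstring(xml_text)
    units = {}
    for tu in root.iter("{%s}tu" % TM_NS):
        units[tu.get("id")] = " ".join("".join(tu.itertext()).split())
    return units


units = text_units(SAMPLE)
for unit_id, text in units.items():
    print(unit_id, "=", text)
```

Because the `tm` elements live in their own namespace, a tool that does not know about xml:tm can ignore them and still see the original `document`, `section` and `para` structure.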

1– 13

xml:tm namespace

[Diagram: the same document shown in two views. Source document view: doc, title, section, para, text. Source document tm namespace view: tm, te (text element), tu (text unit), with each para split into sentences.]

1– 14

xml:tm namespace

Author memory

Maintain memory of source text

Authoring statistics

Authoring tool input

Translation memory

Automatic alignment

Maintain exact link of source and target text

Reduce translation costs

1– 15

xml:tm differencing

[Diagram: DOM differencing. Original Source Document: tu id="1" to tu id="6". Updated Source Document: tu id="1", "2", "3", "4", "7", "6", "8", with text units labelled deleted, modified and new.]
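The differencing idea can be approximated with a sequence comparison over the text units. This is a simplified sketch, not the DOM differencing used by xml:tm: it compares plain text with Python's `difflib`, keeps the identifier of every unchanged unit, and issues fresh identifiers for modified or new text. The function name, the sample texts and the status labels are assumptions.

```python
from difflib import SequenceMatcher


def diff_text_units(old_units, new_texts, next_id):
    """Compare two versions of a document's text units.

    old_units: list of (id, text) from the previous version.
    new_texts: list of text strings from the updated version.
    Returns (updated, retired): updated is a list of (id, text, status),
    retired lists ids from the old version that no longer appear.
    """
    old_texts = [text for _, text in old_units]
    matcher = SequenceMatcher(a=old_texts, b=new_texts, autojunk=False)
    updated, retired = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Unchanged units keep their original identifiers.
            for k in range(i2 - i1):
                updated.append((old_units[i1 + k][0], new_texts[j1 + k], "unchanged"))
            continue
        retired.extend(old_units[i][0] for i in range(i1, i2))
        status = "modified" if tag == "replace" else "new"
        for j in range(j1, j2):
            updated.append((next_id, new_texts[j], status))
            next_id += 1
    return updated, retired


old = [(1, "A"), (2, "B"), (3, "C"), (4, "D"), (5, "E"), (6, "F")]
new = ["A", "B", "C", "D", "E changed", "F", "G"]
updated, retired = diff_text_units(old, new, next_id=7)
print([uid for uid, _, _ in updated])  # [1, 2, 3, 4, 7, 6, 8]
print(retired)                         # [5]
```

In this made-up example the rewritten fifth sentence receives id 7 and the appended sentence receives id 8, while the four untouched sentences keep their ids.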

1– 16

xml:tm author memory

Namespace aware DOM differencing

Identify changes from the previous version

Unique text unit identifiers are maintained

Modification history

Text units can be loaded into a database
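Loading text units into a database and keeping a modification history might look like the following sketch. It uses an in-memory SQLite database; the table layout, column names and function names are assumptions made for illustration and are not part of xml:tm.

```python
import sqlite3


def load_text_units(conn, doc_name, version, units):
    """Store (id, text) pairs for one version of a document."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS text_units (
               doc TEXT, version INTEGER, tu_id TEXT, source TEXT,
               PRIMARY KEY (doc, version, tu_id))"""
    )
    conn.executemany(
        "INSERT OR REPLACE INTO text_units VALUES (?, ?, ?, ?)",
        [(doc_name, version, uid, text) for uid, text in units],
    )
    conn.commit()


def history(conn, doc_name, tu_id):
    """Return (version, text) rows for one text unit, oldest first."""
    cursor = conn.execute(
        "SELECT version, source FROM text_units "
        "WHERE doc = ? AND tu_id = ? ORDER BY version",
        (doc_name, tu_id),
    )
    return cursor.fetchall()


conn = sqlite3.connect(":memory:")
load_text_units(conn, "guide.dita", 1, [("u1.1", "Namespace is very flexible.")])
load_text_units(conn, "guide.dita", 2, [("u1.1", "Namespace is very flexible.")])
print(history(conn, "guide.dita", "u1.1"))
```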


1– 19

XLIFF + xml:tm

[Diagram: DITA/XML plus xml:tm, with GMX/V and XLIFF]

1– 20

DITA/OAXAL to XLIFF

[Diagram: each text unit in the Original Source Document (tu id="1" to tu id="6") corresponds to a trans-unit with the same id in the XLIFF File (trans-unit id="1" to id="6") and to the text unit with the same id in the Translated Target Document (tu id="1" to tu id="6").]
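The one-to-one mapping from text units to XLIFF trans-units can be sketched as below. The output follows the general shape of an XLIFF 1.2 file (`file`, `body`, `trans-unit`, `source`); the function name, the file name and the language codes are placeholders, and a production extractor would also have to carry inline markup, which this sketch ignores.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"


def to_xliff(units, source_lang, target_lang, original):
    """Build an XLIFF 1.2 string with one trans-unit per (id, text) pair."""
    ET.register_namespace("", XLIFF_NS)
    xliff = ET.Element("{%s}xliff" % XLIFF_NS, {"version": "1.2"})
    file_el = ET.SubElement(
        xliff,
        "{%s}file" % XLIFF_NS,
        {
            "original": original,
            "source-language": source_lang,
            "target-language": target_lang,
            "datatype": "xml",
        },
    )
    body = ET.SubElement(file_el, "{%s}body" % XLIFF_NS)
    for uid, text in units:
        # The trans-unit id repeats the text unit id so the translation
        # can later be merged back into the right place.
        unit = ET.SubElement(body, "{%s}trans-unit" % XLIFF_NS, {"id": str(uid)})
        ET.SubElement(unit, "{%s}source" % XLIFF_NS).text = text
    return ET.tostring(xliff, encoding="unicode")


xliff_text = to_xliff(
    [(1, "Namespace is very flexible."), (2, "It is very easy to use.")],
    source_lang="en",
    target_lang="pl",
    original="example.dita",
)
print(xliff_text)
```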

1– 21

xml:tm exact matching

[Diagram: exact matching. Updated Source Document: tu id="1", "2", "3", "4", "7", "6", "8", with text units labelled deleted, modified and new. Matched Target Document: tu id="1", "3", "4", "7", "6", "8". Four units are labelled "Exact match" and two are labelled "requires translation".]
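Exact matching relies on the text unit identifiers: if an id survives into the updated source document, its text has not changed, so the existing translation can be reused without review. A minimal sketch under that assumption follows; the ids, the sample translations and the function name are illustrative.

```python
def exact_match(updated_source, previous_target):
    """Carry translations over for text units whose id is unchanged.

    updated_source: list of (id, source text) in document order.
    previous_target: dict mapping id to the existing translation.
    """
    matched = []
    for uid, _text in updated_source:
        if uid in previous_target:
            matched.append((uid, previous_target[uid], "exact match"))
        else:
            matched.append((uid, None, "requires translation"))
    return matched


previous_target = {"u1": "Zdanie pierwsze.", "u2": "Zdanie drugie.", "u3": "Zdanie trzecie."}
updated_source = [
    ("u1", "First sentence."),
    ("u2", "Second sentence."),
    ("u4", "Third sentence, rewritten."),
    ("u5", "A new sentence."),
]
for row in exact_match(updated_source, previous_target):
    print(row)
```

Only `u4` and `u5` would be passed on to a translator in this example.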

1– 22

xml:tm matching

[Diagram: matching between the Updated Source Document and the Matched Target Document. Labels shown: exact match (four units); non translatable, requires no translation (tu id="2"); fuzzy match origid="5"; doc leveraged match; DB leveraged match (tu id="9"); requires proofing; requires translation; modified; new.]
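The match categories named on this slide can be illustrated with a small classifier. It is a sketch under several assumptions that are not in the slides: similarity is measured with `difflib`, the fuzzy threshold is 0.75, "non translatable" is taken to mean text without any letters, and the order in which the checks run is a guess.

```python
from difflib import SequenceMatcher


def classify(uid, text, prev_doc, tm_db, threshold=0.75):
    """Return (match type, suggested translation or None) for one text unit.

    prev_doc: dict of id -> (source, target) from the previous version.
    tm_db: dict of source text -> target text from a translation memory.
    """
    if not any(ch.isalpha() for ch in text):
        return ("non translatable", None)
    if uid in prev_doc and prev_doc[uid][0] == text:
        return ("exact match", prev_doc[uid][1])
    for source, target in prev_doc.values():
        if source == text:  # same text elsewhere in the document
            return ("doc leveraged match", target)
    if text in tm_db:
        return ("DB leveraged match", tm_db[text])
    best_id, best_score = None, 0.0
    for old_id, (source, _target) in prev_doc.items():
        score = SequenceMatcher(None, source, text).ratio()
        if score > best_score:
            best_id, best_score = old_id, score
    if best_id is not None and best_score >= threshold:
        return ("fuzzy match origid=%s" % best_id, prev_doc[best_id][1])
    return ("requires translation", None)


prev_doc = {
    "4": ("Close the lid.", "Zamknij pokrywę."),
    "5": ("Press the red button.", "Naciśnij czerwony przycisk."),
}
tm_db = {"Switch off the power.": "Wyłącz zasilanie."}

print(classify("4", "Close the lid.", prev_doc, tm_db))
print(classify("7", "Press the green button.", prev_doc, tm_db))
print(classify("9", "Switch off the power.", prev_doc, tm_db))
print(classify("8", "Warranty lasts two years.", prev_doc, tm_db))
print(classify("2", "12345", prev_doc, tm_db))
```

Suggestions from fuzzy and leveraged matches are only candidates; the slide marks such units as requiring proofing or translation rather than accepting them automatically.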

1– 23

xml:tm translated document

[Diagram: the translated document shown in two views, mirroring the source document. Translated document view: doc, title, section, para, tekst (Polish for "text"). Translated document tm namespace view: tm, te, tu, with each para split into sentences (zdanie, Polish for "sentence").]

1– 24

Translation without OAXAL:

[Diagram: source text -> extract -> extracted text -> tm process -> prepared text -> translate -> translated text -> merge -> target text -> QA]

1– 25

OAXAL in action

[Diagram: Automated Workflow. xml:tm source text -> extract -> extracted text -> tm process (exact matching, leveraged matching) -> XLIFF file -> translate and QA in a web browser over the Internet -> merge -> xml:tm target text]

1– 27

Normal DITA document

1– 28

DITA Document with xml:tm namespace

1– 29

xml:tm version encoded

DITA Document with xml:tm namespace embedded as a Base64 encoded Processing Instruction
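Embedding the xml:tm data as a Base64 encoded processing instruction could work roughly as sketched here. The slide does not show the actual instruction, so the target name `tm-data` and both function names are hypothetical; only the Base64 round trip itself is standard.

```python
import base64
import re

PI_TARGET = "tm-data"  # hypothetical target name; the real one is not shown on the slide


def encode_pi(tm_xml):
    """Wrap xml:tm markup in a processing instruction as Base64 text."""
    payload = base64.b64encode(tm_xml.encode("utf-8")).decode("ascii")
    return "<?%s %s?>" % (PI_TARGET, payload)


def decode_pi(document_text):
    """Recover the xml:tm markup from the first matching PI, or None."""
    pattern = r"<\?%s ([A-Za-z0-9+/=]+)\?>" % re.escape(PI_TARGET)
    match = re.search(pattern, document_text)
    if match is None:
        return None
    return base64.b64decode(match.group(1)).decode("utf-8")


tm_markup = '<tm:tu id="u1.1">Namespace is very flexible.</tm:tu>'
topic = "<topic id='t1'>%s<title>Example</title></topic>" % encode_pi(tm_markup)
print(topic)
print(decode_pi(topic))
```

Because the Base64 alphabet cannot contain `?>`, the payload never terminates the processing instruction early, and tools that do not understand it can skip it.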

1– 30

XLIFF File version after matching

1– 31

Contact Details

Postal address:

PO Box 2167

Gerrards Cross

Bucks SL9 8XF

United Kingdom

Phone: +44 1753 480 467

Fax: +44 1753 480 465

Andrzej Zydroń: azydron@xml-intl.com
