Understand Localization Standards and Use Them Effectively

John Watkins, President, ENLASO
jwatkins@enlaso.com

Agenda

• The Standards Universe
• Core Standards
• Using Standards

The Universe: Standards Evolution

Standards Definitions

• De facto Standards
  – Gain influence through prevalence
  – Standards may evolve from de facto standards through the cooperation of the industry and a relevant standards body
• Standards
  – Remove barriers to performing functions within an industry
  – Are approved and maintained by neutral third parties
  – Have input from industry to avoid being locked into a proprietary solution
• Open Standards
  – Do the above, but are publicly available (with open access rights)
  – Natural coordination with open source software
  – Luckily the core localization standards are Open Standards

Standards Benefits

• Standards help tools to work together
  – Eases exchange of information among tools
  – Freedom to work with a wide variety of tools
  – Processes are developed independent of the tools
    • Customer files
    • The right linguists
• Consequently
  – Tools are not constrained
  – Workflow is easier
  – Projects can be faster, better, and cheaper

Standards Information

• Standards management spans various organizations
  – OSCAR/LISA -> Disbanded
    • Standards developed by OSCAR under LISA are now under the Creative Commons Attribution license (see GALA below)
    • European Telecommunications Standards Institute (ETSI) Localization Industry Standards (LIS) Industry Specification Group as the successor for the LISA/OSCAR portfolio (TMX, TBX, SRX…): http://goo.gl/y4JgF
  – GALA Open Standards Initiative
    • OSCAR standards: http://www.gala-global.org/standards/
    • Coordination: ETSI, OASIS, Unicode Consortium, ISO TC 37
    • Linport project (open format translation packages): http://www.linport.org/
    • QT Launchpad – flexible quality metrics for human and machine translation
    • Tools Corner
  – OASIS (XLIFF, DITA…): http://www.oasis-open.org/standards
  – W3C (ITS, MultilingualWeb-LT – ITS 2.0): http://www.w3.org/

Core Standards

• Four Standards (three areas) that are open, stable, and work well:
  – Translation memories
    • TMX: Translation Memory eXchange¹
      Easy exchange of translation memory among tools
  – Segmentation
    • SRX: Segmentation Rules eXchange¹
      Provides a standard method to describe segmentation rules for TMs that are being exchanged among tools
  – Extracted data
    • ITS: Internationalization Tag Set²
      Used to support the internationalization and localization of XML schemas and documents (XML, HTML5)
    • XLIFF: XML Localisation Interchange File Format³
      Stores localizable content and carries it from one step of the localization process to the next, while allowing interoperability among tools

¹ See GALA Open Standards: http://www.gala-global.org/lisa-oscar-standards
² See W3C: http://www.w3.org/TR/its/
³ See OASIS: http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=xliff

Using Standards

• Look at an example project
• See the standards involved
• Use standards to provide localized files

Using Standards – Open Source

• Open Standards fit with Open Source
• We work with the Okapi Framework Project¹
• You can use the Okapi Framework to:
  1. Manipulate and combine translation memories
  2. Extract text with appropriate filters
  2. Edit segmentation rules and apply them to content
  2. Leverage from TM
  2. Machine translate unmatched text
  2. Create the translation package for the linguists
  3. Rebuild translated files

¹ See Okapi Framework project site at: http://code.google.com/p/okapi

Example Project

[Diagram of the project inputs. Labels: Wordfast TM, TMX 1, TMX 2, Trados TM, SRX Rules, HTML File, MIF File]

• Translation Memory from Trados
• Translation Memory from Wordfast
• Segmentation rules for the TMs
• New version of the documents to translate (from HTML5 and FrameMaker applications)

FrameMaker MIF File

HTML5 File

Three Tasks

1. Consolidate TMs
2. Prepare the translation package for linguists
3. Post-process the files for delivery

1) Translation Memories – TMX

• TMX (Translation Memory eXchange) is the standard way to store source text segments and their corresponding translations
• Supported by most CAT tools
• Customer provided two TMs:
  – Trados
  – Wordfast
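
To make the format concrete, here is a minimal Python sketch that reads translation units from a small TMX 1.4 document using only the standard library. The sample content, the header values, and the function name `read_tmx` are illustrative assumptions; real exports from Trados or Wordfast carry more metadata and inline markup.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical sample; header values are placeholders.
TMX_SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<tmx version="1.4">
  <header creationtool="example" creationtoolversion="1.0"
          segtype="sentence" o-tmf="example" adminlang="en-US"
          srclang="en-US" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="en-US"><seg>Save the file.</seg></tuv>
      <tuv xml:lang="fr-FR"><seg>Enregistrez le fichier.</seg></tuv>
    </tu>
  </body>
</tmx>"""


def read_tmx(tmx_text):
    """Return one {language: segment text} dict per translation unit."""
    root = ET.fromstring(tmx_text.encode("utf-8"))
    units = []
    for tu in root.iter("tu"):
        unit = {}
        for tuv in tu.findall("tuv"):
            seg = tuv.find("seg")
            if seg is None:
                continue  # skip malformed variants without a segment
            # itertext() also collects text nested inside inline elements.
            unit[tuv.get(XML_LANG)] = "".join(seg.itertext())
        units.append(unit)
    return units


units = read_tmx(TMX_SAMPLE)
print(units)
```

Each `tu` (translation unit) holds one `tuv` (variant) per language, which is what lets different CAT tools exchange the same memory.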

1) Combine TMs

[Diagram labels: Wordfast TM, TMX 1, TMX 2, Trados TM, Rainbow Toolbox, Pensieve TM]

Four different tools sharing data through TMX
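
The idea of consolidating memories through TMX can be sketched in a few lines of Python. This is not how Rainbow or Pensieve do it internally; it is a simplified illustration that assumes each memory has already been read into a list of `{language: text}` dictionaries, and the names `combine_memories` and `to_tmx` are hypothetical.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def combine_memories(memories, src_lang, tgt_lang):
    """Merge lists of {language: text} units, dropping exact duplicates."""
    seen = set()
    combined = []
    for memory in memories:
        for unit in memory:
            source, target = unit.get(src_lang), unit.get(tgt_lang)
            if source is None or target is None:
                continue  # skip units missing either language
            if (source, target) not in seen:
                seen.add((source, target))
                combined.append({src_lang: source, tgt_lang: target})
    return combined


def to_tmx(units, src_lang, tgt_lang):
    """Serialize the combined units as a minimal TMX 1.4 document."""
    tmx = ET.Element("tmx", {"version": "1.4"})
    ET.SubElement(tmx, "header", {
        "creationtool": "example-merge",  # hypothetical tool name
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "unknown",
        "adminlang": "en-US",
        "srclang": src_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for unit in units:
        tu = ET.SubElement(body, "tu")
        for lang in (src_lang, tgt_lang):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = unit[lang]
    return ET.tostring(tmx, encoding="unicode")


# Hypothetical contents of the two customer memories after TMX export.
from_trados = [{"en-US": "Save the file.", "fr-FR": "Enregistrez le fichier."}]
from_wordfast = [
    {"en-US": "Save the file.", "fr-FR": "Enregistrez le fichier."},
    {"en-US": "Close the window.", "fr-FR": "Fermez la fenêtre."},
]
combined = combine_memories([from_trados, from_wordfast], "en-US", "fr-FR")
tmx_text = to_tmx(combined, "en-US", "fr-FR")
print(tmx_text)
```

Deduplicating on the exact source/target pair is the simplest policy; a production merge would also have to decide what to do with conflicting translations of the same source.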

Three Tasks

1. Consolidate TMs
2. Prepare the translation package for linguists
3. Post-process the files for delivery

2) HTML5 Extraction – ITS

• For XML and HTML5 documents, ITS (Internationalization Tag Set) describes what needs to be extracted and how to extract it

• W3C MultilingualWeb-LT WG is finishing the work on ITS 2.0

• Let's use ITS rules to identify localizable text in the HTML5 document
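
As a rough illustration of how ITS rules drive extraction, the Python sketch below applies ITS 1.0 `translateRule` elements to a small XML document. It makes several simplifying assumptions: selectors are limited to the `//name` form (ElementTree supports only a subset of XPath), only `translate="no"` rules are handled, rule overriding is ignored, and text around an inline element is returned as separate pieces, whereas a real filter would keep the sentence together with an inline code. The sample document and function names are hypothetical.

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"

# Hypothetical global rules: <code> and <note> content is not translated.
RULES = """<its:rules xmlns:its="http://www.w3.org/2005/11/its" version="1.0">
  <its:translateRule selector="//code" translate="no"/>
  <its:translateRule selector="//note" translate="no"/>
</its:rules>"""

DOC = """<doc>
  <title>Installing the tool</title>
  <p>Run <code>setup.exe</code> to begin.</p>
  <note>Internal: check with QA.</note>
</doc>"""


def non_translatable(doc_root, rules_root):
    """Collect elements selected by a translate="no" rule."""
    blocked = set()
    for rule in rules_root.findall(f"{{{ITS_NS}}}translateRule"):
        if rule.get("translate") != "no":
            continue
        # ElementTree needs a relative path: "//code" becomes ".//code".
        for element in doc_root.findall("." + rule.get("selector")):
            blocked.update(element.iter())  # descendants inherit the value
    return blocked


def translatable_text(doc_root, blocked):
    """Return the text pieces that no rule excludes, in document order."""
    texts = []
    for element in doc_root.iter():
        if element in blocked:
            continue
        if element.text and element.text.strip():
            texts.append(element.text.strip())
        for child in element:
            # A child's tail text belongs to the parent's content.
            if child.tail and child.tail.strip():
                texts.append(child.tail.strip())
    return texts


doc_root = ET.fromstring(DOC)
blocked = non_translatable(doc_root, ET.fromstring(RULES))
texts = translatable_text(doc_root, blocked)
print(texts)
```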

2) ITS Rules

2) HTML5 Extraction – ITS

[Diagram: pipeline driven by Rainbow. Labels: HTML File, ITS Rules, HTML5 Filter, MIF File, MIF Filter, Content Extraction]

ITS rules specify what needs to be translated

2) Segmentation – SRX

• Translation is done at the segment level
  – SRX (Segmentation Rules eXchange) describes where to break or not break the content into segments
  – Having the rules for source segments allows better re-usability of existing TM, increasing exact matches
  – Maintain SRX rules with an SRX Editor

2) Segmentation – SRX

Don’t break a segment after “VS.”, “V.S.”, “vs.”, or “v.s.”
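
An SRX rule pairs a "before break" and an "after break" regular expression and states whether a match means break or no break; rules are tried in order, and the first one that matches at a position decides. The Python sketch below imitates that behavior for the exception above. The rule list and the `segment` function are illustrative assumptions, and Python's `re` syntax stands in for the regular expression flavor an SRX tool would actually use.

```python
import re

# (break?, before-break regex, after-break regex), in priority order.
# Hypothetical rules; the first mirrors the "vs." exception.
RULES = [
    (False, r"\b[Vv]\.?[Ss]\.", r"\s"),  # no break after VS. V.S. vs. v.s.
    (True, r"[.?!]", r"\s+"),            # break after sentence punctuation
]


def segment(text, rules=RULES):
    """Split text at positions where the first matching rule says break."""
    compiled = [
        # \Z anchors the before-break pattern to the candidate position.
        (brk, re.compile(f"(?:{before})\\Z"), re.compile(after))
        for brk, before, after in rules
    ]
    segments, start = [], 0
    for pos in range(1, len(text)):
        for brk, before, after in compiled:
            if before.search(text, 0, pos) and after.match(text, pos):
                if brk:
                    segments.append(text[start:pos])
                    start = pos
                break  # first matching rule wins at this position
    segments.append(text[start:])
    # Trim the whitespace left at segment boundaries.
    return [s.strip() for s in segments if s.strip()]


sample = "Trados vs. Wordfast is a common question. Both export TMX."
print(segment(sample))
print(segment(sample, RULES[1:]))  # same text without the exception
```

Running both calls shows why the exception matters: without it, "Trados vs." becomes its own segment and no longer matches the segments stored in a TM that was built with the exception in place.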

2) Segmentation – SRX

[Diagram: pipeline driven by Rainbow. Labels: HTML File, ITS Rules, HTML5 Filter, MIF File, MIF Filter, Extraction, Segmentation, SRX Rules]

SRX Rules are key to sharing TMs

2) Translation Kit – XLIFF, TMX

• To flow through the translation process, the extracted content needs to be stored in a common format many tools understand
  – XLIFF (XML Localisation Interchange File Format) is a standard way to represent extracted content
  – TMX files with all the translation candidates found in the TM or from MT
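
For reference, the Python sketch below writes a minimal XLIFF 1.2 document from a list of (source, target) pairs. The file name, language codes, and the `build_xliff` function are assumptions made for the example; a kit produced by a real filter would also carry inline codes and skeleton information.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"


def build_xliff(segments, original, src_lang, tgt_lang, datatype="html"):
    """Build an XLIFF 1.2 string from (source, target-or-None) pairs."""
    ET.register_namespace("", XLIFF_NS)  # serialize without a prefix

    def tag(name):
        return f"{{{XLIFF_NS}}}{name}"

    xliff = ET.Element(tag("xliff"), {"version": "1.2"})
    file_el = ET.SubElement(xliff, tag("file"), {
        "original": original,
        "source-language": src_lang,
        "target-language": tgt_lang,
        "datatype": datatype,
    })
    body = ET.SubElement(file_el, tag("body"))
    for number, (source, target) in enumerate(segments, start=1):
        unit = ET.SubElement(body, tag("trans-unit"), {"id": str(number)})
        ET.SubElement(unit, tag("source")).text = source
        if target is not None:  # untranslated units have no <target> yet
            ET.SubElement(unit, tag("target")).text = target
    return ET.tostring(xliff, encoding="unicode")


# Hypothetical extracted segments; the second has no translation yet.
xliff_text = build_xliff(
    [("Save the file.", "Enregistrez le fichier."), ("Close the window.", None)],
    original="page.html", src_lang="en-US", tgt_lang="fr-FR",
)
print(xliff_text)
```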

2) Translation Kit – XLIFF, TMX

Open Source OmegaT TM workbench

2) Translation Kit – XLIFF, TMX

[Diagram: pipeline driven by Rainbow. Inputs: HTML File, MIF File, ITS Rules, SRX Rules. Steps: Extraction (HTML5 Filter, MIF Filter), Segmentation, Pre-translate from TM (Pensieve TM Connector), Pre-translate unmatched from MT (Microsoft MT Connector), Translation Kit Creation. Output: Translation Kit with HTML XLIFF, MIF XLIFF, TMX, etc.]

Tool independent kit
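
The two pre-translation steps can be illustrated with a short Python sketch: look each segment up in the TM first, and send only the unmatched ones to machine translation. This does not use the Pensieve or Microsoft MT connectors; the TM is a plain dictionary, the fuzzy score comes from `difflib`, the 0.85 threshold is arbitrary, and `fake_mt` is a stand-in for a real MT call.

```python
from difflib import SequenceMatcher


def leverage(segments, tm, machine_translate, threshold=0.85):
    """Return (source, candidate, origin) for every segment.

    origin is "exact", "fuzzy", or "mt".
    """
    results = []
    for source in segments:
        if source in tm:
            results.append((source, tm[source], "exact"))
            continue
        # Find the most similar TM source for a fuzzy candidate.
        best, best_score = None, 0.0
        for tm_source in tm:
            score = SequenceMatcher(None, source, tm_source).ratio()
            if score > best_score:
                best, best_score = tm_source, score
        if best is not None and best_score >= threshold:
            results.append((source, tm[best], "fuzzy"))
        else:
            results.append((source, machine_translate(source), "mt"))
    return results


def fake_mt(text):
    """Hypothetical stand-in for a machine translation service."""
    return "[MT] " + text


tm = {"Save the file.": "Enregistrez le fichier."}
segments = ["Save the file.", "Save the files.", "Open the printer settings."]
results = leverage(segments, tm, fake_mt)
for source, candidate, origin in results:
    print(origin, "|", source, "->", candidate)
```

Keeping the origin of each candidate lets the kit tell the linguist which translations came from the TM and which need a closer review.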

Three Tasks

1. Consolidate TMs
2. Prepare the translation package for linguists
3. Post-process the files for delivery

3) Post-Processing

[Diagram: pipeline driven by Rainbow. Labels: Translation Kit (HTML XLIFF, MIF XLIFF, TMX, etc.), Translator Kit Filter, Translation Kit Post-Processing, Extraction, HTML5 Filter, MIF Filter, HTML File, MIF File]
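
Conceptually, post-processing merges the translated text back into the original file structure. The Python sketch below shows the idea with a toy skeleton whose `[#id]` markers are replaced by the XLIFF targets, falling back to the source when a unit is untranslated. The marker syntax, the sample data, and the `rebuild` function are invented for this example and are not how Okapi stores its skeleton; escaping of HTML special characters is also left out.

```python
import re
import xml.etree.ElementTree as ET

XLIFF_NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}

# Hypothetical translated kit: unit 2 came back without a target.
XLIFF_TEXT = """<xliff xmlns="urn:oasis:names:tc:xliff:document:1.2" version="1.2">
  <file original="page.html" source-language="en-US"
        target-language="fr-FR" datatype="html">
    <body>
      <trans-unit id="1">
        <source>Welcome</source><target>Bienvenue</target>
      </trans-unit>
      <trans-unit id="2">
        <source>Save the file.</source>
      </trans-unit>
    </body>
  </file>
</xliff>"""

# Hypothetical skeleton: original markup with placeholders for the text.
SKELETON = "<h1>[#1]</h1><p>[#2]</p>"


def rebuild(skeleton, xliff_text):
    """Fill skeleton placeholders with targets, or sources as fallback."""
    root = ET.fromstring(xliff_text)
    texts = {}
    for unit in root.iterfind(".//x:trans-unit", XLIFF_NS):
        source = unit.find("x:source", XLIFF_NS)
        target = unit.find("x:target", XLIFF_NS)
        chosen = target if target is not None and target.text else source
        texts[unit.get("id")] = chosen.text or ""
    # Unknown placeholders are left untouched.
    return re.sub(
        r"\[#(\w+)\]",
        lambda match: texts.get(match.group(1), match.group(0)),
        skeleton,
    )


translated = rebuild(SKELETON, XLIFF_TEXT)
print(translated)
```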

3) Translated FrameMaker MIF

3) Translated HTML

Three Tasks

1. Consolidate TMs
2. Prepare the translation package for linguists
3. Post-process the files for delivery

Summary

• We Know
  – More about our standards
  – We can (and do) use them today
• Next Steps
  – Consider requiring Open Standards compliance with the tools you use to ensure portability
• Get Involved in the Standards Community
  – GALA Standards Initiative
  – GALA Connect Groups

References

• GALA Standards Initiative http://www.gala-global.org/gala-standards-initiative

• TMX 1.4b – Translation Memory eXchange http://www.gala-global.org/oscarStandards/tmx/

• ITS 1.0 – Internationalization Tag Set http://www.w3.org/TR/its/

• SRX 2.0 – Segmentation Rules eXchange http://www.gala-global.org/oscarStandards/srx/

• XLIFF 1.2 – XML Localisation Interchange File Format http://docs.oasis-open.org/xliff/v1.2/os/xliff-core.html

• Okapi Framework (Open Source & cross-platform) http://code.google.com/p/okapi/

Questions?

John Watkins, President, ENLASO jwatkins@enlaso.com