Understand Localization Standards and Use Them Effectively
John Watkins, President, ENLASO [email protected]
Agenda
• The Standards Universe
• Core Standards
• Using Standards
The Universe: Standards Evolution
Standards Definitions
• De facto Standards
  – Influence through prevalence
  – Standards may evolve from de facto standards through the cooperation of the industry and a relevant standards body
• Standards
  – Remove barriers to performing functions within an industry
  – Are approved and maintained by neutral third parties
  – Have input from industry to avoid being locked into a proprietary solution
• Open Standards
  – Do the above, but are publicly available (with open access rights)
  – Natural coordination with open source software
  – Luckily, the core localization standards are Open Standards
Standards Benefits
• Standards help tools to work together
  – Eases exchange of information among tools
  – Freedom to work with a wide variety of tools
  – Processes are developed independent of the tools
    • Customer files
    • The right linguists
• Consequently
  – Tools are not constrained
  – Workflow is easier
  – Projects can be faster, better, and cheaper
Standards Information
• Standards management spans various organizations
  – OSCAR/LISA -> Disbanded
    • Standards developed by OSCAR under LISA are now under the Creative Commons Attribution license (see GALA below)
    • European Telecommunications Standards Institute (ETSI) Localization Industry Standards (LIS) Industry Specification Group as the successor for the LISA/OSCAR portfolio (TMX, TBX, SRX…): http://goo.gl/y4JgF
  – GALA Open Standards Initiative
    • OSCAR standards: http://www.gala-global.org/standards/
    • Coordination: ETSI, OASIS, Unicode Consortium, ISO TC 37
    • Linport project (open format translation packages): http://www.linport.org/
    • QT Launchpad: flexible quality metrics for human and machine translation
    • Tools Corner
  – OASIS (XLIFF, DITA…): http://www.oasis-open.org/standards
  – W3C (ITS, MultilingualWeb-LT – ITS 2.0): http://www.w3.org/
Core Standards
• Four Standards (three areas) that are open, stable, and work well:
  – Translation memories
    • TMX: Translation Memory eXchange¹
      Easy exchange of translation memories among tools
  – Segmentation
    • SRX: Segmentation Rules eXchange¹
      Provides a standard method to describe segmentation rules for TMs that are being exchanged among tools
  – Extracted data
    • ITS: Internationalization Tag Set²
      Supports the internationalization and localization of XML schemas and documents (XML, HTML5)
    • XLIFF: XML Localisation Interchange File Format³
      Stores localizable content and carries it from one step of the localization process to the next, while allowing interoperability among tools
¹ See GALA Open Standards: http://www.gala-global.org/lisa-oscar-standards
² See W3C: http://www.w3.org/TR/its/
³ See OASIS: http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=xliff
Using Standards
• Look at an example project
• See the standards involved
• Use standards to provide localized files
Using Standards – Open Source
• Open Standards fit with Open Source
• We work with the Okapi Framework Project¹
• You can use the Okapi Framework to:
  1. Manipulate and combine translation memories
  2. Extract text with appropriate filters
  2. Edit segmentation rules and apply them to content
  2. Leverage from TM
  2. Machine translate unmatched text
  2. Create the translation package for the linguists
  3. Rebuild translated files
¹ See Okapi Framework project site at: http://code.google.com/p/okapi
Example Project
• Translation Memory from Trados
• Translation Memory from Wordfast
• Segmentation rules for the TMs
• New version of the documents to translate (from HTML5 and FrameMaker applications)
  – FrameMaker MIF File
  – HTML5 File
[Figure labels: Wordfast TM, Trados TM, TMX 1, TMX 2, SRX Rules, HTML File, MIF File]
Three Tasks
1. Consolidate TMs
2. Prepare the translation package for linguists
3. Post-process the files for delivery
1) Translation Memories – TMX
• TMX (Translation Memory eXchange) is the standard way to store source text segments and their corresponding translations
• Supported by most CAT tools
• Customer provided two TMs:
  – Trados
  – Wordfast
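To make the TMX structure concrete, here is a minimal sketch in Python of reading two TMX files and consolidating them. It is not how Rainbow or the Okapi Framework does it. The sample segments, the tool names in the headers, and the "first TM wins" policy for duplicate source segments are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under this qualified name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Two tiny hand-written TMX 1.4 documents standing in for the two exports.
# Tool names and segments are hypothetical.
TMX_1 = """<tmx version="1.4">
  <header creationtool="hypothetical-tool-a" creationtoolversion="1.0"
          segtype="sentence" o-tmf="unknown" adminlang="en"
          srclang="en" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Open the file.</seg></tuv>
      <tuv xml:lang="fr"><seg>Ouvrez le fichier.</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en"><seg>Save the file.</seg></tuv>
      <tuv xml:lang="fr"><seg>Enregistrez le fichier.</seg></tuv>
    </tu>
  </body>
</tmx>"""

TMX_2 = """<tmx version="1.4">
  <header creationtool="hypothetical-tool-b" creationtoolversion="1.0"
          segtype="sentence" o-tmf="unknown" adminlang="en"
          srclang="en" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Open the file.</seg></tuv>
      <tuv xml:lang="fr"><seg>Ouvrir le fichier.</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en"><seg>Close the file.</seg></tuv>
      <tuv xml:lang="fr"><seg>Fermez le fichier.</seg></tuv>
    </tu>
  </body>
</tmx>"""


def read_units(tmx_text, src_lang, tgt_lang):
    """Return (source, target) pairs for one language pair."""
    root = ET.fromstring(tmx_text)
    pairs = []
    for tu in root.iter("tu"):
        segs = {}
        for tuv in tu.findall("tuv"):
            seg = tuv.find("seg")
            if seg is not None:
                # itertext() flattens any inline markup inside <seg>.
                segs[tuv.get(XML_LANG)] = "".join(seg.itertext())
        if src_lang in segs and tgt_lang in segs:
            pairs.append((segs[src_lang], segs[tgt_lang]))
    return pairs


def merge_units(*unit_lists):
    """Combine TMs; on duplicate sources the first TM wins (an assumed policy)."""
    merged = {}
    for units in unit_lists:
        for source, target in units:
            merged.setdefault(source, target)
    return merged


combined = merge_units(read_units(TMX_1, "en", "fr"),
                       read_units(TMX_2, "en", "fr"))
for source, target in combined.items():
    print(source, "->", target)
```

A real consolidation would also have to decide what to do with conflicting translations, inline codes, and metadata such as change dates. Here the conflict is simply resolved in favor of the first file.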
1) Combine TMs
Four different tools sharing data through TMX
[Figure labels: Wordfast TM, Trados TM, TMX 1, TMX 2, Rainbow Toolbox, Pensieve TM]
2) HTML5 Extraction – ITS
• For XML and HTML5 documents, ITS (Internationalization Tag Set) describes what needs to be extracted and how to extract it
• W3C MultilingualWeb-LT WG is finishing the work on ITS 2.0
• Let's use ITS rules to identify localizable text in the HTML5 document
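As a rough illustration of how ITS rules mark content as translatable or not, the sketch below applies two global `its:translateRule` elements to a small, well-formed HTML-like document. The rules, the sample document, and the helper names are assumptions. The selector handling is deliberately simplified (only `//`-style selectors that Python's ElementTree can evaluate), so this is not a conforming ITS processor or the Okapi HTML5 filter.

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"

# Hypothetical global rules: do not translate <code> or legal notices.
RULES = """<its:rules xmlns:its="http://www.w3.org/2005/11/its" version="2.0">
  <its:translateRule selector="//code" translate="no"/>
  <its:translateRule selector="//p[@class='legal']" translate="no"/>
</its:rules>"""

# Hypothetical document; the last paragraph uses the HTML5 translate attribute.
DOC = """<html><body>
  <h1>Install guide</h1>
  <p>Run <code>setup.exe</code> to begin.</p>
  <p class="legal">ACME is a registered trademark.</p>
  <p translate="no">build-1234</p>
</body></html>"""


def load_rules(rules_text):
    """Return (selector, is_translatable) pairs in document order."""
    root = ET.fromstring(rules_text)
    return [(rule.get("selector"), rule.get("translate") == "yes")
            for rule in root.findall(f"{{{ITS_NS}}}translateRule")]


def translatable_text(doc_text, rules):
    """Collect text runs whose enclosing element is translatable."""
    root = ET.fromstring(doc_text)
    decided = {}
    for selector, flag in rules:
        # Simplification: "//code" becomes ".//code" for ElementTree.
        for element in root.findall("." + selector):
            decided[element] = flag  # later rules override earlier ones
    found = []

    def walk(element, inherited):
        flag = decided.get(element, inherited)
        local = element.get("translate")
        if local in ("yes", "no"):  # local markup overrides global rules
            flag = local == "yes"
        if flag and element.text and element.text.strip():
            found.append(element.text.strip())
        for child in element:
            walk(child, flag)
            # Text after a child belongs to the current element.
            if flag and child.tail and child.tail.strip():
                found.append(child.tail.strip())

    walk(root, True)  # the ITS default for elements is translate="yes"
    return found


print(translatable_text(DOC, load_rules(RULES)))
```

The sketch returns separate text runs around the `<code>` element. A production filter would normally keep the sentence together and represent the inline element as a placeholder code.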
2) ITS Rules
ITS rules specify what needs to be translated
2) HTML5 Extraction – ITS
Pipeline Driven by Rainbow
[Figure labels: HTML File, MIF File, ITS Rules, HTML5 Filter, MIF Filter, Content Extraction]
2) Segmentation – SRX
• Translation is done at the segment level
  – SRX (Segmentation Rules eXchange) describes where to break or not break the content into segments
  – Having the rules for source segments allows better re-usability of existing TM, increasing exact matches
  – Maintain SRX rules with an SRX Editor
2) Segmentation – SRX
Example rule: don't break the segment after "VS.", "V.S.", "vs.", or "v.s."
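The no-break rule above can be sketched with ordinary regular expressions. SRX describes each rule as a break or no-break decision plus a pattern that must match before the position and one that must match after it, with the first matching rule deciding. The Python below mimics that model. The two rules and the sample sentence are assumptions for illustration, and this is not an SRX parser.

```python
import re

# Each rule: (is_break, pattern before the position, pattern after it).
# Rules are tried in order and the first match decides, as in SRX.
RULES = [
    # No-break exception: "VS.", "V.S.", "vs." or "v.s." followed by a space.
    (False, r"\b[Vv]\.?[Ss]\.", r"\s"),
    # Break rule: sentence-ending punctuation followed by whitespace.
    (True, r"[.?!]", r"\s+"),
]


def segment(text, rules):
    """Split text into segments using ordered break/no-break rules."""
    segments = []
    start = 0
    for pos in range(1, len(text)):
        for is_break, before, after in rules:
            before_ok = re.search(f"(?:{before})\\Z", text[:pos])
            after_ok = re.match(after, text[pos:])
            if before_ok and after_ok:
                if is_break:
                    segments.append(text[start:pos].strip())
                    start = pos
                break  # first matching rule decides this position
    tail = text[start:].strip()
    if tail:
        segments.append(tail)
    return segments


sample = "Compare TMX vs. XLIFF first. Then read the SRX rules."
for seg in segment(sample, RULES):
    print(seg)
```

Without the exception rule, the text would also break after "vs.", and the resulting segments would no longer line up with segments stored in a TM that was built with the exception in place.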
Pipeline Driven by Rainbow
SRX Rules are key to sharing TMs
[Figure labels: HTML File, MIF File, ITS Rules, SRX Rules, HTML5 Filter, MIF Filter, Extraction, Segmentation]
2) Translation Kit – XLIFF, TMX
• To flow through the translation process, the extracted content needs to be stored in a common format many tools understand
  – XLIFF (XML Localisation Interchange File Format) is a standard way to represent extracted content
  – TMX files with all the translation candidates found in the TM or from MT
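A minimal sketch of what such a kit file could contain: the Python below writes extracted segments into an XLIFF 1.2 document and pre-fills targets for exact TM matches. The file name, segments, and TM entries are hypothetical, and the function is an illustration under those assumptions, not the format Rainbow actually emits.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"


def build_xliff(original, src_lang, tgt_lang, segments, tm):
    """Return an XLIFF 1.2 string; exact TM matches become <target> elements."""
    ET.register_namespace("", XLIFF_NS)  # serialize with a default namespace

    def tag(name):
        return f"{{{XLIFF_NS}}}{name}"

    root = ET.Element(tag("xliff"), {"version": "1.2"})
    file_el = ET.SubElement(root, tag("file"), {
        "original": original,
        "source-language": src_lang,
        "target-language": tgt_lang,
        "datatype": "html",
    })
    body = ET.SubElement(file_el, tag("body"))
    for number, text in enumerate(segments, start=1):
        unit = ET.SubElement(body, tag("trans-unit"), {"id": str(number)})
        ET.SubElement(unit, tag("source")).text = text
        if text in tm:
            # Exact match only; fuzzy matching is out of scope for this sketch.
            target = ET.SubElement(unit, tag("target"), {"state": "translated"})
            target.text = tm[text]
    return ET.tostring(root, encoding="unicode")


# Hypothetical extracted segments and TM content.
kit = build_xliff(
    "index.html", "en", "fr",
    ["Open the file.", "Save the file."],
    {"Open the file.": "Ouvrez le fichier."},
)
print(kit)
```

Segments without a TM match are left with only a source element, which is where a machine translation step or the linguist would supply the target.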
Translation Kit Creation: Pipeline (Driven by Rainbow)
[Figure labels: HTML File, MIF File, ITS Rules, SRX Rules, HTML5 Filter, MIF Filter, Extraction, Segmentation, Pre-translate from TM (Pensieve TM Connector), Pre-translate unmatched from MT (Microsoft MT Connector), Translation Kit (HTML XLIFF, MIF XLIFF, TMX, etc.)]
Tool independent kit
Open Source OmegaT TM workbench
3) Post-Processing
Translation Kit Post-Processing: Pipeline (Driven by Rainbow)
[Figure labels: Translation Kit (HTML XLIFF, MIF XLIFF, TMX, etc.), Translator Kit Filter, Extraction, HTML5 Filter, MIF Filter, HTML File, MIF File]
3) Translated FrameMaker MIF
3) Translated HTML
Summary
• We Know
  – More about our standards
  – We can (and do) use them today
• Next Steps
  – Consider requiring Open Standards compliance from the tools you use to ensure portability
• Get Involved in the Standards Community
  – GALA Standards Initiative
  – GALA Connect Groups
References
• GALA Standards Initiative http://www.gala-global.org/gala-standards-initiative
• TMX 1.4b – Translation Memory eXchange http://www.gala-global.org/oscarStandards/tmx/
• ITS 1.0 – Internationalization Tag Set http://www.w3.org/TR/its/
• SRX 2.0 – Segmentation Rules eXchange http://www.gala-global.org/oscarStandards/srx/
• XLIFF 1.2 – XML Localisation Interchange File Format http://docs.oasis-open.org/xliff/v1.2/os/xliff-core.html
• Okapi Framework (Open Source & cross-platform) http://code.google.com/p/okapi/
Questions?