The Case for Connectivity
David Filip, ADAPT Centre, Trinity College Dublin
Klaus Fleischmann, Kaleidoscope, GALA Board Member
Serge Gladkoff, GALA Ambassador, Logrus Global
TAPICC Standards Initiative
Introduction Serge Gladkoff
GALA Ambassador
Logrus Global
Klaus Fleischmann
GALA Board of Directors
Kaleidoscope GmbH
David Filip Trinity College Dublin – ADAPT
Dr. David Filip is the Convener/Chair of the OASIS XLIFF OMOS TC. He also serves as the Liaison Officer, Secretary and Editor of OASIS XLIFF TC, XLIFF TC Liaison at Unicode Localization Interoperability (ULI) TC, Advisory Editorial Board Member for the MultiLingual Magazine, Programme Committee Member for the ASLING Translating and the Computer Conferences, Co-moderator Standards & Interoperability IG at JIAMCATT, NSAI expert at ISO TC 37 SC3 and SC5, ISO/IEC JTC1 WG9, WG10, SC38, TBX Steering Committee member, TAPICC Steering Committee member.
Agenda
General Intro, Project
Status
(Short) Q&A
Track 1
Workshop
Tracks 2-4 Overview
Q&A
TAPICC Goal
Enable Innovation and Growth Through
• Common base standards
• Interoperability
• Automation
• Collaboration
• Seamless and valid data exchange
Interoperability
Disclaimer: I know there are fewer than 380
lines on the screen, but you get the idea.
But…
€
Things change… …and become very costly to maintain
• For clients
• For LSPs
• For tools vendors
myTool myWorkflow
The Vision: Standard APIs
1000+ IT firms
1500+ CMS systems
Breakdown of Steps
Find, categorize and prioritize use cases
Find out what solutions already exist
Make this information retrievable
Harmonize business (meta)data models
Create implementable classes
Useful deliverables to GALA and the industry
Properties
Community-Driven
Open-Source Legal Base
Interactive and collaborative
Administered by GALA
Pre-Standardization
Level
Grounded in XLIFF and UBL
Setup
Deliverables
• Categorized resource catalog
• Data model mapping
• API Classes
Mode
• Open Source
• Steering Committee
• Subcommittees
• Community participation
Scope
Track 1
• Supply Chain Automation
• Business metadata
model
• Payload standardization
Track 2
• Transfer of localizable content on
segment / unit level
• Between localization or
other tools
Track 3
• Markup / Enrichment of
localizable content
• TM Matches, MT Output,
Terminology, „Good enough“ layout, QA data
etc.
Track 4
• Enable a high-fidelity
rendering of layout
information
• To allow in-layout
translation
Benefits
Learning materials
Common data model to reuse
Consultations about XLIFF and CLDR, API and software
Sample code
Implementable classes
Savings on R&D
Automation and Interoperability
For GALA members, industry and project participants
Questions so far?
TRACK ONE WORKSHOP
Look at Status of Tracks 1-3
Discuss required business metadata for a „New Basics“
common model
Define concrete subtasks
Find a "buy-side" steering
committee member
What happened so far…
September
Project Statement
Initial Call
Steering Committee
Infrastructure (Connect,
github)
Kickoffs in Montreal and
Stuttgart
Definition of „tracks“
OpenSource Policy and
Project Charter
Compiled legacy data:
COTI, XLIFF, TIPP,
Linport, STS
https://www.gala-global.org/publications/translation-api-class-and-cases-project-statement-tapicc
https://www.gala-global.org/tapicc-legal-agreement
Special thanks to James
Bryce Clark, OASIS
The Plan for Track 1
Amsterdam: Brainstorm Metadata
Then: harmonize and summarize
Create a draft data model / parameter set with
canonical names and datatypes
Create prominent
serialization (JSON, XML…)
Finalize data model as first
deliverable (e.g. in
MuleESB)
Defining “the least common denominator” of a universal data model is key.
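To illustrate the "canonical names and datatypes" step and the serialization step, here is a minimal sketch in Python. It is not a TAPICC deliverable: the class name `JobMetadata`, its field names, and the XML element names are all hypothetical, chosen only to show one model feeding two serializations.

```python
import json
import xml.etree.ElementTree as ET
from dataclasses import dataclass, asdict
from typing import List


@dataclass
class JobMetadata:
    """Hypothetical parameter set; names and types are illustrative only."""
    title: str
    creator: str
    source_language: str         # BCP 47 language tag, e.g. "en-US"
    target_languages: List[str]  # BCP 47 language tags
    deadline: str                # ISO 8601 date-time string


def to_json(meta: JobMetadata) -> str:
    """Serialize the model to JSON, using the canonical field names as keys."""
    return json.dumps(asdict(meta), sort_keys=True)


def to_xml(meta: JobMetadata) -> str:
    """Serialize the same model to XML, one element per canonical name."""
    root = ET.Element("jobMetadata")
    for name, value in asdict(meta).items():
        element = ET.SubElement(root, name)
        if isinstance(value, list):
            # List-valued fields become repeated <item> children.
            for item in value:
                ET.SubElement(element, "item").text = item
        else:
            element.text = value
    return ET.tostring(root, encoding="unicode")


meta = JobMetadata(
    title="User manual",
    creator="Example Client",
    source_language="en-US",
    target_languages=["de-DE", "fr-FR"],
    deadline="2017-06-30T17:00:00Z",
)
print(to_json(meta))
print(to_xml(meta))
```

Keeping the model separate from its serializations means a JSON and an XML rendering can be generated from the same canonical names without being maintained by hand.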
The overall landscape: What exists already and can be leveraged? We don't want to reinvent the wheel.
• XLIFF & OMOS
• COTI, CMIS
• TIPP, LINPORT, STS
• CLDR
• Many existing APIs
• Enthusiastic participation
XLIFF Bitext Payload Standard - https://www.oasis-open.org/committees/xliff
CLDR http://cldr.unicode.org/
• The Unicode CLDR provides key building blocks for software to support the world's languages, with the largest and most extensive standard repository of locale data available
We will not reinvent the wheel
CMIS Content Management Interoperability Services
DQF / MQM issue typology
TIPP Based on XLIFF for Payload, plus Metadata and Workflow https://code.google.com/archive/p/interoperability-now/
• Envelope concept
• Manifest XML contains all metadata – manifest.xml
• Package Object Container (Payload XLIFF) – resources.zip
• Requests and Responses
• Specifies tasks to be completed and responses to expect
• Strict XLIFF or generic
• Business Metadata (STS)
• Source content language, audience, complexity…
• Target content language, register, layout…
• Production tasks prepare, translate…
• Environment technology, references…
• Relationships permissions, submissions, expectations
Only one task per language?
Only one target language?
Implementations?
No API Package definition only
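The envelope concept described above (a manifest.xml next to a resources.zip that holds the payload) can be sketched roughly as follows. This is only an illustration of the packaging idea: the manifest element names (`manifest`, `task`, `sourceLanguage`, `targetLanguage`) and the function `build_package` are assumptions and do not reproduce the actual TIPP manifest schema.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from typing import Dict


def build_package(task_type: str, source_lang: str, target_lang: str,
                  payload_files: Dict[str, str]) -> bytes:
    """Return the bytes of an envelope: manifest.xml plus resources.zip."""
    # Inner container: the payload files (e.g. XLIFF documents).
    inner = io.BytesIO()
    with zipfile.ZipFile(inner, "w") as resources:
        for name, content in payload_files.items():
            resources.writestr(name, content)

    # Hypothetical manifest carrying the package metadata.
    manifest = ET.Element("manifest")
    ET.SubElement(manifest, "task", type=task_type)
    ET.SubElement(manifest, "sourceLanguage").text = source_lang
    ET.SubElement(manifest, "targetLanguage").text = target_lang

    # Outer envelope: manifest and payload container side by side.
    outer = io.BytesIO()
    with zipfile.ZipFile(outer, "w") as envelope:
        envelope.writestr("manifest.xml",
                          ET.tostring(manifest, encoding="unicode"))
        envelope.writestr("resources.zip", inner.getvalue())
    return outer.getvalue()


package = build_package("translate", "en-US", "de-DE",
                        {"chapter1.xlf": "<xliff/>"})
with zipfile.ZipFile(io.BytesIO(package)) as envelope:
    print(envelope.namelist())
```

Note that the single `targetLanguage` element in this sketch mirrors the open question on the slide about whether a package can carry only one target language.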
LINPORT Based on XLIFF for Payload, plus Metadata and Workflow - http://www.linport.org/
• Bilingual Task-level package
• STS as Business Metadata
• General Title, creator, date, identifier, contributor, rights
• Source Language, ID, type, audience, purpose, subject, term, volume, complexity, status
• Target Language(s), audience, purpose, content correspondence, term, format, guide, register...
• Workflow Layout, preparation, initial, quality check, technology, reference, workplace, copyright…
• Business Qualification, delivery, deadline, compensation, communication
Not a very flexible data model. Same info for all
files / languages. Implementations?
No API Package definition only
COTI German CMS manufacturers: DERCOM - http://www.dercom.de/projekte
• Most complete and adopted project
• 3 Levels
• Level 1: Simple wrapper file for exchange (control file and payload)
• Level 2: Automated transfer of level 1
• Level 3: Concrete API workflow (SOAP-based)
• API describes
• Workflow: Create, start, finish, but also reject, cancel, update
• Status updates: get/change metadata, report, download document
• Secure data transfer
• Issues
• Generic payload, not XLIFF or CMIS based
• Little „localization“ metadata, but expandable
Extensive API SOAP?
No standard payload, little L10N metadata
Implemented and in use
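The workflow actions listed for the COTI Level 3 API (create, start, finish, reject, cancel, update) suggest a small state machine. The sketch below is one possible reading, not the COTI specification: the state names and the allowed transitions are assumptions made for illustration.

```python
# Hypothetical states and transitions; only the action names come from the slide.
TRANSITIONS = {
    ("new", "create"): "created",
    ("created", "update"): "created",
    ("created", "start"): "in_progress",
    ("created", "reject"): "rejected",
    ("created", "cancel"): "cancelled",
    ("in_progress", "update"): "in_progress",
    ("in_progress", "finish"): "finished",
    ("in_progress", "cancel"): "cancelled",
}


def step(state: str, action: str) -> str:
    """Return the next state, or raise ValueError for a disallowed action."""
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(f"action {action!r} not allowed in state {state!r}")


state = "new"
for action in ["create", "update", "start", "finish"]:
    state = step(state, action)
print(state)
```

Making the allowed transitions explicit is what lets both sides of an exchange agree on which status updates are valid at any point.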
STS - Structured Translation Specification http://www.ttt.org/specs, 21 parameters in a tree, human readable.
Source content
• Textual characteristics
• Source language
• Text type
• Audience
• Purpose
• Specialized language
• Subject field
• Terminology
• Volume
• Complexity
• Origin
Target content
• Target language information
• Target language
• Target terminology
• Audience
• Purpose
• Content correspondence
• Register
• File format
• Style
• Style Guide
• Style relevance
• Layout
Production Tasks
• Typical tasks
• Preparation
• Translation
• QA
• Additional tasks
Environment
• Technology
• Reference material
• Workplace
Relationships
• Permissions
• Copyright
• Recognition
• Restrictions
• Submissions
• Qualifications
• Deliverables
• Delivery (by)
• Deadline
• Expectations
• Compensation
• Communication
Plus the crowd already out there
1000+ IT firms
1500+ CMS systems
Track 1 – Supply Chain Automation Two levels make sense. But which info goes on which level?
Package level metadata
"Payload" metadata
Package-Level Data "Least common denominator" and extensibility
Existing
• Creator, Date&time, Title, Contact info, Tool, Organization
• Source and target languages, deadline -> Or should this be on the payload level?
• Task information: Type, additional tasks
• Description, Comment
• Source and target terminology & TM data
• Copyright, digital signature
• Response creator
What else?
• MT admissibility and selection? Engine type?
• Monolingual or multilingual source/target?
• Project participants? (Translators, PMs...)
• Source system attributes (ID?)
• Workflow-relevant metadata? Quote vs. Project? Different project types?
• Custom metadata?
Payload-Level Data "Least common denominator" and extensibility
Existing
• Creator, Date&time, Title, Tool etc? Or only on package level?
• Source text information: Subject, reference, target audience, register etc.
• Target text information
What else?
• Commercial information: Volume, payment info, analysis data?
• Source system attributes (ID?)
• Reference external data (TMs, termbases, MT engine data)
• Quality requirements, QE metadata, DQF metadata ?
• Custom metadata?
• Automation information (Error handling, subtasks...)
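One way to think about the two levels is that package-level values act as defaults, which an individual payload may override, with custom metadata merged on top. The sketch below shows that idea only. The function `effective_metadata`, the `custom` key, and all field names are hypothetical, and whether a field such as the deadline belongs on the package or the payload level is exactly the open question raised above.

```python
from typing import Any, Dict


def effective_metadata(package_meta: Dict[str, Any],
                       payload_meta: Dict[str, Any]) -> Dict[str, Any]:
    """Combine both levels: payload values override package defaults."""
    merged = {k: v for k, v in package_meta.items() if k != "custom"}
    merged.update({k: v for k, v in payload_meta.items() if k != "custom"})
    # Custom (extension) metadata is merged key by key rather than replaced.
    custom = dict(package_meta.get("custom", {}))
    custom.update(payload_meta.get("custom", {}))
    merged["custom"] = custom
    return merged


package_meta = {
    "creator": "Example Client",
    "source_language": "en-US",
    "deadline": "2017-06-30",
    "custom": {"cost_center": "A1"},
}
payload_meta = {
    "title": "chapter1.xlf",
    "deadline": "2017-06-15",  # this payload is due earlier than the package
    "custom": {"priority": "high"},
}
print(effective_metadata(package_meta, payload_meta))
```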
Your turn now!
Tracks 2-4: What’s going on..
• XLIFF 2.1
• XLIFF OMOS – OM & JLIFF, TBX <-> XLIFF mapping
• TBX revision at ISO
• ULI starts a new work item on wordcount and similarity algorithm
• FREME, an NLP service framework
• UBL, a major business document exchange standard
TAPICC tracks
• 1) business metadata (see above)
• 2) real time unit level exchange between CAT tools (all sorts)
[XLIFF OMOS OM & JLIFF]
• 3) real time enrichment of bitext units with metadata
• Including matches [XLIFF OMOS - TMX successor],
• terminology [XLIFF OMOS – TBX mapping, FREME],
• entity disambiguation [FREME],
• error and QA reporting [XLIFF 2.1, FREME], you name it..
• 4) Real time previews of translated content in native
• Currently on the back burner
• Possibly XLIFF 2.2, some capability in XLIFF 2.0 already
FREME: A content enrichment framework
• https://freme-project.github.io/community/
• GitHub repos
• https://github.com/freme-project
• Apache license
FREME: Design of the framework
• Client makes a Web service request.
• The broker invokes the actual e-Service.
• The e-Services are part of the server (e.g. e-Entity), or provided externally (e.g. e-Translation).
• Supportive modules provide conversion of digital content formats or pipelining of services (e.g. e-Terminology followed by e-Translation)
• FREME = a framework, not a platform: modular approach & ease of extensibility
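The broker and pipelining design described above can be sketched in a few lines. This is an in-process toy, not FREME's API: the `Broker` class and the two stand-in services are hypothetical, and a real deployment would make HTTP requests to the services instead of calling Python functions.

```python
from typing import Callable, Dict, List

Service = Callable[[dict], dict]


class Broker:
    """Routes requests to registered services and chains them in pipelines."""

    def __init__(self) -> None:
        self._services: Dict[str, Service] = {}

    def register(self, name: str, service: Service) -> None:
        self._services[name] = service

    def invoke(self, name: str, document: dict) -> dict:
        return self._services[name](document)

    def pipeline(self, names: List[str], document: dict) -> dict:
        # The output of each service is the input of the next one.
        for name in names:
            document = self.invoke(name, document)
        return document


def annotate_terms(document: dict) -> dict:
    """Toy stand-in for a terminology service: tag words found in a glossary."""
    glossary = {"translation", "standard"}
    words = {w.strip(".,").lower() for w in document["text"].split()}
    return {**document, "terms": sorted(words & glossary)}


def request_translation(document: dict) -> dict:
    """Toy stand-in for an external translation service."""
    return {**document, "target_language": "de",
            "status": "translation-requested"}


broker = Broker()
broker.register("e-terminology", annotate_terms)
broker.register("e-translation", request_translation)
result = broker.pipeline(["e-terminology", "e-translation"],
                         {"text": "A standard for translation."})
print(result)
```

Because each service only consumes and produces an enriched document, a new service can be registered without changing the client or the other services, which is the modularity the slide refers to.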
FREME: All you need is standards
• HTTP to make web service requests
• No dependency on a given programming language
• Standards to represent enrichment information
• See next slide
• Write a wrapper for your existing tools to enable them to produce & consume the enrichment information
• Enable distributed data and language technology services
XLIFF OMOS TC – Quick Facts
• Convened 8th Dec 2015, made great progress since
• Members: BYU, ENLASO, Hoyos Labs, Genivia, Intel, LRC, Microsoft, SDL, Spartan Software, TCD – ADAPT, UNIGE, Vistatec, WIPRO, +2 Individual
• Charter https://www.oasis-open.org/committees/xliff-omos/charter.php
• Purpose – Even more interoperability, NOT ONLY through XML. Take the data model to new environments. Facilitate roundtrip among XML and JSON pipelines and more..
• Scope/Deliverables
• Abstract Object Model for XLIFF 2 (XLIFF 2.x) https://github.com/oasis-tcs/xliff-omos-om/
• JSON Serialization of that model https://github.com/oasis-tcs/xliff-omos-jliff
• TBX mapping, TMX next
• Etc.
• IPR Mode – Non-Assertion
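To give a feel for what a roundtrip between XML and JSON pipelines involves, here is a minimal sketch that reads a tiny XLIFF 2.0 document into a plain Python object, dumps it as JSON, and writes it back to XLIFF. The JSON shape used here is an ad hoc assumption and is not the JLIFF format. The sketch also assumes one segment per unit, plain text without inline markup, and a fixed file id.

```python
import json
import xml.etree.ElementTree as ET

NS = "urn:oasis:names:tc:xliff:document:2.0"

SAMPLE = """<xliff xmlns="urn:oasis:names:tc:xliff:document:2.0"
       version="2.0" srcLang="en" trgLang="de">
  <file id="f1">
    <unit id="u1">
      <segment>
        <source>Hello</source>
        <target>Hallo</target>
      </segment>
    </unit>
  </file>
</xliff>"""


def q(tag: str) -> str:
    """Qualify a tag name with the XLIFF 2.0 namespace."""
    return f"{{{NS}}}{tag}"


def xliff_to_obj(xml_text: str) -> dict:
    """Read units into an ad hoc dict (assumes one segment per unit)."""
    root = ET.fromstring(xml_text)
    units = []
    for unit in root.iter(q("unit")):
        segment = unit.find(q("segment"))
        units.append({
            "id": unit.get("id"),
            "source": segment.findtext(q("source")),
            "target": segment.findtext(q("target")),
        })
    return {"srcLang": root.get("srcLang"),
            "trgLang": root.get("trgLang"),
            "units": units}


def obj_to_xliff(obj: dict) -> str:
    """Write the ad hoc dict back as a minimal XLIFF 2.0 document."""
    ET.register_namespace("", NS)
    root = ET.Element(q("xliff"), {"version": "2.0",
                                   "srcLang": obj["srcLang"],
                                   "trgLang": obj["trgLang"]})
    file_el = ET.SubElement(root, q("file"), {"id": "f1"})  # fixed id (assumption)
    for unit in obj["units"]:
        unit_el = ET.SubElement(file_el, q("unit"), {"id": unit["id"]})
        segment = ET.SubElement(unit_el, q("segment"))
        ET.SubElement(segment, q("source")).text = unit["source"]
        ET.SubElement(segment, q("target")).text = unit["target"]
    return ET.tostring(root, encoding="unicode")


obj = xliff_to_obj(SAMPLE)
print(json.dumps(obj, sort_keys=True))
print(xliff_to_obj(obj_to_xliff(obj)) == obj)
```

Even this toy version loses information (the original file id, whitespace, any inline markup), which hints at why a serialization-independent object model is useful before defining a JSON format.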
XLIFF OMOS TC – Membership
Bryan Schnabel, Felix Sasaki
XLIFF OMOS TC – Charter – Scope/Deliverables
https://www.oasis-open.org/committees/xliff-omos/charter.php
• Purpose – Even more interoperability, NOT ONLY through XML. Take the data model to new environments. Facilitate roundtrip among XML and JSON pipelines and more..
• Scope/Deliverables
• Abstract Object Model for XLIFF 2 (XLIFF 2.x) – serialization independent
• JSON Serialization of that model -> JLIFF 1
• TMX next – major new version with inline data model consistent with XLIFF 2
• Mappings, TBX Basic mapping of XLIFF 2 gls:
• APIs
• Reference Architectures, SOA, ESB ..
XLIFF OMOS TC – Charter – IPR Mode
IPR Mode
Non-Assertion
A progressive IPR mode, a kind of RF but even easier on both IP owners and implementers. Great for Open Source adoption!
No need to negotiate RF licensing conditions for essential use of IP in standards implementations
XLIFF OMOS TC – Charter – Audience
Audience
• Multilingual content and software architects and strategists, multilingual content publishers
• GILT services architects and developers
• Content owners and managers that seek to publish their content in multiple localized versions
• Software providers for internationalization, localization, and translation tools and processes, including language technology components
• Technical communicators employing localization tools and processes for multilingual publishing of their content
• Localization service providers who need to interact seamlessly with localizable and localized content of their customers
XLIFF OMOS TC – Audience - FEISGILTT
Save the dates October 31-November 1 2017
6th FEISGILTT 8th XLIFF Symposium @ LocWorld35 Silicon Valley
FEISGILTT 2016|2015|2014|2013|2012
XLIFF OMOS TC – Audience – FEISGILTT http://locworld.com/feisgiltt2016-cfp/
7th XLIFF Symposium
• Hot topics
• XLIFF Object Model
• XLIFF in JSON
Federated Interoperability Track
1st TMX Symposium
• Is this the time for TMX 2.0? If you are a stakeholder in the TMX community heavily relying on TMX 1.4b, we want to hear your feature wishlist. TMX related submissions can be proposed on the FEISGILTT EasyChair https://easychair.org/conferences/?conf=feisgiltt2016
XLIFF OMOS TC – Audience - Join
Public TC page
https://www.oasis-open.org/committees/tc_home.php?wg_abbrev=xliff-omos
• Join
• Send a comment
https://www.oasis-open.org/committees/comments/index.php?wg_abbrev=xliff-omos
• Publicly archived mailing lists
https://lists.oasis-open.org/archives/xliff-omos/
Questions?
THANK YOU Want to Contribute? Discussion
Join the Connect Group: www.gala-global.org/tapicc -> TAPICC group
Submit information on projects we have missed
so far via the Connect Group
Participate in the public review of our
output
Participate in collaborative R&D and
content creation, contribute code