Welcome to the Cloud! Terminology as a Service, CHAT2013


Presenter: Andrejs Vasiljevs (Tilde). This presentation is part of the TaaS project, funded by the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreement no. 296312.


Andrejs Vasiļjevs

Tilde

tekom 2013 / Wiesbaden / 07.11.2013.

Term identification in the source text

Consulting online databases and local files for translation equivalents

Creating and maintaining terminology glossaries

Sharing term glossaries and involving others in their polishing

Structuring data in industry-standard formats

Integrating term glossaries in CAT and other productivity tools

Keeping terminology up to date, etc.

Complexity of terminology work

Cloud-based platform for acquiring, cleaning up, sharing and reusing multilingual terminological data

Terminology as a Service

TaaS User Needs Survey Results: Importance of terminology work

Very important: 43.5%
Quite important: 39.9%
Less important: 14.8%
Not important: 1.8%

TaaS User Needs Survey: willingness to share

Yes, provided that…

Joint contribution to the DB: 24.9%
Access control: 19.2%
Legal aspects: 14.2%
External quality control: 11.4%
Little effort: 7.6%
Anonymity: 6.0%
Other: 16.7%

No, because…

Reasons shown: Legal restrictions; Poor quality/Lack of time; Own asset; Risk of misunderstanding
Shares shown: 48.6%, 22.0%, 16.5%, 8.3%, 4.6%

60.5% 39.5%

Tilde Latvia (Coordinator)

TAUS Netherlands

Kilgray Hungary

Cologne University of Applied Sciences Germany

University of Sheffield UK

TaaS Partners

Simplify the process for language workers to prepare, store and share task-specific multilingual term glossaries

Provide instant access to term translation equivalents and translation candidates for professional translators through CAT tools

Domain adaptation of statistical machine translation systems by dynamic integration with TaaS-provided terminology data

TaaS Mission

Automatic extraction of monolingual term candidates from user uploaded documents

Automatic retrieval of translation equivalents from different public and industry terminology databases

Translation candidate acquisition from multilingual web data

Facilities for users to clean up automatically acquired terminological data

Data sharing and integration facilities through APIs and export tools

Key services of TaaS
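The first key service, automatic extraction of monolingual term candidates, can be illustrated with a deliberately simple sketch: count word n-grams that neither start nor end with a stopword and rank them by frequency. This is a toy under stated assumptions, not the TaaS extractor (the workflow slide names TWSC and Kilgray TermExtractor); the stopword list, the maximum n-gram length and the frequency threshold are illustrative choices.

```python
import re
from collections import Counter

# Illustrative stopword list (assumption; real extractors use language-specific resources).
STOPWORDS = {"the", "a", "an", "of", "and", "or", "to", "in", "on", "for", "is", "are", "with", "by"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens (hyphenated words kept whole)."""
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())


def term_candidates(text: str, max_len: int = 3, min_freq: int = 2) -> list[tuple[str, int]]:
    """Return n-grams of 1..max_len words seen at least min_freq times.

    N-grams that begin or end with a stopword are skipped. Results are sorted by
    frequency (descending), then longer candidates first, then alphabetically.
    """
    tokens = tokenize(text)
    counts = Counter()
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i : i + n]
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            counts[" ".join(gram)] += 1
    ranked = [(term, freq) for term, freq in counts.items() if freq >= min_freq]
    ranked.sort(key=lambda tf: (-tf[1], -len(tf[0].split()), tf[0]))
    return ranked
```

For the input "The brake pad wears. Replace the brake pad when the brake disc is worn. Check the brake disc." this ranks "brake" first, followed by "brake disc" and "brake pad". Frequency alone over-ranks single words, which is why production extractors add statistical or linguistic filters.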

Research

Development

Usage

Focus areas

Term extraction

Collection of domain-specific multilingual corpora

Max(FTC)

Usability

Outreach

Sustainability

Quality

Performance

Scalability

Interoperability

TaaS Services

TAUS Data: repository of multilingual translation memories

EuroTermBank: databank of federated multilingual terminology

IATE: inter-institutional termbank of the European Union

META-SHARE: distributed pan-European repository of language resources

Target Repositories

Support for industry standard formats

Integration into CAT and productivity tools

API to integrate TaaS services into various software applications

Integration
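To make "support for industry standard formats" concrete, here is a minimal sketch that serialises a multilingual glossary as TBX (TermBase eXchange) using only the Python standard library. It assumes the TBX 2008 core structure (`martif` > `text` > `body` > `termEntry` > `langSet` > `tig` > `term`); the entry IDs, the header text and the example terms are hypothetical, and a real export would also carry administrative and descriptive data categories.

```python
import xml.etree.ElementTree as ET

# ElementTree serialises this namespaced attribute as xml:lang.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def glossary_to_tbx(entries: list[dict[str, str]], source_lang: str = "en") -> str:
    """Serialise entries such as {"en": "brake pad", "de": "Bremsbelag"} as TBX.

    Each dict becomes one concept-level <termEntry> (hypothetical IDs c1, c2, ...);
    each language code in it becomes a <langSet> holding a single <term>.
    """
    martif = ET.Element("martif", {"type": "TBX", XML_LANG: source_lang})
    header = ET.SubElement(martif, "martifHeader")
    file_desc = ET.SubElement(header, "fileDesc")
    source_desc = ET.SubElement(file_desc, "sourceDesc")
    ET.SubElement(source_desc, "p").text = "Glossary export (illustrative sketch)"
    body = ET.SubElement(ET.SubElement(martif, "text"), "body")
    for index, entry in enumerate(entries, start=1):
        term_entry = ET.SubElement(body, "termEntry", {"id": f"c{index}"})
        for lang, term in entry.items():
            lang_set = ET.SubElement(term_entry, "langSet", {XML_LANG: lang})
            tig = ET.SubElement(lang_set, "tig")
            ET.SubElement(tig, "term").text = term
    return ET.tostring(martif, encoding="unicode")
```

Keeping one `termEntry` per concept, rather than one per language pair, is what lets CAT tools and other consumers add further languages to the same entry later.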

Term identification and annotation

HTML Term Annotation: Term entries for terms identified in EuroTermBank are stored in TBX format in a <script> element placed in the HTML5 document.

XLIFF Term Annotation

Identifying and marking terms
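What the marked-up output can look like in each target format is shown by the minimal sketch below, assuming the terms have already been identified. In HTML5, ITS 2.0 expresses the Terminology data category with the `its-term` and `its-term-info-ref` attributes; in XLIFF 1.2, a term can be wrapped in `<mrk mtype="term">`. The `tbx-…` identifiers that would point at TBX entries in the page's `<script>` element are hypothetical, and the whole-word, case-sensitive matching stands in for real term identification.

```python
import html
import re
from typing import Callable


def _annotate(text: str, terms: dict[str, str], wrap: Callable[[str, str], str]) -> str:
    """Escape `text` and wrap each whole-word occurrence of a known term.

    `terms` maps a term to the ID of its TBX entry (IDs here are hypothetical).
    Longer terms come first in the pattern, so "brake pad" wins over "brake".
    """
    if not terms:
        return html.escape(text, quote=False)
    ordered = sorted(terms, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, ordered)) + r")\b")
    parts, last = [], 0
    for match in pattern.finditer(text):
        parts.append(html.escape(text[last:match.start()], quote=False))
        term = match.group(1)
        parts.append(wrap(html.escape(term, quote=False), html.escape(terms[term])))
        last = match.end()
    parts.append(html.escape(text[last:], quote=False))
    return "".join(parts)


def annotate_html(text: str, terms: dict[str, str]) -> str:
    """HTML5 with ITS 2.0 Terminology: its-term="yes" plus a pointer to term information."""
    return _annotate(
        text, terms,
        lambda term, ref: f'<span its-term="yes" its-term-info-ref="#{ref}">{term}</span>',
    )


def annotate_xliff(text: str, terms: dict[str, str]) -> str:
    """XLIFF 1.2 inline markup: <mrk mtype="term"> around each term."""
    return _annotate(text, terms, lambda term, ref: f'<mrk mtype="term">{term}</mrk>')
```

Sharing one matching routine between both formats keeps the HTML and XLIFF annotations consistent: the same spans are marked, and only the wrapper markup differs.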

[Figure: TaaS Terminology Services. Machine users (CAT tools, MT systems) exchange ITS 2.0 enriched content and ITS 2.0 term-annotated content with the Terminology Annotation Web Service API. Human users (e.g., translators, terminologists) use a Showcase Web Page that takes plaintext and returns term-annotated content for export/visualisation.]

New W3C standard: Internationalization Tag Set (ITS) 2.0

TaaS Architecture

[Figure: TaaS layered architecture. Presentation Layer: Web Page UI, Public API. Application Logic Layer: Terminology collection management, User management, Terminology collection search, Terminology collection creation. Data Storage Layer: Shared Term Repository (DB, File Store). High-performance Computing (HPC) Cluster: SGE, HPC frontend, CPU nodes. External parties: Web Browsers, CAT tools, MT, External TDBs. Connections are labelled https REST and http/https HTML.]

Term extraction workflows: Full collection creation workflow; Monolingual collection creation; Translation candidate extraction; ....

Modules: Result processing; Collection Importer; Marked text enrichment; Text tagging with terms; Statistical DB acquisition; Statistical DB feeding; Bilingual Term Extraction System; Parameter retriever; Translation lookup (ETB & STR, IATE, TAUS API, Statistical DB); Collection merger; Term extraction (TXT extractor, TWSC, Kilgray TermExtractor); Collection creator; Term normalizer
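The modules listed for the workflows include translation lookup against several resources, a term normalizer and a collection merger. A minimal sketch of the last two steps follows, assuming each lookup returns (source term, translation candidate) pairs per resource. The normalisation rules (Unicode NFC, whitespace collapsing, case folding of source terms only) and the provenance format are illustrative assumptions, not the TaaS implementation.

```python
import unicodedata
from collections import defaultdict


def normalize_term(term: str, casefold: bool = False) -> str:
    """Illustrative normalizer: Unicode NFC, collapsed whitespace, optional case folding."""
    term = " ".join(unicodedata.normalize("NFC", term).split())
    return term.casefold() if casefold else term


def merge_collections(
    lookups: dict[str, list[tuple[str, str]]],
) -> dict[str, dict[str, list[str]]]:
    """Merge translation lookups from several resources into one collection.

    `lookups` maps a resource name to (source term, translation candidate) pairs.
    Source terms are case-folded so spelling variants land under one key;
    candidates keep their case (German nouns, for instance, are capitalised).
    Each candidate records the sorted names of the resources that proposed it.
    """
    merged = defaultdict(lambda: defaultdict(set))
    for resource, pairs in lookups.items():
        for source, target in pairs:
            merged[normalize_term(source, casefold=True)][normalize_term(target)].add(resource)
    return {
        source: {target: sorted(resources) for target, resources in sorted(candidates.items())}
        for source, candidates in sorted(merged.items())
    }
```

Keeping the list of proposing resources for each candidate gives the later clean-up step a simple signal: a translation confirmed by several databases is a stronger candidate than one found in a single source.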

How to instruct SMT to use the right terms?

[Figure: illustration labelled "koks" and "timber"]

Put TaaS in the service of MT

Do-it-yourself MT factory in the cloud
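One widely used way to instruct a phrase-based SMT decoder to use specific terms is input markup: Moses, for example, accepts XML tags carrying a `translation` attribute when run with the `-xml-input` option (`exclusive` forces the supplied translation, `inclusive` offers it alongside the phrase table). The sketch below wraps glossary matches in such markup using greedy longest match over a tokenized sentence. It illustrates that general technique, not how TaaS itself feeds terminology to MT; the tag name `term` and the glossary entries are assumptions.

```python
from xml.sax.saxutils import escape, quoteattr


def mark_terms_for_moses(
    tokens: list[str], glossary: dict[tuple[str, ...], str], tag: str = "term"
) -> str:
    """Greedy longest-match of glossary entries over a tokenized sentence.

    `glossary` maps a tuple of source tokens to a target string. Matched spans
    become <tag translation="target">source tokens</tag>; all other tokens are
    XML-escaped and passed through unchanged.
    """
    max_len = max((len(key) for key in glossary), default=0)
    out: list[str] = []
    i = 0
    while i < len(tokens):
        # Try the longest possible span first, shrinking until a glossary entry matches.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            span = tuple(tokens[i : i + n])
            if span in glossary:
                out.append(f"<{tag} translation={quoteattr(glossary[span])}>{escape(' '.join(span))}</{tag}>")
                i += n
                break
        else:
            # No entry starts at this token: copy it through.
            out.append(escape(tokens[i]))
            i += 1
    return " ".join(out)
```

With the hypothetical glossary `{("brake", "pad"): "Bremsbelag", ("brake",): "Bremse"}`, the sentence "replace the brake pad now" becomes `replace the <term translation="Bremsbelag">brake pad</term> now`: the two-word entry wins over the shorter "brake", which is the behaviour a terminology-constrained decoder usually needs.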

Narrow Domain Automotive MT

English – Latvian

DATA

2 M unique parallel sentences

1.9 M monolingual sentences

0.2 M in-domain monolingual sentences

QUALITY

16% improvement from terminology integration

Boost in the quality of machine translation

Come & try: demo.taas-project.eu

Thank you! andrejs@tilde.com

The research within the project TaaS leading to these results has received funding from the European Union Seventh Framework Programme (FP7/2007-2013), Grant Agreement no 296312
