Presenter: Andrejs Vasiljevs (Tilde) This presentation is a part of TaaS project funded from the European Union Seventh Framework Programme (FP7/2007-2013), grant agreement no 296312
Welcome to the Cloud! Terminology as a Service
Andrejs Vasiļjevs
Tilde
tekom 2013 / Wiesbaden / 07.11.2013.
Term identification in the source text
Consulting online databases and local files for translation equivalents
Creating and maintaining terminology glossaries
Sharing term glossaries and involving others in their polishing
Structuring data in the industry standard formats
Integrating term glossaries in CAT and other productivity tools
Keeping terminology up to date
etc.
Complexity of terminology work
cloud-based platform for acquiring, cleaning up, sharing, and reusing multilingual terminological data
Terminology as a Service
TaaS User Needs Survey Results: Importance of terminology work
Very important: 43.5%
Quite important: 39.9%
Less important: 14.8%
Not important: 1.8%
TaaS User Needs Survey: willingness to share
Yes, provided that…
Joint contribution to the DB: 24.9%
Access control: 19.2%
Legal aspects: 14.2%
External quality control: 11.4%
Little effort: 7.6%
Anonymity: 6.0%
Other: 16.7%
No, because…
Reasons listed: Legal restrictions; Poor quality/Lack of time; Own asset; Risk of misunderstanding
Shares shown in the chart: 48.6%, 22.0%, 16.5%, 8.3%, 4.6%
60.5% 39.5%
Tilde, Latvia (Coordinator)
TAUS, Netherlands
Kilgray, Hungary
Cologne University of Applied Sciences, Germany
University of Sheffield, UK
TaaS Partners
Simplify the process for language workers to prepare, store, and share task-specific multilingual term glossaries
Provide instant access to term translation equivalents and translation candidates for professional translators through CAT tools
Domain adaptation of statistical machine translation systems by dynamic integration with TaaS-provided terminology data
TaaS Mission
Automatic extraction of monolingual term candidates from user uploaded documents
Automatic retrieval of translation equivalents from different public and industry terminology databases
Translation candidate acquisition from multilingual web data
Facilities for users to clean up automatically acquired terminological data
Data sharing and integration facilities through APIs and export tools
Key services of TaaS
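To make the first service more concrete, here is a minimal sketch of monolingual term candidate extraction. It is not the TaaS extraction method, which the slides do not describe; it only counts word n-grams that neither start nor end with a stopword. The function name, the stopword list, and the thresholds are assumptions made for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption); a real system would use
# language-specific resources and linguistic filtering.
STOPWORDS = {"a", "an", "the", "of", "in", "on", "for", "and", "or",
             "to", "is", "are", "with", "by"}


def extract_term_candidates(text, max_len=3, min_freq=2):
    """Return (candidate, frequency) pairs, most frequent first.

    Candidates are word n-grams (1..max_len words) that do not start or
    end with a stopword and occur at least min_freq times. Sentence
    boundaries are ignored, which is a simplification.
    """
    tokens = re.findall(r"[a-z][a-z-]*", text.lower())
    counts = Counter()
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            counts[" ".join(gram)] += 1
    ranked = [(term, freq) for term, freq in counts.items() if freq >= min_freq]
    # Sort by frequency, then prefer longer candidates, then alphabetically.
    ranked.sort(key=lambda tf: (-tf[1], -len(tf[0].split()), tf[0]))
    return ranked
```

The candidates produced this way would still need the user clean-up step listed above, since raw frequency counts also surface general vocabulary.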
Research
Development
Usage
Focus areas
Term extraction
Collection of domain specific multilingual corpora
Max(FTC)
Usability
Outreach
Sustainability
Quality
Performance
Scalability
Interoperability
TaaS Services
TAUS Data: repository of multilingual translation memories
EuroTermBank: databank of federated multilingual terminology
IATE: inter-institutional termbank of the European Union
META-SHARE: distributed Pan-European repository of language resources
Target Repositories
Support for industry standard formats
Integration into CAT and productivity tools
API to integrate TaaS services into various software applications
Integration
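As an illustration of what support for an industry standard format can look like, the sketch below serializes a small glossary into a TBX-style document (martif, termEntry, langSet, tig, term). It is an assumption-laden example, not the TaaS export tool: the function name, the header text, and the sample entry are invented, and the output is not validated against a TBX schema.

```python
import xml.etree.ElementTree as ET

# ElementTree serializes this qualified name as the xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def glossary_to_tbx(entries, source_lang="en"):
    """Serialize a glossary to a TBX-style XML string.

    entries: list of dicts mapping a language code to a term, one dict
    per concept (hypothetical input format chosen for this sketch).
    """
    martif = ET.Element("martif", {"type": "TBX", XML_LANG: source_lang})
    header = ET.SubElement(martif, "martifHeader")
    source_desc = ET.SubElement(ET.SubElement(header, "fileDesc"), "sourceDesc")
    ET.SubElement(source_desc, "p").text = "Illustrative glossary export"
    body = ET.SubElement(ET.SubElement(martif, "text"), "body")
    for idx, entry in enumerate(entries, start=1):
        term_entry = ET.SubElement(body, "termEntry", {"id": f"e{idx}"})
        for lang, term in entry.items():
            lang_set = ET.SubElement(term_entry, "langSet", {XML_LANG: lang})
            tig = ET.SubElement(lang_set, "tig")
            ET.SubElement(tig, "term").text = term
    return ET.tostring(martif, encoding="unicode")


if __name__ == "__main__":
    # Sample entry is illustrative data only.
    print(glossary_to_tbx([{"en": "brake caliper", "lv": "bremžu suports"}]))
```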
Term identification and annotation
HTML Term Annotation: Term entries for terms identified in EuroTermBank are stored in TBX format in a <script> element that is placed in the HTML5 document.
XLIFF Term Annotation
Identifying and marking terms
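The HTML term annotation described above can be sketched roughly as follows. The example wraps known terms in span elements carrying the ITS 2.0 local terminology attributes its-term and its-term-info-ref. How TaaS links a span to the TBX entry in the script element is not stated in the slides, so the fragment identifiers, the function name, and the glossary used here are assumptions.

```python
import html
import re


def annotate_terms_html(text, term_refs):
    """Mark known terms in plain text with ITS 2.0 terminology attributes.

    term_refs maps a term to the id of its term entry (hypothetical ids).
    Returns an HTML fragment; all other text is HTML-escaped.
    """
    if not term_refs:
        return html.escape(text)
    # Longest terms first so multi-word terms win over their sub-terms.
    alternatives = sorted(term_refs, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in alternatives) + r")\b",
        re.IGNORECASE,
    )
    lookup = {term.lower(): ref for term, ref in term_refs.items()}
    parts, pos = [], 0
    for match in pattern.finditer(text):
        parts.append(html.escape(text[pos:match.start()]))
        ref = html.escape(lookup[match.group(0).lower()], quote=True)
        parts.append(
            '<span its-term="yes" its-term-info-ref="#%s">%s</span>'
            % (ref, html.escape(match.group(0)))
        )
        pos = match.end()
    parts.append(html.escape(text[pos:]))
    return "".join(parts)


if __name__ == "__main__":
    print(annotate_terms_html(
        "Check the brake caliper & brake pads.",
        {"brake caliper": "etb-1", "brake": "etb-2"},
    ))
```

Plain string matching like this misses inflected forms, so it should be read as a picture of the output markup rather than of the identification step itself.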
[Diagram: terminology annotation workflow. Components: TaaS Terminology Services; Terminology Annotation Web Service API; Showcase Web Page. Users: machine users (CAT tools, MT systems) and human users (e.g., translators, terminologists). Inputs: plaintext, ITS 2.0 enriched content. Outputs: ITS 2.0 term-annotated content, export / visualisation.]
New W3C standard: Internationalization Tag Set (ITS) 2.0
TaaS Architecture
[Architecture diagram; recoverable labels listed below]
Presentation Layer: Web Page UI, Public API
Application Logic Layer: terminology collection management, user management, terminology collection search, terminology collection creation
Data Storage Layer (Shared Term Repository): DB, File Store
High-performance Computing (HPC) Cluster: HPC frontend, SGE, CPU nodes
External systems: web browsers, CAT tools, MT, external TDBs
Connection labels: HTTPS REST, HTTP/HTTPS HTML
Term extraction workflows: full collection creation workflow, monolingual collection creation, translation candidate extraction, ...
Modules
Result processing: Collection Importer, Marked Text enrichment, Text tagging with terms, Statistical DB acquisition, Statistical DB feeding
Bilingual Term Extraction System
Parameter retriever
Translation lookup: ETB & STR, IATE, TAUS API, Statistical DB, Collection merger
Term extraction: TXT extractor, TWSC, Kilgray TermExtractor, Collection creator, Term normalizer, Statistical DB
How to instruct SMT to use the right terms?
Put TaaS at the service of MT
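One way to picture how an SMT system can be told to use the right terms is input markup: glossary terms in the source sentence are wrapped in XML tags that carry the preferred translation, which decoders such as Moses can read when run with the -xml-input option. The slides do not say that TaaS integrates terminology this way, so the sketch below is an assumption; the tag name, the function name, and the sample glossary are invented for illustration.

```python
import re
from xml.sax.saxutils import escape, quoteattr


def mark_terms_for_smt(sentence, glossary):
    """Wrap glossary terms in XML markup carrying a preferred translation.

    glossary maps a source term to its target-language equivalent.
    Text outside the markup is XML-escaped.
    """
    if not glossary:
        return escape(sentence)
    # Longest terms first so multi-word terms win over their sub-terms.
    alternatives = sorted(glossary, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in alternatives) + r")\b",
        re.IGNORECASE,
    )
    lookup = {term.lower(): target for term, target in glossary.items()}
    parts, pos = [], 0
    for match in pattern.finditer(sentence):
        parts.append(escape(sentence[pos:match.start()]))
        target = lookup[match.group(0).lower()]
        # quoteattr returns the value already wrapped in quotes.
        parts.append(
            "<term translation=%s>%s</term>"
            % (quoteattr(target), escape(match.group(0)))
        )
        pos = match.end()
    parts.append(escape(sentence[pos:]))
    return "".join(parts)


if __name__ == "__main__":
    # Sample glossary entry is illustrative data only.
    print(mark_terms_for_smt(
        "replace the brake caliper",
        {"brake caliper": "bremžu suports"},
    ))
```

A fixed target form is a known limitation for a morphologically rich language such as Latvian, where the term may need a different inflection in context.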
do-it-yourself MT factory on the cloud
Narrow Domain Automotive MT
English – Latvian
DATA
2 M unique parallel sentences
1.9 M monolingual sentences
0.2 M in-domain monolingual
QUALITY
16% improvement from terminology integration
Boost in the quality of machine translation
Come & Try: demo.taas-project.eu
Thank you! [email protected]
The research within the project TaaS leading to these results has received funding from the European Union Seventh Framework Programme (FP7/2007-2013), Grant Agreement no 296312