This presentation is a part of the MosesCore project that encourages the development and usage of open source machine translation tools, notably the Moses statistical MT toolkit. MosesCore is supported by the European Commission Grant Number 288487 under the 7th Framework Programme. For the latest updates go to http://www.statmt.org/mosescore/ or follow us on Twitter - #MosesCore
TAUS MACHINE TRANSLATION SHOWCASE
Creating Competitive Advantage with Rapid Customization & Deployment of Moses
10:20 – 10:30, Thursday, 10 October 2013
Tony O’Dowd, KantanMT
No Hardware. No Software. No Hassle MT.
Tony O’Dowd Founder & Chief Architect
Localization World 2013
TAUS – MT Showcase
What we aim to cover today?
• User Scenario #1
  • Building Production MT Systems
  • Structured Approach
  • Build – Measure – Learn Process
• User Scenario #2
  • Retraining with Post-Edits
  • RoundTable Inc. – their story
• User Scenario #3
  • Selecting the best engine for the job
  • Milengo – their approach
  • Getting the Translator involved
• Q&A
20 Minutes
What is KantanMT.com?
• Statistical MT System
  • Cloud-based
  • Highly scalable
  • Inexpensive to operate
  • Quick to deploy
• Our Vision
  • To put Machine Translation customization, improvement and deployment into your hands
Active KantanMT Engines: 6,632
Training Words Uploaded: 23,653,605,925
Member Words Translated: 362,291,925
Fully Operational: 7 months
Measure – KantanMT engine calibration
• Track using KantanWatch™
• Compare engines quickly
• Monitor production data
• Use your own test/tune data sets
Learn – KantanMT Experimentation
• What to look out for?
  BLEU: 24%
  F-Measure: 50%
  TER: 66%
  Wordcount: 172K
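As an illustration of one of these measurements, here is a minimal sketch of a word-level F-Measure between an MT output and a reference translation. The slides do not say how KantanMT computes its scores, so the function name `f_measure` and the whitespace tokenization are assumptions made only for this example.

```python
from collections import Counter


def f_measure(hypothesis: str, reference: str) -> float:
    """Harmonic mean of word-level precision and recall (illustrative only)."""
    # Assumption: lowercase + whitespace tokenization; real tools tokenize more carefully.
    hyp_counts = Counter(hypothesis.lower().split())
    ref_counts = Counter(reference.lower().split())

    # Words shared by both sides, counted with multiplicity.
    overlap = sum((hyp_counts & ref_counts).values())
    if overlap == 0:
        return 0.0

    precision = overlap / sum(hyp_counts.values())  # share of MT words that are correct
    recall = overlap / sum(ref_counts.values())     # share of reference words recovered
    return 2 * precision * recall / (precision + recall)
```

A score near 1.0 means the MT output and the reference use almost the same words; an engine with poor coverage leaves words untranslated and scores lower on recall.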
• Learn from examining the output
  § Catalogue Errors
    § Untranslated text
    § Incorrect numeric formatting
    § Invalid characters
    § High level of post-editing required
  § Conclusions
    § Engine coverage is bad due to low wordcount
    § Post-Editing is high due to low engine coverage
    § Training data doesn’t contain correct numeric formatting
    § Bad formatting in training data
Ratings: BLEU Low, F-Measure OK, TER High, Wordcount Low
§ Action Plan
  § Coverage – More training data required, relevant and of high quality. Also use a Glossary File to improve terminology consistency and accuracy.
  § Numeric Formatting – Use a PEX rule to post-edit the translation and fix numeric formats
  § Invalid Character – Use a PEX rule to fix this invalid character issue
  § Post-Editing – By increasing the quantity of training data, the KantanMT engine will perform better overall
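The idea behind a post-edit rule can be sketched with plain regular expressions. This is not PEX syntax, which the slides do not show. It is a hypothetical Python stand-in that assumes the target locale swaps the English thousands and decimal separators, and that the invalid character is the Unicode replacement character.

```python
import re

# Matches 1,234 / 1,234.56 (grouped thousands) or 3.5 (plain decimal).
# Limitation: version strings such as "2.0" would also be rewritten.
_NUMBER = re.compile(r"\b\d{1,3}(?:,\d{3})+(?:\.\d+)?\b|\b\d+\.\d+\b")
_SWAP = str.maketrans(",.", ".,")


def fix_numeric_format(segment: str) -> str:
    """Swap ',' and '.' inside numbers (assumed target convention: 1.234,56)."""
    return _NUMBER.sub(lambda m: m.group(0).translate(_SWAP), segment)


def drop_invalid_characters(segment: str) -> str:
    """Remove U+FFFD, assumed here to be the 'invalid character' in the output."""
    return segment.replace("\ufffd", "")


def post_edit(segment: str) -> str:
    """Apply both hypothetical rules to one translated segment."""
    return fix_numeric_format(drop_invalid_characters(segment))
```

For example, `post_edit("Total: 1,234.56 EUR")` gives `"Total: 1.234,56 EUR"`, while text without numbers passes through unchanged.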
Action Plan – focus on improving measurements
Build Measure Learn: The Results
• Analyse output
  § Untranslated text
  § Numeric Formatting
  § Invalid Character
User Scenario #2
• Long history of MT usage
  • In-house expertise
  • Large customer demand
  • Using MT since 2005
  • Now manage their own in-house system on KantanMT.com
• Goal
  • Faster project turnaround times
  • More service offerings to client base
  • More production capacity
  • Cost efficiencies
About RoundTable Studio
RoundTable Studio is a leading provider of translation and localization services for the Spanish and Brazilian Portuguese language markets.
Early Adopter
User Scenario #2
• Business Scenario
  • Continuous translation quality improvement
  • Reduced post-editing/turn-around times
User Scenario #2
• Results
  • Greater production capacity
  • Improvement in quality
  • Faster project turn-around times
“Since signing up with KantanMT, we have been able to take on more work and increase our capacity levels”
Laura Grossi – MT Specialist, RoundTable Studio
User Scenario #3
• Long history of MT usage
  • In-house expertise
  • Large customer demand
• Originally outsourced MT
  • 3rd party consultancy company
• Vendor Agnostic
  • Microsoft Translator Hub
  • KantanMT.com
• All systems are cloud based
• Like hands-on approach to managing their own MT engines
About Milengo
Milengo provides translation, localization and related language services specializing in software, website and documentation localization.
User Scenario #3
• Business Scenario
  • Select best engine for language combination
  • Client requests a job that involves an MT component
• Finding Training Data
  • Data is aggregated from the client's previous translations
• Building Engines
  • Same training data is provided to each engine
  • Same language combinations
  • Iterative process until satisfied with system performance (internal process)
User Scenario #3
• Translation Quality Analysis
  • Sample of 1,000 segments selected
  • Tabulated & anonymised
  • Dispatched to Senior Translators
[Evaluation sheet: column headers]
Source | MT Target | Adequacy (Score 1-5) | Fluency (Score 1-5) | Overall quality (1-4)
Error categories: Wrong terminology | Wrong Spelling | Source not Translated/Omissions | Compliance with client specs | Literal translation | Text/Information added | Capitalization | Wrong Word Form | Wrong Part of Speech | Punctuation | Sentence Structure | Tags and Markup | Locale Adaptation | Spacing
Error category groups: Style | Syntax and Grammar | Tech
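The sampling and anonymising step could look roughly like the sketch below. The function name `build_blind_sample`, the row layout and the "Engine A/B" labels are assumptions for illustration. The slides only state that 1,000 segments were selected, tabulated and anonymised before being sent to senior translators.

```python
import random


def build_blind_sample(segments, engine_outputs, sample_size=1000, seed=0):
    """Pick random segments and hide which engine produced each MT target.

    segments:       list of source strings
    engine_outputs: dict mapping engine name -> list of MT targets, aligned with segments
    Returns (rows, labels); labels maps real engine names to blind labels.
    """
    rng = random.Random(seed)  # fixed seed so the sample can be reproduced
    count = min(sample_size, len(segments))
    indices = sorted(rng.sample(range(len(segments)), count))

    # Shuffle engines before labelling so "Engine A" carries no hint of the vendor.
    engines = list(engine_outputs)
    rng.shuffle(engines)
    labels = {name: f"Engine {chr(ord('A') + i)}" for i, name in enumerate(engines)}

    rows = [
        {"source": segments[i], "engine": labels[name], "mt_target": engine_outputs[name][i]}
        for i in indices
        for name in engines
    ]
    return rows, labels
```

The `labels` mapping stays with the project manager; the `rows` are what reviewers would see, with the score and error columns left blank for them to fill in.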
User Scenario #3
• Feedback collated from Senior Translators
  • Match best engine for language quality
  • Very unique – pseudo-crowd sourcing of most appropriate engine
• Match engine to best language support
  • Translators always involved in engine selection process
• Feedback to client
  • Match requirements and quality expectations
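Collating the translators' feedback into an engine choice can be sketched as a simple aggregation. The tuple layout and the ranking by summed mean adequacy and fluency are assumptions for this example, since the slides do not describe how Milengo weighs the scores.

```python
from collections import defaultdict
from statistics import mean


def rank_engines(ratings):
    """Rank engines by mean adequacy plus mean fluency (both scored 1-5).

    ratings: iterable of (engine_label, adequacy, fluency) tuples.
    Returns a list of (engine_label, (mean_adequacy, mean_fluency)), best first.
    """
    by_engine = defaultdict(list)
    for engine_label, adequacy, fluency in ratings:
        by_engine[engine_label].append((adequacy, fluency))

    summary = {
        label: (mean(a for a, _ in scores), mean(f for _, f in scores))
        for label, scores in by_engine.items()
    }
    # Hypothetical ranking rule: higher combined mean wins.
    return sorted(summary.items(), key=lambda item: sum(item[1]), reverse=True)
```

In practice the error-category counts from the evaluation sheet would also feed into the decision; this sketch only covers the two 1-5 scores.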
User Scenario #3
• Levels of post-editing services
  • Adequacy Review
    • All meaning expressed in the source segment appears in the translated segment
    • Structural integrity – tags, placeholders
    • Fit-for-purpose quality
  • Fluency Review
    • No grammar errors, excellent word selection and good syntax
    • Publishable quality
  • Client picks review
    • To fit budget, time-frame, audience, channel etc.
Tony O’Dowd [email protected]