Upload
others
View
0
Download
0
Embed Size (px)
Citation preview
Building universal understanding
Combining Crowd and AI to scale professional-quality translation
João GraçaCTO
João GraçaCTO
Proceedings for AMTA 2018 Workshop: Translation Quality Estimation and Automatic Post-Editing Boston, March 21, 2018 | Page 41
80% English
The internet, 1997
Proceedings for AMTA 2018 Workshop: Translation Quality Estimation and Automatic Post-Editing Boston, March 21, 2018 | Page 42
5%
Portuguese
8%
Spanish
20%
Chinese
4%
German
6%
Japanese
3%
Arabic
30% English
The internet, 2017
Proceedings for AMTA 2018 Workshop: Translation Quality Estimation and Automatic Post-Editing Boston, March 21, 2018 | Page 43
Language barriers = trade barriers
“Everyone speaks English”
costs the UK
£48BJust 12%
of EU retailers sell online to other EU countries
Just 15% of EU consumers buy online
from other EU countries3.5% UK GDP every year
Proceedings for AMTA 2018 Workshop: Translation Quality Estimation and Automatic Post-Editing Boston, March 21, 2018 | Page 44
5
Machine Translation
AffordableFast
Quality not good enough
Professional Translation
ExpensiveSlow
Lack of fast, affordable translation with human qualityAvailable Solutions
Proceedings for AMTA 2018 Workshop: Translation Quality Estimation and Automatic Post-Editing Boston, March 21, 2018 | Page 45
“All translation firms together are able to translate far less than 1% of relevant
content produced everyday” CSA – MT Is Unavoidable to Keep Up with Content Volumes
Proceedings for AMTA 2018 Workshop: Translation Quality Estimation and Automatic Post-Editing Boston, March 21, 2018 | Page 46
Will AI solve translation?
MACHINE ONLYTIME
JOBS
MQM 95
QUALITY
Proceedings for AMTA 2018 Workshop: Translation Quality Estimation and Automatic Post-Editing Boston, March 21, 2018 | Page 47
Will AI solve translation?H
UM
AN
EFF
OR
T
TIME
JOBS
MQM 95
Proceedings for AMTA 2018 Workshop: Translation Quality Estimation and Automatic Post-Editing Boston, March 21, 2018 | Page 48
0%
20%
40%
60%
80%
100%
0 6 12 18 24 30
Good Not sure Bad
MQM
Job
Quality per Job
Proceedings for AMTA 2018 Workshop: Translation Quality Estimation and Automatic Post-Editing Boston, March 21, 2018 | Page 49
Original customer request
Translated customer request
High Q.E.
Unbabel Pipeline
QualityEstimation
Q.E.
MachineTranslation
Low Q.E.
Re-Eval CommunityTranslators
Proceedings for AMTA 2018 Workshop: Translation Quality Estimation and Automatic Post-Editing Boston, March 21, 2018 | Page 50
TranslationMemory
Job ResultMT Router
Customer MT Customer APE
Machine Translation Pipeline
Proceedings for AMTA 2018 Workshop: Translation Quality Estimation and Automatic Post-Editing Boston, March 21, 2018 | Page 51
Customer Support Tickets
30
47,5
65
82,5
100
SMTNMT
Customized
Professional
94,0
80,0
65,0
50,0
MQM
Customer Adaptation
MQM
Proceedings for AMTA 2018 Workshop: Translation Quality Estimation and Automatic Post-Editing Boston, March 21, 2018 | Page 52
Word-Level QE Which words are translated
correctly/incorrectly?
Sentence-Level QE How good is the entire
translation?
Quality Estimation
Proceedings for AMTA 2018 Workshop: Translation Quality Estimation and Automatic Post-Editing Boston, March 21, 2018 | Page 53
Word-level QE example
Hey lá , eu sou pesaroso sobre aquele !OK OK OK OKBA
DBAD
BAD
BAD
BAD
Quality Estimation
Proceedings for AMTA 2018 Workshop: Translation Quality Estimation and Automatic Post-Editing Boston, March 21, 2018 | Page 54
Unbabel Ticket
Bad translation
Good translation
source MT final
QE Training
Proceedings for AMTA 2018 Workshop: Translation Quality Estimation and Automatic Post-Editing Boston, March 21, 2018 | Page 55
Job MachineTranslation
Customer
High Q.E.
Q.E.
QualityEstimation
Low Q.E.
Re-EvalCommunityTranslators
Document-Level QE how good is the entire document?
Human QE Can we evaluate post-edit output?
QE in the Pipeline
Proceedings for AMTA 2018 Workshop: Translation Quality Estimation and Automatic Post-Editing Boston, March 21, 2018 | Page 56
JobMachine
Translation
Customer
Q.E.
QualityEstimation Community
Translators
Customer
Q.E.
QualityEstimation
Data Generation Engine
Proceedings for AMTA 2018 Workshop: Translation Quality Estimation and Automatic Post-Editing Boston, March 21, 2018 | Page 57
Initial text
Submitted text
AfterBefore
NODATA
POINTS
With Data points:Mouse clicksKey pressesTimestamps
Data Generation Engine
Initial text
Submitted text
Proceedings for AMTA 2018 Workshop: Translation Quality Estimation and Automatic Post-Editing Boston, March 21, 2018 | Page 58
Raw data Processed information
At 18:03:30:In nugget 3mouseClick Cursor at 16Selected: 0 At 18:03:31:In nugget 3Pressed Backspace Cursor at 16Selected: 0 At 18:03:31:In nugget 3Pressed Backspace Cursor at 15Selected: 0 At 18:03:31:In nugget 3Pressed Backspace Cursor at 14Selected: 0
At 18:03:35:In nugget 3Pressed Shift Cursor at 25Selected: 0 At 18:03:35:In nugget 3Pressed s Cursor at 25Selected: 0 At 18:03:35:In nugget 3Pressed i Cursor at 26Selected: 0 At 18:03:35:In nugget 3Pressed e Cursor at 27Selected: 0
At 18:03:30:In nugget 3mouseClick Cursor at 16Selected: 0 At 18:03:31:In nugget 3Pressed Backspace Cursor at 16Selected: 0 At 18:03:31:In nugget 3Pressed Backspace Cursor at 15Selected: 0 At 18:03:31:In nugget 3Pressed Backspace Cursor at 14Selected: 0
Initial text“Espero que esto es útil”
• Deleted word “es”• Inserted word “sea”
Submitted text“Espero que esto sea útil”
Keystroke Analysis
Proceedings for AMTA 2018 Workshop: Translation Quality Estimation and Automatic Post-Editing Boston, March 21, 2018 | Page 59
•Editors Pool•Initial Text (MT)•Editor Assignment•Custom Editing Interfaces•Constant Quality Evaluation
Profession translation
Unbabel pillars
Speed Quality
Cost
Proceedings for AMTA 2018 Workshop: Translation Quality Estimation and Automatic Post-Editing Boston, March 21, 2018 | Page 60
50.000 Users
Unbabel Community
Proceedings for AMTA 2018 Workshop: Translation Quality Estimation and Automatic Post-Editing Boston, March 21, 2018 | Page 61
Evaluators
More specialization layers will be created
First tests right after signup
Editors get rated with training tasks
Expert Editor
Testing Phase
Training Content
Paid WorkOnly the best rated editors have access to customer tasks
Editors Pool
Annotators
1
2
3
4
Proceedings for AMTA 2018 Workshop: Translation Quality Estimation and Automatic Post-Editing Boston, March 21, 2018 | Page 62
Evaluation Tool
Document Level Human QE
Proceedings for AMTA 2018 Workshop: Translation Quality Estimation and Automatic Post-Editing Boston, March 21, 2018 | Page 63
Deep Annotations
Proceedings for AMTA 2018 Workshop: Translation Quality Estimation and Automatic Post-Editing Boston, March 21, 2018 | Page 64
Error Analysis
Proceedings for AMTA 2018 Workshop: Translation Quality Estimation and Automatic Post-Editing Boston, March 21, 2018 | Page 65
QE for Annotation
Pre-fill with word level QE
Proceedings for AMTA 2018 Workshop: Translation Quality Estimation and Automatic Post-Editing Boston, March 21, 2018 | Page 66
Editors Profiling
Proceedings for AMTA 2018 Workshop: Translation Quality Estimation and Automatic Post-Editing Boston, March 21, 2018 | Page 67
Pull30 m
6 H
2 D
6 D
20 m
40 m
SLA
1100
1000
1000
1000
1100
1100
Priority Editors Rating
4.2
3.8
4.3
4.8
Native TopicsQueue
G
G
G
G
R
R
Tasks/time
6 m
2 m
10 m
12 m
18 m
45 m
Topics
Editor Assignment
Proceedings for AMTA 2018 Workshop: Translation Quality Estimation and Automatic Post-Editing Boston, March 21, 2018 | Page 68
Regular distribution
3.8old rating
Smart distribution
4.6
Improved rating
Editor Assignment
Proceedings for AMTA 2018 Workshop: Translation Quality Estimation and Automatic Post-Editing Boston, March 21, 2018 | Page 69
Post-Editing Interfaces
Proceedings for AMTA 2018 Workshop: Translation Quality Estimation and Automatic Post-Editing Boston, March 21, 2018 | Page 70
QE on Interfaces
Proceedings for AMTA 2018 Workshop: Translation Quality Estimation and Automatic Post-Editing Boston, March 21, 2018 | Page 71
Post-Editing Interfaces
Proceedings for AMTA 2018 Workshop: Translation Quality Estimation and Automatic Post-Editing Boston, March 21, 2018 | Page 72
Time Spent on Job
WAITING
Translator 1
TIME
DELIVERYMT
Translator 2
WAITING
Proceedings for AMTA 2018 Workshop: Translation Quality Estimation and Automatic Post-Editing Boston, March 21, 2018 | Page 73
Time Spent on Job: Mobile
WAITING
Translator 1
TIME
DELIVERYMT
Translator 2
WAITING -20%
Proceedings for AMTA 2018 Workshop: Translation Quality Estimation and Automatic Post-Editing Boston, March 21, 2018 | Page 74
Spelling
Grammar
Consistency
Terminology
External NLP Services
Spell Check
Syntax Parser
Word AlignerRegister
Smartcheck
Style GuidesCustomer
LearningAnnotations
Proceedings for AMTA 2018 Workshop: Translation Quality Estimation and Automatic Post-Editing Boston, March 21, 2018 | Page 75
Spelling
Grammar
Consistency
Terminology
Register
Smartcheck (QE Version)
Style GuidesCustomer
LearningAnnotations
Q.E.
QualityEstimation
Proceedings for AMTA 2018 Workshop: Translation Quality Estimation and Automatic Post-Editing Boston, March 21, 2018 | Page 76
M A C H I N E + H U M A N
CORE API
CHAT API
CYRANO API
Customer Service Conversational
MESSAGING API
Language OS
Language Engine
Bots
Proceedings for AMTA 2018 Workshop: Translation Quality Estimation and Automatic Post-Editing Boston, March 21, 2018 | Page 77
Unbabel for Customer Service
Unbabel adapts to any workflow
English-speakingagent
Unbabel’s Machine Translation
Distributed Human Translation
API
Proceedings for AMTA 2018 Workshop: Translation Quality Estimation and Automatic Post-Editing Boston, March 21, 2018 | Page 78
94
Customer Replies: Speed & Quality
20 minutes
Proceedings for AMTA 2018 Workshop: Translation Quality Estimation and Automatic Post-Editing Boston, March 21, 2018 | Page 79
Unbabel Chat
Native speaking in multiple languages
API
Proceedings for AMTA 2018 Workshop: Translation Quality Estimation and Automatic Post-Editing Boston, March 21, 2018 | Page 80
Chat
Proceedings for AMTA 2018 Workshop: Translation Quality Estimation and Automatic Post-Editing Boston, March 21, 2018 | Page 81
Editors train the
MT engineCommunity of 100K+ translatorsSkips Humans
Chat Translation Flow
Proceedings for AMTA 2018 Workshop: Translation Quality Estimation and Automatic Post-Editing Boston, March 21, 2018 | Page 82
Chat Messages: Speed & Quality
902 minutes
MT
80%
Proceedings for AMTA 2018 Workshop: Translation Quality Estimation and Automatic Post-Editing Boston, March 21, 2018 | Page 83
Other Use Cases
Reviews
Travel descriptions
Video
SEO
Newsletters
Proceedings for AMTA 2018 Workshop: Translation Quality Estimation and Automatic Post-Editing Boston, March 21, 2018 | Page 84
We’re Hiringhttps://unbabel.com/careers/
Proceedings for AMTA 2018 Workshop: Translation Quality Estimation and Automatic Post-Editing Boston, March 21, 2018 | Page 85