45
Building universal understanding Combining Crowd and AI to scale professional-quality translation João Graça CTO João Graça CTO Proceedings for AMTA 2018 Workshop: Translation Quality Estimation and Automatic Post-Editing Boston, March 21, 2018 | Page 41

Combining Crowd and AIUnbabel for Customer Service Unbabel adapts to any workflow English-speaking agent Unbabel’s Machine Translation Distributed Human Translation API Proceedings

  • Upload
    others

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Combining Crowd and AIUnbabel for Customer Service Unbabel adapts to any workflow English-speaking agent Unbabel’s Machine Translation Distributed Human Translation API Proceedings

Building universal understanding

Combining Crowd and AI to scale professional-quality translation

João GraçaCTO

João GraçaCTO

Proceedings for AMTA 2018 Workshop: Translation Quality Estimation and Automatic Post-Editing Boston, March 21, 2018 | Page 41

Page 2: Combining Crowd and AIUnbabel for Customer Service Unbabel adapts to any workflow English-speaking agent Unbabel’s Machine Translation Distributed Human Translation API Proceedings

80% English

The internet, 1997

Proceedings for AMTA 2018 Workshop: Translation Quality Estimation and Automatic Post-Editing Boston, March 21, 2018 | Page 42

Page 3: Combining Crowd and AIUnbabel for Customer Service Unbabel adapts to any workflow English-speaking agent Unbabel’s Machine Translation Distributed Human Translation API Proceedings

5%

Portuguese

8%

Spanish

20%

Chinese

4%

German

6%

Japanese

3%

Arabic

30% English

The internet, 2017

Proceedings for AMTA 2018 Workshop: Translation Quality Estimation and Automatic Post-Editing Boston, March 21, 2018 | Page 43

Page 4: Combining Crowd and AIUnbabel for Customer Service Unbabel adapts to any workflow English-speaking agent Unbabel’s Machine Translation Distributed Human Translation API Proceedings

Language barriers = trade barriers

“Everyone speaks English”

costs the UK

£48BJust 12%

of EU retailers sell online to other EU countries

Just 15% of EU consumers buy online

from other EU countries3.5% UK GDP every year

Proceedings for AMTA 2018 Workshop: Translation Quality Estimation and Automatic Post-Editing Boston, March 21, 2018 | Page 44

Page 5: Combining Crowd and AIUnbabel for Customer Service Unbabel adapts to any workflow English-speaking agent Unbabel’s Machine Translation Distributed Human Translation API Proceedings

5

Machine Translation

AffordableFast

Quality not good enough

Professional Translation

ExpensiveSlow

Lack of fast, affordable translation with human qualityAvailable Solutions

Proceedings for AMTA 2018 Workshop: Translation Quality Estimation and Automatic Post-Editing Boston, March 21, 2018 | Page 45

Page 6: Combining Crowd and AIUnbabel for Customer Service Unbabel adapts to any workflow English-speaking agent Unbabel’s Machine Translation Distributed Human Translation API Proceedings

“All translation firms together are able to translate far less than 1% of relevant

content produced everyday” CSA – MT Is Unavoidable to Keep Up with Content Volumes

Proceedings for AMTA 2018 Workshop: Translation Quality Estimation and Automatic Post-Editing Boston, March 21, 2018 | Page 46

Page 7: Combining Crowd and AIUnbabel for Customer Service Unbabel adapts to any workflow English-speaking agent Unbabel’s Machine Translation Distributed Human Translation API Proceedings

Will AI solve translation?

MACHINE ONLYTIME

JOBS

MQM 95

QUALITY

Proceedings for AMTA 2018 Workshop: Translation Quality Estimation and Automatic Post-Editing Boston, March 21, 2018 | Page 47

Page 8: Combining Crowd and AIUnbabel for Customer Service Unbabel adapts to any workflow English-speaking agent Unbabel’s Machine Translation Distributed Human Translation API Proceedings

Will AI solve translation?H

UM

AN

EFF

OR

T

TIME

JOBS

MQM 95

Proceedings for AMTA 2018 Workshop: Translation Quality Estimation and Automatic Post-Editing Boston, March 21, 2018 | Page 48

Page 9: Combining Crowd and AIUnbabel for Customer Service Unbabel adapts to any workflow English-speaking agent Unbabel’s Machine Translation Distributed Human Translation API Proceedings

0%

20%

40%

60%

80%

100%

0 6 12 18 24 30

Good Not sure Bad

MQM

Job

Quality per Job

Proceedings for AMTA 2018 Workshop: Translation Quality Estimation and Automatic Post-Editing Boston, March 21, 2018 | Page 49

Page 10: Combining Crowd and AIUnbabel for Customer Service Unbabel adapts to any workflow English-speaking agent Unbabel’s Machine Translation Distributed Human Translation API Proceedings

Original customer request

Translated customer request

High Q.E.

Unbabel Pipeline

QualityEstimation

Q.E.

MachineTranslation

Low Q.E.

Re-Eval CommunityTranslators

Proceedings for AMTA 2018 Workshop: Translation Quality Estimation and Automatic Post-Editing Boston, March 21, 2018 | Page 50

Page 11: Combining Crowd and AIUnbabel for Customer Service Unbabel adapts to any workflow English-speaking agent Unbabel’s Machine Translation Distributed Human Translation API Proceedings

TranslationMemory

Job ResultMT Router

Customer MT Customer APE

Machine Translation Pipeline

Proceedings for AMTA 2018 Workshop: Translation Quality Estimation and Automatic Post-Editing Boston, March 21, 2018 | Page 51

Page 12: Combining Crowd and AIUnbabel for Customer Service Unbabel adapts to any workflow English-speaking agent Unbabel’s Machine Translation Distributed Human Translation API Proceedings

Customer Support Tickets

30

47,5

65

82,5

100

SMTNMT

Customized

Professional

94,0

80,0

65,0

50,0

MQM

Customer Adaptation

MQM

Proceedings for AMTA 2018 Workshop: Translation Quality Estimation and Automatic Post-Editing Boston, March 21, 2018 | Page 52

Page 13: Combining Crowd and AIUnbabel for Customer Service Unbabel adapts to any workflow English-speaking agent Unbabel’s Machine Translation Distributed Human Translation API Proceedings

Word-Level QE Which words are translated

correctly/incorrectly?

Sentence-Level QE How good is the entire

translation?

Quality Estimation

Proceedings for AMTA 2018 Workshop: Translation Quality Estimation and Automatic Post-Editing Boston, March 21, 2018 | Page 53

Page 14: Combining Crowd and AIUnbabel for Customer Service Unbabel adapts to any workflow English-speaking agent Unbabel’s Machine Translation Distributed Human Translation API Proceedings

Word-level QE example

Hey lá , eu sou pesaroso sobre aquele !OK OK OK OKBA

DBAD

BAD

BAD

BAD

Quality Estimation

Proceedings for AMTA 2018 Workshop: Translation Quality Estimation and Automatic Post-Editing Boston, March 21, 2018 | Page 54

Page 15: Combining Crowd and AIUnbabel for Customer Service Unbabel adapts to any workflow English-speaking agent Unbabel’s Machine Translation Distributed Human Translation API Proceedings

Unbabel Ticket

Bad translation

Good translation

source MT final

QE Training

Proceedings for AMTA 2018 Workshop: Translation Quality Estimation and Automatic Post-Editing Boston, March 21, 2018 | Page 55

Page 16: Combining Crowd and AIUnbabel for Customer Service Unbabel adapts to any workflow English-speaking agent Unbabel’s Machine Translation Distributed Human Translation API Proceedings

Job MachineTranslation

Customer

High Q.E.

Q.E.

QualityEstimation

Low Q.E.

Re-EvalCommunityTranslators

Document-Level QE how good is the entire document?

Human QE Can we evaluate post-edit output?

QE in the Pipeline

Proceedings for AMTA 2018 Workshop: Translation Quality Estimation and Automatic Post-Editing Boston, March 21, 2018 | Page 56

Page 17: Combining Crowd and AIUnbabel for Customer Service Unbabel adapts to any workflow English-speaking agent Unbabel’s Machine Translation Distributed Human Translation API Proceedings

JobMachine

Translation

Customer

Q.E.

QualityEstimation Community

Translators

Customer

Q.E.

QualityEstimation

Data Generation Engine

Proceedings for AMTA 2018 Workshop: Translation Quality Estimation and Automatic Post-Editing Boston, March 21, 2018 | Page 57

Page 18: Combining Crowd and AIUnbabel for Customer Service Unbabel adapts to any workflow English-speaking agent Unbabel’s Machine Translation Distributed Human Translation API Proceedings

Initial text

Submitted text

AfterBefore

NODATA

POINTS

With Data points:Mouse clicksKey pressesTimestamps

Data Generation Engine

Initial text

Submitted text

Proceedings for AMTA 2018 Workshop: Translation Quality Estimation and Automatic Post-Editing Boston, March 21, 2018 | Page 58

Page 19: Combining Crowd and AIUnbabel for Customer Service Unbabel adapts to any workflow English-speaking agent Unbabel’s Machine Translation Distributed Human Translation API Proceedings

Raw data Processed information

At 18:03:30:In nugget 3mouseClick Cursor at 16Selected: 0 At 18:03:31:In nugget 3Pressed Backspace Cursor at 16Selected: 0 At 18:03:31:In nugget 3Pressed Backspace Cursor at 15Selected: 0 At 18:03:31:In nugget 3Pressed Backspace Cursor at 14Selected: 0

At 18:03:35:In nugget 3Pressed Shift Cursor at 25Selected: 0 At 18:03:35:In nugget 3Pressed s Cursor at 25Selected: 0 At 18:03:35:In nugget 3Pressed i Cursor at 26Selected: 0 At 18:03:35:In nugget 3Pressed e Cursor at 27Selected: 0

At 18:03:30:In nugget 3mouseClick Cursor at 16Selected: 0 At 18:03:31:In nugget 3Pressed Backspace Cursor at 16Selected: 0 At 18:03:31:In nugget 3Pressed Backspace Cursor at 15Selected: 0 At 18:03:31:In nugget 3Pressed Backspace Cursor at 14Selected: 0

Initial text“Espero que esto es útil”

• Deleted word “es”• Inserted word “sea”

Submitted text“Espero que esto sea útil”

Keystroke Analysis

Proceedings for AMTA 2018 Workshop: Translation Quality Estimation and Automatic Post-Editing Boston, March 21, 2018 | Page 59

Page 20: Combining Crowd and AIUnbabel for Customer Service Unbabel adapts to any workflow English-speaking agent Unbabel’s Machine Translation Distributed Human Translation API Proceedings

•Editors Pool•Initial Text (MT)•Editor Assignment•Custom Editing Interfaces•Constant Quality Evaluation

Profession translation

Unbabel pillars

Speed Quality

Cost

Proceedings for AMTA 2018 Workshop: Translation Quality Estimation and Automatic Post-Editing Boston, March 21, 2018 | Page 60

Page 21: Combining Crowd and AIUnbabel for Customer Service Unbabel adapts to any workflow English-speaking agent Unbabel’s Machine Translation Distributed Human Translation API Proceedings

50.000 Users

Unbabel Community

Proceedings for AMTA 2018 Workshop: Translation Quality Estimation and Automatic Post-Editing Boston, March 21, 2018 | Page 61

Page 22: Combining Crowd and AIUnbabel for Customer Service Unbabel adapts to any workflow English-speaking agent Unbabel’s Machine Translation Distributed Human Translation API Proceedings

Evaluators

More specialization layers will be created

First tests right after signup

Editors get rated with training tasks

Expert Editor

Testing Phase

Training Content

Paid WorkOnly the best rated editors have access to customer tasks

Editors Pool

Annotators

1

2

3

4

Proceedings for AMTA 2018 Workshop: Translation Quality Estimation and Automatic Post-Editing Boston, March 21, 2018 | Page 62

Page 23: Combining Crowd and AIUnbabel for Customer Service Unbabel adapts to any workflow English-speaking agent Unbabel’s Machine Translation Distributed Human Translation API Proceedings

Evaluation Tool

Document Level Human QE

Proceedings for AMTA 2018 Workshop: Translation Quality Estimation and Automatic Post-Editing Boston, March 21, 2018 | Page 63

Page 24: Combining Crowd and AIUnbabel for Customer Service Unbabel adapts to any workflow English-speaking agent Unbabel’s Machine Translation Distributed Human Translation API Proceedings

Deep Annotations

Proceedings for AMTA 2018 Workshop: Translation Quality Estimation and Automatic Post-Editing Boston, March 21, 2018 | Page 64

Page 25: Combining Crowd and AIUnbabel for Customer Service Unbabel adapts to any workflow English-speaking agent Unbabel’s Machine Translation Distributed Human Translation API Proceedings

Error Analysis

Proceedings for AMTA 2018 Workshop: Translation Quality Estimation and Automatic Post-Editing Boston, March 21, 2018 | Page 65

Page 26: Combining Crowd and AIUnbabel for Customer Service Unbabel adapts to any workflow English-speaking agent Unbabel’s Machine Translation Distributed Human Translation API Proceedings

QE for Annotation

Pre-fill with word level QE

Proceedings for AMTA 2018 Workshop: Translation Quality Estimation and Automatic Post-Editing Boston, March 21, 2018 | Page 66

Page 27: Combining Crowd and AIUnbabel for Customer Service Unbabel adapts to any workflow English-speaking agent Unbabel’s Machine Translation Distributed Human Translation API Proceedings

Editors Profiling

Proceedings for AMTA 2018 Workshop: Translation Quality Estimation and Automatic Post-Editing Boston, March 21, 2018 | Page 67

Page 28: Combining Crowd and AIUnbabel for Customer Service Unbabel adapts to any workflow English-speaking agent Unbabel’s Machine Translation Distributed Human Translation API Proceedings

Pull30 m

6 H

2 D

6 D

20 m

40 m

SLA

1100

1000

1000

1000

1100

1100

Priority Editors Rating

4.2

3.8

4.3

4.8

Native TopicsQueue

G

G

G

G

R

R

Tasks/time

6 m

2 m

10 m

12 m

18 m

45 m

Topics

Editor Assignment

Proceedings for AMTA 2018 Workshop: Translation Quality Estimation and Automatic Post-Editing Boston, March 21, 2018 | Page 68

Page 29: Combining Crowd and AIUnbabel for Customer Service Unbabel adapts to any workflow English-speaking agent Unbabel’s Machine Translation Distributed Human Translation API Proceedings

Regular distribution

3.8old rating

Smart distribution

4.6

Improved rating

Editor Assignment

Proceedings for AMTA 2018 Workshop: Translation Quality Estimation and Automatic Post-Editing Boston, March 21, 2018 | Page 69

Page 30: Combining Crowd and AIUnbabel for Customer Service Unbabel adapts to any workflow English-speaking agent Unbabel’s Machine Translation Distributed Human Translation API Proceedings

Post-Editing Interfaces

Proceedings for AMTA 2018 Workshop: Translation Quality Estimation and Automatic Post-Editing Boston, March 21, 2018 | Page 70

Page 31: Combining Crowd and AIUnbabel for Customer Service Unbabel adapts to any workflow English-speaking agent Unbabel’s Machine Translation Distributed Human Translation API Proceedings

QE on Interfaces

Proceedings for AMTA 2018 Workshop: Translation Quality Estimation and Automatic Post-Editing Boston, March 21, 2018 | Page 71

Page 32: Combining Crowd and AIUnbabel for Customer Service Unbabel adapts to any workflow English-speaking agent Unbabel’s Machine Translation Distributed Human Translation API Proceedings

Post-Editing Interfaces

Proceedings for AMTA 2018 Workshop: Translation Quality Estimation and Automatic Post-Editing Boston, March 21, 2018 | Page 72

Page 33: Combining Crowd and AIUnbabel for Customer Service Unbabel adapts to any workflow English-speaking agent Unbabel’s Machine Translation Distributed Human Translation API Proceedings

Time Spent on Job

WAITING

Translator 1

TIME

DELIVERYMT

Translator 2

WAITING

Proceedings for AMTA 2018 Workshop: Translation Quality Estimation and Automatic Post-Editing Boston, March 21, 2018 | Page 73

Page 34: Combining Crowd and AIUnbabel for Customer Service Unbabel adapts to any workflow English-speaking agent Unbabel’s Machine Translation Distributed Human Translation API Proceedings

Time Spent on Job: Mobile

WAITING

Translator 1

TIME

DELIVERYMT

Translator 2

WAITING -20%

Proceedings for AMTA 2018 Workshop: Translation Quality Estimation and Automatic Post-Editing Boston, March 21, 2018 | Page 74

Page 35: Combining Crowd and AIUnbabel for Customer Service Unbabel adapts to any workflow English-speaking agent Unbabel’s Machine Translation Distributed Human Translation API Proceedings

Spelling

Grammar

Consistency

Terminology

External NLP Services

Spell Check

Syntax Parser

Word AlignerRegister

Smartcheck

Style GuidesCustomer

LearningAnnotations

Proceedings for AMTA 2018 Workshop: Translation Quality Estimation and Automatic Post-Editing Boston, March 21, 2018 | Page 75

Page 36: Combining Crowd and AIUnbabel for Customer Service Unbabel adapts to any workflow English-speaking agent Unbabel’s Machine Translation Distributed Human Translation API Proceedings

Spelling

Grammar

Consistency

Terminology

Register

Smartcheck (QE Version)

Style GuidesCustomer

LearningAnnotations

Q.E.

QualityEstimation

Proceedings for AMTA 2018 Workshop: Translation Quality Estimation and Automatic Post-Editing Boston, March 21, 2018 | Page 76

Page 37: Combining Crowd and AIUnbabel for Customer Service Unbabel adapts to any workflow English-speaking agent Unbabel’s Machine Translation Distributed Human Translation API Proceedings

M A C H I N E + H U M A N

CORE API

CHAT API

CYRANO API

Customer Service Conversational

MESSAGING API

Language OS

Language Engine

Bots

Proceedings for AMTA 2018 Workshop: Translation Quality Estimation and Automatic Post-Editing Boston, March 21, 2018 | Page 77

Page 38: Combining Crowd and AIUnbabel for Customer Service Unbabel adapts to any workflow English-speaking agent Unbabel’s Machine Translation Distributed Human Translation API Proceedings

Unbabel for Customer Service

Unbabel adapts to any workflow

English-speakingagent

Unbabel’s Machine Translation

Distributed Human Translation

API

Proceedings for AMTA 2018 Workshop: Translation Quality Estimation and Automatic Post-Editing Boston, March 21, 2018 | Page 78

Page 39: Combining Crowd and AIUnbabel for Customer Service Unbabel adapts to any workflow English-speaking agent Unbabel’s Machine Translation Distributed Human Translation API Proceedings

94

Customer Replies: Speed & Quality

20 minutes

Proceedings for AMTA 2018 Workshop: Translation Quality Estimation and Automatic Post-Editing Boston, March 21, 2018 | Page 79

Page 40: Combining Crowd and AIUnbabel for Customer Service Unbabel adapts to any workflow English-speaking agent Unbabel’s Machine Translation Distributed Human Translation API Proceedings

Unbabel Chat

Native speaking in multiple languages

API

Proceedings for AMTA 2018 Workshop: Translation Quality Estimation and Automatic Post-Editing Boston, March 21, 2018 | Page 80

Page 41: Combining Crowd and AIUnbabel for Customer Service Unbabel adapts to any workflow English-speaking agent Unbabel’s Machine Translation Distributed Human Translation API Proceedings

Chat

Proceedings for AMTA 2018 Workshop: Translation Quality Estimation and Automatic Post-Editing Boston, March 21, 2018 | Page 81

Page 42: Combining Crowd and AIUnbabel for Customer Service Unbabel adapts to any workflow English-speaking agent Unbabel’s Machine Translation Distributed Human Translation API Proceedings

Editors train the

MT engineCommunity of 100K+ translatorsSkips Humans

Chat Translation Flow

Proceedings for AMTA 2018 Workshop: Translation Quality Estimation and Automatic Post-Editing Boston, March 21, 2018 | Page 82

Page 43: Combining Crowd and AIUnbabel for Customer Service Unbabel adapts to any workflow English-speaking agent Unbabel’s Machine Translation Distributed Human Translation API Proceedings

Chat Messages: Speed & Quality

902 minutes

MT

80%

Proceedings for AMTA 2018 Workshop: Translation Quality Estimation and Automatic Post-Editing Boston, March 21, 2018 | Page 83

Page 44: Combining Crowd and AIUnbabel for Customer Service Unbabel adapts to any workflow English-speaking agent Unbabel’s Machine Translation Distributed Human Translation API Proceedings

Other Use Cases

Reviews

Travel descriptions

Video

SEO

Newsletters

Proceedings for AMTA 2018 Workshop: Translation Quality Estimation and Automatic Post-Editing Boston, March 21, 2018 | Page 84

Page 45: Combining Crowd and AIUnbabel for Customer Service Unbabel adapts to any workflow English-speaking agent Unbabel’s Machine Translation Distributed Human Translation API Proceedings

We’re Hiringhttps://unbabel.com/careers/

Proceedings for AMTA 2018 Workshop: Translation Quality Estimation and Automatic Post-Editing Boston, March 21, 2018 | Page 85