A Comparison of Statistical Post-Editing on Chinese and Japanese

Midori Tatsumi and Yanli Sun

Under the supervision of: Sharon O’Brien; Minako O’Hagan; Fred Hollowood; Johann Roturier




Outline

1. Introduction

2. Analysis of the modifications made by SPE

3. Evaluation on sentence level

4. Conclusion


Introduction

• Rule-Based Machine Translation (RBMT)

– Three Stages:

• Analysis: analyze a source text into abstract lexical and structural representations

• Transfer: convert the source language representations into target language representations

• Generation: generate the target text

• Statistical Machine Translation (SMT)

– Two Stages:

• Training: automatically learn translation and language knowledge from a parallel corpus

• Decoding: translate new sentences using the learned knowledge

• Post-Editing (PE)

– Human post-editing

– Automatic post-editing

– Statistical post-editing (SPE)
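The two SMT stages listed above can be illustrated with a toy word-based model. This is a minimal sketch under stated assumptions, not the Moses system used in the study (Moses is phrase-based and also uses a language model and reordering): `train_ibm1` estimates IBM Model 1 word-translation probabilities with EM on a hypothetical three-sentence English-Chinese corpus, and `decode` simply picks the most probable target word for each source word.

```python
from collections import defaultdict

def train_ibm1(corpus, iterations=10):
    """Training stage: estimate t(target_word | source_word) with IBM Model 1 EM.

    corpus: list of (source_tokens, target_tokens) pairs. Simplified: no NULL word.
    """
    target_vocab = {w for _, tgt in corpus for w in tgt}
    # Start from a uniform translation table over the target vocabulary
    t = defaultdict(lambda: 1.0 / len(target_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected counts c(e, f)
        total = defaultdict(float)  # expected counts per source word f
        for src, tgt in corpus:
            for e in tgt:
                # E-step: share each target word among the source words of the pair
                norm = sum(t[(e, f)] for f in src)
                for f in src:
                    frac = t[(e, f)] / norm
                    count[(e, f)] += frac
                    total[f] += frac
        # M-step: renormalise expected counts into probabilities
        t = defaultdict(float, {(e, f): c / total[f] for (e, f), c in count.items()})
    return t

def decode(src, t):
    """Decoding stage (toy): most probable target word for each source word."""
    output = []
    for f in src:
        candidates = {e: p for (e, f2), p in t.items() if f2 == f}
        # Unknown words are passed through unchanged
        output.append(max(candidates, key=candidates.get) if candidates else f)
    return output

# Hypothetical toy parallel corpus (English -> Chinese)
corpus = [
    (["open", "folder"], ["打开", "文件夹"]),
    (["open", "file"], ["打开", "文件"]),
    (["delete", "file"], ["删除", "文件"]),
]
table = train_ibm1(corpus)
print(decode(["delete", "folder"], table))  # → ['删除', '文件夹'] (unseen combination)
```

The "new sentence" here never appears in the corpus; the model translates it only because each word's translation was learned from other sentence pairs.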


Introduction

• Statistical Post-editing (SPE) of Rule-Based Machine Translation (RBMT) Output

• Knight & Chander (1994)

• Simard et al. (2007a, 2007b)

[Figure: Flowchart of RBMT. Source → RBMT → Output 1 → Human Post-editor → Final output]

[Figure: Flowchart of SPE. Source → RBMT → Output 1 → SPE module (SMT trained on RBMT output and Reference) → Output 2 → Human Post-editor → Final output]
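The SPE flowchart can be read as a small data pipeline. The sketch below is hypothetical: `toy_rbmt` and `toy_spe` stand in for the RBMT system and the trained SMT model (here just dictionary lookups based on the JA "number" example in the qualitative analysis), and only the data flow is shown. Following the flowchart, the SMT training data pairs each RBMT output (Output 1) with the reference translation of the same source sentence, so the SMT system learns to "translate" RBMT output into reference-like text.

```python
from typing import Callable, Iterable, List, Tuple

Pair = Tuple[str, str]

def build_spe_corpus(tm: Iterable[Pair], rbmt: Callable[[str], str]) -> List[Pair]:
    """Turn (source, reference) TM pairs into (RBMT output, reference) training pairs."""
    return [(rbmt(source), reference) for source, reference in tm]

def write_parallel(pairs: List[Pair], mt_path: str, ref_path: str) -> None:
    """Write line-aligned files (one segment per line), as SMT toolkits such as Moses expect."""
    with open(mt_path, "w", encoding="utf-8") as mt_file, \
            open(ref_path, "w", encoding="utf-8") as ref_file:
        for mt, ref in pairs:
            # A newline inside a segment would break the line alignment
            mt_file.write(mt.replace("\n", " ") + "\n")
            ref_file.write(ref.replace("\n", " ") + "\n")

def translate_with_spe(source: str, rbmt: Callable[[str], str],
                       spe: Callable[[str], str]) -> Tuple[str, str]:
    """Run RBMT, then the SPE module on its output; return (Output 1, Output 2)."""
    output1 = rbmt(source)
    output2 = spe(output1)
    return output1, output2

# Hypothetical stand-ins for the real systems
def toy_rbmt(text: str) -> str:
    return {"number": "番号"}.get(text, text)

def toy_spe(text: str) -> str:
    return {"番号": "数"}.get(text, text)

tm = [("number", "数")]
print(build_spe_corpus(tm, toy_rbmt))                    # [('番号', '数')]
print(translate_with_spe("number", toy_rbmt, toy_spe))   # ('番号', '数')
```

Keeping RBMT and SPE as separate callables mirrors the flowchart: Output 1 remains available, which is what the evaluation compares against Output 2.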


Introduction

– Experimental setting

• Source language: English; target languages: Chinese (ZH) and Japanese (JA)

• RBMT: Systran with user dictionaries (UD) of 8,832 entries (ZH) and 6,363 entries (JA)

• SPE module: Moses (SMT), trained on RBMT output paired with reference translations from a Translation Memory (TM); TM size: 529,822 (ZH) and 143,742 (JA)


Introduction

– Evaluate SPE: compare Output 2 (SPE output) with Output 1 (RBMT output)


Analysis of the Modifications Made by SPE: Methodology

• Pilot project

– Random selection of 100 sentences for each language

• Classify and evaluate the changes

– Classification (Vilar et al. 2006):

• Alteration, deletion, or addition of content/function words

• Form: tense/voice, imperative, formality (politeness)

• Fixed expression

• Reordering

• Punctuation

– Evaluation (Dugast et al. 2007):

• Improvement

• Degradation

• Equivalent
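The classification above was done by human evaluators. As a hypothetical aid only, a first pass over each MT/SPE output pair can label token-level edits as alteration, deletion, or addition with Python's `difflib`. The sketch assumes both outputs are already tokenized; separating content from function words, forms, fixed expressions, and reordering, and judging Improvement/Degradation/Equivalent, would still be manual. The names `classify_edits` and `OPCODE_TO_EDIT` are illustrative.

```python
from difflib import SequenceMatcher
from typing import List, Tuple

# Hypothetical mapping from difflib opcodes to the edit types of the classification
OPCODE_TO_EDIT = {"replace": "Alteration", "delete": "Deletion", "insert": "Addition"}

Edit = Tuple[str, List[str], List[str]]

def classify_edits(mt_tokens: List[str], spe_tokens: List[str]) -> List[Edit]:
    """Return (edit type, MT tokens, SPE tokens) for every span that SPE changed."""
    matcher = SequenceMatcher(a=mt_tokens, b=spe_tokens, autojunk=False)
    return [
        (OPCODE_TO_EDIT[tag], mt_tokens[i1:i2], spe_tokens[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]

# "On the Spim tab…" (ZH): SPE adds 上 after 选项卡
print(classify_edits(["在", "Spim", "选项卡"], ["在", "Spim", "选项卡", "上"]))
# → [('Addition', [], ['上'])]
# "Reverts to …" (ZH): SPE alters the function word 对 to 到
print(classify_edits(["恢复", "对"], ["恢复", "到"]))
# → [('Alteration', ['对'], ['到'])]
```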


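Analysis of the Modifications Made by SPE: Quantitative Evaluation

• Distribution of modifications in Japanese and Chinese

| Modification type | Improvement ZH | Improvement JA | Degradation ZH | Degradation JA | Equivalent ZH | Equivalent JA |
|---|---|---|---|---|---|---|
| Alteration: content words | 137 | 45 | 19 | 40 | 28 | 25 |
| Alteration: function words | 38 | 45 | 6 | 9 | 17 | 30 |
| Deletion: content words | 0 | 9 | 0 | 2 | 0 | 1 |
| Deletion: function words | 51 | 57 | 4 | 5 | 12 | 16 |
| Addition: content words | 4 | 0 | 3 | 2 | 2 | 0 |
| Addition: function words | 12 | 1 | 8 | 2 | 15 | 1 |
| Forms: tense or voice | 6 | 3 | 0 | 0 | 3 | 5 |
| Forms: formality | 0 | 1 | 1 | 0 | 0 | 0 |
| Forms: imperative | 0 | 8 | 0 | 0 | 0 | 2 |
| Fixed expression | 8 | 0 | 0 | 0 | 0 | 1 |
| Word / phrase reordering | 9 | 1 | 3 | 3 | 0 | 1 |
| Punctuation | 31 | 47 | 4 | 9 | 0 | 4 |
| Total | 296 | 217 | 48 | 72 | 77 | 85 |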
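The "threefold / sixfold" conclusion follows from the Total row. The short check below copies the counts from the table, recomputes each column total, and derives the improvement-to-degradation ratio per language. Re-adding the rows reproduces every reported total except Japanese "Equivalent", where the rows sum to 86 against the reported 85; the sketch flags such mismatches rather than guessing which figure is wrong.

```python
# Counts per modification type, copied from the table:
# (Improvement ZH, Improvement JA, Degradation ZH, Degradation JA, Equivalent ZH, Equivalent JA)
ROWS = {
    "Alteration: content words": (137, 45, 19, 40, 28, 25),
    "Alteration: function words": (38, 45, 6, 9, 17, 30),
    "Deletion: content words": (0, 9, 0, 2, 0, 1),
    "Deletion: function words": (51, 57, 4, 5, 12, 16),
    "Addition: content words": (4, 0, 3, 2, 2, 0),
    "Addition: function words": (12, 1, 8, 2, 15, 1),
    "Forms: tense or voice": (6, 3, 0, 0, 3, 5),
    "Forms: formality": (0, 1, 1, 0, 0, 0),
    "Forms: imperative": (0, 8, 0, 0, 0, 2),
    "Fixed expression": (8, 0, 0, 0, 0, 1),
    "Word / phrase reordering": (9, 1, 3, 3, 0, 1),
    "Punctuation": (31, 47, 4, 9, 0, 4),
}
REPORTED_TOTALS = (296, 217, 48, 72, 77, 85)
COLUMNS = ("Improvement ZH", "Improvement JA", "Degradation ZH",
           "Degradation JA", "Equivalent ZH", "Equivalent JA")

# Column sums recomputed from the rows
totals = tuple(sum(column) for column in zip(*ROWS.values()))

for name, computed, reported in zip(COLUMNS, totals, REPORTED_TOTALS):
    note = "" if computed == reported else f"  (reported: {reported})"
    print(f"{name}: {computed}{note}")

improvement = dict(zip(("ZH", "JA"), totals[0:2]))
degradation = dict(zip(("ZH", "JA"), totals[2:4]))
for lang in ("ZH", "JA"):
    ratio = improvement[lang] / degradation[lang]
    print(f"{lang}: {improvement[lang]} / {degradation[lang]} = {ratio:.2f}x")
```

The ratios come out at about 6.2 for Chinese and 3.0 for Japanese.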


Analysis of the Modifications Made by SPE: Qualitative Evaluation

• Similarities

Deletion of function words

| Language | Source | MT output | SPE output |
|---|---|---|---|
| JA | the actions that you specify for that rule | あなたがその規則のために指定する処理 | そのルールに指定する処理 |
| ZH | After you configure your … | 在您配置您的… | 配置… |

Alteration of function words

| Language | Source | MT output | SPE output |
|---|---|---|---|
| JA | To maintain … | 保守するため… | 維持するには… |
| ZH | Reverts to … | 恢复对… | 恢复到… |

Punctuation

| Language | Source | MT output | SPE output |
|---|---|---|---|
| JA | MPE provides an option … | オプションを提供します。 | オプションがあります. |
| ZH | while the synchronization is in progress… | …, 当同步进展中时… | …同步处理. |


Analysis of the Modifications Made by SPE: Qualitative Evaluation

• Differences

Alteration of content words

| Language | Source | MT output | SPE output |
|---|---|---|---|
| JA | console commands | コンソールは命じます | console コマンド |
| JA | number | 番号 | 数 |
| ZH | subdomains | subdomains | 子域 |

Addition of function words

| Language | Source | MT output | SPE output |
|---|---|---|---|
| ZH | A black dash indicates that it is disabled. | 黑色破折号表明它禁用。 | 黑色线表明它已禁用。 |
| ZH | On the Spim tab… | 在 Spim 选项卡… | 在 Spim 选项卡上… |


Analysis of the Modifications Made by SPE: Qualitative Evaluation

• Differences

Reordering

| Language | Source | MT output | SPE output |
|---|---|---|---|
| ZH | These threats are then… | 这些威胁然后… | 然后, 这些威胁… |

Imperative forms

| Language | Source | MT output | SPE output |
|---|---|---|---|
| JA | (imperative ending) | して下さい | します |

Fixed expression

| Language | Source | MT output | SPE output |
|---|---|---|---|
| ZH | In general, … | 一般情况下, … | 通常情况下, … |


Evaluation on Sentence Level

• Methodology

– Same 100 segments

– Effect of SPE on fluency, adequacy, and PE time

– Four evaluators per language

– Random distribution of MT output and SPE output

• Kappa scores (inter-evaluator agreement level)

| Criterion | Chinese | Japanese |
|---|---|---|
| Fluency | 0.276 | 0.598 |
| Adequacy | 0.288 | 0.582 |
| Less PE time | 0.284 | 0.624 |

– Japanese: moderate to substantial agreement

– Chinese: generally fair agreement

Sample evaluation item (JA); for each criterion the evaluator chooses 1 (Output 1), 2 (Output 2), or E (equal):

| Source_EN | Output 1 | Output 2 | Fluency | Adequacy | Less PE time |
|---|---|---|---|---|---|
| Turns on or off the special meaning of metacharacters. | オン / オフ回転メタ文字の特別な意味。 | 有効または無効にメタ文字の特別な意味します. | 1 / 2 / E | 1 / 2 / E | 1 / 2 / E |
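The slide does not state which kappa statistic was used. With four evaluators per language and three possible answers per item (1, 2, or E), Fleiss' kappa is a natural fit, so the sketch below assumes it. It also assumes the verbal bands of Landis and Koch (1977), whose labels ("fair", "moderate", "substantial") match the wording on the slide. The rating matrices in the usage lines are hypothetical.

```python
from typing import Sequence

def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for a fixed number of raters per item.

    counts[i][j] is the number of raters who assigned item i to category j.
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each item needs the same number (>= 2) of ratings")
    # Observed agreement: share of agreeing rater pairs per item, averaged over items
    p_bar = sum((sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
                for row in counts) / n_items
    # Chance agreement from the overall proportion of ratings in each category
    n_categories = len(counts[0])
    proportions = [sum(row[j] for row in counts) / (n_items * n_raters)
                   for j in range(n_categories)]
    p_e = sum(p * p for p in proportions)
    if p_e == 1:
        raise ValueError("kappa is undefined when every rating falls in one category")
    return (p_bar - p_e) / (1 - p_e)

def landis_koch(kappa: float) -> str:
    """Verbal label for a kappa value on the Landis and Koch (1977) scale."""
    if kappa < 0:
        return "poor"
    for upper, label in ((0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")):
        if kappa <= upper:
            return label
    return "almost perfect"

# Hypothetical ratings; columns are (Output 1 better, Output 2 better, Equal), 4 raters per item
print(fleiss_kappa([[4, 0, 0], [0, 4, 0], [0, 0, 4]]))  # 1.0 (full agreement)
print(fleiss_kappa([[2, 2, 0], [2, 2, 0]]))             # negative: below chance
# Labels for the reported ZH and JA scores
print([landis_koch(k) for k in (0.276, 0.288, 0.284, 0.598, 0.582, 0.624)])
# → ['fair', 'fair', 'fair', 'moderate', 'moderate', 'substantial']
```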


Evaluation on Sentence Level: Results and Analysis

• Improvement by SPE (share of judgments preferring the SPE output):

– Chinese: ≈ 40% for fluency and adequacy, ≈ 50% for PE time

– Japanese: ≈ 60% for fluency, adequacy, and PE time

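Percentage of judgments preferring each output:

| Preferred output | ZH Fluency | ZH Adequacy | ZH Less PE time | JA Fluency | JA Adequacy | JA Less PE time |
|---|---|---|---|---|---|---|
| MT | 12.75 | 15.50 | 15.00 | 14.50 | 8.00 | 9.75 |
| SPE | 37.75 | 38.00 | 48.25 | 59.25 | 61.50 | 62.50 |
| Equal | 49.50 | 46.50 | 36.75 | 26.25 | 30.50 | 27.75 |
| Total | 100 | 100 | 100 | 100 | 100 | 100 |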
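If each of the four evaluators judged all 100 segments (an assumption; the slides give only the two numbers), every criterion has 400 judgments per language, so each percentage should be a multiple of 0.25 and each column should total 100. The sketch below takes the MT and SPE shares from the table, derives the "Equal" share as the remainder, and converts all three back to judgment counts; a non-whole count would point to a transcription error.

```python
SEGMENTS = 100
EVALUATORS = 4  # per language
JUDGMENTS = SEGMENTS * EVALUATORS  # assumes every evaluator rated every segment

# (language, criterion) -> (% preferring MT output, % preferring SPE output), from the table
PREFERENCES = {
    ("ZH", "Fluency"): (12.75, 37.75),
    ("ZH", "Adequacy"): (15.50, 38.00),
    ("ZH", "Less PE time"): (15.00, 48.25),
    ("JA", "Fluency"): (14.50, 59.25),
    ("JA", "Adequacy"): (8.00, 61.50),
    ("JA", "Less PE time"): (9.75, 62.50),
}

def breakdown(mt_pct: float, spe_pct: float, judgments: int = JUDGMENTS):
    """Return the derived 'Equal' share and the (MT, SPE, Equal) judgment counts."""
    equal_pct = 100.0 - mt_pct - spe_pct
    counts = tuple(pct * judgments / 100 for pct in (mt_pct, spe_pct, equal_pct))
    return equal_pct, counts

for (lang, criterion), (mt_pct, spe_pct) in PREFERENCES.items():
    equal_pct, counts = breakdown(mt_pct, spe_pct)
    whole = all(c.is_integer() for c in counts)
    print(f"{lang} {criterion}: Equal = {equal_pct:.2f}%, counts = {counts}, whole counts: {whole}")
```

For Japanese fluency, for example, 100 − 14.50 − 59.25 gives an Equal share of 26.25%, i.e. 58, 237, and 105 judgments out of 400.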


Conclusions

• SPE generates more improvements than degradations

– About three times as many for Japanese; about six times as many for Chinese

• Linguistic changes vary between ZH and JA

• SPE changes are generally limited to the word level

• SPE improves fluency and adequacy and shortens PE time