Optimizing Statistical Machine Translation for Text Simpliﬁcation · Optimizing Statistical...

Optimizing Statistical Machine Translation for Text Simplification

TACL paper @ ACL Aug-9-2016

Wei XuCourtney Napoles

Ellie Pavlich Quanze Chen

Chris Callison-Burch

UPenn ➞ Ohio State University JHU UPenn UPenn ➞ University of Washington UPenn

Text Simplification (A Monolingual Machine Translation Problem)

★ for children, disabled, non-native speakers … ★ for other NLP tasks (MT, summarization …)

Our Paper Last Year (TACL ’15)

Problems in Current Text Simplification Research (TACL ’15)

Data Evaluation System

not muchparaphrasing

no reliableautomatic metric

Simple Wikipediawas not that simple

The Newsela Corpus!very high qualityfreely available

(often use Moses as a black-box)

Wei Xu, Chris Callison-Burch, Courtney Napoles. “Problems in Current Text Simplification Research: New Data Can Help” TACL (2015)

This Paper and This Talk (TACL ’16)

Wei Xu, Courtney Napoles, Ellie Pavlick, Quanze Chen, Chris Callison-Burch. “Optimizing Statistical Machine Translation for Simplification” in TACL (2016)

Crowdsourcing3.5k sentences

8 referencesWei Xu, Courtney Napoles, Ellie Pavlick, Quanze Chen, Chris Callison-Burch. “Optimizing Statistical Machine Translation for Simplification” in

TACL (2016)

8 references

SARI the 1st metric

that worksWei Xu, Courtney Napoles, Ellie Pavlick, Quanze Chen, Chris Callison-Burch. “Optimizing Statistical Machine Translation for Simplification” in

TACL (2016)

8 references

SARI the 1st metric

that works

state-of-the-artparaphrasingfor simplicity

DataCrowdsourcing3.5k sentences

8 references

★ paraphrasing! (little deletion; no splitting) ★ simple quality control (see our NAACL 2015 paper) ★ freely available

8 references

Wiki Jeddah is the principal gateway to Mecca, Islam’s holiest city, which able-bodied Muslims are required to visit at least once in their lifetime.

Turk #1 Jeddah is the main entrance to Mecca, the holiest city in Islam, which all healthy Muslims need to visit at least once in their life.

Our System

Jeddah is the main gateway to Mecca, Islam’s holiest city, which sound Muslims have to visit at least once in their life.

8 references

Our System

8 references

Our System

EvaluationSARI

the 1st metric that works

EvaluationSARI

SystemoutputHuman

references

keepO ∩ R ∩ I

addO ∩ R ∩ not I

delI ∩ not O ∩ not R

EvaluationSARI

SystemoutputHuman

references

keepO ∩ R ∩ I

★ light-weighted and tunable ★ SARI — “the BLEU for simplification” ★ take input into account!★ use F-measure

EvaluationSARI

SystemoutputHuman

references

keepO ∩ R ∩ I

SARI = d1Fadd + d2Fkeep + d3Pdel

d1 = d2 = d3 = 1/3

★ light-weighted and tunable ★ SARI — “the BLEU for simplification” ★ take input into account!★ use F-measure

Systemstate-of-the-artparaphrasingfor simplicity

★ In-depth adaptation of statistical MT (Joshua) • tuning data (small parallel corpus) • tunable metric (PRO pairwise ranking optimization) • PPDB (no large parallel corpus needed!) • simplification-specific features

Lexical applauded ⟶ praised

Phrasal generally acknowledged ⟶ widely accepted

Syntactic NNP’s JJ legislation ⟶ the JJ law of NNPWei Xu, Courtney Napoles, Ellie Pavlick, Quanze Chen, Chris Callison-Burch. “Optimizing Statistical Machine Translation for Simplification” in

TACL (2016)

Input:

Syntactic NNP’s JJ legislation ⟶ the JJ law of NNP

PPDB (4+ million rules)

Input:

Candidate Simplifications:

Syntactic NNP’s JJ legislation ⟶ the JJ law of NNP

SARI = 0.21

SARI = 0.33

PPDB (4+ million rules)

SARI = 0.15Wei Xu, Courtney Napoles, Ellie Pavlick, Quanze Chen, Chris Callison-Burch. “Optimizing Statistical Machine Translation for Simplification” in

TACL (2016)

Human Evaluation

Simple Wikipedia

Turk (Wubben et al. 2012) Joshua -BLEU-

Joshua -SARI-

Grammar Meaning Simplicity+

Automatic Simplification

tuning towards BLEU leaves input unchanged(8 references)

Human Evaluation

Simple Wikipedia

Turk Moses + reranking(Wubben et al. 2012)

Joshua -BLEU-

Joshua -SARI-

Human Evaluation

Simple Wikipedia

Turk Moses + reranking(Wubben et al. 2012)

Joshua -BLEU-

Joshua -SARI-

tuning towards BLEU leaves input unchangedWhy? Why? Why?Wei Xu, Courtney Napoles, Ellie Pavlick, Quanze Chen, Chris Callison-Burch. “Optimizing Statistical Machine Translation for Simplification” in

TACL (2016)

Automatic Metrics (Spearman’s rho correlation w/ human ratings)

Grammar Meaning Simplicity

BLEU 0.59 0.70 0.15

FKBLEU 0.35 0.41 0.24

SARI 0.34 0.40 0.34

Automatic Metrics (Spearman’s rho correlation w/ human ratings)

Grammar Meaning Simplicity

BLEU 0.59 0.70 0.15

FKBLEU 0.35 0.41 0.24

SARI 0.34 0.40 0.34

this is not necessarily a bad thingWhy? Why? Why?

0 0.25 0.5 0.75 1

BLEU correlates w/ Grammar

If a system leaves input unchanged, it gets

perfect Grammar perfect Meaning

and very high BLEU score.because there are multiple references

(unlike bilingual translation)

high Grammar high BLEU

0 0.25 0.5 0.75 1

BLEU is not for SimplificationG

0 0.25 0.5 0.75 1

If a system leaves input unchanged, it gets

perfect Grammar perfect Meaning

and very high BLEU score.But it is not good simplification!

low Simplicity high BLEU

high Grammar high BLEU

0 0.25 0.5 0.75 1

SARI is way better than BLEU

arSARI

0.12 0.24 0.36 0.48 0.6

high Grammarcould be bad simplification

low SARI high Grammar

low BLEU high Grammar

0 0.25 0.5 0.75 1

SARI is way better than BLEU

itySARI

0.12 0.24 0.36 0.48 0.6

low Simplicitymust be bad simplification

low Simplicityhigh BLEU

low Simplicityhigh SARI

Machine Translation

Bilingual(Paraphrase =)Monolingual

naturally availableparallel text a lot much less

sensitive to error less more

objective straightforward sophisticated

standard metric BLEU etc. SARI and ?

Thank you

Sponsor: National Science FoundationData and code available at https://cocoxu.github.io/

Questions?

I am looking for PhD students and post-docs. Email me! Wei Xu @ Ohio State University

Newsela Dataset (TACL ’15)

Wei Xu, Chris Callison-Burch, Courtney Napoles. “Problems in Current Text Simplification Research: New Data Can Help” TACL (2015)

every article at 5 levels of simplification total 50k+ sentences, written by trained editors, comes with comprehension quizzes

Optimizing Statistical Machine Translation for Text Simpliﬁcation · Optimizing Statistical...

Documents

Optimizing Statistical Machine Translation for Text ...cs.jhu.edu/~napoles/res/tacl2016-optimizing.pdf · Optimizing Statistical Machine Translation for Text Simplication Wei Xu 1,

Optimizing your DITA content model for translation

Crowdsourcing Text Simpliﬁcation with Sentence Fusion€¦ · Crowdsourcing Text Simpliﬁcation with Sentence Fusion Max Schwarzer April 29, 2018 Submitted as part of the senior

Translation Validation for Optimizing Compilers · 2012. 11. 28. · Translation validation was ﬁrst introduced by Pnueli, Siegel, and Singer-man in [PSS98b]. They use the result

What can Statistical Machine Translation teach Neural Text ... · evaluating translation utility via semantic frames. Lo and Wu, 2011. • New! Optimizing towards neural semantic

Tutorial A Developer’s Survey of Polygonal Simpliﬁcation

Optimizing Statistical Machine Translation for Text …Optimizing Statistical Machine Translation for Text Simplication Wei Xu 1, Courtney Napoles 2, Ellie Pavlick 1, Quanze Chen 1

Fuzzy Mining – Adaptive Process Simpliﬁcation Based …wvdaalst/publications/p400.pdf · Fuzzy Mining – Adaptive Process Simpliﬁcation Based on Multi-Perspective Metrics Christian

Generalized View-Dependent Simpliﬁcation

Rational Expression Simpliﬁcation with Algebraic Side

Transformation of Formal Models...2012/02/16 · Examples normal forms +simplification of proofs +more efficient construction of parsers elimination of erasing rules +simplification

A comparison of mesh simpliﬁcation algorithmswebdocs.cs.ualberta.ca/.../Polygon_Simplification/7.pdf · 2002. 7. 9. · A comparison of mesh simpliﬁcation algorithms P. Cignoni,

autobooker.com & sunnycar - ITB Kongress · MACHINE TRANSLATION . MACHINE TRANSLATION . CULTURAL DIFFERENCES . OPTIMIZING ROI . CONTACT US – 2 PRESENTERS Niklas Schlappkohl Senior

Semantics, Analysis and Simpliﬁcation of DMN Decision Tablesdumas/pubs/DMNSimplify.pdf · Semantics, Analysis and Simpliﬁcation of DMN Decision Tables Diego Calvanesea, Marlon

An Optimizing Java Translation Framework for Automated

Problems in Current Text Simpliﬁcation Researchxwe/publications/EMNLP_2015_TACL_simplificat… · Problems in Current Text Simpliﬁcation Research TACL paper @ EMNLP Sep-20-2015

Complexity and simpliﬁcation in understanding recruitment in …science.whoi.edu/labs/pinedalab/PDFdocs/... · 2019. 7. 2. · Complexity and simpliﬁcation in understanding recruitment

Volume and Complexity Bounded Simpliﬁcation of Solid Model

Feature Extraction and Simpliﬁcation from colour images ...Feature Extraction and Simpliﬁcation from colour images ... prototype application for completely automated or semi-automated

Topology Simpliﬁcation for Polygonal Virtual Environments · Topology Simpliﬁcation for Polygonal Virtual Environments Jihad El-Sana Amitabh Varshney Department of Computer Science