41
“The More, the Better” in Human and Machine Translation what do we have in common? Ekaterina Lapshinova-Koltunski Germersheim January 29th, 2015 January 29th, 2015 “The More, the Better” 1

“The More, the Better” in Human and Machine Translation › files › 2014 › 08 › ... · 2015-03-02 · translation as a reference studies operating withlinguistically-motivatedcategories:

  • Upload
    others

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

Page 1: “The More, the Better” in Human and Machine Translation › files › 2014 › 08 › ... · 2015-03-02 · translation as a reference studies operating withlinguistically-motivatedcategories:

“The More, the Better”in Human and Machine Translation

what do we have in common?

Ekaterina Lapshinova-Koltunski

GermersheimJanuary 29th, 2015

January 29th, 2015 “The More, the Better” 1

Page 2: “The More, the Better” in Human and Machine Translation › files › 2014 › 08 › ... · 2015-03-02 · translation as a reference studies operating withlinguistically-motivatedcategories:

Overview

1 Aims and Motivation

2 Related Work

3 Methods and DataDataFeature Selection

4 Analysis and Interpretations

January 29th, 2015 “The More, the Better” 2

Page 3: “The More, the Better” in Human and Machine Translation › files › 2014 › 08 › ... · 2015-03-02 · translation as a reference studies operating withlinguistically-motivatedcategories:

Aims and Motivation

Aims and Motivation

January 29th, 2015 “The More, the Better” 3

Page 4: “The More, the Better” in Human and Machine Translation › files › 2014 › 08 › ... · 2015-03-02 · translation as a reference studies operating withlinguistically-motivatedcategories:

Aims and Motivation

Translation Phenomenon

January 29th, 2015 “The More, the Better” 4

Page 5: “The More, the Better” in Human and Machine Translation › files › 2014 › 08 › ... · 2015-03-02 · translation as a reference studies operating withlinguistically-motivatedcategories:

Aims and Motivation

Translation Phenomenon

January 29th, 2015 “The More, the Better” 5

Page 6: “The More, the Better” in Human and Machine Translation › files › 2014 › 08 › ... · 2015-03-02 · translation as a reference studies operating withlinguistically-motivatedcategories:

Aims and Motivation

Translation Phenomenon: Dimensions

January 29th, 2015 “The More, the Better” 6

Page 7: “The More, the Better” in Human and Machine Translation › files › 2014 › 08 › ... · 2015-03-02 · translation as a reference studies operating withlinguistically-motivatedcategories:

Aims and Motivation

Translation Phenomenon: Dimensions

⇒ “translation varieties” distinguishednot only according to

1 languages (both source and target)2 registers (text types they belong to)

but also:1 translation methods (human vs. machine)

used in translation process,2 experience (knowledge and data) involved

Assumption: experience relate HT and MT andprovide us with new perspectives on their research.

January 29th, 2015 “The More, the Better” 7

Page 8: “The More, the Better” in Human and Machine Translation › files › 2014 › 08 › ... · 2015-03-02 · translation as a reference studies operating withlinguistically-motivatedcategories:

Aims and Motivation

Assumption: Similarities

January 29th, 2015 “The More, the Better” 8

Page 9: “The More, the Better” in Human and Machine Translation › files › 2014 › 08 › ... · 2015-03-02 · translation as a reference studies operating withlinguistically-motivatedcategories:

Aims and Motivation

Assumption: Similarities

HUMAN TRANSLATION MACHINE TRANSLATION

amount of human translatorexperience

amount of data used in SMT

similar outcomes in both translation varieties⇓ ⇓ ⇓ ⇓ ⇓ ⇓ ⇓ ⇓

professional vs. novice trans-lators

systems based on small vs.large training data

learn in a similar way, collecting experience, data or knowledgeHowever: they both differ in the type of the experience acqui-red, which is reflected in their linguistic features

January 29th, 2015 “The More, the Better” 9

Page 10: “The More, the Better” in Human and Machine Translation › files › 2014 › 08 › ... · 2015-03-02 · translation as a reference studies operating withlinguistically-motivatedcategories:

Aims and Motivation

Assumption: Similarities

January 29th, 2015 “The More, the Better” 10

Page 11: “The More, the Better” in Human and Machine Translation › files › 2014 › 08 › ... · 2015-03-02 · translation as a reference studies operating withlinguistically-motivatedcategories:

Related Work

Related Work

January 29th, 2015 “The More, the Better” 11

Page 12: “The More, the Better” in Human and Machine Translation › files › 2014 › 08 › ... · 2015-03-02 · translation as a reference studies operating withlinguistically-motivatedcategories:

Related Work

Human and Machine Translation

studies addressing both human and machine translations:[White, 1994], [Papineni et al., 2002], [Babych et al., 2004],[Popovic and Burchardt, 2011], [Popovic and Ney, 2011]all focus solely on translation error analysis, using humantranslation as a referencestudies operating with linguistically-motivated categories:[Popovic and Burchardt, 2011], [Popovic and Ney, 2011] or[Fishel et al., 2012]However: none of them provides a comprehensive analysis ofspecific linguistically motivated features of different translationmethods

January 29th, 2015 “The More, the Better” 12

Page 13: “The More, the Better” in Human and Machine Translation › files › 2014 › 08 › ... · 2015-03-02 · translation as a reference studies operating withlinguistically-motivatedcategories:

Related Work

Human and Machine Translation

works on differentiation between human and machine translation:(1) [Volansky et al., 2011] and (2) [El-Haj et al., 2014]:

(1) analysis of human and machine translations, and comparablenon-translated textsa range of features based on the theory of translationese, see[Gellerstam, 1986]claim that the features specific for human translations can be usedto identify MTcoinciding and diversifying features

(2) compare translation style and consistency in human and machinetranslations of Camus’ novel “The Stranger” (French-English andFrench-Arabic)measure: readability as a proxy for styleevaluative and not descriptive character

January 29th, 2015 “The More, the Better” 13

Page 14: “The More, the Better” in Human and Machine Translation › files › 2014 › 08 › ... · 2015-03-02 · translation as a reference studies operating withlinguistically-motivatedcategories:

Related Work

Experience: SMT

the relation between large vs. small data plays an important roleGains from using large corpora for translation/language models:

[Estrella et al., 2007][Bertoldi and Federico, 2009][Koehn and Haddow, 2012]

January 29th, 2015 “The More, the Better” 14

Page 15: “The More, the Better” in Human and Machine Translation › files › 2014 › 08 › ... · 2015-03-02 · translation as a reference studies operating withlinguistically-motivatedcategories:

Related Work

Experience: HT

relation between experienced vs. inexperienced translators:[Jääskeläinen, 1997], [Jakobsen, 2002],[Martinez Melis and Hurtado Albir, 2001],[Englund-Dimitrova, 2005], [Muñoz Martín and Conde, 2007],[Carl and Buch-Kromann, 2010]

[Jääskeläinen, 1997]: translational behaviour of professionals andnon-professionals who perform translation from English into Finnish[Carl and Buch-Kromann, 2010]: translation processes for studentand professional translators, relating properties of the translators’eye movements and keystrokes to the quality of the producedtranslations⇒ process rather than the product(no description of linguistic features involved)[Englund-Dimitrova, 2005]: a combined process and productanalysis⇒ only cohesive explicitness

January 29th, 2015 “The More, the Better” 15

Page 16: “The More, the Better” in Human and Machine Translation › files › 2014 › 08 › ... · 2015-03-02 · translation as a reference studies operating withlinguistically-motivatedcategories:

Related Work

Experience Combined

some parallelism in the observations for both translation varieties[Göpferich & Jääskeläinen, 2009]:

- with increasing translation competence, translators focus on largertranslation units[Koehn, 2010]:

- in (phrase-based) MT, large training data sets make it possible tolearn longer phrases

experienced translators take into account more aspects, e.g. textfunction, context, registersystems trained with larger data set can also capture morecontextual information

January 29th, 2015 “The More, the Better” 16

Page 17: “The More, the Better” in Human and Machine Translation › files › 2014 › 08 › ... · 2015-03-02 · translation as a reference studies operating withlinguistically-motivatedcategories:

Methods and Data

Methods and Data

January 29th, 2015 “The More, the Better” 17

Page 18: “The More, the Better” in Human and Machine Translation › files › 2014 › 08 › ... · 2015-03-02 · translation as a reference studies operating withlinguistically-motivatedcategories:

Methods and Data

Questions and Methodology

compare relations between novice translations and statisticalmachine translations with small databetween professional translations and statistical machinetranslations produced with much datatrace the patterns specific for both human and machinetranslation:⇒ define similarities and differences between the varieties

January 29th, 2015 “The More, the Better” 18

Page 19: “The More, the Better” in Human and Machine Translation › files › 2014 › 08 › ... · 2015-03-02 · translation as a reference studies operating withlinguistically-motivatedcategories:

Methods and Data

Corpus-Based Approach

REQUIREMENT CHOICE

lexico-grammatical corpus-based approachfeaturescorpus contain human and machine transla-

tionsetting 1 two variants of HTsetting 2 two SMT systemsfeatures frequencies of lexico-grammatical

patternsfeature distributions across translation varieties

January 29th, 2015 “The More, the Better” 19

Page 20: “The More, the Better” in Human and Machine Translation › files › 2014 › 08 › ... · 2015-03-02 · translation as a reference studies operating withlinguistically-motivatedcategories:

Methods and Data

Data

January 29th, 2015 “The More, the Better” 20

Page 21: “The More, the Better” in Human and Machine Translation › files › 2014 › 08 › ... · 2015-03-02 · translation as a reference studies operating withlinguistically-motivatedcategories:

Methods and Data Data

Corpus

VARTRA cf. Lapshinova (2013)

contains:

variants of translation from English into German= translation varieties produced by:(1) human professional translators (PT1)(2) human inexperienced translators (PT2)(3) a rule-based MT system (RBMT)(4) 2 statistical MT systems (SMT1 and SMT2)

TOTAL number of tokens in translations ca. 600,000

January 29th, 2015 “The More, the Better” 21

Page 22: “The More, the Better” in Human and Machine Translation › files › 2014 › 08 › ... · 2015-03-02 · translation as a reference studies operating withlinguistically-motivatedcategories:

Methods and Data Data

Corpus

PT1 – CroCo, [Hansen-Schirra et al., 2012]PT2 – trained translators (over BA) with no/little experienceRBMT – SYSTRANSMT1 – Google Translate (big undefined data)SMT2 – Moses system (small known data)

Each translationcovers 7 registers:

political essays – ESSAYfictional texts– FICTIONinstruction manuals– INSTRpopular-scientific articles– POPSCIletters of share-holders– SHAREprepared political speeches– SPEECHtouristic leaflets – TOU

January 29th, 2015 “The More, the Better” 22

Page 23: “The More, the Better” in Human and Machine Translation › files › 2014 › 08 › ... · 2015-03-02 · translation as a reference studies operating withlinguistically-motivatedcategories:

Methods and Data Feature Selection

Features: Patterns

January 29th, 2015 “The More, the Better” 23

Page 24: “The More, the Better” in Human and Machine Translation › files › 2014 › 08 › ... · 2015-03-02 · translation as a reference studies operating withlinguistically-motivatedcategories:

Methods and Data Feature Selection

Feature Requirements

reflect linguistic characteristics of all texts under analysis

content-independent (do not contain terminology or keywords)

easy to interpret

January 29th, 2015 “The More, the Better” 24

Page 25: “The More, the Better” in Human and Machine Translation › files › 2014 › 08 › ... · 2015-03-02 · translation as a reference studies operating withlinguistically-motivatedcategories:

Methods and Data Feature Selection

Feature Requirements

reflect linguistic characteristics of all texts under analysis

content-independent (do not contain terminology or keywords)

easy to interpret

January 29th, 2015 “The More, the Better” 24

Page 26: “The More, the Better” in Human and Machine Translation › files › 2014 › 08 › ... · 2015-03-02 · translation as a reference studies operating withlinguistically-motivatedcategories:

Methods and Data Feature Selection

Feature Requirements

reflect linguistic characteristics of all texts under analysis

content-independent (do not contain terminology or keywords)

easy to interpret

January 29th, 2015 “The More, the Better” 24

Page 27: “The More, the Better” in Human and Machine Translation › files › 2014 › 08 › ... · 2015-03-02 · translation as a reference studies operating withlinguistically-motivatedcategories:

Methods and Data Feature Selection

Nature of Features

Data-driven

Theory-driven

January 29th, 2015 “The More, the Better” 25

Page 28: “The More, the Better” in Human and Machine Translation › files › 2014 › 08 › ... · 2015-03-02 · translation as a reference studies operating withlinguistically-motivatedcategories:

Methods and Data Feature Selection

Nature of Features

Data-driven Theory-driven

January 29th, 2015 “The More, the Better” 25

Page 29: “The More, the Better” in Human and Machine Translation › files › 2014 › 08 › ... · 2015-03-02 · translation as a reference studies operating withlinguistically-motivatedcategories:

Methods and Data Feature Selection

Our Features

1 data-driven:- joint work with Marcos Zampieri (DFKI, Saarbrücken)

2 theory-driven:- in [Lapshinova, forthcoming],- in joint work with Mihaela Vela (UdS)

January 29th, 2015 “The More, the Better” 26

Page 30: “The More, the Better” in Human and Machine Translation › files › 2014 › 08 › ... · 2015-03-02 · translation as a reference studies operating withlinguistically-motivatedcategories:

Methods and Data Feature Selection

Theory-Driven Features

patterns register translationese1 content vs. grammatical words mode simplification2 nominal vs. verbal word classes and

phrasesfield normalisation / shining

through3 ung-nominalisation field normalisation / shining

through4 nominal vs. pronominal and demon-

strative vs. personalmode explicitation, normalisati-

on / shining through5 abstract or general nouns vs. all other

nounsfiels explicitation

6 logico-semantic relations: additive,adversative, causal, temporal, modal

mode explicitation

7 modal meanings: obligation, permis-sion, volition

tenor normalisation / shiningthrough

8 evaluative patterns tenor normalisation / shiningthrough

January 29th, 2015 “The More, the Better” 27

Page 31: “The More, the Better” in Human and Machine Translation › files › 2014 › 08 › ... · 2015-03-02 · translation as a reference studies operating withlinguistically-motivatedcategories:

Analysis and Interpretations

Analysis and Interpretation

January 29th, 2015 “The More, the Better” 28

Page 32: “The More, the Better” in Human and Machine Translation › files › 2014 › 08 › ... · 2015-03-02 · translation as a reference studies operating withlinguistically-motivatedcategories:

Analysis and Interpretations

Hierarchical Cluster Analysis (HCA)

� discover differences and similarities between subcorpora� discover ‘interesting structures’� group according to different dimensions, i.e. variety and register

cf. Baayen (2008); Everitt et al. (2001)set of dissimilarities for the n objects is usedeach subcorpus is assigned to its own clusteriteratively, two most similar clustersstatistics behind hierarchical cluster analysis quantifies how farapart (or similar) two (sets of) subcorpora areon the top node, all clusters are joined together

⇒ subcorpora similar to each other have a common node on the tree

January 29th, 2015 “The More, the Better” 29

Page 33: “The More, the Better” in Human and Machine Translation › files › 2014 › 08 › ... · 2015-03-02 · translation as a reference studies operating withlinguistically-motivatedcategories:

Analysis and Interpretations

Hierarchical Cluster Analysis (HCA)

January 29th, 2015 “The More, the Better” 30

Page 34: “The More, the Better” in Human and Machine Translation › files › 2014 › 08 › ... · 2015-03-02 · translation as a reference studies operating withlinguistically-motivatedcategories:

Analysis and Interpretations

Statistical test: Correspondance Analysis

Input: frequencies of featuresOutput: a two dimensional graph with:

arrows for the observed feature frequenciespoints for translation varieties

Interpretation:the length of the arrows indicates how pronounced a particularfeature isthe position of the points in relation to the arrows indicates therelative importance of a feature for a register.the arrows pointing in the direction of an axis indicate a highcontribution to the respective dimension

cf. [Baayen, 2008], [Glynn, 2012]

January 29th, 2015 “The More, the Better” 31

Page 35: “The More, the Better” in Human and Machine Translation › files › 2014 › 08 › ... · 2015-03-02 · translation as a reference studies operating withlinguistically-motivatedcategories:

Analysis and Interpretations

Correspondence Analysis

January 29th, 2015 “The More, the Better” 32

Page 36: “The More, the Better” in Human and Machine Translation › files › 2014 › 08 › ... · 2015-03-02 · translation as a reference studies operating withlinguistically-motivatedcategories:

Analysis and Interpretations

Summary and Discussion

parallelism in HT and MT in terms of experinceinfluence on quality?realise that both differ in the type of experience (or data) acquired⇒ need to be modelled differentlymore datamore featuresother methods?

January 29th, 2015 “The More, the Better” 33

Page 37: “The More, the Better” in Human and Machine Translation › files › 2014 › 08 › ... · 2015-03-02 · translation as a reference studies operating withlinguistically-motivatedcategories:

Analysis and Interpretations

Summary and Discussion

January 29th, 2015 “The More, the Better” 34

Page 38: “The More, the Better” in Human and Machine Translation › files › 2014 › 08 › ... · 2015-03-02 · translation as a reference studies operating withlinguistically-motivatedcategories:

Thank you!Questions? Comments? Suggestions?

[email protected]

January 29th, 2015 “The More, the Better” 35

Page 39: “The More, the Better” in Human and Machine Translation › files › 2014 › 08 › ... · 2015-03-02 · translation as a reference studies operating withlinguistically-motivatedcategories:

Babych, B., Hartley, A., and Sharoff, S. (2004).Modelling legitimate translation variation for automatic evaluation of mt quality.In Proceedings of LREC-2004, volume Vol. 3.

Bertoldi, N. and Federico, M. (2009).Domain adaptation for statistical machine translation with monolingual resources.In Proceedings of the Fourth Workshop on Statistical Machine Translation, StatMT ’09, pages 182–189, Stroudsburg, PA,USA. Association for Computational Linguistics.

Carl, M. and Buch-Kromann, M. (2010).Correlating translation product and translation process data of professional and student translators.In Annual Conference of the European Association for Machine Translation, Saint-Raphaël, France.

El-Haj, M., Rayson, P., and Hall, D. (2014).Language independent evaluation of translation style and consistency: Comparing human and machine translations ofcamus’ novel “the stranger”.

Englund-Dimitrova, B. (2005).Expertise and Explicitation in the Translation Process.John Benjamins, Amsterdam/Philadelphia.

Estrella, P. S., Hamon, O., and Popescu-Belis, A. (2007).How Much Data is Needed for Reliable MT Evaluation? Using Bootstrapping to Study Human and Automatic Metrics.In Proceedings of Machine Translation Summit XI, pages 167–174.ID: unige:3461.

Fishel, M., Sennrich, R., Popovic, M., and Bojar, O. (2012).Terrorcat: a translation error categorization-based mt quality metric.In 7th Workshop on Statistical Machine Translation.

Gellerstam, M. (1986).Translationese in Swedish novels translated from English.In Wollin, L. and Lindquist, H., editors, Translation Studies in Scandinavia, pages 88–95. CWK Gleerup, Lund.

Page 40: “The More, the Better” in Human and Machine Translation › files › 2014 › 08 › ... · 2015-03-02 · translation as a reference studies operating withlinguistically-motivatedcategories:

Hansen-Schirra, S., Neumann, S., and Steiner, E. (2012).Cross-linguistic Corpora for the Study of Translations. Insights from the Language Pair English-German.de Gruyter, Berlin, New York.

Jääskeläinen, R. (1997).Tapping the Process: An Explorative Study of the Cognitive and Affective Factors Involved in Translating. Doctoraldissertation.PhD thesis, University of Joensuu, Joensuu.

Jakobsen, A. L. (2002).Translation drafting by professional translators and by translation students.In Hansen, G., editor, Empirical Translation Studies. Process and Product, volume 27 of Copenhagen Studies inLanguage, pages 191–204. Samfundslitteratur, Frederiksberg.

Koehn, P. (2010).Statistical Machine Translation.Cambridge University Press, New York, NY, USA, 1st edition.

Koehn, P. and Haddow, B. (2012).Towards effective use of training data in statistical machine translation.In Proceedings of the Seventh Workshop on Statistical Machine Translation, WMT ’12, pages 317–321, Stroudsburg, PA,USA. Association for Computational Linguistics.

Martinez Melis, N. and Hurtado Albir, A. (2001).Assessment in translation studies: Research needs.journal des traducteurs / Translators’ Journal, 46(2):272–287.

Muñoz Martín, R. and Conde, T. (2007).Effects of serial translation evaluation.In Schmitt, P. A. and Jüngst, H. E., editors, Translationsqualität, pages 428–444. Peter Lang, Frankfurt.

Papineni, K., Roukus, S., Ward, T., and Zhu, W.-J. (2002).BLEU: a method for automatic evaluation of machine translation.

Page 41: “The More, the Better” in Human and Machine Translation › files › 2014 › 08 › ... · 2015-03-02 · translation as a reference studies operating withlinguistically-motivatedcategories:

In Proceedings of the 40th annual meeting on association for computational linguistics, pages 311–318.

Popovic, M. and Burchardt, A. (2011).From human to automatic error classification for machine translation output.In 15th International Conference of the European Association for Machine Translation (EAMT-2011), Leuven, Belgium.European Association for Machine Translation.

Popovic, M. and Ney, H. (2011).Towards automatic error analysis of machine translation output.Computational Linguistics, 37(4):657–688.

Volansky, V., Ordan, N., and Wintner, S. (2011).More human or more translated? original texts vs. human and machine translations.In Proceedings of the 11th Bar-Ilan Symposium on the Foundations of AI With ISCOL (Israeli Seminar on ComputationalLinguistics).

White, J. S. (1994).The ARPA MT evaluation methodologies: Evolution, lessons, and further approaches.In Proceedings of the 1994 Conference of the Association for Machine Translation in the Americas, pages 193–205.

January 29th, 2015 “The More, the Better” 35