61
TrAva – a tool for evaluating Machine Translation – pedagogical and research possibilities Belinda Maia, Diana Santos, Luís Sarmento & Anabela Barreiro - Linguateca

TrAva – a tool for evaluating Machine Translation – pedagogical and research possibilities

  • Upload
    milos

  • View
    65

  • Download
    0

Embed Size (px)

DESCRIPTION

TrAva – a tool for evaluating Machine Translation – pedagogical and research possibilities. Belinda Maia, Diana Santos, Luís Sarmento & Anabela Barreiro - Linguateca. Why MT Matters. Social reasons Political reasons Commercial importance    Scientific and philosophical interest . - PowerPoint PPT Presentation

Citation preview

Page 1: TrAva – a tool for evaluating Machine Translation – pedagogical and research possibilities

TrAva – a tool for evaluating Machine Translation –

pedagogical and research possibilities

Belinda Maia, Diana Santos, Luís Sarmento & Anabela Barreiro -

Linguateca

Page 2: TrAva – a tool for evaluating Machine Translation – pedagogical and research possibilities

Why MT Matters

• Social reasons • Political reasons • Commercial importance    • Scientific and philosophical interest

Page 3: TrAva – a tool for evaluating Machine Translation – pedagogical and research possibilities

Useful bibliography• ARNOLD, D, BALKAN, L., LEE HUMPHREYS, R. Lee,

MEIJER, S., & SADLER, S. (1994) Machine Translation - An Introductory guide. Manchester & Oxford : NCC Blackwell. ISBN 1-85554-217-X - or at: http://www.essex.ac.uk/linguistics/clmt/MTbook/.  

• COLE, Ron (ed) 1996 "Survey of the State of the Art in Human Language Technology" Chapter 8 - Multilinguality, at the Center for Spoken Language Understanding, Oregon.  http://cslu.cse.ogi.edu/HLTsurvey/ch8node2.html#Chapter8

• MELBY, Alan K. 1995, The Possibility of Language: A discussion of the Nature of Language, with implications for Human and Machine Translation. Amsterdam: John Benjamins Pub. Co.

Page 4: TrAva – a tool for evaluating Machine Translation – pedagogical and research possibilities

Machine Translation (MT) – a few dates

• 1947 Warren Weaver - his ideas led to heavy investment in MT 

• 1959 Bar-Hillel - considers FAHQMT - FULLY AUTOMATIC HIGH QUALITY MACHINE TRANSLATION philosophically impossible

• 1964 ALPAC Report - on limitations of MT > withdrawal of funds

• late 1970s the CEC purchase of SYSTRAN and beginning of EUROTRA project.

• Upward trend in the 1970s and 1980s • Today: MT technology - high-end versus low-end

systems • MT and the Internet

Page 5: TrAva – a tool for evaluating Machine Translation – pedagogical and research possibilities

From Arnold et al 1995

Page 6: TrAva – a tool for evaluating Machine Translation – pedagogical and research possibilities

From Arnold et al 1995

Page 7: TrAva – a tool for evaluating Machine Translation – pedagogical and research possibilities
Page 8: TrAva – a tool for evaluating Machine Translation – pedagogical and research possibilities

MT architectures – Arnold et al• Direct architecture - simple grammatical rules +

a large lexical and phrasal database • Transfer architecture - more complex grammar

with an underlying approach of transformational-generative theory + considerable research into comparative linguistics in the two languages involved 

• Interlingua architecture - L1 > a 'neutral language' (real, artificial, logical, mathematical..) > L2

Page 9: TrAva – a tool for evaluating Machine Translation – pedagogical and research possibilities

Major Methods, Techniques and Approaches today

• Statistical vs. Linguistic MT – assimilation tasks: lower quality, broad domains – statistical

techniques predominate– dissemination tasks: higher quality, limited domains – symbolic

techniques predominate– communication tasks: medium quality, medium domain – mixed

techniques predominate• Rule-based vs. Example-based MT • Transfer vs. Interlingual MT • Multi-Engine MT • Speech-to-Speech Translation MLIM - Multilingual Information Management: Current levels and Future Abilities - report (1999) Chapter 4 at:

http://www-2.cs.cmu.edu/~ref/mlim/chapter4.html

Page 10: TrAva – a tool for evaluating Machine Translation – pedagogical and research possibilities

MT – present & future uses

• ‘Gist’ translation• Ephemeral texts with tolerant users• Human aided MT

– Domain specific– Linear sentence structure– Pre-edited text or ‘controlled language’– Post-editing

• Improvement of MT – particularly for restricted domains and registers

Page 11: TrAva – a tool for evaluating Machine Translation – pedagogical and research possibilities

MT and the Human translator

• MT is less of a threat to the professional human translator than English

• MT can encourage people’s curiosity for texts in languages they do not understand > and lead to human translation

• MT can be a tool for the human translator• Professional translators can learn to work

with and train MT

Page 12: TrAva – a tool for evaluating Machine Translation – pedagogical and research possibilities

PoloCLUP’s experiment

• Background– Master’s seminar in Semantics and Syntax– Wish to raise students’ awareness of the

strengths and weaknesses of MT– Wish to develop their interest in MT as a tool– Need to improve their knowledge of

linguistics. – Availability of free MT online– Automation of process provided by computer

engineer

Page 13: TrAva – a tool for evaluating Machine Translation – pedagogical and research possibilities

Phase 1 - METRA

• http://poloclup.linguateca.pt/ferramentas/metra/index.html

• Translation using 7 online MT programmes

• EN > PT• PT > EN• At present this tool is getting about 60 hits

per day!

Page 14: TrAva – a tool for evaluating Machine Translation – pedagogical and research possibilities
Page 15: TrAva – a tool for evaluating Machine Translation – pedagogical and research possibilities
Page 16: TrAva – a tool for evaluating Machine Translation – pedagogical and research possibilities
Page 17: TrAva – a tool for evaluating Machine Translation – pedagogical and research possibilities

BOOMERANG

• http://poloclup.linguateca.pt/ferramentas/boomerang/index.html

• This tool submits a text for translation – and back-translation – and back-translation…. Until it reaches a fixed point

• This shows that the rules programmed for one language direction do not always correspond to the other language direction

Page 18: TrAva – a tool for evaluating Machine Translation – pedagogical and research possibilities
Page 19: TrAva – a tool for evaluating Machine Translation – pedagogical and research possibilities
Page 20: TrAva – a tool for evaluating Machine Translation – pedagogical and research possibilities

EVAL > TrAva

• Informal class experiment led to a useful research tool

• Several versions of EVAL– Different types of classification of input– Different explanations of errors of output

• Production, correction and re-correction of procedure interesting

Page 21: TrAva – a tool for evaluating Machine Translation – pedagogical and research possibilities

TrAva - procedure• Online EN > PT MT using 4 MT systems:

– Free Translation– Systran– E T Server– Amikai

• Researcher chooses area for analysis – e.g.– ambiguity – lexical and structural mismatches– Homographs and polysemous lexical items   – syntactic complexity – multiword units: idioms and collocations   – anaphora resolution

Page 22: TrAva – a tool for evaluating Machine Translation – pedagogical and research possibilities

TrAva - procedure

• Selection of ‘genuine’ examples from BNC, Reuter’s corpus, newspapers etc.

• Possible ‘pruning’ of unnecessary text (some systems accept limited text)

• No deliberate attempt to confuse the system

• BUT: avoidance of repetitive ‘test suites’

Page 23: TrAva – a tool for evaluating Machine Translation – pedagogical and research possibilities

TrAva - procedure

• Sentence submitted to TrAva• MT results• Researcher:

– Classifies part of sentence being examined in terms of the English lexicon or POS (BNC codes)

– Examines results– Explains errors in terms of Portuguese

grammar

Page 24: TrAva – a tool for evaluating Machine Translation – pedagogical and research possibilities
Page 25: TrAva – a tool for evaluating Machine Translation – pedagogical and research possibilities
Page 26: TrAva – a tool for evaluating Machine Translation – pedagogical and research possibilities
Page 27: TrAva – a tool for evaluating Machine Translation – pedagogical and research possibilities
Page 28: TrAva – a tool for evaluating Machine Translation – pedagogical and research possibilities

Access to work done

• Researcher may access work done and review it

• Teacher / administrator can access student work and give advice

• FAQ

Page 29: TrAva – a tool for evaluating Machine Translation – pedagogical and research possibilities
Page 30: TrAva – a tool for evaluating Machine Translation – pedagogical and research possibilities
Page 31: TrAva – a tool for evaluating Machine Translation – pedagogical and research possibilities

Present situation

• METRA and BOOMERANG are all free to use online at:

• http://poloclup.linguateca.pt/ferramentas • TraVa is free to use online at:• http://www.linguateca.pt/trava/ • The corpus CorTA – over 1000 sentences

+ 4 MT versions available for consultation at: http://www.linguateca.pt/

Page 32: TrAva – a tool for evaluating Machine Translation – pedagogical and research possibilities
Page 33: TrAva – a tool for evaluating Machine Translation – pedagogical and research possibilities
Page 34: TrAva – a tool for evaluating Machine Translation – pedagogical and research possibilities

Conclusions

• It has been a successful experiment• It is useful pedagogically

– As linguistic analysis– As appreciation of MT

• It has interesting theoretical implications > emphasis on ‘real’ sentences and recognition of interconnection of lexicon + syntax + context

Page 35: TrAva – a tool for evaluating Machine Translation – pedagogical and research possibilities

Conclusions

• Further work needs to be done on the classifications– E.g. the analysis of ‘error’ as ‘lexical choice’

needs to be able to combine with other possible reasons for error

• A lone researcher can use it to examine a restricted area

• BUT – a large team is needed to overhaul a system properly

Page 36: TrAva – a tool for evaluating Machine Translation – pedagogical and research possibilities

Homographs and Polysemy

• Homographs = words with same spelling but different syntactic use

• Polysemy = words with same spelling, but different meaning according to use or context

• BUT – the difference is not as clear-cut as all that

• However – major problem for MT

Page 37: TrAva – a tool for evaluating Machine Translation – pedagogical and research possibilities
Page 38: TrAva – a tool for evaluating Machine Translation – pedagogical and research possibilities
Page 39: TrAva – a tool for evaluating Machine Translation – pedagogical and research possibilities
Page 40: TrAva – a tool for evaluating Machine Translation – pedagogical and research possibilities
Page 41: TrAva – a tool for evaluating Machine Translation – pedagogical and research possibilities
Page 42: TrAva – a tool for evaluating Machine Translation – pedagogical and research possibilities
Page 43: TrAva – a tool for evaluating Machine Translation – pedagogical and research possibilities
Page 44: TrAva – a tool for evaluating Machine Translation – pedagogical and research possibilities
Page 45: TrAva – a tool for evaluating Machine Translation – pedagogical and research possibilities
Page 46: TrAva – a tool for evaluating Machine Translation – pedagogical and research possibilities

Complex Noun Phrases

• DETerminante + ADJectivo + Nome• DETerminante + ADJectivo Composto +

ADJectivo + Nome• DETerminante + ADVérbio (em –ly) +

ADJectivo + ADJectivo + Nome

Page 47: TrAva – a tool for evaluating Machine Translation – pedagogical and research possibilities
Page 48: TrAva – a tool for evaluating Machine Translation – pedagogical and research possibilities
Page 49: TrAva – a tool for evaluating Machine Translation – pedagogical and research possibilities
Page 50: TrAva – a tool for evaluating Machine Translation – pedagogical and research possibilities
Page 51: TrAva – a tool for evaluating Machine Translation – pedagogical and research possibilities
Page 52: TrAva – a tool for evaluating Machine Translation – pedagogical and research possibilities
Page 53: TrAva – a tool for evaluating Machine Translation – pedagogical and research possibilities

Lexical Bundles

Page 54: TrAva – a tool for evaluating Machine Translation – pedagogical and research possibilities

EXAMPLES

• Now let us look at some examples:– Homographs– Polysemy– Complex noun phrases– Lexical bundles

Page 55: TrAva – a tool for evaluating Machine Translation – pedagogical and research possibilities
Page 56: TrAva – a tool for evaluating Machine Translation – pedagogical and research possibilities
Page 57: TrAva – a tool for evaluating Machine Translation – pedagogical and research possibilities
Page 58: TrAva – a tool for evaluating Machine Translation – pedagogical and research possibilities
Page 59: TrAva – a tool for evaluating Machine Translation – pedagogical and research possibilities
Page 60: TrAva – a tool for evaluating Machine Translation – pedagogical and research possibilities
Page 61: TrAva – a tool for evaluating Machine Translation – pedagogical and research possibilities