March 2005 Intro to MT IV 1
Postgraduate Diploma in Translation
Introduction to Machine Translation IV
The Translator’s Workstation
Recap: MT Methods
MT
  Direct MT
  Rule-Based MT: Transfer, Interlingua
  Data-Driven MT: EBMT, SMT
Different Styles of MT
FAMT: fully automatic machine translation
  FAHQMT: fully automatic high-quality machine translation
  FALQMT: fully automatic low-quality machine translation
MAHT: machine-aided human translation
HAMT: human-aided machine translation
The Proper Place of Men and Machines in Language Translation
Martin Kay, 1980 [1997]
Machine translation is an excellent research vehicle but stands no chance of filling actual needs for translators.
The answer is to develop cooperative man-machine systems.
Start with word processing and add translation-specific enhancements to approach the goal of automatic translation.
Be modest: be humble.
The Translator’s Workstation: Origins & Development
Main idea of the TW is attributed to Martin Kay (author of “The Proper Place of Men and Machines in Language Translation”, 1980)
Basic ingredients include:
  Glossaries
  Multilingual termbanks
  Translation Memories (TM)
Built on a word processing environment
Progressive automation of dictionary lookup and access to TM
Standard Word Processing Environment
Includes:
  Spell check
  Grammar check
  Thesaurus
  Word counting
  Archiving and retrieval of documents
Translation-Oriented Editing
Basic Idea: add a certain level of linguistic awareness to editing functions.
Translation-oriented word substitution
e.g. replace “purchase” with “buy”; the system also replaces the inflected forms:
  purchasing → buying
  purchased → bought
e.g. replace “brume” with “brouillard”; the system also adjusts agreement:
  brume épaisse → brouillard épais
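The substitution idea can be sketched roughly as follows. This is only an illustration: the `FORMS` table and the `substitute` function are hypothetical names invented here, and the hand-written table stands in for the morphological lexicon a real workstation would need. Gender agreement, as in the French example, is not handled.

```python
import re

# Hypothetical inflection table written for this sketch: lemma -> {form tag: surface form}.
FORMS = {
    "purchase": {"base": "purchase", "3sg": "purchases",
                 "ing": "purchasing", "past": "purchased"},
    "buy": {"base": "buy", "3sg": "buys", "ing": "buying", "past": "bought"},
}


def substitute(text, old_lemma, new_lemma):
    """Replace each inflected form of old_lemma with the matching form of new_lemma."""
    # Pair every surface form of the old word with the same-tag form of the new word.
    mapping = {form: FORMS[new_lemma][tag] for tag, form in FORMS[old_lemma].items()}
    # Try longer forms first so "purchasing" is not matched as "purchase".
    alternatives = "|".join(sorted(mapping, key=len, reverse=True))
    pattern = re.compile(r"\b(" + alternatives + r")\b", re.IGNORECASE)

    def repl(match):
        word = match.group(0)
        new = mapping[word.lower()]
        # Preserve an initial capital letter.
        return new.capitalize() if word[0].isupper() else new

    return pattern.sub(repl, text)


print(substitute("She purchased a car and is purchasing another.", "purchase", "buy"))
# prints: She bought a car and is buying another.
```

A plain search-and-replace on “purchase” would produce “buyd” and “buying” inconsistently; keying the replacement on the form tag is what gives the editor its linguistic awareness.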
Mark Up Languages
Markup is anything added to the content of the document that describes the text.
Formatting instructions: typeface, fonts, paragraphs, bulleted lists (HTML).
More abstract levels of content description (XML).
TMX
TMX (Translation Memory eXchange) is the vendor-neutral open XML standard for the exchange of translation memory data.
The purpose of TMX is to allow easier exchange of translation memory data between tools and/or translation vendors.
http://www.lisa.org/tmx/specification.html
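To make the format concrete, here is a minimal sketch of reading a TMX file with Python's standard library. The sample fragment, its header values, and the function name `read_tmx` are made up for illustration; only the element names (`tmx`, `header`, `body`, `tu`, `tuv`, `seg`) and the `xml:lang` attribute follow the TMX format.

```python
import xml.etree.ElementTree as ET

# Illustrative TMX fragment written for this sketch; header values are invented.
SAMPLE_TMX = """<tmx version="1.4">
  <header creationtool="example-tool" creationtoolversion="1.0"
          segtype="sentence" o-tmf="example" adminlang="en"
          srclang="en" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>thick fog</seg></tuv>
      <tuv xml:lang="fr"><seg>brouillard épais</seg></tuv>
    </tu>
  </body>
</tmx>"""

# ElementTree expands the reserved xml: prefix to this namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def read_tmx(tmx_text):
    """Return one dict per translation unit, mapping language code to segment text."""
    root = ET.fromstring(tmx_text)
    units = []
    for tu in root.iter("tu"):
        unit = {}
        for tuv in tu.iter("tuv"):
            seg = tuv.find("seg")
            # itertext() also collects text inside any inline markup within <seg>.
            unit[tuv.get(XML_LANG)] = "".join(seg.itertext())
        units.append(unit)
    return units


print(read_tmx(SAMPLE_TMX))
```

Each translation unit (`tu`) groups one variant (`tuv`) per language, which is what lets different tools exchange the same memory.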
Access to Lexical Resources
Online dictionaries
  On-screen version of a traditional printed dictionary
  Exploitation of hypertext links
  Editing facilities
  cf. the French Assistant system from Lernout & Hauspie
Term banks
Gazetteers
Encyclopaedic knowledge
World Wide Web
Commercially Available Systems
Typically designed for non-linguists
... as an extension of a familiar word processing environment
A Typical MAHT
Separate windows for source and target text
Source text initially shown in target window, to be overwritten by translation
User highlights a portion of text to be machine translated.
Draft translation is then pasted in, ready for post-editing.
User decides what will be translated by machine, and can develop a modus operandi.
Interactive Translation
Most systems offer the user an interactive translation mode, in which the system stops and asks the translator to make choices.
Can be annoying. The machine may keep asking the same question.
Difficult to resolve this problem in the general case.
Translation Memory
First proposed in 1970s, but not generally available until 1990s.
Database of previous translations
Sentence-by-sentence translation
If an exact match for a new sentence is found, it is pasted in.
If not, the TM may highlight those parts of the new sentence which differ from the stored one.
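The lookup behaviour just described might be sketched as below. This is not how any particular commercial product works: the `MEMORY` contents and the `lookup` function are hypothetical, the stored translation is an illustrative pair written for this sketch, and the closest stored sentence is chosen with the standard library's `difflib`, which is an assumption made here.

```python
from difflib import SequenceMatcher

# Toy memory written for this sketch: stored source sentence -> stored translation.
MEMORY = {
    "When the tray is empty, remove it.": "Lorsque le bac est vide, retirez-le.",
}


def lookup(sentence, memory):
    """Return (match type, stored translation, differing parts of the new sentence)."""
    if not memory:
        return ("none", None, [])
    if sentence in memory:
        # Exact match: the stored translation can be pasted in unchanged.
        return ("exact", memory[sentence], [])
    # Otherwise pick the stored source sentence closest to the new one.
    best = max(memory, key=lambda s: SequenceMatcher(None, s, sentence).ratio())
    old_words, new_words = best.split(), sentence.split()
    matcher = SequenceMatcher(None, old_words, new_words)
    # Collect the words of the new sentence that were inserted or replaced.
    # Words deleted from the stored sentence are not reported in this sketch.
    changed = [" ".join(new_words[j1:j2])
               for tag, i1, i2, j1, j2 in matcher.get_opcodes()
               if tag in ("replace", "insert")]
    return ("fuzzy", memory[best], changed)


print(lookup("When the paper tray is empty, remove it.", MEMORY))
```

For the sentence above the differing part is the single word “paper”, which is what the translator would see highlighted next to the stored translation.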
Translation Memory
Keys to success are:
  Efficient storage of sentences
  Efficient matching scheme
Most current commercial systems are based on character string similarity
Similarity between sentences
1. When the paper tray is empty, remove it and refill it with paper of the appropriate size
2. When the tray is empty, remove it and fill it with the appropriate paper.
3. When the bulb remains unlit, remove it and replace with a new bulb
4. You have to remove the paper tray in order to refill it when it is empty.
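One way to experiment with character string similarity on these four sentences is sketched below. It uses `difflib.SequenceMatcher` from the Python standard library as a stand-in measure; that choice, and the helper name `similarity`, are assumptions made here and not the matching scheme of any particular TM product.

```python
from difflib import SequenceMatcher

SENTENCES = [
    "When the paper tray is empty, remove it and refill it with paper of the appropriate size",
    "When the tray is empty, remove it and fill it with the appropriate paper.",
    "When the bulb remains unlit, remove it and replace with a new bulb",
    "You have to remove the paper tray in order to refill it when it is empty.",
]


def similarity(a, b):
    """Character-level similarity in the range 0.0 to 1.0, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


# Score sentences 2 to 4 against sentence 1, as a TM would for a new input.
query = SENTENCES[0]
for candidate in SENTENCES[1:]:
    print(f"{similarity(query, candidate):.2f}  {candidate}")
```

A score of this kind measures surface overlap only. A sentence that shares a template but concerns a different object, and a paraphrase with different word order, can both be ranked in ways that do not reflect how useful the stored translation would be.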
Other Corpus-Based Resources
Concordance: a list of occurrences of a word (called the keyword, e.g. here ‘sin’) taken from a corpus, with the keyword displayed in the centre of the page and shown in the contexts in which it occurs
  Monolingual
  Bilingual
Other corpus tools
  Word sense profilers: WASPS
Monolingual Concordance Example
1 hed it off. * * * ‘What a curious feeling!’ said Alice; ‘I must b
1 against herself, for this curious child was very fond of pretendi
2 ‘Curiouser and curiouser!’ cried Alice (
2 ‘Curiouser and curiouser!’ cried Alice (she was so muc
2 Eaglet, and several other curious creatures. Alice led the way,
4 -- and yet – it’s rather curious, you know, this sort of life!
6 eir heads. She felt very curious to know what it was all about,
6 out a cat! It’s the most curious thing I ever saw in my life!’ S
7 ht into it. ‘That’s very curious!’ she thought. ‘But everything’
7 hought. ‘But everything’s curious today. I think I may as well g
8 Alice thought this a very curious thing, and she went nearer to w
8 she had never seen such a curious croquet-ground in her life; it
8 seen, when she noticed a curious appearance in the air: it puzz
9 next, and so on.’ ‘What a curious plan!’ exclaimed Alice. ‘That’s
10 : ‘and I do so like that curious song about the whiting!’ ‘Oh,
10 th, and said ‘That’s very curious.’ ‘It’s all about as curious a
10 ous.’ ‘It’s all about as curious as it can be,’ said the Gryphon
11 moment Alice felt a very curious sensation, which puzzled her a
11 er the list, feeling very curious to see what the next witness wo
12 ad!’ ‘Oh, I’ve had such a curious dream!’ said Alice, and she tol
12 her, and said, ‘It was a curious dream, dear, certainly: but no
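A keyword-in-context listing of this kind can be produced with a few lines of code. The sketch below is a minimal illustration: the function name `concordance`, the `width` parameter, and the short sample text (assembled from fragments of the lines above) are all choices made here. It matches the keyword as a whole word, so unlike the listing above it would not pick up “Curiouser”.

```python
import re

# Short sample assembled from fragments of the concordance lines, for illustration only.
TEXT = ("What a curious feeling! said Alice. That's very curious! she thought. "
        "But everything's curious today.")


def concordance(text, keyword, width=30):
    """Return one line per whole-word occurrence, with the keyword at a fixed column."""
    lines = []
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so every keyword starts at column `width`.
        lines.append(f"{left:>{width}}{match.group(0)}{right}")
    return lines


for line in concordance(TEXT, "curious", width=25):
    print(line)
```

Truncating the context at a fixed number of characters is what produces the cut-off words at the edges of each line in the listing.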
Bilingual Concordance
Original text Translation
1. Ainsi, quand il aperçut POUR la première fois mon avion [...]
1. The first time he saw my aeroplane, for instance [...]
2. Alors elle avait forcé sa toux POUR lui infliger quand même des remords.
2. Then she forced her cough a little more SO THAT he should suffer from remorse just the same.
3. -Approche-toi que je te voie mieux, lui dit le roi qui était tout fier d’être enfin roi POUR quelqu’un.
3. “Approach, so that I may see you better,” said the king, who felt consumingly proud of being at last a king OVER somebody.
4. Car, POUR les vaniteux, les autres hommes sont des admirateurs.
4. For, TO conceited men, all other men are admirers.
5. C’est comme POUR la fleur.
5. It is just as it is WITH the flower.
WASPS
A Semi-Automatic Lexicographer's Workbench for Writing Word Sense ProfileS
Adam Kilgarriff, David Tugwell et al., ESRC 1999-2002
Remit was to explore the synergy between the lexicographer's task of identifying and describing word senses, and the computational task of word sense disambiguation (WSD).