1
Elliott Macklovitch
Université de Montréal, Canada
LREC 2006 – Genoa, Italy
TransType2: The Last Word
2
What is TransType?
• a novel kind of interactive MT, in which
– the user and the system collaborate to draft a target translation (vs. SL disambiguation)
– system’s contributions are completions to the prefix typed by the user (generated by SMT)
– the user is in control of the translation process, i.e. can always ignore system’s predictions
– the system must adapt its predictions to each new character entered by the user
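The prefix-completion interaction described above can be illustrated with a toy sketch. It assumes the engine's output is available as a ranked list of full target sentences. In TransType the completions are generated by an SMT engine, which is not reproduced here. The function name `complete` and the example sentences are hypothetical.

```python
from typing import List, Optional


def complete(prefix: str, ranked_hypotheses: List[str]) -> Optional[str]:
    """Return the text that extends `prefix` to the best-ranked compatible hypothesis.

    Returns None when no hypothesis starts with the prefix.
    """
    for hypothesis in ranked_hypotheses:
        if hypothesis.startswith(prefix):
            return hypothesis[len(prefix):]
    return None


# Hypothetical ranked output for one source sentence (best first).
hypotheses = [
    "Cliquez sur le bouton Imprimer.",
    "Cliquez sur la touche Imprimer.",
    "Appuyez sur le bouton Imprimer.",
]

# The prediction is recomputed after every character the translator types;
# three snapshots of the typed prefix are shown here.
for typed in ["", "Cliquez sur la", "A"]:
    print(repr(typed), "->", repr(complete(typed, hypotheses)))
```

The translator stays in control in this sketch as well: typing a character that contradicts the top hypothesis simply moves the prediction to the next compatible one, or yields no prediction at all.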
3
What was TransType2?
• an international research project (2002-2005), involving:
– 3 university research labs: RWTH (Germany), ITI (Spain), RALI (Canada)
– 2 industrial partners: XRCE (France) & Atos Origin (Spain)
– 2 translation firms, representing end-users: Société Gamma (Canada) & Celer Soluciones (Spain)
• funded by EC’s FP5 in Europe; federal & Quebec governments in Canada
• applied research: ultimate aim was to provide a practical solution to the growing need for HQ translation
4
5
Target-text mediated IMT
• an intriguing idea… but will it work?
– it should (in theory), because each accepted completion reduces the number of keystrokes
– but the user has to evaluate the proposed completions, and this takes time…
• hence the need for user trials involving real translators
– TT2 included quarterly trials at the two translation firms, from month 18 until month 36
6
Two types of evaluation in TT2
• internal technical evaluations
– employ automatic metrics, e.g. BLEU, WER
• usability evaluations (5 rounds)
– measure TT’s impact on users’ productivity
– ease (or difficulty) with which end-users adapt to the system
– channel for feedback to developers
7
Protocol for in-situ user trials
• corpus: 1 million words of Xerox manuals
– available in project’s four languages
– partitioned into training, development, test
– Xerox terminology glossary; PDF original
• 3 TRs at each agency; Eng. >> Fr. & Sp.
– 10 consecutive half-day working sessions
– 1st devoted to training, 2nd to ‘dry-run’
– baseline comparison: translating within TT2 editor, but with prediction engine off
8
Protocol (cont’d)
• Quality assurance: all translations reviewed by a non-participating reviser (ER4)
– principally, for errors of form
– productivity gains not at the expense of quality
• use of TT-Player:
– reads a detailed trace file that records and times every interaction between user & system
– can play back the session, like a VCR
– generates detailed statistics
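To make the idea of statistics derived from a timed trace concrete, here is a minimal sketch. The slides do not describe the actual trace-file format, so the `TraceEvent` record, its event kinds ("key", "click", "accept") and the `session_stats` function are all hypothetical stand-ins, not TT-Player's own structures.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TraceEvent:
    seconds: float  # time since the start of the session (assumed unit)
    kind: str       # assumed event kinds: "key", "click", "accept"


def session_stats(events: List[TraceEvent], words_produced: int) -> Dict[str, float]:
    """Summarise one session: elapsed time, productivity and action counts."""
    if len(events) < 2:
        raise ValueError("need at least two timed events to measure a duration")
    elapsed_hours = (events[-1].seconds - events[0].seconds) / 3600.0
    if elapsed_hours <= 0:
        raise ValueError("events must span a positive amount of time")
    counts = Counter(event.kind for event in events)
    return {
        "elapsed_hours": elapsed_hours,
        "words_per_hour": words_produced / elapsed_hours,
        "keystrokes": counts["key"],
        "mouse_clicks": counts["click"],
        "accepted_completions": counts["accept"],
    }


# Hypothetical 30-minute session in which 450 target words were produced.
trace = [
    TraceEvent(0.0, "key"),
    TraceEvent(2.5, "accept"),
    TraceEvent(900.0, "click"),
    TraceEvent(1800.0, "key"),
]
stats = session_stats(trace, words_produced=450)
print(stats)
```

Because every event is timestamped, the same record would support both replay and the words-per-hour and keystroke counts reported in the trials.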
9
TT-Player (in replay mode)
10
Productivity results

ER3, four participants (column assignment not recoverable):
ER3: dry-run (words/hour): 984, 432, 864, 786
ER3: average on 3-4 texts (w/h): 918, 774, 882, 576

                                     C-TR1   C-TR2   C-TR3   G-TR4   G-TR5   G-TR6
ER4: dry-run (w/h)                     781    1030     772     518    1081     825
ER4: average on 8 texts (w/h)         1017    1410     725     707    1531    1279
% increase in productivity          +30.22  +36.89   -6.08  +36.48  +41.62  +55.03
ER5: dry-run1 (w/h)                    924     858     654     864    1338
ER5: average on 8 texts (w/h)         1056    1104     736    1104    1062     912
% increase in productivity          +14.29  +28.67  +12.54  +27.78  -20.63
ER5: dry-run2 (w/h)                   1602    1416     816    1548    1350
% increase in productivity          -34.08  -22.03   -9.80  -28.68  -21.33
ER5: average on 2 dry-runs (w/h)      1290    1137     735    1206    1344
% increase in productivity           -18.1    -2.9     0.0    -8.4   -20.9
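The "% increase in productivity" rows can be reproduced from the rows above them. The sketch below assumes the usual relative-change formula, (rate with completions − dry-run rate) / dry-run rate × 100. Applied to the ER4 figures, it agrees with the values on the slide to within one unit in the last decimal place. The function name `pct_increase` is illustrative.

```python
def pct_increase(with_completions: float, dry_run: float) -> float:
    """Percentage change of the with-completions rate relative to the dry-run rate."""
    return (with_completions - dry_run) / dry_run * 100.0


# ER4 rows of the table, in the column order C-TR1 ... G-TR6 (words/hour).
translators = ["C-TR1", "C-TR2", "C-TR3", "G-TR4", "G-TR5", "G-TR6"]
er4_dry_run = [781, 1030, 772, 518, 1081, 825]
er4_average = [1017, 1410, 725, 707, 1531, 1279]

for name, avg, dry in zip(translators, er4_average, er4_dry_run):
    print(f"{name}: {pct_increase(avg, dry):+.2f}%")
```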
11
Results of ER 3 & 4
• ER3: three of four participants exceeded DR productivity on at least one text
• ER4: five of six TRs exceeded their DR rate on 7/8 texts translated with completions
– increases quite substantial, from 30-55%
– concomitant reduction in effort: target text produced with half the number of keystrokes & mouse clicks
• revisers found no more errors in texts produced with TT2 than in DR texts: gains not achieved at the expense of quality!
12
Problems with ER4 protocol
• scheduling dry-run as 1st session in round
– as trial progresses, a gradual improvement in TR productivity can be observed (‘learning curve effect’)
– dry-run first may unduly favour the system
• high degree of full-sentence overlap between test corpus and training corpus (41%)
– no error or oversight in selecting test corpus; rather, a characteristic of this kind of manual
– nevertheless, we decided to reanalyze the trace files, separating repeated from non-repeated sentences and calculating new statistics for each
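Separating repeated from non-repeated sentences amounts to a membership test against the training corpus. The sketch below is one way to do it, not the project's procedure: the slides do not say how full-sentence matches were detected, so the `normalise` key (case-folding and whitespace collapsing) is an assumption, and the example sentences are invented.

```python
from typing import Iterable, List, Tuple


def normalise(sentence: str) -> str:
    """Assumed matching key: case-folded text with whitespace collapsed."""
    return " ".join(sentence.casefold().split())


def split_by_repetition(
    test_sentences: Iterable[str], training_sentences: Iterable[str]
) -> Tuple[List[str], List[str]]:
    """Partition test sentences into (repeated, novel) relative to the training set."""
    seen = {normalise(s) for s in training_sentences}
    repeated: List[str] = []
    novel: List[str] = []
    for sentence in test_sentences:
        (repeated if normalise(sentence) in seen else novel).append(sentence)
    return repeated, novel


# Invented example data in the style of a printer manual.
training = ["Press the Start button.", "Load paper in Tray 1.", "Close the front cover."]
test = ["Press the Start button.", "Open the front cover.", "load paper in  tray 1."]

repeated, novel = split_by_repetition(test, training)
overlap = len(repeated) / len(test)
print(f"{len(repeated)} repeated, {len(novel)} novel, overlap {overlap:.0%}")
```

Productivity statistics can then be computed separately over the two partitions, which is the comparison the reanalysis calls for.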
13
Repetitions in ER4 test corpus
• general correlation between TR productivity and level of full-sentence repetition
– counting only novel sentences, increase in the average productivity of 6 TRs was ~20% over their dry-run productivity
– including repeated sentences, overall increase in productivity was about 32%
– the fact that TT can handle external repetitions correctly is definitely a plus
14
Protocol for ER5
• test corpus drawn from new Xerox manuals
– of a type similar to those used for ER4
– verified that test corpus contained no repeated sentences wrt. training corpus
• 2nd dry-run session added at end of round
– to counter the argument that a single dry-run in the first session unduly favoured the system
17
Results of ER5 (cont’d.)
• ER5 productivity compared to 2 dry-runs:
– average productivity of 4/5 participants > DR1
– but productivity on DR2 very high
– using TT’s predictions, only 1/5 participants surpassed combined DR1+DR2 productivity
• text selected for DR2 particular in having:
– very short average sentence length & highest rate of internal repetition
– significantly easier to translate than other chapters
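The participant counts above can be checked against the ER5 rows of the productivity table. The sketch assumes that the first five values of the "average on 8 texts" row belong to the five participants with dry-run data, which is what the percentage rows in the table imply. The combined baseline is taken as listed in the "average on 2 dry-runs" row rather than recomputed. The helper `count_exceeding` is illustrative.

```python
from typing import Sequence


def count_exceeding(with_completions: Sequence[float], baseline: Sequence[float]) -> int:
    """Number of participants whose with-completions rate is above the baseline."""
    return sum(1 for rate, base in zip(with_completions, baseline) if rate > base)


# ER5 figures in words/hour, five participants with dry-run data, table order.
er5_average = [1056, 1104, 736, 1104, 1062]
dry_run_1 = [924, 858, 654, 864, 1338]
dry_run_2 = [1602, 1416, 816, 1548, 1350]
combined = [1290, 1137, 735, 1206, 1344]  # "average on 2 dry-runs" row, as listed

print("above DR1:      ", count_exceeding(er5_average, dry_run_1), "of 5")
print("above DR2:      ", count_exceeding(er5_average, dry_run_2), "of 5")
print("above DR1 + DR2:", count_exceeding(er5_average, combined), "of 5")
```

The single participant above the combined baseline exceeds it by one word per hour (736 vs. 735), so the choice of dry-run text largely determines the outcome.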
18
ER5 – Productivity per text
[Figure: words per minute (axis 0 to 35) for each text, one series per translator: C-TR1, C-TR2, C-TR3, G-TR4, G-TR5, G-TR6]
19
Non-quantitative trial results
• validated the general evaluation approach
– for a CAT tool, production time remains the best measure of the system’s assistance
– in-situ trials that replicate normal working conditions are indispensable
– reliance on trace file for accurate measurements and honest indication of users’ preferences
• Lessons for evaluation methodology
– need to take ‘learning curve effect’ into account
– need to assess difficulty of test texts
20
Users’ attitude to TT2
• concerted effort made to gather and analyse users’ comments & suggestions
– pop-up notepad added to TT2 GUI
• users resented having to make the same modifications to repeated sentences
– need to add full-sentence repetitions processing (TM)
• more generally: “Why can’t the system learn from my corrections?”
– on-line adaptive learning represents a difficult research challenge
21
Conclusions
• Target-text mediated IMT is a novel approach that has much to recommend it:
– when engines perform well, users appreciate the productivity gains it affords and full control of translation quality that it gives them
• Hopefully, TT2 will not be the last word
– what needs to be done to improve the system’s acceptance by professional TRs is quite clear
– as demand for HQ translation soars, there continues to be a real need for new tools to assist TRs and make them more productive
22
For more information on TransType:
• Visit our Web site (on-line demo):
http://rali.iro.umontreal.ca
• Contact me directly: