Profile The METIS Approach Future Work Evaluation METIS II Architecture METIS II, the continuation of the successful assessment project METIS I, is an

Profile

The METIS Approach

Future Work

Evaluation

METIS II Architecture

METIS II, the continuation of the successful assessment project METIS I, is an IST Programme, with a 3-year duration (01/10/2004 – 30/09/2007).

The METIS II consortium comprises the following partners:

Institute for Language & Speech Processing [ILSP] (co-ordinator) Katholieke Universiteit Leuven [KUL] Gesellschaft zur Förderung der Angewandten Informationsforschung [GFAI] Universitat Pompeu Fabra [UPF]

METIS II is a hybridhybrid system, combining various approaches to machine translation (rule-based, statistical, pattern-matching techniques). It makes use of readily readily availableavailable resourcesresources, such as bilingual dictionaries or basic NLP tools, and it can be easily customised to handle different source (SL) and target language (TL) tags.

Most importantly, however, METIS II is innovativeinnovative because it does not need bilingual corpora for the translation process, but exclusivelyexclusively relies on monolingual TL monolingual TL corporacorpora.

METIS II handles sequences both at sentence and sub-sentential level, achieving thus to exploit the recursiverecursive property of natural language.

METIS II employs a series of weightsweights, i.e. system parameters, in various phases of the translation process. Weights are associated with system resources and employed by the pattern-matching algorithm; they can be automaticallyautomatically adjusted to customise system performance.

Four (4) language pairs have been developed as yet, namely GreekGreek, DutchDutch, GermanGerman & SpanishSpanish EnglishEnglish.

METIS IIMETIS II: : Statistical Machine Translation Statistical Machine Translation using Monolingual Corporausing Monolingual Corpora (FP6-IST-003768)

Database Database ServerServer

Lexicon

BNC Clauses

BNC Chunks

Token Generation

Rules Final TranslationFinal Translation

NLPNLP NLP tools handle the SL input yielding an SL sequence annotated with grammatical & syntactic information.

LexicoLexiconn

LookuLookupp

The SL sequence is enhanced by translation equivalents & PoS info, thus resembling a TL pattern.

Core Core EnginEnginee

The core engine of METIS II system is fed with a sequence of TL-like patterns, handled by the pattern-matching algorithm. It proceeds in 2 stages involving wider and narrower contexts, thus generating a TL sequence.

Web Web InterfaInterfacece

The end user selects the preferred SL and enters the text to be translated.

Token Token GenerationGeneration

The token generation module receives as input a sequence of translated lemmas & their respective tags; it is responsible for the production of tokens out of lemmas.

Weights

Evaluation Setup

For the system evaluation an experimental corpus extracted from real texts, mainly from newspapers, was used. It consisted of 200 200 sentences, 5050 per language pair. The test sentences were of relative complexity, containing one to two clauses each and covered various syntactic phenomena such as word-order variation, NP structure, negation, modification etc.

The reference translations have been restricted to 33 and were produced by humans, while BLEUBLEU & NISTNIST metrics have been used for the evaluation.

Evaluation Results

Greek Results

Dutch Results

Fig. 1: Comparative

analysis of the score

ranges obtained for

METIS II and

SYSTRAN using the

BLEU metric

NIST

50 4945

4135

30 2821

49 4845

3531

22

115

210

4

43

0

10

20

30

40

50

60

Score range

Num

ber

of Sen

tence

s METIS II

Systran

BLEU

3230 29

22

16 1512

10

31 3129

10 95 4

2 2

88

23

0

5

10

15

20

25

30

35

Score range

Num

ber

of Sen

tence

s METIS II

Systran

Fig. 2: Comparative

analysis of the score

ranges obtained for

METIS II and

SYSTRAN using the

NIST metric

Fig. 3: Comparative

analysis of the

scores obtained for

METIS II and

SYSTRAN using the

BLEU metricNIST RESULTS

0,0000

1,0000

2,0000

3,0000

4,0000

5,0000

6,0000

7,0000

8,0000

9,0000

10,0000

best NIST avg NIST

VERBATIM

BASE

TRANSFER

NO S LEVEL

SYSTRAN

Fig. 4: Comparative

analysis of the scores

obtained for METIS II

and SYSTRAN using

the NIST metric

BLEU RESULTS

0,0000

0,0500

0,1000

0,1500

0,2000

0,2500

0,3000

0,3500

0,4000

0,4500

0,5000

best BLEU avg BLEU

VERBATIM

BASE

TRANSFER

NO S LEVEL

SYSTRAN

German Results

Fig. 8: Comparative


obtained for different

settings of METIS II and

SYSTRAN using the

NIST metric

Spanish Results

Fig. 7: Comparative


obtained for different

settings of METIS II and

SYSTRAN using the

BLEU metric

Future work involves further investigation of METIS II system architecture. More specifically, work towards the system optimisation includes the following:

Further system testing with a big number of test suites that will have more elaborate structures and deal with a wider range of phenomena Algorithm optimisation in terms of accuracy Automatic fine tuning of weights Implementation of a post-editor module

0,43

0,44

0,45

0,46

0,47

0,48

0,49

Metis-No op.

Metis-Insertion

Metis-Deletion

Metis-Permutation

Metis-All ops.

Systran

6,25

6,3

6,35

6,4

6,45

6,5

6,55

6,6

6,65

NIST

Metis-No op.

Metis-Insertion

Metis-Deletion

Metis-Permutation

Metis-All ops.

Systran

METIS-II Systran

0.26

0.265

0.27

0.275

0.28

0.285

0.29

0.295

0.3

0.305

0.31

BLEU

METIS-II Systran

4.7394

4.7396

4.7398

4.74

4.7402

4.7404

4.7406

4.7408

4.741

4.7412

NIST

Fig. 5: Comparative



and SYSTRAN using the

BLEU metric

Fig. 6: Comparative



and SYSTRAN using the

NIST metric

Documents

Profile The METIS Approach Future Work Evaluation METIS II Architecture METIS II, the continuation of the successful assessment project METIS I, is an