Machine Translation: Factored Models
Stephan Vogel, Spring Semester 2011



Overview

Factored Language Models
Multi-Stream Word Alignment
Factored Translation Models


Motivation

Vocabulary grows dramatically for morphology-rich languages

Looking at the surface word form does not take connections (morphological derivations) between words into account
Example: 'book' and 'books' are as unrelated as 'book' and 'sky'

Dependencies between words within a sentence are not well captured
Example: number or gender agreement

Singular: der alte Tisch (the old table)
Plural: die alten Tische (the old tables)

Consider a word as a bundle of factors: surface word form, stem, root, prefix, suffix, POS, gender marker, case marker, number marker, …
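The factor-bundle view can be sketched as a simple record type; the factor set below (surface, lemma, POS, number) is an illustrative choice, not a fixed inventory:

```python
from collections import namedtuple

# Sketch of the factor-bundle view of a word. The chosen factors are
# illustrative; real systems pick factors per language.
Factors = namedtuple("Factors", ["surface", "lemma", "pos", "number"])

book = Factors(surface="book", lemma="book", pos="NN", number="sg")
books = Factors(surface="books", lemma="book", pos="NN", number="pl")

# The surface forms differ, but the shared lemma factor relates the two words.
print(book.lemma == books.lemma)  # True
```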


Two solutions

Morphological decomposition into a stream of morphemes: compound-noun splitting, prefix-stem-suffix splitting

Words as a bundle of (parallel) factors

[Figure: a sentence w1 w2 w3 w4 … shown in two decompositions - as a morpheme stream (prefix stem suffix prefix stem suffix …) and as parallel factor streams (word, lemma, POS, morphology, word class)]


Questions

Which information is the most useful?

How to use this information? In the language model? In the translation model?

How to use it at training time? How to use it at decoding time?


Factored Models

Morphological preprocessing: a significant body of work

Factored language models: Kirchhoff et al.

Hierarchical lexicon: Nießen et al.

Bi-stream alignment: Zhao et al.

Factored translation models: Koehn et al.


Factored Language Model

Some papers:

Bilmes and Kirchhoff, 2003: Factored Language Models and Generalized Parallel Backoff

Duh and Kirchhoff, 2004: Automatic Learning of Language Model Structure

Kirchhoff and Yang, 2005: Improved Language Modeling for Statistical Machine Translation


Factored Language Model

Representation: a word is a bundle of K factors
w ≡ {f^1, f^2, …, f^K} = f^{1:K}

LM probability:
p(w_1, …, w_I) = p(f_1^{1:K}, …, f_I^{1:K}) = ∏_{i=1}^{I} p(f_i^{1:K} | f_1^{1:K}, …, f_{i-1}^{1:K})
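The chain-rule decomposition above can be sketched directly over factor bundles; the uniform conditional model below is a made-up stand-in for a trained FLM, just to show the product structure:

```python
import math

# Sketch of the factored-LM chain rule: a sentence of factor bundles scores
# as a product of conditional bundle probabilities.
def sentence_logprob(sentence, cond_prob):
    """log p(w_1..w_I) = sum_i log p(f_i^{1:K} | f_1^{1:K} .. f_{i-1}^{1:K})"""
    logp = 0.0
    for i in range(len(sentence)):
        logp += math.log(cond_prob(sentence[i], tuple(sentence[:i])))
    return logp

lp = sentence_logprob(
    [("the", "DET"), ("old", "ADJ"), ("table", "NN")],  # (surface, POS) bundles
    cond_prob=lambda bundle, history: 0.25,             # toy uniform model
)
print(round(lp, 4))  # 3 * log(0.25)
```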


Language Model Backoff

Smoothing by backing off
Backoff paths: in a standard LM there is a single path (drop the oldest word); in a factored LM the context can be reduced in many different ways (drop a whole word, or drop individual factors)


Choosing Backoff Paths

Different possibilities:
Fixed path
Choose the path dynamically during training
Choose multiple paths dynamically during training and combine the results (Generalized Parallel Backoff)

Many paths -> optimization problem; Duh & Kirchhoff (2004) use a genetic algorithm

Bilmes and Kirchhoff (2003) report LM perplexities
Kirchhoff and Yang (2005) use an FLM to rescore n-best lists generated by an SMT system: the 3-gram FLM is slightly worse than a standard 4-gram LM, and the combined LM does not outperform the standard 4-gram LM
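Generalized parallel backoff can be sketched as running several backoff paths over increasingly reduced contexts and combining their estimates; the probability table, the paths, and the choice of max as the combiner below are all illustrative assumptions, not a trained model:

```python
# Sketch of generalized parallel backoff in a factored LM. A backoff path is
# a list of contexts, most specific first; each path is followed until a
# context with an estimate is found, and the paths' results are combined.
def path_estimate(target, path, table):
    """Follow one backoff path: try successively reduced contexts."""
    for ctx in path:
        dist = table.get(ctx)
        if dist and target in dist:
            return dist[target]
    return None

def parallel_backoff(target, paths, table, combine=max):
    """Generalized parallel backoff: run several paths, combine the results."""
    ests = [e for p in paths
            if (e := path_estimate(target, p, table)) is not None]
    return combine(ests) if ests else 0.0

# The full context (lemma=alt, pos=ADJ) is unseen, so one path backs off to
# the POS factor alone, the other to the lemma factor alone.
table = {
    ("pos=ADJ",): {"tisch": 0.3},
    ("lemma=alt",): {"tisch": 0.5},
}
paths = [
    [("lemma=alt", "pos=ADJ"), ("pos=ADJ",)],    # drop the lemma factor
    [("lemma=alt", "pos=ADJ"), ("lemma=alt",)],  # drop the POS factor
]
print(parallel_backoff("tisch", paths, table))  # 0.5
```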


Hierarchical Lexicon

Morphological analysis: using GERCG, a constraint grammar parser for German, for lexical analysis and morphological and syntactic disambiguation

Build equivalence classes: group words which tend to translate into the same target word
Don't distinguish what does not need to be distinguished!
E.g. for nouns: gender is irrelevant, as are nominative, dative, and accusative; but genitive translates differently

Sonja Nießen and Hermann Ney, Toward hierarchical models for statistical machine translation of inflected languages. Proceedings of the workshop on data-driven methods in machine translation - Volume 14, 2001.


Hierarchical Lexicon

Equivalence classes at different levels of abstraction
Example: ankommen
Level n is the full analysis
Level n-1: drop "first person" -> group "ankomme", "ankommst", "ankommt"
Level n-2: drop the singular/plural distinction
…


Hierarchical Lexicon

Translation probability:
p(f | e) = Σ_{t_i^0} p(t_i^0 | e) · p(f | t_i^0, e)

p(f | t_i^0, e): probability of the word form taking all factors up to level i into account

Assumption: it does not depend on e, and the word form follows unambiguously from the tags

Linear combination of the level-wise models p^i:
p(f | e) = λ_n p^n(f | e) + … + λ_0 p^0(f | e)
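The linear combination of the level-wise lexicon models can be sketched as a weighted sum; the level probabilities and interpolation weights below are made-up numbers, not trained values:

```python
# Sketch of the hierarchical lexicon's linear combination
# p(f|e) = lambda_n * p^n(f|e) + ... + lambda_0 * p^0(f|e).
def interpolate(level_probs, weights):
    assert abs(sum(weights) - 1.0) < 1e-9  # weights form a distribution
    return sum(w * p for w, p in zip(weights, level_probs))

p_levels = [0.10, 0.25, 0.40]  # p^n (full analysis) ... p^0 (most abstract)
lam = [0.5, 0.3, 0.2]          # illustrative interpolation weights
print(round(interpolate(p_levels, lam), 6))  # 0.205
```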


Multi-Stream Word Alignment

Use multiple annotations: stem, POS, …
Consider each annotation as an additional stream or tier
Use generative alignment models: model each stream, but tie the streams through the alignment

Example: Bi-Stream HMM word alignment (Zhao et al., 2005)


Bi-Stream HMM Alignment

HMM: relative word position as distortion component (can be conditioned on word classes)
Forward-backward algorithm for training

[Figure: HMM alignment trellis - source words f_{j-1}, f_j emitted from the aligned target words e_{a_j}]

P(f_1^J | e_1^I) = Σ_{a_1^J} ∏_{j=1}^{J} P(f_j | e_{a_j}) · P(a_j | a_{j-1})
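The sum over alignments above is computed efficiently with the forward algorithm; the sketch below does this for a two-word pair with toy lexicon, jump, and initial probabilities (all made up):

```python
# Sketch of the forward algorithm for the HMM alignment likelihood
# P(f_1^J | e_1^I) = sum_{a_1^J} prod_j P(f_j | e_{a_j}) * P(a_j | a_{j-1}).
def hmm_likelihood(f, e, lex, jump, init):
    I = len(e)
    # alpha[i] = probability of emitting f_0..f_j with a_j = i
    alpha = [init[i] * lex[(f[0], e[i])] for i in range(I)]
    for j in range(1, len(f)):
        alpha = [
            sum(alpha[ip] * jump[(ip, i)] for ip in range(I)) * lex[(f[j], e[i])]
            for i in range(I)
        ]
    return sum(alpha)

# Toy probabilities for a single German-English word pair.
e = ["the", "house"]
f = ["das", "haus"]
lex = {("das", "the"): 0.8, ("das", "house"): 0.1,
       ("haus", "the"): 0.1, ("haus", "house"): 0.9}
jump = {(0, 0): 0.3, (0, 1): 0.7, (1, 0): 0.4, (1, 1): 0.6}
init = [0.6, 0.4]
print(round(hmm_likelihood(f, e, lex, jump, init), 6))  # 0.34
```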


Bi-Stream HMM Alignment

Bi-Stream HMM: assume the hidden alignment generates 2 data streams: words and word class labels

[Figure: HMM trellis with two emission streams per source position - the words f_{j-1}, f_j (stream 1) and the word class labels g_{j-1}, g_j (stream 2), both generated from the aligned target word e_{a_j}]


Second Stream: Bilingual Word Clusters

Ideally, we want word classes which group the translations of the words in a source-language cluster into a cluster on the target side

Bilingual word clusters (Och, 1999): assume the monolingual clusters fixed first, then optimize the clusters for the other language (mkcls in GIZA++)

Bilingual word spectral clusters: eigen-structure analysis, K-means or single-linkage clustering

Other word clusters: LDA (Blei, 2000), co-clusters, etc.


Bi-Stream HMM with Word Clusters

Evaluating word alignment accuracy with F-measure:
Bi-stream HMM (Bi-HMM) is better than HMM
Bilingual word-spectral clusters are better than traditional ones
Helps more for small training data

[Figure: F-measure bar charts comparing HMM, Bi-HMM with monolingual clusters (Bi-HMM-m*), and Bi-HMM with bilingual spectral clusters (Bi-HMM-b*), in both translation directions (F2E, E2F), on TreeBank and FBIS training data]


Factored Translation Models

Paper: Koehn and Hoang, Factored Translation Models, EMNLP 2007

Factored translation models as an extension of phrase-based SMT
Interesting for translating into or between morphology-rich languages
Experiments for English-German, English-Spanish, English-Czech

(I follow that paper; the description on the Moses web site is nearly identical.
See http://www.statmt.org/moses/?n=Moses.FactoredModels
Example also from: http://www.inf.ed.ac.uk/teaching/courses/mt/lectures/factored-models.pdf)


Factored Model

Analysis as preprocessing
Need to specify the transfer
Need to specify the generation

[Figure: factored representation and factored model - input and output sides each carry parallel factor streams (word, lemma, POS, morphology, word class, …); the factored model connects them through transfer and generation steps]


Transfer

Mapping individual factors: as with non-factored models
Example: Haus -> house, home, building, shell

Mapping combinations of factors: new vocabulary as the Cartesian product of the vocabularies of the individual factors, e.g. NN and singular -> NN|singular; map these combinations
Example: NN|plural|nominative -> NN|plural, NN|singular

The number of factors on the source and target side can differ
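The combination-of-factors idea can be sketched with a Cartesian product over factor vocabularies and a lookup table for the mapped combinations; the table entry mirrors the slide's example and is not a trained model:

```python
from itertools import product

# Sketch of the transfer step over factor combinations: the working
# vocabulary is the Cartesian product of the individual factor vocabularies.
pos_tags = ["NN"]
numbers = ["singular", "plural"]
combos = [f"{p}|{n}" for p, n in product(pos_tags, numbers)]
print(combos)  # ['NN|singular', 'NN|plural']

# Mapping combinations, as in: NN|plural|nominative -> NN|plural, NN|singular
transfer = {"NN|plural|nominative": ["NN|plural", "NN|singular"]}
print(transfer["NN|plural|nominative"])
```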


Generation

Generate the surface form from the factors. Examples:
house|NN|plural -> houses
house|NN|singular -> house
house|VB|present|3rd-person -> houses
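A generation step is essentially a table from factor bundles to surface forms; in practice the table is trained on the target side, but the sketch below hand-fills it with the slide's examples:

```python
# Sketch of a generation step as a table from factor bundles to surface
# forms (hand-filled here for illustration, not trained).
generation = {
    ("house", "NN", "plural"): "houses",
    ("house", "NN", "singular"): "house",
    ("house", "VB", "present", "3rd-person"): "houses",
}
print(generation[("house", "NN", "plural")])  # houses
```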


Example including all Steps

German word: Häuser
Analysis: häuser|haus|NN|plural|nominative|neutral

Translation
Mapping lemma: { ?|house|?|?|?, ?|home|?|?|?, ?|building|?|?|? }
Mapping morphology: { ?|house|NN|plural, ?|house|NN|singular, ?|home|NN|plural, ?|building|NN|plural }

Generation
Generating surface forms: { houses|house|NN|plural, house|house|NN|singular, homes|home|NN|plural, buildings|building|NN|plural }


Training the Model

Parallel data needs to be annotated -> preprocessing
Source- and target-side annotation are typically independent of each other
Some work on 'coupled' annotation, e.g. inducing word classes through clustering with mkcls, or morphological analysis of Arabic conditioned on the English side (Linh)

Word alignment: operate on the surface form only, use multi-stream alignment (example: Bi-Stream HMM), or use discriminative alignment (example: CRF approach)
Estimate translation probabilities: collect counts for factors or combinations of factors

Phrase alignment: extract from the word alignment using standard heuristics, estimate various scoring functions


Training the Model

Word alignment (symmetrized)


Training the Model

Extract phrase: natürlich hat john # naturally john has


Training the Model

Extract phrase for other factors: ADV V NNP # ADV NNP V


Training the Generation Steps

Train on the target side of the corpus; can use additional monolingual data
Map factor(s) to factor(s), e.g. word -> POS and POS -> word

Example: The/DET big/ADJ tree/NN
Count collection:
count( the, DET )++
count( big, ADJ )++
count( tree, NN )++

Probability distributions (maximum likelihood estimates):
p( the | DET ) and p( DET | the )
p( big | ADJ ) and p( ADJ | big )
p( tree | NN ) and p( NN | tree )
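The count collection and maximum-likelihood estimation above can be sketched directly on the slide's toy corpus:

```python
from collections import Counter

# Sketch of training a word<->POS generation step: collect (word, tag)
# counts from the tagged target side, then normalize in both directions
# (maximum likelihood). The corpus is the slide's toy example.
pairs = [("the", "DET"), ("big", "ADJ"), ("tree", "NN")]
pair_c = Counter(pairs)
word_c = Counter(w for w, _ in pairs)
tag_c = Counter(t for _, t in pairs)

def p_word_given_tag(w, t):
    return pair_c[(w, t)] / tag_c[t]

def p_tag_given_word(t, w):
    return pair_c[(w, t)] / word_c[w]

print(p_word_given_tag("the", "DET"), p_tag_given_word("DET", "the"))  # 1.0 1.0
```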


Combination of Components

Log-linear combination of feature functions:
p(e | f) = (1/Z) exp Σ_{i=1}^{n} λ_i h_i(e, f)

A sentence translation is generated from a set of phrase pairs (ē_j, f̄_j)

Translation component: feature functions h defined over phrase pairs
h(e, f) = Σ_j h(ē_j, f̄_j)

Generation component: feature functions h defined over the output words e_k
h(e) = Σ_k h(e_k)
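The log-linear combination can be sketched by exponentiating weighted feature sums and normalizing over the candidate set; the feature values and weights below are made up:

```python
import math

# Sketch of the log-linear model p(e|f) = (1/Z) exp(sum_i lambda_i h_i(e,f)):
# each candidate gets feature values h_i, the weighted sum is exponentiated,
# and Z normalizes over the candidates.
def unnormalized(feats, lam):
    return math.exp(sum(l * h for l, h in zip(lam, feats)))

candidates = {"e1": [1.0, 0.5], "e2": [0.2, 1.5]}  # toy feature vectors
lam = [0.7, 0.3]                                   # toy weights
scores = {e: unnormalized(h, lam) for e, h in candidates.items()}
Z = sum(scores.values())
probs = {e: s / Z for e, s in scores.items()}
print(round(sum(probs.values()), 6))  # 1.0
```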


Decoding with Factored Models

Instead of just one phrase table, now multiple tables
Important: all mappings operate on the same segmentation of the source sentence into phrases
More target translations are now possible

Example: … beat … can be a verb or a noun
Translations: beat # schlag (NN or VB), schlagen (VB), Rhythmus (NN)

Non-factored: … beat … -> { schlag, schlagen, Rhythmus }
Factored: … beat … -> { schlag|NN|Nom, schlag|NN|Dat, schlag|NN|Akk, schlag|VB|1-person|singular, … }


Decoding with Factored Models

Combinatorial explosion -> harsher pruning needed

Notice: translation-step features and generation-step features depend only on the phrase pair
Alternative translations can be generated and inserted into the translation lattice before the best-path search begins (building a fully expanded phrase table?)
Features can be calculated and used for translation model pruning (observation pruning)

Pruning in the Moses decoder:
Non-factored model: default is 20 alternatives
Factored model: default is 50 alternatives
Increase in decoding time: factor 2-3


Factored LMs in Moses

The training script allows specifying multiple LMs on different factors, with individual orders (history lengths)

Example:
--lm 0:3:factored-corpus/surface.lm   // surface form 3-gram LM
--lm 2:3:factored-corpus/pos.lm       // POS 3-gram LM

This generates different LMs on the different factors, not a factored LM
The different LMs are used as independent features in the decoder
No backing off between different factors


Summary

Factored models:
Deal with the large vocabulary in LMs for morphology-rich languages
'Connect' words, thereby getting better model estimates
Explicitly model morphological dependencies within sentences

Factored models are not always called factored models: hierarchical model (lexicon), multi-stream model (alignment)

Factored LMs were introduced for ASR: many backoff paths

Moses decoder: allows factored TMs and factored LMs, but no backing off between factors, only log-linear combination