Phrase-based (the latter half)

Graph Structure

・ Use a search graph in the phrase-based model
・ Weighted directed acyclic graph G = <Ф, V, E, s, g, A>
  Ф: set of phrase pairs; weight of a phrase pair = feature vector h(・) ・ weight vector
  V: vertices (partial hypotheses)
  E: edges (they carry the weights of a route), E ⊆ V × V × Ф × A
  A: set of weights

Graph Structure

• out(v): set of edges that go out from vertex v
• in(v): set of edges that head into vertex v
-> Phrase pairs are linked by <out(), in()>
In Fig. 5.8, the phrase pair <へ行った, I went to> is linked by out() = <-----, 0, <s>> and in() = <-- ・・・, 9, went to>

Graph Structure

• If Ѱ = (e1, e2, …, en) is a route from the start to any vertex, with head(ei) = tail(ei+1), then Ѱ determines:
  ・ the source-language phrase sequence
  ・ the target-language phrase sequence
  ・ the route weight

Graph Structure

• In Fig. 5.8, for the route Start -> <行った, He went> -> <へ, to> -> <領事館, the consulate>
-> the translation of the source-language phrases 「行った」「へ」「領事館」 is “He went to the consulate”
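The graph definitions on these slides can be sketched in code. The following is a minimal illustration, not the book's implementation: the names `Edge`, `SearchGraph`, `in_`, the vertex labels `v1`..`v3`, and all weights are hypothetical, and the route weight is assumed to be a sum of log-domain edge weights. Only the three phrase pairs come from the Fig. 5.8 example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    tail: str      # vertex the edge goes out from
    head: str      # vertex the edge heads into
    source: str    # source-language phrase of the phrase pair
    target: str    # target-language phrase of the phrase pair
    weight: float  # illustrative value, not taken from the figure


class SearchGraph:
    def __init__(self, edges):
        self.edges = list(edges)

    def out(self, v):
        """Edges that go out from vertex v."""
        return [e for e in self.edges if e.tail == v]

    def in_(self, v):
        """Edges that head into vertex v ("in" is a Python keyword)."""
        return [e for e in self.edges if e.head == v]


def is_route(route):
    """A route is a chain of edges with head(e_i) == tail(e_{i+1})."""
    return all(a.head == b.tail for a, b in zip(route, route[1:]))


def target_sentence(route):
    """Concatenate the target-language phrases along the route."""
    return " ".join(e.target for e in route)


def route_weight(route):
    """Assumed here: edge weights are log scores, so they add up."""
    return sum(e.weight for e in route)


ROUTE = [
    Edge("start", "v1", "行った", "He went", -1.0),
    Edge("v1", "v2", "へ", "to", -0.5),
    Edge("v2", "v3", "領事館", "the consulate", -1.5),
]
GRAPH = SearchGraph(ROUTE)
print(target_sentence(ROUTE))
```

Storing the phrase pair on the edge mirrors E ⊆ V × V × Ф × A: an edge is a pair of vertices plus a phrase pair plus a weight.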

Semiring

• A set R equipped with two binary operations: addition “+” and multiplication “×”

• Associative: a + (b + c) = (a + b) + c, a × (b × c) = (a × b) × c
• Commutative: a + b = b + a
• Distributive: a × (b + c) = (a × b) + (a × c)
• Identity elements: 0 + a = a + 0 = a; 1 × a = a × 1 = a; 0 × a = a × 0 = 0
• Additive inverses and multiplicative inverses are not required

Semiring

• The tropical semiring in Table 5.1 is used to solve the route-weight maximization problem in the decoder

Semiring   A           ⊕     ⊗    0     1
Tropical   ℝ ∪ {−∞}    max   +    −∞    0
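To make the tropical row concrete, here is a small sketch that encodes a semiring as a record of its two operations and two identity elements and checks the semiring laws on a handful of sample values. The names `Semiring`, `TROPICAL`, and `check_axioms` are hypothetical, and the check is a spot test on samples, not a proof.

```python
import itertools
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Semiring:
    plus: Callable[[float, float], float]   # the ⊕ operation
    times: Callable[[float, float], float]  # the ⊗ operation
    zero: float                             # identity of ⊕, annihilator of ⊗
    one: float                              # identity of ⊗


# Tropical (max, +) semiring: ⊕ = max, ⊗ = +, zero = -inf, one = 0
TROPICAL = Semiring(plus=max, times=lambda a, b: a + b,
                    zero=float("-inf"), one=0.0)


def check_axioms(sr, samples):
    """Spot-check the semiring laws on every triple of sample values."""
    for a, b, c in itertools.product(samples, repeat=3):
        assert sr.plus(a, sr.plus(b, c)) == sr.plus(sr.plus(a, b), c)
        assert sr.times(a, sr.times(b, c)) == sr.times(sr.times(a, b), c)
        assert sr.plus(a, b) == sr.plus(b, a)
        # Distributive law: a ⊗ (b ⊕ c) = (a ⊗ b) ⊕ (a ⊗ c)
        assert sr.times(a, sr.plus(b, c)) == sr.plus(sr.times(a, b),
                                                     sr.times(a, c))
    for a in samples:
        assert sr.plus(sr.zero, a) == a
        assert sr.times(sr.one, a) == a == sr.times(a, sr.one)
        assert sr.times(sr.zero, a) == sr.zero
    return True


print(check_axioms(TROPICAL, [float("-inf"), -3.0, 0.0, 1.0, 2.5]))
```

The sample values are chosen so that their sums are exact in floating point; with arbitrary floats the associativity check on + could fail by rounding alone.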

Semiring

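• In the weighted directed graph G, let Ѱ = (e1, …, en) be a route from the start point to the end point for the source-language input f

• Score of Ѱ = ⊗-product of the weights along the route = ⊗_{e ∈ Ѱ} weight(e)
-> The problem of maximizing this score is max_Ѱ ⊗_{e ∈ Ѱ} weight(e) = ⊕_Ѱ ⊗_{e ∈ Ѱ} weight(e)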

Semiring

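• In Fig. 5.7, line 11, the max applied when updating Q is the additive operation ⊕, carried out for each vertex tail(e) = s of G
• Because a semiring satisfies the distributive law
-> the weight ⊕⊗(…) of any vertex in V can be factored and built up step by step with ⊗ from the weights of the preceding vertices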

Semiring

• Forward-backward algorithm: finds the maximum route weight in the graph structure

• topological order(G): list of the vertices of graph G arranged in topological order

• external variable

Semiring

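FORWARD(G)
• For each vertex v in topological order(G) and each edge e ∈ in(v): combine the forward weight of tail(e) with the weight of e by ⊗, then accumulate the result into the forward weight of v by ⊕

[Figure: forward step from Start through tail(e) along edge e, extended with ⊗ and merged with ⊕]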
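The forward pass can be sketched as follows under the tropical semiring. This is an illustrative version, not a transcription of the book's pseudocode: the names `topological_order`, `forward`, `alpha`, and the toy graph are assumptions, and edges are plain (tail, head, weight) triples.

```python
from collections import defaultdict, deque


def topological_order(vertices, edges):
    """Kahn's algorithm; edges are (tail, head, weight) triples."""
    indegree = {v: 0 for v in vertices}
    successors = defaultdict(list)
    for tail, head, _ in edges:
        successors[tail].append(head)
        indegree[head] += 1
    queue = deque(v for v in vertices if indegree[v] == 0)
    order = []
    while queue:
        v = queue.popleft()
        order.append(v)
        for h in successors[v]:
            indegree[h] -= 1
            if indegree[h] == 0:
                queue.append(h)
    return order


def forward(vertices, edges, start, plus=max, times=lambda a, b: a + b,
            zero=float("-inf"), one=0.0):
    """alpha[v] = ⊕ over routes start..v of the ⊗-product of edge weights."""
    incoming = defaultdict(list)
    for tail, head, w in edges:
        incoming[head].append((tail, w))
    alpha = {v: zero for v in vertices}
    alpha[start] = one
    for v in topological_order(vertices, edges):
        for tail, w in incoming[v]:
            # ⊗ extends the best route into tail(e) by e; ⊕ merges edges
            alpha[v] = plus(alpha[v], times(alpha[tail], w))
    return alpha


VERTICES = ["s", "a", "b", "g"]
EDGES = [("s", "a", -1.0), ("s", "b", -2.0), ("a", "g", -3.0),
         ("b", "g", -1.0), ("a", "b", -0.5)]
ALPHA = forward(VERTICES, EDGES, "s")
print(ALPHA["g"])  # best route weight from s to g
```

Passing `plus`, `times`, `zero`, and `one` as parameters keeps the same loop usable with other semirings; the defaults give the max-plus case.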

Semiring

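BACKWARD(G)
• For each vertex v in inverse topological order(G) and each edge e ∈ out(v): combine the weight of e with the backward weight of head(e) by ⊗, then accumulate the result into the backward weight of v by ⊕

[Figure: backward step from Goal through head(e) along edge e, extended with ⊗ and merged with ⊕]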
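A matching sketch of the backward pass, again under the tropical semiring. The names `inverse_topological_order`, `backward`, `beta`, and the toy graph are hypothetical; the inverse topological order is obtained here by depth-first post-order, which is one of several ways to get it.

```python
from collections import defaultdict


def inverse_topological_order(vertices, edges):
    """Depth-first post-order: a vertex is listed after all its successors."""
    successors = defaultdict(list)
    for tail, head, _ in edges:
        successors[tail].append(head)
    seen, order = set(), []

    def visit(v):
        if v in seen:
            return
        seen.add(v)
        for h in successors[v]:
            visit(h)
        order.append(v)

    for v in vertices:
        visit(v)
    return order


def backward(vertices, edges, goal, plus=max, times=lambda a, b: a + b,
             zero=float("-inf"), one=0.0):
    """beta[v] = ⊕ over routes v..goal of the ⊗-product of edge weights."""
    outgoing = defaultdict(list)
    for tail, head, w in edges:
        outgoing[tail].append((head, w))
    beta = {v: zero for v in vertices}
    beta[goal] = one
    for v in inverse_topological_order(vertices, edges):
        for head, w in outgoing[v]:
            # ⊗ prepends edge e to the best route out of head(e)
            beta[v] = plus(beta[v], times(w, beta[head]))
    return beta


VERTICES = ["s", "a", "b", "g"]
EDGES = [("s", "a", -1.0), ("s", "b", -2.0), ("a", "g", -3.0),
         ("b", "g", -1.0), ("a", "b", -0.5)]
BETA = backward(VERTICES, EDGES, "g")
print(BETA["s"])  # best route weight from s to g
```

On a graph with a single start and goal, the backward weight of the start equals the forward weight of the goal, which is a convenient sanity check.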

Semiring

For the problem of choosing the optimal translation from a search space expressed as a weighted directed graph G: tropical semiring + forward algorithm -> Viterbi algorithm

k-best

• Besides the forward-backward algorithm, a k-best algorithm is used to find the k routes with the best weights

• Dijkstra’s algorithm: for the single-source shortest path problem

• Eppstein’s algorithm: finds the k shortest paths efficiently by using heaps

k-best

• Assume the problem uses the tropical semiring and the backward algorithm has been run
• Calculate the weights and choose the maximum
• Fig. 5.10 algorithm
  ・ cand: priority queue
  ・ < , s>: partial route
  ・ < , >: partial route with its vertex and an edge in out()
  ・ D: set of < , >

k-best

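• k = 1: initialize cand
• Optimize the weights of the partial route and the whole route
• Repeat k times:
  ・ take the optimal < , s> out of cand (a heap) and register it in D
  ・ choose edges from out() and insert the extended partial routes into cand

[Figure: cand, D, the whole route, and the optimal route]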
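One way to realize the cand / D loop is a best-first search in which each partial route is ranked by its own weight plus the backward weight of its last vertex. The sketch below follows that idea and is not claimed to match Fig. 5.10 line by line: `k_best`, the toy graph, and the precomputed `BETA` table are assumptions, and routes are stored as vertex tuples for brevity.

```python
import heapq
from collections import defaultdict
from itertools import count


def k_best(edges, start, goal, beta, k):
    """Return up to k (weight, route) pairs in descending weight order."""
    outgoing = defaultdict(list)
    for tail, head, w in edges:
        outgoing[tail].append((head, w))
    tie = count()  # tie-breaker so the heap never compares routes
    # heapq is a min-heap, so priorities are negated scores
    cand = [(-beta[start], next(tie), 0.0, (start,))]
    done = []      # plays the role of D
    while cand and len(done) < k:
        _, _, weight, route = heapq.heappop(cand)
        v = route[-1]
        if v == goal:
            done.append((weight, route))
            continue
        for head, w in outgoing[v]:
            if beta[head] == float("-inf"):
                continue  # the goal cannot be reached from here
            new_weight = weight + w
            # Priority = weight so far ⊗ best possible completion
            heapq.heappush(cand, (-(new_weight + beta[head]), next(tie),
                                  new_weight, route + (head,)))
    return done


EDGES = [("s", "a", -1.0), ("s", "b", -2.0), ("a", "g", -3.0),
         ("b", "g", -1.0), ("a", "b", -0.5)]
# Backward weights of the toy graph, computed beforehand (max-plus)
BETA = {"s": -2.5, "a": -1.5, "b": -1.0, "g": 0.0}
TWO_BEST = k_best(EDGES, "s", "g", BETA, k=2)
print(TWO_BEST)
```

Because the backward weight is the exact best completion, complete routes leave the queue in order of their true weight, so the loop can stop as soon as k of them have been registered.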

Limitation of Search Space

• If the search space is large because arbitrary reordering is allowed, the computational cost of the decoding algorithm becomes massive
-> constraints are necessary:
  ・ Distortion limit (constraint)
  ・ Reordering limit (constraint)

Distortion Constraint

• Set an upper limit d on the distortion distance between consecutive phrase pairs
• Rationale: a large distortion leads to a large penalty and therefore a small model score, so such hypotheses can be skipped
• For language pairs without much reordering, the distortion constraint is very effective
• If d = 0: no skipping, translation proceeds strictly from left to right -> monotone translation

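Distortion Constraint

• Constraint for the case where some phrases are still untranslated before the end point
  ・ one position is that of the first phrase of the source language, the other is the first position of the phrase being translated
  ・ If the condition () on these positions holds, add d
・ IBM Constraint

[Figure: source phrases with start_k and end_k of phrase k; a region marked "no need to examine"]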
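A distortion limit can be checked with a few lines of code. The sketch assumes the commonly used distance |start_k − end_{k−1} − 1| over 0-based source positions, which may differ in detail from the definition on the slide; `distortion` and `allowed_next_phrases` are hypothetical names.

```python
def distortion(prev_end, next_start):
    """Number of source positions jumped over between two phrases.

    prev_end is the last source position of the previous phrase
    (-1 before anything is translated); next_start is the first
    source position of the next phrase.
    """
    return abs(next_start - prev_end - 1)


def allowed_next_phrases(prev_end, candidate_starts, d):
    """Keep only the start positions within the distortion limit d."""
    return [s for s in candidate_starts if distortion(prev_end, s) <= d]


# Previous phrase ended at position 2; candidates start at 0, 3, 4, 6
print(allowed_next_phrases(2, [0, 3, 4, 6], d=0))  # monotone: only 3
print(allowed_next_phrases(2, [0, 3, 4, 6], d=1))
```

With d = 0 only the immediately following position survives, which is the monotone translation case from the previous slide.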

Beam Search

・ Prune partial hypotheses that are unlikely to be used and keep only those with high scores, to reduce computation
・ Group the vertices of the search graph and, within each group, prune the partial hypotheses with low scores

Beam Search

[Figure: vertices of the search graph grouped together; legend: partial hypotheses pruned, partial hypotheses chosen]

Beam Search

Kinds of grouping and pruning:
- Coverage vector grouping
- Cardinality grouping
- Beam width pruning
- Histogram pruning
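The grouping and pruning steps can be illustrated on a toy list of hypotheses. This sketch makes several assumptions: a hypothesis is a (log score, coverage) pair with coverage as a tuple of booleans, groups are formed by the number of covered source words, beam width pruning keeps everything within a fixed margin of the best score, and histogram pruning keeps a fixed number of hypotheses. All function names are hypothetical.

```python
from collections import defaultdict


def group_by_cardinality(hyps):
    """Group (score, coverage) hypotheses by number of covered words."""
    groups = defaultdict(list)
    for score, coverage in hyps:
        groups[sum(coverage)].append((score, coverage))
    return dict(groups)


def beam_width_prune(group, width):
    """Keep hypotheses whose log score is within `width` of the best."""
    if not group:
        return []
    best = max(score for score, _ in group)
    return [(s, c) for s, c in group if s >= best - width]


def histogram_prune(group, max_size):
    """Keep at most `max_size` hypotheses, highest scores first."""
    return sorted(group, key=lambda h: h[0], reverse=True)[:max_size]


HYPS = [
    (-1.0, (True, False, False)),
    (-1.5, (False, True, False)),
    (-4.0, (False, False, True)),
    (-2.0, (True, True, False)),
]
GROUPS = group_by_cardinality(HYPS)
print(beam_width_prune(GROUPS[1], width=1.0))
print(histogram_prune(GROUPS[1], max_size=1))
```

The two pruning criteria are often applied together: the margin removes clearly bad hypotheses, and the size cap bounds the work per group.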

Heuristic Function

• Prevents a partial hypothesis from being pruned merely because of the part that has not been translated yet
• Adds a predicted score for the rest of the route, as in A* search, so that the route with the maximum score is found
• -> can reduce search errors
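One common way to obtain such a predicted score is a table of optimistic scores for every untranslated source span, filled by dynamic programming. The sketch below shows that technique as an assumption about what the slide's heuristic could look like; `future_scores`, `estimate`, and the span scores are hypothetical.

```python
def future_scores(n, span_score):
    """best[i][j]: optimistic log score for source words i..j-1.

    span_score maps (i, j) to the best phrase score for that span;
    spans without an entry can only be covered by splitting.
    """
    neg_inf = float("-inf")
    best = [[neg_inf] * (n + 1) for _ in range(n + 1)]
    for length in range(1, n + 1):
        for i in range(0, n - length + 1):
            j = i + length
            best[i][j] = span_score.get((i, j), neg_inf)
            for k in range(i + 1, j):
                # Either one phrase covers the span or two sub-spans do
                best[i][j] = max(best[i][j], best[i][k] + best[k][j])
    return best


def estimate(coverage, best):
    """Sum the table entries over each maximal untranslated span."""
    total, i, n = 0.0, 0, len(coverage)
    while i < n:
        if coverage[i]:
            i += 1
            continue
        j = i
        while j < n and not coverage[j]:
            j += 1
        total += best[i][j]
        i = j
    return total


SPAN_SCORE = {(0, 1): -1.0, (1, 2): -2.0, (2, 3): -1.5,
              (0, 2): -2.5, (1, 3): -4.0}
BEST = future_scores(3, SPAN_SCORE)
print(estimate((True, False, False), BEST))
```

Adding this estimate to the score of a partial hypothesis before pruning makes hypotheses that translated the easy part first comparable with those that translated the hard part first.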

Pre-reordering Method

Translation between languages that have significantly different grammatical structures
• Pre-reordering rule
• Pre-reordering model
• Pre-reordering learning

Pre-reordering Rule

• Based on the tree from syntactic analysis, reorder the source sentence into target-language word order
• Rule based on head-driven phrase structure grammar (HPSG):
  - Syntactic analysis
  - Move the subjects back

Pre-reordering Model

• The source language must have a syntactic analysis tool and a morphological analysis tool
• Bilingual data are necessary
• The probabilities of the extracted pre-reordering patterns are estimated by maximum-likelihood estimation (MLE)
• Suitable pre-reordering patterns are chosen based on the parts of speech from morphological analysis, or on word classes obtained by clustering
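Maximum-likelihood estimation of pattern probabilities amounts to relative-frequency counting. The sketch below assumes each observation is a (context, pattern) pair, for example a pair of part-of-speech tags and whether the two children were swapped; the function name, the tags, and the counts are all made up for illustration.

```python
from collections import Counter, defaultdict


def estimate_pattern_probs(observations):
    """P(pattern | context) = count(context, pattern) / count(context)."""
    pair_counts = Counter(observations)
    context_counts = Counter(context for context, _ in observations)
    probs = defaultdict(dict)
    for (context, pattern), c in pair_counts.items():
        probs[context][pattern] = c / context_counts[context]
    return dict(probs)


def best_pattern(probs, context):
    """Choose the most probable reordering pattern for a context."""
    return max(probs[context], key=probs[context].get)


OBSERVATIONS = (
    [(("VB", "NP"), "swap")] * 3
    + [(("VB", "NP"), "keep")] * 1
    + [(("JJ", "NN"), "keep")] * 2
)
PROBS = estimate_pattern_probs(OBSERVATIONS)
print(PROBS[("VB", "NP")])
print(best_pattern(PROBS, ("VB", "NP")))
```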

Pre-reordering Learning

• For language pairs without any syntactic analysis tools or morphological analysis tools
• A provisional tree structure is generated automatically from the syntactic analysis result
• Tree nodes are divided into two labels: reordering label [X] and no-reordering label <X>
• The reordering model is formulated as a linear ordering problem (LOP), which is solved approximately to build the parse tree
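The linear ordering problem asks for the permutation that maximizes the sum of pairwise preference scores. The sketch below solves a three-item instance exactly by enumerating all permutations, which only works for tiny inputs; practical systems rely on approximate search, as the slide notes. The name `solve_lop_brute_force` and the preference values are hypothetical.

```python
from itertools import permutations


def order_score(order, pref):
    """Sum pref[(a, b)] over every pair where a is placed before b."""
    return sum(pref[(order[i], order[j])]
               for i in range(len(order))
               for j in range(i + 1, len(order)))


def solve_lop_brute_force(items, pref):
    """Exact LOP solution by enumeration (factorial time, toy use only)."""
    best = max(permutations(items), key=lambda o: order_score(o, pref))
    return best, order_score(best, pref)


# pref[(a, b)]: score for placing word a before word b (made-up values)
PREF = {(0, 1): 2.0, (1, 0): 0.5,
        (0, 2): 1.0, (2, 0): 1.5,
        (1, 2): 0.2, (2, 1): 3.0}
BEST_ORDER, BEST_SCORE = solve_lop_brute_force([0, 1, 2], PREF)
print(BEST_ORDER, BEST_SCORE)
```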
