26
Graph Structure Use search graph in phrase-based model At weighted acyclic directed graph G < Ф,V,E,s,g,> Ф : phrase pair sets =feature vector h( ) weight V: vertex partial hypotheses E:edges weight of route E ⊆ V×V× Ф×A A: weight sets

Phrase-based(the latter half)

Embed Size (px)

Citation preview

Page 1: Phrase-based(the latter half)

Graph Structure

・ Use search graph in phrase-based model ・ At weighted acyclic directed graph G < Ф,V,E,s,g,> Ф : phrase pair sets =feature vector h( ・ ) ・ weight V: vertex partial hypotheses E:edges weight of route E ⊆ V×V× Ф×A A: weight sets

Page 2: Phrase-based(the latter half)

Graph Structure

• out()= edge sets which go out from vertex • in() = : edge sets which head to vertex ->Phrase pairs are linked by <out(), in()>At figure 5.8, phrase pair <へ行った , I went to> is linked by out() = <-----,0,<s>> and in()=<-- ・・・ ,9,went to>

𝑣

𝑣

Page 3: Phrase-based(the latter half)

Graph Structure

• If Ѱ=(, ,…, ): rout from start to any vertexs, head()=tail(), then

Source language phrase sets: Target language phrase sets: Route weight: =

Page 4: Phrase-based(the latter half)

Graph Structure

• In Fig.5.8, for the route

-> the parallel of word sets of source language 「行った」「へ」「領事館」 is “He went to the consulate”

Start

<行った ,He went>

<へ ,to><領事館 ,

the consulate>

Page 5: Phrase-based(the latter half)

Semiring

• set R equipped with two binary operations addition“ + ” and multiplication “ × ”

• Associative: a+(b+c)=(a+b)+c, a×(b×c)=(a×b)×c• Commutative: a+b=b+a• Distributional: a×(b+c)=(a×b)+(a×c)• Additive inverse, multiplicative inverse 0+a=a+0=a; 1×a=a×1=a; 0×a=a×0=0 are not defined

Page 6: Phrase-based(the latter half)

Semiring

• In Table 5.1, tropical semiring is used to solve maximization problem for route weight in decoder

A ⊕ ⊗

Tropical max + ー 0

Page 7: Phrase-based(the latter half)

Semiring

• In weight directed graph G, for a rout from starting point to ending point of source language input f is Ѱ=

• Score of Ѱ = product of partial route = -> Problem which maximize this score is max⊗()= ⊕⊗()

A ⊕ ⊗Tropical max + ー 0

Page 8: Phrase-based(the latter half)

Semiring

• In Fig.5.7,line 11 Q(+1,)max additive operation ⊕ is implemented for each vertex tail(e)=s of G• As semiring sastifies distributional feature-> weight of any vertexs V is ⊕⊗()=⊗

Page 9: Phrase-based(the latter half)

Semiring

• Forward-backward algorithm for finding maximum of route weight in graph structure

• topological order(G): list of vertexs of graph G which arranged in topological order

• external variable

Page 10: Phrase-based(the latter half)

Semiring

FORWARD(G)• topological order(G), ein()⊗

⊕ Start

tail(e)(e)

(e) ⊗

Page 11: Phrase-based(the latter half)

Semiring

BACKWARD(G)• inversetopological order(G), e()⊗

⊕ Goal

(e)

(e) ⊗

head(e)

Page 12: Phrase-based(the latter half)

Semiring

In problem which choose the optimum translation from search space expressed by weighted directed graph G Tropical semiring + Forward algorithm->Viterbi semiring

Page 13: Phrase-based(the latter half)

k-best

• Besides forward-backward algorithm, k-best algorithm is used to optimize route weight

• Dijkstra’s algorithm: for single source shortest path problem

• Eppstein’s algorithm: for heaping multiple paths efficiently

Page 14: Phrase-based(the latter half)

k-best

• Assume problem satisfies Tropical semiring and backward algorithm• Calculate and choose max (weight )• Fig.5.10 algorithm ・ cand: priority queue ・ < , s>: partial route ・ < ,>: partial route whose vertex and edgeout() ・ D: set of < ,>

Page 15: Phrase-based(the latter half)

k-best

• k=1: Initialized cand

• Optimize weight of partial route and whole route

Whole route

D

cand

optimal

get out < , s>,register D Choose and out() insert to cand

heap ( ・ ) to get optimal

k time

Page 16: Phrase-based(the latter half)

Limitation of Search Space

• If search space is big->any sort can be forgiven->calculation amount of decode algorithm become massive->limitation is necessary: ・ Distortion limit, constraint ・ Reordering limit, constraint

Page 17: Phrase-based(the latter half)

Distortion Constraint

• Upper limit setting d for distance between phrase pair d The purpose is making model score small if model distorted lead to penalty become bigFor language pair which do not have big sort, distortion constraint reach good efficiencyIf d=0: no skip, translate from left to right smoothly->monotone translation

Page 18: Phrase-based(the latter half)

Distortion Constraint• Constraint for case when have partial phrases do not reach the ending point : position of the first phrase of source language : the first position of translated phraseIf (), add d・ IBM Constraint

�̈� 𝑠𝑡𝑎𝑟𝑡𝑘 𝑒𝑛𝑑𝑘・・・

phrase

No need to exam

Page 19: Phrase-based(the latter half)

Beam Search

・ Prune disused partial hypothesis and pay attention only partial hypothesis with high score for computational reduction・ Group of vertexs of search graph and prune partial hypothesis which has low score

Page 20: Phrase-based(the latter half)

Beam Search・ Group of vertexs of search graph and prune partial hypothesis which has low score

Partial hypothesis pruned Partial hypothesis chose

Page 21: Phrase-based(the latter half)

Beam Search

Some kinds of grouping: - Cover vector grouping - Radix grouping - Beam width pruning - Histogram pruning

Page 22: Phrase-based(the latter half)

Heuristic Function

• Prevent partial hypothesis which has not been translated yet from pruning• Give predicted score for the rout and learn by A* search so that rout score get the maximum• ->can reduce search error

Page 23: Phrase-based(the latter half)

Pre-reordering Method

Translation between languages which has significantly different grammatical structure• Pre-reordering rule• Pre-reordering model• Pre-reordering learning

Page 24: Phrase-based(the latter half)

Pre-reordering Rule

• Based on tree from syntactic analysis, reorder to target language word order• Head-driven phrase structure grammar(HPSG)’s rule: - Syntactic anlysis - Move the subjects back

Page 25: Phrase-based(the latter half)

Pre-reordering Model

• Source languages must have syntactic analysis tool and morphological analysis tool• Bilingual data are necessary• Probability value of pre-reordering patterns obtained will be estimated by maximum-likelihood estimation(MLE)• Choose the suitable pre-reordering patterns based on reordering part of speech from morphological analysis, or clustering word class

Page 26: Phrase-based(the latter half)

Pre-reordering Learning

• For language pairs without any syntactic analysis tools and morphological analysis tools• Provisional tree structure automatically generated from syntactic analysis result• Divide tree factors to 2 labels: reordering label [X],and no-reordering label <X>• Use linear ordering problem(LOP) to formulate reordering model to find the approximate solution and build the parse tree