Bayesian Learning of Non-Compositional Phrases with Synchronous Parsing
Hao Zhang; Chris Quirk; Robert C. Moore; Daniel Gildea
Zhonghua Li (Mentor: Jun Lang)
2011-10-21, I2R SMT-Reading Group




Paper info

• Bayesian Learning of Non-Compositional Phrases with Synchronous Parsing
• ACL-08 long paper, cited: 37
• Authors: Hao Zhang, Chris Quirk, Robert C. Moore, Daniel Gildea


Core Ideas

• Variational Bayes
• Tic-tac-toe pruning
• Word-to-phrase bootstrapping


Outline

• Paper presentation
  – Pipeline
  – Model
  – Training
  – Parsing (pruning)
  – Results
• Shortcomings
• Discussion


Summary of the Pipeline

• Run IBM Model 1 on sentence-aligned data
• Use tic-tac-toe pruning to prune the bitext space
• Train a word-based ITG with Variational Bayes and extract the Viterbi alignment
• Apply non-compositional constraints to restrict the space of phrase pairs
• Train a phrasal ITG with VB, then run a Viterbi pass to get the phrasal alignment


Phrasal Inversion Transduction Grammar


Dirichlet Prior for Phrasal ITG
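
As a hedged reconstruction, the grammar and prior can be written as follows. It assumes the three-way expansion of X that matches "s = 3" on the M-step slide, and it assumes symmetric Dirichlet priors with hyperparameters named \alpha_B (binary and unary expansions of X) and \alpha_C (phrase pairs). These names are assumptions here, not taken from the slide text.

```latex
% Phrasal ITG (assumed form): one nonterminal X and a preterminal C
\begin{align*}
S &\to X \\
X &\to [X\;X] \;\mid\; \langle X\;X \rangle \;\mid\; C \\
C &\to e/f \qquad \text{($e/f$ ranges over the $m$ candidate phrase pairs)}
\end{align*}

% Symmetric Dirichlet priors over the two multinomials
\begin{align*}
\theta_X &\sim \mathrm{Dir}(\alpha_B, \alpha_B, \alpha_B) \\
\theta_C &\sim \mathrm{Dir}(\underbrace{\alpha_C, \ldots, \alpha_C}_{m})
\end{align*}
```

A very small \alpha_C favors sparse distributions over phrase pairs, so only a few of the m candidates keep noticeable probability mass.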


Review: Inside-Outside Algorithm

[Figure: a chain of states X1 … Xn-1, Zn, Xn+1 … XN, shown next to a parse chart with root cell 0/0–T/V and an inner cell i spanning s/u–t/v]

Forward-backward algorithm: used not only for HMMs but for any state space model.

The forward-backward algorithm is the chain-structured special case of the inside-outside algorithm.

 

Shujie Liu


VB Algorithm for Training SITGs - E1

• Inside probabilities :

Initialization :

Recursion :

[Figure: inside recursion; cell i (s/u–t/v) is built from sub-cells j (s/u–S/U) and k (S/U–t/v)]

Copied from Liu
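
The inside pass can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it assumes a single nonterminal X with rules X → [X X], X → <X X>, and X → e/f, ignores null alignments, and uses hypothetical names (`inside_probabilities`, `lex`, `p_straight`, `p_inverted`, `p_term`). Cells are keyed as (s, u, t, v), mirroring the s/u–t/v notation, and cover e[s:t] and f[u:v].

```python
def inside_probabilities(e, f, lex, p_straight, p_inverted, p_term):
    """Inside probabilities beta[(s, u, t, v)] for a one-nonterminal ITG.

    lex maps (e_word, f_word) to a translation probability (assumed given).
    """
    T, V = len(e), len(f)
    beta = {}
    # Initialization: every 1x1 cell emits a word pair via X -> e/f.
    for s in range(T):
        for u in range(V):
            beta[(s, u, s + 1, u + 1)] = p_term * lex.get((e[s], f[u]), 0.0)
    # Recursion: combine two smaller cells that meet at corner S/U.
    for elen in range(2, T + 1):
        for flen in range(2, V + 1):
            for s in range(T - elen + 1):
                t = s + elen
                for u in range(V - flen + 1):
                    v = u + flen
                    total = 0.0
                    for S in range(s + 1, t):
                        for U in range(u + 1, v):
                            # Straight rule: both halves keep the same order.
                            total += (p_straight
                                      * beta.get((s, u, S, U), 0.0)
                                      * beta.get((S, U, t, v), 0.0))
                            # Inverted rule: the target halves are swapped.
                            total += (p_inverted
                                      * beta.get((s, U, S, v), 0.0)
                                      * beta.get((S, u, t, U), 0.0))
                    beta[(s, u, t, v)] = total
    return beta


e = ["a", "b"]
f = ["x", "y"]
lex = {("a", "x"): 0.5, ("b", "y"): 0.5, ("a", "y"): 0.1, ("b", "x"): 0.1}
beta = inside_probabilities(e, f, lex, 0.4, 0.2, 0.4)
root = beta[(0, 0, 2, 2)]  # inside probability of the root cell 0/0-T/V
```

Without pruning, the six nested loops make the pass O(n^6) in sentence length, which is why the pruning steps matter.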


VB Algorithm for Training SITGs - E2

• Outside probabilities :

Initialization :

Recursion :

[Figure: two diagrams of the outside recursion for cell j (s/u–t/v), one for each side on which the sibling cell attaches under the parent; S/U marks the free corner]

Copied from Liu
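
For reference, one way to write the outside recursion is given below. It is a sketch under simplifying assumptions: a single nonterminal X, rule probabilities written p_{[]} and p_{\langle\rangle}, no null alignments, \beta(s,u,t,v) for the inside probability of cell s/u–t/v, and \alpha(s,u,t,v) for its outside probability. These symbols are chosen here for illustration.

```latex
% Initialization: the root cell 0/0--T/V has outside probability 1
\[
\alpha(0,0,T,V) = 1
\]

% Recursion: the cell s/u--t/v is a child of a larger cell; the sibling
% supplies its inside probability, the parent its outside probability.
\begin{align*}
\alpha(s,u,t,v) ={}& \sum_{S>t,\;U>v} p_{[]}\,\alpha(s,u,S,U)\,\beta(t,v,S,U)
   && \text{(straight, sibling on the right)} \\
 &+ \sum_{S<s,\;U<u} p_{[]}\,\alpha(S,U,t,v)\,\beta(S,U,s,u)
   && \text{(straight, sibling on the left)} \\
 &+ \sum_{S>t,\;U<u} p_{\langle\rangle}\,\alpha(s,U,S,v)\,\beta(t,U,S,u)
   && \text{(inverted, sibling on the right)} \\
 &+ \sum_{S<s,\;U>v} p_{\langle\rangle}\,\alpha(S,u,t,U)\,\beta(S,v,s,U)
   && \text{(inverted, sibling on the left)}
\end{align*}
```

The product \alpha(s,u,t,v)\,\beta(s,u,t,v), divided by the root inside probability, gives the posterior of the cell, which is what the expected counts in the E-step are built from.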


VB Algorithm for Training SITGs - M

• s = 3 is the number of right-hand sides for X
• m is the number of observed phrase pairs
• ψ is the digamma function
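
A minimal sketch of the VB M-step, assuming the usual mean-field update for a multinomial with a symmetric Dirichlet prior: weight(r) = exp(ψ(E[count(r)] + α)) / exp(ψ(Σ E[count] + K·α)), where K is the number of outcomes (s for X, m for the phrase pairs). The function names, the hyperparameter name `alpha`, and the toy counts are assumptions for illustration; the digamma function is approximated by hand so the block needs only the standard library.

```python
import math


def digamma(x):
    """Approximate psi(x) for x > 0 using the recurrence plus an asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x  # psi(x) = psi(x + 1) - 1/x
        x += 1.0
    inv2 = 1.0 / (x * x)
    result += (math.log(x) - 0.5 / x
               - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252)))
    return result


def vb_m_step(expected_counts, alpha):
    """Map expected rule counts to (sub-normalized) VB rule weights."""
    k = len(expected_counts)              # s for X, m for phrase pairs
    total = sum(expected_counts.values())
    log_denominator = digamma(total + k * alpha)
    return {rule: math.exp(digamma(count + alpha) - log_denominator)
            for rule, count in expected_counts.items()}


counts = {"X -> [X X]": 8.0, "X -> <X X>": 1.5, "X -> C": 0.5}
weights = vb_m_step(counts, alpha=1e-9)
```

The resulting weights sum to less than one, and rules with small expected counts are discounted more heavily than under the EM estimate count/total. That discounting is the mechanism that pushes VB toward sparser solutions.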


Pruning

• Tic-tac-toe pruning (Hao Zhang 2005)
• Fast tic-tac-toe pruning (Hao Zhang 2008)
• High-precision alignment pruning (Haghighi, ACL 2009)
  – Prune all bitext cells that would invalidate more than 8 of the high-precision alignments
• 1-1 alignment posterior pruning (Haghighi, ACL 2009)
  – Prune all 1-1 bitext cells that have a posterior below 10^-4 in both HMM models
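
The last rule is simple enough to sketch directly. The function below is illustrative only: it assumes the two HMM models' link posteriors are available as matrices indexed [i][j] over the same word pairs, and the names `prune_one_to_one_cells`, `post_a`, and `post_b` are hypothetical.

```python
def prune_one_to_one_cells(post_a, post_b, threshold=1e-4):
    """Return the 1-1 cells (i, j) whose posterior is below threshold in both models."""
    pruned = set()
    for i, (row_a, row_b) in enumerate(zip(post_a, post_b)):
        for j, (p, q) in enumerate(zip(row_a, row_b)):
            # A cell survives if either model gives it enough posterior mass.
            if p < threshold and q < threshold:
                pruned.add((i, j))
    return pruned


post_a = [[0.90, 0.00001], [0.00002, 0.80]]
post_b = [[0.85, 0.00300], [0.00001, 0.75]]
pruned = prune_one_to_one_cells(post_a, post_b)
```

Requiring both models to agree keeps the pruning conservative: cell (0, 1) above survives because one model still assigns it a posterior of 0.003.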


Tic-tac-toe pruning (Hao Zhang 2005)
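
A rough sketch of the idea, under the assumption that a cell's figure of merit is the product of an IBM Model 1 score for the words inside the cell and a Model 1 score for the words outside it (the cell plus the four outside corners give the tic-tac-toe pattern). The function names and the table `ttable`, keyed by (f_word, e_word), are hypothetical, and a real implementation would precompute these scores rather than recompute them per cell.

```python
def model1_score(f_words, e_words, ttable):
    """IBM Model 1 likelihood of f_words given e_words plus a NULL source word."""
    sources = list(e_words) + [None]  # None stands in for NULL
    score = 1.0
    for fw in f_words:
        score *= sum(ttable.get((fw, ew), 0.0) for ew in sources) / len(sources)
    return score


def tic_tac_toe_merit(e, f, ttable, s, u, t, v):
    """Figure of merit for the cell s/u-t/v, covering e[s:t] and f[u:v]."""
    inside = model1_score(f[u:v], e[s:t], ttable)
    # Words outside the cell must be explained by the remaining words.
    outside = model1_score(f[:u] + f[v:], e[:s] + e[t:], ttable)
    return inside * outside


e = ["a", "b"]
f = ["x", "y"]
ttable = {("x", "a"): 0.9, ("y", "a"): 0.1, ("x", "b"): 0.1, ("y", "b"): 0.9}
good = tic_tac_toe_merit(e, f, ttable, 0, 0, 1, 1)  # pairs a with x
bad = tic_tac_toe_merit(e, f, ttable, 0, 1, 1, 2)   # pairs a with y
```

Cells whose merit falls far below the best cell of comparable size would then be dropped before the ITG chart is filled.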


Non-compositional Phrases Constraint

• e(i, j): number of links emitted from the substring e_i … e_j
• f(l, m): number of links emitted from the substring f_l … f_m
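
One plausible reading of how these counts are used, labeled as an assumption: a candidate phrase pair is kept only if every link leaving either substring lands inside the cell, that is, the two counts both equal the number of links inside the cell. The helper names and the inclusive index convention below are illustrative.

```python
def link_counts(links, i, j, l, m):
    """Return (e(i, j), f(l, m), links inside the cell) with inclusive bounds.

    links is a set of (source_index, target_index) pairs from the Viterbi
    word alignment.
    """
    e_count = sum(1 for (a, b) in links if i <= a <= j)
    f_count = sum(1 for (a, b) in links if l <= b <= m)
    inside = sum(1 for (a, b) in links if i <= a <= j and l <= b <= m)
    return e_count, f_count, inside


def is_candidate_phrase_pair(links, i, j, l, m):
    """Assumed consistency test: no link crosses the cell boundary."""
    e_count, f_count, inside = link_counts(links, i, j, l, m)
    return inside > 0 and e_count == inside == f_count


links = {(0, 0), (1, 2), (2, 1)}
ok = is_candidate_phrase_pair(links, 1, 2, 1, 2)       # both links stay inside
crossing = is_candidate_phrase_pair(links, 0, 1, 0, 1)  # links leave the cell
```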


Word Alignment Evaluation

• Both models were trained for 10 iterations
• EM: the lowest AER, 0.40, is reached after the second iteration; by iteration 10 the AER for EM has increased to 0.42
• VB: with αC = 1e-9, VB reaches an AER close to 0.35 at iteration 10
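
For readers unfamiliar with the metric, here is a small sketch of the standard alignment error rate, AER = 1 − (|A∩S| + |A∩P|) / (|A| + |S|), where A is the predicted link set, S the sure gold links, and P the possible gold links (taken to include S). The function name and the toy link sets are illustrative.

```python
def alignment_error_rate(predicted, sure, possible):
    """AER over sets of (source_index, target_index) links; lower is better."""
    possible = possible | sure  # sure links also count as possible
    hits_sure = len(predicted & sure)
    hits_possible = len(predicted & possible)
    return 1.0 - (hits_sure + hits_possible) / (len(predicted) + len(sure))


perfect = alignment_error_rate({(0, 0), (1, 1), (2, 2)},
                               sure={(0, 0), (1, 1)},
                               possible={(2, 2)})
half = alignment_error_rate({(0, 0), (1, 2)},
                            sure={(0, 0), (1, 1)},
                            possible=set())
```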


End-to-end Evaluation

• NIST Chinese-English training data
• NIST 2002 evaluation datasets for tuning and evaluation
• The 10-reference development set was used for MERT
• The 4-reference test set was used for evaluation


Shortcomings

• Grammar is not perfect
• ITG ordering is context independent
• Phrase pairs are sparse


Grammar is not perfect

• Over-counting problem
• Alternative ITG parse trees can yield the same word alignment; this is called the over-counting problem

[Figure: mapping from ITG parse tree space to word alignment space for the example sentence "I am rich !"; more than one parse tree maps to the same alignment]


A better-constrained grammar

• A series of nested constituents with the same orientation will always have a left-heavy derivation
• So the second parse tree of the previous example will not be generated

C->1/3 C->2/4 C-> 3/2 C-> 4/1

A -> [C C] B -> <C C>

B -> <A B> ?
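
The effect of the left-heavy restriction can be checked by counting derivations. The sketch below is illustrative: it assumes A marks a straight node, B an inverted node, and C a terminal, and that the constrained grammar forbids a right child with the same orientation as its parent. The function name `count_derivations` and the 0-based permutation encoding are assumptions.

```python
from functools import lru_cache


def count_derivations(perm, canonical):
    """Count ITG trees yielding the 1-1 alignment i -> perm[i]."""
    n = len(perm)

    def is_block(i, j):
        # The targets of e[i:j] must form a contiguous range.
        seg = perm[i:j]
        return max(seg) - min(seg) == j - i - 1

    @lru_cache(maxsize=None)
    def count(i, j, kind):
        if j - i == 1:
            return 1 if kind == "C" else 0
        if kind == "C":
            return 0
        total = 0
        for k in range(i + 1, j):
            if not (is_block(i, k) and is_block(k, j)):
                continue
            straight = max(perm[i:k]) < min(perm[k:j])
            if straight != (kind == "A"):
                continue
            for left in "ABC":
                for right in "ABC":
                    if canonical and right == kind:
                        continue  # left-heavy only: no same-orientation right child
                    total += count(i, k, left) * count(k, j, right)
        return total

    return sum(count(0, n, kind) for kind in "ABC")


monotone = [0, 1, 2, 3]
example = [2, 3, 1, 0]  # the alignment 1/3, 2/4, 3/2, 4/1 in 0-based form
free_monotone = count_derivations(monotone, canonical=False)
left_heavy_monotone = count_derivations(monotone, canonical=True)
free_example = count_derivations(example, canonical=False)
left_heavy_example = count_derivations(example, canonical=True)
```

Under these assumptions, a four-word monotone alignment has five unconstrained trees and the 1/3, 2/4, 3/2, 4/1 alignment has two, while the left-heavy grammar leaves exactly one tree for each.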


Thanks Q&A