Fabio Massimo Zanzotto and Lorenzo Dell’Arciprete University of Rome “Tor Vergata” Roma, Italy Efficient kernels for sentence pair classification

Page 2

Motivation

• Classifying sentence pairs is an important activity in many NLP tasks, e.g.:
  – Textual Entailment Recognition
  – Machine Translation
  – Question Answering

• Classifiers need suitable feature spaces

Page 3

Motivation

For example, in textual entailment…

T1: “Farmers feed cows animal extracts”
H1: “Cows eat animal extracts”
P1: T1 → H1

T2: “They feed dolphins fish”
H2: “Fish eat dolphins”
P2: T2 ↛ H2

T3: “Mothers feed babies milk”
H3: “Babies eat milk”
P3: T3 → H3

[Slide labels: Training examples; Classification]

Relevant features (first-order rules): feed X Y → X eat Y

Page 4

In this talk…

• First-order rule (FOR) feature spaces: a challenge

• Tripartite Directed Acyclic Graphs (tDAG) as a solution:
  – for modelling FOR feature spaces
  – for defining efficient algorithms for computing kernel functions with tDAGs in FOR feature spaces

• An efficient algorithm for computing kernels in FOR spaces

• Experimental and comparative assessment of the computational efficiency of the proposed algorithm

Page 5

First-order rule (FOR) feature spaces: challenges

We want to exploit first-order rule (FOR) feature spaces by writing the implicit kernel function

K(P1,P2) = |S(P1) ∩ S(P2)|

which counts the first-order rules activated by both P1 and P2, where S(P) is the set of first-order rules activated by the pair P.

Without loss of generality, we present the problem in syntactic first-order rule feature spaces.
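To make the counting kernel concrete, here is a minimal sketch that enumerates the rule sets explicitly. The rule strings and the names `S_P1`, `S_P3`, and `explicit_kernel` are hypothetical and chosen only for illustration; the point of the talk is precisely that these sets are too large to enumerate, so this is not the authors' algorithm.

```python
from typing import FrozenSet

# Hypothetical, hand-written feature sets: each string stands for one
# first-order rule activated by a sentence pair. Real feature spaces are
# far too large to list like this.
S_P1: FrozenSet[str] = frozenset({
    "feed X Y -> X eat Y",
    "feed X Y -> eat Y",
    "farmers feed X -> X eat",
})
S_P3: FrozenSet[str] = frozenset({
    "feed X Y -> X eat Y",
    "feed X Y -> eat Y",
    "mothers feed X -> X eat",
})


def explicit_kernel(s1: FrozenSet[str], s2: FrozenSet[str]) -> int:
    """K(P1, P2) = |S(P1) ∩ S(P2)| on explicitly enumerated rule sets."""
    return len(s1 & s2)


print(explicit_kernel(S_P1, S_P3))  # 2 shared rules in this toy example
```

An implicit kernel must return the same number without ever building `S_P1` or `S_P3`.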

Page 6

Observations

• … using the Kernel Trick:
  – define the similarity function K(P1, P2)
  – instead of defining the features

T1 → H1

T1 → H2

K(T1 → H1, T1 → H2)
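As an illustration of the kernel trick, the sketch below trains a kernel perceptron that touches the examples only through a kernel function K and never through feature vectors. The perceptron is a stand-in learner picked for brevity (the slides do not say which learner is used), and the examples are hypothetical sets of rule names rather than real sentence pairs.

```python
from typing import Callable, FrozenSet, List, Sequence

Example = FrozenSet[str]          # hypothetical stand-in for a sentence pair
Kernel = Callable[[Example, Example], int]


def set_kernel(a: Example, b: Example) -> int:
    """Toy kernel: number of shared items."""
    return len(a & b)


def decision_value(x: Example, examples: Sequence[Example],
                   labels: Sequence[int], alphas: Sequence[int],
                   kernel: Kernel) -> int:
    """Score of x, computed only from kernel evaluations."""
    return sum(a * y * kernel(xi, x)
               for a, xi, y in zip(alphas, examples, labels))


def train(examples: Sequence[Example], labels: Sequence[int],
          kernel: Kernel, epochs: int = 10) -> List[int]:
    """Kernel perceptron: alphas[i] counts the mistakes made on example i."""
    alphas = [0] * len(examples)
    for _ in range(epochs):
        mistakes = 0
        for i, (x, y) in enumerate(zip(examples, labels)):
            if y * decision_value(x, examples, labels, alphas, kernel) <= 0:
                alphas[i] += 1
                mistakes += 1
        if mistakes == 0:
            break
    return alphas


train_x = [frozenset({"rule_a", "rule_b"}), frozenset({"rule_c"})]
train_y = [1, -1]
alphas = train(train_x, train_y, set_kernel)
new_pair = frozenset({"rule_a", "rule_d"})
print(decision_value(new_pair, train_x, train_y, alphas, set_kernel) > 0)
```

Swapping `set_kernel` for an implicit kernel over sentence pairs would leave the learner unchanged, which is what makes defining K alone sufficient.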

Page 7

First-order rule (FOR) feature spaces: challenges

T1: “Farmers feed cows animal extracts”
H1: “Cows eat animal extracts”

[Figure: Pa = (T1, H1), drawn as the syntactic trees of T1 and H1. Adding placeholders: the words shared by the two sentences are co-indexed with placeholders 1, 2, 3. Propagating placeholders: the indices are propagated up to the constituents above them. S(Pa) = {…} is the set of pairs of tree fragments with placeholders activated by Pa, e.g. ⟨(VP (VB feed) NP_1 NP_3), (S NP_1 (VP (VB eat) NP_3))⟩.]
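The "adding placeholders" step can be sketched as follows. This is a simplified illustration that assumes exact, case-insensitive word matching between T and H; the function name `add_placeholders` is hypothetical, the anchoring actually used by the authors may differ, and the propagation of placeholders up the parse trees is not shown.

```python
from typing import Dict, List


def add_placeholders(text_tokens: List[str], hyp_tokens: List[str]) -> Dict[str, int]:
    """Give one integer placeholder to each word that occurs in both sentences.

    Placeholders are numbered in the order the shared words appear in the
    hypothesis. Matching is by lowercased surface form (an assumption).
    """
    text_words = {tok.lower() for tok in text_tokens}
    placeholders: Dict[str, int] = {}
    for tok in hyp_tokens:
        word = tok.lower()
        if word in text_words and word not in placeholders:
            placeholders[word] = len(placeholders) + 1
    return placeholders


t1 = "Farmers feed cows animal extracts".split()
h1 = "Cows eat animal extracts".split()
print(add_placeholders(t1, h1))  # {'cows': 1, 'animal': 2, 'extracts': 3}
```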

Page 8

First-order rule (FOR) feature spaces: challenges

T3: “Mothers feed babies milk”
H3: “Babies eat milk”

[Figure: Pb = (T3, H3), drawn as the syntactic trees of T3 and H3 with placeholders 1 and 2 on the shared words. S(Pb) = {…} is the set of pairs of tree fragments with placeholders activated by Pb, e.g. ⟨(VP (VB feed) NP_1 NP_2), (S NP_1 (VP (VB eat) NP_2))⟩.]

Page 9

First-order rule (FOR) feature spaces: challenges

K(Pa,Pb) = |S(Pa) ∩ S(Pb)|

[Figure: S(Pa) = {…, ⟨(VP (VB feed) NP_1 NP_3), (S NP_1 (VP (VB eat) NP_3))⟩, …} and S(Pb) = {…, ⟨(VP (VB feed) NP_1 NP_2), (S NP_1 (VP (VB eat) NP_2))⟩, …}. The two fragment pairs are equal up to a renaming of placeholders: both are instances of the first-order rule ⟨(VP (VB feed) NP_X NP_Y), (S NP_X (VP (VB eat) NP_Y))⟩.]
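A small sketch of why fragment pairs from different sentence pairs can coincide: once placeholders are renamed by order of first occurrence, the fragment with indices 1 and 3 and the fragment with indices 1 and 2 become the same string. The bracketed string encoding and the function `canonicalize` are assumptions made for this illustration only. First-occurrence renaming is a simplification; in general several alternative renamings have to be considered.

```python
import re
from typing import Dict


def canonicalize(fragment_pair: str) -> str:
    """Rename numeric placeholders (_1, _3, ...) to X1, X2, ... by first occurrence."""
    mapping: Dict[str, str] = {}

    def rename(match: "re.Match[str]") -> str:
        index = match.group(1)
        if index not in mapping:
            mapping[index] = f"X{len(mapping) + 1}"
        return "_" + mapping[index]

    return re.sub(r"_(\d+)", rename, fragment_pair)


from_pa = "(VP (VB feed) NP_1 NP_3) -> (S NP_1 (VP (VB eat) NP_3))"
from_pb = "(VP (VB feed) NP_1 NP_2) -> (S NP_1 (VP (VB eat) NP_2))"
print(canonicalize(from_pa))
print(canonicalize(from_pa) == canonicalize(from_pb))
```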

Page 10

A step back…

• FOR feature spaces can be modelled with particular graphs

• We call these graphs tripartite directed acyclic graphs (tDAGs)

• Observations:
  – tDAGs are not trees
  – tDAGs can be used to model both rules and sentence pairs
  – unifying rules with sentences is a graph matching problem

Pages 11-12

Tripartite Directed Acyclic Graphs (tDAG)

As for Feature Structures…

[Figure: the rule feed X Y → X eat Y, drawn as two tree fragments that share the variables X and Y, next to the sentence pair (T1, H1), drawn as two syntactic trees that share the placeholders 1, 2, 3.]

Page 13

Tripartite Directed Acyclic Graphs (tDAGs)

A tripartite directed acyclic graph is a graph

G = (N,E)

where:
• the set of nodes N is partitioned into three sets Nt, Ng, and A
• the set of edges E is partitioned into four sets Et, Eg, EA(t), and EA(g)

where

t = (Nt,Et) and g = (Ng,Eg) are two trees

EA(t) = {(x, y) | x ∈ Nt and y ∈ A}

EA(g) = {(x, y) | x ∈ Ng and y ∈ A}

[Figure: an example tDAG built from a VP fragment headed by feed and an S fragment headed by eat.]
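The definition can be written down as a small data structure. The sketch below is one possible encoding, not the authors' implementation: the class `TDAG`, its field names, and the node identifiers are hypothetical. The check only verifies the partition constraints on nodes and edges; it does not verify that t and g are trees.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple

Edge = Tuple[str, str]


@dataclass(frozen=True)
class TDAG:
    """G = (N, E) with N = Nt ∪ Ng ∪ A and E = Et ∪ Eg ∪ EA(t) ∪ EA(g)."""
    nodes_t: FrozenSet[str]
    nodes_g: FrozenSet[str]
    anchors: FrozenSet[str]
    edges_t: FrozenSet[Edge]
    edges_g: FrozenSet[Edge]
    anchor_edges_t: FrozenSet[Edge]
    anchor_edges_g: FrozenSet[Edge]

    def respects_partition(self) -> bool:
        """True if node sets are disjoint and every edge stays in its part."""
        disjoint = (
            self.nodes_t.isdisjoint(self.nodes_g)
            and self.nodes_t.isdisjoint(self.anchors)
            and self.nodes_g.isdisjoint(self.anchors)
        )
        tree_edges_ok = (
            all(x in self.nodes_t and y in self.nodes_t for x, y in self.edges_t)
            and all(x in self.nodes_g and y in self.nodes_g for x, y in self.edges_g)
        )
        anchor_edges_ok = (
            all(x in self.nodes_t and y in self.anchors for x, y in self.anchor_edges_t)
            and all(x in self.nodes_g and y in self.anchors for x, y in self.anchor_edges_g)
        )
        return disjoint and tree_edges_ok and anchor_edges_ok


# The rule "feed X Y -> X eat Y" with hypothetical node identifiers.
rule = TDAG(
    nodes_t=frozenset({"t.VP", "t.VB", "t.NP1", "t.NP2"}),
    nodes_g=frozenset({"g.S", "g.NP1", "g.VP", "g.VB", "g.NP2"}),
    anchors=frozenset({"X", "Y"}),
    edges_t=frozenset({("t.VP", "t.VB"), ("t.VP", "t.NP1"), ("t.VP", "t.NP2")}),
    edges_g=frozenset({("g.S", "g.NP1"), ("g.S", "g.VP"),
                       ("g.VP", "g.VB"), ("g.VP", "g.NP2")}),
    anchor_edges_t=frozenset({("t.NP1", "X"), ("t.NP2", "Y")}),
    anchor_edges_g=frozenset({("g.NP1", "X"), ("g.NP2", "Y")}),
)
print(rule.respects_partition())
```

Because X and Y each have two incoming edges, one from each tree, the structure is a DAG rather than a tree.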

Page 14

Tripartite Directed Acyclic Graphs (tDAGs)

Alternative definition

A tDAG is a pair of extended trees

G = (t, g)

where:

t = (Nt ∪ At, Et ∪ EA(t)) and g = (Ng ∪ Ag, Eg ∪ EA(g)).

[Figure: the example tDAG drawn twice, once as a single graph and once as a pair of extended trees carrying the anchors X and Y.]

Page 15

Again challenges

Computing the implicit kernel function

K(P1,P2) = |S(P1) ∩ S(P2)|

involves general graph matching, which takes exponential time in the general case.

Yet…

tDAGs are particular graphs, and we can define an efficient algorithm.

We will analyze the isomorphism among tDAGs and we will derive an algorithm for

Page 16

Isomorphism between tDAGs

Isomorphism between graphs

G1=(N1,E1) and G2=(N2,E2) are isomorphic if:
– |N1|=|N2| and |E1|=|E2|
– among all the bijective functions relating N1 and N2, there exists f : N1 → N2 such that:
  • for each n1 in N1, Label(n1)=Label(f(n1))
  • for each (na,nb) in E1, (f(na),f(nb)) is in E2
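The definition of graph isomorphism translates directly into a brute-force test that tries every bijection between the node sets. The sketch below is a naive check written for illustration (the name `are_isomorphic` and the dictionary-based graph encoding are assumptions); its factorial cost is the reason general graph matching is too expensive to use inside a kernel.

```python
from itertools import permutations
from typing import Dict, Hashable, Set, Tuple

Node = Hashable


def are_isomorphic(labels1: Dict[Node, str], edges1: Set[Tuple[Node, Node]],
                   labels2: Dict[Node, str], edges2: Set[Tuple[Node, Node]]) -> bool:
    """Search all bijections f: N1 -> N2 for one preserving labels and edges."""
    nodes1 = list(labels1)
    nodes2 = list(labels2)
    if len(nodes1) != len(nodes2) or len(edges1) != len(edges2):
        return False
    for image in permutations(nodes2):          # |N|! candidate bijections
        f = dict(zip(nodes1, image))
        labels_ok = all(labels1[n] == labels2[f[n]] for n in nodes1)
        edges_ok = all((f[a], f[b]) in edges2 for a, b in edges1)
        if labels_ok and edges_ok:
            return True
    return False


g1_labels = {1: "S", 2: "NP", 3: "VP"}
g1_edges = {(1, 2), (1, 3)}
g2_labels = {"a": "VP", "b": "S", "c": "NP"}
g2_edges = {("b", "c"), ("b", "a")}
print(are_isomorphic(g1_labels, g1_edges, g2_labels, g2_edges))
```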

Page 17

Isomorphism between tDAGs

Isomorphism adapted to tDAGs

G1 = (t1,g1) and G2 = (t2,g2) are isomorphic if these two properties hold:
– Partial isomorphism
  • g1 and g2 are isomorphic
  • t1 and t2 are isomorphic
  • this property generates two functions fg and ft
– Constraint compatibility
  • fg and ft are compatible on the sets of nodes A1 and A2 if, for each n ∈ A1, fg(n) = ft(n)
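The constraint-compatibility condition can be sketched as a comparison of the two mappings restricted to the anchors. The code assumes that ft and fg have already been found by the partial-isomorphism step and are given as dictionaries defined on every anchor of A1; the names `constraint` and `compatible` and the node identifiers are hypothetical.

```python
from typing import Dict, FrozenSet, Hashable, Iterable, Tuple

Node = Hashable


def constraint(f: Dict[Node, Node], anchors: Iterable[Node]) -> FrozenSet[Tuple[Node, Node]]:
    """Restriction of a node mapping to the anchors, as a set of (n, f(n)) pairs."""
    return frozenset((n, f[n]) for n in anchors)


def compatible(f_t: Dict[Node, Node], f_g: Dict[Node, Node],
               anchors: Iterable[Node]) -> bool:
    """True if f_t(n) == f_g(n) for every anchor n, i.e. Ct == Cg."""
    anchors = list(anchors)
    return constraint(f_t, anchors) == constraint(f_g, anchors)


anchors_1 = [1, 3]                              # placeholders of the first tDAG
f_t = {"VP": "VP", "VB": "VB", 1: 1, 3: 2}      # mapping between the t trees
f_g = {"S": "S", "VP": "VP", "VB": "VB", 1: 1, 3: 2}   # mapping between the g trees
print(sorted(constraint(f_t, anchors_1)))       # [(1, 1), (3, 2)]
print(compatible(f_t, f_g, anchors_1))
```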

Page 18

Isomorphism between tDAGs

Pa = (ta,ga) = ⟨(VP VB NP_1 NP_3), (S NP_1 (VP VB NP_3))⟩

Pb = (tb,gb) = ⟨(VP VB NP_1 NP_2), (S NP_1 (VP VB NP_2))⟩

Partial isomorphism: Ct = {(1,1), (3,2)}, Cg = {(1,1), (3,2)}

Constraint compatibility: Ct = Cg

Page 19

Ideas for building the kernel

We define

K(P1,P2) = |S(P1) ∩ S(P2)|

using the isomorphism between tDAGs.

The idea: reverse the order of isomorphism detection
• First, constraint compatibility
  – building a set C of all the relevant alternative constraints
  – finding the subsets of S(P1) ∩ S(P2) that meet a constraint c ∈ C
• Second, partial isomorphism detection

[Diagram labels: subsets of S(P1) ∩ S(P2); alternative constraints; partial isomorphism; constraint compatibility]

Page 20

Ideas for building the kernel

K(Pa,Pb) = |S(Pa) ∩ S(Pb)|

[Figure: Pa = (ta,ga) and Pb = (tb,gb). In both pairs the first tree has node labels A, B, C and the second tree has node labels I, M, N; nodes are annotated with placeholders.]

Alternative constraints: C = {c1,c2} = { {(1,1), (2,2)}, {(1,1), (2,3)} }

Page 21

Ideas for building the kernel

C = {c1,c2}

c1 = {(1,1), (2,2)}

[Figure: Pa, Pb, and the fragment pairs in (S(Pa) ∩ S(Pb))_c1, i.e. the common fragments of Pa and Pb whose placeholders are consistent with c1.]

K(Pa,Pb) = |S(Pa) ∩ S(Pb)| = |(S(Pa) ∩ S(Pb))_c1 ∪ (S(Pa) ∩ S(Pb))_c2|

Page 22

Ideas for building the kernel

C = {c1,c2}

c2 = {(1,1), (2,3)}

[Figure: Pa, Pb, and the fragment pairs in (S(Pa) ∩ S(Pb))_c2, i.e. the common fragments of Pa and Pb whose placeholders are consistent with c2.]

K(Pa,Pb) = |S(Pa) ∩ S(Pb)| = |(S(Pa) ∩ S(Pb))_c1 ∪ (S(Pa) ∩ S(Pb))_c2|

Page 23

Ideas for building the kernel

(S(Pa) ∩ S(Pb))_c1 = (S(ta) ∩ S(tb))_c1 × (S(ga) ∩ S(gb))_c1

[Figure: the set of fragment pairs (S(Pa) ∩ S(Pb))_c1 rewritten as the Cartesian product of the common fragments of the first trees and the common fragments of the second trees, both restricted to c1.]

K(Pa,Pb) = |∪_{c∈C} (S(Pa) ∩ S(Pb))_c| = |∪_{c∈C} (S(ta) ∩ S(tb))_c × (S(ga) ∩ S(gb))_c|

Page 24

Kernel on FOR feature spaces

The general equation

K(P1,P2) = |∪_{c∈C} (S(t1) ∩ S(t2))_c × (S(g1) ∩ S(g2))_c|

can be computed using:
1) KS (kernel function for trees) introduced in (Duffy&Collins, 2001) and refined in (Moschitti&Zanzotto, 2007)
2) the inclusion-exclusion principle
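The inclusion-exclusion principle gives the size of a union from the sizes of intersections; for two constraints it reads |A1 ∪ A2| = |A1| + |A2| − |A1 ∩ A2|. The sketch below applies it to explicit toy sets, where each set stands for the fragment pairs selected by one constraint c. The function name `union_size` and the data are hypothetical. In the kernel the size of each intersection would come from tree-kernel evaluations rather than from enumerated sets, and that step is not shown here.

```python
from itertools import combinations
from typing import FrozenSet, Hashable, Iterable, List


def union_size(sets: Iterable[Iterable[Hashable]]) -> int:
    """|A1 ∪ ... ∪ An| computed only from sizes of intersections."""
    frozen: List[FrozenSet[Hashable]] = [frozenset(s) for s in sets]
    total = 0
    for k in range(1, len(frozen) + 1):
        sign = 1 if k % 2 == 1 else -1           # + for odd k, - for even k
        for group in combinations(frozen, k):
            total += sign * len(frozenset.intersection(*group))
    return total


# Toy stand-ins for the constrained sets, one per constraint c.
by_constraint = [{1, 2, 3}, {3, 4}, {1, 4, 5}]
print(union_size(by_constraint))                 # 5
print(len(set().union(*by_constraint)))          # 5, direct check
```

The sum has 2^|C| − 1 terms, so the cost of this step is governed by the number of alternative constraints in C.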

Page 25

Computational Efficiency Analysis

• Comparison kernel: (Zanzotto&Moschitti, Coling-ACL 2006), (Moschitti&Zanzotto, ICML 2007)

• Test-bed corpus:
  – Recognizing Textual Entailment challenge data

Page 26

Computational Efficiency Analysis

[Figure: execution time in seconds (s) over the whole RTE2 data set with respect to different numbers of allowed placeholders]

Page 27

Accuracy Comparison

• Training: RTE 1, 2, 3
• Testing: RTE 4

Page 28

Conclusions

• We reduced kernels in first-order rule feature spaces to graph-matching problems
• We defined a new class of graphs, tDAGs
• We presented an efficient algorithm for computing kernels in FOR feature spaces
