Textual Entailment as Syntactic Graph Distance: a rule based and a SVM based approach

Fabio Massimo Zanzotto, Dipartimento Informatica Sistemistica e Comunicazione, Università di Milano Bicocca, Italy

Maria Teresa Pazienza and Marco Pennacchiotti, Department of Computer Science, Systems and Production, University of Roma “Tor Vergata”

Classifying Textual Entailment (TE)

Two dimensions:

Semantic dimension: paraphrasing (i.e., synonymy) vs. strict entailment

Recognition dimension: semantic subsumption, syntactic subsumption, direct implication

Semantic subsumption
T: American Airlines will lay off ...
H: American Airlines will fire ...

Syntactic subsumption
T: American Airlines began laying off hundreds of flight attendants on Tuesday
H: American Airlines will fire hundreds of flight attendants

Direct implication
T: American Airlines will fire flight attendants
H: hundreds of flight attendants will lose their jobs

Recognizing Textual Entailment (TE)

semantic subsumption and syntactic subsumption

TE is a Graph Matching problem!

[Figure: syntactic graphs of T and H]

Graph Matching (GM)

GM is used, for instance, in Image Recognition

One problem: distortions in the input graphs!

Textual Entailment as Graph Matching (GM)

Known limitations:

distortion in the input syntactic/semantic graphs (errors in parsing, word sense disambiguation, etc.)

matching nodes is more complex than simple label matching

the matching should be invariant under syntactic transformations (nominalization, passivization, argument movement, ...)

the textual entailment relation is an asymmetric relation
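The asymmetry is visible even with a crude lexical score. The sketch below is not the authors' graph measure; it is a toy token-coverage score, assumed only for illustration, that asks how much of H is found in T. Swapping the arguments changes the result, which is why the entailment measure cannot be a symmetric similarity.

```python
def coverage(text_tokens, hyp_tokens):
    """Toy directional score: fraction of hypothesis tokens that also occur in the text.

    Not the graph-based measure of the slides; used here only to show
    that a measure of "how much of H is contained in T" is asymmetric.
    """
    if not hyp_tokens:
        return 0.0
    text_set = set(text_tokens)
    covered = sum(1 for tok in hyp_tokens if tok in text_set)
    return covered / len(hyp_tokens)


t = "american airlines began laying off hundreds of flight attendants on tuesday".split()
h = "american airlines will fire hundreds of flight attendants".split()

print(coverage(t, h))  # 0.75: 6 of the 8 H tokens occur in T
print(coverage(h, t))  # 6 of the 11 T tokens occur in H, about 0.545
```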

Textual Entailment Measure

What’s next

Step 1: Definition of the syntactic representation model (Extended Dependency Graph, XDG)

Step 2: Rule-based approach: definition of the Graph Matching measure for the textual entailment relation

Step 3: SVM-based approach: using an SVM to estimate the parameters of the Graph Matching measure

Step 4: Preliminary analysis of the results on the development set

Extended Dependency Graph (XDG)

An XDG is a pair (C, D), where:

C are the constituents; each constituent has a syntactic head, which is its potential semantic governor

D are the dependencies among constituents
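A minimal in-memory representation of an XDG might look as follows. This is a sketch under stated assumptions, not the data model of the Chaos parser: constituents are token chunks with a head, dependencies link constituents by index, and the labels SUBJ, OBJ and NP_PP are hypothetical placeholders.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Constituent:
    """A chunk of tokens; its syntactic head is the potential semantic governor."""
    tokens: tuple[str, ...]
    head: str


@dataclass(frozen=True)
class Dependency:
    """A labelled dependency from a governor constituent to a dependent one (by index)."""
    governor: int
    dependent: int
    label: str


@dataclass
class XDG:
    """Extended Dependency Graph: constituents C and dependencies D among them."""
    constituents: list[Constituent] = field(default_factory=list)
    dependencies: list[Dependency] = field(default_factory=list)

    def add_constituent(self, tokens, head):
        """Append a constituent and return its index."""
        self.constituents.append(Constituent(tuple(tokens), head))
        return len(self.constituents) - 1

    def add_dependency(self, governor, dependent, label):
        self.dependencies.append(Dependency(governor, dependent, label))


# H: "American Airlines will fire hundreds of flight attendants"
h = XDG()
subj = h.add_constituent(["American", "Airlines"], "Airlines")
verb = h.add_constituent(["will", "fire"], "fire")
obj = h.add_constituent(["hundreds"], "hundreds")
pp = h.add_constituent(["of", "flight", "attendants"], "attendants")
h.add_dependency(verb, subj, "SUBJ")  # hypothetical label names
h.add_dependency(verb, obj, "OBJ")
h.add_dependency(obj, pp, "NP_PP")
```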

GM on XDG: definitions

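Isomorphic subsumption: $XDG_H = (C_H, D_H)$ is subsumed by $XDG_T = (C_T, D_T)$, written $XDG_H \sqsubseteq XDG_T$, if two bijective functions $f_C: C_H \rightarrow C_T$ and $f_D: D_H \rightarrow D_T$ exist

Subgraph isomorphic subsumption: $XDG_H \preceq XDG_T$ if there exists a subgraph $XDG'_T \subseteq XDG_T$ such that $XDG_H \sqsubseteq XDG'_T$

Maximal Common Subsumption Subgraph (MCSS): given $XDG_H$ and $XDG_T$, $XDG_M$ is the MCSS if $XDG_M \preceq XDG_H$ and $XDG_M \preceq XDG_T$, and, for every $XDG'$, if $XDG' \preceq XDG_H$ and $XDG' \preceq XDG_T$ then $XDG' \preceq XDG_M$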

Finding the bijective functions and evaluating the measure

Step 1: Constituent matching ($f_C: C_H \rightarrow C_T$ bijective)

Step 2: Dependency matching ($f_D: D_H \rightarrow D_T$ bijective)

Step 3: Define the MCSS using $f_C$ and $f_D$

Step 4: Evaluate the similarity measure on the MCSS
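The four steps can be approximated as below. This is a sketch, not the authors' algorithm: f_C is built greedily from a head-similarity function (an optimal matching would need, e.g., the Hungarian algorithm), f_D keeps only the H dependencies whose image under f_C is a T dependency with the same label, and the matched parts stand in for the MCSS, which greedy matching does not guarantee to be maximal. The synonym table, the 0.8 score and all label names are hypothetical; a lexical resource such as WordNet could replace the synonym table.

```python
def match_constituents(h_heads, t_heads, sim):
    """Step 1: greedily build f_C by pairing H and T constituents on head similarity.

    Pairs are taken in order of decreasing similarity and each T constituent
    is used at most once, so f_C is a bijection between the matched H
    constituents and their images in T.
    """
    pairs = sorted(
        ((sim(a, b), i, j) for i, a in enumerate(h_heads) for j, b in enumerate(t_heads)),
        reverse=True,
    )
    f_c, used_t = {}, set()
    for score, i, j in pairs:
        if score > 0 and i not in f_c and j not in used_t:
            f_c[i] = j
            used_t.add(j)
    return f_c


def match_dependencies(h_deps, t_deps, f_c):
    """Step 2: build f_D from H dependencies whose image under f_C is a T dependency."""
    t_set = set(t_deps)
    f_d = {}
    for gov, dep, label in h_deps:
        if gov in f_c and dep in f_c:
            image = (f_c[gov], f_c[dep], label)
            if image in t_set:
                f_d[(gov, dep, label)] = image
    return f_d


def mcss_coverage(h_heads, h_deps, t_heads, t_deps, sim):
    """Steps 3-4: use the matched parts as an approximate MCSS and measure how much of H it covers."""
    f_c = match_constituents(h_heads, t_heads, sim)
    f_d = match_dependencies(h_deps, t_deps, f_c)
    c_cov = len(f_c) / len(h_heads) if h_heads else 0.0
    d_cov = len(f_d) / len(h_deps) if h_deps else 0.0
    return c_cov, d_cov


SYNONYMS = {("fire", "lay")}  # hypothetical stand-in for a lexical resource


def head_sim(a, b):
    if a == b:
        return 1.0
    return 0.8 if (a, b) in SYNONYMS else 0.0


# H: American Airlines will fire hundreds of flight attendants (head lemmas)
h_heads = ["airlines", "fire", "hundreds", "attendants"]
h_deps = [(1, 0, "SUBJ"), (1, 2, "OBJ"), (2, 3, "NP_PP")]
# T: American Airlines began laying off hundreds of flight attendants on Tuesday
t_heads = ["airlines", "begin", "lay", "hundreds", "attendants", "tuesday"]
t_deps = [(1, 0, "SUBJ"), (1, 2, "VP_COMP"), (2, 3, "OBJ"), (3, 4, "NP_PP"), (2, 5, "V_PP")]

result = mcss_coverage(h_heads, h_deps, t_heads, t_deps, head_sim)
print(result)  # (1.0, 0.6666666666666666)
```

The unmatched subject dependency shows why syntactic transformations matter here: in T, American Airlines is the subject of began, not of laying off.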

Constituent Similarity

Degree of similarity between a constituent of H and a constituent of T

Parameter Box

Dependency Similarity

Degree of similarity between a dependency of H and a dependency of T

Parameter Box

Textual Entailment Measure

Finally, textual entailment holds if the measure, which combines constituent and dependency similarity, is greater than t

Parameter Box: t
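Because the formula itself is not preserved, one reading is assumed here: a weighted combination of a constituent score and a dependency score, compared against the threshold t. The weight `lam` and the convex combination are hypothetical; both inputs are taken to be similarity scores in [0, 1].

```python
def entailment_score(c_sim, d_sim, lam):
    """Hypothetical measure: convex combination of constituent and dependency similarity."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return lam * c_sim + (1.0 - lam) * d_sim


def entails(c_sim, d_sim, lam, t):
    """Textual entailment holds if the measure is greater than the threshold t."""
    return entailment_score(c_sim, d_sim, lam) > t


# Every H constituent matched, two of three H dependencies matched:
print(entailment_score(1.0, 2 / 3, 0.5))  # about 0.833
print(entails(1.0, 2 / 3, 0.5, t=0.8))    # True
print(entails(1.0, 2 / 3, 0.5, t=0.9))    # False
```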

Some more details

Syntactic transformations: nominalization, passive form

Other phenomena: be-sentences vs. appositions (e.g., "the president of XYZ is ..."), treatment of negation ("not")
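Handling the passive form amounts to normalizing both graphs before matching, so that "attendants were fired by American Airlines" and "American Airlines fired attendants" yield the same dependencies. A minimal sketch, assuming hypothetical labels SUBJ_PASS (surface subject of a passive verb) and BY_AGENT (its by-phrase) and dependencies written as (governor head, dependent head, label) triples. Nominalization would additionally need a lexical mapping (e.g. firing to fire) and is not covered.

```python
def normalize_passive(deps):
    """Map passive-voice dependencies onto their active-voice equivalents.

    deps: list of (governor_head, dependent_head, label) triples.
    The labels SUBJ_PASS and BY_AGENT are hypothetical placeholders.
    """
    relabel = {"SUBJ_PASS": "OBJ", "BY_AGENT": "SUBJ"}
    return [(gov, dep, relabel.get(label, label)) for gov, dep, label in deps]


# "Attendants were fired by American Airlines" (passive)
passive = [("fire", "attendants", "SUBJ_PASS"), ("fire", "airlines", "BY_AGENT")]
# "American Airlines fired attendants" (active)
active = [("fire", "airlines", "SUBJ"), ("fire", "attendants", "OBJ")]

print(set(normalize_passive(passive)) == set(active))  # True
```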

Estimating Parameters with SVM

Main idea: divide the Graph Matching measure into many subparts

Assumptions: the hypothesis H is a simple S-V-O sentence; the SVM must learn the parameters and thresholds

A possibility: the feature space is divided into three parts:

Subject-related features

Main-verb-related features

Object-related features
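Under the S-V-O assumption, each part of the feature space can be filled by comparing one slot of H against T. The sketch below is hypothetical: the two features per slot (does the H head occur in T, and does it fill the same role there) are illustrative choices, not the feature set of the system, and lemmatization is assumed to have been done already.

```python
SLOTS = ("subj", "verb", "obj")


def svo_features(h_svo, t_heads, t_roles):
    """Two hypothetical features per S-V-O slot of H, six in total.

    h_svo:   slot -> head lemma in H (H assumed to be a simple S-V-O sentence)
    t_heads: set of head lemmas occurring anywhere in T
    t_roles: slot -> head lemma filling that slot in T's main clause
    """
    features = []
    for slot in SLOTS:
        head = h_svo.get(slot)
        if head is None:
            features.extend([0.0, 0.0])
            continue
        features.append(1.0 if head in t_heads else 0.0)            # head found in T
        features.append(1.0 if t_roles.get(slot) == head else 0.0)  # same head, same role
    return features


h_svo = {"subj": "airlines", "verb": "fire", "obj": "hundreds"}
t_heads = {"airlines", "begin", "lay", "hundreds", "attendants", "tuesday"}
t_roles = {"subj": "airlines", "verb": "begin"}

print(svo_features(h_svo, t_heads, t_roles))  # [1.0, 1.0, 0.0, 0.0, 1.0, 0.0]
```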

Feature Spaces

[Figure: graphs of T and H with the feature subspaces]

Feature Spaces

Percent of common tokens and lemmas

Structural (graph) features:

Subgraph matching indicators

Mean number of commonly anchored dependencies within constituents

Used Resources

Chaos: a modular and lexicalised parser for English and Italian (Basili & Zanzotto, 1998, 2002), based on the extended dependency graph (XDG) formalism

WordNet

SVMlight
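SVMlight reads one training example per line: a target (+1 or -1 for classification) followed by feature:value pairs with 1-based, increasing feature indices; zero-valued features may be omitted. A small helper for writing entailment feature vectors in that format could look like this (the function name `to_svmlight` is hypothetical).

```python
def to_svmlight(label, features):
    """Format one example as an SVMlight input line.

    label: True for an entailment pair (+1), False otherwise (-1).
    features: dense list of floats; indices are written 1-based and
    zero-valued features are left out, as SVMlight permits.
    """
    target = "+1" if label else "-1"
    pairs = [f"{i}:{v:g}" for i, v in enumerate(features, start=1) if v != 0]
    return " ".join([target] + pairs)


print(to_svmlight(True, [1.0, 1.0, 0.0, 0.0, 1.0, 0.0]))  # +1 1:1 2:1 5:1
print(to_svmlight(False, [0.0, 0.25]))                     # -1 2:0.25
```

Writing such lines to a file gives a training file for SVMlight's `svm_learn` program.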

Preliminary analysis (Rule-based System)

Analysis on dev1

We chose the parameter values 0.85, 0.85 and 0.5

[Plot: precision (Prec) vs. threshold on dev1; threshold axis from 0 to 1, precision axis from 0 to 0.8]
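A precision-versus-threshold curve of this kind can be produced by sweeping t and measuring precision on the pairs the rule accepts. The scores and gold labels below are made up for illustration only; they are not the dev1 data.

```python
def precision_at(scores, gold, t):
    """Precision of the rule "entailment holds if score > t"; None if nothing is accepted."""
    accepted = [g for s, g in zip(scores, gold) if s > t]
    return sum(accepted) / len(accepted) if accepted else None


def sweep(scores, gold, steps=10):
    """Precision at thresholds 0, 1/steps, ..., 1 (0, 0.1, ..., 1 by default)."""
    return {round(k / steps, 2): precision_at(scores, gold, k / steps) for k in range(steps + 1)}


# Toy data: entailment scores and gold judgements for five T/H pairs.
scores = [0.95, 0.9, 0.7, 0.4, 0.2]
gold = [True, True, False, True, False]

for t, p in sweep(scores, gold).items():
    print(t, p)
```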

Preliminary analysis (SVM-based system)

Test bed: dev1+dev2

Test method: 3-fold cross validation repeated 10 times

winning horse!
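The evaluation protocol, 3-fold cross validation repeated 10 times, can be sketched as an index generator: each repeat reshuffles the dev1+dev2 pairs, splits them into three folds, and uses each fold once as the test set. Training SVMlight on each split is left out, and the seed is an arbitrary assumption.

```python
import random


def repeated_kfold(n_items, k=3, repeats=10, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) for k-fold CV repeated `repeats` times.

    Each repeat reshuffles the indices, so every item is tested exactly once per repeat.
    """
    rng = random.Random(seed)
    indices = list(range(n_items))
    for r in range(repeats):
        rng.shuffle(indices)
        folds = [indices[f::k] for f in range(k)]
        for f, test in enumerate(folds):
            train = [i for g, fold in enumerate(folds) if g != f for i in fold]
            yield r, f, train, test


splits = list(repeated_kfold(12))
print(len(splits))  # 30 train/test splits: 10 repeats x 3 folds
```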

Out from the Fairy Tale...

... and back to real life!!!!

T: Comdex -- once among the world's largest trade shows, the launching pad for new computer and software products, and a Las Vegas fixture for 20 years -- has been canceled for this year.

H: Las Vegas hosted the Comdex trade show for 20 years.
