Using the Web to Validate Lexico-Semantic Relations

Hernani Costa¹, Hugo Goncalo Oliveira² and Paulo Gomes

{hpcosta,hroliv,pgomes}@dei.uc.pt

Cognitive & Media Systems Group, CISUC, University of Coimbra

Lisbon, October, 2011

¹ Supported by FCT scholarship BII/FCTUC/C2008/CISUC/2ndPhase.

² Supported by FCT scholarship SFRH/BD/44955/2008.

Hernani Costa et al. (CISUC) TeMA, EPIA 2011 Lisbon, October, 2011 1 / 20

1 Introduction

2 Web-based Similarity Measures

3 Experimentation
- Datasets
- Preliminary Analysis
- Correlation Analysis
- Select the correct instances

4 Final Remarks

Introduction

Information extraction (IE) from text

1 Hernani is a researcher at the University of Coimbra.

2 Animals, such as dogs.

Entities

- Hernani
- University of Coimbra
- animal
- dog

Binary relations

- t1 = (Hernani, has affiliation, University of Coimbra)
- t2 = (animal, hypernym of, dog)
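The two example relations can be held in a small data structure. The following is a minimal sketch in Python; the class name `Triple` and its field names are hypothetical choices made for illustration and do not come from the slides.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """A binary relation instance t = (e1, r, e2); all names are illustrative."""
    e1: str        # first argument of the relation
    relation: str  # relation name
    e2: str        # second argument of the relation


t1 = Triple("Hernani", "has affiliation", "University of Coimbra")
t2 = Triple("animal", "hypernym of", "dog")

# The entities are the arguments of all extracted triples.
entities = {t.e1 for t in (t1, t2)} | {t.e2 for t in (t1, t2)}
```

Under this representation, the entity list on the slide is simply the set of arguments appearing in the triples.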

Introduction

Evaluation of semantic relations

Generally ends up being done by humans...

- Manual evaluation
  - Less prone to errors
- But...
  - Hard to repeat
  - Time-consuming
  - (More) subjective

Approaches for automatic evaluation of domain ontologies

- All have limitations
- Not well-suited for broad-coverage open-domain knowledge!

Introduction

Relations’ confidence

IE system with a pattern learning component
- Higher recall
- Lower precision

Words that occur in the same contexts tend to have similar meanings [Harris, 1970]

Rank instances (and patterns)

- Take advantage of redundancy
- Similarity measures
- Assign a confidence value
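The ranking step can be pictured as a sort over extracted instances by their confidence value. The following is a minimal sketch only: the function name `rank_instances` is hypothetical, and the confidence numbers are invented placeholders, since no particular measure has been fixed at this point in the slides.

```python
from typing import Dict, List, Tuple

Instance = Tuple[str, str, str]  # (e1, relation, e2)


def rank_instances(confidence: Dict[Instance, float]) -> List[Tuple[Instance, float]]:
    """Return (instance, confidence) pairs sorted by descending confidence."""
    return sorted(confidence.items(), key=lambda pair: pair[1], reverse=True)


# Toy confidence values, invented purely for illustration.
toy_confidence = {
    ("animal", "hypernym of", "table"): 0.1,
    ("animal", "hypernym of", "dog"): 0.8,
}
ranked = rank_instances(toy_confidence)
```

With a ranking of this kind, low-confidence instances from a high-recall extractor end up at the bottom of the list, where they can be inspected or discarded.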

Introduction

In this work...

Relations are denoted in free text by discriminating patterns

Relations that occur more often tend to be more relevant/correct

Compute the confidence in a lexico-semantic relation

- Frequent discriminating patterns for a relation (e.g., "is a" or "and other" for hyponymy)
- Distribution of related words connected by these patterns on the Web
- Similarity measures

How suitable are these measures to validate lexico-semantic relations?

Select the best measure(s) for this task

Web-based Similarity Measures

Measures Application

Web-based similarity measures

- P(q) ⇒ page count in a search engine for query q

Similarity of words e1 and e2

- P(e1 ∩ e2) ⇒ P("e1 AND e2")
- P(e1 ∪ e2) ⇒ P(e1) + P(e2)
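As a minimal sketch of the definitions above, the following Python function derives the counts for a word pair from a page-count function. The `page_count` callable stands in for a real search-engine lookup and is an assumption; here it is backed by a toy dictionary with invented numbers so that the example runs offline. The names `pair_counts` and `TOY_COUNTS` are likewise hypothetical.

```python
from typing import Callable, Dict


def pair_counts(e1: str, e2: str,
                page_count: Callable[[str], int]) -> Dict[str, int]:
    """Return P(e1), P(e2), P(e1 ∩ e2) and P(e1 ∪ e2) as defined on the slide."""
    p1 = page_count(e1)
    p2 = page_count(e2)
    return {
        "e1": p1,
        "e2": p2,
        "intersection": page_count(f"{e1} AND {e2}"),  # P("e1 AND e2")
        "union": p1 + p2,                              # P(e1) + P(e2)
    }


# Toy page counts, invented for illustration only.
TOY_COUNTS = {"planet": 1000, "Mars": 400, "planet AND Mars": 150}
counts = pair_counts("planet", "Mars", lambda q: TOY_COUNTS.get(q, 0))
```

Passing the lookup as a parameter keeps the sketch independent of any particular search engine; a similarity measure would then be computed from the returned counts.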

Confidence for relation t = (e1, r, e2)

- πri is a discriminating pattern for relation r
- P(e1) ⇒ P(e1 πri)
- P(e2) ⇒ P(πri e2)
- P(e1 ∩ e2) ⇒ P(e1 πri e2)

If e1 = planet, e2 = Mars and πri = "such as"

- P(e1) ⇒ P("planet such as")
- P(e2) ⇒ P("such as Mars")
- P(e1 ∩ e2) ⇒ P("planet such as Mars")
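The substitution of pattern-based queries for the plain word counts can be sketched as follows. This is an illustrative Python sketch, not the authors' implementation; the function name `relation_queries` and the dictionary keys are hypothetical.

```python
from typing import Dict


def relation_queries(e1: str, pattern: str, e2: str) -> Dict[str, str]:
    """Build the three query strings for t = (e1, r, e2) and one pattern of r."""
    return {
        "e1": f"{e1} {pattern}",                 # P(e1)      -> P(e1 pattern)
        "e2": f"{pattern} {e2}",                 # P(e2)      -> P(pattern e2)
        "intersection": f"{e1} {pattern} {e2}",  # P(e1 ∩ e2) -> P(e1 pattern e2)
    }


queries = relation_queries("planet", "such as", "Mars")
```

For the planet/Mars example, this yields the three phrases listed on the slide. How each phrase is quoted or submitted depends on the search engine and is left open here.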

Similarity of words e1 and e2I P(e1 ∩ e2) ⇒ P(“e1 AND e′′