SPred : Large-scale Harvesting of Semantic Predicates

Page 1: SPred : Large-scale Harvesting of Semantic Predicates

Tiziano Flati and Roberto Navigli

SPred: Large-scale Harvesting of Semantic Predicates

Cup of

Page 2: SPred : Large-scale Harvesting of Semantic Predicates

Over 2.25 billion cups of coffee are consumed in the world every day

2SPred: Large-scale Harvesting of Semantic PredicatesFlati, Navigli

Page 3: SPred : Large-scale Harvesting of Semantic Predicates

cup of *

Page 4: SPred : Large-scale Harvesting of Semantic Predicates

cup of *

Page 5: SPred : Large-scale Harvesting of Semantic Predicates

Objective:

cup of *

liquid_n^1

dairy product_n^1

country_n^1

Page 6: SPred : Large-scale Harvesting of Semantic Predicates

Challenge #1: discovering representative arguments

Page 7: SPred : Large-scale Harvesting of Semantic Predicates

Challenge #2: inferring semantic classes

cup of *

liquid_n^1

dairy product_n^1

country_n^1

Page 8: SPred : Large-scale Harvesting of Semantic Predicates

LEXICAL PATTERNS

X such as Y

[Hearst '92, Kozareva & Hovy '10, Wu & Weld '10]

SELECTIONAL PREFERENCES

EAT

MEAT, GAS, FISH, ICE CREAM

[Resnik '96, Erk '07, Chambers & Jurafsky '10]

Page 11: SPred : Large-scale Harvesting of Semantic Predicates

Challenge #1: discovering representative arguments

Challenge #2: inferring semantic classes

SPred

Page 12: SPred : Large-scale Harvesting of Semantic Predicates

Challenge #2: inferring semantic classes

SPred

CONTRIBUTION #1: Capturing concepts for long-tail arguments using a novel wikification procedure

Page 13: SPred : Large-scale Harvesting of Semantic Predicates

SPred

CONTRIBUTION #1: Capturing concepts for long-tail arguments using a novel wikification procedure

CONTRIBUTION #2: Inferring WordNet semantic classes from a distribution of Wikipedia pages

Page 14: SPred : Large-scale Harvesting of Semantic Predicates

METHODOLOGY

Page 15: SPred : Large-scale Harvesting of Semantic Predicates

HARVESTING ARGUMENTS FROM WIKIPEDIA

LINKING ARGUMENTS TO WIKIPEDIA AND WORDNET

LINKING ARGUMENTS FROM WORDNET TO SEMANTIC CLASSES

cup of *
* was designed by
the biggest * in 1987
a very big *

cup of [Beverage]
[Structure] was designed by
the biggest [Event] in 1987
a very big [Phenomenon]

Page 16: SPred : Large-scale Harvesting of Semantic Predicates

cup of *

LEXICAL PREDICATE

Page 17: SPred : Large-scale Harvesting of Semantic Predicates

LEXICAL PREDICATE

cup of *
* was designed by
the biggest * in 1987
a very big *

Page 18: SPred : Large-scale Harvesting of Semantic Predicates

cup of coffee

FILLING ARGUMENT

Page 19: SPred : Large-scale Harvesting of Semantic Predicates

FILLING ARGUMENT

Cup of coffee

red wine

Italy

was designed by

was designed by

artist

hotel

cup of

cup of

dress

bridge

a very big

a very big

Page 20: SPred : Large-scale Harvesting of Semantic Predicates

cup of [Beverage]

SEMANTIC PREDICATE

[Liquid]

[Milk] [Alcohol] [Coffee]

[Irish coffee]

Example output

Page 21: SPred : Large-scale Harvesting of Semantic Predicates

Cup of Beverage

SEMANTIC PREDICATE

cup of

cup of

[Clothing]

[Platform]

a very big

a very big

[Beverage]

[Country]

was designed by

was designed by

[Artist]

[Building]

Page 22: SPred : Large-scale Harvesting of Semantic Predicates

HARVESTING ARGUMENTS FROM WIKIPEDIA

LINKING ARGUMENTS TO WIKIPEDIA AND WORDNET

LINKING ARGUMENTS FROM WORDNET TO SEMANTIC CLASSES

lexical predicate

cup of *
* was designed by
the biggest * in 1987
a very big *

CLASS

cup of Beverage
Structure was designed by
the biggest Event in 1987
a very big Phenomenon
…

Page 23: SPred : Large-scale Harvesting of Semantic Predicates

cup of *

Page 24: SPred : Large-scale Harvesting of Semantic Predicates

cup of *

coffee

tea

Italy

milk

yeast
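The harvesting step can be illustrated with a minimal sketch. It assumes that the `*` slot is filled by exactly one word and that a plain regular expression is enough to find it; the function names and the toy corpus are hypothetical, and the slides do not describe the matching procedure actually used on Wikipedia.

```python
import re
from collections import Counter


def predicate_to_regex(predicate):
    """Compile a lexical predicate such as 'cup of *' into a regex.

    Simplifying assumption: the '*' slot is filled by exactly one word.
    """
    parts = [r"(\w+)" if tok == "*" else re.escape(tok) for tok in predicate.split()]
    return re.compile(r"\b" + r"\s+".join(parts) + r"\b", re.IGNORECASE)


def harvest_arguments(predicate, sentences):
    """Count the fillers of the '*' slot over a collection of sentences."""
    regex = predicate_to_regex(predicate)
    counts = Counter()
    for sentence in sentences:
        for match in regex.finditer(sentence):
            counts[match.group(1).lower()] += 1
    return counts


# Toy corpus, invented for illustration (not taken from Wikipedia).
corpus = [
    "She ordered a cup of coffee and a cup of tea.",
    "Add one cup of milk to the dough.",
    "He drank a cup of coffee every morning.",
]
arguments = harvest_arguments("cup of *", corpus)
print(arguments.most_common())
```

Multi-word fillers such as "Earl Grey tea" would need a noun-phrase chunker rather than the single-word capture group used here.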

Page 25: SPred : Large-scale Harvesting of Semantic Predicates

HARVESTING ARGUMENTS FROM WIKIPEDIA

LINKING ARGUMENTS TO WIKIPEDIA AND WORDNET

LINKING ARGUMENTS FROM WORDNET TO SEMANTIC CLASSES

lexical predicate *

cup of *
* was designed by
the biggest * in 1987
a very big *

[CLASS]

cup of Beverage
Structure was designed by
the biggest Event in 1987
a very big Phenomenon
…

Page 26: SPred : Large-scale Harvesting of Semantic Predicates

cup of Earl grey tea

LINK!

Earl grey tea

MAP!

tea

Page 27: SPred : Large-scale Harvesting of Semantic Predicates

Research question #1: How to determine which Wikipedia page best corresponds to an argument?

… and drank over twenty cups of coffee each day…

?

Page 28: SPred : Large-scale Harvesting of Semantic Predicates

Wikipedians will occasionally link the arguments for us

William G. McGowan

He was also a three-pack-a-day smoker and drank over twenty cups of coffee each day until his first heart attack. As leader of MCI, he labored for several years to gain the financing and …

For free!

Page 29: SPred : Large-scale Harvesting of Semantic Predicates

Problem #1: Not many arguments are linked

All instances of ‘coffee’: 113

Linked: 4

?

Page 30: SPred : Large-scale Harvesting of Semantic Predicates

Problem #1: Not many arguments are linked

All instances of ‘coffee’: 113

Linked: 4

How to link these instances?

Page 31: SPred : Large-scale Harvesting of Semantic Predicates

1st heuristic: One sense per page

If the argument text has been linked somewhere else in the article, use that link’s page

Health effects of caffeine

the greatest benefits were observed in those who drank coffee for a long period in their lifetime.

[…]

roughly 80 to 100 cups of coffee for an average adult taken within a limited time…

Manually linked

One sense per page

Page 32: SPred : Large-scale Harvesting of Semantic Predicates

2nd heuristic: Trust the inventory

If there’s only one page for that argument text, link to that page

1 sense only!

his days in the library with a cup of Earl Grey tea. The main character of the…
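The two linking heuristics can be read as a small decision procedure. The sketch below is one possible rendering under stated assumptions: `article_links` (anchor text to page title, for links already present in the same article) and `inventory` (argument text to candidate page titles) are hypothetical data structures, and applying the heuristics in this order is an assumption based only on their numbering in the slides.

```python
def link_argument(argument, article_links, inventory):
    """Return a Wikipedia page title for an unlinked argument, or None.

    article_links: anchor text -> page title for links already made in the
                   same article (hypothetical structure).
    inventory:     argument text -> set of candidate page titles
                   (hypothetical structure).
    """
    key = argument.lower()
    # 1st heuristic, one sense per page: reuse a link made elsewhere in the article.
    if key in article_links:
        return article_links[key]
    # 2nd heuristic, trust the inventory: a single candidate page is accepted as is.
    candidates = inventory.get(key, set())
    if len(candidates) == 1:
        return next(iter(candidates))
    # Neither heuristic applies, so the argument stays unlinked.
    return None


# Toy data, invented for illustration.
article_links = {"coffee": "Coffee"}
inventory = {
    "earl grey tea": {"Earl Grey tea"},
    "water": {"Water", "Water (classical element)"},
}

print(link_argument("coffee", article_links, inventory))
print(link_argument("Earl Grey tea", article_links, inventory))
print(link_argument("water", article_links, inventory))
```

Ambiguous arguments such as "water" are deliberately left unlinked here, which favours precision over coverage.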

Page 33: SPred : Large-scale Harvesting of Semantic Predicates

Problem #2: Same argument linked to multiple pages

All instances of ‘water’

42 linked

78 linked

?

100%

Page 34: SPred : Large-scale Harvesting of Semantic Predicates

Research question #2: How to determine which WordNet concepts best represent Wikipedia pages?

cup of *

water

tea

Italy

Page 35: SPred : Large-scale Harvesting of Semantic Predicates

BabelNet: a mapping from Wikipedia pages to concepts

[Navigli & Ponzetto, 2012]

www.babelnet.org

NEs and specialized concepts from Wikipedia

Concepts from WordNet

Concepts integrated from both resources

Page 36: SPred : Large-scale Harvesting of Semantic Predicates

Argument mapping

Coffee is a brewed beverage with a distinct aroma and flavor, prepared from the roasted seeds…

Coffee

μ(Coffee) = coffee

Page 37: SPred : Large-scale Harvesting of Semantic Predicates

Argument mapping

The vast majority of Wikipedia pages [4M+] do not have a corresponding concept in WordNet [117K+]

( ) = ?

Page 38: SPred : Large-scale Harvesting of Semantic Predicates

Argument mapping: hypernym extraction

Earl Grey tea

Definitional sentence: Earl Grey tea is a tea with a distinctive flavour and aroma derived from the addition of oil extracted from the rind of the bergamot orange, a fragrant citrus fruit. Traditionally, the term "Earl Grey"…

Target lemma: Earl Grey tea

Hypernym extracted by WCL: tea

Tea

Tea is an aromatic beverage commonly prepared by pouring hot or boiling water…

WCL + link

[Navigli & Velardi, 2010]
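The idea of reading a hypernym off a definitional sentence can be shown with a deliberately naive stand-in. This is not WCL: it assumes definitions of the form "X is a/an Y …" and picks the first word after the copula that is itself a known page title, which loosely mimics the "WCL + link" step. The regex, the function name and the `known_pages` set are all illustrative assumptions.

```python
import re

# Naive stand-in for a definition parser: "<lemma> is a/an <rest of definition>".
DEFINITION = re.compile(r"^(?P<lemma>.+?)\s+is\s+an?\s+(?P<rest>.+)$", re.IGNORECASE)


def extract_hypernym(sentence, known_pages):
    """Return the first word of the definition body that is a known page title."""
    match = DEFINITION.match(sentence.strip())
    if match is None:
        return None
    for word in match.group("rest").split():
        candidate = word.strip(".,;:").lower()
        if candidate in known_pages:
            return candidate
    return None


# Hypothetical inventory of lower-cased page titles.
known_pages = {"tea", "beverage"}

print(extract_hypernym(
    "Earl Grey tea is a tea with a distinctive flavour and aroma", known_pages))
print(extract_hypernym(
    "Tea is an aromatic beverage commonly prepared by pouring hot or boiling water",
    known_pages))
```

Skipping "aromatic" in the second sentence works only because it is absent from the toy inventory; a real system needs a proper definition model to separate modifiers from the head noun.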

Page 39: SPred : Large-scale Harvesting of Semantic Predicates

Argument mapping: an example

We can thus synergistically map to WordNet more than 500K pages!

In literature, the main character in Haruki Murakami's Kafka on the Shore starts his days in the library with a cup of Earl Grey tea. The main character of the…

Trust the inventory

Earl Grey tea

Earl Grey tea is a tea with a distinctive flavour and aroma derived from…

WCL

Tea

Tea is an aromatic beverage commonly prepared by pouring hot or boiling water…

BabelNet

tea

Page 40: SPred : Large-scale Harvesting of Semantic Predicates

HARVESTING ARGUMENTS FROM WIKIPEDIA

LINKING ARGUMENTS TO WIKIPEDIA AND WORDNET

LINKING ARGUMENTS FROM WORDNET TO SEMANTIC CLASSES

lexical predicate

cup of *
* was designed by
the biggest * in 1987
a very big *

SEMANTIC PREDICATE

cup of Beverage
Structure was designed by
the biggest Event in 1987
a very big Phenomenon
…

Page 41: SPred : Large-scale Harvesting of Semantic Predicates

Research question #3: How to generalize WordNet concepts associated with arguments?

coffee

[Beverage]

Page 42: SPred : Large-scale Harvesting of Semantic Predicates

Generalization to semantic classes

CORE CONCEPTS

3K+ most frequent concepts freely downloadable

Core concepts of {}

Page 43: SPred : Large-scale Harvesting of Semantic Predicates

Generalization to semantic classes

CORE CONCEPTS

3K+ most frequent concepts freely downloadable

Core concepts of {}

Semantic Class of {}

Page 44: SPred : Large-scale Harvesting of Semantic Predicates

Generalization to semantic classes

• By repeating the same procedure for all the arguments of a lexical predicate we discover clusters of arguments for each semantic class

In literature, the main character in Haruki Murakami's Kafka on the Shore starts his days in the library with a cup of Earl Grey tea. The main character of the…

Trust the inventory

Earl Grey tea

Earl Grey tea is a tea with a distinctive flavour and aroma derived from…

WCL

Tea

Tea is an aromatic beverage commonly prepared by pouring hot or boiling water…

BabelNet

tea

Semantic class

[Beverage]
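One way to picture the generalization step is a climb up the hypernym hierarchy until a core concept is reached. The slides do not spell out the selection rule, so the "nearest core concept ancestor" criterion below is an assumption, and the hypernym edges and the core-concept set are toy data invented for illustration.

```python
from collections import deque

# Toy hypernym edges (concept -> direct hypernyms), invented for illustration.
HYPERNYMS = {
    "earl grey tea": ["tea"],
    "tea": ["beverage"],
    "cappuccino": ["coffee"],
    "coffee": ["beverage"],
    "white wine": ["wine"],
    "wine": ["beverage"],
    "beverage": ["liquid"],
    "liquid": [],
}
# Hypothetical set of core concepts.
CORE_CONCEPTS = {"beverage", "coffee", "wine"}


def semantic_class(concept, hypernyms, core_concepts):
    """Breadth-first climb of the hypernym graph to the nearest core concept."""
    queue = deque([concept])
    seen = {concept}
    while queue:
        current = queue.popleft()
        if current in core_concepts:
            return current
        for parent in hypernyms.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return None  # no core concept is reachable from this concept


for argument in ["earl grey tea", "cappuccino", "coffee", "italy"]:
    print(argument, "->", semantic_class(argument, HYPERNYMS, CORE_CONCEPTS))
```

A concept that is itself a core concept is returned unchanged, so "coffee" maps to its own class while "earl grey tea" is generalized to "beverage".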

Page 45: SPred : Large-scale Harvesting of Semantic Predicates

cup of *

Classes sorted by frequency!

[wine_n^1] [beverage_n^1] [coffee_n^1] [water_n^1]

Arguments: earl grey tea, tea, water, seawater, coffee, cappuccino, wine, white wine
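Sorting classes by frequency amounts to summing argument counts per semantic class. The sketch below assumes each argument has already been assigned a class; the assignment of arguments to classes and all counts are invented for illustration and are not taken from the slides.

```python
from collections import Counter


def rank_classes(argument_counts, class_of):
    """Sum argument frequencies per semantic class and sort in descending order."""
    totals = Counter()
    for argument, count in argument_counts.items():
        cls = class_of.get(argument)
        if cls is not None:  # arguments without a class are skipped
            totals[cls] += count
    return totals.most_common()


# Invented frequencies of arguments harvested for "cup of *".
argument_counts = {
    "wine": 4, "white wine": 2,
    "tea": 3, "earl grey tea": 1,
    "coffee": 2, "cappuccino": 1,
    "water": 1, "seawater": 1,
}
# Assumed argument-to-class assignment.
class_of = {
    "wine": "wine", "white wine": "wine",
    "tea": "beverage", "earl grey tea": "beverage",
    "coffee": "coffee", "cappuccino": "coffee",
    "water": "water", "seawater": "water",
}

ranking = rank_classes(argument_counts, class_of)
print(ranking)
```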

Page 46: SPred : Large-scale Harvesting of Semantic Predicates

EVALUATION

Page 47: SPred : Large-scale Harvesting of Semantic Predicates

1st Evaluation: Semantic class ranking quality

Page 48: SPred : Large-scale Harvesting of Semantic Predicates

Experimental Setup

DATASET: 150 random lexical predicates from the Oxford Advanced Learner's Dictionary

Lexical predicate      Argument
provide *              minerals
give birth to *        child
publish *              review
build *                suspense
* collide              car
get stuck in *         traffic jam
reduce *               pollution
…                      …

Page 49: SPred : Large-scale Harvesting of Semantic Predicates

Precision @ K

Top K semantic classes (ordered by importance):

[Wine] [Feeling] [Coffee] [Water] [Dairy product] [Country]

P@K = # correct / K
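The P@K measure defined on this slide can be computed in a few lines. In the sketch below the ranked list reuses the class names shown on the slide, while the set of classes judged correct is a hypothetical annotation chosen only to exercise the function.

```python
def precision_at_k(ranked_classes, correct_classes, k):
    """P@K = number of correct classes among the top K, divided by K."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    top_k = ranked_classes[:k]
    hits = sum(1 for cls in top_k if cls in correct_classes)
    return hits / k


ranked = ["Wine", "Feeling", "Coffee", "Water", "Dairy product", "Country"]
# Hypothetical human judgments for the predicate "cup of *".
correct = {"Wine", "Coffee", "Water", "Dairy product"}

for k in (1, 2, 4):
    print(k, precision_at_k(ranked, correct, k))
```

With these judgments P@1 is 1.0, P@2 drops to 0.5 because "Feeling" is judged wrong, and P@4 recovers to 0.75.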

Page 50: SPred : Large-scale Harvesting of Semantic Predicates

Results for dataset 1

[Plot: Precision@K (y-axis, 0.6 to 1) against K (semantic classes, 1 to 20); series: SPred]

Page 51: SPred : Large-scale Harvesting of Semantic Predicates

Experimental Setup

DATASET: 224 lexical patterns from Kozareva & Hovy 2010

Lexical predicate

work for *

* work for

fly to *

* fly to

go to *

* go to

* celebrate

* dress

Page 52: SPred : Large-scale Harvesting of Semantic Predicates

Results for dataset 2

[Plot: Precision@K (y-axis, 0.6 to 0.95) against K (semantic classes, 1 to 20); series: SPred, K&H]

Page 53: SPred : Large-scale Harvesting of Semantic Predicates

2nd Evaluation: Argument disambiguation quality

Page 54: SPred : Large-scale Harvesting of Semantic Predicates

Experimental Setup

DATASET

• ~800 lexical predicates sampled from the Oxford Advanced Learner’s Dictionary
• 3,245 items manually annotated with the most suitable semantic class

Lexical predicate      Argument
provide *              minerals
give birth to *        child
publish *              review
build *                suspense
* collide              car
get stuck in *         traffic jam
reduce *               pollution
…                      …

Page 55: SPred : Large-scale Harvesting of Semantic Predicates

Results

[Bar chart: Performance (y-axis, 0 to 90) for Precision, Recall and F1; series: SPred, Random]

Page 56: SPred : Large-scale Harvesting of Semantic Predicates

SPred: a novel approach to large-scale harvesting of semantic predicates

Contributions

• Novel heuristics for linking arguments
• High performance argument classifier
• Freely available dataset of semantic predicates

WCL, SPred, WordNet

Page 57: SPred : Large-scale Harvesting of Semantic Predicates

http://lcl.uniroma1.it/spred/

~ 1500 predicates

Page 58: SPred : Large-scale Harvesting of Semantic Predicates

Thanks or…

m i

Page 59: SPred : Large-scale Harvesting of Semantic Predicates

Tiziano Flati

Linguistic Computing Laboratory

http://lcl.uniroma1.it

Joint work with Roberto Navigli