From Hyperlinks to Semantic Web Properties using Open Knowledge Extraction

Valentina Presutti, Andrea Giovanni Nuzzolese, Sergio Consoli, Diego Reforgiato Recupero, Aldo Gangemi

STLab-ISTC, CNR, Italy

Extends “Uncovering the Semantics of Wikipedia pagelinks”

Accepted as combined conference/journal paper @ EKAW 2014


2

Outline

• Some background
• Motivation and goal
• Introducing Open Knowledge Extraction (OKE)
• A method for extracting semantic relations from hyperlinks
• Legalo and Legalo-Wikipedia: implementations
• Evaluation results
• Conclusion and future work

3

The Web as envisioned in 1989 by Tim Berners-Lee

URIs and links having types

relationships amongst named objects

4

The current Web vs the Semantic Web (Marja-Riitta Koivunen and Eric Miller 2001)

resources (web documents) and links

Resources and links that can have explicit types

To populate the web with machine-understandable information so that artificial intelligences (AI) can use it as background knowledge for assisting humans in performing a significant number of their daily tasks

5

The Semantic Web vision

• Standardised knowledge representation and query languages: OWL, RDF, SPARQL

• Web of Data: huge amount of distributed, interlinked and formally represented data (linked data) -> related to the “open data” initiative

6

today…

7

Semantic Web standard knowledge representation languages

The RDF data model is based on triples

Each entity has a URI that identifies it in the global web space

• subjects and predicates always have a URI
• objects can also be literals
• triples are connected when they share an entity
• the result is an interconnected graph of triples, i.e. linked data

[Figure: example RDF graph with nodes dbpedia:Paris*, dbpedia:France, “Paris” and http://www.france.fr, and edges dbpedia-owl:country, rdfs:label and foaf:homepage]

* dbpedia: <http://dbpedia.org/>
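The triple model described on this slide can be sketched in a few lines of plain Python. This is only an illustration, not an RDF library: triples are tuples of strings, and the helper names `is_literal`, `entities` and `connected` are made up for the example. The subject of the foaf:homepage triple is an assumption, since the flattened figure does not say which node the edge starts from.

```python
# Triples as (subject, predicate, object) tuples; prefixed names are plain strings.
triples = [
    ("dbpedia:Paris", "dbpedia-owl:country", "dbpedia:France"),
    ("dbpedia:Paris", "rdfs:label", '"Paris"'),
    # Assumption: the homepage edge is attached to dbpedia:France.
    ("dbpedia:France", "foaf:homepage", "http://www.france.fr"),
]


def is_literal(term):
    """Literals are written with surrounding double quotes in this sketch."""
    return term.startswith('"')


def entities(triple):
    """Return the URI-identified entities in subject and object position."""
    subject, _, obj = triple
    return {term for term in (subject, obj) if not is_literal(term)}


def connected(triple_a, triple_b):
    """Two triples are connected when they share an entity."""
    return bool(entities(triple_a) & entities(triple_b))


print(connected(triples[0], triples[1]))  # both mention dbpedia:Paris
print(connected(triples[1], triples[2]))  # no shared entity
```

Chaining such connections over many triples is what yields the interconnected graph the slide calls linked data.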

8

May 2007


from 2 million (2007) to 31 billion (2011) triples

• Most of the Web of Data derived from structured data (typically databases) or semi-structured data (e.g. Wikipedia infoboxes)

• Web content is mostly natural language text (web sites, news, forums, reviews, etc.)

• Such content is highly valuable for the Semantic Web (question answering, opinion mining, knowledge summarisation, etc.)

9

Limits and motivation

To extract as much relevant knowledge as possible from web textual content and publish it in the form of Semantic Web triples

10

Overall Goal

• Most attention has gone to enriching the Web of Data by learning standard relations: e.g., membership (rdf:type), class taxonomy (rdfs:subClassOf), entity linking (owl:sameAs)

• What about general factual relations?
  • e.g. part, participation, causality, location, friendship, etc.

11

Approaches to linked data and ontology learning

• Distant supervision: maps textual sentences to linked data triples

• no support for n-ary relations

• bound to already defined relation types

• Open Information Extraction: creates resources of triples composed of text fragments (subject, relational phrase, object)

• not directly reusable as linked data

Open Challenges:

-> new relation discovery (both facts and relation types)

-> methods for automatically formalising discovered knowledge

-> usable label generation for newly discovered relations

12

Relation extraction

“Paris is the capital city of France” -> dbpedia:Paris dbpedia-owl:country dbpedia:France

“John Stigall received a Bachelor of arts” -> ?

“John Stigall received a Bachelor of arts” -> (John Stigall, received, a Bachelor of arts)

13

Contribution

Open Knowledge Extraction:

• unsupervised -> no need for huge annotated corpora for training
• open-domain -> not bound to a specific domain
• abstractive -> models the text, uses external knowledge resources, uses summarisation and language generation techniques
• producing formalised output -> directly reusable as linked data

14

OIE vs OKE

“John Stigall received a Bachelor of arts from the State University of New York at Cortland”

Subject               Predicate                          Object                                        Approach
John Stigall          received                           a Bachelor of arts                            extractive
John Stigall          received from                      the State University of New York at Cortland  extractive
dbpedia:John_Stigall  myprop:receivesAcademicDegree      dbpedia:Bachelor_of_Arts                      abstractive
dbpedia:John_Stigall  myprop:receivesAcademicDegreeFrom  dbpedia:State_University_of_New_York          abstractive

15

Hypothesis

Hyperlinks are pragmatic traces of semantic relations between two entities

Texts surrounding hyperlinks are linguistic traces of semantic relations between two entities

“McCarthy often commented on world affairs on the Usenet forums”

16

Pragmatic trace: presence of a hyperlink in a web page

John McCarthy

He coined the term "artificial intelligence" (AI), developed the Lisp programming language family, significantly influenced the design of the ALGOL programming language, popularized timesharing, and was very influential in the early development of AI.

?

17

Linguistic trace: text surrounding a hyperlink

John McCarthy

He coined the term "artificial intelligence" (AI), developed the Lisp programming language family, significantly influenced the design of the ALGOL programming language, popularized timesharing, and was very influential in the early development of AI.

dbpedia:John_McCarthy dbpo:wikiPageWikiLink dbpedia:Lisp

(DBpedia) dbpo:wikiPageWikiLink

dbpedia:John_McCarthy myont:developed dbpedia:Lisp

(Legalo) legalo:developProgrammingLanguage

rdfs:subPropertyOf http://swrc.ontoware.org/ontology#develops

18

Hyperlinks

Hyperlinks can be either defined by humans or automatically created by Knowledge Extraction (KE) tools

“McCarthy often commented on world affairs on the Usenet forums”

“Joey Foster Ellis has published on The New York Times, and The Wall Street Journal.”

19

What can we derive from a hyperlink?

“McCarthy often commented on world affairs on the Usenet forums”

McCarthy and Usenet are (probably) related

dbpedia:John_McCarthy dbpo:wikiPageWikiLink dbpedia:Usenet

“Joey Foster Ellis has published on The New York Times, and The Wall Street Journal.”

The following entity pairs could be related:

Joey Foster Ellis, The New York Times
Joey Foster Ellis, The Wall Street Journal
The New York Times, The Wall Street Journal

dbpedia:Joey_Foster_Ellis dul:associatedWith dbpedia:The_New_York_Times

Linguistic traces provide us with the meaning of relevant relations
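The step from hyperlinked entities to candidate entity pairs can be sketched as follows. This is a minimal illustration under the assumption that every unordered pair of linked entities in a sentence is a candidate; the function name `candidate_pairs` is hypothetical, and which pairs actually hold a relevant relation is decided only later, by relation assessment.

```python
from itertools import combinations


def candidate_pairs(linked_entities):
    """Return every unordered pair of entities mentioned in one sentence."""
    return list(combinations(linked_entities, 2))


# Entities linked in the Joey Foster Ellis example sentence.
sentence_entities = [
    "Joey Foster Ellis",
    "The New York Times",
    "The Wall Street Journal",
]

for subject, obj in candidate_pairs(sentence_entities):
    print(f"{subject}, {obj}")
```

A sentence with n linked entities gives n(n-1)/2 candidate pairs, so three entities yield three pairs here.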

20

Natural language sentence s:

“Joey Foster Ellis has published on The New York Times, and The Wall Street Journal.”

Set of entity pairs {(esubj, eobj)i}:

Joey Foster Ellis, The New York Times
Joey Foster Ellis, The Wall Street Journal
The New York Times, The Wall Street Journal

Relevant relations evidenced by s, φs(esubj, eobj)i:

φs(Joey Foster Ellis, The New York Times)

21

Relevant Relation Assessment

φs(esubj, eobj)i

φs(Joey Foster Ellis, The New York Times)

“Joey Foster Ellis has published on The New York Times, and The Wall Street Journal.”

φs(The Wall Street Journal, The New York Times)

22

Usable predicate generation

φs(Joey Foster Ellis, The New York Times)

“Joey Foster Ellis has published on The New York Times, and The Wall Street Journal.”

λ ~ λi , λi ∈ Λ

Λ ≡ {λ1, ..., λn}: labels for φs(esubj, eobj)i defined for a LOD vocabulary

publish on

(has) published on
(has) published on journal
(is) author for
writes for

23

OWL/RDF formalisation

φs(Joey Foster Ellis, The New York Times)

“Joey Foster Ellis has published on The New York Times, and The Wall Street Journal.”

dbpedia:Joey_Foster_Ellis legalo:publishOn dbpedia:The_New_York_Times .

legalo:publishOn a owl:ObjectProperty ;
    rdfs:range … ;
    rdfs:domain … ;
    … .

A novel Open Knowledge Extraction (OKE) approach that performs unsupervised, open domain, and abstractive knowledge extraction from text for producing directly usable machine readable data

Several implementations of OKE: Fred, Legalo, Legalo-Wikipedia

Their evaluation, performed with the help of crowdsourcing and the creation of a gold standard, shows high values of precision, recall, and accuracy

24

Contribution

http://wit.istc.cnr.it/stlab-tools/legalo

http://wit.istc.cnr.it/stlab-tools/legalo/wikipedia

http://wit.istc.cnr.it/kore-dev/legalo Developing version

http://wit.istc.cnr.it/stlab-tools/fred

http://wit.istc.cnr.it/kore-dev/fred

25

Method and implementation

• Frame-based formal representation of s
• Relation assessment
• Label generation (extraction+abstraction)
• Formalisation
• Alignment to existing LOD vocabularies

Linguistic traces of Wikipedia pagelinks

subjects and objects are given by DBpedia pagelinks

26

Let’s see how it works in detail…

27

Frame-based formal representation

“Rico Lebrun taught visual arts at the Chouinard Art Institute and at the Disney Studios. He was influenced by Michelangelo and maintained a lifelong affinity for Goya and Picasso.”

29

Relation assessment

“Rico Lebrun taught visual arts at the Chouinard Art Institute and at the Disney Studios. He was influenced by Michelangelo and maintained a lifelong affinity for Goya and Picasso.”

G=(V,E), V ≡ {v0, ..., vn}, E ≡ {edge1, ..., edgen}

G’=(V,E’) is the undirected version of G=(V,E)

vi ∈ V is the node in G representing the entity ei mentioned in s.

P(vsubj, vobj) = [v0, edge1, v1, …, vn-1, edgen, vn], Psetsubj,obj ≡ {edge1, v1, …, vn-1, edgen}

31

Relation assessment

“Rico Lebrun taught visual arts at the Chouinard Art Institute and at the Disney Studios. He was influenced by Michelangelo and maintained a lifelong affinity for Goya and Picasso.”

Necessary condition: φs(esubj, eobj) ⇒ ∃P(vsubj, vobj)

φs(Rico Lebrun, Michelangelo) ⇒ ∃P(fred:Rico_Lebrun, fred:Michelangelo)

32

Relation assessment

“Rico Lebrun taught visual arts at the Chouinard Art Institute and at the Disney Studios. He was influenced by Michelangelo and maintained a lifelong affinity for Goya and Picasso.”

Let f be a node of G such that dul:Event(f)

Role ≡ {ρ1, ..., ρn}, the set of possible roles participating in f

AgRole ≡ {ρm1, ..., ρmm}, the set of VerbNet agentive roles

AgRole ⊆ Role

Sufficient condition: φs(esubj, eobj) ⇐ ∃P(vsubj, vobj) and ∃f ∈ Psetsubj,obj such that dul:Event(f) and ρ(f, vsubj) ∈ AgRole
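The necessary and sufficient conditions can be read as simple graph checks. The sketch below is an illustration under stated assumptions, not Legalo's implementation: the graph is a hand-written toy (node names such as `teach_1`, the role labels on the edges, and the choice of `vn.role:Agent` as the only agentive role are all hypothetical), and only one shortest path per pair is inspected, whereas the condition quantifies over any path.

```python
from collections import deque

# Toy directed graph G as (event node, role, participant) edges.
EDGES = [
    ("teach_1", "vn.role:Agent", "Rico_Lebrun"),
    ("teach_1", "vn.role:Topic", "Visual_arts"),
    ("influence_1", "vn.role:Agent", "Michelangelo"),
    ("influence_1", "vn.role:Patient", "Rico_Lebrun"),
]
EVENTS = {"teach_1", "influence_1"}   # nodes f with dul:Event(f)
AG_ROLES = {"vn.role:Agent"}          # assumed agentive roles (AgRole)


def find_path(edges, start, goal):
    """Breadth-first search on the undirected version G' of the graph."""
    adjacency = {}
    for source, _, target in edges:
        adjacency.setdefault(source, set()).add(target)
        adjacency.setdefault(target, set()).add(source)
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for neighbour in sorted(adjacency.get(path[-1], ())):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None


def role_of(edges, event, entity):
    """Return the role linking an event node to an entity, if any."""
    for source, role, target in edges:
        if source == event and target == entity:
            return role
    return None


def necessary_condition(subj, obj):
    """A relation can only hold if some path connects subject and object."""
    return find_path(EDGES, subj, obj) is not None


def sufficient_condition(subj, obj):
    """Path exists and an event on it has the subject in an agentive role."""
    path = find_path(EDGES, subj, obj)
    if path is None:
        return False
    inner_nodes = path[1:-1]
    return any(
        f in EVENTS and role_of(EDGES, f, subj) in AG_ROLES
        for f in inner_nodes
    )


print(sufficient_condition("Rico_Lebrun", "Visual_arts"))
print(necessary_condition("Rico_Lebrun", "Michelangelo"))
print(sufficient_condition("Rico_Lebrun", "Michelangelo"))
```

In this toy graph the pair (Rico_Lebrun, Michelangelo) passes the necessary condition but not the sufficient one, because Rico Lebrun is not in an agentive role of the connecting event. Swapping subject and object satisfies it.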

33

Relation assessment

“Rico Lebrun taught visual arts at the Chouinard Art Institute and at the Disney Studios. He was influenced by Michelangelo and maintained a lifelong affinity for Goya and Picasso.”

[Figure: graph for the sentence with the Subject and Object nodes marked]

34

Semantic Web triples and properties generation

“Rico Lebrun taught visual arts at the Chouinard Art Institute and at the Disney Studios. He was influenced by Michelangelo and maintained a lifelong affinity for Goya and Picasso.”

35

Semantic Web triples and properties generation

“Rico Lebrun taught visual arts at the Chouinard Art Institute and at the Disney Studios. He was influenced by Michelangelo and maintained a lifelong affinity for Goya and Picasso.”

vn.role:Actor1 -> “with”
vn.role:Actor2 -> “with”
vn.role:Beneficiary -> “for”
vn.role:Instrument -> “with”
vn.role:Destination -> “to”
vn.role:Topic -> “about”
vn.role:Source -> “from”

[Figure: graph with the Subject and Object nodes marked]

teach -> teach art -> teach art at

legalo:teachArtAt

36

Semantic Web triples and properties generation

“Rico Lebrun taught visual arts at the Chouinard Art Institute and at the Disney Studios. He was influenced by Michelangelo and maintained a lifelong affinity for Goya and Picasso.”

[Figure: graph with the Subject and Object nodes marked]

teach -> teach about

legalo:teachAbout
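The role-to-preposition table and the resulting property names can be illustrated with a small sketch. It covers only two steps, looking up a preposition for the object's VerbNet role and turning the label into a camel-case property name. The function names `make_label` and `to_property_name` are hypothetical, and Legalo's actual label generation draws on more of the graph than this.

```python
# Role-to-preposition mapping as listed on the slides.
ROLE_TO_PREPOSITION = {
    "vn.role:Actor1": "with",
    "vn.role:Actor2": "with",
    "vn.role:Beneficiary": "for",
    "vn.role:Instrument": "with",
    "vn.role:Destination": "to",
    "vn.role:Topic": "about",
    "vn.role:Source": "from",
}


def make_label(verb, object_role):
    """Append the preposition mapped to the object's role, if there is one."""
    preposition = ROLE_TO_PREPOSITION.get(object_role)
    return f"{verb} {preposition}" if preposition else verb


def to_property_name(label, prefix="legalo"):
    """Turn a label such as 'teach about' into 'legalo:teachAbout'."""
    words = label.split()
    local_name = words[0].lower() + "".join(w.capitalize() for w in words[1:])
    return f"{prefix}:{local_name}"


label = make_label("teach", "vn.role:Topic")
print(label)
print(to_property_name(label))
print(to_property_name("teach art at"))
```

The same camel-casing step applied to the label "teach art at" gives the other property name shown on the slides.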


38

Semantic Web triples and properties generation

“Rico Lebrun taught visual arts at the Chouinard Art Institute and at the Disney Studios. He was influenced by Michelangelo and maintained a lifelong affinity for Goya and Picasso.”

dbpedia:Rico_Lebrun s:teachAbout dbpedia:Visual_arts .

s:teachAbout a owl:ObjectProperty ;
    # domain, range, subsumption
    rdfs:subPropertyOf fred:Teach ;
    rdfs:domain wibi:Artist ;
    rdfs:range wibi:Art ;
    # linguistic and formal scope
    grounding:definedFromFormalRepresentation fred-graph:a6705cedbf9b53d10bbcdedaa3be9791da0a9e94 ;
    grounding:derivedFromLinguisticEvidence s:linguisticEvidence ;
    owl:propertyChainAxiom ( [ owl:inverseOf s:AgentTeach ] s:TopicTeach ) .

# alignment to existing LOD vocabularies
_:b2 a alignment:Cell ;
    alignment:entity1 s:teachAbout ;
    alignment:entity2 <http://purl.org/vocab/aiiso/schema#teaches> ;
    alignment:measure "0.846"^^xsd:float ;
    alignment:relation "equivalence" .

39

Evaluation

40

Evaluation sample

Crel−extraction*: 5 datasets, each dedicated to a specific relation

* Corpus for relation extraction developed at Google Research: https://code.google.com/p/relation-extraction-corpus/downloads/list

{
  "pred": "/people/person/education./education/education/institution",
  "sub": "/m/0g9zjv3",
  "obj": "/m/0ckrjh",
  "evidences": [
    {
      "url": "http://en.wikipedia.org/wiki/Dmitry_Chernyakov",
      "snippet": "Dmitry was born in Moscow. In 1993 he graduated from Russian Academy of Theatre Arts as stage director. He started his career in the Russian Drama Theatre of Lithuania in Vilnius. Then he directed opera and drama in many major Russian cities: Moscow, Saint Petersburg, Novosibirsk, Omsk, Samara, Kazan and others. He usually creates scenic design and stage clothes for his plays."
    }
  ],
  "judgments": [
    {"rater": "1264223381988340244", "judgment": "yes"},
    {"rater": "6968726908160095830", "judgment": "yes"},
    {"rater": "6063741883945424276", "judgment": "yes"},
    {"rater": "5346153820624061638", "judgment": "yes"},
    {"rater": "12022408018620867151", "judgment": "yes"}
  ]
}
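A record of this shape can be read with Python's standard `json` module. The sketch below uses a shortened copy of the record (the snippet text is cut to one sentence for brevity) and counts the rater judgments. Treating a simple majority of "yes" votes as a positive example is an assumption made for illustration, not something the slides state.

```python
import json

# Shortened copy of the corpus record; the snippet is truncated for brevity.
RECORD_TEXT = """
{
  "pred": "/people/person/education./education/education/institution",
  "sub": "/m/0g9zjv3",
  "obj": "/m/0ckrjh",
  "evidences": [
    {"url": "http://en.wikipedia.org/wiki/Dmitry_Chernyakov",
     "snippet": "In 1993 he graduated from Russian Academy of Theatre Arts as stage director."}
  ],
  "judgments": [
    {"rater": "1264223381988340244", "judgment": "yes"},
    {"rater": "6968726908160095830", "judgment": "yes"},
    {"rater": "6063741883945424276", "judgment": "yes"},
    {"rater": "5346153820624061638", "judgment": "yes"},
    {"rater": "12022408018620867151", "judgment": "yes"}
  ]
}
"""

record = json.loads(RECORD_TEXT)

# Count how many raters confirmed that the snippet supports the relation.
yes_votes = sum(1 for j in record["judgments"] if j["judgment"] == "yes")
total_votes = len(record["judgments"])

# Assumption: a simple majority decides whether the record is a positive example.
is_positive = yes_votes > total_votes / 2

print(record["pred"], yes_votes, total_votes, is_positive)
```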

41

Evaluation sample

Cinstitution: random sample of 130 snippets (one entity pair) from Crel−extraction for “attending or graduating from an institution”. Legalo always produced an output, either p or “no relation”

Ceducation: random sample of 130 snippets (one entity pair) from Crel−extraction for “obtaining a degree of education”. Legalo always produced an output, either p or “no relation”

Cgeneral: random sample of 60 snippets from all Crel−extraction datasets, then broken into 186 single sentences with at least one entity pair. Legalo produced 867 results: 262 p and 605 “no relation”

42

Hypothesis 1: Relevant relation assessment

Legalo is able to assess if, given a sentence s, a relevant relation exists which holds between two entities, according to the content of s:

∃φ.φs (esubj , eobj )

φs(esubj, eobj)i

43

Evaluation results for Relevant Relation Assessment

The confidence value expresses inter-rater agreement, weighted by rater trust scores

44

Hypothesis 2: Usable predicate generation

Legalo is able to generate a usable predicate λ for a relevant relation φs between two entities, expressed in a sentence s: given λ, a label generated by Legalo for φs, and λi, a label generated by a human for φs, the following holds:

λ ~ λi , λi ∈ Λ

Λ ≡ {λ1, ..., λn}: labels for φs(esubj, eobj)i defined for a LOD vocabulary

45

Evaluation results for Usable Predicate Generation

Agree = 1, Partly Agree = 0.5, Disagree = 0

The confidence value expresses inter-rater agreement, weighted by rater trust scores

• Similarity score between human created {λi} and Legalo λ, for a φs(esubj, eobj)i

• Jaccard distance measure (string similarity)

• Semantic similarity measure based on the SimLibrary framework*

46

Evaluation results for Usable Predicate Generation

Task  Relation  Jaccard [0,1]  SimLibrary [0,1]  Confidence
5     Any       0.63           0.80              0.59

* G. Pirró and J. Euzenat. A feature and information theoretic framework for semantic similarity and relatedness. ISWC 2010
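The string-similarity side of this comparison can be sketched with a token-level Jaccard measure. The slides do not say how labels were tokenised, so splitting on whitespace after lowercasing is an assumption, and the function names are hypothetical. The Jaccard distance is one minus the similarity computed here.

```python
def jaccard_similarity(label_a, label_b):
    """Size of the token intersection divided by the size of the token union."""
    tokens_a = set(label_a.lower().split())
    tokens_b = set(label_b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0  # two empty labels are treated as identical
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def best_match(machine_label, human_labels):
    """Highest similarity between the machine label and any human label."""
    return max(jaccard_similarity(machine_label, h) for h in human_labels)


human_labels = ["published on", "published on journal", "writes for"]
print(jaccard_similarity("publish on", "published on"))
print(best_match("publish on", human_labels))
```

A purely lexical measure treats "publish" and "published" as different tokens, which is one reason to pair it with a semantic similarity measure.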

47

Legalo-Wikipedia (EKAW 2014)

3 raters, computer science, non-familiar with Legalo, familiar with Linked Data

48

Conclusion

• Open Knowledge Extraction: unsupervised, open-domain, abstractive, producing OWL/RDF output
• OKE for relation extraction and its implementations
• An evaluation showing high performance for both tasks: relevant relation assessment, usable predicate generation

49

Where are we heading?

• Improving frequent-subgraph heuristics
• Strategies for alignment to existing LOD vocabularies (or to ODPs?)
• Reconciliation of properties, out of sentence scope
• Using Legalo for abstractive summarisation tasks
• Using Legalo for language generation
• Identifying ways to directly compare to other approaches

50

Thanks for your attention

http://wit.istc.cnr.it/stlab-tools/


54

Crowdsourcing tasks for Relevant Relation Assessment

Task 1:

Assessing whether a sentence s is evidence of a referenced relation (i.e. either “institution” or “education”) between two entities mentioned in s.

Based on data from Cinstitution and Ceducation, respectively

Task 2:

Assessing whether a sentence s is evidence of any relation between two given entities mentioned in s.

Based on data from Cgeneral

At least 3 raters with trust score t > 0.70

Answer could be “yes” or “no”


57

Crowdsourcing tasks for Usable predicate generation

Task 3:

Judging whether a predicate λ generated by a machine is a good summarisation of a specific relation (i.e. either “institution” or “education”) between two given entities mentioned in s, according to the evidence provided by s.

Based on data from Cinstitution and Ceducation, respectively

Task 4:

Judging whether a predicate λ is a good summarisation of a relation expressed by the content of s, between two given entities mentioned in s, according to the evidence provided by s.

Based on data from Cgeneral

At least 3 raters with trust score t > 0.70

Answer could be Agree, Partly Agree, Disagree


59

Crowdsourcing tasks for Usable predicate generation

Task 5:

Creating a phrase λ that summarises the relation expressed by the content of s, between two given entities mentioned in s, according to the evidence provided by s.

Based on data from Cgeneral

At least 3 raters with trust score t > 0.60

Answer was open


62

Confidence score

Given a task unit u, a set of possible judgements {ji}, with i = 1, ..., n, and a set of trust scores {tk}, each representing a rater, with k = 1, …, m:

tsum = Σk=1,…,m tk is the sum of the trust scores of the raters giving a judgement on u

trust(ji) is the sum of the tk values of the raters that chose judgement ji

confidence(ji, u), the confidence for judgement ji on the task unit u, is

confidence(ji, u) = trust(ji) / tsum

Example:

confidence(“yes”, 582275117) = (0.95 + 0.98) / 2.82 ≈ 0.68

confidence(“no”, 582275117) = 0.89 / 2.82 ≈ 0.32
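The confidence score is a trust-weighted share of votes, which the following sketch reproduces for the worked example. The three trust scores and their judgements are taken from that example. The function name and the list-of-pairs input format are choices made for illustration.

```python
def confidence(judgement, votes):
    """Trust-weighted share of raters that chose the given judgement.

    votes is a list of (judgement, trust_score) pairs for one task unit.
    """
    t_sum = sum(trust for _, trust in votes)
    trust_for_judgement = sum(trust for j, trust in votes if j == judgement)
    return trust_for_judgement / t_sum


# Task unit 582275117: two raters answered "yes", one answered "no".
votes = [("yes", 0.95), ("yes", 0.98), ("no", 0.89)]

print(round(confidence("yes", votes), 2))
print(round(confidence("no", votes), 2))
```

Because every vote counts towards exactly one judgement, the confidence values for a task unit sum to 1 before rounding.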