Upload
stlabcnr
View
379
Download
1
Tags:
Embed Size (px)
Citation preview
1
From Hyperlinks to Semantic Web Properties using Open Knowledge Extraction
Valentina Presutti Andrea Giovanni Nuzzolese, Sergio Consoli,
Diego Reforgiato Recupero, Aldo Gangemi
STLab-ISTC, CNR, Italy
Extends “Uncovering the Semantics of Wikipedia pagelinks”
Accepted as combined conference/journal paper @ EKAW 2014
• Some background • Motivation and goal • Introducing Open Knowledge Extraction (OKE) • A method for extracting semantic relations
from hyperlinks • Legalo and Legalo-Wikipedia: implementations • Evaluation results • Conclusion and future work
2
Outline
3
The Web as envisioned in 1989 by Tim Berners-Lee
URIs and links having types
relationships amongst named objects
4
The current Web vs the Semantic Web (Marja-Riitta Koivunen and Eric Miller 2001)
resources (web documents) and links
Resources and links that can have explicit types
To populate the web with machine understandable information so that artificial intelligences (AI) can use it as background
knowledge for assisting humans in performing a significant number of their daily tasks
5
The Semantic Web vision
• Standardised knowledge representation and query languages: OWL, RDF, SPARQL
• Web of Data: huge amount of distributed, interlinked and formally represented data (linked data) -> related to the “open data” initiative
6
today…
7
Semantic Web standard knowledge representation languages
The RDF data model is based on triples
Each entity has a URI that identifies it in the global web space
subjects and predicates always have a URI objects can also be literals triples are connected when they share an entity the result is an interconnected graph of triples, i.e. linked data
dbpedia:Paris*
dbpedia:France
“Paris”
dbpedia-owl:country
rdfs:label*dbpedia: <http://dbpedia.org/>
7
Semantic Web standard knowledge representation languages
The RDF data model is based on triples
Each entity has a URI that identifies it in the global web space
subjects and predicates always have a URI objects can also be literals triples are connected when they share an entity the result is an interconnected graph of triples, i.e. linked data
dbpedia:Paris*
dbpedia:France
“Paris”
dbpedia-owl:country
http://www.france.fr
foaf:homepage
rdfs:label*dbpedia: <http://dbpedia.org/>
• Most of the Web of Data derived from structured data (typically databases) or semi-structured data (e.g. Wikipedia infoboxes)
• Web content is mostly natural language text (web sites, news, forums, reviews, etc.)
• Such content is highly valuable for the Semantic Web (question answering, opinion mining, knowledge summarisation, etc.)
9
Limits and motivation
To extract as much relevant knowledge as possible from web textual content and
publish it in the form of Semantic Web triples
10
Overall Goal
• Most attention to enrich the Web of Data by learning standard relations: e.g., membership (rdf:type), class taxonomy (rdfs:subClassOf), entity linking (owl:sameAs)
• What about general factual relations? • e.g. part, participation, causality,
location, friendship, etc.
11
Approaches to linked data and ontology learning
• Distant supervision: maps textual sentences to linked data triples
• no support for n-ary relation
• bound to already defined relation types
• Open Information Extraction: creates resources of triples composed of text fragments (subject, relational phrase, object)
• not directly reusable as linked data
Open Challenges:
-> new relation discovery (both facts and relation types)
-> methods for automatically formalising discovered knowledge
-> usable label generation for new discovered relations
12
Relation extraction
“Paris is the capital city of France” dbpedia:Paris dbpedia-owl:country dbpedia:France
“John Stigall received a Bachelor of arts” ?
“John Stigall received a Bachelor of arts” (John Stigall, received, a Bachelor of arts)
Open Knowledge Extraction: • unsupervised -> no need of huge annotated
corpora for training • open-domain -> not bound to specific domain • abstractive -> model the text, use external
knowledge resources, use summarisation and language generation techniques
• producing formalised output -> directly reusable as linked data
13
Contribution
14
OIE vs OKE
Subject Predicate Object Approach
John Stigall received a Bachelor of arts extractive
John Stigall received from the State University of New York aat Cortland
extractive
dbpedia:John_Stigall
myprop:receivesAcademicDegree
dbpedia:Bachelor_of_Arts
abstractive
dbpedia:John_Stigall
myprop:receivesAcademicDegreeFrom
dbpedia:State_University_of_New_York
abstractive
“John Stigall received a Bachelor of arts from the State University of New York at Cortland “
15
HypothesisHyperlinks are pragmatic traces of semantic relations
between two entities
“McCarthy often commented on world
affairs on the Usenet forums”
Texts surrounding hyperlinks are linguistic traces of semantic relations between two entities
Pragmatic trace: presence of a hyperlink in a web page
16
He coined the term "artificial intelligence" (AI), developed the Lisp programming language family, significantly influenced the design of the ALGOL programming language, popularized timesharing, and was very influential in the early development of AI.
Pragmatic trace: presence of a hyperlink in a web page
16
He coined the term "artificial intelligence" (AI), developed the Lisp programming language family, significantly influenced the design of the ALGOL programming language, popularized timesharing, and was very influential in the early development of AI.
?
Linguistic trace: text surrounding a hyperlink
17
He coined the term "artificial intelligence" (AI), developed the Lisp programming language family, significantly influenced the design of the ALGOL programming language, popularized timesharing, and was very influential in the early development of AI.
John MCCarthy
?
Linguistic trace: text surrounding a hyperlinkdbpedia:John_McCarthy dbpo:wikiPageWikiLink dbpedia:Lisp
(DBpedia) dbpo:wikiPageWikiLink
17
He coined the term "artificial intelligence" (AI), developed the Lisp programming language family, significantly influenced the design of the ALGOL programming language, popularized timesharing, and was very influential in the early development of AI.
John MCCarthy
?
Linguistic trace: text surrounding a hyperlink
dbpedia:John_McCarthy myont:developed dbpedia:Lisp
(Legalo) legalo:developProgrammingLanguage
17
He coined the term "artificial intelligence" (AI), developed the Lisp programming language family, significantly influenced the design of the ALGOL programming language, popularized timesharing, and was very influential in the early development of AI.
John MCCarthy
?
Linguistic trace: text surrounding a hyperlink
dbpedia:John_McCarthy myont:developed dbpedia:Lisp
(Legalo) legalo:developProgrammingLanguage
rdfs:subPropertyOf http://swrc.ontoware.org/ontology#develops
17
He coined the term "artificial intelligence" (AI), developed the Lisp programming language family, significantly influenced the design of the ALGOL programming language, popularized timesharing, and was very influential in the early development of AI.
John MCCarthy
?
“Joey Foster Ellis has published on The New York Times, and The Wall
Street Journal.”
18
Hyperlinks
Hyperlinks can be either defined by humans or automatically created by Knowledge Extraction (KE) tools
“McCarthy often commented on world
affairs on the Usenet forums”
“Joey Foster Ellis has published on The New York Times, and The Wall
Street Journal.”
18
Hyperlinks
Hyperlinks can be either defined by humans or automatically created by Knowledge Extraction (KE) tools
“McCarthy often commented on world
affairs on the Usenet forums”
“McCarthy often commented on world
affairs on the Usenet forums”
“Joey Foster Ellis has published on The New York Times, and The Wall
Street Journal.”
18
Hyperlinks
Hyperlinks can be either defined by humans or automatically created by Knowledge Extraction (KE) tools
“McCarthy often commented on world
affairs on the Usenet forums”
“McCarthy often commented on world
affairs on the Usenet forums”
“Joey Foster Ellis has published onThe New York Times, and The Wall
Street Journal.”
19
What can we derive from a hyperlink?“Joey Foster Ellis has published on The New York Times, and The Wall
Street Journal.”“McCarthy often commented on world
affairs on the Usenet forums”
19
What can we derive from a hyperlink?“Joey Foster Ellis has published on The New York Times, and The Wall
Street Journal.”“McCarthy often commented on world
affairs on the Usenet forums”
McCarthy and Usenet are (probably) related
19
What can we derive from a hyperlink?“Joey Foster Ellis has published on The New York Times, and The Wall
Street Journal.”“McCarthy often commented on world
affairs on the Usenet forums”
dbpedia:John_McCarthy dbpo:wikiPageWikiLink dbpedia:Usenet
McCarthy and Usenet are (probably) related
19
What can we derive from a hyperlink?“Joey Foster Ellis has published on The New York Times, and The Wall
Street Journal.”“McCarthy often commented on world
affairs on the Usenet forums”
dbpedia:John_McCarthy dbpo:wikiPageWikiLink dbpedia:Usenet
McCarthy and Usenet are (probably) related
The following entity pairs could be related:
Joey Foster Ellis, The New York Times Joey Foster Ellis, The New York Times
The New York Times, The Wall Street Journal
19
What can we derive from a hyperlink?“Joey Foster Ellis has published on The New York Times, and The Wall
Street Journal.”“McCarthy often commented on world
affairs on the Usenet forums”
dbpedia:John_McCarthy dbpo:wikiPageWikiLink dbpedia:Usenet
McCarthy and Usenet are (probably) related
The following entity pairs could be related:
Joey Foster Ellis, The New York Times Joey Foster Ellis, The New York Times
The New York Times, The Wall Street Journal
dbpedia:Joey_Foster_Ellis dul:associateWith
dbpedia:The_New_York_Times
19
What can we derive from a hyperlink?“Joey Foster Ellis has published on The New York Times, and The Wall
Street Journal.”
“Joey Foster Ellis has published on The New York Times, and The Wall
Street Journal.”“McCarthy often commented on world
affairs on the Usenet forums”
dbpedia:John_McCarthy dbpo:wikiPageWikiLink dbpedia:Usenet
McCarthy and Usenet are (probably) related
The following entity pairs could be related:
Joey Foster Ellis, The New York Times Joey Foster Ellis, The New York Times
The New York Times, The Wall Street Journal
dbpedia:Joey_Foster_Ellis dul:associateWith
dbpedia:The_New_York_Times
“McCarthy often commented on world
affairs on the Usenet forums”
Linguistic traces provides us the meaning of relevant relations
20
“Joey Foster Ellis has published on The New York Times, and The Wall
Street Journal.”s
{(esubj, eobj)i}Joey Foster Ellis, The New York Times Joey Foster Ellis, The New York Times
The New York Times, The Wall Street Journal
φs(esubj, eobj)i φs(Joey Foster Ellis, The New York Times)
Natural language sentence
Set of entity pairs
Relevant relations evidenced by s
21
Relevant Relation Assessment
φs(esubj, eobj)i
φs(Joey Foster Ellis, The New York Times)
“Joey Foster Ellis has published on The New York Times, and The Wall Street Journal.”
φs(The Wall Street Journal, The New York Times)
22
Usable predicate generation
λ ~ λi , λ
i ∈ Λ
Λ ≡ {λ1, ..., λn}labels for φs(esubj, eobj)i
defined for a LOD vocabulary
publish on
φs(Joey Foster Ellis, The New York Times)“Joey Foster Ellis has published
on The New York Times, and The Wall Street Journal.”
(has) published on (has) published on journal (is) author for writes for
23
OWL/RDF formalisation
dbpedia:Joey_Foster_Ellis legalo:publishOn dbpedia:The_New_York_Times .
legalo:publishOn a owl:ObjectProperty ; rdfs:range … ; rdfs:domain … ; … .
φs(Joey Foster Ellis, The New York Times)“Joey Foster Ellis has published
on The New York Times, and The Wall Street Journal.”
A novel Open Knowledge Extraction (OKE) approach that performs unsupervised, open domain, and abstractive knowledge extraction from text for producing directly usable machine readable data
Several implementations of OKE: Fred, Legalo, Legalo-Wikipedia
Their evaluation performed with the help of crowdsourcing and the creation of a gold standard, all showing high values of precision, recall, and accuracy
24
Contribution
http://wit.istc.cnr.it/stlab-tools/legalo
http://wit.istc.cnr.it/stlab-tools/legalo/wikipedia
http://wit.istc.cnr.it/kore-dev/legalo Developing version
http://wit.istc.cnr.it/stlab-tools/fred
http://wit.istc.cnr.it/kore-dev/fred
25
Method and implementation
Frame-based formal representation of s
Label generation (extraction+abstraction)
Relation assessment
25
Method and implementation
Frame-based formal representation of s
Label generation (extraction+abstraction)
Formalisation
Relation assessment
25
Method and implementation
Frame-based formal representation of s
Label generation (extraction+abstraction)
Formalisation
Alignment to existing LOD vocabularies
Relation assessment
25
Method and implementation
Frame-based formal representation of s
Label generation (extraction+abstraction)
Formalisation
Alignment to existing LOD vocabularies
Relation assessment
Linguistic traces of Wikipedia pagelinks
subjects and objects are given by DBpedia pagelinks
27
Frame-based formal representation“Rico Lebrun taught visual arts at the Chouinard Art Institute and at the
Disney Studios. He was influenced by Michelangelo and maintained a lifelong affinity for Goya and Picasso.”
27
Frame-based formal representation“Rico Lebrun taught visual arts at the Chouinard Art Institute and at the
Disney Studios. He was influenced by Michelangelo and maintained a lifelong affinity for Goya and Picasso.”
28
Frame-based formal representation“Rico Lebrun taught visual arts at the Chouinard Art Institute and at the
Disney Studios. He was influenced by Michelangelo and maintained a lifelong affinity for Goya and Picasso.”
28
Frame-based formal representation“Rico Lebrun taught visual arts at the Chouinard Art Institute and at the
Disney Studios. He was influenced by Michelangelo and maintained a lifelong affinity for Goya and Picasso.”
28
Frame-based formal representation“Rico Lebrun taught visual arts at the Chouinard Art Institute and at the
Disney Studios. He was influenced by Michelangelo and maintained a lifelong affinity for Goya and Picasso.”
28
Frame-based formal representation“Rico Lebrun taught visual arts at the Chouinard Art Institute and at the
Disney Studios. He was influenced by Michelangelo and maintained a lifelong affinity for Goya and Picasso.”
28
Frame-based formal representation“Rico Lebrun taught visual arts at the Chouinard Art Institute and at the
Disney Studios. He was influenced by Michelangelo and maintained a lifelong affinity for Goya and Picasso.”
28
Frame-based formal representation“Rico Lebrun taught visual arts at the Chouinard Art Institute and at the
Disney Studios. He was influenced by Michelangelo and maintained a lifelong affinity for Goya and Picasso.”
29
Relation assessment“Rico Lebrun taught visual arts at the Chouinard Art Institute and at the
Disney Studios. He was influenced by Michelangelo and maintained a lifelong affinity for Goya and Picasso.”
29
Relation assessment“Rico Lebrun taught visual arts at the Chouinard Art Institute and at the
Disney Studios. He was influenced by Michelangelo and maintained a lifelong affinity for Goya and Picasso.”
30
Relation assessment“Rico Lebrun taught visual arts at the Chouinard Art Institute and at the
Disney Studios. He was influenced by Michelangelo and maintained a lifelong affinity for Goya and Picasso.”
30
Relation assessment
G=(V,E), V ≡ {v0 , ..., vn }, E ≡ {edge1, ..., edgen}
“Rico Lebrun taught visual arts at the Chouinard Art Institute and at the Disney Studios. He was influenced by Michelangelo and maintained a
lifelong affinity for Goya and Picasso.”
30
Relation assessment
G’=(V,E’) is the undirected version of G=(V,E)
G=(V,E), V ≡ {v0 , ..., vn }, E ≡ {edge1, ..., edgen}
“Rico Lebrun taught visual arts at the Chouinard Art Institute and at the Disney Studios. He was influenced by Michelangelo and maintained a
lifelong affinity for Goya and Picasso.”
30
Relation assessment
G’=(V,E’) is the undirected version of G=(V,E)
G=(V,E), V ≡ {v0 , ..., vn }, E ≡ {edge1, ..., edgen}
vi ∈ V is the node in G representing the entity ei mentioned in s.
“Rico Lebrun taught visual arts at the Chouinard Art Institute and at the Disney Studios. He was influenced by Michelangelo and maintained a
lifelong affinity for Goya and Picasso.”
30
Relation assessment
P(vsubj,vobj)=[v0,edge1,…,edgen,vn], Psetsubj,obj ≡ {edge1, v1…, vn-1, edgen}
G’=(V,E’) is the undirected version of G=(V,E)
G=(V,E), V ≡ {v0 , ..., vn }, E ≡ {edge1, ..., edgen}
vi ∈ V is the node in G representing the entity ei mentioned in s.
“Rico Lebrun taught visual arts at the Chouinard Art Institute and at the Disney Studios. He was influenced by Michelangelo and maintained a
lifelong affinity for Goya and Picasso.”
31
Necessary condition: φs(esubj , eobj ) ⇒ ∃P(vsubj , vobj )
“Rico Lebrun taught visual arts at the Chouinard Art Institute and at the Disney Studios. He was influenced by Michelangelo and maintained a
lifelong affinity for Goya and Picasso.”
Relation assessment
31
Necessary condition: φs(esubj , eobj ) ⇒ ∃P(vsubj , vobj )
“Rico Lebrun taught visual arts at the Chouinard Art Institute and at the Disney Studios. He was influenced by Michelangelo and maintained a
lifelong affinity for Goya and Picasso.”
Relation assessment
31
Necessary condition: φs(esubj , eobj ) ⇒ ∃P(vsubj , vobj )
φs(Rico Lebrun , Michelangelo ) ⇒ ∃P(fred:Rico_Lebrun, fred:Michelangelo)
“Rico Lebrun taught visual arts at the Chouinard Art Institute and at the Disney Studios. He was influenced by Michelangelo and maintained a
lifelong affinity for Goya and Picasso.”
Relation assessment
32
“Rico Lebrun taught visual arts at the Chouinard Art Institute and at the Disney Studios. He was influenced by Michelangelo and maintained a
lifelong affinity for Goya and Picasso.”
Relation assessment
32
Let f be a node of G such that dul:Event(f)
“Rico Lebrun taught visual arts at the Chouinard Art Institute and at the Disney Studios. He was influenced by Michelangelo and maintained a
lifelong affinity for Goya and Picasso.”
Relation assessment
32
Let f be a node of G such that dul:Event(f)Role ≡ {ρ1, ..., ρn}, the set of possible roles participating in f
“Rico Lebrun taught visual arts at the Chouinard Art Institute and at the Disney Studios. He was influenced by Michelangelo and maintained a
lifelong affinity for Goya and Picasso.”
Relation assessment
32
Let f be a node of G such that dul:Event(f)Role ≡ {ρ1, ..., ρn}, the set of possible roles participating in fAgRole ≡ {ρm1 , ..., ρmm }, the set of VerbNet agentive rolesAgRole ⊆ Role
“Rico Lebrun taught visual arts at the Chouinard Art Institute and at the Disney Studios. He was influenced by Michelangelo and maintained a
lifelong affinity for Goya and Picasso.”
Relation assessment
32
Let f be a node of G such that dul:Event(f)Role ≡ {ρ1, ..., ρn}, the set of possible roles participating in fAgRole ≡ {ρm1 , ..., ρmm }, the set of VerbNet agentive rolesAgRole ⊆ Role
Sufficient condition:φs(esubj , eobj ) ⇐
∃P (vsubj , vobj ) and ∃f ∈ Psetsubj,obj such that dul:Event(f) and ρ(f,vsubj)∈AgRole
“Rico Lebrun taught visual arts at the Chouinard Art Institute and at the Disney Studios. He was influenced by Michelangelo and maintained a
lifelong affinity for Goya and Picasso.”
Relation assessment
33
“Rico Lebrun taught visual arts at the Chouinard Art Institute and at the Disney Studios. He was influenced by Michelangelo and maintained a
lifelong affinity for Goya and Picasso.”
Relation assessment
33
“Rico Lebrun taught visual arts at the Chouinard Art Institute and at the Disney Studios. He was influenced by Michelangelo and maintained a
lifelong affinity for Goya and Picasso.”
Subject
Object
Relation assessment
33
“Rico Lebrun taught visual arts at the Chouinard Art Institute and at the Disney Studios. He was influenced by Michelangelo and maintained a
lifelong affinity for Goya and Picasso.”
Subject
Object
Relation assessment
33
“Rico Lebrun taught visual arts at the Chouinard Art Institute and at the Disney Studios. He was influenced by Michelangelo and maintained a
lifelong affinity for Goya and Picasso.”
Subject Object
Relation assessment
33
“Rico Lebrun taught visual arts at the Chouinard Art Institute and at the Disney Studios. He was influenced by Michelangelo and maintained a
lifelong affinity for Goya and Picasso.”
Subject Object
Relation assessment
34
Semantic Web triples and properties generation
“Rico Lebrun taught visual arts at the Chouinard Art Institute and at the Disney Studios. He was influenced by Michelangelo and maintained a
lifelong affinity for Goya and Picasso.”
34
Semantic Web triples and properties generation
“Rico Lebrun taught visual arts at the Chouinard Art Institute and at the Disney Studios. He was influenced by Michelangelo and maintained a
lifelong affinity for Goya and Picasso.”
35
“Rico Lebrun taught visual arts at the Chouinard Art Institute and at the Disney Studios. He was influenced by Michelangelo and maintained a
lifelong affinity for Goya and Picasso.”
Semantic Web triples and properties generation
vn.role:Actor1 -> “with”vn.role:Actor2 -> “with” vn.role:Beneficiary -> “for” vn.role:Instrument -> “with” vn.role:Destination -> “to” vn.role:Topic -> “about” vn.role:Source -> “from”
SubjectObject
35
“Rico Lebrun taught visual arts at the Chouinard Art Institute and at the Disney Studios. He was influenced by Michelangelo and maintained a
lifelong affinity for Goya and Picasso.”
Semantic Web triples and properties generation
vn.role:Actor1 -> “with”vn.role:Actor2 -> “with” vn.role:Beneficiary -> “for” vn.role:Instrument -> “with” vn.role:Destination -> “to” vn.role:Topic -> “about” vn.role:Source -> “from”
SubjectObject
teach
35
“Rico Lebrun taught visual arts at the Chouinard Art Institute and at the Disney Studios. He was influenced by Michelangelo and maintained a
lifelong affinity for Goya and Picasso.”
Semantic Web triples and properties generation
vn.role:Actor1 -> “with”vn.role:Actor2 -> “with” vn.role:Beneficiary -> “for” vn.role:Instrument -> “with” vn.role:Destination -> “to” vn.role:Topic -> “about” vn.role:Source -> “from”
SubjectObject
teach art
35
“Rico Lebrun taught visual arts at the Chouinard Art Institute and at the Disney Studios. He was influenced by Michelangelo and maintained a
lifelong affinity for Goya and Picasso.”
Semantic Web triples and properties generation
vn.role:Actor1 -> “with”vn.role:Actor2 -> “with” vn.role:Beneficiary -> “for” vn.role:Instrument -> “with” vn.role:Destination -> “to” vn.role:Topic -> “about” vn.role:Source -> “from”
SubjectObject
legalo:teachArtAt
teach art at
36
“Rico Lebrun taught visual arts at the Chouinard Art Institute and at the Disney Studios. He was influenced by Michelangelo and maintained a
lifelong affinity for Goya and Picasso.”
Semantic Web triples and properties generation
vn.role:Actor1 -> “with”vn.role:Actor2 -> “with” vn.role:Beneficiary -> “for” vn.role:Instrument -> “with” vn.role:Destination -> “to” vn.role:Topic -> “about” vn.role:Source -> “from”
Subject
Object
36
“Rico Lebrun taught visual arts at the Chouinard Art Institute and at the Disney Studios. He was influenced by Michelangelo and maintained a
lifelong affinity for Goya and Picasso.”
Semantic Web triples and properties generation
vn.role:Actor1 -> “with”vn.role:Actor2 -> “with” vn.role:Beneficiary -> “for” vn.role:Instrument -> “with” vn.role:Destination -> “to” vn.role:Topic -> “about” vn.role:Source -> “from”
Subject
Objectteach
36
“Rico Lebrun taught visual arts at the Chouinard Art Institute and at the Disney Studios. He was influenced by Michelangelo and maintained a
lifelong affinity for Goya and Picasso.”
Semantic Web triples and properties generation
vn.role:Actor1 -> “with”vn.role:Actor2 -> “with” vn.role:Beneficiary -> “for” vn.role:Instrument -> “with” vn.role:Destination -> “to” vn.role:Topic -> “about” vn.role:Source -> “from”
Subject
Object
legalo:teachAbout
teach about
37
“Rico Lebrun taught visual arts at the Chouinard Art Institute and at the Disney Studios. He was influenced by Michelangelo and maintained a
lifelong affinity for Goya and Picasso.”
Semantic Web triples and properties generation
38
“Rico Lebrun taught visual arts at the Chouinard Art Institute and at the
Disney Studios. He was influenced by Michelangelo and maintained a lifelong
affinity for Goya and Picasso.”
Semantic Web triples and properties generation
dbpedia:Rico_Lebrun s:teachAbout dbpedia:Visual_arts . s:teachAbout a owl:ObjectProperty ;
rdfs:subPropertyOf fred:Teach; rdfs:domain wibi:Artist ; rdfs:range wibi:Art ;
grounding:definedFromFormalRepresentation fred-graph:a6705cedbf9b53d10bbcdedaa3be9791da0a9e94 ;
grounding:derivedFromLinguisticEvidence s:linguisticEvidence ; owl:propertyChainAxiom([ owl:inverseOf s:AgentTeach ] s:TopicTeach) .
_:b2 a alignment:Cell ; alignment:entity1 s:teachAbout ; alignment:entity2 <http://purl.org/vocab/aiiso/schema#teaches> ; alignment:measure "0.846"^xsd:float ; alignment:relation "equivalence" .
domain, range, subsumption
linguistic and formal scope
alignment to existing LOD vocabularies
40
Evaluation sample
*https://code.google.com/p/relation-extraction-corpus/downloads/list corpus for relation extraction developed at Google Research
Crel−extraction*: 5 datasets each dedicated to a specific relation
{“pred”: “/people/person/education./education/education/institution", “sub":"/m/0g9zjv3", “obj":"/m/0ckrjh", “evidences”: [{"url":"http://en.wikipedia.org/wiki/Dmitry_Chernyakov", “snippet": “Dmitry was born in Moscow. In 1993 he graduated from Russian Academy of Theatre Arts as stage director. He started his career in the Russian Drama Theatre of Lithuania in Vilnius. Then he directed opera and drama in many major Russian cities: Moscow, Saint Petersburg, Novosibirsk, Omsk, Samara, Kazan and others. He usually creates scenic design and stage clothes for his plays.”}], “judgments": [{"rater":"1264223381988340244","judgment":"yes"},{"rater":"6968726908160095830","judgment":"yes"},{"rater":"6063741883945424276","judgment":"yes"},{"rater":"5346153820624061638","judgment":"yes"},{"rater":"12022408018620867151","judgment":"yes"}]}
41
Evaluation sample
Cinstitution: random sample of 130 snippets (one entity pair) from Crel−extraction for “attending or graduating from an institution” Legalo produced always an output, either p or “no relation”
Ceducation: random sample of 130 snippets (one entity pair) from Crel−extraction for “obtaining a degree of education” Legalo produced always an output, either p or “no relation”
Cgeneral: random sample of 60 snippets from all Crel−extraction datasets
then broken into 186 single sentences with at least one entity pair. Legalo produced 867 results, 262 p and 605 “no relation”
42
Hypothesis 1: Relevant relation assessment
Legalo is able to assess if, given a sentence s, a relevant relation exists which holds between two entities, according to the content of s:
∃φ.φs (esubj , eobj )
φs(esubj, eobj)i
43
Evaluation results for Relevant Relation Assessment
The confidence value expresses a weighted value for inter-rater agreement, by rater trust scores
44
Hypothesis 2: Usable predicate generation
Legalo is able to generate a usable predicate λ′ for a relevant relation φs between to entities, expressed in a sentence ′ s: given λ, a label generated by Legalo for φs, and λi a label generated by a human for φs the following holds:
λ ~ λi , λi ∈ Λ
λ ~ λi , λ
i ∈ Λ
Λ ≡ {λ1, ..., λn}labels for φs(esubj, eobj)i
defined for a LOD vocabulary
45
Evaluation results for Usable Predicate Generation
Agree = 1, Partly Agree = 0.5, Disagree = 0
The confidence value expresses a weighted value for inter-rater agreement, by rater trust scores
• Similarity score between human created {λi} and Legalo λ, for a φs(esubj, eobj)i
• Jaccard distance measure (string similarity)
• Semantic similarity measure based on the SimLibrary framework*
46
Evaluation results for Usable Predicate Generation
Task Relation Jaccard [0,1] SimLibrary [0,1] Confidence
5 Any 0.63 0.80 0.59
* G. Pirró and J. Euzenat. A feature and information theoretic framework for semantic similarity and relatedness. ISWC2010
47
Legalo-Wikipedia (EKAW 2014)
3 raters, computer science, non-familiar with Legalo, familiar with Linked Data
• Open Knowledge Extraction: unsupervised, open-domain, abstractive, producing OWL/RDF output • OKE for relation extraction and its
implementations • An evaluation showing high performance for both
tasks: relevant relation assessment, usable predicate generation
48
Conclusion
49
Where are we heading?
• Improving frequent-subgraph heuristics • Strategies for alignment to existing LOD
vocabularies (or to ODPs?) • Reconciliation of properties, out of sentence scope • Using Legalo for abstractive summarisation tasks • Using Legalo for language generation • Identifying ways to directly compare to other
approaches
50
Thanks for your attention
http://wit.istc.cnr.it/stlab-tools/
51
Evaluation sample
*https://code.google.com/p/relation-extraction-corpus/downloads/list corpus for relation extraction developed at Google Research
Crel−extraction*: 5 datasets each dedicated to a specific relation
{“pred”: “/people/person/education./education/education/institution", “sub":"/m/0g9zjv3", “obj":"/m/0ckrjh", “evidences”: [{"url":"http://en.wikipedia.org/wiki/Dmitry_Chernyakov", “snippet": “Dmitry was born in Moscow. In 1993 he graduated from Russian Academy of Theatre Arts as stage director. He started his career in the Russian Drama Theatre of Lithuania in Vilnius. Then he directed opera and drama in many major Russian cities: Moscow, Saint Petersburg, Novosibirsk, Omsk, Samara, Kazan and others. He usually creates scenic design and stage clothes for his plays.”}], “judgments": [{"rater":"1264223381988340244","judgment":"yes"},{"rater":"6968726908160095830","judgment":"yes"},{"rater":"6063741883945424276","judgment":"yes"},{"rater":"5346153820624061638","judgment":"yes"},{"rater":"12022408018620867151","judgment":"yes"}]}
52
Evaluation sample
Cinstitution: random sample of 130 snippets (one entity pair) from Crel−extraction for “attending or graduating from an institution” Legalo produced always an output, either p or “no relation”
Ceducation: random sample of 130 snippets (one entity pair) from Crel−extraction for “obtaining a degree of education” Legalo produced always an output, either p or “no relation”
Cgeneral: random sample of 60 snippets from all Crel−extraction datasets
then broken into 186 single sentences with at least one entity pair. Legalo produced 867 results, 262 p and 605 “no relation”
53
Hypothesis 1: Relevant relation assessment
Legalo is able to assess if, given a sentence s, a relevant relation exists which holds between two entities, according to the content of s:
∃φ.φs (esubj , eobj )
φs(esubj, eobj)i
54
Crowdsourcing tasks for Relevant Relation Assessment
Task 1:
Assessing if a sentence s is an evidence of a referenced relation (i.e. either “institution” or “education”) between two entities, mentioned in s.
Based on data from Cinstitution and Ceducation, respectively
Task 2:
Assessing if a sentence s is an evidence of any relation between two given entities mentioned in s.
Based on data from Cgeneral
At least 3 raters with t>0.70 Answer could be “yes” or “no”
55
Evaluation results for Relevant Relation Assessment
The confidence value expresses a weighted value for inter-rater agreement, by rater trust scores
56
Hypothesis 2: Usable predicate generation
Legalo is able to generate a usable predicate λ′ for a relevant relation φs between to entities, expressed in a sentence ′ s: given λ, a label generated by Legalo for φs, and λi a label generated by a human for φs the following holds:
λ ~ λi , λi ∈ Λ
λ ~ λi , λ
i ∈ Λ
Λ ≡ {λ1, ..., λn}labels for φs(esubj, eobj)i
defined for a LOD vocabulary
57
Crowdsourcing tasks for Usable predicate generation
Task 3:
Judging if λ generated by a machine expresses is a good summarisation of a specific relation (i.e. either “institution” or “education”) between two given entities mentioned in s, according to the evidence provided by s.
Based on data from Cinstitution and Ceducation, respectively
Task 4:
judging if a predicate λ is a good summarisation of a relation expressed by the content of s, between two given entities mentioned in s, according to the evidence provided by s.
Based on data from Cgeneral
At least 3 raters with t>0.70 Answer could be Agree, Partly Agree, Disagree
58
Evaluation results for Usable Predicate Generation
Agree = 1, Partly Agree = 0.5, Disagree = 0
The confidence value expresses a weighted value for inter-rater agreement, by rater trust scores
59
Crowdsourcing tasks for Usable predicate generation
Task 5:
Creating a phrase λ that summarises the relation expressed by the content of s, between two given entities mentioned in s, according to the evidence provided by s.
Based on data from Cgeneral
At least 3 raters with t>0.60 Answer was open
• Similarity score between human created {λi} and Legalo λ, for a φs(esubj, eobj)i
• Jaccard distance measure (string similarity)
• Semantic similarity measure based on the SimLibrary framework*
60
Evaluation results for Usable Predicate Generation
Task Relation Jaccard [0,1] SimLibrary [0,1] Confidence
5 Any 0.63 0.80 0.59
* G. Pirró and J. Euzenat. A feature and information theoretic framework for semantic similarity and relatedness. ISWC2010
61
Legalo-Wikipedia (EKAW 2014)
3 raters, computer science, non-familiar with Legalo, familiar with Linked Data
62
Confidence scoreGiven a task unit u, a set of possible judgements {ji}, with i = 1, ...n, {tk} a set of trust scores each representing a rater, with k=1,…,m
tsum=sumk=1,…m(tk) is the sum of trust scores of raters giving judgement on u
trust(ji) is the sum of tk values of raters that choose judgement ji
confidence(ji, u) for judgement ji
on the task unit u is
confidence(“yes”, 582275117) = = (0.95+0.98)/2.82 = 0.68
confidence(“no”, 582275117) = = (0.89/2.82 = 0.31