Inference and Serialization of Latent Graph Schemata Using ShEx
Speaker: Daniel Fernández-Álvarez
Category: Idea
Daniel Fernández-Álvarez*, Jose Emilio Labra-Gayo*, Herminio García-González*
*Department of Computer Science, WESO Research Group
University of Oviedo, Oviedo, Spain
Motivational example
Motivation: Torimbia Beach
• Country: Spain
• Region: Asturias
• Council/city: Llanes
• Lat/long: 43.44, -4.85
• Length: 500 m
• Width: 100 m
• Naturist: True
Table: Region, Lat/long and Width for 6 different random but relevant beaches in DBpedia*
The same happens with country, council/city, length and naturist (see the extra information slides at the end).
*Batu Ferringhi, Horseshoe Bay, Manly Beach, Marina Beach, Playa Arcadia, Red Beach
Motivation
I would like to…
• check the concept of beach, not the instances
• make a single query/click to discover usual schemata
• be correct, coherent and exhaustive
Idea
Proposal
• Analysis of the neighborhood of nodes that satisfy a certain condition, in order to induce usual schemata:
  • Typical condition: rdf:type
• Serialization of inferred schemata with ShEx (Shape Expressions):
  • Association to a type (class)
  • Management of trustworthiness
• Handy for:
  • Documentation
  • Verification of quality
  • Discovering “hidden” entities
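A minimal sketch of the neighborhood analysis, assuming the graph is a plain list of (subject, predicate, object) tuples with prefixed names as strings, and rdf:type as the selection condition. The function name predicate_frequencies and the example IRIs are hypothetical. For each predicate it reports the fraction of selected nodes that use it, which is one possible input for managing trustworthiness.

```python
from collections import Counter, defaultdict

RDF_TYPE = "rdf:type"


def predicate_frequencies(triples, target_class):
    """Return {predicate: fraction of target_class instances that use it}."""
    # 1. Select the nodes that satisfy the condition: rdf:type target_class.
    instances = {s for s, p, o in triples if p == RDF_TYPE and o == target_class}
    if not instances:
        return {}
    # 2. Collect the outgoing predicates in each selected node's neighborhood
    #    (a set per node, so repeated values of a predicate count once).
    preds_by_node = defaultdict(set)
    for s, p, o in triples:
        if s in instances and p != RDF_TYPE:
            preds_by_node[s].add(p)
    # 3. Count, for each predicate, how many instances use it, then normalize.
    counts = Counter(p for preds in preds_by_node.values() for p in preds)
    return {p: n / len(instances) for p, n in counts.items()}


# Hypothetical example data.
triples = [
    ("ex:torimbia", "rdf:type", "dbo:Beach"),
    ("ex:torimbia", "geo:lat", "43.44"),
    ("ex:torimbia", "dbp:length", "500"),
    ("ex:beach2", "rdf:type", "dbo:Beach"),
    ("ex:beach2", "geo:lat", "10.0"),
    ("ex:other", "geo:lat", "0.0"),  # not typed as a beach, so it is ignored
]
print(predicate_frequencies(triples, "dbo:Beach"))
```

Here geo:lat is used by every beach (ratio 1.0) and dbp:length by half of them (0.5); a threshold on these ratios could decide which constraints make it into the inferred shape.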
How?
Workflow
Source graph: DBpedia, Wikidata… → Inference → Abstract schemata representation → Serialization → Textual schemata representation with ShEx, e.g.:
<Person> {
}
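One way to picture the step from the abstract to the textual representation: a hypothetical in-memory model (Shape and TripleConstraint are illustrative names, not part of any ShEx library) serialized into ShExC-style text. The slides separate triple constraints with ','; current ShExC 2.x uses ';', so the separator is left as a parameter.

```python
from dataclasses import dataclass, field


@dataclass
class TripleConstraint:
    predicate: str          # e.g. "dbp:width"
    value: str              # a datatype ("xsd:integer") or a shape reference ("@<Place>")
    cardinality: str = ""   # "", "?", "*" or "+"


@dataclass
class Shape:
    label: str
    constraints: list = field(default_factory=list)


def to_shexc(shape, sep=","):
    """Serialize an abstract shape as ShExC-style text."""
    if not shape.constraints:
        return f"<{shape.label}> {{ }}"
    body = f"{sep}\n".join(
        f"  {c.predicate} {c.value}{c.cardinality}" for c in shape.constraints
    )
    return f"<{shape.label}> {{\n{body}\n}}"


beach = Shape("Beach", [
    TripleConstraint("dbp:width", "xsd:integer"),
    TripleConstraint("dbo:isPartOf", "@<Place>", "*"),
])
print(to_shexc(beach))
print(to_shexc(beach, sep=";"))
```

Keeping the abstract model separate from the text lets the same inferred schema be serialized in other syntaxes later.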
Schemata Inference: current approaches
• Ontology integration to find shared core elements [Zhao,13]
  • Association rule mining (Apriori)
  • Rule-based classification (Decision Tables)
• Logical axioms at ontology level [Völker,11]
  • Association rule mining (Apriori)
  • Axioms represented with OWL 2 EL
• Graph schemata at class level [Christodoulou,15]
  • Clusters of similar individuals (ideally, cluster = class)
  • Results in an ad-hoc syntax
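To make association rule mining concrete in this setting, here is an Apriori-style sketch restricted to rules with a single predicate on each side (A -> B), where each transaction is the set of predicates used by one instance. It is a simplified illustration, not the procedure of [Zhao,13] or [Völker,11]; the function name, thresholds and data are hypothetical.

```python
from collections import Counter
from itertools import permutations


def pairwise_rules(transactions, min_support, min_confidence):
    """Mine rules A -> B between predicates; each transaction is one instance's predicate set."""
    if not transactions:
        return []
    n = len(transactions)
    # Apriori pass 1: keep only predicates that are frequent on their own.
    item_count = Counter(p for t in transactions for p in t)
    frequent = sorted(p for p, c in item_count.items() if c / n >= min_support)
    rules = []
    # Apriori pass 2: candidate pairs are built only from frequent predicates.
    for a, b in permutations(frequent, 2):
        both = sum(1 for t in transactions if a in t and b in t)
        support = both / n
        confidence = both / item_count[a]
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    return rules


# Hypothetical predicate sets of four instances.
transactions = [
    {"geo:lat", "geo:long", "dbp:width"},
    {"geo:lat", "geo:long"},
    {"geo:lat", "geo:long", "dbp:length"},
    {"dbp:length"},
]
print(pairwise_rules(transactions, min_support=0.5, min_confidence=0.9))
# [('geo:lat', 'geo:long', 0.75, 1.0), ('geo:long', 'geo:lat', 0.75, 1.0)]
```

A rule such as geo:lat -> geo:long with confidence 1.0 suggests that both properties belong together in the inferred shape.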
Schemata Inference: our current status
Some promising ideas:
• Instance clustering
• Association rule mining
Some issues linked to the target graph:
• Noise management
• Adaptation to data model
• Graph size & complexity
• Completeness and coherence
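Instance clustering can be illustrated with a deliberately simple technique: greedy leader clustering over predicate sets using Jaccard similarity. This is a swapped-in example, not the clustering used in [Christodoulou,15]; the function names, threshold and profiles are hypothetical, and the result depends on the order in which instances are visited.

```python
def jaccard(a, b):
    """Jaccard similarity between two predicate sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_instances(profiles, threshold):
    """Greedy leader clustering: each instance joins the first cluster whose
    leader (its first member) is similar enough, otherwise it starts a new one."""
    clusters = []  # list of (leader predicate set, member ids)
    for node, preds in profiles.items():
        for leader, members in clusters:
            if jaccard(leader, preds) >= threshold:
                members.append(node)
                break
        else:
            clusters.append((preds, [node]))
    return [members for _, members in clusters]


# Hypothetical predicate profiles (instance -> set of outgoing predicates).
profiles = {
    "ex:beach1": {"geo:lat", "geo:long", "dbp:width"},
    "ex:beach2": {"geo:lat", "geo:long"},
    "ex:person1": {"foaf:name", "dbo:birthDate"},
}
print(cluster_instances(profiles, threshold=0.5))
# [['ex:beach1', 'ex:beach2'], ['ex:person1']]
```

Ideally each cluster corresponds to one class, so a shape could be inferred per cluster; noisy graphs would likely need a more robust algorithm.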
Schemata Serialization I
Need: a standard syntax to express constraints on RDF graphs at class level, as other data models already have:
• XML: RELAX NG, DTD, XML Schema
• Relational databases: DDL
• JSON: JSON Schema
RDF candidates:
ShEx
• Grammar-oriented
• Recursion
• Human-friendly syntax
SHACL
• Constraint-oriented
• No recursion (for now)
• RDF syntax (for now)
Schemata Serialization II
Pure ShEx
<Beach> {
  dbp:width xsd:integer,
  dbp:length xsd:integer,
  geo:lat xsd:long,
  geo:long xsd:long,
  dbo:isPartOf @<Place>*
}
Annotated ShEx
<Beach> {
  dbp:width xsd:integer,
  dbp:length xsd:integer,
  geo:lat xsd:long,
  geo:long xsd:long,
  geo:geometry @<Point>,
  dbo:isPartOf @<Place>*,
  dbo:country @<Country>
}
Use cases?
Context: Types of graphs
Specific purpose
• Automatically built
• Managed by a single agent
General purpose
• Manually built
• Managed by a community
Reality
Context: Collaborative graphs
Key points:
• Schemata are not planned; they just emerge
• Schemata change over time
Possibilities:
• Schemata inference on users’ demand
• What is associated with a type, instead of how a type should be
• Freedom: ShEx as guide, not dogma
To summarize…
Conclusions and Future Work
What we have done: Idea
• Inference of latent graph schemata
• Serialization through ShEx syntax
What we want to do:
Prototype
• Selection of techniques
• Selection of target source/s
Tests
• Usefulness in different domains
• Feasibility: reached trustworthiness
• Users’ acceptance
References
• Zhao, L., & Ichise, R. (2013, May). Instance-based ontological knowledge acquisition. In Extended Semantic Web Conference (pp. 155-169). Springer Berlin Heidelberg.
• Völker, J., & Niepert, M. (2011, May). Statistical schema induction. In Extended Semantic Web Conference (pp. 124-138). Springer Berlin Heidelberg.
• Christodoulou, K., Paton, N. W., & Fernandes, A. A. (2015). Structure inference for linked data sources using clustering. In Transactions on Large-Scale Data- and Knowledge-Centered Systems XIX (pp. 1-25). Springer Berlin Heidelberg.
Extra information for Torimbia example I
Beach | Lat/long* | Naturist
Batu Ferringhi | dbp:latd, dbp:longd, georss:point, geo:geometry, geo:lat, geo:long | X
Horseshoe Bay | geo:geometry, geo:lat, geo:long | X
Manly Beach | georss:point, geo:geometry, geo:lat, geo:long | X
Marina Beach | georss:point, geo:geometry, geo:lat, geo:long | X
Playa Arcadia | georss:point, geo:geometry, geo:lat, geo:long | X
Red Beach | dbp:latDeg, dbp:longDeg, georss:point, geo:geometry, geo:lat, geo:long | X
*Some lat/long properties have been omitted. Some of them work together to give a precise coordinate (total degrees + orientation N/S, E/W).
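As the footnote notes, some properties split a coordinate into an unsigned value plus an orientation. A minimal sketch of combining them into a signed decimal degree, assuming the usual convention that South and West are negative; the function name is hypothetical, and the optional minutes and seconds parameters are an assumption for sources that split the angle further.

```python
def signed_degrees(degrees, orientation, minutes=0.0, seconds=0.0):
    """Turn an unsigned angle plus an N/S/E/W orientation into signed decimal degrees."""
    orientation = orientation.strip().upper()
    if orientation not in {"N", "S", "E", "W"}:
        raise ValueError(f"unknown orientation: {orientation!r}")
    total = degrees + minutes / 60 + seconds / 3600
    # By convention, South latitudes and West longitudes are negative.
    return -total if orientation in {"S", "W"} else total


# Torimbia Beach, using the lat/long values given on the motivation slide.
print(signed_degrees(43.44, "N"), signed_degrees(4.85, "W"))  # 43.44 -4.85
```

Normalizing every variant to one signed value is what makes heterogeneous properties like the ones in the table comparable.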
Extra information for Torimbia example II
Columns: Length, Width, Council, Region, Country
Batu Ferringhi: X X shared entity dbo:isPartOf dbo:country
Horseshoe Bay: X X description description rdf:type (BeachesOfBermuda)
Manly Beach: X X description dct:subject dbc:Beaches_of_New_South_Wales description
Marina Beach: dbp:height description dct:subject dct:subject
Playa Arcadia: X X dct:subject X dct:subject
Red Beach: X dbp:width dbp:city is dbp:south of description
Wikimedia Strategy: Templates and Mappings
• Mappings
  • Designed to automatically import data from Wikipedia’s infoboxes and tables into DBpedia.
• Wikipedia templates define the expected properties for certain types. Mappings define which property should be used to create a triple when an occurrence of an expected property is found.
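A sketch of what a mapping does, assuming an infobox is a dict from template parameters to raw values. The mapping table BEACH_MAPPING, its target properties and the function name are hypothetical, not taken from the real DBpedia mappings; parameters without a mapping produce no triple here.

```python
# Hypothetical mapping for a beach infobox: template parameter -> RDF property.
BEACH_MAPPING = {
    "length": "dbo:length",
    "width": "dbo:width",
    "country": "dbo:country",
}


def apply_mapping(subject, infobox, mapping):
    """Create one (subject, property, value) triple per mapped infobox parameter."""
    triples = []
    for param, value in infobox.items():
        prop = mapping.get(param.strip().lower())
        if prop is not None:  # parameters without a mapping yield no triple here
            triples.append((subject, prop, value))
    return triples


infobox = {"Length": "500 m", "Width": "100 m", "Naturist": "yes"}
print(apply_mapping("ex:torimbia", infobox, BEACH_MAPPING))
# [('ex:torimbia', 'dbo:length', '500 m'), ('ex:torimbia', 'dbo:width', '100 m')]
```

The example also shows the main limitation listed below: anything the template and its mapping do not foresee (here, Naturist) is simply not captured.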
PROS
• Preserves Wikipedia’s quality.
• Handy as a guide for content represented in Wikipedia.
• It may enrich both Wikipedia and DBpedia.
• Templates can evolve, guided by the community.
CONS
• Depends on Wikipedia’s quality.
• It can only manage content represented in Wikipedia.
• Not transferable to standalone RDF graph projects.
• It assumes that the community follows the templates, so it may not reflect the real graph.
ShEx vs SHACL
ShEx:
<UserShape> { rdfs:label xsd:string, ex:role ( ex:User ) ? }
SHACL:
:UserShape a sh:Shape ;
  sh:property [
    sh:predicate rdfs:label ;
    sh:datatype xsd:string ;
    sh:minCount 1 ;
    sh:maxCount 1 ;
  ] ;
  sh:property [
    sh:predicate ex:role ;
    sh:hasValue ex:User ;
    sh:filterShape [
      sh:property [
        sh:predicate ex:role ;
        sh:minCount 1 ;
      ]
    ] ;
    sh:maxCount 1 ;
  ] .
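To show what UserShape requires, here is a plain-Python check of its constraints, using rdfs:label as in the SHACL version: exactly one rdfs:label whose value is an xsd:string literal, and at most one ex:role whose value must be ex:User. The triple representation, the Literal class and the function name are hypothetical; this is an illustration, not a ShEx or SHACL engine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Literal:
    lexical: str
    datatype: str = "xsd:string"


def conforms_to_user_shape(node, triples):
    """Check UserShape: one xsd:string rdfs:label, optional single ex:role = ex:User."""
    labels = [o for s, p, o in triples if s == node and p == "rdfs:label"]
    roles = [o for s, p, o in triples if s == node and p == "ex:role"]
    # Exactly one label, and it must be an xsd:string literal.
    if len(labels) != 1:
        return False
    label = labels[0]
    if not (isinstance(label, Literal) and label.datatype == "xsd:string"):
        return False
    # ex:role is optional; if present, it appears once with the value ex:User.
    return roles == [] or roles == ["ex:User"]


# Hypothetical data: IRIs are plain strings, literals use the Literal class.
triples = [
    ("ex:alice", "rdfs:label", Literal("Alice")),
    ("ex:alice", "ex:role", "ex:User"),
    ("ex:bob", "rdfs:label", Literal("Bob")),
    ("ex:bob", "ex:role", "ex:Admin"),
    ("ex:carol", "ex:role", "ex:User"),
]
for node in ("ex:alice", "ex:bob", "ex:carol"):
    print(node, conforms_to_user_shape(node, triples))
```

The ShEx version states these constraints in one line, while the SHACL version needs nested property constraints to say the same.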