Inference and Serialization of Latent Graph Schemata Using ShEx
Speaker: Daniel Fernández-Álvarez
Category: Idea
Daniel Fernández-Álvarez*, Jose Emilio Labra-Gayo*, Herminio García-González*
*Department of Computer Science, WESO Research Group, University of Oviedo, Oviedo, Spain

Slides SEMAPRO 2016 University of Oviedo



Page 2: Slides SEMAPRO 2016 University of Oviedo

Motivational example

Page 3: Slides SEMAPRO 2016 University of Oviedo

Motivation: Torimbia Beach

Page 4: Slides SEMAPRO 2016 University of Oviedo

Motivation: Torimbia Beach

• Country: Spain
• Region: Asturias
• Council/city: Llanes
• Lat/long: 43.44, -4.85
• Length: 500 m
• Width: 100 m
• Naturist: True

Page 5: Slides SEMAPRO 2016 University of Oviedo

Motivation: Torimbia Beach

6 different random but relevant beaches in DBpedia*

[Table: Region, Lat/long and Width for each of the six beaches]

*Batu Ferringhi, Horseshoe Bay, Manly Beach, Marina Beach, Playa Arcadia, Red Beach

The same happens with country, council/city, length and naturist

Page 6: Slides SEMAPRO 2016 University of Oviedo

Motivation

I would like to…

• check the concept of beach, not the instances
• make a single query/click to discover usual schemata
• be correct, coherent and exhaustive

Page 7: Slides SEMAPRO 2016 University of Oviedo

Idea

Page 8: Slides SEMAPRO 2016 University of Oviedo

Proposal

• Analysis of the neighborhood of nodes that satisfy a certain condition, in order to induce their usual schemata:
  • Typical condition: rdf:type
• Serialization of inferred schemata with ShEx (Shape Expressions):
  • Association to a type (class)
  • Management of trustworthiness
• Handy for:
  • Documentation
  • Verification of quality
  • Discovering “hidden” entities
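The neighborhood analysis proposed above can be illustrated with a minimal sketch. It is not the authors' prototype (the slides describe an idea only): it assumes the graph is available as plain (subject, predicate, object) tuples, uses rdf:type as the selection condition, and reports which share of the selected instances uses each predicate. The function name and the toy triples are hypothetical; only Torimbia's length and width come from the slides.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"


def property_frequencies(triples, target_class):
    """Return {predicate: share of target_class instances that use it}."""
    # Select the nodes that satisfy the condition (here: rdf:type).
    instances = {s for s, p, o in triples if p == RDF_TYPE and o == target_class}
    if not instances:
        return {}
    # For every outgoing predicate, remember which instances use it.
    used_by = defaultdict(set)
    for s, p, _ in triples:
        if s in instances and p != RDF_TYPE:
            used_by[p].add(s)
    return {p: len(subjects) / len(instances) for p, subjects in used_by.items()}


# Toy data (hypothetical, apart from Torimbia's length and width).
TRIPLES = [
    ("ex:Torimbia", "rdf:type", "dbo:Beach"),
    ("ex:Torimbia", "dbp:width", 100),
    ("ex:Torimbia", "dbp:length", 500),
    ("ex:RedBeach", "rdf:type", "dbo:Beach"),
    ("ex:RedBeach", "dbp:width", 30),
    ("ex:Llanes", "rdf:type", "dbo:Town"),
    ("ex:Llanes", "dbp:width", 1),
]

freqs = property_frequencies(TRIPLES, "dbo:Beach")
print(freqs)
```

The resulting shares are one possible input for the trustworthiness management mentioned above: a predicate used by most instances of a class is a stronger candidate for the inferred schema than one used by a few.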

Page 9: Slides SEMAPRO 2016 University of Oviedo

How?

Page 10: Slides SEMAPRO 2016 University of Oviedo

Workflow

Source graph (DBpedia, Wikidata…) → Inference → Abstract schemata representation → Serialization → Textual schemata representation with ShEx

ShEx
<Person> {
}

Page 11: Slides SEMAPRO 2016 University of Oviedo

Schemata Inference: current approaches

• Ontology integration to find shared core elements [Zhao,13]
  • Association rule mining (Apriori)
  • Rule-based classification (Decision Tables)
• Logical axioms at ontology level [Völker,11]
  • Association rule mining (Apriori)
  • Axioms represented with OWL 2 EL
• Graph schemata at class level [Christodoulou,15]
  • Clusters of similar individuals (ideally, cluster = class)
  • Results in an ad-hoc syntax

Page 12: Slides SEMAPRO 2016 University of Oviedo

Schemata Inference: our current status

Some promising ideas:
• Instance clustering
• Association rule mining

Some issues linked to the target graph:
• Noise management
• Adaptation to data model
• Graph size & complexity
• Completeness and coherence
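To make the association rule mining idea more concrete, here is a small sketch that scores rules of the form "instances using property A also use property B" by support and confidence. It is deliberately simpler than Apriori (it only enumerates pairs of properties), and the function name, thresholds and toy property sets are assumptions made for illustration.

```python
from itertools import permutations


def pairwise_rules(property_sets, min_support=0.5, min_confidence=0.8):
    """Return (a, b, support, confidence) for every rule a -> b above both thresholds."""
    n = len(property_sets)
    props = sorted(set().union(*property_sets))
    rules = []
    for a, b in permutations(props, 2):
        has_a = sum(1 for ps in property_sets if a in ps)
        both = sum(1 for ps in property_sets if a in ps and b in ps)
        support = both / n          # share of all instances using a and b
        confidence = both / has_a   # share of a-users that also use b
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    return rules


# One set of used properties per instance (hypothetical data).
BEACHES = [
    {"geo:lat", "geo:long", "dbp:width"},
    {"geo:lat", "geo:long"},
    {"geo:lat", "geo:long", "dbp:length"},
    {"dbp:length"},
]

rules = pairwise_rules(BEACHES)
for a, b, support, confidence in rules:
    print(f"{a} -> {b}  support={support:.2f}  confidence={confidence:.2f}")
```

The support threshold is one simple way to approach the noise issue listed above: rarely used properties never reach the rule set, at the cost of possibly missing legitimate but uncommon patterns.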

Page 13: Slides SEMAPRO 2016 University of Oviedo

Schemata Serialization I

Need: a standard syntax to express constraints in RDF graphs at class level. Equivalents for other data models:

• XML: RELAX NG, DTD, XML Schema
• Relational databases: DDL
• JSON: JSON Schema

RDF candidates:

ShEx
• Grammar-oriented
• Recursion
• Human-friendly syntax

SHACL
• Constraint-oriented
• No recursion (for now)
• RDF syntax (for now)

Page 14: Slides SEMAPRO 2016 University of Oviedo

Schemata Serialization II

Pure ShEx

<Beach> {
  dbp:width xsd:integer,
  dbp:length xsd:integer,
  geo:lat xsd:long,
  geo:long xsd:long,
  dbo:isPartOf @<Place>*
}

Annotated ShEx

<Beach> {
  dbp:width xsd:integer,
  dbp:length xsd:integer,
  geo:lat xsd:long,
  geo:long xsd:long,
  geo:geometry @<Point>,
  dbo:isPartOf @<Place>*,
  dbo:country @<Country>
}

[Figure: percentage annotations on the constraints: 19%, 59%, 83%, 83%, 87%, 69%, 32%]
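The serialization step could look roughly like the sketch below. It assumes the inference stage yields (predicate, value expression, trust) triples per class; the trust values here are invented, and carrying the annotation in a trailing `#` comment is an assumption, since the slides do not define an annotation syntax. The output follows the comma-separated style used on the slide.

```python
def to_shex(shape_name, constraints, threshold=0.0, annotate=False):
    """Serialize (predicate, value_expr, trust) constraints as a ShEx-like shape."""
    # Keep only constraints whose trust reaches the threshold.
    kept = [c for c in constraints if c[2] >= threshold]
    lines = []
    for i, (predicate, value_expr, trust) in enumerate(kept):
        separator = "," if i < len(kept) - 1 else ""
        note = f"  # {trust:.0%}" if annotate else ""
        lines.append(f"  {predicate} {value_expr}{separator}{note}")
    return "\n".join([f"<{shape_name}> {{", *lines, "}"])


# Hypothetical inferred constraints for the Beach class.
BEACH = [
    ("dbp:width", "xsd:integer", 0.9),
    ("geo:geometry", "@<Point>", 0.4),
]

print(to_shex("Beach", BEACH, threshold=0.5))  # "pure" shape, low-trust constraints dropped
print(to_shex("Beach", BEACH, annotate=True))  # every constraint, with its trust value
```

Under this reading, a pure shape is a thresholded view of the inferred schema, while an annotated shape keeps every constraint and leaves the judgment to the reader.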

Page 15: Slides SEMAPRO 2016 University of Oviedo

Use cases?

Page 16: Slides SEMAPRO 2016 University of Oviedo

Context: Types of graphs

• Specific purpose, automatically built, managed by a single agent
• General purpose, manually built, managed by community
• Reality

Page 17: Slides SEMAPRO 2016 University of Oviedo

Context: Collaborative graphs

Key points:
• Schemata are not planned, they just emerge
• Schemata change over time

Possibilities:
• Schemata inference on user demand
• What is associated with a type, instead of how a type should be
• Freedom: ShEx as guide, not dogma

Page 18: Slides SEMAPRO 2016 University of Oviedo

To summarize…

Page 19: Slides SEMAPRO 2016 University of Oviedo

Conclusions and Future Work

What we have done:
• Idea
  • Inference of Latent Graph Schemata
  • Serialization through ShEx syntax

What we want to do:
• Prototype
  • Selection of techniques
  • Selection of target source/s
• Tests
  • Usefulness in different domains
  • Feasibility: achieved trustworthiness
  • User acceptance

Page 20: Slides SEMAPRO 2016 University of Oviedo

References

• [1] Zhao, L., & Ichise, R. (2013, May). Instance-based ontological knowledge acquisition. In Extended Semantic Web Conference (pp. 155-169). Springer Berlin Heidelberg.

• [2] Völker, J., & Niepert, M. (2011, May). Statistical schema induction. In Extended Semantic Web Conference (pp. 124-138). Springer Berlin Heidelberg.

• [3] Christodoulou, K., Paton, N. W., & Fernandes, A. A. (2015). Structure inference for linked data sources using clustering. In Transactions on Large-Scale Data- and Knowledge-Centered Systems XIX (pp. 1-25). Springer Berlin Heidelberg.


Page 22: Slides SEMAPRO 2016 University of Oviedo

Extra information for Torimbia example I

Beach | Lat/long* | Naturist
Batu Ferringhi | dbp:latd, dbp:longd, georss:point, geo:geometry, geo:lat, geo:long | X
Horseshoe Bay | geo:geometry, geo:lat, geo:long | X
Manly Beach | georss:point, geo:geometry, geo:lat, geo:long | X
Marina Beach | georss:point, geo:geometry, geo:lat, geo:long | X
Playa Arcadia | georss:point, geo:geometry, geo:lat, geo:long | X
Red Beach | dbp:latDeg, dbp:longDeg, georss:point, geo:geometry, geo:lat, geo:long | X

*Some lat/long properties have been omitted. Some of them work together to give a precise coordinate (total degrees + orientation N/S, E/W).

Page 23: Slides SEMAPRO 2016 University of Oviedo

Extra information for Torimbia example II

Columns, in order: Length, Width, Council, Region, Country

Batu Ferringhi: X, X, shared entity, dbo:isPartOf, dbo:country
Horseshoe Bay: X, X, description, description, rdf:type (BeachesOfBermuda)
Manly Beach: X, X, description, dct:subject dbc:Beaches_of_New_South_Wales, description
Marina Beach: dbp:height, description, dct:subject, dct:subject
Playa Arcadia: X, X, dct:subject, X, dct:subject
Red Beach: X, dbp:width, dbp:city, is dbp:south of, description

Page 24: Slides SEMAPRO 2016 University of Oviedo

Wikimedia Strategy: Templates and Mappings

• Mappings
  • Designed to automatically import data from Wikipedia’s infoboxes and tables into DBpedia.
  • Wikipedia Templates define expected properties for certain types. Mappings define which property should be used to create a triple when finding an occurrence of an expected property.

PROS

• Preserves Wikipedia’s quality.
• Handy as guide for content represented in Wikipedia.
• It may enrich both Wikipedia and DBpedia.
• Templates can evolve guided by community.

CONS

• Depends on Wikipedia’s quality.
• It can only manage content represented in Wikipedia.
• Not transposable to standalone RDF graph projects.
• It assumes that the community is following the templates. It may not reflect the real graph.

Page 25: Slides SEMAPRO 2016 University of Oviedo

ShEx vs SHACL

ShEx

<UserShape> {
  dbp:label xsd:string,
  ex:role ( ex:User ) ?
}

SHACL

:UserShape a sh:Shape ;
  sh:property [
    sh:predicate rdfs:label ;
    sh:datatype xsd:string ;
    sh:minCount 1 ;
    sh:maxCount 1 ;
  ] ;
  sh:property [
    sh:predicate ex:role ;
    sh:hasValue ex:User ;
    sh:filterShape [
      sh:property [
        sh:predicate ex:role ;
        sh:minCount 1 ;
      ]
    ] ;
    sh:maxCount 1 ;
  ] .
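As a reading aid, the constraint that both notations aim to express can be spelled out in a few lines of Python. This is an informal interpretation, not a ShEx or SHACL engine: a node is modelled as a hypothetical dict from predicate to list of values, a Python str stands in for an xsd:string literal, and the label predicate follows the SHACL version (rdfs:label).

```python
def conforms_to_user_shape(node):
    """Check one node, given as {predicate: [values]}, against the UserShape reading."""
    labels = node.get("rdfs:label", [])
    roles = node.get("ex:role", [])
    # Exactly one label, and it must be a string.
    label_ok = len(labels) == 1 and isinstance(labels[0], str)
    # At most one role, and if present it must be ex:User.
    role_ok = len(roles) <= 1 and all(role == "ex:User" for role in roles)
    return label_ok and role_ok


print(conforms_to_user_shape({"rdfs:label": ["Alice"], "ex:role": ["ex:User"]}))
print(conforms_to_user_shape({"rdfs:label": ["Bob"]}))
print(conforms_to_user_shape({"rdfs:label": ["Eve"], "ex:role": ["ex:Admin"]}))
```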