

Why do they call it Linked Data when they want to say?

Keynote at The 6th International Workshop on Consuming Linked Data (COLD), 12/10/2015
Oscar Corcho, [email protected], @ocorcho, https://www.slideshare.com/ocorcho

The four Linked Data publishing principles established in 2006 seem to be quite clear and well understood by people inside and outside the core Linked Data and Semantic Web community. However, not only when discussing the benefits of Linked Data with outsiders, but also when reviewing papers for the COLD workshop series, I find myself on many occasions going back to the principles to check whether some approach for Web data publication and consumption is actually Linked Data or not. In this talk we will review some of the current approaches that we have for publishing data on the Web, and we will reflect on why it is sometimes so difficult to reach an agreement on what we understand by Linked Data. Furthermore, we will take the opportunity to describe yet another approach that we have been working on recently at the Center for Open Middleware, a joint technology center between Banco Santander and Universidad Politécnica de Madrid, to facilitate Linked Data consumption.

License
This work is licensed under CC BY-NC-SA 4.0 International
http://purl.org/NET/rdflicense/cc-by-nc-sa4.0

You are free:
- to Share: to copy, distribute and transmit the work
- to Remix: to adapt the work

Under the following conditions:
- Attribution: you must attribute the work by inserting [source: Oscar Corcho] at the footer of each reused slide, and a credits slide stating: "These slides are partially based on 'Why do they call it Linked Data when they want to say…?' by O. Corcho"
- Non-commercial
- Share-Alike

Motivation
I want to consume Linked Data. What do I use?

SQUIN, Linked Data Platform, Linked Data Fragments, JSON-LD, CSV on the Web, SPARQL endpoints

Outline of the talk
- Where do we start from? A few examples of applications that we have built by consuming RDF

Application 1. 3cixty

http://www.3cixty.com/

3cixty. Planning our visit to a city

3cixty. Exploiting the wishlist while in the city

Check it at the poster and demo session, for the Semantic Web Challenge

Application 2. Geomarketing

Application 3. Buyer profile at Zaragoza

http://www.zaragoza.es/ciudad/gestionmunicipal/contratos/

Application 4. Smart Developer Hub

http://www.smartdeveloperhub.org/

How are all these applications built?
Application / How is data stored & published? / How is data consumed?
- 3cixty: Centralised SPARQL endpoint, Linked Data (Virtuoso) / SPARQL queries (webapp), Ad-hoc API (mobile app), Linked Data (not used yet)
- Geomarketing: Centralised SPARQL endpoint, Linked Data (ELDA) / Linked Data, Ad-hoc API for RDF Data Cube
- Buyer profile at Zaragoza: Oracle DB / Linked Data (ad-hoc software), SOLR
- Smart Developer Hub: Centralised SPARQL endpoint, Linked Data / SOLR, SPARQL for complex queries, ????

Outline of the talk
- Where do we start from? A few examples of applications that we have built by consuming RDF
- Quiz time: what do we understand by Linked Data?

What do papers in COLD tell us about Linked Data?
- KR2RML: An Alternative Interpretation of R2RML for Heterogenous Sources
- Leveraging Linked Data to Infer Semantic Relations within Structured Sources
- LOTUS: Linked Open Text UnleaShed
- Optimizing RDF Data Cubes for Efficient Processing of Analytical Queries
- Pattern-Based Linked Data Publication: The Linked Chess Dataset Case
- Policies Composition based on Data Usage Context
- Towards Crawling the Web for Structured Data: Pitfalls of Common Crawl for E-Commerce
- Uniqueness, Density, and Keyness: Exploring Class Hierarchies

Topics
- Makes use of Linked Data principles, including dereferencing
- Involves direct use of multiple, real-world Linked Datasets

Linked Data principles
1. Use URIs as names for things
2. Use HTTP URIs so that people can look up those names
3. When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL)
4. Include links to other URIs, so that they can discover more things
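As a reminder of what principles 2-4 mean in practice, here is a minimal sketch (not part of the talk) that dereferences an HTTP URI with content negotiation and parses the returned RDF; it assumes the requests and rdflib Python libraries and uses a DBpedia URI purely as an example.

# Dereference an HTTP URI with content negotiation and parse the returned RDF.
import requests
from rdflib import Graph

uri = "http://dbpedia.org/resource/Madrid"  # any dereferenceable HTTP URI
response = requests.get(uri, headers={"Accept": "text/turtle"})

g = Graph()
g.parse(data=response.text, format="turtle")

# Principle 4: the returned description links to other URIs we can look up next
for s, p, o in g:
    print(s, p, o)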

Quiz time

What is Linked Data for you?

Quiz 1. Is this Linked Data?

They call it API. Do they mean Linked Data?
http://www.zaragoza.es/docs-api/


Quiz 1. A few hints
Let's try to run:
curl -X GET --header "Accept: application/x-turtle" "http://www.zaragoza.es/api/recurso/urbanismo-infraestructuras/callejero/via?rf=html&results_only=false"

Or a more specific one, for one street:
curl -X GET --header "Accept: application/ld+json" "http://www.zaragoza.es/api/recurso/urbanismo-infraestructuras/callejero/via/20?rf=html"
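A quick way to see what these calls really return is to inspect the Content-Type header and try to parse the payload as RDF. The sketch below makes that check with the requests and rdflib Python libraries (JSON-LD parsing requires JSON-LD support in rdflib; the URL is the one from the curl example above, without the extra parameters).

import requests
from rdflib import Graph

url = ("http://www.zaragoza.es/api/recurso/urbanismo-infraestructuras/"
       "callejero/via/20")
resp = requests.get(url, headers={"Accept": "application/ld+json"})
print(resp.headers.get("Content-Type"))   # is it really JSON-LD?

g = Graph()
g.parse(data=resp.text, format="json-ld")
print(len(g), "triples parsed")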

Then, what do we think about it?

Quiz 2. And what about this? http://datos.localidata.com/recurso/comercio/Provincia/Madrid/Municipio/madrid/Local/Distrito/Label/Tetuán

Quiz 2. A few more hints
However, this is giving me access to lots of URIs:
http://datos.localidata.com/recurso/comercio/Provincia/Madrid/Municipio/madrid/Local/11029404L0-PlantaPB-Local214-ID36963

which I could then use to start applying a link-traversal approach with bound subjects (e.g., as in SQUIN)
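To make the idea concrete, here is a naive sketch (my own illustration, not SQUIN's actual algorithm) of link traversal with bound subjects: dereference a seed URI, collect the object URIs found in the retrieved triples, and keep crawling for a few hops. It assumes the requests and rdflib Python libraries.

import requests
from rdflib import Graph, URIRef

def traverse(seed, max_hops=2):
    seen, frontier, data = set(), {seed}, Graph()
    for _ in range(max_hops):
        next_frontier = set()
        for uri in frontier - seen:
            seen.add(uri)
            try:
                resp = requests.get(uri, headers={"Accept": "text/turtle"}, timeout=10)
                data.parse(data=resp.text, format="turtle")
            except Exception:
                continue  # skip URIs that do not dereference to parseable RDF
            # follow the object URIs found so far
            next_frontier |= {str(o) for o in data.objects() if isinstance(o, URIRef)}
        frontier = next_frontier
    return data

g = traverse("http://datos.localidata.com/recurso/comercio/Provincia/Madrid"
             "/Municipio/madrid/Local/11029404L0-PlantaPB-Local214-ID36963")
print(len(g), "triples gathered")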

In summary
- Several approaches for Linked Data exposure that go beyond pure Linked Data
- Combining REST APIs that give you access to lots of URIs with pure Linked Data approaches

Outline of the talk
- Where do we start from? A few examples of applications that we have built by consuming RDF
- Quiz time: what do we understand by Linked Data?
- A summary of current Linked Data consumption approaches

A summary of Linked Data consumption approaches
Stealing some copyrighted material from the Linked Data Fragments folks. They will surely explain this better than I would ;-)

A summary of Linked Data consumption approaches

?

Outline of the talk
- Where do we start from? A few examples of applications that we have built by consuming RDF
- Quiz time: what do we understand by Linked Data?
- A summary of current Linked Data consumption approaches
- Yet another approach: AGORA. Plus some demos (compulsory when talking about Linked Data)

Attention!!
- Ongoing work, sneak preview
- No technical paper yet: we have to sit down and write everything carefully
- Highly driven by our initial use case; now in the process of generalising it

Our research hypothesis
Can we go a bit beyond triple pattern fragments while
- maintaining the good behaviour server-side,
- exploiting Linked Data about subjects, and
- keeping to the Web paradigm?

Basic graph pattern fragments?

BGPs-lite, that is, BGPs with some restrictions
The Agora (/ˈæɡərə/; Ancient Greek: agorá) was a central spot in ancient Greek city-states. The literal meaning of the word is "gathering place" or "assembly". [Wikipedia]

Our assumptions on BGPs
BGPs composed of triple patterns where:
- Subjects are always variables
- Properties must be URIs
- Objects can be variables, URIs or literals (will only work with equality)
Easy extensions (not done because of lack of time):
- Allowing URIs as subjects
- Extending properties to property paths
- Adding more types of FILTERs
Difficult extensions (need to think a bit more about them):
- Properties as variables
PROCESSABLE:
{?x ci:codebase ?y}
{?s doap:name "jenkins" . ?s scm:hasBranch ?b}
{?a ci:hasBuild ?b . ?b ci:hasExecution ?c . ?c ci:hasResult ?d}

NOT PROCESSABLE:
{?x ?p "jenkins"}
{?x ?p ?y}
(see the sketch of this check right below)
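These restrictions amount to a simple syntactic check over each triple pattern. Below is a minimal Python sketch of that check, with patterns represented as (s, p, o) tuples of rdflib terms; the ci namespace URI is a hypothetical one chosen for illustration.

from rdflib import URIRef, Literal
from rdflib.term import Variable

def is_processable(bgp):
    for s, p, o in bgp:
        if not isinstance(s, Variable):                      # subjects: variables only
            return False
        if not isinstance(p, URIRef):                        # predicates: URIs only
            return False
        if not isinstance(o, (Variable, URIRef, Literal)):   # objects: var, URI or literal
            return False
    return True

CI = "http://example.org/ci#"  # hypothetical namespace for the ci vocabulary
ok = [(Variable("a"), URIRef(CI + "hasBuild"), Variable("b"))]
bad = [(Variable("x"), Variable("p"), Literal("jenkins"))]   # variable predicate
print(is_processable(ok), is_processable(bad))               # True False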

A few more assumptions
- RDF data has been created according to some vocabulary
- Resources are typed (they have an rdf:type, i.e. 'a' in Turtle)
- Vocabularies may be lightweight or heavyweight
- However, we are not exploiting all types of domain and range restrictions, or inferences, yet

Step 1. Provide some vocabularies to use for planning
Tell AGORA (our fountain) which vocabularies it has to understand.
Note: relevant for the production of query plans

Post the OWL file to http://localhost:9001/vocabs
Let's check the results:
http://localhost:9001/types
http://localhost:9001/properties
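A minimal sketch of this step in Python, assuming the /vocabs endpoint accepts the ontology file in the request body (the file name and content type are assumptions, not AGORA's documented interface):

import requests

with open("ci.owl", "rb") as f:                              # hypothetical local ontology file
    requests.post("http://localhost:9001/vocabs", data=f.read(),
                  headers={"Content-Type": "text/turtle"})

print(requests.get("http://localhost:9001/types").text)
print(requests.get("http://localhost:9001/properties").text)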

Step 2. Provide/get some seed URIs to start query plans
Tell AGORA's seed collector which seeds it can take to start the link traversal approach.
Note: those seed URIs need to be connected to all data. Stored in Redis.

Post every seed URI to http://localhost:9001/seeds
One may be enough if it provides access to other URIs.
Let's check the results: http://localhost:9001/seeds

Step 2. Provide/get some seed URIs to start query plans
Seeds may be obtained from a list of URIs, queries to SPARQL endpoints, ad-hoc wrappers, etc.
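For instance, a sketch that gathers candidate seeds through a SPARQL query and registers each one with the seed collector; the endpoint URL, the query and the payload format of the /seeds endpoint are all assumptions made for illustration:

import requests

endpoint = "http://localhost:8890/sparql"                    # hypothetical SPARQL endpoint
query = "SELECT DISTINCT ?s WHERE { ?s a <http://example.org/ci#CIHarvester> } LIMIT 10"
result = requests.get(endpoint, params={"query": query,
                                        "format": "application/sparql-results+json"}).json()

for binding in result["results"]["bindings"]:
    requests.post("http://localhost:9001/seeds", json={"uri": binding["s"]["value"]})

print(requests.get("http://localhost:9001/seeds").text)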

Step 3. Obtain a query/search plan
Request a query plan from AGORA's planner for a given graph pattern.

Let's check the results:
http://localhost:9001/plan?gp={?a ci:hasBuild ?b}

Step 3. Obtain a query/search plan
[] a agora:SearchTree ;
   agora:fromType ci:CIHarvester ;
   agora:hasSeed ;
   agora:length 1 ;
   agora:next [ agora:byPattern _:tp_0 ;
                agora:expectedType ci:CIHarvester ] .

[] a agora:SearchSpace ; agora:definedBy _:tp_0 .

_:var_a a agora:Variable ; rdfs:label "?a"^^xsd:string .

_:var_b a agora:Variable ; rdfs:label "?b"^^xsd:string .

_:tp_0 a agora:TriplePattern ; agora:object _:var_b ; agora:predicate ci:hasBuild ; agora:subject _:var_a .

Let's check the results:
http://localhost:9001/plan?gp={?a ci:hasBuild ?b}
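Programmatically, one might request such a plan and inspect its triple patterns as sketched below; the agora namespace URI is a placeholder chosen for illustration, not necessarily the one AGORA actually uses.

import requests
from rdflib import Graph, Namespace, RDF

AGORA = Namespace("http://example.org/agora#")               # placeholder namespace

resp = requests.get("http://localhost:9001/plan",
                    params={"gp": "{?a ci:hasBuild ?b}"},
                    headers={"Accept": "text/turtle"})
plan = Graph()
plan.parse(data=resp.text, format="turtle")

for tp in plan.subjects(RDF.type, AGORA.TriplePattern):
    print(plan.value(tp, AGORA.subject),
          plan.value(tp, AGORA.predicate),
          plan.value(tp, AGORA.object))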

Let's check this URI

Looking up that URI

Step 3. Obtain a query/search plan
[] a agora:SearchTree ;
   agora:fromType ci:CIHarvester ;
   agora:hasSeed ;
   agora:length 52 ;
   agora:next [ agora:byPattern _:tp_2 ;
                agora:expectedType ci:CIHarvester ;
                agora:next [ agora:byPattern _:tp_0 ;
                             agora:expectedType ci:Build ;
                             agora:next [ agora:byPattern _:tp_1 ;
                                          agora:expectedType oslc_auto:AutomationRequest ] ;
                             agora:onProperty ci:hasExecution ] ;
                agora:onProperty ci:hasBuild ] .

[] a agora:SearchSpace ; agora:definedBy _:tp_0, _:tp_1, _:tp_2 .

_:var_a a agora:Variable ; rdfs:label "?a"^^xsd:string .

_:var_d a agora:Variable ; rdfs:label "?d"^^xsd:string .

_:tp_0 a agora:TriplePattern ; agora:object _:var_c ; agora:predicate ci:hasExecution ; agora:subject _:var_b .

(…)
Let's check the results of a more complex query:
http://localhost:9001/plan?gp={?a ci:hasBuild ?b . ?b ci:hasExecution ?c . ?c ci:hasResult ?d}

What is a query/search plan for a BGP?
It is composed of:
- A set of seed URIs
- A set of search paths
What is a seed URI? The subject of one of the triples contained in the Agora.
What is a search path? A finite and executable queue of search steps. Its execution starts by dereferencing the seed URIs, which initializes the set of query-relevant triples.

[Diagram: a search path as a sequence of steps, property 1 ... property N, starting from the seed URIs]

Step 4. Evaluate the query plan by dereferencing

Let's check the results:
http://localhost:9001/fragment?gp={?a ci:hasBuild ?b}
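A sketch of how a client might consume that fragment, assuming the endpoint can return Turtle (the response format is an assumption):

import requests
from rdflib import Graph

resp = requests.get("http://localhost:9001/fragment",
                    params={"gp": "{?a ci:hasBuild ?b}"},
                    headers={"Accept": "text/turtle"})

fragment = Graph()
fragment.parse(data=resp.text, format="turtle")
print(len(fragment), "triples collected by dereferencing")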

Let's now do a demo with DBpedia
Yeah, all this was working in a controlled environment. What about DBpedia?
Obviously, DBpedia understood from a pure Linked Data perspective.

We will open a brand new AGORA and tell it to understand movies

A few operations to be done
First of all, load the vocabulary in AGORA and provide a few seeds (through a SPARQL query to DBpedia, but it could be a list of URIs).
Then, we can start inspecting:
http://localhost:9000/graph/
http://localhost:9000/types
http://localhost:9000/properties
Let's start querying. First, let's see a plan:
http://localhost:9000/plan?gp={?f%20dbpedia-owl:starring%20?a}
http://localhost:9000/plan/view?gp={?f%20dbpedia-owl:starring%20?a}
And then execute the query

A few other queries
Get all relations between the films and the actors who star in them:
http://localhost:9000/fragment?gp={?f dbpedia-owl:starring ?a}
Same as the previous query, but also getting the names of these actors:
http://localhost:9000/fragment?gp={?f dbpedia-owl:starring ?a. ?a dbp:birthName ?n}
Get all films, their distributors and the known locations of each of them:
http://localhost:9000/fragment?gp={?f dbpedia-owl:distributor ?d. ?d dbpedia-owl:location ?l}
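When issued programmatically rather than from a browser, these graph patterns need to be URL-encoded; a small sketch for the films/actors/names query:

import requests
from urllib.parse import quote

gp = "{?f dbpedia-owl:starring ?a. ?a dbp:birthName ?n}"
url = "http://localhost:9000/fragment?gp=" + quote(gp)
print(requests.get(url).status_code)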

Outline of the talk
- Where do we start from? A few examples of applications that we have built by consuming RDF
- Quiz time: what do we understand by Linked Data?
- A summary of current Linked Data consumption approaches
- Yet another approach: AGORA. Plus some demos (compulsory when talking about Linked Data)
- Where do we go next?

What's next for AGORA?
An additional bit of engineering:
- Extending to other parts of SPARQL
- Exploiting caching even more
- Pagination
- Building the vocabularies automatically for all those cases where there is no vocabulary (using LOUPE)
- etc. (basically, all those things already very well done by LDF)
- SPARQL Updates
- Some Linked Data Platform (ldp4j) technology behind the scenes
Sitting down to write everything carefully:
- The whole framework
- The query planning algorithm
- Evaluations and comparisons with other approaches
- Is this approach really worth it?

What have we been talking about?

WAIT FOR OUR PAPER TO BE PUBLISHED

And now the main conclusions
Consumption of Linked Data is normally associated with SPARQL querying over some dataset of the LOD cloud (my feeling after having read many papers that talk about Linked Data consumption).
Nothing against that (look at the original examples that I gave earlier), but we have to understand, as a community, whether there are challenges that pure Linked Data approaches handle better. Why does everybody else talk about REST APIs and we don't?
So, more work is needed on:
- Approaches that exploit the features of pure Linked Data (e.g., SQUIN and link-traversal querying)
- Approaches that exploit the Web infrastructure dimension (e.g., Linked Data Fragments)

Conclusions (II)We should continue exploring this space

But probably these dimensions are not enough

And many open challenges still
Federated query processing techniques (adaptive)

AGORA

And the last (bonus) slide

And this is what you should remember from the talk

Source: "Linking Open Data cloud diagram 2014", by Max Schmachtenberg, Christian Bizer, Anja Jentzsch and Richard Cyganiak. http://lod-cloud.net/

Why do they call it Linked Data when they want to say?

Acknowledgements to the SDH team at the Center for Open Middleware:
Fernando Serena, Carlos Blanco, Alejandro Fernández, Alejandro Vera, Miguel Esteban, Andrés García, Javier Soriano, Asunción Gómez
Oscar Corcho, [email protected], @ocorcho, https://www.slideshare.com/ocorcho