Linking the world with Python and Semantics

An introduction to using open data with Python, with examples using RDFLib, SuRF and RDFAlchemy. http://softwarelivre.org/fisl13

@tati_alchueyr (Globo.com)
25th July 2012, FISL 13

how do you store your data?

[ ] data... what data?!
[ ] raw files (csv, json, xml)
[ ] database (e.g. relational database)
[ ] graphs (e.g. Resource Description Framework)
[ ] other...

how do you search for...?

Apartments near English-Portuguese bilingual childcare in Rio de Janeiro state.

ERP service providers with offices in São Paulo and New York.

Researchers working on artificial intelligence in the Southeast of Brazil.

GNU GPL software for image processing, developed from 2009 to 2010, with Brazilian developers among its authors.

what do the searches above have in common?

linked open data in 2007

linked open data in 2008

linked open data in 2009

linked open data in 2011

traditional RDBMS

linked data graph

linked data modelling

modelling

querying RDB

    select bookID, authorName
    from books, authors
    where books.aid = authors.aid
      and books.isbn = '006251587X';
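The join above can be tried end to end with Python's built-in sqlite3 module. This is only a minimal sketch: the table and column names follow the slide, while the column types and the single sample row are assumptions invented for illustration.

```python
import sqlite3

# In-memory database; the schema is an assumption based on the slide's query.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (aid INTEGER PRIMARY KEY, authorName TEXT);
    CREATE TABLE books (bookID INTEGER PRIMARY KEY, isbn TEXT, aid INTEGER);
    -- Sample rows, invented for illustration only.
    INSERT INTO authors VALUES (1, 'Tim Berners-Lee');
    INSERT INTO books VALUES (10, '006251587X', 1);
""")

# Same join as on the slide, with the ISBN passed as a bound parameter.
rows = conn.execute(
    """
    SELECT bookID, authorName
    FROM books, authors
    WHERE books.aid = authors.aid
      AND books.isbn = ?
    """,
    ("006251587X",),
).fetchall()

print(rows)
```

The relationship between a book and its author lives in the `aid` columns, so the query has to know the schema before it can join the two tables.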

querying RDF

    select ?authName ?authEmail
    where {
      <amazon:book#006251587X> <amazon:hasAuthor> <foaf:name#TimBerners-Lee> .
      <foaf:name#TimBerners-Lee> <foaf:name> ?authName .
      <foaf:name#TimBerners-Lee> <foaf:email> ?authEmail
    }
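To show how a graph pattern like the one above binds its variables, here is a toy matcher in plain Python. It is a sketch of the idea, not how a real SPARQL engine works: the helper names `match_pattern` and `run_query` are hypothetical, the e-mail address is made up, and the author is matched through an extra `?author` variable instead of the fixed URI used on the slide.

```python
# Toy data: the triples mirror the identifiers on the slide; the e-mail is invented.
TRIPLES = [
    ("amazon:book#006251587X", "amazon:hasAuthor", "foaf:name#TimBerners-Lee"),
    ("foaf:name#TimBerners-Lee", "foaf:name", "Tim Berners-Lee"),
    ("foaf:name#TimBerners-Lee", "foaf:email", "timbl@example.org"),
]

# Terms starting with "?" are variables; everything else must match exactly.
PATTERNS = [
    ("amazon:book#006251587X", "amazon:hasAuthor", "?author"),
    ("?author", "foaf:name", "?authName"),
    ("?author", "foaf:email", "?authEmail"),
]


def match_pattern(pattern, triple, binding):
    """Return `binding` extended so that `pattern` matches `triple`, or None."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # Bind the variable, or check it against its earlier binding.
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return extended


def run_query(patterns, triples):
    """Join the patterns one by one, keeping only consistent bindings."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in triples:
                extended = match_pattern(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


results = run_query(PATTERNS, TRIPLES)
for row in results:
    print(row["?authName"], row["?authEmail"])
```

Unlike the SQL version, no join column is declared anywhere: the shared variable `?author` is what connects the three triples.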

globo.com developers before using web semantics

globo.com developers while learning web semantics

(?w ?t ?f)

globo.com developers after using web semantics

Sample hard-to-test code

approach #1: query isolation

approach #2: data as objects

DAO
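One way to combine the two approaches is a small data access object that keeps the SPARQL text in one place and returns plain objects. The sketch below is an assumption about what such a layer could look like, not the code used at Globo.com: `BandDAO`, `Band`, `GENRE_QUERY` and `fake_run_query` are hypothetical names, and the fake runner stands in for a real endpoint so the class can be tested offline.

```python
# The query text is isolated in one constant instead of being scattered in the code.
GENRE_QUERY = """
SELECT DISTINCT ?who
WHERE {{ ?who <http://dbpedia.org/ontology/genre> <{genre}> . }}
"""


class Band(object):
    """Plain object handed to the rest of the application."""

    def __init__(self, uri):
        self.uri = uri


class BandDAO(object):
    """Hypothetical DAO; `run_query` is any callable returning SPARQL JSON results."""

    def __init__(self, run_query):
        self._run_query = run_query

    def find_by_genre(self, genre_uri):
        results = self._run_query(GENRE_QUERY.format(genre=genre_uri))
        return [Band(row["who"]["value"]) for row in results["results"]["bindings"]]


def fake_run_query(sparql_text):
    """Test double: records the query and returns a canned, invented answer."""
    fake_run_query.last_query = sparql_text
    return {"results": {"bindings": [
        {"who": {"type": "uri", "value": "http://dbpedia.org/resource/Example_Band"}},
    ]}}


dao = BandDAO(fake_run_query)
bands = dao.find_by_genre("http://dbpedia.org/resource/Metal")
print([band.uri for band in bands])
```

In production the injected callable could wrap a real SPARQL client; in tests the fake keeps everything in memory.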

Y U NO make SPARQL queries?!

Y U NO make data access easy?!

Y U NO make things testable?!

product developers evaluating web semantics

fact 1: we don't have an out-of-the-box solution

fact 2: but we do have some options

#1: create a solution from scratch

#2: study existing solutions and then
[ ] contribute to them
[ ] develop on top of them
[ ] goto #1

some options

the final decision is not only ours

but we chose to start from #2

#2: study existing solutions and then (...)

ok, lmgfy

a few results from google

ActiveRDF

active-semantic

Django4Store

Django-RDF

Django-RDFAlchemy

Djubby

EasyRDF

Jena

FuXi

Oort

Pymantic

PyRdfa

pysparql

RDFAlchemy

RdfLib

Redland

semantic-django

SPARQLWrapper

Sparrow

Sparta

SuRF

    { ?project :by_author ?author .
      ?author :works_at :globocom . }

{?project :use_language :python . }

    { ?project :use_language :python ;
               :last_commit ?commit .
      FILTER (?commit >= "2011-12-01"^^xsd:date) }

relation between these tools

team filtering

SPARQLWrapper
problem: list all predicates of a class

    from SPARQLWrapper import SPARQLWrapper, JSON

    # List all predicates of dbonto:Band
    # (OPTION (TRANSITIVE, ...) is a Virtuoso extension, not standard SPARQL)
    query = """
    SELECT distinct ?subject
    FROM <http://dbpedia.org>
    {
      ?subject rdfs:domain ?object .
      <http://dbpedia.org/ontology/Band> rdfs:subClassOf ?object
        OPTION (TRANSITIVE, t_distinct, t_step('step_no') as ?n, t_min (0) ).
    }"""

    sparql = SPARQLWrapper("http://dbpedia.org/sparql")
    sparql.setQuery(query)
    sparql.setReturnFormat(JSON)
    results = sparql.query().convert()

    for result in results["results"]["bindings"]:
        print(result["subject"]["value"])

http://live.dbpedia.org/sparql

SPARQLWrapper

abstract endpoint, returns dict
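The dict that comes back follows the SPARQL JSON results layout, which is why the loop indexes `results["results"]["bindings"]`. The sketch below uses a hand-written sample in that layout, with made-up URIs and no request to DBpedia, plus a hypothetical helper `values_of` that flattens one variable into a list.

```python
# Hand-written sample in the SPARQL JSON results layout (not real DBpedia output).
sample_results = {
    "head": {"vars": ["subject"]},
    "results": {
        "bindings": [
            {"subject": {"type": "uri", "value": "http://example.org/ontology/propertyA"}},
            {"subject": {"type": "uri", "value": "http://example.org/ontology/propertyB"}},
        ]
    },
}


def values_of(results, variable):
    """Collect the plain values bound to `variable`, skipping rows without it."""
    return [
        row[variable]["value"]
        for row in results["results"]["bindings"]
        if variable in row
    ]


subjects = values_of(sample_results, "subject")
for value in subjects:
    print(value)
```

Each binding is itself a small dict carrying a type and a value, so application code usually ends up with a thin unwrapping layer like this one.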

SPARQLWrapper

Ok, not different from what we have...

SPARQLWrapper

just a wrapper around a SPARQL server
well tested ;)

SPARQLWrapper
problem: list all subjects given ?p ?o

    from SPARQLWrapper import SPARQLWrapper, JSON

    # List all instances (e.g. bands) with genre Metal
    query = """
    PREFIX db: <http://dbpedia.org/resource/>
    PREFIX dbonto: <http://dbpedia.org/ontology/>

    SELECT DISTINCT ?who
    FROM <http://dbpedia.org>
    WHERE {
      ?who dbonto:genre db:Metal .
    }"""

    sparql = SPARQLWrapper("http://dbpedia.org/sparql")
    sparql.setQuery(query)
    sparql.setReturnFormat(JSON)
    results = sparql.query().convert()

    for result in results["results"]["bindings"]:
        print(result["who"]["value"])

RdfLib
problem: list all subjects given ?p ?o

    import rdflib
    import rdfextras.store.SPARQL

    # SPARQL endpoint setup
    endpoint = "http://dbpedia.org/sparql"
    store = rdfextras.store.SPARQL.SPARQLStore(endpoint)
    graph = rdflib.Graph(store)

    # Definitions
    genre = rdflib.URIRef("http://dbpedia.org/ontology/genre")
    metal = rdflib.URIRef("http://dbpedia.org/resource/Metal")

    # Query
    for label in graph.subjects(genre, metal):
        print(label)

RdfLib
abstract endpoint, returns dict, namespace

    import rdflib
    import rdfextras.store.SPARQL

    # SPARQL endpoint setup
    endpoint = "http://dbpedia.org/sparql"
    store = rdfextras.store.SPARQL.SPARQLStore(endpoint)
    graph = rdflib.Graph(store)

    # Namespaces to clear up definitions
    DBONTO = rdflib.Namespace("http://dbpedia.org/ontology/")
    DB = rdflib.Namespace("http://dbpedia.org/resource/")

    # Query
    for label in graph.subjects(DBONTO.genre, DB.Metal):
        print(label)

subjects
predicates
objects
subject_predicates
subject_objects
predicate_objects

RdfLib
abstract endpoint, returns dict, namespace, query by triples

    import rdflib
    import rdfextras.store.SPARQL

    # SPARQL endpoint setup
    endpoint = "http://dbpedia.org/sparql"
    store = rdfextras.store.SPARQL.SPARQLStore(endpoint)
    graph = rdflib.Graph(store)

    # Namespaces to clear up definitions
    DBONTO = rdflib.Namespace("http://dbpedia.org/ontology/")
    DB = rdflib.Namespace("http://dbpedia.org/resource/")

    # Using triples
    for musician, _, _ in graph.triples((None, DBONTO.genre, DB.Metal)):
        print(musician)
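The idea behind querying by triples is that `None` acts as a wildcard in any of the three positions. The following toy store illustrates that matching rule in plain Python. It is a teaching sketch under that assumption, not RDFLib's implementation: `TinyGraph` is a hypothetical class and the band names are invented.

```python
class TinyGraph(object):
    """A toy in-memory triple store; None in a pattern matches anything."""

    def __init__(self):
        self._triples = set()

    def add(self, triple):
        self._triples.add(triple)

    def triples(self, pattern):
        """Yield every stored triple that matches `pattern`, in sorted order."""
        for triple in sorted(self._triples):
            if all(want is None or want == have
                   for want, have in zip(pattern, triple)):
                yield triple

    def subjects(self, predicate=None, obj=None):
        """Convenience wrapper: subjects of all triples matching (?, p, o)."""
        for subject, _, _ in self.triples((None, predicate, obj)):
            yield subject


graph = TinyGraph()
graph.add(("db:Example_Band_A", "dbonto:genre", "db:Metal"))
graph.add(("db:Example_Band_B", "dbonto:genre", "db:Metal"))
graph.add(("db:Example_Band_C", "dbonto:genre", "db:Jazz"))

metal_bands = list(graph.subjects("dbonto:genre", "db:Metal"))
print(metal_bands)
```

Helpers such as `subjects` are then thin wrappers over the same wildcard match, which is how the two query styles shown on these slides relate.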

RdfLib
abstract endpoint, returns dict, namespace, query by triples, add / remove

    import rdflib
    import rdfextras.store.SPARQL

    # n3 fixture file
    graph = rdflib.Graph()
    graph.parse("fixture_genre_metal.nt", format="nt")

    # Namespace
    DBONTO = rdflib.Namespace("http://dbpedia.org/ontology/")
    DB = rdflib.Namespace("http://dbpedia.org/resource/")

    # Add nodes
    graph.add((DB.AndrewsMedina, DBONTO.genre, DB.Metal))
    graph.add((DB.Siminino, DBONTO.genre, DB.Metal))
    graph.add((DB.Herman, DBONTO.genre, DB.Metal))

    # Remove nodes
    graph.remove((DB.AndrewsMedina, DBONTO.genre, DB.Metal))

RdfLib

concentrates on providing the core RDF types and interfaces, through plugin interface

RdfLib

makes testing simple, allowing fixtures using n3 files, add triples and remove triples

RdfLib

but...

each triple query requires a new connection to SPARQL

RdfLib

therefore

too many accesses to the SPARQL endpoint
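To make that cost concrete, the sketch below counts requests against a stand-in endpoint. Everything in it is an assumption for illustration: `CountingEndpoint` is a hypothetical class holding invented triples in memory, and it simply treats each pattern lookup as one request, mirroring the behaviour described above.

```python
class CountingEndpoint(object):
    """Hypothetical stand-in for a remote endpoint; counts one request per lookup."""

    def __init__(self, triples):
        self._triples = list(triples)
        self.requests = 0

    def match(self, subject=None, predicate=None, obj=None):
        self.requests += 1
        pattern = (subject, predicate, obj)
        return [
            triple for triple in self._triples
            if all(want is None or want == have
                   for want, have in zip(pattern, triple))
        ]


endpoint = CountingEndpoint([
    ("db:Example_Band_A", "genre", "db:Metal"),
    ("db:Example_Band_B", "genre", "db:Metal"),
    ("db:Example_Band_A", "label", "Example Band A"),
    ("db:Example_Band_B", "label", "Example Band B"),
])

# One lookup to find the bands, then one more lookup per band for its label.
bands = [s for s, _, _ in endpoint.match(predicate="genre", obj="db:Metal")]
labels = [endpoint.match(subject=band, predicate="label")[0][2] for band in bands]

print(labels, endpoint.requests)
```

With N matching bands this pattern issues N + 1 requests, whereas a single SPARQL query selecting both the band and its label would need only one.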

RdfLib

and...

doesn't provide an ORM (object relational mapping)

SuRF
abstract endpoint, returns dict, namespace, query by triples, add / remove, ORM

    from surf import Store, Session, ns, query

    store = Store(reader='sparql_protocol',
                  endpoint='http://dbpedia.org/sparql')
    session = Session(store, {})
    session.enable_logging = False

    ns.register(db='http://dbpedia.org/resource/')
    ns.register(dbonto='http://dbpedia.org/ontology/')

    # MusicalArtist is a class in the DBpedia ontology namespace
    MusicalArtist = session.get_class(ns.DBONTO['MusicalArtist'])

    artistas_metal = MusicalArtist.get_by(dbonto_genre=ns.DB["Metal"])

    print(artistas_metal)

SuRF
problem: list all subjects given ?p ?o

    from surf import Store, Session, ns, query

    store = Store(reader='sparql_protocol',
                  endpoint='http://dbpedia.org/sparql')
    session = Session(store, {})

    ns.register(db='http://dbpedia.org/resource/')
    ns.register(dbonto='http://dbpedia.org/ontology/')

    query_surf = query.select("?who").distinct()
    query_surf.where(("?who", ns.DBONTO.genre, ns.DB.Metal))

    metal_bands = session.default_store.execute(query_surf)

    for band in metal_bands:
        print(band)

ORM, composed queries

SuRF

various approaches: ORM, programmatically

SuRF

simple ORM: no need to redeclare TTL definitions

SuRF

“complex” queries using lazy evaluation
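Lazy evaluation here means that a query can be composed step by step and nothing is sent to the store until the results are actually iterated. The sketch below shows that general idea only. It is not SuRF's implementation: `LazyQuery` and `fake_execute` are hypothetical names, and the returned band is invented.

```python
class LazyQuery(object):
    """Hypothetical lazy query: collects patterns, runs only when iterated."""

    def __init__(self, execute, patterns=()):
        self._execute = execute
        self._patterns = tuple(patterns)

    def where(self, pattern):
        # Return a new query object; nothing is executed at this point.
        return LazyQuery(self._execute, self._patterns + (pattern,))

    def __iter__(self):
        # The store is only contacted here, with all patterns at once.
        return iter(self._execute(self._patterns))


executed = []


def fake_execute(patterns):
    """Test double standing in for the store; records each execution."""
    executed.append(patterns)
    return ["db:Example_Band_A"]


lazy = LazyQuery(fake_execute).where(("?who", "dbonto:genre", "db:Metal"))
lazy = lazy.where(("?who", "rdfs:label", "?label"))

calls_before = len(executed)   # still 0: composing did not run anything
results = list(lazy)           # iteration triggers exactly one execution
calls_after = len(executed)

print(calls_before, calls_after, results)
```

Because both patterns travel in one execution, composing a more complex query does not multiply the number of round trips.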

SuRF

documentation & community

SuRF

but...

no django-style models

SuRF

verbose syntax

RDFAlchemy
problem: list all subjects given ?p ?o

    from rdfalchemy.sparql import SPARQLGraph
    from rdflib import Namespace

    endpoint = "http://dbpedia.org/sparql"
    graph = SPARQLGraph(endpoint)

    DB = Namespace("http://dbpedia.org/resource/")
    DBONTO = Namespace("http://dbpedia.org/ontology/")

    metal_bands = graph.subjects(predicate=DBONTO.genre, object=DB.Metal)

    for band in metal_bands:
        print(band)

RDFAlchemy
abstract endpoint, returns dict, namespace, query by triples, add / remove, ORM django-like

    from rdfalchemy.sparql import SPARQLGraph
    from rdfalchemy import rdfSubject, rdfSingle
    from rdflib import Namespace

    DB = Namespace('http://dbpedia.org/resource/')
    DBONTO = Namespace("http://dbpedia.org/ontology/")
    RDFS = Namespace('http://www.w3.org/2000/01/rdf-schema#')

    endpoint = "http://live.dbpedia.org/sparql"
    graph = SPARQLGraph(endpoint)
    rdfSubject.db = graph

    class MusicalArtist(rdfSubject):
        rdfs_label = rdfSingle(RDFS.label, 'label')
        genre = rdfSingle(DBONTO.genre, 'genre')

    metal_artists = MusicalArtist.filter_by(genre=DB.Metal)

    for band in metal_artists:
        print(band)
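A django-like model of this kind can be understood as class attributes that map to predicates and look up their value in the graph on access. The sketch below shows that mechanism with a plain Python descriptor over an in-memory triple list. It is an illustration of the pattern only, not RDFAlchemy's code: `SingleValue` and `Subject` are hypothetical names and the data is invented.

```python
# Invented triples standing in for the graph behind the models.
TRIPLES = [
    ("db:Example_Band_A", "rdfs:label", "Example Band A"),
    ("db:Example_Band_A", "dbonto:genre", "db:Metal"),
]


class SingleValue(object):
    """Descriptor returning the first object stored for (subject, predicate)."""

    def __init__(self, predicate):
        self.predicate = predicate

    def __get__(self, instance, owner):
        if instance is None:
            return self  # accessed on the class, not on an instance
        for subject, predicate, obj in instance.triples:
            if subject == instance.uri and predicate == self.predicate:
                return obj
        return None


class Subject(object):
    """Hypothetical base class: a URI plus the triples to read from."""

    triples = TRIPLES

    def __init__(self, uri):
        self.uri = uri


class MusicalArtist(Subject):
    label = SingleValue("rdfs:label")
    genre = SingleValue("dbonto:genre")


artist = MusicalArtist("db:Example_Band_A")
print(artist.label, artist.genre)
```

The trade-off named on the following slides is visible here: every predicate the application needs has to be declared again as a class attribute.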

RDFAlchemy

django-like models

RDFAlchemy

simple syntax

RDFAlchemy

but...

non-lazy

RDFAlchemy

we have to declare, as Python classes, all the data already described in TTL files

semantic-django

    # Classes similar to Django models are created from TTL
    # files (using manage.py)

    class BaseLugar(BaseEntidade):
        latitude = models.UriField()
        longitude = models.UriField()
        geonameid = models.UriField()
        tem_mapa = models.UriField()
        apelido = models.UriField()
        ImagemMapa = models.UriField()
        genero_gramatical = models.UriField()

        class Meta:
            semantic_graph = 'http://semantica.globo.com/base/Lugar'

abstract endpoint, returns dict, namespace, query by triples, add / remove, ORM django-like

semantic-django

https://github.com/rfloriano/semantic-django

semantic-django

dream of many product developers

semantic-django

but...

just started to be developed

study existing solutions, and now?

[ ] contribute to them
[ ] develop on top of them
[ ] create a solution from scratch
[ ] other, _________________

grab your post-it, it's review time!

SuRF

RDFAlchemy

RDFlib

semantic-django

(...)

=) | =( | comments

no models

shows query

models, not lazy

nice API

django-like

namespace

low layer

just started

my favorite

my choice

any questions...?

@tati_alchueyr
