Linking the world with Python and Semantics
@tati_alchueyr (Globo.com)
25th July 2012, FISL 13
how do you store your data?
[ ] data... what data?!
[ ] raw files (csv, json, xml)
[ ] database (e.g. relational database)
[ ] graphs (e.g. Resource Description Framework)
[ ] other...
how do you search for...?
Apartments near English-Portuguese bilingual childcare in Rio de Janeiro state.
ERP service providers with offices in São Paulo and New York.
Researchers working on artificial intelligence in Southeast of Brazil.
GNU GPL software for image processing developed from 2009 to 2010 authored also by Brazilian developers
what do the searches above have in common?
linked open data in 2007
linked open data in 2008
linked open data in 2009
linked open data in 2011
traditional RDBMS
linked data graph
linked data modelling
modelling
querying RDB
select bookID, authorName
from books, authors
where books.aid = authors.aid
  and books.isbn = '006251587X'
querying RDF
select ?authName ?authEmail
where {
  <amazon:book#006251587X> <amazon:hasAuthor> <foaf:name#TimBerners-Lee> .
  <foaf:name#TimBerners-Lee> <foaf:name> ?authName .
  <foaf:name#TimBerners-Lee> <foaf:email> ?authEmail
}
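A minimal pure-Python sketch of how a graph pattern like this one is evaluated: each line inside `where` is a triple pattern, and shared terms join the patterns, which is the role the `aid` column plays in the SQL version. This is not a SPARQL engine; `match`, `select`, the `?author` variable and the name and e-mail literals are invented for illustration.

```python
# Toy triple store: each fact is a (subject, predicate, object) tuple.
# The name and e-mail literals are invented for illustration only.
AUTHOR = "foaf:name#TimBerners-Lee"
TRIPLES = [
    ("amazon:book#006251587X", "amazon:hasAuthor", AUTHOR),
    (AUTHOR, "foaf:name", "Tim Berners-Lee"),
    (AUTHOR, "foaf:email", "timbl@example.org"),
]


def match(triples, pattern):
    """Yield one {variable: value} dict per triple matching the pattern.

    Pattern terms starting with '?' are variables; others must match exactly.
    """
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable used twice must bind to the same value.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield binding


def select(triples, patterns):
    """Join several patterns: keep only bindings that satisfy all of them."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for solution in solutions:
            # Substitute already-bound variables before matching.
            bound = tuple(solution.get(term, term) for term in pattern)
            for binding in match(triples, bound):
                merged = dict(solution)
                merged.update(binding)
                next_solutions.append(merged)
        solutions = next_solutions
    return solutions


rows = select(TRIPLES, [
    ("amazon:book#006251587X", "amazon:hasAuthor", "?author"),
    ("?author", "foaf:name", "?authName"),
    ("?author", "foaf:email", "?authEmail"),
])
for row in rows:
    print(row["?authName"], row["?authEmail"])
```

No foreign keys or join tables are declared anywhere: the join happens because the same node appears in several triples.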
globo.com developers before using web semantics
globo.com developers while learning web semantics
(?w ?t ?f)
globo.com developers after using web semantics
Sample hard-to-test code
approach #1: query isolation
approach #2: data as objects
DAO
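The code from these slides did not survive, so what follows is only a guess at the shape of the idea: a data access object keeps the SPARQL text in one place and receives the function that talks to the endpoint from outside, so a test can pass a fake. `BandDAO`, `run_query`, `fake_run_query` and the `Some_Band` URI are hypothetical names, not taken from the talk.

```python
class BandDAO(object):
    """Hypothetical data access object: the only place that knows SPARQL."""

    QUERY = """
    SELECT DISTINCT ?who
    WHERE { ?who <http://dbpedia.org/ontology/genre> <%(genre)s> . }
    """

    def __init__(self, run_query):
        # run_query: callable taking a SPARQL string and returning a list of
        # dicts, one per result row. It is injected so tests can pass a fake.
        self.run_query = run_query

    def by_genre(self, genre_uri):
        """Return the URIs of everything tagged with the given genre."""
        rows = self.run_query(self.QUERY % {"genre": genre_uri})
        return [row["who"] for row in rows]


def fake_run_query(sparql):
    """Test double standing in for a real SPARQL endpoint."""
    assert "http://dbpedia.org/resource/Metal" in sparql
    return [{"who": "http://dbpedia.org/resource/Some_Band"}]


dao = BandDAO(fake_run_query)
bands = dao.by_genre("http://dbpedia.org/resource/Metal")
print(bands)
```

Callers only see `by_genre`, so the query text can change without touching them, and no network is needed to test either side.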
Y U NO make
SPARQL queries?!
Y U NO make
data access easy?!
Y U NO make
things testable?!
product developers evaluating web semantics
fact 1: we don't have an out-of-the-box solution
fact 2: but we do have some options
#1: create a solution from scratch
#2: study existing solutions and then
[ ] contribute to them
[ ] develop on top of them
[ ] goto #1
some options
the final decision is not only ours
but we chose to start from #2
#2: study existing solutions and then (...)
ok, lmgfy
a few results from google
ActiveRDF
active-semantic
Django4Store
Django-RDF
Django-RDFAlchemy
Djubby
EasyRDF
Jena
FuXi
Oort
Pymantic
PyRdfa
pysparql
RDFAlchemy
RdfLib
Redland
semantic-django
SPARQLWrapper
Sparrow
Sparta
SuRF
{ ?project :by_author ?author . ?author :works_at :globocom . }
{?project :use_language :python . }
{ ?project :use_language :python ;
           :last_commit ?commit .
  FILTER (?commit >= "2011-12-01"^^xsd:date) }
relation between these tools
team filtering
from SPARQLWrapper import SPARQLWrapper, JSON

# List all predicates of dbonto:Band
query = """
SELECT DISTINCT ?subject
FROM <http://dbpedia.org>
{
    ?subject rdfs:domain ?object .
    <http://dbpedia.org/ontology/Band> rdfs:subClassOf ?object
        OPTION (TRANSITIVE, t_distinct, t_step('step_no') as ?n, t_min (0)) .
}"""

sparql = SPARQLWrapper("http://dbpedia.org/sparql")
sparql.setQuery(query)
sparql.setReturnFormat(JSON)
results = sparql.query().convert()

for result in results["results"]["bindings"]:
    print(result["subject"]["value"])

http://live.dbpedia.org/sparql
SPARQLWrapper
problem: list all predicates of a class
SPARQLWrapper
abstract endpoint, returns dict
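For reference, the dict returned by `convert()` with the JSON return format follows the W3C SPARQL Query Results JSON Format. The sketch below parses a hand-written response of that shape with the standard library only; the two `example.org` URIs and the `values` helper are placeholders, not real DBpedia output or part of SPARQLWrapper.

```python
import json

# Hand-written sample in the SPARQL Query Results JSON Format.
# The two URIs are placeholders, not real query output.
SAMPLE_RESPONSE = """
{
  "head": {"vars": ["subject"]},
  "results": {
    "bindings": [
      {"subject": {"type": "uri", "value": "http://example.org/prop/a"}},
      {"subject": {"type": "uri", "value": "http://example.org/prop/b"}}
    ]
  }
}
"""


def values(results, variable):
    """Collect the plain values bound to one variable across all rows."""
    return [
        binding[variable]["value"]
        for binding in results["results"]["bindings"]
        if variable in binding  # a variable may be unbound in some rows
    ]


results = json.loads(SAMPLE_RESPONSE)
subjects = values(results, "subject")
print(subjects)
```

Every row is a dict keyed by variable name, and every value is itself a dict carrying at least `type` and `value`, which is why the loops on the slides end in `["value"]`.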
SPARQLWrapper
Ok, not different from what we have...
SPARQLWrapper
just a wrapper around a SPARQL server
well tested ;)
SPARQLWrapper
problem: list all subjects given ?p ?o
from SPARQLWrapper import SPARQLWrapper, JSON

# List all instances (e.g. bands) with genre Metal
query = """
PREFIX db: <http://dbpedia.org/resource/>
PREFIX dbonto: <http://dbpedia.org/ontology/>

SELECT DISTINCT ?who
FROM <http://dbpedia.org>
WHERE {
    ?who dbonto:genre db:Metal .
}"""

sparql = SPARQLWrapper("http://dbpedia.org/sparql")
sparql.setQuery(query)
sparql.setReturnFormat(JSON)
results = sparql.query().convert()

for result in results["results"]["bindings"]:
    print(result["who"]["value"])
import rdflib
import rdfextras.store.SPARQL

# SPARQL endpoint setup
endpoint = "http://dbpedia.org/sparql"
store = rdfextras.store.SPARQL.SPARQLStore(endpoint)
graph = rdflib.Graph(store)

# Definitions
genre = rdflib.URIRef("http://dbpedia.org/ontology/genre")
metal = rdflib.URIRef("http://dbpedia.org/resource/Metal")

# Query
for label in graph.subjects(genre, metal):
    print(label)
RdfLib
problem: list all subjects given ?p ?o
RdfLib
abstract endpoint, returns dict, namespace
import rdflib
import rdfextras.store.SPARQL

# SPARQL endpoint setup
endpoint = "http://dbpedia.org/sparql"
store = rdfextras.store.SPARQL.SPARQLStore(endpoint)
graph = rdflib.Graph(store)

# Namespaces to clear up definitions
DBONTO = rdflib.Namespace("http://dbpedia.org/ontology/")
DB = rdflib.Namespace("http://dbpedia.org/resource/")

# Query
for label in graph.subjects(DBONTO.genre, DB.Metal):
    print(label)
subjects, predicates, objects, subject_predicates, subject_objects, predicate_objects
RdfLib
abstract endpoint, returns dict, namespace
import rdflib
import rdfextras.store.SPARQL

# SPARQL endpoint setup
endpoint = "http://dbpedia.org/sparql"
store = rdfextras.store.SPARQL.SPARQLStore(endpoint)
graph = rdflib.Graph(store)

# Namespaces to clear up definitions
DBONTO = rdflib.Namespace("http://dbpedia.org/ontology/")
DB = rdflib.Namespace("http://dbpedia.org/resource/")

# Using triples
for musician, _, _ in graph.triples((None, DBONTO.genre, DB.Metal)):
    print(musician)
RdfLib
abstract endpoint, returns dict, namespace, query by triples
import rdflib
import rdfextras.store.SPARQL

# N-Triples fixture file
graph = rdflib.Graph()
graph.parse("fixture_genre_metal.nt", format="nt")

# Namespace
DBONTO = rdflib.Namespace("http://dbpedia.org/ontology/")
DB = rdflib.Namespace("http://dbpedia.org/resource/")

# Add nodes
graph.add((DB.AndrewsMedina, DBONTO.genre, DB.Metal))
graph.add((DB.Siminino, DBONTO.genre, DB.Metal))
graph.add((DB.Herman, DBONTO.genre, DB.Metal))

# Remove nodes
graph.remove((DB.AndrewsMedina, DBONTO.genre, DB.Metal))
add / remove
RdfLib
concentrates on providing the core RDF types and interfaces; stores and parsers are added through a plugin interface
RdfLib
makes testing simple: fixtures can be loaded from N3 files, and triples can be added and removed
RdfLib
but...
each triple query requires a new connection to the SPARQL endpoint
RdfLib
therefore
too many requests to the SPARQL endpoint
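A rough way to picture the cost, using a fake endpoint that simply counts calls. `CountingEndpoint`, its `triples_matching` method and the `band:` data are hypothetical and only mimic pattern-at-a-time access; they are not RdfLib code.

```python
class CountingEndpoint(object):
    """Hypothetical stand-in for a remote SPARQL endpoint that counts requests."""

    def __init__(self, triples):
        self.triples = triples
        self.requests = 0

    def triples_matching(self, s=None, p=None, o=None):
        """Return triples matching the pattern; None acts as a wildcard."""
        self.requests += 1  # stands for one network round trip
        return [
            t for t in self.triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)
        ]


# Made-up data: two bands, each with a genre and a label.
endpoint = CountingEndpoint([
    ("band:a", "genre", "Metal"), ("band:a", "label", "A"),
    ("band:b", "genre", "Metal"), ("band:b", "label", "B"),
])

# Pattern-at-a-time access: one request for the bands, then one per band.
labels = []
for band, _, _ in endpoint.triples_matching(p="genre", o="Metal"):
    for _, _, label in endpoint.triples_matching(s=band, p="label"):
        labels.append(label)

print(labels, endpoint.requests)
```

With N matching bands this costs N + 1 requests, whereas a single SPARQL query joining both patterns would cost one.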
RdfLib
and...
doesn't provide an ORM (object-relational mapping)
SuRF
abstract endpoint, returns dict, namespace, query by triples, add / remove, ORM
from surf import Store, Session, ns, query

store = Store(reader='sparql_protocol', endpoint='http://dbpedia.org/sparql')
session = Session(store, {})
session.enable_logging = False

ns.register(db='http://dbpedia.org/resource/')
ns.register(dbonto='http://dbpedia.org/ontology/')

# The MusicalArtist class lives in the DBpedia ontology namespace
MusicalArtist = session.get_class(ns.DBONTO['MusicalArtist'])

artistas_metal = MusicalArtist.get_by(dbonto_genre=ns.DB["Metal"])

print(artistas_metal)
SuRF
problem: list all subjects given ?p ?o
from surf import Store, Session, ns, query

store = Store(reader='sparql_protocol', endpoint='http://dbpedia.org/sparql')
session = Session(store, {})

ns.register(db='http://dbpedia.org/resource/')
ns.register(dbonto='http://dbpedia.org/ontology/')

query_surf = query.select("?who").distinct()
query_surf.where(("?who", ns.DBONTO.genre, ns.DB.Metal))

metal_bands = session.default_store.execute(query_surf)

for band in metal_bands:
    print(band)
ORM, composed queries
SuRF
various approaches: ORM or programmatically built queries
SuRF
simple ORM
no need to redeclare TTL definitions
SuRF
“complex” queries using lazy evaluation
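To show what lazy evaluation buys here, a minimal sketch, assuming a result set that only records filters and touches the store on first iteration. `LazyQuery`, `fake_store` and the `ARTISTS` data are hypothetical; this is not SuRF's implementation.

```python
class LazyQuery(object):
    """Hypothetical lazy result set: filters pile up, nothing runs until iteration."""

    def __init__(self, execute, filters=()):
        self._execute = execute
        self._filters = tuple(filters)
        self._cache = None

    def filter(self, **conditions):
        # Returns a new, more specific query without calling the store.
        extra = tuple(sorted(conditions.items()))
        return LazyQuery(self._execute, self._filters + extra)

    def __iter__(self):
        # The store is hit once, on first iteration; later loops reuse the cache.
        if self._cache is None:
            self._cache = list(self._execute(dict(self._filters)))
        return iter(self._cache)


# Made-up data and a fake store that records every call it receives.
ARTISTS = [
    {"name": "A", "genre": "Metal", "country": "BR"},
    {"name": "B", "genre": "Metal", "country": "US"},
    {"name": "C", "genre": "Jazz", "country": "BR"},
]
calls = []


def fake_store(conditions):
    calls.append(conditions)
    return [a for a in ARTISTS
            if all(a[key] == value for key, value in conditions.items())]


lazy = LazyQuery(fake_store).filter(genre="Metal").filter(country="BR")
calls_before = len(calls)          # composing the query made no store call
names = [artist["name"] for artist in lazy]
names_again = [artist["name"] for artist in lazy]
print(names, calls_before, len(calls))
```

Because both filters are combined before anything executes, the store sees one request carrying both conditions instead of one request per step.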
SuRF
documentation & community
SuRF
but...
no django-style models
SuRF
verbose syntax
RDFAlchemy
from rdfalchemy.sparql import SPARQLGraph
from rdflib import Namespace

endpoint = "http://dbpedia.org/sparql"
graph = SPARQLGraph(endpoint)

DB = Namespace("http://dbpedia.org/resource/")
DBONTO = Namespace("http://dbpedia.org/ontology/")

metal_bands = graph.subjects(predicate=DBONTO.genre, object=DB.Metal)

for band in metal_bands:
    print(band)
problem: list all subjects given ?p ?o
RDFAlchemy
abstract endpoint, returns dict, namespace, query by triples, add / remove, ORM, django-like
from rdfalchemy.sparql import SPARQLGraph
from rdfalchemy import rdfSubject, rdfSingle
from rdflib import Namespace

DB = Namespace('http://dbpedia.org/resource/')
DBONTO = Namespace("http://dbpedia.org/ontology/")
RDFS = Namespace('http://www.w3.org/2000/01/rdf-schema#')

endpoint = "http://live.dbpedia.org/sparql"
graph = SPARQLGraph(endpoint)
rdfSubject.db = graph

class MusicalArtist(rdfSubject):
    rdfs_label = rdfSingle(RDFS.label, 'label')
    genre = rdfSingle(DBONTO.genre, 'genre')

metal_artists = MusicalArtist.filter_by(genre=DB.Metal)

for band in metal_artists:
    print(band)
RDFAlchemy
django-like models
RDFAlchemy
simple syntax
RDFAlchemy
but...
non-lazy
RDFAlchemy
we have to redeclare, as Python classes, all the data already described in TTL files
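The duplication is easier to see with a toy version of the declarative style. This sketch uses a plain Python descriptor over an in-memory triple list; `Single`, `Subject` and the `db:Some_Band` data are hypothetical and are not RDFAlchemy's classes.

```python
# Made-up triples standing in for data already described elsewhere.
TRIPLES = [
    ("db:Some_Band", "rdfs:label", "Some Band"),
    ("db:Some_Band", "dbonto:genre", "db:Metal"),
]


class Single(object):
    """Hypothetical descriptor: maps an attribute to one object of a predicate."""

    def __init__(self, predicate):
        self.predicate = predicate

    def __get__(self, instance, owner):
        if instance is None:
            return self
        # Look up the first triple with this subject and predicate.
        for s, p, o in instance.triples:
            if s == instance.uri and p == self.predicate:
                return o
        return None


class Subject(object):
    """Base class holding the triple source and the subject URI."""
    triples = TRIPLES

    def __init__(self, uri):
        self.uri = uri


class MusicalArtist(Subject):
    # Every predicate the ontology already defines is repeated here by hand.
    label = Single("rdfs:label")
    genre = Single("dbonto:genre")


band = MusicalArtist("db:Some_Band")
print(band.label, band.genre)
```

Attribute access reads nicely, but each new property in the ontology means another line in the class, which is the cost this slide points at.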
semantic-django
# Classes similar to Django models are created from TTL
# files (using manage.py)

class BaseLugar(BaseEntidade):
    latitude = models.UriField()
    longitude = models.UriField()
    geonameid = models.UriField()
    tem_mapa = models.UriField()
    apelido = models.UriField()
    ImagemMapa = models.UriField()
    genero_gramatical = models.UriField()

    class Meta:
        semantic_graph = 'http://semantica.globo.com/base/Lugar'
abstract endpoint, returns dict, namespace, query by triples, ORM, django-like, add / remove
semantic-django
https://github.com/rfloriano/semantic-django
semantic-django
dream of many product developers
semantic-django
but...
just started to be developed
[ ] contribute to them
[ ] develop on top of them
[ ] create a solution from scratch
[ ] other, _________________
study existing solutions, and now?
grab your post-it, it's review time!
SuRF
RDFAlchemy
RDFlib
semantic-django
(...)
=) =( comments
no models
shows query
models not lazy
nice API
django-like
namespace
low layer
just started
my favorite
my choice
any questions...?
@tati_alchueyr