Linked Data at globo.com

Semantic Team [email protected]

Tatiana Al-Chueyr and Rodrigo D. A. Senra
{tatiana.martins, rodrigo.senra}@corp.globo.com


Speech given together with Tatiana Al-Chueyr during SemanticDay at Globo.com


Page 1: Linked data at globo.com

Linked Data at globo.com

Semantic Team [email protected]

Tatiana Al-Chueyr and Rodrigo D. A. Senra
{tatiana.martins, rodrigo.senra}@corp.globo.com

Page 2: Linked data at globo.com

Semantic Team

● Andréia Bustamante

● Ícaro Medeiros

● Tatiana Al-Chueyr

● Rodrigo Senra

Page 3: Linked data at globo.com

Contributors

● Franklin Amorim

● João Carlos Mendes Luís

● Alberto Beloni

● André Nicodemus

Page 4: Linked data at globo.com

BROADCAST MOVIES PAY TV INTERNET

EVENTS MUSIC

PUBLISHING

NEW VENTURES NEWSPAPER RADIO NETWORK

Page 5: Linked data at globo.com

Motivation

Cross-link content from different web products

Soccer player

Page 6: Linked data at globo.com

Motivation

Cross-link content from different web products

Politician

Page 7: Linked data at globo.com

Motivation

Cross-link content from different web products

Celebrity

Page 8: Linked data at globo.com

Motivation

Semantic markup in web pages

Isabella Nardoni was killed on March 29, 2008, in the North Zone of São Paulo (Photo: Reproduction)

Isabella de Oliveira Nardoni, aged 5, was killed on the night of March 29, 2008. The forensic investigation concluded that the girl was thrown from the sixth floor of the building where her father, Alexandre Nardoni, her stepmother, Anna Carolina Jatobá, and the couple's two small children lived, in Vila Isolina Mazzei, in the north zone of São Paulo.

Isabella's grave becomes a place of visitation in SP; the Nardoni couple is in prison.

Isabella Nardoni case

Juliana Cardilli, G1 SP

RDF

FOAF

GEO

Dublin Core

SKOS

Page 9: Linked data at globo.com

Motivation

Recommend annotations to the information producer

Page 10: Linked data at globo.com

Motivation

Suggest related content to the information consumer

Page 11: Linked data at globo.com

Motivation

Suggest related content to the information consumer

Page 12: Linked data at globo.com

Motivation

Suggest related content to the information consumer

Page 13: Linked data at globo.com

Outcomes

● Flexible ways to organize content

● Easier to find related issues

● Explicit relations derived from annotated content

● Up-to-date topic pages with little editorial effort

● Linking content across different web products

● Seamless navigation leading to flow state

Page 14: Linked data at globo.com

Status Quo

Used by the main web products of Globo.com, linking, among others:

○ 18,485 organizations

○ 82,386 people

○ 9,129 places

○ 1,000,000+ annotated news

from August 2010 to May 2013

Page 15: Linked data at globo.com

Legacy Architecture

CDA

CMA

triple store

search engine

ontology

Page 16: Linked data at globo.com

Legacy Architecture

[Diagram labels: CMA and CDA (repeated for several products), triple store, search engine, ontology]

Page 17: Linked data at globo.com

Problems

Poor data management

○ direct access to triple store (unmanaged)

○ difficulty sharing data (distributed DBs)

○ re-syncing the triple store and the search engine index

○ scalability of triple store

○ high entropy in distributed ontology engineering

Page 18: Linked data at globo.com

Problems

Page 19: Linked data at globo.com

Ontology Engineering

Product-driven (past)

Base

G1 (news), GE (sports), EGO (gossip), TVG (tv)

Domain-driven (current)

Upper

Person, Organization, Place, Music, Politics, Programme, Education, Sports

Page 20: Linked data at globo.com

Possible Solution

Upper Ontology

Page 21: Linked data at globo.com

Problems

Semantic as a library

○ many different versions in production

○ programming language dependent

○ steep learning curve for RDF/OWL/SPARQL

Page 22: Linked data at globo.com

Next Step

Create an open semantic data management platform

● Scalable

● Mobile and Web friendly

● Interconnect Globo's data with external data sources

● Automate content extraction (including NER)

Page 23: Linked data at globo.com

Brainiak

linked data RESTful API

Page 24: Linked data at globo.com

Legacy Architecture

[Diagram labels: CMA and CDA (repeated for several products), triple store, search engine, ontology]

Page 25: Linked data at globo.com

Under Development

[Diagram labels: CMA, CDA (repeated), Brainiak API, triple store, search engine]

Page 26: Linked data at globo.com

Requirements

● Indirect usage of SPARQL

● Programming language independent

● Data management with quality

● Finer-grained authorization and authentication

● Isolate applications from triplestore

● Improve triplestore performance

Page 27: Linked data at globo.com

SPARQL query

task: list all sports teams

DEFINE input:inference <http://data.globo.com/ruleset>
SELECT ?uri ?label
FROM <http://data.globo.com/sports/>
WHERE {
    ?uri a <http://data.globo.com/sports/Team> ;
         rdfs:label ?label .
}
LIMIT 10
OFFSET 0

Page 28: Linked data at globo.com

Brainiak query

GET /sports/Team
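To make the contrast with the SPARQL version concrete, here is a minimal sketch of how a client might prepare this request using only the Python standard library. The base URL `http://brainiak.example` and the `page` / `per_page` parameter names are assumptions made for illustration; the slides do not state them. The sketch only builds the request and does not send it.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical base URL; the slides do not say where the API is hosted.
BASE_URL = "http://brainiak.example"


def build_collection_request(context, class_name, page=1, per_page=10):
    """Build a GET request for a collection such as /sports/Team.

    `page` and `per_page` are assumed parameter names standing in for the
    OFFSET and LIMIT clauses of the equivalent SPARQL query.
    """
    query = urlencode({"page": page, "per_page": per_page})
    url = f"{BASE_URL}/{context}/{class_name}?{query}"
    return Request(url, method="GET", headers={"Accept": "application/json"})


request = build_collection_request("sports", "Team")
print(request.get_method(), request.full_url)
```

The client needs no knowledge of graphs, rulesets, or SPARQL syntax; the context and class are carried entirely by the path.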

Page 29: Linked data at globo.com

SPARQL response

Page 30: Linked data at globo.com

Brainiak response

Page 31: Linked data at globo.com

Brainiak concepts

● Instance

● Collection (set of instances from a given Class)

● Schema (the Class definition)

● Context

Page 32: Linked data at globo.com

Instance

Page 33: Linked data at globo.com

Collection

Page 34: Linked data at globo.com

Schema

Page 35: Linked data at globo.com

Context

Page 36: Linked data at globo.com

Real example

[Diagram: context place; classes State, Country, City; instances Brazil, Japan]

Page 37: Linked data at globo.com

Real example

GET /place

GET /place/Country

GET /place/Country/_schema

GET /place/Country/Brazil

Page 38: Linked data at globo.com

URI Conventions

resource URL → /place/Country/Brazil

context (graph) → http://semantica.globo.com/place/

class → http://semantica.globo.com/place/Country

instance → http://semantica.globo.com/place/Country/Brazil
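The convention above can be sketched as a small path-to-URI mapping. This is an illustrative reading of the slide, not Brainiak's actual code: the function name `resolve_uris` is hypothetical, and special segments such as `_schema` are deliberately not handled.

```python
# Base namespace taken from the slide; the rest of this sketch is illustrative.
BASE_URI = "http://semantica.globo.com/"


def resolve_uris(resource_path):
    """Map a path like /place/Country/Brazil to context, class and instance URIs."""
    parts = [p for p in resource_path.strip("/").split("/") if p]
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"unexpected resource path: {resource_path!r}")
    # The first segment names the context (graph); note the trailing slash.
    uris = {"context": f"{BASE_URI}{parts[0]}/"}
    if len(parts) >= 2:
        uris["class"] = f"{BASE_URI}{parts[0]}/{parts[1]}"
    if len(parts) == 3:
        uris["instance"] = f"{BASE_URI}{parts[0]}/{parts[1]}/{parts[2]}"
    return uris


print(resolve_uris("/place/Country/Brazil"))
```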

Page 39: Linked data at globo.com

Legacy URIs

/place/River?graph_uri=http://dbpedia.org/resource/classes#&class_uri=dbpedia:River

Convention

context (graph) → http://semantica.globo.com/place/

class → http://semantica.globo.com/place/River

Overridden

context (graph) → http://dbpedia.org/resource/classes#

class → http://dbpedia.org/ontology/River
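A minimal sketch of the override logic, assuming the query parameters simply replace the conventional values when present. The helper names and the single-entry prefix table are hypothetical; the `dbpedia:` expansion is the one implied by the slide. One practical detail: a literal `#` inside a query string starts the URL fragment, so a real client would need to send it percent-encoded as `%23`, as the example does.

```python
from urllib.parse import parse_qs, urlsplit

BASE_URI = "http://semantica.globo.com/"
# Prefix expansion implied by the slide (dbpedia:River -> .../ontology/River).
PREFIXES = {"dbpedia": "http://dbpedia.org/ontology/"}


def expand(value):
    """Expand a prefixed name such as dbpedia:River; leave full URIs untouched."""
    prefix, sep, local = value.partition(":")
    if sep and prefix in PREFIXES:
        return PREFIXES[prefix] + local
    return value


def resolve_collection(url):
    """Return (graph_uri, class_uri) for a collection URL, honouring overrides."""
    split = urlsplit(url)
    context, class_name = split.path.strip("/").split("/")
    params = parse_qs(split.query)
    # Fall back to the conventional URIs when no override is given.
    graph_uri = params.get("graph_uri", [f"{BASE_URI}{context}/"])[0]
    class_uri = params.get("class_uri", [f"{BASE_URI}{context}/{class_name}"])[0]
    return expand(graph_uri), expand(class_uri)


legacy = "/place/River?graph_uri=http://dbpedia.org/resource/classes%23&class_uri=dbpedia:River"
print(resolve_collection(legacy))
print(resolve_collection("/place/River"))
```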

Page 40: Linked data at globo.com

Hypermedia

● Flexibility and programmatic adaptation

● Semantic affordances

● Client has to understand what is consumed

● "Hypermedia APIs are not fully baked yet"

Page 41: Linked data at globo.com

Brainiak hypermedia graph

[Diagram: nodes /, context, collection, schema, instance; link relations self, instances, collection, item, inCollection, describedBy, create, replace, delete]
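The link relations in this graph suggest how a client can navigate without hard-coding URLs. The sketch below assumes a response that carries a `links` list with `rel`, `href`, and an optional `method`; that layout, and which relations an instance exposes, are assumptions, since the slides show only the relation names.

```python
# Hypothetical response fragment: the slides name the link relations but do
# not show the JSON layout, so the "links"/"rel"/"href" keys are assumptions.
instance_doc = {
    "links": [
        {"rel": "self", "href": "/place/Country/Brazil"},
        {"rel": "describedBy", "href": "/place/Country/_schema"},
        {"rel": "inCollection", "href": "/place/Country"},
        {"rel": "replace", "href": "/place/Country/Brazil", "method": "PUT"},
        {"rel": "delete", "href": "/place/Country/Brazil", "method": "DELETE"},
    ]
}


def follow(document, rel):
    """Return (method, href) for the first link with the given relation."""
    for link in document.get("links", []):
        if link["rel"] == rel:
            # Links without an explicit method are treated as plain GETs.
            return link.get("method", "GET"), link["href"]
    raise KeyError(f"no link with rel={rel!r}")


print(follow(instance_doc, "describedBy"))
print(follow(instance_doc, "delete"))
```

A client written this way only needs to understand the relation names, which matches the "client has to understand what is consumed" point on the Hypermedia slide.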

Page 42: Linked data at globo.com

Services

● List Contexts

● List Collections

● Get a Schema

● List Prefixes

● Status of Services

Instances

● Create

● Retrieve

● Delete

● Edit

● List
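One plausible way to read the instance services is as a verb and path pair per operation. The pairing below follows common REST conventions and the path layout shown on the earlier slides; it is an assumption, not a mapping stated in the talk, and the `route` helper is hypothetical.

```python
# Assumed verb/path pairing for the instance services, following common REST
# conventions; the slides list the services and verbs but not this mapping.
INSTANCE_SERVICES = {
    "create": ("POST", "/{context}/{cls}"),
    "retrieve": ("GET", "/{context}/{cls}/{instance_id}"),
    "edit": ("PUT", "/{context}/{cls}/{instance_id}"),
    "delete": ("DELETE", "/{context}/{cls}/{instance_id}"),
    "list": ("GET", "/{context}/{cls}"),
}


def route(service, context, cls, instance_id=None):
    """Return the (HTTP method, path) pair for an instance service."""
    method, template = INSTANCE_SERVICES[service]
    if "{instance_id}" in template and instance_id is None:
        raise ValueError(f"service {service!r} needs an instance id")
    path = template.format(context=context, cls=cls, instance_id=instance_id)
    return method, path


print(route("retrieve", "place", "Country", "Brazil"))
print(route("list", "sports", "Team"))
```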

Page 43: Linked data at globo.com

Features

● JSON-Schema

● JSON-LD

● REST

● Python + Tornado

OPTIONS GET PUT POST DELETE

Page 44: Linked data at globo.com

Brainiak query

GET /sports/Team

Page 45: Linked data at globo.com

Brainiak response

Page 46: Linked data at globo.com

Brainiak response

Page 47: Linked data at globo.com

Brainiak response

Page 48: Linked data at globo.com

Brainiak response

Page 49: Linked data at globo.com

SPARQL query

task: retrieve all superclasses of a class

SELECT DISTINCT ?class
WHERE {
    <http://data.globo.com/place/City> rdfs:subClassOf ?class
        OPTION (TRANSITIVE, t_distinct, t_step('step_no') as ?n, t_min (0)) .
    ?class a owl:Class .
}
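For readers unfamiliar with the transitive option used here, the query amounts to walking `rdfs:subClassOf` edges upward from a starting class, with `t_min (0)` keeping the starting class itself in the result. A plain-Python sketch of that traversal follows; the hierarchy is a hypothetical stand-in, since the real ontology is not shown on the slide.

```python
# Toy subclass edges; the real ontology is not shown on the slide, so this
# hierarchy is a hypothetical stand-in.
SUBCLASS_OF = {
    "place:City": ["place:GeopoliticalDivision"],
    "place:GeopoliticalDivision": ["place:Place"],
    "place:Place": ["upper:Entity"],
}


def superclasses(cls):
    """Return cls and all of its ancestors, nearest first.

    Including cls itself mirrors the zero-length paths allowed by t_min (0).
    """
    seen, queue = [], [cls]
    while queue:
        current = queue.pop(0)
        if current in seen:
            continue  # also guards against cycles in the hierarchy
        seen.append(current)
        queue.extend(SUBCLASS_OF.get(current, []))
    return seen


print(superclasses("place:City"))
```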

Page 50: Linked data at globo.com

SPARQL query

task: retrieve all properties of a group of classes

SELECT DISTINCT ?predicate ?predicate_graph ?predicate_comment ?type ?range
                ?title ?range_graph ?range_label ?super_property
WHERE {
    {
        GRAPH ?predicate_graph { ?predicate rdfs:domain ?domain_class } .
    } UNION {
        graph ?predicate_graph { ?predicate rdfs:domain ?blank } .
        ?blank a owl:Class .
        ?blank owl:unionOf ?enumeration .
        OPTIONAL { ?enumeration rdf:rest ?list_node OPTION(TRANSITIVE, t_min (0)) } .
        OPTIONAL { ?list_node rdf:first ?domain_class } .
    }
    FILTER (?domain_class IN (<http://data.globo.com/place/City>,
                              <http://data.globo.com/place/GeopoliticalDivision>,
                              <http://data.globo.com/place/Place>,
                              <http://data.globo.com/upper/Object>,
                              <http://data.globo.com/upper/Substance>,
                              <http://data.globo.com/upper/ConcreteEntity>,
                              <http://data.globo.com/upper/Entity>))
    { ?predicate rdfs:range ?range . }
    UNION {
        ?predicate rdfs:range ?blank .
        ?blank a owl:Class .
        ?blank owl:unionOf ?enumeration .
        OPTIONAL { ?enumeration rdf:rest ?list_node OPTION(TRANSITIVE, t_min (0)) } .
        OPTIONAL { ?list_node rdf:first ?range } .
    }
    FILTER (!isBlank(?range))
    ?predicate rdfs:label ?title .
    ?predicate rdf:type ?type .
    OPTIONAL { ?predicate rdfs:subPropertyOf ?super_property } .
    FILTER (?type in (owl:ObjectProperty, owl:DatatypeProperty)) .
    FILTER(langMatches(lang(?title), "en") OR langMatches(lang(?title), "")) .
    OPTIONAL { ?predicate rdfs:comment ?predicate_comment }
    FILTER(langMatches(lang(?predicate_comment), "en") OR langMatches(lang(?predicate_comment), "")) .
    OPTIONAL {
        GRAPH ?range_graph {
            ?range rdfs:label ?range_label .
            FILTER(langMatches(lang(?range_label), "en") OR langMatches(lang(?range_label), "")) .
        }
    }
}

Page 51: Linked data at globo.com

SPARQL query

task: retrieve the cardinalities of all properties of a certain class

SELECT DISTINCT ?predicate ?min ?max ?range ?enumerated_value ?enumerated_value_label
WHERE {
    <http://data.globo.com/place/City> rdfs:subClassOf ?s
        OPTION (TRANSITIVE, t_distinct, t_step('step_no') as ?n, t_min (0)) .
    ?s owl:onProperty ?predicate .
    OPTIONAL { ?s owl:minQualifiedCardinality ?min } .
    OPTIONAL { ?s owl:maxQualifiedCardinality ?max } .
    OPTIONAL {
        { ?s owl:onClass ?range }
        UNION { ?s owl:onDataRange ?range }
        UNION { ?s owl:allValuesFrom ?range }
        OPTIONAL { ?range owl:oneOf ?enumeration } .
        OPTIONAL { ?enumeration rdf:rest ?list_node OPTION(TRANSITIVE, t_min (0)) } .
        OPTIONAL { ?list_node rdf:first ?enumerated_value } .
        OPTIONAL { ?enumerated_value rdfs:label ?enumerated_value_label . } .
    }
}

Page 52: Linked data at globo.com

Brainiak query

GET /place/City/_schema

Page 53: Linked data at globo.com

Next steps

● SEO (automatic schema.org)

● Improved annotator (DBpedia Spotlight)

● Richer content relationships (inference)

● Link to open data (e.g. DBpedia, dados.gov.br)

Page 54: Linked data at globo.com

Stay tuned

@brainiak_api

... will soon be released as an open source project!

Page 55: Linked data at globo.com

Semantic [email protected]

globo.com

Thank you for the attention!