An RDF and XML Database
John Snelson, Lead Engineer
23rd October 2013
Slide 2 Copyright © 2013 MarkLogic® Corporation. All rights reserved.
MarkLogic
[Figure: product diagram with the labels SEARCH, DATABASE and APPLICATION SERVICES]
Slide 3
Data ≠ Information
Slide 4
Data + Context = Information
Slide 5
Dynamic Semantic Publishing
BBC Sports
The Challenge
Size and complexity:
• # of athletes
• # of teams
• # of assets (match reports, statistics, etc.)
• # of relations (facts)
Goals
Rich user experience:
• See information in context
• Personalize content
• Easy navigation
• Intelligently serve ads (outside of UK)
Manageable:
• Static pages? Too many, changing too fast
• Limited number of journalists
• Automate as much as possible
Slide 6
Dynamic Semantic Publishing
A Solution
XML Database
• Store, manage documents: stories, blogs, feeds, profiles
• Store, manage values: statistics
• Full-text search
• Performance, scalability
• Robustness
Triple Store
• Metadata about documents: tagged by journalists, added (semi-)automatically, inferred
• Facts reported by journalists
• Linked Open Data for real-world facts
Slide 7
Dynamic Semantic Publishing
Understanding Data
[Figure: graph with edges labelled "plays for", "plays in" and "played in"]
Slide 8
Dynamic Semantic Publishing
Scaling Up
Slide 9
What is RDF?
[Figure: RDF graph over the nodes :person4, :person5, :person20, :place5 and the literal "John", with edges labelled :has-child, :has-parent, :spouse, :birth-place and :first-name]
Slide 10
What is RDF?
• Schema-less
• Triple granularity
• Open world assumption
• Joins - the cost of granularity
Slide 11
What is Semantics?
Data stored in Triples
Expressed as Subject : Predicate : Object
Example:
"John Smith" : livesIn : "London"
"London" : isIn : "England"
Slide 12
What is Semantics?
Data stored in Triples
Expressed as Subject : Predicate : Object
Example:
"John Smith" : livesIn : "London"
"London" : isIn : "England"
Rules tell us something about the triples
Example:
If (A livesIn X) AND (X isIn Y) then (A livesIn Y)
Inference: "John Smith" : livesIn : "England"
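The rule above can be sketched in a few lines of Python. This is only an illustration of forward chaining over an in-memory set of triples, not how MarkLogic evaluates rules; the function name infer_lives_in is made up for the example, while the data and the rule are the ones on the slide.

```python
# Hypothetical sketch: apply one rule repeatedly until no new triples appear.
# Rule (from the slide): (A livesIn X) AND (X isIn Y) => (A livesIn Y)

def infer_lives_in(triples):
    """Return the input triples plus everything the rule derives from them."""
    facts = set(triples)
    changed = True
    while changed:
        changed = False
        derived = set()
        for (a, p1, x) in facts:
            if p1 != "livesIn":
                continue
            for (x2, p2, y) in facts:
                if p2 == "isIn" and x2 == x:
                    derived.add((a, "livesIn", y))
        if not derived <= facts:      # at least one triple is new
            facts |= derived
            changed = True
    return facts


triples = {
    ("John Smith", "livesIn", "London"),
    ("London", "isIn", "England"),
}
inferred = infer_lives_in(triples) - triples
print(inferred)  # {('John Smith', 'livesIn', 'England')}
```

The loop runs to a fixed point, so a longer chain of isIn facts would also be followed.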
Slide 13
What is Semantics?
Data stored in Triples
Expressed as Subject : Predicate : Object
Example:
"John Smith" : livesIn : "London"
"London" : isIn : "England"
Rules tell us something about the triples
[Figure: graph with the nodes "John Smith", "London" and "England", one isIn edge and two livesIn edges]
Slide 15
Semantics Architecture
[Figure: architecture diagram with the labels XQY, XSLT, SQL, SPARQL, GRAPH SPARQL and TRIPLE]
Slide 16
Triple Index
• 3 triple orders
• Cached for performance
• Works seamlessly with other indexes
• Security
• 150 bytes per triple on disk
• Billions of triples per host
• Scaling out horizontally
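To illustrate why a triple index keeps several orderings, here is a toy Python sketch. The slide does not say which three orders are stored, so the SPO, POS and OSP permutations below are an assumption, and the class name TripleIndex and the sample data are invented for the example. With these three orders, every combination of bound terms is a prefix of one sorted list, so a lookup is a binary search followed by a short scan.

```python
from bisect import bisect_left


class TripleIndex:
    """Toy permutation index. Assumed orders: SPO, POS, OSP."""

    ORDERS = {"spo": (0, 1, 2), "pos": (1, 2, 0), "osp": (2, 0, 1)}

    def __init__(self, triples):
        # One sorted copy of the data per ordering.
        self.rows = {
            name: sorted(tuple(t[i] for i in perm) for t in triples)
            for name, perm in self.ORDERS.items()
        }

    def match(self, s=None, p=None, o=None):
        """Return triples matching the pattern; None acts as a wildcard."""
        pattern = (s, p, o)
        # Pick the ordering whose leading columns cover the most bound terms.
        best_name, best_len = "spo", -1
        for name, perm in self.ORDERS.items():
            n = 0
            for i in perm:
                if pattern[i] is None:
                    break
                n += 1
            if n > best_len:
                best_name, best_len = name, n
        perm = self.ORDERS[best_name]
        prefix = tuple(pattern[i] for i in perm[:best_len])
        rows = self.rows[best_name]
        out = []
        for row in rows[bisect_left(rows, prefix):]:
            if row[:best_len] != prefix:
                break  # left the block of rows sharing the prefix
            triple = [None, None, None]
            for pos, i in enumerate(perm):
                triple[i] = row[pos]  # restore subject, predicate, object order
            if all(q is None or q == v for q, v in zip(pattern, triple)):
                out.append(tuple(triple))
        return out


# Made-up sample data in the style of the deck's Kennedy examples.
index = TripleIndex([
    (":person4", ":first-name", "John"),
    (":person4", ":birth-place", ":place5"),
    (":person22", ":first-name", "John"),
    (":person22", ":birth-place", ":place13"),
])
print(index.match(p=":birth-place"))
print(index.match(s=":person4", o="John"))
```

The price of this design is storing each triple once per ordering.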
Slide 17
RDF Loading
Slide 18
Triples Embedded in Documents
…
<sem:triple>
  <sem:subject>http://example.org/kennedy/person12</sem:subject>
  <sem:predicate>http://example.org/kennedy/last-name</sem:predicate>
  <sem:object datatype="http://www.w3.org/2001/XMLSchema#string">Lawford</sem:object>
</sem:triple>
…
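As a rough illustration of what "embedded" means, the following Python sketch pulls sem:triple elements out of a larger XML document using only the standard library. The slide does not show a namespace declaration, so the URI bound to the sem: prefix is an assumption, and the <person> wrapper element and the function name extract_triples are invented for the example.

```python
import xml.etree.ElementTree as ET

# Assumption: the sem: prefix is bound to this namespace URI.
SEM_NS = "http://marklogic.com/semantics"

# Hypothetical host document; only the sem:triple element comes from the slide.
doc = """
<person xmlns:sem="http://marklogic.com/semantics">
  <sem:triple>
    <sem:subject>http://example.org/kennedy/person12</sem:subject>
    <sem:predicate>http://example.org/kennedy/last-name</sem:predicate>
    <sem:object datatype="http://www.w3.org/2001/XMLSchema#string">Lawford</sem:object>
  </sem:triple>
</person>
"""


def extract_triples(xml_text):
    """Return (subject, predicate, object, datatype) for each embedded triple."""
    ns = {"sem": SEM_NS}
    root = ET.fromstring(xml_text)
    triples = []
    for t in root.iter(f"{{{SEM_NS}}}triple"):
        subject = t.findtext("sem:subject", namespaces=ns).strip()
        predicate = t.findtext("sem:predicate", namespaces=ns).strip()
        obj = t.find("sem:object", ns)
        triples.append((subject, predicate, obj.text.strip(), obj.get("datatype")))
    return triples


for triple in extract_triples(doc):
    print(triple)
```

The triple can sit anywhere in the document tree, because the search walks every descendant of the root.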
Slide 19
Content, Data, and Semantics
<SAR>
  <title>Suspicious vehicle near airport</title>
  <date>2012-11-12Z</date>
  <type>observation/surveillance</type>
  <threat>
    <type>suspicious activity</type>
    <category>suspicious vehicle</category>
  </threat>
  <location>
    <lat>37.497075</lat>
    <long>-122.363319</long>
  </location>
  <description>A blue van with license plate ABC 123 was observed parked behind the airport sign…</description>
  <triple>
    <subject>IRIID</subject>
    <predicate>isa</predicate>
    <object>license-plate</object>
  </triple>
  <triple>
    <subject>IRIID</subject>
    <predicate>value</predicate>
    <object>ABC 123</object>
  </triple>
</SAR>
Slide 20
Content, Data, and Semantics
[Figure: the <SAR> document drawn as a tree (<title>, <date>, <type>, <threat>, <location>, <description> and two <triple> elements), annotated with the labels "Unstructured full-text", "Geospatial Data" and "Semantic (RDF) Triples"]
Slide 21
RDF Values
<http://example.org/kennedy/person4>
"string value"^^xs:string
"987"^^xs:double
"2013-04-09"^^xs:date
"bonjour"@fr
_:blank1
"simple"
Slide 22
Datatype Mapping
Datatype                  SPARQL                   XQuery
Typed Literal             "2013-04-09"^^xs:date    xs:date("2013-04-09")
IRI                       <http://example.com>     sem:iri("http://example.com")
Blank Node                _:blank1                 sem:blank("…")
Simple Literal            "simple"                 xs:string("simple")
Language Tagged Literal   "bonjour"@fr             rdf:langString("bonjour","fr")
Slide 23
SPARQL
• Executed using the triple index
• SPARQL 1.0 + much of SPARQL 1.1
• Cost-based optimization
• Join ordering and algorithms
select * where {
  ?person :birth-place ?place;
          :first-name "John"
}
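To make the join in the query above concrete, here is a small Python sketch that evaluates the two triple patterns against an in-memory list. It is a naive nested-loop join in the order the patterns are written, so it shows what the query means rather than how a cost-based optimizer would run it. The function names and the sample triples are invented for the example.

```python
def match_pattern(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must keep the value it was bound to earlier.
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return result


def evaluate(patterns, triples):
    """Join the patterns left to right and return all variable bindings."""
    solutions = [{}]
    for pattern in patterns:
        solutions = [
            extended
            for binding in solutions
            for triple in triples
            if (extended := match_pattern(pattern, triple, binding)) is not None
        ]
    return solutions


# Made-up sample data in the style of the deck's Kennedy examples.
triples = [
    (":person4", ":birth-place", ":place5"),
    (":person4", ":first-name", "John"),
    (":person22", ":birth-place", ":place13"),
    (":person22", ":first-name", "John"),
    (":person5", ":birth-place", ":place5"),
    (":person5", ":first-name", "Rose"),
]
patterns = [
    ("?person", ":birth-place", "?place"),
    ("?person", ":first-name", "John"),
]
solutions = evaluate(patterns, triples)
for solution in solutions:
    print(solution)
```

Reordering the patterns gives the same solutions but a different amount of intermediate work, which is the cost that join ordering tries to minimize.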
Slide 24
Executing SPARQL
sem:sparql("
  prefix : <http://example.org/kennedy/>
  select * {
    ?person :first-name ?first;
            :last-name ?last;
            :alma-mater [:ivy-league :true]
  }",
  map:entry("first", "John"),
  (),
  cts:collection-query("mycollection"))
Slide 25
Returning Binding Solutions
select * where {
  ?person :birth-place :place5
}
select * where {
  ?person :birth-place ?place;
          :first-name "John"
}
Slide 26
Solution Results
person place
:person22 :place13
:person4 :place5
map:map
Slide 27
SPARQL Query Results XML Format
sem:query-result-serialize(
  sem:sparql("select * { … }"),
  "xml")
Slide 28
Returning Triples
describe :person4
construct {
  ?bp :uses-name ?fn
} where {
  ?person :birth-place ?bp;
          :first-name ?fn
}
Slide 29
Triple Results
:place0 :uses-name "Ethel", "Jeffrey", "Kara" .
:place1 :uses-name "Edward", "James" .
:place10 :uses-name "Robert", "Sheila", "Stephen" .
[Callouts: sem:triple, sem:iri]
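A construct query turns each binding solution into new triples. The Python sketch below mimics the :uses-name query from the previous slide over a few hand-written triples. The person identifiers are hypothetical and the function name is invented, so this only shows how a result shaped like the :place1 line above could arise.

```python
from collections import defaultdict


def construct_uses_name(triples):
    """For every person, emit (birth place, :uses-name, first name)."""
    birth_places = defaultdict(set)  # person -> birth places
    first_names = defaultdict(set)   # person -> first names
    for s, p, o in triples:
        if p == ":birth-place":
            birth_places[s].add(o)
        elif p == ":first-name":
            first_names[s].add(o)
    return {
        (bp, ":uses-name", fn)
        for person, places in birth_places.items()
        for bp in places
        for fn in first_names.get(person, ())
    }


# Hypothetical input: two people born in :place1.
triples = [
    (":personA", ":birth-place", ":place1"),
    (":personA", ":first-name", "Edward"),
    (":personB", ":birth-place", ":place1"),
    (":personB", ":first-name", "James"),
]
constructed = construct_uses_name(triples)
for triple in sorted(constructed):
    print(triple)
```

Because the result is a set, two people sharing a birth place and a first name contribute a single triple.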
Slide 30
Querying Named Graphs
select *
from <http://my_graph>
where { ?s ?p ?o }
select * where {
  graph <http://my_graph> { ?s ?p ?o }
}
[Callout: collection]
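One way to picture a named graph is as a fourth column on each triple. The Python sketch below filters such quads by graph name, which is roughly what both queries above ask for. The second graph name, the sample data and the function name are invented for the example.

```python
def query_graph(quads, graph):
    """Return the (s, p, o) triples stored in the named graph."""
    return [(s, p, o) for (s, p, o, g) in quads if g == graph]


# Hypothetical quads: (subject, predicate, object, graph name).
quads = [
    (":person4", ":first-name", "John", "http://my_graph"),
    (":person4", ":birth-place", ":place5", "http://my_graph"),
    (":person5", ":first-name", "Rose", "http://other_graph"),
]
in_my_graph = query_graph(quads, "http://my_graph")
print(in_my_graph)
```

The "collection" callout on the slide suggests the graph name plays the role of a document collection, so the same filter can also be read as a collection restriction.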
Slide 31
Restricting The Datasets
let $options := "properties"
let $query := cts:and-query((
  cts:directory-query("/triples/"),
  cts:element-range-query(xs:QName("date"), ">", $date)
))
return sem:sparql("…", (), (), $options, $query)
Slide 32
Creating Triples
Returning sem:triple values:
• sem:triple()
• sem:rdf-parse()
• sem:rdf-get()
• sem:rdf-builder()
Inserting to a database:
• sem:rdf-load()
• sem:rdf-insert()
Slide 33
Graph Store API
declare function graph-insert(
$graphname as sem:iri,
$triples as sem:triple*,
[$permissions as element(sec:permission)*,
$collections as xs:string*,
$quality as xs:int?,
$forest-ids as xs:unsignedLong*]
) as xs:string*;
declare function graph-delete(
$graphname as sem:iri
) as empty-sequence();
Slide 34
Conclusion
• Semantics can enhance your data-oriented and search applications.
• XQuery and SPARQL work well together.
• A combined RDF and XML database makes it simpler to use the two technologies together.
• Try MarkLogic 7: http://www.marklogic.com/early-access/
Slide 35
Any Questions?