https://www.flickr.com/photos/jdhancock/5307754233; License: CC BY 2.0
Crossmedia Publishing with NoSQL Techniques: Options, Use Cases, Assessment
Christian Kohl, De Gruyter, 23 June 2015, Cross-Media-Forum, Munich
1) Brief introduction to NoSQL + XML
2) NoSQL + XML in crossmedia publishing at De Gruyter
Source: http://www.flickr.com/photos/ravescuritiba/773032554/
A very very short history of DB technology
1960s Hierarchical Era
Application- and hardware-specific data storage
e.g. IBM mainframes
1970s+ Relational Era
Granular access to highly structured data
Tables: columns/rows
IBM, MS, Oracle, …
+ SQL
2000s+ Any Structure Era
Schema agnostic, massive scale, query and search, heterogeneous data, unstructured, faster time-to-results
Amazon, Google, Facebook, LinkedIn, MarkLogic, …
+ XQuery, SPARQL, Gremlin, …
Image Source: https://www.flickr.com/photos/infocux/8450190120; License: CC BY-NC 2.0
Information Continuum
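[Figure: a continuum from structured to semi- or unstructured data]
Structured end: relational (RDBMS)
Semi- or unstructured end: free text (search engine)
Data types shown: hierarchical, semi-structured, emails, documents, time-varying, XML, metadata, content, geospatial, sparse, graph
Axis label: volume of information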
The data landscape today
Source: Frank Föge, MarkLogic Corporation, 2014.
[Figure labels]
Drivers: data volume; linking; semi-/unstructured data; distributed, horizontally scalable architectures
Axes: performance; data complexity / heterogeneity
Example applications: payroll; the majority of web applications; social network; semantic trading?
Relational DB
Curves: application requirements; RDBMS performance
Source: Sam Bisbee, http://www.ibmbigdatahub.com/blog/exploring-nosql-family-tree.
Image Source: http://h5inc.files.wordpress.com/2011/04/warning-brain-explosion-zone.png
An (overly) simple NoSQL taxonomy
• Key/Value: Riak, Dynamo, Voldemort, …
• Column Oriented: Cassandra, HBase, BigTable, …
• Document Store: MarkLogic, CouchDB, MongoDB, …
• Graph: Neo4j, InfiniteGraph, …
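To make the four families a little more concrete, here is a minimal sketch in plain Python of how one and the same book record might be laid out in each model. The record, the key formats and the field names are assumptions made up for this illustration; they are not the storage formats of any of the products listed.

```python
# Illustrative only: plain Python structures standing in for the four models.
# All keys, field names and values are invented for this sketch.

# Key/value: an opaque value behind a single key; the store does not look inside.
key_value = {"book:1": '{"title": "I Love Penguins", "author": "S. Lion"}'}

# Column oriented: row key -> column family -> column -> value.
column_oriented = {
    "book:1": {
        "info": {"title": "I Love Penguins", "author": "S. Lion"},
        "stats": {"chapters": 2},
    }
}

# Document store: one nested, self-describing document per entity.
document = {
    "title": "I Love Penguins",
    "author": "S. Lion",
    "sections": [{"chapters": [{"pages": 2}, {"pages": 1}]}],
}

# Graph: nodes plus labeled edges between them.
nodes = {
    "b1": {"type": "Book", "title": "I Love Penguins"},
    "a1": {"type": "Person", "name": "S. Lion"},
}
edges = [("a1", "WROTE", "b1")]


def authors_of(book_id):
    """Follow WROTE edges backwards to find the authors of a book."""
    return [nodes[src]["name"] for src, label, dst in edges
            if label == "WROTE" and dst == book_id]


print(authors_of("b1"))
```

The point of the sketch is only the shape of the data: the further down the list, the more structure the store itself can see and query.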
Image Source: https://steenschledermann.files.wordpress.com/2014/05/no-thanks-were-too-busy1.jpg?w=611
NoSQL enables …
• Faster app development
• Heterogeneous data types
• Rapid deployment
• Strong horizontal scalability with respect to
  • size
  • complexity
Source: http://media.gamemanx.com/flv/sf4-ehonda-sagat.jpg
© COPYRIGHT 2015 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED. SLIDE: 10
Developer Journey
Iterate:
• Load data sources "as-is" (XML, JSON, binary)
• Search, transform, combine data
• Define indexes for analytics
• Data access: web application, user interface
== Agile process
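The loop above can be sketched in a few lines of Python: documents are stored as-is, can be searched immediately, and an index is defined only once a query pattern has emerged. The `TinyStore` class and all of its methods are hypothetical stand-ins invented for this sketch; they are not the MarkLogic API.

```python
import json
import xml.etree.ElementTree as ET
from collections import defaultdict


class TinyStore:
    """Hypothetical in-memory document store, for illustration only."""

    def __init__(self):
        self.docs = {}      # uri -> list of (field name, text value)
        self.indexes = {}   # field name -> {value: set of uris}

    def load(self, uri, raw):
        """Load a document "as-is": XML or flat JSON, no schema declared."""
        raw = raw.strip()
        if raw.startswith("<"):
            fields = [(el.tag, (el.text or "").strip())
                      for el in ET.fromstring(raw).iter()]
        else:
            fields = [(key, str(value)) for key, value in json.loads(raw).items()]
        self.docs[uri] = fields
        # Keep indexes that were defined earlier up to date.
        for name, index in self.indexes.items():
            for field, value in fields:
                if field == name:
                    index[value].add(uri)

    def search(self, word):
        """Naive full-text search over every value of every document."""
        word = word.lower()
        return sorted(uri for uri, fields in self.docs.items()
                      if any(word in value.lower() for _, value in fields))

    def define_index(self, name):
        """Build an index on one field after the data is already loaded."""
        index = defaultdict(set)
        for uri, fields in self.docs.items():
            for field, value in fields:
                if field == name:
                    index[value].add(uri)
        self.indexes[name] = index

    def lookup(self, name, value):
        """Exact-match lookup through a previously defined index."""
        return sorted(self.indexes[name].get(value, ()))


store = TinyStore()
store.load("/a.xml", "<book><title>I Love Penguins</title><author>S. Lion</author></book>")
store.load("/b.json", '{"title": "Penguin Recipes", "author": "S. Lion"}')

hits = store.search("penguin")             # works before any index exists
store.define_index("author")               # added later, once the need is clear
by_author = store.lookup("author", "S. Lion")
print(hits, by_author)
```

No schema had to be agreed on before the first load, which is what makes the short iterations possible.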
Image Source: http://www.flickr.com/photos/rs-foto/1242024959/
DOXMLDBs
A book table looks like this…??
Book
  Info
    Title = "I Love Penguins"
    Author = "S. Lion"
  Section
    • Chapter
      Page
        Paragraph = "I love penguins because…"
      Page
        Paragraph = "On the subject of food…"
    • Chapter
      Page
        • Paragraph
        • Paragraph
  Section
    • Chapter
    • Chapter
    • Chapter

title              author     section    …
I Love Penguins    S. Lion

Issues with sections? How many columns?
Option: Modeling hierarchies with relations (foreign key) is not efficient.
DB Schema mapping
Shredding
Foreign Keys & Joins
Performance Overhead
Maintenance Overhead
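As a rough illustration of the overhead listed above, the following sketch shreds the penguin book into relational tables with Python's built-in `sqlite3` module and then reassembles its paragraphs through joins. The table and column names are assumptions made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Shredding: every level of the hierarchy becomes its own table,
# linked to its parent by a foreign key.
conn.executescript("""
    CREATE TABLE book      (id INTEGER PRIMARY KEY, title TEXT, author TEXT);
    CREATE TABLE section   (id INTEGER PRIMARY KEY, book_id INTEGER REFERENCES book(id));
    CREATE TABLE chapter   (id INTEGER PRIMARY KEY, section_id INTEGER REFERENCES section(id));
    CREATE TABLE page      (id INTEGER PRIMARY KEY, chapter_id INTEGER REFERENCES chapter(id));
    CREATE TABLE paragraph (id INTEGER PRIMARY KEY, page_id INTEGER REFERENCES page(id), text TEXT);

    INSERT INTO book      VALUES (1, 'I Love Penguins', 'S. Lion');
    INSERT INTO section   VALUES (1, 1);
    INSERT INTO chapter   VALUES (1, 1);
    INSERT INTO page      VALUES (1, 1), (2, 1);
    INSERT INTO paragraph VALUES (1, 1, 'I love penguins because…'),
                                 (2, 2, 'On the subject of food…');
""")

# Four joins are needed just to get from the book title to its paragraphs.
rows = conn.execute("""
    SELECT paragraph.text
    FROM book
    JOIN section   ON section.book_id    = book.id
    JOIN chapter   ON chapter.section_id = section.id
    JOIN page      ON page.chapter_id    = chapter.id
    JOIN paragraph ON paragraph.page_id  = page.id
    WHERE book.title = ?
    ORDER BY paragraph.id
""", ("I Love Penguins",)).fetchall()

paragraphs = [text for (text,) in rows]
print(paragraphs)
```

Each additional level in the hierarchy adds another table, another foreign key and another join, and any change to the book structure means a schema change.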
<meta>
  <URI>http://thewobbitaparody.blogspot.de</URI>
  <title>The Superfriends Of The Ring</title>
  <author>Paul Erickson</author>
</meta>
<body>
  (…)
  <section nr="11" title="Promo's Afterparty">
    <paragraph>Promo came in soon afterwards. He glanced about the condo and then quietly asked "Is Uncle Bulbo gone yet?" "Yes, at last," said Pantsoff. "I thought he'd never leave. Oh, he left something for you." He handed Promo the inter-office envelope. "Don't bother unwinding the string. Inside is his will, his trust documents, and his tax records. I think he left you his ring, too." "Oh, great," said Promo. "How long do I have to keep that stuff? Five years? Seven years? Forever? I hate filing." He stopped complaining for a moment. "You said his magic ring is in there too? Cool! I'll never have to pay a cover charge to enter a nightclub again!" "Promo, you've inherited Bulbo's fortune, so stop thinking small for a change. Actually, don't think about the ring at all. Just put it away. Keep it secret, and keep it safe!"</paragraph>
  (…)
</body>
The document as an information container
<SAR>
  <title>Suspicious vehicle near airport</title>
  <date>2012-11-12Z</date>
  <type>observation/surveillance</type>
  <threat>
    <type>suspicious activity</type>
    <category>suspicious vehicle</category>
  </threat>
  <location>
    <lat>37.497075</lat>
    <long>-122.363319</long>
  </location>
  <triple>
    <subject>IRIID</subject>
    <predicate>isa</predicate>
    <object>license-plate</object>
  </triple>
  <triple>
    <subject>IRIID</subject>
    <predicate>value</predicate>
    <object>ABC 123</object>
  </triple>
  <description>A blue van with license plate ABC 123 was observed parked behind the airport sign…</description>
</SAR>
Metadata, data, relationships and content
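A minimal sketch of how one such document can answer several kinds of questions at once, using only Python's standard `xml.etree.ElementTree`. The XML below is a cleaned-up reading of the slide's example, and the variable names are made up for illustration; a document database would typically answer such queries from indexes rather than by parsing each document.

```python
import xml.etree.ElementTree as ET

# Assumed, cleaned-up version of the <SAR> example from the slide.
SAR = """
<SAR>
  <title>Suspicious vehicle near airport</title>
  <date>2012-11-12Z</date>
  <type>observation/surveillance</type>
  <threat>
    <type>suspicious activity</type>
    <category>suspicious vehicle</category>
  </threat>
  <location><lat>37.497075</lat><long>-122.363319</long></location>
  <triple><subject>IRIID</subject><predicate>isa</predicate><object>license-plate</object></triple>
  <triple><subject>IRIID</subject><predicate>value</predicate><object>ABC 123</object></triple>
  <description>A blue van with license plate ABC 123 was observed parked behind the airport sign…</description>
</SAR>
"""

doc = ET.fromstring(SAR.strip())

# Metadata: plain values addressed by path.
title = doc.findtext("title")
category = doc.findtext("threat/category")

# Data: a geospatial point.
position = (float(doc.findtext("location/lat")),
            float(doc.findtext("location/long")))

# Relationships: subject/predicate/object triples.
triples = [(t.findtext("subject"), t.findtext("predicate"), t.findtext("object"))
           for t in doc.findall("triple")]

# Content: unstructured full text.
mentions_plate = "abc 123" in doc.findtext("description").lower()

print(title, category, position, triples, mentions_plate)
```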
The document as an information container
[Figure: the <SAR> document (title, date, type, threat, location, description, triples), annotated by the kinds of information it holds]
• Values
• Geospatial
• Unstructured full-text
• Semantic (RDF) triples
XML is for publishers
Source: http://www.flickr.com/photos/scotthudson/3448785931/
NoSQL at DG
• De Gruyter Online
• De Gruyter CMS
• Maybe asset management?
• Maybe data warehouse?
De Gruyter Online
Documents
Metadata
Assets
Entitlements
Strong growth
Widely varying data
De Gruyter CMS
Documents
Metadata
Triples
Assets
Frequent rearrangement of the data: changes to structure and linking
Semantics