Databases 1 Daniel POP

Design Patterns: Case Study · • Not based on relational model, not using SQL • No fixed schema, new attributes can be added to data items at any time (schema-less) ... • Amazon


Databases 1

Daniel POP


Week 12


Agenda

1. Introduction to NoSQL databases

2. Data representation in NoSQL

– XML

– JSON

3. Key-value database

4. (Wide) Column database

5. Document database

6. Graph database


Introduction to NoSQL databases


Relational databases advantages

1. Reliability

2. Safety

3. ACID: Atomicity, Consistency, Isolation, Durability

4. Easy, standardized query language (SQL)


Relational databases limitations

1. Scalability

• usually costly scale-up approach

• problems with distributed databases (difficult to join distributed tables)

2. Impedance mismatch

• different representations in memory and database

3. Complexity

• data as tables, schema-bound

4. Query language (SQL) only for structured data

5. Dealing with semi-structured and unstructured data


Unstructured data explosion


Semi-structured data examples


Un-structured data examples


What is Big Data?

From a technology perspective, Big Data is defined as those data sets whose size, type, and speed-of-creation make them impractical to process and analyse with traditional database technologies and related tools in a cost- or time-effective way.

Source: wikibon.org


4 Vs of Big Data

• Volume:

– 12 TB of tweets per day => product sentiment analysis

• Velocity:

– fraud detection

– predict customer churn faster (analyse 500 million daily calls in real time)

• Variety:

– exploit the 80% growth of unstructured and semi-structured data for customer satisfaction

• Variability / Veracity:

– variance in meaning (1 in 3 business leaders don't trust the information they use to make decisions)

Source: Brian Hopkins (Forrester)


Distributed systems

A distributed system is a “collection of independent computers that appear to the users of the system as a single computer” (Tanenbaum)


Requirements of distributed applications

1. Consistency: all nodes ‘see’ the same data at the same time

2. Availability: guarantee that every request receives a response about whether it succeeded or failed

3. Partition tolerance: the system continues to operate despite arbitrary message loss or failure of part of the system


Brewer’s CAP Theorem

Brewer’s (CAP) Theorem: it is impossible for a distributed computer system to simultaneously guarantee all three of the requirements above (consistency, availability, partition tolerance). In practice, when a network partition occurs, the system has to give up either consistency or availability.
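
The trade-off can be sketched with a toy two-replica store. This is only an illustration under stated assumptions, not how any real database is implemented: the classes `Replica` and `TinyCluster`, the `mode` values "CP" and "AP", and the `partitioned` flag are all hypothetical names invented for this example.

```python
class Replica:
    """One node holding its own copy of the data (hypothetical toy model)."""

    def __init__(self, name):
        self.name = name
        self.data = {}


class TinyCluster:
    """Two replicas; `mode` decides what is sacrificed during a partition."""

    def __init__(self, mode):
        assert mode in ("CP", "AP")
        self.mode = mode
        self.a, self.b = Replica("a"), Replica("b")
        self.partitioned = False  # True = replicas cannot exchange messages

    def write(self, replica, key, value):
        other = self.b if replica is self.a else self.a
        if self.partitioned:
            if self.mode == "CP":
                # Stay consistent: refuse the write rather than let copies diverge.
                return False
            # Stay available: accept the write locally, copies now disagree.
            replica.data[key] = value
            return True
        # No partition: replicate synchronously to both nodes.
        replica.data[key] = value
        other.data[key] = value
        return True

    def read(self, replica, key):
        return replica.data.get(key)


cp = TinyCluster("CP")
cp.write(cp.a, "x", 1)
cp.partitioned = True
cp_accepted = cp.write(cp.a, "x", 2)  # refused, both replicas still hold 1

ap = TinyCluster("AP")
ap.write(ap.a, "x", 1)
ap.partitioned = True
ap_accepted = ap.write(ap.a, "x", 2)  # accepted, replica b still holds 1
```

In the "CP" cluster both replicas keep returning the same value but the write cannot be served; in the "AP" cluster the write is served but the two replicas return different values until the partition heals.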


Consistency vs. Availability

More at https://www.youtube.com/watch?v=ASiU89Gl0F0


BASE instead of ACID

1. Basically Available

2. Soft state

3. Eventual consistency: informally guarantees that, if no new updates are made to a given data item, eventually all accesses to that item will return the last updated value
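
Eventual consistency can be sketched as replicas that accept updates independently and later exchange state. The sketch below assumes a last-write-wins rule based on a logical timestamp and a full "anti-entropy" exchange between all replicas; both are simplifying assumptions chosen for illustration, and the names `Replica` and `anti_entropy` are hypothetical.

```python
class Replica:
    """Toy replica storing key -> (logical_timestamp, value)."""

    def __init__(self):
        self.store = {}

    def put(self, key, value, timestamp):
        current = self.store.get(key)
        # Last-write-wins: keep the entry with the highest timestamp.
        if current is None or timestamp > current[0]:
            self.store[key] = (timestamp, value)

    def get(self, key):
        entry = self.store.get(key)
        return None if entry is None else entry[1]


def anti_entropy(replicas):
    """One exchange round: every replica merges every other replica's entries."""
    for source in replicas:
        for target in replicas:
            if source is not target:
                for key, (timestamp, value) in list(source.store.items()):
                    target.put(key, value, timestamp)


r1, r2, r3 = Replica(), Replica(), Replica()
r1.put("cart", ["book"], timestamp=1)         # this update reaches r1 only
r2.put("cart", ["book", "pen"], timestamp=2)  # a later update reaches r2 only

stale_read = r3.get("cart")  # r3 has not seen any update yet (soft state)
anti_entropy([r1, r2, r3])   # no new updates arrive; replicas exchange state
converged = [r.get("cart") for r in (r1, r2, r3)]
```

Before the exchange, different replicas answer the same read differently; once updates stop and state has propagated, every replica returns the last updated value.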


NoSQL = Not Only SQL

• Not based on relational model, not using SQL

• No fixed schema, new attributes can be added to data items at any time (schema-less)

• Designed for distributed, large clusters (scale-out), including support for distributed processing (MapReduce)

• Deliver eventual consistency

• Data-model specific query languages
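
Two of the points above, schema-less items and MapReduce-style processing, can be sketched in a few lines. This is a single-process illustration with made-up sample data; a real NoSQL system would distribute the map and reduce phases across the cluster, and the function names `map_phase`, `reduce_phase` and `map_reduce` are hypothetical.

```python
from collections import defaultdict

# Items of one collection: free-form dicts that share no fixed schema.
items = [
    {"id": 1, "name": "Ana", "city": "Timisoara"},
    {"id": 2, "name": "Dan"},                                   # no city yet
    {"id": 3, "name": "Eva", "city": "Cluj", "tags": ["vip"]},  # extra attribute
]

# A new attribute can be added to a single item at any time.
items[1]["city"] = "Timisoara"


def map_phase(item):
    """Emit (key, 1) pairs; items without the attribute emit nothing."""
    if "city" in item:
        yield item["city"], 1


def reduce_phase(key, values):
    """Combine all values emitted for one key."""
    return key, sum(values)


def map_reduce(data):
    groups = defaultdict(list)  # shuffle step: group emitted values by key
    for item in data:
        for key, value in map_phase(item):
            groups[key].append(value)
    return dict(reduce_phase(key, values) for key, values in groups.items())


counts = map_reduce(items)  # number of items per city
```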


NoSQL: classification and usage pattern

[Figure: NoSQL data models positioned by data size versus data complexity: Key-value DB, (Wide) Column DB, Document DB, Graph DB]


Visual guide to NoSQL systems


Trends

Source: http://db-engines.com/en/ranking_trend


NoSQL brief history

• Carlo Strozzi (1998) – used the term “NoSQL” to name his lightweight, open-source relational database that had no SQL interface

• Johan Oskarsson and Eric Evans (2009) – meetup on distributed structured data storage; they used #nosql as the Twitter hashtag for it

• Google Bigtable (2006)

• Amazon Dynamo (2007)

• ....

• Used on large scale at Amazon, Facebook, Google, Twitter etc.


Bibliography

Pramod J. Sadalage, Martin Fowler. NoSQL Distilled. Addison-Wesley, 2012

Watch M. Fowler @ NoSQL matters Conference in Cologne, Germany 2013

http://martinfowler.com/nosql.html

1. Lorenzo Alberton. NoSQL Databases: why, what and when

2. Yousof Alsatom. Introduction to NoSQL

3. Wikipedia

4. Introduction to NoSQL on w3resource.com

5. http://www.nosql-database.org/ - list, by data model, of NoSQL databases


XML Data


Extensible Markup Language

• Standard for data representation and exchange

• Document format similar to HTML

– Tags describe content instead of formatting

• Also streaming format

Nested elements

• described by tag

• may have attributes

• text

Rules of well-formed XML:

• Single root element

• Matched tags and nesting

• Unique attributes within elements
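
The three well-formedness rules can be checked with any XML parser. The sketch below uses Python's standard `xml.etree.ElementTree` module, which raises `ParseError` when a document is not well formed; the helper `is_well_formed` and the sample documents are made up for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


samples = {
    # Single root, matched tags, unique attributes.
    "ok": "<Bookstore><Book ISBN='1'><Title>XML</Title></Book></Bookstore>",
    # Violates "single root element".
    "two_roots": "<Book/><Book/>",
    # Violates "matched tags and nesting".
    "bad_nesting": "<Book><Title>XML</Book></Title>",
    # Violates "unique attributes within elements".
    "duplicate_attribute": "<Book ISBN='1' ISBN='2'/>",
}

results = {name: is_well_formed(text) for name, text in samples.items()}
```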


Working with XML documents

• XML Parser

– Returns an error message (if the XML document is not well formed) or a ‘hook’ to manipulate the document

– 3 types of parsers

  • Document Object Model (DOM) based (e.g. Xerces, libxml2)

  • Event-based, such as Simple API for XML (SAX) (e.g. Expat, JAXP API)

  • In the middle, Streaming API for XML (StAX) (e.g. Sun Java StAX XML Processor, Woodstox)

• Displaying XML

– Convert the XML document to HTML using CSS (Cascading Style Sheets) or XSLT (Extensible Stylesheet Language Transformations)
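
The difference between the parser styles can be sketched with Python's standard library, which is used here only for illustration (the slides name Java and C parsers). `xml.dom.minidom` builds the whole tree in memory, `xml.sax` pushes events to a handler, and `ElementTree.iterparse` lets the application consume elements incrementally; treating `iterparse` as the closest analogue of StAX is an assumption of this sketch. The sample document and function names are hypothetical.

```python
import io
import xml.dom.minidom
import xml.sax
import xml.etree.ElementTree as ET

DOC = (
    "<Bookstore>"
    "<Book ISBN='1'><Title>A</Title></Book>"
    "<Book ISBN='2'><Title>B</Title></Book>"
    "</Bookstore>"
)


def titles_dom(text):
    """DOM style: load the full tree, then navigate it."""
    dom = xml.dom.minidom.parseString(text)
    return [node.firstChild.data for node in dom.getElementsByTagName("Title")]


class TitleHandler(xml.sax.ContentHandler):
    """SAX style: the parser calls back on every start, text and end event."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False
        self._buffer = []

    def startElement(self, name, attrs):
        if name == "Title":
            self._in_title = True
            self._buffer = []

    def characters(self, content):
        if self._in_title:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "Title":
            self.titles.append("".join(self._buffer))
            self._in_title = False


def titles_sax(text):
    handler = TitleHandler()
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.titles


def titles_incremental(text):
    """Incremental style: the application iterates over completed elements."""
    titles = []
    source = io.BytesIO(text.encode("utf-8"))
    for _event, element in ET.iterparse(source, events=("end",)):
        if element.tag == "Title":
            titles.append(element.text)
            element.clear()  # free the content once it has been used
    return titles
```

All three functions return the same list of titles; they differ in how much of the document must be held in memory and in who drives the iteration.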


Validating XML documents

Content-specific specifications

- Document Type Definition (DTD)

- XML Schema Definition (XSD)

Tools: libxml2 (xmllint)


Document Type Definition (DTD)

Can specify

• Elements,

• Attributes,

• Nesting,

• Ordering,

• Number of occurrences,

• ID and IDREF


Document Type Definition (DTD)

* = 0 or more

| = or

? = optional

+ = 1 or more

CDATA = string

#PCDATA = text

ID = identifier

IDREF = refers to one identifier

IDREFS = refers to one or more identifiers

Source: Jennifer Widom – Database Course, Stanford University
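
The occurrence indicators in the legend behave like regular-expression operators over the sequence of child elements. The sketch below makes that analogy concrete by translating a simplified content model into a Python regular expression. It is not a DTD validator: it assumes element-only content models (no #PCDATA, no attributes), and the models `BOOK_MODEL` and `STORE_MODEL` and the element names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET


def content_model_to_regex(model):
    """Translate e.g. "(Title, Author+, Remark?)" into a regex over child tags."""
    compact = re.sub(r"\s+", "", model)
    # Each element name becomes a group matching the name plus a separator space.
    with_names = re.sub(
        r"[A-Za-z_][\w.-]*",
        lambda match: "(?:" + re.escape(match.group(0)) + " )",
        compact,
    )
    # ',' means "followed by", which is plain concatenation in a regex;
    # '*', '+', '?', '|' and parentheses already mean the same thing.
    return re.compile(with_names.replace(",", ""))


def children_match(element, model):
    """Check the direct children of `element` against the content model."""
    sequence = "".join(child.tag + " " for child in element)
    return content_model_to_regex(model).fullmatch(sequence) is not None


BOOK_MODEL = "(Title, Author+, Remark?)"
STORE_MODEL = "(Book | Magazine)*"

good_book = ET.fromstring("<Book><Title/><Author/><Author/></Book>")
no_author = ET.fromstring("<Book><Title/></Book>")
store = ET.fromstring("<Bookstore><Book/><Magazine/><Book/></Bookstore>")
empty_store = ET.fromstring("<Bookstore/>")
```

Here `Author+` requires at least one author, `Remark?` allows the remark to be absent, and `(Book | Magazine)*` accepts any mix of the two element types, including none.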


XML Schema Definition (XSD)

Can specify

• Elements,

• Attributes,

• Nesting,

• Ordering,

• Number of occurrences,

• Data types,

• Keys,

• References (typed pointers),

• … and many more



Typed values, occurrence constraints

• Typed values

<xsd:attribute name="Price" type="xsd:integer" use="required" />

• Occurrence constraints

<xsd:element name="Book" type="BookType" minOccurs="0" maxOccurs="unbounded" />

Remark: If minOccurs or maxOccurs is not specified, the default value is 1.
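
The default of 1 can be illustrated by reading the occurrence bounds out of a schema fragment. The sketch below only inspects the schema text with Python's standard `xml.etree.ElementTree`; it does not validate documents against the schema. The surrounding `Bookstore` declaration and the `Owner` element are hypothetical additions made so that the fragment is a complete document.

```python
import xml.etree.ElementTree as ET

XSD = "{http://www.w3.org/2001/XMLSchema}"

SCHEMA = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="Bookstore">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="Book" type="BookType"
                     minOccurs="0" maxOccurs="unbounded" />
        <xsd:element name="Owner" type="xsd:string" />
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>"""


def occurrence_bounds(schema_text):
    """Map each element declared inside a sequence to (min, max) occurrences.

    A max of None stands for "unbounded".
    """
    root = ET.fromstring(schema_text)
    bounds = {}
    for sequence in root.iter(XSD + "sequence"):
        for element in sequence.findall(XSD + "element"):
            low = int(element.get("minOccurs", "1"))  # default is 1
            high = element.get("maxOccurs", "1")      # default is 1
            bounds[element.get("name")] = (
                low,
                None if high == "unbounded" else int(high),
            )
    return bounds


bounds = occurrence_bounds(SCHEMA)
```

`Book` may appear any number of times, including zero, while `Owner`, which specifies neither attribute, must appear exactly once.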


Keys and references

• Key declarations

<xsd:key name="BookKey">

<xsd:selector xpath="Book" />

<xsd:field xpath="@ISBN" />

</xsd:key>

• References (typed pointers)

<xsd:keyref name="BookKeyRef" refer="BookKey">

<xsd:selector xpath="Book/Remark/BookRef" />

<xsd:field xpath="@book" />

</xsd:keyref>
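
What these two declarations enforce can be sketched by hand. The code below is not an XSD validator; it simply applies the same selector and field paths (`Book` with `@ISBN`, `Book/Remark/BookRef` with `@book`) to a made-up sample document using Python's standard `xml.etree.ElementTree`, and reports duplicate keys and references that point to no existing key. The function name and sample data are hypothetical.

```python
import xml.etree.ElementTree as ET

DOC = """<Bookstore>
  <Book ISBN="111"><Title>First</Title></Book>
  <Book ISBN="222"><Title>Second</Title>
    <Remark>See also <BookRef book="111" /></Remark>
  </Book>
  <Book ISBN="333"><Title>Third</Title>
    <Remark>See also <BookRef book="999" /></Remark>
  </Book>
</Bookstore>"""


def check_key_and_keyref(root):
    """Return (duplicate key values, reference values with no matching key)."""
    # Key: selector "Book", field "@ISBN" -> values must be unique.
    keys = [book.get("ISBN") for book in root.findall("Book")]
    duplicates = sorted({key for key in keys if keys.count(key) > 1})
    # Keyref: selector "Book/Remark/BookRef", field "@book" -> must match a key.
    known = set(keys)
    refs = [ref.get("book") for ref in root.findall("Book/Remark/BookRef")]
    dangling = sorted(ref for ref in refs if ref not in known)
    return duplicates, dangling


duplicates, dangling = check_key_and_keyref(ET.fromstring(DOC))
```

In the sample, all ISBN values are unique, but the reference to "999" does not match any book, which is the kind of error a keyref constraint is meant to catch.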


Relational vs. XML

Source: Jennifer Widom – Database Course, Stanford University

                Relational      XML
Structure       Tables          Hierarchical (tree, graph)
Schema          Fixed           Flexible
Queries         Simple (SQL)    More cumbersome (XPath / XQuery)
Ordering        None            Implied
Implementation  Native          1. XML-enabled (extensions to RDBMS)
                                2. Native XML
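
The "Queries" and "Ordering" rows can be illustrated by running the same question against both representations. The sketch uses Python's standard `sqlite3` and `xml.etree.ElementTree` modules with made-up sample data. ElementTree supports only a limited subset of XPath, so the price filter is written in Python here; a full XPath or XQuery engine would express it inside the query.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Relational: rows in a table, no inherent order, declarative SQL query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Book (ISBN TEXT PRIMARY KEY, Title TEXT, Price INTEGER)")
conn.executemany(
    "INSERT INTO Book VALUES (?, ?, ?)",
    [("111", "First", 80), ("222", "Second", 120), ("333", "Third", 150)],
)
# ORDER BY is needed because a table has no implied order.
sql_titles = [
    row[0]
    for row in conn.execute(
        "SELECT Title FROM Book WHERE Price > 100 ORDER BY ISBN"
    )
]

# XML: a tree whose children keep their document order.
DOC = """<Bookstore>
  <Book ISBN="111" Price="80"><Title>First</Title></Book>
  <Book ISBN="222" Price="120"><Title>Second</Title></Book>
  <Book ISBN="333" Price="150"><Title>Third</Title></Book>
</Bookstore>"""
root = ET.fromstring(DOC)
xml_titles = [
    book.findtext("Title")
    for book in root.findall("Book")  # path expression, document order
    if int(book.get("Price")) > 100
]
```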


Bibliography

NoSQL Distilled by Pramod J. Sadalage and Martin Fowler, Addison-Wesley, 2012, Chapter 1

A First Course in Database Systems (3rd edition) by Jeffrey Ullman and Jennifer Widom, Prentice Hall, 2007, Chapter 11