
NoSQL Data Models - LMU


DATABASE SYSTEMS GROUP

NoSQL Data Models

The 4 Main NoSQL Data Models:

• Key/Value Stores

• Document Stores

• Wide Column Stores

• Graph Databases

Big Data Management and Analytics 71


Key/Value Stores:

• Simplest form of database system

• Store key/value pairs and retrieve values by keys

• Values can be of arbitrary format

[Figure: example key/value pairs; the extracted entries are 10213, 10334, 10023 and 51]


Key/Value Stores:

• Consistency models range from eventual consistency to serializability

• Some systems support ordering of keys, which enables efficient querying, such as range queries

• Some systems keep data in memory, some use disks

→ The systems are very heterogeneous
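The effect of key ordering on range queries can be illustrated with a minimal sketch. It is a toy in-memory store, not the design of any particular product; the class name `OrderedKVStore` and the stored values are assumptions made for illustration.

```python
import bisect


class OrderedKVStore:
    """Toy key/value store that keeps its keys sorted (illustrative only)."""

    def __init__(self):
        self._keys = []    # sorted list of keys
        self._values = {}  # key -> value; values may be of arbitrary format

    def put(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)  # keep the key list ordered
        self._values[key] = value

    def get(self, key, default=None):
        return self._values.get(key, default)

    def range(self, low, high):
        """Return (key, value) pairs with low <= key <= high, in key order."""
        start = bisect.bisect_left(self._keys, low)
        end = bisect.bisect_right(self._keys, high)
        return [(k, self._values[k]) for k in self._keys[start:end]]


store = OrderedKVStore()
store.put(10213, {"name": "Alice"})  # hypothetical values of different types
store.put(10334, "plain string")
store.put(10023, [1, 2, 3])
store.put(51, b"\x00\x01")

print(store.get(10334))
print(store.range(10000, 10300))
```

Because the keys are sorted, a range query only needs two binary searches and one contiguous slice; a store without key ordering would have to scan every key.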


Key/Value Stores - Redis:

• In-memory data structure store with built-in replication, transactions and different levels of on-disk persistence

• Support of complex types like lists, sets, hashes, …

• Support of many atomic operations

>> SET val 1
>> GET val => 1
>> INCR val => 2
>> LPUSH my_list a (=> 'a')
>> LPUSH my_list b (=> 'b','a')
>> RPUSH my_list c (=> 'b','a','c')
>> LRANGE my_list 0 1 => b,a
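The semantics of the commands above can be mimicked with a small in-memory model. This is only a sketch of the behaviour, not a Redis client: the class `ToyRedis` and its methods are hypothetical, values are kept as strings, and `lrange` handles non-negative indexes only.

```python
from collections import deque


class ToyRedis:
    """In-memory imitation of a few Redis commands (illustrative only)."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = str(value)

    def get(self, key):
        return self._data.get(key)

    def incr(self, key):
        # Read-modify-write in one step; a missing key counts as 0.
        value = int(self._data.get(key, "0")) + 1
        self._data[key] = str(value)
        return value

    def lpush(self, key, item):
        self._data.setdefault(key, deque()).appendleft(item)
        return len(self._data[key])

    def rpush(self, key, item):
        self._data.setdefault(key, deque()).append(item)
        return len(self._data[key])

    def lrange(self, key, start, stop):
        # The stop index is inclusive, as in the LRANGE example above.
        return list(self._data.get(key, ()))[start:stop + 1]


r = ToyRedis()
r.set("val", 1)
print(r.get("val"))
print(r.incr("val"))
r.lpush("my_list", "a")
r.lpush("my_list", "b")
r.rpush("my_list", "c")
print(r.lrange("my_list", 0, 1))
```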


Key/Value Stores – The Redis cluster model:

• Data is automatically sharded across nodes

• Some degree of availability, achieved by a master-slave architecture (but the cluster stops operating in the event of larger failures)

• Easily extendable


Key/Value Stores – The Redis cluster model:

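[Figure: assignment of hash slots to nodes when a node is added or removed]

• Initial cluster: node A holds hash slots 0 – 5000, node B holds 5001 – 10000, node C holds 10001 – 14522

• After adding node D: A holds 0 – 4000, B holds 4001 – 8000, C holds 8001 – 12000, D holds 12001 – 14522

• After removing node C: A holds 0 – 7500, B holds 7501 – 14522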
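The slot assignments in the figure can be sketched as a lookup table from slot ranges to nodes. The helper names `key_slot` and `SlotMap` are assumptions for illustration. The slot count is taken from the figure; Redis Cluster itself maps a key with CRC16 modulo 16384 hash slots, so the number below is illustrative rather than what a real cluster uses.

```python
import binascii
import bisect

NUM_SLOTS = 14523  # slots 0..14522 as drawn in the figure (real Redis Cluster: 16384)


def key_slot(key, num_slots=NUM_SLOTS):
    """Map a key to a hash slot via a CRC16 checksum (sketch)."""
    return binascii.crc_hqx(key.encode("utf-8"), 0) % num_slots


class SlotMap:
    """Maps contiguous slot ranges to nodes."""

    def __init__(self, assignment):
        # assignment: list of (last_slot_of_range, node), sorted by last slot
        self._upper = [last for last, _ in assignment]
        self._nodes = [node for _, node in assignment]

    def node_for_slot(self, slot):
        # The first range whose upper bound is >= slot owns the slot.
        return self._nodes[bisect.bisect_left(self._upper, slot)]

    def node_for_key(self, key):
        return self.node_for_slot(key_slot(key))


initial = SlotMap([(5000, "A"), (10000, "B"), (14522, "C")])
after_add = SlotMap([(4000, "A"), (8000, "B"), (12000, "C"), (14522, "D")])
after_remove = SlotMap([(7500, "A"), (14522, "B")])

print(initial.node_for_slot(5001))
print(after_add.node_for_slot(12001))
print(after_remove.node_for_slot(7501))
```

Adding or removing a node only changes the range table; the key-to-slot function stays the same, so only the keys in the slots that change owner have to move.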


Key/Value Stores – The Redis cluster model:

[Figure: failure of a master node without and with slave nodes]

• Without slave nodes: master nodes A (hash slots 0 – 5000), B (5001 – 10000) and C (10001 – 14522). If B fails, hash slots 5001 – 10000 cannot be used anymore

• With slave nodes: each master node A, B, C has a slave node A', B', C' that holds the replicated hash slots. If B fails, slave node B' is promoted to be the new master and hash slots 5001 – 10000 are still available
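The two failure scenarios in the figure can be played through with a small model. This is a sketch of the idea only; the class `Cluster` and its methods are hypothetical and leave out failure detection, voting and resynchronisation.

```python
class Cluster:
    """Toy model of masters that own slot ranges, with optional replicas."""

    def __init__(self, masters, replicas=None):
        self.masters = dict(masters)         # node -> (first_slot, last_slot)
        self.replicas = dict(replicas or {})  # master -> name of its replica

    def fail(self, node):
        """Remove a failed master; promote its replica if one exists."""
        slots = self.masters.pop(node)
        replica = self.replicas.pop(node, None)
        if replica is not None:
            self.masters[replica] = slots  # the replica takes over the slots
        return replica

    def is_available(self, slot):
        return any(lo <= slot <= hi for lo, hi in self.masters.values())


SLOTS = {"A": (0, 5000), "B": (5001, 10000), "C": (10001, 14522)}

plain = Cluster(SLOTS)
plain.fail("B")
print(plain.is_available(7000))       # slots of B are lost

replicated = Cluster(SLOTS, {"A": "A'", "B": "B'", "C": "C'"})
print(replicated.fail("B"))           # name of the promoted replica
print(replicated.is_available(7000))  # slots of B are served by B'
```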


Big Picture

CAP Theorem:

• C (Consistency): All clients always have the same view of the data

• A (Availability): Each client can always read and write

• P (Partition tolerance): The system works well despite physical network partitions

[Figure: CAP triangle with the regions AC-Systems, AP-Systems and CP-Systems and the labels ACID and BASE. Systems shown: Key/Value Stores (Redis, Dynamo) and RDBMSs (MySQL, Postgres, …)]


Document Stores:

• Store documents in the form of XML or JSON

• Semi-structured data records that do not have a homogeneous structure

• Fields (columns) can have more than one value (arrays)

• Documents include internal structure, or metadata

• Data structure enables efficient use of indexes


Document Stores:

Given the following text:

Max Mustermann
Musterstraße 12
D-12345 Musterstadt

<contact>
  <first_name>Max</first_name>
  <last_name>Mustermann</last_name>
  <street>Musterstraße 12</street>
  <city>Musterstadt</city>
  <zip>12345</zip>
  <country>D</country>
</contact>

→ Find all <contact>s where <zip> is "12345"
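The query "find all contacts where zip is 12345" becomes straightforward once the address is stored as a structured document. The following sketch uses Python's standard `xml.etree.ElementTree`; the wrapping `<contacts>` element and the second contact are assumptions added for illustration.

```python
import xml.etree.ElementTree as ET

# The first contact is the one from the slide; the second is hypothetical.
DOCUMENTS = """
<contacts>
  <contact>
    <first_name>Max</first_name>
    <last_name>Mustermann</last_name>
    <street>Musterstraße 12</street>
    <city>Musterstadt</city>
    <zip>12345</zip>
    <country>D</country>
  </contact>
  <contact>
    <first_name>Erika</first_name>
    <last_name>Musterfrau</last_name>
    <city>Beispielstadt</city>
    <zip>54321</zip>
  </contact>
</contacts>
"""

root = ET.fromstring(DOCUMENTS)

# Select every <contact> that has a <zip> child with the text "12345".
matches = root.findall("./contact[zip='12345']")
names = [f"{c.findtext('first_name')} {c.findtext('last_name')}" for c in matches]
print(names)
```

With the plain-text address, the same question would require parsing "D-12345 Musterstadt" by convention; the document's internal structure makes the field directly addressable. Note that the second contact has no `<street>` or `<country>`, which a document store permits.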


Document Stores - MongoDB:

• Data stored as documents in binary representation (BSON)

• Similarly structured documents are bundled in collections

• Provides own ad-hoc query language

• Supports ACID transactions on document level


Document Stores:

MongoDB Data Management:

– Automatic data sharding

– Automatic re-balancing

• Multiple sharding policies:

– Hash-based sharding: Documents are distributed according to an MD5 hash → uniform distribution

– Range-based sharding: Documents with shard key values close to one another are likely to be co-located on the same shard → works well for range queries

– Location-based sharding: Documents are partitioned with respect to a user-specified configuration that associates shard key ranges with specific shards and hardware
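The difference between hash-based and range-based sharding can be sketched in a few lines. The functions `hash_shard` and `range_shard` and the boundary values are assumptions for illustration and do not reproduce MongoDB's internal chunk management.

```python
import bisect
import hashlib


def hash_shard(shard_key, num_shards):
    """Hash-based: spread keys by an MD5 digest, ignoring key order."""
    digest = hashlib.md5(str(shard_key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def range_shard(shard_key, boundaries):
    """Range-based: keys below boundaries[0] go to shard 0, and so on."""
    return bisect.bisect_right(boundaries, shard_key)


BOUNDARIES = [100, 200]  # hypothetical split points -> 3 shards

# Neighbouring keys stay together under range-based sharding ...
print([range_shard(k, BOUNDARIES) for k in (150, 151, 152)])

# ... whereas hash-based sharding assigns them independently of their order.
print([hash_shard(k, 3) for k in (150, 151, 152)])
```

A range query over keys 150 to 152 touches a single shard in the range-based layout; under hash-based sharding the same query may have to contact several shards, in exchange for a more uniform load.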


Document Stores:

MongoDB Consistency & Availability:

• Default: Strong consistency (but configurable)

• Increased availability through replication

– Replica sets consist of one primary and multiple secondary members

– MongoDB applies writes on the primary and then records the operations in the primary's oplog
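The role of the oplog can be illustrated with a minimal model in which secondaries catch up by replaying the primary's recorded operations. The classes `Primary` and `Secondary` are hypothetical; elections, acknowledgements and failure handling are omitted.

```python
class Primary:
    """Applies each write locally, then records it in its operation log."""

    def __init__(self):
        self.data = {}
        self.oplog = []

    def write(self, key, value):
        self.data[key] = value
        self.oplog.append(("set", key, value))


class Secondary:
    """Catches up by replaying oplog entries it has not applied yet."""

    def __init__(self):
        self.data = {}
        self.applied = 0  # number of oplog entries already replayed

    def sync(self, primary):
        for op, key, value in primary.oplog[self.applied:]:
            if op == "set":
                self.data[key] = value
        self.applied = len(primary.oplog)


primary = Primary()
secondary = Secondary()

primary.write("x", 1)
primary.write("x", 2)
print(secondary.data)  # still empty: the secondary has not synced yet
secondary.sync(primary)
print(secondary.data)  # now identical to the primary's data
```

Reading from the primary always returns the latest write, which corresponds to the strong-consistency default; reading from a secondary that has not yet synced may return older data.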


Big Picture

CAP Theorem:

[Figure: CAP triangle (C, A, P) with the regions AC-Systems, AP-Systems and CP-Systems and the labels ACID and BASE. Systems shown: Key/Value Stores (Redis, Dynamo), Document Stores (MongoDB, CouchDB) and RDBMSs (MySQL, Postgres, …)]


Wide Column Stores:

• Rows are identified by keys

• Rows can have different numbers of columns (up to millions)

• The order of rows depends on the key values (locality is important!)

• Multiple rows can be grouped into families (or tablets)

• Multiple families can be grouped into a key space


Wide Column Stores:

[Figure: structure of a wide column store. A key space contains column families; a column family contains rows, each identified by a row key; each row holds a varying number of columns; a column is a pair of column name and value]


Wide Column Stores:

Key Space "Edibles"

Column Family "Fruit"

• Apple: color = "green", weight = 95, variety = "Granny Smith"

• Cherry: color = "red"

• Lemon: color = "yellow", weight = 50, origin = "Egypt", flavor = "sour"

Column Family "Vegetable"

• Carrot: 2015-08-11 = 65, 2015-08-12 = 50, 2015-09-21 = 87
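The "Edibles" example maps naturally onto nested dictionaries: key space → column family → row key → column name → value. The sketch below is only a data-structure illustration; the helper names `get_column` and `row_range` are assumptions, and a real wide column store keeps rows physically sorted rather than sorting on each query.

```python
# key space -> column family -> row key -> {column name: value}
edibles = {
    "Fruit": {
        "Apple": {"color": "green", "weight": 95, "variety": "Granny Smith"},
        "Cherry": {"color": "red"},
        "Lemon": {"color": "yellow", "weight": 50, "origin": "Egypt", "flavor": "sour"},
    },
    "Vegetable": {
        "Carrot": {"2015-08-11": 65, "2015-08-12": 50, "2015-09-21": 87},
    },
}


def get_column(keyspace, family, row_key, column, default=None):
    """Look up a single column value; missing columns are not an error."""
    return keyspace.get(family, {}).get(row_key, {}).get(column, default)


def row_range(keyspace, family, first_key, last_key):
    """Row keys in key order between first_key and last_key (inclusive)."""
    return [k for k in sorted(keyspace[family]) if first_key <= k <= last_key]


print(get_column(edibles, "Fruit", "Lemon", "origin"))
print(get_column(edibles, "Fruit", "Cherry", "weight"))  # column absent in this row
print(row_range(edibles, "Fruit", "A", "D"))
print({row: len(cols) for row, cols in edibles["Fruit"].items()})
```

The "Carrot" row shows a common pattern in which the column names themselves carry data (here dates), so a single row can grow to a very large number of columns.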


Wide Column Stores - Cassandra:

• Developed by Facebook, Apache project since 2009

• Cluster Architecture:

– P2P system (ordered as rings)

– Each node plays the same role (decentralized)

– Each node accepts read/write operations

• User access through nodes via Cassandra Query Language (CQL)

[Figure: nodes (N) arranged in a ring]

Wide Column Stores:

Consistency

• Tunable data consistency (selectable per operation)

• Read repair: if stale data is read, Cassandra issues a read repair → find the most up-to-date data and update the stale data

• Generally: Eventually consistent

• Main focus on availability!
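Per-operation consistency and read repair can be sketched with timestamped values, assuming that the newest timestamp wins. The names `Replica`, `Versioned` and `read_with_repair` are hypothetical, and the numeric `consistency_level` simply stands for the number of replicas contacted by a read.

```python
from dataclasses import dataclass


@dataclass
class Versioned:
    value: str
    timestamp: int


class Replica:
    def __init__(self):
        self.store = {}

    def write(self, key, value, timestamp):
        current = self.store.get(key)
        if current is None or timestamp > current.timestamp:
            self.store[key] = Versioned(value, timestamp)  # newest write wins

    def read(self, key):
        return self.store.get(key)


def read_with_repair(replicas, key, consistency_level):
    """Ask `consistency_level` replicas, return the newest value, fix stale ones."""
    answers = [(r, r.read(key)) for r in replicas[:consistency_level]]
    found = [v for _, v in answers if v is not None]
    if not found:
        return None
    newest = max(found, key=lambda v: v.timestamp)
    for replica, answer in answers:
        if answer is None or answer.timestamp < newest.timestamp:
            replica.write(key, newest.value, newest.timestamp)  # read repair
    return newest.value


a, b, c = Replica(), Replica(), Replica()
for replica in (a, b, c):
    replica.write("k", "v1", 1)
b.write("k", "v2", 2)  # a newer write that has reached only replica b so far

replicas = [a, b, c]
first = read_with_repair(replicas, "k", 1)   # asks only a: stale value
second = read_with_repair(replicas, "k", 2)  # asks a and b: newest value, a repaired
third = read_with_repair(replicas, "k", 1)   # asks only a again: now up to date
print(first, second, third)
```

Contacting a single replica is fast and highly available but may return stale data; contacting more replicas per read trades latency for fresher results, which is the tuning knob the slide refers to.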


Big Picture

CAP Theorem:

[Figure: CAP triangle (C, A, P) with the regions AC-Systems, AP-Systems and CP-Systems and the labels ACID and BASE. Systems shown: Key/Value Stores (Redis, Dynamo), Document Stores (MongoDB, CouchDB), Wide Column Stores (Cassandra, HBase) and RDBMSs (MySQL, Postgres, …)]


Graph Databases:

• Use graphs to store and represent relationships between entities

• Composed of nodes and edges

• Each node and each edge can contain properties (Property-Graphs)

[Figure: example graph with the nodes Alice, Bob, Carol and Dave, connected by edges labeled "knows" and "lent money to"]


Graph Databases:

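Alice is a friend of Bob and vice versa. They both love the movie "Titanic".

[Figure: property graph for this statement]

• Node with label Person: name = "Alice"

• Node with label Person: name = "Bob"

• Node with label Movie: title = "Titanic"

• Edges "is a friend of": Alice → Bob and Bob → Alice

• Edges "loves": Alice → Titanic and Bob → Titanic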
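The Alice/Bob/Titanic example can be written down as a tiny property graph. This is an illustrative in-memory structure, not the storage model or query language of any graph database; the class `PropertyGraph`, its methods and the node identifiers are assumptions.

```python
class PropertyGraph:
    """Nodes and directed edges, each of which may carry properties."""

    def __init__(self):
        self.nodes = {}  # node id -> {"label": ..., "props": {...}}
        self.edges = []  # (source id, relationship type, target id, props)

    def add_node(self, node_id, label, **props):
        self.nodes[node_id] = {"label": label, "props": props}

    def add_edge(self, source, rel_type, target, **props):
        self.edges.append((source, rel_type, target, props))

    def sources(self, rel_type, target):
        """Ids of all nodes with an edge of the given type into `target`."""
        return [s for s, t, d, _ in self.edges if t == rel_type and d == target]


g = PropertyGraph()
g.add_node("alice", "Person", name="Alice")
g.add_node("bob", "Person", name="Bob")
g.add_node("titanic", "Movie", title="Titanic")

# "Alice is a friend of Bob and vice versa": two directed edges.
g.add_edge("alice", "is a friend of", "bob")
g.add_edge("bob", "is a friend of", "alice")
g.add_edge("alice", "loves", "titanic")
g.add_edge("bob", "loves", "titanic")

fans = sorted(g.nodes[n]["props"]["name"] for n in g.sources("loves", "titanic"))
print(fans)
```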


Graph Databases - Neo4j:

• Master-Slave Replication (no partitioning!)

• Consistency: Eventual Consistency (tunable to Immediate Consistency)

• Support of ACID Transactions

• Cypher Query Language

• Schema-optional


Big Picture

CAP Theorem:

[Figure: CAP triangle (C, A, P) with the regions AC-Systems, AP-Systems and CP-Systems and the labels ACID and BASE. Systems shown: Key/Value Stores (Redis, Dynamo), Document Stores (MongoDB, CouchDB), Wide Column Stores (Cassandra, HBase), Graph Databases (Neo4J) and RDBMSs (MySQL, Postgres, …)]