Big Data and NoSQL
What and Why?
Motivation: Size
• The WWW has spawned a new era of applications that need to store and query very large data sets
– Facebook / 1.1 billion users (Sep 2012) [Yahoo Finance]
– Twitter / 500 million users (Jul 2012) [techcrunch.com]
– Netflix / 30 million users (Oct 2012) [Yahoo Finance]
– Google / index of ~45 billion pages (Nov 2012) [http://www.worldwidewebsize.com/]
• So – how do we store and query this type of data set?
2
Motivation: Scalability
• Another characteristic is the potentially rapidly expanding market
– Facebook (end-of-year data):
• 2004: 1 million
• 2006: 12 million
• 2008: 150 million
• 2010: 600 million
• 2012: 1,100 million
• How do we scale fast enough?
3
Motivation: Availability
• Bad user experience
– Down time
– Long waits
• ... equals lost users
• How do we ensure a good user experience?
4
From Alvin Richards' GOTO presentation
5
Requirements
• So the QA requirements are
– Performance:
• Time dimension: fast storage and retrieval of data
• Space dimension: large data sets
– Availability: of the data service
6
The players
• The old workhorse
– RDBMS: Relational DataBase Management System
• The hyped and forgotten (?)
– OODB
– XMLDB
• BaseX
• The hyped and uncertain
– NoSQL
7
RDBMS
• Performance:
– Time
• Queries do not scale linearly in the size of the data set
– 10,000 rows ~ 1 second
– 1 billion rows ~ 24 hours
8
[Jacobs (2009)]
RDBMS
• Performance:
– Space
• Generally, an RDBMS is rather monolithic:
One RDBMS = One machine
• Limitations on disk size, RAM, etc.
– (There are 'tricks' to distribute data over many machines, but it is generally an 'after-the-fact' design, not something considered a premise...)
9
RDBMS
• Availability:
– OK, I think...
• Redundancy, redundant disk arrays, the usual bag of tricks
10
RDBMS Performance
Starting with time behaviour
11
RDBMS Space Tricks
• Sharding
• Exercise: Give examples from WoW …
12
[Wikipedia]
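Sharding can be sketched as routing each key to one of N database nodes, e.g. by hashing the key. A minimal single-process illustration, assuming made-up node names and a simple modulo scheme (not any specific product's API):

```java
import java.util.List;

// Minimal sketch of hash-based sharding: each key is routed to one of N
// shards, so data and load are spread over several machines.
public class ShardRouter {
    private final List<String> shards;

    public ShardRouter(List<String> shards) { this.shards = shards; }

    // Route a key to a shard by hashing it; Math.floorMod avoids
    // negative indices when hashCode() is negative.
    public String shardFor(String key) {
        return shards.get(Math.floorMod(key.hashCode(), shards.size()));
    }

    public static void main(String[] args) {
        ShardRouter router = new ShardRouter(List.of("node-A", "node-B", "node-C"));
        System.out.println(router.shardFor("user-42"));
    }
}
```

Note that real systems (and WoW-style realm partitioning) often shard by range or by domain attribute rather than by hash; the key point is that each key deterministically maps to one node.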
Why do RDBMSs become slow?
• Jacobs' paper explains the details.
• It boils down to:
– The memory hierarchy
• Cache, RAM, disk, (tape)
• Each step has a big performance penalty
– The reading pattern
• Sequential reads are faster than random reads
• Tape: obvious, but even for RAM this is true!
13
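The sequential-vs-random point can be demonstrated even in RAM. A micro-benchmark sketch (absolute timings vary by machine; the array size and method names are illustrative):

```java
import java.util.Random;

// Micro-benchmark sketch: the same array summed in sequential order vs in a
// shuffled index order. The sums are identical, but the random traversal
// defeats the CPU cache and prefetcher and is typically noticeably slower.
public class ReadPatternDemo {
    static long sumSequential(long[] data) {
        long s = 0;
        for (int i = 0; i < data.length; i++) s += data[i];
        return s;
    }

    static long sumInOrder(long[] data, int[] order) {
        long s = 0;
        for (int i = 0; i < order.length; i++) s += data[order[i]];
        return s;
    }

    public static void main(String[] args) {
        int n = 4_000_000;
        long[] data = new long[n];
        int[] order = new int[n];
        for (int i = 0; i < n; i++) { data[i] = i; order[i] = i; }
        Random rnd = new Random(42);
        for (int i = n - 1; i > 0; i--) {          // Fisher-Yates shuffle
            int j = rnd.nextInt(i + 1);
            int t = order[i]; order[i] = order[j]; order[j] = t;
        }
        long t0 = System.nanoTime();
        long a = sumSequential(data);
        long t1 = System.nanoTime();
        long b = sumInOrder(data, order);
        long t2 = System.nanoTime();
        System.out.printf("sequential %d ms, random %d ms, equal=%b%n",
                (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000, a == b);
    }
}
```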
Issue 1
14
Thus
Issue 1: Large data sets cannot reside in RAM
Thus read/write time is slow
15
Issue 2
• Issue
– Often one domain type is fixed at "large"
• Number of customers/users (< 7 billion)
• Number of sensors
– While other domains increase indefinitely
• Observations over time
– each page visit, transaction, sensor reading
• Data production
– tweets, Facebook updates, log entries
16
But...
The relational database model explicitly ignores the ordering of rows in tables.
17
Example
• Transactions recorded by a web server
– The transaction log is inherently time ordered
– Queries are usually about time as well
• How many visits in the last day?
• Most frequent visitor last week?
• However, this results in random visits to the disk!
18
Normalization
• It becomes even worse due to the classic RDBMS virtue of normalization:
• A user table and a transaction table
19
Jacobs’ statement
20
Denormalization technique
• Denormalization:
– Make aggregate objects
– Allows much faster access
– Liability: much more space intensive
• And!
– Store data in the sequences in which they are to be queried...
21
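The idea above can be sketched as one aggregate object replacing two joined tables: each user embeds its own transactions, already in query order. The class and field names here are illustrative assumptions, not a particular database's API:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of a denormalized aggregate: instead of a separate user table and
// transaction table joined at query time, each user object embeds its own
// transactions, stored in the (time) order they will be read back.
public class UserAggregate {
    public final String userId;
    public final List<String> transactions = new ArrayList<>(); // time-ordered

    public UserAggregate(String userId) { this.userId = userId; }

    public void record(String transaction) { transactions.add(transaction); }

    // "Last day" style queries become a sequential scan over one aggregate
    // rather than a join plus random disk visits.
    public List<String> lastN(int n) {
        int from = Math.max(0, transactions.size() - n);
        return transactions.subList(from, transactions.size());
    }
}
```

The trade-off named on the slide is visible here: user data is duplicated into every aggregate that needs it, so space usage grows, but reads become sequential.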
NoSQL Databases
A new take on storing and querying big data
22
Key features
• NoSQL: "Not Only SQL" / not relational
– Horizontal scaling of simple operations over many servers
– Replication and partitioning of data over many servers
– Simple call-level interface
– Weaker concurrency model than ACID
– Efficient use of RAM and distributed indexes for storage
– Ability to dynamically add new attributes to records
• Architectural drivers:
– Performance
– Scalability
23
[Cattell, 2010]
Clarifications
• NoSQL focuses on
– Simple operations
• Key lookup, read/write of one or a few records
• Opposite: complex joins over many tables (SQL)
• (NoSQL systems generally denormalize and thus do not require joins)
– Horizontal scaling
• Many servers with no shared RAM or disk
• Commodity hardware
– Cheap, but more prone to failures
24
NoSQL DB Types
25
Basic types
• Four types
– Key-value stores
• Memcached, Riak, Redis
– Document stores
• MongoDB
– Extensible record (Fowler: column-family)
• Cassandra, Google BigTable
– Graph stores
• Neo4J
26
Fowler: Aggregate-oriented
Key-value
• Basically the Java Map<KeyType, ValueType> data structure:
– map.put("pid01-2012-12-03-11-45", measurement);
– m = map.get("pid01-2012-12-03-11-45");
• Schema: any value under any key...
• Supports:
– Automatic replication
– Automatic sharding
– Both using hashing on the key
• Only lookup using the (primary) key
27
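The Map analogy above runs as-is in plain Java; the measurement value 21.5 is an illustrative assumption:

```java
import java.util.HashMap;
import java.util.Map;

// The key-value model is essentially a (distributed) map: any value can be
// stored under any key, and the only access path is the key itself.
public class KeyValueDemo {
    public static void main(String[] args) {
        Map<String, Double> store = new HashMap<>();

        // put: store a sensor measurement under a time-stamped key
        store.put("pid01-2012-12-03-11-45", 21.5);

        // get: the key is the only way back to the value
        Double m = store.get("pid01-2012-12-03-11-45");
        System.out.println(m); // prints 21.5

        // There is no query on value contents: a "find all measurements
        // above 20" style query would require scanning every key.
    }
}
```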
Document
• Stores "documents"
– MongoDB: JSON objects
– Stronger queries, also into document contents
– Schema: any JSON object may be stored!
– Atomic updates, otherwise no concurrency control
• Supports
– Master-slave replication, automatic failover and recovery
– Automatic sharding
• Range-based, on a shard key (like zip code, CPR number, etc.)
28
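In contrast to key-value stores, document stores can query on fields inside the stored document. A toy sketch in plain Java, modelling documents as maps; the data and method names are illustrative assumptions, and no real MongoDB API is used:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Toy model of a document store: documents are field->value maps, and
// queries can match on field contents, not just on the primary key.
public class DocumentStoreDemo {
    static List<Map<String, Object>> users = List.of(
        Map.of("_id", "u1", "name", "Alice", "zip", "8000"),
        Map.of("_id", "u2", "name", "Bob",   "zip", "8200"),
        Map.of("_id", "u3", "name", "Carol", "zip", "8000"));

    // find({zip: "8000"}) style query: filter on document contents
    static List<Map<String, Object>> findByZip(String zip) {
        return users.stream()
                    .filter(d -> zip.equals(d.get("zip")))
                    .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(findByZip("8000").size()); // 2 matching documents
    }
}
```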
Extensible Record / Column
• First kid on the block: Google BigTable
• Data model
– Table of rows and columns
• Scalability model: splitting both on rows and columns
• Rows: (range) sharding on the primary key
• Columns: column groups – domain-defined clustering
– No schema on the columns; change as you go
– Generally memory-based, with periodic flushes to disk
29
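The "no schema on the columns" point can be modelled as nested maps: rows keyed by primary key, each row holding an open-ended set of columns. A minimal sketch (class and column names are illustrative assumptions):

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of an extensible-record store: row key -> (column -> value).
// There is no fixed column schema; any row may gain new columns at any time.
public class ExtensibleRecordDemo {
    static Map<String, Map<String, String>> table = new HashMap<>();

    static void put(String row, String column, String value) {
        table.computeIfAbsent(row, r -> new HashMap<>()).put(column, value);
    }

    static String get(String row, String column) {
        Map<String, String> r = table.get(row);
        return r == null ? null : r.get(column);
    }
}
```

Real systems like BigTable and Cassandra additionally cluster related columns into column groups/families and range-shard the rows, as the slide notes.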
(Graph)• Neo4J
30
CAP Theorem
31
Reviewing ACID
• Basic RDBMS teaching talks about ACID
• Atomicity
– Transaction: all operations succeed, or none do
• Consistency
– The DB is in a valid state before and after each transaction
• Isolation
– N transactions executed concurrently = N executed in sequence
• Durability
– Once a transaction has been committed, it will remain so (even in the event of power loss or a crash)
32
CAP
• Eric Brewer: you can only get two of the three:
• Consistency
– A set of operations appears to have occurred all at once (client view)
• Availability
– Operations will terminate in the intended response
• Partition tolerance
– Operations will complete, even if components are unavailable
33
Horizontal Scaling: P taken
• We have already taken P, so we have to relax either
– Consistency
– Availability
• RDBMSs prefer consistency over availability
• NoSQL prefers availability over consistency
– replacing it with eventual consistency
34
Achieving ACID
• Two-phase commit
1. All partitions pre-commit and report the result to the master
2. On success, the master tells each to commit; otherwise roll back
• Guarantees consistency, but availability suffers
• Example
– Two partitions, each 99.9% available
• => 99.9%² ≈ 99.8% (+43 min down every month)
– Five partitions:
• 99.5% (36 hours down time in all)
35
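The two steps above can be sketched as a coordinator polling all partitions before deciding. A minimal single-process illustration; the Partition interface and method names are assumptions, not a real protocol library:

```java
import java.util.List;

// Minimal sketch of two-phase commit: the master commits only if every
// partition votes yes in the pre-commit phase; otherwise all roll back.
public class TwoPhaseCommit {
    interface Partition {
        boolean preCommit();  // phase 1: vote yes/no
        void commit();        // phase 2a: make the change durable
        void rollback();      // phase 2b: undo the pre-committed change
    }

    static boolean run(List<Partition> partitions) {
        // Phase 1: every partition must pre-commit successfully
        boolean allAgree = partitions.stream().allMatch(Partition::preCommit);
        // Phase 2: commit everywhere, or roll back everywhere
        if (allAgree) partitions.forEach(Partition::commit);
        else partitions.forEach(Partition::rollback);
        return allAgree;
    }
}
```

If one partition never responds in phase 1, the master blocks, which is exactly the availability cost the slide's arithmetic illustrates.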
BASE
• Replace ACID with BASE
• BA: Basically Available
• S: Soft state
• E: Eventually consistent
• Availability is achieved by partial failures not leading to system failure
– In two-phase commit, what would the master do if one partition does not respond?
36
Eventual Consistency
• So what does this mean?
– Upon a write, an immediate read may retrieve the old value – or not see the newest added item!
• Why? It gets data from a replica that is not yet updated...
– Eventual consistency:
• Given a sufficiently long period of time in which no updates occur, all replicas become consistent, and thus all reads consistently return the same data...
• The system is always available, but its state is 'soft' / cached
37
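The stale-read effect can be simulated in a few lines: writes land on a primary, a replica is updated later, and a read routed to the replica in between sees old data. Entirely illustrative; real systems propagate asynchronously rather than via an explicit call:

```java
import java.util.HashMap;
import java.util.Map;

// Toy illustration of eventual consistency: writes go to a primary and are
// propagated to a replica later; reads from the replica may see stale data.
public class EventualConsistencyDemo {
    Map<String, String> primary = new HashMap<>();
    Map<String, String> replica = new HashMap<>();

    void write(String key, String value) { primary.put(key, value); }

    // A read routed to the replica may return an old (or missing) value
    String readFromReplica(String key) { return replica.get(key); }

    // "Given a sufficiently long period without updates": propagation runs,
    // after which all reads agree.
    void propagate() { replica.putAll(primary); }
}
```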
Discussion
• Web applications
– Is it a problem that a Facebook update takes some minutes to appear for my friends?
38
Summary
• RDBMSs have issues with 'big data'
– They ignore the physical laws of hardware (random reads)
– The relational model requires joins (random reads)
• NoSQL
– A class of alternative DB models and implementations
– Two main classes
• Key-value stores
• Document stores
• CAP
– You can only get two of the three
– NoSQL chooses A, and sacrifices C for eventual consistency
CS@AU Henrik Bærbak Christensen 39