NoSQL in Real-time Architectures

© 2014 Aerospike. All rights reserved.

Ronen Botzer

Aerospike

NoSQL?

What is NoSQL Anyway?

■ Strozzi NoSQL (1998) - an RDBMS that lacks support for the Structured Query Language

■ A collective term for non-relational data stores (~2009)

■ Column: Cassandra, HBase, BigTable

■ Document: MongoDB, CouchDB

■ Key-value: Redis, Aerospike

■ Graph: OrientDB, Neo4j

■ BTW, SQL-like query languages are emerging in NoSQL

■ "Not Only SQL" is one of the worst backronyms, ever.

■ A vague "marketing" term describing NotREL databases

Old Architecture (scale out in 2000)

[Diagram: LOAD BALANCER; APP SERVERS; CACHE CLUSTER; SHARD MANAGER; SHARDED RDBMS; STORAGE; CONTENT DELIVERY NETWORK]

We Have a Problem, Part 1 - The RDBMS

■ Relational databases don't cluster well.

■ Most are not designed to scale well vertically, either.

■ They don't work at the velocities required by web applications under high loads.

■ Schemas are too rigid for modern applications.

■ Relational databases were not designed for this:

■ Designed in the era of single cores, expensive RAM, and rotational disks, to account for the huge speed difference between disk and RAM. For example, disk-based indexes.

■ Designed in the days when DBAs controlled the design of, and access to, the schema and dictated a glacial rate of change, with long design and implementation cycles. Not adaptive or responsive.

■ Designed to power a single app, not a growing number of them.

We Have a Problem, Part 2 - Architectural Impact

■ Architecting around the weakness of the RDBMS:

■ Caches were added to compensate for slow reads and to reduce query load.

■ This increased the complexity of application logic.

■ Caches have their own clustering problems.

■ Caching broke database consistency.

■ It only improved reads; write load was still an issue.

■ Increasing use of denormalization.

■ Various attempts at sharding relational databases:

■ Shard managers are usually written wrong.

■ Hotspots often emerge due to unbalanced hashing.

■ Cluster rebalancing once nodes are added is painful.

■ Sharding does not provide high availability.
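
To make the rebalancing point concrete, here is a minimal sketch of a naive shard manager that places keys with hash-modulo-N. It is an illustration only, not a description of any particular product: the key names, the shard counts (4 growing to 5), and the helper names `stable_hash`, `modulo_shard`, and `fraction_moved` are all assumptions made for this example. It measures how many keys change shards when one node is added.

```python
import hashlib


def stable_hash(key: str) -> int:
    """Deterministic 64-bit hash (Python's built-in hash() is salted per process)."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


def modulo_shard(key: str, num_shards: int) -> int:
    """Naive placement: shard index is the hash modulo the number of shards."""
    return stable_hash(key) % num_shards


def fraction_moved(keys, shards_before: int, shards_after: int) -> float:
    """Fraction of keys whose shard changes when the shard count changes."""
    moved = sum(
        1 for k in keys
        if modulo_shard(k, shards_before) != modulo_shard(k, shards_after)
    )
    return moved / len(keys)


# Hypothetical key set; any reasonably large set behaves similarly.
keys = [f"user:{i}" for i in range(10_000)]
moved = fraction_moved(keys, shards_before=4, shards_after=5)
print(f"{moved:.0%} of keys move when growing from 4 to 5 shards")
```

With modulo placement a key stays put only when its hash gives the same remainder for both shard counts, so most of the data has to be copied between shards for a single added node. That is one reason hand-written shard managers make cluster growth painful.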

Social Media

[Diagram: Java application tier; Data abstraction and sharding; MYSQL or POSTGRES (ROTATIONAL DISK); MODIFIED REDIS (SSD ENABLED); Recent user generated content; Content and historical data]

Travel Portal

[Diagram: Travel App; PRICING DATABASE (RATE LIMITED); Poll for Pricing Changes; PRICING DATA; Store Latest Price; Read Price; SESSION MANAGEMENT; Session Data; XDR]

Airlines forced interstate banking

Legacy mainframe technology

Multi-company reservation and pricing

Requirement: 1M TPS allowing overhead

Advertising Technology Stack

[Diagram: MILLIONS OF CONSUMERS; BILLIONS OF DEVICES; APP SERVERS; In-memory NoSQL; DATA WAREHOUSE; INSIGHTS; WRITE CONTEXT; WRITE REAL-TIME CONTEXT; READ RECENT CONTENT]

PROFILE STORE: Cookies, email, deviceID, IP address, location, segments, clicks, likes, tweets, search terms...

REAL-TIME ANALYTICS: Best sellers, top scores, trending tweets

BATCH ANALYTICS: Discover patterns, segment data: location patterns, audience affinity

North American RTB speeds & feeds

■ 1 to 6 billion cookies tracked

■ Some companies track 200M, some track 20B

■ Each bidder has their own data pool

■ Data is your weapon

■ Recent searches, behavior, IP addresses

■ Audience clusters (K-cluster, K-means) from offline Hadoop

■ “Remnant” from Google, Yahoo is about 0.6 million / sec

■ Facebook exchange: about 0.6 million / sec

■ “other” is 0.5 million / sec

Currently about 3.0M / sec in North America

Advertising Ecosystem

Modern Scale Out Architecture

[Diagram: LOAD BALANCER; Simple stateless APP SERVERS; IN-MEMORY NoSQL; RESEARCH WAREHOUSE; CONTENT DELIVERY NETWORK; labels "Long term cold storage" and "Fast stateless"]

Modern Scale Out Architecture

[Diagram: LOAD BALANCER; Simple stateless APP SERVERS; IN-MEMORY NoSQL; RESEARCH WAREHOUSE; CONTENT DELIVERY NETWORK; labels "Long term cold storage", "Fast stateless", and "HDFS BASED"]

Financial Services – Intraday Positions

[Diagram: Finance App; Records App; RT Reporting App; LEGACY DATABASE (MAINFRAME); REAL-TIME DATA FEED; ACCOUNT POSITIONS; XDR; Read/Write; Query; Start of Day Data Loading; End of Day Reconciliation]

10M+ user records

Primary key access

1M+ TPS planned

Live analytics without ETL

http://www.aerospike.com/community/labs/

■ 'Old Hadoop' involves using MapReduce for ELT/ETL.

■ Integration points with fast NoSQL:

■ Input format connector - using NoSQL as a faster storage layer.

■ Output format connector - skipping the L and the T.

■ Dynamic programming paradigm - shared-nothing MR tasks have to wait until the reduce phase to consolidate information. You can look up and update row-level data during the map phase instead.
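
The last point can be sketched in a few lines. This is a toy illustration, not a Hadoop job or any vendor connector: `KeyValueStore` is a hypothetical in-process stand-in for a shared low-latency NoSQL cluster, and the records, the spending limit, and all function names are made up for the example. The classic path only learns per-user totals after the reduce step. The second path updates and reads a running total inside each map call, so it can act on the data immediately.

```python
import threading
from collections import defaultdict


class KeyValueStore:
    """Hypothetical stand-in for a shared, low-latency NoSQL store."""

    def __init__(self):
        self._data = defaultdict(int)
        self._lock = threading.Lock()

    def increment(self, key, delta=1):
        # Atomic read-modify-write, so concurrent map tasks do not lose updates.
        with self._lock:
            self._data[key] += delta
            return self._data[key]

    def get(self, key):
        with self._lock:
            return self._data.get(key, 0)


def classic_map(record):
    """Shared-nothing map: emit a pair and know nothing about other records."""
    user, amount = record
    return user, amount


def classic_reduce(pairs):
    """Totals only become visible here, after all map output is collected."""
    totals = defaultdict(int)
    for user, amount in pairs:
        totals[user] += amount
    return dict(totals)


def map_with_store(record, store, limit):
    """Map task that updates row-level data and decides during the map phase."""
    user, amount = record
    running_total = store.increment(user, amount)
    return running_total > limit


records = [("alice", 40), ("bob", 10), ("alice", 70), ("bob", 5)]

classic_totals = classic_reduce(classic_map(r) for r in records)

store = KeyValueStore()
over_limit = [map_with_store(r, store, limit=100) for r in records]
print(classic_totals, over_limit)
```

Both paths end with the same totals. The difference is when the information is available: the store-backed map flags the third record as soon as it is processed.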

Live Analytics

[Diagram: LOAD BALANCER; Simple stateless APP SERVERS; IN-MEMORY NoSQL; RESEARCH WAREHOUSE; CONTENT DELIVERY NETWORK; Live Analytics; Kafka; labels "Long term cold storage" and "Fast stateless"]

How fast can you go?

“We run Aerospike heavily, peaking at 3 Million reads per second and well over 1 1/2 million writes a second in a very cost effective way. I don’t think there’s any technology we’ve run into that even comes close.”

– Geir Magnusson, CTO of AppNexus, Strata Santa Clara, 2014

“Adform scaled from a 32 node Cassandra cluster to a 3 node Aerospike cluster, managing 1 TB data at 120k tps.”

– Tada Pivorius, Developer at Adform, "Married to Cassandra", 2014

http://vimeo.com/102812401

Native Flash Performance

HIGH THROUGHPUT, LOW LATENCY

[Chart: throughput (TPS) on Balanced and Read-Heavy workloads for Aerospike, Cassandra, MongoDB, and Couchbase 2.0*]

* “We were forced to exclude Couchbase...since when run with either disk or replica durability on it was unable to complete the test.” – Thumbtack Technology

[Chart: Balanced Workload Read Latency; average latency (ms) vs. throughput (ops/sec) for Aerospike, Cassandra, and MongoDB]

[Chart: Balanced Workload Update Latency; average latency (ms) vs. throughput (ops/sec) for Aerospike, Cassandra, and MongoDB]

YCSB Performance Comparison 2014

Hot Analytics

■ High throughput queries

■ 2 node cluster, 10 indexes

■ Query returns 100 of 50M records

■ Predictable low latency

[Chart: QPS comparison with latency ranges labeled 128 – 300 ms, 70 – 760 ms, and 7 – 10 ms; one label reads "UN-PREDICTABLE LATENCY"]

Amazon EC2 results

Lots of Clients & Examples

Use Open Source

How do we do it?

WRITING RELIABLY WITH HIGH PERFORMANCE

Writing with Immediate Consistency (master, replica)

1. Write sent to row master

2. Latch against simultaneous writes

3. Apply write to master memory and replica memory synchronously

4. Queue operations to disk

5. Signal completed transaction (optional storage commit wait)

6. Master applies conflict resolution policy (rollback/roll forward)

Adding a Node (transactions continue)

1. Cluster discovers new node via gossip protocol

2. Paxos vote determines new data organization

3. Partition migrations scheduled

4. When a partition migration starts, write journal starts on destination

5. Partition moves atomically

6. Journal is applied and source data deleted
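
The write steps listed under "Writing with Immediate Consistency" can be approximated with a small in-process model. This is a sketch under stated assumptions, not Aerospike's implementation: the `Node` and `Partition` classes, the per-key lock used as the "latch", and the `commit_to_disk` flag are hypothetical names invented for illustration. Routing the request to the row master (step 1) and conflict resolution (step 6) are not modeled.

```python
import threading
from collections import deque


class Node:
    """Hypothetical cluster node: an in-memory copy plus a queue of disk writes."""

    def __init__(self, name):
        self.name = name
        self.memory = {}
        self.disk_queue = deque()
        self.disk = {}

    def apply_to_memory(self, key, value):
        self.memory[key] = value                # step 3: in-memory apply
        self.disk_queue.append((key, value))    # step 4: queue operation to disk

    def flush(self):
        """Drain queued operations to the simulated disk."""
        while self.disk_queue:
            key, value = self.disk_queue.popleft()
            self.disk[key] = value


class Partition:
    """Pairs a master with one replica and serializes writes per key."""

    def __init__(self, master, replica):
        self.master = master
        self.replica = replica
        self._latches = {}
        self._latches_guard = threading.Lock()

    def _latch_for(self, key):
        with self._latches_guard:
            return self._latches.setdefault(key, threading.Lock())

    def write(self, key, value, commit_to_disk=False):
        with self._latch_for(key):                    # step 2: latch the row
            self.master.apply_to_memory(key, value)   # step 3: master memory
            self.replica.apply_to_memory(key, value)  # step 3: replica, synchronously
            if commit_to_disk:                        # step 5: optional commit wait
                self.master.flush()
                self.replica.flush()
            return "ok"                               # step 5: signal completion


master, replica = Node("master"), Node("replica")
partition = Partition(master, replica)

partition.write("user:1", {"score": 10})
queued_after_first_write = len(master.disk_queue)

partition.write("user:2", {"score": 20}, commit_to_disk=True)
```

After the first call both in-memory copies hold the record while the disk write is still queued. The second call waits for the simulated storage commit before acknowledging, which also drains the earlier queued write.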

FLASH OPTIMIZED HIGH PERFORMANCE

[Diagram: storage stacks. DATABASE, OS FILE SYSTEM, PAGE CACHE, BLOCK INTERFACE, SSD and HDD; BLOCK INTERFACE, SSD and SSD; OPEN NVM, SSD; DATABASE, HYBRID MEMORY SYSTEM™]

“Ask me and I’ll tell you the answer.”

“Ask me. I’ll look up the answer and then tell it to you.”

HYBRID MEMORY SYSTEM™

• Direct device access

• Large block writes

• Indexes in DRAM

• Highly parallelized

• Log-structured FS “copy-on-write”

• Fast restart with shared memory
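
Three of the ideas above (indexes in DRAM, large block writes, and log-structured copy-on-write) can be shown together in a toy store. This is a teaching sketch and makes no claim about how the Hybrid Memory System is actually built: the `LogStructuredStore` class, its `block_size` parameter, and the use of `io.BytesIO` as a stand-in for a raw device are all assumptions for the example.

```python
import io


class LogStructuredStore:
    """Toy append-only store with an in-memory index and buffered block writes."""

    def __init__(self, device, block_size=128 * 1024):
        self.device = device          # any seekable binary file-like object
        self.block_size = block_size  # flush once this many bytes are buffered
        self.index = {}               # key -> (offset, length), held in RAM
        self.buffer = bytearray()     # pending writes not yet on the device
        self.flushed = 0              # number of bytes already on the device

    def put(self, key, value: bytes):
        # Copy-on-write: new versions are appended, old bytes are never overwritten.
        offset = self.flushed + len(self.buffer)
        self.buffer += value
        self.index[key] = (offset, len(value))
        if len(self.buffer) >= self.block_size:
            self.flush()

    def flush(self):
        """Write everything buffered so far as one large sequential block."""
        self.device.seek(self.flushed)
        self.device.write(self.buffer)
        self.flushed += len(self.buffer)
        self.buffer.clear()

    def get(self, key) -> bytes:
        # The index lookup is pure memory; at most one device read follows.
        offset, length = self.index[key]
        if offset >= self.flushed:
            start = offset - self.flushed
            return bytes(self.buffer[start:start + length])
        self.device.seek(offset)
        return self.device.read(length)


device = io.BytesIO()
kv = LogStructuredStore(device, block_size=8)  # tiny block size for the demo

kv.put("a", b"1234")
kv.put("a", b"5678")  # second version fills the block and triggers a flush
kv.put("b", b"xy")    # stays in the write buffer

latest_a = kv.get("a")
latest_b = kv.get("b")
```

A real system would also need to reclaim space held by superseded versions and to rebuild or persist the index across restarts. Both are out of scope for this sketch.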

SHARED-NOTHING SYSTEM: 100% DATA AVAILABILITY

■ Every node in a cluster is identical and handles both transactions and long-running tasks

■ Data is replicated synchronously with immediate consistency within the cluster

■ Data is replicated asynchronously across data centers

[Diagram label: OHIO Data Center]
