© 2014 Aerospike. All rights reserved ‹#›
NoSQL in Real-time Architectures
Ronen Botzer
Aerospike
NoSQL?
What is NoSQL Anyway?
■ Strozzi NoSQL (1998) - an RDBMS that lacks support for
the Structured Query Language
■ A collective term for non-relational data stores (~2009)
■ Column: Cassandra, HBase, BigTable
■ Document: MongoDB, CouchDB
■ Key-value: Redis, Aerospike
■ Graph: OrientDB, Neo4j
■ BTW, SQL-like query languages are emerging in NoSQL
■ "Not Only SQL" is one of the worst backronyms, ever.
■ A vague "marketing" term describing NotREL databases
Old Architecture (scale out in 2000)
APP SERVERS
CACHE CLUSTER
STORAGE
CONTENT DELIVERY NETWORK
LOAD BALANCER
SHARDED RDBMS
SHARD MANAGER
We Have a Problem, Part 1 - The RDBMS
■ Relational databases don't cluster well.
■ Most are not designed to scale well vertically, either.
■ They don't work at the velocities required by web
applications under high loads.
■ Schemas are too rigid for modern applications.
■ Relational databases were not designed for this
■ Designed in the era of single cores, expensive RAM, and rotational
disks, to account for the huge speed difference between disk
and RAM (for example, disk-based indexes).
■ Designed for the days when DBAs controlled the design of and access to the
schema, and dictated a glacial rate of change, with long design
and implementation cycles. Not adaptive or responsive.
■ Designed to power a single app, not a growing number of them.
We Have a Problem, Part 2 - Architectural Impact
■ Architecting around the weakness of the RDBMS
■ Caches are added to compensate for slow reads and to
reduce query load.
■ Increases the complexity of application logic.
■ Caches have their own clustering problems.
■ Breaks database consistency.
■ Only improves reads; write load is still an issue.
■ Increasing use of denormalization.
■ Various attempts at sharding relational databases
■ Shard managers are usually written wrong.
■ Hotspots often emerge due to unbalanced hashing.
■ Cluster rebalancing once nodes are added is painful.
■ Does not provide high-availability.
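The rebalancing pain can be illustrated with a small sketch. It assumes a naive shard manager that routes keys with a hash taken modulo the shard count; the slides do not say which scheme was used, and the function names and key format here are hypothetical. The sketch counts how many keys change shards when a fifth shard joins a four-shard cluster.

```python
import hashlib


def shard_for(key: str, shard_count: int) -> int:
    """Route a key to a shard with naive modulo hashing (an assumed scheme)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def moved_fraction(keys, old_count: int, new_count: int) -> float:
    """Fraction of keys whose shard changes when the cluster is resized."""
    moved = sum(
        1 for k in keys if shard_for(k, old_count) != shard_for(k, new_count)
    )
    return moved / len(keys)


keys = [f"user:{i}" for i in range(10_000)]  # hypothetical key space
fraction = moved_fraction(keys, old_count=4, new_count=5)
print(f"{fraction:.0%} of keys move when growing from 4 to 5 shards")
```

With a uniform hash, a key stays put only when its hash gives the same remainder modulo 4 and modulo 5, which happens for about one key in five. The remaining four fifths of the data would have to be copied between shards, which is why resizing a cluster sharded this way is so disruptive.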
Social Media
MYSQL or POSTGRES (ROTATIONAL DISK)
Recent user generated content
Java application tier
Data abstraction and sharding
MODIFIED REDIS (SSD ENABLED)
Content and historical data
Travel Portal
PRICING DATABASE (RATE LIMITED)
Poll for Pricing Changes
PRICING DATA
Store Latest Price
SESSION MANAGEMENT
Session Data
Read Price
XDR
Airlines forced interstate banking
Legacy mainframe technology
Multi-company reservation and pricing
Requirement: 1M TPS allowing overhead
Travel App
Advertising Technology Stack
MILLIONS OF CONSUMERS
BILLIONS OF DEVICES
APP SERVERS
DATA WAREHOUSE
INSIGHTS
WRITE CONTEXT
In-memory NoSQL
WRITE REAL-TIME CONTEXT
READ RECENT CONTENT
PROFILE STORE: Cookies, email, deviceID, IP address, location, segments, clicks, likes, tweets, search terms...
REAL-TIME ANALYTICS: Best sellers, top scores, trending tweets
BATCH ANALYTICS: Discover patterns, segment data: location patterns, audience affinity
North American RTB speeds & feeds
■ 1 to 6 billion cookies tracked
■ Some companies track 200M, some track 20B
■ Each bidder has their own data pool
■ Data is your weapon
■ Recent searches, behavior, IP addresses
■ Audience clusters (K-cluster, K-means) from offline Hadoop
■ “Remnant” from Google, Yahoo is about 0.6 million / sec
■ Facebook exchange: about 0.6 million / sec
■ “other” is 0.5 million / sec
Currently about 3.0M / sec in North America
Advertising Ecosystem
Modern Scale Out Architecture
Load balancer
Simple stateless
APP SERVERS
IN-MEMORY NoSQL
RESEARCH WAREHOUSE
CONTENT DELIVERY NETWORK
LOAD BALANCER
Long term cold storage
Fast stateless
HDFS BASED
Financial Services – Intraday Positions
LEGACY DATABASE (MAINFRAME)
Read/Write
Start of Day Data Loading
End of Day Reconciliation
Query
REAL-TIME DATA FEED
ACCOUNT POSITIONS
XDR
10M+ user records
Primary key access
1M+ TPS planned
Finance App
Records App
RT Reporting App
Live analytics without ETL
http://www.aerospike.com/community/labs/
■ 'Old Hadoop' involves using MapReduce for ELT/ETL.
■ Integration points with fast NoSQL
■ Input format connector - using NoSQL as a faster storage layer.
■ Output format connector - skipping the L and the T.
■ Dynamic programming paradigm - shared-nothing MR tasks have
to wait until the reduce phase to consolidate information. You can
look up and update row-level data during the map phase instead.
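The last point can be sketched in a few lines. This is only an illustration: a plain Python dict stands in for the fast key-value store, no real Hadoop or Aerospike API is used, and the function name and sample records are hypothetical. Each map task updates per-user counters as it reads its split, so the totals are already consolidated when the map phase ends.

```python
from collections import defaultdict

# Stand-in for a fast key-value store shared by all map tasks.
# A real job would talk to a database client here; the dict is an
# assumption made so the sketch runs on its own.
store = defaultdict(int)


def map_task(records, kv):
    """Map phase that updates row-level counters as each record is seen."""
    for user_id, clicks in records:
        kv[user_id] += clicks  # read-modify-write during the map phase


# Two input splits, as two independent map tasks would see them (made-up data).
split_a = [("u1", 2), ("u2", 1)]
split_b = [("u1", 3), ("u3", 4)]

map_task(split_a, store)
map_task(split_b, store)

totals = dict(store)
print(totals)  # {'u1': 5, 'u2': 1, 'u3': 4}
```

In a real cluster the map tasks run concurrently, so the increment would need to be an atomic operation on the store rather than a separate read and write.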
Live Analytics
Load balancer
Simple stateless
APP SERVERS
IN-MEMORY NoSQL
RESEARCH WAREHOUSE
CONTENT DELIVERY NETWORK
LOAD BALANCER
Long term cold storage
Fast stateless
Live Analytics
Kafka
How fast can you go?
– Geir Magnusson, CTO of AppNexus
Strata Santa Clara, 2014
“We run Aerospike heavily, peaking at 3 Million reads
per second and well over 1 1/2 million writes a second
in a very cost effective way. I don’t think there’s any
technology we’ve run into that even comes close.”
Tada Pivorius, Developer at Adform
"Married to Cassandra", 2014
http://vimeo.com/102812401
“Adform scaled from a 32 node Cassandra cluster to a 3
node Aerospike cluster, managing 1 TB data at 120k tps.”
Native Flash Performance
HIGH THROUGHPUT, LOW LATENCY
[Chart: Throughput (TPS) for Balanced and Read-Heavy workloads; series: Aerospike, Cassandra, MongoDB, Couchbase 2.0*]
[Chart: Balanced Workload Read Latency; average latency (ms) vs. throughput (ops/sec); series: Aerospike, Cassandra, MongoDB]
[Chart: Balanced Workload Update Latency; average latency (ms) vs. throughput (ops/sec); series: Aerospike, Cassandra, MongoDB]
* “We were forced to exclude Couchbase...since when run with either disk or replica durability on it was unable to complete the test.”
– Thumbtack Technology
YCSB Performance Comparison 2014
Hot Analytics
■ High throughput Queries
■ 2 node cluster, 10 indexes
■ Query returns 100 of 50M records
■ Predictable low latency
UN-PREDICTABLE LATENCY
128 – 300 ms
70 – 760 ms
7 – 10 ms
QPS
Amazon EC2 results
Lots of Clients & Examples
Use Open Source
How do we do it?
WRITING RELIABLY WITH HIGH PERFORMANCE
Writing with Immediate Consistency
1. Write sent to row master
2. Latch against simultaneous writes
3. Apply write to master memory and replica memory synchronously
4. Queue operations to disk
5. Signal completed transaction (optional storage commit wait)
6. Master applies conflict resolution policy (rollback/rollforward)
[Diagram: master, replica]
Adding a Node
1. Cluster discovers new node via gossip protocol
2. Paxos vote determines new data organization
3. Partition migrations scheduled
4. When a partition migration starts, write journal starts on destination
5. Partition moves atomically
6. Journal is applied and source data deleted
[Diagram: transactions continue]
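Steps 1 to 5 of the write path can be sketched as follows. This is a minimal illustration under stated assumptions, not Aerospike's implementation: the `Node` and `Partition` classes are hypothetical, one lock per partition stands in for the per-row latch, the disk queue is never drained, and the conflict resolution in step 6 is left out.

```python
import threading
from collections import deque


class Node:
    """In-memory copy of the data plus a queue of writes waiting for disk."""

    def __init__(self, name):
        self.name = name
        self.memory = {}
        self.disk_queue = deque()

    def apply(self, key, value):
        self.memory[key] = value              # step 3: apply to memory
        self.disk_queue.append((key, value))  # step 4: queue operation to disk


class Partition:
    """One master and one replica for a set of rows (hypothetical layout)."""

    def __init__(self):
        self.master = Node("master")
        self.replica = Node("replica")
        self.latch = threading.Lock()

    def write(self, key, value):
        # Step 1: the caller has already routed the write to the row master.
        with self.latch:  # step 2: latch against simultaneous writes
            self.master.apply(key, value)
            self.replica.apply(key, value)  # step 3: replica updated synchronously
        return "ok"  # step 5: signal completion without waiting for disk


partition = Partition()
status = partition.write("user:1", {"segment": "travel"})
print(status, partition.replica.memory["user:1"])
```

The write is acknowledged only after both in-memory copies hold the new value, which is what makes a read from either copy immediately consistent, while the slower disk write happens off the critical path.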
DATABASE
OS FILE SYSTEM
PAGE CACHE
BLOCK INTERFACE
SSD HDD
BLOCK INTERFACE
SSD SSD
OPEN NVM
SSD
Ask me and I’ll tell you the answer.
Ask me. I’ll look up the answer and then tell it to you.
DATABASE
HYBRID MEMORY SYSTEM™
• Direct device access
• Large block writes
• Indexes in DRAM
• Highly parallelized
• Log-structured FS “copy-on-write”
• Fast restart with shared memory
FLASH OPTIMIZED, HIGH PERFORMANCE
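Two of the ideas listed above, indexes in DRAM and log-structured copy-on-write storage, can be sketched together. This is a toy model under stated assumptions: `io.BytesIO` stands in for a raw flash device, the class name is hypothetical, and the sketch neither batches records into large blocks nor reclaims the space of superseded records.

```python
import io


class LogStructuredStore:
    """Append-only data log with an in-memory index of record locations."""

    def __init__(self, device=None):
        # io.BytesIO stands in for a raw flash device (an assumption).
        self.device = device if device is not None else io.BytesIO()
        self.index = {}  # key -> (offset, length), held in DRAM

    def put(self, key: str, value: bytes) -> None:
        offset = self.device.seek(0, io.SEEK_END)  # always append, never overwrite
        self.device.write(value)
        self.index[key] = (offset, len(value))     # the old copy becomes garbage

    def get(self, key: str) -> bytes:
        offset, length = self.index[key]           # index lookup needs no device I/O
        self.device.seek(offset)
        return self.device.read(length)            # one read at a known offset


store = LogStructuredStore()
store.put("user:1", b"v1")
store.put("user:1", b"v2")  # copy-on-write: the update is appended as a new record
print(store.get("user:1"))  # b'v2'
```

Because the index already knows where each record lives, a read costs a single device access, and because updates are appended rather than rewritten in place, writes stay sequential, which suits flash.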
SHARED-NOTHING SYSTEM: 100% DATA AVAILABILITY
■ Every node in a cluster is identical, handles both transactions and long running tasks
■ Data is replicated synchronously with immediate consistency within the cluster
■ Data is replicated asynchronously across data centers
OHIO Data Center