Intuitions for Scaling Data-Centric Architectures
Ben Stopford, Confluent Inc
Intuitions for Scale
“Intuition does not come to the unprepared mind”
Albert Einstein
Locality & Sequential Addressing
Computers work best with sequential workloads
Disk buffer
Page cache
L3 cache
L2 cache
L1 cache
Pre-fetch is your friend
Random vs. Sequential Addressing
Random: ~300 reads/sec vs. Sequential: ~200 MB/s
e.g. sequential is ~7000x faster for 100-byte rows
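The ~7000x figure follows directly from the two throughput numbers on the slide; a quick back-of-envelope check:

```python
# Numbers from the slide: ~300 random reads/sec for a spinning disk
# vs ~200 MB/s for a sequential scan, over 100-byte rows.
random_reads_per_sec = 300
sequential_bytes_per_sec = 200 * 1024 * 1024
row_size_bytes = 100

# Rows touched per second under each access pattern.
random_rows_per_sec = random_reads_per_sec                    # one row per seek
sequential_rows_per_sec = sequential_bytes_per_sec / row_size_bytes

speedup = sequential_rows_per_sec / random_rows_per_sec
print(f"sequential is ~{speedup:.0f}x faster")                # roughly 7000x
```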
This isn’t just Disk
L3
L2
L1
Random RAM ~ Sequential Disk
10-100x
Files
We can write sequentially to a file quickly
Reading Efficiently
Scan
Position & Scan (pages)
Avoid Random Reads
Writing Tradeoffs
Append-Only Journal (Sequential IO)
Update-in-Place Ordered File (Random IO)
v2
v1
v2
v1
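The v1/v2 trade-off above can be sketched as two tiny stores; class names are illustrative, and in-memory lists/dicts stand in for on-disk structures:

```python
class AppendOnlyJournal:
    """Sequential IO: every write, including updates, goes to the tail."""
    def __init__(self):
        self.entries = []                    # stands in for an append-mode file

    def put(self, key, value):
        self.entries.append((key, value))    # v1, v2, ... accumulate in order

    def get(self, key):
        # Latest version wins: scan backwards for the most recent write.
        for k, v in reversed(self.entries):
            if k == key:
                return v
        return None


class OrderedFile:
    """Random IO: keep keys ordered and overwrite each value in place."""
    def __init__(self):
        self.rows = {}                       # stands in for a sorted file

    def put(self, key, value):
        self.rows[key] = value               # seek to the key's page, overwrite

    def get(self, key):
        return self.rows.get(key)
```

Writes to the journal are fast (pure appends) but reads must search for the latest version; the ordered file pays a random write per update but reads are direct.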
Supporting Lookups
Add Indexes for Selectivity
bob
dave fred harry mike steve vince
Index
Heap file
Goodbye Sequential Write Performance
bob
dave fred harry mike steve vince
Random IO
Sequential IO
Option A: Put Index in Memory
RAM
Disk
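A minimal sketch of Option A, assuming a Bitcask-style design (the approach used by one of Riak's storage backends): a RAM-resident hash maps each key to the byte offset of its latest record in an append-only log, so writes stay sequential and each read costs one seek. Names are illustrative; a `BytesIO` stands in for the disk file:

```python
import io

class MemoryIndexedLog:
    def __init__(self):
        self.log = io.BytesIO()     # stands in for the on-disk append-only file
        self.index = {}             # RAM: key -> (offset, length) of latest record

    def put(self, key: str, value: bytes):
        offset = self.log.seek(0, io.SEEK_END)    # sequential write at the tail
        record = key.encode() + b"\x00" + value + b"\n"
        self.log.write(record)
        self.index[key] = (offset, len(record))   # random update, but in memory

    def get(self, key: str):
        if key not in self.index:
            return None
        offset, length = self.index[key]
        self.log.seek(offset)                     # single disk seek per read
        record = self.log.read(length)
        return record.split(b"\x00", 1)[1].rstrip(b"\n")
```

The catch, as the slide implies, is that the whole keyset must fit in RAM.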
Option B: Use a chronology of small index files
Writes
batch up
sort
write to disk
older files
small index file
…with tricks to optimise out the need for random IO
RAM
Disk
file metadata & bloom filter
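One such trick is the Bloom filter kept in RAM per index file: before paying a disk read, ask the filter; "definitely not here" skips the file entirely, while "maybe" costs one read that might still miss. A minimal sketch (sizes and hash scheme are illustrative):

```python
import hashlib

class BloomFilter:
    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = 0                          # an int used as a bit array

    def _positions(self, key: str):
        # Derive k independent bit positions from one cryptographic hash.
        for i in range(self.num_hashes):
            h = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.size

    def add(self, key: str):
        for pos in self._positions(key):
            self.bits |= (1 << pos)

    def might_contain(self, key: str) -> bool:
        # No false negatives; occasional false positives.
        return all(self.bits & (1 << pos) for pos in self._positions(key))
```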
Log Structured Merge Trees
• A collection of small, immutable indexes
• Append only, de-duplicate by merging files
• Low memory index structures increase read performance
Shifts the Random Access problem from a “write” concern to a “read” concern
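The “de-duplicate by merging files” step above can be sketched as a sequential merge of two sorted runs, where the newer run wins on key collisions (a simplified compaction, ignoring tombstones):

```python
def merge_runs(newer, older):
    """Sequentially merge two sorted (key, value) runs into one;
    on a duplicate key the newer run's value survives."""
    out, i, j = [], 0, 0
    while i < len(newer) and j < len(older):
        nk, ok = newer[i][0], older[j][0]
        if nk < ok:
            out.append(newer[i]); i += 1
        elif nk > ok:
            out.append(older[j]); j += 1
        else:                                  # same key: keep newer, drop older
            out.append(newer[i]); i += 1; j += 1
    out.extend(newer[i:])                      # drain whichever run remains
    out.extend(older[j:])
    return out
```

Because both inputs and the output are read and written in order, compaction itself is sequential IO, which is the whole point of the structure.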
Option C: Brute Force
A B C
A1 A2 A3 A4
B1
B2
B3 B4
C1
C2
C3
C4
‘column per file’ arrangement
same order for each file
Option C: Columnar
Merge Join
compressed columns
A1
A2 A3 A4
B1
B2
B3 B4
C1
C2
C3
C4
Brute Force, by Column
• Less IO: by column, compressed
• Held in row order => merge joins via rowid
• Predicates can operate on compressed data
• Late materialisation
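The column-per-file layout above can be sketched with one array per column, all in the same row order, so a row is just the values at one shared position. A predicate runs over a single column and yields rowids; only then are the other columns materialised (late materialisation), via a positional merge on the shared rowid. Data values here are made up for illustration:

```python
# One array per column, all in identical row order (rows 0..3).
col_a = ["x", "y", "x", "z"]
col_b = [10, 20, 30, 40]
col_c = [1.0, 2.0, 3.0, 4.0]

# The predicate touches only column A and yields matching rowids...
rowids = [i for i, v in enumerate(col_a) if v == "x"]

# ...and only then are the other columns materialised: a positional
# "merge join" keyed on the shared rowid.
rows = [(col_a[i], col_b[i], col_c[i]) for i in rowids]
print(rows)   # [('x', 10, 1.0), ('x', 30, 3.0)]
```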
Many of the most scalable technologies play to one of these core efficiencies
Riak, Mongo etc
RAM
Disk
Kafka
(“Queues are Databases”, Jim Gray, 1995)
HBase, Cassandra, RocksDB etc
LSM
Redshift etc, Parquet (Hadoop)
A B C
A1
A2 A3 A4
B1
B2
B3 B4
C1
C2
C3
C4
Parallelism
Partitioning & Replication
Partitioning - KV
K-V stores: single-endpoint query routing
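Single-endpoint routing in a hash-partitioned K-V store can be sketched as follows: the key alone determines the partition, so each lookup touches exactly one node. The partition count and node representation are illustrative:

```python
import hashlib

NUM_PARTITIONS = 4                                 # assumption for the sketch
nodes = [dict() for _ in range(NUM_PARTITIONS)]    # one dict per "node"

def partition_for(key: str) -> int:
    # Stable hash of the key picks the partition deterministically.
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_PARTITIONS

def put(key, value):
    nodes[partition_for(key)][key] = value         # routed to one node

def get(key):
    return nodes[partition_for(key)].get(key)      # same node answers reads
```

Because routing needs only the key, throughput scales with the number of partitions; the secondary-index caveat on the next slide is exactly that queries *without* the key must fan out to every partition.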
Partitioning - Batch
Divide and conquer
Partitioning: Concurrency Limits
Use of secondary indexes can limit concurrency at scale
Replication
Replication
• Replication provides one route out of this.
• Replicas isolate load -> scales out concurrency for general workloads.
• Obviously provides redundancy etc too.
• If async, trades off against consistency (CAP)
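The bullets above can be sketched with a leader plus read replicas: writes land on the leader, reads round-robin across replicas (the concurrency scale-out), and under async replication a replica can serve a stale value until it syncs (the CAP trade-off). All names here are illustrative:

```python
import itertools

leader = {}                                   # accepts all writes
replicas = [dict(), dict()]                   # serve all reads
_rr = itertools.cycle(range(len(replicas)))   # round-robin read routing

def write(key, value):
    leader[key] = value
    # Async: replication happens later; sync_replicas() models the catch-up.

def sync_replicas():
    for r in replicas:
        r.update(leader)

def read(key):
    # Load is spread across replicas; a lagging replica returns stale data.
    return replicas[next(_rr)].get(key)
```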
Atomicity & Ordering
These can be expensive
Solution: Avoid, Isolate or embrace disorder (Bloom etc)
Atomic (Mutable)
Immutable
Circling Synchronous, Mutable State
Trapped in the Persist & Query pattern… in a fully ACID world
Separating Paradigms - CQRS
Client
Command
Query
DB
DB
Denormalise / Precompute
DRUID
realtime node
history node
Query hits both
Operational / Analytic Bridge
DATA
Client
Client
Client
Mutable
Search
SQL
NoSQL
Stream
Immutable Views
denormalise
Lambda Architecture: Separating Stream & Batch
Stream layer (fast)
Batch Layer
Serving Layer
All your data
Query
Stream Data Platforms
All your data
Views
Client
Client
Kafka
Search
Columnar
Hadoop
Stream processor
Isolate consistency concerns, Leverage in-flight data, Promote immutable replicas
Sys 1
Sys 2
Sys 3
Stream
Things we Like
Treating state as an immutable chronology
time
Listening and reacting to things as they are written
Replaying things that happened before
history
Regenerate state
Enrich views
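Regenerating state from history is just a fold over the immutable chronology: re-read the log from offset 0 and reapply every event in its original order. A minimal sketch with made-up events:

```python
# An immutable chronology of (key, operation, payload) events.
log = [
    ("bob", "create", {"city": "London"}),
    ("bob", "update", {"city": "NYC"}),
    ("sue", "create", {"city": "Paris"}),
]

def regenerate_state(events):
    """Replay the log from the beginning; later events win per key."""
    state = {}
    for key, _op, payload in events:      # original write order matters
        state.setdefault(key, {}).update(payload)
    return state

print(regenerate_state(log))
# {'bob': {'city': 'NYC'}, 'sue': {'city': 'Paris'}}
```

The same replay, with a different fold function, is how enriched or denormalised views are built from the same log.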
Avoiding (or Isolating) the need to mutate
Mutable Immutable
Read-optimising the immutable
Denormalise
Primitive operations for Shards and Replicas (sync/async)
Being able to reason about time in an asynchronous world
Blending the utility of different tools in a single data platform
Sys 1
Sys 2
Sys 3
Stream
Thanks
slides available @ benstopford.com