When and why to move from an RDBMS to Cassandra, its quirks and limitations
Practical Cassandra
Vitalii ([email protected])
@tivv00
NoSQL key-value vs RDBMS – why and when
Cassandra architecture
Cassandra data model
Life without joins, or: HDD space is cheap today
Hardware requirements & deployment hints
RDBMS problems
Sometimes you reach the point where a single server can't cope
Relational Replication
Not write scalable
Data is not instantly visible
Sharding
No foreign keys or joins
No transactions
Reduced reliability (multiple servers)
Schema updates are a pain
Cassandra NoSQL
Master-Master Replication + Sharding in one package
Peer-to-peer architecture (no SPOF)
Easy cluster reconfiguration
Eventual consistency as a standard
All data in one record – no need to join
Flexible schema
Our data
We have intelligent Internet cache
Intelligent means we don't cache everything, or we would need Google's DC
It's still hundreds of millions of sites
And 10s of TB of packed data
Randomly updated
Analysis must be able to process all of this within hours
Cassandra ring (diagram: servers arranged in a ring, clients connecting to any of them)
Ring partitioner types
Order Preserving
Each server serves key range
Range queries possible
Read/Write/Disk space hot spots possible
Fixing a key range (rebalancing) is complex
Random
Data is smoothly distributed on servers
No range queries
No hot spots
Fixed key range
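A minimal sketch of how a random partitioner spreads rows: node names and the 0–99 token space are illustrative, not Cassandra's actual token range, but the mechanism (hash the key, walk the sorted token ring) is the same idea.

```python
import hashlib
from bisect import bisect_right

class Ring:
    def __init__(self, nodes):
        # nodes: {name: token}; tokens assumed evenly spread over the ring
        self.tokens = sorted((token, name) for name, token in nodes.items())

    def token_for(self, key):
        # hashing smooths the distribution regardless of key shape
        return int(hashlib.md5(key.encode()).hexdigest(), 16) % 100

    def node_for(self, key):
        # a key belongs to the first node whose token is >= its own
        token = self.token_for(key)
        idx = bisect_right([t for t, _ in self.tokens], token)
        return self.tokens[idx % len(self.tokens)][1]

ring = Ring({"node1": 25, "node2": 50, "node3": 75, "node4": 100})
owner = ring.node_for("some_row_key")  # deterministic, evenly spread
```

Hashing makes hot spots unlikely, but adjacent keys land on unrelated nodes, which is exactly why range queries are lost.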
Runtime CAP-solving
The whole thing is about replication
CAP: Consistency, Availability, Partition tolerance – choose two.
With Cassandra you can choose at runtime.
Runtime CAP-solving
Quorum read/write: consistent
Fast (fewer-replica) writes or reads: fast, less consistency
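The runtime trade-off boils down to the R + W > N rule: with N replicas, a read of R copies is guaranteed to overlap a write of W copies only when R + W exceeds N. A tiny sketch:

```python
# R + W > N: every read quorum then intersects every write quorum,
# so a read is guaranteed to see the latest acknowledged write.
def is_consistent(n_replicas, write_copies, read_copies):
    return read_copies + write_copies > n_replicas

N = 3
assert is_consistent(N, 2, 2)      # quorum write + quorum read
assert not is_consistent(N, 1, 1)  # fast write + fast read: may read stale
assert is_consistent(N, 3, 1)      # write ALL lets you read ONE
```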
Data model
Keyspaces – much like database in RDBMS
Column Families – storage element, like tables in RDBMS
Columns – you can have millions per row, names are flexible; still like columns in an RDBMS
Super Column – a column with structured content, superseded by composite columns
Twitter DB
Example
Users table: ID, Name, Birthday
Tweets table: UserID, TweetID, TweetContent
Twitter Keyspace
Users CF, Key: User ID
Columns: Name(Str), Birthday(Str)
Timeline CF, Key: User ID
Columns: <TweetID>(TweetContent)
Twitter DB
Example (alternative)
Users table: ID, Name, Birthday
Tweets table: UserID, TweetID, TweetContent
Twitter Keyspace
Data CF, Key: User ID
Columns: Name(Str), Birthday(Str), <TweetID>(TweetContent)
Example (data)
Users table:
ID | Name
1  | Tom
2  | John
Tweets table:
User | ID | Text
1    | 1  | Hello
1    | 2  | See me?
2    | 3  | See you!
Data CF:
Key | Data
1   | Name = Tom, T_1 = Hello, T_2 = See me?
2   | Name = John, T_3 = See you!
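The alternative layout above can be pictured as a map of wide rows, each row a sorted map of column name to value. A sketch (the `T_<id>` naming is the slide's convention, the dicts are just an illustration):

```python
# The "Data" CF as nested maps: one row per user, mixing static
# columns (Name) with dynamic tweet columns (T_<TweetID>).
data_cf = {
    "1": {"Name": "Tom", "T_1": "Hello", "T_2": "See me?"},
    "2": {"Name": "John", "T_3": "See you!"},
}

def tweets(row):
    # column names sort together, so tweet columns form one scannable range
    return {k: v for k, v in sorted(row.items()) if k.startswith("T_")}

print(tweets(data_cf["1"]))  # {'T_1': 'Hello', 'T_2': 'See me?'}
```

One row read returns both the user profile and the timeline, which is the whole point of denormalizing.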
Data model
You can have the same key in multiple column families
You can have different sets of columns for different keys in the same column family
You can query a range of columns for a key (columns are sorted), with pagination
You can have (and it's often useful to have) columns without values
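Since columns are kept sorted by name, a range query with pagination is just a binary search plus a bounded scan. A sketch under that assumption (`column_slice` and its parameters are illustrative, not a driver API):

```python
from bisect import bisect_left, bisect_right

def column_slice(row, start, end, page_size, after=None):
    # row: {column_name: value}; returns one page of the [start, end] slice
    names = sorted(row)
    lo = bisect_left(names, start if after is None else after)
    if after is not None and lo < len(names) and names[lo] == after:
        lo += 1  # resume strictly after the last column of the previous page
    hi = bisect_right(names, end)
    return [(n, row[n]) for n in names[lo:hi][:page_size]]

row = {"a": 1, "b": 2, "c": 3, "d": 4, "e": 5}
page1 = column_slice(row, "a", "e", 2)             # [('a', 1), ('b', 2)]
page2 = column_slice(row, "a", "e", 2, after="b")  # [('c', 3), ('d', 4)]
```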
ACID vs BASE
Super Heroes are good, but not scalable. So, what do we lose?
No Atomicity
You've got no transactions – no rollback
The maximum you get is an atomic update to a single row
A failed operation MAY still have been applied (that's why counters are not reliable)
Eventual Consistency
Cassandra has no central governor
This means no bottleneck
This also means no one knows whether the database as a whole is consistent
Regular repair is your friend!
No Isolation
All mutations are timestamped to restore order from chaotic arrival
You MUST have your clock synchronized
That's how operations are applied on the server :)
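The timestamp mechanism is last-write-wins reconciliation: mutations may arrive in any order, but each carries its client timestamp, so replicas converge on the highest-timestamped value. A minimal sketch:

```python
# Last write wins: a stored cell is a (timestamp, data) pair, and an
# incoming mutation replaces it only if its timestamp is newer.
def apply_mutation(stored, incoming):
    return incoming if incoming[0] > stored[0] else stored

cell = (100, "old")
cell = apply_mutation(cell, (300, "newest"))  # applied
cell = apply_mutation(cell, (200, "late"))    # arrived late, ignored
assert cell == (300, "newest")
```

This is why unsynchronized clocks silently drop writes: a client whose clock runs behind always "loses".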
Controlled Durability
Cassandra uses transaction log to ensure durability on single server
Durability of the whole database depends on both total number of replicas and write operation replication factor
Remember: single-server 99% uptime means only 36.6% (0.99^100) "full cluster working" uptime for 100 servers – most of the time you've got at least one server down!
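The arithmetic behind that claim, assuming independent server failures:

```python
# Probability that ALL 100 independent 99%-uptime servers are up at once.
per_server = 0.99
cluster = per_server ** 100
print(round(cluster, 3))  # 0.366 -> ~36.6% "full cluster" uptime
```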
Data querying
With SQL you simply ask.
You can easily scan the whole DB
Indexes may help
Any calculation is repeated each time
This can be slow on read
Data querying
With NoSQL you can't efficiently scan the whole db
No “group by” or “order by”
You must prepare your data beforehand
You have multiple copies of data
You must recalculate on application logic change
The precalculated reads are fast
Think about your queries in advance!
There is no “I'll simply add an index, some hints and my query will become fast”
Any index is created and maintained from application code
Cassandra now has secondary indexes, but they are much inferior to custom ones
What's wrong with secondary indexes
They work on fixed column names
They are consistent with data
This means they live near the data they index
This means they are distributed between nodes by row key, not by indexed column value
This means you need to ask every node to get a single value
What's wrong with secondary indexes
Node 1: rows A (phone=1), B (phone=3); local phone index: 1=A, 3=B
Node 2: rows C (phone=3), D (phone=5); local phone index: 3=C, 5=D
Node 3: rows E (phone=1), F (phone=5); local phone index: 1=E, 5=F
Node 4: rows G (phone=3), H (phone=7); local phone index: 3=G, 7=H
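The four-node picture above, as a scatter-gather sketch: each node only indexes its own rows, so one lookup fans out to every node (node names and the dict layout are illustrative).

```python
# Each node's local secondary index: row key -> indexed phone value.
nodes = {
    "node1": {"A": 1, "B": 3},
    "node2": {"C": 3, "D": 5},
    "node3": {"E": 1, "F": 5},
    "node4": {"G": 3, "H": 7},
}

def query_phone(phone):
    hits = []
    for name in sorted(nodes):              # one round-trip per node
        local_index = nodes[name]
        hits += [row for row, p in local_index.items() if p == phone]
    return sorted(hits)

print(query_phone(3))  # ['B', 'C', 'G'] -- three different nodes answered
```

Contrast with the custom `phone_directory` CF on the next slide, where the phone number is the row key and the ring routes the query to exactly one node.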
“Index” example
Column family people
Key: Fred [phone=2223355, phone2=4445566, fax=9998877]
Key: John [phone=4445566, mobile=099123456]
Column family phone_directory
Key: 2223355 [Fred]
Key: 4445566 [Fred, John]
Key: 9998877 [Fred]
Key: 099123456 [John]
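The pattern above in code: the application writes every person to both column families, keeping the index consistent itself. A sketch with plain dicts standing in for the two CFs (`add_person` is a hypothetical helper):

```python
# Application-maintained index: every write to `people` also updates
# `phone_directory`, which is keyed by the phone number itself.
people = {}
phone_directory = {}

def add_person(name, **numbers):
    people[name] = numbers
    for number in numbers.values():
        phone_directory.setdefault(number, []).append(name)

add_person("Fred", phone="2223355", phone2="4445566", fax="9998877")
add_person("John", phone="4445566", mobile="099123456")
print(phone_directory["4445566"])  # ['Fred', 'John'] -- one direct row read
```

Because the row key is the phone number, the lookup goes straight to the owning node instead of fanning out.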
“Join” example
Column family customer
Key: Boeing [email: [email protected]]
Key: Oracle [skype: java]
Column family orders
Key: 1 [customer: Boeing, total: 200m]
Key: 2 [customer: Oracle, total: 300m]
Key: 3 [customer: Boeing, total: 500m]
Column family customer_order_totals
Key: Boeing [1: 200m, 3: 500m]
Key: Oracle [2: 300m]
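The "join" above is precomputed at write time: every new order lands both in `orders` and in the customer's totals row, so reading all of a customer's orders is a single-row lookup. A sketch (dicts stand in for the CFs, `add_order` is a hypothetical helper):

```python
orders = {}
customer_order_totals = {}

def add_order(order_id, customer, total):
    # one logical write fans out to the base CF and the "join" CF
    orders[order_id] = {"customer": customer, "total": total}
    customer_order_totals.setdefault(customer, {})[order_id] = total

add_order(1, "Boeing", "200m")
add_order(2, "Oracle", "300m")
add_order(3, "Boeing", "500m")
print(customer_order_totals["Boeing"])  # {1: '200m', 3: '500m'}
```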
Peer-to-peer replication
Your operation can return OK even if it was not written to every replica
Hinted handoff will try to repair later
Even if your operation has failed, it may have been written to some replicas
This inconsistency won't be repaired automatically
These are drawbacks of the "no master" architecture
You need to repair regularly!
Tombstones and Repair
Delete events are recorded as tombstones to ensure that "before delete" data arriving late won't be used
Regular repair not only makes sure your data is replicated, but also that your deletes are replicated. If you don't repair, beware of ghosts!
Resources & Environment
Disk space requirements
Memory requirements
Native plugins & configuration
Disk estimations
Say, we've got 1TB of data
Replication factor 3 makes it 3TB
Data duplication makes it 12TB
Tombstone/repair space makes it 24TB
Backups make it 36TB
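The estimate as arithmetic; the multipliers are the slide's rules of thumb for this deployment, not fixed Cassandra constants:

```python
raw = 1                       # TB of actual data
replicated = raw * 3          # replication factor 3      ->  3 TB
duplicated = replicated * 4   # internal data duplication -> 12 TB
with_repair = duplicated * 2  # tombstone/repair headroom -> 24 TB
with_backup = with_repair + duplicated  # one backup copy -> 36 TB
print(with_backup)  # 36
```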
Memory estimations
Cassandra has certain in-memory structures whose size is linear in the amount of data
Key and Row caches – configured at column family level. Change defaults if you've got a lot of CFs
Bloom filters and key samples cache are configured globally in latest versions
Estimate a minimum of ~0.5% of your data size in RAM
Native specifics
Cassandra (like many other large things) likes JNA. Please install it.
Cassandra maps files into memory – the Cassandra process's virtual and resident memory size will grow because of mmap.
Default heap sizes are large – tame them if Cassandra is not the only task on the host