High Throughput Analytics with Cassandra & Azure
Charles Lamanna, Principal Dev Lead
@clamanna
MetricsHub: keep cloud services up and running for the lowest possible cost
• Live Status
• Cost Awareness
• Alerts and Notifications
• Actions and Scaling
2000+ customers in 6 months
(Chart: number of customers, axis 0 to 2,500, from 10/18/2012 to 6/25/2013)
Storing data
• 200M data points per hour
• 80,000 data points per second (peak)
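As a sanity check on the two ingestion figures, a short Python calculation converts the hourly volume into a per-second average and compares it with the quoted peak. The constant and function names are illustrative only; the numbers are the ones on the slide.

```python
# Back-of-the-envelope check of the ingestion figures quoted above.
POINTS_PER_HOUR = 200_000_000   # "200M data points per hour"
PEAK_POINTS_PER_SEC = 80_000    # "80,000 data points per second (peak)"

def average_rate_per_sec(points_per_hour: int) -> float:
    """Convert an hourly volume into a per-second average."""
    return points_per_hour / 3600

avg = average_rate_per_sec(POINTS_PER_HOUR)
peak_to_avg = PEAK_POINTS_PER_SEC / avg

print(f"average: {avg:,.0f} points/s")    # average: 55,556 points/s
print(f"peak/average: {peak_to_avg:.2f}")  # peak/average: 1.44
```

So the peak is roughly 1.44 times the hourly average, which is the headroom the cluster has to absorb.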
Planning for huge data ingestion rates
• Requires high-scale, real-time data: 1,000 data points per minute per VM; 12 data points per endpoint per minute
• Aggregate, analyze and take action based on this data stream (in near real time)
• Must be cheap, scalable and reliable
Evaluated several technologies
• Aggregation in memory: good performance, bad COGS
• Rolling tables for aggregates: good tooling/support, hard to scale
• Aggregation on write: easy to scale and good COGS
Cassandra upside
Scales fluidly
• Grows horizontally: double the nodes, double the capacity
• Add or remove capacity (nodes) with no downtime
Highly available
• No single point of failure
• Replication factor (i.e. the number of hot copies) is just a config switch
…and by the way: little-to-no operations cost
• New nodes take minutes to set up
• Nodes just keep running for months on end
“Aggregate on write”: no jobs required!
• Distributed counters make it easy to aggregate on write
…and a nice kicker: *great* perf/COGS in Azure
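"Double the nodes, double the capacity" follows from how Cassandra spreads partition keys over a token ring: each node owns a slice of the ring, so adding nodes shrinks every node's slice. The sketch below is a simplified model, not Cassandra's code: it assumes MD5 hashing as a stand-in partitioner, evenly spaced tokens and no virtual nodes, and the key names are hypothetical.

```python
import bisect
import hashlib
from collections import Counter

RING_SIZE = 2**64  # simplified token space: [0, 2**64)

def token_for(key: str) -> int:
    """Hash a partition key onto the ring (MD5 as a stand-in for a real partitioner)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")

def build_ring(num_nodes: int) -> list[int]:
    """One evenly spaced token per node (no virtual nodes in this sketch)."""
    step = RING_SIZE // num_nodes
    return [i * step for i in range(num_nodes)]

def owner(ring: list[int], key: str) -> int:
    """A node owns keys between the previous node's token and its own, wrapping at the end."""
    return bisect.bisect_left(ring, token_for(key)) % len(ring)

def key_share(num_nodes: int, num_keys: int = 20_000) -> dict[int, float]:
    """Fraction of sample partition keys owned by each node."""
    ring = build_ring(num_nodes)
    counts = Counter(owner(ring, f"customer-{i}/cpu") for i in range(num_keys))
    return {node: counts[node] / num_keys for node in range(num_nodes)}

for n in (4, 8):
    shares = key_share(n)
    print(n, "nodes:", [round(s, 3) for s in shares.values()])
```

With 4 nodes each share comes out near 0.25 and with 8 near 0.125: doubling the nodes halves the data (and load) per node. Real clusters use Cassandra's own partitioner and, typically, virtual nodes, but the ownership idea is the same.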
Architecture
68 virtual machines (PaaS and IaaS)
(Architecture diagram. Clients: end-user web browsers, monitored customer resources (e.g. websites, SQL databases) and monitored virtual machines. PaaS services: Portal Web Role (3 instances), Web API Web Role (8 instances), Jobs Worker Role (24 instances), Table Storage and SQL Database. IaaS: Cassandra VM cluster (32 XL instances) behind endpoints, with data replicated in multiple datacenters.)
Avoiding state
• Application logic / code all lives on stateless machines
• Keeps it simple: decreases human operations cost
• Use Azure PaaS offerings (Web and Worker roles)
(Architecture diagram repeated; this version also shows Blob storage alongside Table Storage and SQL Database.)
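The "avoiding state" principle means any role instance can serve any request, so instances can be recycled or added freely. A minimal Python sketch, assuming a plain dictionary as a stand-in for the external state tier (Cassandra, Table Storage, SQL Database); the class names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class SharedStore:
    """Hypothetical stand-in for the external state tier."""
    data: dict[str, int] = field(default_factory=dict)

    def increment(self, key: str, amount: int) -> None:
        self.data[key] = self.data.get(key, 0) + amount

class StatelessWorker:
    """A role instance that keeps nothing between requests."""
    def __init__(self, name: str, store: SharedStore) -> None:
        self.name = name
        self.store = store  # a handle to shared state, not state of its own

    def handle(self, metric: str, value: int) -> None:
        self.store.increment(metric, value)

store = SharedStore()
workers = [StatelessWorker(f"worker-{i}", store) for i in range(3)]
samples = [("cpu", 40), ("cpu", 60), ("mem", 512), ("cpu", 50)]
for i, (metric, value) in enumerate(samples):
    workers[i % len(workers)].handle(metric, value)  # any instance can take any request

# Replacing every instance loses nothing, because no worker held state.
workers = [StatelessWorker("worker-new", store)]
workers[0].handle("cpu", 30)
print(store.data)  # {'cpu': 180, 'mem': 512}
```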
Azure Cloud Services (PaaS)
• Scale horizontally (grew from 1 to 30+ instances)
• Managed by the platform (patching, coordinated recycling, failover, etc.)
• 1-click deployment from Visual Studio (with automatic load balancer swaps)
Cassandra Cluster (IaaS)
• Maintains all state for metrics / time series data
• 32 XL Linux virtual machines
32 nodes, 8 “pods” of 4 nodes; each pod is exposed via a single endpoint
Exposing the pods
• Each pod of 4 nodes has a single load-balanced endpoint
• Clients (on our stateless roles) treat the endpoints as a pool
• A client blacklists and skips an endpoint if it starts producing a lot of errors
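The client-side pool described above could look roughly like this Python sketch. Round-robin selection, the error threshold, the time window and the cooldown are all assumptions for illustration (the slide only says endpoints producing "a lot of errors" are skipped), and `EndpointPool` is a hypothetical name.

```python
import time
from collections import deque

class EndpointPool:
    """Round-robin pool of pod endpoints that temporarily skips failing ones.

    Hypothetical policy: an endpoint is blacklisted after `max_errors` failures
    within `window_s` seconds and becomes eligible again after `cooldown_s` seconds.
    """

    def __init__(self, endpoints, max_errors=5, window_s=10.0, cooldown_s=30.0,
                 clock=time.monotonic):
        self.endpoints = list(endpoints)
        self.max_errors = max_errors
        self.window_s = window_s
        self.cooldown_s = cooldown_s
        self.clock = clock
        self.errors = {ep: deque() for ep in self.endpoints}
        self.blacklisted_until = {ep: 0.0 for ep in self.endpoints}
        self._next = 0

    def pick(self) -> str:
        """Return the next endpoint that is not currently blacklisted."""
        now = self.clock()
        for _ in range(len(self.endpoints)):
            ep = self.endpoints[self._next]
            self._next = (self._next + 1) % len(self.endpoints)
            if now >= self.blacklisted_until[ep]:
                return ep
        raise RuntimeError("all endpoints are blacklisted")

    def report_error(self, ep: str) -> None:
        """Record a failure; blacklist the endpoint if it errs too often."""
        now = self.clock()
        errs = self.errors[ep]
        errs.append(now)
        while errs and now - errs[0] > self.window_s:
            errs.popleft()  # forget failures outside the window
        if len(errs) >= self.max_errors:
            self.blacklisted_until[ep] = now + self.cooldown_s
            errs.clear()

# Usage with a fake clock so the behaviour is deterministic.
fake_now = [0.0]
pool = EndpointPool(["pod1", "pod2", "pod3"], max_errors=3, clock=lambda: fake_now[0])
for _ in range(3):
    pool.report_error("pod2")
picks = [pool.pick() for _ in range(4)]
print(picks)   # ['pod1', 'pod3', 'pod1', 'pod3']
fake_now[0] = 31.0  # cooldown has expired
later = [pool.pick(), pool.pick()]
print(later)   # ['pod1', 'pod2']
```

The cooldown lets a pod that recovers (for example after a node restart) rejoin the pool without any manual step.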
Where does the data go?
• Data files are on 16 mounted network backed disks (*not* ephemeral disks)
• Data disks are geo-replicated (3 copies local; 1 remote) for “free” DR
• Azure data disks offer great throughput (VMs end up CPU bound)
Our Column Families (CQL 3)
CREATE TABLE oneminute (
  rk text, ck text,            -- partition key, clustering key
  cnt counter, sum counter,    -- running count and sum of samples
  PRIMARY KEY (rk, ck)
);
Updating values…
Real-time “average” values (sum / cnt) at any granularity, for any time window
-- run once per table: oneminute, tenminute and oneday
-- counter increments must be integers
UPDATE oneminute
SET sum = sum + {sample_value}, cnt = cnt + 1
WHERE rk = '{customer+metric}' AND ck = '{tags_and_timestamp}';
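To show what "aggregate on write" buys, here is a Python model of the three counter tables: every sample increments `sum` and `cnt` in each table, and an average for any bucket is just `sum / cnt` at read time, with no batch job. Dictionaries stand in for Cassandra counters; the bucket widths, the `tags|bucket` clustering-key layout and the customer name `contoso` are hypothetical.

```python
from collections import defaultdict

# Bucket widths (seconds) for the three counter tables named on the slide.
GRANULARITIES = {"oneminute": 60, "tenminute": 600, "oneday": 86_400}

# tables[table][(rk, ck)] -> {"sum": ..., "cnt": ...}; like counter columns,
# missing rows behave as 0 and are only ever incremented.
tables = {name: defaultdict(lambda: {"sum": 0, "cnt": 0}) for name in GRANULARITIES}

def clustering_key(tags: str, ts: int, width: int) -> str:
    """Hypothetical ck layout: tags plus the zero-padded bucket start, so keys sort by time."""
    bucket = ts - ts % width
    return f"{tags}|{bucket:012d}"

def record(rk: str, tags: str, ts: int, value: int) -> None:
    """Aggregate on write: one counter-style update per table, no batch job afterwards."""
    for table, width in GRANULARITIES.items():
        row = tables[table][(rk, clustering_key(tags, ts, width))]
        row["sum"] += value  # SET sum = sum + {sample_value}
        row["cnt"] += 1      #     cnt = cnt + 1

def average(table: str, rk: str, ck: str):
    """Read-time average = sum / cnt; None if the bucket has no samples."""
    row = tables[table].get((rk, ck))
    return row["sum"] / row["cnt"] if row and row["cnt"] else None

# Three CPU samples (integer percent) for the hypothetical customer "contoso".
for ts, cpu in [(0, 40), (30, 60), (90, 80)]:
    record("contoso:cpu", "vm1", ts, cpu)

print(average("oneminute", "contoso:cpu", clustering_key("vm1", 0, 60)))   # 50.0
print(average("oneminute", "contoso:cpu", clustering_key("vm1", 90, 60)))  # 80.0
print(average("tenminute", "contoso:cpu", clustering_key("vm1", 0, 600)))  # 60.0
```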
Reading values…
*ONE* round trip to fetch a metric over time (e.g. CPU over the past week)
SELECT * FROM oneminute
WHERE rk = '{customer_name}' AND ck < '{metric_path_start}' AND ck >= '{metric_path_end}'
ORDER BY ck DESC;
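The read is a single round trip because all rows for one `rk` live in one partition, sorted by `ck`, so the query is one contiguous slice read in reverse. This Python sketch mimics that slice with a sorted list and `bisect`; the clustering-key values and the `(ck, sum, cnt)` rows are hypothetical sample data.

```python
import bisect

# One partition of the oneminute table: rows kept sorted by clustering key.
partition = sorted([
    ("cpu|vm1|000000000000", 100, 2),  # (ck, sum, cnt)
    ("cpu|vm1|000000000060", 150, 3),
    ("cpu|vm1|000000000120", 90, 1),
    ("cpu|vm1|000000000180", 200, 4),
])
keys = [row[0] for row in partition]

def range_desc(start: str, end: str):
    """Rows with end <= ck < start, newest first (ck < start AND ck >= end ORDER BY ck DESC)."""
    lo = bisect.bisect_left(keys, end)    # first ck >= end
    hi = bisect.bisect_left(keys, start)  # first ck >= start (excluded)
    return list(reversed(partition[lo:hi]))

rows = range_desc("cpu|vm1|000000000180", "cpu|vm1|000000000060")
print([(ck, s / c) for ck, s, c in rows])
# [('cpu|vm1|000000000120', 90.0), ('cpu|vm1|000000000060', 50.0)]
```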
Some hard lessons…
• Static private IPs are a must; otherwise, reboots / outages can confuse the cluster when nodes come back up
• Monitor performance carefully; once you tip over, it is hard to rebalance the cluster and add new nodes
• Fit the cluster to the platform: in Azure, match the Upgrade Domains / Fault Domains to preserve uptime during service maintenance / hardware failure
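Why the domain layout matters: if all replicas of a token range sit in the same Upgrade Domain or Fault Domain, one maintenance walk or hardware failure can take the whole range offline. The Python sketch below assumes SimpleStrategy-style placement (owner plus the next RF-1 ring nodes), 32 nodes, RF = 3 and a hypothetical 4 domains, and compares a naive contiguous layout with one that rotates ring-adjacent nodes through the domains. In Cassandra itself this is usually expressed by mapping domains to racks with a rack-aware snitch.

```python
from collections import Counter

def contiguous(num_nodes: int, num_domains: int) -> list[int]:
    """Naive layout: consecutive blocks of ring-adjacent nodes share a domain."""
    size = num_nodes // num_domains
    return [min(i // size, num_domains - 1) for i in range(num_nodes)]

def interleaved(num_nodes: int, num_domains: int) -> list[int]:
    """Ring-adjacent nodes rotate through the domains."""
    return [i % num_domains for i in range(num_nodes)]

def max_replicas_in_one_domain(domains: list[int], rf: int) -> int:
    """Worst case over all ranges, with replicas on the owner plus the next rf-1 ring nodes."""
    n = len(domains)
    worst = 0
    for first in range(n):
        replicas = [(first + k) % n for k in range(rf)]
        worst = max(worst, max(Counter(domains[r] for r in replicas).values()))
    return worst

NODES, DOMAINS, RF = 32, 4, 3
QUORUM = RF // 2 + 1
results = {}
for layout in (contiguous, interleaved):
    lost = max_replicas_in_one_domain(layout(NODES, DOMAINS), RF)
    results[layout.__name__] = lost
    # One domain down: QUORUM still succeeds only if enough replicas remain.
    print(layout.__name__, lost, RF - lost >= QUORUM)
# contiguous 3 False
# interleaved 1 True
```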
Single-node tests
• 4 disks, RAID 0, no read cache
Workload (% write)   Ops/sec   Latency median   Latency 95th   Latency 99th
100%                 20018     1.5              3.7            7.9
75%                  8361      85.9             376.6          584.8
25%                  5412      459.9            759.1          940.1
• 4 disks, RAID 0, read cache (three sets of results per workload)
Workload (% write)   Ops/sec   Latency median   Latency 95th   Latency 99th
100%                 19208     1.5              3.8            7.9
100%                 18543     1.5              3.6            7.9
100%                 18563     1.4              3.6            8.2
75%                  7112      195.9            595.8          1099.6
75%                  7581      168.9            589.5          985.2
75%                  5149      256.5            774.0          1402.9
25%                  15358     23.0             110.2          309.1
25%                  3742      279.2            563.0          789.7
25%                  15376     22.1             98.8           293.3
JBOD vs RAID 0 for read-heavy workload
(Chart comparing JBOD and RAID 0; y-axis from 0 to 7,000)
Workload (% write)   Ops/sec   Latency median   Latency 95th   Latency 99th
100%                 13638     1.9              4.9            24.0
75%                  3239      11.2             687.0          1099.3
25%                  1825      243.6            687.0          808.7
Multi-node load tests
• 4 nodes; RF = 3 (quorum)
• 8 disks, RAID 0
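For the RF = 3 quorum setup above, the arithmetic is small but worth spelling out: QUORUM needs a majority of replicas (RF / 2 + 1, rounded down), so one replica can be down, and quorum reads overlap quorum writes because R + W > RF. A short Python sketch; the function names are illustrative.

```python
def quorum(rf: int) -> int:
    """Replicas that must answer at QUORUM: a majority of the replication factor."""
    return rf // 2 + 1

def tolerated_failures(rf: int) -> int:
    """Replicas that can be down while QUORUM reads and writes still succeed."""
    return rf - quorum(rf)

def reads_see_latest_write(rf: int, write_replicas: int, read_replicas: int) -> bool:
    """Read and write replica sets are guaranteed to overlap when R + W > RF."""
    return read_replicas + write_replicas > rf

rf = 3
print(quorum(rf), tolerated_failures(rf))                  # 2 1
print(reads_see_latest_write(rf, quorum(rf), quorum(rf)))  # True
```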