Sizing Your Couchbase Cluster: Couchbase Connect 2015

HOW MANY NODES? PROPERLY SIZING YOUR COUCHBASE CLUSTER
Perry Krug, Sr. Solutions Architect

©2015 Couchbase Inc. 3

Size Couchbase Server

Sizing == performance
  Serve reads out of RAM
  Enough IO for writes and disk operations
  Mitigate inevitable failures

[Diagram: Reading Data. The Application Server asks "Give me document A" and Couchbase Server answers "Here is document A".]

[Diagram: Writing Data. The Application Server asks "Please store document A" and Couchbase Server answers "OK, I stored document A".]

Scaling out permits matching of aggregate flow rates so queues do not grow

[Diagram: three Application Servers, each connected over the network to one of three Couchbase Servers]

5 Factors of Sizing

How many nodes?

5 Key Factors determine number of nodes needed:

1) RAM
2) Disk
3) CPU
4) Network
5) Data Distribution/Safety

(per-bucket, multiple buckets aggregate)

[Diagram: Application user, Web application server, Couchbase Servers]

RAM sizing

1) Total RAM
  Managed document cache:
    Working set
    Metadata
    Active+Replicas
  Index caching (I/O buffer)

Keep working set in RAM for best read performance

[Diagram: Reading Data. The Application Server asks "Give me document A" and the Server answers "Here is document A".]

Working set depends on your application

Late stage social game
  Many users no longer active; few logged in at any given time.
  working/total set = .01

Ad Network
  Any cookie can show up at any time.
  working/total set = 1

Business application
  Users logged in during the day. Day moves around the globe.
  working/total set = .33
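The working/total set ratio feeds directly into a RAM estimate. The following is a minimal sketch, not the official sizing formula: it assumes metadata for every item stays in RAM, that the cached data is the working-set fraction of all copies (active plus replicas), and it uses illustrative defaults for per-item metadata overhead (56 bytes), headroom (25%) and the usable fraction of the quota (85%). None of those defaults come from the slides, and the function name is hypothetical.

```python
def ram_needed_gb(items, avg_value_bytes, avg_key_bytes, working_set_ratio,
                  replicas=1, metadata_bytes_per_item=56,
                  headroom=0.25, usable_fraction=0.85):
    """Rough cluster-wide RAM estimate in GB. All defaults are assumptions."""
    copies = 1 + replicas                          # active + replicas
    metadata = items * (metadata_bytes_per_item + avg_key_bytes) * copies
    dataset = items * avg_value_bytes * copies
    working_set = dataset * working_set_ratio      # portion that should stay cached
    total_bytes = (metadata + working_set) * (1 + headroom) / usable_fraction
    return total_bytes / 1024 ** 3


if __name__ == "__main__":
    # Same hypothetical dataset, three different working-set ratios.
    for name, ratio in [("late stage social game", 0.01),
                        ("business application", 0.33),
                        ("ad network", 1.0)]:
        gb = ram_needed_gb(100_000_000, 2048, 32, ratio)
        print(f"{name}: {gb:.1f} GB")
```

With the dataset held constant, the ad network case needs far more RAM than the late stage social game, which is why the working set has to be estimated per application.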

RAM Sizing - View/Index cache (disk I/O)

File system cache availability for the index has a big impact on performance:

Test runs based on 10 million items with a 16GB bucket quota and 4GB or 8GB of system RAM available for indexes

Performance results show that doubling system cache availability cuts query latency in half and increases throughput by 50%

Leave RAM free with quotas

Disk Sizing: Space and I/O

2) Disk
  I/O:
    Sustained write rate
    Rebalance capacity
    Backups
    XDCR
    Views/Indexes
    Compaction
  Space:
    Total dataset: (active + replicas + indexes)
    Append-only

[Diagram: Writing Data. The Application Server asks "Please store document A" and the Server answers "OK, I stored document A".]

Disk Sizing: Space and I/O

Disk writes are buffered
  Bursts of data expand the disk write queue
  Sustained writes need corresponding throughput
Disk throughput affected by disk speed
  SSD > 10K RPM > EBS
  SSDs give a huge boost to write throughput and startup/warmup times
  RAID can provide redundancy and increase throughput
Throughput = read/write + compaction + indexing + XDCR
2.1 introduces multiple disk threads
  Default is 3 (1 writer / 2 readers), max is 8 combined
Best to configure different paths for data and indexes
Plan on about 3x space (append-only, compaction, backups, etc.)
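The "about 3x space" guideline and the write-queue behaviour can be turned into a quick back-of-the-envelope check. This is a minimal sketch: the function names and inputs are hypothetical, and only the 3x multiplier and the active + replicas + indexes breakdown come from the slides.

```python
def disk_space_needed_gb(active_gb, replicas=1, index_gb=0.0, space_factor=3.0):
    """Total dataset (active + replicas + indexes) times the ~3x planning factor."""
    total_dataset = active_gb * (1 + replicas) + index_gb
    return total_dataset * space_factor


def write_queue_grows(incoming_writes_per_sec, disk_writes_per_sec):
    """Sustained writes above what the disk can drain make the write queue grow."""
    return incoming_writes_per_sec > disk_writes_per_sec


if __name__ == "__main__":
    # 100 GB of active data, one replica, 20 GB of indexes
    print(disk_space_needed_gb(100, replicas=1, index_gb=20))
    print(write_queue_grows(50_000, 30_000))
```

A short burst above the disk's drain rate is absorbed by the queue; it is the sustained rate that has to be matched by disk throughput.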

CPU sizing

3) CPU
  Disk writing
  Views/compaction/XDCR
  RAM r/w performance not impacted
  Min. production requirement: 4 cores + 1 per bucket + 1 core per Design Doc + 1 core per XDCR stream
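The minimum-cores rule is simple arithmetic. A minimal sketch, with a hypothetical function name; the base is left as a parameter because the Hardware Minimums slide later in this deck quotes 8 cores rather than 4.

```python
def min_cores(buckets, design_docs, xdcr_streams, base_cores=4):
    """Base cores + 1 per bucket + 1 per design document + 1 per XDCR stream."""
    return base_cores + buckets + design_docs + xdcr_streams


if __name__ == "__main__":
    # 2 buckets, 3 design documents, 1 XDCR stream
    print(min_cores(2, 3, 1))                # 4-core base
    print(min_cores(2, 3, 1, base_cores=8))  # 8-core base
```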

Network sizing

4) Network
  Client traffic
  Replication (writes)
  Rebalancing
  XDCR

[Diagram: Application Servers send reads and writes over the network to Couchbase Servers; replication (which multiplies writes) and rebalancing traffic flows between the Couchbase Servers.]
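Because replication multiplies writes, steady-state bandwidth can be approximated from the operation rates. A rough sketch under stated assumptions: every operation moves one average-sized document, each write is sent once per replica, and rebalancing, XDCR and protocol overhead are ignored. The function name and parameters are hypothetical.

```python
def network_mbps(reads_per_sec, writes_per_sec, avg_doc_bytes, replicas=1):
    """Approximate cluster-wide traffic in megabits per second."""
    client = (reads_per_sec + writes_per_sec) * avg_doc_bytes   # reads + writes
    replication = writes_per_sec * replicas * avg_doc_bytes     # writes multiplied
    return (client + replication) * 8 / 1_000_000


if __name__ == "__main__":
    # 10k reads/s, 5k writes/s, 1000-byte documents, one replica
    print(network_mbps(10_000, 5_000, 1_000, replicas=1))
```

Since the total is spread across all nodes, dividing by the node count gives a per-node figure to compare against NIC capacity.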

Network Considerations

Low latency, high throughput (LAN) within cluster
Eliminate router hops:
  Within cluster nodes
  Between clients and cluster
Check who else is sharing the network
Increase bandwidth by:
  Adding more nodes (will scale linearly)
  Upgrading routers/switches/NICs/etc.

Data Distribution

5) Data Distribution / Safety (assuming one replica):
  1 node = Single point of failure
  2 nodes = +Replication
  3+ nodes = Best for production
    Autofailover
    Upgrade-ability
    Further scale-ability

Note: Many applications will need more than 3 nodes

Servers fail, be prepared. The more nodes, the less impact a failure will have.
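The point about failure impact can be made concrete. A minimal sketch, assuming data and load are spread evenly across nodes (the function names are hypothetical): losing one of N nodes affects 1/N of the active data, and the surviving nodes each pick up 1/(N-1) extra load.

```python
def failure_impact(nodes):
    """Fraction of active data served by one node, assuming even distribution."""
    if nodes < 1:
        raise ValueError("need at least one node")
    return 1 / nodes


def extra_load_after_failure(nodes):
    """Relative load increase on each surviving node after one node is lost."""
    if nodes < 2:
        raise ValueError("no surviving nodes to take over")
    return 1 / (nodes - 1)


if __name__ == "__main__":
    for n in (2, 3, 5, 10):
        print(n, f"{failure_impact(n):.0%}", f"{extra_load_after_failure(n):.0%}")
```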

How many nodes recap

5 Key Factors determine number of nodes needed:

1) RAM
2) Disk
3) CPU
4) Network
5) Data Distribution/Safety

(per-bucket, multiple buckets aggregate)

[Diagram: Application user, Web application server, Couchbase Servers]
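One way to read the recap: compute how many nodes each factor alone would require and take the largest, never going below the three nodes recommended for production. This is an interpretation, not a formula from the talk; the dictionary keys, units and example numbers are hypothetical.

```python
import math


def nodes_needed(required, per_node, minimum_nodes=3):
    """Return (node count, per-factor counts); the most demanding factor wins."""
    per_factor = {
        factor: math.ceil(required[factor] / per_node[factor])
        for factor in required
    }
    return max([minimum_nodes, *per_factor.values()]), per_factor


if __name__ == "__main__":
    required = {"ram_gb": 300, "disk_gb": 2000, "cores": 20, "network_mbps": 1600}
    per_node = {"ram_gb": 64, "disk_gb": 1000, "cores": 8, "network_mbps": 1000}
    total, breakdown = nodes_needed(required, per_node)
    print(total, breakdown)
```

In this example RAM is the limiting factor, which matches the later observation that dataset growth is the most common reason to add nodes.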

Deployment Considerations

Hardware Minimums

RAM: At least ~4GB (highly dependent on data set)

Disk: Fastest “local” storage available
  SSD is better
  RAID 0 or 10, not 5

CPU (minimums): 8 cores
  + 1 per bucket
  + 1 per design document
  + 1 per XDCR stream

Hardware requirements/recommendations are the intersection of what’s needed versus what’s available.

Hardware Considerations

Designed for commodity hardware
Scale out, not up…more, smaller nodes are better than fewer, larger ones (can scale up later)
Tested and deployed in EC2
Physical hardware offers best performance and efficiency
Certain considerations with using VMs:
  RAM use inefficient / Disk IO usually not as fast
  Local storage better than shared SAN
  1 Couchbase VM per physical host
  You will generally need more nodes
  Don’t overcommit

Couchbase in AWS

R3 or C3 instances best value for performance
  Higher RAM-to-CPU ratios
  Come with SSDs
Disk Choice:
  SSDs are best
  Ephemeral is okay
  Single EBS not great, use LVM/RAID
  Views/indexes on ephemeral, main data on EBS, or both on SSD
Backups:
  Use cbbackup locally on each node and migrate to EBS/S3
  Can use EBS snapshots

Couchbase in AWS

Deploy across AZs with rack/zone awareness
Use an EIP/public hostname instead of private IP:
  Easier connectivity from outside AWS
  Easier restoration/better availability
  Couchbase XDCR across regions must use hostname

In AWS as with any cloud/virtual deployment, you will likely need more nodes than you would with a physical infrastructure

Effects of…

Views/Indexes

Effect on scale/sizing:
  Increase the CPU and disk IO requirements
  More complex views require more CPU
  More view output requires more disk IO
  More RAM should be left out of the quota for better IO caching
Indication:
  Indexes significantly behind data writes (or growing delays)
What to do:
  Make sure you follow best practices in view writing
  Add more nodes to distribute processing “work”
  Look into SSDs

XDCR

Effect on scale/sizing:
  XDCR is CPU intensive
  Disk IO will double
  Memory needs to be sized accordingly (bi-directional may mean more data)
Indication:
  A rising XDCR queue on source
What to do:
  More nodes on source and destination will drain queue faster (scales linearly)
  Tune replication streams according to CPU availability

As your workload grows…

Effects on scale/sizing:
  More reads:
    • Individual documents will not be impacted (static working set)
    • Views may require faster disks, more disk IO caching
  More writes will increase disk IO needs
Indications:
  Cache miss ratio rising
  Growing disk write queue / XDCR queue
  Compaction not keeping up
What to do:
  Revise sizing calculations and add more nodes if needed

Most applications don’t need to scale the number of nodes based upon normal workload variation.

As your dataset grows…

Effects on scale/sizing:
  Your RAM needs will grow:
    Metadata needs increase with item count
    Is your working set increasing?
  Your disk space will likely grow (duh?)
Indications:
  Dropping resident ratio
  Rising ejections/cache miss ratio
What to do:
  Revise sizing calculations, add more nodes
  Remove un-needed data

This is the most common need for scaling and will most likely result in needing more nodes

Rebalancing

Yes, there is resource utilization during a rebalance, but a “properly” sized cluster should not see any effect on performance during a rebalance:
  Distribution of data and work across all nodes
  Managed caching layer separates RAM-based performance from IO utilization
  Rebalance automatically manages working set in RAM
  Rebalance automatically throttles itself if needed
  Can be stopped midway without endangering data or progress

Proper sizing includes not maxing out all resources: leave some headroom in preparation

Couchbase 4.0

Sizing Couchbase Server 4.0

Multi-Dimensional Scalability (MDS) – Optionally scale each service independently:
  Data
  Index
  Query

5 factors still apply:
  RAM
  Disk
  CPU
  Network
  Data Safety/Distribution

Sizing Couchbase Server 4.0 - Data

Data Service in 4.0 same as previous Couchbase Server:
  Enough RAM to cache reads
  Enough disk to eventually persist writes
  CPU primarily for Views and XDCR
  At least 3 nodes – Replication at the bucket level

Minimum requirements: 4GB RAM, 8 Cores CPU

Sizing Couchbase Server 4.0 - Index

Index service new to 4.0 (a.k.a. GSI or “Secondary Indexes”):
  Primarily RAM and Disk IO bound
  ForestDB persistence engine
  At least 2 nodes for HA, each index replicated individually

Minimum Requirements: 8GB RAM, 8 core CPU, “fast disk”

Note: 4.0 is still in beta, final sizing numbers are being formulated

Sizing Couchbase Server 4.0 - Query

Query Service new to 4.0 (a.k.a. N1QL):
  Primarily CPU bound
  Optimized for multi-core systems
  Very low RAM and disk requirements
  At least 2 nodes for HA – Queries automatically load balanced

Minimum Requirements: 4GB RAM, 16+ Core CPU

Note: 4.0 is still in beta, final sizing numbers are being formulated

Sizing Couchbase Server 4.0 - MDS

Multi-Dimensional Scalability (MDS)
  Option 1: All 3 services enabled on all nodes – Size for aggregate requirements (Data+Index+Query)
  Option 2: Separated services – Size nodes independently for different workloads, e.g.:

• Data Service: More nodes with more RAM, less disk, less CPU

• Index Service: Fewer nodes with more RAM, more disk, less CPU

• Query Service: Fewer nodes with less RAM, less disk, more CPU
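To compare the two options on paper, the per-service minimums quoted on the preceding slides (Data: 4GB RAM, 8 cores; Index: 8GB RAM, 8 cores; Query: 4GB RAM, 16 cores) can be combined. This is only a sketch: treating "aggregate requirements" as a straight sum of the minimums is an assumption, the names are hypothetical, and the deck itself notes that the 4.0 numbers were still being finalized.

```python
# Per-service minimums as quoted on the Data, Index and Query slides (4.0 beta).
SERVICE_MINIMUMS = {
    "data": {"ram_gb": 4, "cores": 8},
    "index": {"ram_gb": 8, "cores": 8},
    "query": {"ram_gb": 4, "cores": 16},
}


def colocated_minimum(services):
    """Option 1 (assumed): a node running several services needs the sum of their minimums."""
    return {
        "ram_gb": sum(SERVICE_MINIMUMS[s]["ram_gb"] for s in services),
        "cores": sum(SERVICE_MINIMUMS[s]["cores"] for s in services),
    }


if __name__ == "__main__":
    print("Option 1, all services per node:", colocated_minimum(["data", "index", "query"]))
    for service in SERVICE_MINIMUMS:  # Option 2, one service per node
        print("Option 2,", service, "node:", colocated_minimum([service]))
```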

Sizing Couchbase Server 4.0 - MDS

Independent Load Distribution
Modular Architecture to Construct the Database for Your Need
Pick HW Capacity – scale up and/or scale out
Pick Services Layout – overlap and/or isolate services
Pick Data/Index Partitioning

[Diagram: Couchbase Cluster, node1 through node8, running the Data Service, Index Service and Query Service]

Sizing is tricky business…

Work with the Couchbase Team

Validate your “on-paper” numbers with testing

Constantly monitor production

Dive in…

Gather your workload and dataset requirements:
  Item counts and sizes, read/write/delete ratios
Review our documentation and formulas
Test, Deploy, Monitor…rinse and repeat

Want more?

Lots of details and best practices in our documentation:

http://www.couchbase.com/docs/

And my sizing blog: http://blog.couchbase.com/how-many-nodes-part-1-introduction-sizing-couchbase-server-20-cluster

Get Started with Couchbase Server 4.0: www.couchbase.com/beta

Get Trained on Couchbase: training.couchbase.com

Thank you
perry@couchbase.com | @couchbase
