Sizing Your Couchbase Cluster: Couchbase Connect 2015

HOW MANY NODES? PROPERLY SIZING YOUR COUCHBASE CLUSTER

Perry Krug, Sr. Solutions Architect

Read this article:

http://blog.couchbase.com/how-many-nodes-part-1-introduction-sizing-couchbase-server-20-cluster

Size Couchbase Server

Sizing == performance:
• Serve reads out of RAM
• Enough IO for writes and disk operations
• Mitigate inevitable failures

[Diagram: Reading Data. Application Server: "Give me document A". Couchbase Server: "Here is document A".]

[Diagram: Writing Data. Application Server: "Please store document A". Couchbase Server: "OK, I stored document A".]

Scaling out permits matching of aggregate flow rates so queues do not grow

[Diagram: three Application Servers connected over the network to three Couchbase Servers]
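The flow-rate idea above can be sketched in a few lines. This is only an illustration: the function names and the example rates are assumptions, not figures from the talk, and it assumes every node sustains the same rate.

```python
import math


def nodes_for_flow(incoming_ops_per_sec: float, per_node_ops_per_sec: float) -> int:
    """Smallest node count whose aggregate rate covers the incoming rate."""
    if per_node_ops_per_sec <= 0:
        raise ValueError("per-node rate must be positive")
    return max(1, math.ceil(incoming_ops_per_sec / per_node_ops_per_sec))


def queue_growth_per_sec(incoming_ops_per_sec: float,
                         per_node_ops_per_sec: float,
                         nodes: int) -> float:
    """Net ops/sec added to the queues; 0 means the queues do not grow."""
    return max(0.0, incoming_ops_per_sec - per_node_ops_per_sec * nodes)


# Hypothetical workload: 50,000 ops/sec arriving, 12,000 ops/sec per node.
print(nodes_for_flow(50_000, 12_000))           # 5
print(queue_growth_per_sec(50_000, 12_000, 4))  # 2000: one node short, queues grow
```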

5 Factors of Sizing

How many nodes?

5 key factors determine the number of nodes needed (per-bucket, multiple buckets aggregate):
1) RAM
2) Disk
3) CPU
4) Network
5) Data Distribution/Safety

[Diagram: Application user, Web application server, Couchbase Servers]

RAM sizing

1) Total RAM:
• Managed document cache: working set, metadata, active + replicas
• Index caching (I/O buffer)

Keep the working set in RAM for best read performance

[Diagram: Reading Data. Application Server: "Give me document A". Server: "Here is document A".]

Working set depends on your application

• Late stage social game: many users no longer active; few logged in at any given time. working/total set = .01
• Ad network: any cookie can show up at any time. working/total set = 1
• Business application: users logged in during the day; the day moves around the globe. working/total set = .33
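To see how much the working-set ratio moves the RAM requirement, here is a rough sketch. It assumes RAM must hold metadata for every item plus the values of the working set, for active and replica copies. The 56 bytes of metadata per item, the 2 KB average value, and the 40-byte key are illustrative assumptions; check the sizing documentation for your version before relying on them.

```python
def ram_bytes(item_count: int,
              avg_value_bytes: int,
              avg_key_bytes: int,
              working_set_ratio: float,
              replicas: int = 1,
              metadata_bytes_per_item: int = 56) -> float:
    """Rough RAM needed for the managed document cache of one bucket."""
    copies = 1 + replicas  # active + replicas
    # Metadata and keys for every item (assumed to stay in RAM).
    metadata = item_count * (metadata_bytes_per_item + avg_key_bytes)
    # Only the working set's values are kept in RAM.
    working_set = item_count * avg_value_bytes * working_set_ratio
    return copies * (metadata + working_set)


# Hypothetical bucket: 10 million items, 2 KB values, 40-byte keys, 1 replica.
for label, ratio in [("social game", 0.01), ("business app", 0.33), ("ad network", 1.0)]:
    gib = ram_bytes(10_000_000, 2048, 40, ratio) / 2**30
    print(f"{label}: {gib:.1f} GiB")
```

The same dataset needs very different amounts of RAM depending on the application, which is why the working set has to be estimated per application rather than assumed.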


RAM Sizing - View/Index cache (disk I/O)

File system cache availability for the index has a big impact on performance:
• Test runs were based on 10 million items with a 16GB bucket quota and 4GB or 8GB of system RAM available for indexes
• Results show that doubling system cache availability halves query latency and increases throughput by 50%

Set quotas so that some RAM is left free for the file system cache

Disk Sizing: Space and I/O

2) Disk
• Sustained write rate
• Rebalance capacity
• Backups
• XDCR
• Views/Indexes
• Compaction
• Total dataset (active + replicas + indexes)
• Append-only

[Diagram: Writing Data. Application Server: "Please store document A". Server: "OK, I stored document A".]

Disk Sizing: Space and I/O

Disk writes are buffered:
• Bursts of data expand the disk write queue
• Sustained writes need corresponding throughput

Disk throughput is affected by disk speed:
• SSD > 10K RPM > EBS
• SSDs give a huge boost to write throughput and startup/warmup times
• RAID can provide redundancy and increase throughput

Throughput = read/write + compaction + indexing + XDCR

2.1 introduces multiple disk threads: default is 3 (1 writer / 2 readers), max is 8 combined

Best to configure different paths for data and indexes

Plan on about 3x space (append-only, compaction, backups, etc.)
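The space rule of thumb can be turned into a quick estimate. This sketch assumes the 3x factor applies to the total dataset (active + replicas + indexes); the function name and the example sizes are illustrative, not from the talk.

```python
def disk_space_bytes(active_bytes: float,
                     replicas: int,
                     index_bytes: float,
                     overhead_factor: float = 3.0) -> float:
    """Rough cluster-wide disk space to plan for."""
    # Total dataset = active data + replica copies + indexes.
    total_dataset = active_bytes * (1 + replicas) + index_bytes
    # Headroom for append-only files, compaction, backups, etc.
    return total_dataset * overhead_factor


# Hypothetical: 200 GB of active data, 1 replica, 50 GB of indexes.
gb = 10**9
print(disk_space_bytes(200 * gb, 1, 50 * gb) / gb)  # 1350.0
```

Divide the result by the number of nodes to get a per-node figure, and remember that space is only half of disk sizing: sustained write throughput has to be checked separately.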

CPU sizing

3) CPU
• Disk writing
• Views/compaction/XDCR
• RAM read/write performance is not impacted

Min. production requirement: 4 cores + 1 per bucket + 1 core per Design Doc + 1 core per XDCR stream
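The core-count rule is simple addition. A minimal sketch follows; the base is a parameter because the Hardware Minimums slide later in this deck quotes 8 cores rather than 4, and the example counts are hypothetical.

```python
def min_cores(buckets: int, design_docs: int, xdcr_streams: int,
              base_cores: int = 4) -> int:
    """Minimum cores per node: base + 1 per bucket, design doc and XDCR stream."""
    return base_cores + buckets + design_docs + xdcr_streams


# Hypothetical cluster: 2 buckets, 3 design documents, 1 XDCR stream.
print(min_cores(2, 3, 1))                # 10
print(min_cores(2, 3, 1, base_cores=8))  # 14
```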

Network sizing

4) Network
• Client traffic
• Replication (writes)
• Rebalancing
• XDCR

[Diagram: Reads + Writes flow over the network between the Application Servers and the Couchbase Servers; Replication (multiplies writes) and Rebalancing flow between the Couchbase Servers]
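A back-of-the-envelope estimate of steady-state traffic, assuming each write is sent once by the client and once more per replica. It deliberately ignores rebalancing, XDCR and protocol overhead, and the names and numbers are illustrative only.

```python
def network_bytes_per_sec(reads_per_sec: float,
                          writes_per_sec: float,
                          avg_doc_bytes: float,
                          replicas: int = 1) -> float:
    """Rough cluster-wide payload traffic in bytes/sec."""
    # Client traffic: every read and write crosses the network once.
    client = (reads_per_sec + writes_per_sec) * avg_doc_bytes
    # Replication: each write is forwarded once per replica.
    replication = writes_per_sec * replicas * avg_doc_bytes
    return client + replication


# Hypothetical: 1,000 reads/sec, 500 writes/sec, 1 KB documents, 1 replica.
print(network_bytes_per_sec(1000, 500, 1024))  # 2048000
```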

Network Considerations

• Low latency, high throughput (LAN) within the cluster
• Eliminate router hops:
  • Between cluster nodes
  • Between clients and the cluster
• Check who else is sharing the network
• Increase bandwidth by:
  • Adding more nodes (will scale linearly)
  • Upgrading routers/switches/NICs/etc.

Data Distribution

5) Data Distribution / Safety (assuming one replica):
• 1 node = Single point of failure
• 2 nodes = + Replication
• 3+ nodes = Best for production:
  • Autofailover
  • Upgrade-ability
  • Further scale-ability

Note: Many applications will need more than 3 nodes

Servers fail, be prepared. The more nodes, the less impact a failure will have.

How many nodes recap

5 key factors determine the number of nodes needed (per-bucket, multiple buckets aggregate):
1) RAM
2) Disk
3) CPU
4) Network
5) Data Distribution/Safety

[Diagram: Application user, Web application server, Couchbase Servers]
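One way to combine the five factors is to size each one separately and let the most demanding one set the node count. Taking the maximum, and the floor of 3 nodes for production, is an interpretation of the slides rather than a formula stated in them; the per-factor numbers below are hypothetical.

```python
def nodes_needed(per_factor_nodes: dict, minimum: int = 3) -> int:
    """Node count driven by the most demanding factor, never below the minimum."""
    return max([minimum, *per_factor_nodes.values()])


# Hypothetical results of sizing each factor on its own.
estimate = {"ram": 6, "disk": 4, "cpu": 3, "network": 2, "distribution": 3}
print(nodes_needed(estimate))  # 6
```

In this made-up case RAM is the constraining factor, so adding disk or CPU to each node would not reduce the node count.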

Deployment Considerations

Hardware Minimums

RAM: At least ~4GB (highly dependent on data set)

Disk: Fastest “local” storage available
• SSD is better
• RAID 0 or 10, not 5

CPU (minimums): 8 cores + 1 per bucket + 1 per design document + 1 per XDCR stream

Hardware requirements/recommendations are the intersection of what’s needed versus what’s available.

Hardware Considerations

• Designed for commodity hardware
• Scale out, not up: more smaller nodes are better than fewer larger ones (can scale up later)
• Tested and deployed in EC2
• Physical hardware offers best performance and efficiency
• Certain considerations with using VMs:
  • RAM use is inefficient / disk IO is usually not as fast
  • Local storage is better than shared SAN
  • 1 Couchbase VM per physical host
  • You will generally need more nodes
  • Don’t overcommit

Couchbase in AWS

• R3 or C3 instances are the best value for performance:
  • Higher RAM-to-CPU ratios
  • Come with SSDs
• Disk choice:
  • SSDs are best
  • Ephemeral is okay
  • Single EBS is not great, use LVM/RAID
  • Views/indexes on ephemeral, main data on EBS, or both on SSD
• Backups:
  • Use cbbackup locally on each node and migrate to EBS/S3
  • Can use EBS snapshots

Couchbase in AWS

• Deploy across AZs with rack/zone awareness
• Use an EIP/public hostname instead of a private IP:
  • Easier connectivity from outside AWS
  • Easier restoration/better availability
  • Couchbase XDCR across regions must use hostname

In AWS, as with any cloud/virtual deployment, you will likely need more nodes than you would with a physical infrastructure

Effects of…

Views/Indexes

Effect on scale/sizing:
• Increase the CPU and disk IO requirements
• More complex views require more CPU
• More view output requires more disk IO
• More RAM should be left out of the quota for better IO caching

Indication:
• Indexes significantly behind data writes (or growing delays)

What to do:
• Make sure you follow best practices in view writing
• Add more nodes to distribute processing “work”
• Look into SSDs

XDCR

Effect on scale/sizing:
• XDCR is CPU intensive
• Disk IO will double
• Memory needs to be sized accordingly (bi-directional may mean more data)

Indication:
• A rising XDCR queue on source

What to do:
• More nodes on source and destination will drain the queue faster (scales linearly)
• Tune replication streams according to CPU availability

As your workload grows…

Effects on scale/sizing:
• More reads:
  • Individual documents will not be impacted (static working set)
  • Views may require faster disks, more disk IO caching
• More writes will increase disk IO needs

Indications:
• Cache miss ratio rising
• Growing disk write queue / XDCR queue
• Compaction not keeping up

What to do:
• Revise sizing calculations and add more nodes if needed

Most applications don’t need to scale the number of nodes based upon normal workload variation.

As your dataset grows…

Effects on scale/sizing:
• Your RAM needs will grow:
  • Metadata needs increase with item count
  • Is your working set increasing?
• Your disk space will likely grow (duh?)

Indications:
• Dropping resident ratio
• Rising ejections/cache miss ratio

What to do:
• Revise sizing calculations, add more nodes
• Remove un-needed data

This is the most common need for scaling and will most likely result in needing more nodes
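The indications listed for workload and dataset growth are all trends, so they can be checked by comparing two monitoring samples. The sketch below is purely illustrative: the dictionary keys are hypothetical labels chosen for this example, not actual Couchbase statistic names.

```python
def growth_indications(previous: dict, current: dict) -> list:
    """Return the warning signs that got worse between two samples."""
    checks = [
        ("resident_ratio", "dropping", lambda old, new: new < old),
        ("cache_miss_ratio", "rising", lambda old, new: new > old),
        ("disk_write_queue", "growing", lambda old, new: new > old),
    ]
    return [
        f"{name} is {label}"
        for name, label, worse in checks
        if name in previous and name in current
        and worse(previous[name], current[name])
    ]


# Hypothetical samples taken a week apart.
last_week = {"resident_ratio": 0.90, "cache_miss_ratio": 0.01, "disk_write_queue": 100}
today = {"resident_ratio": 0.80, "cache_miss_ratio": 0.01, "disk_write_queue": 500}
print(growth_indications(last_week, today))
```

Any non-empty result is a prompt to revisit the sizing calculations, not an automatic signal to add nodes.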

Rebalancing

Yes, there is resource utilization during a rebalance, but on a “properly” sized cluster a rebalance should not have any effect on performance:
• Distribution of data and work across all nodes
• Managed caching layer separates RAM-based performance from IO utilization
• Rebalance automatically manages working set in RAM
• Rebalance automatically throttles itself if needed
• Can be stopped midway without endangering data or progress

Proper sizing includes not maxing out all resources: leave some headroom in preparation

Couchbase 4.0

Sizing Couchbase Server 4.0

Multi-Dimensional Scalability (MDS): optionally scale each service independently:
• Data
• Index
• Query

The 5 factors still apply:
• RAM
• Disk
• CPU
• Network
• Data Safety/Distribution

Sizing Couchbase Server 4.0 - Data

Data Service in 4.0 is the same as in previous Couchbase Server versions:
• Enough RAM to cache reads
• Enough disk to eventually persist writes
• CPU primarily for Views and XDCR
• At least 3 nodes; replication at the bucket level

Minimum requirements: 4GB RAM, 8-core CPU

Sizing Couchbase Server 4.0 - Index

Index Service is new to 4.0 (a.k.a. GSI or “Secondary Indexes”):
• Primarily RAM and disk IO bound
• ForestDB persistence engine
• At least 2 nodes for HA, each index replicated individually

Minimum requirements: 8GB RAM, 8-core CPU, “fast disk”

Note: 4.0 is still in beta, final sizing numbers are being formulated

Sizing Couchbase Server 4.0 - Query

Query Service is new to 4.0 (a.k.a. N1QL):
• Primarily CPU bound
• Optimized for multi-core systems
• Very low RAM and disk requirements
• At least 2 nodes for HA; queries are automatically load balanced

Minimum requirements: 4GB RAM, 16+ core CPU

Note: 4.0 is still in beta, final sizing numbers are being formulated

Sizing Couchbase Server 4.0 - MDS

Multi-Dimensional Scalability (MDS)

Option 1: All 3 services enabled on all nodes. Size for aggregate requirements (Data+Index+Query).

Option 2: Separated services. Size nodes independently for different workloads, i.e.:

• Data Service: More nodes with more RAM, less disk, less CPU

• Index Service: Fewer nodes with more RAM, more disk, less CPU

• Query Service: Fewer nodes with less RAM, less disk, more CPU
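The two options can be compared using the per-service minimums quoted on the earlier slides. This is a sketch under two assumptions: that "aggregate" can be read as a simple sum of the minimums, and that the beta-era numbers still hold. The class and function names are made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Minimum:
    ram_gb: int
    cores: int


# Per-service minimums as quoted in this deck (4.0 beta figures).
SERVICE_MINIMUMS = {
    "data": Minimum(ram_gb=4, cores=8),
    "index": Minimum(ram_gb=8, cores=8),
    "query": Minimum(ram_gb=4, cores=16),
}


def colocated_minimum(services) -> Minimum:
    """Option 1: sum the minimums of every service running on the same node."""
    return Minimum(
        ram_gb=sum(SERVICE_MINIMUMS[s].ram_gb for s in services),
        cores=sum(SERVICE_MINIMUMS[s].cores for s in services),
    )


# Option 1: every node runs all three services.
print(colocated_minimum(["data", "index", "query"]))
# Option 2: each node runs one service and only needs that service's minimum.
print(SERVICE_MINIMUMS["query"])
```

Under these assumptions a node running everything needs far more hardware than any single-service node, which is the trade-off Option 2 is meant to exploit.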

Sizing Couchbase Server 4.0 - MDS

• Independent load distribution
• Modular architecture to construct the database for your need:
  • Pick HW capacity: scale up and/or scale out
  • Pick services layout: overlap and/or isolate services
  • Pick data/index partitioning

[Diagram: Couchbase Cluster with nodes labeled node1 to node8 running the Index Service, Query Service and Data Service]

Sizing is tricky business…

Work with the Couchbase Team

Validate your “on-paper” numbers with testing

Constantly monitor production

Dive in…

Gather your workload and dataset requirements:
• Item counts and sizes, read/write/delete ratios

Review our documentation and formulas

Test, Deploy, Monitor…rinse and repeat

Want more?

Lots of details and best practices in our documentation:

http://www.couchbase.com/docs/

And my sizing blog: http://blog.couchbase.com/how-many-nodes-part-1-introduction-sizing-couchbase-server-20-cluster

Get Started with Couchbase Server 4.0: www.couchbase.com/beta

Get Trained on Couchbase: training.couchbase.com

Thank you

perry@couchbase.com | @couchbase
