RedisConf17 - Operationalizing Redis at Scale

Operationalizing Redis At Scale At Square

The Dream

● Standard Setup & Deployment
● Security: SSL/mTLS
● High Availability
● Monitoring & Visibility

[Diagram: one host's stack, with hardware at the bottom, the OS above it, and several LXC containers on top]

Persistence & Durability
• Persistence not guaranteed
• Persistence provided by RDB and AOF does not satisfy our performance requirements
• LRU key eviction
• MAXMEMORY configured on a per-application basis
• Treat Redis as an ephemeral in-memory cache
• Resque / Sidekiq “require” persistence
• Force jobs to be idempotent
• Rely on persistence in the form of slaves and failovers
• Asynchronous replication implies that there is always a possibility of lost data
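The cache-style settings on this slide could be rendered per application roughly as follows. This is a minimal sketch, not Square's tooling: `render_cache_config` is a hypothetical helper, and `allkeys-lru` is an assumption (the slide only says "LRU key eviction"). The directives themselves (`maxmemory`, `maxmemory-policy`, `save`, `appendonly`) are standard redis.conf settings.

```python
def render_cache_config(maxmemory_bytes: int) -> str:
    """Render a redis.conf fragment for an ephemeral, LRU-evicting cache.

    Hypothetical helper for illustration only.
    """
    if maxmemory_bytes <= 0:
        raise ValueError("maxmemory must be a positive number of bytes")
    lines = [
        f"maxmemory {maxmemory_bytes}",   # sized per application
        "maxmemory-policy allkeys-lru",   # assumption: evict any key, LRU first
        'save ""',                        # no automatic RDB snapshots
        "appendonly no",                  # no AOF
    ]
    return "\n".join(lines) + "\n"
```

For example, `render_cache_config(2 * 1024**3)` yields a fragment capping the instance at 2 GiB; the size is an arbitrary illustration.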

Persistence & Durability
• RDB
  • BGSAVE causes the Redis process to fork()
    • Severe performance hit for larger datasets
    • Can result in clients not being served
    • Pressure from memory allocations of large data copies
    • Potentially requires 2x memory on high-write instances
  • (Almost) never perform on the active master
    • Since any replica can be promoted to master during a failover, BGSAVEs can only be reliably run on tertiary replicas
  • Disable Redis automated BGSAVEs, opt for manual control instead
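The manual-control rule above could be expressed as a guard that runs before any BGSAVE. This is a sketch under stated assumptions: `bgsave_allowed`, the `failover_candidate` flag, and the memory budget are hypothetical, while `role` and `used_memory` are real fields of the `INFO` output.

```python
def bgsave_allowed(info: dict, failover_candidate: bool,
                   memory_budget_bytes: int) -> bool:
    """Decide whether a manual BGSAVE is acceptable on this instance.

    info: parsed INFO fields (strings or ints).
    failover_candidate: hypothetical flag, True if this replica may be
        promoted to master during a failover.
    memory_budget_bytes: hypothetical memory limit for the instance.
    """
    if info.get("role") == "master":
        return False  # (almost) never snapshot the active master
    if failover_candidate:
        return False  # only tertiary replicas are safe targets
    # Worst case, copy-on-write after fork() duplicates the dataset,
    # so require roughly 2x the current memory footprint.
    return 2 * int(info["used_memory"]) <= memory_budget_bytes
```

With a redis-py style client, the call site would be something like `if bgsave_allowed(...): client.bgsave()`.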

Persistence & Durability
● AOF
  ○ appendfsync everysec (the default policy) performs an fsync() once per second
    ○ Writes made since the last fsync() can still be lost
  ○ appendfsync always is required to guarantee the persistence we need
    ○ Comes at the cost of speed
    ○ Lower throughput
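The trade-off between the two policies can be illustrated with a toy append-only log. This is not how Redis implements AOF (Redis performs the `everysec` fsync in the background); `AppendLog` is a hypothetical class that only shows why `always` costs one fsync() per write while `everysec` leaves a window of unsynced writes.

```python
import os
import time


class AppendLog:
    """Toy append-only log with 'always' and 'everysec' fsync policies."""

    def __init__(self, path: str, policy: str = "everysec"):
        if policy not in ("always", "everysec"):
            raise ValueError("policy must be 'always' or 'everysec'")
        self.policy = policy
        self.fsync_calls = 0
        self._file = open(path, "ab")
        self._last_sync = time.monotonic()

    def _sync(self) -> None:
        # Flush Python's buffer, then ask the OS to persist to disk.
        self._file.flush()
        os.fsync(self._file.fileno())
        self._last_sync = time.monotonic()
        self.fsync_calls += 1

    def append(self, record: bytes) -> None:
        self._file.write(record + b"\n")
        if self.policy == "always":
            self._sync()  # durable before returning, but slow
        elif time.monotonic() - self._last_sync >= 1.0:
            self._sync()  # at most one sync per second, checked on write

    def close(self) -> None:
        self._sync()
        self._file.close()
```

Under `always`, every `append` pays for an fsync(); under `everysec`, a crash can lose whatever was written since the previous sync.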

Persistence & Replication
• RDB and AOF disabled by default
  • If a Redis instance restarts, it will restart with an empty dataset
  • That empty dataset will then get replicated across all of its replicas
• Never have Redis restart automatically
  • Require manual intervention from a DBA for startup
  • runit service instead runs our special launch script
    • Script will only start Redis if a file exists: /data/redis/redis-start
• Never automatically join clusters
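The launch-script behaviour described above might look like the following. This is a sketch, not Square's script: only the flag path comes from the slide. Removing the flag after use is an assumption (so that the next restart again needs a DBA), and `launch` and `exec_fn` are hypothetical names.

```python
import os

START_FLAG = "/data/redis/redis-start"  # path taken from the slide


def should_start(flag_path: str = START_FLAG) -> bool:
    """Redis may start only if a DBA has created the flag file."""
    return os.path.isfile(flag_path)


def launch(argv, flag_path: str = START_FLAG, exec_fn=os.execvp) -> bool:
    """Start the given command only when the start flag is present.

    Returns False without doing anything when the flag is missing.
    """
    if not should_start(flag_path):
        return False
    # Assumption: consume the flag so an unexpected restart does not
    # bring Redis back with an empty dataset without human approval.
    os.remove(flag_path)
    exec_fn(argv[0], argv)  # with os.execvp this replaces the process
    return True
```

A service wrapper would call something like `launch(["redis-server", conf_path])`, where `conf_path` is whatever configuration file the deployment uses.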

ghostunnel
• Open-source, developed at Square
• Simple SSL/TLS proxy with mutual authentication for securing non-TLS services
• Authentication
  • Enforce mutual authentication by always requiring a valid client certificate
• Certificate hotswapping
  • Can reload certificates at runtime without dropping existing connections
• Automatic reloading
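To make "always requiring a valid client certificate" concrete, here is how a server-side TLS context enforces it using Python's standard `ssl` module. This is an illustration of the mutual-authentication idea only, not ghostunnel's implementation; `mtls_server_context` and its parameters are hypothetical.

```python
import ssl


def mtls_server_context(cert_file=None, key_file=None, ca_file=None):
    """Build a server-side TLS context that rejects clients without a
    certificate signed by a trusted CA (mutual TLS)."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    # CERT_REQUIRED makes the handshake fail unless the client presents
    # a certificate that validates against the loaded CA bundle.
    ctx.verify_mode = ssl.CERT_REQUIRED
    if ca_file is not None:
        ctx.load_verify_locations(cafile=ca_file)
    if cert_file is not None:
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx
```

A real listener would pass all three files and wrap its accepted sockets with `ctx.wrap_socket(conn, server_side=True)`.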

[Diagram, built up over three slides: the app reaches redis-01 through the RW SIP and ghostunnel-app, which forwards to Redis (:6379, socket); redis-02 sits behind the RO SIP with its own ghostunnel-app (:6379, socket); replication goes through ghostunnel-repl on :6380]

GET @@HOSTNAME;

$> slaveof localhost 6380
1) OK
$> info replication
# Replication
role:slave
master_host:127.0.0.1
master_port:6380
...
slave0:ip=/unixsocket,port=0,state=online,offset=12345,lag=1
...

$> info server
# Server
...
run_id:1234567890abcdef
...
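Because replication runs through local tunnels, `info replication` only shows 127.0.0.1 and /unixsocket, so the `run_id` from `info server` is the remaining way to tell instances apart. A minimal sketch of that lookup follows; `parse_info`, `identify`, and the inventory mapping are hypothetical. Note that Redis generates a new `run_id` on every start, so any such mapping would need refreshing after restarts.

```python
def parse_info(text: str) -> dict:
    """Parse INFO output into a dict, skipping section headers
    ('# Server'), blank lines, and elided lines ('...')."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, _, value = line.partition(":")
        fields[key] = value
    return fields


def identify(info_text: str, inventory: dict) -> str:
    """Map an instance's run_id to a hostname via a hypothetical
    inventory of {run_id: hostname}."""
    run_id = parse_info(info_text).get("run_id")
    return inventory.get(run_id, "unknown")
```

Fed the `info server` output from the slide and an inventory containing that `run_id`, `identify` returns the recorded hostname.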

User Access Control
● Redis does not implement user access control
● redis.sock accessible only by root
  ○ Available only to DBAs and other infrastructure engineers
● Connecting remotely requires the application's SSL certificate

openssl s_client \
  -connect $REDIS_HOST:$REDIS_PORT \
  -CAfile redis-app-1.ca.pem \
  -cert redis-app-1.crt \
  -key redis-app-1.key
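The same remote connection can be made programmatically. The sketch below uses only Python's standard library to open a mutually authenticated TLS connection and send a Redis `PING` in the RESP wire format. `encode_command` and `ping_over_mtls` are hypothetical helpers, and the certificate file names passed in would be the same ones used with `openssl s_client`.

```python
import socket
import ssl


def encode_command(*args: str) -> bytes:
    """Encode a command as a RESP array of bulk strings."""
    parts = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg.encode()
        parts.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(parts)


def ping_over_mtls(host: str, port: int, ca_file: str,
                   cert_file: str, key_file: str,
                   timeout: float = 5.0) -> bytes:
    """Connect through the TLS proxy with a client certificate and
    return the raw reply to PING."""
    ctx = ssl.create_default_context(cafile=ca_file)  # verify the server
    ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)  # client cert
    with socket.create_connection((host, port), timeout=timeout) as raw:
        with ctx.wrap_socket(raw, server_hostname=host) as tls:
            tls.sendall(encode_command("PING"))
            return tls.recv(64)
```

A healthy server normally answers `+PONG`. The default context also checks that the server certificate matches `host`, so the name used to connect has to appear in that certificate.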

Redis Sentinel
● Pros
  ○ Monitoring
  ○ Notification
  ○ Automated Failover
● Cons
  ○ Plaintext communication required
  ○ Even if isolated to a single security zone, cross-DC failovers would not work

Redis Cluster
• Pros
  • Great sharding tool
  • High availability in the event of an outage, due to its distributed design
• Cons
  • Consistency not easy to achieve
    • Can enforce consistency, but at the cost of performance
    • Cannot enforce consistency in the case of a network partition
  • Does not support SSL/TLS

Topology

[Diagram: topology spanning DC-1 and DC-2, with an RW SIP and an RO SIP]

SET GLOBAL READ_ONLY=1;

$> info replication
# Replication
role:master
...
$> config get min-slaves-to-write
1) "min-slaves-to-write"
2) "0"
$> config set min-slaves-to-write 1000
1) OK

$> info replication
# Replication
role:slave
...
$> config get min-slaves-to-write
1) "min-slaves-to-write"
2) "1000"
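The sessions above emulate a read-only switch: a master refuses writes when fewer than `min-slaves-to-write` replicas are connected with acceptable lag, so an unreachable value such as 1000 blocks all writes. A sketch of wrapping that toggle follows, assuming a redis-py style client that exposes `config_set(name, value)` and `config_get(pattern)`. The function names are hypothetical, and newer Redis releases call this setting `min-replicas-to-write`.

```python
READ_ONLY_THRESHOLD = 1000  # value used in the slides
SETTING = "min-slaves-to-write"


def set_read_only(client, enabled: bool, normal_threshold: int = 0) -> None:
    """Block or allow writes by moving the replica threshold.

    normal_threshold: value to restore when leaving read-only mode;
    0 (the value shown in the slides) disables the check.
    """
    value = READ_ONLY_THRESHOLD if enabled else normal_threshold
    client.config_set(SETTING, value)


def is_read_only(client) -> bool:
    """Report whether the threshold is at the read-only value."""
    current = client.config_get(SETTING)
    return int(current[SETTING]) >= READ_ONLY_THRESHOLD
```

Calling `set_read_only(client, True)` before a planned failover would make the old master reject writes, and `set_read_only(client, False)` restores normal behaviour.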

Looking Forward
• SpinCycle Automation
• Native SSL/TLS Support
• Square Internal Patches

Q&A

Links

• https://www.percona.com/live/17/sessions/metrics-collection-storage-visualization-scale
• https://github.com/square/ghostunnel
• https://github.com/square/p2
• https://github.com/square/spincycle

square.com