A Tale of a Server Architecture (Frozen Rails 2012)

Ville Lautanala @lautis

Ville Lautanala's talk from Frozen Rails 2012: how Flowdock uses Chef and ZooKeeper to manage a set of distributed services.

WHO AM I @lautis

Flowdock is a team collaboration app whose primary target audience is software developers. Right-hand side: chat; left-hand side: an inbox or activity stream for your team. If you've read a Node.js tutorial, you probably already know the architecture we needed.

Facts

• Single page JavaScript front-end

• WebSocket based communication layer

• Three replicated databases

• Running on dedicated servers in Germany

• 99.98% availability

WebSockets == no third-party load balancers/PaaS for us.
99.99% according to our CEO, but I'm being conservative.
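To put the availability figures in perspective, a percentage translates into a yearly downtime budget. This is a back-of-the-envelope calculation (not from the talk), assuming a 365-day year:

```ruby
# Downtime budget implied by an availability percentage,
# assuming a 365-day year (525,600 minutes).
MINUTES_PER_YEAR = 365 * 24 * 60

def downtime_minutes_per_year(availability_percent)
  (100.0 - availability_percent) / 100.0 * MINUTES_PER_YEAR
end

[99.98, 99.99].each do |a|
  puts format("%.2f%% -> %.1f minutes of downtime per year", a, downtime_minutes_per_year(a))
end
```

So the gap between the conservative and the CEO figure is roughly 105 versus 53 minutes of downtime per year.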

Goal: beat your hosting provider in uptime

Achieve good uptime on unreliable hardware.

We don't want to wake up at night to fix our app like the guy in this picture. The founders previously ran a hosting company.

This is not an exact science; every app is different.

Architecture Archaeology

We haven't always been doing very well.

Flowdock 2010

[Diagram: Apache, Rails, PostgreSQL, a Messages component and MongoDB]

Simple stack, but the messaging part quickly became hairy. It handled HTTP streaming, the Twitter integration and an e-mail server. Lots of brittle state.

Divide and Conquer

Nice strategy for building your SOA, sorting lists and taking over the world.

[Diagram of the current architecture. Components: GeoDNS, Stunnel, HAproxy, Rails, HTTP Streaming, API, WebSocket API, IRC, RSS, Message Backend, PostgreSQL, MongoDB and Redis]

These are all separate processes. There are more components, but this has made it easy to add new features to individual components.

Separated concerns...

but many parts to configure

So, you need to set up boxes...

Chef: Infrastructure as (Ruby) Code

Chef lets you automate server configuration with Ruby code.

Chef at Flowdock

• Firewall configuration

• Distribute SSH host keys

• User setup

• Join mesh-based VPN

• And app/server specific stuff

Firewall setup is based on an IP whitelist: only nodes known to Chef can access private services.
Distributing SSH host keys prevents man-in-the-middle attacks.
We have a mesh-based VPN, which is configured automatically from Chef data.
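The IP-whitelist idea can be sketched as a function that turns a list of node addresses into firewall rules. This is a minimal sketch, assuming iptables as the firewall and assuming the node IPs have already been collected (in a real setup they would come from a Chef search). The `whitelist_rules` helper and the example addresses are hypothetical; 6379 and 5432 are simply the default Redis and PostgreSQL ports.

```ruby
# Hypothetical helper: build iptables rules (iptables-restore syntax) that only
# let whitelisted nodes reach private service ports.
def whitelist_rules(node_ips, private_ports)
  rules = []
  private_ports.each do |port|
    node_ips.each do |ip|
      rules << "-A INPUT -p tcp -s #{ip} --dport #{port} -j ACCEPT"
    end
    # Appended after the ACCEPTs, so anything not whitelisted is dropped.
    rules << "-A INPUT -p tcp --dport #{port} -j DROP"
  end
  rules
end

nodes = ["10.0.0.1", "10.0.0.2"] # example addresses (hypothetical)
puts whitelist_rules(nodes, [6379, 5432])
```

Because the rules are derived from node data, adding a box to Chef is enough to open the private services to it on the next Chef run.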

• Cookbooks

• Recipes

• Roles

Chef server

A centralized Chef server that nodes communicate with and fetch updates from.

include_recipe "flowdock::users"
package "ruby"

# One envdir file per setting, e.g. PORT, LISTEN_TO, FLOWDOCK_DOMAIN
%w{port listen_to flowdock_domain}.each do |e|
  template "#{node[:flowdock][:oulu][:envdir]}/#{e.upcase}" do
    source "envdir_file.erb"
    variables :value => node[:flowdock][:oulu][e]
    owner "oulu"
    mode "0600"
  end
end

runit_service "oulu" do
  options :use_config => true
end

cookbooks/flowdock/oulu.rb

Recipe for our IRC server
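The recipe writes one file per setting into an envdir: the daemontools/runit convention where each file name becomes an environment variable and the file's first line becomes its value. A plain-Ruby sketch of that convention outside Chef, with made-up example values and a temporary directory standing in for `node[:flowdock][:oulu][:envdir]`:

```ruby
require "tmpdir"

# Example values only (hypothetical); the real ones come from node[:flowdock][:oulu].
settings = { "port" => "6667", "listen_to" => "0.0.0.0", "flowdock_domain" => "example.com" }

env = Dir.mktmpdir do |envdir|
  # One file per setting, named in upper case like the recipe, mode 0600.
  settings.each do |name, value|
    path = File.join(envdir, name.upcase)
    File.write(path, "#{value}\n")
    File.chmod(0o600, path)
  end

  # What envdir / `chpst -e` do at start-up: file name => variable, first line => value.
  Dir.children(envdir).sort.to_h { |file| [file, File.readlines(File.join(envdir, file), chomp: true).first] }
end

p env
```

Keeping each setting in its own file means the service picks up changes on restart without any config-file parsing.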

roles/rails.rb

name "rails"
description "Rails Box"
run_list(
  "recipe[nginx]",
  "recipe[passenger]"
)
override_attributes(
  passenger: { version: "3.0.7" }
)

Role in Ruby DSL.
Each node can be assigned any number of roles.
Override attributes can be used to override recipe attributes.
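Chef has several attribute precedence levels, but the effect of `override_attributes` can be illustrated with a heavily simplified model: two levels only (cookbook defaults and role overrides), combined with a recursive hash merge in which the override wins. The default values below are invented for illustration.

```ruby
# Simplified model (not Chef's real precedence rules): recursively merge two
# attribute hashes; values from `override` win, nested hashes merge key by key.
def deep_merge(base, override)
  base.merge(override) do |_key, old, new|
    old.is_a?(Hash) && new.is_a?(Hash) ? deep_merge(old, new) : new
  end
end

cookbook_defaults = { passenger: { version: "3.0.0", max_pool_size: 6 } } # hypothetical defaults
role_overrides    = { passenger: { version: "3.0.7" } }                   # as in roles/rails.rb

attrs = deep_merge(cookbook_defaults, role_overrides)
p attrs[:passenger]
```

The role pins the Passenger version, while attributes it does not mention keep their cookbook defaults.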

Managing Chef cluster

$ knife cookbook upload -a -o cookbooks

Managing Chef cluster

$ knife search node role:flowdock-app-server
Node Name: imaginary-server
Environment: qa
FQDN: imaginary-server.flowdock.dmz
IP: 10.0.0.1
Run List: role[qa], role[flowdock-app-server], role[web-server]
Roles: qa, flowdock-app-server, web-server
Recipes: ubuntu, firewall, chef, flowdock, unicorn, haproxy
Platform: ubuntu 12.04
Tags:

Managing Chef cluster

$ knife ssh 'role:qa' 'echo "lol"'
imaginary-server lol
qa-db1 lol
qa-db2 lol

Most useful command: triggering a Chef run on servers.

Testing Chef Recipes

• Use Chef environments to isolate changes

• Run chef-client on throw-away VMs

• cucumber-chef

sous-chef could be used to automate VM setup.
Our experience with cucumber-chef and sous-chef is limited.
You also need to monitor things, e.g. that Chef runs have finished on nodes and that backups are really being taken.
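For the monitoring point, one simple check is whether every node has completed a Chef run recently. A minimal sketch, assuming the time of the last successful run per node is already available from somewhere (for example exported by a report handler); the timestamps and the one-hour threshold are hypothetical.

```ruby
# Flag nodes whose last successful Chef run is older than max_age seconds.
def stale_nodes(last_runs, now:, max_age:)
  last_runs.select { |_node, finished_at| now - finished_at > max_age }.keys
end

now = Time.at(1_000_000)
last_runs = {                         # hypothetical run times
  "imaginary-server" => now - 600,    # 10 minutes ago
  "qa-db1"           => now - 7200,   # 2 hours ago
  "qa-db2"           => now - 1800    # 30 minutes ago
}
p stale_nodes(last_runs, now: now, max_age: 3600)
```

The same staleness pattern works for backups: record when the last one finished and alert when it is too old.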

Automatic Failover: Avoiding Single Points of Failure

MongoDB works flawlessly since failover is built in, but how do we handle Redis?

HAproxy: TCP/HTTP Load Balancer with Failover Handling

HAproxy provides easy failover for Rails instances

MongoDB has automatic failover built-in

MongoDB might have many problems, but failover isn't one of them. Drivers are always connected to the master.

Redis and Postgres have replication, but failover is manual

Not only do you need to promote a new master automatically, you also need to change the application configuration.

ZooKeeper

Distributed coordination

Each operation has to be agreed on by a majority of servers. Eventual consistency.
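The majority rule determines how many servers an ensemble can lose: with n servers an operation needs floor(n/2) + 1 of them to agree, so the ensemble keeps working with up to floor((n - 1)/2) servers down. A small sketch of that arithmetic:

```ruby
# Majority quorum size for an ensemble of n servers.
def quorum(n)
  n / 2 + 1
end

# How many servers can fail while a majority is still reachable.
def tolerated_failures(n)
  (n - 1) / 2
end

[3, 4, 5].each do |n|
  puts "#{n} servers: quorum #{quorum(n)}, survives #{tolerated_failures(n)} failure(s)"
end
```

This is why ensembles usually have an odd number of servers: going from three to four raises the quorum without tolerating any additional failure.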

require 'zk'

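$queue = Queue.new
zk = ZK.new

zk.register('/hello_world') do |event|
  # watches are one-shot, so re-read with watch: true to reset the watch
  data = zk.get('/hello_world', watch: true).first
  # do stuff with data
  $queue.push(:event)
end

# set the initial watch; it also fires when the node is created
zk.exists?('/hello_world', watch: true)

zk.create('/hello_world', 'sup?')
$queue.pop # Handle local synchronization
zk.set('/hello_world', 'omg, update')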

Using the high-level zk gem. The block runs every time the value is updated, as long as the watch is reset each time.
The zk gem also implements locks and other coordination primitives.
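ZooKeeper watches are one-shot: once a watch fires it is gone, which is why the handler re-reads the node with `watch: true`. A toy, in-memory model of that behaviour (no ZooKeeper or zk gem involved; the `ToyNode` class is hypothetical):

```ruby
# Hypothetical toy model of a znode with one-shot watches (not the zk gem API).
class ToyNode
  attr_reader :data

  def initialize(data)
    @data = data
    @watchers = []
  end

  # Read the value and optionally leave a one-shot watch behind.
  def get(watch: nil)
    @watchers << watch if watch
    @data
  end

  def set(data)
    @data = data
    fired, @watchers = @watchers, [] # watches are cleared when they fire
    fired.each(&:call)
  end
end

node = ToyNode.new("sup?")
seen = []
handler = lambda do
  seen << node.get(watch: handler) # re-arm the watch while reading
end

node.get(watch: handler)
node.set("omg, update")
node.set("second update")
p seen
```

If the handler read the node without `watch: handler`, only the first update would be seen.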

zk = ZK.new

zk.with_lock('/lock', :wait => 5.0) do |lock|
  # do stuff
  # others have to wait
end
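Distributed locks like this are usually built on ZooKeeper's standard lock recipe: every contender creates an ephemeral, sequential child node under the lock path, the lowest sequence number holds the lock, and every other contender watches only the node just before its own. A pure-Ruby sketch of that ordering logic, with made-up node names; it is not a claim about the zk gem's internals.

```ruby
# Given the sequential child nodes under a lock path, decide who holds the
# lock and which predecessor each waiting contender should watch.
def lock_plan(children)
  ordered = children.sort_by { |name| name[/\d+\z/].to_i }
  plan = { holder: ordered.first, watches: {} }
  ordered.each_cons(2) { |prev, node| plan[:watches][node] = prev }
  plan
end

children = ["lock-0000000012", "lock-0000000010", "lock-0000000011"] # hypothetical
plan = lock_plan(children)
p plan[:holder]
p plan[:watches]
```

Watching only the predecessor means a released lock wakes up exactly one waiter instead of all of them.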

Redis master failover using ZooKeeper

gem install redis_failover

but in 3 programming languages

Redis Failover

[Diagram: two Node Managers monitor the Redis nodes and update ZooKeeper; the apps watch ZooKeeper]

Our apps don't necessarily use redis_failover or read ZooKeeper directly; instead, a script restarts the app when the data in ZooKeeper changes.
HAproxy- or DNS-based solutions are also possible, but this gives us more control over the app restart.
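A minimal sketch of the node-manager decision plus the restart-on-change script, with health checks and ZooKeeper replaced by plain Ruby values. `choose_master`, `MasterWatcher` and the node names are hypothetical, and this is not redis_failover's actual code; a real node manager also has to reconfigure the Redis nodes themselves (for example promoting the slave with `SLAVEOF NO ONE`).

```ruby
# Hypothetical: keep the current master while it is healthy; otherwise
# promote the first healthy slave. Returns nil when no node is healthy.
def choose_master(current, health)
  return current if health[current]
  health.keys.find { |node| node != current && health[node] }
end

# Hypothetical stand-in for the script that restarts the app when the
# master stored in ZooKeeper changes.
class MasterWatcher
  attr_reader :restarts

  def initialize(master)
    @master = master
    @restarts = []
  end

  def update(new_master)
    return if new_master.nil? || new_master == @master
    @master = new_master
    @restarts << new_master # a real script would restart the app here
  end
end

health = { "redis-1" => true, "redis-2" => true }
watcher = MasterWatcher.new("redis-1")

watcher.update(choose_master("redis-1", health)) # master healthy: nothing happens
health["redis-1"] = false                        # simulate the master going down
watcher.update(choose_master("redis-1", health)) # redis-2 promoted, app restarted
p watcher.restarts
```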

Postgres failover with pgpool-II and ZooKeeper

pgpool manages the PG cluster, and queries can be distributed to slaves.
I'm afraid of pgpool: its configuration and monitoring scripts are really scary.

Postgres Failover

[Diagram: the app connects to pgpool, which manages two PG servers; a pgpool monitor works with ZooKeeper]

ZooKeeper-based pgpool monitoring is used to provide redundancy for pgpool.
If pgpool fails, the app needs to reconnect to the new server.
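On the application side, reconnecting after a pgpool failure boils down to looking up the current endpoint again and retrying. A sketch with the connection attempt and the ZooKeeper lookup stubbed out; `with_failover`, `ConnectionFailed`, the host names and the retry limit are all hypothetical.

```ruby
class ConnectionFailed < StandardError; end # hypothetical error type

# Run the block against the current endpoint; on failure, look the endpoint
# up again (a ZooKeeper read in the real setup) and retry a bounded number of times.
def with_failover(lookup, attempts: 3)
  tries = 0
  begin
    tries += 1
    yield lookup.call
  rescue ConnectionFailed
    retry if tries < attempts
    raise
  end
end

current = "pgpool-a"     # hypothetical endpoint name
lookup = -> { current }  # stand-in for reading the endpoint from ZooKeeper

result = with_failover(lookup) do |endpoint|
  if endpoint == "pgpool-a"
    current = "pgpool-b" # simulate the monitor publishing a new pgpool
    raise ConnectionFailed, "#{endpoint} is down"
  end
  "connected to #{endpoint}"
end
puts result
```

Bounding the retries keeps a total outage from turning into an endless reconnect loop.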

Zoos are kept

A similar scheme can be used for other master-slave setups, e.g. handling failover of the Twitter integration.

REMEMBER TO TEST

Test your failover

You might only need failover a few times a year.

We're not sure everything in our setup is top-notch, but each of the complicated parts has been needed at least once.

Chef vs ZooKeeper

Chef                | ZooKeeper
Configuration files | Dynamic configuration variables
Server bootstrap    | Failover handling

Chef writes long configuration files, while ZooKeeper only holds a few variables.
Chef bootstraps servers and keeps them up to date; ZooKeeper is used to elect master nodes in master-slave scenarios.

Mesh-based VPN between boxes

Encrypts MongoDB traffic between masters and slaves. It has saved the day a few times when there were routing issues between data centers.

SSL endpoints in AWS

We had routing issues between our German ISP and Comcast. Moving SSL front ends closer to clients fixed this and reduced latency: the front page loads 150 ms faster.

Winning

We don't need to worry about waking up at night. The whole team could go sailing and be without internet access at the same time.

Lessons learned

What have we learned?

WebSockets are cool, but make your life harder

Heroku, Amazon Elastic Load Balancer, CloudFlare and Google App Engine don't support WebSockets. If you only need to stream data to the client, HTTP event streaming (Server-Sent Events) is a better choice.
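HTTP event streaming here presumably means Server-Sent Events (`text/event-stream`), which is plain HTTP and needs no protocol upgrade at the load balancer. A minimal sketch of encoding one event in that wire format; the event name, id and payload are made up.

```ruby
# Encode one Server-Sent Event: optional id and event name, one "data:" line
# per payload line, and a blank line to terminate the event.
def sse_event(data, event: nil, id: nil)
  frame = +""
  frame << "id: #{id}\n" if id
  frame << "event: #{event}\n" if event
  data.to_s.split("\n", -1).each { |line| frame << "data: #{line}\n" }
  frame << "\n"
end

print sse_event("hello\nworld", event: "message", id: 42)
```

The browser's `EventSource` joins multiple `data:` lines back into one message and reconnects on its own, sending the last seen `id` back to the server.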

♫ Let it crash ♫

Make your app crash; at least then you are there to fix things.
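"Let it crash" usually pairs with a supervisor that restarts the failed process, which is the role runit plays for the services set up by the recipes above. A tiny in-process sketch of that restart loop, assuming a worker that fails twice before succeeding; `supervise` and the restart limit are hypothetical.

```ruby
# Hypothetical supervisor: run the worker, and if it raises, log the crash
# and start it again, giving up after max_restarts restarts.
def supervise(max_restarts:)
  restarts = 0
  begin
    yield
  rescue StandardError => e
    raise if restarts >= max_restarts
    restarts += 1
    warn "worker crashed (#{e.message}), restart #{restarts}"
    retry
  end
end

attempts = 0
result = supervise(max_restarts: 3) do
  attempts += 1
  raise "boom" if attempts < 3 # simulated crashes
  "ok after #{attempts} attempts"
end
puts result
```

The worker itself stays simple: it does not try to recover from unexpected states, it just dies and gets a clean restart.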

Questions?

Thanks!