Ville Lautanala's talk from Frozen Rails 2012: how Flowdock uses chef and ZooKeeper to manage a set of distributed services.
A Tale of a Server Architecture
Ville Lautanala, @lautis
WHO AM I @lautis
Flowdock is a team collaboration app with software developers as the primary target audience. Right-hand side: chat; left-hand side: an inbox or activity stream for your team. If you’ve read a Node.js tutorial, you probably know the architecture needed.
Facts
• Single page JavaScript front-end
• WebSocket based communication layer
• Three replicated databases
• Running on dedicated servers in Germany
• 99.98% availability
WebSockets == no third-party load balancers/PaaS for us. 99.99% according to the CEO, but I’m being conservative.
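As a rough sanity check on what that availability target means in practice (my arithmetic, not from the talk, assuming a 365-day year):

```ruby
# Downtime budget implied by an availability target, assuming a 365-day year.
MINUTES_PER_YEAR = 365 * 24 * 60 # 525,600

def downtime_minutes_per_year(availability)
  (1.0 - availability) * MINUTES_PER_YEAR
end

downtime_minutes_per_year(0.9998).round(2) # => 105.12 (under two hours/year)
downtime_minutes_per_year(0.9999).round(2) # => 52.56
```

So the conservative 99.98% figure still means less than two hours of total downtime per year.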
Goal: beat your hosting provider in uptime
Have a good uptime on unreliable hardware.
We don’t want to wake up at night to fix our app like the guy in this picture. The founders previously ran a hosting company.
This is not an exact science, every app is different.
Architecture Archaeology
We haven’t always done this well.
Flowdock 2010
MongoDB
Messages
PostgreSQL
Rails
Apache
Simple stack, but the messaging part quickly became hairy: it had HTTP streaming, Twitter integration and an e-mail server. A lot of brittle state.
Divide and Conquer
Nice strategy for building your SOA, sorting lists and taking over the world.
MongoDB
Redis
HTTP Streaming
API
Message Backend
PostgreSQL
Rails
WebSocket API
IRC
RSS
Stunnel
GeoDNS
HAproxy
These are all separate processes. More components, but this has made it easy to add new features to individual components.
Separated concerns...
but many parts to configure
So, you need to setup boxes...
Chef: Infrastructure as (Ruby) Code
Chef lets you automate server configuration with Ruby code.
Chef at Flowdock
• Firewall configuration
• Distribute SSH host keys
• User setup
• Join mesh-based VPN
• And app/server specific stuff
Firewall setup is based on an IP whitelist; only nodes known to Chef can access private services. Distributing SSH host keys prevents MITM attacks. We have a mesh-based VPN, which is automatically configured based on Chef data.
• Cookbooks
• Recipes
• Roles
Chef server
A centralized Chef server that nodes communicate with and get updates from.
include_recipe "flowdock::users"

package "ruby"

%w{port listen_to flowdock_domain}.each do |e|
  template "#{node[:flowdock][:oulu][:envdir]}/#{e.upcase}" do
    source "envdir_file.erb"
    variables :value => node[:flowdock][:oulu][e]
    owner "oulu"
    mode "0600"
  end
end

runit_service "oulu" do
  options :use_config => true
end

cookbooks/flowdock/oulu.rb
Recipe for our IRC server
roles/rails.rb

name "rails"
description "Rails Box"
run_list(
  "recipe[nginx]",
  "recipe[passenger]"
)
override_attributes(
  passenger: {
    version: "3.0.7"
  }
)
Roles are defined in a Ruby DSL. Each node can be assigned any number of roles. Override attributes can be used to override recipe attributes.
Managing Chef cluster
$ knife cookbook upload -a -o cookbooks
Managing Chef cluster
$ knife search node role:flowdock-app-server
Node Name:   imaginary-server
Environment: qa
FQDN:        imaginary-server.flowdock.dmz
IP:          10.0.0.1
Run List:    role[qa], role[flowdock-app-server], role[web-server]
Roles:       qa, flowdock-app-server, web-server
Recipes:     ubuntu, firewall, chef, flowdock, unicorn, haproxy
Platform:    ubuntu 12.04
Tags:
Managing Chef cluster
$ knife ssh 'role:qa' 'echo "lol"'
imaginary-server lol
qa-db1           lol
qa-db2           lol
Most useful command: trigger chef run on servers
Testing Chef Recipes
• Use Chef environments to isolate changes
• Run chef-client on throw-away VMs
• cucumber-chef
sous-chef could be used to automate VM setup. Our experience with cucumber-chef and sous-chef is limited. You also need to monitor things, e.g. that Chef runs have finished on nodes and that backups are really taken.
Automatic Failover: Avoiding Single Points of Failure
MongoDB works flawlessly as failover is built-in, but how to handle Redis?
HAproxy: TCP/HTTP Load Balancer with Failover Handling
HAproxy provides easy failover for Rails instances
MongoDB has automatic failover built-in
MongoDB might have many problems, but failover isn’t one of them. Drivers are always connected to master.
Redis and Postgres have replication, but failover is manual
Not only do you need to promote a new master automatically, you also need to change the application configuration.
ZooKeeper
Distributed coordination
Each operation has to be agreed on by a majority of the servers; reads are eventually consistent.
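The majority rule can be made concrete (a small sketch of the arithmetic, not from the talk): with an ensemble of n servers, a quorum is a strict majority, so the ensemble survives as long as fewer than half the servers fail.

```ruby
# ZooKeeper quorum arithmetic: an ensemble of n servers stays available
# as long as a strict majority can still agree on each operation.
def quorum_size(n)
  n / 2 + 1          # integer division: 3 -> 2, 5 -> 3
end

def tolerated_failures(n)
  n - quorum_size(n) # 3 -> 1, 5 -> 2
end

quorum_size(5)        # => 3
tolerated_failures(5) # => 2
```

This is why ensembles have an odd size: going from 5 to 6 servers raises the quorum to 4 without tolerating any extra failures.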
require 'zk'

$queue = Queue.new
zk = ZK.new

zk.register('/hello_world') do |event|
  # need to reset watch
  data = zk.get('/hello_world', watch: true).first
  # do stuff
  $queue.push(:event)
end

zk.create('/hello_world', 'sup?')
$queue.pop # Handle local synchronization
zk.set('/hello_world', 'omg, update')
Using the high-level zk gem. The block is run every time the value is updated. The zk gem also has locks and other primitives implemented.
zk = ZK.new

zk.with_lock('/lock', :wait => 5.0) do |lock|
  # do stuff
  # others have to wait
end
Redis master failover using ZooKeeper
gem install redis_failover
but in 3 programming languages
Redis Failover

[Diagram: two Node Managers monitor the Redis nodes and update ZooKeeper; the apps watch ZooKeeper to learn the current master.]
Our apps might not use redis_failover or read ZK directly; a script restarts the app when the ZK data changes. HAproxy or DNS-based solutions are also possible, but this gives us more control over the app restart.
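The pattern can be sketched in plain Ruby (a toy model with hypothetical names, no real ZooKeeper or Redis involved): the Node Manager records the current master in shared coordination state, and clients re-resolve the master from that state instead of hard-coding it.

```ruby
# Stand-in for the ZooKeeper znode that the node managers write to.
class CoordinationState
  attr_accessor :master

  def initialize(master)
    @master = master
  end
end

# A client that asks the coordination state for the master before each use,
# so a promotion is picked up without changing app configuration.
class FailoverAwareClient
  def initialize(state)
    @state = state
  end

  def current_master
    @state.master
  end
end

state  = CoordinationState.new("redis-node-1:6379")
client = FailoverAwareClient.new(state)
client.current_master # => "redis-node-1:6379"

# A node manager detects a failure and promotes a slave:
state.master = "redis-node-2:6379"
client.current_master # => "redis-node-2:6379"
```

The real redis_failover gem does the hard parts this toy skips: health checks, majority agreement between node managers via ZooKeeper, and actually reconfiguring the Redis slaves.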
Postgres failover with pgpool-II and ZooKeeper
pgpool manages the PostgreSQL cluster; queries can be distributed to slaves. I’m afraid of pgpool: the configuration and monitoring scripts are really scary.
Postgres Failover

[Diagram: apps connect through pgpool to the PostgreSQL master and slave; a pgpool monitor coordinates via ZooKeeper.]
ZooKeeper/pgpool monitoring is used to provide redundancy for pgpool. If pgpool fails, the app needs to reconnect to a new server.
Zoos are kept. A similar scheme can be used for other master-slave replication setups, e.g. handling Twitter integration failover.
REMEMBER TO TEST
Test your failover
You might only need failover a few times a year.
I’m not sure all of our stuff is top-notch, but even the complicated parts have had their one-time use cases.
Chef vs ZooKeeper
Chef                  ZooKeeper
Configuration files   Dynamic configuration variables
Server bootstrap      Failover handling
Chef writes long configuration files; ZooKeeper only contains a few variables. Chef bootstraps servers and keeps them up to date; ZooKeeper is used to elect master nodes in master-slave scenarios.
Mesh-based VPN between boxes
Encrypted MongoDB traffic between masters and slaves. Saved the day a few times when there have been routing issues between data centers.
SSL endpoints in AWS
Routing issues between our German ISP and Comcast. Moving SSL front ends closer to the client fixed this and reduced latency: the front page loads 150 ms faster.
Winning

We don’t need to worry about waking up at night. The whole team could go sailing and be without internet access at the same time.
Lessons learned
What have we learned?
WebSockets are cool, but make your life harder
Heroku, Amazon Elastic Load Balancer, CloudFlare and Google App Engine don’t work with WebSockets. If you only need to stream data to the client, HTTP Server-Sent Events is a better choice.
♫ Let it crash ♫
Make your app crash early; at least then you are there to fix things.
Questions?
Thanks!