A tale of scalability: From one node to multiple DCs


Page 1: A tale of scalability

A tale of scalability
From one node to multiple DCs

Page 2: A tale of scalability

Who?

2014 FIFA World Cup Brazil
- 450K simultaneous users (ARG vs SWE)
- 580 Gbps (ARG vs SWE)
- 1659 years watched (all games)
- 7 x 1 (GER vs BRA)
bbb, sportv, off, pfc, combate, gnews ...

Page 3: A tale of scalability

Agenda

1. What this presentation is not about!
2. Basic glossary
3. The story
4. Questions

PS1: this presentation was not made by someone good with graphs/drawing/diagrams.

PS2: this reflects only my own unique personal individual stupid opinion, not my employer's.

Page 4: A tale of scalability

What this presentation is not about!

Byzantine faults, 2PC, Paxos, Raft, threading, locks, leader election, ZAB, the consensus problem, CRDTs, CALM, the CAP theorem. To sum up: this is not a deep distributed systems presentation.

Page 5: A tale of scalability

Basic glossary

Scalability: the ability of a system to grow to accommodate increased demand. (why?)

Page 6: A tale of scalability

Basic glossary

Availability: the proportion of time a system is in a functioning condition. (why?)

Page 7: A tale of scalability

Basic glossary

Fault tolerance: the ability to continue operating properly in the event of a failure. (why?)

Failover systems: software with automatic fault tolerance.

Page 8: A tale of scalability

The story :: BananaApp

Page 9: A tale of scalability

1st Solution :: 1 Server

(diagram: a single server running both the app and the database, serving clients from BRA, US, CAN and the Lost World)

Page 10: A tale of scalability

1st Solution :: problems

More users hit the app and it becomes slower :(
- CPU load was 1.3 (NOK) [top/htop, w]
- I/O utilization was high (NOK) [iostat, vmstat]
- RAM usage 45% (OK) [free -m]
- Disk space 5% (OK) [df -h]


Page 11: A tale of scalability

2nd Solution :: 2 Servers

(diagram: the app server and the database now on separate servers, clients from BRA, US, CAN and the Lost World)

Page 12: A tale of scalability

2nd Solution :: good parts

- Distributed the load
- We can fine-tune each server separately

Page 13: A tale of scalability

2nd Solution :: problems

More users hit the app and it becomes slower :(
- APP's CPU load was 1.3 (NOK)
- DB's CPU load was 0.3 (OK)
- I/O utilization was normal (OK)
- Introduced network latency and a new point of failure

Page 14: A tale of scalability

2nd Solution :: new concepts

A point of failure (or single point of failure [SPoF]) is a part of a system that, if it fails, will stop the entire system from working.
Examples: our database and app server.

When a solution is free from SPoFs, we can say it's a failover system.

Page 15: A tale of scalability

3rd Solution :: 4 Servers

(diagram: people → load balancer → app1 / app2 → database)

Page 16: A tale of scalability

3rd Solution :: new concepts

"Load balancer (LB) is a device/software that distributes network or application traffic across a number of servers (reals)." (F5)

1. How does it choose which server to send a request to? round-robin, least conn, weighted...
2. How does it know about a dead node? health checks (a /health page, tcp:80...)
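To make those two selection strategies concrete, here is a minimal Python sketch (my addition, not from the deck; the server names and connection counts are hypothetical) of round-robin and least-connections picking:

```python
import itertools

servers = ["srv1.example.com", "srv2.example.com"]

# Round-robin: hand out servers in a fixed rotating order.
_rr = itertools.cycle(servers)

def pick_round_robin():
    return next(_rr)

# Least connections: pick the server currently handling the fewest
# requests (the counts here are made up for illustration).
active_conns = {"srv1.example.com": 12, "srv2.example.com": 3}

def pick_least_conn():
    return min(active_conns, key=active_conns.get)
```

A real LB also feeds the health-check results back into this choice, removing dead nodes from the candidate list.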

Page 17: A tale of scalability

3rd Solution :: LB examples

http {
    upstream myapp1 {
        server srv1.example.com;
        server srv2.example.com;
    }

    server {
        listen 80;

        location / {
            proxy_pass http://myapp1;
            health_check uri=/health;
        }
    }
}

listen appname 0.0.0.0:80
    mode http
    stats enable
    balance leastconn
    option httpclose
    option forwardfor
    option httpchk HEAD /health HTTP/1.1
    server srv1 srv1.example.com:80 check
    server srv2 srv2.example.com:80 check

NGINX HAProxy

Page 18: A tale of scalability

3rd Solution :: problems

Users are getting signed out "randomly". The fix is known as session persistence or session stickiness.

NGINX: sticky cookie srv_id expires=1h domain=.example.com path=/;

HAProxy: cookie srv_id insert indirect nocache
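What those two directives do can be sketched in a few lines of Python (illustrative only; server names are hypothetical): route by the srv_id cookie when present, otherwise balance normally and set the cookie so the user sticks to one server.

```python
import itertools

servers = ["srv1", "srv2"]
_rr = itertools.cycle(servers)

def route(cookies):
    """Return (chosen server, response cookies) for one request."""
    srv = cookies.get("srv_id")
    if srv in servers:          # returning user: honor the sticky cookie
        return srv, cookies
    srv = next(_rr)             # new user: pick by round-robin...
    return srv, {**cookies, "srv_id": srv}   # ...and pin them to it
```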

Page 19: A tale of scalability

4th Solution :: 5 Servers

(diagram: people → load balancer → app1 / app2 → database, plus a shared session store: memcached, redis ...)

Page 20: A tale of scalability

4th Solution :: +problems

We now have 3 SPoFs: LB, memcache and database.

Page 21: A tale of scalability

5th Solution :: LB (float/virtual ip)

/etc/sysctl.conf
    net.ipv4.ip_nonlocal_bind = 1

/etc/ha.d/haresources
    lb1 192.168.0.10

Page 22: A tale of scalability

5th Solution :: Database

Partition and Replication

(diagram: the data set ABCD partitioned around a ring of nodes; each node also replicates its neighbors' partitions: B (1,2), C (2,3), A (3,0), D (0,1))
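A common way to implement a ring like this is consistent hashing. A toy Python sketch (my addition, not from the deck; node names A-D match the diagram, and a replication factor of 2 is assumed):

```python
import hashlib
from bisect import bisect

def _h(key):
    # Hash any string onto the ring [0, 2**32).
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % 2**32

nodes = ["A", "B", "C", "D"]
ring = sorted((_h(n), n) for n in nodes)
points = [p for p, _ in ring]

def owners(key, replicas=2):
    """The node owning `key` plus the next replicas-1 nodes clockwise
    on the ring, which hold the replica copies."""
    i = bisect(points, _h(key)) % len(ring)
    return [ring[(i + j) % len(ring)][1] for j in range(replicas)]
```

Adding or removing a node only moves the keys between that node and its neighbors, which is why systems like Cassandra use this scheme.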

Page 23: A tale of scalability

5th Solution :: mongo (master/bkp)

Page 24: A tale of scalability

5th Solution :: cassandra (cluster)

Page 25: A tale of scalability

5th Solution :: you got the idea

(diagram: people → lb1 / lb2 → app1 .. appn → DB cluster db1-db4 and session servers s1, s2)

Page 26: A tale of scalability

5th Solution :: + caching

(diagram: people → lb1 / lb2 → app1 .. appn → DB cluster db1-db4, session servers s1, s2, and caching servers c1-c3)

Page 27: A tale of scalability

5th Solution :: Caching

http {
    proxy_cache_path /data/nginx/cache keys_zone=one:10m;

    upstream myapp1 {
        server srv1.example.com;
        server srv2.example.com;
    }

    server {
        listen 80;
        proxy_cache one;

        location / {
            proxy_cache_valid any 1m;
            proxy_pass http://myapp1;
        }
    }
}

NGINX
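The effect of `proxy_cache_valid any 1m` can be mimicked with a tiny TTL cache. A Python sketch (my addition, not how nginx is actually implemented):

```python
import time

class TTLCache:
    def __init__(self, ttl=60):            # 60s, matching "1m" above
        self.ttl = ttl
        self.store = {}                    # key -> (expires_at, value)

    def get(self, key, now=None):
        now = time.monotonic() if now is None else now
        entry = self.store.get(key)
        if entry and entry[0] > now:
            return entry[1]                # fresh hit
        return None                        # miss or expired

    def put(self, key, value, now=None):
        now = time.monotonic() if now is None else now
        self.store[key] = (now + self.ttl, value)
```

On a miss, the proxy forwards to the upstream and then put()s the response, so repeated requests within the TTL never touch the app servers.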

Page 28: A tale of scalability

5th :: + Microservice Architecture

(diagram: the application's APIs split into microservices, each with its own nodes and caches: Core → mongodb, Search → elasticsearch, Recommendation → spark/hadoop, Social → neo4j)

Page 29: A tale of scalability

5th :: Single datacenter (still a SPoF)

Page 30: A tale of scalability

6th solution :: multihoming

Page 31: A tale of scalability

6th solution :: models of replication

- master / backup
- master / master
- 2PC
- Paxos

Page 32: A tale of scalability

6th solution :: database

Cassandra can help you

Page 33: A tale of scalability

6th :: DNS round robin

$ dig a www.youtube.com
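What a round-robin DNS server does with multiple A records can be sketched like this (Python, my addition; the 203.0.113.x addresses are documentation placeholders, not YouTube's real records):

```python
from itertools import cycle

a_records = ["203.0.113.1", "203.0.113.2", "203.0.113.3"]
_rot = cycle(range(len(a_records)))

def answer():
    # Rotate the record order on every query; clients that take the
    # first address then spread across the three servers.
    k = next(_rot)
    return a_records[k:] + a_records[:k]
```

This is cheap but crude: the DNS server knows nothing about load or dead nodes, which is why the next slides move on to anycast and GSLB.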

Page 34: A tale of scalability

6th solution :: anycast

Border Gateway Protocol (BGP) makes routing decisions based on paths, network policies, or rule-sets configured by a network administrator, and is involved in making core routing decisions.

DNS resolves www.example.com to 1.1.1.1.

Clients from Colorado mostly will be routed to Colorado's DC.

Clients from California mostly will be routed to California's DC.

(diagram: two datacenters, both announcing 1.1.1.1 via BGP)

Page 35: A tale of scalability

6th solution :: subdomain per client

Page 36: A tale of scalability

6th :: GSLB (Global Server Load Balancing)

Page 37: A tale of scalability

6th :: GSLB multiple A records (BR)

$ dig a www.youtube.com

Page 38: A tale of scalability

6th :: GSLB multiple A records (DE)

Page 39: A tale of scalability

7th day: you shall rest

Page 40: A tale of scalability

Summarizing

(diagram: three datacenters, BR-DKC101, US-DKC102 and JP-DKC103, each running the full stack: lb1 / lb2 → app1 .. appn → DB cluster db1-db4, session servers s1, s2, and caching servers c1-c3)

Page 41: A tale of scalability

Bonus - Vagrant

Page 42: A tale of scalability

Bonus - Docker (docker-compose)

Page 43: A tale of scalability

Bonus - don’t blindly trust vendors

Page 44: A tale of scalability

Link to this presentation

slideshare.net/leandro_moreira

Page 45: A tale of scalability

Questions?

leandromoreira.com.br

Page 46: A tale of scalability

References
- https://f5.com/glossary/load-balancer
- http://leandromoreira.com.br/2014/11/20/how-to-start-to-learn-high-scalability/
- http://nginx.org/en/docs/http/load_balancing.html
- https://www.digitalocean.com/community/tutorials/how-to-use-haproxy-to-set-up-http-load-balancing-on-an-ubuntu-vps
- https://academy.datastax.com/courses/
- http://en.wikipedia.org/wiki/Single_point_of_failure
- http://book.mixu.net/distsys/single-page.html
- https://www.howtoforge.com/high-availability-load-balancer-haproxy-heartbeat-debian-etch-p2
- http://docs.mongodb.org/manual/core/sharding-introduction/
- http://docs.mongodb.org/manual/core/replication-introduction/
- http://nginx.com/resources/admin-guide/caching/
- https://developers.google.com/web/fundamentals/performance/optimizing-content-efficiency/http-caching?hl=en
- http://martinfowler.com/articles/microservices.html
- http://www.netflix.com/WiMovie/70140358?trkid=12244757
- http://highscalability.com/blog/2009/8/24/how-google-serves-data-from-multiple-datacenters.html
- http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html
- http://docs.mongodb.org/manual/tutorial/deploy-shard-cluster/
- http://tech.3scale.net/2014/06/18/redis-sentinel-failover-no-downtime-the-hard-way/
- http://www.slideshare.net/gear6memcached/implementing-high-availability-services-for-memcached-1911077
- http://docs.couchbase.com/moxi-manual-1.8/
- http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/35590.pdf
- http://static.googleusercontent.com/media/research.google.com/en//archive/chubby-osdi06.pdf
- http://the-paper-trail.org/blog/distributed-systems-theory-for-the-distributed-systems-engineer/
- http://nil.csail.mit.edu/6.824/2015/papers/paxos-simple.pdf
- http://the-paper-trail.org/blog/consensus-protocols-paxos/
- donkeykong.com
- http://backreference.org/2010/02/01/geolocation-aware-dns-with-bind/
- http://www.tenereillo.com/GSLBPageOfShame.htm