Fail the Right Way - Node.js in Production

FAIL... THE RIGHT WAY

NODE.JS IN PRODUCTION


ssw2014.formidablelabs.com

@ryan_roemer formidablelabs.com

http://ssw2014.formidablelabs.com/

http://twitter.com/ryan_roemer

http://formidablelabs.com/

WELCOME TO PRODUCTION

Production can be a rough place for your Node.js apps. Things can go very wrong out in the wild.

FORMIDABLE LABS

3:00 AM

OUR FOCUS

Whether on PAAS, IAAS, or bare metal.

Design for Failure: Keep your Node.js apps up

Avoidance: Get yourself out of the failover business

Isolate: One failure at a time

Analyze: Debug and diagnose problems quickly

1. DESIGN FOR FAILURE

Fail and recover at multiple levels.

Let's look at failure from a system perspective.

SINGLE NODE.JS WORKER

Never ignore errors.

Have a strong bias for killing the worker.

Handle: uncaughtException

Listen: foo.on("error")

Domains

http://nodejs.org/api/domain.html#domain_warning_don_t_ignore_errors

http://nodejs.org/api/domain.html
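A minimal sketch of the "never ignore errors" bias above, assuming a worker that should log and exit on any uncaught exception so that a supervisor can start a fresh one. The helper names `installFailFast` and `listenForErrors`, and their injectable `proc` and `log` parameters, are hypothetical. They are not from the talk and exist only so the behavior can be tried without killing a real process.

```javascript
"use strict";

// Hypothetical helper: on an uncaught exception, log the error and exit
// non-zero instead of continuing in an unknown state.
// `proc` defaults to the real `process`; `log` defaults to console.error.
function installFailFast(proc, log) {
  proc = proc || process;
  log = log || console.error;

  proc.on("uncaughtException", function (err) {
    log("Uncaught exception, killing worker:", (err && err.stack) || err);
    proc.exit(1);
  });
}

// Hypothetical helper: attach an "error" listener to any emitter
// (socket, stream, client). An EventEmitter with no "error" listener
// throws when "error" is emitted, which surfaces as an uncaught exception.
function listenForErrors(emitter, name, log) {
  log = log || console.error;
  emitter.on("error", function (err) {
    log("Error from " + name + ":", (err && err.message) || err);
  });
  return emitter;
}
```

In a worker, calling `installFailFast()` once at startup would be enough. Whether an individual "error" listener should also take the worker down is a judgment call that depends on the resource.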

MULTIPLE NODE.JS WORKERS

Use cluster or recluster to multiplex CPUs and isolate errors.

Workers: die early on errors

Master: monitor and kill workers

http://nodejs.org/api/cluster.html

https://github.com/doxout/recluster

MULTIPLE NODE.JS WORKERS

var recluster = require("recluster");
var cluster = recluster("./server.js");
cluster.run();

// Hot reload: kill -s SIGUSR2 CLUSTER_PID
process.on("SIGUSR2", function() {
  console.log("Got SIGUSR2, reloading cluster...");
  cluster.reload();
});
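For comparison, the same master/worker split can be sketched with only the built-in cluster module. This is a rough illustration, not a description of what recluster does internally. `superviseWorkers` and its parameters are hypothetical names, and the cluster object is passed in so the logic can be tried against a stub.

```javascript
"use strict";

// Hypothetical helper: fork `count` workers and fork a replacement
// whenever one exits. `clusterApi` is expected to look like Node's
// built-in `cluster` module (fork() plus an "exit" event).
function superviseWorkers(clusterApi, count, log) {
  log = log || console.log;

  for (var i = 0; i < count; i++) {
    clusterApi.fork();
  }

  clusterApi.on("exit", function (worker, code, signal) {
    log("Worker exited (code=" + code + ", signal=" + signal + "), forking a replacement");
    clusterApi.fork();
  });
}

// Assumed usage in the master process:
//
//   var cluster = require("cluster");
//   if (cluster.isMaster) {
//     superviseWorkers(cluster, require("os").cpus().length);
//   } else {
//     require("./server.js"); // workers run the app and die early on errors
//   }
```

A real master would likely also throttle restarts, so that a worker crashing at startup does not turn into a fork loop.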

SERVER

Use monit or alternatives.

Restart the Node.js master

http://mmonit.com/monit/

SERVICE

Load-balancers

Heartbeat / ping monitors

Availability zones, etc.
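Load balancers and ping monitors need something cheap to hit. A minimal sketch of a heartbeat route, assuming a plain http server; the `/ping` path, the `handlePing` name, and the port in the usage comment are all assumptions for illustration.

```javascript
"use strict";

// Hypothetical heartbeat handler: answers GET /ping with 200 "ok".
// Returns true if it handled the request, false otherwise so the
// application's own routing can take over.
function handlePing(req, res) {
  if (req.method === "GET" && req.url === "/ping") {
    res.writeHead(200, { "Content-Type": "text/plain" });
    res.end("ok");
    return true;
  }
  return false;
}

// Assumed wiring with the built-in http module:
//
//   require("http").createServer(function (req, res) {
//     if (handlePing(req, res)) { return; }
//     res.writeHead(404);
//     res.end();
//   }).listen(3000);
```

Keeping the handler free of database or upstream calls means it reports only whether this process is alive and serving, which is usually what a load balancer needs to know.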

MAKE IT HOT

Everything up to this point should have hot failover.

DATACENTER

Hot failover across datacenters?

Typically very costly

But the real deal if you're serious

DISASTER RECOVERY

"Business Continuity"

Don't let a technological problem end your business

Have a worst case, "lose some data" recovery plan

2. AVOID FAILURES

Get out of the business of failover when you don't have to do it yourself.

RESOURCES TO NOT SUPPORT

Don't rely on system / service resources you don't need to.

Disk: NAS, disks, SSDs.

Datastores: DB, cloud services.

... Load Balancers, DNS, etc.

HOW TO AVOID

Use SAAS wherever possible! (DB, LBs, storage).

Or PAAS for some Node.js apps.

Design stateless, fungible servers (no disk risks).

3. ISOLATE FAILURES

Isolate failures you can't avoid.

RESOURCES TO SUPPORT

Look to resources you must depend on:

CPU/Load: Run out of this and it's over.

HTTP: Each different host you hit.

Datastores: Connections? Different Hosts?

... also, memory, I/O, etc. and combinations thereof

SOME ANECDOTES

Node.js apps can be bad neighbors.

DB (auto-suggest) vs. HTTP (vendor translations)

DB (CRUD app) vs. CPU/Load (co-located PHP app)

Read vs. Write DB operations.

HOW TO ISOLATE

Create "micro-services" that stand on their own.

Monitor for cross-pressure and respond. (Next section!)
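Short of splitting into separate services, some isolation is possible inside one process. One possible sketch for the "each different host you hit" case gives every upstream host its own `http.Agent`, so a slow host can only tie up its own capped pool of sockets. The `agentFor` helper, the default cap of 10, and the host name in the usage comment are assumptions for illustration.

```javascript
"use strict";

var http = require("http");

var agents = {}; // one Agent per upstream host

// Hypothetical helper: return a dedicated http.Agent for `host`, creating
// it on first use. Each agent has its own socket pool, capped by maxSockets.
function agentFor(host, maxSockets) {
  if (!agents[host]) {
    agents[host] = new http.Agent({ maxSockets: maxSockets || 10 });
  }
  return agents[host];
}

// Assumed usage (host name is a placeholder):
//
//   http.get({
//     host: "translations.example.com",
//     path: "/",
//     agent: agentFor("translations.example.com", 5)
//   }, onResponse);
```

This limits socket contention only. CPU, memory, and datastore pressure still cross over between features that share a process, which is the argument for separate services.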

4. ANALYZE EVERYTHING

Data drives problem discovery and action.

LOG, MONITOR, MINE

https://scoutapp.com/

https://www.pingdom.com/

http://www.pagerduty.com/

http://loggly.com/

http://aws.amazon.com/elasticmapreduce/

DECISIONS, GOALS

Things to look for in Node.js apps...

Identify

Resource pressure: CPU, I/O, memory, network

Performance: Throughput, latency

Errors/Bugs: Quantitative, qualitative

Decide

Scale up, scale down?

Separate services?
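Some of the "Identify" numbers can be sampled from inside the Node.js process itself. A minimal sketch, assuming the samples are then shipped to whatever logging or monitoring service is in use; `resourceSnapshot`, `measureLoopLag`, and the field names are hypothetical.

```javascript
"use strict";

var os = require("os");

// Hypothetical helper: one sample of basic resource-pressure numbers.
function resourceSnapshot() {
  var mem = process.memoryUsage();
  return {
    timestamp: Date.now(),
    rssBytes: mem.rss,            // resident memory of this process
    heapUsedBytes: mem.heapUsed,  // V8 heap currently in use
    loadAvg1m: os.loadavg()[0]    // 1-minute system load (always 0 on Windows)
  };
}

// Hypothetical helper: estimate event-loop lag by measuring how late a
// timer fires. A busy event loop delays the callback beyond `delayMs`.
function measureLoopLag(delayMs, callback) {
  var start = process.hrtime();
  setTimeout(function () {
    var diff = process.hrtime(start);
    var elapsedMs = diff[0] * 1000 + diff[1] / 1e6;
    callback(Math.max(0, elapsedMs - delayMs));
  }, delayMs);
}

// Assumed usage: log one sample every 10 seconds.
//
//   setInterval(function () {
//     measureLoopLag(100, function (lagMs) {
//       var sample = resourceSnapshot();
//       sample.loopLagMs = lagMs;
//       console.log(JSON.stringify(sample));
//     });
//   }, 10000);
```

Emitting each sample as one JSON line keeps it easy to mine later with a log service or batch job.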

RECAP

Design for failure

Avoid

Isolate

Analyze

THANKS!
