Fail the Right Way - Node.js in Production

Page 1: Fail the Right Way - Node.js in Production

FAIL... THE RIGHT WAY

NODE.JS IN PRODUCTION

ssw2014.formidablelabs.com

@ryan_roemer formidablelabs.com

Page 2: Fail the Right Way - Node.js in Production

WELCOME TO PRODUCTION

Production can be a rough place for your Node.js apps. Things can go very wrong out in the wild.

Page 3: Fail the Right Way - Node.js in Production

FORMIDABLE LABS

Page 4: Fail the Right Way - Node.js in Production

3:00 AM

Page 5: Fail the Right Way - Node.js in Production

OUR FOCUS

Whether on PaaS, IaaS, or bare metal.

Design for Failure: Keep your Node.js apps up

Avoidance: Get yourself out of the failover business

Isolate: One failure at a time

Analyze: Debug and diagnose problems quickly

Page 6: Fail the Right Way - Node.js in Production

1. DESIGN FOR FAILURE

Fail and recover at multiple levels.

Let's look at failure from a system perspective.

Page 7: Fail the Right Way - Node.js in Production

SINGLE NODE.JS WORKER

Never ignore errors.

Have a strong bias for killing the worker.

Handle: uncaughtException,

Listen: foo.on("error")

Domains
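A minimal sketch of the "never ignore errors, kill the worker" bias above. The helper name createFatalHandler and the placeholder emitter foo are illustrative assumptions, not code from the talk; only process.on("uncaughtException") and the "error" event convention come from Node.js itself.

```javascript
const EventEmitter = require("events");

// Build a handler that treats any unexpected error as fatal for this worker.
// `log` and `exit` are injected so the sketch can be exercised without
// actually terminating the process. (Hypothetical helper, not from the talk.)
function createFatalHandler(log, exit) {
  let dying = false;
  return function onFatal(err) {
    if (dying) { return; } // a second error during shutdown changes nothing
    dying = true;
    log("Fatal error, killing worker: " + ((err && err.stack) || err));
    exit(1); // non-zero exit so a supervisor knows to start a replacement
  };
}

const onFatal = createFatalHandler(console.error, process.exit);

// Handle: last-resort hook for exceptions that nothing else caught.
process.on("uncaughtException", onFatal);

// Listen: an emitter with no "error" listener throws when it emits "error",
// so always attach one. `foo` stands in for any stream, socket, or server.
const foo = new EventEmitter();
foo.on("error", onFatal);
```

The handler deliberately does not try to resume work: after an unexpected error the process state is unknown, so the safer choice is to exit and let whatever supervises the worker start a fresh one.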

Page 8: Fail the Right Way - Node.js in Production

MULTIPLE NODE.JS WORKERS

Use cluster or recluster to multiplex CPUs and isolate errors.

Workers: die early on errors

Master: monitor and kill workers
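The worker/master split can be sketched with the built-in cluster module alone. This is an assumption-laden illustration rather than the talk's code: superviseWorkers is a hypothetical helper, and it only covers replacing workers that have died, not killing unresponsive ones.

```javascript
// Fork `count` workers and replace any worker that exits.
// `cluster` is expected to look like Node's built-in cluster module
// (a fork() method and an "exit" event); it is passed in so the
// supervision logic can be exercised on its own.
function superviseWorkers(cluster, count, log) {
  for (let i = 0; i < count; i += 1) {
    cluster.fork();
  }

  cluster.on("exit", function (worker, code, signal) {
    log("Worker " + worker.id + " exited (" + (signal || code) +
      "), forking a replacement");
    cluster.fork();
  });
}

// Usage in the master process (not executed here):
//
//   const cluster = require("cluster");
//   if (cluster.isMaster) {
//     superviseWorkers(cluster, require("os").cpus().length, console.log);
//   } else {
//     require("./server.js");
//   }
```

A production master would usually also rate-limit restarts so that a worker crashing on startup does not turn into a tight fork loop.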

Page 9: Fail the Right Way - Node.js in Production

MULTIPLE NODE.JS WORKERS

var recluster = require("recluster");
var cluster = recluster("./server.js");
cluster.run();

// Hot reload: kill -s SIGUSR2 CLUSTER_PID
process.on("SIGUSR2", function() {
  console.log("Got SIGUSR2, reloading cluster...");
  cluster.reload();
});

Page 10: Fail the Right Way - Node.js in Production

SERVER

Use monit or alternatives.

Restart the Node.js master

Page 11: Fail the Right Way - Node.js in Production

SERVICE

Load-balancers

Heartbeat / ping monitors

Availability zones, etc.
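Load balancers and heartbeat monitors need something cheap to probe. One possible shape is a dedicated ping route; the /ping path, the response fields, and the port in the usage note are assumptions made for this sketch.

```javascript
const http = require("http");

// Answer GET /ping with a small JSON body; everything else is a 404.
// The route name and payload are hypothetical choices for illustration.
function handleRequest(req, res) {
  if (req.method === "GET" && req.url === "/ping") {
    const body = JSON.stringify({
      status: "ok",
      pid: process.pid,
      uptimeSeconds: process.uptime()
    });
    res.writeHead(200, { "Content-Type": "application/json" });
    res.end(body);
    return;
  }

  res.writeHead(404, { "Content-Type": "text/plain" });
  res.end("Not found");
}

// Start listening; not called here so the sketch has no side effects.
function startServer(port) {
  return http.createServer(handleRequest).listen(port);
}

// Usage: startServer(3000), then point the monitor at http://HOST:3000/ping
```

Keeping the ping handler free of database or upstream calls means it reports on the Node.js process itself; a deeper check that touches dependencies can live on a separate route.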

Page 12: Fail the Right Way - Node.js in Production

MAKE IT HOT

Everything up to this point should have hot failover.

Page 13: Fail the Right Way - Node.js in Production

DATACENTER

Hot failover across datacenters?

Typically very costly

But, the real deal if you're serious

Page 14: Fail the Right Way - Node.js in Production

DISASTER RECOVERY

"Business Continuity"

Don't let a technological problem end your business

Have a worst case, "lose some data" recovery plan

Page 15: Fail the Right Way - Node.js in Production

2. AVOID FAILURES

Get out of the business of failover when you don't have to do it yourself.

Page 16: Fail the Right Way - Node.js in Production

RESOURCES TO NOT SUPPORT

Don't rely on system / service resources you don't need to.

Disk: NAS, disks, SSDs.

Datastores: DB, cloud services.

... Load Balancers, DNS, etc.

Page 17: Fail the Right Way - Node.js in Production

HOW TO AVOID

Use SaaS wherever possible! (DB, LBs, storage).

Or PaaS for some Node.js apps.

Design stateless, fungible servers (no disk risks).
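One way to keep a server fungible is to have it read everything it needs from its environment at startup and hold no local state. The sketch below assumes hypothetical setting names (PORT, DATABASE_URL, SESSION_STORE_URL); none of them come from the talk.

```javascript
// Build an immutable config object from environment variables.
// All setting names are hypothetical and exist only for this sketch.
function loadConfig(env) {
  const required = ["DATABASE_URL", "SESSION_STORE_URL"];
  const missing = required.filter(function (key) { return !env[key]; });
  if (missing.length > 0) {
    // Fail fast at boot instead of limping along half-configured.
    throw new Error("Missing required settings: " + missing.join(", "));
  }

  return Object.freeze({
    port: parseInt(env.PORT || "3000", 10),
    databaseUrl: env.DATABASE_URL,        // data lives in a hosted datastore
    sessionStoreUrl: env.SESSION_STORE_URL // sessions live off-box, not on disk
  });
}

// Usage (not executed here): const config = loadConfig(process.env);
```

Because nothing is written to local disk and nothing is read from it, any instance can be destroyed and replaced by an identical one without a recovery step.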

Page 18: Fail the Right Way - Node.js in Production

3. ISOLATE FAILURES

Isolate failures you can't avoid.

Page 19: Fail the Right Way - Node.js in Production

RESOURCES TO SUPPORT

Look to resources you must depend on:

CPU/Load: Run out of this and it's over.

HTTP: Each different host you hit.

Datastores: Connections? Different Hosts?

... also, memory, I/O, etc. and combinations thereof

Page 20: Fail the Right Way - Node.js in Production

SOME ANECDOTES

Node.js apps can be bad neighbors.

DB (auto-suggest) vs. HTTP (vendor translations)

DB (CRUD app) vs. CPU/Load (co-located PHP app)

Read vs. Write DB operations.

Page 21: Fail the Right Way - Node.js in Production

HOW TO ISOLATE

Create "micro-services" that stand on their own.

Monitor for cross-pressure and respond. (Next section!)
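Where splitting into separate services is not yet practical, a related in-process technique is a "bulkhead": a cap on concurrent calls per dependency, so a slow HTTP vendor cannot consume the capacity the database path needs. This is a swapped-in illustration, not something the slides describe; createBulkhead, the two instances, and their limits are all hypothetical.

```javascript
// Cap the number of in-flight calls to one dependency.
// Calls beyond the cap are rejected immediately instead of queueing.
function createBulkhead(maxConcurrent) {
  let inFlight = 0;

  return {
    // `task` is a function that returns a promise.
    run: function (task) {
      if (inFlight >= maxConcurrent) {
        return Promise.reject(new Error("Bulkhead full"));
      }
      inFlight += 1;
      return Promise.resolve()
        .then(task)
        .finally(function () { inFlight -= 1; });
    },
    inFlight: function () { return inFlight; }
  };
}

// One bulkhead per dependency, so pressure on one does not spill over.
// Names and limits are placeholders for this sketch.
const dbBulkhead = createBulkhead(20);
const translationsBulkhead = createBulkhead(5);
```

Rejecting early turns a slow dependency into a fast, visible error for that one feature, which is usually easier to monitor and respond to than a process-wide slowdown.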

Page 22: Fail the Right Way - Node.js in Production

4. ANALYZE EVERYTHING

Data drives problem discovery and action.

Page 24: Fail the Right Way - Node.js in Production

DECISIONS, GOALS

Things to look for in Node.js apps...

Identify

Resource pressure: CPU, I/O, memory, network

Performance: Throughput, latency

Errors/Bugs: Quantitative, qualitative

Decide

Scale up, scale down?

Separate services?
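As a starting point for the "identify" half, a process can sample its own resource pressure using only built-in modules. The function names and the shape of the returned object are assumptions for this sketch; a real setup would ship these numbers to whatever metrics service is in use.

```javascript
const os = require("os");

// Take one snapshot of memory and CPU load for this process and host.
function sampleMetrics() {
  const mem = process.memoryUsage();
  return {
    timestamp: Date.now(),
    rssBytes: mem.rss,            // total memory held by the process
    heapUsedBytes: mem.heapUsed,  // live JavaScript heap
    loadAvg1m: os.loadavg()[0],   // 1-minute load average (0 on Windows)
    cpuCount: os.cpus().length,
    uptimeSeconds: process.uptime()
  };
}

// Estimate event loop lag: how much later than requested a timer fires.
// Sustained lag suggests the process is CPU-bound or blocked.
function measureEventLoopLag(intervalMs, callback) {
  const start = Date.now();
  setTimeout(function () {
    callback(Math.max(0, Date.now() - start - intervalMs));
  }, intervalMs);
}

// Usage (not executed here):
//   setInterval(function () { console.log(sampleMetrics()); }, 10000).unref();
```

Comparing loadAvg1m against cpuCount gives a rough scale up / scale down signal, while rising event loop lag on a single service is a hint that it may need to be separated from its neighbors.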

Page 25: Fail the Right Way - Node.js in Production

RECAP

Design for failure

Avoid

Isolate

Analyze

Page 26: Fail the Right Way - Node.js in Production

THANKS!

ssw2014.formidablelabs.com

@ryan_roemer formidablelabs.com