AWS re:Invent 2012 - Chaos Monkey & The Netflix Simian Army

Ariel Tseitlin

Chaos Monkey & The Simian Army

About Netflix

With more than 30 million streaming members in the United States, Canada, Latin America, the United Kingdom, Ireland and the Nordics, Netflix, Inc. (NASDAQ: NFLX) is the world's leading internet subscription service for enjoying movies and TV programs. [1]

[1] http://ir.netflix.com/

Personalization Engine

User Info

Movie Metadata

Movie Ratings

Similar Movies

API

Reviews

A/B Test Engine

2B requests per day into the Netflix API

12B outbound requests per day to API dependencies
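The two figures above imply an average fan-out per inbound call. The short calculation below is only a back-of-the-envelope sketch: it assumes the load is spread evenly over the day, which real traffic is not, and the variable names are illustrative.

```python
# Figures taken from the slides: requests per day.
inbound_per_day = 2_000_000_000
outbound_per_day = 12_000_000_000

seconds_per_day = 24 * 60 * 60

# Average number of dependency calls triggered by one API request.
fan_out = outbound_per_day / inbound_per_day

# Average rates, assuming (unrealistically) a perfectly flat load.
inbound_per_second = inbound_per_day / seconds_per_day
outbound_per_second = outbound_per_day / seconds_per_day

print(f"average fan-out: {fan_out:.0f} outbound calls per inbound request")
print(f"inbound:  ~{inbound_per_second:,.0f} requests/second")
print(f"outbound: ~{outbound_per_second:,.0f} requests/second")
```

On average, each request into the API depends on about six other calls succeeding, which is why a single slow or failing dependency matters so much.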

A complex distributed system

Growth is good (and scary)

30x growth in two years!

Things will break

Chaos Monkey taught us…

• State is bad

• Clusters are good

• Surviving instance failure is a low bar
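Chaos Monkey works by randomly terminating running instances. The sketch below is a toy in-memory simulation of that idea, not Netflix's implementation: the Cluster class, the unleash_chaos function and the instance ids are all hypothetical, and no cloud API is called.

```python
import random
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Cluster:
    """Hypothetical stand-in for a group of interchangeable, stateless instances."""
    name: str
    instances: List[str] = field(default_factory=list)

    def healthy(self) -> bool:
        # The cluster can still serve traffic while at least one instance remains.
        return len(self.instances) > 0


def unleash_chaos(cluster: Cluster, rng: random.Random) -> Optional[str]:
    """Terminate one randomly chosen instance; return its id, or None if empty."""
    if not cluster.instances:
        return None
    victim = rng.choice(cluster.instances)
    cluster.instances.remove(victim)
    return victim


rng = random.Random(42)  # seeded only to make the demo repeatable
api = Cluster("api", ["i-a", "i-b", "i-c"])
legacy = Cluster("legacy", ["i-z"])  # a single instance: no redundancy

killed_api = unleash_chaos(api, rng)
killed_legacy = unleash_chaos(legacy, rng)

print(f"api lost {killed_api}, still healthy: {api.healthy()}")
print(f"legacy lost {killed_legacy}, still healthy: {legacy.healthy()}")
```

The clustered service keeps serving after losing an instance, while the single-instance service does not, which is the point behind "clusters are good".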

The Sick and Wounded

Latency Monkey

Latency Monkey taught us…

• Startup resiliency is often missed

• An ongoing unified approach to runtime dependency management is important (visibility & transparency get missed otherwise)

• Know thy neighbor (unknown dependencies)
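Latency Monkey injects artificial delays into calls between services. One way to picture the lesson is a caller that bounds how long it waits on a dependency and falls back when the dependency is slow. This is a minimal sketch under that assumption; with_injected_latency, fetch_similar_movies and call_with_fallback are hypothetical names, and the "dependency" is a local function rather than a real remote service.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def with_injected_latency(func, delay_s):
    """Wrap a dependency call so every invocation is delayed by delay_s seconds."""
    def slow(*args, **kwargs):
        time.sleep(delay_s)
        return func(*args, **kwargs)
    return slow


def fetch_similar_movies(movie_id):
    # Hypothetical dependency; stands in for a remote service call.
    return [f"{movie_id}-similar-1", f"{movie_id}-similar-2"]


def call_with_fallback(func, arg, timeout_s, fallback):
    """Call func(arg), but give up after timeout_s seconds and return fallback."""
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(func, arg)
    try:
        return future.result(timeout=timeout_s)
    except FutureTimeout:
        return fallback
    finally:
        # Do not block the caller on the slow worker thread.
        pool.shutdown(wait=False)


# Normal conditions: the dependency answers well within the timeout.
fast_result = call_with_fallback(fetch_similar_movies, "m1", 0.5, [])

# Injected latency: the dependency is slower than the caller will tolerate.
slow_dependency = with_injected_latency(fetch_similar_movies, 0.2)
degraded_result = call_with_fallback(slow_dependency, "m1", 0.05, [])

print("fast:", fast_result)
print("degraded:", degraded_result)
```

The caller degrades to an empty list instead of hanging. A dependency nobody knew about would have no such timeout or fallback, which is how injected latency surfaces it.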

Clutter happens

Janitor Monkey taught us…

• Label everything

• Clutter builds up
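Janitor Monkey looks for unused resources and cleans them up. The sketch below illustrates why labels matter for that kind of sweep. It is an assumption-laden toy: the Resource class, the "owner" tag, the 30-day idle threshold and the resource ids are all invented for illustration and do not reflect Janitor Monkey's actual rules.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import Dict, List


@dataclass
class Resource:
    """Hypothetical record of a cloud resource (volume, instance, snapshot...)."""
    resource_id: str
    last_used: date
    tags: Dict[str, str] = field(default_factory=dict)


def find_cleanup_candidates(resources: List[Resource], today: date,
                            max_idle_days: int = 30,
                            required_tag: str = "owner") -> List[str]:
    """Return ids of resources that are unlabeled or idle for too long."""
    candidates = []
    for resource in resources:
        unlabeled = required_tag not in resource.tags
        idle = (today - resource.last_used) > timedelta(days=max_idle_days)
        if unlabeled or idle:
            candidates.append(resource.resource_id)
    return candidates


today = date(2012, 11, 28)
inventory = [
    Resource("vol-1", date(2012, 11, 20), {"owner": "api-team"}),  # labeled, in use
    Resource("vol-2", date(2012, 11, 27)),                         # nobody claims it
    Resource("vol-3", date(2012, 6, 1), {"owner": "api-team"}),    # labeled but idle
]

candidates = find_cleanup_candidates(inventory, today)
print("cleanup candidates:", candidates)
```

Without an owner label there is no one to ask before deleting, so unlabeled resources are flagged regardless of how recently they were used.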

Ranks of the Simian Army

• Chaos Monkey

• Chaos Gorilla

• Latency Monkey

• Janitor Monkey

• Conformity Monkey

• Circus Monkey

• Doctor Monkey

• Howler Monkey

• Security Monkey

• Chaos Kong

• Efficiency Monkey

Big impact on availability

• Results of the monkeys

Open

We are sincerely eager to hear your feedback on this presentation and on re:Invent.

Please fill out an evaluation form when you have a chance.
