
Santa Cloud: How Netflix Does Holiday Capacity Planning - South Bay SRE Meetup Aug-9-2016



Netflix Confidential

Santa Cloud: How Netflix Does Holiday Capacity Planning

August 9th, 2016



Capacity Tightrope Walking

● Additional complexities

○ Volatile customer traffic

○ Option to do regional failover (Chaos Kong)

○ No budgeting process

● Can’t run too lean or too wasteful

● Holiday season = “let’s not get fired” season


Cloud Capacity Planning

● Charter: “Ensure availability of cloud capacity in an efficient manner, allowing engineering organizations to further prioritize innovation and availability.”

● Cross-functional team spanning:

○ Engineering

○ Data Science

○ Finance


Holiday Preparation

● Match the methodology to the environment

○ Bottom-up

○ Top-down

○ Highly iterative

● Engagement with the largest service teams

● Evaluate migrations and coordinate changes
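As a rough illustration of reconciling the two directions, the sketch below compares a bottom-up estimate (the sum of per-team forecasts) with a top-down estimate (today's footprint scaled by an expected growth rate) and flags when they diverge by more than a tolerance. This is a generic sketch, not Netflix's forecasting model: the function names, the growth input, the 5% tolerance, and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Reconciliation:
    bottom_up: int   # instances, summed from per-team forecasts
    top_down: int    # instances, from scaling the current footprint by growth
    gap_pct: float   # (bottom_up - top_down) / top_down, as a percentage
    agrees: bool     # True when |gap_pct| is within the tolerance


def bottom_up_estimate(team_forecasts: dict[str, int]) -> int:
    """Sum the peak-instance forecasts submitted by individual service teams."""
    return sum(team_forecasts.values())


def top_down_estimate(current_footprint: int, growth_pct: int) -> int:
    """Scale today's total footprint by an expected growth percentage, rounding up."""
    return -(-current_footprint * (100 + growth_pct) // 100)


def reconcile(team_forecasts: dict[str, int], current_footprint: int,
              growth_pct: int, tolerance_pct: float = 5.0) -> Reconciliation:
    """Compare both estimates; a large gap is a prompt for another planning round."""
    bu = bottom_up_estimate(team_forecasts)
    td = top_down_estimate(current_footprint, growth_pct)
    gap = (bu - td) / td * 100
    return Reconciliation(bu, td, gap, abs(gap) <= tolerance_pct)


if __name__ == "__main__":
    # Hypothetical forecasts (instances) from three hypothetical service groups.
    forecasts = {"api": 24_000, "playback": 30_000, "other": 60_000}
    print(reconcile(forecasts, current_footprint=100_000, growth_pct=10))
```

A gap outside the tolerance does not say which estimate is wrong; it only marks where another round of conversations with service teams is needed, which is one way to read "highly iterative".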


Case Study: API Service

● One of our largest services is changing hardware type

● Service profile:

○ ~20% of our total footprint

○ Autoscales by ~65% over the course of a day

○ Runs in all regions

○ CPU-bound, memory-intensive
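The two percentages in the service profile can be turned into a back-of-the-envelope estimate of how much capacity this one service releases at its daily trough. The sketch assumes that "~20% of our total footprint" refers to the service's peak instance count and that "autoscales ~65%" means the trough sits 65% below that peak; both readings, the function name, and the 100,000-instance total are hypothetical.

```python
def daily_swing(total_footprint: int, share_pct: int, swing_pct: int) -> tuple[int, int, int]:
    """Return (peak, trough, released) instance counts for one service.

    Assumes share_pct is the service's share of the total footprint at its
    peak and swing_pct is how far the trough sits below that peak.
    Integer arithmetic keeps every result in whole instances.
    """
    peak = total_footprint * share_pct // 100
    trough = peak * (100 - swing_pct) // 100
    return peak, trough, peak - trough


if __name__ == "__main__":
    total = 100_000  # hypothetical total instance count
    peak, trough, released = daily_swing(total, share_pct=20, swing_pct=65)
    print(f"peak={peak} trough={trough} released={released} "
          f"({released * 100 / total:.0f}% of the total footprint)")
```

Under these assumptions the service scales in by about 13% of the entire footprint every day, so changing its instance type noticeably reshapes the overall trough.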


Changing The Trough

[Figure: chart with a marked “Reservation line”]
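One way to read a reservation-line chart: capacity below the line is reserved and paid for whether or not it is used, so whenever autoscaling drops a service's usage below the line, the gap is reserved capacity that other work could borrow. The sketch below totals those borrowable instance-hours, and the usage that spills above the line, from an hourly usage series. Treating the line as a fixed reserved-instance count, the hourly granularity, the names, and the sample curve are all assumptions for illustration.

```python
from typing import NamedTuple


class TroughSummary(NamedTuple):
    borrowable_instance_hours: int  # reserved capacity the service left unused
    overflow_instance_hours: int    # usage above the reservation line
    hours_below_line: int           # hours in which some reserved capacity was free


def summarize_trough(hourly_usage: list[int], reservation_line: int) -> TroughSummary:
    """Compare hourly instance counts against a fixed reservation line."""
    borrowable = overflow = hours_below = 0
    for used in hourly_usage:
        if used < reservation_line:
            borrowable += reservation_line - used
            hours_below += 1
        else:
            overflow += used - reservation_line
    return TroughSummary(borrowable, overflow, hours_below)


if __name__ == "__main__":
    # Hypothetical 24-hour diurnal curve (instances): overnight trough, evening peak.
    usage = [70, 60, 50, 45, 45, 50, 60, 70, 80, 85, 90, 95,
             95, 95, 100, 105, 110, 120, 130, 135, 130, 120, 100, 85]
    print(summarize_trough(usage, reservation_line=100))
```

Raising the line grows the borrowable pool but pays for more idle capacity; lowering it pushes more of the peak above the line. That is the "too lean or too wasteful" trade-off expressed in instance-hours.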


Managing Failover Capacity

● Evolution of Chaos Kong

● Cascading failovers

● Who gets capacity?
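To make the failover question concrete: if one region is evacuated during a Chaos Kong exercise, the surviving regions must absorb its traffic. The sketch below computes, for each region, the load it must be able to serve under the worst single-region evacuation, assuming evacuated traffic is redistributed in proportion to each survivor's normal load. The region names, load units, and the proportional-redistribution rule are hypothetical; real traffic steering may split load differently, and a cascading failover (a second evacuation) would need the calculation repeated on the reduced set of regions.

```python
def failover_requirements(load: dict[str, int]) -> dict[str, int]:
    """Peak load each region must serve to survive any single-region evacuation.

    Assumes the evacuated region's traffic is split across the surviving
    regions in proportion to their own normal load (a simplification).
    """
    required = dict(load)  # with no failover, each region serves only its own load
    for failed, failed_load in load.items():
        survivors = {r: l for r, l in load.items() if r != failed}
        survivor_total = sum(survivors.values())
        if survivor_total == 0:
            continue  # proportional split is undefined when survivors carry no load
        for region, own in survivors.items():
            # Ceiling division so the estimate errs on the side of more capacity.
            absorbed = -(-failed_load * own // survivor_total)
            required[region] = max(required[region], own + absorbed)
    return required


if __name__ == "__main__":
    # Hypothetical normal peak load per region, in arbitrary capacity units.
    normal = {"region-a": 40, "region-b": 40, "region-c": 20}
    print(failover_requirements(normal))
```

With these numbers, region-a and region-b each need headroom for 67 units instead of 40, and region-c needs 34 instead of 20, roughly 1.7 times normal load in every region. That headroom is the price of keeping regional failover an option.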


Our Role In The API Migration

● Through detailed planning:

○ Maximize trough borrowing

○ Highest chance of capacity availability

● Charter: “Ensure availability of cloud capacity in an efficient manner, allowing engineering organizations to further prioritize innovation and availability.”

● “Rinse-repeat” for the other large service teams


Questions