Black Friday 2013 Ernest Mueller, Bazaarvoice Engineering

CloudAustin Black Friday 2013

A 2014 CloudAustin presentation on how we prepared for, and executed during, our high-traffic surge over Black Friday.


What Is Black Friday?

• The National Retail Federation writes: For some retailers, the holiday season [Nov-Dec] can represent as much as 20-40% of annual sales.

• ShopperTrak says: National retail sales increased 2.7% and foot traffic decreased 14.6% when compared to the same two months last year (2012).

• Black Friday (the Friday after Thanksgiving) and Cyber Monday (the Monday after that) have become big discounting and promotional events that retailers use to push holiday purchasing.

• Summary: It’s a big deal to many of our clients and is becoming more ecomm-driven every year

Black Friday/Cyber Monday 2013 @BV

Historically

In 2011 we served 1.52 B, and in 2012 we served 2.03 B.

Roadmap Prediction

Bazaarvoice expected 2.6 B review impressions on Black Friday & Cyber Monday 2013. That’s a 30% YoY growth rate.

Results

Bazaarvoice served 2.67 B review impressions on Black Friday & Cyber Monday 2013. That’s a 31.4% YoY growth rate.
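As a quick sanity check on those numbers: applying a YoY growth rate is just last year's total × (1 + rate), so 2.03 B × 1.30 comes to about 2.64 B and 2.03 B × 1.314 to about 2.67 B, consistent with the rounded figures on the slide. The helper names below are illustrative only, not Bazaarvoice tooling.

```python
def project(previous: float, growth_rate: float) -> float:
    """Apply a year-over-year growth rate (0.30 means 30%) to last year's total."""
    return previous * (1.0 + growth_rate)


def yoy_growth(previous: float, current: float) -> float:
    """Year-over-year growth rate as a fraction of last year's total."""
    return current / previous - 1.0


# Black Friday & Cyber Monday totals from the slides, in billions.
served_2011 = 1.52
served_2012 = 2.03

expected_2013 = project(served_2012, 0.30)   # roadmap prediction
reported_2013 = project(served_2012, 0.314)  # reported result

print(f"expected 2013: {expected_2013:.2f} B")                              # 2.64 B
print(f"reported 2013: {reported_2013:.2f} B")                              # 2.67 B
print(f"2011 -> 2012 growth: {yoy_growth(served_2011, served_2012):.1%}")  # 33.6%
```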


If you took all the reviews we served up to shoppers on Black Friday 2013 and printed them as paperback books, it would take a bookshelf almost 11 miles long to hold them.


Step 0: Architecture


Scaling Isn’t Just For Black Friday

• We continuously work to scale the product – our data size doubles year over year

• Architectural changes to meet the demand are constant and ongoing – there is no “maintenance mode” at scale

• Your base architecture needs to be scalable

• Then you have to refactor again and again


The Three Amigos


Dove’s Thoughts

• Upping performance and running your system at 40% utilization instead of 80% gave a lot of insight into our second-order bottlenecks and performance characteristics

• The choice of where to place and span Auto Scaling groups (ASGs) and other Amazon bits was a major talking point among the Amigos; we ended up placing them per Availability Zone (AZ) because of our DNS/HAProxy front end

• The “diagonal scaling” challenge of instance size vs. number of instances vs. PIOPS (provisioned IOPS) speed is hard, and you basically just have to run tests to dial in on the minima; this changes a lot over time

• Remember, with the public cloud a lot of this is a black box. While that removes a lot of work from you, it adds other work and requires certain best practices to make the most of your system
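The “run tests and dial in on the minima” step can be made mechanical once the load-test numbers exist. This is a minimal sketch, assuming you have recorded the saturation throughput and hourly cost of each tested combination of instance type, instance count and PIOPS. It keeps only the configurations that would serve the projected peak at the 40% utilization target mentioned above, then takes the cheapest. The `LoadTestResult` fields, the prices and the throughput figures are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LoadTestResult:
    """One configuration measured during a load test (all values hypothetical)."""
    instance_type: str
    instance_count: int
    piops: int          # provisioned IOPS per EBS volume
    max_rps: float      # throughput at which the configuration saturated
    hourly_cost: float  # total cost of the configuration, $/hour


def cheapest_viable(results: list[LoadTestResult], peak_rps: float,
                    target_utilization: float = 0.40) -> Optional[LoadTestResult]:
    """Cheapest tested configuration that serves peak_rps while running
    at or below target_utilization of its measured maximum throughput."""
    viable = [r for r in results if peak_rps <= r.max_rps * target_utilization]
    return min(viable, key=lambda r: r.hourly_cost, default=None)


results = [
    LoadTestResult("m2.xlarge", 10, 1000, 30_000, 4.10),
    LoadTestResult("m2.xlarge", 16, 1000, 48_000, 6.56),
    LoadTestResult("m2.2xlarge", 8, 2000, 52_000, 7.20),
]
best = cheapest_viable(results, peak_rps=19_000)
print(best)
```

Because the minimum moves as instance types, prices and traffic shape change, a sketch like this only pays off if it is rerun against fresh test results.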


Step 1: Planning


This Year

• We started Black Friday specific work on August 12, 2013.

• That’s when client readiness surveys start coming in!

• We’ve done this in previous years, but this year there was a big additional demand placed on the planning…


The Old Meets The New


Communicate and Coordinate

• The first step is always internal communication

• We create an “Internal Preparedness Statement” to provide a concise, definitive statement for Engineering, Sales, Support, and Implementation

• Regular weekly prep status meetings

• From the August 12 “Planning is beginning” notification till the celebratory happy hour on Dec 16, I have 1,287 emails that mention “Black Friday.”

• Due to the new distributed-team challenge, we needed a person responsible for coordinating our overall Black Friday response…


Step 2: Freezing


BV Holiday Freeze Statement

Soft Freeze

We observe a general change freeze period starting 1 November and ending 15 January. During this period, we do not introduce changes to Bazaarvoice products that are integrated with our clients' websites. We may introduce changes into back-end systems that do not impact the end-user site experience.

Hard Freeze

We only release infrastructure and configuration changes required to restore service to, or prevent a service disruption for, one or more of our customers. The Critical System Change periods are:

• 5 days prior to and 5 days after Black Friday (24 November 2013 through 4 December 2013)

• 4 days prior to and 7 days after Christmas (21 December 2013 through 1 January 2014)
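The hard-freeze dates follow mechanically from the holiday calendar, so they can be computed instead of retyped each year. This is a small sketch, assuming US Thanksgiving (the fourth Thursday of November) and the offsets given in the freeze statement. The function names are hypothetical.

```python
from datetime import date, timedelta


def thanksgiving(year: int) -> date:
    """US Thanksgiving: the fourth Thursday of November."""
    nov_1 = date(year, 11, 1)
    # date.weekday(): Monday == 0 ... Thursday == 3
    first_thursday = nov_1 + timedelta(days=(3 - nov_1.weekday()) % 7)
    return first_thursday + timedelta(weeks=3)


def window(anchor: date, days_before: int, days_after: int) -> tuple[date, date]:
    """Inclusive (start, end) dates around an anchor day."""
    return anchor - timedelta(days=days_before), anchor + timedelta(days=days_after)


def soft_freeze(year: int) -> tuple[date, date]:
    """1 November through 15 January of the following year."""
    return date(year, 11, 1), date(year + 1, 1, 15)


def hard_freezes(year: int) -> list[tuple[date, date]]:
    black_friday = thanksgiving(year) + timedelta(days=1)
    christmas = date(year, 12, 25)
    return [window(black_friday, 5, 5), window(christmas, 4, 7)]


def in_hard_freeze(day: date) -> bool:
    # Include last year's windows too: the Christmas window spills into January.
    windows = hard_freezes(day.year) + hard_freezes(day.year - 1)
    return any(start <= day <= end for start, end in windows)


for start, end in hard_freezes(2013):
    print(start, "through", end)
# 2013-11-24 through 2013-12-04
# 2013-12-21 through 2014-01-01
```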


What Does Freeze Mean To You?


Step 3: Scaling


Traffic Projections and Scaling Plan

• Sadly, the answer isn’t as simple as “Amazon, yay!”

• Even they run out of resources over this period

• We conduct detailed YoY traffic projections

• We come up with a scaling plan to fit the projections

• Leave headroom!
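Leaving headroom comes down to sizing for a target utilization below 100% of measured capacity. This is a minimal sketch, assuming you know the projected peak RPS for a tier and the RPS one server can handle acceptably. The 50% default target and the example numbers are hypothetical.

```python
import math


def servers_needed(peak_rps: float, rps_per_server: float,
                   target_utilization: float = 0.5, spares: int = 0) -> int:
    """Servers needed so the projected peak lands at target_utilization
    of measured per-server capacity, plus any explicit spares."""
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    usable_rps_per_server = rps_per_server * target_utilization
    return math.ceil(peak_rps / usable_rps_per_server) + spares


print(servers_needed(21_000, 1_500))            # 28
print(servers_needed(21_000, 1_500, spares=2))  # 30
```

Expressing headroom as an explicit utilization target makes it easy to see how much margin a plan actually carries.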


Traffic Projection Tips

• Your system has various axes of scaling within it – trend and estimate them all

• We estimate incoming and outgoing reviews per day and peak requests per second on display servers, and calculate acceptable per-server capacity at each level (Tomcat, Solr, database)

• Once you’ve done it one year, it’s easier because you can apply proportional lift to current traffic

• Keep an ear to the ground for environmental changes! This year retailers decided to start earlier and spike a little less on BF, so scaling came earlier than last year – but we read the news so we were prepared
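The “proportional lift” idea can be sketched per scaling axis. Divide last year's Black Friday peak by last year's pre-holiday baseline, then multiply this year's baseline by that ratio. The axis names echo the slide, but every number is made up for illustration. A real plan would also fold in news-driven adjustments like the earlier, flatter 2013 ramp.

```python
def lift_ratios(last_baseline: dict[str, float],
                last_peak: dict[str, float]) -> dict[str, float]:
    """Per-axis ratio of last year's holiday peak to last year's baseline."""
    return {axis: last_peak[axis] / last_baseline[axis] for axis in last_baseline}


def project_peaks(this_baseline: dict[str, float],
                  ratios: dict[str, float]) -> dict[str, float]:
    """Apply last year's lift ratios to this year's baseline traffic."""
    return {axis: this_baseline[axis] * ratios[axis] for axis in this_baseline}


# Hypothetical numbers for two of the axes named on the slide.
last_baseline = {"display_peak_rps": 10_000, "incoming_reviews_per_day": 40_000}
last_peak = {"display_peak_rps": 25_000, "incoming_reviews_per_day": 60_000}
this_baseline = {"display_peak_rps": 13_000, "incoming_reviews_per_day": 50_000}

ratios = lift_ratios(last_baseline, last_peak)
print(project_peaks(this_baseline, ratios))
# {'display_peak_rps': 32500.0, 'incoming_reviews_per_day': 75000.0}
```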


Chart: Pageviews and UGC Impressions (y-axis 0 to 1.6 B); labeled values 1.337 B and 1.330 B


Step 4: Supporting


Situational Awareness

• When the clock is running, you need your monitoring, alerting, response, etc. to be highly optimized for speed.

• We use a variety of monitoring tools: Nagios, Zabbix, Datadog, Keynote, Pingdom

• And PagerDuty of course, aka “The One Ring”

• We write out runbooks for common response tasks such that we can have level 1 support people do them – or at least so that we don’t screw them up!

• Custom tooling is a must.
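Here is one example of the kind of small custom check that helps when speed matters. The sketch pages on a high percentile of recent response times rather than on any single slow request, which keeps noisy alerts down. The function names, the nearest-rank percentile method and the threshold are assumptions for illustration, not a description of Bazaarvoice's actual tooling.

```python
import math


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile of samples, with pct in (0, 100]."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def should_page(latencies_ms: list[float], threshold_ms: float,
                pct: float = 95.0) -> bool:
    """Page only when the chosen percentile of recent latencies breaches the threshold."""
    return percentile(latencies_ms, pct) > threshold_ms


# Hypothetical window: a handful of slow outliers should not wake anyone up.
window = [120.0] * 97 + [4_000.0] * 3
print(should_page(window, threshold_ms=1_000))  # False
```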


System Stats Histogram (diagram): CDN at 164k RPS (hit rate 80%, TTL 600s); AWS East and AWS West regions; flows of 12k RPS and 21k RPS; server groups of 10, 12, and 10 m2.xlarge instances; clusters c1 and c2 at 3.4k RPS each; timing labels of 4330, 8210, 1023, 2340, and 1240 ms
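The CDN numbers in the diagram can be cross-checked with simple arithmetic. Only cache misses reach the origin, so origin load is roughly edge RPS × (1 − hit rate). With 164k RPS and an 80% hit rate, that comes to about 32.8k RPS, which appears consistent with the 12k and 21k RPS figures combined. The second helper is hypothetical: it inverts the formula to show what hit rate a given origin budget would require.

```python
def origin_rps(edge_rps: float, hit_rate: float) -> float:
    """Requests per second that miss the CDN cache and fall through to the origin."""
    return edge_rps * (1.0 - hit_rate)


def required_hit_rate(edge_rps: float, origin_budget_rps: float) -> float:
    """Minimum cache hit rate that keeps origin traffic within a budget."""
    return max(0.0, 1.0 - origin_budget_rps / edge_rps)


print(round(origin_rps(164_000, 0.80)))              # 32800
# Hypothetical budget: what hit rate keeps the origin at or under 20k RPS?
print(f"{required_hit_rate(164_000, 20_000):.1%}")   # 87.8%
```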


Escalated Response

• We had 3x daily (9 AM, 2 PM, 9 PM) status calls for all teams to check in

• We sent a daily status report on overall system performance to the entire company

• On-call shifts were 12 hours apiece – not fully online, but not just “waiting for pages” either; on-call engineers needed to eyeball the system at regular intervals
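A round-robin rota of fixed 12-hour shifts is easy to generate and hand out ahead of time. This is a minimal sketch. The engineer names, the 9 AM start and the date range are hypothetical.

```python
from datetime import datetime, timedelta
from itertools import cycle


def oncall_shifts(start: datetime, end: datetime, engineers: list[str],
                  shift_hours: int = 12) -> list[tuple[datetime, datetime, str]]:
    """Round-robin fixed-length shifts covering [start, end); the last shift is
    truncated if the period is not a whole number of shifts."""
    if not engineers:
        raise ValueError("need at least one engineer")
    if shift_hours <= 0:
        raise ValueError("shift_hours must be positive")
    step = timedelta(hours=shift_hours)
    rotation = cycle(engineers)
    shifts = []
    t = start
    while t < end:
        shift_end = min(t + step, end)
        shifts.append((t, shift_end, next(rotation)))
        t = shift_end
    return shifts


# Hypothetical: Thanksgiving morning through the Tuesday after Cyber Monday.
rota = oncall_shifts(datetime(2013, 11, 28, 9), datetime(2013, 12, 3, 9),
                     ["alice", "bob", "carol"])
for begin, finish, who in rota:
    print(f"{begin:%a %d %b %H:%M} to {finish:%a %d %b %H:%M}: {who}")
```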


Step 5: Practicing


Test Your Plan!

• Test your scaling

– Amazon limits are your enemy – there’s a thousand of ‘em and many are hidden

• Test your monitoring

• Test your paging

• Test your runbooks

• We had two “game days” to scale up, apply load, provoke issues and execute on remediation
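One way to catch limits before a game day does is to diff the scaling plan against the limits you have confirmed for your account. This is a minimal sketch. The resource names and all numbers are hypothetical, and real values have to come from your own account and AWS support, since many limits aren't visible up front. Resources with no known limit are flagged too, so they get checked by hand.

```python
from typing import Optional


def limit_violations(planned: dict[str, int],
                     limits: dict[str, int]) -> dict[str, tuple[int, Optional[int]]]:
    """Map each resource whose planned count exceeds its known limit, or whose
    limit is unknown, to (planned, limit); limit is None when unknown."""
    problems: dict[str, tuple[int, Optional[int]]] = {}
    for resource, count in planned.items():
        limit = limits.get(resource)
        if limit is None or count > limit:
            problems[resource] = (count, limit)
    return problems


# Hypothetical scale-up plan and hypothetical confirmed limits.
planned = {"m2.xlarge instances": 32, "PIOPS volumes": 40, "Elastic IPs": 3}
limits = {"m2.xlarge instances": 20, "Elastic IPs": 5}

for resource, (count, limit) in limit_violations(planned, limits).items():
    shown = limit if limit is not None else "unknown"
    print(f"{resource}: planned {count}, limit {shown}")
```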


Step 6: Profit


How It Went Down

• 23 teams across R&D and Support

• 40 engineers participating as Black Friday representatives

• 11 weeks of planning

• 2 stress-testing “Game Days”

• 26 round-the-clock status calls (8 “yellow” status, 18 “green”)

• 35 issues examined during the period

• $136,620.27 for the week in hosting costs

• Zero downtime


November Performance (c3)


Questions?


Recruiting Moment - BV:IO 2014

• Bazaarvoice’s internal tech conference and hackathon!

• Last year: Alamo Drafthouse, Adrian Cockcroft (Netflix), Jason Baldridge (UT), Nick Bailey (DataStax), Peter Wang (Continuum Analytics)

• This year: Norris Conference Center, Theo Schlossnagle (Circonus), Greg Brockman (Stripe CTF), Bob Metcalfe (UT)

• Late-nighter hackathon to develop sweet social commerce solutions

• Plus – COD: Black Ops!


Register: bvio2014.eventbrite.com

Team Signups On Hacker League

Koderz Only