Inside Election Night at The New York Times
Or, Panic in the Newsroom...
Nick Rockwell, CTO NYT
@nicksrockwell
03.21.17
Pre-Post-Mortem
✘ Who’s responsible?
✘ What if something goes wrong?
✘ Oh it did go wrong in 2012…
✘ What if there’s more load than we expect?
✓ Team, Roles & Responsibilities
✓ Build an Election Night Runbook (16 pages!)
✓ Dry runs around debates
✓ Integrate a CDN...
Timeline
8/21 - Olympics are a wrap
8/24 - First Election prep meeting
9/21 - Meet w/ Fastly
9/23 - Commit to using Fastly
10/25 - In production
11/5 - Agreement signed
11/8 - Election night!
Plan B for Elections
8 Additional www-varnish for content requests
8 Additional www-varnish for userinfo requests
8 Additional www-fe (just in case)
4 Additional www-varnish for elections app
Mobileweb and video load tests next week to inform possible buildout
Final test tonight for MobileWeb
Also auth scaling, warming Amazon ELBs, etc.
You already know this but...
“A DDoS attack is like someone anonymously placing a press ad including your phone number and offering an Aston Martin for sale at $200. You’re bombarded by calls, your life is misery, the callers aren’t aware they’re part of a trick, and your attacker is almost impossible to trace.” - https://www.sidewaysdictionary.com/
Joys of CDN
Obvious:
• Scaled caching
• Better performance due to edge delivery
• DDoS protection
Slightly less obvious:
• Consistent performance
• Better everything - TLS negotiation, compression, etc.
• Cascading effects of smaller, simpler infrastructure
What is risk?
It’s not risk if someone else is responsible.
It’s not risk if there’s no chance of consequential failure.
It’s still risk if you mitigate it.
It’s still risk if you hedge, create contingencies, and plan.
Risk and Accountability
Our current ideologies of risk-taking and accountability are at odds.
Risk-taking can only take place within a context of judgment that is opaque.
A culture that values “boldness”, action-bias, or the appearance of certainty, usually destroys true risk-taking.
Boring bullet list of stuff we’re changing
Logic changes in varnish if the request came from Fastly
Moving Abra back to the client-side (yay Ken)
Userinfo back to the client-side (can’t decrypt the session cookie..yet)
Audit what www services we can cache in Fastly
Connected CREAM to Fastly’s purge API
???????????????????? SO MANY THINGS
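As a rough illustration of the purge hookup in the list above: when an article changes, the publishing side asks Fastly to drop the cached copies tagged with that article's surrogate key. The sketch below is a minimal Python version, not the actual CREAM integration, whose details the slides do not give. The function and class names, the service ID, and the key format are all hypothetical. The endpoint shape (POST /service/<service_id>/purge/<surrogate_key> with a Fastly-Key header, plus an optional Fastly-Soft-Purge header) follows Fastly's public purge API.

```python
from dataclasses import dataclass
from urllib.parse import quote
import urllib.request

FASTLY_API = "https://api.fastly.com"


@dataclass(frozen=True)
class PurgeRequest:
    """A purge call described as plain data so it can be inspected before sending."""
    method: str
    url: str
    headers: dict


def build_key_purge(service_id, surrogate_key, api_token, soft=False):
    """Describe a purge of every cached object tagged with one surrogate key.

    service_id, surrogate_key and api_token are placeholders supplied by the
    caller; nothing here reflects the real NYT configuration.
    """
    url = "{}/service/{}/purge/{}".format(
        FASTLY_API,
        quote(service_id, safe=""),
        quote(surrogate_key, safe=""),
    )
    headers = {"Fastly-Key": api_token, "Accept": "application/json"}
    if soft:
        # A soft purge marks objects stale instead of evicting them outright.
        headers["Fastly-Soft-Purge"] = "1"
    return PurgeRequest("POST", url, headers)


def send(purge, timeout=5.0):
    """Send a prepared purge and return the HTTP status code."""
    http_req = urllib.request.Request(
        purge.url, data=b"", method=purge.method, headers=purge.headers
    )
    with urllib.request.urlopen(http_req, timeout=timeout) as resp:
        return resp.status
```

Keeping request construction separate from sending means the publish hook can log or unit-test the exact call without touching the network; a real integration would also want retries and rate limiting on election night.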
When are things happening
10/4-5 - First rounds of production tests (WWW)
10/09 - Testing during debate (WWW)
10/13-19 - Testing with Mobile Web (internally, public, debate)
10/25 - Production launch
11/08 - Hide somewhere and hope Trump doesn’t win
11/10’ish - back to datacenter if necessary (it wasn’t…)
To end: we are just getting started...
What’s next:
• Continuing to shrink provisioning
• Continuing to “purge” or replace downstream caching
• Logs into BigQuery
• Looking at edge processing opportunities:
  - Load balancing, WAF
  - Image service
  - Auth & Meter