Tectonic Summit 2016: Ticketmaster's Public Cloud & Kubernetes Strategy

Ticketmaster - CoreOS Tectonic Summit 2016

COREOS TECTONIC SUMMIT
DECEMBER 12, 2016

JUSTIN DEAN
● SVP, Platform & Technical Operations
● ~1.75 Years at Ticketmaster
● Passionate about building high performance organizations
● Nerdy about automating my beer & BBQ pipeline (see PitmasterPi on GitHub)

OUR STORY
● About Ticketmaster
● Our Journey
● Large Enterprise Challenges & Lessons Learned
● Why Kubernetes
● CoreOS Partnership
● Up Next

ABOUT US
● Publicly Traded Company (LYV)
● $7.6B Revenue
● $25B in GTV (Gross Transaction Value)
● Top 5 eCommerce site

HISTORY
● 1976 - Founded at Arizona State University
● 1996 - Ticketmaster.com launched
● 2010 - Live Nation and Ticketmaster join forces to power live experiences

We power unforgettable moments of joy!

Concerts, Sports, Arts & Theater, Small Venues & Clubs

TECH COMPLEXITY

PRE-MODERN TECHNOLOGY (Tech Museum)
● Every era of software, many not ready for containers and cloud
● 1970s: Custom VMS OS on Emulated VAX (The Host)
● 2000s: Xen Cloud, Big-Iron Filers, NFS, custom built infrastructure

TECH SCALE
● 21 Ticketing Systems and over 250 unique products
● 1,400+ people in Product & Tech org
● Custom Private Cloud with over 22,000 VMs across 7 global data centers
● 15,000+ network endpoints across the world (Venues, Arenas, Kiosks, etc.)
● Over 60% VM growth in last year

1 BILLION MACHINES!!*
*Not really :)

BIG SCALE, BIG CHALLENGES
Onsales = Black Friday every day!
● Huge spikes / demand for tickets
● Global company = across time zones
● Limited inventory (Beyonce Tickets!)
● Multiple sales channels

0 to 150M transactions in minutes! That’s a spike of >8 GBps!

Self-Inflicted DDoS-as-a-Business

COMPETITION

COMPETITIVE LANDSCAPE
● Market leader with huge surface area
● Competitors of every size and shape
● Speed and agility are absolutely key
● Scale and complexity of a 40-year-old business make rapid changes very hard

TO RECAP...

Public company / market pressure / highly competitive landscape

Legacy tech, not ready for containers

Tech debt with high interest rates

Huge scale and complexity

Black Friday every day

MUST GET FASTER!

SIMPLIFY OUR PLATFORM
More Revenue and Market Share
Better Products & Features
Deliver Products Faster
Autonomous Product Teams
Simplify Our Platform

OUR JOURNEY

Stages: Self-disruption > Lean Transformation > Autonomous Delivery Teams > Public Cloud > Kubernetes
Timeline markers: 2013, 2016, 2017, WE ARE HERE

SELF-DISRUPTION

LEAN TRANSFORMATION
● Laser focused on highest priorities
● Created 65+ cross-functional delivery teams
● Eventually all roads led to “blocked by ops”
● Got faster at developing; did not get faster at delivering

AUTONOMOUS DELIVERY TEAMS
● Moved application support teams out of TechOps and into the product teams directly
● Embedded Systems Engineers into product delivery teams (closer to truly “cross-functional”)
● Self-Service Tools: Surge towards getting teams out of the ops business
● Self-Sufficient businesses (build it, run it, own it, optimize it, monetize it)

Microbusiness

TRANSFORMATION INSIGHTS
Realized our ability to innovate is dampened by our overly complex software factory:
● 30-50% of development time spent moving code around ($60M-$90M problem)
● 150 custom-built ways to release products (often manually)
● ~50% of incidents were preventable; mostly self-inflicted

PUBLIC CLOUD

Vehicle for deep introspection of every product

Immediate access to infrastructure as APIs

Forcing function to modernize all products to a cloud native standard (all the *-ilities)

Public Cloud = Huge carbon filter

CLOUD ENABLEMENT TEAM
● Small team of experts dedicated to developing:
  ▪ Future state architecture
  ▪ Path to Public Cloud
  ▪ Cloud Native Solution Patterns
  ▪ Cure us of our on-prem addiction (NFS, Always scaled, HW reliance, SW trees, etc.)
● Provide Self-Service tooling and documentation for those solutions
● Enable teams to:
  ▪ Raise their tech maturity
  ▪ Containerize and retool their app
  ▪ Migrate themselves to the cloud

CLOUD ENABLEMENT METHOD
7 “Simple” Steps:
1. Containerize your app; use CoreOS
2. Terraform your infrastructure
3. Instrument everything, rich telemetry - no SSH or RDP!
4. Use synthetic monitoring to understand the health of your product
5. Security, security, security
6. Design shared-nothing architecture (no NFS)
7. Build for availability - no single points of failure
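
Step 4 (synthetic monitoring) can be illustrated with a small probe that requests a product URL the way an outside user would and judges health from the status code and the latency. This is only a sketch under stated assumptions: the names `http_get_status`, `run_synthetic_check`, and `CheckResult`, the 2-second latency budget, and the example URL are all hypothetical and do not come from the talk.

```python
import time
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CheckResult:
    url: str
    healthy: bool
    status: Optional[int]
    latency_ms: float


def http_get_status(url: str, timeout_s: float) -> Optional[int]:
    """Return the HTTP status for url, or None when no response arrives."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as response:
            response.read()  # include body transfer in the measured time
            return response.status
    except urllib.error.HTTPError as exc:
        return exc.code  # the server answered, but with an error status
    except (urllib.error.URLError, OSError):
        return None  # DNS failure, refused connection, or timeout


def run_synthetic_check(
    url: str,
    fetch: Callable[[str, float], Optional[int]] = http_get_status,
    timeout_s: float = 5.0,
    max_latency_ms: float = 2000.0,  # hypothetical latency budget
) -> CheckResult:
    """Probe url once; healthy means a 2xx answer within the latency budget."""
    started = time.monotonic()
    status = fetch(url, timeout_s)
    latency_ms = (time.monotonic() - started) * 1000.0
    healthy = (
        status is not None
        and 200 <= status < 300
        and latency_ms <= max_latency_ms
    )
    return CheckResult(url, healthy, status, latency_ms)


if __name__ == "__main__":
    # Hypothetical target; replace with a real product endpoint.
    print(run_synthetic_check("https://example.com/"))
```

The `fetch` parameter exists so the check can be exercised without a network, for example `run_synthetic_check("http://product.test/", fetch=lambda url, timeout: 503)` reports an unhealthy result.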

READY TO ROLL
● Highly skilled team
● Modern new stack architecture
● Comprehensive DIY toolkit/software
● 1,000+ pages of detailed documentation and solution patterns

“Everybody has a plan until they get punched in the face.” - Mike Tyson

LEARNINGS

Diagram labels:
● 100s of devs, different tech stacks
● Learn the APIs/primitives, learn to build infrastructure, learn to code it in Terraform
● Programmatic Checkout Page
● Public Cloud: rich set of primitives and APIs
● 65,000 permutations on how to use AWS service offerings = 64,999 ways to get it wrong

LEARNINGS SUMMARY
● Huge learning curve
● Hard to manage distributed systems at scale
● Wrong people to build & optimize infrastructure (across 100+ teams)
● Baking purchasing decisions into distributed Terraform code is BAD

...Spending too much time writing software to deploy software instead of writing software to make money

SOLUTION: CONTAINER ORCHESTRATION
● Abstract complexities of infrastructure from development teams, including how to:
  ▪ Design
  ▪ Deploy
  ▪ Purchase
  ▪ Optimize
● Allows us to easily manage distributed systems at scale

WE CHOSE KUBERNETES
● Kubernetes started organically appearing all over our company
● Ahead of other container management platforms and rapidly improving
● Amazing community with hockey-stick velocity
● Kubernetes APIs and primitives are sweet!
  ▪ Iteration time is seconds vs. minutes
  ▪ Automated rollbacks
  ▪ Scaling and self-healing are much faster than ASGs
● Kubernetes gets us much better utilization of our EC2 instances
● Successfully used it to solve a major stability issue
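
The utilization point can be made concrete with a toy comparison between running one service per instance and packing several services onto shared instances. This is only an illustration under stated assumptions: first-fit decreasing is a stand-in heuristic and not the Kubernetes scheduler's actual algorithm, and the CPU requests and 4-core instance size are hypothetical numbers, not Ticketmaster data.

```python
def pack_first_fit_decreasing(cpu_requests, node_capacity):
    """Place each request on the first node with room; open a new node otherwise."""
    nodes = []  # each entry is the CPU already claimed on that node
    for request in sorted(cpu_requests, reverse=True):
        if request > node_capacity:
            raise ValueError(
                f"request {request} exceeds node capacity {node_capacity}"
            )
        for index, used in enumerate(nodes):
            if used + request <= node_capacity:
                nodes[index] = used + request
                break
        else:
            nodes.append(request)  # no existing node had room
    return nodes


def utilization(total_requested, node_count, node_capacity):
    """Fraction of provisioned CPU that is actually requested."""
    return total_requested / (node_count * node_capacity)


# Hypothetical workload: CPU cores requested by seven services.
requests = [2.0, 1.5, 1.0, 1.0, 0.5, 0.5, 0.5]
capacity = 4.0  # hypothetical cores per instance

dedicated_nodes = len(requests)  # one service per instance
packed_nodes = len(pack_first_fit_decreasing(requests, capacity))

dedicated_util = utilization(sum(requests), dedicated_nodes, capacity)
packed_util = utilization(sum(requests), packed_nodes, capacity)

print(f"dedicated: {dedicated_nodes} nodes at {dedicated_util:.1%}")
print(f"packed:    {packed_nodes} nodes at {packed_util:.1%}")
```

With these made-up numbers, seven dedicated instances sit at 25% utilization while two shared instances reach 87.5%.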

OPENTSDB ON KUBERNETES
● Critical system for application monitoring
  ▪ 500k metrics per second
● Large queries during ticketing sales were DDOS’ing OpenTSDB services
● Kubernetes pod health checks detect this and restart the failed containers
● Kubernetes primitives took a service that required hand holding to something that manages itself
● Learning Moment! A reboot from an automated OS upgrade required manual intervention
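
The self-healing behavior described above follows a simple rule: a container is restarted once its health probe has failed several times in a row. The sketch below models that rule only; it assumes the consecutive-failure semantics of a Kubernetes liveness probe (the `failureThreshold` setting, which defaults to 3), and the function name and probe history are hypothetical rather than taken from Ticketmaster's OpenTSDB setup.

```python
def count_restarts(probe_results, failure_threshold=3):
    """Count restarts triggered when probes fail failure_threshold times in a row.

    probe_results is an iterable of booleans, True meaning the probe passed.
    """
    restarts = 0
    consecutive_failures = 0
    for passed in probe_results:
        if passed:
            consecutive_failures = 0  # any success resets the streak
            continue
        consecutive_failures += 1
        if consecutive_failures >= failure_threshold:
            restarts += 1
            consecutive_failures = 0  # the fresh container starts clean
    return restarts


# Hypothetical probe history for a service that stalls under heavy queries.
history = [True, True, False, False, False, True,
           False, True, False, False, False, False]
print(count_restarts(history))
```

Isolated failures (a single slow response between healthy ones) never reach the threshold, so only sustained stalls lead to a restart.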

SIMPLIFICATION WITH KUBERNETES

Diagram labels:
● Public Cloud
● Kubernetes cluster optimized by Cluster Ops team
● Kubernetes APIs / abstraction
● Homogenized deployments via Kubernetes

KUBERNETES PROJECT
GOAL: Deploy a Ticketmaster product into a production-grade Kubernetes cluster and equip team with the skills required to support its operation.
● Fully-remote team of 6
● Tons of work!
  ▪ How many clusters to build?
  ▪ Which architecture is right for us?
  ▪ How should we deploy and test the cluster?
  ▪ Which networking option to use inside of AWS?

QUESTIONS
● Kubernetes @scale best practices and pitfalls
  ▪ Kubernetes @Ticketmaster Roadmap:
    − Documented Reference Architecture specific to Ticketmaster based on all the below that includes answers to any below questions. We need a documented roadmap for the team to start building based on Apprenda Experience/Reference architecture.
  ▪ Guidance on what goes in K8S and what should not (if anything)
  ▪ What have we missed? What didn’t we ask?
  ▪ Best practices around secrets; how do companies manage this at scale? Risks, alternatives, etc.?
  ▪ Kubernetes upgrades, possible w/o downtime?
  ▪ Insight on cloud primitives that are not K8S managed (Lambda, S3, SQS, KMS, RDS, etc.). What are other companies doing here? Are some of these on the K8S roadmap to orchestrate? Are these resources managed by “clusterops”, or do delivery teams self-build outside the k8s workflow? This is called the K8s service catalog
  ▪ What do they recommend for configuring containers within Kubernetes?
  ▪ How do they recommend granting IAM roles to containers?
● Kubernetes cross-domain (AWS/onprem/other cloud) insight
  ▪ Good idea? Possible pitfalls?
  ▪ How to front end AWS and Onprem so we can dynamically run HOT AWS expensive stuff on onprem behind the scenes
  ▪ Cross AWS region?
  ▪ If we run Kubernetes in Equinix, how do they recommend logging into ECR with Kubernetes?
● Cluster Networking
  ▪ What do they recommend for loadbalancer services in AWS?
  ▪ Overlay networking
  ▪ Software defined firewall
  ▪ Best ecosystem components (calico vs x, etc.)
● Team / Operations
  ▪ How do engineering teams interact with the cluster, kubectl on their laptops? Probably not
  ▪ How long do they see it taking to build enough knowledge for production support of k8s?
  ▪ Insight on other companies' K8S support models (what does ops do, what does devops do, what are the governance models)
  ▪ Understanding of implications on chargeback in AWS. How much effort goes into tagging and reporting on ephemeral resources (containers) that move around on AWS primitives (EC2 instances)?
● CET (cloud enablement team)
  ▪ How to marry it into our CET strategy, specifically Terraform
  ▪ Help on rollout strategy. Start working in context with early adopter enthusiastic teams asap OR wait until we have it more ‘operationally mature’. Both tactics have merit, help us think through the strategy here.
● Persistent storage, period.
  ▪ Torus, Ceph, EFS, NFS, Gluster, Portworx; pros / cons
  ▪ Databases (large/shared) on k8s?
  ▪ Other persistent workloads: elastic, cache, message bus, etc.
● Ongoing Apprenda Engagement
  ▪ Information regarding their consulting offerings / prices / models of engagement. On prem team? Support team? Customized Kubernetes solutions and maintenance.
  ▪ Connect us to peer group in Kubernetes space
● Should we just leverage Tectonic?
● Archtics (massive legacy windows/powerbuilder/sybase/rdp over internet to sports teams) Help
● Prometheus help

Callouts: overlay networking? Calico? Flannel? VPC networking? Canal? cluster ops team? Linkerd? auth? how many etcd nodes? Terraform vs Kube API? Prometheus? 24/7 support?

COMMUNITY ENGAGEMENT
● Spent time with CoreOS, Kelsey Hightower, Apprenda
● Attended conferences
● Hosted Meetups
● Joined SIGs
● Joined

MILESTONES
1. Simple Kubernetes cluster
2. Operationalize Kubernetes
3. Enterprise Ready / HA Kubernetes Cluster
4. Address consumability by apps
5. On-call production support
6. First customers go live on Kubernetes
* Expand!

WORK BEGINS...BUT
● Continued to identify new questions
● Had not figured out operational support
● Needed enterprise-level features (auth)
● Needed answers based on experience; not theory
● Needed to accelerate implementation

STRATEGIC PARTNERSHIP

MILESTONES
1. ✔ Simple Kubernetes cluster
2. Operationalize Kubernetes
3. Enterprise Ready / HA Kubernetes Cluster
4. Address consumability by apps
5. On-call production support
6. First customers go live on Kubernetes
* Expand!

WHY TECTONIC
● Vanilla upstream Kubernetes - No lock in
● Immediate enterprise level confidence
● Supported reference architecture (instead of DIY)
● Recommendations on operational practices, service provider integration, third party add-ons, etc.
● Production Go-Live Support
● Automatic OS Updates!*
*Bummer, no more fun upgrade projects!

COREOS PARTNERSHIP
● Providing input on Tectonic roadmap
● Influence the roadmap for things that REALLY matter to Enterprises
● Jointly solve Enterprise + Web Scale challenges
● Help foster the Enterprise Kubernetes community

NEW TICKETMASTER WEB PLATFORM ON K8S
Before:
● Semi-manual stack creation, bespoke CloudFormation + Python boto scripts = 20+ mins to deploy
● Low Confidence

Now:
● K8S + Tectonic, fully automated = 60 second app updates
● High Confidence
● Unlocked Daily Delivery Culture

LET THE MAKERS MAKE

● We have an amazing company of Makers, Creators, Visionaries

● We must create the space for them to innovate and deliver great solutions to the market

RECAP
● Use Kubernetes to abstract infrastructure complexities
● Have a cluster ops team do the optimization voodoo; not everyone else
● Stop wasting effort writing software to deploy software
● Let the Makers Make! Give time and mindshare back to your most valuable asset (your people) to do what they do best: Make Things!

TICKETMASTER KUBERNAUTS
Justin Dean, Kraig Amador, Abe Ingersoll, Bindi Belanger, Jean-François Nadeau

Stop by and say hi during the break!
&
Join us at the Sysdig/CoreOS/Ticketmaster party tonight! Food, drinks, LIVE BAND!!

justin.dean@ticketmaster.com
@justinmdean
