AWS re:Invent 2016 | GAM401 | Riot Games: Standardizing Application Deployments Using Amazon ECS and...

© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Adam Rozumek

November 28, 2016

Riot Games: Standardizing Application Deployments Using Amazon ECS and Terraform

GAM401

ADAM ROZUMEK, SYSTEMS ENGINEER

INTRODUCTIONS: WHO AM I?

AMAZON ECS. TERRAFORM. VIDEO GAMES.

WHAT TO EXPECT

1. ECS CONSOLIDATION WINS & LESSONS: Adapting existing deployments & infrastructure

2. TERRAFORM ENCAPSULATION STRATS: Object Oriented Development Operations

3. MULTI-GAME MODULAR SERVICE DESIGN: A different kind of scaling

2016 LEAGUE OF LEGENDS STATS

• MORE THAN 7.5 MILLION PEAK CONCURRENT PLAYERS

• MORE THAN 100 MILLION MONTHLY ACTIVE PLAYERS

• MORE THAN 27 MILLION DAILY ACTIVE PLAYERS

DATA PRODUCTS & SERVICES: OUR MISSION

Empower teams at Riot to make timely, data-informed products by maintaining a scalable and reliable data platform

AWS re:Invent 2015 | (GAM303) Riot Games: Migrating Mountains of Big Data to AWS

Sean Maloney

PROBLEM

PROBLEM: SCALING TOTAL OWNERSHIP ON AWS

Total ownership

We want to empower developers to:

• Provision their own infrastructure

• Execute their own deployments

• Monitor their own metrics

Resource attribution

• Who owns these EBS volumes?

• What applications depend on these security groups?

• Can these AMIs be deleted?

Security

• Auditing is important, but it’s reactive

• Operational time sink

Security Monkey, AWS Trusted Advisor

Enforcing conventions

• Common syntax for tags

• Simplify reservations

• Standardize networks

AWS re:Invent 2014 | (GAM304) How Riot Games re:Invented Their AWS Model

Jonathan McCaffrey

Marty Chong

Amazon EC2 Container Service

Terraform

CONTAINERS

CONTAINERS: STANDARDIZED APPLICATION UNITS… ON AWS

• Dockerfiles capture application dependencies

• Common use cases have great community support

• Profit from our own engineering community

Embrace the abstraction

• Plan for failure at all levels

• Avoid manual intervention whenever possible

• It’s ephemeral all the way down

Scheduling is hard

We need to:

• Quickly and “fairly” run tasks

• Prevent resource conflicts

• Provide reasonable fault tolerance
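
The three requirements above can be illustrated with a toy placement routine. This is a minimal sketch for intuition only, not how the ECS scheduler works: the `Host` and `Task` types, the `place` function, and the host names are all hypothetical. It skips hosts that lack memory or already use the task's host port (resource conflicts), then prefers the Availability Zone running the fewest copies (fault tolerance).

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    name: str
    zone: str
    free_memory_mib: int
    used_ports: set = field(default_factory=set)


@dataclass
class Task:
    name: str
    memory_mib: int
    host_port: int


def place(task, hosts, copies_per_zone):
    """Pick a host for the task, or return None if no host can take it."""
    # Prevent resource conflicts: enough memory and a free host port.
    candidates = [
        h for h in hosts
        if h.free_memory_mib >= task.memory_mib
        and task.host_port not in h.used_ports
    ]
    if not candidates:
        return None
    # Fault tolerance: prefer the zone with the fewest copies so far,
    # breaking ties by the host with the most free memory.
    best = min(
        candidates,
        key=lambda h: (copies_per_zone.get(h.zone, 0), -h.free_memory_mib),
    )
    best.free_memory_mib -= task.memory_mib
    best.used_ports.add(task.host_port)
    copies_per_zone[best.zone] = copies_per_zone.get(best.zone, 0) + 1
    return best


# Hypothetical two-host cluster and three copies of one task.
hosts = [Host("host-a", "zone-a", 1024), Host("host-b", "zone-b", 512)]
copies = {}
placements = [place(Task("web", 512, 8080), hosts, copies) for _ in range(3)]
names = [h.name if h else None for h in placements]
```

In this example the third copy cannot be placed: one host has no memory left and the other already uses the port, which is the kind of failure a real scheduler has to surface.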

We need AWS hardware

• Enable total ownership of the AWS resources backing our containers

• Avoid the security, resource attribution, and convention degradation pitfalls

HISTORY

First iteration (2013)

• Traditional ASGs

• AMIs configured to launch a single Docker container

• Managed by Netflix’s Asgard

• Easy to adopt

• Familiar AWS concepts

• No increased resource utilization

• No elasticity improvements

Second iteration (2013)

• Third party container orchestration

• Production scale had unexpected challenges

CONTAINERS: STANDARDIZED APPLICATION UNITS… ON ECS!

Third iteration: Amazon EC2 Container Service

• ECS AMI provides all necessary software

• Designed with AWS integration in mind

• Free!

AMAZON EC2 CONTAINER SERVICE: KEY FEATURE TIMELINE

• Nov 2014: ECS ANNOUNCED. re:Invent 2014

• April 2015: ECS GENERALLY AVAILABLE

• Dec 2015: ECR & NEW REGIONS. ECS becomes available in the final missing region we need (Frankfurt) for our globally deployed applications

• May 2016: SERVICE SCALING. Automatic task count scaling based on CloudWatch metrics introduced

• July 2016: TASK SPECIFIC IAM ROLES. Unlocked a lot of cluster sharing potential

• August 2016: APPLICATION LOAD BALANCERS. Several key improvements for ECS

BEYOND ECS: INFRASTRUCTURE AS CODE

• At scale, orchestrating infrastructure in a consistent, reproducible way is key

Total ownership

TERRAFORM

TERRAFORM: INFRASTRUCTURE AS OBJECT-ORIENTED CODE

• Intelligent & flexible flow control

• Configuration drift managed

• AWS resources documented

• Share common use cases

[Diagram: a VPC with NAT Gateway, Route 53 Hosted Zone, Route Tables, VPN Gateway, and Internet Gateway; Application Subnets, Tools, and Instances in Availability Zones A, B, and C]

Simplified infrastructure definitions

CASE STUDIES

ECS CLUSTER: TERRAFORM BUILDING BLOCKS

[Diagram: a VPC with NAT Gateway, Route 53 Hosted Zone, Route Tables, VPN Gateway, and Internet Gateway; Application Subnets, Tools, and Instances in Availability Zones A, B, and C]

[Diagram: an ECS Cluster with an Auto Scaling Group of Instances, Security Groups, a Launch Configuration, User Data, an IAM Role, and CloudWatch Alarms]

[Diagram: Module 1, Module 2, Module 3]

MICROSERVICES: WITHOUT SERVICE ENDPOINTS

[Diagram: an ECS Cluster (Auto Scaling Group of Instances, Security Groups, Launch Configuration, User Data, IAM Role, CloudWatch Alarm) with an ECS Service, Task Definition, CloudWatch Alarms, and IAM Role]

MICROSERVICES: WITH SERVICE ENDPOINTS

[Diagram: an ECS Cluster; an ECS Service with Task Definition, CloudWatch Alarms, and IAM Role; an Application Load Balancer with Security Groups, Listeners, Target Groups, and Route 53; Monitoring with CloudWatch Alarms and SNS Topics]

PERSISTENT DATA: LOSE ECS HOSTS WITHOUT LOSING DATA

[Diagram: an ECS Cluster; an ECS Service; an Application Load Balancer with Security Groups, Listeners, Target Groups, and Route 53; Monitoring with CloudWatch Alarms and SNS Topics; an Attachment Group with EBS volumes, an Elastic Network Interface, and an Elastic IP]

LESSONS LEARNED

LESSONS: WHAT WORKED FOR US

• Socialize your templates & take advantage of the community!

• Terraform community modules

• Terraform AWS blog

• Break apart your stacks, but don’t overdo it

[Diagram labels: ECS Cluster; ECS Service, ALB, SNS Topics, Route 53, IAM Role, Task Definition, CloudWatch Alarms]

• Use modules to reduce code duplication

[Diagram labels: CloudWatch Alarms, SNS Topics, ECS Cluster, ALB]

• Use remote state files

• Be liberal with your cluster provisioning

• Don’t risk resource contention in production

• With good orchestration, additional clusters != additional operational overhead

• Tag everything all of the time

• Keep your tags organized in your Terraform templates

• Have top level variables that get applied to every resource

• Create a tag for every dimension that is useful to your business
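
The idea of top-level tags applied to every resource can be sketched outside Terraform as a simple merge. This is an illustration only, under assumed names: `with_common_tags`, `missing_dimensions`, and the tag keys and values below are hypothetical and do not come from the talk.

```python
def with_common_tags(common_tags, resource_tags):
    """Combine top-level tags with a resource's own tags.

    Resource-specific values win when both define the same key.
    """
    merged = dict(common_tags)
    merged.update(resource_tags)
    return merged


def missing_dimensions(tags, required):
    """List required tag keys that are absent or empty, sorted by name."""
    return sorted(key for key in required if not tags.get(key))


# Hypothetical top-level tags shared by every resource in a stack.
COMMON_TAGS = {
    "team": "example-team",
    "application": "example-service",
    "environment": "prod",
}

volume_tags = with_common_tags(COMMON_TAGS, {"Name": "example-data-volume"})
gaps = missing_dimensions({"team": "example-team"}, COMMON_TAGS.keys())
```

A check like `missing_dimensions` is one way to answer resource attribution questions such as who owns a volume, since an empty result means every required dimension is present.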

• Centralize your logs

[Image: logo not captured in this transcript] or Amazon CloudWatch Logs

• Stay up to date with release blogs and application updates

• ECS updates on the AWS blog

• Terraform’s open source GitHub repository changelog

• Capture your AMI generation process

• Profile your memory requirements, monitor for scheduling issues
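
One scheduling issue worth monitoring for is a task that fits in the cluster's total free memory but not on any single host. The check below is a rough sketch under assumed inputs: the function names and instance IDs are hypothetical, and the free-memory figures would have to come from whatever metrics you already collect.

```python
def unschedulable(task_memory_mib, free_memory_by_host):
    """True when no single host has enough free memory for the task."""
    return all(free < task_memory_mib for free in free_memory_by_host.values())


def fragmented(task_memory_mib, free_memory_by_host):
    """True when the cluster has enough free memory in total,
    but it is spread too thinly for the task to land anywhere."""
    total_free = sum(free_memory_by_host.values())
    return (
        unschedulable(task_memory_mib, free_memory_by_host)
        and total_free >= task_memory_mib
    )


# Hypothetical free memory (MiB) per container instance.
free = {"i-aaaa": 300, "i-bbbb": 300}
blocked = unschedulable(512, free)
spread_too_thin = fragmented(512, free)
```

Here 600 MiB is free overall, yet a 512 MiB task cannot be placed, which is a signal to resize tasks, resize hosts, or add capacity.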

• Take advantage of flow control in Terraform

• Use the Terraform graph tool to visualize your dependencies

RIOT ENGINEERING BLOG: THESE PROBLEMS AND MORE

• Riot engineering: https://engineering.riotgames.com/

• How we use data: http://na.leagueoflegends.com/en/tag/insights

JOIN US!

ENJOY NOMS.

DRINK DRINKS.

PLAY GAMES.

Tuesday, November 29th, 2016, 6:30 – 10:30PM

Paiza Lounge, The Venetian, 36th Floor

Thank you!

Remember to complete your evaluations!
