Alert workflow in Gaming DevOps Eduardo Saito Director of Engineering - Server Operations GREE International November 2013

Gaming dev ops - Eduardo Saito

Page 1: Gaming dev ops - Eduardo Saito

Alert workflow in Gaming DevOps

Eduardo Saito Director of Engineering - Server Operations GREE International November 2013

Page 2: Gaming dev ops - Eduardo Saito

Traditional Alert workflow

NOC

Ops

Dev

SME (Network, DBA,…)

Page 4: Gaming dev ops - Eduardo Saito

Alert workflow – previous

Critical

Page 5: Gaming dev ops - Eduardo Saito

Alert workflow – previous

Ops Dev

Critical

Page 6: Gaming dev ops - Eduardo Saito

Alert workflow – previous

Ops Dev

Critical

Ops: where’s the runbook for this?

Ops: app bug or system issue?

Ops: who’s the developer of this game? Phone #?

Ops: I can’t find the developer… who’s his manager?

Critical

Non-Critical

Page 7: Gaming dev ops - Eduardo Saito

Alert workflow 2.0

Ops Dev

Critical

Ops: where’s the runbook for this?

Ops: app bug or system issue?

Ops: who’s the developer of this game? Phone #?

Ops: I can’t find the developer… who’s his manager?

Page 8: Gaming dev ops - Eduardo Saito

Alert Workflow 3.0 - current

Ops

Dev, Project X, Server

Page 9: Gaming dev ops - Eduardo Saito

Alert Workflow 3.0 - current

Ops

Dev, Project X, Server

Dev, Project Y, Client, Android

Dev, …

Each alert goes directly to the right team that can resolve it!
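
The idea of sending each alert straight to the owning team can be sketched as a lookup keyed by project, component, and platform. This is only an illustration: the `Alert` type, the `ON_CALL` registry, the team identifiers, and the fallback to Ops are all assumptions, not details from the slides.

```python
from typing import NamedTuple, Optional


class Alert(NamedTuple):
    project: str                    # e.g. "Project X"
    component: str                  # "server" or "client"
    platform: Optional[str] = None  # "android" / "ios" for client alerts


# Hypothetical on-call registry keyed by (project, component, platform).
ON_CALL = {
    ("Project X", "server", None): "dev-project-x-server",
    ("Project Y", "client", "android"): "dev-project-y-android",
}

FALLBACK_TEAM = "ops"  # assumption: unmatched alerts fall back to Ops


def route(alert: Alert) -> str:
    """Pick the team that owns the alerting component."""
    key = (alert.project, alert.component, alert.platform)
    return ON_CALL.get(key, FALLBACK_TEAM)
```

With a registry like this, adding a game or platform means adding one entry, and nobody has to look up who owns a game at the moment the alert fires.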

Page 10: Gaming dev ops - Eduardo Saito

Alerts go to the person that can resolve

Type          Scope                                 Checked by         Who to page?
ELB           Load balancer health-check            ELB                No one – email alert only
System-level  Check cpu / disk / memory / network   Pingdom / Nagios   Ops team
App-level     Application issues / bugs             Pingdom            Dev and Ops teams
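
The table can be read as a routing rule from alert type to page targets. A minimal Python sketch follows; the names `AlertType`, `ROUTES`, `who_to_page`, and `email_only` are hypothetical, and only the three alert types and their targets come from the table.

```python
from enum import Enum


class AlertType(Enum):
    ELB = "elb"
    SYSTEM = "system-level"
    APP = "app-level"


# Teams to page per alert type, as listed in the table.
# An empty tuple means "email alert only, page no one".
ROUTES = {
    AlertType.ELB: (),
    AlertType.SYSTEM: ("ops",),
    AlertType.APP: ("dev", "ops"),
}


def who_to_page(alert_type: AlertType) -> tuple:
    """Return the teams to page for an alert type (hypothetical helper)."""
    return ROUTES[alert_type]


def email_only(alert_type: AlertType) -> bool:
    """True when the alert should only generate an email."""
    return not ROUTES[alert_type]
```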

Page 13: Gaming dev ops - Eduardo Saito

Alerts go to the person that can resolve

App-level alerts can be triggered by issues in:

•  Server-side
•  Client-side
   •  iOS
   •  Android

Page 14: Gaming dev ops - Eduardo Saito

Dev and Ops are responsible

Team        On-call
Ops         8
Dev         32, from 20 games (server-side, or client-side Android or iOS)
Analytics   5

Page 15: Gaming dev ops - Eduardo Saito

Big display dashboard = quick status

Page 17: Gaming dev ops - Eduardo Saito

IM Bot = better communication

Skype Bot informs the game channel that an alert was triggered

Page 18: Gaming dev ops - Eduardo Saito

IM Bot = better communication

Ops and Dev receive the alert and troubleshoot

Page 19: Gaming dev ops - Eduardo Saito

IM Bot = better communication

Skype Bot detects that the issue is resolved and sends the all-clear
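
The bot behavior on these slides (announce in the game channel when an alert triggers, then send an all-clear on recovery) amounts to notifying on state changes only. The sketch below assumes a hypothetical `AlertNotifier` class and a `send(channel, text)` callback standing in for the real chat client; it does not use any actual Skype API, and the message wording is invented.

```python
from typing import Callable, Dict


class AlertNotifier:
    """Posts to a game's chat channel when a check changes state."""

    def __init__(self, send: Callable[[str, str], None]) -> None:
        # `send(channel, text)` is a stand-in for a real chat client call.
        self._send = send
        self._failing: Dict[str, bool] = {}

    def update(self, game: str, check: str, ok: bool) -> None:
        key = f"{game}/{check}"
        was_failing = self._failing.get(key, False)
        if not ok and not was_failing:
            self._send(game, f"ALERT: {check} triggered")
        elif ok and was_failing:
            self._send(game, f"ALL CLEAR: {check} resolved")
        self._failing[key] = not ok


messages = []
bot = AlertNotifier(lambda channel, text: messages.append((channel, text)))
bot.update("game-a", "login", ok=False)  # first failure: alert posted
bot.update("game-a", "login", ok=False)  # still failing: no repeat
bot.update("game-a", "login", ok=True)   # recovery: all-clear posted
```

Tracking the previous state per check keeps the channel quiet while an issue is ongoing, so the only messages are the initial alert and the all-clear.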

Page 20: Gaming dev ops - Eduardo Saito

Thank You!

[email protected]

We’re hiring! Vancouver and San Francisco

http://gree-corp.com/jobs