58
Getting real with erlang From the idea to a live system Knut Nesheim @knutin Paolo Negri @hungryblank Thursday, November 3, 2011

Getting real with Erlang: From the idea to a live system

  • Upload
    wooga

  • View
    8.612

  • Download
    1

Embed Size (px)

DESCRIPTION

From 0 to 1,000,000 daily users with Erlang This talk wants to sum up the experience about building and maintaining live our first social game server completely written in erlang. The game has been now published and it’s time to think about the results achieved. In this talk we’ll describe details of our architecture, principles that drove its design and considerations about hosting the infrastructure. We’ll go in detail through what worked well and what needs to be improved, all backed by data coming from our live system. Come and follow us if you want to get the taste of real world erlang.

Citation preview

Page 1: Getting real with Erlang: From the idea to a live system

Getting real with erlangFrom the idea to a live system

Knut Nesheim @knutinPaolo Negri @hungryblank

Thursday, November 3, 2011

Page 2: Getting real with Erlang: From the idea to a live system

Social GamesFlash client (game) HTTP API

http://www.flickr.com/photos/theplanetdotcom/4879421344Thursday, November 3, 2011

Page 3: Getting real with Erlang: From the idea to a live system

Social GamesHTTP API

• @ 1 000 000 daily users

• 5000 HTTP reqs/sec

• more than 90% writes

Thursday, November 3, 2011

Page 4: Getting real with Erlang: From the idea to a live system

November 2010

• At the erlang user group!

• Learn where/how erlang is used

• 0 lines of erlang code across all @wooga code base

Thursday, November 3, 2011

Page 5: Getting real with Erlang: From the idea to a live system

Why looking into erlang?

HTTP API

• @ 1 000 000 daily users

• 5000 HTTP reqs/sec

• around 60000 queries/sec

Thursday, November 3, 2011

Page 6: Getting real with Erlang: From the idea to a live system

60000 qps

• Most maintenance effort in databases

• mix of SQL / NoSQL

• Already using in RAM data stores (REDIS)

• RAM DB are fast but expensive / high maintenance

Thursday, November 3, 2011

Page 7: Getting real with Erlang: From the idea to a live system

Social games data

• User data self contained

• Strong hot/cold pattern - gaming session

• Heavy write load (caching is ineffective)

Thursday, November 3, 2011

Page 8: Getting real with Erlang: From the idea to a live system

User session

1. start session

2. game actions

3. end session

User data

1. load all

2. read/update many times

3. data becomes cold

Thursday, November 3, 2011

Page 9: Getting real with Erlang: From the idea to a live system

User session

1. start session

2. game actions

3. end session

Erlang process

1. start (load state)

2. responds to messages (use state)

3. stop (save state)

Thursday, November 3, 2011

Page 10: Getting real with Erlang: From the idea to a live system

User session DB usage

Stateless server Stateful server(Erlang )

start session load user state load user state

game actions many queries

end session save user state

Thursday, November 3, 2011

Page 11: Getting real with Erlang: From the idea to a live system

Erlang process

• Follows session lifecycle

• Contains and defends state (data)

• Acts as a lock/serializer (one message at a time)

Thursday, November 3, 2011

Page 12: Getting real with Erlang: From the idea to a live system

December 2010

• Prototype

user1 user2 user3

user4 user5 user6

user7 user8 userN

erlangprocess = user session

Thursday, November 3, 2011

Page 13: Getting real with Erlang: From the idea to a live system

January 2011

• erlang team goes from 1 to 2 developers

• Distribution/clustering

• Error handling

• Deployments

• Operations

Open topics

Thursday, November 3, 2011

Page 14: Getting real with Erlang: From the idea to a live system

Architecture

Thursday, November 3, 2011

Page 15: Getting real with Erlang: From the idea to a live system

Architecture  goals

• Move  data  into  the  applica4on  server

• Be  as  simple  as  possible

• Graceful  degrada4on  when  DBs  go  down• Easy  to  inspect  and  repair  state  of  cluster• Easy  to  add  more  machines  for  scaling  out

15

Thursday, November 3, 2011

Page 16: Getting real with Erlang: From the idea to a live system

16

SessionSessionSession

Thursday, November 3, 2011

Page 17: Getting real with Erlang: From the idea to a live system

17

Worker

SessionSessionSession

Thursday, November 3, 2011

Page 18: Getting real with Erlang: From the idea to a live system

Worker

SessionSessionSession

Worker

SessionSessionSession

18

Worker

SessionSessionSession

Thursday, November 3, 2011

Page 19: Getting real with Erlang: From the idea to a live system

Coordinator

Worker

SessionSessionSession

Worker

SessionSessionSession

19

Worker

SessionSessionSession

Coordinator

Thursday, November 3, 2011

Page 20: Getting real with Erlang: From the idea to a live system

Coordinator

Worker

SessionSessionSession

Worker

SessionSessionSession

20

Worker

SessionSessionSession

LockCoordinator

Thursday, November 3, 2011

Page 21: Getting real with Erlang: From the idea to a live system

Coordinator

Worker

SessionSessionSession

Worker

SessionSessionSession

21

Worker

SessionSessionSession

Coordinator

DBs

Lock

Thursday, November 3, 2011

Page 22: Getting real with Erlang: From the idea to a live system

Worker

SessionSessionSession

Worker

SessionSessionSession

22

Worker

SessionSessionSession

Coordinator

DBs

New  user  comes  online

Lock

Thursday, November 3, 2011

Page 23: Getting real with Erlang: From the idea to a live system

Worker

SessionSessionSession

Worker

SessionSessionSession

22

Worker

SessionSessionSession

Coordinator

DBs

New  user  comes  online

Flash  calls  ”setup”

Lock

Thursday, November 3, 2011

Page 24: Getting real with Erlang: From the idea to a live system

Worker

SessionSessionSession

Worker

SessionSessionSession

22

Worker

SessionSessionSession

Coordinator

DBs

New  user  comes  online

Flash  calls  ”setup”

session:start(Uid)on  suitable  worker Lock

Thursday, November 3, 2011

Page 25: Getting real with Erlang: From the idea to a live system

Worker

SessionSessionSession

Worker

SessionSessionSession

22

Worker

SessionSessionSession

Coordinator

DBs

New  user  comes  online

Flash  calls  ”setup”

session:start(Uid)on  suitable  worker

s3:get(Uid) Lock

Thursday, November 3, 2011

Page 26: Getting real with Erlang: From the idea to a live system

Worker

SessionSessionSession

Worker

SessionSessionSession

22

Worker

SessionSessionSession

Coordinator

DBs

New  user  comes  online

Flash  calls  ”setup”

session:start(Uid)on  suitable  worker

lock:acquire(Uid)test-‐and-‐set

s3:get(Uid) Lock

Thursday, November 3, 2011

Page 27: Getting real with Erlang: From the idea to a live system

Worker

SessionSessionSession

Worker

SessionSessionSession

22

Worker

SessionSessionSession

Coordinator

DBs

New  user  comes  online

Flash  calls  ”setup”

session:start(Uid)on  suitable  worker

lock:acquire(Uid)test-‐and-‐set

s3:get(Uid)

Game  acFons  from  Flash

Lock

Thursday, November 3, 2011

Page 28: Getting real with Erlang: From the idea to a live system

Worker

SessionSessionSession

Worker

SessionSessionSession

23

Worker

SessionSessionSession

Coordinator

DBs

User  goes  offline  (session  Fmes  out)

lock:release/1

s3:put/1

gen_server  Fmeout

Lock

Thursday, November 3, 2011

Page 29: Getting real with Erlang: From the idea to a live system

Implementa4on  of  game  logic

Thursday, November 3, 2011

Page 30: Getting real with Erlang: From the idea to a live system

Dream  game  logic

25

+

Fast Safe

Thursday, November 3, 2011

Page 31: Getting real with Erlang: From the idea to a live system

Dream  game  logic

26

Thursday, November 3, 2011

Page 32: Getting real with Erlang: From the idea to a live system

Dream  game  logic

• We  want  high  throughput  (for  scaling)–Try  to  spend  as  liPle  CPU  Fme  as  possible

–Avoid  heavy  computaFon

–Try  to  avoid  creaFng  garbage

• ..and  simple  and  testable  logic  (correctness)–FuncFonal  game  logic  makes  thinking  about  code  easy

–Single  entry  point,  gets  ”request”  and  game  state

–Code  for  happy  case,  roll  back  on  game-‐related  excepFon

27

Thursday, November 3, 2011

Page 33: Getting real with Erlang: From the idea to a live system

How  we  avoid  using  CPU

• Remove  need  for  DB  serializaFon  by  storing  data  in  process

• Game  is  designed  to  avoid  heavy  liXing  in  the  backend,  very  simple  game  logic

• OpFmize  hot  parts  on  the  criFcal  path,  like  collision  detecFon,  regrowing  of  forest

• Generate  erlang  modules  for  read-‐heavy  configuraFon  (~1  billion  reads/1  write  per  week)

• Use  NIFs  for  parsing  JSON  (jiffy)

28

Thursday, November 3, 2011

Page 34: Getting real with Erlang: From the idea to a live system

How  to  find  where  CPU  is  used

• Profile  (eprof,  fprof,  kprof[1])• Measure  garbage  collecFon  (process_info/1,  gcprof[2])

• Conduct  experiment:  change  code,  measure,  repeat

• Is  the  increased  performance  worth  the  increase  in  complexity?

• SomeFmes  a  radically  different  approach  is  needed..

[1]:  github.com/knuFn/kprof

[2]:  github.com/knuFn/gcprof

29

Thursday, November 3, 2011

Page 35: Getting real with Erlang: From the idea to a live system

Opera4onsAugust  2011  -‐  ...

Thursday, November 3, 2011

Page 36: Getting real with Erlang: From the idea to a live system

Opera4ons

• At  wooga,  developers  also  operate  the  game

• Most  developers  are  ex-‐sysadmins

• Simple  tools:–remsh  for  deployments,  maintenance,  debugging

–automaFon  with  chef

–syslog,  tail,  cut,  awk,  grep–verbose  crash  logs  (SASL)–alarms  only  when  something  really  bad  happens

31

Thursday, November 3, 2011

Page 37: Getting real with Erlang: From the idea to a live system

Deployments

• Goal:  upgrade  without  annoying  users• SoT  purge• Set  system  quiet  (webserver  &  coordinator)

• Reload• Open  the  flood  gates• Migrate  process  memory  state  on-‐demand

• Total  4me  not  answering  game  requests:  <  1s

32

Thursday, November 3, 2011

Page 38: Getting real with Erlang: From the idea to a live system

How  we  know  what’s  going  on

• Event  logging  to  syslog–Session  start,  session  end  (process  memory,  gc,  game  stats)

–Game-‐related  excepFons

• Latency  measurement  within  the  app

• Use  munin  to  pull  overall  server  stats–CPU  and  memory  usage  by  beam.smp

–Memory  used  by  processes,  ets  tables,  etc

–Throughput  of  app,  dbs–Throughput  of  coordinator,  workers,  lock

33

Thursday, November 3, 2011

Page 39: Getting real with Erlang: From the idea to a live system

How  we  know  what’s  going  on

34

Game  error:  The  game  acFon  is  not  allowed  with  the  current  server  state  and  configuraFon

Thursday, November 3, 2011

Page 40: Getting real with Erlang: From the idea to a live system

How  we  know  what’s  going  on

35

Half-‐word  emulator

Thursday, November 3, 2011

Page 41: Getting real with Erlang: From the idea to a live system

How  we  know  what’s  going  on

36

Thursday, November 3, 2011

Page 42: Getting real with Erlang: From the idea to a live system

How  we  know  what’s  going  on

37

Thursday, November 3, 2011

Page 43: Getting real with Erlang: From the idea to a live system

How  we  know  what’s  going  on

38

Thursday, November 3, 2011

Page 44: Getting real with Erlang: From the idea to a live system

How  we  know  what’s  going  on

39

Thursday, November 3, 2011

Page 45: Getting real with Erlang: From the idea to a live system

Conclusions

Thursday, November 3, 2011

Page 46: Getting real with Erlang: From the idea to a live system

Conclusions  -‐  database

41

0

7500

15000

22500

30000

queries/sec

Ruby Stateless Erlang Stateful

700

Thursday, November 3, 2011

Page 47: Getting real with Erlang: From the idea to a live system

ConclusionsDb  maintenace

AWS  S3  as  a  main  datastore

one  document/user

0  maintenance/setup  cost

Redis  for  the  derived  data  (leaderboard  etc.)

load  very  far  from  Redis  max  capacity

42

Thursday, November 3, 2011

Page 48: Getting real with Erlang: From the idea to a live system

Conclusionsdata  locality

• average  game  call  <  1ms

• no  db  roundtrip  at  every  request• no  need  for  low  latency  network• efficient  setup  for  cloud  environment

43

Thursday, November 3, 2011

Page 49: Getting real with Erlang: From the idea to a live system

Conclusionsdata  locality

• finally  CPU  bound• no  CPU  4me  for  serializing/deserializing  data  from  db

• CPU  only  busy  transforming  data  (minimum  possible  ac4vity)

44

Thursday, November 3, 2011

Page 50: Getting real with Erlang: From the idea to a live system

ConclusionsCPU  usage

• 300K  daily  users• 1000  hcp  req/sec  (game  ac4ons)

• 4  m1.large  AWS  instances  (dual  core  8GB  RAM)

• 2  instances  (coordinators)  5%  CPU  load• 2  instances  (workers)  20%  CPU  load

45

Thursday, November 3, 2011

Page 51: Getting real with Erlang: From the idea to a live system

Conclusionsextra  benefits

en4re  user  state  in  one  process

+

immutability

=

Transac4onal  behavior

46

Thursday, November 3, 2011

Page 52: Getting real with Erlang: From the idea to a live system

Conclusionsextra  benefits

47

One  user  session  -‐>  one  erlang  process

The  erlang  VM  is  aware  of  processes

=>

the  erlang  VM  is  aware  of  user  sessions

Thursday, November 3, 2011

Page 53: Getting real with Erlang: From the idea to a live system

Conclusions

48

Thanks  to  VM  process  introspec4on

process  reduc4ons  -‐>  cost  of  a  game  ac4on

process  used  memory  -‐>  memory  used  by  session

We  gained  a  lot  of  knowledge  about  a  fundamental  ”business”  en4ty

Thursday, November 3, 2011

Page 54: Getting real with Erlang: From the idea to a live system

Conclusions

49

• a  radical  change  was  made  possible  by  a  radically  different  tool  (erlang)

Thursday, November 3, 2011

Page 55: Getting real with Erlang: From the idea to a live system

Conclusions

49

• a  radical  change  was  made  possible  by  a  radically  different  tool  (erlang)

• erlang  can  be  good  for  data  intensive/high  throughput  applica4ons

Thursday, November 3, 2011

Page 56: Getting real with Erlang: From the idea to a live system

Conclusions

49

• a  radical  change  was  made  possible  by  a  radically  different  tool  (erlang)

• erlang  can  be  good  for  data  intensive/high  throughput  applica4ons

• stateful  is  not  necessarily  hard/dangerous/unmaintainable

Thursday, November 3, 2011

Page 57: Getting real with Erlang: From the idea to a live system

Conclusions

50

• november  2010:  0  lines  of  erlang  @wooga

• november  2011:  1  erlang  game  server  live

...with  more  erlang  coming,  join  us

Thursday, November 3, 2011

Page 58: Getting real with Erlang: From the idea to a live system

Q&A

51

Knut Nesheim @knutinPaolo Negri @hungryblank

hPp://wooga.com/jobs

Thursday, November 3, 2011