

Process Migration – A Review of Dynamic Load-Balancing Algorithms (Part 1)

Hessah Alsaaran

Presentation #2, CS 655 – Advanced Topics in Distributed Systems

Computer Science Department, Colorado State University

Part 1 Outline

• Introduction

• Dynamic Load-Balancing Systems:
  1. MOSIX
  2. Sprite
  3. Mach
  4. LSF
  5. RoC-LB

• Discussion and Comparison

2

Part 2 Outline:
• Migration in cloud settings.
• Live migration.
• Any suggested topic.

Introduction  

3  

Definition  

• “The act of transferring a process between two machines during its execution.” [1]

4  

Figure  1  from  [1]  

Goals  (Why  use  Migration?)  

• Dynamic load distribution.

• Data locality.

• Failure recovery (with checkpointing).

• Administrative shutdowns.

5  

General Algorithm for Migrating Process X

(1) Migration request (source node → destination node)
(2) Negotiation
(3) Acceptance
(4) Suspend process X at the source node
(5) Queue up incoming communication
(6) Create a new instance of X at the destination node
(7) Transfer X's state
(8) Forward references
(9) Transfer the queued communication
(10) Process X resumes at the destination node
(11) Process X is deleted at the source node

6
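A minimal sketch of the eleven steps above, using plain Python dictionaries for the nodes and the process; all field names and the in-memory "state transfer" are illustrative assumptions, not any real system's API.

def migrate(proc, source, dest):
    # (1)-(3) migration request, negotiation, acceptance (here: a capacity check)
    if dest["load"] + proc["size"] > dest["capacity"]:
        return False
    # (4) suspend process X, (5) queue up incoming communication at the source
    proc["state"] = "suspended"
    queued, proc["inbox"] = proc["inbox"], []
    # (6) create a new instance of X at the destination, (7) transfer X's state
    clone = dict(proc, node=dest["name"], inbox=[])
    dest["procs"][clone["pid"]] = clone
    # (8) forward references: the source remembers where X now lives
    source["forwarding"][proc["pid"]] = dest["name"]
    # (9) transfer the queued communication
    clone["inbox"].extend(queued)
    # (10) X resumes at the destination, (11) X is deleted at the source
    clone["state"] = "running"
    del source["procs"][proc["pid"]]
    source["load"] -= proc["size"]
    dest["load"] += proc["size"]
    return True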


Migration → Load Balancing

• Load information
  • Information about the node and its load.
  • Trade-off between size and network overhead.

• Load information dissemination:
  • Periodic or event-driven.
  • Trade-off between frequency and stability.

• Load management and migration decisions
  • Centralized or distributed.

7  

Distributed Scheduling

• The migration decision is distributed.

• Policies:
  • Sender-initiated policy: the overloaded node requests the migration.
    • Suitable for lightly loaded systems.
  • Receiver-initiated policy: the underloaded node requests the migration.
    • Suitable for heavily loaded systems.
  • Symmetric policy: a combination of both of the above policies.
  • Random policy: randomly choose a node.
    • Simple, but improves performance. (It can eliminate the management overhead, but it can miss the critical nodes that are overloaded or underloaded → performance will depend on the load distribution.)

(A sketch of the sender- and receiver-initiated policies follows below.)

8  
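A hedged sketch of the sender- and receiver-initiated policies; the thresholds T_high/T_low, the probe limit, and the queue_len_of callback are invented parameters for illustration only.

import random

def sender_initiated(my_queue_len, peers, queue_len_of, T_high=4, probes=3):
    """An overloaded node probes a few random peers for somewhere to push a task."""
    if my_queue_len <= T_high:
        return None
    for peer in random.sample(peers, min(probes, len(peers))):
        if queue_len_of(peer) < T_high:
            return peer          # migrate (or place) one task on this peer
    return None

def receiver_initiated(my_queue_len, peers, queue_len_of, T_low=1, probes=3):
    """An underloaded node probes a few random peers for work to pull."""
    if my_queue_len > T_low:
        return None
    for peer in random.sample(peers, min(probes, len(peers))):
        if queue_len_of(peer) > T_low + 1:
            return peer          # request a task from this peer
    return None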

Transfer Strategies

• Eager (all) strategy: copy the entire address space at migration time.

• Eager (dirty) strategy: transfer the dirty pages at migration time, then fault in the remaining pages when needed.

• Copy-on-reference strategy: transfer pages on demand. If a page is dirty, it comes from the source node; otherwise, it can come from either the source node or the backing store.

• Flushing strategy: same as above, but the dirty pages are first flushed to the backing store.

• Precopy strategy: the suspension of the process at the source node is delayed until the transfer is complete. Dirty pages may get recopied multiple times. (A toy precopy loop is sketched below.)

• Trade-off: initial migration cost, run-time cost, home dependency, and reliability.

9  
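A toy sketch of the precopy strategy only (the other strategies differ mainly in when pages are sent); pages is a dict of page number → contents, and dirty_pages() is an assumed callback returning the pages written since it was last called.

def precopy(pages, dirty_pages, max_rounds=5, stop_threshold=8):
    copied = dict(pages)                  # first full pass while the process keeps running
    for _ in range(max_rounds):
        dirty = dirty_pages()
        if len(dirty) <= stop_threshold:  # residue small enough: stop and copy
            break
        for p in dirty:
            copied[p] = pages[p]          # dirty pages may be recopied many times
    for p in dirty_pages():               # final stop-and-copy, then resume remotely
        copied[p] = pages[p]
    return copied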

Alternatives

• Remote Execution
  • Remote invocation of existing code, or transfer of the code only.
  • Faster than process migration (no state transfer).
  • But it doesn't move.
    • Once the process is created at a node, it stays there until it terminates or gets killed.
  • A process cannot resume execution; it is killed and then restarted on the other node. (So is it really an alternative?)
  • Can be suitable for very short processes whose state is not worth transferring.

10  

Alternatives

• Process Cloning
  • Remote forking: children inherit state from their parent.
  • Process migration can be approximated by terminating the parent.
  • It requires the process's own knowledge and action.
  • No resumption here either.

11  

Alternatives  

• Application-level Checkpointing
  • The application checkpoints itself.
  • The application can be restarted from these checkpoints.
  • This is one way of doing process migration, not an alternative.

12  


Alternatives  

• Mobile agents
  • Processes that save their own state, move to a new node, and resume execution there (self-migrating processes, coded to do so).
  • They make the migration decision themselves.
  • No transparency.

13  

(1)  MOSIX  “Multicomputer  OS  for  Unix”  

14  

MOSIX - Distributed OS

• Main feature: Single System Image

15   Figure  from  [3]  

Other  MOSIX  Main  Features  

• Fully decentralized control; no global state → scalable.

• Guest (migrated) processes run in sandboxes and have lower priorities.

16  

2  Types  of  Processes  

• Linux processes (not affected by MOSIX)
  • Administrative tasks and non-migratable processes.

• MOSIX processes
  • Migratable processes that are created using the “mosrun” command.
  • Each has a home-node: the node where the process was created.

17  

Automatic  Resource  Discovery  

• Randomized gossip algorithm.
  • Each node periodically sends its own information, along with recently received information, to randomly chosen nodes. (A minimal simulation is sketched below.)

• Information:
  • Includes current load, CPU speed, and memory.
  • Load is estimated by the average length of the ready queue over a fixed time period. (Is that enough?)

• Information about new nodes is gradually distributed, while information about disconnected nodes ages and is quickly removed.

• Scalable.

18  
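A minimal simulation of the randomized gossip dissemination described above; the fan-out, the ageing window, and the load-vector fields are assumptions for illustration, not MOSIX's actual parameters.

import random

def gossip_round(nodes, fanout=2, max_age=8):
    # refresh each node's own entry and age out stale (disconnected) nodes
    for name, node in nodes.items():
        view = node["view"]
        view[name] = {"load": node["load"], "cpu": node["cpu"],
                      "free_mem": node["free_mem"], "age": 0}
        for other in list(view):
            if other != name:
                view[other]["age"] += 1
                if view[other]["age"] > max_age:
                    del view[other]
    # each node pushes its view to a few random peers, which keep the fresher entry
    for name, node in nodes.items():
        peers = [n for n in nodes if n != name]
        for peer in random.sample(peers, min(fanout, len(peers))):
            for other, entry in node["view"].items():
                known = nodes[peer]["view"].get(other)
                if known is None or entry["age"] < known["age"]:
                    nodes[peer]["view"][other] = dict(entry)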


Process  Profiling  

• Each process has an execution profile:
  • Includes age, size, and rates of system calls, I/O, and IPC.

• Used along with load information for automatic migration decisions.

19  

Dynamic Load-Balancing

• Uses process migration.
  • Triggered either manually or automatically.

• Migration: copying the process memory image (compressed when sent over the network) and setting up its run-time environment (sandbox).

• MOSIX uses the eager (dirty) transfer strategy:
  • At migration time: transfer only the dirty pages of the process.
  • When the process resumes execution: text and clean pages are faulted in as needed.

20  

Migration Algorithm

• A migration decision is made:
  • When a node finds another node with significantly lower load.
  • To migrate processes that request more memory than is available.
  • To migrate processes from slower to faster nodes.

• The selection of a process to migrate preferentially picks (see the scoring sketch below):
  • Older processes.
  • Processes with a communication history with the target node.
  • Processes with a forking history.

21  
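A hedged sketch of the preference order above; the weights are invented purely to rank candidates, and MOSIX's real heuristics are more elaborate.

def pick_process_to_migrate(processes, target_node):
    candidates = [p for p in processes if p["migratable"]]
    if not candidates:
        return None
    def preference(p):
        return (p["age"]                                    # prefer older processes
                + 10 * p["comm_bytes"].get(target_node, 0)  # ...already talking to the target
                + 5 * p["forks"])                           # ...with a forking history
    return max(candidates, key=preference)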

MOSIX’s  Virtualization  Layer  

22  

• MOSIX is a software layer.
• It intercepts all system calls.
  • If the caller is a migrated process → forward the call to its home-node, perform the operation on behalf of the process, then send the results back.

Figure  from  [3]  

Results  of  the  Virtualization  Layer  

• Migrated processes appear to be running on their home-nodes, unaware of the migration.

• Users do not need to know where their programs are running.

• Users do not need to modify their applications.

• Users do not need to log in or copy files to remote nodes.

• Users have the impression they are running on a single machine.

• A migrated process runs in a sandbox for host protection.

23  

Drawbacks  of  the  Virtualization  Layer  

• Management and network overhead for migrated processes.

24  

Table 1 from [4]. Experiment settings:
• Identical Xeon 3.06 GHz servers, with 1 Gb/s Ethernet.
• 4 applications:
  • RC (satisfiability): CPU intensive.
  • SW (protein sequences): small amount of I/O.
  • JELlium (molecular dynamics): large amount of I/O.
  • BLAT (bioinformatics): moderate amount of I/O.

Table  1  from  [1]  


Migratable Sockets (Direct Communication)

• Each process has a mailbox.

• A process is identified by its home-node and its PID on that home-node. (mapping?)

• Process location is transparent.

• Network overhead is reduced.

25

[Diagram: without migratable sockets, messages between processes X and Y (on nodes A–D) are relayed through their home-nodes; with migratable sockets, messages are delivered directly to the process's current location.]

Limitations  

• Suitable for applications that are compute intensive or have low to moderate I/O.
  • Otherwise → network and management overhead.

• For Linux: MOSIX supports non-threaded Linux applications.

• For private clusters: only trusted and authorized nodes.

• For clusters with fast networks.

• Not suitable for the node-shutdown case.

26  

Issues  Not  Discussed  

• Node failure: either the home-node or the remote node.
  • Checkpointing is supported, but where are the checkpoints (saved process images) stored?

• Fully decentralized? Job-queuing system for flood control.
  • MOSIX has a job queue that gradually dispatches jobs according to their priorities, requirements, and the currently available resources.
  • Where is the job queue?

27  

(2)  Sprite  

28  

Sprite  

• Like MOSIX:
  • It achieves transparency through the use of home-nodes (single system image).
  • It uses the eager (dirty) transfer strategy.

• Unlike MOSIX:
  • Its goal is to utilize idle nodes.
  • It uses a network file system → reduces home-node service.
  • It uses a centralized load-information management system.
    • A single process maintains the system's state.
    • Scalability issue → bottleneck.

29  

(3)  Mach  

30  


Mach  

• Task migration is a feature added to the Mach microkernel.

• The process stays on the source node.
  • Only the task's state is transferred.

• Most system calls are forwarded to the source.

31  

Figure  7  from  [1]  

Migration  Decision  

• Distributed file system.

• Distributed scheduling decisions for (profiled) applications:
  • Migration to a node for data locality:
    • Significant amount of remote paging.
    • Significant amount of communication.
  • Migration to a node for load balancing:
    • CPU-intensive applications.

32  

(4)  LSF  

33  

LSF  (Load  Sharing  Facility)  

• LSF achieves load balancing through:
  • Initial placement of processes (primary goal).
  • Process migration with checkpointing (using Condor's checkpointing).
    • Checkpoint, kill, then restart the process on the destination node.

• Centralized load-information management:
  • A single master node.
    • If it fails, another node promotes itself (how?).
  • Other nodes send their information to the master periodically.
  • The master is responsible for scheduling.
    • Based on load information and process requirements.

34  

(5) Rate of Change Load Balancing (RoC-LB)

Campos and Scherson [2]

35

Rate of Change Load Balancing (RoC-LB)

• The idea: the migration decision should depend on how the load changes over time:
  • To predict future overload or starvation.
  • To minimize node idle time.

36

[Diagram: two nodes A and B under the two balancing goals, equalizing load vs. merely utilizing idle nodes.]


Migration Algorithm (When?)

• Each node periodically computes the change in its load since the last time interval → the difference in load (DL).

• Assuming that DL will remain the same, each node computes how many time intervals it needs to become idle.

• If this time is less than the network delay (ND) → initiate a migration request.

• The network delay (ND) is the time it takes to receive load after a request (measured adaptively).

37  

Migration Algorithm (When?) – First Exception

• If the node's load < CT → initiate a load request.

• Reason: a small change in load can cause immediate node starvation.

38

[Diagram: load axis with the critical threshold (CT), low threshold (LT), and high threshold (HT); nodes below the lower thresholds act as sinks, nodes above HT act as sources, and nodes in between are neutral.]

Migration Algorithm (When?) – Second Exception

• If the node's load > HT → never initiate a load request.
  • Even if it is predicted to become idle.

• Reason:
  • DL must be high → such a value is probably rare and short-lived.
  • To resist high spikes in load → delay load requests until the load falls below HT.

(The basic rule and both exceptions are sketched in code below.)

39  
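A sketch of the "when" rule plus both exceptions, with DL measured in load units per interval and ND expressed in the same intervals; the CT and HT values here are arbitrary placeholders.

def should_request_load(load, dl, nd, CT=1.0, HT=10.0):
    if load < CT:      # first exception: about to starve, ask immediately
        return True
    if load > HT:      # second exception: likely a short-lived spike, never ask
        return False
    if dl >= 0:        # load is not shrinking, no idleness predicted
        return False
    intervals_until_idle = load / -dl
    return intervals_until_idle < nd   # would go idle before requested load arrives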

Migration Algorithm (Where?)

• Each node maintains two tables: a Sink Table and a Source Table.
  • For every load request it receives, the requesting node is recorded as a sink.
  • For every load-request reply it receives, the replying node is recorded as a source.

• When a node initiates a load request:
  • It selects the first source node in its Source Table.
    • If the table is empty → select a random node (not in the Sink Table).
  • It sends the load request to that node.
    • That node may forward the request to another selected source node.

40  

Migration Algorithm (How many?)

• Using DL, the node predicts what its load will be after a period of length ND → the predicted load (PL).

• If PL ≥ zero → the node does not predict idleness within the next ND time period.

• If PL < zero → the node requests abs(PL) load units (see the sketch below).

• The request receiver will send load:
  • Only if its load > HT.
  • Amount = min(requested amount, a).

41  
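A sketch of the "how many" rule: the requester asks for enough load to cover the next ND period, and the receiver grants at most its surplus above HT (my reading of the figure's "a" below).

def requested_amount(load, dl, nd):
    predicted = load + dl * nd        # predicted load (PL) after ND
    return 0.0 if predicted >= 0 else -predicted

def granted_amount(receiver_load, requested, HT=10.0):
    if receiver_load <= HT:
        return 0.0                    # only nodes above HT give load away
    return min(requested, receiver_load - HT)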

[Diagram: a node's load with the high threshold HT marked; a denotes the portion of the load above HT that can be shipped to a requester.]

Migration Algorithm (Which?)

• Old tasks (by consumed CPU time).

• Reason: they will probably live longer, making the migration worthwhile.

42  


Message  Passing  System  and  Table  Update  Algorithm  

• Load request sending (requesting node):
  1. The requesting node creates a message.
  2. It pops a source node from its Source Table.
  3. It sends the request message to that source node.

• Receiving a reply (requesting node):
  • Add the replier to the Source Table.

43

Load Request Message fields: sink process ID, number of load units requested, load state of preceding nodes, number of forwards.

44

• Receiving a load request (receiving node):
  1. Add the requesting node to the Sink Table.
  2. If it is a forwarded message → also add the preceding node to the table.
  3. Check its own load:
    • If it can completely fulfill the request:
      • Send a reply to the requesting node with the requested load.
    • If it can partially fulfill the request:
      a. Send a reply to the requesting node with part of the requested load.
      b. Update the message by:
        • Decreasing the amount of requested load.
        • Incrementing the number of forwards.
        • Changing the preceding-node field to itself.
      c. Check the number of forwards:
        • If it has reached the maximum value: discard the message and inform the requesting node.
        • Otherwise, forward the message to a source node from the table.
    • If it cannot fulfill the request at all:
      • Repeat steps b and c (without changing the amount of requested load).

(An illustrative handler for this logic follows below.)
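An illustrative handler for an incoming load request, following the steps above; the message fields mirror the format slide, while MAX_FORWARDS and the three messaging stubs are assumptions standing in for a real transport layer.

MAX_FORWARDS = 4

def send_load(node, sink, amount):
    print(f"{node['name']} -> {sink}: sending {amount} load unit(s)")

def notify_failure(node, sink):
    print(f"{node['name']} -> {sink}: request could not be fulfilled")

def forward(node, next_source, msg):
    print(f"{node['name']} forwards the request to {next_source}")

def handle_load_request(node, msg):
    node["sink_table"].append(msg["sink"])            # 1. the requester is a sink
    if msg["forwards"] > 0:
        node["sink_table"].append(msg["preceding"])   # 2. so is the forwarding node
    surplus = max(0, node["load"] - node["HT"])       # 3. check own load
    grant = min(surplus, msg["requested"])
    if grant > 0:
        send_load(node, msg["sink"], grant)           # full or partial reply
        node["load"] -= grant
        msg["requested"] -= grant
    if msg["requested"] == 0:
        return                                        # completely fulfilled
    msg["forwards"] += 1                              # b. update the message
    msg["preceding"] = node["name"]
    if msg["forwards"] >= MAX_FORWARDS or not node["source_table"]:
        notify_failure(node, msg["sink"])             # c. give up and inform the sink
    else:
        forward(node, node["source_table"].pop(0), msg)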

Comments/Critique  

• If an algorithm balances the load through “equalization,” it already covers the case of avoiding idle nodes.

• The requested load only covers the next time interval, so the node will need to request load again.
  • The migration process (including message passing) repeats continually.
  • Trade-off between the time-interval size and the incurred overhead.

• A node should not only have “enough” load; it should have a mixture of tasks (e.g., CPU-bound and I/O-bound) to effectively utilize its resources.

45  

Comparisons  &  Discussion  

46  

Comparisons  &  Discussion  (1)  

• Balancing goal:
  • MOSIX, Mach, LSF → full load balancing.
  • RoC-LB and Sprite → utilizing idle nodes.

47  

Comparisons  &  Discussion  (2)  

• Full vs. partial process migration
  • MOSIX, Sprite, Mach → partial migration.
  • RoC-LB, LSF → full migration.

• In partial migration (dependency between the source and destination nodes):
  • Increased transparency.
  • Management and network overhead. (Inefficiency)
  • Compromises the fault tolerance of the system. (Inefficiency)

• In full migration:
  • Node compatibility becomes an issue.
  • More information needs to be disseminated. (Inefficiency)
  • Longer migration time, but less run-time overhead.

48  


Comparisons  &  Discussion  (3)    

• Distributed vs. centralized load-balancing algorithms:
  • MOSIX, Mach, RoC-LB → distributed.
  • Sprite, LSF → centralized.

• Distributed algorithms:
  • More scalable.
  • Load-information storage issue. (Inefficiency)

• Centralized algorithms:
  • Bottleneck. (Inefficiency)
  • Single point of failure. (Inefficiency)
  • Fast, up-to-date information about all nodes.

49

Inefficiency → Load Representation

• A node's load is estimated based on the number of load units.
• Examples:
  • MOSIX estimates load by the average length of the ready queue over a fixed time period.
  • RoC-LB uses the load change over time.

• Problem:
  • Load units (processes) are not equal:
    • Different sizes (running times) → many short tasks can be equivalent to one long task.
    • CPU-bound vs. I/O-bound processes → a node must have a mixture to effectively utilize its resources.

50  

If I were to build a system, how would I do it?

• (1) Load balancing through full process migration (no home-node dependency) → more failure tolerant.
  • Checkpoint at the application level → not transparent, but it produces significantly smaller checkpoint files that are also machine-independent.
  • Replicate these checkpoints, at a configurable replication level, on different nodes.
  • Periodically save checkpoints and update the replicas. If a node holding a replica does not respond, choose a replacement.
  • Conversely, if the replica-holding nodes observe a timeout from the node running the process, one of them takes over the process.
  • When a process has terminated, its node tells the nodes holding replicas to delete them.

51  

If I were to build a system, how would I do it?

• (2) The migration decision is distributed, as in MOSIX → more scalable.
  • But change the load representation so that it is not based only on the number of units as if they were equal (a sketch follows below):
    • Load units are weighted by their expected running time.
      • However, this estimation can be an issue.
      • History may be used if the task has been executed recently on the node.
    • Keep a configurable ratio of CPU-bound and I/O-bound processes.
      • This information can be extracted from process profiles.
  • Instead of sending load information to 2 random nodes, let the number of random nodes grow with the scale of the system.

52  
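A sketch of the weighted load representation proposed above: each unit is weighted by its expected running time (taken from a history table when the task ran on this node recently), and the CPU/I/O mix is reported alongside it; the process fields and the default estimate are assumptions.

def node_load(processes, history, default_estimate=1.0):
    weighted = sum(history.get(p["name"], default_estimate) for p in processes)
    cpu_bound = sum(1 for p in processes if p["kind"] == "cpu")
    return {"weighted_load": weighted,
            "cpu_bound": cpu_bound,
            "io_bound": len(processes) - cpu_bound}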

If I were to build a system, how would I do it?

• (3) Implement the system at the application level, as in LSF → for portability and simplicity, although this may degrade performance.

53  

Thank  You  

54  


References  

1. Milojicic, D. S., Douglis, F., Paindaveine, Y., Wheeler, R., and Zhou, S. “Process Migration”, ACM Computing Surveys, Vol. 32, No. 3, September 2000, pp. 241–299.

2. Campos, L. M., and Scherson, I. “Rate of Change Load Balancing in Distributed and Parallel Systems”, 10th Symposium on Parallel and Distributed Processing, 1999.

3. Barak, A. “Overview of MOSIX”, presentation, 2013, Computer Science, Hebrew University. (http://www.mosix.org)

4. Barak, A., and Shiloh, A. “The MOSIX Cluster Operating System for High-Performance Computing on Linux Clusters, Multi-Clusters and Clouds”, white paper, 2013. (http://www.mosix.org)

55  


Process Migration (Part 2)

Hessah Alsaaran

Presentation #2, CS 655 – Advanced Topics in Distributed Systems

Computer Science Department, Colorado State University

Outline    

• “On minimizing the resource consumption of cloud applications using process migrations”
  • by N. Tziritas, S. U. Khan, C.-Z. Xu, T. Loukopoulos, and S. Lalis
  • Journal of Parallel and Distributed Computing 73, 2013.

• “Rebalancing in a Multi-Cloud Environment”
  • by D. Duplyakin, P. Marshall, K. Keahey, H. Tufo, and A. Alzabarah
  • Science Cloud, 2013.

1  

Distributed Reassignment Algorithm (DRA) (Tziritas et al., 2013)

 

DRA’s  Motivation  (Problem  Statement)  

• Cloud services are widely used.

• Pay-as-you-go.

• Minimizing resource consumption is important for users.

• Problem: “Assuming that an application is already hosted in a cloud, reassign the application components to nodes of the system to minimize the total communication and computational resources consumed by the components of the respective application.” (Tziritas et al., 2013)

• Using process migration.

3  

Assumptions  

• (1) The nodes are in a tree network structure.
  • General network graphs make the assignment problem NP-complete → many researchers restrict the topology to solve the problem in polynomial time.

4

Figure 1b from (Tziritas et al., 2013): the nodes can be organized in a 2-level hierarchy.

Assumptions  

• (2) The computing requirement of a process does not change dynamically, and its execution cost is known a priori.
  • This does not always hold → a limitation.

5  


Problem  Formulation  

6 Figure 1a from (Tziritas et al., 2013)

Problem: given an assignment of processes to nodes → reassign to minimize the application's resource consumption, taking node capacities into consideration.

Application cost = total execution cost + total communication cost, where the execution cost is fixed (regardless of the assignment) and the communication cost depends on the path length between communicating processes.

Single-­‐Process  Migration  

7  Figure  2  from  (Tziritas  et  al.  ,  2013)  

The  migration  is  beneficial  if  the  positive  load  is  larger  than  the  negative  load.  
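A sketch of the single-process benefit test (my reading of Figure 2): traffic maps each communicating peer process to a data volume, and peers_via_dest is the set of peers reachable through the candidate neighbouring node.

def migration_is_beneficial(traffic, peers_via_dest):
    positive = sum(v for peer, v in traffic.items() if peer in peers_via_dest)
    negative = sum(v for peer, v in traffic.items() if peer not in peers_via_dest)
    return positive > negative   # moving one hop toward the destination pays off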

When  does  the  process  stop  migrating?  

• When it becomes the center of gravity:
  • There is no beneficial migration to any of its 1-hop neighbors.

8  

Possible  Problem  

• Consider migrating p1 or p2 alone to n2.

• Now consider migrating both processes together.

• This grouping of co-located processes produces a super-process.

9  

Figure  3  from  (Tziritas  et  al.  ,  2013)  

Unbalanced Super-Process Identification

• A super-process that is not at the center of gravity.

10 11

Figures 5 and 6 from (Tziritas et al., 2013): the positive load toward the destination is weighed against the negative load with the co-located processes at the source; here P2 and P5 form the super-process that maximizes the migration benefit.


Information  for  Migration  Decision  

• Each node should know:
  • For each process it hosts, the volume of data used in communication, both locally and externally.
  • Its 1-hop communicating neighbors.
  • For each process, the location of its adjacent processes.
    • Concurrent migration of adjacent processes on neighboring nodes (i.e., swapping) is not permitted.

12  

Capacity  Constraint  

• Problem: the destination node cannot host the super-process.

• Solutions in the extended DRA:
  • Prune some processes from the super-process.
    • The least beneficial processes are pruned.
  • Enable multi-hop migration.
    • Each node must then know the others that are at most k hops away.

13  

Inefficiencies  and  Limitations  

• (1) Local information vs. global information
  • Using only 1-hop migrations can cause a series of migrations for a process until it reaches an x-hop node.
    • → Management and network overhead.
    • → Can increase the process execution time. (more in point 4)

• (2) Load balancing is not considered.
  • Only the capacity limit is considered.
  • Communicating processes can overload a node.
    • → Can increase the process execution time. (more in point 4)

• (3) Process length is not considered.
  • Short processes may not be worth migrating.

14  

Inefficiencies  and  Limitations  

• (4) Application cost = execution cost + communication cost, where the execution cost is fixed.
  • Not true; the cost is also affected by how long the processes use resources (e.g., billing by the hour).
  • Migration can cause either speed-up (e.g., if the process is placed on an underloaded node) or slowdown (e.g., continuous migration).

• (5) Limited by the assumptions: tree networks and unchanging process requirements.
  • The authors plan to address these issues in a future extension.

15  

Rebalancing in a Multi-Cloud Environment

(Duplyakin  et  al.  2013)  

16  

Motivation  (Problem  Statement)  

• A multi-cloud environment with user preferences.

• Example: a user works on a community cloud (the preferred cloud), but depending on the availability of its limited resources, additional work is launched on a public cloud.

• Automatic rebalancing: downscaling and upscaling.

• Automatic upscaling is provided by Nimbus Phantom and Amazon's Auto Scaling service, but automatic downscaling was not addressed.

• Need to define preferences and a set of policies.

17  


Assumptions  

• Multi-clouds are deployed by a central auto-scaling service (the authors use the open-source Phantom).
  • It deploys instances across the clouds and monitors them.

• Master-worker environments
  • Distributed workers request jobs from a master.
  • Central job queue and scheduler.
  • The scheduler:
    • Reschedules jobs upon worker failure or rebalancing termination.
    • Provides worker status info (state, running jobs and their runtimes).
    • Adds and removes worker instances.

• Parallel workloads
  • Independent jobs that can be executed in any order.

18  

Architecture    

19  Figure  1  from  (Duplyakin  et  al.  2013)  

Need  for  Rebalancing  Policies  

• Because rebalancing in a multi-cloud environment can be disruptive to the workload.
  • Terminating an instance can cause the termination of a running job.
    • The job is rescheduled.
    • Increase in total execution time.

• Without rebalancing:
  • Unexpected expenses.

20  

Downscaling  Policies  

1. Opportunistic-Idle (OI):
  • Wait until there are idle instances in the less desired clouds, then terminate them.
  • This policy continuously terminates excess idle nodes until it reaches the desired state.

2. Force-Offline (FO):
  • Similar to the previous policy, but with graceful termination:
    • Mark the instance as offline, and let it complete its job.

21  

Downscaling  Policies  

3. Aggressive Policy (AP):
  • Discard partially completed jobs to satisfy the request as soon as possible.
  • To minimize re-execution overhead, choose instances with jobs that have the least running time.
  • Use a work threshold t for progress (sketched below).
    • Example: if t = 25% and the instance is running a 2-hour job, then the policy can terminate this job only within its first 30 minutes.

22  
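An illustrative selection for the Aggressive Policy: only instances whose running job is still under the work threshold t may be terminated, least progress first; the instance fields are assumptions for the example.

def pick_instances_to_terminate(instances, n_needed, t=0.25):
    eligible = [i for i in instances
                if i["job_runtime"] < t * i["job_expected_runtime"]]
    eligible.sort(key=lambda i: i["job_runtime"])   # least wasted work first
    return eligible[:n_needed]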

Implementation    

• Usage of existing technologies:
  • Uses infrastructure clouds such as Amazon EC2.
  • The implementation is an integration with the open-source Phantom service for auto-scaling.
  • Uses HTCondor as the master-worker workload management system.

• Implemented components:
  • The sensor and the rebalancing policy (Python).
  • Communication with the HTCondor masters and Phantom.

23  


Inefficiency    

• Instead of terminating jobs, they could be checkpointed and then resumed.
  • Checkpointing is an available service in HTCondor.

24  

Policies Trade-offs

25  Figure  6  from  (Duplyakin  et  al.  2013)  

Thank  you  

References    

• Tziritas, N., Khan, S. U., Xu, C.-Z., Loukopoulos, T., and Lalis, S. 2013. On minimizing the resource consumption of cloud applications using process migrations. Journal of Parallel and Distributed Computing 73, 1690–1704.

• Duplyakin, D., Marshall, P., Keahey, K., Tufo, H., and Alzabarah, A. 2013. Rebalancing in a Multi-Cloud Environment. Science Cloud '13, June 17, 2013, New York, NY, USA.

27