Source: kadeploy3.gforge.inria.fr/files/Kadeploy3-g5kschool.pdf · 2014. 2. 28.

Deploying large-scale virtual infrastructures with Kadeploy3

Luc Sarzyniec, Sébastien Badia, Emmanuel Jeanvoine, Lucas Nussbaum

Grid’5000


Plan

1 Introduction
    Kadeploy3
    Kabootstrap
    Our experiment

2 Scalability challenges

3 Experiment

4 Conclusion


Kadeploy3

[Figure: Kadeploy, DHCP, TFTP/HTTP]

• Used by Grid’5000 users to install/reinstall compute nodes

• Designed for scalability

• Support of a broad range of systems (Linux, Xen, *BSD, etc.)

• Manages catalog of images and user permissions

• Built on top of PXE, DHCP, TFTP (or HTTP)


Kabootstrap

[Figure: Kadeploy, DHCP, TFTP/HTTP]

Kabootstrap: install your own Kadeploy3 testbed on cluster nodes

• Fully automated process

• Automatically gathers network information

• Gets hardware information using the Grid’5000 API

• Installs and configures each service of the Kadeploy3 ecosystem, based on the current network/hardware configuration

• Kadeploy3 is network-intrusive
    - Usage of VLANs


Our experiment

[Figure: map of Grid’5000 sites: Bordeaux, Grenoble, Lille, Luxembourg, Lyon, Nancy, Reims, Rennes, Sophia, Toulouse]

Goal: Evaluate the scalability of Kadeploy3

Problem: Not enough machines on Grid’5000

Solution: Deployment of virtual machines

• Boot over Network

• Execution of parallel commands

• CPU intensive tasks

• File broadcast


Plan

1 Introduction

2 Scalability challenges
    Scalability challenges in Kadeploy
    TakTuk (G. Huard, LIG Grenoble)
    Kastafior (O. Richard, LIG Grenoble)

3 Experiment

4 Conclusion


Scalability challenges in Kadeploy

• Reboot commands
    - Windowed operations

• Boot over network
    - Use HTTP instead of TFTP (iPXE)

• Parallel command execution
    - Hierarchical connections, TakTuk

• Environment image broadcast
    - Topology-aware chain, Kastafior
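
Windowed operations can be sketched as follows: instead of sending a reboot command to every node at once, commands are issued in bounded batches so that the boot services are not flooded. This is a minimal illustration, not Kadeploy3's actual implementation; `windowed`, `reboot_in_windows`, the window size and the `reboot` callback are hypothetical names chosen for the example.

```python
from typing import Callable, Iterable, Iterator, List


def windowed(nodes: Iterable, size: int) -> Iterator[List]:
    """Yield successive batches of at most `size` nodes."""
    if size < 1:
        raise ValueError("window size must be at least 1")
    batch: List = []
    for node in nodes:
        batch.append(node)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # last, possibly shorter, window
        yield batch


def reboot_in_windows(nodes: Iterable, size: int,
                      reboot: Callable[[List], None]) -> int:
    """Call `reboot` once per window and return the number of windows used.

    `reboot` is a hypothetical callback; a real one would block until the
    batch has been handled before the next window starts.
    """
    count = 0
    for batch in windowed(nodes, size):
        reboot(batch)
        count += 1
    return count
```

With 10 nodes and a window of 4, the callback would be invoked three times, on batches of 4, 4 and 2 nodes.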


TakTuk (G. Huard, LIG Grenoble)

• Hierarchical connections between the nodes

• Adaptive work-stealing algorithm

• Auto-propagation mechanism

• Installed on Grid’5000

• Recommended by Grid’5000 team

http://taktuk.gforge.inria.fr/

TakTuk, Adaptive Deployment of Remote Executions, B. Claudel, G. Huard and O. Richard, HPDC’2009
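
To illustrate why hierarchical connections help, the sketch below lays nodes out in a static k-ary tree and counts how many connection hops separate the root from the farthest node. It is a simplification under stated assumptions: TakTuk builds its tree adaptively through work stealing, whereas this example fixes the layout in advance, and `build_tree`, `depth` and the `arity` parameter are hypothetical names.

```python
from typing import Dict, List


def build_tree(nodes: List[str], arity: int) -> Dict[str, List[str]]:
    """Map each node to the children it connects to, in a static k-ary layout.

    nodes[0] is the root; the node at index i gets the nodes at indices
    i*arity+1 .. i*arity+arity as children.
    """
    if arity < 1:
        raise ValueError("arity must be at least 1")
    return {
        node: nodes[i * arity + 1: i * arity + arity + 1]
        for i, node in enumerate(nodes)
    }


def depth(node_count: int, arity: int) -> int:
    """Number of connection hops from the root to the farthest node."""
    if arity < 1:
        raise ValueError("arity must be at least 1")
    hops, reached, frontier = 0, 1, 1
    while reached < node_count:
        frontier *= arity      # nodes reachable at the next level
        reached += frontier
        hops += 1
    return hops
```

Under this model, a root that only ever opens 10 connections per node reaches several thousand nodes in a handful of hops, instead of opening one connection per node itself.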


Kastafior (O. Richard, LIG Grenoble)

[Figure: images server]

Topology-aware chained file broadcast

• Chain-based broadcast

• Initialization of the chain with tree-based parallel command

• Saturation of full-duplex networks in both directions
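
The two ideas above, ordering the chain by topology and pipelining the transfer along it, can be sketched with a small model. It is not Kastafior's code: the `(site, switch, hostname)` tuple layout, the function names and the "one chunk per hop per step" timing model are assumptions made for the example. In the model, every host receives one chunk from its predecessor while forwarding the previous one to its successor, which is what keeps a full-duplex link busy in both directions.

```python
from typing import List, Tuple

Host = Tuple[str, str, str]  # (site, switch, hostname); hypothetical layout


def chain_order(hosts: List[Host]) -> List[Host]:
    """Order hosts so that chain neighbours share a site and switch when possible."""
    return sorted(hosts)


def switch_crossings(chain: List[Host]) -> int:
    """Count consecutive pairs of the chain that sit on different switches."""
    return sum(1 for a, b in zip(chain, chain[1:]) if a[:2] != b[:2])


def pipelined_steps(chunks: int, hosts: int) -> int:
    """Steps until the last host holds the last chunk, one chunk per hop per step."""
    return chunks + hosts - 1


def sequential_steps(chunks: int, hosts: int) -> int:
    """Steps if the server sent the whole image to each host in turn."""
    return chunks * hosts
```

In this model the chain length adds to the transfer time instead of multiplying it, and sorting the chain by topology keeps the number of inter-switch hops close to the number of switches.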


Plan

1 Introduction

2 Scalability challenges

3 Experiment
    Experimental process
    Results of a run on Grid’5000
    Limits to scalability
    Demo

4 Conclusion


Experimental process

1. Virtual testbed preparation
    1.1 Reserve and reinstall all nodes on Grid’5000 (20 min)
        - Usage of the Grid’5000 API
    1.2 Prepare Service and Host nodes (5 min)
        - Pre-configure Host and Service machines (network interfaces)
        - Estimate how many VMs each Host can run
        - Dispatch VMs in sub-networks (one per Grid’5000 site)
        - Launch the VMs on each Host
    1.3 Install and configure Service nodes with Kabootstrap (15 min)

2. One or more Kadeploy runs
    Log in to the 'kabootstrapped' frontend and start deployments
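
The dispatch of VMs into one sub-network per site (step 1.2) could look like the following sketch, which uses Python's standard `ipaddress` module. The base network `10.0.0.0/8`, the `/16` prefix per site, the function name `dispatch` and the per-site VM counts are all assumptions for illustration; the slides do not state the addressing plan that was actually used.

```python
import ipaddress
from itertools import islice
from typing import Dict, List


def dispatch(vms_per_site: Dict[str, int], base: str = "10.0.0.0/8",
             prefix: int = 16) -> Dict[str, List[str]]:
    """Give each site its own sub-network of `base` and number its VMs in it.

    Sites are processed in alphabetical order so the plan is reproducible.
    """
    subnets = ipaddress.ip_network(base).subnets(new_prefix=prefix)
    plan: Dict[str, List[str]] = {}
    for site, subnet in zip(sorted(vms_per_site), subnets):
        count = vms_per_site[site]
        # The network and broadcast addresses are not usable by VMs.
        if count > subnet.num_addresses - 2:
            raise ValueError(f"{site}: {count} VMs do not fit in {subnet}")
        plan[site] = [str(ip) for ip in islice(subnet.hosts(), count)]
    return plan
```

For example, `dispatch({"nancy": 2, "lyon": 3})` places the Lyon VMs in `10.0.0.0/16` and the Nancy VMs in `10.1.0.0/16`.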


Results of a run on Grid’5000

[Figure: map of Grid’5000 sites: Bordeaux, Grenoble, Lille, Luxembourg, Lyon, Nancy, Reims, Rennes, Sophia, Toulouse]

Virtualized infrastructure

• 4552 VMs, 355 Hosts (≈2400 cores)

• 47 Service nodes

• 6 Grid’5000 sites

Virtual machines

• 2 VMs per core

• 914MB RAM per VM
    - 2-16 VMs per node

Deployment results

• 430MB environment

• Nodeset of > 4000 nodes

• 58 minutes of deployment

• 4537 nodes deployed successfully (99.6%)
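
The figures on this slide can be cross-checked with a few lines of arithmetic. The constants below are taken from the slide; the `vm_capacity` helper and the rule it encodes (a host is bounded by both its core count and its memory) are an assumption about how the per-host VM count could be estimated, not the authors' stated method.

```python
VMS = 4552            # virtual machines in the testbed (from the slide)
HOSTS = 355           # physical hosts (from the slide)
DEPLOYED = 4537       # nodes deployed successfully (from the slide)
VMS_PER_CORE = 2      # from the slide
RAM_PER_VM_MB = 914   # from the slide


def vm_capacity(cores: int, ram_mb: int) -> int:
    """VMs a host can run when bounded by both core count and memory.

    Hypothetical rule: the slides only give the per-core and per-VM figures.
    """
    return min(cores * VMS_PER_CORE, ram_mb // RAM_PER_VM_MB)


success_rate = 100 * DEPLOYED / VMS    # percentage of nodes deployed
failed = VMS - DEPLOYED                # nodes that did not deploy
avg_vms_per_host = VMS / HOSTS         # mean VM density per host
cores_in_use = VMS / VMS_PER_CORE      # cores needed at 2 VMs per core
```

The exact success rate is about 99.67%, which the slide reports as 99.6%; 15 nodes failed, the average density is about 12.8 VMs per host, and 2 VMs per core implies 2276 cores in use, consistent with the ≈2400 available.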


Limits to scalability

Node reboots

• Uses unreliable protocols: DHCP, TFTP

• Experimental workarounds
    - TFTP replaced by HTTP thanks to iPXE
    - VMs hard disks on ramdisk
    - Kadeploy3 tuning (reboot windows, big timeouts)

Remote command execution and broadcast of system image

• Heavily stresses the network: ARP, UDP and TCP timeouts

• Experimental workarounds
    - Custom iPXE ISO image with big timeouts
    - DNS settings (sub-networks architecture)
    - ARP tables size (on each Service node)
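
As an illustration of the ARP table workaround, the sketch below generates `sysctl.conf` lines for the Linux IPv4 neighbour table thresholds (`net.ipv4.neigh.default.gc_thresh1`, `gc_thresh2`, `gc_thresh3`). The slides do not say how the table size was changed, so using these sysctls is an assumption, and the function name, the `headroom` factor and the 1/4 and 1/2 ratios are arbitrary choices for the example.

```python
from typing import List


def arp_sysctl_lines(expected_neighbours: int, headroom: float = 2.0) -> List[str]:
    """Return sysctl.conf lines sizing the IPv4 neighbour (ARP) table.

    gc_thresh3 is the hard limit on entries, gc_thresh2 the soft limit, and
    gc_thresh1 the level below which entries are not garbage-collected.
    The ratios between them are illustrative, not recommended values.
    """
    if expected_neighbours < 1:
        raise ValueError("expected_neighbours must be positive")
    hard = int(expected_neighbours * headroom)
    soft = hard // 2
    floor = hard // 4
    prefix = "net.ipv4.neigh.default."
    return [
        f"{prefix}gc_thresh1 = {floor}",
        f"{prefix}gc_thresh2 = {soft}",
        f"{prefix}gc_thresh3 = {hard}",
    ]
```

Sizing the hard limit above the number of VMs a Service node may talk to avoids neighbour-table overflows when several thousand VMs sit in the same broadcast domain.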


Demo


Conclusion


• Extensive use of Grid’5000 infrastructure
    - No specific privileges
    - Grid’5000 API
    - Kadeploy3 multi-site
    - Kavlan with global VLAN

• Configuration of a Cloud of KVM virtual machines
    - 4552 virtual machines
    - On 355 physical machines
    - From 6 sites of the Grid’5000 testbed

• Successfully deployed > 4500 nodes

• A lot of improvements in Kadeploy3 internals

Special thanks to the Grid’5000 team, who were very responsive in fixing the numerous bugs we found in KaVLAN and other pieces of the infrastructure ;)
