Virtualizzazione - University of Rome "Tor Vergata" Università degli Studi di Roma “Tor Vergata” Dipartimento di Ingegneria Civile e Ingegneria Informatica Corso di Sistemi Distribuiti

Virtualizzazione

Università degli Studi di Roma “Tor Vergata” Dipartimento di Ingegneria Civile e Ingegneria Informatica

Corso di Sistemi Distribuiti e Cloud Computing A.A. 2015/16

Valeria Cardellini

Riferimenti bibliografici •  Virtual machines and virtualization of clusters and data

centers, chapter 3 of Distributed and Cloud Computing. •  Virtualization, chapter 3 of Mastering Cloud Computing. •  J. Daniels,

Server virtualization architecture and implementation, ACM Crossroads, 2009.

•  M. Rosenblum, T. Garfinkel, Virtual machine monitors: current technology and future trends, IEEE Computer, 2005.

•  J.E. Smith, R. Nair, The architecture of virtual machines, IEEE Computer, 2005.

•  Altri articoli sul sito del corso

Valeria Cardellini - SDCC 2015/16

1


Virtualizzazione •  Virtualizzazione: livello alto di astrazione che nasconde i

dettagli dell’implementazione sottostante •  Virtualizzazione: astrazione di risorse computazionali

–  Si presenta all’utilizzatore una vista logica diversa da quella fisica

•  Le tecnologie di virtualizzazione comprendono una varietà di meccanismi e tecniche usate per risolvere problemi di: –  Affidabilità, prestazioni, sicurezza, …

•  Come? Disaccoppiando l’architettura ed il comportamento delle risorse percepiti dall’utente dalla loro realizzazione fisica

2

Virtualizzazione di risorse •  Virtualizzazione delle risorse hw e sw di sistema

–  Macchina virtuale, …

•  Virtualizzazione dello storage –  Storage Area Network (SAN), …

•  Virtualizzazione della rete –  Virtual LAN (VLAN), Virtual Private Network (VPN), …

•  Virtualizzazione del data center


3

Components of virtualized environment •  Three major components:

–  Guest –  Host –  Virtualization layer

•  Guest: system component that interacts with the virtualization layer rather than with the host

•  Host: original environment where the guest is supposed to be managed

•  Virtualization layer: responsible for recreating the same or a different environment where the guest will operate


4


Macchina virtuale •  Una macchina virtuale (VM, Virtual Machine)

permette di rappresentare le risorse hw/sw di una macchina diversamente dalla loro realtà –  Ad es. le risorse hw della macchina virtuale (CPU, scheda di

rete, controller SCSI) possono essere diverse dai componenti fisici della macchina reale

•  Una singola macchina fisica può essere rappresentata e usata come differenti ambienti di elaborazione –  Molteplici VM su una singola macchina fisica

5


Virtualizzazione: cenni storici

•  Il concetto di VM è un’idea “vecchia”, essendo stato definito negli anni ’60 in un contesto centralizzato –  Ideato per consentire al software legacy (esistente) di

essere eseguito su mainframe molto costosi e condividere in modo trasparente le (scarse) risorse fisiche

–  Ad es. il mainframe IBM System/360-67

•  Negli anni ’80, con il passaggio ai PC il problema della condivisione trasparente delle risorse di calcolo viene risolto dai SO multitasking –  L’interesse per la virtualizzazione svanisce

6


Virtualizzazione: cenni storici (2) •  Alla fine degli anni '90, l’interesse rinasce per rendere

meno onerosa la programmazione hw special-purpose –  VMware viene fondata nel 1998

•  Si acuisce il problema dei costi di gestione e di sottoutilizzo di piattaforme hw e sw eterogenee –  L’hw cambia più velocemente del sw (middleware e

applicazioni) –  Aumenta il costo di gestione e diminuisce la portabilità

•  Diventa nuovamente importante la condivisione dell’hw e delle capacità di calcolo non usate per ridurre i costi dell’infrastruttura

•  E’ una delle tecnologie abilitanti del Cloud computing

7

Vantaggi della virtualizzazione

•  Facilita la compatibilità, la portabilità e la migrazione di applicazioni ed ambienti –  Indipendenza dall’hw –  Create Once, Run Anywhere –  VM legacy: eseguire vecchi SO su nuove piattaforme


8

Vantaggi della virtualizzazione (2)

•  Permette il consolidamento dei server in un data center, con vantaggi economici, gestionali ed energetici –  Obiettivo: ridurre il numero totale di server ed utilizzarli in

modo più efficiente –  Vantaggi:

•  Riduzione dei costi e dei consumi energetici •  Semplificazione nella gestione, manutenzione ed upgrade dei

server •  Riduzione dello spazio occupato e dei tempi di downtime


9

Vantaggi della virtualizzazione (3) •  Permette di isolare componenti malfunzionanti o

soggetti ad attacchi di sicurezza, incrementando quindi l’affidabilità e la sicurezza delle applicazioni –  Macchine virtuali di differenti applicazioni non possono avere

accesso alle rispettive risorse –  Bug del software, crash, virus in una VM non possono

danneggiare altre VM in esecuzione sulla stessa macchina fisica

•  Permette di isolare le prestazioni

–  Ad es. tramite opportuno scheduling delle risorse fisiche che sono condivise tra molteplici VM

•  Permette di bilanciare il carico sui server –  Tramite la migrazione della VM da un server ad un altro


10

Taxonomy of virtualization techniques

•  Execution environment virtualization is the oldest, most popular and most developed area we will mostly investigate it


11


Uso di ambienti di esecuzione virtualizzati

•  In ambito personale e didattico –  Per eseguire diversi SO simultaneamente e semplificare

l’installazione di sw

•  In ambito professionale –  Per debugging, testing e sviluppo di applicazioni

•  In ambito aziendale –  Per consolidare l’infrastruttura del data center –  Per garantire business continuity: incapsulando interi

sistemi in singoli file (system image) che possono essere replicati, migrati o reinstallati su qualsiasi server

12


Architetture delle macchine virtuali A che livello realizzare la virtualizzazione? •  Dipende fortemente dalle interfacce offerte dai vari

componenti del sistema –  Interfaccia tra hw e sw (user ISA: istruzioni macchina non

privilegiate, invocabili da ogni programma) [interfaccia 4] –  Interfaccia tra hw e sw (system ISA: istruzioni macchina

invocabili solo da programmi privilegiati) [interfaccia 3] –  Chiamate di sistema [interfaccia 2]

•  ABI (Application Binary Interface): interfaccia 2 + interfaccia 4

–  Chiamate di libreria (API) [interfaccia 1]

•  Obiettivo della virtualizzazione –  Imitare il comportamento di queste

interfacce

Riferimento: “The architecture of virtual machines” 13

An application uses library functions (A1), makes system calls (A2), and executes machine instructions (A3)

Hardware

Operating System

ISA

Libraries

ABI

API

System calls

Applications

System ISA User ISA

A1

A2

A3

Machine reference model


14


Virtualization layers

•  Common virtualization layers include: –  Application level (also process VM) –  Library level (user-level API) –  Operating system level –  ISA level

•  Requires binary translation and optimization, e.g., dynamic binary translation

–  Hardware abstraction layer (also system VM): based on virtual machine monitor (VMM), also called hypervisor

•  VMM: software that securely partitions the resources of a computer system into one or more VMs

15


Virtualization layers (2)

16


Macchina virtuale di processo •  Virtualizzazione per un singolo processo

–  VM di processo: piattaforma virtuale che esegue un singolo processo

–  Fornisce un ambiente ABI o API virtuale per le applicazioni utente

•  Il programma è compilato in un codice intermedio (portabile), che viene successivamente eseguito nel sistema runtime

•  Esempi: JVM, .NET CLR Istanze multiple di combinazioni <applicazione, sistema runtime>

17


Monitor della macchina virtuale •  Uno strato sw separato che scherma

(completamente) l’hw sottostante ed imita l’insieme di istruzioni dell’architettura

•  Sul VMM possono essere eseguiti indipendentemente e simultaneamente sistemi operativi diversi

•  Esempi: VMware, Microsoft Virtual PC, Parallels, VirtualBox, Xen, KVM Istanze multiple di combinazioni

<applicazioni, sistema operativo>

18

Termini e classificazione VMM •  Host: piattaforma di base sulla quale si realizzano

le VM; comprende: –  Macchina fisica –  Eventuale sistema operativo nativo –  VMM

•  Guest: tutto ciò che riguarda ogni singola VM –  Sistema operativo ed applicazioni eseguite nella VM

•  Consideriamo per prima la virtualizzazione a livello di sistema (VMM o hypervisor)

•  Per il VMM, distinguiamo: –  VMM di sistema –  VMM ospitato –  Virtualizzazione completa –  Paravirtualizzazione


19


VMM di sistema o VMM ospitato

VMM ospitato VMM di sistema

host

guest

host

In quale livello dell’architettura di sistema collocare il VMM?

–  Direttamente sull’hardware (VMM di sistema o classico) –  Come applicazione su un SO esistente (VMM ospitato)

gues

t

20

VMM di sistema o VMM ospitato (2) •  VMM di sistema: eseguito direttamente sull’hw, offre

funzionalità di virtualizzazione integrate in un SO semplificato –  L’hypervisor può avere un’architettura a micro-kernel (solo

funzioni di base, no device driver) o monolitica –  Esempi: Xen, VMware ESX

•  VMM ospitato: eseguito sul SO host, accede alle risorse hw tramite le chiamate di sistema del SO host –  Interagisce con il SO host tramite l’ABI ed emula l’ISA di hw

virtuale per i SO guest –  Vantaggio: non occorre modificare il SO guest –  Vantaggio: può usare il SO host per gestire le periferiche ed

utilizzare servizi di basso livello (es. scheduling delle risorse) –  Svantaggio: degrado delle prestazioni rispetto a VMM di

sistema –  Esempi: User-Mode Linux, Parallels Desktop


21

VMM reference architecture •  Dispatcher: VMM entry point that reroutes the

instructions issued by the VM •  Allocator: decides about the system resources to be

provided to the VM •  Interpreter: executes a proper routine when the VM

executes a privileged instruction


22


Virtualizzazione completa o paravirtualizzazione Quale modalità di dialogo tra la VM ed il VMM per l’accesso alle risorse fisiche?

–  Virtualizzazione completa –  Paravirtualizzazione

•  Virtualizzazione completa –  Il VMM espone ad ogni VM interfacce hw simulate

funzionalmente identiche a quelle della sottostante macchina fisica

–  Il VMM intercetta le richieste di accesso privilegiato all’hw (ad es. istruzioni di I/O) e ne emula il comportamento atteso

–  Il VMM gestisce un contesto di CPU per ogni VM e condivide le CPU fisiche tra tutte le VM

–  Esempi: KVM, VMware ESXi, Microsoft Hyper-V

23


Virtualizzazione completa o paravirtualizzazione Quale modalità di dialogo tra la VM ed il VMM per l’accesso alle risorse fisiche?

•  Paravirtualizzazione

–  Il VMM espone ad ogni VM interfacce hw simulate funzionalmente simili (ma non identiche) a quelle della sottostante macchina fisica

–  Non viene emulato l’hw, ma viene creato uno strato minimale di sw (Virtual Hardware API) per assicurare la gestione delle singole istanze di VM ed il loro isolamento

–  Esempi: Xen, Oracle VM (basato su Xen), PikeOS

Confronto qualitativo delle diverse soluzioni per VM en.wikipedia.org/wiki/Comparison_of_platform_virtual_machines

24

Virtualizzazione completa •  Vantaggi

–  Non occorre modificare il SO guest –  Isolamento completo tra le istanze di VM: sicurezza, facilità di

emulare diverse architetture

•  Svantaggi –  VMM più complesso –  Collaborazione del processore per implementazione efficace

•  Perché? •  L’architettura del processore opera secondo almeno 2

livelli (ring) di protezione: supervisor e user –  Solo il VMM opera in supervisor mode, mentre il SO guest e

le applicazioni (quindi la VM) operano in user mode •  Problema del ring deprivileging: il SO guest opera in uno stato

che non gli è proprio ! non può eseguire istruzioni privilegiate •  Problema del ring compression: poiché applicazioni e SO guest

eseguono allo stesso livello, occorre proteggere lo spazio del SO Valeria Cardellini - SDCC 2015/16

25

Virtualizzazione completa (2) •  Possibile soluzione per risolvere il ring deprivileging

–  Trap-and-emulate: se il SO guest tenta di eseguire un’istruzione privilegiata (ad es. lidt in x86), il processore notifica un’eccezione (trap) al VMM e gli trasferisce il controllo; il VMM controlla la correttezza dell’operazione richiesta e ne emula il comportamento

–  Le istruzioni non privilegiate eseguite dal SO guest sono invece eseguite direttamente


Architettura x86 senza virtualizzazione Architettura x86 con virtualizzazione completa e supporto hw per la virtualizzazione

26

Hardware-assisted CPU virtualization •  Hardware-assisted CPU virtualization (Intel VT-x and

AMD-V) provides two new forms of CPU operating modes called root mode and guest mode, each supporting all four x86 protection rings


-  VMM runs in root mode, while all guest OSes run in guest mode in their original privilege levels: this eliminates ring aliasing and ring compression problems

-  VMM can control guest execution through control bits of hardware defined structures

27

Fast binary translation •  Ma il meccanismo di trap al VMM per ogni istruzione

privilegiata è offerto solo dai processori con supporto hardware per la virtualizzazione (Intel VT-x e AMD-V) –  IA-32 non lo è: come realizzare la virtualizzazione completa in

mancanza del supporto hw? •  Fast binary translation: il VMM scansiona il codice prima

della sua esecuzione per sostituire blocchi contenenti istruzioni privilegiate con blocchi funzionalmente equivalenti e contenenti istruzioni per la notifica di eccezioni al VMM


Architettura x86 con virtualizzazione completa e binary translation

-  I blocchi tradotti sono eseguiti direttamente sull’hw e conservati in una cache per eventuali riusi futuri

-  Maggiore complessità del VMM e minori prestazioni

28

Paravirtualization


•  Non-transparent virtualization solution: needs to modify the guest OS kernel

•  Nonvirtualizable OS instructions are replaced by hypercalls that communicate directly with the hypervisor -  A hypercall is to a hypervisor what a syscall is to a kernel

X86 architecture with paravirtualization 29

Paravirtualization (2)


•  Pros (vs full virtualization): –  overhead reduction –  relatively easier and more practical implementation: the

VMM simply transfers the execution of performance-critical operations (hard to virtualize) to the host

•  Cons: –  Requires the source code of OSes to be available

•  OSes that cannot be ported (e.g., Windows) can still take advantage of virtualization by using ad hoc device drivers that remap the execution of critical instructions to the virtual API exposed by the VMM

–  Cost of maintaining paravirtualized Oses

30

Paravirtualization: hypercall execution


•  The hypervisor (not the kernel) has interrupt handlers installed • When a VM application issues a guest OS system call, execution

jumps to the hypervisor to handle, which then passes control back to the guest OS

Courtesy of “The Definitive Guide to XEN hypervisor” by D. Chisnall

31

Memory virtualization •  In a non-virtualized environment

–  One-level mapping: from virtual memory to physical memory provided by page tables

–  MMU and TLB hardware components to optimize virtual memory performance

•  In a virtualized environment –  All VMs share the same machine memory and VMM partitions

memory among VMs –  Two-level mapping: from virtual memory to physical memory

and from physical memory to machine memory

•  Terminology –  Host physical memory: memory visible to VMM –  Guest physical memory: memory visible to guest OS –  Guest virtual memory: memory visible to applications;

continuous virtual address space presented by guest OS to applications


32

Two-level memory mapping


Going from guest virtual memory to host physical memory requires a two-level memory mapping:

Guest VA (virtual address) -> guest PA (physical address) -> host MA (machine address)

33

Shadow page table •  To avoid an unbearable performance drop due to the

extra memory mapping, VMM maintains shadow page tables (SPTs) –  Direct guest virtual-to-host physical address mapping

•  SPT maps guest virtual address to host physical address –  Guest OS maintains its own virtual memory page table (PT) in

the guest physical memory frames –  For each guest physical memory frame, VMM should map it to

host physical memory frame –  SPT maintains the mapping from guest virtual address to host

physical address –  VMM needs to keep the SPTs consistent with changes made

by the guest OS to its PT


34

Challenges in memory virtualization with SPT


•  Address translation –  Guest OS expects contiguous,

zero-based physical memory: VMM must preserve this illusion

•  Page table shadowing –  SPT implementation is complex –  VMM intercepts paging

operations and constructs copy of PTs

•  Overheads –  VM exits add to execution time –  SPTs consume significant host

memory –  SPTs need to be kept

synchronized with guest PTs

35

Hw support for memory virtualization • SPT is a software-managed solution; let us consider

an hardware solution –  Second Level Address Translation (SLAT) is the hardware-

assisted solution for memory virtualization (Intel EPT and AMD RVI) to translate the guest virtual address into the machine’s physical address


–  Using SLAT significant performance gain with respect to SPT: around 50% for MMU intensive benchmarks

36

Performance comparison of hypervisors

•  Developments in virtualization techniques and CPU architectures have reduced the performance cost of virtualization but overheads still exist –  Especially when multiple VMs compete for hw resources

•  We consider some recent performance comparison studies –  See the course site for full papers

•  Shared results of the studies –  No one size fits all solution exists –  Different hypervisor show different performance

characteristics for varying workloads


37

Performance comparison of hypervisors (2) A component-based performance comparison of four

hypervisors (IM 2013) –  Microsoft Hyper-V, KVM, VMware vSphere and Xen, all with

hardware-assisted virtualization settings –  Analyzed components: CPU, memory, disk I/O and network I/O

•  Overall results –  When considering hardware-assisted virtualization,

performance can vary between 3% and 140% depending on the type of hw resource, but no single hypervisor always outperforms the others

•  vSphere performs the best, but the other 3 perform respectably •  CPU and memory: lowest levels of overhead •  I/O and network: Xen overhead for small disk operations

•  Hint: consider the type of applications because different hypervisors may be best suited for different workloads


38

Performance comparison of hypervisors (3)

Performance overhead among three hypervisors: an experimental study using Hadoop benchmarks (BigData 2013) •  Experimental measurements of several benchmarks

using Hadoop MapReduce to evaluate and compare the performance impact of three popular hypervisors: a commercial one (not disclosed), Xen, and KVM

•  For CPU-bound benchmarks, negligible performance difference between the three hypervisors

•  Significant performance variations were seen for I/O-bound benchmarks –  Commercial hypervisor better at disk writing, while KVM

better for disk reading –  Xen better when there was a combination of disk reading and

writing with CPU intensive computations Valeria Cardellini - SDCC 2015/16

39

Virtualizzazione a livello di SO •  Finora considerata virtualizzazione a livello di sistema •  Virtualizzazione a livello di SO (o container-based

virtualization): permette di eseguire molteplici ambienti di esecuzione tra di loro isolati all’interno di un singolo SO –  Detti container, jail, virtual execution environment (VE), virtual private

server (VPS)


–  Evoluzione del comando chroot dei sistemi Unix-like

–  Ogni container ha un proprio insieme di processi, file system, utenti, interfacce di rete con indirizzi IP, tabelle di routing, regole del firewall, …

40

–  I VE condividono il kernel dello stesso SO (e.g., Linux)

Virtualizzazione a livello di SO (2)


Layer di virtualizzazione in OpenVZ

41

Virtualizzazione a livello di SO (3) •  Vantaggi rispetto a virtualizzazione basata su VMM

–  Degrado di prestazioni pressoché nullo •  Le applicazioni invocano direttamente le chiamate di sistema, non c’è

bisogno di passare per il VMM –  Tempi minimi di startup/shutdown –  Densità elevata

•  Centinaia di istanze su una singola macchina fisica, ad es. con Solaris Containers fino a 8191

–  Immagine (footprint) di dimensioni minori •  Non comprende il SO

–  Maggiore facilità di condividere applicazioni cloud •  L’applicazione nel container è indipendente dall’ambiente di esecuzione

•  Svantaggi –  Minore flessibilità

•  Non si possono eseguire contemporaneamente kernel di differenti SO sulla stessa macchina fisica

–  Minore isolamento Valeria Cardellini - SDCC 2015/16

42

Virtualizzazione a livello di SO: esempi •  Alcuni prodotti per la virtualizzazione a livello

di SO: –  FreeBSD Jail –  LXC (LinuX Containers) –  OpenVZ (per Linux) –  IBM LPAR –  Solaris Zones/Containers –  Virtuozzo –  LXD

•  Built on top of LXC –  Docker

•  Built on top of LXC, we’ll analyze it later


43

What’s next for virtualized execution environments

•  We now examine two useful techniques, which can be applied to deploy and manage virtual clusters and virtual data centers –  Live migration of VMs –  Dynamic resizing of VMs


44

Migrazione

•  Migrazione di codice –  In sistemi omogenei –  In sistemi eterogenei

•  Migrazione di macchine virtuali –  Utile in cluster e data center virtuali per:

•  Consolidare l’infrastruttura •  Avere flessibilità nel failover •  Bilanciare il carico

–  Ma l’overhead di migrazione di VM non è trascurabile

–  La migrazione di VM in ambito WAN non è banale


45


Migrazione del codice •  Nei SD la comunicazione può non essere limitata al

passaggio dei dati, ma riguardare anche il passaggio di programmi, anche durante la loro esecuzione

•  Motivazioni per la migrazione del codice –  Bilanciare o condividere il carico di lavoro –  Risparmiare risorse di rete e ridurre il tempo di risposta

processando i dati vicino a dove risiedono –  Sfruttare il parallelismo –  Configurare dinamicamente il SD

46


Modelli per la migrazione del codice •  Processo composto da tre segmenti

–  Segmento del codice •  Le istruzioni del programma in esecuzione

–  Segmento delle risorse •  I riferimenti alle risorse esterne di cui il processo ha bisogno

–  Segmento dell’esecuzione •  Lo stato del processo (stack, PC, dati privati)

•  Alcune alternative per la migrazione –  Mobilità leggera o mobilità forte

•  Leggera: trasferito solo il segmento del codice •  Forte: trasferito anche il segmento dell’esecuzione

–  Migrazione iniziata dal mittente o avviata dal destinatario –  Nuovo processo per eseguire il codice migrato o clonazione

•  Esempi: migrazione di processi in MOSIX •  Argomento approfondito nel corso di Mobile Systems

and Applications

47


Migrazione del codice nei sistemi eterogenei

•  In ambienti eterogenei, la macchina di destinazione può non essere in grado di eseguire il codice –  La definizione del contesto di processo, thread e processore

è fortemente dipendente dall’hw, dal SO e dal sistema runtime

•  Quale soluzione per migrare in ambienti eterogenei? Migrare la macchina virtuale

48

Migrazione di VM •  Approcci per migrare istanze di macchine virtuali tra

macchine fisiche: –  Stop and copy: si spegne la VM sorgente e si trasferisce

l’immagine della VM sull’host di destinazione, ma il downtime può essere troppo lungo

•  L’immagine della VM può essere grande e la banda di rete limitata –  Live migration: la VM sorgente è in funzione durante la

migrazione


49

Migrazione live di VM •  Prima di avviare la migrazione live

–  Fase di setup: si identifica l’host di destinazione (ad es. tramite strategie di load balancing, energy efficiency, server consolidation)

•  Occorre migrare memoria, storage e connessioni di rete in modo trasparente alle applicazioni in esecuzione sulla VM


50

Migrazione live di VM (2) •  Per migrare lo storage:

–  Uso di un sistema di storage di rete (SAN o più economico NAS (Network Attached Server)) oppure di un file system distribuito

–  In alternativa, in assenza di storage condiviso: il VMM che gestisce la VM sorgente salva tutti i dati della VM sorgente in un file di immagine, che viene trasferito sull’host di destinazione

•  Per migrare le connessioni di rete: –  La VM sorgente ha un indirizzo IP virtuale (eventualmente

anche un indirizzo MAC virtuale) •  L’hypervisor conosce il mapping

–  Se sorgente e destinazione sono sulla stessa sottorete IP, non occorre fare forwarding sulla sorgente

•  Invio di risposta ARP non richiesta da parte della destinazione per avvisare che l’indirizzo IP è stato spostato in una nuova locazione ed aggiornare quindi le tabelle ARP Va

leria

Car

delli

ni -

SD

CC

201

5/16

51

Migrazione live di VM (3) •  Per migrare la memoria:

1.  Fase di warm-up: il VMM copia in modo iterativo le pagine da VM sorgente a VM di destinazione mentre la VM sorgente è in esecuzione (pre-copy migration)

2.  Fase di stop-and-copy: la VM sorgente viene fermata e vengono copiate soltanto le pagine dirty •  Tempo di downtime: da qualche msec a qualche sec, in funzione

della dimensione della memoria, delle applicazioni e della banda di rete tra le VM

3.  Fasi di commitment e reactivation: la VM di destinazione carica lo stato e riprende l’esecuzione dei programmi; la VM sorgente viene rimossa (ed eventualmente spento l’host sorgente)

•  Tale approccio è detto pre-copy –  Lo stato della VM è copiato dalla sorgente alla destinazione

prima che l’esecuzione della VM riprenda a destinazione –  E’ la soluzione standard (ad es. in KVM)


52

VM live migration: overall process


53 Source: C. Clark et al., “Live Migration of Virtual Machines”, NSDI’05.

VM live migration: alternatives for memory •  Pre-copy cannot migrate in a transparent manner

workloads that are CPU and/or memory intensive 1. Alternative approach: post-copy

–  Post-copy moves the execution to the destination host at the beginning of the migration process and then transfers the memory pages in an on-demand manner as they are requested by the VM

2. Alternative approach: hybrid –  A special case of post-copy migration: post-copy preceded

by a limited pre-copy stage –  Idea: a subset of the most frequently accessed memory

pages is transferred before the VM execution is switched to the destination, so to reduce performance degradation after the VM is resumed

•  No standard implementation of post-copy and hybrid approaches in current hypervisors


54

Approaches for migrating memory


55

Courtesy of Cima Vojtech blog.zhaw.ch/icclab/performance-analysis-of-post-copy-live-migration-in-openstack/

Migrazione live di VM (4)

•  Vantaggi rispetto alla migrazione del codice –  Trasferimento consistente ed efficiente dello stato della VM –  La macchina fisica sorgente può essere spenta (server

consolidation)

•  Esempi: –  Supportata da alcune soluzioni di virtualizzazione già

considerate (KVM, Hyper-V, Xen, VirtualBox, OpenVZ, Virtuozzo)

–  Come prodotto extra: VMware vMotion

•  Supporto molto limitato della migrazione in reti WAN da parte dei prodotti commerciali


56

VM migration in WAN environments

•  So far we focused on VM migration within a single data center

•  How to achieve live migration of VMs across multiple geo-distributed data centers?


57

VM migration in WAN environments: storage •  Some approaches to migrate storage in a WAN

–  Shared storage •  Cons: storage access time can be too slow

–  On-demand fetching •  Transfer only some blocks on the destination and then fetch

remaining blocks from the source only when requested •  Cons: it does not work if the source crashes

–  Pre-copy/write throttling •  Pre-copy the disk image of the VM to the destination whilst

the VM continues to run, keep track of write operations on the source (delta) and then apply the delta on the destination

•  If the write rate at the source is too fast, use write throttling to slow down the VM so that migration can proceed


58

VM migration in WAN environments: network •  Some approaches to migrate network connections in

a WAN –  IP tunneling

•  Set up an IP tunnel between the old IP address at the source and the new VM IP address at the destination

•  Use the tunnel to forward all packets that arrive at the source for the old IP address

•  Once the migration has completed and the VM can respond at its new location, update the DNS entry with the new IP address

•  Tear down the tunnel when no connections remain that use the old IP address

•  Cons: it does not work if the source crashes –  Virtual Private Network (VPN)

•  Use MPLS-based VPN to create the abstraction of a private network and address space shared by multiple data centers

–  Software-Defined Networking •  Change the control plane, no need to change IP address!


59

Software-Defined Networking •  “Software-Defined Networking (SDN) is an emerging

network architecture where network control is decoupled from forwarding and is directly programmable” (www.opennetworking.org)

•  Characterized by four distinguished features


Reference: R. Jain and S.Paul, “Network Virtualization and Software Defined Networking for Cloud Computing: A Survey”, IEEE Comm., 2013.

–  Decoupling the control

plane from the data plane –  Centralization of the

control plane –  Programmability of the

control plane –  API standardization

•  OpenFlow is the first standard interface designed for SDN

60

Migration in container-based virtualization

•  Up to now we focused on live migration of virtual machines

•  What about live migration of containers? •  Still in its infancy

–  Most efforts in LXD and Virtuozzo


61

VM dynamic resizing •  Fine-grain mechanism with respect to VM migration or

VM allocation/deallocation –  Example: application running on a VM starts consuming a lot

of resources and the VM starts running out of RAM and CPU

resize the VM

•  Pros: more cost-effective and rapid than VM allocation/deallocation

•  Cons: not supported by all virtualization products and guest OSs

•  What can be resized without powering off and rebooting the VM? –  Number of CPUs –  Memory size


62

CPU resizing

•  To add or remove CPUs (without switching off the machine)

•  In Linux-based systems support for CPU hot-plug/hot-unplug (e.g., KVM hypervisor) –  Uses information in virtual file system sysfs (processor info in

/sys/devices/system/cpu) –  /sys/devices/system/cpu/cpuX for cpuX (X=0, 1, 2, …) –  To turn on cpuX: echo 1 > /sys/devices/system/cpu/cpuX/online –  To turn off cupX: echo 0 > /sys/devices/system/cpu/cpuX/online


63

Memory resizing •  Based on memory ballooning technique

–  In KVM: virtio_balloon driver


•  When balloon inflates –  swap area

–  out-of-memory (OOM) killer

•  When balloon deflates: more memory for the VM – the memory size cannot

exceed maxMemory

64


Case study: Xen •  Xen is the most notable example of paravirtualization

www.xenproject.org (developed at University of Cambridge) –  The hypervisor offers to the guest OS a virtual interface (hypercall

API) to whom the guest OS must refer to access the machine physical resources

–  Xen requires to the guest OS kernel support and drivers (now part of the Linux kernel as well as other operating systems)

–  Xen supports also hardware-assisted virtualization (called HVM) so that unmodified guest OSes (e.g., Windows) can be used

–  Xen is the foundation for many products and platform (e.g., Oracle VM and Qubes) and powers some of the largest IaaS providers (e.g., Amazon, GoGrid, Rackspace)

•  Pros –  Thin hypervisor model; open source; flexibility in management –  Minimal overhead (within 2.5%) with respect to the bare metal

machine without virtualization –  Can migrate VMs between different servers

65

Vale

ria C

arde

llini

- S

DC

C 2

015/

16

Case study: Xen (2) •  Goal of the Cambridge group (late 1990s): to design a VMM

capable of scaling to about 100 VMs running applications and services without any modifications to ABI

•  Linux, NetBSD, FreeBSD can operate as paravirtualized Xen guest OS running on x86, x86-64, Itanium, and ARM architectures

•  Xen domain: represents VM instance; ensemble of address spaces hosting a guest OS and applications running under the guest OS; runs on a virtual CPU -  Dom0 (or control domain): special domain devoted to execution of

Xen control functions and privileged instructions; contains the drivers for all the devices in the system

-  DomU (or unprivileged domain): user domain

•  Applications make system calls using hypercalls processed by Xen; privileged instructions issued by a guest OS are paravirtualized and must be validated by Xen

66

Xen for the x86 architecture

wiki.xen.org/wiki/Xen_Overview Valeria Cardellini - SDCC 2015/16

X86 hardware

Domain0 control interface

Virtual x86 CPU

Virtual physical memory Virtual network Virtual block

devices

Xen

Management OS

Xen-aware device drivers

Application Application Application

Guest OS


Guest OS



Guest OS


Dom0 DomU

67

Xen architecture and guest OS management

•  Xen hypervisor runs in the most privileged mode and controls the access of guest OS to underlying hw –  Domains are run in ring 1 –  Applications in ring 3


68

Dom0 components •  XenStore: a Dom0 process that provides an information

storage space shared between domains –  Supports a system-wide registry and naming service –  Implemented as a hierarchical key-value storage –  A watch function informs listeners of changes of the key in storage

they have subscribed to –  Communicates with guest VMs via shared memory using Dom0

privileges

•  Toolstack: responsible for creating, destroying, and managing the resources and privileges of VMs –  To create a new VM, a user provides a configuration file describing

memory and CPU allocations and device configurations –  Toolstack parses this file and writes this information in XenStore –  Takes advantage of Dom0 privileges to map guest memory, to load a

kernel and virtual BIOS and to set up initial communication channels with XenStore and with the virtual console when a new VM is created


69

CPU schedulers in Xen •  Xen allows to choose among different CPU schedulers

to schedule domains (virtual CPUs or VCPUs) to run on physical CPUs –  A further scheduling level with respect to those provided by

OSes (scheduling of processes and scheduling of user-level threads within processes)

•  Scheduling algorithm goals: –  To make sure that domains get "fair” share of CPU

•  Proportional share algorithm: allocates CPU in proportion to the number of shares (weights) assigned to VCPUs

–  To keep the CPU busy •  Work-conserving algorithm: does not allow the CPU to be idle when

there is work to be done –  To schedule with low latency


Riferimento: L. Cherkasova, D. Gupta, A. Vahdat, “Comparison of the three CPU schedulers in Xen”, ACM SIGMETRICS Performance Evaluation Review, 2007.

70

CPU schedulers in Xen (2) •  We’ll examine three algorithms: BTV, SEDF, Credit

scheduler –  Credit scheduler: current default and recommended choice

•  Borrowed Virtual Time (BTV) –  Proportional share, work-conserving scheduler based on the

concept of virtual time –  The scheduler selects the domain (Dom for short) with the

smallest virtual time first; each domain receives a share of CPU in proportion to its weight

•  Setting of different domain weights, for example –  Dom1: weight 1 (20%) –  Dom2: weight 3 (80%)

–  Cons: lack of non-work-conserving mode


71

CPU schedulers in Xen (3) •  Simple Earliest Deadline First (SEDF)

–  Variant of a well-known real-time scheduling algorithm –  The scheduler selects the VCPU with the closest deadline (i.e., the

lastest time the VCPU can be run to meet its deadline) –  Each domain Domi specifies its CPU requirements with tuple (si, pi,

xi), where slice si and period pi together represent the CPU share that Domi requests and are used to calculate the deadline

•  Example: si=10 ms and pi=100 ms; this VCPU can be scheduled as late as 90 ms into the 100 ms period and still meet its deadline

•  Domi will receive at least si units of time in each period pi; xi is a boolean indicating whether Domi is eligible to receive extra CPU time other than its slice if the CPU is idle (work-conserving mode)

–  Cons: no global load balancing among multiple CPUs because SEDF maintains per-CPU queues and schedules the queues on individual CPUs

•  CPU1: Dom1 scheduled and 80% of CPU usage •  CPU2: Dom2 scheduled and 80% of CPU usage •  Dom3 with 30% of CPU usage cannot be scheduled because each CPU has only

20% of available CPU share

Vale

ria C

arde

llini

- S

DC

C 2

015/

16

72

CPU schedulers in Xen (4) •  Credit scheduler

–  Proportional fair share and work-conserving scheduler –  Each domain is assigned a weight and optionally a cap

(tunable parameters) •  Weight: relative CPU allocation per domain •  Cap: absolute limit on the amount of CPU time a domain can

use. If cap is 0, then VCPU can receive any extra CPU; non-zero cap limits the amount of CPU a VCPU receives

•  The scheduler transforms the weight into a credit allocation for each VCPU; as a VCPU runs, it consumes credits

–  For each CPU, the scheduler maintains a queue of VCPUs, with all the under-credit VCPUs first, followed by the over-credit VCPUs; the scheduler picks the first VCPU in the queue

–  Automatically load balances VCPUs across physical CPUs on SMP host

•  Before a CPU goes idle, it will consider other CPUs in order to find any runnable VCPU; this approach guarantees that no CPU idles when there is runnable work in the system

Valeria Cardellini - SDCC 2015/16 wiki.xen.org/wiki/Credit_Scheduler

73

Case study: Docker •  Lighweight, open and secure container-based

virtualization –  Containers include the application and all of its

dependencies, but share the kernel with other containers –  Containers run as an isolated process in userspace on the

host operating system –  Containers are also not tied to any specific infrastructure


74

Case study: Docker (2)

•  Docker is built in Go language •  Docker is a higher-level platform built on LXC •  LXC uses cgroups (control groups) Linux kernel

feature and namespaces •  cgropus allows resource management for groups of control:

resource limitation, prioritization, accounting and control •  namespaces provides a private, restricted view towards

certain system resources within a container (i.e., a form of sandboxing)


75

Case study: Docker (3)

•  Snapshot of the OS and apps into a Docker image and then image deployment on other Docker hosts –  Target machine must be Docker-enabled –  Enables the distribution of applications with their runtime

environment: a Docker image incorporates all the dependencies and configuration necessary for it to run, eliminating the need to install packages and troubleshoot dependencies

•  Dockers adds to LXC –  Portable deployment across machines –  Versioning, i.e. git-like capabilities –  Component reuse –  Shared libraries, e.g., https://hub.docker.com


76

Docker: performance comparison

Hypervisors vs. lightweight virtualization: a performance comparison (IC2E 2015 Workshops) •  Performance comparison: KVM (hypervisor) vs. LXC,

Docker and OSV (lightweight virtualization solutions) –  Docker runs over LXC –  OSV runs over KVM

•  What is OSV? –  An open-source dedicated operating system designed

exclusively for the Cloud –  In contrast to container-based virtualization solutions, OSV is

intended to be run on top of a hypervisor (e.g., KVM, Xen) –  It achieves the isolation benefits of hypervisor-based systems,

but avoids the overhead of the guest OS OSv - Optimizing the Operating System for Virtual Machines


77

Docker: performance comparison (2)

•  Performance comparison of CPU, memory, disk I/O, and network

•  KVM performance has dramatically improved during the last few years –  Disk I/O efficiency can represent still a bottleneck for some

types of applications

•  Level of overhead introduced by containers can be considered almost negligible –  But versatility and ease of management is paid in terms of

security

•  Network efficiency represent an important open issue for all solutions (especially UDP traffic)


78


Case study: PlanetLab •  PlanetLab: sistema distribuito collaborativo

caratterizzato da virtualizzazione distribuita www.planet-lab.org –  Ampio insieme di macchine sparse su Internet

•  Più di 1300 nodi in oltre 700 siti –  Usato come testbed per sperimentare sistemi ed

applicazioni distribuite su scala planetaria in un ambiente reale

–  Alcuni esempi di sistemi ed applicazioni testati su PlanetLab •  Distributed Hash Table: Chord, Tapestry, Pastry, Bamboo •  Content Distribution Network: CoDeeN, CoBlitz (ora Akamai

Aura) •  Virtualizzazione ed isolamento (Denali) •  Multicast di livello applicativo (Scribe) •  Misure di rete (Scriptroute, I3)

79

Riferimento: L. Peterson, T. Roscoe, “The design principles of PlanetLab”, Operating Systems Review, 40(1):11-16, Jan. 2006.


Case study: PlanetLab (2) •  Organizzazione di un nodo PlanetLab

–  2002-2012: versione modificata di Linux per eseguire sullo stesso nodo molteplici vserver tra loro isolati

–  Da fine 2012 migrazione a LXC: isolamento tra container

•  Un’applicazione in PlanetLab viene eseguita in una slice della piattaforma -  Slice: insieme di vserver/container in esecuzione su nodi

diversi, su ciascuno dei quali l’applicazione riceve una frazione di risorse del nodo; una sorta di cluster virtuale

•  Virtualizzazione distribuita: insieme distribuito di VM che sono gestite dal sistema come un’entità singola -  Programmi appartenenti a slice diverse, ma in esecuzione

sullo stesso nodo, non interferiscono gli uni con gli altri -  Si possono eseguire simultaneamente molteplici

esperimenti

80

Storage virtualization •  Decouple the physical organization of the storage from its

logical representation –  “Storage virtualization means that applications can use storage

without any concern for where it resides, what the technical interface is, how it has been implemented, which platform it uses, and how much of it is available” (R. van der Lans)

•  Two primary types of storage virtualization –  Block level

•  The virtualized storage system aggregates multiple network storage devices into a single block-level substrate and then presents to users a logical space for data storage and handles the process of mapping it to the actual physical location

–  File level •  Eliminates the dependencies between the data accessed at the

file level and the location where the files are physically stored


81

Storage virtualization: SAN •  The most common solution for implementing block-level

storage virtualization is through a Storage Area Networks (SAN) –  SAN uses a network-accessible device through a large

bandwidth connection to provide storage facilities

Vale

ria C

arde

llini

- S

DC

C 2

015/

16

•  Fiber Channel (FC): high-speed network technology primarily used to connect storage –  Requires special-purpose

cabling –  For high performance

requirements •  Internet SCSI (iSCSI): IP-based

protocol for linking data storage facilities –  Uses existing network

infrastructures –  For moderate performance

requirements 82

Network virtualization •  A process of abstraction which separates logical

network behavior from the underlying physical network resources –  Virtualizable network resources: Network Interface Card

(NIC), L2 switch, L2 network, L3 router, L3 network

•  A method of combining the available resources in a network by splitting up the available bandwidth into channels –  Each channel is independent and can be assigned (or

reassigned) to a server or device in real-time

•  The idea is that virtualization disguises the true complexity of the network by separating it into manageable parts


83

Network virtualization (2)

•  At the physical machine level (also internal network virtualization): using VMM features –  Goal: to create a “network in the box” –  Various options, including Network Address Translation

(NAT)

•  Among multiple systems (also external network virtualization): using virtual LAN (VLAN) technology and “intelligent” (layer 3) switches –  VLAN: a group of hosts with a common set of requirements

that communicate as if they were attached to the same broadcast domain, regardless of their physical location


84

Virtual cluster •  Virtual cluster nodes: either physical or virtual machines

–  The VMs in a virtual cluster are interconnected logically by a virtual network across several physical networks

•  VMs can be replicated in multiple servers to promote distributed parallelism, fault tolerance, and disaster recovery

•  Size (number of nodes) of a virtual cluster: can grow or shrink dynamically

•  Failure of a physical node may disable some VMs installed on that node; but failure of VMs will not pull down the host system

•  Issue to address: to efficiently store the large number of VM images –  VM images by hypervisors are large (typically 1-30 GB in size) –  Contained-based virtualization helps in reducing the image size


85

Cloud platform with virtual clusters


86

Virtual cluster based on application partitioning


87

Desktop virtualization

•  Desktop virtualization abstracts the desktop environment available on a user device (PC/laptop, tablet, smartphone) in order to provide access to it using a client/server model –  The desktop environment is stored in the Cloud that provides

a highly-available infrastructure and ensures the accessibility and persistence of data

•  Pros: high availability, persistence, accessibility (same desktop environment accessible from everywhere) and ease of management

•  Cons: security issues •  Examples: Citrix XenDesktop, Microsoft Desktop

Virtualization, Oracle Virtual Desktop Infrastructure, Amazon WorkSpaces


88

Software Defined Systems •  The new IT buzzword: Software Defined Systems (SDS)

–  Systems that have added software components which help abstract their actual IT equipment and other layers

–  Such separation provides a great opportunity for system administrators to more easily construct and managing their systems through flexible software layers

–  VMM is one classical example

•  Software Defined Systems include: –  Software Defined Networking (SDN) –  Software Defined Storage –  Software Defined Servers (i.e., virtualization of the execution

environment) –  Software Defined Data Centers (SDDC)

•  Extending virtualization to all resources of a data center –  Software Defined Clouds (SDCloud) –  Software Defined Security (SDSec)


89

Documents

Virtualizzazione - University of Rome "Tor Vergata" Università degli Studi di Roma “Tor Vergata” Dipartimento di Ingegneria Civile e Ingegneria Informatica Corso di Sistemi Distribuiti