Ganeti, the New and Arcane
Ganeti's best kept secrets, and exciting new developments

Ganeti Eng Team - Google
LinuxCon Japan 2014 - 2 Feb 2014
Introduction to Ganeti
A cluster virtualization manager, in one slide

What is Ganeti?

· Manage clusters of 1-200 physical machines, divided into nodegroups
· Deploy Xen/KVM/LXC virtual machines on them
· Controlled via command line, REST, web interfaces
· Live migration
· Resiliency to failure (DRBD, Ceph, SAN/NAS, ...)
· Cluster balancing
· Ease of repairs and hardware swaps
Newest features
Development status
2.10
The very stable release

· Improved upgrade procedure: "gnt-cluster upgrade"
· CPU load in hail/hbal (GSoC project)
· Hotplug support (KVM)
· RBD storage direct access (KVM)
· Better Open vSwitch support (GSoC project)
2.11
The latest stable release

· Faster instance moves
· GlusterFS support
· hsqueeze (achieve maximum cluster compaction)
2.12 and future
The next stable release(s)

· Jobs as processes
· New install model
· More secure master candidates
· Better container support (GSoC)
· Resource reservation / extra parallelization
· Generic conversion between disk templates (GSoC)
Monitoring daemon
What's going on in your cluster?
Monitoring a cluster
The old school way

[Diagram: an external monitoring system polls the cluster components (master, nodes, instances, storage, NICs) directly and feeds other systems]
Monitoring a cluster
Using the monitoring daemon

[Diagram: the monitoring system talks to per-node monitoring daemons, which report on the cluster, and feeds other systems]
What is the monitoring daemon?

Provides information:
· about the cluster state/health
· live
· read-only

design doc: design-monitoring-agent.rst
More details

· HTTP daemon
  - replying to REST-like queries (actually, GET only)
  - providing JSON replies
  - easy to parse in any language
  - already used in all the rest of Ganeti
· Running on every node (not only on master candidates or VM-enabled nodes)
· Additionally: mon-collector, a quick 'n dirty CLI tool
Data collectors

· provide data to the daemon
· one collector, one report
· one collector, one category:
  - storage, hypervisor, daemon, instance
· two kinds: performance reporting, status reporting
· new feature: stateful data collectors
Data collectors
What data can be retrieved right now?

Now:
· instance status (Xen only) (category: instance)
· diskstats information (storage)
· LVM logical volumes information (storage)
· DRBD status information (storage)
· node OS CPU load average (no category, default)

Soon(-ish):
· instance status for KVM (instance)
· Ganeti daemons status (daemon)
· hypervisor resources (hypervisor)
· node OS resources report (default)
The report format

{
  "name" : "TheCollectorIdentifier",
  "version" : "1.2",
  "format_version" : 1,
  "timestamp" : 1351607182000000000,
  "category" : null,
  "kind" : 0,
  "data" : { "plugin_specific_data" : "go_here" }
}

· name: the name of the plugin. Unique string.
· version: the version of the plugin. A string.
· format_version: the version of the data format of the plugin. Incremental integer.
· timestamp: when the report was produced. Nanoseconds. Can be zero-padded.
Status reporting collectors: report

They introduce a mandatory part inside the data section.

"data" : {
  ...
  "status" : {
    "code" : <value>,
    "message" : "some summary goes here"
  }
}

<value>, by increasing criticality level:
· 0: working as intended
· 1: temporarily wrong. Being auto-repaired
· 2: unknown. Potentially dangerous state
· 4: problems. External intervention required
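The status codes above lend themselves to mechanical checking. A minimal sketch (the helper name is made up; the codes and the report layout are the ones shown on this slide) that parses a status report and maps its code to a description:

```python
import json

# Criticality levels used by status-reporting collectors (from the slide)
STATUS_LEVELS = {
    0: "working as intended",
    1: "temporarily wrong, being auto-repaired",
    2: "unknown, potentially dangerous state",
    4: "problems, external intervention required",
}

def check_status(report_json):
    """Return (code, description) for a status-reporting collector's report."""
    report = json.loads(report_json)
    status = report["data"]["status"]
    code = status["code"]
    return code, STATUS_LEVELS.get(code, "unrecognized status code")

example = json.dumps({
    "name": "drbd",
    "data": {"status": {"code": 1, "message": "resync in progress"}},
})
print(check_status(example))  # (1, 'temporarily wrong, being auto-repaired')
```

A monitoring pipeline would typically alert on any code above 1, since 2 and 4 both require attention.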
How to use the daemon?

· Accepts HTTP connections on node.example.com:1815
· GET requests to specific addresses
  - each address returns different info according to the API
· Not authenticated: read only
  - just firewall it, or bind to a local address only

Endpoints:
· / (returns the list of supported protocol versions)
· /1/list/collectors
· /1/report/all
· /1/report/[category]/[collector_name]
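As a sketch of the URL scheme (the node name is a placeholder and the helper is invented; the endpoint paths and port are the ones listed above), a client would build its GET requests like this:

```python
BASE = "http://node.example.com:1815"  # hypothetical node; port 1815 from the slide

def report_url(category=None, collector=None):
    """Build a monitoring-daemon report URL from the endpoints above."""
    if collector is None:
        return BASE + "/1/report/all"
    # Collectors without a category (e.g. CPU load) fall under "default"
    return "%s/1/report/%s/%s" % (BASE, category or "default", collector)

print(report_url())                   # http://node.example.com:1815/1/report/all
print(report_url("storage", "drbd"))  # http://node.example.com:1815/1/report/storage/drbd
```

Since the daemon is read-only and unauthenticated, fetching one of these URLs with any HTTP client returns the JSON report directly.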
Configuration Daemon (confd)
What is your cluster supposed to look like?
Before confd

· Configuration only available on master candidates
· Few selected values replicated with ssconf
  - small pieces of config in text files on all the nodes
  - doesn't scale
· Need for a way to access config from other nodes
  - scalable
  - no single point of failure (so, no RAPI)
What does confd do?

· Provides information from config.data
· Read-only
· Distributed
  - multiple daemons running on master candidates
  - accessible from all the nodes through the confd protocol
  - resilient to failures
· Optional
What info does it provide?

Replies to simple queries:
· Ping
· Master IP
· Node role
· Node primary IP
· Master candidates' primary IPs
· Instance IPs
· Node primary IP from instance primary IP
· Node DRBD minors
· Node instances
confd protocol
General description

· UDP (port 1814)
· Queries made to any subset of master candidates
  - timeout
  - maximum number of expected replies
· keyed-Hash Message Authentication Code (HMAC) authentication
  - pre-shared, cluster-wide key
  - generated at cluster init
  - root-only readable
· Timestamp
  - checked (± 2.5 mins) to prevent replay attacks
  - used as HMAC salt
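The ± 2.5-minute replay window described above amounts to a simple clock comparison. A sketch (the function name is illustrative, not Ganeti's):

```python
import time

MAX_CLOCK_SKEW = 150  # seconds: the +/- 2.5 minute window from the slide

def timestamp_ok(msg_timestamp, now=None):
    """Accept a message only if its timestamp is within the replay window."""
    if now is None:
        now = time.time()
    return abs(now - msg_timestamp) <= MAX_CLOCK_SKEW

print(timestamp_ok(1000, now=1100))  # True: 100 s old, inside the window
print(timestamp_ok(1000, now=1200))  # False: 200 s old, possible replay
```

Because the timestamp also serves as the HMAC salt, an attacker cannot simply rewrite it on a captured packet: the signature would no longer verify.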
Confd protocol
Request/Reply

[Diagram: the client sends the same request to several master candidates in parallel]
Confd protocol
Request/Reply

[Diagram: replies carry the config version (v: 56, v: 57); the client collects replies until it has enough or the timeout expires]
confd protocol
Request

plj0{
  "msg": "{\"type\": 1, \"rsalt\": \"9aa6ce92-8336-11de-af38-001d093e835f\", \"protocol\": 1, \"query\": \"node1.example.com\"}\n",
  "salt": "1249637704",
  "hmac": "4a4139b2c3c5921f7e439469a0a45ad200aead0f"
}

· plj0: fourcc detailing the message content (PLain Json 0)
· hmac: HMAC signature of salt+msg with the cluster hmac key
confd protocol
Request

plj0{
  "msg": "{\"type\": 1, \"rsalt\": \"9aa6ce92-8336-11de-af38-001d093e835f\", \"protocol\": 1, \"query\": \"node1.example.com\"}\n",
  "salt": "1249637704",
  "hmac": "4a4139b2c3c5921f7e439469a0a45ad200aead0f"
}

· msg: JSON-encoded query
  - protocol: confd protocol version (=1)
  - type: what to ask for (CONFD_REQ_* constants)
  - query: additional parameters
  - rsalt: response salt == UUID identifying the request
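The envelope above can be assembled with nothing but the standard library. This is a sketch (the key and salt values are invented; real code should use the ready-made clients mentioned later), but the HMAC-over-salt+msg construction matches the format on this slide:

```python
import hmac
import json
from hashlib import sha1

def sign_request(query_msg, salt, hmac_key):
    """Wrap a JSON query into the plj0 envelope, signing salt+msg."""
    msg = json.dumps(query_msg)
    signature = hmac.new(hmac_key, (salt + msg).encode("utf-8"), sha1).hexdigest()
    envelope = {"msg": msg, "salt": salt, "hmac": signature}
    return "plj0" + json.dumps(envelope)

# Hypothetical key and salt, just to exercise the function
packet = sign_request(
    {"type": 1, "protocol": 1, "query": "node1.example.com",
     "rsalt": "9aa6ce92-8336-11de-af38-001d093e835f"},
    salt="1249637704",
    hmac_key=b"cluster-hmac-key",
)
print(packet[:4])  # plj0
```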
confd protocol
Reply

plj0{
  "msg": "{\"status\": 0, \"answer\": 0, \"serial\": 42, \"protocol\": 1}\n",
  "salt": "9aa6ce92-8336-11de-af38-001d093e835f",
  "hmac": "aaeccc0dff9328fdf7967cb600b6a80a6a9332af"
}

· salt: the rsalt of the query
· hmac: HMAC signature of salt+msg
confd protocol
Reply

plj0{
  "msg": "{\"status\": 0, \"answer\": 0, \"serial\": 42, \"protocol\": 1}\n",
  "salt": "9aa6ce92-8336-11de-af38-001d093e835f",
  "hmac": "aaeccc0dff9328fdf7967cb600b6a80a6a9332af"
}

· msg: JSON-encoded answer
  - protocol: protocol version (=1)
  - status: 0=ok; 1=error
  - answer: query-specific reply
  - serial: version of config.data
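Mirroring the request side, a client verifies a reply by recomputing the HMAC over salt+msg and checking that the reply's salt echoes the request's rsalt. A minimal sketch (key and values invented for illustration):

```python
import hmac
import json
from hashlib import sha1

def verify_reply(packet, expected_rsalt, hmac_key):
    """Check the plj0 magic, the rsalt echo, and the HMAC; return the answer."""
    if not packet.startswith("plj0"):
        return None
    envelope = json.loads(packet[4:])
    if envelope["salt"] != expected_rsalt:
        return None  # reply does not match our request
    recomputed = hmac.new(
        hmac_key, (envelope["salt"] + envelope["msg"]).encode("utf-8"), sha1
    ).hexdigest()
    if not hmac.compare_digest(recomputed, envelope["hmac"]):
        return None  # signature mismatch: discard
    return json.loads(envelope["msg"])

# Build a fake, correctly signed reply to exercise the check
key = b"cluster-hmac-key"
rsalt = "9aa6ce92-8336-11de-af38-001d093e835f"
msg = json.dumps({"status": 0, "answer": 0, "serial": 42, "protocol": 1})
sig = hmac.new(key, (rsalt + msg).encode("utf-8"), sha1).hexdigest()
reply = "plj0" + json.dumps({"msg": msg, "salt": rsalt, "hmac": sig})

print(verify_reply(reply, rsalt, key)["serial"])  # 42
```

The serial field then lets the client pick the freshest answer when several master candidates reply with different config versions.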
Ready-made clients

The protocol is simple, but clients are simpler.

Ready-to-use confd clients:
· Python
  - lib/confd/client.py
· Haskell
  - since Ganeti 2.7
  - src/Ganeti/ConfD/Client.hs
  - src/Ganeti/ConfD/ClientFunctions.hs
Expanding confd capabilities

· Currently not so many queries are supported
· Easy to add new ones:
  - just add a new query type in the constants list...
  - ...and extend the buildResponse function (src/Ganeti/Confd/Server.hs) to reply to it in the appropriate way
Ganeti and Networks
How do your instances talk to the world?

Some slides contributed by Dimitris Aragiorgis <[email protected]>
NIC configuration

· current NICs: MAC + IP + link + mode
· mode=bridged uses brctl addif
· hooks can deal with firewall rules, and more
· external systems needed for DHCP, IPv6, etc.

Management
· Which VMs are on the same collision domain?
· Which IP is free for a new VM to use?
gnt-network overview

· manage collision domains for your instances
· easy way to assign IPs to instances
  - if resources are shared by multiple clusters, allocation must be done externally
· keep existing per-NIC flexibility
· hide underlying infrastructure
· better networking overview
gnt-network: Who does what?

· masterd: config.data integrity
  - bitarray, TemporaryReservationManager, Locking
· abstract network infrastructure: network + netparams per nodegroup
· IP uniqueness inside a network: IP pool management
· encapsulate network information in NIC objects: RPC
· external scripts and hooks: ping vm1.ganeti.example.com
  - use the exported environment provided by noded
  - brctl, iptables, ebtables, ip rule, etc.
  - update external DHCP/DNS server entries
  - let the VM act unaware of the "situation" (dhclient, etc.)
gnt-network + external scripts

· gnt-network alone is nothing more than a nice config.data
· snf-network: node-level scripts and hooks
· nfdhcpd: node-level DHCP server based on NFQUEUE
snf-network
node-level scripts and hooks

· overrides Ganeti default scripts (kvm-ifup, vif-ganeti)
· looks for specific tag types in the NIC's network
· applies corresponding rules
· creates nfdhcpd binding files
· provides a hook to update DNS entries
nfdhcpd
node-level DHCP server based on NFQUEUE

· listens on a specific NFQUEUE
· updates its leases db
  - inotify on a specific directory for binding files
· mangles DHCP requests and replies based on its db
· responds to RS and NS for IPv6 auto-configuration
gnt-network
Examples

Create and connect a new network:

gnt-network add --network 192.168.1.0/24 --gateway 192.168.1.1 --tags nfdhcpd net1
gnt-network connect net1 bridged prv0

Create an instance inside this network:

gnt-instance add --net 0:ip=pool,network=net1 ... inst1
gnt-instance info inst1
gnt-network info net1
gnt-network + snf-*
Examples

Use snf-network and nfdhcpd:

apt-get install snf-network nfdhcpd
iptables -t mangle -A PREROUTING -i prv+ -p udp -m udp --dport 67 \
    -j NFQUEUE --queue-num 42
ip addr add 192.168.1.1/24 dev prv0

Test connectivity:

gnt-instance reboot inst1
ping 192.168.1.2
References

· snf-network: http://code.grnet.gr/git/snf-network
· nfdhcpd: http://code.grnet.gr/git/snf-nfdhcpd
Ganeti ExtStorage Interface
More options for your data

Some slides contributed by Constantinos Venetsanopoulos <[email protected]>
State before the ExtStorage Interface

· non-mirrored templates: plain, file
· internally mirrored templates: drbd
· externally mirrored templates: sharedfile, rbd, blockdev, diskless
Ganeti and external SAN/NAS appliances

· Instance disks residing inside an external SAN/NAS appliance visible by all Ganeti nodes (e.g. NetApp, EMC, IBM)
· Instances should be able to migrate/failover to any node that can access the appliance
· Ganeti should integrate with external SAN/NAS appliances in a generic way, independent of the appliance itself, in the easiest possible way from the admin's perspective
Introducing the 'ExtStorage Interface'

· A simple interface inspired by the Ganeti OS interface
· To plug an appliance into Ganeti, we need a corresponding 'ExtStorage provider': a set of scripts residing under a directory,
  e.g. /usr/share/ganeti/extstorage/provider1/
ExtStorage provider methods

Every ExtStorage provider should provide the following methods:
· Create a disk on the appliance
· Remove a disk from the appliance
· Grow a disk on the appliance
· Attach a disk to a given Ganeti node
· Detach a disk from a given Ganeti node
· SetInfo on a disk (add metadata)
· Verify the provider's supported parameters
ExtStorage provider scripts

The methods are implemented in the corresponding 7 executable scripts, using appliance-specific tools:

# ls -l /usr/share/ganeti/extstorage/provider1
create  remove  grow  attach  detach  setinfo  verify

· attach returns a block device path on success
· input via environment variables, e.g. VOL_NAME, VOL_SIZE
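As an illustration of this contract (not a real provider: the provider name is made up, the appliance call is a placeholder comment, and the size unit is assumed to be MiB), a create script would read its parameters from the environment like so:

```python
#!/usr/bin/env python
# Sketch of a hypothetical ExtStorage provider's "create" script.
# Ganeti passes the disk parameters through environment variables.
import os

def read_create_params(env):
    """Extract the parameters the create script needs from the environment."""
    return {
        "name": env["VOL_NAME"],           # unique volume name chosen by Ganeti
        "size_mib": int(env["VOL_SIZE"]),  # requested size (unit assumed: MiB)
    }

if __name__ == "__main__":
    params = read_create_params(os.environ)
    # A real provider would now invoke its appliance-specific tool, e.g.:
    #   appliance-cli create --name <name> --size <size_mib>
    print("would create volume %(name)s of %(size_mib)d MiB" % params)
```

The other six scripts follow the same pattern; attach additionally prints the resulting block device path on stdout, which Ganeti then hands to the hypervisor.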
The new 'ext' template

· Introduce a new externally mirrored disk template: ext
· Introduce a new disk option: provider
Using the interface
Example

Assuming two appliances visible by a Ganeti cluster, with their two ExtStorage providers installed on all Ganeti nodes:

/usr/share/ganeti/extstorage/emc/*
/usr/share/ganeti/extstorage/ibm/*

# gnt-instance add -t ext --disk=0:size=2G,provider=emc
# gnt-instance add -t ext --disk=0:size=2G,provider=emc \
      --disk=1:size=1G,provider=emc \
      --disk=2:size=10G,provider=ibm
# gnt-instance modify --disk 3:add,size=20G,provider=ibm
# gnt-instance migrate testvm1
# gnt-instance migrate -n nodeX.example.com testvm1
ExtStorage Interface dynamic parameters

Support for dynamic passing of arbitrary parameters to ExtStorage providers during instance creation/modification, per disk:

# gnt-instance add -t ext \
      --disk=0:size=2G,provider=emc,param1=value1,param2=value2 \
      --disk=1:size=10G,provider=ibm,param3=value3,param4=value4
# gnt-instance modify --disk 2:add,size=3G,provider=emc,param5=value5

The above parameters will be exported to the ExtStorage provider's scripts as environment variables:

EXTP_PARAM1 = str(value1)
EXTP_PARAM2 = str(value2)
...
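The parameter-to-environment mapping shown above amounts to upper-casing each name, prefixing EXTP_, and stringifying the value. A sketch (the helper name is illustrative, not Ganeti's own):

```python
def extp_environment(disk_params):
    """Turn per-disk provider parameters into EXTP_* environment variables."""
    return {
        "EXTP_" + name.upper(): str(value)
        for name, value in disk_params.items()
    }

env = extp_environment({"param1": "value1", "param2": 7})
print(sorted(env.items()))
# [('EXTP_PARAM1', 'value1'), ('EXTP_PARAM2', '7')]
```

A provider script can thus pick up any extra per-disk parameter without the interface itself having to know about it in advance.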
The new 'gnt-storage' client

Inspired by gnt-os:

# gnt-storage diagnose
# gnt-storage info
Some images borrowed / modified from Lance Albertson, Iustin Pop,