glideinWMS Training
HTCondor Overview
by Igor Sfiligoi, UC San Diego
Aug 2014

Overview

● These slides present an HTCondor overview, with high-level views of
– Daemons involved
– Communication paths
– Scalability considerations

HTCondor Daemons: The basics

[Diagram: Schedd (submit side), Startd (execute side), with Collector and Negotiator as the glue between them]

HTCondor Daemons: The basics + the master

[Diagram: Schedd (submit side), Startd (execute side), Collector and Negotiator (glue), with a Master shown for each of the three groups]

HTCondor Daemons

● One startd per (logical) compute resource
– Can handle multiple CPUs
● One schedd per submit node
– Can handle multiple users
● Collector has the list of all other daemons
● Negotiator matches user jobs to machine slots
● Master starts all other processes
– Will ignore it in the rest of the talk

Communication flow

[Diagram: Collector, Schedd, Startd, Negotiator, with the following message labels]
– Push: I am here and these are my properties (one ClassAd per slot)
– Push: I am (still) here
– Pull: Send me the list of idle jobs

Claiming protocol

● Startds keep their state current in the Collector
– By periodically pushing updates (every 5 mins by default)
● On a matchmaking cycle, the Negotiator will
– Pull the startd slot list from the Collector
  ● In Unclaimed state only (unless preemption is enabled)
– Pull the job list from the schedds
– Create a priority list of matches
– Send the matches to the relevant schedds
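
The matchmaking cycle above can be illustrated with a small Python sketch. This is a toy model, not HTCondor code: the `Slot` and `Job` classes, the memory-only matching rule, and the convention that a lower priority value is served first are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Slot:
    name: str
    state: str          # "Unclaimed" or "Claimed" (toy model)
    memory_mb: int


@dataclass
class Job:
    job_id: str
    schedd: str
    owner: str
    request_memory_mb: int


def negotiation_cycle(slots, jobs_by_schedd, user_priority):
    """Return {schedd: [(job_id, slot_name), ...]} for one toy cycle."""
    # Slot list as pulled from the Collector; no preemption, so only
    # Unclaimed slots are considered.
    free = [s for s in slots if s.state == "Unclaimed"]
    # Job lists as pulled from the schedds, ordered by user priority
    # (assumption: a lower value is served first).
    jobs = [j for schedd_jobs in jobs_by_schedd.values() for j in schedd_jobs]
    jobs.sort(key=lambda j: user_priority.get(j.owner, float("inf")))
    matches = {}
    for job in jobs:
        for slot in free:
            if slot.memory_mb >= job.request_memory_mb:
                free.remove(slot)  # safe: we leave the inner loop right away
                matches.setdefault(job.schedd, []).append((job.job_id, slot.name))
                break
    return matches
```

The returned dictionary is keyed by schedd, mirroring the last step: each schedd only receives the matches for its own jobs.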

Claiming protocol

● The schedd will contact the startd
– Once the connection is accepted, the schedd owns that slot
● The schedd will spawn a shadow
– Which takes over the connection
– The schedd moves on to other business
● Similarly, the startd spawns a starter
– And advertises a Claimed state

Communication flow

[Diagram: Collector, Schedd, Startd, Negotiator; slot state shown as Claimed/Idle]

HTCondor Daemons: Stage 2

[Diagram: Collector, Negotiator, Schedd with Shadow, Startd with Starter; slot state shown as Claimed/Busy]
● A shadow and a starter are created for every running job

HTCondor Daemons: Stage 2

[Diagram: Collector, Negotiator, Schedd with Shadow, Startd with Starter; slot state shown as Claimed/Busy]
● If the network connection is lost, either side can re-establish it.

HTCondor Daemons

● The shadow takes ownership of a running job
– One per job
● The starter takes ownership of a claimed slot
– One per slot
● Together they babysit the two sides until the job is done and the slot can be un-claimed
● Corollary:
– Each schedd node will have O(10k) shadows

Claiming protocol

● Once the job terminates
– The starter goes away
– The schedd will send another job to the startd, unless
  ● The lease has expired
  ● There are no more suitable jobs
– The existing shadow can be reused
  ● But does not need to be
● If the schedd does not send a new job
– Startd goes into Unclaimed state

Matchmaking and latency

● The Negotiator pulls the startd slot list from the Collector
– In a single transaction
● The Negotiator pulls the job list from the schedds
– Basically, one at a time!
– But it does cluster similar jobs together at Schedd level
– The idea being that it will not ask for more, if either the user runs out of priority or no more slots are available
● Negotiator is thus sensitive to network latencies
– Matching Schedds far away may be limited by network latency, not Negotiator CPU use
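
A back-of-the-envelope calculation shows why latency can dominate. The sketch below assumes the schedds are contacted strictly one after another and that each needs a fixed number of request/response round trips. The function name and all the numbers are illustrative assumptions, not measurements.

```python
def schedd_pull_time(n_schedds, round_trips_per_schedd, rtt_seconds):
    """Lower bound on the time spent waiting on the network in one cycle,
    assuming schedds are queried one at a time (no overlap)."""
    return n_schedds * round_trips_per_schedd * rtt_seconds


# Same pool, same number of exchanges; only the round-trip time differs.
local_wait = schedd_pull_time(n_schedds=10, round_trips_per_schedd=50, rtt_seconds=0.001)
remote_wait = schedd_pull_time(n_schedds=10, round_trips_per_schedd=50, rtt_seconds=0.150)
print(f"local schedds:  {local_wait:.1f} s")
print(f"remote schedds: {remote_wait:.1f} s")
```

With these assumed figures the waiting time grows linearly with the round-trip time, regardless of how much CPU the Negotiator has available.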

Security considerations: The glidein use case

[Diagram: Collector, Negotiator, Schedd, Startd]
● Collector is the center of all trust
● Mutual authentication between Startd and Collector using x509 whitelisting
● Mutual authentication between Startd and Collector/Negotiator using x509 whitelisting
● Negotiator and Collector co-located, FS auth

Security considerations: The glidein use case

[Diagram: Collector, Negotiator, Schedd, Startd]
● Collector is the center of all trust
● After the initial handshake, use a shared secret
● Negotiator and Collector co-located, FS auth
● Full x509 is expensive; used only on daemon restart and periodically, once every few days

Security considerations: The glidein use case

[Diagram: Collector, Schedd, Startd]
● Collector is the center of all trust
● Startd also sends a shared secret for matchmaking

Security considerations: The glidein use case

[Diagram: Collector, Negotiator, Schedd, Startd]
● Collector is the center of all trust
● Negotiator is the only authorized user of the Startd's shared secrets
● Startd shared secret is sent on job match
● Schedd may get many secrets, one per matched job

Security considerations: The glidein use case

[Diagram: Collector, Schedd, Startd]
● Collector is the center of all trust
● Use the given shared secret for auth
● No other credentials in play

Security considerations: The glidein use case

[Diagram: Collector, Schedd with Shadow, Startd with Starter]
● Collector is the center of all trust
● Shadow and starter inherit the socket
● Also inherit the shared secret, for reconnect

Security considerations: The glidein use case

[Diagram: Collector, Schedd, Startd]
● Collector is the center of all trust
● If the Startd goes into Unclaimed state, a new secret is created and sent
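
The life cycle of the per-slot shared secret can be sketched in Python. The slides do not describe the wire protocol, so the HMAC challenge-response below is only a stand-in chosen for illustration, and the class and function names are hypothetical. What the sketch does take from the slides is the flow: the startd creates the secret, the schedd receives it with the match, and a new secret replaces it once the slot returns to Unclaimed.

```python
import hashlib
import hmac
import secrets


class SecretStartd:
    """Toy startd that owns one shared secret per slot."""

    def __init__(self):
        self.secret = None
        self.new_secret()

    def new_secret(self):
        # Called at startup and whenever the slot goes back to Unclaimed.
        # In the slides' flow this value is sent to the Collector.
        self.secret = secrets.token_bytes(32)
        return self.secret

    def accept_claim(self, challenge, response):
        # Stand-in check: the caller must prove knowledge of the secret.
        expected = hmac.new(self.secret, challenge, hashlib.sha256).digest()
        return hmac.compare_digest(expected, response)


def schedd_response(secret, challenge):
    """What a schedd holding the matched secret would answer (toy model)."""
    return hmac.new(secret, challenge, hashlib.sha256).digest()
```

A schedd holding the current secret passes the check; after `new_secret()` the old value no longer works, which is the property the last bullet relies on.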

Security cost and scalability: The glidein use case

[Diagram: Collector, Schedd, Startd]
● x509 is too expensive for a single central service (both due to CPU use and network latency issues)
● Mutual authentication between Startd and Collector using x509 whitelisting
● Mutual authentication between Startd and Collector/Negotiator using x509 whitelisting
● Glideins can start at a 10+ Hz rate

Security cost and scalability: The glidein use case

[Diagram: Schedd, Startd, a main Collector and N child Collectors]
● Spread the load over multiple child Collectors
– Child collectors forward all ads
– Co-located, thus cheap
● Mutual authentication between Startd and child Collector using x509 whitelisting
● New Schedds join rarely
● Randomly pick one of many child Collectors and then stick with it
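
The "pick one at random, then stick with it" rule is simple enough to sketch. The example assumes the child Collectors listen on consecutive ports of one host. The host name, the port numbers and the function name are hypothetical.

```python
import random


def pick_child_collector(host, first_port, n_children, rng=random):
    """Pick one child Collector at random.

    The caller stores the result and reuses it for the rest of its
    lifetime, so each child sees a roughly equal share of the startds.
    """
    port = first_port + rng.randrange(n_children)
    return f"{host}:{port}"


# Chosen once at startup and then reused for every later update.
my_collector = pick_child_collector("cm.example.org", 9620, 40)
```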

Network/Firewall considerations

[Diagram: Collector, Negotiator, Schedd with Shadow, Startd with Starter]
● HTCondor is conceptually a peer-to-peer system
– Everyone talks to everyone

Network/Firewall considerations

[Diagram: Collector, Schedd with Shadow, Startd with Starter]
● Execute nodes are often behind firewalls and/or NATs

Network/Firewall considerations

[Diagram: Collector, Schedd, Startd]
● CCB protocol creates a tunnel
– Collector implements the CCB
● Startd->Collector communication is still direct
● A separate channel over a long-lived TCP socket

Network/Firewall considerations

[Diagram: Collector, Schedd, Startd]
● CCB protocol creates a tunnel
– Collector implements the CCB
● CCB delivers messages to the startd

Network/Firewall considerations

[Diagram: Collector, Schedd, Startd]
● CCB protocol creates a tunnel
– Collector implements the CCB
● Only the callback request goes through the CCB
● Startd opens a long-lived TCP connection to the Schedd
● All further communication is on that channel from there on
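
The callback pattern can be modeled without any real networking. In this toy sketch objects stand in for daemons and method calls stand in for connections. All class names are hypothetical and nothing here reflects HTCondor's actual message format.

```python
class ToyCCB:
    """Broker that keeps one long-lived registration per firewalled daemon."""

    def __init__(self):
        self.registered = {}

    def register(self, name, daemon):
        # The startd connects out to the CCB and keeps that channel open.
        self.registered[name] = daemon

    def request_callback(self, target_name, caller):
        # Only this small request travels through the CCB.
        self.registered[target_name].call_back(caller)


class FirewalledStartd:
    def __init__(self, name):
        self.name = name
        self.connections = []

    def call_back(self, caller):
        # The startd opens the outbound connection itself, which a
        # firewall or NAT normally permits.
        caller.accept(self)
        self.connections.append(caller.name)


class ToySchedd:
    def __init__(self, name):
        self.name = name
        self.accepted = []

    def accept(self, peer):
        # From here on the two sides talk directly, not via the CCB.
        self.accepted.append(peer.name)
```

The schedd never initiates a connection toward the startd; it only asks the broker to relay a request, and the connection that carries the real traffic is opened from the firewalled side.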

Network/Firewall considerations

[Diagram: Collector, Schedd with Shadow, Startd with Starter]
● CCB protocol creates a tunnel
– Collector implements the CCB
● Shadow and Starter inherit this socket

CCB and scalability

[Diagram: Collector, Startd]
● A single central service cannot really handle all the load
– Processes are usually limited to O(1k) sockets
● We can have O(10k+) glideins

CCB and scalability

[Diagram: Startd, a main Collector and N further Collectors]
● No real need to use a single CCB, could use any number of dedicated CCB Collectors
● The standard strategy is to just piggy-back on the “Child Collector” paradigm
● Randomly pick one of many CCBs and then stick with it

CCB and scalability

[Diagram: Collector, Schedd, Startd]
● Only the callback request goes through the CCB
● Startd opens a long-lived TCP connection to the Schedd
● The Schedd now needs to accept incoming connections
– Default HTCondor mechanism of “one port per connection” does not scale (only ~30k usable ports in IP)

CCB and scalability

[Diagram: Collector, Startd, Shared_Port_Daemon, Schedd]
● HTCondor added shared_port_daemon to multiplex requests on a single port
● Only the callback request goes through the CCB
● Startd opens a long-lived TCP connection to the shared_port_daemon
– Specifying that it is for the schedd

CCB and scalability

[Diagram: Collector, Startd, Shared_Port_Daemon, Schedd]
● HTCondor added shared_port_daemon to multiplex requests on a single port
● Socket is moved to the schedd
– Same node, local UNIX command
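
The multiplexing idea reduces to a lookup keyed by the daemon name that the incoming connection asks for. In the sketch below a dictionary of handlers stands in for the local hand-over of the socket. The class name and the string used as a "connection" are hypothetical.

```python
class ToySharedPort:
    """Single entry point that forwards connections to local daemons."""

    def __init__(self):
        self.daemons = {}

    def register(self, name, handler):
        # Each local daemon (schedd, shadow, ...) registers under a name.
        self.daemons[name] = handler

    def incoming(self, target, connection):
        # The peer states which daemon it wants; the connection is then
        # handed to that daemon and this object is no longer involved.
        return self.daemons[target](connection)
```

Only the single listening port of the entry point has to be reachable from outside, however many daemons sit behind it.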

CCB and scalability

[Diagram: Collector, Startd with Starter, Shared_Port_Daemon, Shadow]
● Socket is moved to the schedd
– Same node, local UNIX command
● Can be used by the starter to contact the Shadow, too

CCB and scalability

[Diagram: Collector, Startd, N Starters]
● Starter also accepts incoming connections
– Thus needs a CCB connection
● And there is one Starter per slot
● Plus there is normally one for the Master, too

CCB and scalability

[Diagram: Collector, Startd, N Starters, Shared_Port_Daemon]
● Adding a shared_port_daemon will cut the number of CCB connections to exactly one
● Shared_Port_Daemon routes incoming requests to the appropriate daemon
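
The effect on the CCB load can be estimated with simple arithmetic. The sketch counts one CCB connection each for the startd, the master and every starter when no shared_port_daemon is used, and exactly one otherwise. The default of 1000 sockets per process is just the "O(1k)" order of magnitude from the slides. The pool size and slot count in the example are assumptions.

```python
import math


def ccb_connections_per_glidein(n_slots, shared_port):
    """CCB connections held by one glidein in this toy model."""
    if shared_port:
        return 1                # only the shared_port_daemon registers
    return 1 + 1 + n_slots      # startd + master + one starter per slot


def ccb_servers_needed(n_glideins, n_slots, shared_port, sockets_per_process=1000):
    """How many CCB processes are needed to hold all the connections."""
    total = n_glideins * ccb_connections_per_glidein(n_slots, shared_port)
    return math.ceil(total / sockets_per_process)


# Hypothetical pool: 20,000 glideins with 8 slots each.
without_shared_port = ccb_servers_needed(20_000, 8, shared_port=False)
with_shared_port = ccb_servers_needed(20_000, 8, shared_port=True)
```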

High Availability setup

[Diagram: Schedd, Startd, and one Central Manager node holding the Negotiator, the Collector and N child Collectors]
● Using a single CM node is risky; if it dies, the pool dies with it.
● Having multiple child collectors does not help

High Availability setup

[Diagram: Schedd, Startd, and two Central Manager nodes, each holding a Negotiator, a Collector and N child Collectors]
● HTCondor allows for 2 or more CM nodes
● Schedds and Startds talk to all of them
– Including one CCB per CM node

High Availability setup

[Diagram: Schedd, Startd, and two Central Manager nodes, each holding a Negotiator, a Collector, N child Collectors and a HAD]
● There can be only one active Negotiator, to make user priority decisions
● HAD daemons maintain only one alive, with others in standby mode
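
The "only one active Negotiator" rule can be expressed as a selection over the CM nodes that are currently alive. The slides do not say how the HAD daemons agree on the winner, so the fixed preference order used below is an assumption made for the sketch, as are the function and node names.

```python
def active_negotiator(cm_nodes, alive):
    """Return the one CM node that should run the Negotiator, or None.

    cm_nodes is an ordered list; its order is the assumed preference.
    alive is the set of nodes currently responding.
    """
    for node in cm_nodes:
        if node in alive:
            return node
    return None


cm_nodes = ["cm1.example.org", "cm2.example.org"]
normal = active_negotiator(cm_nodes, alive={"cm1.example.org", "cm2.example.org"})
failover = active_negotiator(cm_nodes, alive={"cm2.example.org"})
```

Every node evaluating the same rule over the same liveness information reaches the same answer, so at most one Negotiator is active while the others stay in standby.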

High Availability setup

[Diagram: M Schedds, Startd, and two Central Manager nodes, each holding a Negotiator, a Collector, N child Collectors and a HAD]
● Schedd “HA” is typically just “partition the jobs between many schedds”
– Temporary Schedd downtimes result in other schedds taking over the slots
● True Schedd HA is possible, but requires a shared FS
● No real need for Startd HA
● HAD daemons maintain only one alive, with others in standby mode

The end