This presentation provides detailed insight into the internal workings of the glideinWMS Frontend. It is part of the glideinWMS Training session held in January 2012 at UCSD.
UCSD Jan 17th 2012 Frontend Internals 1
glideinWMS Training @ UCSD
glideinWMS Frontend Internals
by Igor Sfiligoi (UCSD)
Refresher - Glideins
● A glidein is just a properly configured Condor execution node submitted as a Grid job
● glideinWMS provides automation
[Diagram: a Condor pool with a central manager (Collector, Negotiator), submit nodes (Schedd) and an execution node (Startd running a Job); glideinWMS submits glideins through Globus and CREAM, and each glidein acts as an execution node]
Refresher – Glidein Frontend
● The frontend monitors the user Condor pool, does the matchmaking and requests glideins
  ● The Factory is a slave
[Diagram: the Frontend (Frontend node) monitors Condor on the submit nodes and central manager, matches, and requests glideins from the Factory (Factory node, with Condor); the Factory submits glideins through CREAM and Globus; on the worker node the glidein step is labeled "Configure Condor G.N." and its Startd runs the Job]
Refresher - Cardinality
● N-to-M relationship
  ● Each Frontend can talk to many Factories
  ● Each Factory may serve many Frontends
[Diagram: two VO Frontends, each next to its own Schedd, Collector and Negotiator, and two Glidein Factories, with Startds running user jobs]
Frontend architecture
● The frontend is composed of:
  ● The Condor daemons
  ● The glideinWMS frontend proper
  ● Condor client – to talk to the factories
  ● Web server – deliver code and data to glideins + monitoring
● The glideinWMS frontend itself is composed of:
  ● Group processes – do the real work
  ● Master frontend – controls the others and aggregates monitoring
Frontend arch - Picture
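[Diagram: on the Frontend node, the Frontend spawns the Group processes next to the Web server; the Groups face the Factories (Entry) and the Web server faces the glideins; the submit nodes and central manager complete the Frontend domain]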
Condor processes
● Explained in enough detail in the previous talk
  ● Will not repeat myself
[Diagram: central manager (Collector, Negotiator) and submit nodes (Schedd)]
Frontend processes
● Real work is performed by the Group processes
  ● glideinFrontendElement.py
  ● One process per Group
● They are controlled by the master Frontend
  ● glideinFrontend.py
  ● Starts the other processes
  ● Aggregates monitoring
Note: "Frontend" == "Frontend Group" in the rest of the talk
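As a rough illustration of the master/group split, the sketch below starts one child process per group and then gathers what each child reports. It is not the glideinFrontend.py code: the group names, the inline child command and the `spawn_groups`/`collect` helpers are all placeholders invented for this example.

```python
import subprocess
import sys


def spawn_groups(group_names):
    """Start one child process per group and return them keyed by group name."""
    children = {}
    for name in group_names:
        # Placeholder child: a real master would launch the group script here.
        cmd = [sys.executable, "-c",
               "import sys; print('group ' + sys.argv[1])", name]
        children[name] = subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True)
    return children


def collect(children):
    """Wait for every child and gather its output (stand-in for monitoring aggregation)."""
    report = {}
    for name, proc in children.items():
        out, _ = proc.communicate()
        report[name] = (proc.returncode, out.strip())
    return report
```

Calling `collect(spawn_groups(["main", "highmem"]))` returns one `(exit code, output)` pair per group, which mirrors the "starts the other processes, aggregates monitoring" role in a very reduced form.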
Frontend role
● The VO frontend is the brain of a glideinWMS-based pool
  ● Like a site-level “negotiator”
[Diagram: within the VO domain, the Frontend monitors Condor on the submit nodes and central manager to find idle jobs, finds entries on the Factory nodes, matches the two, and requests glideins]
Reminder - Two level matchmaking
● The frontend triggers glidein submission
● The “regular” negotiator matches jobs to glideins
[Diagram: the Frontend drives the Factory, which submits glideins through Globus and CREAM; the glideins act as execution nodes, and the central manager (Collector, Negotiator) matches jobs from the submit node (Schedd) to an execution node (Startd running a Job)]
Matchmaking logic
● The Frontend matchmaking policy is implemented centrally
  ● By the VO admin – not by the users
● It can use the attributes from both the job and Factory ClassAds
● Should be kept in sync with the Negotiator policy
  ● Which is not centralized
  ● One way is to define it in the glidein START expression
  ● Unfortunately, one is a Python expression, the other a ClassAd expression
Example matchmaking logic
● Frontend
job.has_key("DESIRED_Sites") and glidein["attrs"].get("GLIDEIN_Site") in job["DESIRED_Sites"].split(",")
● Negotiator (via glidein START)
GLIDECLIENT_Start = stringListMember(GLIDEIN_Site, DESIRED_Sites,",")=?=True
More details at http://tinyurl.com/glideinWMS/doc.prd/factory/custom_vars.html
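To see how the Frontend-side expression behaves, the sketch below wraps it in a function and applies it to a few made-up ClassAds. The function name `frontend_match` and the sample jobs and entries are assumptions for illustration only; `"DESIRED_Sites" in job` is the Python 3 spelling of the `has_key` test.

```python
def frontend_match(job, glidein):
    """Frontend-side match: the job lists desired sites and the entry is one of them."""
    return ("DESIRED_Sites" in job and
            glidein["attrs"].get("GLIDEIN_Site") in job["DESIRED_Sites"].split(","))


# Hypothetical job attributes and Factory entries, for illustration only.
job = {"DESIRED_Sites": "UCSD,FNAL"}
entries = [
    {"attrs": {"GLIDEIN_Site": "UCSD"}},
    {"attrs": {"GLIDEIN_Site": "CERN"}},
    {"attrs": {}},  # entry that does not advertise a site
]

# Entries this job could run on, according to the Frontend policy.
matched = [e for e in entries if frontend_match(job, e)]
```

Only the UCSD entry survives here. The glidein START expression has to encode the same rule on the Negotiator side, otherwise the Frontend will request glideins that the job can never use.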
Communication Protocol
● No listen sockets
  ● All communication one way (Frontend->Factory)
● Each Factory provides a Collector
  ● Communication based on ClassAds
  ● All security implemented in the Collector
● Use standard cmdline tools for communication
  ● condor_status and condor_advertise
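A minimal sketch of what "standard cmdline tools" can look like from Python: the helpers below only build the argument lists, and `run` executes one of them. The helper names, the host name in the usage note and the default update command `UPDATE_AD_GENERIC` are assumptions made for this example, not a description of what the Frontend code actually runs.

```python
import subprocess


def status_cmd(collector, constraint):
    """argv for reading ClassAds of any type from a Factory Collector."""
    return ["condor_status", "-pool", collector, "-any", "-long",
            "-constraint", constraint]


def advertise_cmd(collector, classad_file, update_command="UPDATE_AD_GENERIC"):
    """argv for publishing the ClassAd stored in classad_file to the Collector."""
    return ["condor_advertise", "-pool", collector, update_command, classad_file]


def run(argv):
    """Execute one of the commands above; requires the Condor tools to be installed."""
    return subprocess.run(argv, capture_output=True, text=True, check=True).stdout
```

For example, `run(status_cmd("factory.example.org", 'GlideinMyType=?="glidefactory"'))` would ask a (hypothetical) Factory Collector for its factory ads; the constraint string is illustrative and should be checked against the ClassAds the Factory really publishes.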
Protocol sequence
● Polling loop
  ● Read Factory ClassAds from all factory Collectors
  ● Match against jobs
  ● Advertise own existence and requests
● Frontend sends 4 types of info
  ● Own identity
  ● Glidein submission regulation instructions
  ● Glidein parameters
  ● Pilot Proxy
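The polling loop can be sketched as follows. Every callable passed in (`read_factory_ads`, `find_idle_jobs`, `match`, `advertise`) is a hypothetical stand-in for the real Condor interaction, so this shows only the order of the three steps, not the actual Frontend code.

```python
import time


def polling_loop(read_factory_ads, find_idle_jobs, match, advertise,
                 collectors, iterations, sleep_s=0.0):
    """Run the read / match / advertise cycle a fixed number of times."""
    for _ in range(iterations):
        # 1. Read Factory ClassAds from all factory Collectors.
        entries = [ad for c in collectors for ad in read_factory_ads(c)]
        # 2. Match them against the idle user jobs.
        jobs = find_idle_jobs()
        for entry in entries:
            n_matched = sum(1 for job in jobs if match(job, entry))
            # 3. Advertise own existence and the request for this entry.
            advertise(entry, n_matched)
        time.sleep(sleep_s)
```

Because the Factory runs its own independent loop, the advertised value is a standing request that gets refreshed on every pass rather than a one-shot order.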
Glidein submission regulation
● The glideinWMS glidein request logic is based on the principle of “constant pressure”
  ● Frontend Group requests a certain number of “idle glideins” in the factory queue at all times
  ● It does not request a specific number of glideins
● This is done due to the asynchronous nature of the system
  ● Both the factory entries and the frontend groups are in a polling loop and talk to each other indirectly
Glidein requests
● Frontend matches job attrs against entry attrs
  ● It then counts the matched idle jobs
  ● A fraction of this number becomes the “pressure request” (up to 1/3)
  ● This number is then capped (~20)
  ● The attribute in the ClassAd is ReqIdleGlideins
● The Frontend also advertises ReqMaxRunningGlideins
  ● Emergency brake
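A small worked version of the arithmetic above, assuming the full 1/3 fraction and a cap of exactly 20. The integer rounding, the floor of one glidein when any job is idle, and the function name `build_request` are assumptions of this sketch; only the two attribute names come from the slide.

```python
def build_request(matched_idle_jobs, max_running, divisor=3, cap=20):
    """Turn a count of matched idle jobs into the two advertised limits."""
    if matched_idle_jobs <= 0:
        idle = 0
    else:
        # A third of the matched idle jobs, at least 1 (assumption), at most `cap`.
        idle = min(max(1, matched_idle_jobs // divisor), cap)
    return {"ReqIdleGlideins": idle, "ReqMaxRunningGlideins": max_running}
```

With 30 matched idle jobs this asks for 10 idle glideins; with 600 it still asks for only 20, and the Factory keeps topping the queue up to that level on each of its own polling cycles.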
Scaling back
● The Frontend can also request that existing glideins in the Factory queues are removed: ReqRemoveExcess
  ● NO – Default, never remove
  ● WAIT – Remove any glidein not yet at a site
  ● IDLE – Remove any glidein that has not started yet
  ● ALL – Remove all glideins
● Frontend is pretty conservative
  ● Only requests removal if no user jobs in the queues
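The conservative rule can be written down in a few lines. This is a sketch under stated assumptions: the function name, its parameters and the choice of `WAIT` as the value used once the queues are empty are invented for the example; only the four ReqRemoveExcess values come from the slide.

```python
REMOVE_EXCESS_VALUES = ("NO", "WAIT", "IDLE", "ALL")


def remove_excess_request(user_jobs_in_queue, policy_when_empty="WAIT"):
    """Pick the ReqRemoveExcess value: never remove while user jobs exist."""
    if policy_when_empty not in REMOVE_EXCESS_VALUES:
        raise ValueError("unknown ReqRemoveExcess value: %r" % policy_when_empty)
    if user_jobs_in_queue > 0:
        return "NO"  # default, never remove
    return policy_when_empty
```

So `remove_excess_request(5)` yields `"NO"`, while `remove_excess_request(0, "IDLE")` asks the Factory to drop glideins that have not started yet.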
Parameters
● Frontend can send attributes to glideins:
  ● Dynamically – as parameter in the ClassAd
  ● Statically – as entry in a config file
● Attributes typically static
  ● Current Frontend implementation does not really have much support for dynamicity
Pilot proxy delegation
● Pilot proxy is encrypted with factory pub key
  ● Then published in the ClassAd
  ● Only owner of priv. key can decrypt it
● However
  ● Must make sure we are talking to a trusted Factory!
    – not just anyone providing a pub key
  ● More details in a few slides
[Diagram: the Frontend (Frontend node) gets the key from the Collector on the Factory node and delivers the encrypted proxy; the Factory Entry uses the proxy to submit glideins through Globus]
Pilot proxy selection
● A Frontend must have at least one pilot proxy
  ● But can have more than one
● Many proxies can be used for priority reasons
  ● When competing with non-pilot submission
  ● Want to have as many proxies as users served
● Proxy selection is plugin based
Pilot proxy plugins
● Several standard plugins
  ● ProxyFirst – Only the first listed
  ● ProxyAll – All listed
  ● ProxyUserCardinality – First N, with N=#users
  ● ProxyUserMapWRecycling – N, with pilot-to-user mapping
● VO admin could implement their own, if desired
Most used
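To make the plugin idea concrete, here is a toy version of the first three selection policies. The class names are borrowed from the slide, but the `select(users)` interface and the constructor are hypothetical and do not reproduce the real glideinWMS plugin API; ProxyUserMapWRecycling is left out because it needs persistent state.

```python
class ProxyFirst:
    """Use only the first listed proxy."""
    def __init__(self, proxies):
        self.proxies = list(proxies)

    def select(self, users):
        return self.proxies[:1]


class ProxyAll(ProxyFirst):
    """Use every listed proxy."""
    def select(self, users):
        return list(self.proxies)


class ProxyUserCardinality(ProxyFirst):
    """Use the first N proxies, with N = number of distinct users."""
    def select(self, users):
        return self.proxies[:len(set(users))]
```

With proxies `["p1", "p2", "p3"]` and jobs from users `["alice", "bob", "alice"]`, ProxyUserCardinality picks `["p1", "p2"]`. A VO-specific policy would be another class with the same `select` method.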
Factory ClassAd
Frontend ClassAd
Security - Authorization
● Mutual authorization
  ● The frontend admin decides which Factories to talk to
  ● The factory admin decides which Frontends to serve
  ● Based on x509 DNs
● Both sides have whitelists
● Authentication based on GSI/x509
● Frontend needs a service proxy
[Diagram: two Frontend nodes (Frontend) and two Factory nodes (Collector, Factory)]
Trusting the factory key
● It is all just ClassAds!
  ● Anyone can publish a ClassAd and claim to be a factory
● However, Factory Collector knows who published it
  ● And advertises it as the attribute AuthenticatedIdentity
  ● Cannot be faked by the client
● Frontend has a whitelist of trusted factories
[Diagram: the Factory and the Frontends publish ClassAds to the Collector, which stores each with an identity: (a1, b1, c1, ID1), (a2, b2, c2, ID2), (a3, b3, c3, ID3)]
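The whitelist check can be sketched as a simple filter on AuthenticatedIdentity. The function name, the `Name` attribute and the sample identities below are made up for the example; the point is only that the decision uses the attribute set by the Collector, never a name the publisher chose itself.

```python
def trusted_factory_ads(ads, trusted_identities):
    """Keep only ads whose Collector-assigned identity is on the whitelist."""
    return [ad for ad in ads
            if ad.get("AuthenticatedIdentity") in trusted_identities]


# Hypothetical ads as read from a Factory Collector.
ads = [
    {"Name": "entry_A", "AuthenticatedIdentity": "factory@factory.example.org"},
    {"Name": "entry_A", "AuthenticatedIdentity": "mallory@evil.example.org"},
    {"Name": "entry_B"},  # no identity recorded: never trusted
]
whitelist = {"factory@factory.example.org"}
usable = trusted_factory_ads(ads, whitelist)
```

Only the first ad survives, even though the second one claims the same entry name; its public key would therefore never be used to encrypt a pilot proxy.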
Security handles
● As we said, mutual authentication with Factory
● Frontend provides (and Factory whitelists)
  ● Service Proxy to talk to Factory Collector
  ● Frontend Security name
  ● Proxy Security Class
● Frontend whitelists (obtained from Factory admins)
  ● Factory Collector DN
  ● Own mapping @Factory
  ● Factory mapping @Factory
One set per factory collector
One per pilot proxy
One set for whole Frontend (all Groups)
Security within the VO domain
● Frontend process, Collector and schedds often not on the same node
  ● Need network security
● All processes must whitelist each other
  ● Again, GSI based
● Could be even over WAN
  ● CMS setup has nodes in CA, IL and Europe
[Diagram: the Frontend monitors Condor on the Schedds and the Collector/Negotiator]
THE END
Pointers
● The official project Web page is http://tinyurl.com/glideinWMS
● glideinWMS development team is reachable at [email protected]
● OSG glidein factory at UCSD
  http://hepuser.ucsd.edu/twiki2/bin/view/UCSDTier2/OSGgfactory
  http://glidein-1.t2.ucsd.edu:8319/glidefactory/monitor/glidein_Production_v4_1/factoryStatus.html
Acknowledgments
● The glideinWMS is a CMS-led project developed mostly at FNAL, with contributions from UCSD and ISI
● The glideinWMS factory operations at UCSD are sponsored by OSG
● The funding comes from NSF, DOE and the UC system