Flexible Session Management in a Distributed System

Zach Miller ([email protected]), Todd Tannenbaum ([email protected]), Dan Bradley ([email protected]), Igor Sfiligoi ([email protected])

University of Wisconsin-Madison
http://www.cs.wisc.edu/condor

CHEP 2009, Prague



Page 2: Flexible Session Management in a Distributed System

CHEP 2009, Prague - www.cs.wisc.edu/condor

Condor Communication Layer: CEDAR

› CEDAR – Condor External DAta Representation
  • C++ API for network sockets
  • Cross-platform data representation
  • Special attention to UDP issues
  • Flexibility with private networks
  • Strong focus on security

Page 3: Flexible Session Management in a Distributed System

Basic CEDAR Security Features

› Authentication

› Encryption

› Integrity Checks

› Credential Mapping

› Authorization Policy

Page 4: Flexible Session Management in a Distributed System

Basic CEDAR Security Features

› Autonegotiation of supported features

Client wants: KERBEROS, NTSSPI, 3DES, MD5

Server wants: KERBEROS, GSI, OPENSSL, 3DES, BLOWFISH, MD5

Page 5: Flexible Session Management in a Distributed System

Basic CEDAR Security Features

› Autonegotiation of supported features

Policy: KERBEROS, 3DES, MD5
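The negotiation on these two slides can be sketched as a per-feature intersection of preference lists. This is a minimal illustration, not CEDAR's actual API: the function name `negotiate`, the grouping of methods into three feature classes, and the "first client preference the server supports" rule are all assumptions.

```python
# Hypothetical sketch of feature autonegotiation; names and the selection
# rule are assumptions, not taken from CEDAR.

def negotiate(client_prefs, server_prefs):
    """Return the first client-preferred method the server also supports, or None."""
    supported = set(server_prefs)
    for method in client_prefs:
        if method in supported:
            return method
    return None

# The lists from the slide, grouped by the kind of feature each method provides.
client = {
    "authentication": ["KERBEROS", "NTSSPI"],
    "encryption": ["3DES"],
    "integrity": ["MD5"],
}
server = {
    "authentication": ["KERBEROS", "GSI", "OPENSSL"],
    "encryption": ["3DES", "BLOWFISH"],
    "integrity": ["MD5"],
}

policy = {feature: negotiate(client[feature], server[feature]) for feature in client}
print(policy)
# → {'authentication': 'KERBEROS', 'encryption': '3DES', 'integrity': 'MD5'}
```

If no common method exists for a feature, this sketch returns None for it; whether the connection is then refused or proceeds without that feature would be a policy decision.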

Page 6: Flexible Session Management in a Distributed System

Strong Authentication

› Strong Authentication can be expensive!
  • PKI (OpenSSL and GLOBUS) is relatively CPU-intensive
  • KERBEROS hits the KDC
  • All require network round trips

Page 7: Flexible Session Management in a Distributed System

Strong Authentication

› Network round trips in a distributed grid environment (like glideinWMS) may be over a wide area network having relatively large latency

  Client: "Hi! I want to talk" (0.1s)
  Server: "OK, let’s try GSI" (0.1s)
  Client: "Here you go…" (0.1s)

Page 8: Flexible Session Management in a Distributed System

Strong Authentication

› In a single-threaded client blocking on network, all of those 0.1s quickly become a problem for scalability

› Solutions?
  • Don’t block on network: recent progress, but more to do
  • When something is expensive, cache it: fortunately, already supported
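To see why those 0.1s delays limit scalability, here is a rough back-of-the-envelope calculation. It assumes the three 0.1 s messages from the previous slide, ignores CPU cost entirely, and assumes a single thread that cannot overlap handshakes; the function names are illustrative only.

```python
# Assumptions: 0.1 s per message (from the slide), three messages per
# authentication, no CPU cost, and no overlap between handshakes.
MESSAGE_LATENCY_MS = 100
MESSAGES_PER_AUTH = 3

def blocking_auth_time_ms(num_peers):
    """Total wall-clock time when each handshake must finish before the next starts."""
    return num_peers * MESSAGES_PER_AUTH * MESSAGE_LATENCY_MS

def max_auths_per_day():
    """Upper bound on handshakes one blocking thread can complete in 24 hours."""
    ms_per_day = 24 * 60 * 60 * 1000
    return ms_per_day // (MESSAGES_PER_AUTH * MESSAGE_LATENCY_MS)

print(blocking_auth_time_ms(1000) / 1000, "seconds for 1,000 peers")
# → 300.0 seconds for 1,000 peers
print(max_auths_per_day(), "authentications per day at most")
# → 288000 authentications per day at most
```

Under these assumptions the daemon spends all of its time waiting on the network, so any other work it has to do lowers the real ceiling further.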

Page 9: Flexible Session Management in a Distributed System

Security Session Cache

› A session is a semi-permanent information exchange that is set up once and later torn down

› Setup is costly, but done only once

› Resuming is done often, but is faster

› Tearing down is either explicit or based on expiration times

Page 10: Flexible Session Management in a Distributed System

Session Management

› Session Set Up
  "I want to talk" (0.1s)
  "OK, let’s authenticate" (0.1s)
  Authentication (0.2s)

Page 11: Flexible Session Management in a Distributed System

Session Management

› Authentication results in secure key exchange
  "I want to talk"
  "OK, let’s authenticate"
  Authentication
  Both sides now hold: secret key: 0x3E42 (secret key is actually 192 bits)

Page 12: Flexible Session Management in a Distributed System

Session Management

› The secret key is associated with a session ID
  "I want to talk"
  "OK, let’s authenticate"
  Authentication
  "Call this session 1234"
  Both sides now hold: sess 1234, key 0x3E42

Page 13: Flexible Session Management in a Distributed System

Session Management

› Resuming a session
  • Sends the ID and uses the secret key
  "Use session 1234. Here is my request, encrypted (with 0x3E42) to prove who I am."
  "Here is your response, encrypted (with 0x3E42) to prove who I am."
  Both sides hold: sess 1234, key 0x3E42
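The resume step can be sketched as follows. The slide describes encrypting the request with the session key; this sketch swaps in an HMAC tag from the Python standard library instead, purely to show how possession of the key proves identity without a new authentication. The function names `seal` and `verify` and the message layout are hypothetical.

```python
import hashlib
import hmac

def seal(key, session_id, payload):
    """Attach a tag that only a holder of the session key can compute."""
    tag = hmac.new(key, session_id.encode() + payload, hashlib.sha256).digest()
    return session_id, payload, tag

def verify(sessions, message):
    """Look up the key by session ID and check the tag; no handshake is needed."""
    session_id, payload, tag = message
    key = sessions.get(session_id)
    if key is None:
        return False  # unknown or expired session: fall back to full authentication
    expected = hmac.new(key, session_id.encode() + payload, hashlib.sha256).digest()
    return hmac.compare_digest(tag, expected)

# Both sides hold the same 192-bit key under session ID "1234" (placeholder key bytes).
shared_key = bytes(24)
server_sessions = {"1234": shared_key}

request = seal(shared_key, "1234", b"here is my request")
print(verify(server_sessions, request))  # → True
```

A request sealed with any other key, or naming a session the server does not hold, fails verification, which is the point at which a real system would fall back to the expensive authentication path.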

Page 14: Flexible Session Management in a Distributed System

Session Management

› The session is stored by both sides

  sess 1234
  key: 0x3E42
  authn: KERBEROS
  user: [email protected]: cobalt.cs.wisc.edu
  authz: ALLOW *.wisc.edu
  valid until: 2009.03.24.17.50.00
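A session cache holding records like the one above might look roughly like this. It is a minimal sketch under stated assumptions: the class and field names are invented for illustration, the user name is a placeholder, and Condor's real cache stores more (for example the authorization decision) than is shown here.

```python
import secrets
import time
from dataclasses import dataclass

@dataclass
class Session:
    session_id: str
    key: bytes          # 192-bit secret produced by the authentication step
    authn_method: str
    user: str
    valid_until: float  # Unix timestamp

class SessionCache:
    """Hypothetical cache: costly setup once, cheap lookups afterwards."""

    def __init__(self):
        self._sessions = {}

    def create(self, authn_method, user, lifetime_s, now=None):
        now = time.time() if now is None else now
        session = Session(
            session_id=secrets.token_hex(8),
            key=secrets.token_bytes(24),  # 24 bytes = 192 bits
            authn_method=authn_method,
            user=user,
            valid_until=now + lifetime_s,
        )
        self._sessions[session.session_id] = session
        return session

    def resume(self, session_id, now=None):
        """Return the cached session, or None if it is unknown or expired."""
        now = time.time() if now is None else now
        session = self._sessions.get(session_id)
        if session is None:
            return None
        if now >= session.valid_until:
            del self._sessions[session_id]  # teardown based on expiration time
            return None
        return session

    def invalidate(self, session_id):
        """Explicit teardown."""
        self._sessions.pop(session_id, None)

cache = SessionCache()
s = cache.create("KERBEROS", "alice@example.org", lifetime_s=3600, now=1000.0)
print(cache.resume(s.session_id, now=2000.0) is s)  # → True
print(cache.resume(s.session_id, now=5000.0))       # → None
```

The `now` parameter exists only to make expiry easy to demonstrate; callers would normally omit it.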

Page 15: Flexible Session Management in a Distributed System

Basic Condor Operation

› User submits a job

› Condor schedules the job on an execute node

› Submit point connects to execute node and sends job

› Job runs to completion

› Execute node returns results

Page 16: Flexible Session Management in a Distributed System

High Level Condor Operation (with authentication turned on)

[Diagram: Central Manager, Job Submit Point, Execute Node]

Page 17: Flexible Session Management in a Distributed System

High Level Condor Operation (with authentication turned on)

[Diagram: Central Manager, Job Submit Point, Execute Node. Red lines represent authenticated connections.]

Execute node advertises itself.

Page 18: Flexible Session Management in a Distributed System

High Level Condor Operation (with authentication turned on)

[Diagram: Central Manager, Job Submit Point, Execute Node]

User submits a job.

% condor_submit work.sub

Page 19: Flexible Session Management in a Distributed System

High Level Condor Operation (with authentication turned on)

[Diagram: Central Manager, Job Submit Point, Execute Node. Red lines represent authenticated connections.]

Condor performs matchmaking.

Page 20: Flexible Session Management in a Distributed System

High Level Condor Operation (with authentication turned on)

[Diagram: Central Manager, Job Submit Point, Execute Node. Red lines represent authenticated connections.]

Submit node sends job to execute node.

Page 21: Flexible Session Management in a Distributed System

High Level Condor Operation (with authentication turned on)

[Diagram: Central Manager, Job Submit Point, Execute Node]

Job runs to completion.

Page 22: Flexible Session Management in a Distributed System

High Level Condor Operation (with authentication turned on)

[Diagram: Central Manager, Job Submit Point, Execute Node. Red lines represent authenticated connections.]

Execute node sends job results back.

Page 23: Flexible Session Management in a Distributed System

High Level Condor Operation (with authentication turned on)

[Diagram: Central Manager, Job Submit Point, Execute Node]

Job complete!

Repeat ad infinitum, reusing cached security sessions.

Page 24: Flexible Session Management in a Distributed System

Problems Crossing the Atlantic

› Experience in CMS CCRC-08 with glideinWMS (dynamic Condor pool on top of grid):
  • Even with sessions, authentication cost is killing performance.
  • Why didn’t we notice this in previous scale tests at Fermilab?!
  • Because network latency adds significantly to cost of authentication in Condor (blocking, single-threaded).

Page 25: Flexible Session Management in a Distributed System

Larger Scale View

[Diagram: Job Submit Point, Central Manager, Execute Nodes]

Page 26: Flexible Session Management in a Distributed System

Larger Scale View

[Diagram: Job Submit Point (100,000 jobs), Central Manager, Execute Nodes]

User submits 100,000 jobs.

Page 27: Flexible Session Management in a Distributed System

Larger Scale View

[Diagram: Job Submit Point (100,000 jobs), Central Manager, Execute Nodes]

Condor schedules jobs and passes security session info for each match.

Page 28: Flexible Session Management in a Distributed System

Larger Scale View

[Diagram: Job Submit Point (100,000 jobs), Central Manager, Execute Nodes]

Send jobs to execute nodes.

Lots of authentications!

Page 29: Flexible Session Management in a Distributed System

Meeting

› Igor: I will never give up on you guys (yet)

› Dan: all blocking network operations MUST be exterminated!

› Todd: don’t unwind all our code; use cooperative threads

› Miron: guys, listen

Page 30: Flexible Session Management in a Distributed System

The Plan

› The Central Manager authenticates both the submit point and the execute nodes

› Using this trust relationship, the Central Manager can help establish a security session between the two, good for the duration of the match.
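The plan can be sketched in a few lines. This is an illustrative model only: the class and method names are hypothetical, the matchmaking is reduced to handing out any advertised node, and the ad is assumed to travel over an already authenticated, encrypted channel. The resume step here compares keys directly for brevity; a real exchange would prove knowledge of the key rather than send it.

```python
import hmac
import secrets

class ExecuteNode:
    def __init__(self, name):
        self.name = name
        self.match_sessions = {}  # session ID -> key

    def advertise(self):
        """Create a fresh match session and include it in the machine ad."""
        session_id = secrets.token_hex(4)
        key = secrets.token_bytes(24)
        self.match_sessions[session_id] = key
        return {"name": self.name, "match_session": (session_id, key)}

    def resume(self, session_id, key):
        """Accept a peer that presents a match session this node created."""
        expected = self.match_sessions.get(session_id)
        return expected is not None and hmac.compare_digest(expected, key)

class CentralManager:
    """Trusted by both sides because it has authenticated each of them."""

    def __init__(self):
        self.ads = {}

    def receive_ad(self, ad):
        self.ads[ad["name"]] = ad

    def match(self):
        """Hand one advertised node, with its match session, to a submit point."""
        if not self.ads:
            return None
        name = next(iter(self.ads))
        return self.ads.pop(name)

node = ExecuteNode("exec1")
manager = CentralManager()
manager.receive_ad(node.advertise())     # execute node -> Central Manager

ad = manager.match()                     # Central Manager -> submit point
session_id, key = ad["match_session"]
print(node.resume(session_id, key))      # submit point -> execute node
# → True
```

The submit point and execute node never run a full authentication with each other; the only expensive handshakes are the ones each already performed with the Central Manager.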

Page 31: Flexible Session Management in a Distributed System

Integrated Security Sessions and Matchmaking

[Diagram: Central Manager, Job Submit Point, Execute Node]

Page 32: Flexible Session Management in a Distributed System

Integrated Security Sessions and Matchmaking

[Diagram: Central Manager, Job Submit Point, Execute Node. Red lines represent authenticated connections.]

Execute node advertises itself AND match session info (encrypted): match sess 1278, key: 0x72A9…

Page 33: Flexible Session Management in a Distributed System

Integrated Security Sessions and Matchmaking

[Diagram: Central Manager, Job Submit Point, Execute Node]

User submits a job.

% condor_submit work.sub

Page 34: Flexible Session Management in a Distributed System

Integrated Security Sessions and Matchmaking

[Diagram: Central Manager, Job Submit Point, Execute Node. Red lines represent authenticated connections.]

Condor schedules the job AND passes match session info to submitter (encrypted): sess 1278, key: 0x72A9…

Page 35: Flexible Session Management in a Distributed System

Integrated Security Sessions and Matchmaking

[Diagram: Central Manager, Job Submit Point, Execute Node. Red line here represents resuming the match session, saving an authentication.]

Submit node sends job to execute node.

Submit point and execute node each hold: sess 1278, key: 0x72A9…

Page 36: Flexible Session Management in a Distributed System

Integrated Security Sessions and Matchmaking

[Diagram: Central Manager, Job Submit Point, Execute Node]

Job runs to completion.

Page 37: Flexible Session Management in a Distributed System

Integrated Security Sessions and Matchmaking

[Diagram: Central Manager, Job Submit Point, Execute Node. Red line here represents resuming the match session, saving an authentication.]

Execute node sends job results back.

Submit point and execute node each hold: sess 1278, key: 0x72A9…

Page 38: Flexible Session Management in a Distributed System

Integrated Security Sessions and Matchmaking

[Diagram: Central Manager, Job Submit Point, Execute Node]

Job complete!

Page 39: Flexible Session Management in a Distributed System

What about Central Manager?

› Communication between submit node and execute node is now cheaper

› But Central Manager still has to authenticate everyone (at least once).

Page 40: Flexible Session Management in a Distributed System

Execute Node Ads

[Diagram: Job Submit Point, Central Manager, Execute Nodes]

many authentications

Page 41: Flexible Session Management in a Distributed System

Two Ideas

› Easy: 2-tier ClassAd collection (done)

› Hard: remove blocking network operations and/or use threads (in progress)

Page 42: Flexible Session Management in a Distributed System

2-tier Collector

[Diagram: Execute Nodes, sub-collectors, Central Manager]

Authentication workload distributed across multiple collectors.

Big benefit, even with collectors all on just one machine.
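A rough sketch of how the authentication load spreads in a 2-tier layout follows. The assignment rule (a CRC32 hash of the node name), the node names, and the counts are assumptions made for illustration; the slides do not say how execute nodes choose a sub-collector.

```python
import zlib
from collections import Counter

def pick_sub_collector(node_name, num_sub_collectors):
    """Deterministically map an execute node to one sub-collector (assumed rule)."""
    return zlib.crc32(node_name.encode()) % num_sub_collectors

NUM_SUB_COLLECTORS = 10
nodes = [f"node{i:05d}" for i in range(10_000)]  # hypothetical execute nodes

# Each sub-collector authenticates only the execute nodes assigned to it.
load = Counter(pick_sub_collector(name, NUM_SUB_COLLECTORS) for name in nodes)

# The top-level collector authenticates only the sub-collectors themselves.
top_level_auths = NUM_SUB_COLLECTORS

for collector_id in sorted(load):
    print(f"sub-collector {collector_id}: {load[collector_id]} authentications")
print(f"top-level collector: {top_level_auths} authentications")
```

Since each collector is a separate process, handshakes that would otherwise queue behind one another in a single blocking daemon can proceed side by side, which may be why the slide reports a benefit even with all collectors on one machine.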

Page 43: Flexible Session Management in a Distributed System

Can we Cross the Atlantic?

› glideinWMS tests with one submit node:
  Before (condor 7.1.2)
  • max 4,000 jobs/day
  • (500 simultaneously running)
  After (condor 7.3.1)
  • 200,000 jobs/day
  • (22k-25k simultaneously running)
  • now limited by port usage, not scheduler throughput

Page 44: Flexible Session Management in a Distributed System

Conclusion

› Security sessions essential in Condor

› 2-tier collector works

› Establishing security sessions through matchmaking is a big win.
  • one submit node much more convenient than 50

Page 45: Flexible Session Management in a Distributed System

Conclusion

› Efficient delegation and caching of trust can be an important optimization in distributed systems.