
Grid Infrastructure


Page 1: Grid Infrastructure

[email protected]

Grid Infrastructure

Page 2: Grid Infrastructure

What is it ?

Eddie Aronovich – Operating System course (TAU CS, Jan 2009) 3

SERVERS

Clients

Page 5: Grid Infrastructure

SOA & Web services

• Decompose processing into services

• Each service works independently

• Main components:
– Universal Description, Discovery and Integration (UDDI)
– Simple Object Access Protocol (SOAP)
– Web Services Description Language (WSDL)

• W3C standard

Page 6: Grid Infrastructure

Page 7: Grid Infrastructure

Page 8: Grid Infrastructure

THE WORLD NEEDS ONLY FIVE COMPUTERS (Thomas J. Watson)

• Google grid
• Microsoft's live.com
• Yahoo!
• Amazon.com
• eBay
• Salesforce.com

Well, that's O(5) ;)

Greg Papadopoulos, “Greg Matters” blog (http://blogs.sun.com/Gregp/entry/the_world_needs_only_five)


Page 9: Grid Infrastructure

Scaling

• Scale-up
– Add more resources within the system
– Does not require changes in the applications
– Limited extension
– Single point of failure

• Scale-out
– Add more systems
– Architecture dependent (needs change of code)
– Economical

• How to?
– Split the operation into groups
– Perform each group on a different machine
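The scale-out recipe above (split the operation into groups, run each group elsewhere) can be sketched in Python. This is only an illustration and not part of the original slides: `split_into_groups`, `process_group` and `scale_out` are hypothetical names, the per-group work is an arbitrary example, and a local thread pool stands in for the separate machines.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_groups(items, n_groups):
    """Split items into n_groups contiguous groups of near-equal size."""
    size, extra = divmod(len(items), n_groups)
    groups, start = [], 0
    for i in range(n_groups):
        # The first `extra` groups take one additional item each.
        end = start + size + (1 if i < extra else 0)
        groups.append(items[start:end])
        start = end
    return groups


def process_group(group):
    """Hypothetical per-machine work: sum the squares of the group."""
    return sum(x * x for x in group)


def scale_out(items, n_workers):
    """Process each group on its own worker and combine the partial results."""
    groups = split_into_groups(items, n_workers)
    # Assumption: threads stand in for the different machines.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_results = list(pool.map(process_group, groups))
    return sum(partial_results)
```

The code change that scale-out demands is visible here: the work must be expressible as independent groups whose partial results can be combined afterwards.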


Page 10: Grid Infrastructure

How fast can parallelization be ?

• Let:
– α be the proportion of the process that cannot be parallelized
– P be the number of processors
– S be the system speedup

Amdahl's law:

S = 1 / (α + (1 - α) / P)
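As a small worked illustration of the formula, the following Python sketch evaluates the speedup for given α and P. The function name `amdahl_speedup` is an assumption made for this example.

```python
def amdahl_speedup(alpha, processors):
    """Return S = 1 / (alpha + (1 - alpha) / P).

    alpha is the fraction of the process that cannot be parallelized,
    processors is the number of processors P.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    return 1.0 / (alpha + (1.0 - alpha) / processors)
```

For example, `amdahl_speedup(0.1, 10)` is about 5.26. Because the term (1 - α) / P only shrinks towards zero as P grows, the speedup for α = 0.1 can never reach 1 / α = 10, however many processors are added.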


Page 11: Grid Infrastructure

Cluster types

• High availability
– Active-Active
– Active-Passive
– Heartbeat

• Load balancing cluster
– Round robin (weighted/non-weighted)
– System status aware (session, CPU load, etc.)

• Compute cluster
– Queuing system (Condor, Hadoop, OpenPBS, LSF, etc.)
– Single system image (ScaleMP, SSI, Mosix, nomad, etc.)
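To make the round-robin policy of a load balancing cluster concrete, here is a minimal Python sketch. It is an illustration under assumptions: the function name and the server names are invented, and real load balancers usually interleave weighted servers more smoothly than this simple expansion does.

```python
import itertools


def weighted_round_robin(weights):
    """Return an endless iterator of server names, each in proportion to its weight.

    weights maps a server name to a positive integer weight.
    Giving every server weight 1 yields plain (non-weighted) round robin.
    """
    # Repeat each name `weight` times, then cycle over the resulting schedule.
    schedule = [name for name, weight in weights.items() for _ in range(weight)]
    if not schedule:
        raise ValueError("at least one server with a positive weight is needed")
    return itertools.cycle(schedule)
```

A usage example: `itertools.islice(weighted_round_robin({"a": 2, "b": 1}), 6)` hands out the servers in the order a, a, b, a, a, b. A status-aware balancer would instead pick the next server from live measurements such as CPU load or session count.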


Page 12: Grid Infrastructure

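Condor script

#################
# Sample script #
#################
Executable = /bin/hostname
when_to_transfer_output = ON_EXIT_OR_EVICT
Log = {file name}.log
# $(Process) is the index of each queued job, so every job gets its own files
Error = err.$(Process)
Output = out.$(Process)
# Run only on x86_64 machines whose name starts with "dopp"
Requirements = substr(Machine,0,4)=="dopp" && ARCH=="X86_64"
Arguments = +-u
notification = Complete
Universe = VANILLA
# Submit ten instances of the job
Queue 10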

Page 13: Grid Infrastructure

From a single PC to a Grid

• Farm of PCs
• Enterprise grid: mutualization of resources in a company
• Volunteer computing: CPU cycles made available by PC owners. Examples: Seti@home, Africa@home
• Grid infrastructure: Internet + disk and storage resources + services for information management (data collection, transfer and analysis). Example: EGEE

Page 14: Grid Infrastructure

Batch to On-Line scale

[Figure: a scale from batch to on-line. Labels: gLite & Globus; dedicated resources; PBS, Torque; utility computing; (Condor), Hadoop]

Page 15: Grid Infrastructure

Key Cloud Services Attributes

• Off-site, third-party provider
• Access via Internet
• Minimal/no IT skills required to “implement”
• Provisioning: self-service requesting; near real-time deployment; dynamic & fine-grained scaling
• Fine-grained usage-based pricing model
• UI: browser and successors
• Web services APIs as system interface
• Shared resources/common versions

Source: IDC, Sep 2008

Page 16: Grid Infrastructure

What is “Grid”


Page 17: Grid Infrastructure

What is Grid Computing ?

There is no widely agreed definition.

Foster & Kesselman:

• Computing resources are not administered centrally.

• Open standards are used.

• Non-trivial quality of service is achieved.


Page 18: Grid Infrastructure

Other definitions

• "the technology that enables resource virtualization, on-demand provisioning, and service (resource) sharing between organizations." (Plaszczak/Wellner)

• "a type of parallel and distributed system that enables the sharing, selection, and aggregation of geographically distributed autonomous resources dynamically at runtime depending on their availability, capability, performance, cost, and users' quality-of-service requirements" (Buyya)

• "a service for sharing computer power and data storage capacity over the Internet." (CERN)


Page 19: Grid Infrastructure

Virtual Organization

• What’s a VO?
– People in different organisations seeking to cooperate and share resources across their organisational boundaries

• Why establish a Grid?
– Share data
– Pool computers
– Collaborate

• The initial vision: “The Grid”
• The present reality: many “grids”
• Each grid is an infrastructure enabling one or more “virtual organisations” to share computing resources

[Figure: Institutes A to F grouped into two virtual organisations, VO1 and VO2]

Page 20: Grid Infrastructure

The Grid Metaphor

[Figure: grid middleware at the centre, linking visualisation, workstations, mobile access, supercomputers and PC clusters, data storage, sensors and experiments, over the Internet and other networks]

Page 21: Grid Infrastructure

Stand alone computer


Hardware

Operating system

Application

Page 22: Grid Infrastructure

Stand alone computer


Hardware

Operating system

Network stack

Application

Page 23: Grid Infrastructure

Stand alone computer


Hardware

Operating system

Network stack

Grid Middleware

Application

Page 24: Grid Infrastructure

Middleware components – The batch approach

[Figure: middleware components in the batch approach. Components: “User interface”, Resource Broker, Information Service, Replica Catalogue, Logging & Book-keeping, Authorization & Authentication, Storage Element, Computing Element. Arrow labels: job submit event, job query, job status, publish, SE & CE info, datasets info, input “sandbox” (+ broker info), output “sandbox”]

Page 25: Grid Infrastructure

[Figure: RB node diagram. Components: UI, Network Server, Workload Manager, Job Controller, Replica Location Server, Information Service, Computing Element, Storage Element. Labels: RB node; characteristics & status]

Page 26: Grid Infrastructure

[Figure: RB node diagram, with CE and SE characteristics & status]

Job Status: submitted

UI: allows users to access the functionalities of the WMS (via command line, GUI, C++ and Java APIs)

Page 27: Grid Infrastructure

[Figure: RB node diagram]

edg-job-submit myjob.jdl

myjob.jdl:

JobType = "Normal";
Executable = "$(CMS)/exe/sum.exe";
InputSandbox = {"/home/user/WP1testC", "/home/file*", "/home/user/DATA/*"};
OutputSandbox = {"sim.err", "test.out", "sim.log"};
Requirements = other.GlueHostOperatingSystemName == "linux" &&
               other.GlueHostOperatingSystemRelease == "Red Hat 7.3" &&
               other.GlueCEPolicyMaxCPUTime > 10000;
Rank = other.GlueCEStateFreeCPUs;

Job Status: submitted

Job Description Language (JDL) to specify job characteristics and requirements

Page 28: Grid Infrastructure

[Figure: RB node diagram. Labels added: RB storage, Input Sandbox files, Job]

Job Status: submitted, waiting

NS: network daemon responsible for accepting incoming requests

Page 29: Grid Infrastructure

Job submission

[Figure: RB node diagram. The job is now at the Workload Manager]

Job Status: submitted, waiting

WM: acts to satisfy the request

Page 30: Grid Infrastructure

Job submission

[Figure: RB node diagram. Label added: Match-Maker/Broker]

Job Status: submitted, waiting

Where must this job be executed?

Page 31: Grid Infrastructure

Job submission

[Figure: RB node diagram, with Match-Maker/Broker]

Job Status: submitted, waiting

Matchmaker: responsible for finding the “best” CE for a job
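The idea of finding the “best” CE can be sketched as a filter followed by a ranking, mirroring the Requirements and Rank expressions of the sample JDL job. This is a simplified illustration, not the actual matchmaker: the CE records in the usage note are invented, the function names are assumptions, and the lookups that a real broker makes against the Replica Location Server and Information Service are not modelled.

```python
def requirements(ce):
    """Mirror of the Requirements expression in the sample JDL job."""
    return (
        ce["GlueHostOperatingSystemName"] == "linux"
        and ce["GlueHostOperatingSystemRelease"] == "Red Hat 7.3"
        and ce["GlueCEPolicyMaxCPUTime"] > 10000
    )


def rank(ce):
    """Mirror of the Rank expression: more free CPUs is better."""
    return ce["GlueCEStateFreeCPUs"]


def match(computing_elements):
    """Return the eligible CE with the highest rank, or None if none qualifies."""
    eligible = [ce for ce in computing_elements if requirements(ce)]
    return max(eligible, key=rank, default=None)


def make_ce(name, release, max_cpu_time, free_cpus):
    """Build a hypothetical CE description (all values are invented)."""
    return {
        "name": name,
        "GlueHostOperatingSystemName": "linux",
        "GlueHostOperatingSystemRelease": release,
        "GlueCEPolicyMaxCPUTime": max_cpu_time,
        "GlueCEStateFreeCPUs": free_cpus,
    }
```

With three made-up CEs, `make_ce("ce-a", "Red Hat 7.3", 20000, 4)`, `make_ce("ce-b", "Red Hat 7.3", 20000, 12)` and `make_ce("ce-c", "Red Hat 9", 20000, 50)`, the sketch selects ce-b: ce-c has the most free CPUs but fails the requirements, so it is never ranked.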

Page 32: Grid Infrastructure

Job submission

[Figure: RB node diagram, with Match-Maker/Broker]

Job Status: submitted, waiting

Where are the needed data (on which SEs)?

What is the status of the Grid?

Page 33: Grid Infrastructure

Job submission

[Figure: RB node diagram, with Match-Maker/Broker. Label added: CE choice]

Job Status: submitted, waiting

Page 34: Grid Infrastructure

Job submission

[Figure: RB node diagram. Label added: Job Adapter]

Job Status: submitted, waiting

Job Adapter: responsible for the final “touches” to the job before performing submission (e.g. creation of wrapper script, PFN, etc.)

Page 35: Grid Infrastructure

Job submission

[Figure: RB node diagram. The job is now at the Job Controller]

Job Status: submitted, waiting, ready

Job Controller: responsible for the actual job management operations (done via CondorG)

Page 36: Grid Infrastructure

Job submission

[Figure: RB node diagram. The job has been passed on for scheduling]

Job Status: submitted, waiting, ready, scheduled

Page 37: Grid Infrastructure

“Compute element” – reminder!

[Figure: structure of a Compute Element]

• Grid gate node: Globus gatekeeper, info system (I.S.), logging, gridmapfile
• Local resource management system: Condor / PBS / LSF master
• Homogeneous set of worker nodes
• Incoming: job request

Page 38: Grid Infrastructure

Job submission

[Figure: RB node diagram. Labels added: Job, Input Sandbox files, “Grid enabled” data transfers/accesses]

Job Status: submitted, waiting, ready, scheduled, running

Page 39: Grid Infrastructure

Job submission

[Figure: RB node diagram. Label added: Output Sandbox files]

Job Status: submitted, waiting, ready, scheduled, running, done

Page 40: Grid Infrastructure

Job submission

[Figure: RB node diagram]

Job Status: submitted, waiting, ready, scheduled, running, done

edg-job-get-output <dg-job-id>

Page 41: Grid Infrastructure

Job submission

[Figure: RB node diagram. Label added: Output Sandbox files]

Job Status: submitted, waiting, ready, scheduled, running, done, cleared
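The sequence of job states built up over the preceding slides can be written down as a small state machine. This Python sketch covers only the successful path shown on the slides; failure or abort states are not shown there and are deliberately left out, and the names `JobState`, `LIFECYCLE` and `next_state` are assumptions made for this example.

```python
from enum import Enum


class JobState(Enum):
    """Job states in the order the slides show a successful job passing through them."""
    SUBMITTED = "submitted"
    WAITING = "waiting"
    READY = "ready"
    SCHEDULED = "scheduled"
    RUNNING = "running"
    DONE = "done"
    CLEARED = "cleared"


# Enum members iterate in definition order, which is the lifecycle order.
LIFECYCLE = list(JobState)


def next_state(state):
    """Return the state that follows `state`, or None after CLEARED."""
    index = LIFECYCLE.index(state)
    return LIFECYCLE[index + 1] if index + 1 < len(LIFECYCLE) else None
```

Starting from `JobState.SUBMITTED` and calling `next_state` repeatedly walks through the same sequence that the Job Status column accumulates on the slides.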

Page 42: Grid Infrastructure

Job monitoring

[Figure: job monitoring diagram. Components: UI, Log Monitor, Logging & Bookkeeping, Network Server, Workload Manager, Job Controller (CondorG), Computing Element. Labels: RB node; log of job events; job status]

LM: parses the CondorG log file (where CondorG logs info about jobs) and notifies LB

LB: receives and stores job events; processes the corresponding job status

edg-job-status <dg-job-id>
edg-job-get-logging-info <dg-job-id>

Page 43: Grid Infrastructure

Grid Operation and Security by Eddie Aronovich, Mar 2008 44

Approaches to Security: 1

The Poor Security House

Page 44: Grid Infrastructure


Approaches to Security: 2

The Paranoid Security House

Page 45: Grid Infrastructure


Approaches to Security: 3

The Realistic Security House

Page 46: Grid Infrastructure


Mapping certificate to local user

• Sites use a local accounting system

• Pool of users dedicated for the Grid

• Each user is mapped using gridmap file or VOMS

• Mapping can implement local policy on external users
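As an illustration of the mapping step, the Python sketch below parses grid-mapfile style entries and looks up the local account for a certificate subject. It assumes the common one-entry-per-line layout of a quoted distinguished name followed by a local account name, with `#` starting a comment line. The function names, DNs and account names are invented for this example, and VOMS-based mapping is not covered.

```python
import shlex


def parse_gridmap(text):
    """Parse lines of the assumed form: "<certificate DN>" <local account>."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        # shlex handles the quoted DN, which may contain spaces.
        dn, account = shlex.split(line)
        mapping[dn] = account
    return mapping


def map_user(mapping, dn):
    """Return the local account for a DN, or None when the site has no entry."""
    return mapping.get(dn)


# Hypothetical entries: the DNs and pool account names are invented.
SAMPLE_GRIDMAP = """
# certificate subject, then local account
"/C=XX/O=Example Grid/CN=Alice Example" grid001
"/C=XX/O=Example Grid/CN=Bob Example" grid002
"""
```

Returning `None` for an unknown subject is where a site's local policy on external users would apply, for example by refusing the job.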

Page 47: Grid Infrastructure


Certificate Request

• User generates a public/private key pair. The private key is stored encrypted on the local disk.

• User sends the public key to the CA along with proof of identity (the certificate request: public key + ID).

• CA confirms the identity, signs the certificate and sends it back to the user.

Slide based on a presentation given by Carl Kesselman at GGF Summer School 2004

Page 48: Grid Infrastructure


Inside the Certificate

• Standard (X.509) defined format.

• User identification (e.g. full name).

• User's public key.

• A “signature” from a CA, created by encoding a unique string (a hash) generated from the user's identification, the user's public key and the name of the CA. The signature is encoded using the CA's private key. This has the effect of:
– Proving that the certificate came from the CA.
– Vouching for the user's identification.
– Vouching for the binding of the user's public key to their identification.

[Figure: certificate fields: Name, Issuer: CA, Public Key, Signature]

Page 49: Grid Infrastructure


Mutual Authentication

A sends their certificate.

B verifies the signature in A's certificate.

B sends A a challenge string.

A encrypts the challenge string with their private key.

A sends the encrypted challenge to B.

B uses A's public key to decrypt the challenge.

B compares the decrypted string with the original challenge.

If they match, B has verified A's identity and A cannot repudiate it.

[Figure: message exchange between A and B: A's certificate; verify CA signature; random phrase; encrypt with A's private key; encrypted phrase; decrypt with A's public key; compare with original phrase]
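The challenge-response part of the exchange can be sketched with textbook RSA in Python. This is a toy illustration only: the key is a tiny classroom example (p = 61, q = 53), no padding or hashing is used, certificate verification is skipped, and the function names are assumptions. It requires Python 3.8 or later for the modular inverse via `pow`.

```python
import secrets

# Toy RSA key pair for A (assumption: far too small for any real use).
P, Q = 61, 53
N = P * Q                          # public modulus, 3233
E = 17                             # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent, 2753


def make_challenge():
    """B picks a random challenge between 2 and N - 1."""
    return secrets.randbelow(N - 2) + 2


def respond(challenge):
    """A transforms the challenge with the private key."""
    return pow(challenge, D, N)


def verify(challenge, response):
    """B applies A's public key and compares with the original challenge."""
    return pow(response, E, N) == challenge
```

Only the holder of the private exponent can produce a response that `verify` accepts, which is what lets B conclude that the peer is the owner of the certificate. For mutual authentication, the same exchange would be run again with the roles of A and B swapped.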

Page 50: Grid Infrastructure


Proxy certificate

• Avoid re-entering the passphrase by creating a proxy
• A proxy consists of a new certificate and a private key
• The proxy certificate contains the owner's identity (modified)
• The remote party receives the proxy's certificate (signed by the owner) and the owner's certificate
• The proxy certificate has a limited lifetime
• Chain of trust from the CA to the proxy through the owner

Page 51: Grid Infrastructure

Grids in Europe

www.eu-egi.eu

Prof. Dieter KRANZLMUELLER, EGEE 08, Istanbul, Turkey

Page 52: Grid Infrastructure

To be continued
