INFSO-RI-508833

Enabling Grids for E-sciencE

www.eu-egee.org

The Use of Condor in the gLite Grid Middleware

Erwin Laure

Condor Week

14-15 March 2005

Contents

• Overview on EGEE and gLite

• gLite and Condor

• Future plans

The EGEE Project

• EU funded (2 years until March 2006)

• EGEE offers the largest production grid facility in the world open to many applications (HEP, BioMedical, generic)

• Existing production service based on LCG

• Next generation open source web-services middleware being re-engineered taking into account production/deployment/management needs

• Well-defined, distributed support structure to provide eInfrastructure that is available to many application domains

• Middleware Activity
– Re-engineer and harden Grid middleware
– Provide production quality middleware

www.eu-egee.org

[Figure: Network infrastructure (GÉANT); Operations, Support and training; Collaborations; Global Grid]

EGEE Activities

• 48 % service activities (Grid Operations, Support and Management, Network Resource Provision)

• 24 % middleware re-engineering (Quality Assurance, Security, Network Services Development)

• 28 % networking (Management, Dissemination and Outreach, User Training and Education, Application Identification and Support, Policy and International Cooperation)

Emphasis in EGEE is on operating a production grid and supporting the end-users

Computing Resources: Feb 2005

[Figure: map; legend: country providing resources, country anticipating joining]

In LCG-2:
• 113 sites, 30 countries
• >10,000 CPUs
• ~5 PB storage

Includes non-EGEE sites:
• 9 countries
• 18 sites

gLite Grid Middleware guiding principles

• Service oriented approach
– Allow for multiple interoperable implementations

• Lightweight (existing) services
– Easily and quickly deployable
– Use existing services where possible: Condor, EDG, Globus, LCG, …

• Portable
– Being built on Scientific Linux and Windows

• Security
– Sites and Applications

• Performance/Scalability & Resilience/Fault Tolerance
– Comparable to deployed infrastructure

• Co-existence with deployed infrastructure
– Co-existence with LCG-2 and OSG (US) is essential for the EGEE Grid services

• Site autonomy
– Reduce dependence on ‘global, central’ services

• Open source license

From Development to Product

• Fast prototyping approach
– Small scale testbed (initially CERN and Wisconsin)

• Single out individual components for deployment on pre-production service (originally LCG-2/EGEE0 based)

• These components need to go through integration and testing
– To ensure they are deployable and basically work

[Figure: timeline for 2004 and 2005 with the labels LCG-2 (=EGEE-0), prototyping, product, LCG-3, EGEE-1]

gLite Services and Responsible Clusters

• Access Services: Grid Access Service, API

• Job Management Services: Job Provenance, Computing Element, Workload Management, Package Manager

• Data Services: Metadata Catalog, Storage Element, Data Management, File & Replica Catalog

• Security Services: Authorization, Authentication, Auditing

• Information & Monitoring Services: Information & Monitoring, Application Monitoring

• Also shown: Site Proxy, Accounting

Responsible clusters (figure legend): JRA3, UK, CERN, IT/CZ

gLite Services for Release 1

[Figure: gLite service diagram with Access Services, Job Management Services, Data Services, Security Services, and Information & Monitoring Services; responsible clusters JRA3, UK, CERN, IT/CZ]

• Focus on key services

• Release date is March 31st 2005

Condor and gLite

• Design team including representatives from middleware providers (AliEn, Condor, EDG, Globus, …), including US partners, produced the middleware architecture and design.
– Takes into account input and experiences from applications, operations, and related projects
– DJRA1.1 – EGEE Middleware Architecture (June 2004) https://edms.cern.ch/document/476451/
– DJRA1.2 – EGEE Middleware Design (August 2004) https://edms.cern.ch/document/487871/

• Wisconsin is one of the sites of the development prototype
– Using Condor pool as backend
– Using Globus RLS

• Use VDT distribution of Condor and Globus

WMS

• Condor-C
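
The slide names Condor-C as the mechanism used by the WMS. As a rough illustration of what handing a job to a remote scheduler through Condor-C can look like, the sketch below builds a Condor-C submit description as text. It assumes the `grid_resource = condor <schedd> <collector>` syntax from the current HTCondor manual, which may differ from the Condor release used with gLite in 2005. The helper name, host names, and file names are hypothetical and do not come from the slides.

```python
def condor_c_submit_description(executable, remote_schedd, remote_collector,
                                output="job.out", error="job.err", log="job.log"):
    """Return the text of a Condor-C submit description (illustrative only)."""
    lines = [
        "universe = grid",
        # Condor-C: forward the job to the queue of a remote schedd.
        f"grid_resource = condor {remote_schedd} {remote_collector}",
        f"executable = {executable}",
        f"output = {output}",
        f"error = {error}",
        f"log = {log}",
        "queue",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical hosts; replace with a real schedd and collector.
    print(condor_c_submit_description(
        "analysis.sh", "schedd.ce.example.org", "collector.ce.example.org"))
```

The resulting text could be written to a file and passed to `condor_submit`; the sketch deliberately stops short of invoking it.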

The current gLite CE

• Collaboration of INFN, Univ. of Chicago, Univ. of Wisconsin-Madison, and the EGEE security activity (JRA3)

[Figure: CE architecture. Components shown: Gatekeeper, LCAS, LCMAPS, WSS, CEMon, Condor-C, Blahpd, local batch system (LSF, PBS/Torque, Condor). Arrow labels: Notifications, Launch Condor-C, Submit job]
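
The figure indicates that jobs end up in a local batch system (LSF, PBS/Torque, or Condor). A minimal sketch of that last dispatch step follows, assuming each system's standard command-line submit tool (`bsub`, `qsub`, `condor_submit`). The table and function names are hypothetical, and the sketch does not claim to reflect how Blahpd is actually implemented.

```python
# Standard submit tools of the batch systems named on the slide.
SUBMIT_COMMANDS = {
    "lsf": "bsub",
    "pbs": "qsub",        # also used by Torque
    "torque": "qsub",
    "condor": "condor_submit",
}


def build_submit_command(batch_system, job_file):
    """Return the argv list that would submit job_file to batch_system."""
    try:
        tool = SUBMIT_COMMANDS[batch_system.lower()]
    except KeyError:
        raise ValueError(f"unsupported batch system: {batch_system!r}") from None
    return [tool, job_file]


if __name__ == "__main__":
    for system in ("LSF", "PBS", "Condor"):
        print(system, build_submit_command(system, "job.sh"))
```

Returning an argument list rather than running the command keeps the sketch free of side effects; a caller could pass the list to `subprocess.run`.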

Data Management Services

• Efficient and reliable data storage, movement, and retrieval on the infrastructure

• Storage Element
– Reliable file storage (SRM based storage systems)
– Posix-like file access (gLite I/O)
– Transfer (GridFTP)

• File and Replica Catalog
– Resolves logical filenames (LFN) to physical location of files (URL understood by SRM) and storage elements
– Hierarchical file system like view in LFN space
– Single catalog or distributed catalog (under development) deployment possibilities

• File Transfer and Placement Service
– Reliable file transfer and transactional interactions with catalogs
– Stork being evaluated

• Data Scheduler
– Scheduled data transfer in the same spirit as jobs are being scheduled, taking into account e.g. network characteristics (collaboration with JRA4)
– Under development

• Metadata Catalog
– Limited metadata can be attached to the File and Replica Catalog
– Interfaces to application specific catalogs have been defined

[Figure: data management components. Labels: Storage Element (SRM, I/O, GridFTP), Transfer Agent, FPS, Data Scheduler, VOs, site boundary, LocalCat, Catalog, MOM]
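
To make the File and Replica Catalog role concrete, here is a toy in-memory sketch of the two behaviours the slide describes: resolving a logical file name (LFN) to the physical replica URLs, and offering a hierarchical, file-system-like view of the LFN space. The class, its methods, and the example URLs are hypothetical; this is not the gLite catalog interface.

```python
from collections import defaultdict
import posixpath


class ReplicaCatalog:
    """Toy catalog mapping logical file names (LFNs) to replica URLs."""

    def __init__(self):
        self._replicas = defaultdict(list)

    def register(self, lfn, url):
        """Record that a replica of lfn is stored at url."""
        lfn = posixpath.normpath(lfn)
        if url not in self._replicas[lfn]:
            self._replicas[lfn].append(url)

    def resolve(self, lfn):
        """Return all known replica URLs for lfn (empty list if unknown)."""
        return list(self._replicas.get(posixpath.normpath(lfn), []))

    def listdir(self, directory):
        """Return the names directly below directory in the LFN namespace."""
        prefix = posixpath.normpath(directory).rstrip("/") + "/"
        children = set()
        for lfn in self._replicas:
            if lfn.startswith(prefix):
                children.add(lfn[len(prefix):].split("/", 1)[0])
        return sorted(children)


if __name__ == "__main__":
    catalog = ReplicaCatalog()
    catalog.register("/grid/demo/run1.dat", "srm://se1.example.org/demo/run1.dat")
    catalog.register("/grid/demo/run1.dat", "srm://se2.example.org/demo/run1.dat")
    print(catalog.resolve("/grid/demo/run1.dat"))
    print(catalog.listdir("/grid"))
```

A single dictionary corresponds to the single-catalog deployment option; the distributed option mentioned on the slide would need several such catalogs kept consistent, which this sketch does not attempt.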

Evolutions foreseen in 2005

Here follows a list of main topics we still need to address; details and other topics need to be worked out with operations and applications.

• WMS
– WS Interface
   Better support for bulk job submission

• CE
– Head node monitoring
   Guard and if necessary pause/resume services running on the head node
– SUDO service
   Currently one Condor-C instance needed per user
   Condor-C should run under a VO user and submit jobs via a sudo service to the local batch system
– This is being done in collaboration with Condor and Globus

Evolutions foreseen in 2005 (continued)

• Catalogs
– Distributed and single deployment options

• FTS/FPS
– Channel management
– Clarify the role of Stork

• Data Scheduler
– “broker” for data transfer “jobs”

• R-GMA Information system
– Web service version

• Package Manager

Second major gLite release foreseen at the end of 2005

Summary

• Contributions of Condor to gLite span the whole process
– Design, prototyping, product

• International collaborations in Grid middleware are essential
– Grid middleware cannot be developed within a single corner
   Effective exchange of ideas, requirements, solutions and technologies
   Coordinated development of new capabilities
   Open communication channels
   Early detection of differences and disagreements
– Condor link to OSG is very important to gLite
– A common integration and testing project is being defined

• The gLite process, in particular through the strong interactions between European (EGEE) and US (Condor and Globus) projects, is a first step towards truly international collaborative middleware development

More information

http://www.glite.org

http://cern.ch/egee-jra1