1 Bridging Clouds with CernVM: ATLAS/PanDA example Wenjing Wu 2010-8-27

1

Bridging Clouds with CernVM:

ATLAS/PanDA example

Wenjing Wu, 2010-8-27

2

Outline

ATLAS computing model (PanDA)

Extending ATLAS computing model to use Cloud computing resources

Challenges

Solution

Work Done

3

1. Submit jobs to the PanDA server

2. Pilots are submitted to worker nodes

3. The pilot checks the environment and fetches jobs from the PanDA server

Storage Element / Logical File Catalog

4. The pilot uploads and registers output files after the job is done

5. The pilot updates the job status on the PanDA server

6. The PanDA server manages the final data transfer

PanDA - the Production and Distributed Analysis system for the ATLAS Experiment
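The six steps above describe a pull model: the pilot starts on a worker node first and only then asks the PanDA server for work. The following is a minimal Python sketch of one pilot cycle (steps 3 to 5) against an in-memory stand-in for the server. `StubPandaServer`, `run_pilot` and the job dictionary layout are hypothetical names for illustration, not the PanDA API, and a plain list stands in for the Storage Element and file catalog.

```python
from dataclasses import dataclass, field


@dataclass
class StubPandaServer:
    """Hypothetical in-memory stand-in for the PanDA server (not its real API)."""
    queue: list = field(default_factory=list)      # jobs submitted in step 1
    statuses: dict = field(default_factory=dict)   # job_id -> final status

    def get_job(self):
        # Step 3: the pilot pulls a job only once it is running on a worker node.
        return self.queue.pop(0) if self.queue else None

    def update_status(self, job_id, status):
        # Step 5: the pilot reports the job's final state.
        self.statuses[job_id] = status


def run_pilot(server, storage, environment_ok=True):
    """One pilot cycle on a Grid worker node (steps 3 to 5), simulated."""
    if not environment_ok:  # step 3: environment check failed, take no job
        return None
    job = server.get_job()
    if job is None:  # nothing queued
        return None
    outputs = [f"{job['id']}.out"]  # pretend the payload produced one file
    storage.extend(outputs)  # step 4: upload/register (needs Grid credentials)
    server.update_status(job["id"], "finished")  # step 5
    return job["id"]


server = StubPandaServer(queue=[{"id": "job-1"}])
storage = []  # stands in for Storage Element + file catalog
print(run_pilot(server, storage))  # job-1
print(server.statuses)             # {'job-1': 'finished'}
print(storage)                     # ['job-1.out']
```

Step 4 is where the plain pilot needs Grid credentials on the worker node, which is exactly what cannot be taken into an untrusted Cloud or volunteer machine.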

4

Extending the ATLAS computing model to use Cloud Computing resources

What are Clouds (in today's common terms)?

Virtualized computing resources provided by academic and commercial institutions (e.g. CERN lxcloud, Amazon EC2)

Resources provided by users participating in volunteer computing projects (e.g. BOINC)

The goal:

Run ATLAS production jobs on Cloud Computing resources.

5

Challenges

Transparency: users and production operators should not notice any difference

The whole set of Cloud resources should appear to the PanDA server as just another Grid site

Credentials (which are essential for the functioning of the PanDA pilot) cannot be brought into the ‘untrusted’ environment (e.g. onto the volunteers' machines)

6

Solve the challenge using CernVM

CernVM

Provides a lightweight virtual machine image containing the applications of the LHC experiments

The application software is distributed through an HTTP-based content delivery network and is cached locally

Provides Co-Pilot: a framework for the delivery and execution of the workload on remote virtual machines
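Local caching of HTTP-delivered software can be illustrated with a read-through cache: the first read of a file goes to the network, later reads are served from local disk. This is a minimal sketch, assuming a `fetch` callable in place of a real HTTP client; the class name, the SHA-1 naming of cached files (loosely modelled on content-addressed storage) and the example path are illustrative assumptions, not CernVM-FS internals.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Callable


class ReadThroughCache:
    """Local disk cache in front of a remote (e.g. HTTP) content source.

    `fetch` is any callable returning the bytes for a path; a real client
    would issue an HTTP GET here. Cached files are named by the SHA-1 of
    their content (an illustrative choice).
    """

    def __init__(self, cache_dir: Path, fetch: Callable[[str], bytes]):
        self.cache_dir = cache_dir
        self.fetch = fetch
        self.index = {}  # path -> content hash of the cached copy
        self.remote_fetches = 0
        cache_dir.mkdir(parents=True, exist_ok=True)

    def read(self, path: str) -> bytes:
        digest = self.index.get(path)
        if digest is not None:
            blob = self.cache_dir / digest
            if blob.exists():
                return blob.read_bytes()  # cache hit: no network traffic
        data = self.fetch(path)  # cache miss: go to the network
        self.remote_fetches += 1
        digest = hashlib.sha1(data).hexdigest()
        (self.cache_dir / digest).write_bytes(data)
        self.index[path] = digest
        return data


remote = {"/sw/setup.sh": b"echo setup\n"}  # hypothetical remote content
with tempfile.TemporaryDirectory() as tmp:
    cache = ReadThroughCache(Path(tmp), remote.__getitem__)
    cache.read("/sw/setup.sh")
    cache.read("/sw/setup.sh")
    print(cache.remote_fetches)  # 1
```

Naming cached files by content hash means identical files published under different paths are stored only once on the VM.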

7

Co-Pilot Job Manager

Co-Pilot Storage Manager

Storage Element / Logical File Catalog

Co-Pilot Client

1. Submit PanDA job

2. Submit Co-Pilot job

3. Agent gets a Co-Pilot job, which launches the PanDA pilot

4. Pilot fetches the PanDA job and runs it

5. Uploads output to temporary storage after the job finishes

6. Uploads and registers output files

7. Updates final job status on the PanDA server

Cloud resources provided through VMs running Co-Pilot Agents

CernVM Co-Pilot

Integration!
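The key difference from the plain PanDA flow is where the credential lives: the VM running the Co-Pilot Agent never holds it, and steps 6 and 7 happen on the trusted side. The Python sketch below models that split under simplifying assumptions; `TrustedSide`, `agent_cycle`, the message layout and the credential placeholder are hypothetical, and the pilot's actual work in step 4 is simulated by producing one output file name.

```python
from dataclasses import dataclass, field


@dataclass
class TrustedSide:
    """Hypothetical stand-in for the trusted Co-Pilot components
    (Job Manager and Storage Manager), the only place the credential lives."""
    credential: str
    copilot_queue: list = field(default_factory=list)
    staging: dict = field(default_factory=dict)       # temporary storage
    grid_storage: list = field(default_factory=list)  # SE + file catalog
    panda_status: dict = field(default_factory=dict)  # PanDA server view

    def submit_copilot_job(self, panda_job_id):
        # Step 2: a Co-Pilot job whose payload launches the PanDA pilot.
        self.copilot_queue.append({"launch_pilot_for": panda_job_id})

    def next_copilot_job(self):
        # Step 3: handed to whichever agent asks first.
        return self.copilot_queue.pop(0) if self.copilot_queue else None

    def receive_output(self, job_id, files):
        # Step 5: output lands in temporary storage; no credential involved.
        self.staging[job_id] = files

    def finalize(self, job_id):
        # Steps 6 and 7 run only here, where the credential is available.
        if not self.credential:
            raise PermissionError("Grid operations need a credential")
        self.grid_storage.extend(self.staging.pop(job_id))
        self.panda_status[job_id] = "finished"


def agent_cycle(get_job, send_output):
    """Untrusted VM side: it is given these two callables, not the credential."""
    cp_job = get_job()
    if cp_job is None:
        return None
    job_id = cp_job["launch_pilot_for"]
    # Step 4 (simulated): the pilot runs the PanDA job and produces one file.
    send_output(job_id, [f"{job_id}.root"])
    return job_id


trusted = TrustedSide(credential="grid-proxy-placeholder")
trusted.submit_copilot_job("panda-42")
done = agent_cycle(trusted.next_copilot_job, trusted.receive_output)
trusted.finalize(done)
print(trusted.panda_status)  # {'panda-42': 'finished'}
print(trusted.grid_storage)  # ['panda-42.root']
```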

8

Work Done (1)

Set up the CERNVM site (part of the ATLAS Grid infrastructure)

Is a dynamic virtual cluster formed by virtual machines running CernVM Co-Pilot Agents

Is configured according to ATLAS computing conventions

Appears to ATLAS Grid central services as a Tier 2 site

9

Work Done (2)

Adaptation of the PanDA Pilot:

Adding support for the heterogeneous structure of the software repository

Adding support for saving job output metadata and job status files

Development of Co-Pilot Storage Manager

A component running in the trusted environment and acting as a proxy between Co-Pilot agents and PanDA Grid services
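Saving output metadata and job status files lets a trusted component finish the job later, since on a Co-Pilot VM the pilot cannot register files or update PanDA with credentials itself. A minimal sketch of such a hand-off is shown below, assuming one JSON report per job; the file name pattern, the field names and the use of an Adler-32 checksum are assumptions for illustration, not the pilot's real file format.

```python
import json
import tempfile
import zlib
from pathlib import Path


def describe_output(path: Path) -> dict:
    """Metadata for one output file: name, size and Adler-32 checksum."""
    data = path.read_bytes()
    return {"lfn": path.name, "size": len(data),
            "adler32": f"{zlib.adler32(data) & 0xffffffff:08x}"}


def save_job_report(workdir: Path, job_id: str, status: str, outputs: list) -> Path:
    """Write the job's final status and output metadata to a JSON file
    that a trusted component can pick up later (hypothetical format)."""
    report = {"job_id": job_id, "status": status, "outputs": outputs}
    path = workdir / f"{job_id}.report.json"
    path.write_text(json.dumps(report, indent=2))
    return path


def load_job_report(path: Path) -> dict:
    return json.loads(path.read_text())


with tempfile.TemporaryDirectory() as tmp:
    workdir = Path(tmp)
    out = workdir / "output.root"
    out.write_bytes(b"abc")
    report_path = save_job_report(workdir, "panda-7", "finished", [describe_output(out)])
    print(load_job_report(report_path)["outputs"])
    # [{'lfn': 'output.root', 'size': 3, 'adler32': '024d0127'}]
```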

11

Thanks!

12

Solve the challenge using CernVM

CernVM Co-Pilot helps run ATLAS PanDA jobs in a non-credentialed computing environment.

CernVM Co-Pilot Components:

Co-Pilot Client: submits jobs to the Co-Pilot Job Manager

Co-Pilot Server:

Co-Pilot Job Manager: dispatches jobs to Co-Pilot Agents

Co-Pilot Storage Manager: uploads/registers output files and changes job status using the credential

Co-Pilot Agent: runs the jobs on non-credentialed computer nodes
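The Job Manager's role of dispatching jobs to agents can be sketched as a simple pull-based queue, which suits a virtual cluster whose VMs come and go. This Python sketch assumes agents ask for work when idle (the deck says the agent "gets" a Co-Pilot job); the class, its methods and the requeue-on-loss policy are hypothetical illustrations, not the Co-Pilot protocol.

```python
from collections import deque


class JobManager:
    """Pull-based dispatch: agents ask for work whenever they are idle.

    Because agents pull, VMs can join or leave the virtual cluster without
    being registered in advance. Hypothetical names, not the Co-Pilot protocol.
    """

    def __init__(self):
        self.pending = deque()  # jobs waiting for an agent
        self.assigned = {}      # job_id -> agent_id

    def submit(self, job_id):
        # Called on behalf of the Co-Pilot client.
        self.pending.append(job_id)

    def request_job(self, agent_id):
        # Called by a Co-Pilot agent; returns None when there is no work.
        if not self.pending:
            return None
        job_id = self.pending.popleft()
        self.assigned[job_id] = agent_id
        return job_id

    def agent_lost(self, agent_id):
        # Put a vanished VM's jobs back at the front of the queue, in order.
        lost = [j for j, a in self.assigned.items() if a == agent_id]
        for job_id in lost:
            del self.assigned[job_id]
        self.pending.extendleft(reversed(lost))
        return lost


jm = JobManager()
for job in ("j1", "j2", "j3"):
    jm.submit(job)
print(jm.request_job("vm-a"))  # j1
print(jm.request_job("vm-b"))  # j2
print(jm.agent_lost("vm-a"))   # ['j1']
print(jm.request_job("vm-c"))  # j1
```

Requeuing at the front is one possible policy for volunteer VMs that disappear mid-job; a real system would also need timeouts, since a lost agent may never announce itself.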

13

Ingredients

CernVM

Provides an ultralight image for different hypervisors

ATLAS software is distributed by CVMFS, cached locally

Co-Pilot

Co-Pilot Agent is distributed with CernVM image

Schedules jobs to CernVM virtual clusters

14

Co-Pilot Storage Manager

How does the Co-Pilot SM (Storage Manager) work?

SM receives a “JobDone” message from a Co-Pilot agent (the JobID is included)

SM calls Co-Pilot_Data_Mover, which extracts the job output metadata from the pilot log, uploads the files to the designated SE and registers them in the designated LFC catalog

SM verifies the status of file uploading and registration

SM calls Co-Pilot_Job_Status_Updater, which updates the job status on the PanDA server (finished or failed)

Both Co-Pilot_Data_Mover and Co-Pilot_Job_Status_Updater are Python scripts using libraries from the pilot source code
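The Storage Manager steps above can be condensed into one message handler. In the Python sketch below, `data_mover` and `status_updater` are injected callables standing in for Co-Pilot_Data_Mover and Co-Pilot_Job_Status_Updater; their signatures, the per-file report format and the rule that an empty report counts as a failure are assumptions made for illustration.

```python
from typing import Callable


def handle_job_done(message: dict,
                    data_mover: Callable[[str], dict],
                    status_updater: Callable[[str, str], None]) -> str:
    """React to a "JobDone" message from a Co-Pilot agent (sketch).

    data_mover: stand-in for Co-Pilot_Data_Mover; uploads and registers the
        job's outputs and returns a per-file report, e.g.
        {"out.root": {"uploaded": True, "registered": True}}.
    status_updater: stand-in for Co-Pilot_Job_Status_Updater.
    """
    job_id = message["JobID"]
    report = data_mover(job_id)
    # Verify every file was both uploaded and registered; no files counts as failure.
    ok = bool(report) and all(
        r.get("uploaded") and r.get("registered") for r in report.values())
    status = "finished" if ok else "failed"
    status_updater(job_id, status)
    return status


updates = []
status = handle_job_done(
    {"type": "JobDone", "JobID": "panda-7"},
    data_mover=lambda jid: {f"{jid}.root": {"uploaded": True, "registered": True}},
    status_updater=lambda jid, st: updates.append((jid, st)),
)
print(status, updates)  # finished [('panda-7', 'finished')]
```

Injecting the two helpers keeps the handler testable without Grid access; in the real deployment they would be the Python scripts built on the pilot libraries.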