WLCG Overview Board, September 3rd 2010
P. Mato, P. Buncic
Use of multi-core and virtualization technologies
Two R&D projects started early 2008 in PH Department (under White Paper Theme 3)
◦ WP8 - Parallelization of Software Frameworks to exploit Multi-core Processors
◦ WP9 - Portable Analysis Environment using Virtualization Technology
Kick-off Workshop on April 15, 2008
1st Workshop on adapting applications and computing services to multi-core and virtualization took place in June 2009
◦ A number of follow-up actions were identified
2nd Workshop on June 21-22, 2010
◦ The goals were to review the progress on the follow-up actions, get new feedback from the experiments and set directions for the two R&D WPs
Introduction
WP8 - Multicore R&D
The aim of the Multicore R&D project is to investigate novel software solutions to efficiently exploit the new multi-core architecture of modern computers in our HEP environment
Motivation:
◦ Industry trend in workstation and “medium range” computing
Activity divided in four “tracks”:
◦ Technology Tracking & Tools
◦ System and core-lib optimization
◦ Framework Parallelization
◦ Algorithm Optimization and Parallelization
Activities
Code optimization
◦ Direct collaboration with Intel experts established to help analyze and improve the code
Exploiting event parallelism
◦ Sharing data between processes to save memory
◦ Simulate events in different threads using Geant4
◦ Parallel analysis using PROOF Lite and GaudiPython
Algorithm parallelization
◦ Ongoing effort in collaboration with Openlab and ROOT teams to provide basic thread-safe/multi-threaded library components: random number generators, parallel minimization/fitting algorithms, parallel/vector linear algebra
Deployment issues
◦ Current batch/grid infrastructure has to be configured to support multi-core/full node allocation
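The "sharing data between processes to save memory" idea above can be sketched in a few lines. This is a minimal illustration, not the experiments' actual frameworks: on Linux, fork()-based worker processes inherit large read-only structures (geometry, conditions data) copy-on-write, so N workers do not pay N times the memory cost. All names here are made up for the example.

```python
import multiprocessing as mp

# Stand-in for a large read-only structure (e.g. detector geometry).
# With fork()-based workers it is shared copy-on-write, which is the
# memory saving the multi-process approach is after.
GEOMETRY = list(range(100_000))

def process_event(event_id):
    # Each worker reads the shared structure without copying it.
    return (event_id * len(GEOMETRY)) % 7

if __name__ == "__main__":
    # Event-level parallelism: one event per task, farmed out to workers.
    with mp.Pool(processes=4) as pool:
        results = pool.map(process_event, range(16))
    print(len(results))  # -> 16
```

The same pattern underlies the multi-process (as opposed to multi-threaded) frameworks mentioned later in the slides: parallelism across events, shared read-only data, no locking in the event loop.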
• Aims to provide a complete, portable and easy-to-configure user environment for developing and running LHC data analysis locally and on the Grid, independent of the physical software and hardware platform (Linux, Windows, MacOS)
 Code check-out, editing, compilation, local small tests, debugging, …
 Grid submission, data access, …
 Event displays, interactive data analysis, …
 Suspend, resume, …
• Decouple application lifecycle from evolution of system infrastructure
• Reduce effort to install, maintain and keep up to date the experiment software (CernVM-FS)
• CernVM 1.x (SLC4) and CernVM 2.x (SLC5) released
• Small (200-350 MB) image
• Available for all popular hypervisors and on the Amazon Cloud (EC2)
WP9 - Virtualization R&D
CernVM Users
~2200 different IP addresses: ATLAS 67%, LHCb 19%, CMS 6%, others
[Diagram: CernVM clients fetch content through a hierarchy of proxy servers backed by replicated HTTP servers]
Web scale using Web technology
◦ Proxy and slave servers could be deployed on strategic locations to reduce latency and provide redundancy
◦ Collaboration with CMS/Frontier deployment: reusing 70+ already deployed squid servers
◦ Commercial SimpleCDN as backup
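The proxy hierarchy above is ordinary HTTP caching: a client asks its site squid, which falls back to a regional proxy, which falls back to the origin server, and each tier caches what it fetched so later requests are served closer to the client. A toy model (tier names and the file path are invented for illustration):

```python
class CacheTier:
    """One tier in a caching hierarchy (site squid, regional proxy, origin)."""

    def __init__(self, name, upstream=None):
        self.name = name
        self.upstream = upstream  # next tier toward the origin server
        self.cache = {}

    def get(self, path):
        if path in self.cache:
            return self.cache[path], self.name  # cache hit at this tier
        if self.upstream is None:
            raise KeyError(path)                # origin miss: no such file
        value, served_by = self.upstream.get(path)
        self.cache[path] = value                # populate on the way back
        return value, served_by

origin = CacheTier("origin")
origin.cache["/sw/release.tar"] = b"..."        # hypothetical content
regional = CacheTier("regional-squid", upstream=origin)
site = CacheTier("site-squid", upstream=regional)

print(site.get("/sw/release.tar")[1])  # first request served by: origin
print(site.get("/sw/release.tar")[1])  # repeat request served by: site-squid
```

This is why reusing the 70+ already-deployed CMS/Frontier squids is attractive: the same stateless HTTP caching layer serves any content addressed by URL.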
Ideally the analysis activity should be a continuum in terms of tools, paradigms, software frameworks, models, …
◦ Identical analysis applications should be able to run the same way on a desktop/laptop, a small cluster, a large cluster and the Grid
CernVM is a convenient tool in the hands of our end users/physicists and lets them use experiment software frameworks on their laptops
◦ with little overhead
◦ without the need to continuously download, install and configure new versions of experiment software (ATLAS, CMS, LHCb, LCD, NA61, TH, …)
Physicists seem to like CernVM on their laptops and on the Cloud
◦ Can we have it on the Grid, please?
From laptop to Cloud and Grid
21/6/10, Welcome and Introduction, P. Mato/CERN
Actions from the 1st Workshop
Multicore
◦ Try submission of parallel jobs (multi-threaded, multi-process, MPI) with the existing LSF infrastructure
◦ Deploy multi-core performance and monitoring tools
◦ Run multi-core jobs on the Grid
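Concretely, "submission of parallel jobs with the existing LSF infrastructure" amounts to asking the batch system for N slots constrained to a single host. A sketch assuming standard LSF `bsub` syntax (the job script name is invented):

```python
# Build an LSF submission command for a multi-core job.
# -n: number of slots requested; -R "span[hosts=1]": place all
# slots on one host, so the job effectively owns (part of) a node.
def lsf_submit_cmd(cores, script="run_event_loop.sh"):
    return ["bsub", "-n", str(cores), "-R", "span[hosts=1]", script]

print(" ".join(lsf_submit_cmd(8)))
# -> bsub -n 8 -R span[hosts=1] run_event_loop.sh
```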
Virtualization
◦ Transition of CernVM beyond the R&D phase
◦ Use CernVM images in virtualized batch systems
◦ Prototype an ‘lxcloud’ solution for submitting user jobs using the EC2/Nimbus API
◦ Establish procedures for creating trusted images (e.g. CernVM) acceptable to Grid sites
◦ Investigate scenarios for reducing the need for public IP addresses on WNs
Feedback from the 2nd Workshop (1)
Experiments requested access to whole nodes
◦ Allows them to test the multi-threaded and multi-process applications that are being developed, and also their pilot frameworks managing the correct mix of jobs to best optimize the entire node’s resources
◦ Taking responsibility for ensuring that the node is fully utilized
Issues
◦ The implementation would require end-to-end changes, from the end-user submission framework to the local batch configuration and Grid middleware
◦ Adaptation of the accounting (memory and CPU) and monitoring tools
◦ Need for better handling of the large files that will eventually result from larger parallel jobs
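The pilot's job of "managing the correct mix of jobs to best optimize the entire node resources" is essentially a packing problem. A deliberately simple sketch (not any experiment's pilot code; job names and core counts are invented) using greedy first-fit by requested core count:

```python
# A pilot that owns a whole node picks jobs from its queue until
# the node's cores are used up (greedy first-fit).
def fill_node(total_cores, queue):
    """queue: list of (job_name, cores_requested). Returns chosen jobs
    and the number of cores left idle."""
    chosen, free = [], total_cores
    for name, cores in queue:
        if cores <= free:
            chosen.append(name)
            free -= cores
    return chosen, free

queue = [("sim", 4), ("analysis", 1), ("reco-mp", 8), ("merge", 1), ("sim2", 4)]
print(fill_node(8, queue))  # -> (['sim', 'analysis', 'merge'], 2)
```

Even this toy shows why whole-node scheduling shifts responsibility: the two idle cores are now the pilot's problem, not the batch system's, which is exactly the accounting and monitoring issue raised above.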
Feedback from the 2nd Workshop (2)
Service Virtualization
◦ Virtualization of services is in general well accepted and the experience so far is very positive
◦ In particular, VO Box virtualization is already planned (at CERN)
Worker Node Virtualization
◦ The lxcloud prototype development based on OpenNebula at CERN-IT is encouraging
Issues
◦ Questions were raised about the lifetime of the Virtual Machines and the need for an API to control them
◦ Experiments expressed some concern about the performance loss on Virtual Machines, in particular for I/O operations to local disk
Generation of Virtual Machine Images
◦ A HEPiX policy document has been prepared establishing obligations for people providing virtual images
There is general agreement that the experiment software should be treated independently from the base operating system
CernVM File System can be a game changer
◦ Could be deployed on standard worker nodes to solve the problem of software distribution
◦ Experiments requested support for the CernVM File System (CVMFS) as the content delivery mechanism for adding software to a VM image after it has been instantiated
LHCb and ATLAS requested IT to host and provide 24*7 support for the CernVM infrastructure
Feedback from the 2nd Workshop (3)
Progress since the Workshop
Multicore
◦ ATLAS, CMS and LHCb have all released "production-grade" parallel multi-process applications
◦ Both ATLAS and LHCb are now testing submission of "full-node" jobs
Virtualization
◦ Separate release cycles for CernVM-FS and CernVM
◦ Multi-VO support for CernVM-FS (via automounter)
◦ CernVM as job hosting environment
 Tested ATLAS/PanDA pilot running in CernVM/CoPilot on the lxcloud prototype
 Theory group applications (MC generators) are now running on BOINC/CernVM
 Developed contextualization tools for the EC2 API and compatible infrastructure, HEPiX-compatible
◦ Benchmarking and performance evaluation of PROOF is under way
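The "contextualization tools" mentioned above follow the EC2 pattern: the infrastructure hands an opaque "user data" blob to the booting VM, and a first-boot agent inside the guest applies it. A hypothetical sketch of both sides (the context keys and the agent behaviour are invented for illustration, not the actual CernVM tools):

```python
import base64
import json

def make_user_data(experiment, copilot_queue):
    """Launch side: encode per-VM settings into an EC2-style user-data blob."""
    context = {"cvmfs_repo": f"{experiment}.cern.ch",   # which software to mount
               "copilot_queue": copilot_queue}           # which job queue to join
    return base64.b64encode(json.dumps(context).encode())

def apply_user_data(blob):
    """Guest side: what a first-boot agent might do with the blob."""
    context = json.loads(base64.b64decode(blob))
    return f"mount {context['cvmfs_repo']}; join {context['copilot_queue']}"

blob = make_user_data("atlas", "panda")
print(apply_user_data(blob))  # -> mount atlas.cern.ch; join panda
```

The point of contextualization is that one generic, trusted CernVM image can serve every VO: all experiment-specific configuration arrives at instantiation time rather than being baked into the image.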
Summary
Both WP8 & WP9 are making very good progress in close cooperation with the experiments
Applications being developed to exploit new hardware architectures and virtualization technology impose new requirements on the computing services provided by the local computer centers or by the Grids
◦ We need to be able to submit jobs that require a “whole node”
◦ CernVM infrastructure services must be supported 24*7
Experiments would like to be given an opportunity to test these new developments on local batch clusters and on the Grid