WLCG Overview Board, September 3rd 2010
P. Mato, P. Buncic
Use of multi-core and virtualization technologies
Two R&D projects started early 2008 in PH Department (under White Paper Theme 3)
◦ WP8 - Parallelization of Software Frameworks to exploit Multi-core Processors
◦ WP9 - Portable Analysis Environment using Virtualization Technology
Kick-off Workshop on April 15, 2008
1st Workshop on adapting applications and computing services to multi-core and virtualization took place in June 2009
◦ A number of follow-up actions were identified
2nd Workshop on June 21-22, 2010
◦ The goals were to review the progress on the follow-up actions, get new feedback from the experiments and set directions for the two R&D WPs
Introduction
WP8 - Multicore R&D
The aim of the Multicore R&D project is to investigate novel software solutions to efficiently exploit the new multi-core architecture of modern computers in our HEP environment
Motivation:
◦ Industry trend in workstation and “medium range” computing
Activity divided in four “tracks”:
◦ Technology Tracking & Tools
◦ System and core-lib optimization
◦ Framework Parallelization
◦ Algorithm Optimization and Parallelization
Activities
Code optimization
◦ Direct collaboration with Intel experts established to help analyze and improve the code
Exploiting event parallelism
◦ Sharing data between processes to save memory
◦ Simulate events in different threads using Geant4
◦ Parallel analysis using PROOF Lite and GaudiPython
Algorithm parallelization
◦ Ongoing effort in collaboration with Openlab and ROOT teams to provide basic thread-safe/multi-threaded library components: random number generators, parallel minimization/fitting algorithms, parallel/vector linear algebra
Deployment issues
◦ Current batch/grid infrastructure has to be configured to support multi-core/full node allocation
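The "sharing data between processes to save memory" idea above can be sketched in a few lines. This is a minimal illustration, not the experiments' actual frameworks: on Linux, fork()-based worker processes inherit large read-only structures (geometry, conditions data) copy-on-write, so N workers do not pay N times the memory cost. All names here are made up for the example.

```python
import multiprocessing as mp

# Stand-in for a large read-only structure (e.g. detector geometry).
# With fork()-based workers it is shared copy-on-write, which is the
# memory saving the multi-process approach is after.
GEOMETRY = list(range(100_000))

def process_event(event_id):
    # Each worker reads the shared structure without copying it.
    return (event_id * len(GEOMETRY)) % 7

if __name__ == "__main__":
    # Event-level parallelism: one event per task, farmed out to workers.
    with mp.Pool(processes=4) as pool:
        results = pool.map(process_event, range(16))
    print(len(results))  # -> 16
```

The same pattern underlies the multi-process (as opposed to multi-threaded) frameworks mentioned later in the slides: parallelism across events, shared read-only data, no locking in the event loop.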
• Aims to provide a complete, portable and easy-to-configure user environment for developing and running LHC data analysis locally and on the Grid, independent of the physical software and hardware platform (Linux, Windows, MacOS)
 Code check-out, editing, compilation, local small tests, debugging, …
 Grid submission, data access, …
 Event displays, interactive data analysis, …
 Suspend, resume, …
• Decouple application lifecycle from evolution of system infrastructure
• Reduce effort to install, maintain and keep up to date the experiment software (CernVM-FS)
• CernVM 1.x (SLC4) and CernVM 2.x (SLC5) released
• Small (200-350 MB) image
• Available for all popular hypervisors and on the Amazon Cloud (EC2)
WP9 - Virtualization R&D
CernVM Users
~2200 different IP addresses: ATLAS 67%, LHCb 19%, CMS 6%, others
[Diagram: CernVM clients fetch content through a hierarchy of proxy servers backed by replicated HTTP servers]
Web scale using Web technology
◦ Proxy and slave servers could be deployed on strategic locations to reduce latency and provide redundancy
◦ Collaboration with CMS/Frontier deployment: reusing 70+ already deployed squid servers
◦ Commercial SimpleCDN as backup
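The proxy hierarchy above is ordinary HTTP caching: a client asks its site squid, which falls back to a regional proxy, which falls back to the origin server, and each tier caches what it fetched so later requests are served closer to the client. A toy model (tier names and the file path are invented for illustration):

```python
class CacheTier:
    """One tier in a caching hierarchy (site squid, regional proxy, origin)."""

    def __init__(self, name, upstream=None):
        self.name = name
        self.upstream = upstream  # next tier toward the origin server
        self.cache = {}

    def get(self, path):
        if path in self.cache:
            return self.cache[path], self.name  # cache hit at this tier
        if self.upstream is None:
            raise KeyError(path)                # origin miss: no such file
        value, served_by = self.upstream.get(path)
        self.cache[path] = value                # populate on the way back
        return value, served_by

origin = CacheTier("origin")
origin.cache["/sw/release.tar"] = b"..."        # hypothetical content
regional = CacheTier("regional-squid", upstream=origin)
site = CacheTier("site-squid", upstream=regional)

print(site.get("/sw/release.tar")[1])  # first request served by: origin
print(site.get("/sw/release.tar")[1])  # repeat request served by: site-squid
```

This is why reusing the 70+ already-deployed CMS/Frontier squids is attractive: the same stateless HTTP caching layer serves any content addressed by URL.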
Ideally the analysis activity should be a continuum in terms of tools, paradigms, software frameworks, models, …
◦ Identical analysis applications should be able to run the same way on a desktop/laptop, a small cluster, a large cluster and the Grid
CernVM is a convenient tool in the hands of our end users/physicists and lets them use experiment software frameworks on their laptops
◦ with little overhead
◦ without the need to continuously download, install and configure new versions of experiment software (ATLAS, CMS, LHCb, LCD, NA61, TH, …)
Physicists seem to like CernVM on their laptops and on the Cloud
◦ Can we have it on the Grid, please?
From laptop to Cloud and Grid
21/6/10, Welcome and Introduction, P. Mato/CERN
Actions from the 1st Workshop
Multicore
◦ Try submission of parallel jobs (multi-threaded, multi-process, MPI) with the existing LSF infrastructure
◦ Deploy multi-core performance and monitoring tools
◦ Run multi-core jobs on the Grid
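Concretely, "submission of parallel jobs with the existing LSF infrastructure" amounts to asking the batch system for N slots constrained to a single host. A sketch assuming standard LSF `bsub` syntax (the job script name is invented):

```python
# Build an LSF submission command for a multi-core job.
# -n: number of slots requested; -R "span[hosts=1]": place all
# slots on one host, so the job effectively owns (part of) a node.
def lsf_submit_cmd(cores, script="run_event_loop.sh"):
    return ["bsub", "-n", str(cores), "-R", "span[hosts=1]", script]

print(" ".join(lsf_submit_cmd(8)))
# -> bsub -n 8 -R span[hosts=1] run_event_loop.sh
```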
Virtualization
◦ Transition of CernVM beyond the R&D phase
◦ Use CernVM images in virtualized batch systems
◦ Prototype an ‘lxcloud’ solution for submitting user jobs using the EC2/Nimbus API
◦ Establish procedures for creating trusted images (e.g. CernVM) acceptable to Grid sites
◦ Investigate scenarios for reducing the need for public IP addresses on WNs
Feedback from the 2nd Workshop (1)
Experiments requested access to whole nodes
◦ Allows them to test the multi-threaded and multi-process applications that are being developed, and also their pilot frameworks managing the correct mix of jobs to best optimize the entire node’s resources
◦ Taking responsibility for ensuring that the node is fully utilized
Issues
◦ The implementation would require end-to-end changes, from the end-user submission framework to the local batch configuration and Grid middleware
◦ Adaptation of the accounting (memory and CPU) and monitoring tools
◦ Need for better handling of the large files that will eventually result from larger parallel jobs
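The pilot's job of "managing the correct mix of jobs to best optimize the entire node resources" is essentially a packing problem. A deliberately simple sketch (not any experiment's pilot code; job names and core counts are invented) using greedy first-fit by requested core count:

```python
# A pilot that owns a whole node picks jobs from its queue until
# the node's cores are used up (greedy first-fit).
def fill_node(total_cores, queue):
    """queue: list of (job_name, cores_requested). Returns chosen jobs
    and the number of cores left idle."""
    chosen, free = [], total_cores
    for name, cores in queue:
        if cores <= free:
            chosen.append(name)
            free -= cores
    return chosen, free

queue = [("sim", 4), ("analysis", 1), ("reco-mp", 8), ("merge", 1), ("sim2", 4)]
print(fill_node(8, queue))  # -> (['sim', 'analysis', 'merge'], 2)
```

Even this toy shows why whole-node scheduling shifts responsibility: the two idle cores are now the pilot's problem, not the batch system's, which is exactly the accounting and monitoring issue raised above.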
Feedback from the 2nd Workshop (2)
Service Virtualization
◦ Virtualization of services is in general well accepted and the experience so far is very positive
◦ In particular, VO Box virtualization is already planned (at CERN)
Worker Node Virtualization
◦ The lxcloud prototype development based on OpenNebula at CERN-IT is encouraging
Issues
◦ Questions were raised about the lifetime of the Virtual Machines and the need for an API to control them
◦ Experiments expressed some concern about the performance loss on Virtual Machines, in particular for I/O operations to local disk
Generation of Virtual Machine Images
◦ A HEPiX policy document has been prepared establishing obligations for people providing virtual images
There is general agreement that the experiment software should be treated independently from the base operating system
CernVM File System can be a game changer
◦ Could be deployed on standard worker nodes to solve the problem of software distribution
◦ Experiments requested support for the CernVM File System (CVMFS) as the content delivery mechanism for adding software to a VM image after it has been instantiated
LHCb and ATLAS requested IT to host and provide 24*7 support for the CernVM infrastructure
Feedback from the 2nd Workshop (3)
Progress since the Workshop
Multicore
◦ ATLAS, CMS and LHCb have all released "production-grade" parallel multi-process applications
◦ Both ATLAS and LHCb are now testing submission of "full-node" jobs
Virtualization
◦ Separate release cycles for CernVM-FS and CernVM
◦ Multi-VO support for CernVM-FS (via automounter)
◦ CernVM as job hosting environment
 Tested ATLAS/PanDA pilot running in CernVM/CoPilot on the lxcloud prototype
 Theory group applications (MC generators) are now running on BOINC/CernVM
 Developed contextualization tools for the EC2 API and compatible infrastructure, HEPiX-compatible
◦ Benchmarking and performance evaluation of PROOF is under way
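The "contextualization tools" mentioned above follow the EC2 pattern: the infrastructure hands an opaque "user data" blob to the booting VM, and a first-boot agent inside the guest applies it. A hypothetical sketch of both sides (the context keys and the agent behaviour are invented for illustration, not the actual CernVM tools):

```python
import base64
import json

def make_user_data(experiment, copilot_queue):
    """Launch side: encode per-VM settings into an EC2-style user-data blob."""
    context = {"cvmfs_repo": f"{experiment}.cern.ch",   # which software to mount
               "copilot_queue": copilot_queue}           # which job queue to join
    return base64.b64encode(json.dumps(context).encode())

def apply_user_data(blob):
    """Guest side: what a first-boot agent might do with the blob."""
    context = json.loads(base64.b64decode(blob))
    return f"mount {context['cvmfs_repo']}; join {context['copilot_queue']}"

blob = make_user_data("atlas", "panda")
print(apply_user_data(blob))  # -> mount atlas.cern.ch; join panda
```

The point of contextualization is that one generic, trusted CernVM image can serve every VO: all experiment-specific configuration arrives at instantiation time rather than being baked into the image.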
Summary
Both WP8 & WP9 are making very good progress in close cooperation with the experiments
Applications being developed to exploit new hardware architectures and virtualization technology impose new requirements on the computing services provided by the local computer centers or by the Grids
◦ We need to be able to submit jobs that require a “whole node”
◦ CernVM infrastructure services must be supported 24*7
Experiments would like to be given an opportunity to test these new developments on local batch clusters and on the Grid