
HEPiX Report


Page 1: HEPiX Report


HEPiX Report

Helge Meinhard, Zhechka Toteva, Jérôme Caffaro / CERN-IT

Technical Forum/Computing Seminar, 20 May 2011

Page 2: HEPiX Report

Outline

• Meeting organisation, Oracle session, site reports, storage (Helge Meinhard)

• IT infrastructure; networking and security (Zhechka Toteva)

• Computing; cloud, grid and virtualisation (Jérôme Caffaro)


Page 3: HEPiX Report

HEPiX

• Global organisation of service managers and support staff providing computing facilities for HEP

• Covering all platforms of interest (Unix/Linux, Windows, Grid, …)

• Aim: Present recent work and future plans, share experience, advise managers

• Meetings ~ 2 / y (spring in Europe, autumn typically in North America)


Page 4: HEPiX Report

HEPiX Spring 2011 (1)

• Held 02–06 May at Gesellschaft für Schwerionenforschung (GSI), Darmstadt, Germany
  – The 6 or so heaviest confirmed elements were all discovered there (including Darmstadtium, Hassium, …)
  – Some atomic physics
  – New project: FAIR, the most intense antiproton source known in the universe
  – Good local organisation
• Walter Schön is a master in managing expectations
  – Nice auditorium, one power socket per seat
  – Darmstadt: medium-size town, much dominated by the (male) students of its technical university
    • Good beer, food that does not leave you hungry


Page 5: HEPiX Report

HEPiX Spring 2011 (2)

• Format: pre-defined tracks with conveners and invited speakers per track
  – Fewer, but wider track definitions than at previous meetings
  – Still room for spontaneous talks – either fit into one of the tracks, or classified as ‘miscellaneous’
  – Again proved to be the right approach; extremely rich, interesting and packed agenda
  – Judging by the number of submitted abstracts, no real hot spot: 10 infrastructure, 8 Grid/clouds/virtualisation, 7 storage, 6 computing, 4 network and security, 2 miscellaneous… plus 16 site reports
  – Some abstracts submitted late, making planning difficult
• Full details and slides: http://indico.cern.ch/conferenceDisplay.py?confId=118192
• Trip report by Alan Silverman available, too: http://cdsweb.cern.ch/record/1350194


Page 6: HEPiX Report

HEPiX Spring 2011 (3)

• 84 registered participants, of which 14 from CERN
  – Belleman, Caffaro, Cano, Cass, Costa, Gonzalez Lopez, Gonzalez Alvarez, Meinhard, Salter, Schwickerath, Silverman, Sucik, Toteva, Wartel
  – Other sites: Aachen, Annecy, ASGC, BNL, CC-IN2P3, CEA, Cornell, CSC Helsinki, DESY Hamburg, DESY Zeuthen, Diamond Light Source, Edinburgh U, FNAL, GSI, IHEP Beijing, INFN Padova, INFN Trieste, KISTI, KIT, LAL, Linkoping U, NIKHEF, Prague, PSI, St. Petersburg, PIC, RAL, SLAC, Umea U, Victoria U
  – Compare with Cornell U (autumn 2010): 47 participants, of which 11 from CERN


Page 7: HEPiX Report

HEPiX Spring 2011 (4)

• 54 talks, of which 13 from CERN
  – Compare with Cornell U: 62 talks, of which 19 from CERN
  – Compare with Berkeley: 62 talks, of which 16 from CERN
• Next meetings:
  – Autumn 2011: Vancouver (October 24 to 28)

• 20th anniversary of HEPiX: dinner on Thursday night, special talks on Friday morning; founders and ex-regular attendees invited

– Spring 2012: Prague (date to be decided, probably 2nd half of April)

– Autumn 2012: Asia (venue and date to be decided)


Page 8: HEPiX Report

Oracle/SUN policy concerns

• Recent observations:
  – Significantly increased HW prices for Thor-style machines
  – Very significantly increased maintenance fees for Oracle (ex-Sun) software running on non-Oracle hardware
    • Sun GridEngine, Lustre, OpenSolaris, Java, OpenOffice, VirtualBox, …
  – Very limited collaboration with non-Oracle developers
  – Most Oracle software has already been forked as open-source projects
  – (At least) two Oracle-independent consortia around Lustre

• HEP labs very concerned


From HEPiX Autumn 2010 report

Page 9: HEPiX Report

Oracle session (1)

• 3 high-level representatives, 2 talks, a lot of discussion
  – No convergence
• Oracle Linux
  – Oracle positions it as the no. 2 OS behind Solaris
  – Freely available… but not the updates!
• Open Source at Oracle
  – A lot of slides and words, little concrete
  – OpenOffice development stopped, Lustre de facto stopped, GridEngine development will continue as closed source for paying customers; Oracle is sitting on the names…
  – No change to the strategy of asking much more money for software if it runs on non-Oracle HW


Page 10: HEPiX Report

Oracle session (2)

• Reactions
  – A bit of dismay about these talks...
  – Lustre and GridEngine taken over by small companies contributing to the development (of the open-source branch) and providing commercial support

– Most people seem to trust these small companies

– Less despair than in Cornell, but no better view of Oracle

– “It takes 20 years to build up a good reputation… and about 5 minutes to lose it”


Page 11: HEPiX Report

Site reports (1): Hardware

• CPU servers: same trends
  – 12...48 core boxes, more AMD than a year ago, 2...4 GB/core
  – Quite a number of problems reported with A-brand suppliers and their products
• Disk servers
  – Thumper/Thor not pursued by anybody
  – DDN, Nexsan, white boxes taking over
  – Disk servers with 48 drives mentioned
    • 24 drives in the system, 24 drives in a SAS-linked expansion chassis
  – Problems with some controllers, swapped for a different brand
• Problem space is rather a full circle…
• Tapes
  – Not much change, some sites testing T10kC
  – LTO mostly (in professional robots) very popular


Page 12: HEPiX Report

Site reports (2): Software

• OS
  – Solaris being phased out in a number of places
  – AIX mentioned at least once
  – Windows 7 being rolled out in a number of places
  – Some sites more advanced with SL6, but quite some work to adapt it to existing infrastructures
• Storage
  – Lustre: strong growth, largest installations reported ~4 PB (GSI to grow to 7 PB this year)
  – GPFS in production use in some places, others hesitating due to the price tag
  – AFS at DESY with similar problems as at CERN


Page 13: HEPiX Report

Site reports (3): Software (cont’d)

• Batch schedulers
  – Some (scalability?) problems with PBSpro / Torque-MAUI
  – Grid Engine rather popular
• Virtualisation
  – With reference to CERN's positive experience, RAL started investigating Hyper-V and SCVMM
• Service management
  – FNAL migrating from Remedy to Service-now
  – NIKHEF using Topdesk Enterprise
  – CC-IN2P3 with a dedicated QA manager (whose tasks include pushing for a configuration database)


Page 14: HEPiX Report

Site reports (4): Infrastructure

• Infrastructure
  – Cube design for FAIR: low-cost, low-energy
  – Incidents at FNAL
• Configuration management
  – Quattor doing well, cvmfs enjoying a lot of interest (and a steep ramp-up)
  – Puppet mentioned a number of times


Page 15: HEPiX Report

Site reports (5): Miscellaneous

• Data preservation mentioned
• FNAL: offering support for Android and iOS devices
• RedHat Cluster used for load balancing
• Mail: NIKHEF moved spam filtering etc. to a commercial provider
• Exchange 2003 replacement at DESY: several options, mostly free and/or open source
• Curiosities
  – IHEP is using... Castor 1, even a locally modified version
  – St Petersburg: mail service running on a virtual machine


Page 16: HEPiX Report

Storage session (1)

• Gluster evaluation (Yaodong Cheng / IHEP)
  – Scalable open-source clustered file system
  – No support for file-based replication
  – Looks interesting, but not stable enough yet
• Virtualised approach to mass storage (Dorin Lobontu / KIT)
  – Hide the differences (implementation details) between mass storage systems
  – Replace the different physical libraries by a single virtual one that presents itself as a library to the application


Page 17: HEPiX Report

Storage session (2)

• Lustre at GSI (Thomas Roth / GSI)
  – Pretty much a success story, but there were stumbling blocks along the way
  – Upgrade to 1.8 / migration to new hardware not easy
• File system evaluation (Jiri Horky / Prague)
  – Proposal for a benchmark more typical of HEP's work styles


Page 18: HEPiX Report

Computing Facilities


HEPiX, Darmstadt, 2nd – 6th May

IT Infrastructure, Networking & Security

20th June 2011

Page 19: HEPiX Report


Drupal at CERN (Juraj Sucik)

• Drupal 7 since Jan 2011, 60 web sites, positive feedback
• Centrally managed IT infrastructure
  – Load balancer, web servers, NFS server, LDAP servers and MySQL servers, Lemon monitoring
• CERN integration
  – SSO authentication and e-groups based role schema
  – WebDAV, backup service, web services, official theme
• Modules for CDS records and Indico events
• Tuning & testing, issues with D7, studying NAS and Varnish, plans for migration of CERN sites


Page 20: HEPiX Report


Indico (José B. González López)

• Domains: event organisation, data repository, booking
  – Started as an EU project in 2002, 1st release for CHEP 2004, > 90 institutions
• Current state
  – Quattor-managed SLC5, 3rd-party systems
  – Data increases with the years: > 130K events
• Evolution during the last 3 years
  – Need to improve usability and performance
  – New features: room booking; common interface for video services; chat room; paper reviewing
• Future
  – Provide a good QoS
  – Cover the full event lifecycle, introduce timetable drag & drop; live sync

Page 21: HEPiX Report


Invenio (Jérôme Caffaro)

• Domains
  – Integrated digital library / repository software for managing documents in HEP
  – 1M records, > 700 collections, > 150 bibliographic formatting templates, > 100 submission workflows, ~18k search queries and > 200 new documents per day
• Modular architecture: supporting blocks for building workflows; standardized interfaces
• CDS: books, PhotoLab, preprints, videos, Bulletin
  – Integrated with Aleph, SSO, e-groups, SLS, Quattor, Foundation and EDH, Indico
• Future plans
  – Integration with INSPIRE, Drupal, arXiv.org, GRID, Git
  – Improve the search interfaces, new submission workflows, extend the load balancing


Page 22: HEPiX Report


FAIR 3D Tier-0 Green-IT Cube

Talk by Volker Lindenstruth
• FAIR computing needs ~300K cores and 40 PB of storage
• Novel cooling method for back-to-back racks
  – Water cooling of the hot air at the back of the rack
  – Injecting the cooled air into the next rack
• Loewe-CSC prototype at Goethe U.
  – Requires 1/3 of the conventional cooling power
  – PUE 1.062 to 1.082 at 450 kW
• Extended the idea to make a cube
  – Layers of racks mounted on steel bars
    • 20 m tall, houses 890 racks
  – Approved the next day, plan to start in 2012


Page 23: HEPiX Report


CC-IN2P3 (Pascal Trouvé)

• Latest improvements of the current computing room
  – Power: transformers 3 × 1 MVA; diesel generator 1.1 MVA; 2nd low-voltage distribution panel, etc.
  – Cooling: 3rd unit, 600 kW -> 1800 kW (max allowed)
  – Hazards: enhanced monitoring and fire-safety systems
  – Infrastructure: high-density confined racks, in-row cooled racks, PDUs
• New computing room (phased commissioning)
  – 2011: 600 kW, 50 racks (3 MW total) -> 2019: 3.2 MW, 216 racks (9 MW total)
  – All services (chilled water tank, chiller units, cold water pump stations) delivered from the ceiling + elevator (3 tons)
  – Phase 1 (18 months) – May
  – Phase 2 (October 2011) – 3rd chiller, power redundancy
  – First years: only worker nodes; later: offices, outreach center


Page 24: HEPiX Report


CERN CC (Wayne Salter)

• Current capacity: 2.5 MW -> 2.9 MW (240 kW critical)
• Local hosting (17 racks up to 100 kW) – July 2010
  – Business continuity; experience with remote hosting
  – Sysadmins' and SMs' interventions needed
  – Took longer than foreseen, but positive experience
• Ongoing improvements of the CC
  – Improve efficiency of the cooling systems
• CC upgrade to 3.5 MW (600 kW critical) – Nov 2012
  – Restore the redundancy of the UPS systems
  – Secure cooling for critical equipment (UPS)
• Remote hosting – spring 2011: go on or not
  – Studies for a new CC; a proposal from NO for a remote CC
  – June 2010: call for interest – how much for 4 MCHF/year?
  – 23+ proposals; several > 2 MW; visits


Page 25: HEPiX Report


Service-Now (Zhechka Toteva)

• Motivation for change
  – Follow ITIL best practices
  – Service-oriented tool for the end users
  – Modular, flexible, platform-independent
• Collaboration between the GS and IT departments
  – Common service catalogue, incident and request fulfilment processes
• Service-now
  – Development started October 2010; 1st release on 15 February 2011
  – A lot of customization, configuration and integration
    • Foundation, SSO, GGUS
• Plans
  – Migrate existing workflows from Remedy
  – Change management

Page 26: HEPiX Report


Scientific Linux Status (Troy Dawson)

• S.L. 6.0 released – March 2011
  – Record downloads in March
    • 1.3M i386; 1.0M x86_64
    • Nov 2010: for the first time x86_64 exceeded i386
  – Downloads
    • Per number of files: Brazil, Portugal, Germany
    • Per MB: China, Russian Federation, Japan
• S.L. 4.9
  – Final release – April 2011; security updates until Feb 2012
• S.L. 5.6 – beta – May 2011
  – Bugs in glibc and problems with the installer
• Survey on usage
  – https://www.surveymonkey.com/s/SLSurvey


Page 27: HEPiX Report


Version Control at CERN (Alvaro Gonzalez)

• 3 version control services: 1 SVN and 2 CVS
  – CVS: planned shutdown for 2013; only 10 projects remain
  – LCGCVS: shutdown ongoing
  – SVN: started in Jan 2009
    • Cluster, high availability, DNS load balancing, AFS for the repositories, NFS for Trac
• CVS not maintained any more, and strong user requests
• SVN usage is growing (~300 active projects)
• Web tools: WEBSVN, TRAC, SVNPlot
• Still some load-balancing issues
• Future plans
  – v1.6, mirroring, statistics, VMs, configuration tools for librarians
• TWiki
  – > 9K users, 107K topics, 1.2M accesses, 58K updates


Page 28: HEPiX Report


CERNVM-FS (Ian Collier)

• Virtual software installation by means of an HTTP file system
  – Squid for multi-level caching
• Performance tests at PIC very impressive
• In production for LHCb and ATLAS
• Core service supported at CERN
• Replicas at RAL, BNL and CERN


Page 29: HEPiX Report


Secure messages (Owen Synge)

• Advantages of signing
  – Simple, easy, open source
  – SSL connections not required for data integrity
  – Allows messages to be processed asynchronously
  – Allows a message to be verified later
• Disadvantages of signing and encryption compared with SSL
  – Messages are not interactive; may require numbering
• Disadvantages of encrypting
  – A message can be decrypted by only one certificate (holder)
• Suggested AMQP
  – Example in Python (a generic signing sketch is given below)
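The Python/AMQP example itself is in the slides; purely as a generic illustration of the signing idea (not the speaker's code), the sketch below signs a message body with an X.509 certificate and verifies it later via the `openssl smime` command line. The certificate, key and CA file names are placeholders.

```python
import subprocess

def sign_message(body: bytes, cert: str = "cert.pem", key: str = "key.pem") -> bytes:
    """Wrap the message body in an S/MIME signature (placeholder cert/key paths)."""
    return subprocess.run(
        ["openssl", "smime", "-sign", "-signer", cert, "-inkey", key],
        input=body, stdout=subprocess.PIPE, check=True,
    ).stdout

def verify_message(signed: bytes, ca: str = "ca.pem") -> bytes:
    """Verify the signature against a trusted CA and return the original body.

    Verification can happen long after delivery and over any transport,
    which is the point made in the talk about not needing an SSL channel.
    """
    return subprocess.run(
        ["openssl", "smime", "-verify", "-CAfile", ca],
        input=signed, stdout=subprocess.PIPE, check=True,
    ).stdout

if __name__ == "__main__":
    blob = sign_message(b"accounting record 42: job finished\n")
    print(verify_message(blob))   # raises CalledProcessError if verification fails
```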


Page 30: HEPiX Report


Security Upgrade (Romain Wartel)

• Computer security objectives
  – Service availability; data integrity and confidentiality; reputation
• Reputation
  – Based on trust; fragile; requires cooperation
• Four attacks since the last HEPiX: 1 web and 3 SSH
• Identity Federation Workshop
  – https://indico.cern.ch/conferenceTimeTable.py?confId=129364
• Rootkits and rootkit checkers
• Strategy
  – Patching, user ACLs, in-depth monitoring, incident management
• Virtualization
  – Virtualized infrastructure <-> virtualized payload (not yet!)


Page 31: HEPiX Report


Host-based intrusion detection

Talk by Bastien Neuburger
• Intrusion detection systems
  – Network-based, host-based, hybrids
• OSSEC used at GSI
  – http://www.ossec.net/
  – Sensors for file integrity, process output, kernel level
  – Provides an intrusion prevention mechanism
  – Available for Windows
  – Can analyse firewall logs: Cisco PIX, FWSM
  – Architecture: server/agent
  – Event -> pre-decoding -> decoding -> rule matching -> result (illustrated below)
  – Can be used for monitoring
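OSSEC itself is configured with XML decoders and rules; purely as a toy illustration of the event pipeline named above (not OSSEC code), the Python sketch below pre-decodes a syslog-style line, decodes fields with a regex, and matches one simple rule.

```python
import re

# Toy stand-ins for OSSEC's pre-decoding (split off the syslog header),
# decoding (extract fields) and rule-matching stages.
PREDECODE = re.compile(
    r"^(?P<ts>\w+ +\d+ [\d:]+) (?P<host>\S+) (?P<prog>[\w\-/]+)(?:\[\d+\])?: (?P<msg>.*)$")
DECODE_SSHD_FAIL = re.compile(
    r"Failed password for (?:invalid user )?(?P<user>\S+) from (?P<src>\S+)")

RULES = [
    # (decoder regex, rule description, alert level)
    (DECODE_SSHD_FAIL, "sshd: failed password", 5),
]

def process(line: str):
    head = PREDECODE.match(line)
    if not head:
        return None                      # pre-decoding failed: ignore the event
    for decoder, description, level in RULES:
        fields = decoder.search(head["msg"])
        if fields:
            return {"host": head["host"], "program": head["prog"],
                    "level": level, "rule": description, **fields.groupdict()}
    return None                          # no rule matched

if __name__ == "__main__":
    print(process("May  3 10:12:01 lxgsi01 sshd[4242]: "
                  "Failed password for invalid user oracle from 192.0.2.7 port 4711 ssh2"))
```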


Page 32: HEPiX Report


HEPiX IPv6 WG (Dave Kelsey)

• Projected exhaustion dates for IPv4
  – IANA: 09 June 2011; RIRs: 22 January 2012
  – APNIC ran out on 15 Apr 2011; RIPE NCC expected soon
• Questionnaire in Sep 2010, talks at the Cornell HEPiX
  – 18 sites answered; 12 offered people
• Work has now started
  – Members; e-mail list; 1st meeting held the week before
  – DESY set up the first test networks; testing applications
  – CERN testbed available for the summer
• Plans
  – Need to rewrite applications, check security and monitoring tools (see the sketch below)
  – Ideally use the 2013 shutdown for the initial deployment
  – First meeting after World IPv6 Day on the 8th of June
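The working group's point about rewriting applications usually comes down to removing IPv4-only assumptions; as a generic illustration (not WG material), the sketch below replaces a hard-coded AF_INET connect with a getaddrinfo loop that works over both IPv4 and IPv6.

```python
import socket

def connect(host: str, port: int, timeout: float = 5.0) -> socket.socket:
    """Dual-stack connect: try every address family getaddrinfo returns."""
    last_err = None
    for family, socktype, proto, _name, addr in socket.getaddrinfo(
            host, port, socket.AF_UNSPEC, socket.SOCK_STREAM):
        try:
            sock = socket.socket(family, socktype, proto)
            sock.settimeout(timeout)
            sock.connect(addr)            # addr is a 2-tuple (v4) or 4-tuple (v6)
            return sock
        except OSError as err:
            last_err = err                # try the next candidate address
    raise last_err or OSError("no addresses returned for %s" % host)

if __name__ == "__main__":
    s = connect("www.cern.ch", 80)
    print("connected via", "IPv6" if s.family == socket.AF_INET6 else "IPv4")
    s.close()
```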


Page 33: HEPiX Report


Thanks Alan!


Thanks all!

Page 34: HEPiX Report

Computing Track

Page 35: HEPiX Report

Batch Monitoring and Testing

• Investigating needs and solutions to monitor batch jobs
  – 3'700 nodes, 180'000 jobs/day
  – Currently Lemon-based, but:
    • Node-based instead of job-based
    • Want to keep the raw data and do advanced correlation
    • Want more advanced views, dynamic reports
    • Target not only admins: also users and the service desk
• Designing a flexible (Lego-like) architecture
  – But rely on what already exists (experience/tools)
• First prototype expected in 6 months

Page 36: HEPiX Report

Selecting a new batch system at CC-IN2P3

• Replacing the in-house-developed batch system "BQS"
  – Missing functionality, cost of maintenance, etc.
• Surveyed in 2009 the systems used in HEP (*)
• Defined and evaluated criteria for the selection:
  – Scalability, robustness, sharing, scheduler, AFS management, ability to limit a job's consumption, interface to the grid and administration, etc.
• LSF and SGE: best candidates; selected SGE in Feb 2010

(*) Results of the survey are not publicly available; they can be requested by e-mail

Page 37: HEPiX Report

GridEngine setup at CC-IN2P3

• Scalability and robustness testing, in order to
  – Tune the configuration
  – Anticipate possible problems related to intensive usage
• Bombarded a test instance to test scalability (see the sketch below)
• Permanent flow of jobs during 3 days on 80 nodes to validate robustness
• No issues found with scalability or robustness; however, some other issues were identified and their solutions/workarounds discussed, e.g.
  – Loss of the AFS token upon server restart
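The test scripts themselves are not in the slides; as a rough sketch of how such a "permanent flow of jobs" stress test can be driven against a standard Grid Engine submit host (hypothetical target numbers), one might loop like this:

```python
import subprocess, time

TARGET_PENDING = 500        # hypothetical: keep roughly this many jobs queued
SLEEP_BETWEEN_CHECKS = 30   # seconds between top-ups

def pending_jobs() -> int:
    """Count queued/running jobs for the current user via qstat."""
    out = subprocess.run(["qstat"], stdout=subprocess.PIPE, text=True, check=True).stdout
    return max(0, len(out.splitlines()) - 2)   # skip the two qstat header lines

def submit(n: int) -> None:
    """Submit n trivial binary jobs (sleep 60) to Grid Engine."""
    for _ in range(n):
        subprocess.run(["qsub", "-b", "y", "/bin/sleep", "60"],
                       stdout=subprocess.DEVNULL, check=True)

if __name__ == "__main__":
    while True:                                # run for the duration of the test
        missing = TARGET_PENDING - pending_jobs()
        if missing > 0:
            submit(missing)
        time.sleep(SLEEP_BETWEEN_CHECKS)
```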

Page 38: HEPiX Report

OpenMP Performance on Virtual Machines

• Benchmarking the OpenMP programming model on virtual machines: multiprocessor systems with shared memory
• Results as expected: more cores, better performance
• But: in one sample application, performance went down with more cores
  – Profiling with ompP showed that OpenMP introduced overhead when merging results at the end of the parallel execution
  – The "barrier" implementation in GCC is poor
  – Code optimization is necessary to take advantage (of the additional cores)

Page 39: HEPiX Report

CMS 64-bit transition and multicore plans

• CMS software is now fully 64-bit
  – Up to 30% performance gains; memory overhead 25% to 30%
• Plans for multi-core architectures
  – Costly in terms of memory if not addressed (i.e. one process per core)
  – Strategy: share non-mutable common code and data across processes, by forking once application-specific code is reached (illustrated below)
    • 34 GB down to 13 GB on the test machine
  – "Whole-node scheduling" assumed necessary by the lecturer
  – Multi-threading: not worth it, given the current sequential decomposition of the algorithms
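The CMS implementation lives in CMSSW; purely as a language-agnostic illustration of the "fork after loading common data" idea (not CMS code), the Python sketch below loads a large read-only structure once and forks workers that share its memory pages copy-on-write.

```python
import os

def load_shared_data():
    """Stand-in for loading geometry, conditions data, etc. once in the parent."""
    return list(range(5_000_000))          # large, effectively read-only

def worker(rank: int, shared) -> None:
    """Application-specific part, run in a forked child.

    Pages inherited from the parent stay shared as long as they are only read
    (copy-on-write); in CPython the saving is smaller than in C++ because
    reference counting touches object headers.
    """
    print(f"worker {rank} (pid {os.getpid()}): checksum {sum(shared[:1000])}")

if __name__ == "__main__":
    shared = load_shared_data()            # done BEFORE forking
    children = []
    for rank in range(4):                  # e.g. one worker per core
        pid = os.fork()
        if pid == 0:
            worker(rank, shared)
            os._exit(0)                    # child must not fall through
        children.append(pid)
    for pid in children:
        os.waitpid(pid, 0)
```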

Page 40: HEPiX Report

Performance Comparison of Multi- and Many-Core Batch Nodes

• Performance is increased by adding more cores
  – Testing whether HEP applications benefit from additional cores
  – Checking possible bottlenecks
• Benchmarked systems using HS06 (HEPSpec06, based on SPEC)
  – HS06 scales well with the number of CPU cores
  – Job throughput also scales with the number of cores
  – Biggest bottleneck: insufficient memory bandwidth

Page 41: HEPiX Report

Cloud, grid and virtualization track

Page 42: HEPiX Report

Moving virtual machine images securely between sites

• Analysis and design of a procedure to transfer images securely
  – Requirements: non-repudiation of an image list; endorsed images; possibility to revoke; unmodified images
• Solution: an endorser signs an image list (X.509)
  – The site's Virtual Machine Image Catalogue subscribes to the list
• Definition of the metadata to describe an image
• Two implementations already exist; CERN is already successfully using this technique (a subscriber-side sketch is given below)
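The actual list format is defined by the working group; as a simplified, hypothetical sketch of a subscriber-side check (made-up JSON fields, with signature verification assumed to have been done already, e.g. as in the signing example earlier), the code below checks an image list's expiry and an image's hash before admitting it to a local catalogue.

```python
import hashlib, json
from datetime import datetime, timezone

def verify_image(image_path: str, imagelist_path: str, identifier: str) -> bool:
    """Check one entry of an (already signature-verified) image list.

    The JSON layout here is illustrative only: a real list carries the
    endorser identity, per-image metadata and expiry dates.
    """
    with open(imagelist_path) as fh:
        imagelist = json.load(fh)

    # Revocation / expiry: an expired image list must be discarded.
    # Assumes an ISO-8601 timestamp with a UTC offset, e.g. "2011-06-01T00:00:00+00:00".
    expires = datetime.fromisoformat(imagelist["expires"])
    if expires < datetime.now(timezone.utc):
        return False

    entry = next((i for i in imagelist["images"] if i["identifier"] == identifier), None)
    if entry is None:
        return False                       # image not (or no longer) endorsed

    # Unmodified image: compare the published hash with the downloaded file.
    digest = hashlib.sha256()
    with open(image_path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest() == entry["sha256"]
```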

Page 43: HEPiX Report

Virtualization at CERN: a status report

• CERN Virtual Infrastructure (CVI, based on Microsoft's System Center Virtual Machine Manager) status:
  – Deployed CVI 2.0: improved stability and functionality
  – Hypervisors upgraded to Win 2008 R2 SP2
  – Dynamic memory allocation (Windows guests only for now)
  – SLC6 templates
  – Growing adoption (1250 VMs, 250 hypervisors; 2x VMs since Nov 2010)
  – Linux VM Integration Components (stable, 500 VMs)
• CERN cloud
  – 96 virtual batch nodes in production (OpenNebula and ISF)
  – Looking forward to SLC6, bringing better support for OpenNebula
  – No decision taken yet regarding which product to adopt
  – Public cloud interface prototype (OpenNebula)
  – Looking at OpenStack

Page 44: HEPiX Report

StratusLab Marketplace for Sharing Virtual Machine Images

• EU-funded project aiming to provide an open-source platform to search for / provide correctly prepared, secure images
• Machine image creation seen as a barrier to cloud adoption; sharing seen as necessary
• Simple REST APIs

Page 45: HEPiX Report

Operating a distributed IaaS cloud for BaBar MC production and user analysis

• Infrastructure-as-a-Service (IaaS) clouds
• Project funded by Canada for the HEP legacy-data project (running BaBar code for the next 5-10 years)
  – Also suitable for other projects, such as the Canadian Advanced Network for Astronomical Research
• Adopted solution: Condor + Cloud Scheduler
• Experience:
  – Monitoring difficult
  – Debugging users' VMs difficult
  – Different EC2 API implementations

Page 46: HEPiX Report

FermiGrid Scalability and Reliability Improvements

• Meta-facility: provides grid infrastructure at Fermilab
• See slides

Page 47: HEPiX Report

Adopting Infrastructure as Code to run HEP applications

• Idea: simplify deployment and operation of applications and infrastructure by describing everything in code
  – One language to describe infrastructure and applications
  – One API for every infrastructure element
  – An API to access monitoring data
  – A control system
  – Prerequisite: IaaS, enabled by virtualization technology
• Presented use case: deployment of a virtual computing cluster to run a HEP application for ALICE; sample code in the slides (a generic sketch of the idea follows below)
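The talk's own sample code is in the slides; the sketch below is only a generic illustration of the "describe everything in code" idea, using a made-up `IaaSClient` interface (not a real library) that could sit in front of any EC2- or OpenNebula-style API.

```python
from dataclasses import dataclass
from typing import List, Protocol

# --- Declarative description of the infrastructure (the "code" part) --------

@dataclass
class NodeSpec:
    role: str          # e.g. "head" or "worker"
    image: str         # VM image identifier (hypothetical names)
    cores: int
    count: int

CLUSTER = [
    NodeSpec(role="head",   image="alice-head-v1",   cores=4, count=1),
    NodeSpec(role="worker", image="alice-worker-v1", cores=8, count=20),
]

# --- Minimal, hypothetical IaaS interface ------------------------------------

class IaaSClient(Protocol):
    def running(self, role: str) -> int: ...
    def start(self, spec: NodeSpec) -> None: ...

def reconcile(client: IaaSClient, cluster: List[NodeSpec]) -> None:
    """Bring the running infrastructure in line with its declared description."""
    for spec in cluster:
        missing = spec.count - client.running(spec.role)
        for _ in range(max(0, missing)):
            client.start(spec)             # idempotent: only tops up what is missing
```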

Page 48: HEPiX Report

The (UK) National Grid Service Cloud Pilot

• Studying interest in IaaS clouds from higher education, with real users and a real cloud
  – 2-year project, using Eucalyptus
  – 200 users, 23 chosen as case studies
  – 3 use cases: ATLAS, a Distributed Scientific Computing MSc, Geographic Data Library
• Sufficient interest: yes
• Suitability of IaaS: yes
• Ease of use: no
  – Training and support are important
  – And better software would help

Page 49: HEPiX Report

HEPiX VWG Status Report

• Working areas
  – Image generation policy* (up-to-date VMs, security patches, etc.)
  – Image exchange (inter-site exchange is currently lacking; StratusLab vs. CVMFS?)
    • Includes image expiry/revocation
  – Image contextualisation
    • Configure to interface with the local infrastructure (only!)
  – Multiple hypervisor support
    • KVM and Xen dominate; procedure to create VMs for both (less interest in Xen?)

* http://www.jspg.org/wiki/Policy_Trusted_Virtual_Machines