UKI-SouthGrid Overview and Oxford Status Report
Pete Gronbech, SouthGrid Technical Coordinator
HEPIX 2009, Umea, Sweden, 26th May 2009


Page 1: UKI-SouthGrid Overview  and Oxford Status Report

UKI-SouthGrid Overview and Oxford Status Report

Pete Gronbech, SouthGrid Technical Coordinator

HEPIX 2009, Umea, Sweden, 26th May 2009

Page 2: UKI-SouthGrid Overview  and Oxford Status Report


SouthGrid Tier 2

• The UK is split into 4 geographically distributed Tier 2 centres

• SouthGrid comprises all the southern sites not in London

• New sites are likely to join

Page 3: UKI-SouthGrid Overview  and Oxford Status Report


UK Tier 2 reported CPU – historical view to present

[Chart: monthly reported CPU, Jan-08 to Apr-09, in kSI2k hours, for UK-London-Tier2, UK-NorthGrid, UK-ScotGrid and UK-SouthGrid]

Page 4: UKI-SouthGrid Overview  and Oxford Status Report


SouthGrid Sites Accounting as reported by APEL

[Chart: monthly accounted CPU, Jan-08 to Apr-09, in kSI2k hours, for JET, BHAM, BRIS, CAM, OX and RALPPD]

Page 5: UKI-SouthGrid Overview  and Oxford Status Report


Site Upgrades in the last 6 months

• RALPPD: increase of 640 cores (1568 kSI2K) + 380 TB
• Cambridge: 32 cores (83 kSI2K) + 20 TB
• Birmingham: 64 cores on the PP cluster and 128 cores on the HPC cluster, adding ~430 kSI2K
• Bristol: original cluster replaced by new quad-core systems (16 cores), plus an increased share of the HPC cluster: 53 kSI2K + 44 TB
• Oxford: extra 208 cores, 540 kSI2K + 60 TB
• JET: extra 120 cores, 240 kSI2K

Page 6: UKI-SouthGrid Overview  and Oxford Status Report


New Totals Q1 09, SouthGrid

Site          Storage (TB)   CPU (kSI2K)   % of MoU CPU   % of MoU Disk
EDFA-JET             1.5           483             –              –
Birmingham            90           728        304.35%        142.86%
Bristol               55           120         96.77%        343.75%
Cambridge             60           455        469.07%        230.77%
Oxford               160           972        592.68%        363.64%
RALPPD               633          2743        329.63%        374.56%
Totals             999.5          5501        377.47%        314.31%

(MoU percentages are measured against the GridPP MoU; EDFA-JET has no GridPP MoU.)

Page 7: UKI-SouthGrid Overview  and Oxford Status Report


Site Setup Summary

Site         Cluster(s)                     Installation Method                         Batch System
Birmingham   Dedicated & shared HPC         PXE, Kickstart, CFEngine; tarball for HPC   Torque
Bristol      Small dedicated & shared HPC   PXE, Kickstart, CFEngine; tarball for HPC   Torque
Cambridge    Dedicated                      PXE, Kickstart, custom scripts              Condor
JET          Dedicated                      Kickstart, custom scripts                   Torque
Oxford       Dedicated                      PXE, Kickstart, CFEngine                    Torque
RAL PPD      Dedicated                      PXE, Kickstart, CFEngine                    Torque

Page 8: UKI-SouthGrid Overview  and Oxford Status Report


Oxford Central Physics

• Centrally supported Windows XP desktops (~500)
• Physics-wide Exchange Server for email
– BES to support BlackBerries
• Network services for Mac OS X
– Astro converted entirely to Central Physics IT services (120 OS X systems)
– Started experimenting with Xgrid
• Media services
– Photocopiers/printers replaced; much lower costs than other departmental printers
• Network
– The network is too large; looking to divide it into smaller pieces for better management and easier scaling to higher performance
– Wireless: introduced eduroam on all Physics WLAN base stations
– Identified a problem with a 3Com 4200G switch which caused a few connections to run very slowly; now fixed
– Improved the network core and computer room with redundant pairs of 3Com 5500 switches

Page 9: UKI-SouthGrid Overview  and Oxford Status Report


Oxford Tier 2 Report: Major Upgrade 2007

• The lack of a decent computer room with adequate power and A/C held back upgrading our 2004 kit until Autumn 2007

• 11 systems, 22 servers, 44 CPUs, 176 cores; Intel 5345 Clovertown CPUs provide ~430 kSI2K, with 16 GB of memory per server; each server has a 500 GB SATA HD and an IPMI remote KVM card

• 11 storage servers, each providing 9 TB usable after RAID 6 (3ware 9650-16ML controllers), ~99 TB in total

• Two racks, 4 redundant management nodes, 4 APC 7953 PDUs, 4 UPSs

Page 10: UKI-SouthGrid Overview  and Oxford Status Report


Oxford Physics now has two Computer Rooms

• Oxford’s Grid Cluster was initially housed in the departmental computer room in late 2007

• It later moved to the new shared University room at Begbroke (5 miles up the road)

Page 11: UKI-SouthGrid Overview  and Oxford Status Report


Oxford Upgrade 2008

• 13 systems, 26 servers, 52 CPUs, 208 cores; Intel 5420 Harpertown CPUs provide ~540 kSI2K, with 16 GB of low-voltage FB-DIMM memory per server; each server has a 500 GB SATA HD

• 3 storage servers, each providing 20 TB usable after RAID 6 (Areca controllers), ~60 TB in total

More of the same, but better!

Page 12: UKI-SouthGrid Overview  and Oxford Status Report


Nov 2008 Upgrade to the Oxford Grid Cluster at Begbroke Science Park

Page 13: UKI-SouthGrid Overview  and Oxford Status Report


Electrical Power Consumption

• Newer-generation Intel quad-cores take less power
• Tested using one cpuburn process per core on both sides of a twin, killing one process every 5 minutes (a sketch of the method follows below)

             Busy    Idle
Intel 5345   645 W   410 W
Intel 5420   490 W   320 W
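The stepping method above is easy to script. A minimal sketch, assuming the cpuburn binary is called burnP6 and is on the PATH (power readings are taken off the PDU separately and are not shown here):

import subprocess
import time

CORES = 8          # both sides of a twin: 2 sockets x 4 cores each
STEP = 300         # kill one burner every 5 minutes

# Start one burner per core, then peel them off one at a time so the
# power meter can be read at each load level from fully busy to idle.
burners = [subprocess.Popen(["burnP6"]) for _ in range(CORES)]
try:
    while burners:
        time.sleep(STEP)
        burners.pop().terminate()   # one fewer busy core per step
finally:
    for p in burners:               # clean up if interrupted early
        p.terminate()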

Page 14: UKI-SouthGrid Overview  and Oxford Status Report


Electricity Costs*

• We have to pay for the electricity used at the Begbroke computer room

• The electricity cost of running the old (4-year-old) Dell nodes (~79 kSI2k) is ~£8600 per year

• The replacement cost in new twins is ~£6600, with an electricity cost of ~£1100 per year

• So the saving is ~£900 in the first year and ~£7500 per year thereafter (worked through below)

• The conclusion is that it is not economically viable to run kit older than 4 years

* Jan 2008 figures
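For reference, the break-even arithmetic behind the bullets above, using the slide's Jan 2008 figures:

# Jan 2008 figures from the slide, in GBP.
old_electricity = 8600   # per year, to run the old Dell nodes (~79 kSI2k)
new_capital     = 6600   # one-off, to buy the replacement twins
new_electricity = 1100   # per year, to run the twins

first_year = old_electricity - new_capital - new_electricity   # ~900
thereafter = old_electricity - new_electricity                 # ~7500
print("Saving: ~%d GBP in year one, ~%d GBP/year after" % (first_year, thereafter))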

Page 15: UKI-SouthGrid Overview  and Oxford Status Report


IT related power saving

• Shutting down desktops when idle
– Must be genuinely idle: logged off, no shared printers or disks, no remote access, etc.
– 140 machines are regularly shut down
– Automatic power-up early in the morning to apply patches and get ready for the user, using Wake-on-LAN (see the sketch after this list)

• Old cluster nodes removed or replaced with more efficient servers

• Virtualisation reduces the number of servers and the power used

• Computer room temperatures raised (from 19C to 23-25C) to improve A/C efficiency

• Windows Server 2008 allows control of new power-saving options on more modern desktop systems
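Wake-on-LAN works by broadcasting a "magic packet": 6 bytes of 0xFF followed by the target's MAC address repeated 16 times, conventionally sent over UDP. A minimal sketch; the MAC address below is a placeholder, not a real machine:

import socket

def wake(mac="00:11:22:33:44:55", broadcast="255.255.255.255", port=9):
    # Magic packet: 6 x 0xFF, then the MAC repeated 16 times.
    payload = bytes.fromhex(mac.replace(":", ""))
    packet = b"\xff" * 6 + payload * 16
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
    s.sendto(packet, (broadcast, port))
    s.close()

wake()  # the NIC must have WOL enabled in the BIOS/driver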

Page 16: UKI-SouthGrid Overview  and Oxford Status Report


CPU Benchmarking HEPSPEC06

hostname   CPU type         memory   cores   HEPSPEC06   HEPSPEC06/core
node10     Xeon 2.4 GHz     4 GB     2       7           3.5
node10     Xeon 2.4 GHz     4 GB     2       6.96        3.48
t2wn61     E5345 2.33 GHz   16 GB    8       57.74       7.22
pplxwn16   E5420 2.50 GHz   16 GB    8       64.88       8.11
pplxint3   E5420 2.50 GHz   16 GB    8       64.71       8.09

These figures closely match those published at http://www.infn.it/CCR/server/cpu2006-hepspec06-table.jpg
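The per-core column is simply the total score divided by the core count; a quick check of the table's arithmetic:

# HEPSPEC06 figures copied from the table above; per-core = total / cores.
results = {
    "t2wn61 (E5345 2.33 GHz)":   (57.74, 8),
    "pplxwn16 (E5420 2.50 GHz)": (64.88, 8),
    "pplxint3 (E5420 2.50 GHz)": (64.71, 8),
}
for host, (total, cores) in results.items():
    print("%-26s %.2f HEPSPEC06/core" % (host, total / cores))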

Page 17: UKI-SouthGrid Overview  and Oxford Status Report


Cluster Usage at Oxford

• Roughly equal share between LHCb and ATLAS for CPU hours

• ATLAS runs many short jobs; LHCb runs longer jobs

• Cluster occupancy is approximately 70%, so there is still room for more jobs

[Chart annotation: local contribution to ATLAS MC storage]

Page 18: UKI-SouthGrid Overview  and Oxford Status Report


[Graph: inbound ("In") and outbound ("Out") network traffic, showing the 200 Mb/s cap]

Oxford recently had its network link rate-capped to 100 Mb/s.

This was a result of continuous 300-350 Mb/s of traffic caused by CMS commissioning stress testing.

As it happens, this test completed at the same time as we were capped, so we passed the test, and current normal use is not expected to be this high.

Oxford's JANET link is actually 2 x 1 Gb/s links, which had become saturated.

The short-term solution is to rate-cap only the JANET traffic, to 200 Mb/s, which does not impact normal working (for now); all other on-site traffic remains at 1 Gb/s.

The long-term plan is to upgrade the JANET link to 10 Gb/s within the year.

Page 19: UKI-SouthGrid Overview  and Oxford Status Report


gridppnagios

We have set up a Nagios monitoring site for the UK, which several other sites use to get advance warning of failures.

https://gridppnagios.physics.ox.ac.uk/nagios/

https://twiki.cern.ch/twiki/bin/view/LCG/GridServiceMonitoringInfo
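Nagios checks follow a simple plugin convention: print one status line and exit with 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN). A minimal sketch of such a check; the endpoint URL and the "OK" test are placeholders, not the actual probes gridppnagios runs:

import sys
import urllib.request

URL = "https://example-ce.example.ac.uk/status"  # hypothetical endpoint

try:
    with urllib.request.urlopen(URL, timeout=10) as resp:
        body = resp.read()
except Exception as exc:
    print("CRITICAL - %s" % exc)   # unreachable service
    sys.exit(2)

if b"OK" in body:
    print("OK - service responding")
    sys.exit(0)
print("WARNING - unexpected response")
sys.exit(1)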

Page 20: UKI-SouthGrid Overview  and Oxford Status Report


The End

• But, since some of you may remember my old pictures of computer rooms with parquet floors and computers running in basements without A/C...

• ...here are some pictures showing the building of the Oxford Physics local infrastructure computer room.

Page 21: UKI-SouthGrid Overview  and Oxford Status Report


Local Oxford DWB Computer room

Completely separate from the Begbroke Science Park, a computer room with 100 kW of cooling and >200 kW of power was built with ~£150K of Oxford Physics money.

Local Physics department infrastructure; completed September 2007.

It relieves the local computer rooms and housed T2 equipment until the Begbroke room was ready. Racks that were in unsuitable locations could be rehoused.