UKI-SouthGrid Overview Pete Gronbech SouthGrid Technical Coordinator GridPP 25 - Ambleside 25th August 2010


UKI-SouthGrid Overview

Pete Gronbech
SouthGrid Technical Coordinator

GridPP 25 - Ambleside
25th August 2010


Seven(-teen) Sisters


UK Tier 2 reported CPU – Historical View to present

[Chart: CPU reported per month, Jun-09 to Jul-10, in K SPECint2000 hours (axis 0 to 4,000,000); series: UK-London-Tier2, UK-NorthGrid, UK-ScotGrid, UK-SouthGrid]


SouthGrid Sites: Accounting as reported by APEL

[Chart: CPU reported per month, Jun-09 to Jul-10, in K SPECint2000 hours (axis 0 to 1,600,000); series: JET, BHAM, BRIS, CAM, OX, RALPPD]

Sites upgrading to SL5 and recalibration of published SI2K values


Site Resources

Site          HEPSPEC06   CPU (kSI2K)   Storage (TB)
EFDA-JET           1772           442            1.5
Birmingham         3344           836            166
Bristol            1836           459            110
Cambridge          2268           567            120
Oxford             3564           891            199
RALPPD            12928          3232            633
Totals            25712          6248         1181.5

CPU (kSI2K) values are converted from HEPSPEC06 benchmarks.
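As a quick cross-check of the Site Resources figures above, the following sketch recomputes the column sums and the HEPSPEC06-to-kSI2K ratio per site. The figures are copied from the slide; the variable and function names are illustrative only, and the ratio of roughly 4 HS06 per kSI2K is inferred from the table itself rather than stated on the slide.

```python
# Site figures copied from the Site Resources table:
# (HEPSPEC06, CPU in kSI2K, storage in TB).
SITE_RESOURCES = {
    "JET": (1772, 442, 1.5),
    "Birmingham": (3344, 836, 166),
    "Bristol": (1836, 459, 110),
    "Cambridge": (2268, 567, 120),
    "Oxford": (3564, 891, 199),
    "RALPPD": (12928, 3232, 633),
}


def column_totals(resources):
    """Sum each of the three columns across all sites."""
    hs06 = sum(row[0] for row in resources.values())
    ksi2k = sum(row[1] for row in resources.values())
    storage_tb = sum(row[2] for row in resources.values())
    return hs06, ksi2k, storage_tb


def hs06_per_ksi2k(resources):
    """Ratio of HEPSPEC06 to kSI2K for each site."""
    return {site: row[0] / row[1] for site, row in resources.items()}


if __name__ == "__main__":
    print(column_totals(SITE_RESOURCES))
    for site, ratio in hs06_per_ksi2k(SITE_RESOURCES).items():
        print(f"{site}: {ratio:.2f} HS06 per kSI2K")
```

Summing the rows gives 25712 HS06, which matches the totals row. The kSI2K and storage columns sum to 6427 and 1229.5 TB, which differ from the 6248 and 1181.5 printed on the slide; the slide does not explain the difference.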


GridPP3 h/w generated MoU for 2010, 11, 12

Storage (TB)
Site      2010   2011   2012
bham       179     95    124
bris        22     27     35
cam        108    135    174
ox         203    255    328
RALPPD     364    440    583

CPU (HS06)
Site      2010    2011    2012
bham      1450    2119    2724
bris       661    1173    1429
cam       1148    1445    1738
ox        2034    2483    2974
RALPPD    6499   13109   16515
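One way to read the MoU figures is against the installed capacity listed on the Site Resources slide. The sketch below makes that comparison for 2010. Pairing the short names (bham, bris, cam, ox) with Birmingham, Bristol, Cambridge and Oxford is an assumption, as is treating the two tables as directly comparable; the function name is illustrative.

```python
# 2010 MoU figures from the MoU slide: (HS06, storage in TB).
MOU_2010 = {
    "bham": (1450, 179),
    "bris": (661, 22),
    "cam": (1148, 108),
    "ox": (2034, 203),
    "RALPPD": (6499, 364),
}

# Installed capacity from the Site Resources slide: (HEPSPEC06, storage in TB).
# Assumed mapping: bham=Birmingham, bris=Bristol, cam=Cambridge, ox=Oxford.
INSTALLED = {
    "bham": (3344, 166),
    "bris": (1836, 110),
    "cam": (2268, 120),
    "ox": (3564, 199),
    "RALPPD": (12928, 633),
}


def shortfalls(installed, mou):
    """Return the sites whose installed CPU or storage is below the MoU figure."""
    cpu_short = [site for site in mou if installed[site][0] < mou[site][0]]
    storage_short = [site for site in mou if installed[site][1] < mou[site][1]]
    return cpu_short, storage_short


if __name__ == "__main__":
    cpu_short, storage_short = shortfalls(INSTALLED, MOU_2010)
    print("Below 2010 CPU figure:", cpu_short)
    print("Below 2010 storage figure:", storage_short)
```

With the slide figures, no site is below its 2010 CPU figure, while bham (166 TB against 179 TB) and ox (199 TB against 203 TB) are slightly below on storage. The Birmingham slide notes a purchase that brings its total to 210 TB.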


JET

• Stable operation (SL5 WNs)
• Could handle more opportunistic LHC work

1772 HS06

1.5 TB


Birmingham

• Just purchased 40TB of storage
– total storage rises to 10TB + 6*20TB + 2*40TB = 210TB in a week or two

• Two new 64-bit servers
– (SL5) Site BDII + monitoring VMs
– (SL5) DPM head node

• Everything (except mon) is SL5

• Both clusters have dual lcg-CE/CreamCE front ends

• Sluggish response/instabilities with GPFS on Shared Cluster
– Installed 4TB NFS-mounted file server for experiment software/middleware/user areas

Taken on someone else's proprietary (non SL5) smart phone. He couldn't get signal in there either.


Bristol

• LCG StoRM SE with GPFS, 102TB, 90% full of CMS data

• StoRM developers are finishing testing 1.5.4 on SL5 64-bit and plan to provide 1.5.4 for both slc4 ia32 and sl5 x86_64 to Early Adopters this month (August). Bristol is waiting for a stable, well-tested StoRM v1.5 SL5 64-bit release. In the meantime Bristol's StoRM v1.3 (32-bit on SL4) is working very well!

• On 1Gbps network, getting good bandwidth utilization

• Servers (StoRM & gridftp) very responsive despite load


HDFS with StoRM

• Prior WN: Intel Xeon 2.0GHz; Dec 2009 new WN: AMD 2.4GHz

• Each AMD WN = 2 x 1TB drive, part of 1 disk = WN space

• Dr Metson experimenting with HDFS using rest of 1 disk + 2nd disk, working with INFN on possibility of StoRM on top of HDFS

• Also experimenting with using Hadoop to process CMS data

In Other News...

• Swingeing IT staff cuts being planned at U Bristol (and downgrades for those few remaining)

• Started planning that SouthGrid will take over Bristol LCG Site Admin from April 2011

• Consolidate & reduce PP servers so Astro admin can inherit

• PP Staff will best-effort support Bristol AFS server (IS won't)


Bristol

• Plan to try running the CEs and other control nodes on virtual machines, using a setup identical to Oxford's, to enable remote management.

• The StoRM SE on GPFS will be run by Bob Cregan on site.


Cambridge

• 32 cores CPU installed April 2010: bought from GridPP3 tranche 2.

• Server to host several virtual machines (BDII, Mon, etc.) just delivered.

• Network upgraded last November to provide gigabit ethernet to all GRID systems.

• Storage is still 140TB; CPU will be increased due to the purchase in the first point.

• ATLAS production is the main VO running on this site.

• Investigating current under-utilisation, possible accounting issues?


RALPP

• We believe we are now through all the messing about with air conditioning, with our machine room now running on the refurbished/upgraded AC plant. Happy days, all except for the leaks shortly after they turned it on!

• We've been running well below nominal capacity for most of this year, but are pretty much back now.

• Joining with the Tier 1 for the tender process.

• Testing Argus and glexec

• RGMA and site BDII now moved to SL5 VMs

• Working on setting up a test instance of dCache, working with the Tier 1, using Tier 2 hardware.


Oxford

• Last 6 months cluster running with very high utilisation.

• Completed the tender for new kit and placed orders in July. Unfortunately the orders had to be cancelled due to manufacturing delays on the particular motherboard we ordered and a pricing problem. Now re-evaluating all suppliers with updated quotes.

• New Argus server installed. (Report by Kashif)
– ‘Installing Argus was easy and configuring was also OK once I understood the basic concept of policies, but it took me a considerable time because of a bug in Argus which is partly due to the old style of host certificate issued by UK CA. The same issue was responsible for the gridpp voms server problem. I have reported this to UK CA.
– Argus uses glexec on the WN; the glexec installed on t2wn41 is being tested.
– Details on gridpp wiki http://www.gridpp.ac.uk/wiki/Oxford’

• Oxford has become an early adopter for CREAM and ARGUS.


Oxford

Grid Cluster setup: CREAM CE & pilot setup

[Diagram: t2ce02 (CREAM, gLite 3.2, SL5); t2wn41 (glexec enabled); t2argus02; t2ce06 (CREAM, gLite 3.2, SL5); t2wn40-87]


Oxford Dashboard

Thanks to Glasgow for the idea / code


Oxford’s Atlas dashboard


Conclusions

• SouthGrid sites utilisation generally improving

• Many had recent upgrades for hardware using GridPP3 second tranche, others putting out tenders, some delays following issues with vendor at Oxford

• RALPPD back to full strength following AC upgrade

• Monitoring for production running improving

• Concerns over reduced manpower at sites as we move into GridPP 4


Future Meetings

• Look forward to GridPP 26 in Sheffield next April

• If you look in the right places the views are as good as here in the Lakes.