Ian Bird, LCG Project Leader: WLCG Collaboration Issues, WLCG Collaboration Board, 24th April 2008


Ian Bird, LCG Project Leader

WLCG Collaboration Issues

WLCG Collaboration Board, 24th April 2008


Strategic Issues

• A number of aspects of WLCG where we see the need for some structuring of dialogue with the Tier 2 federations:
– Reliabilities
– Accounting
– Resource pledges/installed capacity
– Milestones
• Other issues that are arising:
– Engagement in EGI/NGI (etc.) for future infrastructures
– Resource procurement schedules/delays/process
• General aspects of Tier 2 coordination/information flow:
– Information from MB, engagement in GDB
• Technical points, and how to discuss them with Tier 2s:
– Move to SL5/6; pilot jobs; fabric monitoring/tools; what tools do Tier 2s miss?
• What is the voice of the Tier 2s?


CPU Usage Jan-Feb 2008

[Chart legend: CERN, BNL, TRIUMF, FNAL, FZK-GRIDKA, CNAF, CC-IN2P3, RAL, ASGC, PIC, NDGF, NL-T1, Tier 2]

Recent grid use

• Across all grid infrastructures
• Preparation for, and execution of, CCRC’08 phase 1
• Move of simulations to Tier 2s

– Tier 2: 54%
– CERN: 11%
– Tier 1: 35%

Federations not yet reporting: Finland, India (IN-INDIACMS-TIFR), Norway, Sweden, Ukraine


Accounting for Tier-2s (1)

• Test reporting took place in summer 2007 and formal reporting started from September 2007.
• Monthly reports are now produced, circulated for comment and published on the LCG Project Planning website.
• Currently 52 of the 57 Federations are reporting accounting data, over a total of 107 sites:
– Changes are still being signalled for site names, so the situation is not yet fully stable
– Some Federations provided pledge information from 2008 onwards and will be included in the reporting from April
– Follow-up required with Finland, India, Norway, Sweden and Ukraine to include them in the accounting reporting
• Slide 5, Accounting for Tier-2s (2), shows the global picture of reporting by country from September 2007 to February 2008.
• Slides 6 and 7, Accounting for Tier-2s (3) and (4), show the comparison of MoU pledge with CPU provided, split according to size of pledge.

Sue Foffano – CERN-IT-4


Accounting for Tier-2s (2)


Accounting for Tier-2s (3)


Accounting for Tier-2s (4)


What we don’t see here is the installed capacity


Computing Resource Pledge Responsibilities

Following the pledge revision exercise of Autumn 2007, a reminder of the process is felt necessary.

• At the Autumn C-RRB meeting each Federation is expected to provide:
– Firm commitment to pledge values for the following year
– Planned pledge values for the subsequent 4 years
• At the Spring C-RRB meeting each Federation is expected to:
– Confirm that pledge values for the current year are installed and running a production service, or explain any problems for the current year or changes for future years
• 2 weeks before the next C-RRB on 11/11/08 the following is therefore required:
– Confirmed 2009 pledge values (confirmation of already communicated value, or revised upwards)
– Planned pledge values 2010-2013 inclusive (confirmation or revision of already communicated values, + 2013)
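As a rough illustration of the timetable above, the following sketch derives the submission deadline from the C-RRB date and lists which pledge years a federation still has to supply. The function name and the dictionary layout of a submission are hypothetical, chosen only for this example; only the meeting date, the two-week lead time and the years 2009 to 2013 come from the slide.

```python
from datetime import date, timedelta

# Date of the next C-RRB as stated on the slide (11/11/08).
C_RRB_DATE = date(2008, 11, 11)

# Pledge information is required 2 weeks before the meeting.
SUBMISSION_DEADLINE = C_RRB_DATE - timedelta(weeks=2)


def missing_pledge_years(submitted, firm_year=2009, planned_years=range(2010, 2014)):
    """Return the years for which no pledge value has been supplied.

    `submitted` maps year -> pledge value. This layout is an assumption
    made for illustration, not a WLCG data format.
    """
    required = [firm_year, *planned_years]
    return [year for year in required if year not in submitted]


if __name__ == "__main__":
    print("Deadline:", SUBMISSION_DEADLINE.isoformat())
    # Hypothetical federation that has only sent 2009 and 2010 values.
    print("Still missing:", missing_pledge_years({2009: 100, 2010: 120}))
```

The firm year and the planned years are kept as parameters so the same check could be reused for a later C-RRB cycle.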

Sue Foffano – CERN-IT-9


Tier 0/Tier 1 Site reliability

Target:
– Sites: 91%, and 93% from December
– 8 best: 93%, and 95% from December

See QR for full status

                              Sep 07   Oct 07   Nov 07   Dec 07   Jan 08   Feb 08
All                           89%      86%      92%      87%      89%      84%
8 best                        93%      93%      95%      95%      95%      96%
Above target (+ >90% target)  7 + 2    5 + 4    9 + 2    6 + 4    7 + 3    7 + 3
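The monthly figures can be compared with the targets mechanically. The sketch below is one possible reading, assuming that the "All" row is an average that may be compared directly with the per-site target, and that "from December" means December 2007 onwards; both points are assumptions rather than statements from the slide.

```python
MONTHS = ["Sep 07", "Oct 07", "Nov 07", "Dec 07", "Jan 08", "Feb 08"]

# Reliability figures (%) copied from the table.
ALL_SITES = [89, 86, 92, 87, 89, 84]
BEST_8 = [93, 93, 95, 95, 95, 96]


def target_for(month, before_dec, from_dec):
    """Return the target (%) for a month; the target rises from Dec 07."""
    if MONTHS.index(month) >= MONTHS.index("Dec 07"):
        return from_dec
    return before_dec


def months_below_target(values, before_dec, from_dec):
    """List the months in which the reported value is below its target."""
    return [
        month
        for month, value in zip(MONTHS, values)
        if value < target_for(month, before_dec, from_dec)
    ]


if __name__ == "__main__":
    print("All sites below target:", months_below_target(ALL_SITES, 91, 93))
    print("8 best below target:", months_below_target(BEST_8, 93, 95))
```

Under these assumptions the average over all sites reaches its target only in November 2007, while the 8 best sites meet theirs in every month shown.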

Follow up process in MB over many months with individual sites


Tier 2 Reliabilities

• Reliabilities published regularly since October
• In February 47 sites had > 90% reliability

Overall   Top 50%   Top 20%   Sites
76%       95%       99%       89 → 100

For the Tier 2 sites reporting:

Sites           Top 50%   Top 20%   Sites > 90%
%CPU (Jan 08)   72%       40%       70%

• For Tier 2 sites not reporting, 12 are in top 20 for CPU delivered

How do we address this?


How should the federations be reported – weighted?
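To make the question concrete, the sketch below contrasts a plain average of site reliabilities with an average weighted by each site's contribution. The choice of weight (for example CPU delivered) and the numbers in the usage example are hypothetical; the slides do not say which weighting, if any, was adopted.

```python
def federation_reliability(sites, weighted=True):
    """Combine per-site reliabilities into one federation figure.

    `sites` is a list of (reliability, weight) pairs, with reliability
    as a fraction between 0 and 1. The weight could be CPU delivered or
    pledged capacity; which one to use is an open question, so it is
    left to the caller.
    """
    if not sites:
        raise ValueError("federation has no sites")
    if not weighted:
        return sum(rel for rel, _ in sites) / len(sites)
    total_weight = sum(weight for _, weight in sites)
    if total_weight == 0:
        raise ValueError("total weight is zero")
    return sum(rel * weight for rel, weight in sites) / total_weight


if __name__ == "__main__":
    # Hypothetical federation: one large reliable site, one small unreliable one.
    example = [(0.99, 900), (0.60, 100)]
    print("unweighted:", federation_reliability(example, weighted=False))
    print("weighted:  ", federation_reliability(example, weighted=True))
```

With these made-up figures the unweighted value is about 0.80 and the weighted value about 0.95, which shows how a small unreliable site can dominate an unweighted federation figure.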


Reliability reporting

• Currently (Feb 08) all Tier 1 and 100 Tier 2 sites report reliabilities
• Recent progress:
– MB set up group to
– Agreement on equivalence of NDGF tests with those used at EGEE and all other Tier 1 sites – now in production at NDGF
– Should also be used for Nordic Tier 2 sites
– Similar process with OSG (for US Tier 2 sites): tests only for CE so far, agreement on equivalence, tests are in production, publication to SAM in progress
– Missing: SE/SRM testing
– Expect full production May 2008 (new milestone introduced)
• Important that we have all Tier 2s regularly tested and reporting
• Important that we have correct Tier 2 federation contact to follow up these issues


Reporting

• Urgent now that:
– Remaining Tier 2 federations start reporting on reliabilities and accounting
– Follow up monthly in checking the published data – we have to understand if there are problems in the process
– If the site names are wrong – please tell us what they should be (and how they map to the physical site host names)
• Resource installation:
– We need to gather also information about installed resources at Tier 2s
• Follow up process:
– For Tier 1s this was done monthly in the MB, site by site – was manageable but slow; with Tier 2s this process is unwieldy (110+ sites)
– Need a contact person for each federation, and would be far more convenient to have a contact for each country


WLCG April 2008: Tier 0 and 1 Resources

Updated Resource Status Summary for May CCRC’08

• For 5 May not all sites will now have their full 2008 CPU pledges available: the total is 28648 KSi2K (9600 KSi2K more than in 1Q2008, but a drop of 8000 KSi2K from Feb plans). The largest missing contributions are +2500 KSi2K at NL-T1 due November 2008, +1700 KSi2K at CNAF due June, +1300 KSi2K at US-CMS due end May and +3400 KSi2K at US-ATLAS due early June.

• For disk and tape many sites will catch up later in the year as need expands. 2008 disk requirements are 23 PB, and 12.4 PB are expected to be available for 5 May (3 PB more than in 1Q2008, but a drop of 3.1 PB from Feb plans), while 2008 tape requirements are 24 PB, and 13.6 PB are expected to be available for 5 May (4.8 PB more than in 1Q2008, but a drop of 1.4 PB from Feb plans).

• Disk and tape storage for the May full-scale dress rehearsal run of CCRC’08 are probably better modelled by requiring 55% (accelerator efficiency) times 30/100 (days running) of the increased resource requirements for 2008/9 over those of 2007/8, so 2.8 PB of disk and 3 PB of tape. Globally not a problem, but some sites will not be able to fully contribute to the May CCRC if this model is correct.

• These requirements are to be modified with the specific April 2008 experiment requirements to be given in the next talks.
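The scaling model in the third bullet, and the gap between required and available storage in the second, can be written out as a short calculation. This is only a sketch: the function names are made up for illustration, and the year-on-year increase passed in the usage example (17 PB) is a hypothetical input, since the slide quotes the result (2.8 PB of disk) but not the increase itself.

```python
def rehearsal_storage_pb(increase_pb, accelerator_efficiency=0.55,
                         days_running=30, reference_days=100):
    """Storage needed for the May rehearsal under the slide's model:
    efficiency x (days running / reference days) x increase in requirement.
    """
    return increase_pb * accelerator_efficiency * days_running / reference_days


def shortfall_pb(required_pb, available_pb):
    """Requirement not yet covered by installed storage (never negative)."""
    return max(required_pb - available_pb, 0.0)


if __name__ == "__main__":
    # Figures taken from the slide: 2008 requirement vs. expected on 5 May.
    print("disk shortfall (PB):", round(shortfall_pb(23.0, 12.4), 1))
    print("tape shortfall (PB):", round(shortfall_pb(24.0, 13.6), 1))
    # Hypothetical increase of 17 PB, chosen only to exercise the formula.
    print("rehearsal disk (PB):", round(rehearsal_storage_pb(17.0), 3))
```

The combined factor is 0.55 × 30/100 = 0.165, so the rehearsal needs roughly one sixth of the increase in the yearly requirement.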


Summary of Disk Space Plans

• As usual the most critical resource:
– ASGC: Last 300 TB delivery end June
– CC-IN2P3: Last 880 TB planned for September
– FZK: Last 650 TB planned for October (600 ALICE, 50 CMS)
– CNAF: Last 730 TB planned for June/July
– NDGF: Grow as needed, reaching last 700 TB by Autumn
– NL-T1: Add 800 TB by end May and last 1450 TB in November
– PIC: Last 370 TB planned for early June
– RAL: Last 800 TB in acceptance, ready for end May
– TRIUMF: Full pledge for May CCRC
– US-ATLAS: Add 1200 TB by end May and last 1000 TB in October
– US-CMS: Full pledge for May CCRC


Resource procurement

• This risks being a major problem in the coming years
– Important to work around the procurement processes so that we can be ready for the accelerator running each year
• Has been a problem for almost all Tier 1s. Is this also an issue for Tier 2s?


Milestones

• The project has mostly had formal milestones associated with the project, Tier 0, Tier 1s
• It is now time to start to impose milestones on the Tier 2s for specific issues:
– E.g. reliability, resource installation, etc.
• Again, it will be important to have the appropriate technical coordinators to report and follow up on these issues


Communication

Apart from the issues raised above:

• How are the Tier 2s kept informed, and does it work?
– Flow of information from Management Board – do Tier 2s read the minutes?
– Is everyone engaged in the GDB (or even aware that they can be)?
• How can we structure the communication with the great number of Tier 2 sites, so that we can have a workable process to communicate problems and follow up (in both directions)?
• How can we aggregate Tier 2 status to report in LHCC/OB/RRB/CB etc.?
– Today it is extremely difficult to get an overview of Tier 2 status and problems


Miscellaneous technical issues

• Move to new versions of the OS – SL5/SL6
• Pilot jobs/glexec – is it OK for sites to deploy this now?
• Fabric monitoring – do Tier 2s do this sufficiently? Do they have the tools?
• Security tools? – are sites appropriately protected?
• What tools do Tier 2s miss?
• How do Tier 2s keep abreast of these developments?
– Should participate in the GDB
– Is more needed?


Comments on EGI design study

• Goal is to have a fairly complete blueprint in June
– Main functions presented to NGIs in Rome workshop in March
• Essential for WLCG that EGI/NGI continue to provide support for the production infrastructure after EGEE-III
– We need to see a clear transition and assurance of appropriate levels of support
– Transition will be 2009-2010
– Exactly the time that LHC services should not be disrupted
• Concerns:
– NGIs agreed that a large European production-quality infrastructure is a goal
– Not clear that there is agreement on the scope
– Reluctance to accept level of functionality required
– Tier 1 sites (and existing EGEE expertise) not well represented by many NGIs
• WLCG representatives must approach their NGI reps and ensure that EGI/NGIs provide the support we need
• These comments apply equally to Tier 2s – they really need to engage with the NGI in their countries


EGI/NGI cont.

• While WLCG should work hard to make sure that the EGI design study goes in the right direction, strategically the project must be prepared to plan for a fall-back
• Tier 1s were questioned in the OB – all replied that they had some plan in place if there were no EGI/NGI
– Albeit with a potential reduction in what they could contribute
• We need to start thinking about what the Tier 2s can do
– It will be clear in June whether the EGI_DS blueprint provides what we need
• Put together a group to begin to look at fallback plans for Tier 2s?


Summary

• A number of aspects of WLCG where we see the need for some structuring of dialogue with the Tier 2 federations
• General aspects of Tier 2 coordination/information flow:
– Information from MB, engagement in GDB
• Technical points:
– Move to SL5/6; pilot jobs; fabric monitoring/tools; what tools do Tier 2s miss?
• What is the voice of the Tier 2s?
• Do we need a group to start looking at Tier 2 fallback plans if EGI_DS does not deliver? And what is the situation in US with OSG?