LHCOPN operational working group report Guillaume Cessieux (FR-CCIN2P3 / EGEE networking support) on behalf of the Ops WG LHCOPN meeting, 2009-01-15, Berlin

Page 1: LHCOPN operational working group report

LHCOPN operational working group report

Guillaume Cessieux (FR-CCIN2P3 / EGEE networking support), on behalf of the Ops WG

LHCOPN meeting, 2009-01-15, Berlin

Page 2: LHCOPN operational working group report

Background

• LHCOPN meeting in Copenhagen, 2008-10-16/17
  – Test procedures for backup paths agreed
  – Feedback requested
  – Roadmap for implementation needed
• One LHCOPN Ops meeting in mid-December
  – http://indico.cern.ch/conferenceDisplay.py?confId=44050
  – Very productive (20 actions + 15 actions for GGUS)

GCX - LHCOPN meeting - 2009-01-15 2

Page 3: LHCOPN operational working group report

Agenda

• The operational model itself
  – Main feedback reviewed
  – Main changes on the model
  – Areas of weaknesses
• Implementation status and updates
  – Tools
  – Roadmap
  – Pending & next steps


Page 4: LHCOPN operational working group report

1- THE OPERATIONAL MODEL
https://twiki.cern.ch/twiki/bin/view/LHCOPN/OperationalModel


Page 5: LHCOPN operational working group report

Ops model in one slide

• Federated with key responsibilities on T1s
  – On top of what currently exists
• Information centralised (twiki & GGUS)


[Diagram: Site A, Site B, NREN A, NREN B, NREN C, the LHCOPN TTS (GGUS), all sites and users, linked by steps numbered 1 to 4]

Page 6: LHCOPN operational working group report

Overview of site feedback

Site            Remark
CA-TRIUMF       No major issue
CH-CERN         Ops wg member
DE-KIT          Ops wg member
ES-PIC          Ops wg member
FR-CCIN2P3      Ops wg member
IT-INFN-CNAF    No answer
NDGF
NL-T1
TW-ASGC         No clear agreement
UK-T1-RAL       Ops wg member & confirmed
US-FNAL-CMS     No answer
US-T1-BNL       No answer

Page 7: LHCOPN operational working group report

Site feedback

• Fear of additional load for small events
  – Wise thresholds [> 1 hour || > 5 times an hour]
• Lack of accuracy of the Ops model on twiki
  – Initially a high level view
  – Tell us what is still not detailed enough
• Many details
  – Open a ticket and then investigate, or the reverse
  – Flexible model
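The thresholds quoted above (> 1 hour || > 5 times an hour) can be read as a simple rule for deciding whether an event deserves a ticket. The sketch below is only one possible reading: the function name `needs_ticket` and its two inputs are hypothetical, and the slides do not say how events are counted.

```python
from datetime import datetime, timedelta

# Thresholds taken from the slide: longer than 1 hour, or more than
# 5 occurrences within an hour.
MAX_DURATION = timedelta(hours=1)
MAX_EVENTS_PER_HOUR = 5


def needs_ticket(outage_starts, longest_outage):
    """Return True when the events cross either threshold.

    outage_starts: list of datetime, one per outage (hypothetical input).
    longest_outage: timedelta of the longest single outage (hypothetical input).
    """
    if longest_outage > MAX_DURATION:
        return True
    starts = sorted(outage_starts)
    # Sliding window: count outages that start within one hour of each outage.
    for i, first in enumerate(starts):
        in_window = [t for t in starts[i:] if t - first < timedelta(hours=1)]
        if len(in_window) > MAX_EVENTS_PER_HOUR:
            return True
    return False
```

With this reading, a single two-hour outage or six flaps in 25 minutes would both call for a ticket, while five short flaps in an hour would not.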

Page 8: LHCOPN operational working group report

Network provider feedback
• Where is the E2ECU?
• Hard to understand the twiki
  – Balance between complexity and accuracy
• Low robustness
• Federated model cannot work seriously in a stable mode
• Inappropriate way to operate such a network
• Hot potatoes, cost, distributed ownership of trouble
• “You are not prepared for the worst”
  – Responsibilities will be highlighted based on cost model


Page 9: LHCOPN operational working group report

Grid feedback

• Communication channel to the Grid to be studied
  – Different user communities to be targeted
  – Grid data contacts to be nominated
• Performance issues are very important

Page 10: LHCOPN operational working group report

Changes on the model (1/2)

• Much vagueness removed
  – Reasonable, major, suitable...
• Notification: no longer all sites, only affected ones
• Sample common use cases provided
  – https://twiki.cern.ch/twiki/bin/view/LHCOPN/OpsModelUseCases
• Quality assessment by CH-CERN
  – When?
  – Infrastructure and operation
• Suitable data to be available

Page 11: LHCOPN operational working group report

Changes on the model (2/2)

• Responsibilities highlighted
  – Outages on links between T0 and T1s are the responsibility of the T1s (which ordered the link)
  – Responsibility for outages on T1-T1 links is being studied (should be mapped from existing contracts by studying the cost model: who pays what, where)
  – Responsibility for a GGUS ticket lies with the single site to which the ticket is assigned
• « You take responsibility for what you ordered »
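The responsibility rule for link outages can be sketched as a small lookup. This is an illustration under assumptions, not part of the model: the function name `responsible_site` is hypothetical, CH-CERN is taken as the T0, and `None` stands for the T1-T1 case that the slide describes as still being studied.

```python
def responsible_site(end_a, end_b, tier0="CH-CERN"):
    """Return the site responsible for an outage on the link end_a <-> end_b.

    T0-T1 link: the T1, which ordered the link, is responsible.
    T1-T1 link: not settled in the model yet, so None is returned.
    """
    if end_a == end_b:
        raise ValueError("a link needs two distinct end sites")
    if end_a == tier0:
        return end_b
    if end_b == tier0:
        return end_a
    return None  # T1-T1: responsibility still being studied
```

For example, `responsible_site("CH-CERN", "DE-KIT")` gives `"DE-KIT"`, while a link between two T1s gives `None` until the cost model study concludes.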


Page 12: LHCOPN operational working group report

Areas of weaknesses

• Robustness to be really ensured
  – Will sites play the game?
  – Is quality assessment sufficient protection against passivity of sites?
• Grid interactions
  – They have to provide us with clear communication channels


Page 13: LHCOPN operational working group report

2- IMPLEMENTATION


Page 14: LHCOPN operational working group report

Tools status (1/4)

• Global information repository = CERN twiki
  https://twiki.cern.ch/twiki/bin/view/LHCOPN/WebHome
  – Deeply reorganised
  – With private part
    • TTS access details, statistics reports…
• Change management database will be at
  – https://twiki.cern.ch/twiki/bin/view/LHCOPN/ChangeManagementDatabase
  – Acts as LHCOPN’s technical logbook

[Diagram: Global web repository (Twiki) holding operational procedures, operational contacts, technical information, change management DB and statistics reports]

Page 15: LHCOPN operational working group report

Tools status (2/4)

• LHCOPN trouble ticket system = GGUS dedicated helpdesk
  – Access previously opened to the ops working group
    • First review done and requests sent 2008-12-15
    • Group certificate?
  – Really taking shape
  – Next release = first production usable release
    • 2009-02-01


[Diagram: LHCOPN TTS (GGUS)]

Page 16: LHCOPN operational working group report

Tools status (3/4)

• Around GGUS
  – 15 pending actions
    • Details but also key things for production use
  – E-mail reminders
    • A weekly reminder of GGUS tickets assigned to a site and still open
    • A weekly reminder of GGUS tickets submitted by a site and still open
  – E-mail notifications
    • By default only to impacted sites and the site to which the ticket is assigned (if different)
    • More notification options: no notification, or notification to all
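The notification options listed above can be sketched as a small recipient selection. This is only an illustration: the function name, the `mode` values and the argument names are hypothetical and do not describe the actual GGUS implementation.

```python
def notification_recipients(impacted_sites, assigned_site, all_sites, mode="default"):
    """Return the sorted list of sites to notify for one ticket.

    mode "default": impacted sites plus the assigned site (if different).
    mode "none":    nobody is notified.
    mode "all":     every LHCOPN site is notified.
    """
    if mode == "none":
        return []
    if mode == "all":
        return sorted(all_sites)
    if mode != "default":
        raise ValueError(f"unknown notification mode: {mode!r}")
    # The set union drops the duplicate when the assigned site is also impacted.
    return sorted(set(impacted_sites) | {assigned_site})
```

A ticket impacting DE-KIT and assigned to CH-CERN would, by default, notify only those two sites.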


Page 17: LHCOPN operational working group report

Tools status (4/4)

• LHCOPN Planning/Calendar - Ongoing
  – Automatic export of GGUS tickets in open iCalendar standard format (.ics)
  – And a web instance of the calendar
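An export of tickets to iCalendar could look roughly like the sketch below. The ticket fields (`id`, `start`, `end`, `summary`), the `PRODID` and the `UID` domain are assumptions made for illustration; the real GGUS export is not described in these slides. Long lines are not folded at 75 octets here, which a complete exporter should do.

```python
from datetime import datetime, timezone


def _escape(text):
    # Escape the characters that are special in iCalendar TEXT values.
    return (text.replace("\\", "\\\\").replace(";", "\\;")
                .replace(",", "\\,").replace("\n", "\\n"))


def _utc(dt):
    # Format a timezone-aware datetime as an iCalendar UTC date-time.
    return dt.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")


def tickets_to_ics(tickets, now):
    """Render hypothetical ticket dicts as one VCALENDAR string."""
    lines = ["BEGIN:VCALENDAR", "VERSION:2.0",
             "PRODID:-//example//lhcopn-calendar-sketch//EN"]
    for t in tickets:
        lines += [
            "BEGIN:VEVENT",
            f"UID:ticket-{t['id']}@example.invalid",
            f"DTSTAMP:{_utc(now)}",
            f"DTSTART:{_utc(t['start'])}",
            f"DTEND:{_utc(t['end'])}",
            f"SUMMARY:{_escape(t['summary'])}",
            "END:VEVENT",
        ]
    lines.append("END:VCALENDAR")
    # iCalendar content lines are terminated by CRLF.
    return "\r\n".join(lines) + "\r\n"
```

Writing the returned string to a `.ics` file should be enough for most calendar clients to import the events, provided the datetimes passed in are timezone-aware.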


Page 18: LHCOPN operational working group report

Other

• New link IDs for “hidden” links that can deeply affect the LHCOPN
  – DE-KIT-I-II-LHCOPN-001, CH-CERN-I-II-LHCOPN-001, IT-INFN-CNAF-MIL-BOL-LHCOPN-001, TW-ASGC-AMS-TPE-LHCOPN-001, TW-ASGC-AMS-CHI-LHCOPN-001, TW-ASGC-CHI-TPE-LHCOPN-001
• Key dependencies: Monitoring
  – Soon trustworthy?
  – ASPDrawer – BGP monitoring
    • Deploy it fully, hosted by CERN, integrated within MDM?
      – cf. tomorrow’s talk
  – DownCollector’s LHCOPN flavour
    https://ccenoc.in2p3.fr/DownCollector/


Page 19: LHCOPN operational working group report

Proposed roadmap for implementation


[Timeline figure: 2009, months 1 to 11]

Milestones:
• January’s LHCOPN meeting
• First public release of LHCOPN TTS
• April’s LHCOPN meeting
• Key improvements and adjustments of the model
• July’s LHCOPN meeting
• First complete assessment and final adjustments
• LHC startup

Phases:
• Trial version: model optional
• Production version: model compulsory
• Final production version: model compulsory

Page 20: LHCOPN operational working group report

Next steps

• Gather GGUS access details
  – Table to be filled in on twiki: https://twiki.cern.ch/twiki/bin/view/LHCOPN/TTSdetails
• “Test” tickets, notifications and twiki accesses
• Dissemination around the ops model?
  – Presentation and “training”?
  – Target: 12 router operators?
  – Define KPI


Page 21: LHCOPN operational working group report

Pending

Ops model:
• Finalise implementation, test, disseminate, assess, improve

Tools:
• GGUS production usable release (2009-02-01)
  – And accesses
• Calendar

Others:
• Monitoring, quality assessment, unified authentication


Page 22: LHCOPN operational working group report

Conclusion

• Model itself
  – Complex high level view, but flexible
  – Robustness to be ensured
  – Need commitment from sites
    • Can drive improvement of the model
• Implementation
  – Tools taking shape
  – Tight schedule to match potential LHC start-up


Page 23: LHCOPN operational working group report

Questions & discussion
