AMOD report 24 – 30 September 2012

www.egi.eu EGI-InSPIRE RI-261323

EGI-InSPIRE

Fernando H. Barreiro Megino

CERN IT-ES

Workload

Data transfers

High number of transfer failures caused by a few NL sites

> 1M files a day

Tue 25 - High load on PanDA servers

• Average time for DQ2+LFC registration increased dramatically, causing high load on the PanDA servers

• Some LFC timings in the logs indicated that the registration slowness was in DQ2

[Plots: CC writer 1 and CC writer 2]

[Plot: Number of sessions open on the ADCR3 instance, mostly by the ATLAS_LFC_W user]

Tue 25 - High load on PanDA servers (2)

• Other observations that came up during the investigation
  • Some improvements to the LFC client are going to be discussed during the “DB technical meeting on the LFC” on Wednesday 3rd Oct
  • PanDA server LFC registration should be activated for all sites in order to avoid individual registrations by the pilot
  • aCT registers in bursts without bulk methods: in the LFC logs we saw 4k accesses over 1 hour and only 7 accesses over another hour
  • There were 2 SS machines serving the DE cloud (i.e. the same sites twice) with similar configuration
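
The point about bursts of individual registrations can be illustrated with a minimal sketch of batching. This is not the aCT, PanDA or LFC client code: `chunked`, `register_in_bulk`, the `bulk_register` callback and the batch size of 100 are all hypothetical names and values chosen for illustration. It only shows how grouping entries reduces the number of catalogue accesses.

```python
from typing import Callable, Iterator, List, Sequence


def chunked(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of `items` with at most `size` elements each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def register_in_bulk(
    entries: Sequence[str],
    bulk_register: Callable[[List[str]], None],  # hypothetical bulk call
    batch_size: int = 100,                       # assumed value, not from the report
) -> int:
    """Send `entries` in batches and return the number of calls made."""
    calls = 0
    for batch in chunked(entries, batch_size):
        bulk_register(list(batch))  # one catalogue access per batch
        calls += 1
    return calls


if __name__ == "__main__":
    received: List[List[str]] = []
    files = [f"file{i}" for i in range(250)]
    n_calls = register_in_bulk(files, received.append, batch_size=100)
    print(n_calls, [len(batch) for batch in received])
```

With 250 entries and a batch size of 100, the callback is invoked 3 times instead of 250, which is the kind of reduction a bulk method would give during a registration burst.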

Thu 27 - SS callbacks to dashboard piling up

• Initially we thought it was exclusively due to the CERN network intervention
• After checking the logs we have seen slow callbacks before the intervention on different SS machines
• D. Tuckett is checking the situation

Other incidents and downtimes

• Monday
  • New PanDA proxy had not been updated on PanDA Monitor machines (Savannah: 97737)
  • INFN-T1 scheduled downtime for ~1 hour
• Tuesday
  • RAL 6h upgrade to CASTOR 2.1.12-10. Alastair set UK cloud brokeroff on the previous evening
• Thursday
  • CERN network intervention to replace some switches. Services under risk were CASTOR, EOS, elog and dashboard. Smooth intervention - NTR.
• Friday
  • BNL to ASGC transfer errors. Being investigated by both sides during the weekend. ASGC FTS is blocked from accessing BNL SRM and the routing path is changed. (GGUS:86537)

Other incidents and downtimes (2)

• Sunday
  • PVSS DCS replication with large delays due to high insertion rate. DCS expert had to be called on Sunday
  • RAL had failing jobs due to put errors and transfer errors, including T0 export. Caused by a problem with the Stager databases and resolved during Sunday late evening (GGUS:86552)
• Saturday
  • SS-SARA had CRITICAL errors. MySQL DB corruption? Problem to be understood by DDM experts.

Acknowledgements

• Except for occasional highlights it has been a very quiet week

• Thanks a lot to
  • ADCoS expert & shifters, and to the Comp@P1 shifter for the good work
  • experts of the different components and sites for the quick reaction
  • Alessandro, Ueda for their support

Backup slides

NL transfer errors
