IT-SDC : Support for Distributed Computing XRootD Monitoring Report A.Beche D.Giordano



Outline

Talk 1: XRootD Monitoring Dashboard
  • Context
  • Dataflow and deployment model
  • Database: storage & aggregation
  • User interface & use cases
  • Open issues & future work
  • Summary

Talk 2: Beyond XRootD monitoring
  • HTTP/WebDAV integration

10 April 2014, A. Beche, Federated Workshop

XRootD federation monitoring

• Activity started during summer 2012
  • 4 sites for FAX, 11 for AAA
• Monitoring data increased accordingly:

        July 2012   March 2014
  AAA   606k        43M
  FAX   15k         22M

[Figure: Number of sites reporting (# sites, axis 0 to 45), July 2012 to March 2014]


Why monitoring?

• Understand data flows to estimate data traffic
• Provide information for efficient operations
• Identify access patterns and propose data placement strategies


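XRootD monitoring dataflow

[Diagram: Federation → (UDP) → GLED Collector → (stomp) → ActiveMQ → (stomp) → Consumer → Raw → (every 10 minutes) → Stats; a Web API serves the Dashboard UI and external applications. Parts of the chain are labelled "real time" and "asynchronous".]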
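The decoupling between collector and consumer can be mimicked in a few lines. This is only a toy sketch under loud assumptions: `queue.Queue` stands in for the ActiveMQ broker, the record fields (`server_site`, `bytes_read`) and site names are hypothetical, and no real UDP or STOMP traffic is involved.

```python
import json
import queue

# Stand-in for the ActiveMQ broker (assumption: the real chain uses STOMP).
broker = queue.Queue()

def collector_publish(record: dict) -> None:
    """Play the collector role: serialize one monitoring record and publish it."""
    broker.put(json.dumps(record))

def consumer_drain(raw_table: list) -> int:
    """Play the consumer role: move every pending message into the raw table."""
    inserted = 0
    while True:
        try:
            message = broker.get_nowait()
        except queue.Empty:
            return inserted
        raw_table.append(json.loads(message))
        inserted += 1

raw_table = []  # hypothetical in-memory replacement for the raw database table
collector_publish({"server_site": "SITE_A", "bytes_read": 1024})
collector_publish({"server_site": "SITE_B", "bytes_read": 2048})
n_inserted = consumer_drain(raw_table)
```

Because the broker buffers messages, the consumer can lag behind or restart without the collector noticing, which is presumably why the chain is split this way.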


GLED deployment model

• AMQ @ CERN: shared cluster, 5 nodes
• AAA: UCSD (16 Hz)
• AAA EU
• EOS CMS: CERN (10 Hz)
• EOS ATLAS: CERN (150 Hz)
• FAX US: SLAC (9 Hz)
• FAX EU: CERN (1 site)

[Figures: EOS monitoring data rate (Hz, axis 0 to 200); Federation monitoring data rate (Hz, axis 0 to 20)]


Consolidated dataflow

• Two usages of these raw data:
  • Dashboard monitoring
  • XRootD popularity
• Both now share the same database:
  • Storage optimization
  • Consistency guaranteed


Database

• AAA: ~300 GB, ~1B records
• FAX: ~600 GB, ~2B records
• Daily insert: 2 GB / 6M rows
• Storage: raw, statistics, metadata
• Tables daily partitioned, no global indexes

[Figure: Database usage growth* (GB, axis 0 to 700), 2012-02 to 2014-02]
* Indexes excluded
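As a quick consistency check on the figures above, the average row size can be derived from each pair of numbers. The sketch below assumes decimal gigabytes and takes the rounded slide values at face value, so the results are only indicative.

```python
GB = 10**9  # assumption: decimal gigabytes

def bytes_per_row(size_gb: float, rows: float) -> float:
    """Average storage footprint of one row, in bytes."""
    return size_gb * GB / rows

aaa_row = bytes_per_row(300, 1e9)    # AAA: ~300 GB for ~1B records
fax_row = bytes_per_row(600, 2e9)    # FAX: ~600 GB for ~2B records
daily_row = bytes_per_row(2, 6e6)    # daily insert: 2 GB for 6M rows

# Growth per year if the daily insert rate stayed constant (an assumption).
yearly_growth_gb = 2 * 365
```

All three estimates land at roughly 300 bytes per row, so the totals and the daily insert rate quoted on the slide are mutually consistent.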


Database

Raw data aggregation:
• Done using PL/SQL procedures
• Events are unordered
• Stateless: full re-computation of touched bins each time
• Compute stats from raw data in 10 min bins
• Aggregate 10 min stats in daily bins
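The stateless scheme above can be sketched as follows. This is an illustration in Python rather than the PL/SQL actually used; in-memory lists and dicts stand in for the database tables, and the event layout `(timestamp_seconds, bytes)` is an assumption.

```python
from collections import defaultdict

BIN_SECONDS = 600     # 10-minute bins
DAY_SECONDS = 86400   # daily bins

def bin_start(timestamp: int, width: int) -> int:
    """Start of the bin of the given width that contains the timestamp."""
    return timestamp - timestamp % width

def recompute_touched_bins(raw_events, new_events, stats_10min):
    """Store new events, then recompute from scratch every bin they touch."""
    raw_events.extend(new_events)
    touched = {bin_start(ts, BIN_SECONDS) for ts, _ in new_events}
    for b in touched:
        in_bin = [n for ts, n in raw_events if bin_start(ts, BIN_SECONDS) == b]
        stats_10min[b] = {"transfers": len(in_bin), "bytes": sum(in_bin)}
    return touched

def aggregate_daily(stats_10min):
    """Sum the 10-minute statistics into daily bins."""
    daily = defaultdict(lambda: {"transfers": 0, "bytes": 0})
    for b, s in stats_10min.items():
        day = bin_start(b, DAY_SECONDS)
        daily[day]["transfers"] += s["transfers"]
        daily[day]["bytes"] += s["bytes"]
    return dict(daily)

raw, stats = [], {}
recompute_touched_bins(raw, [(30, 100), (700, 50)], stats)
# A late, out-of-order event lands in the first bin again.
touched = recompute_touched_bins(raw, [(10, 25)], stats)
daily = aggregate_daily(stats)
```

Since each touched bin is rebuilt from the raw data, late or unordered events need no special handling: the second call simply overwrites the first bin with the corrected totals.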


Aggregation methods

Hourly bins between 2pm and 7pm:

                  2-3pm   3-4pm   4-5pm      5-6pm        6-7pm
Easy method
  Transfers       1       0       0          2            1
  Bytes           10      0       0          15           20
Adopted method
  Transfers       1 (1)   1 (0)   2 (0)      3 (2)        1 (1)
  Bytes           8       1       14 (9+6)   15 (1+9+5)   5
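The difference between the two methods can be illustrated with a short sketch. The slide does not spell out either rule, so both functions are assumptions: the "easy" variant books a whole transfer in the bin where it ends, and the "spread" variant counts the transfer in every bin it overlaps and splits its bytes proportionally to the overlap, assuming a uniform transfer rate. The tuple layout `(start, end, bytes)` is hypothetical.

```python
from collections import defaultdict

HOUR = 3600

def easy_method(transfers, width=HOUR):
    """Attribute each transfer entirely to the bin containing its end time."""
    stats = defaultdict(lambda: {"transfers": 0, "bytes": 0.0})
    for start, end, nbytes in transfers:
        b = end - end % width
        stats[b]["transfers"] += 1
        stats[b]["bytes"] += nbytes
    return dict(stats)

def spread_method(transfers, width=HOUR):
    """Spread bytes over every bin a transfer overlaps (uniform rate assumed)."""
    stats = defaultdict(lambda: {"transfers": 0, "bytes": 0.0})
    for start, end, nbytes in transfers:
        b = start - start % width
        if end <= start:
            # Zero-length transfer: book everything in its single bin.
            stats[b]["transfers"] += 1
            stats[b]["bytes"] += nbytes
            continue
        duration = end - start
        while b < end:
            overlap = min(end, b + width) - max(start, b)
            stats[b]["transfers"] += 1
            stats[b]["bytes"] += nbytes * overlap / duration
            b += width
    return dict(stats)

# One transfer of 20 bytes running from 00:30 to 02:30.
sample = [(1800, 9000, 20)]
easy = easy_method(sample)
spread = spread_method(sample)
```

With this sample, the first variant reports a single burst of 20 bytes in the last hour, while the second reports activity in all three hours, which is closer to what the storage actually served over time.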


Visualization Interface


Pre-defined set of views


Matrix Example

27 May 2013

Matrix showing the remote IO CNAF served in the last hour:
• # operations
• # bytes
• Averaged throughput
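A matrix of this kind can be built by grouping transfers per (server site, client site) pair. The sketch below is a minimal illustration: the tuple layout and the client site names are hypothetical, and "averaged throughput" is assumed here to mean total bytes divided by the length of the time window, which may differ from the dashboard's definition.

```python
from collections import defaultdict

def build_matrix(transfers, window_seconds):
    """Aggregate transfers into {(server_site, client_site): metrics}."""
    cells = defaultdict(lambda: {"operations": 0, "bytes": 0})
    for server_site, client_site, nbytes in transfers:
        cell = cells[(server_site, client_site)]
        cell["operations"] += 1
        cell["bytes"] += nbytes
    for cell in cells.values():
        # Assumption: throughput averaged over the whole window, in bytes/s.
        cell["avg_throughput"] = cell["bytes"] / window_seconds
    return dict(cells)

# Hypothetical transfers served by CNAF during one hour.
last_hour = [
    ("CNAF", "SITE_X", 3600),
    ("CNAF", "SITE_X", 7200),
    ("CNAF", "SITE_Y", 1800),
]
matrix = build_matrix(last_hour, window_seconds=3600)
```

Each cell then carries the three quantities listed on the slide, and selecting one of them gives the value to display for that server/client pair.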


Use case example: understand site access patterns

1. Which sites are reading from FNAL
2. Zoom to a specific site to understand which users are reading
3. Understand which files are read by a user


Data popularity

• XRootD monitoring provides information about file access patterns:
  • Including non-official collections (i.e. user files)
  • Helps to simplify the usage of disk resources and make it more efficient
• Popularity data analytics built on this information:
  • Already adopted for CMS-EOS; will be extended to full AAA


Archive recommendation for CMS-EOS

• Helps to manage the disk space of EOS, including user space
  • No central bookkeeping system
• Unused files: created > 4 months ago, no access in the last 3 months
  • ~500 TB of space occupied and not used (30% of the total for these areas)
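The selection rule for unused files can be written down as a small filter. This is a sketch only: the record fields, file paths and sizes are hypothetical, and the month thresholds are approximated as 120 and 90 days.

```python
from datetime import datetime, timedelta

CREATED_BEFORE = timedelta(days=120)    # "> 4 months ago", approximated
NOT_ACCESSED_FOR = timedelta(days=90)   # "last 3 months", approximated

def unused_files(files, now):
    """Return files created > 4 months ago with no access in the last 3 months."""
    result = []
    for f in files:
        old_enough = now - f["created"] > CREATED_BEFORE
        last = f["last_access"]  # None means the file was never accessed
        idle = last is None or now - last > NOT_ACCESSED_FOR
        if old_enough and idle:
            result.append(f)
    return result

now = datetime(2014, 4, 10)
catalog = [  # hypothetical file records
    {"path": "/user/a/f1", "size": 10, "created": datetime(2013, 1, 1), "last_access": datetime(2013, 6, 1)},
    {"path": "/user/a/f2", "size": 20, "created": datetime(2013, 1, 1), "last_access": datetime(2014, 3, 20)},
    {"path": "/user/b/f3", "size": 30, "created": datetime(2014, 3, 1), "last_access": None},
    {"path": "/user/b/f4", "size": 40, "created": datetime(2013, 6, 1), "last_access": None},
]
candidates = unused_files(catalog, now)
reclaimable = sum(f["size"] for f in candidates)
```

Summing the sizes of the selected files gives the kind of "occupied and not used" figure quoted above; recently created files are excluded even if never read, so fresh data is not flagged prematurely.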


Open issues

• Servers should provide their site name
  • CMS: only 5 sites (to be followed)
  • ATLAS: done
• GLED Collector improvements:
  • Reliability of the service:
    • Recovery time can be long due to time difference
    • GLED should be operated as a production service
• Multi-VO sites:
  • Discrimination will happen at GLED level


Future work

• Strong requirement from ATLAS to understand efficiency:
  • Need the concept of error / failure
  • How could the XRootD server be instrumented to report it?
• Topology
  • Resolution will be based on the new "server site" field
• Data-mining activity (2 years of data ~ 1 TB)


Application usage

[Figure: application usage for FAX and AAA; values shown: 20, 10, 30, 15]


HTTP Federation is coming

• HTTP protocol will be used in the future
  • XRootD servers can be accessed over HTTP
  • See Fabrizio's presentation on xrdhttp
• Two kinds of access:
  • Pure HTTP access (through Apache)
  • HTTP gate to XRootD server
• They can't be monitored in the same way


Monitoring XRootD access protocol

• XRootD 4 will now report the user protocol:
  • All the monitoring chain needs to be updated
  • Dashboard DB and UI are fully ready

[Figure labels: HTTP, XRootD]


HTTP/WebDAV federation monitoring

[Diagram built up over five consecutive slides. The first build shows the labels Site, SE, JOB, XRootD Federation, XRootD, GLED collector and ActiveMQ. Later builds add an HTTP Federation and a further Site, then a second JOB and XrdHTTP, then a third JOB and Apache; the last build adds a "?".]


Summary

• Lots of effort has been put into the XRootD monitoring workflow and dashboard in the last 2 years
  • Reliable system achieved
  • Lots of use cases covered
• HTTP monitoring already started
  • Will require a lot of effort to reach XRootD monitoring level
  • First prototype for pure HTTP monitoring will be ready by autumn, thanks to the DPM team


Credits

• Andreeva Julia
• Cons Lionel
• Giordano Domenico
• Saiz Pablo
• Tadel Matevz
• Tuckett David
• Vukotic Ilija
• The AAA and FAX deployment team
• ….


Useful links

• AAA Dashboard: http://dashb-cms-xrootd-transfers.cern.ch
• FAX Dashboard: http://dashb-atlas-xrootd-transfers.cern.ch
• CHEP materials:
  • https://indico.cern.ch/abstractDisplay.py?abstractId=101&confId=214784
  • https://indico.cern.ch/getFile.py/access?contribId=94&sessionId=6&resId=0&materialId=slides&confId=214784
  • https://indico.cern.ch/getFile.py/access?contribId=265&sessionId=6&resId=1&materialId=slides&confId=214784
• Xbrowse framework: https://twiki.cern.ch/twiki/bin/view/ArdaGrid/XbrowseFramework

Thanks for your attention
