Open Science Grid The OSG Accounting System: GRATIA by Philippe Canal (FNAL) & Matteo Melani (SLAC) Mumbai, India CHEP2006


Open Science Grid

The OSG Accounting System: GRATIA

by Philippe Canal (FNAL) & Matteo Melani (SLAC)

Mumbai, India CHEP2006


What is Accounting? (in the Grid context)

Grid accounting is the process of maintaining a (consistent) Grid-wide view of VO members' resource utilization.[1]

[1] Accounting in Grid Environments, by Peter Gardfjäll, Department of Computing Science, Umeå University, Sweden


Why do we want an accounting system?

Resource providers (SLAC, Fermilab…) want to perform cost-benefit analysis

Resource providers want to improve planning

Resource providers want better security

Resource providers want to improve QoS (priorities, debugging…)

Support a Grid “Economic Model”


What is the real problem (solution)?

Nobody talked about “Grid economy”

Do we really want an Accounting system?

Or maybe a monitoring system will do?

Let's look at accounting and monitoring


Accounting vs. Monitoring

A monitoring system:

Purpose: monitoring system health, debugging, system profiling

Gathers state information about the system resources

Collects system events. It works like a DAQ system: as close as possible to the system, as unintrusive as possible

Quasi-real-time to real-time

An accounting system:

It keeps track of resource usage

It links a user’s service requests with the resources consumed to satisfy those requests

It has accounts, banks, “currency” and supports an economic model (policies)

“After the fact”


For Example: Monitoring at SLAC

What do we monitor:

Network: switches, routers status, Internet Mbytes/sec in/out

Computer clusters: batch systems, NFS and AFS servers, database servers

Storage space: disk usage, HPSS

Some metrics we use: CPU utilization, memory, disk usage, disk I/O, various networking metrics (Mbytes in/out of switches, routers, servers…)

Some primitive job submission results (LSF)

We use a lot of monitoring tools and infrastructure: Ganglia, Nagios, OpenView, SNMP tools, MonALISA…


For Example: Accounting at SLAC?

The monitoring system cannot link resource usage to users/groups

Maybe by looking into the logs and correlating the events… but that is a lot of work

Accounting infrastructures and tools à la Ganglia or Nagios do not exist

Basically we cannot (yet) fully link a user name with a precise set of computing resource usage metrics


What I think we should track

Job submission: priority in the batch queue, CPU time, wall-clock time, memory usage

Storage: disk usage, tape storage usage, storage class (to be defined)

Network data transfer: network speed, quantity of data transferred

Special software usage, operator/administrator services… maybe later
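
As a rough illustration of the job-submission metrics listed above, the following sketch stores them in a small record type and sums CPU time per user. The class name, field names and sample values are hypothetical and are not taken from the Gratia schema.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class JobUsage:
    """One finished batch job; field names are illustrative assumptions."""
    user: str
    queue_priority: int
    cpu_seconds: float
    wall_seconds: float
    max_memory_mb: float


def total_cpu_by_user(jobs):
    """Sum CPU time per user across a list of JobUsage records."""
    totals = defaultdict(float)
    for job in jobs:
        totals[job.user] += job.cpu_seconds
    return dict(totals)


# Made-up sample data, only to show the aggregation.
jobs = [
    JobUsage("alice", 10, 120.0, 150.0, 512.0),
    JobUsage("bob", 5, 30.0, 45.0, 256.0),
    JobUsage("alice", 10, 60.0, 90.0, 300.0),
]
cpu_totals = total_cpu_by_user(jobs)
```

Storage and network metrics could be added as further record types in the same way.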


Goals

Track service and resource usage per grid user, after the fact

Focus on quality, integrity and security of the information

Accounting information easily available to people (web interface) and to applications (Web Services)

Build a system that is simple to manage (install, configure and upgrade) and to extend (well-defined APIs)

Based on well-proven, standard (industrial-strength) technologies

However, we do not cover (but keep in mind):

User charging system

Resource or service pricing

Support for an economic model for resource allocation


System Properties

Interoperability: The Accounting System should leverage existing standards to maximize interoperability with other Grids and Accounting Services.

Fault Tolerance: Reduce and flag data loss. Resilient to communication failures over LAN and WAN. Resilient to the failure of one of its components.

Security: Guarantees integrity and non-repudiation of the accounting records at the site level. Uses secure communication channels (mutual authentication, message integrity, confidentiality) and access control lists.

Scalability and Performance: Not really an issue.

Other: Leverage existing tools and infrastructures to solve related problems.


Simple Domain Model


Design Direction

We are currently focused on getting the infrastructure right more than on the specific metrics to measure resource usage

Open: we give APIs

Distributed: Meters are distributed objects

Based on open source standard technologies: Web Services, Java Platform, Tomcat, Axis, Hibernate

Same idea as GUMS and JClarens: the service is an independent Tomcat application (JClarens for authentication)

Ensure interoperability with OSG partners (LCG, TeraGrid…)


Architecture Overview


Meter

A Meter is responsible for:

Gathering all the data about a Grid service's usage

Gathering all the data about the resources used by that Grid service

Assembling a Service Usage Record

Logically there is one Meter entity per Grid Service

Each Meter is composed of one or more Probes and one Assembler (plus some other components for management functions)

A Grid Service uses resources distributed across the Resource Provider’s LAN, therefore the Meter is also distributed
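
The Meter composition described above (one or more Probes plus one Assembler per Grid Service) might look roughly like the sketch below. All class names, method names and readings are assumptions made for illustration; the actual system is Java-based and its interfaces are not shown in these slides.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceUsageRecord:
    """Illustrative record: which service, which user, which resources."""
    service: str
    user: str
    resources: dict = field(default_factory=dict)


class Probe:
    """Hypothetical probe reporting measurements for one resource."""
    def __init__(self, resource, readings):
        self.resource = resource
        self.readings = readings  # e.g. {"cpu_seconds": 42.0}

    def collect(self):
        return self.resource, dict(self.readings)


class Assembler:
    """Merges the output of several probes into one usage record."""
    def assemble(self, service, user, probes):
        record = ServiceUsageRecord(service=service, user=user)
        for probe in probes:
            resource, readings = probe.collect()
            record.resources[resource] = readings
        return record


class Meter:
    """One Meter per Grid Service: one or more Probes plus one Assembler."""
    def __init__(self, service, probes):
        if not probes:
            raise ValueError("a Meter needs at least one Probe")
        self.service = service
        self.probes = list(probes)
        self.assembler = Assembler()

    def usage_record(self, user):
        return self.assembler.assemble(self.service, user, self.probes)


# Probes may sit on different hosts of the LAN; here they are local objects.
meter = Meter("job-submission", [
    Probe("batch-node-1", {"cpu_seconds": 42.0}),
    Probe("nfs-server", {"bytes_read": 1024}),
])
record = meter.usage_record("alice")
```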


Meter Logical View


Meter’s Probe and Assembler

Probes use a secure channel (mutual authentication, data integrity) to send usage information to the Assemblers.

Usage information is packaged in ProbeEvents that are sent to the Assemblers through a Web Service interface.

Each ProbeEvent object has a standard header and a payload in XML format.

Probes use an “at-least-once semantics” technique to send ProbeEvents to the Assemblers (communication is resilient to failure)

Assemblers can choose synchronous or asynchronous processing of ProbeEvents
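
A minimal sketch of at-least-once delivery, under stated assumptions: the sender keeps resending a ProbeEvent until it is acknowledged, so the receiver may see the same event more than once. The duplicate filtering by event id, the header fields, the XML element names and the `FlakyChannel` test double are all hypothetical; the slides do not say how the Assemblers handle duplicates, and the secure Web Service transport is left out.

```python
import uuid
import xml.etree.ElementTree as ET


class ProbeEvent:
    """Header (event id, probe name) plus an XML payload; layout is illustrative."""
    def __init__(self, probe, payload_xml):
        self.event_id = str(uuid.uuid4())
        self.probe = probe
        self.payload_xml = payload_xml


class Assembler:
    """Receiver that tolerates duplicates by remembering event ids (an assumption)."""
    def __init__(self):
        self.seen = set()
        self.cpu_seconds = 0.0

    def receive(self, event):
        if event.event_id not in self.seen:
            self.seen.add(event.event_id)
            root = ET.fromstring(event.payload_xml)
            self.cpu_seconds += float(root.findtext("cpu_seconds"))
        return True  # acknowledgement


class FlakyChannel:
    """Test double: delivers every message but loses the first `drop_acks` acks."""
    def __init__(self, assembler, drop_acks):
        self.assembler = assembler
        self.drop_acks = drop_acks

    def send(self, event):
        ack = self.assembler.receive(event)
        if self.drop_acks > 0:
            self.drop_acks -= 1
            raise ConnectionError("ack lost")
        return ack


def send_at_least_once(channel, event, max_attempts=5):
    """Resend until acknowledged; returns the number of attempts used."""
    for attempt in range(1, max_attempts + 1):
        try:
            if channel.send(event):
                return attempt
        except ConnectionError:
            continue  # no ack seen, so send the same event again
    raise RuntimeError("event not acknowledged; keep it queued for later")


assembler = Assembler()
channel = FlakyChannel(assembler, drop_acks=2)
event = ProbeEvent("lsf-probe", "<usage><cpu_seconds>12.5</cpu_seconds></usage>")
attempts = send_at_least_once(channel, event)
```

In this run the event reaches the receiver three times but is counted once, which is the reason at-least-once delivery is usually paired with some form of duplicate detection.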


Collector

Main functionalities:

Hosting the Meters' components (the Assemblers) that are responsible for assembling Service Usage Records

Monitoring the Meters' components called Probes

Communication between Probes and Assemblers: routing of ProbeEvents to the proper Assembler

Communication between Assemblers and Data Store
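
The routing role of the Collector can be sketched as a lookup table from service name to Assembler. This is only an assumption about how routing could work: the `service` key, the `register` and `route` methods, and the use of plain lists as stand-in Assemblers are all hypothetical.

```python
class Collector:
    """Hosts assemblers and routes each event to the one registered for its service."""
    def __init__(self):
        self._assemblers = {}
        self.unroutable = []

    def register(self, service, assembler):
        self._assemblers[service] = assembler

    def route(self, event):
        assembler = self._assemblers.get(event["service"])
        if assembler is None:
            # Flag rather than silently drop, in the spirit of "reduce and flag data loss".
            self.unroutable.append(event)
            return False
        assembler.append(event)
        return True


collector = Collector()
job_events, storage_events = [], []  # plain lists stand in for Assemblers
collector.register("job-submission", job_events)
collector.register("storage", storage_events)

routed = [collector.route(e) for e in (
    {"service": "storage", "bytes": 2048},
    {"service": "job-submission", "cpu_seconds": 3.0},
    {"service": "unknown", "value": 1},
)]
```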


Collector Logical View

Data Store Component


Accountant

This component is intended for future use.

Main functionalities:

Further process the Service Usage Records to apply economic policy (pricing & billing)
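
Since the Accountant is only planned, the following is purely a hypothetical sketch of what applying a pricing policy to a usage record could mean. The price list, metric names and record layout are invented for illustration.

```python
from decimal import Decimal

# Hypothetical price list: cost per unit of each metered quantity.
PRICES = {"cpu_hours": Decimal("0.10"), "gb_stored": Decimal("0.02")}


def bill(record, prices=PRICES):
    """Return the charge for one usage record; unpriced metrics cost nothing."""
    return sum(
        (prices[name] * Decimal(str(amount))
         for name, amount in record["usage"].items()
         if name in prices),
        Decimal("0"),
    )


record = {"user": "alice", "usage": {"cpu_hours": 10, "gb_stored": 50, "jobs": 3}}
charge = bill(record)
```

`Decimal` is used instead of floats so that monetary amounts do not pick up binary rounding errors.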


Deployment View

Deployed as a Tomcat application: can take advantage of Tomcat clustering features for scalability and availability

Collector and Publisher can run on two different Tomcat instances

Can use the most popular database implementations; the database server can be on the same host as Tomcat or on a different host

Probes can run anywhere on the LAN


Deployment Diagram


Conclusion

More Information

Project Charter, Requirements and Design Documents

OSG Accounting Twiki page and

Mailing list: [email protected]

Any Questions, Comments, etc?


SPARE SLIDES


[Figure: Overview. Diagram labels: Resource Provider Site, Grid Operation Center, VO Center; Probe, Collector, Repository of Accounting Records, Data Store Access Layer, WSAPI, Web Presenter, Statistical Analyzer.]