Adrian Jackson, Stephen Booth
EPCC
[email protected]
+44 131 650 5746
Resource Usage Monitoring and Accounting
GridSafe AHM 2009 2
Introduction
• Resource usage accounting has long been standard practice on high-end compute resources.
• Historically less common on smaller systems, where it was easier to apportion costs locally.
– This is becoming less viable.
  – FEC costing
  – Grid computing (users no longer local)
  – Virtualisation
GridSAFE
• JISC-funded project to build a general-purpose accounting/monitoring solution.
– http://gridsafe.forge.nesc.ac.uk/
– Builds on the accounting subsystem from the SAFE user administration system used by HPCx/HECToR.
• Challenges:
– Need to work with a wide variety of different local policies.
– Need to work with both grids and local HPC resources.
• One solution won't fit all potential users.
– Build a kit of parts.
– Pre-built solutions for common deployment scenarios.
• Key aims:
– Modular design: individual functions can be deployed independently.
– Behaviour can be customised using plug-ins to implement different service policies.
End Users
• End users are interested in accounting for their own use.
– Compare the efficiency of different systems.
– Compare the cost effectiveness of different systems.
– Check resources available.
• Often interested in individual jobs as well as overall totals.
Resource Providers
• Need to gather the raw accounting data.
– Format depends on the underlying technology.
• Need to apply local policies.
– Charges
– Discounts
– Where to charge
• Usage data may be useful for purposes other than accounting.
– Analysing queue wait times.
– Job size profiles.
– May want to keep some of this data private.
Research groups/Virtual organisations
• Research groups/VOs need to manage their resources across all available platforms.
– Ideally have all information available in a single place.
• Where all resources reside within a single grid this can be provided by grid-level accounting.
• Resources may come from multiple grids or independent resource providers.
Grid-SAFE core
• Java code with data stored in a MySQL database.
– Normally run within a Tomcat container.
• UsageRecords are treated as a collection of properties.
• Highly customisable
– Code does not mandate a single format.
– Can choose which of the available properties to store in the database.
– Can add new properties for site-local concepts.
– Easily extendable to new types of data
  – Storage accounting
  – Allocation tracking
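The "collection of properties" idea can be illustrated with a minimal sketch. This is not the Grid-SAFE code: the class name UsageRecordSketch, its methods, and the property names are all hypothetical, chosen only to show how a record with no fixed format can carry site-local properties and how a site might select which properties to store.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: a usage record as an open collection of named properties.
// None of these names come from Grid-SAFE itself.
final class UsageRecordSketch {
    private final Map<String, Object> properties = new LinkedHashMap<>();

    // Any property name is accepted, so no single record format is mandated.
    UsageRecordSketch set(String name, Object value) {
        properties.put(name, value);
        return this;
    }

    Object get(String name) {
        return properties.get(name);
    }

    // Keep only the properties a site has chosen to store in its database.
    Map<String, Object> project(Set<String> storedProperties) {
        Map<String, Object> row = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : properties.entrySet()) {
            if (storedProperties.contains(entry.getKey())) {
                row.put(entry.getKey(), entry.getValue());
            }
        }
        return row;
    }
}

class UsageRecordDemo {
    public static void main(String[] args) {
        UsageRecordSketch record = new UsageRecordSketch()
                .set("user", "alice")
                .set("cpuSeconds", 7200L)
                .set("queue", "long")
                .set("localBudgetCode", "proj-42"); // invented site-local property

        // This site stores three of the four available properties.
        System.out.println(record.project(Set.of("user", "cpuSeconds", "localBudgetCode")));
    }
}
```

New kinds of data, such as storage accounting, would in this sketch simply be records carrying a different set of property names.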
Accounting code
• Plug-in parser modules handle different types of input data.
– OGF-UR
– SGE
– PBS
– EGEE JobManager
– Etc.
• Plug-in policy modules augment these, allowing site-local customisation.
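One way to picture the split between parser and policy plug-ins is the sketch below. The interfaces RecordParser and RecordPolicy, the "user:cpuSeconds" input format, and the flat-rate charge are all assumptions made for illustration; they are not Grid-SAFE's API and not a real PBS or SGE log format.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical plug-in interfaces; the names are illustrative only.
interface RecordParser {
    Map<String, Object> parse(String line);
}

interface RecordPolicy {
    void apply(Map<String, Object> record);
}

// Parses an invented "user:cpuSeconds" line format (not a real batch-system log).
final class ColonParser implements RecordParser {
    @Override
    public Map<String, Object> parse(String line) {
        String[] fields = line.split(":");
        if (fields.length != 2) {
            throw new IllegalArgumentException("Expected user:cpuSeconds, got: " + line);
        }
        Map<String, Object> record = new HashMap<>();
        record.put("user", fields[0].trim());
        record.put("cpuSeconds", Long.parseLong(fields[1].trim()));
        return record;
    }
}

// Site-local policy: derive a charge from CPU time at a fixed rate per CPU hour.
final class FlatRatePolicy implements RecordPolicy {
    private final double ratePerCpuHour;

    FlatRatePolicy(double ratePerCpuHour) {
        this.ratePerCpuHour = ratePerCpuHour;
    }

    @Override
    public void apply(Map<String, Object> record) {
        long cpuSeconds = (Long) record.get("cpuSeconds");
        record.put("charge", cpuSeconds / 3600.0 * ratePerCpuHour);
    }
}

// The parser produces the record; each policy then augments it in turn.
final class AccountingPipeline {
    private final RecordParser parser;
    private final List<RecordPolicy> policies;

    AccountingPipeline(RecordParser parser, List<RecordPolicy> policies) {
        this.parser = parser;
        this.policies = new ArrayList<>(policies);
    }

    Map<String, Object> ingest(String line) {
        Map<String, Object> record = parser.parse(line);
        for (RecordPolicy policy : policies) {
            policy.apply(record);
        }
        return record;
    }
}
```

Under this arrangement, supporting another input format means adding a parser, while changing charges or discounts means swapping a policy, with neither change touching the other.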
Reporting Portal
• Grid-SAFE uses XML templates to define reports.
– Can generate unified reports over multiple data tables containing different types of data.
– Tables/charts
– Parameterised reports (e.g. to select user or project).
• Supports reports in multiple formats.
– PDF, HTML, CSV
• Performance of report generation is a particular issue.
– Utilise the database effectively.
– Use aggregate tables for high-throughput systems.
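The aggregate-table point can be sketched as follows. This is an assumption-laden illustration, not Grid-SAFE's implementation: an in-memory map stands in for a database table, and the class DailyUsageAggregate and its (project, day) key are hypothetical. The idea shown is that totals are updated as each job record is loaded, so a report reads one pre-summed entry per day rather than one row per job.

```java
import java.time.LocalDate;
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for an aggregate table keyed by (project, day).
final class DailyUsageAggregate {
    private final Map<String, Long> cpuSecondsByKey = new HashMap<>();

    private static String key(String project, LocalDate day) {
        return project + "|" + day;
    }

    // Called once per job record as it is loaded; adds to the running daily total.
    void add(String project, LocalDate day, long cpuSeconds) {
        cpuSecondsByKey.merge(key(project, day), cpuSeconds, Long::sum);
    }

    // Report query over an inclusive date range: one lookup per day, not per job.
    long total(String project, LocalDate from, LocalDate to) {
        long sum = 0;
        for (LocalDate day = from; !day.isAfter(to); day = day.plusDays(1)) {
            sum += cpuSecondsByKey.getOrDefault(key(project, day), 0L);
        }
        return sum;
    }
}
```

The trade-off in this sketch is that per-job detail is not available from the aggregate, so reports on individual jobs would still need the underlying records.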
Web Services
• Web service interface for access by other services.
• Web service interfaces use OGF-UR XML as the common interchange format.
• RUPI – Resource Usage Publishing Interface
– Interface for uploading usage records to a remote repository.
– Currently an OGF-RUS-WG proposal.
• RUQI – Resource Usage Query Interface
– Interface for running queries on a remote repository.
– Aim to submit to OGF-RUS-WG.
Grid level accounting
• Grid accounting is not a solved problem.
– We are aiming to contribute useful technology, not to dictate a solution.
• Different grids are pursuing different architectures.
– EGEE/NGS hierarchical model
  – Data published up a tree of repositories.
– DEISA distributed model
  – Resource providers run local repositories and control access to data.
  – Accounting operations query multiple repositories.
• Some commonality
– OGF-UR format generally accepted as the common data interchange format.
• Combination of RUPI/RUQI can be used to implement either model.
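How a publish interface and a query interface together cover both models can be sketched as below. The Java interfaces UsagePublisher and UsageQuery are hypothetical, heavily simplified stand-ins for RUPI and RUQI: the real interfaces are web services exchanging OGF-UR XML, and access control is omitted here entirely.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, much-simplified usage record.
final class Usage {
    final String user;
    final long cpuSeconds;

    Usage(String user, long cpuSeconds) {
        this.user = user;
        this.cpuSeconds = cpuSeconds;
    }
}

// RUPI-like role: upload usage records to a repository.
interface UsagePublisher {
    void publish(List<Usage> records);
}

// RUQI-like role: run a query against a repository.
interface UsageQuery {
    long totalCpuSeconds(String user);
}

final class Repository implements UsagePublisher, UsageQuery {
    private final List<Usage> stored = new ArrayList<>();
    private final UsagePublisher parent; // null when nothing is forwarded upwards

    Repository(UsagePublisher parent) {
        this.parent = parent;
    }

    @Override
    public void publish(List<Usage> records) {
        stored.addAll(records);
        if (parent != null) {
            parent.publish(records); // hierarchical model: data moves up the tree
        }
    }

    @Override
    public long totalCpuSeconds(String user) {
        long sum = 0;
        for (Usage usage : stored) {
            if (usage.user.equals(user)) {
                sum += usage.cpuSeconds;
            }
        }
        return sum;
    }
}

// Distributed model: data stays local and the query fans out to each repository.
final class FederatedQuery implements UsageQuery {
    private final List<UsageQuery> repositories;

    FederatedQuery(List<? extends UsageQuery> repositories) {
        this.repositories = new ArrayList<>(repositories);
    }

    @Override
    public long totalCpuSeconds(String user) {
        long sum = 0;
        for (UsageQuery repository : repositories) {
            sum += repository.totalCpuSeconds(user);
        }
        return sum;
    }
}
```

In the hierarchical arrangement a grid-level total comes from querying the root repository; in the distributed arrangement the same total comes from a FederatedQuery over the site repositories, each of which could refuse or filter the request under its own policy.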
• Actively looking for sites to use the software
• Don’t need to use everything
• http://gridsafe.forge.nesc.ac.uk/