EGEE-II INFSO-RI-031688

Enabling Grids for E-sciencE

www.eu-egee.org

EGEE and gLite are registered trademarks

The Grid Observatory: goals and challenges

C. Germain-Renaud (CNRS/LRI & LAL)

EGEE’07 Conference

Budapest, Hungary

1-5 October 2007

Application Track - Grid Observatory 2

Overview

• NA4 cluster in EGEE-III proposal

• Integrate the collection of data on the behaviour of the EGEE grid and users with the development of models and of an ontology for the domain knowledge

Some immediate questions

• Resource allocation
– Performance of the gLite scheduling hierarchy
– Published waiting time
– Reactive grids
– Everybody's grid

• Dimensioning
– Patterns and trends in requests and usage
– Anticipate peaks

• On-line fault management
– Detection
– Diagnosis
– Prevention

The big picture

• Considering current technologies, we expect that the total number of device administrators will exceed 220 million by 2010 (Gartner, June 2001)

• No more Moore’s Law free lunch: much more complex software & applications

• The Virtual Organization concept creates common goods

Autonomic Computing

Computing systems that manage themselves in accordance with high-level objectives from humans. Kephart & Chess, The Vision of Autonomic Computing, IEEE Computer, 2003

– Self-*: configuration, optimization, healing, protection
– Of open, non-steady-state dynamic systems
– Academia and industry involved

Autonomic Grids

• Statistical analysis
• Data mining
• Machine learning

[Figure: autonomic control loop with the stages monitor, analyze, plan, execute, and knowledge]

DATA REQUIRED

Data Collection and Publication

• Acquisition, consolidation, long-term conservation of traces of EGEE activities
– Permanent storage of reliable, exhaustive, filtered information
– Exhaustive: added value in snapshots of the inputs and grid state, e.g. workload and available services during a relevant time range
– Filtered: from operational to structured
– No monitoring development: rich ecosystem of sources, with very different scopes, deployment and institutional status
– Centralized

[Figure: L&B schema, annotated "No join!"]

• CIC tools (GOCDB, SAM, SFT,…)
• core gLite (L&B, BDII,…)
• sites (Maui/PBS logs)
• gLite integrators (R-GMA, Job Provenance)
• experiment integrators (Dashboard)
• external software (MonALISA)

• The major challenge is exhaustiveness
– Some data are outside the scope: external traffic on shared resources
– Inside the scope, we need snapshots of the grid state and inputs
– Privacy-related legal constraints
– Scientific usage will help
– Interaction with EGI
– Long-term: privacy-preserving data mining

Data Collection and Publication

• Publication service: navigation and querying
– Integration of independent sources
– Indexing along the needs of the user communities
– Scheduling: ongoing work with CoreGrid
– Jobs: ongoing work with KD-Ubiq

• Ontology
– The Glue Information Model: an ontology of the resources
– Concepts for the grid dynamics, e.g. job lifecycle or user relations
– Expert concepts as prior knowledge of non-trivial correlations: workflows, failure modes,…

[Figure labels: Resource, Job]

Models

• Intrinsic characterizations of «grid traffic»: (distribution of) e.g. job arrival rate, running time, application data locality
– Likely to be similar to IP traffic: many short, and a significant number of long, at all scales
– Long range dependencies

• Characterizations of middleware-dependent metrics, e.g. queuing delays, overhead, SE load

• Inference of models for middleware components and applications, users and usage profiles, user interactions
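One way to make the "many short, and a significant number of long" claim testable is to estimate a tail index from observed job running times. The sketch below uses the Hill estimator, which is a standard choice for heavy tails but is not named in the talk; the function name and the choice of `k` are illustrative assumptions.

```python
import math


def hill_tail_index(samples, k):
    """Hill estimate of the tail index alpha from the k largest samples.

    A small alpha (roughly below 2) suggests a heavy tail, i.e. a
    non-negligible share of very long jobs. `k` is a tuning choice.
    """
    xs = sorted(samples, reverse=True)
    if not 0 < k < len(xs) or xs[k] <= 0:
        raise ValueError("need 0 < k < len(samples) and positive tail values")
    # Mean log-excess of the k largest values over the (k+1)-th largest.
    mean_log = sum(math.log(x / xs[k]) for x in xs[:k]) / k
    if mean_log == 0:
        raise ValueError("tail values are all equal; index is undefined")
    return 1.0 / mean_log
```

In practice the estimate is usually plotted against several values of `k`, since a single `k` can be misleading.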

Autonomic dependability

• On-line failure detection and anticipation
• Passive vs active probing: a lot of information is available from user work
• Black-box
– On-line statistics from « similar » actions (executions, data access, middleware modules)
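As an illustration of on-line statistics over « similar » actions, the sketch below keeps a running mean and variance (Welford's method) for one class of actions and flags observations far from the mean. The class name, the 3-sigma rule, and the `min_samples` warm-up are assumptions made for the example, not part of the talk.

```python
import math


class RunningStats:
    """Welford's online mean and variance for one class of similar actions."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the current mean

    def update(self, x):
        self.n += 1
        d = x - self.mean
        self.mean += d / self.n
        self.m2 += d * (x - self.mean)

    def std(self):
        # Sample standard deviation; 0.0 until two observations are seen.
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0


def is_outlier(stats, x, k=3.0, min_samples=10):
    """Flag x if it lies more than k standard deviations from the running mean."""
    if stats.n < min_samples or stats.std() == 0.0:
        return False
    return abs(x - stats.mean) > k * stats.std()
```

One `RunningStats` instance per action class (for example, per site and operation type) keeps memory constant, which suits passive monitoring of user work.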

Evaluation

• Assessing performance at the grid scale is a challenge
– Need a snapshot of the inputs and grid state, e.g. workload and available services during a relevant time range
– Classical optimization does not scale
– Advanced optimization: anytime algorithms

Abrupt changepoint detection

• Page-Hinkley statistics

• Time-sequential version of Wald’s statistics, also known as CUSUM

• « Intelligent threshold » test which minimizes the expected time before a change detection for a fixed false positive rate

• Routine in quality control, clinical trials

[Figure labels: VO software bug, Blackhole]
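A minimal sketch of the one-sided Page-Hinkley test, applied here to a stream of job outcomes (0 = success, 1 = failure) to detect an upward jump in the failure rate. The parameter values `delta` and `threshold` and the helper `first_alarm` are illustrative assumptions; the slides do not give the settings actually used.

```python
from dataclasses import dataclass


@dataclass
class PageHinkley:
    """One-sided Page-Hinkley test for an upward shift in the mean of a stream."""
    delta: float = 0.05      # tolerated drift magnitude (assumed value)
    threshold: float = 5.0   # alarm threshold, often written lambda (assumed value)
    n: int = 0
    mean: float = 0.0
    cum: float = 0.0         # cumulative deviation m_t
    cum_min: float = 0.0     # running minimum of m_t

    def update(self, x: float) -> bool:
        """Feed one observation; return True when a change is signalled."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cum += x - self.mean - self.delta
        self.cum_min = min(self.cum_min, self.cum)
        return self.cum - self.cum_min > self.threshold


def first_alarm(stream, **params):
    """Index of the first alarm in `stream`, or None if no change is flagged."""
    detector = PageHinkley(**params)
    for i, x in enumerate(stream):
        if detector.update(x):
            return i
    return None
```

Raising `threshold` lowers the false positive rate at the cost of a longer detection delay, which is the tradeoff the « intelligent threshold » bullet refers to.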

Autonomic dependability

• Supervised and unsupervised learning

Mining the L&B logs

Constructive induction

Double clustering

Autonomic dependability

• Active probing
– Adaptive on-line test selection for best coverage of possibly faulty components
– Experiment planning
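Test selection for coverage can be sketched as a greedy set cover: repeatedly pick the probe that exercises the most still-uncovered suspect components. This is a static simplification under assumed inputs (the probe names and component sets are hypothetical); an adaptive version would re-run the selection as probe results shrink or grow the suspect set.

```python
def select_probes(probes, suspects):
    """Greedy set cover: pick probes until every suspect component is exercised.

    probes   -- dict mapping a probe name to the set of components it exercises
    suspects -- set of components that may be faulty
    Returns (chosen probe names in order, suspects no probe can reach).
    """
    uncovered = set(suspects)
    chosen = []
    if not probes:
        return chosen, uncovered
    while uncovered:
        # Choose the probe that exercises the most still-uncovered suspects.
        best = max(probes, key=lambda p: len(probes[p] & uncovered))
        gain = probes[best] & uncovered
        if not gain:
            break  # remaining suspects cannot be reached by any probe
        chosen.append(best)
        uncovered -= gain
    return chosen, uncovered
```

Greedy selection is not guaranteed to be minimal, but it gives a well-known logarithmic approximation to the optimal cover and is cheap enough to recompute on-line.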

Goals & Challenges

• Contributions to a quantitative approach to grid middleware and architecture, in the RISC sense

• Operational impacts on EGEE: evaluation, autonomic dependability

• Basic research in autonomic computing

• Collaboration between EGEE and national research initiatives and other EU projects: DEMAIN, PASCAL, KD-Ubiq, CoreGrid, and hopefully more

• Adequate tradeoff between productivity and sustainability