24
March 2006 Iosif Legrand 1 Iosif Legrand Iosif Legrand California Institute of Technology March 2006 March 2006 An Agent Based, Dynamic Service System to Monitor, An Agent Based, Dynamic Service System to Monitor, Control and Optimize Distributed Systems Control and Optimize Distributed Systems

Iosif Legrand California Institute of Technology

  • Upload
    berget

  • View
    40

  • Download
    0

Embed Size (px)

DESCRIPTION

An Agent Based, Dynamic Service System to Monitor, Control and Optimize Distributed Systems. March 2006. Iosif Legrand California Institute of Technology. The MonALISA Framework. - PowerPoint PPT Presentation

Citation preview

Page 1: Iosif Legrand California Institute of Technology

March 2006 Iosif Legrand1

Iosif LegrandIosif LegrandCalifornia Institute of Technology

March 2006March 2006

An Agent Based, Dynamic Service System to Monitor,An Agent Based, Dynamic Service System to Monitor, Control and Optimize Distributed SystemsControl and Optimize Distributed Systems

Page 2: Iosif Legrand California Institute of Technology

March 2006 Iosif Legrand2

The MonALISA Framework

MonALISA is a Dynamic, Distributed Service System capable to collect any type of information from different systems, to analyze it in near real time and to provide support for automated control decisions and global optimization of workflows in complex grid systems.

The MonALISA system is designed as an ensemble of autonomous multi-threaded, self-describing agent-based subsystems which are registered as dynamic services, and are able to collaborate and cooperate in performing a wide range of monitoring tasks. These agents can analyze and process the information, in a distributed way, and to provide optimization decisions in large scale distributed applications.

Page 3: Iosif Legrand California Institute of Technology

March 2006 Iosif Legrand3

MonALISA is A Dynamic, Distributed Service Architecture

The framework is based on a hierarchical structure of loosely coupled agents acting as distributed services which are independent & autonomous entities able to discover themselves and to cooperate using a dynamic set of proxies or self describing protocols.

An agent-based architecture provides the ability to invest the system with increasing degrees of intelligence; to reduce complexity and make global systems manageable in real time. For an effective use of distributed resources, these services provide adaptability and self-organization.

Page 4: Iosif Legrand California Institute of Technology

March 2006 Iosif Legrand4

The MonALISA Discovery System & ServicesThe MonALISA Discovery System & Services

Network of JINI-LUSsNetwork of JINI-LUSsSecure & Public Secure & Public

MonALISA servicesMonALISA services

ProxiesProxies

Clients , HL servicesClients , HL servicesrepositoriesrepositories

Distributed Dynamic Distributed Dynamic Discovery- based on a lease Discovery- based on a lease Mechanism and REN Mechanism and REN

Distributed System Distributed System for gathering and for gathering and Analyzing InformationAnalyzing Information..

Dynamic load balancing Dynamic load balancing Scalability & ReplicationScalability & ReplicationSecuritySecurity AAA for Clients AAA for Clients

Global Services orGlobal Services orClientsClients

Fully Distributed System with no Single Point of Failure

AGENTS

Page 5: Iosif Legrand California Institute of Technology

March 2006 Iosif Legrand5

Monitoring Grid sites, Running Jobs, Monitoring Grid sites, Running Jobs, Network Traffic and Topology Network Traffic and Topology

TOPOLOGY

JOBS

ACCOUNTING

Page 6: Iosif Legrand California Institute of Technology

March 2006 Iosif Legrand6

Monitoring OSG: Resources, Jobs & AccountingMonitoring OSG: Resources, Jobs & Accounting

42 SITES 42 SITES ~ 4 000 Nodes ( 10 000 CPUs) ~ 4 000 Nodes ( 10 000 CPUs) Thousands of Jobs Thousands of Jobs 60 000 parameters60 000 parameters

Running Jobs Accounting

Page 7: Iosif Legrand California Institute of Technology

March 2006 Iosif Legrand7

Monitoring CMS Jobs Monitoring CMS Jobs

Type of Jobs Running Jobs per Site

Integrated No. of Events Rate for Processing Events per Site

Page 8: Iosif Legrand California Institute of Technology

March 2006 Iosif Legrand8

FTP Data Transfer between GRID sitesFTP Data Transfer between GRID sites

Total FTP Traffic per VO

Page 9: Iosif Legrand California Institute of Technology

March 2006 Iosif Legrand9

Monitoring Internet2 backbone NetworkMonitoring Internet2 backbone Network

Test for a Land Speed Record Test for a Land Speed Record ~ 7 Gb/s in a single TCP stream ~ 7 Gb/s in a single TCP stream

from Geneva to Caltechfrom Geneva to Caltech

Page 10: Iosif Legrand California Institute of Technology

March 2006 Iosif Legrand10

The UltraLight Network The UltraLight Network

BNL ESnet IN /OUT

Page 11: Iosif Legrand California Institute of Technology

March 2006 Iosif Legrand11

Monitoring The GLORIAD RingMonitoring The GLORIAD Ring

Page 12: Iosif Legrand California Institute of Technology

March 2006 Iosif Legrand12

Monitoring Network Topology Monitoring Network Topology Latency, RoutersLatency, Routers

NETWORKS

AS

ROUTERS

Page 13: Iosif Legrand California Institute of Technology

March 2006 Iosif Legrand13

Bandwidth Challenge at SC2005Bandwidth Challenge at SC2005

151 Gbs

~ 500 TB Total in 4h

Page 14: Iosif Legrand California Institute of Technology

March 2006 Iosif Legrand14

Available Bandwidth Measurements Available Bandwidth Measurements

one-to-one realtime bandwidth estimation.one-to-one realtime bandwidth estimation.

Page 15: Iosif Legrand California Institute of Technology

March 2006 Iosif Legrand15

Monitoring the Execution of JobsMonitoring the Execution of Jobs and the Time Evolution and the Time Evolution

SPLIT JOBSSPLIT JOBS

LIFELINES for JOBS

Job Job

Job1

Job2

Job3

Job31

Job32

Summit a Job

DAG

Page 16: Iosif Legrand California Institute of Technology

March 2006 Iosif Legrand16

ApMon – Application Monitoring

MonALISAService

MonALISAService

ApMon

ApMon

APPLICATION

APPLICATION

MonitoringData

UDP/XDR

Mbps_out: 0.52 Status: reading

App. Monitoring

MB_inout: 562.4

ApMonConfig

parameter1: value parameter2: value

App. Monitoring

...

Time;IP;procIDMonitoring

Data

UDP/XDR

MonitoringData

UDP/XDR

load1: 0.24 processes: 97

System Monitoring

pages_in: 83

MonALISA

hosts

Config Servlet

Library of APIs (C, C++, Java, Perl. Python) that can be used to send any information to MonALISA services

Flexibility, dynamic configuration, high communication performancedynamic reloading

ApMon configuration generated automatically by a servlet / CGI script

Automated system monitoring

Accounting information

0

10

20

30

40

50

60

70

0 1000 2000 3000 4000 5000 6000

Messages per second

Mon

ALIS

A CP

U Us

age (

%)

No Lost Packages

Page 17: Iosif Legrand California Institute of Technology

March 2006 Iosif Legrand17

End User / Client AgentLISA- Localhost Information Service AgentLISA- Localhost Information Service Agent

Authorization Service discovery Local detection of the hardware and software configuration Complete end-system monitoring: Per-process load, I/O and

network throughputs, etc. End-to-end performance measurements Will act as an active listener for all events related with the requests generated

by its local applications.

Page 18: Iosif Legrand California Institute of Technology

March 2006 Iosif Legrand18

Optical Switch

Runs a ML Demon Runs a ML Demon

>>ml_path IP1 IP4 “copy file IP4”ml_path IP1 IP4 “copy file IP4”

ML proxy servicesML proxy servicesused in Agent Communicationused in Agent Communication

ML Demon ML Demon

Control and Control and Monitor the Monitor the switchswitch

Optical Switch

Optical Switch

MonALISAML Agent

MonALISAML Agent

MonALISAML Agent

2

1

3

Discovery &Secure Connection

4

MonALISA agents to create on demand MonALISA agents to create on demand on an optical path or treeon an optical path or tree

Time to create a Time to create a path on demand path on demand <1s independent <1s independent of the location of the location and the number and the number of connectionsof connections

Page 19: Iosif Legrand California Institute of Technology

March 2006 Iosif Legrand19

Monitoring and Controlling Optical Planes

Port power monitoring

Controlling

Page 20: Iosif Legrand California Institute of Technology

March 2006 Iosif Legrand20

Monitoring Optical Switches Monitoring Optical Switches Agents to Create on Demand an Optical PathAgents to Create on Demand an Optical Path

Page 21: Iosif Legrand California Institute of Technology

March 2006 Iosif Legrand21

The Functionality of the VINCI System

Layer 3

Layer 2

Layer 1

Site A Site B Site C

MonALISA

ML AgentML Agent

MonALISA

ML AgentML Agent

MonALISA

ML AgentML Agent

ML proxy servicesML proxy services

Agent

Agent

Agent

Agent

ROUTERS

ETHERNETLAN-PHYor WAN-PHY

DWDMFIBER

Agent

Page 22: Iosif Legrand California Institute of Technology

March 2006 Iosif Legrand22

Vertical Integration of Services

MPLS/GMPLS/TL1MPLS/GMPLS/TL1

Job

Job

Job1

Job2Job3

Job31

Job32

User

Farms &Data Servers

Networking

Applications

Real Time Correlations & Feedback between Major Layers is Crucial for Dynamic Load Balancing ,

Adaptability and Self-Organization

“On-Demand”, Dynamic and Self Adaptinguse of networking resources

Note: Grid Services will have to Interact & Negotiate with Network Services for Network “Resources”

Page 23: Iosif Legrand California Institute of Technology

March 2006 Iosif Legrand23

Major Communities OSG CMS ALICE D0 STAR VRVS LGC RUSSIA SE Europe GRID APAC Grid UNAM Grid

ABILENE ULTRALIGHT GLORIAD LHC Net RoEduNET

Communities using MonALISACommunities using MonALISA

ABILENEABILENE

VRVSVRVS

--

ALICE

CMS-DC04CMS-DC04

Demonstrated at:

SC2003

Telecom 2003

WSIS 2003

SC 2004

I2 2005

TERENA 2005

IGrid 2005

SC 2005

CENIC 2006

MonALISARunning 24 X 7

at 250 SitesCollecting 250,000

parameters in near real-time

Update rate of 25,000 parameter updates per second

Monitoring12,000 computers > 100 WAN Links

Thousands of Grid jobs running con- currently

Page 24: Iosif Legrand California Institute of Technology

March 2006 Iosif Legrand24

The MonALISA Architecture Provides: Distributed Distributed Registration and DiscoveryRegistration and Discovery for Services and Applications. for Services and Applications.

Monitoring all aspects of complex systems :Monitoring all aspects of complex systems : System information for computer nodes and clusters System information for computer nodes and clusters Network information : WAN and LAN Network information : WAN and LAN Monitoring the performance of Applications, Jobs or services Monitoring the performance of Applications, Jobs or services The End User Systems, its performance The End User Systems, its performance Video streaming Video streaming

Can Can interact with any other servicesinteract with any other services to provide in near real-time customized to provide in near real-time customized information based on monitoring datainformation based on monitoring data

Secure, remote Secure, remote administrationadministration for services and applications for services and applications

Agents to supervise applicationsAgents to supervise applications, trigger alarms, restart or reconfigure , trigger alarms, restart or reconfigure them, and to notify other services when certain conditions are detected.them, and to notify other services when certain conditions are detected.

The MonALISA framework is used The MonALISA framework is used to develop higher level decision servicesto develop higher level decision services, , implemented as a distributed network of communicating agents, to perform implemented as a distributed network of communicating agents, to perform global optimization tasks. global optimization tasks.

Graphical User InterfacesGraphical User Interfaces to visualize complex information to visualize complex information