The Network Weather Service A Distributed Resource Performance Forecasting Service for Metacomputing Rich Wolski, Neil T. Spring and Jim Hayes Presented

The Network Weather Service

A Distributed Resource Performance Forecasting Service

for Metacomputing

Rich Wolski, Neil T. Spring and Jim Hayes

Presented By: Mohammad Al-Saeed

Organization

Introduction
  Motivation: why the NWS?
  The NWS: what is the NWS?
Related work
NWS system architecture
  Design goals
  System components
NWS components
NWS interface
Conclusion and future work

Motivation

Searching for the environment that delivers the most

Dynamic nature of metacomputing environments

Adaptive applications
  Adapt to changing environments
  Knowledge needed for adaptation

Resource discovery and allocation

The Network Weather Service

A distributed system for producing short-term deliverable performance forecasts

Goal: dynamically measure and forecast the performance deliverable at the application level from a set of network resources

Measurements currently supported:
  Available fraction of CPU time
  End-to-end TCP connection time
  End-to-end TCP network latency
  End-to-end TCP network bandwidth

Related Work

TReno: performance at transport layer using TCP

Pathchar: bandwidth over a path

bprobe/cprobe: bottleneck link speed and competing traffic

Topology-d: uses ping and netperf to find bandwidth between hosts in a group, then analyzes this data to find a minimum-cost logical topology

ReMoS: network resource monitoring

NWS System Architecture

Design objectives
  Scalability: scales to any metacomputing infrastructure
  Predictive accuracy: provides accurate measurements and forecasts
  Non-intrusiveness: shouldn’t load the resources it monitors
  Execution longevity: available at all times
  Ubiquity: accessible from everywhere, monitors all resources

System Components

Four different component processes
  Persistent State process: handles storage of measurements
  Name Server process: directory server for the system
  Sensor processes: measure current performance of different resources
  Forecaster process: predicts deliverable performance of a resource during a given time

NWS Processes

NWS Components

Persistent State Management
Naming Server
Performance Monitoring: NWS Sensors
  CPU Sensor
  Network Sensor
Sensor Control
  Cliques: hierarchy and contention
  Adaptive time-out discovery
Forecasting
  Forecaster and forecasting models
  Sample forecaster results

Persistent State Management

All NWS processes are stateless
The system state (measurements) is managed by the PS process:
  Storage and retrieval of measurements
  Measurements are time-stamped plain-text strings
  Measurements are written to disk immediately and acknowledged
  Measurements are stored in a circular queue of tunable size
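The circular-queue behaviour described above can be sketched in a few lines of Python. This is an illustration only, not the NWS implementation: the class name `MeasurementStore`, its methods, and the record layout are hypothetical, and the sketch keeps records in memory rather than writing them to disk.

```python
from collections import deque
import time


class MeasurementStore:
    """Hypothetical stand-in for one measurement series held by the PS process."""

    def __init__(self, capacity):
        # A deque with maxlen drops the oldest record once it is full,
        # which gives the circular-queue behaviour of tunable size.
        self._queue = deque(maxlen=capacity)

    def store(self, value, timestamp=None):
        """Time-stamp a measurement, keep it as plain text, and acknowledge."""
        ts = time.time() if timestamp is None else timestamp
        self._queue.append(f"{ts:.3f} {value}")
        return True  # acknowledgement to the caller

    def retrieve(self, count=None):
        """Return the newest `count` records (all records if count is None)."""
        records = list(self._queue)
        if count is None:
            return records
        return records[max(0, len(records) - count):]
```

With a capacity of 3, storing five measurements leaves only the three newest, so old history is discarded automatically instead of growing without bound.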

Naming Server

Primitive text string directory service for the NWS system

The only component known system-wide
Information stored includes:
  Name to IP binding information
  Group configuration
  Parameters for various processes

Each process must refresh its registration with the name server periodically

Centralized
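One way to picture periodic registration refresh is a directory whose entries expire unless renewed. The sketch below assumes a time-to-live per entry; the slides only say registrations must be refreshed periodically, so the expiry rule, the `Registry` class, and the example names and addresses are all hypothetical.

```python
class Registry:
    """Hypothetical text-string directory with expiring registrations."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._entries = {}  # name -> (address, expiry time)

    def register(self, name, address, now):
        """Create or refresh a registration; it stays valid for ttl seconds."""
        self._entries[name] = (address, now + self.ttl)

    def lookup(self, name, now):
        """Return the bound address, or None if unknown or expired."""
        entry = self._entries.get(name)
        if entry is None or entry[1] <= now:
            self._entries.pop(name, None)  # forget stale registrations
            return None
        return entry[0]
```

A process that stops refreshing simply disappears from the directory after its time-to-live, so clients never need an explicit "unregister" message from a crashed process.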

Performance Monitoring

Actual monitoring is performed by a set of sensors

Accuracy vs. intrusiveness

A sensor’s life:

  {
    Register with the NS;
    Query the NS for parameters;
    Generate conditional test;
    Forever {
      if conditions are met then {
        perform test;
        time-stamp results and send them to the PS;
        refresh registration with the NS;
      }
    }
  }
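The sensor life cycle above might look roughly like this in Python. It is a sketch under stated assumptions: `run_sensor`, the `name_server` and `store` objects and their methods, and the `should_test` callback are hypothetical names, and the loop is bounded by `iterations` (instead of running forever) so the example terminates.

```python
import time


def run_sensor(name, name_server, store, probe, should_test,
               iterations, clock=time.time):
    """Run a bounded version of the sensor loop.

    name_server must offer register(name) and parameters(name);
    store must offer store(name, timestamp, value). Both interfaces
    are assumptions made for this sketch.
    """
    name_server.register(name)              # register with the NS
    params = name_server.parameters(name)   # query the NS for parameters
    for _ in range(iterations):             # "forever" in the real sensor
        if should_test(params):             # the conditional test
            value = probe()                 # perform the test
            store.store(name, clock(), value)  # time-stamp and send to the PS
            name_server.register(name)      # refresh registration
```

Passing the probe and the condition in as functions keeps the loop identical for CPU and network sensors; only the measurement itself changes.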

CPU Sensor

Measures available CPU fraction
Testing tools:
  Unix uptime: reports load average in the past x minutes
  Unix vmstat: reports idle-, user- and system-time
  Active probes
Accuracy:
  Results assume a full priority job
  Doesn’t know the priority of jobs in the queue
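As a rough illustration of turning a load average into an available CPU fraction, the sketch below assumes that a new full-priority job would share the processor equally with the jobs counted in the load average, giving 1 / (load + 1). That formula is an assumption of this example and the slides do not state it; the function names are hypothetical. `os.getloadavg()` is the standard-library call that returns the same 1-, 5- and 15-minute averages that `uptime` prints on Unix.

```python
import os


def available_cpu_from_load(load_average):
    """Estimate the CPU fraction a new full-priority job would receive.

    Assumes equal sharing with `load_average` runnable jobs.
    """
    if load_average < 0:
        raise ValueError("load average cannot be negative")
    return 1.0 / (load_average + 1.0)


def current_available_cpu():
    """Read the 1-minute load average from the OS (Unix only)."""
    one_minute, _, _ = os.getloadavg()
    return available_cpu_from_load(one_minute)
```

Because the estimate ignores job priorities, a machine full of low-priority background jobs would look busier than it really is to a full-priority job, which is the accuracy limitation noted above.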

Active Probing Improvements

Measurements produced using uptime

Measurements produced using vmstat

Network Sensor

Carries out network-related measurements
Testing: using active network probes
  Establish and release TCP connections
  Move large (small) data to measure bandwidth (delay)
  Measures connections with all peer sensors
Problems
  Accuracy: depends on socket interface
  Complexity: N^2 - N tests, collisions (contention)
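The N^2 - N figure comes from probing every ordered pair of distinct sensors, since the path from A to B is measured separately from B to A. A small sketch, with hypothetical host names:

```python
from itertools import permutations


def pairwise_tests(sensors):
    """List every (source, destination) probe among the given sensors."""
    return list(permutations(sensors, 2))


hosts = ["a", "b", "c", "d"]  # hypothetical sensor names
tests = pairwise_tests(hosts)
print(len(tests))  # 12, which is 4*4 - 4
```

The count grows quadratically, so doubling the number of sensors roughly quadruples the probe traffic if every sensor tests every other one.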

Network Sensor Control

Sensors are organized into sensor sets called cliques

Each clique is configurable and has one leader
Clique sets are logical, but can be based on physical topology
Leaders are elected using a distributed election protocol
A sensor can participate in many cliques
Advantages
  Scalability by organizing cliques in a hierarchy
  Reduce the N^2 - N tests
  Accuracy by more frequent tests

Clique Hierarchy

[Figure: clique hierarchy. Labels: National, UTenn, SDSC, PCL, UCSD]

Contention

Each leader maintains a clique token (and time between tokens)

The sensor that has the token performs all its tests then passes the token to the next sensor in the list

Adaptive time-out discovery
  Tokens have a time-out field
  Tokens have sequence numbers
  The leader adaptively controls the time-out
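A minimal sketch of a leader that issues sequence-numbered tokens and adapts the time-out to observed circulation times. The adaptation rule used here (mean of recent circulation times plus a multiple of their standard deviation) is an assumption chosen for illustration, as are the class name `CliqueLeader`, its methods, and the token layout; the slides do not specify how the leader computes the time-out.

```python
from collections import deque
from statistics import mean, pstdev


class CliqueLeader:
    """Hypothetical clique leader that issues tokens and tunes the time-out."""

    def __init__(self, members, initial_timeout, history=10, slack=2.0):
        self.members = list(members)
        self.sequence = 0
        self.timeout = initial_timeout
        self.slack = slack
        self._circulation_times = deque(maxlen=history)

    def new_token(self):
        """Issue a token carrying a sequence number and the current time-out."""
        self.sequence += 1
        return {"sequence": self.sequence,
                "timeout": self.timeout,
                "order": list(self.members)}

    def token_returned(self, token, elapsed):
        """Record a completed circulation; ignore stale tokens."""
        if token["sequence"] != self.sequence:
            return False  # an older token that was already given up on
        self._circulation_times.append(elapsed)
        times = list(self._circulation_times)
        self.timeout = mean(times) + self.slack * pstdev(times)
        return True
```

The sequence number lets the leader discard a late token that reappears after a replacement has been issued, so two tokens never drive tests at the same time.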

Forecaster Process

A forecasting driver and a set of compile-time prediction modules

Forecasting process:
  Fetching required measurements from the PS
  Passing the time series to each prediction module
  Choosing the best returned prediction

Incorporate sophisticated prediction techniques?
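The driver-plus-modules idea can be sketched as follows: each prediction module is replayed over the measurement history, and the one with the lowest accumulated absolute error so far supplies the next forecast. The three predictors shown (last value, running mean, sliding-window median) and the error measure are assumptions picked for illustration; the slides do not list the actual modules.

```python
from statistics import mean, median


def last_value(history):
    return history[-1]


def running_mean(history):
    return mean(history)


def window_median(history, window=5):
    return median(history[-window:])


# Hypothetical set of prediction modules fixed when the program is written.
PREDICTORS = {
    "last_value": last_value,
    "running_mean": running_mean,
    "window_median": window_median,
}


def forecast(series):
    """Return (name of best predictor, its forecast for the next value)."""
    if len(series) < 2:
        raise ValueError("need at least two measurements")
    errors = {name: 0.0 for name in PREDICTORS}
    # Replay history: predict each point from the points before it.
    for i in range(1, len(series)):
        history = series[:i]
        for name, predict in PREDICTORS.items():
            errors[name] += abs(predict(history) - series[i])
    best = min(errors, key=errors.get)
    return best, PREDICTORS[best](series)
```

For a series such as `[10, 10, 10, 50, 10, 10, 10]`, the median predictor wins because it ignores the single spike, while on a steadily rising series the last-value predictor tracks the trend best.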

Sample Forecaster Results

UC Santa Barbara – Kansas State U.: Recorded Bandwidth

UC Santa Barbara – Kansas State U.: Forecasted Bandwidth

NWS Interface

C API
  Quick short-term forecasts for applications
  InitForecaster()
  RequestForecasts()
CGI interface
  Continuous access to NWS forecasts through the web
  Interactively produces graphs for performance and forecasts
  http://nws.cs.utk.edu

Sample CGI-Generated Graph

Conclusion and Future Work

NWS is scalable, stable and always available
NWS relies on adaptivity to achieve its design goals
NWS is open (adding sensors and forecasting models)
Current forecasting compares well with more powerful, sophisticated forecasting techniques
Enhancements
  Basing the NS on LDAP
  Automatic clique configuration
  Forecasting methodologies
