Page 1: Berkeley-Helsinki Summer Course Lecture #7: Network Measurement and Monitoring

Berkeley-Helsinki Summer Course

Lecture #7: Network Measurement and Monitoring

Randy H. Katz

Computer Science Division

Electrical Engineering and Computer Science Department

University of California

Berkeley, CA 94720-1776

Outline

• Web Traffic Measurement
• Multi-layer Tracing and Analysis
• Network Distance Mapping
• SLA Verification
• Service Management

Measuring/Characterizing Web Traffic

• Motivation for Measurement
  – Insights into Web site design
  – Managing proxies and servers
  – Operating IP networks

• Measurement Process
  – Monitoring from some network location
  – Generate measurement records in some format
  – Preprocessing for subsequent analysis

• Based on Chapter 9, “Web Traffic Measurement,” in Web Protocols and Practice, Krishnamurthy and Rexford, Addison Wesley, Reading, MA, 2001.

Web Measurement

• Content Creators
  – Measurements of user browsing patterns
    » Number of visitors and site stickiness influence advertising revenue
    » Optimize for common user sequences
    » User-perceived latency influences server placement decisions

• Web Hosting Company
  – Number of response messages/bytes served influences load balancing strategy among multiple hosted sites
    » Mix of busy-day sites/busy-night sites
    » Managing persistent connections
  – Resource usage influences billing
    » When to introduce more servers, better connectivity

Web Measurement

• Network Operators
  – Resource decisions: where to add bandwidth, when to upgrade links, where to place proxies and caches, how to modify routing within the provider cloud, etc.
  – User community: relative mix of clients with low- vs. high-bandwidth connectivity

• Web/Networking Researchers
  – Evaluating performance of protocols and software
  – Drive evolution of protocols, policies, algorithms
  – Better understanding of Internet traffic dynamics

Measurement Techniques

• Server Logging
  – Log entry per HTTP request
  – Requesting client
    » Could be a user, a proxy, or a cache; the latter two represent aggregated patterns
    » Identified by an IP address
      • Could represent the workload of multiple users
      • Dynamically assigned addresses are not correlated with the same user each time they are encountered
  – Request time
  – Request/response messages
  – Coarse-grained, aggregated times
  – NOTE: requests satisfied by proxies/caches are filtered out before reaching the server
  – Hard to obtain!
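To make "log entry per HTTP request" concrete, here is a minimal parsing sketch for one line in the Common Log Format (CLF) that many Web servers can emit. The field layout (host, identity, user, timestamp, request line, status, bytes) is standard CLF; the function name `parse_clf` and the sample line are illustrative only.

```python
import re
from datetime import datetime

# Common Log Format: host ident authuser [timestamp] "request line" status bytes
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
)


def parse_clf(line):
    """Parse one Common Log Format line into a dict; return None if it is malformed."""
    m = CLF_PATTERN.match(line)
    if m is None:
        return None
    rec = m.groupdict()
    # One-second resolution: the coarse-grained request time noted above.
    rec["time"] = datetime.strptime(rec["time"], "%d/%b/%Y:%H:%M:%S %z")
    parts = rec["request"].split()
    rec["method"], rec["url"] = (parts[0], parts[1]) if len(parts) >= 2 else (None, None)
    rec["status"] = int(rec["status"])
    # "-" means no response body was sent (or its size was not logged).
    rec["bytes"] = 0 if rec["bytes"] == "-" else int(rec["bytes"])
    return rec


line = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'
rec = parse_clf(line)
print(rec["host"], rec["method"], rec["url"], rec["status"], rec["bytes"])
```

The one-second timestamp and the single client IP address per record are exactly the coarse timing and ambiguous client identity limitations listed above.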

Measurement Techniques

• Proxy Logging
  – Proxies can be associated with clients or servers, e.g., a proxy for UC Berkeley vs. a proxy for Google
  – The former provides insights into client behavior aggregated by administrative domain; more detailed information about individual clients may be available
  – Degree of aggregation depends on how close the proxy is to clients (close implies a small community, far implies a large community)
  – Limited scope; accesses are filtered by browser caches
  – Hard to obtain!

Measurement Techniques

• Packet Monitoring
  – Network-level logging (HTTP, IP, TCP)
  – Fine-grained timestamping possible
  – Some requests are satisfied from client caches, and encrypted packets can present collection difficulties
  – Monitor needs to be placed so that it can eavesdrop on packets

Measurement Techniques

• Active Measurement
  – Generate requests in a controlled manner, observe their performance
  – Issues:
    » Where to locate the modified user agents: geographical placement, quality of connectivity to the wide-area network
    » What requests to generate: e.g., based on a profile of popular Web sites
    » What measurements to collect: DNS queries, TCP timeouts, and proxy interception make it difficult to distinguish sources of latency

Inferences from Measurement Data

• Limitation of HTTP Header Information
  – Incomplete header logging
  – Heuristics needed to reconstruct behavior from logs

• Ambiguous Client/Server Identity
  – Client identity/unique IP address
  – Many IP addresses associated with the same server

• Inferring User Actions
  – Difficult to correlate user-level actions like mouse clicks with observed network activity
  – One click can trigger many HTTP requests

• Detecting Resource Modifications
  – Web-level actions typically miss modifications
  – Incomplete use of Last-Modified and Date fields by servers
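One way to approach the modification-detection problem is the heuristic sketched below (an assumption, not a method from the lecture): compare each response with the previous one for the same URL, trust Last-Modified when both responses carry it, and otherwise fall back to a change in response size. Aborted or partial transfers also change the size, so the fallback over-reports on its own.

```python
def detect_modifications(responses):
    """Return indices of responses that look like a modified resource.

    responses: time-ordered (url, last_modified, size) tuples; last_modified is
    None when the server omitted the Last-Modified header.
    """
    previous = {}  # url -> (last_modified, size) of the previous response
    changed_at = []
    for i, (url, last_modified, size) in enumerate(responses):
        prev = previous.get(url)
        if prev is not None:
            prev_lm, prev_size = prev
            if last_modified is not None and prev_lm is not None:
                changed = last_modified != prev_lm  # trust the header when both have it
            else:
                changed = size != prev_size         # fallback: misses same-size edits
            if changed:
                changed_at.append(i)
        previous[url] = (last_modified, size)
    return changed_at


log = [
    ("/index.html", "Mon, 01 Jan 2001 00:00:00 GMT", 4200),
    ("/index.html", "Mon, 01 Jan 2001 00:00:00 GMT", 4200),
    ("/index.html", "Tue, 02 Jan 2001 00:00:00 GMT", 4300),  # new Last-Modified
    ("/news.cgi", None, 900),
    ("/news.cgi", None, 950),                                # no header, size changed
]
print(detect_modifications(log))  # [2, 4]
```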

Web Workload Characterization

• Applications of Workload Models
  – Identifying performance problems
    » High latency/low throughput under specific load scenarios
  – Benchmarking Web components
    » Selecting among competing architectures
  – Capacity planning
    » “Right-sizing” network bandwidth, CPU, disk, memory given expected loads

• Workload Parameters
  – Protocols: request method, response code
  – Resources: content type, resource size, response size, popularity, modification frequency, temporal locality, number of embedded resources
  – Users: session interarrival times, number of clicks per session, request interarrival times

Workload Characteristics

• HTTP Requests/Responses
  – GET method predominates, with a small number of POSTs (forms); most responses are OK (200)
  – More intelligent protocols for communicating with caches may change the distribution of requests (e.g., HEAD)

• Web Resources
  – Text and images dominate, with increasing audio/video content
  – Small resources dominate: average HTML file size is 4-8 KB and average image file size 14 KB; the wide variation around the mean is consistent with a heavy-tailed (Pareto) distribution
  – Higher-bandwidth connections imply larger Web objects over time

• Response Sizes
  – Users are likely to abort large transfers, so the median response size is smaller than the median resource size; very heavy tail
  – Effect of higher-bandwidth connections on response size?

Workload Characteristics

• Resource Popularity
  – Zipf’s Law: a small number of objects are highly popular
  – Effectiveness of caching at all levels (client browser cache, site proxy cache, even DNS name cache)

• Resource Changes
  – Static content vs. script-based descriptions
  – Periodic changes (“young die young”)

• Temporal Locality
  – Correlated access to resources in time

• Embedded Resources
  – Web pages have a median of 8-20 embedded resources, heavy-tailed distribution
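A quick calculation shows why Zipf-like popularity makes caching effective. The sketch assumes the i-th most popular object is requested with probability proportional to 1/i^α with α = 1, and that requests are independent; under those assumptions the share of requests for the top k objects is also the hit ratio of an idealized cache holding exactly those k objects. The object count and k are illustrative.

```python
def zipf_top_k_share(n_objects, k, alpha=1.0):
    """Share of requests for the k most popular of n_objects objects when the
    i-th most popular object is requested with probability proportional to 1 / i**alpha."""
    weights = [1.0 / i**alpha for i in range(1, n_objects + 1)]
    return sum(weights[:k]) / sum(weights)


# 10,000 objects: the top 1% (100 objects) draw roughly half of all requests.
share = zipf_top_k_share(10_000, 100)
print(f"{share:.2f}")  # 0.53
```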

Workload Characteristics

• User Behavior
  – Session and request arrivals
    » Infer sessions via repeated access to the same server
    » Bursts of HTTP requests separated by think time
  – Clicks per session
    » 4-10 clicks on average; distinguish between “sticky” sites and directory/redirection sites
    » Heavy users vs. light users
  – Request interarrival times
    » Activity punctuated by think times
    » Request interarrivals on the order of 60 seconds
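Session inference from repeated access can be sketched as idle-timeout grouping: a session is a run of requests from one client to one server with no gap longer than a threshold. The 30-minute default below is an assumed convention, not a value from the lecture, and `sessionize` is a hypothetical helper.

```python
from collections import defaultdict


def sessionize(requests, idle_timeout=1800.0):
    """Group (client, server, timestamp_seconds) requests into sessions.

    A session is a run of requests from one client to one server in which no
    gap exceeds idle_timeout seconds. Returns ((client, server), [timestamps]) pairs.
    """
    by_pair = defaultdict(list)
    for client, server, ts in requests:
        by_pair[(client, server)].append(ts)

    sessions = []
    for pair, times in by_pair.items():
        times.sort()
        current = [times[0]]
        for ts in times[1:]:
            if ts - current[-1] > idle_timeout:  # long think time: close the session
                sessions.append((pair, current))
                current = []
            current.append(ts)
        sessions.append((pair, current))
    return sessions


reqs = [("c1", "s", 0), ("c1", "s", 40), ("c1", "s", 95),  # one burst
        ("c1", "s", 5000),                                  # new session after a long gap
        ("c2", "s", 10)]
sessions = sessionize(reqs)
print(len(sessions), [len(times) for _, times in sessions])  # 3 [3, 1, 1]
```

These sessions count requests, not clicks; since one click can trigger many requests, estimating clicks per session would additionally require merging requests that arrive within a few seconds of each other.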

Research Perspectives on Measurement

• Packet monitoring of HTTP traffic
• Analyzing Web server logs
• Publicly available logs and traces
• Measuring multimedia streams

Packet Monitoring of HTTP Traffic

• Tapping a link carrying IP packets
• Capturing packets from HTTP transfers
• Demux packets into TCP connections
• Reconstructing ordered stream of bytes
• Extracting HTTP messages from byte stream
• Generating a log of HTTP messages
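The demultiplexing and reassembly steps above can be sketched as follows. Packets are assumed to be already captured and decoded into (source address, source port, destination address, destination port, sequence number, payload) tuples for the client-to-server direction. Real tools must also handle SYN/FIN, sequence-number wraparound, overlapping segments, and request bodies, all of which this sketch ignores.

```python
from collections import defaultdict


def reassemble(packets):
    """Demultiplex packets into TCP connections and rebuild each ordered byte stream.

    packets: (src, sport, dst, dport, seq, payload) tuples for one direction.
    Retransmitted segments (same sequence number) are kept only once.
    """
    segments = defaultdict(dict)  # connection 4-tuple -> {seq: payload}
    for src, sport, dst, dport, seq, payload in packets:
        segments[(src, sport, dst, dport)].setdefault(seq, payload)
    return {conn: b"".join(segs[s] for s in sorted(segs)) for conn, segs in segments.items()}


def request_lines(stream):
    """Extract HTTP request lines from a client-to-server stream of body-less requests."""
    found = []
    for head in stream.split(b"\r\n\r\n"):  # a blank line ends each request head
        first = head.split(b"\r\n", 1)[0]
        if first.split(b" ", 1)[0] in (b"GET", b"HEAD"):
            found.append(first.decode("ascii"))
    return found


pkts = [
    ("10.0.0.1", 1234, "10.0.0.2", 80, 1019, b"GET /b HTTP/1.1\r\nHost: x\r\n\r\n"),
    ("10.0.0.1", 1234, "10.0.0.2", 80, 1000, b"GET /a HTTP/1.1\r\n\r\n"),  # out of order
    ("10.0.0.1", 1234, "10.0.0.2", 80, 1000, b"GET /a HTTP/1.1\r\n\r\n"),  # retransmission
]
streams = reassemble(pkts)
print(request_lines(streams[("10.0.0.1", 1234, "10.0.0.2", 80)]))
# ['GET /a HTTP/1.1', 'GET /b HTTP/1.1']
```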

Analyzing Web Server Logs

• Parsing and Filtering
  – Logs in multiple formats
  – Interleaved log records
  – Timestamp diversity

• Transforming
  – Remove erroneous records
  – Diverse formats for URLs; conversion to unique integers for easier processing
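Converting diverse URL formats to unique integers can be sketched as normalize-then-intern. The normalization rules below (lowercase scheme and host, drop a default `:80` port and any fragment, attribute relative paths to one assumed host) are a minimal hypothetical rule set, not a standard; real log pipelines typically need more.

```python
from urllib.parse import urlsplit


def normalize_url(url, default_host="www.example.com"):
    """Canonical form of a logged URL (a minimal, hypothetical rule set).

    Relative paths (as in server logs) are attributed to default_host; scheme and
    host are lowercased, a default ":80" port and any "#fragment" are dropped, and
    an empty path becomes "/".
    """
    parts = urlsplit(url)
    scheme = (parts.scheme or "http").lower()
    host = (parts.netloc or default_host).lower()
    if host.endswith(":80"):
        host = host[:-3]
    query = "?" + parts.query if parts.query else ""
    return f"{scheme}://{host}{parts.path or '/'}{query}"


class UrlTable:
    """Assign dense integer IDs (0, 1, 2, ...) to normalized URLs in first-seen order."""

    def __init__(self):
        self.ids = {}

    def id_for(self, url):
        return self.ids.setdefault(normalize_url(url), len(self.ids))


table = UrlTable()
ids = [table.id_for(u) for u in [
    "HTTP://WWW.Example.com:80/a.html#top",
    "/a.html",
    "http://www.example.com",
    "/b.html?x=1",
]]
print(ids)  # [0, 0, 1, 2]
```

Dense IDs let later analysis steps use arrays indexed by resource instead of string-keyed tables.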

Publicly Available Logs and Traces

• Internet Traffic Archive
  – http://www.acm.org/sigcomm/ita
• World Wide Web Consortium’s Web Characterization Group Repository
  – http://www.purl.org/net/repository
• NLANR
  – http://ircache.nlanr.net/Cache/
• CAnet Squid logs
  – http://ardnoc41.canet2.net/cache/

Measuring Multimedia Streams

• Static analysis of multimedia resources
  – Locating video content at various Web sites
  – Acquiring copies
  – Computing statistics

• Multimedia server logs
  – VCR-like operations
  – User access patterns, frequency of early abort

• Packet monitoring of multimedia streams
  – Infer session identity from src/dst IP address, port #, protocol

• Multilayer packet monitoring
  – Correlation of control and data streams

Probability Distributions in Web Workload Models

• Exponential: session interarrival times
• Pareto:
  – Response sizes (tail of distribution)
  – Resource sizes (tail of distribution)
  – Number of embedded images
  – Request interarrival times
• Lognormal:
  – Response sizes (body of distribution)
  – Resource sizes (body of distribution)
  – Temporal locality
• Zipf-like: resource popularity
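A synthetic workload generator can draw from each of these families with Python's standard `random` module. The sketch below shows one sampler per family; all parameter values (mean session gap, lognormal μ and σ, Pareto cutoff and shape, Zipf exponent) are illustrative assumptions rather than fitted values, and splicing the lognormal body onto the Pareto tail is left out.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the synthetic workload is reproducible


def session_interarrival(mean_s=60.0):
    """Exponential: gap (seconds) until the next session starts."""
    return rng.expovariate(1.0 / mean_s)


def size_body(mu=8.0, sigma=1.5):
    """Lognormal body of the resource/response size distribution (bytes)."""
    return rng.lognormvariate(mu, sigma)


def size_tail(x_min=10_000, alpha=1.2):
    """Pareto tail: paretovariate() returns values >= 1, so scale by the cutoff x_min."""
    return x_min * rng.paretovariate(alpha)


def make_resource_picker(n_resources, alpha=1.0):
    """Zipf-like popularity: resource i (1 = most popular) has weight 1 / i**alpha."""
    ranks = list(range(1, n_resources + 1))
    weights = [1.0 / r**alpha for r in ranks]
    return lambda: rng.choices(ranks, weights=weights)[0]


pick = make_resource_picker(1_000)
tails = [size_tail() for _ in range(10_000)]
# Heavy tail: a few huge values pull the mean far above the median.
print(round(statistics.median(tails)), round(statistics.mean(tails)))
```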

Outline

• Web Traffic Measurement
• Multi-layer Tracing and Analysis
• Network Distance Mapping
• SLA Verification
• Service Management

Wireless Link Management

• Modeling GSM data network layers
  – Media access, link, routing, and transport
  – Validated ns modeling suite and BONES simulator
  – GSM channel error models from Ericsson

• Reliable Link Protocols
  – Wireless links have high error rates (> 1%)
  – Reliable transport protocols (TCP) interpret errors as congestion
    » Need tools to determine multi-layer interaction effects
    » Large amounts of data: 120 bytes/s
    » Important for design of next-generation networks
  – One solution: use a reliable link layer (ARQ) protocol
    » However, retransmissions introduce jitter
  – Alternative: use error-resilient algorithms to allow apps to handle corrupted data (only protect network protocol headers)
    » Less end-to-end delay, constant jitter, higher throughput
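A back-of-the-envelope model shows why link-layer retransmission (ARQ) trades loss for jitter. Assume, as a simplification not taken from the lecture, that each frame with transmission time T_f is corrupted independently with probability p and resent until it gets through. The number of transmissions N is then geometric, and the per-frame link delay D inherits its variance:

```latex
\begin{aligned}
P(N = k) &= p^{k-1}(1-p), \qquad k = 1, 2, \dots \\
E[N] &= \frac{1}{1-p}, \qquad \operatorname{Var}[N] = \frac{p}{(1-p)^2} \\
D = N\,T_f \;&\Rightarrow\; \sigma_D = T_f\,\frac{\sqrt{p}}{1-p}
\end{aligned}
```

At the 1% error rate above, p = 0.01 gives E[N] ≈ 1.01 but a delay standard deviation of about 0.1 T_f. Real GSM errors arrive in bursts, so the independence assumption understates both retransmission counts and jitter.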

Testbed, Protocols, Tools

[Figure: a fixed host and a mobile host (both Unix BSDi 3.0) connected through the PSTN, the GSM network, and a GSM BTS, using transparent / non-transparent GSM data service. Protocol stack on one side: H.263+ encoder, RTP packetization, socket interface, UDP / UDP Lite, IP, PPP; on the other: PPP, IP, UDP / UDP Lite, socket interface, RTP de-packetization, H.263+ decoder. SocketDUMP and RLPDUMP traces from both hosts feed MultiTracer, with plotting & analysis in MATLAB.]

MultiTracer Time-Sequence Plots

[Figure: time-sequence plot of bytes vs. time of day (sec) showing TcpSnd_data, TcpRcv_data, TcpSnd_ack, TcpRcv_ack, and RlpSnd_rst events. Annotations: 18 segments; 13 segments dropped at TCP receiver; 5 segments lost due to RLP reset.]

Outline

• Web Traffic Measurement
• Multi-layer Tracing and Analysis
• Network Distance Mapping
• SLA Verification
• Service Management

Applications of Network Distance Mapping

• Mirror Selection
• Cache-infrastructure Configuration
• Service Redirection
• Service Placement
• Overlay Routing/Location

Distance Mapping Framework

• Feasible distance metrics
  – Number of hops
  – Latency
  – Bandwidth
• Continuous measurement
• Provide approximate distance information
• Continue to operate in the presence of component changes/failures
• Scale the measurement by self-adaptation

Goal: Develop a scalable, robust infrastructure for collecting and sharing distance information

Distance Mapping Challenges

• Select how many probes/monitors to deploy
• Monitor placement
• Choose appropriate monitor for a given client
• Statistically quantify estimation error: e.g., x% of the estimates within a factor of the actual distances
• How stable are these clusterings?
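The "x% of the estimates within a factor of actual distances" criterion is straightforward to compute once estimates are paired with measured distances. The factor of 2 and the RTT values below are illustrative assumptions.

```python
def fraction_within_factor(estimates, actuals, factor=2.0):
    """Fraction of distance estimates within a multiplicative factor of the truth,
    i.e. actual / factor <= estimate <= actual * factor."""
    if len(estimates) != len(actuals) or not actuals:
        raise ValueError("need equal-length, non-empty lists")
    hits = sum(1 for e, a in zip(estimates, actuals) if a / factor <= e <= a * factor)
    return hits / len(actuals)


est = [12.0, 40.0, 95.0, 30.0]   # estimated RTTs (ms)
act = [10.0, 100.0, 90.0, 70.0]  # measured RTTs (ms)
print(fraction_within_factor(est, act))  # 0.5
```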

IDMaps Project

• Internet-wide infrastructure to collect distance information

• IDMaps provides:
  – Long-term approximate distances
  – Distance estimation between any two points on the Internet

• IDMaps does not provide:
  – End-to-end application-level performance
  – Available bandwidth or current delay
  – Characteristics of any specific path

IDMaps Components

• Tracers: autonomous instrumentation boxes

• Tracers measure distances among themselves and to APs

• APs (Address Prefixes): regions of the Internet; hosts within an AP are equidistant from the rest of the Internet

Courtesy of IDMaps group

[Figure: tracers placed among APs, with hosts in an AP near a tracer. Measurement cost: T*T + AP, where T = number of tracers and AP = number of APs]

IDMaps Architecture

Courtesy of IDMaps group

IDMaps Results and Limitations

• Simulation results on synthetic and static network topology

– Cyan: random selection

– Others: various heuristics & algorithms

[Figure: complementary distribution function vs. percentage of correct answers]

Courtesy of IDMaps group

IDMaps Limitations

• Based on the triangle inequality

• Considers only number of hops

• Ignores the dynamics of the Internet; no stability study

[Figure: clients A and B, monitors C and D]

AB = AC + CD + DB ?
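Reading the figure as clients A and B with nearby monitors (tracers) C and D, the estimate can be sketched as below. The data layout (host-to-AP map, nearest tracer per AP, tracer-pair distances) and all names and values are hypothetical. If distances obeyed the triangle inequality, the sum would be an upper bound on AB; Internet routing policies can violate it, which is the limitation listed above.

```python
def idmaps_estimate(a, b, host_to_ap, ap_to_tracer, tracer_dist):
    """Estimate the distance between hosts a and b from IDMaps-style measurements.

    host_to_ap:   host -> address prefix (AP) containing it
    ap_to_tracer: AP -> (nearest tracer, measured AP-to-tracer distance)
    tracer_dist:  frozenset({tracer1, tracer2}) -> measured tracer-to-tracer distance
    Returns d(A, C) + d(C, D) + d(D, B), with C and D the tracers nearest a and b.
    """
    tracer_c, d_ac = ap_to_tracer[host_to_ap[a]]
    tracer_d, d_db = ap_to_tracer[host_to_ap[b]]
    d_cd = 0.0 if tracer_c == tracer_d else tracer_dist[frozenset((tracer_c, tracer_d))]
    return d_ac + d_cd + d_db


host_to_ap = {"hostA": "192.0.2.0/24", "hostB": "198.51.100.0/24"}
ap_to_tracer = {"192.0.2.0/24": ("T1", 3.0), "198.51.100.0/24": ("T2", 5.0)}
tracer_dist = {frozenset(("T1", "T2")): 180.0}
print(idmaps_estimate("hostA", "hostB", host_to_ap, ap_to_tracer, tracer_dist))  # 188.0
```

Only the T*T tracer pairs and the AP-to-tracer distances are measured, yet any host pair can be estimated.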

Wide-area Network Measurement and Monitoring Services

• Layered Architecture
  – Bottom layer: a common core shared across multiple apps, with generic metrics
  – More application-specific at the top layer

• Modularity
  – Separation of functionality
  – Clear definition of interaction between different layers
  – Ease of customization and modification

Goal: Understand behavior of the Internet and provide adaptation to Internet apps through monitoring services

Layered Architecture

Measurement Layer

Measurement Collection, Transformation and Storage Layer

Federation for Sharing Layer

Dissemination Layer

Decision/Design Procedures

What to measure, what tools? Probe placement & density

Pull-/push-based APIs

Application side

Current Focus at Berkeley: Internet “Iso-bar”

• Regions of the network that perceive similar performance to the Internet, i.e., spatial correlation
  – How to find them without knowing the topology?

• Used to determine # and placement of monitors; high-dimensional feature space for iso-bar clustering
  – Each host collects distance values to m hosts as an m-dimensional feature vector
  – Use K-means for high-dimensional clustering
  – Choose the site closest to the cluster center as monitor
  – Initially m can be the total number of clients; later it may be the number of representative monitoring sites
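The clustering and monitor-selection steps can be sketched with plain Lloyd's k-means in pure Python. Each row is a host's feature vector of RTTs (ms) to m = 3 landmark hosts; the values, k = 2, the iteration count, and the helper names are illustrative assumptions, since the lecture does not specify initialization, distance metric, or how k is chosen.

```python
import math
import random


def kmeans(vectors, k, iters=50, seed=0):
    """Plain Lloyd's k-means on equal-length feature vectors; returns (labels, centers)."""
    rng = random.Random(seed)
    centers = [list(v) for v in rng.sample(vectors, k)]
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: each host joins its nearest center.
        labels = [min(range(k), key=lambda j: math.dist(v, centers[j])) for v in vectors]
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


def pick_monitors(vectors, labels, centers):
    """Cluster label -> index of the member host closest to that cluster's center."""
    monitors = {}
    for i, (v, lab) in enumerate(zip(vectors, labels)):
        best = monitors.get(lab)
        if best is None or math.dist(v, centers[lab]) < math.dist(vectors[best], centers[lab]):
            monitors[lab] = i
    return monitors


rtts = [
    [10, 80, 85], [12, 78, 90], [11, 82, 88],  # hosts near landmark 1
    [90, 15, 40], [88, 12, 45], [92, 14, 42],  # hosts near landmark 2
]
labels, centers = kmeans(rtts, k=2)
print(labels, sorted(pick_monitors(rtts, labels, centers).values()))
```

With well-separated groups like these, the six hosts split into the two expected clusters and hosts 2 and 5 are chosen as monitors.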

Iso-bar Experiments

• Remove triangle inequality assumption
• Stationarity: predictability of network properties, i.e., temporal correlation
  – Global stationarity: change in the total number of clusters
  – Local stationarity: expansion and shrinkage of each cluster

• Experiments with NLANR Active Measurement Project (AMP) data set
  – 119 sites in the US and New Zealand
  – Traceroute between every pair of hosts every minute
  – Use daily average round-trip time (RTT)
  – Color the clustered hosts and map them on a US map with longitude and latitude info (imprecise mapping)
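The lecture defines global and local stationarity informally. One way to quantify them, sketched below as an assumption rather than the project's method, is to compare two days' cluster labels: the change in the number of clusters for global stationarity, and the Rand index (the fraction of host pairs whose same-cluster/different-cluster relationship is unchanged) for local stationarity. The Rand index ignores cluster names, so relabeling between days does not count as change.

```python
from itertools import combinations


def global_change(day1, day2):
    """Change in the number of clusters between two days' labelings."""
    return len(set(day2)) - len(set(day1))


def rand_index(day1, day2):
    """Fraction of host pairs that are grouped together (or apart) on both days."""
    pairs = list(combinations(range(len(day1)), 2))
    agree = sum((day1[i] == day1[j]) == (day2[i] == day2[j]) for i, j in pairs)
    return agree / len(pairs)


day1 = [0, 0, 0, 1, 1, 1]
day2 = [1, 1, 0, 0, 0, 0]  # host 2 moved to the other cluster; labels also swapped
print(global_change(day1, day2), round(rand_index(day1, day2), 3))  # 0 0.667
```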

Geographic Distribution of NLANR AMP Monitoring Sites

Underlying Topology of NLANR Sites

Most of the NLANR sites use the Abilene network

Preliminary Clustering Results

Stationarity of Iso-bar

• Global stationarity quite good
• Local stationarity still under investigation
• Will apply more statistical learning methods, e.g., Gaussian mixture models, kernel methods for clustering and its dynamics
• Will evaluate its prediction with real measurement data

Inferring Internet Topology

Goal: Determine the hierarchy among autonomous systems (ASs) based on the types of relationships among them

• Assume two types of relationships
  – Provider-Customer
  – Peer-Peer

• Providers are above customers in the hierarchy; peers are mostly at the same level in the hierarchy

• Inferences
  – 5-level hierarchy in the Internet
  – Connectivity across levels is strictly non-hierarchical

Inferring Internet Topology

• CAIDA & Mercator
  – Traceroutes from different locations to get connectivity
  – Whois & BGP dumps to find IP address ownership

• Krishnamurthy et al.
  – BGP dumps to find IP address ownership
  – Use Web server logs to cluster IP addresses by behaviour

• GT-ITM
  – Generated topologies
  – Useful for testing on specific cases, but not the actual Internet

• Our work
  – BGP dumps to find AS connectivity
  – BGP dumps to find the number of paths carried by each link
  – BGP dumps to find AS preferences for links

Inferring Type of Relationship

Assumption: with high probability, ISPs do not forward BGP advertisements from their peers or providers to other peers or providers

• Implication: if the assumption is completely true, every AS path is “valley-free” (no traversal from peer/provider to customer and back to peer/provider)

• Features of the inference algorithm
  – Collected a large number of BGP dumps; partial views of the Internet from different sources
  – Assign every AS a rank based on every dump; apply dominance/clustering rules to find the type of relationships
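Given inferred relationships, checking whether a path is valley-free amounts to matching the pattern "zero or more customer-to-provider links, at most one peer link, then zero or more provider-to-customer links". The sketch below assumes a relationship table keyed by ordered AS pairs; the AS numbers and the `add_link` helper are hypothetical. Because the pattern reads the same in reverse, the path may be given in either direction.

```python
def add_link(rel, a, b, kind):
    """Record a link in both directions; kind describes a -> b ("c2p", "p2c", "p2p")."""
    rel[(a, b)] = kind
    rel[(b, a)] = {"c2p": "p2c", "p2c": "c2p", "p2p": "p2p"}[kind]


def is_valley_free(path, rel):
    """True if the AS path matches: uphill* (one peer link)? downhill*."""
    phase = "up"
    for a, b in zip(path, path[1:]):
        kind = rel[(a, b)]
        if kind == "c2p":        # customer -> provider: only allowed while climbing
            if phase != "up":
                return False
        elif kind == "p2p":      # at most one peer link, at the top of the path
            if phase != "up":
                return False
            phase = "down"
        elif kind == "p2c":      # provider -> customer: from here on only downhill
            phase = "down"
        else:
            raise ValueError(f"unknown relationship {kind!r}")
    return True


rel = {}
add_link(rel, 1, 2, "p2p")  # AS1 and AS2 are peers
add_link(rel, 3, 1, "c2p")  # AS3 is a customer of AS1
add_link(rel, 4, 2, "c2p")  # AS4 is a customer of AS2 ...
add_link(rel, 4, 3, "c2p")  # ... and of AS3 (multihomed)
add_link(rel, 5, 3, "c2p")  # AS5 is a customer of AS3
print(is_valley_free([5, 3, 1, 2, 4], rel), is_valley_free([3, 4, 2], rel))  # True False
```

The second path is a valley: AS4 would be providing transit between its two providers, which the assumption above says ISPs rarely do.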

Layers in the Internet

• Layer 0 (Strong Core)
  – Dense sub-graph (peering links) of the Internet topology consisting of only Tier-1 ISPs

• Layer 1 (Transit Core)
  – Consists of all top transit providers/large national ISPs

• Layer 2 (Outer Core)
  – Last layer where any two ASs have a peering relationship

• Layer 3 (Regional)
  – Collection of regional ISPs that support a small customer base

• Layer 4 (Customers)
  – Large collection (87%) of ASs that are only customers

Our Findings

• Inner core of 20 ASs is highly connected
  – 271 edges (full clique = 380)

• Full graph has 10,918 ASs
  – 24,598 edges out of 119,191,806 possible edges

• Distribution of paths carried by edges
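The possible-edge counts on this slide work out if edges are treated as directed, as the note under the layering table states: n ASs admit n(n - 1) directed edges. The short calculation below computes both densities from the slide's own numbers.

```python
def directed_density(n_nodes, n_edges):
    """Fraction of the n * (n - 1) possible directed edges that are present."""
    return n_edges / (n_nodes * (n_nodes - 1))


core = directed_density(20, 271)         # 271 of 20 * 19 = 380
full = directed_density(10_918, 24_598)  # 24,598 of 10,918 * 10,917 = 119,191,806
print(f"core {core:.1%}, full graph {full:.4%}")  # core 71.3%, full graph 0.0206%
```

The inner core is roughly 3,500 times denser than the graph as a whole.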

Our Graph of the Core

Quantifying the Layering

Layer          # of ASs   % of ASs   # Intra-Layer Edges   # Inter-Layer Edges
Strong Core          20        0.2                   329                  9600
Transit Core        162        1.5                  1052                  6000
Outer Core          674        6.3                  1070                  3600
Regional            950        9.2                   202                  2400
Customers          8852       83.0                     0                     0

Note: Edges directed from providers to customers; peer-peer links directed both ways

Outline

• Web Traffic Measurement
• Multi-layer Tracing and Analysis
• Network Distance Mapping
• SLA Verification
• Service Management

“Trust but Verify”

• Monitoring is integral to SLA verification
• Built on top of SNMP Architecture
  – SNMP Agents
  – SNMP Manager
  – SNMP Protocol (polling/trapping)
  – Objects and Management Information Bases (MIBs)

[Figure: a manager on a management station exchanges SNMP messages over the network with agents on managed nodes (e.g., a managed element with an Ethernet interface)]

Network Connectivity SLA Monitoring

• Need to monitor availability, traffic (bandwidth, latency) between access routers

• Standard SNMP MIBs
  – Current interface status (up/down)
  – Time since last status change
  – # bytes/packets received/transmitted
  – # packets discarded/received in error
  – Length of packet queue

• Not really sufficient for determining connectivity SLA!
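Byte counters like those listed above (IF-MIB's ifInOctets, together with the interface speed ifSpeed) only yield an average utilization per polling interval. The arithmetic below is a sketch of that calculation; the SNMP polling itself is out of scope, and the sample counter values are made up.

```python
COUNTER32_MODULUS = 2**32  # ifInOctets / ifOutOctets are 32-bit counters that wrap to 0


def counter_delta(prev, curr, modulus=COUNTER32_MODULUS):
    """Increase of a wrapping counter between two polls (assumes at most one wrap)."""
    return (curr - prev) % modulus


def utilization(prev_octets, curr_octets, interval_s, if_speed_bps):
    """Average link utilization over one polling interval."""
    bits = counter_delta(prev_octets, curr_octets) * 8
    return bits / (interval_s * if_speed_bps)


# Two ifInOctets polls 300 s apart on a 10 Mb/s interface; the counter wrapped in between.
u = utilization(4_294_000_000, 74_032_704, 300, 10_000_000)
print(f"{u:.0%}")  # 20%
```

This average says nothing about latency or short bursts inside the interval, which is one reason these MIBs fall short for connectivity SLAs. On fast links a 32-bit octet counter can also wrap more than once per poll, which is why IF-MIB adds 64-bit counters such as ifHCInOctets.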

Remote Monitoring of IP Network

• RMON Architecture
  – Manager (SNMP Manager), Probe Points (SNMP Agents)
  – Network is a collection of LAN segments; for each, collect:
    » Segment statistics (e.g., packet counts)
    » Host-specific statistics
    » Traffic matrix between hosts on the same segment
  – Lots of stats can be collected, but they are difficult to correlate across LAN segments
  – Best for finding bottleneck segments and driving capacity planning
  – Not helpful for delay or latency measurements

Monitoring Flows

• Flow: correlated subset of network traffic, e.g., with a common source and destination

• Cisco Proprietary NetFlow Architecture
  – Flow Collector
  – Router to collect the flow information
  – Traffic counts on virtual links

• IETF Real-Time Flow Monitoring
  – Standardized Flow MIB
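Flow monitoring boils down to aggregating packets by a key and expiring idle flows. The sketch below is NetFlow-style but is not Cisco's implementation: it keys flows by the 5-tuple and uses only an inactivity timeout (assumed to be 15 s), whereas real exporters also expire long-lived flows on an active timeout and on TCP FIN/RST.

```python
from dataclasses import dataclass


@dataclass
class FlowRecord:
    first_seen: float
    last_seen: float
    packets: int = 0
    octets: int = 0


def aggregate_flows(packets, inactive_timeout=15.0):
    """Aggregate time-ordered packets into per-5-tuple flow records.

    packets: (ts, src, dst, sport, dport, proto, size) tuples.
    A flow ends when its key has been idle for more than inactive_timeout seconds;
    flows still active at the end of the input are flushed as well.
    """
    active, finished = {}, []
    for ts, src, dst, sport, dport, proto, size in packets:
        key = (src, dst, sport, dport, proto)
        rec = active.get(key)
        if rec is not None and ts - rec.last_seen > inactive_timeout:
            finished.append((key, rec))  # idle too long: export and start a new record
            rec = None
        if rec is None:
            rec = active[key] = FlowRecord(first_seen=ts, last_seen=ts)
        rec.last_seen = ts
        rec.packets += 1
        rec.octets += size
    finished.extend(active.items())
    return finished


pkts = [
    (0.0, "10.0.0.1", "10.0.0.2", 1234, 80, 6, 60),
    (0.5, "10.0.0.1", "10.0.0.2", 1234, 80, 6, 1500),
    (1.0, "10.0.0.3", "10.0.0.2", 5353, 53, 17, 80),
    (30.0, "10.0.0.1", "10.0.0.2", 1234, 80, 6, 40),  # 29.5 s idle: starts a new flow
]
flows = aggregate_flows(pkts)
print([(key[4], rec.packets, rec.octets) for key, rec in flows])
# [(6, 2, 1560), (6, 1, 40), (17, 1, 80)]
```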

Network Monitoring with Active Probing

• Ping Program
  – Active probing via ICMP echo messages
  – Determines loss rates and delays

• Traceroute
  – Path and estimated delay that packets follow in the IP network
  – Sends multiple probe packets (UDP or ICMP, depending on the implementation) with increasing TTL, discovering routers from the ICMP Time Exceeded messages they return
  – Each hop's delay comes from separate probes answered by routers, so reported delays can vary widely

• NTP Sync Messages
  – Exchange of clock offset, round-trip delay, and dispersion information

• Various Statistical Probing Schemes
  – Delays and loss rates
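Turning ping-style probes into the loss and delay figures mentioned above is simple once round-trip times are collected. The sketch assumes the RTTs are already available (sending ICMP echo requests needs raw sockets and privileges, so it is left out); the jitter definition used here is one simple choice among several.

```python
import statistics


def summarize_probes(rtts_ms):
    """Summarize one ping-style probe run.

    rtts_ms: one entry per probe sent, None for a probe that got no reply.
    Jitter here is the mean absolute difference between consecutive successful RTTs.
    """
    replies = [r for r in rtts_ms if r is not None]
    summary = {"loss": (len(rtts_ms) - len(replies)) / len(rtts_ms)}
    if replies:
        summary["min"] = min(replies)
        summary["median"] = statistics.median(replies)
        diffs = [abs(b - a) for a, b in zip(replies, replies[1:])]
        summary["jitter"] = statistics.mean(diffs) if diffs else 0.0
    return summary


s = summarize_probes([20.1, 22.4, None, 19.8, 30.5, None, 21.0, 20.2, 23.9, 20.6])
print(s)
```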

SLA Monitoring Issues

• Client- versus operator-side monitoring and reporting
• Monitoring in a multi-class network
• Transport- and application-level monitoring
• Monitoring in an overlay network
• Monitoring in a multi-service-provider environment (finding “the weakest link”)
• Accuracy in monitoring
  – Number of measurements, frequency of measurements, stability of results, confidence intervals
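The accuracy bullet can be made concrete with a confidence interval: an SLA check is only decisive when the whole interval for the measured mean lies on one side of the threshold. The sketch uses a normal approximation, and the delay samples and thresholds are hypothetical.

```python
import math
import statistics


def mean_ci(samples, z=1.96):
    """Sample mean with a normal-approximation confidence interval (z = 1.96 for ~95%).

    Assumes many independent samples; small or correlated samples need a
    t-distribution or resampling instead.
    """
    m = statistics.mean(samples)
    half = z * statistics.stdev(samples) / math.sqrt(len(samples))
    return m, (m - half, m + half)


def sla_verdict(samples, threshold):
    """Compare an upper-bound SLA threshold with the whole confidence interval."""
    _, (low, high) = mean_ci(samples)
    if high < threshold:
        return "met"
    if low > threshold:
        return "violated"
    return "inconclusive"  # more (or more frequent) measurements are needed


delays_ms = [48, 52, 50, 49, 51, 50, 47, 53]
print(mean_ci(delays_ms))
print([sla_verdict(delays_ms, t) for t in (60, 51, 45)])  # ['met', 'inconclusive', 'violated']
```

The interval narrows with the square root of the number of measurements, which is how measurement count and frequency feed into accuracy.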

Measurement Points for Verifying SLAs

• Distinguish between measuring within the service provider cloud and end-to-end between customer nodes

Outline

• Web Traffic Measurement
• Multi-layer Tracing and Analysis
• Network Distance Mapping
• SLA Verification
• Service Management

From Network Management to Service Management

• Server and site availability
• Balanced server and site load
• Rapid change
• Network and application flexibility
• Scalability
• Complex site administration

Server Load Balancing

Advanced Traffic Management

• Rapid problem diagnosis/isolation
• Service level measurement
• Multi-tier resource monitoring

• Preferential services
• Resource provisioning
• Self-tuning
• Problem prevention

Service Level Control

Morino, Resonate

Service Reliability is Critical

• Network failure: 18.2%
  – ISP connection down
  – LAN segment overloaded
• Systems
  – Server failure: 20%
  – OS failure: 24.6%
  – e.g., CPU overloaded, NIC failure
• Administration: 8.7%
• Applications failure: 28.5%
  – Process hung
  – Slowed database performance

Source: IDC; Morino, Resonate

Traditional Traffic Management

• Single-tier, single-site service level control
  – Higher service levels
  – Better resource utilization
  – Multiple features to meet unique needs

[Figure: user → Internet → traffic management → content servers]

Morino, Resonate

Basic LAN Solution Requirements

• Simple load balancing
  – Establish Virtual IP address (VIP)
  – Delivers scalable performance

• Health checks and service monitoring
  – Look beyond layer 3/4 characteristics
  – Returned content, response times, etc.
  – Better information to determine server status
  – Use traffic management techniques to insulate users from the effects of server or software failure

Morino, Resonate

Advanced LAN Solution Requirements

• Complex traffic management
  – More intelligent policies for application state management
  – Enforce sophisticated user-based policies
  – Inspection of application header
    » URL parsing: direct requests to systems with available content
      • Functional segregation of Web site
    » SSL Session IDs: requirement to maintain persistence
      • Maintains application state
      • Multiple TCP sessions within a single SSL session
    » Cookies: more precise user identification and classification
      • Look through proxies, firewalls
      • Establish preferential services
  – Integration with WAN solutions

Advanced Traffic Management Features Require Delayed Binding

Morino, Resonate

Delayed Binding Connection

[Figure: message sequence between client (browser) and server (HTTPd): SYN, SYN/ACK, push (HTTP GET), data from server; marks the point at which the connection is bound to a server under immediate binding vs. delayed binding]

Morino, Resonate

Delayed Binding Issues

• Push Packet contains URL, cookie, all application information (except port number)

• Must read application header to deliver advanced traffic management features

• Delayed Binding is the only way to see the application header before decision is made to ‘bind’ to server

Morino, Resonate
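What a delayed-binding switch does once it holds the first request can be sketched as: parse the request line and headers, honor a session cookie for persistence, and otherwise route on the URL prefix. Server names, the `SESSIONID` cookie name, and the pool layout are hypothetical, and this is not Resonate's implementation. Even this toy version parses strings on every connection, which is the CPU cost warned about under "Be Careful What You Wish For...", and it cannot read encrypted (SSL) requests.

```python
import zlib


def parse_request_head(head):
    """Split an HTTP/1.x request head (bytes, ending in a blank line) into parts."""
    lines = head.decode("iso-8859-1").split("\r\n")
    method, path, _version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        if not line:  # blank line: end of the header block
            break
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return method, path, headers


def choose_backend(head, pools, sticky, cookie_name="SESSIONID"):
    """Bind a connection to a server only after reading its first request.

    pools:  URL prefix -> servers holding that content; must include "/" as a catch-all
    sticky: session-cookie value -> server that already holds that user's state
    """
    _method, path, headers = parse_request_head(head)
    cookies = dict(
        c.strip().partition("=")[::2] for c in headers.get("cookie", "").split(";") if "=" in c
    )
    if cookies.get(cookie_name) in sticky:
        return sticky[cookies[cookie_name]]  # persistence: same user, same server
    prefix = max((p for p in pools if path.startswith(p)), key=len)  # longest prefix wins
    servers = pools[prefix]
    return servers[zlib.crc32(path.encode()) % len(servers)]  # stable spread within a pool


pools = {"/": ["web1", "web2"], "/images/": ["img1"], "/cgi-bin/": ["app1", "app2"]}
sticky = {"abc123": "app2"}
req_img = b"GET /images/logo.gif HTTP/1.1\r\nHost: www.example.com\r\n\r\n"
req_cart = (b"GET /cgi-bin/cart HTTP/1.1\r\nHost: www.example.com\r\n"
            b"Cookie: lang=en; SESSIONID=abc123\r\n\r\n")
print(choose_backend(req_img, pools, sticky), choose_backend(req_cart, pools, sticky))  # img1 app2
```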

Be Careful What You Wish For...

• Now that you have the header, what do you do with it?

• Unstructured format, application-specific, might be encrypted

• CPU sink hole!

• Be sure to watch what happens to throughput when you turn on Delayed Binding features

Morino, Resonate

Deeper Visibility for Managing Complex Infrastructures

[Figure: user → Internet → traffic management → content servers → app servers → data layer, with server-side instrumentation feeding systems management]

Multi-tier service level control
• Instrument back-end systems
• Capture health and status
• Diagnose and isolate problems
• Take corrective action immediately!

Morino, Resonate

Redundant Site Implementation: Growth and Failover

• Multi-site service level control
• Higher service levels
• Better resource utilization
• Not a networking solution
• Not a performance issue
  – POP persistence dominates issues

[Figure: WAN traffic management directs a user across the Internet to redundant sites labeled SF, NY, SF]

Morino, Resonate

Management and Administration is Crucial

Consolidated view of multiple sites

• Eases management of complex e-businesses
• Reduces costs associated with undetected problems

[Figure: a sys admin in Denver, CO uses an enterprise services console to manage sites labeled SF, NY, SF]

Morino, Resonate

Closed-loop Real-Time Control of IP-based Applications

[Figure: Intelligent Service Management applies policy-based control to IP-application traffic management functions and to systems management functions, both of which return feedback to it]

Morino, Resonate

Resonate Case Study

• Central Dispatch
  – Software-based load balancer for servers on a LAN
  – Sophisticated policy-driven filtering, redirection, load balancing
  – Class of service support for server access

• Global Dispatch
  – Multi-site management, wide-area redirection, disaster recovery
  – Advanced Traffic Mapping capabilities:
    » Sticky/persistent session support and sticky session failover
    » Directed Traffic Table directs users to predefined POP
  – Configurable scheduling based on WAN latency and site load
  – POP failover handling
  – Advanced stats: avg. DNS response, POP hit rate, other QoS
  – Coexistence with existing DNS and load balancing architecture
  – Pass multiple IP addresses to client for browser-based failover
  – Weighted round-robin scheduling

http://www.resonate.com
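The product list mentions weighted round-robin scheduling and failover without saying how they are implemented. One common way to combine them is the "smooth" weighted round-robin used, for example, by nginx, sketched below with hypothetical server names and weights; it is not Resonate's algorithm. Unhealthy servers are simply left out of each pick.

```python
class SmoothWeightedRoundRobin:
    """Weighted round-robin that interleaves picks instead of sending bursts.

    On each pick every healthy server's running score grows by its weight; the
    highest score wins and is reduced by the total weight of healthy servers.
    """

    def __init__(self, weights):
        self.weights = dict(weights)              # server -> integer weight
        self.score = {s: 0 for s in self.weights}
        self.healthy = set(self.weights)          # maintained by health checks

    def set_health(self, server, is_healthy):
        if is_healthy:
            self.healthy.add(server)
        else:
            self.healthy.discard(server)

    def pick(self):
        candidates = [s for s in self.weights if s in self.healthy]
        if not candidates:
            raise RuntimeError("no healthy servers")
        for s in candidates:
            self.score[s] += self.weights[s]
        best = max(candidates, key=lambda s: self.score[s])  # ties go to the first listed
        self.score[best] -= sum(self.weights[s] for s in candidates)
        return best


wrr = SmoothWeightedRoundRobin({"a": 5, "b": 1, "c": 1})
print([wrr.pick() for _ in range(7)])  # ['a', 'a', 'b', 'a', 'c', 'a', 'a']
wrr.set_health("a", False)             # e.g. a failed health check
print([wrr.pick() for _ in range(4)])  # ['b', 'c', 'b', 'c']
```

Compared with naive weighted round-robin (five picks of "a" in a row), the smooth variant spreads the heavy server's turns out, which avoids short load spikes on it.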

Resonate Case Study

• Commander
  – End-to-end monitoring
    » URL tests, host access tests, HTTP service availability tests
    » SNMP traps
  – Test, statistics, and control features
    » Gather availability info: site + Web/app/DB servers
    » Process events (inaccessible file servers, DB, network congestion, etc.) for reporting/initiation of user-defined actions
  – Features:
    » Rapid identification and resolution of site problems
    » Multi-tier resource monitoring of site servers
    » Identify problems before service levels are affected
    » Identify network trends essential to optimized site planning
    » User-defined service management policies for automated control

http://www.resonate.com

Resonate Case Study

• Automated Control for Policy-Based Problem Resolution
  – Sophisticated server-level control policies
  – Monitors events/processes them according to predefined rules & action(s)
  – E.g., sending email/electronic pages, script invocation

• Examples of policy-based control include:
  – Schedule traffic from Web server w/ slow/failed backend app server
  – Increase/decrease traffic to server when perf crosses thresholds
  – Enable backup content server in a Central Dispatch site when one or more active content servers fail/become too busy
  – Monitor apps and server processes; restart any that fail

http://www.resonate.com