6/05/011 Virginie Galtier, Yannick Carlinet, Kevin L. Mills, Stefan Leigh, and Andrew Rukhin DARPA Active Networks PI Meeting June 5, 2001 Measurement

6/05/01 1

Virginie Galtier, Yannick Carlinet, Kevin L. Mills,

Stefan Leigh, and Andrew Rukhin

DARPA Active Networks PI Meeting

June 5, 2001

Measurement and Modeling for Resource Prediction and Control in Heterogeneous Active Networks

6/05/01 2

Outline of Presentation

Project Quad Chart Why is the problem important?

Thumbnail: How have we tried to solve the problem?

What have we done in the past 12 months? Jun 00 to Dec 00: Demonstrate the application of our work to predict and control CPU usage in active applications (together with GE, Magician, and AVNMP) Jan 01 to Jun 01: Turn the demonstrations into accurately measured, controlled experiments, confirming results, and writing papers and a Ph.D. dissertation

Summary & What’s next? (1) Future Research (Address Failures) (2) Prepare code and documentation for release on the project web site (3) Develop a white-box model

6/05/01 3

Measurement and Modeling for Resource Prediction and Control in Heterogeneous Active Networks

Projected Impact

Effective policy enforcement provides security and reliability essential for well- engineered active nets.

An independent CPU metric permits resource requirements to be expressed across a heterogeneous set of active nodes to support:

• node-level policy enforcement,• static path-level network admission control,• dynamic path-level QoS routing.

GoalDevise and validate a means to express the CPU time

requirements of a mobile-code application in a form that can be meaningfully interpreted among heterogeneous nodes in an active network.

Technical Approach• Propose abstract mathematical models for active

network nodes and active applications.• Validate the abstract models against measurements

from real active nodes and applications.• Prototype validated mechanisms in a node operating

system and evaluate them in live operations.

FY01 Accomplishments• Demonstrated the application of our work to predictive estimation and control of resource requirements for an Active Application. (collaboration with GE research)

• Based on encouraging results from the demonstration, modified Magician and AVNMP to make more precise and accurate measurements, and then conducted controlled experiments to confirm the demonstration.

• Published two papers on the application of our work.

“How Much CPU Time?”

S2

TS2

S3

TS3Sm

TSm

• • •

S0Idle

S1TS1

PS0-S0

TS0-S0

PS0-S2

TS0-S2 PS1-S1 TS1-S1

PS0-S1

TS0-S1

TS1-S0

PS1-S0

PS2-S2

TS2-S2

PS1-S2

TS1-S2

PS2-S1

TS2-S1

PS3-S2TS3-S2

PS2-S3

TS2-S3

PS3-S3

TS3-S3

PS2-Sm

TS2-Sm

PSm-S2

TSm-S2

PS3-Sm

TS3-Sm

PSm-S3

TSm-S3

PSm-S0TSm-S0

PS0-Sm

TS0-Sm

TSm-Sm

PSm-Sm

PS3-S0 T3-S0

PS0-S3

TS0-S3

TS2-S0

PS2-S0

PSm-S1 TSm-S1

PS1-Sm

TS1-Sm

S2

TS2

S3

TS3Sm

TSm

• • •• • •

S0Idle

S1TS1

PS0-S0

TS0-S0

PS0-S2

TS0-S2 PS1-S1 TS1-S1

PS0-S1

TS0-S1

TS1-S0

PS1-S0

PS2-S2

TS2-S2

PS1-S2

TS1-S2

PS2-S1

TS2-S1

PS3-S2TS3-S2

PS2-S3

TS2-S3

PS3-S3

TS3-S3

PS2-Sm

TS2-Sm

PSm-S2

TSm-S2

PS3-Sm

TS3-Sm

PSm-S3

TSm-S3

PSm-S0TSm-S0

PS0-Sm

TS0-Sm

TSm-Sm

PSm-Sm

PS3-S0 T3-S0

PS0-S3

TS0-S3

TS2-S0

PS2-S0

PSm-S1 TSm-S1

PS1-Sm

TS1-Sm

6/05/01 4

Growing Population of MobilePrograms on Heterogeneous Platforms

dlls, dlls, and more dlls

APPLETS &SERVLETS

SCRIPTING ENGINES & LANGUAGES

Python

MOBILEAGENTS

C#

http://www.winbatch.com/winware/wbuse-list.html

http://www.vex.net/parnassus/apyllo.py

http://berlin.inesc.pt/agentspace/main-eng.html

http://www.microsoft.com/isapi/gomscom.asp?target=/

6/05/01 5

Sources of Variability

6112,0424414,7315122,800stat

7314,5605317,5916027,066socketcall

6212,3624314,3945022,609write

6312,6063712,3624319,321read

uspccuspccuspccSystem Call

GreenBlackBlue

6112,0424414,7315122,800stat

7314,5605317,5916027,066socketcall

6212,3624314,3945022,609write

6312,6063712,3624319,321read

uspccuspccuspccSystem Call

GreenBlackBlue

VARIABILITY IN EXECUTION ENVIRONMENT

VARIABILITYIN

SYSTEM CALLS

Processor RAM Persistent

storage

Network

cards

Device

drivers

Scheduler Resources Management

Services

Network

Protocols

Physical layer

Virtual machine layer

OS layer

S1 S2 S3 SnActive Node OS system calls ...

SC1 SC2 SC3 SC4 SCmReal OS system calls ...

AA3

EE2:Magician (java)

AA4AA1AA2

EE1:ANTS (java)

ANodeOS interface layer

Processor RAM Persistent

storage

Network

cards

Device

drivers

Scheduler Resources Management

Services

Network

Protocols

Physical layer

Virtual machine layer

OS layer

S1 S2 S3 SnActive Node OS system calls ...

SC1 SC2 SC3 SC4 SCmReal OS system calls ...

S1 S2 S3 SnActive Node OS system calls ...S1 S2 S3 SnActive Node OS system calls ...

SC1 SC2 SC3 SC4 SCmReal OS system calls ...SC1 SC2 SC3 SC4 SCmReal OS system calls ...

AA3

EE2:Magician (java)

AA4AA3

EE2:Magician (java)

AA4AA1AA2

EE1:ANTS (java)

AA1AA2

EE1:ANTS (java)

AA1AA2

EE1:ANTS (java)

ANodeOS interface layer

ANETS ARCHITECTURE 843

167,830

479

159,412

534

240,269

BenchmarkAvg. CPU usAvg. PCCs

jdk 1.1.6jdk 1.1.6jdk 1.1.6JVM

Linux 2.2.7Linux 2.2.7Linux 2.2.7OS

64 MB128 MB128 MBMemory

PentiumProPentium IIPentium IIProcessor

199 MHz333 MHz450 MHzCPU Speed

GreenBlackBlueTrait

843

167,830

479

159,412

534

240,269

BenchmarkAvg. CPU usAvg. PCCs

jdk 1.1.6jdk 1.1.6jdk 1.1.6JVM

Linux 2.2.7Linux 2.2.7Linux 2.2.7OS

64 MB128 MB128 MBMemory

PentiumProPentium IIPentium IIProcessor

199 MHz333 MHz450 MHzCPU Speed

GreenBlackBlueTrait

6/05/01 6

Statistically Compare Simulation Results against Measured Data

Simulate Model with Monte Carlo Experiment

236.66120.30130.73Route

321.77320.70330.44PingMagician

164.9130.351.90.40Mcast

102.7020.640.90.86PingANTS

Avg. High Per.MeanAvg. High Per.MeanAvg. High Per.MeanAAEE

50 bins-500 reps50 bins-20000 reps100 bins-20000 reps

236.66120.30130.73Route


164.9130.351.90.40Mcast

102.7020.640.90.86PingANTS



Statistically Compare Simulation Results against Measured Data

Simulate Model with Monte Carlo Experiment

236.66120.30130.73Route


164.9130.351.90.40Mcast

102.7020.640.90.86PingANTS



236.66120.30130.73Route


164.9130.351.90.40Mcast

102.7020.640.90.86PingANTS



Trace is a series of system calls and transitions stamped with CPU time use

AA2

EE1:ANTS (java)

read write kill...

ANodeOS interface

OS layer

Physical layer

Generate Execution Trace

Monitor atSystem Calls

in Active Node OS

…begin, user (4 cc), read (20 cc), user (18 cc), write(56 cc), user (5 cc), end

begin, user (2 cc), read (21 cc), user (18 cc), kill (6 cc), user (8 cc), end


begin, user (5 cc), read (20 cc), user (18 cc), write(53 cc), user (5 cc), end

begin, user (2 cc), read (18 cc), user (17 cc), kill (20 cc), user (8 cc), end…


AA2

EE1:ANTS (java)

read write kill...

ANodeOS interface

OS layer

Physical layer



in Active Node OS


AA2

EE1:ANTS (java)

read write kill...read write kill...

ANodeOS interface

OS layer

Physical layer



in Active Node OS

…begin, user (4 cc), read (20 cc), user (18 cc), write(56 cc), user (5 cc), end



begin, user (5 cc), read (20 cc), user (18 cc), write(53 cc), user (5 cc), end

begin, user (2 cc), read (18 cc), user (17 cc), kill (20 cc), user (8 cc), end…

Our Approach in Thumbnail

Scenario A: sequence = “read-write”, probability = 2/5

Scenario B: sequence = “read-kill”, probability = 3/5

Distributions of CPU time in system calls :

GenerateActive Application Model

Distributions of CPU time between system calls :

0 5 10 15 20

0.8

0.2

P

cc

read

write kill

0 5 10 15 20

0.670.33

cc

P

read-kill

write-end

begin-read read-write

kill-end

Scenario A: sequence = “read-write”, probability = 2/5

Scenario B: sequence = “read-kill”, probability = 3/5

Distributions of CPU time in system calls :

GenerateActive Application Model

Distributions of CPU time between system calls :

0 5 10 15 20

0.8

0.2

P

cc

read

write kill

0 5 10 15 20

0.670.33

cc

P

0 5 10 15 20

0.670.33

cc

P

read-kill

write-end

begin-read read-write

kill-end

Scaling AA Models

AA model on node X:read 30 ccuser 10 ccwrite 20 cc

Model of node X:read 40 ccwrite 18 ccuser 13 cc

Model of node Y:read 20 ccwrite 45 ccuser 9 ccscale

AA model on node Y:read 30*20/40 = 15 ccuser 10*9/13 = 7 ccwrite 20*45/18 = 50 cc

Scaling AA Models









6/05/01 7

What have we done in the past 12 months?

Jun 00 to Dec 00: Demonstrated the application of our work to predict and control CPU usage in active applications (together with GE, Magician, and AVNMP)

Jan 01 to Jun 01: Turned the demonstrations into accurately measured, controlled experiments, confirming results, and writing papers and a Ph.D. dissertation

6/05/01 8

Control Demo Revisited

Policy 1: Use CPU time-to-live set to fixed value per packetPolicy 2: Use a CPU usage model, but scaled naively based solely on CPU speedPolicy 3: Use a well-scaled NIST CPU usage model

Naïve Scaling

High Fidelity

Malicious Packet dropped too late (CPU use reached TTL)

TTL

Normal execution time CPU time “stolen”

FastestIntermediate

Node

SlowestIntermediate

Node

Destinationnode

Sendingnode

Maliciouspacket

Goodpackets

Goodpackets

Good packet dropped early(CPU use reached TTL)

TTL

CPU time“wasted”

Additional CPU time needed


TTL



TTL


FastestIntermediate

Node

SlowestIntermediate

Node

Destinationnode

Sendingnode

Maliciouspacket

Goodpackets

Goodpackets

FastestIntermediate

Node

SlowestIntermediate

Node

Destinationnode

Sendingnode

Sendingnode

Maliciouspacket

Goodpackets

Goodpackets


TTL




TTL



6/05/01 9

TTLTTL

Prediction Demo Revisited

Time

L-3

L-2

L-4

AN-5

AN-4 Operational Network

Shadow, Prediction-Overlay Network

L-1 L-3

L-2

L-4

AN-5AN-1

AN-4DP

LP

LPLP

L-1

AN-1

Space

Time

L-3

L-2

L-4

AN-5

AN-4 Operational Network

Shadow, Prediction-Overlay Network

L-1 L-3

L-2

L-4

AN-5AN-1

AN-4DP

LP

LPLP

L-1

AN-1

Space

With the NIST CPU usage model integrated, AVNMP requires fewer rollbacks And so AVNMP can predict CPU usage in the network further into the future

DPDPPredictor

Driver

PP

LP

PP

LP

Sendingnode

FastestIntermediate

Node

Destinationnode

SlowestIntermediate

Node

Green Black Red Yellow

DPDPPredictor

Driver

DPDPPredictor

Driver

PP

LP

PP

LP

PP

LP

PP

LP

Sendingnode

FastestIntermediate

Node

Destinationnode

SlowestIntermediate

Node

Green Black Red Yellow

CPU PredictionCPU Prediction

6/05/01 10

• Completed true integration of NIST LINUX kernel measurement code with Magician and AVNMP control and prediction code (in Java)

• Modified AVNMP MIB to distinguish between CPU vs. packet stimulated rollbacks (and to prevent periodic resetting of values)

• Recalibrated demonstration nodes (and evaluated the calibrations and our models using selected Magician AAs – ping, route, activeAudio – new results given on next slide)

• Ran the two demonstrations again, but this time as controlled experiments (new results given on subsequent slides)

• Wrote two papers describing the control and prediction experiments

• Completed draft dissertation (Virginie Galtier)

From Demo to Experiment

6/05/01 11

Evaluating Scaled AA ModelsPrediction Error Measured when Scaling Application Models between Selected Pairs of Nodes

vs. Scaling with Processor Speeds Alone (MAGICIAN EE)

NEW RESULTS -- MAY 2001 Scaling with

model

Scaling with processors

speeds

AA Node

X Node

Y Mean

Avg. High Perc.

Mean Avg. High Perc.

K B 18 12 49 32 R G 21 32 74 72 R K 3 14 18 16 Y K 8 18 84 81

Pin

g

G Y 4 18 193 160 K Y 16 22 341 404 Y R 2 14 76 75 K B 6 13 13 30 G K 13 11 46 52 R

oute

Y G 2 21 57 58 Y B 11 27 85 83 K Y 13 14 400 399 G Y 9 10 80 143 G B 9 17 74 58 A

udio

Y K 7 12 80 80

6/05/01 12

Experiment #1: Control Execution of Mobile Code

FastestIntermediate

Node

SlowestIntermediate

Node

Destinationnode

Sendingnode

Maliciouspacket

Goodpackets

Goodpackets

Malicious Packet dropped too late (CPU use reached TTL + tolerance)

Needed execution time

CPU time “stolen”

TTL

Good packet dropped early(CPU use reached TTL + tolerance)

TTL

CPU time possibly “wasted” Additional CPU

time neededWhen mobile code CPU usage controlled with fixed allocation or TTL, malicious or “buggy” mobile programs can “steal” substantial CPU cycles, especially on fast nodes

When mobile code CPU usage controlled with fixed allocation or TTL, correctly coded mobile programs can be terminated too soon on slow nodes, wasting substantial CPU cycles

Goals: (1) Show reduced CPU usage by terminating malicious packets earlier AND (2) Show fewer terminations of good packets

6/05/01 13

CPU Control: Experiment Results

Metric Fixed TTL Model NIST CPU Model

Estimated Average CPU Utilization (ms/packet) Fast Intermediate Node

8.15 7.63

(6.3% estimated CPU time saved)

Good Packets Killed (all nodes) 72 (3%) 10 (0.4%)

Parameter Value

Good Packets 2278

Malicious Packets 379

Time-to-Live (ms) 39.87

NIST Model Threshold 99th percentile

36.35

Difference in Per Packet CPU Usage (TTL - NIST Model)

-1

-0.5

0

0.5

1

1.5

Measurement Interval

Savi

ngs

in C

PU U

se (m

s pe

r pac

ket)

Predicted vs. Measured

Predicted CPU time saved (ms/packet) [ANALYSIS]

0.5

Estimated CPU time saved (ms/packet) [EXPERIMENT]

0.52

Summary of Results

Experiment Parameters

Results vs. Prediction

6/05/01 14

Experiment #2: Predict CPU Usage among Heterogeneous Network Nodes

AVNMP AA

SourceNode

FastestIntermediate

Node

DestinationNode

SlowestIntermediate

Node

Active Audio CPU ModelGeneration

Magician AAs

500000 1 1́06 1.5 1́06 2 1́06 2.5 1́06 3 1́06WallclockHmSL

50

100

150

200

Prediction Error Accuracy

Logical

ProcessActiveAudio

AA

Magician EEMIB

AVNMP LPs predictnumber of messagesand CPU use and update predicted MIB values on nodes

Magician EEs update actual MIB values for CPU use andnumber of messages

Active Audio CPU usage model injected into AVNMP LP on each node

Logical

Process

CPUModel

Driver

Process

MessageModel

CPUModel

ActiveAudio

AA

AVNMP LPs determineprediction error, compareagainst tolerance, initiaterollbacks, and display graphs

VirtualMessages

Real Messages

AVNMP AA

SourceNode

FastestIntermediate

Node

DestinationNode

SlowestIntermediate

Node

Active Audio CPU ModelGeneration

Magician AAs

500000 1 1́06 1.5 1́06 2 1́06 2.5 1́06 3 1́06WallclockHmSL

50

100

150

200


500000 1 1́06 1.5 1́06 2 1́06 2.5 1́06 3 1́06WallclockHmSL

50

100

150

200


Logical

ProcessActiveAudio

AA

ActiveAudio

AA

Magician EEMIB

AVNMP LPs predictnumber of messagesand CPU use and update predicted MIB values on nodes

Magician EEs update actual MIB values for CPU use andnumber of messages

Active Audio CPU usage model injected into AVNMP LP on each node

Logical

Process

CPUModel

Logical

Process

CPUModel

Driver

Process

MessageModel

CPUModel

ActiveAudio

AA

ActiveAudio

AA

AVNMP LPs determineprediction error, compareagainst tolerance, initiaterollbacks, and display graphs

VirtualMessages

Real Messages

Goals: (1) Show improved look ahead into virtual time AND (2) Show fewer tolerance rollbacks in the simulation

6/05/01 15

CPU Prediction: Experiment Results

Look Ahead - NIST Model vs. TTL

0

100

200

300

400

500

600

700

800

1 5 9 13 17 21 25 29 33 37 41 45 49 53 57 61 65 69 73 77 81 85 89 93


Lo

gic

al V

irtu

al T

ime

(sec

s)

NIST Model

TTL

Tolerance Rollbacks NIST vs. TTL

0

10

20

30

40

50

60

70

80

90

100

1 4 7

10

13

16

19

22

25

28

31

34

37

40

43

46

49

52

55

58

61

64

67

70

73

76

79

82

85

88

91


Ro

llb

ack

s

NIST Model

TTL

TTL NIST Model

Metric Fast Node

Slow Node

Fast Node

Slow Node

Maximum Look-ahead (s)

265 0 437 28

Tolerance Rollbacks

93 47 67 20

Fast Node Fast Node

Summary of ResultsTTL NIST Model

Parameter Fast Node

Slow Node

Fast Node

Slow Node

Avg. CPU Time

(ms and ccs)

7

2,340,750

7

693,000

3

900,000

16.5

1,633,478

Error Tolerance

+-10% (ccs) 234,075 69,300 90,000 163,347

Avg. Measurement Interval (s) 8.8 12.1 10.1 7

Experiment Parameters

6/05/01 16

Publications Since June 2000PapersY. Carlinet, V. Galtier, K. Mills, S. Leigh, A. Rukhin, “Calibrating an Active Network

Node,” Proceedings of the 2nd Workshop on Active Middleware Services, ACM, August 2000.

V. Galtier, K. Mills, Y. Carlinet, S. Leigh, A. Rukhin, “Expressing Meaningful Processing Requirements among Heterogeneous Nodes in an Active Network,” Proceedings of the 2nd International Workshop on Software Performance, ACM, September 2000.

V. Galtier, K. Mills, Y. Carlinet, S. Bush, and A. Kulkarni, “Predicting and Controlling Resource Usage in a Heterogeneous Active Network”, accepted by 3rd International Workshop on Active Middleware Services, ACM, August 2001.

V. Galtier, K. Mills, Y. Carlinet, S. Bush, and A. Kulkarni, “Predicting Resource Demand in Heterogeneous Active Networks”, accepted by MILCOM 2001, October 2001.

DissertationV. Galtier, Toward finer grain management of computational resources in heterogeneous

active networks (Vers une gestion plus fine des ressources de calcul des réseaux actifs hétérogènes), Henri Poincaré University Nancy I, Advisors: André Schaff and Laurent Andrey, projected graduation October 2001.

Papers available on the project web site: http://w3.antd.nist.gov/active-nets/

http://w3.antd.nist.gov/active-nets/







6/05/01 17

Future Research (& Failures)

Improve Black-box Model (recent failures) Space-Time Efficiency Account for Node-Dependent Conditions Characterize Error Bounds

Investigate Alternate Models White-box Model (currently underway) Lower-Complexity Analytically Tractable Models (original failure) Models that Learn

Investigate Prediction based on Competition Run and Score Competing Predictors for Each Application Reinforce Good Predictors Use Prediction from Best Scoring Model

6/05/01 18

Additional Work Since June 2000Investigating white-box approach to model CPU needs for mobile code

Calibrated CPU Usage

EE Function Avg. Std Dev 99th P

deliverToApp t1 s1 p1

getAddress t2 s2 p2

getCache t3 s3 p3

getDst t4 s4 p4

intValue t5 s5 p5

routeForNode t6 s6 p6

Integer f = (Integer)n.getCache().get(getDst()) if (f != null) { next = f.intValue(); if (n.getAddress() != getDst()) { return n.routeForNode(this, next); } else { return n.deliverToApp(this, dpt); }

Active ApplicationCalibrate EE by Functions

Active Application ModelL = delay (t3 + t4) if (c1) {delay (t5 + t2 + t4) if (c2) delay (t6)} else {delay (t1)}

For each arriving packet,determine conditionsand sum delays basedon EE function calibration.

Statistic AA1 AA2 AA3 AA4 AA5

AveragePredicted 205 164 175 183 333

Measured 150 172 197 250 334

Standard DeviationPredicted 163 61 122 240 96

Measured 156 154 165 351 200

99th PercentilePredicted 318 224 239 484 500

Measured 475 224 255 1455 1493

Some Preliminary Results (using the PLAN EE) – all times are us/packet

Documents

6/05/011 Virginie Galtier, Yannick Carlinet, Kevin L. Mills, Stefan Leigh, and Andrew Rukhin DARPA Active Networks PI Meeting June 5, 2001 Measurement