Upload
cecilia-thornton
View
214
Download
0
Tags:
Embed Size (px)
Citation preview
6/05/01 1
Virginie Galtier, Yannick Carlinet, Kevin L. Mills,
Stefan Leigh, and Andrew Rukhin
DARPA Active Networks PI Meeting
June 5, 2001
Measurement and Modeling for Resource Prediction and Control in Heterogeneous Active Networks
6/05/01 2
Outline of Presentation
Project Quad Chart Why is the problem important?
Thumbnail: How have we tried to solve the problem?
What have we done in the past 12 months? Jun 00 to Dec 00: Demonstrate the application of our work to predict and control CPU usage in active applications (together with GE, Magician, and AVNMP) Jan 01 to Jun 01: Turn the demonstrations into accurately measured, controlled experiments, confirming results, and writing papers and a Ph.D. dissertation
Summary & What’s next? (1) Future Research (Address Failures) (2) Prepare code and documentation for release on the project web site (3) Develop a white-box model
6/05/01 3
Measurement and Modeling for Resource Prediction and Control in Heterogeneous Active Networks
Projected Impact
Effective policy enforcement provides security and reliability essential for well- engineered active nets.
An independent CPU metric permits resource requirements to be expressed across a heterogeneous set of active nodes to support:
• node-level policy enforcement,• static path-level network admission control,• dynamic path-level QoS routing.
GoalDevise and validate a means to express the CPU time
requirements of a mobile-code application in a form that can be meaningfully interpreted among heterogeneous nodes in an active network.
Technical Approach• Propose abstract mathematical models for active
network nodes and active applications.• Validate the abstract models against measurements
from real active nodes and applications.• Prototype validated mechanisms in a node operating
system and evaluate them in live operations.
FY01 Accomplishments• Demonstrated the application of our work to predictive estimation and control of resource requirements for an Active Application. (collaboration with GE research)
• Based on encouraging results from the demonstration, modified Magician and AVNMP to make more precise and accurate measurements, and then conducted controlled experiments to confirm the demonstration.
• Published two papers on the application of our work.
“How Much CPU Time?”
S2
TS2
S3
TS3Sm
TSm
• • •
S0Idle
S1TS1
PS0-S0
TS0-S0
PS0-S2
TS0-S2 PS1-S1 TS1-S1
PS0-S1
TS0-S1
TS1-S0
PS1-S0
PS2-S2
TS2-S2
PS1-S2
TS1-S2
PS2-S1
TS2-S1
PS3-S2TS3-S2
PS2-S3
TS2-S3
PS3-S3
TS3-S3
PS2-Sm
TS2-Sm
PSm-S2
TSm-S2
PS3-Sm
TS3-Sm
PSm-S3
TSm-S3
PSm-S0TSm-S0
PS0-Sm
TS0-Sm
TSm-Sm
PSm-Sm
PS3-S0 T3-S0
PS0-S3
TS0-S3
TS2-S0
PS2-S0
PSm-S1 TSm-S1
PS1-Sm
TS1-Sm
S2
TS2
S3
TS3Sm
TSm
• • •• • •
S0Idle
S1TS1
PS0-S0
TS0-S0
PS0-S2
TS0-S2 PS1-S1 TS1-S1
PS0-S1
TS0-S1
TS1-S0
PS1-S0
PS2-S2
TS2-S2
PS1-S2
TS1-S2
PS2-S1
TS2-S1
PS3-S2TS3-S2
PS2-S3
TS2-S3
PS3-S3
TS3-S3
PS2-Sm
TS2-Sm
PSm-S2
TSm-S2
PS3-Sm
TS3-Sm
PSm-S3
TSm-S3
PSm-S0TSm-S0
PS0-Sm
TS0-Sm
TSm-Sm
PSm-Sm
PS3-S0 T3-S0
PS0-S3
TS0-S3
TS2-S0
PS2-S0
PSm-S1 TSm-S1
PS1-Sm
TS1-Sm
6/05/01 4
Growing Population of MobilePrograms on Heterogeneous Platforms
dlls, dlls, and more dlls
APPLETS &SERVLETS
SCRIPTING ENGINES & LANGUAGES
Python
MOBILEAGENTS
C#
6/05/01 5
Sources of Variability
6112,0424414,7315122,800stat
7314,5605317,5916027,066socketcall
6212,3624314,3945022,609write
6312,6063712,3624319,321read
uspccuspccuspccSystem Call
GreenBlackBlue
6112,0424414,7315122,800stat
7314,5605317,5916027,066socketcall
6212,3624314,3945022,609write
6312,6063712,3624319,321read
uspccuspccuspccSystem Call
GreenBlackBlue
VARIABILITY IN EXECUTION ENVIRONMENT
VARIABILITYIN
SYSTEM CALLS
Processor RAM Persistent
storage
Network
cards
Device
drivers
Scheduler Resources Management
Services
Network
Protocols
Physical layer
Virtual machine layer
OS layer
S1 S2 S3 SnActive Node OS system calls ...
SC1 SC2 SC3 SC4 SCmReal OS system calls ...
AA3
EE2:Magician (java)
AA4AA1AA2
EE1:ANTS (java)
ANodeOS interface layer
Processor RAM Persistent
storage
Network
cards
Device
drivers
Scheduler Resources Management
Services
Network
Protocols
Physical layer
Virtual machine layer
OS layer
S1 S2 S3 SnActive Node OS system calls ...
SC1 SC2 SC3 SC4 SCmReal OS system calls ...
S1 S2 S3 SnActive Node OS system calls ...S1 S2 S3 SnActive Node OS system calls ...
SC1 SC2 SC3 SC4 SCmReal OS system calls ...SC1 SC2 SC3 SC4 SCmReal OS system calls ...
AA3
EE2:Magician (java)
AA4AA3
EE2:Magician (java)
AA4AA1AA2
EE1:ANTS (java)
AA1AA2
EE1:ANTS (java)
AA1AA2
EE1:ANTS (java)
ANodeOS interface layer
ANETS ARCHITECTURE 843
167,830
479
159,412
534
240,269
BenchmarkAvg. CPU usAvg. PCCs
jdk 1.1.6jdk 1.1.6jdk 1.1.6JVM
Linux 2.2.7Linux 2.2.7Linux 2.2.7OS
64 MB128 MB128 MBMemory
PentiumProPentium IIPentium IIProcessor
199 MHz333 MHz450 MHzCPU Speed
GreenBlackBlueTrait
843
167,830
479
159,412
534
240,269
BenchmarkAvg. CPU usAvg. PCCs
jdk 1.1.6jdk 1.1.6jdk 1.1.6JVM
Linux 2.2.7Linux 2.2.7Linux 2.2.7OS
64 MB128 MB128 MBMemory
PentiumProPentium IIPentium IIProcessor
199 MHz333 MHz450 MHzCPU Speed
GreenBlackBlueTrait
6/05/01 6
Statistically Compare Simulation Results against Measured Data
Simulate Model with Monte Carlo Experiment
236.66120.30130.73Route
321.77320.70330.44PingMagician
164.9130.351.90.40Mcast
102.7020.640.90.86PingANTS
Avg. High Per.MeanAvg. High Per.MeanAvg. High Per.MeanAAEE
50 bins-500 reps50 bins-20000 reps100 bins-20000 reps
236.66120.30130.73Route
321.77320.70330.44PingMagician
164.9130.351.90.40Mcast
102.7020.640.90.86PingANTS
Avg. High Per.MeanAvg. High Per.MeanAvg. High Per.MeanAAEE
50 bins-500 reps50 bins-20000 reps100 bins-20000 reps
Statistically Compare Simulation Results against Measured Data
Simulate Model with Monte Carlo Experiment
236.66120.30130.73Route
321.77320.70330.44PingMagician
164.9130.351.90.40Mcast
102.7020.640.90.86PingANTS
Avg. High Per.MeanAvg. High Per.MeanAvg. High Per.MeanAAEE
50 bins-500 reps50 bins-20000 reps100 bins-20000 reps
236.66120.30130.73Route
321.77320.70330.44PingMagician
164.9130.351.90.40Mcast
102.7020.640.90.86PingANTS
Avg. High Per.MeanAvg. High Per.MeanAvg. High Per.MeanAAEE
50 bins-500 reps50 bins-20000 reps100 bins-20000 reps
Trace is a series of system calls and transitions stamped with CPU time use
AA2
EE1:ANTS (java)
read write kill...
ANodeOS interface
OS layer
Physical layer
Generate Execution Trace
Monitor atSystem Calls
in Active Node OS
…begin, user (4 cc), read (20 cc), user (18 cc), write(56 cc), user (5 cc), end
begin, user (2 cc), read (21 cc), user (18 cc), kill (6 cc), user (8 cc), end
begin, user (2 cc), read (15 cc), user (8 cc), kill (5 cc), user (9 cc), end
begin, user (5 cc), read (20 cc), user (18 cc), write(53 cc), user (5 cc), end
begin, user (2 cc), read (18 cc), user (17 cc), kill (20 cc), user (8 cc), end…
Trace is a series of system calls and transitions stamped with CPU time use
AA2
EE1:ANTS (java)
read write kill...
ANodeOS interface
OS layer
Physical layer
Generate Execution Trace
Monitor atSystem Calls
in Active Node OS
Trace is a series of system calls and transitions stamped with CPU time use
AA2
EE1:ANTS (java)
read write kill...read write kill...
ANodeOS interface
OS layer
Physical layer
Generate Execution Trace
Monitor atSystem Calls
in Active Node OS
…begin, user (4 cc), read (20 cc), user (18 cc), write(56 cc), user (5 cc), end
begin, user (2 cc), read (21 cc), user (18 cc), kill (6 cc), user (8 cc), end
begin, user (2 cc), read (15 cc), user (8 cc), kill (5 cc), user (9 cc), end
begin, user (5 cc), read (20 cc), user (18 cc), write(53 cc), user (5 cc), end
begin, user (2 cc), read (18 cc), user (17 cc), kill (20 cc), user (8 cc), end…
Our Approach in Thumbnail
Scenario A: sequence = “read-write”, probability = 2/5
Scenario B: sequence = “read-kill”, probability = 3/5
Distributions of CPU time in system calls :
GenerateActive Application Model
Distributions of CPU time between system calls :
0 5 10 15 20
0.8
0.2
P
cc
read
write kill
0 5 10 15 20
0.670.33
cc
P
read-kill
write-end
begin-read read-write
kill-end
Scenario A: sequence = “read-write”, probability = 2/5
Scenario B: sequence = “read-kill”, probability = 3/5
Distributions of CPU time in system calls :
GenerateActive Application Model
Distributions of CPU time between system calls :
0 5 10 15 20
0.8
0.2
P
cc
read
write kill
0 5 10 15 20
0.670.33
cc
P
0 5 10 15 20
0.670.33
cc
P
read-kill
write-end
begin-read read-write
kill-end
Scaling AA Models
AA model on node X:read 30 ccuser 10 ccwrite 20 cc
Model of node X:read 40 ccwrite 18 ccuser 13 cc
Model of node Y:read 20 ccwrite 45 ccuser 9 ccscale
AA model on node Y:read 30*20/40 = 15 ccuser 10*9/13 = 7 ccwrite 20*45/18 = 50 cc
Scaling AA Models
AA model on node X:read 30 ccuser 10 ccwrite 20 cc
Model of node X:read 40 ccwrite 18 ccuser 13 cc
Model of node Y:read 20 ccwrite 45 ccuser 9 ccscale
AA model on node Y:read 30*20/40 = 15 ccuser 10*9/13 = 7 ccwrite 20*45/18 = 50 cc
AA model on node X:read 30 ccuser 10 ccwrite 20 cc
Model of node X:read 40 ccwrite 18 ccuser 13 cc
Model of node Y:read 20 ccwrite 45 ccuser 9 ccscale
AA model on node Y:read 30*20/40 = 15 ccuser 10*9/13 = 7 ccwrite 20*45/18 = 50 cc
6/05/01 7
What have we done in the past 12 months?
Jun 00 to Dec 00: Demonstrated the application of our work to predict and control CPU usage in active applications (together with GE, Magician, and AVNMP)
Jan 01 to Jun 01: Turned the demonstrations into accurately measured, controlled experiments, confirming results, and writing papers and a Ph.D. dissertation
6/05/01 8
Control Demo Revisited
Policy 1: Use CPU time-to-live set to fixed value per packetPolicy 2: Use a CPU usage model, but scaled naively based solely on CPU speedPolicy 3: Use a well-scaled NIST CPU usage model
Naïve Scaling
High Fidelity
Malicious Packet dropped too late (CPU use reached TTL)
TTL
Normal execution time CPU time “stolen”
FastestIntermediate
Node
SlowestIntermediate
Node
Destinationnode
Sendingnode
Maliciouspacket
Goodpackets
Goodpackets
Good packet dropped early(CPU use reached TTL)
TTL
CPU time“wasted”
Additional CPU time needed
Malicious Packet dropped too late (CPU use reached TTL)
TTL
Normal execution time CPU time “stolen”
Malicious Packet dropped too late (CPU use reached TTL)
TTL
Normal execution time CPU time “stolen”
FastestIntermediate
Node
SlowestIntermediate
Node
Destinationnode
Sendingnode
Maliciouspacket
Goodpackets
Goodpackets
FastestIntermediate
Node
SlowestIntermediate
Node
Destinationnode
Sendingnode
Sendingnode
Maliciouspacket
Goodpackets
Goodpackets
Good packet dropped early(CPU use reached TTL)
TTL
CPU time“wasted”
Additional CPU time needed
Good packet dropped early(CPU use reached TTL)
TTL
CPU time“wasted”
Additional CPU time needed
6/05/01 9
TTLTTL
Prediction Demo Revisited
Time
L-3
L-2
L-4
AN-5
AN-4 Operational Network
Shadow, Prediction-Overlay Network
L-1 L-3
L-2
L-4
AN-5AN-1
AN-4DP
LP
LPLP
L-1
AN-1
Space
Time
L-3
L-2
L-4
AN-5
AN-4 Operational Network
Shadow, Prediction-Overlay Network
L-1 L-3
L-2
L-4
AN-5AN-1
AN-4DP
LP
LPLP
L-1
AN-1
Space
With the NIST CPU usage model integrated, AVNMP requires fewer rollbacks And so AVNMP can predict CPU usage in the network further into the future
DPDPPredictor
Driver
PP
LP
PP
LP
Sendingnode
FastestIntermediate
Node
Destinationnode
SlowestIntermediate
Node
Green Black Red Yellow
DPDPPredictor
Driver
DPDPPredictor
Driver
PP
LP
PP
LP
PP
LP
PP
LP
Sendingnode
FastestIntermediate
Node
Destinationnode
SlowestIntermediate
Node
Green Black Red Yellow
CPU PredictionCPU Prediction
6/05/01 10
• Completed true integration of NIST LINUX kernel measurement code with Magician and AVNMP control and prediction code (in Java)
• Modified AVNMP MIB to distinguish between CPU vs. packet stimulated rollbacks (and to prevent periodic resetting of values)
• Recalibrated demonstration nodes (and evaluated the calibrations and our models using selected Magician AAs – ping, route, activeAudio – new results given on next slide)
• Ran the two demonstrations again, but this time as controlled experiments (new results given on subsequent slides)
• Wrote two papers describing the control and prediction experiments
• Completed draft dissertation (Virginie Galtier)
From Demo to Experiment
6/05/01 11
Evaluating Scaled AA ModelsPrediction Error Measured when Scaling Application Models between Selected Pairs of Nodes
vs. Scaling with Processor Speeds Alone (MAGICIAN EE)
NEW RESULTS -- MAY 2001 Scaling with
model
Scaling with processors
speeds
AA Node
X Node
Y Mean
Avg. High Perc.
Mean Avg. High Perc.
K B 18 12 49 32 R G 21 32 74 72 R K 3 14 18 16 Y K 8 18 84 81
Pin
g
G Y 4 18 193 160 K Y 16 22 341 404 Y R 2 14 76 75 K B 6 13 13 30 G K 13 11 46 52 R
oute
Y G 2 21 57 58 Y B 11 27 85 83 K Y 13 14 400 399 G Y 9 10 80 143 G B 9 17 74 58 A
udio
Y K 7 12 80 80
6/05/01 12
Experiment #1: Control Execution of Mobile Code
FastestIntermediate
Node
SlowestIntermediate
Node
Destinationnode
Sendingnode
Maliciouspacket
Goodpackets
Goodpackets
Malicious Packet dropped too late (CPU use reached TTL + tolerance)
Needed execution time
CPU time “stolen”
TTL
Good packet dropped early(CPU use reached TTL + tolerance)
TTL
CPU time possibly “wasted” Additional CPU
time neededWhen mobile code CPU usage controlled with fixed allocation or TTL, malicious or “buggy” mobile programs can “steal” substantial CPU cycles, especially on fast nodes
When mobile code CPU usage controlled with fixed allocation or TTL, correctly coded mobile programs can be terminated too soon on slow nodes, wasting substantial CPU cycles
Goals: (1) Show reduced CPU usage by terminating malicious packets earlier AND (2) Show fewer terminations of good packets
6/05/01 13
CPU Control: Experiment Results
Metric Fixed TTL Model NIST CPU Model
Estimated Average CPU Utilization (ms/packet) Fast Intermediate Node
8.15 7.63
(6.3% estimated CPU time saved)
Good Packets Killed (all nodes) 72 (3%) 10 (0.4%)
Parameter Value
Good Packets 2278
Malicious Packets 379
Time-to-Live (ms) 39.87
NIST Model Threshold 99th percentile
36.35
Difference in Per Packet CPU Usage (TTL - NIST Model)
-1
-0.5
0
0.5
1
1.5
Measurement Interval
Savi
ngs
in C
PU U
se (m
s pe
r pac
ket)
Predicted vs. Measured
Predicted CPU time saved (ms/packet) [ANALYSIS]
0.5
Estimated CPU time saved (ms/packet) [EXPERIMENT]
0.52
Summary of Results
Experiment Parameters
Results vs. Prediction
6/05/01 14
Experiment #2: Predict CPU Usage among Heterogeneous Network Nodes
AVNMP AA
SourceNode
FastestIntermediate
Node
DestinationNode
SlowestIntermediate
Node
Active Audio CPU ModelGeneration
Magician AAs
500000 1 1́06 1.5 1́06 2 1́06 2.5 1́06 3 1́06WallclockHmSL
50
100
150
200
Prediction Error Accuracy
Logical
ProcessActiveAudio
AA
Magician EEMIB
AVNMP LPs predictnumber of messagesand CPU use and update predicted MIB values on nodes
Magician EEs update actual MIB values for CPU use andnumber of messages
Active Audio CPU usage model injected into AVNMP LP on each node
Logical
Process
CPUModel
Driver
Process
MessageModel
CPUModel
ActiveAudio
AA
AVNMP LPs determineprediction error, compareagainst tolerance, initiaterollbacks, and display graphs
VirtualMessages
Real Messages
AVNMP AA
SourceNode
FastestIntermediate
Node
DestinationNode
SlowestIntermediate
Node
Active Audio CPU ModelGeneration
Magician AAs
500000 1 1́06 1.5 1́06 2 1́06 2.5 1́06 3 1́06WallclockHmSL
50
100
150
200
Prediction Error Accuracy
500000 1 1́06 1.5 1́06 2 1́06 2.5 1́06 3 1́06WallclockHmSL
50
100
150
200
Prediction Error Accuracy
Logical
ProcessActiveAudio
AA
ActiveAudio
AA
Magician EEMIB
AVNMP LPs predictnumber of messagesand CPU use and update predicted MIB values on nodes
Magician EEs update actual MIB values for CPU use andnumber of messages
Active Audio CPU usage model injected into AVNMP LP on each node
Logical
Process
CPUModel
Logical
Process
CPUModel
Driver
Process
MessageModel
CPUModel
ActiveAudio
AA
ActiveAudio
AA
AVNMP LPs determineprediction error, compareagainst tolerance, initiaterollbacks, and display graphs
VirtualMessages
Real Messages
Goals: (1) Show improved look ahead into virtual time AND (2) Show fewer tolerance rollbacks in the simulation
6/05/01 15
CPU Prediction: Experiment Results
Look Ahead - NIST Model vs. TTL
0
100
200
300
400
500
600
700
800
1 5 9 13 17 21 25 29 33 37 41 45 49 53 57 61 65 69 73 77 81 85 89 93
Measurement Interval
Lo
gic
al V
irtu
al T
ime
(sec
s)
NIST Model
TTL
Tolerance Rollbacks NIST vs. TTL
0
10
20
30
40
50
60
70
80
90
100
1 4 7
10
13
16
19
22
25
28
31
34
37
40
43
46
49
52
55
58
61
64
67
70
73
76
79
82
85
88
91
Measurement Interval
Ro
llb
ack
s
NIST Model
TTL
TTL NIST Model
Metric Fast Node
Slow Node
Fast Node
Slow Node
Maximum Look-ahead (s)
265 0 437 28
Tolerance Rollbacks
93 47 67 20
Fast Node Fast Node
Summary of ResultsTTL NIST Model
Parameter Fast Node
Slow Node
Fast Node
Slow Node
Avg. CPU Time
(ms and ccs)
7
2,340,750
7
693,000
3
900,000
16.5
1,633,478
Error Tolerance
+-10% (ccs) 234,075 69,300 90,000 163,347
Avg. Measurement Interval (s) 8.8 12.1 10.1 7
Experiment Parameters
6/05/01 16
Publications Since June 2000PapersY. Carlinet, V. Galtier, K. Mills, S. Leigh, A. Rukhin, “Calibrating an Active Network
Node,” Proceedings of the 2nd Workshop on Active Middleware Services, ACM, August 2000.
V. Galtier, K. Mills, Y. Carlinet, S. Leigh, A. Rukhin, “Expressing Meaningful Processing Requirements among Heterogeneous Nodes in an Active Network,” Proceedings of the 2nd International Workshop on Software Performance, ACM, September 2000.
V. Galtier, K. Mills, Y. Carlinet, S. Bush, and A. Kulkarni, “Predicting and Controlling Resource Usage in a Heterogeneous Active Network”, accepted by 3rd International Workshop on Active Middleware Services, ACM, August 2001.
V. Galtier, K. Mills, Y. Carlinet, S. Bush, and A. Kulkarni, “Predicting Resource Demand in Heterogeneous Active Networks”, accepted by MILCOM 2001, October 2001.
DissertationV. Galtier, Toward finer grain management of computational resources in heterogeneous
active networks (Vers une gestion plus fine des ressources de calcul des réseaux actifs hétérogènes), Henri Poincaré University Nancy I, Advisors: André Schaff and Laurent Andrey, projected graduation October 2001.
Papers available on the project web site: http://w3.antd.nist.gov/active-nets/
6/05/01 17
Future Research (& Failures)
Improve Black-box Model (recent failures) Space-Time Efficiency Account for Node-Dependent Conditions Characterize Error Bounds
Investigate Alternate Models White-box Model (currently underway) Lower-Complexity Analytically Tractable Models (original failure) Models that Learn
Investigate Prediction based on Competition Run and Score Competing Predictors for Each Application Reinforce Good Predictors Use Prediction from Best Scoring Model
6/05/01 18
Additional Work Since June 2000Investigating white-box approach to model CPU needs for mobile code
Calibrated CPU Usage
EE Function Avg. Std Dev 99th P
deliverToApp t1 s1 p1
getAddress t2 s2 p2
getCache t3 s3 p3
getDst t4 s4 p4
intValue t5 s5 p5
routeForNode t6 s6 p6
Integer f = (Integer)n.getCache().get(getDst()) if (f != null) { next = f.intValue(); if (n.getAddress() != getDst()) { return n.routeForNode(this, next); } else { return n.deliverToApp(this, dpt); }
Active ApplicationCalibrate EE by Functions
Active Application ModelL = delay (t3 + t4) if (c1) {delay (t5 + t2 + t4) if (c2) delay (t6)} else {delay (t1)}
For each arriving packet,determine conditionsand sum delays basedon EE function calibration.
Statistic AA1 AA2 AA3 AA4 AA5
AveragePredicted 205 164 175 183 333
Measured 150 172 197 250 334
Standard DeviationPredicted 163 61 122 240 96
Measured 156 154 165 351 200
99th PercentilePredicted 318 224 239 484 500
Measured 475 224 255 1455 1493
Some Preliminary Results (using the PLAN EE) – all times are us/packet