Accurate Timing Analysis by Modeling Caches, Speculation and their Interaction
Xianfeng Li, Tulika Mitra, Abhik Roychoudhury
National University of Singapore
Why Timing Analysis?
Timing guarantees for real-time embedded systems
Real-time scheduling:
– Worst-case bound on execution time
– Tasks are guaranteed to be schedulable irrespective of inputs
Tight bound to avoid idle processor cycles
Extremely important for safety-critical systems
Worst Case Execution Time (WCET)
Maximum execution time of a program on a micro-architecture for all possible inputs
Measurement
– Execute program for all inputs: impractical
– Execute program for selected inputs to get a lower bound on WCET (Observed WCET)
Analysis
– Employ static analysis to compute an upper bound on WCET (Estimated WCET)
[Figure: Observed WCET ≤ Actual WCET ≤ Estimated WCET]
WCET Analysis
Program path analysis [Shaw’89, Healy’98, ...]
– Not all paths in the program are feasible
Micro-architectural modeling
– Dynamically variable instruction execution time
– Cache, Pipeline [Li’99, Theiling’00, Schneider’99, ...]
– Speculative execution (branch prediction) [Mitra’02]
– Combined modeling of cache + speculative execution
Speculative Execution
[Figure: execution timelines for branch B and its successors N and T under no speculative execution, correct prediction, and misprediction; the misprediction case marks the speculated path S and the misprediction penalty]
Cache + Speculation: Destructive Effect
[Figure: branch B with successors N and T and speculated path S, shown as cache and execution timelines]
Cache Miss 1: loading into cache from speculated path
Cache Miss 2: loading into cache from correct path
N and T map to the same cache block
Destructive Effect: Extra Cache Misses
Cache miss penalty (CMP) along speculative path
– Fully masked by branch misprediction penalty (BMP)
– Partially masked by BMP: wait for cache miss to be serviced before executing correct path
Cache miss penalty along correct path due to fetch along speculative path
[Figure: BMP and CMP timelines for the fully masked and the partially masked case]
Cache + Speculation: Constructive Effect
[Figure: branch B with successor N and speculated path S, shown as cache and execution timelines]
Cache Miss 1: loading into cache from speculated path
Cache Hit: correct block already loaded into cache
B and S map to the same cache block
How serious is the effect?
[Figure: bar chart of cache miss overhead, modeling with interaction / modeling w/o interaction, for matsum, matmult, bsearch, fdct, fft, dhry and whet; y-axis from -10 % to 20 %]
Technique: Integer Linear Programming
Integrate program analysis and micro-architectural modeling in an ILP framework [Li and Malik 1995]
Input:
– Control Flow Graph (CFG) of the program
– User-provided loop bounds, recursion depth, etc.
– Specification of micro-architecture
Objective function: Execution time (maximized)
Constraints
– Flow constraints from Control Flow Graph
– Constraints from micro-architectural modeling
ILP formulation of instruction cache + speculative exec.
Objective Function
$$\mathit{WCET} = \sum_{B} \bigl(\mathit{cost}_B \times \mathit{count}_B + \mathit{BMP} \times \mathit{misprediction}_B + \mathit{CMP} \times \mathit{miss}_B + \mathit{mp\_delay}_B\bigr)$$
$\mathit{cost}_B \times \mathit{count}_B$: execution time of basic block B without cache miss and branch misprediction
$\mathit{BMP} \times \mathit{misprediction}_B$: penalty due to mispredictions
$\mathit{CMP} \times \mathit{miss}_B$: penalty due to cache misses
– Includes constructive and destructive effect of speculation along correct path
$\mathit{mp\_delay}_B$: penalty due to partially masked cache misses along speculative path (variable CMP)
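As a small worked example of the objective function, the sketch below sums the four terms over basic blocks. The block names and every per-block number are made up for illustration. BMP = 5 and CMP = 10 cycles are the penalties quoted in the experimental setup of this talk.

```python
BMP = 5   # branch misprediction penalty in cycles (from the experimental setup)
CMP = 10  # cache miss penalty in cycles (from the experimental setup)

def wcet_objective(blocks):
    """Sum cost*count + BMP*misprediction + CMP*miss + mp_delay over all blocks."""
    return sum(
        b["cost"] * b["count"]
        + BMP * b["misprediction"]
        + CMP * b["miss"]
        + b["mp_delay"]
        for b in blocks.values()
    )

# Hypothetical per-block values. In the actual analysis these are ILP variables
# whose values the solver picks so that the objective is maximized.
blocks = {
    "B1": {"cost": 5, "count": 11, "misprediction": 2, "miss": 1, "mp_delay": 5},
    "B2": {"cost": 20, "count": 111, "misprediction": 10, "miss": 3, "mp_delay": 0},
}

print(wcet_objective(blocks))
```

Here B1 contributes 55 + 10 + 10 + 5 = 80 cycles and B2 contributes 2220 + 50 + 30 + 0 = 2300 cycles.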
Flow Constraints: Easy !!
[Figure: control flow graph with basic blocks B1, B2, B3 and B4]
Inflow = Basic Block Execution Count = Outflow (bounds $\mathit{count}_B$)
$e_{S,1} + e_{3,1} = \mathit{count}_1 = e_{1,2} + e_{1,4}$
$e_{1,2} + e_{2,2} = \mathit{count}_2 = e_{2,3} + e_{2,2}$
$e_{2,3} + e_{4,3} = \mathit{count}_3 = e_{3,1} + e_{3,E}$
$e_{1,4} = \mathit{count}_4 = e_{4,3}$
Bound on maximum loop iterations
Loop bounds: $e_{2,2} \le 100$, $e_{3,1} \le 10$
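To make the flow constraints concrete, here is a minimal sketch that maximizes the sum of cost × count for this four-block CFG. It makes several assumptions that are not on the slide: the block costs are invented, the program is entered once (e_S1 = 1) and left once (e_3E = 1), the loop bounds are read as e2,2 ≤ 100 and e3,1 ≤ 10, and plain enumeration stands in for a real ILP solver.

```python
# Hypothetical execution cost of each basic block in cycles (not from the slides).
COST = {1: 5, 2: 20, 3: 4, 4: 8}

def block_counts(e12, e14, e22, e31):
    """Return block execution counts if the flow constraints hold, else None.

    Assumes the program is entered once (e_S1 = 1) and left once (e_3E = 1).
    """
    e_s1, e_3e = 1, 1
    e23 = e12  # from e12 + e22 = count2 = e23 + e22
    e43 = e14  # from e14 = count4 = e43
    count1 = e_s1 + e31
    count3 = e23 + e43
    if count1 != e12 + e14 or count3 != e31 + e_3e:
        return None
    return {1: count1, 2: e12 + e22, 3: count3, 4: e14}

def wcet_bound(max_e22=100, max_e31=10):
    """Maximize sum(cost_B * count_B) by enumerating all feasible edge counts."""
    best_time, best_counts = -1, None
    for e31 in range(max_e31 + 1):
        for e12 in range(e31 + 2):  # e12 + e14 must equal 1 + e31
            e14 = 1 + e31 - e12
            for e22 in range(max_e22 + 1):
                counts = block_counts(e12, e14, e22, e31)
                if counts is None:
                    continue
                time = sum(COST[b] * n for b, n in counts.items())
                if time > best_time:
                    best_time, best_counts = time, counts
    return best_time, best_counts

print(wcet_bound())
```

With these made-up costs the maximum is 2319 cycles, reached with count1 = count3 = 11, count2 = 111 and count4 = 0. An ILP solver reaches the same optimum without enumeration, which matters once the micro-architectural variables are added.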
Other Constraints
Branch misprediction constraints
– Bounds $\mathit{misprediction}_B$
– Details appeared in an earlier paper: Timing Analysis of Embedded Software for Speculative Processors. T. Mitra, A. Roychoudhury and X. Li. In ACM Intl. Symposium on System Synthesis (ISSS) 2002
Instruction cache miss constraints
– Bounds $\mathit{miss}_B$ [Li, Malik and Wolfe 1999]
Modeling Cache-Speculation Interaction
Modify instruction cache miss constraints to model constructive/destructive effect of speculation along correct path
Add additional constraints on $\mathit{mp\_delay}_B$: penalty due to partially masked cache misses along speculative path
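Modeling Instruction Cache
[Figure: control flow graph (S, B1, B2, B3, B4, E) next to the cache conflict graph for B1 and B3, with edges $p_{S,1}$, $p_{1,3}$, $p_{3,1}$ and $p_{3,E}$]
Cache Conflict Graph: flow among blocks mapping to the same cache line (here B1 and B3)
$p_{S,1} + p_{3,1} = \mathit{count}_1 = p_{1,3}$
$\mathit{miss}_1 = p_{S,1} + p_{3,1}$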
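The cache conflict graph constraints can be checked on a single concrete path. The sketch below is an illustration under assumptions: the path through the CFG is invented, B1 and B3 are taken to share one direct-mapped cache line as the figure labels suggest, and the helper names `conflict_flows` and `misses` are hypothetical.

```python
from collections import Counter

def conflict_flows(trace, conflicting):
    """Count p[(previous, current)] between consecutive accesses to one cache line."""
    flows = Counter()
    resident = "S"  # start node: the line holds none of the conflicting blocks yet
    for block in trace:
        if block in conflicting:
            flows[(resident, block)] += 1
            resident = block
    return flows

def misses(flows, block):
    """A block misses whenever the line was last filled by a different node."""
    return sum(n for (prev, cur), n in flows.items() if cur == block and prev != block)

# Hypothetical path: S -> B1 -> B2 -> B2 -> B3 -> B1 -> B4 -> B3 -> E
trace = ["B1", "B2", "B2", "B3", "B1", "B4", "B3"]
flows = conflict_flows(trace, conflicting={"B1", "B3"})
count1 = trace.count("B1")

print(dict(flows), misses(flows, "B1"))
```

On this path every execution of B1 is preceded by S or B3 in the conflict graph, so miss1 equals count1, which is what miss1 = pS_1 + p3_1 expresses.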
Constructive Effect of Speculation
[Figure: control flow graph B1 to B4 with N/T branch edges, showing the speculative path and the correct path; B1 and B3 map to the same cache line. The extra node B3 (2,T) on the speculative path misses, with a partially masked CMP, and the following B3 on the correct path becomes a hit instead of a miss]
miss3 will decrease by the amount of flow between B3 (2,T) and B3
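Destructive Effect of Speculation
[Figure: control flow graph B1 to B4 with N/T branch edges, showing the speculative path and the correct path; B2 and B4 map to the same cache line. The extra node B4 (1,N) on the speculative path misses, with a partially masked CMP, and the following B2 on the correct path becomes a miss instead of a hit]
miss2 will increase by the amount of flow between B4 (1,N) and B2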
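Both effects can be reproduced with a toy direct-mapped cache. This is a sketch under assumptions rather than the analysis itself: the traces are invented, the line mapping (B1/B3 on one line, B2/B4 on another) is read off the figure labels, and a speculative fetch is modeled as filling the cache without being counted as a correct-path miss.

```python
from collections import Counter

# Assumed cache line of each block: B1/B3 share one line, B2/B4 share another.
LINE_OF = {"B1": 0, "B3": 0, "B2": 1, "B4": 1}

def correct_path_misses(trace):
    """trace holds (block, is_speculative) pairs. Speculative fetches fill the
    cache, but only misses of correct-path blocks are counted."""
    resident = {}
    misses = Counter()
    for block, speculative in trace:
        line = LINE_OF[block]
        if resident.get(line) != block:
            resident[line] = block
            if not speculative:
                misses[block] += 1
    return misses

# Constructive: B3 is fetched speculatively, so the later real B3 hits.
plain_c = [("B1", False), ("B2", False), ("B2", False), ("B3", False)]
spec_c = [("B1", False), ("B2", False), ("B3", True), ("B2", False), ("B3", False)]

# Destructive: speculative B4 evicts B2, so the later real B2 misses again.
plain_d = [("B1", False), ("B2", False), ("B3", False), ("B1", False), ("B2", False)]
spec_d = plain_d[:4] + [("B4", True), ("B2", False)]

print(correct_path_misses(plain_c)["B3"], correct_path_misses(spec_c)["B3"])
print(correct_path_misses(plain_d)["B2"], correct_path_misses(spec_d)["B2"])
```

In the first pair the miss count of B3 drops from 1 to 0, and in the second the miss count of B2 rises from 1 to 2, matching the direction of the adjustments to miss3 and miss2.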
General Flow Involving Extra Nodes
[Figure: four cases (Case 1 to Case 4) of flow in the cache conflict graph between ordinary nodes n, n1 and extra nodes m (b,X), m1 (b,X), m2 (b1,Y) created for branches b, b1 and directions X, Y]
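Additional Constraints
[Figure: branch b followed by blocks B1, B2, ..., Bn along direction X, with BMP marked]
Assumes CMP > BMP
$$\mathit{count}(m_i(b,X)) = \mathit{misprediction}(b,X) - \sum_{k=1}^{i-1} \mathit{miss}(m_k(b,X))$$
$$\mathit{mp\_delay}(b,X) = \sum_{k=1}^{n} \mathit{miss}(m_k(b,X)) \times \mathit{delay}(m_k(b,X))$$
$$\mathit{delay}(m_i(b,X)) = \mathit{CMP} - \Bigl(\mathit{BMP} - \sum_{k=1}^{i-1} \mathit{cost}(m_k(b,X))\Bigr)$$
And some others ...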
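The delay and mp_delay formulas can be evaluated directly once the costs and miss counts of the speculative blocks are known. The sketch below assumes hypothetical block costs and miss counts, uses BMP = 5 and CMP = 10 cycles from the experimental setup, and assumes all listed blocks start inside the misprediction window. The function names are illustrative.

```python
def speculative_delays(costs, cmp_cycles, bmp_cycles):
    """delay(m_i) = CMP - (BMP - sum of cost(m_k) for k < i), one value per block."""
    delays, elapsed = [], 0
    for cost in costs:
        delays.append(cmp_cycles - (bmp_cycles - elapsed))
        elapsed += cost  # cycles already spent on the speculative path
    return delays

def mp_delay(miss_counts, delays):
    """mp_delay(b, X) = sum over k of miss(m_k) * delay(m_k)."""
    return sum(m * d for m, d in zip(miss_counts, delays))

# Hypothetical speculative path with two blocks of 2 cycles each.
delays = speculative_delays(costs=[2, 2], cmp_cycles=10, bmp_cycles=5)
# Hypothetical miss counts: 3 misses in the first block, 1 in the second.
total = mp_delay(miss_counts=[3, 1], delays=delays)

print(delays, total)
```

A miss in the first block overlaps the full 5-cycle misprediction penalty and leaves 5 cycles exposed. A miss in the second block starts 2 cycles later, so only 3 cycles are masked and 7 remain exposed.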
Benchmarks
Program | Description | Paths / Loops
matsum | Summation of two 100 * 100 matrices | S
matmult | Multiplication of two 10 * 10 matrices | S
isort | Insertion sort of 100-element array |
bsearch | Binary search of 100-element array |
fft | 1024-point Fast Fourier Transform | S
fdct | Fast Discrete Cosine Transform | S
dhry | Dhrystone benchmark | S
des | Data Encryption Standard |
whet | Whetstone benchmark | S
djpg | Decompress 128 * 96 color JPG image |
Experimental Methodology
Observed WCET: simulation
– SimpleScalar cycle-accurate architectural simulator
– In-order exec, No pipeline, No Data Cache misses
– Branch misprediction penalty = 5 cycles
– Cache miss penalty = 10 cycles
Estimated WCET: Prototype analyzer
– Input: benchmark in assembly code, micro-architecture parameters, loop bounds
– Output: ILP constraints
– Feed the constraints to CPLEX: a commercial ILP solver
Accuracy (Smaller Benchmarks)
Program | WCET Obs | WCET Est | WCET Ratio | Misprediction Est/Obs | Cache miss Est/Obs
matsum | 105K | 106K | 1.00 | 1.00 | 1.33
matmult | 25.1K | 25.6K | 1.02 | 1.05 | 1.03
isort | 48.6K | 48.8K | 1.00 | 1.02 | 1.02
bsearch | 506 | 546 | 1.07 | 1.25 | 1.06
fft | 8798 | 8803 | 1.00 | 1.00 | 1.00
fdct | 219K | 229K | 1.04 | 1.66 | 1.19
Accuracy (Larger Benchmarks)
Program | WCET Obs | WCET Est | WCET Ratio | Misprediction Est/Obs | Cache miss Est/Obs
dhry | 218.6K | 232.5K | 1.06 | 0.96 | 1.18
des | 87.4K | 96.4K | 1.10 | 2.54 | 1.07
whet | 545.5K | 581.5K | 1.06 | 2.81 | 1.29
djpg | 44.9 M | 65.2 M | 1.44 | 3.25 | 1.37
Scalability
[Figure: two plots for fft, dhry, des and whet with a time axis from 0.0 s to 3.0 s: one against predictor table size (16 to 1K), one against cache size (32 to 2K)]
Summary
Micro-architectural modeling is crucial for tight estimation of Worst Case Execution Time (WCET)
Existing methods typically focus on a single micro-architectural feature
– Cache
– Pipeline
– Speculation
A step towards combining micro-architectural features which affect each other
– Cache misses/hits due to speculation