A Unified WCET Analysis Framework for Multi-core Platforms
Sudipta Chattopadhyay, Chong Lee Kee, Abhik Roychoudhury
National University of Singapore
Timon Kelter, Peter Marwedel
TU Dortmund, Germany
Heiko Falk
Ulm University, Germany
RTAS 2012, Beijing
Timing Analysis
Hard real-time systems require absolute timing guarantees at two levels: system-level analysis and single-task analysis.
Worst-case execution time (WCET) analysis computes an upper bound on the execution time over all possible inputs.
A sound over-approximation is obtained by static analysis.
WCET Analysis
Figure: program → control flow graph; micro-architectural modeling → WCET of basic blocks. Path analysis (IPET) combines the WCET of basic blocks with loop bound constraints and infeasible path constraints.
IPET = Implicit Path Enumeration Technique
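To make IPET concrete, here is a minimal sketch under stated assumptions: a hypothetical CFG (a loop whose body is an if-then-else), made-up basic-block WCETs, a loop bound of 10, and an optional infeasible-path constraint `max_then`. IPET maximizes the sum of block WCET times execution count subject to flow constraints; a real tool passes this ILP to a solver, whereas this toy instance is small enough to solve by enumerating the free count variables.

```python
# Hypothetical basic-block WCETs in cycles (not taken from the slides).
COST = {"entry": 5, "header": 2, "then": 20, "else": 8, "exit": 3}
LOOP_BOUND = 10  # assumed loop bound constraint


def ipet_wcet(cost, loop_bound, max_then=None):
    """Solve a toy IPET problem by enumerating execution counts.

    Assumed CFG: entry -> header -> (then | else) -> header ... -> exit.
    Variables x_b = execution count of block b, with flow constraints
      x_entry = x_exit = 1, x_header = n + 1, x_then + x_else = n,
    loop bound constraint 0 <= n <= loop_bound, and an optional
    (hypothetical) infeasible-path constraint x_then <= max_then.
    Objective: maximize sum(cost[b] * x_b).
    """
    best_value, best_counts = -1, None
    for n in range(loop_bound + 1):
        for x_then in range(n + 1):
            if max_then is not None and x_then > max_then:
                continue  # path combination ruled out as infeasible
            counts = {"entry": 1, "header": n + 1, "then": x_then,
                      "else": n - x_then, "exit": 1}
            value = sum(cost[b] * counts[b] for b in cost)
            if value > best_value:
                best_value, best_counts = value, counts
    return best_value, best_counts


print(ipet_wcet(COST, LOOP_BOUND)[0])              # 230
print(ipet_wcet(COST, LOOP_BOUND, max_then=3)[0])  # 146
```

Adding the infeasible-path constraint lowers the bound from 230 to 146 cycles, which is exactly how such constraints tighten the WCET in IPET.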
Architecture
Figure: Core 1 to Core n, each with a private L1 cache; a shared L2 cache; a shared bus; memory.
Micro-architectural Modeling
Single core: pipeline, cache, branch predictor and their interactions
Multi-core: shared cache, shared bus
Rosen et al. RTSS’07; Li et al. RTSS’09; Chattopadhyay et al. SCOPES’10; Kelter et al. ECRTS’11
Unified Multi-core timing analysis
Timing Anomaly (shared Cache)
RTAS 2012, Beijing6
hit miss
hit hit missmiss
miss miss missmisshit hit hit hit
misshit
May not be the worst case path
Timing Anomaly (Shared Bus)
Figure: alternative execution paths annotated with minimum (delay_min) and maximum (delay_max) bus delays. The path that locally suffers delay_max may not be the worst-case path.
Background
Representing each pipeline stage as a timing interval
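Figure: execution graph whose nodes are the pipeline stages IF, ID, EX, WB and CM of five consecutive instructions; edges denote dependencies (e.g., structural dependencies), and stages may suffer contention.
Example instructions (R5 depends on R1, R3 depends on R5):
R1 := R2 + 5
R5 := R1 * R7
R3 := R5 * 5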
A fixed-point analysis derives the timing of each stage as an interval
Example: start = [3, 7], latency = [1, 3], finish = start + latency = [4, 10]
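The interval arithmetic in the example can be sketched as follows. This is only an illustration of the basic operations the fixed-point analysis iterates over; the `Interval` class and `ready_time` helper are hypothetical names, and contention between stages as well as the fixed-point iteration itself are deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Closed interval [lo, hi] of clock cycles."""
    lo: int
    hi: int

    def __add__(self, other):
        # Interval addition: best case + best case, worst case + worst case.
        return Interval(self.lo + other.lo, self.hi + other.hi)


def ready_time(pred_finishes):
    """Time at which all predecessor stages have finished (pointwise max).

    Simplification: contention between stages is ignored here.
    """
    return Interval(max(p.lo for p in pred_finishes),
                    max(p.hi for p in pred_finishes))


start = Interval(3, 7)
latency = Interval(1, 3)
finish = start + latency
print(finish)  # Interval(lo=4, hi=10)
```

A stage with several predecessors, such as ID of one instruction waiting for its own IF and for the previous ID, would take `ready_time` of their finish intervals as its start interval.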
Shared Cache + Pipeline
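Figure: private L1 cache and shared L2 cache.
Abstract interpretation classifies each access as hit, miss or unclear; the classification determines the latency interval added to the timing interval T:
L1 hit: T := T + [1, 1]
L1 miss, L2 hit: T := T + [miss1 + 1, miss1 + 1]
L1 miss, L2 unclear: T := T + [miss1 + 1, miss1 + miss2 + 1]
L1 unclear, L2 unclear: T := T + [1, miss1 + miss2 + 1]
hit latency = 1 cycle; miss1 = L1 cache miss penalty; miss2 = L2 (shared) cache miss penalty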
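The mapping from cache classifications to latency intervals can be written as one small function. It reproduces the four cases on this slide; the remaining combinations (e.g., L1 unclear with L2 hit) follow the same reasoning and are an extrapolation. It assumes the classifications come from a prior L1 and shared-L2 cache analysis; the function name is hypothetical.

```python
HIT_LATENCY = 1  # cycles, as stated on the slide


def access_latency(l1, l2, miss1, miss2):
    """Latency interval (lo, hi) of one memory access.

    l1, l2: classification at L1 and at the shared L2: "hit", "miss" or "unclear".
    miss1, miss2: L1 and L2 miss penalties in cycles.
    The L2 classification only matters when the L1 access may miss.
    """
    for c in (l1, l2):
        if c not in ("hit", "miss", "unclear"):
            raise ValueError(f"unknown classification {c!r}")
    lo = hi = HIT_LATENCY
    if l1 == "miss":       # L1 surely misses: the best case also pays miss1
        lo += miss1
        if l2 == "miss":   # ... and L2 surely misses too
            lo += miss2
    if l1 != "hit":        # L1 may miss: the worst case pays miss1
        hi += miss1
        if l2 != "hit":    # ... and L2 may miss as well
            hi += miss2
    return lo, hi


print(access_latency("miss", "unclear", 10, 100))  # (11, 111)
```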
Shared Bus Analysis
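Time Division Multiple Access (TDMA) bus; offset abstraction
Figure: TDMA rounds with alternating Core 0 and Core 1 slots. The bus delay of a request issued at time T (core 1) or T’ (core 0) depends only on its offset within the current round; a request whose offset falls inside the requesting core's own slot incurs delay = 0.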
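A minimal sketch of the offset abstraction, assuming a simple TDMA schedule in which each core owns one slot of `slot_len` cycles per round and every bus transfer of `access_len` cycles must fit into one slot. The parameter names and the slot layout are assumptions; the point is that the delay is a function of the offset only, so a set of possible offsets suffices to bound it.

```python
def tdma_delay(offset, core, slot_len, num_cores, access_len):
    """Cycles a bus request of `core` waits, given its offset in the round.

    Assumed schedule: core c owns [c * slot_len, (c + 1) * slot_len)
    in every round of num_cores * slot_len cycles.
    """
    round_len = slot_len * num_cores
    if not (0 <= offset < round_len and 0 < access_len <= slot_len):
        raise ValueError("invalid offset or access length")
    slot_start = core * slot_len
    slot_end = slot_start + slot_len
    if slot_start <= offset and offset + access_len <= slot_end:
        return 0                               # fits in the current own slot
    if offset < slot_start:
        return slot_start - offset             # own slot later in this round
    return round_len - offset + slot_start     # wait for own slot next round


def worst_delay(offsets, core, slot_len, num_cores, access_len):
    """Offset abstraction: bound the delay over all possible offsets."""
    return max(tdma_delay(o, core, slot_len, num_cores, access_len)
               for o in offsets)


def next_offsets(offsets, core, slot_len, num_cores, access_len):
    """Possible offsets right after the bus transfer completes."""
    round_len = slot_len * num_cores
    return {(o + tdma_delay(o, core, slot_len, num_cores, access_len)
             + access_len) % round_len for o in offsets}


# Two cores, 10-cycle slots, 4-cycle transfers, core 0 at offset 2 or 8.
print(worst_delay({2, 8}, 0, 10, 2, 4))   # 12
print(next_offsets({2, 8}, 0, 10, 2, 4))  # {4, 6}
```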
Shared bus + pipeline
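Figure: the incoming bus offset Oin of stage ID2 depends on the bus offsets of its predecessors IF2 (offset O1) and ID1 (offset O2), i.e., on which predecessor finishes last:
IF2 finishes after ID1: Oin = O1
ID1 finishes after IF2: Oin = O2
Order unknown (approximate timing by static analysis): Oin = O1 ∪ O2
Property: offset content decreases monotonically over different iterations.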
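The three cases can be captured by one rule: only predecessors that may finish last contribute their offsets. The sketch below assumes each predecessor is described by its finish-time interval and its outgoing offset set; the exact criterion used in the framework may differ, so treat `incoming_offsets` as a hypothetical illustration.

```python
def incoming_offsets(preds):
    """Incoming bus offsets of a stage with several predecessors.

    preds: list of (finish_lo, finish_hi, offsets) per predecessor.
    A predecessor may finish last unless its latest finish time lies
    before the earliest finish time of some other predecessor.
    """
    latest_earliest = max(lo for lo, _, _ in preds)
    result = set()
    for _, hi, offsets in preds:
        if hi >= latest_earliest:   # may be the last one to finish
            result |= offsets
    return result


O1, O2 = {1, 2}, {7}  # hypothetical offsets after IF2 and after ID1
print(incoming_offsets([(10, 12, O1), (5, 8, O2)]))  # IF2 surely last: {1, 2}
print(incoming_offsets([(3, 4, O1), (5, 8, O2)]))    # ID1 surely last: {7}
print(incoming_offsets([(5, 9, O1), (6, 8, O2)]))    # order unknown: {1, 2, 7}
```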
Loop Construct
Unrolling loop iterations (C1, C2, C3, …, C100) is expensive.
Bus contexts: Ci = bus context of the loop body at the i-th iteration
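Figure: bus context flow graph with contexts C1, C2, C3, C4; the context C5 of the next iteration satisfies C5 ≡ C3, so C4 flows back to C3.
Property: if Ci ≡ Cj, then Ci+k ≡ Cj+k for any k > 0.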
How do we define bus context?
Bus offsets of all pipeline stages of all instructions? There could be thousands of nodes.
Figure: bus context flow graph (C1 to C4)
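Figure: execution graphs (pipeline stages IF, ID, EX, WB, CM) of the previous iteration and the current iteration, connected by cross-iteration edges.
Property: if the bus offsets of the cross-iteration edges do not change, the WCET of the loop iteration cannot change. Hence the bus context only needs to contain the bus offsets of the cross-iteration edges.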
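Because identical contexts lead to identical successor contexts, the context flow graph can be built by analysing iterations only until a context repeats, instead of unrolling all loop iterations. The sketch assumes a deterministic per-iteration `transfer` function and contexts represented as hashable values (here frozensets of offsets); `shift_by_6` is a made-up transfer standing in for the real per-iteration bus analysis.

```python
def build_context_graph(initial, transfer, loop_bound):
    """Build a bus context flow graph without unrolling every iteration.

    initial:  bus context of the first iteration (hashable, e.g. a frozenset
              of offsets on the cross-iteration edges).
    transfer: analyses one iteration under a context and returns the
              context of the next iteration (assumed deterministic).
    Returns (contexts, back_edge_target): the distinct contexts in order and
    the index the last one flows back to, or None if the loop bound is
    reached before any context repeats.
    """
    contexts = [initial]
    index = {initial: 0}
    while len(contexts) < loop_bound:
        nxt = transfer(contexts[-1])
        if nxt in index:
            # Identical context: by the property, the sequence now cycles.
            return contexts, index[nxt]
        index[nxt] = len(contexts)
        contexts.append(nxt)
    return contexts, None


def shift_by_6(ctx):
    """Hypothetical transfer: each iteration shifts every offset by
    6 cycles within a 20-cycle TDMA round."""
    return frozenset((o + 6) % 20 for o in ctx)


ctxs, back = build_context_graph(frozenset({0}), shift_by_6, 100)
print(len(ctxs), back)  # 10 0
```

With a loop bound of 100, only 10 distinct contexts are analysed instead of 100 unrolled iterations.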
Figure: bus context flow graph with contexts C1 to C4
Compute the WCET for each bus context.
Generate ILP flow constraints, e.g.:
E(C1) + E(C2) + E(C3) + E(C4) ≤ loop bound
E(C1) ≥ E(C2)
where E(C1) = number of times context C1 is executed.
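For intuition about what these constraints capture, the sketch below computes the execution counts E(Ci) directly for a lasso-shaped context flow graph (a chain of contexts with one back edge) when the loop runs exactly `loop_bound` iterations, and then the loop WCET as the sum of E(Ci) times WCET(Ci). This is a simplification: the framework encodes the counts as ILP constraints and lets the solver maximize; the per-context WCETs below are made up.

```python
def context_counts(num_contexts, back_edge_target, loop_bound):
    """E(Ci) for each context when the loop runs loop_bound iterations.

    Contexts 0 .. back_edge_target-1 form the prefix; contexts
    back_edge_target .. num_contexts-1 form the cycle that repeats.
    """
    counts = [0] * num_contexts
    for i in range(loop_bound):
        if i < num_contexts:
            k = i  # prefix, or the first pass through the cycle
        elif back_edge_target is None:
            raise ValueError("loop bound exceeds the analysed contexts")
        else:
            cycle_len = num_contexts - back_edge_target
            k = back_edge_target + (i - back_edge_target) % cycle_len
        counts[k] += 1
    return counts


def loop_wcet(context_wcets, back_edge_target, loop_bound):
    counts = context_counts(len(context_wcets), back_edge_target, loop_bound)
    return sum(c * w for c, w in zip(counts, context_wcets))


# C1 -> C2 -> C3 -> C4 -> back to C3 (index 2), loop bound 10.
print(context_counts(4, 2, 10))             # [1, 1, 4, 4]
print(loop_wcet([50, 40, 30, 35], 2, 10))   # 350
```

The resulting counts satisfy both constraints on the slide: they sum to the loop bound and E(C1) ≥ E(C2).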
Branch prediction + Cache
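Figure: memory blocks m and m’ conflict in the cache. If the branch is correctly predicted, the later access to m is a cache hit. If the branch is incorrectly predicted, m’ is fetched speculatively and evicts m from the cache, so the later access to m is a cache miss.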
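The effect can be reproduced with a toy direct-mapped cache. The block numbers and the 4-set cache are hypothetical; m and m’ are chosen to map to the same set so that they conflict.

```python
def run(accesses, num_sets):
    """Simulate a direct-mapped cache; return "hit"/"miss" per access.

    Blocks are integers; block b maps to set b % num_sets.
    """
    cache = [None] * num_sets
    result = []
    for block in accesses:
        s = block % num_sets
        result.append("hit" if cache[s] == block else "miss")
        cache[s] = block  # the accessed block now occupies its set
    return result


NUM_SETS = 4
M, M_PRIME = 1, 5  # hypothetical blocks; both map to set 1

correct = run([M, M], NUM_SETS)                # branch correctly predicted
mispredicted = run([M, M_PRIME, M], NUM_SETS)  # wrong path fetches m'
print(correct[-1], mispredicted[-1])           # hit miss
```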
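Figure: at the branch location, the cache content without speculation is joined (JOIN) with the cache content after speculatively executing up to the maximum number of speculated instructions, which may bring m’ into the cache; the subsequent access to m therefore becomes an unclear cache access.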
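A minimal sketch of this JOIN, using a Ferdinand-style LRU must analysis for a single cache set as a stand-in abstract domain (the framework's actual cache domain may differ). The associativity of 2 and the wrong-path blocks `x1`, `x2` are hypothetical; the example shows how the maximum number of speculated instructions decides whether m stays a guaranteed hit.

```python
ASSOC = 2  # hypothetical associativity of the cache set


def must_access(state, block, assoc=ASSOC):
    """LRU must-analysis update: state maps block -> upper bound on its age."""
    old_age = state.get(block, assoc)
    new = {}
    for b, age in state.items():
        if b == block:
            continue
        new_age = age + 1 if age < old_age else age  # younger blocks age
        if new_age < assoc:                          # otherwise evicted
            new[b] = new_age
    new[block] = 0
    return new


def must_join(a, b):
    """Keep blocks guaranteed in both states, with the larger age bound."""
    return {blk: max(a[blk], b[blk]) for blk in a.keys() & b.keys()}


def at_branch(state, wrong_path, max_spec, assoc=ASSOC):
    """JOIN the state without speculation with the state after
    speculatively fetching up to max_spec wrong-path blocks."""
    spec = state
    for blk in wrong_path[:max_spec]:
        spec = must_access(spec, blk, assoc)
    return must_join(state, spec)


def classify(state, block):
    # Must analysis alone can only prove hits; everything else is unclear.
    return "hit" if block in state else "unclear"


start = {"m": 0}  # m was just accessed
print(classify(at_branch(start, ["x1", "x2"], 1), "m"))  # hit
print(classify(at_branch(start, ["x1", "x2"], 2), "m"))  # unclear
```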
Overall Picture
Figure: multi-core micro-architectural modeling (pipeline, cache, branch predictor, shared cache, shared bus) yields the WCET of basic blocks and bus context constraints; together with loop bound constraints and infeasible path constraints, these feed path analysis (IPET).
Experimental Setup (Chronos Toolkit)
Figure: C source → GCC (SimpleScalar) → binary code → CFG. Micro-architectural modeling (private cache, pipeline, branch prediction, shared cache, shared bus) produces micro-architectural constraints; together with flow constraints they form an ILP whose solution is the WCET.
Cache Sharing vs Cache Partitioning
Figure: a cache (dimensions 8 × 4) shared between 2 cores, compared with a vertically partitioned and a horizontally partitioned cache in which Core 1 and Core 2 each receive a separate partition.
Evaluation (cache + pipeline)
Figure: evaluation results (labelled benchmarks: jfdctint, statemate); annotation: imprecision of shared cache analysis
Evaluation (Cache + pipeline + Speculation)
Figure annotation: imprecision of modeling speculation
Evaluation (Bus + pipeline)
Figure annotations: imprecision of shared bus analysis; imprecision of path analysis
Evaluation (Bus + pipeline + Speculation)
Figure annotations: imprecision of shared bus analysis; imprecision of path analysis
Conclusion
A unified WCET analysis framework that handles the interaction of the shared cache and the shared bus with the pipeline and branch prediction.
Timing anomalies are possible; state explosion is avoided by the timing interval abstraction.
Detailed information about the tool and extensive results are available at: http://www.comp.nus.edu.sg/~rpembed/chronos-multi-core.html
Questions? Thank you.