Time-Predictable Execution of Embedded Software on Multi-core Platforms. Sudipta Chattopadhyay under the guidance of A/P Abhik Roychoudhury. Embedded Systems. Real-time Constraints. Hard real-time. Embedded system. Soft real-time. Timing Analysis.
TIME-PREDICTABLE EXECUTION OF EMBEDDED SOFTWARE ON MULTI-CORE PLATFORMS
Sudipta Chattopadhyay
under the guidance of A/P Abhik Roychoudhury
1
EMBEDDED SYSTEMS
2
REAL-TIME CONSTRAINTS
3
Embedded system
Hard real-time
Soft real-time
TIMING ANALYSIS
Hard real-time systems require absolute timing guarantees
System-level analysis
Single-task analysis
Worst-case execution time (WCET) analysis
An upper bound on execution time for all possible inputs
A sound over-approximation is obtained by static analysis
4
WCET ANALYSIS
Program
Control flow graph
Micro-architectural modeling
WCET of basic blocks
Infeasible path constraints
Loop bound constraints
Path analysis
WCET bound
5
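The path-analysis step can be illustrated with a minimal sketch. It assumes a loop-free control flow graph and made-up per-block WCETs, and it uses a plain longest-path search rather than the ILP formulation named later in the slides; the names `CFG`, `BLOCK_WCET` and `wcet_bound` are hypothetical.

```python
from functools import lru_cache

# Hypothetical loop-free CFG: block -> successors.
CFG = {"entry": ["then", "else"], "then": ["exit"], "else": ["exit"], "exit": []}
# Assumed per-block WCETs in cycles, as micro-architectural modeling would supply.
BLOCK_WCET = {"entry": 5, "then": 20, "else": 8, "exit": 3}


def wcet_bound(cfg, block_wcet, start):
    """Cost of the most expensive path from `start` in an acyclic CFG."""

    @lru_cache(maxsize=None)
    def longest(block):
        # Worst continuation among the successors (0 for an exit block).
        tail = max((longest(s) for s in cfg[block]), default=0)
        return block_wcet[block] + tail

    return longest(start)


print(wcet_bound(CFG, BLOCK_WCET, "entry"))  # 5 + 20 + 3 = 28
```

Loops and infeasible paths are exactly what this sketch leaves out; they are the reason the flow is expressed as constraints in the actual analysis.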
ARCHITECTURE
Core 1 Core n
L1 cache L1 cache
Shared L2 cache
Memory
Shared bus
Resource sharing
6
OVERVIEW
7
Dissertation work (time-predictable execution in multi-core)
Unified cache
Shared cache
Shared cache + shared bus
A multi-core WCET tool
Cache related preemption delay analysis
Coherence miss modeling
Shared scratchpad allocation
Core 1 Core n
L1 cache L1 cache
Shared L2 cache
Memory
Shared bus Resource sharing
Main Memory
L1 instruction cache
Instr. accesses
Data accesses
Bus
L1 data cache
L2 unified cache
Processor
Conflicts with different instruction and data memory
blocks
MICRO-ARCHITECTURAL MODELING
Single core: pipeline, cache, branch predictor
Multi core: shared cache, shared bus
8
COMPARISON
9
Work | Micro-arch. level technique | Program level technique | Precision | Scalability
Classical abstract interpretation (AI) | AI | AI | × | √
Classical model checking (MC) | MC | MC | √ | ×
RTS’00 (aiT, Chronos) | AI | Integer linear programming | Can be improved | √
RTSS’10 | AI | MC | Can be improved | _
Our approach | AI+MC | Integer linear programming | > RTS’00 | = RTS’00
Our approach | AI+MC | MC | > RTSS’10 | = RTSS’10
IMPRECISION IN ABSTRACT INTERPRETATION
p1 p2
Cache state = C1
Cache state = C2
Joined Cache state = C3
10
Abstract cache set (p1), young end first: a, b
Abstract cache set (p2), young end first: b, x
Joined cache state: b
Path p1 or path p2?
Joined cache state loses information about paths p1 and p2
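The information loss at a join can be made concrete with a small sketch of the usual must-analysis join for LRU caches (keep only blocks guaranteed on both paths, with the older age bound). The ages assigned to a, b and x are assumptions chosen to mirror the slide, and `join_must` is a hypothetical helper name.

```python
def join_must(c1, c2):
    """Join two abstract must-cache states (block -> upper bound on LRU age).

    A block survives only if both incoming states guarantee it,
    and it keeps the larger (more pessimistic) age bound.
    """
    return {b: max(c1[b], c2[b]) for b in c1.keys() & c2.keys()}


# Assumed 2-way set, age 0 = youngest.
c1 = {"a": 0, "b": 1}  # state C1 after path p1
c2 = {"b": 0, "x": 1}  # state C2 after path p2
c3 = join_must(c1, c2)
print(c3)  # {'b': 1}
```

After the join, neither a nor x is known to be cached, although each one is certainly cached on its own path.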
MODEL CHECKING ALONE ?
A path-sensitive search
Path-sensitive search is expensive: path explosion
Worse, combined with possible cache states
p1 p2
Cache state = C1
Cache state = C2
11
MODEL CHECKING ALONE ?
A path-sensitive search
Path-sensitive search is expensive: path explosion
Worse, combined with possible cache states
p1 p2
12
[Figure: abstract LRU cache sets holding a, b and x (young end marked), one per explored path through p1 and p2]
State explosion
CACHE ANALYSIS
Program
Micro-architectural modeling: cache analysis by abstract interpretation, pipeline analysis, branch predictor modeling
Analysis outcome
Refine by model checker (until all checked or timeout)
WCET of basic blocks
Infeasible path constraints
Loop bound constraints
Path analysis (IPET)
13
Refinement by the model checker can be terminated at any point
Model checker refinement steps are inherently parallel
Each model checker refinement step checks a lightweight assertion property
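The "terminate at any point" property can be sketched as a budgeted loop. This is only an illustration under assumptions: `check_hit` is a hypothetical stand-in for one model-checker call, and the access names are made up.

```python
import time


def refine(unclassified, check_hit, budget_s):
    """Try to upgrade unclassified accesses to 'hit' until the time budget runs out.

    check_hit(access) is a hypothetical stand-in for one model-checker run;
    it returns True only when the access is proven to be a cache hit.
    """
    result = {access: "unclassified" for access in unclassified}
    deadline = time.monotonic() + budget_s
    for access in unclassified:
        if time.monotonic() >= deadline:
            break  # stopping early is safe: the rest keep the conservative answer
        if check_hit(access):
            result[access] = "hit"
    return result


proven = {"m1", "m3"}  # accesses the stand-in checker can prove
outcome = refine(["m1", "m2", "m3"], lambda a: a in proven, budget_s=1.0)
print(outcome)
```

Each call to `check_hit` is independent of the others, which is why the steps could also be dispatched in parallel; the sequential loop is only the simplest way to show the early exit.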
REFINEMENT (INTER-CORE)
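14
[Figure: a task with two accesses to m between start and exit; a conflicting task with accesses m1 and m2 guarded by the branches x < y and x == y; the cache set holding m (young end marked). Labels: cache hit, cache miss, infeasible, spurious.]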
REFINEMENT (INTER-CORE)
[Figure: the same task and conflicting task; the conflicting accesses m1 and m2, guarded by x < y and x == y, are each instrumented with C_m++ (increment conflict). Labels: infeasible; assert (C_m <= 1) verified; m is a cache hit.]
15
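A minimal sketch of why the assertion holds: the conflicting task is modeled as a function that counts conflicts, and the property is checked exhaustively over a small assumed input range. The brute-force loop stands in for the model checker, and the binding of m1 to `x < y` and m2 to `x == y` is an assumption read off the diagram.

```python
from itertools import product


def conflicts_on_m(x, y):
    """Count accesses of the conflicting task that map to m's cache set."""
    c_m = 0
    if x < y:
        c_m += 1  # access to m1: C_m++
    if x == y:
        c_m += 1  # access to m2: C_m++
    return c_m


# Exhaustive check over a small assumed input range, in place of a model checker.
worst = max(conflicts_on_m(x, y) for x, y in product(range(-4, 5), repeat=2))
assert worst <= 1  # mirrors assert (C_m <= 1)
print(worst)  # 1
```

The path that would increment the counter twice needs `x < y` and `x == y` at once, so it never occurs; a path-insensitive count would still report two conflicts.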
REFINEMENT (WHY IT WORKS?)
16
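[Figure: path 2 with accesses to m and a conflicting access m’ (conflict to m) instrumented with C_m++ (increment conflict); branches x < y and x == y. Labels: cache miss; property assert (C_m <= 0); statements that do not affect the value of C_m.]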
EXPERIMENTAL SETUP (CHRONOS TOOLKIT)
17
C source → GCC (SimpleScalar) → binary code → CFG
Micro-architectural modeling: cache, pipeline, branch prediction
Micro-architectural constraints + flow constraints → ILP → WCET
CBMC (C bounded model checking)
EXPERIMENTAL RESULT
18
EXPERIMENTAL RESULT
19
L1 cache L1 cache
Shared L2 cache
WCET
4-way associative, 8 KB
Direct-mapped, 256 bytes
Average time = 70 secs
Tasks
cnt
jfdctint
edn
fir
fdct
ndes
EXTENSION USING SYMBOLIC EXECUTION
[Figure: the instrumented conflicting task (C_m++ at m1 and m2, assert (C_m <= 1)) explored by symbolic execution. The execution tree branches on x < y versus x ≥ y and then on x = y. The constraint solver answers NO for the path condition x < y ∧ x = y, so assert (C_m <= 1) is satisfied. Labels: unknown, abort.]
20
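The exploration can be sketched as follows. Everything here is a simplification under assumptions: paths are enumerated up front instead of being pruned incrementally, and a brute-force search over a small domain plays the role of the constraint solver. `BRANCHES`, `DOMAIN`, `feasible` and `explore` are hypothetical names.

```python
from itertools import product

# Branch conditions of the conflicting task; each taken branch performs C_m++.
BRANCHES = [
    ("x < y", lambda x, y: x < y),
    ("x == y", lambda x, y: x == y),
]
DOMAIN = range(-3, 4)  # assumed small input domain for the toy solver


def feasible(path):
    """Toy solver: is there an (x, y) in DOMAIN matching every branch decision?"""
    return any(
        all(pred(x, y) == taken for (_, pred), taken in zip(BRANCHES, path))
        for x, y in product(DOMAIN, repeat=2)
    )


def explore():
    """Map every feasible combination of branch outcomes to its final C_m."""
    results = {}
    for path in product([True, False], repeat=len(BRANCHES)):
        if feasible(path):
            results[path] = sum(path)  # number of taken branches = C_m
    return results


paths = explore()
print(paths)
assert all(c_m <= 1 for c_m in paths.values())
```

The combination (True, True) is absent from `paths` because no input satisfies both conditions, which corresponds to the solver answering NO on the slide.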
EXTENSION USING KLEE
21
C source → GCC (SimpleScalar) → binary code → CFG
Micro-architectural modeling: cache, pipeline, branch prediction
Micro-architectural constraints + flow constraints → ILP → WCET
CBMC/KLEE
A GENERIC FRAMEWORK
Three different architectural/application settings
Intra task (WCET in single core): cache conflict within one task
Inter task (cache related preemption delay analysis): cache conflict between a high priority and a low priority task
Inter core (WCET in multi-core): cache conflict in the shared L2 cache between a task in Core 1 and a task in Core 2
22
MICRO-ARCHITECTURAL MODELING
Single core: pipeline, cache, branch predictor
Multi core: shared cache, shared bus
23
TASK-LEVEL INTERFERENCE
Timeline
T3
T2
T1
T1
T2
T3
Task interference graph
24
Core 1 Core n
L1 cache L1 cache
Shared L2 cache
T1 T2 T3
Shared bus
Tasks
SHARED CACHE + TDMA SHARED BUS
[Figure: timeline of tasks T1 to T4 over alternating TDMA bus slots (Core 1 slot, Core 2 slot). Labels: L2 miss due to T2; disjoint lifetime (T4); WAIT for bus access.]
25
Architecture: Core 1 and Core 2, each with an L1 cache; shared L2 cache; shared bus
Task graphs: T1, T2, T3, T4
Time Division Multiple Access (TDMA)
OVERVIEW OF THE FRAMEWORK
Per core: L1 cache analysis → filter → L2 cache analysis
Initial interference → L2 conflict analysis → bus aware analysis → WCRT computation
Interference changes? Yes: repeat with the updated interference. No: estimated WCRT
Task interference monotonically decreases
26
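The iteration can be sketched with a toy model. All of it is assumed for illustration: the task set, the fixed penalty per interfering task, and the lifetime rule (earliest start from BCETs, latest finish from WCETs); the bus is ignored. The point is only the loop shape: start from full interference, recompute lifetimes, drop pairs whose lifetimes are disjoint, and stop when nothing changes.

```python
# Hypothetical task set: name -> (core, predecessors, BCET, base WCET).
TASKS = {
    "T1": (1, [], 2, 4),
    "T2": (2, [], 20, 22),
    "T3": (1, ["T1"], 3, 5),
    "T4": (2, ["T2"], 3, 5),
}
PENALTY = 2  # assumed extra cycles per interfering task (shared L2 conflicts)


def lifetimes(wcet):
    """Earliest start and latest finish per task (TASKS is in topological order)."""
    est, lft = {}, {}
    for t, (_core, preds, _bcet, _base) in TASKS.items():
        est[t] = max((est[p] + TASKS[p][2] for p in preds), default=0)
        lft[t] = max((lft[p] for p in preds), default=0) + wcet[t]
    return est, lft


def analyse():
    # Start pessimistic: every task on another core interferes.
    interf = {t: {u for u in TASKS if TASKS[u][0] != TASKS[t][0]} for t in TASKS}
    while True:
        wcet = {t: TASKS[t][3] + PENALTY * len(interf[t]) for t in TASKS}
        est, lft = lifetimes(wcet)
        # Keep only pairs whose [est, lft] windows still overlap.
        new = {t: {u for u in interf[t] if est[u] < lft[t] and est[t] < lft[u]}
               for t in TASKS}
        if new == interf:
            return wcet, max(lft.values())
        interf = new


wcet, wcrt = analyse()
print(wcet, wcrt)
```

Because sets only ever shrink and WCETs only ever drop, the loop terminates, which is the monotonicity argument from the slide in miniature.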
EVALUATION (2-CORE)
One core runs statemate; the other core runs the program under evaluation
27
EVALUATION (4-CORE)
The 4 cores run either (edn, adpcm, compress, statemate) or (matmult, fir, jfdctint, statemate)
28
MICRO-ARCHITECTURAL MODELING
Single core: pipeline, cache, branch predictor
Interactions
Multi core: shared cache, shared bus
29
TIMING ANOMALY (SHARED CACHE)
[Figure: alternative execution scenarios shown as sequences of cache hits and misses]
May not be the worst case path
30
BASELINE ABSTRACTION – TIMING INTERVAL
Representing each pipeline stage as a timing interval
[Figure: five instructions passing through the pipeline stages IF, ID, EX, WB, CM. Example instructions: R1 := R2 + 5; R5 := R1 * R7; R3 := R5 * 5. Labels: structural dependency, contention.]
A fixed-point analysis derives the timing of each stage as an interval
31
Example: start = [3,7], latency = [1,3], finish = [4,10]
End = Start + cache miss latency interval
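The interval arithmetic behind the example on this slide is simple enough to write down. The sketch below only covers the addition of two closed intervals; `add_intervals` is a hypothetical helper name, and the numbers are the ones shown on the slide.

```python
def add_intervals(a, b):
    """Sum of two closed intervals, each given as a (lo, hi) tuple."""
    return (a[0] + b[0], a[1] + b[1])


start = (3, 7)    # earliest and latest start time of the stage
latency = (1, 3)  # best-case and worst-case latency
finish = add_intervals(start, latency)
print(finish)  # (4, 10)
```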
TDMA SHARED BUS ANALYSIS
Time Division Multiple Access (TDMA)
Offset abstraction
[Figure: bus schedule alternating Core 0 and Core 1 slots, grouped into rounds. Labels: T (core 1), T’ (core 0), offset, round, delay, delay = 0.]
32
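How an offset within the round translates into a bus delay can be sketched as below. The schedule shape is an assumption (equal slots, core `i` owns the `i`-th slot of every round, an access must complete inside one slot), and `tdma_delay` with its parameters is a hypothetical name.

```python
def tdma_delay(offset, core, n_cores, slot_len, access_len=1):
    """Cycles a bus request waits before it can start, given its offset in the round.

    Assumes equal slots, core i owning the i-th slot of each round, and
    access_len <= slot_len so that an access always fits into a fresh slot.
    """
    round_len = n_cores * slot_len
    offset %= round_len
    slot_start = core * slot_len
    slot_end = slot_start + slot_len
    if slot_start <= offset and offset + access_len <= slot_end:
        return 0  # request arrives inside its own slot and fits
    if offset < slot_start:
        return slot_start - offset  # own slot is still ahead in this round
    return round_len - offset + slot_start  # wait for the slot in the next round


print(tdma_delay(3, core=0, n_cores=2, slot_len=10))  # 0
print(tdma_delay(3, core=1, n_cores=2, slot_len=10))  # 7
```

Since the delay depends only on the offset modulo the round length, tracking offsets instead of absolute times is enough for the analysis.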
LOOP CONSTRUCT
How do we define bus context?
[Figure: pipeline stages IF, ID, EX, WB, CM for instructions of the previous iteration and the current iteration, linked by cross-iteration edges]
Property: If the bus offsets of the cross-iteration edges do not change, WCET of the loop iteration cannot change
33
LOOP CONSTRUCT
Bus context flow graph
C1
C2
C3
C4
C5 C3C5
Property: If Ci = Cj, then Ci+k = Cj+k for any k > 0
34
Ci = bus context of the loop body at i-th iteration
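The repetition property suggests a simple way to enumerate contexts: follow them iteration by iteration until one recurs. The sketch below simplifies a bus context to a single offset at the start of the loop body and uses a made-up iteration time; `context_sequence` and its parameters are hypothetical.

```python
def context_sequence(first_offset, iteration_time, round_len, loop_bound):
    """Bus offsets at the start of each iteration, up to the first repeat.

    Returns (contexts, back), where back is the index of the context that
    the sequence returns to, or None if no repeat occurs within loop_bound.
    """
    contexts, seen = [], {}
    offset = first_offset % round_len
    for i in range(loop_bound):
        if offset in seen:
            return contexts, seen[offset]
        seen[offset] = i
        contexts.append(offset)
        offset = (offset + iteration_time(offset)) % round_len
    return contexts, None


# Hypothetical loop body: 7 cycles, plus 3 cycles of bus waiting at odd offsets.
seq, back = context_sequence(
    0, lambda off: 7 + (3 if off % 2 else 0), round_len=8, loop_bound=100
)
print(seq, back)  # [0, 7, 1, 3, 5] 1
```

Here the first context is visited once and the remaining four form a cycle, so five contexts describe all 100 iterations.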
LOOP CONSTRUCT
C1
C2
C3
C4
Compute WCET for each bus context
E(C1) = number of times context C1 is executed
Generate linear constraints:
E(C1) + E(C2) + E(C3) + E(C4) ≤ loop bound
E(C1) ≥ E(C2)
Bus context flow graph
35
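To see how the two constraints bound the loop's cost, here is a brute-force sketch that maximizes the total time over all execution counts. The per-context WCETs and the loop bound are made up, only the two constraints shown on the slide are encoded, and exhaustive search stands in for an ILP solver.

```python
from itertools import product

WCET = {"C1": 10, "C2": 16, "C3": 12, "C4": 9}  # assumed per-context WCETs (cycles)
LOOP_BOUND = 6                                  # assumed loop bound


def worst_case():
    """Maximize sum(WCET[c] * E(c)) subject to the two constraints on the slide."""
    best_cost, best_counts = -1, None
    for counts in product(range(LOOP_BOUND + 1), repeat=len(WCET)):
        e = dict(zip(WCET, counts))
        if sum(counts) > LOOP_BOUND or e["C1"] < e["C2"]:
            continue  # violates E(C1)+...+E(C4) <= loop bound or E(C1) >= E(C2)
        cost = sum(WCET[c] * e[c] for c in WCET)
        if cost > best_cost:
            best_cost, best_counts = cost, e
    return best_cost, best_counts


print(worst_case())  # (78, {'C1': 3, 'C2': 3, 'C3': 0, 'C4': 0})
```

Without E(C1) ≥ E(C2) the bound would be 6 × 16 = 96; the flow constraint from the context graph is what brings it down to 78.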
loop bound
Program
Control flow graph
Micro-architectural modeling
WCET of basic blocks
Infeasible path constraints
Loop bound constraints
Path analysis
ILP solver
ILP = Integer Linear Programming
BRANCH PREDICTION + CACHE
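[Figure: speculative execution after a branch location, up to the maximum number of speculated instructions, with accesses to m and m’; cache contents of the two outcomes meet at a JOIN. Labels: unclear cache access, cache conflict.]
36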
EXPERIMENTAL SETUP (CHRONOS TOOLKIT)
C source → GCC (SimpleScalar) → binary code → CFG
Micro-architectural modeling: private cache, shared cache, shared bus, pipeline, branch prediction
Micro-architectural constraints + flow constraints → ILP → WCET
37
EVALUATION (CACHE + PIPELINE)
jfdctint
statemate
Imprecision of shared cache analysis
38
Core 1 Core 2
Vertically partitioned
Core 1
Core 2
Horizontally partitioned
EVALUATION (CACHE + PIPELINE + SPECULATION)
Imprecision of modeling speculation
39
EVALUATION (BUS + PIPELINE)
Imprecision of shared bus analysis
Imprecision of path analysis
40
RECAP
41
Dissertation work (time-predictable execution in multi-core)
Unified cache
Shared cache
Shared cache + shared bus
A multi-core WCET tool
Cache related preemption delay analysis
Coherence miss modeling
Shared scratchpad allocation
Core 1 Core n
L1 data cache
L1 data cache
Shared L2 cache
Memory
Shared bus
Coherence miss traffic
Stale data items
Core 1 Core n
L1 cache L1 cache
Shared L2 cache
High priority task
Low priority task
Cache conflict
Task
c
PE-0 PE-1 PE-N
SPM-0 SPM-1 SPM-N
Shared off-chip data bus
Off-chip memory
External Memory Interface
……
Fast on-chip communication media
PERSPECTIVE
42
Time-predictable execution in single-core
Time-predictable execution in multi-core
Resource sharing (cache and bus)
Data sharing (cache coherence)
Testing
Static analysis
Shared cache
Shared bus
Cache coherence
Customized hardware
Shared scratchpad
ARM Cortex A9 MPCore
Samsung Exynos
Nvidia Tegra II (smart phones)
Time Division Multiple Access
Aethereal network-on-chip
Sony PSP
IBM Cell
PERSPECTIVE
Functionality verification: SLAM (Microsoft), BLAST (UC Berkeley), MAGIC (CMU)
[Figure: abstraction refinement loop. Concrete domain, abstraction, property, verifier; the outcome is either verified or a spurious counter example, which triggers abstraction refinement.]
Quantitative verification
[Figure: concrete domain and abstract domain in abstract interpretation (AI); AI generates a quantitative property that may be spurious; path-sensitive verification; refinement.]
Anytime verification of quantitative properties
FUTURE WORK
44
Battery life
Mobile devices
[Figure: symbolic execution tree over the branches x < y and x == y (accesses m1, m2) with path conditions such as x < y ∧ x ≠ y, checking assert (C_m <= 1). Labels: abort, input.]
Symbolic execution: static performance analysis + testing
Performance testing
Energy analysis of software
Energy-aware software testing
Quantitative property, e.g. cache conflict
THANK YOU
45
My sincere thanks to all the Examiners and especially the anonymous Examiner 1 for his comment on symbolic execution.