Fast Simulation Techniques for
Design Space Exploration
Daniel Knorreck, Ludovic Apvrille, Renaud Pacalet
slide 2
Outline
• DIPLODOCUS basics
• New simulation strategy
• Case study MPEG decoder
• Conclusions and Future Work
slide 3
DIPLODOCUS basics
slide 4
DIPLODOCUS on one slide
Platform for efficient Design Space Exploration of SoCs
• Clear separation between
- Applications
- Architecture
- Mapping
• Data abstraction
• Control flow oriented
• Simulation and formal analysis on abstract models
Our environment is based on
• UML as modeling language
• LOTOS and UPPAAL for formal analysis
• SystemC/C++ for simulation
slide 5
Methodology
[Figure: methodology diagram with the stages application modeling, architecture modeling, mapping (DSE), and two occurrences of simulation and static analysis.]
slide 6
Toolkit: TTool
slide 7
DIPLODOCUS: Task Diagram
Declaration of a task
Event. Used for inter-task signaling.
Type may be infinite FIFO or finite FIFO. When a finite FIFO is
full, the oldest event is erased. Events may carry values.
Request. Used to spawn a task if an instance of that task
is not currently executing.
Channel. Channels do not convey values: they are meant to model a number of exchanged samples.
cha1 = channel name
Three channel types:
• BR-BW: Blocking Read, Blocking Write (= finite FIFO)
• BR-NBW: Blocking Read, Non-Blocking Write (= infinite FIFO)
• NBR-NBW: Non-Blocking Read, Non-Blocking Write (= shared memory)
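The three channel types can be illustrated with a small C++ sketch. It only counts samples, since channels carry no values. The names SampleChannel, ChannelType, writeWouldBlock and readWouldBlock are hypothetical and are not taken from TTool; treating the shared-memory type as never blocking and never counting samples is an assumption.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical names for illustration; this is not the TTool channel API.
enum class ChannelType { BR_BW, BR_NBW, NBR_NBW };

// A channel that only tracks how many samples are buffered, never their values.
class SampleChannel {
public:
    // capacity is only used by BR_BW (finite FIFO) channels.
    SampleChannel(ChannelType type, std::size_t capacity)
        : type_(type), capacity_(capacity) {}

    // A write blocks only on a finite FIFO that cannot hold n more samples.
    bool writeWouldBlock(std::size_t n) const {
        return type_ == ChannelType::BR_BW && stored_ + n > capacity_;
    }

    // A read blocks on both FIFO types when fewer than n samples are buffered.
    bool readWouldBlock(std::size_t n) const {
        return type_ != ChannelType::NBR_NBW && stored_ < n;
    }

    void write(std::size_t n) {
        assert(!writeWouldBlock(n));
        // Assumption: shared memory keeps no sample count.
        if (type_ != ChannelType::NBR_NBW) stored_ += n;
    }

    void read(std::size_t n) {
        assert(!readWouldBlock(n));
        if (type_ != ChannelType::NBR_NBW) stored_ -= n;
    }

    std::size_t stored() const { return stored_; }

private:
    ChannelType type_;
    std::size_t capacity_;
    std::size_t stored_ = 0;
};
```

A task model would call writeWouldBlock or readWouldBlock before issuing the operation and suspend itself when the answer is true.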
slide 8
DIPLODOCUS: Application Modeling
A behavior must be provided for each task
• UML activity diagram
• Usual control operators
- Loops
- Choices
• Channels
- Write x samples on a channel
- Read x samples from a channel
• Events
- Send, receive an event
- Test whether an event may be received
- Select between events
• Requests
- Send a request
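The event operations listed above (send, receive, test whether an event may be received) can be sketched for the finite-FIFO case, in which a full queue erases its oldest event. This is a minimal C++ sketch; the class FiniteEventFifo and its method names are hypothetical, and representing the carried value as a single int is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical model of a finite-FIFO event queue; not a TTool class.
class FiniteEventFifo {
public:
    explicit FiniteEventFifo(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }

    // Send an event carrying a value; if the FIFO is full,
    // the oldest pending event is erased first.
    void send(int value) {
        if (events_.size() == capacity_) {
            events_.pop_front();
        }
        events_.push_back(value);
    }

    // Test whether an event may be received right now.
    bool canReceive() const { return !events_.empty(); }

    // Receive the oldest pending event; the caller checks canReceive() first.
    int receive() {
        assert(canReceive());
        int value = events_.front();
        events_.pop_front();
        return value;
    }

    std::size_t pending() const { return events_.size(); }

private:
    std::size_t capacity_;
    std::deque<int> events_;
};
```

With a capacity of 2, sending the values 1, 2 and 3 leaves 2 and 3 pending, because the third send erases the oldest event.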
slide 9
DIPLODOCUS: Task Behavior
Sending of request req1 with “1” as natural parameter
Loop
Sending of event done
Receiving of one data sample on channel cha1
Loop condition is false
Loop condition is true
Models the execution of between 1 and 2 instructions on an integer unit. It has no meaning at the application modeling level.
slide 10
DIPLODOCUS: Mapping
slide 11
New Simulation Strategy
slide 12
Motivations for a new Simulator
Existing SystemC based simulator
• Relies on the freely available SystemC kernel
• One SystemC task per CPU, bus, and memory
• Simulation at cycle-accurate level
• And so… that simulator is quite slow
New simulator implemented in pure C++:
• No overhead due to the SystemC kernel
• Coarse-grained simulation strategy based on transactions comprising several cycles
• Simulation granularity is automatically adapted to the application
slide 13
Architecture of the Simulator
For the sake of clarity, many subclasses have been omitted and only inheritance relationships are shown.
[Figure: class diagram of the simulator, grouped into interfaces, TML semantics, hardware models and the simulation kernel. Classes shown: TMLChannel, TMLbrbwChannel, ..., TraceableDevice, SchedulableDevice, Master, Slave, TMLTask, UserDefTasks, CPU, Bus, Bridge, Memory, TMLCommand, TMLExeciCommand, ..., TMLTransaction.]
User-defined tasks are created dynamically by a code generator based on the graphical model.
slide 14
Transactions
Merges several clock cycles, contains penalties
Important parameters:
• virtualLength: number of virtual execution units
• length: duration of the transaction in time units
• runnableTime: time when it becomes runnable
• startTime: execution starts at this time
• penalties: task switching, branching, idle
Transaction travels through simulator:
Command → Channel → CPU → Bus → Slave
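The parameters above map naturally onto a plain record. The following C++ sketch is not the simulator's TMLTransaction class: the helper makeTransaction, the parameters timePerUnit and deviceFreeAt, and the rule that length equals the virtual length scaled by a per-unit time plus the penalties are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cstdint>

using SimTime = std::uint64_t;

// Illustrative record holding the parameters named on the slide.
struct Transaction {
    std::uint64_t virtualLength;  // number of virtual execution units
    SimTime length;               // duration of the transaction in time units
    SimTime runnableTime;         // time when it becomes runnable
    SimTime startTime;            // execution starts at this time
    SimTime penalties;            // task switching, branching, idle

    SimTime endTime() const { return startTime + length; }
};

// Assumption: the duration covers all execution units plus the penalties,
// and a transaction starts once it is runnable and the device is free.
Transaction makeTransaction(std::uint64_t virtualLength, SimTime timePerUnit,
                            SimTime penalties, SimTime runnableTime,
                            SimTime deviceFreeAt) {
    Transaction t;
    t.virtualLength = virtualLength;
    t.penalties = penalties;
    t.length = virtualLength * timePerUnit + penalties;
    t.runnableTime = runnableTime;
    t.startTime = std::max(runnableTime, deviceFreeAt);
    return t;
}
```

One such record stands in for many clock cycles, which is why the cost of simulating it does not depend on its length.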
slide 15
Basic Simulation Strategy in one slide
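[Figure: step-by-step example of the basic simulation strategy on two CPUs (CPU 1, CPU 2), showing the transactions T11, T12, T21, T22 and T23 being scheduled in successive steps; one step is labeled "activate".]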
slide 16
Hierarchical scheduling process
[Figure: scheduling hierarchy from tasks to CPUs to the bus to the main scheduler.]
• Tasks: nextTransaction proposal
• CPU: scheduling, transaction length calculation
• Bus: scheduling, adds bus delay to the transaction length
• Main scheduler: receives the transaction to schedule, guarantees causality, truncates transactions
slide 17
Simulation phases
Three phases are entered alternately
• Preparation phase
- Check if current command has been processed entirely, proceed to next command if necessary
- Create next transaction, register transaction at channel if necessary
• Scheduling procedure
• Execution phase
- Issue read/write operations on channels
- Update progress of command
- Add transaction to schedule
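The alternation of the three phases can be sketched as a loop over a single execution resource. This C++ sketch is a strong simplification and not the simulator's code: channels are left out, the types Command, Task and ScheduledTransaction and the function simulate are hypothetical, and picking the shortest proposal in the scheduling step is an assumed policy. The cap maxLength stands in for the truncation of transactions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct Command { unsigned units; };  // work of one command, in execution units

struct Task {
    std::vector<Command> commands;
    std::size_t current = 0;   // index of the command being processed
    unsigned progress = 0;     // units of that command already executed
    bool finished() const { return current >= commands.size(); }
};

struct ScheduledTransaction { std::size_t task; unsigned start; unsigned length; };

std::vector<ScheduledTransaction> simulate(std::vector<Task>& tasks,
                                           unsigned maxLength) {
    assert(maxLength > 0);
    std::vector<ScheduledTransaction> schedule;
    unsigned now = 0;
    for (;;) {
        bool found = false;
        std::size_t chosen = 0;
        unsigned chosenLength = 0;
        for (std::size_t i = 0; i < tasks.size(); ++i) {
            Task& t = tasks[i];
            // Preparation: proceed to the next command once the current
            // one has been processed entirely.
            while (!t.finished() && t.progress == t.commands[t.current].units) {
                ++t.current;
                t.progress = 0;
            }
            if (t.finished()) continue;
            // Preparation: create the next transaction proposal.
            unsigned remaining = t.commands[t.current].units - t.progress;
            unsigned proposal = std::min(remaining, maxLength);
            // Scheduling (assumed policy): shortest proposal wins.
            if (!found || proposal < chosenLength) {
                found = true;
                chosen = i;
                chosenLength = proposal;
            }
        }
        if (!found) break;  // every task has finished
        // Execution: update command progress, add transaction to schedule.
        tasks[chosen].progress += chosenLength;
        schedule.push_back({chosen, now, chosenLength});
        now += chosenLength;
    }
    return schedule;
}
```

A command of 5 units with maxLength set to 2 is executed as three transactions of lengths 2, 2 and 1.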
slide 18
Case study MPEG decoder
slide 19 04/20/23 DIPLODOCUS: System Level Design Space Exploration
Task diagram (data dependencies) (processing sequence)
slide 20
Task: Parser
Sequence header
Picture, Slice, Macroblock header, to be refined
Launch processing
No of coded blocks
Picture type decision
Picture format
Picture type decision
slide 21
Conclusions and Future Work
slide 22
Simulation Strategy: Summary
Strength: simulation time increases with the number of transactions and NOT with the number of clock cycles
• Thus in general, […] and […] take the same execution time.
Scenario 1: Task 1 executes […]; after that, Task 2 executes a million times
• result: 1,000,001 transactions
Scenario 2: same as before, but the tasks execute the read/write commands concurrently:
• result: 2,000,000 transactions
• split of write transaction is necessary to leave the decision which task executes next up to the CPU scheduler
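The two transaction counts can be reproduced with a small counting model. It assumes that Task 1 issues one write of N samples and that Task 2 reads one sample at a time; the command names are not legible in this transcript, so that reading is an assumption, as is the function name countTransactions. In the concurrent case the write is split into one-sample pieces so that the scheduler can switch tasks after each piece.

```cpp
#include <cstdint>

// Counts transactions for one writer and one reader exchanging `samples`
// samples. Illustrative model only, not the simulator's scheduling code.
std::uint64_t countTransactions(std::uint64_t samples, bool concurrent) {
    std::uint64_t written = 0;
    std::uint64_t read = 0;
    std::uint64_t transactions = 0;
    while (read < samples) {
        if (written < samples) {
            // Sequential: one write transaction covers all samples.
            // Concurrent: the write is split into one-sample transactions.
            std::uint64_t chunk = concurrent ? 1 : samples - written;
            written += chunk;
            ++transactions;
        }
        // Every read of one sample is its own transaction.
        std::uint64_t reads = concurrent ? 1 : written - read;
        read += reads;
        transactions += reads;
    }
    return transactions;
}
```

For one million samples the sequential case gives 1 + 1,000,000 transactions and the concurrent case gives 2 × 1,000,000, matching the figures on the slide.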
slide 23
Conclusions
Implementation of a simulation environment
• Simulation granularity automatically adapts to the application model
• Based on pure C++
• Simulation speed-up of 6x up to 30x or more, depending on the application granularity
MPEG case study
slide 24
Future Work
Extension of the simulation environment:
• Refinement of bus and memory model
• Refinement of the hardware accelerator component
• MPEG case study, using meta-data to direct control flow
Longer-term objectives:
• Verification of functional requirements during simulation
• Exploration of several branches of control flow, possibility to return
to a previous system state
• Technical improvements of the simulator
slide 25
Thank You!
Questions?