1
RAMP Infrastructure
Krste Asanovic, UC Berkeley
RAMP Tutorial, ISCA/FCRC, San Diego, June 10, 2007
RAMP: An infrastructure to build simulators using FPGAs
3
[Figure: Target Model (CPUs, interconnect network, DRAM) and Host Platform]
Hard Work: Run Target Model on Host Platform
4
Reduce, Reuse, Recycle
Reduce effort to build target models
  Users just build components; the infrastructure handles connections (the RDL Compiler)
Reuse components by having good abstractions
  Across different target models
  Across different host platforms: XUP, Calinx, BEE2, BEE3, and also Altera (see Greg)
Recycle existing IP for use as simulation models
  Commercial processor RTL is its own model
5
RAMP Target Models
Units
  Relatively large chunks of functionality, e.g., processor + L1 cache
  User-written in some HDL or software
Channels
  Point-to-point, unidirectional, two kinds:
    FIFO channel: flow-controlled interface
    Pipeline channel: simple shift register, bits drop off the end
  Generated by RAMP infrastructure
[Figure: Units A, B, and C connected by a FIFO channel and a pipeline channel]
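To make the difference between the two channel kinds concrete, here is a minimal Python sketch. The class and method names (`FifoChannel`, `PipelineChannel`, `enq_rdy`, `tick`) are hypothetical, and the FIFO model ignores channel latency; it only shows that a FIFO channel holds data until the receiver dequeues it, while a pipeline channel is a shift register whose output is lost if nobody reads it on that cycle.

```python
from collections import deque

class FifoChannel:
    """Flow-controlled channel: sender checks RDY before ENQ, receiver checks RDY before DEQ.
    Latency is not modeled in this sketch."""
    def __init__(self, capacity):
        self.buf = deque()
        self.capacity = capacity

    def enq_rdy(self):
        return len(self.buf) < self.capacity

    def enq(self, value):
        if not self.enq_rdy():
            raise RuntimeError("ENQ without RDY")
        self.buf.append(value)

    def deq_rdy(self):
        return len(self.buf) > 0

    def deq(self):
        if not self.deq_rdy():
            raise RuntimeError("DEQ without RDY")
        return self.buf.popleft()

class PipelineChannel:
    """No flow control: a shift register of `latency` stages (latency >= 1)."""
    def __init__(self, latency):
        self.stages = deque([None] * latency, maxlen=latency)

    def tick(self, value=None):
        """Advance one target cycle: shift `value` in, return what drops off the end."""
        out = self.stages[0]
        self.stages.append(value)  # maxlen discards the oldest stage
        return out

pipe = PipelineChannel(2)
outputs = [pipe.tick(v) for v in [1, 2, 3, None, None]]  # [None, None, 1, 2, 3]
```

If the receiver ignores the pipeline's output on the cycle a value emerges, that value is gone; the FIFO channel, by contrast, keeps it until `deq()` is called.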
6
Target FIFO Channel Parameters
Need buffering of at least (forward latency + reverse latency) to get full bandwidth over the link
RAMP infrastructure instantiates channel with desired parameters
[Figure: FIFO channel with parameters data width, forward latency, buffering, and reverse latency; RDY/ENQ on the sending side, RDY/DEQ on the receiving side]
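The buffering rule can be checked with a small cycle-level sketch. It assumes credit-based flow control (one credit per buffer slot), at most one enqueue per cycle, a receiver that dequeues on arrival, and forward/reverse latencies of at least 1; the function name is hypothetical. Each credit makes one round trip every (forward + reverse) cycles, so fewer buffers than that round trip cap the throughput.

```python
from collections import deque

def enqueues_with_credits(buffering, fwd_latency, rev_latency, cycles):
    """Count enqueues accepted in `cycles` cycles for a credit-based FIFO link.
    Assumes latencies >= 1 and a receiver that always dequeues on arrival."""
    credits = buffering
    in_flight = deque()       # arrival cycles of data items at the receiver
    credit_returns = deque()  # arrival cycles of credits back at the sender
    enqueued = 0
    for t in range(cycles):
        # Credits that have made it back become usable this cycle
        while credit_returns and credit_returns[0] <= t:
            credit_returns.popleft()
            credits += 1
        # Receiver dequeues arriving items and sends a credit back
        while in_flight and in_flight[0] <= t:
            in_flight.popleft()
            credit_returns.append(t + rev_latency)
        # Sender enqueues only when it holds a credit
        if credits > 0:
            credits -= 1
            in_flight.append(t + fwd_latency)
            enqueued += 1
    return enqueued
```

With a forward latency of 2 and a reverse latency of 3, `enqueues_with_credits(5, 2, 3, 100)` returns 100 (one per cycle), while 4 buffers give 80 and 1 buffer gives 20.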
7
Target Pipeline Channel Parameters
Only recommended for expert use in target models (target designs should use FIFO channels and latency-insensitive protocols)
[Figure: pipeline channel with parameters data width and forward latency]
8
RAMP Description Language (RDL)
User describes target model topology, channel parameters, and (manual) mapping to host platform FPGAs using RDL
RDL Compiler (RDLC) generates configurations
[Figure: Target: Units A, B, and C connected by channels. RDLC maps them onto the Host (FPGA1 and FPGA2) with generated unit wrappers; generated links carry channels.]
[Greg Gibeling, UCB]
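The information an RDL description carries (topology, channel parameters, and a manual unit-to-FPGA mapping) can be pictured as plain data. The Python sketch below is not RDL syntax and not what RDLC does internally; `Channel`, `plan_links`, and the particular mapping are hypothetical. It only shows which channels stay inside one FPGA and which would need a generated inter-FPGA link.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Channel:
    """One target channel: point-to-point and unidirectional."""
    name: str
    src: str          # sending unit
    dst: str          # receiving unit
    kind: str         # "fifo" or "pipeline"
    width: int        # data width in bits
    fwd_latency: int  # in target cycles

def plan_links(channels, mapping):
    """Split channels into on-FPGA channels and ones that need a generated link."""
    local, links = [], []
    for ch in channels:
        if mapping[ch.src] == mapping[ch.dst]:
            local.append(ch.name)
        else:
            links.append((ch.name, mapping[ch.src], mapping[ch.dst]))
    return local, links

channels = [
    Channel("a_to_b", "A", "B", "fifo", 64, 2),
    Channel("b_to_c", "B", "C", "pipeline", 32, 1),
]
mapping = {"A": "FPGA1", "B": "FPGA2", "C": "FPGA2"}  # hypothetical manual mapping
local, links = plan_links(channels, mapping)
# local == ["b_to_c"], links == [("a_to_b", "FPGA1", "FPGA2")]
```

Because the mapping is separate from the topology, moving unit B to FPGA1 changes only which channels become links; the target model itself is untouched.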
9
Virtual Target Clock
10
Virtualized RTL Improves FPGA Resource Usage
RAMP allows units to run at varying target-host clock ratios to optimize area and overall performance
Example 1: Multiported register file
  E.g., Sun Niagara has 3 read ports and 2 write ports to 6KB of register storage
  If the RTL is mapped directly, it requires 48K flip-flops
    Slow cycle time, large area
  If mapped into block RAMs (one read + one write per cycle), it takes 3 host cycles and 3x2KB block RAMs
    Faster cycle time (~3X) and far fewer resources
Example 2: Large L2/L3 caches
  Current FPGAs only have ~1MB of on-chip SRAM
  Use on-chip SRAM to build a cache of the active piece of the L2/L3 cache; stall the target cycle if an access misses, and fetch the data from off-chip DRAM
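Both examples reduce to simple host-cycle arithmetic. The sketch below reproduces the register-file numbers (6KB = 48K bits of flip-flops; 3 read ports on a 1-read/1-write block RAM need 3 host cycles; 6KB fits in three 2KB block RAMs) and models the cache example as one target cycle per access whose host cost grows on an on-chip miss. The function names, the 64-byte line size, the direct-mapped on-chip store, and the 20-cycle DRAM penalty are all hypothetical assumptions.

```python
import math

def direct_flip_flops(storage_bytes):
    """Flip-flops needed if every register-file bit is a flip-flop."""
    return storage_bytes * 8

def virtualized_regfile(read_ports, write_ports, storage_bytes,
                        bram_bytes=2048, bram_reads=1, bram_writes=1):
    """(host cycles per target cycle, block RAMs) when ports are time-multiplexed."""
    host_cycles = max(math.ceil(read_ports / bram_reads),
                      math.ceil(write_ports / bram_writes))
    brams = math.ceil(storage_bytes / bram_bytes)
    return host_cycles, brams

def l2_host_cycles(addresses, sram_lines, line_bytes=64, dram_penalty=20):
    """(target cycles, host cycles) for a run of target L2 accesses.
    On-chip SRAM holds the active piece of the target cache (direct-mapped here);
    a miss stalls the target cycle for `dram_penalty` extra host cycles."""
    resident = [None] * sram_lines
    host_cycles = 0
    for addr in addresses:
        line = addr // line_bytes
        slot = line % sram_lines
        host_cycles += 1
        if resident[slot] != line:
            host_cycles += dram_penalty  # fetch from off-chip DRAM
            resident[slot] = line
    return len(addresses), host_cycles
```

Here `virtualized_regfile(3, 2, 6 * 1024)` returns `(3, 3)`, and `l2_host_cycles([0, 64, 0, 64], 2)` returns `(4, 44)`: the target still sees 4 cycles, and only the host time changes with the on-chip hit rate.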
11
Start/Done Timing Interface
Wrapper generated by RDL asserts “Start” on the physical FPGA cycle when the inputs to the unit are ready for the next target cycle
Unit asserts “Done” when it finishes the target cycle and its outputs are ready
Unit can take a variable amount of time
Unvirtualized RTL unit can connect “Done” to “Start” (but must not clock until “Start”)
[Figure: Wrapper around a Unit with inputs In1 and In2, output Out, and Start/Done signals]
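A minimal Python sketch of the Start/Done handshake, assuming a unit that needs a fixed number of host cycles per target cycle (the register-file example would use 3). `VirtualizedUnit` and `run_wrapper` are hypothetical names; the unit only advances after Start, and with `cost=1` Done rises in the same host cycle as Start, which matches an unvirtualized unit with Done tied to Start.

```python
class VirtualizedUnit:
    """Hypothetical unit needing `cost` host (FPGA) cycles per target cycle."""
    def __init__(self, cost):
        self.cost = cost
        self.remaining = 0      # host cycles left in the current target cycle
        self.target_cycles = 0  # completed target cycles

    def idle(self):
        return self.remaining == 0

    def clock(self, start):
        """One host clock edge; returns the value of Done for this cycle."""
        if start and self.idle():
            self.remaining = self.cost  # begin the next target cycle
        if self.remaining:
            self.remaining -= 1
            if self.remaining == 0:
                self.target_cycles += 1
                return True
        return False

def run_wrapper(unit, host_cycles, inputs_ready=lambda h: True):
    """Wrapper asserts Start when inputs are ready and the unit is not mid-cycle."""
    for h in range(host_cycles):
        unit.clock(start=inputs_ready(h) and unit.idle())
    return unit.target_cycles
```

In 9 host cycles, a 3-cycle unit completes 3 target cycles and a 1-cycle unit completes 9; if inputs are only ready on every other host cycle, the 1-cycle unit completes 5.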
12
Distributed Timing Models
13
Distributed Timing Example
[Figure, Target: Unit A sends data D to Unit B over a pipeline channel of latency L. Host: Unit A and Unit B, each with Start/Done, connected by ENQ/DEQ and RDY signals.]
Pipeline target channel implemented as distributed FIFO with at least L buffers
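One way to see why a host FIFO can stand in for a pipeline channel of latency L is to pre-load it with L empty tokens. In the hypothetical sketch below, Unit A may Start only when the FIFO has room for its output, and Unit B may Start only when its input is available; each unit takes a different (assumed) number of host cycles per target cycle. Whatever those speeds, B's target cycle t sees A's output from target cycle t - L.

```python
from collections import deque

def distributed_pipeline(latency, capacity, a_cost, b_cost, host_cycles):
    """Hypothetical host model of a latency-L pipeline channel from Unit A to Unit B.
    Returns the value Unit B consumes on each of its target cycles."""
    if capacity < latency:
        raise ValueError("need at least L buffers")
    fifo = deque([None] * latency)  # L initial tokens model the channel latency
    a_next, a_left = 0, 0           # A's next target cycle number, host cycles left
    b_seen, b_left = [], 0
    for _ in range(host_cycles):
        # Unit A: Start only when there is room for its output
        if a_left == 0 and len(fifo) < capacity:
            a_left = a_cost
        if a_left:
            a_left -= 1
            if a_left == 0:         # Done: ENQ the output of target cycle a_next
                fifo.append(a_next)
                a_next += 1
        # Unit B: Start only when its input is available (DEQ at Start)
        if b_left == 0 and fifo:
            b_seen.append(fifo.popleft())
            b_left = b_cost
        if b_left:
            b_left -= 1
    return b_seen
```

Swapping which unit is slow changes how many target cycles complete in a given host time, but the prefix of values B observes is identical: `[None, None, 0, 1, 2, ...]` for L = 2.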
14
[Figure, Target: FIFO channel of latency L built from a forward data pipeline channel and a backward credits pipeline channel under credit control; RDY/ENQ on the sending side, RDY/DEQ on the receiving side]
Timing Target FIFO Channel
A timed credit-based flow control (CBFC) FIFO can be built inside the target model, using pipeline channels to communicate data forwards and credits backwards
But this puts two CBFCs in series (one in the target unit, one hidden in the host implementation of the pipeline channels)
RDL can generate a unified FIFO that merges both of these behind the FIFO interface
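The target-side half of that arrangement can be sketched in Python: a sender with a credit counter, a data pipeline channel forwards, a credit pipeline channel backwards, and a receive buffer of `buffering` slots drained by a receiver that is not always ready. The helper names are hypothetical, latencies must be at least 1, and the host-side flow control hidden under the pipeline channels is not modeled. The point is that credits alone keep the receive buffer from overflowing, even though pipeline channels themselves have no flow control.

```python
from collections import deque

def shift(reg, value):
    """Pipeline channel step: shift `value` in, return what falls off the end."""
    out = reg[0]
    reg.append(value)  # deque(maxlen=...) discards reg[0]
    return out

def target_cbfc(buffering, fwd_latency, rev_latency, items, receiver_ready, cycles):
    """Hypothetical target-level CBFC FIFO built from two pipeline channels.
    Returns (items received in order, peak receive-buffer occupancy)."""
    data_pipe = deque([None] * fwd_latency, maxlen=fwd_latency)
    credit_pipe = deque([0] * rev_latency, maxlen=rev_latency)
    credits, rx_buffer = buffering, deque()
    next_item, received, peak = 0, [], 0
    for t in range(cycles):
        # Sender: ENQ only while holding a credit
        sent = None
        if next_item < items and credits > 0:
            sent, next_item, credits = next_item, next_item + 1, credits - 1
        arrived = shift(data_pipe, sent)
        if arrived is not None:
            rx_buffer.append(arrived)
            peak = max(peak, len(rx_buffer))
        # Receiver: DEQ when ready and return one credit through the credit pipeline
        freed = 0
        if receiver_ready(t) and rx_buffer:
            received.append(rx_buffer.popleft())
            freed = 1
        credits += shift(credit_pipe, freed)
    return received, peak
```

With 2 buffers and a receiver ready only every third cycle, all items arrive in order and the buffer never holds more than 2 entries.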
15
Other Automatically Generated Networks
Control network has workstation as master and every unit as slave device
  Memory-mapped interface with block transfers
  Used for initialization, stats gathering, debugging, and monitoring
Units can connect to DRAM resources outside of timed target channels
  Used to support emulation and virtualization state
Units can communicate with each other outside of timed target channels
  Support arbitrary communication, e.g., for distributed stats gathering
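A memory-mapped control interface with block transfers can be pictured as an address decoder on the workstation side that routes each block read or write to the unit owning that address window. The sketch below is a hypothetical model only (the `ControlNetwork` class, window layout, and byte-addressed storage are assumptions, not the RAMP control network's actual design); each unit is represented by a plain byte array standing in for its control registers and state.

```python
class ControlNetwork:
    """Hypothetical master-side view: each slave unit owns one address window."""
    def __init__(self):
        self.windows = []  # (base, size, unit name, backing storage)

    def attach(self, name, base, size):
        self.windows.append((base, size, name, bytearray(size)))

    def _decode(self, addr, length):
        """Find the unit whose window fully contains [addr, addr + length)."""
        for base, size, _name, mem in self.windows:
            if base <= addr and addr + length <= base + size:
                return mem, addr - base
        raise ValueError(f"address range {addr:#x}+{length} not mapped")

    def block_write(self, addr, data):
        mem, off = self._decode(addr, len(data))
        mem[off:off + len(data)] = data

    def block_read(self, addr, length):
        mem, off = self._decode(addr, length)
        return bytes(mem[off:off + length])

net = ControlNetwork()
net.attach("unit_a", 0x000, 0x100)
net.attach("unit_b", 0x100, 0x100)
net.block_write(0x104, b"\x01\x02\x03\x04")  # e.g., initialize unit_b state
```

A block read from `0x104` then returns the same four bytes; a transfer that straddles two windows is rejected rather than split, which keeps each transfer targeted at a single unit.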
16
Wide Variety of RAMP Simulators
17
Simulator Design Choices
Structural Analog versus Highly Virtualized
Functional-only versus Functional+Timing
Timing via (virtual) RTL design versus separate functional and timing models
Hybrid software/hardware simulators
We’re trying to build layers of abstractions that are useful to all types of simulators
Also, trying to make modules in different styles interoperate
18
Effective Abstractions Hide Details
19
…But Provide Interoperability
20
Work in Progress: Stay Tuned