RAMP Infrastructure
Krste Asanovic, UC Berkeley
RAMP Tutorial, ISCA/FCRC, San Diego, June 10, 2007




Page 2

RAMP: An infrastructure to build simulators using FPGAs

Page 3

[Figure: Target Model and Host Platform, with CPUs, an interconnect network, and DRAM; mapping one onto the other is labeled “Hard Work”.]

Run Target Model on Host Platform

Page 4

Reduce, Reuse, Recycle

• Reduce effort to build target models
  - Users just build components; the infrastructure handles connections (the RDL Compiler)
• Reuse components by having good abstractions
  - Across different target models
  - Across different host platforms: XUP, Calinx, BEE2, BEE3, also Altera (see Greg)
• Recycle existing IP for use as simulation models
  - Commercial processor RTL is its own model

Page 5

RAMP Target Models

• Units
  - Relatively large chunks of functionality, e.g., processor + L1 cache
  - User-written in some HDL or software
• Channels
  - Point-to-point, unidirectional, two kinds:
    - FIFO channel: flow-controlled interface
    - Pipeline channel: simple shift register, bits drop off end
  - Generated by RAMP infrastructure

[Figure: Units A, B, and C connected by a FIFO channel and a pipeline channel.]
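The two channel kinds differ mainly in flow control. The Python sketch below is a hypothetical, untimed model, not RAMP-generated code. The names `PipelineChannel`, `FifoChannel`, `tick`, `enq_rdy`, and `deq_rdy` are illustrative. The pipeline channel shifts one value per target cycle whether or not anyone reads it. The FIFO channel exposes ready signals and refuses an enqueue when it is full.

```python
from collections import deque


class PipelineChannel:
    """Hypothetical model of a target pipeline channel: a shift register with
    `latency` stages. One value (or None) shifts in per target cycle and the
    oldest value shifts out; there is no flow control, so anything the
    receiver does not consume on that cycle is lost."""

    def __init__(self, latency: int):
        self.stages = deque([None] * latency)

    def tick(self, value=None):
        """Advance one target cycle; return the value leaving the channel."""
        self.stages.append(value)
        return self.stages.popleft()


class FifoChannel:
    """Hypothetical untimed model of a target FIFO channel: ENQ/DEQ are only
    legal while the corresponding RDY signal is high."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.items = deque()

    def enq_rdy(self) -> bool:
        return len(self.items) < self.capacity

    def deq_rdy(self) -> bool:
        return len(self.items) > 0

    def enq(self, value) -> None:
        if not self.enq_rdy():
            raise RuntimeError("ENQ asserted while RDY is low")
        self.items.append(value)

    def deq(self):
        if not self.deq_rdy():
            raise RuntimeError("DEQ asserted while RDY is low")
        return self.items.popleft()
```

Suppose a receiver calls `tick` every cycle but ignores the return value. The data is then silently discarded, which is one reason pipeline channels are reserved for expert use.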

Page 6

Target FIFO Channel Parameters

Need buffering of at least (Forward+Reverse) latency to get full bandwidth over link

RAMP infrastructure instantiates channel with desired parameters

[Figure: target FIFO channel with parameters Data width, Forward Latency, Reverse Latency, and Buffering; the sending unit sees RDY/ENQ and the receiving unit sees RDY/DEQ.]
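The buffering rule can be checked with a minimal cycle-level sketch. It assumes credit-based flow control in which each buffer slot is one credit; the function `fifo_throughput` and its parameters are hypothetical. An item takes `fwd` cycles to reach the receiver, and the credit freed by dequeuing it takes `rev` cycles to return. Each credit can therefore be reused only once every `fwd + rev` cycles.

```python
from collections import defaultdict


def fifo_throughput(buffering: int, fwd: int, rev: int, cycles: int = 1000) -> float:
    """Fraction of cycles on which the receiver dequeues an item.

    Hypothetical model: the sender holds one credit per receiver buffer slot,
    an enqueued item becomes visible to the receiver `fwd` cycles later, and
    the credit freed by a dequeue reaches the sender `rev` cycles later.
    The receiver dequeues whenever an item is present.
    """
    assert fwd >= 1 and rev >= 1 and buffering >= 1
    credits = buffering
    occupancy = 0                      # items waiting in the receiver buffer
    arrivals = defaultdict(int)        # cycle -> items landing in the buffer
    credit_returns = defaultdict(int)  # cycle -> credits reaching the sender
    delivered = 0
    for t in range(cycles):
        credits += credit_returns.pop(t, 0)
        occupancy += arrivals.pop(t, 0)
        if occupancy > 0:              # receiver side: DEQ while RDY
            occupancy -= 1
            delivered += 1
            credit_returns[t + rev] += 1
        if credits > 0:                # sender side: ENQ while RDY
            credits -= 1
            arrivals[t + fwd] += 1
    return delivered / cycles


for slots in (2, 3, 4, 8):
    print(slots, fifo_throughput(slots, fwd=2, rev=2))
```

Take `fwd = rev = 2` as an example. Two slots sustain about half the link bandwidth and three slots about three quarters. Four or more slots approach one item per cycle, so throughput is roughly min(1, buffering / (fwd + rev)).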

Page 7

Target Pipeline Channel Parameters

Only recommended for expert use in target models

(Should use FIFO channels and latency-insensitive protocols in target design)

[Figure: target pipeline channel with parameters Data width and Forward Latency.]
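This hypothetical sketch shows why latency-insensitive protocols are recommended; the function `run_over_pipeline` and its arguments are illustrative. A receiver that keeps only valid tokens (non-None values, with None standing for a bubble) gets the same data for any channel latency. A receiver that samples on the cycles it expects data to arrive breaks as soon as the latency differs from its assumption.

```python
from collections import deque


def run_over_pipeline(latency, values, assumed_latency=None):
    """Send `values` (one per target cycle, none of them None) through a
    pipeline channel modeled as a shift register of `latency` stages.

    If `assumed_latency` is None the receiver is latency-insensitive and keeps
    every non-None token. Otherwise it is latency-sensitive and samples the
    channel output only on cycles assumed_latency .. assumed_latency + n - 1.
    """
    stages = deque([None] * latency)
    received = []
    n = len(values)
    for t in range(n + latency):
        stages.append(values[t] if t < n else None)
        out = stages.popleft()
        if assumed_latency is None:
            if out is not None:
                received.append(out)
        elif assumed_latency <= t < assumed_latency + n:
            received.append(out)
    return received
```

For example, `run_over_pipeline(3, [1, 2, 3], assumed_latency=2)` returns `[None, 1, 2]`, while `run_over_pipeline(3, [1, 2, 3])` still returns `[1, 2, 3]`.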

Page 8

RAMP Description Language (RDL)


User describes target model topology, channel parameters, and (manual) mapping to host platform FPGAs using RDL

RDL Compiler (RDLC) generates configurations

[Figure: Target: Units A, B, and C connected by channels. Host: RDLC maps the units onto FPGA1 and FPGA2, producing generated unit wrappers and generated links that carry channels. (Greg Gibeling, UCB)]
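The sketch below illustrates one decision the compiler must make once the user has written the topology and manual mapping. It is not RDL syntax and not RDLC's implementation; `Channel` and `plan_links` are hypothetical Python stand-ins. Channels whose endpoints land on the same FPGA stay on-chip. Channels that cross FPGAs need a generated link, shown here with the total data width it would carry. The mapping of A and B to FPGA1 and C to FPGA2 is an assumption for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Channel:
    name: str
    src: str    # sending unit
    dst: str    # receiving unit
    width: int  # data width in bits


def plan_links(channels, mapping):
    """Split channels into on-chip ones and ones that need a generated
    inter-FPGA link. `mapping` is the user's manual unit -> FPGA assignment.
    Returns (local channel names, {(src_fpga, dst_fpga): (names, total width)})."""
    local = []
    crossing = defaultdict(list)
    for ch in channels:
        src_fpga, dst_fpga = mapping[ch.src], mapping[ch.dst]
        if src_fpga == dst_fpga:
            local.append(ch.name)
        else:
            crossing[(src_fpga, dst_fpga)].append(ch)
    links = {pair: ([c.name for c in chs], sum(c.width for c in chs))
             for pair, chs in crossing.items()}
    return local, links


channels = [Channel("ab", "A", "B", 32), Channel("bc", "B", "C", 64), Channel("ca", "C", "A", 8)]
local, links = plan_links(channels, {"A": "FPGA1", "B": "FPGA1", "C": "FPGA2"})
print(local, links)
```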

Page 9

Virtual Target Clock

Page 10

Virtualized RTL Improves FPGA Resource Usage

RAMP allows units to run at varying target-to-host clock ratios to optimize area and overall performance

Example 1: Multiported register file
  • Example: Sun Niagara has 3 read ports and 2 write ports to 6KB of register storage
  • If RTL is mapped directly, it requires 48K flip-flops: slow cycle time, large area
  • If mapped into block RAMs (one read + one write per cycle), it takes 3 host cycles and 3x2KB block RAMs: faster cycle time (~3X) and far fewer resources
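The numbers in Example 1 follow from simple arithmetic. This sketch assumes 2KB of data per block RAM, matching the slide's 3x2KB. It also treats the block RAMs as one logical memory with a single read port and a single write port. The helper names are hypothetical.

```python
import math


def direct_flip_flops(storage_bytes: int) -> int:
    # One flip-flop per bit when the RTL register array is mapped directly.
    return storage_bytes * 8


def host_cycles_per_target_cycle(reads: int, writes: int,
                                 ram_read_ports: int = 1, ram_write_ports: int = 1) -> int:
    # Serialize the target cycle's accesses over the block RAM's ports.
    return max(math.ceil(reads / ram_read_ports), math.ceil(writes / ram_write_ports))


def block_rams_needed(storage_bytes: int, bram_bytes: int = 2 * 1024) -> int:
    # Capacity-limited count, assuming 2KB of data per block RAM.
    return math.ceil(storage_bytes / bram_bytes)


niagara_bytes = 6 * 1024
print(direct_flip_flops(niagara_bytes))    # 49152, i.e. 48K flip-flops
print(host_cycles_per_target_cycle(3, 2))  # 3 host cycles per target cycle
print(block_rams_needed(niagara_bytes))    # 3 block RAMs
```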

Example 2: Large L2/L3 caches
  • Current FPGAs only have ~1MB of on-chip SRAM
  • Use on-chip SRAM to build a cache of the active piece of the L2/L3 cache; stall the target cycle if an access misses, and fetch data from off-chip DRAM
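A minimal sketch of Example 2, under stated assumptions. The full target cache contents live in DRAM, and a small LRU cache in on-chip SRAM holds the active lines. Every target access is simplified to a target hit with fixed latency, so only host time varies. The class name, its parameters, and the LRU policy are all hypothetical choices for illustration.

```python
from collections import OrderedDict
from typing import Tuple


class HostCachedTargetL2:
    """Hypothetical sketch: the full target L2/L3 contents live in off-chip DRAM
    and a small LRU cache in on-chip SRAM holds the active lines. Every target
    access is treated as a target hit with a fixed latency, so a host-level
    miss only stretches the target cycle in host time."""

    def __init__(self, sram_lines: int, dram_penalty: int, target_latency: int):
        self.sram_lines = sram_lines          # capacity of the on-chip SRAM cache
        self.dram_penalty = dram_penalty      # extra host cycles to fetch from DRAM
        self.target_latency = target_latency  # target cycles the model reports
        self.sram = OrderedDict()             # line address -> present, in LRU order

    def access(self, line_addr: int) -> Tuple[int, int]:
        """Return (target cycles, host cycles) for one access."""
        host_cycles = 1
        if line_addr in self.sram:
            self.sram.move_to_end(line_addr)   # refresh LRU position
        else:
            host_cycles += self.dram_penalty   # stall the target cycle on the host
            self.sram[line_addr] = True
            if len(self.sram) > self.sram_lines:
                self.sram.popitem(last=False)  # evict least recently used line
        return self.target_latency, host_cycles
```

The target-visible latency is the same on every access. The on-chip SRAM only changes how many FPGA cycles each target cycle takes.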

Page 11

Start/Done Timing Interface

Wrapper generated by RDL asserts “Start” on the physical FPGA cycle when the inputs to the unit are ready for the next target cycle

Unit asserts “Done” when it finishes the target cycle and its outputs are ready

Unit can take a variable amount of time

Unvirtualized RTL unit can connect “Done” to “Start” (but must not clock until “Start”)

[Figure: a Wrapper around a Unit, with inputs In1 and In2, output Out, and Start/Done signals between wrapper and unit.]
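A hypothetical host-cycle-by-host-cycle sketch of the Start/Done handshake follows; `VariableLatencyUnit` and `simulate` are illustrative names. The wrapper asserts Start only when the unit is idle and its inputs for the next target cycle are ready. The unit asserts Done after however many host cycles the current target cycle needs. A unit whose every target cycle costs one host cycle raises Done on the same cycle as Start, the unvirtualized case.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VariableLatencyUnit:
    """Hypothetical unit: target cycle i takes costs[i] host cycles (each >= 1)."""
    costs: List[int]
    target_cycle: int = 0
    remaining: int = 0  # host cycles left in the current target cycle; 0 means idle

    def host_clock(self, start: bool) -> bool:
        """Advance one host (FPGA) cycle; return True on the cycle Done is asserted."""
        if self.remaining == 0:
            if not start:
                return False  # idle: target state must not advance without Start
            self.remaining = self.costs[self.target_cycle]
        self.remaining -= 1
        if self.remaining == 0:
            self.target_cycle += 1
            return True
        return False


def simulate(unit: VariableLatencyUnit, inputs_ready: List[bool]) -> List[int]:
    """inputs_ready[h]: whether the wrapper holds the unit's inputs for the next
    target cycle on host cycle h. Returns the host cycles on which Done fired."""
    done_cycles = []
    for h, ready in enumerate(inputs_ready):
        start = ready and unit.remaining == 0 and unit.target_cycle < len(unit.costs)
        if unit.host_clock(start):
            done_cycles.append(h)
    return done_cycles
```

For instance, `simulate(VariableLatencyUnit([1, 3, 2]), [True] * 10)` reports Done on host cycles 0, 3, and 5. That is three target cycles spread over six host cycles.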

Page 12

Distributed Timing Models

Page 13

Distributed Timing Example

[Figure: Target: Unit A sends to Unit B over a pipeline channel of latency L. Host: Units A and B, each with Start/Done signals, connected by a FIFO through ENQ/DEQ and RDY signals.]

Pipeline target channel implemented as distributed FIFO with at least L buffers
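The following hypothetical sketch shows why a FIFO can stand in for a latency-L pipeline channel. The function names and the `schedule` string of host-level firing attempts are illustrative. The FIFO is preloaded with L bubble tokens, the shift register's initial contents. Unit A fires (ENQ, one target cycle) only when there is space, and Unit B fires (DEQ) only when a token is present. Whatever order the host fires them in, B sees exactly what the target shift register would deliver. Capacity below L could not even hold the preload.

```python
from collections import deque


def shift_register_reference(values, latency, bubble=None):
    """What the target pipeline channel delivers: the value sent in target
    cycle t arrives in target cycle t + latency, with bubbles before that."""
    return ([bubble] * latency + list(values))[:len(values)]


def distributed_fifo(values, latency, capacity, schedule, bubble=None):
    """Host implementation: a FIFO preloaded with `latency` bubble tokens.
    `schedule` is a host-level order of 'A'/'B' firing attempts; A fires only
    if the FIFO has space, B fires only if it holds a token."""
    assert capacity >= latency, "need at least L buffers"
    fifo = deque([bubble] * latency)
    sent, received = 0, []
    for who in schedule:
        if who == 'A' and sent < len(values) and len(fifo) < capacity:
            fifo.append(values[sent])  # A completes one target cycle (ENQ)
            sent += 1
        elif who == 'B' and fifo and len(received) < len(values):
            received.append(fifo.popleft())  # B completes one target cycle (DEQ)
    return received
```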

Page 14

Timing Target FIFO Channel

[Figure: target FIFO channel of latency L with RDY/ENQ and RDY/DEQ interfaces, built from pipeline channels, with credits flowing backward under credit control.]

Can build a timed credit-based flow control (CBFC) FIFO inside the target model, using pipeline channels to communicate data forward and credits backward

But this puts two CBFCs in series (one in target unit, one hidden in host implementation of pipeline channels)

RDL can generate a unified FIFO that merges both of these behind the FIFO interface

Page 15

Other Automatically Generated Networks

• Control network has the workstation as master and every unit as a slave device
  - Memory-mapped interface with block transfers
  - Used for initialization, stats gathering, debugging, and monitoring
• Units can connect to DRAM resources outside of timed target channels
  - Used to support emulation and virtualization state
• Units can communicate with each other outside of timed target channels
  - Support arbitrary communication, e.g., for distributed stats gathering
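A hypothetical sketch of a memory-mapped control network with one master and one slave per unit follows; every name and the address layout are assumptions, not the generated interface. The upper address bits select a unit and the low 16 bits select a word in that unit's window. Block reads and writes move a run of words in one transfer.

```python
class ControlSlave:
    """Hypothetical per-unit register window (e.g., config or stats counters)."""

    def __init__(self, num_words: int):
        self.regs = [0] * num_words


class ControlNetwork:
    """Hypothetical memory-mapped control network: the workstation is the only
    master and each unit is a slave. Assumed address layout: the upper bits
    select the unit, the low 16 bits select a word in that unit's window."""

    OFFSET_BITS = 16

    def __init__(self):
        self.slaves = {}

    def attach(self, unit_id: int, slave: ControlSlave) -> None:
        self.slaves[unit_id] = slave

    def _decode(self, addr: int, count: int):
        unit_id = addr >> self.OFFSET_BITS
        offset = addr & ((1 << self.OFFSET_BITS) - 1)
        slave = self.slaves[unit_id]  # KeyError if no unit answers at this address
        if offset + count > len(slave.regs):
            raise IndexError("block transfer runs past the unit's window")
        return slave, offset

    def block_write(self, addr: int, words) -> None:
        slave, offset = self._decode(addr, len(words))
        slave.regs[offset:offset + len(words)] = list(words)

    def block_read(self, addr: int, count: int):
        slave, offset = self._decode(addr, count)
        return slave.regs[offset:offset + count]
```

An initialization script might block-write a unit's configuration words, and a stats collector might block-read a unit's counters the same way.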

Page 16

Wide Variety of RAMP Simulators

Page 17

Simulator Design Choices

• Structural analog versus highly virtualized
• Functional-only versus functional+timing
• Timing via (virtual) RTL design versus separate functional and timing models
• Hybrid software/hardware simulators

We’re trying to build layers of abstraction that are useful to all types of simulators

Also trying to make modules in different styles interoperate

Page 18

Effective Abstractions Hide Details

Page 19

…But Provide Inter-Operability

Page 20

Work in Progress: Stay Tuned