24
Evaluation of RISC-V RTL with FPGA-Accelerated Simulation Donggyu Kim, Christopher Celio, David Biancolin, Jonathan Bachrach, Krste Asanović CARRV 2017 10/14/2017

Evaluation of RISC-V RTL with FPGA-Accelerated Simulation · MIDAS Custom Compiler Passes 10 §FIRRTL: IR for RTL transforms ... -RTL state snapshots for energy modeling r FPGA-Accelerated

  • Upload
    others

  • View
    11

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Evaluation of RISC-V RTL with FPGA-Accelerated Simulation · MIDAS Custom Compiler Passes 10 §FIRRTL: IR for RTL transforms ... -RTL state snapshots for energy modeling r FPGA-Accelerated

EvaluationofRISC-VRTLwithFPGA-AcceleratedSimulation

DonggyuKim,ChristopherCelio,DavidBiancolin,JonathanBachrach,KrsteAsanovic

CARRV201710/14/2017

Page 2: Evaluation of RISC-V RTL with FPGA-Accelerated Simulation · MIDAS Custom Compiler Passes 10 §FIRRTL: IR for RTL transforms ... -RTL state snapshots for energy modeling r FPGA-Accelerated

BAR

EvaluationMethodologiesForComputerArchitectureResearch

2

Analytic Power/Energy Modeling

Microarchitectural Cycle-by-Cycle Software Simulators

Simulation Sampling

Page 3: Evaluation of RISC-V RTL with FPGA-Accelerated Simulation · MIDAS Custom Compiler Passes 10 §FIRRTL: IR for RTL transforms ... -RTL state snapshots for energy modeling r FPGA-Accelerated

BAR

3

HowToDoComputerArchitectureResearchInThePast?

FewlinesofC/C++code

Paper

100M~1Binstructions

Page 4: Evaluation of RISC-V RTL with FPGA-Accelerated Simulation · MIDAS Custom Compiler Passes 10 §FIRRTL: IR for RTL transforms ... -RTL state snapshots for energy modeling r FPGA-Accelerated

BAR

HowEasySingle-CyclePerfectCaches?

4

§ 10linesofC++codeinMARRSx86è Easytoimplementunrealistic,non-cycle-accuratemodels

Page 5: Evaluation of RISC-V RTL with FPGA-Accelerated Simulation · MIDAS Custom Compiler Passes 10 §FIRRTL: IR for RTL transforms ... -RTL state snapshots for energy modeling r FPGA-Accelerated

BAR

5

NoµarchSimulationAnyMore!

§ Validationisdifficult- Isyourmodelingstillcycle-accurate?

§ Simulationistooslowà SimulationSampling

Validation of one design instance[1]

Other designs points?

Your target modeling?

[1] Gutierrez et al. Sources of error in full-system simulation, ISPASS 2014

Page 6: Evaluation of RISC-V RTL with FPGA-Accelerated Simulation · MIDAS Custom Compiler Passes 10 §FIRRTL: IR for RTL transforms ... -RTL state snapshots for energy modeling r FPGA-Accelerated

BAR

6

NoSimulationSampling!

§ Phase-basedSampling(e.g.SimPoint)- Notheoreticallyguaranteederrorbounds- ShortperiodsofphasesshowsimilarIPCwheneverrepeated- 401.bzip2withitsreferenceinputinBOOM-2w

§ StatisticalSampling(e.g.SMARTS)- Staticallyboundederrors- Statewarmingproblems• µarchitecturalstateshouldberecoveredfromfunctionalsimulators

§ Whataboutmanaged-languageapplications?

Page 7: Evaluation of RISC-V RTL with FPGA-Accelerated Simulation · MIDAS Custom Compiler Passes 10 §FIRRTL: IR for RTL transforms ... -RTL state snapshots for energy modeling r FPGA-Accelerated

BAR

7

BuildRTLToValidateYourDesignIdeas!

§ RTLisnolongerdifficult- Hardwareconstructionlanguages(e.g.Chisel)- RISC-Vimplementationcodebase(e.g.RocketChip)

§ RTLsimulationisnolongerslow- TensofMIPSusingFPGAs

FGPA-Accelerated Simulation

RTL Implementation

HardwareSpecification

CAD Tools

Area, Timing, Power Evaluation

PerformanceEvaluation

Page 8: Evaluation of RISC-V RTL with FPGA-Accelerated Simulation · MIDAS Custom Compiler Passes 10 §FIRRTL: IR for RTL transforms ... -RTL state snapshots for energy modeling r FPGA-Accelerated

BAR

OldBOOMLayout

8

RegFile

ICache

Uncore

LSU

RenameTable

FPU

ROB

Free List

Issue Window

Branch PredictorALUs

Fetch Buffer

DCache

DCacheControl

IDIVIMUL Busy Table

Bypasses

Page 9: Evaluation of RISC-V RTL with FPGA-Accelerated Simulation · MIDAS Custom Compiler Passes 10 §FIRRTL: IR for RTL transforms ... -RTL state snapshots for energy modeling r FPGA-Accelerated

BAR

§ Automatically transformsanyRTLdesignsintoFPGA-AcceleratedRTLsimulators

§ Quickly evaluatesperformance,power,andenergyofRTLdesignswithrealisticsoftwareapplications.

§ Flexibly Co-simulatesabstractHW&SWmodels

9

MIDAS:TurnYourRTLintoGold

Target RTL

Page 10: Evaluation of RISC-V RTL with FPGA-Accelerated Simulation · MIDAS Custom Compiler Passes 10 §FIRRTL: IR for RTL transforms ... -RTL state snapshots for energy modeling r FPGA-Accelerated

BARMIDASCustomCompilerPasses

10

§ FIRRTL:IRforRTLtransformshttps://github.com/freechipsproject/firrtl

§ InstrumentRTLdesignsfor-Accurateperformancemodeling- EasysimulationcontrolinFPGA- Interactionswithabstracttimingmodels- RTLstatesnapshotsforenergymodeling

FIR

RTL

Com

pile

r

FPGA-Accelerated RTL Simulator

Target RTL Design

Macro MappingFAME1 Transform

Scan Chain InsertionSimulation MappingPlatform Mapping

Page 11: Evaluation of RISC-V RTL with FPGA-Accelerated Simulation · MIDAS Custom Compiler Passes 10 §FIRRTL: IR for RTL transforms ... -RTL state snapshots for energy modeling r FPGA-Accelerated

BARMappingSimulationtotheFPGAHost

11

I/ODevices

ProcessorL2 Cache

/ MainMemory

FPGA Board( e.g. Xilinx Zynq)

Software FPGA

I/ODevices Processor

MemorySystemTiming

Simulation Driver

Board DRAM

L2 Cache /Main Memory

I/O Endpoints

§ SimulationSpeed:3.56MHz(ISCA`16)à ~40MHz(CARRV`17)§ I/OEndpoints- Low-leveltimingtokens<->High-leveltransactions- OptimizethecommunicationsbetweenSW&FPGA

§ L2/MainMemory- Abstracttimingmodel(L2$tags)inFPGA- ActualdatainBoardDRAM- Setsize,associativity,blocksize,latency:runtimeconfigurable

Page 12: Evaluation of RISC-V RTL with FPGA-Accelerated Simulation · MIDAS Custom Compiler Passes 10 §FIRRTL: IR for RTL transforms ... -RTL state snapshots for energy modeling r FPGA-Accelerated

BARMemoryTimingModelValidation

12

§ Apointer-chaseµbenchmarkofccbench runninginBOOM(https://github.com/ucb-bar/ccbench)

§ L1$:16KiB,6cycles§ L2$:1MiB,6+23cycles (runtimeconfigurable)§ DRAM:6+23+80cycles(runtimeconfigurable)

Page 13: Evaluation of RISC-V RTL with FPGA-Accelerated Simulation · MIDAS Custom Compiler Passes 10 §FIRRTL: IR for RTL transforms ... -RTL state snapshots for energy modeling r FPGA-Accelerated

BARTargetDesign:RocketChipGenerator

13

Rocket(In-order Processor)

BOOM-2w (Version 1)(Out-of-order Processor)

Fetch-width 1 2

Issue-width 1 3

Issue slots 20

ROB size 80

Ld/St entries 16/16

Physical registers 32(int)/32(fp) 110

Branch predictor gshare: 16KiB history

BTB entries 40 40

RAS entries 2 4

MSHR entreis 2 2

L1 I$ / D$ 16 KiB or 32 KiB

ITLB / DTLB reaches 128 KiB / 128 KiB

L2 $ 1MiB / 23 cycles

DRAM latency 80 cycles

Page 14: Evaluation of RISC-V RTL with FPGA-Accelerated Simulation · MIDAS Custom Compiler Passes 10 §FIRRTL: IR for RTL transforms ... -RTL state snapshots for energy modeling r FPGA-Accelerated

BAR

§ InstructioncountwithRISC-Vforthereferenceinputs

§ InstructionCountAverage:2.08Trillion§ 445.gobmk,456.hmmer,462.libquantum:failinBOOM

SPEC2006intBenchmarkSuite

14

Benchmarks Instruction Count(T) Benchmarks Instruction Count(T)

400.perlbench 2.48 458.sjeng 2.85

401.bzip2 3.08 462.libquantum 2.09

403.gcc 1.37 464.h264ref 5.07

429.mcf 0.29 471.omnetpp 0.61

445.gobmk 2.04 473.astar 1.05

456.hmmer 2.95 483.xalanbmk 1.10

Page 15: Evaluation of RISC-V RTL with FPGA-Accelerated Simulation · MIDAS Custom Compiler Passes 10 §FIRRTL: IR for RTL transforms ... -RTL state snapshots for energy modeling r FPGA-Accelerated

BARSimulationTimeforSPEC2006int

15

Simulators Speed Average Max(464.h264ref: 5T)

GEM5+Ruby 100 KIPS[1]240 days

(8 months)640 days

(1.8 years)

MARSSx86 400 KIPS[2]60 days

(2 months)160 days

(5.3 months)

MIDAS 18 MIPS (40MHz) 1.4 days 3 days

[1] http://gem5-users.gem5.narkive.com/hCJL6O2Q/gem5-simulation-speed[2] Patel et al. MARSSx86: A full system simulator for x86 CPUs, DAC 2011.

Page 16: Evaluation of RISC-V RTL with FPGA-Accelerated Simulation · MIDAS Custom Compiler Passes 10 §FIRRTL: IR for RTL transforms ... -RTL state snapshots for energy modeling r FPGA-Accelerated

BARCaseStudy:IPC

16

0.0

0.2

0.4

0.6

0.8

1.0

1.2

IPC

Rocket 16KiB L1 Rocket 32KiB L1 BOOM-2w 16KiB L1 BOOM-2w 32KiB L1 Cortex A9

§ RocketiscomparabletoCortexA9§ BOOMoutperformsCortexA9§ Performanceimprovement:16KiBL1$à 32KiBL1$

1.41

Page 17: Evaluation of RISC-V RTL with FPGA-Accelerated Simulation · MIDAS Custom Compiler Passes 10 §FIRRTL: IR for RTL transforms ... -RTL state snapshots for energy modeling r FPGA-Accelerated

BARCaseStudy:MPKIsofBOOM-2w

17

0

10

20

30

40

50

60

MPK

I

Conditional Branch Indirect Branch L1 I-Cache ITLB L1 D-Cache DTLB L2 Cache

§ 40BTBentries,32KiBL1$,1MiBL2$,TLBreach=128KiB§ 473.omnetpp,483.xalancbmk:LargeindirectbranchMPKIsà BiggerBTBs

Page 18: Evaluation of RISC-V RTL with FPGA-Accelerated Simulation · MIDAS Custom Compiler Passes 10 §FIRRTL: IR for RTL transforms ... -RTL state snapshots for energy modeling r FPGA-Accelerated

BAR

0

20

40

60

80

100

Issu

e Sl

ots

/ Iss

ue W

idth

(%)

Issued Slots Empty Slots Non-issued Slots

CaseStudy:IssueQueueUtilization

18

§ IssuedSlots/Cycle=IPC§ EmptySlots:frontendhazards,lackofresources(403.gcc)§ Non-issuedslots:backendhazards

Page 19: Evaluation of RISC-V RTL with FPGA-Accelerated Simulation · MIDAS Custom Compiler Passes 10 §FIRRTL: IR for RTL transforms ... -RTL state snapshots for energy modeling r FPGA-Accelerated

BAR

Full Program Execution In FPGA

StroberPower/EnergyModeling

19

RTL State Snapshots

I/O Traces

PrimeTime PX

RTL SimulationPost-synthesisDesignsRTL Signal Activities

Average Power

§ NostatewarmingisnecessaryinRTL/gate-levelsimulation

Page 20: Evaluation of RISC-V RTL with FPGA-Accelerated Simulation · MIDAS Custom Compiler Passes 10 §FIRRTL: IR for RTL transforms ... -RTL state snapshots for energy modeling r FPGA-Accelerated

BAR

0

100

200

300

400

500

600Ro

cket

BOOM

-2w

Rock

et

BOOM

-2w

Rock

et

BOOM

-2w

Rock

et

BOOM

-2w

Rock

et

BOOM

-2w

Rock

et

BOOM

-2w

Rock

et

BOOM

-2w

Rock

et

BOOM

-2w

Rock

et

BOOM

-2w

400.perlbench 401.bzip2 403.gcc 429.mcf 458.sjeng 464.h264ref 471.omnetpp 473.astar 483.xalancbmk

Pow

er (m

W)

Misc

Uncore

L1 D-cache control

L1 D-cache meta + data

L1 I-cache

ROB

LSU

FPU

Integer Unit

Issue Logic

Register File

Rename + Control

Branch Predictior

Fetch Unit

CaseStudy:PowerBreakdown

20

§ Synopsys32nmeducationaltechnology§ 50randomsampleRTLstatesnapshots§ PipelineUtilization↑à Power↑§ BOOM:40%powerfromRenameLogic&RegisterFile

Page 21: Evaluation of RISC-V RTL with FPGA-Accelerated Simulation · MIDAS Custom Compiler Passes 10 §FIRRTL: IR for RTL transforms ... -RTL state snapshots for energy modeling r FPGA-Accelerated

BARCaseStudy:EnergyPerInstruction

21

§ BOOM-2wisperformant,Rocketismoreenergyefficient§ 429.mcf,471.omnetpp:lowpower,lessenergyefficientPipelineUtilization↑à EnergyEfficiency↑

0

200

400

600

800

1000

EPI(p

J / I

nst)

Rocket 32 KiB L1 BOOM-2w 32 KiB L1

1923.4 1027.0

Page 22: Evaluation of RISC-V RTL with FPGA-Accelerated Simulation · MIDAS Custom Compiler Passes 10 §FIRRTL: IR for RTL transforms ... -RTL state snapshots for energy modeling r FPGA-Accelerated

BAR

§ Debugging,validationforRTLdesigns-Assertiondetection-CommitlogcomparisonwithSpike

§ Memorysystemtimingmodeling-RealisticDRAMtimingmodelsinFPGA

§ FireSim:datacentersimulation(fires.im)

-ConnectRocketChips withNICs&switchtimingmodels-Amazonblogpost,7th RISC-Vworkshop

On-goingMIDASResearchInUCB

22

Page 23: Evaluation of RISC-V RTL with FPGA-Accelerated Simulation · MIDAS Custom Compiler Passes 10 §FIRRTL: IR for RTL transforms ... -RTL state snapshots for energy modeling r FPGA-Accelerated

BARMIDAS(Strober)IsOpen-SourceNow

23

Your Processors /Accelerators

§ Examples:https://github.com/ucb-bar/midas-examples

§ RocketChip template:https://github.com/ucb-bar/midas-top-release

§ Willsupportvarioushostplatforms(XilinxZynq,AmazonEC2F1,IntelXeon+In-packageFPGA)

https://github.com/ucb-bar/midas-releasestrober.org

Page 24: Evaluation of RISC-V RTL with FPGA-Accelerated Simulation · MIDAS Custom Compiler Passes 10 §FIRRTL: IR for RTL transforms ... -RTL state snapshots for energy modeling r FPGA-Accelerated

BARAcknowledgements

24

§ Funding:- DARPAAwardNumberHR0011-12-2-0016- TheCenterforFutureArchitectureResearch,amemberofSTARnet,aSemiconductorResearchCorporationprogramsponsoredbyMARCOandDARPA

- ASPIRELabindustrialsponsorsandaffiliates:Intel,Google,HPE,Huawei,LGE,Nokia,NVIDIA,Oracle,andSamsung

- Kwanjeong Scholarship