SPREE Tutorial
Peter Yiannacouras
April 13, 2006
Processors on FPGAs
You all used FPGAs (ECE241)
  Adders
  7-segment decoders
  Etc.
We are putting whole microprocessors on them
  We call these soft processors
Hard Versus Soft Processors
Soft Processor
  Written in HDL (Verilog)
  Programmed onto chip
Hard Processors
  Made of transistors
  Cost millions to make
  Faster, smaller, less power
Processors and FPGA Systems
FPGAs are a common platform for digital systems
[Figure: an FPGA system containing a Soft Processor, Memory Interface, UART, Custom Logic, and Ethernet]
The soft processor performs coordination and even computation
  Better processors => less hardware to design
We aim to improve soft processors by customizing them
Our Research Problem
Soft processors have worse
  Area
  Speed
  Power
But they are flexible
  Use that flexibility to counteract the disadvantages
HOW???
  Customize the processor’s architecture
    e.g., Intel vs. AMD
    e.g., Motorola 68360 vs. 68010
HOW????
Research Goals
1. Understand tradeoffs in soft processors
   E.g., a hardware multiplier is big but can perform multiplies fast
2. Customize the processor to the application
   E.g., bubble sort doesn’t use multiplies, therefore remove the hardware multiplier and save on area
We developed SPREE, software to help us do both
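To make the second goal concrete, here is a minimal C++ sketch of the kind of check that decides whether an application needs the multiplier. It is not SPREE code: the function name `usesHardwareMultiply` and the idea of scanning a list of instruction mnemonics are assumptions made for illustration. It looks only for the MIPS I multiply instructions `mult` and `multu`.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper (not part of SPREE): returns true if any instruction
// mnemonic in the program is a MIPS I multiply (mult or multu).
bool usesHardwareMultiply(const std::vector<std::string>& mnemonics) {
    for (const std::string& m : mnemonics) {
        if (m == "mult" || m == "multu") {
            return true;
        }
    }
    return false;
}
```

If the function returns false for every instruction in a benchmark, the hardware multiplier is a candidate for removal on a processor dedicated to that benchmark.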
SPREE
SPREE System (Soft Processor Rapid Exploration Environment)
■ Input: Processor description
  ISA
  Datapath
■ SPREE System
  1. Verify ISA against datapath
  2. Datapath Instantiation
  3. Control Generation
■ Output: Synthesizable Verilog
Input: Instruction Set Architecture (ISA) Description
■ Graph of Generic Operations (GENOPs)
■ Edges indicate flow of data
Example: MIPS ADD – add rd, rs, rt
[Figure: GENOP graph with nodes FETCH, RFREAD, RFREAD, ADD, RFWRITE]
ISA currently fixed (subset of MIPS I)
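The ADD example can be written down as a small graph data structure. The sketch below is only an illustration under stated assumptions: `InstructionGraph`, `addGenop`, `addEdge`, and `buildMipsAdd` are hypothetical names, not SPREE's real classes, and the edge directions (fetch feeds both register reads, both reads feed the add, the add feeds the write) are inferred from the node names because the slide lists only the nodes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for one ISA description entry (not SPREE's own type).
struct InstructionGraph {
    std::string name;
    std::vector<std::string> genops;                         // graph nodes
    std::vector<std::pair<std::size_t, std::size_t>> edges;  // producer -> consumer

    // Adds a generic operation and returns its node index.
    std::size_t addGenop(const std::string& op) {
        genops.push_back(op);
        return genops.size() - 1;
    }

    // Records that data flows from node `from` to node `to`.
    void addEdge(std::size_t from, std::size_t to) {
        edges.emplace_back(from, to);
    }
};

// Builds an assumed GENOP graph for MIPS "add rd, rs, rt".
InstructionGraph buildMipsAdd() {
    InstructionGraph g;
    g.name = "add";
    const std::size_t fetch  = g.addGenop("FETCH");
    const std::size_t readRs = g.addGenop("RFREAD");
    const std::size_t readRt = g.addGenop("RFREAD");
    const std::size_t add    = g.addGenop("ADD");
    const std::size_t write  = g.addGenop("RFWRITE");
    g.addEdge(fetch, readRs);   // instruction word selects rs
    g.addEdge(fetch, readRt);   // instruction word selects rt
    g.addEdge(readRs, add);     // first operand
    g.addEdge(readRt, add);     // second operand
    g.addEdge(add, write);      // sum is written back
    return g;
}
```

A description of this shape records what each instruction needs, so it can be checked against whatever datapath is supplied.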
Input: Datapath Description
■ Interconnection of hand-coded components
■ Allows efficient synthesis
■ Described using C++
[Figure: example datapaths assembled from SPREE Component Library parts: Ifetch, Reg File, ALU, Shifter, Mul, Data Mem, WriteBack]
Component Selection
Select by name
  Names looked up in library
  Stored in cpugen/rtl_lib
RTLComponent *ifetch=new RTLComponent("ifetch");
RTLComponent *reg_file=new RTLComponent("reg_file");
Datapath Wiring Example
[Figure: component ports]
  Ifetch: rd, rs, rt, offset
  Regfile: dst, a_reg, a_data, b_reg, b_data, writedata
  ALU: opA, opB, result
proc.addConnection(ifetch,"rs",reg_file,"a_reg");
proc.addConnection(ifetch,"rt",reg_file,"b_reg");
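To show what a connection list like this amounts to, here is a self-contained mock in C++. None of it is SPREE's API: `MockComponent`, `MockConnection`, and `MockDatapath` are hypothetical stand-ins, and rejecting unknown port names is an assumed behaviour added for illustration. The port names are taken from the slide.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for a library component: a name plus its port names.
struct MockComponent {
    std::string name;
    std::set<std::string> ports;
};

// One recorded wire between two component ports.
struct MockConnection {
    std::string srcComponent, srcPort, dstComponent, dstPort;
};

// Hypothetical stand-in for the processor description being built.
class MockDatapath {
public:
    // Records a wire; returns false (and records nothing) if either port
    // name is not declared on its component.
    bool addConnection(const MockComponent& src, const std::string& srcPort,
                       const MockComponent& dst, const std::string& dstPort) {
        if (src.ports.count(srcPort) == 0 || dst.ports.count(dstPort) == 0) {
            return false;
        }
        connections_.push_back({src.name, srcPort, dst.name, dstPort});
        return true;
    }

    const std::vector<MockConnection>& connections() const {
        return connections_;
    }

private:
    std::vector<MockConnection> connections_;
};
```

With components declared as `ifetch` (ports rd, rs, rt, offset) and `reg_file` (ports dst, a_reg, a_data, b_reg, b_data, writedata), the two calls on the slide would each record one wire, while a misspelled port name would be rejected.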
SPREE System + Backend (Soft Processor Rapid Exploration Environment)
Processor Description → SPREE generator (spegen) → Verilog
Verilog → Quartus II CAD Software (specadflow) → 1. Area, 2. Clock Frequency, 3. Power
Verilog + Benchmarks → Modelsim Verilog Simulator (spebenchmark) → 4. Cycle Count
Benchmarks → Mint MIPS Simulator (simulator/run) → Compare traces with Modelsim
Walking through an Example (see README.txt)
Choose a pre-built processor
  cpugen/src/arch lists all the processors
Let’s choose pipe3_serialshift
  3-stage pipeline with serial shifter
Using SPREE on a Processor
Generate, benchmark, synthesize
% spegen pipe3_serialshift        ← Generates Verilog
% spebenchmark pipe3_serialshift  ← Runs benchmarks
% specadflow pipe3_serialshift    ← Synthesizes processor
% specompare pipe3_serialshift    ← Displays results
spegen – Generating Processors
Input: Processor description
Syntax: spegen <processor name>
Output: a folder named after the processor, containing
  Hand-coded Verilog modules
  system.v: generated hookup and control
  OUT.cpugen: stages per instruction, hazard window/branch penalty
  test_bench.v: test bench for Modelsim simulation
Benchmarking
Run programs on the processor
  Measure time taken until completion
  Verify functionality
Can do this without knowing anything about the benchmarks themselves
spebenchmark – Benchmarking
Input: Processor implementation
Syntax: spebenchmark <processor>
Output (ideally):
  Cycle counts of all benchmarks
  Traces: /tmp/modelsim_trace.txt
******* Benchmarking pipe3_serialshift ********
Simulating bubble_sort ... Success! Cycle count=2994
Simulating crc ... Success! Cycle count=112750
Simulating des ... Success! Cycle count=5129
Simulating fft ... Success! Cycle count=5077
Simulating fir ... Success! Cycle count=1214
...
Benchmarking – under the hood
C source benchmarks → Compiler (gcc - MIPS) → Binary Executable
Binary Executable + Verilog → Modelsim Verilog Simulator (spebenchmark) → Trace, Cycle Count
  Traces: /tmp/modelsim_trace.txt, /tmp/modelsim_store_trace.txt
Binary Executable → Mint MIPS Simulator (simulator/run) → Trace
  Traces: applications/<benchmark name>/mint
Compare traces (spebenchmark)
specompiler - Setup compiler
Choose the path to your compiler (prebuilt)
  Default: /jayar/b/b0/yiannac/spe/compiler
    GCC 3.3.3, software division
  Another: /jayar/b/b0/yiannac/spe/compiler-softmul
    GCC 3.3.3, software division and software multiplication
specompiler will:
  1. Compile all benchmarks (and store binaries)
  2. Simulate all benchmarks (and store traces)
% specompiler /jayar/b/b0/yiannac/spe/compiler-softmul
After this point, you can just run spebenchmark
spebenchmark - failure
Shows discrepancy between MINT and Modelsim
******* Benchmarking pipe3_serialshift ********
Simulating bubble_sort ... Error: Trace does not match, Cycle count=381
Discrepancy found at 6800000 ps
Modelsim: PC=04000064 | IR=24090001 | 05: 00000000
Mint: PC=040000b8 | IR=8c47004c | 07: 00000064
In the last field, the number before the colon (05, 07) is the destination register and the number after it is the value being written
These are clues to where the error occurred
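The comparison behind a report like this can be sketched as finding the first entry where two traces differ. This is a minimal illustration, not spebenchmark's actual implementation: the function name `firstDiscrepancy` is hypothetical, and treating each trace entry as one string such as `07: 00000064` is an assumption about the trace format.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: returns the index of the first entry where the two
// traces differ, or -1 if they are identical. A trace that ends early is
// reported as a discrepancy at the first missing entry.
long firstDiscrepancy(const std::vector<std::string>& reference,
                      const std::vector<std::string>& observed) {
    const std::size_t common = std::min(reference.size(), observed.size());
    for (std::size_t i = 0; i < common; ++i) {
        if (reference[i] != observed[i]) {
            return static_cast<long>(i);
        }
    }
    if (reference.size() != observed.size()) {
        return static_cast<long>(common);
    }
    return -1;
}
```

Here the Mint trace would play the role of `reference` and the Modelsim trace the role of `observed`; the returned index points at the first write that went wrong, which is usually close to the faulty instruction.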
spebenchmark - waveforms
Can see any signal within the processor
% sim_gui bubble_sort pipe3_serialshift
Modelsim
  LEARN IT!!!
  Quartus Simulator is vastly inferior, and even unusable for our purposes
The Testbench (test_bench.v)
What is it?
  The stimulus and monitor for your circuit
SPREE generates it automatically
  Hence it works right away
Hand-coding your own processor means
  You have to interface with the test bench
  Once you have the testbench you can use spebenchmark
Manual Interfacing with the Testbench
[Figure: test_bench.v connected to your soft processor through these signals]
  regfile_we, regfile_dst, regfile_data
  datamem_we, datamem_addr, datamem_data
Need only 6 wires
  To track writes to register file and data mem
specadflow – Synthesis
Input: Processor implementation
Syntax: specadflow <processor name>
Performs a “seed sweep”
  Averages several runs since results are noisy
  Runs several instances of Quartus
  Across several machines in parallel
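The averaging step of a seed sweep can be sketched as follows. This is an illustration under assumptions, not specadflow's code: `SynthesisResult` and `averageSeedSweep` are hypothetical names, and a plain arithmetic mean over the runs is assumed.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical record of one synthesis run (one seed).
struct SynthesisResult {
    double areaLEs;           // area in LEs or ALUTs
    double clockMHz;          // clock frequency in MHz
    double energyNJPerCycle;  // estimated energy per cycle in nJ
};

// Returns the arithmetic mean of each metric over all runs.
// An empty sweep yields all zeros.
SynthesisResult averageSeedSweep(const std::vector<SynthesisResult>& runs) {
    SynthesisResult mean{0.0, 0.0, 0.0};
    if (runs.empty()) {
        return mean;
    }
    for (const SynthesisResult& r : runs) {
        mean.areaLEs += r.areaLEs;
        mean.clockMHz += r.clockMHz;
        mean.energyNJPerCycle += r.energyNJPerCycle;
    }
    const double n = static_cast<double>(runs.size());
    mean.areaLEs /= n;
    mean.clockMHz /= n;
    mean.energyNJPerCycle /= n;
    return mean;
}
```

Averaging over seeds matters because place-and-route results vary from run to run, so a single run can make one processor look better or worse than it really is.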
specadflow Output
Output:
  Synthesis results (hidden)
  Summary output
Started Tue 6:27PM, Waiting for processes: 10.0.0.61 10.0.0.57 10.0.0.56 10.0.0.55 10.0.0.54 10.0.0.51
Finished Tue 6:33PM
1081 75.7812 0.99822 ...
Waiting on eda writer
The three numbers are, in order:
  Area (LEs or ALUTs)
  Clock Frequency (MHz)
  Estimated energy dissipated per cycle (nJ/cycle)
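The cycle count from spebenchmark and the clock frequency from specadflow combine into wall-clock execution time: time = cycles / frequency. A minimal sketch, with the hypothetical function name `executionTimeMicroseconds`; dividing a cycle count by a frequency in MHz gives microseconds directly.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: wall-clock time in microseconds for a benchmark that
// takes `cycleCount` cycles on a processor clocked at `clockMHz` MHz.
// cycles / (cycles per microsecond) = microseconds.
double executionTimeMicroseconds(unsigned long cycleCount, double clockMHz) {
    return static_cast<double>(cycleCount) / clockMHz;
}
```

For example, the 2994 cycles reported for bubble_sort would take 29.94 microseconds at an assumed 100 MHz clock. This is why cycle count alone does not rank processors: a design with fewer cycles but a slower clock can still lose.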
Any Questions?
Technical support, ask me
EXTRAS
Setup/Install
Copy and unpack the SPREE tarball: /jayar/b/b0/yiannac/spree.tar.gz
Build all the SPREE software
  % cd spree
  % make
Follow instructions in INSTALL.txt
If there are any errors, email me
SPREE Directory Structure
spree
  applications: benchmarks (C source)
  compiler: binutils, gcc, newlib
  cpugen: the cpu generator + processor descriptions
  modelsim: Verilog simulator
  simulator: MIPS simulator
  quartus: synthesis
Setup cluster
Choose the cluster you’re using
  aenao – high performance, limited access
  eecg – any eecg-connected machine
    Edit quartus/machines.txt
    Put a list of 11 or so good eecg machines
% specluster eecg
OR
% specluster aenao