Lab-3: Simple Accelerator Design National Chiao Tung University Chun-Jen Tsai 4/11/2011


Video Decoder Accelerator

  • Goal: Implement a block-based motion compensation circuit.
  • Tasks:
      • Extend the given accelerator from point-wise operation to 8x8 block-based operation.
      • Give a demo (per group) to the TAs by 5/5 to show that your system is working.
      • Each team hands in a three-page report that describes your design and provides the performance improvement numbers.


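Architecture of the MCOMP Logic

  • The architecture of the provided motion compensation module is composed of a register file and an interpolation datapath.

[Figure: block diagram of mcomp attached to the LEON3 ahbctrl. haddr bits addr[4:2] (3 bits) form the address input of the register file (reg a, reg b, reg c, reg d, reg r); hwdata (32 bits) is the data input; the datapath computes (a+b+1-r)/2 or (a+b+c+d+2-r)/4 and drives the data output hrdata; hconfig (8x32 ROM, 256 bits) goes to ahbctrl.]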


Inside Lab 3 Package

  • Download and unzip the package for lab3; you will see the following directory structure:

Lab3_pkg
|
+- m4v_dec_ecos (the complete video decoder with HW accelerator API)
|
+- rtl (synthesizable RTL model of the motion compensation logic)
|
+- testbench (C program for waveform simulation or FPGA verification)


Functions to be Accelerated

  • In bilinear8x8.c, there are three functions for bilinear interpolation:
      • halfpel8x8_h()
      • halfpel8x8_v()
      • halfpel8x8_hv()
  • The current code performs pixel-wise interpolation; you must modify it to perform block-based interpolation.

p = (A + B + 1 - r)/2
q = (A + C + 1 - r)/2
o = (A + B + C + D + 2 - r)/4

r = 0 or 1 is a “rounding control” parameter (selected by the encoder).

[Figure: full-pel samples A, B (top row) and C, D (bottom row); p lies halfway between A and B, q halfway between A and C, and o at the centre of the four samples.]
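The three formulas above can be sketched in C as follows. This is a minimal illustration only: the function names interp_p, interp_q, and interp_o are hypothetical and do not come from bilinear8x8.c, and the divisions are written as right shifts, as in the lab's reference code.

```c
#include <stdint.h>

/* Hypothetical helper names, for illustration only.
 * A, B, C, D are the four neighbouring full-pel samples;
 * r is the rounding control (0 or 1). */

/* Horizontal half-pel sample p, between A and B. */
uint8_t interp_p(uint8_t A, uint8_t B, int r)
{
    return (uint8_t)((A + B + 1 - r) >> 1);
}

/* Vertical half-pel sample q, between A and C. */
uint8_t interp_q(uint8_t A, uint8_t C, int r)
{
    return (uint8_t)((A + C + 1 - r) >> 1);
}

/* Diagonal half-pel sample o, at the centre of A, B, C, D. */
uint8_t interp_o(uint8_t A, uint8_t B, uint8_t C, uint8_t D, int r)
{
    return (uint8_t)((A + B + C + D + 2 - r) >> 2);
}
```

With r = 0 the results are rounded up on a tie; with r = 1 they are rounded down.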


Pixel vs. Block-based Operations

  • Take halfpel8x8_h() for example:

void
halfpel8x8_h(uint8 * dst, uint8 * src, xint stride, xint rounding)
{
    xint row, col, idx, sum;

    idx = 0;
    for (row = 0; row < (stride << 3); idx = (row += stride))
    {
        for (col = 0; col < 8; col++, idx++)
        {
            sum = (xint) src[idx] + (xint) src[idx+1] + 1 - rounding;
            dst[idx] = (uint8) (sum >> 1);
        }
    }
}

  • Cut-off point between HW and SW in the provided logic: the per-pixel computation inside the inner loop.
  • Cut-off point between HW and SW in your logic: the whole 8x8 block, i.e. both loops.


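Block Interpolation Architecture

[Figure: block diagram of the block-based mcomp attached to the LEON3 ahbctrl. haddr bits addr[8:2] (7 bits) form the address input of a register file holding pixel(0) ... pixel(80) and reg r; hwdata (32 bits) is the data input; a controller and a block-based interpolator drive hrdata; hconfig (8x32 ROM, 256 bits) goes to ahbctrl.]

type block_9x9 is array (0 to 80) of std_logic_vector(7 downto 0);
signal pixel: block_9x9;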
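A software reference model of what the block-based interpolator has to compute can help when checking the hardware results. The sketch below is only an assumption about the data layout: it treats the 81 entries of the 9x9 pixel array as row-major, and the names block_interp and mc_mode are hypothetical.

```c
#include <stdint.h>

#define BLK_IN  9   /* 9x9 input pixels, as in the block_9x9 type */
#define BLK_OUT 8   /* 8x8 interpolated output pixels */

/* Hypothetical mode names for the three half-pel cases. */
enum mc_mode { MC_H, MC_V, MC_HV };

/* Software reference model (assumed row-major layout):
 * pixel[] holds the 9x9 source block, index 0..80;
 * out[] receives the 8x8 result; r is the rounding control. */
void block_interp(const uint8_t pixel[BLK_IN * BLK_IN],
                  uint8_t out[BLK_OUT * BLK_OUT],
                  enum mc_mode mode, int r)
{
    for (int row = 0; row < BLK_OUT; row++) {
        for (int col = 0; col < BLK_OUT; col++) {
            int a = pixel[row * BLK_IN + col];           /* A */
            int b = pixel[row * BLK_IN + col + 1];       /* B, right of A */
            int c = pixel[(row + 1) * BLK_IN + col];     /* C, below A */
            int d = pixel[(row + 1) * BLK_IN + col + 1]; /* D, below B */
            int v;

            switch (mode) {
            case MC_H:  v = (a + b + 1 - r) >> 1;         break;
            case MC_V:  v = (a + c + 1 - r) >> 1;         break;
            default:    v = (a + b + c + d + 2 - r) >> 2; break;
            }
            out[row * BLK_OUT + col] = (uint8_t)v;
        }
    }
}
```

The extra ninth row and column are what allow all 64 outputs to be produced from one block load, since every output needs its right and lower neighbours.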


Adding Logic to the LEON Platform

  • In Lab 3, we add an AHB slave logic to the LEON 3 platform. The RTL model directory contains some sample modifications:

rtl
|
+- nctu +- video_acc +- mcomp.vhd, video_acc.vhd
|       |            +- vhdlsyn.txt
|       |
|       +- dirs.txt
|
+- devices.vhd (to replace the one in $(GRLIB)/lib/grlib/amba)
|
+- leon3mp.vhd (to replace the one in $(GRLIB)/designs/leon3-gr-xc3s-1500)

  • Put the whole nctu directory in $(GRLIB)/lib.
  • Note: $(GRLIB) is the directory of the complete source tree of GRLIB.


Modify LEON Working Libraries

  • To add our “nctu” video accelerator logic to the LEON3 working library, we have to add the library name “nctu” to the libs.txt files under the following two directories:
      • $(GRLIB)/designs/leon3-gr-xc3s-1500/
      • $(GRLIB)/lib/


Generation of Build Scripts

  • Enter the directory designs/leon3-gr-xc3s-1500 in the GRLIB source tree, and type the following commands:

$ make distclean    # clean up previous builds
$ make scripts      # generate automatic build scripts

  • The command “make scripts” searches the directory tree $(GRLIB)/lib for libs.txt, dirs.txt, vhdlsyn.txt, and vhdlsim.txt in order to generate proper build scripts for all the hardware logic.


Adding Logic to AHB Bus†

  • The AMBA bus used for LEON 3 has some extensions:
      • All HSEL signals are routed to all slave logic.
      • Some extra signals are added: HCACHE, HIRQ, HCONFIG, and HINDEX.
  • The HCONFIG signal contains information about the logic to be added.
      • Each HCONFIG signal is 8x4 bytes (eight 32-bit words).
      • GRLIB allows 64 masters and 64 slaves, with their HCONFIG records mirrored at 0xFFFFF000 ~ 0xFFFFFFFF.

†For details, see Jiri Gaisler and Sandi Habinc, GRLIB IP Library User’s Manual, ver. 1.0.22, Aeroflex Gaisler, 2010
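The numbers above fit together: 128 records of 32 bytes each fill exactly the 4 KB area from 0xFFFFF000 to 0xFFFFFFFF. The sketch below computes where a given record would be mirrored. It assumes that the 64 master records come first and the 64 slave records follow; the function and macro names are hypothetical, so check the GRLIB manual before relying on the layout.

```c
#include <stdint.h>

#define CFG_AREA_BASE   0xFFFFF000u  /* start of the mirrored records */
#define CFG_RECORD_SIZE 32u          /* 8 words x 4 bytes per record */
#define CFG_MAX_UNITS   64u          /* 64 masters, then 64 slaves */

/* Address of the HCONFIG record of AHB master `index` (0..63). */
uint32_t master_cfg_addr(uint32_t index)
{
    return CFG_AREA_BASE + index * CFG_RECORD_SIZE;
}

/* Address of the HCONFIG record of AHB slave `index` (0..63),
 * assuming the slave records follow the 64 master records. */
uint32_t slave_cfg_addr(uint32_t index)
{
    return CFG_AREA_BASE + (CFG_MAX_UNITS + index) * CFG_RECORD_SIZE;
}
```

Under this assumption, the record of the Lab 3 logic (slave index 6) would start at 0xFFFFF8C0.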


HCONFIG Record

  • The HCONFIG record controls the address map, interrupt control, and caching control of a logic.


Address Decoding (1/2)

  • The memory area mapped to a logic is determined by its bank address registers (BARs).
  • There are two types of logic: memory and I/O.
      • For memory logic, each BAR specifies by default a 1 MB address space at any location (except 0xFFF00000 ~ 0xFFFFFFFF) of the 32-bit address space.
      • For I/O logic, each BAR specifies by default a 256-byte address space at any location in the address range 0xFFF00000 ~ 0xFFFFFFFF.


Address Decoding (2/2)

  • If a logic needs a larger contiguous address space, the MASK of a BAR can be used to increase the default range of its space.

ahbctrl: AHB arbiter/multiplexer rev 1
ahbctrl: Common I/O area disabled
ahbctrl: AHB masters: 3, AHB slaves: 8
ahbctrl: Configuration area at 0xfffff000, 4 kbyte
ahbctrl: mst0: Gaisler Research Leon3 SPARC V8 Processor
ahbctrl: mst1: Gaisler Research AHB Debug UART
ahbctrl: mst2: Gaisler Research SVGA frame buffer
ahbctrl: slv0: European Space Agency Leon2 Memory Controller
ahbctrl: memory at 0x00000000, size 512 Mbyte, cacheable, prefetch
ahbctrl: memory at 0x20000000, size 512 Mbyte
ahbctrl: memory at 0x40000000, size 1024 Mbyte, cacheable, prefetch
ahbctrl: slv1: Gaisler Research AHB/APB Bridge
ahbctrl: memory at 0x80000000, size 1 Mbyte
ahbctrl: slv2: Gaisler Research Leon3 Debug Support Unit
ahbctrl: memory at 0x90000000, size 256 Mbyte
ahbctrl: slv6: CS NCTU CODESIGN Lab3 Motion Compensation Logic
ahbctrl: memory at 0xb0000000, size 1 Mbyte, prefetch
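The effect of the MASK can be illustrated with a small C model of the decoder. It is a sketch under the assumption that a memory BAR compares its 12-bit ADDR field with haddr[31:20] on the bit positions where MASK is 1; the function names are hypothetical.

```c
#include <stdint.h>

/* Returns 1 if haddr falls inside the area of a memory BAR.
 * Assumption: the 12-bit ADDR field is compared with haddr[31:20]
 * only on the bit positions where the 12-bit MASK field is 1. */
int bar_match(uint32_t haddr, uint32_t addr12, uint32_t mask12)
{
    uint32_t top = haddr >> 20;
    return ((top ^ addr12) & mask12 & 0xFFFu) == 0;
}

/* Size in Mbyte of the area selected by a contiguous mask:
 * every cleared low-order mask bit doubles the 1 MB default. */
uint32_t bar_size_mbyte(uint32_t mask12)
{
    return ((~mask12) & 0xFFFu) + 1u;
}
```

With ADDR = 0xB00 and MASK = 0xFFF (the defaults used by the Lab 3 logic) the area is 1 Mbyte at 0xB0000000. Under the same assumption, a MASK of 0xC00 would give the 1024 Mbyte area reported for the memory controller at 0x40000000.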


Interface of Lab 3 Logic

library ieee;
use ieee.std_logic_1164.all;
library grlib;
use grlib.amba.all;     -- AHB record types, ahb_device_reg, ahb_membar
use grlib.stdlib.all;   -- zero32
use grlib.devices.all;  -- VENDOR_NCTU, NCTU_MCOMP

entity mcomp is
  generic (
    ahbndx  : integer := 0;
    ahbaddr : integer := 0;
    addrmsk : integer := 16#fff#;
    verid   : integer := 0;
    hirq_no : integer := 0
  );
  port (
    rst   : in  std_ulogic;
    clk   : in  std_ulogic;
    ahbsi : in  ahb_slv_in_type;
    ahbso : out ahb_slv_out_type
  );
end;

architecture rtl of mcomp is

  constant hconfig : ahb_config_type := (
    0 => ahb_device_reg(VENDOR_NCTU, NCTU_MCOMP, 0, verid, hirq_no),
    4 => ahb_membar(ahbaddr, '1', '0', addrmsk),
    others => zero32
  );


Vendor/Device ID Registration

  • Your vendor and device ID can be stored in the file $(GRLIB)/lib/grlib/amba/devices.vhd for simulation purposes:

-- HW/SW Co-design LAB vendor code & device ID
constant VENDOR_NCTU : amba_vendor_type := 16#18#;
constant NCTU_MCOMP  : amba_device_type := 16#001#;

-- pragma translate_off
-- description strings are fixed-length; pad with trailing spaces
constant NCTU_DESC : vendor_description := "CS NCTU CODESIGN        ";

constant nctu_device_table : device_table_type := (
  NCTU_MCOMP => "Lab3 Motion Compensation Logic ",
  others     => "Unknown Device                 "
);

constant nctu_lib : vendor_library_type := (
  vendorid     => VENDOR_NCTU,
  vendordesc   => NCTU_DESC,
  device_table => nctu_device_table
);
-- pragma translate_on


Device Instantiation

  • The device must be instantiated in the file $(GRLIB)/designs/leon3-gr-xc3s-1500/leon3mp.vhd.
  • In this example, the device registers IRQ #14. Raising HIRQ(14) causes an interrupt to the interrupt controller, and then to the ISR.

my_mcomp : mcomp
  generic map (
    ahbndx  => 6,
    ahbaddr => 16#B00#,
    verid   => 1,
    hirq_no => 14)
  port map (rstn, clkm, ahbsi, ahbso(6));


Implement the HW Bit File

  • Change directory to designs/leon3-gr-xc3s-1500; logic implementation is done by typing:

$ make ise

  • You can use iMPACT to configure the FPGA.
  • When you use “info sys” in GRMON, you should see:

02.01:004 Gaisler Research LEON3 Debug Support Unit (ver 0x1)
          ahb: 90000000 - a0000000
          AHB trace 1 lines, stack pointer 0x43fffff0
. . .
06.18:001 Unknown vendor Unknown device (ver 0x1)
          ahb: b0000000 - b0100000
01.01:00c Gaisler Research Generic APB UART (ver 0x1)
. . .
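The tag 06.18:001 in the listing can be read as slave index 6, vendor ID 0x18, and device ID 0x001, which are the values given to the Lab 3 logic. The sketch below packs and prints such an identification word. Both the field layout (vendor in bits 31:24, device in 23:12, version in 9:5, IRQ in 4:0) and the index.vendor:device reading of the GRMON tag are assumptions to be checked against the GRLIB manual; the function names are hypothetical.

```c
#include <stdint.h>
#include <stdio.h>

/* Pack an identification word. Assumed layout: vendor in bits 31:24,
 * device in bits 23:12, version in bits 9:5, IRQ in bits 4:0. */
uint32_t id_word(uint32_t vendor, uint32_t device,
                 uint32_t version, uint32_t irq)
{
    return ((vendor & 0xFFu) << 24) | ((device & 0xFFFu) << 12) |
           ((version & 0x1Fu) << 5) | (irq & 0x1Fu);
}

/* Format a tag in the index.vendor:device style of the listing above. */
void format_id(char *buf, size_t len, unsigned index, uint32_t word)
{
    snprintf(buf, len, "%02x.%02x:%03x", index,
             (unsigned)(word >> 24),
             (unsigned)((word >> 12) & 0xFFFu));
}
```

For example, format_id(buf, sizeof buf, 6, id_word(0x18, 0x001, 1, 14)) fills buf with the tag of the Lab 3 logic.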


Testing the Accelerator

  • To test the motion compensation logic, you can build and run the C testbench program, mcomp_tb.c.
      • Building mcomp_tb.c does not require eCos.
  • To run the complete video decoder with HW “acceleration,” simply replace the file bilinear8x8.c in the previous lab's source tree with the one from this package.


Calling HW Logic from SW Code

  • The current reference interface design is naive:

#define USE_HW_MC 1

#if USE_HW_MC
volatile xint *reg_a = (xint *)0xb0000000;
volatile xint *reg_b = (xint *)0xb0000004;
volatile xint *reg_c = (xint *)0xb0000008;
volatile xint *reg_d = (xint *)0xb000000c;
volatile xint *reg_r = (xint *)0xb0000010;

volatile xint *mc_2pt = (xint *)0xb0000014;
volatile xint *mc_4pt = (xint *)0xb0000018;
#else
xint sum;
#endif

void
halfpel8x8_h(uint8 * dst, uint8 * src,
             xint stride, xint rounding)
{
    xint row, col, idx;

    idx = 0;
    for (row = 0; row < (stride << 3);
         idx = (row += stride))
    {
        for (col = 0; col < 8; col++, idx++)
        {
#if USE_HW_MC
            *reg_a = (xint) src[idx];
            *reg_b = (xint) src[idx+1];
            *reg_r = (xint) rounding;
            dst[idx] = (uint8) (*mc_2pt);
#else
            sum = (xint) src[idx] +
                  (xint) src[idx + 1] + 1
                  - rounding;
            dst[idx] = (uint8) (sum >> 1);
#endif
        }
    }
}

. . .
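One way to see why the pixel-wise interface is naive is to count bus accesses per 8x8 block. The sketch below is a rough estimate only: the block-based figure assumes a hypothetical interface that loads the 9x9 source block, writes the rounding control once, and reads back 64 results, with a configurable number of pixels per 32-bit word. It ignores any start or status accesses a real design may need.

```c
/* Bus accesses for one 8x8 block with the pixel-wise 2-point
 * interface: per pixel, three writes (reg_a, reg_b, reg_r)
 * and one read (mc_2pt). */
unsigned pixelwise_accesses_2pt(void)
{
    return 8u * 8u * (3u + 1u);
}

/* Estimate for a hypothetical block-based interface.
 * pixels_per_word must be at least 1 (1 = one pixel per 32-bit
 * word, 4 = four packed 8-bit pixels per word). */
unsigned blockwise_accesses(unsigned pixels_per_word)
{
    unsigned in_words  = (81u + pixels_per_word - 1u) / pixels_per_word;
    unsigned out_words = (64u + pixels_per_word - 1u) / pixels_per_word;
    return in_words + 1u /* rounding control */ + out_words;
}
```

Under these assumptions the pixel-wise interface needs 256 accesses per block, against 146 for a block-based interface with one pixel per word and 38 with four pixels per word.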


HW/SW Co-Simulation

  • Sometimes, you want to use a logic simulator to debug your accelerator.
      • It is difficult to write a comprehensive testbench in HDL for complex systems (with both HW and SW components).
  • Since we have the RTL model of the processor, we can perform HW/SW co-simulation using a logic simulator.


Installation of Logic Simulator

  • For logic simulation, we use GHDL, an open-source, free logic simulator for VHDL; the project home page is located at http://ghdl.free.fr/
  • The native Windows version of GHDL does not work for LEON3; get the Cygwin version from: http://home.comcast.net/~bp_labs/software/ghdl/
      • Note: the command path to GHDL should be set after /usr/bin, because the LEON3 configuration script requires gcc 3.x.x while GHDL has a built-in gcc 4.x.x that will cause problems if the path is not set properly.


Platform Setup for Simulation

  • To set up LEON3 for simulation, type make xconfig under the directory $(GRLIB)/designs/leon3-gr-xc3s-1500.
  • In the configuration menu, make the following modifications:
      • Processor -> Debug Support Unit -> Disable instr./AHB trace buffers
      • Debug Link -> Enable only serial debug link
      • VHDL Debugging -> Enable accelerated UART tracing
      • Peripherals -> Disable everything except Memory Controller, UART, Timer, Interrupt Controller


Building Software for Co-simulation

  • For HW/SW co-simulation, you must build the software image so that it can be included in the SDRAM model image, sdram.srec.
  • If your software code is mcomp_tb.c, type the following commands:

$ sparc-elf-gcc -o mcomp_tb.elf mcomp_tb.c
$ sparc-elf-objcopy -O srec mcomp_tb.elf sdram.srec

  • Now, sdram.srec can be put under designs/leon3-gr-xc3s-1500 for simulation.


Logic Simulation with GHDL

  • GHDL is a VHDL compiler that generates an executable, testbench.exe, for logic simulation. Under $(GRLIB)/designs/leon3-gr-xc3s-1500, type:

$ make ghdl
$ ./testbench --vcd=waveform.vcd

  • All the runtime signals will be saved in the specified vcd file, waveform.vcd.


Waveform Analysis with GTKWave

  • You can analyze the signal file using GTKWave.
      • Downloadable from http://gtkwave.sourceforge.net/


Issues with GHDL Simulation

  • GHDL simulation is somewhat slow: it takes about 10 minutes to co-simulate mcomp_tb.c and the HW logic on a 2.4 GHz dual-core machine.
  • VCD files are huge (over 200 MB for mcomp_tb.c).
  • Structured VHDL signals are not saved in vcd format.
      • You can break out structured signals in your design.
  • Another waveform format, ghw, saves structured VHDL signals, but the ghw file of the LEON3 platform crashes GTKWave.


GHDL Structured Signal Breakout

  • If you plan to use the vcd format, each structured signal should be broken out into simple signals:

architecture rtl of mcomp is
  . . .
begin
  . . .
  -- pragma translate_off
  hsel    <= ahbsi.hsel;
  haddr   <= ahbsi.haddr;
  hwrite  <= ahbsi.hwrite;
  hwdata  <= ahbsi.hwdata;
  hiready <= ahbsi.hready;
  -- pragma translate_on
  . . .
  -- sequential code follows
end;


Final Remark

Doing digital circuit design is like crafting a sculpture; you have to visualize the whole structure before you start writing HDL code!