Lab-3: Simple Accelerator Design
National Chiao Tung University
Chun-Jen Tsai
4/11/2011

Video Decoder Accelerator
Goal: Implement a block-based motion compensation circuit.
Task: Extend the given accelerator from point-wise operation to 8×8 block-based operation.
Give a demo (one per group) to the TAs by 5/5 to show that your system is working.
Each team hands in a three-page report that describes the design and provides the performance improvement numbers.
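
Architecture of the MCOMP Logic
The architecture of the provided motion compensation module is composed of a register file and an interpolation datapath.
[Figure: block diagram of mcomp attached to the LEON3 ahbctrl. AHB signals: haddr, hwdata (32 bits), hrdata (32 bits), hconfig (256 bits, from an 8×32 ROM). The register file holds reg a, reg b, reg c, reg d, and reg r, selected by the 3-bit addr[4:2] (addr in, data in). The datapath computes (a+b+1-r)/2 or (a+b+c+d+2-r)/4 and drives data out.]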

Inside Lab 3 Package
Download and unzip the package for Lab 3; you will see the following directory structure:
    Lab3_pkg
    |
    +- m4v_dec_ecos (the complete video decoder with HW accelerator API)
    |
    +- rtl (synthesizable RTL model of the motion compensation logic)
    |
    +- testbench (C program for waveform simulation or FPGA verification)
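
Functions to be Accelerated
In bilinear8x8.c, there are three functions for bilinear interpolation:
    halfpel8x8_h()
    halfpel8x8_v()
    halfpel8x8_hv()
The current code performs pixel-wise interpolation; you must modify it to perform block-based interpolation.
    p = (A + B + 1 - r)/2
    q = (A + C + 1 - r)/2
    o = (A + B + C + D + 2 - r)/4
r = 0 or 1 is a "rounding control" parameter (selected by the encoder).
[Figure: pixel layout. A and B are horizontal neighbors, C and D are in the row below; p is the half-pel sample between A and B, q is the one between A and C, and o is at the center of A, B, C, and D.]
    A  p  B
    q  o
    C     D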

Pixel vs. Block-based Operations
Take halfpel8x8_h() for example:
    void
    halfpel8x8_h(uint8 * dst, uint8 * src, xint stride, xint rounding)
    {
        xint row, col, idx, sum;

        idx = 0;
        /* row is the offset of the current row: 0, stride, ..., 7*stride */
        for (row = 0; row < (stride << 3); idx = (row += stride))
        {
            for (col = 0; col < 8; col++, idx++)
            {
                /* average of two horizontal neighbors, with rounding control */
                sum = (xint) src[idx] + (xint) src[idx+1] + 1 - rounding;
                dst[idx] = (uint8) (sum >> 1);
            }
        }
    }
Cut-off point between HW and SW in the provided logic: the per-pixel sum inside the inner loop.
Cut-off point between HW and SW in your logic: the whole 8×8 block.
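
Block Interpolation Architecture
[Figure: block diagram of the block-based mcomp attached to the LEON3 ahbctrl. AHB signals: haddr, hwdata (32 bits), hrdata (32 bits), hconfig (256 bits, from an 8×32 ROM). The register file holds pixel(0) ... pixel(80), selected by the 7-bit addr[8:2] (addr in, data in). A controller drives the block-based interpolator, which also takes reg r.]
The register file is declared as:
    type block_9x9 is array (0 to 80) of std_logic_vector(7 downto 0);
    signal pixel: block_9x9;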

Adding Logic to the LEON Platform
In Lab 3, we add an AHB slave logic to the LEON 3 platform. The RTL model directory contains some sample modifications:
    rtl
    |
    +- nctu                  (put this whole nctu directory in $(GRLIB)/lib)
    |   +- video_acc
    |   |   +- mcomp.vhd, video_acc.vhd
    |   |   +- vhdlsyn.txt
    |   |
    |   +- dirs.txt
    |
    +- devices.vhd (to replace the one in $(GRLIB)/lib/grlib/amba)
    |
    +- leon3mp.vhd (to replace the one in $(GRLIB)/designs/leon3-gr-xc3s-1500)
Note: $(GRLIB) is the directory of the complete source tree of GRLIB.

Modify LEON Working Libraries
To add our "nctu" video accelerator logic to the LEON3 working library, we have to add the library name "nctu" to the libs.txt files under the following two directories:
    $(GRLIB)/designs/leon3-gr-xc3s-1500/
    $(GRLIB)/lib/

Generation of Build Scripts
Enter the directory designs/leon3-gr-xc3s-1500 in the GRLIB source tree and type the following commands:
    $ make distclean    (clean up previous builds)
    $ make scripts      (generate automatic build scripts)
The command "make scripts" searches the directory tree $(GRLIB)/lib for libs.txt, dirs.txt, vhdlsyn.txt, and vhdlsim.txt in order to generate proper build scripts for all the hardware logic.

Adding Logic to AHB Bus †
The AMBA bus used for LEON 3 has some extensions:
    All HSEL signals are routed to all slave logic blocks
    Some extra signals are added: HCACHE, HIRQ, HCONFIG, and HINDEX
The HCONFIG signal contains information about the logic to be added:
    Each HCONFIG signal is 8×4 bytes
    GRLIB allows 64 masters and 64 slaves, with their HCONFIG records mirrored at 0xFFFFF000 ~ 0xFFFFFFFF
† For details, see Jiri Gaisler and Sandi Habinc, GRLIB IP Library User's Manual, ver. 1.0.22, Aeroflex Gaisler, 2010

HCONFIG Record
The HCONFIG record controls the address map, interrupt control, and caching control of a logic block.

Address Decoding (1/2)
The memory area mapped to a logic block is determined by its bank address registers (BARs).
There are two types of logic: memory and I/O.
    For memory logic, each BAR by default specifies a 1 MB address space at any location of the 32-bit address space (except 0xFFF00000 ~ 0xFFFFFFFF)
    For I/O logic, each BAR by default specifies a 256-byte address space at any location in the address range 0xFFF00000 ~ 0xFFFFFFFF

Address Decoding (2/2)
If a logic block needs a larger contiguous address space, the MASK of a BAR can be used to increase the default range of its space:
    ahbctrl: AHB arbiter/multiplexer rev 1
    ahbctrl: Common I/O area disabled
    ahbctrl: AHB masters: 3, AHB slaves: 8
    ahbctrl: Configuration area at 0xfffff000, 4 kbyte
    ahbctrl: mst0: Gaisler Research Leon3 SPARC V8 Processor
    ahbctrl: mst1: Gaisler Research AHB Debug UART
    ahbctrl: mst2: Gaisler Research SVGA frame buffer
    ahbctrl: slv0: European Space Agency Leon2 Memory Controller
    ahbctrl: memory at 0x00000000, size 512 Mbyte, cacheable, prefetch
    ahbctrl: memory at 0x20000000, size 512 Mbyte
    ahbctrl: memory at 0x40000000, size 1024 Mbyte, cacheable, prefetch
    ahbctrl: slv1: Gaisler Research AHB/APB Bridge
    ahbctrl: memory at 0x80000000, size 1 Mbyte
    ahbctrl: slv2: Gaisler Research Leon3 Debug Support Unit
    ahbctrl: memory at 0x90000000, size 256 Mbyte
    ahbctrl: slv6: CS NCTU CODESIGN Lab3 Motion Compensation Logic
    ahbctrl: memory at 0xb0000000, size 1 Mbyte, prefetch

Interface of Lab 3 Logic
    entity mcomp is
      generic (
        ahbndx  : integer := 0;
        ahbaddr : integer := 0;
        addrmsk : integer := 16#fff#;
        verid   : integer := 0;
        hirq_no : integer := 0
      );
      port (
        rst   : in  std_ulogic;
        clk   : in  std_ulogic;
        ahbsi : in  ahb_slv_in_type;
        ahbso : out ahb_slv_out_type
      );
    end;

    architecture rtl of mcomp is

      constant hconfig : ahb_config_type := (
        0 => ahb_device_reg(VENDOR_NCTU, NCTU_MCOMP, 0, verid, hirq_no),
        -- memory BAR: prefetchable ('1'), not cacheable ('0')
        4 => ahb_membar(ahbaddr, '1', '0', addrmsk),
        others => zero32
      );

Vendor/Device ID Registration
Your vendor and device ID can be stored in the file $(GRLIB)/lib/grlib/amba/devices.vhd for simulation purposes:
    -- HW/SW Co-design LAB vendor code & device ID
    constant VENDOR_NCTU : amba_vendor_type := 16#18#;
    constant NCTU_MCOMP  : amba_device_type := 16#001#;

    -- pragma translate_off
    -- description strings are space-padded to the fixed length of their type
    constant NCTU_DESC : vendor_description := "CS NCTU CODESIGN        ";

    constant nctu_device_table : device_table_type := (
      NCTU_MCOMP => "Lab3 Motion Compensation Logic ",
      others     => "Unknown Device                 "
    );

    constant nctu_lib : vendor_library_type := (
      vendorid     => VENDOR_NCTU,
      vendordesc   => NCTU_DESC,
      device_table => nctu_device_table
    );
    -- pragma translate_on

Device Instantiation
The device must be instantiated in the file $(GRLIB)/designs/leon3-gr-xc3s-1500/leon3mp.vhd:
    my_mcomp : mcomp
      generic map (ahbndx => 6, ahbaddr => 16#B00#, verid => 1, hirq_no => 14)
      port map (rstn, clkm, ahbsi, ahbso(6));
In this example, the device registers IRQ #14. Raising HIRQ(14) causes an interrupt to the interrupt controller, and then to the ISR.

Implement the HW Bit File
Change directory to designs/leon3-gr-xc3s-1500; logic implementation is done by typing:
    $ make ise
You can use iMPACT to configure the FPGA.
When you use "info sys" in GRMON, you should see:
    02.01:004 Gaisler Research LEON3 Debug Support Unit (ver 0x1)
              ahb: 90000000 - a0000000
              AHB trace 1 lines, stack pointer 0x43fffff0
    . . .
    06.18:001 Unknown vendor Unknown device (ver 0x1)
              ahb: b0000000 - b0100000
    01.01:00c Gaisler Research Generic APB UART (ver 0x1)
    . . .

Testing the Accelerator
To test the motion compensation logic, you can build and run the C testbench program, mcomp_tb.c.
    Building mcomp_tb.c does not require eCos
To run the complete video decoder with HW "acceleration," simply replace the file bilinear8x8.c in the source tree of the previous lab with the new one.

Calling HW Logic from SW Code
The current reference interface design is naive:
    #define USE_HW_MC 1

    #if USE_HW_MC
    /* memory-mapped registers of the motion compensation logic */
    volatile xint *reg_a = (xint *)0xb0000000;
    volatile xint *reg_b = (xint *)0xb0000004;
    volatile xint *reg_c = (xint *)0xb0000008;
    volatile xint *reg_d = (xint *)0xb000000c;
    volatile xint *reg_r = (xint *)0xb0000010;
    /* reading these returns the 2-point and 4-point results */
    volatile xint *mc_2pt = (xint *)0xb0000014;
    volatile xint *mc_4pt = (xint *)0xb0000018;
    #else
    xint sum;
    #endif

    void
    halfpel8x8_h(uint8 * dst, uint8 * src, xint stride, xint rounding)
    {
        xint row, col, idx;

        idx = 0;
        for (row=0; row<(stride<<3); idx=(row+=stride))
        {
            for (col = 0; col < 8; col++, idx++)
            {
    #if USE_HW_MC
                *reg_a = (xint) src[idx];
                *reg_b = (xint) src[idx+1];
                *reg_r = (xint) rounding;
                dst[idx] = (uint8) (*mc_2pt);
    #else
                sum = (xint) src[idx] + (xint) src[idx + 1] + 1 - rounding;
                dst[idx] = (uint8) (sum >> 1);
    #endif
            }
        }
    }
    . . .

HW/SW Co-Simulation
Sometimes, you want to use a logic simulator to debug your accelerator.
    It is difficult to write a comprehensive testbench in HDL for complex systems (with both HW and SW components)
Since we have the RTL model of the processor, we can perform HW/SW co-simulation using a logic simulator.

Installation of Logic Simulator
For logic simulation, we use GHDL, a free, open-source logic simulator for VHDL. The project home page is located at http://ghdl.free.fr/
The native Windows version of GHDL does not work for LEON3; get the Cygwin version from: http://home.comcast.net/~bp_labs/software/ghdl/
    Note: the command path to GHDL should be set after /usr/bin, because the LEON3 configuration script requires gcc 3.x.x while GHDL has a built-in gcc 4.x.x that will cause problems if the path is not set properly

Platform Setup for Simulation
To set up LEON3 for simulation, type make xconfig under the directory $(GRLIB)/designs/leon3-gr-xc3s-1500.
In the configuration menu, make the following modifications:
    Processor > Debug Support Unit: disable instr./AHB trace buffers
    Debug Link: enable only the serial debug link
    VHDL Debugging: enable accelerated UART tracing
    Peripherals: disable everything except Memory Controller, UART, Timer, Interrupt Controller

Building Software for Co-simulation
For HW/SW co-simulation, you must build the software image so that it can be included in the SDRAM model image, sdram.srec.
If your software code is mcomp_tb.c, type the following commands:
    $ sparc-elf-gcc -o mcomp_tb.elf mcomp_tb.c
    $ sparc-elf-objcopy -O srec mcomp_tb.elf sdram.srec
Now, sdram.srec can be put under designs/leon3-gr-xc3s-1500 for simulation.

Logic Simulation with GHDL
GHDL is a VHDL compiler that generates an executable, testbench.exe, for logic simulation. Under $(GRLIB)/designs/leon3-gr-xc3s-1500 type:
    $ make ghdl
    $ ./testbench --vcd=waveform.vcd
All the runtime signals will be saved in the specified vcd file, waveform.vcd.

Waveform Analysis with GTKWave
You can analyze the signal file using GTKWave.
    Downloadable from http://gtkwave.sourceforge.net/

Issues with GHDL Simulation
GHDL simulation is somewhat slow: it takes about 10 minutes to co-simulate mcomp_tb.c and the HW logic on a 2.4 GHz dual-core machine.
VCD files are huge (over 200 MB for mcomp_tb.c).
Structured VHDL signals are not saved in vcd format.
    You can break out structured signals in your design
    Another waveform format, ghw, saves structured VHDL signals, but the ghw file of the LEON3 platform crashes GTKWave

GHDL Structured Signal Breakout
If you plan to use the vcd format, a structured signal should be broken out into simple signals:
    architecture rtl of mcomp is
    . . .
    begin
    . . .
    -- pragma translate_off
      hsel    <= ahbsi.hsel;
      haddr   <= ahbsi.haddr;
      hwrite  <= ahbsi.hwrite;
      hwdata  <= ahbsi.hwdata;
      hiready <= ahbsi.hready;
    -- pragma translate_on
    . . .
    -- sequential code follows
    end;

Final Remark
Doing digital circuit design is like crafting a sculpture; you have to visualize the whole structure before you start writing HDL code!