ELE432 ADVANCED DIGITAL DESIGN
HACETTEPE UNIVERSITY
ROUTING and FPGA MEMORY
In part from ECE 448 – FPGA and ASIC Design with VHDL
Organization of the Week
• Routing in FPGA
• Memory Design in FPGA
[Figure: array of CLBs surrounded by IOBs, with wiring channels between the blocks]
CLB - Configurable Logic Block
◦ 5-input, 1 output function
◦ or 2 4-input, 1 output functions
◦ optional register on outputs
Built-in fast carry logic
Can be used as memory
RAM-programmable
◦ can be reconfigured
Xilinx Programmable Gate Arrays
The Virtex CLB
Details of One Virtex Slice
Each slice contains two sets of the following:
◦ Four-input LUT
  • Any 4-input logic function,
  • or 16-bit x 1 sync RAM (SLICEM only)
  • or 16-bit shift register (SLICEM only)
◦ Carry & Control
  • Fast arithmetic logic
  • Multiplier logic
  • Multiplexer logic
◦ Storage element
  • Latch or flip-flop
  • Set and reset
  • True or inverted inputs
  • Sync. or async. control
CLB Slice Structure
Implement Some Larger Functions
[Figure labels: 4-input function; 3-input function, registered; e.g. 9-input parity]
LUT (Look-Up Table) Functionality
• Look-Up tables are primary elements for logic implementation
• Each LUT can implement any function of 4 inputs
[Figure: a logic network with inputs x1 x2 x3 x4 and output y, and a gate with inputs x1 x2 and output y, each mapped onto one LUT. The LUT contents are the y columns of the truth tables, listed for x1 x2 x3 x4 = 0000, 0001, …, 1111:]
y = 0100010101001100 (function of x1 x2 x3 x4)
y = 1111111111110000 (function of x1 x2 only)
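To make the idea of "the LUT is just its truth table" concrete, here is a minimal Python sketch. It is an illustration only, not vendor code: the helper name `make_lut` and the choice of x1 as the most significant address bit are assumptions, chosen to match the row order 0000 to 1111 used in the truth tables above.

```python
def make_lut(y_column: str):
    """Build a 4-input LUT from the 16-character y column of a truth table.

    Assumption: rows are ordered x1 x2 x3 x4 = 0000 ... 1111, so x1 is the
    most significant address bit.
    """
    if len(y_column) != 16 or set(y_column) - {"0", "1"}:
        raise ValueError("a 4-input LUT stores exactly 16 bits")

    def lut(x1: int, x2: int, x3: int, x4: int) -> int:
        # The inputs form the address of the stored bit.
        address = (x1 << 3) | (x2 << 2) | (x3 << 1) | x4
        return int(y_column[address])

    return lut


# The two y columns shown on the slide.
lut_a = make_lut("0100010101001100")
lut_b = make_lut("1111111111110000")
```

Evaluating `lut_b` over all 16 input combinations shows that it depends only on x1 and x2 (it is their NAND), even though it occupies a full 4-input LUT.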
• The general architecture of Xilinx FPGAs consists of a two-dimensional array of programmable blocks, called Configurable Logic Blocks (CLBs),
• with horizontal and vertical routing channels between the rows and columns of CLBs.
Xilinx Routing
Island Style Architecture
Flexibility of Connection, Fc = 2, Can A connect to B?
Connection boxes
Switch Boxes
Fs defines, for a wiring segment entering the S block, the number of other wiring segments it can be connected to
Routings using C and S Boxes
Routing Algorithms
• Maze Router
• A* Search Routing
• The Pathfinder
Example Maze Router
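The slide shows the maze router only as a figure. As a rough illustration, a Lee-style maze router expands a breadth-first wave from the source until it reaches the target and then traces the path back. The sketch below assumes a hypothetical grid encoding ('#' for a blocked cell, '.' for a free one) and a hypothetical function name `maze_route`; it routes a single two-pin net and ignores congestion between nets.

```python
from collections import deque


def maze_route(grid, source, target):
    """Lee-style maze routing: breadth-first wave expansion plus backtrace.

    grid   : list of strings, '#' marks a blocked cell (assumed encoding)
    source : (row, col) start cell
    target : (row, col) end cell
    Returns the cells of one shortest path, or None if the net is unroutable.
    """
    rows, cols = len(grid), len(grid[0])
    previous = {source: None}          # also serves as the "visited" set
    frontier = deque([source])
    while frontier:
        cell = frontier.popleft()
        if cell == target:
            path = []
            while cell is not None:    # backtrace from target to source
                path.append(cell)
                cell = previous[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and grid[nr][nc] != "#" and (nr, nc) not in previous:
                previous[(nr, nc)] = cell
                frontier.append((nr, nc))
    return None


grid = [
    "....",
    ".##.",
    "....",
]
path = maze_route(grid, (1, 0), (1, 3))
```

Because the wave expands one step at a time, the first time the target is reached the path is a shortest one; the cost is that every reachable cell may be visited, which is what A* search tries to reduce.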
Memory Types
Memory Needs
Many applications require memory
• ‘Table-Driven’ code: storage for tables
• Signal processing: storage for coefficients
• Image processing: storage for ‘windows’ in images
• Text processing: storage for dictionaries
• …
• Unfortunately, memory tends to be a critical resource in FPGA implementations!
Memory Types
Memory
• Distributed (MLUT-based)
• Block RAM-based (BRAM-based)
Memory
• Inferred
• Instantiated
  • Manually
  • Using Core Generator
Memory Organizations
• Classified by number of ports
  • Single
    • One reader or one writer at any time
  • Dual
    • Two ports: simultaneous read or write
    • You can simultaneously perform one read and one write operation to different locations, where the write happens on port A and the read happens on port B.
    • Conflicts can occur; software is usually responsible for ensuring that operation is ‘safe’
    • Used for interfacing between two separate systems, e.g. communication between two processors
[Figure: single-port and dual-port memory blocks; each port has add(ress), data and R/~W signals]
FPGA Distributed Memory
The configurable logic blocks (CLBs) in most Xilinx FPGAs contain small single-port or dual-port RAM. This RAM is distributed throughout the FPGA rather than concentrated in a single block (it is spread out over many LUTs), and so it is called "distributed RAM". A look-up table on an FPGA can be configured as a 16 x 1-bit RAM, a ROM, a LUT or a 16-bit shift register.
For the Spartan-3 series, each CLB contains up to 64 bits of single-port RAM or 32 bits of dual-port RAM.
As the sizes indicate, a single CLB may not be enough to implement a large memory. Also, most of these small RAMs have inputs and outputs that are 1 bit wide. To implement larger and wider memory functions, you can connect several distributed RAMs in parallel.
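A back-of-the-envelope count of how many LUTs such a parallel arrangement takes can be sketched as below. The function name `distributed_ram_luts` is hypothetical. The sketch assumes that one 4-input LUT provides a 16 x 1 single-port RAM and that a dual-port 16 x 1 RAM takes two LUTs, as stated on the distributed RAM slides; it counts only the storage LUTs and ignores any extra multiplexing logic needed when the memory is deeper than 16 words.

```python
import math


def distributed_ram_luts(words: int, bits: int, dual_port: bool = False) -> int:
    """Estimate the number of 4-input LUTs used as storage.

    Assumptions: one LUT = 16 x 1 single-port RAM; a dual-port bit column
    needs twice as many LUTs. Address decoding/multiplexing is not counted.
    """
    luts_per_bit_column = math.ceil(words / 16)
    if dual_port:
        luts_per_bit_column *= 2
    return luts_per_bit_column * bits


single = distributed_ram_luts(words=8, bits=32)                  # 32
dual = distributed_ram_luts(words=8, bits=32, dual_port=True)    # 64
```

These two figures agree with the implementation reports later in these slides for the 8-word, 32-bit `raminfr` examples (32 LUTs used as 16x1 RAMs, and 64 LUTs for the dual-port version).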
The Design Warrior’s Guide to FPGAs: Devices, Tools, and Flows. ISBN 0750676043
Copyright © 2004 Mentor Graphics Corp. (www.mentor.com)
Multipurpose LUT
Simplified logic cell (LC)
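[Figure: LUT-to-RAM equivalences]
• One LUT = RAM16X1S (ports D, WE, WCLK, A0 to A3, O)
• Two LUTs = RAM32X1S (ports D, WE, WCLK, A0 to A4, O)
• or RAM16X2S (ports D0, D1, WE, WCLK, A0 to A3, O0, O1)
• or RAM16X1D (ports D, WE, WCLK, A0 to A3, DPRA0 to DPRA3, SPO, DPO)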
Distributed RAM
• CLB LUTs are configurable as Distributed RAM
  • A LUT equals 16x1 RAM
  • Cascade LUTs to increase RAM size
• Synchronous write
• Asynchronous read
  • Naturally, distributed RAM read is asynchronous
  • Can create a synchronous read by using extra flip-flops
• Two LUTs can make
  • 32 x 1 single-port RAM
  • 16 x 2 single-port RAM
  • 16 x 1 dual-port RAM
Single-port 64 x 1-bit RAM
Dual-port 64 x 1 RAM
Total Size of Distributed RAM
FPGA Block RAM
[Figure: Spartan-3 dual-port block RAM with Port A and Port B]
Block RAM or Embedded RAM
• Most efficient memory implementation
  • Dedicated blocks of memory
• Ideal for most memory requirements
  • 4 to 104 memory blocks
  • 18 kbits = 18,432 bits per block (16 k without parity bits)
  • Use multiple blocks for larger memories
• Builds both single and true dual-port RAMs
• Synchronous write and read (different from distributed RAM)
RAM Blocks and Multipliers in Xilinx FPGAs
Spartan-6 Block RAM Amounts
Block RAM can have various configurations (port aspect ratios)
Block RAM Port Aspect Ratios
[Figure: one block RAM organized as 16k x 1 (addresses 0 to 16,383), 8k x 2 (0 to 8,191), 4k x 4 (0 to 4,095), 2k x (8+1) (0 to 2,047) or 1024 x (16+2) (0 to 1,023)]
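The aspect ratios all follow from the same arithmetic: the 16 Kbit of data cells are divided by the port width, and ports of 8 bits or wider also get one parity bit per data byte. A small sketch, with hypothetical names (`DATA_BITS`, `aspect_ratios`) and the 32-bit case taken from the library component table later in these slides:

```python
DATA_BITS = 16 * 1024     # data cells per block RAM
TOTAL_BITS = 18_432       # data plus parity cells per block RAM


def aspect_ratios():
    """Return (depth, data_width, parity_width) for each port width."""
    ratios = []
    for data_width in (1, 2, 4, 8, 16, 32):
        depth = DATA_BITS // data_width
        parity_width = data_width // 8   # 0 for the 1-, 2- and 4-bit ports
        ratios.append((depth, data_width, parity_width))
    return ratios


for depth, data_width, parity_width in aspect_ratios():
    print(f"{depth:>6} x {data_width}+{parity_width}")
```

For the three wide configurations, depth times (data width + parity width) comes to exactly 18,432 bits, which is where the "18 kbits per block" figure comes from.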
Block RAM Interface
Block RAM Ports
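• Inference: the synthesis tool recognizes a memory in your behavioral description and automatically generates the memory logic, which you did not describe explicitly.
• Instantiation: you explicitly place a specific memory component (a library primitive or a tool-generated core) in your design.
Inferred RAM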
Distributed vs Block RAMs
• Distributed RAM: must be used for RAM descriptions with asynchronous read (data is read from memory as soon as the address is given, without waiting for the clock edge).
• Block RAM: generally used for RAM descriptions with synchronous read.
• Synchronous write for both distributed and block RAMs (data is written to RAM only at the rising edge of the clock).
• Any size and data width are allowed in RAM descriptions, depending on resource availability.
• Up to two write ports are allowed.
Examples:
1. Distributed RAM with asynchronous read
2. Distributed RAM with "false" synchronous read
3. Block RAM with synchronous read
4. Distributed dual-port RAM with asynchronous read
More RAM examples from XST Coding Guidelines: http://toolbox.xilinx.com/docsan/xilinx4/data/docs/xst/hdlcode.html
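The difference between asynchronous read and a synchronous read with a registered address can be seen in a small behavioural model. This is a Python sketch for intuition only, not synthesizable code; the class name `InferredRam` and its methods are hypothetical, and memory contents are assumed to start at 0.

```python
class InferredRam:
    """Behavioural model of a small RAM with synchronous write."""

    def __init__(self, addr_bits: int = 3, registered_read: bool = False):
        self.mem = [0] * (2 ** addr_bits)
        self.registered_read = registered_read
        self.read_a = 0   # read address captured at the last clock edge

    def clock(self, we: int, a: int, di: int) -> None:
        """Rising clock edge: optional write, then capture the address."""
        if we:
            self.mem[a] = di
        self.read_a = a

    def do(self, a: int) -> int:
        """Output for the address currently applied to the address pins."""
        if self.registered_read:
            return self.mem[self.read_a]   # follows the registered address
        return self.mem[a]                 # follows the pins immediately


async_ram = InferredRam()
async_ram.clock(we=1, a=2, di=99)

sync_ram = InferredRam(registered_read=True)
sync_ram.clock(we=1, a=2, di=99)
```

After the single write above, `async_ram.do(5)` already reflects location 5, while `sync_ram.do(5)` still returns the word at address 2 until another clock edge captures the new address.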
Distributed RAM with asynchronous read
LIBRARY ieee;
USE ieee.std_logic_1164.all;
USE ieee.std_logic_arith.all;
USE ieee.std_logic_unsigned.all;

entity raminfr is
  generic (
    bits      : integer := 32;  -- number of bits per RAM word
    addr_bits : integer := 3);  -- 2^addr_bits = number of words in RAM
  port (
    clk : in  std_logic;
    we  : in  std_logic;
    a   : in  std_logic_vector(addr_bits-1 downto 0);
    di  : in  std_logic_vector(bits-1 downto 0);
    do  : out std_logic_vector(bits-1 downto 0));
end raminfr;

architecture behavioral of raminfr is
  type ram_type is array (2**addr_bits-1 downto 0) of std_logic_vector (bits-1 downto 0);
  signal RAM : ram_type;
begin
  process (clk)
  begin
    if (clk'event and clk = '1') then
      if (we = '1') then
        RAM(conv_integer(unsigned(a))) <= di;
      end if;
    end if;
  end process;
  do <= RAM(conv_integer(unsigned(a)));
end behavioral;
Report from Implementation
Design Summary:
Number of errors: 0
Number of warnings: 0
Logic Utilization:
Logic Distribution:
Number of occupied Slices: 16 out of 768 2%
Number of Slices containing only related logic: 16 out of 16 100%
Number of Slices containing unrelated logic: 0 out of 16 0%
*See NOTES below for an explanation of the effects of unrelated logic
Total Number of 4 input LUTs: 32 out of 1,536 2%
Number used as 16x1 RAMs: 32
Number of bonded IOBs: 69 out of 124 55%
Number of GCLKs: 1 out of 8 12%
Distributed RAM with "false" synchronous read
LIBRARY ieee;
USE ieee.std_logic_1164.all;
USE ieee.std_logic_arith.all;
USE ieee.std_logic_unsigned.all;

entity raminfr is
  generic (
    bits      : integer := 32;  -- number of bits per RAM word
    addr_bits : integer := 3);  -- 2^addr_bits = number of words in RAM
  port (
    clk : in  std_logic;
    we  : in  std_logic;
    a   : in  std_logic_vector(addr_bits-1 downto 0);
    di  : in  std_logic_vector(bits-1 downto 0);
    do  : out std_logic_vector(bits-1 downto 0));
end raminfr;

architecture behavioral of raminfr is
  type ram_type is array (2**addr_bits-1 downto 0) of std_logic_vector (bits-1 downto 0);
  signal RAM : ram_type;
begin
  process (clk)
  begin
    if (clk'event and clk = '1') then
      if (we = '1') then
        RAM(conv_integer(unsigned(a))) <= di;
      end if;
      do <= RAM(conv_integer(unsigned(a)));
    end if;
  end process;
end behavioral;
Report from Implementation
Design Summary:
Number of errors: 0
Number of warnings: 0
Logic Utilization:
Number of Slice Flip Flops: 32 out of 1,536 2%
Logic Distribution:
Number of occupied Slices: 16 out of 768 2%
Number of Slices containing only related logic: 16 out of 16 100%
Number of Slices containing unrelated logic: 0 out of 16 0%
*See NOTES below for an explanation of the effects of unrelated logic
Total Number of 4 input LUTs: 32 out of 1,536 2%
Number used as 16x1 RAMs: 32
Number of bonded IOBs: 69 out of 124 55%
Number of GCLKs: 1 out of 8 12%
Total equivalent gate count for design: 4,355
Block RAM with synchronous read (read through)
The following description implements a true synchronous read. A true synchronous read is the synchronization mechanism in Virtex device block RAMs, where the read address is registered on the RAM clock edge. Such descriptions are directly mappable onto block RAM, as shown in the diagram below.
Block RAM with synchronous read (read through)
LIBRARY ieee;
USE ieee.std_logic_1164.all;
USE ieee.std_logic_arith.all;
USE ieee.std_logic_unsigned.all;

entity raminfr is
  generic (
    bits      : integer := 32;  -- number of bits per RAM word
    addr_bits : integer := 3);  -- 2^addr_bits = number of words in RAM
  port (
    clk : in  std_logic;
    we  : in  std_logic;
    a   : in  std_logic_vector(addr_bits-1 downto 0);
    di  : in  std_logic_vector(bits-1 downto 0);
    do  : out std_logic_vector(bits-1 downto 0));
end raminfr;

architecture behavioral of raminfr is
  type ram_type is array (2**addr_bits-1 downto 0) of std_logic_vector (bits-1 downto 0);
  signal RAM    : ram_type;
  signal read_a : std_logic_vector(addr_bits-1 downto 0);
begin
  process (clk)
  begin
    if (clk'event and clk = '1') then
      if (we = '1') then
        RAM(conv_integer(unsigned(a))) <= di;
      end if;
      read_a <= a;
    end if;
  end process;
  do <= RAM(conv_integer(unsigned(read_a)));
end behavioral;
Report from Implementation
Design Summary:
Number of errors: 0
Number of warnings: 0
Logic Utilization:
Logic Distribution:
Number of Slices containing only related logic: 0 out of 0 0%
Number of Slices containing unrelated logic: 0 out of 0 0%
*See NOTES below for an explanation of the effects of unrelated logic
Number of bonded IOBs: 69 out of 124 55%
Number of Block RAMs: 1 out of 4 25%
Number of GCLKs: 1 out of 8 12%
Distributed dual-port RAM with asynchronous read
library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_unsigned.all;
use ieee.std_logic_arith.all;

entity raminfr is
  generic (
    bits      : integer := 32;  -- number of bits per RAM word
    addr_bits : integer := 3);  -- 2^addr_bits = number of words in RAM
  port (
    clk  : in  std_logic;
    we   : in  std_logic;
    a    : in  std_logic_vector(addr_bits-1 downto 0);
    dpra : in  std_logic_vector(addr_bits-1 downto 0);
    di   : in  std_logic_vector(bits-1 downto 0);
    spo  : out std_logic_vector(bits-1 downto 0);
    dpo  : out std_logic_vector(bits-1 downto 0));
end raminfr;

architecture syn of raminfr is
  type ram_type is array (2**addr_bits-1 downto 0) of std_logic_vector (bits-1 downto 0);
  signal RAM : ram_type;
begin
  process (clk)
  begin
    if (clk'event and clk = '1') then
      if (we = '1') then
        RAM(conv_integer(unsigned(a))) <= di;
      end if;
    end if;
  end process;
  spo <= RAM(conv_integer(unsigned(a)));
  dpo <= RAM(conv_integer(unsigned(dpra)));
end syn;
Report from Implementation
Design Summary:
Number of errors: 0
Number of warnings: 0
Logic Utilization:
Logic Distribution:
Number of occupied Slices: 32 out of 768 4%
Number of Slices containing only related logic: 32 out of 32 100%
Number of Slices containing unrelated logic: 0 out of 32 0%
*See NOTES below for an explanation of the effects of unrelated logic
Total Number of 4 input LUTs: 64 out of 1,536 4%
Number used for Dual Port RAMs: 64
(Two LUTs used per Dual Port RAM)
Number of bonded IOBs: 104 out of 124 83%
Number of GCLKs: 1 out of 8 12%
Specification of memory types recognized by Synplify Pro
SIGNAL memory : vector_array;
Block RAM Memory:
attribute syn_ramstyle : string;
attribute syn_ramstyle of memory : signal is "block_ram";
LUT-based Distributed Memory:
attribute syn_ramstyle : string;
attribute syn_ramstyle of memory : signal is "select_ram";
Block RAM Initialization
Example 1
type ram_type is array (0 to 127) of std_logic_vector(15 downto 0);
signal RAM : ram_type := (others => "0000111100110101");
Example 2
type ram_type is array (0 to 127) of std_logic_vector(15 downto 0);
signal RAM : ram_type := (others => (others => '1'));
Example 3
type ram_type is array (0 to 127) of std_logic_vector(15 downto 0);
-- the named range must lie inside the index range 0 to 127
signal RAM : ram_type := (100 to 127 => X"B9B5", others => X"3344");
Block RAM Initialization from a File
rams_20c.data:001011000101111011110010000100001111101011000110011010101010110101110111…101011110111001011111000110001010000
128
Generic Inferred ROM
Distributed ROM with asynchronous read
LIBRARY ieee;
USE ieee.std_logic_1164.all;
USE ieee.std_logic_arith.all;
USE ieee.std_logic_unsigned.all;

entity rominfr is
  generic (
    bits      : integer := 10;  -- number of bits per ROM word
    addr_bits : integer := 3);  -- 2^addr_bits = number of words in ROM
  port (
    a  : in  std_logic_vector(addr_bits-1 downto 0);
    do : out std_logic_vector(bits-1 downto 0));
end rominfr;

architecture behavioral of rominfr is
  type rom_type is array (2**addr_bits-1 downto 0) of std_logic_vector (bits-1 downto 0);
  constant ROM : rom_type := (
    "0000110001",
    "0100110100",
    "0100110110",
    "0110110000",
    "0000111100",
    "0111110101",
    "0100110100",
    "1111100111");
begin
  do <= ROM(conv_integer(unsigned(a)));
end behavioral;
Report from Synthesis
Resource Usage Report for rominfr
Mapping to part: xc3s50pq208-5
Cell usage:
VCC 1 use
LUT2 2 uses
LUT3 7 uses
I/O ports: 13
I/O primitives: 13
IBUF 3 uses
OBUF 10 uses
I/O Register bits: 0
Register bits not including I/Os: 0 (0%)
Mapping Summary:
Total LUTs: 9 (0%)
Report from Implementation
Design Summary:
Number of errors: 0
Number of warnings: 0
Logic Utilization:
Number of 4 input LUTs: 9 out of 1,536 1%
Logic Distribution:
Number of occupied Slices: 5 out of 768 1%
Number of Slices containing only related logic: 5 out of 5 100%
Number of Slices containing unrelated logic: 0 out of 5 0%
*See NOTES below for an explanation of the effects of unrelated logic
Total Number of 4 input LUTs: 9 out of 1,536 1%
Number of bonded IOBs: 13 out of 124 10%
FPGA-specific memories (Instantiation)
RAM 16x1
library IEEE;
use IEEE.STD_LOGIC_1164.all;

library UNISIM;
use UNISIM.all;

entity RAM_16X1_DISTRIBUTED is
  port(
    CLK      : in  STD_LOGIC;
    WE       : in  STD_LOGIC;
    ADDR     : in  STD_LOGIC_VECTOR(3 downto 0);
    DATA_IN  : in  STD_LOGIC;
    DATA_OUT : out STD_LOGIC
  );
end RAM_16X1_DISTRIBUTED;

architecture RAM_16X1_DISTRIBUTED_STRUCTURAL of RAM_16X1_DISTRIBUTED is
  -- part used by the synthesis tool, Synplify Pro, only;
  -- ignored during simulation
  attribute INIT : string;
  attribute INIT of RAM_16x1s_1 : label is "0000";

  component ram16x1s
    generic(
      INIT : BIT_VECTOR(15 downto 0) := X"0000");
    port(
      O    : out std_ulogic;  -- note std_ulogic not std_logic
      A0   : in  std_ulogic;
      A1   : in  std_ulogic;
      A2   : in  std_ulogic;
      A3   : in  std_ulogic;
      D    : in  std_ulogic;
      WCLK : in  std_ulogic;
      WE   : in  std_ulogic);
  end component;
begin
  RAM_16x1s_1 : ram16x1s
    generic map (INIT => X"0000")
    port map (
      O    => DATA_OUT,
      A0   => ADDR(0),
      A1   => ADDR(1),
      A2   => ADDR(2),
      A3   => ADDR(3),
      D    => DATA_IN,
      WCLK => CLK,
      WE   => WE
    );
end RAM_16X1_DISTRIBUTED_STRUCTURAL;
RAM 16x8
library IEEE;
use IEEE.STD_LOGIC_1164.all;

library UNISIM;
use UNISIM.all;

entity RAM_16X8_DISTRIBUTED is
  port(
    CLK      : in  STD_LOGIC;
    WE       : in  STD_LOGIC;
    ADDR     : in  STD_LOGIC_VECTOR(3 downto 0);
    DATA_IN  : in  STD_LOGIC_VECTOR(7 downto 0);
    DATA_OUT : out STD_LOGIC_VECTOR(7 downto 0)
  );
end RAM_16X8_DISTRIBUTED;

architecture RAM_16X8_DISTRIBUTED_STRUCTURAL of RAM_16X8_DISTRIBUTED is
  attribute INIT : string;

  component ram16x1s
    generic(
      INIT : BIT_VECTOR(15 downto 0) := X"0000");
    port(
      O    : out std_ulogic;
      A0   : in  std_ulogic;
      A1   : in  std_ulogic;
      A2   : in  std_ulogic;
      A3   : in  std_ulogic;
      D    : in  std_ulogic;
      WCLK : in  std_ulogic;
      WE   : in  std_ulogic);
  end component;
begin
  GENERATE_MEMORY :
  for I in 0 to 7 generate
    -- part used by the synthesis tool, Synplify Pro, only;
    -- ignored during simulation. The specification sits inside the
    -- generate statement because the instance label is declared here.
    attribute INIT of RAM_16x1_S_1 : label is "0000";
  begin
    RAM_16x1_S_1 : ram16x1s
      generic map (INIT => X"0000")
      port map (
        O    => DATA_OUT(I),
        A0   => ADDR(0),
        A1   => ADDR(1),
        A2   => ADDR(2),
        A3   => ADDR(3),
        D    => DATA_IN(I),
        WCLK => CLK,
        WE   => WE);
  end generate;
end RAM_16X8_DISTRIBUTED_STRUCTURAL;
ROM 16x1
library IEEE;
use IEEE.STD_LOGIC_1164.all;

library UNISIM;
use UNISIM.all;

entity ROM_16X1_DISTRIBUTED is
  port(
    ADDR     : in  STD_LOGIC_VECTOR(3 downto 0);
    DATA_OUT : out STD_LOGIC
  );
end ROM_16X1_DISTRIBUTED;

architecture ROM_16X1_DISTRIBUTED_STRUCTURAL of ROM_16X1_DISTRIBUTED is
-- part used by the synthesis tool, Synplify Pro, only;
-- ignored during simulation
attribute INIT : string;
attribute INIT of rom16x1s_1: label is "F0C1";
component ram16x1s
generic(
INIT : BIT_VECTOR(15 downto 0) := X"0000");
port(
O : out std_ulogic;
A0 : in std_ulogic;
A1 : in std_ulogic;
A2 : in std_ulogic;
A3 : in std_ulogic;
D : in std_ulogic;
WCLK : in std_ulogic;
WE : in std_ulogic);
end component;
signal Low : std_ulogic := '0';
begin
  rom16x1s_1 : ram16x1s
    generic map (INIT => X"F0C1")
    port map (
      O    => DATA_OUT,
      A0   => ADDR(0),
      A1   => ADDR(1),
      A2   => ADDR(2),
      A3   => ADDR(3),
      D    => Low,
      WCLK => Low,
      WE   => Low
    );
end ROM_16X1_DISTRIBUTED_STRUCTURAL;
Block RAM library components
Component    Data Cells      Parity Cells    Address Bus  Data Bus  Parity Bus
             Depth   Width   Depth   Width
RAMB16_S1    16384   1       -       -       (13:0)       (0:0)     -
RAMB16_S2    8192    2       -       -       (12:0)       (1:0)     -
RAMB16_S4    4096    4       -       -       (11:0)       (3:0)     -
RAMB16_S9    2048    8       2048    1       (10:0)       (7:0)     (0:0)
RAMB16_S18   1024    16      1024    2       (9:0)        (15:0)    (1:0)
RAMB16_S36   512     32      512     4       (8:0)        (31:0)    (3:0)
Component declaration for BRAM (1)
-- Component Declaration for RAMB16_S1
-- Should be placed after architecture statement but before begin
component RAMB16_S1
  -- synthesis translate_off
  generic (
    INIT       : bit_vector := X"0";
    INIT_00    : bit_vector := X"0000000000000000000000000000000000000000000000000000000000000000";
    …………………………………
    INIT_3F    : bit_vector := X"0000000000000000000000000000000000000000000000000000000000000000";
    SRVAL      : bit_vector := X"0";
    WRITE_MODE : string := "WRITE_FIRST");
  -- synthesis translate_on
  port (
    DO   : out STD_LOGIC_VECTOR (0 downto 0);
    ADDR : in  STD_LOGIC_VECTOR (13 downto 0);
    CLK  : in  STD_ULOGIC;
    DI   : in  STD_LOGIC_VECTOR (0 downto 0);
    EN   : in  STD_ULOGIC;
    SSR  : in  STD_ULOGIC;
    WE   : in  STD_ULOGIC);
end component;
General template of BRAM instantiation (1)
-- Component Attribute Specification for RAMB16_{S1 | S2 | S4}
-- Should be placed after architecture declaration but before the begin
-- Put attributes, if necessary
-- Component Instantiation for RAMB16_{S1 | S2 | S4}
-- Should be placed in architecture after the begin keyword
RAMB16_{S1 | S2 | S4}_INSTANCE_NAME : RAMB16_S1
-- synthesis translate_off
generic map (
INIT => bit_value,
INIT_00 => vector_value,
INIT_01 => vector_value,
……………………………..
INIT_3F => vector_value,
SRVAL=> bit_value,
WRITE_MODE => user_WRITE_MODE)
-- synopsys translate_on
port map (DO => user_DO,
ADDR => user_ADDR,
CLK => user_CLK,
DI => user_DI,
EN => user_EN,
SSR => user_SSR,
WE => user_WE);
INIT_00 : BIT_VECTOR := X"014A0C0F09170A04076802A800260205002A01C5020A0917006A006800060040";
INIT_01 : BIT_VECTOR := X"000000000000000008000A1907070A1706070A020026014A0C0F03AA09170026";
INIT_02 : BIT_VECTOR := X"0000000000000000000000000000000000000000000000000000000000000000";
INIT_03 : BIT_VECTOR := X"0000000000000000000000000000000000000000000000000000000000000000";
……………………………………………………………………………………………………………………………………
INIT_3F : BIT_VECTOR := X"0000000000000000000000000000000000000000000000000000000000000000")
Initializing Block RAMs 1024x16
Data words and the addresses they occupy (hex), as shown in the figure:
INIT string   ADDRESS   DATA
INIT_00       00        0040
              01        0006
              02        0068
              03        006A
              04        0917
              …
              0E        0C0F
              0F        014A
INIT_01       10        0026
              11        0917
              12        03AA
              13        0C0F
              14        014A
              …
              1E        0000
              1F        0000
…
INIT_3F       F0        0000
              F1        0000
              F2        0000
              F3        0000
              F4        0000
              …
              FE        0000
              FF        0000
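The packing of 16-bit words into the INIT strings can be checked with a few lines of Python. This is a sketch under the assumption, read off the figure, that each 64-digit INIT string holds 16 words and that the rightmost four hex digits are the word at the lowest address; the helper name `word_at` is hypothetical.

```python
# The first two INIT strings from the slide.
INIT_00 = "014A0C0F09170A04076802A800260205002A01C5020A0917006A006800060040"
INIT_01 = "000000000000000008000A1907070A1706070A020026014A0C0F03AA09170026"


def word_at(init_strings, address, hex_digits_per_word=4):
    """Return the hex word stored at `address`.

    init_strings[k] is INIT_k; within a string the lowest address sits at
    the right-hand end (assumption taken from the figure).
    """
    words_per_string = 64 // hex_digits_per_word
    string_index, position = divmod(address, words_per_string)
    text = init_strings[string_index]
    end = len(text) - position * hex_digits_per_word
    return text[end - hex_digits_per_word:end]


inits = [INIT_00, INIT_01]
for address in (0x00, 0x04, 0x0F, 0x12):
    print(f"{address:02X}: {word_at(inits, address)}")
```

Addresses 00, 04, 0F and 12 return 0040, 0917, 014A and 03AA, matching the address/data pairs listed for INIT_00 and INIT_01.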
Component declaration for BRAM (2)
VHDL Instantiation Template for RAMB16_S9, S18 and S36
-- Component Declaration for RAMB16_{S9 | S18 | S36}
component RAMB16_{S9 | S18 | S36}
  -- synthesis translate_off
  generic (
    INIT       : bit_vector := X"0";
    INIT_00    : bit_vector := X"0000000000000000000000000000000000000000000000000000000000000000";
    INIT_3E    : bit_vector := X"0000000000000000000000000000000000000000000000000000000000000000";
    INIT_3F    : bit_vector := X"0000000000000000000000000000000000000000000000000000000000000000";
    INITP_00   : bit_vector := X"0000000000000000000000000000000000000000000000000000000000000000";
    INITP_07   : bit_vector := X"0000000000000000000000000000000000000000000000000000000000000000";
    SRVAL      : bit_vector := X"0";
    WRITE_MODE : string := "WRITE_FIRST");
  -- synthesis translate_on
  -- The bus widths below are as printed on the slide; the actual widths
  -- depend on the component (see the Block RAM library components table).
  port (
    DO   : out STD_LOGIC_VECTOR (0 downto 0);
    DOP  : out STD_LOGIC_VECTOR (1 downto 0);
    ADDR : in  STD_LOGIC_VECTOR (13 downto 0);
    CLK  : in  STD_ULOGIC;
    DI   : in  STD_LOGIC_VECTOR (0 downto 0);
    DIP  : in  STD_LOGIC_VECTOR (0 downto 0);
    EN   : in  STD_ULOGIC;
    SSR  : in  STD_ULOGIC;
    WE   : in  STD_ULOGIC);
end component;
General template of BRAM instantiation (2)
-- Component Attribute Specification for RAMB16_{S9 | S18 | S36}
-- Component Instantiation for RAMB16_{S9 | S18 | S36}
-- Should be placed in architecture after the begin keyword
RAMB16_{S9 | S18 | S36}_INSTANCE_NAME : RAMB16_{S9 | S18 | S36}
-- synthesis translate_off
generic map (
INIT => bit_value,
INIT_00 => vector_value,
. . . . . . . . . .
INIT_3F => vector_value,
INITP_00 => vector_value,
……………
INITP_07 => vector_value,
SRVAL => bit_value,
WRITE_MODE => user_WRITE_MODE)
-- synopsys translate_on
port map (DO => user_DO,
DOP => user_DOP,
ADDR => user_ADDR,
CLK => user_CLK,
DI => user_DI,
DIP => user_DIP,
EN => user_EN,
SSR => user_SSR,
WE => user_WE);
Memory Organizations
• Shift Registers
  • Synchronous
  • Stores n words
  • For each word input, one word is output
[Figure: shift register clocked by clk, with data in and data out; exactly n words stored internally]
Memory Organizations
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
USE work.app_types.ALL;

ENTITY shift_register IS
  GENERIC( n : POSITIVE := 8 );
  PORT( data_in  : IN  word;
        data_out : OUT word;
        clk      : IN  std_ulogic );
END ENTITY shift_register;

ARCHITECTURE a OF shift_register IS
  TYPE data_array IS ARRAY( 0 TO n-1 ) OF word;
  SIGNAL mem : data_array;
BEGIN
  PROCESS( clk )
  BEGIN
    IF clk'EVENT AND clk = '1' THEN
      mem( mem'LOW ) <= data_in;      -- new word enters at the low end
      data_out <= mem( mem'HIGH );    -- oldest word leaves at the high end
      FOR j IN 1 TO n-1 LOOP
        mem( j ) <= mem( j-1 );
      END LOOP;
    END IF;
  END PROCESS;
END a;
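The timing of this shift register is easy to check with a small Python model of the same process. It is a sketch for intuition, with a hypothetical class name `ShiftRegister`; unlike the VHDL signals, the stored words are assumed to start at 0.

```python
class ShiftRegister:
    """Behavioural model of the n-word shift register above."""

    def __init__(self, n: int = 8, fill=0):
        self.mem = [fill] * n      # mem[0] is the low end, mem[-1] the high end
        self.data_out = fill

    def clock(self, data_in):
        """One rising clock edge; returns the value of data_out after it."""
        self.data_out = self.mem[-1]              # old high-end word goes out
        self.mem = [data_in] + self.mem[:-1]      # everything moves up one place
        return self.data_out


sr = ShiftRegister(n=3)
outputs = [sr.clock(value) for value in [1, 2, 3, 4, 5]]
print(outputs)
```

With n = 3 the first input word reappears on `data_out` at the fourth clock edge: it spends n edges moving through `mem`, and the output itself is registered.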
Memory Organizations
• First-In First-Out (FIFO)
  • Stores n words
  • Independent R and W ports
  • Used for matching data rates
  • Variants
    • Synchronous
    • Asynchronous
  • Need full and empty flags
  • Also commonly provide ‘almost full’, ‘almost empty’ flags
    • These allow a ‘busy’ response to be sent to the provider (input) several cycles before the FIFO actually becomes full, e.g. we used it over the network in Achilles: the receiver-end FIFO sends ‘busy’ back through the net to the sender when it is almost full. This may take several cycles, but the sender can safely continue to send.
    • Similarly ‘almost empty’ can tell the provider to ‘wake up’
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
USE work.app_types.ALL;

ENTITY FIFO_asynch IS
  GENERIC( n : POSITIVE := 8 );
  PORT( data_in       : IN  word;
        data_out      : OUT word;
        rd, wr, reset : IN  std_ulogic;
        full, empty   : OUT std_ulogic );
END ENTITY FIFO_asynch;

-- Behavioural simulation model: a word is read on a rising edge of rd
-- and written on a rising edge of wr.
ARCHITECTURE a OF FIFO_asynch IS
  SUBTYPE index IS natural RANGE 0 TO n-1;
  TYPE data_array IS ARRAY( index'LOW TO index'HIGH ) OF word;
  SIGNAL mem : data_array;
BEGIN
  PROCESS( rd, wr, reset )
    VARIABLE r_ix, w_ix : index := 0;
    VARIABLE count : natural RANGE 0 TO n := 0;  -- words currently stored
  BEGIN
    IF reset = '1' THEN
      r_ix := 0;
      w_ix := 0;
      count := 0;
    ELSE
      IF rd'EVENT AND rd = '1' AND count > 0 THEN
        data_out <= mem( r_ix );
        r_ix := ( r_ix + 1 ) MOD n;
        count := count - 1;
      END IF;
      IF wr'EVENT AND wr = '1' AND count < n THEN
        mem( w_ix ) <= data_in;
        w_ix := ( w_ix + 1 ) MOD n;
        count := count + 1;
      END IF;
    END IF;
    IF count = 0 THEN empty <= '1'; ELSE empty <= '0'; END IF;
    IF count = n THEN full <= '1'; ELSE full <= '0'; END IF;
  END PROCESS;
END a;
FIFO (synchronous) Model
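The role of the full, empty, ‘almost full’ and ‘almost empty’ flags can be illustrated with a short Python model. It is a behavioural sketch only; the class name `Fifo` and the parameter `almost_margin` (how many words before full or empty the ‘almost’ flags assert) are assumptions, not part of any vendor FIFO core.

```python
from collections import deque


class Fifo:
    """Behavioural FIFO model with full/empty and almost-full/empty flags."""

    def __init__(self, n: int = 8, almost_margin: int = 2):
        self.n = n
        self.almost_margin = almost_margin   # hypothetical threshold
        self.words = deque()

    @property
    def empty(self) -> bool:
        return len(self.words) == 0

    @property
    def full(self) -> bool:
        return len(self.words) == self.n

    @property
    def almost_full(self) -> bool:
        return len(self.words) >= self.n - self.almost_margin

    @property
    def almost_empty(self) -> bool:
        return len(self.words) <= self.almost_margin

    def write(self, word) -> bool:
        """Store a word; returns False (word dropped) if the FIFO is full."""
        if self.full:
            return False
        self.words.append(word)
        return True

    def read(self):
        """Return the oldest word, or None if the FIFO is empty."""
        if self.empty:
            return None
        return self.words.popleft()


fifo = Fifo(n=4, almost_margin=1)
for word in (10, 20, 30):
    fifo.write(word)
```

After three writes into this four-word FIFO, `almost_full` is already true while `full` is not, which gives the sender one more word of slack before data would be lost.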
Memory Organizations
• Content Addressable Memory
  • ‘Dictionary’ style applications
  • Does the memory contain a given word?
  • Ordinary memory requires O(n) time
    • Each word is checked in turn
    • Binary and tree searches can reduce this to O( log n )
  • Input: a search ‘key’, one word of data
  • Each location of memory is searched in parallel
  • Indicates whether or not a match occurred in O( 1 ) time!
  • Returns either
    • True / false = key found / not found
    or
    • Index of (one) match
      • Allows lookup of data in accompanying data table
  • Expensive to implement
    • Requires O(n) comparators for an n-word store
[Figure: CAM block with inputs key and search/~add, output match, and up to n words stored internally]
Example hardware implementation
• Content-addressable memory is often used in computer networking devices.
• For example, when a network switch receives a data frame from one of its ports, it updates an internal table with the frame's source MAC address and the port it was received on.
• It then looks up the destination MAC address in the table to determine what port the frame needs to be forwarded to, and sends it out on that port.
• The MAC address table is usually implemented with a binary CAM so the destination port can be found very quickly, reducing the switch's latency.
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
USE ieee.std_logic_arith.ALL;
USE work.app_types.ALL;

ENTITY CAM IS
  GENERIC( n : POSITIVE := 8 );
  PORT( key        : IN  word;
        found      : OUT std_ulogic;
        search     : IN  std_ulogic;
        full       : OUT std_ulogic;
        reset, clk : IN  std_ulogic );
END ENTITY CAM;

ARCHITECTURE a OF CAM IS
  TYPE data_array IS ARRAY( 0 TO n-1 ) OF word;
  SIGNAL mem : data_array;
  SIGNAL top : natural RANGE 0 TO n;  -- number of words stored so far
BEGIN
  PROCESS( clk, reset )
    VARIABLE match : BOOLEAN;
  BEGIN
    IF reset = '1' THEN
      top <= 0;
      full <= '0';
    ELSIF clk'EVENT AND clk = '1' THEN
      IF search = '1' THEN  -- Search mode
        match := FALSE;
        FOR j IN data_array'RANGE LOOP
          -- a hit on any stored location sets match
          IF j < top AND key = mem( j ) THEN
            match := TRUE;
          END IF;
        END LOOP;
        IF match THEN found <= '1';
        ELSE found <= '0';
        END IF;
      ELSE  -- Add a new entry
        IF top <= data_array'HIGH THEN
          mem( top ) <= key;
          top <= top + 1;
        END IF;
        IF top >= data_array'HIGH THEN  -- last location filled
          full <= '1';
        END IF;
      END IF;
    END IF;
  END PROCESS;
END a;
CAM implementation
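The behaviour of a CAM, including the "index of one match" variant used with an accompanying data table, can be sketched in Python as below. The class name `Cam`, the example MAC strings and the port numbers are hypothetical; a list comprehension stands in for the n hardware comparators that work in parallel.

```python
class Cam:
    """Behavioural model of an n-word content-addressable memory."""

    def __init__(self, n: int = 8):
        self.n = n
        self.mem = []

    @property
    def full(self) -> bool:
        return len(self.mem) == self.n

    def add(self, key) -> bool:
        """Store a new key; returns False if the CAM is already full."""
        if self.full:
            return False
        self.mem.append(key)
        return True

    def search(self, key):
        """Return (found, index of one match or None)."""
        # In hardware every location is compared with the key at once.
        matches = [j for j, word in enumerate(self.mem) if word == key]
        if matches:
            return True, matches[0]
        return False, None


# Made-up MAC table: the CAM holds addresses, a data table holds ports.
cam = Cam(n=2)
ports = []
for mac, port in (("00:11:22:33:44:55", 3), ("66:77:88:99:AA:BB", 1)):
    if cam.add(mac):
        ports.append(port)

found, index = cam.search("66:77:88:99:AA:BB")
out_port = ports[index] if found else None
```

The returned index selects the entry in the `ports` list, which is the "lookup of data in accompanying data table" mentioned above.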
Memory Limitations
• Clearly these amounts are ‘small’ for modern demands
Example:
• Image processing
  • A ‘low’ resolution image
  • 1000x1000 = 10^6 pixels
  • 8 Mbits in B&W
  • 24 Mbits in colour
• All large FPGAs now provide ‘conventional’ memory blocks
• Some examples ….
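The image example can be turned into a quick calculation of how many block RAMs one frame would need. The sketch assumes 8 bits per pixel for the B&W image (which reproduces the 8 Mbit figure) and 24 bits per pixel for colour, and uses the 18,432-bit block size quoted on the Block RAM slide; the function names are hypothetical.

```python
BLOCK_BITS = 18_432   # bits per block RAM, parity cells included


def image_bits(width: int, height: int, bits_per_pixel: int) -> int:
    return width * height * bits_per_pixel


def blocks_needed(total_bits: int, block_bits: int = BLOCK_BITS) -> int:
    """Round up to whole blocks using integer arithmetic."""
    return -(-total_bits // block_bits)


bw_bits = image_bits(1000, 1000, 8)        # 8,000,000 bits
colour_bits = image_bits(1000, 1000, 24)   # 24,000,000 bits
print(blocks_needed(bw_bits), blocks_needed(colour_bits))
```

Under these assumptions a single B&W frame needs 435 blocks and a colour frame 1303, well beyond the 4 to 104 blocks quoted earlier for the block RAM devices.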
Some examples
• ispXGA 1200
  • Block Memory or ‘SysMEM’: 414K bits
  • 90 blocks of ‘regular’ memory distributed throughout the device
  • Each block (EBR)
    • Dual port
      • 256 x 36
      • 1024 x 9
    • Quad port
      • 512 x 18
      • 1024 x 18 (using 2 blocks)
    • FIFO
      • 256 x 36
      • 512 x 18
      • 1024 x 9
    • Content Addressable memory
Alternative memory creation
• Use
  • Manufacturer’s library components, e.g. Altera LPM library: lpm_fifo, lpm_shiftreg, lpm_ram_dq, lpm_rom
  • Memory generator programs
    • Altera’s mega-function wizard
    • Xilinx’ CoreGen
LIBRARY ieee;
USE ieee.std_logic_1164.all;
LIBRARY altera_mf;
USE altera_mf.altera_mf_components.all;
ENTITY fff IS
PORT
(
address : IN STD_LOGIC_VECTOR (7 DOWNTO 0);
clock : IN STD_LOGIC := '1';
data : IN STD_LOGIC_VECTOR (7 DOWNTO 0);
wren : IN STD_LOGIC ;
q : OUT STD_LOGIC_VECTOR (7 DOWNTO 0)
);
END fff;
ARCHITECTURE SYN OF fff IS
SIGNAL sub_wire0 : STD_LOGIC_VECTOR (7 DOWNTO 0);
BEGIN
q <= sub_wire0(7 DOWNTO 0);
altsyncram_component : altsyncram
GENERIC MAP (
clock_enable_input_a => "BYPASS",
clock_enable_output_a => "BYPASS",
intended_device_family => "Cyclone V",
lpm_hint => "ENABLE_RUNTIME_MOD=NO",
lpm_type => "altsyncram",
numwords_a => 256,
operation_mode => "SINGLE_PORT",
outdata_aclr_a => "NONE",
outdata_reg_a => "CLOCK0",
power_up_uninitialized => "FALSE",
read_during_write_mode_port_a => "NEW_DATA_NO_NBE_READ",
widthad_a => 8,
width_a => 8,
width_byteena_a => 1
)
PORT MAP (
address_a => address,
clock0 => clock,
data_a => data,
wren_a => wren,
q_a => sub_wire0
);
END SYN;
Still Not enough embedded RAM?
• External Memory
  • Large FPGAs have 200+ I/O pins
    • All configurable as IN or OUT
  • Interfacing with external devices is straightforward
    • Static RAM
    • FIFOs are particularly easy!
    • Some manufacturers provide ‘cores’ (packaged solutions) for SDRAM, DDRAM, etc
• Only problem
  • You don’t have quite the same flexibility when using external memory chip(s)
    • Size
    • Data width