1
Heterogeneous Logic Blocks
1. Mixture of two different sizes of LUTs:
− Larger LUT and cluster sizes: higher speed
− Smaller sizes: more area efficient
− Up to the CAD tool to select the resource
2. Mixture of PAL-like LBs and LUT-based LBs:
− PAL blocks: improved circuit speed
− LUT blocks: area efficiency
3. Mixture of “specific-purpose logic” (SP) and general-purpose (GP) LBs:
− SP LBs: superior area, speed, and power consumption
− If the function is not used, the silicon area is wasted
2
Heterogeneous Logic Blocks
• Key questions:
1. Which kinds of SP functions?
2. What should be the ratio: SP/GP?
3. What can be done about SP LBs not used in a specific application?
− Rose’s golden rule: “build structures that are always useful, even if that use is less than perfectly efficient.”
− “The more useful a hard structure is, across a wider range of applications, then the greater its net benefit - provided the cost of the extra functionality is not excessive.”
− Rose. Hard vs. Soft: The Central Question of Pre-Fabricated Silicon. In Proceedings of the 34th International Symposium on Multiple-Valued Logic (ISMVL’04), 2004.
3
Hard Blocks
• Common hard blocks in modern FPGAs:
− Memory
− Multipliers
− MAC for DSP applications
− Microprocessors
Embedded Memories
5
Memory in Altera FLEX 10K
6
Memory in FLEX 10K
7
Memory in FLEX 10K
8
Heterogeneous Logic Blocks
• Each EAB:
− 2048 bits if used as memory (dual-port RAM, ROM, FIFO, …)
− 10-600 gates if used as logic
9
Configuration as Memory
• One 2048-bit EAB can be configured as:
− 256x8: A[7..0], D[7..0]
− 512x4: A[8..0], D[3..0]
− 1024x2: A[9..0], D[1..0]
− 2048x1: A[10..0], D0
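The relation between data width, depth, and address lines can be checked with a few lines of Python. This is only an illustrative sketch: `EAB_BITS` and `eab_configs` are hypothetical names, and the only fact taken from the slides is the 2048-bit capacity of one EAB. Note that the A[9..0], D[1..0] case works out to 1024 x 2 = 2048 bits.

```python
EAB_BITS = 2048  # capacity of one EAB in bits, as stated on the slides


def eab_configs(data_widths=(1, 2, 4, 8)):
    """Return (depth, width, address_bits) for each data width of one EAB."""
    configs = []
    for width in data_widths:
        depth = EAB_BITS // width              # number of addressable words
        address_bits = depth.bit_length() - 1  # log2(depth); depth is a power of two
        configs.append((depth, width, address_bits))
    return configs


for depth, width, abits in eab_configs():
    print(f"{depth}x{width}: A[{abits - 1}..0], D[{width - 1}..0]")
```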
10
Configuration as Memory
• Can be used independently
• Can be combined for a larger memory
− Two 512x4 EABs sharing A[8..0], each supplying D[3..0], form one 512x8 memory: A[8..0], D[7..0]
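A minimal behavioral sketch of the width combination, assuming one EAB holds the low nibble and the other the high nibble of each byte (the slides do not say which is which). The names `split_bytes` and `read_512x8` are hypothetical.

```python
def split_bytes(data):
    """Split 512 bytes into the contents of two 512x4 memories."""
    low = [b & 0xF for b in data]          # contents of the EAB driving D[3..0]
    high = [(b >> 4) & 0xF for b in data]  # contents of the EAB driving D[7..4]
    return low, high


def read_512x8(low_eab, high_eab, address):
    """Read one byte: both EABs see the same address A[8..0]."""
    if not 0 <= address < 512:
        raise ValueError("address must fit in 9 bits")
    return (high_eab[address] << 4) | low_eab[address]


# Example: store a pattern, then read it back through the combined memory.
pattern = [(i * 7) & 0xFF for i in range(512)]
low_eab, high_eab = split_bytes(pattern)
print(read_512x8(low_eab, high_eab, 3))
```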
11
Altera Cyclone III Architecture
12
Cyclone III
13
Configuration as a Logic Function
• Can be used as a LUT: e.g., a square-root unit with one EAB (8 inputs, 8 outputs).
• Advantage over an implementation with several LEs: predictable delay and higher speed.
• An EAB can be used independently, or several EABs can be combined to implement a more complex function.
Remember:
3. What can be done about SP LBs not used in a specific application?
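The square-root example can be sketched as a 256-entry table with 8-bit words. The output format below (4 integer bits and 4 fraction bits) is an assumption made for illustration; the slides only give the 8-input, 8-output shape. `FRACTION_BITS`, `build_sqrt_rom`, and `sqrt_lookup` are hypothetical names.

```python
import math

FRACTION_BITS = 4  # assumed fixed-point format: 4 integer bits, 4 fraction bits


def build_sqrt_rom():
    """Contents of a 256x8 memory: entry x holds floor(sqrt(x) * 16)."""
    return [int(math.sqrt(x) * (1 << FRACTION_BITS)) for x in range(256)]


SQRT_ROM = build_sqrt_rom()


def sqrt_lookup(x):
    """One memory read: the 8-bit argument drives the address lines."""
    return SQRT_ROM[x & 0xFF]


print(sqrt_lookup(16) / 16)  # fixed-point result converted back to a real number
```

Every lookup takes one memory access regardless of the argument, which is the predictable-delay property noted above.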
14
Cyclone III M9K
15
Memory Modes
• Embedded shift register mode
• ROM mode
• FIFO buffer
• Single/dual-port
16
Memory Volume in Cyclone III
17
Memory Modes
• Simple dual-port mode: supports simultaneous read and write operations to different locations.
• True dual-port mode: supports any combination of two-port operations, at two different clock frequencies:
− two reads,
− two writes,
− one read and one write.
18
Memory Block Megafunctions
• Can instantiate memory blocks by Quartus MegaWizard
• Can instantiate them in your VHDL/Verilog code. Refer to
− “RAM Megafunction User Guide,” 2007, http://www.altera.com/literature/ug/ug_ram.pdf
19
Altera Stratix II Embedded Memory
20
TriMatrix Memory Structure
21
Stratix II RAM Blocks
22
Stratix IV RAM Blocks
23
Applications of Embedded Memory
• 4x4 multiplier (or any complex mathematical function, e.g., the B-th root of A)
• For larger multipliers, several 4x4 multipliers and several adders are used.
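As a sketch of both points: a 4x4 multiplier is a 256-entry table (two 4-bit operands form the 8-bit address, the 8-bit product is the data), and an 8x8 product can be assembled from four such lookups plus shifted additions. The address packing (first operand in the upper four address bits) and the names `MUL4_ROM`, `mul4`, and `mul8` are assumptions for illustration.

```python
# 256x8 table: address = {a[3..0], b[3..0]}, data = a * b (at most 225)
MUL4_ROM = [(addr >> 4) * (addr & 0xF) for addr in range(256)]


def mul4(a, b):
    """4x4 unsigned multiply as a single memory read."""
    return MUL4_ROM[((a & 0xF) << 4) | (b & 0xF)]


def mul8(a, b):
    """8x8 unsigned multiply from four 4x4 lookups and adders."""
    a_hi, a_lo = (a >> 4) & 0xF, a & 0xF
    b_hi, b_lo = (b >> 4) & 0xF, b & 0xF
    return ((mul4(a_hi, b_hi) << 8)
            + ((mul4(a_hi, b_lo) + mul4(a_lo, b_hi)) << 4)
            + mul4(a_lo, b_lo))


print(mul8(200, 123))
```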
24
Applications of Embedded Memory
• Constant multiplier (in DSP and control systems):
− The constant value determines the content pattern of the EAB.
− If the constant changes at run time, a new pattern can be loaded into the EAB.
− The precision of the multiplier can be adjusted by choosing the number of output bits (to save resources).
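A rough sketch of a constant multiplier held in memory, assuming an unsigned 8-bit input and a non-negative constant. Reducing precision is modeled here by keeping only the most significant output bits; that is one possible reading of the slide, not a documented scheme. `build_const_mult_rom` is a hypothetical helper.

```python
def build_const_mult_rom(constant, input_bits=8, output_bits=None):
    """Memory contents for y = constant * x, optionally keeping only the top bits."""
    max_product = constant * ((1 << input_bits) - 1)
    full_bits = max_product.bit_length()
    # Drop low-order bits when fewer output bits are requested.
    shift = 0 if output_bits is None else max(0, full_bits - output_bits)
    return [(constant * x) >> shift for x in range(1 << input_bits)]


rom = build_const_mult_rom(5)                       # full precision
print(rom[10])
rom = build_const_mult_rom(7)                       # "reload" for a new constant
print(rom[10])
narrow = build_const_mult_rom(5, output_bits=8)     # 8-bit output words only
print(max(narrow))
```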
25
Applications of Embedded Memory
• FSMs with complex state transitions:
• General-purpose FSMs:
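The slides do not spell out the mapping, so the following is a sketch of one common way to hold an FSM in memory: the address is formed from the current state and the input, and the data word holds the next state and the output. The example machine (a detector for the bit pattern 1, 0, 1) and the names `TRANSITIONS`, `build_fsm_rom`, and `run_fsm` are all illustrative assumptions.

```python
STATE_BITS = 2

# (state, input_bit) -> (next_state, output_bit); state 3 is unused.
TRANSITIONS = {
    (0, 0): (0, 0), (0, 1): (1, 0),   # idle
    (1, 0): (2, 0), (1, 1): (1, 0),   # seen "1"
    (2, 0): (0, 0), (2, 1): (1, 1),   # seen "10"; a 1 completes "101"
}


def build_fsm_rom():
    """Memory contents: address = {state, input}, data = {next_state, output}."""
    rom = [0] * (1 << (STATE_BITS + 1))   # unused entries fall back to state 0
    for (state, bit), (next_state, out) in TRANSITIONS.items():
        rom[(state << 1) | bit] = (next_state << 1) | out
    return rom


def run_fsm(rom, bits):
    """Clock the FSM once per input bit; the state register feeds the address."""
    state, outputs = 0, []
    for bit in bits:
        word = rom[(state << 1) | bit]
        state, out = word >> 1, word & 1
        outputs.append(out)
    return outputs


print(run_fsm(build_fsm_rom(), [1, 0, 1, 0, 1, 1, 0, 1]))
```

Changing the machine only requires loading a different table, which is what makes a memory-based FSM general purpose.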
26
Applications of Memory
27
Applications of Embedded Memory
• Transcendental functions:
− Sine, ..., logarithm, ..., which are difficult to compute algorithmically and to implement in hardware.
− Function argument: applied to the address lines.
− Result: appears on the data output.
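A small sketch of a sine table, assuming an 8-bit address that covers one full period and an 8-bit offset-binary output (0 represents -1, 255 represents +1). Both widths and the encoding are assumptions; `build_sine_rom` is a hypothetical name.

```python
import math

ADDRESS_BITS = 8  # assumed: argument quantized to 256 steps per period
DATA_BITS = 8     # assumed: 8-bit result


def build_sine_rom():
    """Memory contents: entry k holds sin(2*pi*k/256) as an unsigned 8-bit code."""
    size = 1 << ADDRESS_BITS
    max_code = (1 << DATA_BITS) - 1
    rom = []
    for address in range(size):
        value = math.sin(2 * math.pi * address / size)  # in [-1, 1]
        rom.append(round((value + 1) / 2 * max_code))   # map to [0, max_code]
    return rom


SINE_ROM = build_sine_rom()
print(SINE_ROM[64], SINE_ROM[192])  # codes near the positive and negative peaks
```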
28
Applications of Embedded Memory
• Large code converters:
− Code converter from an 8-bit number to a 10-bit number
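The slide does not name the codes involved. One conversion with exactly this shape is 8-bit binary to BCD: the values 0 to 255 need 2 bits for the hundreds digit plus 4 bits each for tens and ones, giving 10 bits. The sketch below assumes that interpretation; `build_bin_to_bcd_rom` is a hypothetical name.

```python
def build_bin_to_bcd_rom():
    """256x10 table: address = binary value, data = {hundreds[1..0], tens[3..0], ones[3..0]}."""
    rom = []
    for value in range(256):
        hundreds, rest = divmod(value, 100)
        tens, ones = divmod(rest, 10)
        rom.append((hundreds << 8) | (tens << 4) | ones)
    return rom


BCD_ROM = build_bin_to_bcd_rom()
print(hex(BCD_ROM[255]))  # the hex digits of the word read as the decimal digits
```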
29
Xilinx Virtex II Pro
(Digital Clock Manager)
30
Xilinx Virtex II Pro
31
Xilinx Virtex 4
32
Virtex 5
Computation-Oriented Tiles
34
Virtex Family
35
18x18 Multipliers
• For computational and DSP tasks
37
18x18 Multipliers
• In Virtex 5:
• DSP48E slices
− 25 x 18, two’s complement multiplication
− One adder, one subtracter and an accumulator
38
Multipliers in Altera Cyclone III
39
Embedded Multipliers
40
Embedded Multipliers
• Can configure each embedded multiplier as one 18 × 18 or two 9 × 9 multipliers.
• For widths greater than 18 × 18, the Quartus II software cascades multiple embedded multipliers.
• No restriction on the data width, but the greater the data width, the slower the multiplication.
• Can also implement soft multipliers using Cyclone III M9K memory blocks, which increases the number of available multipliers.
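The arithmetic behind cascading can be sketched as follows: a 36 × 36 unsigned product is built from four 18 × 18 partial products that are shifted and added. This only illustrates the decomposition; it does not describe how Quartus II actually maps or pipelines the blocks, and signed operands are not handled. `mul18` and `mul36` are hypothetical names.

```python
MASK18 = (1 << 18) - 1


def mul18(a, b):
    """Stand-in for one unsigned 18 x 18 embedded multiplier."""
    if not (0 <= a <= MASK18 and 0 <= b <= MASK18):
        raise ValueError("operands must fit in 18 bits")
    return a * b


def mul36(a, b):
    """36 x 36 unsigned multiply from four 18 x 18 multipliers plus adders."""
    a_hi, a_lo = a >> 18, a & MASK18
    b_hi, b_lo = b >> 18, b & MASK18
    return ((mul18(a_hi, b_hi) << 36)
            + ((mul18(a_hi, b_lo) + mul18(a_lo, b_hi)) << 18)
            + mul18(a_lo, b_lo))


print(mul36(0xFFFFFFFFF, 0x9ABCDEF12) == 0xFFFFFFFFF * 0x9ABCDEF12)
```

Four multipliers and the extra adder stages are one reason wider operands cost both resources and speed.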
41
Number of Multipliers
42
Multiplier Block Architecture
43
9-Bit Mode
44
Multiplier Megafunctions
• For instantiating multipliers, refer to: Quartus User Guide, Synthesis,
http://www.altera.com/literature/hb/qts/qts_qii5v1_03.pdf
45
Stratix II DSP Blocks
46
Stratix II DSP Blocks
47
Stratix II DSP Blocks
48
Stratix II DSP Blocks
49
50
Stratix Architecture
51
Ratio-Based Architectures
• If multipliers are not needed by an application, they provide little benefit.
• One approach: multiple sub-families within a device family, with different ratios of soft logic to hard logic.
− The designer can select the device with the most appropriate ratio, minimizing “wasted” area
− The FPGA vendor must support a larger number of devices
[Figure: soft/hard ratio per device: 223, 449, 275, 373]
52
Ratio-Based Architectures
• Virtex 4/Virtex 5 sub-families:
1. LX: focus on soft logic and memory
2. SX: focus on arithmetic computational units
3. FX: focus on an embedded processor and high-speed serial interfaces
• Virtex 6:
1. LXT: High-performance logic with advanced serial connectivity
2. SXT: Highest signal processing capability with advanced serial connectivity
3. HXT: Highest bandwidth serial connectivity
53
Xilinx Virtex 4
54
Virtex 5
Embedded Processors
56
System-Level Design
Until recently, the CPU and its peripherals were implemented as discrete chips.
• Two scenarios:
− Memory connected to the CPU via a general-purpose processor bus
− Tightly-coupled memory (TCM) connected to the processor via a dedicated bus
57
Embedded System Design
• Dedicated chips for CPU and peripherals:
− High area cost,
− Low reliability.
• For a relatively small amount of memory, the integrated memory in the FPGA is used.
58
Challenges
• Challenges:
− Decision on hardware/software partitioning.
− Design environment must support hardware/software co-verification.
59
SoPC
• SoC: A chip that integrates the major functional elements of a complete end product.
• Complex FPGAs:
− CPU
− Memory
− Arithmetic units (multipliers, …)
− Peripheral modules
− Logic
• Whole system on a chip (SoPC)
60
Microprocessor Cores
• Two types:
• Hard Core
− Implemented as a hardwired component
− E.g. PowerPC in Xilinx
− E.g. ARM in Altera
− E.g. MIPS in QuickLogic
• Soft Core
− Configure logic blocks to act as microprocessor(s)
− E.g. MicroBlaze in Xilinx
− E.g. Nios II in Altera
− E.g. Q90C1 in QuickLogic
61
Hard Microprocessor Cores
• Two Scenarios:
1. Locate it in a strip to the side of the FPGA fabric.
− Easier for tools, because the main FPGA fabric is identical for devices with or without the hard core
− The FPGA vendor can embed many additional functions in the strip to complement the microprocessor.
− Altera: ARM in Excalibur
62
Hard Microprocessor Cores
• Two Scenarios:
2. Embed core(s) directly into the main FPGA fabric
− Design tools must consider the presence of these blocks in the fabric.
− Memory used by the core comes from embedded RAM blocks
− Speed advantages from proximity to the main FPGA fabric.
− Xilinx: PowerPC in Virtex II-Pro, Virtex 4, and Virtex 5.
63
Hard Microprocessor Cores
2. (cont.) Embed core(s) directly into the main FPGA fabric
− No dedicated processor bus or peripheral bus. These buses must be implemented using FPGA logic.
− Advantage: flexibility to define the architecture of the embedded system.
− Disadvantage: the processor cannot perform useful work without configuring the FPGA logic
64
Soft Processor Core
• Disadvantages:
− Generally slower
− Larger
• Advantage: can often be customized to exactly suit the needs of the application
− Gains back some of the lost performance and area efficiency.
65
Soft Microprocessor Cores
• Firm or Soft:
− Soft: if in the form of an RTL netlist that will be synthesized,
− Firm: if placed and routed.
• Peripherals in soft or firm form:
− E.g. memory controllers, interrupt controllers, communication functions, timer counters.
− Refer to the library of the FPGA vendor.
• Xilinx:
− MicroBlaze: 32-bit microprocessor (~1000 logic cells)
− PicoBlaze: 8-bit microprocessor (~150 logic cells)
• Altera:
− Nios II: 32-bit
66
References
• [Xilinx] www.xilinx.com
• [Altera] www.altera.com