Reconfigurable Computing Using Content Addressable Memory (CAM) for Improved Performance and Resource Usage

Group Members:

Anderson Raid

Marie Beltrao

Raphael Christian

Outline

• Introduction – Literature Review

– Coarse Grain

– CAM

• Objectives of the paper

• CAM based computing scheme

• MCB – hardware

• Multi-MCB Communication

• Application Mapping Process

• Estimation of Cycle time and performance

• Design and Organization of a Ternary CAM (TCAM)

• Hybrid CAM-LUT

• MCB - Delay and power components

• SIMULATION RESULTS

Introduction – Literature Review

• Traditional FPGA

– Significant design overhead and poor scalability with process technology

– LUT

– 80% or more of power goes to the programmable interconnects

• Multi-cycle memory-based computing

– Reduction in memory requirement

– Little or no degradation in performance

– CAM

Fine grain vs. coarse grain

Fine grain:

• Fine control over bit-width

• Bit-level operations

• CAD tools available

• Flexible

• Speed, power consumption

• Time to configure

Coarse grain:

• Less routing

• Better instruction density

• Better cycle times

• Small configuration sizes

• Little CAD support

• Less flexible

Why coarse grain?

• Improve both performance and reliability of operation

• Significantly reduce configuration memory and configuration time

• Reduce routing overhead and improve routability

• Improve area and delay by minimizing the contribution of the programmable interconnects

Spatial computing + multi-cycle computing (LUT) trade-off (CAM)

What is CAM?

• “Content Addressable Memory”

• Word length ranging from 36 to 144 bits

• Address space from 7 to 15 bits

• Access times as low as 0.25 ns

The Embedded System Block (ESB) of the APEX20K from Altera Corporation incorporates such an embedded memory, but it cannot exploit the optimization obtained by considering don’t-care terms.
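To make the don’t-care limitation concrete, here is a minimal sketch of a binary (non-ternary) CAM in Python. The class `BinaryCAM`, its methods, and the example function f = a AND c are illustrative assumptions, not taken from the paper or from the APEX20K documentation. The point is only that a binary CAM needs one entry per matching input pattern, even when an input does not affect the result.

```python
class BinaryCAM:
    """Toy binary CAM: search by content, get back the matching addresses."""

    def __init__(self, width):
        self.width = width   # word length in bits
        self.words = []      # list index = address, value = stored bit string

    def write(self, word):
        if len(word) != self.width or set(word) - {"0", "1"}:
            raise ValueError("word must be a bit string of the configured width")
        self.words.append(word)
        return len(self.words) - 1   # address the word was stored at

    def search(self, key):
        # Hardware compares all stored words in parallel; this loop models that.
        return [addr for addr, word in enumerate(self.words) if word == key]


cam = BinaryCAM(width=3)

# Hypothetical function f(a, b, c) = a AND c, where b is a don't care.
# Without ternary cells, both values of b need their own entry.
for minterm in ("101", "111"):
    cam.write(minterm)


def f(bits):
    """Evaluate f by checking whether the input pattern is stored in the CAM."""
    return 1 if cam.search(bits) else 0
```

With a ternary cell the two entries above could collapse into the single pattern "1x1", which is the kind of saving a don’t-care-aware CAM can offer.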

Objectives of the paper

Implement “(…) a multi-cycle Memory Based Computational methodology that utilizes Content Addressable Memory (CAM) as the underlying reconfigurable fabric”

• Implement a large application efficiently

• Propose a CAM-based implementation of reconfigurable computing

• Discuss the circuit implementation and develop a scalable hardware framework that allows a large design to be mapped to multiple computational units

• Propose a hybrid LUT-CAM based function representation that can further optimize the memory requirement by selectively storing some partitions in CAM and the others in LUT

CAM based computing scheme

Stores functional responses

MCB – hardware

Memory-based Computational Block

• Stores and evaluates up to 128 partitions, 32 in each bank, with each partition having 12 inputs and outputs.

Multi-MCB Communication

Figure: functional block diagram for memory-based computing

• A single MCB node has limited memory resources, which restricts scalability for larger applications

• Multi-MCB communication is organized to minimize interconnect overhead

• Hierarchical interconnect architecture

Application Mapping Process

• Partitioning

a) Greedy heuristic-based partitioning approach that produces multi-input, multi-output logic blocks

b) It is an optimization problem, with evaluation time as the objective and memory requirement as the constraint
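The slides do not give the partitioning heuristic itself, so the following is only a sketch of what a greedy, input-bounded partitioner could look like. The function names, the netlist format, and the choice of an input limit as a stand-in for the memory constraint are all assumptions made for illustration (a LUT for a block with n inputs grows as 2^n, so bounding inputs bounds memory).

```python
def partition_inputs(block, fanin):
    """External signals a block of gates reads (signals not produced inside it)."""
    return {s for g in block for s in fanin[g]} - set(block)


def greedy_partition(gates, fanin, max_inputs):
    """Group gates into multi-input, multi-output blocks.

    gates: gate names in topological order.
    fanin: gate name -> list of signals it reads (primary inputs or gate names).
    max_inputs: assumed memory constraint, expressed as an input limit per block.
    """
    partitions, current = [], []
    for gate in gates:
        candidate = current + [gate]
        if current and len(partition_inputs(candidate, fanin)) > max_inputs:
            partitions.append(current)   # close the block, start a new one
            candidate = [gate]
        current = candidate
    if current:
        partitions.append(current)
    return partitions


# Hypothetical five-gate netlist with primary inputs a..f and h.
fanin = {
    "g1": ["a", "b"],
    "g2": ["c", "d"],
    "g3": ["g1", "g2"],
    "g4": ["g3", "e"],
    "g5": ["f", "h"],
}
blocks = greedy_partition(["g1", "g2", "g3", "g4", "g5"], fanin, max_inputs=4)
print(blocks)   # [['g1', 'g2', 'g3'], ['g4', 'g5']]
```

Fewer, larger blocks mean fewer evaluation cycles, which is why the sketch keeps growing a block until the input limit would be exceeded.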

• Scheduling

a) Each MCB evaluates its partitions over multiple cycles, so a heuristic-based algorithm schedules the execution of the partitions

b) Static scheduling

c) Goal: minimize the number of evaluation cycles
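As an illustration of static scheduling, the sketch below uses a simple list-scheduling heuristic: in every cycle it issues as many ready partitions as there are MCBs. This is a generic technique swapped in for illustration, not the algorithm from the paper; `static_schedule`, the `deps` format, and the partition names are hypothetical.

```python
def static_schedule(deps, num_mcbs):
    """Assign partitions to evaluation cycles at compile time.

    deps: partition -> set of partitions whose outputs it needs.
    num_mcbs: assumed number of MCBs, each evaluating one partition per cycle.
    Returns a list of cycles, each a list of partitions issued in that cycle.
    """
    remaining = {p: set(d) for p, d in deps.items()}
    done, cycles = set(), []
    while remaining:
        # A partition is ready once everything it depends on has been evaluated.
        ready = sorted(p for p, d in remaining.items() if d <= done)
        if not ready:
            raise ValueError("dependency cycle between partitions")
        issued = ready[:num_mcbs]
        cycles.append(issued)
        done.update(issued)
        for p in issued:
            del remaining[p]
    return cycles


# Hypothetical dependency graph between five partitions.
deps = {"P0": set(), "P1": set(), "P2": {"P0", "P1"}, "P3": {"P2"}, "P4": set()}
schedule = static_schedule(deps, num_mcbs=2)
print(schedule)   # [['P0', 'P1'], ['P2', 'P4'], ['P3']]
```

Because the schedule depends only on the dependency graph, it can be computed once during mapping and stored with the configuration.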

Estimation of Cycle time and performance

• Simulations were carried out using a 70 nm technology model

• Cycle time was estimated for a LUT-based MCB framework

• Improvement of 56.3% in processing time

• At the cost of a 23.6% increase in energy per vector
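The two figures above can be combined into a rough energy-delay comparison. The short calculation below assumes that "improvement of 56.3% in processing time" means the processing time drops by 56.3% relative to the baseline used in the slides, and that both percentages refer to the same baseline; the slides do not state either point explicitly.

```python
time_reduction = 0.563    # from the slide: 56.3% improvement in processing time
energy_increase = 0.236   # from the slide: 23.6% more energy per vector

# Assumption: "improvement" is read as a reduction of the baseline time.
relative_time = 1.0 - time_reduction
relative_energy = 1.0 + energy_increase

speedup = 1.0 / relative_time
relative_edp = relative_time * relative_energy   # energy-delay product vs. baseline

print(f"relative time   : {relative_time:.3f}")
print(f"relative energy : {relative_energy:.3f}")
print(f"speedup         : {speedup:.2f}x")
print(f"relative EDP    : {relative_edp:.3f}")
```

Under these assumptions the energy-delay product falls to roughly 54% of the baseline, so the energy penalty is outweighed by the speed gain.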

• The performance improvement offered by the proposed framework was also validated for two algorithm-specific applications:

– DCT: Discrete Cosine Transform

– FIR: Finite Impulse Response

Design and Organization of a Ternary CAM (TCAM)

• Allows pattern matching with the use of “don’t cares.”

• Attractive for implementing longest-prefix-match searches in routing tables
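A small Python model can show how ternary matching supports longest-prefix-match lookups. The function names, the 8-bit addresses, and the route table are made up for illustration. Sorting entries by the number of specified bits stands in for the priority ordering that a hardware TCAM would get from the order in which entries are stored.

```python
def ternary_match(pattern, bits):
    """True if bits matches pattern; 'x' in the pattern is a don't care."""
    return len(pattern) == len(bits) and all(
        p in ("x", b) for p, b in zip(pattern, bits)
    )


def longest_prefix_match(table, bits):
    """table: list of (pattern, next_hop). The most specific matching entry wins."""
    # Fewer don't cares = longer prefix = higher priority.
    ordered = sorted(table, key=lambda entry: entry[0].count("x"))
    for pattern, next_hop in ordered:
        if ternary_match(pattern, bits):
            return next_hop
    return None


# Hypothetical 8-bit routing table.
routes = [
    ("10xxxxxx", "port A"),
    ("1011xxxx", "port B"),
    ("xxxxxxxx", "default"),
]

print(longest_prefix_match(routes, "10110001"))   # port B
print(longest_prefix_match(routes, "10010001"))   # port A
print(longest_prefix_match(routes, "01000000"))   # default
```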

Hybrid CAM-LUT

• The proposed framework contains both PLA-based and LUT-based representations and is advantageous for memory-efficient realization of all classes of functions (hybrid CAM/LUT-based).

• A hybrid approach can potentially reduce the total memory requirement.
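One way to picture the hybrid choice is to compare the storage each representation would need for a partition and keep the smaller one. The cost model below is an assumption for illustration only: a LUT stores one output word per input combination, and a ternary CAM entry is assumed to cost 2 bits per input cell plus one output word per stored product term (cube). The partition names and sizes are invented.

```python
def lut_bits(num_inputs, num_outputs):
    # A LUT stores one output word for every input combination.
    return (2 ** num_inputs) * num_outputs


def cam_bits(num_cubes, num_inputs, num_outputs, bits_per_cell=2):
    # Assumption: each ternary input cell costs 2 bits, plus one output word per cube.
    return num_cubes * (bits_per_cell * num_inputs + num_outputs)


def choose_representation(partitions):
    """partitions: list of (name, num_inputs, num_outputs, num_cubes)."""
    plan = {}
    for name, n_in, n_out, n_cubes in partitions:
        lut = lut_bits(n_in, n_out)
        cam = cam_bits(n_cubes, n_in, n_out)
        plan[name] = ("CAM", cam) if cam < lut else ("LUT", lut)
    return plan


plan = choose_representation([
    ("sparse", 12, 4, 20),   # many inputs, few product terms
    ("dense", 4, 4, 14),     # few inputs, nearly one cube per minterm
])
print(plan)   # {'sparse': ('CAM', 560), 'dense': ('LUT', 64)}
```

In this toy model, functions with few product terms favor CAM, while dense functions such as XOR-like logic favor LUT, which is the intuition behind storing some partitions in each.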

MCB – Delay and power components

SIMULATION RESULTS

Questions?

Thank you!
