
Mattausch, CMOS Design, H19/7/6

Memory Circuits (Part 2)

• Memory Access Bandwidth
  – Access-Bandwidth Definition
  – Possibilities for Increased Access Bandwidth
  – Necessity for Increased Access Bandwidth
  – Multi-Ported Memories
• Memory with Best-Match Content-Based Access
  – Widely Used Best-Match Distance-Measures
  – Conventional Best-Match Content-Addressable-Memory (CAM) Architectures
  – New Architectures: Time-Domain and Mixed Analog-Digital

CMOS Logic Circuit Design
http://www.rcns.hiroshima-u.ac.jp

Link: CMOS Logic Circuit Design (CMOS論理回路設計), listed under "Center Faculty Lecture Notes" (センター教官講義ノート)


Access-Bandwidth Definition for Memories

Definition of Access Bandwidth: Maximum number of bits per second which can be written into the memory or read from the memory.

Units of Access Bandwidth: bit/sec

Random Access Bandwidth: Number of independent words (of W bits each) which can be accessed (read or written) per second. Unit is word/sec.


Factors Determining the Access Bandwidth

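$$\text{Access Bandwidth} = (\text{Time-per-Access})^{-1} \cdot (\text{Bit-per-Access or Wordlength}) \cdot (\text{Number-of-Access-Ports}) = T_A^{-1} \cdot W \cdot N_P$$

[Figure: memory block with $N_P$ access ports, each port having wordlength $W$ and access time $T_A$.]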

Memory-access bandwidth is determined by three factors: access time (TA), wordlength (W) and port number (NP).
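The formula above can be evaluated directly. The following is a minimal sketch in Python; the function name and the example values (2 ns access time, 64-bit words, 4 ports) are illustrative assumptions and do not come from the slides.

```python
def access_bandwidth(access_time_s, wordlength_bits, num_ports):
    """Access bandwidth in bit/s, computed as (1 / T_A) * W * N_P."""
    if access_time_s <= 0:
        raise ValueError("access time must be positive")
    return wordlength_bits * num_ports / access_time_s


# Hypothetical example values, chosen only for illustration:
# T_A = 2 ns, W = 64 bit, N_P = 4 ports.
bandwidth = access_bandwidth(2e-9, 64, 4)
print(f"{bandwidth:.3e} bit/s")
```

Doubling any one of the three factors (halving T_A, doubling W or doubling N_P) doubles the result, which is why the port number is an independent lever next to clock speed and wordlength.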


Memory-Port Number and Achievable Bandwidth

A random access bandwidth beyond 100 Gbit/sec relies strongly on the availability of memories with multiple ports.

[Figure: bandwidth (bit/s, logarithmic axis from 10^9 to 10^12) versus port number N (10 to 60) for two cases: 500 MHz clock / 64-bit wordlength and 50 MHz clock / 32-bit wordlength. The practical limit for 1-port memories is marked.]

Insights:

a) Exploitation of pipelining, interleaving and larger wordlength cannot improve the bandwidth of 1-port memories above 50-100 Gb/sec.

b) Multiport memories are able to extend the limits of system bandwidth by orders of magnitude.

c) Tb/s bandwidth can be realized with 32-port memories.

d) High bandwidth is possible even at low clock frequencies.
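Insights (c) and (d) can be checked by inverting the bandwidth formula. The sketch below assumes one access per port per clock cycle (so T_A equals the clock period), which is an assumption and is not stated on the slide; the function name is hypothetical.

```python
import math


def ports_needed(target_bits_per_s, clock_hz, wordlength_bits):
    """Smallest port count N_P with clock * W * N_P >= target bandwidth.

    Assumes one access per port per clock cycle (T_A = 1 / clock).
    """
    per_port = clock_hz * wordlength_bits  # bit/s delivered by one port
    return math.ceil(target_bits_per_s / per_port)


# The two operating points named in the plot legend:
print(ports_needed(10**12, 500 * 10**6, 64))  # ports for 1 Tb/s at 500 MHz / 64 bit
print(ports_needed(10**11, 50 * 10**6, 32))   # ports for 100 Gb/s at 50 MHz / 32 bit
```

Under this assumption, 32 ports suffice for 1 Tb/s at 500 MHz and 64-bit wordlength, and roughly 60 ports bring a 50 MHz, 32-bit memory close to 100 Gb/s.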


Memory-Access Bandwidth
- Access-Bandwidth Definition
- Possibilities for Increased Access Bandwidth
- Necessity for Increased Access Bandwidth
  • In Computers
  • In Networks
- Multi-Ported Memory Architectures


Typical Single-Processor Architecture

Today an important limit for microprocessor performance is the access bandwidth of the various memory components.

[Figure: block diagram of a single-processor system. The CPU consists of an Instruction Unit (fetch unit, decode logic, program counter), an Execution Unit (multiple ALUs, register file, multiple-issue logic) and a Storage Unit (cache, TLB, virtual address translator, write-through buffers, etc.). Data-exchange paths link these units and connect the CPU to the Main Memory.]

TLB = Translation Lookaside Buffer (CAM for recent Virtual Address Translations)

ALU = Arithmetic Logic Unit (Actually performs the processor operations)


ALU and Register File

Advanced microprocessors need register files (fast SRAMs) with high random-access bandwidth, i.e. large port numbers.

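[Figure: register-file port requirements of three processor architectures.]

• Conventional microprocessor architecture: ALU with a register (REG) and a 1-port register file
• RISC microprocessor architecture: ALU with a 3-port register file
• n-issue superscalar microprocessor architecture: ALU 1 to ALU n with a 3n-port register file (Pentium 4 has n=4)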


Multiprocessor Architectures

Advanced shared-memory multiprocessors need main memories with high random-access bandwidth.

[Figure: two multiprocessor organizations. Shared-memory architecture: several processors access several 1-port memories through a switching network, so that the memories together must behave as a multiport main memory. Message-passing architecture: each processor is attached to its own 1-port memory.]


Structure of the Worldwide Internet

An internet exchange has to handle many random-data streams simultaneously.

[Figure: structure of the worldwide internet as a set of interconnected network exchanges.]


Operation Principle of an ATM Internet Exchange

The random-access bandwidth of the storage part limits the performance of an internet exchange.

Structure of ATM (Asynchronous Transfer Mode) Data Packages:
Destination Code (40 bit) | Data to be Transmitted (384 bit)

ATM Network-Exchange Operation:

[Figure: Data-Lines 1 to M (in) feed the Storage Part, where each arriving package is written to the next free storage location (ATM-Package 1, ATM-Package 2, ATM-Package 3, ..., ATM-Package N, free, free). The Computing Part calculates the output data line for the next ATM package from its destination code. The package is then sent to the data line of its destination, one of Data-Lines 1 to M (out).]
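The exchange operation described above (write to the next free location, compute the output line from the destination code, send) can be sketched behaviorally. This is a simplified model under stated assumptions: the class name, the FIFO service order and the `route` function are hypothetical, and real exchanges derive the output line from routing tables rather than from a simple modulo.

```python
from collections import deque

DEST_BITS = 40   # destination-code length given on the slide
DATA_BITS = 384  # payload length given on the slide


class AtmExchange:
    """Behavioral sketch of the storage part and computing part."""

    def __init__(self, num_lines, capacity, route):
        self.outputs = [[] for _ in range(num_lines)]  # Data-Lines (out)
        self.storage = [None] * capacity               # storage part
        self.free = deque(range(capacity))             # free locations
        self.pending = deque()                         # occupied, arrival order
        self.route = route  # hypothetical: destination code -> output line

    def write(self, dest_code, data):
        """Write an arriving package to the next free storage location."""
        if not self.free:
            raise OverflowError("no free storage location")
        location = self.free.popleft()
        self.storage[location] = (dest_code, data)
        self.pending.append(location)
        return location

    def forward_next(self):
        """Compute the output line of the oldest package and send it."""
        location = self.pending.popleft()
        dest_code, data = self.storage[location]
        line = self.route(dest_code)
        self.outputs[line].append(data)
        self.storage[location] = None
        self.free.append(location)  # the location becomes free again
        return line


exchange = AtmExchange(num_lines=4, capacity=8, route=lambda code: code % 4)
exchange.write(5, "payload-a")
exchange.write(6, "payload-b")
print(exchange.forward_next(), exchange.forward_next())
```

Every package causes one write and one read of the storage part, so with M input and M output lines active at once the storage needs enough random-access bandwidth to serve all of them, which is the limit the slide refers to.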


Memory-Access Bandwidth
- Access-Bandwidth Definition
- Possibilities for Increased Access Bandwidth
- Necessity for Increased Access Bandwidth
- Multi-Ported Memory Architectures
  • Multi-Port Memory Cell
  • Switching-Network Multi-Bank Architecture
  • Hierarchical Multi-Bank Architecture


Port Implementation in the Memory Cell

Transistor and interconnection-line numbers in an N-port SRAM cell increase linearly with N.

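[Figure: circuit schematics of a 1-port SRAM cell (wordline W1, complementary bitline pair B1) and an N-port SRAM cell (wordlines W1 to WN, complementary bitline pairs B1 to BN).]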


Design Examples of SRAM and ROM Cells

The size of an N-port memory cell increases faster than linearly with port number N.

[Figure: layouts of 1-, 2-, 4- and 8-port SRAM cells (minimum and maximum variants for 2, 4 and 8 ports; scale bar 20 µm) and of 1-, 2-, 4- and 8-port ROM cells (scale bar 10 µm). Wordlines W1 to W8, bitlines B1 to B8 and the supply lines VDD and VSS are labeled.]


Analysis of Multi-Port-Cell Area Increase

Area-increase of N-port-storage cells is unacceptable for larger storage capacities.

Quadratic area increase as a function of port number !!

[Figure: N-port-cell area normalized to 1-port-cell area (0 to 35) versus port number N (2 to 10), with curves for SRAM (min), SRAM (max) and ROM cells.]

Estimation for 32 ports:
• ~ factor 100 increase for SRAM
• ~ factor 400 increase for ROM
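One way to see where a quadratic trend could come from is sketched below. This is an illustrative argument and not the author's derivation: it assumes a wire-limited cell in which every added port contributes one wordline pitch to the cell height and one bitline (or bitline-pair) pitch to the cell width. The symbols $w_0$, $h_0$ (1-port cell width and height) and $\Delta w$, $\Delta h$ (added width and height per port) are hypothetical.

```latex
A(N) \approx \bigl(w_0 + (N-1)\,\Delta w\bigr)\,\bigl(h_0 + (N-1)\,\Delta h\bigr)
     = w_0 h_0 + (N-1)\,(w_0\,\Delta h + h_0\,\Delta w) + (N-1)^2\,\Delta w\,\Delta h
```

Both dimensions grow linearly with N, so their product contains a term proportional to $(N-1)^2$, which dominates for large port numbers.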


Switching-Network Multi-Bank Architecture

The switching-network architecture reduces the area problem but complexity and the conflict problem increase.

Advantage:
• Sub-linear area increase

[Figure: Ports 1 to N connect through a switching network (e.g. bus system, crossbar switch, multistage interconnection network) to M banks of 1-port memory.]

Disadvantages:
• Access-conflict probability becomes higher
• Complexity of the switching network increases fast with the bank number M
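The access-conflict disadvantage can be made concrete with a simple probability estimate. The sketch below assumes that all N ports are active in the same cycle and that each picks one of the M banks independently and uniformly at random; this traffic model is an assumption, not something the slide specifies, and real address streams are usually correlated.

```python
from math import perm


def conflict_probability(num_ports, num_banks):
    """Probability that at least two of N ports address the same bank.

    Assumes independent, uniformly distributed bank addresses.
    """
    if num_ports > num_banks:
        return 1.0  # more ports than banks: a conflict is unavoidable
    # perm(M, N) counts the conflict-free assignments of N ports to M banks.
    conflict_free = perm(num_banks, num_ports) / num_banks**num_ports
    return 1.0 - conflict_free


for banks in (8, 64, 512):
    print(banks, round(conflict_probability(4, banks), 4))
```

Under this model the conflict probability falls as the bank number M grows, while the slide notes that the switching network grows quickly with M, so the two disadvantages pull in opposite directions.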


Hierarchical Multi-Bank Architecture

The hierarchical multi-bank architecture has a regular/modular structure and solves the complexity problem.

[Figure: two-level hierarchical memory. Hierarchy Level 1: 1-port blocks, each a cell array with a 1-port X-decoder and a 1-port Y-decoder. Hierarchy Level 2: an N-port row selector, an N-port column selector and a conflict resolver address an array of these 1-port blocks. Each block has a 1-to-N-port transition circuit that connects the level-2 addresses and data (N ports) to the level-1 addresses and data (1 port).]


Multi-Port-Cell and Multi-Bank Architecture Area

The hierarchical multi-bank architecture enables enormous area-reductions of the multi-port memory.

[Figure: area-reduction factor for multiport SRAMs (logarithmic axis, 0.01 to 1, relative to the multiport-cell architecture) versus storage capacity K on the first hierarchy level (10^2 to 10^5). Curves: 4-port design, 4-port estimate, 8-port estimate, 16-port estimate, 32-port estimate; an 8-port design and a 16-port design are marked, and a reduction to 1/20 is indicated.]


Recent Article about Memory Research at RCNS

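From Nikkei Microdevices, published in April 2004.

STARC (Semiconductor Technology Academic Research Center) is an organization for industry-academia cooperation run by Japanese semiconductor companies (NEC, Toshiba, Hitachi, Sony and 7 others).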


Memory with Best-Match Content-Based Access

- Widely Used Best-Match Distance Measures

- Conventional Best-Match Content-Addressable-Memory (CAM) Architectures

- New Architectures: Time Domain and Mixed Analog-Digital


Widely Used Best-Match Distance Measures

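The Hamming distance is implemented with simple EXOR gates, while the Manhattan distance needs subtractors.

General distance between the input IN and reference word i, summed over the W components:

$$D_i = \sum_{j=1}^{W} \left| IN_j - REF_{ij} \right|$$

$D_i$ is called the Hamming distance if $IN_j$ and $REF_{ij}$ are 1-bit binaries:

$$D_{H,i} = \sum_{j=1}^{W} IN_j \oplus REF_{ij}$$

$D_i$ is called the Manhattan distance if $IN_j$ and $REF_{ij}$ are n-bit binaries with n>1:

$$D_{M,i} = \sum_{j=1}^{W} \left\{ (IN_j - REF_{ij}) \cdot \operatorname{sign}(IN_j - REF_{ij}) \right\}$$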
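Both distance measures can be written in a few lines of software. The following is a minimal sketch with hypothetical function names, where words are represented as Python lists of integers; it mirrors the definitions above and makes no claim about the hardware implementation.

```python
def hamming_distance(inp, ref):
    """Number of differing bits; inp and ref are equal-length lists of 0/1."""
    if len(inp) != len(ref):
        raise ValueError("words must have the same length W")
    return sum(a ^ b for a, b in zip(inp, ref))  # XOR is 1 where bits differ


def manhattan_distance(inp, ref):
    """Sum of absolute component differences for multi-bit components."""
    if len(inp) != len(ref):
        raise ValueError("words must have the same length W")
    return sum(abs(a - b) for a, b in zip(inp, ref))


print(hamming_distance([1, 0, 1, 1], [1, 1, 0, 1]))
print(manhattan_distance([3, 7, 0], [5, 7, 4]))
```

The XOR in the first function corresponds to the EXOR gates named on the slide, and the subtraction followed by `abs` in the second corresponds to the subtractors.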


Word-Serial Best-Match CAM

The word-serial CAM determines the best-match pattern (winner) in R clock cycles, which is too slow if R is large.

[Figure: sequential best-match calculation (winner-take-all circuit). A reference pattern memory holds Reference-Patterns 1 to R. A distance calculator compares the input pattern with the current reference pattern and produces the current distance Dc. A distance comparator tests Dc < Dwin; if yes, it enables (En) a transfer circuit that stores the current pattern as the current winner pattern and Dc as the current winner distance Dwin.]
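The word-serial search can be sketched in software as a loop that handles one reference pattern per iteration, with each iteration standing in for one clock cycle. The function names and the use of the Hamming distance as the default measure are assumptions made for this illustration.

```python
def hamming(a, b):
    """Hamming distance between two equal-length bit lists."""
    return sum(x ^ y for x, y in zip(a, b))


def word_serial_best_match(inp, references, distance=hamming):
    """Return (winner index, winner distance, clock cycles used)."""
    winner_index = None
    winner_distance = float("inf")  # Dwin starts above any real distance
    cycles = 0
    for index, ref in enumerate(references):
        cycles += 1                      # one reference pattern per cycle
        current = distance(inp, ref)     # distance calculator: Dc
        if current < winner_distance:    # distance comparator: Dc < Dwin
            winner_index = index         # transfer circuit updates winner
            winner_distance = current
    return winner_index, winner_distance, cycles


refs = [[0, 0, 0, 0], [1, 1, 1, 1], [1, 0, 1, 0], [1, 0, 1, 1]]
print(word_serial_best_match([1, 0, 1, 1], refs))
```

The cycle count always equals the number of reference patterns R, which is the scaling problem the slide points out.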


Partially Parallel (p-fold parallel) Best-Match CAM

The partially word-parallel CAM reduces the search time for the best-match pattern (winner) to R/p clock cycles, but soon becomes very complex.

[Figure: the input pattern is applied to p parallel units. Each unit consists of a reference pattern memory holding R/p patterns and a sequential best-match circuit that produces a partial winner. A global sequential best-match circuit selects the global winner from the partial winners.]
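The p-fold parallel scheme can be sketched by splitting the reference patterns into p banks, finding a partial winner in each bank, and then choosing the global winner among the partial winners. The interleaved split, the Hamming measure and the function names are assumptions for this illustration; the returned cycle count covers only the local searches and not the global stage.

```python
def hamming(a, b):
    """Hamming distance between two equal-length bit lists."""
    return sum(x ^ y for x, y in zip(a, b))


def partially_parallel_best_match(inp, references, p):
    """Return (winner index, winner distance, local search cycles)."""
    if p < 1:
        raise ValueError("p must be at least 1")
    if not references:
        raise ValueError("at least one reference pattern is required")
    indexed = list(enumerate(references))
    # Assumed interleaved split: bank k holds patterns k, k+p, k+2p, ...
    banks = [indexed[k::p] for k in range(p)]
    # Each non-empty bank finds its partial winner (ties: lowest index).
    partial_winners = [
        min((hamming(inp, ref), idx) for idx, ref in bank)
        for bank in banks
        if bank
    ]
    # The global circuit picks the best of the partial winners.
    distance, index = min(partial_winners)
    # The banks run side by side, so the largest bank sets the time.
    local_cycles = max(len(bank) for bank in banks)
    return index, distance, local_cycles


refs = [[0, 0, 0, 0], [1, 1, 1, 1], [1, 0, 1, 0], [1, 0, 1, 1]]
print(partially_parallel_best_match([1, 0, 1, 1], refs, p=2))
```

With p banks the local search takes about R/p iterations, at the price of p distance calculators and comparators plus the global selection stage.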


Fully-Parallel Best-Match CAM

Parallel digital distance calculation requires too much hardware, so other solutions are needed.

Input Data: IN = {IN1, IN2, …, INW}

Reference Data and parallel distance calculations:
• Reference Data 1: REF1 = {REF11, REF12, …, REF1W}, Distance Calculation D1(IN, REF1)
• Reference Data 2: REF2 = {REF21, REF22, …, REF2W}, Distance Calculation D2(IN, REF2)
• Reference Data R: REFR = {REFR1, REFR2, …, REFRW}, Distance Calculation DR(IN, REFR)

Best-Match Calculation: MIN(D1, …, DR) gives the winner of content matching.




Time-Domain Conversion for Best-Match Search

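The search signal stops for one clock cycle at non-matching reference bits. The distance is the latency in clock cycles.

[Figure: time-domain word comparator. The search signal (In) passes through a chain of W stages, one per reference bit (Reference Bit 1 to Reference Bit W). Each stage compares its reference bit with the input bit (IN1 to INW, with complements) using an EXOR and, on a mismatch, holds the signal in an edge-triggered D flip-flop until the next clock edge. The distance is the number of clock cycles until Out = 1.]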
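The time-domain principle can be illustrated with a small behavioral simulation. This is a sketch under simplifying assumptions: matching stages pass the search signal within the same clock cycle, each mismatching stage delays it by exactly one clock edge, and all gate delays are ignored. The function name is hypothetical.

```python
def time_domain_distance(inp_bits, ref_bits):
    """Clock cycles until the search signal reaches Out (equals Hamming distance)."""
    if len(inp_bits) != len(ref_bits):
        raise ValueError("words must have the same length W")
    mismatch = [a ^ b for a, b in zip(inp_bits, ref_bits)]  # EXOR per stage
    width = len(mismatch)

    def propagate(stage):
        """Advance through matching stages without waiting for the clock."""
        while stage < width and not mismatch[stage]:
            stage += 1
        return stage

    stage = propagate(0)  # the signal runs until the first mismatching stage
    cycles = 0
    while stage < width:
        cycles += 1                   # clock edge: the flip-flop releases the signal
        stage = propagate(stage + 1)  # continue to the next mismatching stage
    return cycles


print(time_domain_distance([1, 0, 1, 1], [1, 1, 0, 1]))
```

Because every word has its own chain, all R words can be timed concurrently, and the word whose output rises first is the best match.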


Digital-Analog Conversion for Best-Match Search

Word comparators generate analog signals and differences between best-match word and other words are amplified.

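[Figure: mixed analog-digital best-match CAM. A memory field of R rows and W columns contains storage cells (SC11 to SCRW, bit store) with bit comparators (BC11 to BCRW, bit comp.), accessed through R-row decode and W-column decode with read/write. For the search word (W bit), the word comparators WC1 to WCR generate the comparison signals C1 to CR. The winner line-up amplifier (WLA), controlled by the feedback signal F, produces LA1 to LAR, and the winner-take-all circuit (WTA), with enable En, outputs the match signals M1 to MR. Both WLA and WTA are labeled O(R).]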


Line-Up-Regulation Principle

The best-match (winner) output is aligned to the maximum amplification region of the amplifier for all search cases.

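[Figure: amplifier characteristic, output voltage (LAi) versus input voltage (Ci). The positions of the winner, the nearest loser and the other losers are shown for increasing winner-input distance; the winner is kept in the region of maximum gain, so the distance between winner and nearest loser is amplified.]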


Best-Match CAM CMOS-Chip Design (RCNS: Research Center for Nanodevices and Systems)

Research result of Yuji Yano (矢野祐二, M2).

(An award from ten major semiconductor manufacturers and others, covering hardware and software design assets for realizing system LSIs.)

• Minimum Distance Search Time: < 240 ns
• Power Dissipation: < 260 mW at 10 MHz (34.7 mW/mm2)
• Processing Performance: 170 GOPS (Giga Operations Per Second) (20 GOPS/mm2)
• Application: Real-Time Motion-Picture Compression