RAPID Memory Compiler Evaluation

by David Artz

Oracle Labs, November 2011

Overview of <Vendor A> SRAM/RF Compilers

• A competitive assessment against TSMC memory compilers is not available at this time, as we are still waiting on legal access to that technology.
• Features:
  • Speed and/or high density
  • Aspect ratio control for floor planning
  • Memory operation and retention at low frequency
  • Low active power and leakage-only standby power
  • Views/models
  • Extra Margin Adjustment (EMA)
  • Soft Error Repair (SER)
  • Redundancy
  • Over-the-cell power routing
  • Maximum static power dissipation corner
  • Power gating
  • Pipeline register
  • Advanced Test Feature (ATF)

Speed and/or High Density

Single Port SRAM, High Speed, 0.374 um^2 Bit Cell, 64 Rows/Bank

Parameter                    Mux 8    Mux 16    Mux 32
Words                 min      256       512      1024
                      max     4096      8192     16384
                      step      32        64       128
Number of Bits/Word   min        4         4         4
                      max      144        72        36
                      step       1         1         1

Single Port SRAM, High Density, 0.229 um^2 Bit Cell, 256 Rows/Bank

Parameter                    Mux 8    Mux 16    Mux 32
Words                 min      256       512      1024
                      max     4096      8192     16384
                      step      32        64       128
Number of Bits/Word   min        4         4         4
                      max      144        72        36
                      step       1         1         1

Dual Port SRAM, High Density, 0.589 um^2 Bit Cell, 256 Rows/Bank

Parameter                    Mux 4    Mux 8    Mux 16
Words                 min       64      128       256
                      max     2048     4096      8192
                      step      32       64       128
Number of Bits/Word   min        4        4         4
                      max      144       72        36
                      step       1        1         1

40 nm SRAM Memory Compilers

Memory Solution Architecture   Maximum Size (Kbits)   Mux Options
Single Port Speed                               576   8, 16, 32
Single Port Density                            1152   8, 16, 32
Dual Port Density                               320   4, 8, 16

40 nm Register File Memory Compilers

Memory Solution Architecture   Maximum Size (Kbits)   Mux Options
Single Port Speed                                32   2, 4, 8
Single Port Density                             144   2, 4, 8
Two Port Density                                 72   1, 2, 4
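The relationship between the parameter ranges and the maximum sizes can be checked with a little arithmetic. The sketch below transcribes the high-speed single-port SRAM ranges and computes capacity as words times bits per word. The function names are hypothetical, and the rule that legal values run from the minimum in whole steps is an assumption, not something the slides state.

```python
# Ranges transcribed from the high-speed single-port SRAM parameter table.
# Keys are mux options; each entry holds (min, max, step).
HIGH_SPEED_SP = {
    8:  {"words": (256, 4096, 32),    "bits": (4, 144, 1)},
    16: {"words": (512, 8192, 64),    "bits": (4, 72, 1)},
    32: {"words": (1024, 16384, 128), "bits": (4, 36, 1)},
}


def in_range(value, lo, hi, step):
    # Assumption: legal values are lo, lo + step, lo + 2*step, ... up to hi.
    return lo <= value <= hi and (value - lo) % step == 0


def is_valid(mux, words, bits, table=HIGH_SPEED_SP):
    """Return True if the (mux, words, bits) combination fits the table."""
    if mux not in table:
        return False
    return (in_range(words, *table[mux]["words"])
            and in_range(bits, *table[mux]["bits"]))


def capacity_kbits(words, bits):
    """Instance capacity in Kbits (1 Kbit = 1024 bits)."""
    return words * bits / 1024


def max_capacity_kbits(table=HIGH_SPEED_SP):
    """Largest capacity per mux option, using the max words and max bits."""
    return {mux: capacity_kbits(p["words"][1], p["bits"][1])
            for mux, p in table.items()}


print(max_capacity_kbits())  # {8: 576.0, 16: 576.0, 32: 576.0}
```

Every mux option tops out at 576 Kbits, which agrees with the Single Port Speed entry in the 40 nm SRAM summary.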

Aspect Ratio Control for Floor Planning

[Figure: memory aspect ratios for mux options 8, 16, and 32]

Note: Memories must be rotated ±90° for the poly orientation rule.

Memory Operation and Retention at Low Frequency

Before entering retention mode, place the memory in standby mode by setting CEN=1.

The CLK pin must be held low before any high or low transition of RET1N, in accordance with the timing arcs specified in the Liberty model. Once this is done, set RET1N=0. Power is still supplied to the memory core and the periphery.

The word lines are clamped low. VDDPE can now be shut down, and VDDCE may be varied within the limits specified in the power gating compiler manual.
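The ordering constraints of the retention entry sequence can be sketched as a small precondition check. This is an illustration only: the function name and the dictionary-of-pins representation are hypothetical, and real timing must come from the Liberty model arcs.

```python
def enter_retention(pins):
    """Return a copy of `pins` with RET1N driven low.

    Raises ValueError if the memory is not in standby (CEN=1) or if CLK
    is not held low across the RET1N transition.
    """
    if pins.get("CEN") != 1:
        raise ValueError("memory must be in standby (CEN=1) before retention")
    if pins.get("CLK") != 0:
        raise ValueError("CLK must be held low across the RET1N transition")
    updated = dict(pins)
    updated["RET1N"] = 0  # retention entered; VDDPE may now be shut down
    return updated


state = enter_retention({"CEN": 1, "CLK": 0, "RET1N": 1})
print(state)  # {'CEN': 1, 'CLK': 0, 'RET1N': 0}
```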

Views/Models

Standard Views                       Output Files
Verilog model                        <instance_name>.v
SER Verilog                          <instance_name>_rtl.v (a)
VCLEF footprint                      <instance_name>.vclef
                                     <instance_name>_ant.lef
                                     <instance_name>_ant.clf
GDSII layout file                    <instance_name>.gds2
Synopsys (Liberty) for each corner   <instance_name>_<ccs/nldm/ecsm>_corner_syn.lib (b, c, d)
LVS netlist                          <instance_name>.cdl
Synopsys (Liberty) TetraMAX          <instance_name>.tv
VoltageStorm                         N/A
Hercules                             N/A (e)
CeltIC enablement                    N/A (f)
Datasheets: ASCII datatable          <instance_name>_corner.dat
Datasheets: Postscript               <instance_name>.ps

a) The word-write mask option should be set to off when SER is selected as either 1db1bc or 2bd1bc.

b) You can create timing models using any of the process-voltage-temperature (PVT) corners for which the memory compiler was characterized. The compiler may support more than four characterization corners; however, you can create timing models for only four corners at a time. The characterization corner name (slow, fast, fast@-40C, fast@125C, typical) is inserted into the output filename (for example, sram_sp_<corner>_syn.lib).

c) Synopsys models are generated with maximum alternating current (AC) values for each supported corner. Depending on chip design, overall chip-level worst-case power conditions can occur under the fast corner (PVT conditions) or under the “maximum static power” corner condition. The worst-case static power occurs under the maximum temperature, fast process, and maximum VDD. The static power corner models both AC and static power under this condition. You may need to perform chip-level power analysis under both the fast and static power corners to determine the maximum overall power dissipation, AC plus static, for your design.

d) Includes CCS timing, noise, and power data.

e) Hercules LVS/DRC is supported in the case of available decks.

f) CeltIC views or files are not provided by the compiler.

Extra Margin Adjustment (EMA)

• Extra margin adjustment pins provide the option of adding delays into internal timing pulses. There are three sets of EMA pins: EMA[2:0], EMAS, EMAW[1:0].

• The EMA[2:0] pins provide extra time for memory read and write operations by slowing down the memory access. Each instance has three input pins, named EMA[2], EMA[1], and EMA[0], and they are always visible. Access time and cycle time increase progressively as the pins are driven from 000 to 111: 000 is the fastest setting and 111 is the slowest. The minimum EMA setting for a given operating range is documented in the model .lib file.

• When enabled, the EMAS pin extends the pulse width of the sense-amp enable signal. The default setting is low but when driven high the pulse is extended. The setting on this pin does not affect the access time, but it will affect cycle time in the read cycle.

• When enabled, the EMAW[1:0] pins add delay for write cycles. They do not affect the access and cycle time during a read operation (GWEN=1). The write access and cycle time reflects the sum of the delays added by EMA[2:0] and EMAW[1:0].
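How the EMA and EMAW settings combine can be sketched as a lookup and a sum. The delay values below are hypothetical placeholders chosen only to make the example run; actual adders come from the characterized .lib model. The function name is also an assumption.

```python
# Hypothetical delay adders in picoseconds (NOT vendor data).
EMA_DELAY_PS = [0, 10, 20, 30, 40, 50, 60, 70]  # indexed by EMA[2:0]
EMAW_DELAY_PS = [0, 15, 30, 45]                 # indexed by EMAW[1:0]


def added_delay_ps(ema, emaw, is_write):
    """Extra delay for one access: EMA always applies, EMAW only on writes."""
    if not 0 <= ema <= 7:
        raise ValueError("EMA[2:0] must be in 0..7")
    if not 0 <= emaw <= 3:
        raise ValueError("EMAW[1:0] must be in 0..3")
    delay = EMA_DELAY_PS[ema]
    if is_write:
        # Writes see the sum of the EMA and EMAW adders.
        delay += EMAW_DELAY_PS[emaw]
    return delay


print(added_delay_ps(0b111, 0b11, is_write=False))  # 70
print(added_delay_ps(0b111, 0b11, is_write=True))   # 115
```

The EMAS pin is left out of this sketch because it affects only the read cycle time, not the access time.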

Over The Cell Power Routing

[Figure: over-the-cell m4 routing grid spanning the memory periphery and memory core, with chip-level m5 power routing for core VDD (VDDCE), periphery VDD (VDDPE), and VSS connected through via 4-5]

You must route chip-level ground (VSS) and VDD to the memory instance and drop vias down to the m4 straps.

To maintain power density for each strap, use multiple top-level grid connections with a maximum spacing of 15um and a minimum width that provides coverage for a via array count of three.

The top supply metal in SRAM compilers is m4. To meet instance IR drop requirements, m5 straps at least 0.21um wide for VDDPE, VSSE, and VDDCE must be located over the instance, perpendicular to the instance m4 supply strap direction, and within 10um of the instance edge.

In addition, a pattern of VDDPE, VSSE, and VDDCE m5 straps, each at least 0.21um wide, must be repeated across the instance at 15um intervals. Each intersection of instance supply m4 and overlapping, perpendicular supply strap m5 should be maximally contacted.
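The m5 strap placement rule (a strap set within 10um of each instance edge, repeated at 15um intervals) can be sketched as a position calculation along one instance dimension. Centering the pattern on the instance is an assumption made for this example, and the function name is hypothetical.

```python
import math


def m5_strap_positions(instance_length_um, edge_limit_um=10.0, pitch_um=15.0):
    """Positions (um from one edge) for repeated VDDPE/VSSE/VDDCE strap sets.

    The first and last sets land within edge_limit_um of the instance edges,
    and consecutive sets are exactly pitch_um apart.
    """
    if instance_length_um <= 0:
        raise ValueError("instance length must be positive")
    # Distance that must be bridged between the two edge strap sets.
    inner_span = max(instance_length_um - 2 * edge_limit_um, 0.0)
    intervals = math.ceil(inner_span / pitch_um)
    # Center the pattern so both edge offsets are equal (assumption).
    start = (instance_length_um - intervals * pitch_um) / 2
    return [start + i * pitch_um for i in range(intervals + 1)]


print(m5_strap_positions(50.0))  # [10.0, 25.0, 40.0]
```

For a 100um instance the same call yields seven strap sets starting 5um from the edge.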

Power Gating

Mode  VDDCE    VDDPE       PGEN  RET1N  RET2N  RETP  Operation     Outputs
B1    *VDDCE   <= *VDDCE   0     X      X      X     Normal        Q (normal outputs)
B2    *VDDCE   <= *VDDCE   1     0      X      X     Retention 1   Clamped to VDDPE
B3    *VDDCE   <= *VDDCE   1     1      0      0     Retention 2a  Clamped to VDDPE
B4    *VDDCE   <= *VDDCE   1     1      0      1     Retention 2b  Clamped to VDDPE
B5    *VDDCE   <= *VDDCE   1     1      1      X     Power Down    Clamped to VDDPE

Description

B1: This is normal operational mode.

B2: Here the row lines are clamped low and all outputs are clamped to VDDPE. The active power gates in the core are switched on. You directly control the core retention voltage through VDDCE. The periphery power gates are off. This shuts down the periphery power. You vary VDDCE, but VDDCE must be greater than or equal to VDDPE at all times.

B3: Here the row lines are clamped low and all outputs are clamped to VDDPE. The active power gates in the core are switched off and the HVT retention devices are on. The retention voltage of the core is controlled by the diode-connected HVT pMOS devices. The periphery power gates are off. This shuts down the periphery power. Both VDDCE and VDDPE should remain powered up.

B4: Here the row lines are clamped low and all outputs are clamped to VDDPE. The active power gates in the core are switched off and the SVT retention devices are on. The retention voltage of the core is controlled by the diode-connected SVT pMOS devices. The periphery power gates are off, which shuts down the periphery power. Both VDDCE and VDDPE should remain powered up.

B5: Here both the core array and the periphery are powered down. This shuts down the memory and all state information is lost. All outputs are clamped to VDDPE. Both VDDCE and VDDPE should remain powered up.
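The mode table reduces to a short priority decode of the four control pins. The sketch below is one way to express it; the function name and return strings are hypothetical, and don't-care (X) inputs are simply ignored.

```python
def decode_mode(pgen, ret1n, ret2n, retp):
    """Map the power gating control pins to modes B1..B5 per the mode table."""
    if pgen == 0:
        return "B1 Normal"            # RET1N, RET2N, RETP are don't-care
    if ret1n == 0:
        return "B2 Retention 1"       # RET2N, RETP are don't-care
    if ret2n == 0:
        # RETP selects between the HVT (2a) and SVT (2b) retention devices.
        return "B4 Retention 2b" if retp == 1 else "B3 Retention 2a"
    return "B5 Power Down"            # RETP is don't-care


print(decode_mode(1, 1, 0, 1))  # B4 Retention 2b
```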

[Figure: memory macro with a core domain (VDDCE, VSSE) and a periphery domain (VDDPE, VSSE)]

Net Name / Overview / Details

VDDCE: External core voltage domain
This external core voltage supply is the supply seen from off-chip. It connects to all core domain N-Wells. It also connects to all core domain pMOS power gate sources.

vddc, vddc<#:#>: Internal core voltage domain
This internal core voltage domain connects to all core domain pMOS sources unless the source is never connected to a virtual power rail.

VDDPE: External periphery voltage domain
This external periphery voltage supply is the supply seen from off-chip. It connects to all periphery N-Wells. It also connects to the periphery domain pMOS power gate sources for the rowline drivers.

vddp: Internal periphery voltage domain
This internal periphery voltage supply connects to the periphery pMOS sources unless the gate is never connected to a virtual power rail. This means only the row driver NOR gates will use vddp.

VSSE: External ground domain
This external ground supply is the supply seen from off-chip. It connects to all P-Wells. It also connects to all nMOS sources in the core domain, as well as nMOS sources in the periphery domain that are never connected to a virtual ground rail.

vss: Internal ground domain
This internal ground domain connects to all periphery domain nMOS sources unless the source is never connected to a virtual ground rail.

Power Gating (cont.)

[Figure: two log-scale charts of current (mA, 0.01 to 100.00) by mode, each with Periphery and Core series: "64Kbit SRAM w/Power Gating" and "64Kbit SRAM w/No Power Gating"]

Note: the additional power gating modes come at an extreme cost in performance for all power saving states (i.e., ~200 clock cycles to negotiate retention modes).


Current (mA)

                                 Built w/Power Gating Option
                                 Off                        On
Mode  Condition                  Core    Peri    Total      Core    Peri    Total
B1    Peak                       3.558   51.914  55.473     3.665   52.972  56.637
B1    CEN=1, 50% Activity        0.203   1.943   2.146      0.202   1.943   2.146
B1    CEN=1, 0% Activity         0.203   0.091   0.294      0.204   0.095   0.299
B2    Retention                  0.203   0.006   0.208      0.201   0.038   0.239
B3    Retention 2A               #N/A    #N/A    #N/A       0.063   0.038   0.101
B4    Retention 2B               #N/A    #N/A    #N/A       0.074   0.038   0.112
      Transition                 #N/A    #N/A    #N/A       0.201   0.038   0.239
B5    Power-Down                 #N/A    #N/A    #N/A       0.031   0.038   0.069
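To put the low-power modes in perspective, the total currents for the instance built with the power gating option can be compared against its idle B1 figure (CEN=1, 0% activity). The sketch below uses the totals from the table; the dictionary keys and function name are hypothetical.

```python
# Total current (mA) with the power gating option On, from the table.
TOTAL_MA = {
    "B1 idle": 0.299,        # CEN=1, 0% activity
    "B2 Retention": 0.239,
    "B3 Retention 2A": 0.101,
    "B4 Retention 2B": 0.112,
    "B5 Power-Down": 0.069,
}


def reduction_percent(mode, baseline="B1 idle"):
    """Percentage drop in total current relative to the baseline mode."""
    return 100.0 * (1.0 - TOTAL_MA[mode] / TOTAL_MA[baseline])


for mode in TOTAL_MA:
    if mode != "B1 idle":
        print(f"{mode}: {reduction_percent(mode):.1f}% below idle")
```

By this measure, power-down cuts total current by roughly 77% relative to idle, while B2 retention saves about 20%.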

Note: We can implement our own power down via header switches.