SCRATCHPAD MEMORIES: A DESIGN ALTERNATIVE FOR CACHE ON-CHIP MEMORY IN EMBEDDED SYSTEMS - Nalini Kumar Gaurav Chitroda Komal Kasat


SCRATCHPAD MEMORIES: A DESIGN ALTERNATIVE FOR CACHE ON-CHIP MEMORY IN EMBEDDED SYSTEMS

- Nalini Kumar

Gaurav Chitroda

Komal Kasat


Spring 2010, EEL 6935, Embedded Systems

OUTLINE

Introduction

Scratch pad memory

Cache memory

Proposed methodology

Results

Conclusions

04/09/2010


INTRODUCTION

Scratch pad memory:

A high-speed internal memory used for temporary storage of calculations, data and other work in progress.

It is the next closest memory to the ALU after the internal registers.

Scratch pad based systems have NUMA (Non-Uniform Memory Access) latencies and use explicit instructions to move data. DMA-based data transfer is often used.

On-chip caches using SRAM consume 25% to 45% of the total chip power.

Current embedded processors for multimedia applications have on-chip scratch pad memories.


INTRODUCTION

Scratchpad vs. Cache:

A scratchpad doesn’t contain a copy of data that is stored in the main memory.

Scratchpad memory is directly manipulated by applications.

In cache memory systems, mapping of program elements is done at runtime; in scratch pad memory systems, it is done either by the user or by the compiler using a suitable algorithm.

Prior studies on scratch pad memories do not address the impact on area.


CONTRIBUTIONS

The paper proposes scratchpad memory as an alternative to cache as the on-chip memory for computationally intensive applications.

The CACTI tool is used to compute area and energy for the AT91M40400 target architecture.

The results establish scratchpad memory as a low-power alternative in most situations, with an average energy reduction of 40%.


SCRATCH PAD MEMORY

A memory array together with the decoding and column circuitry logic.

Memory objects are mapped to the scratch pad in the last stage of the compiler.

It occupies one distinct part of the memory address space, so there is no need to check whether data or instructions are available in the scratch pad.

This removes the need for the comparator and the hit/miss acknowledging circuitry.

Figure: Scratch pad memory array (panels: memory array; memory cell, a 6-transistor static RAM cell)


SCRATCH PAD MEMORY

Area of the scratchpad, $A_s$:

$A_s = A_{sde} + A_{sda} + A_{sco} + A_{spr} + A_{sse} + A_{sou}$

Components: data decoder, data array area, column multiplexers, pre-charge circuit, data sense amplifiers, output driver circuitry.

Energy consumption is estimated from the energy consumption of the components:

$E_{scratchpad} = E_{decoder} + E_{memcol}$

The memory array is the major consumer of energy.

The CACTI tool first computes the capacitances for each unit, then estimates the energy.


ESTIMATING THE ENERGY CONSUMPTION

For the memory array:

$E_{memcol} = C_{memcol} \cdot V_{dd}^{2} \cdot P_{0 \to 1}$

$C_{memcol}$ is the capacitance of the memory array unit and is calculated as

$C_{memcol} = n_{cols} \cdot (C_{pre} + C_{readwrite})$

$P_{0 \to 1}$ is the probability of a bit toggle, taken as 0.5.

Only two word lines are switched, regardless of the change in the address bits.

Total energy spent in the scratch pad memory is

$E_{sptotal} = SP_{access} \cdot E_{scratchpad}$

The only case to consider is a read or write access.
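The energy formulas on this slide can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function names, the capacitance values and the supply voltage below are assumptions chosen for the example, not figures from the paper or from CACTI.

```python
def memcol_energy(ncols, c_pre, c_readwrite, vdd, p_toggle=0.5):
    """E_memcol = C_memcol * Vdd^2 * P(0->1), with
    C_memcol = ncols * (C_pre + C_readwrite)."""
    c_memcol = ncols * (c_pre + c_readwrite)
    return c_memcol * vdd ** 2 * p_toggle


def scratchpad_total_energy(sp_accesses, e_decoder, e_memcol):
    """E_sptotal = SP_access * E_scratchpad, with
    E_scratchpad = E_decoder + E_memcol."""
    e_scratchpad = e_decoder + e_memcol
    return sp_accesses * e_scratchpad


if __name__ == "__main__":
    # Hypothetical inputs (farads, volts, joules); not values from the paper.
    e_memcol = memcol_energy(ncols=128, c_pre=50e-15, c_readwrite=150e-15, vdd=3.3)
    e_total = scratchpad_total_energy(sp_accesses=1_000_000,
                                      e_decoder=1e-12, e_memcol=e_memcol)
    print(f"E_memcol = {e_memcol:.3e} J, E_sptotal = {e_total:.3e} J")
```

Because every scratch pad access is either a read or a write, the total is a single multiplication; no hit or miss cases have to be separated.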


CACHE MEMORY

The area model is based on the transistor count in the circuitry.

Area of the cache:

$A_c = A_{tag} + A_{data}$

where

$A_{tag} = A_{dt} + A_{ta} + A_{co} + A_{pr} + A_{se} + A_{com} + A_{mu}$

and

$A_{data} = A_{de} + A_{da} + A_{col} + A_{pre} + A_{sen} + A_{out}$

Figure: Cache memory organization (tag array and data array)


EXPERIMENTAL SETUP

A cache is compared with a scratchpad memory of the same size (for the same technology, the delay of the cache is higher than that of the scratchpad).

Critical data structures are identified and assigned to the scratch pad using a packing algorithm.

The total number of clock cycles determines the performance.

The larger the number of clock cycles, the lower the performance, because the on-chip memory configuration doesn’t change the clock period.


SCRATCH PAD MEMORY ACCESS

Performance is estimated from the trace file.

An appropriate latency is added to the overall program delay for each access: one cycle for a scratch pad read/write access, one cycle plus one wait state for a 16-bit main memory access, and one cycle plus three wait states for a 32-bit main memory access.

| Access | Number of cycles |
| --- | --- |
| Cache | Using cache calculations |
| Scratch pad | 1 cycle |
| Main memory, 16 bit | 1 cycle + 1 wait state |
| Main memory, 32 bit | 1 cycle + 3 wait states |
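As a rough sketch of how a trace could be turned into a cycle count, the snippet below adds the latency stated in the text (three wait states for a 32-bit main memory access) for each access. The trace format and the names `CYCLES_PER_ACCESS` and `memory_cycles` are hypothetical; the actual ARMulator trace format is not described on the slides.

```python
# Cycles per access, taken from the latencies described in the text.
CYCLES_PER_ACCESS = {
    "scratchpad": 1,   # 1 cycle
    "main_16": 1 + 1,  # 1 cycle + 1 wait state
    "main_32": 1 + 3,  # 1 cycle + 3 wait states
}


def memory_cycles(trace):
    """Sum the access latency over a list of access kinds (hypothetical format)."""
    return sum(CYCLES_PER_ACCESS[kind] for kind in trace)


if __name__ == "__main__":
    trace = ["scratchpad", "scratchpad", "main_16", "main_32"]
    print(memory_cycles(trace))  # 1 + 1 + 2 + 4 = 8
```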


CACHE MEMORY ACCESS

The authors assume a write-through cache.

Read hit: The tag array is accessed. No write to the cache and no access to main memory.

Read miss: One cache read operation, then L (line size) words written to the cache. One main memory read event of size L and no main memory write.

Write hit: Cache write followed by a main memory write.

Write miss: One cache tag read and a main memory write. No cache update.

| Access type | Ca_read | Ca_write | Mm_read | Mm_write |
| --- | --- | --- | --- | --- |
| Read hit | 1 | 0 | 0 | 0 |
| Read miss | 1 | L | L | 0 |
| Write hit | 0 | 1 | 0 | 1 |
| Write miss | 1 | 0 | 0 | 1 |
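The access table can be read as a per-event cost vector. A minimal sketch, assuming hit and miss counts are already known (the counts in the example are made up, and the helper name `total_events` is not from the paper):

```python
def total_events(counts, line_words):
    """Return (Ca_read, Ca_write, Mm_read, Mm_write) totals for a
    write-through cache, following the access table on this slide.

    counts: dict with keys read_hit, read_miss, write_hit, write_miss.
    line_words: L, the cache line size in words.
    """
    per_access = {
        "read_hit": (1, 0, 0, 0),
        "read_miss": (1, line_words, line_words, 0),
        "write_hit": (0, 1, 0, 1),
        "write_miss": (1, 0, 0, 1),
    }
    totals = [0, 0, 0, 0]
    for kind, n in counts.items():
        for i, events in enumerate(per_access[kind]):
            totals[i] += n * events
    return tuple(totals)


if __name__ == "__main__":
    # Hypothetical access counts for illustration only.
    counts = {"read_hit": 90, "read_miss": 10, "write_hit": 20, "write_miss": 5}
    print(total_events(counts, line_words=4))  # (105, 60, 40, 25)
```

Multiplying each total by the matching energy per access would then give a memory-system energy estimate for the cache configuration.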


FLOW DIAGRAM

Figure: Experimental flow (blocks: C benchmark; energy-aware compiler with compiler support and mapping algorithm; ARMulator trace analysis; trace analysis; cache/scratch pad size; CACTI; analytical model; cache number of cycles; scratchpad number of cycles; energy estimates; area estimates)


EXPERIMENTAL SETUP

Target architecture:

AT91M40400, based on the ARM7TDMI embedded processor.

High-performance RISC processor with very low power consumption.

On-chip scratch pad memory of 4 KB, a 32-bit data path and two instruction sets.

encc: an energy-aware compiler that uses a special packing algorithm, the knapsack algorithm, to assign code and data blocks to the scratch pad memory.

The binary output of the compiler is simulated on the ARMulator to produce a trace file.

ARMulator accepts the cache size as a parameter for the on-chip cache configuration and reports the performance as a number of cycles.

The area and performance estimates are made for 0.5 µm technology.
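To illustrate the kind of packing problem the compiler solves, here is a textbook 0/1 knapsack sketch in Python. It is not encc's implementation: the object names, sizes and benefit values are hypothetical, and the "benefit" stands in for whatever saving the compiler attributes to placing an object in the scratch pad.

```python
def select_for_scratchpad(objects, capacity):
    """0/1 knapsack by dynamic programming.

    objects: list of (name, size_bytes, benefit) tuples.
    capacity: scratch pad size in bytes.
    Returns (best total benefit, tuple of chosen names).
    """
    # best[c] = (total benefit, chosen names) using at most c bytes
    best = [(0, ())] * (capacity + 1)
    for name, size, benefit in objects:
        # Walk capacities downward so each object is placed at most once.
        for c in range(capacity, size - 1, -1):
            candidate = best[c - size][0] + benefit
            if candidate > best[c][0]:
                best[c] = (candidate, best[c - size][1] + (name,))
    return best[capacity]


if __name__ == "__main__":
    # Hypothetical memory objects: (name, size in bytes, benefit).
    objects = [
        ("array_a", 256, 900),
        ("loop_body", 128, 700),
        ("table_b", 512, 1000),
        ("helper_fn", 192, 400),
    ]
    print(select_for_scratchpad(objects, capacity=512))
    # (1600, ('array_a', 'loop_body'))
```

The table has one entry per byte of capacity, which is practical for scratch pads of a few kilobytes such as the 4 KB memory of the target architecture.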


RESULTS

Table: Energy per access of various devices

| Device | Energy per access |
| --- | --- |
| Cache, per access (2 kB) | 4.57 nJ |
| Scratch pad, per access (2 kB) | 1.53 nJ |
| Main memory read access, 2 bytes | 24.00 nJ |
| Main memory read access, 4 bytes | 49.30 nJ |
| Main memory write access, 4 bytes | 41.10 nJ |

Table: Area/Performance ratios for bubble-sort

| Size (bytes) | Area, cache | Area, scratchpad | CPU cycles, cache | CPU cycles, scratchpad | Area reduction | Time reduction | Area-time product |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 64 | 6744 | 4032 | 481.9 | 347.5 | 0.40 | 0.28 | 0.44 |
| 128 | 11238 | 7104 | 302.4 | 239.9 | 0.37 | 0.21 | 0.51 |
| 256 | 21586 | 14306 | 264.0 | 237.9 | 0.34 | 0.10 | 0.55 |
| 512 | 38630 | 26722 | 242.6 | 237.9 | 0.31 | 0.10 | 0.61 |
| 1024 | 74680 | 53444 | 241.7 | 192.0 | 0.28 | 0.21 | 0.55 |
| 2048 | 142224 | 102852 | 241.5 | 192.0 | 0.28 | 0.20 | 0.57 |
| Average | | | | | 0.33 | 0.18 | 0.54 |

The average area, time and AT product reductions are 34%, 18% and 46% (the area-time product column is a ratio, so an average of 0.54 corresponds to a 46% reduction).
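The reduction columns appear to be computed as one minus the scratchpad-to-cache ratio. A small sketch of that arithmetic, applied to the 64-byte and 2048-byte rows and to the per-access energies above; the helper name `reduction` is an assumption made for this example:

```python
def reduction(cache_value, scratchpad_value):
    """Fractional saving of the scratchpad relative to the cache."""
    return 1.0 - scratchpad_value / cache_value


if __name__ == "__main__":
    # 64-byte row of the bubble-sort table.
    print(round(reduction(6744, 4032), 2))      # area reduction, 0.4
    print(round(reduction(481.9, 347.5), 2))    # time reduction, 0.28
    # 2048-byte row.
    print(round(reduction(142224, 102852), 2))  # area reduction, 0.28
    print(round(reduction(241.5, 192.0), 2))    # time reduction, 0.2
    # Energy per access for the 2 kB devices (4.57 nJ vs 1.53 nJ).
    print(round(reduction(4.57, 1.53), 2))      # 0.67
```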


RESULTS

Figure: Energy consumed by the memory system

Figure: Comparison of cache and scratch pad memory area


CONCLUSION

The paper presents an approach for selecting on-chip memory configurations.

Results show that scratch pad based compile-time memory outperforms cache-based run-time memory on almost all counts.

A 40% average energy reduction is reported for the application considered.

The authors propose a study of DRAM-based memory comparisons, since memory bandwidth and on-chip memory capacity are limiting factors for many applications.

Also, the energy models for both cache and scratchpad need to be validated by real measurements.


QUESTIONS
