Improving the Efficiency of Memory Partitioning by Address Clustering. Alberto Macii, Enrico Macii, Massimo Poncino. Proceedings of the Design, Automation and Test in Europe Conference and Exhibition. Presenter: Hung Yu Chen


Improving the Efficiency of Memory Partitioning by Address Clustering

Alberto Macii Enrico Macii Massimo Poncino

Proceedings of the Design, Automation and Test in Europe Conference and Exhibition

Presenter : Hung Yu Chen


112/04/18 Hung-Yu Chen 2/21

Abstract

Memory partitioning is an effective approach to memory energy optimization in embedded systems. Spatial locality of the memory address profile is the key property that partitioning exploits to determine an efficient multi-bank memory architecture. This paper presents an approach, called address clustering, for increasing the locality of a given memory access profile, and thus improving the efficiency of partitioning. Results obtained on several embedded applications running on an ARM7 core show average energy reductions of 25% (maximum 57%) with respect to a partitioned memory architecture synthesized without resorting to address clustering.


Outline

What’s the problem?
Memory Energy
Memory Partitioning
Address Clustering
Experimental Result
Conclusions


What’s the problem?

Modern SoC platforms usually contain one or more processors.
The gap between processor speed and memory speed keeps increasing.
Various types of on-chip embedded memories provide shorter latencies and wider interfaces.
Problem:
  The ubiquity of embedded memories makes them the largest contributor to the overall energy budget of a chip.


Memory Energy

Model: $E_{mem} = \sum_{i=1}^{N} \mathrm{Cost}(i)$
  N: number of accesses during the computation.
  Cost(i): cost of the i-th access, determined by the memory organization and by the cost of the physical access given by the technology.

Memory energy optimization:
1. Reducing Cost(i): build a low-energy memory architecture.
2. Reducing N: modify the memory access pattern.
3. Both.
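
The energy model can be sketched in a few lines of Python. This is only an illustration: the names `memory_energy` and `flat_cost`, the toy trace, and the cost values are assumptions chosen for the example and are not taken from the paper.

```python
from typing import Callable, Sequence

def memory_energy(trace: Sequence[int], cost: Callable[[int], float]) -> float:
    """E_mem = sum of Cost(i) over all N accesses in the trace."""
    return sum(cost(addr) for addr in trace)

def flat_cost(addr: int) -> float:
    # Hypothetical monolithic memory: every access costs the same.
    return 2.0  # arbitrary energy units, not a measured value

trace = [0, 4, 4, 8, 4, 0]  # toy address trace, N = 6 accesses
baseline = memory_energy(trace, flat_cost)

# Option 1: reduce Cost(i) with a cheaper architecture, same trace.
cheaper = memory_energy(trace, lambda addr: 1.5)

# Option 2: reduce N by issuing fewer accesses, same per-access cost.
shorter = memory_energy(trace[:4], flat_cost)

print(baseline, cheaper, shorter)
```

Either lever lowers the total; memory partitioning works on the first one by making the frequently used part of the memory cheaper to access.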


Memory Partitioning

memory partitioning technique.


Memory Partitioning (cont.)

Figure 1-a:
  The whole address space of the application is mapped to a single SRAM memory array.
Figure 1-b:
  A dynamic access profile.
Figure 1-c:
  The partitioned memory.
Notice that we need to account for the power consumed in the entire partitioned memory system.
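
A rough sketch of why partitioning helps, under a made-up cost model: the per-access energy is assumed to grow with the square root of the bank size, and a fixed per-access overhead stands in for the extra selection logic. Both assumptions, as well as the function names and the toy profile, are illustrative and do not come from the paper.

```python
import math
from typing import Sequence

def access_cost(bank_size: int) -> float:
    # Assumed cost model: larger banks cost more energy per access.
    return math.sqrt(bank_size)

def partitioned_energy(profile: Sequence[int], cuts: Sequence[int],
                       overhead_per_access: float = 0.0) -> float:
    """Energy of a memory split into contiguous banks.

    profile[a] is the number of accesses to address a; cuts holds the
    start index of every bank after the first, in ascending order.
    """
    bounds = [0, *cuts, len(profile)]
    energy = 0.0
    for lo, hi in zip(bounds, bounds[1:]):
        accesses = sum(profile[lo:hi])
        energy += accesses * (access_cost(hi - lo) + overhead_per_access)
    return energy

# Toy profile: 16 hot addresses followed by 240 rarely used ones.
profile = [100] * 16 + [1] * 240
monolithic = partitioned_energy(profile, cuts=[])
two_banks = partitioned_energy(profile, cuts=[16], overhead_per_access=0.5)

print(monolithic, two_banks)
```

With this toy profile most accesses hit the small bank, so the partitioned total stays below the monolithic one even after the overhead is charged on every access.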


Address Clustering-Example

MPEG decoding application for an ARM7 core.
Instruction stream.


Address Clustering-Example (cont.)

Figure 2 shows:
  Total number of addresses: 31,233 (ranging from 0 to 124,892).
  The memory cut has 1,952 rows * 512 columns.
  Energy consumed: 170 mJ (44.4 million total reads).
Memory partitioning:
  Three memory blocks of sizes 736*256, 696*512 and 892*512.
  Energy consumed: 96 mJ (inclusive of the overhead).
43.5% energy reduction:
  The 696*512 block keeps the majority (82%) of the memory accesses (36 million out of 44.4).


Address Clustering-Example (cont.)

Figure 3: Clustered address profile of an MPEG decoder
  Two memory blocks of sizes 212*128 and 1900*512.
  Energy consumed: 42 mJ (an additional 56% of energy saved).
  99% of the memory accesses (43.99 million out of 44.4).
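
The percentages quoted in this example can be checked from the energy and access figures on the slides. The last line, the overall saving relative to the monolithic memory, is derived here and is not stated in the source.

```latex
\begin{align*}
\text{monolithic} \to \text{partitioned:} \quad & \frac{170 - 96}{170} \approx 0.435 \quad (43.5\%) \\
\text{partitioned} \to \text{clustered:} \quad & \frac{96 - 42}{96} \approx 0.563 \quad (\text{about } 56\%) \\
\text{accesses in the clustered profile:} \quad & \frac{43.99}{44.4} \approx 0.99 \quad (99\%) \\
\text{monolithic} \to \text{clustered (derived):} \quad & \frac{170 - 42}{170} \approx 0.753 \quad (\text{about } 75\%)
\end{align*}
```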


Address Clustering-Problem

Find a relocation of a proper subset of the address space that:
  Maximizes the locality of the dynamic trace.
  Minimizes the energy consumption of the memory architecture.
Cost Metrics
  Dynamic access profile: $C = \{c_0, c_1, \ldots, c_{N-1}\}$
  $D(C,W) = \max_i (S_i)$, $i = 0, 1, \ldots, N-W$
  $S_i = \sum_{j=0}^{W-1} c_{i+j}$, where $W$ is the size of a sliding window
  $d(C,W) = D(C,W) / Tot$, where $Tot = \sum_{i=0}^{N-1} c_i$
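
The cost metric can be computed with a single sliding-window pass over the access profile. The sketch below follows the definitions of D(C,W) and d(C,W) given on this slide; the function names and the two toy profiles are assumptions made for illustration.

```python
from typing import Sequence

def max_window_sum(profile: Sequence[int], w: int) -> int:
    """D(C, W): largest number of accesses in any W consecutive addresses."""
    if not 0 < w <= len(profile):
        raise ValueError("window size must be between 1 and len(profile)")
    window = sum(profile[:w])  # S_0
    best = window
    for i in range(1, len(profile) - w + 1):
        # Slide by one address: S_i = S_{i-1} - c_{i-1} + c_{i+W-1}
        window += profile[i + w - 1] - profile[i - 1]
        best = max(best, window)
    return best

def density(profile: Sequence[int], w: int) -> float:
    """d(C, W) = D(C, W) / Tot: fraction of accesses in the best window."""
    total = sum(profile)
    return max_window_sum(profile, w) / total if total else 0.0

scattered = [9, 0, 1, 8, 0, 1, 9, 0]  # hot addresses far apart
clustered = [9, 9, 8, 1, 1, 0, 0, 0]  # same counts, hot addresses adjacent

print(density(scattered, 3), density(clustered, 3))
```

Both profiles contain the same access counts, but the clustered one packs far more of them into a window of the same size, which is the property that partitioning exploits.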


Address Clustering-Problem (cont.)

Figure 4 shows the values of d(C,W) for W = 32, 64, 128, 256, 512, computed on the profile of Figure 2.

Annotation in the figure: 80%


Address Clustering-Exploration

High-level pseudo-code:
  Explore: find a good value of W.


Address Clustering-Clustering Algorithm

Cluster: returns a modified trace whose first M locations contain the M most visited addresses.
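
One way to read the Cluster step is as a set of pairwise swaps: each hot address that lies outside the first M locations trades places with a location inside them. The sketch below is an assumption about how this could be done, not the algorithm from the paper; the function name, the return shape, and the toy trace are hypothetical, and addresses are assumed to be non-negative integer indices.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

def cluster(trace: Sequence[int], m: int) -> Tuple[List[int], Dict[int, int]]:
    """Swap the m most visited addresses into locations 0..m-1.

    Returns the remapped trace and the swap table (at most 2*m entries).
    """
    counts = Counter(trace)
    hot = [addr for addr, _ in counts.most_common(m)]
    hot_set = set(hot)
    # Hot addresses already inside the first m locations stay in place.
    free_slots = [s for s in range(m) if s not in hot_set]
    movers = [a for a in hot if a >= m]
    remap: Dict[int, int] = {}
    for addr, slot in zip(movers, free_slots):
        remap[addr] = slot  # hot address moves into the low region
        remap[slot] = addr  # displaced location takes the vacated address
    return [remap.get(a, a) for a in trace], remap

new_trace, remap = cluster([100, 100, 100, 7, 200, 200, 1, 100, 200], m=2)
print(new_trace)
print(remap)
```

Because every relocation is a swap, the table touches at most 2M addresses and every other address passes through unchanged, which keeps the relocation a one-to-one mapping.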


Address Clustering-Encoder

Hardware encoder:
  Clustering swaps pairs of addresses, so at most 2M addresses are affected (the clustered addresses).
  f(X) is a function that indicates whether X belongs to the set of 2M addresses.
  The clustered address is X’ = R(X).
  The encoder is a 32-input combinational network.


Experimental Result

Benchmarks: some are taken from the Ptolemy distribution, others come from the MediaBench suite.
Platform: ARM software development kit.
Table 1:
  #Addr: total number of distinct addresses.
  Emono: the energy of the monolithic memory that contains all the data/instructions.
  Epartitioned: total memory energy of a partitioned memory architecture.
  M = 256, 512, 1024: memory partitioning combined with address clustering.


Experimental Result (cont.)


Experimental Result (cont.)

Original vs. Clustering (Energy)


Encoder Overhead Analysis

Encoders have been synthesized with Synopsys Design Compiler on a 0.25 µm technology by STMicroelectronics.

Power figures (Figure 8) were obtained with Synopsys Power Compiler.

The variation of the energy figures across the various applications is relatively small, for two reasons:

1. The complexity of the decoder is basically independent of the set of addresses that are clustered.

2. The switching activity of the address lines is very similar for all benchmarks.


Encoder Overhead Analysis (cont.)

A 16K memory dissipates about 375 mW at a frequency of 150 MHz.
Encoder power = 7.5 mW for M = 1024.
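
Assuming the 7.5 mW figure is the encoder power for M = 1024 and that it is meant to be compared against the 375 mW of the 16K memory, the relative overhead works out as follows. The ratio itself is derived here and is not stated on the slide.

```latex
\[
\frac{P_{\text{encoder}}}{P_{\text{memory}}} = \frac{7.5\ \text{mW}}{375\ \text{mW}} = 0.02 = 2\%
\]
```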


Conclusions

The energy reduction achievable by memory partitioning can be improved significantly by increasing the locality of the trace.
An architectural solution, called Address Clustering, is proposed.
Experimental results were obtained on a set of typical embedded applications running on an ARM-based system.
Address Clustering reduces the energy consumption of a partitioned memory architecture by 25% on average (maximum 57%) with respect to partitioning driven by the original trace.