

About Holst Centre

• Independent open-innovation R&D centre

• Develops generic technologies

• Partnership with industry and academia

• Shared roadmaps and programs

Visit us at www.holstcentre.com

Energy Efficiency using Loop Buffer based Instruction Memory Organizations

Abstract

Energy consumption in embedded systems is strongly influenced by the energy consumed by the instruction memory organization. Consequently, architectural enhancements to this component can substantially reduce the total energy consumption of the system. Loop buffering is an effective scheme to reduce energy consumption in the instruction memory organization. In this work, an energy design space exploration is performed over different architecture variants based on the loop buffer concept, and their energy impact on different application scenarios is analyzed.

Motivation

References / Acknowledgements

[1] F. Catthoor, P. Raghavan, A. Lambrechts, M. Jayapala, A. Kritikakou, and J. Absar, Ultra-low power domain-specific instruction-set processors. Springer Publishing Company, Incorporated, 2010.

[2] J. Villarreal, R. Lysecky, S. Cotterell, and F. Vahid, "A Study on the Loop Behavior of Embedded Programs", University of California, Riverside, Tech. Rep. UCR-CSE-01-03, December 2001.

This research is carried out with support from Holst Centre / imec Netherlands, and partly carried out at Holst Centre. Thanks to Filipa Duarte, Maryam Ashouei, Jos Huisken, José L. Ayala, David Atienza and Francky Catthoor for their support.

Conclusions

• Energy savings due to the introduction of the central loop buffer architecture are directly related to the loop body size and the number of iterations.

• Multiple loop buffer architectures achieve higher energy savings because the loop buffer sizes are better adapted to the sizes of the loops that form the application.

• Distributed loop buffer architectures with incompatible loop-nest organization yield no energy improvement over multiple loop buffer architectures with shared loop-nest organization.

• A trade-off exists between the complexity of the loop buffer architecture and the energy saving.

• Careful evaluation is needed when loops do not dominate the application execution time.
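The first conclusion can be made concrete with a back-of-the-envelope model. This is a simplified sketch under stated assumptions, not the model used in this work: a single loop of B instructions executed N times fits in the loop buffer, the first iteration is fetched from the instruction memory while the buffer is filled, and e_IM and e_LB are hypothetical per-access energies of the instruction memory and the loop buffer (reads and writes assumed equal). Leakage and controller overhead are ignored.

```latex
\begin{align}
E_{\text{base}} &= N\,B\,e_{IM} \\
E_{\text{LB}}   &= B\,(e_{IM} + e_{LB}) + (N-1)\,B\,e_{LB} = B\,e_{IM} + N\,B\,e_{LB} \\
S &= 1 - \frac{E_{\text{LB}}}{E_{\text{base}}} = 1 - \frac{1}{N} - \frac{e_{LB}}{e_{IM}}
\end{align}
```

Under these assumptions the relative saving S grows with the number of iterations N. The loop body size enters through e_LB: the buffer needs at least B entries, and a larger buffer generally costs more energy per access.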

Antonio Artes 1,2

1 Facultad de Informatica, Universidad Complutense de Madrid, Spain
Email: [email protected]

2 imec / Holst Centre, Eindhoven, the Netherlands
Email: [email protected]

Design space exploration

• Biomedical wireless sensor nodes require sustained operation for long periods of time with minimal recharging of the battery. Therefore, a reduction in energy consumption and an increase in the lifetime reliability are required.

• Memories are among the most power-consuming elements of the digital signal processing part of the sensor nodes.

Loop buffer concept

• 77% of the execution time of an application is spent in loops with 32 instructions or fewer.

• 84% of the execution time of an application is spent in loops with 32,000 iterations or fewer.
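Why such small, frequently repeated loops suit a loop buffer can be illustrated by counting where instruction fetches are served. The following is a minimal sketch, not the simulation flow used in this work. The function name `count_accesses`, the `AccessCounts` type, and the fill policy (first iteration fetched from instruction memory while the buffer is filled, oversized loops bypassing the buffer) are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class AccessCounts:
    imem: int         # fetches served by the instruction memory
    loop_buffer: int  # fetches served by the loop buffer
    fills: int        # instructions written into the loop buffer


def count_accesses(body_size: int, iterations: int, buffer_size: int) -> AccessCounts:
    """Count instruction fetches for one loop under a simple loop buffer policy.

    Assumed policy (not taken from the poster): a loop that fits in the buffer
    is fetched from instruction memory during its first iteration while the
    buffer is filled, and every later iteration is served by the buffer.
    A loop that does not fit bypasses the buffer entirely.
    """
    if body_size <= 0 or iterations <= 0 or buffer_size < 0:
        raise ValueError("body_size and iterations must be positive, buffer_size non-negative")
    total = body_size * iterations
    if body_size > buffer_size:
        return AccessCounts(imem=total, loop_buffer=0, fills=0)
    return AccessCounts(imem=body_size, loop_buffer=total - body_size, fills=body_size)


if __name__ == "__main__":
    # A 32-instruction loop run 1000 times with a 32-entry buffer.
    print(count_accesses(body_size=32, iterations=1000, buffer_size=32))
```

With these assumptions, almost all fetches of a small loop with many iterations move from the instruction memory to the much smaller buffer, while a loop larger than the buffer gains nothing.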

Case studies

[Figure: power breakdown of a DSP processor designed at Holst Centre to process ECG signals, 2009. Labels: PMEM, DMEM, CORE, DMEM+FIFOS. Dynamic: 9.2 mW. Leakage: 4.95 µW.]

The architectural models mimic loops found in real embedded applications, with different sizes and numbers of iterations.

Central loop buffer architecture for single processor organization

Multiple loop buffer architecture with shared loop-nest organization

Distributed loop buffer architecture with incompatible loop-nest organization
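The difference between a central loop buffer and multiple loop buffers can be sketched as a size-matching problem. The code below is an illustrative sketch only: the helper names `pick_buffer` and `buffer_energy`, the loop list, and the per-access energy values (arbitrary units, chosen so that the larger buffer costs more per access) are all hypothetical and are not taken from this work.

```python
def pick_buffer(body_size, buffer_sizes):
    """Return the smallest buffer that holds the loop body, or None if none fits."""
    fitting = [size for size in buffer_sizes if size >= body_size]
    return min(fitting) if fitting else None


def buffer_energy(loops, buffer_sizes, energy_per_access):
    """Total energy of loop-buffer fetches for loops given as (body_size, iterations).

    energy_per_access maps buffer size -> energy per access in arbitrary units.
    Loops that fit no buffer are skipped; they would run from instruction memory.
    Buffer fills are ignored to keep the comparison simple.
    """
    total = 0.0
    for body_size, iterations in loops:
        size = pick_buffer(body_size, buffer_sizes)
        if size is not None:
            total += body_size * iterations * energy_per_access[size]
    return total


# Hypothetical application: one small hot loop and one larger loop.
loops = [(8, 1000), (64, 100)]
# Hypothetical per-access energies (arbitrary units).
energy = {8: 1.0, 64: 4.0}

central = buffer_energy(loops, [64], energy)      # one buffer sized for the largest loop
multiple = buffer_energy(loops, [8, 64], energy)  # each loop uses the smallest buffer that fits

if __name__ == "__main__":
    print(central, multiple)
```

In this made-up scenario the small loop no longer pays the per-access cost of the large buffer, which is the adaptation effect that multiple loop buffer architectures exploit.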

[Figure panels: General-purpose design; ASIP design]

[Simulation methodology]

[Power consumption per access in 16-bit instruction word commercial 90nm SRAMs]

[PowerStone Benchmarks]

[Figure panels: General-purpose design; ASIP design]

Crypto (AES) algorithm
I. Tsekoura, Design exploration of Application Specific Instruction-Set Cryptographic Processors for resources constrained systems. Master's thesis, Dept. of Computer Eng. and Informatics, Univ. of Patras and IMEC, 2010.

Heart beat detection algorithm
Y. H. Yassin, Ultra Low Power Application Specific Instruction-Set Processor design for a cardiac beat detector algorithm. Master's thesis, Dept. of Electronics and Telecommunications, Norwegian University of Science and Technology and IMEC, 2009.