P. Marwedel: Embedded Software:How to make it efficient? Slide -1 -
Embedded Software: How to make it efficient?
Peter Marwedel University of Dortmund
Informatik 12 44221 Dortmund, Germany
P. Marwedel: Embedded Software:How to make it efficient? Slide -2 -
What is an embedded system?
These are not the embedded systems we will talk about!
P. Marwedel: Embedded Software:How to make it efficient? Slide -3 -
Embedded Systems
Main reason for buying is not information processing
Characteristics:
• not recognised as information processing
• frequently real-time behaviour required
• must be dependable & guarantee privacy
• many of these systems are mobile systems
• fundamental technology for pervasive computing/ambient intelligence, implemented in complex software
Embedded systems = information processing systems embedded into a larger product
P. Marwedel: Embedded Software:How to make it efficient? Slide -4 -
Views on embedded software
For many products in the area of consumer electronics the amount of code is doubling every two years [Fritz Vaandrager in: Rozenberg, Vaandrager (eds.): Lectures on Embedded Systems, LNCS, Vol. 1494, 1998]
„On Nanoscale Integration and Gigascale Complexity in the Post - .com world“ [de Man, Keynote, DATE 2002]
... it is now common knowledge that more than 70% of the development cost for complex systems such as automotive electronics and communication systems are due to software development [A. Sangiovanni-Vincentelli, 1999]
P. Marwedel: Embedded Software:How to make it efficient? Slide -5 -
The energy/flexibility conflict: intrinsic power efficiency
[Figure: operations per watt [MOPS/mW] (axis 0.01 to 10) versus technology (1.0µ, 0.5µ, 0.25µ, 0.13µ, 0.07µ) for hardwired muxed logic, reconfigurable computing, DSP-ASIPs and µPs (processors); annotations: "Ambient Intelligence", "poor software generation techniques"]
[H. de Man, Keynote, DATE'02; T. Claasen, ISSCC99]
Necessary to optimize software; otherwise the price for software flexibility cannot be paid!
P. Marwedel: Embedded Software:How to make it efficient? Slide -6 -
Importance of Power and Energy Consumption
„Power is considered as the most important constraint in embedded systems“ [in: L. Eggermont (ed): Embedded Systems Roadmap 2002, STW]
Current UMTS phones can hardly be operated for more than an hour, if data is being transmitted. [from a report of the Financial Times, Germany, on an analysis by Credit Suisse First Boston; http://www.ftd.de/tm/tk/9580232.html?nv=se]
P. Marwedel: Embedded Software:How to make it efficient? Slide -7 -
Key requirements for embedded software
Hardware/software efficiency: run-time efficiency, code-size efficiency, energy efficiency, power consumption, ...
Many standards are published as „reference implementations“ (they just provide the correct results and do not care about efficiency)
Proposal of the „software washing machine“ (Catthoor): „dirty“ unoptimized software in, „clean“ optimized software out
P. Marwedel: Embedded Software:How to make it efficient? Slide -8 -
Generating efficient software requires work at all levels
• Algorithmic level (using the most efficient algorithm + data structures)
• High-level source code transformations
• Compiler optimizations
• Code compression
• Operating system support (e.g. for minimizing power consumption)
P. Marwedel: Embedded Software:How to make it efficient? Slide -9 -
Algorithmic level
Choosing the best decoding/filtering etc. algorithm + data structures
Example: MPEG-2 data structures: the Inverse Discrete Cosine Transform (IDCT) is the most power/cycle hungry hot spot.
Transformations:
• Replacing „double“ by „float“ [still acceptable quality]: energy consumption reduced to 34%, cycles reduced to 35%
• Standard IDCT → „Fast IDCT“ („double“/„float“ → „integer“) [significant loss of precision]: energy consumption reduced to 4.86%, cycles reduced to 5.10%
[T. Huels, Inf 12, UniDo, 2002]
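The data-type transformations named on this slide can be illustrated with a minimal sketch. It is not the MPEG-2 IDCT from the talk: the weights in W and all function names are hypothetical, and the weights were picked so that all three versions give exactly the same result. With weights that are not exact binary fractions, the float and integer versions would round, which is the precision loss the slide refers to.

```c
#include <assert.h>
#include <stdint.h>

#define N_TAPS 4

/* Hypothetical weights, chosen only for illustration (not from the talk). */
static const double W[N_TAPS] = {0.5, 0.25, 0.125, 0.125};

/* Reference version: double-precision accumulation. */
double weighted_sum_double(const int16_t x[N_TAPS]) {
    double acc = 0.0;
    for (int i = 0; i < N_TAPS; i++)
        acc += W[i] * (double)x[i];
    return acc;
}

/* Same computation in single precision ("double" replaced by "float"). */
float weighted_sum_float(const int16_t x[N_TAPS]) {
    float acc = 0.0f;
    for (int i = 0; i < N_TAPS; i++)
        acc += (float)W[i] * (float)x[i];
    return acc;
}

/* Integer version: weights converted to Q15 fixed point (scaled by 2^15). */
int32_t weighted_sum_q15(const int16_t x[N_TAPS]) {
    int32_t acc = 0;
    for (int i = 0; i < N_TAPS; i++) {
        assert(x[i] >= 0);  /* this sketch handles non-negative samples only */
        int32_t w_q15 = (int32_t)(W[i] * 32768.0 + 0.5);
        acc += w_q15 * x[i];
    }
    return (acc + 16384) / 32768;  /* undo the scaling, rounding to nearest */
}
```

For the samples {100, 200, 40, 80} all three functions return 115. In a real kernel the Q15 weights would be precomputed constants, so no floating-point operation would remain in the integer version.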
P. Marwedel: Embedded Software:How to make it efficient? Slide -10 -
High-level transformations
Example: Separation of margin handling
• Before: many if-statements for margin-checking
• After: interior processed with no checking (efficient) + only few margin elements to be processed separately
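Separation of margin handling can be sketched on a one-dimensional 3-point smoothing filter. The filter and both function names are hypothetical and only serve as an illustration. The first version tests for the margins in every iteration. The second handles the two margin elements once and runs the interior loop without any checks.

```c
#include <assert.h>
#include <stddef.h>

/* Margin checks inside the loop: every iteration pays for two tests. */
void smooth_checked(const int *in, int *out, size_t n) {
    for (size_t i = 0; i < n; i++) {
        int left  = (i == 0)     ? in[i] : in[i - 1];
        int right = (i == n - 1) ? in[i] : in[i + 1];
        out[i] = (left + in[i] + right) / 3;
    }
}

/* Margins handled separately: the interior loop needs no checks. */
void smooth_split(const int *in, int *out, size_t n) {
    assert(n >= 2);  /* the sketch assumes at least two elements */
    out[0] = (in[0] + in[0] + in[1]) / 3;                  /* left margin */
    for (size_t i = 1; i + 1 < n; i++)                     /* interior */
        out[i] = (in[i - 1] + in[i] + in[i + 1]) / 3;
    out[n - 1] = (in[n - 2] + in[n - 1] + in[n - 1]) / 3;  /* right margin */
}
```

Both functions produce the same output. The split version trades slightly larger code for an interior loop that is free of branches.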
P. Marwedel: Embedded Software:How to make it efficient? Slide -11 -
Loop nest splitting at University of Dortmund
Loop nest from MPEG-4 full search motion estimation:

  for (z=0; z<20; z++)
    for (x=0; x<36; x++) { x1=4*x;
      for (y=0; y<49; y++) { y1=4*y;
        for (k=0; k<9; k++) { x2=x1+k-4;
          for (l=0; l<9; l++) { y2=y1+l-4;
            for (i=0; i<4; i++) { x3=x1+i; x4=x2+i;
              for (j=0; j<4; j++) { y3=y1+j; y4=y2+j;
                if (x3<0 || 35<x3 || y3<0 || 48<y3) then_block_1; else else_block_1;
                if (x4<0 || 35<x4 || y4<0 || 48<y4) then_block_2; else else_block_2;
  }}}}}}

After loop nest splitting:

  for (z=0; z<20; z++)
    for (x=0; x<36; x++) { x1=4*x;
      for (y=0; y<49; y++)
        if (x>=10 || y>=14)
          for (; y<49; y++)
            for (k=0; k<9; k++)
              for (l=0; l<9; l++)
                for (i=0; i<4; i++)
                  for (j=0; j<4; j++) {
                    then_block_1; then_block_2; }
        else { y1=4*y;
          for (k=0; k<9; k++) { x2=x1+k-4;
            for (l=0; l<9; l++) { y2=y1+l-4;
              for (i=0; i<4; i++) { x3=x1+i; x4=x2+i;
                for (j=0; j<4; j++) { y3=y1+j; y4=y2+j;
                  if (0 || 35<x3 || 0 || 48<y3) then_block_1; else else_block_1;
                  if (x4<0 || 35<x4 || y4<0 || 48<y4) then_block_2; else else_block_2;
  }}}}}}

Analysis of polyhedral domains, selection with genetic algorithm
[H. Falk et al., Inf 12, UniDo, 2002]
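The principle behind loop nest splitting can be checked on a much smaller, hypothetical nest; this is a sketch of the idea, not the Dortmund tool or the MPEG-4 kernel. The inner condition depends on both loop indices, but for x >= 3 it is always true. A splitting-if on x therefore selects a copy of the inner loop without the test. The constants, the condition and the function names are assumptions made for illustration.

```c
#include <assert.h>

#define NX 8
#define NK 4

/* Original nest: the condition is evaluated in every inner iteration. */
long sum_original(void) {
    long acc = 0;
    for (int x = 0; x < NX; x++)
        for (int k = 0; k < NK; k++) {
            int p = 4 * x + k;
            if (p > 10) acc += 2 * p;  /* plays the role of a then-block */
            else        acc += p;      /* plays the role of an else-block */
        }
    return acc;
}

/* Split nest: for x >= 3 we have 4*x + k >= 12 for all k, so p > 10 holds. */
long sum_split(void) {
    long acc = 0;
    for (int x = 0; x < NX; x++) {
        if (x >= 3) {                       /* splitting-if */
            assert(4 * x > 10);             /* invariant that makes the test redundant */
            for (int k = 0; k < NK; k++)
                acc += 2 * (4 * x + k);     /* then-block only, no test */
        } else {
            for (int k = 0; k < NK; k++) {  /* remaining iterations keep the test */
                int p = 4 * x + k;
                if (p > 10) acc += 2 * p;
                else        acc += p;
            }
        }
    }
    return acc;
}
```

Both functions return 937. In this toy case the bound x >= 3 is obvious by inspection. For real nests with several interacting conditions, the slide names polyhedral analysis and a genetic algorithm as the way to find such bounds.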
P. Marwedel: Embedded Software:How to make it efficient? Slide -12 -
Results for loop nest splitting: execution times
[Figure: bar chart of relative execution times (axis 0% to 110%) for the benchmarks Cavity, Motion Estimation and QSDPCM]
[H. Falk et al., Inf 12, UniDo, 2002]
P. Marwedel: Embedded Software:How to make it efficient? Slide -13 -
Results for loop nest splitting: code sizes
[Figure: bar chart of relative code sizes (axis 0% to 200%) for the benchmarks Cavity, Motion Estimation and QSDPCM on Sun, Pentium, HP, MIPS, PowerPC, DEC Alpha, TriMedia, TI C6x, ARM7 thumb, ARM7 arm, and on average]
[H. Falk et al., Inf 12, UniDo, 2002]
P. Marwedel: Embedded Software:How to make it efficient? Slide -14 -
Generating efficient software requires work at all levels
• Algorithmic level (using the most efficient algorithm + data structures)
• High-level source code transformations
• Compiler optimizations
• Code compression
• Operating system support (e.g. for minimizing power consumption)
P. Marwedel: Embedded Software:How to make it efficient? Slide -15 -
Compilers: Translation from C is a critical bottleneck
[Figure: design flow with the levels (Real-time) UML or equiv., StateCharts/SDL, RT-Java, VHDL, (sets of) C-programs, Assembly level and HW]
P. Marwedel: Embedded Software:How to make it efficient? Slide -16 -
Overhead of compilers for DSP processors
DSPStone (Zivojnovic et al.). Example: ADPCM
[Figure: cycle overhead [× n] (axis 1.0 to 8.0) for DSP56001, TI-C51 and ADI-2101]
Optimizations exploiting architectural features of embedded processors.
Current focus: VLIW processors (powerful multimedia processors).
In this talk: focus on energy consumption.
P. Marwedel: Embedded Software:How to make it efficient? Slide -17 -
Larger & off-chip memories need more energy than smaller & on-chip memories
Example (CACTI model):
[Figure: energy per access [nJ] (axis 0 to 2.5) versus memory size (64 to 8192)]
[Steinke et al., Inf 12, UniDo, 2002]
P. Marwedel: Embedded Software:How to make it efficient? Slide -18 -
Example: Off-chip vs. on-chip memories
ARM7TDMI cores, well-known for low power consumption
[Figure: Atmel ARM evaluation board with processor, on-chip memory and on-board memory]
P. Marwedel: Embedded Software:How to make it efficient? Slide -19 -
On-chip vs. off-chip current
Current for a 32-bit load instruction (Thumb); example: Atmel ARM evaluation board

Configuration                    Core + on-chip memory (mA)   Off-chip memory (mA)
Prog off-chip / data off-chip    48.2                         116
Prog off-chip / data on-chip     50.9                         77.2
Prog on-chip / data off-chip     44.4                         82.2
Prog on-chip / data on-chip      53.1                         1.16

Current reduction: / 3.02
P. Marwedel: Embedded Software:How to make it efficient? Slide -20 -
On-chip vs. off-chip energy
Energy for a 32-bit load instruction (Thumb); example: Atmel ARM evaluation board

Configuration                    Energy (nJ)
Prog off-chip / data off-chip    115.8
Prog off-chip / data on-chip     51.6
Prog on-chip / data off-chip     76.5
Prog on-chip / data on-chip      16.4

Off-chip access takes more cycles ⇒ savings (86%) are larger than for the current.
Energy reduction: / 7.06
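As a worked check of the numbers on this slide, assume that 115.8 nJ belongs to the configuration with program and data off-chip and 16.4 nJ to the configuration with both on-chip. The last line additionally assumes the same supply voltage U in both configurations, which the slides do not state, and uses the current reduction factor of 3.02 from the previous slide.

```latex
\[
\frac{E_{\text{off/off}}}{E_{\text{on/on}}}
  = \frac{115.8\,\text{nJ}}{16.4\,\text{nJ}} \approx 7.06,
\qquad
1 - \frac{16.4}{115.8} \approx 0.858 \approx 86\,\%
\]
\[
E = U \cdot I \cdot t
\;\Rightarrow\;
\frac{t_{\text{off/off}}}{t_{\text{on/on}}}
  \approx \frac{7.06}{3.02} \approx 2.3
\]
```

Under these assumptions the off-chip load takes a little more than twice as long, which is why the energy saving is larger than the current saving.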
P. Marwedel: Embedded Software:How to make it efficient? Slide -21 -
Exploitation of on-chip memory
Which segment (array, loop, etc.) to be stored in on-chip memory?
Gain $g_i$ and size $s_i$ for each segment $i$.
Maximise gain $G = \sum_i g_i$, respecting constraint $K \ge \sum_i s_i$ (sums over the segments placed in on-chip memory).
Static memory allocation:
• Solution: knapsack algorithm.
Dynamic reloading:
• Where to insert calls to copy function? IP model.
[Figure: example program with segments (for loops, while and repeat loops, calls, arrays, int variables) to be mapped either to the on-chip memory of capacity K next to the processor or to the on-board memory]
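The static allocation problem stated on this slide is a 0/1 knapsack problem: pick the segments that maximise the total gain without exceeding the on-chip capacity K. The following is a minimal sketch using the textbook dynamic-programming solution, which is not necessarily the solver used by the authors. The name best_gain, the MAX_CAPACITY limit and the units (sizes in bytes, gains in arbitrary energy units) are assumptions.

```c
#include <assert.h>
#include <string.h>

#define MAX_CAPACITY 1024  /* hypothetical upper bound on K, in bytes */

/* Returns the maximum total gain of segments whose sizes sum to <= capacity. */
long best_gain(const int *size, const long *gain, int n, int capacity) {
    assert(capacity >= 0 && capacity <= MAX_CAPACITY);
    long best[MAX_CAPACITY + 1];  /* best[c]: best gain using at most c bytes */
    memset(best, 0, sizeof best);
    for (int i = 0; i < n; i++) {
        assert(size[i] > 0);
        /* Walk capacities downwards so that each segment is used at most once. */
        for (int c = capacity; c >= size[i]; c--) {
            long with_i = best[c - size[i]] + gain[i];
            if (with_i > best[c])
                best[c] = with_i;
        }
    }
    return best[capacity];
}
```

For four hypothetical segments with sizes {64, 128, 32, 96} and gains {120, 200, 50, 160}, a capacity of 192 bytes yields a best gain of 330 (segments of size 64, 32 and 96), and a capacity of 100 bytes yields 170. Recovering which segments were chosen would need an additional table of decisions, which the sketch leaves out.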
P. Marwedel: Embedded Software:How to make it efficient? Slide -22 -
Why not just use a cache ?
[Figure: energy per access [nJ] (axis 0 to 9) versus memory size (256 to 16384) for a scratch pad and for 2-way caches with 4 GB, 16 MB and 1 MB address space]
Energy consumption in tags, comparators and muxes is significant.
[R. Banakar, S. Steinke, B.-S. Lee, 2001]
P. Marwedel: Embedded Software:How to make it efficient? Slide -23 -
Results for optimization algorithm
[Figure: energy saving (axis 0% to 50%) per benchmark and on-chip memory size; one value is labelled 0.5%]
[Steinke et al., Inf 12, UniDo, 2002]
P. Marwedel: Embedded Software:How to make it efficient? Slide -24 -
Total energy reduction for MPEG-2 [%]
Optimization step        Energy [%]
Original                 100
Algorithm (float)        33.97
High-level opt.          31.83
Compiler opt.            21.68
Cache                    6.21
Scratch pad (static)     4.87

[T. Huels, Inf 12, UniDo, 2002]
P. Marwedel: Embedded Software:How to make it efficient? Slide -25 -
Optimization technique for microcontrollers and network processors: Bit-field detection
C expression (from the expression tree in the figure): b = (b & 0xF1) | ((a & 7) << 1)
Assembly: mov b, 1, a, 0, 3   # Cost: 1
[Wagner, Inf 12, UniDo, 2002]
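Reading the expression tree on this slide as b = (b & 0xF1) | ((a & 7) << 1), the mask-and-shift pattern copies a 3-bit field from bit 0 of a into bit 1 of b. The sketch below compares that pattern with a generic bit-field move. The function names are hypothetical, and the operand order (destination, destination offset, source, source offset, width) is only my reading of the assembly line, not a documented instruction format.

```c
#include <assert.h>
#include <stdint.h>

/* The pattern as it appears in C source: mask, shift, merge. */
uint8_t insert_by_masks(uint8_t a, uint8_t b) {
    return (uint8_t)((b & 0xF1) | ((a & 7) << 1));
}

/* Generic bit-field move: copy `width` bits from src[src_off..] to dst[dst_off..]. */
uint8_t bitfield_move(uint8_t dst, unsigned dst_off,
                      uint8_t src, unsigned src_off, unsigned width) {
    assert(width >= 1 && dst_off + width <= 8 && src_off + width <= 8);
    unsigned mask  = (1u << width) - 1u;               /* width low bits set */
    unsigned field = ((unsigned)src >> src_off) & mask; /* extract source field */
    unsigned kept  = (unsigned)dst & ~(mask << dst_off); /* clear target field */
    return (uint8_t)(kept | (field << dst_off));
}
```

For every pair of 8-bit values, insert_by_masks(a, b) equals bitfield_move(b, 1, a, 0, 3). A compiler that detects this equivalence can replace the four logical operations by a single bit-field instruction on processors that provide one.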
P. Marwedel: Embedded Software:How to make it efficient? Slide -26 -
Results available to industry? Yes!
[Figure: partners of the trinity model. Labels: „Center of excellence“ (IMEC); Informatik 12, UniDo; design houses / semiconductor vendors; ICD e.V. (technology transfer center); CAD vendors]
P. Marwedel: Embedded Software:How to make it efficient? Slide -27 -
Generating efficient software requires work at all levels
• Algorithmic level (using the most efficient algorithm + data structures)
• High-level source code transformations
• Compiler optimizations
• Code compression
• Operating system support (e.g. for minimizing power consumption)
P. Marwedel: Embedded Software:How to make it efficient? Slide -28 -
Code compression/decompression
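Key idea:
[Figure: two configurations, a µP connected directly to the ROM and a µP connected to the ROM through a decompressor; the address lines are labelled Addr]
Very good survey: Rik van de Wiel: The Code Compaction Bibliography, www.extra.research.philips.com/ccb/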
P. Marwedel: Embedded Software:How to make it efficient? Slide -29 -
Variable-voltage/frequency example: Intel XScale
[Figure: from Intel's web site]
OS should schedule distribution of the energy budget.
P. Marwedel: Embedded Software:How to make it efficient? Slide -30 -
Conclusion
Making embedded software efficient requires efforts at all levels:
• At the algorithmic level
• At the level of high-level transformations
• Within the compiler
• At the code compression level
• Within the embedded OS
The focus of this talk was on compilers and energy efficiency; using new algorithms, the energy consumption can be significantly reduced.