8/14/2019 Optimization for Embedded System
Optimization for Embedded System
Performance Indices
Performance
Memory Usage
Power
Layers
Embedded processor architecture
80/20
80% of time is spent in standby mode
20% of time is spent in operating mode
80% of operating time uses only a subset of peripherals and functions
20% of operating time is spent with all functions in use
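As a rough illustration of why the standby/operating split matters, the sketch below computes a time-weighted average current. The function name and the current figures in the usage note are assumptions for illustration; they do not come from the source.

```c
#include <stdint.h>

/* Time-weighted average current in microamps.
 * standby_pct is the percentage of time spent in standby (0..100);
 * the remainder is assumed to be spent in operating mode.
 * Inputs are assumed small enough that the products fit in 32 bits. */
uint32_t average_current_ua(uint32_t standby_ua, uint32_t active_ua,
                            uint32_t standby_pct)
{
    if (standby_pct > 100u) {
        standby_pct = 100u;  /* clamp out-of-range input */
    }
    uint32_t active_pct = 100u - standby_pct;
    return (standby_ua * standby_pct + active_ua * active_pct) / 100u;
}
```

With a hypothetical 10 uA standby current, 5000 uA operating current, and 80% of time in standby, the average works out to 1008 uA, so the operating mode still dominates the budget even though it is active only 20% of the time.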
Architecture
Von Neumann
Harvard
Power
Software Techniques
Saving power at boot time
Scaling voltage and frequency
Using sleep modes and idling clock domains
Coordinating sleep and scaling
Chip level
Reduce power supply voltage
Run at low clock frequency
Disable function units with control signals when not in use
Disconnect part from power supply when not in use
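The first two chip-level items follow from the usual approximation that dynamic CMOS power scales as P ~ C * V^2 * f. The sketch below estimates the effect of a voltage and frequency change relative to a baseline. It assumes capacitance and switching activity stay the same and ignores leakage; the function name is hypothetical.

```c
#include <stdint.h>

/* Dynamic power after scaling, as a percentage of the baseline.
 * Uses P ~ C * V^2 * f with C and switching activity assumed constant,
 * so they cancel out of the ratio. Leakage power is not modeled.
 * Voltages are in millivolts, clock frequencies in kHz. */
uint32_t relative_dynamic_power_pct(uint32_t v0_mv, uint32_t f0_khz,
                                    uint32_t v1_mv, uint32_t f1_khz)
{
    uint64_t base   = (uint64_t)v0_mv * v0_mv * f0_khz;
    uint64_t scaled = (uint64_t)v1_mv * v1_mv * f1_khz;
    if (base == 0u) {
        return 0u;  /* avoid division by zero on a degenerate baseline */
    }
    return (uint32_t)((scaled * 100u) / base);
}
```

Under these assumptions, halving the clock alone gives 50% of the baseline dynamic power, while halving the clock and also dropping the supply from 1000 mV to 800 mV gives 32%, which is why voltage reduction is listed first.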
Pick Criteria
Optimal performance for key algorithm
Low power for key algorithm
Efficient management of memory (cache)
Memory hierarchy
Register
Different levels of cache
Main memory
Disk space
OS layer
Power Manager
Loosely coupled to OS kernel
Set of library routines; execute in client's context
Configurable as necessary
Use platform-specific adaptation library for V/F scaling
Applications, drivers, and CLK register for notifications
Applications trigger actions
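The register-for-notification pattern described above can be sketched as a small callback table. All names here (pm_register, pm_notify, pm_event_t, example_client_on_event) are hypothetical and do not correspond to the API of any particular power manager; the callbacks run in the context of whoever calls pm_notify.

```c
#include <stddef.h>

#define PM_MAX_CLIENTS 8  /* hypothetical table size */

typedef enum { PM_EVENT_SLEEP, PM_EVENT_WAKE, PM_EVENT_FREQ_CHANGE } pm_event_t;
typedef void (*pm_callback_t)(pm_event_t event, void *arg);

typedef struct {
    pm_callback_t fn;
    void *arg;
} pm_client_t;

static pm_client_t pm_clients[PM_MAX_CLIENTS];
static size_t pm_client_count;

/* Register a client callback. Returns 0 on success, -1 if the
 * callback is NULL or the table is full. */
int pm_register(pm_callback_t fn, void *arg)
{
    if (fn == NULL || pm_client_count >= PM_MAX_CLIENTS) {
        return -1;
    }
    pm_clients[pm_client_count].fn = fn;
    pm_clients[pm_client_count].arg = arg;
    pm_client_count++;
    return 0;
}

/* Deliver an event to every registered client, in registration order.
 * Returns the number of clients notified. */
size_t pm_notify(pm_event_t event)
{
    for (size_t i = 0; i < pm_client_count; i++) {
        pm_clients[i].fn(event, pm_clients[i].arg);
    }
    return pm_client_count;
}

/* Example client: counts the events it receives via its argument. */
void example_client_on_event(pm_event_t event, void *arg)
{
    (void)event;
    (*(int *)arg)++;
}
```

A driver would register once at start-up, and an application that triggers a sleep or frequency change would call pm_notify before and after the transition so that each client can save or restore its own state.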
Algorithm
Code
Prevent obsessive optimization
Developing efficient code
Inline functions
Table lookups
Hand coded assembly
Register variables
Polling
Fixed point arithmetic
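Two of the techniques above, table lookups and fixed point arithmetic, can be sketched together with inline functions. The names popcount8 and q15_mul and the choice of the Q15 format are assumptions made for illustration; the source does not prescribe a particular format.

```c
#include <stdint.h>

/* Table lookup: number of set bits in each 4-bit value. */
static const uint8_t nibble_bits[16] = {
    0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4
};

/* Count set bits in a byte with two table reads instead of a loop. */
static inline uint8_t popcount8(uint8_t x)
{
    return (uint8_t)(nibble_bits[x & 0x0Fu] + nibble_bits[x >> 4]);
}

/* Q15 fixed point: a 16-bit integer r represents the value r / 32768. */
typedef int16_t q15_t;

/* Multiply two Q15 values with rounding and saturation.
 * Relies on arithmetic right shift of negative values, which is
 * implementation-defined in C but is what common compilers provide. */
static inline q15_t q15_mul(q15_t a, q15_t b)
{
    int32_t product = (int32_t)a * (int32_t)b;  /* Q30 intermediate */
    product += 1 << 14;                         /* round to nearest */
    product >>= 15;                             /* back to Q15 */
    if (product > INT16_MAX) {
        product = INT16_MAX;                    /* only for -1.0 * -1.0 */
    }
    return (q15_t)product;
}
```

For example, 0.5 is 16384 in Q15, and q15_mul(16384, 16384) yields 8192, which is 0.25, without any floating-point operations.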
Decreasing code size
Avoid standard library routines
Native word size
C++
Top 10 SW power optimization
Architect SW to have natural "idle" points
Use interrupt-driven programming
Code and data placement close to processor to minimize off-chip accesses
Smart placement to keep frequently accessed code/data close to CPU
Size optimizations to reduce footprint, memory, and corresponding leakage
Optimize for speed to allow more CPU idle time or a reduced CPU frequency
Don't over calculate
Use DMA for efficient transfer
Use co-processors to efficiently handle/accelerate frequent/specialized processing
Use more buffering and batch processing to allow more computation at once and more time in low power modes
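Interrupt-driven programming combined with buffering and batch processing can be sketched with a small ring buffer: the interrupt handler only stores samples, and the main loop drains them in one batch before returning to a low power mode. The type and function names are hypothetical, and the sketch assumes a single producer (the ISR) and a single consumer (the main loop).

```c
#include <stddef.h>
#include <stdint.h>

#define RB_SIZE 16u  /* hypothetical size; must be a power of two */

typedef struct {
    volatile uint16_t head;          /* advanced only by the ISR */
    volatile uint16_t tail;          /* advanced only by the main loop */
    volatile uint8_t data[RB_SIZE];
} ring_buffer_t;

/* Intended to be called from the ISR: store one sample.
 * Returns 1 on success, 0 if the buffer is full (sample dropped).
 * One slot is kept empty, so the capacity is RB_SIZE - 1. */
int rb_push(ring_buffer_t *rb, uint8_t sample)
{
    uint16_t next = (uint16_t)((rb->head + 1u) & (RB_SIZE - 1u));
    if (next == rb->tail) {
        return 0;
    }
    rb->data[rb->head] = sample;
    rb->head = next;
    return 1;
}

/* Intended to be called from the main loop: copy out everything that
 * is buffered (up to max) in one batch. Returns the number copied. */
size_t rb_drain(ring_buffer_t *rb, uint8_t *out, size_t max)
{
    size_t n = 0;
    while (rb->tail != rb->head && n < max) {
        out[n++] = rb->data[rb->tail];
        rb->tail = (uint16_t)((rb->tail + 1u) & (RB_SIZE - 1u));
    }
    return n;
}
```

A main loop built on this would call rb_drain, process the whole batch, and then execute the platform's sleep or wait-for-interrupt instruction, which gives the software a natural idle point between batches.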
Optimization levels
-o0 (register)
Simplifies control flow
Allocates variables to registers
Eliminates unused code
Simplifies expressions and statements
Expands calls to inline functions
-o1 (local)
Performs local copy/constant propagation
Removes unused assignments
Eliminates local common expressions
-o2 (global)
Performs local loop optimizations
Eliminates global common sub-expressions
Eliminates global unused assignments
Perform loop unrolling
-o3 (file)
Removes functions that are never called
Simplifies functions with return values that are never used
Reorders functions so that attributes of called function are known when caller is optimized
Identifies file-level variable characteristics
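To make a few of the listed transformations concrete, the sketch below shows a function as a programmer might write it and a hand-rewritten equivalent with constant propagation, removal of an unused assignment, and common sub-expression elimination applied. It illustrates the kind of rewrite involved, not the actual output of any particular compiler or optimization level; both function names are made up for the example.

```c
#include <stdint.h>

/* As written by the programmer. */
int32_t scale_as_written(int32_t a, int32_t b, int32_t x)
{
    int32_t k = 4;            /* constant: can be propagated into its use */
    int32_t t = a - b;        /* overwritten before use: unused assignment */
    t = a + b;
    int32_t p = t * x;
    int32_t q = (a + b) * k;  /* (a + b) is a common sub-expression with t */
    return p + q;
}

/* Equivalent function after applying the transformations by hand. */
int32_t scale_hand_optimized(int32_t a, int32_t b, int32_t x)
{
    int32_t sum = a + b;      /* computed once and reused */
    return sum * x + sum * 4; /* constant 4 propagated in place of k */
}
```

Both versions return the same result for the same inputs; an optimizing compiler can perform such rewrites automatically, which is one reason to write the clear version first.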
Optimization for Embedded System.mmap - 2007/8/26