jackharish
Profiling Tools
Optimization Techniques
Session 1

Content
• Program optimization – introduction
• Optimization techniques for embedded systems development
• C for embedded systems

Session Objectives
• To learn the importance of program optimization
• To know the different optimization techniques for embedded systems design
• To understand why C is used for embedded systems development
Introduction

The Problem
• PC speed has increased about 500 times since 1981, but today’s software is more complex and still hungry for more resources
• How can software run faster on the same hardware and OS architecture?
• Highly optimized applications can run tens of times faster than poorly written ones
• Efficient algorithms and well-designed implementations lead to high-performance applications

Writing Fast Programs
• Use a fast algorithm
  • It does not make sense to optimize a bad algorithm
• Implement it efficiently
  • Detect hotspots using a profiler and fix them
  • Understanding of the target system architecture, such as the cache structure, is often required
  • Use platform-specific compiler extensions: memory prefetching, cache-control instructions, branch prediction, SIMD instructions
  • Write multithreaded applications
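As an illustration of the platform-specific compiler extensions mentioned above, here is a minimal sketch assuming GCC or Clang. The builtins `__builtin_expect` and `__builtin_prefetch` are real compiler extensions; the macro names, the function `sum_non_negative`, and the prefetch distance of 16 elements are assumptions made for this example, and any gain has to be confirmed by measurement on the target.

```c
#include <stddef.h>

/* GCC/Clang extensions; other compilers get plain fallbacks. */
#if defined(__GNUC__)
#define LIKELY(x)   __builtin_expect(!!(x), 1)
#define PREFETCH(p) __builtin_prefetch(p)
#else
#define LIKELY(x)   (x)
#define PREFETCH(p) ((void)0)
#endif

/* Sums the non-negative entries of data[0..n-1]. */
long sum_non_negative(const int *data, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        /* Hint: start loading data that will be needed 16 iterations
         * from now (the distance is an assumed, untuned value). */
        if (i + 16 < n)
            PREFETCH(&data[i + 16]);
        /* Hint: we assume most entries are non-negative. */
        if (LIKELY(data[i] >= 0))
            total += data[i];
    }
    return total;
}
```

The hints never change the result; they only give the compiler and CPU extra information, so a wrong hint costs speed, not correctness.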

Writing Fast Programs
• Use good coding practices
• Use good data structures
• Apply appropriate optimization techniques
• Optimizing code takes time and reduces source code readability

Optimizing Embedded Software
• Embedded software often runs on processors with limited computation power, so optimizing the code becomes a necessity
• A program can usually be made either faster or smaller, but rarely both at once
  • An improvement in one of these areas can have a negative impact on the other
  • It is up to the programmer to decide which of these improvements matters most
• Recommendation: reduce the size of your program

Optimizing For Program Size
• Goal:
  • Reduce hardware cost of memory
  • Reduce power consumption of memory units
• Two opportunities:
  • Data
    • Reuse constants, variables, and data buffers in different parts of the code (requires careful verification of correctness)
    • Generate data using instructions
  • Instructions
    • Avoid function inlining
    • Choose a CPU with compact instructions
    • Use specialized instructions where possible
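One possible reading of "generate data using instructions" is to compute a value when it is needed instead of storing a constant table. The sketch below is an assumed example (the function `bit_count` is not from the slides): it counts the set bits of a byte with a short loop where a faster version might use a 256-entry lookup table, trading some cycles for memory.

```c
/* Counts the set bits in a byte by computation instead of a
 * 256-entry lookup table: smaller data, a few more cycles. */
unsigned bit_count(unsigned char b)
{
    unsigned count = 0;
    while (b != 0) {
        b &= (unsigned char)(b - 1);   /* clears the lowest set bit */
        count++;
    }
    return count;
}
```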
Cost Of High Performance

Performance: Where To Look
• “Maximize performance - who knows where to optimize and where not to optimize”
• Spend your time optimizing the portions of code where the most time is taken
• Run a compiled program to learn where that program spends its time
• You may also profile other computational resource usage: space, power, I/O
• This resource usage is not easy to estimate by static analysis; it requires dynamic analysis

Performance: Where To Look
• Problem: you're given a program's source code (which someone else wrote) and asked to improve its performance by at least 20%
• Where do you begin?
  • Look at the source code and try to find inefficient C code
  • Try rewriting some of it in assembly
  • Rewrite using a different algorithm
  • (Remove random portions of the code)

Performance: Where To Look
• How to figure out where a program is spending its time?
  • Count every static instruction, to know which routines (functions) are the biggest
    • Big deal: large functions that aren't executed often don't really matter
  • Count every dynamic instruction, to know which routines executed the most instructions
    • Excellent! It tells the “relative importance” of each function
    • But it doesn't account for the memory system
  • Count how many cycles were spent in each routine, to know which routines took the most time
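Measuring the time spent in a routine can be approximated without a profiler. This is a minimal sketch using the standard `clock()` function, which reports CPU time rather than exact cycle counts; `busy_sum` and `time_busy_sum` are hypothetical names standing in for a suspected hotspot and its timing wrapper.

```c
#include <time.h>

/* Hypothetical routine standing in for a suspected hotspot. */
unsigned long busy_sum(unsigned long n)
{
    unsigned long total = 0;
    for (unsigned long i = 0; i < n; i++)
        total += i % 7;
    return total;
}

/* Returns the CPU time, in seconds, spent in one call of busy_sum(n).
 * The result of the call is stored through *result so the work is
 * observable and not discarded by the optimizer. */
double time_busy_sum(unsigned long n, unsigned long *result)
{
    clock_t start = clock();
    *result = busy_sum(n);
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

The resolution of `clock()` is coarse on many systems, so short routines are usually called many times in a loop and the total is divided by the call count.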

The Software Optimization Process
• Create benchmark
• Find hotspots
• Investigate causes
• Modify application
• Retest using benchmark
• Hotspots are areas in your code that take a long time to execute

Extreme Optimization Pitfalls
• A large application’s performance cannot be improved before it runs
• Build the application, then see what machine it runs on
• Runs great on my computer…
• Debug versus release builds
• Performance requires assembly language programming
• Code features first, then optimize if there is time left over

Key Point:
Software optimization doesn’t begin where coding ends – it is an ongoing process that starts at the design stage and continues all the way through development

90/10 Rule
• 90% of execution time is spent in 10% of the code
  • So the ‘hot’ 10% is the code that must be optimized
• Optimization takes time, but gives efficient code – so only use it for the hot 10%
• Simple interpretation is quick, but gives slow code – use it for the other 90%
• Tradeoff – need to get the balance right!
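The payoff of the 90/10 rule can be checked with a short calculation. The sketch below uses Amdahl's law, a standard formula that the slide does not name; the function `overall_speedup` is an assumed helper for this example.

```c
/* Overall speedup when a fraction `hot` (0..1) of the run time is made
 * `factor` times faster and the rest is left unchanged (Amdahl's law). */
double overall_speedup(double hot, double factor)
{
    return 1.0 / ((1.0 - hot) + hot / factor);
}
```

Doubling the speed of the code that takes 90% of the time gives `overall_speedup(0.9, 2.0)`, about 1.82, while doubling the speed of the code that takes the remaining 10% gives `overall_speedup(0.1, 2.0)`, only about 1.05.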

How To Find Performance Bottlenecks
• Determine how the system resources are being utilized to identify system-level bottlenecks
• Measure the execution time for each module and function in the application
• Determine how the various modules running on the system affect the performance of each other
• Identify the most time-consuming function calls and call sequences within the application
• Determine how the application is executing at the processor level to identify microarchitecture-level performance problems

Improving Program Performance
• Compiler writers try to apply several standard optimizations
  • They do not always succeed
• Compiler writers sometimes apply aggressive optimizations
  • The compiler is often not “informed” enough to know whether a change will help rather than hurt
• Optimizations based on specific architecture/implementation characteristics can be very helpful
  • Much harder for compiler writers because it requires multiple, generally very different, “back end” implementations

Improving Program Performance
• How can one help?
  • Better code, algorithms and data structures (of course)
  • Reorganize code to help the compiler find opportunities for improvement
  • Replace poorly optimized code with assembly code (i.e., bypass the compiler)

Writing Efficient C code
• To write efficient C code, you must be aware of:
  • Areas where the C compiler has to be conservative
  • The limits of the processor architecture the C compiler is mapping to
  • The limits of a specific C compiler, which depend on the compiler vendor
    • Look at the compiler’s documentation or experiment with the compiler

Performance Tools Overview
• Timing mechanisms
  • Stopwatch: UNIX time tool
• Optimizing compiler (easy way)
• System load monitors
  • vmstat, iostat, perfmon.exe, VTune Counter
• Software profiler
  • gprof, VTune, Visual C++ Profiler, IBM Quantify
• Memory debugger/profiler
  • Valgrind, IBM Purify, Parasoft Insure++

Optimization Techniques
• Bad memory management has serious impacts
  • Poor data locality causes high power dissipation
  • Poor memory throughput leads to poor performance
• Optimization techniques
  • Platform independent
    • Loop transformation
    • Data reuse
    • Processor partitioning

Optimization Techniques
• Architecture specific
  • Memory modeling optimization
    • Register allocation – graph coloring
    • Custom memory architecture
  • Memory address generation
    • General compilers – generated addresses are periodic
    • Embedded systems – address sequence might not be periodic

Optimization Techniques
• The "scope" of the optimization:
  • Local optimizations - performed in a part of one procedure
    • Common sub-expression elimination (e.g. those occurring when translating array indices to memory addresses)
    • Using registers for temporary results, and if possible for variables
    • Replacing multiplication and division by shift and add operations
  • Global optimizations - performed with the help of data flow analysis and split-lifetime analysis
    • Code motion (hoisting) outside of loops
    • Value propagation
    • Strength reduction
  • Inter-procedural optimizations
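Two of the optimizations listed above, code motion and strength reduction, can be shown by applying them by hand. This is a sketch with hypothetical function names; an optimizing compiler will often perform the same rewrite on its own, and the example assumes values small enough that no signed overflow occurs.

```c
#include <stddef.h>

/* Straightforward version: scale * offset is recomputed in every
 * iteration, and i * stride needs a multiplication each time. */
void fill_naive(int *out, size_t n, int scale, int offset, int stride)
{
    for (size_t i = 0; i < n; i++)
        out[i] = scale * offset + (int)i * stride;
}

/* Same result after two hand-applied transformations:
 * - code motion: scale * offset is loop-invariant, so compute it once;
 * - strength reduction: replace i * stride by a running sum. */
void fill_optimized(int *out, size_t n, int scale, int offset, int stride)
{
    int base = scale * offset;   /* hoisted out of the loop */
    int step = 0;                /* always equals i * stride */
    for (size_t i = 0; i < n; i++) {
        out[i] = base + step;
        step += stride;
    }
}
```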

Optimization Techniques
• What is improved in the optimization:
  • Space optimizations - reduce the size of the executable/object
    • Constant pooling
    • Dead-code elimination
  • Speed optimizations - most optimizations belong to this category

Optimization Techniques
• There are important optimizations not covered above, e.g. the various loop transformations:
  • Loop unrolling - full or partial transformation of a loop into straight-line code
  • Loop blocking (tiling) - minimizes cache misses by replacing each array-processing loop with two loops, dividing the "iteration space" into smaller "blocks"
  • Loop interchange - changes the nesting order of loops; may make it possible to perform other transformations
  • Loop distribution - replaces a loop by two (or more) equivalent loops
  • Loop fusion - makes one loop out of two (or more)
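Two of the loop transformations above, loop interchange and partial loop unrolling, are sketched below by hand. The function names and the unroll factor of 4 are assumptions for this example; the matrix is assumed to be stored row by row in a flat array.

```c
#include <stddef.h>

/* Column-by-column traversal of a row-major matrix: consecutive
 * accesses are `cols` elements apart, which is unfriendly to caches. */
long sum_by_columns(const int *m, size_t rows, size_t cols)
{
    long total = 0;
    for (size_t c = 0; c < cols; c++)
        for (size_t r = 0; r < rows; r++)
            total += m[r * cols + c];
    return total;
}

/* Loop interchange: rows outside, columns inside, so memory is
 * walked sequentially. The result is identical. */
long sum_by_rows(const int *m, size_t rows, size_t cols)
{
    long total = 0;
    for (size_t r = 0; r < rows; r++)
        for (size_t c = 0; c < cols; c++)
            total += m[r * cols + c];
    return total;
}

/* Partial unrolling by 4: fewer loop-control instructions per
 * element, plus a cleanup loop for the leftover 0..3 elements. */
long sum_unrolled4(const int *v, size_t n)
{
    long total = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4)
        total += (long)v[i] + v[i + 1] + v[i + 2] + v[i + 3];
    for (; i < n; i++)
        total += v[i];
    return total;
}
```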
C Language In Embedded Systems

C Language In Embedded Systems
• A number of factors explain the increased popularity of C in the embedded systems area:
  • The ever-increasing complexity of applications drives programmers from assembly to high-level languages
  • The high-level programming language C offers good support for high-speed, low-level I/O operations
    • Programmers of embedded applications particularly appreciate this mixed high/low-level approach
  • In comparison to other high-level language compilers, C compilers tend to deliver more compact code

C Language In Embedded Systems
• Virtually all mathematical modeling tools generate C source code
• C offers significant productivity gains with opportunities for:
  • Code re-use
  • Improved code maintenance
  • Ongoing developments over the life of the application
• C can be written in a structured manner that reduces the chance of producing errors
• C can also be written in a very condensed manner, which is hard to comprehend and dramatically increases the likelihood of introducing errors

C Language In Embedded Systems
• The compiler does not necessarily detect small typing errors
  • Consider the operators &&, &, ||, |, +=, =, and ==, and the ease with which a typo will still lead to perfectly valid C code
• Not every programmer is fully aware of the effects of all the possible constructs in the C language
  • Casts (implicit or explicit) can cause both confusion and errors
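The operator typos and conversion surprises described above can be made concrete. In this sketch every function is valid C, and the "_typo" variants differ from the intended ones by a single character; all names are hypothetical.

```c
#include <stdbool.h>

/* Intended: true when both a and b are non-zero. */
bool both_set(int a, int b)
{
    return a && b;
}

/* One '&' dropped: bitwise AND. For a = 1, b = 2 this yields 0. */
bool both_set_typo(int a, int b)
{
    return a & b;
}

/* Intended: add x to the running total. */
int accumulate(int total, int x)
{
    total += x;
    return total;
}

/* The '+' dropped: the old total is silently overwritten. */
int accumulate_typo(int total, int x)
{
    total = x;
    return total;
}

/* The explicit cast spells out what the implicit conversion in
 * `a < b` does when b is unsigned: a negative a becomes a very
 * large unsigned value, so -1 is not "less than" 1u. */
bool less_after_conversion(int a, unsigned b)
{
    return (unsigned)a < b;
}
```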

C Language In Embedded Systems
• One of the main reasons that C compilers do a great job of generating compact, efficient code is the limited run-time checking in C
  • There are no provisions in C that would prevent divide by zero, arithmetic overflow, invalid addresses or pointers, or out-of-bounds array accesses from causing a run-time software failure
• It is therefore easy to understand that programmers with a special interest in writing robust, consistent code have concerns about the programming language C
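Because C performs none of these checks at run time, robust code has to add them explicitly where they matter. The following is a minimal sketch with hypothetical helper names, showing a guarded division and a bounds-checked array read.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Stores a / b in *out and returns true, or returns false when the
 * division would be undefined (b == 0, or INT_MIN / -1 overflow). */
bool checked_div(int a, int b, int *out)
{
    if (b == 0 || (a == INT_MIN && b == -1))
        return false;
    *out = a / b;
    return true;
}

/* Reads buf[index] into *out only if buf is non-null and index is
 * inside the array of length len. */
bool checked_get(const int *buf, size_t len, size_t index, int *out)
{
    if (buf == NULL || index >= len)
        return false;
    *out = buf[index];
    return true;
}
```

Each check costs a few instructions, which is the size and speed overhead that unchecked C avoids.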

C Language In Embedded Systems
• Many of the companies developing safety-related embedded applications have written guidelines to restrict the use of error-prone C constructs with the intention of reducing the probability of errors
• The goal of these standards is to increase portability, reduce maintenance, and above all improve clarity
• Mixed coding style is harder to maintain than bad coding style

C Language In Embedded Systems
• These standards recognize that individual programmers have the right to make judgments about how best to achieve the goal of code clarity
• All code should be ANSI standard and should compile without warnings under at least its principal compiler
• Any warnings that cannot be eliminated should be commented in the code
Optimizing C Code

Help From The Compiler
• Always use compiler optimization settings to build an application for use with performance tools
• Understanding and using all the features of an optimizing compiler is required for maximum performance with the least effort
• Use a compiler that supports your CPU
• Avoid compiler optimization when debugging
• Compiler optimization may:
  • Cause certain variables to vanish
  • Prevent stepping through each line of the code
  • Make it impossible to place breakpoints freely
• Identify your machine to the compiler
  • gcc -march=athlon

Help From The Compiler
• Ask the compiler to unroll loops
  • gcc -funroll-loops
  • gcc -funroll-all-loops
• Ask the compiler to generate procedures inline
  • gcc -finline-functions
• Ask the compiler to generate conditional expressions in place of branches
  • gcc -O
• Use hand-tuned library calls for your platform
  • There is very little gain in optimizing the string copy function... Someone already did this for you

GCC Optimization Levels
• -O0
  • Don’t optimize
  • Reduce cost of compilation
  • Make debugging possible
  • Only variables declared as register are placed in registers
• -O1
  • Basic optimizations for execution time and space reduction
  • In older GCC releases, only functions declared as inline are expanded inline
• -O2
  • Most optimization flags are turned on
  • Compiler optimizes variable register usage
  • Does not perform optimizations that involve a space-speed trade-off
• -O3
  • Turns on further optimizations on top of -O2, including more aggressive inlining of compact functions
  • Generated code can be much larger than at -O2 and is often only slightly faster

Optimizing Compiler: Choosing Optimization Flags Combination
Optimizing Compiler’s Effect

Helping The Compiler
• Variables
  • Avoid complicated pointer arithmetic; use array indexes
  • Use aliases
  • Use const and register where appropriate
  • Use integer arithmetic in place of floating point
  • Use local variables in place of function arguments
  • Use word-sized variables if possible
  • Avoid globals; use static variables as a last resort
  • Avoid volatile unless you mean it
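Several of the variable hints above can be combined in one small function. This sketch assumes a C99 compiler; `restrict` is used here as one way to tell the compiler about aliasing, which is an interpretation of the slide rather than something it states. The names `g_gain` and `scale` are hypothetical.

```c
#include <stddef.h>

int g_gain = 3;   /* hypothetical global configuration value */

/* dst and src are declared restrict: the caller promises that they do
 * not overlap, so the compiler need not reload src after each store. */
void scale(int *restrict dst, const int *restrict src, size_t n)
{
    const int gain = g_gain;   /* local copy: can stay in a register */
    for (size_t i = 0; i < n; i++)   /* array indexing, word-sized index */
        dst[i] = src[i] * gain;
}
```

Calling `scale` with overlapping buffers breaks the `restrict` promise and gives undefined behavior, so this hint is only safe when the caller can guarantee separate arrays.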

Helping The Compiler
• Functions
  • Declare compact functions as inline
  • Declare local functions as static
  • Avoid function calls in tight and frequent loops
  • Avoid indirect calls
  • Avoid recursion, unless necessary
  • Use __attribute__ ((noreturn))
  • Use __attribute__ ((const))
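The function hints above might look as follows in practice. This is a sketch assuming GCC or Clang for the `__attribute__` syntax, with empty fallbacks elsewhere; the macro and function names are hypothetical, and the overflow limit assumes a 32-bit `int`.

```c
#include <stdio.h>
#include <stdlib.h>

#if defined(__GNUC__)
#define ATTR_CONST    __attribute__((const))
#define ATTR_NORETURN __attribute__((noreturn))
#else
#define ATTR_CONST
#define ATTR_NORETURN
#endif

/* static inline: compact, file-local helper the compiler can expand
 * in place. const: the result depends only on the argument, so
 * repeated calls with the same value can be merged. */
static inline ATTR_CONST int square(int x)
{
    return x * x;
}

/* noreturn: tells the compiler that control never comes back, so it
 * can drop the code that would follow the call. */
static ATTR_NORETURN void fatal(const char *msg)
{
    fputs(msg, stderr);
    fputc('\n', stderr);
    exit(EXIT_FAILURE);
}

/* Squares x, aborting instead of overflowing. Assumes 32-bit int:
 * 46340 * 46340 is the largest square that still fits. */
int checked_square(int x)
{
    if (x > 46340 || x < -46340)
        fatal("square would overflow");
    return square(x);
}
```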

Helping The Compiler
• Control flow
  • Simple design will often prevent extra branches
  • Fewer branches lead to more effective branch prediction
  • Faster for loop
  • If..else…
  • Switch
  • Loop breaking
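The slide lists the control-flow topics without detail, so the sketch below shows one common interpretation of three of them: a loop that counts down to zero, a `switch` over dense case values, and leaving a loop as soon as the answer is known. The function names are hypothetical, and whether any of these forms is faster depends on the compiler and target.

```c
#include <stddef.h>

/* Count-down loop: the exit test is a comparison against zero, which
 * some processors get for free from the decrement. */
long sum_count_down(const int *v, size_t n)
{
    long total = 0;
    while (n-- > 0)
        total += v[n];
    return total;
}

/* switch with dense case values: the compiler may build a jump table
 * instead of a chain of if..else comparisons. */
int packet_length(int type)
{
    switch (type) {
    case 0:  return 4;
    case 1:  return 8;
    case 2:  return 16;
    case 3:  return 32;
    default: return -1;
    }
}

/* Loop breaking: stop at the first match instead of scanning on.
 * Returns the index of key in v[0..n-1], or -1 if it is absent. */
long find_first(const int *v, size_t n, int key)
{
    long found = -1;
    for (size_t i = 0; i < n; i++) {
        if (v[i] == key) {
            found = (long)i;
            break;
        }
    }
    return found;
}
```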

Helping The Compiler
• Files
  • Keep closely related functions together
  • Little optimization is done (by ld) at the linking stage
• Libraries
  • Use functions best suited for the task
  • memcpy can be faster than strcpy if you know the length
  • puts can be faster than printf for plain strings, since no format string has to be parsed
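The two library hints above can be sketched as small wrappers around standard functions. The wrapper names are hypothetical; `memcpy` and `puts` are the standard C library calls.

```c
#include <stdio.h>
#include <string.h>

/* Copies a string whose length (without the terminator) is already
 * known. memcpy does not have to test every byte for '\0' the way
 * strcpy does. dst must have room for len + 1 bytes. */
char *copy_known_length(char *dst, const char *src, size_t len)
{
    memcpy(dst, src, len);
    dst[len] = '\0';
    return dst;
}

/* Prints a fixed line of text. puts writes the string and a newline
 * without scanning for format specifiers as printf would. */
int print_line(const char *text)
{
    return puts(text);
}
```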

Summary
• Software optimization doesn’t begin where coding ends – it is an ongoing process that starts at the design stage and continues all the way through development
• Optimization techniques
  • Platform independent
    • Loop transformation
    • Data reuse
    • Processor partitioning