8/14/2019 C-Programming-Optimization Techniques Class 4
Optimization Techniques
Session-4

Session Topics
Compute-bound checking
Memory-bound checking
IO-bound checking
Safe C programs

Session Objectives
To know the different optimization techniques for embedded systems design when the compiler is not enough
To understand how to write safe C code

When Compiler Is Not Enough
Locate spots that could be improved; don't waste time improving code that is rarely used!
  Profiling tools: gprof(1), gcov (a gcc utility), OProfile, Linux Trace Toolkit (LTT)
Determine what to concentrate on
  Use time(1) to determine whether the program is compute-bound, memory-bound, IO-bound, or not bound at all.
Compute-Bound
Choose a Better Algorithm; Write Clear, Simple Code; Perspective; Understand Compiler Options; Inlining
Loop Unrolling; Loop Jamming; Loop Inversion; Strength Reduction; Loop Invariant Computations; Code for Common Case; Tail Recursion Elimination; Table Lookup; Sorting
Variables; Function Calls; Digestibility; String Operations; FP Parallelism
Get a Better Compiler; Stack Usage; Code it in Assembly; Shared Library Overhead; Machine-Specific Optimization

Compute-Bound
Most compute-bound programs can be translated to memory-bound ones with the use of lookup tables
When all else fails... rewrite slow code in assembly
Hand-coded assembly
  Some software modules are best written in assembly language
  This gives the programmer an opportunity to make them as efficient as possible
  Though most C/C++ compilers produce much better machine code than the average programmer, a good programmer can still do better than the average compiler for a given function

Choose A Better Algorithm
Also choose an appropriate data structure
  If you'll be doing a lot of insertions and deletions at random places, a linked list would be good
  If you'll be doing some binary searching, an array would be better.

Write Clear, Simple Code
Some of the very things that make code clear and readable to humans also make it clear and readable to compilers
Complicated expressions are harder to optimize and can cause the compiler to "fall back" to a less intense mode of optimization
Part of the clarity is making hunks of code into functions when appropriate
  The cost of a function call is extremely small on modern machines, so optimization is NOT a valid excuse for writing ten-page functions.

Perspective
A sure sign of misunderstanding is this fragment:
  if (x != 0) x = 0;
The intent is to save time by not initializing x if it's already zero
In reality, the test to see whether it's zero or not will take up about as much time as setting it to zero itself would have
  x = 0;
has the same effect and will be somewhat faster.

Understand Your Compiler Options
Some compilers have special #pragmas or keywords (for example, inline) which also affect optimization.

In-lining of Functions
Replacing a call to a function with the function's code is called in-lining
Benefit: reduction in procedure call overheads and opportunity for additional code optimizations
Danger: code bloat and negative instruction cache effects
Appropriate when the function is small and/or called from a small number of sites

Loop Unrolling
This can make a BIG difference.
It is well known that unrolling loops can produce considerable savings, e.g.
  for(i=0; i

Loop Unrolling
Compilers will often unroll simple loops like this, where a fixed number of iterations is involved, but something like
  for(i=0;i

Loop Unrolling
Simplest effect of loop unrolling: fewer test/jump instructions (fatter loop body, less loop overhead)
Fewer loads per flop
May lead to threaded code that uses multiple FP units concurrently (instruction-level parallelism)
How are loops handled that have a trip count which is not a multiple of the unrolling factor?
Already fat loops hardly benefit from unrolling (instruction cache capacity!)
Very short loops may suffer from unrolling or benefit strongly
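The trip-count question above can be illustrated with a minimal sketch. The function names and the unrolling factor of 4 are assumptions made for illustration, not taken from the slides: a main loop handles four elements per test/jump, and a short remainder loop picks up whatever is left when the count is not a multiple of four.

```c
#include <stddef.h>

/* Straightforward version: one add and one loop test per element. */
long sum_simple(const int *a, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += a[i];
    return total;
}

/* Unrolled by 4 (the factor is an arbitrary choice for illustration). */
long sum_unrolled4(const int *a, size_t n)
{
    long total = 0;
    size_t i = 0;

    /* Main loop: four elements per loop test/jump. */
    for (; i + 4 <= n; i += 4)
        total += (long)a[i] + a[i + 1] + a[i + 2] + a[i + 3];

    /* Remainder loop: handles the 0..3 leftover elements. */
    for (; i < n; i++)
        total += a[i];

    return total;
}
```

Whether the unrolled form is actually faster depends on the compiler and target, so it is worth measuring both versions before keeping the longer one.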

Loop Unrolling
Doing multiple iterations of work in each iteration is called loop unrolling
Benefit: reduction in looping overheads and opportunity for more code optimizations
Danger: code bloat, negative instruction cache effects, and trip counts that are not an integral multiple of the unrolling factor
Appropriate when the loop body is small

Loop Unrolling: Making Fatter Loop Bodies

Loop Unrolling: Improving Flop/Load Ratio
Analysis of the flop-to-load ratio often unveils another benefit of unrolling:
  do i = 1,N
    do j = 1,M
      y(i) = y(i) + a(j,i)*x(j)
    enddo
  enddo
Innermost loop: two loads and two flops performed; i.e., we have one load per flop

Loop Unrolling: Improving Flop/Load Ratio
Both loops unrolled twice:
  do i = 1,N,2
    t1 = 0
    t2 = 0
    do j = 1,M,2
      t1 = t1 + a(j,i)  *x(j) + a(j+1,i)  *x(j+1)
      t2 = t2 + a(j,i+1)*x(j) + a(j+1,i+1)*x(j+1)
    enddo
    y(i)   = y(i)   + t1
    y(i+1) = y(i+1) + t2
  enddo
Innermost loop: 6 loads (x(j) and x(j+1) are each loaded once and reused for t1 and t2) and 8 flops!
Exposes instruction-level parallelism
How about unrolling by 4?
Watch register spill!

Loop Jamming
Never use two loops where one will suffice:
  for(i=0; i

Loop Jamming
It would be better to do:
  for(i=0; i
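As a minimal sketch of loop jamming (the function names and the values written are assumptions for illustration, not the slide's own code): two passes over the same index range are merged into one, so the loop overhead is paid once.

```c
#include <stddef.h>

/* Two separate passes: loop test and increment are paid twice. */
void init_two_loops(int *a, int *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] = 0;
    for (size_t i = 0; i < n; i++)
        b[i] = 1;
}

/* Jammed (fused) version: one pass does both pieces of work. */
void init_jammed(int *a, int *b, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        a[i] = 0;
        b[i] = 1;
    }
}
```

Jamming is only valid when the second loop does not depend on the first loop having finished; here the two arrays are assumed not to overlap.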

Loop Inversion
Some machines have a special instruction for decrement and compare with 0
Assuming the loop is insensitive to direction, try this replacement:
  for (i = 1; i
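A minimal sketch of the idea, with hypothetical function names: the counting-down version ends on a comparison with zero, which some instruction sets can fold into the decrement. Summation is used because its result does not depend on traversal order.

```c
#include <stddef.h>

/* Counting up: the loop test compares i against n. */
long sum_count_up(const int *a, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += a[i];
    return total;
}

/* Counting down: the loop test is a comparison with zero. */
long sum_count_down(const int *a, size_t n)
{
    long total = 0;
    for (size_t i = n; i != 0; i--)
        total += a[i - 1];
    return total;
}
```

Many compilers perform this transformation themselves, and backwards traversal can work against cache prefetching, so treat it as something to measure rather than a guaranteed gain.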

Strength Reduction
Strength reduction is the replacement of an expression by a different expression that yields the same value but is cheaper to compute
Many compilers will do this for you automatically
The classic examples:
  x = w % 8;
  y = pow(x, 2.0);
  z = y * 33;
  for (i = 0; i < MAX; i++)
  {
      h = 14 * i;
      printf("%d", h);
  }

Strength Reduction
It would be better to do:
  x = w & 7;   /* bit-and is cheaper than remainder (same result only for non-negative w) */
  y = x * x;   /* mult is cheaper than power-of */
  z = (y
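The classic replacements can be sketched as small functions. The names are hypothetical, and unsigned types are an assumption chosen so that the bit tricks are exactly equivalent to the arithmetic they replace; for negative signed values, % and & differ and left shifts are not well defined.

```c
/* w % 8 for unsigned w: keep the low three bits. */
unsigned mod8(unsigned w)
{
    return w & 7u;
}

/* pow(x, 2.0) replaced by a single multiply. */
unsigned long square(unsigned long x)
{
    return x * x;
}

/* y * 33 == y * 32 + y: one shift and one add. */
unsigned long times33(unsigned long y)
{
    return (y << 5) + y;
}

/* h = 14 * i inside a loop becomes a running sum. */
void multiples_of_14(int *out, int max)
{
    int h = 0;
    for (int i = 0; i < max; i++) {
        out[i] = h;   /* equals 14 * i */
        h += 14;
    }
}
```

Modern compilers usually apply all four of these on their own, so the hand-written forms mainly matter when optimization is disabled or the compiler is weak.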

Induction Variables and Strength Reduction
A variable X is called an induction variable of a loop L if every time the variable X changes value, it is incremented or decremented by some constant
When there are 2 or more induction variables in a loop, it may be possible to get rid of all but one
It is also frequently possible to perform strength reduction on induction variables
  The strength of an instruction corresponds to its execution cost
Benefit: fewer and less expensive operations
Before:
  t4 = 0
  label_XXX
  j = j + 1
  t4 = 4 * j
  t5 = a[t4]
  if (t5 > v) goto label_XXX
After:
  t4 = 0
  label_XXX
  t4 += 4
  t5 = a[t4]
  if (t5 > v) goto label_XXX

Loop Invariant Computations
Any part of a computation that does not depend on the loop variable and which is not subject to side effects can be moved out of the loop entirely
Try to keep the computations within the loop simple anyway, and be prepared to move invariant computations out yourself: there may be some situations where you know the value won't vary, but the compiler is playing it safe in case of side-effects
"Computation" here doesn't mean just arithmetic; array indexing, pointer dereferencing, and calls to pure functions are all possible candidates for moving out of the loop.
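A minimal sketch of hoisting a pure function call, using hypothetical function names: strlen(s) does not change while the loop runs, so it can be evaluated once instead of on every loop test.

```c
#include <stddef.h>
#include <string.h>

/* Before: strlen(s) appears in the loop test and may be re-evaluated
   on every iteration if the compiler cannot prove s is unchanged. */
size_t count_char_slow(const char *s, char c)
{
    size_t count = 0;
    for (size_t i = 0; i < strlen(s); i++)
        if (s[i] == c)
            count++;
    return count;
}

/* After: the invariant length is computed once, outside the loop. */
size_t count_char_hoisted(const char *s, char c)
{
    size_t count = 0;
    const size_t len = strlen(s);
    for (size_t i = 0; i < len; i++)
        if (s[i] == c)
            count++;
    return count;
}
```

Some compilers hoist this particular call themselves; doing it by hand removes the dependence on that analysis succeeding.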

Loop Invariant Computations
In loops which call other functions, you might be able to get some speedup by ripping the subroutines apart, figuring out which parts of them are loop-invariant for that particular loop in their caller, and calling those parts ahead of time
This is not very easy and seldom leads to much improvement unless you're calling subroutines which open and close files repeatedly, or malloc and free large amounts of memory on each call, or something else drastic.
A common but not-always-optimized-away case is the repeated use of an expression in successive statements

Loop Invariant Computations
Old code:
  total =
      a->b->c[4]->aardvark +
      a->b->c[4]->baboon +
      a->b->c[4]->cheetah +
      a->b->c[4]->dog;
New code:
  struct animals * temp = a->b->c[4];
  total =
      temp->aardvark +
      temp->baboon +
      temp->cheetah +
      temp->dog;

Code For Common Case
In a section of code which deals with several alternative situations, place at the beginning the tests and the code for the situations which occur most often
Frequently, this takes the form of a long train of mutually exclusive if-then-else's, of which only one will get executed.
By placing the most likely one first, fewer if's will need to be performed over the long term.
But if the conditions are simple things like x == 3, consider using a switch statement

Tail Recursion Elimination (TRE)
When a recursive function calls itself, an optimizer can, under some conditions, replace the call with an assembly-level equivalent of a "goto" back to the top of the function
This saves the effort of growing the stack, saving and restoring registers, and any other function call overhead
For very small recursive functions that make zillions of recursive calls, TRE can result in a substantial speedup
With proper design, TRE can take a recursive function and turn it into whatever is the fastest form of loop for the machine

Tail Recursion Elimination (TRE)
  int isemptystr(char * str)
  {
      if (*str == '\0') return 1;
      else if (! isspace(*str)) return 0;
      else return isemptystr(++str);
  }
The above can have TRE applied to the final return statement because the returned value from this invocation of isemptystr will be exactly that of the n+1th invocation, with no further computation.
And now a counterexample:
  int factorial(int num)
  {
      if (num == 0) return 1;
      else return num * factorial(num - 1);
  }

Tail Recursion Elimination (TRE)
The above cannot have TRE applied because the returned value is not used directly: it is multiplied by num after the call, so the state of that invocation must be maintained until after the return. Even a compiler that supports TRE cannot use it here.
And now a counter-counterexample, a rewrite of the factorial program to allow TRE optimization.
  int factorial(int num, int factor)
  {
      if (num == 0) return factor;
      else return factorial(num - 1, factor * num);
  }
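To make the effect of TRE concrete, here is a sketch of the loop that a TRE-capable optimizer would effectively produce from the two-argument factorial. The function names factorial_tail and factorial_loop are hypothetical, chosen so both forms can coexist; the hand-written loop is an illustration, not compiler output.

```c
/* Tail-recursive form: the recursive call is the last thing done. */
int factorial_tail(int num, int factor)
{
    if (num == 0) return factor;
    else return factorial_tail(num - 1, factor * num);
}

/* Equivalent loop: the parameters are updated in place and control
   jumps back to the top instead of making a new call. */
int factorial_loop(int num, int factor)
{
    while (num != 0) {
        factor *= num;
        num -= 1;
    }
    return factor;
}
```

Callers pass 1 as the initial factor, for example factorial_loop(5, 1); like the original, both forms assume num is non-negative and small enough that the result fits in an int.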

Table Lookup
Consider using lookup tables, especially if a computation is iterative or recursive, e.g. convergent series or factorial. (Calculations that take constant time can often be recomputed faster than they can be retrieved from memory and so do not always benefit from table lookup.)
Old code:
  long factorial(int i)
  {
      if (i == 0)
          return 1;
      else
          return i * factorial(i - 1);
  }
New code:
  static long factorial_table[] =
      {1, 1, 2, 6, 24, 120, 720 /* etc */};
  long factorial(int i)
  {
      return factorial_table[i];
  }

Sorting
For nearly all situations, the library qsort function is speedy enough to make implementation of your own sort algorithm unnecessary
Often the strcmp optimizations are helpful.

Variables
Avoid referring to global or static variables inside the tightest loops
Don't use the volatile qualifier unless you really mean it
Avoid passing addresses of your variables to other functions
  The optimizer has to assume that the called function is capable of stashing a pointer to this variable somewhere and so the variable could get modified as a side effect of calling what seems like a totally unrelated function.

Variables
Example:
  a = b();
  c(&d);
Because d has had its address passed to another function, the compiler can no longer leave it in a register across function calls.
It can however leave the variable a in a register
The register keyword can be used to track down problems like this; if d had been declared register, the compiler would have to issue a diagnostic that its address had been taken.

Function Calls
Function calls interrupt an optimizer's train of thought in a drastic way
  Any references through pointers or to global variables are now "dirty" and need to be saved/restored across the function call
  Local variables which have had their address taken and passed outside the function are also now dirty
There is some overhead to the function call itself as the stack must be manipulated and the program counter altered by whatever mechanism the CPU uses.

Function Calls
If the function being called happens to be paged out, there will be a very long delay before it gets read back in
  For functions called in a loop it's unusual for the called function to be paged out until the loop is finished, but if virtual memory is scarce, calls to other functions in the same loop may demand the space and force the other function out, leading to thrashing
  Most linkers respect the order in which you list object files, so you can try to get functions near each other in hopes that they'll land on the same page.

Digestibility
Straight-line code, even with an extra statement or two, will run faster than code full of if's, &&'s, switch's, and goto's
Pipelining processors are much happier with a steady diet of sequential instructions than a bunch of branches, even if the branches skip some unnecessary sections.

String Operations
Most of the C library str* and mem* functions operate in time proportional to the length(s) of the string(s) they are given
It's quite easy to loop over calls to these and wind up with a significant bottleneck.

String Operations
strlen
  Avoid calling strlen() during a loop involving the string itself
  Even if you're modifying the string, it should be possible to rewrite it so that you set x = strlen() before the loop and then x++ or x-- when you add or remove a character.
strcat
  When building up a large string in memory using strcat, it will scan the full (current) length of the string on each call
  If you've been keeping track of the length anyway (see above) you can index directly to the end of the string and strcpy or memcpy to there.
strcmp
  You can save a little time by checking the first characters of the strings in question before doing the call
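The strcat advice can be sketched as follows. struct strbuf and strbuf_append are hypothetical names invented for this illustration: the buffer carries its current length, so each append copies straight to the end instead of rescanning the string.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: a caller-owned buffer plus its tracked length.
   data must always hold a valid string with len < cap. */
struct strbuf {
    char  *data;
    size_t len;   /* current strlen(data), maintained by the code below */
    size_t cap;   /* total bytes available in data */
};

/* Appends src at the known end; returns 0 on success, -1 if it will not fit. */
int strbuf_append(struct strbuf *sb, const char *src)
{
    size_t n = strlen(src);
    if (n >= sb->cap - sb->len)              /* need n chars plus '\0' */
        return -1;
    memcpy(sb->data + sb->len, src, n + 1);  /* copies the terminator too */
    sb->len += n;
    return 0;
}
```

With repeated strcat the total work grows with the square of the final length; with a tracked length each append costs only the length of the piece being added.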

Stack Usage
A typical cause of stack-related problems is having large arrays as local variables
In that case the solution is to rewrite the code so it can use a static or global array, or perhaps allocate it from the heap
A similar solution applies to functions which have large structs as locals or parameters
Recursive functions, even ones which have few and small local variables and parameters, can still affect performance

Stack Usage
  int func1()
  {
      int a, b, c, etc;
      do_stuff(a, b, c);
      if (some_condition)
          return func2();
      else
          return 1;
  }

Code It In Assembly
Estimates vary widely, but a competent human writing assembly-level code can produce code which runs about 10% faster than what a compiler with full optimization on would produce from well-written high-level source

Shared Library Overhead
Calling a dynamically linked function is slightly slower than calling the same function when it is statically linked

Machine-specific Optimization
As with other machine-specific code, you can use #ifdef to set off sections of code which are optimized for a particular machine
Compilers don't predefine RISC or SLOW_DISK_IO or HAS_VM or VECTORIZING, so you'll have to come up with your own and encode them into a makefile or header file.

Optimizing sorts
Almost 60% of time spent in strcmp called by insert_sort
strcmp compares two strings and returns int
  0 if equal, negative if first is "less than" second, positive otherwise
Replace strcmp(a,b) call with some initial compares

Optimizing sorts
  if (a[0] < b[0]) {
      result is neg
  }
  if (a[0] == b[0]) {
      if (a[1] < b[1]) {
          result is neg
      }
      if (a[1] == b[1]) {
          if (strcmp(a,b)
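A reduced sketch of the same idea, checking only the first character before falling back to strcmp. The name fast_strcmp is hypothetical; the cast to unsigned char mirrors how strcmp itself orders characters.

```c
#include <string.h>

/* Returns <0, 0 or >0 like strcmp, but settles the common case of
   differing first characters without a library call. */
int fast_strcmp(const char *a, const char *b)
{
    unsigned char ca = (unsigned char)a[0];
    unsigned char cb = (unsigned char)b[0];

    if (ca != cb)
        return (ca < cb) ? -1 : 1;

    /* First characters match (possibly both '\0'): do the full compare. */
    return strcmp(a, b);
}
```

The gain depends on how often the first characters differ in the data being sorted, and some compilers already inline strcmp, so the profile should be checked before and after.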
Memory-Bound
Locality of Reference; Column-major Accessing; Don't Copy Large Things; Split or Merge Arrays; Reduce Padding; Increase Padding
March Forward; Beware the Power of Two; Memory Leaks; Be Stringy; Hinting; Fix the Problem in Hardware; Cache Profilers

Locality of Reference
Locality of reference significantly improves memory performance through the use of caches
Temporal locality
  most recently used data items or instructions are more likely to be available in cache
Spatial locality
  data items or instructions that are close together in memory are more likely to be in cache when needed
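Spatial locality can be illustrated with a two-dimensional array. This is a sketch with hypothetical names and arbitrarily chosen sizes: C stores each row contiguously, so iterating with the column index innermost touches consecutive addresses, while iterating with the row index innermost jumps a whole row between accesses.

```c
enum { ROWS = 64, COLS = 64 };   /* sizes chosen arbitrarily */

/* Row order: the inner loop walks consecutive memory locations. */
long sum_row_order(int m[ROWS][COLS])
{
    long total = 0;
    for (int r = 0; r < ROWS; r++)
        for (int c = 0; c < COLS; c++)
            total += m[r][c];
    return total;
}

/* Column order: the inner loop strides COLS ints between accesses. */
long sum_column_order(int m[ROWS][COLS])
{
    long total = 0;
    for (int c = 0; c < COLS; c++)
        for (int r = 0; r < ROWS; r++)
            total += m[r][c];
    return total;
}
```

Both functions return the same sum; only the memory access pattern differs, and the difference in run time typically shows up once the array is larger than the cache.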

Locality Of Reference: Using The Cache
Try to keep data as close to the CPU as possible
  align data structures and data access to cache line boundaries
    __attribute__ ((aligned (L1_CACHE_BYTES) ))
  place most frequently used structure members first
  allow the compiler to pad structure members to the CPU's preferred data alignment
  avoid array sizes and array strides that are integer multiples of the cache size
The cache line prefetch
  use the prefetch instruction ahead of a memory access to make it likely that data is available in cache when it's needed (see linux/prefetch.h)
  using the prefetch instruction is not portable to all platforms

Locality Of Reference: Using Virtual Memory
TLB - Translation Lookaside Buffer
  this cache is used to store recent translations between virtual and physical addresses
  the fewer memory pages used, the more effective the utilization of the TLB
  every miss in this table requires a page-table lookup to make the translation (done by the kernel on machines with a software-managed TLB) -- this is an expensive operation
Paging
  memory pages can be swapped out to disk when physical memory is used up
  programs should manipulate data in small working sets to minimize page faults

Locality Of Reference: Better Stack Use
Reduce function call penalties
  instead of passing many parameters to a function, use a structure pointer
  ask the compiler to pass up to X parameters using registers
    __attribute__ ((regparm (X) ))

Don't Copy Large Things
Instead of copying strings, arrays, or large structs, consider copying a pointer to them
ANSI C now requires that structs are pass-by-value like everything else
  If you have extraordinarily large structs, or are making millions of function calls on medium-sized ones, you might consider passing the struct's address instead, after modifying the called function so that it doesn't perturb the contents of the struct.

Split Or Merge Arrays
If the parts of your program making heaviest use of memory are doing so by accessing elements in "parallel" arrays, you can combine them into an array of structs so that the data for a given index is kept together in memory.
If you already have an array of structs, but find that the critical part of your program is accessing only a small number of fields in each struct, you can split these fields into a separate array so that the unused fields do not get read into the cache unnecessarily.

Reduce Padding
Arrange similarly-typed fields together in a structure with the most restrictively aligned types first - there may still be padding at the end
Old code:
  /* sizeof = 64 bytes (assuming 4-byte int and long, 8-byte aligned double) */
  struct foo {
      float a;
      double b;
      float c;
      double d;
      short e;
      long f;
      short g;
      long h;
      char i;
      int j;
      char k;
      int l;
  };
New code:
  /* sizeof = 48 bytes (same assumptions) */
  struct foo {
      double b;
      double d;
      long f;
      long h;
      float a;
      float c;
      int j;
      int l;
      short e;
      short g;
      char i;
      char k;
  };
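Because the exact sizes depend on the platform's type sizes and alignment rules, it can help to let the compiler report them. The sketch below uses two small hypothetical structs (not the slide's struct foo) and sizeof to show how much reordering saves on the machine at hand.

```c
#include <stddef.h>

/* Fields interleaved so each char is followed by a more strictly
   aligned double, which usually forces padding after every char. */
struct mixed_order {
    char   i;
    double b;
    char   k;
    double d;
};

/* Same fields, most restrictively aligned types first. */
struct sorted_order {
    double b;
    double d;
    char   i;
    char   k;
};

/* Bytes saved per struct by reordering (0 if the ABI adds no padding). */
size_t padding_saved(void)
{
    return sizeof(struct mixed_order) - sizeof(struct sorted_order);
}
```

On a typical ABI with 8-byte aligned doubles the first struct occupies 32 bytes and the second 24, but that is an expectation about common platforms, not something the C standard fixes.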

Increase Padding
Increasing the size and alignment of a data structure to match (or to be an integer fraction or multiple of) the cache line size may increase performance.
The alignment is harder to control, but usually one of these techniques will work:
  Use malloc instead of a static array. Some mallocs automatically allocate storage suitably aligned for cache lines
  Allocate a block twice as large as you need, then point wherever in it that satisfies the alignment you need.
  Use an alternate allocator (e.g. memalign) which guarantees minimal alignment.
  Use the linker to assign specific addresses or alignment requirements to symbols.
  Wedge the data into a known position inside another block which is already aligned.

March Forward
Theoretically, it makes no difference whether you iterate over an array forwards or backwards, but some caches are of a "predictive" type that tries to read in successive cache lines even before you need them
Because these caches must work quickly, they tend to be fairly dim and rarely have the extra logic for predicting backwards traversal of memory pages.

Beware The Power Of Two
Consider a direct-mapped 1MB cache with 128-byte cache lines and a program which uses 16MB of memory, all happening on a machine with 32-bit addresses
The simplest way for the cache to map the memory into the cache is to mask off the first 12 and the last 7 bits of the address, then shift to the right 7 bits
What we end up with is a cache of 8192 (2^13) lines in which any two addresses exactly 1MB (2^20 bytes) apart in main memory map to the same cache line
If the program happens to use an array of 8192-byte structs, and refers to just one element in each one while processing the whole array, the accesses land on only 128 of the 8192 cache lines, so lines are evicted and reloaded while most of the cache sits unused, which is a considerable delay
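The mapping described above (drop the top 12 bits and the low 7 bits of a 32-bit address) can be written out directly. The sketch below uses hypothetical function names and simply counts how many distinct cache lines a strided access pattern touches under that mapping.

```c
#include <stdint.h>

enum { LINE_BITS = 7, INDEX_BITS = 13 };   /* 128-byte lines, 8192 lines */

/* Cache line selected by an address: shift out the byte offset,
   then keep the 13 index bits. */
uint32_t cache_line_index(uint32_t addr)
{
    return (addr >> LINE_BITS) & ((1u << INDEX_BITS) - 1u);
}

/* Number of distinct cache lines touched by `count` accesses that
   start at address 0 and are `stride` bytes apart. */
unsigned distinct_lines(uint32_t stride, unsigned count)
{
    unsigned char seen[1u << INDEX_BITS] = {0};
    unsigned distinct = 0;

    for (unsigned k = 0; k < count; k++) {
        uint32_t line = cache_line_index((uint32_t)k * stride);
        if (!seen[line]) {
            seen[line] = 1;
            distinct++;
        }
    }
    return distinct;
}
```

With these parameters, 2048 accesses at a stride of 8192 bytes (16MB in total) reach only 128 distinct lines, whereas padding each struct by one cache line (stride 8320) spreads the same 2048 accesses over 2048 different lines.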

Reducing Memory Usage
Because ROM is usually cheaper than RAM (on a per-byte basis), one acceptable strategy for reducing the amount of global data might be to move constant data into ROM
This can be done automatically by the compiler if you declare all of your constant data with the keyword const
Most C/C++ compilers place all of the constant global data they encounter into a special data segment that is recognizable to the locator as ROM-able

Reducing Memory Usage
This technique is most valuable if there are lots of strings or table-oriented data that does not change at runtime
Stack size reductions can also lower a program's RAM requirement
Be especially conscious of stack space if you are using a real-time operating system
Most operating systems create a separate stack for each task
These stacks are used for function calls and interrupt service routines that occur within the context of a task

Reducing Memory Usage
To reduce the stack size:
You can determine the amount of stack required for each task stack: fill the entire memory area reserved for the stack with a special data pattern
Then, after the software has been running for a while, preferably under both normal and stressful conditions, use a debugger to examine the modified stack
The part of the stack memory area that still contains your special data pattern has never been overwritten, so it is safe to reduce the size of the stack area by that amount
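The fill-and-inspect procedure can also be automated instead of using a debugger. The sketch below is an illustration under stated assumptions: the names and the 0xA5 pattern are hypothetical, the stack area is modelled as a plain byte array, and the stack is assumed to grow downward so that untouched bytes sit at the low end of the area.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_FILL 0xA5u   /* arbitrary pattern unlikely to occur by chance */

/* Fill the whole reserved stack area before the task starts running. */
void stack_paint(uint8_t *area, size_t size)
{
    for (size_t i = 0; i < size; i++)
        area[i] = STACK_FILL;
}

/* Count the bytes at the low end that still hold the pattern, i.e.
   the part of the area the task has never written to. */
size_t stack_unused(const uint8_t *area, size_t size)
{
    size_t n = 0;
    while (n < size && area[n] == STACK_FILL)
        n++;
    return n;
}
```

The count is only as good as the test run that preceded it, so it is prudent to keep some margin rather than shrinking the stack by the full unused amount.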

Reducing Memory Usage
If the heap is too small, your program will not be able to allocate memory when it is needed, so always be sure to compare the result of malloc or new with NULL before dereferencing it
If you've tried all of these suggestions and your program is still requiring too much memory, you might have no choice but to eliminate the heap altogether
IO-Bound
Things you can try:
  Sequential Access
  Random Access
  Terminals
  Sockets
  SFIO
Tune your file-descriptors and sockets
  there are many options you could tweak
Some IO-bound programs can be translated to memory-bound with the use of mmap(2).

IO-Bound
I/O (in Unix) usually puts your process to sleep for a time
The request for I/O may finish fairly quickly but perhaps some other process is busy using the CPU and your process has to wait
Because the length of the wait is somewhat arbitrary and depends on what exactly the other processes are doing, I/O-bound programs can be tricky to optimize.

Sequential Access
Buffered I/O is usually (but not always) faster than unbuffered.
If you aren't worried about portability you can try using lower level routines like read() and write() with large buffers and compare their performance to fread() and fwrite()
  Using read() or write() in a single-character-at-a-time mode is especially slow on Unix machines because of the system call overhead.

Sequential Access
Consider using mmap() if you have it
This can save effort in several ways
  The data doesn't have to go through stdio, which saves a buffer copy
  Depending on the sophistication of the paging hardware, the data need not even be copied into user space; the program can just access an existing copy
  mmap() also lends itself to read-ahead; theoretically the entire file could be read into memory before you even need it
  Lastly, the file can be paged directly off the source disk and doesn't have to use up virtual memory.

Random Access
Consider mmap() if you have it
If there's a trade-off between I/O bound and memory bound in your program, consider a lazy-free of records: when memory gets tight, free unmodified records and write out modified records (to a temporary file if need be) and read them in later
  Though if you take the disk space you'd use to write the records out and just add it to the paging space instead, you'll save yourself a lot of hassle.

Asynchronous I/O
You can set up a file descriptor to be non-blocking (see ioctl(2) or fcntl(2) man pages) and arrange to have it send your process a signal when the I/O is done; in the meantime your process can get something else done, including perhaps sending off other I/O requests to other devices
  Significant parallelism may result at the cost of program complexity.
Multithreading packages can aid in the construction of programs which utilize asynchronous I/O.

Terminals
If your program spews out a lot of data to the screen, it's going to run slow on a 1200 baud line
Waiting for the screen to catch up stops your program.
This doesn't add to the CPU or disk time as reported for accounting purposes, but it sure seems slow to the user
A general solution is to provide a way for the user to squelch out irrelevant data

Gotchas
1. Programmers tend to over-estimate the usefulness of the programs they write. The approximate value of an optimization is:
   (number of runs * number of users * time savings * user's salary) - (time spent optimizing * programmer's salary)
   Even if the program will be run hundreds of times by thousands of users, an extra day spent saving 40 milliseconds probably isn't going to help.

Gotchas
2. Machines are not created equal. What's fast on one machine may be slow on another.
3. Don't get into the habit of writing code according to the above rules of optimization. Only apply them after you have discovered exactly which function is the problem. Some of the rules if applied globally would make the program even slower.
4. Spending a week optimizing a program can easily cost thousands of dollars in programmer time. Sometimes, it's easier to just buy a faster CPU or more memory or a faster disk and solve the problem that way.
5. Novices often assume that writing lots of statements on a single line and removing spaces and tabs will speed things up. While that may be a valid technique for some interpreted languages, it doesn't help at all in C.

Architectural/Code Optimizations
Often, it is important to understand the architecture's implementation in order to effectively optimize code
  Much more difficult for compilers to do because it requires a different compiler back-end for every implementation
One example of this is the ARM barrel shifter
  Can convert Y * Constant into series of adds and shifts
  Y * 9 = Y * 8 + Y * 1
  Assume R1 holds Y and R2 will hold the result
  ADD R2, R1, R1, LSL #3 ; LSL #3 is same as * by 8
Another example is the ARM 7500 write buffer specifics
Safe C

Safety Violation
incorrect type casts
dangling-pointer dereferences
data races
Uninitialized memory
NULL-pointer dereferences
array-bounds violations
incorrect use of unions

Type Equality for Parameters
  void f2(void **p, void *x) { *p = x; }
Type safety requires that p points to a value with the same type as x
Without this equality, a use of f2 can violate memory safety:
  int y = 0;
  int * z = &y;
  f2(&z, 0xABC);
  *z = 123;
Other functions with the same type, such as f2ok, could allow &z and 0xABC as arguments:
  void f2ok(void **p, void *x) { if(*p==x) printf("same"); }

Dangling Stack Pointers
Dereferencing a dangling pointer means accessing a data object after it has been deallocated. A call to g attempts to write 123 to address 0xABC.
  int * f1() {
      int x = 0;
      return &x;
  }
  int ** f2() {
      int * y = 0;
      return &y;
  }
  void g() {
      int * p1 = f1();
      int ** p2 = f2();
      *p1 = 0xABC;
      **p2 = 123;
  }
To avoid memory exhaustion, a garbage collector reclaims memory implicitly.

data races - Pointer Race Condition
  int g1 = 0;
  int g2 = 0;
  int * gp = &g1;
  void f1(int **x) { *x = &g2; }
  int f2() { spawn(f1,&gp,sizeof(int*)); return *gp; }
If an invocation of f2 reads gp while an invocation of f1 writes gp, the read could produce an unpredictable bit-string

Uninitialized Memory
  void f() {
      int * p1;
      int ** p2 = malloc(sizeof(int*));
      *p1 = 123;
      **p2 = 123;
  }

NULL Pointers
The compiler inserts only one check into this code:
  int f(int *p, int *q, int **r) {
      int ans = 0;
      if(p == NULL) return 0;
      ans += *p;
      ans += *q; // inserted check
      *r = NULL;
      ans += *q;
      return ans;
  }

Array-Bounds Checking
  void write_v(int v, unsigned sz, int *arr) {
      for(unsigned i=0; i < sz; ++i)
          arr[i] = v;
  }
To violate safety, clients can pass a value for sz greater than the length of arr.
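In plain C the callee cannot discover the real length of arr, so one common defensive pattern is to pass the capacity alongside the pointer and check against it. The function below is a hypothetical variant written for illustration, not part of the slides.

```c
#include <stddef.h>

/* Hypothetical checked variant: arr_len is the number of ints that
   arr really has room for, as known by the caller.
   Returns 0 on success, -1 if the request would overrun the array. */
int write_v_checked(int v, size_t sz, int *arr, size_t arr_len)
{
    if (arr == NULL || sz > arr_len)
        return -1;

    for (size_t i = 0; i < sz; ++i)
        arr[i] = v;
    return 0;
}
```

This only moves the trust boundary: the check is sound as long as the caller reports arr_len truthfully, which is exactly the kind of invariant a safe dialect or static checker tries to enforce automatically.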

incorrect use of unions
C programs that use the same memory for different types of data need casts or union types, but both are notoriously unsafe
It is common to use an int (or enum) field to record the type of data currently in the memory; this field discriminates which variant occupies the memory
Programmers must correctly maintain and check the tag
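The tag discipline described above can be sketched as a small tagged union. All names here are hypothetical; the point is that the tag is set in the same place the variant is written, and every read checks the tag first.

```c
/* Which variant currently occupies the union. */
enum value_tag { VALUE_INT, VALUE_DOUBLE };

struct value {
    enum value_tag tag;
    union {
        int    i;
        double d;
    } u;
};

/* Constructors set the tag and the matching variant together. */
struct value make_int(int i)
{
    struct value v;
    v.tag = VALUE_INT;
    v.u.i = i;
    return v;
}

struct value make_double(double d)
{
    struct value v;
    v.tag = VALUE_DOUBLE;
    v.u.d = d;
    return v;
}

/* Accessor checks the tag; returns 0 and stores the int on success,
   -1 if the union currently holds a different variant. */
int value_as_int(const struct value *v, int *out)
{
    if (v->tag != VALUE_INT)
        return -1;
    *out = v->u.i;
    return 0;
}
```

Routing all access through functions like these keeps the tag and the data consistent, although nothing in C stops other code from writing to the union members directly.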

Types of Development Tools
Compilation and building: make
Managing files: RCS, SCCS, CVS
Editors: vi, emacs
Archiving: tar, cpio, pax, RPM
Configuration: autoconf
Debugging: gdb, dbx, prof, strace, purify
Programming tools: yacc, lex, lint, indent