View
213
Download
0
Tags:
Embed Size (px)
Citation preview
Peephole Optimization
Final pass over generated code:
examine a few consecutive instructions: 2 to 4
See if an obvious replacement is possible: store/load pairs
MOV %eax => memaMOV mema => %eax
Can eliminate the second instruction without needing any global knowledge of mema
Use algebraic identities
Special-case individual instructions
Algebraic identities
worth recognizing single instructions with a constant operand:
A * 2 = A + A
A * 1 = A
A * 0 = 0
A / 1 = A
More delicate with floating-point
Is this ever helpful?
Why would anyone write X * 1?
Why bother to correct such obvious junk code?
In fact one might write
#define MAX_TASKS 1...a = b * MAX_TASKS;
Also, seemingly redundant code can be produced by other
optimizations. This is an important effect.
Replace Multiply by Shift
A := A * 4; Can be replaced by 2-bit left shift (signed/unsigned)
But must worry about overflow if language does
A := A / 4; If unsigned, can replace with shift right
But shift right arithmetic is a well-known problem
Language may allow it anyway (traditional C)
Addition chains for multiplication
If multiply is very slow (or on a machine with no multiply instruction like the original SPARC), decomposing a constant operand into sum of powers of two can be effective: X * 125 = x * 128 - x*4 + x two shifts, one subtract and one add, which may
be faster than one multiply Note similarity with efficient exponentiation method
The Right Shift problem
Arithmetic Right shift:
shift right and use sign bit to fill most significant bits
-5 111111...1111111011
SAR 111111...1111111101
which is -3, not -2
in most languages -5/2 = -2
Prior to C99, implementations were allowed to truncate towards or away from zero if either operand was negative
Folding Jumps to Jumps
A jump to an unconditional jump can copy the target address
JNE lab1 ...lab1 JMP lab2
Can be replaced by JNE lab2 As a result, lab1 may become dead (unreferenced)
Jump to Return
A jump to a return can be replaced by a return JMP lab1
... lab1 RET
Can be replaced by RET lab1 may become dead code
Tail Recursion Elimination
A subprogram is tail-recursive if the last computation is a call to itself:
function last (lis : list_type) return lis_type is
begin
if lis.next = null then return lis;
else return last (lis.next);
end;
Recursive call can be replaced with
lis := lis.next;
goto start; -- added label
Advantages of tail recursion elimination
saves time: an assignment and jump is faster than a call with one parameter
saves stack space: converts linear stack usage to constant usage.
In languages with no loops, this may be a required optimization: specified in Scheme standard.
Tail-recursion elimination at the Instruction Level
Consider the sequence on the x86: CALL func
RET CALL pushes return point on stack, RET in body
of func removes it, RET in caller returns Can generate instead:
JMP func Now RET in func returns to original caller,
because single return address on stack
Peephole optimization in the REALIA COBOL compiler
Full compiler for Standard COBOL, targeted to the IBM PC.
Now distributed by Computer Associates
Runs in 150K bytes, but must be able to handle very large programs that run on mainframes
No global optimization possible: multiple linear passes over code, no global data structures, no flow graph.
Multiple peephole optimizations, compiler iterates until code is stable. Each pass scan code backwards to minimize address recomputations
Typical COBOL code: control structures and perform blocks.
Process-Balance. if Balance is negative then perform Send-Bill else perform Record-Credit end-if.Send-Bill. ...Record-Credit. ...
Simple Assembly: perform equivalent to call
Pb: cmp balance, 0 jnl L1 call Sb jmp L2L1: call RcL2: retSb: ... retRc: ... ret
Fold jump to return statement
Pb: cmp balance, 0 jnl L1 call Sb jmp L2 -- jump to return
L1: call Rc L2: retSb: ... retRc: ... ret
Corresponding Assembly
Pb: cmp balance, 0 jnl L1 -- jump to unconditional jump call Sb ret -- folded L1: jmp Rc -- will become useless L2: ret Sb: ... retRc: ... ret
code following a jump is unreachable
Pb: cmp balance, 0 jnl Rc -- folded jmp Sb ret -- unreachable L1: jmp Rc -- unreachable Sb: ... retRc: ... ret
Jump to following instruction is a noop
Pb: cmp balance, 0 jnl Rc jmp Sb -- jump to next instruction Sb: ... retRc: ... ret
Final code
Pb: cmp balance, 0 jnl RcSb: ... retRc: ... ret
Final code as efficient as inlining. All transformations are local. Each optimization
may yield further optimization opportunities Iterate till no further change
Arcane tricks
Consider typical maximum computation if A >= B then
C := A;else C := B;end if;
For simplicity assume all unsigned, and all in registers
Eliminating max jump on x86
Simple-minded asm code CMP A, B
JNAE L1 MOV A=>C JMP L2L1: MOV B=>CL2:
One jump in either case
Computing max without jumps on X86
Architecture-specific trick: use subtract with borrow instruction and carry flag
CMP A, B ; CF=1 if B > A, CF = 0 if A >= B SBB %eax,%eax ; all 1's if B > A, all 0's if A >= B MOV %eax, C NOT C ; all 0's if B > A, all 1's if A >= B AND B=>%eax ; B if B>A, 0 if A>=B AND A=>C ; 0 if B >A, A if A>=B OR %eax=>C ; B if B>A, A if A>=B
More instructions, but NO JUMPS
Supercompiler: exhaustive search of instruction patterns to uncover similar tricks