Retargetting of VPO to the tms320c54x - a status report
Presented by Joshua George
Advisor: Dr. Jack Davidson
Status
Register assignment and allocation
Common sub-expression elimination
Constant propagation/Copy propagation
Induction variable elimination
Code motion
Recurrence detection
Status (continued)
Strength reduction
Instruction selection
Dead code elimination
Constant folding (simp())
Branch minimization
Support for repeat blocks
The tms320c54x
One 40-bit ALU and two 40-bit accumulators, A and B (r[0] and r[2] in vpo)
One 17x17-bit parallel multiplier with an adder for single-cycle MAC (multiply-accumulate) operations
One barrel shifter
Eight 16-bit address registers, AR0-AR7 (w[0]..w[7] in vpo)
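To make the accumulator width concrete, here is a small C model of a MAC step on a 40-bit accumulator. It is a sketch under stated assumptions: the helper names (wrap40, mac, dot40) are hypothetical, and overflow mode, fractional mode and saturation are deliberately ignored.

```c
#include <stdint.h>

/* Reduce a value to 40-bit two's complement, the width of accumulators A and B. */
static int64_t wrap40(int64_t v)
{
    uint64_t low = (uint64_t)v & ((UINT64_C(1) << 40) - 1);  /* keep bits 0..39 */
    if (low & (UINT64_C(1) << 39))                            /* bit 39 is the sign bit */
        return (int64_t)low - ((int64_t)1 << 40);
    return (int64_t)low;
}

/* One MAC step, acc += x * y. The 16-bit signed operands are widened
   (to 17 bits in hardware, to int32_t here), so the product always fits.
   Overflow mode, fractional mode and saturation are ignored in this sketch. */
static int64_t mac(int64_t acc, int16_t x, int16_t y)
{
    int32_t product = (int32_t)x * (int32_t)y;
    return wrap40(acc + product);
}

/* Dot product accumulated entirely in one 40-bit accumulator. */
static int64_t dot40(const int16_t *a, const int16_t *b, int n)
{
    int64_t acc = 0;
    for (int i = 0; i < n; i++)
        acc = mac(acc, a[i], b[i]);
    return acc;
}
```

For the compiler, the consequence is that intermediate results live in 40 bits while the values held in address registers and memory are 16 bits, which is where the sign-extensions and ANDs in vpo's RTLs come from.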
Compiler writer woes
Address arithmetic: only a constant can be added to an address register. This causes complications in the optimizer (e.g., in the strength reduction code).
Interesting note:
r[0]=(w[0]{24)}24; r[0]=r[0]+1; w[1]=r[0];  /* w[1]=w[0]+1 gets rejected */
W[w[1]]=50;                                 /* by instruction selection */
---------------------------
w[0]=w[0]+1; W[w[0]+1]=50;
The first sequence cannot normally collapse into the more efficient second sequence. But after minimize_registers, instruction selection is able to fold them into a single instruction.
Compiler writer woes
16-bit word addressing: required special-case handling in the lcc front end.
Only 2 accumulator registers.
  The Local Register Assigner had to be fixed to handle this.
  Lots of spills, so vpo was refined to use memory disambiguation techniques (maybe_same()) in instruction selection.
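A minimal sketch of the kind of test a memory disambiguator can apply: two references through the same base register whose constant word ranges do not overlap cannot alias, so instruction selection may combine or reorder instructions across them. This is not vpo's maybe_same(); the MemRef struct and may_overlap() are hypothetical, and the sketch assumes the base register holds the same value at both references.

```c
#include <stdbool.h>

/* A memory reference W[base + offset] touching `size` 16-bit words.
   Hypothetical representation, not vpo's internal one. */
typedef struct {
    int  base_reg;   /* address register number, or -1 for an absolute address */
    long offset;     /* constant word offset */
    long size;       /* number of words accessed */
} MemRef;

/* Conservative alias test: returns false only when the two references
   provably do not overlap; otherwise it answers "maybe the same". */
static bool may_overlap(MemRef a, MemRef b)
{
    if (a.base_reg != b.base_reg)
        return true;   /* different bases: nothing is known, assume overlap */
    /* same base value: the word ranges [offset, offset+size) must intersect */
    return a.offset < b.offset + b.size && b.offset < a.offset + a.size;
}
```

With spill slots addressed off the same frame register, W[w[7]+0] and W[w[7]+1] are reported as independent, while any reference through an unrelated base stays conservatively "maybe the same".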
Compiler writer woes
No pipeline interlocks, so pipeline conflicts are unprotected and must be resolved by the compiler.
40-bit accumulator: needed a major change to simp(), and makes the machine description complicated, with sign-extends and ANDs.
Global data is placed in a special cinit section and relocated to RAM at run time. VISTA/EASE code instrumentation had to be done differently from other targets.
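The simp() change can be illustrated with the sign-extension idiom itself. Reading `{` as left shift and `}` as arithmetic right shift (an assumption about the RTL notation, consistent with (w[0]{24)}24 sign-extending a 16-bit value in a 40-bit register), a constant folder has to evaluate the shifts at the register's width rather than the host's. The function names are hypothetical.

```c
#include <stdint.h>

/* Interpret the low `width` bits of v as a two's-complement number (width < 64). */
static int64_t as_signed(uint64_t v, int width)
{
    v &= (UINT64_C(1) << width) - 1;
    if (v & (UINT64_C(1) << (width - 1)))
        return (int64_t)v - ((int64_t)1 << width);
    return (int64_t)v;
}

/* Arithmetic right shift without relying on implementation-defined >> of negatives. */
static int64_t sar(int64_t s, int k)
{
    return s >= 0 ? (s >> k) : -(((-s - 1) >> k) + 1);
}

/* Constant-fold (x{k)}k on a register that is `width` bits wide (width < 64):
   the left shift wraps at `width` bits, then the right shift is arithmetic. */
static int64_t fold_shl_sar(int64_t x, int k, int width)
{
    int64_t shifted = as_signed((uint64_t)x << k, width);
    return sar(shifted, k);
}
```

Folding (0x7F80{24)}24 with 40-bit semantics yields 32640, the sign-extended 16-bit value; a folder that used 32-bit arithmetic instead would produce -128.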
Compiler writer woes
Compare and jump: the induction variable and the value it is compared with are spread over two instructions. All targets until now had a simple compare-and-jump, so this resulted in a small change to the vpo lib/md interface.
E.g., AR1 (w[1]) is the induction variable and runs from 0 to 9. The loop exit check:
  SSBX SXM       ; // s[0]=1; (set sign-extension mode on)
  LD   *(AR1),A  ; // r[0]=(w[1]{24)}24;
  SUB  #10,A,A   ; // r[0]=r[0]-10;
  BC   L1,ALT    ; // PC=r[0],0?L1;
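The exit check can be modeled in C to show that the comparison is not a single operation: the operands meet in the accumulator, and the branch only sees the sign of A. The sketch assumes the induction variable is incremented just before the test (the rest of the loop body is elided), and sext16() and run_loop() are hypothetical names.

```c
#include <stdint.h>

/* Sign-extend a 16-bit register value, as LD *(AR1),A does with SXM set. */
static int64_t sext16(uint16_t w)
{
    return (w & 0x8000) ? (int64_t)w - 65536 : (int64_t)w;
}

/* Count loop-body executions for a bottom-tested loop whose exit check is
     LD *(AR1),A ; SUB #limit,A,A ; BC top,ALT
   Only the increment of AR1 is modeled; the rest of the body is elided. */
static int run_loop(uint16_t start, int64_t limit)
{
    uint16_t ar1 = start;                 /* AR1 (w[1]): induction variable */
    int iterations = 0;
    for (;;) {
        iterations++;                     /* loop body */
        ar1 = (uint16_t)(ar1 + 1);        /* w[1]=w[1]+1 */
        int64_t a = sext16(ar1);          /* LD  *(AR1),A : r[0]=(w[1]{24)}24 */
        a -= limit;                       /* SUB #10,A,A  : r[0]=r[0]-10 */
        if (a >= 0)                       /* BC  L1,ALT   : loop again only if A < 0 */
            break;
    }
    return iterations;
}
```

With AR1 starting at 0 and a limit of 10, the body runs ten times, matching the 0 to 9 range on the slide.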
Timeline of progress on this project: Spring 2002
Code expander completed (only basic addressing modes and instructions supported).
  Stack layout
  Calling sequence
  Data declarations
  Structure operations
Passes ctests/ptests with instruction selection.
Support for stdargs added.
Timeline of progress on this project: Fall 2002
Major changes to simp() to handle 40-bit arithmetic.
Enabled register coloring and CSE.
Lots of work on comp() to allow better instruction selection and other optimizations (e.g., w[1]=(((w[1]{24)}24)+1)&65535 now folds down to w[1]=w[1]+1; only now can strength reduction detect the induction variable).
Integrated VISTA into mainline vpo.
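Why the comp() fold is legal can be checked exhaustively: sign extension changes only the bits above bit 15, and the & 65535 discards exactly those bits, so (((w{24)}24)+c)&65535 equals w+c modulo 2^16 for every 16-bit w. The C sketch uses hypothetical helper names; it checks the identity for a few constants and also shows that the AND is essential.

```c
#include <stdbool.h>
#include <stdint.h>

/* (w{24)}24 on a 40-bit register: sign-extend the low 16 bits. */
static int64_t sext16(uint16_t w)
{
    return (w & 0x8000) ? (int64_t)w - 65536 : (int64_t)w;
}

/* Expanded form: w[1]=(((w[1]{24)}24)+c)&65535 */
static uint16_t expanded(uint16_t w, int64_t c)
{
    return (uint16_t)((sext16(w) + c) & 0xFFFF);
}

/* Folded form: w[1]=w[1]+c, plain 16-bit wraparound addition. */
static uint16_t folded(uint16_t w, int64_t c)
{
    return (uint16_t)(w + c);
}

/* Check the fold for every possible 16-bit register value. */
static bool fold_is_valid(int64_t c)
{
    for (uint32_t w = 0; w <= 0xFFFF; w++)
        if (expanded((uint16_t)w, c) != folded((uint16_t)w, c))
            return false;
    return true;
}
```

Without the AND the fold would be wrong: sext16(0x7FFF)+1 is 32768 in the accumulator, while the 16-bit register wraps to 0x8000. The same argument covers the address-of-local fragment in the code comparison, where w[3]=((w[7]{24)}24)+_l0_2_a)&65535 is just w[7]+_l0_2_a.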
Timeline of progress on this project: Spring 2003
Enabled code motion and strength reduction.
Further refined the machine description/grammar.
Started work on Zero Overhead Loop Buffer (ZOLB) support.
Second merge of VISTA with vpo done. Retargeted VISTA to the tms320c54x.
To-Dos/Future work
Parallel instructions
Issues with ZOLB (details later)
Scheduling
The banz instruction (very useful for loops): allows comparison of an address register with zero
Circular addressing
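A minimal C model of a counted loop closed by banz. It assumes test-before-decrement semantics for BANZ with a post-decrement operand such as *AR3-: the branch is taken while the auxiliary register is nonzero, and the register is decremented afterwards, so loading it with n-1 runs the body n times. banz_loop() is a hypothetical name.

```c
#include <stdint.h>

/* Count body executions of a loop closed by  BANZ top,*ARx-  (assumed
   semantics: test ARx against zero, then post-decrement it). n >= 1. */
static int banz_loop(uint16_t n)
{
    uint16_t ar = (uint16_t)(n - 1);   /* counter register initialised to n-1 */
    int body_runs = 0;
    int taken;
    do {
        body_runs++;                   /* loop body */
        taken = (ar != 0);             /* BANZ: branch back if ARx != 0 ... */
        ar = (uint16_t)(ar - 1);       /* ... and decrement ARx */
    } while (taken);
    return body_runs;
}
```

Compared with the LD/SUB/BC exit check shown earlier, the test, decrement and branch collapse into one instruction, and no accumulator is needed.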
TI’s compiler cl500 has:
Inter-procedural analysis
  E.g., if the parameters to a function are constants or globals, the actual parameters are substituted into the function, thus avoiding expensive stack frame setup.
Inline expansion of runtime-support library functions
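The parameter substitution can be pictured at the source level. This is a hypothetical illustration in C, not cl500 output: every call to scale() passes the constant 14 and the global gain, so the callee can be rewritten to use them directly and no arguments need to be passed.

```c
/* A global that every call site passes as an argument (hypothetical example). */
int gain = 3;

/* Before: both values arrive as parameters, so each call sets up arguments. */
static int scale(int x, int g)
{
    return x * g;
}

static int caller_before(void)
{
    return scale(14, gain);
}

/* After substitution: the constant and the global are used directly,
   so the specialized callee takes no parameters at all. */
static int scale_specialized(void)
{
    return 14 * gain;
}

static int caller_after(void)
{
    return scale_specialized();
}
```

Both callers compute the same value; the specialized version simply has nothing to pass.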
Code comparison
Code fragment: get address of local _a
VPO:
r[2]=(w[7]{24)}24; r[2]=r[2]+_l0_2_a; w[3]=r[2]&65535;  // w[3]=w[7]+_l0_2_a
----------------------------
cl500 (TI compiler):
w[3]=w[7]; w[3]=w[3]+_l0_2_a;
Code comparison
Code fragment:
  for (i = 0; i < STRUCTSIZE; i++)   // STRUCTSIZE=2
      sum += b.field[i];
Because vpo maintains the running sum in a 16-bit register (an address register), it uses 2 extra instructions and loses the opportunity to convert the loop into a repeat-single instruction. The TI compiler maintains the sum in an accumulator register.
VPO:
AR3 (w[3]) points to the start of the array.
AR1 maintains the running sum.
  brc=1;
  rptb .L10_rpt_end-1
.L10:
  ld  *(AR1),A      // r[0]=(w[1]{24)}24;
  add *AR3+,A       // r[0]=r[0]+(W[w[3]]{24)}24; w[3]=w[3]+1;
  stl A,*(AR1)      // w[1]=r[0]&65535;
.L10_rpt_end:
--------------------------------------------
cl500 (TI compiler):
AR3 (w[3]) points to the start of the array.
A (r[0]) maintains the running sum.
  RPT #1
L5:
  ADD *AR3+,A       // r[0]=r[0]+(W[w[3]]{24)}24; w[3]=w[3]+1;
L6:
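Keeping the sum in the accumulator, as cl500 does, is legal because only the low 16 bits of the result are stored: adding and truncating on every iteration gives the same low 16 bits as adding in a wider register and truncating once. A C sketch with hypothetical function names; 40-bit wraparound is not modeled, which is harmless for short loops.

```c
#include <stdint.h>

/* VPO's code: the sum lives in a 16-bit address register, so each iteration
   loads it into A (ld), adds (add), and stores the low 16 bits back (stl). */
static uint16_t sum_in_address_register(const int16_t *field, int n)
{
    uint16_t ar1 = 0;
    for (int i = 0; i < n; i++) {
        int64_t a = (ar1 & 0x8000) ? (int64_t)ar1 - 65536 : ar1;  /* ld: sign-extend */
        a += field[i];                                             /* add *AR3+,A */
        ar1 = (uint16_t)(a & 0xFFFF);                              /* stl: keep low 16 bits */
    }
    return ar1;
}

/* cl500's code: the sum stays in accumulator A for the whole loop
   (RPT ; ADD *AR3+,A) and is truncated to 16 bits only once. */
static uint16_t sum_in_accumulator(const int16_t *field, int n)
{
    int64_t a = 0;
    for (int i = 0; i < n; i++)
        a += field[i];
    return (uint16_t)(a & 0xFFFF);
}
```

The two versions agree even when the 16-bit sum overflows, so the load and store around the add are pure overhead.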
Zero Overhead Loop Buffers
Loops are buffered in a special internal buffer using a rpt instruction whose parameters are the start label, end label and loop count. Access to this buffer may be faster than fetching the instructions from memory.
The usual branch instruction at the end of the loop is no longer necessary when using a repeat instruction, so pipeline bubbles are avoided.
On the tms320c54x, the single-instruction repeat (rpt) allows memory block copies/initializations without using an address register.
Detail on ZOLB
Advantages of doing it in vpo:
  Can make use of all the information that vpo has already collected about the loop.
  Easily retargetable:
    Code in the machine-independent part is reused.
    Code in the machine-dependent part for one target provides a framework for the new target.
  After conversion to a repeat block, registers may be freed up, and other optimizations may get enabled.
Status of ZOLB
Repeat blocks with a loop iteration count known at compile time are implemented.
Plan to implement the banz instruction, which is the next best option to ZOLB.
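For the implemented case, a compile-time known iteration count, the conversion needs the trip count and the repeat-counter value. In the VPO listing of the code comparison, brc=1 gives two passes, i.e. the block runs BRC+1 times. The sketch handles only `for (i = init; i < limit; i += step)` with a positive step; repeat_block_count() is hypothetical, and the 16-bit BRC limit is an assumption.

```c
#include <stdbool.h>

/* Trip count of  for (i = init; i < limit; i += step)  with a positive step,
   and the BRC value for a repeat block that executes BRC+1 times.
   Returns false when this simple rule does not apply. */
static bool repeat_block_count(long init, long limit, long step,
                               long *trips, long *brc)
{
    if (step <= 0)
        return false;              /* only counting-up loops are handled */
    if (init >= limit)
        return false;              /* zero trips: a repeat block always runs at least once */
    long n = (limit - init + step - 1) / step;   /* ceiling division */
    if (n - 1 > 0xFFFF)
        return false;              /* assumed 16-bit BRC cannot hold the count */
    *trips = n;
    *brc = n - 1;
    return true;
}
```

For the STRUCTSIZE=2 loop, init 0, limit 2 and step 1 give two trips and BRC 1, as in the listing.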
Acknowledgements
Dr. Jack Davidson (advisor)
Jason Hiser
Clark Coleman