ShrinathJi Institute of Technology & Engineering
SITE, Nathdwara
M.TECH I SEMESTER 2011-12
SUBJECT-IMSE3 OPTIMIZING COMPILER
Assignment II
DATE OF COMMENCEMENT 22-01-2012 DATE OF SUBMISSION 01-02-2012

Q.1 What is cloning procedure, procedure level optimization

Ans.: Cloning Procedures

Cloning is a generalization of in-line expansion. Consider a procedure P that is called at a number of call sites C1, . . . , Cm in the program. Procedure cloning occurs when the compiler notices that the set of call sites can be partitioned into subsets such that different versions of the procedure P can be used. The same version is used at all call sites within the same set in the partition. The different versions will run more quickly than the general procedure in each of the contexts in which the specialized version occurs. This is a research topic. The best discussion available is by Hall (1991).

The partitioning of the call sites involves the identification of some characteristic of the parameters. An easy case is a constant parameter that is used as the stride of a loop. More complex relationships can be identified by determining the dependency information for the flow graph in terms of the formal parameters. When the parameter has certain values, the procedure may be vectorizable, parallelizable, or have a form where the cache usage can be controlled.

There is one case of cloning that should be implemented whether the more advanced technique is implemented or not. Consider a function F that has n different call sites, S1, . . . , Sn. If there is one call site that is in a frequently executed region of the program and that call site executes a large proportion of all calls on F, then a copy of F should be expanded inline at that site. All other sites should execute a normal call on the original copy of F. The frequency and proportion of the calls are parameters to be tuned, and the information to make the choice may be gathered by program profiling.
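The "easy case" of a constant stride parameter can be illustrated with a small sketch. The procedure names `scale` and `scale_stride1` and the sample call sites are hypothetical, chosen only for illustration; a real compiler would perform this specialization on its intermediate form rather than on source code.

```python
def scale(values, factor, stride):
    """General procedure P: scale every stride-th element in place."""
    i = 0
    while i < len(values):
        values[i] *= factor
        i += stride
    return values


def scale_stride1(values, factor):
    """Clone of P for the call sites where stride is the constant 1."""
    # With a unit stride every element is touched, so the loop can be
    # rewritten as one whole-array operation (a stand-in for vectorization).
    values[:] = [v * factor for v in values]
    return values


# The call sites are partitioned by the constant stride argument.
a = scale_stride1([1, 2, 3, 4], 10)   # was: scale([1, 2, 3, 4], 10, 1)
b = scale([1, 2, 3, 4], 10, 2)        # other strides keep the general version
```

Both versions compute the same result for the call sites they serve; only the specialized clone is free to assume the unit stride.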
Simple Procedure-Level Optimization

Whether full interprocedural analysis is performed or cloning is implemented, there are several optimizations that can be made when the bodies of the calling and called procedures are known at compile time. Consider the two procedures in Figure 9.3. The first column represents the original two procedures. If this section of code is executed frequently, the loop can be moved into a new procedure made from a copy of CALLED in which the body of the procedure becomes the body of the loop. The parameters for NEWCALLED are not listed; however, enough information must be passed to describe the bounds of the loop and the original arguments.

Figure 9.3 Moving a Loop Inside a Procedure

This transformation decreases the amount of procedure overhead. At the same time, it increases the possibilities that the loop can be software pipelined, vectorized, or parallelized.

Q.2 Explain Global and Local Optimization?

Q.3 Explain register renaming and coalescing.

Register Renaming

Register renaming eliminates the situation in which the same temporary is used in distinct parts of the flow graph to hold different values. Static single assignment form provides a basis for register renaming. Recall that static single assignment form generates a new temporary name for each definition of a value. When translating back into normal form, the names are recombined to eliminate the copy operations implied by the φ-nodes. Recall that the translation back to normal form is governed by a relation between temporaries. Two temporaries that are related share the same name in the normal form of the graph. Register renaming is implemented by constructing the minimal relation that eliminates all copies from φ-nodes. This relation is the transitive closure of the condition that two temporaries are related if one is an operand and the other is the target of the same φ-node.

The relation is implemented using UNION/FIND algorithms to create a partition of all temporaries. Hence the algorithm consists of translating to the minimum SSA form, constructing the partition by declaring that the operands and the target of each φ-node are related, and then translating back into normal form.
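The partition step can be sketched with a small UNION/FIND structure. This is a minimal illustration, not the original algorithm's code: the representation of a φ-node as a `(target, operands)` pair and the temporary names are assumptions made for the example.

```python
class UnionFind:
    """Disjoint sets over temporary names."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            # Path halving: point x at its grandparent as we walk up.
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def rename_registers(phi_nodes, temporaries):
    """Map each SSA temporary to the name it shares in normal form."""
    uf = UnionFind()
    for t in temporaries:
        uf.find(t)                      # every temporary starts alone
    for target, operands in phi_nodes:
        for operand in operands:
            uf.union(target, operand)   # target and operands are related
    return {t: uf.find(t) for t in temporaries}


# Hypothetical SSA temporaries: x3 = phi(x1, x2), y2 = phi(y1, y0).
phi_nodes = [("x3", ["x1", "x2"]), ("y2", ["y1", "y0"])]
temps = ["x1", "x2", "x3", "y0", "y1", "y2", "z1"]
names = rename_registers(phi_nodes, temps)
```

Temporaries that end up with the same representative share one name after translation back to normal form, so the copies implied by their φ-nodes disappear; `z1`, which appears in no φ-node, keeps its own name.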



Register Coalescing

Register coalescing removes as many copy operations as possible. Many of the copy operations have already been eliminated during peephole optimization, which eliminated all copies that were not implied by φ-nodes and did not involve temporaries associated with φ-nodes at abnormal edges. The largest proportion of the copies are removed in this way. The rest of the copies are eliminated using an observation of Chaitin (1981): If the source and the destination of a copy do not conflict, then the source and destination can be combined into one register. Once the two temporaries have been combined, the algorithm can be applied again to another copy. The observation creates a partition of the temporaries: Two temporaries are in the same partition if they have been combined during register coalescing.

The SSA-form register-renaming algorithm can generate φ-nodes associated with abnormal edges in the flow graph. These φ-nodes must not generate copy operations when the graph is translated back into normal form. Thus the algorithm must avoid eliminating copies that will cause copies to occur on abnormal edges. As usual, impossible edges are fine since the code on them can never be executed anyway.

The algorithm consists of using the SSA form to eliminate most copies. Initially the temporaries are partitioned so that each temporary is in an element of the partition by itself. Then each φ-node and copy instruction is investigated. If an operand and the destination temporaries do not conflict, then both temporaries are put in the same partition. The flow graph is then translated back into normal form.

Note the similarity between register coalescing and register renaming. Both are implemented by creating a partition, and both partitions are created to eliminate the copies at the φ-nodes.
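The coalescing loop can be sketched as follows. This is an illustrative simplification: conflicts are assumed to be given as a precomputed list of pairs, a merged class is treated as conflicting with another class if any of their members conflict, and the special handling of abnormal edges is left out. The function name `coalesce` and the temporaries are hypothetical.

```python
def coalesce(temporaries, copies, conflicts):
    """Partition temporaries so that non-conflicting copies disappear.

    copies    -- list of (destination, source) pairs
    conflicts -- list of pairs of temporaries that are live at the same time
    """
    members = {t: {t} for t in temporaries}   # class representative -> members
    rep = {t: t for t in temporaries}         # temporary -> class representative
    conflict = {frozenset(pair) for pair in conflicts}

    def classes_conflict(a, b):
        return any(frozenset((x, y)) in conflict
                   for x in members[a] for y in members[b])

    for dst, src in copies:
        a, b = rep[dst], rep[src]
        if a != b and not classes_conflict(a, b):
            # Combine the two classes; the copy dst = src becomes a no-op.
            members[a] |= members[b]
            for t in members[b]:
                rep[t] = a
            del members[b]
    return rep


# t2 = t1, t3 = t2, t4 = t3, where t1 and t3 are live simultaneously.
temps = ["t1", "t2", "t3", "t4"]
copies = [("t2", "t1"), ("t3", "t2"), ("t4", "t3")]
rep = coalesce(temps, copies, conflicts=[("t1", "t3")])
```

In this example the copy `t3 = t2` survives, because by the time it is examined `t2` already shares a register with `t1`, which conflicts with `t3`.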

Q.4 Describe Inter procedure analysis, Inlining procedure.

Interprocedural Analysis

Initially the compiler compiles each procedure individually, one procedure or flow graph at a time. In fact, the compiler is organized as a production line: Each procedure is translated into a flow graph and fed through the compiler, one at a time, until the results are added to the object file. With this structure, the compiler does not know about the effects of any procedure or function calls. It does not know which variables might be modified by each procedure call, so it must assume the worst.

For interprocedural analysis, this organization must be changed. However, the change can be hidden inside the interprocedural analysis phase if careful data abstractions are maintained. Interprocedural analysis requires information about multiple procedures within the application program, so the compile-one-at-a-time approach must be modified. Instead, the compiler must accumulate the flow graphs (and other data) for each procedure. When all of the flow graphs have been found, the whole program can be analyzed to find the effects of each procedure call more precisely. Then the rest of the compilation can occur, one flow graph at a time (see Figure 9.1).

Figure 9.1 Schematic of Interprocedural Phase

In other words, the interprocedural analysis phase can be thought of as the stomach of the compiler. It gathers together all of the flow graphs of the application, processes them, and passes each one along to the rest of the compiler to be processed. As each flow graph is passed along, the interprocedural analysis information about its calls and about where it is called is available for the optimizers and code generators.

There are many ways in which this repository of information can be stored. One approach is to keep a library of procedures and their flow graphs on the disk as a complex data structure that is updated each time a file in the application is compiled. Another approach is to keep the repository in memory. In our sample case, the whole application will be ...


Computing Interprocedural Alias Information

There are four other kinds of information computed during interprocedural analysis:

1. The interprocedural analyzer computes alias information. Consider the point in application execution immediately after a procedure call inside a procedure. Which of its formal parameters (dummy arguments) may reference the same memory location as another variable mentioned in the flow graph? This is only a problem for formal parameters passed by reference so that the actual parameter is a pointer to the data in memory. Interprocedural analysis will compute an estimate of which formal parameters might be sharing the same memory location as other formal parameters or global variables.

2. The interprocedural analyzer computes modification information. The compiler would like to know which variables and memory locations might be modified during a procedure call. This includes both the modification of arguments that are passed by reference and global variables that are modified as side effects. Again, the word "might" is used since it is too difficult to determine whether the data must be modified during a procedure call.

3. The interprocedural analyzer computes the variables that might be used in a procedure. Again, this includes both variables that are modified because they are associated with formal parameters that are passed by reference and global variables. As before, the information is only accurate to "might" rather than "must" standards.

4. The interprocedural analyzer computes the formal parameters that are always bound to a single constant in the application program. I will not describe the computation of this information here, instead referring you to the papers referenced previously.
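The "might be modified" information in item 2 can be approximated by iterating over the call graph until nothing changes. The sketch below is an assumption-laden simplification, not the method of the papers cited: it tracks global variables only, ignores the binding of reference parameters, and uses hypothetical procedure and variable names.

```python
def may_modify(direct_mod, calls):
    """For each procedure, the globals that a call on it might modify.

    direct_mod -- procedure -> globals assigned directly in its body
    calls      -- procedure -> procedures it calls
    """
    mod = {proc: set(names) for proc, names in direct_mod.items()}
    changed = True
    while changed:                       # iterate to a fixed point
        changed = False
        for caller, callees in calls.items():
            for callee in callees:
                new = mod[callee] - mod[caller]
                if new:
                    # Anything a callee might modify, the caller might too.
                    mod[caller] |= new
                    changed = True
    return mod


direct_mod = {
    "main": set(),
    "update": {"g_count"},
    "log": {"g_buffer"},
    "helper": set(),
}
calls = {"main": ["update", "helper"], "update": ["log"], "log": [], "helper": []}
mod = may_modify(direct_mod, calls)
```

The result is conservative in the sense used above: a variable appears in a procedure's set if some call chain might modify it, not because it must be modified. A call on `helper` has an empty set, so the optimizer no longer needs to assume the worst for it.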

Inlining Procedures

The one part described in this chapter that is needed in any high-performance compiler is procedure inlining. Consider a function such as in Figure 9.2. The cost of calling the function and returning the value is probably greater than the cost of actually executing the function body. These costs can be avoided by substituting the body of the function into the calling procedure rather than inserting a procedure call. During the substitution, the formal parameters must be replaced by the actual parameters in such a fashion that the same computations will be performed after the substitution as would be performed by the function call, and local variables must be renamed so that they do not conflict with the variables in the calling procedure. Of course, global variables (variables common between the called function and other functions) must not be renamed.

When should a function be expanded inline? There is no single good answer to that question because of the expansion/contraction problem. The expansion of a function inline within another function initially expands the size of the whole program. On the other hand, this expansion may make possible a number of simplifications that will result in a smaller program. Consider the example of a function that is a large case or switch statement with each alternative being a single statement. If a function call on that function with a constant actual parameter is replaced by an in-line expansion of the function, the program initially expands in size; however, constant propagation will eliminate all of the code except the corresponding one small alternative, thus making the program smaller and faster.

Here is the logic that the compiler should use for deciding whether a function is to be expanded inline:

If the compiler contains a compile-time command to expand a function inline, then expand it inline. This simply means that the programmer is telling the compiler to do it, so do it. Correspondingly, if a compile-time directive indicates not to expand a function inline, then do not do it under any circumstances.

Figure 9.2 Example of Function to Inline

If there is only one call on a function, then it can be expanded inline. This will decrease the amount of function-call overhead without increasing program size. This situation occurs with programs that are written in a top-down programming style. Such a programming style encourages the writing of functions called only once. If the resulting function is estimated to be larger than some size, such as the size of the fastest cache, then the expansion should not be performed automatically.

If the compiler estimates that the size of the function body is smaller than the size of the function call, then the function can be
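The expansion/contraction effect described earlier for a large case statement can be made concrete with a hand-worked sketch. The function `select` and the three caller versions are hypothetical; they show by hand what a compiler would do automatically through inlining followed by constant propagation.

```python
def select(kind, x):
    """A function that is one large case statement with one-statement arms."""
    if kind == 0:
        return x + 1
    elif kind == 1:
        return x * 2
    elif kind == 2:
        return x - 3
    else:
        return x


def caller_with_call(x):
    # Original code: a call with a constant actual parameter.
    return select(2, x)


def caller_inlined(x):
    # After inlining: the body is substituted and the formal parameter
    # `kind` is bound to the constant 2. The program has grown.
    kind = 2
    if kind == 0:
        return x + 1
    elif kind == 1:
        return x * 2
    elif kind == 2:
        return x - 3
    else:
        return x


def caller_simplified(x):
    # After constant propagation: only the matching alternative remains,
    # so the result is smaller and faster than the original call.
    return x - 3
```

All three callers compute the same value for every input; they differ only in code size and call overhead.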