Automatic compilation. Student name: Eldad Uzman. Student ID: 036544062. Lecturer: Dr. Itzhak Aviv


Page 1: Automatic compilation

Automatic compilation

Student name: Eldad Uzman
Student ID: 036544062

Lecturer: Dr. Itzhak Aviv

Page 2: Automatic compilation

Introduction to compilation

A compiler is a program that translates one programming language to another. Its input is the source language; its output is the target language. The best-known form of compilation reads a program in one programming language, such as C, C++, Java, C#, or VB, and translates it into equivalent assembly or binary code for the machine to execute. Far beyond just building a program, these tools are needed in many other fields of software engineering.

Page 3: Automatic compilation

Why are compilation tools needed?

A common mistake among many programmers is the attitude that if your code compiles successfully, you can release it. Clearly, for more complex software, which involves many external tools and has substantial integration needs, more is required than just compiling the code.

Page 4: Automatic compilation

System engineer life cycle

System concept

System design

Functional specifications

Component design

Program and unit tests

Integration and system tests

Conversion and installation

Operation and maintenance

Page 5: Automatic compilation

Software release life cycle

Page 6: Automatic compilation

Release management

Release Management is the relatively new but rapidly growing discipline within software engineering of managing software releases.

As software systems, software development processes, and resources become more distributed, they invariably become more specialized and complex. Furthermore, software products are typically in an ongoing cycle of development, testing, and release. Add to this an evolution and growing complexity of the platforms on which these systems run, and it becomes clear there are a lot of moving pieces that must fit together seamlessly to guarantee the success and long-term value of a product or project.

Page 7: Automatic compilation

My final project

My engine

version

JS files

C++ code

Software binary code

I don’t want to compile ALL the code, but just some fragments of it.

I need an external tool that will allow me to choose the exact fragments that are necessary for the specific version.

Page 8: Automatic compilation

How does a compiler work?

There are countless source languages and target languages, and there are many kinds of compilers as well. However, despite this apparent complexity, the fundamental tasks that any compiler must implement are virtually the same. There are two parts to compilation:

1) Analysis – breaking up the source program into constituent pieces and creating an intermediate representation (IR) of the source program.

2) Synthesis – construction of the desired target code from the IR.

Page 9: Automatic compilation

Analysis

In order to break up the structure and to understand the meaning of the program, the compiler executes three phases in the analysis stage:

1) Lexical analysis – breaking the input into words, or tokens.

2) Syntax analysis – parsing the phrase structure of the program.

3) Semantic analysis – calculating the meaning of the program.

Page 10: Automatic compilation

Intermediate representation (IR)

Intermediate representation is a data structure that is constructed from the input of a program, and from which parts of the program's output are constructed in turn.

Page 11: Automatic compilation

Lexical analysis

Also called linear analysis or scanning. In this phase we break the source code into tokens, reading the input from left to right.

Example:

position = initial + rate * 60;

would be grouped as:

1) The identifier position.

2) The assignment symbol.

3) The identifier initial.

4) The plus sign.

5) The identifier rate.

6) The times sign.

7) The number 60.

Page 12: Automatic compilation

Syntax analysis

Also called hierarchical analysis or parsing. In this phase we group the tokens into grammatical phrases, forming a syntax tree.

Note: the division between syntax analysis and lexical analysis is debatable.

Page 13: Automatic compilation

#include <stdio.h>
#include <stdlib.h>

typedef char *string;
typedef struct A_stm_ *A_stm;
typedef struct A_exp_ *A_exp;
typedef struct A_expList_ *A_expList;
typedef enum { A_plus, A_minus, A_times, A_div } A_binop;

/* statement struct */
struct A_stm_ {
    enum { A_compoundStm, A_assignStm, A_printStm } kind;
    union {
        struct { A_stm stm1, stm2; } compound;
        struct { string id; A_exp exp; } assign;
        struct { A_expList exps; } print;
    } u;
};

/* statement constructors */
A_stm A_CompoundStm(A_stm stm1, A_stm stm2)
{
    A_stm s = malloc(sizeof(*s));
    if (s == NULL) {
        fprintf(stderr, "error allocating memory");
        exit(1);
    }
    s->kind = A_compoundStm;
    s->u.compound.stm1 = stm1;
    s->u.compound.stm2 = stm2;
    return s;
}

A_stm A_AssignStm(string id, A_exp exp)
{
    A_stm s = malloc(sizeof(*s));
    if (s == NULL) {
        fprintf(stderr, "error allocating memory");
        exit(1);
    }
    s->kind = A_assignStm;
    s->u.assign.id = id;
    s->u.assign.exp = exp;
    return s;
}

A_stm A_PrintStm(A_expList exps)
{
    A_stm s = malloc(sizeof(*s));
    if (s == NULL) {
        fprintf(stderr, "error allocating memory");
        exit(1);
    }
    s->kind = A_printStm;
    s->u.print.exps = exps;
    return s;
}

Page 14: Automatic compilation

/* expression struct */
struct A_exp_ {
    enum { A_idExp, A_numExp, A_opExp, A_eseqExp } kind;
    union {
        string id;
        int num;
        struct { A_exp left; A_binop oper; A_exp right; } op;
        struct { A_stm stm; A_exp exp; } eseq;
    } u;
};

/* expression constructors */
A_exp A_IdExp(string id)
{
    A_exp e = malloc(sizeof(*e));
    if (e == NULL) {
        fprintf(stderr, "error allocating memory");
        exit(1);
    }
    e->kind = A_idExp;
    e->u.id = id;
    return e;
}

A_exp A_NumExp(int num)
{
    A_exp e = malloc(sizeof(*e));
    if (e == NULL) {
        fprintf(stderr, "error allocating memory");
        exit(1);
    }
    e->kind = A_numExp;
    e->u.num = num;
    return e;
}

A_exp A_OpExp(A_exp left, A_binop oper, A_exp right)
{
    A_exp e = malloc(sizeof(*e));
    if (e == NULL) {
        fprintf(stderr, "error allocating memory");
        exit(1);
    }
    e->kind = A_opExp;
    e->u.op.left = left;
    e->u.op.oper = oper;
    e->u.op.right = right;
    return e;
}

Page 15: Automatic compilation

A_exp A_EseqExp(A_stm stm, A_exp exp)
{
    A_exp e = malloc(sizeof(*e));
    if (e == NULL) {
        fprintf(stderr, "error allocating memory");
        exit(1);
    }
    e->kind = A_eseqExp;
    e->u.eseq.stm = stm;
    e->u.eseq.exp = exp;
    return e;
}

/* expression list struct */
struct A_expList_ {
    enum { A_pairExpList, A_lastExpList } kind;
    union {
        struct { A_exp head; A_expList tail; } pair;
        A_exp last;
    } u;
};

/* expression list constructors */
A_expList A_PairExpList(A_exp head, A_expList tail)
{
    A_expList el = malloc(sizeof(*el));
    if (el == NULL) {
        fprintf(stderr, "error allocating memory");
        exit(1);
    }
    el->kind = A_pairExpList;
    el->u.pair.head = head;
    el->u.pair.tail = tail;
    return el;
}

A_expList A_LastExpList(A_exp last)
{
    A_expList el = malloc(sizeof(*el));
    if (el == NULL) {
        fprintf(stderr, "error allocating memory");
        exit(1);
    }
    el->kind = A_lastExpList;
    el->u.last = last;
    return el;
}

Page 16: Automatic compilation

Syntax analysis example

A_stm prog =
  A_CompoundStm(A_AssignStm("a",
    A_OpExp(A_NumExp(5), A_plus, A_NumExp(3))),
  A_CompoundStm(A_AssignStm("b",
    A_EseqExp(A_PrintStm(A_PairExpList(A_IdExp("a"),
      A_LastExpList(A_OpExp(A_IdExp("a"), A_minus, A_NumExp(1))))),
    A_OpExp(A_NumExp(10), A_times, A_IdExp("a")))),
  A_PrintStm(A_LastExpList(A_IdExp("b")))));

Page 17: Automatic compilation

Syntax analysis tree

[Figure: the syntax tree of the program above. A compoundStm root holds the assignStm a := opExp(numExp 5, plus, numExp 3); below it, a second compoundStm holds the assignStm b := eseqExp(printStm(pairExpList(idExp a, lastExpList(opExp(idExp a, minus, numExp 1)))), opExp(numExp 10, times, idExp a)), followed by printStm(lastExpList(idExp b)).]

Page 18: Automatic compilation

Semantic analysis

In this phase we check the source program for semantic errors and gather type information for the code generation phase.

There are many checks included in the semantic analysis phase, but the most important one is type checking. Type checking verifies that each operator has the permitted operands.

To do that, we record all our identifiers in a special data structure called the symbol table.

Page 19: Automatic compilation

Symbol table

The basic data structure of the symbol table is the hash table, which allows us to find each key in constant time.

[Figure: keys key1..key6 are mapped by a hash function to values; colliding keys share a slot.]

Page 20: Automatic compilation

Implementation of hash table

struct bucket { string key; void *binding; struct bucket *next; };

#define SIZE 109

struct bucket *table[SIZE];

unsigned int hash(char *s0)
{
    unsigned int h = 0;
    char *s;
    for (s = s0; *s; s++)
        h = h * 65599 + *s;
    return h;
}

struct bucket *Bucket(string key, void *binding, struct bucket *next)
{
    struct bucket *b = malloc(sizeof(*b));
    if (b == NULL) {
        fprintf(stderr, "error allocating memory");
        exit(1);
    }
    b->key = key;
    b->binding = binding;
    b->next = next;
    return b;
}

void insert(string key, void *binding)
{
    int index = hash(key) % SIZE;
    table[index] = Bucket(key, binding, table[index]);
}

void *lookup(string key)
{
    int index = hash(key) % SIZE;
    struct bucket *b;
    for (b = table[index]; b; b = b->next)
        if (0 == strcmp(b->key, key))
            return b->binding;
    return NULL;
}

typedef struct S_symbol_ *S_symbol;
struct S_symbol_ { string name; S_symbol next; };

Page 21: Automatic compilation

Types module

typedef struct TY_ty_ *TY_ty;
typedef struct TY_tyList_ *TY_tyList;
typedef struct TY_field_ *TY_field;
typedef struct TY_fieldList_ *TY_fieldList;

struct TY_ty_ {
    enum { Ty_record, Ty_nil, Ty_int, Ty_string, Ty_array, Ty_name, Ty_void } kind;
    union {
        TY_fieldList record;
        TY_ty array;
        struct { S_symbol sym; TY_ty ty; } name;
    } u;
};

TY_ty TY_Nil(void)
{
    TY_ty ty = malloc(sizeof(*ty));
    if (ty == NULL) {
        fprintf(stderr, "error allocating memory");
        exit(1);
    }
    ty->kind = Ty_nil;
    return ty;
}

TY_ty TY_Int(void);
TY_ty TY_String(void);
TY_ty TY_Void(void);

struct TY_tyList_ { TY_ty head; TY_tyList tail; };

TY_tyList TY_TyList(TY_ty head, TY_tyList tail)
{
    TY_tyList tyl = malloc(sizeof(*tyl));
    if (tyl == NULL) {
        fprintf(stderr, "error allocating memory");
        exit(1);
    }
    tyl->head = head;
    tyl->tail = tail;
    return tyl;
}

struct TY_field_ { S_symbol name; TY_ty ty; };
TY_field TY_Field(S_symbol name, TY_ty ty); /* {…} */

struct TY_fieldList_ { TY_field head; TY_fieldList tail; };
TY_fieldList TY_FieldList(TY_field head, TY_fieldList tail);

Page 22: Automatic compilation

Type checking

Now it's all simple: all we need to do is make a left-to-right scan over the syntax tree produced in the syntax analysis, and each time we find an operator, check the descendant nodes of that operator.

Page 23: Automatic compilation

[Figure: the same syntax tree as on the earlier slide; the type checker scans it left to right, checking the operands of each opExp, assignStm, and printStm node.]

Page 24: Automatic compilation

Intermediate code

Now that the analysis is completed, we know the meaning of the source code and that it is correct, and we can generate the intermediate code.

/* Label, labelList, and Temp are assumed defined elsewhere. */

typedef struct T_stm_ *T_stm;
struct T_stm_ {
    enum { T_SEQ, T_LABEL, T_JUMP, T_CJUMP, T_MOVE, T_EXP } kind;
    union {
        struct { T_stm left, right; } SEQ;
        Label LABEL;
        struct { T_exp dst; labelList labels; } JUMP;
        struct { T_relOp op; T_exp left, right; Label true, false; } CJUMP;
        struct { T_exp dst, src; } MOVE;
        struct { T_exp exp; } EXP;
    } u;
};

/* T_stm constructors… */

typedef struct T_exp_ *T_exp;
struct T_exp_ {
    enum { T_BINOP, T_MEM, T_TEMP, T_ESEQ, T_NAME, T_CONST, T_CALL } kind;
    union {
        struct { T_binOp op; T_exp left, right; } BINOP;
        T_exp MEM;
        Temp TEMP;
        struct { T_stm stm; T_exp exp; } ESEQ;
        Label NAME;
        int CONST;
        struct { T_exp exp; T_expList expList; } CALL;
    } u;
};

/* T_exp constructors… */

Page 25: Automatic compilation

Intermediate code (cont)

typedef struct T_expList_ *T_expList;
struct T_expList_ { T_exp head; T_expList tail; };
T_expList T_ExpList(T_exp head, T_expList tail);

typedef struct T_stmList_ *T_stmList;
struct T_stmList_ { T_stm head; T_stmList tail; };
T_stmList T_StmList(T_stm head, T_stmList tail);

typedef enum { T_plus, T_minus, T_mul, T_div, T_and, T_or, T_lshift, T_rshift, T_arshift, T_xor } T_binOp;
typedef enum { T_eq, T_ne, T_lt, T_gt, T_le, T_ge, T_ult, T_ule, T_ugt, T_uge } T_relOp;

So far we have dealt with expressions that compute a value; we must extend this to expressions that do not compute values, namely void functions (procedures), while instructions, and Boolean conditions that may jump to true or false labels.

Page 26: Automatic compilation

Intermediate code (cont)

/* translation: */
typedef struct Tr_exp_ *Tr_exp;
struct Cx { patchList trues, falses; T_stm stm; };
struct Tr_exp_ {
    enum { Tr_ex, Tr_nx, Tr_cx } kind;
    union { T_exp ex; T_stm nx; struct Cx cx; } u;
};
/* constructors… */

/* patch list: */
typedef struct patchList_ *patchList;
struct patchList_ { Label *head; patchList tail; };
patchList PatchList(Label *head, patchList tail);

Tr_ex – stands for expressions.
Tr_nx – stands for "no result".
Tr_cx – stands for conditionals; the statement may jump to one of the true or false labels in the two given lists.

Page 27: Automatic compilation

What do we get so far?

position = initial + rate * 60;

Lexical analysis:

id1, EQ, id2, PL, id3, MUL, number(60), endl

Syntax analysis:

[Figure: the assignStm tree, with id1 on the left and opExp(id2, +, opExp(id3, *, numExp(60))) on the right.]

Semantic analysis

Page 28: Automatic compilation

What do we get so far? (cont)

The intermediate representation has been generated; we are ready for the synthesis phase, where we can generate the machine code.

Intermediate code generator:

temp1 = number(60)

temp2 = id3 * temp1

temp3 = id2 + temp2

id1 = temp3

Page 29: Automatic compilation

Synthesis

Now that we have the intermediate representation, we can generate the machine code.

In fact, the intermediate code is code for an abstract machine, so all we need to take care of in the synthesis phase is:

1) Instruction selection – finding the appropriate machine instructions to implement a given intermediate tree.

2) Register allocation – allocation of variables to machine registers.

Page 30: Automatic compilation

Instruction selection

Unlike the other phases, where we performed a left-to-right scan over the tree, this time the scan will be a DFS.

Our intention is to find tree patterns.

Page 31: Automatic compilation

Name  | Effect               | Tree pattern
------+----------------------+------------------------------------
TEMP  | r[i]                 | TEMP
ADD   | r[i] <- r[j] + r[k]  | +
MUL   | r[i] <- r[j] * r[k]  | *
DIV   | r[i] <- r[j] / r[k]  | /
ADDC  | r[i] <- r[j] + c     | + with a CONST child
LOAD  | r[i] <- M[r[j] + c]  | MEM over + with a CONST child
STORE | M[r[j] + c] <- r[i]  | MOVE into MEM over + with a CONST child

[The original slide drew each tree pattern as a small diagram of +, *, /, CONST, MEM, and MOVE nodes; the drawings themselves are not recoverable here.]

Page 32: Automatic compilation

Maximum munch

In order to generate target code with the minimal number of machine instructions, we need to find an optimal tiling of the tree.

An optimal tiling is one where no two adjacent tiles can be combined into a single tile.

There is an algorithm that finds an optimal tiling of a tree: maximum munch, also known as largest match.

Maximum munch is a greedy algorithm: once it has found the largest match, it needs no improvements afterward.

Page 33: Automatic compilation

Maximum munch (cont)

void maximuMunchStm(T_stm s)
{
    switch (s->kind) {
    case T_SEQ:
        maximuMunchStm(s->u.SEQ.left);
        maximuMunchStm(s->u.SEQ.right);
        break;
    case T_MOVE: {
        T_exp dst = s->u.MOVE.dst, src = s->u.MOVE.src;
        if (dst->kind == T_MEM) {
            if (dst->u.MEM->kind == T_BINOP
                && dst->u.MEM->u.BINOP.op == T_plus
                && dst->u.MEM->u.BINOP.right->kind == T_CONST) {
                T_exp e1 = dst->u.MEM->u.BINOP.left, e2 = src;
                munchExp(e1); munchExp(e2); emit("STORE");
            } else if (dst->u.MEM->kind == T_BINOP
                && dst->u.MEM->u.BINOP.op == T_plus
                && dst->u.MEM->u.BINOP.left->kind == T_CONST) {
                T_exp e1 = dst->u.MEM->u.BINOP.right, e2 = src;
                munchExp(e1); munchExp(e2); emit("STORE");
            } else if (src->kind == T_MEM) {
                T_exp e1 = dst->u.MEM, e2 = src->u.MEM;
                munchExp(e1); munchExp(e2); emit("MOVEM");
            } else {
                T_exp e1 = dst->u.MEM, e2 = src;
                munchExp(e1); munchExp(e2); emit("STORE");
            }
        } else if (dst->kind == T_TEMP) {
            T_exp e2 = src;
            munchExp(e2); emit("ADD");
        }
        break;
    }
    }
}

Page 34: Automatic compilation

void munchExp(T_exp exp)
{
    switch (exp->kind) {
    case T_ESEQ:
        maximuMunchStm(exp->u.ESEQ.stm);
        munchExp(exp->u.ESEQ.exp);
        break;
    case T_MEM: {
        T_exp e = exp->u.MEM;
        if (e->kind == T_BINOP && e->u.BINOP.op == T_plus
            && e->u.BINOP.right->kind == T_CONST) {
            munchExp(e->u.BINOP.left);
            emit(itoa(e->u.BINOP.right->u.CONST));
            emit("LOAD");
        } else if (e->kind == T_BINOP && e->u.BINOP.op == T_plus
            && e->u.BINOP.left->kind == T_CONST) {
            munchExp(e->u.BINOP.right);
            emit(itoa(e->u.BINOP.left->u.CONST));
            emit("LOAD");
        } else if (e->kind == T_CONST) {
            emit(itoa(e->u.CONST));
            emit("LOAD");
        } else {
            emit("LOAD");
        }
        break;
    }
    case T_BINOP:
        if (exp->u.BINOP.op == T_plus
            && exp->u.BINOP.right->kind == T_CONST) {
            munchExp(exp->u.BINOP.left);
            emit(itoa(exp->u.BINOP.right->u.CONST));
            emit("ADD1");
        } else if (exp->u.BINOP.op == T_plus
            && exp->u.BINOP.left->kind == T_CONST) {
            munchExp(exp->u.BINOP.right);
            emit(itoa(exp->u.BINOP.left->u.CONST));
            emit("ADD1");
        }

/* cont… */

Page 35: Automatic compilation

        else if (exp->u.BINOP.op == T_minus
            && exp->u.BINOP.right->kind == T_CONST) {
            munchExp(exp->u.BINOP.left);
            emit(itoa(exp->u.BINOP.right->u.CONST));
            emit("SUB1");
        } else if (exp->u.BINOP.op == T_plus) {
            munchExp(exp->u.BINOP.left); munchExp(exp->u.BINOP.right); emit("ADD");
        } else if (exp->u.BINOP.op == T_minus) {
            munchExp(exp->u.BINOP.left); munchExp(exp->u.BINOP.right); emit("SUB");
        } else if (exp->u.BINOP.op == T_mul) {
            munchExp(exp->u.BINOP.left); munchExp(exp->u.BINOP.right); emit("MUL");
        } else if (exp->u.BINOP.op == T_div) {
            munchExp(exp->u.BINOP.left); munchExp(exp->u.BINOP.right); emit("DIV");
        }
        break;
    case T_CONST:
        emit("ADD1");
        break;
    }
}

Page 36: Automatic compilation

Register allocation

All the phases we discussed assume that there is an unlimited number of registers.

We know that this number is limited, and hence we need a method to deal with it.

Two temporaries can fit into the same register if they are not "in use" at the same time, so the compiler needs to analyze the intermediate program to determine which temporaries are in use at the same time.

This phase is called liveness analysis.

Page 37: Automatic compilation

Control flow graph

In order to solve the problem, we'll use a control flow graph: the nodes in the graph stand for the statements, and if statement x can be followed by statement y, the edge (x, y) exists in the graph.

      a <- 0
L1:   b <- a + 1
      c <- c + b
      a <- b * 2
      if a < N goto L1
      return c

[Figure: the corresponding graph, nodes 1 to 6: 1: a=0, 2: b=a+1, 3: c=c+b, 4: a=b*2, 5: a<N (with an edge back to node 2), 6: return c.]

Page 38: Automatic compilation

Liveness analysis

Terminology:

- A flow graph has out-edges that lead to the successors (succ).

- A flow graph has in-edges that come from the predecessors (pred).

- An assignment to a variable or temporary defines it (def).

- An occurrence of a variable or temporary on the right side of an assignment uses it (use).

Page 39: Automatic compilation

Liveness of a variable

Definition:

A variable is live on an edge if there is a path from that edge to a use of the variable that does not pass through any of its defs. A variable is live-in at a node if it is live on any one of its in-edges. A variable is live-out at a node if it is live on any one of its out-edges.

Equations:

in[n]  = use[n] ∪ (out[n] − def[n])
out[n] = ∪ in[s]  for s ∈ succ[n]

Page 40: Automatic compilation

Liveness of a variable (cont)

Algorithm:

for each n:
    in[n] <- {}; out[n] <- {}
repeat
    for each n:
        in'[n]  <- in[n]; out'[n] <- out[n]
        in[n]   <- use[n] ∪ (out[n] − def[n])
        out[n]  <- ∪ in[s]  for s ∈ succ[n]
until in'[n] = in[n] and out'[n] = out[n] for all n

Page 41: Automatic compilation

Liveness of a variable (cont)

Run time complexity

For a control flow graph with N nodes:

The first foreach performs N iterations.

Then there is a nested loop in which the inner loop is a foreach, and each iteration of that foreach performs a union operation. The worst case for this union is a complete control flow graph, where the union takes up to N iterations, so the complexity of the inner foreach loop is O(N^2).

Each iteration of the repeat loop deals with a single edge, which can be either an in-edge or an out-edge, so the complexity of the repeat loop is O(N^2).

Page 42: Automatic compilation

Liveness of a variable (cont)

Run time complexity (cont)

The worst-case complexity of the algorithm is O(N^4). In practice, however, it runs in time between O(N) and O(N^2).