CS 426 Compiler Construction 1. Introduction


Page 1: CS 426 Compiler Construction 1. Introduction. Prolog

CS 426 Compiler Construction 1. Introduction

Page 2: CS 426 Compiler Construction 1. Introduction. Prolog

Prolog

Page 3: CS 426 Compiler Construction 1. Introduction. Prolog

Prolog

› This is a course about designing and building programming language analyzers and translators.

› A fascinating topic:
  – Compilers as translators facilitate programming (increase productivity) by
    › Presenting a high-level interface (high-level language)
    › Enabling target system independence for portability
    › Detecting errors and defects
    › Applying optimizations. When they work, programmers struggle less to tune the program to the target machine. Acceptance of a programming language in some cases depends on compiler effectiveness.

› A bridge across areas
  – Languages
  – Machines
  – Theory, decidability, complexity

Page 4: CS 426 Compiler Construction 1. Introduction. Prolog

› Program translation and analysis are among the oldest Computer Science subjects and have numerous applications.
  – Implementation of compilers that translate high-level languages into machine language. A crucial part of computing since the early days.
  – Implementation of efficient interpreters.
  – Program analysis for
    › Optimization
    › Parallelization/vectorization
    › Refactoring for readability
    › Static error detection
    › Security
  – Binary translation across platforms to increase the availability of software
  – Hardware synthesis, to translate from notations like Verilog and VHDL onto RTL (register transfer language)
  – Database query implementation and optimization

Page 5: CS 426 Compiler Construction 1. Introduction. Prolog

› Performance today
  – Crucial for some applications:
    › Real-time systems
    › Games
    › Computational sciences
  – Not so much for others
    › Highly interactive programs that mainly wait for the user some of the time and for I/O the rest of the time.
    › Computations for which interactiveness is more important than speed (e.g. MATLAB, R, …)

Page 6: CS 426 Compiler Construction 1. Introduction. Prolog

The first commercial compiler

Page 7: CS 426 Compiler Construction 1. Introduction. Prolog

A little bit of compiler history

In the beginning there was FORTRAN

› The compiler (ca. 1956) was a momentous accomplishment

› The top compiler algorithm of the century

› The accomplishment by John Backus and his small team is even more impressive given how little was known then.

10 IEEE Computing in Science and Engineering, Jan 2000

Page 8: CS 426 Compiler Construction 1. Introduction. Prolog

Fortran I: Subscript evaluation

› The address of array element A(I,J,c3*K+6) is base_A + (I-1) + (J-1)*DI + (c3*K+6-1)*DI*DJ, where DI and DJ are the sizes of the first two dimensions.

› There was no strength reduction, induction variable analysis, or data flow analysis.

› They used pattern matching so that every time “K is increased by n (under the control of a DO), the index quantity is increased by c3 DI DJ n, giving the correct value” (Backus, Western Joint Computer Conference, 1957)
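The address formula and the incremental update quoted from Backus can be checked with a small sketch. This is only an illustration in Python, not the Fortran I implementation; the function names `address` and `addresses_incremental` and their parameters are hypothetical. It assumes column-major storage with 1-based subscripts, and a DO loop that steps K by n.

```python
def address(base, i, j, k, di, dj, c3):
    """Column-major address of A(i, j, c3*k + 6), all subscripts 1-based."""
    return base + (i - 1) + (j - 1) * di + (c3 * k + 6 - 1) * di * dj


def addresses_incremental(base, i, j, k_start, k_stop, n, di, dj, c3):
    """Yield the address for k = k_start, k_start + n, ... up to k_stop.

    The full formula is evaluated once; afterwards each step only adds
    the loop-invariant quantity c3 * DI * DJ * n, as in the quotation.
    """
    addr = address(base, i, j, k_start, di, dj, c3)
    step = c3 * di * dj * n
    for _ in range(k_start, k_stop + 1, n):
        yield addr
        addr += step
```

Both functions produce the same sequence of addresses; the second replaces the multiplications inside the loop with a single addition per iteration.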

Page 9: CS 426 Compiler Construction 1. Introduction. Prolog

Fortran I: Induction variable analysis

› “… it was not practical to track down and identify linear changes in subscripts resulting from assignment statements. Thus, the sole criterion …for efficient handling of array references was to be that the subscripts involved were being controlled by DO statements”

Page 10: CS 426 Compiler Construction 1. Introduction. Prolog

Fortran I: Operator precedence in Fortran I

› A big deal.

› “The lack of operator priority (often called precedence …) in the IT language was the most frequent single cause of errors by the users of that compiler” Donald Knuth.

› The Fortran I algorithm:
  – Replace + and - with ))+(( and ))-((, respectively
  – Replace * and / with )*( and )/(, respectively
  – Add (( at the beginning of each expression and after each left parenthesis in the original expression.
  – Add )) at the end of the expression and before each right parenthesis

› “The resulting formula is properly parenthesized, believe it or not” D. Knuth
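The four replacement rules are easy to try out. The following is a minimal Python sketch, not the original implementation; the function name `fortran_parenthesize` is hypothetical, and the sketch assumes expressions that contain only the binary operators +, -, *, / and parentheses (no unary minus, no exponentiation).

```python
def fortran_parenthesize(expr):
    """Fully parenthesize expr using the textual replacement rules above."""
    replacements = {
        "+": "))+((",
        "-": "))-((",
        "*": ")*(",
        "/": ")/(",
        "(": "(((",   # the original "(" followed by the added "(("
        ")": ")))",   # the added "))" followed by the original ")"
    }
    body = "".join(replacements.get(ch, ch) for ch in expr.replace(" ", ""))
    return "((" + body + "))"
```

For example, `fortran_parenthesize("a+b*c")` returns `((a))+((b)*(c))`, in which the multiplication is grouped before the addition without any explicit notion of precedence.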

Page 11: CS 426 Compiler Construction 1. Introduction. Prolog

Fortran I: Register allocation

› Extremely complex

› Used to manage the three index registers of the 704

› “… much of it was carried along into Fortran II and still in use in the 705/9/90. In many programs it still contributes to the production of better code than can be achieved on the new Fortran IV compiler.” Saul Rosen

Page 12: CS 426 Compiler Construction 1. Introduction. Prolog

Fortran I: A difficult chore

› “… didn’t really work when it was delivered.”

› “At first people thought it would never be done.”

› “Then when it was in field test, with many bugs…, many thought it would never work.”

› “Fortran is now almost taken for granted, as if it were built into the computer hardware.”

                                     Saul Rosen, 1967

Page 13: CS 426 Compiler Construction 1. Introduction. Prolog

Fortran I: The challenge then

› “It was our belief that if FORTRAN, during its first months, were to translate any reasonable “scientific” source program into an object program only half as fast as its hand coded counterpart, then acceptance of our system would be in serious danger.”

John Backus

› How close did they come to this goal? Hard to tell

› But we know they succeeded and this conference is a clear testimony of their success

Page 14: CS 426 Compiler Construction 1. Introduction. Prolog

Language processors

Page 15: CS 426 Compiler Construction 1. Introduction. Prolog

Language processors

› Languages can be
  – Translated
    › Compiler
    › Source-to-source
  – Interpreted
  – Processed by a combination of these two approaches

› Translation

[Diagram: source program → compiler → target program → linker → executable; the executable takes input and produces output]

Page 16: CS 426 Compiler Construction 1. Introduction. Prolog

Language processors (Cont.)

› Translation (Cont.)

[Diagram: source program (language A) → source-to-source translator → source program (language B) → compiler → target program → linker → executable; the executable takes input and produces output]

Page 17: CS 426 Compiler Construction 1. Introduction. Prolog

Language processors (Cont.)

› Translation (Cont.)

[Diagram: source program → translator → byte code → virtual machine, which takes input and produces output; a just-in-time compiler and its executable are shown alongside the virtual machine]

Page 18: CS 426 Compiler Construction 1. Introduction. Prolog

The inside of a compiler

[Diagram: compiler phases, each labeled with its input. Character stream → Lexical analyzer → Token stream → Syntax analyzer → Abstract syntax tree → Semantic analyzer → Abstract syntax tree → High level optimizer → Abstract syntax tree → Intermediate code generator → Intermediate representation → Low level optimizer → Intermediate representation → Code generator → Target machine code → Machine-specific optimizer. A Symbol Table is shown alongside the phases.]

Page 19: CS 426 Compiler Construction 1. Introduction. Prolog

The inside of a compiler

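[Diagram: reduced pipeline without the high level optimizer and intermediate code generator. Character stream → Lexical analyzer → Token stream → Syntax analyzer → Semantic analyzer → Intermediate representation → Low level optimizer → Intermediate representation → Code generator → Target machine code → Machine-specific optimizer.]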

Page 20: CS 426 Compiler Construction 1. Introduction. Prolog

The inside of a compiler

[Diagram: the full pipeline from Lexical analyzer to Machine-specific optimizer, annotated with the labels "Source to source optimizer" and "High level language" and an X mark.]

Page 21: CS 426 Compiler Construction 1. Introduction. Prolog

The inside of a compiler

[Diagram: the full pipeline from Lexical analyzer to Machine-specific optimizer, annotated with the labels "Translator (for interpreter)" and "Byte code" and an X mark.]

Page 22: CS 426 Compiler Construction 1. Introduction. Prolog

The inside of a compiler

[Diagram: the full pipeline from Lexical analyzer to Machine-specific optimizer, with a "Front end" label.]

Page 23: CS 426 Compiler Construction 1. Introduction. Prolog

The inside of a compiler

[Diagram: the reduced pipeline (Lexical analyzer, Syntax analyzer, Semantic analyzer, Low level optimizer, Code generator, Machine-specific optimizer), with a "Front end" label.]

Page 24: CS 426 Compiler Construction 1. Introduction. Prolog

The front end

Page 25: CS 426 Compiler Construction 1. Introduction. Prolog


The front end

› It accepts the input language, including comments, pragmas, and macros

› Translates text into data that is more easily manipulable by the compiler.
  – Abstract syntax tree, or
  – Intermediate representation

› Detects and reports syntactic and semantic errors.

› It is built based on a description of the source language
  – Formal for the syntax.
  – Informal (typically) for the semantics (although much has been done to formalize the semantics).

Page 26: CS 426 Compiler Construction 1. Introduction. Prolog

Backus-Naur Form (BNF)

› Introduced by John Backus to formally describe IAL [J. W. Backus, The syntax and semantics of the proposed international algebraic language of the Zürich ACM-GAMM conference. ICIP Paris, June 1959.]

› Adopted to represent ALGOL 60.

› Widely, but not universally, used to describe syntax today (with some extensions).

› A formal description enables automatic (or semi-automatic) generation of lexers and parsers.

Page 27: CS 426 Compiler Construction 1. Introduction. Prolog

BNF of simple syntactic objects

<letter> → a|b|c|…|z|A|B|…|Z

<digit> → 0|1|2|3|4|5|6|7|8|9

<integer> → <digit>
          | <digit> <integer>

<identifier> → <digit>
             | <letter>
             | _
             | <digit> <identifier>
             | <letter> <identifier>
             | _ <identifier>
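Because these productions only ever extend a string one character at a time, the same sets of strings can be described with regular expressions. A minimal Python sketch, assuming only the standard `re` module; the names `INTEGER`, `IDENTIFIER`, `is_integer` and `is_identifier` are hypothetical. Note that, as the productions are written, an identifier may start with a digit, so a string such as 123 is both an <integer> and an <identifier>.

```python
import re

# <integer>: one or more digits
INTEGER = re.compile(r"[0-9]+")
# <identifier>: one or more letters, digits, or underscores
IDENTIFIER = re.compile(r"[A-Za-z0-9_]+")


def is_integer(text):
    """True if the whole string can be derived from <integer>."""
    return INTEGER.fullmatch(text) is not None


def is_identifier(text):
    """True if the whole string can be derived from <identifier>."""
    return IDENTIFIER.fullmatch(text) is not None
```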

Page 28: CS 426 Compiler Construction 1. Introduction. Prolog

BNF of simple syntactic objects

<expr> → <term> <expr’>

<expr’> → + <term> <expr’> | - <term> <expr’> | ε

<term> → <factor> <term’>

<term’> → * <factor> <term’> | / <factor> <term’> | ε

<factor> → <integer>
         | <identifier>
         | ( <expr> )

Page 29: CS 426 Compiler Construction 1. Introduction. Prolog

BNF of a part of C

(Operator tokens are quoted to distinguish them from the BNF alternation bar.)

<conditional-expression> → <logical-or-expression>
    | <logical-or-expression> "?" <expression> ":" <conditional-expression>

<logical-or-expression> → <logical-and-expression>
    | <logical-or-expression> "||" <logical-and-expression>

<logical-and-expression> → <inclusive-or-expression>
    | <logical-and-expression> "&&" <inclusive-or-expression>

<inclusive-or-expression> → <exclusive-or-expression>
    | <inclusive-or-expression> "|" <exclusive-or-expression>

<exclusive-or-expression> → <and-expression>
    | <exclusive-or-expression> "^" <and-expression>

<and-expression> → <equality-expression>
    | <and-expression> "&" <equality-expression>

Page 30: CS 426 Compiler Construction 1. Introduction. Prolog

Example 1 of modified BNF (Modula 2)

Page 31: CS 426 Compiler Construction 1. Introduction. Prolog

Example 2 of Modified BNF (Also Modula 2)

Page 32: CS 426 Compiler Construction 1. Introduction. Prolog

Example 3 of modified BNF (Fortran 95)

Page 33: CS 426 Compiler Construction 1. Introduction. Prolog

Example 4 of modified BNF (Java)

Page 34: CS 426 Compiler Construction 1. Introduction. Prolog

Parsing

› Parsing is the process used to
  – Determine if a string of characters belongs to the language described by the BNF
  – Create the parse tree (not to be confused with the syntax tree in the textbook, which is called abstract syntax tree in these slides)

› The parse tree is seldom explicitly computed and the syntax analyzer typically generates an abstract syntax tree or intermediate code directly.

Page 35: CS 426 Compiler Construction 1. Introduction. Prolog

Example of parse tree

<letter> → a|b|c|…|z|A|B|…|Z

<digit> → 0|1|2|3|4|5|6|7|8|9

<integer> → <digit>
          | <digit> <integer>

<identifier> → <digit>
             | <letter>
             | _
             | <digit> <identifier>
             | <letter> <identifier>
             | _ <identifier>

Parse tree for A_1:

<identifier>
  <letter> A
  <identifier>
    _
    <identifier>
      <digit> 1

Parse tree for 123:

<integer>
  <digit> 1
  <integer>
    <digit> 2
    <integer>
      <digit> 3

Page 36: CS 426 Compiler Construction 1. Introduction. Prolog

Example of parse tree

<expr> → <expr> + <term>
       | <expr> - <term>
       | <term>

<term> → <term> * <factor>
       | <term> / <factor>
       | <factor>

<factor> → <integer>
         | <identifier>
         | ( <expr> )

Parse tree for 1 * k + x / 5:

<expr>
  <expr>
    <term>
      <term>
        <factor>
          <integer> 1
      *
      <factor>
        <identifier> k
  +
  <term>
    <term>
      <factor>
        <identifier> x
    /
    <factor>
      <integer> 5

Page 37: CS 426 Compiler Construction 1. Introduction. Prolog

Formal notion of a grammar

› The BNF description of syntax involves four concepts
  – Nonterminals: syntactic categories from which elements of the language can be derived. These are all the symbols on the LHS of the rules (e.g. <expr>, <term>).
  – Terminals: the actual elements of the language that are not expanded further. They do not appear on the left hand side of any rule (e.g. +, A, …).
    › Note: For practical reasons, parsing is typically done in two phases. First, some objects like <identifier> and <integer> are recognized by the lexical scanning phase. Then, the rest of the language is parsed assuming the objects recognized by lexical scanning are terminals.
  – Productions: the rules of the language, relating nonterminals and terminals.
  – The root: a distinguished nonterminal that will be the root of all parse trees for elements of the language.

Page 38: CS 426 Compiler Construction 1. Introduction. Prolog

Formal notion of a grammar

› We denote a grammar G by (VN, VT, P, S) where VN is the set of nonterminals, VT is the set of terminals, P the set of productions, and S ∈ VN is the start symbol.

› L(G) is the language generated by G. This is defined as follows:
  – Let V = VN ∪ VT
  – A production has the form α → β, with α a string in V+ and β in V* (the right hand side can be ε, the empty string)
    › If α → β is a production, then γαδ ⇒G γβδ
    › If α1 ⇒G α2, α2 ⇒G α3, …, αm-1 ⇒G αm, then α1 ⇒*G αm
  – L(G) = {w | w is in VT* and S ⇒*G w}

Page 39: CS 426 Compiler Construction 1. Introduction. Prolog

Formal notion of a grammar

› Example: Let G = (VN={S}, VT={0,1}, P={S → 0S1, S → 01}, S)

› Then, since S ⇒G 0S1 ⇒G 00S11 ⇒G … ⇒G 0^(n-1) S 1^(n-1) ⇒G 0^n 1^n, we have that L(G) = {0^n 1^n | n ≥ 1}.
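The derivation in this example can be replayed mechanically. A minimal Python sketch; the function name `derive` is hypothetical. It applies S → 0S1 exactly n-1 times and then S → 01 once, recording every sentential form along the way.

```python
def derive(n):
    """Return the sentential forms of the derivation of 0^n 1^n (n >= 1)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    form = "S"
    steps = [form]
    for _ in range(n - 1):
        form = form.replace("S", "0S1")   # production S -> 0S1
        steps.append(form)
    form = form.replace("S", "01")        # production S -> 01
    steps.append(form)
    return steps
```

For n = 3 the recorded forms are S, 0S1, 00S11 and 000111, matching the chain of derivation steps shown in the example.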

Page 40: CS 426 Compiler Construction 1. Introduction. Prolog

Classes of grammars

› Classified according to productions
  – Type 0, the most general class, defined above.
  – Type 1 or context sensitive
    › α → β such that |α| ≤ |β|.
    › Some use α1Aα2 → α1βα2 with β ≠ ε and A in VN. The classes of languages generated are the same.
    › The second form motivates the name context sensitive.
  – Type 2 or context free: A → β where A is in VN and β in V+.
  – Type 3 or regular grammars: A → aB or A → a where A and B are in VN and a in VT.

› Languages are typically defined using context free grammars.

› Regular grammars are used for lexical scanning.

Page 41: CS 426 Compiler Construction 1. Introduction. Prolog

Formal notion of a grammar

› Example: Let G = (VN={S,B,C}, VT={a,b,c}, P, S)

› With P = {
    1. S → aSBC
    2. S → aBC
    3. CB → BC
    4. aB → ab
    5. bB → bb
    6. bC → bc
    7. cC → cc }

› Then, L(G) = {a^n b^n c^n | n ≥ 1}. Proof?
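Short of a proof, the claim can at least be checked for small n by enumerating every string the grammar derives. The Python sketch below is an illustration only; `PRODUCTIONS` and `language_up_to` are hypothetical names. It does a breadth-first search over sentential forms, and it relies on the fact that no production of this grammar shortens a string, so any form longer than the bound can be discarded safely.

```python
from collections import deque

PRODUCTIONS = [
    ("S", "aSBC"), ("S", "aBC"), ("CB", "BC"), ("aB", "ab"),
    ("bB", "bb"), ("bC", "bc"), ("cC", "cc"),
]


def language_up_to(max_len, productions=PRODUCTIONS, start="S"):
    """Return every terminal string of length <= max_len derivable from start."""
    seen = {start}
    queue = deque([start])
    words = set()
    while queue:
        form = queue.popleft()
        if form.islower():                # only the terminals a, b, c remain
            words.add(form)
            continue
        for lhs, rhs in productions:
            pos = form.find(lhs)
            while pos != -1:              # rewrite each occurrence of lhs in turn
                new = form[:pos] + rhs + form[pos + len(lhs):]
                if len(new) <= max_len and new not in seen:
                    seen.add(new)
                    queue.append(new)
                pos = form.find(lhs, pos + 1)
    return sorted(words, key=len)
```

Derivations that reach a dead end, such as aabcBC (no production rewrites cB), never become all-terminal strings and are therefore not reported.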

Page 42: CS 426 Compiler Construction 1. Introduction. Prolog

Multiple grammars, single language

› Different grammars can be equivalent, i.e. they generate the same language.

› Grammars can be modified to remove “undesirable properties”

› For example, it is better for the grammar not to be ambiguous, that is, not to allow multiple parse trees for a given element of the language.

Page 43: CS 426 Compiler Construction 1. Introduction. Prolog

Example of ambiguous grammar (1/3)

› A simple expression involving only numbers, additions and multiplications can be described as follows:

<expr> → <expr> + <term>
       | <term>

<term> → <term> * <factor>
       | <factor>

<factor> → <integer>

› A simpler but ambiguous version generating the same language would be:

<expr> → <expr> + <expr>
       | <expr> * <expr>
       | <integer>

Page 44: CS 426 Compiler Construction 1. Introduction. Prolog

Example of ambiguous grammar (2/3)

1 + 2 + 3

[Two parse trees under the ambiguous grammar: one groups the expression as (1 + 2) + 3, the other as 1 + (2 + 3)]

4 * 5 + 6

[Two parse trees under the ambiguous grammar: one groups the expression as (4 * 5) + 6, the other as 4 * (5 + 6)]

Page 45: CS 426 Compiler Construction 1. Introduction. Prolog

Example of ambiguous grammar (3/3)

› The examples above also show that we need the right grammar to represent
  – Associativity (Sec. 2.2.5)
  – Precedence of operators (Sec. 2.2.6)
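The ambiguity can be made concrete by counting parse trees. The Python sketch below is an illustration, with the hypothetical name `count_parse_trees`; it assumes the ambiguous grammar <expr> → <expr> + <expr> | <expr> * <expr> | <integer> and an input already split into tokens. Every operator position is tried as the root of the tree, and the counts for the left and right operands are multiplied.

```python
from functools import lru_cache


def count_parse_trees(tokens):
    """Count the parse trees of a token list under the ambiguous grammar."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(lo, hi):
        # Number of ways tokens[lo:hi] can be derived from <expr>.
        if hi - lo == 1:
            return 1 if tokens[lo].isdigit() else 0
        total = 0
        for mid in range(lo + 1, hi - 1):     # candidate root operator
            if tokens[mid] in ("+", "*"):
                total += count(lo, mid) * count(mid + 1, hi)
        return total

    return count(0, len(tokens))
```

For instance, `count_parse_trees("1 + 2 + 3".split())` and `count_parse_trees("4 * 5 + 6".split())` both return 2, one tree per grouping. An unambiguous grammar would give exactly one tree for each of these inputs.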

Page 46: CS 426 Compiler Construction 1. Introduction. Prolog

Left recursive grammar

› Left-recursive productions are another example of an undesirable property.

› For example, we could define <identifier> as follows:

<identifier> → <digit>
             | <letter>
             | _
             | <identifier> <digit>
             | <identifier> <letter>
             | <identifier> _

› The identifiers defined with these productions are exactly the same, but this version has the (sometimes undesirable) property of being left recursive.

Page 47: CS 426 Compiler Construction 1. Introduction. Prolog

A very simple compiler

Page 48: CS 426 Compiler Construction 1. Introduction. Prolog

Our first compiler

› Now, we present a simple syntax analyzer for assignment statements.

› The parser will directly generate intermediate code.

› Our assignments are so simple that there is no need for a semantic analyzer.

› We use a stack machine similar to that of the Java Virtual Machine. Here
  – memory(l) is location l of memory,
  – S is the stack
  – TOS is the top of the stack.
  – LIT A: S(++TOS) = A
  – LOAD: S(TOS) = memory(S(TOS))
  – STORE: memory(S(TOS-1)) = S(TOS); TOS -= 2
  – NEG: S(TOS) = -S(TOS)
  – ADD, MUL, DIV: S(TOS-1) = S(TOS-1) op S(TOS); --TOS, where op is +, * or /, respectively

Page 49: CS 426 Compiler Construction 1. Introduction. Prolog

Our first compiler

› Thus, for the assignment A = -A + 5 * B / (B-1)

› The compiler should generate
  › LIT A
  › LIT A
  › LOAD
  › NEG
  › LIT 5
  › LIT B
  › LOAD
  › MUL
  › LIT B
  › LOAD
  › LIT 1
  › NEG
  › ADD
  › DIV
  › ADD
  › STORE
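To see that this instruction sequence really computes the assignment, it can be run on a small interpreter for the stack machine. The Python sketch below is an illustration under stated assumptions: the names `BINARY` and `run` are hypothetical, memory locations are represented by variable names used as dictionary keys, and DIV is taken to be ordinary (non-truncating) division, which the slides do not specify.

```python
import operator

BINARY = {"ADD": operator.add, "MUL": operator.mul, "DIV": operator.truediv}


def run(program, memory):
    """Execute a list of instruction strings; memory maps locations to values."""
    stack = []
    for instr in program:
        op, *arg = instr.split()
        if op == "LIT":                   # S(++TOS) = A
            text = arg[0]
            stack.append(int(text) if text.isdigit() else text)
        elif op == "LOAD":                # S(TOS) = memory(S(TOS))
            stack.append(memory[stack.pop()])
        elif op == "STORE":               # memory(S(TOS-1)) = S(TOS); TOS -= 2
            value = stack.pop()
            memory[stack.pop()] = value
        elif op == "NEG":                 # S(TOS) = -S(TOS)
            stack.append(-stack.pop())
        elif op in BINARY:                # S(TOS-1) = S(TOS-1) op S(TOS); --TOS
            right = stack.pop()
            left = stack.pop()
            stack.append(BINARY[op](left, right))
        else:
            raise ValueError("unknown instruction: " + instr)
    return memory


PROGRAM = ["LIT A", "LIT A", "LOAD", "NEG", "LIT 5", "LIT B", "LOAD", "MUL",
           "LIT B", "LOAD", "LIT 1", "NEG", "ADD", "DIV", "ADD", "STORE"]
```

Starting from A = 2 and B = 6, `run(PROGRAM, {"A": 2, "B": 6})` leaves A equal to -2 + 5 * 6 / (6 - 1) = 4.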

Page 50: CS 426 Compiler Construction 1. Introduction. Prolog

Grammar

› We extend BNF with regular expressions to define our grammar.

<assignment> → <identifier> = <expr>

<expr> → (ε|-) <term> ((+|-) <term>)*

<term> → <factor> ((*|/) <factor>)*

<factor> → <integer> | <identifier> | ( <expr> )

<integer> → 0|1|2|3|4|5|6|7|8|9

<identifier> → A|B|C|…|Z

Page 51: CS 426 Compiler Construction 1. Introduction. Prolog

Recursive descent compiler

› There is only one variable: token, which has the value of the next character in the input line.

› The main program is as follows:

    char token;
    token = nextchar();   // nextchar() skips spaces
    assignment();

Page 52: CS 426 Compiler Construction 1. Introduction. Prolog

Recursive descent compiler

› Then, we just follow the productions when writing the code:

<assignment> → <identifier> = <expr>

    assignment() {
        if (isletter(token)) identifier();   // left-hand side: prints LIT and the name
        else throw error();
        if (token == '=') token = nextchar();
        else throw error();
        expr();
        print("STORE");
    }

    identifier() { print("LIT"); print(token); token = nextchar(); }

    integer() { print("LIT"); print(token); token = nextchar(); }

Page 53: CS 426 Compiler Construction 1. Introduction. Prolog

Recursive descent compiler

› <expr> → (ε|-) <term> ((+|-) <term>)*

    expr() {
        // check for unary minus
        if (token == '-') {
            token = nextchar();
            term();
            print("NEG");
        }
        else term();

        // process sequence of + and -
        while (token == '-' || token == '+') {
            char t = token;
            token = nextchar();
            term();
            if (t == '-') print("NEG");   // a - b is compiled as a + (-b)
            print("ADD");
        }
    }

Page 54: CS 426 Compiler Construction 1. Introduction. Prolog

Recursive descent compiler

› <term> → <factor> ((*|/) <factor>)*

    term() {
        factor();
        // process sequence of * and /
        while (token == '*' || token == '/') {
            char t = token;
            token = nextchar();
            factor();
            if (t == '*') print("MUL");
            else print("DIV");
        }
    }

Page 55: CS 426 Compiler Construction 1. Introduction. Prolog

Recursive descent compiler

› <factor> → <integer> | <identifier> | ( <expr> )

    factor() {
        if (isletter(token)) {
            identifier();
            print("LOAD");               // a variable in an expression is read from memory
        }
        else if (isdigit(token)) integer();
        else if (token == '(') {
            token = nextchar();
            expr();
            if (token == ')') token = nextchar();
            else throw error();
        }
        else throw error();
    }
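For readers who want to run the compiler, the routines from these slides can be transcribed into Python almost line by line. This is a sketch, not the course's reference implementation: the class name `Compiler` and its methods are hypothetical, `identifier()` and `integer()` are merged into one `literal()` method because their bodies are identical, the generated instructions are collected in a list instead of being printed, and any letter (not only A to Z) is accepted as an identifier.

```python
class Compiler:
    def __init__(self, text):
        self.chars = iter(text.replace(" ", ""))   # skipping spaces, like nextchar()
        self.code = []
        self.token = self.nextchar()

    def nextchar(self):
        return next(self.chars, "")                # "" marks the end of the input

    def emit(self, *parts):
        self.code.append(" ".join(parts))

    def literal(self):                             # identifier() and integer()
        self.emit("LIT", self.token)
        self.token = self.nextchar()

    def assignment(self):
        if not self.token.isalpha():
            raise SyntaxError("identifier expected")
        self.literal()
        if self.token != "=":
            raise SyntaxError("'=' expected")
        self.token = self.nextchar()
        self.expr()
        self.emit("STORE")
        return self.code

    def expr(self):
        if self.token == "-":                      # unary minus
            self.token = self.nextchar()
            self.term()
            self.emit("NEG")
        else:
            self.term()
        while self.token in ("+", "-"):
            op = self.token
            self.token = self.nextchar()
            self.term()
            if op == "-":
                self.emit("NEG")                   # a - b becomes a + (-b)
            self.emit("ADD")

    def term(self):
        self.factor()
        while self.token in ("*", "/"):
            op = self.token
            self.token = self.nextchar()
            self.factor()
            self.emit("MUL" if op == "*" else "DIV")

    def factor(self):
        if self.token.isalpha():
            self.literal()
            self.emit("LOAD")
        elif self.token.isdigit():
            self.literal()
        elif self.token == "(":
            self.token = self.nextchar()
            self.expr()
            if self.token != ")":
                raise SyntaxError("')' expected")
            self.token = self.nextchar()
        else:
            raise SyntaxError("unexpected token: " + repr(self.token))
```

Calling `Compiler("A = -A + 5 * B / (B-1)").assignment()` returns the same sixteen instructions listed earlier for this assignment, from `LIT A` through `STORE`. Like the pseudocode, the sketch does not check for leftover input after the expression.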