Compiler Construction By: Muhammad Nadeem Edited By: M. Bilal Qureshi


Lecture 2

Compilation Process

Source Code --> Compilation Process --> Object Code
The compilation process also produces Error Messages.

Source Code: something we can understand easily
Object Code: something that the computer can understand easily

Phases of a Compiler (Structure of Compiler)

Analysis (front end of compiler)
Synthesis (back end of compiler)

Source Code
--> Lexical Analyzer --> Tokens
--> Syntax Analyzer --> Syntax Tree
--> Semantic Analyzer --> Syntax Tree
--> Intermediate Code Generator --> Intermediate Representation
--> Code Optimizer --> Intermediate Representation
--> Code Generator --> Object Code

The Symbol Table Manager and the Error Handler interact with the phases.

The analysis part breaks up the source program into its constituent pieces and checks them against the grammar and syntax of the language. It then uses this structure to generate an intermediate representation of the source program.

If the source program is found to be syntactically incorrect or semantically unsound, proper error messages are generated so that the user can take corrective action.

The symbol table is a data structure that collects information about the source program and passes it to the synthesis part along with the intermediate representation.

The synthesis part constructs the desired target program from the intermediate representation and the symbol table information.

Example: Z = X + 10;

Token    Token_ID
Z        1
=        2
X        1
+        2
10       3

Symbol table

Token_ID    Class
1           Variable
2           Operator
3           Number

Lexical Analyzer (Scanner)


It reads a stream of characters and groups the characters into tokens.

Learn by Example
Position = initial + rate*60

Tokens Generated
1. Identifier#1 Position
2. Assignment Operator =
3. Identifier#2 initial
4. Addition Operator +
5. Identifier#3 rate
6. Multiplication Operator *
7. Number 60
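The grouping of characters into the tokens listed above can be sketched in Python. This is only an illustration: the function name scan, the operator table and the way identifiers are numbered are assumptions made here, not part of the lecture.

```python
def scan(source):
    """Group a stream of characters into (token class, lexeme) pairs."""
    operators = {"=": "Assignment Operator",
                 "+": "Addition Operator",
                 "*": "Multiplication Operator"}
    identifiers = {}   # identifier -> number, in order of first appearance
    tokens = []
    i = 0
    while i < len(source):
        ch = source[i]
        if ch.isspace():                       # white space only separates tokens
            i += 1
        elif ch.isalpha() or ch == "_":        # identifier: letter, then letters/digits
            start = i
            while i < len(source) and (source[i].isalnum() or source[i] == "_"):
                i += 1
            name = source[start:i]
            number = identifiers.setdefault(name, len(identifiers) + 1)
            tokens.append((f"Identifier#{number}", name))
        elif ch.isdigit():                     # number: a run of digits
            start = i
            while i < len(source) and source[i].isdigit():
                i += 1
            tokens.append(("Number", source[start:i]))
        elif ch in operators:
            tokens.append((operators[ch], ch))
            i += 1
        else:
            raise ValueError(f"illegal symbol {ch!r} at position {i}")
    return tokens


for token_class, lexeme in scan("Position = initial + rate*60"):
    print(token_class, lexeme)
```

The same function can be tried on the exercise that follows, once an entry for the division operator is added to the operator table.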

Learn by doing
Percentage = Marks_Obtained / Total * 100

Lexical Analyzer --> id1 = id2 + id3*number

Token

The activity of breaking a stream of characters into tokens is called lexical analysis.

The lexical analyzer partitions the input string into substrings, called words, and classifies them according to their role.

Example: Consider

if(b == 0)
    a = b

In the above programming sentence the words are “if”, “(”, “b”, “==”, “0”, “)”, “a”, “=” and “b”.
The roles are keyword, symbol, variable, boolean operator, constant/number and assignment operator.

The pairs made by the lexical analyzer are:
<keyword, if>
<symbol, (>
<variable, b>
<boolean operator, ==>
<constant/number, 0>
<symbol, )>
<variable, a>
<assignment operator, =>
<variable, b>

The pair <role, word> is called a token.
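One possible way to produce such <role, word> pairs is a regular-expression tokenizer. The sketch below is in Python; the pattern, the keyword set and the name make_pairs are assumptions chosen for this example only.

```python
import re

KEYWORDS = {"if", "else", "for", "while", "do", "return"}

# One named alternative per kind of word; "==" is tried before "=".
TOKEN_RE = re.compile(r"""
    (?P<number>\d+)
  | (?P<name>[A-Za-z_]\w*)
  | (?P<boolean>==)
  | (?P<assign>=)
  | (?P<symbol>[()])
  | (?P<space>\s+)
""", re.VERBOSE)

ROLES = {"number": "constant/number",
         "boolean": "boolean operator",
         "assign": "assignment operator",
         "symbol": "symbol"}


def make_pairs(text):
    """Return the list of (role, word) pairs for the input string."""
    pairs = []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(f"illegal symbol {text[pos]!r}")
        kind, word = match.lastgroup, match.group()
        pos = match.end()
        if kind == "space":
            continue                      # white space has no role
        if kind == "name":
            role = "keyword" if word in KEYWORDS else "variable"
        else:
            role = ROLES[kind]
        pairs.append((role, word))
    return pairs


for role, word in make_pairs("if(b == 0)\n    a = b"):
    print(f"<{role}, {word}>")
```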

Specification/description of tokens

Regular languages are the most popular for specifying tokens.
Regular languages can be described using regular expressions.
Each regular expression is a notation for a regular language (a set of words).
If A is a regular expression, we write L(A) to refer to the language denoted by A.

A regular expression (RE) is defined inductively:
a      ordinary character from Σ
ε      the empty string
R|S    either R or S
RS     R followed by S (concatenation)
R*     concatenation of R zero or more times (R* = ε | R | RR | RRR ...)

Here are some REs and the strings of the language denoted by each RE.

RE        Strings in L(RE)
a         “a”
ab        “ab”
a|b       “a” “b”
(ab)*     “” “ab” “abab” ...
(a|ε)b    “ab” “b”
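The table can be checked with Python's re module, whose syntax covers the operators used here. In this sketch the empty alternative in (a|) stands in for ε, and the helper name in_language is an assumption made for the example.

```python
import re

# Each RE from the table, written in Python's regex syntax.
# The empty alternative in "(a|)b" plays the role of the empty string.
EXAMPLES = {
    "a": ["a"],
    "ab": ["ab"],
    "a|b": ["a", "b"],
    "(ab)*": ["", "ab", "abab"],
    "(a|)b": ["ab", "b"],
}


def in_language(regex, word):
    """True when the whole word belongs to the language denoted by regex."""
    return re.fullmatch(regex, word) is not None


for regex, words in EXAMPLES.items():
    for word in words:
        print(repr(regex), repr(word), in_language(regex, word))
```

Strings outside the language are rejected, for example in_language("(ab)*", "aba") is False.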


Role of Lexical Analyzer

1. Removal of white space
2. Removal of comments
3. Recognizes constants
4. Recognizes keywords
5. Recognizes identifiers
6. Correlates error messages with the source program

1. Removal of white space

By white space we mean:
Blanks
Tabs
New lines

Why? White space is generally used for formatting source code.

A = B + C equals A=B+C

Learn by Example

// This is beginning of my code
int A;
int B = 2;
int C = 33;
A = B + C ;
/* This is end of my code*/

2. Removal of comments

Why? Comments are user-added strings which do not contribute to the source code.

Example in Java

// This is beginning of my code
int A;
int B = 2;
int C = 33;
A = B + C ;
/* This is end of my code*/

Both comments mean nothing to the program.
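Removal of comments and white space can be sketched with a regular expression in Python. The names COMMENT and strip_comments_and_space are assumptions for this example, and the sketch does not handle comment markers that appear inside string literals.

```python
import re

# Line comments (// ...) and block comments (/* ... */).
COMMENT = re.compile(r"//[^\n]*|/\*.*?\*/", re.DOTALL)


def strip_comments_and_space(source):
    """Drop comments, then squeeze each run of blanks, tabs and new lines to one blank."""
    without_comments = COMMENT.sub(" ", source)
    return " ".join(without_comments.split())


SOURCE = """// This is beginning of my code
int A;
int B = 2;
int C = 33;
A = B + C ;
/* This is end of my code*/"""

print(strip_comments_and_space(SOURCE))
# prints: int A; int B = 2; int C = 33; A = B + C ;
```

A single blank is kept between words because some white space is still needed as a separator, for example between int and A.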

3. Recognizes constants/numbers

How is recognition done? If the source code contains a stream of digits coming together, it shall be recognized as a constant.

Example in Java

// This is beginning of my code
int A;
int B = 2 ;
int C = 33 ;
A = B + C ;
/* This is end of my code*/

Here 2 and 33 are recognized as constants.

4. Recognizes keywords

Keywords in C and Java: if, else, for, while, do, return, etc.

How is recognition done? By comparing the combination of letters with the keywords predefined in the grammar of the programming language.

Example in Java

int A;
int B = 2 ;
int C = 33 ;
if ( B < C )
    A = B + C ;
else
    A = C - B ;

int: considered a keyword if the character sequence is 1. i 2. n 3. t
if: considered a keyword if the character sequence is 1. i 2. f
else: considered a keyword if the character sequence is 1. e 2. l 3. s 4. e

5. Recognizes identifiers

What are identifiers? Names of variables, functions, arrays, etc.

How is recognition done? If a combination of letters, with or without digits, in the source code is not a keyword, then the compiler considers it an identifier.

Where is the identifier stored? When an identifier is detected, it is entered into the symbol table.

Example in Java

// This is beginning of my code
int A;
int B2 = 2 ;
int C4R = 33 ;
A = B + C ;
/* This is end of my code*/
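Entering identifiers into a symbol table might look like the following Python sketch. The keyword set is an assumed subset, the symbol table is simplified to a plain list, and comments are assumed to have been removed already.

```python
import re

KEYWORDS = {"int", "if", "else", "for", "while", "do", "return"}  # assumed subset


def enter_identifiers(source):
    """Return the symbol table: identifiers in order of first appearance."""
    symbol_table = []
    # A letter followed by letters or digits; the word boundaries stop the
    # pattern from picking up the tail of something like 2ab.
    for word in re.findall(r"\b[A-Za-z][A-Za-z0-9]*\b", source):
        if word not in KEYWORDS and word not in symbol_table:
            symbol_table.append(word)
    return symbol_table


print(enter_identifiers("int A; int B2 = 2 ; int C4R = 33 ; A = B + C ;"))
# prints: ['A', 'B2', 'C4R', 'B', 'C']
```

The keyword int and the constants 2 and 33 are not entered; A appears twice in the source but only once in the table.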

6. Correlates error messages with the source program

How? Keeps track of the number of new line characters seen in the source code. Tells the line number when an error message is to be generated.

Example in Java

1. This is beginning of my code
2. int A;
3. int B2 = 2 ;
4. int C4R = 33 ;
5. A = B + C ;
6. /* This is
7. end of
8. my code
9. */

Error message at line 1: No // inserted in the beginning
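Counting new line characters to attach a line number to a message can be sketched as follows in Python. The set of allowed characters and the '@' used to provoke an error are assumptions made purely for illustration.

```python
def report_illegal_symbols(source):
    """Return one message per illegal character, tagged with its line number."""
    allowed = set("=+-*/<>;(){}[] \t")
    line = 1
    messages = []
    for ch in source:
        if ch == "\n":
            line += 1                      # one more new line character seen
        elif not (ch.isalnum() or ch in allowed):
            messages.append(f"line {line}: illegal symbol {ch!r}")
    return messages


# The '@' on the third line is a hypothetical error added for this example.
SOURCE = "int A;\nint B2 = 2 ;\nint C4R = 33 @ ;\nA = B + C ;\n"
print(report_illegal_symbols(SOURCE))
# prints: ["line 3: illegal symbol '@'"]
```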

Errors generated by Lexical Analyzer

1. Illegal symbols
   • E.g., =>
2. Illegal identifiers
   • E.g., 2ab
3. Unterminated comments
   • E.g., /* This is beginning of my code
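The three kinds of error listed above could be detected along the following lines. This Python sketch rests on two loudly stated assumptions: the set LEGAL_OPERATORS is invented for the example, and every maximal run of operator characters is treated as one symbol so that => is reported as illegal. A real scanner for a particular language may split such runs differently.

```python
import re

COMMENT = re.compile(r"//[^\n]*|/\*.*?\*/", re.DOTALL)
# Assumed operator set for this sketch; a real language defines its own.
LEGAL_OPERATORS = {"=", "==", "+", "-", "*", "/", "<", ">", "<=", ">="}


def lexical_errors(source):
    """Return messages for illegal symbols, illegal identifiers, unterminated comments."""
    errors = []
    body = COMMENT.sub(" ", source)        # properly closed comments are fine
    unterminated = "/*" in body            # a /* that never found its */
    if unterminated:
        body = body[:body.index("/*")]
    # Assumption: each maximal run of operator characters is one symbol.
    for run in re.findall(r"[=<>+\-*/!]+", body):
        if run not in LEGAL_OPERATORS:
            errors.append(f"illegal symbol: {run}")
    # Digits running straight into letters, such as 2ab.
    for word in re.findall(r"\b\d+[A-Za-z]\w*", body):
        errors.append(f"illegal identifier: {word}")
    if unterminated:
        errors.append("unterminated comment")
    return errors


print(lexical_errors("a => 2ab /* This is beginning of my code"))
print(lexical_errors("int a char } switch b[2] =;"))
```

The second call returns an empty list: the words are in a meaningless order, but each one is a legal token on its own.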

Learn by example

// Beginning of Code
int a char } switch b[2] =;
// end of code

No error is generated by the lexical analyzer.

Why? Catching this is the job of the syntax analyzer.

Terminologies

• Lexeme
  – Actual sequence of characters that matches a pattern and has a given token class.
  – Examples:
      Identifier: Name, Data, x
      Integer: 345, 2, 0, 629

• Pattern
  – The rules that characterize the set of strings for a token.
  – Example:
      Integer: A digit followed by zero or more digits
      Identifier: A letter followed by zero or more letters or digits


Syntax Analyzer (Parser)


Uses the tokens produced by the lexical analyzer to create a tree-like intermediate representation.

The parse tree depicts the grammatical structure of the token stream.

Example
Source Code --> Position = initial + rate*60
Lexical Analyzer --> id1 = id2 + id3 * number

Parse Tree / Syntax Tree

        =
       / \
    id1   +
         / \
      id2   *
           / \
        id3   number

id1: position, id2: initial, id3: rate, number: 60
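A tree of this shape can be built from the token stream by a small recursive-descent parser. The Python sketch below assumes a tiny grammar (an assignment whose right-hand side uses +, -, * and /, with * and / binding tighter) and represents each node as a tuple (operator, left, right). The grammar and all names are assumptions for illustration, not the parser used in the lecture.

```python
OPERATORS = {"=", "+", "-", "*", "/"}


def parse_assignment(tokens):
    """Parse  id = expr  and return nested (operator, left, right) tuples."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        return token

    def factor():                          # factor -> id | number
        token = peek()
        if token is None or token in OPERATORS:
            raise SyntaxError(f"expected an operand, found {token!r}")
        return advance()

    def term():                            # term -> factor (("*" | "/") factor)*
        node = factor()
        while peek() in ("*", "/"):
            operator = advance()
            node = (operator, node, factor())
        return node

    def expr():                            # expr -> term (("+" | "-") term)*
        node = term()
        while peek() in ("+", "-"):
            operator = advance()
            node = (operator, node, term())
        return node

    target = factor()
    if peek() != "=":
        raise SyntaxError(f"expected '=', found {peek()!r}")
    advance()
    tree = ("=", target, expr())
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree


print(parse_assignment(["id1", "=", "id2", "+", "id3", "*", "number"]))
# prints: ('=', 'id1', ('+', 'id2', ('*', 'id3', 'number')))
```

Because term is called from inside expr, the multiplication ends up lower in the tree than the addition, which matches the tree drawn for the example.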

Learn by doing
Percentage = Marks_Obtained / Total * 100
