SYSTEM PROGRAMMING HANDOUT#09

System Programming

Sunita M. Dol, CSE Dept

Walchand Institute of Technology, Solapur

Aim:

Design a lexical analyzer for tokens: keywords, identifiers, numbers, and operators.

Theory:

Figure 1 depicts the schematic of a two pass language processor. The first pass performs analysis of the source program and reflects its results in the intermediate representation (IR). The second pass reads and analyses the IR, instead of the source program, to perform synthesis of the target program. This avoids repeated processing of the source program.

Figure 1: Two pass schematic for language processing

The Front End

The front end performs
• Lexical analysis,
• Syntax analysis and
• Semantic analysis
of the source program.

Each kind of analysis involves the following functions:

1. Determine validity of a source statement from the viewpoint of the analysis.
2. Determine the ‘content’ of a source statement.


3. Construct a suitable representation of the source statement for use by subsequent analysis functions, or by the synthesis phase of the language processor.

The word ‘content’ has different connotations in lexical, syntax and semantic analysis.
• In lexical analysis, the content is the lexical class to which each lexical unit belongs.
• In syntax analysis, it is the syntactic structure of a source statement.
• In semantic analysis, the content is the meaning of a statement: for a declaration statement, it is the set of attributes of a declared variable (e.g. type, length and dimensionality), while for an imperative statement, it is the sequence of actions implied by the statement.

Figure: Front end of the toy compiler

Output of the front end

The IR produced by the front end consists of two components:

1. Tables of information
2. An intermediate code (IC) which is a description of the source program.


The Back End

The back end performs
• Memory allocation: Memory allocation is a simple task given the presence of the symbol table. The memory requirement of an identifier is computed from its type, length and dimensionality, and memory is allocated to it. The address of the memory area is entered in the symbol table.
• Code generation: Code generation uses knowledge of the target architecture, viz. knowledge of instructions and addressing modes in the target computer, to select the appropriate instructions. The important issues in code generation are:
  o Determine the places where the intermediate results should be kept, i.e. whether they should be kept in memory locations or held in machine registers. This is a preparatory step for code generation.
  o Determine which instructions should be used for type conversion operations.
  o Determine which addressing modes should be used for accessing variables.

Figure: Back end of the toy compiler

Lexical analysis (Scanning)

Lexical analysis identifies the lexical units in a source statement. It then classifies the units into different lexical classes, e.g. id's, constants, reserved id's, etc., and enters them into different tables. Lexical analysis builds a descriptor, called a token, for each lexical unit. A token contains two fields: class code, and number in class. The class code identifies the class to which a lexical unit belongs; the number in class is the entry number of the lexical unit in the relevant table.

Example: The statement a := b+i; is represented as a string of tokens, one for each lexical unit.

Input:

TRIAL.CPP



#include<stdio.h>
#include<conio.h>
void main()
{
    int num1=5, count=1, ab=10;
    char ch;
    printf("This is a trial program");
    while (count != num1)
    {
        ab = ab*count/2;
        if (count==3)
            count = count+1;
        printf("AB %d", ab);
    }
}

Output:

Token ID=0 # Special Character
Token ID=1 include Keyword type
Token ID=2 < Special Character
Token ID=3 stdio Keyword type
Token ID=4 . Special Character
Token ID=5 h Identifier type
Token ID=6 > Special Character
Token ID=7 # Special Character
Token ID=8 include Keyword type
Token ID=9 < Special Character
Token ID=10 conio Keyword type
Token ID=11 . Special Character
Token ID=12 h Identifier type
Token ID=13 > Special Character
Token ID=14 void Keyword type
Token ID=15 main Keyword type
Token ID=16 ( Special Character
Token ID=17 ) Special Character
Token ID=18 { Special Character
Token ID=19 int Keyword type
Token ID=20 num1 Identifier type
Token ID=21 = Operator type
Token ID=22 5 Numeric type
Token ID=23 , Special Character
Token ID=24 count Identifier type
...
Token ID=28 ab Identifier type
...
Token ID=31 ; Special Character
Token ID=32 char Keyword type
Token ID=33 ch Identifier type
...
Token ID=35 printf Keyword type
...
Token ID=37 This is a trial program Literal type
...
Token ID=40 while Keyword type
...
Token ID=43 ! Special Character
...
Token ID=45 num1 Identifier type
...
Token ID=47 { Special Character
...
Token ID=51 * Operator type
...
Token ID=53 / Operator type
...
Token ID=56 if Keyword type
...
Token ID=66 + Operator type
...
Token ID=69 printf Keyword type
...
Token ID=71 AB %d Literal type
...
Token ID=76 } Special Character
Token ID=77 } Special Character

Conclusion:

• In lexical analysis, the content is the lexical class to which each lexical unit belongs.
• Lexical analysis identifies the lexical units in a source statement. It then classifies the units into different lexical classes, e.g. id's, constants, reserved id's, etc., and enters them into different tables.
• Lexical analysis builds a descriptor, called a token, for each lexical unit.
• Thus the lexical analyser is implemented in the C language.
