System Programming
Sunita M. Dol, CSE Dept
Walchand Institute of Technology, Solapur
HANDOUT#09
Aim:
Design lexical analyzer for tokens: keywords, identifiers, numbers, and operators.
Theory:
Figure 1 depicts the schematic of a two pass language processor. The first pass performs analysis of the source program and reflects its results in the intermediate representation. The second pass reads and analyses the IR, instead of the source program, to perform synthesis of the target program. This avoids repeated processing of the source program.
Figure 1: Two pass schematic for language processing
The Front End
The front end performs
• Lexical analysis,
• Syntax analysis and
• Semantic analysis of the source program.
Each kind of analysis involves the following functions:
1. Determine validity of a source statement from the viewpoint of the analysis.
2. Determine the ‘content’ of a source statement.
3. Construct a suitable representation of the source statement for use by subsequent analysis functions, or by the synthesis phase of the language processor.
The word ‘content’ has different connotations in lexical, syntax and semantic analysis.
• In lexical analysis, the content is the lexical class to which each lexical unit belongs,
• In syntax analysis it is the syntactic structure of a source statement.
• In semantic analysis the content is the meaning of a statement: for a declaration statement, it is the set of attributes of a declared variable (e.g. type, length and dimensionality), while for an imperative statement, it is the sequence of actions implied by the statement.
Figure: Front end of the toy compiler
Output of the front end
The IR produced by the front end consists of two components:
1. Tables of information
2. An intermediate code (IC) which is a description of the source program.
The Back End
The back end performs
• Memory allocation: Memory allocation is a simple task given the presence
of the symbol table. The memory requirement of an identifier is computed
from its type, length and dimensionality, and memory is allocated to it. The
address of the memory area is entered in the symbol table.
• Code generation: Code generation uses knowledge of the target architecture,
viz. knowledge of instructions and addressing modes in the target computer,
to select the appropriate instructions. The important issues in code
generation are:
o Determine the places where the intermediate results should
be kept, i.e. whether they should be kept in memory
locations or held in machine registers. This is a preparatory
step for code generation.
o Determine which instructions should be used for type
conversion operations.
o Determine which addressing modes should be used for
accessing variables.
Figure: Back end of the toy compiler
Lexical analysis (Scanning)
Lexical analysis identifies the lexical units in a source statement. It then classifies the units into different lexical classes, e.g. id's, constants, reserved id’s, etc, and enters them into different tables. Lexical analysis builds a descriptor, called a token, for each lexical unit. A token contains two fields: class code and number in class. Class code identifies the class to which a lexical unit belongs; number in class is the entry number of the lexical unit in the relevant table.
Example: The statement a := b+i; is represented as a string of tokens, one for each lexical unit.
Input:
TRIAL.CPP
#include <stdio.h>
#include <conio.h>
void main()
{
    int num1 = 5, count = 1, ab = 10;
    char ch;
    printf("This is a trial program");
    while (count != num1)
    {
        ab = ab * count / 2;
        if (count == 3)
            count = count + 1;
        printf("AB %d", ab);
    }
}
Output:
Token ID=0 # Special Character
Token ID=1 include Keyword type
Token ID=2 < Special Character
Token ID=3 stdio Keyword type
Token ID=4 . Special Character
Token ID=5 h Identifier type
Token ID=6 > Special Character
Token ID=7 # Special Character
Token ID=8 include Keyword type
Token ID=9 < Special Character
Token ID=10 conio Keyword type
Token ID=11 . Special Character
Token ID=12 h Identifier type
Token ID=13 > Special Character
Token ID=14 void Keyword type
Token ID=15 main Keyword type
Token ID=16 ( Special Character
Token ID=17 ) Special Character
Token ID=18 { Special Character
Token ID=19 int Keyword type
Token ID=20 num1 Identifier type
Token ID=21 = Operator type
Token ID=22 5 Numeric type
Token ID=23 , Special Character
Token ID=24 count Identifier type
Token ID=25 = Operator type
Token ID=26 1 Numeric type
Token ID=27 , Special Character
Token ID=28 ab Identifier type
Token ID=29 = Operator type
Token ID=30 10 Numeric type
Token ID=31 ; Special Character
Token ID=32 char Keyword type
Token ID=33 ch Identifier type
Token ID=34 ; Special Character
Token ID=35 printf Keyword type
Token ID=36 ( Special Character
Token ID=37 This is a trial program Literal type
Token ID=38 ) Special Character
Token ID=39 ; Special Character
Token ID=40 while Keyword type
Token ID=41 ( Special Character
Token ID=42 count Identifier type
Token ID=43 ! Special Character
Token ID=44 = Operator type
Token ID=45 num1 Identifier type
Token ID=46 ) Special Character
Token ID=47 { Special Character
Token ID=48 ab Identifier type
Token ID=49 = Operator type
Token ID=50 ab Identifier type
Token ID=51 * Operator type
Token ID=52 count Identifier type
Token ID=53 / Operator type
Token ID=54 2 Numeric type
Token ID=55 ; Special Character
Token ID=56 if Keyword type
Token ID=57 ( Special Character
Token ID=58 count Identifier type
Token ID=59 = Operator type
Token ID=60 = Operator type
Token ID=61 3 Numeric type
Token ID=62 ) Special Character
Token ID=63 count Identifier type
Token ID=64 = Operator type
Token ID=65 count Identifier type
Token ID=66 + Operator type
Token ID=67 1 Numeric type
Token ID=68 ; Special Character
Token ID=69 printf Keyword type
Token ID=70 ( Special Character
Token ID=71 AB %d Literal type
Token ID=72 , Special Character
Token ID=73 ab Identifier type
Token ID=74 ) Special Character
Token ID=75 ; Special Character
Token ID=76 } Special Character
Token ID=77 } Special Character
Conclusion:
• In lexical analysis, the content is the lexical class to which each lexical unit belongs.
• Lexical analysis identifies the lexical units in a source statement. It then classifies the units into different lexical classes, e.g. id's, constants, reserved id’s, etc, and enters them into different tables.
• Lexical analysis builds a descriptor, called a token, for each lexical unit.
• Thus the lexical analyzer is implemented in the C language.