UNIT -IV

System Programming Unit IV

Page 1: System Programming Unit IV

UNIT -IV

Page 2: System Programming Unit IV

Software Development Tools

• LEX (Lexical Analyzer Generator)

• YACC (Yet Another Compiler-Compiler), a parser generator

Page 3: System Programming Unit IV

LEX

Page 4: System Programming Unit IV

• LEX accepts an input specification which consists of two components:

• A specification of the strings representing lexical units

• A specification of the semantic actions aimed at building the TR (Translation Rule)

• The TR consists of a set of tables of lexical units and a sequence of tokens for the lexical units occurring in the source statement.
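The two components can be illustrated with a minimal Python sketch. This is not LEX itself: the `SPEC` list, the token class names and the `scan` function are assumptions made for illustration. Each entry pairs a string pattern with a token class, and the "semantic action" simply fills a table of identifiers and emits a token sequence.

```python
import re

# Hypothetical specification: each entry pairs a pattern (the string
# specification) with a token class (standing in for the semantic action).
SPEC = [
    ("NUM", r"\d+"),
    ("ID",  r"[A-Za-z_]\w*"),
    ("OP",  r"[+\-*/=()]"),
    ("WS",  r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in SPEC))

def scan(source):
    """Return (symbol_table, tokens) for one source statement."""
    symbols, tokens = [], []
    pos = 0
    while pos < len(source):
        m = MASTER.match(source, pos)
        if m is None:
            raise SyntaxError(f"illegal character {source[pos]!r} at {pos}")
        kind, text = m.lastgroup, m.group()
        pos = m.end()
        if kind == "WS":
            continue                      # white space produces no token
        if kind == "ID":
            if text not in symbols:       # enter the identifier in the table
                symbols.append(text)
            tokens.append(("ID", symbols.index(text)))
        else:
            tokens.append((kind, text))
    return symbols, tokens
```

For example, `scan("a = b + 42 * a")` yields the identifier table `['a', 'b']` together with a token sequence in which both occurrences of `a` refer to table entry 0.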

Page 5: System Programming Unit IV

YACC

Page 6: System Programming Unit IV

• YACC is available on Unix systems.

• YACC can be used to produce compilers for Pascal, FORTRAN, C and C++.

• A lexical scanner must be supplied for use with YACC.

• This scanner is called by the parser whenever a new input token is needed.

• The YACC parser generator accepts an input grammar for the language being compiled and a set of actions corresponding to the rules of the grammar.

• The parser generated by YACC uses the bottom-up parse method.

• The parser produced by YACC has very good error detection properties.

Page 7: System Programming Unit IV

LEX & YACC

Page 8: System Programming Unit IV

Parsing

Page 9: System Programming Unit IV

• The scanner recognizes words.

• The parser recognizes syntactic units.

• Parser functions:

– Check the validity of the source string based on the specified syntax rules

– Determine the syntactic structure of the source string

Page 10: System Programming Unit IV

• For an invalid string, the parser issues a diagnostic message reporting the cause and nature of the errors in the string.

• For a valid string, it builds a parse tree to reflect the sequence of derivations or reductions performed during parsing.

• Each step in parsing can identify an elementary subtree by deriving a string from an NT or reducing a string to an NT.

Page 11: System Programming Unit IV

• Check and verify syntax based on specified syntax rules

– Are regular expressions sufficient for describing syntax?

• Example 1: Infix expressions

• Example 2: Nested parentheses

– We use Context-Free Grammars (CFGs) to specify context-free syntax.

• A CFG describes how a sentence of a language may be generated.

Page 12: System Programming Unit IV

CFG

• A CFG is a quadruple (N, T, R, S) where

– N is the set of non-terminal symbols

– T is the set of terminal symbols

– S ∈ N is the starting symbol

– R is a set of rules

• Example: The grammar of nested parentheses, G = (N, T, R, S) where

– N = {S}

– T = { (, ) }

– R = { S → (S), S → SS, S → ε }

Page 13: System Programming Unit IV

Derivations

• The language described by a CFG is the set of strings that can be derived from the start symbol using the rules of the grammar.

• At each step, we choose a non-terminal to replace.

S ⇒ (S) ⇒ (SS) ⇒ ((S)S) ⇒ (( )S) ⇒ (( )(S)) ⇒ (( )((S))) ⇒ (( )(( )))

Each ⇒ is one derivation step; each string in the chain is a sentential form.

This example demonstrates a leftmost derivation : one where we always expand the leftmost non-terminal in the sentential form.
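The leftmost derivation can be replayed mechanically. The sketch below is only an illustration: the names `RULES`, `leftmost_step` and `derive` are invented here, the grammar is encoded as plain strings, and the empty string stands for the empty right-hand side.

```python
# Rules of the nested-parentheses grammar; "" stands for the empty string.
RULES = {"S": ["(S)", "SS", ""]}

def leftmost_step(form, rhs):
    """Replace the leftmost non-terminal S in `form` by the alternative `rhs`."""
    if rhs not in RULES["S"]:
        raise ValueError(f"{rhs!r} is not a rule of the grammar")
    i = form.index("S")            # raises ValueError if no non-terminal is left
    return form[:i] + rhs + form[i + 1:]

def derive(choices, start="S"):
    """Return the list of sentential forms produced by the given rule choices."""
    forms = [start]
    for rhs in choices:
        forms.append(leftmost_step(forms[-1], rhs))
    return forms

# The rule chosen at each step of the derivation shown on the slide.
STEPS = ["(S)", "SS", "(S)", "", "(S)", "(S)", ""]
```

`derive(STEPS)` walks through the same eight sentential forms, ending in the sentence `(()(()))`, which is the slide's `(( )(( )))` written without spaces.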

Page 14: System Programming Unit IV

Derivations and parse trees

• We can describe a derivation using a graphical representation called a parse tree:

– the root is labeled with the start symbol, S

– each internal node is labeled with a non-terminal

– the children of an internal node A are the symbols on the right-hand side of a production for A

– each leaf is labeled with a terminal

• A parse tree has a unique leftmost and a unique rightmost derivation (however, we cannot tell which one was used by looking at the tree).

Page 15: System Programming Unit IV

Derivations and parse trees

• So, how can we use the grammar described earlier to verify the syntax of "(( )((( ))))"?

– We must try to find a derivation for that string.

– We can work top-down (starting at the root/start symbol) or bottom-up (starting at the leaves).

• Careful!

– There may be more than one grammar to describe the same language.

– Not all grammars are suitable.

Page 16: System Programming Unit IV

Types of Parsing

• Top-down parsing

– Recursive descent parser

– Predictive parser

• Bottom-up parsing

– Shift-reduce

– Operator precedence

– LR parser

Page 17: System Programming Unit IV

Top-down Parsing

• Starts with the sentence symbol and builds down towards the terminals.

• It derives a string identical to the given input string by applying the rules of the grammar to the distinguished symbol.

• The output is a syntax tree for the input string.

• At every stage of the derivation, an NT is chosen and a derivation is effected according to a grammar rule.

Page 18: System Programming Unit IV

e.g. consider the grammar

E → T + E / T
T → V * T / V
V → id

• Source string: id + id * id

Prediction      Predicted sentential form
E → T + E       T + E
T → V           V + E
V → id          id + E
E → T           id + T
T → V * T       id + V * T
V → id          id + id * T
T → V           id + id * V
V → id          id + id * id
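A top-down parser that tries the alternatives of each NT in order, and backs up on a mismatch, can be sketched in a few lines of Python. This is an illustrative recognizer only, under the assumption that the input is already split into tokens; the names `GRAMMAR`, `expand` and `accepts` are made up for the sketch.

```python
GRAMMAR = {
    "E": [["T", "+", "E"], ["T"]],
    "T": [["V", "*", "T"], ["V"]],
    "V": [["id"]],
}

def expand(symbols, tokens, pos):
    """Yield every input position reachable after deriving `symbols` from `pos`."""
    if not symbols:
        yield pos
        return
    first, rest = symbols[0], symbols[1:]
    if first in GRAMMAR:
        # Non-terminal: predict each RHS alternative in turn.
        for rhs in GRAMMAR[first]:
            yield from expand(rhs + rest, tokens, pos)
    elif pos < len(tokens) and tokens[pos] == first:
        # Terminal matches the next input symbol: consume it.
        yield from expand(rest, tokens, pos + 1)
    # Otherwise the prediction fails; returning without a result is the backtrack.

def accepts(tokens):
    """True if some sequence of predictions derives exactly `tokens` from E."""
    return any(end == len(tokens) for end in expand(["E"], tokens, 0))
```

`accepts("id + id * id".split())` succeeds, while `accepts("id + * id".split())` exhausts all predictions and fails. The sketch terminates only because this grammar has no left recursion.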

Page 19: System Programming Unit IV

Limitations of Top-down parsing

1. Backtracking is required. Therefore semantic analysis cannot be performed together with syntax analysis, because a prediction may later be undone.

2. Backtracking slows down parsing even if no semantic actions are performed during parsing.

3. Precise error indication is not possible in top-down analysis. Whenever a mismatch is encountered, the parser performs the standard action of backtracking. When no predictions are possible, the input string is declared erroneous.

Page 20: System Programming Unit IV

4. Certain grammar specifications are not amenable (suitable) to top-down analysis. With a left-recursive rule, the left-to-right nature of the parser pushes it into an infinite loop of prediction making. To make top-down parsing feasible, it is necessary to rewrite the grammar so as to eliminate left recursion.

Page 21: System Programming Unit IV

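e.g. consider the grammar

E → E + E / E * E / E / id

• Source string: id + id * id

• Backtracking: the first sequence of predictions yields id * id + id, which does not match the source string, so the parser backtracks and tries the second sequence.

Applied rule    Predicted sentential form
E → E * E       E * E
E → id          id * E
E → E + E       id * E + E
E → id          id * id + E
E → id          id * id + id

Applied rule    Predicted sentential form
E → E + E       E + E
E → id          id + E
E → E * E       id + E * E
E → id          id + id * E
E → id          id + id * id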

Page 22: System Programming Unit IV

e.g. consider the grammar

E → E + E / E * E / E / id

• Source string: id + id * id

• Left recursion

Applied rule    Predicted sentential form
E → E * E       E * E
E → E * E       E * E * E
E → E * E       E * E * E * E
E → E * E       E * E * E * E * E
E → E * E       E * E * E * E * E * E
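The usual way out of this loop is the textbook rewrite of immediate left recursion: A → A α / β becomes A → β A' and A' → α A' / ε. The Python sketch below applies it to a single NT. The function name and the list encoding are assumptions, and the example uses only the alternatives E + E, E * E and id of the grammar above.

```python
def remove_immediate_left_recursion(nt, alternatives):
    """Rewrite  A -> A alpha / beta  as  A -> beta A',  A' -> alpha A' / epsilon.

    `alternatives` is a list of right-hand sides, each a list of symbols;
    the empty list stands for epsilon.
    """
    recursive = [alt[1:] for alt in alternatives if alt and alt[0] == nt]
    others = [alt for alt in alternatives if not alt or alt[0] != nt]
    if not recursive:
        return {nt: alternatives}          # nothing to do
    new = nt + "'"                         # name of the new non-terminal
    return {
        nt: [beta + [new] for beta in others],
        new: [alpha + [new] for alpha in recursive] + [[]],
    }

result = remove_immediate_left_recursion(
    "E", [["E", "+", "E"], ["E", "*", "E"], ["id"]]
)
```

Here `result` describes E → id E' and E' → + E E' / * E E' / ε. Every prediction for E now begins with the terminal id, so the parser can no longer expand E indefinitely without consuming input.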

Page 23: System Programming Unit IV

Top-Down parsing without backtracking

• Whenever a prediction has to be made for the leftmost NT of the sentential form, a decision is made as to which RHS alternative for the NT can lead to a sentence resembling the input string.

• We must select the RHS alternative which can produce the next input symbol.

• The grammar may have to be modified to fulfil this condition.

• Due to the deterministic nature of the parsing, such parsers are known as predictive parsers. A popular form of predictive parser used in practice is the recursive descent parser.

Page 24: System Programming Unit IV

• e.g.

E → T + E / T
T → V * T / V
V → id

• The modified grammar is:

E → T E'
E' → + E / ε
T → V T'
T' → * T / ε
V → id

Page 25: System Programming Unit IV

Prediction      Predicted sentential form
E → T E'        T E'
T → V T'        V T' E'
V → id          id T' E'
T' → ε          id E'
E' → + E        id + E
E → T E'        id + T E'
T → V T'        id + V T' E'
V → id          id + id T' E'
T' → * T        id + id * T E'
T → V T'        id + id * V T' E'
V → id          id + id * id T' E'
T' → ε          id + id * id E'
E' → ε          id + id * id

Page 26: System Programming Unit IV

Recursive Descent Parser

• If the grammar contains recursive rules, the parsing procedures are themselves recursive; such a parser is known as a recursive descent parser (RDP).

• It is constructed by writing a routine to recognize each non-terminal symbol.

• It is well suited to many types of attributed grammar.

• Synthesized attributes can be used because it gives a depth-first construction of the parse tree.

• It uses a simple predictive parsing strategy.

Page 27: System Programming Unit IV

• Error detection is restricted to routines which expect a defined set of symbols in the first position.

• It makes recursive calls to the parse procedures until the required terminal string is obtained.

• RDPs are easy to construct if the programming language permits recursion.
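As a minimal sketch of these points, here is a recursive descent recognizer for the modified grammar E → T E', E' → + E / ε, T → V T', T' → * T / ε, V → id, with one routine per non-terminal. The class layout, the method names (`E_` for E', `T_` for T') and the `$` end marker are choices made for this illustration, not part of the slides.

```python
class Parser:
    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]   # "$" is an assumed end marker
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos]

    def match(self, expected):
        if self.peek() != expected:
            raise SyntaxError(f"expected {expected!r}, found {self.peek()!r}")
        self.pos += 1

    def E(self):                 # E  -> T E'
        self.T()
        self.E_()

    def E_(self):                # E' -> + E / epsilon
        if self.peek() == "+":
            self.match("+")
            self.E()

    def T(self):                 # T  -> V T'
        self.V()
        self.T_()

    def T_(self):                # T' -> * T / epsilon
        if self.peek() == "*":
            self.match("*")
            self.T()

    def V(self):                 # V  -> id
        self.match("id")

def parse(tokens):
    """Return True if `tokens` is a sentence of the grammar, else raise SyntaxError."""
    p = Parser(tokens)
    p.E()
    p.match("$")                 # the whole input must have been consumed
    return True
```

Each routine looks only at the next input symbol to choose an alternative, so no backtracking occurs; errors surface in `match`, the one place where a specific symbol is required.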

Page 28: System Programming Unit IV

Predictive Parser(Table Driven Parser)

• These parsers are used when recursion is not permitted by the programming language.

• They are table-driven parsers that use the prediction technique to eliminate backtracking.

• For a given NT and the next source symbol, a prediction is made.

Page 29: System Programming Unit IV

• A parse table indicates which RHS alternative is used to make a prediction.

• It uses its own stack to store the NTs for which a prediction has not yet been made.

Page 30: System Programming Unit IV

• e.g.

E → T + E / T
T → V * T / V
V → id

• The modified grammar is:

E → T E'
E' → + T E' / ε
T → V T'
T' → * V T' / ε
V → id

Page 31: System Programming Unit IV

Parse table (rows: NT, columns: source symbol; -| marks the end of the input)

NT    id            +               *               -|
E     E → T E'
E'                  E' → + T E'                     E' → ε
T     T → V T'
T'                  T' → ε          T' → * V T'     T' → ε
V     V → id

Page 32: System Programming Unit IV

Prediction        Symbol    Predicted sentential form
E → T E'          id        T E'
T → V T'          id        V T' E'
V → id            +         id T' E'
T' → ε            +         id E'
E' → + T E'       id        id + T E'
T → V T'          id        id + V T' E'
V → id            *         id + id T' E'
T' → * V T'       id        id + id * V T' E'
V → id            -|        id + id * id T' E'
T' → ε            -|        id + id * id E'
E' → ε            -|        id + id * id
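A table-driven version can be sketched with an explicit stack instead of recursion. The code below is an illustration under a few assumptions: the dictionary layout and the names `TABLE`, `END` and `parse` are invented here, and the entry for T' on + is ε, which is what the prediction T' → ε on symbol + in the trace requires.

```python
END = "-|"                               # end-of-input marker
NONTERMINALS = {"E", "E'", "T", "T'", "V"}

# (NT, next source symbol) -> RHS to predict; [] stands for epsilon.
TABLE = {
    ("E",  "id"): ["T", "E'"],
    ("E'", "+"):  ["+", "T", "E'"],
    ("E'", END):  [],
    ("T",  "id"): ["V", "T'"],
    ("T'", "+"):  [],
    ("T'", "*"):  ["*", "V", "T'"],
    ("T'", END):  [],
    ("V",  "id"): ["id"],
}

def parse(tokens):
    """Return the list of predictions made, or raise SyntaxError."""
    tokens = list(tokens) + [END]
    stack = [END, "E"]                   # symbols still to be matched or predicted
    pos, predictions = 0, []
    while stack:
        top, look = stack.pop(), tokens[pos]
        if top in NONTERMINALS:
            rhs = TABLE.get((top, look))
            if rhs is None:
                raise SyntaxError(f"no prediction for {top} on {look!r}")
            predictions.append((top, tuple(rhs)))
            stack.extend(reversed(rhs))  # leftmost RHS symbol ends up on top
        elif top == look:
            pos += 1                     # terminal matched: advance the input
        else:
            raise SyntaxError(f"expected {top!r}, found {look!r}")
    return predictions
```

For `id + id * id` the function makes eleven predictions, starting with E → T E' and ending with E' → ε. A blank table entry is reported as an error as soon as it is consulted.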

Page 33: System Programming Unit IV

Bottom–up Parsing [Shift Reduce Parser]

• A bottom-up parser attempts to develop the syntax tree for an input string through a sequence of reductions.

• If the input string can be reduced to the distinguished symbol, the string is valid. If not, an error is detected and indicated during the process of reduction itself.

• Attempts at reduction start with the first symbol in the string and proceed to the right.

Page 34: System Programming Unit IV

Reduction proceeds as follows

• For the current sentential form, the n symbols to the left of the current position are matched with all RHS alternatives of the grammar.

• If a match is found, these n symbols are replaced with the NT on the LHS of the rule.

• If the symbols do not find a match, then n-1 symbols are matched, followed by n-2 symbols, etc.

Page 35: System Programming Unit IV

• This continues until it is determined that no reduction is possible at the current stage of parsing; at that point one new symbol of the input string is admitted for parsing. This is known as a shift action. Because of this behaviour, such parsers are known as left-to-right parsers or shift-reduce parsers.

Page 36: System Programming Unit IV

Handles

• Handle of a string:

• A substring that matches the RHS of some production and whose reduction to the non-terminal on the LHS is a step along the reverse of some rightmost derivation

• A given sentential form may have many different handles.

• Right sentential forms of a non-ambiguous grammar have one unique handle.

Page 37: System Programming Unit IV

• Rules of production:

• E → E + E

• E → E * E

• E → (E)

• E → id

Page 38: System Programming Unit IV

Stack       Input           Action
$           (id+id)*id$     Shift
$(          id+id)*id$      Shift
$(id        +id)*id$        Reduce by E → id
$(E         +id)*id$        Shift
$(E+        id)*id$         Shift
$(E+id      )*id$           Reduce by E → id
$(E+E       )*id$           Reduce by E → E+E
$(E         )*id$           Shift
$(E)        *id$            Reduce by E → (E)
$E          *id$            Shift
$E*         id$             Shift
$E*id       $               Reduce by E → id
$E*E        $               Reduce by E → E*E
$E          $               Accept
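The reduction procedure described earlier (try the longest run of symbols on top of the stack first, then shorter ones, and shift when nothing matches) can be sketched as follows. The names `RULES`, `reduce_once` and `shift_reduce` are assumptions for this illustration, and the rule E → (E) is assumed because the input string contains parentheses.

```python
RULES = [                       # (LHS, RHS); E -> (E) is an assumption
    ("E", ["E", "+", "E"]),
    ("E", ["E", "*", "E"]),
    ("E", ["(", "E", ")"]),
    ("E", ["id"]),
]

def reduce_once(stack):
    """Try the longest stack suffix first; return the action text or None."""
    for n in range(len(stack), 0, -1):
        for lhs, rhs in RULES:
            if stack[-n:] == rhs:
                del stack[-n:]              # replace the n symbols ...
                stack.append(lhs)           # ... by the NT on the LHS
                return f"Reduce by {lhs} -> {' '.join(rhs)}"
    return None

def shift_reduce(tokens):
    """Return the list of actions taken, or raise SyntaxError."""
    stack, rest, actions = [], list(tokens), []
    while True:
        action = reduce_once(stack)
        if action is None:
            if not rest:
                break                       # nothing to reduce, nothing to shift
            stack.append(rest.pop(0))       # admit one new input symbol
            action = "Shift"
        actions.append(action)
    if stack != ["E"]:
        raise SyntaxError(f"cannot reduce {stack} to E")
    actions.append("Accept")
    return actions
```

On the token list `( id + id ) * id` the sketch performs fourteen actions and ends with Accept. Because it always reduces as soon as any rule matches, it ignores operator precedence: `id + id * id` is grouped as `(id + id) * id`. That weakness is what the precedence relations of an operator-precedence parser address.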

Page 50: System Programming Unit IV

Operator-Precedence Parser

• Operator grammar

– a small but important class of grammars

– we may have an efficient operator-precedence parser (a shift-reduce parser) for an operator grammar.

• In an operator grammar, no production rule can have:

– ε on the right side

– two adjacent non-terminals on the right side.

• Ex:

E → AB                     E → EOE                    E → E+E | E*E | E/E | id
A → a                      E → id
B → b                      O → + | * | /

not an operator grammar    not an operator grammar    an operator grammar
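The two conditions are easy to check mechanically. The following sketch assumes a grammar encoded as a dictionary from each NT to a list of right-hand sides; the function name and the encoding are illustrative choices.

```python
def is_operator_grammar(grammar):
    """Check both conditions: no epsilon RHS and no two adjacent non-terminals.

    `grammar` maps each non-terminal to a list of right-hand sides,
    each a list of symbols; the empty list stands for epsilon.
    """
    nonterminals = set(grammar)
    for alternatives in grammar.values():
        for rhs in alternatives:
            if not rhs:                                    # epsilon on the right side
                return False
            for a, b in zip(rhs, rhs[1:]):
                if a in nonterminals and b in nonterminals:  # adjacent NTs
                    return False
    return True

# The three example grammars from the slide.
G1 = {"E": [["A", "B"]], "A": [["a"]], "B": [["b"]]}
G2 = {"E": [["E", "O", "E"], ["id"]], "O": [["+"], ["*"], ["/"]]}
G3 = {"E": [["E", "+", "E"], ["E", "*", "E"], ["E", "/", "E"], ["id"]]}
```

`G1` and `G2` fail because AB and EO place two non-terminals side by side; `G3` passes because every pair of E symbols is separated by an operator.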

Page 51: System Programming Unit IV

Precedence Relations

• In operator-precedence parsing, we define three disjoint precedence relations between certain pairs of terminals.

a <. b    b has higher precedence than a
a =· b    b has the same precedence as a
a .> b    b has lower precedence than a

• The determination of the correct precedence relations between terminals is based on the traditional notions of associativity and precedence of operators. (Unary minus causes a problem.)

Page 52: System Programming Unit IV

Using Operator-Precedence Relations

• The intention of the precedence relations is to find the handle of a right-sentential form, with <. marking the left end, =· appearing in the interior of the handle, and .> marking the right end.

• In our input string $a1a2...an$, we insert the precedence relation between each pair of adjacent terminals (the relation that holds between the terminals in that pair).

Page 53: System Programming Unit IV

Using Operator-Precedence Relations

E → E+E | E-E | E*E | E/E | E^E | (E) | -E | id

The partial operator-precedence table for this grammar (row: terminal on the left; column: terminal on the right):

        id      +       *       $
id              .>      .>      .>
+       <.      .>      <.      .>
*       <.      .>      .>      .>
$       <.      <.      <.

• Then the input string id+id*id with the precedence relations inserted will be:

$ <. id .> + <. id .> * <. id .> $

Page 54: System Programming Unit IV

To Find The Handles

1. Scan the string from left end until the first .> is encountered.

2. Then scan backwards (to the left) over any =· until a <. is encountered.

3. The handle contains everything to the left of the first .> and to the right of that <.

String with relations                   Reduction       Sentential form
$ <. id .> + <. id .> * <. id .> $      E → id          $ id + id * id $
$ <. + <. id .> * <. id .> $            E → id          $ E + id * id $
$ <. + <. * <. id .> $                  E → id          $ E + E * id $
$ <. + <. * .> $                        E → E*E         $ E + E * E $
$ <. + .> $                             E → E+E         $ E + E $
$ $                                                     $ E $
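The handle-finding steps can be sketched with a stack of terminals and the partial precedence table for id, +, * and $. The names `PREC` and `handles` are assumptions for this illustration. The sketch tracks terminals only, so it reports the order in which handles are reduced but does not check that every operator has its operands.

```python
# PREC[a][b] is the relation between terminal a (left) and terminal b (right),
# taken from the partial table: "<" for <. and ">" for .>
PREC = {
    "id": {"+": ">", "*": ">", "$": ">"},
    "+":  {"id": "<", "+": ">", "*": "<", "$": ">"},
    "*":  {"id": "<", "+": ">", "*": ">", "$": ">"},
    "$":  {"id": "<", "+": "<", "*": "<"},
}

def handles(tokens):
    """Return the terminals of each handle, in the order the handles are reduced."""
    stack, rest, found = ["$"], list(tokens) + ["$"], []
    while True:
        top, look = stack[-1], rest[0]
        if top == "$" and look == "$":
            return found                       # whole string reduced
        rel = PREC[top].get(look)
        if rel == "<":
            stack.append(rest.pop(0))          # shift: handle not yet complete
        elif rel == ">":
            handle = [stack.pop()]             # right end of the handle found
            # scan back to the terminal that is <. the start of the handle
            while PREC[stack[-1]].get(handle[0]) != "<":
                handle.insert(0, stack.pop())
            found.append(handle)
        else:
            raise SyntaxError(f"no relation between {top!r} and {look!r}")
```

For `id + id * id` the handles come out as id, id, id, then `*`, then `+`, matching the reduction order E → id (three times), E → E*E, E → E+E.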