30
Lexical and Syntax Analysis (of Programming Languages) Bison, a Parser Generator

Syntax Analysis With Bison

  • Upload
    ahmfaw

  • View
    14

  • Download
    0

Embed Size (px)

DESCRIPTION

Syntax Analysis With Bison

Citation preview

Page 1: Syntax Analysis With Bison

Lexical and Syntax Analysis(of Programming Languages)

Bison, a Parser Generator

Page 2: Syntax Analysis With Bison

Bison: a parser generator

Specificationof a parser

Context­free grammarwith a C action for each 

production.

Match the input stringand execute the actions of the  

productions used.

Bison

C function called 

yyparse()

Page 3: Syntax Analysis With Bison

Input to Bison

The structure of a Bison (.y) fileis as follows.

/* Declarations */

%%

/* Grammar rules */

%%

/* C Code (including main function) */

Any text enclosed in /* and */is treated as a comment.

Page 4: Syntax Analysis With Bison

Grammar rules

Let α be any sequence ofterminals and non­terminals. Agrammar rule defining non­terminal n is of the form:

n : α1 action1| α2 action2| ⋯| αn actionn;

Each action is a C statement, ora block of C statements of theform {⋯ }.

Page 5: Syntax Analysis With Bison

Example 1

/* No declarations */

%%

e :    'x'|   'y'|    '('  e  '+'  e  ')'|    '('  e  '*'  e  ')'

%%

/* No main function */

expr1.y

/* No actions */

Non­terminal

Terminal

Page 6: Syntax Analysis With Bison

Output of Bison

Bison generates a C function

int yyparse() {⋯

}

Takes as input a stream of tokens.

Returns zero if input conforms togrammar, and non­zero otherwise.

Calls yylex() to get the next token.

Stops when yylex() returns zero.

When a grammar rule is used, thatrule’s action is executed.

Page 7: Syntax Analysis With Bison

Example 1, revisted

/* No declarations */

%%

e :    'x'|   'y'|    '('  e  '+'  e  ')'|    '('  e  '*'  e  ')'

%%

int yylex() {char c  = getchar();if (c == '\n') return 0; else return c;

}

voidmain() {printf(“%i\n”, yyparse());

}

expr1.y

/* No actions */

Page 8: Syntax Analysis With Bison

Running Example 1

At a command prompt '>':

> bison ­o expr1.c  expr1.y

> gcc ­o expr1 expr1.c ­ly

> expr1(x+(y*x))0

Input

Important!

Output(0 means successful parse)

Page 9: Syntax Analysis With Bison

Example 2

Terminals can be declared using a%token declaration, for example,to represent arithmetic variables:

%token VAR

%%

e :    VAR|    '('  e  '+'  e  ')'|    '('  e  '*'  e  ')'

%%

/* main() and yylex() */

expr2.y

Page 10: Syntax Analysis With Bison

Example 2 (continued)

int yylex() {int c = getchar(); 

/* Ignore white space */while (c == ' ') c = getchar();

if (c == '\n') return 0;if (c >= 'a' && c <= 'z')return VAR;

return c;}

voidmain() {printf(“%i\n”, yyparse());

}

expr2.y

Return a VAR token

Page 11: Syntax Analysis With Bison

Example 2 (continued)

Alternatively, the yylex() functioncan be generated by Flex.

%{#include "expr2.h"%}

%%

" " /* Ignore spaces */\n return 0;[a­z] return VAR;. return yytext[0];

%%

expr2.lex

Generated by Bison

Page 12: Syntax Analysis With Bison

Running Example 2

At a command prompt '>':

> bison ­­defines ­o expr2.c  expr2.y

> flex ­o expr2lex.c  expr2.lex

> gcc ­o expr2 expr2.c expr2lex.c –ly ­lfl

> expr2(a   +   ( b * c ))0

Parser

Output(0 means successful parse)

Lexer

Generate expr2.h

Page 13: Syntax Analysis With Bison

Example 3

Adding numeric literals:

%token VAR%token NUM

%%

e :    VAR|    NUM|    '('  e  '+'  e  ')'|    '('  e  '*'  e  ')'

%%

voidmain() {printf(“%i\n”, yyparse());

}

expr3.y

Numeric Literal

Page 14: Syntax Analysis With Bison

Example 3 (continued)

%{#include "expr3.h"%}

%%

" " /* Ignore spaces */\n return 0;[a­z] return VAR;[0­9]+ return NUM;. return yytext[0];

%%

expr3.lex

Numeric Literal

Adding numeric literals:

Page 15: Syntax Analysis With Bison

Semantic valuesof tokens

A token can have a semanticvalue associated with it.

A NUM token contains an integer.

A VAR token contains a variable name.

Semantic values are returned viathe yylval global variable.

Page 16: Syntax Analysis With Bison

Example 3 (revisited)

%{#include "expr3.h"%}

%%

" " /* Ignore spaces */\n return 0;[a­z] { yylval = yytext[0];

return VAR; }[0­9]+ { yylval = atoi(yytext);

return NUM;}

. return yytext[0];

%%

expr3.lex

Returning values via yylval:

Page 17: Syntax Analysis With Bison

Type of yylval

Problem: different tokens mayhave semantic values of differenttypes. So what is type of yylval?

%union{char var;int num;

}

Solution: a union type, whichcan be specified using the%union declaration, e.g.

yylval is eithera char or an int

Page 18: Syntax Analysis With Bison

Example 3 (revisted)

%{     #include "expr3.h"     %}

%%

" " /* Ignore spaces */\n return 0;[a­z] { yylval.var = yytext[0];

return VAR; }[0­9]+ { yylval.num = atoi(yytext);

return NUM;}

. return yytext[0];

%%

expr3.lex

Returning values via yylval:

Page 19: Syntax Analysis With Bison

Tokens have types

The type of token’s semanticvalue can be specified in a %tokendeclaration.

%union{char var;int num;

}

%token <var> VAR;%token <num> NUM;

Page 20: Syntax Analysis With Bison

Semantic valuesof non­terminals

A non­terminal can also have asemantic value associated with it.

$n refers to the semantic value ofthe nth symbol in the rule;

$$ refers to the semantic value of theresult of the rule.

In the action of a grammar rule:

%type <num> e;

The type can be specified in a%type declaration, e.g.

Page 21: Syntax Analysis With Bison

Example 4

%{int env[256];   /* Variable environment */

%}%union{  int num; char var;  }%token <num> NUM%token <var> VAR%type  <num> e

%%

s   :    e { printf(“%i\n”, $1); }

e   :    VAR                   { $$ = env[$1];  }|    NUM               { $$ = $1; }|    '('  e  '+'  e  ')'   { $$ = $2 + $4;  }|    '('  e  '*'  e  ')'   { $$ = $2 * $4;  }

%%

voidmain() {   env['x'] = 100;  yyparse();   }

expr4.y

Page 22: Syntax Analysis With Bison

Exercise 1

Modify Example 4 so that yyparse()constructs an abstract syntax tree.

typedef enum { Add, Mul }   Op;

struct expr {enum { Var, Num, App } tag;union {char var;int num;struct {struct expr* e1; Op op; struct expr* e2;

} app;};

};

typedef      struct expr       Expr;

Consider the following abstract syntax.

Page 23: Syntax Analysis With Bison

Precedence and associativity

The associativity of an operatorcan be specified using a %left,%right, or %nonassoc directive.

%left '+’%left '*'%right '&'%nonassoc '='

Operators specified in increasingorder of precedence, e.g. '*' hashigher precedence than '+'.

Page 24: Syntax Analysis With Bison

Example 5

%token VAR%token NUM

%left '+'%left '*'

%%

e :    VAR|    NUM|    e  '+'  e|    e  '*'  e|    ( e )

%%

void main() {printf(“%i\n”, yyparse());

}

expr5.y

Page 25: Syntax Analysis With Bison

Conflicts

Sometimes Bison cannot deducethat a grammar is unambiguous,even if it is*.

In such cases, Bison will report:

* Not surprising: ambiguity detection isundecidable in general!

a shift­reduce conflict; or

a reduce­reduce conflict.

Page 26: Syntax Analysis With Bison

Shift­Reduce Conflicts

Bison does not know whether toconsume more tokens (shift) or tomatch a production (reduce), e.g.

stmt :     IF expr THEN stmt|    IF expr THEN stmt ELSE stmt

Bison defaults to shift.

Page 27: Syntax Analysis With Bison

Reduce­Reduce Conflicts

Bison does not know whichproduction to choose, e.g.

expr :     functionCall|    arrayLookup|    ID

functionCall :    ID '(' ID ')' 

arrayLookup :    ID '(' expr ')' 

Bison defaults to using the firstmatching rule in the file.

Page 28: Syntax Analysis With Bison

Variants of Bison

There are Bison variants availablefor many languages:

Language Tool

Java JavaCC, CUP

Haskell Happy

Python PLY

C# Grammatica

* ANTLR

Page 29: Syntax Analysis With Bison

Summary

Bison converts a context­freegrammar to a parsing functioncalled yyparse().

yyparse() calls yylex() to obtainnext token, so easy to connectFlex and Bison.

Page 30: Syntax Analysis With Bison

Summary

Each grammar production mayhave an action.

Terminals and non­terminalshave semantic values.

Easy to construct abstractsyntax trees inside actionsusing semantic values.

Gives a declarative (high level)way to define parsers.