
Cse321, Programming Languages and Compilers


Lecture #11, Feb. 19, 2007

• ml-Yacc

• Actions when reducing

• Making ml-yacc work with ml-lex

• Boiler plate


Assignments

• Reading

– Chapter 4, Sections

» 4.1 Context Sensitive Analysis

» 4.2 Intro to Type Systems

– Pages 151-170

– Quiz on Wednesday?

• Homework #9 is due Wednesday.

• Project 2 is assigned today. It is posted on the web site.


Sml-yacc parser generator

• Sml-yacc specifications contain 3 parts separated by %%

<user declarations>

%%

<declarations about the grammar>

%%

<grammar rules>


Declarations about the grammar

• All begin with a single % followed by a key word.

• Some declarations are required!

– You MUST name the specification

» %name XXX

– You MUST describe the nonterminals and terminals of the grammar.

» %term ....

» %nonterm ...

• The description of terminals and non-terminals requires that you give the type of any attribute that they may have.

• You MUST have a %pos declaration. This declares the type of "positions" (more about this later).


%term and %nonterm

• These things look like algebraic datatype declarations.

• We will build an example parser for Regular Expressions.

(* user declarations *)
datatype Re = empty of int
            | simple of string * int
            | concat of Re * Re
            | closure of Re
            | union of Re * Re;
val count = ref 0;
fun next() = (count := (!count)+1; !count);

%%
(* declarations about the grammar *)
%name XXX

%term EOF | STAR | BAR | LP | RP | HASH | SINGLE of string

%nonterm exp of Re
%pos int

%%
(* grammar rules *)
. . .


Description of Example

• Symbols are represented by EOF, BAR, ...

• None of them have any attributes except SINGLE which has a string attribute which represents the single character that we want to recognize.

• There is only one non-terminal, exp, and it has one attribute which is of type Re.

• Note that the Re type is defined in the user declarations section.

• The %pos declaration says that a position is an integer. This is for error reporting.
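To make the attribute type concrete, here is a small sketch that builds one Re value by hand and prints it. The helper show, and the choice of "#" as the printed form of empty, are assumptions made for illustration; they are not part of the lecture's code.

```sml
(* Same datatype and counter as in the user declarations. *)
datatype Re = empty of int
            | simple of string * int
            | concat of Re * Re
            | closure of Re
            | union of Re * Re;

val count = ref 0;
fun next () = (count := (!count) + 1; !count);

(* Hypothetical helper (an assumption, not from the lecture):
   render an Re value as text, printing empty as "#". *)
fun show (empty _)       = "#"
  | show (simple (s, _)) = s
  | show (concat (a, b)) = show a ^ show b
  | show (closure r)     = "(" ^ show r ^ ")*"
  | show (union (a, b))  = "(" ^ show a ^ "|" ^ show b ^ ")";

(* Each leaf gets a fresh number from next(). *)
val a = simple ("a", next ());
val b = simple ("b", next ());
val e = empty (next ());

(* The regular expression  a(b|#)  as an Re value. *)
val example = concat (a, union (b, e));
val text = show example;
```

The integer stored in each simple and empty node is just the counter value at the time the node was built, so every leaf of the tree can be told apart.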


Recall how a Bottom up Parse Works

E ::= E + T     (1)
    | T         (2)

T ::= T * F     (3)
    | F         (4)

F ::= ( E )     (5)
    | id        (6)

stack      Input      Action

           x + y      shift
x          + y        reduce 6
F          + y        reduce 4
T          + y        reduce 2
E          + y        shift
E +        y          shift
E + y                 reduce 6
E + F                 reduce 4
E + T                 reduce 1
E                     accept


Grammar Rules Section

• Grammar rules describe the grammar that is to be recognized.

• They also tell what to do whenever a "reduce" action is encountered.

• A grammar rule has the form:

– <non-terminal> : <rhs> ( action )

<optional more rules for that non-terminal>

– For example:

exp: SINGLE ( simple(SINGLE,next() ) )

| HASH ( empty (next()) )

• The “action” is a value that is associated with the lhs of the production when it is pushed on the stack.

• It can “depend” upon the values of the symbols in the rhs (which are already on the stack).
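One way to picture this is a stack of attribute values: each reduce pops the values of the rhs symbols and pushes the value computed by the action. The sketch below simulates that by hand for the input a|b*. The functions reduceSingle, reduceStar and reduceUnion are hypothetical names invented here; the parser that ml-yacc generates does this work internally and does not expose such functions.

```sml
datatype Re = empty of int
            | simple of string * int
            | concat of Re * Re
            | closure of Re
            | union of Re * Re;

val count = ref 0;
fun next () = (count := (!count) + 1; !count);

(* Value stack, top of stack at the head of the list.
   All three reduce functions are illustrative assumptions. *)

(* exp: SINGLE   ( simple(SINGLE,next()) ) *)
fun reduceSingle (s, stack) = simple (s, next ()) :: stack;

(* exp: exp STAR   ( closure exp ) *)
fun reduceStar (e :: rest) = closure e :: rest
  | reduceStar _ = raise Fail "reduceStar: value stack too short";

(* exp: exp BAR exp   ( union(exp1,exp2) )
   exp2 was pushed last, so it is on top. *)
fun reduceUnion (e2 :: e1 :: rest) = union (e1, e2) :: rest
  | reduceUnion _ = raise Fail "reduceUnion: value stack too short";

(* The reduce steps for  a|b*  in the order they happen. *)
val stack0 : Re list = [];
val stack1 = reduceSingle ("a", stack0);   (* a    becomes exp *)
val stack2 = reduceSingle ("b", stack1);   (* b    becomes exp *)
val stack3 = reduceStar stack2;            (* b*   becomes exp *)
val stack4 = reduceUnion stack3;           (* a|b* becomes exp *)
```

After the last step the stack holds a single Re value, which is the attribute of the whole expression.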


Example Showing Grammar Rules

(* user declarations (Re) omitted here *)

%%

(* declarations about the grammar *)

%name XXX

%term EOF | STAR | BAR | LP |

RP | HASH | SINGLE of string

%nonterm exp of Re

%pos int

%%

exp: SINGLE ( simple(SINGLE ,next()) )

| HASH ( empty(next()) )

| LP exp RP ( exp )

| exp STAR ( closure exp )

| exp exp ( concat(exp1,exp2) )

| exp BAR exp ( union(exp1,exp2) )


Complete Example

datatype Re = empty of int | simple of string * int
            | concat of Re * Re | closure of Re | union of Re * Re;

%%

%name XXX

%term EOF | STAR | DUMMY | BAR | LP |

RP | HASH | SINGLE of string

%nonterm go of Re | exp of Re

%pos int

%start go

%eop EOF

%verbose

%left LP SINGLE HASH

%left BAR

%left DUMMY

%right STAR


Complete Example continued

%%

go: exp EOF ( exp )

exp: SINGLE ( simple(SINGLE,next()) )
   | HASH ( empty( next() ) )
   | LP exp RP ( exp )
   | exp STAR ( closure exp )
   | exp exp %prec DUMMY ( concat(exp1,exp2) )
   | exp BAR exp ( union(exp1,exp2) )


Boiler Plate

• To get this all to work we need a lexical analyzer that can produce terminal symbols with the correct attributes for the %term directive.

• We can use sml-lex to do this, but instead of defining our own token type we will use the one which is automatically defined by the %term declaration in sml-yacc.

• In order to do this we need the following BOILER-PLATE in the user declarations part of the sml-lex source file.

type pos = int
type svalue = Tokens.svalue
type ('a,'b) token = ('a,'b) Tokens.token
type lexresult = (svalue,pos) token
open Tokens
val lineno = ref 0
val reset_lineno = fn () => lineno := 1
val eof = fn () => EOF(!lineno,!lineno)
fun error (e,l : int,_) = . . .

Boiler plate in Sml-Lex source file
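The reason EOF takes two position arguments is that ml-yacc turns each terminal in the %term declaration into a constructor function that takes the attribute (if any) followed by a left and a right position. The sketch below shows roughly what the generated token signature looks like for the regular-expression example. It is written from memory of ml-yacc's usual output, so treat the exact layout and ordering as an assumption and check the generated XXX.grm.sig file for the real thing.

```sml
(* Approximate shape of the signature ml-yacc generates from
     %name XXX
     %term EOF | STAR | BAR | LP | RP | HASH | SINGLE of string
   Assumption: reconstructed for illustration, not copied from
   an actual generated file. *)
signature XXX_TOKENS =
sig
  type ('a,'b) token
  type svalue

  (* A terminal with an attribute: attribute, left pos, right pos. *)
  val SINGLE : (string) * 'a * 'a -> (svalue,'a) token

  (* Terminals without attributes: left pos, right pos. *)
  val HASH : 'a * 'a -> (svalue,'a) token
  val RP   : 'a * 'a -> (svalue,'a) token
  val LP   : 'a * 'a -> (svalue,'a) token
  val BAR  : 'a * 'a -> (svalue,'a) token
  val STAR : 'a * 'a -> (svalue,'a) token
  val EOF  : 'a * 'a -> (svalue,'a) token
end
```

With open Tokens in the lexer's user declarations, these functions can be called unqualified, as in EOF(!lineno,!lineno).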


More Boiler Plate

• We must also place the following as the FIRST line in the ML-lex definitions section.

%header (functor XXXLexFun (structure Tokens: XXX_TOKENS));

• It is very important that the "type pos = int" be the same type as the %pos declaration in the sml-yacc source file, and that the "XXX" in the %header declaration in the sml-lex source file BE THE SAME as the %name declaration in the sml-yacc source file.

Figure: declarations that must match between the two files.

lexfile: type pos = int and %header (functor XXXLexFun (structure Tokens: XXX_TOKENS))

yacc file: %pos int and %name XXX


Tying it all together

• The file "XXX.cm" ties all the pieces together.

• This file has many occurrences of the string XXX; they must all be changed to the same string as in the %name directive of the sml-yacc source file.

• To build a parser we do the following:

– Start up sml and then use the compile-manager as follows


New Boiler plate for Parser

structure CommonTypes = struct

(* Put type declarations here that you *)
(* want to appear in both the parser *)
(* and lexer. You can open this structure *)
(* elsewhere inside your application as well *)

end;

CommonTypes.sml

group is
  CommonTypes.sml
  XXX.lex
  XXX.grm
  driver.sml
  (* Other user defined sml files go here *)

  $/basis.cm        (* system library files *)
  $/smlnj-lib.cm
  $/ml-yacc-lib.cm

XXX.cm


The Driver file

(* ************** Driver file **************** *)
structure Driver = struct

(* ******* Tie all the libraries together ******** *)

structure regexpLrVals = regexpLrValsFun(structure Token = LrParser.Token);
structure regexpLex = regexpLexFun(structure Tokens = regexpLrVals.Tokens);
structure regexpParser = Join(structure ParserData = regexpLrVals.ParserData
                              structure Lex = regexpLex
                              structure LrParser = LrParser);

(* ******** Build a lexer and Parser *************** *)

val verboselex = ref false;

fun parse s fromfile = . . .

end (* struct Driver *)

Driver.sml


The .lex files

open CommonTypes;

type pos = int
type svalue = Tokens.svalue

(* the type token is from the %term in XXX.grm *)
type ('a,'b) token = ('a,'b) Tokens.token
type lexresult = (svalue,pos) token

(* Defines constructor functions for "token" *)
open Tokens
val lineno = ref 0
val reset_lineno = fn () => lineno := 1

. . .
(* YOUR USER DECLARATIONS (if any) GO HERE *)

%%
%header (functor XXXLexFun(structure Tokens:XXX_TOKENS));

(* YOUR Lex-Definitions (if any) GO HERE *)

%%

(* YOUR RULES GO HERE *)

XXX.lex
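As an illustration of what could go in the definitions and rules sections for the regular-expression example, here is a hedged sketch. The concrete spelling of each token ("#" for HASH, a single letter or digit for SINGLE, and so on) is an assumption, since the slides do not say which characters the lexer accepts. The call to error assumes the three-argument function from the boilerplate, whose body the slides elide.

```sml
(* User declarations: the boilerplate shown above goes here.
   ASSUMPTION: the token spellings chosen in the rules below are
   guesses for illustration; the lecture does not specify them. *)

%%
%header (functor XXXLexFun(structure Tokens: XXX_TOKENS));
ws = [\ \t];
%%
\n          => (lineno := !lineno + 1; lex());
{ws}+       => (lex());
"*"         => (STAR(!lineno,!lineno));
"|"         => (BAR(!lineno,!lineno));
"("         => (LP(!lineno,!lineno));
")"         => (RP(!lineno,!lineno));
"#"         => (HASH(!lineno,!lineno));
[a-zA-Z0-9] => (SINGLE(yytext,!lineno,!lineno));
.           => (error ("bad character " ^ yytext, !lineno, !lineno); lex());
```

Each action returns a token built with one of the constructor functions that open Tokens brings into scope. Rules that produce no token, such as the ones for whitespace, call lex() to continue with the next lexeme.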


The .grm file

open CommonTypes;

(* YOUR USER DECLARATIONS (if any) GO HERE *)

%%
(* declarations about the grammar *)
%name XXX

%term EOF | ...

%nonterm go of ? | ...
%pos int
%start go
%eop EOF
%verbose

(* YOUR GRAMMAR DECLARATIONS LIKE %left ETC. (if any) GO HERE *)

%%

go: ... EOF ( ... )

(* YOUR ADDITIONAL GRAMMAR RULES GO HERE *)

XXX.grm


Putting it all together

• Start sml in the directory where all the files are

• Then type: CM.make "XXX.cm";

• Then open the Driver library

– This imports the function

– parse : string -> bool -> answer_type


Standard ML of New Jersey v110.57 [built: Mon Nov 21 21:46:28 2005]

- CM.make "regexp.cm";
[scanning regexp.cm]
[D:\programs\SML110.57\bin\ml-lex regexp.lex]

Number of states = 12
Number of distinct rows = 2
Approx. memory size of trans. table = 258 bytes
[parsing (regexp.cm):regexp.lex.sml]
[library $/ml-yacc-lib.cm is stable]
[library $SMLNJ-ML-YACC-LIB/ml-yacc-lib.cm is stable]
[loading (regexp.cm):regexp.grm.sig]
[loading (regexp.cm):CommonTypes.sml]
[loading (regexp.cm):regexp.grm.sml]
[compiling (regexp.cm):regexp.lex.sml]
[code: 9617, data: 705, env: 1871 bytes]
[loading (regexp.cm):driver.sml]
[New bindings added.]
val it = true : bool

- open Driver;
opening Driver
  val parse : string -> bool -> Driver.regexpParser.result
  val verboselex : bool ref
end
-


Boiler Plate Files

• The final BOILER PLATE files, which you can fill in by replacing XXX with the name of your parser and filling in the ...'s with some code or rules, can be found in the directory:

http://www.cs.pdx.edu/~sheard/course/Cs321/LexYacc/boilerplate/

SML-version/boilerplate

• You will find the 5 files

– "XXX.lex"

– "XXX.grm"

– "XXX.cm"

– CommonTypes.sml

– Driver.sml

• The outline of these files is included here for your convenience

• The complete example is in the file

http://www.cs.pdx.edu/~sheard/course/Cs321/LexYacc/regexpParser/