lex & yacc
CIS*2750 Winter 2013
D. McCaughan, CIS*2750 (W13)
Scanners
A “scanner” turns an input stream in the source language into token codes
– in principle: takes some action when it recognizes a token in the input
– discards non-semantic content (i.e. whitespace, comments)
– may do other small jobs, like converting numeric constants

This is the work of scanners; e.g. the input

if (a == 0) { /* increase b */ b++; }

is turned into the token stream

IF LPAREN ID EQ CONSTANT RPAREN LBRACE ID INCR SEMI RBRACE
Scanners: Lexical Analysis
Analyze the structural components of input
– scanner: groups input characters into tokens

What is a token?
– a sequence of characters that can be treated as an atomic grammatical unit
– a language specifies a finite set of token types (the lexical units of the language), e.g.
• ID (“foo”, “bar”, “abc123”, …), IF (“if”), INTEGER, REAL, COMMA, NEQ, LPAREN, RPAREN, …
– some tokens carry additional semantic values
• e.g. identifiers, string literals, numbers
Scanners: example program:

/* find a zero */
float match0(char *s)
{
    if (!strncmp(s, "0.0", 3))
        return(0.0);
}

scanner tokenizes:

FLOAT ID(match0) LPAREN CHAR STAR ID(s) RPAREN LBRACE IF LPAREN BANG ID(strncmp) LPAREN ID(s) COMMA STRING(0.0) COMMA INTEGER(3) RPAREN RPAREN RETURN LPAREN REAL(0.0) RPAREN SEMI RBRACE EOF
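As a rough illustration of what such a scanner does internally, here is a minimal hand-written C tokenizer (a sketch of our own, not from the course; the token names like T_CONST are made up for the example):

```c
#include <ctype.h>
#include <string.h>

typedef enum { T_IF, T_ID, T_LPAREN, T_RPAREN, T_EQ, T_CONST, T_EOF } tok_t;

/* Return the next token starting at *p, advancing *p past it. */
tok_t next_token(const char **p) {
    while (isspace((unsigned char)**p)) (*p)++;   /* discard whitespace */
    if (**p == '\0') return T_EOF;
    if (**p == '(') { (*p)++; return T_LPAREN; }
    if (**p == ')') { (*p)++; return T_RPAREN; }
    if ((*p)[0] == '=' && (*p)[1] == '=') { *p += 2; return T_EQ; }
    if (isdigit((unsigned char)**p)) {            /* numeric constant */
        while (isdigit((unsigned char)**p)) (*p)++;
        return T_CONST;
    }
    const char *start = *p;                       /* identifier or keyword */
    while (isalnum((unsigned char)**p) || **p == '_') (*p)++;
    if (*p == start) { (*p)++; return T_ID; }     /* skip unknown characters */
    if (*p - start == 2 && strncmp(start, "if", 2) == 0) return T_IF;
    return T_ID;
}
```

Calling next_token repeatedly over "if (a == 0)" yields T_IF, T_LPAREN, T_ID, T_EQ, T_CONST, T_RPAREN, then T_EOF — the same shape of stream as above.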
Specifying tokens
Structure of tokens can be complex
– problems defining complex tokens ad hoc
– e.g. string literals
– e.g. floating point formats

Need a formal language to specify token types without ambiguity
– permits review of the design and validation of input

Regular expressions
– succinct, precise
– capable of representing infinite sets of strings
• CAUTION: cannot describe all sets of strings with regular expressions
• consider writing a regex for “strings containing an equal number of a and b characters”
Finite automata
Need a formalism that can be implemented in code

Finite Automaton: a simple idealized “computer” that recognizes strings belonging to regular sets:
– a finite set of states S
– a finite alphabet Σ
– a set of transitions between states based on the input read in a given state, T: (S × Σ) → S
– a specific start state s ∈ S
– a set of final (accepting) states F ⊆ S
– the set of all strings accepted by a given FA is the language it defines

Compare the above with what you understand about regular expressions… they are equivalent
Finite automata (cont.)
Can represent an FA using transition graphs
– directed graph
– each state is a vertex
• accepting states are marked as such
– each transition is a directed edge between states
• edges are labelled with a symbol from the alphabet
• a symbol can appear on only 1 outgoing edge from a given state
• an unlabelled edge is directed into the start state

[Transition graph: the unlabelled start edge enters a state whose outgoing edge, labelled a-zA-Z_, leads to an accepting state that loops back to itself on a-zA-Z_ and 0-9 — an FA for identifiers.]
Finite automata (cont.)
Deterministic finite automata
– no two edges leaving the same state are labelled with the same input symbol

Processing
– begin in the start state
– for each character in the input:
• follow the edge labelled with this character to the next state
– after n transitions (for input of length n), if the current state is a final state: ACCEPT the string, else: REJECT it

Easily implemented
– CASE-based processing given a global current state
– matrix-based transition table (table lookup)
• newstate = matrix[current_state][input]
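The table-lookup implementation can be made concrete. Below is a small sketch (our own, not from the slides) of a matrix-driven DFA that accepts identifiers of the form [a-zA-Z_][a-zA-Z0-9_]*:

```c
#include <ctype.h>

/* States for a DFA recognizing C-style identifiers.
   State 0 = start, 1 = in identifier (accepting), 2 = dead/reject. */
enum { START = 0, IN_ID = 1, DEAD = 2 };

/* Input classes: 0 = letter/underscore, 1 = digit, 2 = anything else. */
int input_class(char c) {
    if (isalpha((unsigned char)c) || c == '_') return 0;
    if (isdigit((unsigned char)c)) return 1;
    return 2;
}

/* newstate = matrix[current_state][input], exactly as on the slide. */
int matrix[3][3] = {
    /* letter  digit  other */
    {  IN_ID,  DEAD,  DEAD },   /* START */
    {  IN_ID,  IN_ID, DEAD },   /* IN_ID (accepting) */
    {  DEAD,   DEAD,  DEAD },   /* DEAD */
};

/* Run the DFA over a whole string; accept iff we end in IN_ID. */
int is_identifier(const char *s) {
    int state = START;
    for (; *s; s++)
        state = matrix[state][input_class(*s)];
    return state == IN_ID;
}
```

Note the dead state: once an illegal character is seen, every transition stays in DEAD and the string is rejected.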
Scanner generators
Writing scanners is a common requirement
– parsing is a ubiquitous activity
Process is repetitive, resulting code is similar in structure
Process is not difficult to automate
Scanner generators receive a specification file– definitions of the tokens to be scanned– non-procedural programming
• not “how”, but “what”
e.g. lex
lex
What is lex?
– a lexical analyzer (scanner) generator
– INPUT:• a description file that uses regular expressions to specify
patterns to be tokenized– OUTPUT:
• source code that implements the scanner
– by default, input is taken from stdin and output is sent to stdout (this can be changed)
Specific variants– flex for creating C/C++ lexers– JFlex for creating Java lexers– etc.
Structure of a lex file
Definitions section– small building blocks of regular expressions to simplify the
scanner specification• declared outside of %{ %}• special directives to change lex’s behaviour also appear here
– anything inside of %{ %} is copied verbatim into the final program (so should be C code)
• comments, #include, #define, variables (e.g. line counter), etc.
DEFINITIONS SECTION
%%
RULES SECTION
%%
USER CODE SECTION
Structure of a lex file (cont.)
Rules section
– a pattern (regex) and an action (program code) to execute when that pattern is found
• the action starts on the same line as the pattern• patterns only match a given input string once• the longest possible match is always used
– e.g. given the input “island”, the pattern [a-zA-Z]+ matches the whole word rather than stopping after “is”
User code section– any legal program code, not enclosed in %{ %}– copied verbatim into final program– main(), other subroutines used (or expected) by actions
from the rules section– NOTE: comments outside of %{ %} must be indented!
Example
The simplest lex script:
– simply copies standard input to standard output– ECHO is a special lex directive, not a C command
%%
.|\n { ECHO; }
%%
Running lex
Executing lex:
lex <lexfile>
e.g.
% lex example.l (produces lex.yy.c)
– outputs C source code for lexer - by default this file is called lex.yy.c, which can be compiled normally
– some systems may require you to link in the lex library (i.e. -ll; note for flex: -lfl)
e.g.
% gcc -Wall -ansi lex.yy.c -o scanner -lfl
Running lex (cont.)
Key points:
– automatically generates a function yylex() which when called begins scanning the input (stdin) for patterns and executing actions
• if actions have no return statements, yylex() won’t return until EOF
– internal variables are always available in actions• yytext - text that matched the pattern• yyleng - length of string in yytext• etc. (some implementations will have built-in support for lineno)
– if a main() routine is not explicitly provided, lex will include one automatically that simply calls yylex()
Example
Things to note here:
– variables declared in the definitions section; #define and #include would also belong there
– special internal variables (yyleng) and functions (yylex)

%{
/* a word counting program */
unsigned char_count = 0, word_count = 0, line_count = 0;
%}

word [^ \t\n]+
eol  \n

%%

{word} { word_count++; char_count += yyleng; }
{eol}  { char_count++; line_count++; }
.      { char_count++; }

%%

int main()
{
    yylex();
    printf("l: %u - w: %u - c: %u\n", line_count, word_count, char_count);
}
DEFINITIONS
RULES
USER CODE
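For comparison, the same counting logic written by hand in plain C (a sketch of our own; the generated scanner does the equivalent, with the regex engine finding the word boundaries):

```c
/* Count characters, words and lines in a string, mirroring the lex
   word-count example: a word is a maximal run of non-whitespace. */
void count(const char *s, unsigned *chars, unsigned *words, unsigned *lines) {
    int in_word = 0;
    *chars = *words = *lines = 0;
    for (; *s; s++) {
        (*chars)++;
        if (*s == '\n') (*lines)++;
        if (*s == ' ' || *s == '\t' || *s == '\n') {
            in_word = 0;               /* whitespace ends any word */
        } else if (!in_word) {
            in_word = 1;               /* first char of a new word */
            (*words)++;
        }
    }
}
```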
Example

%{
/* crude verb recognition program */
%}

%%

[\t ]+ { /* ignore whitespace */ }

is |
are |
was |
being |
do |
did |
would |
can |
have |
go { printf("%s: is a verb\n", yytext); }

[a-zA-Z]+ { printf("%s: is not a verb\n", yytext); }

.|\n { ECHO; /* default catch-all */ }

%%

int main()
{
    yylex();
}
Example (cont.)
Compiled & run:

% ./verb
did I have fun?
did: is a verb
I: is not a verb
have: is a verb
fun: is not a verb
?
^D
Hints and tips
Error reporting
– you’ll want to be able to report (at least) a line number for unrecognized tokens (and other error conditions related to the parser to follow)
– consider using %option yylineno in flex
• you could also easily implement this functionality yourself (how?)
– it can be useful to have special actions apply inside a comment (for example) or other semantic construct
• have a look at “start conditions” (lex manual) and <<EOF>> rules
Recall that tokens often have associated semantic values that must be recorded over time– symbol table: a look-up table (typically a hash table) that
permits storage and retrieval of data to be associated with a symbol
– consider how this would be integrated with lex
Parsers
What we’ve seen to this point is syntax analysis
– only concerned with identifying the structural components of the input
Typically the sequences of tokens are also significant: this is semantic analysis– recognize sequences of tokens (or classes of tokens) and
perform appropriate actions
“Parsers” validate the phrase structure of input– specific sequences of tokens– recognizer– determine the semantics of the input
• consider parse trees (abstract syntax)
Parsers (cont.)
A language is defined by the phrase structure of its component expressions, e.g.:

addition expression = ID ADDOP ID
e.g.
a + b

decl = TYPE ID decls SEMICOLON
decls = decls COMMA decls | ID
e.g.
int a, b, c;
Specification of languages
Consider defining phrases with regexes
– e.g. addition expressions

digits = [0-9]+
sum = (digits "+")* digits

• e.g. 28 + 301 + 9

– what about parentheses?

digits = [0-9]+
sum = expr "+" expr
expr = "(" sum ")" | digits

• e.g. (109 + 23) … 61 … (1 + (250 + 3))
Specification of languages (cont.)
BUT… it is impossible for a DFA to recognize balanced parentheses (it can’t count to arbitrary N)
– sum and expr are thus not regular expressions

Recall abbreviations in lex
– what does lex do with such abbreviations?
– the RHS is substituted for the LHS prior to generation of the DFA
– try substituting the abbreviations in the previous example
• explosion of abbreviations
– abbreviations do not increase expressive power

What we need is recursive abbreviations
Context Free Grammars (CFGs)
A precise method of specifying context free languages
Incorporate recursion into definitions– counting
• e.g. balanced parentheses
– arbitrary repetition• e.g. mathematical expressions
CFG Terminology
Non-terminals: variables that represent a language (UPPER CASE)
Terminals: atomic symbols in the language (lower case)
Productions: rules relating variables (→)
– the language associated with a given non-terminal contains strings formed by concatenating strings from the languages of other non-terminals, and possibly terminals
Start symbol: a special symbol that starts all derivations (S)
Backus-Naur Form (BNF)
From Hopcroft & Ullman, 1979
Describing natural language:

<sentence> → <noun phrase> <verb phrase>
<noun phrase> → <adjective> <noun phrase>
<noun phrase> → <noun>
<noun> → boy
<adjective> → little

Generally not adequate for describing natural language (no accommodation of context)
Ideal for most programming languages
– Backus-Naur Form (BNF)
Productions
Example
– arithmetic expressions with + and - operators, id-class operands and balanced parentheses

S → EXPR
EXPR → EXPR + EXPR
EXPR → EXPR - EXPR
EXPR → ( EXPR )
EXPR → id

or, equivalently:

S → EXPR
EXPR → EXPR + EXPR | EXPR - EXPR | ( EXPR ) | id
Derivations
To show a sentence is in the language defined by a grammar, we can perform a derivation --- start with the start symbol and repeatedly replace any non-terminal by one of its RHSs

S ⇒ EXPR ⇒ EXPR + EXPR ⇒ EXPR + id ⇒ id + id

S ⇒ EXPR ⇒ EXPR - EXPR ⇒ ( EXPR ) - EXPR ⇒ ( EXPR ) - id ⇒ ( EXPR + EXPR ) - id ⇒ ( EXPR + id ) - id ⇒ ( id + id ) - id
Parse trees
A tree in which each symbol in a derivation is connected to the one from which it was derived
– several derivations can have the same parse tree

[Parse tree for S ⇒ … ⇒ ( id + id ) - id: the root S derives EXPR; that EXPR → EXPR - EXPR; the left EXPR → ( EXPR ), whose inner EXPR → EXPR + EXPR with both children deriving id; the right EXPR derives id.]
Derivation sequence
Many different possible derivations of the same
sentence– if more than one non-terminal appears in the RHS of
productions, we can choose which to expand first
Two obvious conventions:– leftmost derivation
• choose leftmost non-terminal to expand• top down (recursive descent) parsing• easiest to write by hand
– rightmost derivation• choose rightmost non-terminal to expand• “canonical” derivation• bottom up parsers (e.g. yacc)
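Top-down (recursive descent) parsing is indeed easy to write by hand. As a sketch (our own, not from the slides), here is a recognizer for the earlier expression grammar, rewritten without left recursion since a top-down parser cannot handle it; the character 'a' stands in for any id token:

```c
/* Recognizer for EXPR -> EXPR + EXPR | EXPR - EXPR | ( EXPR ) | id,
   rewritten for top-down parsing as:
       expr -> term { (+|-) term }
       term -> 'a' | '(' expr ')'            */

static const char *cur;          /* next unread character */

static int parse_expr(void);

static int parse_term(void) {
    if (*cur == 'a') { cur++; return 1; }          /* id */
    if (*cur == '(') {                             /* ( expr ) */
        cur++;
        if (!parse_expr()) return 0;
        if (*cur != ')') return 0;
        cur++;
        return 1;
    }
    return 0;
}

static int parse_expr(void) {
    if (!parse_term()) return 0;
    while (*cur == '+' || *cur == '-') {           /* { (+|-) term } */
        cur++;
        if (!parse_term()) return 0;
    }
    return 1;
}

/* Accept iff the whole string is a valid expression. */
int accepts(const char *s) {
    cur = s;
    return parse_expr() && *cur == '\0';
}
```

Each non-terminal becomes one function, and the recursion in the grammar becomes recursion in the code — which is why leftmost/top-down parsers are the easiest to write by hand.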
Repetition and recursion
Two ways to specify recursion

Left recursion
– non-terminal appears as the first symbol on the RHS of a production (NOTE: for yacc it is better to use left recursion where possible - it minimizes stack size)
– e.g. A → Az | z

Right recursion
– non-terminal appears as the last symbol on the RHS of a production
– e.g. A → zA | z

Either form produces the same language
– which we use can have a significant effect depending on the parsing algorithm used
Example
Specifying a programming language (Pascal-like)

PROGRAM → HEADER VARS BODY

HEADER → program string '(' IO ')' ';'
IO → input | output | inpout | none

VARS → DECLS | void ';'
DECLS → DECLS DECL | DECL
DECL → TYPE IDS ';'
TYPE → integer | real
IDS → IDS ',' id | id

BODY → begin STMTS end

STMTS → STMTS STMT | STMT
STMT → EXPR ';'

EXPR → EXPR '+' EXPR | EXPR '=' EXPR | '(' EXPR ')' | id | number
Errors in grammars
Ambiguity: effect on semantics
– consider 2 - 1 - 3
– (2 - 1) - 3 != 2 - (1 - 3)
– checking whether a general CFG is ambiguous is impossible (undecidable)
• algorithms exist for certain classes of grammar (such as those for which we can generate parsers)

Recall: a grammar is used to define a language
– errors in the grammar: the wrong language is defined
– comparison for identity (equality) between pairs of grammars in the general case is also impossible
Ambiguous grammars
A grammar is ambiguous if we can derive a sentence with two different parse trees
– semantics are no longer necessarily clear; e.g.

S → EXPR
EXPR → EXPR + EXPR | EXPR - EXPR | id

– NOTE: multiple ways to derive id + id - id

Leftmost derivation:

S ⇒ EXPR ⇒ EXPR + EXPR ⇒ id + EXPR ⇒ id + EXPR - EXPR ⇒ id + id - EXPR ⇒ id + id - id

Rightmost derivation:

S ⇒ EXPR ⇒ EXPR - EXPR ⇒ EXPR - id ⇒ EXPR + EXPR - id ⇒ EXPR + id - id ⇒ id + id - id
Ambiguous grammars (cont.)

[Two parse trees for id + id - id: in one, the root EXPR expands as EXPR + EXPR, with the right subtree expanding to EXPR - EXPR, grouping id + (id - id); in the other, the root expands as EXPR - EXPR, with the left subtree expanding to EXPR + EXPR, grouping (id + id) - id.]
Resolving ambiguity
Disambiguating rules
– explicitly states which parse tree is correct– no change required to grammar
Precedence– stated order of derivations based on operator– recall: subtrees will be evaluated before expressions
represented by root node --- order of derivations is opposite to order of evaluation
Associativity– stated order of derivations based on location– left associative: derivation from first choice– right associative: derivation from last choice
Resolving ambiguity (cont.)
Rewrite the grammar
– accommodate concepts of precedence and associativity in the statement of the grammar
• write rules that have phrases to be evaluated first deriving later in the production sequence

Precedence:
EXPR → EXPR + EXPR | MEXPR
MEXPR → MEXPR * MEXPR | AEXPR
AEXPR → ( EXPR ) | number

Associativity:
EXPR → EXPR + MEXPR | MEXPR
MEXPR → MEXPR * AEXPR | AEXPR
AEXPR → ( EXPR ) | number
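The rewritten grammar translates directly into a recursive-descent evaluator, one function per precedence level. A sketch (our own; single-digit numbers only, no error handling):

```c
/* Evaluator mirroring the rewritten grammar:
   eval_expr  handles '+' (lowest precedence, left associative)
   eval_mexpr handles '*' (binds tighter)
   eval_aexpr handles parentheses and single-digit numbers. */

static const char *p;

static int eval_expr(void);

static int eval_aexpr(void) {
    if (*p == '(') {
        p++;
        int v = eval_expr();
        p++;                        /* skip ')' (no error handling here) */
        return v;
    }
    return *p++ - '0';              /* single-digit number */
}

static int eval_mexpr(void) {
    int v = eval_aexpr();
    while (*p == '*') { p++; v *= eval_aexpr(); }
    return v;
}

static int eval_expr(void) {
    int v = eval_mexpr();
    while (*p == '+') { p++; v += eval_mexpr(); }
    return v;
}

int eval(const char *s) { p = s; return eval_expr(); }
```

Because '+' is handled at the outermost level and '*' one level down, multiplication is evaluated first — the grammar's precedence falls out of the call structure, and the left-to-right loops give left associativity.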
Common ambiguities
Mathematical expressions
– if parentheses are not required, operators that are not associative by nature may be ambiguous

Conditional expressions
– dangling else

if condition
    if condition
        statements
    else
        statements

if condition
    if condition
        statements
else
    statements

(the two indentations show the two possible attachments of the else)
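C resolves the dangling else by binding it to the nearest unmatched if. A small demonstration (our own example):

```c
/* The else below pairs with `if (b)`, not `if (a)`,
   regardless of how the code is indented. */
int classify(int a, int b) {
    if (a)
        if (b)
            return 1;    /* a && b */
        else
            return 2;    /* a && !b -- else binds to the inner if */
    return 3;            /* !a */
}
```

If the programmer wanted the else to pair with the outer if, braces around the inner if would be required.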
Notes
Classes of grammars
– regular grammar (regex)
• A → zB | z OR (i.e. not both)
• A → Bz | z
– context free grammar (CFG)
• A → B (A is any non-terminal, B is any string)
– context sensitive grammar
• xAz → xBz (A is any non-terminal, B is any string)
– unrestricted grammar
• also called recursively enumerable
Example: context issues in programming languages– symbols defined prior to use– cannot specify with CFGs
yacc
What is yacc?
– a parser generator
– INPUT:• a description file that uses a BNF-like notation to specify
sequences of tokens to be recognized as a semantic unit– OUTPUT:
• source code that implements the parser
– yacc operates on tokens rather than the input directly• requires a source of tokens (like lex!)
Specific variants– bison, byacc for creating C/C++ parsers– CUP for creating Java parsers– etc.
Using lex & yacc together
The parser is the higher-level routine
– it calls the lexer when it needs a token from the input
– the scanner sends tokens to the parser as codes– not all input is of interest to the parser
(whitespace, comments) so the lexer does not return these
What are the token codes?– scanner and parser must agree
• solution: let yacc define the token codes• tokens defined in the parser will automatically be defined
as a small integer value using #define macros in a header file generated automatically by yacc
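A sketch of that agreement (our own toy; the #define values here are illustrative — real yacc numbers user tokens from 257 upward, above any single-character token, and writes them to the generated header):

```c
#include <ctype.h>

/* Token codes as yacc would emit them into y.tab.h (values illustrative). */
#define NAME   257
#define NUMBER 258
#define PLUS   259

static const char *src = "x + 4";   /* toy input stream */

/* A toy yylex over the fixed string, returning the agreed codes. */
int toy_yylex(void) {
    while (*src == ' ') src++;
    if (*src == '\0') return 0;                       /* 0 = end of input */
    if (isalpha((unsigned char)*src)) {
        while (isalpha((unsigned char)*src)) src++;
        return NAME;
    }
    if (isdigit((unsigned char)*src)) {
        while (isdigit((unsigned char)*src)) src++;
        return NUMBER;
    }
    if (*src == '+') { src++; return PLUS; }
    return (unsigned char)*src++;    /* single-char token: its own code */
}
```

Because both sides include the same header, the parser asking for the next token and the scanner answering are guaranteed to speak the same integer codes.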
yacc and parsing
Shift/reduce parsing
– a yacc parser looks for rules that might match the tokens seen so far
– has a set of states: each reflects a possible position in one or more partially matched rules
– when it reads a token that doesn’t complete a rule, it pushes the token onto a stack and switches to a new state
• this is a shift
– when it reads a token that completes a rule, it pops the RHS symbols off the stack, pushes the LHS symbol onto the stack and switches to a new state
• this is a reduce
– whenever a rule is reduced, user code associated with the rule is executed
Shift/reduce parsing
e.g.

statement → NAME = expression
expression → NUMBER + NUMBER | NUMBER - NUMBER

Parse: A = 12 + 13

stack: A           (shift A)
stack: A =         (shift =)
stack: A = 12      (shift 12)
stack: A = 12 +    (shift +)
stack: A = 12 + 13 (shift 13)

This matches the rule expression → NUMBER + NUMBER, so reduce: pop 13, +, 12 and push expression

stack: A = expression

This matches the rule statement → NAME = expression, so reduce: pop expression, =, A and push statement

End of input. The stack has been reduced to the start symbol, so the input was valid according to the grammar.
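The trace above can be simulated with an explicit stack. The following toy (our own, one character per symbol, reducing greedily) is a simplification of yacc's table-driven state machine, not how it actually decides, but it shows the shift/reduce mechanics:

```c
/* Toy shift/reduce loop for the grammar above, one char per symbol:
   N = NAME, n = NUMBER, E = expression, S = statement.
   Input "N=n+n" models "A = 12 + 13". Purely illustrative. */
int parse(const char *input) {
    char stack[32];
    int top = 0;                      /* number of symbols on the stack */
    for (const char *p = input; ; ) {
        /* reduce as long as some rule's RHS sits on top of the stack */
        if (top >= 3 && stack[top-3] == 'n' &&
            (stack[top-2] == '+' || stack[top-2] == '-') &&
            stack[top-1] == 'n') {
            top -= 3; stack[top++] = 'E';   /* expression -> n op n */
            continue;
        }
        if (top >= 3 && stack[top-3] == 'N' && stack[top-2] == '=' &&
            stack[top-1] == 'E') {
            top -= 3; stack[top++] = 'S';   /* statement -> N = E */
            continue;
        }
        if (*p == '\0') break;
        stack[top++] = *p++;                /* shift the next token */
    }
    return top == 1 && stack[0] == 'S';     /* reduced to start symbol? */
}
```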
Structure of a yacc file

Definitions section
– specify tokens and types for symbols, precedence and associativity rules
• declared outside of %{ %}
• tokens and types for symbols in the grammar (with %token and %type respectively)
• we can specify a non-integer token type (as a union) with %union
• a start symbol can be explicitly specified with %start
– anything inside of %{ %} is copied verbatim into the final program (so should be C code)
• comments, #include, #define, variables (e.g. symbol table)
DEFINITIONS SECTION
%%
RULES SECTION
%%
USER CODE SECTION
Structure of a yacc file (cont.)
Rules section
– a grammar rule and an action (program code) to execute when that pattern is found
• default start symbol is the LHS of the first rule• NOTE: yacc cannot parse ambiguous grammars!
– a rule consists of a list of grammar rules (using “:” instead of “->”), optionally including an action consisting of program code, with a semi-colon terminating each rule
– parser generated will execute any action present when it reduces a rule
User code section– any legal program code, not enclosed in %{ %}– copied verbatim into final program– main(), other subroutines used (or expected) by actions from the
rules section• caution: there should only be one main() between lex and yacc
(obviously)
Symbols
NOTE: yacc reverses the BNF conventions with
respect to terminals and non-terminals– non-terminals are lower-case; terminals are upper-case
Every symbol in a yacc grammar has a value– symbols can be of different types by using the %union and %type directives
– the LHS is referred to as $$; the symbols on the RHS are referred to by position, as $1, $2, $3, …
– these shorthand notations are replaced in the generated code by the actual variable containing the value
Running yacc
Executing yacc:
yacc -d -y <yaccfile>
– outputs C source, by default named y.tab.c, and an include file for use by a scanner, named y.tab.h
– must also produce/compile a scanner and link it all together
% yacc -d -y example.y (produces y.tab.[ch])
Key points– automatically provides a function yyparse()
• scans the input, shifting/reducing until the scanner reports the end of input (subsequent calls will reset the state and continue)
– internal variables are available to both lexer and parser (yyin - input stream; yylval - value of lexer token, etc.)
Example: an expression parser

%{
#include <stdlib.h>
#include <stdio.h>
%}

%union {
    int ival;
    char *sval;
}
%token PLUS MINUS EQUALS
%token <sval> NAME
%token <ival> NUMBER
%type <ival> expression

%%

statement : NAME EQUALS expression { printf("%s = %d\n", $1, $3); }
          | expression             { printf("= %d\n", $1); }
          ;

expression : expression PLUS NUMBER  { $$ = $1 + $3; }
           | expression MINUS NUMBER { $$ = $1 - $3; }
           | NUMBER                  { $$ = $1; }
           ;

%%

extern FILE *yyin;

int yyerror(char *s)
{
    fprintf(stderr, "%s\n", s);
    return 0;
}

int main()
{
    if (yyin == NULL)
        yyin = stdin;
    while (!feof(yyin))
        yyparse();
}
DEFINITIONS
RULES
USER CODE
Example: expression parser’s scanner
Things to note here:– control source of input by setting yyin (in yacc)– yyerror() is called by yacc on parse errors (and can be freely
used in actions otherwise), and should be provided– y.tab.h is the include file generated by yacc that contains the
token definitions– yacc parsers contain an internal variable called yylval that the
lexer should set to contain any value associated with a token (the token itself is always returned as an integer - as defined by yacc)
– note: o.k. to return yytext[0], but not yytext - why?• careful managing memory when copying strings (this
coupling can’t be avoided with lex/yacc)
%{
#include <stdlib.h>
#include <string.h>
#include "y.tab.h"
%}

%%

[a-zA-Z_]+ { yylval.sval = strdup(yytext); return(NAME); }
[0-9]+     { yylval.ival = atoi(yytext); return(NUMBER); }
"="        { return(EQUALS); }
"+"        { return(PLUS); }
"-"        { return(MINUS); }
[ \t]      { /* ignore whitespace */ }
\n         { return(0); /* logical EOF */ }
%%
Understanding conflicts
Pointer model
– you can think of yacc processing as a “pointer” which moves through the yacc grammar as each token is read
– at first there is only 1 pointer; later there may be more than one, representing partially recognized rules
– e.g.

start : A B C ;

• after reading A and B, the pointer (shown as •) sits between B and C:

start : A B • C ;
This material is drawn from “lex & yacc (2e)” , Levine, Mason and Brown
Understanding conflicts (cont.)
e.g., recall: a rule is reduced when a pointer reaches the end of that rule

start : x | y ;
x : A B z R ;
y : A B z S ;
z : C D ;

after reading A and B (two pointers, one in each partially matched rule):

x : A B • z R ;
y : A B • z S ;

after reading C:

z : C • D ;

after reading D the pointer reaches the end of z, so z is reduced:

z : C D • ;
Understanding conflicts (cont.)
Reduce/reduce conflict
– a rule is reduced while there is more than one pointer

start : x | y ;
x : A ;
y : A ;

after reading A, both pointers are at the end of a rule:

x : A • ;
y : A • ;

reduce rule x? reduce rule y?
Understanding conflicts (cont.)
Shift/reduce conflict
– one pointer reaches the end of a rule while another can still advance (shift)

start : x | y ;
x : A R ;
y : A ;

after reading A:

x : A • R ;
y : A • ;

shift R in rule x? reduce rule y?
Understanding look-ahead issues
Keep in mind that the implementation of a parser algorithm is a separate issue from CFGs

yacc parsers use 1 token of look-ahead
– the following is not a reduce/reduce error, as yacc makes decisions based on the next token as well:

start : x B | y C ;
x : A ;
y : A ;

– the following grammar is not ambiguous, but requires 2 tokens of look-ahead:

start : x B C | y B C ;
x : A ;
y : A ;

• yacc cannot do this, so: reduce/reduce error
Understanding token typing
Default token type is int

%union - identifies all possible C types that tokens can have

e.g.

%union {
    char *str;
    double real;
    int integer;
}

Permits symbols to be of type <str>, <real> or <integer>, with the type corresponding to the C type in the %union
Note: most of this is handled automatically for you - the declaration is what is important
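What the generated parser does with a %union like this is roughly the following C declaration (YYSTYPE and yylval are the real yacc naming conventions; the members mirror the %union):

```c
/* Roughly what yacc generates from the %union above: a C union
   typedef'd as YYSTYPE, with the global yylval declared as one.
   The lexer stores a token's semantic value into the appropriate
   member before returning the token code. */
typedef union {
    char  *str;
    double real;
    int    integer;
} YYSTYPE;

YYSTYPE yylval;
```

This is why a lex action writes `yylval.str = strdup(yytext);` for one token type but `yylval.integer = atoi(yytext);` for another — each token uses the union member matching its declared type.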
Understanding token typing (cont.)
Now:
%token <type> TOKEN1, TOKEN2, …– declares all listed tokens to be of the stated type
e.g.
%token <str> NAME
– the NAME (terminal) token has an associated semantic value that corresponds to the type associated with the identifier str in the %union directive
What about non-terminals?%type <type> nonterm1, nonterm2, …
Issues
We are ignoring much in this overview:
– redefining input() and output() routines to work on sources other than streams (FILE *)
– default main() routines in yacc
– incorporating lexers and parsers as modules in a larger system
– changing the default names of files/internal functions/internal variables (necessary if you want more than one parser in a program)
– many internal variables/functions (yywrap, etc.)
– we are probably ignoring issues in covering the ignored issues
Additional resources
Online manual
http://www.gnu.org/software/bison/manual/index.html
“lex & yacc (2e)”, John Levine, Tony Mason & Doug Brown, O’Reilly, 1992
The lex & yacc primer/HOWTOhttp://ds9a.nl/lex-yacc/
Google remains your friend (so I’m told)