View
226
Download
1
Embed Size (px)
Citation preview
2
1. Introduction and Example CUP:
» a system for generating LALR parsers from simple specifications.
» like yacc, but» written in Java, uses specifications including
embedded Java code, and produces parser code in Java.
3
Grammar File Format in summary
preamble» declarations to specify how the parser is to be
generated, and supply parts of the runtime code.
» package_decl import_decl » user_code parser_code init_code scan_code
Terminal and Nonterminal declarations (name and type)
precedence and associativity of terminals grammar rules
4
An example CUP Grammar (1. preamble)
/* CUP specification for a simple expression evaluator (no actions) */import java_cup.runtime.*;
// Preliminaries to set up and use the scanner.
init with {: scanner.init(); :}; scan with {: return scanner.next_token(); :};
5
2. Declaration of terminals and nonterminals
/* Terminals (tokens returned by the scanner). */
terminal PLUS, MINUS, TIMES, DIVIDE, MOD; terminal UMINUS, LPAREN, RPAREN; SEMI, terminal Integer NUMBER; /* Non terminals */ non terminal expr_list, expr_part; nonterminal Integer expr, term, factor;
// no type ==> no value associated with the symbol
6
3. Precedences and association
/* Precedences */ precedence left PLUS, MINUS;precedence left TIMES, DIVIDE, MOD; precedence left UMINUS;
// Rules:1. Terminals appearing at the same line has
the same precedence.2. A < B iff the line A appears is above the
line that B occurs 3. possible associativity: left, right and
nonassoc.
7
4. The production rules/* The grammar without actions*/ expr_list ::= expr_list expr_part | expr_part;expr_part ::= expr SEMI; expr ::= expr PLUS expr | expr MINUS expr | expr TIMES expr | expr DIVIDE
expr | expr MOD expr | MINUS expr %prec UMINUS | LPAREN expr RPAREN | NUMBER ;
8
How to use CUP prepare a grammar file (say, parser.cup) invoke:
» java java_cup.Main <parser.cup or» java java_cup.Main parser.cup
Then two files produced:» sym.java // contains constant decls, one for
each terminal (and nonterminal); used by scanner to refer to terminals.
» parser.java // implement the parser; containing two classes.
9
Grammar rules with action codes
/* The grammar */
expr_list ::= expr_list expr_part
|
expr_part;
expr_part ::= expr:e
{: System.out.println("= " + e); :}
SEMI
;
10
Grammar rules with action codes
expr ::= expr:e1 PLUS expr:e2 {: RESULT = new Integer(e1.intValue() + e2.intValue()); :} | expr:e1 MINUS expr:e2
{: RESULT = new Integer(e1.intValue() - e2.intValue()); :} | expr:e1 TIMES expr:e2 {: RESULT = new Integer(e1.intValue() * e2.intValue()); :} | expr:e1 DIVIDE expr:e2 {: RESULT = new Integer(e1.intValue() / e2.intValue()); :} | expr:e1 MOD expr:e2 {: RESULT = new Integer(e1.intValue() % e2.intValue()); :} | NUMBER:n {: RESULT = n; :} | MINUS expr:e {: RESULT =
new Integer(0 - e.intValue()); :} %prec UMINUS | LPAREN expr:e RPAREN {: RESULT = e; :} ;
11
Variables available on the action code
RESULT : bound to the value of lhs node : name assigned to each symbol on the rhs.
» left, right : int variables for left and right position of the phrase in the input stream.
Example:
expr ::= expr:e1 PLUS expr:e2
{: RESULT = new Integer(
e1.intValue() + e2.intValue()); :}
// here e1left and e1rigth are both usable.
12
sym
myParser
$CUP$Parser$action
Scanner
Symbol
lr_parser
MyScanner
java_cup.runtimemyPackage(generated by CUP)
Relationship for related CUP classes
13
Typical client code/* create a parsing object */ parser parser_obj = new parser(new
MyScanner( new FileReader(“myFileName”)); /* open input files, etc. here */ Symbol parse_tree = null; try { if (do_debug_parse) parse_tree = parser_obj.debug_parse(); else parse_tree = parser_obj.parse(); } catch (Exception e) { /* do cleanup here - - possibly rethrow e */ } finally { /* do close out here */ }
14
The java_cup.runtime.Symbol
class public class Symbol {
» public Symbol(int id, int l, int r, Object o)» public Symbol(int id, Object o)» public Symbol(int id, int l, int r) …
public int sym; // kind of Symbol
public int parse_state; // used while staying in stack
boolean used_by_parser = false;
public int left, right; // left right position in input stream
public Object value; // filed for storing semantic value.
public String toString();
}
15
The scanner interfacepackage java_cup.runtime;
public interface Scanner {
/** Return the next token, or <code>null</code>
on end-of-file. */
public Symbol next_token() throws
java.lang.Exception;
}
16
java_cup.runtime.lr_parser
public abstract class lr_parser {
public lr_parser() { }
public lr_parser(Scanner s) { this(); setScanner(s) }
private Scanner _scanner;
public void setScanner(Scanner s) { _scanner = s; }
public Scanner getScanner() { return _scanner; }
public void user_init() throws java.lang.Exception {}
// { %initWithCode } in subclass
17
public Symbol scan() throws java.lang.Exception {
Symbol sym = getScanner().next_token();
return (sym!=null) ? sym : new Symbol(EOF_sym());
}
public void report_fatal_error(String message, Object info) throws java.lang.Exception …
public Symbol parse() throws java.lang.Exception
18
actual code of parse()public Symbol parse() throws java.lang.Exception { /* the current action code */ int act; // the Symbol/stack element returned by a reduce Symbol lhs_sym = null; /* information about production being reduced with */ short handle_size, lhs_sym_num; /* set up direct reference to tables to drive the parser */ production_tab = production_table(); action_tab = action_table(); reduce_tab = reduce_table();
19
/* initialize the action encapsulation object */ init_actions(); // { $generatedByCUP } /* do user initialization */ user_init(); // { $initWithCode }
/* get the first token */ cur_token = scan();
/* push dummy Symbol with start state to get us underway */ stack.removeAllElements(); stack.push(new Symbol(0, start_state())); tos = 0;
20
/* continue until we are told to stop */
for (_done_parsing = false; !_done_parsing ; ) {
/* Check current token for freshness. */
if (cur_token.used_by_parser) throw new Error( … );
/* current state is always on the top of the stack */
/* look up action out of the current state with the current input */
act = get_action(((Symbol)stack.peek()).parse_state, cur_token.sym);
21
/* decode the action -- > 0 encodes shift */
if (act > 0) { /* shift */
cur_token.parse_state = act-1;
cur_token.used_by_parser = true;
stack.push(cur_token); tos++;
/* advance to the next Symbol */
cur_token = scan(); }
/* if its less than zero, then it encodes a reduce action */
else if (act < 0) { /* reduce, first arg is action number */
22
else if (act < 0) { /* reduce, first arg is action number */lhs_sym = do_action((-act)-1, this, stack, tos);
/* look up information about the production */ lhs_sym_num = production_tab[(-act)-1][0]; handle_size = production_tab[(-act)-1][1]; /* pop the handle off the stack */ for (int i = 0; i < handle_size; i++)
{ stack.pop(); tos--; } /* look up the state to go to from the one popped back to */
act = get_reduce(((Symbol)stack.peek()).parse_state, lhs_sym_num); /* shift to that state */ lhs_sym.parse_state = act;lhs_sym.used_by_parser = true; stack.push(lhs_sym); tos++; }
23
/* finally if the entry is zero, we have an error */ else if (act == 0) { /* call user syntax error reporting routine */
syntax_error(cur_token); /* try to error recover, arg is debug or not */ if (!error_recovery(false)) { /* if that fails give up with a fatal syntax error */
unrecovered_syntax_error(cur_token); /* just in case that wasn't fatal enough, end parse */
done_parsing(); } else { // continue to parse lhs_sym = (Symbol)stack.peek(); }
} } return lhs_sym; }
24
public class parser extends lr_parser{
$parserCode public Symbol scan()
{ $scanWithCode }problic user_init() {
$initWithCode }…}
class lr_parser {
public Symbol parse() {…
user_init() ;…
// getFirstToken()… scan() …
}
class CUP$Parser$action {$actionCode
…Symbol CUP$parser$do_action
{ … $actionCodeInRules …}
}
Where the user codes are put in the generated class
25
Usage of various user code components
action code : // action code {: … :}» declaration of additional variables used in grammar
rules, e.g., symbol table,..» helper methods used in rules
parser code // parser with {: … :}» for customizing the generated parser» e.g., new error_report methods, scan(), …
user init code // init with {: … :}» initialize your scanner, tables, data structures required
for semantic actions scan code // scan with {: :}
» tell how to get a token from your scanner
26
Running CUP -package name // default is the empty package -parser name // name.java is the generated file
// default is parser. -symbols name // default is sym -interface
» Outputs the symbol constant code as an interface rather than as a class.
-nonterms // include also noterminal constant symbols -expect number // of conflicts to abort parsing -compact_red // enable table compaction for reduction -nowarn // warnings suppressed
27
-nosummary -progress -dump_grammar -dump_states -dump_tables -dump -time -debug -nopositions // => elefr, eright cause error -noscanner // obsolete -version
28
Scanner Interface1. Your scanner must implement the Scanner interface:
package java_cup.runtime; public interface Scanner { public Symbol next_token() throws java.lang.Exception;
}
2. lr_parser use scan() to fetch next token : public Symbol scan() throws java.lang.Exception { Symbol sym = getScanner().next_token(); return (sym!=null) ? sym : new Symbol(EOF_sym());
} You can override scan() in your parser code
( $scanWithCode )
29
3. The generated parser also contains a constructor :
public parser(Scanner)
which calls setScanner() with it. In most cases, the init with and scan with directives may be omitted.
You can simply create the parser with a reference to the desired scanner:
parser parser_obj = new parser(new my_scanner());
or set the scanner after the parser is created:
parser parser_obj = new parser(); parser_obj.setScanner(new my_scanner());
30
4. Integration with JLex Put the following three lines in the directives region. %implements java_cup.runtime.Scanner %function next_token %type java_cup.runtime.Symbol or simply:%cup Invoking the parser with the JLex scanner is then simply: parser parser_obj = new parser( new Yylex( some_InputStream_or_Reader)); Note CUP handles the JLex/JFlex convention of returning
null on EOF without a problem, so an %eofval directive is not required in the JLex specification (added in CUP 0.10k).
31
Syntax Error Handling Programs may contain errors at different levels:
» Lexical, such as misspelling an identifier, keyword, or operator.
» Syntactic, such as an arithmetic expression with unbalanced parentheses.
» Semantic, such as an operator applied to an incompatible operand.
» Logical, such as an infinitely recursive call. Goal of the error handler in a parser:
» Report the presence of errors clearly and accurately. » Recover from each error quickly enough to be able to
detect subsequent errors. » Do not significantly slow down the processing of correct
programs.
32
Error recovery strategies Panic mode recovery.
» On discovering an error, discard input symbols one at a time until one of a designated set of synchronizing tokens (normally delimiters such as semicolon or keyword end) is found.
» Simple but may skip a considerable amount of input. Phrase level recovery.
» Perform a local correction on the remaining input: replace a prefix of the remaining input by some string that allows the parse to continue.
» For example, replace a comma by a semicolon, delete an extra semicolon, or insert a missing semicolon.
» First used in the top town parsing.
33
Error Recovery Strategies Error productions.
» Use the language grammar augmented by some error production rules to detect common errors. These extra rules generate erroneous constructs.
Global correction. » Still perform a local correction but with global
knowledge leading to smallest replacements.
34
Error recovery in LL(1) parsers
Use panic mode recovery. When an error occurs looking for nonterminal A,
scan until an element of FOLLOW(A) is found, then pop A and continue.
If we can't match a terminal on the top of stack: » 1. pop the terminal » 2. print a message saying the terminal was
inserted » 3. continue the parse
35
Error recovery in shift-reduce parsers
The problem » encounter an invalid token » bad pieces of tree hanging from stack » incorrect entries in symbol table (type error… )
We want to parse the rest of the file » Restarting the parser » find a restartable state on the stack » move to a consistent place in the input » print an informative message (line number)
36
Error Recovery in CUP and YACC
Supports a special terminal symbol: error. » need not be declared.» can matches an erroneous input sequence.» error show synchronization point
When an error is discovered » pops the stack until error transition is legal » skips input tokens until it matches a certain
number (3) of tokens » error productions can have actions
37
Error recovery in yaccstmtList ::= stmt SEMI | stmtList stmt
SEMI ;
can be augmented with error
stmtList ::= stmt SEMI | stmtList stmt SEMI error SEMI ; This should
» throw out the erroneous statement » synchronize at SEMI. » invoke syntax_error()
38
Example production 2:
stmt ::= expr SEMI | while_stmt SEMI | if_stmt SEMI | ...
| error SEMI ; If error occurs
» 1. syntax_error() reported, and » 2. [recovery:] throw out the erroneous expr (or
while_statement or if_statement) statement » 3. synchronize at SEMI. » An error is considered to be recovered iff a sufficient
number of tokens past the error symbol can be successfully parsed. (= error_sync_size() method of the parser and defaults to 3).
Other natural places for errors » missing parentheses or brackets » extra operator or missing operator