
JAVACC – Tutorial · 2014-11-18


JAVACC – Tutorial

Table of Contents

JavaCC [tm]: LOOKAHEAD MiniTutorial
    1. WHAT IS LOOKAHEAD?
    2. CHOICE POINTS IN JAVACC GRAMMARS
    3. THE DEFAULT CHOICE DETERMINATION ALGORITHM
    4. MULTIPLE TOKEN LOOKAHEAD SPECIFICATIONS
        4.1. SETTING A GLOBAL LOOKAHEAD SPECIFICATION
        4.2. SETTING A LOCAL LOOKAHEAD SPECIFICATION
    5. SYNTACTIC LOOKAHEAD
    6. SEMANTIC LOOKAHEAD
    7. GENERAL STRUCTURE OF LOOKAHEAD
JavaCC [tm]: Error Reporting and Recovery
JavaCC [tm]: TokenManager MiniTutorial
JavaCC [tm]: API Routines
    Non-Terminals in the Input Grammar
    API for Parser Actions
    The Token Manager Interface
    Constructors and Other Initialization Routines
    The Token Class
    Reading Tokens from the Input Stream
    Working with Debugger Tracing
    Customizing Error Messages
JavaCC [tm]: JJTree
    JJTree parser methods
    The Node interface


JavaCC [tm]: LOOKAHEAD MiniTutorial

This minitutorial is under preparation. This tutorial refers to examples that are available in the Lookahead directory under the examples directory of the release. Currently, this page is a copy of the contents of the README file within that directory. This directory contains the tutorial on LOOKAHEAD along with all examples used in the tutorial.

We assume that you have already taken a look at some of the simple examples provided in the release before you read this section.

1. WHAT IS LOOKAHEAD?

The job of a parser is to read an input stream and determine whether or not the input stream conforms to the grammar.

This determination in its most general form can be quite time consuming. Consider the following example (file Example1.jj):

----------------------------------------------------------------

    void Input() :
    {}
    {
      "a" BC() "c"
    }

    void BC() :
    {}
    {
      "b" [ "c" ]
    }

In this simple example, it is quite clear that there are exactly two strings that match the above grammar, namely:

    abc
    abcc

The general way to perform this match is to walk through the grammar based on the string as follows. Here, we use "abc" as the input string:

Step 1. There is only one choice here - the first input character must be 'a' - and since that is indeed the case, we are OK.

Step 2. We now proceed on to non-terminal BC. Here again, there is only one choice for the next input character - it must be 'b'. The input matches this one too, so we are still OK.

Step 3. We now come to a "choice point" in the grammar. We can either go inside the [...] and match it, or ignore it altogether. We decide to go inside. So the next input character must be a 'c'. We are again OK.

Step 4. Now we have finished with non-terminal BC and go back to non-terminal Input. Now the grammar says the next character must be yet another 'c'. But there are no more input characters. So we have a problem.

Step 5. When we have such a problem in the general case, we conclude that we may have made a bad choice somewhere. In this case, we made the bad choice in Step 3. So we retrace our steps back to Step 3, make another choice, and try that. This process is called "backtracking".

Step 6. We have now backtracked and made the other choice we could have made at Step 3 - namely, ignore the [...]. Now we have finished with non-terminal BC and go back to non-terminal Input. Now the grammar says the next character must be yet another 'c'. The next input character is a 'c', so we are OK now.

Step 7. We realize we have reached the end of the grammar (end of non-terminal Input) successfully. This means we have successfully matched the string "abc" to the grammar.
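The seven steps above can be sketched in Java as a small hand-written backtracking matcher. This is only an illustration of the general (backtracking) strategy, not code that JavaCC generates; the class name BacktrackingDemo and the trace list are assumptions made up for this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-written matcher for the Example1.jj grammar.
final class BacktrackingDemo {
    // Records each decision so the walk-through can be inspected afterwards.
    static final List<String> trace = new ArrayList<>();

    // Input ::= "a" BC "c"
    static boolean matchInput(String s) {
        trace.clear();
        if (s.isEmpty() || s.charAt(0) != 'a') {
            return false;
        }
        // Try each way BC can match; fall back to the next one on failure.
        for (int end : matchBC(s, 1)) {
            trace.add("BC ends at " + end);
            boolean finalC = end < s.length() && s.charAt(end) == 'c';
            if (finalC && end + 1 == s.length()) {
                return true;
            }
            trace.add("backtrack");
        }
        return false;
    }

    // BC ::= "b" [ "c" ]
    // Returns every position at which BC can end, trying "go inside [...]" first.
    static List<Integer> matchBC(String s, int pos) {
        List<Integer> ends = new ArrayList<>();
        if (pos < s.length() && s.charAt(pos) == 'b') {
            if (pos + 1 < s.length() && s.charAt(pos + 1) == 'c') {
                ends.add(pos + 2); // Step 3: enter the [...]
            }
            ends.add(pos + 1);     // Step 6: skip the [...]
        }
        return ends;
    }

    public static void main(String[] args) {
        System.out.println(matchInput("abc") + " " + trace);
    }
}
```

For the input "abc", the trace records "BC ends at 3", then "backtrack", then "BC ends at 2", which corresponds to Steps 3 through 6.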

----------------------------------------------------------------

As the above example indicates, the general problem of matching an input with a grammar may result in large amounts of backtracking and making new choices, and this can consume a lot of time. The amount of time taken can also be a function of how the grammar is written. Note that many grammars can be written to cover the same set of inputs - or the same language (i.e., there can be multiple equivalent grammars for the same input language).

----------------------------------------------------------------

For example, the following grammar would speed up the parsing of the same language as compared to the previous grammar:

    void Input() :
    {}
    {
      "a" "b" "c" [ "c" ]
    }

while the following grammar slows it down even more since the parser has to backtrack all the way to the beginning:

    void Input() :
    {}
    {
      "a" "b" "c" "c"
    |
      "a" "b" "c"
    }

One can even have a grammar that looks like the following:

    void Input() :
    {}
    {
      "a" ( BC1() | BC2() )
    }

    void BC1() :
    {}
    {
      "b" "c" "c"
    }

    void BC2() :
    {}
    {
      "b" "c" [ "c" ]
    }

This grammar can match "abcc" in two ways, and is therefore considered "ambiguous".

----------------------------------------------------------------

The performance hit from such backtracking is unacceptable for most systems that include a parser. Hence most parsers do not backtrack in this general manner (or do not backtrack at all); rather, they make decisions at choice points based on limited information and then commit to them.

Parsers generated by Java Compiler Compiler [tm] make decisions at choice points based on some exploration of tokens further ahead in the input stream, and once they make such a decision, they commit to it. That is, no backtracking is performed once a decision is made.

The process of exploring tokens further in the input stream is termed "looking ahead" into the input stream - hence our use of the term "LOOKAHEAD".

Since some of these decisions may be made with less than perfect information (JavaCC [tm] will warn you in these situations, so you don't have to worry), you need to know something about LOOKAHEAD to make your grammar work correctly.

The two ways in which you make the choice decisions work properly are:

. Modify the grammar to make it simpler.

. Insert hints at the more complicated choice points to help the parser make the right choices.

2. CHOICE POINTS IN JAVACC GRAMMARS

There are 4 different kinds of choice points in JavaCC:

. An expansion of the form: ( exp1 | exp2 | ... ). In this case, the generated parser has to somehow determine which of exp1, exp2, etc. to select to continue parsing.

. An expansion of the form: ( exp )?. In this case, the generated parser must somehow determine whether to choose exp or to continue beyond the ( exp )? without choosing exp. Note: ( exp )? may also be written as [ exp ].

. An expansion of the form ( exp )*. In this case, the generated parser must do the same thing as in the previous case, and furthermore, after each time a successful match of exp (if exp was chosen) is completed, this choice determination must be made again.

. An expansion of the form ( exp )+. This is essentially similar to the previous case with a mandatory first match to exp.

Remember that token specifications that occur within angular brackets <...> also have choice points. But these choices are made in different ways and are the subject of a different tutorial.

----------------------------------------------------------------

3. THE DEFAULT CHOICE DETERMINATION ALGORITHM

The default choice determination algorithm looks ahead 1 token in the input stream and uses this to help make its choice at choice points.

The following examples will describe the default algorithm fully:

----------------------------------------------------------------

Consider the following grammar (file Example2.jj):

    void basic_expr() :
    {}
    {
      <ID> "(" expr() ")"   // Choice 1
    |
      "(" expr() ")"        // Choice 2
    |
      "new" <ID>            // Choice 3
    }

The choice determination algorithm works as follows:

    if (next token is <ID>) {
      choose Choice 1
    } else if (next token is "(") {
      choose Choice 2
    } else if (next token is "new") {
      choose Choice 3
    } else {
      produce an error message
    }

----------------------------------------------------------------

In the above example, the grammar has been written such that the default choice determination algorithm does the right thing. Another thing to note is that the choice determination algorithm works in a top to bottom order - if Choice 1 was selected, the other choices are not even considered. While this is not an issue in this example (except for performance), it will become important later below when local ambiguities require the insertion of LOOKAHEAD hints.

Suppose the above grammar was modified to (file Example3.jj):

    void basic_expr() :
    {}
    {
      <ID> "(" expr() ")"   // Choice 1
    |
      "(" expr() ")"        // Choice 2
    |
      "new" <ID>            // Choice 3
    |
      <ID> "." <ID>         // Choice 4
    }

Then the default algorithm will always choose Choice 1 when the next input token is <ID> and never choose Choice 4 even if the token following <ID> is a ".". More on this later.

You can try running the parser generated from Example3.jj on the input "id1.id2". It will complain that it encountered a "." when it was expecting a "(". Note - when you built the parser, it would have given you the following warning message:

    Warning: Choice conflict involving two expansions at
             line 25, column 3 and line 31, column 3 respectively.
             A common prefix is: <ID>
             Consider using a lookahead of 2 for earlier expansion.

Essentially, JavaCC is saying it has detected a situation in your grammar which may cause the default lookahead algorithm to do strange things. The generated parser will still work using the default lookahead algorithm - except that it may not do what you expect of it.
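The effect of this conflict can be sketched in plain Java. The following is a hand-written illustration of the top-to-bottom, one-token decision for the Example3.jj alternatives, not the code JavaCC emits; the class DefaultChoiceDemo and the Kind enum are hypothetical names chosen for this sketch.

```java
import java.util.List;

// Hypothetical model of the default (one-token) choice for Example3.jj.
final class DefaultChoiceDemo {
    // Assumed token kinds for the basic_expr example.
    enum Kind { ID, LPAREN, RPAREN, NEW, DOT, EOF }

    // Returns the number of the selected choice, or 0 for "produce an error".
    // Alternatives are tested top to bottom and the first match wins.
    static int choose(List<Kind> tokens) {
        Kind next = tokens.isEmpty() ? Kind.EOF : tokens.get(0);
        if (next == Kind.ID) {
            return 1; // <ID> "(" expr() ")"
        }
        if (next == Kind.LPAREN) {
            return 2; // "(" expr() ")"
        }
        if (next == Kind.NEW) {
            return 3; // "new" <ID>
        }
        if (next == Kind.ID) {
            return 4; // <ID> "." <ID> - never reached, Choice 1 already took <ID>
        }
        return 0;
    }

    public static void main(String[] args) {
        // Token kinds for the input "id1.id2"
        System.out.println(choose(List.of(Kind.ID, Kind.DOT, Kind.ID)));
    }
}
```

For the token sequence of "id1.id2" this selects Choice 1, after which the parser would expect a "(" and fail on the ".", as described above.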

----------------------------------------------------------------

Now consider the following example (file Example4.jj):

    void identifier_list() :
    {}
    {
      <ID> ( "," <ID> )*
    }

Suppose the first <ID> has already been matched and that the parser has reached the choice point (the (...)* construct). Here's how the choice determination algorithm works:

    while (next token is ",") {
      choose the nested expansion (i.e., go into the (...)* construct)
      consume the "," token
      if (next token is <ID>) consume it, otherwise report error
    }

----------------------------------------------------------------

In the above example, note that the choice determination algorithm does not look beyond the (...)* construct to make its decision. Suppose there was another production in that same grammar as follows (file Example5.jj):

    void funny_list() :
    {}
    {
      identifier_list() "," <INT>
    }

When the default algorithm is making a choice at ( "," <ID> )*, it will always go into the (...)* construct if the next token is a ",". It will do this even when identifier_list was called from funny_list and the token after the "," is an <INT>. Intuitively, the right thing to do in this situation is to skip the (...)* construct and return to funny_list. More on this later.

As a concrete example, suppose your input was "id1, id2, 5". The parser will complain that it encountered a 5 when it was expecting an <ID>. Note - when you built the parser, it would have given you the following warning message:

    Warning: Choice conflict in (...)* construct at line 25, column 8.
             Expansion nested within construct and expansion following construct
             have common prefixes, one of which is: ","
             Consider using a lookahead of 2 or more for nested expansion.

Essentially, JavaCC is saying it has detected a situation in your grammar which may cause the default lookahead algorithm to do strange things. The generated parser will still work using the default lookahead algorithm - except that it may not do what you expect of it.

----------------------------------------------------------------

We have shown you examples of two kinds of choice points in the examples above - "exp1 | exp2 | ...", and "(exp)*". The other two kinds of choice points - "(exp)+" and "(exp)?" - behave similarly to (exp)* and we will not be providing examples of their use here.

4. MULTIPLE TOKEN LOOKAHEAD SPECIFICATIONS

So far, we have described the default lookahead algorithm of the generated parsers. In the majority of situations, the default algorithm works just fine. In situations where it does not work well, Java Compiler Compiler provides you with warning messages like the ones shown above. If you have a grammar that goes through Java Compiler Compiler without producing any warnings, then the grammar is an LL(1) grammar. Essentially, LL(1) grammars are those that can be handled by top-down parsers (such as those generated by Java Compiler Compiler) using at most one token of LOOKAHEAD.

When you get these warning messages, you can do one of two things.

----------------------------------------------------------------

Option 1

You can modify your grammar so that the warning messages go away. That is, you can attempt to make your grammar LL(1) by making some changes to it.

The following (file Example6.jj) shows how you may change Example3.jj to make it LL(1):

    void basic_expr() :
    {}
    {
      <ID> ( "(" expr() ")" | "." <ID> )
    |
      "(" expr() ")"
    |
      "new" <ID>
    }

What we have done here is to factor the fourth choice into the first choice. Note how we have placed their common first token <ID> outside the parentheses, and then within the parentheses, we have yet another choice which can now be performed by looking at only one token in the input stream and comparing it with "(" and ".". This process of modifying grammars to make them LL(1) is called "left factoring".

The following (file Example7.jj) shows how Example5.jj may be changed to make it LL(1):

    void funny_list() :
    {}
    {
      <ID> "," ( <ID> "," )* <INT>
    }

Note that this change is somewhat more drastic.

----------------------------------------------------------------

Option 2

You can provide the generated parser with some hints to help it out in the non-LL(1) situations that the warning messages bring to your attention.

All such hints are specified either by setting the global LOOKAHEAD value to a larger value (see below) or by using the LOOKAHEAD(...) construct to provide a local hint.

A design decision must be made to determine if Option 1 or Option 2 is the right one to take. The only advantage of choosing Option 1 is that it makes your grammar perform better. JavaCC generated parsers can handle LL(1) constructs much faster than other constructs. However, the advantage of choosing Option 2 is that you have a simpler grammar - one that is easier to develop and maintain - one that focuses on human-friendliness and not machine-friendliness.

Sometimes Option 2 is the only choice - especially in the presence of user actions. Suppose Example3.jj contained actions as shown below:

    void basic_expr() :
    {}
    {
      { initMethodTables(); } <ID> "(" expr() ")"
    |
      "(" expr() ")"
    |
      "new" <ID>
    |
      { initObjectTables(); } <ID> "." <ID>
    }

Since the actions are different, left-factoring cannot be performed.

----------------------------------------------------------------

4.1. SETTING A GLOBAL LOOKAHEAD SPECIFICATION

You can set a global LOOKAHEAD specification by using the option "LOOKAHEAD" either from the command line, or at the beginning of the grammar file in the options section. The value of this option is an integer which is the number of tokens to look ahead when making choice decisions. As you may have guessed, the default value of this option is 1 - which derives the default LOOKAHEAD algorithm described above.

Suppose you set the value of this option to 2. Then the LOOKAHEAD algorithm derived from this looks at two tokens (instead of just one token) before making a choice decision. Hence, in Example3.jj, Choice 1 will be taken only if the next two tokens are <ID> and "(", while Choice 4 will be taken only if the next two tokens are <ID> and ".". Hence, the parser will now work properly for Example3.jj. Similarly, the problem with Example5.jj also goes away since the parser goes into the (...)* construct only when the next two tokens are "," and <ID>.

By setting the global LOOKAHEAD to 2, the parsing algorithm essentially becomes LL(2). Since you can set the global LOOKAHEAD to any value, parsers generated by Java Compiler Compiler are called LL(k) parsers.

----------------------------------------------------------------

4.2. SETTING A LOCAL LOOKAHEAD SPECIFICATION

You can also set a local LOOKAHEAD specification that affects only a specific choice point. This way, the majority of the grammar can remain LL(1) and hence perform better, while at the same time one gets the flexibility of LL(k) grammars. Here's how Example3.jj is modified with local LOOKAHEAD to fix the choice ambiguity problem (file Example8.jj):

    void basic_expr() :
    {}
    {
      LOOKAHEAD(2)
      <ID> "(" expr() ")"   // Choice 1
    |
      "(" expr() ")"        // Choice 2
    |
      "new" <ID>            // Choice 3
    |
      <ID> "." <ID>         // Choice 4
    }

Only the first choice (the first condition in the translation below) is affected by the LOOKAHEAD specification. All others continue to use a single token of LOOKAHEAD:

    if (next 2 tokens are <ID> and "(" ) {
      choose Choice 1
    } else if (next token is "(") {
      choose Choice 2
    } else if (next token is "new") {
      choose Choice 3
    } else if (next token is <ID>) {
      choose Choice 4
    } else {
      produce an error message
    }
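The translation above can be written out as a small Java sketch. It is a hand-written model of the decision for Example8.jj, assuming hypothetical names (LocalLookaheadDemo, Kind, peek); JavaCC's generated code is organized differently.

```java
import java.util.List;

// Hypothetical model of the Example8.jj decision with a local LOOKAHEAD(2).
final class LocalLookaheadDemo {
    // Assumed token kinds for the basic_expr example.
    enum Kind { ID, LPAREN, RPAREN, NEW, DOT, EOF }

    // Peeks at the n-th upcoming token (1-based) without consuming anything.
    static Kind peek(List<Kind> tokens, int n) {
        return n <= tokens.size() ? tokens.get(n - 1) : Kind.EOF;
    }

    // Returns the number of the selected choice, or 0 for "produce an error".
    static int choose(List<Kind> tokens) {
        // Only Choice 1 inspects two tokens; the rest use one.
        if (peek(tokens, 1) == Kind.ID && peek(tokens, 2) == Kind.LPAREN) {
            return 1;
        }
        if (peek(tokens, 1) == Kind.LPAREN) {
            return 2;
        }
        if (peek(tokens, 1) == Kind.NEW) {
            return 3;
        }
        if (peek(tokens, 1) == Kind.ID) {
            return 4;
        }
        return 0;
    }

    public static void main(String[] args) {
        // Token kinds for the input "id1.id2"
        System.out.println(choose(List.of(Kind.ID, Kind.DOT, Kind.ID)));
    }
}
```

Note that an <ID> followed by anything other than "(" falls through to Choice 4, because Choice 4 still uses a single token of LOOKAHEAD; any mismatch on the "." is reported later, while Choice 4 is being parsed.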

Similarly, Example5.jj can be modified as shown below (file Example9.jj):

    void identifier_list() :
    {}
    {
      <ID> ( LOOKAHEAD(2) "," <ID> )*
    }

Note that the LOOKAHEAD specification has to occur inside the (...)*, which is where the choice is being made. The translation for this construct is shown below (after the first <ID> has been consumed):

    while (next 2 tokens are "," and <ID>) {
      choose the nested expansion (i.e., go into the (...)* construct)
      consume the "," token
      consume the <ID> token
    }
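To compare the one-token and two-token loop conditions side by side, here is a hand-written Java sketch of identifier_list and funny_list. The class ListLookaheadDemo, the Kind enum, and the lookahead parameter are assumptions for illustration only; a generated parser fixes the lookahead amount at generation time rather than taking it as an argument.

```java
import java.util.List;

// Hypothetical model of identifier_list / funny_list from Example5.jj and Example9.jj.
final class ListLookaheadDemo {
    // Assumed token kinds.
    enum Kind { ID, COMMA, INT, EOF }

    static Kind peek(List<Kind> tokens, int pos) {
        return pos < tokens.size() ? tokens.get(pos) : Kind.EOF;
    }

    // identifier_list ::= <ID> ( "," <ID> )*
    // Returns the position after the list, or -1 on a parse error.
    // lookahead is 1 for the default algorithm, 2 for LOOKAHEAD(2).
    static int identifierList(List<Kind> tokens, int pos, int lookahead) {
        if (peek(tokens, pos) != Kind.ID) {
            return -1;
        }
        pos++;
        while (peek(tokens, pos) == Kind.COMMA
                && (lookahead < 2 || peek(tokens, pos + 1) == Kind.ID)) {
            pos++; // consume ","
            if (peek(tokens, pos) != Kind.ID) {
                return -1; // already committed: no backtracking
            }
            pos++; // consume <ID>
        }
        return pos;
    }

    // funny_list ::= identifier_list "," <INT>, followed here by end of input
    static boolean funnyList(List<Kind> tokens, int lookahead) {
        int pos = identifierList(tokens, 0, lookahead);
        return pos >= 0
                && peek(tokens, pos) == Kind.COMMA
                && peek(tokens, pos + 1) == Kind.INT
                && pos + 2 == tokens.size();
    }

    public static void main(String[] args) {
        // Token kinds for the input "id1, id2, 5"
        List<Kind> input = List.of(Kind.ID, Kind.COMMA, Kind.ID, Kind.COMMA, Kind.INT);
        System.out.println(funnyList(input, 1) + " " + funnyList(input, 2));
    }
}
```

With one token of lookahead the loop commits to the second "," and then fails on the <INT>; with two tokens it leaves the loop and funny_list can match the trailing "," <INT>.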

----------------------------------------------------------------

We strongly discourage you from modifying the global LOOKAHEAD default. Most grammars are predominantly LL(1), hence you will be unnecessarily degrading performance by converting the entire grammar to LL(k) to facilitate just some portions of the grammar that are not LL(1). If your grammar and input files being parsed are very small, then this is okay.

You should also keep in mind that the warning messages JavaCC prints when it detects ambiguities at choice points (such as the two messages shown earlier) simply tell you that the specified choice points are not LL(1). JavaCC does not verify the correctness of your local LOOKAHEAD specification - it assumes you know what you are doing. In fact, it really cannot verify the correctness of local LOOKAHEADs, as the following example of if statements illustrates (file Example10.jj):

    void IfStm() :
    {}
    {
      "if" C() S() [ "else" S() ]
    }

    void S() :
    {}
    {
      ...
    |
      IfStm()
    }

This example is the famous "dangling else" problem. If you have a program that looks like:

    "if C1 if C2 S1 else S2"

The "else S2" can be bound to either of the two if statements. The standard interpretation is that it is bound to the inner if statement (the one closest to it). The default choice determination algorithm happens to do the right thing, but it still prints the following warning message:

    Warning: Choice conflict in [...] construct at line 25, column 15.
             Expansion nested within construct and expansion following construct
             have common prefixes, one of which is: "else"
             Consider using a lookahead of 2 or more for nested expansion.

To suppress the warning message, you could simply tell JavaCC that you know what you are doing as follows:

    void IfStm() :
    {}
    {
      "if" C() S() [ LOOKAHEAD(1) "else" S() ]
    }

To force lookahead ambiguity checking in such instances, set the option FORCE_LA_CHECK to true.
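Why the one-token decision binds the "else" to the inner if can be seen in a short recursive-descent sketch. This is a hand-written Java illustration, not generated code; the class DanglingElseDemo is hypothetical, conditions and simple statements are modelled as single tokens, and the input is assumed to be well formed.

```java
import java.util.List;

// Hypothetical recursive-descent model of IfStm / S from Example10.jj.
final class DanglingElseDemo {
    private final List<String> tokens;
    private int pos = 0;

    private DanglingElseDemo(List<String> tokens) {
        this.tokens = tokens;
    }

    private String peek() {
        return pos < tokens.size() ? tokens.get(pos) : "<EOF>";
    }

    // S ::= IfStm | a simple statement (assumed here to be one token)
    private String statement() {
        if (peek().equals("if")) {
            return ifStm();
        }
        return tokens.get(pos++);
    }

    // IfStm ::= "if" C S [ "else" S ]
    private String ifStm() {
        pos++;                            // consume "if"
        String cond = tokens.get(pos++);  // C, assumed to be one token
        String thenPart = statement();
        // One-token lookahead: enter [...] whenever the next token is "else".
        if (peek().equals("else")) {
            pos++;                        // consume "else"
            String elsePart = statement();
            return "if(" + cond + " then " + thenPart + " else " + elsePart + ")";
        }
        return "if(" + cond + " then " + thenPart + ")";
    }

    // Parses one statement and returns a bracketed rendering of its structure.
    static String parse(List<String> tokens) {
        return new DanglingElseDemo(tokens).statement();
    }

    public static void main(String[] args) {
        System.out.println(parse(List.of("if", "C1", "if", "C2", "S1", "else", "S2")));
    }
}
```

The innermost active call to ifStm() is the first to see the "else", so the result is if(C1 then if(C2 then S1 else S2)), the standard interpretation.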

----------------------------------------------------------------

5. SYNTACTIC LOOKAHEAD

Consider the following production taken from the Java grammar:

    void TypeDeclaration() :
    {}
    {
      ClassDeclaration()
    |
      InterfaceDeclaration()
    }

At the syntactic level, ClassDeclaration can start with any number of "abstract"s, "final"s, and "public"s. While a subsequent semantic check will produce error messages for multiple uses of the same modifier, this does not happen until parsing is completely over. Similarly, InterfaceDeclaration can start with any number of "abstract"s and "public"s.

What if the next tokens in the input stream are a very large number of "abstract"s (say 100 of them) followed by "interface"? It is clear that a fixed amount of LOOKAHEAD (such as LOOKAHEAD(100) for example) will not suffice. One can argue that this is such a weird situation that it does not warrant any reasonable error message and that it is okay to make the wrong choice in some pathological situations. But suppose one wanted to be precise about this.

The solution here is to set the LOOKAHEAD to infinity - that is, set no bounds on the number of tokens to look ahead. One way to do this is to use a very large integer value (such as the largest possible integer) as follows:

    void TypeDeclaration() :
    {}
    {
      LOOKAHEAD(2147483647)
      ClassDeclaration()
    |
      InterfaceDeclaration()
    }

One can also achieve the same effect with "syntactic LOOKAHEAD". In syntactic LOOKAHEAD, you specify an expansion to try out, and if that succeeds, then the following choice is taken. The above example is rewritten using syntactic LOOKAHEAD below:

    void TypeDeclaration() :
    {}
    {
      LOOKAHEAD(ClassDeclaration())
      ClassDeclaration()
    |
      InterfaceDeclaration()
    }

Essentially, what this is saying is:

    if (the tokens from the input stream match ClassDeclaration) {
      choose ClassDeclaration()
    } else if (next token matches InterfaceDeclaration) {
      choose InterfaceDeclaration()
    } else {
      produce an error message
    }

The problem with the above syntactic LOOKAHEAD specification is that the LOOKAHEAD calculation takes too much time and does a lot of unnecessary checking. In this case, the LOOKAHEAD calculation can stop as soon as the token "class" is encountered, but the specification forces the calculation to continue until the end of the class declaration has been reached - which is rather time consuming. This problem can be solved by placing a shorter expansion to try out in the syntactic LOOKAHEAD specification as in the following example:

    void TypeDeclaration() :
    {}
    {
      LOOKAHEAD( ( "abstract" | "final" | "public" )* "class" )
      ClassDeclaration()
    |
      InterfaceDeclaration()
    }

Essentially, what this is saying is:

    if (the next set of tokens from the input stream are a sequence of
        "abstract"s, "final"s, and "public"s followed by a "class") {
      choose ClassDeclaration()
    } else if (next token matches InterfaceDeclaration) {
      choose InterfaceDeclaration()
    } else {
      produce an error message
    }

By doing this, you make the choice determination algorithm stop as soon as it sees "class" - i.e., make its decision at the earliest possible time.

You can place a bound on the number of tokens to consume during syntactic lookahead as follows:

    void TypeDeclaration() :
    {}
    {
      LOOKAHEAD(10, ( "abstract" | "final" | "public" )* "class" )
      ClassDeclaration()
    |
      InterfaceDeclaration()
    }

In this case, the LOOKAHEAD determination is not permitted to go beyond 10 tokens. If it reaches this limit and is still successfully matching ( "abstract" | "final" | "public" )* "class", then ClassDeclaration is selected.

Actually, when such a limit is not specified, it defaults to the largest integer value (2147483647).
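The bounded scan described above can be sketched in Java as follows. This is a hand-written model of the check for ( "abstract" | "final" | "public" )* "class" with a token limit, under the assumption that tokens are plain strings; the names SyntacticLookaheadDemo, looksLikeClass, and modifiersThen are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Hypothetical model of LOOKAHEAD(limit, ( "abstract" | "final" | "public" )* "class").
final class SyntacticLookaheadDemo {
    private static final Set<String> MODIFIERS = Set.of("abstract", "final", "public");

    // Scans the upcoming tokens without consuming them. Succeeds when "class"
    // is seen, or when the limit is reached while the scan is still matching.
    static boolean looksLikeClass(List<String> tokens, int limit) {
        int examined = 0;
        for (String token : tokens) {
            if (examined == limit) {
                return true;  // limit reached, still matching
            }
            examined++;
            if (token.equals("class")) {
                return true;  // decision made at the earliest possible time
            }
            if (!MODIFIERS.contains(token)) {
                return false; // e.g. "interface": the expansion does not match
            }
        }
        return examined == limit;
    }

    // Builds count "abstract" tokens followed by one final token.
    static List<String> modifiersThen(int count, String last) {
        List<String> tokens = new ArrayList<>(Collections.nCopies(count, "abstract"));
        tokens.add(last);
        return tokens;
    }

    public static void main(String[] args) {
        List<String> input = modifiersThen(100, "interface");
        System.out.println(looksLikeClass(input, Integer.MAX_VALUE));
        System.out.println(looksLikeClass(input, 10));
    }
}
```

With no effective limit, 100 "abstract"s followed by "interface" is rejected as a class; with a limit of 10 the scan stops early while still matching, so ClassDeclaration would be selected even though the input is an interface. That is the trade-off a bound introduces.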

----------------------------------------------------------------

6. SEMANTIC LOOKAHEAD

Let us go back to Example1.jj:

    void Input() :
    {}
    {
      "a" BC() "c"
    }

    void BC() :
    {}
    {
      "b" [ "c" ]
    }

Let us suppose that there is a good reason for writing a grammar this way (maybe the way actions are embedded). As noted earlier, this grammar recognizes two strings, "abc" and "abcc". The problem here is that the default LL(1) algorithm will choose the [ "c" ] every time it sees a "c" and therefore "abc" will never be matched. We need to specify that this choice must be made only when the next token is a "c", and the token following that is not a "c". This is a negative statement - one that cannot be made using syntactic LOOKAHEAD.

We can use semantic LOOKAHEAD for this purpose. With semantic LOOKAHEAD, you can specify any arbitrary boolean expression whose evaluation determines which choice to take at a choice point. The above example can be instrumented with semantic LOOKAHEAD as follows:

    void BC() :
    {}
    {
      "b"
      [ LOOKAHEAD( { getToken(1).kind == C && getToken(2).kind != C } )
        <C:"c">
      ]
    }

First we give the token "c" a label C so that we can refer to it from the semantic LOOKAHEAD. The boolean expression essentially states the desired property. The choice determination decision is therefore:

    if (next token is "c" and following token is not "c") {
      choose the nested expansion (i.e., go into the [...] construct)
    } else {
      go beyond the [...] construct without entering it.
    }

This example can be rewritten to combine both syntactic and semantic LOOKAHEAD as follows (recognize the first "c" using syntactic LOOKAHEAD and the absence of the second using semantic LOOKAHEAD):

    void BC() :
    {}
    {
      "b"
      [ LOOKAHEAD( "c", { getToken(2).kind != C } )
        <C:"c">
      ]
    }

7. GENERAL STRUCTURE OF LOOKAHEADWe've pretty much covered the various aspects of LOOKAHEAD in theprevious sections. A couple of advanced topics follow. However,we shall now present a formal language reference for LOOKAHEAD inJava Compiler Compiler:

The general structure of a LOOKAHEAD specification is:

LOOKAHEAD( amount, expansion, { boolean_expression } )

"amount" specifies the number of tokens to LOOKAHEAD, "expansion" specifies the expansion to use to perform syntactic LOOKAHEAD, and "boolean_expression" is the expression to use for semantic LOOKAHEAD.

At least one of the three entries must be present. If more than one is present, they are separated by commas. The default values for each of these entities are defined below:

"amount":
- if "expansion" is present, this defaults to 2147483647.
- otherwise ("boolean_expression" must be present then) this defaults to 0.

Note: When "amount" is 0, no syntactic LOOKAHEAD is performed. Also, "amount" does not affect the semantic LOOKAHEAD.

"expansion":
- defaults to the expansion being considered.

"boolean_expression":
- defaults to true.
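
The default rules above can be restated as a few lines of plain Java. This is only an illustrative sketch: the class LookaheadDefaults and its methods are hypothetical names and are not part of JavaCC or of the code it generates. A null argument stands for an entry that was left out of the LOOKAHEAD specification.

```java
// Hypothetical helper that restates the documented defaults; not JavaCC API.
final class LookaheadDefaults {

    /** Effective "amount": an explicit value wins, otherwise the documented default. */
    static int effectiveAmount(Integer amount, boolean expansionPresent) {
        if (amount != null) {
            return amount;
        }
        // 2147483647 is Integer.MAX_VALUE: effectively "as far as the expansion goes".
        // Without an expansion only a boolean expression is present, so the
        // default is 0 and no syntactic lookahead is performed.
        return expansionPresent ? Integer.MAX_VALUE : 0;
    }

    /** Effective semantic condition: a missing boolean expression counts as true. */
    static boolean effectiveCondition(Boolean condition) {
        return condition == null || condition;
    }

    public static void main(String[] args) {
        System.out.println(effectiveAmount(null, true));   // expansion only
        System.out.println(effectiveAmount(null, false));  // boolean expression only
        System.out.println(effectiveCondition(null));      // no boolean expression
    }
}
```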


JavaCC [tm]: Error Reporting and Recovery

This is a rough document describing the new error recovery features in Version 0.7.1. This document also describes how features have changed since Version 0.6.

The first change (from 0.6) is that we have two new exceptions:

. ParseException
. TokenMgrError

Whenever the token manager detects a problem, it throws the exception TokenMgrError. Previously, it used to print the message:

Lexical Error ...

following which it used to throw the exception ParseError.

Whenever the parser detects a problem, it throws the exception ParseException. Previously, it used to print the message:

Encountered ... Was expecting one of ...

following which it used to throw the exception ParseError.

In Version 0.7.1, error messages are never printed explicitly; rather, this information is stored inside the exception objects that are thrown. Please see the classes ParseException.java and TokenMgrError.java (that get generated by JavaCC [tm] during parser generation) for more details.

If the thrown exceptions are never caught, then a standard action is taken by the virtual machine, which normally includes printing the stack trace and also the result of the "toString" method in the exception. So if you do not catch the JavaCC exceptions, a message quite similar to the ones in Version 0.6 is displayed.

But if you catch the exception, you must print the message yourself.

Exceptions in the Java [tm] programming language are all subclasses of type Throwable. Furthermore, exceptions are divided into two broad categories - ERRORS and other exceptions.

Errors are exceptions that one is not expected to recover from - examples of these are ThreadDeath or OutOfMemoryError. Errors are indicated by subclassing the exception "Error". Exceptions subclassed from Error need not be specified in the "throws" clause of method declarations.

Exceptions other than errors are typically defined by subclassing the exception "Exception". These exceptions are typically handled by the user program and must be declared in throws clauses of method declarations (if it is possible for the method to throw that exception).

The exception TokenMgrError is a subclass of Error, while the exception ParseException is a subclass of Exception. The reasoning here is that the token manager is never expected to throw an exception - you must be careful in defining your token specifications such that you cover all cases. Hence the suffix "Error" in TokenMgrError. You do not have to worry about this exception - if you have designed your tokens well, it should never get thrown. Whereas it is typical to attempt recovery from Parser errors - hence the name "ParseException". (Although if you still want to recover from token manager errors, you can do it - it's just that you are not forced to catch them.)
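
The practical difference between the two categories can be seen in a small self-contained sketch. The classes DemoParseException and DemoTokenMgrError below are stand-ins invented for this illustration; the real ParseException and TokenMgrError are generated by JavaCC and carry much more information.

```java
// Stand-ins only: the real classes are generated by JavaCC.
class DemoParseException extends Exception {
    DemoParseException(String message) { super(message); }
}

class DemoTokenMgrError extends Error {
    DemoTokenMgrError(String message) { super(message); }
}

final class ExceptionKindsDemo {

    // A subclass of Exception is checked: it must be declared in the throws clause.
    static void parse(boolean fail) throws DemoParseException {
        if (fail) throw new DemoParseException("syntax problem");
    }

    // A subclass of Error needs no throws clause.
    static void nextToken(boolean fail) {
        if (fail) throw new DemoTokenMgrError("lexical problem");
    }

    /** Runs both steps and returns what the caller chose to report. */
    static String run(boolean lexFails, boolean parseFails) {
        try {
            nextToken(lexFails);
            parse(parseFails);
            return "ok";
        } catch (DemoParseException e) {
            return "parse: " + e.getMessage();      // the compiler forces us to handle this
        } catch (DemoTokenMgrError e) {
            return "lexical: " + e.getMessage();    // optional: catching is allowed, not required
        }
    }

    public static void main(String[] args) {
        System.out.println(run(false, true));
        System.out.println(run(true, false));
    }
}
```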


In Version 0.7.1, we have added a syntax to specify additional exceptions that may be thrown by methods corresponding to non-terminals. This syntax is identical to the Java "throws ..." syntax. Here's an example of how you use this:

void VariableDeclaration() throws SymbolTableException, IOException :
{...}
{
  ...
}

Here, VariableDeclaration is defined to throw exceptions SymbolTableException and IOException in addition to ParseException.

Error Reporting:

The scheme for error reporting is simpler in Version 0.7.1 (as compared to Version 0.6) - simply modify the file ParseException.java to do what you want it to do. Typically, you would modify the getMessage method to do your own customized error reporting. All information regarding these methods can be obtained from the comments in the generated files ParseException.java and TokenMgrError.java. It will also help to understand the functionality of the class Throwable (read a Java book for this).

There is a method in the generated parser called "generateParseException". You can call this method anytime you wish to generate an object of type ParseException. This object will contain all the choices that the parser has attempted since the last successfully consumed token.

Error Recovery:

JavaCC offers two kinds of error recovery - shallow recovery and deep recovery. Shallow recovery recovers if none of the current choices have succeeded in being selected, while deep recovery is when a choice is selected, but then an error happens sometime during the parsing of this choice.

Shallow Error Recovery:

We shall explain shallow error recovery using the following example:

void Stm() :
{}
{
  IfStm()
|
  WhileStm()
}

Let's assume that IfStm starts with the reserved word "if" and WhileStm starts with the reserved word "while". Suppose you want to recover by skipping all the way to the next semicolon when neither IfStm nor WhileStm can be matched by the next input token (assuming a lookahead of 1). That is, the next token is neither "if" nor "while".

What you do is write the following:

void Stm() :
{}
{
  IfStm()
|
  WhileStm()
|
  error_skipto(SEMICOLON)
}


But you have to define "error_skipto" first. So far as JavaCC is concerned, "error_skipto" is just like any other non-terminal. The following is one way to define "error_skipto" (here we use the standard JAVACODE production):

JAVACODE
void error_skipto(int kind) {
  ParseException e = generateParseException();  // generate the exception object.
  System.out.println(e.toString());  // print the error message
  Token t;
  do {
    t = getNextToken();
  } while (t.kind != kind);
    // The above loop consumes tokens all the way up to a token of
    // "kind". We use a do-while loop rather than a while because the
    // current token is the one immediately before the erroneous token
    // (in our case the token immediately before what should have been
    // "if"/"while").
}
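
The skipping loop can be tried outside a parser with a plain list of token kinds. The sketch below is an illustration under stated assumptions: the class SkipToDemo, the integer kinds, and the list standing in for the token stream are all hypothetical, and the list is assumed to end with an EOF kind. The extra check for EOF is an addition of this sketch (it is not in the production above); it stops the loop when the requested kind never shows up.

```java
import java.util.Arrays;
import java.util.List;

final class SkipToDemo {
    // Hypothetical token kinds; a generated ...Constants file would define the real ones.
    static final int EOF = 0, IF = 1, WHILE = 2, SEMICOLON = 3, IDENT = 4;

    /**
     * Consumes tokens starting at pos until a token of the requested kind
     * (or EOF) has been consumed. Returns the index of the first token
     * that has NOT been consumed.
     */
    static int skipTo(List<Integer> kinds, int pos, int kind) {
        int t;
        do {
            t = kinds.get(pos++);            // plays the role of getNextToken()
        } while (t != kind && t != EOF);     // EOF guard: an addition of this sketch
        return pos;
    }

    public static void main(String[] args) {
        // Two stray identifiers, then ";", then a well-formed "while" statement start.
        List<Integer> input = Arrays.asList(IDENT, IDENT, SEMICOLON, WHILE, EOF);
        int resumeAt = skipTo(input, 0, SEMICOLON);
        System.out.println("parsing resumes at index " + resumeAt);
    }
}
```

Because the loop is a do-while, the semicolon itself is consumed, so parsing resumes with the token that follows it.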

That's it for shallow error recovery. In a future version of JavaCC we will have support for modular composition of grammars. When this happens, one can place all these error recovery routines into a separate module that can be "imported" into the main grammar module. We intend to supply a library of useful routines (for error recovery and otherwise) when we implement this capability.

Deep Error Recovery:

Let's use the same example that we did for shallow recovery:

void Stm() :
{}
{
  IfStm()
|
  WhileStm()
}

In this case we wish to recover in the same way. However, we wish to recover even when there is an error deeper into the parse. For example, suppose the next token was "while" - therefore the choice "WhileStm" was taken. But suppose that during the parse of WhileStm some error is encountered - say one has "while (foo { stm; }" - i.e., the closing parenthesis is missing. Shallow recovery will not work for this situation. You need deep recovery to achieve this. For this, we offer a new syntactic entity in JavaCC - the try-catch-finally block.

First, let us rewrite the above example for deep error recovery and then explain the try-catch-finally block in more detail:

void Stm() :
{}
{
  try {
    (
      IfStm()
    |
      WhileStm()
    )
  }
  catch (ParseException e) {
    error_skipto(SEMICOLON);
  }
}

That's all you need to do. If there is any unrecovered error during the parse of IfStm or WhileStm, then the catch block takes over. You can have any number of catch blocks and also optionally a finally block (just as with Java errors). What goes into the catch blocks is *Java code*, not JavaCC expansions. For example, the above example could have been rewritten as:

void Stm() :
{}
{
  try {
    (
      IfStm()
    |
      WhileStm()
    )
  }
  catch (ParseException e) {
    System.out.println(e.toString());
    Token t;
    do {
      t = getNextToken();
    } while (t.kind != SEMICOLON);
  }
}

Our belief is that it's best to avoid placing too much Java code in the catch and finally blocks since it overwhelms the grammar reader. It's best to define methods that you can then call from the catch blocks.

Note that in the second writing of the example, we essentially copied the code out of the implementation of error_skipto. But we left out the first statement - the call to generateParseException. That's because in this case, the catch block already provides us with the exception. But even if you did call this method, you will get back an identical object.


JavaCC [tm]: TokenManager MiniTutorial

The JavaCC [tm] lexical specification is organized into a set of "lexical states". Each lexical state is named with an identifier. There is a standard lexical state called DEFAULT. The generated token manager is at any moment in one of these lexical states. When the token manager is initialized, it starts off in the DEFAULT state, by default. The starting lexical state can also be specified as a parameter while constructing a token manager object.

Each lexical state contains an ordered list of regular expressions; the order is derived from the order of occurrence in the input file. There are four kinds of regular expressions: SKIP, MORE, TOKEN, and SPECIAL_TOKEN.

All regular expressions that occur as expansion units in the grammar are considered to be in the DEFAULT lexical state and their order of occurrence is determined by their position in the grammar file.

A token is matched as follows: All regular expressions in the current lexical state are considered as potential match candidates. The token manager consumes the maximum number of characters from the input stream possible that match one of these regular expressions. That is, the token manager prefers the longest possible match. If there are multiple longest matches (of the same length), the regular expression that is matched is the one with the earliest order of occurrence in the grammar file.
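
The matching rule (longest match first, earliest rule on a tie) can be sketched with java.util.regex. This is not how the generated token manager is implemented; the class LongestMatchDemo and its RULES table are hypothetical, and the sketch relies on simple patterns for which the regex engine's match at a position is also the longest one.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LongestMatchDemo {
    // Hypothetical rule list, in "grammar file" order: a keyword, then identifiers.
    static final Pattern[] RULES = {
        Pattern.compile("if"),
        Pattern.compile("[a-z]+")
    };

    /** Returns the index of the winning rule at position pos, or -1 if none matches. */
    static int choose(Pattern[] rules, String input, int pos) {
        int best = -1;
        int bestLen = 0;
        for (int i = 0; i < rules.length; i++) {
            Matcher m = rules[i].matcher(input);
            m.region(pos, input.length());
            if (m.lookingAt()) {                 // match must start exactly at pos
                int len = m.end() - pos;
                if (len > bestLen) {             // strictly longer: earlier rule wins ties
                    best = i;
                    bestLen = len;
                }
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(choose(RULES, "if", 0));    // both match 2 chars: rule 0 wins
        System.out.println(choose(RULES, "iffy", 0));  // identifier is longer: rule 1 wins
    }
}
```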

As mentioned above, the token manager is in exactly one state at any moment. At this moment, the token manager only considers the regular expressions defined in this state for matching purposes. After a match, one can specify an action to be executed as well as a new lexical state to move to. If a new lexical state is not specified, the token manager remains in the current state.

The regular expression kind specifies what to do when a regular expression has been successfully matched:

SKIP: Simply throw away the matched string (after executing any lexical action).
MORE: Continue (to whatever the next state is) taking the matched string along. This string will be a prefix of the new matched string.
TOKEN: Create a token using the matched string and send it to the parser (or any caller).
SPECIAL_TOKEN: Create a special token that does not participate in parsing. Already described earlier.

(The mechanism of accessing special tokens is at the end of this page)

Whenever the end of file <EOF> is detected, it causes the creation of an <EOF> token (regardless of the current state of the lexical analyzer). However, if an <EOF> is detected in the middle of a match for a regular expression, or immediately after a MORE regular expression has been matched, an error is reported.

After the regular expression is matched, the lexical action is executed. All the variables (and methods) declared in the TOKEN_MGR_DECLS region (see below) are available here for use. In addition, the variables and methods listed below are also available for use.

Immediately after this, the token manager changes state to that specified (if any).


After that the action specified by the kind of the regular expression is taken (SKIP, MORE, ... ). If the kind is TOKEN, the matched token is returned. If the kind is SPECIAL_TOKEN, the matched token is saved to be returned along with the next TOKEN that is matched.

-------------------------------------------------------------------

The following variables are available for use within lexical actions:

1. StringBuffer image (READ/WRITE):

"image" (different from the "image" field of the matched token) is a StringBuffer variable that contains all the characters that have been matched since the last SKIP, TOKEN, or SPECIAL_TOKEN. You are free to make whatever changes you wish to it so long as you do not assign it to null (since this variable is used by the generated token manager also). If you make changes to "image", this change is passed on to subsequent matches (if the current match is a MORE). The content of "image" *does not* automatically get assigned to the "image" field of the matched token. If you wish this to happen, you must explicitly assign it in a lexical action of a TOKEN or SPECIAL_TOKEN regular expression.

Example:

<DEFAULT> MORE : { "a" : S1 }

<S1> MORE :
{
  "b"
    { int l = image.length()-1; image.setCharAt(l, Character.toUpperCase(image.charAt(l))); }
    ^1                                                                                      ^2
    : S2
}

<S2> TOKEN :
{
  "cd" { x = image; } : DEFAULT
                    ^3
}

In the above example, the value of "image" at the 3 points marked by ^1, ^2, and ^3 are:

At ^1: "ab"
At ^2: "aB"
At ^3: "aBcd"
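
The three values can be checked by replaying the same buffer operations in ordinary Java. The class ImageBufferDemo is a hypothetical stand-in that only mimics how "image" accumulates across MORE matches; it does not use a generated token manager. Note that Java's char has no methods, so the upper-casing goes through Character.toUpperCase.

```java
final class ImageBufferDemo {

    /** Replays the matches "a", "b", "cd" and returns the image at ^1, ^2, ^3. */
    static String[] replay() {
        StringBuffer image = new StringBuffer();

        image.append("a");                      // <DEFAULT> MORE "a", no action
        image.append("b");                      // <S1> MORE "b" is appended before its action
        String at1 = image.toString();          // ^1: on entry to the action

        int l = image.length() - 1;
        image.setCharAt(l, Character.toUpperCase(image.charAt(l)));
        String at2 = image.toString();          // ^2: after the action

        image.append("cd");                     // <S2> TOKEN "cd"
        String at3 = image.toString();          // ^3: after the action of the TOKEN

        return new String[] { at1, at2, at3 };
    }

    public static void main(String[] args) {
        for (String s : replay()) {
            System.out.println(s);
        }
    }
}
```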

2. int lengthOfMatch (READ ONLY):

This is the length of the current match (is not cumulative over MORE's). See example below. You should not modify this variable.

Example:

Using the same example as above, the values of "lengthOfMatch" are:

At ^1: 1 (the size of "b")
At ^2: 1 (does not change due to lexical actions)
At ^3: 2 (the size of "cd")

3. int curLexState (READ ONLY):

This is the index of the current lexical state. You should not modify this variable. Integer constants whose names are those of the lexical state are generated into the ...Constants file, so you can refer to lexical states without worrying about their actual index value.

4. inputStream (READ ONLY):


This is an input stream of the appropriate type (one of ASCII_CharStream, ASCII_UCodeESC_CharStream, UCode_CharStream, or UCode_UCodeESC_CharStream depending on the values of options UNICODE_INPUT and JAVA_UNICODE_ESCAPE). The stream is currently at the last character consumed for this match. Methods of inputStream can be called. For example, getEndLine and getEndColumn can be called to get the line and column number information for the current match. inputStream may not be modified.

5. Token matchedToken (READ/WRITE):

This variable may be used only in actions associated with TOKEN and SPECIAL_TOKEN regular expressions. This is set to be the token that will get returned to the parser. You may change this variable and thereby cause the changed token to be returned to the parser instead of the original one. It is here that you can assign the value of variable "image" to "matchedToken.image". Typically that's how your changes to "image" have effect outside the lexical actions.

Example:

If we modify the last regular expression specification of the above example to:

<S2> TOKEN :
{
  "cd" { matchedToken.image = image.toString(); } : DEFAULT
}

Then the token returned to the parser will have its ".image" field set to "aBcd". If this assignment was not performed, then the ".image" field will remain as "abcd".

6. void SwitchTo(int):

Calling this method switches you to the specified lexical state. This method may be called from parser actions also (in addition to being called from lexical actions). However, care must be taken when using this method to switch states from the parser since the lexical analysis could be many tokens ahead of the parser in the presence of large lookaheads. When you use this method within a lexical action, you must ensure that it is the last statement executed in the action (otherwise, strange things could happen). If there is a state change specified using the ": state" syntax, it overrides all SwitchTo calls, hence there is no point having a SwitchTo call when there is an explicit state change specified. In general, calling this method should be resorted to only when you cannot do it any other way. Using this method of switching states also causes you to lose some of the semantic checking that JavaCC does when you use the standard syntax.

-------------------------------------------------------------------

Lexical actions have access to a set of class level declarations. These declarations are introduced within the JavaCC file using the following syntax:

token_manager_decls ::= "TOKEN_MGR_DECLS" ":" "{" java_declarations_and_code "}"

These declarations are accessible from all lexical actions.

EXAMPLES
--------

Example 1: Comments


SKIP :
{
  "/*" : WithinComment
}

<WithinComment> SKIP :
{
  "*/" : DEFAULT
}

<WithinComment> MORE :
{
  <~[]>
}
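
The effect of these three rules can be imitated with a two-state loop in plain Java. This is a rough sketch, not the generated token manager: the class CommentStateDemo is hypothetical, and characters outside comments are simply copied to the output instead of being turned into tokens.

```java
final class CommentStateDemo {
    enum State { DEFAULT, WITHIN_COMMENT }

    /** Returns the input with every slash-star comment dropped. */
    static String stripComments(String in) {
        StringBuilder out = new StringBuilder();
        State state = State.DEFAULT;
        int i = 0;
        while (i < in.length()) {
            if (state == State.DEFAULT && in.startsWith("/*", i)) {
                state = State.WITHIN_COMMENT;        // SKIP "/*" : WithinComment
                i += 2;
            } else if (state == State.WITHIN_COMMENT && in.startsWith("*/", i)) {
                state = State.DEFAULT;               // SKIP "*/" : DEFAULT (2 chars beat <~[]>)
                i += 2;
            } else {
                if (state == State.DEFAULT) {
                    out.append(in.charAt(i));        // ordinary input outside comments
                }
                i++;                                 // inside a comment: MORE <~[]>, then discarded
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(stripComments("a /* hidden */ b"));
    }
}
```

Inside WithinComment the two-character match for the closing delimiter wins over the one-character <~[]> because the longest match is preferred.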

Example 2: String Literals with actions to print the length of the string:

TOKEN_MGR_DECLS :
{
  int stringSize;
}

MORE :
{
  "\"" {stringSize = 0;} : WithinString
}

<WithinString> TOKEN :
{
  <STRLIT: "\""> {System.out.println("Size = " + stringSize);} : DEFAULT
}

<WithinString> MORE :
{
  <~["\n","\r"]> {stringSize++;}
}

HOW SPECIAL TOKENS ARE SENT TO THE PARSER:

Special tokens are like tokens, except that they are permitted to appear anywhere in the input file (between any two tokens). Special tokens can be specified in the grammar input file using the reserved word "SPECIAL_TOKEN" instead of "TOKEN" as in:

SPECIAL_TOKEN :
{
  <SINGLE_LINE_COMMENT: "//" (~["\n","\r"])* ("\n"|"\r"|"\r\n")>
}

Any regular expression defined to be a SPECIAL_TOKEN may be accessed in a special manner from user actions in the lexical and grammar specifications. This allows these tokens to be recovered during parsing while at the same time these tokens do not participate in the parsing.

JavaCC has been bootstrapped to use this feature to automatically copy relevant comments from the input grammar file into the generated files.

Details:

The class Token now has an additional field:

Token specialToken;


This field points to the special token immediately prior to the current token (special or otherwise). If the token immediately prior to the current token is a regular token (and not a special token), then this field is set to null. The "next" fields of regular tokens continue to have the same meaning - i.e., they point to the next regular token except in the case of the EOF token where the "next" field is null. The "next" field of special tokens points to the special token immediately following the current token. If the token immediately following the current token is a regular token, the "next" field is set to null.

This is clarified by the following example. Suppose you wish to print all special tokens prior to the regular token "t" (but only those that are after the regular token before "t"):

if (t.specialToken == null) return;
  // The above statement determines that there are no special tokens
  // and returns control to the caller.
Token tmp_t = t.specialToken;
while (tmp_t.specialToken != null) tmp_t = tmp_t.specialToken;
  // The above line walks back the special token chain until it
  // reaches the first special token after the previous regular
  // token.
while (tmp_t != null) {
  System.out.println(tmp_t.image);
  tmp_t = tmp_t.next;
}
  // The above loop now walks the special token chain in the forward
  // direction printing them in the process.
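
The same walk can be run on its own with a minimal stand-in for the token class. In this sketch the classes SpecialTokenDemo and Tok, and the helper regularWithSpecials, are hypothetical; Tok carries only the three fields the walk needs, and the helper links the special tokens in the way the text above describes.

```java
import java.util.ArrayList;
import java.util.List;

final class SpecialTokenDemo {

    /** Minimal stand-in for the generated Token class (only the fields used here). */
    static final class Tok {
        final String image;
        Tok next;           // for a special token: the following special token, or null
        Tok specialToken;   // the special token immediately before this one, or null
        Tok(String image) { this.image = image; }
    }

    /** Collects, in source order, the images of the special tokens preceding t. */
    static List<String> specialsBefore(Tok t) {
        List<String> out = new ArrayList<>();
        Tok s = t.specialToken;
        if (s == null) {
            return out;                          // no special tokens at all
        }
        while (s.specialToken != null) {
            s = s.specialToken;                  // walk back to the first special token
        }
        for (; s != null; s = s.next) {
            out.add(s.image);                    // walk forward, collecting images
        }
        return out;
    }

    /** Builds a regular token preceded by the given special tokens. */
    static Tok regularWithSpecials(String image, String... specials) {
        Tok prev = null;
        for (String sp : specials) {
            Tok s = new Tok(sp);
            s.specialToken = prev;               // backward link
            if (prev != null) prev.next = s;     // forward link between specials
            prev = s;
        }
        Tok regular = new Tok(image);
        regular.specialToken = prev;             // last special before the regular token
        return regular;
    }

    public static void main(String[] args) {
        Tok t = regularWithSpecials("x", "// first", "// second");
        System.out.println(specialsBefore(t));
    }
}
```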


JavaCC [tm]: API Routines

This web page is a comprehensive list of all classes, methods, and variables available for use by a JavaCC [tm] user. These classes, methods, and variables are typically used from the actions that are embedded in a JavaCC grammar. In the sample code used below, it is assumed that the name of the generated parser is "TheParser".

Non-Terminals in the Input Grammar

For each non-terminal NT in the input grammar file, the following method is generated into the parser class:

● returntype NT(parameters) throws ParseError;

Here, returntype and parameters are what were specified in the JavaCC input file in the definition of NT (where NT occurred on the left-hand side).

When this method is called, the input stream is parsed to match this non-terminal. On a successful parse, this method returns normally. On detection of a parse error, an error message is displayed and the method returns by throwing the exception ParseError.

Note that all non-terminals in a JavaCC input grammar have equal status; it is possible to parse to any non-terminal by calling the non-terminal's method.

API for Parser Actions

● Token token;
  This variable holds the last token consumed by the parser and can be used in parser actions. This is exactly the same as the token returned by getToken(0).

In addition, the two methods - getToken(int i) and getNextToken() - can also be used in actions to traverse the token list.

The Token Manager Interface

Typically, the token manager interface is not to be used. Instead all access must be made through the parser interface. However, in certain situations - such as if you are not building a parser and building only the token manager - the token manager interface is useful. The token manager provides the following routine:

● Token getNextToken() throws ParseError;

Each call to this method returns the next token in the input stream. This method throws a ParseError exception when there is a lexical error, i.e., it could not find a match for any of the specified tokens from the input stream. The type Token is described later.

Constructors and Other Initialization Routines

● TheParser.TheParser(java.io.InputStream stream)
  This creates a new parser object, which in turn creates a new token manager object that reads its tokens from "stream". This constructor is available only when both the options USER_TOKEN_MANAGER and USER_CHAR_STREAM are false. If the option STATIC is true, this constructor (along with other constructors) can be called exactly once to create a single parser object.

● TheParser.TheParser(CharStream stream)
  Similar to the previous constructor, except that this one is available only when the option USER_TOKEN_MANAGER is false and USER_CHAR_STREAM is true.

● void TheParser.ReInit(java.io.InputStream stream)
  This reinitializes an existing parser object. In addition, it also reinitializes the existing token manager object that corresponds to this parser object. The result is a parser object with the exact same functionality as one that was created with the constructor above. The only difference is that new objects are not created. This method is available only when both the options USER_TOKEN_MANAGER and USER_CHAR_STREAM are false. If the option STATIC is true, this (along with the other ReInit methods) is the only way to restart a parse operation, for there is only one parser and all one can do is reinitialize it.

● void TheParser.ReInit(CharStream stream)
  Similar to the previous method, except that this one is available only when the option USER_TOKEN_MANAGER is false and USER_CHAR_STREAM is true.

● TheParser(TheParserTokenManager tm)
  This creates a new parser object which uses an already created token manager object "tm" as its token manager. This constructor is only available if option USER_TOKEN_MANAGER is false. If the option STATIC is true, this constructor (along with other constructors) can be called exactly once to create a single parser object.


● TheParser(TokenManager tm)
  Similar to the previous constructor, except that this one is available only when the option USER_TOKEN_MANAGER is true.

● void TheParser.ReInit(TheParserTokenManager tm)
  This reinitializes an existing parser object with the token manager object "tm" as its new token manager. This method is only available if option USER_TOKEN_MANAGER is false. If the option STATIC is true, this (along with the other ReInit methods) is the only way to restart a parse operation, for there is only one parser and all one can do is reinitialize it.

● void TheParser.ReInit(TokenManager tm)
  Similar to the previous method, except that this one is available only when the option USER_TOKEN_MANAGER is true.

● TheParserTokenManager.TheParserTokenManager(CharStream stream)
  Creates a new token manager object initialized to read input from "stream". When the option STATIC is true, this constructor may be called only once. This is available only when USER_TOKEN_MANAGER is false and USER_CHAR_STREAM is true. When USER_TOKEN_MANAGER is false and USER_CHAR_STREAM is false (the default situation), a constructor similar to the one above is available with the type CharStream replaced as follows:

  ● When JAVA_UNICODE_ESCAPE is false and UNICODE_INPUT is false, CharStream is replaced by ASCII_CharStream.

  ● When JAVA_UNICODE_ESCAPE is false and UNICODE_INPUT is true, CharStream is replaced by UCode_CharStream.

  ● When JAVA_UNICODE_ESCAPE is true and UNICODE_INPUT is false, CharStream is replaced by ASCII_UCodeESC_CharStream.

  ● When JAVA_UNICODE_ESCAPE is true and UNICODE_INPUT is true, CharStream is replaced by UCode_UCodeESC_CharStream.

● void TheParserTokenManager.ReInit(CharStream stream)
  Reinitializes the current token manager object to read input from "stream". When the option STATIC is true, this is the only way to restart a token manager operation. This is available only when USER_TOKEN_MANAGER is false and USER_CHAR_STREAM is true. When USER_TOKEN_MANAGER is false and USER_CHAR_STREAM is false (the default situation), a method similar to the one above is available with the type CharStream replaced as follows:

  ● When JAVA_UNICODE_ESCAPE is false and UNICODE_INPUT is false, CharStream is replaced by ASCII_CharStream.

  ● When JAVA_UNICODE_ESCAPE is false and UNICODE_INPUT is true, CharStream is replaced by UCode_CharStream.

  ● When JAVA_UNICODE_ESCAPE is true and UNICODE_INPUT is false, CharStream is replaced by ASCII_UCodeESC_CharStream.

  ● When JAVA_UNICODE_ESCAPE is true and UNICODE_INPUT is true, CharStream is replaced by UCode_UCodeESC_CharStream.

The Token Class

The Token class is the type of token objects that are created by the token manager after a successful scanning of the token stream. These token objects are then passed to the parser and are accessible to the actions in a JavaCC grammar, usually by grabbing the return value of a token. The methods getToken and getNextToken described below also give access to objects of this type.

Each Token object has the following fields:

● int kind;
  This is the index for this kind of token in the internal representation scheme of JavaCC. When tokens in the JavaCC input file are given labels, these labels are used to generate "int" constants that can be used in actions. The value 0 is always used to represent the predefined token <EOF>. A constant "EOF" is generated for convenience in the ...Constants file.

● int beginLine, beginColumn, endLine, endColumn;
  These indicate the beginning and ending positions of the token as it appeared in the input stream.

● String image;
  This represents the image of the token as it appeared in the input stream.

● Token next;
  A reference to the next regular (non-special) token from the input stream. If this is the last token from the input stream, or if the token manager has not read tokens beyond this one, this field is set to null.

  The description in the above paragraph holds only if this token is also a regular token. Otherwise, see below for a description of the contents of this field.

  Note: There are two kinds of tokens - regular and special. Regular tokens are the normal tokens that are fed to the parser. Special tokens are other useful tokens (like comments) that are not discarded (like white space). For more information on the different kinds of tokens please see the minitutorial on the token manager.

● Token specialToken;
  This field is used to access special tokens that occur prior to this token, but after the immediately preceding regular (non-special) token. If there are no such special tokens, this field is set to null. When there is more than one such special token, this field refers to the last of these special tokens, which in turn refers to the next previous special token through its specialToken field, and so on until the first special token (whose specialToken field is null). The next fields of special tokens refer to other special tokens that immediately follow it (without an intervening regular token). If there is no such token, this field is null.

● static final Token newToken(int ofKind);
  Returns a new token object as its default behavior. If you wish to perform special actions when a token is constructed or create subclasses of class Token and instantiate them instead, you can redefine this method appropriately. The only constraint is that this method returns a new object of type Token (or a subclass of Token).

Reading Tokens from the Input Stream

There are two methods available for this purpose: 

● Token TheParser.getNextToken() throws ParseError
  This method returns the next available token in the input stream and moves the token pointer one step in the input stream (i.e., this changes the state of the input stream). If there are no more tokens available in the input stream, the exception ParseError is thrown. Care must be taken when calling this method since it can interfere with the parser's knowledge of the state of the input stream, current token, etc.

● Token TheParser.getToken(int index) throws ParseError
  This method returns the index-th token from the current token ahead in the token stream. If index is 0, it returns the current token (the last token returned by getNextToken or consumed by the parser); if index is 1, it returns the next token (the next token that will be returned by getNextToken or consumed by the parser) and so on. The index parameter cannot be negative. This method does not change the input stream pointer (i.e., it does not change the state of the input stream). If an attempt is made to access a token beyond the last available token, the exception ParseError is thrown. If this method is called from a semantic lookahead specification, which in turn is called during a lookahead determination process, the current token is temporarily adjusted to be the token currently being inspected by the lookahead process. For more details, please see the minitutorial on using lookahead.
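
The difference between the two methods (one consumes, the other only peeks) can be shown with a small cursor over an array of strings. This is a sketch under stated assumptions: the class TokenCursorDemo is hypothetical, it uses strings instead of Token objects, and it throws standard Java runtime exceptions where the real parser would report a parse error.

```java
import java.util.Arrays;
import java.util.List;

final class TokenCursorDemo {
    private final List<String> tokens;
    private int current = -1;   // index of the last consumed token; -1 before any

    TokenCursorDemo(String... tokens) {
        this.tokens = Arrays.asList(tokens);
    }

    /** Consumes and returns the next token: this moves the token pointer. */
    String getNextToken() {
        if (current + 1 >= tokens.size()) {
            throw new IllegalStateException("no more tokens");
        }
        return tokens.get(++current);
    }

    /** Returns the token index steps ahead of the current one without consuming. */
    String getToken(int index) {
        if (index < 0) {
            throw new IllegalArgumentException("index cannot be negative");
        }
        int i = current + index;
        if (i < 0 || i >= tokens.size()) {
            throw new IllegalStateException("no such token");
        }
        return tokens.get(i);
    }

    public static void main(String[] args) {
        TokenCursorDemo c = new TokenCursorDemo("a", "b", "c");
        System.out.println(c.getToken(1));      // peek at the upcoming token
        System.out.println(c.getNextToken());   // consume it
        System.out.println(c.getToken(0));      // the token just consumed
    }
}
```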


Working with Debugger Tracing

When you generate parsers with the options DEBUG_PARSER or DEBUG_LOOKAHEAD, these parsers produce a trace of their activity which is printed to the user console. You can insert calls to the following methods to control this tracing activity: 

● void TheParser.enable_tracing() 

● void TheParser.disable_tracing() 

For convenience, these methods are available even when you build parsers without the debug options. In this case, these methods are no-ops. Hence you can permanently leave these methods in your code and they automatically kick in when you use the debug options. 

Customizing Error Messages

To help you customize the error messages generated by the parser and lexer, the facilities described in this section are provided. In the case of the parser, these facilities are only available if the option ERROR_REPORTING is true, while in the case of the lexer, these facilities are always available. 

The parser contains the following method definition: 

● protected void token_error() { ... } 

To customize error reporting by the parser, the parser class must be subclassed and this method redefined in the subclass. To help with creating your error reporting scheme, the following variables are available: 

● protected int error_line, error_column; The line and column where the error was detected. 

● protected String error_string; The image of the offending token or set of tokens. When a lookahead of more than 1 is used, more than one token may be present here. 

● protected String[] expected_tokens; An array of images of legitimate token sequences. Here again, each legitimate token sequence may be more than just one token when a lookahead of more than 1 is used. 
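The subclassing approach described above might look like the following sketch. TheParser here is a stand-in reduced to the members listed in this section, not a generated parser; MyParser and its lastMessage field are hypothetical names introduced so that the resulting message can be inspected.

```java
class TokenErrorSketch {
    // Stand-in for a generated parser, reduced to the members named above.
    static class TheParser {
        protected int error_line, error_column;
        protected String error_string;
        protected String[] expected_tokens;

        // Placeholder default; the real default message is not reproduced here.
        protected void token_error() {
            System.err.println("Parse error at line " + error_line);
        }
    }

    // Subclass that redefines token_error() to build a custom message.
    static class MyParser extends TheParser {
        String lastMessage; // hypothetical field holding the most recent message

        @Override
        protected void token_error() {
            lastMessage = "line " + error_line + ", column " + error_column
                + ": found \"" + error_string + "\", expected one of: "
                + String.join(", ", expected_tokens);
        }
    }
}
```

Storing the message rather than printing it is a design choice of the sketch; it lets the calling code decide where the message should go.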


The lexer contains the following method definition: 

● protected void LexicalError() { ... } 

To customize error reporting by the lexer, the lexer class must be subclassed and this method redefined in the subclass. To help with creating your error reporting scheme, the following variables are available: 

● protected int error_line, error_column; The line and column where the error was detected. 

● protected String error_after; The partial string that has been read since the last successful token match was performed. 

● protected char curChar; The offending character. 
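A lexer-side counterpart can be sketched in the same way. TheLexer is a stand-in containing only the members listed above (the text does not name the lexer class), and MyLexer, lastMessage, and the choice to show control characters by code point are assumptions of this example.

```java
class LexicalErrorSketch {
    // Stand-in for a generated lexer, reduced to the members named above.
    static class TheLexer {
        protected int error_line, error_column;
        protected String error_after;
        protected char curChar;

        // Placeholder default; the real default message is not reproduced here.
        protected void LexicalError() {
            System.err.println("Lexical error at line " + error_line);
        }
    }

    // Subclass that redefines LexicalError() to build a custom message.
    static class MyLexer extends TheLexer {
        String lastMessage; // hypothetical field holding the most recent message

        @Override
        protected void LexicalError() {
            // Show control characters by code point so the message stays readable.
            String shown = Character.isISOControl(curChar)
                ? String.format("U+%04X", (int) curChar)
                : "'" + curChar + "'";
            lastMessage = "line " + error_line + ", column " + error_column
                + ": unexpected character " + shown
                + " after \"" + error_after + "\"";
        }
    }
}
```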

JavaCC [tm]: JJTree

JJTree has two APIs: it adds some parser methods, and it requires all node objects to implement the Node interface.

JJTree parser methods

JJTree maintains some state in the parser object itself. It encapsulates all this state with an object that can be referred to via the jjtree field.

The parser state implements an open stack where nodes are held until they can be added to their parent node. The jjtree state object provides methods for you to manipulate the contents of the stack in your actions if the basic JJTree mechanisms are not sufficient.

void reset(); Call this to reinitialize the node stack. All nodes currently on the stack are thrown away. Don't call this from within a node scope, or terrible things will surely happen.

Node rootNode(); Returns the root node of the AST. Since JJTree operates bottom-up, the root node is only defined after the parse has finished. 

boolean nodeCreated(); Determines whether the current node was actually closed and pushed. Call this in the final action within a conditional node scope. 

int arity(); Returns the number of nodes currently pushed on the node stack in the current node scope. 

void pushNode(Node n); Pushes a node on to the stack. 

Node popNode(); Returns the node on the top of the stack, and removes it from the stack. 

Node peekNode(); Returns the node currently on the top of the stack. 
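The stack behavior described by these methods can be illustrated with a simplified model. NodeStackSketch is not the JJTree state object: its Node interface is an empty placeholder, and openScope() and closeScope() are hypothetical helpers that exist only so that arity() has a node scope to count within. In JJTree itself, closing a scope also attaches the pushed nodes to their parent, which this model omits.

```java
import java.util.ArrayList;
import java.util.List;

class NodeStackSketch {
    interface Node {} // placeholder for the JJTree Node interface

    private final List<Node> nodes = new ArrayList<>();    // the node stack
    private final List<Integer> marks = new ArrayList<>(); // saved marks of enclosing scopes
    private int mark = 0; // stack height when the current scope was opened

    // Throws away every node and forgets all scopes.
    void reset() { nodes.clear(); marks.clear(); mark = 0; }

    // The bottom of the stack; meaningful once the parse has finished.
    Node rootNode() { return nodes.get(0); }

    void pushNode(Node n) { nodes.add(n); }

    Node popNode() { return nodes.remove(nodes.size() - 1); }

    Node peekNode() { return nodes.get(nodes.size() - 1); }

    // Number of nodes pushed since the current scope was opened.
    int arity() { return nodes.size() - mark; }

    // Hypothetical helpers that model entering and leaving a node scope.
    void openScope() { marks.add(mark); mark = nodes.size(); }

    void closeScope() { mark = marks.remove(marks.size() - 1); }
}
```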

The Node interface

All AST nodes must implement this interface. It provides basic machinery for constructing the parent and child relationships between nodes.

public void jjtOpen(); This method is called after the node has been made the current node. It indicates that child nodes can now be added to it. 

public void jjtClose(); This method is called after all the child nodes have been added. 

public void jjtSetParent(Node n); public Node jjtGetParent(); This pair of methods is used to inform the node of its parent. 

public void jjtAddChild(Node n, int i); This method tells the node to add its argument to the node's list of children. 

public Node jjtGetChild(int i); This method returns a child node. The children are numbered from zero, left to right. 

int jjtGetNumChildren(); Returns the number of children the node has. 
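A minimal implementation of the interface might look like the sketch below. The Node interface is restated from the method list above so that the example is self-contained; BasicNode, its label field, and countNodes are hypothetical names, and growing the child list on demand in jjtAddChild is one possible way to accept children in any index order.

```java
import java.util.ArrayList;
import java.util.List;

class NodeSketch {
    // The Node interface, restated from the method list above.
    interface Node {
        void jjtOpen();
        void jjtClose();
        void jjtSetParent(Node n);
        Node jjtGetParent();
        void jjtAddChild(Node n, int i);
        Node jjtGetChild(int i);
        int jjtGetNumChildren();
    }

    // Hypothetical minimal node; the label exists only for display.
    static class BasicNode implements Node {
        private final String label;
        private Node parent;
        private final List<Node> children = new ArrayList<>();

        BasicNode(String label) { this.label = label; }

        public void jjtOpen() {}  // nothing to prepare in this sketch
        public void jjtClose() {} // nothing to finalize in this sketch
        public void jjtSetParent(Node n) { parent = n; }
        public Node jjtGetParent() { return parent; }

        public void jjtAddChild(Node n, int i) {
            // Grow the list until index i exists, then store the child there.
            while (children.size() <= i) {
                children.add(null);
            }
            children.set(i, n);
        }

        public Node jjtGetChild(int i) { return children.get(i); }
        public int jjtGetNumChildren() { return children.size(); }

        @Override
        public String toString() { return label; }
    }

    // Counts the nodes in the subtree rooted at n, using only the interface.
    static int countNodes(Node n) {
        int total = 1;
        for (int i = 0; i < n.jjtGetNumChildren(); i++) {
            total += countNodes(n.jjtGetChild(i));
        }
        return total;
    }
}
```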
