Upload
others
View
6
Download
0
Embed Size (px)
Citation preview
D.C FILE Copy Copy of 14 copies
iLn
I IDA PAPER P-2108
NN
Ada LEXICAL ANALYZER GENERATORI
Reginald N. Meeson
January 1989 O TICEECTES,~
Prepared forSTARS Joint Program Office
DIM1111ION STMMMT A
hipproved for pubik mhouq
90 0 IDA Log No. HQ 8B-033367
DEFINITIONSIDA publishes the ftolowing documents to report the results of its work.
ReportsReports are the most authoritative and most carefully considered products IDA publishes.They normally embody results of major projects which (a) have a direct bearing ondecisions affecting major programs, (b) address issues of significant concern to theExecutive Branch, the Congress and/or the public, or (c) address issues that havesignificant economic implications. IDA Reports are reviewed by outside panels of expertsto ensure their high quality and relevance to the problems studied, and they are releasedby the President of IDA.
Group ReportsGroup Reports record the findings and results of IDA established working groups andpanels composed of senior individuals addressing major issues which otherwise would bethe subject of an IDA Report. IDA Group Reports are reviewed by the senior individualsresponsible for the project and others as selected by IDA to ensure their high quality andrelevance to the problems studied, and are released by the President of IDA.
PapersPapers, also authoritative and carefully considered products of IDA, address studies thatare narrower in scope than those covered in Reports. IDA Papers are reviewed to ensurethat they meet the high standards expected of refereed papers in professional journals orformal Agency reports.
DocumentsIDA Documents are used for the convenience of the sponsors or the analysts (a) to recordsubstantive work done in quick reaction studies, (b) to record the proceedings ofconferences and meetings, (c) to make available preliminary and tentative results ofanalyses, (d) to record data developed in the course of an investigation, or (e) to forwardinformation that is essentially unanalyzed and unevaluated. The review of IDA Documentsis suited to their content and intended use.
The work reported in this document was conducted under contract MDA 903 84 C 0031 forthe Department of Defense. The publication of this IDA document does not indicateendorsement by the Department of Defense, nor should the contents be construed asreflecting the official position of that Agency.
This Paper has been reviewed by IDA to assure that It meets high standards ofthoroughness, objectivity, and appropriate analytical methodology and that the results,conclusions and recommendations are properly supported by the material presented.
© 190 Institute for Defense Analyses
The Government of the United States is granted an unlimited license to reproduce this
document.
Approved for public release, unlimited distribution; 30 August 1990. Unclassified.
S
Form ApprovedREPORT DOCUMENTATION PAGE OMB No. 0704-0188
Public reporting burden for this collectlon of IntoTaion Is ealstimated to average 1 hour per response, Including the time for revieing instructions. searching existing data sources. gathering andmaintaining the data needed, and completing and reviewing the collection of Information. Send cormmnts regarding this burden estinate or any other aspect o4 this collction of Informalion,Including suggestions for reducing this burden, to Washington Headquarters Services. Directorate for Irormation Operations and Reports, 1215 Jefferson Davis Highway, Suie 1204. Arlington.VA 222024302. and to the Office of Management and Budget. Paperwork Reduction Projec (0704.0188), Washington. DC 20500.
1. AGENCY USE ONLY (Leave blank) 2. REPORT DATE 3. REPORT TYPE AND DATES COVERED
January 1989 Final
4. TITLE AND SUBTITLE 5. FUNDING NUMBERS
Ada Lexical Analyzer GeneratorMDA 903 84 C 0031
6. AUTHOR(S) A- 134Reginald N. Meeson
7. PERFORMING ORGANIZATION NAME(S) AND ADDRESS(ES) 8. PERFORMING ORGANIZATION
Institute for Defense Analyses REPORT NUMBER
1801 N. Beauregard St. IDA Paper P-2108Alexandria, VA 22311-1772
9. SPONSORING/MONITORING AGENCY NAME(S) AND ADDRESS(ES) 10. SPONSORING/MONITORINGAGENCY REPORT NUMBER
STARS Joint Program Office1400 Wilson Blvd.Arlington, VA 22209-2308
11. SUPPLEMENTARY NOTES
12a. DISTRIBUTION/AVAILABIUTY STATEMENT 12b. DISTRIBUTION CODE
Approved for public release, unlimited distribution; 30 August 2A1990.
13. ABSTRACT (Maximum 200 words)
IDA Paper P-2108, Ada Lexical Analyzer Generator, documents the Ada Lexical AnalyzerGenerator program which will create a lexical analyzer or "next-token" procedure for use in acompiler, pretty printer, or other language processing programs. Lexical analyzers are producedfrom specifications of the patterns they must recognize. The notation for specifying patterns isessentially the same as that used in the Ada Language Reference Manual. The generator producesan Ada package that includes code to match the specified lexical patterns and returns the symbols itrecognizes. Familiarity with compiler terminology and techniques is assumed in the technicalsections of this document.
14. SUBJECT TERMS 15. NUMBER OF PAGES
Ada Programming Language; Software Engineering; Lexical Analysis; Lexical 174Patterns. 16. PRICE CODE
17. SECURITY CLASSIFICATION 18. SECURITY CLASSIFICATIe"' 19. SECURITY CLASSIFICATION 20. UMITATION OF
OF REPORT OF THIS PAGE OF ABSTRACT ABSTRACTUnclassifUnclanclassified Unclassified UL
NSN 7540-01-280-5500 Standard Form 298 (Rev. 2-89)Prescribed by ANSI Std. Z39-18298-102
IDA PAPER P-2 108
Ada LEXICAL ANALYZER GENERATOR
Reginald N. Meeson
January 1989
I DAIN~ [ITUTE FOR IDEFEN3E, A'%NALYSES
Contract MDA 903 84 C 0031DARPA Assignment A- 134
PREFACE
The purpose of IDA Paper P-2108, Ada Lexical Analyzer Generator, is to for-ward software and supporting documentation developed as part of IDA's prototypesoftware development work for the Software Technology for Adaptable and ReliableSoftware (STARS) program under Task Order T-D5-429. This paper documents the AdaLexical Analyzer Generator's requirements, design, and implementation, and is directedtoward potential users who may wish to modify or extend the generator's capabilities.
An earlier draft of this document was reviewed within the Computer andSoftware Engineering Division (CSED) by B. Brykczynski, W. Easton, R. Knapper, J.Sensiba, L. Veren, R. Waychoff, and R. Winner (April 1988).
Accession For
NTIs CF &IDTIC TAB 5Unannounced
Just Ifcatio
ByDistribution/
Availabi1ity CodesiAvail and/or
Dist I Speoial
V
TABLE OF CONTENTS
1. INTRODUCTION . . . . . . . . . . . . . . . . . . . . 11.1 Scope . . . . . . . . . . . . . . . . . . . . . . . 11.2 Background .......... ..................... 1
2. REQUIREMENTS STATEMENT ........ ............. 3
3. DEVELOPMENT APPROACH ....... ............... 5
4. DESIGN SPECIFICATION . ........ ............... 7
4.1 Lexical Pattern Notation ........ ................. 74.2 Declarations and Actions ........ ................ 94.3 Input and Output Streams ...... ............... .. 11
4.4 Pattern Representation ...... ................. .. 12
4.5 Code Generation ....... ................... .. 14
5. TEST PLAN ........ ...................... 19
6.REFERENCES ........ ..................... .. 21
APPENDIX A - Example Lexical Analyzer Specification and Generated
Code .......... ......................... 23
APPENDIX B - Lexical Analyzer Generator Source Listings ........ . 35
APPENDIX C - Lexical Analyzer Test Data .... ............ 127
vii
LIST OF FIGURES
Figure 1. Lexical Analyzer Generator Development Approach 6.......6
ix
1. INTRODUCTION
This document describes the Ada Lexical Analyzer Generator developed as partof IDA's prototype software development work for STARS (Software Technology forAdaptable and Reliable Systems). This report and the accompanying software were writ-ten in partial fulfillment of Section 4(d) of Task Order T-D5-429.
The Ada Lexical Analyzer Generator is a program that will create a lexicalanalyzer or "next-token" procedure for use in a compiler, pretty printer, or otherlanguage processing program. Lexical analyzers are produced from specifications of thepatterns they must recognize. The notation for specifying patterns is essentially the sameas that used in the Ada language reference manual [i]. The generator produces an Adapackage that includes code to match the specified lexical patterns and return the symbols
it recognizes. Familiarity with compiler terminology and techniques is assumed in thetechnical sections of this report.
1.1 Scope
This report describes the requirements for the lexical analyzer generator and theapproach taken in the prototype design. The report includes descriptions of the notationused for specifying lexical patterns and the internal data structures and processing per-formed to transform these specifications into Ada pattern recognition code.
1.2 Background
Lexical analysis is the first stage of processing in a compiler or other languageprocessing program, and is where basic language elements such as identifiers, numbers,and special symbols are separated from the sequence of characters submitted as input.Lexical analysis does not include recognizing higher levels of source language structuresuch as expressions or statements. This processing is performed in the next compilerstage, the parser. Separating the lexical analysis stage from the parsing stage greatly sim-plifies the parser's task. Lexical analyzers also simplify language processing tools that donot need full-scale parsers to perform their functions; for example, pretty printers. Infact, lexical analysis techniques can simplify many other applications that process com-plex input data.
1 u nnm n ~ umu iellu nlll nllnl~ llln l illm i
For more information on compiler organization and implementation techniques,
readers may wish to consult a standard text on compiler development. (See, for exam-
ple, the "dragon" book [2].)
A lexical analyzer generator produces lexical analyzers automatically from
specifications of the input language's lexical components. This is easier and more reli-able than coding lexical analyzers manually. One commercial lexical analyzer generator
now available is the UNIX®-based program "lex" [3]. The lexical analyzer generator wedeveloped differs from lex in at least three significant ways:
" The notation for describing lexical patterns is much easier to read and under-
stand* The generator produces directly executable code (lex-generated analyzers are
table driven)
" The generator produces Ada code
® UNIX is a registered trademark of AT&T Bell Laboratories.
2
2. REQUIREMENTS STATEMENT
The Ada Lexical Analyzer Generator is to be a reusable tool for creating lexical
analyzers. The generator will translate specifications of lexical patterns into an Ada
package or procedure that produces a stream of lexical data values from an input charac-
ter stream. The notation for specifying lexical patterns should be easy to read and under-
stand. The generator and generated code should be portable. Generated code should be
efficient enough for use in practical applications. Documentation is to include descrip-tions of the development approach, the lexical pattern notation, and the translation tech-
niques employed. A user's guide is also required.
3
0
S
4
S
3. DEVELOPMENT APPROACH
The Ada Lexical Analyzer Gcnerator was developed in two stages. The firststage was to definc the notation for speciiyring lexical patterns and actions to be taken
when patterns are recognized. The notation we adopted is very similar to that used in the
Ada language reference manual to define Ada's syntax. (See [11, Sec. 1.5.)
The second stage of the project was to build the generator, which translates pat-
tern definitions into Ada pattern-matching code. An existing compiler-writing system,
Zuse [41, was used to facilitate this work. Using Zuse simplified the parsing of lexical
sp( -iAcations and provided a framework for transforming patterns and generating code.
Figure 1 shows the major components used to build the generator. Starting at the
upper left, a translation grammar for the lexical pattern notation was created and pro-cess,.; by the compiler writing system to extract translation action code and a set of run-
time translation tables. The translation action code was then compiled together with askeleton main procedure, support routines, and a bootstrap lexical analyzer to produce
the generator.
Figure 1 also shows the second step, which was to generate a new lexical analyzer
to replace the bootstrap analyzer. This required creating a set of lexical pattern defini-
tions and running the generator with its run-time translation tables. This use of the gen-
erator to produce its own lexical analyzer served as a test of the system. The specifica-
tion for the replacement lexical analyzer is inc6,ded in Appendix A.
Compiler TranslationTranslation Writing TablesGrammarSystem
Action IDCode
Support Lexical Lexical New 0Routines Ada Analyzer Analyzer Lexical
Compiler Generator Generator Analyzer(executable)
LexicalAnalyzer Skeleton Lexical
(bootstrap) Compiler PatternDefinitions
Figure 1. Lexical Analyzer Generator Development Approach
66
4. DESIGN SPECIFICATIONDesigns for translators typically involve two separate designs: one for code that
will be generated and one for the translator itself. For this project we also designed the
input notation. To explain how these pieces fit together, the design specification is
divided into the following five parts:
" The notation used to specify lexical patterns
" Specification of actions to be taken when patterns are recognized
* The input and output interfaces for generated lexical analyzers
* Data structures used to represent patterns within the translator
* Templates used to generate pattern matching code
4.1 Lexical Pattern Notation
Lexical patterns are specified using a simple variant of Backus-Naur Form
(BNF). Definitions in this language have the form
non-terminal ::= regular-expression;
Non-terminal symbols are represented by Ada identifiers. Regular expressions are made
up of terminal and non-terminal symbols using the combining forms described below. Ter-
minal symbols are represented by Ada character and string literals and by reserved iden-
tifiers. For example,
semicolon ::=
apostrophe ::=
assignmentsymbol ::=
Concatenated terms are written consecutively, without an operator symbol, as in
characterliteral ::= apostrophe graphic-character apostrophe ;
Literal string values are equivalent to the concatenation of the corresponding literal char-
acters. For example, the string ":=" is the same as the concatenation of the two charac-
ters ':' and '='.
7
Character ranges can be specified using Ada's double-dot notation. For example,
upper-case-Jetter ::- 'A' .. 'Z' ;
A vertical bar is used to separate alternative terms, as in
letter.or-digit ::= letter I digit
Square brackets enclose optional terms. For example, numbers with optional fractionand exponent parts can be specified by
numeric-literal ::= integer ['.' integer] [exponent];
Braces enclose a repeated pattern, as in the following expression for identifiers. The
enclosed pattern may be matched zero or more times.
identifier ::= letter { ['-'] letteror-digit } ; W
Options and repetitions are exercised whenever possible so that the longest possible pat-tern is matched.
Precedence of operations. Range construction is the highest precedence operation, con-
catenation is next, and alternation is the lowest.
Look-ahead and ambiguity. Two different patterns may start with the same character orsequence of characters. This situation requires lexical analyzers to look ahead into the Sinput to determine which pattern to match. Look-ahead processing can usually be han-
dled completely automatically.
Patterns may also be ambiguous. That is, a given sequence of characters may match twodifferent patterns at the same time. Normal processing attempts to match the longer pat-
tern first and accept it if it matches. If the longer pattern fails to match, the analyzer willfall back and match the shorter pattern.
To match the shorter of two ambiguous patterns, a special look-ahead operator is pro- •
vided. The classic situation where this occurs is the Fortran "DO" statement. The fol-
lowing Fortran statements illustrate the problem:
8
DO 10 I=1,10
and
DO 10 I=1.110
The first is the start of a loop structure, for which the keyword "DO" must be matched.The second is an assignment statement, for which the identifier "DOlOI" must bematched. Without special attention, the analyzer would match identifier "DOIOI" in bothcases. The pattern required to recognize the keyword "DO" is
keyword..DO ::= "DO" # label identifier '=' indexexpr',';
The sharp symbol (#, not in quotes) separates this pattern into two parts. If the entirepattern is matched, the analyzer falls back to the # and returns the first part of the pat-tern as the result. The string to the right is preserved as input to be scanned for the next
symbol, which in this example is the loop label. If the pattern fails to match, the lexical
analyzer falls back to the # and attempts to match the alternative pattern, which in this
example is an identifier.
Regular form. To allow simple, efficient code to be generated for lexical analyzers, theinput pattern definitions must have a simple structure. Specifically, they must form a reg-
ular grammar so that code for an equivalent finite-state machine can be generated. Thepattern construction operations described above allow the definition of arbitrary regular
patterns. The lexical analyzer generator does not support recursive pattern definitions.
Predefined patterns. The patterns ENDOFJNPUT, ENDOFLINE, and
UNRECOGNIZED are automatically defined and handled by the generated code.
Additional examples of pattern definitions can be found in Appendix A.
4.2 Declarations and Actions
In addition to the specification of lexical patterns, the lexical analyzer generator
requires definitions of the actions to be taken when a pattern is recognized. These actionsmay require type, variable, and procedure definitions to be included in the generated
code. A lexical analyzer specification, therefore, has the form:
lexicon token-streamname is
9
[declarative-part
patterns{ pattern-definition }
actions{ action-alternative }
end [token-stream-name] ;
"Lexicon" is a reserved word. The token.stream-name is the name of the token
stream package generated by the lexical analyzer. The declarative part allows thedeclaration of any supporting constants, types, variables, functions, or procedures.
These declarations are copied into the generated package body.
"Patterns" is a reserved word. Pattern definitions have the form described in
Section 4.1.
"Actions" is a reserved word. Action alternatives have the same form as Ada
case statement alternatives, i.e.,
Saction-alternative
when choice {'1' choice} => sequence.oLstatements
Action choices can be any non-terminal symbol defined in a production or "oth-
ers" for the last action alternative. The generator turns the action alternatives into a casestatement with the name of the recognized pattern as the selector.
There are two principle types of action performed by a lexical analyzer: return-ing a token value and skipping over uninteresting input. To return a token to the calling
program, the action statements must assign a value to the output parameter NEXT (see
Section 4.3) and end with a "return" statement. For example,
when Identifier ->
NEXT := MAKELTOKEN( IDENT, CURRENT-SYMBOL, CURLINENUM);
return; ID
To skip over a recognized pattern (for example, white space or comments),
specify "null" as the action, with no return. The parameterless function
CURRENT-SYMBOL returns the recognized string. CUR.LINENUM is an integervariable that holds the current line number. 0
Examples of action statements are given in Appendix A.
10
4.3 Input and Output Streams
The input character stream for the lexical analyzer is represented by a procedurethat produces consecutive characters each time it is called. The specification for this pro-cedure is:
procedure GETCHARACTER( EOS: out BOOLEAN;NEXT: out CHARACTER;MORE: in BOOLEAN := TRUE);
This mechanism allows input text to be produced from a file or from other sourceswithin a program.
The output stream produced by the lexical analyzer generator is a sequence oftokens. The specification for the generated token stream package is:
package TOKENSTREAMNAME is
procedure ADVANCE( EOS: out BOOLEAN;NEXT: out TOKEN;MORE: in BOOLEAN := TRUE);
end TOKENSTREAMNAME;
The package name is taken from the lexicon specification. The procedureADVANCE reads input by invoking the GET-CHARACTER procedure and returns anend-of-stream flag, EOS, which is TRUE when the end of the input is reached. WhenEOS is FALSE, NEXT contains the next token value. TOKEN is a user-defined type.The optional parameter MORE may be set to FALSE to indicate that no more tokens willbe drawn from the stream.
There are three methods for combining generated stream packages with theremainder of an application program:
* Copying the generated text into the program source file* Making the generated package body a separate compilation unit* Creating a generic package
Copying generated text is the least flexible method. If you change any of the lexi-cal patterns, you have to delete the old text and add the new using a text editor. Creatinga generic package requires passing the GET-CHARACTER procedure and TOKENtype, and possibly other information, as instantiation parameters. Making the package
11
body a separate compilation unit is the simplest method. Generics and separate compila-tion are supported by the generator by allowing either a generic formal part or a"separate" declaration to precede a lexical analyzer specification. A complete descrip- 0tion of the form of a specification is:
[context-clause I[generic-formaLpart I separate (parent-name)]lexicon token.streamname is 1
[ declarative-part]patterns
{ production }act ons
{ action.alternative }end [token.stream.name] ;
For generic lexical analyzers, a complete package definition (specification andbody) with the specified generic parameters is generated. The GET-CHARACTER pro-cedure and TOKEN type must be included in the list of generic parameters. For non-gen-eric analyzers, only the package body is generated. If a "separate" clause is supplied inthe lexicon specification, it is reproduced in the generated code. The parent unit mustinclude the package specification and an "is separate" declaration for the package body.
S4.4 Pattern Representation
Pattern definitions are stored as tree structures within the lexical analyzer genera-tor. These tree structures are derived directly from the input specification's parse tree.Pattern trees have nodes corresponding to the following types of patterns: 9
* Alternation - letter I digit
" Concatenation- letter digit" Identifier - identifier" Look-ahead - "DO" # loopparameters 1
* Option - [exponent]" Range - 'A'..71" Repetition - {digit}
Character literals are represented by range nodes, where the range contains asingle character. Empty patterns are represented as empty ranges. Strings arerepresented as concatenations of individual characters (ranges).
Selection sets. Each pattern has a selection set, which is the set of possible initial
12
S
characters. For example, if an identifier starts with an upper- or lower-case letter, then
the upper- and lower-case letters will form the pattern's selection set. This information is
used to control pattern matching decisions in the generated code. The selection set for a
range pattern is the range itself. The selection set for an alternation pattern is the union
of the two alternative selection sets. The selection set for a concatenation pattern is that
of the left-hand pattern, unless this pattern is an option or repetition, in which case the
selection set is the union of the left and right parts.
Ambiguity. Patterns with overlapping selection sets are said to be ambiguous. This isbecause pattern matching code cannot determine which of the possible alternative pat-
terns to pursue. An example of this is:
dots ::= '.' ..";
The selection sets for the two alternatives are the same. The ambiguity is that two dots in
the input stream could be interpreted either as two consecutive single dots or as one dou-
ble-dot symbol.
The following transformation eliminates overlapping selection sets and converts ambigu-
ous patterns into unambiguous form. Assume the original pattern is defined by:
original ::= left Iright ;
left :-:= left-overlap I left-unique;
right ::= right-overlap I right-unique;
where left-overlap and right-overlap have the same selection sets, and the selection sets
for left-unique and right-unique are disjoint. (The unique patterns may be empty.) The
replacement pattern is:
replacement ::= left-unique I new-overlap I right-unique;
where the new overlap pattern is constructed from the overlapping range and the tails of
the left and right overlapping patterns:
new-overlap ::= overlap-range overlap-tails;overlap-tails ::- left-overlap-tail I right-overlap-tail;
The tail of a range is the empty pattern, the tail of an alternation is the alterna-
tion of the tails of the two components, and the tail of a concatenation is the tail of the left
term concatenated with the right.
13
Resulting patterns can be simplified when any of the component patterns are
empty. For example, the result of transforming the dots pattern above is:
new-dots ::- '.' ['.']
Canonical form. Several additional tree transformations are performed to further sim-
plify pattern tree structures. These are designed to simplify and improve the code that is •
generated. When a tree satisfies the following conditions it is said to be in canonical
form:
* All ambiguities are eliminated
* Alternations of ranges are merged into single range nodes S
* Alternations are right associative
* Concatenations are right associative
* Alternations with options or repetitions are treated as option patterns
* Redundant options and repetitions are eliminated
4.5 Code Generation
-Package structure. Generated package bodies contain the following major sec-
tions: a
" Local constant, type, and variable declarations
* Utility function and procedure definitions
" Procedure ADVANCE
* Procedure SCANPATERN S
Declarations that appear in lexicon specifications are copied into the first section
with the lexical analyzer's local declarations. In addition, the enumerated type PAT-
TERNJD is created from the list of pattern names defined in the lexicon.
The utility functions and procedures are independent of the lexicon specification
and are simply written to the output file.
Action statements are copied from the lexicon specification into the following
template for the ADVANCE procedure:
procedure ADVANCE ( EOS: out BOOLEAN;
NEXT: out TOKEN;
MORE: in BOOLEAN :- TRUE) is
14
begin
EOS :=FALSE;
loop
SCANPAT'ERN;case CURPATERN is
when ENDOFINPUT =>EOS := TRUE;return;
when ENDOFLINE =>
null;
<lexicon action statements>
end case;
end loop;end ADVANCE;
The procedure SCAN-PATTERN contains all of the pattern matching code.
The body of this procedure is generated from the canonical-form pattern tree by travers-ing the tree and filling in standard code templates based on information stored in the treenodes. The code templates for each pattern type are described below.
Alternation patterns. The template for alternation patterns is:
case CURRENT-CHAR is
<code for each alternative>when others =>
<code for "others" alternative>
end case;
The template for each alternative is:
when <alternative selection set> =>
<code to match this alternative>
Code to match each alternative is generated by passing the current node's leftsubtree to the general pattern code generator. Successive alternatives are found bytraversing the node's right subtree. If an alternative pattern is part of an option or repeti-tion pattern, the code for the "when others" clause is "null;". Otherwise, if the current
character is not included in any of the selection sets, the pattern fails to match.
15
Concatenation patterns. The template for concatenation patterns when the left
subtree could fail is:
<code to match left subtree>
if <left subtree didn't fail> then<code for right subtree>
end if;
If the left subtree cannot fail, the "if" statement is not necessary and this template
can be simplified to:
<code to match left subtree><code for right subtree> S
Code to match the left subtree is generated by a recursive call to the general pat-
tern code generator. Code for the right subtree is generated the same way if it is an alter-
nation, look-ahead, option, or repetition pattern. When the right subtree is a concatena-
tion or range pattern, the following template is used: •
case CURRENT-CHAR is
when <right subtree selection set> =>
<code to match right subtree>when others =>
<this pattern fails to match>
end case;
Option patterns. The template for matching option patterns is:
case CURRENT-CHAR is
when <option selection set> =>
<code to match the option>
when others => null;
end case;
Code to match the option expression is generated by a recursive call to the gen-
eral pattern code generator.
Repetition patterns. The template for matching repetition patterns is:
16
loopcase CURRENT-CHAR is
<code for the repeated pattern>when others => exit;
end case;<exit when look-ahead failed>
end loop;
If the repeated pattern is an alternation, "when" clauses for each of the alternatives aregenerated. Otherwise, a single "when" clause is emitted and code to match the repeatedexpression is generated by a recursive call to the general pattern code generator. The
template for this is:
when <repetition selection set> =>
<code to match repeated pattern>
Range patterns. The code generated for range patterns is simply:
CHARADVANCE;
17
S
S
S
S
S
0
0
18
S
5. TEST PLAN
The primary test for the lexical analyzer generator was to have it reproduce itsown lexical analyzer. This test exercised most of the translator's capabilities. Additionaltests were created during development to exercise each type of pattern, check internaltree structures for canonical form, and verify generated code. These test input data andthe corresponding results are included in Appendix C.
A test of the generator's portability was conducted by moving the program fromits development environment, a DEC@ VAX® ,ystem with the DEC Ada compiler, to a
Sun Workstation® with the Verdix® Ada compiler. The following discrepancies wereencountered:
1. Package INTEGERTEXTJO, which allows the generator to read and writeinteger values is predefined in DEC's Ada run-time library. This feature is not stan-dard but it is fully documented. The correction required adding the following two-line package definition and linking it with th program.
with TEXTJO;
INTEGERTEXTIO is new INTEGERJO( INTEGER);
2. File STANDARD-ERROR, which the generator uses to report translation errors,is predefined in the Verdix TEXTIO package. This feature is not standard nor is itdescribed in Verdix's documentation. The correction required deleting the filedeclaration from package LL__DECLARATIONS and removing the CREATE andCLOSE operations in the main procedure, LLCOMPILE. The entire programthen had to be recompiled.
The only other differences between the DEC and Sun environments were in theprogram-to-file-system interface. A command file was used on the DEC system to con-nect all input and output files to the program. On the Sun, a link named "TABLE" was
® DEC and VAX are registered trademarks of the Digital Equipment Corporation.® Sun Workstation is a registered trademark of Sun Microsystems, Inc.® Verdix is a registered trademark of Verdix Corporation.
19
created to connect the translation-table file and the UNIX command interpreter was used
to connect the standard input and output streams to the source and object code files.
2
200
6. REFERENCES
[1] Ada Programming Language, ANSI/MIL-STD-1815A, January 1983.
[21 Aho, A., R. Sethi, and J. Ullman, Compilers: Principles, Techniques, and Tools,
Addison-Wesley, 1985.
[3] Lesk, M., Lex-A Lexical Analyzer Generator, Computing Science TechnicalReport 39, AT&T Bell Laboratories, Murray Hill, NJ, 1975.
[4] Pyster, A., Zuse User's Manual, Department of Computer Science TechnicalReport, TRCS81-04, University of California, Santa Barbara, May 1981.
21
S
S
S
S
S
S
S
0
S22
S
APPENDIX A
Example Lexical Analyzer Specification and Generated Code
This appendix contains two listings. The first is the specification for the lexicalanalyzer used in the lexical analyzer generator. The second is a pretty-printed version ofthe code generated by the lexical analyzer generator from the specification. The genera-tor was developed using a hand-written, bootstrap lexical analyzer until it was able to
produce this code automatically from the specification.
Contents:
Example lexical analyzer specification ....................................... 24Code produced by the lexical analyzer generator ............................ 26
23
separate ( LLCOMPILE
lexicon LLTOKENS is
-- This lexicon produces the token stream generator for the
-- lexical analyzer generator, ADALEX.
patterns 9
Graphic Character ' ' .
Letter 'A' .. 'Z' j 'a' . 'z' ;
Digit : '0' .. '9'
Letteror Digit Letter I Digit
Character Literal ''' Graphic-Character '''
Comment "... [ Graphic-Character ;
Delimiter '&' I ''' i '*' I II ',' I '-0 I '/' I " = :I : " I '<' I "< " I "< - I "
I I= , >' I "> " I ">>-
Identifier Letter I [''] Letter or Digit )
Number ::= Digit ['_'] Digit ;
SpecialSymbol ::= '#' I '' I ' I '.' I"
I 'i'= I 'I' I '= " ['' ]
StringLiteral QuotedString [ QuotedString I ; •
QuotedString '"' [ NonQuoteChar I '"'
24
NonQuoteChar = ' ' '' I
White-Space = Separator [ Separator I
Separator ' ' ASCII.HT
actions
when CharacterLiteral =>
NEXT := MAKETOKEN( CHAR, CURRENTSYMBOL, CURLINENUM );
return;
when Comment I White_Space => null;
when Delimiter I Number I SpecialSymbol =>
NEXT := MAKETOKEN( LIT, CURRENTSYMBOL, CURLINENUM );
return;
when Identifier =>
NEXT := MAKETOKEN( IDENT, CURRENT_SYMBOL, CURLINENUM );
return;
when StringLiteral =>
NEXT := MAKETOKEN( STR, CURRENTSYMBOL, CURLINENUM );
return;
when others =>
NEXT := MAKETOKEN( LIT, CURRENTSYMBOL, CURLINENUM );
return;
end LLTOKENS;
25
separate ( LLCOMPILE
package body LLTOKENS is
BUFFERSIZE: constant := 100;
subtype BUFFERINDEX is INTEGER range 1..BUFFERSIZE;
type PATTERNID is (CharacterLiteral,Comment,Delimiter,Digit,
GraphicCharacter,Identifier,Letter,Letter or Digit,
Non_QuoteChar,Number,QuotedString,Separator,
SpecialSymbol,StringLiteral,WhiteSpace,
ENDOFINPUT, ENDOFLINE, UNRECOGNIZED);
CURLINENUM: NATURAL := 0;
CURPATTERN: PATTERNID := END OFLINE;
STARTOFLINE: BOOLEAN;
CHARBUFFER: STRING(BUFFER INDEX);
CURCHARNDX: BUFFERINDEX;
TOPCHARNDX: BUFFERINDEX;
procedure SCANPATTERN; -- forward
function CURRENTSYMBOL return STRING is
begin
return CHARBUFFER(I..(CURCHARNDX-I));
end; S
procedure ADVANCE(EOS: out BOOLEAN;
NEXT: out LLTOKEN;
MORE: in BOOLEAN := TRUE) is
begin
EOS := FALSE;
loop
SCANPATTERN;
case CURPATTERN is •
when ENDOFINPUT =>
EOS := TRUE;
return;
26
when ENDOFLINE => null;
when CharacterLiteral =>
NEXT := MAKETOKEN( CHAR, CURRENTSYMBOL, CURLINENUM);
return;
when Comment I WhiteSpace => null;
when Delimiter I Number I SpecialSymbol =>
NEXT := MAKETOKEN( LIT, CURRENTSYMBOL, CURLINENUM);
return;
when Identifier =>
NEXT := MAKETOKEN( IDENT, CURRENTSYMBOL, CURLINENUM);
return;
when StringLiteral =>
NEXT := MAKETOKEN( STR, CURRENTSYMBOL, CURLINENUM);
return;
when others =>
NEXT := MAKETOKEN( LIT, CURRENTSYMBOL, CURLINENUM);
return;
end case;
end loop;
end ADVANCE;
procedure SCANPATTERN is
CURRENTCHAR: CHARACTER;
ENDOFINPUTSTREAM: BOOLEAN;
LOOKAHEADFAILED: BOOLEAN : FALSE;
FALLBACKNDX: BUFFERINDEX : 1;
LOOKAHEADNDX: BUFFERINDEX;
procedure CHARADVANCE is
begin
CURCHARNDX CURCHARNDX+1;
FALLBACKNDX : CURCHARNDX;
if CURCHARNDX < TOPCHARNDX then
CURRENTCHAR := CHARBUFFER(CURCHARNDX);
else
GETCHARACTER(END OFINPUTSTREAM,CURRENTCHAR);
27
if ENDOFINPUTSTREAM then
CURRENTCHAR := ASCII.etx;
end if;
CHARBUFFER(CURCHARNDX) := CURRENTCHAR;
TOPCHARNDX := CURCHARNDX;
end if;
end;
procedure LOOKAHEAD is
begin
CURCHARNDX := CURCHARNDX+I;
if CURCHARNDX <= TOPCHARNDX then
CURRENTCHAR := CHAR_BUFFER(CURCHARNDX);
else
GETCHARACTER(END OF INPUTSTREAM,CURRENT_CHAR);
if ENDOFINPUTSTREAM then
CURRENTCHAR := ASCII.etx;
end if;
CHARBUFFER(CURCHARNDX) := CURRENT_CHAR;
TOPCHARNDX := CURCHARNDX;
end if;
end;
begin
STARTOFLINE := CUR PATTERN = ENDOFLINE;
if STARTOFLINE then S
CURLINENUM CURLINENUM+1;
TOPCHARNDX 1;
GETCHARACTER(END OFINPUTSTREAM,CHARBUFFER(l));
if ENDOFINPUTSTREAM then
CHARBUFFER(l) :- ASCII.etx;
end if;
else
TOPCHARNDX := TOPCHARNDX-CURCHARNDX+1;
for N in I..TOPCHARNDX loop 0
CHARBUFFER(N) :- CHARBUFFER(N+CURCHARNDX-I);
end loop;
end if;
28
CURCHARNDX 1;
CURRENTCHAR CHARBUFFER(I);
case CURRENTCHAR is
when ASCII.etx =>
CURPATTERN := ENDOFINPUT;
when ASCII.If..ASCII.cr =>
CUR PATTERN := END OFLINE;
when '"=>
CHARADVANCE;
case CURRENTCHAR is
when ' '
loop
case CURRENTCHAR iswhen ' , . , , ,. , ,=>
CHARADVANCE;
when others => exit;
end case;
end loop;
case CURRENTCHAR is
when '"' =>
CHARADVANCE;
CURPATTERN := StringLiteral;
loop
case CURRENTCHAR is
when '"' =>
LOOKAHEAD;
case CURRENTCHAR is
when ''=>
loop
case CURRENTCHAR is
when ' ,..,!, I '#'..'~' =>
LOOKAHEAD;
when others => exit;
end case;
end loop;
case CURRENTCHAR is
when '"' =>
CHARADVANCE;
29
when others =>
CURCHARNDX := FALLBACKNDX;
LOOKAHEADFAILED := TRUE;
end case;
when others =>
CURCHARNDX := FALLBACKNDX;
LOOKAHEADFAILED := TRUE;
end case;
when others => exit;
end case;
exit when LOOKAHEADFAILED;
end loop;
when others =>
CURPATTERN := UNRECOGNIZED;
end case;
when others =>
CURPATTERN := UNRECOGNIZED;
end case;
when 'A'..'Z' I 'a'..'z' =>
CHARADVANCE;
CURPATTERN := Identifier;
loop
case CURRENTCHAR is
when ' '=>
LOOKAHEAD;
case CURRENTCHAR is Swhen 'A'..'Z' I 'a'..'z' =>
CHARADVANCE;
when '0'..'9' =>
CHARADVANCE;
when others =>
CURCHARNDX := FALLBACKNDX;
LOOKAHEADFAILED := TRUE;
end case;
when 'A'..'Z' I 'a'..'z' =>
CHARADVANCE;
when '0'..'9' =>
CHARADVANCE;
3
when others => exit;
end case;
exit when LOOKAHEADFAILED;
end loop;
when '0'..'9' =>
CHARADVANCE;
CURPATTERN := Number;
loop
case CURRENTCHAR is
when '0'..'9' =>
CHARADVANCE;
when ' '=>
LOOKAHEAD;
case CURRENTCHAR is
when '0'..'9' =>
CHARADVANCE;
when others =>
CURCHARNDX := FALLBACKNDX;
LOOKAHEADFAILED TRUE;
end case;
when others => exit;
end case;
exit when LOOKAHEADFAILED;
end loop;
when ASCII.HT ' '
CHARADVANCE;
CURPATTERN := WhiteSpace;
loop
case CURRENTCHAR is
when ASCII.HT I ' ' =>
CHARADVANCE;
when others => exit;
end case;
end loop;
when '-'>
CHARADVANCE;
CURPATTERN := Delimiter;
case CURRENT CHAR is
31
when '-' =>
CHAR_AVANCE;
CURPATTERN := Comment;
loop
case CURRENTCHAR is
when =>
CHARADVANCE;
when others => exit;
end case;
end loop;
when others => null;
end case;
when ''' =>
CHARADVANCE;
CURPATTERN := Delimiter;
case CURRENTCHAR is
when I =>
LOOKAHEAD;
case CURRENTCHAR is
when ''' =>
CHARADVANCE;
CURPATTERN CharacterLiteral;
when others =>
CURCHARNDX FALLBACKNDX;
LOOKAHEADFAILED TRUE;
end case; Swhen others => null;
end case;
when '&' =>
CHARADVANCE;
CURPATTERN := Delimiter; 0when '*' =>
CHARADVANCE;
CURPATTERN := Delimiter;
case CURRENTCHAR is 5when '*' =>
CHARADVANCE;
when others => null;
32
end case;
when '+' =>
CHARADVANCE;
CURPATTERN := Delimiter;
when '/' =>
CHARADVANCE;
CURPATTERN := Delimiter;
case CURRENTCHAR is
when '=' =>
CHPqADVANCE;
when others => null;
end case;
when ':' =>
CHARADVANCE;
CURPATTERN := Delimiter;
case CURRENTCHAR is
when '=' =>
CHARADVANCE;
when ':' =>
LOOKAHEAD;
case CURRENTCHAR is
when '=' =>
CHARADVANCE;
CURPATTERN SpecialSymbol;
when others =>
CURCHARNDX FALLBACKNDX;
LOOKAHEADFAILED TRUE;
end case;
when others => null;
end case;
when '<' =>
CHARADVANCE;
CURPATTERN := Delimiter;
case CURRENTCHAR is
when '<'..'>' =>
CHARADVANCE;
when others =) null;
end case;
33
when '=' =>
CHARADVANCE;
CURPATTERN := Delimiter;
case CURRENTCHAR is
when '>' =>
CHARADVANCE;
CURPATTERN := SpecialSymbol;
when others => null;
end case;
when '>' =>
CHARADVANCE;
CURPATTERN := Delimiter;
case CURRENTCHAR is
when '='..'>' =>
CHARADVANCE;
when others => null;
end case; Swhen '#' 1I ';' I '['I]' I = >
CHARADVANCE;
CURPATTERN := Special_Symbol;
when '.' =>
CHARADVANCE;
CURPATTERN := Special_Symbol;
case CURRENTCHAR is
when '.' =>
CHARADVANCE; Swhen others => null;
end case;
when others =>
CHARADVANCE;
CURPATTERN := UNRECOGNIZED;
end case;
end;
end LLTOKENS;
34
APPENDIX B
Lexical Analyzer Generator Source Listings
This appendix contains nine listings that make up the lexical analyzer generator.
The first is a command file for creating the generator using the Digital Equipment Cor-
poration VAX Ada environment. The second is the translation grammar for the notation
used to specify lexical analyzers. The next three are the declarations, action code, and
run-time translation tables produced by the translator writing system. The sixth is the
skeleton main procedure for the generator. The seventh is the lexical analyzer used to
bootstrap the generator. And the last two are the specification and body of the transla-
tion support package.
Contents:
Compilation command file ..................................................... 36Translation grammar for lexical pattern specifications ..................... 37
Declarations package generated from the translation grammar .......... 45
Translation actions extracted from the translation grammar .............. 47
Run-time translation tables generated from the translation grammar .... 53
Generator skeleton main procedure .......................................... 63Bootstrap lexical analyzer ..................................................... 82
Translation support package specification ..................................... 88
Translation support package body ............................................ 91
35
$ make adalex command file
$ usage: @makeadalex
$ ! first process the translation grammar
$ taadaaen adalex
$ ! then compile the Ada modules in this order:
$ ada 11_decls,ll_compile,ll_tokens,llsupspec,ll sup body,llactions$ ! link all the pieces together
$ acs link 11 compile 0$ ! and rename the result to "adalex"
$ ren ll_compile.exe adalex.exe
$ exit
36
(* Ada Lexical Analayzer Generator *)
(* This grammar defines the input notation used to specify the lexical
elements of a language in the Ada Lexical Analyzer Generator [1].
This notation is essentially the same as the BNF notation used in
the Ada language reference manual (ANSI/MIL-STD-1815A). The trans-
lation of a collection of lexical pattern definitions and associated
actions yields an Ada package body that contains a "next-token"
procedure for use in a compiler or other language processing tool. *)
(* This grammar is an LL(1) grammar that is processed by an Ada version
of the Zuse parser generator [2]. Zuse produces a complete table-
driven translator that recognizes the specified productions and
applies the indicated translation actions. *)
(* Author: Reg Meeson, grammar 7/31/87, actions 12/15/87 *)
(* References: *)
(* 1. Meeson, R., The Ada Lexical Analyzer Generator, Institute
for Defense Analyses, Paper P-3008, May 1988. *)
(* 2. Pyster, A., Zuse User's Manual, University of California,
Santa Barbara, Department of Computer Science, Technical
Report TRCS81-04, May 1981. *)
%a (* Axiom: *)
Adalex
%n (* Non-terminal symbols: *)
ActionCode ActionDefs AltChoices Alternation Catenation
Character CharOrRange Declarations DeclStmt DeclToken
37
0
IdentOrOther LookAhead NonEndCode NonEndOrWhen NonEndStmt
NonWhenStmt NonWhenToks OptIdent PatternDefs RangePart
RegularExpr RegularFact RegularTerm SubunitDecl
%g (* Terminal symbol groups: *)
CharLit Identifier Other StringLit
%l (* Literal terminal symbols: *)
#( ) .
=> ASCII [
actions begin case end
if is lexicon loop others
patterns separate when [
I
%t (* Declarations for the generated translator: *)
ANONYMOUS: constant LLSTRINGS := "ANONYMOUS
type NODETYPE is ( ALT, BAD, CAT, CHAR, EMPTY, IDENT,
LIT, LOOK, OPT, REP, RNG, STR );
type SELECTION SET is array ( ASCII.HT . ' ) of BOOLEAN;
type TREENODE ( VARIANT: NODETYPE );
type LLATTRIBUTE is access TREENODE;
type TREENODE ( VARIANT: NODE-TYPE ) is
record
NAME: LLSTRINGS;
SELSET: SELECTIONSET;
NULLABLE: BOOLEAN; 0COULDFAIL: BOOLEAN;
case VARIANT is
when ALT I CAT I LOOK -> LEFT, RIGHT: LLATTRIBUTE;
38
when BAD I EMPTY I RNG => null;
when CHAR => CHARVAL: CHARACTER;
when IDENT I LIT I STR => STRINGVAL: LLSTRINGS;
when OPT I REP => EXPR: LLATTRIBUTE;
end case;
end record;
BADPATTERN: constant LLATTRIBUTE
new TREENODE' (BAD,ANONYMOUS, (others => FALSE) ,FALSE,TRUE);
% (*Productions: *
Adalex [ s lexicon => 2, patterns => 6, actions => 8, =>*
SubunitDeci
lexicon Identifier is
(a PUT("package body"); EMIT TOKEN($3);
PUT_-LINE(" is");
Declarations
patterns
PatternDef s
[a COMPLETE-PATTERNS;
EMITPKG-DECLS;)
actions
(a EMITADVANCEHDR;)
ActionDef s
end Optldent';
(a EMITADVANCETLR;
EMITSCAN PROC;
PUT("end"); EMIT TOKEN($3); PUTLINE(";");
SubunitDecl =separate ( Identifier(a PUT("separate (";EMIT TOKEN($3);
PUT_LINE(" )"); NEWLINE;];
=% any
Declarations=
= (s ';' => 2, patterns => *, actions => @, =>*
DeclStmt';
39
[a EMITTOKEN($2);)
Declarations % any
DeclStmt =DeciToken
[a EMITTOKEN($1);
DeciStmt;
DeciToken = case
[a $0 :=$1;
= end
[a $0 $1; I= when
[a $0 $1; ;
= NonEndOrWhen
[a $0 $1;
NonEndOrWhen = Character
[a $0 $1;
=Identifier
[a $0 :=$1;)
=Other
[a $0 $1;
=StringLit
[a $0 $1;
=is0
[a $0 $1;I
=others
[a $0 $1;
/C0
[a $0 $1;
[a $0 $1;
(a $0 $1; ;0
(a $0 :=$1;
400
[a $0 $1;
(a $0 $1;
Character =CharLit
[a $0 $1;)
=ASCII '.1 Identifier
[a $0 :=CVT-ASCII($3);]
PatternDefs =
- (s Identifier => 1, => 2, actions => *,@ =>
Identifier:=
lp LLSKIPNODE;
RegularExpr LookAhead';
[a STOREPATTERN($1,COIWATENATE($3,$4));
PatternDefs % any
RegularExpr = RegularTerm Alternation
[a $0 :=ALTERNATE($1,$2);
RegularTerm = RegularFact
[s ';' => *, actions => @ => *
$0 :=BADPATTERN;
Catenation
($0 CONCATENATE($l,$2); Iany
RegularFact = CharOrRange
(a $0 $1;
= Identifier
[a $0 $1; ;
= StringLit
(a $0 CVT_STRING($1);];
- VRegularExpr ']'
lp LLSKIPNOlE;
(a $0 :=OPTION($2); )
-'IRegularExpr ''
(p LLSKIPNODE;I
(a $0 :=REPEAT($2); )
41
CharOrRange =Character RangePart
[a $0 :=CHARRANGE($.,$2);
RangePart '.'Character
(a $0 $2;
-[ a$0 :null;)
Catenation =RegularFact Catenation0
(a $0 CONCATENATE($1,$2);1
- (a $0 null;
Alternation = ',RegularExpr[a $0 $2; ;
- [a $0 null;
LookAhead = #' RegularExpr
[a $0 LOOKAHEAD($2);
(a $0 null;
ActionDefs =
[s when => 1, @ => Iwhen
[a EMIT_-TOKEN($l);
IdentOrOther '=>'
(p LLSKIPNODE;
[a EMIT_-TOKEN($3);J
ActionCode ActionDefs %any;
IdentOrOther -Identifier
(a EM4ITTOKEN($l); INCLUDEPATTERN($l);
AltChoices
= others
(a EMITTOKEN($1);
AltChoices = 'I' Identifier0
(a EMITTOKEN($1); EMITTOKEN($2);
INCLUDEPATTERN($2);
AltChoices
42
ActionCode
[s 1s;1=> 2,when => ,@ =>*
NonWhenStmt 1;/
[a EMITTOKEN($2); )
ActionCode % any
NonWhenStmt = begin
(a EMITTOKEN($1); )
NonEndCode end
(a EMITTOKEN($3);
= case
[a EMITTOKEN($1);
NonEndCode end case
(a EMITTOKEN($3); EMITTOKEN($4);
= if
(a EMITTQKEN($l);
NonEndCode end if
(a EMITTOKEN($3); EMIT_TOKEN($4);
= loop
[a EMITTOKEN($l);
NonEndCode end loop
(a EMITTOKEN($3); EMITTOKEN($4);1
= NonWhenToks;
NonEndCode = NonEndStmt';
(a EMITTOKEN($2);
NonEndCode
NonEndStmt = NonWhenStmt;
= when
(a EMITTOKEN($1);
NonWhenToks
NonWhenToks = NonEndOrWhen
(a EMITTOKEN($1);
43
NonWhenToks
Optldent = Identifier
[a $0 $1;
-[a $0 null;
44
with TEXT_IO;
package LLDECLARATIONS is
LLSTRINGLENGTH: constant := 20;
subtype LLSTRINGS is STRING(l..LLSTRINGLENGTH);
ANONYMOUS: constant LLSTRINGS := "ANONYMOUS
type NODETYPE is ( ALT, BAD, CAT, CHAR, EMPTY, IDENT,
LIT, LOOK, OPT, REP, RNG, STR );
type SELECTIONSET is array ( ASCII.HT ' ) of BOOLEAN;
type TREENODE ( VARIANT: NODETYPE );
type LLATTRIBUTE is access TREENODE;
type TREENODE ( VARIANT: NODETYPE ) is
record
NAME: LLSTRINGS;
SELSET: SELECTIONSET;
NULLABLE: BOOLEAN;
COULDFAIL: BOOLEAN;
case VARIANT is
when ALT I CAT I LOOK => LEFT, RIGHT: LLATTRIBUTE;
when BAD I EMPTY I RNG => null;
when CHAR => CHARVAL: CHARACTER;
when IDENT I LIT I STR -> STRING VAL: LLSTRINGS;
when OPT I REP => EXPR: LLATTRIBUTE;
end case;
end record;
BADPATTERN: constant LLATTRIBUTE
new TREENODE'(BAD,ANONYMOUS,(others => FALSE),FALSE,TRUE);
LLTABLESIZE: constant := 32;
45
LLRHSSIZE: constant := 174;
LLPRODSIZE: constant :=64;
LLSYNCHSIZE: constant :26;
STANDARDERROR: TEXTIO.FILETYPE;
end LLDECLARATIONS;
46
with LLSUPPORT;
sepa~cate ( LLCOMPILE
procedure LLTAKEACTION( CASEINDEX: in INTEGER ) is
use LLSUPPORT;
begin
case CAFI-NDEX is
when 0 => null;
when 1 =>
PUT( "package body");
EMITTOKEN(llstack(llsentptr-2) .attribute);
PUT-LINE(" is");
when 2 =>
COMPLETEPATTERNS;
EMITPKGDECLS;
when 3 =>
EMIT ADVANCE HDR;
when 4 =>
EMITADVANCETLR;
EMITSCANPROC;
PUT( "end");
EMITTOKEN(llstack(llsentptr-13) .attribute);
PUT-LINE(";");
when 5 =>
PUT("separate ()
EMITTOKEN(llstack(llsentptr-2) .attribute);
PUT-LINE(" ))
NEWLINE;
when 6 =>
EMITTOKEN(llstack(llsentptr-l) .attribute);
when 7 =>
EMITTOKEN(llstack(llsentptr-l) .attribute);
when 8 =>
llstack(llstack Ilsentptr) .parent) .attribute
llstac~k(llsentptr-l).attribute;
when 9 =>
llstack(llstack(llsentptr) .parent) .attribute
llstack( llsentptr-l).attribute;
when 10 =>
47
llstack(llstack(llsentptr) .parent) .attribute
listack ( lsentptr-1).attribute;
when 11 =>
llstack(llstack(llsentptr) .parent) .attribute
llstack(llsentptr-1).attribute;
when 12 =>
llstack(llstack(llsentptr) .parent) .attribute
llstack(llsentptr-1).attribute; 4
when 13 =>
llstack(llstack(llsentptr) .parent) .attribute
llstack(llsentptr-1).attribute;
when 14 =>
llstack(llstack(llsentptr) .parent) .attribute
listack (llsentptr-1).attribute;
when 15 =>
llstack(llstack(llsentptr) .parent) .attribute
listack (llsentptr-1).attribute;
when 16 =>
llstack(llstack(llsentptr) .parent) .attribute
listack (llsentptzc-1).attribute;
when 17 =>
llstack( llstack( llsentptr).parent).attribute
llstack(llsentptr-1).attribute;
when 18 =>
llstack( llstack( llsentptr).parent).attribute
listack (llsentptr-1).attribute; 4
when 19 =>
llstack(llstack(llsentptr) .parent) .attribute
listack (llsentptr-1).attribute;
when 20 =>
llstack( llstack( llsentptr) parent).attribute
llstack(ilsentptr-1).attribute;
when 21 =>
llstack(llstack(11 sentptr) .parent) .attribute
listack (llsentptr-1).attribute;3
when 22 =>
llstack(llstack(llsentptr) .parent) .attribute
listack (llsentptr-1).attribute;
48
when 23 =>
llstack(llstack(llsentptr) .parent) .attribute
40 ilstack(llsentptr-1).attribute;
when 24 =>
llstack(llstack(llsentptr) .parent) .attribute
llstack(llsentptr-1).attribute;
when 25 =>
0 llstack(llstack(llsentptr) .parent) .attribute
CVTASCII(llstack(llsentptr-1) .attribute);
when 26 =>
LLSKIPNODE;
0 when 27 =>
STOREPATTERN(ilstack(llsentptr-6) .attribute,
CONCATENATE(llstack(llsentptr-3).attribute,
llstack(llsentptr-2) .attribute));
when 28 =>
llstack(llstack(lsentptr) .parent) .attribute
ALTERNATE( llstack( llsentptr-2).attribute,
listack (llsentptr-1).attribute);
when 29 =>
* llstack( llstack(llsentptr).parent).attribute =BADPATTERN;
when 30 =>
llstack( llstack(llsentptr).parent).attribute
CONCATENATE ( lstack (llsentptr-2).attribute,listack(llsentptr-1) .attribute);
* when 31 =>
llstack(llstack(llsentptr) .parent) .attribute
llstack(llsentptr-1).attribute;
when 32 =>
* llstack(llstack(llsentptr) .parent) .attribute
llstack(llsentptr-1).attribute;
when 33 =>
llstack(llstack(llsentptr) .parent).attribute
CVTSTRING(llstack(llsentptr-1) .attribute);
* when 34 =>
LLSKIPNODE;
when 35 =>
llstack(llstack(llsentptr) .parent) .attribute
49
OPTION(llstack(llsentptr-3).attribute);
when 36 =>
LLSKIPNODE;
when 37 =>
llstack(llstack( llsentptr).parent).attribute
REPEAT ( lstack (llsentptr-3).attribute);
when 38 =>
listack(llstack(llsentptr).parent). attribute 4&
CHARRANGE( llstack( llsentptr-2).attribute,
llstack(llsentptr-1).attribute);
when 39 =>
listack (llstack( llsentptr).parent). attribute =
listack (llsentptr-1).attribute;
when 40 =>
llstack(llstack( llsentptr).parent).attribute =null;
when 41 =>
llstack(llstack( lisentptr).parent). attribute 0
CONCATENATE(llstacklllsentptr-2).attribute,
llstack(llsentptr-1) .attribute);
when 42 =>
llstack(llstack( llsentptr) .parent).attribute =null;
when 43 =>
llstack(llstack(llsentptr).parent).attribute=
llstack(llsentptr-l).attribute;
when 44 =>
llstack(llstack(llsentptr) .parent) .attribute null; <
when 45 =>
llstack(llstack( llsentptr).parent).attribute=
LOOKAHEAD (llstack (llsentptr-1).attribute);
when 46 =>
llstack( llstack( llsentptr).parent).attribute =null;
when 47 =>
EMITTOKEN(llstack(llsentptr-1) .attribute);
when 48 =>
LLSKIPNODE;
when 49 =>
EMITTOKEN(llstack(llsentptr-2) .attribute);
when 50 =>
50
EMITTOKEN( llstack(llsentptr-1).attribute);
INCLUDEPATTERN(llstack( llsentptr-1).attribute);
when 51 =>
EMITTOKEN(llstack(llsentptr-1) .attribute);
when 52 =>
EMITTOKEN( llstack( llsentptr-2).attribute);
EMITTOKEN(llstack(llsentptr-1) .attribute);
INCLUDEPATTERN(llstack( llsentptr-1).attribute);
when 53 =>
EMITTOKEN( llstack( lisentptr-1).attribute);
when 54 =>
EMITTOKEN(llstack(llsentptr-1) .attribute);
when 55 =>
EMITTOKEN( llstack( llsentptr-1).attribute);
when 56 =>
EMITTOKEN(llstack(llsentptr-1) .attribute);
when 57 =>
EMITTOKEN(llstack(llsentptr-2) .attribute);
EMITTOKEN(llstack(llsentptr-1) .attribute);
when 58 =>
EMITTOKEN( llstack( llsentptr-1).attribute);
when 59 =>
EMITTOKEN(llstack(llsentptr-2) .attribute);
EMITTOKEN(llstack(lisentptr-1) .attribute);
when 60 =>
EMITTOKEN(llstack(llsentptr-1) .attribute);
when 61 =>
EMITTOKEN( llstack( llsentptr-2).attribute);
EMITTOKEN ( lstack (llsentptr-1).attribute);when 62 =>
EMITTOKEN( llstack( llsentptr-1).attribute);
when 63 =>
EMITTOKEN( llstack( llsentptr-1).attribute);
when 64 =>
EMITTOKEN(llstack(llsentptr-1) .attribute);
when 65 =>
listack(llstack(llsentptr).parent).attribute
listack (llsentptr-1).attribute;
51
when 66 =>
llstack(llstack(llsentptr) .parent) .attribute :=null;
when others =>
PUTLINE( STANDARD_ERROR, "**Zuse -- Error in action code **
end case;
end LLTAKEACTION;
520
# 1
( 1
) 1
::= 1
=> 1
@ gASCII 1
CharLit g
Identifier g
Other g
StringLit g
] 1
actions 1
any g
begin 1
case 1
end 1
if 1
is 1
lexicon 1
loop 1
others 1
patterns 1
separate 1
when 1f 1
] 1
1
1 16 2
n 2 0 1
1 24 0 1
g 12 0 1
1 23 0 1
a 1
n 4 0 1
53
1 27 0 1
n 26 0 1
a 2
1 17 0 1
a 3
n 44 0 1
1 21 0 1
n 63 0 1
1 7 0 1
a 4
24 28
2 5 1
1 28
1 2
g 12
1 3
a 5
28
2 0 2
18 24
4 0 1
27
4 4 18
n 6 0 6
1 7 0 6
a 6 0
n 4 0 6
2 3 4 5 7 8 10 11 12 13 14 18 20 21 23 26 29 31
6 3 16
n 8
a 7
n 6
2 3 4 5 8 10 11 12 13 14 20 21 23 26 29 31
6 0 1
7
8 2 1
1 20
a 8
54
20
8 2 1
S1 21
a 9
21
8 2 1
1 29
a 10
29
8 2 13
n 12
a 11
2 3 4 5 8 10 11 12 13 14 23 26 31
12 2 2
n 24
a 12
10 11
12 2 1
g 12
a 13
12
12 2 1
g 13
a 14
13
12 2 1
g 14
a 15
14
12 2 1
1 23
a 16
23
12 2 1
1 26
a 17
26
12 2 1
55
1 2
a 18
2
12 2 1
1 3
a 19
3
12 2 1
1 4
a 20
4
12 2 1
1 5
a 21
5
12 2 1
1 8
a 22
8
12 2 1
1 31
a 23
31
24 2 1
g 11
a 24 S11
24 4 1
1 10
1 4
g 12
a 25
10
26 0 1
17
26 8 2
g 12 0 11
1 6 0 11
56
p 26
n 28 0 11
n 42 0 11
1 7 0 11
a 27
n 26 0 11
12 18
28 3 6
n 29
n 40
a 28
10 11 12 14 15 30
29 3 7
n 30 29 16
n 38
a 30
10 11 12 14 15 18 30
30 2 2
n 35
a 31
10 11
30 2 1
g 12
a 32
12
30 2 1
g 14
a 33
14
30 5 1
1 15
n 28
1 16
p 34
a 35
15
30 5 1
1 30
57
n 28
1 32
p 36
a 37
30
35 3 2
n 24
n 36
a 38
10 11
36 3 1
1 5
n 24
a 39
5
36 1 11
a 40
1 7 10 11 12 14 15 16 30. 31 32
38 3 6
n 30
n 38
a 41
10 11 12 14 15 30
38 1 5
a 42
1 7 16 31 32 S40 3 1
1 31
n 28
a 43
31
40 1 4
a 44
1 7 16 32
42 3 1
1 1
n 28
a 45
58
1
42 1 1
a 46
7
44 0 1
21
44 8 2
1 29 0 20
a 47
n 46 0 20
1 8 0 20
p 48
a 49
n 50 0 20
n 44 0 20
18 29
46 3 1
g 12
a 50
n 48
12
46 2 1
1 26
a 51
26
48 4 1
1 31
g 12
a 52
n 48
31
48 0 1
8
50 0 2
21 29
50 4 19
n 52 0 23
1 7 0 23
59
a 53
n 50 0 23
2 3 4 5 7 8 10 11 12 13 14 18 19 20 22 23 25 26 31
52 5 1
1 19
a 54
n 57
121
a 55
19
52 6 1
1 20
a 56
n 57
1 21
1 20
a 57
20
52 6 1
1 22
a 58
n 57
1 21
1 22
a 59
22 S52 6 1
1 25
a 60
n 57 51 21
1 25
a 61
25
52 1 14 9n 61
2 3 4 5 7 8 10 11 12 13 14 23 26 31
57 4 19
60
n 59
1 7
a 62
n 57
2 3 4 5 7 8 10 11 12 13 14 19 20 22 23 25 26 29 31
57 0 1
21
59 1 18
* 52
2 3 4 5 7 8 10 11 12 13 14 19 20 22 23 25 26 31
59 3 1
1 29
a 63
n 61
29
61 3 13
n 12
a 64
n 61
2 3 4 5 8 10 11 12 13 14 23 26 31
61 0 1
7
63 2 1
g 12
a 65
12
63 1 1
a 66
7
24 2
27 6
17 8
9 -1
0 0
7 2
27 -1
17 -1
9 -1
61
0 0
12 1
6 2
17 -1
9 -1
0 o7 -1
17 -1
9 -1
o 029 1
9 -1
0 0
7 2
29 -1
9 -1
00 0
62
with LLDECLARATIONS, INTEGERTEXT_IO, TEXT_IO;
procedure LLCOMPILE is
Skeletal compiler to parse a candidate string
-- May 8, 1981
-- Version: 1.0
-- Author: Arthur Pyster
-- The original Pascal version of this program was copyrighted
-- by The Regents of the University of California
-- August 1984
-- Modified for use on an IBM/PC with IBM's Pascal compiler
-- Change Author: Reg Meeson
-- November 1986
-- Ported to AT&T UNIX PC7300
-- Change Author: Reg Meeson
-- August 1987
-- Ported to VAX 8600 and converted from Pascal to Ada
-- Change Author: Reg Meeson
-- The Ada version of this program was produced for the DoD
-- STARS Program
-- Purpose: This program is a skeletal compiler which is fleshed
-- out by the inclusion of two packages supplied by the user:
-- LLSUPPORT -- support routines called directly or indirectly
-- as translation action routines
-- LLTOKENS -- lexical analysis routines that produce a stream
-- of tokens
63
and 2 units produced by GENERATE, the parser generator:
-- LLDECLARATIONS -- constant, type, and variable declarations
-- specified in grammar
-- LLTAKEACTION -- procedure which calls action routines as
-- dictated by grammar rules
-- First, the translation table file produced by GENERATE is read.
-- It contains an encoded form of the literals symbol table, the
-- parsing action tables, and error recovery data. Source code is
-- read from the STANDARDINPUT file and object code is written to
-- the STANDARDOUTPUT file. Error messages are written to a file
-- called STANDARDERROR.
use LLDECLARATIONS, INTEGERTEXTIO, TEXT_IO;
PARSINGERROR: exception; -- for fatal parsing errors
LLMAXSTACK: constant := 500;
-- max number of sentential form elements in parse tree at one time
type LLTOKEN is -- for tokens produced by the lexical analyzer
record
PRINTVALUE: LLSTRINGS; -- the literal token value
TABLEINDEX: INTEGER; -- where token is in symbol table
LINENUMBER: INTEGER; -- where token appeared in source
ATTRIBUTE: LLATTRIBUTE; -- user's token attributes
end record;
type LLSTYLE is (LITERAL, NONTERMINAL, GROUP, ACTION, PATCH);
-- literal: a terminal that stands for itself
-- group: a terminal that is a member of a lexical group
-- action: an action code segment
64
-- patch: action code to patch a syntax error
type LLSYMTABENTRY is -- for symbol table entries
record
KEY: LLSTRINGS; -- literal string or group identifier
KIND: LLSTYLE; -- literal or group
end record;
type LLRIGHT is -- for grammar vocabulary symbols
record
CASEINDEX: INTEGER; -- action code case index
SYNCHINDEX: INTEGER; -- synchronization table index
WHICHCHILD: INTEGER; -- position in production right hand side
KIND: LLSTYLE; -- type of vocabulary symbol
TABLEINDEX: INTEGER; -- symbol table or production start index
end record;
type LLSENTENTIAL is -- for sentential forms
record
LASTCHILD: BOOLEAN; -- is this the rightmost child?
TOP: INTEGER; -- pointer to lastchild
PARENT: INTEGER; -- pointer to parent of this node
ATTRIBUTE: LLATTRIBUTE; -- derived attributes returned
DATA: LLRIGHT; -- vocabulary symbol information
end record;
LLADVANCE: BOOLEAN; -- advance llsentptr to next node?
LLiOTOKS: BOOLEAN; -- end of token stream encountered
LLLOCEOS: INTEGER; -- location of end-of-input in symboltable
LLSENTPTR: INTEGER; -- current sentential form element
LLCURTOK: LLTOKEN; -- the current token
LLSYMBOLTABLE: array ( 1.. LLTABLESIZE ) of LLSYMTABENTRY;
-- the symbol table for literal terms
LLSTACK: array ( 1 .. LLMAXSTACK ) of LLSENTENTIAL;
-- stack which represents the parse tree
65
procedure LLNEXTTOKEN;
-- get the next token from the input stream (defined below)
function LLFIND( ITEM: LLSTRINGS; WHICH: LLSTYLE ) return INTEGER is
-- Find item in symbol table -- return index or 0 if not found.
-- Assumes symbol table is sorted in ascending order.
LOW, MIDPOINT, HIGH: INTEGER;
begin
LOW : 1;
HIGH : LLTABLESIZE + 1;
while LOW /= HIGH loop
MIDPOINT : (HIGH + LOW) / 2;
if ITEM < LLSYMBOLTABLE(MIDPOINT).KEY then
HIGH : MIDPOINT;
elsif ITEM = LLSYMBOLTABLE(MIDPOINT).KEY then
if LLSYMBOLTABLE(MIDPOINT).KIND = WHICH then
return( MIDPOINT );
else
return( 0 );
end if;
else -- ITEM > LLSYMBOLTABLE(MIDPOINT).KEY
LOW : MIDPOINT + 1;
end if;
end loop;
return( 0 ); -- item is not in table Send LLFIND;
procedure LLPRTSTRING( STR: LLSTRINGS ) is
-- print non-blank prefix of str in quotes
begin
PUT( STANDARDERROR, '"'
for I in STR'RANGE loop
exit when STR(I) = ' ';
PUT( STANDARDERROR, STR(I) );
end loop;
PUT( STANDARDERROR, " '
66
end LLPRTSTRING;
procedure LLPRTTOKEN is
-- print the current token
begin
if LLCURTOK.PRINTVALUE(1) in '!'..'' then -- printable ASCII
LLPRTSTRING(LLCURTOK.PRINTVALUE);
else
PUT( STANDARDERROR,
"unprintable token beginning with CHARACTER'POS = "
PUT( STANDARDERROR, CHARACTER'POS(LLCURTOK.PRINTVALUE(1)), 1 )
end if;
end LLPRTTOKEN;
procedure LLSKIPTOKEN is
-- remove current token
begin
LLADVANCE := FALSE;
PUT( STANDARDERROR, "*** Skipping ");
LLPRTTOKEN;
PUT_LINE( STANDARDERROR, " in line ");
PUT( STANDARDERROR, LLCURTOK.LINENUMBER, 1 );
PUTLINE( STANDARDERROR, '." );
L-..EXTTOKEN;
end LLSKIPTOKEN;
procedure LLSKIPNODE is
-- skip over sentential form node leaving current token as is
begin
PUT( STANDARDERROR, "*** Inserting "
LLPRTSTRING( LLSYMBOLTABLE(LLSTACK(LLSENTPTR).DATA.TABLEINDEX).KEY );
PUT( STANDARDERROR, " before "
LLPRTTOKEN;
PUT( STANDARD ERROR, " in line "
PUT( STANDARDERROR, LLCURTOK.LINENUMBER, 1 );
67
PUTLINE( STANDARDERROR, "." );
LLSENTPTR := LLSENTPTR + 1;
end LLSKIPNODE;
procedure LLSKIPBOTH is
-- Skip over both sentential form node and current token. Used
-- when replacement is assumed to be correct, and attribute of
-- replacement does not need to be set; otherwise, use llreplace.
begin
PUT( STANDARDERROR, "*** "
LLPRTTOKEN;
PUT( STANDARDERROR, " replaced by ")
LLPRTSTRING( LLSYMBOLTABLE(LLSTACK(LLSENTPTR).DATA.TABLEINDEX).KEY );
PUT( STANDARDERROR, " in line " );
PUT( STANDARDERROR, LLCURTOK.LINENUMBER, 1 );
PUT_LINE( STANDARDERROR, "." );
LLSENTPTR := LLSENTPTR + 1;
LLNEXTTOKEN;
end LLSKIPBOTH;
procedure LLFATAL is
-- To recover from syntactic error, terminate compilation
begin
PUT( STANDARDERROR, "*** Fatal " ); •
LLPRTTOKEN;
PUT( STANDARDERROR, " found in line ");
PUT( STANDARDERROR, LLCURTOK.LINENUMBER, 1 );
PUT_LINE( STANDARDERROR, " -- terminating translation." );
raise PARSINGERROR;
end LLFATAL;
procedure GETCHARACTER( EOS: out BOOLEAN;
NEXT: out CHARACTER;
MORE: in BOOLEAN := TRUE ) is
-- Produce input characters for the lexical analyzer.
68
begin
if ENDOFFILE(STANDARDINPUT) then
EOS := TRUE;
elsif ENDOFLINE(STANDARDINPUT) then
SKIPLINE(STANDARDINPUT);
EOS FALSE;
NEXT ASCII.CR;
else
EOS : FALSE;
GET(STANDARDINPUT,NEXT);
end if;
end;
function MAKETOKEN( NODE: NODETYPE; SYMBOL: STRING; LINENUMBER: NATURAL
return LLTOKEN is
-- construct a token value from input lexical information
PRINTVALUE: LLSTRINGS;
TABLEINDEX: INTEGER;
ATTRIBUTE: LLATTRIBUTE;
function CVTSTRING( STR: in STRING ) return LLSTRINGS is
-- Convert an arbitrary-length string to a fixed length string.
RESULT: LLSTRINGS;
begin
for I in LLSTRINGS'RANGE loop
if I <= STR'LAST then
RESULT(I) : STR(I);
else
RESULT(I) : '
end if;
end loop;
return RESULT;
end;
begin
PRINTVALUE := CVTSTRING(SYMBOL);
case NODE is
69
when CHAR =>
TABLEINDEX :=LLFIND("CharLit ",GROUP);
when IDENT =>
TABLEINDEX :=LLFIND(PRINTVALUE, LITERAL);
if TABLEINDEX =0 then
TABLEINDEX LLFIND(t tldentifier ",GROUP);
end if;
when LIT =>
TABLEINDEX := LLFIND( PRINTVALUE, LITERAL);
if TABLEINDEX =0 then
TABLEINDEX LLFIND("Other ",GROUP);
end if;
when STR =>
TABLEINDEX :=LLFIND("StringLit ",GROUP);
when others =>
TABLEINDEX :=0;
end case;
case NODE is
when CHAR =>
ATTRIBUTE :=new TREENODE' (CHAR,PRINTVALUE, (others => FALSE),
FALSE,FALSE,PRINTVALUE(2));
when IDENT =>
ATTRIBUTE :=new TREENODE'(IDENT,PRINTVALUE, (others => FALSE),
FALSE, FALSE, PRINTVALUE);
when LIT =>
ATTRIBUTE new TREENODE' (LIT,PRINTVALUE, (others => FALSE),
FALSE, FALSE,PRINTVALUE);
when STR =>
ATTRIBUTE new TREENODE'(STR,PRINTVALUE, (others => FALSE),
FALSE,FALSE,PRINTVALUE);
when others =>
ATTRIBUTE := new TREENODE' (BAD,PRINTVALUE, (others => FALSE),
FALSE,FALSE);
end case;
return LLTOKEN' (PRINTVALUE, TABLEINDEX, LINENUMBER, ATTRIBUTE);
end MAKETOKEN;
70
package LLTOKENS is
-- produces a stream of tokens from the STANDARDINPUT file
procedure ADVANCE( EOS: out BOOLEAN;
NEXT: out LLTOKEN;
MORE: in BOOLEAN := TRUE );
end LLTOKENS;
package body LLTOKENS is separate;
procedure LLNEXTTOKEN is
-- get the next token from the input stream
begin
LLTOKENS.ADVANCE( LLEOTOKS, LLCURTOK );
if LLEOTOKS then
LLCURTOK.PRINTVALUE := (LLSTRINGS'RANGE => '
LLCURTOK.PRINTVALUE(I..5) := "(eofl";
LLCURTOK.TABLEINDEX := LLLOCEOS;
end if;
end LLNEXTTOKEN;
procedure LLTAKEACTION( CASEINDEX: in INTEGER ) is separate;
-- perform the translation action proscribed in the grammar
procedure LLMAIN is
LOCOFNULL: constant := 0; -- location of null string in symbol table
type INTSET is array ( 1 .. LLTABLESIZE ) of BOOLEAN;
type SYNCHTYPE is
record
TOKEN: INTEGER; -- index to table entry for token
SENT: INTEGER; -- How far in llsentential form to goto?
end record;
71
type PROD is
record
LHS: INTEGER; -- tableindex of lhs
RHS: INTEGER; -- index into rhsarray where rhs begins
CARDRHS: INTEGER; -- cardinality of rhs
SELSET: INTSET; -- production selection set
CARDSEL: INTEGER; -- cardinality of selection set
end record;
THISRHS: INTEGER; -- index into rhsarray
RHSARRAY: array ( 1 .. LLRHSSIZE ) of LLRIGHT;
-- rhs elements of productions
SYNCHDATA: array ( 0 .. LLSYNCHSIZE ) of SYNCHTYPE;
AXIOM: INTEGER;
-- pointer to first production whose lhs is the axiom
PRODUCTIONS: array ( 1 LLPRODSIZE ) of PROD;
procedure READGRAM is -- read grammar from disk
CH: CHARACTER;
LLGRAM: FILETYPE; -- where grammar is stored
procedure BUILDRIGHT( WHICHPROD: INTEGER ) is
-- establish contents of rhs
CHILDCOUNT: INTEGER; -- which child in rhs is this?
TABLEINDEX: INTEGER;
begin
PRODUCTIONS(WHICHPROD).RHS := THISRHS + 1;
CHILDCOUNT := 0;
for I in THISRHS+I .. THISRHS+PRODUCTIONS(WHICHPROD).CARDRHS loop
if I <= LLRHSSIZE then
THISRHS := THISRHS+I;
GET( LLGRAM, CH );
case CH is
when 'I' =>
72
CHILDCOUNT :=CHILDCOUNT+l;
RHSARRAY(I).WHICHCHILD :=CHILDCOUNT;
RHSARRAY(I).KIND :=LITERAL;
GET( LLGRAM, RHSARRAY(I).TABLEINDEX )
when 'a' =>
RHSARRAY(I).KIND :=ACTION;
when In' =>
CHILDCOUNT :=CHILDCOUNT+l;
RHSARRAY(I).WHICHCHILD :=CHILDCOUNT;
RHSARRAY(I).KIND :=NONTERMINAL;
GET( LLGRAM, RHSARRAY(I).TABLEINDEX )
when 'g' =>
CHILDCOUNT :=CHILDCOUNT+1;
RHSARRAY( I).WHICHCHILD := CHILDCOUNT;
RHSARRAY(I).KIND :=GROUP;
GET( LLGRAM, RHSARRAY(I).TABLEINDEX )
when 'p' =>
RHSARRAY(I).KIND :=PATCH;
when others =>
-- the ligram table is screwed up
PUT( STANDARDERROR,
"*** Zuse -- Error in table file (360)
raise PARSINGERROR;
end case;
if ENDOF-LINE( LLGRAM )then
RHSARRAY(I)..CASEINDEX 0;
else
GET( LLGRAM, RHSARRAY(I).CASEINDEX )
end if;
if ENDOFLINE( LLGRAM ) then
RHSARRAY(I).SYNCHINDEX :=0;
else
GET( LLGRAM, RHSARRAY(I).SYNCHINDEX )
end if;
SKIPLINE( LLGRAM )
else
-- llgram table is screwed up
PUT-LINE ( STANDARD-ERROR,
73
"*** Zuse -- Error in table file (372) "
-- This is a catastrophic error -- the grammar used to'
-- generate the compiler probably contained errors.
raise PARSINGERROR;
end if;
end loop;
end BUILDRIGHT;
procedure BUILDSELECT( WHICHPROD: INTEGER ) is
-- build selection set
TABLEINDEX: INTEGER; -- Where in table can element be found?
begin
PRODUCTIONS(WHICHPROD).SELSET := (others => FALSE); -- empty set
for I in 1 .. PRODUCTIONS(WHICHPROD).CARDSEL loop
GET( LLGRAM, TABLEINDEX );
PRODUCTIONS(WHICHPROD).SELSET(TABLEINDEX) := TRUE;
end loop;
SKIP_LINE( LLGRAM );
end BUILDSELECT;
begin -- READGRAM
OPEN( LLGRAM, IN_FILE, "TABLE" );
-- read in symbol tables
for I in 1 .. LLTABLESIZE loop 0for J in 1 .. LLSTRINGLENGTH loop
GET( LLGRAM, LLSYMBOLTABLE(I).KEY(J) );
end loop;
GET( LLGRAM, CH );
SKIP LINE( LLGRAM );
if CH = 'g' then
LLSYMBOLTABLE(I).KIND : GROUP;
else -- assume ch = 1
LLSYMBOLTABLE(I).KIND LITERAL; S
end if;
end loop;
-- read in grammar
74
THISRHS := 0;
GET( LLGRAM, AXIOM );
SKIPLINE( LLGRAM );
for I in 1 .. LLPRODSIZE loop
GET( LLGRAM, PRODUCTIONS(I).LHS );
GET( LLGRAM, PRODUCTIONS(I).CARDRHS );
GET( LLGRAM, PRODUCTIONS(I).CARDSEL );
SKIPLINE( LLGRAM );
BUILDRIGHT(I);
BUILDSELECT(I);
end loop;
-- now read in synchronization info
for I in 1 .. LLSYNCHSIZE loop
GET( LLGRAM, SYNCHDATA(I).TOKEN ); -- llsymboltable location
GET( LLGRAM, SYNCHDATA(I).SENT ); -- syrmbol to skip to
SKIPLINE( LLGRAM );
end loop;
CLOSE( LLGRAM );
end READGRAM;
procedure PARSE is -- parse the candidate
LLTOP: INTEGER; -- top of stack pointer
LOCOFANY: INTEGER; -- location of "any" in llsymboltable
procedure ERASE is
-- Has rhs of prod been matched? If so then erase rhs.
begin
-- only erase if at farthest point to the right in a production
while LLSTACK(LLSENTPTR).LASTCHILD loop
-- erase rhs
LLSENTPTR := LLSTACK(LLSENTPTR).PARENT;
if LLSENTPTR = 0 then -- stack is empty
LLTOP := 0;
LLADVANCE := FALSE; -- don't try to advance beyond axiom
return;
75
end if;
LLTOP LLSTACK(LLSENTPTR).TOP; -- lastchild of current rhs
end loop;
end ERASE;
procedure TESTSYNCH; -- forward
procedure EXPAND is
-- expand nonterminal in sentential form
WHERE: INTEGER; -- production being examined
OLDTOP: INTEGER; -- top of stack ptr before expansion
function MATCH( SENTINDEX: INTEGER ) return INTEGER is
-- Does a production whose lhs is sentindex and whose selection
-- set includes token exist? If so, return index to that
-- production as value of match; otherwise, set match to 0.
begin
for I in SENTINDEX .. LLPRODSIZE loop
if PRODUCTIONS(I).LHS = SENTINDEX then
-- production has proper lhs
if PRODUCTIONS(I).SELSET(LLCURTOK.TABLEINDEX) or •
PRODUCTIONS(I).SELSET(LOCOFANY) then
return( I ); -- match found
end if;
else
return( 0 ); -- no match
end if;
end loop;
return( 0 ); -- no match
end MATCH; •
begin -- EXPAND
76
WHERE := MATCH( LLSTACK (LLSENTPTR) .DATA. TABLEINDEX )
UWNERE /= 0 then
* -- rhs of new production will be placed in list
if PRODUCTIONS(WHERE).CARDRHS > 0 then
LLADVANCE :=FALSE;
if ( LLSTACK(LLSENTPTR).LASTCHILD and
(LLSTACK(LLSENTPTR).PARENT > 0) ) and then
LLSTACK(LLSTACK(LLSENTPTR) .PARENT) .LASTCHILD then
LLTOP :=LLSTACK(LLSENTPTR) .PARENT;
LLSENTPTR :=LLSTACK(LLSENTPTR) .PARENT;
end if;
* OLDTOP :=LLTOP;
if LLTOP + PRODUCTIONS(WHERE).CARDRHS > LLMAXSTACK then
PUTLINE( STANDARDERROl',
"** Zuse -- Stack overflow (493)
-- This may be fixed by increasing llmaxstack.
LLFATAL;
end if;
for I in 1 .. PRODUCTIONS(WHERE).CARDRHS loop
LLTOP :=LLTOP + 1;
* LLSTACK(LLTOP) .PARENT := LLSENTPTR;
-- put data into children from the selected production
LLSTACK(LLTOP) .DATA :=RHSARRAY(PRODUCTIONS(WHERE) .RHS+I-l);
LLSTACK(LLTOP). LASTCHILD :=FALSE;
if LLSTACK(LLTOP).DATA.KIND =NONTERMINAL then
* LLSTACK(LLTOP).TOP := OLDTOP + PRODUCTIONS(WHERE).CARDRHS;
end if;
end loop;
-- m~ark rightmost child as the last
LLSTACK(LLTOP).LASTCHILD :=TRUE;
-- move llsentptr to the first new child
LLSENTPTR :=OLDTOP + 1;
end if;
else
* TESTSYNCH;
end if;
end EXPAND;
77
procedure TESTSYNCH is
procedure SYNCHRONIZE is
-synchronize token and Ilsentential form to recover from
-- syntactic error
OLDCURTOKINDEX: INTEGER;
I: INTEGER;
begin
PUT( STANDARDERROR, "*** Unexpected "
LLPRTTOKEN;
PUT( STANDARD_ERROR, " found in line "
PUT( STANDARD_ERROR, LLCURTOK.LINENbMDBER, 1 )
OLDCURTOKINDEX := LLCURTOK. TABLEINDEX;
loop
I := LLSTACK (LLSENTPTR) .DATA. SYNCHINDEX;
while SYNCHDATA(I).SENT /= 0 loop
if ( (LLCURTOK .TABLEINDEX =SYNCHDATA( I).TOKEN) or
(SYNCHDATA(I).TOKEN =LOCOFANY) ) and
((SYNCHDATA(I).SENT =-1) or
(LLSTACK(LLSENTPTR) .DATA .WHICHCHILD <= SYNCHDATA( I) .SENT) )thenif LLCURTOK.TABLEINDEX 1=OLDCURTOKINDEX then
PUT( STANDARDERROR, "-- skipping to
LLPRTSTRING( LLCURTOK.PRINTVALUE )
PUT( STANDARDERROR, " in line ")
PUT( STANDARDERROR, LLCURTOK.LINENUMBER, 1I)
PUTLINE( STANDARDERROR, "')
end if;
if LLSTACK(LLSENTPTR).DATA.CASEINDE.X /= 0 then
-- execute code after ";"
LLTAKEACTION( LLSTACK(LLSENTPTR).DATA.C(ASEINDEX )
end if;
if SYNCHDATA(I).SENT = -1 then
-- skip to rightmost node and signal reduction
while not LLSTACK( LLSENTPTR) .LASTCHILD loop
LLSENTPTR := LLSENTPTR + 1;
end loop;
78
else
-- skip to correct symbol in rhs
wlile LLSTACX(LLSETPTR).DATA.WHICHCHILD /=
SYNCHDATA(I) SENT loop
LLSENTPTR := LLSENTPTR + 1;
end loop;
LLADVANCE := FALSE;
* end if;
return;
end if;
I := I+1;
end loop;
if LLCURTOK.TABLEINDEX = LLLOCEOS then
PUTLINE( STANDARDERROR, " -- terminating translation." );
raise PARSINGERROR;
else
LLNEXTTOKEN;
end if;
end loop;
end SYNCHRONIZE;
begin -- TESTSYNCH
while LLSTACK(LLSENTPTR).DATA.SYNCHINDEX = 0 loop
-- no synch info there
if LLSTACK(LLSENTPTR).PARENT /= 0 then
-- there really is a parent
LLSENTPTR LLSTACK(LLSENTPTR).PARENT;
else
LLFATAL;
end if;
end loop;
SYNCHRONIZE;
end TESTSYNCH;
begin -- PARSE
LLSENTPTR := 1; -- initialize sentform to be axiom
79
LLTOP 1;
LLSTACK(LLSENTPTR) .LASTCHILD TRUE,
LLSTACK(LILSENTPTR) PARENT 0;
LLSTACK(LLSENTPTR) .DATA. SYNCHINDEX :=0;
LLSTACK( LLSENTPTR) .DATA. KIND :=NONTERMINAL;
LLSTACK(LLSENTPTR) .DATA.TABLEINDEX :=AXIOM;
-find location of "any" in ilsymboltable
LOCOFANY :=LLFIND( (l1')'a', 2=>'n', 3=>'y', others => I ) GROUP )
-- find location of endof source ("@") in ilsymboltable
LLLOCEOS :=LLFIND( (1=>'@', others => '),GROUP )
LLNEXTTOKEN;
while LLTOP /=0 loop -- derivation isn't finished
LLADVANCE TRUE; -- presume llsentptr advanced after iteration
case LLSTACK(LLSENTPTR) .DATA.KIND is
when GROUP I LITERAL =>
if LLSTACK (LLSENTPTR) .DATA. TABLEINDEX
LLCURTOK. TABLEINDEX. then
LLSTACK( LLSENTPTR) .ATTRIBUTE := LLCURTOK .ATTRIBUTE;
LLNEXTTOKEN;
elsif LLSTACK(LLSENTPTR) .DATA. TABLEINDEX = LOCOFNULL then
null;
elsif not LLSTACK(LLSENTPTR).LASTCHILD then
if LLSTACK( LLSENTPTR + 1) .DATA. KIND = PATCH then
LLTAKEACTION( LLSTACK(LLSENTPTR + 1) .DATA. CASEINDEX )
else
TESTSYNCH;
end if;
else
TEST SYNCH ;
end if;
when NONTERMINAL >
EXPAND;
when ACTION =>
LLTAKEACTION( LLSTACK(LLSENTPTR) .DATA.CASEINDEX )
when PATCH =>
null;
end case;
if LLADVANCE then
80
-- finished with current llstack(llsentptr).
-- move on to next node in tree
ERASE;
LLSENTPTR := LLSENTPTR + 1;
end if;
end loop;
if LLCURTOK.TABLEINDEX /= LLLOCEOS then
-- Only matched against part of candidate, which is not a sentence.
-- Terminate parsing action.
LLFATAL;
end if;
end PARSE;
begin -- LLMAIN
READGRAM; -- Get the grammar from the user.
PARSE; -- Parse the curient input stream.
end LLMAIN;
begin -- LLCOMPILE
CREATE( STANDARDERROR, OUT_FILE, "SYS$ERROR" ); -- just in case
LLMAIN;
CLOSE( STANDARD ERROR );
exception
when PARSINGERROR =>
CLOSE( STANDARDERROR );
end LLCOMPILE;
81
0
separate ( LLCOMPILE
package body LL TOKENS is
-- This package is the bootstrap token stream generator for the
-- lexical analyzer generator, ADALEX.
subtype UPPERCASELETTER is CHARACTER range 'A'..'Z';
subtype LOWERCASELETTER is CHARACTER range 'a'..'z';
subtype DIGIT is CHARACTER range '0'..'9';
STARTOFLINE: BOOLEAN := TRUE;
CURRENTCHAR: CHARACTER := ' ';
CURRENTLINE: INTEGER : 0; 0
BUFFERSIZE: constant := 100;
CHARBUFFER: array (1 .. BUFFER-SIZE) of CHARACTER;
LOOKCHAR: CHARACTER;
CURBUFNDX: INTEGER;
TOPBUFNDX: INTEGER;
procedure ADVANCE( EOS: out BOOLEAN;
NEXT: out LLTOKEN;
MORE: in BOOLEAN := TRUE ) is
PRINTVALUE: LLSTRINGS;
TABLEINDEX: INTEGER;
procedure GETCHAR( CHAR: out CHARACTER ) is
begin
if ENDOFFILE(STANDARDINPUT) then
CHAR := ASCII.EOT;
elsif END OF LINE(STANDARDINPUT) then
82
SKIPLINE(STANDARDINPUT);
CHAR := ASCII.ETX;
STARTOFLINE := TRUE;
else
GET(STANDARDINPUT, CHAR);
end if;
end GETCHAR;
procedure CHARADVANCE is
begin
if STARTOFLINE then
STARTOFLINE := FALSE;
CURRENTLINE CURRENTLINE + 1;
CURBUFNDX 0;
TOPBUFNDX 0;
GETCHAR(CURRENT CHAR);
elsif CURBUFNDX < TOPBUFNDX then
-- take char from buffer
CURBUFNDX CURBUFNDX + 1;
CURRENTCHAR CHARBUFFER(CURBUFNDX);
else
GETCHAR(CURRENT CHAR); -- from input file
end if;
end CHARADVANCE;
procedure LOOKAHEAD is
begin
GET CHAR(LOOK CHAR);
TOPBUFNDX := TOPBUFNDX + 1;
CHARBUFFER(TOP BUFNDX) := LOOKCHAR;
end;
procedure NEXTCHARACTER is
begin
PRINTVALUE(1) CURRENTCHAR;
CHARADVANCE;
LOOK-AHEAD;
if LOOKCHAR = ''' then
83
PRINTVALUE(2) CURRENTCHAR;
CHARADVANCE;
PRINTVALUE( 3) = CURRENTCHAR;
CHARADVANCE;
TABLEINDEX := LLFIND("CharLit ", GROUP);
NEXT.ATTRIBUTE := new TREENODE'(CHAR,ANONYMOUS,
(others => FALSE),FALSE,FALSE,PRINTVALUE(2));
else
TABLEINDEX := LLFIND(PRINTVALUE, LITERAL);
if TABLEINDEX = 0 then
TABLEINDEX LLFIND("Other ", GROUP);
end if;
NEXT.ATTRIBUTE := new TREENODE'(LIT,ANONIYMOUS,
(others => FALSE),FALSE,FALSE,PRINTVALUE);
end if;
end NEXTCHARACTER;
procedure NEXTIDENTIFIER is
I: INTEGER := 1;
begin
while (CURRENTCHAR in UPPERCASELETTER) or
(CURRENTCHAR in LOWERCASELETTER) or
(CURRENTCHAR in DIGIT) or
(CURRENTCHAR = '_') loop
if I <= LLSTRINGLENGTH then
PRINTVALUE(I) := CURRENTCHAR; 0I := I + 1;
end if;
CHARADVANCE;
end loop;
TABLEINDEX := LLFIND(PRINTVALUE, LITERAL);
if TABLEINDEX = 0 then
TABLEINDEX LLFIND("Identifier ", GROUP);
end if;
NEXT.ATTRIBUTE := new TREENODE'(IDENT,ANONYMOUS,
(others => FALSE),FALSE,FALSE,PRINTVALUE);
end NEXTIDENTIFIER;
84
procedure NEXTSPECSYM is
begin
PRINTVALUE(1) := CURRENTCHAR;
if CURRENTCHAR = '.' then
CHARADVANCE;
if CURRENTCHAR - '.' then
PRINTVALUE(2) CURRENTCHAR;
CHARADVANCE;
end if;
elsif CURRENTCHAR = '.' then
CHARADVANCE;
if CURRENTCHAR - '-' then
PRINTVALUE(2) CURRENTCHAR;
CHARADVANCE;
elsif CURRENTCHAR = '.' then
LOOKAHEAD;
if LOOKCHAR - '-' then
PRINTVALUE(2) CURRENTCHAR;
CHARADVANCE;
PRINTVALUE(3) : CURRENTCHAR;
CHARADVANCE;
end if;
end if;
elsif CURRENTCHAR - '-' then
CHARADVANCE;
if CURRENTCHAR = '>' then
PRINTVALUE(2) := CURRENTCHAR;
CHARADVANCE;
end if;
else
CHARADVANCE;
end if;
TABLEINDEX := LLFIND(PRINTVALUE, LITERAL);
if TABLEINDEX = 0 then
TABLEINDEX LLFIND("Other ", GROUP);
end if;
NEXT.ATTRIBUTE := new TREENODE'(LITANONYMOUS,
(others => FALSE),FALSEFALSEPRINTVALUE);
85
end NEXTSPEC_SYM;
procedure NEXTSTRING is
I: INTEGER := 2;
begin
PRINTVALDE(l)
CHARADVANCE;
while CURRENT CHAR /= '"' loop
if I < LLSTRINGLENGTH then
PRINTVALUE(I) := CURRENTCHAR;
I := I + 1;
end if;
exit when ENDOFLINE(STANDARDINPUT);
CHARADVANCE;
end loop;
PRINTVALUE(I)
CHARADVANCE;
TABLEINDEX := LLFIND("StringLit ", GROUP);
NEXT.ATTRIBUTE := new TREENODE'(STR,ANONYMOUS,
(others => FALSE),FALSEFALSE,PRINTVALUE);
end NEXTSTRING;
begin -- ADVANCE
PRINTVALUE := " ";
-- Skip white space and comments
while (CURRENTCHAR = ASCII.ETX) or
(CURRENTCHAR = ASCII.HT) or
(CURRENTCHAR = ' ') or
(CURRENTCHAR = '-') loop
if CURRENTCHAR - '-' then
LOOKAHEAD;
exit when LOOKCHAR /= '-';
SKIP_LINE(STANDARDINPUT);
STARTOFLINE := TRUE;
end if; 0CHARADVANCE;
end loop;
if CURRENT-CHAR = ASCII.EOT then
086
S
EOS := TRUE;
elsif CURRENTCHAR = '"' then
NEXTSTRING;
elsif CURRENTCHAR = ''' then
NEXTCHARACTER;
elsif (CURRENTCHAR in UPR_ CASELETTER) or
(CURRENTCHAR in LOWERCASELETTER) then
NEXTIDENTIFIER;
else
NEXTSPECSYM;
end if;
NEXT.PRINTVALUE PRINTVALUE;
NEXT.TABLEINDEX TABLEINDEX;
NEXT.LINENUMBER CURRENTLINE;
end ADVANCE;
end LLTOKENS;
87
-- ADALEX SUPPORT PACKAGE SPECIFICATION
-- This package contains all the supporting translation routines for
-- the Adalex translator. These procedures and functions are called
from the parsing actions specified in the translation grammar.
-- Associated packages, procedures, and files:
-- o The body of this package is defined in file LLSUPBODY.ADA
-- o The Adalex translation grammar is defined in file ADALEX.GRM
-- o The parsing actions are included in procedure LLTAKEACTION in
-- file LLACTIONS.ADA
-- o Declarations for data structures defined in the grammar are
-- included in package LLDECLARATIONS in file LLDECLS.ADA
with LLDECLARATIONS;
package LL_SUPPORT is
use LLDECLARATIONS;
PATTERNTABLEFULL: exception;
-- Raised if the pattern table overflows.
function ALTERNATE ( LEFT, RIGHT: in LLATTRIBUTE ) return LLATTRIBUTE;
-- Form the alternation of two patterns.
function CHARRANGE ( STARTCH, END_CH: in LLATTRIBUTE
return LLATTRIBUTE;
88
-- Form a character or range pattern.
procedure COMPLETEPATTERNS;
-- Complete the construction of all the patterns defined.
function CONCATENATE ( LEFT, RIGHT: in LLATTRIBUTE ) return LLATTRIBUTE;
-- Concatenate two patterns.
function CVTASCII ( NAME: in LLATTRIBUTE ) return LLATTRIBUTE;
-- Convert an ASCII character name into a character pattern.
function CVTSTRING ( LIT: in LLATTRIBUTE ) return LLATTRIBUTE;
-- Convert a literal string into a pattern.
procedure EMITADVANCEHDR;
-- Emit the beginning part of the procedure ADVANCE.
procedure EMITADVANCETLR;
-- Emit the end of the procedure ADVANCE.
procedure EMIT_PKG_DECLS;
-- Emit the declarations for the generated package body.
procedure EMITSCANPROC;
-- Generate the pattern-matching code for all referenced patterns.
procedure EMIT-TOKEN ( TOKEN: in LLATTRIBUTE );
-- Emit an identifier or literal token value.
procedure INCLUDEPATTERN( PAT_ID: in LLATTRIBUTE );
-- Include the referenced pattern for code generation.
function LOOK_AHEAD ( PAT: in LLATTRIBUTE ) eturn LLATTRIBUTE;
-- Create a look-ahead pattern.
function OPTION ( PAT: in LLATTRIBUTE ) return LLATTRIBUTE;
-- Form an optional pattern.
89
function REPEAT ( PAT: in LLATTRIBUTE ) return LLATTRIBUTE;
-- Form a repetition pattern.
procedure STOREPATTERN ( PAT ID, PAT: in LLATTRIBUTE );
-- Store a pattern definition in the pattern table.
end LLSUPPORT;
900
0
-- ADALEX SUPPORT PACKAGE BODY
-- This package contains all the supporting translation routines for
-- the Adalex translator. These procedures and functions are called
-- from the parsing actions specified in the translation grammar.
-- Associated packages, procedures, and files:
-- o The specification for this package is defined in file
-- LLSUPSPEC.ADA
-- o The Adalex translation grammar is defined in file ADALEX.GRM
-- o The parsing actions are included in procedure LLTAKEACTION in
-- file LLACTIONS.ADA
-- o Declarations for dat? structures defined in the grammar are
-- included in package LLDECLARATIONS in file LLDECLS.ADA
with LLDECLARATIONS, TEXT_IO, INTEGERTEXT_IO;
package body LLSUPPORT is
use LLDECLARATIONS, TEXT_IO, INTEGERTEXTIO;
EMPTYPATTERN: constant LLATTRIBUTE :=
new TREENODE'(EMPTY,ANONYMOUS,(others => FALSE),TRUE,FALSE);
OUTPUTLINELIMIT: constant := 60;
-- If an output line exceeds this limit, start a new line.
PATTERNTABLESIZE: constant := 100;
91
-- The maximum number of entries in the pattern table.
CURTABLEENTRIES: INTEGER range 0 .. PATTERNTABLESIZE := 0;
-- Current number of pattern table entries.
-- Updated by procedure STOREPATTERN.
EMITTEDCHARS: INTEGER := 0;
-- The number of characters emitted on the current output line.
LEXICON: LLATTRIBUTE := null;
-- Pattern constructed for code generation.
-- Created by calls to procedure INCLUDEPATTERN.
-- Consumed by procedure EMITSCANPROC.
PATTERNTABLE: array (I .. PATTERNTABLESIZE) of LLATTRIBUTE;
-- Table containing all defined patterns in alphabetical order.
-- Updated by procedure STOREPATTERN.
ROOTPATTERNNAME: LLSTRINGS;
-- Holds the name of the current pattern being completed.
-- Used to check for recursive references.
procedure COMPLETEPAT ( PAT: in out LLATTRIBUTE );
-- Complete the construction of an arbitrary pattern.
procedure EMITPATTERN NAME ( FILE: in FILETYPE; NAME: in LLSTRINGS );
-- Write the name of a pattern to a specified file.
procedure EMITPATTERN NAME ( NAME: in LLSTRINGS );
-- Write the name of a pattern to the standard output file.
function LOOKUPPATTERN ( PATID: in LLATTRIBUTE ) return INTEGER;
-- Return the index of the named pattern in the pattern table.
function ALTERNATE ( LEFT, RIGHT: in LLATTRIBUTE ) return LLATTRIBUTE is
92
-- Form the alternation of two patterns.
NEWLEFT, NEWRIGHT: LLATTRIBUTE;
function MERGERANGES ( LEFT, RIGHT: in LLATTRIBUTE
return LLATTRIBUTE is
-- Merge the selection sets of two range nodes into a single node.
SELSET: SELECTIONSET;
begin
for CH in SELECTION SET'RANGE loop
SELSET(CH) := LEFT.SEL SET(CH) or RIGHT.SELSET(CH);
end loop;
return new TREENODE'(RNG,LEFT.NAME,SEL_SET,FALSE,FALSE);
end MERGERANGES;
begin -- ALTERNATE(LEFT,RIGHT)
-- Form the alternation of two patterns.
-- Create an alternation node if the right term is not empty.
if RIGHT = null or else RIGHT.VARIANT = BAD then
return LEFT;
elsif LEFT.VARIANT = BAD then
return RIGHT;
end if;
if LEFT.VARIANT = ALT then
if RIGHT.VARIANT = ALT then
if LEFT.NAME = ANONYMOUS then
NEWLEFT : LEFT.LEFT;
NEWRIGHT := ALTERNATE(LEFT.RIGHT,RIGHT);
elsif RIGHT.NAME = ANONYMOUS then
NEWLEFT : RIGHT.LEFT;
NEWRIGHT := ALTERNATE(RIGHT.RIGHT,LEFT);
else
NEWLEFT := new TREENODE'(LEFT.LEFT.all);
NEWLEFT.NAME := LEFT.NAME;
NEWRIGHT := new TREENODE'(LEFT.RIGHT.all);
NEWRIGHT.NAME := LEFT.NAME;
NEWRIGHT := ALTERNATE(NEWRIGHT,RIGHT);
end if;
93
else
NEWLEFT RIGHT;
NEWRIGHT LEFT;
end if;
else
NEWLEFT :LEFT;
NEWRIGHT :RIGHT;
end if;
if NEWLEFT.VARIANT =RNG then
if NEWRIGHT.VARIANT = RNG
and NEWLEFT.NAME = NEWRIGHT.NAME then
return MERGERANGES(NEW _LEFT,NEWRIGHT);
elsif (NEWRIGHT.VARIANT = ALT and NEWRIGHT.NAME ANONYMOUS)
and then NEWRIGHT.LEFT.VARIANT = RNG
and then NEWLEFT.NAME = NEWRIGIIT.LEFT.NAME then
return new TREENODE' (ALT,ANONYMOUS, (others => FALSE),
FALSE, FALSE,MERGERANGES(NEWLEFT,NEWRIGHT. LEFT),
NEWRIGHT.RIGHT )
else
return new TREENODE' (ALT,ANONYMOUS, (others => FALSE),
FALSE,FALSE,NEWLEFT,NEW RIGHT);
end if;
elsif NEWRIGHT.VARIANT = RNG then
-- keep ranges on the left for convenience
return new TREENODE' (ALT,ANONYMOUS, (others => FALSE),
FALSE,FALSE,NEWRIGHT,NEWLEFT);
else
return new TREENODE' (ALT ,ANONYMOUS, (others => FALSE),
FALSE,FALSE,NEWLEFT,NEWRIGHT);
end if;
end ALTERNATE;
function CHARRANGE ( STARTCH, ENDCH: in LLATTRIBUTE
return LLATTRIBUTE is
-- Form a character or range pattern.
-- Create a range node for a single character or range expression.
RESULT: LLATTRIBUTE;
94
begin
RESIqT := new TREENODE'(RNG,ANONYMOUS,(others => FALSE),FALSE,FALSE);
if ENDCH = null then
-- the pattern is a single character
RESULT.SELSET(START_CH.CHARVAL) := TRUE;
else
-- the pattern is a range expression
for CH in STARTCH.CHARVAL .. ENDCH.CHARVAL loop
RESULT.SEL SET(CH) := TRUE;
end loop;
end if;
return RESULT;
end CHARRANGE;
procedure COMPLETE_PAT ( PAT: in out LLATTRIBUTE ) is
-- Complete the construction of an arbitrary pattern.
N: INTEGER range 0 .. PATTERNTABLESIZE;
procedure COMPLETECONCAT ( PAT: in out LLATTRIBUTE );
-- Complete the construction of a concatenation node.
procedure COMPLETEOPT ( PAT: in out LLATTRIBUTE );
-- Complete the construction of an optional pattern.
procedure COMPLETEALT ( PAT: in out LLATTRIBUTE ) is
-- Complete the construction of an alternation pattern.
-- Maintain the pattern in normal form for code generation.
-- Convert patterns with empty alternatives into option patterns.
-- Convert ambiguous patterns into equivalent unambiguous patterns.
NAME: LLSTRINGS;
INTERSECT: BOOLEAN : FALSE;
function RESTRICT ( PAT: in LLATTRIBUTE; SUBSET: in SELECTION_SET
95
return LLATTRIBUTE is
-- Return the subset of a pattern that is restricted to a
-- specified selection set.
EMPTY: BOOLEAN := TRUE;
NEWPAT, NEWLEFT, NEWRIGHT: LLATTRIBUTE;
NEWSET: SELECTIONSET;
begin
case PAT.VARIANT is
when ALT =>
-- Restrict both alternatives
NEWLEFT := RESTRICT(PAT.LEFT,SUBSET);
NEWRIGHT := RESTRICT(PAT.RIGHT,SUBSET);
if NEWLEFT = EMPTYPATTERN then
if PAT.NAME /= ANONYMOUS then
NEWRIGHT.NAME := PAT.NAME;
end if;
return NEWRIGHT; •
elsif NEWRIGHT = EMPTYPATTERN then
if PAT.NAME /= ANONYMOUS then
NEWLEFT.NAME := PAT.NAME;
end if;
return NEWLEFT;
else
NEWPAT := ALTERNATE(NEWLEFT,NEWRIGHT);
NEWPAT.NAME := PAT.NAME;
COMPLETE PAT (NEW PAT) ; 0return NEWPAT;
end if;
when CAT =>
-- Restrict the left component.
NEWLEFT := RESTRICT(PAT.LEFT,SUBSET);
if NEWLEFT = EMPTYPATTERN then
return EMPTYPATTERN;
else
NEWPAT := CONCATENATE(NEWLEFT,PAT.RIGHT);
NEWPAT.NAME :- PAT.NAME;
COMPLETECONCAT(NEWPAT);
return NEWPAT;
96
end if;
when OPT I REP =>
-- Restrict the optional or repeated pattern
NEWPAT := RESTRICT(PAT.EXPR,SUBSET);
if NEWPAT = EMPTYPATTERN then
return EMPTYPATTERN;
elsif PAT.VARIANT = OPT then
NEWPAT := OPTION(NEWPAT);
NEWPAT.NAME := PAT.NAME;
COMPLETE OPT(NEWPAT);
return NEW PAT;
else -- PAT.VARIANT = REP
NEWPAT := OPTION( CONCATENATE(NEWPAT,PAT) );
NEWPAT.NAME := PAT.NAME;
COMPLETEOPT(NEW PAT);
return NEWPAT;
end if;
when RNG =>
-- Restrict a simple range
for CH in SELECTIONSET'RANGE loop
NEWSET(CH) := SUBSET(CH) and PAT.SELSET(CH);
EMPTY := EMPTY and not NEWSET(CH);
end loop;
if EMPTY then
return EMPTYPATTERN;
else
return new TREENODE'(RNG,PAT.NAME,NEWSETFALSEFALSE);
end if;
when others =>
-- No other kinds of patterns should show up here.
return BADPATTERN;
end case;
end RESTRICT;
function TAIL ( PAT: in LLATTRIBUTE; SUBSET: in SELECTION-SET
return LLATTRIBUTE is
-- Return the tail of pattern PAT with the selection set SUBSET.
LEFTSET, RIGHTSET: SELECTIONSET;
97
NEWPAT, NEWLEFT, NEWRIGHT: LLATTRIBUTE;
begin
case PAT.VARIANT is
when ALT =>
-- Combine the tails of the two alternatives
for CH in SELECTIONSET'RANGE loop
LEFTSET(CH) := PAT.LEFT.SELSET(CH) and SUBSET(CH);
RIGHTSET(CH) := PAT.RIGHT.SELSET(CH) and SUBSET(CH);
end loop;
NEWPAT := ALTERNATE( TAIL(PAT.LEFT,LEFTSET),
TAIL(PAT.RIGHT,RIGHTSET) );
when CAT =>
case PAT.LEFT.VARIANT is
when ALT =>
-- Convert pattern (AIB)C into ACIBC then find tail.
NEWLEFT := CONCATENATE(PAT.LEFT.LEFT,PAT.RIGHT);
COMPLETECONCAT(NEWLEFT);
NEWRIGHT := CONCATENATE(PAT.LEFT.RIGHT,PAT.RIGHT);
COMPLETECONCAT(NEWRIGHT);
for CH in SELECTIONSET'RANGE loop
LEFT_SET(CH) := SUBSET(CH) and
NEWLEFT.SELSET(CH);
RIGHT_SET(CH) := SUBSET(CH) and
NEWRIGHT.SELSET(CH);
end loop;
NEWPAT := ALTERNATE( TAIL(NEWLEFT,LEFT_SET), 0TAIL(NEW RIGHT,RIGHT_SET) );
when OPT I REP =>
-- Convert pattern (A]B into ABIB then find tail.
for CH in SELECTIONSET'RANGE loop
LEFT_SET(CH) := SUBSET(CH) and
PAT.LEFT.SELSET(CH);
RIGHTSET(CH) := SUBSET(CH) and
not PAT.LEFT.SELSET(CH);
end loop; 0if PAT.LEFT.VARIANT = OPT then
NEWLEFT := CONCATENATE(
TAIL(PAT.LEFT.EXPR,LEFT_SET),
98
I ! • •
PAT.RIGHT );
else -- PAT.LEFT.VARIANT = REP
NEWLEFT := CONCATENATE(
TAIL(PAT.LEFT.EXPR,LEFT_SET),
PAT );
end if;
COMPLETECONCAT(NEW LEFT);
NEWPAT := ALTERNATE( NEW-LEFT,
TAIL(PAT.RIGHT,RIGHTSET) );
when RNG =>
-- This one is easy.
return new TREENODE'(PAT.RIGHT.all);
when others =>
-- No other kinds of patterns should show up here.
return BADPATTERN;
end case;
when RNG =>
-- This one is easy too.
return EMPTYPATTERN;
when others =>
-- No other kinds of patterns should show up here.
return BADPATTERN;
end case;
COMPLETEPAT(NEWPAT);
return NEWPAT;
end TAIL;
procedure RESOLVEAMBIGUITY ( PAT: in out LLATTRIBUTE ; i6
-- Resolve the ambiguity in an alternative pattern.
-- Ambiguity arises when two alternatives have overlapping
-- selection sets. Several transformations are applied here
-- to create an equivalent pattern without any overlap.
LEFT, RIGHT, MIDDLE : LLATTRIBUTE;
LEFTTAIL, RIGHTTAIL: LLATTRIBUTE;
LEFTSET, MIDDLESET, RIGHTSET: SELECTIONSET;
NAME: LLSTRINGS := PAT.NAME;
begin
-- Separate the selection sets into left, middle, and right sets.
99
f or CH in SELECTIONSET'RANGE loop
LEFTSET (CH) =PAT. LEFT. SEL SET (CH) and
not PAT. RIGHT. SELSET (CH);
RIGHTSET(CH) PAT. RIGHT. SEL-SET(CH) and
not PAT.LEFT.SELSET(CH);
MIDDLESET(CH) =PAT, LEFT. SEL-SET(CH) and
PAT.RIGHT.SEL_SET(CH);
end loop;
-- Construct a new pattern for the overlapping middle part.
LEFT RESTRICT(PAT.LEFT,MIDDLE SET);
RIGHT =RESTRICT(PAT .RIGHT,MIDDLE SET);
if LEFT.VARIANT = ALT then
MIDDLE :=new TREENODE' (LEFT.RIGHT.all);
MIDDLE.NAME :=LEFT.NAIE;
RIGHT :=ALTERNATE(MIDDLE,RIGHT);
COMPLETEPAT(RIGHT);
MIDDLE :=new TREENODE'(LEFT.LEFT.all);0
MIDDLE.NAME :=LEFT.NAME;
MIDDLE := ALTERNATE(MIDDLE,RIGHT);
elsif RIGHT.VARIANT =ALT then
MIDDLE :=new TREENODE'(RIGHT.LEFT.all);
MIDDLE.NAME :=RIGHT.NAME;
LEFT :=ALTERNATE(LEFT,MIDDLE);
COMPLETEPAT(LEFT);
MIDDLE :=new TREENODE' (RIGHT.RIGHT.all);
MIDDLE.NAME := RIGHT.NAME;0
MIDDLE :=ALTERNATE(LEFT,MIDDLE);
else
LEFTTAIL TAIL(LEFT,MIDDLE_SET);
RIGHTTAIL TAIL(RIGHT,MIDDLESET);
if LEFTTAIL =EMPTYPATTERN then
RIGHTTAIL :=OPTION(RIGHT TAIL);
RIGHTTAIL.NAME :=RIGHT.NAME;-
COMPLETE OPT (RIGHT TAIL);
MIDDLE := CONCATENATE( new
TREE NODE' (RNG, LEFT. NAME,MIDDLESET, FALSE, FALSE),
RIGHT-TAIL);
elsif RIGHTTAIL - EMPTYPATTERN then
100
0
LEFTTAIL :=OPTION(LEFTTAIL);
LEFTTAIL.NAME :-LEFT.NAME;
COMPLETEOPT(LEFTTAIL);
MIDDLE :=CONCATENATE( new TREENODE' (RNG,RIGHT.NAME,
MIDDLESET,FALSE,FALSE), LEFT_TAIL )
elsif LEFT.VARIANT =CAT and then
(LEFT.LEFT.VARIANT =RNG and LEFT.RIGHT.VARIANT =OPT) then
RIGHTTAIL.NAME :=RIGHT.NAME;
if LEFT.NAME = ANONYMOUS then
MIDDLE =CONCATENATE( new TREE NODE' (RNG, LEFT. LEFT. NAME,
MIDDLE_SET,FALSE,FALSE),
ALTERNATE(LEFT.RIGHT,RIGHTTAIL) )
else
MIDDLE =CONCATENATE( new TREE NODE' (RNG,LEFT. NAME,
MIDDLE_SET,FALSE,FALSE),
ALTERNATE(LEFT-TAIL,RIGHTTAIL) )
end if;
elsif RIGHT.VARIANT = CAT and then
(RIGHT.LEFT.VARIANT = RNG and RIGHT.RIGHT.VARIANT = OPT) then
LEFTTAIL.NAME :=LEFT.NAME;
if RIGHT.NAME = ANONYMOUS then
MIDDLE =CONCATENATE( new TREE NODE' (RNG, RIGHT. LEFT. NAME,
MIDDLE_SET,FALSE,FALSE),
ALTERNATE(LEFT-TAIL,RIGHT.RIGHT) )
else
MIDDLE CONCATENATE( new TREE NODE' (RNG,RIGHT.NAME,
MIDDLE_SET,FALSE,FALSE),
ALTERNATE(LEFT TAIL,RIGHT TAIL) )
end if;
else
LEFTTAIL.NAME LEFT.NAME;
RIGHTTAIL.NAME RIGHT.NAME;
MIDDLE := CONCATENATE ( new TREE_-NODE' (RNG, ANONYMOUS,MIDDLESET,FALSE,FALSE),
ALTERNATE(LEFTTAIL,RIGHTTAIL) )
end if;
end if;
COMPLETE PAT (MIDDLE);
101
-- Restrict the non-overlapping parts of the pattern to their
-- respective subsets and reconstruct the complete pattern.
LEFT = RESTRICT(PAT.LEFT,LEFT_SET);
RIGHT = RESTRICT(PAT.RIGHT,RICHT_SET);
if LEFT = EMPTYPATTERN then
if RIGHT = EMPTYPATTERN then
PAT MIDDLE;
else
PAT ALTERNATE(MIDDLE,RIGHT);
COMPLETEPAT(PAT);
end if;
elsif RIGHT = EMPTYPATTERN then
PAT := ALTERNATE(LEFT,MIDDLE);
COMPLETEPAT(PAT);
else
PAT - ALTERNATE( LEFT, ALTERNATE(MIDDLE,RIGHTJ );
COMPLETEPAT(PAT); Send if;
PAT.NAME := NAME;
end RESOLVEAMBIGUITY;
begin -- COMPLETE ALT(PAT)
-- Complete the construction of an alternation pattern.
-- Make the remaining alternative optional if one side is empty.
if PAT.LEFT = EMPTYPATTERN then
if PAT.RIGHT - EMPTYPATTERN then •
PAT EMPTYPATTERN;
else
PAT OPTION(PAT.RIGHT);
COMPLETEOPT(PAT);
end if;
elsif PAT.RIGHT = EMPTYPATTERN then
PAT := OPTION(PAT.LEFT);
COMPLETEOPT(PAT);
else 0
COMPLETE PAT(PAT.LEFT);
COMPLETEPAT(PAT.RIGHT);
-- Combine the two selection sets and see if they overlap.
102
40
for CH in SELECTIONSET'RANGE loop
PAT.SELSET(CH) := PAT.LEFT.SEL_SET(CH) or PAT.RIGHT.SELSET(CH);
INTERSECT := INTERSECT or
(PAT.LEFT.SELSET(CH) and PAT.RIGHT.SELSET(CH));
end loop;
if INTERSECT then
RESOLVEAMBIGUITY(PAT);
else
-- If either alternative is optional simplify the pattern.
PAT.NULLABLE := PAT.LEFT.NULLABLE or PAT.RIGHT.NULLABLE;
if PAT.LEFT.VARIANT = OPT then
NAME := PAT.LEFT.NAME;
PAT.LEFT := new TREENODE'(PAT.LEFT.EXPR.all);
PAT.LEFT.NAME := NAME;
end if;
if PAT.RIGHT.VARIANT = OPT then
NAME := PAT.RIGHT.NAME;
PAT.RIGHT := new TREENODE'(PAT.RIGHT.EXPR.all);
PAT.RIGHT.NAME := NAME;
end if;
end if;
end if;
end COMPLETEALT;
procedure COMPLETECONCAT ( PAT: in out LLATTRIBUTE ) is
-- Complete the construction of a concatenation node.
-- Maintain the pattern in normal form for code generation.
SELSET: SELECTIONSET;
begin
if PAT.LEFT = EMPTYPATTERN then
PAT := PAT.RIGHT;
COMPLETE-PAT(PAT);
else
COMPLETEPAT(PAT.LEFT);
COMPLETEPAT(PAT.RIGHT);
-- Make concatenations right associative for code generation.
while PAT.LEFT.VARIANT = CAT loop
PAT.RIGHT := CONCATENATE(PAT.LEFT.RIGHT,PAT.RIGHT);
103
COMPLETECONCAT(PAT.RIGHT);
PAT.LEFT := PAT.LEFT.LEFT;
end loop;
if PAT.LEFT.NULLABLE then
case PAT.LEFT.VARIANT is
when ALT =>
PAT.LEFT := new TREE NODE'(PAT.LEFT.all);
PAT.LEFT.NULLABLE := FALSE;
PAT := ALTERNATE( CONCATENATE(PAT.LEFT,PAT.RIGHT),
PAT.RIGHT );
COMPLETE ALT(PAT);
when REP =>
for CH in SELECTIONSET'RANGE loop
PAT.SELSET(CH) := PAT.LEFT.SELSET(CH) or
PAT.RIGHT.SELSET(CH);
end loop;
PAT.NULLABLE := PAT.RIGHT.NULLABLE;
when OPT =>
PAT :- ALTERNATE( CONCATENATE(PAT.LEFT.EXPR,PAT.RIGHT),
PAT.RIGHT );
COMPLETE ALT(PAT);
when others =>
-- No other kinds of patterns should show up here.
PAT := BADPATTERN;
end case;
else
PAT.SELSET := PAT.LEFT.SELSET;
end if;
end if;
end COMPLETECONCAT;
procedure COMPLETE-OPT ( PAT: in out LLATTRIBUTE ) is
-- Complete the construction of an optional pattern.
-- Maintain the pattern in normal form for code generation.
-- Fill in the selection set and make the pattern nullable.
NAME: LLSTRINGS;
begin
COMPLETEPAT(PAT.EXPR);
104
case PAT.EXPR.VARIANT is
when ALT =>
NAME PAT.NAME;
PAT PAT.EXPR;
PAT.NAME := NAME;
when CAT I RNG =>
PAT.SELSET := PAT.EXPR.SELSET;
when others =>
-- No other kinds of patterns should show up here.
PAT := BADPATTERN;
return;
end case;
PAT.NULLABLE := TRUE;
end COMPLETEOPT;
begin -- COMPLETEPAT(PAT)
-- Complete the construction of an arbitrary pattern.
if PAT.SELSET = EMPTYPATTERN.SELSET then
-- It has not been completed yet.
case PAT.VARIANT is
when ALT =>
COMPLETEALT(PAT);
when BAD =>
null;
when CAT =>
COMPLETECONCAT(PAT);
when IDENT =>
-- Check for a recursive pattern reference.
if PAT.STRINGVAL = ROOTPATTERNNAME then
PUT(STANDARDERROR,"*** Pattern """);
EMITPATTERNNAME(STANDARDERROR,PAT.STRINGVAL);
PUT(STANDARDERROR,""" on line ");
PUT(STANDARDERROR,0,1);
PUTLINE(STANDARDERROR," is defined recursively.");
PAT :- EMPTYPATTERN;
else
-- Pick up the definition from the pattern table.
N :- LOOKUPPATTERN(PAT);
105
if N = 0 then
PUT(STANDARDERROR,"*** Pattern ...);
EMITPATTERNNAME(STANDARDERROR,PAT.STRING_VAL);
PUT(STANDARDERROR,""" refered to in line ");
PUT(STANDARDERROR,0,1);
PUTLINE(STANDARDERROR," is not defined.");
PAT EMPTYPATTERN;
else
PAT PATTERNTABLE(N);
COMPLETEPAT(PAT);
end if;
end if;
when LOOK =>
COMPLETEPAT(PAT.LEFT);
COMPLETEPAT(PAT.RIGHT);
for CH in SELECTIONSET'RANGE loop
PAT.SELSET(CH) := PAT.LEFT.SELSET(CH) or SPAT.RIGHT.SELSET(CH);
end loop;
when OPT =>
COMPLETEOPT(PAT);
when REP =>
COMPLETEPAT(PAT.EXPR);
PAT.SELSET := PAT.EXPR.SELSET;
when others =>
-- No other kinds of patterns should show up here.
PAT := BADPATTERN;
end case;
end if;
end COMPLETEPAT;
procedure COMPLETEPATTERNS is
-- Complete the construction of all the patterns defined.
begin
for I in 1 .. CURTABLEENTRIES loop
ROOTPATTERNNAME := PATTERNTABLE(I).NAME;
COMPLETE PAT( PATTERNTABLE(I) );
106
end loop;
end COMPLETEPATTERNS;
function CONCATENATE ( LEFT, RIGHT: in LLATTRIBUTE
return LLATTRIBUTE is
Concatenate two patterns.
- Create a concatenation node if the right term is not empty.
begin
if RIGHT = null or else RIGHT.VARIANT = BAD then
return LEFT;
elsif LEFT.VARIANT = BAD then
return RIGHT;
else
return new TREENODE'(CAT,ANONYMOUS,(others => FALSE),
FALSE,FALSE,LEFT,RIGHT);
end if;
end CONCATENATE;
function CVTASCII ( NAME: in LLATTRIBUTE
return LLATTRIBUTE is
-- Convert an ASCII character name into a character pattern.
CH: CHARACTER;
begin
if NAME.STRINGVAL(1..4) = "BEL " then
CH := ASCII.BEL;
elsif NAME.STRINGVAL(l. .3) = "BS " then
CH := ASCII.BS;
elsif NAME.STRINGVAL(1. .3) = "HT " then
CH := ASCII.HT;
elsif NAME.STRINGVAL(i. .3) = "LF " then
CH := ASCII.LF;
elsif NAME.STRINGVAL(l. .3) = "VT " then
CH :- ASCII.VT;
elsif NAME.STRINGVAL(l. .3) = "FF " then
CH := ASCII.FF;
elsif NAME.STRINGVAL(l. .3) = "CR " then
107
CH := ASCII.CR;
elsif NAME.STRINGVAL(l..4) = "ESC " then
CH := ASCII.ESC;
elsif NAME.STRINGVAL(l..4) = "DEL " then
CH := ASCII.DEL;
else
CH : ASCII.NUL;
end if;
return new TREENODE'(CHAR,ANONYMOUS,(others => FALSE),
FALSE,FALSE,CH);
end CVTASCII;
function CVTSTRING ( LIT: in LLATTRIBUTE ) return LLATTRIBUTE is
-- Convert a literal string into a pattern.
-- The string "ABC" becomes the concatenation 'A' 'B' 'C'.
LEFT, RIGHT: LLATTRIBUTE;
begin
if LIT.STRINGVAL(2) = '"' then
return EMPTYPATTERN;
else
LEFT := new TREENODE'(RNG,ANONYMOUS,(others => FALSE),FALSE,FALSE);
LEFT.SELSET(LIT.STRINGVAL(2)) := TRUE;
for I in 3 .. LLSTRINGS'LAST loop
exit when LIT.STRINGVAL(I) = '"';
RIGHT := new TREENODE'(RNG,ANONYMOUS,(others => FALSE),
FALSE,FALSE);
RIGHT.SELSET(LIT.STRINGVAL(I)) :- TRUE;
LEFT := CONCATENATE(LEFT,RIGHT);
end loop;
return LEFT;
end if;
end CVTSTRING;
procedure EMITADVANCEHDR is
-- Emit the beginning of the definition of procedure ADVANCE.
begin
108
NEWLINE;
PUTLINE(" procedure ADVANCE(EOS: out BOOLEAN;");
PUTLINE(" NEXT: out TOKEN;");
PUT_LINE(" MORE: in BOOLEAN := TRUE) is");
PUT_LINE(" begin");
PUT_LINE(" EOS := FALSE;");
PUTLINE(" loop");
PUTLINE(" SCANPATTERN;");
PUT_LINE(" case CURPATTERN is");
PUT_LINE(" when ENDOFINPUT =>");
PUT_LINE(" EOS := TRUE;");
PUTLINE(" return;");
PUT_LINE(" when ENDOF LINE => null;");
end EMITADVANCEHDR;
procedure EMITADVANCETLR is
-- Emit the end of the definition of procedure ADVANCE.
begin
PUTLINE(" end case;");
PUT_LINE(" end loop;");
PUTLINE(" end ADVANCE;");
NEWLINE;
end EMITADVANCETLR;
procedure EMITPKGDECLS is
-- Emit the declarations for the generated package body.
begin
NEWLINE;
PUTLINE(" BUFFERSIZE: constant := 100;");
PUTLINE(" subtype BUFFERINDEX is INTEGER range 1..BUFFERSIZE;");
NEWLINE;
-- Emit an enumerated type definition for the defined pattern names.
PUT_LINE(" type PATTERNID is (");
EMITTEDCHARS := 22;
for I in 1 .. CUR TABLEENTRIES loop
EMITPATTERNNAME(PATTERNTABLE(I).NAME);
109
PUT(', ');
EMITTEDCHARS EMITTEDCHARS + 1;
end loop;
if EMITTEDCHARS /= 0 then
NEWLINE;
end if;
PUTLINE(" ENDOFINPUT, ENDOFLINE, UNRECOGNIZED);");
NEW_LINE;
-- Emit the package variable declarations
PUTLINE(" CURLINENUM: NATURAL := 0;");
PUT_LINE(" CURPATTERN: PATTERNID := ENDOFLINE;");
PUTLINE(" STARTOFLINE: BOOLEAN;");
PUT_LINE(" CHARBUFFER: STRING(BUFFERINDEX);");
PUTLINE(" CURCHARNDX: BUFFERINDEX;");
PUT_LINE(" TOPCHAR NDX: BUFFERINDEX;");
NEW_LINE;
-- Emit the fixed procedure definitions
PUTLINE(" procedure SCANPATTERN; -- forward");
NEW_LINE;
PUTLINE(" function CURRENTSYMBOL return STRING is");
PUTLINE(" begin");
PUTLINE(" return CHARBUFFER(I..(CURCHARNDX-I));");
PUTLINE(" end;");
end EMITPKGDECLS;
procedure EMITPATTERNNAME ( FILE: in FILETYPE; NAME: in LLSTRINGS ) is
-- Write the name of a pattern to a specified file.
begin
for I in LLSTRINGS'RANGE loop
exit when NAME(I) = ' f;
PUT( FILE, NAME(I) );
end loop;
end EMITPATTERNNAME;
procedure EMITPATTERNNAME ( NAME: in LLSTRINGS ) is
-- Write the name of a pattern to the standard output file.
110
begin
for I in LLSTRINGS'RANGE loop
exit when NAME(I) = '
PUT( NAME(I) );
EMITTEDCHARS :- EMITTEDCHARS + 1;
end loop;
if EMITTEDCHARS > OUTPUTLINELIMIT then
NEWLINE;
EMITTEDCHARS := 0;
end if;
end EMITPATTERNNAME;
procedure EMITSCANPROC is
-- Generate the pattern-matching code for referenced patterns.
procedure EMITSELECT ( SELSET: in SELECTION-SET );
-- Generate an expression for the selection set SELSET.
procedure EMITPATTERNMATCH ( PAT: in out LLATTRIBUTE;
NAME: in LLSTRINGS;
SHOWNAME: in BGOLEAN;
PARENTNULLABLE: in BOOLEAN;
LOOKAHEAD: BOOLEAN ) is
-- Generate pattern-matching code from a normal-form pattern.
procedure EMITALTCASES( INITPAT: in out LLATTRIBUTE;
PARENTNULLABLE: in BOOLEAN ) is
-- Generate "when" clauses for an alternation pattern.
PAT: LLATTRIBUTE := INITPAT;
begin
while PAT.VARIANT = ALT loop
-- emit successive alternatives
PUT(" when ");
EMITSELECT(PAT.LEFT.SELSET);
PUTLINE(" ->");
111
if NAME = ANONYMOUS then
EMITPATTERNMATCH( PAT.LEFT, PAT.LEFT.NAME, SHOWNAME,
PARENTNULLABLE, LOOK-AHEAD );
else
EMITPATTERNMATCH( PAT.LEFT, NAME, SHOWNAME,
PARENTNULLABLE, LOOK-AHEAD );
end if;
INITPAT.COULDFAIL := INITPAT.COULDFAIL or
PAT.LEFT.COULDFAIL;
PAT := PAT.RIGHT;
end loop;
-- emit the last alternative
PUT(" when ");
EMITSELECT(PAT.SELSET);
PUTLINE(" =>");
if NAME = ANONYMOUS then
EMITPATTERNMATCH( PAT, PAT.NAME, SHOWNAME,
PARENTNULLABLE, LOOKAHEAD );
else
EMITPATTERNMATCH( PAT, NAME, SHOWNAME,
PARENTNULLABLE, LOOKAHEAD );
end if;
INITPAT.COULDFAIL :- INITPAT.COULDFAIL or PAT.COULDFAIL;
end EMITALTCASES;
procedure EMITCONCATRIGHT( SHOWNAME: in BOOLEAN ) is
-- Emit the right-hand part of a concatenation pattern.
procedure EMITCONCATCASES is
begin
case PAT.RIGHT.VARIANT is
when ALT I LOOK I OPT I REP =)
EMITPATTERNMATCH( PAT.RIGHT, ANONYMOUS, SHOWNAME,
PARENTNULLABLE, LOOKAHEAD ); •
when CAT I RNG =>
PUTLINE("case CURRENTCHAR is");
*PUT(" when ");
112
EMITSELECT(PAT.RIGHT.SEL SET);
PUTLINE(" =>");
if NAME = ANONYMOUS then
EMITPATTERNMATCH( PAT.RIGHT, PAT.RIGHT.NAME,
SHOWNAME, PARENTNULLABLE,
LOOKAHEAD and PAT.RIGHT.VARIANT = CAT );
else
EMITPATTERNMATCH( PAT.RIGHT, NAME, SHOWNAME,
PARENTNULLABLE,
LOOKAHEAD and PAT.RIGHT.VARIANT = CAT );
end if;
PUTLINE(" when others =>");
if PARENTNULLABLE then
PUTLINE(" CURCHARNDX := FALLBACKNDX;");
PUTLINE(" LOOKAHEADFAILED := TRUE;");
else
PUT LINE(" CURPATTERN := UNRECOGNIZED;");
end if;
PUTLINE(rend case;");
PAT.RIGHT.COULDFAIL := TRUE;
when others =>
-- No other kinds of patterns should show up here.
PUTLINE("CUR_PATTERN UNRECOGNIZED;");
PAT.RIGHT.COULDFAIL TRUE;
end case;
end EMITCONCATCASES;
begin -- EMITCONCATRIGHT(SHOW NAME)
-- Emit the right-hand part of a concatenation pattern.
if PAT.LEFT.COULDFAIL then
if PARENTNULLABLE then
PUTLINE("if not LOOKAHEADFAILED then");
else
PUTLINE("if CURPATTERN /= UNRECOGNIZED then");
end if;
EMITCONCATCASES;
PUTLINE("end if;");
else
113
EMITCONCATCASES;
end if;
end EMITCONCATRIGHT;
begin -- EMITPATTERNMATCH(PAT,NAME,SHOWNAME,
-- PARENTNULLABLE,LOOK_AHEAD)
-- Generate pattern-matching code from a normal-form pattern.
case PAT.VARIANT is
when ALT =>
PUTLINE("case CURRENTCHAR is");
EMITALTCASES( PAT, PAT.NULLABLE or PARENTNULLABLE )
PUT(" when others =>");
if PAT.NULLABLE then
PUT_LINE(" null;");
PUT_LINE("end case;");
if SHOWNAME and NAME /= ANONYMOUS then
PUT("CURPATTERN : )
EMITPATTERNNAME(NA4E);
PUTLINE(";");
end if;
else
NEW-LINE;
if PARENTNULLABLE then
PUTLINE(" CURCHARMDX :- FALLBACK MDX;");
PUTLINE(" LOOKAHEADFAILED :=TRUE;");
else
PUTLINE(" CURPATTERN :=UNRECOGNIZED;");
end if;
PAT.COULD FAIL := TRUE;
PUT_LINE("end case;");
end if;
when CAT )
if PAT.RIGHT.NULLABLE then
if NAM-E = ANONYMOUS thcn
EMITPATTERNMATCH( PAT. LEFT, PAT. LEFT. NAME, SHOWNAME,
PARENTMULLABLE, LOOKAHEAD);
EMITCONCATRIGHT(SHOWNAME);
else
114
EMITPATTERNMATCH(PAT. LEFT,NAME, SHOWNAME,
PARENTNULLABLE,I-OOK AHEAD);
EMITCONCATRIGHT(FALSE);
end if;
else
EMITPATTERNMATCH (PAT .LEFT, ANONYMOUS, FALSE,
PARENTNULLABLE,PARENTNULLABLE);
EMITCONCAT_RIGHT(SHOW NAME);
end if;
PAT.COULDFAIL :=PAT.LEFT.COULDFAIL or PAT.RIGHT.COULDFAIL;
when LOOK =>
PUT_LINE("case CURRENTCHAR is"t);
PUT(" when");
EMITSELECT(PAT.LEFT.SEL_SET);
PUT_LINE(" =>)");
PUT_LINE("1 LOOKAHEADNDX :=CUR CHARNDX;");
EMITPATTERN MATCH (PAT. LEFT, PAT. LEFT. NAME,
SHOW_-NANE,TRUE,FALSE);
PUT_LINE(" when others =>");
PUT_LINE(- LOOKAHEADFAILED :=TRUE;");
PUT_LINE("end case;");
PUT_LINE("1CURCHARNDX :=LOOKAHEADNDX;");
PUT_LINE("lif LOOKAHEADFAILED then");
PUT_-LINE(" case CURRENTCHAR is");
PUTLINE(" when");
EMITSELECT(PAT.LEFT. SEL_SET);
PUT_LINE(" =>");
EMITPATTERN MATCH(PAT. RIGHT,PAT. RIGHT. NAME,
SHOWNAME,FALSE,FALSE);
PUT_LINE(" when others=")
if PAT.RIGHT.NULLABLE then
PUTLINE(" null;");
else
PUT-LINE(" CURPATTERN :=UNRECOGNIZED;");
end if;
PUT_LINE(" end case;");
PUT_LINE("end if;");
PAT.COULDFAIL :=PAT.RIGHT.COULD_FAIL;
115
when OPT =>
PUT LINE( "case CURRENTCHAR is");
PUT(" when "); a
EMITSELECT(PAT.SELSET);
PUT LINE(" =>");
if NAME = ANONYMOUS then
EMITPATTERNMATCH(PAT.EXPR,PAT.NAME,
SHOWNAME,TRUE,LOOKAHEAD);
else
EMITPATTERNMATCH(PAT.EXPR,NAME,
SHOWNAME,TRUE,LOOKAHEAD);
end if;
PUTLINE(" when others => null;");
PUTLINE("end case;");
PAT.COULDFAIL := PAT.EXPR.COULDFAIL;
when REP =>
PUTLINE("loop");
PUTLINE(" case CURRENTCHAR is");
if PAT.EXPR.VARIANT = ALT then
EMITALTCASES(PAT.EXPR,TRUE);
else
PUT(" when ");
EMITSELECT(PAT.SEL_SET);
PUTLINE(" =>");
EMITPATTERNMATCH(PAT.EXPR,NAME,SHOWNAME,TRUE,LOOKAHEAD);
end if;
PUTLINE(" when others => exit;");
PUTJ.INE(" end case;");
if PAT.EXPR.COULDFAIL then
PUTLINE("exit when LOOKAHEAD_FAILED;");
PAT.COULDFAIL := TRUE;
end if;
PUTLINE("end loop;");
when RNG =>
if LOOKAHEAD then
PUT_LINE("LOOKAHEAD;");
else
PUTLINE("CHAR ADVANCE;");
116
• nm m m0
end if;
if SHOWNAME and NAME /= ANONYMOUS then
PUT("CURPATTERN := ");
EMITPATTERNNAME (NAME);
PUTLINE(";");
end if;
when others =>
-- No other kinds of patterns should show up here.
if PARENTNULLABLE then
PUTLINE("LOOKAHEADFAILED := TRUE;");
end if;
PUTLINE("CURPATTERN := UNRECOGNIZED;");
PAT.COULDFAIL := TRUE;
end case;
end EMITPATTERNMATCH;
procedure EMITSELECT ( SELSET: in SELECTIONSET ) is
-- Generate an expression for the selection set SELSET.
STATE: INTEGER range 0..3 := 0;
procedure EMITCHAR( CH: in CHARACTER ) is
begin
case CH is
when ASCII.BEL =)
PUT("ASCII.BEL");
when ASCII.BS =)
PUT("ASCII.BS");
when ASCII.HT =>
PUT("ASCII.HT");
when ASCII.LF =>
PUT("ASCII.LF");
when ASCII.VT =>
PUT("ASCII.VT");
when ASCII.FF =>
PUT("ASCII.FF");
when ASCII.CR =>
PUT("ASCII.CR");
when ASCII.ESC =>
117
PUT("ASCII.ESC");
when ' '..' '
PUT('''); PUT(CH); PUT(''');
when ASCII.DEL =>
PUT("ASCII.DEL");
when others =>
PUT("ASCII.NUL");
end case;
end;
begin
for CH in SELECTIONSET'RANGE loop
case STATE is
when 0 =>
-- Initial state, looking for selection set characters.
if SELSET(CH) then
EMITCHAR(CH);
STATE := 1;
end if;
when 1 =>
-- Have produced first character, is it a range?
if SELSET(CH) thenPUT("..");
STATE 2;
else
STATE 3; 0end if;
when 2 =>
-- Have produced first char and "..", looking for end char.
if not SEL SET(CH) then
EMITCHAR(CHARACTER'PRED(CH));
STATE := 3;
end if;
when 3 =>
-- Have produced one or more alt. terms, looking for more.
if SELSET(CH) then
PUT(" I ");
EMITCHAR(CH);
118
Ii
STATE 1;
end if;
end case;
end loop;
-- Check for a possible loose end.
if STATE = 2 then
EMITCHAR(SELSET'LAST);
end if;
end EMITSELECT;
begin -- EMITSCANPROC
NEWLINE;
-- Generate the pattern-matching code for referenced patterns.
PUT-LINE(" procedure SCANPATTERN is");
NEWLINE;
PUTLINE(" CURRENTCHAR: CHARACTER;");
PUT-LINE(" ENDOFINPUTSTREAM: BOOLEAN;");
PUTLINE(" LOOKAHEADFAILED: BOOLEAN : FALSE;");
PUTLINE(" FALLBACKNDX: BUFFER-INDEX 1;");
PUTLINE(" LOOKAHEADNDX: BUFFERINDEX;");
NEWLINE;
PUTLINE(" procedure CHARADVANCE is");
PUTLINE(" begin");
PUTLINE(" CURCHARNDX : CURCHARNDX+I;");
PUTLINE(" FALLBACKNDX : CURCHARNDX;");
PUT LINE(" if CURCHARNDX <= TOPCHARNDX then");
-- take the next character from the buffer
PUTLINE(" CURRENTCHAR := CHARBUFFER(CUR CHARNDX);");
PUTLINE(" else");
-- fetch the next character and put it in the buffer
PUTLINE(" GETCHARACTER(ENDOFINPUTSTREAM,CURRENTCHAR);");
PUT-LINE(" if ENDOFINPUTSTREAM then");
PUT LINE(" CURRENTCHAR := ASCII.etx;");
PUTLINE(" end if;");
PUTLINE(" CHARBUFFER(CUR CHAR NDX) :- CURRENTCHAR;");
PUTLINE(" TOPCHARNDX := CURCHARNDX;");
PUT-LINE(" end if;");
PUTLINE(" end;");
119
NEWLINE;
PUTLINE(" procedure LOOKAHEAD is");
PUT_LINE(" begin");
PUTLINE(" CURCHARNDX := CURCHARNDX+I;");
PUTLINE(" if CURCHARNDX <= TOPCHARNDX then");
-- take the next character from the buffer
PUTLINE(" CURRENTCHAR := CHARBUFFER(CURCHAR NDX);");
PUTLINE(" else");
-- fetch the next character and put it in the buffer
PUTLINE(" GETCHARACTER(END OFINPUTSTREAM,CURRENTCHAR);");
PUTLINE(" if ENDOFINPUTSTREAM then");
PUT_LINE(" CURRENTCHAR := ASCII.etx;");
PUTLINE(" end if;");
PUTLINE(" CHARBUFFER(CUR CHAR NDX) := CURRENTCHAR;");
PUTLINE(" TOPCHARNDX := CURCHARNDX;");
PUTLINE(" end if;");
PUTLINE(" end;");
NEWLINE;
PUTLINE(" begin");
if LEXICON = null then
PUTLINE(STANDARDERROR,
"*** No patterns were referenced for code generation.");
PUTLINE(" CURPATTERN := UNRECOGNIZED;");
else
PUT LINE(" STARTOFLINE := CURPATTERN = END OFLINE;");
PUTLINE(" if STARTOFLINE then");
PUTLINE(" CURLINENUM CURLINENUM+I;");
PUTLINE(" TOPCHARNDX := 1;");
PUTLINE(" GETCHARACTER(ENDOFINPUTSTREAM,CHARBUFFER(1));");
PUTLINE(" if ENDOFINPUTSTREAM then");
PUTLINE(" CHARBUFFER(l) := ASCII.etx;");
PUTLINE(" end if;");
PUTLINE(" else");
-- shift the buffer contents forward
PUT-LINE(" TOPCHARNDX :- TOPCHARNDX-CURCHARNDX+1;");
PUT LINE(" for N in 1..TOPCHARNDX loop");
PUTLINE(" CHARBUFFER(N) :- CHARBUFFER(N+CURCHARNDX-1);");
PUTLINE(" end loop;");
120
PUT LINE(" end if;");
PUT LINE(" CURCHARNDX 1;)
PUT LINE(" CURRENTCHAR CHARBUFFER(l);");
PUT LINE(" case CURRENTCHAR is");
PUT LINE(" when ASCII.etx =>");
PUTLINE(" CURPATTERN :=END_-OFINPUT;");
PUT -LINE(" when ASCII.lf. .ASCII.cr =>");
PUT LINE(" CURPATTERN :=END_-OF_-LINE;");
while LEXICON.VARIANT = ALT loop
-- Emit successive alternatives.
PUT(" when ");
* EMITSELECT(LEXICON.LEFT.SELSET);
PUTLINE(" =>");
it LEXICON.NAME = ANONYMOUS then
EMITPATTERN -MATCH( LEXICON. LEFT, LEXICON. LEFT. NAME,
TRUE, FALSE, FALSE);
* else
EMITPATTERN MATCH (LEXICON. LEFT, LEXICON. NAME,
TRUE,FALSE, FALSE);
LEXICON.RIGHT.NAME :=LEXICON.NAME;
end if;
LEXICON :=LEXICON.RIGHT;
end loop;
-- Emit the last alternative.
PUT(-' when ");
* EMITSELECT(LEXICON.SELSET);
PUTLINE(" =>");
EMITPATTERNMATCH(LEXICON, LEXICON. NAME, TRUE, FALSE, FALSE);
PUTLINE(" when others =>");
PUTLINE(" CHARADVANCE;");
PUTLINE(" CURPATTERN :=UNRECOGNIZED;");
PUTLINE(" end case;");
end if;
PUT-LINE(" end;");
* NEWLINE;
end EMITSCANPROC;
121
procedure EMITTOKEN( TOKEN: in LLATTRIBUTE ) is
-- Emit an identifier or literal token value.
begin
case TOKEN.VARIANT is
when CHAR =)
PUT(''' );
PUT(TOKEN.CHARVAL);P U T ( ' ' ) ; O
EMITTEDCHARS := EMITTEDCHARS + 3;
when IDENT I-LIT =>
if TOKEN.VARIANT = IDENT then
-- Precede it with a blank.
PUT(' ');
EMITTEDCHARS := EMITTEDCHARS + 1;
elsif TOKEN.STRINGVAL(l) ';' then
PUTLINE(";");
EMITTEDCHARS := 0;
return;
end if;
for I in LLSTRINGS'RANGE loop
exit when TOKEN.STRINGVAL(I) = ' ';
PUT( TOKEN.STRINGVAL(I) );
EMITTEDCHARS := EMITTEDCHARS + 1;
end loop;
when STR =>
PUT(.",); •
EMITTEDCHARS := EMITTEDCHARS + 1;
for I in 2 .. LLSTRINGS'LAST loop
PUT(TOKEN.STRINGVAL(I));
EMITTEDCHARS := EMITTEDCHARS + 1;
exit when TOKEN.STRINGVAL(I) =
end loop;
when others =>
-- No other kinds of patterns should show up here.
null; 0end case;
if EMITTEDCHARS > OUTPUTLINELIMIT then
NEWLINE;
122
EMITTEDCHARS := 0;
end if;
end EMITTOKEN;
procedure INCLUDEPATTERN( PATID: in LLATTRIBUTE ) is
-- Include a referenced pattern for code generation.
-- Global variable LEXICON holds the complete definition of all
-- patterns encountered so far in the actions part of a lexical
-- analyzer specification.
N: INTEGER range 0 .. PATTERNTABLESIZE;
begin
N := LOOKUPPATTERN(PATID);
if N = 0 then
PUT(STANDARDERROR,"*** Pattern """);
EMITPATTERNNAME(STANDARDERROR,PATID.STRINGVAL);
PUT(STANDARDERROR,""" called for in line ");
PUT(STANDARDERROR,0,1);
PUTLINE(STANDARDERROR," is not defined.");
else
LEXICON := ALTERNATE(PATTERNTABLE(N),LEXICON);
COMPLETE_PAT(LEXICON);
end if;
end INCLUDEPATTERN;
function LOOKAHEAD ( PAT: in LLATTRIBUTE ) return LLATTRIBUTE is
-- Create a look-ahead pattern.
begin
return new TREENODE'(LOOK,ANONYMOUS,(others => FALSE),
FALSE,FALSE,PAT,BADPATTERN);
end LOOKAHEAD;
function LOOKUPPATTERN ( PATID: in LLATTRIBUTE ) return INTEGER is
-- Return the index of the named pattern in the pattern table.
begir
for I in 1 .. CURTABLEENTRIES loop
123
if PAT ID.STRINGVAL = PATTERN-TABLE(I).NAME then
-- You found it.
return I;
end if;
end loop;
-- If the name is not in the table then
return 0;
end LOOKUPPATTERN;
function OPTION ( PAT: in LLATTRIBUTE ) return LLATTRIBUTE is
-- Form an optional pattern.
begin
case PAT.VARIANT is
when ALT I CAT I IDENT I RNG =>
return new TREENODE'(OPT,ANONYMOUS,(others => FALSE),
TRUE,FALSE,PAT); 0when OPT I REP =>
-- Just copy the original node.
return new TREENODE'(PAT.all);
when others =>
-- No other kinds of patterns should show up here.
return BADPATTERN;
end case;
end OPTION;
function REPEAT ( PAT: in LLATTRIBUTE ) return LLATTRIBUTE is
-- Form a repetition pattern.
begin
case PAT.VARIANT is
when ALT I CAT I IDENT I RNG =>
return new TREENODE'(REP,ANONYMOUS,(others => FALSE),
TRUE,FALSE,PAT);
when OPT I REP =>
-- Simplify the repeated pattern.
return new TREENODE'(REP,ANONYMOUS,(others => FALSE),
TRUE,FALSE,PAT.EXPR);
124
when others =>
-- No other kinds of patterns should show up here.
return BADPATTERN;
end case;
end REPEAT;
procedure STOREPATTERN ( PATID, PAT: in LLATTRIBUTE ) is
-- Store a pattern definition in the pattern table.
-- Patterns are stored in alphabetical order by name.
begin
if CURTABLEENTRIES = PATTERNTABLESIZE then
-- I guess I didn't make the table big enough.
raise PATTERNTABLEFULL;
end if;
for I in 1 .. CURTABLEENTRIES loop
if PAT ID.STRINGVAL < PATTERNTABLE(I).NAME then
-- Insert the name here.
for K in reverse I .. CURTABLEENTRIES loop
PATTERNTABLE(K+l) := PATTERNTABLE(K);
end loop;
PATTERNTABLE(I) := PAT;
PATTERNTABLE(I).NAME := PATID.STRINGVAL;
CURTABLEENTRIES := CURTABLEENTRIES + 1;
return;
elsif PATID.STRINGVAL = PATTERN TABLE(I).NAME then
-- Combine this definition with the previous one(s).
PATTERNTABLE(I) := ALTERNATE( PAT, PATTERNTABLE(I) );
PATTERNTABLE(I).NAME := PAT ID.STRINGVAL;
return;
end if;
end loop;
CURTABLEENTRIES := CURTABLEENTRIES + 1;
PATTERNTABLE(CUR TABLEENTRIES) := PAT;
PATTERNTABLE(CUR TABLEENTRIES).NAME := PATID.STRINGVAL;
end STOREPATTERN;
end LLSUPPORT;
125
126
APPENDIX C
Lexical Analyzer Test Data
This appendix contains listings of programs and data used to test the lexical
analyzer generator. The first listing is the test driver program used to exercise generated
code. Following this are three test cases. Test #1 is a simple test of the code-generation
templates. Test #2 exercises the generator's handling of look-ahead and conversion of
patterns into canonical form. Test #3 is a test of the lexical analyzer used to replace the
generator's bootstrap analyzer. The analyzer specification and generated code for Test
#3 are given in Appendix A. Input for Test #3 was its own specification from Appendix
A.
Contents:
Lexical analyzer test driver program .......................................... 128
Specification for test #1 .......................................................... 131
Code generated for test #1 ...................................................... 133Input and output data for test #1 ............................................... 138Specification for test #2 .......................................................... 140
Code generated for test #2 ...................................................... 142Input and output data for test #2 ............................................... 149
Output data for test #3 ........................................................... 151
127
with INTEGERTEXTIO, TEXT_IO;
procedure TESTDRIVER is
-- This procedure is a simple test program for exercising code
-- produced by the Lexical Analyzer Generator.
use INTEGERTEXT_IO, TEXT_IO;
type TOKENTYPE is
(ALT, CAT, CHAR, DOTS, IDENT, KEY, LIT, NOTMINE,
NUMBER, OPERATOR, OPT, REP, RNG, SPECIAL, STR);
subtype SHORTSTRING is STRING(l..12);
type TOKEN is
record
KIND: TOKENTYPE;
PRINTVALUE: SHORTSTRING;
LINENUMbER: INTEGER;
end record;
EOS: BOOLEAN;
TOK: TOKEN;
procedure GETCHARACTER( EOS: out BOOLEAN;
NEXT: out CHARACTER;
MORE: in BOOLEAN := TRUE ) is
-- Produce input characters for the lexical analyzer.
begin
if ENDOFFILE(STANDARDINPUT) then
EOS := TRUE;
elsif ENDOF LINE(STANDARDINPUT) then
SKIPLINE(STANDARDINPUT);
EOS : FALSE;
NEXT : ASCII.CR;
else
EOS : FALSE;
128
i
GET(STANDARDINPUT,NEXT);
end if;
end;
function MAKETOKEN(KIND: TOKENTYPE; SYMBOL: STRING; LINENUMBER: NATURAL
return TOKEN is
-- construct a token value from input lexical information
function CVTSTRING( STR: in STRING ) return SHORTSTRING is
-- Convert an arbitrary-length string to a fixed length string.
RESULT: SHORTSTRING;
begin
for I in SHORTSTRING'RANGE loop
if I <= STR'LAST then
RESULT(I) : STR(I);
else
RESULT(I) : '
end if;
end loop;
return RESULT;
end;
begin
return TOKEN'(KIND, CVT_STRING(SYMBOL), LINENUMBER);
end;
package TOKENSTREAM is
procedure ADVANCE(EOS: out BOOLEAN;
NEXT: out TOKEN;
MORE: in BOOLEAN := TRUE);
end TOKENSTREAM;
package body TOKENSTREAM is separate;
129
begin
loop
TOKENSTREAM.ADVANCE(EOS,TOK);
exit when LOS;
PUT(TOR. PRINTVALUE);
PUT(" ");
PUT(TOK. LINENUMBER);
PUT(" ") ;
case TOK.KIND is
when ALT => PUT("Alternation");
when CAT => PUT("Concatenation");
when CHAR => PUT("Character");
when DOTS => PUT("Dots");
when IDENT => PUT("Identifier");
when KEY => PUT("Keyword");
when LIT => PUT("Literalu);
when NOTMINE => PUT("Unrecognized");0
when NUMBER => PUT("Nuinber");
when OPERATOR => PUT("Operator");
when OPT => PUT("Option");
when REP => PUT("Repetition");
when RNG => PUT("Range");
when SPECIAL => PUTQ'Special Symbol");
when STR => PUT("String");
end case;
NEWLINE;0
end loop;
end TESTDRIVER;
130
separate ( TESTDRIVER
lexicon TOKENSTREAM is
-- The following patterns test the lexical analyzer
-- generator's basic code generation templates.
patterns
Alternate ::= Letter I Digit
Concat '&' Digit ;
Digit : 0'..91
Letter 'A'..'Z' Ia'..'z'
Option '' [ Digit
Repetition '*' [ Digit ]
actions
when Alternate =>
NEXT := MAKETOKEN(ALT,CURRENT_SYMBOL,CURLINENUM);
return;
when Concat =>
NEXT := MAKETOKEN(CAT,CURRENTSYMBOL,CURLINENUM);
return;
when Option =>
NEXT := MAKETOKEN(OPT,CURRENTSYMBOL,CURLINENUM);
return;
when Repetition =>
NEXT := MAKETOKEN(REP,CURRENT_SYMBOL,CURLINENUM);
return;
131
when others =>
NEXT := MAKE TOKEN(NOTMINE,CURRENTSYMBOL,CURLINENUM);
return;
end TOKENSTREAM;
132
separate ( TESTDRIVER
package body TOKENSTREAM is
BUFFERSIZE: constant := 100;
subtype BUFFERINDEX is INTEGER range 1..BUFFERSIZE;
type PATTERNID is
(Alternate,Concat,Digit,Letter,Option,Repetition,
ENDOFINPUT, ENDOFLINE, UNRECOGNIZED);
CURLINENUM: NATURAL := 0;
CURPATTERN: PATTERNID : ENDOFLINE;
STARTOFLINE: BOOLEAN;
CHARBUFFER: STRING(BUFFERINDEX);
CURCHARNDX: BUFFERINDEX;
TOPCHARNDX: BUFFERINDEX;
procedure SCANPATTERN; -- forward
function CURRENTSYMBOL return STRING is
begin
return CHARBUFFER(I..(CURCHARNDX-1));
end;
procedure ADVANCE(EOS: out BOOLEAN;
NEXT: out TOKEN;
MORE: in BOOLEAN := TRUE) is
begin
EOS := FALSE;
loop
SCANPATTERN;
case CURPATTERN is
when ENDOFINPUT =>
EOS := TRUE;
return;
when ENDOFLINE => null;
when Alternate =>
133
NEXT: = MAKETOKEN( ALT, CURRENT-SYMBOL, CURLINENUM);
return;
when Concat =>
NEXT:= MAKETOKEN( CAT, CURRENTSYMBOL, CURLINE NUM);
return;
when Option =>
NEXT:= MAKETOKEN( OPT, CURRENT-SYMBOL, CURLINENUM);
return; •
when Repetition =>
NEXT:= MAKETOKEN( REP, CURRENTSYMBOL, CURLINENUM);
return;
when others =>
NEXT:= MAKETOKEN( NOT-MINE, CURRENT_SYMBOL, CURLINENUM);
return;
end case;
end loop;
end ADVANCE; 0
procedure SCANPATTERN is
CURRENTCHAR: CHARACTER;
ENDOFINPUTSTREAM: BOOLEAN;
LOOKAHEADFAILED: BOOLEAN : FALSE;
FALLBACKNDX: BUFFERINDEX : 1;
LOOKAHEADNDX: BUFFERINDEX; •
procedure CHARADVANCE is
begin
CURCHARNDX := CURCHARNDX+i; •
FALLBACKNDX : CURCHARNDX;
if CURCHARNDX <= TOPCHARNDX then
CURRENTCHAR := CHARBUFFER(CURCHARNDX);
else
GETCHARACTER(ENDOFINPUTSTREAM,CURRENTCHAR); 0
if ENDOFINPUTSTREAM then
CURRENTCHAR := ASCII.etx;
end if;
134
CHARBUFFER(CUR CHAR - DX) :- CURRENTCHAR;
TOPCHARMDX :=CURCHARMDX;
* end if;
end;
procedure LOOKAHEAD is
begin
0CURCHARMDX :=CURCHAR_-NDX+1;
if CURCHARMDX <=TOPCHARMDX then
CURRENTCHAR =CHARBUFFER (CUR CHAR MDX) ;
elise
* GETCHARACTER (ENDOFINPUTSTREAM, CURRENT CHAR);
if ENDOFINPUTSTREAM then
CURRENT-CHAR :=ASCII.etx;
end if;
CHARBUFFER(CUR CHAR M DX) :=CURRENTCHAR;
* TOPCHARMDX :=CURCHARMDX;
end if;
end;
begin
STARTOFLINE :=CURPATTERN = ENDOFLINE;
if STARTOFLINE then
CURLINENUN CURLINENU-Ii;
TOPCHARMDX 1;
* GETCHARACTER(ENDOFINPUTSTREAM,CHARBUFFER());
if ENDOFINPUTSTREAM then
CHARBUFFER(l) :=ASCII.etx;
end if;
else
TOPCHARMDX :-TOPCHARMDX-CURCHARNDX+1;
for N in 1. .TOPCHARM DX loop
CHARBUFFER(M) :- CHARBUFFER(N+CURCHARNDXl1);
end loop;
* end if;
CURCHARMDX :1;
CURRENTCHAR :CHARBUFFER(1);
case CURRENTCHAR is
135
when ASCII.etx =>
CURPATTERN := ENDOFINPUT;
when ASCII.lf..ASCII.cr =>
CURPATTERN := ENDOFLINE;
when '*' => -- Code for repetition pattern
CHARADVANCE;
CURPATTERN := Repetition;
loop 0case CURRENTCHAR is
when '0'..'9' =>
CHARADVANCE;
when others => exit;
end case;
end loop;
when '' -- Code for option pattern
CHARADVANCE;
CURPATTERN := Option; 0case CURRENTCHAR is
when '0'..'9' =>
CHARADVANCE;
when others -> null;
end case;
when '&' => -- Code for concatenation pattern
CHARADVANCE;
case CURRENTCHAR is
when '0'..'9' =>
CHARADVANCE;
CURPATTERN Concat;
when others =>
CURPATTERN UNRECOGNIZED;
end case;
when 'A'..'Z' I 'a'..'z' > -- Code for alternation and range
CHARADVANCE;
CURPATTERN := Alternate;
when '0'..'9' => •
CHARADVANCE;
CURPATTERN := Alternate;
when others -)
136
CHARADVANCE;
CURPATTERN UNRECOGNIZED;
end case;
end;
end TOKENSTREAM;
0
137
S
INPUT:
A&1*
1&2 3*4567890
abc&l&2&3 4 5**6*7890/&/ /*/
OUTPUT:
A 1 Alternation
&l 1 Concatenation
1 Option
1 Repetition
1 2 Alternation
&2 2 Concatenation
3 2 Option
4567890 2 Repetition
a 3 Alternation
b 3 Alternation
C 3 Alternation
61 3 Concatenation
&2 3 Concatenation
&3 3 Concatenation
4 3 Option
3 Option
5 3 Option
• 3 Repetition
*6 3 Repetition
*7890 3 Repetition
/ 4 Unrecognized
& 4 Unrecognized
138
/ 4 Unrecognized
4 Option
/ 4 Unrecognized
* 4 Repetition
/ 4 Unrecognized
139
separate ( TEST DRIVER
lexicon TOKENSTREAM is
-- The following specification tests the lexical analyzer
-- generator's handling of look-ahead, conversion of patterns
-- into canonical form, and pattern simplifications.
patterns
Digit '0'..'9 '
Dots '.' I
Identifier ='A'..'Z' [ Digit
Keyword "FOR" I "GO" I "IF" I "LET" I "NEXT"
Number Digit f [''] Digit I ;
actions
when Dots =>
NEXT := MAKE TOKEN(DOTS,CURRENTSYMBOL,CURLINENUM);
return;
when Identifier =>
NEXT := MAKE TOKEN(IDENT,CURRENT_SYMBOL,CURLINENUM);
return;
when Keyword =>
NEXT := MAKE TOKEN(KEY,CURRENTSYMBOL,CURLINE NUM);
return;
when Number => 0NEXT := MAKE TOKEN(NUMBER,CURRENTSYMBOL,CURLINENUM);
return;
1
m mmmmmmmmmmmmmmmmm lli N140
when others =>
NEXT :=MAKE TOKEN(NOTMINE,CURRENT_SYMBOL,CURLINEMUM);
* return;
end TOKENSTREAM;
14
separate ( TESTDRIVER )
package body TOKENSTREAM is
BUFFERSIZE: constant = 00;
subtype BUFFERINDEX is INTEGER range 1..BUFFERSIZE;
type PATTERNID is •
(Digit,Dots,Identifier,Keyword,Number,
ENDOFINPUT, ENDOFLINE, UNRECOGNIZED);
CURLINENUM: NATURAL := 0;
CURPATTERN: PATTERNID := ENDOFLINE;
STARTOFLINE: BOOLEAN;
CHARBUFFER: STRING(BUFFERINDEX);
CURCHARNDX: BUFFERINDEX;
TOPCHARNDX: BUFFERINDEX;
procedure SCANPATTERN; -- forward
function CURRENTSYMBOL return STRING is
begin
return CHARBUFFER(1..(CURCHARNDX-1));
end;
procedure ADVANCE(EOS: out BOOLEAN;
NEXT: out TOKEN;
MORE: in BOOLEAN := TRUE) is
begin
EOS : FALSE;
loop
SCANPATTERN;
case CURPATTERN is
when ENDOFINPUT =>
EOS := TRUE; 9return;
when ENDOFLINE => null;
when Dots =)
142
! ! ! • m I m a D
NEXT:= MAKETOKEN( DOTS, CURRENTSYMBOL, CURLINENUM);
return;
when Identifier =)
NEXT:= MAKETOKEN( IDENT, CURRENTSYMBOL, CURLINE NUM);
return;
when Keyword =>
NEXT:= MAKETOKEN( KEY, CURRENTSYMBOL, CURLINENUM);
return;
when Number =)
NEXT:= MAKETOKEN( NUMBER, CURRENTSYMBOL, CURLINENUM);
return;
when others =>
NEXT:= MAKETOKEN( NOT-MINE, CURRENT_SYMBOL, CURLINENUM);
return;
end case;
end loop;
end ADVANCE;
procedure SCANPATTERN is
CURRENTCHAR: CHARACTER;
ENDOFINPUTSTREAM: BOOLEAN;
LOOKAHEADFAILED: BOOLEAN := FALSE;
FALLBACKNDX: BUFFERINDEX : 1;
LOOKAHEADNDX: BUFFERINDEX;
procedure CHARADVANCE is
begin
CURCHARNDX : CURCHARNDX+I;
FALLBACKNDX : CURCHARNDX;
if CURCHARNDX <= TOPCHARNDX then
CURRENTCHAR := CHARBUFFER(CURCHARNDX);
else
GETCHARACTER(END OFINPUTSTREAM,CURRENTCHAR);
if ENDOFINPUTSTREAM then
CURRENTCHAR := ASCII.etx;
end if;
143
CHARBUFFER(CUR -CHARM DX) :=CURRENTCHAR;
TOPCHARMDX :=CURCHARMDX;
end if;
end;
procedure LOOKAHEAD is
begin
CURCHARMDX :=CURCHARNDX+1;
if CURCHARNDX <= TOPCHARNDX then
CURRENTCHAR :=CHARBUFFER(CUR CHARMDX);
else
GETCHARACTER (ENDOFINPUTSTREAM, CURRENT CHAR) ;
if END OFINPUTSTREAM then
CURRENTCHAR :=ASCII.etx;
end if;
CHARBUFFER(CUR CHARM DX) :=CURRENTCHAR;
TOPCHARMDX -=CURCHARMDX;
end if;
end;
begin
STARTOFLINE :=CURPATTERN = ENDOFLINE;
if STARTOFLIME then
CURLINENUM CURLINEW4+l;
TOPCHARMDX 1;
GETCHARACTER(ENDOFINPUTSTREAM,CHARBUFFER());
if ENDOFINPUTSTREAM then
CHARBUFFER(1) :=ASCII.etx;
end if;
else
TOPCHARMDX :=TOPCHARMDX-CURCHARNDX+l;
for N in 1. .TOPCHARMDX loop
CHARBUFFER(N) :=CHARBUFFER(N+CURCHAR NDX-l);
end loop;
end if;
CURCHARMDX 1;
CURRENTCHAR CHARBUFFER(l);
case CURRENTCHAR is
144
when ASCII.etx =>
CURPATTERN ENDOFINPUT;
when ASCII.If..ASCII.cr =>
CURPATTERN := ENDOFLINE;
when '0'..'9' =)
CHARADVANCE;
CURPATTERN := Number;
loop
case CURRENTCHAR is
when '0'..'9' => -- Code for numbers
CHARADVANCE;
when '' =>
LOOKAHEAD;
case CURRENTCHAR is
when '0'..'9' =>
CHARADVANCE;
when others =>
CURCHARNDX := FALLBACKNDX;
LOOKAHEADFAILED := TRUE;
end case;
when others => exit;
end case;
exit when LOOKAHEAD_FAILED;
end loop;
when 'A'..'E' I 'H' I 'J'..'K' ,M' I 'O..'Z' =>
CHARADVANCE;
CURPATTERN := Identifier; -- Code for identifiers
case CURRENTCHAR is
when '0'..'9' =>
CHARADVANCE;
when others => null;
end case;
when '.' => -- Code for dots
CHARADVANCE;
CURPATTERN :- Dots;
case CURRENTCHAR is
when '.' =>
CHARADVANCE;
145
when others -> null;
end case;
when 'F' =) -- Code for keyword "FOR"
CHARADVANCE;
CURPATTERN := Identifier;
case CURRENTCHAR is
when '0'..'9' =>
CHARADVANCE;
when '0' =>
LOOKAHEAD;
case CURRENTCHAR is
when 'R' =>
CHARADVANCE;
CURPATTERN = Keyword;
when others =>
CURCHARNDX FALLBACKNDX;
LOOKAHEADFAILED := TRUE;
end case;
when others => null;
end case;
when 'G' => -- Code for keyword "GO"
CHARADVANCE;
CURPATTERN := Identifier;
case CURRENTCHAR is
when '0' =>
CHARADVANCE;
CURPATTERN := Keyword;
when '0'..'9' =>
CHARADVANCE;
when others => null;
end case;
when 'I' => -- Code for keyword "IF"
CHARADVANCE;
CURPATTERN := Identifier
case CURRENT_CHAR is 0when 'F' ->
CHARADVANCE;
CURPATTERN :- Keyword;
146
when '0'..' 9 1 =>
CHARADVANCE;
when others => null;
end case;
when 'N' => -- Code for keyword "NEXT"
CHARADVANCE;
CURPATTERN := Identifier;
case CURRENTCHAR is
when 'E' =>
LOOKAHEAD;
case CURRENTCHAR is
when 'X' =>
LOOKAHEAD;
case CURRENTCHAR is
when 'T' =>
CHARADVANCE;
CURPATTERN Keyword;
when others =>
CURCHARNDX FALLBACKNDX;
LOOKAHEADFAILED := TRUE;
end case;
when others =>
CURCHARNDX - FALLBACKNDX;
LOOKAHEADFAILED := TRUE;
end case;
when '0'..'9' =>
CHARADVANCE;
when others => null;
end case;
when 'L' => -- Code for keyword "LET"
CHARADVANCE;
CURPATTERN := Identifier;
case CURRENTCHAR is
when '0'..'9' =>
0 CHARADVANCE;
when 'E' ->
LOOKAHEAD;
case CURRENTCHAR is
147
when 'T' =>
CHARADVANCE;
CURPATTERN Keyword;
when others =>
CURCHARNDX FALLBACKNDX;
LOOKAHEADFAILED := TRUE;
end case;
when others => null;
end case;
when others =>
CHARADVANCE;
CURPATTERN UNRECOGNIZED;
end case;
end;
end TOKENSTREAM;
148
n I I I I I I I
INPUT:
.AFOR1..Z8IF123
A1B2C3 ..... 123 456 789
FORGOIFLETNEXT
FO1G2I3LE4NEX5
OUTPUT:
* 1 Dots
A 1 Identifier
FOR 1 Keyword
1 1 Number
1 Dots
Z8 1 Identifier
IF 1 Keyword
123 1 Number
Al 2 Identifier
* B2 2 Identifier
C3 2 Identifier
2 Dots
2 Dots
2 Dots
123 2 Number
2 Unrecognized
456 2 Number
2 Unrecognized
* 789 2 Number
FOR 3 Keyword
GO 3 Keyword
IF 3 Keyword
149
LET 3 Keyword
NEXT 3 Keyword
F 4 Identifier
01 4 Identifier
G2 4 Identifier
13 4 Identifier
L 4 Identifier
E4 4 Identifier
N 4 Identifier
E 4 Identifier
X5 4 Identifier
150
separate 2 Identifier
2 Literal
LLCOMPILE 2 Identifier
2 Literal
lexicon 4 Identifier
LLTOKENS 4 Identifier
is 4 Identifier
patterns 9 Identifier
Graphic Char 11 Identifier
= 11 Literal
11 Literal
;11 Literal
Letter 13 Identifier
*:= 13 Literal
13 Literal
13 Literal
.. 13 Literal
13 Literal
Digit 15 Identifier
::= 15 Literal
15 Literal
; 15 Literal
Letter or Di 17 Identifier
17 Literal
Letter 17 Identifier
1 17 Literal
Digit 17 Identifier
17 Literal
CharacterLi 19 Identifier
= 19 Literal
Graphic-Char 19 Identifier
; 19 Literal
Comment 21 Identifier
= 21 Literal
-- 21 String
[ 21 Literal
Graphic-Char 21 Identifier
1 21 Literal
151
0
21 Literal
Delimiter 23 Identifier
= 23 Literal
23 Literal
23 Literal
23 Literal
23 String
23 Literal
24 Literal
24 Literal
24 Literal
24 Literal
24 String
24 Literal
25 Literal
= 25 String
25 Literal
25 Literal"W 25 String
25 Literal
25 String
25 Literal"W 25 String
26 Literal
26 Literal
26 Literal
26 String
26 Literal"W" 26 String
26 Literal
Identifier 28 Identifier
= 28 Literal
Letter 28 Identifier
28 Literal
28 Literal
J 28 Literal
Letter or Di 28 Identifier
28 Literal
152
28 Literal
Number 30 Identifier
* = 30 Literal
Digit 30 Identifier
[ 30 Literal
30 Literal
30 Literal
Digit 30 Identifier
1 30 Literal
30 Literal
SpecialSymb 32 Identifier
*:= 32 Literal
32 Literal
32 Literal
32 Literal
32 Literal
".." 32 String
33 Literal
33 String
33 Literal
33 Literal
33 String
33 Literal
33 Literal
34 Literal
I 34 Literal
34 Literal
34 Literal
StringLiter 36 Identifier
= 36 Literal
QuotedStrin 36 Identifier
[ 36 Literal
QuotedStrin 36 Identifier
) 36 Literal
; 36 Literal
QuotedStrin 38 Identifier
= 38 Literal
[ 38 Literal
153
NonQuoteCh 38 Identifibr
) 38 Literal
38 Literal
NonQuoteCh 40 identifier
= 40 Literal
40 Literal
40 Literal
40 Literal
40 Literal
WhiteSpace 42 Identifier
= 42 Literal
Separator 42 Identifier
1 42 Literal
Separator 42 Identifier
1 42 Literal
42 Literal
Separator 44 Identifier 0= 44 Literal
t 44 Literal
ASCII 44 Identifier
44 Literal
HT 44 Identifier
44 Literal
actions 46 Identifier
when 48 Identifier
CharacterLi 48 Identifier •
=> 48 Literal
NEXT 49 Identifier
49 Literal
MAKETOKEN 49 Identifier
( 49 Literal
CHAR 49 Identifier
49 Literal
CURRENTSYMB 49 Identifier
P 49 Literal •
CURLINENUM 49 Identifier
49 Literal
49 Literal
154
return 50 Identifier
; 50 Literal
when 52 Identifier
Comment 52 Identifier
1 52 Literal
White-Space 52 Identifier
=> 52 Literal
null 52 Identifier
52 Literal
when 54 Identifier
Delimiter 54 Identifier
1 54 Literal
Number 54 Identifier
1 54 Literal
SpecialSymb 54 Identifier
=> 54 Literal
NEXT 55 Identifier
55 Literal
MAKETOKEN 55 Identifier
55 Literal
LIT 55 Identifier
, 55 Literal
CURRENTSYMB 55 Identifier
, 55 Literal
CURLINENUM 55 Identifier
) 55 Literal
55 Literal
return 56 Identifier
56 Literal
when 58 Identifier
Identifier 58 Identifier
=> 58 Literal
NEXT 59 Identifier
S= 59 Literal
MAKETOKEN 59 Identifier
59 Literal
IDENT 59 Identifier
59 Literal
155
CURRENTSYMB 59 Identifier
, 59 Literal
CURLINENUM 59 Identifier
) 59 Literal
59 Literal
return 60 Identifier
60 Literal
when 62 Identifier
StringLiter 62 Identifier
=> 62 Literal
NEXT 63 Identifier
63 Literal
MAKETOKEN 63 Identifier
63 Literal
STR 63 Identifier
, 63 Literal
CURRENTSYMB 63 Identifier
, 63 Literal
CURLINENUM 63 Identifier
63 Literal
63 Literal
return 64 Ideatifier
; 64 Literal
when 66 Identifier
others 66 Identifier
=> 66 Literal
NEXT 67 Identifier
67 Literal
MAKETOKEN 67 Identifier
67 Literal
LIT 67 Identifier
F 67 Literal
CURRENTSYMB 67 Identifier
67 Literal
CURLINENUM 67 Identifier
67 Literal
67 Literal
return 68 Identifier
156
68 Literal
end 70 Identifier
LLTOKENS 70 Identifier
70 Literal
157
Distribution List for IDA Paper P-2108
NAME AND ADDRESS NUMBER OF COPIES
Sponsor
Dr. John F. Kramer 4Program ManagerSTARSDARPA/ISTO1400 Wilson Blvd.Arlington, VA 22209-2308
Other
Defense Technical Information Center 2Cameron StationAlexandria, VA 22314
IDA
General W.Y. Smith, HQ 1Ms. Ruth L: Greenstein, HQ 1Mr. Philip L. Major, HQ 1Dr. Robert E. Roberts, HQ 1Ms. Katydean Price, CSED 1Dr. Reginald N. Meeson, CSED 1IDA Control & Distribution Vault 2
Distribution List-1