
Computer Languages, Vol. 3, pp. 13-30. Pergamon Press, 1978. Printed in Great Britain

FORMAL SEMANTICS OF A SNOBOL4 SUBSET

F. G. PAGAN

Department of Mathematics, Statistics and Computer Science, Memorial University of Newfoundland, St. John's, Newfoundland, Canada

(Received 9 August 1976; in revised form 6 January 1977)

Abstract--The Vienna Definition Language is used to formally define the semantics of a basic but useful subset of SNOBOL4. A brief, informal description of the subset is given, and its concrete syntax and abstract syntax are defined formally. The structure of an abstract interpreter for programs expressed in the subset is specified. The main part of the paper presents the detailed interpretation rules dealing with statement sequencing and execution, evaluation of expressions, pattern matching, and function invocation.

Programming languages, SNOBOL4, Formal definition, Formal semantics, Abstract syntax, Abstract interpreters, Vienna Definition Language

INTRODUCTION

THIS paper presents a formal definition of a subset of the SNOBOL4 programming language [1]. The definition is expressed in the Vienna Definition Language [2-4], which has been applied to several languages, including Algol 60 [5], PL/I [6], and Basic [7]. The approach can hardly be described as a trivially simple one, however, and definitions using it have tended to be difficult both to construct and to understand. It is therefore felt that applying it to more programming languages with diverse structures will help in the development of improved methods.

Interpreter-oriented definitional formalisms such as the Vienna Definition Language would appear to be particularly suitable for defining languages such as SNOBOL4, which have a simple syntax, exhibit very late binding time, and are usually implemented by interpreters. When stripped of its many bells and whistles, SNOBOL4 can be seen to have an unusual and, in its own way, elegant semantic structure. It is therefore hoped that this case study may also provide the reader with a deeper understanding of this language.

INFORMAL DESCRIPTION AND CONCRETE SYNTAX OF THE SUBSET

The subset of SNOBOL4 under consideration was originally designed for implementation on a minicomputer system with limited memory. The only data types are integer, string, and pattern, and the facilities for tracing and the runtime creation of code are omitted. The binary operations are alternation ('|'), concatenation, immediate value assignment ('$'), addition, subtraction, multiplication, and division; the unary operations are plus, minus, indirection ('$'), and assignment of cursor position ('@') during pattern matching. Labels are required to have the same form as identifiers. The pattern component ARB is included, and the primitive functions are LEN, SPAN, BREAK, ANY, NOTANY, TAB, RTAB, POS, RPOS, LE, EQ, NE, IDENT, DIFFER, SIZE, and DEFINE. All pattern matching is carried out in the anchored fullscan mode. A function call must contain the exact number of expected arguments, and the NRETURN feature is excluded.


Although a large number of facilities are thus omitted, the subset constitutes a useful language which would seem to form the essential "core" of SNOBOL4. Most of the omitted features either are of limited importance or can be readily simulated using other features at the expense of efficiency. Conditional value assignment can usually be replaced by immediate value assignment, and the name operator can be avoided by using literal strings. Unanchored pattern matching can be achieved by inserting ARB at the beginning of the pattern. The predicates GE, GT, and LT can be simulated using LE, EQ, and NE. Arrays, tables, and programmer-defined data types can be simulated by liberal use of the indirection operator. Unevaluated expressions can generally be eliminated by reprogramming.
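The ARB technique for unanchored matching can be illustrated by analogy with Python's `re` module, assuming that an anchored `re.match` preceded by the lazy prefix `.*?` behaves like an anchored SNOBOL4 match preceded by ARB: both try the shortest prefix first and lengthen it one character at a time on backtracking. Regular expressions are not SNOBOL4 patterns, so this is only a sketch of the idea; `anchored_with_lazy_prefix` and `unanchored` are hypothetical helper names.

```python
import re

def anchored_with_lazy_prefix(pattern: str, subject: str):
    """Anchored match preceded by a lazy 'match anything' prefix (an ARB analogue).

    Returns the (start, end) span matched by `pattern` itself, or None.
    """
    # re.match is anchored at position 0; '.*?' tries the shortest prefix first,
    # growing it one character at a time, much as ARB does on backtracking.
    m = re.match(r".*?(" + pattern + r")", subject, re.DOTALL)
    return m.span(1) if m else None

def unanchored(pattern: str, subject: str):
    """Ordinary leftmost search, for comparison."""
    m = re.search(pattern, subject, re.DOTALL)
    return m.span() if m else None

print(anchored_with_lazy_prefix(r"b+", "aabbc"))  # (2, 4)
print(unanchored(r"b+", "aabbc"))                 # (2, 4)
```

Because the prefix grows from length zero, the first success is at the leftmost position where the pattern can match, which is exactly what an unanchored scan would find.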

The following production rules in modified BNF notation specify the concrete syntax of the language; square brackets denote optionality, and braces denote zero or more repetitions. The preterminal symbols 'name', 'number', and 'lstring' correspond to identifiers, integer constants, and literal strings, respectively; 'blank' denotes a sequence of one or more spaces, and 'eos' is the end-of-statement token (semicolon or end of record). The rules are numbered on the left and cross-referenced on the right.

(C1)  <program> ::= {<st>} <end>                                        (C2, 3)
(C2)  <end> ::= END [blank name] eos
(C3)  <st> ::= <assign> | <match> | <repl> | <degen>                     (C4, 5, 6, 7)
(C4)  <assign> ::= [name] <subj> <equal> [<obj>] [<gofield>] eos         (C8, 9, 11, 12)
(C5)  <match> ::= [name] <subj> <pat> [<gofield>] eos                    (C9, 10, 12)
(C6)  <repl> ::= [name] <subj> <pat> <equal> [<obj>] [<gofield>] eos     (C8, 9, 10, 11, 12)
(C7)  <degen> ::= [name] [<subj>] [<gofield>] eos                        (C9, 12)
(C8)  <equal> ::= blank =
(C9)  <subj> ::= blank <elem>                                            (C22)
(C10) <pat> ::= blank <expr>                                             (C15)
(C11) <obj> ::= blank <expr>                                             (C15)
(C12) <gofield> ::= blank : [blank] <gopart>                             (C13)
(C13) <gopart> ::= <goto>                                                (C14)
                 | S <goto> [blank] [F <goto>]                           (C14)
                 | F <goto> [blank] [S <goto>]                           (C14)
(C14) <goto> ::= ( [blank] <expr> [blank] )                              (C15)
(C15) <expr> ::= <expr1> {blank '|' blank <expr1>}                       (C16)
(C16) <expr1> ::= <expr2> {blank <expr2>}                                (C17)
(C17) <expr2> ::= <term> {blank <addop> blank <term>}                    (C18, 19)
(C18) <addop> ::= + | -
(C19) <term> ::= <term1> {blank / blank <term1>}                         (C20)
(C20) <term1> ::= <term2> {blank * blank <term2>}                        (C21)
(C21) <term2> ::= <elem> {blank $ blank <elem>}                          (C22)
(C22) <elem> ::= [<unary>] <element>                                     (C23, 24)
(C23) <unary> ::= $ | + | - | @
(C24) <element> ::= name | number | lstring
                  | name ( [blank] [<expr>] [blank] {, [blank] [<expr>] [blank]} )   (C15)
                  | ( [blank] <expr> [blank] )                           (C15)

The remainder of the paper assumes a knowledge of the Vienna Definition Language. The reader may find it advisable to read (or reread) the description by Wegner [4] or the one by Lee [2] before undertaking a detailed reading of the following sections.

ABSTRACT SYNTAX

A program in its abstract form contains all and only the information necessary for its interpretation. No formal correspondence between abstract and concrete programs is given here, since the relationship between the concrete and abstract syntax rules is fairly obvious. The abstract syntax rules are definitions of predicates characterizing sets of Vienna objects. Symbols enclosed by single quotes and words written in uppercase letters represent elementary objects.

(A1)  is-program = is-st-list                                             (A2)
(A2)  is-st = (<s-label: is-id ∨ is-Ω>,                                   (A15)
               <s-st-core: is-asmt ∨ is-match ∨ is-repl ∨ is-expr>,       (A3, 4, 5, 6)
               <s-goto: is-gotofield ∨ is-Ω>)                             (A13)
(A3)  is-asmt = (<s-subject: is-proper-expr>,                             (A7)
                 <s-object: is-expr>)                                     (A6)
(A4)  is-match = (<s-subject: is-proper-expr>,                            (A7)
                  <s-pattern: is-proper-expr>)                            (A7)
(A5)  is-repl = (<s-subject: is-proper-expr>,                             (A7)
                 <s-pattern: is-proper-expr>,                             (A7)
                 <s-object: is-expr>)                                     (A6)
(A6)  is-expr = is-proper-expr ∨ is-Ω                                     (A7)

Rules (A1)-(A6) correspond to (C1)-(C11), and optionality is handled with the aid of the predicate 'is-Ω'. Some approximate correspondences between predicates of the abstract syntax and nonterminals of the concrete syntax are as follows:

is-program        <program>
is-st             <st>
is-asmt           <assign>
is-match          <match>
is-repl           <repl>
is-proper-expr    <expr>
is-expr           [<expr>]

A program in its abstract form (i.e. a Vienna object satisfying 'is-program') includes only those components that have semantic significance; elements that play a strictly syntactic role, such as END, 'eos', '=', and blanks, are excluded.

(A7)  is-proper-expr = is-unary-expr ∨ is-binary-expr ∨                  (A9, 10)
                       is-id ∨ is-number ∨ is-lstring ∨ is-call           (A15, 16, 17, 8)
(A8)  is-call = (<s-fn-name: is-id>,                                      (A15)
                 <s-args: is-expr-list>)                                  (A6)
(A9)  is-unary-expr = (<s-operand: is-proper-expr>,                       (A7)
                       <s-operator: is-unary-op>)                         (A11)
(A10) is-binary-expr = (<s-operand-1: is-proper-expr>,                    (A7)
                        <s-operand-2: is-proper-expr>,                    (A7)
                        <s-operator: is-binary-op>)                       (A12)
(A11) is-unary-op = is-'$' ∨ is-'+' ∨ is-'-' ∨ is-'@'
(A12) is-binary-op = is-'$' ∨ is-'*' ∨ is-'/' ∨ is-'+' ∨ is-'-' ∨ is-CAT ∨ is-'|'

The precedence of operators is specified in the concrete syntax, and there is no need to incorporate it in the abstract syntax as well. The predicate 'is-CAT' is satisfied (only) by the concatenation operator, which has no explicit concrete symbol. Additional approximate correspondences are as follows:

is-call           fourth alternative in (C24)
is-unary-expr     <elem>
is-binary-expr    <expr>, <expr1>, <expr2>, <term>, <term1>, <term2>
is-unary-op       <unary>
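How the concrete rules (C15)-(C22) fix operator precedence, so that the abstract syntax need not, can be checked with a small recursive-descent parser. This is a sketch under stated assumptions, not part of the definition: it takes a pre-tokenized list in which each binary operator is a separate token (as the blanks required around binary operators allow), handles only names, numbers, and unary prefixes (no parentheses, literal strings, or calls), and returns nested tuples loosely mirroring (A9)-(A10). The name `parse` and the tuple encoding are hypothetical.

```python
BINARY_OPS = {"|", "+", "-", "/", "*", "$"}
UNARY_OPS = "$+-@"

def parse(tokens):
    """Parse a token list into nested (op, operand...) tuples following (C15)-(C22)."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def left_assoc(sub, ops):
        # <level> ::= <sub> {blank op blank <sub>}, grouped to the left
        node = sub()
        while peek() in ops:
            op = take()
            node = (op, node, sub())
        return node

    def elem():
        # (C22)-(C24) restricted to names/numbers with an optional unary prefix
        tok = take()
        if len(tok) > 1 and tok[0] in UNARY_OPS:
            return (tok[0], tok[1:])
        return tok

    def term2():
        return left_assoc(elem, {"$"})          # (C21) binary '$' binds tightest

    def term1():
        return left_assoc(term2, {"*"})         # (C20)

    def term():
        return left_assoc(term1, {"/"})         # (C19)

    def expr2():
        return left_assoc(term, {"+", "-"})     # (C17)

    def expr1():
        # (C16): concatenation has no operator symbol; two operands are adjacent
        node = expr2()
        while peek() is not None and peek() not in BINARY_OPS:
            node = ("CAT", node, expr2())
        return node

    def expr():
        return left_assoc(expr1, {"|"})         # (C15) alternation binds loosest

    tree = expr()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree

print(parse("a b | c".split()))    # ('|', ('CAT', 'a', 'b'), 'c')
print(parse("a / b * c".split()))  # ('/', 'a', ('*', 'b', 'c'))
```

Note that, following (C19) and (C20), '*' binds more tightly than '/'.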

(A13) is-gotofield = (<s-uncond: is-dest>,                                (A14)
                      <s-succ: is-dest>,                                  (A14)
                      <s-fail: is-dest>)                                  (A14)
(A14) is-dest = is-expr ∨ is-END ∨ is-RETURN ∨ is-FRETURN                 (A6)

If a goto field has an 's-uncond' component (containing the destination of an unconditional jump), the concrete syntax rules (C12)-(C14) guarantee that it will be the only component.

(A15) is-id = (<s-type: is-ID>, <s-addr: is-address>)                     (M6)
(A16) is-number = (<s-type: is-NUM>, <s-addr: is-address>)                (M6)
(A17) is-lstring = (<s-type: is-LSTR>, <s-addr: is-address>)              (M6)

The names, constants, and literal strings in a concrete program correspond to positions in the storage component of the abstract machine described in the following section.

THE ABSTRACT MACHINE

The structure of a state ξ of the abstract machine is described by the predicates given in this section.

(M1) is-ξ = (<s-c: is-c>,
             <s-stg: is-item-list>,                                       (M3)
             <s-input: is-record-list>,                                   (M2)
             <s-output: is-record-list>,                                  (M2)
             <s-run-stack: is-intg-list>,
             <s-pattern-stack: is-ps-entry-list>,                         (M10)
             <s-function-table: ({<s-desc(np): is-fn-desc> | is-address(np)})>,   (M6, 11)
             <s-fail-flag: is-OFF ∨ is-ON>)

The 's-c' component is a standard Vienna control tree. The run stack is used for saving various values during the invocation of programmer-defined functions. The 's-fail-flag' component indicates whether the statement currently being interpreted has failed.

(M2) is-record = is-char-list

Only the standard input/output files are allowed for. If the INPUT and OUTPUT functions were included in the subset, there would be an additional state component to keep track of the I/O associations.

(M3) is-item = is-integer ∨ is-string ∨ is-pattern                         (M4, 5, 7)
(M4) is-integer = (<s-val: is-intg>)
(M5) is-string = (<s-val: is-char-list>,
                  <s-ref: is-address>)                                    (M6)
(M6) is-address = is-intg

The 's-val' component of a string item is the string itself, while the 's-ref' component is the position number ("address") of the value (which may be of any type) referred to by the string. An address of zero corresponds to the null string.

(M7) is-pattern = is-component-list                                        (M8)
(M8) is-component = (<s-routine: is-MSTR ∨ is-MLEN ∨ is-MBRK ∨ is-MSPN ∨ is-MANY ∨ is-MNUL ∨ is-MIV1
                                  ∨ is-miv2 ∨ is-M1 ∨ is-MAT ∨ is-MPOS ∨ is-MTAB
                                  ∨ is-MRPOS ∨ is-MRTAB ∨ is-MNTY>,       (M9)
                     <s-subsequent: is-intg>,
                     <s-alternate: is-intg>,
                     <s-arg: is-intg>)
(M9) is-miv2 = (<s-type: is-MIV2>, <s-offset: is-intg>)

Each component of a pattern specifies a routine to be executed during pattern matching. Many of the routines correspond to primitive functions:

Routine   Function      Routine   Function
MLEN      LEN           MTAB      TAB
MBRK      BREAK         MRPOS     RPOS
MSPN      SPAN          MRTAB     RTAB
MANY      ANY           MNTY      NOTANY
MPOS      POS

MAT corresponds to the '@' operator, and MSTR corresponds to a string-valued expression in the concrete pattern. MNUL and M1, which only occur as components of ARB, match substrings containing zero and one characters, respectively. The portion of a pattern associated with an operation of immediate value assignment is preceded by an MIV1 component and followed by an MIV2 component; the latter contains the offset (component number within the pattern) of the corresponding MIV1 component. The 's-subsequent' part of a pattern component gives the offset of the next component to be considered if the current matching routine succeeds, and the value of the 's-alternate' offset is the result of an alternation operation; in either case, an offset of zero indicates the absence of a link to another component. With the exception of an MIV1 component, the 's-arg' part of a pattern component is the address of an argument of the matching routine.

(M10) is-ps-entry = (<s-cursor: is-intg>, <s-alternate: is-intg>)

The information in the pattern stack permits backtracking to occur during pattern matching.

(M11) is-fn-desc = (<s-entry-name: is-address>,                            (M6)
                    <s-params: is-address-list>,                           (M6)
                    <s-locals: is-address-list>)                           (M6)

The function table records the relevant information for each programmer-defined func- tion which can currently be invoked.

The undefined predicate 'is-intg' is satisfied by all integers, although only the 's-val' components of integer items can ever be negative. The undefined predicate 'is-char' is satisfied by a finite set of elementary objects, including the ten digit characters and the 26 uppercase letters.

In the initial state, the 's-stg' component contains an integer item corresponding to each distinct constant in the program and a string item corresponding to each distinct name and/or literal string. It also contains the string item ARB and the following standard pattern item to which it refers:

(μ0(<s-routine: MNUL>, <s-subsequent: 2>, <s-alternate: 0>, <s-arg: 0>),
 μ0(<s-routine: MNUL>, <s-subsequent: 0>, <s-alternate: 3>, <s-arg: 0>),
 μ0(<s-routine: M1>, <s-subsequent: 2>, <s-alternate: 0>, <s-arg: 0>))

The 's-input' component initially contains all the data records for the program. The 's-c' component initially contains the instruction

exec-st(i)

where i is the number of the first statement to be executed; i is one unless the END line in the concrete program specifies a different starting point. All the other components are initially empty.

During interpretation, the 's-stg' component will continually grow and accumulate an increasing proportion of inaccessible items. The problem of garbage collection, however, is of implementational rather than linguistic significance and is not considered in this paper. Interpretation is complete when the control tree becomes empty.

INTERPRETATION OF PROGRAMS

This section contains the instruction and function definitions which specify the semantics of abstract programs. The current machine state ξ and the abstract program T are implicit parameters of many of the instructions. Following Lee [2], the search function τ forms the set of all objects which satisfy a given predicate, providing a useful alternative to the ι facility. The interpretation of the instruction error and the value of the function 'error' are undefined, and these correspond to abnormal termination of the program. The following four instructions and functions are used ubiquitously and are not cross-referenced:

(I1) pass(val) = PASS: val
(I2) null = PASS: Ω
(I3) item(ptr) = elem(ptr)∘s-stg(ξ)
(I4) substr(str, first, no) =
         μ0({<elem(i - first + 1): elem(i)(str)> | first ≤ i ≤ first + no - 1})

Statement sequencing and execution

(I5) exec-st(st-no) =
         st-no > length(T) → s-c: Ω
         T → process-goto(s-goto∘elem(st-no)(T), st-no);                  (I7)
             exec-core(s-st-core∘elem(st-no)(T));                         (I10)
             reset-fail-flag                                              (I6)

The first line of (I5) terminates interpretation if control has flowed past the last statement in the program.

(I6) reset-fail-flag = s-fail-flag: OFF
(I7) process-goto(field, st-no) =
         is-Ω(field) → exec-st(st-no + 1)                                 (I5)
         ¬is-Ω∘s-uncond(field) → jump(s-uncond(field))                    (I8)
         ¬is-Ω∘s-succ(field) ∧ is-OFF∘s-fail-flag(ξ) → jump(s-succ(field))   (I8)
         ¬is-Ω∘s-fail(field) ∧ is-ON∘s-fail-flag(ξ) → jump(s-fail(field))    (I8)
         T → exec-st(st-no + 1)                                           (I5)

Here the first and last cases correspond to default sequential execution of statements.
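The case analysis of (I7) can be mirrored in a few lines. This sketch assumes a hypothetical encoding: a goto field is `None` when absent (is-Ω), otherwise a dict with optional keys `uncond`, `succ`, and `fail`, and the fail flag is a boolean. The function name `next_destination` is illustrative only; it returns the decision rather than performing the jump of (I8).

```python
from typing import Optional

def next_destination(gotofield: Optional[dict], st_no: int, failed: bool):
    """Return ('next', n) for sequential flow or ('jump', dest) for a transfer."""
    if gotofield is None:                                   # no goto field at all
        return ("next", st_no + 1)
    if gotofield.get("uncond") is not None:                 # unconditional jump
        return ("jump", gotofield["uncond"])
    if gotofield.get("succ") is not None and not failed:    # S(...) on success
        return ("jump", gotofield["succ"])
    if gotofield.get("fail") is not None and failed:        # F(...) on failure
        return ("jump", gotofield["fail"])
    return ("next", st_no + 1)                              # default sequencing

print(next_destination({"succ": "LOOP"}, 3, True))  # ('next', 4)
```

As in (I7), a statement whose goto field names only a success destination simply falls through to the next statement when it fails.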

(I8) jump(dest) =
         is-proper-expr(dest) → exec-st(n);                               (I5)
                                n: find-label(lab-ptr);                   (I9)
                                lab-ptr: eval-softly(dest);               (I18)
                                reset-fail-flag                           (I6)
         is-END(dest) → s-c: Ω
         is-RETURN(dest) → stack(1), reset-fail-flag                      (I66) (I6)
         is-FRETURN(dest) → stack(0), reset-fail-flag                     (I66) (I6)

When the destination is RETURN or FRETURN, there should be a return instruction (I69) higher in the control tree (see (I64)).

(I9) find-label(lab-ptr) =
         is-ON∘s-fail-flag(ξ) → error
         is-{ }∘(τx)(lab-ptr = s-addr∘s-label∘elem(x)(T)) → error
         T → PASS: (ιx)(lab-ptr = s-addr∘s-label∘elem(x)(T))

This instruction passes the number of the statement with the label stored in the location with address 'lab-ptr'. An error occurs if there is no statement with that label or if failure has occurred during evaluation of the goto field.

(I10) exec-core(core) =
          is-expr(core) → eval-strongly(core)                             (I20)
          is-asmt(core) → assign(sp, obp);                                (I13)
                          obp: eval-strongly(s-object(core));             (I20)
                          sp: eval-softly(s-subject(core))                (I18)
          is-match(core) → match(svpp, pp);                               (I48)
                           pp: convert-to-pat(p);                         (I32)
                           p: eval-strongly(s-pattern(core));             (I20)
                           svpp: convert-to-str(svp);                     (I33)
                           svp: eval-strongly(s-subject(core))            (I20)
          is-repl(core) → assign(sp, a);                                  (I13)
                          a: concatenate(obp, b);                         (I41)
                          obp: eval-strongly(s-object(core));             (I20)
                          b: make-str(d);                                 (I39)
                          d: pass-1(svp, c);                              (I11)
                          c: match(svp, pp);                              (I48)
                          svp: convert-to-str(e);                         (I33)
                          e: pass-ref(sp);                                (I12)
                          pp: convert-to-pat(p);                          (I32)
                          p: eval-strongly(s-pattern(core));              (I20)
                          sp: eval-softly(s-subject(core))                (I18)
(I11) pass-1(p, i) = PASS: substr(s-val∘item(p), i, length(item(p)) - i + 1)
(I12) pass-ref(p) = PASS: s-ref∘item(p)

If statement failure occurs for any reason, all remaining instructions for the execution of the statement core must be deleted and the goto field processed. Thus many of the following instruction definitions begin with the line

is-ON∘s-fail-flag(ξ) → null

(I13) assign(subj-ptr, obj-ptr) =
          is-ON∘s-fail-flag(ξ) → null
          s-val∘item(subj-ptr) = (O, U, T, P, U, T) →
              s-output: s-output(ξ) ⌢ record(obj-ptr)                     (I14)
              s-stg: μ(s-stg(ξ); <s-ref∘elem(subj-ptr): obj-ptr>)
          T → s-stg: μ(s-stg(ξ); <s-ref∘elem(subj-ptr): obj-ptr>)
(I14) record(ptr) =
          ptr = 0 → (( ))
          is-string∘item(ptr) → (s-val∘item(ptr))
          is-integer∘item(ptr) → (unpack∘s-val∘item(ptr))                 (I15)
          is-pattern∘item(ptr) → error
(I15) unpack(int) =
          int = 0 → ('0')
          T → (int < 0 → ('-'), T → ( )) ⌢ unpack-1((int < 0 → -int, T → int))   (I16)

The function 'unpack' yields the character list corresponding to a given integer.

(I16) unpack-1(int) =
          int = 0 → ( )
          T → unpack-1(int ÷ 10) ⌢ (trans-2(int - (int ÷ 10) × 10))        (I17)
(I17) trans-2(int) = (int = 0 → '0', int = 1 → '1', int = 2 → '2', int = 3 → '3', int = 4 → '4',
                      int = 5 → '5', int = 6 → '6', int = 7 → '7', int = 8 → '8', int = 9 → '9')
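The recursion of (I15)-(I17) translates directly into Python. In this sketch, Python lists stand in for VDL character lists and `//` stands in for the integer division written ÷ above; since `unpack_1` only ever receives non-negative arguments, floor and truncating division agree. The function names are hypothetical transliterations.

```python
DIGITS = "0123456789"

def unpack(n: int) -> list:
    """(I15): integer -> character list, with a leading '-' if negative."""
    if n == 0:
        return ["0"]
    sign = ["-"] if n < 0 else []
    return sign + unpack_1(-n if n < 0 else n)

def unpack_1(n: int) -> list:
    """(I16): digits of a non-negative integer, most significant first."""
    if n == 0:
        return []
    # n // 10 drops the last digit; n - (n // 10) * 10 is that last digit.
    return unpack_1(n // 10) + [trans_2(n - (n // 10) * 10)]

def trans_2(d: int) -> str:
    """(I17): digit value -> digit character."""
    return DIGITS[d]

print("".join(unpack(-305)))  # -305
```

The special case for zero in (I15) is needed because (I16) maps zero to the empty list.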

Evaluation of expressions

Expressions can be evaluated "softly", yielding a variable, or "strongly", yielding any type of value. An expression is evaluated softly if and only if it is the subject of an assignment or replacement statement, an immediate component of a goto field, the operand of an '@' operator, or the right operand of a binary '$' operator. Expression evaluation may involve type conversions and pattern construction. The parameters and returned values of most of the following instructions are addresses rather than the items themselves.

(I18) eval-softly(exp) =
          is-ON∘s-fail-flag(ξ) → null
          is-id(exp) → pass-var(s-addr(exp))                              (I19)
          is-unary-expr(exp) ∧ is-'$'∘s-operator(exp) → pass-var(a);      (I19)
                                   a: convert-to-str(b);                  (I33)
                                   b: eval-strongly(s-operand(exp))       (I20)
          T → error
(I19) pass-var(ptr) =
          ptr = 0 → error
          T → PASS: ptr
(I20) eval-strongly(exp) =
          is-ON∘s-fail-flag(ξ) → null
          is-Ω(exp) → PASS: 0
          is-id(exp) → pass(s-ref∘item∘s-addr(exp));
                       input(s-addr(exp))                                 (I23)
          is-number(exp) ∨ is-lstring(exp) → PASS: s-addr(exp)
          is-unary-expr(exp) → eval-unary(a, s-operator(exp));            (I25)
                               a: eval-opd-1(exp)                         (I21)
          is-binary-expr(exp) → eval-binary(a, b, s-operator(exp));       (I27)
                                b: eval-opd-2(exp);                       (I22)
                                a: eval-strongly(s-operand-1(exp))        (I20)
          is-call(exp) → invoke(s-addr∘s-fn-name(exp), arglist);          (I64)
                         arglist: eval-args(s-args(exp))                  (I62)
(I21) eval-opd-1(exp) =
          is-'@'∘s-operator(exp) → eval-softly(s-operand(exp))            (I18)
          T → eval-strongly(s-operand(exp))                               (I20)

(I22) eval-opd-2(exp) =
          is-ON∘s-fail-flag(ξ) → null
          is-'$'∘s-operator(exp) → eval-softly(s-operand-2(exp))          (I18)
          T → eval-strongly(s-operand-2(exp))                             (I20)
(I23) input(addr) =
          s-val∘item(addr) ≠ (I, N, P, U, T) → null
          is-( )∘s-input(ξ) → s-fail-flag: ON
          T → assign(addr, b);                                            (I13)
              advance-input-file;                                         (I24)
              b: make-str(head∘s-input(ξ))                                (I39)

Again, only standard I/O is allowed. An attempt to read past the end of the input file results in statement failure.

(I24) advance-input-file = s-input: tail∘s-input(ξ)
(I25) eval-unary(opd-ptr, opr) =
          is-ON∘s-fail-flag(ξ) → null
          is-'$'(opr) ∧ opd-ptr = 0 → error
          is-'$'(opr) → pass(s-ref∘item(a));
                        a: convert-to-str(opd-ptr)                        (I33)
          is-'+'(opr) → convert-to-int(opd-ptr)                           (I31)
          is-'-'(opr) → make-int(b);                                      (I38)
                        b: pass-2(a);                                     (I26)
                        a: convert-to-int(opd-ptr)                        (I31)
          is-'@'(opr) → make-pat-comp(MAT, opd-ptr)                       (I40)

The indirection operator cannot be applied to the null string.

(I26) pass-2(p) = PASS: -s-val∘item(p)
(I27) eval-binary(p, q, opr) =
          is-ON∘s-fail-flag(ξ) → null
          is-'$'(opr) → concatenate(a, b);                                (I41)
                        b: make-pat-comp(r, q);                           (I40)
                        r: pass-3(a);                                     (I28)
                        a: concatenate(c, p);                             (I41)
                        c: make-pat-comp(MIV1, 0)                         (I40)
          is-'*'(opr) ∨ is-'/'(opr) ∨ is-'+'(opr) ∨ is-'-'(opr) →
                        arith(a, b, opr);                                 (I30)
                        b: convert-to-int(q);                             (I31)
                        a: convert-to-int(p)                              (I31)
          is-CAT(opr) → concatenate(p, q)                                 (I41)
          is-'|'(opr) → alternate(c, d);                                  (I44)
                        d: pass-item(b);                                  (I29)
                        b: convert-to-pat(q);                             (I32)
                        c: pass-item(a);                                  (I29)
                        a: convert-to-pat(p)                              (I32)
(I28) pass-3(p) = PASS: μ0(<s-type: MIV2>, <s-offset: length∘item(p)>)
(I29) pass-item(ptr) = PASS: item(ptr)
(I30) arith(p, q, opr) =
          is-'*'(opr) → make-int(s-val∘item(p) × s-val∘item(q))           (I38)
          is-'/'(opr) → make-int(s-val∘item(p) ÷ s-val∘item(q))           (I38)
          is-'+'(opr) → make-int(s-val∘item(p) + s-val∘item(q))           (I38)
          is-'-'(opr) → make-int(s-val∘item(p) - s-val∘item(q))           (I38)

(I31) convert-to-int(ptr) =
          is-ON∘s-fail-flag(ξ) → null
          ptr = 0 → make-int(0)                                           (I38)
          is-integer∘item(ptr) → PASS: ptr
          is-pattern∘item(ptr) → error
          is-string∘item(ptr) → make-int(pack∘s-val∘item(ptr))            (I34, 38)
(I32) convert-to-pat(ptr) =
          is-ON∘s-fail-flag(ξ) → null
          is-pattern∘item(ptr) → PASS: ptr
          T → make-pat-comp(MSTR, a);                                     (I40)
              a: convert-to-str(ptr)                                      (I33)
(I33) convert-to-str(ptr) =
          is-ON∘s-fail-flag(ξ) → null
          ptr = 0 ∨ is-string∘item(ptr) → PASS: ptr
          is-pattern∘item(ptr) → error
          is-integer∘item(ptr) → make-str(unpack∘s-val∘item(ptr))         (I15, 39)

Pattern values cannot be converted to the other types. The function 'pack' yields the integer corresponding to a list of digit characters:

(I34) pack(str) = pack-1∘reverse(str)                                     (I35, 36)
(I35) reverse(list) =
          is-( )(list) → list
          T → reverse∘tail(list) ⌢ (head(list))
(I36) pack-1(str) =
          is-( )(str) → 0
          T → trans-1∘head(str) + 10 × pack-1∘tail(str)                   (I37)
(I37) trans-1(char) = (char = '0' → 0, char = '1' → 1, char = '2' → 2, char = '3' → 3, char = '4' → 4,
                       char = '5' → 5, char = '6' → 6, char = '7' → 7, char = '8' → 8, char = '9' → 9,
                       T → error)
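The reversal in (I34) exists so that (I36) can consume the least significant digit first, weighting the rest of the list by 10. A Python transliteration, with lists standing in for character lists and `ValueError` standing in for the undefined 'error' (an assumption), might read as follows; the names are hypothetical.

```python
def pack(chars: list) -> int:
    """(I34): digit characters -> integer, via (I35) reverse and (I36) pack-1."""
    return pack_1(list(reversed(chars)))

def pack_1(rev: list) -> int:
    """(I36): `rev` holds the least significant digit first."""
    if not rev:
        return 0
    return trans_1(rev[0]) + 10 * pack_1(rev[1:])

def trans_1(ch: str) -> int:
    """(I37): digit character -> value; any other character is an error."""
    if len(ch) != 1 or ch not in "0123456789":
        raise ValueError(f"not a digit: {ch!r}")
    return ord(ch) - ord("0")

print(pack(list("305")))  # 305
```

As written, (I37) rejects a '-' character, so (I31) as defined converts only unsigned digit strings to integers.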

(I38) make-int(val) =
          is-ON∘s-fail-flag(ξ) → null
          is-{ }∘(τx)(is-integer∘item(x) ∧ val = s-val∘item(x)) →
              PASS: length∘s-stg(ξ) + 1
              s-stg: s-stg(ξ) ⌢ (μ0(<s-val: val>))
          T → PASS: (ιx)(is-integer∘item(x) ∧ val = s-val∘item(x))
(I39) make-str(val) =
          is-ON∘s-fail-flag(ξ) → null
          length(val) = 0 → PASS: 0
          is-{ }∘(τx)(is-string∘item(x) ∧ val = s-val∘item(x)) →
              PASS: length∘s-stg(ξ) + 1
              s-stg: s-stg(ξ) ⌢ (μ0(<s-ref: 0>, <s-val: val>))
          T → PASS: (ιx)(is-string∘item(x) ∧ val = s-val∘item(x))

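The avoidance of duplicate copies of strings is essential for the correct functioning of the indirection operator, because an identifier and a string computed at run time with the same characters must denote the same storage item, and hence share its 's-ref' component.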
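The interaction between (I39), (I13), and indirection can be sketched with a toy storage model. The class below is hypothetical: a Python list plays the role of 's-stg', addresses are 1-based, address 0 is the null string, items are dicts, and string values are Python `str` rather than character lists. `indirect` assumes its operand is already a string, skipping the conversion step of (I25), and the OUTPUT case of (I13) is omitted.

```python
class Storage:
    """Toy model of 's-stg': 1-based addresses, 0 standing for the null string."""

    def __init__(self):
        self.items = []                      # each item is a dict, cf. (M3)-(M5)

    def item(self, ptr):                     # (I3)
        return self.items[ptr - 1]

    def make_str(self, val: str) -> int:     # (I39)
        if val == "":
            return 0
        for addr, it in enumerate(self.items, start=1):
            if it["kind"] == "string" and it["val"] == val:
                return addr                  # reuse the unique copy
        self.items.append({"kind": "string", "val": val, "ref": 0})
        return len(self.items)

    def assign(self, subj_ptr: int, obj_ptr: int):   # (I13) without the OUTPUT case
        self.item(subj_ptr)["ref"] = obj_ptr

    def indirect(self, ptr: int) -> int:     # unary '$' on a string operand, cf. (I25)
        if ptr == 0:
            raise ValueError("indirection applied to the null string")
        return self.item(ptr)["ref"]

stg = Storage()
x = stg.make_str("X")            # string item for the identifier X
hello = stg.make_str("HELLO")    # a literal string
stg.assign(x, hello)             # X = 'HELLO'
# Building the string 'X' again yields the same address, so $'X' reaches X.
print(stg.indirect(stg.make_str("X")) == hello)  # True
```

If `make_str` appended a fresh item every time, the second `"X"` would get a new address whose 's-ref' is 0, and indirection would no longer find the value assigned to X.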

(I40) make-pat-comp(routine, arg) =
          is-ON∘s-fail-flag(ξ) → null
          T → PASS: length∘s-stg(ξ) + 1
              s-stg: s-stg(ξ) ⌢ (μ0(<s-routine: routine>, <s-subsequent: 0>, <s-alternate: 0>, <s-arg: arg>))
(I41) concatenate(p, q) =
          is-ON∘s-fail-flag(ξ) → null
          p = 0 → PASS: q
          q = 0 → PASS: p
          is-pattern∘item(p) → concat-patterns(item(p), c);               (I43)
                               c: pass-item(b);                           (I29)
                               b: convert-to-pat(q)                       (I32)
          is-pattern∘item(q) → concat-patterns(c, item(q));               (I43)
                               c: pass-item(a);                           (I29)
                               a: convert-to-pat(p)                       (I32)
          T → make-str(c);                                                (I39)
              c: pass-4(a, b);                                            (I42)
              b: convert-to-str(q);                                       (I33)
              a: convert-to-str(p)                                        (I33)
(I42) pass-4(p, q) = PASS: s-val∘item(p) ⌢ s-val∘item(q)

In the concatenation or alternation of pattern operands, both operands are copied and all the non-zero offsets in the second operand are incremented (function 'mod-2') by the number of components in the first operand. For concatenation, all zero subsequent offsets in the first operand are set (function 'mod-1') equal to the number of the first component in the second operand. For alternation, the alternate links in the first operand, beginning with the first component, are followed until a zero alternate offset is reached (function 'mod-3'); this offset is then set equal to the number of the first component in the second operand.

(I43) concat-patterns(pat-1, pat-2) =
          PASS: length∘s-stg(ξ) + 1
          s-stg: s-stg(ξ) ⌢ (mod-1(pat-1, length(pat-1) + 1) ⌢ mod-2(pat-2, length(pat-1)))   (I45, 46)
(I44) alternate(pat-1, pat-2) =
          is-ON∘s-fail-flag(ξ) → null
          T → PASS: length∘s-stg(ξ) + 1
              s-stg: s-stg(ξ) ⌢ (mod-3(pat-1, length(pat-1) + 1) ⌢ mod-2(pat-2, length(pat-1)))   (I46, 47)
(I45) mod-1(pat, offset) =
          length(pat) = 1 →
              (μ0(<s-routine: s-routine∘head(pat)>,
                  <s-subsequent: (s-subsequent∘head(pat) = 0 → offset, T → s-subsequent∘head(pat))>,
                  <s-alternate: s-alternate∘head(pat)>,
                  <s-arg: s-arg∘head(pat)>))
          T → mod-1((head(pat)), offset) ⌢ mod-1(tail(pat), offset)
(I46) mod-2(pat, bump) =
          length(pat) = 1 →
              (μ0(<s-routine: s-routine∘head(pat)>,
                  <s-subsequent: (s-subsequent∘head(pat) ≠ 0 → s-subsequent∘head(pat) + bump, T → 0)>,
                  <s-alternate: (s-alternate∘head(pat) ≠ 0 → s-alternate∘head(pat) + bump, T → 0)>,
                  <s-arg: s-arg∘head(pat)>))
          T → mod-2((head(pat)), bump) ⌢ mod-2(tail(pat), bump)
(I47) mod-3(pat, offset) =
          s-alternate∘head(pat) = 0 →
              (μ0(<s-routine: s-routine∘head(pat)>, <s-subsequent: s-subsequent∘head(pat)>,
                  <s-alternate: offset>, <s-arg: s-arg∘head(pat)>)) ⌢ tail(pat)
          T → (head(pat)) ⌢ mod-3(tail(pat), offset)
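The offset manipulations described before (I43) can be tried out on plain Python data. This sketch assumes a hypothetical encoding of (M8): a pattern is a list of dicts with keys `routine`, `sub`, `alt`, and `arg`, offsets are 1-based, and 0 means "no link". For alternation it follows the alternate chain from the first component, as the paragraph preceding (I43) describes. All function names are illustrative, and operands are deep-copied rather than modified, matching the statement that both operands are copied.

```python
from copy import deepcopy

def mod_2(pat, bump):
    """Cf. (I46): shift every non-zero link of a copied pattern by `bump`."""
    out = deepcopy(pat)
    for comp in out:
        if comp["sub"] != 0:
            comp["sub"] += bump
        if comp["alt"] != 0:
            comp["alt"] += bump
    return out

def concat_patterns(p1, p2):
    """Cf. (I43)/(I45): zero subsequent links of p1 lead to p2's first component."""
    left = deepcopy(p1)
    for comp in left:
        if comp["sub"] == 0:
            comp["sub"] = len(p1) + 1
    return left + mod_2(p2, len(p1))

def alternate(p1, p2):
    """Cf. (I44): follow p1's alternate chain from component 1 and link its end to p2."""
    left = deepcopy(p1)
    k = 1
    while left[k - 1]["alt"] != 0:
        k = left[k - 1]["alt"]
    left[k - 1]["alt"] = len(p1) + 1
    return left + mod_2(p2, len(p1))

def comp(routine, arg=None):
    """A one-component pattern, as built by make-pat-comp (I40)."""
    return [{"routine": routine, "sub": 0, "alt": 0, "arg": arg}]

a, b, c, d = (comp("MSTR", s) for s in "ABCD")
abc = alternate(concat_patterns(a, b), c)          # ('A' 'B') | 'C'
print([k["sub"] for k in abc], [k["alt"] for k in abc])  # [2, 0, 0] [3, 0, 0]
```

Concatenating `abc` with `d` then sets the zero subsequent links of components 2 and 3 to 4, while component 1 keeps its link to component 2.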

Pattern matching

The instruction match passes the final cursor position for use in a replacement operation. (Since all matching is in the anchored mode, the position of the first character in the matched substring is always one; see (I10).) When an individual component of the pattern fails to match, exec-routine-1 and exec-routine pass a negative value; otherwise, they pass the updated cursor position.

(I48) match(sp, pp) =
          is-ON∘s-fail-flag(ξ) → null
          T → match-1(s-val∘item(sp), pp, 1, 0);                          (I50)
              set-ps                                                      (I49)
(I49) set-ps = s-pattern-stack: ( )
(I50) match-1(subj, pp, comp-no, cur) =
          next-step(b, s-subsequent∘elem(comp-no)∘item(pp), subj, pp);    (I52)
          b: exec-routine(subj, pp, comp-no, cur);                        (I56)
          push-ps(cur, s-alternate∘elem(comp-no)∘item(pp))                (I51)
(I51) push-ps(cur, alt) =
          alt = 0 → null
          T → s-pattern-stack: (μ0(<s-cursor: cur>, <s-alternate: alt>)) ⌢ s-pattern-stack(ξ)
(I52) next-step(code, subs, subj, pp) =
          code ≥ 0 ∧ subs = 0 → PASS: code
          code ≥ 0 → match-1(subj, pp, subs, code)                        (I50)
          is-( )∘s-pattern-stack(ξ) → s-fail-flag: ON
          T → match-1(subj, pp, alt, cur);                                (I50)
              alt: pass-alt(a),                                           (I53)
              cur: pass-cur(a);                                           (I54)
              a: pop-ps(s-pattern-stack(ξ))                               (I55)

The four possible situations handled by (I52) are (a) successful completion, (b) continuation to the subsequent component, (c) unsuccessful completion, and (d) backtracking and trying an alternative.

(I53) pass-alt(ps-entry) = PASS : s-alternate(ps-entry)

(I54) pass-cur(ps-entry) = PASS : s-cursor(ps-entry)

(I55) pop-ps(ps) =
    PASS : head(ps)
    s-pattern-stack : tail(ps)
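The backtracking scheme of (I48)-(I55) can be sketched as a loop in Python. This is an illustration under stated assumptions, not a transcription: the recursion of match-1 becomes a `while` loop, the pattern stack is a Python list whose end is the top, setting the fail flag becomes a `None` result, and the component record and the two routines `lit` (a string component) and `len_comp` (LEN) are hypothetical stand-ins for exec-routine.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Hypothetical routine type: (subject, cursor) -> new cursor, or -1 on failure.
Routine = Callable[[str, int], int]


@dataclass
class Comp:
    routine: Routine
    subsequent: int = 0   # 1-based number of the next component, 0 = none
    alternate: int = 0    # 1-based number of the alternative, 0 = none


def lit(s: str) -> Routine:
    # String component: succeeds if s occurs at the cursor.
    return lambda subj, cur: cur + len(s) if subj[cur:cur + len(s)] == s else -1


def len_comp(n: int) -> Routine:
    # LEN(n): succeeds if at least n characters remain.
    return lambda subj, cur: cur + n if n <= len(subj) - cur else -1


def match(subj: str, pat: List[Comp]) -> Optional[int]:
    """Anchored match; returns the final cursor, or None where the VDL sets the fail flag."""
    stack: List[Tuple[int, int]] = []   # set-ps (I49); entries are (cursor, alternate)
    comp_no, cur = 1, 0                 # (I48): start at component 1, cursor 0
    while True:
        comp = pat[comp_no - 1]
        if comp.alternate != 0:         # push-ps (I51)
            stack.append((cur, comp.alternate))
        code = comp.routine(subj, cur)  # exec-routine (I56)
        if code >= 0 and comp.subsequent == 0:
            return code                 # (a) successful completion
        if code >= 0:
            comp_no, cur = comp.subsequent, code   # (b) go on to the subsequent component
        elif not stack:
            return None                 # (c) unsuccessful completion
        else:
            cur, comp_no = stack.pop()  # (d) backtrack: pop-ps (I55), then try the alternative


# ("A" | "AB") "C": component 1 has alternate 2; both alternatives lead to 3.
P = [Comp(lit("A"), 3, 2), Comp(lit("AB"), 3, 0), Comp(lit("C"))]
print(match("ABC", P), match("AC", P), match("ABD", P))   # 3 2 None
```

On "ABC" the first alternative "A" succeeds but "C" then fails at cursor 1, so the saved entry (0, 2) is popped and "AB" is tried from cursor 0, which is exactly case (d) above.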

(I56) exec-routine(subj, pp, comp-no, cur) =
    is-MIV1∘s-routine∘elem(comp-no)∘item(pp) →
        PASS : cur
        s-stg : μ(s-stg(ξ); ⟨s-arg∘elem(comp-no)∘elem(pp) : cur⟩)
    is-MIV2∘s-routine∘elem(comp-no)∘item(pp) →
        pass(cur);
        assign(s-arg∘elem(comp-no)∘item(pp), b);  (I13)
        b : make-str(c);  (I39)
        c : pass-5(cur, s-arg∘elem(comp-no - s-offset∘s-routine∘elem(comp-no)∘item(pp))∘item(pp))  (I57)
    T → exec-routine-1(subj, cur, s-routine∘elem(comp-no)∘item(pp), s-val∘item∘s-arg∘elem(comp-no)∘item(pp))  (I58)

Immediate value assignment is handled by saving the cursor position in the 's-arg' part of the MIV1 component. When the corresponding MIV2 component is reached, this old cursor position is retrieved so that the substring scanned in the meantime may be extracted and assigned to the associated variable.

(I57) pass-5(i, j) = PASS : substr(subj, j + 1, i - j)
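The MIV1/MIV2 mechanism can be sketched in Python for a pattern without alternatives, so that no backtracking is needed. The dictionary-based components, the keys 'routine', 'arg', 'offset' and 'var', and the `run_sequence` driver are hypothetical; `pass5` follows (I57), converting its 1-based substr into a Python slice. As in (I56), the MIV2 component reaches back to its MIV1 partner as component number comp-no minus the offset, and it assigns as soon as it is reached, even if the rest of the match later fails.

```python
from typing import Dict, List


def pass5(subj: str, i: int, j: int) -> str:
    # (I57): substr(subj, j + 1, i - j), i.e. characters j+1..i (1-based),
    # which is the Python slice subj[j:i].
    return subj[j:i]


def run_sequence(subj: str, comps: List[dict], env: Dict[str, str]) -> int:
    """Hypothetical driver: runs components left to right, no alternatives.

    'MIV1' saves the cursor in its own 'arg' (first case of (I56));
    'MIV2' finds its MIV1 partner via 'offset' and assigns the scanned
    substring to the variable named by 'var';
    'STR' matches the literal in 'arg'.
    Returns the final cursor, or -1 on failure.
    """
    cur = 0
    for no, comp in enumerate(comps, start=1):
        kind = comp["routine"]
        if kind == "MIV1":
            comp["arg"] = cur
        elif kind == "MIV2":
            start = comps[no - comp["offset"] - 1]["arg"]   # component no - offset
            env[comp["var"]] = pass5(subj, cur, start)      # assigned immediately
        elif kind == "STR":
            s = comp["arg"]
            if subj[cur:cur + len(s)] != s:
                return -1
            cur += len(s)
    return cur


# ("A" "B") $ V followed by "C"
comps = [{"routine": "MIV1"}, {"routine": "STR", "arg": "A"},
         {"routine": "STR", "arg": "B"},
         {"routine": "MIV2", "offset": 3, "var": "V"},
         {"routine": "STR", "arg": "C"}]
env: Dict[str, str] = {}
print(run_sequence("ABX", comps, env), env)   # -1 {'V': 'AB'}
```

V receives "AB" although the overall match of "ABX" fails, which is what distinguishes immediate value assignment from assignment on successful completion.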


(I58) exec-routine-1(subj, cur, routine, arg) =
    is-MNUL(routine) → PASS : cur
    is-MSTR(routine) ∧ arg = substr(subj, cur + 1, length(arg)) → PASS : cur + length(arg)
    is-MLEN(routine) ∧ arg ≤ length(subj) - cur → PASS : cur + arg
    is-M1(routine) → exec-routine-1(subj, cur, MLEN, 1)
    is-MPOS(routine) ∧ cur = arg → PASS : cur
    is-MRPOS(routine) ∧ length(subj) - cur = arg → PASS : cur
    is-MTAB(routine) ∧ cur ≤ arg → PASS : arg
    is-MRTAB(routine) ∧ length(subj) - cur ≥ arg → PASS : length(subj) - arg
    is-MANY(routine) ∧ any(arg, elem(cur + 1)(subj)) → PASS : cur + 1  (I59)
    is-MNTY(routine) ∧ ¬any(arg, elem(cur + 1)(subj)) → PASS : cur + 1  (I59)
    is-MSPN(routine) ∧ any(arg, elem(cur + 1)(subj)) →  (I59)
        PASS : cur + 1 + span-scan(substr(subj, cur + 2, length(subj)), arg)  (I60)
    is-MBRK(routine) ∧ break-scan(substr(subj, cur + 1, length(subj)), arg) < length(subj) - cur →  (I61)
        PASS : cur + break-scan(substr(subj, cur + 1, length(subj)), arg)  (I61)
    is-MAT(routine) → pass(cur);
        assign((ιx)(is-string∘item(x) ∧ s-val∘item(x) = arg), b);  (I13)
        b : make-int(cur)  (I38)
    T → PASS : -1

Note that a SPAN component must match at least one character and that a BREAK component must match a substring which extends up to but not including a break character.

(I59) any(str, ch) =
    is-⟨⟩(str) → F
    ch = head(str) → T
    T → any(tail(str), ch)

(I60) span-scan(str, arg) =
    is-⟨⟩(str) → 0
    any(arg, head(str)) → 1 + span-scan(tail(str), arg)  (I59)
    T → 0

(I61) break-scan(str, arg) =
    is-⟨⟩(str) → 0
    any(arg, head(str)) → 0  (I59)
    T → 1 + break-scan(tail(str), arg)
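A Python sketch of (I59)-(I61) and of the SPAN, BREAK, ANY and NOTANY cases of (I58) makes the two conditions in the note concrete. Python strings stand in for VDL character lists, and cursors are 0-based counts of characters already matched, so elem(cur + 1)(subj) becomes subj[cur]. The function names are illustrative. Where (I58) would apply elem to a position past the end of the subject (undefined in the VDL), this sketch simply fails.

```python
def any_(chars: str, ch: str) -> bool:
    # (I59): is ch one of the characters of chars?  (Named any_ to avoid
    # shadowing Python's built-in any.)
    if chars == "":
        return False
    return ch == chars[0] or any_(chars[1:], ch)


def span_scan(s: str, arg: str) -> int:
    # (I60): length of the longest prefix of s made only of characters in arg.
    if s == "" or not any_(arg, s[0]):
        return 0
    return 1 + span_scan(s[1:], arg)


def break_scan(s: str, arg: str) -> int:
    # (I61): length of the longest prefix of s containing no character of arg.
    if s == "" or any_(arg, s[0]):
        return 0
    return 1 + break_scan(s[1:], arg)


def exec_span(subj: str, cur: int, arg: str) -> int:
    # MSPN case of (I58): the next character must be in arg, so SPAN
    # always matches at least one character.
    if cur < len(subj) and any_(arg, subj[cur]):
        return cur + 1 + span_scan(subj[cur + 1:], arg)
    return -1


def exec_break(subj: str, cur: int, arg: str) -> int:
    # MBRK case of (I58): a break character must occur somewhere after the
    # cursor; the match stops just before it.
    n = break_scan(subj[cur:], arg)
    return cur + n if n < len(subj) - cur else -1


def exec_any(subj: str, cur: int, arg: str) -> int:
    # MANY case of (I58): one character, which must be in arg.
    return cur + 1 if cur < len(subj) and any_(arg, subj[cur]) else -1


def exec_notany(subj: str, cur: int, arg: str) -> int:
    # MNTY case of (I58): one character, which must not be in arg.
    return cur + 1 if cur < len(subj) and not any_(arg, subj[cur]) else -1


print(exec_span("  x", 0, " "), exec_break("abc,d", 0, ","), exec_break("abcd", 0, ","))
# 2 3 -1
```

The last call fails because no comma follows the cursor: break-scan then returns the full remaining length, which the strict "<" test in (I58) rejects.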

Function invocation

The following instruction passes a list of addresses of the argument values:

(I62) eval-args(exps) =
    is-⟨⟩(exps) → PASS : ⟨⟩
    T → pass-6(a, b);  (I63)
        b : eval-args(tail(exps));
        a : eval-strongly(head(exps))  (I20)

(I63) pass-6(a, b) = PASS : ⟨a⟩ ⌢ b


(I64) invoke(np, arglist) =
    is-ON∘s-fail-flag(ξ) → null
    is-Ω∘s-desc(np)∘s-function-table(ξ) → exec-sys-fn(s-val∘item(np), arglist)  (I74)
    T → return;  (I69)
        exec-st(n);  (I5)
        n : find-label(s-entry-name∘s-desc(np)∘s-function-table(ξ));  (I9)
        stack(np);  (I66)
        process-locals(s-locals∘desc(np));  (I65, 67)
        process-params(s-params∘desc(np), arglist);  (I65, 68)
        assign(np, 0);  (I13)
        stack(s-ref∘item(np))  (I66)

(I71) unstack =
    is-⟨⟩∘s-run-stack(ξ) → error
    T → PASS : head∘s-run-stack(ξ)
        s-run-stack : tail∘s-run-stack(ξ)

(I72) wrap-up(suc-or-fail, ret-val) =
    suc-or-fail = 1 → PASS : ret-val
    T → s-fail-flag : ON

Since names of primitive functions can be redefined, the table of programmer-defined functions is checked first. If the function is present, addresses of the following items are stacked: the value of the function name, the values of the formal parameters, the values of the local variables, and the function name (for identification upon return); the function name and local variables are then assigned the null string, and each formal parameter is assigned the value of the corresponding argument. This scheme permits recursive invocation.

(I65) desc(d) = s-desc(d)∘s-function-table(ξ)

(I66) stack(int) = s-run-stack : ⟨int⟩ ⌢ s-run-stack(ξ)

(I67) process-locals(locals) =
    is-⟨⟩(locals) → null
    T → process-locals(tail(locals));
        assign(head(locals), 0);  (I13)
        stack(s-ref∘item∘head(locals))  (I66)

(I68) process-params(params, args) =
    length(params) ≠ length(args) → error
    is-⟨⟩(params) → null
    T → process-params(tail(params), tail(args));
        assign(head(params), head(args));  (I13)
        stack(s-ref∘item∘head(params))  (I66)

(I69) return =
    wrap-up(suc-or-fail, ret-val);  (I72)
    assign(np, a);  (I13)
    a : unstack;  (I71)
    ret-val : pass-ref(np);  (I12)
    restore-vars(vars);  (I73)
    vars : pass-7(np);  (I70)
    np : unstack;  (I71)
    suc-or-fail : unstack  (I71)

(I70) pass-7(np) = PASS : reverse∘s-locals∘desc(np) ⌢ reverse∘s-params∘desc(np)  (I35, 65)

The first value taken from the stack indicates the type of return (see (I8)). The actions taken upon invocation are then undone in reverse order.
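The stack discipline of (I64)-(I73) can be sketched in Python with a dictionary of variable values and a list as the run stack. The `invoke` signature, the `body` callback (a stand-in for executing the function's statements, returning True for RETURN and False for FRETURN) and `StatementFailure` are hypothetical. The sketch stacks values rather than addresses, and it pushes the return indicator itself, whereas in the definition that value is supplied as described in (I8).

```python
from typing import Callable, Dict, List


class StatementFailure(Exception):
    """Stands in for setting s-fail-flag : ON in wrap-up (I72)."""


def invoke(env: Dict[str, str], stack: list, name: str, params: List[str],
           local_vars: List[str], args: List[str],
           body: Callable[[], bool]) -> str:
    if len(params) != len(args):
        raise ValueError("wrong number of arguments")   # (I68): error
    # Invocation (I64), in execution order.
    stack.append(env.get(name, ""))      # old value of the function name
    env[name] = ""                       # the function name gets the null string
    for p, a in zip(params, args):       # process-params (I68)
        stack.append(env.get(p, ""))
        env[p] = a
    for v in local_vars:                 # process-locals (I67)
        stack.append(env.get(v, ""))
        env[v] = ""
    stack.append(name)                   # stack(np): identifies the frame
    stack.append(body())                 # return indicator (see (I8))
    # Return (I69): undo the steps above in reverse order.
    succeeded = stack.pop()              # suc-or-fail : unstack
    frame_name = stack.pop()             # np : unstack
    for v in list(reversed(local_vars)) + list(reversed(params)):   # pass-7 (I70)
        env[v] = stack.pop()             # restore-vars (I73)
    ret_val = env[frame_name]            # ret-val : pass-ref(np)
    env[frame_name] = stack.pop()        # restore the function name's old value
    if not succeeded:
        raise StatementFailure()         # wrap-up (I72)
    return ret_val


env: Dict[str, str] = {}
stack: list = []


def fact_body() -> bool:
    # FACT(N): the caller's N must survive the recursive call.
    n = int(env["N"])
    if n == 0:
        env["FACT"] = "1"
    else:
        sub = invoke(env, stack, "FACT", ["N"], [], [str(n - 1)], fact_body)
        env["FACT"] = str(int(env["N"]) * int(sub))
    return True


print(invoke(env, stack, "FACT", ["N"], [], ["5"], fact_body), stack)   # 120 []
```

The recursive factorial only works because each return restores the caller's N before the multiplication reads it, which is the point of saving old values rather than allocating fresh variables.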

(I73) restore-vars(vars) =
    is-⟨⟩(vars) → null
    T → restore-vars(tail(vars));
        assign(head(vars), a);  (I13)
        a : unstack  (I71)

(I74) exec-sys-fn(name, arglist) =
    name = ⟨L, E⟩ ∧ length(arglist) = 2 → test-le(a, b);  (I77)
        b : convert-to-int(elem(2)(arglist));  (I31)
        a : convert-to-int(elem(1)(arglist))  (I31)
    name = ⟨E, Q⟩ ∧ length(arglist) = 2 → test-eq(a, b);  (I78)
        b : convert-to-int(elem(2)(arglist));  (I31)
        a : convert-to-int(elem(1)(arglist))  (I31)
    name = ⟨N, E⟩ ∧ length(arglist) = 2 → test-ne(a, b);  (I79)
        b : convert-to-int(elem(2)(arglist));  (I31)
        a : convert-to-int(elem(1)(arglist))  (I31)
    name = ⟨I, D, E, N, T⟩ ∧ length(arglist) = 2 → test-eq(elem(1)(arglist), elem(2)(arglist))  (I78)
    name = ⟨D, I, F, F, E, R⟩ ∧ length(arglist) = 2 → test-ne(elem(1)(arglist), elem(2)(arglist))  (I79)
    name = ⟨A, N, Y⟩ ∧ length(arglist) = 1 → make-comp-s(MANY, a);  (I80)
        a : convert-to-str(elem(1)(arglist))  (I33)
    name = ⟨L, E, N⟩ ∧ length(arglist) = 1 → make-comp-i(MLEN, a);  (I81)
        a : convert-to-int(elem(1)(arglist))  (I31)
    name = ⟨P, O, S⟩ ∧ length(arglist) = 1 → make-comp-i(MPOS, a);  (I81)
        a : convert-to-int(elem(1)(arglist))  (I31)
    name = ⟨T, A, B⟩ ∧ length(arglist) = 1 → make-comp-i(MTAB, a);  (I81)
        a : convert-to-int(elem(1)(arglist))  (I31)
    name = ⟨S, P, A, N⟩ ∧ length(arglist) = 1 → make-comp-s(MSPN, a);  (I80)
        a : convert-to-str(elem(1)(arglist))  (I33)
    name = ⟨R, P, O, S⟩ ∧ length(arglist) = 1 → make-comp-i(MRPOS, a);  (I81)
        a : convert-to-int(elem(1)(arglist))  (I31)
    name = ⟨R, T, A, B⟩ ∧ length(arglist) = 1 → make-comp-i(MRTAB, a);  (I81)
        a : convert-to-int(elem(1)(arglist))  (I31)
    name = ⟨B, R, E, A, K⟩ ∧ length(arglist) = 1 → make-comp-s(MBRK, a);  (I80)
        a : convert-to-str(elem(1)(arglist))  (I33)
    name = ⟨N, O, T, A, N, Y⟩ ∧ length(arglist) = 1 → make-comp-s(MNTY, a);  (I80)
        a : convert-to-str(elem(1)(arglist))  (I33)
    name = ⟨S, I, Z, E⟩ ∧ length(arglist) = 1 → make-int(b);  (I38)
        b : pass-length(a);  (I75)
        a : convert-to-str(elem(1)(arglist))  (I33)
    name = ⟨D, E, F, I, N, E⟩ ∧ length(arglist) = 2 → define(c, b);  (I82)
        c : pass-val(a);  (I76)
        b : convert-to-str(elem(2)(arglist));  (I33)
        a : convert-to-str(elem(1)(arglist))  (I33)
    T → error

(I75) pass-length(p) = PASS : length∘s-val∘item(p)

(I76) pass-val(p) = PASS : s-val∘item(p)

(I77) test-le(a, b) =
    s-val∘item(a) ≤ s-val∘item(b) → PASS : 0
    T → s-fail-flag : ON

(I78) test-eq(a, b) =
    a = b → PASS : 0
    T → s-fail-flag : ON

(I79) test-ne(a, b) =
    a ≠ b → PASS : 0
    T → s-fail-flag : ON

(I80) make-comp-s(routine, arg) =
    arg = 0 → error
    T → make-pat-comp(routine, arg)  (I40)

(I81) make-comp-i(routine, arg) =
    s-val∘item(arg) < 0 → error
    T → make-pat-comp(routine, arg)  (I40)

An argument of SPAN, BREAK, ANY, or NOTANY cannot be the null string, and an argument of LEN, POS, TAB, or RTAB cannot be negative. In SNOBOL4, these checks are deferred until the time of pattern matching. Because the subset has no unevaluated expressions, they can instead be performed at the time of pattern construction.
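A small slice of the dispatch in (I74), together with the checks of (I77)-(I81), can be sketched in Python. Everything here is an assumption made for illustration: arguments arrive as Python values rather than addresses, `int()` and `str()` stand in for convert-to-int (I31) and convert-to-str (I33), a returned tuple stands in for the component built by make-pat-comp (I40), `StatementFailure` models setting the fail flag, and `ValueError` models the VDL's error. IDENT and DIFFER compare argument addresses in the definition; comparing Python values is the closest analogue here.

```python
from typing import Callable, Dict, List, Tuple


class StatementFailure(Exception):
    """Models s-fail-flag : ON."""


NULL = ""   # the null string


def succeed_if(cond: bool) -> str:
    # (I77)-(I79): a predicate yields the null string or fails the statement.
    if cond:
        return NULL
    raise StatementFailure()


def make_comp_s(routine: str, arg: str) -> Tuple[str, object]:
    # (I80): a null string argument is rejected at construction time.
    if arg == NULL:
        raise ValueError(f"{routine}: null argument")
    return (routine, arg)


def make_comp_i(routine: str, arg: int) -> Tuple[str, object]:
    # (I81): a negative integer argument is rejected at construction time.
    if arg < 0:
        raise ValueError(f"{routine}: negative argument")
    return (routine, arg)


def exec_sys_fn(name: str, args: List[object]) -> object:
    # Keyed on (name, number of arguments), like the guards of (I74).
    table: Dict[Tuple[str, int], Callable[[], object]] = {
        ("LE", 2): lambda: succeed_if(int(args[0]) <= int(args[1])),
        ("EQ", 2): lambda: succeed_if(int(args[0]) == int(args[1])),
        ("NE", 2): lambda: succeed_if(int(args[0]) != int(args[1])),
        ("IDENT", 2): lambda: succeed_if(args[0] == args[1]),
        ("DIFFER", 2): lambda: succeed_if(args[0] != args[1]),
        ("SIZE", 1): lambda: len(str(args[0])),
        ("LEN", 1): lambda: make_comp_i("MLEN", int(args[0])),
        ("SPAN", 1): lambda: make_comp_s("MSPN", str(args[0])),
    }
    fn = table.get((name, len(args)))
    if fn is None:
        raise ValueError(f"no primitive {name} with {len(args)} argument(s)")   # T -> error
    return fn()


print(exec_sys_fn("SIZE", ["HELLO"]), exec_sys_fn("LEN", ["3"]))   # 5 ('MLEN', 3)
```

Because the table is keyed on the argument count as well as the name, a call such as LEN with two arguments falls through to the error case, mirroring the "exact number of expected arguments" rule of the subset.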

(I82) define(prototype, entry) =
    insert-desc(np, entry, params, locals);  (I92)
    locals : breakup(c);  (I91)
    c : pass-8(a, b, prototype);  (I83)
    params : breakup(d);  (I91)
    d : pass-9(a, b, prototype);  (I84)
    np : check-and-find(e);  (I87)
    e : pass-10(a, prototype);  (I85)
    b : pass-11(a, prototype);  (I86)
    a : pass(break-scan(prototype, ⟨'('⟩))  (I61)

(I83) pass-8(a, b, pr) = PASS : substr(pr, a + b + 3, length(pr))

(I84) pass-9(a, b, pr) = PASS : substr(pr, a + 2, b)

(I85) pass-10(a, pr) = PASS : substr(pr, 1, a)

(I86) pass-11(a, pr) = PASS : break-scan(substr(pr, a + 2, length(pr)), ⟨')'⟩)  (I61)

The function table is updated with a new function definition, possibly superseding a previous definition for the same function name. The syntactic analysis of the first argument (prototype) of DEFINE cannot be performed at any earlier time. Extra commas are not permitted in a prototype.

(I87) check-and-find(str) =
    identifier(str) → make-str(str)  (I39)
    T → error

(I88) identifier(s) = alphabetic∘head(s) ∧ alphanumeric∘tail(s)  (I89, 90)

(I89) alphabetic(c) =
    any(⟨A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T, U, V, W, X, Y, Z⟩, c) → T  (I59)
    T → F


(I90) alphanumeric(s) =
    is-⟨⟩(s) → T
    alphabetic∘head(s) ∨ any(⟨'0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '.'⟩, head(s)) → alphanumeric∘tail(s)  (I59, 89)
    T → F

(I91) breakup(s) =
    is-⟨⟩(s) → PASS : ⟨⟩
    T → pass-6(a, b);  (I63)
        b : breakup(substr(s, break-scan(s, ⟨','⟩) + 2, length(s)));  (I61)
        a : check-and-find(substr(s, 1, break-scan(s, ⟨','⟩)))  (I61, 87)

(I92) insert-desc(np, entry, params, locals) =
    PASS : 0
    s-function-table : μ(s-function-table(ξ); ⟨s-desc(np) : μ0(⟨s-entry-name : entry⟩, ⟨s-params : params⟩, ⟨s-locals : locals⟩)⟩)
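The prototype analysis of (I82)-(I91) can be traced in Python. The sketch assumes that substr(s, i, n) is a 1-based substring truncated at the end of s, uses a dictionary in place of the function table, and returns names where the VDL returns addresses of string items; `define`, `parse_prototype` and the other names are illustrative, and reading the 0 passed by (I92) as the null string is likewise an assumption. For the prototype 'F(X,Y)L1,L2', a = 1 and b = 3, so pass-9 extracts 'X,Y' starting at position a + 2 = 3 and pass-8 extracts 'L1,L2' starting at position a + b + 3 = 7.

```python
from typing import Dict, List, Tuple

LETTERS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def substr(s: str, start: int, length: int) -> str:
    # Assumed meaning of substr: 1-based start, truncated at the end of s.
    return s[start - 1:start - 1 + length]


def break_scan(s: str, chars: str) -> int:
    # (I61): length of the prefix of s that contains no character of chars.
    n = 0
    while n < len(s) and s[n] not in chars:
        n += 1
    return n


def check_and_find(s: str) -> str:
    # (I87)-(I90): a letter followed by letters, digits or periods.
    if s == "" or s[0] not in LETTERS or any(c not in LETTERS + "0123456789." for c in s[1:]):
        raise ValueError(f"bad identifier in prototype: {s!r}")
    return s


def breakup(s: str) -> List[str]:
    # (I91): split a comma-separated list of identifiers.
    if s == "":
        return []
    k = break_scan(s, ",")
    return [check_and_find(substr(s, 1, k))] + breakup(substr(s, k + 2, len(s)))


def parse_prototype(pr: str) -> Tuple[str, List[str], List[str]]:
    a = break_scan(pr, "(")                               # length of the name
    b = break_scan(substr(pr, a + 2, len(pr)), ")")       # pass-11 (I86)
    name = check_and_find(substr(pr, 1, a))               # pass-10 (I85)
    params = breakup(substr(pr, a + 2, b))                # pass-9 (I84)
    local_vars = breakup(substr(pr, a + b + 3, len(pr)))  # pass-8 (I83)
    return name, params, local_vars


def define(table: Dict[str, dict], prototype: str, entry: str) -> str:
    name, params, local_vars = parse_prototype(prototype)
    # (I92): a new entry replaces any earlier definition of the same name.
    table[name] = {"entry": entry, "params": params, "locals": local_vars}
    return ""


table: Dict[str, dict] = {}
define(table, "F(X,Y)L1,L2", "FENTRY")
print(table["F"])   # {'entry': 'FENTRY', 'params': ['X', 'Y'], 'locals': ['L1', 'L2']}
```

An extra comma, as in 'H(X,,Y)', produces an empty piece that check_and_find rejects, which matches the remark that extra commas are not permitted in a prototype.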

CONCLUSION

The rules (C1)-(C24), (A1)-(A17), (M1)-(M11), and (I1)-(I92) constitute a Vienna-type definition of an interesting subset of SNOBOL4. With the exception of the correspondence between concrete and abstract syntax, the definition is entirely formal. A complete definition of SNOBOL4 would be considerably larger but would have essentially the same character.

SUMMARY

The paper presents an interesting case study in the formal semantic definition of programming languages. A significant definitional formalism, the Vienna Definition Language (VDL), is applied to the specification of the semantics of a useful subset of a significant programming language, SNOBOL4.

In the second section a brief, informal outline of the SNOBOL4 subset is given. The subset is intended to approximate the essential "core" of SNOBOL4, thus retaining a rich and powerful semantics but omitting much tedious detail. The concrete syntax of the subset is specified in an extended BNF notation. The corresponding abstract syntax is given in the third section in the form of VDL predicate definitions. The paper does not formally specify the correspondence between concrete and abstract syntax, as this would be an uninteresting exercise.

A further set of predicate definitions in the fourth section defines the structure of the abstract machine for interpreting (abstract) SNOBOL4 programs. The storage component of the machine consists of an indefinitely long list of "items", each representing an integer, string, or pattern value. In addition to its list of characters, a string item may contain a pointer to another item which is the "value" of the string when considered as a "variable". A pattern item consists of a set of components connected by "subsequent" and "alternate" links, corresponding to the pattern construction operations of concatenation and alternation. The abstract machine also includes a stack for use during pattern matching, a table of programmer-defined functions, a run stack to handle invocations of such functions, and a flag to indicate whether statement failure has occurred.
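The machine state described above might be sketched in Python as follows. The class and method names, the choice of address 0 for a preallocated null string, and the sharing of one item among equal strings (suggested by the (ιx) description in (I58)) are assumptions for illustration, not part of the definition.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Union


@dataclass
class StringItem:
    chars: str
    ref: Optional[int] = None   # address of the string's value as a variable


@dataclass
class IntItem:
    val: int


@dataclass
class PatternItem:
    components: list            # linked by subsequent/alternate component numbers


Item = Union[StringItem, IntItem, PatternItem]


@dataclass
class Machine:
    # Hypothetical convention: address 0 holds the null string.
    storage: List[Item] = field(default_factory=lambda: [StringItem("")])
    pattern_stack: list = field(default_factory=list)
    function_table: Dict[int, dict] = field(default_factory=dict)
    run_stack: list = field(default_factory=list)
    fail_flag: bool = False

    def make_str(self, s: str) -> int:
        # Equal strings share one item, so a string's address identifies it.
        for addr, item in enumerate(self.storage):
            if isinstance(item, StringItem) and item.chars == s:
                return addr
        self.storage.append(StringItem(s))
        return len(self.storage) - 1

    def make_int(self, v: int) -> int:
        self.storage.append(IntItem(v))
        return len(self.storage) - 1

    def _string(self, addr: int) -> StringItem:
        item = self.storage[addr]
        if not isinstance(item, StringItem):
            raise TypeError("only strings can serve as variables")
        return item

    def assign(self, var: int, val: int) -> None:
        # Point the variable named by the string at `var` to the item at `val`.
        self._string(var).ref = val

    def value_of(self, var: int) -> Item:
        # An unassigned variable has the null string as its value.
        ref = self._string(var).ref
        return self.storage[0 if ref is None else ref]


m = Machine()
x = m.make_str("X")
m.assign(x, m.make_int(42))
print(m.value_of(x), m.value_of(m.make_str("Y")))
# IntItem(val=42) StringItem(chars='', ref=None)
```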

The VDL instruction and function definitions in the fifth section specify how the abstract machine interprets programs. The definitions are ordered in a top-down manner and are divided into four groups under the headings Statement Sequencing and Execution, Evaluation of Expressions, Pattern Matching, and Function Invocation. The first group deals with the overall flow of control within a program, the general ways in which statements are executed, and the assignment and output operations. The second group specifies the two general modes of expression evaluation and includes definitions of the operations relating to input, arithmetic, concatenation, type conversion, and pattern construction. The third group constitutes a compact definition of the pattern matching algorithm (anchored fullscan mode). In the last group, invocations of programmer-defined functions are handled with the aid of the run stack; on the other hand, if the function name is not in the function table and has a predefined meaning, the appropriate system operation is performed, one such action being the updating of the function table (DEFINE function).

In comparison with previously published language definitions using VDL, this definition illustrates some ways of applying the metalanguage that are unusual and, at the same time, simple: unusual because of the unusual semantic structure of SNOBOL4, and relatively simple partly because the interpreter-oriented metalanguage is well matched to the interpretive character of the language being defined.

REFERENCES

1. R. E. Griswold, J. F. Poage and I. P. Polonsky, The SNOBOL4 Programming Language, 2nd edn. Prentice-Hall, Englewood Cliffs, NJ (1971).
2. J. A. N. Lee, Computer Semantics. Van Nostrand, New York (1972).
3. P. Lucas, P. Lauer and H. Stigleitner, Method and Notation for the Formal Definition of Programming Languages. TR 25.087, IBM Laboratory, Vienna (1968 and 1970).
4. P. Wegner, The Vienna Definition Language. Comput. Surveys 4, 5-63 (1972).
5. P. Lauer, Formal Definition of ALGOL 60. TR 25.088, IBM Laboratory, Vienna (1968).
6. P. Lucas and K. Walk, On the Formal Definition of PL/I. Ann. Rev. Aut. Prog. 6, 105-182 (1969).
7. J. A. N. Lee, The Formal Definition of the BASIC Language. Comput. J. 15, 37-41 (1972).

About the Author--FRANK G. PAGAN received his Ph.D. in computer science from the University of Toronto in 1972. He has taught at the University of Aston in Birmingham (England) and at Memorial University of Newfoundland, and has carried out research in computational linguistics and in the design, implementation, and formal definition of programming languages. He is the author of an introductory book on Algol 68, published in 1976.