Visual language implementation through standard compiler–compiler techniques

ARTICLE IN PRESS

Journal of Visual Languages and Computing

18 (2007) 165–226

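1045-926X/$ - see front matter. Published by Elsevier Ltd.

doi:10.1016/j.jvlc.2006.06.002

*Corresponding author. Tel.: +39 089 963324; fax: +39 089 963303.

E-mail addresses: [email protected] (G. Costagliola), [email protected] (V. Deufemia), [email protected] (G. Polese).

www.elsevier.com/locate/jvlc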

Gennaro Costagliola, Vincenzo Deufemia*, Giuseppe Polese

Dipartimento di Matematica e Informatica, Università di Salerno, 84084 Fisciano (SA), Italy

Received 11 October 2005; received in revised form 1 March 2006; accepted 8 June 2006

Abstract

We present a technique for implementing visual language compilers through standard compiler generation platforms. The technique exploits eXtended Positional Grammars (XPGs, for short) for modeling the visual languages in a natural way, and uses a set of mapping rules to translate an XPG specification into a translation schema. This lets us generate visual language parsers through standard compiler–compiler techniques and tools like YACC. The generated parser accepts exactly the same set of visual sentences derivable through the application of XPG productions. The technique represents an important achievement, since it enables us to perform visual language compiler construction through standard compiler–compilers rather than specific compiler generation tools. This makes our approach particularly appealing, since compiler–compilers are widely used and rely on a well-founded theory. Moreover, the approach provides the basis for the unification of traditional textual language technologies and visual language compiler technologies.

Published by Elsevier Ltd.

Keywords: Compiler–compiler techniques; Conflict handling techniques; LR parsing; Visual grammars; Visual languages

1. Introduction

In the last two decades research in visual languages has seen the proliferation of grammar formalisms, compiler generation techniques, and compiler–compiler tools [1].


Many of them provide an efficient compiler construction life cycle, resembling those used for textual programming languages. However, as opposed to textual programming languages, visual language compiler construction paradigms lack standardization. There are many different grammar formalisms, each associated with a different compiler–compiler tool. In this article we present a technique for the generation of visual language compilers through standard compiler–compilers, such as YACC [2]. The technique is based upon the formalism of eXtended Positional Grammars (XPGs) [3], and it provides mapping rules to transform an XPG into an equivalent translation schema to be given as input to the compiler–compiler. XPGs extend Positional Grammars (PGs) [4], a previous visual language grammar formalism based on context-free grammars. In particular, XPGs extend PGs with new mechanisms enabling the parsing of a broader class of visual languages [3].

The XPG formalism has string-like productions that induce a scanning order of the symbols during the parsing process, which yields an efficient parser of the input visual sentence. The associated parsing algorithm is the XpLR parser, which is based on the well-known LR parsing technique. The tool for the automatic generation of compilers and visual programming environments is the Visual Language Desk (VLDesk) [3], which is based on XPG and XpLR, and exploits the translation technique presented in this article to generate a compiler through the well-known compiler–compiler tool Bison [5]. The generated parser accepts the same set of visual sentences derivable through the application of XPG productions. As a consequence, the effort needed to adapt the tool to new versions of the grammar model is considerably reduced, since we only have to modify the mapping rules. Moreover, the VLDesk provides a graphical environment to visually specify an XPG, which simplifies the hard task of grammar specification, making it more intuitive. As a consequence, the work presented in this article provides an important contribution toward the derivation of a sound and modular visual language specification and implementation process. In fact, it adds a back-end link toward standard compiler–compiler platforms. Finally, the whole approach allows us to see the implementation of visual and textual languages within a common framework, which provides many potential advantages, since we can analyze hybrid textual–visual languages within the common framework of LR parsing.

Since the proposed translation technique has standard translation schemes as the target notation, for a clear explanation we need to provide background knowledge only for the source notation. Thus, in Sections 2 and 3 we provide the fundamentals of the XPG formalism and the XpLR methodology, before introducing the translation technique in Section 4. A survey of grammar formalisms for visual languages and associated compiler generation tools is given in Section 5. Finally, conclusions are given in Section 6. The approach is illustrated throughout the paper by a running example describing a typical visual language, namely that of state transition diagrams, which is a template for any concrete syntax with different representations of nodes and edges.

2. Modeling visual languages with XPGs

In this section we describe XPGs, a grammar formalism for modeling visual languages. First, we introduce a formal definition of the concepts underlying visual languages [3].


2.1. Describing visual languages

Basically, a visual language is formed by a set of visual sentences over a set of visual symbols from an alphabet. A visual sentence of a language L is a set of visual symbols whose spatial arrangement obeys the syntax of L.

A visual symbol type is characterized by a set of syntactic attributes and a rule called visual pattern. Syntactic attributes are used to relate a vsymbol to others, and their values store the "position" of the vsymbol in the sentence. The visual pattern describes the set of graphical objects with the same visual symbol type, similar to what happens in traditional string languages [6] (in this case the graphical objects play the role of the lexemes).

A visual symbol (vsymbol, for short) of type X is a graphical object that satisfies the visual pattern of X and whose syntactic attributes are completely instantiated. As an example, let us consider the vsymbol type DECISION representing the decision elements of flowcharts. In this case, as usual, the visual pattern describes such elements as rhombuses, whereas the syntactic attributes keep track of the links connected to the attaching points of the rhombuses. The decision element of the flowchart in Fig. 1(a) is a vsymbol of type DECISION.

A visual alphabet S is a set of vsymbol types. A visual sentence (vsentence, for short) on S is a set of vsymbols $\{x_1, x_2, \ldots, x_n\}$ whose types are in S. Examples of vsentences are given in Fig. 1. In Fig. 1(a) the vsymbols are the blocks of the "flowchart". The syntactic attributes of each block correspond to its attaching points and keep track of the connections among the blocks. In the vsentence of Fig. 1(b) some of the vsymbols are characterized by syntactic attributes corresponding to attaching lines, visualized in the figure by thicker lines. Notice that the text in the vsentences may be considered as part of the graphical appearance of a vsymbol (such as a STOP sign), as simple vsymbols (see Fig. 1(a)), or as sentences of string languages, such as C++ code fragments, UML labels (see Fig. 1(b)), and so forth.

Fig. 1. (a) A flowchart, (b) a statechart diagram, and (c) a data flow diagram annotated with a statechart.

In general, the different types of syntactic attributes can be used to identify several classes of syntactic relations, which yield corresponding ways of modeling visual languages. Two main classes of syntactic relations are connection and geometric [3]:

1. A connection relation is specified between vsymbols whose syntactic attributes correspond to attaching points or lines, or more generally regions. In this case, visual sentences can be built by making the attaching regions of vsymbols touch each other or by connecting them through links explicitly represented as lines, circles, arrows, and so forth. The values of the syntactic attributes store the connections linked to the corresponding attaching regions. One or more vsymbols can be connected to an attaching region. As an example, the vsymbol Waiting in Fig. 1(b), representing a simple state in a statechart [7], has one attaching region represented by its border line, and two arrow vsymbols touching it through one of their attaching points.

2. A geometric relation between vsymbols is specified on the coordinates of the upper-left and the lower-right vertices of their bounding boxes. Sentences can be built by composing vsymbols through relations such as containment, sibling, right-to, etc. As an example, the vsymbol Processing in Fig. 1(b) is related to the vsymbol Sending in its bounding box through the containment relation. As another example, the vsymbol a in the string bacb is related to the vsymbol c through the right-to relation. In this case right-to is a visual counterpart of the string concatenation relation. As a consequence, it is possible to model string languages as a special case of visual languages.

Depending on the abstraction level of the visual language representation, syntactic relations may have an explicit visual representation. In this case we call them explicit relations. As an example, the arrows in the flowchart of Fig. 1(a) can be modeled either as vsymbols with two attaching points, each connected to a different block or condition vsymbol, or as a connection relation linking the attaching regions of two vsymbols. Another example of a relation that can have an explicit representation is the annotation shown in Fig. 1(c). Here, the dotted cone can be modeled either as an explicit relation involving the process symbol Scan Keypad and the state transition diagram, or as a vsymbol with the spatial relation "touches" with respect to the process and "includes" with respect to the annotating diagram. Other examples of explicit geometric relations may be zooming relations, gesture representations, call-outs, etc.

2.2. Extended positional grammars

XPGs are a grammar formalism that has been successfully used to model a wide class of visual languages, including Class diagrams, Petri nets, Statechart diagrams, and Activity diagrams [3].

An XPG is a particular type of context-free¹ string attributed grammar $(N, T \cup POS, S, P)$ where:

• $N$ is a finite nonempty set of nonterminal vsymbol types;
• $T$ is a finite nonempty set of terminal vsymbol types, with $N \cap T = \emptyset$ and $T = T_T \cup T_F$, where $T_T$ are true terminals and $T_F$ are false terminals;
• $POS$ is a finite set of binary relation identifiers, with $POS \cap N = \emptyset$ and $POS \cap T = \emptyset$;
• $S \in N$ denotes the starting vsymbol type;
• $P$ is a finite nonempty set of productions having the following format:

$$A \to x_1 R_1 x_2 R_2 \ldots x_{m-1} R_{m-1} x_m,\ D,\ G$$

where $A$ is a nonterminal vsymbol type, $x_1 R_1 x_2 R_2 \ldots x_{m-1} R_{m-1} x_m$ is a linear representation with respect to $POS$ where each $x_i$ is a vsymbol type in $N \cup T$, and each $R_j$ is partitioned into two sub-sequences

$$(\langle REL_1^{h_1}, \ldots, REL_k^{h_k} \rangle, \langle REL_{k+1}^{h_{k+1}}, \ldots, REL_n^{h_n} \rangle) \quad \text{with } 1 \le k \le n.$$

The relation identifiers in the first sub-sequence of an $R_j$ are called driver relations, whereas the ones in the second sub-sequence are called tester relations. During syntax analysis, driver relations are used to determine the next vsymbol to be scanned, whereas tester relations are used to check whether the last scanned vsymbol (terminal or nonterminal) is properly related to some previously scanned vsymbols.

Without loss of generality we assume that there are no useless vsymbol types, and no unit and empty productions [6].

$D$ is a set of rules used to synthesize the values of the syntactic attributes of $A$ from those of $x_1, x_2, \ldots, x_m$.

$G$ is a set of triples $\{(T_j, Cond_j, D_j)\}_{j=1,\ldots,t}$, $t \ge 0$, used to dynamically insert new terminal vsymbols in the input visual sentence during the parsing process. In particular,

• $T_j$ is the terminal vsymbol type of the vsymbol to be inserted in the input visual sentence;
• $Cond_j$ is a pre-condition to be verified in order to insert $T_j$;
• $D_j$ is the rule used to compute the values of the syntactic attributes of $T_j$ from those of $x_1, \ldots, x_m$.

¹ Here "context-free" means that the grammar productions are in "context-free" format and does not refer to the computational power of the formalism.

Moreover, a property that guarantees the convergence of parsing algorithms based on XPGs is: "for each production $A \to x_1 \ldots x_m$, $D$, $G$ the number of triples in $G$ whose conditions can simultaneously evaluate to true must be less than $m - 1$". This means that no more than $m - 2$ vsymbols can be inserted in the input during the application of a production.
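To make the shape of a production concrete, the following is a minimal Python sketch of one possible in-memory representation. It is not the VLDesk data model: the class names, the encoding of attribute values as sets of connection labels, and the toy production `A -> X REL1 Y <REL2; REL3> Z` are all assumptions introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Tuple

# Assumed encoding: attribute name -> set of connection labels.
Attrs = Dict[str, Set[str]]


@dataclass
class RelationSeq:
    """One R_j: drivers select the next vsymbol, testers check it."""
    drivers: List[str]
    testers: List[str] = field(default_factory=list)


@dataclass
class InsertionTriple:
    """One element (T_j, Cond_j, D_j) of the G component."""
    terminal_type: str
    condition: Callable[[List[Attrs]], bool]
    attribute_rule: Callable[[List[Attrs]], Attrs]


@dataclass
class Production:
    lhs: str
    rhs: List[str]                              # x_1 ... x_m
    relations: List[RelationSeq]                # R_1 ... R_{m-1}
    synthesize: Callable[[List[Attrs]], Attrs]  # the D rule
    insertions: List[InsertionTriple] = field(default_factory=list)  # G

    def __post_init__(self) -> None:
        if len(self.relations) != len(self.rhs) - 1:
            raise ValueError("m vsymbols need m-1 relation sequences")

    def fired_insertions(self, rhs_attrs: List[Attrs]) -> List[Tuple[str, Attrs]]:
        """Evaluate G on the attributes of x_1..x_m."""
        return [(t.terminal_type, t.attribute_rule(rhs_attrs))
                for t in self.insertions if t.condition(rhs_attrs)]

    def within_insertion_bound(self, rhs_attrs: List[Attrs]) -> bool:
        """Fewer than m-1 triples may fire for the same right-hand side."""
        return len(self.fired_insertions(rhs_attrs)) < len(self.rhs) - 1


# Hypothetical toy production with m = 3 and a single G triple.
toy = Production(
    lhs="A",
    rhs=["X", "Y", "Z"],
    relations=[RelationSeq(["REL1"]), RelationSeq(["REL2"], ["REL3"])],
    synthesize=lambda x: {"pos": x[0]["pos"]},
    insertions=[InsertionTriple(
        "T_NEW",
        lambda x: len(x[2]["pos"]) > 1,
        lambda x: {"pos": x[2]["pos"] - x[1]["pos"]},
    )],
)
attrs = [{"pos": {"p1"}}, {"pos": {"p2"}}, {"pos": {"p2", "p3"}}]
```

With these hypothetical values, the single triple fires and stays within the bound, since one insertion is fewer than m - 1 = 2.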

In the following we characterize the languages described by an extended positional grammar $XPG = (N, T \cup POS, S, P)$. We write $a \Leftarrow b$ and say that $b$ reduces to $a$ in one step, if there exist $d$, $g$, $A$, $Z$ such that

1. $A \to Z$, $D$, $G$ is a production in $P$,
2. $b = dZg$,
3. $a = dA'pg$, where $A'$ is a nonterminal whose attributes are set according to the rule $D$ and $p$ results from the application of the rule $G$.

We also write $a \Leftarrow^{i} b$ to indicate that the reduction has been achieved by applying production $i$. Moreover, we write $a \Leftarrow^{*} b$ and say that $b$ reduces to $a$, if there exist $a_0, a_1, \ldots, a_m$ ($m \ge 0$) such that

$$a = a_0 \Leftarrow a_1 \Leftarrow \cdots \Leftarrow a_m = b.$$

The sequence $a_m, a_{m-1}, \ldots, a_0$ is called a derivation of $a$ from $b$.

• A positional sentential form from $S$ is a string $b$ such that $S \Leftarrow^{*} b$.
• A positional sentence from $S$ is a string $b$ containing no nonterminals and no false terminals, and such that $S \Leftarrow^{*} b$.

A visual sentence is obtained by instantiating the physical appearance and the syntactic attributes of the true terminal vsymbol types of a positional sentence from S (this is meant to be realized by a materialization function PE [3]).

The language described by an XPG, L(XPG), is the set of the visual sentences obtained from the starting vsymbol type S of XPG.

In the following we define the notion of reachability between two vsymbols, which will be used in the next sections. Given the two pairs $(x, k)$ and $(y, j)$, where $x \in N \cup T$, $y \in T$, $k$ is a syntactic attribute of $x$, and $j$ is a syntactic attribute of $y$, we say that $(y, j)$ is reachable from $(x, k)$ iff one of the following situations occurs:

1. $x = y$;
2. there exists a production $x \to x_1 R_1 x_2 \ldots x_i \ldots R_{m-1} x_m$, $D$, $G$ in $P$ such that attribute $k$ of $x$ is synthesized from attribute $h$ of $x_1$ by means of $D$, and $(y, j)$ is reachable from $(x_1, h)$.

If $(y, j)$ is reachable from $(x, k)$, we also say that $y$ is reachable from $x$.

The notion of reachability allows us to provide a formal definition of driver relation. Let $x$ be the vsymbol type that follows $R_i$ in the right-hand side of a production. If $x$ is a terminal then all the relations in $R_i$ are drivers. Otherwise, the driver relations of $R_i$ are those defined on the same syntactic attribute $k$ of $x$ such that $(y, j)$ is reachable from $(x, k)$ with $y \in T$. Note that this property can be verified statically, relieving the grammar designer from the need to explicitly tag the relations.

In the following we show two examples of XPG grammars, the first describing a context-sensitive string language, and the second modeling a state transition diagram language.
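Because reachability depends only on the productions, it can be computed statically. The sketch below is one way to do so in Python, under the assumption that each production is summarised as a triple `(lhs, x_1, {k: h})` recording which attribute `h` of the first right-hand-side vsymbol each attribute `k` of the left-hand side is synthesized from. The summary format, the function name, and the toy grammar are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# (lhs, first right-hand-side vsymbol x_1, {k: h}) where attribute k of lhs
# is synthesized from attribute h of x_1 by the D rule.
ProductionSummary = Tuple[str, str, Dict[str, str]]


def reachable_terminals(x: str, k: str,
                        productions: List[ProductionSummary],
                        terminals: Set[str]) -> Set[str]:
    """Return the terminal types y such that some (y, j) is reachable from (x, k)."""
    found: Set[str] = set()
    seen: Set[Tuple[str, str]] = set()
    stack = [(x, k)]
    while stack:
        sym, attr = stack.pop()
        if (sym, attr) in seen:          # guards against left recursion
            continue
        seen.add((sym, attr))
        if sym in terminals:             # situation 1: x = y with y in T
            found.add(sym)
            continue
        for lhs, first, synth in productions:
            if lhs == sym and attr in synth:   # situation 2
                stack.append((first, synth[attr]))
    return found


# Hypothetical toy grammar: S -> A ... | a ...,  A -> A ... | c ...
toy_productions: List[ProductionSummary] = [
    ("S", "A", {"head": "head"}),
    ("S", "a", {"head": "head", "tail": "tail"}),
    ("A", "A", {"head": "head"}),
    ("A", "c", {"head": "head"}),
]
toy_terminals = {"a", "b", "c"}
```

The `seen` set is what keeps the traversal finite for left-recursive productions, which are common in grammars of this kind.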

Example 2.1. Let us consider the context-sensitive language $L = \{a^n b^n c^n \mid n \ge 1\}$. It is generated by the string grammar with the following productions:

(1) $S \to aBSc$
(2) $S \to aBc$
(3) $Bc \to bc$
(4) $Ba \to aB$
(5) $Bb \to bb$,

where the nonterminals are $S$ and $B$, and the terminals are $a$, $b$, and $c$. As a matter of fact, the sentence $a^2b^2c^2$ is obtained through the following derivation:

$$S \Rightarrow^{1} aBSc \Rightarrow^{2} aBaBcc \Rightarrow^{3} aBabcc \Rightarrow^{4} aaBbcc \Rightarrow^{5} aabbcc.$$
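The derivation above can be replayed mechanically. The following Python sketch rewrites the leftmost occurrence of a production's left-hand side at each step; the leftmost-match policy and the helper names are assumptions made for this illustration, since the string grammar itself does not prescribe where a production is applied.

```python
from typing import Dict, List, Tuple

# The five productions of the string grammar for a^n b^n c^n.
PRODUCTIONS: Dict[int, Tuple[str, str]] = {
    1: ("S", "aBSc"),
    2: ("S", "aBc"),
    3: ("Bc", "bc"),
    4: ("Ba", "aB"),
    5: ("Bb", "bb"),
}


def apply_production(form: str, number: int) -> str:
    """Rewrite the leftmost occurrence of the production's left-hand side."""
    lhs, rhs = PRODUCTIONS[number]
    index = form.find(lhs)
    if index < 0:
        raise ValueError(f"production {number} is not applicable to {form!r}")
    return form[:index] + rhs + form[index + len(lhs):]


def derive(sequence: List[int], start: str = "S") -> List[str]:
    """Return every sentential form produced by applying the given productions in order."""
    forms = [start]
    for number in sequence:
        forms.append(apply_production(forms[-1], number))
    return forms


steps = derive([1, 2, 3, 4, 5])
```

Here `steps` holds the six sentential forms of the derivation, ending in `aabbcc`.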

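The XPG which generates $L$ can be obtained by modifying this string grammar accordingly. In particular, the set of nonterminals is given by $N = \{S, B\}$ where each vsymbol type has two syntactic attributes, called head and tail, both specifying a position in the plane. The set of terminals is given by $T = T_T = \{a, b, c\}$ ($T_F = \emptyset$); each terminal has one syntactic attribute (the pair of coordinates of its centroid), referred to as head or tail interchangeably. As previously described, the right-to relation is the visual counterpart of the string concatenation relation. Thus, the set of relations is given by $POS = \{\text{right-to}\}$ and the right-to relation can be defined as

$u\ \langle\text{right-to}\rangle\ v$ if and only if $\exists!\, v \mid v_{\text{head}_x} = u_{\text{tail}_x} + 1$ and $v_{\text{head}_y} = u_{\text{tail}_y}$,

where $u, v \in N \cup T$. The set of productions $P$ is described below.

(1′) $S \to a\ \langle\text{right-to}\rangle\ B\ \langle\text{right-to}\rangle\ S\ \langle\text{right-to}\rangle\ c$
     D: ($S_{\text{head}} = a_{\text{head}}$; $S_{\text{tail}} = c_{\text{tail}}$)

(2′) $S \to a\ \langle\text{right-to}\rangle\ B\ \langle\text{right-to}\rangle\ c$
     D: ($S_{\text{head}} = a_{\text{head}}$; $S_{\text{tail}} = c_{\text{tail}}$)

(3′) $B \to b\ \langle\text{right-to}\rangle\ c$
     D: ($B_{\text{head}} = b_{\text{head}}$; $B_{\text{tail}} = b_{\text{tail}}$)
     G: $\{(c', \text{true}, c'_{\text{head}} = c_{\text{head}}; c'_{\text{tail}} = c_{\text{tail}})\}$

(4′) $B \to a\ \langle\text{right-to}\rangle\ B'$
     D: ($B_{\text{head}} = a_{\text{head}}$; $B_{\text{tail}} = a_{\text{tail}}$)
     G: $\{(a', \text{true}, a'_{\text{head}} = B'_{\text{head}}; a'_{\text{tail}} = B'_{\text{tail}})\}$

(5′) $B \to b\ \langle\text{right-to}\rangle\ b'$
     D: ($B_{\text{head}} = b_{\text{head}}$; $B_{\text{tail}} = b_{\text{tail}}$)
     G: $\{(b'', \text{true}, b''_{\text{head}} = b'_{\text{head}}; b''_{\text{tail}} = b'_{\text{tail}})\}$.

Notice that the prime marks are used to distinguish different occurrences of the same vsymbol type, and that the terminals in the left-hand side of the string grammar productions are moved into the G rules of the XPG productions.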
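The right-to relation defined in this example is a simple coordinate test, and following it from symbol to symbol recovers the string order from an unordered set of placed terminals. The sketch below illustrates this in Python. It assumes integer grid coordinates, that each terminal's head and tail coincide with its centroid (as stated above), and that scanning starts at the leftmost symbol; the function names are hypothetical.

```python
from typing import List, Optional, Tuple

Point = Tuple[int, int]
Placed = Tuple[str, Point]   # (terminal type, centroid used as both head and tail)


def right_to(u_tail: Point, v_head: Point) -> bool:
    """v is right-to u when v's head is one unit to the right of u's tail, on the same row."""
    return v_head[0] == u_tail[0] + 1 and v_head[1] == u_tail[1]


def next_symbol(current: int, symbols: List[Placed]) -> Optional[int]:
    """Index of the unique symbol that is right-to the current one, if any."""
    tail = symbols[current][1]
    matches = [i for i, (_, head) in enumerate(symbols)
               if i != current and right_to(tail, head)]
    return matches[0] if len(matches) == 1 else None


def linearize(symbols: List[Placed]) -> str:
    """Follow right-to from the leftmost symbol (an assumed starting rule)."""
    current: Optional[int] = min(range(len(symbols)), key=lambda i: symbols[i][1])
    out = []
    while current is not None:
        out.append(symbols[current][0])
        current = next_symbol(current, symbols)
    return "".join(out)


unordered = [("b", (1, 0)), ("c", (2, 0)), ("a", (0, 0))]
```

The input list carries no order of its own; the order `a`, `b`, `c` emerges only from the relation.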

Example 2.2. In this example we introduce an XPG modeling state transition diagrams that recognize sentences on a and b. Let $STD = (N, T \cup POS, S, P)$ be the XPG for state transition diagrams, characterized as follows. The set of nonterminals is given by $N = \{\text{Graph}, \text{Node}, \text{Edge}, \text{NLabel}, \text{ELabel}\}$, where the first two vsymbol types have one attaching region as a syntactic attribute, Edge has two attaching points as its syntactic attributes, and the last two vsymbol types have two syntactic attributes, called head and tail, both specifying a position in the plane. Graph is the starting vsymbol type, i.e., $S = \text{Graph}$.

The set of terminals is given by $T_T = \{\text{NODEI}, \text{NODEIF}, \text{NODEF}, \text{NODEG}, \text{EDGE}, a, b, \text{DIGIT}\}$ and $T_F = \{\text{PLACEHOLD}\}$. The true terminals NODEI, NODEIF, NODEF, and NODEG represent the initial, the initial and final, the final, and the generic node, respectively, of a state transition diagram. As syntactic attributes they have one attaching region corresponding to the borderline of the node, and one containment area corresponding to the circle area representing the node. The true terminal EDGE has two attaching points as its syntactic attributes, corresponding to the start and end points of the edge. PLACEHOLD is a false terminal to be dynamically inserted in the input sentence during the parsing process. It has one attaching region as syntactic attribute. The vsymbol types a and b represent the labels of the edges and have two syntactic attributes, called head and tail, both specifying a position in the plane. Finally, DIGIT is a vsymbol type whose visual pattern matches the decimal digits 0–9. It is used to compose each node's label, and has two syntactic attributes, called head and tail, both specifying a position in the plane.

Fig. 2. A typical visual representation of nonterminals and terminals for the grammar STD.

Typical instances of vsymbols for this language are graphically depicted in Fig. 2. Here, each attaching region is represented by a bold line and is identified by a number, each containment area is represented by a light gray area, and the attaching points are represented by bullets. In the following, the notation $Vsym_i$ denotes the attaching point $i$ of the vsymbol Vsym.

The set of relations is given by $POS = \{LINK_{i,j}, \text{any}, \text{contains}, \text{edge-labelling}\}$, where:

• $LINK_{i,j}$ is defined as follows: a vsymbol x is in relation $LINK_{i,j}$ with a vsymbol y iff attaching point (or region) i of x is connected to attaching point (or region) j of y, and will be denoted as i_j to simplify the notation. Moreover, we use a negated form of this notation when describing the absence of a connection between two attaching areas i and j.
• The relation identifier any denotes a relation that is always satisfied between any pair of vsymbols.
• contains is a containment geometric relation. In particular, if A is a vsymbol with a containment area as syntactic attribute and B is a vsymbol, then A contains B if and only if B is inside the containment area of A. As an example from Fig. 3(a), NODEI contains the digit 1 in its containment area.
• edge-labelling is a geometric relation. In particular, if A is a vsymbol of type EDGE and B is a vsymbol representing a string label, then A edge-labelling B if and only if B is close to A with respect to their syntactic attributes. As an example from Fig. 3(a), the string labels a and b are close to the edges of the diagram.
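As a rough illustration of how such relations might be evaluated, the Python sketch below implements three of them as predicates. It rests on assumptions that are not part of the grammar definition: attaching points and regions are encoded as sets of connection labels (two of them are connected when they share a label), and containment areas and symbols are approximated by axis-aligned boxes `(x1, y1, x2, y2)`.

```python
from typing import Dict, Set, Tuple

Attaching = Dict[int, Set[str]]          # attaching point/region number -> connection labels
Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), upper-left and lower-right


def link(x: Attaching, i: int, y: Attaching, j: int) -> bool:
    """LINK_{i,j}: attaching area i of x and attaching area j of y share a connection."""
    return bool(x[i] & y[j])


def any_relation(x: object, y: object) -> bool:
    """The relation 'any' is satisfied by every pair of vsymbols."""
    return True


def contains(area: Box, inner: Box) -> bool:
    """A contains B when B's box lies inside A's containment area."""
    ax1, ay1, ax2, ay2 = area
    bx1, by1, bx2, by2 = inner
    return ax1 <= bx1 and ay1 <= by1 and bx2 <= ax2 and by2 <= ay2


# Hypothetical instances: a node whose border touches the start point of an edge.
node = {1: {"c1", "c2"}}
edge = {1: {"c1"}, 2: {"c3"}}
```

With these values, `link(node, 1, edge, 1)` holds (the relation 1_1), while `link(node, 1, edge, 2)` does not.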

Next, we provide the set of productions for describing state transition diagrams.

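(1) $\mathit{Graph} \to \mathit{NODEI}\ \langle\text{contains}\rangle\ \mathit{NLabel}$
    D: ($\mathit{Graph}_1 = \mathit{NODEI}_1$)

(2) $\mathit{Graph} \to \mathit{NODEIF}\ \langle\text{contains}\rangle\ \mathit{NLabel}$
    D: ($\mathit{Graph}_1 = \mathit{NODEIF}_1$)

(3) $\mathit{Graph} \to \mathit{Graph}'\ \langle\langle 1\_1 \rangle, \langle 1\_2 \rangle\rangle\ \mathit{Edge}\ \langle 2\_1 \rangle\ \mathit{Node}$
    D: ($\mathit{Graph}_1 = \mathit{Graph}'_1 - \mathit{Edge}_1$)
    G: $\{(\mathit{PLACEHOLD};\ |\mathit{Node}_1| > 1;\ \mathit{PLACEHOLD}_1 = \mathit{Node}_1 - \mathit{Edge}_2)\}$

(4) $\mathit{Graph} \to \mathit{Graph}'\ \langle\langle 1\_1 \rangle, \langle 1\_2 \rangle\rangle\ \mathit{Edge}$
    D: ($\mathit{Graph}_1 = (\mathit{Graph}'_1 - \mathit{Edge}_1) - \mathit{Edge}_2$)

(5) $\mathit{Graph} \to \mathit{Graph}'\ \langle\langle 1\_2 \rangle, \langle 1\_1 \rangle\rangle\ \mathit{Edge}\ \langle 1\_1 \rangle\ \mathit{Node}$
    D: ($\mathit{Graph}_1 = \mathit{Graph}'_1 - \mathit{Edge}_2$)
    G: $\{(\mathit{PLACEHOLD};\ |\mathit{Node}_1| > 1;\ \mathit{PLACEHOLD}_1 = \mathit{Node}_1 - \mathit{Edge}_1)\}$

(6) $\mathit{Graph} \to \mathit{Graph}'\ \langle\text{any}\rangle\ \mathit{PLACEHOLD}$
    D: ($\mathit{Graph}_1 = \mathit{Graph}'_1 + \mathit{PLACEHOLD}_1$)

(7) $\mathit{Node} \to \mathit{NODEG}\ \langle\text{contains}\rangle\ \mathit{NLabel}$
    D: ($\mathit{Node}_1 = \mathit{NODEG}_1$)

(8) $\mathit{Node} \to \mathit{NODEF}\ \langle\text{contains}\rangle\ \mathit{NLabel}$
    D: ($\mathit{Node}_1 = \mathit{NODEF}_1$)

(9) $\mathit{Node} \to \mathit{PLACEHOLD}$
    D: ($\mathit{Node}_1 = \mathit{PLACEHOLD}_1$)

(10) $\mathit{Edge} \to \mathit{EDGE}\ \langle\text{edge-labelling}\rangle\ \mathit{ELabel}$
     D: ($\mathit{Edge}_1 = \mathit{EDGE}_1$; $\mathit{Edge}_2 = \mathit{EDGE}_2$)

(11) $\mathit{NLabel} \to \mathit{DIGIT}$
     D: ($\mathit{NLabel}_{\text{head}} = \mathit{DIGIT}_{\text{head}}$; $\mathit{NLabel}_{\text{tail}} = \mathit{DIGIT}_{\text{tail}}$)

(12) $\mathit{NLabel} \to \mathit{NLabel}'\ \langle\text{right-to}\rangle\ \mathit{DIGIT}$
     D: ($\mathit{NLabel}_{\text{head}} = \mathit{NLabel}'_{\text{head}}$; $\mathit{NLabel}_{\text{tail}} = \mathit{DIGIT}_{\text{tail}}$)

(13) $\mathit{ELabel} \to a$
     D: ($\mathit{ELabel}_{\text{head}} = a_{\text{head}}$; $\mathit{ELabel}_{\text{tail}} = a_{\text{tail}}$)

(14) $\mathit{ELabel} \to b$
     D: ($\mathit{ELabel}_{\text{head}} = b_{\text{head}}$; $\mathit{ELabel}_{\text{tail}} = b_{\text{tail}}$).

Fig. 3. The reduction process for a state transition diagram.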
Notice that $\mathit{Graph}_1 = \mathit{Graph}'_1 - \mathit{Edge}_1$ indicates set difference and is to be interpreted as follows: "the attaching area 1 of Graph has to be connected to whatever is attached to the attaching area 1 of Graph′ except for the attaching point 1 of Edge". Moreover, the notation $|\mathit{Node}_1|$ indicates the number of connections to the attaching area 1 of Node.

According to these rules, a state transition diagram is described by a graph defined as

• an initial node containing a label (production 1), or as
• an initial–final node containing a label (production 2), or, recursively, as
• a graph connected to a node through an outgoing (production 3) or incoming (production 5) edge, or as
• a graph with a loop edge (production 4).

A node can be either a generic node containing a label (production 7) or a final node containing a label (production 8). An edge is labeled (production 10) by a (production 13) or b (production 14). A node label is the string concatenation of decimal digits (productions 11 and 12).

During the reduction process, the introduction of the PLACEHOLD false terminals (productions 3 and 5) and their successive processing (productions 6 and 9) allow us to keep knowledge of the source and the target node of each reduced edge. The same result could be achieved by using the terminal NODEG instead of PLACEHOLD. However, this would let the grammar also describe unconnected graph structures.

Figs. 3(a)–(i) show the steps to reduce a state transition diagram for the language a+b through the XPG STD shown above. In particular, dashed ovals indicate the handles to be reduced, and their labels indicate the productions to be used. The reduction process starts by applying production 11 to the digit inside the initial state, which is reduced to the nonterminal NLabel. Then, production 1 reduces the initial state and NLabel to the nonterminal Graph. Due to the D rule of production 1, Graph inherits all the connections of NODEI. Similarly, the application of productions 11 and 7 replaces the unique NODEG and the digit of Fig. 3(a) with the nonterminal Node. Fig. 3(b) shows the resulting sentential form, and highlights the handle for the application of productions 13 and 10. The first reduces the edge label to the nonterminal ELabel; the latter reduces the terminal EDGE and ELabel to the nonterminal Edge. Due to the D rule of production 10, Edge inherits all the connections of EDGE. Fig. 3(c) shows the resulting sentential form, and highlights the handle for the application of production 3. The vsymbol types Graph, Edge, and Node are then reduced to the new nonterminal Graph. Due to the D rule of production 3, the new Graph is connected to all the remaining edges attached to the old Graph. Moreover, due to the G rule, since $|\mathit{Node}_1| = 4 > 1$, a new node PLACEHOLD is inserted in the input, and it is connected to all the remaining edges attached to the old Node. Fig. 3(d) shows the resulting sentential form. Production 6 reduces the nonterminal Graph and the false terminal PLACEHOLD to a new nonterminal Graph. By applying the D rule of production 6, the new Graph inherits all the connections to PLACEHOLD (see Fig. 3(e)). After the application of productions 13, 10, 11, and 8 the sentential form reduces to the one shown in Fig. 3(f). The subsequent application of productions 4, 9, 13, 10, and 3 reduces the original state transition diagram to the starting nonterminal in Fig. 3(i), confirming that the visual sentence associated with the initial state transition diagram belongs to the visual language L(STD).
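The effect of the D and G rules in the reduction above can be traced with ordinary set operations. The Python sketch below applies the rules of productions 3 and 6 to sets of connection labels. The labels `c1` to `c6` are invented for illustration (they are not taken from Fig. 3), and reading the `+` of production 6 as set union is an assumption.

```python
from typing import Optional, Set, Tuple


def apply_production_3(graph1: Set[str], edge1: Set[str], edge2: Set[str],
                       node1: Set[str]) -> Tuple[Set[str], Optional[Set[str]]]:
    """D: Graph_1 = Graph'_1 - Edge_1.  G: insert PLACEHOLD if |Node_1| > 1."""
    new_graph1 = graph1 - edge1
    placeholder1 = node1 - edge2 if len(node1) > 1 else None
    return new_graph1, placeholder1


def apply_production_6(graph1: Set[str], placeholder1: Set[str]) -> Set[str]:
    """D: Graph_1 = Graph'_1 + PLACEHOLD_1, read here as set union."""
    return graph1 | placeholder1


# Hypothetical connection labels: the edge leaves the graph at c1 and enters
# a node at c3; that node has four connections in total.
graph1 = {"c1", "c2"}
edge1, edge2 = {"c1"}, {"c3"}
node1 = {"c3", "c4", "c5", "c6"}

reduced_graph1, placeholder1 = apply_production_3(graph1, edge1, edge2, node1)
merged_graph1 = apply_production_6(reduced_graph1, placeholder1)
```

After both steps the new Graph carries the connections `c2`, `c4`, `c5`, and `c6`, which is how the edges still attached to the reduced node remain available to later reductions.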

3. The XpLR methodology

The XpLR methodology is a framework for implementing visual systems based upon XPGs and LR parsing [3]. An XpLR parser scans the input in a nonsequential way, driven by the relations used in the grammar.

3.1. Lexical analysis

The role of a lexical analyzer in the parsing of visual languages is to preprocess the input visual sentence in order to put it in a format suitable for the syntax analysis phase. In particular, the lexical analyzer associates each graphical object with a proper vsymbol type, and instantiates its syntactic attributes. In the case of string languages this means giving a value to the attribute position, which is the only attribute characterizing vsymbols in this language, and represents the position index of the vsymbol in a string. In the case of Plex visual languages² [8] the attributes of vsymbols are attaching points, numbered and represented by an array ap[1], ..., ap[n]. Instantiating them for a vsymbol v means giving a value to each ap[i], representing a unique label assigned to the link plugged into attaching point i of v. Similarly, in the case of Graph visual languages³ the attributes are attaching regions, numbered and represented by an array aps[1], ..., aps[n] of sets. The value of aps[i] for a vsymbol v is the set of labels of the links plugged into attaching region i of v. If the input picture contains explicit relations, i.e., the relations have a graphical representation, its attribute-based representation is augmented with an array COUNTER containing an entry for each explicit relation. The entry COUNTER(r) for an explicit relation labeled r with degree n contains the value n − 1. This value indicates the number of binary relations describing r in any relative representation of the picture.

Fig. 4. The lexical analysis of an activity diagram. The lexical analyzer maps the edited visual sentence to the following dictionary (attribute-based representation):

Id  TypeName     aps[1]  aps[2]
1   Start        {a}     -
2   Activity     {a}     {b}
3   Synchronize  {b}     {c,d}
4   Activity     {c}     {e}
5   Activity     {d}     {f}
6   Synchronize  {e,f}   {g}
7   Halt         {g}     -

COUNTER: a = 1, b = 1, c = 1, d = 1, e = 1, f = 1, g = 1

Fig. 4 shows the attribute-based representation produced by the lexical analyzer on the input activity diagram. Here node and edge labels have been explicitly represented only to better describe the corresponding attribute-based representation. Since all the binary relations in the diagram are explicit, the output dictionary also includes an array COUNTER.
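The dictionary of Fig. 4 and its COUNTER array can be written down directly. The Python sketch below encodes the seven vsymbols and derives COUNTER(r) = n − 1 by taking the degree n of a link to be the number of attaching regions it is plugged into; that reading of "degree", and the names `DICTIONARY` and `build_counter`, are assumptions of this sketch.

```python
from typing import Dict, List, Set, Tuple

# id -> (type name, [aps[1], aps[2]]) as listed in Fig. 4; "-" becomes an empty set.
DICTIONARY: Dict[int, Tuple[str, List[Set[str]]]] = {
    1: ("Start", [{"a"}, set()]),
    2: ("Activity", [{"a"}, {"b"}]),
    3: ("Synchronize", [{"b"}, {"c", "d"}]),
    4: ("Activity", [{"c"}, {"e"}]),
    5: ("Activity", [{"d"}, {"f"}]),
    6: ("Synchronize", [{"e", "f"}, {"g"}]),
    7: ("Halt", [{"g"}, set()]),
}


def build_counter(dictionary: Dict[int, Tuple[str, List[Set[str]]]]) -> Dict[str, int]:
    """COUNTER(r) = n - 1, with n the number of regions that link r is plugged into."""
    degree: Dict[str, int] = {}
    for _, aps in dictionary.values():
        for region in aps:
            for label in region:
                degree[label] = degree.get(label, 0) + 1
    return {label: n - 1 for label, n in degree.items()}


COUNTER = build_counter(DICTIONARY)
```

Every link in this diagram joins exactly two regions, so each entry of `COUNTER` is 1, in agreement with the figure.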

Obviously, the task of identification of the vsymbols in a picture strongly depends on the visual editor used to compose the vsentence. In our approach we assume that the editor includes a palette with the graphical appearance of the language vsymbols. Thus, each graphical object in the edited sentence has associated detailed information about the vsymbol it represents, which simplifies the creation of a dictionary during the scanning of the input picture.

Nevertheless, when using general purpose graphical editors the lexical analyzer needs more complex visual patterns in order to match the drawings in the input picture with the vsymbols of the language. This process is particularly complex in pen-based interfaces, where sketch recognizers take as input raw strokes and visual patterns of the symbols to be recognized. For example, the circle recognizer reports a circle if all the points on a stroke lie at roughly the same distance from the average X and Y coordinates of the stroke [9].
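The circle test described above can be sketched in a few lines of Python. This is an illustrative reading of that one-sentence description, not the recognizer of [9]: the relative `tolerance` parameter and the function name are assumptions.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def is_circle(stroke: List[Point], tolerance: float = 0.1) -> bool:
    """True if every point lies at roughly the same distance from the stroke's average point."""
    if not stroke:
        return False
    cx = sum(x for x, _ in stroke) / len(stroke)
    cy = sum(y for _, y in stroke) / len(stroke)
    distances = [math.hypot(x - cx, y - cy) for x, y in stroke]
    mean = sum(distances) / len(distances)
    if mean == 0:
        return False
    # "Roughly the same" is taken as within a relative tolerance of the mean radius.
    return all(abs(d - mean) <= tolerance * mean for d in distances)


circle_stroke = [(math.cos(2 * math.pi * i / 36), math.sin(2 * math.pi * i / 36))
                 for i in range(36)]
line_stroke = [(float(i), 0.0) for i in range(11)]
```

A sampled unit circle passes the test, while a straight stroke fails because its points lie at widely varying distances from their average.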

² Plex languages relate vsymbols by line and arrow connections acting on their attaching points. The Plex languages are suitable for modeling graph-structured visual languages, with the limitation that each vsymbol can only have a fixed number of connections, like chemical structures, logic diagrams, electrical circuits, etc.

³ Graph languages are a generalization of Plex languages since the relations act on attaching regions of the vsymbols they connect. Graph languages are suitable for modeling general graph-structured visual languages whose vsymbols can have any number of connections.

Fig. 5. The architecture of an XpLR parser.


3.2. The XpLR parser

The components of an XpLR parser are shown in Fig. 5 and are detailed in the following.

The input to the parser is the dictionary, called Dp, storing the attribute-based representation of a picture as produced by the lexical analyzer. No parsing order is defined on the vsymbols in the dictionary. The parser retrieves the vsymbols in the dictionary through a find operation, driven by the relations in the grammar. The parser implicitly builds and parses a linear representation from the input attribute-based representation.

During the parsing phase, all the visited vsymbols, and the traversed explicit binary relations, are marked in order to guarantee that each vsymbol and each explicit relation is considered at most once. The marking of an explicit binary relation REL labeled r is done by decreasing the entry COUNTER(r) by 1.

The 0-entry of the dictionary always refers to the end-of-input symbol EOI. Similar to the usual end-of-string marker, the end-of-input symbol EOI is returned to the parser if and only if the input has been completely visited, i.e., all the input vsymbols have been parsed, and all the explicit relations have been traversed. These conditions are signaled by having all the vsymbols marked and COUNTER(r) = 0 for each explicit relation r, respectively.

An instance of the stack has the general format $s_0 X_1 s_1 X_2 s_2 \ldots X_m s_m$, where $s_m$ is the stack top, $X_i$ is a vsymbol, and $s_i$ is a generic state of the parsing table. The parsing algorithm uses the state on the top of the stack, and the vsymbol currently under examination, to access a specific entry of the parsing table in order to decide the next action to execute.

An XpLR parsing table is composed of a set of rows and is divided into three main sections: action, goto, and next. The action and goto sections are similar to the ones used in LR parsing tables for string languages [6], whereas the next section is used by the parser to select the next vsymbol to be processed. An entry next[k] for a state $s_k$ contains a pair (Rdriver, x), which drives the parser in selecting the next vsymbol y such that it is reachable from x and it is related to previously analyzed vsymbols through the driver relations in Rdriver. Two special pairs in the column next are (start, S) and (end, EOI), where S is the starting vsymbol type and EOI is the end-of-input marker. The first is used at the beginning of the parsing process to retrieve the first vsymbol to be parsed, which is a vsymbol reachable from S. For example, a vsymbol of type NODEI or NODEIF is the first vsymbol retrieved by the parser constructed from the grammar STD. The latter is used to check whether the whole input sentence has been parsed. If all the vsymbols have been analyzed and all the explicit relations have been considered, then the query returns the EOI marker. Fig. 6 shows a simple example of parsing table for a reduced version of the grammar STD of Example 2.2 by considering only productions 1, 2, 4, 10, 13, and 14, and disregarding node labels.

     ACTION                                 GOTO
St.  NODEI  NODEIF  EDGE  a     b     EOI   Graph  Edge    NEXT
0    :sh2   :sh3                            :1             (start, Graph)
1                   :sh5              acc          1_2: 4  (1_1, Edge) (end, EOI)
2    r1     r1      r1    r1    r1    r1                   -
3    r2     r2      r2    r2    r2    r2                   -
4    r4     r4      r4    r4    r4    r4                   -
5    r4     r4      r4    r4    r4    r4                   -
6                         :sh7  :sh8                       (edge-labeling, ELabel)
7    r13    r13     r13   r13   r13   r13                  -
8    r14    r14     r14   r14   r14   r14                  -

Fig. 6. An XpLR(0) parsing table.
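
The bookkeeping on the dictionary Dp described in this subsection can be illustrated with a small Python sketch. It is not the authors' implementation: the class names `Vsymbol` and `Dictionary`, the field names, and the choice of keying COUNTER by relation label are assumptions made only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Vsymbol:
    vtype: str                    # vsymbol type, e.g. "NODEI" (hypothetical encoding)
    attrs: dict                   # syntactic attribute index -> value
    visited: bool = False         # mark set when the parser consumes the vsymbol


@dataclass
class Dictionary:
    # Entry 0 is reserved for the end-of-input symbol EOI.
    entries: list = field(default_factory=lambda: [Vsymbol("EOI", {})])
    # COUNTER(r): number of not yet traversed explicit relations labeled r.
    counter: dict = field(default_factory=dict)

    def add(self, vsym):
        """Store a vsymbol and return its row index."""
        self.entries.append(vsym)
        return len(self.entries) - 1

    def fully_visited(self):
        """True iff EOI may be returned: all vsymbols marked, all counters zero."""
        all_marked = all(v.visited for v in self.entries[1:])
        all_traversed = all(c == 0 for c in self.counter.values())
        return all_marked and all_traversed
```

Under these assumptions, the pair (end, EOI) succeeds exactly when `fully_visited()` holds, which mirrors the condition that all vsymbols are marked and COUNTER(r) = 0 for each explicit relation r.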

An action entry has one of the following four values:

1. "Rtester: shift s", where Rtester is a possibly empty sequence of tester relations and s is a state;
2. reduce by a grammar production (i) A → β, shown in the table as ri;
3. accept;
4. error, shown as an empty entry.

A goto entry contains "Rtester: s", where Rtester is a possibly empty sequence of tester relations and s is a state.

A shift or goto action is executed only if all the relations in the corresponding Rtester are true, or if Rtester is empty. As an example, let us consider the XpLR(0) parsing table in Fig. 6. If the current state corresponds to row 1, and the vsymbol obtained from a reduction is Edge, then the parser executes the goto 1_2: 4, that is, if the tester relation 1_2 holds between Edge and the vsymbol on the stack top, then the parser goes to state 4.

A configuration of an XpLR parser is a pair $(s_0 X_1 s_1 X_2 \ldots X_m s_m,\ t_1 \ldots t_n)$ where the first component is the stack contents and the second component represents the input unmarked vsymbols.

In order to illustrate the XpLR parsing program we define the two functions Fetch_Vsymbol and Test. The former uses the stack and the input as global data structures and takes its arguments from the column NEXT of the parsing table. The latter is used to validate the tester relations between vsymbols. It takes as input an action condition from the action or goto part of the parsing table and returns a boolean value.

Function Fetch_Vsymbol(NEXT)
begin
  case NEXT of
    NEXT = (start, S):
      return the row index in Dp of a vsymbol reachable from S; it corresponds to the first vsymbol to parse
    NEXT = (end, EOI):
      if all the vsymbols have been marked as visited and COUNTER(r) = 0 for each explicit relation r
        then return the row index 0 in Dp pointing to the end-of-input symbol EOI
        else emit "error: unparsed input" and exit
    NEXT = (Rdriver, x), where Rdriver = ⟨REL_1^{h_1}, ..., REL_n^{h_n}⟩, x is a vsymbol type, and each REL_i^{h_i} acts on a syntactic attribute k_i of x:
      for i = 1 to n
        let z_i be the h_i-th vsymbol below the stack top
        let next_set_i = { b | b is in Dp, it is not marked as visited, it has an attribute j such that (b, j) is reachable from (x, k_i), z_i REL_i b holds, and the relation REL_i acts on a syntactic attribute of z_i and the syntactic attribute j of b, respectively }
      if ∩_{i=1...n} next_set_i contains exactly one vsymbol b
        then
          for each explicit relation REL_i in Rdriver do
            decrease by 1 the entry in the array COUNTER corresponding to the explicit relation z_i REL_i b
          mark the corresponding entry in Dp as visited
          return the row index of b in Dp
        else if ∩_{i=1...n} next_set_i contains more than one vsymbol b
          then emit "run-time conflict" and exit
          else return null;
    NEXT = null:
      return null;
  endcase
end
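
The general case of Fetch_Vsymbol (intersecting the candidate sets induced by the driver relations) can be sketched in Python as follows. This is a simplified illustration, not the authors' code. It assumes that reachability is checked at the level of vsymbol types rather than (vsymbol, attribute) pairs, that `holds(rel, z, b)` is a caller-supplied predicate, that COUNTER is keyed by relation label, and that h = 1 denotes the topmost vsymbol on the stack. All names are hypothetical.

```python
class RunTimeConflict(Exception):
    """More than one unvisited vsymbol satisfies all driver relations."""


def fetch_general(types, visited, holds, counter, stack, drivers, reachable_types):
    """Return the row index of the unique next vsymbol, or None if there is none.

    types           -- list of vsymbol types; row 0 is EOI
    visited         -- set of row indices already marked
    holds           -- predicate holds(rel, z, b) on row indices (assumed given)
    counter         -- COUNTER for explicit relations only, keyed by label
    stack           -- row indices of the vsymbols on the stack, top last
    drivers         -- list of (rel, h) pairs, i.e. the sequence Rdriver
    reachable_types -- terminal types reachable from x (type-level simplification)
    """
    next_sets = []
    for rel, h in drivers:
        z = stack[-h]  # assumption: h = 1 is the stack top
        next_sets.append({
            b for b in range(1, len(types))
            if b not in visited
            and types[b] in reachable_types
            and holds(rel, z, b)
        })
    candidates = set.intersection(*next_sets) if next_sets else set()
    if len(candidates) > 1:
        raise RunTimeConflict(sorted(candidates))
    if not candidates:
        return None
    (b,) = candidates
    for rel, _h in drivers:
        if rel in counter:      # only explicit relations are counted
            counter[rel] -= 1
    visited.add(b)
    return b
```

Raising an exception for the run-time conflict, rather than exiting, is a design choice of this sketch; it lets a caller decide how to report the conflicting candidates.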

Let us describe how the function works on the table in Fig. 6. In particular, the relations Rdriver in the NEXT column are as follows:

• the special relation start: in this case Fetch_Vsymbol returns the index in Dp of a vsymbol with type NODEI or NODEIF in the visual sentence, since they are the types of the vsymbols reachable from S;
• the special relation end: in this case Fetch_Vsymbol returns the index in Dp of the marker EOI only if all the vsymbols and all the explicit relations of the visual sentence have been visited;
• a relation h_k: this relation must hold between the vsymbol z on the stack top and exactly one nonvisited vsymbol b in Dp. In particular, when NEXT = (h_k, x):
  1. if x is a terminal, Fetch_Vsymbol returns the index in Dp of a nonvisited vsymbol x whose kth syntactic attribute is linked to the hth syntactic attribute of z;
  2. if x is a nonterminal, Fetch_Vsymbol returns the index in Dp of a vsymbol b whose jth syntactic attribute is linked to the hth syntactic attribute of z. The couples (x, k) and (b, j) are such that b begins a positional sentence derived from x and the kth syntactic attribute of x is synthesized from j by successively applying the Δ rules in the derivation.

If no vsymbol is found then Fetch_Vsymbol returns null. On the other hand, if more than one vsymbol is found, then the parser cannot proceed because it cannot decide deterministically which vsymbol to analyze. As a consequence, the function issues a run-time conflict message and stops the execution of the parser. The occurrence of this type of conflict, called run-time conflict, might prevent the recognition of syntactically correct input visual sentences. In Section 3.4 we analyze the run-time conflicts, and give some heuristics to solve this problem.

The function Test shown below verifies that the vsymbol to be pushed on the stack top is properly related to a vsymbol already in the stack.

Function Test(COND)
  let COND = (REL^i, x), where x is a terminal or a nonterminal vsymbol
  let z be the ith vsymbol below the stack top
  if z REL x holds
    then begin
      if REL is an explicit relation then
        decrease by 1 the entry in the array COUNTER corresponding to the explicit relation z REL x
      return true
    end
    else return false
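
For illustration, the function Test can be rendered in Python as below. This is only a sketch under stated assumptions: `holds` is a caller-supplied predicate, COUNTER is a dictionary containing explicit relations only, and i = 1 is taken to denote the topmost vsymbol on the stack. The name `check_tester` is hypothetical.

```python
def check_tester(rel, i, x, stack, holds, counter):
    """Check the tester relation rel between the i-th stacked vsymbol and x.

    Returns True and records the traversal of rel (if it is explicit)
    when the relation holds; returns False otherwise.
    """
    z = stack[-i]               # assumption: i = 1 is the stack top
    if not holds(rel, z, x):
        return False
    if rel in counter:          # explicit relations are the ones being counted
        counter[rel] -= 1
    return True
```

A conditioned shift or goto with a tester sequence Rt would then be executed only if `check_tester` succeeds for every relation in Rt.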

In the following, we give the complete XpLR(0) parsing algorithm.

Algorithm 3.1. The XpLR(0) parsing algorithm.

Input: A visual sentence in attribute-based representation and an XpLR(0) parsing table.
Output: A bottom-up analysis of the visual sentence if this is syntactically correct, an error message otherwise.
Method: Start with the state s0 on the top of the stack.

repeat forever
  let s be the state on the stack top
  set ip = Fetch_Vsymbol(next[s])
  if ip is not null
    then
      let b be the vsymbol pointed by ip
      if action[s, b] = "accept" then emit "success" and exit;
      if action[s, b] is a conditioned shift of type "Rt: shift s′"
        then
          if Rt is empty or Test(REL^h, b) is true for each REL^h ∈ Rt
            then push b and then s′ on the stack;
            else emit "syntax error" and exit;
        else emit "syntax error" and exit;
    else if action[s, c] = reduce A → x1 R1 x2 R2 ... R_{m-1} x_m, Δ, Γ with c any terminal vsymbol type then
      compute the syntactic attributes of A according to the synthesis rule Δ
      apply rule Γ, if present, and pop 2·m elements from the stack
      let s′ be the new state on the stack top
      if goto[s′, A] is a conditioned goto of type "Rt: s″" then
        if Rt is empty or Test(REL^h, A) is true for each REL^h ∈ Rt then
          push A and then s″ on the stack and
          output the production A → x1 R1 x2 R2 ... R_{m-1} x_m, Δ, Γ
        else emit "syntax error" and exit;
      else emit "syntax error" and exit;
    else emit "syntax error" and exit;
endrepeat

Let us suppose that the parser is in the configuration $(s_0 X_1 s_1 X_2 \ldots X_m s_m,\ t_1 \ldots t_n)$. The XpLR parsing program checks the entry next[$s_m$] of the parsing table. If Fetch_Vsymbol(next[$s_m$]) is not null, then the resulting pointer ip points to the next vsymbol $t_j$ to be processed, with $1 \le j \le n$, which is used to consult the parsing action table entry action[$s_m$, $T_j$] where $T_j$ is the type of $t_j$. The configurations resulting after each of the three types of moves are as follows:

1. If action[$s_m$, $t_j$] = "Rtester: shift s", the parser executes a shift move if all the relations in the corresponding Rtester are true, entering the configuration $(s_0 X_1 s_1 X_2 \ldots X_m s_m\, t_j\, s,\ t_1 \ldots t_{j-1} t_{j+1} \ldots t_n)$.
2. If action[$s_m$, $t_j$] = "accept", parsing is completed.
3. If action[$s_m$, $t_j$] = "error", the parser has discovered an error.

Otherwise, if next[$s_m$] is empty then a reduce action is required. The reduction action[$s_m$, $t_j$] = "reduce $A \to x_1 R_1 x_2 \ldots R_{m-1} x_m$, Δ, Γ", with $t_j$ terminal vsymbol, is accomplished by calculating the syntactic attributes of A as specified by Δ, possibly introducing vsymbols according to the Γ rule, popping $2 \cdot m$ elements out of the stack, and pushing A on the stack top. If s′ is the state on the stack top after popping the $2 \cdot m$ elements, then the next state s″ of the parser is given by the entry goto[s′, A]. Also in this case, the goto action may be triggered by an action condition to be verified between objects below the stack top and the object A. Notice that when inserting a vsymbol vt with the Γ rule, relations between the instantiated attributes of vt and the others already in the vsentence are implicitly created.

More formally, if action[$s_m$, $t_j$] = "reduce A → β, Δ, Γ", the parser executes a reduce move, entering the configuration $(s_0 X_1 s_1 X_2 \ldots X_{m-r} s_{m-r}\, A\, s,\ t_1 \ldots t_n t_{n+1} \ldots t_{n+m})$ where $s = \mathit{goto}[s_{m-r}, A]$ and r is the length of β, the right side of the production. Moreover, $t_{n+1} \ldots t_{n+m}$ are the terminal vsymbols specified in the Γ rule associated with the production whose pre-conditions are satisfied.
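
The shift and reduce moves on configurations can be made concrete with a short Python sketch. It models a configuration as a pair (stack, unmarked input), ignores tester conditions and attribute synthesis, and represents the goto section as a plain dictionary. The function names and the data encoding are assumptions introduced for illustration.

```python
def shift_move(config, j, s):
    """Move input vsymbol number j (0-based) onto the stack, followed by state s."""
    stack, rest = config
    return stack + [rest[j], s], rest[:j] + rest[j + 1:]


def reduce_move(config, lhs, r, goto, inserted=()):
    """Reduce by a production with left-hand side lhs and r right-hand side vsymbols.

    Pops 2*r stack elements (r vsymbols and r states), pushes lhs and the
    goto state, and appends the vsymbols introduced by the Gamma rule, if any.
    """
    stack, rest = config
    base = stack[:len(stack) - 2 * r]
    s = goto[(base[-1], lhs)]
    return base + [lhs, s], rest + list(inserted)


# Hypothetical run: shift a node in state 0, then reduce it to Graph.
config = ([0], ["n", "e"])
config = shift_move(config, 0, 2)
config = reduce_move(config, "Graph", 1, {(0, "Graph"): 1})
print(config)
```

The two functions return new configurations instead of mutating their argument, which keeps each step of a trace available for inspection.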

3.2.1. Constructing XpLR(0) parsing tables

In the following we briefly present the rationale underlying the construction of XpLR(0) parsing tables. The detailed algorithms are given in Appendix A.


Let us start by providing the notion of item. An XpLR(0) item of an XPG is a production without the Δ and Γ rules, and with a dot at some position of the right-hand side. However, a dot can never be placed between a relation identifier and the terminal or nonterminal to its right.

As an example, the production A → X R1 Y R2 Z, Δ, Γ leads to the following four XpLR(0) items: [A → · X R1 Y R2 Z], [A → X · R1 Y R2 Z], [A → X R1 Y · R2 Z], [A → X R1 Y R2 Z ·].

Intuitively, an item indicates how much of a production has already been examined during the parsing process and what is yet to come. For instance, the item [Graph → Graph′ · ⟨⟨1_1⟩, ⟨1_2⟩⟩ Edge] from Example 2.2 means that the nonterminal Graph′ has already been seen and a nonterminal Edge in relation ⟨1_1, 1_2⟩ with Graph′ is expected next.
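
The restriction on dot positions can be checked mechanically. The following Python sketch enumerates the XpLR(0) items of a production whose right-hand side is given as a list alternating vsymbol types and relation identifiers. The function names and the list encoding are assumptions made for this example.

```python
def xplr0_items(lhs, rhs):
    """Enumerate the items of lhs -> rhs, where rhs = [x1, R1, x2, ..., xm].

    A dot may stand at the far left or immediately after a vsymbol type
    (odd positions), but never between a relation and the type to its right.
    """
    items = []
    for pos in range(len(rhs) + 1):
        if pos == 0 or pos % 2 == 1:
            items.append((lhs, tuple(rhs[:pos]), tuple(rhs[pos:])))
    return items


def show(item):
    """Render an item in bracket notation, using '.' for the dot."""
    lhs, before, after = item
    return "[" + lhs + " -> " + " ".join(before + (".",) + after) + "]"


for item in xplr0_items("A", ["X", "R1", "Y", "R2", "Z"]):
    print(show(item))
```

For the production A → X R1 Y R2 Z this yields exactly the four items listed in the text.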

A collection of sets of XpLR(0) items provides the basis for constructing XpLR(0) parsers. To construct such a collection for a grammar, we define an augmented grammar and two functions, closure and goto. Given an XPG G with start vsymbol type S, its augmented XPG G′ is derived from G by adding the new start vsymbol type S′ and the production S′ → S.

If I is a set of items for a grammar G, then closure(I) is the set of items constructed from I by the two rules:

1. Initially, every item in I is added to closure(I).
2. If A → α · R B β with α ≠ ε, or A → · B β, is in closure(I) and B → γ is a production, then add the item B → · γ to closure(I), if it is not already there. We apply this rule until no more new items can be added to closure(I).

Intuitively, given a set of items I containing an item with a dot before a nonterminal B, the function CLOSURE adds to I all the items with B in the left-hand side and the dot preceding the first vsymbol type of the right-hand side. This means that if the nonterminal B is expected next, then any vsymbol type starting a positional sentential form from B is expected next.
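
A worklist formulation of the closure function is sketched below in Python. Items are encoded as triples (lhs, rhs, dot), where rhs alternates vsymbol types and relation identifiers and dot is either 0 or an odd position. The toy grammar in the usage lines is hypothetical and is not the grammar of Example 2.2.

```python
def closure(items, productions):
    """Compute closure(I) for a set of XpLR(0) items.

    productions maps each nonterminal to a list of right-hand sides, each a
    tuple alternating vsymbol types and relation identifiers.
    """
    result = set(items)
    work = list(items)
    while work:
        lhs, rhs, dot = work.pop()
        # The type expected next is rhs[0] when the dot is leftmost;
        # otherwise the dot precedes a relation and the type follows it.
        nxt = 0 if dot == 0 else dot + 1
        if len(rhs) > nxt and rhs[nxt] in productions:
            for gamma in productions[rhs[nxt]]:
                item = (rhs[nxt], gamma, 0)
                if item not in result:
                    result.add(item)
                    work.append(item)
    return result


# Hypothetical augmented grammar: S' -> S, S -> a R B, B -> b.
toy = {"S'": [("S",)], "S": [("a", "R", "B")], "B": [("b",)]}
print(sorted(closure({("S'", ("S",), 0)}, toy)))
```

The worklist guarantees termination because each item is added at most once and the number of items is finite.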

The second useful function is goto(I, R, x) where I is a set of items, R is a sequence of tester relations, and x is a terminal or nonterminal. goto(I, Rtester, x) is defined to be the closure of the set of all items [A → α ⟨Rdriver, Rtester⟩ x · β] such that [A → α · ⟨Rdriver, Rtester⟩ x β] is in I. Intuitively, once x has been seen, the function GOTO determines the set of items containing the vsymbol types that can be seen next.

The collection of XpLR(0) item sets of an augmented XPG G′ is incrementally constructed by the main procedure ITEMS, starting from the initial set containing the item [S′ → · S]. Similar to the LR case, the sets of XpLR(0) items correspond to the states of a finite automaton for viable prefixes [6] where the transitions are determined by the function GOTO. Example A.1 describes the collection of XpLR(0) item sets for the grammar of Example 2.2.

Finally, the action and goto parts of the XpLR parsing table are constructed as in the LR parsing tables. The action conditions and the entries in the next column are constructed as follows:

• a shift or goto action in state i has a sequence of tester relations Rtester as an action condition if and only if the set of items $I_i$ corresponding to state i contains an item with a dot preceding a sequence Rtester;
• the entry next[i] contains the pair (Rdriver, x) if and only if the set of items $I_i$ corresponding to state i contains an item with a dot preceding a sequence ⟨Rdriver, Rtester⟩ and the vsymbol type x.


3.3. XpLR parsing table conflicts

A conflict in an XpLR parsing table arises when multiple actions are contained in a single entry of the action, goto, or positional (next) parts. An XpLR parsing table may present shift/shift, goto/goto, and positional conflicts, besides the classical shift/reduce and reduce/reduce conflicts.

A shift/shift conflict occurs whenever multiple shift actions are present in a single entry of the action part. Analogously, a goto/goto conflict occurs whenever multiple goto actions appear in a single entry of the goto part. Shift/shift or goto/goto conflicts are generated whenever a set of XpLR(0) items contains two or more items with the dot preceding the same vsymbol type with the same sequence of driver relations, but with different tester relations.

As in the LR methodology, a shift/reduce (reduce/reduce, resp.) conflict occurs whenever a single entry of the action part contains both shift and reduce (multiple reduce, resp.) actions.

A positional conflict occurs whenever multiple values (REL, x) are present in a single entry of the next column. This conflict is generated whenever a set of XpLR(0) items contains two or more items with the dot preceding pairs with different driver relations (row 1 of Fig. 6) or with the same driver relation but different vsymbol types (rows 3 and 4 of Fig. 12).

An XPG for which it is possible to construct an XpLR parsing table without conflicts is said to be an XpLR grammar.

3.3.1. Handling parsing table conflicts

Ambiguities in non-XpLR grammars are handled by exploiting heuristics. In particular, positional conflicts are solved by partitioning the conflicting state into a sequence of substates on the basis of the driver relations, and ordering the values (REL^h, x) in the same entry of the next column.

As an example, Fig. 7 shows the parsing table of the grammar STD of Example 2.2. Note that state 1 is partitioned into four ordered substates. Thus when the parser is in state 1, it has recognized a nonterminal Graph and proceeds with the parsing of the visual sentence by looking for:

1. an outgoing edge or a self-edge (corresponding to state 1.1) of Graph as shown in Figs. 8(a) and (b), or
2. an incoming edge of Graph (state 1.2) as shown in Fig. 8(c), or
3. a vsymbol PLACEHOLD (state 1.3) as shown in Fig. 8(d), or
4. the end-of-input marker EOI (state 1.4).

The order of the substates in a state depends on the syntax of the language to be parsed. In general, the language implementer may need to modify the order of the substates accordingly.

In many practical cases the partitioning of a state in a sequence of ordered substates allows us to avoid the conflicts caused by the introduction of Γ rules in the XpLR grammars, and also some of the conflicts that could occur when using the XpLR parsing table construction algorithm, such as shift/reduce and reduce/reduce conflicts.


Fig. 7. An XpLR(0) parsing table with ordered substates.

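[Figure: panels (a) and (b) show State 1.1, panel (c) shows State 1.2, and panel (d) shows State 1.3; each panel depicts the nonterminal Graph with its connections, labeled 1_1 and 1_2.]

Fig. 8. A graphical representation of state 1.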


The remaining shift/reduce and reduce/reduce conflicts are solved by using disambiguating rules such as those used by tools like YACC [2]. In particular, a shift/reduce is resolved in favor of shift, and a reduce/reduce is resolved by choosing the conflicting production listed first in the grammar specification.

Finally, shift/shift and goto/goto conflicts are solved by ordering the conditioned actions present in the same entry. The parser tests the action conditions sequentially and executes the first action whose condition is verified. Similar to YACC, the order of multiple values in the same entry of the parsing table depends on the order of the items in the same set.
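
The sequential testing of ordered conditioned actions can be sketched as follows. The encoding of a table entry as an ordered list of (tester sequence, action) pairs and the name `select_action` are assumptions made for illustration; the action labels in the usage lines are hypothetical.

```python
def select_action(entry, tester_holds):
    """Return the first action in the ordered entry whose testers all hold.

    entry        -- ordered list of (testers, action) pairs
    tester_holds -- predicate deciding whether a single tester relation holds
    An empty tester sequence is trivially satisfied. None signals that no
    condition is verified, which corresponds to a syntax error.
    """
    for testers, action in entry:
        if all(tester_holds(t) for t in testers):
            return action
    return None


# Hypothetical shift/shift entry: a conditioned shift, then an unconditioned one.
entry = [(["1_2"], "shift A"), ([], "shift B")]
print(select_action(entry, lambda t: True))
print(select_action(entry, lambda t: False))
```

Because the first matching action wins, the order of the pairs in an entry plays the same role as the order of the items in the corresponding item set.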

It is easy to reproduce the reduction process in Fig. 3 by applying Algorithm 3.1, modified with the previous heuristics, to the XpLR(0) parsing table in Fig. 7.

Fig. 9 shows a visual representation of the XpLR parser configurations during the parsing process of the state transition diagram in Fig. 3. At the beginning the XpLR parser is in state 0, and by invoking Fetch_Vsymbol(next[0]) it finds a vsymbol reachable from the starting vsymbol type Graph, which is an initial node. The action in row 0 and column NODEI is sh2, hence NODEI and state 2 are pushed onto the stack top. Now, Fetch_Vsymbol(next[2]) finds a vsymbol DIGIT inside the initial node. The action in row 2 and column DIGIT is sh5, thus DIGIT and state 5 are pushed onto the stack top. Then, the action of state 5 is to reduce by NLabel → DIGIT. Thus, one state and one vsymbol are popped off the stack. Since goto[2, NLabel] = 3, NLabel and 3 are pushed on the stack. Then, the action of state 3 is to reduce by Graph → NODEI ⟨contains⟩ NLabel. As a consequence, two states and two vsymbols are popped off the stack. Since state 0 is on the stack top and goto[0, Graph] = 1, Graph and 1 are pushed on the stack. By invoking Fetch_Vsymbol on the pair (1_1, Edge) (which is the first next entry in the ordered sequence of state 1) the parser finds an EDGE vsymbol related with the vsymbol on the stack top Graph, and the action in row 1.1 and column EDGE is :sh12. The remaining steps are determined similarly.

Fig. 9. A visual representation of XpLR parsing configurations on input visual sentence of Fig. 3.

3.4. Applicability of XpLR parsing

In this subsection we show the properties that an XPG must satisfy in order to obtain a correct and complete XpLR parser. Moreover, we show that the XpLR methodology provides means to handle also grammars whose associated XpLR parsers are not correct and/or complete.

Theorem 3.1 gives the conditions under which an XpLR parser is correct. The proof is derived from Theorem 7.1 of [4].

Theorem 3.1 (Correctness). Let XPG be an XpLR grammar and P(XPG) its XpLR parser. If P is a visual sentence accepted by P(XPG) then P ∈ L(XPG).

Conversely, the absence of conflicts in an XpLR parsing table for a language L does not guarantee that every visual sentence in L is accepted by the corresponding XpLR parser. Let P be a visual sentence in L(XPG). At each step of the parsing process, the function Fetch_Vsymbol takes as argument the pair $(\mathit{REL}_i^{h_i}, x)$ from the column next of the parsing table to query the input dictionary. For the parsing program to execute correctly in a deterministic way, there must be a single terminal $x_i$ reachable from x that is detected and returned by Fetch_Vsymbol. However, in the case Fetch_Vsymbol detects more than one terminal on the pair $(\mathit{REL}_i^{h_i}, x)$, a "run-time conflict" message is returned and the parsing program halts. In this case, we say that a run-time conflict occurred.

As an example, let us consider the sentential form in Fig. 10 obtained during the reduction process described in Example 2.2 (see Fig. 3(e)). The parser produces this form by reaching state 1.1. In the next step, the execution of Fetch_Vsymbol on the pair (1_1, Edge) retrieves two occurrences of the terminal EDGE and, as a consequence, detects a run-time conflict. Let us note that this type of conflict can be caused only by relations that are not functions.

Definition 3.1 (XpLR parsable). Let P(XPG) be the XpLR parser of an XpLR grammar XPG. XPG is said to be XpLR parsable if for each visual sentence P ∈ L(XPG), each execution of Fetch_Vsymbol invoked by P(XPG) during the parsing of P detects and returns one and only one terminal vsymbol.

[Figure: a sentential form; the surviving labels are Graph, a, b, and 3.]

Fig. 10. A sentential form.

In other words, the parser P(XPG) of an XpLR parsable grammar XPG never incurs run-time conflicts. The following theorem gives the conditions for completeness of the XpLR parsing algorithm. The proof is an extension of Theorem 7.2 of [4].

Theorem 3.2 (Completeness). Let XPG be an XpLR parsable grammar and P(XPG) its XpLR parser. If P ∈ L(XPG) then P is accepted by P(XPG).

Grammars that exhibit run-time conflicts are undesirable because they are not suitable for XpLR parsing. To this end, there is an algorithm that allows us to statically verify (during the construction of the parsing table) whether a positional grammar modeling a graph-based language produces run-time conflicts [10] (the extension of this algorithm to XPG and other classes of languages is straightforward). Whenever the algorithm detects a conflict it returns the set of items causing the conflict. Therefore, this technique gives the designer feedback in the early phases of the syntax definition of the visual language, together with information on how and where to intervene in order to solve the conflict. As a matter of fact, whenever the algorithm detects a run-time conflict the grammar designer analyzes the relation R causing the conflict and verifies whether the scanning order of the vsymbols producing the conflict, i.e., belonging to the set detected by Fetch_Vsymbol(NEXT), is not relevant to the correct parsing of the sentence. In this case any of the detected vsymbols can be chosen as the next input. In the example referring to Fig. 10, it is easy to show that the relation 1_1 is such that every EDGE can be chosen as the next vsymbol to be parsed.

It is worthwhile to notice that the run-time conflicts problem appears to be similar to that of confluence and termination of graph transformation systems [11]. In both cases the solution is a static analysis to detect the rules that can potentially lead to a nondeterministic behavior. However, it is easy to show that it is not possible to apply critical pair analysis to detect run-time conflicts and vice versa.

In general, when the algorithm statically detects a "non relevant run-time conflict" produced by a relation of this type in a particular set of items, the grammar designer must explicitly tag such a relation in the XPG.

In order to support this approach, the function Fetch_Vsymbol must be modified to take into account the tagged relations. The modification consists in the addition of the following new case (tagged relations are indicated with a "*"):

    NEXT = (R*driver, x), where R*driver = ⟨REL_1^{h_1}, ..., REL_n^{h_n}⟩* and each REL_i^{h_i} acts on a syntactic attribute k_i of x:
      for i = 1 to n do
        let z_i be the h_i-th vsymbol below the stack top
        let next_set_i = { b | b is in Dp, it is not marked as visited, it has an attribute j such that (b, j) is reachable from (x, k_i), z_i REL_i b holds, and the relation REL_i acts on a syntactic attribute of z_i and the syntactic attribute j of b, respectively }
      if ∩_{i=1...n} next_set_i is nonempty then
        randomly select a vsymbol b from ∩_{i=1...n} next_set_i
        for each REL_i in Rdriver that is an explicit relation do
          decrease by 1 the entry in the array COUNTER corresponding to the explicit relation z_i REL_i b
        mark the corresponding entry in Dp as visited
        return the row index of b in Dp
      else return null;
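
The only difference between the tagged case and the general case is the handling of a candidate set with more than one element. A minimal Python sketch of that selection step follows; the function name `pick_tagged` and the optional `rng` parameter are assumptions introduced here, and the candidates are sorted before the draw only to make runs reproducible under a seeded generator.

```python
import random


def pick_tagged(candidates, rng=random):
    """Select any candidate for a tagged driver relation, or None if there is none.

    candidates -- the intersection of the next_set_i sets (row indices)
    rng        -- object exposing choice(), e.g. random or random.Random(seed)
    """
    if not candidates:
        return None
    return rng.choice(sorted(candidates))


# With a tagged relation, two matching EDGE rows do not cause a conflict.
print(pick_tagged({2, 3}, random.Random(0)) in {2, 3})
```

The caller would then decrease the COUNTER entries and mark the chosen row as visited, exactly as in the untagged case.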

Although the relations used to model many popular visual languages are applied in contexts where they can be tagged, this technique cannot always be applied. In these cases, the grammar designer must modify the grammar in order to solve the conflict, analogously to what happens when using traditional compiler–compiler tools such as YACC [2].

Fig. 11 describes the steps that the designer follows to construct an XpLR parser.

It is worth noting that for some classes of non-XpLR grammars the application of the heuristics leads to deterministic parsers. In particular, the conflicts that preserve the determinism are: the positional conflicts where all the conflicting entries present the same relation and this relation is a function, and the shift/shift (goto/goto, resp.) conflicts with mutually exclusive conditioned shift (goto, resp.) actions. In the first case, the relation may be satisfied by at most one vsymbol, thus Fetch_Vsymbol may succeed on at most one conflicting entry. Similarly, in the second case only one condition is true; this guarantees that only one shift (goto, resp.) action can be performed.

As an example, the grammar for the context-sensitive language $a^n b^n c^n$ given in Example 2.1 is non-XpLR (see Fig. 12), but its positional conflicts all present the same relation right-to, which is a function.

In conclusion, the application of heuristics enables the recognition of many practical visual languages for which there is no XpLR grammar, considerably broadening the class of visual languages that can be parsed with this approach. In many other cases, the visual language has an XpLR grammar, but often this is much more complex than an equivalent grammar with conflicts.

3.5. Parsing time complexity

In this subsection we analyze the time complexity of the XpLR parsing algorithm. As described in Section 2 an XpLR parser may not converge in the analysis of a visual sentence, since the parser may get into a loop while reducing productions where the number of vsymbols introduced with Γ is greater than or equal to the number of vsymbols popped from the stack minus one. Thus, we restrict the analysis to the class of convergent XpLR parsers. In particular, we analyze the time complexity to parse a visual sentence containing $n$ vsymbols, and with $n_t$ vsymbols inserted during the parsing. The worst case complexity is achieved for correct input pictures when all the input vsymbols are visited.

Fig. 11. The steps for the construction of an XpLR parser.

     ACTION                     GOTO
St.  a      b      c      EOI   S    B    NEXT
0    :sh2                       :1        (start, S)
1                         acc             (end, EOI)
2    :sh5   :sh4                     :3   (right-to, B)
3    :sh2          :sh10        :8        (right-to, S) (right-to, c)
4           :sh7   :sh6                   (right-to, c) (right-to, b)
5    :sh5   :sh4                     :9   (right-to, B)
6    r3     r3     r3     r3              -
7    r5     r5     r5     r5              -
8                  :sh11                  (right-to, c)
9    r4     r4     r4     r4              -
10   r2     r2     r2     r2              -
11   r1     r1     r1     r1              -

Fig. 12. The XpLR(0) parsing table for the grammar of Example 2.1.

Given an extended positional grammar XPG, let

• $n_a$ be the maximum number of syntactic attributes of a vsymbol type,
• $n_o$ be the maximum number of vsymbol types in the right-hand side of a production,
• $n_r$ be the maximum number of relations in a tester, and
• $t$ be the maximum number of triples in the Γ rules.

At each step, the parser performs a shift or a reduce action. Therefore, the total number of shifts will be $n + n_t$, while the number of reductions will be $O(n + n_t)$. The parsing algorithm performs a shift action whenever next[s] is defined and a reduce action otherwise. Let us compute separately the time complexity for shift and reduce actions.

To perform a shift action the parsing program must first access the input dictionary and then test the action condition, if any. Let $t_q$ be the time required to perform the function Fetch_Vsymbol (on next[s]). Moreover, if an action condition is to be performed, the conditioned shift depends on the number $n_r$ of relations in a tester and on the time $t_r$ to test each relation. As the push operation on the stack takes time $n_a$, the total time complexity to perform a shift action is $O(t_q + n_r \cdot t_r + n_a)$.

To reduce a production, the parser has to perform the following steps:

(i) calculate the syntactic attributes of the left-hand side nonterminal;
(ii) apply the Γ rule;
(iii) pop the records corresponding to the right-hand side vsymbols from the stack;
(iv) test for conditioned gotos;
(v) push the vsymbol corresponding to the left-hand side nonterminal onto the stack.

The cost of step (i) depends on the particular function used to synthesize each syntactic attribute. Let $O(t_\Delta(n_o))$ be the time required to perform this task, then the time complexity for step (i) will be $O(n_a \cdot t_\Delta(n_o))$. The cost of step (ii) depends on the time $c$ to compute the conditions and the time $n_a$ required to insert new vsymbols in the input dictionary. The total time is $O(t(c + n_a \cdot t_\Delta(n_o)))$. As the stack pop operation takes time $n_a$, step (iii) will cost $O(n_o \cdot n_a)$. Similar to a conditioned shift, a conditioned goto has time complexity $O(n_r \cdot t_r + n_a)$. The final push operation (step (v)) takes time $n_a$. Therefore, the total time complexity for a reduce action is $O(n_a(t_\Delta(n_o) + t(c + n_a) + n_o) + n_r \cdot t_r)$.

Then, the time complexity of the parser is $O((n + n_t)(n_a(t_\Delta(n_o) + t \cdot c + n_o) + n_r \cdot t_r + t_q))$. For a fixed grammar, $n_a$, $n_r$, $n_o$, $c$, and $t$ are constants and the time complexity reduces to $O((n + n_t)(t_q + t_r + t_\Delta))$. The parameters $t_q$, $t_r$, and $t_\Delta$ depend on the particular class of visual languages. For example, for the graph languages the access time $t_q$ may vary from a constant to $O(n)$, depending on the chosen implementation of the input dictionary, while the test time $t_r$ is constant. Finally, synthesizing the syntactic attributes of a vsymbol requires time $t_\Delta = O(n)$. Thus, for a fixed grammar modeling a graph language the time complexity is $O((n + n_t)(n \cdot t_q))$. By using proper hashing techniques to implement the dictionary Dp, the expected time complexity reduces to $O(n(n + n_t))$.
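
One way to obtain a constant expected access time $t_q$ is to index the explicit relations of Dp by hashing. The Python sketch below shows such an index. It is an assumption about how the dictionary could be organized, not a description of the authors' data structure, and the class and method names are hypothetical.

```python
from collections import defaultdict


class RelationIndex:
    """Hash index from (relation label, source row) to the related rows."""

    def __init__(self):
        self._targets = defaultdict(set)

    def add(self, rel, z, b):
        """Record that row z is in relation rel with row b."""
        self._targets[(rel, z)].add(b)

    def unvisited_targets(self, rel, z, visited):
        """Rows related to z through rel that are not yet marked as visited.

        The lookup is a single hash access; the set difference costs time
        proportional to the number of rows related to z.
        """
        return self._targets.get((rel, z), set()) - visited


idx = RelationIndex()
idx.add("1_1", 1, 2)
idx.add("1_1", 1, 3)
print(idx.unvisited_targets("1_1", 1, {2}))
```

With such an index, a call to Fetch_Vsymbol no longer scans the whole dictionary, which is consistent with the expected bound stated above when the number of rows related to a given vsymbol is bounded by a constant.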

4. Building LR(0) parsers for XPG grammars

Given an extended positional grammar XPG, we can build an XpLR(0) parser thatrecognizes LðXPGÞ by using the algorithms described in the previous section.

In this section, we show that it is also possible to construct a parser for an XPGgrammar by using a translation scheme directly derived from XPG by means of specialmapping rules. A much simpler conversion in translation scheme has been proposed forlinear symbolic PGs in [12]. A translation scheme is a context-free string grammar in whichattributes are associated with the grammar vsymbols and actions enclosed between bracesfg are inserted within the right sides of productions [6]. In the following we denote withmapðXPGÞ the translation scheme SG derived by applying mapping rules to XPG, GðSGÞ

the context-free grammar underlying SG, PðSGÞ the corresponding parser, and LðSGÞ thelanguage recognized by PðSGÞ.

The conversion of an XPG into an “equivalent” translation scheme allows us to use standard and well-known compiler generation tools, like YACC [2], BYACC, Bison [5], etc., for the rapid implementation of compilers for visual languages. We also prove that given a translation scheme $SG = map(XPG)$,

1. $G(SG)$ is LR(0) iff XPG is XpLR(0), and
2. the LR(0) parser built on $SG$ recognizes the same set of visual sentences as the XpLR(0) parser built on XPG.

However, in an attempt to keep the grammars simple, visual language designers often prefer to leave ambiguities within the grammar, and to solve them later by using conflict handling techniques to be specified when generating the parser. This means that we might frequently have to deal with ambiguous XPGs. Thus, $P(SG)$ needs heuristics for conflict solving to preserve the equivalence between $L(SG)$ and $L(XPG)$. In particular, we will prove that to each type of conflict in an XPG corresponds a precise type of conflict in $map(XPG)$. In this way, we devise conflict handling techniques for $map(XPG)$ simulating the behavior of the techniques used for XPG, so that $L(XPG)$ is still equivalent to $L(map(XPG))$.

Fig. 13. An approach for the construction of LR grammars from XPGs. [Flowchart boxes: XPG; mapping; Intermediate Translation Scheme; “are there reduce/reduce conflicts?” (YES: Application of R/R Conflict Resolution Techniques; NO: LR Translation Scheme).]

Fig. 13 graphically illustrates our approach. Let us observe that if the translation scheme obtained from the conversion is non-LR(0) then it does not present shift/reduce conflicts but only reduce/reduce ones.

The next subsection describes the mapping process, whereas Section 4.2 shows the equivalence between $L(XPG)$ and $L(map(XPG))$ for an XpLR grammar XPG. Finally, Section 4.3 provides techniques for constructing a translation scheme $map(XPG)$ for a non-XpLR grammar XPG, such that $L(P(map(XPG))) = L(P(XPG))$. In other words, we describe how to construct an LR parser simulating $P(XPG)$, including its heuristics.

4.1. Converting an XPG into a translation scheme

In this section we define the mapping rules to convert a generic XPG into a translation scheme. The generated translation schemes have synthesized attributes, i.e., each grammar production “$A \to \alpha$” is associated with an action that calculates the attributes of the nonterminal $A$ from the values of the vsymbol types in the right-hand side $\alpha$.

Let us consider the $k$th production of an extended positional grammar XPG:

$$A \to \alpha\, x_i\, \langle R_{driver_i}, R_{tester_i} \rangle\, x_{i+1}\, \beta,\ \Delta,\ \Gamma, \qquad (1)$$

where $x_i$, $x_{i+1}$ are either terminals or nonterminals and $\Gamma = \{(N_1, Cond_1, \Delta_1), \ldots, (N_t, Cond_t, \Delta_t)\}$ with $t \geq 0$.

The syntactic attributes of each vsymbol type in the production will be left unchanged in the final translation scheme $SG$. The $\Delta$ and $\Gamma$ rules will be emulated within the action sections of $SG$. In order to complete the mapping we need to introduce new nonterminals, productions, and actions within $SG$ to simulate the behavior of each sequence of relations $R_{driver_i}$ and $R_{tester_i}$.

The conversion of XPG into $SG$ is accomplished through the four mapping rules given below, which are applied to the productions of XPG to derive the set of productions and actions of $SG$. In them we refer to $Dp$ as the dictionary storing the vsymbols of the visual sentence. Moreover, the functions Fetch_Vsymbol and Test have the same behavior as the corresponding functions defined in Section 3.2, with the only difference that they ignore some nonterminals when accessing the stack. In particular, they ignore the nonterminals added during the conversion process, which did not belong to XPG.


The four mapping rules follow:

Rule 1. Replace each sequence of driver relations $R_{driver_i}$ with a new unique nonterminal $DR^k_i$. Furthermore, build an empty production on $DR^k_i$ with an action emulating the fetching of the next vsymbol to parse. Such an action calls the function Fetch_Vsymbol on arguments $R_{driver_i}$ and $x_{i+1}$, where $x_{i+1}$ is the vsymbol type following $R_{driver_i}$ in the XPG production. When $DR^k_i$ is reduced, the action retrieves the next vsymbol to be processed from $Dp$. In particular, the added production is

    DR^k_i → ε
        { ip = Fetch_Vsymbol(R_driver_i, x_{i+1});
          if ip is not null then next_vsymbol = Dp[ip]
          else {emit “syntax error”; exit;}
        }
Rule 2. Replace each sequence of nonempty tester relations $R_{tester_i}$ with a sequence formed by a new unique nonterminal $TR^k_i$ followed by a unique false terminal $a^k_i$. Such a sequence must be placed after the vsymbol type $x_{i+1}$ following $R_{tester_i}$ in the XPG production. Moreover, introduce a new empty production for $TR^k_i$ with an action emulating the tester relations. In particular, the action invokes the function Test for each relation $REL_h$ in $R_{tester_i}$ to verify whether $REL_h$ holds between $x_{i+1}$ and a previously scanned vsymbol. If Test returns true for each tester relation, then the false terminal $a^k_i$ is returned as the next vsymbol to be processed. The successful parsing of $a^k_i$ signals the correct recognition of $x_{i+1}$.

The following productions are the result of applying rules 1 and 2 to $R_{driver_i}$ and $R_{tester_i}$ in the XPG production (1), supposing that the relation preceding $x_i$ does not contain a tester relation:

    A → m(α) x_i DR^k_i x_{i+1} TR^k_i a^k_i m(β)
        { Δ;
          for j = 1 to t do
            if Cond_j is true then {insert(Dp, N_j); Δ_j;}
        }
    DR^k_i → ε
        { ip = Fetch_Vsymbol(R_driver_i, x_{i+1});
          ...
        }
    TR^k_i → ε
        { if Test(REL_h, x_{i+1}) is true for each REL_h in R_tester_i
          then next_vsymbol = a^k_i
        }

where $m(\alpha)$ denotes the sequence obtained by applying the translation process to the generic sequence $\alpha$ on the RHS of some production in XPG.
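The effect of rules 1 and 2 on a single right-hand side can be illustrated with a small sketch. It is not the authors' implementation: the encoding of a right-hand side as a Python list (plain strings for vsymbol types, `(driver, testers)` tuples for relation pairs), the function name `map_production`, and the generated names `DRk_i`, `TRk_i`, `ak_i` are all assumptions made for illustration, and the actions are recorded as descriptive tuples rather than executable code.

```python
def map_production(k, rhs):
    """Sketch of mapping rules 1 and 2 for the k-th production.

    rhs alternates vsymbol types (str) and relation pairs
    (driver, testers), where testers is a possibly empty tuple.
    Assumes every relation pair is followed by a vsymbol type.
    Returns the mapped right-hand side and the new empty productions.
    """
    mapped = []
    empties = {}          # new nonterminal -> description of its action
    pending = None        # (TR, false terminal) to place after the next symbol
    rel_index = 0
    for pos, item in enumerate(rhs):
        if isinstance(item, tuple):
            rel_index += 1
            driver, testers = item
            following = rhs[pos + 1]
            # Rule 1: the driver sequence becomes a DR nonterminal
            # whose action fetches the next vsymbol.
            dr = f"DR{k}_{rel_index}"
            mapped.append(dr)
            empties[dr] = ("fetch", driver, following)
            # Rule 2: a nonempty tester sequence becomes TR + false terminal,
            # placed after the vsymbol type that follows the relation.
            if testers:
                tr = f"TR{k}_{rel_index}"
                false_terminal = f"a{k}_{rel_index}"
                empties[tr] = ("test", tuple(testers), following, false_terminal)
                pending = (tr, false_terminal)
        else:
            mapped.append(item)
            if pending is not None:
                mapped.extend(pending)
                pending = None
    return mapped, empties


# Hypothetical production: x1 <R_d, R_t> x2 <R_e, (no tester)> x3
mapped, empties = map_production(
    1, ["x1", ("R_d", ("R_t",)), "x2", ("R_e", ()), "x3"]
)
print(mapped)  # ['x1', 'DR1_1', 'x2', 'TR1_1', 'a1_1', 'DR1_2', 'x3']
```

The second relation has an empty tester sequence, so only a DR nonterminal is introduced for it, which matches the restriction of rule 2 to nonempty tester sequences.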


Rule 3. Add the following two productions to $SG$ in order to calculate the first vsymbol to be processed, and to verify that all the vsymbols in the input sentence have been processed:

    S' → SP S
        { ip = Fetch_Vsymbol(end);
          if ip is not null then {emit “syntax error”; exit;}
          else {emit “the sentence is correct”; exit;}
        }
    SP → ε
        { ip = Fetch_Vsymbol(start);
          ...
        }

Here, $S$ is the starting vsymbol type of XPG, whereas $S'$ is the starting symbol of $SG$.

The following rule aims to reduce the number of productions and nonterminals in $SG$, so that the corresponding parser will have a reduced number of states.

Rule 4. Merge empty productions with identical actions to form a single production. This entails the elimination of the nonterminals on the LHSs of the merged productions, and the introduction of a new nonterminal as the LHS of the resulting production. Moreover, merge empty productions having the same parameters in the Test function into a single production. This entails that the LHSs and the false terminals of the merged productions need to be replaced by a single nonterminal and a single false terminal for the resulting production. This renaming process needs to be propagated to all the productions referring to the renamed vsymbol types.
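The first half of rule 4 (merging empty productions whose actions are identical) amounts to grouping by action and renaming. The following is a minimal sketch under assumed data structures: `empties` maps each DR/TR nonterminal to a hashable description of its action, `ordinary` maps each LHS to its list of right-hand sides, and the fresh names `M1`, `M2`, ... are hypothetical. The second half of the rule, which also unifies false terminals, is deliberately not covered.

```python
def merge_identical_empty_productions(empties, ordinary):
    """Group empty productions by action and propagate the renaming."""
    # Collect the nonterminals that share exactly the same action.
    groups = {}
    for nonterminal, action in empties.items():
        groups.setdefault(action, []).append(nonterminal)

    rename, merged = {}, {}
    for n, (action, members) in enumerate(groups.items(), start=1):
        # A group of one keeps its name; larger groups get a fresh name.
        new_name = members[0] if len(members) == 1 else f"M{n}"
        merged[new_name] = action
        for old in members:
            rename[old] = new_name

    # Propagate the renaming to every ordinary production.
    rewritten = {
        lhs: [[rename.get(sym, sym) for sym in rhs] for rhs in alternatives]
        for lhs, alternatives in ordinary.items()
    }
    return merged, rewritten


# Hypothetical input: two DR nonterminals fetch with the same arguments.
empties = {
    "DR1_1": ("fetch", "right", "NODE"),
    "DR2_1": ("fetch", "right", "NODE"),
    "DR2_2": ("fetch", "any", "EDGE"),
}
ordinary = {
    "A": [["x", "DR1_1", "NODE"]],
    "B": [["y", "DR2_1", "NODE", "DR2_2", "EDGE"]],
}
merged, rewritten = merge_identical_empty_productions(empties, ordinary)
print(merged)     # {'M1': ('fetch', 'right', 'NODE'), 'DR2_2': ('fetch', 'any', 'EDGE')}
print(rewritten)  # {'A': [['x', 'M1', 'NODE']], 'B': [['y', 'M1', 'NODE', 'DR2_2', 'EDGE']]}
```

Fewer distinct empty productions means fewer items and goto targets, which is the reduction in parser states that the rule aims at.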
The application of these mapping rules to an extended positional grammar XPG without empty productions produces a translation scheme $SG$ in which the productions have two possible formats:

1. $B \to y_1 A_1 y_2 A_2 \ldots A_{n-1} y_n$ with $n \geq 1$, where $B$ is a nonterminal from XPG, each $A_i$ is either a DR or a TR nonterminal, and each $y_i$ is either a terminal or a nonterminal from XPG or a unique false terminal. Moreover, a TR can only be followed by a false terminal.
2. $A \to \varepsilon$, where $A$ is either a DR or a TR nonterminal.

In the following, productions of type 1 will be referred to as ordinary productions, and productions of type 2 as DR or TR productions depending on whether $A$ is of type DR or TR.

Example 4.1. Given the $XPG = (T, N \cup POS, P, S)$ for state transition diagrams shown in Example 2.2, the application of mapping rules 1–4 yields the translation scheme $SG = (T', N', P', S')$, where $T' = T \cup \{A1, A2, A3\}$, $N' = N \cup \{S', SP, r1\_1, r2\_1, r1\_2, r1\_1b, tn1\_2, t1\_2, tn1\_1, r\_any, r\_contains, r\_labelling, r\_right\}$, and $P'$ is the set of productions with actions described in the following (Scheme 1).

4.2. Comparing the recognized languages

In this section we compare $L(SG)$ and $L(XPG)$ and analyze the circumstances under which they are equivalent. In particular, we prove that the grammar $G(SG)$ is LR(0) iff XPG is XpLR(0), and that if $G(SG)$ is LR(0) then $L(SG)$ is equivalent to $L(XPG)$.


Let $XPG = (N, T \cup POS, S, P)$ be an XPG, and let $R_d$ and $R_t$ be the sets of driver and tester sequences of relations in XPG, respectively. Moreover, let $SG = map(XPG)$ and $G' = G(SG) = (N', T', S', P')$. From the mapping rules 1–4 seen above we know that $N \subseteq N'$ and $T \subseteq T'$. In particular, $N' = N \cup DR \cup TR$ and $T' = T \cup FICT \cup \{\varepsilon\}$, where $DR$ is the set of nonterminal vsymbols introduced in $G'$ by rule 1; $TR$ and $FICT$ are the sets of nonterminal and false terminal vsymbols introduced in $G'$ by rule 2. Furthermore, the following regular expressions will also be used throughout the proofs:

• $\mathbf{N}$ is the regular expression denoting the set $N$ of nonterminals in XPG;
• $\mathbf{T}$ is the regular expression denoting the set $T$ of terminals in XPG;
• $\mathbf{R_d}$ is the regular expression denoting the set $R_d$ of sequences of driver relations in XPG;
• $\mathbf{R_t}$ is the regular expression denoting the set $R_t$ of sequences of tester relations in XPG;
• $\mathbf{DR}$ is the regular expression denoting the set $DR$ of nonterminals resulting from rules 1 and 4;
• $\mathbf{TR}$ is the regular expression denoting the set $TR$ of nonterminals resulting from rules 2 and 4;
• $\mathbf{a}$ is the regular expression denoting the set $FICT$ of false terminals resulting from rules 2 and 4;
• $\mathbf{x} = (\mathbf{N} \mid \mathbf{T})$ denotes the set of grammar vsymbol types from XPG;
• $\mathbf{PREF} = \mathbf{x}(\langle \mathbf{R_d}, \mathbf{R_t}? \rangle \mathbf{x})^*$ denotes a set of nonempty prefixes of the right-hand side of a production in XPG;
• $\mathbf{SUFF} = (\langle \mathbf{R_d}, \mathbf{R_t}? \rangle \mathbf{x})^*$ denotes a set of suffixes of the right-hand side of a production in XPG;
• $\mathbf{PREF'} = \mathbf{x}(\mathbf{DR}\ \mathbf{x}\ (\mathbf{TR}\ \mathbf{a})?)^*$ denotes a set of prefixes of the right-hand side of a production in $G'$;
• $\mathbf{SUFF'} = (\mathbf{DR}\ \mathbf{x}\ (\mathbf{TR}\ \mathbf{a})?)^*$ denotes a set of suffixes of the right-hand side of a production in $G'$.
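The primed expressions above can be tried out directly with an ordinary regular-expression engine. The sketch below is only an illustration: it assumes a one-letter encoding of symbol classes (`x` for a vsymbol type from XPG, `D` for a DR nonterminal, `T` for a TR nonterminal, `a` for a false terminal), and the helper names and the example symbol names are hypothetical.

```python
import re

# PREF' = x (DR x (TR a)?)*   and   SUFF' = (DR x (TR a)?)*
PREF_PRIME = re.compile(r"x(Dx(Ta)?)*")
SUFF_PRIME = re.compile(r"(Dx(Ta)?)*")


def encode(symbols, dr, tr, fict):
    """Map each grammar symbol to its one-letter class."""
    def letter(sym):
        if sym in dr:
            return "D"
        if sym in tr:
            return "T"
        if sym in fict:
            return "a"
        return "x"
    return "".join(letter(s) for s in symbols)


def in_pref_prime(symbols, dr, tr, fict):
    """True if the symbol sequence belongs to L(PREF')."""
    return PREF_PRIME.fullmatch(encode(symbols, dr, tr, fict)) is not None


def in_suff_prime(symbols, dr, tr, fict):
    """True if the symbol sequence belongs to L(SUFF')."""
    return SUFF_PRIME.fullmatch(encode(symbols, dr, tr, fict)) is not None


DR, TR, FICT = {"DR1_1"}, {"TR1_1"}, {"a1_1"}
print(in_pref_prime(["NODE", "DR1_1", "EDGE", "TR1_1", "a1_1"], DR, TR, FICT))  # True
print(in_pref_prime(["NODE", "TR1_1", "a1_1"], DR, TR, FICT))                   # False
```

The second call fails because, in these expressions, a TR nonterminal and its false terminal may only appear after a DR nonterminal and the vsymbol type it fetches.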

If $r$ is a regular expression we will use the standard notation $L(r)$ to refer to the language defined by $r$.

In the following we define a correspondence between the set of items constructed from an XPG by using the algorithms of Appendix A and the set of items constructed from the grammar $G'$.

Definition 4.1 (Map-equivalence on item sets). Let $I$ be a set of XpLR(0) items derived from XPG and $I'$ a set of LR(0) items derived from $G'$. The sets $I$ and $I'$ are map-equivalent iff

1. the number of kernel items⁴ is the same, and
2. for each kernel item $A \to m(\alpha) \cdot m(\beta)$ in $I'$ there exists a kernel item $A \to \alpha \cdot \beta$ in $I$ such that the production $A \to m(\alpha)\, m(\beta)$ in $G'$ is derived by the application of rules 1–4 to the production $A \to \alpha\beta$ in XPG.

⁴ Kernel items include the initial item, $S' \to \cdot S$, and all the items whose dots are not at the left end.

An interesting property of the mapping process is given in the following proposition.


Proposition 4.1. Let $I$ be a set of XpLR(0) items derived from an XPG and $I'$ a set of LR(0) items derived from $G'$ and map-equivalent to $I$. For each shift/goto transition of $P(XPG)$ from $I$ to an adjacent set of items $I_x$ there exists a set of items $I'_x$ map-equivalent to $I_x$ that can be reached through 1, 2, or 4 consecutive transitions of $P(G')$ from $I'$.

Proof. The items derived from XPG that are subject to shift/goto transitions can be of three types:

1. Kernel items with a nonempty sequence of tester relations following the dot.
2. Kernel items with an empty sequence of tester relations following the dot.
3. Items with a dot at the beginning of the right-hand side.

Since we assume that XPG generates no conflicts, there are only three cases worth examining.

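Case 1: $I$ contains one or more items of type 1 with the same driver sequence, tester sequence, and vsymbol type immediately following the dot. In this case there is a single transition from the set of items

$$I:\quad \begin{array}{l} A \to \alpha \cdot \langle R_{driver}, R_{tester} \rangle x \beta, \\ B \to \delta \cdot \langle R_{driver}, R_{tester} \rangle x \gamma, \\ \ldots \end{array}$$

to the set of items

$$I_x:\quad \begin{array}{l} A \to \alpha \langle R_{driver}, R_{tester} \rangle x \cdot \beta, \\ B \to \delta \langle R_{driver}, R_{tester} \rangle x \cdot \gamma, \\ \ldots \end{array}$$

where $\alpha, \delta \in L(\mathbf{PREF})$, $R_{driver} \in R_d$, $R_{tester} \in R_t$, $x \in N \cup T$, $\beta, \gamma \in L(\mathbf{SUFF})$.

Given the nature of the mapping rules 1–4 above it is easy to prove that there exists a sequence of four consecutive transitions in $P(G')$ starting from the set of items $I'$ (map-equivalent to $I$):

$$I':\quad \begin{array}{l} A \to m(\alpha) \cdot DR_i\, x\, TR_i\, a\, m(\beta), \\ B \to m(\delta) \cdot DR_i\, x\, TR_i\, a\, m(\gamma), \\ \ldots \end{array}$$

and ending at the set of items

$$I'_x:\quad \begin{array}{l} A \to m(\alpha)\, DR_i\, x\, TR_i\, a \cdot m(\beta), \\ B \to m(\delta)\, DR_i\, x\, TR_i\, a \cdot m(\gamma), \\ \ldots \end{array}$$

where $m(\alpha), m(\delta) \in L(\mathbf{PREF'})$, $DR_i \in DR$, $TR_i \in TR$, $a \in FICT$, $m(\beta), m(\gamma) \in L(\mathbf{SUFF'})$.

The execution of the actions associated with the empty productions involving the nonterminals $DR_i$ and $TR_i$, together with the recognition of $a$, reproduces the same effect as the XpLR parser invocations of the Fetch_Vsymbol and Test algorithms on $(R_{driver}, x)$ and $(R_{tester}, x)$, respectively.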

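Case 2: As case 1, but here $R_{tester}$ is empty. If $x$ is the vsymbol type following the dot, there might exist items of type 3 derived by closure that happen to have $x$ as the first vsymbol type on their RHS. Thus, in $P(XPG)$ there is a shift/goto transition from the set of items

$$I:\quad \begin{array}{l} A \to \alpha \cdot \langle R_{driver},\ \rangle x \beta, \\ B \to \delta \cdot \langle R_{driver},\ \rangle x \gamma, \\ X \to \sigma \cdot \langle R_{driver}, R_{tester} \rangle C \tau, \\ C \to \cdot x \lambda, \\ \ldots \end{array}$$

to the set of items

$$I_x:\quad \begin{array}{l} A \to \alpha \langle R_{driver},\ \rangle x \cdot \beta, \\ B \to \delta \langle R_{driver},\ \rangle x \cdot \gamma, \\ C \to x \cdot \lambda, \\ \ldots \end{array}$$

and there exists a sequence of two consecutive transitions in $P(G')$ starting from the set of items $I'$, crossing the set of items $I''$, and ending at an item set $I'_x$ map-equivalent to $I_x$:

$$I':\quad \begin{array}{l} A \to m(\alpha) \cdot DR_i\, x\, m(\beta), \\ B \to m(\delta) \cdot DR_i\, x\, m(\gamma), \\ X \to m(\sigma) \cdot DR_i\, C\, TR_i\, a\, m(\tau), \\ \ldots \end{array}$$

$$I'':\quad \begin{array}{l} A \to m(\alpha)\, DR_i \cdot x\, m(\beta), \\ B \to m(\delta)\, DR_i \cdot x\, m(\gamma), \\ X \to m(\sigma)\, DR_i \cdot C\, TR_i\, a\, m(\tau), \\ C \to \cdot x\, m(\lambda), \\ \ldots \end{array}$$

$$I'_x:\quad \begin{array}{l} A \to m(\alpha)\, DR_i\, x \cdot m(\beta), \\ B \to m(\delta)\, DR_i\, x \cdot m(\gamma), \\ C \to x \cdot m(\lambda), \\ \ldots \end{array}$$

with $m(\alpha), m(\delta), m(\sigma) \in L(\mathbf{PREF'})$, $DR_i \in DR$, $TR_i \in TR$, $a \in FICT$, $m(\beta), m(\gamma), m(\tau), m(\lambda) \in L(\mathbf{SUFF'})$.

The execution of the actions associated with the empty production involving the nonterminal $DR_i$ reproduces the same effect as the XpLR parser invocation of Fetch_Vsymbol on $(R_{driver}, x)$.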

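Case 3: In this case $I$ contains a certain number of items of type 3 followed by the same vsymbol type $x$. This means that there is a shift/goto transition in $P(XPG)$ from the set of items

$$I:\quad \begin{array}{l} A \to \cdot x \beta, \\ B \to \cdot x \gamma, \\ \ldots \end{array}$$

to the set of items

$$I_x:\quad \begin{array}{l} A \to x \cdot \beta, \\ B \to x \cdot \gamma, \\ \ldots \end{array}$$

with $x \in N \cup T$, $\beta, \gamma \in L(\mathbf{SUFF})$, and there exists a transition in $P(G')$ from the set of items

$$I':\quad \begin{array}{l} A \to \cdot x\, m(\beta), \\ B \to \cdot x\, m(\gamma), \\ \ldots \end{array}$$

to the set of items

$$I'_x:\quad \begin{array}{l} A \to x \cdot m(\beta), \\ B \to x \cdot m(\gamma), \\ \ldots \end{array}$$

with $m(\beta), m(\gamma) \in L(\mathbf{SUFF'})$. $\square$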
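The three cases of the proof can be summarized by listing, for one shift/goto transition of $P(XPG)$, the symbols on which $P(G')$ moves. The function below is only a mnemonic sketch of that case analysis; its name and its boolean parameters are assumptions, and it treats a tester sequence without a driver sequence as outside the cases considered.

```python
def simulated_transitions(has_driver, has_tester):
    """Symbols traversed by P(G') for one shift/goto of P(XPG).

    "DR" and "TR" stand for gotos on the new nonterminals (after
    reducing their empty productions), "x" for the shift/goto on the
    vsymbol type, and "a" for the shift on the false terminal.
    """
    if has_tester and not has_driver:
        raise ValueError("a tester sequence is assumed to follow a driver sequence")
    if not has_driver:
        return ["x"]                 # case 3: dot at the start of the RHS
    steps = ["DR", "x"]              # case 2: driver only
    if has_tester:
        steps += ["TR", "a"]         # case 1: driver and nonempty tester
    return steps


for driver, tester in [(False, False), (True, False), (True, True)]:
    print(len(simulated_transitions(driver, tester)))  # 1, then 2, then 4
```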

In the next proposition we prove that $G(map(XPG))$ can only have reduce/reduce conflicts, for any XPG.

Proposition 4.2. Let $G' = G(map(XPG))$. If $G'$ is non-LR(0) then the corresponding parsing table can never present a shift/reduce conflict.

Proof. By contradiction, suppose that the parsing table derived from $G'$ has a shift/reduce conflict. In order for $G'$ to produce a shift/reduce conflict there must exist a set of items $K$ containing at least one complete item and one with the dot preceding a terminal. The latter can be of the following three possible types:

1. The dot is between a TR nonterminal and a false terminal:

$$A \to m(\alpha)\, DR^k_i\, x_{i+1}\, TR^k_i \cdot a^k_i\, m(\beta)$$

with $m(\alpha) \in L(\mathbf{PREF'})$, $DR^k_i \in DR$, $TR^k_i \in TR$, $a^k_i \in FICT$, $m(\beta) \in L(\mathbf{SUFF'})$.

Since the set of items $K$ must have been reached through a goto operation on $TR^k_i$ and a TR nonterminal has to be followed by a false terminal, there cannot exist any complete item in $K$. Hence, shift/reduce conflicts cannot involve an item with a dot between a TR nonterminal and a false terminal.

2. The dot is between a DR nonterminal and a vsymbol type from XPG:

$$A \to m(\alpha)\, DR^k_i \cdot b\, m(\rho)\, m(\beta)$$

with $m(\alpha) \in L(\mathbf{PREF'})$, $b \in N \cup T$, $DR^k_i \in DR$, $m(\rho) \in L((\mathbf{TR}\ \mathbf{a})?)$, $m(\beta) \in L(\mathbf{SUFF'})$.

The set of items $K$ must have been reached by a goto operation on $DR^k_i$. Moreover, a DR nonterminal has to be followed by a vsymbol type $b \in N \cup T$. Thus, if $b \in T$ there cannot exist any complete item in $K$; if $b \in N$ the closure on $b$ cannot generate complete items in $K$ because no empty ordinary productions are allowed. Hence, shift/reduce conflicts cannot involve an item with a dot between a DR nonterminal and a terminal.

3. The dot is at the beginning of the right-hand side of a nonempty production:

$$A \to \cdot b\, m(\alpha)$$

with $b \in T$, $m(\alpha) \in L(\mathbf{SUFF'})$.

Thus, the set of items $K$ should contain the following item:

$$B \to m(\delta)\, DR^k_i \cdot Y\, m(\rho)\, m(\gamma)$$

with $m(\delta) \in L(\mathbf{PREF'})$; $Y \in N$; $Y \Rightarrow^* A\, m(\lambda)$; $m(\lambda), m(\gamma) \in L(\mathbf{SUFF'})$; $m(\rho) \in L((\mathbf{TR}\ \mathbf{a})?)$.

This corresponds to case 2. Hence, shift/reduce conflicts cannot involve an item with a dot at the beginning of the right-hand side of a nonempty production.

It is then proved that $G'$ can never produce an LR(0) parsing table with shift/reduce conflicts, independently of the XpLR(0) property of XPG. $\square$

Now, we consider the case in which the translation scheme $SG$ obtained from an XPG by applying the mapping rules 1–4 is LR(0). We prove that $G(SG)$ is LR(0) if and only if XPG is XpLR(0).

Theorem 4.1. Let $G' = G(map(XPG))$. $G'$ is LR(0) iff XPG is XpLR(0).

The detailed proof is reported in Appendix B. In particular, we first use Proposition 4.1 to prove that each type of XpLR(0) conflict of an XPG yields a conflict in the parsing table associated with $G(map(XPG))$. Then, we use Proposition 4.2 to prove that the parsing table derived from $G'$ can never contain a shift/reduce conflict, and that a reduce/reduce conflict in it would lead to a conflict in the parsing table derived from XPG.

In the next theorem we prove that if the translation scheme $SG = map(XPG)$ is LR(0), then $P(SG)$ and $P(XPG)$ are equivalent.

Theorem 4.2. If $SG = map(XPG)$ is LR(0) then the parser built on $SG$ recognizes the same set of visual sentences as the XpLR(0) parser built on XPG.

Proof. Let $VS = \{t_1, t_2, \ldots, t_n\}$ be the set of terminal vsymbols forming an input visual sentence. By using induction on the number of parsed terminals $m$ (equal to $n + n_t$, where $n_t$ is the number of vsymbols introduced during the parsing) we prove that the two parsers produce equivalent results after reading $j \leq m$ vsymbols. By equivalent results we mean that either they both successfully scan the sub-sentence $t_{i_1}, t_{i_2}, \ldots, t_{i_j}$, in the same order, or they both reject it after reading the same vsymbol $t_{i_x}$, $1 \leq x \leq j$. From Proposition 4.1, $P(SG)$ might do this by performing more transitions than $P(XPG)$, although these do not produce further effects, since they only simulate the effects of driver and tester relations of XPG. Moreover, the hypothesis and Theorem 4.1 ensure that both XPG and $G(SG)$ are conflict free.

Induction base ($m = 1$): Let us examine the steps executed by $P(XPG)$ when scanning $VS$. We know that the starting set of items $I_0$ derived from XPG must contain the item $S' \to \cdot S$, which generates by closure in $I_0$ a certain number of nonkernel items of type $A \to \cdot x_i \beta$, $\beta \in L(\mathbf{SUFF})$, which in turn might generate items of the same type within $I_0$. The parser $P(XPG)$ starts by reading $t_{i_1}$ from the input. If $I_0$ does not contain any nonkernel item of type $A \to \cdot t_{i_1} \beta$, then the parser returns error and $VS$ is rejected. Otherwise, there exist $k > 0$ nonkernel items of type $A \to \cdot t_{i_1} \beta$, so that $P(XPG)$ performs a shift to a set of items $I_x$ containing at least $k$ items of type $A \to t_{i_1} \cdot \beta$. The parser $P(SG)$ will have a similar behavior on $VS$. In particular, the starting set of items $I'_0$ derived from $SG$ must contain the following items:

$$I'_0:\quad \begin{array}{l} S' \to \cdot SP\ S, \\ SP \to \cdot \end{array}$$

with $SP \in DR$. There are no more items in $I'_0$ generated by closure, but there exists a set of items $I'_1$ derived from $SG$ which contains the item $S' \to SP \cdot S$, and generates by closure in $I'_1$ the same number of nonkernel items generated from $S' \to \cdot S$ in $I_0$. In particular, for each item in $I_0$ of type $A \to \cdot x_i \beta$, $\beta \in L(\mathbf{SUFF})$, there exists in $I'_1$ a corresponding item $A \to \cdot x_i\, m(\beta)$, $m(\beta) \in L(\mathbf{SUFF'})$. Therefore, it is easy to verify that the terminals starting the right-hand sides (RHSs) of items in $I'_1$ are the same as those in $I_0$. Moreover, $I'_0$ contains no items starting with a terminal, hence $P(SG)$ will necessarily reduce with $SP \to \varepsilon$ and will execute the associated positioning action to scan the vsymbol $t_{i_1}$. Then, it performs a transition from $I'_0$ to $I'_1$. If $P(XPG)$ rejected $VS$ it means that $t_{i_1}$ does not start any RHS of items in $I_0$, and from what was said above, it cannot start an RHS of items in $I'_1$. Thus, $P(SG)$ also rejects $VS$ and returns a parse error. Vice versa, if $P(XPG)$ scanned $t_{i_1}$ successfully, it means that $I'_1$ contains exactly $k > 0$ items of type $A \to \cdot t_{i_1}\, m(\beta)$, as $I_0$ does. Thus, $P(SG)$ also performs a transition to a state $I'_x$ map-equivalent to $I_x$, containing $k$ items of type $A \to t_{i_1} \cdot m(\beta)$, and perhaps some empty production of type “$DR \to \varepsilon$”.

Induction hypothesis/step: If the two parsers $P(XPG)$ and $P(SG)$ produce equivalent results after reading $j < m$ vsymbols, then they produce equivalent results after reading the $(j+1)$th vsymbol.

Obviously, if both $P(XPG)$ and $P(SG)$ returned a parse error there would be no $(j+1)$th step. Vice versa, if they produced equivalent results reading $j$ vsymbols, it means that they have reached map-equivalent sets of items $I_j$ and $I'_j$. We distinguish two cases according to the different structures of $I_j$ and $I'_j$.

Case 1: $I_j$ contains one or more kernel items like

$$A \to \alpha\, t_{i_j} \cdot \langle R_{driver_i}, R_{tester_i} \rangle x_i \beta, \qquad (k1)$$

where $\alpha \in L(\mathbf{PREF})$, $R_{driver_i} \in R_d$, $R_{tester_i} \in R_t \cup \{\varepsilon\}$, $\beta \in L(\mathbf{SUFF})$, $x_i \in N \cup T$.

From the induction hypothesis $I'_j$ contains similar kernel items like

$$\begin{array}{l} A \to m(\alpha)\, t_{i_j} \cdot DR_i\, x_i\, TR_i\, a_i\, m(\beta), \\ DR_i \to \cdot \end{array} \qquad (k1')$$

with $m(\alpha) \in L(\mathbf{PREF'}) \cup \{\varepsilon\}$, $DR_i \in DR$, $TR_i \in TR \cup \{\varepsilon\}$, $a_i \in FICT \cup \{\varepsilon\}$, $m(\beta) \in L(\mathbf{SUFF'})$.

For each $x_i \in N$, $x_i$ will generate one or more items of type $x_i \to \cdot y_i \beta$, $\beta \in L(\mathbf{SUFF})$, by closure in $I_j$. The application of the transitive closure to each $x_i \in N$ yields a certain number of terminals $b_1, b_2, \ldots, b_n$, each appearing as the first symbol of one or more RHSs of items in $I_j$. Thus, starting from the last scanned vsymbol $t_{i_j}$, $P(XPG)$ executes the driver relations in $R_{driver_i}$ in order to scan the next vsymbol $t_{i_{j+1}}$ from the input. If $t_{i_{j+1}}$ does not coincide with any of the $x_i \in T$ following $R_{driver_i}$ in $I_j$ or the terminals $b_1, b_2, \ldots, b_n$, then $P(XPG)$ returns parse error and $VS$ is rejected. Otherwise, $P(XPG)$ successfully scans $t_{i_{j+1}}$ and performs a shift to a state $I_x$. The latter contains all the items of type $X \to \sigma\, t_{i_{j+1}} \cdot \delta$, $\sigma \in L(\mathbf{PREF}) \cup \{\varepsilon\}$, $\delta \in L(\mathbf{SUFF})$, such that $X \to \sigma \cdot t_{i_{j+1}} \delta$ was in $I_j$, plus those they generate by closure. Similarly, the same situations will also have occurred in $P(SG)$. In fact, from the assumption that there are no conflicts, $P(SG)$ can only reduce with $DR_i \to \varepsilon$ in $I'_j$, and the execution of the associated action reproduces the same positioning effects of $R_{driver_i}$ starting from the last scanned vsymbol $t_{i_j}$. Thus, $P(SG)$ will have transited to a set of items $I''_j$ containing items of type $A \to m(\alpha)\, t_{i_j}\, DR_i \cdot x_i\, TR_i\, a_i\, m(\beta)$. Thus, for each $x_i \in N$, $x_i$ will generate by closure items similar to those generated from $x_i$ in $I_j$, with the same set of terminals $b_1, b_2, \ldots, b_n$ starting their RHSs. Therefore, it is easy to see that if $t_{i_{j+1}}$ was successfully scanned by $P(XPG)$, then it will also be scanned by $P(SG)$, which will transit to a state $I'_x$ map-equivalent to $I_x$. Conversely, if $P(XPG)$ returned parse error then $P(SG)$ also returns parse error.

Case 2: $I_j$ contains one complete kernel item like

$$A \to \alpha\, t_{i_j} \cdot, \qquad (k2)$$

with $\alpha \in L(\mathbf{PREF}) \cup \{\varepsilon\}$.

From the induction hypothesis $I'_j$ contains a similar complete kernel item like

$$A \to m(\alpha)\, t_{i_j} \cdot, \qquad (k2')$$

with $m(\alpha) \in L(\mathbf{PREF'}) \cup \{\varepsilon\}$.

This means that $P(XPG)$ performs a reduction and will return to a set of items $I_h$ containing an item with the dot preceding the nonterminal $A$:

$$X \to \sigma \cdot \rho A \delta, \qquad (k3)$$

with $\sigma \in L(\mathbf{PREF}) \cup \{\varepsilon\}$, $\rho \in L((\langle \mathbf{R_d}, \mathbf{R_t}? \rangle)?)$, $\delta \in L(\mathbf{SUFF})$, and $\sigma = \varepsilon \Leftrightarrow \rho = \varepsilon$.

Analogously, $P(SG)$ will reduce to $A$ and will return to the following set of items $I'_h$:

$$\begin{array}{l} X \to m(\sigma) \cdot A\, m(\delta), \\ \ldots \end{array} \qquad (k3')$$

with $m(\sigma) \in L(\mathbf{PREF'}(\mathbf{DR})) \cup \{\varepsilon\}$, $m(\delta) \in L(\mathbf{SUFF'})$.

It is easy to prove that both parsers perform a goto on $A$, transiting to map-equivalent states $I_{h+1}$ and $I'_{h+1}$, respectively. If $\delta$ and $m(\delta)$ are not empty we run into case 1, so we can apply the same arguments. If they are both empty and $A$ is not $S$, then we are in case 2 again, so we apply the same reasoning until we run into case 1 or $A$ becomes $S$. In the last case, both parsers check whether there are vsymbols in the input which have not been examined. Since from the inductive hypothesis both parsers have scanned the same vsymbols, in the same order, it means that $P(XPG)$ accepts $VS$ if and only if $P(SG)$ accepts $VS$. $\square$

In the next subsection we consider the non-LR(0) translation scheme generated through mapping rules 1–4.


4.3. Resolving conflicts in non-LR(0) translation schemes

A grammar $G' = G(map(XPG))$ may not be LR(0), hence $P(SG)$ needs conflict-solving heuristics to preserve the equivalence between $L(SG)$ and $L(XPG)$. To this aim, we have previously proved that conflicts in $G'$ are introduced by conflicts in XPG. In particular, we have proved that each conflict in XPG always yields one reduce/reduce conflict in $G'$. This is an important property because it enables us to develop conflict-solving heuristics in $G'$ simulating the heuristics adopted on XPG (see Section 3.3.1), so that $L(XPG)$ is still equivalent to $L(G')$. In this way, we can use the parsing implementation technique presented in this article even in those cases when the XPG grammar is not XpLR(0).

As shown in Fig. 13, initially we ignore the non-LR problem and use the transformation algorithms of Section 4.1 to generate what we call an intermediate translation scheme. Subsequently, we apply ad hoc transformation techniques to the intermediate grammar in order to eliminate the conflicts possibly caused by the original non-XpLR grammar XPG.

In order to devise conflict handling techniques for $SG = map(XPG)$, we must identify the possible reduce/reduce conflicts on the grammar $SG$ and modify it according to resolution techniques preserving the property $L(P(SG)) = L(P(XPG))$. As shown in the proof of Theorem 4.1, the possible reduce/reduce conflicts in a set of items are given by the possible combinations of ordinary, DR and TR productions. For an (ORDINARY, ORDINARY) conflict we have the following set of items $I'$:

$$I':\quad \begin{array}{ll} A \to m(\alpha)\, x_i \cdot & (i1) \\ B \to m(\beta)\, x_i \cdot & (i2) \\ \ldots & \end{array}$$

with $x_i \in N \cup T$; $m(\alpha), m(\beta) \in L(\mathbf{PREF'}(\mathbf{DR})) \cup \{\varepsilon\}$.

In the XpLR methodology this type of conflict is solved by choosing the conflicting production listed first in the grammar specification. This approach can be simulated by introducing the nonterminal “next1” in the two conflicting productions, followed by new false terminals:

    A → m(α) x_i next1 a_i
    B → m(β) x_i next1 a_j

Then we introduce the empty production

    next1 → ε { next_vsymbol = a_k; }

where $a_k$ is $a_i$ if the production associated with (i1) precedes the production associated with (i2) in the XPG specification, otherwise $a_k$ is $a_j$.

The techniques for removing the remaining five types of conflicts from the grammar are detailed in Appendix C. The general idea is to introduce new nonterminals $NEXT_i$ to replace the conflicting TR and DR vsymbol types, and to append them to the conflicting ORDINARY productions together with a false terminal. In this way, the action associated with $NEXT_i$ discriminates which production to reduce. Fig. 14 shows the algorithm for the elimination of reduce/reduce conflicts from the grammar $G'$ based on the techniques introduced above. The algorithm takes into account the possibility that a production may be involved in more than one conflict. To highlight the occurrence of these situations, the algorithm constructs a graph where the nodes are the states of the parser having a reduce/reduce conflict, and the edges connect states with a common complete item. In the presence of edges we introduce only one nonterminal $NEXT_i$ for all the occurrences of the same conflicting item, but the associated action must contain conflict handling code for all the nodes in which the item occurs.

Fig. 14. The algorithm for the resolution of reduce/reduce conflicts in the translation scheme.
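The (ORDINARY, ORDINARY) resolution described above can be sketched as a small grammar transformation. This is an illustrative sketch rather than the algorithm of Fig. 14: the representation of productions as `(lhs, rhs)` pairs kept in specification order, the function name, and the generated false terminal names `a_<index>` are assumptions.

```python
def resolve_ordinary_conflict(productions, i1, i2, tag="1"):
    """Append a shared 'next' nonterminal and distinct false terminals.

    productions: list of (lhs, rhs) pairs in grammar-specification order.
    i1, i2: positions of the two conflicting ordinary productions.
    Returns the rewritten productions and a description of the action
    attached to the new empty production.
    """
    next_nt = f"next{tag}"
    false_terminal = {i1: f"a_{i1}", i2: f"a_{i2}"}

    rewritten = list(productions)
    for idx in (i1, i2):
        lhs, rhs = productions[idx]
        rewritten[idx] = (lhs, rhs + [next_nt, false_terminal[idx]])

    # The production listed first in the specification wins, so the
    # action returns its false terminal as the next vsymbol.
    winner = min(i1, i2)
    action = {"nonterminal": next_nt, "next_vsymbol": false_terminal[winner]}
    return rewritten, action


# Hypothetical grammar fragment: productions 0 and 2 conflict.
grammar = [("A", ["u", "x"]), ("C", ["z"]), ("B", ["v", "x"])]
rewritten, action = resolve_ordinary_conflict(grammar, 0, 2)
print(rewritten[0])  # ('A', ['u', 'x', 'next1', 'a_0'])
print(rewritten[2])  # ('B', ['v', 'x', 'next1', 'a_2'])
print(action)        # {'nonterminal': 'next1', 'next_vsymbol': 'a_0'}
```

After the rewrite the two items are no longer complete in the same state: both expect `next1`, and the false terminal chosen by its action determines which production can continue.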

The equivalence of the languages recognized by $P(G)$ and $P(XPG)$ follows from Theorem 4.1 and from the approaches used in the resolution of reduce/reduce conflicts.

Example 4.2. Let us consider the non-LR(0) translation scheme $SG$ of Example 4.1. The application of the algorithm in Fig. 14 to $SG$ first creates the two sequences $ordnext(12) = \langle r\_right, NLabel \rangle$ and $ordnext(17) = \langle r1\_1, r2\_1, r\_any, S' \rangle$. Since they do not share complete items, there will be no edge connecting nodes 12 and 17, meaning that the conflict handling techniques are applied separately to the conflicting productions. Since the resulting translation scheme has a reduce/reduce conflict caused by the tester relations, on the second loop iteration the algorithm creates the sequence $ordnext(21) = \langle tn1\_2, t1\_2, tn1\_1 \rangle$. The final translation scheme is $SG' = (T', N', P', S')$, where $T' = T \cup \{A1, A2, A3, A4, A5\}$, $N' = N \cup \{S', SP, r2\_1, r1\_1b, next1, next2, next3, r\_contains, r\_labelling\}$, and $P'$ is the set of productions with actions as detailed below (Scheme 2).
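The conflict graph used by the algorithm (nodes are conflicting states, edges join states that share a complete item) can be sketched in a few lines. The sketch assumes that each conflicting state is described by the set of its conflicting complete items, here identified simply by the names appearing in the ordnext sequences of this example; that identification and the function name are assumptions made for illustration.

```python
from itertools import combinations


def build_conflict_graph(conflict_states):
    """conflict_states: dict mapping a state id to the set of complete
    items involved in its reduce/reduce conflict.

    Returns the sorted node list and a dict mapping each edge (s, t),
    with s < t, to the complete items the two states share.
    """
    nodes = sorted(conflict_states)
    edges = {}
    for s, t in combinations(nodes, 2):
        shared = conflict_states[s] & conflict_states[t]
        if shared:
            edges[(s, t)] = shared
    return nodes, edges


# States 12 and 17 with the names listed in ordnext(12) and ordnext(17).
states = {
    12: {"r_right", "NLabel"},
    17: {"r1_1", "r2_1", "r_any", "S'"},
}
nodes, edges = build_conflict_graph(states)
print(nodes)  # [12, 17]
print(edges)  # {} : no shared complete item, hence no edge
```

With no edge between the two nodes, each conflict can be resolved independently; an edge would instead indicate that one NEXT nonterminal must carry conflict handling code for both states.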

At this point $SG'$ can be easily represented as a Bison specification, which can in turn be processed by VLDesk (see Fig. 15) to generate the final visual programming environment. Fig. 16(a) shows the parsing result for a correct visual sentence, whereas Fig. 16(b) shows a sentence with a parse error (the initial state symbol is missing).

Fig. 15. The Bison specification editor of VLDesk.

5. Related work

In the last two decades many grammatical formalisms for visual languages have been developed [1], most of which have an associated tool for the generation of compilers and visual programming environments. However, although some of these tools follow a Lex/YACC style to specify the visual compiler, none of them uses a standard compiler–compiler. This yields an undesirable tight coupling between the grammar formalism and the associated compiler generation tool. In what follows we survey several well-known visual language grammar formalisms, and the associated compiler generation tools.

In the literature there are several categories of grammar formalisms. Some of them use attributes to handle information about the spatial layout of symbols, and to guide the parsing process. In fact, grammar productions are applied if attribute values satisfy specific constraints. In this category of formalisms we find XPGs [3], relational grammars (RGs) [13,14], constraint multiset grammars (CMGs) [15], and picture layout grammars (PLGs) [16]. Other grammar formalisms specify relationships among visual symbols at a high level of abstraction. In this category we find symbol-relation grammars [17], hypergraph grammars [18], and layered graph grammars [19,20].

Fig. 16. The visual programming environment generated by VLDesk.

Many of the proposed grammar formalisms support order-free pictorial parsers that process the input objects according to no ordering criterion. The formalisms of PLGs, relation grammars, and CMGs fall into this class. In general, and in the worst case, an order-free parser proceeds with a purely bottom-up enumeration. To limit the parsing computational cost, subclasses of PLGs, CMGs, and RGs have been defined to provide the corresponding parsers with predictive capabilities that restrict the search space. To further improve parsing efficiency, predictive pictorial parsers have also been defined. Besides XpLR parsers, in this category we also find those based on RGs [13]. In general, the broader the class of languages to be treated, the less efficient the parsing algorithm is.

In the formalism of PLGs a visual sentence is viewed as a multiset of visual symbols, with attributes containing positional information about symbols. Each production of a PLG grammar is associated with a set of semantic functions and constraints. The former specify rules by which attributes of left-hand side symbols are derived from those of right-hand side symbols, whereas constraints represent predicates over the attribute values of the right-hand side symbols, and are used to determine when a production can be applied. The associated parsing algorithm has the drawback that it works under certain restrictions that cannot always be checked at run-time. As a consequence, it cannot be guaranteed that the PLG parser produces correct results nor that it terminates. A visual compiler generation tool based on this grammar formalism is the Visual Programmer’s Workbench (VPW) [21], which enables the generation of visual programming environments such as iconic languages and some diagrammatic languages.

CMGs are another constraint-based formalism, closely related to PLGs. They provide a grammar formalism based on multiset rewriting, where a nonterminal symbol in a multiset can be rewritten by a production in the grammar whenever the attributes of the symbols in the multiset satisfy a given constraint describing relationships between pictures. CMGs also allow the specification of negative constraints, which enables the specification of visual languages by deterministic CMGs, yielding efficient parsing [22]. A visual compiler generator based on CMGs is the Penguins system. Its main characteristic is an incremental parsing technique, which allows the presence of incorrect intermediate visual sentences, yielding a more user-friendly paradigm for visual sentence manipulation.

RGs are a formalism based on relational structures. A recent version also considers attributes, but these can only be associated with terminal graphical objects. The parsing algorithms of RGs have been applied to several application fields, such as mathematical expression analysis, line drawing for pen-based interfaces, multidimensional data verification, interactive support for design, and multimedia document generation [14].

Symbol-relation grammars (SR) view a visual sentence as a set of symbol occurrences and a set of relational items over symbol occurrences [17]. The derivation of a visual sentence is accomplished by rewriting both symbol occurrences and relational items by means of simple context-free rules. This formalism is supported within the visual language programming environment generator (VLPEG) [23]. The latter provides automated support for the specification and implementation of visual language compilers.

Layered graph grammars are context-sensitive graph grammars where the left- and right-hand sides of productions are extended with context elements. The latter cannot be modified as a result of production applications, but they may be used as sources or targets for new relationships [19]. The presence of context elements requires the use of quite complex parsing algorithms, which have exponential time and space complexity. Nevertheless, the authors argue that the complexity of such algorithms drops drastically when they are run on real-world examples of visual notations.

Zhang and Zhang [24] have proposed a simple parser for a further restricted kind of layered graph grammars, called reserved graph grammars (RGG). They reverse each grammar production and thus obtain a new graph transformation system. As long as the reverse graph transformation system is confluent, a graph can be parsed by reducing it to an initial graph. The parsing algorithm for RGGs has a polynomial worst-case behavior but is restricted because of the requirement of confluence. VisPro [24] is a set of visual programming tools capable of automatically generating visual compilers and visual programming environments in a Lex/YACC fashion. The construction of visual notations consists of a lexicon and a grammar specification. During the lexicon definition the user defines the visual objects and a visual editor. Then, the formalism of RGGs is used to specify the language syntax. This specification is used by the toolset to generate a compiler and the visual programming environment.

Close to grammar-based approaches are formal specification methods based on rewriting systems. In particular, it is worth mentioning the visual attributed rewriting system [25], conceived to specify the pictorial and computational aspects of visual languages formalizing interactive sessions of the human–computer dialogue. This approach has been implemented in the GenIAL system. It allows users to construct visual sentences in free-order form, and implements the interpreter for the control automation of the corresponding visual language.

6. Discussion

We have presented a new technique for implementing visual language compilers exploiting standard compiler–compiler techniques. This is accomplished through mapping rules that allow us to transform XPGs into equivalent translation schemes. Moreover, with this technique we are also able to generate LR parsers for non-XpLR grammars. In fact, once we have chosen conflict handling heuristics on the XPG, we can derive equivalent heuristics on the translation schema by modifying some of its productions and some of the actions in a systematic way.

We believe that the proposed technique provides a new efficient way to generate visual language compilers. It represents an important milestone toward the unification of visual language compiler generation platforms, as is already the case for string languages. In fact, the approach relies on standard compiler–compiler techniques to generate visual language compilers modeled through XPGs. Thus, it avoids the need for a specific compiler generation tool, whereas the majority of existing grammar formalisms rely upon their own proprietary compiler generation platforms. The use of standard compiler generation platforms can also provide advantages in terms of tool reliability. In fact, as opposed to proprietary tools, standard compiler–compilers are widely used, hence they have been thoroughly tested and revised. Moreover, the generation of visual language compilers through translation schemes provides us with a common framework for both visual and string languages, which can provide interoperability between these two separate worlds, with many potential advantages. As an example, it can facilitate the construction of hybrid hierarchies of visual languages interleaved with string languages, such as flow charts annotated with textual programming languages. It can also facilitate the addition of visual features to string languages, since both visual and textual features can be analyzed using a translation schema.

In general, one of the drawbacks in the use of LR-based parser approaches is that implementors must have a good knowledge of the underlying parsing techniques. As a consequence, in order to improve the usability of the approach, language implementors need tools assisting them in the grammar definition (see for example Visual Parse++ [26]). Thus, to reduce the intrinsic difficulties of building an XpLR grammar we are extending VLDesk with functionalities to aid implementors in the definition and reuse of grammars, and in the resolution of conflicts. These functionalities are made possible by the fact that both the identification of driver and tester relations and the detection of run-time conflicts can be performed statically, as shown in Section 2.2 and in [10], respectively.

In the future we would like to further investigate the mapping process to enable the use of standard LR-based parser generation tools for implementing incremental visual language parsers [27].

Appendix A. Constructing XpLR(0) parsing tables

In this appendix we present the algorithms for the construction of an XpLR(0) parsing table.

Function CLOSURE(I)
begin
    J = I;
    repeat
        for each item $[A \to \alpha \cdot R\,B\beta]$ with $\alpha \neq \varepsilon$, or $[A \to \cdot B\beta]$, in J
        and each production $B \to \gamma$ in $G'$ such that $[B \to \cdot\gamma]$ is not in J
            do add $[B \to \cdot\gamma]$ to J
    until no more items can be added to J
    return J;
end.

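Function GOTO(I, Rtester, x)
begin
    if $R_{tester} = \emptyset$ then
        let $J = \{[A \to \alpha R_{driver}\,x \cdot \beta] \mid \alpha \neq \varepsilon \text{ and } [A \to \alpha \cdot R_{driver}\,x\beta] \in I\} \cup \{[A \to x \cdot \beta] \mid [A \to \cdot x\beta] \in I\}$
    else
        let $J = \{[A \to \alpha\langle R_{driver}, R_{tester}\rangle x \cdot \beta] \mid \alpha \neq \varepsilon \text{ and } [A \to \alpha \cdot \langle R_{driver}, R_{tester}\rangle x\beta] \in I\}$
    return CLOSURE(J)
end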

Algorithm A.1. Construction of the sets of XpLR(0) items.

Input: An augmented extended positional grammar $G'$.
Output: The collection of XpLR(0) item sets.

Method: Item sets are constructed by the main procedure ITEMS, which in turn calls the two functions CLOSURE and GOTO.

Procedure ITEMS($G'$)
begin
    let $C = \{\mathrm{CLOSURE}(\{[S' \to \cdot S]\})\}$
    repeat
        for each set of items I in C and each vsymbol type x
        such that there exists $[A \to \alpha \cdot \langle R_{driver}\rangle x\beta] \in I$ or $[A \to \cdot x\beta] \in I$, and $\mathrm{GOTO}(I, \emptyset, x)$ is not included in C
            do $C = C \cup \mathrm{GOTO}(I, \emptyset, x)$
        for each set of items I in C, each vsymbol type x, and each sequence of tester relations $R_{tester} \neq \emptyset$
        such that $[A \to \alpha \cdot \langle R_{driver}, R_{tester}\rangle x\beta] \in I$ and $\mathrm{GOTO}(I, R_{tester}, x)$ is not included in C
            do $C = C \cup \mathrm{GOTO}(I, R_{tester}, x)$
    until no more sets of items can be added to C
end
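The CLOSURE and GOTO functions can be tried out on a small fragment of the graph grammar. The following Python sketch is only an illustration under stated assumptions: the `Elem`/`Prod` encoding, the helper names, and the four-production grammar (with the primed occurrence NLabel' written simply as NLabel) are hypothetical and are not part of VLDesk or of the original algorithms.

```python
from typing import FrozenSet, Iterable, NamedTuple, Optional, Tuple


class Elem(NamedTuple):
    driver: Optional[str]    # driver relation before the symbol (None for the first symbol)
    tester: Tuple[str, ...]  # tester relations, () when there are none
    symbol: str              # vsymbol type


class Prod(NamedTuple):
    lhs: str
    rhs: Tuple[Elem, ...]


Item = Tuple[int, int]  # (production index, dot position)


def first(symbol: str) -> Elem:
    return Elem(None, (), symbol)


def rel(driver: str, symbol: str, tester: Tuple[str, ...] = ()) -> Elem:
    return Elem(driver, tuple(tester), symbol)


# Hypothetical fragment of the grammar of Example 2.2 (assumed encoding).
GRAMMAR = [
    Prod("S'", (first("Graph"),)),
    Prod("Graph", (first("NODEI"), rel("contains", "NLabel"))),
    Prod("NLabel", (first("DIGIT"),)),
    Prod("NLabel", (first("NLabel"), rel("right-to", "DIGIT"))),
]
NONTERMINALS = {p.lhs for p in GRAMMAR}


def closure(items: Iterable[Item]) -> FrozenSet[Item]:
    """Add [B -> .gamma] for every nonterminal B found right after a dot."""
    result = set(items)
    work = list(result)
    while work:
        p, dot = work.pop()
        rhs = GRAMMAR[p].rhs
        if dot < len(rhs) and rhs[dot].symbol in NONTERMINALS:
            for q, prod in enumerate(GRAMMAR):
                if prod.lhs == rhs[dot].symbol and (q, 0) not in result:
                    result.add((q, 0))
                    work.append((q, 0))
    return frozenset(result)


def goto(items: Iterable[Item], tester: Tuple[str, ...], x: str) -> FrozenSet[Item]:
    """Move the dot over x in the items whose tester sequence equals `tester`."""
    moved = set()
    for p, dot in items:
        rhs = GRAMMAR[p].rhs
        if dot < len(rhs) and rhs[dot].symbol == x and rhs[dot].tester == tuple(tester):
            moved.add((p, dot + 1))
    return closure(moved)


def has_shift_reduce(items: Iterable[Item]) -> bool:
    """True if a complete item coexists with an item that shifts a terminal."""
    items = list(items)
    complete = any(dot == len(GRAMMAR[p].rhs) for p, dot in items)
    shift = any(dot < len(GRAMMAR[p].rhs)
                and GRAMMAR[p].rhs[dot].symbol not in NONTERMINALS
                for p, dot in items)
    return complete and shift


I0 = closure({(0, 0)})
I2 = goto(I0, (), "NODEI")
I3 = goto(I2, (), "NLabel")
```

In this fragment, `I3` plays the role of the item set reached after NODEI ⟨contains⟩ NLabel: it holds one complete item and one item that still expects ⟨right-to⟩ DIGIT, which is the kind of shift/reduce situation discussed for the full grammar.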

Example A.1. The collection of XpLR(0) item sets for the grammar of Example 2.2 is described in the following. The notation (goto j) to the right-hand side of an item $K = [A \to \alpha \cdot \langle R_{driver}, R_{tester}\rangle x\beta]$ indicates the item set $I_j$ returned by $\mathrm{GOTO}(K, R_{tester}, x)$.

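$I_0$ = {
    $S' \to \cdot\,\mathit{Graph}$ (goto 1)
    $\mathit{Graph} \to \cdot\,\mathit{NODEI}\,\langle\text{contains}\rangle\,\mathit{NLabel}$ (goto 2)
    $\mathit{Graph} \to \cdot\,\mathit{NODEIF}\,\langle\text{contains}\rangle\,\mathit{NLabel}$ (goto 4)
    $\mathit{Graph} \to \cdot\,\mathit{Graph}'\,\langle\langle 1\_1\rangle, \langle 1\_2\rangle\rangle\,\mathit{Edge}\ 2\_1\ \mathit{Node}$ (goto 1)
    $\mathit{Graph} \to \cdot\,\mathit{Graph}'\,\langle\langle 1\_1\rangle, \langle 1\_2\rangle\rangle\,\mathit{Edge}$ (goto 1)
    $\mathit{Graph} \to \cdot\,\mathit{Graph}'\,\langle\langle 1\_2\rangle, \langle 1\_1\rangle\rangle\,\mathit{Edge}\ 1\_1\ \mathit{Node}$ (goto 1)
    $\mathit{Graph} \to \cdot\,\mathit{Graph}'\,\langle\text{any}\rangle\,\mathit{PLACEHOLD}$ (goto 1) }

$I_1$ = {
    $S' \to \mathit{Graph}\,\cdot$
    $\mathit{Graph} \to \mathit{Graph}' \cdot \langle\langle 1\_1\rangle, \langle 1\_2\rangle\rangle\,\mathit{Edge}\ 2\_1\ \mathit{Node}$ (goto 8)
    $\mathit{Graph} \to \mathit{Graph}' \cdot \langle\langle 1\_1\rangle, \langle 1\_2\rangle\rangle\,\mathit{Edge}$ (goto 9)
    $\mathit{Graph} \to \mathit{Graph}' \cdot \langle\langle 1\_2\rangle, \langle 1\_1\rangle\rangle\,\mathit{Edge}\ 1\_1\ \mathit{Node}$ (goto 10)
    $\mathit{Graph} \to \mathit{Graph}' \cdot \langle\text{any}\rangle\,\mathit{PLACEHOLD}$ (goto 11)
    $\mathit{Edge} \to \cdot\,\mathit{EDGE}\,\langle\text{edge-labelling}\rangle\,\mathit{ELabel}$ (goto 12) }

$I_2$ = {
    $\mathit{Graph} \to \mathit{NODEI} \cdot \langle\text{contains}\rangle\,\mathit{NLabel}$ (goto 3)
    $\mathit{NLabel} \to \cdot\,\mathit{DIGIT}$ (goto 5)
    $\mathit{NLabel} \to \cdot\,\mathit{NLabel}'\,\langle\text{right-to}\rangle\,\mathit{DIGIT}$ (goto 3) }

$I_3$ = {
    $\mathit{Graph} \to \mathit{NODEI}\,\langle\text{contains}\rangle\,\mathit{NLabel}\,\cdot$
    $\mathit{NLabel} \to \mathit{NLabel}' \cdot \langle\text{right-to}\rangle\,\mathit{DIGIT}$ (goto 7) }

$I_4$ = {
    $\mathit{Graph} \to \mathit{NODEIF} \cdot \langle\text{contains}\rangle\,\mathit{NLabel}$ (goto 6)
    $\mathit{NLabel} \to \cdot\,\mathit{DIGIT}$ (goto 5)
    $\mathit{NLabel} \to \cdot\,\mathit{NLabel}'\,\langle\text{right-to}\rangle\,\mathit{DIGIT}$ (goto 6) }

$I_5$ = { $\mathit{NLabel} \to \mathit{DIGIT}\,\cdot$ }

$I_6$ = {
    $\mathit{Graph} \to \mathit{NODEIF}\,\langle\text{contains}\rangle\,\mathit{NLabel}\,\cdot$
    $\mathit{NLabel} \to \mathit{NLabel}' \cdot \langle\text{right-to}\rangle\,\mathit{DIGIT}$ (goto 7) }

$I_7$ = { $\mathit{NLabel} \to \mathit{NLabel}'\,\langle\text{right-to}\rangle\,\mathit{DIGIT}\,\cdot$ }

$I_8$ = {
    $\mathit{Graph} \to \mathit{Graph}'\,\langle\langle 1\_1\rangle, \langle 1\_2\rangle\rangle\,\mathit{Edge} \cdot 2\_1\ \mathit{Node}$ (goto 13)
    $\mathit{Node} \to \cdot\,\mathit{NODEG}\,\langle\text{contains}\rangle\,\mathit{NLabel}$ (goto 14)
    $\mathit{Node} \to \cdot\,\mathit{NODEF}\,\langle\text{contains}\rangle\,\mathit{NLabel}$ (goto 15)
    $\mathit{Node} \to \cdot\,\mathit{PLACEHOLD}$ (goto 16) }

$I_9$ = { $\mathit{Graph} \to \mathit{Graph}'\,\langle\langle 1\_1\rangle, \langle 1\_2\rangle\rangle\,\mathit{Edge}\,\cdot$ }

$I_{10}$ = {
    $\mathit{Graph} \to \mathit{Graph}'\,\langle\langle 1\_2\rangle, \langle 1\_1\rangle\rangle\,\mathit{Edge} \cdot 1\_1\ \mathit{Node}$ (goto 17)
    $\mathit{Node} \to \cdot\,\mathit{NODEG}\,\langle\text{contains}\rangle\,\mathit{NLabel}$ (goto 14)
    $\mathit{Node} \to \cdot\,\mathit{NODEF}\,\langle\text{contains}\rangle\,\mathit{NLabel}$ (goto 15)
    $\mathit{Node} \to \cdot\,\mathit{PLACEHOLD}$ (goto 16) }

$I_{11}$ = { $\mathit{Graph} \to \mathit{Graph}'\,\langle\text{any}\rangle\,\mathit{PLACEHOLD}\,\cdot$ }

$I_{12}$ = {
    $\mathit{Edge} \to \mathit{EDGE} \cdot \langle\text{edge-labelling}\rangle\,\mathit{ELabel}$ (goto 18)
    $\mathit{ELabel} \to \cdot\,a$ (goto 19)
    $\mathit{ELabel} \to \cdot\,b$ (goto 20) }

$I_{13}$ = { $\mathit{Graph} \to \mathit{Graph}'\,\langle\langle 1\_1\rangle, \langle 1\_2\rangle\rangle\,\mathit{Edge}\ 2\_1\ \mathit{Node}\,\cdot$ }

$I_{14}$ = { $\mathit{Node} \to \mathit{NODEG} \cdot \langle\text{contains}\rangle\,\mathit{NLabel}$ (goto 21) }

$I_{15}$ = { $\mathit{Node} \to \mathit{NODEF} \cdot \langle\text{contains}\rangle\,\mathit{NLabel}$ (goto 22) }

$I_{16}$ = { $\mathit{Node} \to \mathit{PLACEHOLD}\,\cdot$ }

$I_{17}$ = { $\mathit{Graph} \to \mathit{Graph}'\,\langle\langle 1\_2\rangle, \langle 1\_1\rangle\rangle\,\mathit{Edge}\ 1\_1\ \mathit{Node}\,\cdot$ }

$I_{18}$ = { $\mathit{Edge} \to \mathit{EDGE}\,\langle\text{edge-labelling}\rangle\,\mathit{ELabel}\,\cdot$ }

$I_{19}$ = { $\mathit{ELabel} \to a\,\cdot$ }

$I_{20}$ = { $\mathit{ELabel} \to b\,\cdot$ }

$I_{21}$ = {
    $\mathit{Node} \to \mathit{NODEG}\,\langle\text{contains}\rangle\,\mathit{NLabel}\,\cdot$
    $\mathit{NLabel} \to \mathit{NLabel}' \cdot \langle\text{right-to}\rangle\,\mathit{DIGIT}$ (goto 7) }

$I_{22}$ = {
    $\mathit{Node} \to \mathit{NODEF}\,\langle\text{contains}\rangle\,\mathit{NLabel}\,\cdot$
    $\mathit{NLabel} \to \mathit{NLabel}' \cdot \langle\text{right-to}\rangle\,\mathit{DIGIT}$ (goto 7) }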
Notice that the set of items $I_1$ reveals a positional conflict, whereas $I_3$, $I_6$, $I_{21}$, and $I_{22}$ reveal shift/reduce conflicts, as also shown in the parsing table of Fig. 7.

Algorithm A.2. Constructing an XpLR(0) parsing table.

Input: An augmented extended positional grammar $G'$.
Output: The XpLR(0) parsing table for $G'$.
Method:
1. Construct $C = \{I_0, I_1, \ldots, I_m\}$, the collection of sets of XpLR(0) items as described in Algorithm A.1.
2. State $i$ of the parser is constructed from the set of items $I_i$. The entries for state $i$ of the parsing table action and next parts are determined as follows:
   SHIFT ENTRIES
   • If $[A \to \alpha \cdot R_{driver}\,a\beta]$ or $[A \to \cdot a\beta]$ is in $I_i$ and $\mathrm{GOTO}(I_i, \emptyset, a) = I_j$ then set action$[i, a]$ = "T: shift j" ($a$ is required to be a terminal), where T stands for a condition that always returns true.
   • If $[A \to \alpha \cdot \langle R_{driver}, R_{tester}\rangle a\beta]$ is in $I_i$ and $\mathrm{GOTO}(I_i, R_{tester}, a) = I_j$ then set action$[i, a]$ = "$R_{tester}$: shift j" ($a$ is required to be a terminal).
   REDUCE ENTRIES
   • If $[A \to \alpha\,\cdot]$ is in $I_i$ then set action$[i, a]$ = "reduce $A \to \alpha$" for each terminal $a$.
   NEXT and ACCEPT ENTRIES
   • Whenever $[A \to \alpha \cdot \langle R_{driver}, R_{tester}\rangle x\beta]$ is in $I_i$ insert $(R_{driver}, x)$ in next$[i]$.
   • If $[S' \to \cdot S]$ is in $I_i$ then insert (start, $S$) in next$[i]$. If $[S' \to S\,\cdot]$ is in $I_i$ then insert (end, EOI) in next$[i]$ and "accept" in action$[i, \mathrm{EOI}]$.
3. The entries for state $i$ and nonterminals $X$ of the goto part are determined as follows:
   • If $[A \to \alpha \cdot \langle R_{driver}, R_{tester}\rangle X\beta]$ is in $I_i$ and $\mathrm{GOTO}(I_i, R_{tester}, X) = I_j$ then insert "$R_{tester}$: j" in goto$[i, X]$.
   • If $[A \to \alpha \cdot R_{driver}\,X\beta]$ or $[A \to \cdot X\beta]$ is in $I_i$ and $\mathrm{GOTO}(I_i, \emptyset, X) = I_j$ then insert "T: j" in goto$[i, X]$.

Appendix B. LR–XpLR equivalence

Theorem B.1. Let $G' = G(map(XPG))$. $G'$ is LR(0) iff XPG is XpLR(0).

Proof. ($\Rightarrow$) Assume that $G'$ is LR(0); we need to prove that XPG is XpLR(0). We proceed by contradiction: we suppose that XPG is not XpLR(0) and show that the hypothesis is then contradicted, i.e., that $G'$ is not LR(0). If XPG is not XpLR(0), its XpLR(0) parsing table has at least one of the following types of conflicts: shift/shift case, goto/goto case, shift/shift othercase, goto/goto othercase, and positional conflicts. In the following we analyze each XpLR(0) conflict, detect the XpLR(0) items raising it, and use Proposition 4.1 to prove that there must exist a corresponding set of LR(0) items generated from $G'$ yielding conflicts in the associated LR(0) parsing table.

Shift/shift case (goto/goto case, respectively): There is only one way for an XpLR(0) set of items $K$ to present a shift/shift case conflict (goto/goto case conflict, resp.). The set of items $K$ must contain at least two kernel items $k_1$ and $k_2$ with the dot preceding a terminal (nonterminal, respectively). The sequences of driver relations right after the dot must be equal, whereas the sequences of tester relations must not be mutually exclusive to have a conflict. Thus, $K$ contains the following items:

\[
\begin{array}{lll}
K: & A \to \alpha \cdot \langle R_{driver_i}, R_{tester_i}\rangle\, x_{i+1}\beta, & (k_1)\\
   & B \to \gamma \cdot \langle R_{driver_i}, R'_{tester_i}\rangle\, x_{i+1}\delta, & (k_2)\\
   & \ldots &
\end{array}
\]

with $\alpha, \gamma \in L(PREF)$, $R_{driver_i} \in R_d$, $R_{tester_i}, R'_{tester_i} \in R_t$, $\beta, \delta \in L(SUFF)$, $x_{i+1} \in T$ ($x_{i+1} \in N$, respectively).

Then, the map-equivalent set of items $K'$ derived from $G'$ will contain the following items:

\[
\begin{array}{lll}
K': & A \to \mu(\alpha) \cdot DR_i\, x_{i+1}\, TR_i\, a_i\, \mu(\beta), & (k_1')\\
    & B \to \mu(\gamma) \cdot DR_i\, x_{i+1}\, TR'_i\, a'_i\, \mu(\delta), & (k_2')\\
    & DR_i \to \cdot &\\
    & \ldots &
\end{array}
\]

with $\mu(\alpha), \mu(\gamma) \in L(PREF')$, $DR_i \in DR$, $TR_i, TR'_i \in TR$, $a_i, a'_i \in FICT$, $\mu(\beta), \mu(\delta) \in L(SUFF')$.

By executing the goto operation twice, first on $DR_i$ and then on $x_{i+1}$, we reach a set of items $I'$ containing the following items:

\[
\begin{array}{lll}
I': & A \to \mu(\alpha)\, DR_i\, x_{i+1} \cdot TR_i\, a_i\, \mu(\beta), & (i_1')\\
    & B \to \mu(\gamma)\, DR_i\, x_{i+1} \cdot TR'_i\, a'_i\, \mu(\delta), & (i_2')\\
    & TR_i \to \cdot &\\
    & TR'_i \to \cdot &\\
    & \ldots &
\end{array}
\]

which presents a reduce/reduce conflict involving two different TR productions. As a consequence, if the parsing table on XPG contains a shift/shift case conflict or a goto/goto case conflict, then the parsing table built on $G'$ must contain a reduce/reduce conflict. This leads to a contradiction since the hypothesis states that $G'$ is LR(0).
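The step from the kernel items to the reduce/reduce conflict can be checked mechanically with a plain LR(0) item-set construction. The Python sketch below is only an illustration under stated assumptions: the toy grammar (two productions that share `DR x` and differ only in their empty TR nonterminal), the start symbol, and all helper names are hypothetical, and the grammar is not the output of the actual mapping rules.

```python
# Hypothetical mapped grammar: A and B share "DR x" and differ in TR1/TR2.
GRAMMAR = [
    ("S'", ("S",)),
    ("S", ("A",)),
    ("S", ("B",)),
    ("A", ("DR", "x", "TR1", "a1")),
    ("B", ("DR", "x", "TR2", "a2")),
    ("DR", ()),    # empty DR production
    ("TR1", ()),   # empty TR production
    ("TR2", ()),   # empty TR production
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def closure(items):
    """Standard LR(0) closure over (production index, dot) pairs."""
    result = set(items)
    work = list(result)
    while work:
        p, dot = work.pop()
        rhs = GRAMMAR[p][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for q, (lhs, _) in enumerate(GRAMMAR):
                if lhs == rhs[dot] and (q, 0) not in result:
                    result.add((q, 0))
                    work.append((q, 0))
    return frozenset(result)


def goto(items, symbol):
    """Standard LR(0) goto: advance the dot over `symbol`, then close."""
    moved = {(p, dot + 1) for p, dot in items
             if dot < len(GRAMMAR[p][1]) and GRAMMAR[p][1][dot] == symbol}
    return closure(moved)


def reducible(items):
    """Left-hand sides of the complete items in a set (sorted)."""
    return sorted(GRAMMAR[p][0] for p, dot in items if dot == len(GRAMMAR[p][1]))


state_k = closure({(0, 0)})      # plays the role of K'
state_j = goto(state_k, "DR")    # after the goto on DR
state_i = goto(state_j, "x")     # plays the role of I'
conflict = reducible(state_i)    # two complete TR items in the same state
```

Two complete items in `state_i` mean that an LR(0) table has two reduce actions in that state, which is the reduce/reduce conflict used in the argument above.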


Shift/shift othercase (goto/goto othercase, respectively): There are two ways for an XpLR(0) set of items $K$ to present a shift/shift othercase conflict (goto/goto othercase conflict, resp.).

Case 1: In the first case, the set of items $K$ must contain two kernel items $k_1$ and $k_2$ with the dot preceding a terminal (nonterminal, respectively). The two items must have equal sequences of driver relations right after the dot, and exactly one of the two must have an empty sequence of tester relations right after the dot.

\[
\begin{array}{lll}
K: & A \to \alpha \cdot \langle R_{driver_i}, R_{tester_i}\rangle\, x_{i+1}\beta, & (k_1)\\
   & B \to \gamma \cdot \langle R_{driver_i}\rangle\, x_{i+1}\delta, & (k_2)\\
   & \ldots &
\end{array}
\]

where $\alpha, \gamma \in L(PREF)$, $R_{driver_i} \in R_d$, $R_{tester_i} \in R_t$, $\beta, \delta \in L(SUFF)$, $x_{i+1} \in T$ ($x_{i+1} \in N$, respectively).

Then, the map-equivalent set of items $K'$ derived from $G'$ will contain the following items:

\[
\begin{array}{lll}
K': & A \to \mu(\alpha) \cdot DR_i\, x_{i+1}\, TR_i\, a_i\, \mu(\beta), & (k_1')\\
    & B \to \mu(\gamma) \cdot DR_i\, x_{i+1}\, \mu(\delta), & (k_2')\\
    & DR_i \to \cdot &\\
    & \ldots &
\end{array}
\]

where $\mu(\alpha), \mu(\gamma) \in L(PREF')$, $DR_i \in DR$, $TR_i \in TR$, $a_i \in FICT$, $\mu(\beta), \mu(\delta) \in L(SUFF')$.

By executing the goto operation twice, on $DR_i$ and $x_{i+1}$ successively, we reach a set of items $I'$ containing the following items:

\[
\begin{array}{lll}
I': & A \to \mu(\alpha)\, DR_i\, x_{i+1} \cdot TR_i\, a_i\, \mu(\beta), & (i_1')\\
    & B \to \mu(\gamma)\, DR_i\, x_{i+1} \cdot \mu(\delta), & (i_2')\\
    & TR_i \to \cdot &\\
    & \ldots &
\end{array}
\]

If $\mu(\delta)$ is empty, then the set of items contains a reduce/reduce conflict on the $TR_i$ production and the item $i_2'$; otherwise $\mu(\delta)$ starts with the vsymbol type $DR_{i+1}$, so that the item "$DR_{i+1} \to \cdot$" must also have been added to $I'$ by closure. In this case there would be a reduce/reduce conflict involving the DR and the TR productions.

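Case 2: In the second case, the set of items $K$ must contain a kernel item $k_1$ and a nonkernel item $k_2$, both with the dot preceding a terminal (nonterminal, respectively).

\[
\begin{array}{lll}
K: & A \to \alpha \cdot \langle R_{driver_i}, R_{tester_i}\rangle\, x_{i+1}\beta, & (k_1)\\
   & B \to \cdot\, x_{i+1}\delta, & (k_2)\\
   & \ldots &
\end{array}
\]

where $\alpha \in L(PREF)$, $R_{driver_i} \in R_d$, $R_{tester_i} \in R_t$, $\beta, \delta \in L(SUFF)$, $x_{i+1} \in T$ ($x_{i+1} \in N$, respectively).

In this case $K$ must also contain at least another kernel item $k_0$, from which $k_2$ is derived by closure. The two kernel items $k_0$ and $k_1$ must have equal sequences of driver relations right after the dot, otherwise there would be no conflict:

\[
X \to \sigma \cdot \langle R_{driver_i}, \rho\rangle\, Y\gamma, \qquad (k_0)
\]

where $\sigma \in L(PREF)$, $Y \in N$, $Y \Rightarrow^{*} B\lambda$, $\lambda \in L(SUFF)$, $R_{driver_i} \in R_d$, $\rho \in L(R_t?)$, $\gamma \in L(SUFF)$.

Then, the map-equivalent set of items $K'$ derived from $G'$ will contain the following items:

\[
\begin{array}{lll}
K': & X \to \mu(\sigma) \cdot DR_i\, Y\, \mu(\rho)\, \mu(\gamma), & (k_0')\\
    & A \to \mu(\alpha) \cdot DR_i\, x_{i+1}\, TR_i\, a_i\, \mu(\beta), & (k_1')\\
    & DR_i \to \cdot &
\end{array}
\]

where $\mu(\sigma), \mu(\alpha) \in L(PREF')$, $DR_i \in DR$, $TR_i \in TR$, $\mu(\rho) \in L((TR\,a)?)$, $a_i \in FICT$, $\mu(\beta), \mu(\gamma) \in L(SUFF')$.

By executing the goto operation on $DR_i$ we reach a set of items $J'$ containing the following items:

\[
\begin{array}{lll}
J': & X \to \mu(\sigma)\, DR_i \cdot Y\, \mu(\rho)\, \mu(\gamma), & (j_0')\\
    & A \to \mu(\alpha)\, DR_i \cdot x_{i+1}\, TR_i\, a_i\, \mu(\beta), & (j_1')\\
    & B \to \cdot\, x_{i+1}\, \mu(\delta). & (j_2')
\end{array}
\]

Notice the presence of the item $B \to \cdot\, x_{i+1}\, \mu(\delta)$. We prove that it is generated by closure from $Y$. In fact, we know from the hypothesis that $Y \Rightarrow^{*} B\lambda$ with $\lambda \in L(SUFF)$; this means that there exists in XPG a possibly empty sequence of nonterminals $B_1, B_2, \ldots, B_n$ such that

\[
Y \Rightarrow B_1\lambda_1 \Rightarrow B_2\lambda_2 \Rightarrow \cdots \Rightarrow B_n\lambda_n \quad \text{with } \lambda_1, \lambda_2, \ldots, \lambda_n \in L(SUFF),
\]

and there exist productions $B_n \to B\lambda_{n+1}$ and $B \to x_{i+1}\delta$, with $\lambda_{n+1}, \delta \in L(SUFF)$. According to the mapping rules this means that there must be a similar derivation generated by $G'$:

\[
Y \Rightarrow B_1\mu(\lambda_1) \Rightarrow B_2\mu(\lambda_2) \Rightarrow \cdots \Rightarrow B_n\mu(\lambda_n) \quad \text{with } \mu(\lambda_1), \mu(\lambda_2), \ldots, \mu(\lambda_n) \in L(SUFF'),
\]

and productions $B_n \to B\mu(\lambda_{n+1})$ and $B \to x_{i+1}\mu(\delta)$, with $\mu(\lambda_{n+1}), \mu(\delta) \in L(SUFF')$, so that when the dot precedes the string $Y\mu(\rho)\mu(\gamma)$ we can derive the item $B \to \cdot\, x_{i+1}\mu(\delta)$ by closure through the $n+2$ productions seen above.

By executing the goto operation on $x_{i+1}$ we reach a set of items $I'$ containing the following items:

\[
\begin{array}{lll}
I': & A \to \mu(\alpha)\, DR_i\, x_{i+1} \cdot TR_i\, a_i\, \mu(\beta), & (i_1)\\
    & B \to x_{i+1} \cdot \mu(\delta), & (i_2)\\
    & TR_i \to \cdot &
\end{array}
\]

Again, if $\mu(\delta)$ is empty then $I'$ contains a reduce/reduce conflict on the $TR_i$ production and the item $i_2$; otherwise $\mu(\delta)$ starts with the vsymbol type $DR_{i+1}$, so that the item "$DR_{i+1} \to \cdot$" must also have been added to $I'$ by closure. Thus, also in this case there is a reduce/reduce conflict involving the DR and the TR productions.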

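Positional conflicts: There is only one way for an XpLR(0) set of items $K$ to present a positional conflict. The set of items $K$ must contain at least two incomplete kernel items $k_1$ and $k_2$ having equal sequences of driver relations and different vsymbol types following the dot. No constraints are imposed on the sequences of tester relations. Thus, $K$ must contain the following items:

\[
\begin{array}{lll}
K: & A \to \alpha \cdot \langle R_{driver_i}, \rho\rangle\, x_{i+1}\beta, & (k_1)\\
   & B \to \gamma \cdot \langle R_{driver_i}, \varphi\rangle\, y_{i+1}\delta, & (k_2)
\end{array}
\]

where $\alpha, \gamma \in L(PREF)$, $R_{driver_i} \in R_d$, $\rho, \varphi \in L(R_t?)$, $\beta, \delta \in L(SUFF)$, $x_{i+1}, y_{i+1} \in T$ ($x_{i+1}, y_{i+1} \in N$, respectively).

The map-equivalent set of items $K'$ derived from $G'$ will then contain the following items:

\[
\begin{array}{lll}
K': & A \to \mu(\alpha) \cdot DR_i\, x_{i+1}\, \mu(\rho)\, \mu(\beta), & (k_1')\\
    & B \to \mu(\gamma) \cdot DR'_i\, y_{i+1}\, \mu(\varphi)\, \mu(\delta), & (k_2')\\
    & DR_i \to \cdot &\\
    & DR'_i \to \cdot &
\end{array}
\]

$\mu(\alpha), \mu(\gamma) \in L(PREF')$; $DR_i, DR'_i \in DR$; $\mu(\rho), \mu(\varphi) \in L((TR\,a)?)$; $\mu(\beta), \mu(\delta) \in L(SUFF')$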

with a reduce/reduce conflict involving two different DR productions. Again, this contradicts the hypothesis.

Thus, we can conclude that if $G'$ is an LR(0) grammar obtained through the application of the mapping rules 1–4 to an extended positional grammar XPG, then XPG is an XpLR(0) grammar.

($\Leftarrow$) Let XPG be an XpLR(0) grammar; we need to prove that the grammar $G' = G(map(XPG))$ is LR(0). We will prove this by assuming that $G'$ is not LR(0) and by showing that such a hypothesis leads to a contradiction. In order for $G'$ to be non-LR(0) its parsing table must contain at least a shift/reduce or a reduce/reduce conflict. From Proposition 4.2 the parsing table derived from $G'$ can never present a shift/reduce conflict, thus in the following we show that each different type of reduce/reduce conflict leads to a particular conflict in the parsing table derived from XPG.

We distinguish different types of reduce/reduce conflicts produced by $G'$ depending on the types of productions involved. Let us recall that $G'$ contains three types of productions, namely, ordinary, TR, and DR productions. Therefore, the number of possible reduce/reduce types of conflicts caused by $G'$ is given by the six pairwise combinations of them. Fig. 17 summarizes all the correspondences between conflicts caused by $G'$ and XPG that we intend to prove. As an example, the first row of the table can be read as follows: "a reduce/reduce involving an ordinary production and a DR production in $G'$ implies a shift/reduce conflict caused by XPG". The correctness of this table implies that if XPG is an XpLR(0) grammar, then the grammar $G'$ is an LR(0) grammar. In what follows we will prove the six cases one by one.

Type of reduce/reduce conflict caused by G'    Type of conflict caused by XPG
(ordinary, DR)                                 shift/reduce
(ordinary, ordinary)                           reduce/reduce
(TR, TR)                                       shift/shift or goto/goto case
(TR, ordinary), (TR, DR)                       shift/shift or goto/goto othercase
(DR, DR)                                       positional

Fig. 17. Correspondence between conflicts caused by $G'$ and XPG.

1. (ORDINARY, DR): Since XPG does not contain empty productions, this case occurs when $G'$ generates a set of items $I'$ that contains at least one complete item like $i_1$, and at least one item like $i_2$ with the dot preceding a DR vsymbol type on the RHS:

\[
\begin{array}{lll}
I': & A \to \mu(\alpha)\, \cdot & (i_1)\\
    & B \to \mu(\beta) \cdot DR_i\, x_{i+1}\, \mu(\rho)\, \mu(\gamma) & (i_2)\\
    & DR_i \to \cdot &\\
    & \ldots &
\end{array}
\]

$x_{i+1} \in N \cup T$; $\mu(\alpha), \mu(\beta) \in L(PREF')$; $\mu(\rho) \in L((TR\,a)?)$; $\mu(\gamma) \in L(SUFF')$.

There must exist a map-equivalent set of items $I$ generated from XPG containing the following items:

\[
\begin{array}{ll}
I: & A \to \alpha\, \cdot\\
   & B \to \beta \cdot \langle R_{driver_i}, \rho\rangle\, x_{i+1}\gamma\\
   & \ldots
\end{array}
\]

$\alpha, \beta \in L(PREF)$; $R_{driver_i} \in R_d$; $\rho \in L(R_t?)$; $\gamma \in L(SUFF)$.

Hence, there should have been a shift/reduce conflict generated by XPG, which would contradict the hypothesis.

2. (ORDINARY, ORDINARY): This case occurs when the grammar $G'$ generates a set of items $I'$ containing two or more complete items. As an example, let us suppose that $I'$ contains two complete items $i_1$ and $i_2$. They should be terminated by the same vsymbol type, because it is the last scanned vsymbol type in both of them and from the hypothesis the two items are not empty.

\[
\begin{array}{lll}
I': & A \to \mu(\alpha)\, x_i\, \cdot & (i_1)\\
    & B \to \mu(\beta)\, x_i\, \cdot & (i_2)\\
    & \ldots &
\end{array}
\]

$x_i \in N \cup T$; $\mu(\alpha), \mu(\beta) \in L(PREF'(DR)) \cup \{\varepsilon\}$.

Thus, there must exist a map-equivalent set of items $I$ generated from XPG containing the following items:

\[
\begin{array}{ll}
I: & A \to \alpha\, x_i\, \cdot\\
   & B \to \beta\, x_i\, \cdot\\
   & \ldots
\end{array}
\]

$\alpha, \beta \in L(PREF\langle R_d,\ \rangle) \cup \{\varepsilon\}$.

Hence, there should have been a reduce/reduce conflict generated by XPG, which would contradict the hypothesis.

3. (TR, TR): This case occurs when $I'$ contains two or more items with the dot preceding
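different TR vsymbol types. By following similar arguments as above, $I'$ will have the following structure:

\[
\begin{array}{lll}
I': & A \to \mu(\alpha)\, DR_i\, x_{i+1} \cdot TR_i\, a_i\, \mu(\delta) & (i_1)\\
    & B \to \mu(\beta)\, DR_i\, x_{i+1} \cdot TR_j\, a_j\, \mu(\gamma) & (i_2)\\
    & TR_i \to \cdot &\\
    & TR_j \to \cdot &\\
    & \ldots &
\end{array}
\]

$x_{i+1} \in N \cup T$, $\mu(\alpha), \mu(\beta) \in L(PREF')$, $DR_i \in DR$, $TR_i, TR_j \in TR$, $a_i, a_j \in FICT$, $\mu(\delta), \mu(\gamma) \in L(SUFF')$.

Thus, there must exist the following set of items $J'$ generated by the grammar $G'$, such that $goto(J', x_{i+1}) = I'$:

\[
\begin{array}{lll}
J': & A \to \mu(\alpha)\, DR_i \cdot x_{i+1}\, TR_i\, a_i\, \mu(\delta) & (j_1)\\
    & B \to \mu(\beta)\, DR_i \cdot x_{i+1}\, TR_j\, a_j\, \mu(\gamma) & (j_2)\\
    & \ldots &
\end{array}
\]

Moreover, there must exist the following set of items $K'$ generated by the grammar $G'$, such that $goto(K', DR_i) = J'$:

\[
\begin{array}{lll}
K': & A \to \mu(\alpha) \cdot DR_i\, x_{i+1}\, TR_i\, a_i\, \mu(\delta) & (k_1)\\
    & B \to \mu(\beta) \cdot DR_i\, x_{i+1}\, TR_j\, a_j\, \mu(\gamma) & (k_2)\\
    & DR_i \to \cdot &\\
    & \ldots &
\end{array}
\]

The map-equivalent set of items $K$ generated from XPG contains the following conflicting items:

\[
\begin{array}{ll}
K: & A \to \alpha \cdot \langle R_{driver_i}, R_{tester_i}\rangle\, x_{i+1}\delta\\
   & B \to \beta \cdot \langle R_{driver_i}, R_{tester_j}\rangle\, x_{i+1}\gamma\\
   & \ldots
\end{array}
\]

$\alpha, \beta \in L(PREF)$; $R_{driver_i} \in R_d$; $R_{tester_i}, R_{tester_j} \in R_t$; $\delta, \gamma \in L(SUFF)$.

Thus, there should have been a shift/shift case conflict generated by XPG, which would contradict the hypothesis.

4. (TR, ORDINARY): This case occurs when $G'$ generates a set of items $I'$ containing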
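one or more complete items like $i_1$ and one or more items like $i_2$ with the dot preceding a vsymbol type $TR_i \in TR$:

\[
\begin{array}{lll}
I': & A \to \mu(\alpha)\, x_{i+1}\, \cdot & (i_1)\\
    & B \to \mu(\beta)\, DR_i\, x_{i+1} \cdot TR_i\, a_i\, \mu(\gamma) & (i_2)\\
    & TR_i \to \cdot &\\
    & \ldots &
\end{array}
\]

$x_{i+1} \in N \cup T$, $\mu(\alpha) \in L(PREF'(DR_i)) \cup \{\varepsilon\}$, $\mu(\beta) \in L(PREF')$, $DR_i \in DR$, $TR_i \in TR$, $a_i \in FICT$, $\mu(\gamma) \in L(SUFF')$.

We must distinguish two cases, according to the following two alternatives: $\mu(\alpha) = \varepsilon$ or $\mu(\alpha) \neq \varepsilon$. In the first case, there must exist the following set of items $J'$ such that $goto(J', x_{i+1}) = I'$:

\[
\begin{array}{lll}
J': & A \to \mu(\alpha) \cdot x_{i+1} & (j_1)\\
    & B \to \mu(\beta)\, DR_i \cdot x_{i+1}\, TR_i\, a_i\, \mu(\gamma) & (j_2)\\
    & \ldots &
\end{array}
\]

This means that there must exist the following item in $J'$ from which $j_1$ is derived by closure:

\[
X \to \mu(\sigma)\, DR_i \cdot Y\, \mu(\rho)\, \mu(\tau) \qquad (j_0)
\]

$\mu(\sigma) \in L(PREF')$, $DR_i \in DR$, $Y \in N$, $Y \Rightarrow^{*} A\mu(\lambda)$, $\mu(\lambda), \mu(\tau) \in L(SUFF')$, $\mu(\rho) \in L((TR_i\,a_i)?)$, $TR_i \in TR$, $a_i \in FICT$.

Notice that there must be the same vsymbol type $DR_i$ preceding the dot in $j_0$ and $j_2$. Moreover, $Y$ cannot be the vsymbol type $S$ following SP, otherwise $DR_i = SP$ and this is not possible since SP can only occur once in a set of items. Thus, there must exist the following set of items $K'$ such that $goto(K', DR_i) = J'$:

\[
\begin{array}{lll}
K': & X \to \mu(\sigma) \cdot DR_i\, Y\, \mu(\rho)\, \mu(\tau) & (k_0)\\
    & B \to \mu(\beta) \cdot DR_i\, x_{i+1}\, TR_i\, a_i\, \mu(\gamma) & (k_2)\\
    & \ldots &
\end{array}
\]

The map-equivalent set of items $K$ generated from XPG contains the following items:

\[
\begin{array}{ll}
K: & X \to \sigma \cdot \langle R_{driver_i}, \rho\rangle\, Y\tau\\
   & B \to \beta \cdot \langle R_{driver_i}, R_{tester_i}\rangle\, x_{i+1}\gamma\\
   & A \to \cdot\, x_{i+1}\\
   & \ldots
\end{array}
\]

$\sigma, \beta \in L(PREF)$, $R_{driver_i} \in R_d$, $R_{tester_i} \in R_t$, $\rho \in L(R_t?)$, $\tau, \gamma \in L(SUFF)$.

We can notice that the set of items $K$ presents a shift/shift or goto/goto othercase conflict depending on whether $x_{i+1}$ is a terminal or nonterminal, which would contradict the hypothesis.

If $\mu(\alpha) \neq \varepsilon$ there must exist the following set of items $J'$ such that $goto(J', x_{i+1}) = I'$:

\[
\begin{array}{lll}
J': & A \to \mu(\alpha) \cdot x_{i+1} & (j_1)\\
    & B \to \mu(\beta)\, DR_i \cdot x_{i+1}\, TR_i\, a_i\, \mu(\gamma) & (j_2)\\
    & \ldots &
\end{array}
\]

and the following set of items $K'$ such that $goto(K', DR_i) = J'$:

\[
\begin{array}{lll}
K': & A \to \mu(\varphi) \cdot DR_i\, x_{i+1} & (k_1)\\
    & B \to \mu(\beta) \cdot DR_i\, x_{i+1}\, TR_i\, a_i\, \mu(\gamma) & (k_2)\\
    & \ldots &
\end{array}
\]

Thus, the map-equivalent set of items $K$ generated from XPG contains the following items:

\[
\begin{array}{ll}
K: & A \to \varphi \cdot \langle R_{driver_i},\ \rangle\, x_{i+1}\\
   & B \to \beta \cdot \langle R_{driver_i}, R_{tester_i}\rangle\, x_{i+1}\gamma\\
   & \ldots
\end{array}
\]

$\varphi, \beta \in L(PREF)$; $R_{driver_i} \in R_d$; $R_{tester_i} \in R_t$; $\gamma \in L(SUFF)$.

Thus, also in this case there is a shift/shift or goto/goto othercase conflict.

5. (TR, DR): In this case the set of items $I'$ generated from $G'$ must contain at least an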
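item $i_1$ with the dot preceding a vsymbol type $DR_j \in DR$ and at least an item $i_2$ with the dot preceding a vsymbol type $TR_i \in TR$. By the nature of the mapping rules, $I'$ must have the following structure:

\[
\begin{array}{lll}
I': & A \to \mu(\alpha)\, x_i \cdot DR_j\, x_j\, \mu(\rho)\, \mu(\lambda) & (i_1)\\
    & B \to \mu(\beta)\, DR_i\, x_i \cdot TR_i\, a_i\, \mu(\gamma) & (i_2)\\
    & DR_j \to \cdot &\\
    & TR_i \to \cdot &\\
    & \ldots &
\end{array}
\]

$x_i, x_j \in N \cup T$, $\mu(\alpha) \in L(PREF'(DR_i)) \cup \{\varepsilon\}$, $DR_i, DR_j \in DR$, $TR_i \in TR$, $a_i \in FICT$, $\mu(\rho) \in L((TR\,a)?)$, $\mu(\beta) \in L(PREF')$, $\mu(\lambda), \mu(\gamma) \in L(SUFF')$.

Also here we must distinguish two cases corresponding to the two alternatives: $\mu(\alpha) = \varepsilon$ and $\mu(\alpha) \neq \varepsilon$.

Case 1:

\[
\begin{array}{lll}
I': & A \to x_i \cdot DR_j\, x_j\, \mu(\rho)\, \mu(\lambda) & (i_1)\\
    & B \to \mu(\beta)\, DR_i\, x_i \cdot TR_i\, a_i\, \mu(\gamma) & (i_2)\\
    & DR_j \to \cdot &\\
    & TR_i \to \cdot &\\
    & \ldots &
\end{array}
\]

There must exist the following set of items $J'$ such that $goto(J', x_i) = I'$:

\[
\begin{array}{lll}
J': & A \to \cdot\, x_i\, DR_j\, x_j\, \mu(\rho)\, \mu(\lambda) & (j_1)\\
    & B \to \mu(\beta)\, DR_i \cdot x_i\, TR_i\, a_i\, \mu(\gamma) & (j_2)\\
    & \ldots &
\end{array}
\]

This means that there must exist the following item in $J'$, from which $j_1$ is derived by closure:

\[
X \to \mu(\sigma)\, DR_i \cdot Y\, \mu(\rho)\, \mu(\tau) \qquad (j_0)
\]

$\mu(\sigma) \in L(PREF')$, $DR_i \in DR$, $Y \in N$, $Y \Rightarrow^{*} A\mu(\lambda)$, $\mu(\lambda), \mu(\tau) \in L(SUFF')$, $\mu(\rho) \in L((TR_i\,a_i)?)$, $TR_i \in TR$, $a_i \in FICT$.

Again, $Y$ can be the vsymbol type $S$, but not the one following SP, otherwise $DR_i = SP$ and this is not possible since SP can only occur once in an item set. Thus, there must exist the following set of items $K'$ such that $goto(K', DR_i) = J'$:

\[
\begin{array}{lll}
K': & X \to \mu(\sigma) \cdot DR_i\, Y\, \mu(\rho)\, \mu(\tau) & (k_0)\\
    & B \to \mu(\beta) \cdot DR_i\, x_i\, TR_i\, a_i\, \mu(\gamma) & (k_2)\\
    & DR_i \to \cdot &\\
    & \ldots &
\end{array}
\]

Consequently, the map-equivalent set of items $K$ generated from XPG contains the following items:

\[
\begin{array}{ll}
K: & X \to \sigma \cdot \langle R_{driver_i}, \rho\rangle\, Y\tau\\
   & B \to \beta \cdot \langle R_{driver_i}, R_{tester_i}\rangle\, x_i\gamma\\
   & A \to \cdot\, x_i\\
   & \ldots
\end{array}
\]

$\sigma, \beta \in L(PREF)$; $R_{driver_i} \in R_d$; $R_{tester_i} \in R_t$; $\rho \in L(R_t?)$; $\tau, \gamma \in L(SUFF)$.

The derivation of the item $A \to \cdot\, x_i$ can be proven by arguments similar to those used in the (TR, ORDINARY) case. We can notice that the set of items $K$ presents a shift/shift or goto/goto othercase conflict, which would contradict the hypothesis.

Case 2:

\[
\begin{array}{lll}
I': & A \to \mu(\alpha)\, x_i \cdot DR_j\, x_j\, \mu(\rho)\, \mu(\lambda) & (i_1)\\
    & B \to \mu(\beta)\, DR_i\, x_i \cdot TR_i\, a_i\, \mu(\gamma) & (i_2)\\
    & DR_j \to \cdot &\\
    & TR_i \to \cdot &\\
    & \ldots &
\end{array}
\]

$x_i, x_j \in N \cup T$, $\mu(\alpha) = \mu(\tau)\, DR_i$, $\mu(\tau), \mu(\beta) \in L(PREF')$, $DR_i, DR_j \in DR$, $TR_i \in TR$, $a_i \in FICT$, $\mu(\rho) \in L((TR\,a)?)$, $\mu(\lambda), \mu(\gamma) \in L(SUFF')$.

There must exist the following set of items $J'$ such that $goto(J', x_i) = I'$:

\[
\begin{array}{lll}
J': & A \to \mu(\alpha) \cdot x_i\, DR_j\, x_j\, \mu(\rho)\, \mu(\lambda) & (j_1)\\
    & B \to \mu(\beta)\, DR_i \cdot x_i\, TR_i\, a_i\, \mu(\gamma) & (j_2)\\
    & \ldots &
\end{array}
\]

and the following set of items $K'$ such that $goto(K', DR_i) = J'$:

\[
\begin{array}{lll}
K': & A \to \mu(\tau) \cdot DR_i\, x_i\, DR_j\, x_j\, \mu(\rho)\, \mu(\lambda) & (k_1)\\
    & B \to \mu(\beta) \cdot DR_i\, x_i\, TR_i\, a_i\, \mu(\gamma) & (k_2)\\
    & DR_i \to \cdot &\\
    & \ldots &
\end{array}
\]

Consequently, the map-equivalent set of items $K$ generated from XPG contains the following items:

\[
\begin{array}{ll}
K: & A \to \tau \cdot \langle R_{driver_i},\ \rangle\, x_i\, \langle R_{driver_j}, \rho\rangle\, x_j\lambda\\
   & B \to \beta \cdot \langle R_{driver_i}, R_{tester_i}\rangle\, x_i\gamma\\
   & \ldots
\end{array}
\]

$\tau, \beta \in L(PREF)$; $R_{driver_i}, R_{driver_j} \in R_d$; $R_{tester_i} \in R_t$; $\rho \in L(R_t?)$; $\lambda, \gamma \in L(SUFF)$.

Thus, also in this case $K$ presents a shift/shift or goto/goto othercase conflict, which would contradict the hypothesis.

6. (DR, DR): This case occurs when $I'$ contains two or more items with the dot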
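preceding different DR vsymbol types. As an example, let us consider the following two conflicting items $i_1$ and $i_2$:

\[
\begin{array}{lll}
I': & A \to \mu(\alpha)\, x_i \cdot DR_j\, x_j\, \mu(\rho)\, \mu(\lambda) & (i_1)\\
    & B \to \mu(\beta)\, x_i \cdot DR_k\, x_k\, \mu(\theta)\, \mu(\gamma) & (i_2)\\
    & DR_j \to \cdot &\\
    & DR_k \to \cdot &\\
    & \ldots &
\end{array}
\]

$x_i, x_j, x_k \in N \cup T$, $\mu(\alpha), \mu(\beta) \in L(PREF'(DR_i)) \cup \{\varepsilon\}$, $DR_j, DR_k, DR_i \in DR$, $\mu(\rho), \mu(\theta) \in L((TR\,a)?)$, $\mu(\lambda), \mu(\gamma) \in L(SUFF')$.

The map-equivalent set of items $I$ generated from XPG contains the following items:

\[
\begin{array}{ll}
I: & A \to \alpha\, x_i \cdot \langle R_{driver_j},\ \rangle\, x_j\, \rho\lambda\\
   & B \to \beta\, x_i \cdot \langle R_{driver_k},\ \rangle\, x_k\, \theta\gamma\\
   & \ldots
\end{array}
\]

$\alpha, \beta \in L(PREF(\langle R_{driver_i},\ \rangle))$, $R_{driver_i}, R_{driver_j}, R_{driver_k} \in R_d$, $\rho, \theta \in L(R_t?)$, $\lambda, \gamma \in L(SUFF)$.

Thus, there would also be a positional conflict generated from XPG, which would contradict the hypothesis. $\square$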

Appendix C. Resolving conflicts in non-LR(0) translation schemes

In Section 4.3 we have shown how the (ORDINARY, ORDINARY) conflicts are eliminated from the translation schemes following the heuristics defined in the XpLR methodology. In the following we describe the remaining types of reduce/reduce conflicts and the transformation techniques that we use to eliminate them from the grammar.

Case 2 (DR, DR):

\[
\begin{array}{lll}
I': & A \to \mu(\alpha)\, x_i \cdot DR_j\, x_j\, \mu(\rho)\, \mu(\lambda) & (i_1)\\
    & B \to \mu(\beta)\, x_i \cdot DR_k\, x_k\, \mu(\theta)\, \mu(\gamma) & (i_2)\\
    & DR_j \to \cdot &\\
    & DR_k \to \cdot &\\
    & \ldots &
\end{array}
\]

$x_i, x_j, x_k \in N \cup T$, $\mu(\alpha), \mu(\beta) \in L(PREF'(DR_i)) \cup \{\varepsilon\}$, $DR_j, DR_k, DR_i \in DR$, $\mu(\rho), \mu(\theta) \in L((TR\,a)?)$, $\mu(\lambda), \mu(\gamma) \in L(SUFF')$.

In the XpLR methodology the set of items is partitioned according to the driver relations. This establishes an evaluation order of the driver relations. We can resolve this conflict by introducing the nonterminal "next1" in the two conflicting productions:

$A \to \mu(\alpha)\, x_i\, next1\, x_j\, \mu(\rho)\, \mu(\lambda)$

$B \to \mu(\beta)\, x_i\, next1\, x_k\, \mu(\theta)\, \mu(\gamma)$

and this empty production:

next1 → ε
{ let Rseq be the ordered sequence of parameters of Fetch_Vsymbol
  in the conflicting DR productions
  do { let R be the first element in Rseq;
       ip = Fetch_Vsymbol(R);
       if ip is not null then next_vsymbol = Dp[ip];
       else delete R from Rseq;
  } while (ip is null and Rseq is not empty);
  if (ip is null) { emit "syntax error"; exit; }
}
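The action attached to next1 amounts to trying the driver relations in their established order and taking the first one that reaches a vsymbol. The Python sketch below illustrates that loop under stated assumptions: `resolve_driver_conflict`, the `fetch_vsymbol` callback, and the `dictionary` argument are hypothetical stand-ins for Fetch_Vsymbol and Dp, and the toy data is invented for the example.

```python
from typing import Callable, Dict, Optional, Sequence


def resolve_driver_conflict(
    rseq: Sequence[str],
    fetch_vsymbol: Callable[[str], Optional[int]],
    dictionary: Dict[int, str],
) -> str:
    """Return the next vsymbol reached by the first applicable driver relation.

    rseq          -- driver relations of the conflicting DR productions, in order
    fetch_vsymbol -- assumed stand-in for Fetch_Vsymbol: index of the vsymbol
                     reached through a relation, or None if there is none
    dictionary    -- assumed stand-in for Dp: vsymbol index -> vsymbol type
    """
    for relation in rseq:
        ip = fetch_vsymbol(relation)
        if ip is not None:
            return dictionary[ip]
    # Every relation was tried and discarded.
    raise SyntaxError("syntax error: no driver relation reaches a vsymbol")


# Toy input: only the relation "any" reaches a vsymbol (index 2).
reachable = {"any": 2}
dictionary = {1: "NODEI", 2: "PLACEHOLD"}
next_vsymbol = resolve_driver_conflict(["1_1", "any"], reachable.get, dictionary)
```

With this toy input the relation `1_1` reaches nothing, so the loop falls through to `any`; an empty or exhausted sequence ends in the syntax-error branch, as in the pseudocode.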
Case 3 (TR, TR): This case occurs when $I'$ contains two or more items with the dot preceding different TR vsymbol types. Thus, $I'$ will have the following structure:

\[
\begin{array}{lll}
I': & A \to \mu(\alpha)\, DR_i\, x_{i+1} \cdot TR_i\, a_i\, \mu(\delta) & (i_1)\\
    & B \to \mu(\beta)\, DR_i\, x_{i+1} \cdot TR_j\, a_j\, \mu(\gamma) & (i_2)\\
    & TR_i \to \cdot &\\
    & TR_j \to \cdot &\\
    & \ldots &
\end{array}
\]

$x_{i+1} \in N \cup T$, $\mu(\alpha), \mu(\beta) \in L(PREF')$, $DR_i \in DR$, $TR_i, TR_j \in TR$, $a_i, a_j \in FICT$, $\mu(\delta), \mu(\gamma) \in L(SUFF')$.

This conflict is generated from a shift/shift or goto/goto case conflict in the XPG grammar. The heuristic used by the XpLR methodology to eliminate these ambiguities is to order the tester relations and to execute the first shift or goto whose condition is true. This heuristic can be simulated by introducing a new nonterminal "next1" and by defining it through the following empty production:

next1 → ε
{ let Rseq be the ordered sequence of tester relations that are parameters of
  Test in the conflicting TR productions
  if Test(RELh, x_{i+1}) is true for each RELh in R
  then next_vsymbol = a_k; // where a_k is the false terminal following R
  exit;
  } while (Rseq is not empty);

  if (Rseq is empty) { emit "syntax error"; exit; }

}

Finally, we introduce the nonterminal "next1" in the conflicting productions:

$A \to \mu(\alpha)\, DR_i\, x_{i+1}\, next1\, a_i\, \mu(\delta)$

$B \to \mu(\beta)\, DR_i\, x_{i+1}\, next1\, a_j\, \mu(\gamma)$
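The ordering heuristic for tester relations can be illustrated in the same spirit: the alternatives are visited in order, and the first one whose tester relations all hold on the current vsymbol selects its fictitious terminal. The Python sketch below assumes that reading of the action; `resolve_tester_conflict`, the `test` callback (a stand-in for Test), and the toy data are hypothetical.

```python
from typing import Callable, Sequence, Tuple


def resolve_tester_conflict(
    alternatives: Sequence[Tuple[Sequence[str], str]],
    test: Callable[[str, str], bool],
    x: str,
) -> str:
    """Pick the fictitious terminal of the first alternative whose testers hold.

    alternatives -- ordered pairs (tester relations, fictitious terminal a_k),
                    one per conflicting TR production
    test         -- assumed stand-in for Test(REL, x)
    x            -- the vsymbol just shifted (x_{i+1} in the text)
    """
    for relations, fictitious in alternatives:
        # An empty sequence of testers is vacuously satisfied.
        if all(test(relation, x) for relation in relations):
            return fictitious
    raise SyntaxError("syntax error: no tester sequence is satisfied")


# Toy picture: only the relation "1_2" holds on the vsymbol "EDGE".
holds = {"EDGE": {"1_2"}}


def toy_test(relation: str, x: str) -> bool:
    return relation in holds.get(x, set())


alternatives = [(("1_1",), "a1"), (("1_2",), "a2")]
chosen = resolve_tester_conflict(alternatives, toy_test, "EDGE")
```

Because the alternatives are ordered, placing the most specific tester sequence first makes it win over a more general one; this ordering is a design decision left to the grammar writer.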

Case 4 (ORDINARY, DR): This case occurs when $G'$ generates a set of items $I'$ that contains at least one complete item, and at least one item with the dot preceding a DR vsymbol type on the RHS:

$I' :\ A \to \mu(\alpha)\, \cdot$ (i1)

$B \to \mu(\beta) \cdot DR_i\, x_{i+1}\, \mu(\rho)\, \mu(\gamma)$ (i2)

$DR_i \to \varepsilon$

$\ldots$

$x_{i+1} \in N \cup T$, $\mu(\alpha), \mu(\beta) \in L(PREF')$, $\mu(\rho) \in L((TR\ a)?)$, $\mu(\gamma) \in L(SUFF')$.

This conflict is generated by a shift/reduce conflict in the XpLR parsing table. In this case, I is split by the function Partition into an ordered sequence of item sets on the basis of the driver relations, and the parser gives priority to the shift of $x_{i+1}$. So we can tackle this conflict by introducing a new nonterminal "next1" and by defining it through the following empty production:

$\mathit{next1} \to \varepsilon$
{ let Rseq be the ordered sequence of parameters of Fetch_Vsymbol
  in the conflicting DR productions;
  do { let R be the first element in Rseq;
       ip = Fetch_Vsymbol(R);
       if ip is not null then next_vsymbol = Dp[ip];
       else delete R from Rseq;
  } while (ip is null and Rseq is not empty);
  if (ip is null) next_vsymbol = a_k;
}
Then we introduce the nonterminal "next1" in the conflicting productions:

$A \to \mu(\alpha)\ \mathit{next1}\ a_k$

$B \to \mu(\beta)\ \mathit{next1}\ x_{i+1}\, \mu(\rho)\, \mu(\gamma)$
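The shift-over-reduce priority can be sketched as a fetch with a fallback: if some driver relation reaches a vsymbol, that vsymbol becomes the lookahead and production B proceeds; otherwise the false terminal is returned so that A is reduced. This is only one reading of the action, assuming the false terminal is used solely when every fetch fails; `fetch_vsymbol`, `dp`, and the relation names are hypothetical.

```python
from typing import Callable, Optional, Sequence


def resolve_ordinary_dr(
    rseq: Sequence[str],
    fetch_vsymbol: Callable[[str], Optional[int]],
    dp: Sequence[str],
    a_k: str,
) -> str:
    """Prefer shifting a fetched vsymbol; fall back to the false terminal a_k."""
    for relation in rseq:             # driver relations in evaluation order
        ip = fetch_vsymbol(relation)  # index into dp, or None
        if ip is not None:
            return dp[ip]             # shift has priority over the reduction
    return a_k                        # nothing fetched: trigger the reduction of A


dp = ["NODE", "EDGE"]
print(resolve_ordinary_dr(["1_1"], {"1_1": 0}.get, dp, "a_k"))  # prints NODE
print(resolve_ordinary_dr(["1_1"], {}.get, dp, "a_k"))          # prints a_k
```

Unlike the (DR, DR) case, a failed fetch is not an error here, because the complete item offers a valid alternative.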

Case 5 (ORDINARY, TR): This case occurs when $G'$ generates a set of items $I'$ containing one or more complete items and one or more items with the dot preceding a vsymbol type $TR_i \in TR$:

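$I' :\ A \to \mu(\alpha)\, x_{i+1}\, \cdot$ (i1)

$B \to \mu(\beta)\, DR_i\, x_{i+1} \cdot TR_i\, a_i\, \mu(\gamma)$ (i2)

$TR_i \to \varepsilon$

$\ldots$

$x_{i+1} \in N \cup T$, $\mu(\alpha) \in L(PREF'(DR_i)) \cup \{\varepsilon\}$, $\mu(\beta) \in L(PREF')$, $DR_i \in DR$, $a_i \in FICT$, $\mu(\gamma) \in L(SUFF')$.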

Moreover, let us suppose that

$TR_i \to \varepsilon$
{ if Test(REL_h, x_{i+1}) is true for each REL_h in Rtester_i
  then next_vsymbol = a_i;
  else { emit "syntax error"; exit; }
}

We must distinguish two cases, according to the following two alternatives: $\mu(\alpha) = \varepsilon$ or $\mu(\alpha) \neq \varepsilon$.


In the first case, there is a shift/shift or goto/goto othercase conflict in XPG depending on whether $x_{i+1}$ is a terminal or nonterminal. We can tackle this conflict by introducing a new nonterminal "next1" and by defining it through the following rule:

$\mathit{next1} \to \varepsilon$
{ let $j: X \to \mu(\sigma)\, DR_i \cdot Y\, \mu(\rho)\, \mu(\tau)$ be the kernel item such that i1 is in Closure(j);
  if $\mu(\rho) = TR_j\, a'$ then
     if, in the ordered sequence of conditioned actions, the condition verified
        by TR_j precedes the condition verified by TR_i
     then if Test(REL_h, x_{i+1}) is true for each REL_h in Rtester_j
          then { next_vsymbol = a_j; exit; }
  if Test(REL_h, x_{i+1}) is true for each REL_h in Rtester_i
  then next_vsymbol = a_i;
  else { emit "syntax error"; exit; }
}

Then we introduce the nonterminal "next1" and a false terminal $a_j$ in the conflicting productions:

$A \to x_{i+1}\ \mathit{next1}\ a_j$

$B \to \mu(\beta)\, DR_i\, x_{i+1}\ \mathit{next1}\ a_i\, \mu(\gamma)$

Also in the case $\mu(\alpha) \neq \varepsilon$ there is a shift/shift or goto/goto othercase conflict. By following similar arguments as above, we can tackle this conflict by introducing a new nonterminal "next1" and by defining it through the following empty production:

$\mathit{next1} \to \varepsilon$
{ if Test(REL_h, x_{i+1}) is true for each REL_h in Rtester_i
  then next_vsymbol = a_i;
  else next_vsymbol = a_j;
}

Then we introduce the nonterminal "next1" and a false terminal $a_j$ in the conflicting productions:

$A \to \mu(\alpha)\, x_{i+1}\ \mathit{next1}\ a_j$

$B \to \mu(\beta)\, DR_i\, x_{i+1}\ \mathit{next1}\ a_i\, \mu(\gamma)$

Case 6 (TR, DR): In this case the set of items $I'$ generated from $G'$ must contain at least one item i1 with the dot preceding a vsymbol type $DR_j \in DR$ and at least one item i2 with the dot preceding a vsymbol type $TR_i \in TR$:



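$I' :\ A \to \mu(\alpha)\, x_i \cdot DR_j\, x_j\, \mu(\rho)\, \mu(\lambda)$ (i1)

$B \to \mu(\beta)\, DR_i\, x_i \cdot TR_i\, a_i\, \mu(\gamma)$ (i2)

$DR_j \to \varepsilon$

$TR_i \to \varepsilon$

$\ldots$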


$x_i, x_j \in N \cup T$, $\mu(\alpha) \in L(PREF'(DR_i)) \cup \{\varepsilon\}$, $DR_i, DR_j \in DR$, $TR_i \in TR$, $a_i \in FICT$, $\mu(\rho) \in L((TR\ a)?)$, $\mu(\beta) \in L(PREF')$, $\mu(\lambda), \mu(\gamma) \in L(SUFF')$.

This conflict is generated by a shift/shift or goto/goto case conflict in XPG. We can tackle this conflict by introducing a new nonterminal "next1" and defining it through the following empty production:

$\mathit{next1} \to \varepsilon$
{ if Test(REL_h, x_i) is true for each REL_h in Rtester_i
  then next_vsymbol = a_i;
  else { ip = Fetch_Vsymbol(Rdriver_i, x_j);
         if ip is not null then next_vsymbol = Dp[ip];
         else { emit "syntax error"; exit; }
  }
}

Then we introduce the nonterminal "next1" in the conflicting productions:

$A \to \mu(\alpha)\, x_i\ \mathit{next1}\ x_j\, \mu(\rho)\, \mu(\lambda)$

$B \to \mu(\beta)\, DR_i\, x_i\ \mathit{next1}\ a_i\, \mu(\gamma)$
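The (TR, DR) action tries the tester relations first and only then falls back to the driver relation. A minimal Python sketch of that order of evaluation follows. The two-argument `fetch` mirrors the call Fetch_Vsymbol(Rdriver_i, x_j) above, but its exact semantics are assumed here, and `holds`, `dp`, and the relation names are hypothetical.

```python
from typing import Callable, Optional, Sequence


class VisualSyntaxError(Exception):
    """Raised when neither the test nor the fetch succeeds (assumed error type)."""


def resolve_tr_dr(
    x_i: dict,
    rtester: Sequence[str],
    test: Callable[[str, dict], bool],
    a_i: str,
    rdriver: str,
    x_j: str,
    fetch_vsymbol: Callable[[str, str], Optional[int]],
    dp: Sequence[str],
) -> str:
    """Return a_i if every tester relation holds on x_i, else the fetched vsymbol."""
    if all(test(rel, x_i) for rel in rtester):
        return a_i                    # tester branch: continue with production B
    ip = fetch_vsymbol(rdriver, x_j)  # driver branch: continue with production A
    if ip is not None:
        return dp[ip]
    raise VisualSyntaxError("syntax error")


dp = ["STATE", "TRANSITION"]


def holds(rel: str, vsymbol: dict) -> bool:
    """Toy tester: a relation holds if the vsymbol lists it."""
    return rel in vsymbol["relations"]


def fetch(relation: str, expected: str) -> Optional[int]:
    """Toy fetch: relation "1_1" reaches any vsymbol with the expected name."""
    for index, name in enumerate(dp):
        if relation == "1_1" and name == expected:
            return index
    return None


x_i = {"name": "STATE", "relations": set()}
print(resolve_tr_dr(x_i, ["contains"], holds, "a_i", "1_1", "TRANSITION", fetch, dp))  # prints TRANSITION
```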

References

[1] K. Marriott, B. Meyer, Visual Language Theory, Springer, New York, 1998.

[2] S.C. Johnson, YACC: Yet Another Compiler Compiler, Bell Laboratories, Murray Hill, NJ, 1978.

[3] G. Costagliola, V. Deufemia, G. Polese, A framework for modeling visual notations with applications to software engineering, ACM Transactions on Software Engineering and Methodology 13 (4) (2004) 431–487.

[4] G. Costagliola, A. De Lucia, S. Orefice, G. Tortora, A parsing methodology for the implementation of visual systems, IEEE Transactions on Software Engineering 23 (12) (1997) 777–799.

[5] C. Donnelly, R. Stallman, Bison: the YACC-compatible parser generator, http://www.combo.org/bison/, 1995.

[6] A.V. Aho, R. Sethi, J.D. Ullman, Compilers: Principles, Techniques, and Tools, Addison-Wesley Series in Computer Science, 1987.

[7] D. Harel, On visual formalisms, Communications of the ACM 31 (5) (1988) 514–530.

[8] J. Feder, Plex languages, Information Sciences 3 (1971) 225–241.

[9] D. Rubine, Specifying gestures by example, Computer Graphics 25 (1991) 329–337.

[10] G. Costagliola, V. Deufemia, F. Ferrucci, C. Gravino, On the pLR parsability of visual languages, in: Proceedings of IEEE International Symposium on Human-Centric Computing Languages and Environments (HCC’01), IEEE Computer Society Press, Stresa, Italy, September 2001, pp. 49–50.

[11] R. Heckel, J. Küster, G. Taentzer, Confluence of typed attributed graph transformation systems, in: A. Corradini, H. Ehrig, H.-J. Kreowski, G. Rozenberg (Eds.), Proceedings of First International Conference on Graph Transformation, Barcelona, Spain, Lecture Notes in Computer Science, vol. 2505, Springer, Berlin, 2002, pp. 161–176.

[12] G. Costagliola, S. Orefice, G. Polese, G. Tortora, M. Tucci, Automatic parser generation for pictorial languages, in: Proceedings of IEEE Symposium on Visual Languages (VL’93), IEEE Computer Society Press, Bergen, Norway, 1993, pp. 306–313.

[13] K. Wittenburg, Earley-style parsing for relational grammars, in: Proceedings of Eighth IEEE International Workshop on Visual Languages, IEEE Computer Society Press, Seattle, WA, USA, 1992, pp. 192–199.

[14] K. Wittenburg, L. Weitzman, Relational grammars: theory and practice in a visual language interface for process modeling, in: K. Marriott, B. Meyer (Eds.), Visual Language Theory, Springer, New York, 1998, pp. 193–217.

[15] K. Marriott, Constraint multiset grammars, in: Proceedings of 10th IEEE Symposium on Visual Languages, IEEE Computer Society Press, St. Louis, Missouri, 1994, pp. 118–125.


[16] E.J. Golin, Parsing visual languages with picture layout grammars, Journal of Visual Languages and Computing 2 (4) (1991) 371–394.

[17] F. Ferrucci, G. Pacini, G. Satta, M. Sessa, G. Tortora, M. Tucci, G. Vitiello, Symbol-relation grammars: a formalism for graphical languages, Information and Computation 131 (1) (1996) 1–46.

[18] M. Minas, Concepts and realization of a diagram editor generator based on hypergraph transformation, Science of Computer Programming 44 (2) (2002) 157–180.

[19] J. Rekers, A. Schürr, A graph based framework for the implementation of visual environments, in: Proceedings of 12th IEEE International Symposium on Visual Languages, IEEE Computer Society Press, Boulder, Colorado, 1996, pp. 148–157.

[20] J. Rekers, A. Schürr, Defining and parsing visual languages with layered graph grammars, Journal of Visual Languages and Computing 8 (1) (1997) 27–55.

[21] R.V. Rubin, J. Walker II, E.J. Golin, Early experience with the visual programmer’s workbench, IEEE Transactions on Software Engineering 16 (10) (1990) 1107–1121.

[22] S. Chok, K. Marriott, Automatic generation of intelligent diagram editors, ACM Transactions on Computer-Human Interaction 10 (3) (2003) 244–276.

[23] F. Ferrucci, G. Tortora, M. Tucci, G. Vitiello, A system for rapid prototyping of visual languages, in: Proceedings of IEEE International Symposium on Human-Centric Computing Languages and Environments (HCC’01), IEEE Computer Society Press, Stresa, Italy, 2001, pp. 382–389.

[24] K. Zhang, D.Q. Zhang, J. Cao, Design, construction, and application of a generic visual language generation environment, IEEE Transactions on Software Engineering 27 (4) (2001) 289–307.

[25] P. Bottoni, M.F. Costabile, P. Mussio, Specification and dialogue control of visual interaction through visual rewriting systems, ACM Transactions on Programming Languages and Systems 21 (6) (1999) 1077–1136.

[26] Sandstone Technology Inc., Parsing with Sandstone’s Visual Parse++, Technical Report, http://www.sand-stone.com, 2001.

[27] G. Costagliola, V. Deufemia, G. Polese, M. Risi, Building syntax-aware editors for visual languages, Journal of Visual Languages and Computing 16 (6) (2005) 508–540.
