Programming errors in traversal programs over structured data *

Ralf Lämmel¹ and Simon Thompson² and Markus Kaiser¹

¹ University of Koblenz-Landau, Germany
² University of Kent, UK

Abstract

Traversal strategies à la Stratego (also à la Strafunski and ‘Scrap Your Boilerplate’) provide an exceptionally versatile and uniform means of querying and transforming deeply nested and heterogeneously structured data including terms in functional programming and rewriting, objects in OO programming, and XML documents in XML programming.

However, the resulting traversal programs are prone to programming errors. We are specifically concerned with errors that go beyond conservative type errors; examples we examine include divergent traversals, prematurely terminated traversals, and traversals with dead code.

Based on an inventory of possible programming errors we explore options of static typing and static analysis so that some categories of errors can be avoided. This exploration generates suggestions for improvements to strategy libraries as well as their underlying programming languages. Haskell is used for illustrations and specifications with sufficient explanations to make the presentation comprehensible to the non-specialist. The overall ideas are language-agnostic and they are summarized accordingly.

Keywords: Traversal Strategies, Traversal Programming, Term Rewriting, Stratego, Strafunski, Generic Programming, Scrap Your Boilerplate, Type systems, Static Program Analysis, Functional Programming, XSLT, Haskell.

* The paper and accompanying source code are available online:
• http://userpages.uni-koblenz.de/~laemmel/syb42
• http://code.google.com/p/strafunski/

Preprint submitted to Science of Computer Programming (Version as of 13 February 2012)


Illustrative queries

The abstract syntax of a programming language as a data model

(1) Determine all recursively defined functions.
(2) Determine the nesting depth of a function definition.
(3) Collect all the free variables in a given code fragment.

The organizational structure of a company as a data model

(1) Total the salaries of all employees.
(2) Total the salaries of all employees who are not managers.
(3) Calculate the manager / employee ratio in the leaf departments.

Illustrative transformations

The abstract syntax of a programming language as a data model

(1) Inject logging code around function applications.
(2) Perform partial evaluation by constant propagation.
(3) Perform unfolding or inlining for a specific function.

The organizational structure of a company as a data model

(1) Increase the salaries of all employees.
(2) Decrease the salaries of all non-top-level managers.
(3) Integrate a specific department into the hosting department.

Fig. 1. Illustrative scenarios for traversal programming.

1 Introduction

Traversal programming

In the context of data programming with XML trees, object graphs, and terms in rewriting or functional programming, consider the general scenarios of querying and transforming deeply nested and heterogeneously structured data. Because of deep nesting and heterogeneity as well as plain structural complexity, data programming may benefit from designated idioms and concepts. In this paper, we focus on the notion of traversal programming where functionality is described in terms of so-called traversal strategies [70,41,3]: we are specifically concerned with programming errors in this context.

Let us briefly indicate some application domains for such traversal programming. To this end, consider the illustrative scenarios for querying and transforming data as shown in Figure 1. The listed queries and transformations benefit from designated support for traversal programming. The scenarios assume data models for a) the abstract syntax of a programming language and b) the organizational structure of a company within an information system.


AST example: Abstract syntax of a simple functional language

    type Block = [Function]
    data Function = Function Name [Name] Expr Block
    data Expr = Literal Int
              | Var Name
              | Lambda Name Expr
              | Binary Ops Expr Expr
              | IfThenElse Expr Expr Expr
              | Apply Name [Expr]
    data Ops = Equal | Plus | Minus
    type Name = String

Company example: Organizational structure of a company

    data Company = Company [Department]
    data Department = Department Name Manager [Unit]
    data Manager = Manager Employee
    data Unit = EmployeeUnit Employee
              | DepartmentUnit Department
    data Employee = Employee Name Salary
    type Name = String    -- names of employees and departments
    type Salary = Float   -- salaries of employees

Fig. 2. Illustrative data models (rendered in Haskell). Queries and transformations on such data may require traversal programming, which is possibly prone to programming errors. The data models involve multiple types; recursion is exercised; there is also a type with multiple alternatives. As a result, traversal programmers need to carefully control queries and transformations, leaving room for programming errors. Note on language agnosticism: Haskell’s algebraic data types are chosen here without loss of generality. Other data-modeling notations such as XML schemas or class diagrams are equally applicable.

Suitable data models are sketched in Figure 2 in Haskell syntax. 1

1 Note on Haskell: type definitions are type synonyms. For instance, Name is defined to be a synonym for Haskell’s String data type. In contrast, data types define new algebraic types with one or more constructors (say, cases). Consider the data type Expr; it declares a number of constructors for different expression forms. For instance, the constructor Var models variables; there is a constructor component of type Name (in fact, String) and hence the constructor function’s type is Name -> Expr.


A querying traversal essentially visits a compound term, potentially layer by layer, to extract data from subterms of interest. For instance, salary terms are extracted from a company term when dealing with the scenarios of totaling salaries of all or some employees in Figure 1.

A transforming traversal essentially copies a compound term, potentially layer by layer, except that subterms of interest may be replaced. For instance, function applications are enriched by logging code when dealing with the corresponding program transformation scenario of Figure 1.

Traversal programming with traversal strategies

In this paper, we are concerned with one particular approach to traversal programming: the approach of traversal strategies, which relies on so-called one-layer traversal combinators as well as other, less original combinators for controlling and composing recursive traversal. Traversal strategies support an exceptionally versatile and uniform means of traversal. The term strategic programming is also in use for this approach to traversal programming [41]. For example, the first scenario for the company example requires a traversal strategy as follows:

• A basic (non-traversing) rule is needed to extract salary from an employee.
• The rule is to be iterated over a term by a traversal scheme.
• The traversal scheme itself is defined in terms of basic combinators.
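The three ingredients listed above can be sketched in Haskell using only the `Data.Data` module from `base`. This is a minimal sketch under stated assumptions, not the API of any particular strategy library: the helper names `mkQ'`, `everythingQ`, `salaryOf`, `totalSalaries`, and the sample value `sampleCompany` are introduced here for illustration only (they are modeled loosely after SYB's `mkQ` and `everything`).

```haskell
{-# LANGUAGE DeriveDataTypeable, RankNTypes #-}
import Data.Data (Data, Typeable, cast, gmapQ)

-- Company data model as in Figure 2, plus Data instances for generic traversal.
type Name   = String
type Salary = Float
data Company    = Company [Department]           deriving (Show, Data)
data Department = Department Name Manager [Unit] deriving (Show, Data)
data Manager    = Manager Employee               deriving (Show, Data)
data Unit       = EmployeeUnit Employee
                | DepartmentUnit Department      deriving (Show, Data)
data Employee   = Employee Name Salary           deriving (Show, Data)

-- Basic (non-traversing) rule: extract the salary from an employee.
salaryOf :: Employee -> Salary
salaryOf (Employee _ s) = s

-- Lift a type-specific rule to all types; other types yield the default.
-- (Hypothetical helper, named after SYB's mkQ.)
mkQ' :: (Typeable a, Typeable b) => r -> (b -> r) -> a -> r
mkQ' def rule x = maybe def rule (cast x)

-- Traversal scheme: apply the generic query q at every node and combine
-- all results with k. It is defined via the one-layer combinator gmapQ.
everythingQ :: Data a => (r -> r -> r) -> (forall d. Data d => d -> r) -> a -> r
everythingQ k q x = foldl k (q x) (gmapQ (everythingQ k q) x)

-- The rule iterated over a term by the traversal scheme.
totalSalaries :: Company -> Salary
totalSalaries = everythingQ (+) (mkQ' 0 salaryOf)

-- Hypothetical sample data for experimentation.
sampleCompany :: Company
sampleCompany = Company
  [ Department "Research" (Manager (Employee "Ann" 3000))
      [ EmployeeUnit (Employee "Bob" 1000)
      , DepartmentUnit
          (Department "Dev" (Manager (Employee "Cy" 2000))
             [EmployeeUnit (Employee "Di" 500)])
      ]
  ]
```

Loading this into GHCi and evaluating `totalSalaries sampleCompany` sums the salaries of all four employees, managers included.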

The notion of traversal strategies was pioneered by Eelco Visser and collaborators [48,71,70] in the broader context of term rewriting. This seminal work also led to Visser et al.’s Stratego/XT [69,10], a domain-specific language and, in fact, an infrastructure for software transformation. Other researchers in the term rewriting and software transformation communities have also developed related forms of traversal strategies; see, e.g., [9,7,67,79].

Historically, strategic programming has been researched considerably in the context of software transformation. We refer the reader to [70,68,34,44,46,78] for some published accounts of actual applications of traversal programming with traversal strategies.

The Stratego approach inspired strategic programming approaches for other programming paradigms [43,41,72,3,4]. An early functional approach, known by the name ‘Strafunski’, inspired the ‘Scrap your boilerplate’ (SYB) form of generic functional programming [38,39,40,59,26,27], which is relatively established today in the Haskell community. SYB, in turn, inspired cross-paradigm variations, e.g., ‘Scrap your boilerplate in C++’ [53]. Traversal strategies à la Stratego were also amalgamated with attribute grammars [31]. Another similar form of traversal strategies is that of the rewriting-based approach of HATS [79,76,77].

There are also several independently developed forms of traversal programming: TXL’s strategies [11], adaptive traversal specifications [47,42,1], Cω’s data access [5], XPath-like queries and XSLT transformations [36,15] as well as diverse (non-SYB-like) forms of generic functional programming [28,24,60].

Research topic: programming errors in traversal programming

Despite the advances in foundations, programming support and applications of traversal programming with strategies, the use and the definition of programmable traversal strategies have remained the domain of the expert, rather than gaining wider usage. We contend that the principal obstacle to wider adoption is the severity of some possible pitfalls, which make it difficult to use strategies in practice. Some of the programming errors that arise are familiar, e.g., type errors in rewrite rules, but other errors are of a novel nature. Their appearance can be off-putting to the newcomer to the field, and can indeed limit the productivity of more experienced strategists.

Regardless of the specifics of an approach to traversal programming, the potential for programming errors is relatively obvious. Here are two possible errors that may arise for the traversal scenarios of Figure 1:

A programming error implying an incorrect result. In one of the scenarios for the company example, a query is assumed to total the salaries of all employees who are not managers. One approach is to use two type-specific cases in the traversal such that managers contribute a subtotal of zero whereas regular employees contribute with their actual salary. The query should combine the type-specific cases in a traversal that ceases when either of the two cases is applicable. A traversal could be incorrect in that it continues even upon successful application of either case. As a result, managers are not effectively excluded from the total because the traversal would eventually encounter the employee term inside each manager term.

A programming error implying divergence. In one of the scenarios for the AST example, a transformation is assumed to unfold or inline a specific function. This problem involves two traversals: one to find the definition of the function, another one to actually affect references to the function. Various programming errors are conceivable here. For instance, the latter traversal may end up unfolding recursive function definitions indefinitely. Instead, the transformation should not attempt unfolding for any subterm that was created by unfolding itself. This can be achieved by bottom-up traversal order.
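The first of the two errors above (the incorrect total for non-managers) can be reproduced in a few lines of Haskell. The following is a minimal sketch that assumes the company data model of Figure 2 and uses only `Data.Data` from `base`; the names `cases`, `wrongTotal`, `rightTotal`, and `sampleCompany` are hypothetical and chosen for this illustration.

```haskell
{-# LANGUAGE DeriveDataTypeable #-}
import Data.Data (Data, cast, gmapQ)

-- Company data model as in Figure 2, plus Data instances.
type Name   = String
type Salary = Float
data Company    = Company [Department]           deriving (Show, Data)
data Department = Department Name Manager [Unit] deriving (Show, Data)
data Manager    = Manager Employee               deriving (Show, Data)
data Unit       = EmployeeUnit Employee
                | DepartmentUnit Department      deriving (Show, Data)
data Employee   = Employee Name Salary           deriving (Show, Data)

-- The two type-specific cases: a manager contributes zero, any other
-- employee contributes the salary. Nothing means "no case applies here".
cases :: Data d => d -> Maybe Salary
cases x = case cast x of
  Just (Manager _) -> Just 0
  Nothing -> case cast x of
    Just (Employee _ s) -> Just s
    Nothing             -> Nothing

-- Incorrect: the traversal continues below a node even if a case applied,
-- so the Employee inside each Manager is still counted.
wrongTotal :: Data a => a -> Salary
wrongTotal x = foldl (+) (maybe 0 id (cases x)) (gmapQ wrongTotal x)

-- Intended: the traversal ceases as soon as either case applies.
rightTotal :: Data a => a -> Salary
rightTotal x = case cases x of
  Just s  -> s
  Nothing -> sum (gmapQ rightTotal x)

-- Hypothetical sample data: two managers (Ann, Cy), two regular employees.
sampleCompany :: Company
sampleCompany = Company
  [ Department "Research" (Manager (Employee "Ann" 3000))
      [ EmployeeUnit (Employee "Bob" 1000)
      , DepartmentUnit
          (Department "Dev" (Manager (Employee "Cy" 2000))
             [EmployeeUnit (Employee "Di" 500)])
      ]
  ]
```

On `sampleCompany`, `wrongTotal` still includes the managers' salaries, whereas `rightTotal` counts only Bob and Di. Both functions type-check equally well, which is why this kind of error goes beyond conventional type errors.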


Research objective: advice on library and language improvements

We envisage that the traversal programming of the future will be easier and safer because it is better understood and better supported. Strategy libraries and the underlying programming languages need to be improved in such a way that programming errors in the sense of incorrect strategy application and composition become less likely. Our research objective is to provide advice on such improvements.

We contend that static typing and static program analysis are key tools in avoiding programming errors by detecting favorable or unfavorable behaviors of strategies statically, that is, without running the program. Advanced types are helpful in improving libraries so that their abstractions describe more accurately the contracts for usage. Static program analysis is helpful in verifying correctness of strategy composition.

Accordingly, in this paper, we identify pitfalls of strategic programming and discover related properties of basic strategy combinators and common library combinators. Further, we research the potential of static typing and static program analysis with regard to our research objective of providing advice on improved libraries and languages for strategic programming.

Summary of the paper’s contributions 2

(1) We provide a fine-grained inventory of programming errors in traversal programming with traversal strategies; see §3. To this end, we reflect on the use of basic strategy combinators and library abstractions in the solution of traversal problems.

(2) We explore the utility of static typing as a means of taming traversal strategies; see §4. This exploration clarifies what categories of programming errors can be addressed, to some extent, by available advanced static type systems. We use examples written in Haskell 2010, with extensions, for illustrations.

(3) We explore the utility of static program analysis as a means of taming traversal strategies; see §5. Static analysis is an established technique for dealing with correctness and performance properties of software. We study success and failure behavior, dead code, and termination.

2 Note on the relationship to previous work by the authors: a short version of this paper appeared in the proceedings of the 8th Workshop on Language Descriptions, Tools and Applications (LDTA 2008), published by ENTCS. The present paper is generally more detailed and updated, but its distinctive contribution is research on static program analysis for traversal strategies in §5; only one of the analyses was sketched briefly in the short version. The present paper also takes advantage of other previously published work by two of the present authors insofar as some of the strategy properties discovered in [30] help to steer the research on programming errors in the present paper. Programming errors are not systematically discussed in [30]. Also, type systems and static program analysis played no substantial role in said work. Instead, the focus was on demonstrating the potential of theorem proving in the context of traversal strategies.

Haskell en route

In this paper, we use Haskell for two major purposes: a) illustrations of static typing techniques for taming strategies; b) specifications of algorithms for static program analysis. All ideas are also explained informally and specific Haskell idioms are explained as they are encountered, so that the presentation should also be comprehensible to the non-specialist. The overall ideas of taming strategies by static typing and static program analysis are not tied to Haskell; they are largely agnostic to the language, as long as it supports a strategic programming style. Accordingly, all subsections of §4 and §5 begin with language-agnostic, informal advice for improving strategy libraries and the underlying languages.

Scope of this work

Our work ties into traversal strategies à la Stratego (also à la Strafunski and ‘Scrap Your Boilerplate’) in that it intimately connects to idiomatic and conceptual details of this class of approaches: the use of one-layer traversal combinators, the style of mixing generic and problem-specific behavior, the style of controlling traversal through success and failure, and a preference for certain important traversal schemes.

Nevertheless, our research approach and some of the results may be applicable to other forms of traversal programming, e.g., adaptive programming [56,47,42], (classic) generic functional programming [28,24,60], XML queries or transformations (based on mainstream languages). However, we do not explore such potential. As a brief indication of potential relevance, let us consider XSLT [75] with its declarative processing model for XML that can be used to implement transformations between XML-based documents. An XSLT transformation consists of a collection of template rules, each of which describes which XML (sub-)trees to match and how to process them. Processing begins at the root node and proceeds by applying the best fitting template pattern, and recursively processing according to the template. Hence, traversal is implicit in the XSLT processing model, but the templates control the traversal. Several problems that we study in the present paper, such as dead code or divergence, can occur in XSLT and require similar analyses.

Road map of the paper

• §2 provides background on strategic programming.
• §3 makes an inventory of programming errors in strategic programming.
• §4 researches the potential of static typing for taming traversal strategies.
• §5 researches the potential of static analysis for taming traversal strategies.
• §6 discusses related work.
• §7 concludes the paper.

\[
\begin{array}{rcll}
s & ::= & t \to t & \text{(Rewrite rules as basic building blocks.)} \\
  & \mid & \mathit{id} & \text{(Identity transformation; succeeds and returns input.)} \\
  & \mid & \mathit{fail} & \text{(Failure transformation; fails and returns failure ``}\uparrow\text{''.)} \\
  & \mid & s; s & \text{(Left-to-right sequential composition.)} \\
  & \mid & s \mathbin{\leftarrow\!\!+} s & \text{(Left-biased choice; try left argument first.)} \\
  & \mid & \Box(s) & \text{(Transform all immediate subterms by } s \text{; maintain constructor.)} \\
  & \mid & \Diamond(s) & \text{(Transform one immediate subterm by } s \text{; maintain constructor.)} \\
  & \mid & v & \text{(A variable strategy subject to binding or substitution.)} \\
  & \mid & \mu v.s & \text{(Recursive closure of } s \text{ referring to itself as } v \text{.)}
\end{array}
\]

Fig. 3. Syntax of strategy primitives for transformations.

2 Background on strategic programming

This section introduces the notion of traversal strategy and the style of strategic programming. We integrate basic material that is otherwise scattered over several publications. The collection is also valuable insofar as we make an effort to point out properties of strategies that will be useful eventually in the discussion of programming errors.

We describe traversal strategies in three different styles. First, we use a formal style based on a grammar and a natural semantics. Second, we use an interpreter style such that the semantics is essentially implemented in Haskell. Third, we embed strategies into Haskell; this is the style that is useful for actual strategic programming (in Haskell) as opposed to formal investigation.

2.1 A core calculus of transformations

Figure 3 shows the syntax of a core calculus for transformations. Later we will also cover queries. 3 Transformations are sufficient to introduce strategies in a formal manner; we are confident that our approach easily extends to queries. The core calculus of Figure 3 follows closely Visser et al.’s seminal work [70].

The calculus contains basic strategies as well as strategy combinators. The basic strategy $\mathit{id}$ denotes the always succeeding identity transformation, which can be thought of as the identity function. The basic strategy $\mathit{fail}$ denotes the always failing transformation. Here we note that strategies can either fail, succeed, or diverge. (We use the symbol “$\uparrow$” to denote failure in the upcoming semantics.) There is also a basic strategy form for rewrite rules in the sense of term rewriting. All strategies, including rewrite rules, are applied at the root node of the input. Traversal into internal nodes of the tree is provided by designated combinators.

3 Note on terminology: some of the strategic programming literature prefers the terms ‘type-preserving strategies’ for transformations and ‘type-unifying strategies’ for queries. We do not follow this tradition here, in the interest of a simple terminology.

Turning to the combinators, the composed strategy $s; s'$ denotes the sequential composition of $s$ and $s'$. The composed strategy $s \mathbin{\leftarrow\!\!+} s'$ denotes left-biased choice: try $s$ first, and try $s'$ second, if $s$ failed. The characteristic combinators of strategic programming are $\Box$ and $\Diamond$ (also known as ‘all’ and ‘one’). These combinators model so-called one-layer traversal. The strategy $\Box(s)$ applies $s$ to all immediate subterms of a given term. If there is any subterm for which $s$ fails, then $\Box(s)$ fails, too. Otherwise, the result is the term that is obtained by the application of the original outermost constructor to the processed subterms. The strategy $\Diamond(s)$ applies $s$ only to one immediate subterm, namely the leftmost one, if any, for which $s$ succeeds. If $s$ fails for all subterms, then $\Diamond(s)$ fails, too. The one-layer traversal combinators, when used within a recursive closure, enable the key capability of strategic programming: to describe arbitrarily deep traversal into heterogeneously typed terms.

A comprehensive instantiation of strategic programming may require additional strategy primitives that we omit here. For instance, Stratego also provides strategy forms for congruences (i.e., the application of strategies to the immediate subterms of specific constructors), tests (i.e., a strategy is tested for success, but its result is not propagated), and negation (i.e., a strategy to invert the success and failure behavior of a given strategy) [70]. As motivated earlier, we also omit specific primitives for queries.

Following [70] and subsequent work on the formalization of traversal strategies [35,30], we give the formal semantics of the core calculus as a big-step operational semantics using a success and failure model, shown in Figure 4 and Figure 5. (The version shown has been extracted from a mechanized model of traversal strategies, based on the theorem prover Isabelle/HOL [30].)

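The judgement $s \mathbin{@} t \leadsto r$ describes a relation between a strategy expression $s$, an input term $t$, and a result $r$, which is either a proper term or failure, denoted by ‘$\uparrow$’. Based on tradition, we separate positive rules (resulting in a proper term) and negative rules (resulting in ‘$\uparrow$’). Incidentally, this distinction already helps in understanding the success and failure behavior of strategies. For instance, the positive rule $[\mathit{all}^{+}]$ models that the strategy application $\Box(s) \mathbin{@} c(t_1, \ldots, t_n)$ applies $s$ to all the $t_i$ such that new terms $t'_i$ are obtained and used in the result term $c(t'_1, \ldots, t'_n)$. The negative rule $[\mathit{all}^{-}]$ covers the case that at least one of the applications $s \mathbin{@} t_i$ resulted in failure.

\[
\begin{array}{c}
\dfrac{\exists\theta.\ (\theta(t_l) = t \;\wedge\; \theta(t_r) = t')}
      {t_l \to t_r \mathbin{@} t \leadsto t'}\ [\mathit{rule}^{+}]
\qquad
\dfrac{}{\mathit{id} \mathbin{@} t \leadsto t}\ [\mathit{id}^{+}]
\\[2ex]
\dfrac{s_1 \mathbin{@} t \leadsto t' \;\wedge\; s_2 \mathbin{@} t' \leadsto t''}
      {s_1; s_2 \mathbin{@} t \leadsto t''}\ [\mathit{sequ}^{+}]
\\[2ex]
\dfrac{s_1 \mathbin{@} t \leadsto t'}
      {s_1 \mathbin{\leftarrow\!\!+} s_2 \mathbin{@} t \leadsto t'}\ [\mathit{choice}^{+}.1]
\qquad
\dfrac{s_1 \mathbin{@} t \leadsto \uparrow \;\wedge\; s_2 \mathbin{@} t \leadsto t'}
      {s_1 \mathbin{\leftarrow\!\!+} s_2 \mathbin{@} t \leadsto t'}\ [\mathit{choice}^{+}.2]
\\[2ex]
\dfrac{\forall i \in \{1, \ldots, n\}.\ s \mathbin{@} t_i \leadsto t'_i}
      {\Box(s) \mathbin{@} c(t_1, \ldots, t_n) \leadsto c(t'_1, \ldots, t'_n)}\ [\mathit{all}^{+}]
\\[2ex]
\dfrac{\exists i \in \{1, \ldots, n\}.\ \left(
         \begin{array}{l}
           s \mathbin{@} t_i \leadsto t'_i \\
           \wedge\ \forall i' \in \{1, \ldots, i-1\}.\ s \mathbin{@} t_{i'} \leadsto \uparrow \\
           \wedge\ \forall i' \in \{1, \ldots, i-1, i+1, \ldots, n\}.\ t_{i'} = t'_{i'}
         \end{array}\right)}
      {\Diamond(s) \mathbin{@} c(t_1, \ldots, t_n) \leadsto c(t'_1, \ldots, t'_n)}\ [\mathit{one}^{+}]
\\[2ex]
\dfrac{s[v \mapsto \mu v.s] \mathbin{@} t \leadsto t'}
      {\mu v.s \mathbin{@} t \leadsto t'}\ [\mathit{rec}^{+}]
\end{array}
\]

Fig. 4. Positive rules of natural semantics for the core calculus.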

The semantics shown uses variables only for the sake of recursive closures, and the semantics of these is modelled by substitution. We could also furnish variables for the sake of parametrized strategy expressions, i.e., binding blocks of strategy definitions, as this is relevant for reusability in practice, but we do not furnish this elaboration of the semantics here for brevity’s sake.

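The semantics commits to a simple, deterministic model: each strategy application evaluates to one term deterministically, or it fails, or it diverges. Further, the semantics of the choice combinator is left-biased with only local backtracking. Non-deterministic semantics have been considered elsewhere such that results may be lists (or sets) of terms with the empty list (or set) representing failure [6,7]. Non-determinism is also enabled naturally by a monadic-style functional programming embedding when using the list monad.

\[
\begin{array}{c}
\dfrac{\nexists\theta.\ \theta(t_l) = t}
      {t_l \to t_r \mathbin{@} t \leadsto \uparrow}\ [\mathit{rule}^{-}]
\qquad
\dfrac{}{\mathit{fail} \mathbin{@} t \leadsto \uparrow}\ [\mathit{fail}^{-}]
\\[2ex]
\dfrac{s_1 \mathbin{@} t \leadsto \uparrow}
      {s_1; s_2 \mathbin{@} t \leadsto \uparrow}\ [\mathit{seq}^{-}.1]
\qquad
\dfrac{s_1 \mathbin{@} t \leadsto t' \;\wedge\; s_2 \mathbin{@} t' \leadsto \uparrow}
      {s_1; s_2 \mathbin{@} t \leadsto \uparrow}\ [\mathit{seq}^{-}.2]
\\[2ex]
\dfrac{s_1 \mathbin{@} t \leadsto \uparrow \;\wedge\; s_2 \mathbin{@} t \leadsto \uparrow}
      {s_1 \mathbin{\leftarrow\!\!+} s_2 \mathbin{@} t \leadsto \uparrow}\ [\mathit{choice}^{-}]
\\[2ex]
\dfrac{\exists i \in \{1, \ldots, n\}.\ s \mathbin{@} t_i \leadsto \uparrow}
      {\Box(s) \mathbin{@} c(t_1, \ldots, t_n) \leadsto \uparrow}\ [\mathit{all}^{-}]
\qquad
\dfrac{\forall i \in \{1, \ldots, n\}.\ s \mathbin{@} t_i \leadsto \uparrow}
      {\Diamond(s) \mathbin{@} c(t_1, \ldots, t_n) \leadsto \uparrow}\ [\mathit{one}^{-}]
\\[2ex]
\dfrac{s[v \mapsto \mu v.s] \mathbin{@} t \leadsto \uparrow}
      {\mu v.s \mathbin{@} t \leadsto \uparrow}\ [\mathit{rec}^{-}]
\end{array}
\]

Fig. 5. Negative rules of natural semantics for the core calculus.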
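The positive and negative rules of Figures 4 and 5 can be read off as a small interpreter. The following is a minimal sketch and not necessarily the interpreter used elsewhere in the paper. It makes several simplifying assumptions: terms are untyped trees (`Term`), a rewrite rule is modeled as a partial Haskell function rather than a pair of patterns with a substitution, failure ‘↑’ is modeled by `Nothing`, recursive closures are modeled by Haskell functions instead of variables and substitution, and divergence shows up as non-termination of `apply`. The names `Term`, `Strategy`, `apply`, `try`, `fullBu`, `renameAtoB`, and `sample` are hypothetical.

```haskell
-- Untyped terms: a constructor name applied to immediate subterms.
data Term = Term String [Term] deriving (Eq, Show)

-- Strategy expressions in the style of the core calculus.
data Strategy
  = Rule (Term -> Maybe Term)   -- rewrite rule as a partial function (assumption)
  | Id
  | Fail
  | Seq Strategy Strategy
  | Choice Strategy Strategy
  | All Strategy
  | One Strategy
  | Rec (Strategy -> Strategy)  -- recursive closure; the argument plays the role of v

-- Apply a strategy at the root of a term; Nothing models failure.
apply :: Strategy -> Term -> Maybe Term
apply (Rule f)       t = f t
apply Id             t = Just t
apply Fail           _ = Nothing
apply (Seq s1 s2)    t = apply s1 t >>= apply s2
apply (Choice s1 s2) t = maybe (apply s2 t) Just (apply s1 t)  -- left-biased
apply (All s) (Term c ts) = Term c <$> mapM (apply s) ts       -- fails if any child fails
apply (One s) (Term c ts) = Term c <$> go ts                   -- leftmost success wins
  where
    go []       = Nothing
    go (x : xs) = case apply s x of
      Just x' -> Just (x' : xs)
      Nothing -> (x :) <$> go xs
apply (Rec f)        t = apply (f (Rec f)) t                   -- unfold the closure once

-- Two derived strategies for experimentation.
try :: Strategy -> Strategy
try s = Choice s Id

fullBu :: Strategy -> Strategy
fullBu s = Rec (\v -> Seq (All v) s)

-- Example rule: rewrite the constant a to the constant b.
renameAtoB :: Strategy
renameAtoB = Rule r
  where
    r (Term "a" []) = Just (Term "b" [])
    r _             = Nothing

sample :: Term
sample = Term "f" [Term "a" [], Term "c" []]
```

For instance, `apply (All renameAtoB) sample` fails because the rule does not apply to the second subterm, whereas `apply (One renameAtoB) sample` and `apply (fullBu (try renameAtoB)) sample` both succeed and rewrite the first subterm.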

2.2 Traversal schemes

Assuming an ad hoc notation for parametrized strategy definition (think, for example, of macro expansion), some familiar traversal schemes and necessary helpers are defined in Figure 6. We refer to [33,59] for a more detailed discussion of the design space for traversal schemes.

The traversal scheme $\mathit{stop\_bu}(s)$ from Figure 6 is in fact bogus; it is only included here to provide an early, concrete example of a conceivable programming error. That is, the argument $s$ will never be applied. Instead, any application of the scheme will simply perform a deep identity traversal. This property can be proven with relatively little effort by induction on the structure of terms, also using the auxiliary property that $\Box(s)$ is the identity for all constant terms, i.e., terms without any subterms, no matter what $s$ is. A library author may notice the problem eventually. A ‘regular’ strategic programmer may make similar programming errors when developing problem-specific traversals.

\[
\begin{array}{lcll}
\mathit{full\_td}(s) & = & \mu v.\, s; \Box(v) & \text{Apply } s \text{ to each subterm in top-down manner} \\
\mathit{full\_bu}(s) & = & \mu v.\, \Box(v); s & \text{Apply } s \text{ to each subterm in bottom-up manner} \\
\mathit{once\_td}(s) & = & \mu v.\, s \mathbin{\leftarrow\!\!+} \Diamond(v) & \text{Find one subterm (top-down) for which } s \text{ succeeds} \\
\mathit{once\_bu}(s) & = & \mu v.\, \Diamond(v) \mathbin{\leftarrow\!\!+} s & \text{Find one subterm (bottom-up) for which } s \text{ succeeds} \\
\mathit{stop\_td}(s) & = & \mu v.\, s \mathbin{\leftarrow\!\!+} \Box(v) & \text{Stops when } s \text{ succeeds on a ‘cut’ through the tree} \\
\mathit{stop\_bu}(s) & = & \mu v.\, \Box(v) \mathbin{\leftarrow\!\!+} s & \text{An illustrative programming error; see the text.} \\
\mathit{innermost}(s) & = & \mathit{repeat}(\mathit{once\_bu}(s)) & \text{A form of innermost normalization à la term rewriting.} \\
\mathit{repeat}(s) & = & \mu v.\, \mathit{try}(s; v) & \text{Fixed point iteration; apply } s \text{ until it fails.} \\
\mathit{try}(s) & = & s \mathbin{\leftarrow\!\!+} \mathit{id} & \text{Recovery from failure of } s \text{ with catch-all } \mathit{id}\text{.}
\end{array}
\]

Fig. 6. Familiar traversal schemes.
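The difference between the well-behaved stop-top-down scheme and the bogus stop-bottom-up scheme can be observed directly. The following is a minimal sketch that assumes a simple embedding of strategies as Haskell functions of type `Term -> Maybe Term` over untyped terms; this embedding and the names `Term`, `Strategy`, `choice`, `allS`, `stopTd`, `stopBu`, `renameAtoB`, and `sample` are assumptions made for illustration.

```haskell
-- Untyped terms: a constructor name applied to immediate subterms.
data Term = Term String [Term] deriving (Eq, Show)

-- Strategies as partial functions; Nothing models failure.
type Strategy = Term -> Maybe Term

-- Left-biased choice.
choice :: Strategy -> Strategy -> Strategy
choice s1 s2 t = maybe (s2 t) Just (s1 t)

-- One-layer traversal: apply s to all immediate subterms.
allS :: Strategy -> Strategy
allS s (Term c ts) = Term c <$> mapM s ts

-- stop_td(s): try s first; descend only where s failed.
stopTd :: Strategy -> Strategy
stopTd s = s `choice` allS (stopTd s)

-- stop_bu(s): the bogus scheme; descend first, then try s.
stopBu :: Strategy -> Strategy
stopBu s = allS (stopBu s) `choice` s

-- Example rule: rewrite the constant a to the constant b.
renameAtoB :: Strategy
renameAtoB t
  | t == Term "a" [] = Just (Term "b" [])
  | otherwise        = Nothing

sample :: Term
sample = Term "f" [Term "a" [], Term "g" [Term "a" []]]
```

Evaluating `stopTd renameAtoB sample` rewrites both occurrences of the constant, whereas `stopBu renameAtoB sample` returns the input unchanged: the left operand of the choice succeeds at the leaves (where `allS` is the identity) and hence, by induction, at every node, so the right operand is never tried.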

2.3 Laws and properties

Figure 7 lists algebraic laws obeyed by the strategy primitives. These laws should be helpful in understanding the primitives and their use in traversal schemes. We refer to [30] for a mechanized model of traversal strategies, which proves these laws and additional properties. These laws provide intuitions with regard to the success and failure behavior of strategies, and they also hint at potential sources of dead code.

For instance, the fusion law states that two subsequent ‘$\Box$’ traversals can be composed into one. Such a simple law does not hold for ‘$\Diamond$’, neither does it hold generally for traversal schemes. In fact, little is known about algebraic laws for traversal schemes, but see [29,58] for some related research.

We also illustrate three non-laws at the foot of Figure 7, which show putative equalities. The first reflects the fact that sequential composition is (of course) not commutative, the second that choice is (indeed) left-biased, and the third that distributivity is limited. Finding a counterexample to the third non-law we leave as an exercise for the reader.
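Laws and non-laws of this kind can be explored experimentally before attempting a proof. The following is a minimal sketch that assumes an embedding of strategies as Haskell functions of type `Term -> Maybe Term`; all names (`Term`, `Strategy`, `idS`, `seqS`, `choice`, `rule`, `lhs`, `rhs`, `distL`, `distR`) are hypothetical. Readers who want to attempt the exercise themselves may skip the definitions of `lhs` and `rhs`, which encode one possible counterexample to the third non-law.

```haskell
-- Untyped terms: a constructor name applied to immediate subterms.
data Term = Term String [Term] deriving (Eq, Show)

-- Strategies as partial functions; Nothing models failure.
type Strategy = Term -> Maybe Term

idS :: Strategy
idS = Just

-- Sequential composition.
seqS :: Strategy -> Strategy -> Strategy
seqS s1 s2 t = s1 t >>= s2

-- Left-biased choice.
choice :: Strategy -> Strategy -> Strategy
choice s1 s2 t = maybe (s2 t) Just (s1 t)

-- A rewrite rule between two constants, e.g. rule "a" "b" for a -> b.
rule :: String -> String -> Strategy
rule from to t
  | t == Term from [] = Just (Term to [])
  | otherwise         = Nothing

-- Third non-law, instantiated with s1 = a -> b, s2 = a -> c, s3 = c -> d.
-- lhs commits to s1 and then fails in s3; rhs backtracks into s2.
lhs, rhs :: Strategy
lhs = (rule "a" "b" `choice` rule "a" "c") `seqS` rule "c" "d"
rhs = (rule "a" "b" `seqS` rule "c" "d") `choice` (rule "a" "c" `seqS` rule "c" "d")

-- Left distributivity, instantiated with s1 = a -> b, s2 = c -> d, s3 = id.
distL, distR :: Strategy
distL = rule "a" "b" `seqS` (rule "c" "d" `choice` idS)
distR = (rule "a" "b" `seqS` rule "c" "d") `choice` (rule "a" "b" `seqS` idS)
```

Applied to the constant `Term "a" []`, `lhs` fails whereas `rhs` succeeds, which separates the two sides of the third non-law; `distL` and `distR` agree on this input, as the left distributivity law predicts. Such spot checks can refute a putative law but, of course, cannot establish one.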

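Let us also discuss basic properties of success and failure behavior for strategies. According to [30], we say that a strategy $s$ is infallible if it does not possibly fail, i.e., for any given term, it either succeeds or diverges; otherwise $s$ is fallible. The following properties hold [30]:

\[
\begin{array}{lrcl}
\text{[unit of ``;'']} & \mathit{id}; s = s & = & s; \mathit{id} \\
\text{[zero of ``;'']} & \mathit{fail}; s = \mathit{fail} & = & s; \mathit{fail} \\
\text{[unit of ``}\leftarrow\!\!+\text{'']} & \mathit{fail} \mathbin{\leftarrow\!\!+} s = s & = & s \mathbin{\leftarrow\!\!+} \mathit{fail} \\
\text{[left zero of ``}\leftarrow\!\!+\text{'']} & \mathit{id} \mathbin{\leftarrow\!\!+} s & = & \mathit{id} \\
\text{[associativity of ``;'']} & s_1; (s_2; s_3) & = & (s_1; s_2); s_3 \\
\text{[associativity of ``}\leftarrow\!\!+\text{'']} & s_1 \mathbin{\leftarrow\!\!+} (s_2 \mathbin{\leftarrow\!\!+} s_3) & = & (s_1 \mathbin{\leftarrow\!\!+} s_2) \mathbin{\leftarrow\!\!+} s_3 \\
\text{[left distributivity]} & s_1; (s_2 \mathbin{\leftarrow\!\!+} s_3) & = & (s_1; s_2) \mathbin{\leftarrow\!\!+} (s_1; s_3) \\
\text{[one-layer identity]} & \Box(\mathit{id}) & = & \mathit{id} \\
\text{[one-layer failure]} & \Diamond(\mathit{fail}) & = & \mathit{fail} \\
\text{[fusion law]} & \Box(s_1); \Box(s_2) & = & \Box(s_1; s_2) \\
\text{[``}\Box\text{'' with a constant]} & \mathit{constant}(t) \Rightarrow \Box(s) \mathbin{@} t & = & t \\
\text{[``}\Diamond\text{'' with a constant]} & \mathit{constant}(t) \Rightarrow \Diamond(s) \mathbin{@} t & = & \uparrow \\
\text{[``}\Box\text{'' with a non-constant]} & \neg\mathit{constant}(t) \Rightarrow \Box(\mathit{fail}) \mathbin{@} t & = & \uparrow \\
\text{[``}\Diamond\text{'' with a non-constant]} & \neg\mathit{constant}(t) \Rightarrow \Diamond(\mathit{id}) \mathbin{@} t & = & t \\[1ex]
\text{[commutativity of ``;'']} & s; s' & \neq & s'; s \\
\text{[commutativity of ``}\leftarrow\!\!+\text{'']} & s \mathbin{\leftarrow\!\!+} s' & \neq & s' \mathbin{\leftarrow\!\!+} s \\
\text{[right distributivity]} & (s_1 \mathbin{\leftarrow\!\!+} s_2); s_3 & \neq & (s_1; s_3) \mathbin{\leftarrow\!\!+} (s_2; s_3)
\end{array}
\]

Fig. 7. Algebraic laws and non-laws of strategy primitives. In a few laws (in fact, implications) we use an auxiliary judgement $\mathit{constant}$ that holds for all constant terms, i.e., terms with 0 subterms.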

• If s is infallible, then full_td s and full_bu s are infallible.
• No matter the argument s, stop_td s and innermost s are infallible.
• No matter the argument s, once_td s and once_bu s are fallible.

These properties can be confirmed based on induction arguments. For instance, (the all-based branch of the choice in) a stop-top-down traversal succeeds eventually for ‘leaves’, i.e., terms without subterms, and it succeeds as well for every term for which the traversal succeeds for all immediate subterms. Hence, by induction over the depth of the term, the traversal succeeds universally.
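The success and failure behavior described above can be observed on a small example. The following sketch assumes a direct functional encoding of strategies (not the interpreter developed later in the paper); the names S, failS, choiceS, allS, oneS, stopTD, onceTD, and sample are hypothetical. It only illustrates the case of the always-failing argument.

```haskell
-- Terms with string constructors, as a stand-in for arbitrary terms.
data Term = Term String [Term] deriving (Eq, Show)

-- Strategies as partial functions on terms (an assumption of this sketch).
type S = Term -> Maybe Term

failS :: S
failS = const Nothing

-- Left-biased choice.
choiceS :: S -> S -> S
choiceS s s' t = maybe (s' t) Just (s t)

-- Apply s to all immediate subterms; fail if any application fails.
allS :: S -> S
allS s (Term c ts) = Term c <$> mapM s ts

-- Apply s to the leftmost immediate subterm for which it succeeds.
oneS :: S -> S
oneS s (Term c ts) = Term c <$> go ts
  where
    go []       = Nothing
    go (t : us) = case s t of
      Just t' -> Just (t' : us)
      Nothing -> (t :) <$> go us

stopTD, onceTD :: S -> S
stopTD s = s `choiceS` allS (stopTD s)
onceTD s = s `choiceS` oneS (onceTD s)

sample :: Term
sample = Term "f" [Term "a" [], Term "g" [Term "b" []]]
```

With the always-failing argument, stopTD failS sample succeeds and returns the term unchanged, because the all-based branch succeeds at the leaves, whereas onceTD failS sample fails, because no subterm admits a successful application.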

The discussion of termination behavior for traversal schemes is more complicated, but let us provide a few intuitions here. It is easy to see that full bottom-up traversal converges (as long as its argument strategy does not diverge) because the scheme essentially performs structural recursion on the input term. In contrast, full top-down traversal may diverge rather easily because the argument strategy could continuously increase the given term before traversing into it. This will be illustrated in §3. We will address termination by means of static program analysis in §5.

data T x
  = Id
  | Fail
  | Seq (T x) (T x)
  | Choice (T x) (T x)
  | Var x
  | Rec (x -> T x)
  | All (T x)
  | One (T x)

Fig. 8. Syntax according to Figure 3 in Haskell.

2.4 A Haskell-based interpreter for strategies

Let us also provide an interpreter-based model of the core calculus in Haskell. In fact, we provide this model in a way that we can easily refine it later on for the purpose of static program analysis in §5. The interpreter-based model is not very useful for actual programming in Haskell, though, because it is essentially untyped with regard to the terms being transformed. We will properly embed strategies in Haskell in §2.5.

Figure 8 shows the algebraic data type T for the syntax of transformations according to the core calculus. There are constructors for the various basic strategies and combinators. There is, however, no strategy form for rewrite rules because we can easily represent rewrite rules as functions. Please note that type T is parameterized by the type x for variables. Here we apply a modeling technique for syntax that allows us to model variables and binding constructs of the interpreted language with variables and binding constructs of the host language. Further, this particular style supports repurposing this syntax most conveniently during abstract interpretation in §5. Finally, the style frees us from implementing substitution.

Figure 9 shows the actual interpreter, which is essentially a recursive function, interpret, on the syntactical domain, subject to a few auxiliary declarations as follows. There is an algebraic data type Term for terms to be transformed; constructors are assumed to be strings; see the type synonym Constr. There is a type synonym Meaning to make explicit the type of functions computed by the interpreter. That is, a strategy is mapped to a function from terms to terms where the function is partial in the sense of the Maybe type constructor.⁴ The type Meaning hence also describes what variables are to be bound to.⁵

-- The names all and repeat (Figure 10) clash with the Prelude
import Prelude hiding (all, repeat)

-- Representation of terms
data Term = Term Constr [Term]
type Constr = String

-- The semantic domain for strategies
type Meaning = Term -> Maybe Term

-- Interpreter function
interpret :: T Meaning -> Meaning
interpret Id            = Just
interpret Fail          = const Nothing
interpret (Seq s s')    = maybe Nothing (interpret s') . interpret s
interpret (Choice s s') = \t -> maybe (interpret s' t) Just (interpret s t)
interpret (Var x)       = x
interpret (Rec f)       = fixProperty (interpret . f)
interpret (All s)       = transform (all (interpret s))
interpret (One s)       = transform (one (interpret s))

-- Fixed-point combinator
fixProperty :: (x -> x) -> x
fixProperty f = f (fixProperty f)

-- Common helper for All and One
transform :: ([Term] -> Maybe [Term]) -> Meaning
transform f (Term c ts) = maybe Nothing (Just . Term c) (f ts)

-- Transform all terms in a list
all :: Meaning -> [Term] -> Maybe [Term]
all f ts = kids ts'
  where
    ts' = map f ts
    kids []              = Just []
    kids (Nothing : _)   = Nothing  -- a failing subterm fails the whole list
    kids (Just t' : ts') = maybe Nothing (Just . (:) t') (kids ts')

-- Transform one term in a list
one :: Meaning -> [Term] -> Maybe [Term]
one f ts = kids ts ts'
  where
    ts' = map f ts
    kids [] []                    = Nothing
    kids (t : ts) (Nothing : ts') = maybe Nothing (Just . (:) t) (kids ts ts')
    kids (_ : ts) (Just t' : _)   = Just (t' : ts)

Fig. 9. Haskell-based interpreter for transformation strategies.

full_bu s   = Rec (\x -> Seq (All (Var x)) s)
full_td s   = Rec (\x -> Seq s (All (Var x)))
once_bu s   = Rec (\x -> Choice (One (Var x)) s)
once_td s   = Rec (\x -> Choice s (One (Var x)))
stop_td s   = Rec (\x -> Choice s (All (Var x)))
innermost s = repeat (once_bu s)
try s       = Choice s Id
repeat s    = Rec (\x -> try (Seq s (Var x)))

Fig. 10. Familiar traversal schemes for interpretation in Haskell.

The equations of the interpret function combine the positive and negative rules of the natural semantics in a straightforward manner. The only special case is the approach to recursion. We use a fixed point combinator, fixProperty, to this end. Its name emphasizes that the definition of the operator is immediately the defining property of a fixed point.
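To see how such a fixed point combinator ties the recursive knot, consider the following small sketch. It reuses the definition of fixProperty and applies it to an ordinary numeric function; the function factorial is a hypothetical example and not part of the interpreter.

```haskell
-- The fixed-point combinator: its definition is the fixed-point property.
fixProperty :: (x -> x) -> x
fixProperty f = f (fixProperty f)

-- Factorial without explicit recursion: the recursive call is supplied
-- by fixProperty as the argument 'self'.
factorial :: Integer -> Integer
factorial = fixProperty (\self n -> if n <= 0 then 1 else n * self (n - 1))
```

Laziness matters here: fixProperty f unfolds only as far as the function body actually demands 'self', which is the same mechanism by which a Rec strategy unfolds only as deep as the traversal proceeds.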

The traversal combinators are interpreted with the help of auxiliary functions transform, all, and one. One-layer traversal is essentially modeled by means of mapping over the list of subterms while using the folklore list-processing function map. The auxiliary functions are otherwise only concerned with the manipulation of failure for processed subterms.

Figure 10 shows familiar traversal schemes in the Haskell-based syntax.

2.5 Embedding strategies into Haskell

By embedding strategies into Haskell, they can be applied to programmer-defined data types such as those illustrative data models for companies and programming-language syntax in Figure 2 in the introduction.

⁴ Note on Haskell: Maybe is defined as follows: data Maybe x = Just x | Nothing. The constructor Just is applied when a value (i.e., a result in our case) is available whereas the constructor Nothing is applied otherwise. Maybe values can be inspected by regular pattern matching, but we also use the convenience function maybe :: b -> (a -> b) -> Maybe a -> b, which returns the first argument if the given ‘maybe’ is Nothing and otherwise applies the second argument (a function) to the value.

⁵ Note on Haskell: Haskell in all its glory has infinite and partial data structures, such as trees with undefined leaves, or indeed undefined subtrees. In principle, the data type Term can be used in such a manner. In the presence of infinite and partial structures, the discussion of strategy semantics and properties (most notably, termination) becomes more subtle. In this paper, we are limiting our discussion to finite, fully defined data. (The subject of coinductive strategies over coinductive types may be an interesting topic for future work.) We also skip over the issues of laziness in most cases.


-- Transformations as generic functions
type T m = forall x. Term x => x -> m x

-- Combinator types
idT     :: Monad m => T m
failT   :: MonadPlus m => T m
sequT   :: Monad m => T m -> T m -> T m
choiceT :: MonadPlus m => T m -> T m -> T m
allT    :: Monad m => T m -> T m
oneT    :: MonadPlus m => T m -> T m
adhocT  :: (Term x, Monad m) => T m -> (x -> m x) -> T m

-- Trivial combinator definitions
idT           = return
failT         = const mzero
sequT f g x   = f x >>= g
choiceT f g x = f x `mplus` g x

-- Non-trivial combinator definitions using pseudo-code
allT f (C t1 ... tn) =
  f t1 >>= \t1' -> ... f tn >>= \tn' ->
  return (C t1' ... tn')

oneT f (C t1 ... tn) =
  ... -- elided for brevity

adhocT s f x =
  if typeOf x == argumentTypeOf f
    then f x
    else s x

Fig. 11. Embedding strategies (transformations) in Haskell.

Figure 11 defines the generic function type for transformations and the function combinators for the core calculus. A number of aspects of this embedding need to be carefully motivated.

The type T uses forall-quantification to emphasize that strategies are indeed generic (say, polymorphic) functions because they are potentially applied to many different types of subterms along traversal. The context Term x => ... emphasizes that strategies are not universally polymorphic functions, but they can only be applied to types that instantiate the Haskell class Term.⁶ Thus, Term is the set of types of terms. (For comparison, in the interpreter-based model, Term denotes the type of terms.) The operations of the Term class enable traversal and strategy extension (see below). The specifics of these operations are not important for the topic of the present paper; see the Strafunski/‘Scrap Your Boilerplate’ literature [43,38] for details.

⁶ Note on Haskell: Figure 12 gives a brief overview of Haskell classes (say, type classes) for those unfamiliar with this aspect of the Haskell language.

A Haskell class is specified by a signature, which can be thought of as an interface. An implementation of the class (called an instance in Haskell) is given by defining the interface functions for a particular type. A typical example is the equality class, Eq, shown with an instance for the Boolean type, Bool.

class Eq a where
  (==) :: a -> a -> Bool

instance Eq Bool where
  (x == y) = if x then y else (not y)

A function may be polymorphic yet require that a type is a member of a particular class, as in the list membership function; the context Eq a => ... constrains a to be a member of the Eq class.

elem :: Eq a => a -> [a] -> Bool
elem x ys = or [ x == y | y <- ys ]

Since Bool is in the class Eq, elem can be used over Boolean lists.

Fig. 12. A note on classes in Haskell.

There is the adhocT combinator that has no counterpart in the formal semantics and the interpreter-based model because it is specifically needed for static typing when different types can be traversed. The adhocT combinator enables so-called strategy extension as follows. The strategy adhocT s f constructs a new strategy from the given strategy s such that the result behaves like the type-specific case f when f is applicable, and like s otherwise. This is also expressed by the pseudo-code. We omit technical details that are not important for the topic of the present paper; see, again, [43,38] for details. It is important though to understand that an operation like the adhocT combinator is essential for a (conservative) typeful embedding. This will be illustrated shortly.
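The idea of strategy extension can be approximated with the cast function of Data.Typeable. The following is a sketch under simplifying assumptions: it is specialised to the Maybe monad, the default is monomorphic in the input type rather than a generic strategy, and the names adhocMaybe, incrementInt, onInt, and onBool are hypothetical. It is not the definition used by Strafunski or ‘Scrap Your Boilerplate’.

```haskell
import Data.Typeable (Typeable, cast)

-- Strategy extension specialised to Maybe: behave like the type-specific
-- case 'spec' when the input has type b, and like 'def' otherwise.
adhocMaybe :: (Typeable a, Typeable b)
           => (a -> Maybe a) -> (b -> Maybe b) -> a -> Maybe a
adhocMaybe def spec x =
  case cast x of
    -- The first cast succeeded, so a and b coincide and casting the
    -- result back cannot fail; failure of 'spec' itself is propagated.
    Just y  -> spec y >>= cast
    Nothing -> def x

-- A type-specific case for Int.
incrementInt :: Int -> Maybe Int
incrementInt n = Just (n + 1)

-- The type-specific case applies to an Int ...
onInt :: Maybe Int
onInt = adhocMaybe Just incrementInt (41 :: Int)

-- ... and the default (here: identity) applies to any other type.
onBool :: Maybe Bool
onBool = adhocMaybe Just incrementInt True
```

The run-time type comparison in the pseudo-code of Figure 11 corresponds to the first cast; the second cast merely converts the result back to the statically expected type.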

When compared to the interpreter of Figure 9, the embedding does not refer to the Maybe type constructor for the potential of failure in strategy application. Instead, a type-constructor parameter for a monad m is used.⁷ The use of monads is a strict generalization of the use of Maybe because a) Maybe is a specific monad and b) other monads can be chosen to compose traversal behavior with additional computational aspects. This generalization has been found to be essential in practical strategic programming in Haskell. By instantiating m as Maybe, we get these types:

idT   :: T Maybe
failT :: T Maybe
sequT :: T Maybe -> T Maybe -> T Maybe
...

⁷ Note on Haskell: Figure 13 gives a brief overview of monads for those unfamiliar with the concept.

Classes can also describe interfaces over type constructors, that is, functions from types to types. For instance:

class Monad m where
  return :: a -> m a
  (>>=)  :: m a -> (a -> m b) -> m b

class Monad m => MonadPlus m where
  mzero :: m a
  mplus :: m a -> m a -> m a

Monads encapsulate an interface for ‘computations over a’, since return x gives the trivial computation of the value x and (>>=) or ‘bind’ allows computations to be sequenced. The simplest implementation of Monad is the identity type function:

data Id a = Id { getId :: a }

instance Monad Id where
  return x = Id x
  c >>= f  = f (getId c)

Other instances of Monad provide for non-deterministic or stateful computation, which can be used to good effect in traversals, e.g., to accumulate context information during the traversal.

In a similar way, MonadPlus encapsulates the concept of computations that might fail, witnessed by the mzero binding, and mplus combines the results of two computations that might fail, transmitting failure as appropriate. The simplest instance of MonadPlus is the Maybe type:

data Maybe a = Nothing | Just a

instance Monad Maybe where
  return x       = Just x
  Nothing  >>= f = Nothing
  (Just x) >>= f = f x

instance MonadPlus Maybe where
  mzero           = Nothing
  mplus Nothing y = y
  mplus x _       = x

Every instance of MonadPlus presupposes an instance of Monad, but not vice versa.

Fig. 13. A note on monads in Haskell.

full_td, full_bu       :: Monad m => T m -> T m
once_td, once_bu       :: MonadPlus m => T m -> T m
stop_td                :: MonadPlus m => T m -> T m
innermost, repeat, try :: MonadPlus m => T m -> T m

full_td s   = s `sequT` allT (full_td s)
full_bu s   = allT (full_bu s) `sequT` s
once_td s   = s `choiceT` oneT (once_td s)
once_bu s   = oneT (once_bu s) `choiceT` s
stop_td s   = s `choiceT` allT (stop_td s)
innermost s = repeat (once_bu s)
repeat s    = try (s `sequT` repeat s)
try s       = s `choiceT` idT

Fig. 14. Familiar traversal schemes embedded in Haskell.

We note that the choice of Monad versus MonadPlus in the original function signatures of Figure 11 simply expresses what is required by the combinators’ definitions. For instance, idT does not refer to any members of the MonadPlus class whereas choiceT does.

The pseudo-code for the allT combinator expresses that the argument function (say, strategy) is applied to all immediate subterms, the various computations are sequenced, and a term with the original outermost constructor is constructed from the intermediate results, unless failure occurred.
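For a runnable approximation of this one-layer traversal, one can rely on the gmapM operation of the Data class in GHC’s base library. The following is a sketch under the assumption that Data plays the role of the Term class and that the monad is Maybe; the names Point, allMaybe, and nonNegative are hypothetical.

```haskell
{-# LANGUAGE RankNTypes, DeriveDataTypeable #-}
import Data.Data (Data, gmapM)
import Data.Typeable (cast)

-- A small sample type with two immediate subterms.
data Point = Point Int Int deriving (Eq, Show, Data)

-- One-layer traversal in the Maybe monad: apply f to all immediate
-- subterms, sequence the results, and rebuild the constructor.
allMaybe :: Data a => (forall d. Data d => d -> Maybe d) -> a -> Maybe a
allMaybe f = gmapM f

-- A generic strategy that fails on negative Ints and succeeds otherwise.
nonNegative :: Data d => d -> Maybe d
nonNegative x = case cast x :: Maybe Int of
  Just n | n < 0 -> Nothing
  _              -> Just x
```

Here allMaybe nonNegative (Point 1 2) succeeds with the unchanged point, whereas allMaybe nonNegative (Point 1 (-2)) fails as a whole because one immediate subterm fails.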

Figure 14 expresses familiar traversal schemes with the embedding. The function definitions are entirely straightforward, but two details are worth noticing as they relate to the central topic of programming errors. First, the function definitions use general recursion, thereby implying the potential for divergence. Second, the distinction of Monad and MonadPlus in the function signatures signals whether control flow can be affected by fallible arguments of the schemes, but the types do not imply rejection of infallible arguments, thereby implying the potential for degenerated control flow.

Figure 15 composes a traversal for the transformation of companies such that the salaries of all employees are increased by 1 Euro. The implementation of the traversal is straightforward: we select a scheme for full traversal such that we reach each node, and we extend the polymorphic identity function with a monomorphic function for employees so that we increase (in fact, increment) their salary components. The local function f can be viewed as a rewrite rule in that it rewrites employees: there is pattern matching on the left-hand side and term construction on the right-hand side. The trivial identity monad, Id, is used here because the traversal is a pure function, without even the potential of failure. Figure 1 also proposed a slightly more involved transformation scenario for companies: decrease the salaries of all non-top-level managers. We leave this scenario as an exercise for the reader.

-- Increase the salaries of all employees.
increase_all_salaries :: T Id
increase_all_salaries = full_td (adhocT idT f)
  where
    f (Employee n s) = Id (Employee n (s+1))

Fig. 15. Implementation of a transformation scenario from Figure 1.
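A self-contained variant of this scenario can be sketched with the Data class of GHC’s base library standing in for the Term class. The company model below is a deliberately small, hypothetical one (it is not the model of Figure 2), the traversal is pure rather than monadic, and the names fullTD, adhoc, and increaseAllSalaries are assumptions of this sketch.

```haskell
{-# LANGUAGE RankNTypes, DeriveDataTypeable #-}
import Data.Data (Data, gmapT)
import Data.Typeable (Typeable, cast)

-- A deliberately small, hypothetical company model.
data Company  = Company [Dept]         deriving (Eq, Show, Data)
data Dept     = Dept String [Employee] deriving (Eq, Show, Data)
data Employee = Employee String Float  deriving (Eq, Show, Data)

-- Full top-down traversal: transform the node, then descend into the result.
fullTD :: Data a => (forall b. Data b => b -> b) -> a -> a
fullTD f x = gmapT (fullTD f) (f x)

-- Extend the identity function with a type-specific case.
adhoc :: (Typeable a, Typeable b) => (b -> b) -> a -> a
adhoc spec x = case cast x of
  Just y  -> maybe x id (cast (spec y))
  Nothing -> x

-- Increase the salaries of all employees by 1.
increaseAllSalaries :: Data a => a -> a
increaseAllSalaries = fullTD (adhoc raise)
  where
    raise :: Employee -> Employee
    raise (Employee n s) = Employee n (s + 1)
```

The traversal terminates in this model because the rewritten employee term contains no further employees; the top-down scheme therefore does not revisit the result of the rewrite rule.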

2.6 A note on queries

For most of the paper we focus on transformations since queries do not seem to add any additional, fundamental challenges. However, we extend the embedding approach here to include queries for a more complete illustration of strategic programming. Figure 16 defines the generic function type for queries and corresponding function combinators.

The generic type Q models that queries may be applied to terms of arbitrary types while the result type r of the query is a parameter of the query; it does not depend on the actual type of the input term. Here, we note that Q is not parameterized by a monad-type constructor, as is the case for T. This design comes without loss of generality because the result type r may also be instantiated to the application of a monad-type constructor, if necessary.

The basic strategy constQ r denotes the polymorphic constant function, which returns r, no matter the input term or its type. The basic strategy failQ is the always failing query. There is no special sequential composition for queries because regular function composition is appropriate, given that the result of a query is of a fixed type. However, there is the combinator bothQ which applies two queries to the input and returns both results as a pair. Further, there is also a form of choice, choiceQ, for queries. Ultimately, there is also one-layer traversal for queries. We only show the combinator allQ, which essentially constructs a list of queried subterms. Finally, there is also a form of strategy extension for queries.


-- Queries as generic functions
type Q r = forall x. Term x => x -> r

-- Combinator types
constQ  :: r -> Q r
failQ   :: MonadPlus m => Q (m r)
bothQ   :: Q u -> Q u' -> Q (u,u')
choiceQ :: MonadPlus m => Q (m r) -> Q (m r) -> Q (m r)
allQ    :: Q r -> Q [r]
adhocQ  :: Term x => Q r -> (x -> r) -> Q r

-- Trivial combinator definitions
constQ r      = const r
failQ         = const mzero
bothQ f g x   = (f x, g x)
choiceQ f g x = f x `mplus` g x

-- Non-trivial combinator definitions using pseudo-code
allQ f (C t1 ... tn) =
  [f t1, ..., f tn]

adhocQ s f x =
  if typeOf x == argumentTypeOf f
    then f x
    else s x

Fig. 16. Embedding strategies (queries) in Haskell.

-- Query each node and collect all results in a list
full_cl :: Monoid u => Q u -> Q u
full_cl s = mconcat . uncurry (:) . bothQ s (allQ (full_cl s))

-- Collection with stop
stop_cl :: Monoid u => Q (Maybe u) -> Q u
stop_cl s = maybe mempty id
          . (s `choiceQ` (Just . mconcat . allQ (stop_cl s)))

-- Find a node to query in top-down, left-to-right manner
once_cl :: MonadPlus m => Q (m u) -> Q (m u)
once_cl s = s `choiceQ` (msum . allQ (once_cl s))

Fig. 17. Traversal schemes for queries embedded in Haskell.


The Monoid type class encapsulates a type with a binary operation, mappend, and a unit, mempty, for that operation:

class Monoid a where
  mempty  :: a
  mappend :: a -> a -> a
  mconcat :: [a] -> a
  mconcat = foldr mappend mempty

The simplest instance is the list monoid, which indeed suggests the names used in the class.

instance Monoid [a] where
  mempty  = []
  mappend = (++)

Other instances are given by addition and zero (or multiplication and one) over numbers, wrapped by the Sum constructor:

newtype Sum a = Sum { getSum :: a }

instance Num a => Monoid (Sum a) where
  mempty = Sum 0
  Sum x `mappend` Sum y = Sum (x + y)

In each case mconcat is used to accumulate a list of values into a single value. This value will be independent of the way in which the accumulation is done if the instance satisfies the Monoid laws:

mappend mempty x = x
mappend x mempty = x
mappend x (mappend y z) = mappend (mappend x y) z

Fig. 18. A note on the Monoid class in Haskell.

Figure 17 expresses useful traversal schemes with the embedding. The first two schemes are parameterized over a monoid to allow for the collection of data in a general manner.⁸ (The postfix “cl” hints at “collection”.) That is, the monoid’s type provides the result type of queries and the monoid’s binary operation is used to combine results from querying many subterms. The third traversal scheme in Figure 17 deals with finding a single subterm of interest as opposed to collecting data from many subterms of interest.
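Monoid-based collection can be tried out with the gmapQ operation of the Data class in GHC’s base library. The following is a sketch under the assumption that Data stands in for the Term class; the data model and the names fullCl, adhocQ (here with a constant default), and totalSalaries are hypothetical.

```haskell
{-# LANGUAGE RankNTypes, DeriveDataTypeable #-}
import Data.Data (Data, gmapQ)
import Data.Monoid (Sum(..))
import Data.Typeable (Typeable, cast)

-- A small, hypothetical data model.
data Dept     = Dept String [Employee] deriving (Show, Data)
data Employee = Employee String Float  deriving (Show, Data)

-- Query the node itself and all of its descendants; combine all results
-- with the monoid's operation.
fullCl :: (Monoid u, Data a) => (forall b. Data b => b -> u) -> a -> u
fullCl q x = mconcat (q x : gmapQ (fullCl q) x)

-- Extend a constant query with a type-specific case.
adhocQ :: (Typeable a, Typeable b) => u -> (b -> u) -> a -> u
adhocQ def spec x = maybe def spec (cast x)

-- Total the salaries of all employees.
totalSalaries :: Data a => a -> Float
totalSalaries = getSum . fullCl (adhocQ mempty salary)
  where
    salary :: Employee -> Sum Float
    salary (Employee _ s) = Sum s
```

Choosing a different monoid, such as lists, would collect the salaries instead of summing them, without changing the traversal scheme.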

Figure 19 composes traversals for the query scenarios on companies: total the salaries of all employees, or of non-managers only. The implementation of the former is straightforward; perhaps surprisingly, the implementation of the latter is significantly more involved. The simple collection scheme full_cl is inappropriate for totaling all non-managers because the scheme would reach all nodes eventually, including the employee subterms that are part of manager terms, from which salaries must not be extracted though. Hence, a traversal with ‘stop’ is needed indeed. Further, an always failing default is needed here, again in contrast to the simpler case of totaling all salaries. Finally, the solution depends on the style of data modeling. That is, the assumed data model distinguishes the types of managers and employees. Hence, we can use an extra type-specific case for managers to stop collection at the manager level. Without the type distinction in the data model, the traversal program would need to exploit the special position of managers within department terms.

⁸ Note on Haskell: Figure 18 gives a brief overview of monoids for those unfamiliar with the concept.

-- Total the salaries of all employees.
total_all_salaries :: Q Float
total_all_salaries = getSum . full_cl (adhocQ (constQ mempty) f)
  where
    f (Employee _ s) = Sum s

-- Total the salaries of all employees who are not managers.
total_all_non_managers :: Q Float
total_all_non_managers = getSum . stop_cl type_case
  where
    type_case :: Q (Maybe (Sum Float))
    type_case = adhocQ (adhocQ (constQ Nothing) employee) manager
    employee (Employee _ s) = Just (Sum s)
    manager (Manager _) = Just (Sum 0)

Fig. 19. Implementation of two query scenarios from Figure 1.
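The role of ‘stop’ and of the always-failing default can be made concrete with a sketch based on the Data class of GHC’s base library. The data model below is hypothetical; it only assumes, as the text does, that managers and employees have distinct types. The names stopCl, typeCase, and totalNonManagers are likewise assumptions of this sketch.

```haskell
{-# LANGUAGE RankNTypes, DeriveDataTypeable #-}
import Data.Data (Data, gmapQ)
import Data.Monoid (Sum(..))
import Data.Typeable (cast)

-- Hypothetical model in which managers and employees have distinct types.
data Dept       = Dept String Manager [Employee] deriving (Show, Data)
newtype Manager = Manager Employee               deriving (Show, Data)
data Employee   = Employee String Float          deriving (Show, Data)

-- Collection with stop: where q succeeds, use its result and do not
-- descend; where q fails, combine the results for the immediate subterms.
stopCl :: (Monoid u, Data a) => (forall b. Data b => b -> Maybe u) -> a -> u
stopCl q x = maybe (mconcat (gmapQ (stopCl q) x)) id (q x)

-- Type case: managers contribute zero and stop the descent, employees
-- contribute their salary, and everything else fails so that the
-- traversal continues below.
typeCase :: Data a => a -> Maybe (Sum Float)
typeCase x = case (cast x, cast x) of
  (Just (Manager _), _)    -> Just (Sum 0)
  (_, Just (Employee _ s)) -> Just (Sum s)
  _                        -> Nothing

-- Total the salaries of all employees who are not managers.
totalNonManagers :: Data a => a -> Float
totalNonManagers = getSum . stopCl typeCase
```

If the default were a succeeding constant query instead of a failing one, the traversal would stop at the root and never reach any employee.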

The transformation scenario for decreasing the salaries of all non-top-level managers, which we left as an exercise for the reader, calls for similarly involved considerations. These illustrations may help to confirm that programming errors are quite conceivable in strategic programming, despite the conciseness of the programming style.

3 Inventory of strategic programming errors

The implementation of a strategic programming (sub-)problem (say, a traversal problem) is normally centered around some problem-specific ingredients (‘rewrite rules’) that have to be organized in a more or less complex strategy. There are various decisions to be made and, accordingly, there are opportunities for misunderstanding and programming errors. This section presents a fine-grained inventory of programming errors by reflecting systematically on the use of basic strategy combinators and library abstractions in the implementation of traversal problems. We use a deceptively simple scenario as the running example. We begin with a short proposal of the assumed process of designing and implementing traversal programs, which in itself may improve understanding of strategic programming and help reduce programming errors. The following discussion is biased towards transformations, but coverage of queries would require only a few, simple adaptations.

3.1 Design of traversal programs

Traversal programming is based on the central notion of terms of interest: these are the terms to be affected by a transformation. When designing a traversal, the terms of interest are to be identified along several axes:

Types. Most obviously, terms of interest are of certain types. For instance, a transformation for salary increase may be concerned with the type of employees.

Patterns and conditions. Terms of interest often need to match certain patterns. For instance, a transformation for the application of a distributive law deals with the pattern x ∗ (y + z). In addition, the applicability of transformations is often subject to (pre-)conditions.

Position-based selection. A selection criterion may be applied if the transformation should not affect all terms with fitting patterns and conditions. In an extreme case, a single term is to be selected. Such selection typically refers to the position of these terms; think of top-most versus bottom-most.

Origin. The initial input is expected to contribute terms with fitting patterns and conditions, but previous applications of rewrite rules may contribute terms as well. Hence, it must be decided whether the latter kind of origin also qualifies for terms of interest. For instance, an unfolding transformation for recursive function definitions may specifically disregard function applications that were introduced by unfolding.

3.2 Implementation of traversal programs

When implementing a traversal, the types and patterns of terms of interest are modeled by the left-hand sides of rewrite rules. Conditions are typically modeled by the rewrite rules, too, but the choice of the traversal scheme may be essential for being able to correctly check the conditions. For instance, the traversal scheme may need to pass down information that is needed by a condition eventually. The axes of selection and origin (of terms of interest) are expressed through the choice of a suitable traversal scheme.

Let us provide a summary of basic variation points in traversal implementation. For simplicity, let us focus here on traversal problems whose implementation corresponds to a strategy that has been built by applying one or more traversal schemes from a library to the problem-specific rewrite rules, possibly subject to composition of rewrite rules or sub-traversals.

Organizing the strategy involves the following decisions:

Scheme. Which traversal scheme is to be used?

• Is a full or a limited traversal required?
• Does top-down versus bottom-up order matter?
• Does a strategy need to be iterated?
• ...

Default. What polymorphic and monomorphic defaults are to be used?

• The identity transformation.
• The always failing transformation.
• Another, more specific, behavior.

Composition. How to compose a strategy from multiple parts?

• Use type case (strategy extension) at the level of rewrite rules.
• Combine arguments of a traversal scheme in a sequence.
• Combine arguments of a traversal scheme in a choice.
• Combine traversals in a sequence.
• Combine traversals in a choice.

Based on an appropriate running example we shall exercise these choices, and see the consequences of incorrect decisions as they cause programming errors in practice. In our experience, wrong choices are the result of insufficiently understanding i) the variation points of traversal schemes, ii) the subtleties of control flow, and iii) the axes of terms of interest in practical scenarios.

3.3 The running example

As a purposely simple example, consider the transformation problem of ‘incrementing all numbers in a term’. Suppose ℓ is the rewrite rule that maps any given number n to n+1. It remains to compose a strategy that can iterate ℓ over any term.

For concreteness’ sake, we operate on n-ary trees of natural numbers. Further, we assume a Peano-like definition of the data type for numbers. Here are the data types for numbers and trees:

data Nat = Zero | Succ Nat
data Tree a = Node { rootLabel :: a, subForest :: [Tree a] }

The Peano-induced recursion implies a simple form of nesting. It goes without saying that the Peano-induced nesting form is contrived, but its inclusion allows us to cover nesting as such. Any practical scenario of traversal programming involves nesting at the data-modeling level; think of nesting of departments in the company example, or nesting of expressions or function-definition blocks in the AST example given in the introduction.

Here are simple tree samples:

tree1 = Node { rootLabel = Zero, subForest = [] }                   -- A tree of numbers
tree2 = Node { rootLabel = True, subForest = [] }                   -- A tree of Booleans
tree3 = Node { rootLabel = Succ Zero, subForest = [tree1, tree1] }  -- Two subtrees

The rewrite rule for incrementing numbers is represented as follows:

increment n = Succ n

In fact, let us use monadic style because the basic Strafunski-like library introduced in §2.5 assumes monadic style for all combinators, in particular for all arguments. Hence, we commit to the Maybe monad and its constructor Just:

increment n = Just (Succ n)

It remains to complete the rewrite rule into a traversal strategy that increments all numbers in an arbitrary term. That is, we need to make decisions regarding traversal scheme, default, and composition for the implementation.

Given the options full_td, full_bu, stop_td, once_bu, once_td, and innermost, which traversal scheme is the correct one for the problem at hand? Also, how exactly should the chosen scheme be applied to the given rewrite rule? An experienced strategist may quickly exclude a few options. For instance, it may be obvious that the scheme once_bu is not appropriate because we want to increment all numbers, while once_bu would only affect one number. In the remainder of the section, we will attempt different schemes and vary other details, thereby showcasing potential programming errors.

3.4 Strategies going wrong

The composed strategy may go wrong in different ways:

• It diverges.
• It transforms incorrectly, i.e., numbers are not exactly incremented.
• It does not modify the input, i.e., numbers are not incremented at all.
• It fails even when the transformation is assumed never to fail.
• It succeeds even when failure is preferred for terms without numbers.

Let us consider specific instances of such problems.


3.4.1 Divergent traversal

Let us attempt a full top-down traversal. Alas, the traversal diverges (see footnote 9):

> full_td (adhocT idT increment) tree1
... an infinite tree is printed ...

The intuitive reason for non-termination is that numbers are incremented prior to the traversal's descent. Hence, the term under traversal grows and each increment enables another increment.

Let us attempt instead the innermost scheme. Again, traversal diverges:

> innermost (adhocT failT increment) tree1
... no output ever is printed ...

The combinator innermost repeats once_bu until it fails, but it never fails because there is always a redex to which to apply the increment rule. Hence, tree1 is rewritten indefinitely.

Both decisions here illustrate the case of choosing the wrong traversal scheme, which in turn may be the result of insufficiently understanding some axes of terms of interest (see §3.1) and associated properties of rewrite rules. In particular, the traversal schemes used here support the origin axis in such a way that terms of interest are created by the traversal itself.
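Since a divergent traversal cannot be observed directly in a test, a bounded variant can help to diagnose it. The following is a minimal sketch under stated assumptions: the library of §2.5 is modeled with Data.Data from base (gmapM plays the role of allT, and the class Data stands in for Term), and full_td_bounded is a hypothetical debugging aid that is not part of Strafunski. It gives up with Nothing once the traversal has descended more than a given number of levels.

```haskell
{-# LANGUAGE DeriveDataTypeable, RankNTypes #-}
import Data.Data (Data, Typeable, gcast, gmapM)

-- Peano numbers and rose trees, as in the running example.
data Nat = Zero | Succ Nat deriving (Eq, Show, Data)
data Tree a = Node { rootLabel :: a, subForest :: [Tree a] }
  deriving (Eq, Show, Data)

tree1 :: Tree Nat
tree1 = Node { rootLabel = Zero, subForest = [] }

increment :: Nat -> Maybe Nat
increment n = Just (Succ n)

-- Assumed stand-ins for the library's idT and adhocT.
newtype M m x = M (x -> m x)

idT :: Monad m => x -> m x
idT = return

adhocT :: (Typeable a, Typeable x) => (x -> m x) -> (a -> m a) -> x -> m x
adhocT s f x = case gcast (M f) of
  Just (M g) -> g x   -- x has the type expected by f
  Nothing    -> s x   -- otherwise fall back to the default

-- Hypothetical debugging aid: a full top-down traversal that returns
-- Nothing once it has descended more than d levels.
full_td_bounded :: Data x
                => Int -> (forall y. Data y => y -> Maybe y) -> x -> Maybe x
full_td_bounded d s x
  | d <= 0    = Nothing
  | otherwise = s x >>= gmapM (full_td_bounded (d - 1) s)
```

At the prompt, full_td_bounded 100 (adhocT idT increment) tree1 yields Nothing, and so does any other bound, because every increment creates a deeper term to descend into. In contrast, full_td_bounded 100 idT tree1 returns Just tree1. Note that the bound conflates "too deep" with ordinary failure, so the function is only meant as a diagnostic.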

3.4.2 Incorrect transformation

Let us attempt instead the full_bu scheme:

> full_bu (adhocT idT increment) tree1
Just (Node {rootLabel = Succ Zero, subForest = []})

The root label was indeed incremented. This particular test case looks fine, but if we were testing the same strategy with trees that contain non-zero numbers, then we would learn that the composed strategy replaces each number n by 2n + 1 as opposed to n + 1. To see this, one should notice that a number n is represented as a term of depth n + 1, and the choice of the scheme full_bu implies that increment applies to each 'sub-number'.

More generally, we see an instance of an overlooked applicability condition (see §3.1) in that numbers are terms of interest, but not subterms thereof. The same kind of error could occur in the implementation of any other scenario as long as it involves nesting. In real-world scenarios, nesting may actually arise also through mutual (data type-level) recursion.

9 Note on Haskell: Throughout the section, we operate at the Haskell prompt. That is, we show input past the '>' prompt sign and resulting output, if any, right below the input.
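The 2n + 1 effect of the bottom-up scheme can be reproduced with a small model. This is a sketch under stated assumptions: Data.Data from base stands in for the library of §2.5 (gmapM for allT, the class Data for Term), and nat is a hypothetical helper for building Peano numbers.

```haskell
{-# LANGUAGE DeriveDataTypeable, RankNTypes #-}
import Data.Data (Data, Typeable, gcast, gmapM)

data Nat = Zero | Succ Nat deriving (Eq, Show, Data)
data Tree a = Node { rootLabel :: a, subForest :: [Tree a] }
  deriving (Eq, Show, Data)

-- Hypothetical helper: the Peano representation of a non-negative Int.
nat :: Int -> Nat
nat k = iterate Succ Zero !! k

tree1, tree3 :: Tree Nat
tree1 = Node { rootLabel = Zero, subForest = [] }
tree3 = Node { rootLabel = Succ Zero, subForest = [tree1, tree1] }

increment :: Nat -> Maybe Nat
increment n = Just (Succ n)

-- Assumed stand-ins for the library's idT and adhocT.
newtype M m x = M (x -> m x)

idT :: Monad m => x -> m x
idT = return

adhocT :: (Typeable a, Typeable x) => (x -> m x) -> (a -> m a) -> x -> m x
adhocT s f x = case gcast (M f) of
  Just (M g) -> g x
  Nothing    -> s x

-- Full bottom-up traversal: transform all children first, then the node.
full_bu :: (Monad m, Data x) => (forall y. Data y => y -> m y) -> x -> m x
full_bu s x = gmapM (full_bu s) x >>= s
```

Here, full_bu (adhocT idT increment) (nat 2) evaluates to Just (nat 5): the rule fires once for each of the three nested terms Zero, Succ Zero, and Succ (Succ Zero), so the number 2 gains three successors instead of one.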

3.4.3 No-op traversal

Finally, let us attempt the stop_td scheme. Alas, no incrementing happens:

> stop_td (adhocT idT increment) tree1
Just (Node {rootLabel = Zero, subForest = []})

That is, the result equals Just tree1. The problem is that the strategy should continue to descend as long as no number is hit, but the polymorphic default idT makes the strategy stop for any subterm that is not a number. Let us replace idT by failT. Finally, we arrive at a proper solution for the original problem statement:

> stop_td (adhocT failT increment) tree1
Just (Node {rootLabel = Succ Zero, subForest = []})

Hence, stop_td is the correct traversal scheme for the problem at hand, but we also need to be careful about using the correct polymorphic default for lifting the rewrite rule to the strategy level; see §3.2.
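The effect of the polymorphic default on stop_td can be replayed with the following sketch. It assumes, again, that Data.Data from base models the library of §2.5; the definitions of idT, failT, adhocT, and stop_td are stand-ins written for this illustration, not the library's own code.

```haskell
{-# LANGUAGE DeriveDataTypeable, RankNTypes #-}
import Control.Monad (MonadPlus, mplus, mzero)
import Data.Data (Data, Typeable, gcast, gmapM)

data Nat = Zero | Succ Nat deriving (Eq, Show, Data)
data Tree a = Node { rootLabel :: a, subForest :: [Tree a] }
  deriving (Eq, Show, Data)

tree1, tree3 :: Tree Nat
tree1 = Node { rootLabel = Zero, subForest = [] }
tree3 = Node { rootLabel = Succ Zero, subForest = [tree1, tree1] }

increment :: Nat -> Maybe Nat
increment n = Just (Succ n)

newtype M m x = M (x -> m x)

idT :: Monad m => x -> m x
idT = return

failT :: MonadPlus m => x -> m x
failT _ = mzero

adhocT :: (Typeable a, Typeable x) => (x -> m x) -> (a -> m a) -> x -> m x
adhocT s f x = case gcast (M f) of
  Just (M g) -> g x
  Nothing    -> s x

-- Try s at the current node; descend into the children only if s fails.
stop_td :: (MonadPlus m, Data x) => (forall y. Data y => y -> m y) -> x -> m x
stop_td s x = s x `mplus` gmapM (stop_td s) x
```

With idT as default, the strategy succeeds at the root of tree1 and never descends, so the tree is returned unchanged. With failT, the root fails, the traversal descends, and each number is incremented exactly once because descent stops at the outermost constructor of a number.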

3.4.4 Unexpected failure

failT is the archetypal polymorphic default for certain schemes, while it is patently inappropriate for others. To see this, suppose we indeed want to replace each number n by 2n + 1, as we accidentally ended up doing in §3.4.2. Back then, the polymorphic default idT was appropriate for full_bu. In contrast, the default failT is not appropriate:

> full_bu (adhocT failT increment) tree1
Nothing

This is a case of unexpected failure in the sense that we expect the traversal for incrementing numbers to succeed for all possible input terms. The problem is again due to the wrong choice of default.

3.4.5 Unexpected success

Let us apply the confirmed scheme and default to a tree of Booleans:

> stop_td (adhocT failT increment) tree2
Just (Node { rootLabel = True, subForest = [] })


Of course, no incrementing happens; the output equals the input. Arguably, a strategic programmer could expect that the traversal should fail if the rewrite rule for incrementing never applies. For comparison, the traversal scheme once_bu does indeed fail in case of inapplicability of its argument. We leave the definition of a suitable variation on stop_td that fails in the assumed way as an exercise for the reader. Misunderstood success and failure behavior may propagate as a programming error because it may affect the control flow in the strategic program.
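One possible variation on stop_td that fails when its argument never applies is sketched next. It is only one answer among several, and it rests on assumptions: Data.Data from base models the library, and the hypothetical name stop_td_strict is introduced here. The sketch relies on gmapMp, the Data.Data traversal that succeeds only if the argument succeeds for at least one immediate subterm.

```haskell
{-# LANGUAGE DeriveDataTypeable, RankNTypes #-}
import Control.Monad (MonadPlus, mplus, mzero)
import Data.Data (Data, Typeable, gcast, gmapM, gmapMp)

data Nat = Zero | Succ Nat deriving (Eq, Show, Data)
data Tree a = Node { rootLabel :: a, subForest :: [Tree a] }
  deriving (Eq, Show, Data)

tree1 :: Tree Nat
tree1 = Node { rootLabel = Zero, subForest = [] }

tree2 :: Tree Bool
tree2 = Node { rootLabel = True, subForest = [] }

increment :: Nat -> Maybe Nat
increment n = Just (Succ n)

newtype M m x = M (x -> m x)

failT :: MonadPlus m => x -> m x
failT _ = mzero

adhocT :: (Typeable a, Typeable x) => (x -> m x) -> (a -> m a) -> x -> m x
adhocT s f x = case gcast (M f) of
  Just (M g) -> g x
  Nothing    -> s x

-- The scheme as discussed in the text: succeeds even if s never applies.
stop_td :: (MonadPlus m, Data x) => (forall y. Data y => y -> m y) -> x -> m x
stop_td s x = s x `mplus` gmapM (stop_td s) x

-- Hypothetical variation: fails unless s applies at least once.
stop_td_strict :: (MonadPlus m, Data x)
               => (forall y. Data y => y -> m y) -> x -> m x
stop_td_strict s x = s x `mplus` gmapMp (stop_td_strict s) x
```

For the tree of Booleans, stop_td (adhocT failT increment) tree2 returns Just tree2, whereas stop_td_strict (adhocT failT increment) tree2 returns Nothing. On tree1, both variants increment the root label.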

3.5 Subtle control flow

Arguably, several of the problems discussed involve subtleties of control flow. It appears to be particularly difficult to understand and to correctly configure control flow of strategies on the grounds of success and failure behavior for operands in strategy composition.

Let us modify the running example slightly to provide another illustration. We consider the refined problem statement that only even numbers are to be incremented. In the terminology of rewriting, this statement calls for a conditional rewrite rule (see footnote 10):

-- Pseudo code for a conditional rewrite rule
increment_even : n -> Succ(n) where even(n)

-- Haskell code (monadic notation)
increment_even n = do guard (even n); increment n

We use the same traversal scheme as before:

> stop_td (adhocT failT increment_even) tree1
Just (Node {rootLabel = Succ Zero, subForest = []})

This particular test case looks fine, but if we were testing the same strategy with trees that contain odd numbers, then we would learn that the composed strategy in fact also increments those. The problem is that the failure of the precondition for increment propagates to the traversal scheme, which takes failure to mean 'continue descent'. However, once we descend into odd numbers, we will hit an even sub-number in the next step, which is hence incremented. So we need to make sure that recursion ceases for all numbers. Thus:

increment_even n | even n = Just (Succ n)
                 | otherwise = Just n

10 Both the original increment function and the new 'conditional' increment_even function go arguably beyond the basic notion of a rewrite rule that requires a non-variable pattern on the left-hand side. We could easily recover classic style by using two rewrite rules, one for each form of a natural number.

The example shows the subtleties of control flow in strategic programming: committing to the specific monomorphic type in adhocT can still fail, and so lead to further traversal.
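The difference between the two formulations of the conditional rule shows up as soon as a tree carries an odd label. The sketch below assumes the Data.Data model of the library used for illustration here; isEven and the name increment_even_fixed are hypothetical helpers introduced because the Peano type has no Integral instance.

```haskell
{-# LANGUAGE DeriveDataTypeable, RankNTypes #-}
import Control.Monad (MonadPlus, guard, mplus, mzero)
import Data.Data (Data, Typeable, gcast, gmapM)

data Nat = Zero | Succ Nat deriving (Eq, Show, Data)
data Tree a = Node { rootLabel :: a, subForest :: [Tree a] }
  deriving (Eq, Show, Data)

-- Hypothetical helper: parity of a Peano number.
isEven :: Nat -> Bool
isEven Zero     = True
isEven (Succ n) = not (isEven n)

-- Conditional rule: fails for odd numbers, so stop_td keeps descending.
increment_even :: Nat -> Maybe Nat
increment_even n = do guard (isEven n); Just (Succ n)

-- Revised rule: succeeds for every number, so descent stops at numbers.
increment_even_fixed :: Nat -> Maybe Nat
increment_even_fixed n
  | isEven n  = Just (Succ n)
  | otherwise = Just n

newtype M m x = M (x -> m x)

failT :: MonadPlus m => x -> m x
failT _ = mzero

adhocT :: (Typeable a, Typeable x) => (x -> m x) -> (a -> m a) -> x -> m x
adhocT s f x = case gcast (M f) of
  Just (M g) -> g x
  Nothing    -> s x

stop_td :: (MonadPlus m, Data x) => (forall y. Data y => y -> m y) -> x -> m x
stop_td s x = s x `mplus` gmapM (stop_td s) x

oddTree :: Tree Nat
oddTree = Node { rootLabel = Succ Zero, subForest = [] }
```

With the guarded rule, stop_td (adhocT failT increment_even) oddTree returns Just (Node (Succ (Succ Zero)) []): the rule fails on the label 1, the traversal descends into the number, and the inner Zero is incremented. With increment_even_fixed, the same traversal returns Just oddTree.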

3.6 Dead code

So far we have mainly spotted programming errors through comparison of expected with actual output, if any. Let us now switch to the examination of composed strategies. There are recurring patterns of producing dead code in strategic programming. We take the position here that dead code is a symptom of programming errors.

Consider the following patterns of strategy expressions:

• adhocT (adhocT s f1) f2

• choiceT s1 s2

• sequT s1 s2

In the first pattern, if the operands f1 and f2 are of the same type (or more generally, the type of f2 can be specialized to the type of f1), then f1 has no chance of being applied. Likewise, in the second pattern, if s1 never possibly fails, then s2 has no chance of being applied. Finally, in the third pattern, if s1 never possibly succeeds, which is likely to be the symptom of a programming error by itself, then, additionally, s2 has no chance of being applied.

Let us illustrate the first kind of programming error: two type-specific cases of the same type that are composed with adhocT. Let us consider a refined problem statement such that incrementing of numbers is to be replaced by (i) increment by one all odd numbers, (ii) increment by two all even numbers. Here are the basic building blocks that we need:

atOdd n | odd n = Just (Succ n)
        | otherwise = Nothing

atEven n | even n = Just (Succ (Succ n))
         | otherwise = Nothing

Arguably, both rewrite rules could also have been combined in a single function to start with, but we assume here a modular decomposition as the starting point. We also leave it as an exercise to the reader to argue that the monomorphic default Nothing is appropriate for the given problem. Intuitively, we wish to compose these type-specific cases so that both of them are tried.


Let us attempt a composition that uses adhocT twice:

> stop_td (adhocT (adhocT failT atEven) atOdd) tree1
Just (Node {rootLabel = Zero, subForest = []})

Alas, no incrementing seems to happen. The problem is that there are two type-specific cases for numbers, and the case for odd numbers dominates the one for even numbers. The case for even numbers is effectively dead code. In the sample tree, the number, Zero, is even.

The two rewrite rules need to be composed at the monomorphic level of the number type, as opposed to the polymorphic level of strategy extension. To this end, we need composition combinators that can be applied to functions of specific types as opposed to generic functions:

msequ :: Monad m => (x -> m x) -> (x -> m x) -> x -> m x
msequ s s' x = s x >>= s'

mchoice :: MonadPlus m => (x -> m x) -> (x -> m x) -> x -> m x
mchoice f g x = mplus (f x) (g x)

Using mchoice, we arrive at a correct composition:

> stop_td (adhocT failT (mchoice atEven atOdd)) tree1
Just (Node {rootLabel = Succ (Succ Zero), subForest = []})

We face a more subtle form of dead code when the root type for terms in a traversal implies that the traversal cannot encounter subterms of the type expected by a type-specific case. Consider again the strategy application that we already used for the illustration of potentially unexpected success in §3.4.5:

> stop_td (adhocT failT increment) tree2
Just (Node { rootLabel = True, subForest = [] })

The output equals the input. In this application, the rewrite rule increment is effectively dead code. In fact, it is not important what actual input is passed to the strategy. It suffices to know that the input's type is Tree Boolean. Terms of interest, i.e., numbers, cannot possibly be found below any root of type Tree Boolean and the given strategy is a no-op in such a case. This may be indeed a symptom of a programming error: we either meant to traverse a different term (i.e., one that contains numbers) or we meant to invoke a different strategy (i.e., one that affects Boolean literals or polymorphic trees). Accordingly, one could argue that the strategy application at hand should be rejected statically.
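The first kind of dead code discussed in this section, two type-specific cases of the same type composed with adhocT, can be replayed with the following sketch. It assumes the Data.Data model of the library used for illustration; isEven, composedTwice, and composedOnce are hypothetical names.

```haskell
{-# LANGUAGE DeriveDataTypeable, RankNTypes #-}
import Control.Monad (MonadPlus, mplus, mzero)
import Data.Data (Data, Typeable, gcast, gmapM)

data Nat = Zero | Succ Nat deriving (Eq, Show, Data)
data Tree a = Node { rootLabel :: a, subForest :: [Tree a] }
  deriving (Eq, Show, Data)

tree1 :: Tree Nat
tree1 = Node { rootLabel = Zero, subForest = [] }

isEven :: Nat -> Bool
isEven Zero     = True
isEven (Succ n) = not (isEven n)

atOdd, atEven :: Nat -> Maybe Nat
atOdd n  | isEven n  = Nothing
         | otherwise = Just (Succ n)
atEven n | isEven n  = Just (Succ (Succ n))
         | otherwise = Nothing

mchoice :: MonadPlus m => (x -> m x) -> (x -> m x) -> x -> m x
mchoice f g x = mplus (f x) (g x)

newtype M m x = M (x -> m x)

failT :: MonadPlus m => x -> m x
failT _ = mzero

adhocT :: (Typeable a, Typeable x) => (x -> m x) -> (a -> m a) -> x -> m x
adhocT s f x = case gcast (M f) of
  Just (M g) -> g x
  Nothing    -> s x

stop_td :: (MonadPlus m, Data x) => (forall y. Data y => y -> m y) -> x -> m x
stop_td s x = s x `mplus` gmapM (stop_td s) x

-- atEven is dead code: atOdd always wins the type test for Nat.
composedTwice :: Tree Nat -> Maybe (Tree Nat)
composedTwice = stop_td (adhocT (adhocT failT atEven) atOdd)

-- Both rules are tried because they are combined at the Nat level.
composedOnce :: Tree Nat -> Maybe (Tree Nat)
composedOnce = stop_td (adhocT failT (mchoice atEven atOdd))
```

Here, composedTwice tree1 returns Just tree1, while composedOnce tree1 returns Just (Node (Succ (Succ Zero)) []). The error in composedTwice stays hidden for odd labels, which is one reason why such dead code is easy to miss in testing.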


3.7 Options of composition

As a final exercise on the matter of strategy composition, let us study one more time the refined example for incrementing odd and even numbers as introduced in §3.6. We take for granted the following decisions: stop_td for the traversal scheme and failT for the polymorphic default. Given all the principal options for composition, as of §3.2, there are the following concrete options for the example:

1. stop_td (adhocT (adhocT failT atEven) atOdd)
2. stop_td (adhocT failT (mchoice atEven atOdd))
3. stop_td (adhocT failT (msequ atEven atOdd))
4. stop_td (choiceT (adhocT failT atEven) (adhocT failT atOdd))
5. stop_td (sequT (adhocT failT atEven) (adhocT failT atOdd))
6. choiceT (stop_td (adhocT failT atEven)) (stop_td (adhocT failT atOdd))
7. sequT (stop_td (adhocT failT atEven)) (stop_td (adhocT failT atOdd))

(We do not exercise all variations on the order of operands.) Option (1.) had been dismissed already because the two branches involved are of the same type. Option (2.) had been approved as a correct solution. Option (4.) turns out to be equivalent to option (2.). (This equivalence is implied by basic properties of defaults and composition operators.) The strategies of the other options do not implement the intended operation. We leave demonstrating and explaining the issues with these strategies as an exercise for the reader.

4 Static typing of traversal strategies

We use established means of static typing to curb the identified programming errors, to the extent possible, in a way that basic strategy combinators and library abstractions are restricted in generality. In particular, we use static typing to avoid wrong decisions regarding strategy composition, to reduce subtleties of control flow, and to avoid some forms of dead code. We do not design new type systems here. Instead, we attempt to leverage established means, as well as we can.

The section is organized as a sequence of contributions, each of them consisting of language-agnostic advice for improving strategic programming and an illustration in Haskell. We use Haskell for illustrations because it is an established programming language for statically typed strategic programming and its type system is rather powerful in terms of supporting different forms of polymorphism and simple forms of dependent typing [50,65,63,32,60]. A basic 'reading knowledge' of Haskell, supplemented with the background notes in §2, should be sufficient to understand the essence of the Haskell illustrations.

We provide language-agnostic advice because different languages may need to achieve the suggested improvements in different ways, if at all. In fact, not even Haskell's advanced type system achieves the suggested improvements in a fully satisfactory manner. Hence, the section may ultimately suggest improvements of practical type systems or appropriate use of existing proposals for type-system improvements, e.g., [14,64,51,57,45,80,8,16].

4.1 Hard-wire defaults into traversal schemes

Advice 1 By hard-wiring a suitable default into each traversal scheme, rule out wrong decisions regarding the polymorphic default during strategy composition (see §3.2). Here we assume that the default can either be statically defined for each scheme or else that it can be determined at runtime by observing other arguments of the scheme.

We can illustrate the advice in Haskell in a specific manner by reducing the polymorphism of traversal schemes as follows. While the general schemes of §2.5 are essentially parameterized by polymorphic functions on terms (in fact, rank-2 polymorphic functions [38,57]), the restricted schemes are parameterized by specific type-specific cases. There is also a proposal for a variation of 'Scrap Your Boilerplate' that points in this direction [52].

The following primed definitions take a type-specific case f, which is then generalized within the definition by means of the appropriate polymorphic default, idT or failT. We delegate to the more polymorphic schemes otherwise.

full_td' :: (Term x, Term y, Monad m) => (x -> m x) -> y -> m y
once_bu' :: (Term x, Term y, MonadPlus m) => (x -> m x) -> y -> m y
stop_td' :: (Term x, Term y, MonadPlus m) => (x -> m x) -> y -> m y
...

full_td' f = full_td (adhocT idT f)
once_bu' f = once_bu (adhocT failT f)
stop_td' f = stop_td (adhocT failT f)
...

These schemes reduce programming errors as follows. Most obviously, polymorphic defaults are correct by design because they are hard-wired into the definitions. A side effect is that the use of strategy extension is now limited to the library, and hence strategy composition is made simpler by reducing the number of options (see §3.7).
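A runnable approximation of the restricted schemes can be given on top of Data.Data. This is a sketch under assumptions: the class Data replaces Term, gmapM and gmapMo from base replace allT and oneT, and the general schemes are re-defined here only so that the block is self-contained. The rule flipB is a hypothetical example case.

```haskell
{-# LANGUAGE DeriveDataTypeable, RankNTypes #-}
import Control.Monad (MonadPlus, mplus, mzero)
import Data.Data (Data, Typeable, gcast, gmapM, gmapMo)

data Nat = Zero | Succ Nat deriving (Eq, Show, Data)
data Tree a = Node { rootLabel :: a, subForest :: [Tree a] }
  deriving (Eq, Show, Data)

newtype M m x = M (x -> m x)

idT :: Monad m => x -> m x
idT = return

failT :: MonadPlus m => x -> m x
failT _ = mzero

adhocT :: (Typeable a, Typeable x) => (x -> m x) -> (a -> m a) -> x -> m x
adhocT s f x = case gcast (M f) of
  Just (M g) -> g x
  Nothing    -> s x

-- General schemes (stand-ins for those of the library).
full_td :: (Monad m, Data x) => (forall y. Data y => y -> m y) -> x -> m x
full_td s x = s x >>= gmapM (full_td s)

once_bu, stop_td :: (MonadPlus m, Data x)
                 => (forall y. Data y => y -> m y) -> x -> m x
once_bu s x = gmapMo (once_bu s) x `mplus` s x
stop_td s x = s x `mplus` gmapM (stop_td s) x

-- Restricted schemes: the caller supplies one type-specific case only.
full_td' :: (Data a, Data x, Monad m) => (a -> m a) -> x -> m x
full_td' f = full_td (adhocT idT f)

once_bu', stop_td' :: (Data a, Data x, MonadPlus m) => (a -> m a) -> x -> m x
once_bu' f = once_bu (adhocT failT f)
stop_td' f = stop_td (adhocT failT f)

increment :: Nat -> Maybe Nat
increment n = Just (Succ n)

flipB :: Bool -> Maybe Bool
flipB = Just . not
```

For instance, stop_td' increment (Node Zero []) returns Just (Node (Succ Zero) []), and the caller no longer has a chance to pick idT by mistake.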

However, there are scenarios that require the general schemes; see [70,43,44] for examples. The problem is that we may need a variable number of type-specific cases. Some scenarios of strategies with multiple cases can be decomposed into multiple traversals, but even when it is possible, it may be burdensome and negatively affect performance. Further, there are cases in which the hard-wired default is not applicable. Hence, it should be possible to override the default.

As a result, the restricted schemes cannot fully replace the general schemes. Therefore, a strategic programming library would need to provide both variants and stipulate usage of the restricted schemes whenever possible.

In principle, one can think of unified schemes that can be applied to single type-specific cases, collections thereof, and polymorphic functions that readily incorporate a polymorphic default. Those schemes would need to coerce type-specific cases to polymorphic functions. We will illustrate this idea in a limited manner in §4.4.

4.2 Declare and check fallibility contracts

Advice 2 Curb programming errors due to subtle control flow (see §3.5) by declaring and checking contracts regarding fallibility. These contracts convey whether the argument of a possibly restricted traversal scheme is supposed to be fallible and whether the resulting traversal is guaranteed to be infallible (subject to certain preconditions).

The advice is meant to improve strategic programming so that more guidance is provided as far as the success and failure behavior of traversal schemes and their arguments is concerned. According to §2.3, traversal schemes differ in terms of their fallibility properties and the dependence of these properties on fallibility properties of the arguments. For instance, full_td preserves infallibility, that is, a composed traversal full_td s is infallible if the argument s is infallible. In contrast, stop_td s is infallible regardless of s.

The function signatures of the schemes of §2.5 hint at fallibility properties: see the distinguished use of Monad vs. MonadPlus. For instance:

full_td :: Monad m => T m -> T m
once_bu :: MonadPlus m => T m -> T m
stop_td :: MonadPlus m => T m -> T m

However, such hinting does not imply checks. For instance, a programmer may still pass a notoriously failing argument to full_td despite the signature's hint that a universally succeeding argument may be perfectly acceptable. Such hinting may also be misleading. For instance, the appearance of MonadPlus in the type of stop_td may suggest that such a traversal may fail, but, in fact, it cannot. Instead, the appearance of MonadPlus hints at the fact that the argument is supposed to be fallible.

We can illustrate the advice in Haskell in a specific manner by providing infallible variations on the traversal schemes of §2.5. To this end, we use the identity monad whenever we want to require or imply infallibility. We use the maybe monad whenever the argument of such a scheme is supposed to be fallible. Thus:

full_td' :: T Id -> T Id
full_bu' :: T Id -> T Id
stop_td' :: T Maybe -> T Id
innermost' :: T Maybe -> T Id
repeat' :: T Maybe -> T Id
try' :: T Maybe -> T Id

Applications of these restricted schemes are hence guaranteed to be infallible. We cannot provide an infallible variation on once_bu due to its nature. It is also instructive to notice that try' models the transition from a fallible to an infallible strategy, not just operationally, as before, but now also at the type level. The inverse transition is not served.

The primed definitions full_td' and full_bu' simply delegate to the original schemes, but the other primed definitions need to be defined from scratch because they need to compose infallible and fallible strategy types in a manner that requires designated forms of sequence and choice. These new definitions can be viewed as constructive proofs for fallibility properties.

full_td' s = full_td s
full_bu' s = full_bu s
stop_td' s = s `choiceT'` allT (stop_td' s)
innermost' s = repeat' (once_bu s)
repeat' s = try' (s `sequT'` repeat' s)
try' s = s `choiceT'` idT

choiceT' :: T Maybe -> T Id -> T Id
choiceT' f g x = maybe (g x) Id (f x)

sequT' :: T Maybe -> T Id -> T Maybe
sequT' f g = f `sequT` (Just . getId . g)

The type of choiceT' is interesting in so far that it allows us to compose a fallible strategy with an infallible strategy to obtain an infallible strategy. That is, the scope of fallibility is made local.
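The infallible variations can be approximated in a self-contained way as follows. This is a sketch under assumptions: Data.Data and Data.Functor.Identity from base replace the paper's Term class and Id monad, the combinators are typed at rank 1 where that suffices, and dropTwo is a hypothetical rule that eventually fails so that repeat' terminates.

```haskell
{-# LANGUAGE DeriveDataTypeable, RankNTypes #-}
import Data.Data (Data, Typeable, gcast, gmapM)
import Data.Functor.Identity (Identity (..))

data Nat = Zero | Succ Nat deriving (Eq, Show, Data)
data Tree a = Node { rootLabel :: a, subForest :: [Tree a] }
  deriving (Eq, Show, Data)

newtype M m x = M (x -> m x)

failT :: x -> Maybe x
failT _ = Nothing

adhocT :: (Typeable a, Typeable x) => (x -> m x) -> (a -> m a) -> x -> m x
adhocT s f x = case gcast (M f) of
  Just (M g) -> g x
  Nothing    -> s x

-- Choice: a fallible strategy with an infallible fallback is infallible.
choiceT' :: (x -> Maybe x) -> (x -> Identity x) -> x -> Identity x
choiceT' f g x = maybe (g x) Identity (f x)

-- Sequence: the result stays fallible because the first operand may fail.
sequT' :: (x -> Maybe x) -> (x -> Identity x) -> x -> Maybe x
sequT' f g x = fmap (runIdentity . g) (f x)

try', repeat' :: (x -> Maybe x) -> x -> Identity x
try' s    = s `choiceT'` Identity
repeat' s = try' (s `sequT'` repeat' s)

stop_td' :: Data x => (forall y. Data y => y -> Maybe y) -> x -> Identity x
stop_td' s = s `choiceT'` gmapM (stop_td' s)

increment :: Nat -> Maybe Nat
increment n = Just (Succ n)

-- Hypothetical rule that fails for numbers below two.
dropTwo :: Nat -> Maybe Nat
dropTwo (Succ (Succ n)) = Just n
dropTwo _               = Nothing
```

The result types now state that stop_td' and repeat' cannot fail: runIdentity (stop_td' (adhocT failT increment) (Node Zero [])) is Node (Succ Zero) [], and runIdentity (repeat' dropTwo (Succ (Succ (Succ Zero)))) is Succ Zero.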

The fallibility properties were modeled at the expense of eliminating the general monad parameter. Generality could be recovered though by consistently parameterizing all infallible schemes with a plain monad and adding an application of the monad transformer for Maybe whenever the argument of such a scheme is supposed to be fallible. Thus:


full_td'' :: Monad m => T m -> T m -- equals original type
full_bu'' :: Monad m => T m -> T m -- equals original type
stop_td'' :: Monad m => T (MaybeT m) -> T m
innermost'' :: Monad m => T (MaybeT m) -> T m
repeat'' :: Monad m => T (MaybeT m) -> T m
try'' :: Monad m => T (MaybeT m) -> T m

The definitions are omitted here as they require non-trivial knowledge of monad transformers; see though the paper's online code distribution. These definitions declare the fallibility contracts better than the original schemes, but enforcement is limited. The monad-type parameter may be still (accidentally) instantiated to an instance of MonadPlus. For instance, the types of full_td'' and full_bu'' are not at all constrained, when compared to the original schemes.

4.3 Reserve fallibility for modeling control flow

Advice 3 Curb programming errors due to subtle control flow (see §3.5) by reserving fallibility, as discussed so far, for modeling control flow. If success and failure behavior is needed for other purposes, such as assertion checking, then strategies shall use effects that cannot be confused with efforts to model control flow. The type system must effectively rule out such confusion.

We can illustrate the advice in Haskell in a specific manner by defining the traversal schemes of §2.5 from scratch in terms of two distinct types for infallible versus fallible strategies:

data T m = T { getT :: forall x. Term x => x -> m x }
data T' m = T' { getT' :: forall x. Term x => x -> MaybeT m x }

We use data types here (as opposed to type synonyms) so that the two types cannot possibly be confused. This is discussed in more detail below.

If an infallible strategy needs to fail for reasons other than affecting regular control flow, then the monad parameter can still be used to incorporate the maybe monad or an exception monad, for example. In this manner, strategies may perform assertion checking, as often needed for preconditions of nontrivial transformations, without running the risk that a failure is consumed by the control-flow semantics of the strategic program. Strategic programming is thereby updated to reliably separate control flow and exceptions (or other effects), as is common in the general programming field [61,49].

The types of the 'full' traversal schemes reflect that control flow is hard-wired:

full_td :: Monad m => T m -> T m
full_bu :: Monad m => T m -> T m

The types of the 'once' traversal schemes reflect that fallibility is essential:

once_td :: Monad m => T' m -> T' m
once_bu :: Monad m => T' m -> T' m

The other library combinators construct infallible strategies from fallible ones:

stop_td :: Monad m => T' m -> T m
innermost :: Monad m => T' m -> T m
repeat :: Monad m => T' m -> T m
try :: Monad m => T' m -> T m

At first sight, these types look deceptively similar to those that we defined for fallibility contracts in §4.2. However, the important difference is that T m and T' m' cannot be confused whereas this is possible for T m and T (MaybeT m') if m and MaybeT m' are unifiable.

The definitions of the new schemes are omitted here as they rely on a designated, non-trivial suite of basic strategy combinators; see though the paper's online code distribution. It is fair to say that the present illustration also addresses Advice 2 regarding fallibility contracts.
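To give an impression of how the separation can play out, here is a minimal sketch that is not taken from the online code distribution. It rests on several assumptions: Data.Data replaces Term, the fallible type T' uses m (Maybe x), which is MaybeT m x with the wrapper unfolded, and the names applyT, failT', adhocT', stopTdAt, and checkedIncrement are hypothetical. Assertion failures travel in the outer Either String monad and therefore cannot be mistaken for the control-flow failure that stop_td consumes.

```haskell
{-# LANGUAGE DeriveDataTypeable, RankNTypes #-}
import Data.Data (Data, gcast, gmapM)

data Nat = Zero | Succ Nat deriving (Eq, Show, Data)
data Tree a = Node { rootLabel :: a, subForest :: [Tree a] }
  deriving (Eq, Show, Data)

-- Infallible strategies: failure is not available for control flow.
newtype T m = T (forall x. Data x => x -> m x)
-- Fallible strategies: the inner Maybe is reserved for control flow.
newtype T' m = T' (forall x. Data x => x -> m (Maybe x))

applyT :: Data x => T m -> x -> m x
applyT (T f) = f

failT' :: Monad m => T' m
failT' = T' (\_ -> return Nothing)

newtype R m x = R (x -> m (Maybe x))

adhocT' :: Data a => T' m -> (a -> m (Maybe a)) -> T' m
adhocT' (T' s) f = T' (\x -> case gcast (R f) of
  Just (R g) -> g x
  Nothing    -> s x)

stopTdAt :: (Monad m, Data x) => T' m -> x -> m x
stopTdAt (T' s) x = s x >>= maybe (gmapM (stopTdAt (T' s)) x) return

-- Consumes control-flow failure and returns an infallible strategy.
stop_td :: Monad m => T' m -> T m
stop_td s = T (stopTdAt s)

size :: Nat -> Int
size Zero     = 0
size (Succ n) = 1 + size n

-- Left signals a violated assertion; Right (Just _) is a successful rewrite.
checkedIncrement :: Nat -> Either String (Maybe Nat)
checkedIncrement n
  | size n >= 3 = Left "assertion failed: number too large"
  | otherwise   = Right (Just (Succ n))

incrementAll :: T (Either String)
incrementAll = stop_td (adhocT' failT' checkedIncrement)
```

With these definitions, applyT incrementAll (Node Zero []) yields Right (Node (Succ Zero) []), whereas a tree labeled with the number three yields the Left value; the traversal does not silently continue its descent in that case.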

4.4 Enable families of type-specific cases

Advice 4 Rule out dead code due to overlapping type-specific cases (see §3.6) by enabling strongly typed families of type-specific cases as arguments of traversal schemes. Such a family is a non-empty list of functions the types of which are pairwise non-unifiable but they all instantiate the same generic function type for strategies.

We can illustrate the advice in Haskell in a specific manner by making use of advanced type-class-based programming. More specifically, we leverage existing library support for strongly typed, heterogeneous collections: the HList library [32].

Consider the pattern of composing a strategy from several (say, two) type-specific cases and a polymorphic default:

adhocT (adhocT s f1) f2

The type-specific cases, f1 and f2, are supposed to override the polymorphic default s in a point-wise manner. Ignoring static typing for a second, we can represent the two type-specific cases instead as a list [f1, f2]. Conceptually, it is a heterogeneous list in the sense that the types of the functions for type-specific cases are supposed to be distinct (in fact, non-unifiable); otherwise dead code is admitted. Using HList's constructors for heterogeneous lists, the type-specific cases are represented as follows:

HCons f1 (HCons f2 HNil)

Now suppose that the types of the type-specific cases all instantiate the polymorphic type T. Further suppose that there is a function familyT for strategy construction; it takes two arguments: a transformation s, which serves as polymorphic default, and the heterogeneous collection. The original strategy is constructed as follows:

familyT s (HCons f1 (HCons f2 HNil))

The function familyT must be polymorphic in a special manner such that it can process all heterogeneous collections of type-specific cases. To this end, the function must be overloaded on all possible types of such collections, which is achieved by type-class-based programming; see Figure 20.

-- Type-class-polymorphic type of familyT
class (Monad m, HTypeIndexed f) => FamilyT f m
  where
    familyT :: T m -> f -> T m

-- Empty list case
instance Monad m => FamilyT HNil m
  where
    familyT g _ = g

-- Non-empty list case
instance ( Monad m
         , FamilyT t m
         , Term x
         , HOccursNot (x -> m x) t
         )
      => FamilyT (HCons (x -> m x) t) m
  where
    familyT g (HCons h t) = adhocT (familyT g t) h

The list of type-specific cases is constrained to only hold elements of distinct types; see the constraint HTypeIndexed, which is provided by the HList library. Also notice that the element types are constrained to be function types for monadic transformations; see the pattern x -> m x in the head of the last instance. As a proof obligation for the HTypeIndexed constraint, the instance for non-empty lists must establish that the head's type does not occur again in the tail of the family; see the constraint HOccursNot, which is again provided by the HList library.

Fig. 20. Derivation of a transformation from type-specific cases and a default.

The family-enabled traversal schemes are defined as follows:

full_td' s = full_td (familyT idT s)
full_bu' s = full_bu (familyT idT s)
once_td' s = once_td (familyT failT s)
once_bu' s = once_bu (familyT failT s)
stop_td' s = stop_td (familyT failT s)
innermost' s = innermost (familyT failT s)

That is, the family-enabled schemes invoke the familyT function to resolve the heterogeneous collection into a regular generic function subject to the polymorphic default that is known for each traversal scheme. In this manner, we do not just avoid dead code for type-specific cases; we also address the issue of Advice 1 in that the polymorphic default is hard-wired into the schemes, though without the restriction to a single type-specific case, as was the case in §4.1.
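For readers without the HList library at hand, the idea can also be approximated with a GADT and a closed type family. The following is a sketch of a different encoding, not the encoding of Figure 20: the names Family, FNil, FCons, and NotIn are hypothetical, Data.Data replaces Term, and the distinctness check is phrased as a constraint that reduces to a custom type error for duplicate types.

```haskell
{-# LANGUAGE ConstraintKinds, DataKinds, DeriveDataTypeable, GADTs,
             KindSignatures, RankNTypes, TypeFamilies, TypeOperators #-}
import Control.Monad (MonadPlus, mplus, mzero)
import Data.Data (Data, Typeable, gcast, gmapM)
import Data.Kind (Constraint, Type)
import GHC.TypeLits (ErrorMessage (..), TypeError)

data Nat = Zero | Succ Nat deriving (Eq, Show, Data)
data Tree a = Node { rootLabel :: a, subForest :: [Tree a] }
  deriving (Eq, Show, Data)

-- NotIn x ts holds if x does not occur in ts; otherwise a type error.
type family NotIn (x :: Type) (ts :: [Type]) :: Constraint where
  NotIn x '[]       = ()
  NotIn x (x ': ts) = TypeError ('Text "duplicate type-specific case for "
                                 ':<>: 'ShowType x)
  NotIn x (y ': ts) = NotIn x ts

-- A family of type-specific cases with pairwise distinct argument types.
data Family (m :: Type -> Type) (ts :: [Type]) where
  FNil  :: Family m '[]
  FCons :: (Data x, NotIn x ts)
        => (x -> m x) -> Family m ts -> Family m (x ': ts)

newtype M m x = M (x -> m x)

failT :: MonadPlus m => x -> m x
failT _ = mzero

adhocT :: (Typeable a, Typeable x) => (x -> m x) -> (a -> m a) -> x -> m x
adhocT s f x = case gcast (M f) of
  Just (M g) -> g x
  Nothing    -> s x

-- Resolve a family into a generic function, given a polymorphic default.
familyT :: Data x => (forall y. Data y => y -> m y) -> Family m ts -> x -> m x
familyT g FNil        = g
familyT g (FCons h t) = adhocT (familyT g t) h

stop_td :: (MonadPlus m, Data x) => (forall y. Data y => y -> m y) -> x -> m x
stop_td s x = s x `mplus` gmapM (stop_td s) x

-- Family-enabled scheme with the default failT hard-wired.
stop_td' :: (MonadPlus m, Data x) => Family m ts -> x -> m x
stop_td' fam = stop_td (familyT failT fam)

increment :: Nat -> Maybe Nat
increment n = Just (Succ n)

flipB :: Bool -> Maybe Bool
flipB = Just . not

cases :: Family Maybe '[Nat, Bool]
cases = FCons increment (FCons flipB FNil)
```

Applied to a tree labeled with pairs, stop_td' cases (Node (Zero, True) []) returns Just (Node (Succ Zero, False) []). A family with two cases for Nat, such as FCons increment (FCons increment FNil), is expected to be rejected at compile time through the NotIn constraint.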

Admittedly, the illustration involves substantial encoding. For example, type errors in type-class-based programming are rather involved, and tend to be couched in terms well below the abstraction level of the strategic programmer. Hence, future type systems should provide first-class support for the required form of type case.

4.5 Declare and check reachability contracts

Advice 5 Curb programming errors due to type-specific cases not exercised during traversal (see §3.6) by declaring and checking contracts regarding reachability. These contracts describe that the argument types of type-specific cases can possibly be encountered on subterm positions of the traversal's input whose root type is known. The type system shall enforce these contracts for composed traversals.

We can illustrate the advice in Haskell in a specific manner by making use again of advanced type-class-based programming. That is, we leverage a type-level relation on types, which captures whether or not terms of one type may occur within terms of another type. The relation is modelled by the following type class:

class ReachableFrom x y
instance ReachableFrom x x -- reflexivity
-- Other instances are derived from data types of interest.


The above instance makes sure that relation ReachableFrom is reflexive. All remaining instances must be derived from the declarations of data types that are term types. The instances essentially rephrase the constructor components of the data-type declarations. For instance, the data type for polymorphic trees and the leveraged data type for polymorphic lists imply the following contributions to the relation ReachableFrom:

instance ReachableFrom a [a]
instance ReachableFrom a (Tree a)
instance ReachableFrom [Tree a] (Tree a)

Recall the example of dead code due to unreachable types in §3.6. The relation ReachableFrom clearly demonstrates that numbers can be reached from a tree of numbers (using the second instance) but not from a tree of Booleans.

The relation ReachableFrom can be leveraged directly for the declaration of contracts regarding reachability. To this end, appropriate constraints must be added to the function signatures of the traversal schemes. For simplicity, we focus here on the simpler function signatures for the traversal schemes of §4.1 with hard-wired defaults. Thus:

full_td'' :: (Term x, Term y, Monad m, ReachableFrom x y)
          => (x -> m x) -> y -> m y
full_td'' = full_td'

Compared to full_td' of §4.1, the ReachableFrom constraint has been added. To this end, the application of the forall-quantified type synonym for the resulting strategy had to be inlined so that the constraint can address the type variable y for the strategy type. Of course, the constraint does not change the behavior of full top-down traversal, and hence, there is no correctness issue. However, it is not straightforward to see or to prove that the additional constraint does not remove any useful behavior. Here, we consider the deep identity traversal or the completely undefined traversal as 'useless'.
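A compilable approximation of such a reachability contract is sketched next. The assumptions are as follows: Data.Data replaces Term, the instances are written by hand for the sample types instead of being derived, no transitive closure is modeled, and the constrained scheme is built on stop_td instead of full_td because the increment rule diverges under full top-down traversal (see §3.4.1).

```haskell
{-# LANGUAGE DeriveDataTypeable, FlexibleInstances, MultiParamTypeClasses,
             RankNTypes #-}
import Control.Monad (MonadPlus, mplus, mzero)
import Data.Data (Data, Typeable, gcast, gmapM)

data Nat = Zero | Succ Nat deriving (Eq, Show, Data)
data Tree a = Node { rootLabel :: a, subForest :: [Tree a] }
  deriving (Eq, Show, Data)

-- x is reachable from y if terms of type x may occur in terms of type y.
class ReachableFrom x y
instance ReachableFrom x x                 -- reflexivity
instance ReachableFrom a [a]               -- list elements
instance ReachableFrom a (Tree a)          -- root labels
instance ReachableFrom [Tree a] (Tree a)   -- subforests

newtype M m x = M (x -> m x)

failT :: MonadPlus m => x -> m x
failT _ = mzero

adhocT :: (Typeable a, Typeable x) => (x -> m x) -> (a -> m a) -> x -> m x
adhocT s f x = case gcast (M f) of
  Just (M g) -> g x
  Nothing    -> s x

stop_td :: (MonadPlus m, Data x) => (forall y. Data y => y -> m y) -> x -> m x
stop_td s x = s x `mplus` gmapM (stop_td s) x

-- Restricted scheme with a reachability contract on the case's type.
stop_td'' :: (Data x, Data y, MonadPlus m, ReachableFrom x y)
          => (x -> m x) -> y -> m y
stop_td'' f = stop_td (adhocT failT f)

increment :: Nat -> Maybe Nat
increment n = Just (Succ n)

tree1 :: Tree Nat
tree1 = Node { rootLabel = Zero, subForest = [] }
```

The application stop_td'' increment tree1 type-checks and returns Just (Node (Succ Zero) []). Applying stop_td'' increment to a value of type Tree Bool should instead be rejected by the type checker, because no instance ReachableFrom Nat (Tree Bool) exists.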

While the explicit declaration of contracts provides valuable documentation, it is possible, in principle, to infer reachability constraints from the traversal programs themselves. This may be difficult with type-class-based programming, but we consider a corresponding static program analysis in §5.2.

The idea underlying this illustration has also been presented in [37] in the context of applying ‘Scrap Your Boilerplate’ to XML programming. As we noted before in §4.4, the use of type-class-based programming involves substantial encoding, and hence, future type systems should provide more direct means for expressing reachability contracts.


5 Static analysis of traversal strategies

We go beyond the limitations of established means of static typing by proposing designated static analyses to curb the identified programming errors. In this manner, we can address some problems more thoroughly than with established means of static typing. For instance, we can infer contracts regarding fallibility and reachability, as opposed to merely checking declared contracts as in the previous section. Also, the encoding burden of the previous section would be eliminated by a practical type system which includes the proposed analyses. Further, we can address additional problems that were out of reach so far. In particular, we can perform designated forms of termination analysis to rule out divergent traversal.

The section is organized as a sequence of contributions, each of them consisting of a piece of language-agnostic advice for improving strategic programming and an associated static analysis to support the advice.

We use abstract interpretation and special-purpose type systems for the specification and implementation of the analyses. We have included a representative example of a soundness proof. In all cases, we have modeled the analyses algorithmically in Haskell. All but some routine parts of the analyses are included in the text; a cursory understanding of the analyses does not require Haskell proficiency.

5.1 Perform fallibility analysis

Advice 6 Curb programming errors due to subtle control flow (see §3.5) by statically analyzing strategy properties regarding fallibility, i.e., success and failure behavior. Favorable properties may be stated as contracts in programs which are verified by the proposed analysis.

Without loss of generality, we focus here on an analysis that determines whether a strategy can be guaranteed to succeed (read as ‘always succeeds’ or ‘infallible’). Similar analyses are conceivable for cases such as ‘always fails’, ‘sometimes succeeds’, and others.

We will first apply abstract interpretation to the problem, but come to the conclusion that the precision of the analysis is insufficient to yield any non-trivial results. We will then apply a special-purpose type system; the latter approach provides sufficient precision. The first approach nevertheless provides insight into success and failure behavior, and the overall framework for abstract interpretation can be later re-purposed for another analysis.


5.1.1 An abstract interpretation-based approach

We use the following lattice for the abstract domain for a simple success and failure analysis (see footnote 11).

                 Any
               /     \
  ExistsFailure       ForallSuccess
               \     /
                 None

The bottom element None represents the absence of any information, and gives the starting point for fixed point iteration. The two data values above None represent the following cases:

• There is no value where the strategy fails: ForallSuccess.

• There is a failure point for the strategy: ExistsFailure.

In the former case, we also speak of an infallible strategy according to §2.3. Note that this is a partial correctness analysis. Hence, in none of the cases is it implied that the program terminates for all arguments. The ‘top’ value, Any, represents the result of an analysis that concludes with both of the above cases as being possible. Such a result tells us nothing of value.

We refer to Figure 21 for the Haskell model of the abstract domain. We assume appropriate type classes for partial orders (POrd), least elements (Bottom), greatest elements (Top), and least upper bounds (Lub).

The actual analysis is shown in full detail in Figure 22. It re-interprets the abstract syntax of §2.4 to perform abstract interpretation on the abstract domain as opposed to regular interpretation. We discuss the analysis case by case now. The base cases can be guaranteed to succeed (Id) and to fail (Fail). The composition functions for the compound strategy combinators can easily be verified to be monotone.

In the case of sequential composition, we infer success if both of the operands succeed, while (definite) failure can be inferred only if the first operand fails. The reason for this is that the failure point in the domain of the second operand may not be in the range of the first. Hence, we conclude with Any in some cases. In the case of choice, we infer success if either of the components succeeds, while failure cannot be inferred at all. The reason for this is that two failing operands may have different failure points. Hence, we have to conclude again with the imprecise Any value in some cases.

Footnote 11: We use the general framework of abstract interpretation by Cousot and Cousot [13,12]; we are specifically guided by Nielson and Nielson’s style as used in their textbooks [54,55].


-- General framework for abstract domains
-- (the classes below redefine (<=) and (<), and Figure 22 redefines seq)
import Prelude hiding ((<=), (<), seq)

class Eq x => POrd x
  where
    (<=) :: x -> x -> Bool
    (<) :: x -> x -> Bool
    x < y = not (x==y) && x <= y

class POrd x => Bottom x where bottom :: x
class POrd x => Top x where top :: x
class POrd x => Lub x where lub :: x -> x -> x

-- The abstract domain for success and failure analysis

data Sf = None | ForallSuccess | ExistsFailure | Any
  deriving Eq -- Eq is required by the superclass of POrd

instance POrd Sf
  where
    None <= _ = True
    _ <= Any = True
    x <= y = x == y

instance Bottom Sf where bottom = None
instance Top Sf where top = Any

instance Lub Sf
  where
    lub None x = x
    lub x None = x
    lub Any x = Any
    lub x Any = Any
    lub x y = if x == y then x else Any

Fig. 21. The abstract domain for success and failure behavior.

A reference to a Var simply uses the information it contains, and fixEq is the standard computation of the least fixed point within a lattice, by iteratively applying the function to the bottom element (in our analysis None). We infer success for an ‘all’ traversal if the argument strategy is guaranteed to succeed; likewise, the ‘all’ traversal has a failure point if the argument strategy has a failure point (because we could construct a term to exercise the failure point on an immediate subterm position, assuming a homogeneous as opposed to a many-sorted set of terms). We infer ExistsFailure for a ‘one’ traversal because it has a failure point for every constant term, regardless of the argument strategy.
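To see how the fixed-point iteration behaves, it can help to list the iterates explicitly. The sketch below is an illustration under assumptions: iterates, seqSf, fullTdBody, and fullBuBody are hypothetical names, and the closure bodies assume the usual definitions full_td s = Rec (\x -> Seq s (All (Var x))) and full_bu s = Rec (\x -> Seq (All (Var x)) s), with the ‘all’ traversal of the recursive reference abstracted to the reference itself.

```haskell
-- Abstract domain of the success and failure analysis.
data Sf = None | ForallSuccess | ExistsFailure | Any
  deriving (Eq, Show)

-- Abstract sequential composition; named seqSf to avoid a clash with
-- Prelude's seq.
seqSf :: Sf -> Sf -> Sf
seqSf None          _             = None
seqSf ForallSuccess None          = None
seqSf ForallSuccess ForallSuccess = ForallSuccess
seqSf ForallSuccess _             = Any
seqSf ExistsFailure _             = ExistsFailure
seqSf Any           _             = Any

-- Hypothetical helper: all iterates from None up to the fixed point.
iterates :: (Sf -> Sf) -> [Sf]
iterates f = go None
  where
    go x = let x' = f x
           in if x == x' then [x] else x : go x'

-- Abstract bodies of the recursive closures; s is the assumption about the
-- argument strategy and x the current approximation of the closure.
fullTdBody, fullBuBody :: Sf -> Sf -> Sf
fullTdBody s x = s `seqSf` x   -- argument first, then recursion
fullBuBody s x = x `seqSf` s   -- recursion first, then argument
```

For example, iterates (fullTdBody ExistsFailure) yields [None, ExistsFailure], whereas iterates (fullBuBody ForallSuccess) yields [None]: the iteration for the bottom-up scheme never leaves the bottom element because the recursive reference is the first operand of the sequence.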


-- The actual analysis
analyse :: T Sf -> Sf
analyse Id = ForallSuccess
analyse Fail = ExistsFailure
analyse (Seq s s') = analyse s `seq` analyse s'
analyse (Choice s s') = analyse s `choice` analyse s'
analyse (Var x) = x
analyse (Rec f) = fixEq (analyse . f)
analyse (All s) = analyse s
analyse (One s) = ExistsFailure

-- Equality-based fixed-point combinator
fixEq :: (Bottom x, Eq x) => (x -> x) -> x
fixEq f = iterate bottom
  where
    iterate x = let x' = f x
                in if (x==x') then x else iterate x'

-- Abstract interpretation of sequential composition
seq :: Sf -> Sf -> Sf
seq None _ = None
seq ForallSuccess None = None
seq ForallSuccess ForallSuccess = ForallSuccess
seq ForallSuccess _ = Any
seq ExistsFailure _ = ExistsFailure
seq Any _ = Any

-- Abstract interpretation of left-biased choice
choice :: Sf -> Sf -> Sf
choice ForallSuccess _ = ForallSuccess
choice _ ForallSuccess = ForallSuccess
choice None _ = None
choice _ None = None
choice _ _ = Any

Fig. 22. Abstract interpretation for analyzing the success and failure behavior of traversal programs.

Figure 23 shows the results of the analysis.

The columns are labeled by the assumption for the success and failure behavior of the argument strategy s of the traversal scheme. There is no column for None since this value is only used for fixed-point computation. There are a number of cells with the None value, which means that the analysis was not able to make any progress during the fixed-point computation. The analysis is patently useless in such cases. There are a number of cells with the Any value, which means that the analysis concluded with an imprecise result: we do not get to know anything of value about the success and failure behavior in such cases.

              s :: ForallSuccess   s :: ExistsFailure   s :: Any
full_bu s     None                 None                 None
full_td s     None                 ExistsFailure        Any
once_bu s     ForallSuccess        Any                  Any
once_td s     ForallSuccess        Any                  Any
stop_td s     ForallSuccess        None                 None
innermost s   ForallSuccess        ForallSuccess        ForallSuccess

Fig. 23. Exercising success and failure analysis on traversal schemes.

All the cells with values ForallSuccess and ExistsFailure are as expected, but overall the analysis fails to recover behavior in most cases. For instance, we know that a stop-top-down traversal is guaranteed to succeed (see §2.3), but the analysis reports None unless the argument strategy is itself assumed to succeed.
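The table can be recomputed with a compact, self-contained rendering of the analysis. This is a sketch under stated assumptions rather than the paper's code distribution: the data type T is reconstructed from the constructors used in Figure 22, the scheme definitions (fullBu, fullTd, onceBu, onceTd, stopTd, innermost, try, repeatS) are assumed to match the usual definitions of these schemes, and row is a hypothetical helper.

```haskell
import Prelude hiding (seq)

data Sf = None | ForallSuccess | ExistsFailure | Any
  deriving (Eq, Show)

-- Strategy terms; the type parameter is the type of variables.
data T x
  = Id | Fail
  | Seq (T x) (T x) | Choice (T x) (T x)
  | Var x | Rec (x -> T x)
  | All (T x) | One (T x)

analyse :: T Sf -> Sf
analyse Id            = ForallSuccess
analyse Fail          = ExistsFailure
analyse (Seq s s')    = analyse s `seq` analyse s'
analyse (Choice s s') = analyse s `choice` analyse s'
analyse (Var x)       = x
analyse (Rec f)       = fixEq (analyse . f)
analyse (All s)       = analyse s
analyse (One _)       = ExistsFailure

-- Least fixed point by iteration from the bottom element None.
fixEq :: (Sf -> Sf) -> Sf
fixEq f = go None
  where go x = let x' = f x in if x == x' then x else go x'

seq, choice :: Sf -> Sf -> Sf
seq None          _             = None
seq ForallSuccess None          = None
seq ForallSuccess ForallSuccess = ForallSuccess
seq ForallSuccess _             = Any
seq ExistsFailure _             = ExistsFailure
seq Any           _             = Any

choice ForallSuccess _             = ForallSuccess
choice _             ForallSuccess = ForallSuccess
choice None          _             = None
choice _             None          = None
choice _             _             = Any

-- Traversal schemes (assumed definitions).
try, repeatS, fullBu, fullTd, onceBu, onceTd, stopTd, innermost :: T x -> T x
try s       = Choice s Id
repeatS s   = Rec (\x -> try (Seq s (Var x)))
fullBu s    = Rec (\x -> Seq (All (Var x)) s)
fullTd s    = Rec (\x -> Seq s (All (Var x)))
onceBu s    = Rec (\x -> Choice (One (Var x)) s)
onceTd s    = Rec (\x -> Choice s (One (Var x)))
stopTd s    = Rec (\x -> Choice s (All (Var x)))
innermost s = repeatS (onceBu s)

-- One table row: the scheme under the three assumptions about s.
row :: (T Sf -> T Sf) -> [Sf]
row scheme =
  [ analyse (scheme (Var a)) | a <- [ForallSuccess, ExistsFailure, Any] ]
```

For instance, row stopTd evaluates to [ForallSuccess, None, None], which is the stop_td row of Figure 23.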

One may want to improve the abstract interpretation-based approach so that it computes more useful results. Here we note that the informal arguments in support of fallibility and infallibility of traversal schemes typically rely on induction over all possible terms. It is not straightforward to adjust abstract interpretation in a way to account for induction.

We abandon abstract interpretation for now. It turns out that a type-system-based approach provides useful results rather easily because it has fundamentally different characteristics of dealing with recursion. In §5.2, we will revisit abstract interpretation and apply it successfully to a reachability analysis for type-specific cases.

5.1.2 A type system-based approach

Let us use deduction based on a special-purpose type system to infer when a strategy can be guaranteed always to yield a value if it terminates, that is, it fails for no input. We use the type True for such a situation and False for the lack thereof.

The rules in Figure 24 describe a typing judgement such that $\Gamma \vdash s : \mathit{True}$ is intended to capture the judgement that the strategy $s$ does not fail for any argument $t$, in the context $\Gamma$. That is, there does not exist any $t$ such that $s @ t \leadsto \uparrow$, according to the semantics in Figure 4–Figure 5.

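Figure 25 rephrases Figure 24 in a directly algorithmic manner in Haskell, also providing type inference. The context $\Gamma$, which carries fallibility information about free variables in a term, is needed so that the analysis can deal with recursively-defined strategies.

$$\Gamma \vdash \mathit{id} : \mathit{True} \quad [\text{idSF}]$$

$$\Gamma \vdash \mathit{fail} : \mathit{False} \quad [\text{failSF}]$$

$$\frac{\Gamma \vdash s_1 : \mathit{True} \;\wedge\; \Gamma \vdash s_2 : \mathit{True}}{\Gamma \vdash s_1; s_2 : \mathit{True}} \quad [\text{sequ.1SF}]$$

$$\frac{\Gamma \vdash s_1 : \mathit{False} \;\wedge\; \Gamma \vdash s_2 : \tau}{\Gamma \vdash s_1; s_2 : \mathit{False}} \quad [\text{sequ.2SF}]$$

$$\frac{\Gamma \vdash s_1 : \tau \;\wedge\; \Gamma \vdash s_2 : \mathit{False}}{\Gamma \vdash s_1; s_2 : \mathit{False}} \quad [\text{sequ.3SF}]$$

$$\frac{\Gamma \vdash s_1 : \mathit{False} \;\wedge\; \Gamma \vdash s_2 : \mathit{True}}{\Gamma \vdash s_1 \mathbin{\leftarrow\!\!+} s_2 : \mathit{True}} \quad [\text{choice.1SF}]$$

$$\frac{\Gamma \vdash s_1 : \mathit{False} \;\wedge\; \Gamma \vdash s_2 : \mathit{False}}{\Gamma \vdash s_1 \mathbin{\leftarrow\!\!+} s_2 : \mathit{False}} \quad [\text{choice.2SF}]$$

$$\frac{\Gamma \vdash s_1 : \mathit{True} \;\wedge\; \Gamma \vdash s_2 : \tau}{\Gamma \vdash s_1 \mathbin{\leftarrow\!\!+} s_2 : \mathit{True}} \quad [\text{choice.3SF}]$$

$$\frac{\Gamma \vdash s : \tau}{\Gamma \vdash \Box(s) : \tau} \quad [\text{allSF}]$$

$$\frac{\Gamma \vdash s : \tau}{\Gamma \vdash \Diamond(s) : \mathit{False}} \quad [\text{oneSF}]$$

$$\frac{v : \tau, \Gamma \vdash s : \tau}{\Gamma \vdash \mu v.s : \tau} \quad [\text{recSF}]$$

Fig. 24. Typing rules for success and failure behavior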

The property of infallibility is undecidable, and hence, the type system will not identify all strategies of type True, but it is guaranteed to be sound, in that no strategy is mis-identified as being infallible by the type system when it is not.

import Control.Monad (liftM2, mplus)

-- Type expressions
type Type = Bool -- Can we conclude that there is definitely no failure?

-- Type inference
typeOf :: T Type -> Maybe Type
typeOf Id = Just True
typeOf Fail = Just False
typeOf (Seq s s') = liftM2 (&&) (typeOf s) (typeOf s')
typeOf (Choice s s') = liftM2 (||) (typeOf s) (typeOf s')
typeOf (Var x) = Just x
typeOf (Rec f) = rec f True `mplus` rec f False
typeOf (All s) = typeOf s
typeOf (One s) = typeOf s >> Just False

-- Infer type of recursive closure
rec :: (Type -> T Type) -> Type -> Maybe Type
rec f t = typeOf (f t) >>= \t' ->
  if t==t' then Just t else Nothing

Fig. 25. Type inference for success and failure behavior

When we compare this approach to the abstract interpretation-based approach, then False should be compared with Any as opposed to ExistsFailure. That is, True represents guarantee of success, while False represents lack of such a guarantee, as opposed to existence of a failure point. There is no counterpart for ExistsFailure in the type system. There is certainly no counterpart for None either, because this value is an artifact of fixed point iteration, which is not present in the type system.

With this comparison in mind, the deduction rules of Figure 24 (and the equations of Figure 25) are very similar to the equations of Figure 22. For instance, the rules for the base cases, idSF and failSF, state that the identity, id, is infallible, but that the primitive failure, fail, is not. For a sequence to be infallible, both components need to be infallible (sequ.1SF to sequ.3SF). If either component of a choice is infallible, the choice is too (choice.1SF, choice.3SF). A choice between two potentially fallible programs might well be infallible, but this analysis can only conclude that this is not guaranteed, and it is here that imprecision comes into the analysis. The type of an ‘all’ traversal coincides with the type of the argument strategy (allSF). There is no guarantee of success for a ‘one’ traversal (oneSF).

Finally, in dealing with the recursive case it is necessary to introduce a type context, $\Gamma$, containing typing assertions on variables. To conclude that a recursive definition $\mu v.s$ is infallible, it is sufficient to show that the body of the recursion, $s$, is infallible assuming that the recursive call, $v$, is too.

              s :: False   s :: True
full_bu s     False        True
full_td s     False        True
once_bu s     False        True
once_td s     False        True
stop_td s     True         True
innermost s   True         True

Fig. 26. Exercising success and failure types on traversal schemes.

Figure 26 presents the results of using the type system for some common traversals. Again, the columns label the assumption for the success and failure behavior of the argument strategy s of the traversal scheme. (We use the context parameter of the type system, or, in fact, the Var form of strategy terms, to capture and propagate such assumptions; see the paper’s online code distribution.) When compared to Figure 23, guarantee of success is inferred for several more cases. For instance, such a guarantee is inferred for the schemes of full top-down and bottom-up traversal, subject to the guarantee for the argument. Also, the scheme for stop-top-down traversal is found to universally succeed, no matter what the argument strategy. The abstract interpretation-based approach could not make a useful prediction for these cases.
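The entries of the table can be recomputed with a self-contained version of the type inference. The following is a sketch under assumptions, not the online code distribution: the data type T is reconstructed from the constructors used in the figures, the scheme definitions are assumed to match the usual definitions of these schemes, recType is a renamed copy of the figure's rec, and row is a hypothetical helper.

```haskell
import Control.Monad (liftM2, mplus)

-- Strategy terms; the type parameter is the type of variables.
data T x
  = Id | Fail
  | Seq (T x) (T x) | Choice (T x) (T x)
  | Var x | Rec (x -> T x)
  | All (T x) | One (T x)

-- True: the strategy is guaranteed not to fail; False: no such guarantee.
type Type = Bool

typeOf :: T Type -> Maybe Type
typeOf Id            = Just True
typeOf Fail          = Just False
typeOf (Seq s s')    = liftM2 (&&) (typeOf s) (typeOf s')
typeOf (Choice s s') = liftM2 (||) (typeOf s) (typeOf s')
typeOf (Var x)       = Just x
typeOf (Rec f)       = recType f True `mplus` recType f False
typeOf (All s)       = typeOf s
typeOf (One s)       = typeOf s >> Just False

-- Check a guessed type for the recursive reference against the body.
recType :: (Type -> T Type) -> Type -> Maybe Type
recType f t = typeOf (f t) >>= \t' -> if t == t' then Just t else Nothing

-- Traversal schemes (assumed definitions).
try, repeatS, fullBu, fullTd, onceBu, onceTd, stopTd, innermost :: T x -> T x
try s       = Choice s Id
repeatS s   = Rec (\x -> try (Seq s (Var x)))
fullBu s    = Rec (\x -> Seq (All (Var x)) s)
fullTd s    = Rec (\x -> Seq s (All (Var x)))
onceBu s    = Rec (\x -> Choice (One (Var x)) s)
onceTd s    = Rec (\x -> Choice s (One (Var x)))
stopTd s    = Rec (\x -> Choice s (All (Var x)))
innermost s = repeatS (onceBu s)

-- One table row: the scheme under the assumptions s :: False and s :: True.
row :: (T Type -> T Type) -> [Maybe Type]
row scheme = [ typeOf (scheme (Var a)) | a <- [False, True] ]
```

For instance, row stopTd evaluates to [Just True, Just True]: the guess True for the recursive reference is confirmed by the body regardless of the assumption about s.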

5.1.3 Simple dead-code detection

As an aside, there is actually a trivial means to improve the usefulness of the type system. That is, we can easily exclude certain strategies that involve dead code in the sense of §3.6. Specifically, we could remove the following rule:

$$\frac{\Gamma \vdash s_1 : \mathit{True} \;\wedge\; \Gamma \vdash s_2 : \tau}{\Gamma \vdash s_1 \mathbin{\leftarrow\!\!+} s_2 : \mathit{True}} \quad [\text{choice.3SF}]$$

In this way, we classify a choice construct $s_1 \mathbin{\leftarrow\!\!+} s_2$ with an infallible left operand as ill-typed, the point being that the composition is equivalent to $s_1$, with $s_2$ being dead code.
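In the algorithmic formulation, dropping the rule amounts to a different equation for choice. The following sketch shows one way this could look; typeOfStrict, recStrict, try, and stopTd are hypothetical names, the data type T is reconstructed from the constructors used in the figures, and Nothing is read as ‘ill-typed’.

```haskell
import Control.Monad (liftM2, mplus)

data T x
  = Id | Fail
  | Seq (T x) (T x) | Choice (T x) (T x)
  | Var x | Rec (x -> T x)
  | All (T x) | One (T x)

-- Variant of type inference without rule choice.3SF: a choice with an
-- infallible left operand is rejected because its right operand is dead.
typeOfStrict :: T Bool -> Maybe Bool
typeOfStrict Id            = Just True
typeOfStrict Fail          = Just False
typeOfStrict (Seq s s')    = liftM2 (&&) (typeOfStrict s) (typeOfStrict s')
typeOfStrict (Choice s s') = do
  left  <- typeOfStrict s
  right <- typeOfStrict s'
  -- Only choice.1SF and choice.2SF remain: the left operand must be False,
  -- and the type of the choice is then the type of the right operand.
  if left then Nothing else Just right
typeOfStrict (Var x)       = Just x
typeOfStrict (Rec f)       = recStrict f True `mplus` recStrict f False
typeOfStrict (All s)       = typeOfStrict s
typeOfStrict (One s)       = typeOfStrict s >> Just False

recStrict :: (Bool -> T Bool) -> Bool -> Maybe Bool
recStrict f t =
  typeOfStrict (f t) >>= \t' -> if t == t' then Just t else Nothing

-- Assumed scheme definitions used in the examples below.
try, stopTd :: T x -> T x
try s    = Choice s Id
stopTd s = Rec (\x -> Choice s (All (Var x)))
```

Under these assumptions, typeOfStrict (try Fail) is Just True, whereas typeOfStrict (try Id) is Nothing. Likewise, a stop-top-down traversal with an infallible argument, stopTd (Var True), is rejected, since the traversal would never descend below the root.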

5.1.4 Soundness of the type system

We prove soundness of the type system in Figure 24 relative to the established, natural semantics in Figure 4–Figure 5.


Theorem 1 For all strategic programs $s$, if $\vdash s : \mathit{True}$ then there is no term $t$ such that $s @ t \leadsto \uparrow$.

Proof We use a proof by contradiction. We suppose that there is some program $s$ such that $\vdash s : \mathit{True}$ and that there is an argument $t$ so that $s @ t \leadsto \uparrow$, and we choose $s$ and $t$ so that the depth of the derivation of $s @ t \leadsto \uparrow$ is minimal; from this we derive a contradiction. We work by cases over $s$.

Identity If $s$ is $\mathit{id}$ then there is no evaluation rule deriving $\mathit{id} @ t \leadsto \uparrow$ for any $t$, contradicting the hypothesis.

Failure If $s$ is $\mathit{fail}$ then there is no typing rule deriving $\vdash \mathit{fail} : \mathit{True}$, contradicting the hypothesis.

Sequence If $s$ is $s_1; s_2$ then the only way that $\vdash s_1; s_2 : \mathit{True}$ can be derived is for the typing rule sequ.1SF to be applied to derivations of $\vdash s_1 : \mathit{True}$ and $\vdash s_2 : \mathit{True}$.

Now, by hypothesis we also have a term $t$ so that $s_1; s_2 @ t \leadsto \uparrow$: examining the evaluation rules we see that this can only be deduced from $s_1 @ t \leadsto \uparrow$ by rule seq⁻.1 or from $s_2 @ t \leadsto \uparrow$ by rule seq⁻.2.

We choose $i$ such that $s_i @ t \leadsto \uparrow$; the corresponding derivation is shorter than that of $s @ t \leadsto \uparrow$, a contradiction to the minimality of the derivation for $s$.

Choice If $s$ is $s_1 \mathbin{\leftarrow\!\!+} s_2$ then there are two ways that $\vdash s_1 \mathbin{\leftarrow\!\!+} s_2 : \mathit{True}$ can be derived: using choice.3SF from a derivation of $\vdash s_1 : \mathit{True}$ or using choice.1SF from a derivation of $\vdash s_2 : \mathit{True}$.

Now, by hypothesis we also have a term $t$ so that $s_1 \mathbin{\leftarrow\!\!+} s_2 @ t \leadsto \uparrow$: examining the evaluation rules we see that this can only be deduced from $s_1 @ t \leadsto \uparrow$ and $s_2 @ t \leadsto \uparrow$ by rule choice⁻.

We choose $s_i$ to be the case where $\vdash s_i : \mathit{True}$. Whichever we choose, the derivation of $s_i @ t \leadsto \uparrow$ is shorter than that of $s @ t \leadsto \uparrow$, a contradiction to the minimality of the derivation for $s$.

All If $s$ is $\Box(s')$ then $\vdash \Box(s') : \mathit{True}$ is derived from $\vdash s' : \mathit{True}$. From the negative rules for evaluation we conclude that $t$ is of the form $c(t_1, \ldots, t_n)$ and for some $i$ we have $s' @ t_i \leadsto \uparrow$, and the derivation of this will be shorter than that of $s @ t \leadsto \uparrow$, in contradiction to the hypothesis.

One If $s$ is $\Diamond(s')$ then $\vdash \Diamond(s') : \mathit{True}$ cannot be derived, directly contradicting the hypothesis.

Recursion Finally we look at the case that $s$ is of the form $\mu v.s'$. We have a derivation ($d_1$, say) of $\vdash \mu v.s' : \mathit{True}$, and this is constructed by applying rule recSF to a derivation $d_2$ of $v : \mathit{True} \vdash s' : \mathit{True}$.

We also have the argument $t$ so that $\mu v.s' @ t \leadsto \uparrow$. This in turn is derived from a derivation of $s'[v \mapsto \mu v.s'] @ t \leadsto \uparrow$, shorter than the former. So, $s'[v \mapsto \mu v.s']$ will be our counterexample to the minimality of $\mu v.s'$, so long as we can derive $\vdash s'[v \mapsto \mu v.s'] : \mathit{True}$.

We construct a derivation of this from $d_2$, replacing each occurrence in $d_2$ of the variable rule applied to $v : \mathit{True}$ by a copy of the derivation $d_1$, which establishes that the value substituted for $v$, $\mu v.s'$, has the type True, thus giving a derivation of $\vdash s'[v \mapsto \mu v.s'] : \mathit{True}$, as required to prove the contradiction.

     Strategy                  Root type    Reachable type-specific cases
1.   Id                        Company      ∅
2.   incSalary                 Salary       {incSalary}
3.   try incSalary             Employee     ∅
4.   All (try incSalary)       Employee     {incSalary}
5.   All (try incSalary)       Department   ∅
6.   once_bu (try incSalary)   Department   {incSalary}

Fig. 27. Exercising the reachability analysis for companies.

5.2 Perform reachability analysis

Advice 7 Curb programming errors due to type-specific cases not exercised during traversal (see §3.6) by statically analyzing reachability of the cases within strategies that are applied to a term of a statically known type. Such dead code detection does not require programmer-provided reachability contracts; instead it is a general analysis of strategy applications.

For instance, using again the introductory company example, we would like to obtain the kind of information in Figure 27 by a reachability analysis. In this example, we assume one type-specific case, incSalary, which is used in the traversal program subject to the analysis. The case increases salaries and we assume that it is applicable to salary terms only.

Let us motivate some of the expected results in detail. When applying the strategy Id (see the first line of the figure), which clearly does not involve any type-specific case, we obtain the empty set of reachable cases. When applying the type-specific incSalary to a salary term (second line), then the case is indeed applied; hence the result is {incSalary}. We cannot usefully apply incSalary to an employee (third line); hence, we obtain the empty set of reachable cases. We may though apply All (try incSalary) to an employee (fourth line) because a salary may be encountered as an immediate subterm of an employee. The last two lines in the table illustrate the reachability behavior of a traversal scheme, in comparison to a one-layer traversal.

The abstract interpretation relies on many-sorted signatures for the terms to be traversed, as shown in Figure 28; we omit the straightforward definition of various constructors and observers. In the code for the abstract interpretation for reachability, we will only use the observer sorts of type Signature -> Set Sort, to retrieve all possible sorts of a signature, and the observer argSortsOfSort, to retrieve all sorts of immediate subterms for all possible terms of a given sort.

-- Representation of signatures
type Sort = String
type Constr = String
type Symbol = (Constr, [Sort], Sort)
data Signature = Signature { sorts :: Set Sort
                           , symbols :: Set Symbol
                           }

-- Additional observer functions
argSortsOfSort :: Signature -> Sort -> Set Sort
...

Fig. 28. An abstract data type for signatures.

For simplicity’s sake, we formally represent type-specific cases simply by their name. Reachability analysis returns sets of such cases (say, names or strings). Hence we define:

type Case = String
type Cases = Set Case

For each case, we need to capture its ‘sort’. A type-specific case should be counted as being exercised only when it is applied to a term of the designated sort. More generally, the abstract interpretation computes which type-specific cases are exercised by a given strategy when faced with terms of different sorts. To this end, we use the following abstract domain:

type Abs = Map Sort Cases

Here, Map is a Haskell type for finite maps: sorts are associated with type-specific cases. The analysis associates each strategy with such a map, as is evident from the following function signature:

analyse :: Signature -> T Abs -> Abs

That is, for each Sort in the given signature the analysis is supposed to return a set of (named) type-specific cases which may be executed by the traversal if the traversal is applied to a term of the sort. For instance:

> let incSalary = fromList [("Salary", Set.fromList ["incSalary"])]
> analyse companySignature (All (All (Var incSalary)))
[("Unit",fromList ["incSalary"]),("Manager",fromList ["incSalary"])]

analyse :: Signature -> T Abs -> Abs
analyse sig = analyse'
  where
    analyse' :: T Abs -> Abs
    analyse' Id = bottom
    analyse' Fail = bottom
    analyse' (Seq s s') = analyse' s `lub` analyse' s'
    analyse' (Choice s s') = analyse' s `lub` analyse' s'
    analyse' (Var x) = x
    analyse' (Rec f) = fixEq (analyse' . f)
    analyse' (All s) = transform sig $ analyse' s
    analyse' (One s) = transform sig $ analyse' s

transform :: Signature -> Abs -> Abs
transform sig abs
  = Map.fromList
  $ map perSort
  $ Set.toList
  $ sorts sig
  where
    perSort :: Sort -> (Sort, Cases)
    perSort so = (so,cases)
      where
        cases = lubs
              $ map perArgSort
              $ Set.toList argSorts
          where
            argSorts = argSortsOfSort sig so
            perArgSort = flip Map.lookup abs

Fig. 29. Abstract interpretation for analyzing the reachability of type-specific cases relative to a given signature.

The first input line shows the assembly of a type-specific case which is represented here as a trivial map of type Abs. The second input line starts a reachability analysis. The printed map for the result of the analysis states that incSalary can be reached from both Unit and Manager. Indeed, salary components occur exactly two constructor levels below Unit and Manager.

The analysis is safe in that it is guaranteed to return all cases which are executed on some input; it is however an over-approximation, and so no guarantee is provided that all returned cases are actually executed.


The analysis proceeds by induction over the structure of strategies, and is parametrized by the Signature over which the strategy is evaluated. The analysis crucially relies on the algebraic status of finite maps to define partial orders with general least upper bounds subject to the co-domain of the maps being a lattice itself. (We use the set of all subsets of type-specific cases as the co-domain.) The bottom value of this partial order is the map that maps all values of the domain to the bottom value of the co-domain. Here we note that such maps are an established tool in static program analysis. For instance, maps may be used in the analysis of imperative programs for property states as opposed to concrete states for program variables as in [54,55].

The central part of the analysis is the treatment of the one-layer traversals All s and One s. The reachable cases are determined separately for each possible sort, and these per-sort results are finally combined in a map. For each given sort so, the recursive call of the analysis, analyse' s, is exercised for all possible sorts of immediate subterms of terms of sort so. Fixed point iteration will eventually reach all reachable sorts in this manner.
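The analysis can be exercised end to end with a simplified, self-contained rendering. Everything beyond the names Sort, Case, Cases, Abs, and analyse is an assumption made for this sketch: the signature is represented directly as a map from each sort to the sorts of its immediate subterms (instead of the paper's Signature type), companySignature is a guess at the shape of the company example with explicit list sorts, and reachable, nonEmptySorts, try, and onceBu are hypothetical helpers.

```haskell
import qualified Data.Map as Map
import qualified Data.Set as Set

type Sort  = String
type Case  = String
type Cases = Set.Set Case
type Abs   = Map.Map Sort Cases

-- Simplified signature (assumption): sort -> sorts of immediate subterms.
type Signature = Map.Map Sort (Set.Set Sort)

companySignature :: Signature
companySignature = Map.fromList
  [ (so, Set.fromList args)
  | (so, args) <-
      [ ("Company",     ["Departments"])
      , ("Departments", ["Department", "Departments"])
      , ("Department",  ["Name", "Manager", "Units"])
      , ("Manager",     ["Employee"])
      , ("Units",       ["Unit", "Units"])
      , ("Unit",        ["Employee", "Department"])
      , ("Employee",    ["Name", "Salary"])
      , ("Name",        [])
      , ("Salary",      [])
      ]
  ]

data T x
  = Id | Fail
  | Seq (T x) (T x) | Choice (T x) (T x)
  | Var x | Rec (x -> T x)
  | All (T x) | One (T x)

analyse :: Signature -> T Abs -> Abs
analyse sig = go
  where
    bottom = Map.map (const Set.empty) sig   -- every sort maps to no cases
    lub    = Map.unionWith Set.union         -- pointwise union
    go :: T Abs -> Abs
    go Id            = bottom
    go Fail          = bottom
    go (Seq s s')    = go s `lub` go s'
    go (Choice s s') = go s `lub` go s'
    go (Var x)       = x
    go (Rec f)       = fixEq (go . f)
    go (All s)       = transform (go s)
    go (One s)       = transform (go s)
    -- Least fixed point by iteration from bottom.
    fixEq f = iter bottom
      where iter x = let x' = f x in if x == x' then x else iter x'
    -- Cases reachable one constructor level below each sort.
    transform abs' = Map.map perSort sig
      where
        perSort args = Set.unions
          [ Map.findWithDefault Set.empty a abs' | a <- Set.toList args ]

-- Helpers for the examples (hypothetical).
incSalary :: Abs
incSalary = Map.fromList [("Salary", Set.fromList ["incSalary"])]

try, onceBu :: T x -> T x
try s    = Choice s Id
onceBu s = Rec (\x -> Choice (One (Var x)) s)

reachable :: T Abs -> Sort -> [Case]
reachable s so =
  Set.toList (Map.findWithDefault Set.empty so (analyse companySignature s))

nonEmptySorts :: T Abs -> [Sort]
nonEmptySorts s =
  Map.keys (Map.filter (not . Set.null) (analyse companySignature s))
```

With this signature, nonEmptySorts (All (All (Var incSalary))) evaluates to ["Manager","Unit"], and reachable (onceBu (try (Var incSalary))) "Department" evaluates to ["incSalary"], in line with the last row of Figure 27. Unlike the printed result in the text, the map computed here also contains entries with empty sets for the remaining sorts.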

The analysis is conservative in that it distinguishes neither Seq from Choice nor All from One in any manner. Also, the analysis assumes all constructors to be universally feasible, which is generally not the case due to type-specific cases and their recursive application (think of the patterns in rewrite rules). As a result, certain reachability-related programming errors will go unnoticed. Consider the following example:

stop_td (Choice leanDepartment (try incSalary)) myCompany

For the sake of a concrete intuition, we assume that leanDepartment will pension off all eligible employees, if any. Here we assume that leanDepartment applies to terms of sort Department and it succeeds for all such terms. As a result of leanDepartment’s success at the department level of company terms, the stop-top-down traversal will never actually hit employees or salaries that are only to be found below departments.

This illustration clearly suggests that a success and failure analysis should be incorporated into the reachability analysis for precision’s sake. To this end, it would be beneficial to know whether a type-specific case universally succeeds for all terms of the sort in question. Further, the analysis should treat sequence differently from choice, and an ‘all’ traversal differently from a ‘one’ traversal. We omit such elaborations here.

5.3 Perform termination analysis

Advice 8 Curb various kinds of programming errors that may manifest themselves as divergent traversals. These include wrong decisions regarding the traversal scheme and misunderstood properties of rewrite rules. To this end, perform a static termination analysis that leverages appropriate measures in proving termination of recursive traversals.

  interpret (full_bu s) t
→ interpret (Rec (\x -> Seq (All (Var x)) s)) t
→ interpret (Seq (All (Var f)) s) t
→ interpret (All (Var f)) t >>= interpret s

Recursive closure is applied to a term t with a given measure.

Recursive reference is applied to subterms with smaller depth.

Fig. 30. Illustration of termination checking.

That is, we seek an analysis that determines, conservatively, whether a given recursive strategy is guaranteed to converge. For instance, the intended analysis should infer the following convergence properties. A full bottom-up traversal converges regardless of the argument strategy, as long as the argument strategy itself converges universally. A full top-down traversal converges as long as the argument strategy converges universally and does not increase some suitable measure such as the depth of the term.

Based on experiences with modeling strategies in Isabelle/HOL [30], our analysis for termination checking essentially leverages an induction principle; see Figure 30 for an illustration. That is, the analysis is meant to verify that a measure of the term (such as the depth of the term), as seen by the recursive closure as a whole, is decreased throughout the body of the recursive closure until the recursive reference is invoked. The figure shows a few steps of interpretation when applying a full bottom-up traversal to a term. The recursive reference is eventually applied to immediate subterms of the original term. Hence, the mere depth measure of terms is sufficient here for the induction principle.
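The induction principle can be made concrete on a simple term representation. The sketch below is illustrative only: Term, depth, children, decreasesOnChildren, and the non-monadic fullBu are hypothetical definitions that model an infallible argument strategy as a total function.

```haskell
-- Hypothetical untyped terms: a constructor name with immediate subterms.
data Term = Term String [Term] deriving (Eq, Show)

-- Depth measure: a constant has depth 1.
depth :: Term -> Int
depth (Term _ ts) = 1 + maximum (0 : map depth ts)

-- Immediate subterms, i.e., the positions visited by an 'all' traversal.
children :: Term -> [Term]
children (Term _ ts) = ts

-- Induction step: every immediate subterm is strictly shallower than the
-- term itself, so a recursive reference applied below 'all' sees a smaller
-- measure.
decreasesOnChildren :: Term -> Bool
decreasesOnChildren t = all (\c -> depth c < depth t) (children t)

-- Full bottom-up traversal by structural recursion; the recursive call is
-- applied to immediate subterms only, before the argument s is applied.
fullBu :: (Term -> Term) -> Term -> Term
fullBu s (Term c ts) = s (Term c (map (fullBu s) ts))
```

Because fullBu recurses before applying s, the recursion is structural even if s increases the depth of its result; a top-down variant would apply s first and then recurse into whatever s produced, which is why it additionally requires that s does not increase the measure.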

We will first develop a basic termination analysis that leverages the depth measure. However, many traversals in strategic programming cannot be proven to converge just on the grounds of depth. For instance, a program transformation for macro expansion could be reasonably implemented with a full top-down traversal, but such a traversal clearly increases term depth. Accordingly, we will generalize the analysis to deal with measures other than depth.


5.3.1 Measure relations on terms and strategies

The key concept of the termination analysis is a certain style of manipulating measures symbolically. We will explain this concept here for the depth measure for ease of understanding, but the style generalizes easily for other measures.

Based on the intuition of Figure 30, the analysis must track the depth measure of terms throughout the symbolic execution of a strategy so that the depth can be tested at the point of recursive reference. That is, the depth must be shown to be smaller at the point of recursive reference than at the point of entering the recursive closure. In this manner, we establish that the depth measure is smaller for each recursive unfolding, which implies convergence.

Obviously, the analysis cannot use actual depth, but it needs to use an abstract domain. We use a finite domain Rel whose values describe the relation between the depths of two terms: i) the term in a given position of the body of a recursive closure; ii) the term at the entrance of the recursive closure. The idea is that the relation is initialized to ‘≤’ (or ‘=’ if the abstract domain provided this option) as we enter the recursive closure, and it is accordingly updated as we symbolically interpret the body of the closure. Ultimately, we are interested in the relation at the point of recursive reference. These are the values of Rel; see Figure 31 for a full specification:

-- Relation on measures
data Rel = Leq | Less | Any

One can think of the values as follows. The value Leq models that the depth of the term was not increased during the execution of the body of the recursive closure so far. The value Less models that the depth of the term was strictly decreased instead. This is the relation that must hold at the point of the recursive reference. The value Any models that we do not know for certain whether the depth was preserved, increased, or decreased.

So far we have emphasized a depth relation for terms. However, the analysis also relies on depth relations for strategies. That is, we can use the same domain Rel to describe the effect of a strategy on the depth. In this case, one can think of the values as follows. The value Leq models that the strategy does not increase the depth of the term. The value Less models that the strategy strictly decreases the depth of the term. The value Any models that we do not know for certain whether the strategy preserves, increases, or decreases depth.

For clarity, we use these type synonyms:

type TRel = Rel -- Term property
type SRel = Rel -- Strategy property

-- Relation on measures
data Rel = Leq | Less | Any
  deriving (Eq, Show) -- Eq is needed for the uses of (==) below

-- Partial order and LUB for Rel
instance POrd Rel
 where
  _ <= Any = True
  Less <= Leq = True
  r <= r' = r == r'

instance Lub Rel
 where
  lub Less Leq = Leq
  lub Leq Less = Leq
  lub r r' = if r == r' then r else Any

-- Rel arithmetic
plus :: Rel -> Rel -> Rel
plus Less Less = Less
plus Less Leq = Less
plus Leq Less = Less
plus Leq Leq = Leq
plus _ _ = Any

decrease :: Rel -> Rel
decrease Leq = Less
decrease Less = Less
decrease Any = Any

increase :: Rel -> Rel
increase Less = Leq
increase Leq = Any
increase Any = Any

Fig. 31. Relation on measures such as depth of terms.
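The arithmetic of Figure 31 can be exercised in isolation. The following sketch repeats `plus`, `decrease`, and `increase` so that it is self-contained; the names `roundTrip` and `roundTrips` are hypothetical helpers introduced here only to show what happens when a term property is decreased on entering a subterm position and increased again on leaving it.

```haskell
-- Abstract domain of Figure 31, repeated so that the sketch stands alone.
data Rel = Leq | Less | Any
  deriving (Eq, Show)

-- Combine a term property with the effect of a strategy.
plus :: Rel -> Rel -> Rel
plus Less Less = Less
plus Less Leq = Less
plus Leq Less = Less
plus Leq Leq = Leq
plus _ _ = Any

-- Moving to a subterm: Leq becomes Less; Any stays Any.
decrease :: Rel -> Rel
decrease Leq = Less
decrease r = r

-- Moving back to the parent: Less becomes Leq; everything else is Any.
increase :: Rel -> Rel
increase Less = Leq
increase _ = Any

-- Hypothetical helper: enter a subterm position and leave it again.
roundTrip :: Rel -> Rel
roundTrip = increase . decrease

-- The round trip for every value of the domain.
roundTrips :: [(Rel, Rel)]
roundTrips = [(r, roundTrip r) | r <- [Less, Leq, Any]]
```

Here `roundTrips` evaluates to `[(Less,Leq),(Leq,Leq),(Any,Any)]`: a round trip never yields better than Leq, and Any is absorbing, as it also is for `plus`.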

We set up the type of the analysis as follows:

type Abs = TRel -> Maybe TRel
analyse :: T SRel -> Abs

In fact, we would like to use variables of the T type again to capture effects for arguments of traversal schemes. Hence, we should distinguish recursive references from other references; we use an extra Boolean to this end; True encodes recursive references. Thus:

analyse :: T (SRel,Bool) -> Abs

At the top level, we begin analyzing a strategy expression (presumably a recursive closure) by assuming a TRel value of Leq. Also, we effectively provide type inference in that we compute an SRel value for the given strategy expression. Thus:

typeOf :: T (SRel,Bool) -> Maybe SRel
typeOf s = analyse s Leq

5.3.2 Termination analysis with the depth measure

The analysis is defined in Figure 32. The analysis can be viewed as an algorithmically applicable type system which tracks effects on measures as types. Just in the same way as the standard semantics threads a term through evaluation, this analysis threads a term property of type TRel through symbolic evaluation. The case for Id preserves the property (because Id preserves the depth, in fact, the term). The case for Fail decreases depth in a vacuous sense. The case for Seq sequentially composes the effects for the operand strategies. The case for Choice takes the least upper bound of the effects for the operand strategies. For instance, if one operand possibly increases depth, then the composed strategy is stipulated to potentially increase depth.

The interesting cases are those for variables (including the case of recursive references), recursive closures and one-layer traversal. The case for Var checks whether we face a recursive reference (i.e., b == True) because in this case we must insist on the current TRel value to be Less. If this precondition holds, then the strategy property for the variable is combined with the current term property using the plus operation (say, addition) for type Rel.

The case for Rec essentially resets the TRel value to Leq; see analyse ... Leq, and attempts the analysis of the body for all possible assumptions about the effect of the recursive references; see [Less,Leq,Any]. If the analysis returns with any computed effect, then this result is required to be less or equal to the one assumed for the recursive reference; see ‘<=’. The ordering on the attempts implies that the least constraining type is inferred (i.e., the largest value of SRel).

Finally, the cases for All and One are handled identically as follows. The term property is temporarily decreased as the argument strategy is symbolically evaluated, and the resulting term property is again increased on the way out. This models the fact that argument strategies of ‘all’ and ‘one’ are only applied to subterms. It is important to understand that the operations increase and decrease are highly constrained. In particular, once the TRel value has reached

analyse :: T (SRel, Bool) -> Abs
analyse Id = Just
analyse Fail = const (Just Less)
analyse (Seq s s') = maybe Nothing (analyse s') . analyse s

analyse (Choice s s')
  = \r ->
      case (analyse s r, analyse s' r) of
        (Just r1, Just r2) -> Just (lub r1 r2)
        _ -> Nothing

analyse (Var (r,b))
  = \r' ->
      if not b || r' < Leq
        then Just (plus r' r)
        else Nothing

analyse (Rec f)
  = \r -> maybe Nothing (Just . plus r) (typeOfClosure r)
  where
    typeOfClosure r = if null attempts
                        then Nothing
                        else Just (head attempts)
      where
        attempts = catMaybes (map wtClosure' [Less,Leq,Any])
        wtClosure r = maybe False (<=r) (analyse (f (r,True)) Leq)
        wtClosure' r = if wtClosure r then Just r else Nothing

analyse (All s) = transform (analyse s)
analyse (One s) = transform (analyse s)

transform :: Abs -> Abs
transform f r = maybe Nothing (Just . increase) (f (decrease r))

Fig. 32. A static analysis for termination checking relative to the depth measure for terms.

Any, decrease does not get us back onto a termination-proven path.

The analysis is able to infer termination types for a range of interesting traversal scenarios; see Figure 33. The table clarifies that a full bottom-up traversal makes no assumption about the measure effect of the argument strategy, while a full top-down traversal does not get assigned a termination type for an unconstrained argument; see the occurrence of Nothing.

The depth measure is practically useless for typical applications of repeat, but

              s :: Any    s :: Leq    s :: Less

full_bu s     Just Any    Just Leq    Just Less

full_td s     Nothing     Just Leq    Just Leq

stop_td s     Just Any    Just Leq    Just Leq

once_bu s     Just Any    Just Leq    Just Leq

repeat s      Nothing     Nothing     Just Leq

innermost s   Nothing     Nothing     Nothing

Fig. 33. Exercising the termination analysis on traversal schemes.

we can observe nevertheless that an application of repeat will only terminate if its argument is ‘strictly decreasing’. This property is useful once we take into account other measures.

At this point, we are not yet able to find any terminating use case for innermost. This is not surprising because innermost composes repeat and once_bu, where the former requires a strictly decreasing strategy (i.e., Less), and the latter can only be type-checked with Leq as the termination type. Compound measures, as discussed below, come to the rescue.
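The entries of Figure 33 can be recomputed with a compact, self-contained sketch of the depth-based analysis. Several parts are assumptions made for this illustration: the strategy syntax `T` (with higher-order abstract syntax for `Rec`), the definitions of the traversal schemes, and the functions `leq` and `lt`, which stand in for the `<=` and `<` of the POrd class to avoid clashing with the Prelude. The helpers `arg` and `row` are hypothetical.

```haskell
import Data.Maybe (mapMaybe)

data Rel = Leq | Less | Any deriving (Eq, Show)

-- Stand-ins for the partial order on Rel (assumed names).
leq, lt :: Rel -> Rel -> Bool
leq _ Any = True
leq Less Leq = True
leq r r' = r == r'
lt r r' = leq r r' && r /= r'

lub, plus :: Rel -> Rel -> Rel
lub Less Leq = Leq
lub Leq Less = Leq
lub r r' = if r == r' then r else Any
plus Less Less = Less
plus Less Leq = Less
plus Leq Less = Less
plus Leq Leq = Leq
plus _ _ = Any

decrease, increase :: Rel -> Rel
decrease Leq = Less
decrease r = r
increase Less = Leq
increase _ = Any

-- Assumed strategy syntax; Rec binds the recursive reference.
data T x = Id | Fail | Seq (T x) (T x) | Choice (T x) (T x)
         | Var x | Rec (x -> T x) | All (T x) | One (T x)

type S = T (Rel, Bool)
type Abs = Rel -> Maybe Rel

analyse :: S -> Abs
analyse Id r = Just r
analyse Fail _ = Just Less
analyse (Seq s s') r = analyse s r >>= analyse s'
analyse (Choice s s') r = lub <$> analyse s r <*> analyse s' r
analyse (Var (r, b)) r'
  | not b || r' `lt` Leq = Just (plus r' r) -- recursion needs Less
  | otherwise = Nothing
analyse (Rec f) r =
  case mapMaybe wt [Less, Leq, Any] of
    [] -> Nothing
    (t : _) -> Just (plus r t)
  where
    -- An assumption a is accepted if the body's effect stays below it.
    wt a = case analyse (f (a, True)) Leq of
      Just t | t `leq` a -> Just a
      _ -> Nothing
analyse (All s) r = increase <$> analyse s (decrease r)
analyse (One s) r = increase <$> analyse s (decrease r)

typeOf :: S -> Maybe Rel
typeOf s = analyse s Leq

-- Traversal schemes (assumed standard definitions).
fullBU, fullTD, stopTD, onceBU, repeatS, innermost :: S -> S
fullBU s = Rec (\x -> Seq (All (Var x)) s)
fullTD s = Rec (\x -> Seq s (All (Var x)))
stopTD s = Rec (\x -> Choice s (All (Var x)))
onceBU s = Rec (\x -> Choice (One (Var x)) s)
repeatS s = Rec (\x -> Choice (Seq s (Var x)) Id)
innermost s = repeatS (onceBU s)

-- An argument strategy with a declared effect (non-recursive variable).
arg :: Rel -> S
arg r = Var (r, False)

-- One table row: the scheme applied to s :: Any, s :: Leq, s :: Less.
row :: (S -> S) -> [Maybe Rel]
row scheme = [typeOf (scheme (arg r)) | r <- [Any, Leq, Less]]
```

Under these assumptions, `row fullBU` evaluates to `[Just Any,Just Leq,Just Less]` and `row innermost` to `[Nothing,Nothing,Nothing]`, in agreement with the corresponding rows of Figure 33.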

5.3.3 Termination analysis with compound measures

The static analysis can be elaborated to deal with measures other than depth. In this paper, we demonstrate a measure of the number of occurrences of a specific constructor in a term, i.e., the constructor count. This sort of measure is applicable to transformation scenarios where a specific constructor is systematically eliminated, e.g., in the sense of macro expansion.

This approach could also be generalized to deal with a measure for the number of matches for a specific pattern in a term, such as the LHS of a rewrite rule. This elaboration is not discussed any further though.

The general idea is to associate strategy expressions and argument strategies of schemes in a traversal program with a suitable measure. In this paper, we require that the programmer provides the measure, but ultimately such measures may be inferred. We need a new type, Measure, to represent compound measures in a list-like structure. Here, we assume that the depth measure always appears in the last (least significant) position.

data Measure = Depth | Count Constr Measure
type Constr = String

The type of measure transformation and the signature of the program analysismust be changed such that we use non-empty lists of values of type TRel

              s :: [Less,Any]

full_td s     Just [Less,Any]

once_bu s     Just [Less,Any]

repeat s      Just [Leq,Any]

innermost s   Just [Leq,Any]

Fig. 34. Exercising compound measures.

as opposed to singletons before, thereby accounting for compound measures. Thus:

type Abs = [TRel] -> Maybe [TRel]
analyse :: T ([SRel],Bool) -> Abs

The initial list of type [TRel] is trivially obtained from (the length of) the measure by a function like this:

leqs Depth = [Leq]
leqs (Count _ m) = Leq : leqs m

Thus, the analysis computes the effects of the strategy while assuming that the declared measure holds for the argument. At this level of development, the analysis is oblivious to the actual constructor names because T does not involve any expressiveness for dealing with specific constructors. Measure claims are to be verified for given rewrite rules that serve as arguments, but this part is omitted here.

The power of such termination types is illustrated in Figure 34. The new type for full_td shows that we do not rely on the argument strategy to be non-increasing on the depth; we may as well use a depth-increasing argument, as long as it is non-increasing on some constructor count. The new type for once_bu is strictly decreasing, and hence, its iteration with repeat results in a termination type for innermost. We refer to the paper’s online code distribution for details.
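The compound analysis itself is part of the online code distribution and is not reproduced here. The following sketch only illustrates the data involved: the Measure type, the initial list computed by `leqs`, and a hypothetical helper `lexDecreasing`. That helper encodes one plausible lexicographic reading of ‘strictly decreasing’ for compound measures (most significant component first); it is an assumption made for this illustration rather than the definition used by the paper.

```haskell
data Rel = Leq | Less | Any deriving (Eq, Show)

type Constr = String

-- Compound measure; Depth is always the last, least significant component.
data Measure = Depth | Count Constr Measure

-- Initial term property: one Leq per component of the measure.
leqs :: Measure -> [Rel]
leqs Depth = [Leq]
leqs (Count _ m) = Leq : leqs m

-- Assumed lexicographic reading: the first component that is not Leq
-- decides, and only Less at that position counts as a strict decrease.
lexDecreasing :: [Rel] -> Bool
lexDecreasing (Leq : rs) = lexDecreasing rs
lexDecreasing (Less : _) = True
lexDecreasing _ = False

-- Example: count occurrences of a hypothetical constructor "Macro", then depth.
macroMeasure :: Measure
macroMeasure = Count "Macro" Depth
```

With this reading, `leqs macroMeasure` is `[Leq,Leq]`, the type `[Less,Any]` of Figure 34 counts as strictly decreasing even though nothing is known about depth, and `[Leq,Any]` does not.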

6 Related work

We will focus here on related work that deals with or is directly applicable to programming errors in traversal programming with traversal strategies or otherwise.

Simplified traversal programming In an effort to manage the relative complexity of traversal strategies or the generic functions of “Scrap Your Boilerplate”, simplified forms of traversal programming have been proposed. The functional programming-based work of [52] describes less generic traversal schemes (akin to those of §4.1), and thereby suffices with simpler types, and may achieve efficiency of traversal implementation more easily. The rewriting-based work of [67] follows a different route; it limits programmability of traversal by providing only a few schemes and few parameters. Again, such a system may be easier to grasp for the programmer, and efficiency of traversal implementation may be achievable more easily. Our work is best understood as an attempt to uphold the generality or flexibility and relative simplicity (in terms of language constructs involved) of traversal strategies while using orthogonal means such as advanced typing or static analysis for helping with program comprehension.

Implicit strategy extension It is fair to say that several challenges relate to strategy extension. When traversal schemes are not parameterized with generic functions, as mentioned above, then strategy extension is no longer needed by the programmer, but expressiveness will be limited. There exists a proposal [17] for a language design that essentially makes strategy extension implicit in an otherwise typeful setting. This approach may imply a slightly simpler programming model, but it is not obvious that it necessarily reduces programming errors. This question remains open, but we refer to [35] for a (dated) discussion of the matter.

Runtime checks With respect to our various efforts to declare, infer, and enforce fallibility properties it is useful to note that Stratego supports the with operator as alternative to the where clause for conditional rules (and as strategy combinator as well), which indicates that its argument should be a transformation that always succeeds. When it does not succeed, a run-time exception is raised. It is reported, anecdotally, for example, by a reviewer of this paper, that this feature has been helpful for detection of programming errors. Such annotations can be useful for expressing the intent of programmers and may be useful both for runtime checking and as input to static analyses in future systems.
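The idea of such a runtime check can be sketched in Haskell for strategies modeled as Maybe-valued functions. The combinator `withS` below is a hypothetical analogue written for this illustration; it is not Stratego's implementation, and the names `Strategy`, `halve`, and `checked` are likewise assumptions.

```haskell
import Control.Exception (ErrorCall, evaluate, try)

-- A fallible transformation in the Maybe style.
type Strategy a = a -> Maybe a

-- Hypothetical analogue of 'with': the argument strategy is expected to
-- succeed; a failure is reported as a run-time error instead of Nothing.
withS :: String -> Strategy a -> a -> a
withS label s x =
  case s x of
    Just y -> y
    Nothing -> error ("with: strategy '" ++ label ++ "' failed unexpectedly")

-- Example strategy: halve even numbers, fail on odd ones.
halve :: Strategy Int
halve n = if even n then Just (n `div` 2) else Nothing

-- Run the checked strategy and observe the exception, if any.
checked :: Int -> IO (Either ErrorCall Int)
checked n = try (evaluate (withS "halve" halve n))
```

The declared intent (‘this transformation always succeeds’) turns a silent failure into an immediately visible error, and the same annotation could serve as an infallibility claim to be checked statically.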

Adaptive programming While the aforementioned related work is (transitively) inspired by traversal strategies à la Stratego, there is the independent traversal programming approach of adaptive programming [56,47,42]. Traversal specifications are more disciplined in this paradigm. Generally, the focus is more on problem-specific traversal specifications as opposed to generic traversal schemes. Also, there is a separation between traversal specifications and computations or actions, thereby simplifying some analyses, e.g., a termination analysis. The technique of §4.5 to statically check for reachable types is inspired by adaptive programming. Certain categories of programming errors are less of an issue, if any, in adaptive programming. Interestingly, there are recent efforts to evolve adaptive programming, within the bounds of functional object-oriented programming, to a programming paradigm that appears to be more similar to strategic programming [1].

The XML connection Arguably, the most widely used kinds of traversal programs in practice are XML queries and transformations such as those based on XPath [73], XSLT [75], and XQuery [74]. Some limited cross-paradigmatic comparison of traversal strategies and XML programming has been presented in [36,15]. Most notably, there are related problems in XML programming. For instance, one would like to know that an XPath query does not necessarily return the empty node set. The XQuery specification [74] even addresses this issue, to some extent, in the XQuery type system. While XSLT also inherits all XPath-related issues (just as much as XQuery), there is an additional challenge due to its template mechanism. Default templates, in particular, provide a kind of traversal capability. Let us mention related work on analyzing XML programs. [18] describes program analysis for XSLT based on an analysis of the call graph of the templates and the underlying DTD for the data. In common with our work is a conservative estimation of sufficient conditions for program termination as well as some form of dead code analysis. There is recent work on logics for XML [19] to perform static analysis of XML paths and types [21], and a dead code elimination for XQuery programs [20].

Properties of traversal programs Mentions of algebraic laws and other properties of strategic primitives and some traversal schemes appear in the literature on Stratego-like strategies [70,41,72,67,29,38,35,58,15,30]. In [15], laws feed into automated program calculation for the benefit of optimization (“by specialization”) and reverse engineering (so that generic programs are obtained from boilerplate code). In [29], specialized laws of applications of traversal schemes are leveraged to enable fusion-like techniques for optimizing strategies. We contend that better understanding of properties of traversal strategies, just by itself, curbs programming errors, but none of these previous efforts have linked properties to programming errors. Also, there is no previous effort on performing static analysis for the derivation of general program properties about termination, metadata-based reachability, or success and failure behavior.

Termination analysis Our termination analysis is arguably naive in that it focuses on recursion patterns of traversals. A practical system for a full-fledged strategic programming language would definitely need to include existing techniques for termination analysis, as they are established in the programming languages and rewriting communities. We mention some recent work in the adjacency of (functional) traversal strategies. [23,66] address termination analysis for rewriting with strategies (but without covering programmable traversal strategies). [62,22] address termination analysis for higher-order functional programs; it may be possible to extend these systems with awareness for one-layer traversal and generic functions for traversal. [2] addresses termination analysis for generic functional programs of the kind of Generic Haskell [25]; the approach is based on type-based termination and exploits the fact that generic functions are defined by induction on types, which is not directly the case for traversal strategies, though.

7 Concluding remarks

The ultimate motivation for the work presented here is to make traversal programming with strategies easier and safer. To this end, strategy libraries and the underlying programming languages need to improve so that contracts of traversal strategies are accurately captured and statically verified. That is, we envisage that “design by contract” is profoundly instantiated for traversal programming with strategies. These contracts would deal, for example, with (in)fallibility or measures for termination.

Throughout the paper, we have revealed pitfalls of strategic programming and discovered related properties of basic strategy combinators and common library combinators. To respond to these, we have developed concrete advice on suggested improvements to strategy libraries and the underlying programming languages:

• Hard-wire defaults into traversal schemes. (§4.1)
• Declare and check fallibility contracts. (§4.2)
• Reserve fallibility for modeling control flow. (§4.3)
• Enable families of type-specific cases. (§4.4)
• Declare and check reachability contracts. (§4.5)
• Perform fallibility analysis. (§5.1)
• Perform reachability analysis. (§5.2)
• Perform termination analysis. (§5.3)

Such advice comes without any claim of completeness. Also, such advice must not be confused with a proper design for the ultimate library and language.

Arguably, some improvements can be achieved by revising strategy libraries so that available static typing techniques are leveraged. We demonstrated this path with several Haskell-based experiments. This path is limited in several respects. First, only part of the advice can be addressed in this manner. Second, the typing techniques may not be generally available for languages used in traversal programming. Third, substantial encoding effort is needed in several cases.

Hence, our work suggests that type systems of programming languages need to become more expressive so that all facets of traversal contracts can be captured at an appropriate level of abstraction and statically verified in a manner that also accounts for language usability. We assert that, in fact, strategic programming is in need of a form of dependent types, an extensible type system, or, indeed, an extensible language framework that admits pluggable static analysis.

The development of the paper is based on a series of programming errors as they arise from a systematic discussion of the process of design and implementation of strategic programs. Empirical research would be needed to confirm the relevance of the alleged pitfalls. However, based on the authors’ experience with strategic programming in research, education, and software development, the authors can confirm that there is anecdotal evidence for the existence of the presented problems.

In this paper, we have largely ignored another challenge for strategic programming, namely performance. In fact, disappointing performance may count as another kind of programming error. Better formal and pragmatic understanding of traversal programming is needed to execute traversal strategies more efficiently. Hence, performance is suggested as a major theme for future work. There is relevant, previous work on fusion-like techniques for traversal strategies [29], calculational techniques for the transformation of traversal strategies [15], and complementary ideas from the field of adaptive programming [47].

Acknowledgments. Simon Thompson has received support by the Vrije Universiteit, Amsterdam for a related research visit in 2004. The authors received helpful feedback from the LDTA reviewers and the discussion at the workshop. Thanks are due to the reviewers of the initial journal submission who provided substantial advice. Part of the material has been used in Ralf Lämmel’s invited talk at LOPSTR/PPDP 2009, and all feedback is gratefully acknowledged. Much of the presented insights draw from past collaboration and discussions with Simon Peyton Jones, Karl Lieberherr, Claus Reinke, Eelco Visser, Joost Visser, and Victor Winter.

References

[1] A. Abdelmeged and K. J. Lieberherr. Recursive adaptive computations using per object visitors. In Companion to the 22nd Annual ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications, OOPSLA 2007, pages 825–826. ACM, 2007.

[2] A. Abel. Type-based termination of generic programs. Science of Computer Programming, 74(8):550–567, 2009.

[3] E. Balland, P. Brauner, R. Kopetz, P.-E. Moreau, and A. Reilles. Tom: Piggybacking Rewriting on Java. In Term Rewriting and Applications, 18th International Conference, RTA 2007, Proceedings, volume 4533 of LNCS, pages 36–47. Springer, 2007.

[4] E. Balland, P.-E. Moreau, and A. Reilles. Rewriting strategies in Java. ENTCS, 219:97–111, 2008.

[5] G. Bierman, E. Meijer, and W. Schulte. The Essence of Data Access in Cω. In ECOOP’05, Object-Oriented Programming, 19th European Conference, Proceedings, volume 3586 of LNCS, pages 287–311. Springer, 2005.

[6] P. Borovanský, C. Kirchner, H. Kirchner, P.-E. Moreau, and C. Ringeissen. An Overview of ELAN. In C. Kirchner and H. Kirchner, editors, Proceedings of the International Workshop on Rewriting Logic and its Applications (WRLA’98), volume 15 of ENTCS. Elsevier Science, 1998.

[7] P. Borovanský, C. Kirchner, H. Kirchner, and C. Ringeissen. Rewriting with strategies in ELAN: a functional semantics. International Journal of Foundations of Computer Science, 2001.

[8] A. Bove, P. Dybjer, and U. Norell. A Brief Overview of Agda — A Functional Language with Dependent Types. In Proceedings of the 22nd International Conference on Theorem Proving in Higher Order Logics, TPHOLs ’09, volume 5674 of LNCS, pages 73–78. Springer, 2009.

[9] M. Brand, M. Sellink, and C. Verhoef. Generation of components for software renovation factories from context-free grammars. In I. Baxter, A. Quilici, and C. Verhoef, editors, Proceedings Fourth Working Conference on Reverse Engineering, pages 144–153, 1997.

[10] M. Bravenboer, K. T. Kalleberg, R. Vermaas, and E. Visser. Stratego/XT 0.16: components for transformation systems. In PEPM’06: Proceedings of the 2006 ACM SIGPLAN Workshop on Partial Evaluation and Semantics-based Program Manipulation, pages 95–99. ACM, 2006.

[11] J. R. Cordy. The TXL source transformation language. Science of Computer Programming, 61(3):190–210, 2006.

[12] P. Cousot. Abstract Interpretation. ACM Comput. Surv., 28(2):324–328, 1996.

[13] P. Cousot and R. Cousot. Basic concepts of abstract interpretation. In Building the Information Society, IFIP 18th World Computer Congress, 2004, Topical Sessions, Proceedings, pages 359–366. Kluwer, 2004.

[14] K. Crary and S. Weirich. Flexible type analysis. In ICFP ’99: Proceedings of the fourth ACM SIGPLAN international conference on Functional programming, pages 233–248. ACM, 1999.

[15] A. Cunha and J. Visser. Transformation of structure-shy programs: applied to XPath queries and strategic functions. In PEPM’07: Proceedings of the 2007 ACM SIGPLAN Workshop on Partial Evaluation and Semantics-based Program Manipulation, pages 11–20. ACM, 2007.

[16] B. C. d. S. Oliveira, A. Moors, and M. Odersky. Type classes as objects and implicits. In Proceedings of the 25th Annual ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications, OOPSLA 2010, pages 341–360. ACM, 2010.

[17] E. Dolstra and E. Visser. First-class Rules and Generic Traversal. Technical Report UU-CS-2001-38, Institute of Information and Computing Sciences, Utrecht University, 2001.

[18] C. Dong and J. Bailey. Static Analysis of XSLT Programs. In K.-D. Schewe and H. Williams, editors, Fifteenth Australasian Database Conference (ADC2004), Conferences in Research and Practice in Information Technology. Australian Computer Society, Inc., 2004.

[19] P. Genevès. Logics for XML. PhD thesis, Institut National Polytechnique de Grenoble, 2006.

[20] P. Genevès and N. Layaïda. Eliminating Dead-Code from XQuery Programs. In ICSE’10, Proceedings of the ACM/IEEE 32nd International Conference on Software Engineering. ACM, 2010.

[21] P. Genevès, N. Layaïda, and A. Schmitt. Efficient Static Analysis of XML Paths and Types. In PLDI ’07: Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation, pages 342–351. ACM, 2007.

[22] J. Giesl, S. Swiderski, P. Schneider-Kamp, and R. Thiemann. Automated Termination Analysis for Haskell: From Term Rewriting to Programming Languages. In Term Rewriting and Applications, 17th International Conference, RTA 2006, Proceedings, volume 4098 of LNCS, pages 297–312. Springer, 2006.

[23] I. Gnaedig and H. Kirchner. Termination of rewriting under strategies. ACM Transactions on Computational Logic, 10(2), 2009.

[24] R. Hinze. A New Approach to Generic Functional Programming. In T. Reps, editor, Proceedings of the 27th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, Boston, Massachusetts, January 19-21, pages 119–132, 2000.

[25] R. Hinze. A new approach to generic functional programming. In POPL ’00: Proceedings of the 27th ACM SIGPLAN-SIGACT symposium on Principles of programming languages, pages 119–132. ACM, 2000.

[26] R. Hinze and A. Löh. “Scrap Your Boilerplate” Revolutions. In Proceedings, Mathematics of Program Construction, 8th International Conference, MPC 2006, volume 4014 of LNCS, pages 180–208. Springer, 2006.

[27] R. Hinze, A. Löh, and B. C. D. S. Oliveira. “Scrap Your Boilerplate” Reloaded. In FLOPS’06: Proceedings of Functional and Logic Programming, 8th International Symposium, volume 3945 of LNCS, pages 13–29. Springer, 2006.

[28] P. Jansson and J. Jeuring. PolyP - a polytypic programming language extension. In POPL ’97: The 24th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pages 470–482. ACM, 1997.

[29] P. Johann and E. Visser. Strategies for Fusing Logic and Control via Local, Application-Specific Transformations. Technical Report UU-CS-2003-050, Department of Information and Computing Sciences, Utrecht University, 2003.

[30] M. Kaiser and R. Lämmel. An Isabelle/HOL-based model of Stratego-like traversal strategies. In PPDP ’09: Proceedings of the 11th ACM SIGPLAN conference on Principles and practice of declarative programming, pages 93–104. ACM, 2009.

[31] L. C. L. Kats, A. M. Sloane, and E. Visser. Decorated Attribute Grammars: Attribute Evaluation Meets Strategic Programming. In Compiler Construction, 18th International Conference, CC 2009, Proceedings, volume 5501 of LNCS, pages 142–157. Springer, 2009.

[32] O. Kiselyov, R. Lämmel, and K. Schupke. Strongly typed heterogeneous collections. In Haskell’04: Proceedings of the ACM SIGPLAN workshop on Haskell, pages 96–107. ACM, 2004.

[33] R. Lämmel. The Sketch of a Polymorphic Symphony. In WRS’02: Proceedings of International Workshop on Reduction Strategies in Rewriting and Programming, volume 70 of ENTCS. Elsevier Science, 2002. 21 pages.

[34] R. Lämmel. Towards generic refactoring. In Proceedings of the 2002 ACM SIGPLAN Workshop on Rule-Based Programming, pages 15–28. ACM, 2002.

[35] R. Lämmel. Typed generic traversal with term rewriting strategies. Journal Logic and Algebraic Programming, 54(1-2):1–64, 2003.

[36] R. Lämmel. Scrap your boilerplate with XPath-like combinators. In POPL ’07: Proceedings of the 34th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages, pages 137–142. ACM, 2007.

[37] R. Lämmel. Scrap your boilerplate with XPath-like combinators. In POPL’07: Proceedings of the 34th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages, pages 137–142. ACM, 2007.

[38] R. Lämmel and S. L. Peyton Jones. Scrap your boilerplate: a practical design pattern for generic programming. In TLDI’03: Proceedings of the 2003 ACM SIGPLAN international workshop on Types in languages design and implementation, pages 26–37. ACM, 2003.

[39] R. Lämmel and S. L. Peyton Jones. Scrap more boilerplate: reflection, zips, and generalised casts. In ICFP’04: Proceedings of the ninth ACM SIGPLAN international conference on Functional programming, pages 244–255. ACM, 2004.

[40] R. Lämmel and S. L. Peyton Jones. Scrap your boilerplate with class: extensible generic functions. In ICFP’05: Proceedings of the tenth ACM SIGPLAN international conference on Functional programming, pages 204–215. ACM, 2005.

[41] R. Lämmel, E. Visser, and J. Visser. The essence of strategic programming – an inquiry into trans-paradigmatic genericity. Draft, 2002–2003.

[42] R. Lämmel, E. Visser, and J. Visser. Strategic Programming Meets Adaptive Programming. In AOSD’03: Conference proceedings of Aspect-Oriented Software Development, pages 168–177. ACM, 2003.

[43] R. Lämmel and J. Visser. Typed Combinators for Generic Traversal. In PADL’02: Proceedings of Practical Aspects of Declarative Programming, volume 2257 of LNCS, pages 137–154. Springer, 2002.

[44] R. Lämmel and J. Visser. A Strafunski Application Letter. In PADL’03: Proceedings of Practical Aspects of Declarative Programming, volume 2562 of LNCS, pages 357–375. Springer, 2003.

[45] D. Leijen. HMF: simple type inference for first-class polymorphism. In Proceeding of the 13th ACM SIGPLAN international conference on Functional programming, ICFP 2008, pages 283–294. ACM, 2008.

[46] H. Li, S. Thompson, and C. Reinke. The Haskell Refactorer, HaRe, and its API. ENTCS, 141(4):29–34, 2005.

[47] K. J. Lieberherr, B. Patt-Shamir, and D. Orleans. Traversals of object structures: Specification and Efficient Implementation. ACM Transactions on Programming Languages and Systems, 26(2):370–412, 2004.

[48] B. Luttik and E. Visser. Specification of rewriting strategies. In M. P. A. Sellink, editor, 2nd International Workshop on the Theory and Practice of Algebraic Specifications (ASF+SDF’97), Electronic Workshops in Computing. Springer, 1997.

[49] S. Marlow. An extensible dynamically-typed hierarchy of exceptions. In Haskell’06: Proceedings of the 2006 ACM SIGPLAN workshop on Haskell, pages 96–106. ACM, 2006.

[50] C. McBride. Faking it: Simulating dependent types in Haskell. J. Funct. Program., 12(4&5):375–392, 2002.

[51] C. McBride. Epigram: Practical Programming with Dependent Types. In Advanced Functional Programming, 5th International School, AFP 2004, Revised Lectures, volume 3622 of LNCS, pages 130–170. Springer, 2005.

[52] N. Mitchell and C. Runciman. Uniform boilerplate and list processing. In Haskell’07: Proceedings of the ACM SIGPLAN workshop on Haskell workshop, pages 49–60. ACM, 2007.

[53] G. Munkby, A. P. Priesnitz, S. Schupp, and M. Zalewski. Scrap++: scrap your boilerplate in C++. In Proceedings of the ACM SIGPLAN Workshop on Generic Programming, WGP 2006, pages 66–75. ACM, 2006.

[54] F. Nielson, H. R. Nielson, and C. Hankin. Principles of Program Analysis. Springer, 2005.

[55] H. R. Nielson and F. Nielson. Semantics with Applications (An Appetizer). Springer, 2007.

[56] J. Palsberg, B. Patt-Shamir, and K. J. Lieberherr. A New Approach to Compiling Adaptive Programs. Science of Computer Programming, 29(3):303–326, 1997.

[57] S. L. Peyton Jones, D. Vytiniotis, S. Weirich, and M. Shields. Practical type inference for arbitrary-rank types. J. Funct. Program., 17(1):1–82, 2007.

[58] F. Reig. Generic proofs for combinator-based generic programs. In Trends in Functional Programming, pages 17–32, 2004.

[59] D. Ren and M. Erwig. A generic recursion toolbox for Haskell or: scrap your boilerplate systematically. In Haskell’06: Proceedings of the 2006 ACM SIGPLAN workshop on Haskell, pages 13–24. ACM, 2006.

[60] A. Rodriguez, J. Jeuring, P. Jansson, A. Gerdes, O. Kiselyov, and B. C. d. S. Oliveira. Comparing libraries for generic programming in Haskell. In Proceedings of the 1st ACM SIGPLAN Symposium on Haskell, Haskell 2008, pages 111–122. ACM, 2008.

[61] B. G. Ryder and M. L. Soffa. Influences on the design of exception handling: ACM SIGSOFT project on the impact of software engineering research on programming language design. SIGPLAN Notices, 38(6):16–22, 2003.

[62] D. Sereni and N. D. Jones. Termination Analysis of Higher-Order Functional Programs. In Programming Languages and Systems, Third Asian Symposium, APLAS 2005, Proceedings, volume 3780 of LNCS, pages 281–297. Springer, 2005.

[63] C.-c. Shan. Sexy types in action. SIGPLAN Not., 39:15–22, 2004.

[64] M. Shields and E. Meijer. Type-indexed rows. In Proceedings of the 28th ACM SIGPLAN-SIGACT symposium on Principles of programming languages, POPL’01, pages 261–275. ACM, 2001.

[65] P. Thiemann. Programmable type systems for domain specific languages. ENTCS, 76:233–251, 2002.

[66] R. Thiemann and C. Sternagel. Loops under Strategies. In Rewriting Techniques and Applications, 20th International Conference, Proceedings, volume 5595 of LNCS, pages 17–31. Springer, 2009.

[67] M. van den Brand, P. Klint, and J. J. Vinju. Term rewriting with traversal functions. ACM Transactions on Software Engineering and Methodology, 12(2):152–190, 2003.

[68] E. Visser. Language Independent Traversals for Program Transformation. In Proceedings of WGP’2000, Technical Report, Universiteit Utrecht, pages 86–104, 2000.

[69] E. Visser. Program Transformation with Stratego/XT: Rules, Strategies, Tools, and Systems in Stratego/XT 0.9. In Domain-Specific Program Generation, Dagstuhl Seminar, 2003, Revised Papers, volume 3016 of LNCS, pages 216–238. Springer, 2004.

[70] E. Visser, Z. Benaissa, and A. Tolmach. Building program optimizers with rewriting strategies. In ICFP’98: Proceedings of the third ACM SIGPLAN international conference on Functional programming, pages 13–26. ACM, 1998.

[71] E. Visser and Z.-e.-A. Benaissa. A Core Language for Rewriting. In Second International Workshop on Rewriting Logic and its Applications (WRLA 1998), volume 15 of ENTCS. Elsevier Science, 1998.

[72] J. Visser. Generic Traversal over Typed Source Code Representations. PhD thesis, University of Amsterdam, 2003.

[73] W3C. XML Path Language (XPath) Version 1.0. Available online at http://www.w3.org/TR/xpath/, W3C Recommendation 16 November 1999.

[74] W3C. XQuery 1.0: An XML Query Language. Available online at http://www.w3.org/TR/xquery/, W3C Recommendation 23 January 2007.

[75] W3C. XSL Transformations (XSLT) Version 2.0. Available online at http://www.w3.org/TR/xslt20/, W3C Recommendation 23 January 2007.

[76] V. L. Winter. Strategy Construction in the Higher-Order Framework of TL. ENTCS, 124(1):149–170, 2005.

[77] V. L. Winter and J. Beranek. Program Transformation Using HATS 1.84. In Generative and Transformational Techniques in Software Engineering, International Summer School, GTTSE 2005, Revised Papers, volume 4143 of LNCS, pages 378–396. Springer, 2006.

[78] V. L. Winter, J. Beranek, F. Fraij, S. Roach, and G. L. Wickstrom. A transformational perspective into the core of an abstract class loader for the SSP. ACM Trans. Embedded Comput. Syst., 5(4):773–818, 2006.

[79] V. L. Winter and M. Subramaniam. The transient combinator, higher-order strategies, and the distributed data problem. Science of Computer Programming, 52:165–212, 2004.

[80] D. N. Xu, S. L. Peyton Jones, and K. Claessen. Static contract checking for Haskell. In Proceedings of the 36th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2009, pages 41–52. ACM, 2009.
