
Optimizing and Incrementalizing Higher-order Collection Queries

by AST Transformation

Dissertation

der Mathematisch- und Naturwissenschaftlichen Fakultät

der Eberhard Karls Universität Tübingen

zur Erlangung des Grades eines

Doktors der Naturwissenschaften

(Dr. rer. nat.)

vorgelegt von

Paolo G. Giarrusso

aus Catania (Italien)

Tübingen

2017

Gedruckt mit Genehmigung der Mathematisch-Naturwissenschaftlichen Fakultät

der Eberhard Karls Universität Tübingen

Tag der mündlichen Qualifikation: mm2017

Dekan: Prof. Dr. Wolfgang Rosenstiel

1. Berichterstatter: Prof. Dr. Klaus Ostermann

2. Berichterstatter: Prof. Dr. Fritz Henglein

To the memory of my mother Rosalia

Alla memoria di mia madre Rosalia

Abstract

In modern programming languages, queries on in-memory collections are often more expensive than needed. While database queries can be readily optimized, collection queries often don't translate trivially to databases, as modern programming languages are far more expressive than databases, supporting both nested data and first-class functions.

Collection queries can be optimized and incrementalized by hand, but this reduces modularity, and is often too error-prone to be feasible or to enable maintenance of the resulting programs. In this thesis, we optimize and incrementalize such collection queries to free programmers from such burdens. Resulting programs are expressed in the same core language, so that they can be subjected to other standard optimizations.

To enable optimizing collection queries which occur inside programs, we develop a staged variant of the Scala collection API that reifies queries as ASTs. On top of this interface, we adapt domain-specific optimizations from the fields of programming languages and databases; among others, we rewrite queries to use indexes chosen by programmers. Thanks to the use of indexes, we show significant speedups in our experimental evaluation, with an average of 12x and a maximum of 12800x.

To incrementalize higher-order programs by program transformation, we extend finite differencing [Paige and Koenig, 1982; Blakeley et al., 1986; Gupta and Mumick, 1999] and develop the first approach to incrementalization by program transformation for higher-order programs. Base programs are transformed to derivatives: programs that transform input changes to output changes. We prove that our incrementalization approach is correct. We develop the theory underlying this approach for simply-typed and untyped λ-calculus, and discuss extensions to System F.

Derivatives often need to reuse results produced by base programs; to enable such reuse, we extend work by Liu and Teitelbaum [1995] to higher-order programs, and develop and prove correct a program transformation converting higher-order programs to cache-transfer-style.

For efficient incrementalization, it is necessary to choose and incrementalize by hand appropriate primitive operations. We incrementalize a significant subset of collection operations and perform case studies showing order-of-magnitude speedups, both in practice and in asymptotic complexity.


Zusammenfassung

In modernen universellen Programmiersprachen sind Abfragen auf Speicher-basierten Kollektionen oft rechenintensiver als erforderlich. Während Datenbankenabfragen vergleichsweise einfach optimiert werden können, fällt dies bei Speicher-basierten Kollektionen oft schwer, denn universelle Programmiersprachen sind in aller Regel ausdrucksstärker als Datenbanken. Insbesondere unterstützen diese Sprachen meistens verschachtelte, rekursive Datentypen und Funktionen höherer Ordnung.

Kollektionsabfragen können per Hand optimiert und inkrementalisiert werden, jedoch verringert dies häufig die Modularität und ist oft zu fehleranfällig, um realisierbar zu sein oder um die Instandhaltung der entstandenen Programme zu gewährleisten. Die vorliegende Doktorarbeit demonstriert, wie Abfragen auf Kollektionen systematisch und automatisch optimiert und inkrementalisiert werden können, um Programmierer von dieser Last zu befreien. Die so erzeugten Programme werden in derselben Kernsprache ausgedrückt, um weitere Standardoptimierungen zu ermöglichen.

Teil I entwickelt eine Variante der Scala-API für Kollektionen, die Staging verwendet, um Abfragen als abstrakte Syntaxbäume zu reifizieren. Auf Basis dieser Schnittstelle werden anschließend domänenspezifische Optimierungen von Programmiersprachen und Datenbanken angewandt; unter anderem werden Abfragen umgeschrieben, um vom Programmierer ausgewählte Indizes zu benutzen. Dank dieser Indizes kann eine erhebliche Beschleunigung der Ausführungsgeschwindigkeit gezeigt werden; eine experimentelle Auswertung zeigt hierbei Beschleunigungen von durchschnittlich 12x bis zu einem Maximum von 12800x.

Um Programme mit Funktionen höherer Ordnung durch Programmtransformation zu inkrementalisieren, wird in Teil II eine Erweiterung der Finite-Differenzen-Methode vorgestellt [Paige and Koenig, 1982; Blakeley et al., 1986; Gupta and Mumick, 1999] und ein erster Ansatz zur Inkrementalisierung durch Programmtransformation für Programme mit Funktionen höherer Ordnung entwickelt. Dabei werden Programme zu Ableitungen transformiert, d.h. zu Programmen, die Eingangsdifferenzen in Ausgangsdifferenzen umwandeln. Weiterhin werden in den Kapiteln 12–13 die Korrektheit des Inkrementalisierungsansatzes für einfach-getypten und ungetypten λ-Kalkül bewiesen und Erweiterungen zu System F besprochen.

Ableitungen müssen oft Ergebnisse der ursprünglichen Programme wiederverwenden. Um eine solche Wiederverwendung zu ermöglichen, erweitert Kapitel 17 die Arbeit von Liu and Teitelbaum [1995] zu Programmen mit Funktionen höherer Ordnung und entwickelt eine Programmtransformation solcher Programme im Cache-Transfer-Stil.

Für eine effiziente Inkrementalisierung ist es weiterhin notwendig, passende Grundoperationen auszuwählen und manuell zu inkrementalisieren. Diese Arbeit deckt einen Großteil der wichtigsten Grundoperationen auf Kollektionen ab. Die Durchführung von Fallstudien zeigt deutliche Laufzeitverbesserungen, sowohl in der Praxis als auch in der asymptotischen Komplexität.


Contents

Abstract v

Zusammenfassung vii

Contents ix

List of Figures xv

List of Tables xvii

1 Introduction 1
1.1 This thesis 2

1.1.1 Included papers 3
1.1.2 Excluded papers 3
1.1.3 Navigating this thesis 4

I Optimizing Collection Queries by Reification 7

2 Introduction 9
2.1 Contributions and summary 10
2.2 Motivation 10

2.2.1 Optimizing by Hand 11

3 Automatic optimization with SQuOpt 15
3.1 Adapting a Query 15
3.2 Indexing 17

4 Implementing SQuOpt 19
4.1 Expression Trees 19
4.2 Optimizations 21
4.3 Query Execution 22

5 A deep EDSL for collection queries 23
5.1 Implementation: Expressing the interface in Scala 24
5.2 Representing expression trees 24
5.3 Lifting first-order expressions 25
5.4 Lifting higher-order expressions 27
5.5 Lifting collections 29


5.6 Interpretation 31
5.7 Optimization 32

6 Evaluating SQuOpt 35
6.1 Study Setup 35
6.2 Experimental Units 37
6.3 Measurement Setup 40
6.4 Results 40

7 Discussion 43
7.1 Optimization limitations 43
7.2 Deep embedding 43

7.2.1 What worked well 43
7.2.2 Limitations 44
7.2.3 What did not work so well: Scala type inference 44
7.2.4 Lessons for language embedders 46

8 Related work 47
8.1 Language-Integrated Queries 47
8.2 Query Optimization 48
8.3 Deep Embeddings in Scala 48

8.3.1 LMS 49
8.4 Code Querying 49

9 Conclusions 51
9.1 Future work 51

II Incremental λ-Calculus 53

10 Introduction to differentiation 55
10.1 Generalizing the calculus of finite differences 57
10.2 A motivating example 57
10.3 Introducing changes 58

10.3.1 Incrementalizing with changes 59
10.4 Differentiation on open terms and functions 61

10.4.1 Producing function changes 61
10.4.2 Consuming function changes 62
10.4.3 Pointwise function changes 62
10.4.4 Passing change targets 62

10.5 Differentiation, informally 63
10.6 Differentiation on our example 64
10.7 Conclusion and contributions 66

10.7.1 Navigating this thesis part 66
10.7.2 Contributions 66


11 A tour of differentiation examples 69
11.1 Change structures as type-class instances 69
11.2 How to design a language plugin 70
11.3 Incrementalizing a collection API 70

11.3.1 Changes to type-class instances 71
11.3.2 Incrementalizing aggregation 71
11.3.3 Modifying list elements 73
11.3.4 Limitations 74

11.4 Efficient sequence changes 75
11.5 Products 75
11.6 Sums, pattern matching and conditionals 76

11.6.1 Optimizing filter 78
11.7 Chapter conclusion 78

12 Changes and differentiation, formally 79
12.1 Changes and validity 79

12.1.1 Function spaces 82
12.1.2 Derivatives 83
12.1.3 Basic change structures on types 84
12.1.4 Validity as a logical relation 85
12.1.5 Change structures on typing contexts 85

12.2 Correctness of differentiation 86
12.2.1 Plugin requirements 88
12.2.2 Correctness proof 88

12.3 Discussion 91
12.3.1 The correctness statement 91
12.3.2 Invalid input changes 92
12.3.3 Alternative environment changes 92
12.3.4 Capture avoidance 92

12.4 Plugin requirement summary 94
12.5 Chapter conclusion 95

13 Change structures 97
13.1 Formally defining ⊕ and change structures 97

13.1.1 Example: Group changes 98
13.1.2 Properties of change structures 99

13.2 Operations on function changes, informally 100
13.2.1 Examples of nil changes 100
13.2.2 Nil changes on arbitrary functions 101
13.2.3 Constraining ⊕ on functions 102

13.3 Families of change structures 102
13.3.1 Change structures for function spaces 102
13.3.2 Change structures for products 104

13.4 Change structures for types and contexts 105
13.5 Development history 107
13.6 Chapter conclusion 107


14 Equational reasoning on changes 109
14.1 Reasoning on changes syntactically 109

14.1.1 Denotational equivalence for valid changes 110
14.1.2 Syntactic validity 111

14.2 Change equivalence 114
14.2.1 Preserving change equivalence 114
14.2.2 Sketching an alternative syntax 117
14.2.3 Change equivalence is a PER 118

14.3 Chapter conclusion 119

15 Extensions and theoretical discussion 121
15.1 General recursion 121

15.1.1 Differentiating general recursion 121
15.1.2 Justification 122

15.2 Completely invalid changes 122
15.3 Pointwise function changes 123
15.4 Modeling only valid changes 124

15.4.1 One-sided vs two-sided validity 125

16 Differentiation in practice 127
16.1 The role of differentiation plugins 127
16.2 Predicting nil changes 128

16.2.1 Self-maintainability 128
16.3 Case study 129
16.4 Benchmarking setup 132
16.5 Benchmark results 133
16.6 Chapter conclusion 133

17 Cache-transfer-style conversion 135
17.1 Introduction 135
17.2 Introducing Cache-Transfer Style 136

17.2.1 Background 137
17.2.2 A first-order motivating example: computing averages 137
17.2.3 A higher-order motivating example: nested loops 140

17.3 Formalization 141
17.3.1 The source language λAL 141
17.3.2 Static differentiation in λAL 145
17.3.3 A new soundness proof for static differentiation 146
17.3.4 The target language iλAL 147
17.3.5 CTS conversion from λAL to iλAL 151
17.3.6 Soundness of CTS conversion 152

17.4 Incrementalization case studies 154
17.4.1 Averaging integers bags 154
17.4.2 Nested loops over two sequences 156
17.4.3 Indexed joins of two bags 157

17.5 Limitations and future work 159
17.5.1 Hiding the cache type 159
17.5.2 Nested bags 159
17.5.3 Proper tail calls 159


17.5.4 Pervasive replacement values 159
17.5.5 Recomputing updated values 160
17.5.6 Cache pruning via absence analysis 160
17.5.7 Unary vs n-ary abstraction 160

17.6 Related work 161
17.7 Chapter conclusion 162

18 Towards differentiation for System F 163
18.1 The parametricity transformation 164
18.2 Differentiation and parametricity 166
18.3 Proving differentiation correct externally 167
18.4 Combinators for polymorphic change structures 168
18.5 Differentiation for System F 171
18.6 Related work 172
18.7 Prototype implementation 172
18.8 Chapter conclusion 172

19 Related work 173
19.1 Dynamic approaches 173
19.2 Static approaches 174

19.2.1 Finite differencing 174
19.2.2 λ-diff and partial differentials 175
19.2.3 Static memoization 176

20 Conclusion and future work 177

Appendixes 181

A Preliminaries 181
A.1 Our proof meta-language 181

A.1.1 Type theory versus set theory 182
A.2 Simply-typed λ-calculus 182

A.2.1 Denotational semantics for STLC 184
A.2.2 Weakening 185
A.2.3 Substitution 186
A.2.4 Discussion: Our mechanization and semantic style 186
A.2.5 Language plugins 187

B More on our formalization 189
B.1 Mechanizing plugins modularly and limitations 189

C (Un)typed ILC, operationally 193
C.1 Formalization 195

C.1.1 Types and contexts 196
C.1.2 Base syntax for λA 196
C.1.3 Change syntax for λ∆A 199
C.1.4 Differentiation 199
C.1.5 Typing λA→ and λ∆A→ 199
C.1.6 Semantics 200


C.1.7 Type soundness 200
C.2 Validating our step-indexed semantics 201
C.3 Validity, syntactically (λA→, λ∆A→) 202
C.4 Step-indexed extensional validity (λA→, λ∆A→) 204
C.5 Step-indexed intensional validity 207
C.6 Untyped step-indexed validity (λA, λ∆A) 210
C.7 General recursion in λA→ and λ∆A→ 210
C.8 Future work 212

C.8.1 Change composition 212
C.9 Development history 213
C.10 Conclusion 213

D Defunctionalizing function changes 215
D.1 Setup 215
D.2 Defunctionalization 217

D.2.1 Defunctionalization with separate function codes 219
D.2.2 Defunctionalizing function changes 220

D.3 Defunctionalization and cache-transfer-style 224

Acknowledgments 229

Bibliography 231

List of Figures

2.1 Definition of the schema and of some content 11
2.2 Our example query on the schema in Fig. 2.1, and a function which postprocesses its result 12
2.3 Composition of queries in Fig. 2.2, after inlining, query unnesting and hoisting 13

3.1 SQuOpt version of Fig. 2.2. recordsQuery contains a reification of the query; records, its result 16

5.1 Desugaring of code in Fig. 2.2 29
5.2 Lifting TraversableLike 31

6.1 Measurement Setup: Overview 37
6.5 Find covariant equals methods 39
6.6 A simple index definition 40

12.1 Defining differentiation and proving it correct. The rest of this chapter explains and motivates the above definitions. 80

13.1 Defining change structures 108

16.1 The λ-term histogram with Haskell-like syntactic sugar 128
16.2 A Scala implementation of primitives for bags and maps 130
16.3 Performance results, in log–log scale 134

17.1 Source language λAL (syntax) 142
17.2 Step-indexed big-step semantics for base terms of source language λAL 142
17.3 Step-indexed big-step semantics for the change terms of the source language λAL 144
17.4 Static differentiation in λAL 145
17.5 Step-indexed relation between values and changes 146
17.6 Target language iλAL (syntax) 148
17.7 Target language iλAL (semantics of base terms and caches) 149
17.8 Target language iλAL (semantics of change terms and cache updates) 150
17.9 Cache-Transfer Style (CTS) conversion from λAL to iλAL 151
17.10 Extending CTS translation to values, change values, environments and change environments 152
17.11 Extension of an environment with cache values dF ↑C i dF′ (for i = 1, 2) 153

18.1 A change structure that allows modifying list elements 170


A.1 Standard definitions for the simply-typed lambda calculus 183

C.1 ANF λ-calculus: λA and λ∆A 196
C.2 ANF λ-calculus: λA→ and λ∆A→ type system 197
C.3 ANF λ-calculus (λA and λ∆A), CBV semantics 198
C.4 Defining extensional validity via logical relations and big-step semantics 203
C.5 Defining extensional validity via step-indexed logical relations and big-step semantics 206
C.6 Defining extensional validity via untyped step-indexed logical relations and big-step semantics 211

D.1 A small example program for defunctionalization 217
D.2 Defunctionalized program 218
D.3 Implementing change structures using Codelike instances 226
D.4 Implementing FunOps using Codelike instances 227

List of Tables

6.2 Performance results. As in Sec. 6.1, (1) denotes the modular Scala implementation, (2) the hand-optimized Scala one, and (3−), (3o), (3x) refer to the SQuOpt implementation when run, respectively, without optimizations, with optimizations, and with optimizations and indexing. Queries marked with the R superscript were selected by random sampling. 38

6.3 Average performance ratios. This table summarizes all interesting performance ratios across all queries, using the geometric mean [Fleming and Wallace, 1986]. The meaning of speedups is discussed in Sec. 6.1. 39

6.4 Description of abstractions removed during hand-optimization, and number of queries where the abstraction is used (and optimized away) 39


Chapter 1

Introduction

Many programs perform queries on collections of data, and for non-trivial amounts of data it is useful to execute the queries efficiently. When the data is updated often enough, it can also be useful to update the results of some queries incrementally whenever the input changes, so that up-to-date results are quickly available, even for queries that are expensive to execute.

For instance, a program manipulating demographic data about the citizens of a country might need to compute statistics on them, such as their average age, and update those statistics when the set of citizens changes.

Traditional relational database management systems (RDBMSs) support both query optimization and (in quite a few cases) incremental updating of query results (there called incremental view maintenance).

However, often queries are executed on collections of data that are not stored in a database, but in collections manipulated by some program. Moving in-memory data to RDBMSs typically does not improve performance [Stonebraker et al., 2007; Rompf and Amin, 2015], and reusing database optimizers is not trivial.

Moreover, many programming languages are far more expressive than RDBMSs. A typical RDBMS can only manipulate SQL relations, that is, multisets (or bags) of tuples (or sometimes sets of tuples, after duplicate elimination). Typical programming languages (PLs) also support arbitrarily nested lists and maps of data, and allow programmers to define new data types with few restrictions; a typical PL will also allow a far richer set of operations than SQL.

However, typical PLs do not apply typical database optimizations to collection queries, and if queries are incrementalized, this is often done by hand, even though code implementing incremental queries is error-prone and hard to maintain [Salvaneschi and Mezini, 2013].

What's worse, some of these manual optimizations are best done over the whole program. Some optimizations only become possible after inlining, which reduces modularity if done by hand.¹

Worse, adding an index on some collection can speed up looking for some information, but each index must be maintained incrementally when the underlying data changes. Depending on the actual queries and updates performed on the collection, and on how often they happen, it might turn out that updating an index takes more time than the index saves; hence, the choice of which indexes to enable depends on the whole program. However, adding/removing an index requires updating all PL queries to use it, while RDBMS queries can use an index transparently.

Overall, manual optimizations are not only effort-intensive and error-prone, but they also significantly reduce modularity.

¹In Chapter 2, we define modularity as the ability to abstract behavior in a separate function (possibly part of a different module) to enable reuse and improve understandability.


1.1 This thesis

To reduce the need for manual optimizations, in this thesis we propose techniques for optimizing higher-order collection queries and executing them incrementally. By generalizing database-inspired approaches, we provide approaches to query optimization and incrementalization by code transformation that apply to higher-order collection queries, including user-defined functions. Instead of building on existing approaches to incrementalization, such as self-adjusting computation [Acar, 2009], we introduce a novel incrementalization approach, which enables further transformations on the incrementalization results. This approach is in fact not restricted to collection queries but applicable to other domains, though it requires adaptation for the domain under consideration. Further research is needed, but a number of case studies suggest applicability even in some scenarios beyond existing techniques [Koch et al., 2016].

We consider the problem for functional programming languages such as Haskell or Scala, and we consider collection queries written using the APIs of their collection libraries, which we treat as an embedded domain-specific language (EDSL). Such APIs (or DSLs) contain powerful operators on collections, such as map, which are higher-order; that is, they take as arguments arbitrary functions in the host language that we must also handle.

Therefore, our optimizations and incrementalizations must handle programs in higher-order EDSLs that can contain arbitrary code in the host language. Hence, many of our optimizations will exploit properties of our collection EDSL, but will need to handle host-language code. We restrict the problem to purely functional programs (without mutation or other side effects, mostly including non-termination), because such programs can be more "declarative", and because avoiding side effects can enable more powerful optimizations and simplify the work of the optimizer at the same time.

This thesis is divided into two parts:

• In Part I, we optimize collection queries by static program transformation, based on a set of rewrite rules [Giarrusso et al., 2013].

• In Part II, we incrementalize programs by transforming them to new programs. This thesis presents the first approach that handles higher-order programs through program transformation; hence, while our main examples use collection queries, we phrase the work in terms of λ-calculi with unspecified primitives [Cai, Giarrusso, Rendel and Ostermann, 2014]. In Chapter 17, we extend ILC with a further program transformation step, so that base programs can store intermediate results and derivatives can reuse them, but without resorting to dynamic memoization and necessarily needing to look results up at runtime. To this end, we build on work by Liu [2000] and extend it to a higher-order, typed setting.

Part II is more theoretical than Part I, because optimizations in Part I are much better understood than our approach to incrementalization.

To incrementalize programs, we are the first to extend to higher-order programs techniques based on finite differencing for queries on collections [Paige and Koenig, 1982] and databases [Blakeley et al., 1986; Gupta and Mumick, 1999]. Incrementalizing by finite differencing is a well-understood technique for database queries. How to generalize it for higher-order programs or beyond databases was less clear, so we spend significant energy on providing sound mathematical foundations for this transformation.

In fact, it took us a while to be sure that our transformation was correct, and to understand why our first correctness proof [Cai, Giarrusso, Rendel and Ostermann, 2014], while a significant step, was still more complex than needed. In Part II, especially Chapter 12, we offer a mathematically much simpler proof.


Contributions of Part I are listed in Sec. 2.1. Contributions of Part II are listed at the end of Sec. 10.7.

1.1.1 Included papers

This thesis includes material from joint work with colleagues.

Part I is based on work by Giarrusso, Ostermann, Eichberg, Mitschke, Rendel and Kästner [2013], and Chapters 2 to 4, 6, 8 and 9 come from that manuscript. While the work was done in collaboration, a few clearer responsibilities arose. I did most of the implementation work and collaborated on the writing; among other things, I devised the embedding for collection operations, implemented optimizations and indexing, implemented compilation when interpretation did not achieve sufficient performance, and performed the performance evaluation. Michael Eichberg and Ralf Mitschke contributed to the evaluation by adapting FindBugs queries. Christian Kästner contributed, among other things, to the experiment design for the performance evaluation. Klaus Ostermann proposed the original idea (together with an initial prototype) and supervised the project.

The first chapters of Part II are originally based on work by Cai, Giarrusso, Rendel and Ostermann [2014], though significantly revised. Chapter 10 contains significantly revised text from that paper, and Chapters 16 and 19 survive mostly unchanged. This work was even more of a team effort. I initiated and led the overall project and came up with the original notion of change structures. Cai Yufei contributed differentiation itself and its first correctness proof. Tillmann Rendel contributed significant simplifications and started the overall mechanization effort. Klaus Ostermann provided senior supervision that proved essential to the project. Chapters 12 and 13 contain a novel correctness proof for simply-typed λ-calculus; for its history and contributions, see Sec. 13.5.

Furthermore, Chapter 17 comes from a recently submitted manuscript [Giarrusso, Régis-Gianas and Schuster, Submitted]. I designed the overall approach, the transformation and the case study on sequences and nested loops. Proofs were done in collaboration with Yann Régis-Gianas; he is the main author of the Coq mechanization and proof, though I contributed significantly to the correctness proof for ILC, in particular with the proofs described in Appendix C and their partial Agda mechanization. The history of this correctness proof is described in Appendix C.9. Philipp Schuster contributed to the evaluation, devising language plugins for bags and maps in collaboration with me.

1.1.2 Excluded papers

This thesis improves the modularity of collection queries by automating different sorts of optimizations on them.

During my PhD work, I collaborated on several other papers, mostly on different facets of modularity, which do not appear in this thesis.

Modularity. Module boundaries hide implementation information that is not relevant to clients. However, it often turns out that some scenarios, clients or tasks require such hidden information. As discussed, manual optimization requires hidden information; hence, researchers strive to automate optimizations. But other tasks often violate these boundaries. I collaborated on research on understanding why this happens and how to deal with this problem [Ostermann, Giarrusso, Kästner and Rendel, 2011].

Software Product Lines. In some scenarios, a different form of modular software is realized through software product lines (SPLs), where many variants of a single software artifact (with different sets of features) can be generated. This is often done using conditional compilation, for instance through the C preprocessor.

But if a software product line uses conditional compilation, automatically processing its source code is difficult; in fact, even lexing or parsing it is a challenge. Since the number of variants is exponential in the number of features, generating and processing each variant does not scale.

To address these challenges, I collaborated with the TypeChef project to develop a variability-aware lexer [Kästner, Giarrusso and Ostermann, 2011a] and parser [Kästner, Giarrusso, Rendel, Erdweg, Ostermann and Berger, 2011b].

Another challenge is predicting non-functional properties (such as code size or performance) of a variant of a software product line, before generation and direct measurement. Such predictions can be useful to efficiently select a variant that satisfies some non-functional requirements. I collaborated on research on predicting such non-functional properties: we measure the properties on a few variants and extrapolate the results to other variants. In particular, I collaborated on the experimental evaluation on a few open-source projects, where even a simple model reached significant accuracy in a few scenarios [Siegmund, Rosenmüller, Kästner, Giarrusso, Apel and Kolesnikov, 2011, 2013].

Domain-specific languages. This thesis is concerned with DSLs, both ones for the domain of collection queries and also (in Part II) more general ones.

Different forms of language composition, both among DSLs and across general-purpose languages and DSLs, are possible, but their relationship is nontrivial. I collaborated on work classifying the different forms of language composition [Erdweg, Giarrusso and Rendel, 2012].

The implementation of SQuOpt, as described in Part I, requires an extensible representation of ASTs, similar to the one used by LMS [Rompf and Odersky, 2010]. While this representation is pretty flexible, it relies on Scala's support of GADTs [Emir et al., 2006, 2007b], which is known to be somewhat fragile due to implementation bugs. In fact, sound pattern matching on Scala's extensible GADTs is impossible without imposing significant restrictions on extensibility, due to language extensions not considered during formal study [Emir et al., 2006, 2007a]: the problem arises due to the interaction between GADTs, declaration-site variance annotations and variant refinement of type arguments at inheritance time. To illustrate the problem, I've shown it already applies to an extensible definitional interpreter for λ<: [Giarrusso, 2013]. While solutions have been investigated in other settings [Scherer and Rémy, 2013], the problem remains open for Scala to this day.

1.1.3 Navigating this thesis

The two parts of this thesis, while related by the common topic of collection queries, can be read mostly independently from each other. Summaries of the two parts are given, respectively, in Sec. 2.1 and Sec. 10.7.1.

Numbering. We use numbering and hyperlinks to help navigation; we explain our conventions to enable exploiting them.

To simplify navigation, we number all sorts of "theorems" (including here definitions, remarks, even examples or descriptions of notation) per-chapter, with counters including section numbers. For instance, Definition 12.2.1 is the first such item in Chapter 12, followed by Theorem 12.2.2, and they both appear in Sec. 12.2. And we can be sure that Lemma 12.2.8 comes after both "theorems" because of its number, even if they are of different sorts.

Similarly, we number tables and figures per-chapter, with a shared counter. For instance, Fig. 6.1 is the first figure or table in Chapter 6, followed by Table 6.2.


To help reading, at times we will repeat or anticipate statements of "theorems", to refer to them without forcing readers to jump back and forth. We will then reuse the original number. For instance, Sec. 12.2 contains a copy of Slogan 10.3.3 with its original number.

For ease of navigation, all such references are hyperlinked in electronic versions of this thesis, and the table of contents is exposed to PDF readers via PDF bookmarks.


Part I

Optimizing Collection Queries by Reification


Chapter 2

Introduction

In-memory collections of data often need efficient processing. For on-disk data, efficient processing is already provided by database management systems (DBMSs), thanks to their query optimizers, which support many optimizations specific to the domain of collections. Moving in-memory data to DBMSs, however, typically does not improve performance [Stonebraker et al., 2007], and query optimizers cannot be reused separately, since DBMSs are typically monolithic and their optimizers deeply integrated. A few collection-specific optimizations, such as shortcut fusion [Gill et al., 1993], are supported by compilers for purely functional languages such as Haskell. However, the implementation techniques for those optimizations do not generalize to many other ones, such as support for indexes. In general, collection-specific optimizations are not supported by the general-purpose optimizers used by typical (JIT) compilers.

Therefore, programmers needing collection-related optimizations perform them manually. To allow that, they are often forced to perform manual inlining [Peyton Jones and Marlow, 2002]. But manual inlining modifies source code by combining distinct functions together, while often distinct functions should remain distinct, because they deal with different concerns, or because one function needs to be reused in a different context. In either case, manual inlining reduces modularity, defined here as the ability to abstract behavior in a separate function (possibly part of a different module) to enable reuse and improve understandability.

For these reasons, currently developers need to choose between modularity and performance, as also highlighted by Kiczales et al. [1997] on a similar example. Instead, we envision that they should rely on an automatic optimizer performing inlining and collection-specific optimizations. They would then achieve both performance and modularity.¹

One way to implement such an optimizer would be to extend the compiler of the language with a collection-specific optimizer, or to add some kind of external preprocessor to the language. However, such solutions would be rather brittle (for instance, they lack composability with other language extensions), and they would preclude optimization opportunities that arise only at runtime.

For this reason, our approach is implemented as an embedded domain-specific language (EDSL), that is, as a regular library. We call this library SQuOpt, the Scala QUery OPTimizer. SQuOpt consists of a Scala EDSL for queries on collections, based on the Scala collections API. An expression in this EDSL produces at run time an expression tree in the host language: a data structure which represents the query to execute, similar to an abstract syntax tree (AST) or a query plan. Thanks to

¹In the terminology of Kiczales et al. [1997], our goal is to be able to decompose different generalized procedures of a program according to its primary decomposition, while separating the handling of some performance concerns. To this end, we are modularizing these performance concerns into a metaprogramming-based optimization module, which we believe could be called, in that terminology, an aspect.


the extensibility of Scala, expressions in this language look almost identical to expressions with the same meaning in Scala. When executing the query, SQuOpt optimizes and compiles these expression trees for more efficient execution. Doing optimization at run time instead of at compile time avoids the need for control-flow analyses to determine which code will actually be executed [Chambers et al., 2010], as we will see later.

We have chosen Scala [Odersky et al., 2011] to implement our library, for two reasons: (i) Scala is a good meta-language for EDSLs, because it is syntactically flexible and has a powerful type system, and (ii) Scala has a sophisticated collections library with an attractive syntax (for-comprehensions) to specify queries.

To evaluate SQuOpt, we study queries of the FindBugs tool [Hovemeyer and Pugh, 2004]. We rewrote a set of queries to use the Scala collections API, and show that modularization incurs significant performance overhead. Subsequently, we consider versions of the same queries using SQuOpt. We demonstrate that the automatic optimization can reconcile modularity and performance in many cases. Adding advanced optimizations such as indexing can even improve the performance of the analyses beyond the original non-modular analyses.

2.1 Contributions and summary

Overall, our main contributions in Part I are the following:

• We illustrate the tradeoff between modularity and performance when manipulating collections, caused by the lack of domain-specific optimizations (Sec. 2.2). Conversely, we illustrate how domain-specific optimizations lead to more readable and more modular code (Chapter 3).

• We present the design and implementation of SQuOpt, an embedded DSL for queries on collections in Scala (Chapter 4).

• We evaluate SQuOpt to show that it supports writing queries that are at the same time modular and fast. We do so by re-implementing several code analyses of the FindBugs tool. The resulting code is more modular and/or more efficient, in some cases by orders of magnitude. In these case studies, we measured average speedups of 12x, with a maximum of 12800x (Chapter 6).

2.2 Motivation

In this section, we show how the absence of collection-specific optimizations forces programmers to trade modularity against performance, which motivates our design of SQuOpt to resolve this conflict.

As our running example throughout the chapter, we consider representing and querying a simple in-memory bibliography. A book has, in our schema, a title, a publisher and a list of authors. Each author, in turn, has a first and a last name. We represent authors and books as instances of the Scala classes Author and Book, shown in Fig. 2.1. The class declarations list the type of each field. Titles, publishers, and first and last names are all stored in fields of type String. The list of authors is stored in a field of type Seq[Author], that is, a sequence of authors – something that would be more complex to model in a relational database. The code fragment also defines a collection of books, named books.

As a common idiom to query such collections, Scala provides for-comprehensions. For instance, the for-comprehension computing records in Fig. 2.2 finds all books published by Pearson Education and yields, for each of those books and for each of its authors, a record containing the book title,


package schema
case class Author(firstName: String, lastName: String)
case class Book(title: String, publisher: String,
                authors: Seq[Author])

val books: Set[Book] = Set(
  new Book("Compilers: Principles, Techniques and Tools",
           "Pearson Education",
           Seq(new Author("Alfred V.", "Aho"),
               new Author("Monica S.", "Lam"),
               new Author("Ravi", "Sethi"),
               new Author("Jeffrey D.", "Ullman"))),
  // other books ...
)

Figure 2.1: Definition of the schema and of some content.

the full name of that author and the number of additional coauthors. The generator book <- books functions like a loop header: the remainder of the for-comprehension is executed once per book in the collection. Consequently, the generator author <- book.authors starts a nested loop. The return value of the for-comprehension is a collection of all yielded records. Note that if a book has multiple authors, this for-comprehension will return multiple records relative to this book, one for each author.

We can further process this collection with another for-comprehension, possibly in a different module. For example, still in Fig. 2.2, the function titleFilter filters book titles containing the word "Principles" and drops from each record the number of additional coauthors.

In Scala, the implementation of for-comprehensions is not fixed. Instead, the compiler desugars a for-comprehension to a series of API calls, and different collection classes can implement this API differently. Later, we will use this flexibility to provide an optimizing implementation of for-comprehensions, but in this section we focus on the behavior of the standard Scala collections, which implement for-comprehensions as loops that create intermediate collections.
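For instance, following the standard Scala desugaring rules (the guard becomes withFilter, the outer generator becomes flatMap, the inner one becomes map), the records query of Fig. 2.2 turns into the following chain of API calls; the thesis details its own desugaring later, in Fig. 5.1:

// Standard desugaring of the records query from Fig. 2.2.
val recordsDesugared = books
  .withFilter(book => book.publisher == "Pearson Education")
  .flatMap(book => book.authors.map(author =>
    new BookData(book.title,
                 author.firstName + " " + author.lastName,
                 book.authors.size - 1)))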

2.2.1 Optimizing by Hand

In the naive implementation in Fig. 2.2, different concerns are separated, hence it is modular. However, it is also inefficient. To execute this code, we first build the original collection, and only later we perform further processing to build the new result; creating the intermediate collection at the interface between these functions is costly. Moreover, the same book can appear in records more than once, if the book has more than one author; but all of these duplicates have the same title. Nevertheless, we test each duplicate title separately for whether it contains the searched keyword. If books have 4 authors on average, this means a slowdown of a factor of 4 for the filtering step.

In general, one can only resolve these inefficiencies by manually optimizing the query; however, we will observe that these manual optimizations produce less modular code.²

To address the first problem above, that is, to avoid creating intermediate collections, we can manually inline titleFilter and records; we obtain two nested for-comprehensions. Furthermore, we can unnest the inner one [Fegaras and Maier, 2000].
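For concreteness, here is what the composed query looks like right after inlining, before unnesting and hoisting; this intermediate step is a reconstruction from the code in Fig. 2.2 (the thesis only shows the final, optimized form, in Fig. 2.3):

// The result of inlining records into titleFilter: two nested
// for-comprehensions, still building the intermediate collection.
val res =
  for {
    record <- (for {
        book <- books
        if book.publisher == "Pearson Education"
        author <- book.authors
      } yield new BookData(book.title,
                           author.firstName + " " + author.lastName,
                           book.authors.size - 1))
    if record.title.contains("Principles")
  } yield (record.title, record.authorName)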

²The existing Scala collections API supports optimization, for instance through non-strict variants of the query operators (called 'views' in Scala), but they can only be used for a limited set of optimizations, as we discuss in Chapter 8.


case class BookData(title: String, authorName: String, coauthors: Int)

val records =
  for {
    book <- books
    if book.publisher == "Pearson Education"
    author <- book.authors
  } yield new BookData(book.title,
                       author.firstName + " " + author.lastName,
                       book.authors.size - 1)

def titleFilter(records: Set[BookData],
                keyword: String) =
  for {
    record <- records
    if record.title.contains(keyword)
  } yield (record.title, record.authorName)

val res = titleFilter(records, "Principles")

Figure 2.2: Our example query on the schema in Fig. 2.1, and a function which postprocesses its result.

To address the second problem above, that is, to avoid testing the same title multiple times, we hoist the filtering step; that is, we change the order of the processing steps in the query, to first look for keyword within book.title and only then iterate over the set of authors. This does not change the overall semantics of the query, because the filter only accesses the title and does not depend on the author. In the end, we obtain the code in Fig. 2.3. The resulting query processes the title of each book only once. Since filtering in Scala is done lazily, the resulting query avoids building an intermediate collection.

This second optimization is only possible after inlining, and thereby after reducing the modularity of the code, because it mixes together processing steps from titleFilter and from the definition of records. Therefore, reusing the code creating records would now be harder.

To make titleFilterHandOpt more reusable, we could turn the publisher name into a parameter. However, the new versions of titleFilter cannot be reused as-is if some details of the inlined code change; for instance, we might need to filter publishers differently, or not at all. On the other hand, if we express queries modularly, we might lose some opportunities for optimization. The design of the collections API, both in Scala and in typical languages, forces us to manually optimize our code by repeated inlining and subsequent application of query optimization rules, which leads to a loss of modularity.


def titleFilterHandOpt(books: Set[Book],
                       publisher: String, keyword: String) =
  for {
    book <- books
    if book.publisher == publisher && book.title.contains(keyword)
    author <- book.authors
  } yield (book.title, author.firstName + " " + author.lastName)

val res = titleFilterHandOpt(books,
  "Pearson Education", "Principles")

Figure 2.3: Composition of queries in Fig. 2.2, after inlining, query unnesting and hoisting.


Chapter 3

Automatic optimization with SQuOpt

The goal of SQuOpt is to let programmers write queries modularly and at a high level of abstraction, and deal with optimization by a dedicated domain-specific optimizer. In our concrete example, programmers should be able to write queries similar to the one in Fig. 2.2, but get the efficiency of the one in Fig. 2.3. To allow this, SQuOpt overloads for-comprehensions and other constructs, such as string concatenation with + and field access (book.author). Our overloads of these constructs reify the query as an expression tree. SQuOpt can then optimize this expression tree and execute the resulting optimized query. Programmers explicitly trigger processing by SQuOpt by adapting their queries, as we describe in the next subsection.

3.1 Adapting a Query

To use SQuOpt instead of native Scala queries, we first assume that the query does not use side effects and is thus purely functional. We argue that purely functional queries are more declarative. Side effects are used to improve performance, but SQuOpt makes that unnecessary through automatic optimizations. In fact, the lack of side effects enables more optimizations.

In Fig. 3.1, we show a version of our running example adapted to use SQuOpt. We first discuss changes to records. To enable SQuOpt, a programmer needs to (a) import the SQuOpt library, (b) import some wrapper code specific to the types the collection operates on, in this case Book and Author (more about that later), (c) explicitly convert the native Scala collections involved to collections of our framework, by a call to asSquopt, (d) rename a few operators, such as == to ==# (this is necessary due to some Scala limitations), and (e) add a separate step where the query is evaluated (possibly after optimization). All these changes are lightweight and mostly of a syntactic nature.

For parameterized queries like titleFilter, we also need to adapt type annotations. The ones in titleFilterQuery reveal some details of our implementation. Expressions that are reified have type Exp[T] instead of T. As the code shows, resQuery is optimized before compilation. This call will perform the optimizations that we previously did by hand, and will return a query equivalent to that in Fig. 2.3, after verifying their safety conditions. For instance, after inlining, the filter if book.title.contains(keyword) does not reference author; hence, it is safe to hoist. Note that checking this safety condition would not be possible without reifying the predicate. For instance, it would not be sufficient to only reify the calls to the collection API, because the predicate


import squopt._
import schema.squopt._

val recordsQuery =
  for {
    book <- books.asSquopt
    if book.publisher ==# "Pearson Education"
    author <- book.authors
  } yield new BookData(book.title,
                       author.firstName + " " + author.lastName,
                       book.authors.size - 1)

val records = recordsQuery.eval

def titleFilterQuery(records: Exp[Set[BookData]], keyword: Exp[String]) =
  for {
    record <- records
    if record.title.contains(keyword)
  } yield (record.title, record.authorName)

val resQuery = titleFilterQuery(recordsQuery, "Principles")
val res = resQuery.optimize.eval

Figure 3.1: SQuOpt version of Fig. 2.2. recordsQuery contains a reification of the query; records, its result.


is represented as a boolean function parameter. In general, our automatic optimizer inspects the whole reification of the query implementation, to check that optimizations do not introduce changes in the overall result of the query and are therefore safe.

3.2 Indexing

SQuOpt also supports the transparent usage of indexes. Indexes can further improve the efficiency of queries, sometimes by orders of magnitude. In our running example, the query scans all books to look for the ones having the right publisher. To speed up this query, we can preprocess books to build an index, that is, a dictionary mapping each publisher to a collection of all the books it published. This index can then be used to answer the original query without scanning all books.

We construct a query representing the desired dictionary, and inform the optimizer that it should use this index where appropriate:

val idxByPublisher = books.asSquopt.indexBy(_.publisher)
Optimization.addIndex(idxByPublisher)

The indexBy collection method accepts a function that maps a collection element to a key; coll.indexBy(key) returns a dictionary mapping each key to the collection of all elements of coll having that key. Missing keys are mapped to an empty collection.¹ Optimization.addIndex simply preevaluates the index and updates a dictionary mapping the index to its preevaluated result.
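On plain Scala collections, this semantics can be approximated with standard library methods; the following is a sketch of indexBy's meaning, not SQuOpt's reified implementation:

// A sketch of indexBy's semantics: groupBy, plus an empty default for
// missing keys (this is what distinguishes it from plain groupBy).
def indexBy[K, T](coll: Set[T])(key: T => K): Map[K, Set[T]] =
  coll.groupBy(key).withDefaultValue(Set.empty[T])

val byPublisher = indexBy(books)(_.publisher)
byPublisher("Pearson Education")  // the books with this publisher
byPublisher("No Such Publisher")  // the empty set, not an exception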

A call to optimize on a query will then take this index into account, and rewrite the query to perform index lookup instead of scanning, if possible. For instance, the code in Fig. 3.1 would be transparently rewritten by the optimizer to a query similar to the following:

val indexedQuery =
  for {
    book <- idxByPublisher("Pearson Education")
    author <- book.authors
  } yield new BookData(book.title,
                       author.firstName + " " + author.lastName,
                       book.authors.size - 1)

Since dictionaries in Scala are functions, in the above code, dictionary lookup on idxByPublisher is represented simply as function application. The above code iterates over books having the desired publisher, instead of scanning the whole library, and performs the remaining computation from the original query. Although the index use in the listing above is notated as idxByPublisher("Pearson Education"), only the cached result of evaluating the index is used when the query is executed, not the reified index definition.

This optimization could also be performed manually, of course, but the queries are on a higher abstraction level and more maintainable if indexing is defined separately and applied automatically. Manual application of indexing is a crosscutting concern, because adding or removing an index potentially affects many queries. SQuOpt does not free the developer from the task of assessing which index will 'pay off' (we have not considered automatic index creation yet), but at least it becomes simple to add or remove an index, since the application of the indexes is modularized in the optimizer.

¹For readers familiar with the Scala collection API, we remark that the only difference with the standard groupBy method is the handling of missing keys.


Chapter 4

Implementing SQuOpt

After describing how to use SQuOpt, we explain how SQuOpt represents queries internally and optimizes them. Here we give only a brief overview of our implementation technique; it is described in more detail in Sec. 5.1.

4.1 Expression Trees

In order to analyze and optimize collection queries at runtime, SQuOpt reifies their syntactic structure as expression trees. The expression tree reflects the syntax of the query after desugaring, that is, after for-comprehensions have been replaced by API calls. For instance, recordsQuery from Fig. 3.1 points to the following expression tree (with some boilerplate omitted for clarity):

new FlatMap(
  new Filter(
    new Const(books),
    v2 => new Eq(new Book_publisher(v2),
                 new Const("Pearson Education"))),
  v3 => new MapNode(
    new Book_authors(v3),
    v4 => new BookData(
      new Book_title(v3),
      new StringConcat(
        new StringConcat(
          new Author_firstName(v4),
          new Const(" ")),
        new Author_lastName(v4)),
      new Plus(new Size(new Book_authors(v3)),
               new Negate(new Const(1))))))

The structure of the for-comprehension is encoded with the FlatMap, Filter and MapNode instances. These classes correspond to the API methods that for-comprehensions get desugared to: SQuOpt arranges for the implementation of flatMap to construct a FlatMap instance, etc. The instances of the other classes encode the rest of the structure of the collection query, that is, which methods are called on which arguments. On the one hand, SQuOpt defines classes such as Const or Eq, which are generic and applicable to all queries. On the other hand, classes such as Book_publisher cannot be predefined, because they are specific to the user-defined types used


in a query. SQuOpt provides a small code generator, which creates a case class for each method and field of a user-defined type. Functions in the query are represented by functions that create expression trees; representing functions in this way is frequently called higher-order abstract syntax [Pfenning and Elliot, 1988].
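The following simplified, self-contained sketch illustrates the idea; the node definitions are hypothetical stand-ins, much simpler than SQuOpt's actual class hierarchy:

// Higher-order abstract syntax: a reified function is a host-language
// function from expression trees to expression trees (toy definitions).
trait Exp[T]
case class Const[T](v: T) extends Exp[T]
case class Eq[T](a: Exp[T], b: Exp[T]) extends Exp[Boolean]
case class Filter[T](coll: Exp[Set[T]],
                     pred: Exp[T] => Exp[Boolean]) extends Exp[Set[T]]

// No quotation machinery is needed: the host language itself provides
// binding, e.g. Filter(Const(books), b => Eq(..., ...)).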

We can see that the reification of this code corresponds closely to an abstract syntax tree for the code which is executed; however, many calls to specific methods like map are represented by special nodes like MapNode, rather than as method calls. For the optimizer, it becomes easier to match and transform those nodes than with a generic abstract syntax tree.

Nodes for collection operations are carefully defined by hand, to give them highly generic type signatures and make them reusable for all collection types. In Scala, collection operations are highly polymorphic; for instance, map has a single implementation working on all collection types like List, Set, and we similarly want to represent all usages of map through instances of a single node type, namely MapNode. Having separate nodes ListMapNode, SetMapNode and so on would be inconvenient, for instance when writing the optimizer. However, map on a List[Int] will produce another List, while on a Set it will produce another Set, and so on for each specific collection type (in first approximation); moreover, this is guaranteed statically by the type of map. Yet, thanks to advanced type-system features, map is defined only once, avoiding redundancy, but has a type polymorphic enough to guarantee statically that the correct return value is produced. Since our tree representation is strongly typed, we need to have a similar level of polymorphism in MapNode. We achieved this by extending the techniques described by Odersky and Moors [2009], as detailed in our technical report [Giarrusso et al., 2012].
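For reference, this is the signature achieving that polymorphism in the Scala 2.x collections (in scala.collection.TraversableLike, where A is the element type and Repr the concrete collection type); MapNode must offer a similarly polymorphic type:

// The implicit CanBuildFrom selects the correct result type That statically:
// mapping over a List yields a List, over a Set a Set, and so on.
def map[B, That](f: A => B)(implicit bf: CanBuildFrom[Repr, B, That]): That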

We get these expression trees by using Scala implicit conversions in a particular style, which we adopted from Rompf and Odersky [2010]. Implicit conversions allow adding, for each method A.foo(B), an overload Exp[A].foo(Exp[B]). Where a value of type Exp[T] is expected, a value of type T can be used, thanks to other implicit conversions which wrap it in a Const node. The initial call of asSquopt triggers the application of the implicit conversions, by converting the collection to the leaf of an expression tree.
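A minimal sketch of this style, reusing the toy Exp and Const definitions from the sketch above (again with hypothetical names, not SQuOpt's code):

// Lifting String.+(String): an overload on Exp[String] builds a node, and
// an implicit conversion wraps plain values in Const where Exp[T] is expected.
case class StringConcat(a: Exp[String], b: Exp[String]) extends Exp[String]

implicit def const[T](v: T): Exp[T] = Const(v)
implicit class StringExpOps(a: Exp[String]) {
  def +(b: Exp[String]): Exp[String] = StringConcat(a, b)
}

// Now (c: Exp[String]) + "!" builds StringConcat(c, Const("!"))
// instead of concatenating strings.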

It is also possible to call methods that do not return expression trees; however, such method calls would then only be represented by an opaque MethodCall node in the expression tree, which means that the code of the method cannot be considered in optimizations.

Crucially, these expression trees are generated at runtime. For instance, the first Const contains a reference to the actual collection of books to which books refers. If a query uses another query, such as records in Fig. 3.1, then the subquery is effectively inlined. The same holds for method calls inside queries: if these methods return an expression tree (such as the titleFilterQuery method in Fig. 3.1), then these expression trees are inlined into the composite query. Since the reification happens at runtime, it is not necessary to predict the targets of dynamically bound method calls: a new (and possibly different) expression tree is created each time a block of code containing queries is executed.

Hence, we can say that expression trees represent the computation which is going to be executed after inlining; control flow or virtual calls in the original code typically disappear, especially if they manipulate the query as a whole. This is typical of deeply embedded DSLs like ours, where code, instead of performing computations, produces a representation of the computation to perform [Elliott et al., 2003; Chambers et al., 2010].

This inlining can duplicate computations, for instance in this code:

val num: Exp[Int] = 10
val square = num * num
val sum = square + square


evaluating sum will evaluate square twice; Elliott et al. [2003] and we avoid this by using common-subexpression elimination.

4.2 Optimizations

Our optimizer currently supports several algebraic optimizations. Any query, and in fact every reified expression, can be optimized by calling the optimize function on it. The ability to optimize reified expressions that are not queries is useful: for instance, optimizing a function that produces a query is similar to a "prepared statement" in relational databases.

The optimizations we implemented are mostly standard in compilers [Muchnick, 1997] or databases:

• Query unnesting merges a nested query into the containing one [Fegaras and Maier, 2000; Grust and Scholl, 1999], replacing, for instance,

  for { val1 <- (for { val2 <- coll } yield f(val2)) } yield g(val1)

with

  for { val2 <- coll; val1 = f(val2) } yield g(val1)

• Bulk operation fusion fuses higher-order operators on collections.

• Filter hoisting tries to apply filters as early as possible; in database query optimization, it is known as selection pushdown. For filter hoisting, it is important that the full query is reified, because otherwise the dependencies of the filter condition cannot be determined.

• During optimization, we reduce tuple/case class accesses. For instance, (a, b)._1 is simplified to a. This is important because the produced expression does not depend on b; removing this false dependency can allow, for instance, a filter containing this expression to be hoisted to a context where b is not bound.

• Indexing tries to apply one or more of the available indexes to speed up the query.

• Common subexpression elimination (CSE) avoids performing the same computation multiple times; we use techniques similar to Rompf and Odersky [2010].

• Smaller optimizations include constant folding, reassociation of associative operators, and removal of identity maps (coll.map(x => x), typically generated by the translation of for-comprehensions).

Each optimization is applied recursively, bottom-up, until it does not trigger anymore; different optimizations are composed in a fixed pipeline.

Optimizations are only guaranteed to be semantics-preserving if queries obey the restrictions we mentioned: for instance, queries should not involve side effects, such as assignments or I/O, and all collections used in queries should implement the specifications stated in the collections API. Obviously, the choice of optimizations involves many tradeoffs; for that reason, we believe that it is all the more important that the optimizer is not hard-wired into the compiler, but implemented as a library, with potentially many different implementations.

To make changes to the optimizer more practical, we designed our query representation so that optimizations are easy to express; restricting to pure queries also helps. For instance,


filter fusion can be implemented in a few lines of code: just match against expressions of the form coll.filter(pred2).filter(pred1) and rewrite them:¹

val mergeFilters = ExpTransformer {
  case Sym(Filter(Sym(Filter(collection, pred2)), pred1)) ⇒
    collection.filter(x ⇒ pred2(x) && pred1(x))
}

A more complex optimization, such as filter hoisting, requires only 20 lines of code.

We have implemented a prototype of the optimizer with the mentioned optimizations. Many

additional algebraic optimizations can be added in future work, by us or others; a candidate would be loop hoisting, which moves out of loops arbitrary computations not depending on the loop variable (and not just filters). With some changes to the optimizer's architecture, it would also be possible to perform cost-based and dynamic optimizations.

4.3 Query Execution

Calling the eval method on a query will convert it to executable bytecode; this bytecode will be loaded and invoked by using Java reflection. We produce a thunk that, when evaluated, will execute the generated code.

In our prototype, we produce bytecode by converting expression trees to Scala code and invoking on the result the Scala compiler, scalac. Invoking scalac is typically quite slow, and we currently use caching to limit this concern; however, we believe it is merely an engineering problem to produce bytecode directly from expression trees, just as compilers do.

Our expression trees contain native Scala values wrapped in Const nodes, and in many cases one cannot produce Scala program text evaluating to the same value. To allow executing such expression trees, we need to implement cross-stage persistence (CSP): the generated code will be a function accepting the actual values as arguments [Rompf and Odersky 2010]. This allows sharing the compiled code for expressions which differ only in the embedded values.

In more detail, our compilation algorithm is as follows. (a) We implement CSP by replacing embedded Scala values by references to the function arguments; for instance, List(1, 2, 3).map(x ⇒ x + 1) becomes the function (s1: List[Int], s2: Int) ⇒ s1.map(x ⇒ x + s2). (b) We look up the produced expression tree, together with the types of the constants we just removed, in a cache mapping to the generated classes; if the lookup fails, we update the cache with the result of the next steps. (c) We apply CSE on the expression. (d) We convert the tree to code, compile it and load the generated code.
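
The following sketch illustrates steps (a)–(d); extractConsts and emitAndLoad are hypothetical helpers standing in for the constant-extraction and code-generation machinery described above, not SQuOpt's actual API.

import scala.collection.mutable

// step (a): CSP, replacing embedded values by function arguments
def extractConsts[T](e: Exp[T]): (Exp[T], List[Any]) = ???
// steps (c) and (d): CSE, code generation, compilation and loading
def emitAndLoad(e: Exp[_]): List[Any] => Any = ???

// cache keyed by the constant-free tree plus the types of the constants
val codeCache = mutable.Map.empty[(Exp[_], List[Class[_]]), List[Any] => Any]

def toThunk[T](query: Exp[T]): () => T = {
  val (tree, consts) = extractConsts(query)
  val key = (tree, consts.map(c => c.getClass: Class[_]))
  // step (b): reuse generated code for queries differing only in constants
  val compiled = codeCache.getOrElseUpdate(key, emitAndLoad(tree))
  () => compiled(consts).asInstanceOf[T]
}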

Preventing errors in generated code. Compiler errors in generated code are typically a concern with SQuOpt; however, they can only arise due to implementation bugs in SQuOpt (for instance in pretty-printing, which cannot be checked statically), so they do not concern users. Since our query language and tree representation are statically typed, type-incorrect queries will be rejected statically. For instance, consider again idxByPublisher, described previously:

val idxByPublisher = books.asSquopt.indexBy(_.publisher)

Since Book.publisher returns a String, idxByPublisher has type Exp[Map[String, Book]]. Looking up a key of the wrong type, for instance by writing idxByPublisher(book) where book: Book, will make scalac emit a static type error.

¹Sym nodes are part of the boilerplate we omitted earlier.

Chapter 5

A deep EDSL for collection queries

In this chapter we discuss collections as a critical case study [Flyvbjerg 2006] for deep DSL embedding.

As discussed in the previous chapter, to support optimizations we require a deep embedding of the collections DSL.

While the basic idea of deep embedding is well known, it is not obvious how to realize deep embedding when considering the following additional goals:

• To support users adopting SQuOpt, a generic SQuOpt query should share the "look and feel" of the ordinary collections API. In particular, query syntax should remain mostly unchanged. In our case, we want to preserve Scala's for-comprehension¹ syntax and its notation for anonymous functions.

• Again to support users adopting SQuOpt, a generic SQuOpt query should not only share the syntax of the ordinary collections API; it should also be well-typed if and only if the corresponding ordinary query is well-typed. This is particularly challenging in the Scala collections library, due to its deep integration with advanced type-system features such as higher-kinded generics and implicit objects [Odersky and Moors 2009]. For instance, calling map on a List will return a List, and calling map on a Set will return a Set. Hence the object-language representation and the transformations thereof should be as "typed" as possible. This precludes, among others, a first-order representation of object-language variables as strings.

• SQuOpt should be interoperable with ordinary Scala code and Scala collections. For instance, it should be possible to call normal, non-reified functions within a SQuOpt query, or to mix native Scala queries and SQuOpt queries.

• The performance of SQuOpt queries should be reasonable even without optimizations. A non-optimized SQuOpt query should not be dramatically slower than a native Scala query. Furthermore, it should be possible to create new queries at run time and execute them without excessive overhead. This goal limits the options of applicable interpretation or compilation techniques.

We think that these challenges characterize deep embedding of queries on collections as a critical case study [Flyvbjerg 2006] for DSL embedding. That is, it is so challenging that embedding

¹Also known as for expressions [Odersky et al. 2011, Ch. 23].



techniques successfully used in this case are likely to be successful on a broad range of other DSLs. In this chapter we report from the case study the successes and failures in achieving these goals in SQuOpt.

5.1 Implementation: Expressing the interface in Scala

To optimize a query as described in the previous section, SQuOpt needs to reify, optimize and execute queries. Our implementation assigns responsibility for these steps to three main components: a generic library for reification and execution of general Scala expressions, a more specialized library for reification and execution of query operators, and a dedicated query optimizer. Queries then need to be executed, through either compilation (already discussed in Sec. 4.3) or interpretation (discussed in Sec. 5.6). We describe the implementation in more detail in the rest of this section. The full implementation is also available online.²

A core idea of SQuOpt is to reify Scala code as a data structure in memory. A programmer could directly create instances of that data structure, but we also provide a more convenient interface, based on advanced Scala features such as implicit conversions and type inference. That interface allows reifying code automatically, with a minimum of programmer annotations, as shown in the examples in Chapter 3. Since this is a case study on Scala's support for deep embedding of DSLs, we also describe in this section how Scala supports our task; in particular, we report on techniques we used and issues we faced.

5.2 Representing expression trees

In the previous section we have seen that expressions that would have type T in a native Scala query are reified and have type Exp[T] in SQuOpt. The generic type Exp[T] is the base for our reification of Scala expressions as expression trees, that is, as data structures in memory. We provide a subclass of Exp[T] for every different form of expression we want to reify. For example, in Fig. 2.2 the expression author.firstName + " " + author.lastName must be reified, even though it is not collection-related, for otherwise the optimizer could not see whether author is used. Knowing this is needed, for instance, to remove variables which are bound but not used. Hence this expression is reified as:

StringConcat(StringConcat(AuthorFirstName(author), Const(" ")), AuthorLastName(author))

This example uses the constructors of the following subclasses of Exp[T] to create the expression tree:

case class Const[T](t: T) extends Exp[T]
case class StringConcat(str1: Exp[String], str2: Exp[String]) extends Exp[String]
case class AuthorFirstName(t: Exp[Author]) extends Exp[String]
case class AuthorLastName(t: Exp[Author]) extends Exp[String]

Expression nodes additionally implement support code for tree traversals, to support optimizations, which we omit here.

This representation of expression trees is well-suited for representing the structure of expressions in memory, and also for pattern matching (which is automatically supported for case classes in Scala), but inconvenient for query writers. In fact, in Fig. 3.1 we have seen that SQuOpt provides a much more convenient front-end: the programmer writes almost the usual code for type T, and SQuOpt converts it automatically to Exp[T].

²http://www.informatik.uni-marburg.de/~pgiarrusso/SQuOpt


5.3 Lifting first-order expressions

We call the process of converting from T to Exp[T] lifting. Here we describe how we lift first-order expressions – Scala expressions that do not contain anonymous function definitions.

To this end consider again the fragment

author.firstName + " " + author.lastName

now in the context of the SQuOpt-enabled query in Fig. 3.1. It looks like a normal Scala expression, even syntactically unchanged from Fig. 2.2. However, evaluating that code in the context of Fig. 3.1 does not concatenate any strings, but creates an expression tree instead. Although the code looks like the same expression, it has a different type: Exp[String] instead of String. This difference in the type is caused by the context: the variable author is now bound in a SQuOpt-enabled query and therefore has type Exp[Author] instead of Author. We can still access the firstName field of author, because expression trees of type Exp[T] provide the same interface as values of type T, except that all operations return expression trees instead of values.

To understand how an expression tree of type Exp[T] can have the same interface as a value of type T, we consider two expression trees str1 and str2 of type Exp[String]. The implementation of lifting differs depending on the kind of expression we want to lift.

Method calls and operators. In our example, the operator + should be available on Exp[String] but not on Exp[Boolean], because + is available on String but not on Boolean. Furthermore, we want str1 + str2 to have type Exp[String] and to evaluate not to a string concatenation, but to a call of StringConcat, that is, to an expression tree which represents str1 + str2. This is a somewhat unusual requirement, because usually the interface of a generic type does not depend on the type parameters.

To provide such operators and to encode expression trees, we use implicit conversions in a similar style as Rompf and Odersky [2010]. Scala allows making expressions of a type T implicitly convertible to a different type U. To this end, one must declare an implicit conversion function having type T ⇒ U. Calls to such functions will be inserted by the compiler when required to fix a type mismatch between an expression of type T and a context expecting a value of type U. In addition, a method call e.m(args) can trigger the conversion of e to a type where the method m is present.³ Similarly, an operator usage as in str1 + str2 can also trigger an implicit conversion: an expression using an operator like str1 + str2 is desugared to the method call str1.+(str2), which can trigger an implicit conversion from str1 to a type providing the + method. Therefore, from now on, we do not distinguish between operators and methods.

To provide the method + on Exp[String], we define an implicit conversion from Exp[String] to a new type providing a + method which creates the appropriate expression node:

implicit def expToStringOps(t: Exp[String]) = new StringOps(t)
class StringOps(t: Exp[String]) {
  def +(that: Exp[String]): Exp[String] = StringConcat(t, that)
}

This is an example of the well-known Scala enrich-my-library pattern⁴ [Odersky 2006]. With these declarations in scope, the Scala compiler rewrites str1 + str2 to expToStringOps(str1).+(str2),

which evaluates to StringConcat(str1, str2) as desired. The implicit conversion function expToStringOps is not applicable to Exp[Boolean], because it explicitly specifies the receiver of the +-call to have type Exp[String]. In other words, expressions like str1 + str2 are now lifted

³For the exact rules, see Odersky et al. [2011, Ch. 21] and Odersky [2011].
⁴Also known as pimp-my-library pattern.


on the level of expression trees, in a type-safe way. For brevity, we refer to the defined operator as Exp[String].+.
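
For comparison, the analogous lifting for Boolean operations would provide && but no +, mirroring the respective interfaces of String and Boolean; this is a sketch with illustrative names, not SQuOpt's actual node definitions.

case class BooleanAnd(b1: Exp[Boolean], b2: Exp[Boolean]) extends Exp[Boolean]

implicit def expToBooleanOps(t: Exp[Boolean]) = new BooleanOps(t)
class BooleanOps(t: Exp[Boolean]) {
  // && is available on Boolean, hence lifted; + is not, hence absent here
  def &&(that: Exp[Boolean]): Exp[Boolean] = BooleanAnd(t, that)
}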

Literal values. However, a string concatenation might also include constants, as in str1 + "foo" or "bar" + str1. To lift str1 + "foo", we introduce a lifting for constants, which wraps them in a Const node:

implicit def pure[T](t: T): Exp[T] = Const(t)

The compiler will now rewrite str1 + "foo" to expToStringOps(str1) + pure("foo"), and str1 + "foo" + str2 to expToStringOps(str1) + pure("foo") + str2. Different implicit conversions cooperate successfully to lift the expression.

Analogously, it would be convenient if the similar expression "bar" + str1 would be rewritten to expToStringOps(pure("bar")) + str1, but this is not the case, because implicit coercions are not chained automatically in Scala. Instead, we have to manually chain existing implicit conversions into a new one:

implicit def toStringOps(t: String) = expToStringOps(pure(t))

so that "bar" + str1 is rewritten to toStringOps("bar") + str1.

User-defined methods. Calls of user-defined methods like author.firstName are lifted the same way as calls to built-in methods, such as the string concatenation shown earlier. For the running example, the following definitions are necessary to lift the methods from Author to Exp[Author]:

package schema.squopt

implicit def expToAuthorOps(t: Exp[Author]) = new AuthorOps(t)
implicit def toAuthorOps(t: Author) = expToAuthorOps(pure(t))

class AuthorOps(t: Exp[Author]) {
  def firstName: Exp[String] = AuthorFirstName(t)
  def lastName: Exp[String] = AuthorLastName(t)
}

Implicit conversions for user-defined types cooperate with other ones; for instance, author.firstName + " " + author.lastName is rewritten to:

(expToStringOps(expToAuthorOps(author).firstName) + pure(" ")) +
  expToAuthorOps(author).lastName

Author is not part of SQuOpt or the standard Scala library, but an application-specific class; hence SQuOpt cannot pre-define the necessary lifting code. Instead, the application programmer needs to provide this code to connect SQuOpt to his application. To support the application programmer with this tedious task, we provide a code generator which discovers the interface of a class through reflection on its compiled version and generates the boilerplate code, such as the one above for Author. It also generates the application-specific expression tree types, such as AuthorFirstName, as shown in Sec. 5.2. In general, query writers need to generate and import the boilerplate lifting code for all application-specific types they want to use in a SQuOpt query.

If desired, we can exclude some methods, to restrict the language supported in our deeply embedded programs. For instance, SQuOpt requires the user to write side-effect-free queries; hence we do not lift methods which perform side effects.

Using similar techniques, we can also lift existing functions and implicit conversions.

Tuples and other generic constructors. The techniques presented above for the lifting of method calls rely on overloading the name of the method with a signature that involves Exp. Implicit


resolution (for method calls) will then choose our lifted version of the function or method, to satisfy the typing requirements of the context or arguments of the call. Unfortunately, this technique does not work for tuple constructors, which in Scala are not resolved like ordinary calls. Instead, support for tuple types is hard-wired into the language, and tuples are always created by the predefined tuple constructors.

For example, the expression (str1, str2) will always call Scala's built-in Tuple2 constructor and correspondingly have type (Exp[String], Exp[String]). We would prefer that it calls a lifting function and produces an expression tree of type Exp[(String, String)] instead.

Even though we cannot intercept the call to Tuple2, we can add an implicit conversion to be called after the tuple is constructed:

implicit def tuple2ToTuple2Exp[A1, A2](tuple: (Exp[A1], Exp[A2])): LiftTuple2[A1, A2] =
  LiftTuple2[A1, A2](tuple._1, tuple._2)

case class LiftTuple2[A1, A2](t1: Exp[A1], t2: Exp[A2]) extends Exp[(A1, A2)]

We generate such conversions for different arities with a code generator. These conversions will be used only when the context requires an expression tree. Note that this technique is only applicable because tuples are generic and support arbitrary components, including expression trees.

In fact, we use the same technique also for other generic constructors, to avoid problems associated with shadowing of constructor functions. For example, an implicit conversion is used to lift Seq[Exp[T]] to Exp[Seq[T]]: code like Seq(str1, str2) first constructs a sequence of expression trees and then wraps the result with an expression node that describes a sequence.
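
A minimal sketch of this Seq lifting (LiftSeq and seqToExpSeq are illustrative names):

// a sequence of expression trees becomes one expression tree describing a
// sequence; the conversion fires only when the context expects Exp[Seq[T]]
case class LiftSeq[T](elems: Seq[Exp[T]]) extends Exp[Seq[T]]

implicit def seqToExpSeq[T](elems: Seq[Exp[T]]): Exp[Seq[T]] = LiftSeq(elems)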

Subtyping. So far we have seen that for each first-order method m operating on instances of T, we can create a corresponding method which operates on Exp[T]. If the method accepts parameters having types A1, ..., An and has return type R, the corresponding lifted method will accept parameters having types Exp[A1], ..., Exp[An] and return type Exp[R]. However, Exp[T] also needs to support all methods that T inherits from its super-type S. To ensure this, we declare the type constructor Exp to be covariant in its type parameter, so that Exp[T] correctly inherits the liftings from Exp[S]. This works even with the enrich-my-library pattern, because implicit resolution respects subtyping in an appropriate way.

Limitations of Lifting. Lifting methods of Any or AnyRef (Scala types at the root of the inheritance hierarchy) is not possible with this technique: Exp[T] inherits such methods and makes them directly available, hence the compiler will not insert an implicit conversion. Therefore, it is not possible to lift expressions such as x == y; rather, we have to rely on developer discipline to use reified variants of == and != instead of the built-in operators.

An expression like "foo" + "bar" + str1 is converted to

toStringOps("foo" + "bar") + str1

Hence, part of the expression is evaluated before being reified. This is harmless here, since we want "foo" + "bar" to be evaluated at compile time, that is, constant-folded; but in other cases it is preferable to prevent the constant folding. We will see later examples where queries on collections are evaluated before reification, defeating the purpose of our framework, and how we work around those.

5.4 Lifting higher-order expressions

We have shown how to lift first-order expressions; however, the interface of collections also uses higher-order methods, that is, methods that accept functions as parameters, and we need to lift them as well to reify the complete collection EDSL. For instance, the map method applies a function to


each element of a collection. In this section we describe how we reify such expressions of function type.

Higher-order abstract syntax. To represent functions, we have to represent the bound variables in the function bodies. For example, a reification of str ⇒ str + " " needs to reify the variable str of type String in the body of the anonymous function. This reification should retain the information where str is bound. We achieve this by representing bound variables using higher-order abstract syntax (HOAS) [Pfenning and Elliot 1988], that is, we represent them by variables bound at the meta level. To continue the example, the above function is reified as (str: Exp[String]) ⇒ str + " ". Note how the type of str in the body of this version is Exp[String], because str is a reified variable now. Correspondingly, the expression str + " " is lifted as described in the previous section.

With all operations in the function body automatically lifted, the only remaining syntactic difference between normal and lifted functions is the type annotation for the function's parameter. Fortunately, Scala's local type inference can usually deduce the argument type from the context, for example from the signature of the map operation being called. Type inference plays a dual role here: first, it allows the query writer to leave out the annotation, and second, it triggers lifting in the function body by requesting a lifted function instead of a normal function. This is how, in Fig. 3.1, a single call to asSquopt triggers lifting of the overall query.
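
For instance, under this lifting the following anonymous function is reified rather than executed; this is a sketch reusing the liftings of Sec. 5.3, and the string literal is our own example.

// the parameter type Exp[String] is inferred from the context (e.g. from the
// signature of a lifted map); the body then builds an expression tree via
// pure and Exp[String].+
val f: Exp[String] => Exp[String] = str => str + "!"
// f(Const("a")) yields StringConcat(Const("a"), Const("!")), not the string "a!"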

Note that reified functions have type Exp[A] ⇒ Exp[B] instead of the more regular Exp[A ⇒ B]. We chose the former over the latter to support Scala's syntax for anonymous functions and for-comprehensions, which is hard-coded to produce or consume instances of the predefined A ⇒ B type. We have to reflect this irregularity in the lifting of methods and functions, by treating the types of higher-order arguments accordingly.

User-defined methods revised. We can now extend the lifting of signatures for methods or functions from the previous section to the general case, that is, the case of higher-order functions. We lift a method or function with signature

def m[A1, ..., An](a1: T1, ..., an: Tn): R

to a method or function with the following signature

def m[A1, ..., An](a1: Li⟦T1⟧, ..., an: Li⟦Tn⟧): Li⟦R⟧

As before, the definition of the lifted method or function will return an expression node representing the call. If the original was a function, the lifted version is also defined as a function. If the original was a method on type T, then the lifted version is enriched onto T.

The type transformation Li⟦·⟧ converts the argument and return types of the method or function to be lifted. For most types, Li⟦·⟧ just wraps the type in the Exp type constructor, but function types are treated specially: Li⟦·⟧ recursively descends into function types to convert their arguments separately. Overall, Li⟦·⟧ behaves as follows:

Li⟦(A1, ..., An) ⇒ R⟧ = (Li⟦A1⟧, ..., Li⟦An⟧) ⇒ Li⟦R⟧
Li⟦A⟧ = Exp[A]    (for all other types A)

We can use this extended definition of method lifting to implement map lifting for Lists, that is, a method with signature Exp[List[T]].map(Exp[T] ⇒ Exp[U]):

implicit def expToListOps[T](coll: Exp[List[T]]) = new ListOps(coll)
implicit def toListOps[T](coll: List[T]) = expToListOps(coll)

class ListOps[T](coll: Exp[List[T]]) {
  def map[U](f: Exp[T] ⇒ Exp[U]) = ListMapNode(coll, Fun(f))
}


val records = books.withFilter(book ⇒
    book.publisher == "Pearson Education")
  .flatMap(book ⇒ book.authors.map(author ⇒
    BookData(book.title, author.firstName + " " + author.lastName,
      book.authors.size - 1)))

Figure 5.1: Desugaring of the code in Fig. 2.2

case class ListMapNode[T, U](coll: Exp[List[T]], mapping: Exp[T ⇒ U]) extends Exp[List[U]]

Note how map's parameter f has type Exp[T] ⇒ Exp[U], as necessary to enable Scala's syntax for anonymous functions and automatic lifting of the function body. This implementation would work for queries on lists, but does not support other collection types or queries where the collection type changes. We show in Sec. 5.5 how SQuOpt integrates with such advanced features of the Scala Collection EDSL.

5.5 Lifting collections

In this section we first explain how for-comprehensions are desugared to calls to library functions, allowing an external library to give them a different meaning. Afterwards, we summarize the needed information about the subset of the Scala collection EDSL that we reify. Then we present how we perform this reification. We finally present the reification of the running example (Fig. 3.1).

For-comprehensions. As we have seen, an idiomatic encoding of queries on collections in Scala is for-comprehensions. Although Scala is an impure functional language and supports side-effectful for-comprehensions, only pure queries are supported in our framework, because this enables or simplifies many domain-specific analyses. Hence we will restrict our attention to pure queries.

The Scala compiler desugars for-comprehensions into calls to three collection methods, map, flatMap and withFilter, which we explain shortly; the query in Fig. 2.2 is desugared to the code in Fig. 5.1.

The compiler performs type inference and type checking on a for-comprehension only after desugaring it; this affords some freedom for the types of map, flatMap and withFilter methods.

The Scala Collection EDSL. A Scala collection containing elements of type T implements the trait Traversable[T]. On an expression coll of type Traversable[T] one can invoke methods declared (in first approximation) as follows:

def map[U](f: T ⇒ U): Traversable[U]
def flatMap[U](f: T ⇒ Traversable[U]): Traversable[U]
def withFilter(p: T ⇒ Boolean): Traversable[T]

For a Scala collection coll, the expression coll.map(f) applies f to each element of coll, collects the results in a new collection and returns it; coll.flatMap(f) applies f to each element of coll, concatenates the results in a new collection and returns it; coll.withFilter(p) produces a collection containing the elements of coll which satisfy the predicate p.
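
For concreteness, this is the standard behavior of these operators on a List (withFilter returns a lazy wrapper, so we add a map to materialize its result):

val xs = List(1, 2, 3)
xs.map(x => x * 2)                            // List(2, 4, 6)
xs.flatMap(x => List(x, x))                   // List(1, 1, 2, 2, 3, 3)
xs.withFilter(x => x % 2 == 1).map(identity)  // List(1, 3)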

However, Scala supports many different collection types, and this complicates the actual types of these methods. Each collection can further implement subtraits like Seq[T] <: Traversable[T]


(for sequences), Set[T] <: Traversable[T] (for sets) and Map[K, V] <: Traversable[(K, V)] (for dictionaries); for each such trait, different implementations are provided.

One consequence of this syntactic desugaring is that a single for-comprehension can operate over different collection types. The type of the result depends essentially on the type of the root collection, that is, books in the example above. The example above can hence be altered to produce a sequence rather than a set by simply converting the root collection to another type:

val recordsSeq = for {
  book ← books.toSeq
  if book.publisher == "Pearson Education"
  author ← book.authors
} yield BookData(book.title, author.firstName + " " + author.lastName,
    book.authors.size - 1)

Precise static typing. The Scala collections EDSL achieves precise static typing while avoiding code duplication [Odersky and Moors 2009]. Precise static typing is necessary because the return type of a query operation can depend on subtle details of the base collection and query arguments. To return the most specific static type, the Scala collection DSL defines a type-level relation between the source collection type, the element type for the transformed collection, and the type for the resulting collection. The relation is encoded through the concept pattern [Oliveira et al 2010], i.e., through a type-class-style trait CanBuildFrom[From, Elem, To], and elements of the relation are expressed as implicit instances.

For example, a finite map can be treated as a set of pairs, so that mapping a function from pairs to pairs over it produces another finite map. This behavior is encoded in an instance of type CanBuildFrom[Map[K, V], (K1, V1), Map[K1, V1]]. The Map[K, V] is the type of the base collection, (K1, V1) is the return type of the function, and Map[K1, V1] is the return type of the map operation.

It is also possible to map some other function over the finite map, but the result will be a general collection instead of a finite map. This behavior is described by an instance of type CanBuildFrom[Traversable[T], U, Traversable[U]]. Note that this instance is also applicable to finite maps, because Map is a subclass of Traversable. Together, these two instances describe how to compute the return type of mapping over a finite map.
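
Both behaviors can be observed directly on the standard Scala 2.x collections:

val m: Map[Int, String] = Map(1 -> "a", 2 -> "b")
// pair-to-pair: the Map-specific CanBuildFrom instance applies,
// so the result is again a finite map
val swapped: Map[String, Int] = m.map { case (k, v) => (v, k) }
// non-pair result: only the generic instance applies,
// so the result is a general collection
val values: Iterable[String] = m.map { case (k, v) => v }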

Code reuse. Even though these two use cases for mapping over a finite map have different return types, they are implemented as a single method that uses its implicit CanBuildFrom parameter to compute both the static type and the dynamic representation of its result. So the Scala Collections EDSL provides precise typing without code duplication. In our deep embedding, we want to preserve this property.

CanBuildFrom is used in the implementation of map, flatMap and withFilter. To further increase reuse, the implementations are provided in a helper trait TraversableLike[T, Repr], with the following signatures:

def map[U, That](f: T ⇒ U)(implicit cbf: CanBuildFrom[Repr, U, That]): That
def flatMap[U, That](f: T ⇒ Traversable[U])(implicit cbf: CanBuildFrom[Repr, U, That]): That
def withFilter(p: T ⇒ Boolean): Repr

The Repr type parameter represents the specific type of the receiver of the method call.

The lifting. The basic idea is to use the enrich-my-library pattern to lift collection methods from TraversableLike[T, Repr] to TraversableLikeOps[T, Repr]:

implicit def expToTraversableLikeOps[T, Repr](v: Exp[TraversableLike[T, Repr]]) =
  new TraversableLikeOps[T, Repr](v)


class TraversableLikeOps[T, Repr <: Traversable[T] with TraversableLike[T, Repr]](val t: Exp[Repr]) {

  def withFilter(f: Exp[T] ⇒ Exp[Boolean]): Exp[Repr] =
    Filter(this.t, FuncExp(f))

  def map[U, That <: TraversableLike[U, That]](f: Exp[T] ⇒ Exp[U])
      (implicit c: CanBuildFrom[Repr, U, That]): Exp[That] =
    MapNode(this.t, FuncExp(f))

  // other methods; definitions of MapNode, Filter omitted
}

Figure 5.2: Lifting TraversableLike

Subclasses of TraversableLike[T, Repr] also subclass both Traversable[T] and Repr; to take advantage of this during interpretation and optimization, we need to restrict the type of expToTraversableLikeOps, getting the following conversion:⁵

implicit def expToTraversableLikeOps[T, Repr <: Traversable[T] with TraversableLike[T, Repr]](
    v: Exp[Repr]) =
  new TraversableLikeOps[T, Repr](v)

The query operations are defined in class TraversableLikeOps[T, Repr]; a few examples are shown in Fig. 5.2.⁶

Note how the lifted query operations use CanBuildFrom to compute the same static return type as the corresponding non-lifted query operation would compute. This reuse of type-level machinery allows SQuOpt to provide the same interface as the Scala Collections EDSL.

Code reuse revisited. We already saw how we could reify List[T].map through a specific expression node ListMapNode. However, this approach would require generating many variants for different collections with slightly different types; writing an optimizer able to handle all such variations would be unfeasible, because of the amount of code duplication required. Instead, by reusing Scala's type-level machinery, we obtain a reification which is statically typed and at the same time avoids code duplication, in both our lifting and our optimizer, and in general in all possible consumers of our reification, making them feasible to write.

5.6 Interpretation

After optimization, SQuOpt needs to interpret the optimized expression trees to perform the query. Therefore, the trait Exp[T] declares a def interpret(): T method, and each expression node overrides it appropriately to implement a mostly standard typed tagless [Carette et al 2009] environment-based interpreter. The interpreter computes a value of type T from an expression tree of type Exp[T]. This design allows query writers to extend the interpreter to handle application-specific operations; in fact, the lifting generator described in Sec. 5.4 automatically generates appropriate definitions of interpret for the lifted operations.

⁵Due to type inference bugs, the actual implicit conversion needs to be slightly more complex, to mention T directly in the argument type. We reported the bug at https://issues.scala-lang.org/browse/SI-5298.
⁶Similar liftings are introduced for traits similar to TraversableLike, like SeqLike, SetLike, MapLike and so on.


For example, the interpretation of string concatenation is simply string concatenation, as shown in the following fragment of the interpreter. Note that type-safety of the interpreter is checked automatically by the Scala compiler when it compiles the fragments:

case class StringConcat(str1: Exp[String], str2: Exp[String]) extends Exp[String] {
  def interpret() = str1.interpret() + str2.interpret()
}

The subset of Scala we reify roughly corresponds to a typed lambda calculus with subtyping and type constructors. It does not include constructs for looping or recursion, so it should be strongly normalizing, as long as application programmers do not add expression nodes with non-terminating interpretations. However, query writers can use the host language to create a reified expression of infinite size. This should not be an issue if SQuOpt is used as a drop-in replacement for the Scala Collection EDSL.

During optimization, nodes of the expression tree might get duplicated, and the interpreter could in principle observe this sharing and treat the expression tree as a DAG, to avoid recomputing results. Currently we do not exploit this, unlike during compilation.

5.7 Optimization

Our optimizer is structured as a pipeline of different transformations on a single intermediate representation, constituted by our expression trees. Each phase of the pipeline, and the pipeline as a whole, produce a new expression having the same type as the original one. Most of our transformations express simple rewrite rules, with or without side conditions, which are applied on expression trees from the bottom up and are implemented using Scala's support for pattern matching [Emir et al 2007b].

Some optimizations, like filter hoisting (which we applied manually to produce the code in Fig. 2.3), are essentially domain-specific and can improve the complexity of a query. To enable such optimizations to trigger, however, one often needs to perform inlining-like transformations and to simplify the result. Inlining-related transformations can for instance produce code like (x, y)._1, which we simplify to x, reducing abstraction overhead but also (more importantly) making syntactically clear that the result does not depend on y, hence might be computed before y is even bound. This simplification extends to user-defined product types: with the definitions in Fig. 2.1, code like BookData(book.title, ...).title is simplified to book.title.

We have thus implemented optimizations of three classes:

• general-purpose simplifications, like inlining, compile-time beta-reduction, constant folding and reassociation on primitive types, and other simplifications;⁷

• domain-specific simplifications, whose main goal is still to enable more important optimizations;

• domain-specific optimizations which can change the complexity class of a query, such as filter hoisting, hash-join transformation or indexing.

Among domain-specific simplifications, we implement a few described in the context of the monoid comprehension calculus [Grust and Scholl 1996b; Grust 1999], such as query unnesting and fusion of bulk operators. Query unnesting allows unnesting a for-comprehension nested inside another, to produce a single for-comprehension. Furthermore, we can fuse different collection

⁷Beta-reduction and simplification are run in a fixpoint loop [Peyton Jones and Marlow 2002]. Termination is guaranteed because our language does not admit general recursion.


operations together: collection operators like map, flatMap and withFilter can be expressed as folds producing new collections, which can be combined. Scala for-comprehensions are, however, more general than monoid comprehensions;⁸ hence, to ensure safety of some optimizations, we need some additional side conditions.⁹

Manipulating functions. To be able to inspect a HOAS function body funBody: Exp[S] ⇒ Exp[T], like str ⇒ str + " ", we convert it to first-order abstract syntax (FOAS), that is, to an expression tree of type Exp[T]. To this end, we introduce a representation of variables and a generator of fresh ones; since variable names are auto-generated, they are internally represented simply as integers instead of strings, for efficiency.

To convert funBody from HOAS to FOAS, we apply it to a fresh variable v of type TypedVar[S], obtaining a first-order representation of the function body, having type Exp[T] and containing occurrences of v.

This transformation is hidden in the constructor Fun, which converts Exp[S] ⇒ Exp[T], a representation of an expression with one free variable, to Exp[S ⇒ T], a representation of a function:

case class App[S, T](f: Exp[S ⇒ T], t: Exp[S]) extends Exp[T]

def Fun[S, T](funBody: Exp[S] ⇒ Exp[T]): Exp[S ⇒ T] = {
  val v = Fun.gensym[S]()
  FOASFun(funBody(v), v)
}

case class FOASFun[S, T](foasBody: Exp[T], v: TypedVar[S]) extends Exp[S ⇒ T]

implicit def app[S, T](f: Exp[S ⇒ T]): Exp[S] ⇒ Exp[T] =
  arg ⇒ App(f, arg)

Conversely, function applications are represented using the constructor App; an implicit conversion allows App to be inserted implicitly. Whenever f can be applied to arg and f is an expression tree, the compiler will convert f(arg) to app(f)(arg), that is, App(f, arg).

In our example, Fun(str ⇒ str + " ") produces
FOASFun(StringConcat(TypedVar[String](1), Const(" ")), TypedVar[String](1))

Since we auto-generate variable names, it is easiest to represent variable occurrences using the Barendregt convention, where bound variables must always be globally unique; we must be careful to perform renaming after beta-reduction to restore this invariant [Pierce 2002, Ch. 6].

We can now easily implement substitution and beta-reduction and, through that, as shown before, enable other optimizations to trigger more easily and speed up queries.
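
As a sketch, beta-reduction on this representation can be written as follows; subst and freshen are hypothetical helpers not shown in the text, and we gloss over the casts needed when pattern matching on such typed trees.

// subst replaces occurrences of v in body by arg; freshen renames bound
// variables in arg so that the Barendregt convention holds afterwards
def subst[S, T](body: Exp[T], v: TypedVar[S], arg: Exp[S]): Exp[T] = ???
def freshen[S](e: Exp[S]): Exp[S] = ???

def betaReduce[T](e: Exp[T]): Exp[T] = e match {
  case App(FOASFun(body, v), arg) => subst(body, v, freshen(arg))
  case _                          => e
}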

⁸For instance, a for-comprehension producing a list cannot iterate over a set.
⁹For instance, consider a for-comprehension producing a set and nested inside another producing a list. This comprehension does not correspond to a valid monoid comprehension (see previous footnote), and query unnesting does not apply here: if we unnested the inner comprehension into the outer one, we would not perform duplicate elimination on the inner comprehension, affecting the overall result.


Chapter 6

Evaluating SQuOpt

The key goals of SQuOpt are to reconcile modularity and efficiency. To evaluate this claim, we perform a rigorous performance evaluation of queries with and without SQuOpt. We also analyze the modularization potential of these queries and evaluate how modularization affects performance (with and without SQuOpt).

We show that modularization introduces a significant slowdown. The overhead of using SQuOpt is usually moderate, and optimizations can compensate for this overhead, remove the modularization slowdown, and improve performance of some queries by orders of magnitude, especially when indexes are used.

6.1 Study Setup

Throughout this thesis we have already shown several compact queries for which our optimizations increase performance significantly compared to a naive execution. Since some optimizations change the complexity class of the query (e.g. by using an index), the speedups grow with the size of the data. However, to get a more realistic evaluation of SQuOpt, we decided to perform an experiment with existing real-world queries.

As we are interested in both performance and modularization, we have a specification and three different implementations of each query that we need to compare:

(0) Query specification. We selected a set of existing real-world queries, specified and implemented independently from our work and prior to it. We used only the specification of these queries.

(1) Modularized Scala implementation. We reimplemented each query as an expression on Scala collections — our baseline implementation. For modularity, we separated reusable domain abstractions into subqueries. We confirmed the abstractions with a domain expert and will later illustrate them, to emphasize their general nature.

(2) Hand-optimized Scala implementation. Next, we asked a domain expert to perform manual optimizations on the modularized queries. The expert should perform optimizations such as inlining and filter hoisting, where he could find performance improvements.

(3) SQuOpt implementation. Finally, we rewrote the modularized Scala queries from (1) as SQuOpt queries. The rewrites are of purely syntactic nature, to use our library (as described in Sec. 3.1), and preserve the modularity of the queries.



Since SQuOpt supports executing queries with and without optimizations and indexes, we actually measured three different execution modes of the SQuOpt implementation:

(3−) SQuOpt without optimizer. First, we execute the SQuOpt queries without performing optimization, which should show the SQuOpt overhead compared to the modular Scala implementation (1). However, common-subexpression elimination is still used here, since it is part of the compilation pipeline; this is appropriate to counter the effects of excessive inlining due to using a deep embedding, as explained in Sec. 4.1.

(3o) SQuOpt with optimizer. Next, we execute SQuOpt queries after optimization.

(3x) SQuOpt with optimizer and indexes. Finally, we execute the queries after providing a set of indexes that the optimizer can consider.

In all cases, we measure query execution time for the generated code, excluding compilation; we consider this appropriate because the results of compilations are cached aggressively and can be reused when the underlying data is changed, potentially even across executions (even though this is not yet implemented), as the data is not part of the compiled code.

We use additional indexes in (3x), but not in the hand-optimized Scala implementation (2). We argue that indexes are less likely to be applied manually, because index application is a crosscutting concern and makes the whole query implementation more complicated and less abstract. Still, we offer measurement (3o) to compare the speedup without additional indexes.

This gives us a total of five settings to measure and compare (1, 2, 3−, 3o and 3x). Between them, we want to observe the following interesting performance ratios (speedups or slowdowns, computed through the indicated divisions):

(M) Modularization overhead (the relative performance difference between the modularized and the hand-optimized Scala implementation, 1/2).

(S) SQuOpt overhead (the overhead of executing unoptimized SQuOpt queries, 1/3−; smaller is better).

(H) Hand-optimization challenge (the performance overhead of our optimizer against hand-optimizations of a domain expert, 2/3o; bigger is better). This overhead is partly due to the SQuOpt overhead (S), and partly to optimizations which have not been automated or have not been effective enough. This comparison excludes the effects of indexing, since this is an optimization we did not perform by hand; we also report (H') = 2/3x, which includes indexing.

(O) Optimization potential (the speedup by optimizing modularized queries, 1/3o; bigger is better).

(X) Index influence (the speedup gained by using indexes, 3o/3x; bigger is better).

(T) Total optimization potential with indexes (1/3x; bigger is better), which is equal to (O) × (X).

In Fig. 6.1 we provide an overview of the setup. We made our raw data available and our results reproducible [Vitek and Kalibera 2011].¹


Figure 6.1: Measurement Setup Overview (the specification (0), the implementations 1, 2, 3−, 3o, 3x, and the ratios M, S, H, O, X, T derived from their comparison)

6.2 Experimental Units

As experimental units, we sampled a set of queries on code structures from FindBugs 2.0 [Hovemeyer and Pugh 2004]. FindBugs is a popular bug-finding tool for Java bytecode, available as open source. To detect instances of bug patterns, it queries a structural in-memory representation of a code base (extracted from bytecode). Concretely, a single loop traverses each class and invokes all visitors (implemented as listeners) on each element of the class. Many visitors, in turn, perform activities concerning multiple bug detectors, which are fused together. An extreme example is that in FindBugs query 4 is defined in class DumbMethods, together with 41 other bug detectors for distinct types of bugs. Typically, a bug detector is furthermore scattered across the different methods of the visitor, which handle different elements of the class. We believe this architecture has been chosen to achieve good performance; however, we do not consider such manual fusion of distinct bug detectors as modular. We selected queries from FindBugs because they represent typical non-trivial queries on in-memory collections, and because we believe our framework allows expressing them more modularly.

We sampled queries in two batches. First, we manually selected 8 queries (from the approx. 400 queries in FindBugs), chosen mainly to evaluate the potential speedups of indexing (queries that primarily looked for declarations of classes, methods or fields with specific properties, queries that inspect the type hierarchy, and queries that required analyzing methods' implementations). Subsequently, we randomly selected a batch of 11 additional queries. The batch excluded queries that rely on control-/dataflow analyses (i.e., analyzing the effect of bytecode instructions on the stack), due to limitations of the bytecode toolkit we use. In total we have 19 queries, as listed in Table 6.2 (the randomly selected queries are marked with the superscript R).

We implemented each query three times (see implementations (1)–(3) in Sec. 6.1), following the specifications given in the FindBugs documentation (0). Instead of using a hierarchy of visitors as in the original implementations of the queries in FindBugs, we wrote the queries as for-comprehensions in Scala, on an in-memory representation created by the Scala toolkit BAT.² BAT in particular provides comprehensive support for writing queries against Java bytecode in an idiomatic way. We exemplify an analysis in Fig. 6.5: it detects all co-variant equals methods in a project, by iterating over all class files (line 2) and all methods, searching for methods named "equals" that return a boolean

¹Data available at http://www.informatik.uni-marburg.de/~pgiarrusso/SQuOpt
²http://github.com/Delors/BAT


Id   Description
1    Covariant compareTo() defined
2    Explicit garbage collection call
3    Protected field in final class
4    Explicit runFinalizersOnExit() call
5    clone() defined in non-Cloneable class
6    Covariant equals() defined
7    Public finalizer defined
8    Dubious catching of IllegalMonitorStateException
9R   Uninit. field read during construction of super
10R  Mutable static field declared public
11R  Refactor anon inner class to static
12R  Inefficient use of toArray(Object[])
13R  Primitive boxed and unboxed for coercion
14R  Double precision conversion from 32 bit
15R  Privileged method used outside doPrivileged
16R  Mutable public static field should be final
17R  Serializable class is member of non-ser. class
18R  Swing methods used outside Swing thread
19R  Finalizer only calls superclass finalize

Table 6.2: Performance results: execution times (in ms) for implementations (1), (2), (3−), (3o) and (3x), and performance ratios M (1/2), H (2/3o) and T (1/3x). As in Sec. 6.1, (1) denotes the modular Scala implementation, (2) the hand-optimized Scala one, and (3−), (3o), (3x) refer to the SQuOpt implementation when run, respectively, without optimizations, with optimizations, and with optimizations and indexing. Queries marked with the R superscript were selected by random sampling.


                                        M (1/2)  S (1/3−)  H (2/3o)  H' (2/3x)  O (1/3o)  X (3o/3x)  T (1/3x)
Geometric means of performance ratios     2.4x     1.2x      0.8x      5.1x       1.9x      6.3x       12x

Table 6.3: Average performance ratios. This table summarizes all interesting performance ratios across all queries, using the geometric mean [Fleming and Wallace 1986]. The meaning of the speedups is discussed in Sec. 6.1.

Abstraction                                                        Used
All fields in all class files                                        4
All methods in all class files                                       3
All method bodies in all class files                                 3
All instructions in all method bodies and their bytecode index       5
Sliding window (size n) over all instructions (and their index)      3

Table 6.4: Description of abstractions removed during hand-optimization, and number of queries where the abstraction is used (and optimized away).

value and define a single parameter of the type of the current class.

Abstractions. In the reference implementations (1), we identified several reusable abstractions,

as shown in Table 6.4. The reference implementations of all queries except 17R use exactly one of these abstractions, which encapsulate the main loops of the queries.

Indexes. For executing (3x) (SQuOpt with indexes), we have constructed three indexes to speed up navigation over the queried data of queries 1–8: indexes for method names, exception handlers and instruction types. We illustrate the implementation of the method-name index in Fig. 6.6: it produces a collection of all methods and then indexes them using indexBy; its argument extracts from an entry the key, that is, the method name. We selected which indexes to implement using guidance from SQuOpt itself: during optimizations, SQuOpt reports which indexes it could have applied to the given query. Among those, we tried to select indexes giving a reasonable compromise between construction cost and optimization speedup. We first measured the construction cost of these indexes.

for {
  classFile ← classFiles.asSquopt
  method ← classFile.methods
  if method.isAbstract && method.name == "equals" &&
     method.descriptor.returnType == BooleanType
  parameterTypes ← Let(method.descriptor.parameterTypes)
  if parameterTypes.length == 1 && parameterTypes(0) == classFile.thisClass
} yield (classFile, method)

Figure 6.5: Find covariant equals methods


val methodNameIdx: Exp[Map[String, Seq[(ClassFile, Method)]]] = (for {
  classFile ← classFiles.asSquopt
  method ← classFile.methods
} yield (classFile, method)) indexBy (entry ⇒ entry._2.name)

Figure 6.6: A simple index definition

Index                 Elapsed time (ms)
Method name           97.99 ± 2.94
Exception handlers    179.29 ± 3.21
Instruction type      4166.49 ± 202.85

For our test data, index construction takes less than 200 ms for the first two indexes, which is moderate compared to the time for loading the bytecode into the BAT representation (4755.32 ± 141.66 ms). Building the instruction index took around 4 seconds, which we consider acceptable, since this index maps each type of instruction (e.g. INSTANCEOF) to a collection of all bytecode instructions of that type.

6.3 Measurement Setup

To measure performance, we executed the queries on the preinstalled JDK class library (rt.jar), containing 58 MB of uncompressed Java bytecode. We also performed a preliminary evaluation by running queries on the much smaller ScalaTest library, getting comparable results that we hence do not discuss. Experiments were run on an 8-core Intel Core i7-2600, 3.40 GHz, with 8 GB of RAM, running Scientific Linux release 6.2. The benchmark code itself is single-threaded, so it uses only one core; however, the JVM used also other cores to offload garbage collection. We used the preinstalled OpenJDK Java version 1.7.0_05-icedtea and Scala 2.10.0-M7.

We measure steady-state performance, as recommended by Georges et al. [2007]. We invoke the JVM p = 15 times; at the beginning of each JVM invocation, all the bytecode to analyze is loaded in memory and converted into BAT's representation. In each JVM invocation, we iterate each benchmark until the variation of results becomes low enough. We measure the variation of results through the coefficient of variation (CoV; standard deviation divided by the mean). Thus, we iterate each benchmark until the CoV in the last k = 10 iterations drops under the threshold θ = 0.1, or until we complete q = 50 iterations. We report the arithmetic mean of these measurements (and also report the usually low standard deviation on our web page).
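
A sketch of this per-invocation measurement loop, with k, q and θ as above (our own illustration, not the actual harness):

def cov(xs: Seq[Double]): Double = {                 // coefficient of variation
  val mean = xs.sum / xs.size
  val sd = math.sqrt(xs.map(x => (x - mean) * (x - mean)).sum / xs.size)
  sd / mean
}

def iterate(run: () => Unit, k: Int = 10, q: Int = 50, theta: Double = 0.1): Seq[Double] = {
  val times = scala.collection.mutable.ArrayBuffer.empty[Double]
  // iterate until the CoV of the last k runs drops below theta, or q runs complete
  while (times.size < q && (times.size < k || cov(times.takeRight(k)) >= theta)) {
    val start = System.nanoTime()
    run()
    times += (System.nanoTime() - start) / 1e6       // elapsed time in ms
  }
  times.toSeq
}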

6.4 Results

Correctness. We machine-checked that, for each query, all variants in Table 6.2 agree.

Modularization Overhead. We first observe that performance suffers significantly when using the abstractions we described in Table 6.4. These abstractions, while natural in the domain and in the setting of a declarative language, are not idiomatic in Java or Scala, because without optimization they will obviously lead to bad performance. They are still useful abstractions from the point of view of modularity, though — as indicated by Table 6.4 — and as such it would be desirable if one could use them without paying the performance penalty.


Scala Implementations vs. FindBugs. Before actually comparing between the different Scala and SQuOpt implementations, we first ensured that the implementations are comparable to the original FindBugs implementation. A direct comparison between the FindBugs reference implementation and any of our implementations is not possible in a rigorous and fair manner: FindBugs bug detectors are not fully modularized, therefore we cannot reasonably isolate the implementation of the selected queries from support code. Furthermore, the architecture of the implementation has many differences that affect performance: among others, FindBugs also uses multithreading. Moreover, while in our case each query loops over all classes, in FindBugs, as discussed above, a single loop considers each class and invokes all visitors (implemented as listeners) on it.

We measured startup performance [Georges et al 2007], that is, the performance of running the queries only once, to minimize the effect of compiler optimizations. We set up our SQuOpt-based analyses to only perform optimization and run the optimized query. To set up FindBugs, we manually disabled all unrelated bug detectors; we also made the modified FindBugs source code available. The result is that the performance of the Scala implementations of the queries (3−) has performance of the same order of magnitude as the original FindBugs queries – in our tests, the SQuOpt implementation was about twice as fast. However, since the comparison cannot be made fair, we refrained from a more detailed investigation.

SQuOpt Overhead and Optimization Potential. We present the results of our benchmarks in Table 6.2. Column names refer to a few of the definitions described above; for readability, we do not present all the ratios previously introduced for each query, but report the raw data. In Table 6.3 we report the geometric mean [Fleming and Wallace 1986] of each ratio, computed with the same weight for each query.

We see that in its current implementation SQuOpt can cause an overhead S (1/3−) of up to 3.4x; on average, SQuOpt queries are 1.2x faster. These differences are due to minor implementation details of certain collection operators. For query 18R, instead, the basic SQuOpt implementation is 12.9x faster; we are investigating the reason, and we suspect this might be related to the use of pattern matching in the original query.

As expected, not all queries benefit from optimizations: out of 19 queries, optimization affords significant speedups for 15 of them, ranging from a 1.2x factor to a 12800x factor; 10 queries are faster by a factor of at least 5. Only queries 10R, 11R and 12R fail to recover any modularization overhead.

We have analyzed the behavior of a few queries after optimization, to understand why their performance has (or has not) improved.

Optimization makes query 17R slower; we believe this is because optimization replaces filtering by lazy filtering, which is usually faster, but not here. Among queries where indexing succeeds, query 2 has the least speedup. After optimization, this query uses the instruction-type index to find all occurrences of invocation opcodes (INVOKESTATIC and INVOKEVIRTUAL); after this step, the query looks among those invocations for ones targeting runFinalizersOnExit. Since invocation opcodes are quite frequent, the used index is not very specific, hence it allows for little speedup (9.5x). However, no other index applies to this query; moreover, our framework does not maintain any selectivity statistics on indexes to predict these effects. Query 19R benefits from indexing without any specific tuning on our part, because it looks for implementations of finalize with some characteristic, hence the highly selective method-name index applies. After optimization, query 8 becomes simply an index lookup on the index for exception handlers, looking for handlers of IllegalMonitorStateException; it is thus not surprising that its speedup is extremely high (12800x). This speedup relies on an index which is specific for this kind of query, and building this index is slower than executing the unoptimized query. On the other hand, building this index is entirely appropriate in a situation where similar queries are common enough. Similar considerations apply to usage of indexing in general, similarly to what happens in databases.


Optimization Overhead. The current implementation of the optimizer is not yet optimized for speed (of the optimization algorithm). For instance, expression trees are traversed and rebuilt completely once for each transformation. However, the optimization overhead is usually not excessive: it is 548 ± 855 ms, varying between 35 ms and 3817 ms (mostly depending on the query size).

Limitations Although many speedups are encouraging, our optimizer is currently a proof-of-concept, and we experienced some limitations:

• In a few cases, hand-optimized queries are still faster than what the optimizer can produce. We believe these problems could be addressed by adding further optimizations.

• Our implementation of indexing is currently limited to immutable collections. For mutable collections, indexes must be maintained incrementally. Since indexes are defined as special queries in SQuOpt, incremental index maintenance becomes an instance of incremental maintenance of query results, that is, of incremental view maintenance. We plan to support incremental view maintenance as part of future work; however, indexing in the current form is already useful, as illustrated by our experimental results.

Threats to Validity With rigorous performance measurements and the chosen setup, our study was set up to maximize internal and construct validity. Although we did not involve an external domain expert, and we did not compare the results of our queries with the ones from FindBugs (except while developing the queries), we believe that the queries adequately represent the modularity and performance characteristics of FindBugs and SQuOpt. However, since we selected only queries from a single project, external validity is limited. While we cannot generalize our results beyond FindBugs yet, we believe that the FindBugs queries are representative for complex in-memory queries performed by applications.

Summary We demonstrated on our real-world queries that relying on declarative abstractions in collection queries often causes a significant slowdown. As we have seen, using SQuOpt without optimization, or when no optimizations are possible, usually provides performance comparable to using standard Scala; however, SQuOpt optimizations can in most cases remove the slowdown due to declarative abstractions. Furthermore, relying on indexing allows achieving even greater speedups while still using a declarative programming style. Some implementation limitations restrict the effectiveness of our optimizer, but since this is a preliminary implementation, we believe our evaluation shows the great potential of optimizing queries to in-memory collections.

Chapter 7

Discussion

In this chapter we discuss the degree to which SQuOpt fulfilled our original design goals, and the conclusions for host and domain-specific language design.

7.1 Optimization limitations

In our experiments, indexing achieved significant speedups, but when indexing does not apply, speedups are more limited; in comparison, later projects working on collection query optimization, such as OptiQL [Rompf et al 2013; Sujeeth et al 2013b], gave better speedups, as also discussed in Sec. 8.3.1.

A design goal of this project was to incrementalize optimized queries, and while it is easy to incrementalize collection operators such as map, flatMap or filter, it was much less clear to us how to optimize the result of inlining. We considered using shortcut fusion, but we did not know a good way to incrementalize the resulting programs; later work, as described in Part II, clarified what is possible and what is not.

Another problem is that most fusion techniques are designed for sequential processing, hence conflict with incrementalization. Most fusion techniques generally assume bulk operations scan collections in linear order; for instance, shortcut fusion rewrites operators in terms of foldr. During parallel and/or incremental computation, instead, it is better to use tree-shaped folds, that is, to split the task in a divide-and-conquer fashion, so that the various subproblems form a balanced tree. This division minimizes the height of the tree, hence the number of steps needed to combine results from subproblems into the final result, as also discussed by Steele [2009]. It is not so clear how to apply shortcut fusion to parallel and incremental programs. Maier and Odersky [2013] describe an incremental tree-shaped fold, where each subproblem that is small enough is solved by scanning its input in linear order, but they do not perform code transformation and do not study how to perform fusion.

7.2 Deep embedding

7.2.1 What worked well

Several features of Scala contributed greatly to the success we achieved. With implicit conversions, the lifting can be made mostly transparent. The advanced type system features were quite helpful to make the expression tree representation typed. The fact that for-comprehensions are desugared



before type inference and type checking was also a prerequisite for automatic lifting. The syntactic expressiveness and uniformity of Scala, in particular the fact that custom types can have the same look-and-feel as primitive types, were also vital to lift expressions on primitive types.

7.2.2 Limitations

Despite these positive experiences and our experimental success, our embedding has a few significant limitations.

The first limitation is that we only lift a subset of Scala, and some interesting features are missing. We do not support statements in nested blocks in our queries, but this could be implemented by reusing techniques from Delite [Rompf et al 2011]. More importantly for queries, pattern matching cannot be supported by deep embeddings similar to ours. In contrast to for-comprehension syntax, pattern matching is desugared only after type checking [Emir et al 2007b], which prevents us from lifting pattern matching notation. More specifically, because an extractor [Emir et al 2007b] cannot return the representation of a result value (say Exp[Boolean]) to evaluate later, it must produce its final result at pattern matching time. There is initial work on "virtualized pattern matching"1, and we hope to use this feature in future work.
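To illustrate the issue concretely, consider the following sketch (our own code with invented names, not SQuOpt's): an extractor's unapply must produce its final answer when the match executes, so there is no point at which a reified condition could be captured for later optimization.

  // Sketch: extractors must answer at match time.
  case class Book(title: String, year: Int)

  object RecentBook {
    // unapply must return Option[...] now; it cannot return a reified
    // Exp[Boolean] node for an optimizer to inspect and evaluate later.
    def unapply(b: Book): Option[String] =
      if (b.year >= 2010) Some(b.title) else None
  }

  val r = Book("Some Title", 2012) match {
    case RecentBook(t) => t // desugared after type checking,
    case _             => "old" // so a deep embedding cannot lift this
  }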

We also experienced problems with operators that cannot be overloaded, such as == or if-else, and with lifting methods in scala.Any, which forced us to provide alternative syntax for these features in queries. The Scala-virtualized project [Moors et al 2012] aims to address these limitations; unfortunately, it advertises no change on the other problems we found, which we subsequently detail.

It would also be desirable if we could enforce the absence of side effects in queries, but since Scala, like most practical programming languages except Haskell, does not track side effects in the type system, this does not seem to be possible.

Finally, compared to lightweight modular staging [Rompf and Odersky 2010] (the foundation of Delite) and to polymorphic embedding [Hofer et al 2008], we have less static checking for some programming errors when writing queries. The recommended way to use Delite is to write an EDSL program in one trait, in terms of the EDSL interface only, and combine it with the implementation in another trait. In polymorphic embedding, the EDSL program is a function of the specific implementation (in this case, semantics). Either approach ensures that the DSL program is parametric in the implementation, and hence cannot refer to details of a specific implementation. However, we judged the syntactic overhead for the programmer to be too high to use those techniques – in our implementation, we rely on encapsulation and on dynamic checks at query construction time to achieve similar guarantees.
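For concreteness, here is a minimal sketch (with invented names) of the style we rejected: the EDSL program mentions only the interface trait, and an implementation is mixed in separately, so the program stays parametric in the semantics.

  trait QueryIface {
    type Rep[T]
    def const(i: Int): Rep[Int]
    def plus(a: Rep[Int], b: Rep[Int]): Rep[Int]
  }

  // The query mentions only the interface, so it cannot depend on
  // details of any specific implementation:
  trait MyQuery { this: QueryIface =>
    def query: Rep[Int] = plus(const(1), const(2))
  }

  // One semantics: direct evaluation. A reifying implementation could
  // instead define Rep[T] as an expression-tree type.
  trait Eval extends QueryIface {
    type Rep[T] = T
    def const(i: Int): Int = i
    def plus(a: Int, b: Int): Int = a + b
  }

  object Run extends MyQuery with Eval // Run.query == 3

The syntactic overhead mentioned above is visible here: every query must be wrapped in a trait and abstract over Rep.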

The choice of interpreting expressions turned out to be a significant performance limitation. It could likely be addressed by using Delite and lightweight modular staging instead, but we wanted to experiment with how far we can get within the language, in a well-typed setting.

7.2.3 What did not work so well: Scala type inference

When implementing our library, we often struggled against limitations and bugs in the Scala compiler, especially regarding type inference and its interaction with implicit resolution, and we were often constrained by its limitations. Not only is Scala's type inference not complete, but we learned that its behavior is only specified by the behavior of the current implementation: in many cases where there is a clear desirable solution, type inference fails or finds an incorrect substitution, which leads to a type error. Hence we cannot distinguish, in this discussion, the Scala language from its implementation. We regard many of Scala's type inference problems as bugs, and reported them

1. http://stackoverflow.com/questions/8533826/what-is-scalas-experimental-virtual-pattern-matcher


as such when no previous bug report existed, as noted in the rest of this section. Some of them are long-standing issues; others were accepted; for other ones we received no feedback yet at the time of this writing; and another one was already closed as WONTFIX, indicating that a fix would be possible but would have excessive complexity for the time being.2

Overloading The code in Fig. 3.1 uses the lifted BookData constructor. Two definitions of BookData are available, with signatures BookData(String, String, Int) and BookData(Exp[String], Exp[String], Exp[Int]), and it seems like the Scala compiler should be able to choose which one to apply using overload resolution. This, however, is not supported, simply because the two functions are defined in different scopes;3 hence importing BookData(Exp[String], Exp[String], Exp[Int]) shadows locally the original definition.
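The following sketch (simplified, with invented names) reproduces the scope problem: an explicit import takes precedence over a wildcard one, so the lifted constructor shadows the original instead of overloading it.

  object Schema {
    case class BookData(title: String, author: String, year: Int)
  }

  object Lifted {
    case class Exp[T](v: T)
    def BookData(t: Exp[String], a: Exp[String], y: Exp[Int]): Exp[Schema.BookData] =
      Exp(Schema.BookData(t.v, a.v, y.v))
  }

  object Query {
    import Schema._               // brings in the original constructor
    import Lifted.{Exp, BookData} // the explicit import shadows it:
    // BookData("t", "a", 1)      // no longer compiles: no overload resolution
    val ok = BookData(Exp("t"), Exp("a"), Exp(1))
  }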

Type inference vs implicit resolution The interaction between type inference and implicit resolution is a hard problem, and Scalac also has many bugs, but the current situation requires further research; for instance, there is not even a specification for the behavior of type inference.4

As a consequence, to the best of our knowledge, some properties of type inference have not been formally established. For instance, a reasonable user expectation is that removing a call to an implicit conversion does not alter the program, if it is the only implicit conversion with the correct type in scope, or if it is more specific than the others [Odersky et al 2011, Ch. 21]. This is not always correct, because removing the implicit conversion reduces the information available for the type inference algorithm; we observed multiple cases5 where type inference becomes unable to figure out enough information about the type to trigger the implicit conversion.

We also consider it significant that Scala 2.8 required making both type inference and implicit resolution more powerful, specifically in order to support the collection library [Moors 2010; Odersky et al 2011, Sec. 21.7]; further extensions would be possible and desirable. For instance, if type inference were extended with higher-order unification6 [Pfenning 1988], it would better support a part of our DSL interface (not discussed in this chapter), by removing the need for type annotations.

Nested pattern matches for GADTs in Scala Writing a typed decomposition for Exp requires pattern-matching support for generalized algebraic datatypes (GADTs). We found that support for GADTs in Scala is currently insufficient. Emir et al [2007b] define the concept of typecasing, essentially a form of pattern matching limited to non-nested patterns, and demonstrate that Scala supports typecasing on GADTs by exhibiting a typed evaluator; however, typecasing is rather verbose for deep patterns, since one has to nest multiple pattern-matching expressions. When using normal pattern matches, instead, the support for GADTs seems much weaker.7 Hence one has to choose between support for GADTs and the convenience of nested pattern matching. A third alternative is to ignore or disable compiler warnings, but we did not consider this option.
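For illustration, here is a toy typed evaluator over a GADT-style datatype (our own code, not SQuOpt's Exp), in the typecasing style that Scala does handle; the trouble described above begins once patterns nest.

  sealed trait Exp[T]
  case class Lit(i: Int)                    extends Exp[Int]
  case class Plus(a: Exp[Int], b: Exp[Int]) extends Exp[Int]
  case class Eq(a: Exp[Int], b: Exp[Int])   extends Exp[Boolean]

  def eval[T](e: Exp[T]): T = e match {
    case Lit(i)     => i                  // scalac must refine T = Int here
    case Plus(a, b) => eval(a) + eval(b)
    case Eq(a, b)   => eval(a) == eval(b) // and T = Boolean here
    // nested patterns such as Plus(Lit(0), b) are where GADT support
    // becomes fragile, as discussed in the text
  }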

Implicit conversions do not chain While implicit conversions by default do not chain, it is sometimes convenient to allow chaining selectively. For instance, let us assume a context such that a: Exp[A], b: Exp[B] and c: Exp[C]. In this context, let us consider again how we lift tuples. We have seen that the expression (a, b) has type (Exp[A], Exp[B]), but can be converted to Exp[(A, B)] through an implicit conversion. Let us now consider nested tuples, like ((a, b), c): it has type ((Exp[A], Exp[B]), Exp[C]), hence the previous conversion cannot be applied to this expression.

Odersky et al [2011, Ch. 21] describe a pattern which can address this goal. Using this pattern, to lift pairs, we must write an implicit conversion from pairs of elements which can be implicitly

2. https://issues.scala-lang.org/browse/SI-2551
3. https://issues.scala-lang.org/browse/SI-2551
4. https://issues.scala-lang.org/browse/SI-5298?focusedCommentId=55971#comment-55971, reported by us.
5. https://issues.scala-lang.org/browse/SI-5592, reported by us.
6. https://issues.scala-lang.org/browse/SI-2712
7. Due to bug https://issues.scala-lang.org/browse/SI-5298?focusedCommentId=56840#comment-56840, reported by us.


converted to expressions. Instead of requiring (Exp[A], Exp[B]), the implicit conversion should require (A, B), with the condition that A can be converted to Exp[A'] and B to Exp[B']. This conversion solves the problem if applied explicitly, but the compiler refuses to apply it implicitly, again because of type inference issues.8
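The following is our simplified rendering of that pattern (invented names; a sketch, not SQuOpt's actual conversions). The one-level case works implicitly, while the nested case only works when the conversion is applied explicitly, matching the behavior described above.

  case class Exp[T](v: T)
  implicit def lift[T](t: T): Exp[T] = Exp(t)

  // Lift pairs whose components are merely *convertible* to expressions:
  implicit def liftPair[A, B, A2, B2](p: (A, B))(
      implicit ca: A => Exp[A2], cb: B => Exp[B2]): Exp[(A2, B2)] =
    Exp((ca(p._1).v, cb(p._2).v))

  val ab: Exp[(Int, String)] = (1, "x") // one level works implicitly
  // val abc: Exp[((Int, String), Boolean)] = ((1, "x"), true)
  //   fails when left implicit; only an explicit application works:
  val abc = liftPair(((1, "x"), true))(p => liftPair(p), b => lift(b))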

Because of these type inference limitations, we failed to provide support for reifying code like ((a, b), c).9

Error messages for implicit conversions The enrich-my-library pattern has the declared goal of allowing to extend existing libraries transparently. However, implementation details shine through when a user program using this feature contains a type error: when invoking a method would require an implicit conversion which is not applicable, the compiler often just reports that the method is not available. The recommended approach to debugging such errors is to manually add the missing implicit conversion and investigate the type error [Odersky et al 2011, Ch. 21.8], but this destroys the transparency of the approach when creating or modifying code. We believe this could be solved in principle by research on error reporting: the compiler could automatically insert all implicit conversions enabling the method calls, and report corresponding errors, even if at some performance cost.

7.2.4 Lessons for language embedders

Various domains, such as the one considered in our case study, allow powerful domain-specific optimizations. Such optimizations often are hard to express in a compositional way, hence they cannot be performed while building the query, but must be expressed as global optimization passes. For those domains, deep embedding is key to allow significant optimizations. On the other hand, deep embedding requires implementing an interpreter or a compiler.

On the one hand, interpretation overhead is significant in Scala, even when using HOAS to take advantage of the metalevel implementation of argument access.

Instead of interpreting a program, one can compile an EDSL program to Scala and load it, as done by Rompf et al [2011]; while we are using this approach, the disadvantage is the compilation delay, especially for Scala, whose compilation process is complex and time-consuming. Possible alternatives include generating bytecode directly, or combining interpretation and compilation, similarly to tiered JIT compilers, where only code which is executed often is compiled. We plan to investigate such alternatives in future work.

8. https://issues.scala-lang.org/browse/SI-5651, reported by us.
9. One could of course write a specific implicit conversion for this case; however, (a, (b, c)) already requires a different implicit conversion, and there are infinite ways to nest pairs, let alone tuples of bounded arity.

Chapter 8

Related work

This chapter relates our work to prior work on language-integrated queries, query optimization, techniques for DSL embedding, and other works on code querying.

8.1 Language-Integrated Queries

Microsoft's Language-Integrated Query technology (Linq) [Meijer et al 2006; Bierman et al 2007] is similar to our work in that it also reifies queries on collections to enable analysis and optimization. Such queries can be executed against a variety of backends (such as SQL databases or in-memory objects), and adding new backends is supported. Its implementation uses expression trees, a compiler-supported implicit conversion between expressions and their reification as a syntax tree. There are various major differences, though. First, the support for expression trees is hard-coded into the compiler. This means that the techniques are not applicable in languages that do not explicitly support expression trees. More importantly, the way expression trees are created in Linq is generic and fixed. For instance, it is not possible to create different tree nodes for method calls that are relevant to an analysis (such as the map method) than for method calls that are irrelevant for the analysis (such as the toString method). For this reason, expression trees in Linq cannot be customized to the task at hand, and contain too much low-level information. It is well-known that this makes it quite hard to implement programs operating on expression trees [Eini 2011].

Linq queries can also not easily be decomposed and modularized. For instance, consider the task of refactoring the filter in the query from x in y where x.z == 1 select x into a function. Defining this function as bool comp(int v) { return v == 1; } would destroy the possibility of analyzing the filter for optimization, since the resulting expression tree would only contain a reference to an opaque function. The function could be declared as returning an expression tree instead, but then this function could not be used in the original query anymore, since the compiler expects an expression of type bool and not an expression tree of type bool. It could only be integrated if the expression tree of the original query is created by hand, without using the built-in support for expression trees.

Although queries against in-memory collections could theoretically also be optimized in Linq, the standard implementation, Linq2Objects, performs no optimizations.

A few optimized embedded DSLs allow executing queries or computations on distributed clusters. DryadLINQ [Yu et al 2008], based on Linq, optimizes queries for distributed execution. It inherits Linq's limitations and thus does not support decomposing queries in different modules. Modularizing queries is supported instead by FlumeJava [Chambers et al 2010], another library (in Java) for distributed query execution. However, FlumeJava cannot express many optimizations, because its



representation of expressions is more limited; also, its query language is more cumbersome. Both problems are rooted in Java's limited support for embedded DSLs. Other embedded DSLs support parallel platforms such as GPUs or many-core CPUs, such as Delite [Rompf et al 2013].

Willis et al [2006; 2008] add first-class queries to Java through a source-to-source translator, and implement a few selected optimizations, including join order optimization and incremental maintenance of query results. They investigate how well their techniques apply to Java programs, and they suggest that programmers use manual optimizations to avoid expensive constructs like nested loops. While the goal of these works is similar to ours, their implementation as an external source-to-source translator makes the adoption, extensibility and composability of their technique difficult.

There have been many approaches for a closer integration of SQL queries into programs, such as HaskellDB [Leijen and Meijer 1999] (which also inspired Linq) or Ferry [Grust et al 2009] (which moves part of a program execution to a database). In Scala, there are also APIs which integrate SQL queries more closely, such as Slick.1 Its frontend allows defining and combining type-safe queries, similarly to ours (also in the way it is implemented). However, the language for defining queries maps to SQL, so it does not support nesting collections in other collections (a feature which simplified our example in Sec. 2.2), nor does it distinguish statically between different kinds of collections, such as Set or Seq. Based on Ferry, ScalaQL [Garcia et al 2010] extends Scala with a compiler plugin to integrate a query language on top of a relational database. The work by Spiewak and Zhao [2009] is unrelated to [Garcia et al 2010], but also called ScalaQL. It is similar to our approach in that it also proposes to reify queries based on for-comprehensions, but it is not clear from their paper how the reification works.2

8.2 Query Optimization

Query optimization on relational data is a long-standing issue in the database community, but there are also many works on query optimization on objects [Fegaras and Maier 2000; Grust 1999]. Compared to these works, we have only implemented a few simple query optimizations, so there is potential for further improvement of our work by incorporating more advanced optimizations.

Henglein [2010] and Henglein and Larsen [2010] embed relational algebra in Haskell. Queries are not reified in their approach, but due to a particularly sophisticated representation of multisets, it is possible to execute some queries containing cross-products using faster equi-joins. While their approach appears to not rely on non-confluent rewriting rules, and hence appears more robust, it is not yet clear how to incorporate other optimizations.

8.3 Deep Embeddings in Scala

Technically, our implementation of SQuOpt is a deep embedding of a part of the Scala collections API [Odersky and Moors 2009]. Deep embeddings were pioneered by Leijen and Meijer [1999] and Elliott et al [2003] in Haskell, for other applications.

We regard the Scala collections API [Odersky and Moors 2009] as a shallowly embedded query DSL. Query operators are eager, that is, they immediately perform collection operations when called, so that it is not possible to optimize queries before execution.

Scala collections also provide views, which are closer to SQuOpt. Unlike standard query operators, they create lazy collections. Like SQuOpt, views reify query operators as data structures and interpret them later. Unlike SQuOpt, views are not used for automatic query optimization,

1. https://slick.typesafe.com
2. We contacted the authors; they were not willing to provide more details or the sources of their approach.


but for explicitly changing the evaluation order of collection processing. Moreover, they cannot be reused by SQuOpt, as they reify too little: views embed deeply only the outermost pipeline of collection operators, while they embed shallowly nested collection operators and other Scala code in queries, such as arguments of filter, map and flatMap. Deep embedding of the whole query is necessary for many optimizations, as discussed in Chapter 3.
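A small example of this difference, using standard Scala views (the comments are ours):

  val xs = List(1, 2, 3, 4)
  val v = xs.view.map(x => x + 1).filter(x => x % 2 == 0)
  // v records map and filter nodes lazily (the outer pipeline is
  // reified), but x => x + 1 is already a compiled function: no
  // optimizer can inspect or rewrite its body.
  val ys = v.toList // forces evaluation: List(2, 4)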

8.3.1 LMS

Our deep embedding is inspired by some of the Scala techniques presented by Lightweight Modular Staging (LMS) [Rompf and Odersky 2010] for using implicits and for adding infix operators to a type. Like Rompf and Odersky [2010], we also generate and compile Scala code on-the-fly, reusing the Scala compiler. A plausible alternative backend for SQuOpt would have been to use LMS and Delite [Rompf et al 2011], a framework for building highly efficient DSLs in Scala.

An alternative to SQuOpt based on LMS and Delite, named OptiQL, was indeed built and presented in concurrent [Rompf et al 2013] and subsequent work [Sujeeth et al 2013b]. Like SQuOpt, OptiQL enables writing and optimizing collection queries in Scala. On the minus side, OptiQL supports fewer collection types (ArrayList and HashMap).

On the plus side, OptiQL supports queries containing effects, and can reuse support for automatic parallelization and multiple platforms present in Delite. While LMS allows embedding effectful queries, it is unclear how many of the implemented optimizations remain sound on such queries, and how many of those have been extended to such queries.

OptiQL achieves impressive speedups by fusing collection operations and transforming (or lowering) high-level bulk operators to highly optimized imperative programs using while loops. This gains significant advantages because it can avoid boxing and intermediate allocations, and because inlining heuristics in the Hotspot JIT compiler have never been tuned to bulk operators in functional programs.3 We did not investigate such lowerings. It is unclear how well these optimizations would extend to other collections, which intrinsically carry further overhead. Moreover, it is unclear how to execute incrementally the result of these lowerings, as we discuss in Sec. 7.1.

Those works did not support embedding arbitrary libraries in an automated or semi-automated way; this was only addressed later in Forge [Sujeeth et al 2013a] and Yin-Yang [Jovanovic et al 2014].

Ackermann et al [2012] present Jet, which also optimizes collection queries, but targets MapReduce-style computations in a distributed environment. Like OptiQL, Jet does not apply typical database optimizations such as indexing or filter hoisting.

8.4 Code Querying

In our evaluation, we explore the usage of SQuOpt to express queries on code, and re-implement a subset of the FindBugs [Hovemeyer and Pugh 2004] analyses. There are various other specialized code query languages, such as CodeQuest [Hajiyev et al 2006] or D-CUBED [Wegrzynowicz and Stencel 2009]. Since these are special-purpose query languages that are not embedded into a host language, they are not directly comparable to our approach.

3. See discussion by Cliff Click at http://www.azulsystems.com/blog/cliff/2011-04-04-fixing-the-inlining-problem


Chapter 9

Conclusions

We have illustrated the tradeoff between performance and modularity for queries on in-memory collections. We have shown that it is possible to design a deep embedding of a version of the collections API which reifies queries and can optimize them at runtime. Writing queries using this framework is, except for minor syntactic details, the same as writing queries using the collection library; hence the adoption barrier to using our optimizer is low.

Our evaluation shows that using abstractions in queries introduces a significant performance overhead with native Scala code, while SQuOpt in most cases makes the overhead much more tolerable or removes it completely. Optimizations are not sufficient on some queries, but since our optimizer is a proof-of-concept with many opportunities for improvement, we believe a more elaborate version will achieve even better performance and reduce these limitations.

9.1 Future work

To make our DSL more convenient to use, it would be useful to use the virtualized pattern matcher of Scala 2.10, when it becomes more robust, to add support for pattern matching in our virtualized queries.

Finally, while our optimizations are type-safe, as they rewrite an expression tree to another of the same type, currently the Scala type-checker cannot verify this statically, because of its limited support for GADTs. Solving this problem conveniently would allow checking statically that transformations are safe, and would make developing them easier.



Part II

Incremental λ-Calculus


Chapter 10

Introduction to differentiation

Incremental computation (or incrementalization) has a long-standing history in computer science [Ramalingam and Reps 1993]. Often a program needs to quickly update the output of some nontrivial function f when the input to the computation changes. In this scenario, we assume we have computed y1 = f x1, and we need to compute y2, which equals f x2. Programmers typically have to choose between a few undesirable options:

• Programmers can call function f again on the updated input x2, and repeat the computation from scratch. This choice guarantees correct results and is easy to implement, but typically wastes computation time. Often, if the updated input is close to the original input, the same result can be computed much faster.

• Programmers can write by hand a new function df that updates the output based on input changes, using various techniques. Running a hand-written function df can be much more efficient than rerunning f, but writing df requires significant developer effort, is error-prone, and requires updating df by hand to keep it consistent with f whenever f is modified. In practice, this complicates code maintenance significantly [Salvaneschi and Mezini 2013].

• Programmers can write f using domain-specific languages that support incrementalization, for tasks where such languages are available. For instance, build scripts (our f) are written in domain-specific languages that support (coarse-grained) incremental builds. Database query languages also often have support for incrementalization.

• Programmers can attempt using general-purpose techniques for incrementalizing programs, such as self-adjusting computation and variants such as Adapton. Self-adjusting computation applies to arbitrary purely functional programs, and has been extended to imperative programs; however, it only guarantees efficient incrementalization when applied to base programs that are designed for efficient incrementalization. Nevertheless, self-adjusting computation enabled incrementalizing programs that had never been incrementalized by hand before.

No approach guarantees automatic efficient incrementalization for arbitrary programs. We propose instead to design domain-specific languages (DSLs) that can be efficiently incrementalized, which we call incremental DSLs (IDSLs).

To incrementalize IDSL programs, we use a transformation that we call (finite) differentiation. Differentiation produces programs in the same language, called derivatives, that can be optimized further and compiled to efficient code. Derivatives represent changes to values through further values, which we call simply changes.



For primitives, IDSL designers must specify the result of differentiation: IDSL designers are to choose primitives that encapsulate efficiently incrementalizable computation schemes, while IDSL users are to express their computation using the primitives provided by the IDSL.

Helping IDSL designers to incrementalize primitives automatically is a desirable goal, though one that we leave open. In our setting, incrementalizing primitives becomes a problem of program synthesis, and we agree with Shah and Bodik [2017] that it should be treated as such. Among others, Liu [2000] develops a systematic approach to this synthesis problem for first-order programs, based on equational reasoning, but it is unclear how scalable this approach is. We provide foundations for using equational reasoning, and sketch an IDSL for handling different sorts of collections. We also discuss avenues at providing language plugins for more fundamental primitives, such as algebraic datatypes with structural recursion.

In the IDSLs we consider, similarly to database languages, we use primitives for high-level operations, of complexity similar to SQL operators. On the one hand, IDSL designers wish to design few general primitives, to limit the effort needed for manual incrementalization. On the other hand, overly general primitives can be harder to incrementalize efficiently. Nevertheless, we also provide some support for more general building blocks, such as product or sum types, and even (in some situations) recursive types. Other approaches provide more support for incrementalizing primitives, but even then, ensuring efficient incrementalization is not necessarily easy. Dynamic approaches to incrementalization are most powerful: they can find work that can be reused at runtime using memoization, as long as the computation is structured so that memoization matches will occur. Moreover, for some programs it seems more efficient to detect that some output can be reused thanks to a description of the input changes, rather than through runtime detection.

We propose that IDSLs be higher-order, so that primitives can be parameterized over functions and hence be highly flexible, and purely functional, to enable more powerful optimizations both before and after differentiation. Hence, an incremental DSL is a higher-order, purely functional language, composed of a λ-calculus core extended with base types and primitives. Various database query languages support forms of finite differentiation (see Sec. 19.2.1), but only over first-order languages, which provide only restricted forms of operators, such as map, filter or aggregation.

To support higher-order IDSLs, we define the first form of differentiation that supports higher-order functional languages; to this end, we introduce the concept of function changes, which contain changes to either the code of a function value or the values it closes over. While higher-order programs can be transformed to first-order ones, incrementalizing the resulting programs is still beyond reach for previous approaches to differentiation (see Sec. 19.2.1 for earlier work and Sec. 19.2.2 for later approaches). In Chapter 17 and Appendix D, we transform higher-order programs to first-order ones by closure conversion or defunctionalization, but we incrementalize defunctionalized programs using similar ideas, including changes to (defunctionalized) functions.

Our primary example will be DSLs for operations on collections: as discussed earlier (Chapter 2), we favor higher-order collection DSLs over relational databases.

We build our incremental DSLs based on simply-typed λ-calculus (STLC), extended with language plugins to define the domain-specific parts, as discussed in Appendix A.2 and summarized in Fig. A.1. We call our approach ILC, for Incremental Lambda Calculus.

The rest of this chapter is organized as follows. In Sec. 10.1 we explain that differentiation generalizes the calculus of finite differences, a relative of differential calculus. In Sec. 10.2 we show a motivating example for our approach. In Sec. 10.3 we introduce informally the concept of changes as values, and in Sec. 10.4 we introduce changes to functions. In Sec. 10.5 we define differentiation and motivate it informally. In Sec. 10.6 we apply differentiation to our motivating example.

Correctness of ILC is far from obvious. In Chapters 12 and 13, we introduce a formal theory of changes, and we use it to formalize differentiation and prove it correct.


10.1 Generalizing the calculus of finite differences

Our theory of changes generalizes an existing field of mathematics called the calculus of finite differences: if f is a real function, one can define its finite difference, that is, a function ∆f such that ∆f a da = f (a + da) − f a. Readers might notice the similarity (and the differences) between the finite difference and the derivative of f, since the latter is defined as

    f ′(a) = lim_{da → 0} (f (a + da) − f (a)) / da

The calculus of finite differences helps computing a closed formula for ∆f given a closed formula for f. For instance, if function f is defined by f x = 2 · x, one can prove its finite difference is ∆f x dx = 2 · (x + dx) − 2 · x = 2 · dx.

Finite differences are helpful for incrementalization because they allow computing functions on updated inputs based on results on base inputs, if we know how inputs change. Take again for instance f x = 2 · x: if x is a base input and x + dx is an updated input, we can compute f (x + dx) = f x + ∆f x dx. If we already computed y = f x and reuse the result, we can compute f (x + dx) = y + ∆f x dx. Here, the input change is dx and the output change is ∆f x dx.
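The following Scala fragment renders this calculation executably (our rendering, not part of the object language):

  // Finite differencing for f x = 2*x, as in the text:
  def f(x: Int): Int = 2 * x
  def df(x: Int, dx: Int): Int = 2 * dx // ∆f x dx = 2*dx

  val x  = 10
  val dx = 3
  val y  = f(x)                      // computed once, on the base input
  assert(f(x + dx) == y + df(x, dx)) // update the cached result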

However, the calculus of finite differences is usually defined for real functions. Since it is based on operators + and −, it can be directly extended to commutative groups. Incrementalization based on finite differences for groups and first-order programs has already been researched [Paige and Koenig 1982; Gluche et al 1997], most recently and spectacularly with DBToaster [Koch 2010; Koch et al 2016].

But it is not immediate how to generalize finite differencing beyond groups. And many useful types do not form a group: for instance, lists of integers don't form a group, but only a monoid. Moreover, it's hard to represent list changes simply through a list: how do we specify which elements were inserted (and where), which were removed, and which were subjected to change themselves?

In ILC, we generalize the calculus of finite differences by using distinct types for base values and changes, and adapting the surrounding theory. ILC generalizes operators + and − as operators ⊕ (pronounced "oplus" or "update") and ⊖ (pronounced "ominus" or "difference"). We show how ILC subsumes groups in Sec. 13.1.1.

10.2 A motivating example

In this section, we illustrate informally incrementalization on a small example.

In the following program, grandTotal xs ys sums the integer numbers in input collections xs and ys:

grandTotal : Bag Z → Bag Z → Z
s : Z
grandTotal xs ys = sum (merge xs ys)
s = grandTotal xs ys

This program computes output s from input collections xs and ys. These collections are multisets or bags, that is, collections that are unordered (like sets) where elements are allowed to appear more than once (unlike sets). In this example, we assume a language plugin that supports a base type of integers Z and a family of base types of bags Bag τ for any type τ.

We can run this program on specific inputs xs1 = {{1, 2, 3}} and ys1 = {{4}} to obtain output s1. Here, double braces denote a bag containing the elements among the double braces:

s1 = grandTotal xs1 ys1
   = sum {{1, 2, 3, 4}} = 10


This example uses small inputs for simplicity, but in practice inputs are typically much bigger; we use n to denote the input size. In this case, the asymptotic complexity of computing s is Θ(n).

Consider computing updated output s2 from updated inputs xs2 = {{1, 1, 2, 3}} and ys2 = {{4, 5}}. We could recompute s2 from scratch as:

s2 = grandTotal xs2 ys2
   = sum {{1, 1, 2, 3, 4, 5}} = 16

But if the size of the updated inputs is Θ(n), recomputation also takes time Θ(n), and we would like to obtain our result asymptotically faster.

To compute the updated output s2 faster, we assume the changes to the inputs have a description of size dn that is asymptotically smaller than the input size n, that is, dn = o(n). All approaches to incrementalization require small input changes. Incremental computation will then process the input changes, rather than just the new inputs.

10.3 Introducing changes

To talk about the differences between old values and new values, we introduce a few concepts, for now without full definitions. In our approach to incrementalization, we describe changes to values as values themselves: we call such descriptions simply changes. Incremental programs examine changes to inputs to understand how to produce changes to outputs. Just like in STLC we have terms (programs) that evaluate to values, we also have change terms, which evaluate to change values. We require that going from old values to new values preserves types: that is, if an old value v1 has type τ, then also its corresponding new value v2 must have type τ. To each type τ we associate a type of changes or change type ∆τ: a change between v1 and v2 must be a value of type ∆τ. Similarly, environments can change: to typing context Γ we associate change typing contexts ∆Γ, such that we can have an environment change dρ : ⟦∆Γ⟧ from ρ1 : ⟦Γ⟧ to ρ2 : ⟦Γ⟧.

Not all descriptions of changes are meaningful, so we also talk about valid changes. Valid changes satisfy additional invariants that are useful during incrementalization. A change value dv can be a valid change from v1 to v2. We can also consider a valid change as an edge from v1 to v2 in a graph associated to τ (where the vertexes are values of type τ), and we call v1 the source of dv and v2 the destination of dv. We only talk of source and destination for valid changes, so a change from v1 to v2 is (implicitly) valid. We'll discuss examples of valid and invalid changes in Examples 10.3.1 and 10.3.2.

We also introduce an operator ⊕ on values and changes: if dv is a valid change from v1 to v2, then v1 ⊕ dv (read as "v1 updated by dv") is guaranteed to return v2. If dv is not a valid change from v1, then v1 ⊕ dv can be defined to some arbitrary value, or not, without any effect on correctness. In practice, if ⊕ detects an invalid input change, it can trigger an error or return a dummy value; in our formalization, we assume for simplicity that ⊕ is total. Again, if dv is not valid from v1 to v1 ⊕ dv, then we do not talk of the source and destination of dv.

We also introduce operator ⊖: given two values v1, v2 of the same type, v2 ⊖ v1 is a valid change from v1 to v2.

Finally, we introduce change composition: if dv1 is a valid change from v1 to v2 and dv2 is a valid change from v2 to v3, then dv1 ⊚ dv2 is a valid change from v1 to v3.

Change operators are overloaded over different types. Coherent definitions of validity and of operators ⊕ and ⊖ for a type τ form a change structure over values of type τ (Definition 13.1.1). For each type τ we'll define a change structure (Definition 13.4.2), and operators will have types ⊕ : τ → ∆τ → τ, ⊖ : τ → τ → ∆τ, ⊚ : ∆τ → ∆τ → ∆τ.
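As a concrete rendering, a change structure can be packaged as a Scala trait; the following sketch (our names, not part of the formal development) shows the three operators, leaving validity as a commented invariant, together with an instance for integers where ⊕ is addition.

  trait ChangeStructure[V, DV] {
    def oplus(v: V, dv: DV): V        // v1 ⊕ dv: update v1 by dv
    def ominus(v2: V, v1: V): DV      // v2 ⊖ v1: valid from v1 to v2
    def compose(dv1: DV, dv2: DV): DV // dv1 ⊚ dv2: chain two changes
  }

  // Integers with integer changes, ⊕ as addition:
  object IntChanges extends ChangeStructure[Int, Int] {
    def oplus(v: Int, dv: Int): Int = v + dv
    def ominus(v2: Int, v1: Int): Int = v2 - v1
    def compose(dv1: Int, dv2: Int): Int = dv1 + dv2
  }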


Example 10.3.1 (Changes on integers and bags)
To show how incrementalization affects our example, we next describe valid changes for integers and bags. For now, a change das to a bag as1 simply contains all elements to be added to the initial bag as1 to obtain the updated bag as2 (we'll ignore removing elements for this section and discuss it later). In our example, the change from xs1 (that is, {{1, 2, 3}}) to xs2 (that is, {{1, 1, 2, 3}}) is dxs = {{1}}, while the change from ys1 (that is, {{4}}) to ys2 (that is, {{4, 5}}) is dys = {{5}}. To represent the output change ds from s1 to s2, we need integer changes. For now, we represent integer changes as integers, and define ⊕ on integers as addition: v1 ⊕ dv = v1 + dv.

For both bags and integers, a change dv is always valid between v1 and v2 = v1 ⊕ dv; for other changes, however, validity will be more restrictive.

Example 10.3.2 (Changes on naturals)
For instance, say we want to define changes on a type of natural numbers, and we still want to have v1 ⊕ dv = v1 + dv. A change from 3 to 2 should still be −1, so the type of changes must be Z. But the result of ⊕ should still be a natural, that is, an integer ≥ 0: to ensure that v1 ⊕ dv ≥ 0, we need to require that dv ≥ −v1. We use this requirement to define validity on naturals: dv ▷ v1 ↪ v1 + dv : N is defined as equivalent to dv ≥ −v1. We can guarantee equation v1 ⊕ dv = v1 + dv not for all changes, but only for valid changes. Conversely, if a change dv is invalid for v1, then v1 + dv < 0. We then define v1 ⊕ dv to be 0, though any other definition on invalid changes would work.1
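A Scala sketch of this change structure (our rendering): validity is an explicit predicate, and ⊕ is made total by mapping invalid changes to the arbitrary value 0.

  object NatChanges {
    // dv is valid from v1 exactly when v1 + dv stays a natural:
    def valid(v1: Int, dv: Int): Boolean = dv >= -v1
    def oplus(v1: Int, dv: Int): Int =
      if (valid(v1, dv)) v1 + dv else 0 // arbitrary on invalid changes
  }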

10.3.1 Incrementalizing with changes

After introducing changes and related notions, we describe how we incrementalize our example program.

We consider again the scenario of Sec. 10.2: we need to compute the updated output s2, the result of calling our program grandTotal on updated inputs xs2 and ys2; and we have the initial output s1 from calling our program on initial inputs xs1 and ys1. In this scenario, we can compute s2 non-incrementally by calling grandTotal on the updated inputs, but we would like to obtain the same result faster. Hence, we compute s2 incrementally: that is, we first compute the output change ds from s1 to s2, then we update the old output s1 by change ds. Successful incremental computation must compute the correct s2 asymptotically faster than non-incremental computation. This speedup is possible because we take advantage of the computation already done to compute s1.

To compute the output change ds from s1 to s2, we propose to transform our base program grandTotal to a new program dgrandTotal, which we call the derivative of grandTotal; to compute ds, we call dgrandTotal on initial inputs and their respective changes. Unlike other approaches to incrementalization, dgrandTotal is a regular program in the same language as grandTotal, hence it can be further optimized with existing technology.

Below we give the code for dgrandTotal, and show that in this example incremental computation computes s2 correctly.

For ease of reference, we recall inputs, changes and outputs:

xs1 = {{1, 2, 3}}
dxs = {{1}}
xs2 = {{1, 1, 2, 3}}
ys1 = {{4}}
dys = {{5}}

1. In fact, we could leave ⊕ undefined on invalid changes. Our original presentation [Cai et al 2014], in essence, restricted ⊕ to valid changes through dependent types, by ensuring that applying it to invalid changes would be ill-typed. Later, Huesca [2015], in similar developments, simply made ⊕ partial on its domain instead of restricting the domain, achieving similar results.


ys2 = {{4, 5}}
s1 = grandTotal xs1 ys1
   = 10

s2 = grandTotal xs2 ys2
   = 16

Incremental computation uses the following definitions to compute s2 correctly and with fewer steps, as desired:

dgrandTotal xs dxs ys dys = sum (merge dxs dys)
ds = dgrandTotal xs1 dxs ys1 dys
   = sum {{1, 5}} = 6
s2 = s1 ⊕ ds = s1 + ds
   = 10 + 6 = 16

Incremental computation should be asymptotically faster than non-incremental computation; hence, the derivative we run should be asymptotically faster than the base program. Here, derivative dgrandTotal is faster simply because it ignores initial inputs altogether. Therefore, its time complexity depends only on the total size of changes dn. In particular, the complexity of dgrandTotal is Θ(dn) = o(n).

We generate derivatives through a program transformation from terms to terms, which we call differentiation (or sometimes simply derivation). We write D⟦t⟧ for the result of differentiating term t. We apply D⟦–⟧ to terms of our non-incremental programs, or base terms, such as grandTotal. To define differentiation, we assume that we already have derivatives for the primitive functions they use; we discuss later how to write such derivatives by hand.

We define differentiation in Definition 10.5.1; some readers might prefer to peek ahead, but we prefer to first explain what differentiation is supposed to do.

A derivative of a function can be applied to initial inputs and changes from initial inputs to updated inputs, and returns a change from an initial output to an updated output. For instance, take derivative dgrandTotal, initial inputs xs1 and ys1, and changes dxs and dys from initial inputs to updated inputs. Then, change dgrandTotal xs1 dxs ys1 dys, that is ds, goes from initial output grandTotal xs1 ys1, that is s1, to updated output grandTotal xs2 ys2, that is s2. And because ds goes from s1 to s2, it follows as a corollary that s2 = s1 ⊕ ds. Hence, we can compute s2 incrementally through s1 ⊕ ds, as we have shown, rather than by evaluating grandTotal xs2 ys2.

We often just say that a derivative of function f maps changes to the inputs of f to changes to the outputs of f, leaving the initial inputs implicit. In short:

Slogan 10.3.3
Term D⟦t⟧ maps input changes to output changes. That is, D⟦t⟧ applied to initial base inputs and valid input changes (from initial inputs to updated inputs) gives a valid output change from t applied on old inputs to t applied on new inputs.

For a generic unary function f : A → B, the behavior of D⟦f⟧ can be described as

    f a2 ≃ f a1 ⊕ D⟦f⟧ a1 da    (10.1)

or as
    f (a1 ⊕ da) ≃ f a1 ⊕ D⟦f⟧ a1 da    (10.2)

where da is a metavariable2 standing for a valid change from a1 to a2 (with a1, a2 : A), and where ≃ denotes denotational equivalence (Definition A.2.5). Moreover, D⟦f⟧ a1 da is also a valid change, and can hence be used as an argument for operations that require valid changes. These equations


follow from Theorem 12.2.2 and Corollary 13.4.5; we iron out the few remaining details to obtain these equations in Sec. 14.1.2.

In our example, we have applied D⟦–⟧ to grandTotal, and simplified the result via β-reduction to produce dgrandTotal, as we show in Sec. 10.6. Correctness of D⟦–⟧ guarantees that sum (merge dxs dys) evaluates to a change from sum (merge xs ys) evaluated on old inputs xs1 and ys1 to sum (merge xs ys) evaluated on new inputs xs2 and ys2.

In this section, we have sketched the meaning of differentiation informally. We discuss incrementalization on higher-order terms in Sec. 10.4, and actually define differentiation in Sec. 10.5.

10.4 Differentiation on open terms and functions

We have shown that applying D⟦–⟧ on closed functions produces their derivatives. However, D⟦–⟧ is defined for all terms, hence also for open terms and for non-function types.

Open terms Γ ⊢ t : τ are evaluated with respect to an environment for Γ, and when this environment changes, the result of t changes as well. D⟦t⟧ computes the change to t's output: if Γ ⊢ t : τ, evaluating term D⟦t⟧ requires as input a change environment dρ : ⟦∆Γ⟧, containing changes from the initial environment ρ1 : ⟦Γ⟧ to the updated environment ρ2 : ⟦Γ⟧. The (environment) input change dρ is mapped by D⟦t⟧ to output change dv = ⟦D⟦t⟧⟧ dρ, a change from initial output ⟦t⟧ ρ1 to updated output ⟦t⟧ ρ2. If t is a function, dv maps in turn changes to the function arguments to changes to the function result. All this behavior, again, follows our slogan.

Environment changes contain changes for each variable in Γ. More precisely, if variable x appears with type τ in Γ, and hence in ρ1, ρ2, then dx appears with type ∆τ in ∆Γ, and hence in dρ. Moreover, dρ extends ρ1, to provide D⟦t⟧ with initial inputs and not just with their changes.

The two environments ρ1 and ρ2 can share entries; in particular, environment change dρ can be a nil change from ρ to ρ. For instance, Γ can be empty: then ρ1 and ρ2 are also empty (since they match Γ) and equal, so dρ is a nil change. Alternatively, some or all the change entries in dρ can be nil changes. Whenever dρ is a nil change, ⟦D⟦t⟧⟧ dρ is also a nil change.

If t is a function, D⟦t⟧ will be a function change. Changes to functions, in turn, map input changes to output changes, following our Slogan 10.3.3. If a change df from f1 to f2 is applied (via df a1 da) to an input change da from a1 to a2, then df will produce a change dv = df a1 da from v1 = f1 a1 to v2 = f2 a2. The definition of function changes is recursive on types: that is, dv can in turn be a function change, mapping input changes to output changes.

Derivatives are a special case of function changes: a derivative df is simply a change from f to f itself, which maps input changes da from a1 to a2 to output changes dv = df a1 da from f a1 to f a2. This definition coincides with the earlier definition of derivatives, and it also coincides with the definition of function changes for the special case where f1 = f2 = f. That is why D⟦t⟧ produces derivatives if t is a closed function term: we can only evaluate D⟦t⟧ against a nil environment change, producing a nil function change.

Since the concept of function changes can be surprising, we examine it more closely next.

10.4.1 Producing function changes

A first-class function can close over free variables that can change, hence function values themselves can change; hence, we introduce function changes to describe these changes.

For instance, term tf = λx → x + y is a function that closes over y, so different values v for y give rise to different values for f = ⟦tf⟧ (y = v). Take a change dv from v1 = 5 to v2 = 6: different

2. Nitpick: if da is read as an object variable, denotational equivalence will detect that these terms are not equivalent if da maps to an invalid change. Hence, we said that da is a metavariable. Later we define denotational equivalence for valid changes (Definition 14.1.2), which gives a less cumbersome way to state such equations.


inputs v1 and v2 for y give rise to different outputs f1 = ⟦tf⟧ (y = v1) and f2 = ⟦tf⟧ (y = v2). We describe the difference between outputs f1 and f2 through a function change df from f1 to f2.

Consider again Slogan 10.3.3 and how it applies to term tf:

Slogan 10.3.3
Term D⟦t⟧ maps input changes to output changes. That is, D⟦t⟧ applied to initial base inputs and valid input changes (from initial inputs to updated inputs) gives a valid output change from t applied on old inputs to t applied on new inputs.

Since y is free in tf, the value for y is an input of tf. So, continuing our example, dtf = D⟦tf⟧ must map a valid input change dv from v1 to v2 for variable y to a valid output change df from f1 to f2; more precisely, we must have df = ⟦dtf⟧ (y = v1, dy = dv).

10.4.2 Consuming function changes

Function changes can not only be produced, but also consumed in programs obtained from D⟦–⟧. We discuss next how.

As discussed, we consider the value for y as an input to tf = λx → x + y. However, we also choose to consider the argument for x as an input (of a different sort) to tf = λx → x + y, and we require our Slogan 10.3.3 to apply to input x too. While this might sound surprising, it works out well. Specifically, since df = ⟦dtf⟧ (y = v1, dy = dv) is a change from f1 to f2, we require df a1 da to be a change from f1 a1 to f2 a2, so df maps base input a1 and input change da to output change df a1 da, satisfying the slogan.

More in general, any valid function change df from f1 to f2 (where f1, f2 : ⟦σ → τ⟧) must in turn be a function that takes an input a1 and a change da, valid from a1 to a2, to a valid change df a1 da from f1 a1 to f2 a2.
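Rendered as plain Scala values (our sketch, not the formal definition), a function change is just a binary function from a base input and a valid change for it to an output change:

  type DFun[A, DA, DB] = (A, DA) => DB

  val f: Int => Int = a => a * 2
  // A derivative (a nil change) for f: if da goes from a1 to a2, then
  // df(a1, da) goes from f(a1) to f(a2); here f(a1 + da) - f(a1) = 2*da.
  val df: DFun[Int, Int, Int] = (a1, da) => 2 * da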

This way, to satisfy our slogan on application t = tf x, we can simply define D⟦–⟧ so that D⟦tf x⟧ = D⟦tf⟧ x dx. Then

⟦D⟦tf x⟧⟧ (y = v1, dy = dv, x = a1, dx = da) = ⟦D⟦tf⟧⟧ (y = v1, dy = dv) a1 da = df a1 da

As required, that's a change from f1 a1 to f2 a2.

Overall, valid function changes preserve validity, just like D⟦t⟧ in Slogan 10.3.3, and map valid input changes to valid output changes. In turn, output changes can be function changes; since they are valid, they in turn map valid changes to their inputs to valid output changes (as we'll see in Lemma 12.1.10). We'll later formalize this, and define validity by recursion on types, that is, as a logical relation (see Sec. 12.1.4).

10.4.3 Pointwise function changes

It might seem more natural to describe a function change df′ between f1 and f2 via df′ = λx → f2 x ⊖ f1 x. We call such a df′ a pointwise change. While some might find pointwise changes a more natural concept, the output of differentiation often needs to compute f2 x2 ⊖ f1 x1, using a change df from f1 to f2 and a change dx from x1 to x2. Function changes perform this job directly. We discuss this point further in Sec. 15.3.

10.4.4 Passing change targets

It would be more symmetric to make function changes also take the updated input a2, that is, have df a1 da a2 compute a change from f1 a1 to f2 a2. However, passing a2 explicitly adds no information: the value a2 can be computed from a1 and da as a1 ⊕ da. Indeed, in various cases


a function change can compute its required output without actually computing a1 ⊕ da. Since we expect the size of a1 and a2 to be asymptotically larger than da, actually computing a2 could be expensive.3 Hence, we stick to our asymmetric form of function changes.

10.5 Differentiation informally

Next, we define differentiation and explain informally why it does what we want. We then give an example of how differentiation applies to our example. A formal proof will follow soon in Sec. 12.2, justifying more formally why this definition is correct, but we proceed more gently.

Definition 10.5.1 (Differentiation)
Differentiation is the following term transformation:

D⟦λ(x : σ) → t⟧ = λ(x : σ) (dx : ∆σ) → D⟦t⟧
D⟦s t⟧ = D⟦s⟧ t D⟦t⟧
D⟦x⟧ = dx
D⟦c⟧ = DC⟦c⟧

where DC⟦c⟧ defines differentiation on primitives and is provided by language plugins (see Appendix A.2.5), and dx stands for a variable generated by prefixing x's name with d, so that D⟦y⟧ = dy and so on.
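To make the four equations concrete, here is a Scala sketch of D⟦–⟧ over a toy term datatype (our rendering; deriveConst stands in for the plugin-provided DC⟦–⟧):

  sealed trait Term
  case class Var(name: String)          extends Term
  case class Lam(x: String, body: Term) extends Term
  case class App(fun: Term, arg: Term)  extends Term
  case class Const(name: String)        extends Term

  def deriveConst(c: String): Term = Const("d" + c) // placeholder plugin

  def derive(t: Term): Term = t match {
    case Var(x)       => Var("d" + x)                       // D⟦x⟧ = dx
    case Lam(x, body) => Lam(x, Lam("d" + x, derive(body))) // λx dx → D⟦t⟧
    case App(s, u)    => App(App(derive(s), u), derive(u))  // D⟦s⟧ t D⟦t⟧
    case Const(c)     => deriveConst(c)                     // DC⟦c⟧
  }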

If we extend the language with (non-recursive) let-bindings, we can give derived rules for it, such as:

D⟦let x = t1 in t2⟧ = let x = t1
                          dx = D⟦t1⟧
                      in D⟦t2⟧

In Sec. 15.1 we will explain that the same transformation rules apply for recursive let-bindings.

If t contains occurrences of both (say) x and dx, capture issues arise in D⟦t⟧. We defer these issues to Sec. 12.3.4, and assume throughout that such issues can be avoided by α-renaming and the Barendregt convention [Barendregt 1984].

This transformation might seem deceptively simple. Indeed, pure λ-calculus only handles binding and higher-order functions, leaving "real work" to primitives. Similarly, our transformation incrementalizes binding and higher-order functions, leaving "real work" to derivatives of primitives. However, our support of λ-calculus allows gluing primitives together. We'll discuss later how we add support for various primitives and families of primitives.

Now we try to motivate the transformation informally. We claimed that D⟦–⟧ must satisfy Slogan 10.3.3, which reads:

Slogan 10.3.3
Term D⟦t⟧ maps input changes to output changes. That is, D⟦t⟧ applied to initial base inputs and valid input changes (from initial inputs to updated inputs) gives a valid output change from t applied on old inputs to t applied on new inputs.

Let's analyze the definition of D⟦–⟧ by case analysis on the input term u. In each case, we assume that our slogan applies to any subterms of u, and sketch why it applies to u itself.

3. We show later efficient change structures where ⊕ reuses part of a1 to output a2 in logarithmic time.


• If u = x, by our slogan D⟦x⟧ must evaluate to the change of x when inputs change, so we set D⟦x⟧ = dx.

• If u = c, we simply delegate differentiation to DC⟦c⟧, which is defined by plugins. Since plugins can define arbitrary primitives, they need to provide their derivatives.

• If u = λx → t, then u introduces a function. Assume for simplicity that u is a closed term. Then D⟦t⟧ evaluates to the change of the result of this function u, evaluated in a context binding x and its change dx. Then, because of how function changes are defined, the change of u is the change of output t as a function of the base input x and its change dx, that is, D⟦u⟧ = λx dx → D⟦t⟧.

• If u = s t, then s is a function. Assume for simplicity that u is a closed term. Then D⟦s⟧ evaluates to the change of s, as a function of D⟦s⟧'s base input and input change. So we apply D⟦s⟧ to its actual base input t and actual input change D⟦t⟧, and obtain D⟦s t⟧ = D⟦s⟧ t D⟦t⟧.

This is not quite a correct proof sketch because of many issues but we x these issues with ourformal treatment in Sec 122 In particular in the case for abstraction u = λx rarr t D n t o dependsnot only on x and dx but also on other free variables of u and their changes Similarly we mustdeal with free variables also in the case for application u = s t But rst we apply dierentiation toour earlier example

106 Dierentiation on our exampleTo exemplify the behavior of dierentiation concretely and help x ideas for later discussion inthis section we show how the derivative of grandTotal looks like

grandTotal = λxs ysrarr sum (merge xs ys)s = grandTotal 1 2 3 4 = 11

Dierentiation is a structurally recursive program transformation so we rst compute D nmerge xs ys oTo compute its change we simply call the derivative of merge that is dmerge and apply it to thebase inputs and their changes hence we write

D nmerge xs ys o = dmerge xs dxs ys dys

As wersquoll better see later we can dene function dmerge as

dmerge = λxs dxs ys dysrarr merge dxs dys

so D nmerge xs ys o can be simplied by β-reduction to merge dxs dys

D nmerge xs ys o= dmerge xs dxs ys dys=β (λxs dxs ys dysrarr merge dxs dys) xs dxs ys dys=β merge dxs dys

Letrsquos next derive sum (merge xs ys) First like above the derivative of sum zs would bedsum zs dzs which depends on base input zs and its change dzs As wersquoll see dsum zs dzs cansimply call sum on dzs so dsum zs dzs = sum dzs To derive sum (merge xs ys) we must call

Chapter 10 Introduction to dierentiation 65

the derivative of sum that is dsum on its base argument and its change so on merge xs ys andD nmerge xs ys o We can later simplify again by β-reduction and obtain

D n sum (merge xs ys) o= dsum (merge xs ys) D nmerge xs ys o=β sum D nmerge xs ys o= sum (dmerge xs dxs ys dys)=β sum (merge dxs dys)

Here we see the output of dierentiation is dened in a bigger typing context while merge xs ysonly depends on base inputs xs and ys D nmerge xs ys o also depends on their changes Thisproperty extends beyond the examples we just saw if a term t is dened in context Γ then theoutput of derivation D n t o is dened in context Γ∆Γ where ∆Γ is a context that binds a changedx for each base input x bound in the context Γ

Next we consider λxs ysrarr sum (merge xs ys) Since xs dxs ys dys are free in D n sum (merge xs ys) o(ignoring later optimizations) term

D n λxs ysrarr sum (merge xs ys) omust bind all those variables

D n λxs ysrarr sum (merge xs ys) o= λxs dxs ys dysrarr D n sum (merge xs ys) o=β λxs dxs ys dysrarr sum (merge dxs dys)

Next we need to transform the binding of grandTotal2 to its body b = λxs ysrarr sum (merge xs ys)We copy this binding and add a new additional binding from dgrandTotal2 to the derivative of b

grandTotal = λxs ys rarr sum (merge xs ys)dgrandTotal = λxs dxs ys dysrarr sum (merge dxs dys)

Finally we need to transform the binding of output and its body By iterating similar steps inthe end we get

grandTotal = λxs ys rarr sum (merge xs ys)dgrandTotal = λxs dxs ys dysrarr sum (merge dxs dys)s = grandTotal 1 2 3 4 ds = dgrandTotal 1 2 3 1 4 5

=β sum (merge 1 5 )

Self-maintainability Dierentiation does not always produce ecient derivatives without fur-ther program transformations in particular derivatives might need to recompute results producedby the base program In the above example if we donrsquot inline derivatives and use β-reduction tosimplify programs D n sum (merge xs ys) o is just dsum (merge xs ys) D nmerge xs ys o A directexecution of this program will compute merge xs ys which would waste time linear in the baseinputs

Wersquoll show how to avoid such recomputations in general in Sec 172 but here we can avoidcomputing merge xs ys simply because dsum does not use its base argument that is it is self-maintainable Without the approach described in Chapter 17 we are restricted to self-maintainablederivatives

66 Chapter 10 Introduction to dierentiation

107 Conclusion and contributionsIn this chapter we have seen how a correct dierentiation transform allows us to incrementalizeprograms

1071 Navigating this thesis partDierentiation This chapter and Chapters 11 to 14 form the core of incrementalization theorywith other chapters building on top of them We study incrementalization for STLC we summarizeour formalization of STLC in Appendix A Together with Chapter 10 Chapter 11 introduces theoverall approach to incrementalization informally Chapters 12 and 13 show that incrementalizationusing ILC is correct Building on a set-theoretic semantics these chapters also develop the theoryunderlying this correctness proofs and further results Equational reasoning on terms is thendeveloped in Chapter 14

Chapters 12 to 14 contain full formal proofs readers are welcome to skip or skim those proofswhere appropriate For ease of reference and to help navigation and skimming these highlightand number all denitions notations theorem statements and so on and we strive not to hideimportant denitions inside theorems

Later chapters build on this core but are again independent from each other

Extensions and theoretical discussion Chapter 15 discusses a few assorted aspects of thetheory that do not t elsewhere and do not suce for standalone chapters We show how todierentiation general recursion Sec 151 we exhibit a function change that is not valid for anyfunction (Sec 152) we contrast our representation of function changes with pointwise functionchanges (Sec 153) and we compare our formalization with the one presented in [Cai et al 2014](Sec 154)

Performance of dierentiated programs Chapter 16 studies how to apply dierentiation(as introduced in previous chapters) to incrementalize a case study and empirically evaluatesperformance speedups

Dierentiation with cache-transfer-style conversion Chapter 17 is self-contained eventhough it builds on the rest of the material it summarizes the basics of ILC in Sec 1721 Terminologyin that chapter is sometimes subtly dierent from the rest of the thesis

Towards dierentiation for System F Chapter 18 outlines how to extend dierentiation toSystem F and suggests a proof avenue Dierentiation for System F requires a generalization ofchange structures to changes across dierent types (Sec 184) that appears of independent interestthough further research remains to be done

Related work conclusions and appendixes Finally Sec 176 and Chapter 19 discuss relatedwork and Chapter 20 concludes this part of the thesis

1072 ContributionsIn Chapters 11 to 14 and Chapter 16 we make the following contributions

bull We present a novel mathematical theory of changes and derivatives which is more generalthan other work in the eld because changes are rst-class entities they are distinct frombase values and they are dened also for functions

Chapter 10 Introduction to dierentiation 67

bull We present the rst approach to incremental computation for pure λ-calculi by a source-to-source transformation D that requires no run-time support The transformation producesan incremental program in the same language all optimization techniques for the originalprogram are applicable to the incremental program as well

bull We prove that our incrementalizing transformation D is correct by a novel machine-checkedlogical relation proof mechanized in Agda

bull While we focus mainly on the theory of changes and derivatives we also perform a per-formance case study We implement the derivation transformation in Scala with a plug-inarchitecture that can be extended with new base types and primitives We dene a pluginwith support for dierent collection types and use the plugin to incrementalize a variant ofthe MapReduce programming model [Laumlmmel 2007] Benchmarks show that on this programincrementalization can reduce asymptotic complexity and can turn O(n) performance intoO(1) improving running time by over 4 orders of magnitude on realistic inputs (Chapter 16)

In Chapter 17 we make the following contributions

bull via examples we motivate extending ILC to remember intermediate results (Sec 172)

bull we give a novel proof of correctness for ILC for untyped λ-calculus based on step-indexedlogical relations (Sec 1733)

bull building on top of ILC-style dierentiation we show how to transform untyped higher-orderprograms to cache-transfer-style (CTS) (Sec 1735)

bull we show through formal proofs that programs and derivatives in cache-transfer style simulatecorrectly their non-CTS variants (Sec 1736)

bull we perform performance case studies (in Sec 174) applying (by hand) extension of thistechnique to Haskell programs and incrementalize eciently also programs that do not admitself-maintainable derivatives

Chapter 18 describes how to extend dierentiation to System F To this end we extend changestructure to allow from changes where source and destination have dierent types and enabledening more powerful combinators for change structures to be more powerful While the resultsin this chapter call for further research we consider them exciting

Appendix C proposes the rst correctness proofs for ILC via operational methods and (step-indexed) logical relations for simply-typed λ-calculus (without and with general recursion) andfor untyped λ-calculus A later variant of this proof adapted for use of cache-transfer-style ispresented in Chapter 17 but we believe the presentation in Appendix C might also be interesting fortheorists as it explains how to extend ILC correctness proofs with fewer extraneous complicationsNevertheless given the similarity between proofs in Appendix C and Chapter 17 we relegate thelatter one to an appendix

Finally Appendix D shows how to implement all change operations on function changeseciently by defunctionalizing functions and function changes rather than using mere closureconversion

68 Chapter 10 Introduction to dierentiation

Chapter 11

A tour of dierentiation examples

Before formalizing ILC we show more example of change structures and primitives to show (a)designs for reusable primitives and their derivatives (b) to what extent we can incrementalize basicbuilding blocks such as recursive functions and algebraic data types and (c) to sketch how we canincrementalize collections eciently We make no attempt at incrementalizing a complete collectionAPI here we discuss briey more complete implementations in Chapter 16 and Sec 174

To describe these examples informally we use Haskell notation and let polymorphism asappropriate (see Appendix A2)

We also motivate a few extensions to dierentiation that we describe later As wersquoll see inthis chapter wersquoll need to enable some forms of introspection on function changes to manipulatethe embedded environments as we discuss in Appendix D We will also need ways to rememberintermediate results which we will discuss in Chapter 17 We will also use overly simplied changestructures to illustrate a few points

111 Change structures as type-class instances

We encode change structures as sketched earlier in Sec 103 through a type class namedChangeStructAn instance ChangeStruct t denes a change type ∆t as an associated type and operations oplus and are dened as methods We also dene method oreplace such that oreplace v2 produces areplacement change from any source to v2 by default v2 v1 is simply an alias for oreplace v2

class ChangeStruct t wheretype ∆t(oplus) t rarr ∆t rarr toreplace t rarr ∆t() t rarr t rarr ∆tv2 v1 = oreplace v2() ∆t rarr ∆t rarr ∆t0 t rarr ∆t

In this chapter we will often show change structures where only some methods are dened inactual implementations we use a type class hierarchy to encode what operations are available butwe collapse this hierarchy here to simplify presentation

69

70 Chapter 11 A tour of dierentiation examples

112 How to design a language pluginWhen adding support for a datatype T we will strive to dene both a change structure and derivativesof introduction and elimination forms for T since such forms constitute a complete API for usingthat datatype However we will sometimes have to restrict elimination forms to scenarios that canbe incrementalized eciently

In general to dierentiate a primitive f Ararr B once we have dened a change structure for Awe can start by dening

df a1 da = f (a1 oplus da) f a1 (111)

where da is a valid change from a1 to a2 We then try to simplify and rewrite the expression usingequational reasoning so that it does not refer to any more as far as possible We can assume thatall argument changes are valid especially if that allows producing faster derivatives we formalizeequational reasoning for valid changes in Sec 1411 In fact instead of dening and simplifyingf a2 f a1 to not use it it is sucient to produce a change from f a1 to f a2 even a dierent oneWe write da1 =∆ da2 to mean that changes da1 and da2 are equivalent that is they have the samesource and destination We dene this concept properly in Sec 142

We try to avoid running on arguments of non-constant size since it might easily take timelinear or superlinear in the argument sizes if produces replacement values it completes inconstant time but derivatives invoked on the produced changes are not ecient

113 Incrementalizing a collection APIIn this section we describe a collection API that we incrementalize (partially) in this chapter

To avoid notation conicts we represent lists via datatype List a dened as follows

data List a = Nil | Cons a (List a)

We also consider as primitive operation a standard mapping function map We also support tworestricted forms of aggregation (a) folding over an abelian group via fold similar to how one usuallyfolds over a monoid1 (b) list concatenation via concat We will not discuss how to dierentiateconcat as we reuse existing solutions by Firsov and Jeltsch [2016]

singleton ararr List asingleton x = Cons x Nilmap (ararr b) rarr List ararr List bmap f Nil = Nilmap f (Cons x xs) = Cons (f x) (map f xs)fold AbelianGroupChangeStruct brArr List brarr bfold Nil = memptyfold (Cons x xs) = x fold xs -- Where is inx for mappendconcat List (List a) rarr List aconcat =

While usually fold requires only an instance Monoid b of type class Monoid to aggregate collectionelements our variant of fold requires an instance of type class GroupChangeStruct a subclass ofMonoid This type class is not used by fold itself but only by its derivative as we explain inSec 1132 nevertheless we add this stronger constraint to fold itself because we forbid derivatives

1hpshackagehaskellorgpackagebase-4910docsData-Foldablehtml

Chapter 11 A tour of dierentiation examples 71

with stronger type-class constraints With this approach all clients of fold can be incrementalizedusing dierentiation

Using those primitives one can dene further higher-order functions on collections such asconcatMap lter foldMap In turn these functions form the kernel of a collection API as studiedfor instance by work on the monoid comprehension calculus [Grust and Scholl 1996a Fegaras andMaier 1995 Fegaras 1999] even if they are not complete by themselves

concatMap (ararr List b) rarr List ararr List bconcatMap f = concat map flter (ararr Bool) rarr List ararr List alter p = concatMap (λx rarr if p x then singleton x else Nil)foldMap AbelianGroupChangeStruct brArr (ararr b) rarr List ararr bfoldMap f = fold map f

In rst-order DSLs such as SQL such functionality must typically be added through separateprimitives (consider for instance lter) while here we can simply dene for instance lter on topof concatMap and incrementalize the resulting denitions using dierentiation

Function lter uses conditionals which we havenrsquot discussed yet we show how to incrementalizelter successfully in Sec 116

1131 Changes to type-class instancesIn this whole chapter we assume that type-class instances such as foldrsquos AbelianGroupChangeStructargument do not undergo changes Since type-class instances are closed top-level denitions ofoperations and are canonical for a datatype it is hard to imagine a change to a type-class instanceOn the other hand type-class instances can be encoded as rst-class values We can for instanceimagine a fold taking a unit value and an associative operation as argument In such scenarios oneneeds additional eort to propagate changes to operation arguments similarly to changes to thefunction argument to map

1132 Incrementalizing aggregationLetrsquos now discuss how to incrementalize fold We consider an oversimplied change structure thatallows only two sorts of changes prepending an element to a list or removing the list head of anon-empty list and study how to incrementalize fold for such changes

data ListChange a = Prepend a | Removeinstance ChangeStruct (List a) wheretype ∆(List a) = ListChange axs oplus Prepend x = Cons x xs(Cons x xs) oplus Remove = xsNil oplus Remove = error Invalid change

dfold xs (Prepend x) =

Removing an element from an empty list is an invalid change hence it is safe to give an error inthat scenario as mentioned when introducing oplus (Sec 103)

By using equational reasoning as suggested in Sec 112 starting from Eq (111) one can showformally that dfold xs (Prepend x) should be a change that in a sense ldquoaddsrdquo x to the result usinggroup operations

72 Chapter 11 A tour of dierentiation examples

dfold xs (Prepend x)=∆ fold (xs oplus Prepend x) fold xs= fold (Cons x xs) fold xs= (x fold xs) fold xs

Similarly dfold (Cons x xs) Remove should instead ldquosubtractrdquo x from the result

dfold (Cons x xs) Remove=∆ fold (Cons x xs oplus Remove) fold (Cons x xs)= fold xs fold (Cons x xs)= fold xs (x fold xs)

As discussed using is fast enough on say integers or other primitive types but not in generalTo avoid using we must rewrite its invocation to an equivalent expression In this scenario wecan use group changes for abelian groups and restrict fold to situations where such changes areavailable

dfold AbelianGroupChangeStruct brArr List brarr ∆(List b) rarr ∆bdfold xs (Prepend x) = inject xdfold (Cons x xs) Remove = inject (invert x)dfold Nil Remove = error Invalid change

To support group changes we dene the following type classes to model abelian groups andgroup change structures omitting APIs for more general groups AbelianGroupChangeStruct onlyrequires that group elements of type g can be converted into changes (type ∆g) allowing changetype ∆g to contain other sorts of changes

class Monoid g rArr AbelianGroup g whereinvert g rarr g

class (AbelianGroup aChangeStruct a) rArrAbelianGroupChangeStruct a where

-- Inject group elements into changes Law-- a oplus inject b = a binject ararr ∆a

Chapter 16 discusses how we can use group changes without assuming a single group is denedon elements but here we simply select the canonical group as chosen by type-class resolution Touse a dierent group as usual one denes a dierent but isomorphic type via the Haskell newtypeconstruct As a downside derivatives newtype constructors must convert changes across dierentrepresentations

Rewriting away can also be possible for other specialized folds though sometimes they canbe incrementalized directly for instance map can be written as a fold Incrementalizing map for theinsertion of x into xs requires simplifying map f (Cons x xs) map f xs To avoid we can rewritethis change statically to Insert (f x) indeed we can incrementalize map also for more realisticchange structures

Associative tree folds Other usages of fold over sequences produce result type of small boundedsize (such as integers) In this scenario one can incrementalize the given fold eciently using instead of relying on group operations For such scenarios one can design a primitive

foldMonoid Monoid arArr List arArr a

Chapter 11 A tour of dierentiation examples 73

for associative tree folds that is a function that folds over the input sequence using a monoid (thatis an associative operation with a unit) For ecient incrementalization foldMonoidrsquos intermediateresults should form a balanced tree and updating this tree should take logarithmic time oneapproach to ensure this is to represent the input sequence itself using a balanced tree such as anger tree [Hinze and Paterson 2006]

Various algorithms store intermediate results of folding inside an input balanced tree as describedby Cormen et al [2001 Ch 14] or by Hinze and Paterson [2006] But intermediate results can alsobe stored outside the input tree as is commonly done using self-adjusting computation [Acar 2005Sec 91] or as can be done in our setting While we do not use such folds we describe the existingalgorithms briey and sketch how to integrate them in our setting

Function foldMonoid must record the intermediate results and the derivative dfoldMonoid mustpropagate input changes to aected intermediate results

2 To study time complexity of input change propagation it is useful to consider the dependencygraph of intermediate results in this graph an intermediate result v1 has an arc to intermediate resultv2 if and only if computing v1 depends on v2 To ensure dfoldMonoid is ecient the dependencygraph of intermediate results from foldMonoid must form a balanced tree of logarithmic height sothat changes to a leaf only aect a logarithmic number of intermediate results

In contrast implementing foldMonoid using foldr on a list produces an unbalanced graph ofintermediate results For instance take input list xs = [1 8] containing numbers from 1 to 8 andassume a given monoid Summing them with foldr () mempty xs means evaluating

1 (2 (3 (4 (5 (6 (7 (8 mempty)))))))

Then a change to the last element of input xs aects all intermediate results hence incrementaliza-tion takes at least O(n) In contrast using foldAssoc on xs should evaluate a balanced tree similarto

((1 2) (3 4)) ((5 6) (7 8))

so that any individual change to a leave insertion or deletion only aects O(logn) intermediateresults (where n is the sequence size) Upon modications to the tree one must ensure that thebalancing is stable [Acar 2005 Sec 91] In other words altering the tree (by inserting or removingan element) must only alter O(logn) nodes

We have implemented associative tree folds on very simple but unbalanced tree structures webelieve they could be implemented and incrementalized over balanced trees representing sequencessuch as nger trees or random access zippers [Headley and Hammer 2016] but doing so requirestransforming their implementation of their data structure to cache-transfer style (CTS) (Chapter 17)We leave this for future work together with an automated implementation of CTS transformation

1133 Modifying list elementsIn this section we consider another change structure on lists that allows expressing changes toindividual elements Then we present dmap derivative of map for this change structure Finallywe sketch informally the correctness of dmap which we prove formally in Example 1411

We can then represent changes to a list (List a) as a list of changes (List (∆a)) one for eachelement A list change dxs is valid for source xs if they have the same length and each elementchange is valid for its corresponding element For this change structure we can dene oplus and but not a total such list changes canrsquot express the dierence between two lists of dierentlengths Nevertheless this change structure is sucient to dene derivatives that act correctly

2We discuss in Chapter 17 how base functions communicate results to derivatives

74 Chapter 11 A tour of dierentiation examples

on the changes that can be expressed We can describe this change structure in Haskell using atype-class instance for class ChangeStruct

instance ChangeStruct (List a) wheretype ∆(List a) = List (∆a)Nil oplus Nil = Nil(Cons x xs) oplus (Cons dx dxs) = Cons (x oplus xs) (dx oplus dxs)_ oplus _ = Nil

The following dmap function is a derivative for the standardmap function (included for reference)and the given change structure We discuss derivatives for recursive functions in Sec 151

map (ararr b) rarr List ararr List bmap f Nil = Nilmap f (Cons x xs) = Cons (f x) (map f xs)dmap (ararr b) rarr ∆(ararr b) rarr List ararr ∆List ararr ∆List b

-- A valid list change has the same length as the base listdmap f df Nil Nil = Nildmap f df (Cons x xs) (Cons dx dxs) =

Cons (df x dx) (dmap f df xs dxs)-- Remaining cases deal with invalid changes and a dummy-- result is sucient

dmap f df xs dxs = Nil

Function dmap is a correct derivative of map for this change structure according to Slogan 1033 wesketch an informal argument by induction The equation for dmap f df Nil Nil returns Nil a validchange from initial to updated outputs as required In the equation for dmap f df (Cons x xs) (Cons dx dxs)we compute changes to the head and tail of the result to produce a change from map f (Cons x xs)to map (f oplus df ) (Cons x xs oplus Cons dx dxs) To this end (a) we use df x dx to compute a changeto the head of the result from f x to (f oplus df ) (x oplus dx) (b) we use dmap f df xs dxs recursivelyto compute a change to the tail of the result from map f xs to map (f oplus df ) (xs oplus dxs) (c) weassemble changes to head and tail with Cons into a change from In other words dmap turns inputchanges to output changes correctly according to our Slogan 1033 it is a correct derivative formap according to this change structure We have reasoned informally we formalize this style ofreasoning in Sec 141 Crucially our conclusions only hold if input changes are valid hence termmap f xs oplus dmap f df xs dxs is not denotationally equal to map (f oplus df ) (xs oplus dxs) for arbitrarychange environments these two terms only evaluate to the same result for valid input changes

Since this denition of dmap is a correct derivative we could use it in an incremental DSL forlist manipulation together with other primitives Because of limitations we describe next we willuse instead improved language plugins for sequences

1134 LimitationsWe have shown simplied list changes but they have a few limitations Fixing those requires moresophisticated denitions

As discussed our list changes intentionally forbid changing the length of a list And ourdenition of dmap has further limitations a change to a list of n elements takes size O(n) evenwhen most elements do not change and calling dmap f df on it requires n calls to df This is onlyfaster if df is faster than f but adds no further speedup

We can describe instead a change to an arbitrary list element x in xs by giving the change dxand the position of x in xs A list change is then a sequence of such changes

Chapter 11 A tour of dierentiation examples 75

type ∆(List a) = List (AtomicChange a)data AtomicChange a = Modify Z (∆a)

However fetching the i-th list element still takes time linear in i we need a better representationof sequences In next section we switch to a change structure on sequences dened by ngertrees [Hinze and Paterson 2006] following Firsov and Jeltsch [2016]

114 Ecient sequence changesFirsov and Jeltsch [2016] dene an ecient representation of list changes in a framework similarto ILC and incrementalize selected operations over this change structure They also providecombinators to assemble further operations on top of the provided ones We extend their frameworkto handle function changes and generate derivatives for all functions that can be expressed in termsof the primitives

Conceptually a change for type Sequence a is a sequence of atomic changes Each atomic changeinserts one element at a given position or removes one element or changes an element at oneposition3

data SeqSingleChange a= Insert idx Z x a| Remove idx Z| ChangeAt idx Z dx ∆a

data SeqChange a = Sequence (SeqSingleChange a)type ∆(Sequence a) = SeqChange a

We use Firsov and Jeltschrsquos variant of this change structure in Sec 174

115 ProductsIt is also possible to dene change structures for arbitrary sum and product types and to providederivatives for introduction and elimination forms for such datatypes In this section we discussproducts in the next section sums

We dene a simple change structure for product type A times B from change structures for A and Bsimilar to change structures for environments operations act pointwise on the two components

instance (ChangeStruct aChangeStruct b) rArr ChangeStruct (a b) wheretype ∆(a b) = (∆a∆b)(a b) oplus (da db) = (a oplus da b oplus db)(a2 b2) (a1 b1) = (a2 a1 b2 b1)oreplace (a2 b2) = (oreplace a2 oreplace b2)0ab = (0a 0b)(da1 db1) (da2 db2) = (da1 da2 db1 db2)

Through equational reasoning as in Sec 112 we can also compute derivatives for basic primitiveson product types both the introduction form (that we alias as pair) and the elimination forms fstand snd We just present the resulting denitions

pair a b = (a b)dpair a da b db = (da db)

3Firsov and Jeltsch [2016] and our actual implementation allow changes to multiple elements

76 Chapter 11 A tour of dierentiation examples

fst (a b) = asnd (a b) = bdfst ∆(a b) rarr ∆adfst (da db) = dadsnd ∆(a b) rarr ∆bdsnd (da db) = dbuncurry (ararr brarr c) rarr (a b) rarr cuncurry f (a b) = f a bduncurry (ararr brarr c) rarr ∆(ararr brarr c) rarr (a b) rarr ∆(a b) rarr ∆cduncurry f df (x y) (dx dy) = df x dx y dy

One can also dene n-ary products in a similar way However a product change contains asmany entries as a product

116 Sums pattern matching and conditionalsIn this section we dene change structures for sum types together with the derivative of theirintroduction and elimination forms We also obtain support for booleans (which can be encoded assum type 1 + 1) and conditionals (which can be encoded in terms of elimination for sums) We havemechanically proved correctness of this change structure and derivatives but we do not present thetedious details in this thesis and refer to our Agda formalization

Changes structures for sums are more challenging than ones for products We can dene thembut in many cases we can do better with specialized structures Nevertheless such changes areuseful in some scenarios

In Haskell sum types a+ b are conventionally dened via datatype Either a b with introductionforms Le and Right and elimination form either which will be our primitives

data Either a b = Le a | Right beither (ararr c) rarr (brarr c) rarr Either a brarr ceither f g (Le a) = f aeither f g (Right b) = g b

We can dene the following change structure

data EitherChange a b= LeC (∆a)| RightC (∆b)| EitherReplace (Either a b)

instance (ChangeStruct aChangeStruct b) rArrChangeStruct (Either a b) where

type ∆(Either a b) = EitherChange a bLe a oplus LeC da = Le (a oplus da)Right b oplus RightC db = Right (b oplus db)Le _ oplus RightC _ = error Invalid changeRight _ oplus LeC _ = error Invalid change_ oplus EitherReplace e2 = e2oreplace = EitherReplace0Le a = LeC 0a0Right a = RightC 0a

Chapter 11 A tour of dierentiation examples 77

Changes to a sum value can either keep the same ldquobranchrdquo (left or right) and modify the containedvalue or replace the sum value with another one altogether Specically change LeC da is validfrom Le a1 to Le a2 if da is valid from a1 to a2 Similarly change RightC db is valid from Right b1to Right b2 if db is valid from b1 to b2 Finally replacement change EitherReplace e2 is valid from e1to e2 for any e1

Using Eq (111) we can then obtain denitions for derivatives of primitives Le Right andeither The resulting code is as follows

dLe ararr ∆ararr ∆(Either a b)dLe a da = LeC dadRight brarr ∆brarr ∆(Either a b)dRight b db = RightC dbdeither (NilChangeStruct aNilChangeStruct bDiChangeStruct c) rArr(ararr c) rarr ∆(ararr c) rarr (brarr c) rarr ∆(brarr c) rarrEither a brarr ∆(Either a b) rarr ∆c

deither f df g dg (Le a) (LeC da) = df a dadeither f df g dg (Right b) (RightC db) = dg b dbdeither f df g dg e1 (EitherReplace e2) =either (f oplus df ) (g oplus dg) e2 either f g e1

deither _ _ _ _ _ _ = error Invalid sum change

We show only one case of the derivation of deither as an example

deither f df g dg (Le a) (LeC da)=∆ using variants of Eq (111) for multiple arguments either (f oplus df ) (g oplus dg) (Le a oplus LeC da) either f g (Le a)= simplify oplus either (f oplus df ) (g oplus dg) (Le (a oplus da)) either f g (Le a)= simplify either (f oplus df ) (a oplus da) f a=∆ because df is a valid change for f and da for a df a da

Unfortunately with this change structure a change from Le a1 to Right b2 is simply a replace-ment change so derivatives processing it must recompute results from scratch In general wecannot do better since there need be no shared data between two branches of a datatype We needto nd specialized scenarios where better implementations are possible

Extensions changes for ADTs and future work In some cases we consider changes for typeEither a b where a and b both contain some other type c Take again lists a change from list asto list Cons a2 as should simply say that we prepend a2 to the list In a sense we are just using achange structure from type List a to (a List a) More in general if change das from as1 to as2 issmall a change from list as1 to list Cons a2 as2 should simply ldquosayrdquo that we prepend a2 and that wemodify as1 into as2 and similarly for removals

In Sec 184 we suggest how to construct such change structures based on the concept ofpolymorphic change structure where changes have source and destination of dierent types Basedon initial experiments we believe one could develop these constructions into a powerful combinatorlanguage for change structures In particular it should be possible to build change structures for lists

78 Chapter 11 A tour of dierentiation examples

similar to the ones in Sec 1132 Generalizing beyond lists similar systematic constructions shouldbe able to represent insertions and removals of one-hole contexts [McBride 2001] for arbitraryalgebraic datatypes (ADTs) for ADTs representing balanced data structures such changes couldenable ecient incrementalization in many scenarios However many questions remain open sowe leave this eort for future work

1161 Optimizing lter

In Sec 113 we have dened lter using a conditional and now we have just explained that ingeneral conditionals are inecient This seems pretty unfortunate But luckily we can optimizelter using equational reasoning

Consider again the earlier denition of lter

lter (ararr Bool) rarr List ararr List alter p = concatMap (λx rarr if p x then singleton x else Nil)

As explained we can encode conditionals using either and dierentiate the resulting programHowever if p x changes from True to False or viceversa (that is for all elements x for which dp x dxis a non-nil change) we must compute at runtime However will compare empty list Nil witha singleton list produced by singleton x (in one direction or the other) We can have detect thissituation at runtime But since the implementation of lter is known statically we can optimizethis code at runtime rewriting singleton x Nil to Insert x and Nil singleton x to Remove Toenable this optimization in dlter we need to inline the function that lter passes as argument toconcatMap and all the functions it calls except p Moreover we need to case-split on possible returnvalues for p x and dp x dx We omit the steps because they are both tedious and standard

It appears in principle possible to automate such transformations by adding domain-specicknowledge to a suciently smart compiler though we have made no attempt at an actual implemen-tation It would be rst necessary to investigate further classes of examples where optimizations areapplicable Suciently smart compilers are rare but since our approach produces purely functionalprograms we have access to GHC and HERMIT [Farmer et al 2012] An interesting alternative(which does have some support for side eects) is LMS [Rompf and Odersky 2010] and Delite [Brownet al 2011] We leave further investigation for future work

117 Chapter conclusionIn this chapter we have toured what can and cannot be incrementalized using dierentiation andhow using higher-order functions allows dening generic primitives to incrementalize

Chapter 12

Changes and dierentiationformally

To support incrementalization in this chapter we introduce dierentiation and formally proveit correct That is we prove that nD n t o o produces derivatives As we explain in Sec 1212derivatives transform valid input changes into valid output changes (Slogan 1033) Hence wedene what are valid changes (Sec 121) As wersquoll explain in Sec 1214 validity is a logical relationAs we explain in Sec 1222 our correctness theorem is the fundamental property for the validitylogical relation proved by induction over the structure of terms Crucial denitions or derived factsare summarized in Fig 121 Later in Chapter 13 we study consequences of correctness and changeoperations

All denitions and proofs in this and next chapter is mechanized in Agda except where otherwiseindicated To this writer given these denitions all proofs have become straightforward andunsurprising Nevertheless rst obtaining these proofs took a while So we typically include fullproofs We also believe these proofs clarify the meaning and consequences of our denitions Tomake proofs as accessible as possible we try to provide enough detail that our target readers canfollow along without pencil and paper at the expense of making our proofs look longer than theywould usually be As we target readers procient with STLC (but not necessarily procient withlogical relations) wersquoll still omit routine steps needed to reason on STLC such as typing derivationsor binding issues

121 Changes and validity

In this section we introduce formally (a) a description of changes (b) a denition of which changesare valid We have already introduced informally in Chapter 10 these notions and how they ttogether We next dene the same notions formally and deduce their key properties Languageplugins extend these denitions for base types and constants that they provide

To formalize the notion of changes for elements of a set V we dene the notion of basic changestructure on V

Denition 1211 (Basic change structures)A basic change structure on set V written V comprises

(a) a change set ∆V

79

80 Chapter 12 Changes and dierentiation formally

∆τ

∆ι =

∆(σ rarr τ ) = σ rarr ∆σ rarr ∆τ

(a) Change types (Denition 12117)

∆Γ

∆ε = ε

∆ (Γx τ ) = ∆Γx τ dx ∆τ

(b) Change contexts (Denition 12123)

D n t o

D n λ(x σ ) rarr t o = λ(x σ ) (dx ∆σ ) rarr D n t oD n s t o = D n s o t D n t o

D n x o = dx

D n c o = DC n c o

(c) Dierentiation (Denition 1051)

Γ ` t τ∆Γ ` D n t o ∆τ

Derive

(d) Dierentiation typing (Lemma 1228)

dv B v1 rarr v2 τ with v1 v2 nτ o dv n∆τ o

dv B v1 rarr v2 ι = df B f1 rarr f2 σ rarr τ = forallda B a1 rarr a2 σ df a1 da B f1 a1 rarr f2 a2 τ

dρ B ρ1 rarr ρ2 Γ with ρ1 ρ2 n Γ o dρ n∆Γ o

B rarr εdρ B ρ1 rarr ρ2 Γ da B a1 rarr a2 τ

(dρx = a1dx = da) B (ρ1x = a1) rarr (ρ2x = a2) Γx τ

(e) Validity (Denitions 12121 and 12124)

If Γ ` t τ then

foralldρ B ρ1 rarr ρ2 Γ nD n t o o dρ B n t o ρ1 rarr n t o ρ2 τ

(f) Correctness of D n ndash o (from Theorem 1222)

Figure 121 Dening dierentiation and proving it correct The rest of this chapter explains andmotivates the above denitions

Chapter 12 Changes and dierentiation formally 81

(b) a ternary validity relation dv B v1 rarr v2 V for v1 v2 isin V and dv isin ∆V that we read as ldquodvis a valid change from source v1 to destination v2 (on set V )rdquo

Example 1212In Examples 1031 and 1032 we exemplied informally change types and validity on naturalsintegers and bags We dene formally basic change structures on naturals and integers Comparedto validity for integers validity for naturals ensures that the destination v1 + dv is again a naturalFor instance given source v1 = 1 change dv = minus2 is valid (with destination v2 = minus1) only onintegers not on naturals

Denition 1213 (Basic change structure on integers)Basic change structure Z on integers has integers as changes (∆Z = Z) and the following validityjudgment

dv B v1 rarr v1 + dv Z

Denition 1214 (Basic change structure on naturals)Basic change structure N on naturals has integers as changes (∆N = Z) and the following validityjudgment

v1 + dv gt 0dv B v1 rarr v1 + dv N

Intuitively we can think of a valid change from v1 to v2 as a graph edge from v1 to v2 so wersquolloften use graph terminology when discussing changes This intuition is robust and can be madefully precise1 More specically a basic change structure on V can be seen as a directed multigraphhaving as vertexes the elements of V and as edges from v1 to v2 the valid changes dv from v1 to v2This is a multigraph because our denition allows multiple edges between v1 and v2

A change dv can be valid from v1 to v2 and from v3 to v4 but wersquoll still want to talk about thesource and the destination of a change When we talk about a change dv valid from v1 to v2 valuev1 is dvrsquos source and v2 is dvrsquos destination Hence wersquoll systematically quantify theorems over validchanges dv with their sources v1 and destination v2 using the following notation2

Notation 1215 (Quantication over valid changes)We write

foralldv B v1 rarr v2 V P

and say ldquofor all (valid) changes dv from v1 to v2 on set V we have Prdquo as a shortcut for

forallv1 v2 isin V dv isin ∆V if dv B v1 rarr v2 V then P

Since we focus on valid changes wersquoll omit the word ldquovalidrdquo when clear from context Inparticular a change from v1 to v2 is necessarily valid

1See for instance Robert Atkeyrsquos blog post [Atkey 2015] or Yufei Cairsquos PhD thesis [Cai 2017]2If you prefer you can tag a change with its source and destination by using a triple and regard the whole triple

(v1 dv v2) as a change Mathematically this gives the correct results but wersquoll typically not use such triples as changes inprograms for performance reasons

82 Chapter 12 Changes and dierentiation formally

We can have multiple basic change structures on the same set

Example 1216 (Replacement changes)For instance for any set V we can talk about replacement changes on V a replacement changedv = v2 for a value v1 V simply species directly a new value v2 V so that v2 B v1 rarr v2 V We read as the ldquobangrdquo operator

A basic change structure can decide to use only replacement changes (which can be appropriatefor primitive types with values of constant size) or to make∆V a sum type allowing both replacementchanges and other ways to describe a change (as long as wersquore using a language plugin that addssum types)

Nil changes Just like integers have a null element 0 among changes there can be nil changes

Denition 1217 (Nil changes)We say that dv ∆V is a nil change for v V if dv B v rarr v V

For instance 0 is a nil change for any integer number n However in general a change might benil for an element but not for another For instance the replacement change 6 is a nil change on 6but not on 5

Wersquoll say more on nil changes in Sec 1212 and 1321

1211 Function spaces

Next we dene a basic change structure that we call Ararr B for an arbitrary function space Ararr Bassuming we have basic change structures for both A and B We claimed earlier that valid functionchanges map valid input changes to valid output changes We make this claim formal through nextdenitionDenition 1218 (Basic change structure on Ararr B)Given basic change structures on A and B we dene a basic change structure on Ararr B that wewrite Ararr B as follows

(a) Change set ∆(Ararr B) is Ararr ∆Ararr ∆B

(b) Function change df is valid from f1 to f2 (that is df B f1 rarr f2 Ararr B) if and only if for allvalid input changes da B a1 rarr a2 A value df a1 da is a valid output change from f1 a1 tof2 a2 (that is df a1 da B f1 a1 rarr f2 a2 B)

Notation 1219 (Applying function changes to changes)When reading out df a1 da wersquoll often talk for brevity about applying df to da leaving darsquos sourcea1 implicit when it can be deduced from context

Wersquoll also consider valid changes df for curried n-ary functions We show what their validitymeans for curried binary functions f Ararr Brarr C We omit similar statements for higher aritiesas they add no new ideas

Lemma 12110 (Validity on Ararr Brarr C)For any basic change structures A B and C function change df ∆(A rarr B rarr C) is valid fromf1 to f2 (that is df B f1 rarr f2 A rarr B rarr C) if and only if applying df to valid input changesda B a1 rarr a2 A and db B b1 rarr b2 B gives a valid output change

df a1 da b1 db B f a1 b1 rarr f a2 b2 C

Chapter 12 Changes and dierentiation formally 83

Proof The equivalence follows from applying the denition of function validity of df twiceThat is function change df is valid (df B f1 rarr f2 Ararr (Brarr C)) if and only if it maps valid

input change da B a1 rarr a2 A to valid output change

df a1 da B f1 a1 rarr f2 a2 Brarr C

In turn df a1 da is a function change which is valid if and only if it maps valid input changedb B b1 rarr b2 B to

df a1 da b1 db B f a1 b1 rarr f a2 b2 C

as required by the lemma

1212 DerivativesAmong valid function changes derivatives play a central role especially in the statement ofcorrectness of dierentiationDenition 12111 (Derivatives)Given function f Ararr B function df Ararr ∆Ararr ∆B is a derivative for f if for all changes dafrom a1 to a2 on set A change df a1 da is valid from f a1 to f a2

However it follows that derivatives are nil function changes

Lemma 12112 (Derivatives as nil function changes)Given function f A rarr B function df ∆(A rarr B) is a derivative of f if and only if df is a nilchange of f (df B f rarr f Ararr B)

Proof First we show that nil changes are derivatives First a nil change df B f rarr f Ararr B hasthe right type to be a derivative because A rarr ∆A rarr ∆B = ∆(A rarr B) Since df is a nil changefrom f to f by denition it maps valid input changes da B a1 rarr a2 A to valid output changesdf a1 da B f a1 rarr f a2 B Hence df is a derivative as required

In fact all proof steps are equivalences and by tracing them backward we can show thatderivatives are nil changes Since a derivative df maps valid input changes da B a1 rarr a2 A tovalid output changes df a1 da B f a1 rarr f a2 B df is a change from f to f as required

Applying derivatives to nil changes gives again nil changes This fact is useful when reasoningon derivatives The proof is a useful exercise on using validity

Lemma 12113 (Derivatives preserve nil changes)For any basic change structures A and B function change df ∆(Ararr B) is a derivative of f Ararr B(df B f rarr f Ararr B) if and only if applying df to an arbitrary input nil change da B a rarr a Agives a nil change

df a da B f a rarr f a B

Proof Just rewrite the denition of derivatives (Lemma 12112) using the denition of validity ofdf

In detail by denition of validity for function changes (Denition 1218) df B f1 rarr f2 Ararr Bmeans that from da B a1 rarr a2 A follows df a1 da B f1 a1 rarr f2 a2 B Just substitute f1 = f2 = fand a1 = a2 = a to get the required equivalence

Also derivatives of curried n-ary functions f preserve nil changes We only state this formallyfor curried binary functions f Ararr Brarr C higher arities require no new ideas

84 Chapter 12 Changes and dierentiation formally

Lemma 12114 (Derivatives preserve nil changes on Ararr Brarr C)For any basic change structures A B and C Change df ∆(A rarr B rarr C) is a derivative off Ararr Brarr C if and only if applying df to nil changes da B a rarr a A and db B b rarr b B givesa nil change

df a da b db B f a b rarr f a b C

Proof Similarly to validity on Ararr Brarr C (Lemma 12110) the thesis follows by applying twicethe fact that derivatives preserve nil changes (Lemma 12113)

In detail since derivatives preserve nil changes df is a derivative if and only if for all da Ba rarr a A we have df a da B f a rarr f a B rarr C But then df a da is a nil change that is aderivative and since it preserves nil changes df is a derivative if and only if for all da B a rarr a Aand db B b rarr b B we have df a da b db B f a b rarr f a b C

1213 Basic change structures on typesAfter studying basic change structures in the abstract we apply them to the study of our objectlanguage

For each type τ we can dene a basic change structure on domain nτ o which we write τ Language plugins must provide basic change structures for base types To provide basic changestructures for function types σ rarr τ we use the earlier construction for basic change structuresσ rarr τ on function spaces nσ rarr τ o (Denition 1218)Denition 12115 (Basic change structures on types)For each type τ we associate a basic change structure on domain nτ o written τ through thefollowing equations

ι = σ rarr τ = σ rarr τ

Plugin Requirement 12116 (Basic change structures on base types)For each base type ι the plugin denes a basic change structure on n ι o that we write ι

Crucially for each type τ we can dene a type ∆τ of changes that we call change type suchthat the change set ∆ nτ o is just the domain n∆τ o associated to change type ∆τ ∆ nτ o = n∆τ oThis equation allows writing change terms that evaluate directly to change values3

Denition 12117 (Change types)The change type ∆τ of a type τ is dened as follows

∆ι =

∆(σ rarr τ ) = σ rarr ∆σ rarr ∆τ

Lemma 12118 (∆ and n ndasho commute on types)For each type τ change set ∆ nτ o equals the domain of change type n∆τ o

Proof By induction on types For the case of function types we simply prove equationally that∆ nσ rarr τ o = ∆(nσ o rarr nτ o) = nσ o rarr ∆ nσ o rarr ∆ nτ o = nσ o rarr n∆σ o rarr n∆τ o =nσ rarr ∆σ rarr ∆τ o = n∆(σ rarr τ ) o The case for base types is delegates to plugins (Plugin Require-ment 12119)

3Instead in earlier proofs [Cai et al 2014] the values of change terms were not change values but had to be related tochange values through a logical relation see Sec 154

Chapter 12 Changes and dierentiation formally 85

Plugin Requirement 12119 (Base change types)For each base type ι the plugin denes a change type ∆ι such that ∆ n ι o = n∆ι o

We refer to values of change types as change values or just changes

Notation 12120We write basic change structures for types τ not nτ o and dv B v1 rarr v2 τ not dv B v1 rarr v2 nτ o We also write consistently n∆τ o not ∆ nτ o

1214 Validity as a logical relation

Next we show an equivalent denition of validity for values of terms directly by induction ontypes as a ternary logical relation between a change its source and destination A typical logicalrelation constrains functions to map related input to related outputs In a twist validity constrainsfunction changes to map related inputs to related outputs

Denition 12121 (Change validity)We say that dv is a (valid) change from v1 to v2 (on type τ ) and write dv B v1 rarr v2 τ if dv n∆τ ov1 v2 nτ o and dv is a ldquovalidrdquo description of the dierence from v1 to v2 as we dene in Fig 121e

The key equations for function types are

∆(σ rarr τ ) = σ rarr ∆σ rarr ∆τ

df B f1 rarr f2 σ rarr τ = forallda B a1 rarr a2 σ df a1 da B f1 a1 rarr f2 a2 τ

Remark 12122We have kept repeating the idea that valid function changes map valid input changes to valid outputchanges As seen in Sec 104 and Lemmas 12110 and 12114 such valid outputs can in turn bevalid function changes Wersquoll see the same idea at work in Lemma 12127 in the correctness proofof D n ndash o

As we have nally seen in this section this denition of validity can be formalized as a logicalrelation dened by induction on types Wersquoll later take for granted the consequences of validitytogether with lemmas such as Lemma 12110

1215 Change structures on typing contexts

To describe changes to the inputs of a term we now also introduce change contexts ∆Γ environmentchanges dρ n∆Γ o and validity for environment changes dρ B ρ1 rarr ρ2 Γ

A valid environment change from ρ1 n Γ o to ρ2 n Γ o is an environment dρ n∆Γ o thatextends environment ρ1 with valid changes for each entry We rst dene the shape of environmentchanges through change contexts

Denition 12123 (Change contexts)For each context Γ we dene change context ∆Γ as follows

∆ε = ε

∆ (Γx τ ) = ∆Γx τ dx ∆τ

Then we describe validity of environment changes via a judgment

86 Chapter 12 Changes and dierentiation formally

Denition 12124 (Environment change validity)We dene validity for environment changes through judgment dρ B ρ1 rarr ρ2 Γ pronounced ldquodρis an environment change from ρ1 to ρ2 (at context Γ)rdquo where ρ1 ρ2 n Γ o dρ n∆Γ o via thefollowing inference rules

B rarr εdρ B ρ1 rarr ρ2 Γ da B a1 rarr a2 τ

(dρx = a1dx = da) B (ρ1x = a1) rarr (ρ2x = a2) Γx τ

Denition 12125 (Basic change structures for contexts)To each context Γ we associate a basic change structure on set n Γ o We take n∆Γ o as change setand reuse validity on environment changes (Denition 12124)

Notation 12126We write dρ B ρ1 rarr ρ2 Γ rather than dρ B ρ1 rarr ρ2 n Γ o

Finally to state and prove correctness of dierentiation we are going to need to discuss functionchanges on term semantics The semantics of a term Γ ` t τ is a function n t o from environmentsin n Γ o to values in nτ o To discuss changes to n t o we need a basic change structure on functionspace n Γ orarr nτ o

Lemma 12127The construction of basic change structures on function spaces (Denition 1218) associates to eachcontext Γ and type τ a basic change structure Γ rarr τ on function space n Γ orarr nτ o

Notation 12128As usual we write the change set as ∆(n Γ o rarr nτ o) for validity we write df B f1 rarr f2 Γτrather than df B f1 rarr f2 n Γ orarr nτ o

122 Correctness of dierentiationIn this section we state and prove correctness of dierentiation a term-to-term transformationwritten D n t o that produces incremental programs We recall that all our results apply only towell-typed terms (since we formalize no other ones)

Earlier we described how D n ndash o behaves through Slogan 1033 mdash here is it again for reference

Slogan 1033Term D n t o maps input changes to output changes That is D n t o applied to initial base inputs andvalid input changes (from initial inputs to updated inputs) gives a valid output change from t appliedon old inputs to t applied on new inputs

In our slogan we do not specify what we meant by inputs though we gave examples duringthe discussion We have now the notions needed for a more precise statement Term D n t o mustsatisfy our slogan for two sorts of inputs

1 Evaluating D n t o must map an environment change dρ from ρ1 to ρ2 into a valid resultchange nD n t o o dρ going from n t o ρ1 to n t o ρ2

2 As we learned since stating our slogan validity is dened by recursion over types If term t hastype σ rarr τ change nD n t o o dρ can in turn be a (valid) function change (Remark 12122)Function changes map valid changes for their inputs to valid changes for their outputs

Chapter 12 Changes and dierentiation formally 87

Instead of saying that nD n t o o maps dρ B ρ1 rarr ρ2 Γ to a change from n t o ρ1 to n t o ρ2we can say that function n t o∆ = λρ dρ rarr nD n t o o dρ must be a nil change for n t o that is aderivative for n t o We give a name to this function change and state D n ndash orsquos correctness theorem

Denition 1221 (Incremental semantics)We dene the incremental semantics of a well-typed term Γ ` t τ in terms of dierentiation as

n t o∆ = (λρ1 dρ rarr nD n t o o dρ) n Γ orarr n∆Γ orarr n∆τ o

Theorem 1222 (D n ndasho is correct)Function n t o∆ is a derivative of n t o That is if Γ ` t τ and dρ B ρ1 rarr ρ2 Γ then nD n t o o dρ Bn t o ρ1 rarr n t o ρ2 τ

For now we discuss this statement further we defer the proof to Sec 1222

Remark 1223 (Why n ndasho∆ ignores ρ1)You might wonder why n t o∆ = λρ1 dρ rarr nD n t o o dρ appears to ignore ρ1 But for alldρ B ρ1 rarr ρ2 Γ change environment dρ extends ρ1 which hence provides no further informationWe are only interested in applying n t o∆ to valid environment changes dρ so n t o∆ ρ1 dρ cansafely ignore ρ1

Remark 1224 (Term derivatives)In Chapter 10 we suggested that D n t o only produced a derivative for closed terms not for openones But n t o∆ = λρ dρ rarr nD n t o o dρ is always a nil change and derivative of n t o for anyΓ ` t τ There is no contradiction because the value of D n t o is nD n t o o dρ which is only anil change if dρ is a nil change as well In particular for closed terms (Γ = ε) dρ must equal theempty environment hence a nil change If τ is a function type df = nD n t o o dρ accepts furtherinputs since df must be a valid function change it will also map them to valid outputs as requiredby our Slogan 1033 Finally if Γ = ε and τ is a function type then df = nD n t o o is a derivativeof f = n t o

We summarize this remark with the following denition and corollary

Denition 1225 (Derivatives of terms)For all closed terms of function type ` t σ rarr τ we call D n t o the (term) derivative of t

Corollary 1226 (Term derivatives evaluate to derivatives)For all closed terms of function type ` t σ rarr τ function nD n t o o is a derivative of n t o

Proof. Because ⟦t⟧∆ is a derivative (Theorem 12.2.2), and because applying derivative ⟦t⟧∆ to a nil change gives a derivative (Lemma 12.1.13). □

Remark 12.2.7
We typically talk about a derivative of a function value f : A → B, not the derivative, since multiple different functions can satisfy the specification of derivatives. We talk about the derivative to refer to a canonically chosen derivative. For terms and their semantics, the canonical derivative is the one produced by differentiation. For language primitives, the canonical derivative is the one chosen by the language plugin under consideration.

Theorem 12.2.2 only makes sense if D⟦–⟧ has the right static semantics:


Lemma 12.2.8 (Typing of D⟦–⟧)
Typing rule

      Γ ⊢ t : τ
  ─────────────────  [Derive]
  ∆Γ ⊢ D⟦t⟧ : ∆τ

is derivable.

After we define ⊕ in the next chapter, we will be able to relate ⊕ to validity by proving Lemma 13.4.4, which we state in advance here.

Lemma 13.4.4 (⊕ agrees with validity)
If dv ⊳ v1 → v2 : τ, then v1 ⊕ dv = v2.

Hence, updating base result ⟦t⟧ ρ1 by change ⟦D⟦t⟧⟧ dρ via ⊕ gives the updated result ⟦t⟧ ρ2.

Corollary 13.4.5 (D⟦–⟧ is correct, corollary)
If Γ ⊢ t : τ and dρ ⊳ ρ1 → ρ2 : Γ, then ⟦t⟧ ρ1 ⊕ ⟦D⟦t⟧⟧ dρ = ⟦t⟧ ρ2.

We anticipate the proof of this corollary:

Proof. First, differentiation is correct (Theorem 12.2.2), so under the hypotheses

⟦D⟦t⟧⟧ dρ ⊳ ⟦t⟧ ρ1 → ⟦t⟧ ρ2 : τ;

that judgement implies the thesis

⟦t⟧ ρ1 ⊕ ⟦D⟦t⟧⟧ dρ = ⟦t⟧ ρ2

because ⊕ agrees with validity (Lemma 13.4.4). □

12.2.1 Plugin requirements

Differentiation is extended by plugins on constants, so plugins must prove their extensions correct.

Plugin Requirement 12.2.9 (Typing of D_C⟦–⟧)
For all ⊢_C c : τ, the plugin defines D_C⟦c⟧ satisfying ⊢ D_C⟦c⟧ : ∆τ.

Plugin Requirement 12.2.10 (Correctness of D_C⟦–⟧)
For all ⊢_C c : τ,

D_C⟦c⟧ is a derivative for ⟦c⟧.

Since constants are typed in the empty context, and the only change for an empty environment is an empty environment, Plugin Requirement 12.2.10 means that for all ⊢_C c : τ we have

D_C⟦c⟧ ⊳ ⟦c⟧ → ⟦c⟧ : τ.

12.2.2 Correctness proof

We next recall D⟦–⟧'s definition and prove that it satisfies its correctness statement, Theorem 12.2.2.


Definition 10.5.1 (Differentiation)
Differentiation is the following term transformation:

D⟦λ(x : σ) → t⟧ = λ(x : σ) (dx : ∆σ) → D⟦t⟧
D⟦s t⟧ = D⟦s⟧ t D⟦t⟧
D⟦x⟧ = dx
D⟦c⟧ = D_C⟦c⟧

where D_C⟦c⟧ defines differentiation on primitives and is provided by language plugins (see Appendix A.2.5), and dx stands for a variable generated by prefixing x's name with d, so that D⟦y⟧ = dy and so on.
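As a concrete illustration, here is a minimal executable sketch of this transformation in Haskell, assuming a hypothetical Term datatype for the object language (types omitted); the names Term, derive and deriveConst are illustrative, not part of the formal development, and the naive d-prefixing used here is capture-prone, as discussed in Sec. 12.3.4.

-- Object-language ASTs: an untyped rendering of the calculus.
data Term
  = Var String          -- x
  | Lam String Term     -- λx → t (type annotations omitted)
  | App Term Term       -- s t
  | Const String        -- primitive c
  deriving Show

-- Stand-in for the plugin-provided D_C⟦–⟧ on primitives.
deriveConst :: String -> Term
deriveConst c = Const ("d" ++ c)

-- D⟦–⟧, mirroring the four equations above.
derive :: Term -> Term
derive (Lam x t) = Lam x (Lam ("d" ++ x) (derive t))
derive (App s t) = App (App (derive s) t) (derive t)
derive (Var x)   = Var ("d" ++ x)
derive (Const c) = deriveConst c

main :: IO ()
main = print (derive (Lam "x" (App (Const "f") (Var "x"))))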

Before proving correctness, we prove Lemma 12.2.8.

Lemma 12.2.8 (Typing of D⟦–⟧)
Typing rule

      Γ ⊢ t : τ
  ─────────────────  [Derive]
  ∆Γ ⊢ D⟦t⟧ : ∆τ

is derivable.

Proof. The thesis can be proven by induction on the typing derivation Γ ⊢ t : τ. The case for constants is delegated to plugins in Plugin Requirement 12.2.9. □

We prove Theorem 12.2.2 using a typical logical relations strategy. We proceed by induction on term t, and prove for each case that, if D⟦–⟧ preserves validity on the subterms of t, then also D⟦t⟧ preserves validity. Hence, if the input environment change dρ is valid, then the result of differentiation evaluates to valid change ⟦D⟦t⟧⟧ dρ.

Readers familiar with logical relations proofs should be able to reproduce this proof on their own, as it is rather standard once one uses the given definitions. In particular, this proof closely resembles the proof of the abstraction theorem or relational parametricity (as given by Wadler [1989, Sec. 6] or by Bernardy and Lasson [2011, Sec. 3.3, Theorem 3]) and the proof of the fundamental theorem of logical relations by Statman [1985].

Nevertheless, we spell this proof out, and use it to motivate how D⟦–⟧ is defined, more formally than we did in Sec. 10.5. For each case, we first give a short proof sketch, and then redo the proof in more detail to make the proof easier to follow.

Theorem 12.2.2 (D⟦–⟧ is correct)
Function ⟦t⟧∆ is a derivative of ⟦t⟧. That is, if Γ ⊢ t : τ and dρ ⊳ ρ1 → ρ2 : Γ, then ⟦D⟦t⟧⟧ dρ ⊳ ⟦t⟧ ρ1 → ⟦t⟧ ρ2 : τ.

Proof. By induction on typing derivation Γ ⊢ t : τ.

• Case Γ ⊢ x : τ. The thesis is that ⟦D⟦x⟧⟧ is a derivative for ⟦x⟧, that is, ⟦D⟦x⟧⟧ dρ ⊳ ⟦x⟧ ρ1 → ⟦x⟧ ρ2 : τ. Since dρ is a valid environment change from ρ1 to ρ2, ⟦dx⟧ dρ is a valid change from ⟦x⟧ ρ1 to ⟦x⟧ ρ2. Hence, defining D⟦x⟧ = dx satisfies our thesis.

• Case Γ ⊢ s t : τ. The thesis is that ⟦D⟦s t⟧⟧ is a derivative for ⟦s t⟧, that is, ⟦D⟦s t⟧⟧ dρ ⊳ ⟦s t⟧ ρ1 → ⟦s t⟧ ρ2 : τ. By inversion of typing, there is some type σ such that Γ ⊢ s : σ → τ and Γ ⊢ t : σ.


To prove the thesis, in short, you can apply the inductive hypothesis to s and t, obtaining respectively that ⟦D⟦s⟧⟧ and ⟦D⟦t⟧⟧ are derivatives for ⟦s⟧ and ⟦t⟧. In particular, ⟦D⟦s⟧⟧ evaluates to a validity-preserving function change. Term D⟦s t⟧, that is D⟦s⟧ t D⟦t⟧, applies validity-preserving function D⟦s⟧ to valid input change D⟦t⟧, and this produces a valid change for s t, as required.

In detail, our thesis is that for all dρ ⊳ ρ1 → ρ2 : Γ we have

⟦D⟦s t⟧⟧ dρ ⊳ ⟦s t⟧ ρ1 → ⟦s t⟧ ρ2 : τ,

where ⟦s t⟧ ρ = (⟦s⟧ ρ) (⟦t⟧ ρ) and

  ⟦D⟦s t⟧⟧ dρ
= ⟦D⟦s⟧ t D⟦t⟧⟧ dρ
= (⟦D⟦s⟧⟧ dρ) (⟦t⟧ dρ) (⟦D⟦t⟧⟧ dρ)
= (⟦D⟦s⟧⟧ dρ) (⟦t⟧ ρ1) (⟦D⟦t⟧⟧ dρ).

The last step relies on ⟦t⟧ dρ = ⟦t⟧ ρ1. Since weakening preserves meaning (Lemma A.2.8), this follows because dρ : ⟦∆Γ⟧ extends ρ1 : ⟦Γ⟧ and t can be typed in context Γ. Our thesis becomes

⟦D⟦s⟧⟧ dρ (⟦t⟧ ρ1) (⟦D⟦t⟧⟧ dρ) ⊳ ⟦s⟧ ρ1 (⟦t⟧ ρ1) → ⟦s⟧ ρ2 (⟦t⟧ ρ2) : τ.

By the inductive hypothesis on s and t we have

⟦D⟦s⟧⟧ dρ ⊳ ⟦s⟧ ρ1 → ⟦s⟧ ρ2 : σ → τ
⟦D⟦t⟧⟧ dρ ⊳ ⟦t⟧ ρ1 → ⟦t⟧ ρ2 : σ.

Since ⟦s⟧ is a function, its validity means

∀da ⊳ a1 → a2 : σ. ⟦D⟦s⟧⟧ dρ a1 da ⊳ ⟦s⟧ ρ1 a1 → ⟦s⟧ ρ2 a2 : τ.

Instantiating, in this statement, the hypothesis da ⊳ a1 → a2 : σ by ⟦D⟦t⟧⟧ dρ ⊳ ⟦t⟧ ρ1 → ⟦t⟧ ρ2 : σ gives the thesis.

• Case Γ ⊢ λx → t : σ → τ. By inversion of typing, Γ, x : σ ⊢ t : τ. By typing of D⟦–⟧ you can show that

∆Γ, x : σ, dx : ∆σ ⊢ D⟦t⟧ : ∆τ.

In short, our thesis is that ⟦λx → t⟧∆ = λρ1 dρ → ⟦λx dx → D⟦t⟧⟧ dρ is a derivative of ⟦λx → t⟧. After a few simplifications, our thesis reduces to

⟦D⟦t⟧⟧ (dρ, x = a1, dx = da) ⊳ ⟦t⟧ (ρ1, x = a1) → ⟦t⟧ (ρ2, x = a2) : τ

for all dρ ⊳ ρ1 → ρ2 : Γ and da ⊳ a1 → a2 : σ. But then the thesis is simply that ⟦t⟧∆ is the derivative of ⟦t⟧, which is true by inductive hypothesis.

More in detail, our thesis is that ⟦λx → t⟧∆ is a derivative for ⟦λx → t⟧, that is,

∀dρ ⊳ ρ1 → ρ2 : Γ.
⟦D⟦λx → t⟧⟧ dρ ⊳ ⟦λx → t⟧ ρ1 → ⟦λx → t⟧ ρ2 : σ → τ.    (12.1)


By simplifying, the thesis Eq. (12.1) becomes

∀dρ ⊳ ρ1 → ρ2 : Γ.
(λa1 da → ⟦D⟦t⟧⟧ (dρ, x = a1, dx = da)) ⊳
  (λa1 → ⟦t⟧ (ρ1, x = a1)) → (λa2 → ⟦t⟧ (ρ2, x = a2)) : σ → τ.    (12.2)

By definition of validity at function type, the thesis Eq. (12.2) becomes

∀dρ ⊳ ρ1 → ρ2 : Γ. ∀da ⊳ a1 → a2 : σ.
⟦D⟦t⟧⟧ (dρ, x = a1, dx = da) ⊳ ⟦t⟧ (ρ1, x = a1) → ⟦t⟧ (ρ2, x = a2) : τ.    (12.3)

To prove the rewritten thesis Eq. (12.3), take the inductive hypothesis on t: it says that ⟦D⟦t⟧⟧ is a derivative for ⟦t⟧, so ⟦D⟦t⟧⟧ maps valid environment changes on Γ, x : σ to valid changes on τ. But by inversion of the validity judgment, all valid environment changes on Γ, x : σ can be written as

(dρ, x = a1, dx = da) ⊳ (ρ1, x = a1) → (ρ2, x = a2) : Γ, x : σ

for valid changes dρ ⊳ ρ1 → ρ2 : Γ and da ⊳ a1 → a2 : σ. So the inductive hypothesis is that

∀dρ ⊳ ρ1 → ρ2 : Γ. ∀da ⊳ a1 → a2 : σ.
⟦D⟦t⟧⟧ (dρ, x = a1, dx = da) ⊳ ⟦t⟧ (ρ1, x = a1) → ⟦t⟧ (ρ2, x = a2) : τ.    (12.4)

But that is exactly our thesis Eq. (12.3), so we're done.

• Case Γ ⊢ c : τ. In essence, since weakening preserves meaning, we can rewrite the thesis to match Plugin Requirement 12.2.10.

In more detail, the thesis is that D_C⟦c⟧ is a derivative for c, that is, if dρ ⊳ ρ1 → ρ2 : Γ then ⟦D⟦c⟧⟧ dρ ⊳ ⟦c⟧ ρ1 → ⟦c⟧ ρ2 : τ. Since constants don't depend on the environment and weakening preserves meaning (Lemma A.2.8), and by the definitions of ⟦–⟧ and D⟦–⟧ on constants, the thesis can be simplified to

D_C⟦c⟧ ⊳ ⟦c⟧ → ⟦c⟧ : τ,

which is delegated to plugins in Plugin Requirement 12.2.10. □

12.3 Discussion

12.3.1 The correctness statement

We might have asked for the following correctness property:

Theorem 12.3.1 (Incorrect correctness statement)
If Γ ⊢ t : τ and ρ1 ⊕ dρ = ρ2, then (⟦t⟧ ρ1) ⊕ (⟦D⟦t⟧⟧ dρ) = (⟦t⟧ ρ2).

However, this property is not quite right. We can only prove correctness if we restrict the statement to input changes dρ that are valid. Moreover, to prove this statement by induction we need to strengthen its conclusion: we require that the output change ⟦D⟦t⟧⟧ dρ is also valid. To see why, consider term (λx → s) t. Here the output of t is an input of s. Similarly, in D⟦(λx → s) t⟧, the output of D⟦t⟧ becomes an input change for subterm D⟦s⟧, and D⟦s⟧ behaves correctly only if D⟦t⟧ produces a valid change.

Typically, change types contain values that are invalid in some sense, but incremental programs will preserve validity. In particular, valid changes between functions are in turn functions that take valid input changes to valid output changes. This is why we formalize validity as a logical relation.

12.3.2 Invalid input changes

To see concretely why invalid changes, in general, can cause derivatives to produce incorrect results, consider again grandTotal = λxs ys → sum (merge xs ys) from Sec. 10.2. Suppose a bag change dxs removes an element 20 from input bag xs, while dys makes no changes to ys: in this case the output should decrease, so dz = dgrandTotal xs dxs ys dys should be −20. However, that is only correct if 20 is actually an element of xs. Otherwise, xs ⊕ dxs will make no change to xs, hence the correct output change dz would be 0 instead of −20. Similar but trickier issues apply with function changes; see also Sec. 15.2.
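The following self-contained sketch makes this failure concrete, under the assumption that bags are represented as maps from elements to multiplicities and that ⊕ drops non-positive multiplicities; the names Bag, applyChange and dsum are illustrative.

import qualified Data.Map as M

type Bag = M.Map Int Int          -- element ↦ multiplicity
type BagChange = M.Map Int Int    -- element ↦ change in multiplicity

-- A simple ⊕ on bags: add multiplicity deltas, dropping non-positive entries.
applyChange :: Bag -> BagChange -> Bag
applyChange b db = M.filter (> 0) (M.unionWith (+) b db)

sumBag :: Bag -> Int
sumBag = M.foldrWithKey (\x n acc -> x * n + acc) 0

-- A derivative of sumBag: sums the delta, weighting elements by multiplicity change.
dsum :: Bag -> BagChange -> Int
dsum _ = M.foldrWithKey (\x dn acc -> x * dn + acc) 0

main :: IO ()
main = do
  let xs  = M.fromList [(1, 1), (2, 1)]  -- bag {1, 2}
      dxs = M.fromList [(20, -1)]        -- "remove 20": invalid, since 20 ∉ xs
  print (sumBag xs + dsum xs dxs)        -- -17: what the derivative predicts
  print (sumBag (applyChange xs dxs))    -- 3: the actual updated output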

12.3.3 Alternative environment changes

Environment changes can also be defined differently. We will use this alternative definition later (in Appendix D.2.2).

A change dρ from ρ1 to ρ2 contains a copy of ρ1. Thanks to this copy, we can use an environment change as an environment for the result of differentiation: that is, we can evaluate D⟦t⟧ with environment dρ, and Definition 12.2.1 can define ⟦t⟧∆ as λρ1 dρ → ⟦D⟦t⟧⟧ dρ.

But we could adapt definitions to omit the copy of ρ1 from dρ, by setting

∆(Γ, x : τ) = ∆Γ, dx : ∆τ

and adapting other definitions. Evaluating D⟦t⟧ still requires base inputs; we could then set ⟦t⟧∆ = λρ1 dρ → ⟦D⟦t⟧⟧ (ρ1, dρ), where ρ1, dρ simply merges the two environments appropriately (we omit a formal definition). This is the approach taken by Cai et al. [2014]. When proving Theorem 12.2.2, using one or the other definition for environment changes makes little difference; if we embed the base environment in environment changes, we reduce noise, because we need not define environment merging formally.

Later (in Appendix D.2.2) we will deal with environments explicitly and manipulate them in programs. Then we will use this alternative definition for environment changes, since it will be convenient to store base environments separately from environment changes.

12.3.4 Capture avoidance

Differentiation generates new names, so a correct implementation must prevent accidental capture. Until now we have ignored this problem.

Using de Bruijn indexes. Our mechanization has no capture issues because it uses de Bruijn indexes. Change contexts just alternate variables for base inputs and input changes. A context such as Γ = x : ℤ, y : Bool is encoded as Γ = ℤ, Bool; its change context is ∆Γ = ℤ, ∆ℤ, Bool, ∆Bool. This solution is correct and robust, and is the one we rely on.

Alternatively, we can mechanize ILC using a separate syntax for change terms dt, one that uses separate namespaces for base variables and change variables:


ds, dt ::= dc
  | λ(x : σ) (dx : ∆σ) → dt
  | ds t dt
  | dx

In that case, change variables live in a separate namespace. Example context Γ = ℤ, Bool gives rise to a different sort of change context, ∆Γ = ∆ℤ, ∆Bool. And a change term in context Γ is evaluated with separate environments for Γ and ∆Γ. This is appealing because it allows defining differentiation and proving it correct without using weakening and applying its proof of soundness. We still need to use weakening to convert change terms to their equivalents in the base language, but proving that conversion correct is more straightforward.

Using names. Next, we discuss issues in implementing this transformation with names rather than de Bruijn indexes. Using names rather than de Bruijn indexes makes terms more readable; this is also why, in this thesis, we use names in our on-paper formalization.

Unlike the rest of this chapter, we keep this discussion informal, also because we have not mechanized any definitions using names (as it may be possible using nominal logic), nor attempted formal proofs. The rest of the thesis does not depend on this material, so readers might want to skip to the next section.

Using names introduces the risk of capture, as is common for name-generating transformations [Erdweg et al., 2014]. For instance, differentiating term t = λx → f dx gives D⟦t⟧ = λx dx → df dx ddx. Here, variable dx represents a base input and is free in t, yet it is incorrectly captured in D⟦t⟧ by the other variable dx, the one representing x's change. Differentiation gives instead a correct result if we α-rename x in t to any other name (more on that in a moment).

A few workarounds and fixes are possible.

• As a workaround, we can forbid names starting with the letter d for variables in base terms, as we do in our examples; that's formally correct but pretty unsatisfactory and inelegant. With this approach, our term t = λx → f dx is simply forbidden.

• As a better workaround, instead of prefixing variable names with d, we can add change variables as a separate construct to the syntax of variables, and forbid differentiation on terms that contain change variables. This is a variant of the earlier approach using separate change terms. While we used this approach in our prototype implementation in Scala [Cai et al., 2014], it makes our output language annoyingly non-standard. Converting to a standard language using names (not de Bruijn indexes) raises capture issues again.

• We can try to α-rename existing bound variables, as in the implementation of capture-avoiding substitution. As mentioned, in our case we can rename bound variable x to y and get t′ = λy → f dx. Differentiation gives the correct result D⟦t′⟧ = λy dy → df dx ddx. In general, we can define D⟦λx → t⟧ = λy dy → D⟦t[x := y]⟧, where neither y nor dy appears free in t; that is, we search for a fresh variable y (which, being fresh, does not appear anywhere else) such that also dy does not appear free in t.

This solution is however subtle: it reuses ideas from capture-avoiding substitution, which is well-known to be subtle. We have not attempted to formally prove such a solution correct (or even test it) and have no plan to do so.

• Finally, and most easily, we can α-rename the new bound variables, the ones used to refer to changes, or rather only pick them fresh. But if we pick, say, fresh variable dx1 to refer to the change of variable x, we must use dx1 consistently for every occurrence of x, so that


D⟦λx → x⟧ is not λx dx1 → dx2. Hence we extend D⟦–⟧ to also take a map from names to names, as follows:

D⟦λ(x : σ) → t⟧ m = λ(x : σ) (dx : ∆σ) → D⟦t⟧ (m[x → dx])
D⟦s t⟧ m = D⟦s⟧ m t D⟦t⟧ m
D⟦x⟧ m = m(x)
D⟦c⟧ m = D_C⟦c⟧

where m(x) represents lookup of x in map m, dx is now a fresh variable that does not appear in t, and m[x → dx] extends m with a new mapping from x to dx.

But this approach, that is, using a map from base variables to change variables, affects the interface of differentiation. In particular, it affects which variables are free in output terms, hence we must also update the definition of ∆Γ and the derived typing rule Derive. With this approach, if term s is closed, then D⟦s⟧ emptyMap gives a result α-equivalent to the old D⟦s⟧, as long as s triggers no capture issues. But if instead s is open, invoking D⟦s⟧ emptyMap is not meaningful: we must pass an initial map m containing mappings from s's free variables to fresh variables for their changes. These change variables appear free in D⟦s⟧ m, hence we must update typing rule Derive and modify ∆Γ to use m.

We define ∆_m Γ by adding m as a parameter to ∆Γ, and use it in a modified rule Derive′:

∆_m ε = ε
∆_m (Γ, x : τ) = ∆_m Γ, x : τ, m(x) : ∆τ

       Γ ⊢ t : τ
  ──────────────────────  [Derive′]
  ∆_m Γ ⊢ D⟦t⟧ m : ∆τ

We conjecture that Derive′ holds and that D⟦t⟧ m is correct, but we have not attempted a formal proof.
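A minimal executable sketch of this last approach follows, assuming the same hypothetical Term datatype as before; freshDx naively generates a change-variable name unused in the term, and the fallback for unmapped free variables is a simplification of the initial-map requirement just described.

import qualified Data.Map as M

data Term = Var String | Lam String Term | App Term Term | Const String
  deriving Show

-- All variable names occurring in a term (a crude notion of "appears in t").
vars :: Term -> [String]
vars (Var x)   = [x]
vars (Lam x t) = x : vars t
vars (App s t) = vars s ++ vars t
vars (Const _) = []

-- Pick a change-variable name for x that does not appear in t.
freshDx :: Term -> String -> String
freshDx t x = head [n | n <- iterate (++ "'") ("d" ++ x), n `notElem` vars t]

-- D⟦–⟧ m, with m mapping base variables to their change variables.
deriveM :: M.Map String String -> Term -> Term
deriveM m (Lam x t) = let dx = freshDx t x
                      in Lam x (Lam dx (deriveM (M.insert x dx m) t))
deriveM m (App s t) = App (App (deriveM m s) t) (deriveM m t)
deriveM m (Var x)   = Var (M.findWithDefault ("d" ++ x) x m)  -- free vars should be in m
deriveM _ (Const c) = Const ("d" ++ c)

main :: IO ()
main = print (deriveM M.empty (Lam "x" (App (Const "f") (Var "dx"))))
  -- the generated binder is dx', so the free variable dx is not captured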

12.4 Plugin requirement summary

For reference, we repeat here the plugin requirements spread throughout the chapter.

Plugin Requirement 12.1.16 (Basic change structures on base types)
For each base type ι, the plugin defines a basic change structure on ⟦ι⟧ that we write ι̂.

Plugin Requirement 12.1.19 (Base change types)
For each base type ι, the plugin defines a change type ∆ι such that ∆⟦ι⟧ = ⟦∆ι⟧.

Plugin Requirement 12.2.9 (Typing of D_C⟦–⟧)
For all ⊢_C c : τ, the plugin defines D_C⟦c⟧ satisfying ⊢ D_C⟦c⟧ : ∆τ.

Plugin Requirement 12.2.10 (Correctness of D_C⟦–⟧)
For all ⊢_C c : τ, D_C⟦c⟧ is a derivative for ⟦c⟧.


12.5 Chapter conclusion

In this chapter, we have formally defined changes for values and environments of our language, and when changes are valid. Through these definitions, we have explained that D⟦t⟧ is correct, that is, that it maps changes to the input environment to changes to the output. All of this assumes, among other things, that language plugins define valid changes for their base types and derivatives for their primitives.

In the next chapter, we discuss the operations we provide to construct and use changes. These operations will be especially useful to provide derivatives of primitives.


Chapter 13

Change structures

In the previous chapter, we have shown that evaluating the result of differentiation produces a valid change dv from the old output v1 to the new one v2. To compute v2 from v1 and dv, in this chapter we formally introduce the operator ⊕ mentioned earlier.

To define differentiation on primitives, plugins need a few operations on changes, not just ⊕ and 0: also ⊖ and ⊚.

To formalize these operators and specify their behavior, we extend the notion of basic change structure into the notion of change structure in Sec. 13.1. The change structure for function spaces is not entirely intuitive, so we motivate it in Sec. 13.2. Then we show how to take change structures on A and B and define new ones on A → B in Sec. 13.3. Using these structures, we finally show that, starting from change structures for base types, we can define change structures for all types τ and contexts Γ in Sec. 13.4, completing the core theory of changes.

13.1 Formally defining ⊕ and change structures

In this section we define what a change structure on a set V is. A change structure V̂ extends a basic change structure on V with the change operators ⊕, ⊖, 0 and ⊚. Change structures also require change operators to respect validity, as described below. Key properties of change structures follow in Sec. 13.1.2.

As usual, we use metavariables v, v1, v2, … to range over elements of V, while dv, dv1, dv2, … range over elements of ∆V.

Let's first recall the change operators. Operator ⊕ (“oplus”) updates a value with a change: if dv is a valid change from v1 to v2, then v1 ⊕ dv (read as “v1 updated by dv” or “v1 oplus dv”) is guaranteed to return v2. Operator ⊖ (“ominus”) produces a difference between two values: v2 ⊖ v1 is a valid change from v1 to v2. Operator 0 (“nil”) produces nil changes: 0_v is a nil change for v. Finally, change composition ⊚ (“ocompose”) “pastes changes together”: if dv1 is a valid change from v1 to v2 and dv2 is a valid change from v2 to v3, then dv1 ⊚ dv2 (read “dv1 composed with dv2”) is a valid change from v1 to v3.

We summarize these descriptions in the following definition.

Definition 13.1.1
A change structure V̂ over a set V requires:

(a) A basic change structure for V (hence change set ∆V and validity dv ⊳ v1 → v2 : V).



(b) An update operation ⊕ : V → ∆V → V that updates a value with a change. Update must agree with validity: for all dv ⊳ v1 → v2 : V, we require v1 ⊕ dv = v2.

(c) A nil change operation 0 : V → ∆V that must produce nil changes: for all v : V, we require 0_v ⊳ v → v : V.

(d) A difference operation ⊖ : V → V → ∆V that produces a change across two values: for all v1, v2 : V, we require v2 ⊖ v1 ⊳ v1 → v2 : V.

(e) A change composition operation ⊚ : ∆V → ∆V → ∆V that composes together two changes relative to a base value. Change composition must preserve validity: for all dv1 ⊳ v1 → v2 : V and dv2 ⊳ v2 → v3 : V, we require dv1 ⊚ dv2 ⊳ v1 → v3 : V.

Notation 13.1.2
Operators ⊕ and ⊖ can be subscripted to highlight their base set, but we will usually omit such subscripts. Moreover, ⊕ is left-associative, so that v ⊕ dv1 ⊕ dv2 means (v ⊕ dv1) ⊕ dv2.

Finally, whenever we have a change structure such as Â, B̂, V̂ and so on, we write respectively A, B, V to refer to its base set.
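To fix intuitions, here is a minimal sketch of this definition as a Haskell type class; the class name ChangeStruct and its method names are illustrative, and validity stays a meta-level proof obligation rather than being encoded in the types.

{-# LANGUAGE TypeFamilies #-}

-- A change structure V̂, minus the validity relation: only the computational
-- operators are representable directly.
class ChangeStruct v where
  type Delta v
  oplus    :: v -> Delta v -> v              -- ⊕: update a value by a change
  ominus   :: v -> v -> Delta v              -- ⊖: v2 `ominus` v1 goes from v1 to v2
  nil      :: v -> Delta v                   -- 0: a nil change for a value
  ocompose :: Delta v -> Delta v -> Delta v  -- ⊚: paste two changes together

  -- Default, anticipating Lemma 13.1.8: 0_v can be derived from ⊖.
  nil v = v `ominus` v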

13.1.1 Example: group changes

As an example, we show next that each group induces a change structure on its carrier. This change structure also subsumes the basic change structures we saw earlier on integers.

Definition 13.1.3 (Change structure on groups Ĝ)
Given any group (G, e, +, −), we can define a change structure Ĝ on carrier set G as follows.

(a) The change set is G.

(b) Validity is defined as dg ⊳ g1 → g2 : G if and only if g2 = g1 + dg.

(c) Change update coincides with +: g1 ⊕ dg = g1 + dg. Hence ⊕ agrees perfectly with validity: for all g1 ∈ G, all changes dg are valid from g1 to g1 ⊕ dg (that is, dg ⊳ g1 → g1 ⊕ dg : G).

(d) We define difference as g2 ⊖ g1 = (−g1) + g2. Verifying g2 ⊖ g1 ⊳ g1 → g2 : G reduces to verifying g1 + ((−g1) + g2) = g2, which follows from the group axioms.

(e) The only nil change is the identity element: 0_g = e. Validity 0_g ⊳ g → g : G reduces to g + e = g, which follows from the group axioms.

(f) Change composition also coincides with +: dg1 ⊚ dg2 = dg1 + dg2. Let's prove that composition respects validity. Our hypothesis is dg1 ⊳ g1 → g2 : G and dg2 ⊳ g2 → g3 : G. Because ⊕ agrees perfectly with validity, the hypothesis is equivalent to g2 = g1 ⊕ dg1 and

g3 = g2 ⊕ dg2 = (g1 ⊕ dg1) ⊕ dg2.

Our thesis is dg1 ⊚ dg2 ⊳ g1 → g3 : G, that is,

g3 = g1 ⊕ (dg1 ⊚ dg2).

Hence the thesis reduces to

(g1 ⊕ dg1) ⊕ dg2 = g1 ⊕ (dg1 ⊚ dg2),

hence to g1 + (dg1 + dg2) = (g1 + dg1) + dg2, which is just the associative law for group G. □

As we show later, group change structures are useful to support aggregation.
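For instance, instantiating this construction to the group (Integer, 0, +, negate) yields the familiar difference changes on integers; the following sketch spells that out, with illustrative names.

-- The change structure induced by the group (Integer, 0, +, negate).
type DeltaZ = Integer

oplusZ :: Integer -> DeltaZ -> Integer
oplusZ = (+)                       -- ⊕ coincides with +

ominusZ :: Integer -> Integer -> DeltaZ
ominusZ g2 g1 = negate g1 + g2     -- g2 ⊖ g1 = (−g1) + g2

nilZ :: Integer -> DeltaZ
nilZ _ = 0                         -- the identity element is the only nil change

ocomposeZ :: DeltaZ -> DeltaZ -> DeltaZ
ocomposeZ = (+)                    -- ⊚ also coincides with +

main :: IO ()
main = print (5 `oplusZ` (8 `ominusZ` 5))  -- 8, since ⊕ inverts ⊖ here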


13.1.2 Properties of change structures

To understand better the definition of change structures, we present next a few lemmas following from this definition.

Lemma 13.1.4 (⊕ inverts ⊖)
⊕ inverts ⊖, that is,

v1 ⊕ (v2 ⊖ v1) = v2

for any change structure V̂ and any values v1, v2 : V.

Proof. For change structures, we know v2 ⊖ v1 ⊳ v1 → v2 : V, and v1 ⊕ (v2 ⊖ v1) = v2 follows.

More in detail: change dv = v2 ⊖ v1 is a valid change from v1 to v2 (because ⊖ produces valid changes, v2 ⊖ v1 ⊳ v1 → v2 : V), so updating dv's source v1 with dv produces dv's destination v2 (because ⊕ agrees with validity, that is, if dv ⊳ v1 → v2 : V then v1 ⊕ dv = v2). □

Lemma 13.1.5 (A change can't be valid for two destinations with the same source)
Given a change dv : ∆V and a source v1 : V, dv can only be valid with v1 as source for a single destination. That is, if dv ⊳ v1 → v2a : V and dv ⊳ v1 → v2b : V, then v2a = v2b.

Proof. The proof follows, intuitively, because ⊕ also maps change dv and its source v1 to its destination, and ⊕ is a function.

More technically, since ⊕ respects validity, the hypotheses mean that v2a = v1 ⊕ dv = v2b, as required. □

Beware that changes can be valid for multiple sources, and associate them to different destinations. For instance, integer 0 is a valid change for all integers.

If a change dv has source v, dv's destination equals v ⊕ dv. So, to specify that dv is valid with source v without mentioning dv's destination, we introduce the following definition.

Definition 13.1.6 (One-sided validity)
We define relation V_A as {(v, dv) ∈ A × ∆A | dv ⊳ v → v ⊕ dv : A}.

We use this definition right away.

Lemma 13.1.7 (⊚ and ⊕ interact correctly)
If (v1, dv1) ∈ V_V and (v1 ⊕ dv1, dv2) ∈ V_V, then v1 ⊕ (dv1 ⊚ dv2) = v1 ⊕ dv1 ⊕ dv2.

Proof. We know that ⊚ preserves validity, so under the hypotheses (v1, dv1) ∈ V_V and (v1 ⊕ dv1, dv2) ∈ V_V we get that dv = dv1 ⊚ dv2 is a valid change from v1 to v1 ⊕ dv1 ⊕ dv2:

dv1 ⊚ dv2 ⊳ v1 → v1 ⊕ dv1 ⊕ dv2 : V.

Hence, updating dv's source v1 with dv produces dv's destination v1 ⊕ dv1 ⊕ dv2:

v1 ⊕ (dv1 ⊚ dv2) = v1 ⊕ dv1 ⊕ dv2. □

We can define 0 in terms of the other operations, and prove that this definition satisfies its requirements for change structures.

Lemma 13.1.8 (0 can be derived from ⊖)
If we define 0_v = v ⊖ v, then 0 produces valid changes as required (0_v ⊳ v → v : V), for any change structure V̂ and value v : V.

Proof. This follows from validity of ⊖ (v2 ⊖ v1 ⊳ v1 → v2 : V), instantiated with v1 = v and v2 = v. □


Moreover, nil changes are a right identity element for ⊕.

Corollary 13.1.9 (Nil changes are identity elements)
Any nil change dv for a value v is a right identity element for ⊕; that is, we have v ⊕ dv = v for every set V with a basic change structure, every element v ∈ V, and every nil change dv for v.

Proof. This follows from Lemma 13.4.4 and the definition of nil changes. □

The converse of this theorem does not hold: there exist changes dv such that v ⊕ dv = v, yet dv is not a valid change from v to v. More in general, v1 ⊕ dv = v2 does not imply that dv is a valid change. For instance, under earlier definitions for changes on naturals, if we take v = 0 and dv = −5, we have v ⊕ dv = v even though dv is not valid; examples of invalid changes on functions are discussed in Sec. 12.3.2 and Sec. 15.2. However, we prefer to define “dv is a nil change” as we do, to imply that dv is valid, not just a neutral element.

13.2 Operations on function changes, informally

13.2.1 Examples of nil changes

We have not defined any change structure yet, but we can already exhibit nil changes for some values, including a few functions.

Example 13.2.1
• An environment change for an empty environment ⟦ε⟧ must be an environment for the empty context ∆ε = ε, so it must be the empty environment ∅. In other words, dρ ⊳ ∅ → ∅ : ε if and only if dρ = ∅, and in particular dρ is a nil change.

• If all values in an environment ρ have nil changes, the environment has a nil change dρ = 0_ρ associating each value to a nil change. Indeed, take a context Γ and a suitable environment ρ : ⟦Γ⟧. For each typing assumption x : τ in Γ, assume we have a nil change 0_v for v. Then we define 0_ρ : ⟦∆Γ⟧ by structural recursion on ρ as

0_∅ = ∅
0_{ρ, x=v} = 0_ρ, x = v, dx = 0_v.

Then we can see that 0_ρ is indeed a nil change for ρ, that is, 0_ρ ⊳ ρ → ρ : Γ.

• We have seen in Theorem 12.2.2 that whenever Γ ⊢ t : τ, ⟦t⟧ has nil change ⟦t⟧∆. Moreover, if we have an appropriate nil environment change dρ ⊳ ρ → ρ : Γ (which we often do, as discussed above), then we also get a nil change ⟦t⟧∆ ρ dρ for ⟦t⟧ ρ:

⟦t⟧∆ ρ dρ ⊳ ⟦t⟧ ρ → ⟦t⟧ ρ : τ.

In particular, for all closed well-typed terms ⊢ t : τ we have

⟦t⟧∆ ⊳ ⟦t⟧ → ⟦t⟧ : τ.


13.2.2 Nil changes on arbitrary functions

We have discussed how to find a nil change 0_f for a function f if we know the intension of f, that is, its definition. What if we have only its extension, that is, its behavior? Can we still find 0_f? That's necessary to implement 0 as an object-language function 0 from f to 0_f, since such a function does not have access to f's implementation. That's also necessary to define 0 on metalanguage function spaces, and we look at this case first.

We seek a nil change 0_f for an arbitrary metalanguage function f : A → B, where A and B are arbitrary sets; we assume a basic change structure on A → B, and will require A and B to support a few additional operations. We require that

0_f ⊳ f → f : A → B.    (13.1)

Equivalently, whenever da ⊳ a1 → a2 : A, then 0_f a1 da ⊳ f a1 → f a2 : B. By Lemma 13.4.4, it follows that

f a1 ⊕ 0_f a1 da = f a2,    (13.2)

where a1 ⊕ da = a2.

To define 0_f, we solve Eq. (13.2). To understand how, we use an analogy: ⊕ and ⊖ are intended to resemble + and −. To solve f a1 + 0_f a1 da = f a2, we subtract f a1 from both sides and write 0_f a1 da = f a2 − f a1.

Similarly, here we use operator ⊖: 0_f must equal

0_f = λa1 da → f (a1 ⊕ da) ⊖ f a1.    (13.3)

Because b2 ⊖ b1 ⊳ b1 → b2 : B for all b1, b2 : B, we can verify that 0_f as defined by Eq. (13.3) satisfies our original requirement Eq. (13.1), not just its weaker consequence Eq. (13.2).

We have shown that, to define 0 on functions f : A → B, we can use ⊖ at type B. Without using f's intension, we are aware of no alternative. To ensure 0_f is defined for all f, we require that change structures define ⊖. We can then define 0 as a derived operation via 0_v = v ⊖ v, and verify that this derived definition satisfies the requirements for 0.

Next, we show how to define ⊖ on functions. We seek a valid function change f2 ⊖ f1 from f1 to f2. We have just sought and found a valid change from f to f; generalizing the reasoning we used, we obtain that whenever da ⊳ a1 → a2 : A, we need to have (f2 ⊖ f1) a1 da ⊳ f1 a1 → f2 a2 : B. Since a2 = a1 ⊕ da, we can define

f2 ⊖ f1 = λa1 da → f2 (a1 ⊕ da) ⊖ f1 a1.    (13.4)

One can verify that Eq. (13.4) defines f2 ⊖ f1 as a valid function change from f1 to f2, as desired. And after defining f2 ⊖ f1, we no longer need to define 0_f separately using Eq. (13.3): we can just define 0_f = f ⊖ f, simplify through the definition of ⊖ in Eq. (13.4), and reobtain Eq. (13.3) as a derived equation:

0_f = f ⊖ f = λa1 da → f (a1 ⊕ da) ⊖ f a1.

We defined f2 ⊖ f1 on metalanguage functions. We can also internalize change operators in our object language, that is, define for each type τ object-level terms ⊕τ, ⊖τ and so on, with the same behavior. We can define object-language change operators such as ⊖ using the same equations. However, the produced function change df is slow: it recomputes the old output f1 a1, computes the new output f2 a2, and takes the difference.

However, we can implement ⊖_{σ→τ} using replacement changes, if they are supported by the change structure on type τ. Let us define ⊖τ as b2 ⊖ b1 = !b2 and simplify Eq. (13.4); we obtain

f2 ⊖ f1 = λa1 da → !(f2 (a1 ⊕ da)).


We could even imagine allowing replacement changes on functions themselves. However, here the bang operator needs to be defined to produce a function change that can be applied, hence

!f2 = λa1 da → !(f2 (a1 ⊕ da)).

Alternatively, as we see in Appendix D, we could represent function changes not as functions but as data, through defunctionalization, and provide a function applying defunctionalized function changes df : ∆(σ → τ) to inputs t1 : σ and dt : ∆σ. In this case, !f2 would simply be another way to produce defunctionalized function changes.

13.2.3 Constraining ⊕ on functions

Next, we discuss how ⊕ must be defined on functions, and show informally why we must define f1 ⊕ df = λv → f1 v ⊕ df v 0_v to prove that ⊕ on functions agrees with validity (that is, Lemma 13.4.4).

We know that a valid function change df ⊳ f1 → f2 : A → B takes valid input changes dv ⊳ v1 → v2 : A to a valid output change df v1 dv ⊳ f1 v1 → f2 v2 : B. We require that ⊕ agrees with validity (Lemma 13.4.4), so f2 = f1 ⊕ df, v2 = v1 ⊕ dv, and

f2 v2 = (f1 ⊕ df) (v1 ⊕ dv) = f1 v1 ⊕ df v1 dv.    (13.5)

Instantiating dv with 0_v gives equation

(f1 ⊕ df) v1 = (f1 ⊕ df) (v1 ⊕ 0_{v1}) = f1 v1 ⊕ df v1 0_{v1},

which is not only a requirement on ⊕ for functions, but also defines ⊕ effectively.

13.3 Families of change structures

In this section we derive change structures for A → B and A × B from two change structures Â and B̂. The change structure on A → B enables defining change structures for function types. Similarly, the change structure on A × B enables defining a change structure for product types in a language plugin, as described informally in Sec. 11.5. In Sec. 11.6 we also discuss informally change structures for disjoint sums. Formally, we can derive a change structure for the disjoint union of sets A + B (from change structures for A and B), and this enables defining change structures for sum types; we have mechanized the required proofs, but omit the tedious details here.

13.3.1 Change structures for function spaces

Sec. 13.2 introduces informally how to define change operations on A → B. Next, we define change structures on function spaces formally, and then prove that their operations respect their constraints.

Definition 13.3.1 (Change structure for A → B)
Given change structures Â and B̂, we define a change structure on their function space A → B, written Â → B̂, where:

(a) The change set is defined as ∆(A → B) = A → ∆A → ∆B.

(b) Validity is defined as

df ⊳ f1 → f2 : A → B = ∀a1, da, a2. (da ⊳ a1 → a2 : A)
  implies (df a1 da ⊳ f1 a1 → f2 a2 : B).


(c) We define change update by

f1 ⊕ df = λa → f1 a ⊕ df a 0_a.

(d) We define difference by

f2 ⊖ f1 = λa da → f2 (a ⊕ da) ⊖ f1 a.

(e) We define 0 like in Lemma 13.1.8, as

0_f = f ⊖ f.

(f) We define change composition as

df1 ⊚ df2 = λa da → df1 a 0_a ⊚ df2 a da.
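To make these operations tangible, here is a sketch specialized to A = B = Integer with difference changes, so a function change is a curried two-argument function; all names are illustrative.

-- Difference changes on Integer: a ⊕ da = a + da, and 0_a = 0.
type DInt = Integer

-- A function change for Integer -> Integer, per ∆(A → B) = A → ∆A → ∆B.
type FChange = Integer -> DInt -> DInt

-- f1 ⊕ df = λa → f1 a ⊕ df a 0_a
oplusF :: (Integer -> Integer) -> FChange -> (Integer -> Integer)
oplusF f1 df = \a -> f1 a + df a 0

-- f2 ⊖ f1 = λa da → f2 (a ⊕ da) ⊖ f1 a
ominusF :: (Integer -> Integer) -> (Integer -> Integer) -> FChange
ominusF f2 f1 = \a da -> f2 (a + da) - f1 a

main :: IO ()
main = do
  let f x = x * x
      g x = x * x + 1
      df = g `ominusF` f          -- a valid change from f to g
  print (oplusF f df 5)           -- 26, that is g 5: ⊕ inverts ⊖ pointwise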

Lemma 13.3.2
Definition 13.3.1 defines a correct change structure Â → B̂.

Proof. • We prove that ⊕ agrees with validity on A → B. Consider f1, f2 : A → B and df ⊳ f1 → f2 : A → B; we must show that f1 ⊕ df = f2. By functional extensionality, we only need to prove that (f1 ⊕ df) a = f2 a, that is, that f1 a ⊕ df a 0_a = f2 a. Since ⊕ agrees with validity on B, we just need to show that df a 0_a ⊳ f1 a → f2 a : B, which follows because 0_a is a valid change from a to a and because df is a valid change from f1 to f2.

• We prove that ⊖ produces valid changes on A → B. Consider df = f2 ⊖ f1 for f1, f2 : A → B. For any valid input da ⊳ a1 → a2 : A, we must show that df produces a valid output with the correct vertexes, that is, that df a1 da ⊳ f1 a1 → f2 a2 : B. Since ⊕ agrees with validity, a2 equals a1 ⊕ da. By substituting away a2 and df, the thesis becomes f2 (a1 ⊕ da) ⊖ f1 a1 ⊳ f1 a1 → f2 (a1 ⊕ da) : B, which is true because ⊖ produces valid changes on B.

• 0 produces valid changes, as proved in Lemma 13.1.8.

• We prove that change composition preserves validity on A → B. That is, we must prove

df1 a1 0_{a1} ⊚ df2 a1 da ⊳ f1 a1 → f3 a2 : B

for every f1, f2, f3, df1, df2, a1, da, a2 satisfying df1 ⊳ f1 → f2 : A → B, df2 ⊳ f2 → f3 : A → B and da ⊳ a1 → a2 : A.

Because change composition preserves validity on B, it's enough to prove that (1) df1 a1 0_{a1} ⊳ f1 a1 → f2 a1 : B; (2) df2 a1 da ⊳ f2 a1 → f3 a2 : B. That is, intuitively, we create a composite change using ⊚, and it goes from f1 a1 to f3 a2, passing through f2 a1. Part (1) follows because df1 is a valid function change from f1 to f2, applied to a valid change 0_{a1} from a1 to a1. Part (2) follows because df2 is a valid function change from f2 to f3, applied to a valid change da from a1 to a2. □


13.3.2 Change structures for products

We can define change structures on products A × B, given change structures on A and B: a change on pairs is just a pair of changes; all other change structure definitions distribute componentwise in the same way, and their correctness reduces to the correctness on components.

Change structures on n-ary products or records present no additional difficulty.

Definition 13.3.3 (Change structure for A × B)
Given change structures Â and B̂, we define a change structure Â × B̂ on product A × B.

(a) The change set is defined as ∆(A × B) = ∆A × ∆B.

(b) Validity is defined as

(da, db) ⊳ (a1, b1) → (a2, b2) : A × B =
  (da ⊳ a1 → a2 : A) and (db ⊳ b1 → b2 : B).

In other words, validity distributes componentwise: a product change is valid if each component is valid.

(c) We define change update by

(a1, b1) ⊕ (da, db) = (a1 ⊕ da, b1 ⊕ db).

(d) We define difference by

(a2, b2) ⊖ (a1, b1) = (a2 ⊖ a1, b2 ⊖ b1).

(e) We define 0 to distribute componentwise:

0_{(a,b)} = (0_a, 0_b).

(f) We define change composition to distribute componentwise:

(da1, db1) ⊚ (da2, db2) = (da1 ⊚ da2, db1 ⊚ db2).
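A sketch of this componentwise construction, specialized to pairs of integers with difference changes (names illustrative):

-- The product change structure on (Integer, Integer), with difference
-- changes on each component.
type DInt = Integer

oplusP :: (Integer, Integer) -> (DInt, DInt) -> (Integer, Integer)
oplusP (a, b) (da, db) = (a + da, b + db)

ominusP :: (Integer, Integer) -> (Integer, Integer) -> (DInt, DInt)
ominusP (a2, b2) (a1, b1) = (a2 - a1, b2 - b1)

nilP :: (Integer, Integer) -> (DInt, DInt)
nilP _ = (0, 0)

ocomposeP :: (DInt, DInt) -> (DInt, DInt) -> (DInt, DInt)
ocomposeP (da1, db1) (da2, db2) = (da1 + da2, db1 + db2)

main :: IO ()
main = print ((1, 2) `oplusP` ((4, 6) `ominusP` (1, 2)))  -- (4,6)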

Lemma 13.3.4
Definition 13.3.3 defines a correct change structure Â × B̂.

Proof. Since all these proofs are similar, and spelling out their details does not make them clearer, we only give the first such proof in full.

• ⊕ agrees with validity on A × B because ⊕ agrees with validity on both A and B. For this property we give a full proof.

For each p1, p2 : A × B and dp ⊳ p1 → p2 : A × B, we must show that p1 ⊕ dp = p2. Instead of quantifying over pairs p : A × B, we can quantify equivalently over components a : A, b : B. Hence, consider a1, a2 : A, b1, b2 : B, and changes da, db that are valid, that is, da ⊳ a1 → a2 : A and db ⊳ b1 → b2 : B. We must show that

(a1, b1) ⊕ (da, db) = (a2, b2).

That follows from a1 ⊕ da = a2 (which follows from da ⊳ a1 → a2 : A) and b1 ⊕ db = b2 (which follows from db ⊳ b1 → b2 : B).


• ⊖ produces valid changes on A × B because ⊖ produces valid changes on both A and B. We omit a full proof; the key step reduces the thesis

(a2, b2) ⊖ (a1, b1) ⊳ (a1, b1) → (a2, b2) : A × B

to a2 ⊖ a1 ⊳ a1 → a2 : A and b2 ⊖ b1 ⊳ b1 → b2 : B (where free variables range over suitable domains).

• 0_{(a,b)} is correct, that is, (0_a, 0_b) ⊳ (a, b) → (a, b) : A × B, because 0 is correct on each component.

• Change composition is correct on A × B, that is,

(da1 ⊚ da2, db1 ⊚ db2) ⊳ (a1, b1) → (a3, b3) : A × B

whenever (da1, db1) ⊳ (a1, b1) → (a2, b2) : A × B and (da2, db2) ⊳ (a2, b2) → (a3, b3) : A × B, in essence because change composition is correct on both A and B. We omit details. □

13.4 Change structures for types and contexts

As promised, given change structures for base types, we can provide change structures for all types.

Plugin Requirement 13.4.1 (Change structures for base types)
For each base type ι, we must have a change structure ι̂ defined on base set ⟦ι⟧, based on the basic change structures defined earlier.

Definition 13.4.2 (Change structure for types)
For each type τ, we define a change structure τ̂ on base set ⟦τ⟧: for a base type ι, τ̂ = ι̂ is the change structure provided by the plugin; for a function type, the change structure for σ → τ is σ̂ → τ̂ (Definition 13.3.1).

Lemma 13.4.3
Change sets and validity, as defined in Definition 13.4.2, give rise to the same basic change structures as the ones defined earlier in Definition 12.1.15, and to the change operations described in Fig. 13.1a.

Proof. This can be verified by induction on types. For each case, it is sufficient to compare definitions. □

Lemma 13.4.4 (⊕ agrees with validity)
If dv ⊳ v1 → v2 : τ, then v1 ⊕ dv = v2.

Proof. Because τ̂ is a change structure, and in change structures ⊕ agrees with validity. □

As anticipated in Sec. 12.2, since ⊕ agrees with validity (Lemma 13.4.4) and D⟦–⟧ is correct (Theorem 12.2.2), we get Corollary 13.4.5.

Corollary 13.4.5 (D⟦–⟧ is correct, corollary)
If Γ ⊢ t : τ and dρ ⊳ ρ1 → ρ2 : Γ, then ⟦t⟧ ρ1 ⊕ ⟦D⟦t⟧⟧ dρ = ⟦t⟧ ρ2.


We can also define a change structure for environments. Recall that change structures for products define their operations to act on each component. Each change structure operation for environments acts “variable-wise”. Recall that a typing context Γ is a list of variable assignments x : τ. For each such entry, environments ρ and environment changes dρ contain a base entry x = v, where v : ⟦τ⟧, and possibly a change dx = dv, where dv : ⟦∆τ⟧.

Definition 13.4.6 (Change structure for environments)
To each context Γ we associate a change structure Γ̂ that extends the basic change structure from Definition 12.1.25. Operations are defined as shown in Fig. 13.1b.

Base values v′ in environment changes are redundant with base values v in base environments, because for valid changes v = v′. So, when consuming an environment change, we choose arbitrarily to use v instead of v′. Alternatively, we could also use v′ and get the same results for valid inputs. When producing an environment change, its entries are created so as to ensure validity of the resulting environment change.

Lemma 13.4.7
Definition 13.4.6 defines a correct change structure Γ̂ for each context Γ.

Proof. All proofs are by structural induction on contexts. Most details are analogous to the ones for products and add no insight, so we refer to our mechanization for most proofs.

However, we show by induction that ⊕ agrees with validity. For the empty context, there's a single environment ⟦ε⟧, so ⊕ returns the correct environment. For the inductive case Γ′, x : τ, inversion on the validity judgment reduces our hypothesis to dv ⊳ v1 → v2 : τ and dρ ⊳ ρ1 → ρ2 : Γ′, and our thesis to (ρ1, x = v1) ⊕ (dρ, x = v1, dx = dv) = (ρ2, x = v2), where v1 appears both in the base environment and the environment change. The thesis follows because ⊕ agrees with validity on both Γ′ and τ. □

We summarize the definitions on types in Fig. 13.1. Finally, we can lift change operators from the semantic level to the syntactic level, so that their meaning is coherent.

Definition 13.4.8 (Term-level change operators)
We define type-indexed families of change operators at the term level, with the following signatures:

⊕τ : τ → ∆τ → τ
⊖τ : τ → τ → ∆τ
0_– : τ → ∆τ
⊚τ : ∆τ → ∆τ → ∆τ

and definitions:

tf1 ⊕_{σ→τ} dtf = λx → tf1 x ⊕ dtf x 0_x
tf2 ⊖_{σ→τ} tf1 = λx dx → tf2 (x ⊕ dx) ⊖ tf1 x
0_{σ→τ} tf = tf ⊖_{σ→τ} tf
dtf1 ⊚_{σ→τ} dtf2 = λx dx → dtf1 x 0_x ⊚ dtf2 x dx
tf1 ⊕_ι dtf = …    tf2 ⊖_ι tf1 = …    0_ι tf = …    dtf1 ⊚_ι dtf2 = …

where the base-type cases are provided by plugins (Plugin Requirement 13.4.10).


Lemma 13.4.9 (Term-level change operators agree with change structures)
The following equations hold for all types τ, contexts Γ, well-typed terms Γ ⊢ t1, t2 : τ and ∆Γ ⊢ dt, dt1, dt2 : ∆τ, and environments ρ : ⟦Γ⟧, dρ : ⟦∆Γ⟧, such that all expressions are defined:

⟦t1 ⊕τ dt⟧ dρ = ⟦t1⟧ dρ ⊕ ⟦dt⟧ dρ
⟦t2 ⊖τ t1⟧ ρ = ⟦t2⟧ ρ ⊖ ⟦t1⟧ ρ
⟦0_τ t⟧ ρ = 0_{⟦t⟧ ρ}
⟦dt1 ⊚τ dt2⟧ dρ = ⟦dt1⟧ dρ ⊚ ⟦dt2⟧ dρ.

Proof. By induction on types and simplifying both sides of the equalities. The proofs for ⊕ and ⊖ must be done by simultaneous induction. □

At the time of writing, we have not mechanized the proof for ⊚.

To define the lifting and prove it coherent on base types, we must add a further plugin requirement.

Plugin Requirement 13.4.10 (Term-level change operators for base types)
For each base type ι, we define change operators as required by Definition 13.4.8, satisfying the requirements for Lemma 13.4.9.

13.5 Development history

The proof presented in this and the previous chapter is a significant evolution of the original one by Cai et al. [2014]. While this formalization and the mechanization are both original to this thesis, some ideas were suggested by other (currently unpublished) developments by Yufei Cai and by Yann Régis-Gianas. Yufei Cai gives a simpler pen-and-paper set-theoretic proof by separating validity, while we noticed that separating validity works equally well in a mechanized type theory and simplifies the mechanization. The first to use a two-sided validity relation was Yann Régis-Gianas, but using a big-step operational semantics, while we were collaborating on an ILC correctness proof for untyped λ-calculus (as in Appendix C). I gave the first complete and mechanized ILC correctness proof using two-sided validity, again for a simply-typed λ-calculus with a denotational semantics. Based on two-sided validity, I also reconstructed the rest of the theory of changes.

13.6 Chapter conclusion

In this chapter, we have seen how to define change operators both on semantic values and on terms, and what their guarantees are, via the notion of change structures. We have also defined change structures for groups, function spaces and products. Finally, we have explained and shown Corollary 13.4.5. We continue in the next chapter by discussing how to reason syntactically about changes, before finally showing how to define some language plugins.


⊕τ : ⟦τ → ∆τ → τ⟧    ⊖τ : ⟦τ → τ → ∆τ⟧
0_– : ⟦τ → ∆τ⟧    ⊚τ : ⟦∆τ → ∆τ → ∆τ⟧

f1 ⊕_{σ→τ} df = λv → f1 v ⊕ df v 0_v
v1 ⊕_ι dv = …
f2 ⊖_{σ→τ} f1 = λv dv → f2 (v ⊕ dv) ⊖ f1 v
v2 ⊖_ι v1 = …
0_v = v ⊖τ v
dv1 ⊚_ι dv2 = …
df1 ⊚_{σ→τ} df2 = λv dv → df1 v 0_v ⊚ df2 v dv

(a) Change structure operations on types (see Definition 13.4.2)

⊕Γ : ⟦Γ → ∆Γ → Γ⟧    ⊖Γ : ⟦Γ → Γ → ∆Γ⟧
0_– : ⟦Γ → ∆Γ⟧    ⊚Γ : ⟦∆Γ → ∆Γ → ∆Γ⟧

∅ ⊕ ∅ = ∅
(ρ, x = v) ⊕ (dρ, x = v′, dx = dv) = (ρ ⊕ dρ, x = v ⊕ dv)
∅ ⊖ ∅ = ∅
(ρ2, x = v2) ⊖ (ρ1, x = v1) = (ρ2 ⊖ ρ1, x = v1, dx = v2 ⊖ v1)
0_∅ = ∅
0_{ρ, x=v} = (0_ρ, x = v, dx = 0_v)
∅ ⊚ ∅ = ∅
(dρ1, x = v1, dx = dv1) ⊚ (dρ2, x = v2, dx = dv2) = (dρ1 ⊚ dρ2, x = v1, dx = dv1 ⊚ dv2)

(b) Change structure operations on environments (see Definition 13.4.6)

Lemma 13.4.4 (⊕ agrees with validity)
If dv ⊳ v1 → v2 : τ, then v1 ⊕ dv = v2.

Corollary 13.4.5 (D⟦–⟧ is correct, corollary)
If Γ ⊢ t : τ and dρ ⊳ ρ1 → ρ2 : Γ, then ⟦t⟧ ρ1 ⊕ ⟦D⟦t⟧⟧ dρ = ⟦t⟧ ρ2.

Figure 13.1: Defining change structures.

Chapter 14

Equational reasoning on changes

In this chapter we formalize equational reasoning directly on terms, rather than on semantic values (Sec. 14.1), and we discuss when two changes can be considered equivalent (Sec. 14.2). We also show, as an example, a simple change structure on lists and a derivative of map for it (Example 14.1.1).

To reason on terms, instead of describing the updated value of a term t by using an updated environment ρ2, we substitute in t each variable xi with expression xi ⊕ dxi, to produce a term that computes the updated value of t, so that we can say that dx is a change from x to x ⊕ dx, or that df x dx is a change from f x to (f ⊕ df) (x ⊕ dx). Much of the work in this chapter is needed to modify definitions and go from using an updated environment to using substitution in this fashion.

To compare terms that use changes for equivalence, we can't use denotational equivalence, but must restrict attention to valid changes.

Comparing changes is trickier: most often, we are not interested in whether two changes produce the same value, but in whether two changes have the same source and destination. Hence, if two changes share source and destination, we say they are equivalent. As we show in this chapter, operations that preserve validity also respect change equivalence, because for all those operations the source and destination of any output change only depend on the source and destination of input changes. Between the same source and destination there are often multiple changes, and the difference among them can affect how long a derivative takes, but not whether the result is correct.

We also show that change equivalence is a particular form of logical relation, a logical partial equivalence relation (PER). PERs are well-known to semanticists, and are often used to study sets with invalid elements, together with the appropriate equivalence on these sets.

The rest of the chapter expands on the details, even if they are not especially surprising.

14.1 Reasoning on changes syntactically

To define derivatives of primitives, we will often discuss validity of changes directly on programs, for instance saying that dx is a valid change from x to x ⊕ dx, or that f x ⊕ df x dx is equivalent to f (x ⊕ dx) if all changes in scope are valid.

In this section we formalize these notions. We have not mechanized proofs involving substitutions, but we include them here, even though they are not especially interesting.

But first, we exemplify these notions informally.

Example 14.1.1 (Deriving map on lists)
Let's consider again the example from Sec. 11.3.3, in particular dmap. Recall that a list change dxs is valid for source xs if they have the same length and each element change is valid for its corresponding element.

map :: (a → b) → List a → List b
map f Nil = Nil
map f (Cons x xs) = Cons (f x) (map f xs)

dmap :: (a → b) → ∆(a → b) → List a → ∆List a → ∆List b
-- A valid list change has the same length as the base list
dmap f df Nil Nil = Nil
dmap f df (Cons x xs) (Cons dx dxs) =
  Cons (df x dx) (dmap f df xs dxs)
-- Remaining cases deal with invalid changes, and a dummy
-- result is sufficient
dmap f df xs dxs = Nil

In our example, one can show that dmap is a correct derivative for map. As a consequence, terms map (f ⊕ df) (xs ⊕ dxs) and map f xs ⊕ dmap f df xs dxs are interchangeable in all valid contexts, that is, contexts that bind df and dxs to valid changes, respectively, for f and xs.
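A runnable rendering of this claim, specialized to Integer elements and difference changes on ordinary Haskell lists, lets us spot-check the interchangeability on one input; the concrete f, df, xs and dxs below are illustrative.

-- map and dmap as above, on ordinary lists.
map' :: (a -> b) -> [a] -> [b]
map' _ [] = []
map' f (x : xs) = f x : map' f xs

dmap :: (a -> b) -> (a -> da -> db) -> [a] -> [da] -> [db]
dmap _ _ [] [] = []
dmap f df (x : xs) (dx : dxs) = df x dx : dmap f df xs dxs
dmap _ _ _ _ = []

main :: IO ()
main = do
  let f x = x * 2 :: Integer
      df x dx = 2 * dx + 1          -- a valid change from f to λx → 2x + 1
      xs  = [1, 2, 3]
      dxs = [0, 10, 0]              -- valid: same length as xs
      lhs = map' (\x -> f x + df x 0) (zipWith (+) xs dxs)  -- map (f ⊕ df) (xs ⊕ dxs)
      rhs = zipWith (+) (map' f xs) (dmap f df xs dxs)      -- map f xs ⊕ dmap f df xs dxs
  print (lhs == rhs)                -- True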

We sketch an informal proof directly on terms.

Proof sketch. We must show that dy = dmap f df xs dxs is a valid change from initial output y1 = map f xs to updated output y2 = map (f ⊕ df) (xs ⊕ dxs), for valid inputs df and dxs.

We proceed by structural induction on xs and dxs (technically, on a proof of validity of dxs). Since dxs is valid, those two lists have to be of the same length. If xs and dxs are both empty, y1 = dy = y2 = Nil, so dy is a valid change as required.

For the inductive step, both lists are Cons nodes, so we need to show that output change

dy = dmap f df (Cons x xs) (Cons dx dxs)

is a valid change from

y1 = map f (Cons x xs)

to

y2 = map (f ⊕ df) (Cons (x ⊕ dx) (xs ⊕ dxs)).

To restate validity, we name the heads and tails of dy, y1, y2: if we write dy = Cons dh dt, y1 = Cons h1 t1 and y2 = Cons h2 t2, we need to show that dh is a change from h1 to h2 and dt is a change from t1 to t2.

Indeed, head change dh = df x dx is a valid change from h1 = f x to h2 = (f ⊕ df) (x ⊕ dx). And tail change dt = dmap f df xs dxs is a valid change from t1 = map f xs to t2 = map (f ⊕ df) (xs ⊕ dxs) by induction. Hence dy is a valid change from y1 to y2. □

Hopefully this proof is already convincing, but it relies on undefined concepts. On metalevel functions, we could already make this proof formal, but not yet on terms. In this section, we define the required concepts.

14.1.1 Denotational equivalence for valid changes

This example uses the notion of denotational equivalence for valid changes. We now proceed to formalize it. For reference, we recall denotational equivalence of terms, and then introduce its restriction.


Definition A.2.5 (Denotational equivalence)
We say that two terms Γ ⊢ t1 : τ and Γ ⊢ t2 : τ are denotationally equal, and write Γ ⊨ t1 ≃ t2 : τ (or sometimes t1 ≃ t2), if for all environments ρ : ⟦Γ⟧ we have that ⟦t1⟧ ρ = ⟦t2⟧ ρ.

For open terms t1, t2 that depend on changes, denotational equivalence is too restrictive, since it requires t1 and t2 to also be equal when the changes they depend on are not valid. By restricting denotational equivalence to valid environment changes, and typing terms in change contexts, we obtain the following definition.

Definition 14.1.2 (Denotational equivalence for valid changes)
For any context Γ and type τ, we say that two terms ∆Γ ⊢ t1 : τ and ∆Γ ⊢ t2 : τ are denotationally equal for valid changes, and write ∆Γ ⊨ t1 ≃∆ t2 : τ, if for all valid environment changes dρ ⊳ ρ1 → ρ2 : Γ we have that t1 and t2 evaluate in environment dρ to the same value, that is, ⟦t1⟧ dρ = ⟦t2⟧ dρ.

Example 14.1.3
Terms f x ⊕ df x dx and f (x ⊕ dx) are denotationally equal for valid changes (for any types σ, τ): ∆(f : σ → τ, x : σ) ⊨ f x ⊕ df x dx ≃∆ f (x ⊕ dx) : τ.

Example 14.1.4
One of our claims in Example 14.1.1 can now be written as

∆Γ ⊨ map (f ⊕ df) (xs ⊕ dxs) ≃∆ map f xs ⊕ dmap f df xs dxs : List τ

for a suitable context Γ = f : σ → τ, xs : List σ, map : (σ → τ) → List σ → List τ (and for any types σ, τ).

Arguably, the need for a special equivalence is a defect in our semantics of change programs; it might be preferable to make the type of changes abstract throughout the program (except for derivatives of primitives, which must inspect changes), but this is not immediate, especially in a module system like Haskell's. Other possible alternatives are discussed in Sec. 15.4.

14.1.2 Syntactic validity

Next, we define syntactic validity, that is, when a change term dt is a (valid) change from source term t1 to destination t2. Intuitively, dt is valid from t1 to t2 if dt, t1 and t2, evaluated all against the same valid environment change dρ, produce a valid change, its source and its destination. Formally:

Definition 14.1.5 (Syntactic validity)
We say that term ∆Γ ⊢ dt : ∆τ is a (syntactic) change from ∆Γ ⊢ t1 : τ to ∆Γ ⊢ t2 : τ, and write Γ ⊨ dt ⊳ t1 → t2 : τ, if

∀dρ ⊳ ρ1 → ρ2 : Γ. ⟦dt⟧ dρ ⊳ ⟦t1⟧ dρ → ⟦t2⟧ dρ : τ.

Notation 14.1.6
We often simply say that dt is a change from t1 to t2, leaving everything else implicit when not important.

Using syntactic validity, we can show, for instance, that dx is a change from x to x ⊕ dx, and that df x dx is a change from f x to (f ⊕ df) (x ⊕ dx); both examples follow from a general statement about D⟦–⟧ (Theorem 14.1.9). Our earlier informal proof of the correctness of dmap (Example 14.1.1) can also be justified in terms of syntactic validity.

Just like (semantic) ⊕ agrees with validity, term-level (or syntactic) ⊕ agrees with syntactic validity, up to denotational equivalence for valid changes.


Lemma 14.1.7 (Term-level ⊕ agrees with syntactic validity)
If dt is a change from t1 to t2 (Γ ⊨ dt ⊳ t1 → t2 : τ), then t1 ⊕ dt and t2 are denotationally equal for valid changes (∆Γ ⊨ t1 ⊕ dt ≃∆ t2 : τ).

Proof. Follows because term-level ⊕ agrees with semantic ⊕ (Lemma 13.4.9) and ⊕ agrees with validity. In more detail, fix an arbitrary valid environment change dρ ⊳ ρ1 → ρ2 : Γ. First, we have ⟦dt⟧ dρ ⊳ ⟦t1⟧ dρ → ⟦t2⟧ dρ : τ because of syntactic validity. Then we conclude with this calculation:

  ⟦t1 ⊕ dt⟧ dρ
= { term-level ⊕ agrees with ⊕ (Lemma 13.4.9) }
  ⟦t1⟧ dρ ⊕ ⟦dt⟧ dρ
= { ⊕ agrees with validity }
  ⟦t2⟧ dρ. □

Beware that the definition of Γ ⊨ dt ⊳ t1 → t2 : τ evaluates all terms against a change environment dρ, containing separately base values and changes. In contrast, if we use validity in the change structure for ⟦Γ⟧ → ⟦τ⟧, we evaluate different terms against different environments. That is why we have that dx is a change from x to x ⊕ dx (where x ⊕ dx is evaluated in environment dρ), while ⟦dx⟧ is a valid change from ⟦x⟧ to ⟦x⟧ (where the destination ⟦x⟧ gets applied to environment ρ2).

Is syntactic validity trivial? Without needing syntactic validity, based on earlier chapters one can show that dv is a valid change from v to v ⊕ dv, or that df v dv is a valid change from f v to (f ⊕ df) (v ⊕ dv), or further examples. But that's all about values. In this section, we are just translating these notions to the level of terms and formalizing them.

Our semantics is arguably (intuitively) a trivial one, similar to a metainterpreter interpreting object-language functions in terms of metalanguage functions: our semantics simply embeds an object-level λ-calculus into a meta-level and more expressive λ-calculus, mapping for instance λf x → f x (an AST) into λf v → f v (syntax for a metalevel function). Hence, proofs in this section about syntactic validity deal mostly with this translation. We don't expect the proofs to give special insights, and we expect most developments would keep such issues informal (which is certainly reasonable).

Nevertheless, we write out the statements to help readers refer to them, and write out (mostly) full proofs to help ourselves (and readers) verify them. Proofs are mostly standard, but with a few twists, since we must often consider and relate three computations: the computation on initial values, and the ones on the change and on updated values.

We're also paying a proof debt. Had we used substitution and a small-step semantics, we'd directly have simple statements on terms, instead of trickier ones involving terms and environments. We produce those statements now.

Differentiation and syntactic validity

We can also show that D⟦t⟧ produces a syntactically valid change from t, but we need to specify its destination. In general, D⟦t⟧ is not a change from t to t. The destination must evaluate to the updated value of t; to produce a term that evaluates to the right value, we use substitution. If the only free variable in t is x, then D⟦t⟧ is a syntactic change from t to t[x = x ⊕ dx]. To repeat the same for all variables in context Γ, we use the following notation.


Notation 1418We write t [Γ = Γ oplus ∆Γ ] to mean t [x1 = x1 oplus dx1 x2 = x2 oplus dx2 xn = xn oplus dxn ]

Theorem 14.1.9 (D⟦–⟧ is correct, syntactically)
For any well-typed term Γ ⊢ t : τ, term D⟦t⟧ is a syntactic change from t to t[Γ = Γ ⊕ ∆Γ].

We present the following straightforward (if tedious) proof (formal but not mechanized).

Proof. Let t2 = t[Γ = Γ ⊕ ∆Γ]. Take any dρ ⊳ ρ1 → ρ2 : Γ. We must show that ⟦D⟦t⟧⟧ dρ ⊳ ⟦t⟧ dρ → ⟦t2⟧ dρ : τ.

Because dρ extends ρ1 and t only needs entries in ρ1, we can show that ⟦t⟧ dρ = ⟦t⟧ ρ1, so our thesis becomes ⟦D⟦t⟧⟧ dρ ⊳ ⟦t⟧ ρ1 → ⟦t2⟧ dρ : τ.

Because D⟦–⟧ is correct (Theorem 12.2.2), we know that ⟦D⟦t⟧⟧ dρ ⊳ ⟦t⟧ ρ1 → ⟦t⟧ ρ2 : τ; that's almost our thesis, so we must only show that ⟦t2⟧ dρ = ⟦t⟧ ρ2. Since ⊕ agrees with validity and dρ is valid, we have that ρ2 = ρ1 ⊕ dρ, so our thesis is now the following equation, which we leave to Lemma 14.1.10:

⟦t[Γ = Γ ⊕ ∆Γ]⟧ dρ = ⟦t⟧ (ρ1 ⊕ dρ). □

Herersquos the technical lemma to complete the proof

Lemma 14110For any Γ ` t τ and dρ B ρ1 rarr ρ2 Γ we have

n t o [Γ = Γ oplus ∆Γ ] dρ = n t o (ρ1 oplus dρ)

Proof This follows from the structure of valid environment changes because term-level oplus (used onthe left-hand side) agrees with value-level oplus (used on the right-hand side) by Lemma 1349 andbecause of the substitution lemma

More formally we can show the thesis by induction over environments for empty environmentsthe equations reduces to n t o = n t o For the case of Γ x σ (where x lt Γ) the thesis can berewritten as

n (t [Γ = Γ oplus ∆Γ ]) [x = x oplus dx ] o (dρ x = v1 dx = dv) = n t o (ρ1 oplus dρ x = v1 oplus dv)

We prove it via the following calculation

n (t [Γ = Γ oplus ∆Γ ]) [x = x oplus dx ] o (dρ x = v1 dx = dv)= Substitution lemma on x

n (t [Γ = Γ oplus ∆Γ ]) o (dρ x = (n x oplus dx o (dρ x = v1 dx = dv)) dx = dv)= Term-level oplus agrees with oplus (Lemma 1349)

Then simplify n x o and n dx o n (t [Γ = Γ oplus ∆Γ ]) o (dρ x = v1 oplus dv dx = dv)

= t [Γ = Γ oplus ∆Γ ] does not mention dx So we can modify the environment entry for dx This makes the environment into a valid environment change

n (t [Γ = Γ oplus ∆Γ ]) o (dρ x = v1 oplus dv dx = 0v1oplusdv)= By applying the induction hypothesis and simplifying 0 away

n t o (ρ1 oplus dρ x = v1 oplus dv)


14.2 Change equivalence

To optimize programs manipulating changes, we often want to replace a change-producing term by another one while preserving the overall program meaning. Hence, we define an equivalence on valid changes that is preserved by change operations, that is (in spirit) a congruence. We call this relation (change) equivalence, and refrain from using other equivalences on changes.

Earlier (say, in Sec. 15.3) we have sometimes required that changes be equal, but that's often too restrictive.

Change equivalence is defined in terms of validity, to ensure that validity-preserving operations preserve change equivalence. If two changes dv1 and dv2 are equivalent, one can be substituted for the other in a validity-preserving context. We can define this once for all change structures, hence in particular for values and environments.

Definition 14.2.1 (Change equivalence)
Given a change structure on V, changes dva, dvb : ∆V are equivalent relative to source v1 : V (written dva =∆ dvb ▷ v1 ↪ v2 : V) if and only if there exists v2 such that both dva and dvb are valid from v1 to v2 (that is, dva ▷ v1 ↪ v2 : V and dvb ▷ v1 ↪ v2 : V).

Notation 14.2.2
When not ambiguous, we often abbreviate dva =∆ dvb ▷ v1 ↪ v2 : V as dva =∆^v1 dvb, or dva =∆ dvb.

Two changes are often equivalent relative to one source but not to others. Hence, dva =∆ dvb is always an abbreviation for change equivalence relative to a specific source.

Example 14.2.3
For instance, we later use a change structure for integers using both replacement changes and differences (Example 12.1.6). In this structure, change 0 is a nil change for all numbers, while change !5 ("bang 5") replaces any number with 5. Hence, changes 0 and !5 are equivalent only relative to source 5, and we write 0 =∆^5 !5.
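To make this concrete, here is a minimal Scala sketch of such a change structure on integers; the names IntChange, Diff, Bang, oplus and equivalentAt are ours, introduced only for illustration:

// A change to an integer: either an additive difference or a replacement.
sealed trait IntChange
case class Diff(d: Int) extends IntChange   // v ⊕ Diff(d) = v + d
case class Bang(n: Int) extends IntChange   // v ⊕ Bang(n) = n

def oplus(v: Int, dv: IntChange): Int = dv match {
  case Diff(d) => v + d
  case Bang(n) => n
}

// Two changes are equivalent relative to source v1 iff they share a destination.
def equivalentAt(v1: Int, dva: IntChange, dvb: IntChange): Boolean =
  oplus(v1, dva) == oplus(v1, dvb)

// equivalentAt(5, Diff(0), Bang(5)) == true  (both map 5 to 5)
// equivalentAt(6, Diff(0), Bang(5)) == false (6 vs 5)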

By applying definitions, one can verify that change equivalence relative to a source v is a symmetric and transitive relation on ∆V. However, it is not an equivalence relation on ∆V, because it is only reflexive on changes valid for source v. Using the set-theoretic concept of subset, we can then state the following lemma (whose proof we omit as it is brief):

Lemma 14.2.4 (=∆ is an equivalence on valid changes)
For each set V and source v ∈ V, change equivalence relative to source v is an equivalence relation over the set of changes

{dv ∈ ∆V | dv is valid with source v}.

We elaborate on this peculiar sort of equivalence in Sec. 14.2.3.

14.2.1 Preserving change equivalence

Change equivalence relative to a source v is respected, in an appropriate sense, by all validity-preserving expression contexts that accept changes with source v. To explain what this means, we study an example lemma: we show that, because valid function changes preserve validity, they also respect change equivalence. At first we use "(expression) context" informally, to refer to expression contexts in the metalanguage. Later, we'll extend our discussion to actual expression contexts in the object language.

Lemma 14.2.5 (Valid function changes respect change equivalence)
Any valid function change

df ▷ f1 ↪ f2 : A → B


respects change equivalence: if dva =∆ dvb ▷ v1 ↪ v2 : A, then df v1 dva =∆ df v1 dvb ▷ f1 v1 ↪ f2 v2 : B. We also say that (expression) context df v1 – respects change equivalence.

Proof. The thesis means that df v1 dva ▷ f1 v1 ↪ f2 v2 : B and df v1 dvb ▷ f1 v1 ↪ f2 v2 : B. Both validities follow in one step from the validity of df, dva and dvb. □

This lemma holds because the source and destination of df v1 dv don't depend on dv, only on its source and destination. Source and destination are shared by equivalent changes. Hence, validity-preserving functions map equivalent changes to equivalent changes.

In general, all operations that preserve validity also respect change equivalence, because for all those operations the source and destination of any output changes, and the resulting value, only depend on the source and destination of input changes.

However, Lemma 14.2.5 does not mean that df v1 dva = df v1 dvb, because there can be multiple changes with the same source and destination. For instance, say that dva is a list change that removes an element and re-adds it, and dvb is a list change that describes no modification. They are both nil changes, but a function change might handle them differently.

Moreover, we only proved that context df v1 – respects change equivalence relative to source v1. If value v3 differs from v1, then df v3 dva and df v3 dvb need not be equivalent. Hence, we say that context df v1 – accepts changes with source v1. More in general, a context accepts changes with source v1 if it preserves validity for changes with source v1; we can say, informally, that all such contexts also respect change equivalence.

Another example: context v1 ⊕ – also accepts changes with source v1. Since this context produces a base value and not a change, it maps equivalent changes to equal results.

Lemma 14.2.6 (⊕ respects change equivalence)
If dva =∆ dvb ▷ v1 ↪ v2 : V, then v1 ⊕ – respects the equivalence between dva and dvb, that is, v1 ⊕ dva = v1 ⊕ dvb.

Proof. v1 ⊕ dva = v2 = v1 ⊕ dvb. □

There are more contexts that preserve equivalence. As discussed, valid function changes respect change equivalence, and D⟦–⟧ produces function changes, so D⟦t⟧ preserves equivalence on its environment and on any of its free variables.

Lemma 14.2.7 (D⟦–⟧ preserves change equivalence)
For any term Γ ⊢ t : τ, D⟦t⟧ preserves change equivalence of environments: for all dρa =∆ dρb ▷ ρ1 ↪ ρ2 : Γ, we have ⟦D⟦t⟧⟧ dρa =∆ ⟦D⟦t⟧⟧ dρb ▷ ⟦t⟧ ρ1 ↪ ⟦t⟧ ρ2 : Γ → τ.

Proof. To verify this, just apply correctness of differentiation to both changes dρa and dρb. □

To show more formally in what sense change equivalence is a congruence, we first lift change equivalence to terms (Definition 14.2.9), similarly to syntactic change validity in Sec. 14.1.2. To do so, we first need a notation for one-sided, or source-only, validity.

Notation 14.2.8 (One-sided validity)
We write dv ▷ v1 : V to mean there exists v2 such that dv ▷ v1 ↪ v2 : V. We reuse existing conventions and write dv ▷ v1 : τ instead of dv ▷ v1 : ⟦τ⟧, and dρ ▷ ρ1 : Γ instead of dρ ▷ ρ1 : ⟦Γ⟧.

Definition 14.2.9 (Syntactic change equivalence)
Two change terms ∆Γ ⊢ dta : ∆τ and ∆Γ ⊢ dtb : ∆τ are change equivalent, relative to source Γ ⊢ t : τ, if for all valid environment changes dρ ▷ ρ1 ↪ ρ2 : Γ we have that

⟦dta⟧ dρ =∆ ⟦dtb⟧ dρ ▷ ⟦t⟧ ρ1 : τ.

We then write Γ ⊨ dta =∆ dtb ◁ t : τ, or dta =∆^t dtb, or simply dta =∆ dtb.


Saying that dta and dtb are equivalent relative to t does not specify the destination of dta and dtb, only their source. The only reason is to simplify the statement and proof of Theorem 14.2.10.

If two change terms are change equivalent with respect to the right source, we can replace one for the other in an expression context to optimize a program, as long as the expression context is validity-preserving and accepts the change.

In particular, substituting into D⟦t⟧ preserves syntactic change equivalence, according to the following theorem (for which we have only a pen-and-paper formal proof).

Theorem 14.2.10 (D⟦–⟧ preserves syntactic change equivalence)
For any equivalent changes Γ ⊨ dsa =∆ dsb ◁ s : σ, and for any term t typed as Γ, x : σ ⊢ t : τ, we can produce equivalent results by substituting into D⟦t⟧ either s and dsa, or s and dsb:

Γ ⊨ D⟦t⟧[x := s, dx := dsa] =∆ D⟦t⟧[x := s, dx := dsb] ◁ t[x := s] : τ.

Proof sketch. The thesis holds because D⟦–⟧ preserves change equivalence (Lemma 14.2.7). A formal proof follows through routine (and tedious) manipulations of bindings. In essence, we can extend a change environment dρ for context Γ to equivalent environment changes for context Γ, x : σ, with the values of dsa and dsb. The tedious calculations follow. □

Proof. Assume dρ ▷ ρ1 ↪ ρ2 : Γ. Because dsa and dsb are change-equivalent, we have

⟦dsa⟧ dρ =∆ ⟦dsb⟧ dρ ▷ ⟦s⟧ ρ1 : σ.

Moreover, ⟦s⟧ ρ1 = ⟦s⟧ dρ, because dρ extends ρ1. We'll use this equality without explicit mention.

Hence, we can construct change-equivalent environments for evaluating D⟦t⟧, by combining dρ and the values of, respectively, dsa and dsb:

(dρ, x = ⟦s⟧ dρ, dx = ⟦dsa⟧ dρ) =∆
(dρ, x = ⟦s⟧ dρ, dx = ⟦dsb⟧ dρ)
▷ (ρ1, x = ⟦s⟧ ρ1) : (Γ, x : σ)    (14.1)

This environment change equivalence is respected by D⟦t⟧, hence:

⟦D⟦t⟧⟧ (dρ, x = ⟦s⟧ dρ, dx = ⟦dsa⟧ dρ) =∆
⟦D⟦t⟧⟧ (dρ, x = ⟦s⟧ dρ, dx = ⟦dsb⟧ dρ)
▷ ⟦t⟧ (ρ1, x = ⟦s⟧ ρ1) : Γ → τ    (14.2)

We want to deduce the thesis by applying to this statement the substitution lemma for denotational semantics: ⟦t⟧ (ρ, x = ⟦s⟧ ρ) = ⟦t[x := s]⟧ ρ.

To apply the substitution lemma to the substitution of dx, we must adjust Eq. (14.2) using soundness of weakening. We get:

⟦D⟦t⟧⟧ (dρ, x = ⟦s⟧ dρ, dx = ⟦dsa⟧ (dρ, x = ⟦s⟧ dρ)) =∆
⟦D⟦t⟧⟧ (dρ, x = ⟦s⟧ dρ, dx = ⟦dsb⟧ (dρ, x = ⟦s⟧ dρ))
▷ ⟦t⟧ (ρ1, x = ⟦s⟧ ρ1) : Γ → τ    (14.3)


This equation can now be rewritten (by applying the substitution lemma to the substitutions of dx and x) to the following one:

⟦(D⟦t⟧[dx := dsa])[x := s]⟧ dρ =∆
⟦(D⟦t⟧[dx := dsb])[x := s]⟧ dρ
▷ ⟦t[x := s]⟧ ρ1 : Γ → τ    (14.4)

Since x is not in scope in s, dsa, dsb, we can permute substitutions to conclude that

Γ ⊨ D⟦t⟧[x := s, dx := dsa] =∆ D⟦t⟧[x := s, dx := dsb] ◁ t[x := s] : τ,

as required. □

In this theorem, if x appears once in t then dx appears once in D⟦t⟧ (this follows by induction on t), hence D⟦t⟧[x := s, dx := –] produces a one-hole expression context.

Further validity-preserving contexts. There are further operations that preserve validity. To represent terms with "holes" where other terms can be inserted, we can define one-level contexts F, and contexts E, as is commonly done:

F ::= [ ] t dt | ds t [ ] | λx dx → [ ] | t ⊕ [ ] | dt1 ⊚ [ ] | [ ] ⊚ dt2
E ::= [ ] | F[E]

If dt1 =∆ dt2 ▷ t1 ↪ t2 : τ and our context E accepts changes from t1, then F[dt1] and F[dt2] are change equivalent. It is easy to prove such a lemma for each possible shape of one-level context F, both on values (like Lemmas 14.2.5 and 14.2.6) and on terms. We have been unable to state a more general theorem, because it's not clear how to formalize the notion of a context accepting a change in general: the syntax of a context does not always hint at the validity proofs embedded in it.

Cai et al. [2014] solve this problem for metalevel contexts by typing them with dependent types, but, as discussed, the overall proof is more awkward. Alternatively, it appears that the use of dependent types in Chapter 18 also ensures that change equivalence is a congruence (though at present this is still a conjecture), without overly complicating correctness proofs. However, it is not clear whether such a type system can be expressive enough without requiring additional coercions. Consider a change dv1 from v1 to v1 ⊕ dv1, a value v2 which is known to be (propositionally) equal to v1 ⊕ dv1, and a change dv2 from v2 to v3. Then term dv1 ⊚ dv2 is not type-correct (for instance in Agda): the typechecker will complain that dv1 has destination v1 ⊕ dv1 while dv2 has source v2. When working in Agda, to solve this problem we can explicitly coerce terms through propositional equalities, and can use Agda to prove such equalities in the first place. We leave the design of a sufficiently expressive object language where change equivalence is a congruence for future work.

14.2.2 Sketching an alternative syntax

If we exclude composition, we can sketch an alternative syntax, which helps construct a congruence on changes. The idea is to manipulate, instead of changes alone, pairs of sources v ∈ V and valid changes {dv | dv ▷ v : V}:

t ::= src dt | dst dt | x | t t | λx → t
dt ::= dt dt | src dt | λdx → dt | dx

Adapting differentiation to this syntax is easy:


D⟦x⟧ = dx
D⟦s t⟧ = D⟦s⟧ D⟦t⟧
D⟦λx → t⟧ = λdx → D⟦t⟧

Derivatives of primitives only need to use dst dt instead of t ⊕ dt, and src dt instead of t, whenever dt is a change for t.

With this syntax, we can define change expression contexts, which can be filled in by change expressions:

E ::= src dE | dst dE | E t | t E | λx → E
dE ::= dt dE | dE dt | λdx → dE | [ ]

We conjecture that change equivalence is a congruence with respect to contexts dE, and that contexts E map change-equivalent changes to results that are denotationally equivalent for valid changes. We leave a proof to future work, but we expect it to be straightforward. It also appears straightforward to provide an isomorphism between this syntax and the standard one.

However, extending such contexts with composition does not appear trivial: contexts such as dt ⊚ dE or dE ⊚ dt only respect validity when the changes' sources and destinations align correctly.

We make no further use of this alternative syntax in this work

14.2.3 Change equivalence is a PER

Readers with relevant experience will recognize that change equivalence is a partial equivalence relation (PER) [Mitchell 1996, Ch. 5]. It is standard to use PERs to identify valid elements in a model [Harper 1992]. In this section we state the connection, showing that change equivalence is not an ad-hoc construction, so that mathematical constructions using PERs can be adapted to use change equivalence.

We recall the definition of a PER.

Definition 14.2.11 (Partial equivalence relation (PER))
A relation R ⊆ S × S is a partial equivalence relation if it is symmetric (if a R b then b R a) and transitive (if a R b and b R c then a R c).

Elements related to another element are also related to themselves: if a R b then a R a (by transitivity: a R b, b R a, hence a R a). So a PER on S identifies a subset of valid elements of S. Since PERs are equivalence relations on that subset, they also induce a (partial) partition of the elements of S into equivalence classes of change-equivalent elements.

Lemma 14.2.12 (=∆ is a PER)
Change equivalence relative to a source a : A is a PER on the set ∆A.

Proof. A restatement of Lemma 14.2.4. □

Typically, one studies logical PERs, which are logical relations and PERs at the same time [Mitchell 1996, Ch. 8]. In particular, with a logical PER, two functions are related if they map related inputs to related outputs. This helps showing that a PER is a congruence. Luckily, our PER is equivalent to such a definition.

Lemma 14.2.13 (Alternative definition for =∆)
Change equivalence is equivalent to the following logical relation:

dva =∆ dvb ▷ v1 ↪ v2 : ι  =def  dva ▷ v1 ↪ v2 : ι and dvb ▷ v1 ↪ v2 : ι


dfa =∆ dfb ▷ f1 ↪ f2 : σ → τ  =def
∀ dva =∆ dvb ▷ v1 ↪ v2 : σ. dfa v1 dva =∆ dfb v1 dvb ▷ f1 v1 ↪ f2 v2 : τ

Proof. By induction on types. □

14.3 Chapter conclusion

In this chapter we have put on a more solid foundation forms of reasoning about changes on terms, and defined an appropriate equivalence on changes.


Chapter 15

Extensions and theoretical discussion

In this chapter we collect discussion of a few additional topics related to ILC that do not suffice for standalone chapters. We show how to differentiate general recursion (Sec. 15.1), we exhibit a function change that is not valid for any function (Sec. 15.2), we contrast our representation of function changes with pointwise function changes (Sec. 15.3), and we compare our formalization with the one presented in Cai et al. [2014] (Sec. 15.4).

15.1 General recursion

This section discusses, informally, how to differentiate terms using general recursion, and what is the behavior of the resulting terms.

15.1.1 Differentiating general recursion

Earlier we gave a rule for deriving (non-recursive) let:

D⟦let x = t1 in t2⟧ = let x = t1
                          dx = D⟦t1⟧
                      in D⟦t2⟧

It turns out that we can use the same rule also for recursive let-bindings, which we write here (and only here) letrec for distinction:

D⟦letrec x = t1 in t2⟧ = letrec x = t1
                                dx = D⟦t1⟧
                         in D⟦t2⟧

Example 15.1.1
In Example 14.1.1 we presented a derivative dmap for map. By using the rules for differentiating recursive functions, we obtain dmap as presented from map:

dmap f df Nil Nil = Nil
dmap f df (Cons x xs) (Cons dx dxs) =
  Cons (df x dx) (dmap f df xs dxs)


However, derivative dmap is not asymptotically faster than map. Even when we consider less trivial change structures, derivatives of recursive functions produced using this rule are often not asymptotically faster. Deriving letrec x = t1 in t2 can still be useful if D⟦t1⟧ and/or D⟦t2⟧ is faster than its base term, but during our work we focus mostly on using structural recursion. Alternatively, in Chapter 11 and Sec. 11.2 we have shown how to incrementalize functions (including recursive ones) using equational reasoning.

In general, when we invoke dmap on a change dxs from xs1 to xs2, it is important that xs1 and xs2 are similar enough that enough computation can be reused. Say that xs1 = Cons 2 (Cons 3 (Cons 4 Nil)) and xs2 = Cons 1 (Cons 2 (Cons 3 (Cons 4 Nil))): in this case, a change modifying each element of xs1 and then replacing Nil by Cons 4 Nil would be inefficient to process, and naive incrementalization would produce this scenario. In this case it is clear that a preferable change should simply insert 1 at the beginning of the list, as illustrated in Sec. 11.3.2 (though we have omitted the straightforward definition of dmap for such a change structure; one possible shape is sketched below). In approaches like self-adjusting computation, this is ensured by using memoization. In our approach, instead, we rely on changes that are nil or small to detect when a derivative can reuse input computation.
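For illustration, here is one possible Scala shape for such a list change structure, with changes aligned to the spine of the source list. The constructor names and this dmap are our own sketch, not the omitted definition from the thesis:

// A change to a cons-list, aligned with the spine of the source list.
sealed trait ListChange[A, DA]
case class Insert[A, DA](x: A, rest: ListChange[A, DA]) extends ListChange[A, DA]
case class Remove[A, DA](rest: ListChange[A, DA]) extends ListChange[A, DA]
case class Patch[A, DA](dx: DA, rest: ListChange[A, DA]) extends ListChange[A, DA]
case class End[A, DA]() extends ListChange[A, DA]

// dmap for this change structure: an Insert costs one call to f, a Patch one
// call to df; inserting 1 at the head of the list is processed in O(1) work
// before the (cheap, nil) tail change.
def dmap[A, DA, B, DB](f: A => B, df: (A, DA) => DB)(
    xs: List[A], dxs: ListChange[A, DA]): ListChange[B, DB] = (xs, dxs) match {
  case (_, Insert(x, rest))      => Insert(f(x), dmap(f, df)(xs, rest))
  case (_ :: t, Remove(rest))    => Remove(dmap(f, df)(t, rest))
  case (h :: t, Patch(dx, rest)) => Patch(df(h, dx), dmap(f, df)(t, rest))
  case (_, End())                => End()
}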

The same problem affects naive attempts to incrementalize, for instance, a simple factorial function; we omit the details. Because of these issues, we focus on incrementalization of structurally recursive functions, and on incrementalizing generally recursive primitives using equational reasoning. We return to this issue in Chapter 20.

15.1.2 Justification

Here we justify, informally, the rule for differentiating recursive functions, using fixpoint operators.

Let's consider STLC extended with a family of standard fixpoint combinators fix_τ : (τ → τ) → τ, with fix-reduction defined by the equation fix f → f (fix f); we search for a definition of D⟦fix f⟧.

Using informal equational reasoning, if a correct definition of D⟦fix f⟧ exists, it must satisfy

D⟦fix f⟧ ≅ fix (D⟦f⟧ (fix f))

We can proceed as follows:

D⟦fix f⟧
=   (imposing that D⟦–⟧ respects fix-reduction here)
D⟦f (fix f)⟧
=   (using the rules for D⟦–⟧ on application)
D⟦f⟧ (fix f) D⟦fix f⟧

This is a recursive equation in D⟦fix f⟧, so we can try to solve it using fix itself:

D⟦fix f⟧ = fix (λdfixf → D⟦f⟧ (fix f) dfixf)

Indeed, this rule gives a correct derivative. Formalizing our reasoning using denotational semantics would presumably require the use of domain theory. Instead, we prove correct a variant of fix in Appendix C, but using operational semantics.

15.2 Completely invalid changes

In some change sets, some changes might not be valid relative to any source. In particular, we can construct examples in ∆(Z → Z).


To understand why this is plausible, we recall that, as described in Sec. 15.3, df can be decomposed into a derivative and a pointwise function change that is independent of da. While pointwise changes can be defined arbitrarily, the behavior of the derivative of f on changes is determined by the behavior of f.

Example 15.2.1
We search for a function change df : ∆(Z → Z) such that there exist no f1, f2 : Z → Z for which df ▷ f1 ↪ f2 : Z → Z. To find df, we assume that there are f1, f2 such that df ▷ f1 ↪ f2 : Z → Z, prove a few consequences, and construct a df that cannot satisfy them. Alternatively, we could pick the desired definition for df right away, and prove by contradiction that there exist no f1, f2 such that df ▷ f1 ↪ f2 : Z → Z.

Recall that on integers a1 ⊕ da = a1 + da, and that da ▷ a1 ↪ a2 : Z means a2 = a1 ⊕ da = a1 + da. So, for any numbers a1, da, a2 such that a1 + da = a2, validity of df implies that

f2 (a1 + da) = f1 a1 + df a1 da.

For any two numbers b1, db such that b1 + db = a1 + da, we have that

f1 a1 + df a1 da = f2 (a1 + da) = f2 (b1 + db) = f1 b1 + df b1 db.

Rearranging terms, we have

df a1 da − df b1 db = f1 b1 − f1 a1,

that is, df a1 da − df b1 db does not depend on da and db.

For concreteness, let us fix a1 = 0, b1 = 1, and a1 + da = b1 + db = s. We have then that

df 0 s − df 1 (s − 1) = f1 1 − f1 0.

Once we set h = f1 1 − f1 0, we have df 0 s − df 1 (s − 1) = h. Because s is just the sum of two arbitrary numbers, while h only depends on f1, this equation must hold for a fixed h and for all integers s.

To sum up: we assumed that, for a given df, there exist f1, f2 such that df ▷ f1 ↪ f2 : Z → Z, and concluded that there exists h = f1 1 − f1 0 such that, for all s,

df 0 s − df 1 (s − 1) = h.

At this point we can try concrete families of functions df to obtain a contradiction. Substituting a linear polynomial df a da = c1 · a + c2 · da fails to obtain a contradiction: in fact, we can construct various f1, f2 such that df ▷ f1 ↪ f2 : Z → Z. So we try quadratic polynomials. Substituting df a da = c · da² succeeds: we have that there is h such that, for all integers s,

c · (s² − (s − 1)²) = h.

However, c · (s² − (s − 1)²) = 2 · c · s − c, which isn't constant, so there can be no such h. □
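For a quick sanity check of the final step (taking c = 1), one can evaluate the left-hand side for a few values of s; the following Scala snippet (our own illustration) shows it is not constant:

def df(a: Int, da: Int): Int = da * da  // the candidate change, with c = 1

// df(0, s) - df(1, s - 1) = s^2 - (s - 1)^2 = 2*s - 1, which varies with s:
val diffs = (1 to 4).map(s => df(0, s) - df(1, s - 1))  // Vector(1, 3, 5, 7)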

15.3 Pointwise function changes

We can also decompose function changes into orthogonal (and possibly easier to understand) concepts.

Consider two functions f1, f2 : A → B and two inputs a1, a2 : A. The difference between f2 a2 and f1 a1 is due to changes to both the function and its argument. We can compute the whole change at once, via a function change df, as df a1 da. Or we can compute separately the effects of the function change and of the argument change. We can account for changes from f1 a1 to f1 a2 using f1′, a derivative of f1: f1′ a1 da = f1 a2 ⊖ f1 a1 = f1 (a1 ⊕ da) ⊖ f1 a1.¹

We can account for changes from f1 to f2 using the pointwise difference of two functions, ∇f = λ(a : A) → f2 a ⊖ f1 a; in particular, f2 (a1 ⊕ da) ⊖ f1 (a1 ⊕ da) = ∇f (a1 ⊕ da). Hence, a function change simply combines a derivative with a pointwise change, using change composition:

df a1 da = f2 a2 ⊖ f1 a1
         = (f1 a2 ⊖ f1 a1) ⊚ (f2 a2 ⊖ f1 a2)
         = f1′ a1 da ⊚ ∇f (a1 ⊕ da)    (15.1)

One can also compute a pointwise change from a function change:

∇f a = df a 0a

(indeed, instantiating Eq. (15.1) with da = 0a gives df a 0a = f1′ a 0a ⊚ ∇f a, and f1′ a 0a is a nil change, so composing with it changes nothing).

While some might find pointwise changes a more natural concept, we find it easier to use our definition of function changes, which combines both pointwise changes and derivatives into a single concept. Some related works explore the use of pointwise changes; we discuss them in Sec. 19.2.2.

15.4 Modeling only valid changes

In this section we contrast, briefly, the formalization of ILC in this thesis (for short, ILC'17) with the one we used in our first formalization [Cai et al. 2014] (for short, ILC'14). We keep the discussion somewhat informal; we have sketched proofs of our claims and mechanized some, but we omit all proofs here. We discuss both formalizations using our current notation and terminology, except for concepts that are not present here.

Both formalizations model function changes semantically, but the two models we present are different. Overall, ILC'17 uses simpler machinery and seems easier to extend to more general base languages, and the mechanization of ILC'17 appears simpler and smaller. Instead, ILC'14 studies additional, but better behaved, entities.

In ILC'17, input and output domains of function changes contain invalid changes, while in ILC'14 these domains are restricted to valid changes via dependent types. ILC'14 also considers the denotation of D⟦t⟧, whose domains include invalid changes, but such denotations are studied only indirectly. In both cases, function changes must map valid changes to valid changes. But ILC'14 application df v1 dv is only well-typed if dv is a change valid from v1, hence we can simply say that df v1 respects change equivalence. As discussed in Sec. 14.2, in ILC'17 the analogous property has a trickier statement: we can write df v1 and apply it to arbitrary equivalent changes dv1 =∆ dv2, even if their source is not v1, but such change equivalences are not preserved.

We can relate the two models by defining a logical relation called erasure (similar to the one described by Cai et al.): an ILC'14 function change df erases to an ILC'17 function change df′ relative to source f : A → B if, given any change da that erases to da′ relative to source a1 : A, output change df a1 da erases to df′ a1 da′ relative to source f a1. For base types, erasure simply connects corresponding da (with source) with da′, in a manner dependent on the base type (often, just throwing away any embedded proofs of validity).

¹For simplicity, we use equality on changes, even though equality is too restrictive. Later (in Sec. 14.2) we'll define an equivalence relation on changes, called change equivalence and written =∆, and use it systematically to relate changes in place of equality. For instance, we'll write that f1′ a1 da =∆ f1 (a1 ⊕ da) ⊖ f1 a1. But for the present discussion, equality will do.


In all cases, one can show that dv erases to dv′ with source v1 if and only if v1 ⊕ dv = v1 ⊕ dv′ (for suitable variants of ⊕); in other words, dv and dv′ share source and destination (technically, ILC'17 changes have no fixed source, so we say that they are changes from v1 to v2 for some v2).

In ILC'14 there is a different incremental semantics ⟦t⟧∆ for terms t, but it is still a valid ILC'14 change. One can show that ⟦t⟧∆ (as defined in ILC'14) erases to ⟦D⟦t⟧⟧ (as defined in ILC'17) relative to source ⟦t⟧; in fact, the needed proof is sketched by Cai et al., though in disguise.

It seems clear that there is no isomorphism between ILC'14 changes and ILC'17 changes. An ILC'17 function change also accepts invalid changes, and the behavior on those changes can't be preserved by an isomorphism. Worse, it seems hard to define even a non-isomorphic mapping taking an ILC'14 change df to an ILC'17 change erase df: we would have to define behavior for (erase df) a da even when da is invalid. As long as we work in a constructive setting, we cannot decide whether da is valid in general, because da can be a function change with infinite domain.

We can give, however, a definition that does not need to detect such invalid changes: just extract source and destination from a function change using valid change 0v, and take the difference of source and destination using ⊖ in the target system:

unerase (σ → τ) df′ = let f = λv → df′ v 0v in (f ⊕ df′) ⊖ f
unerase _ dv′ = ...
erase (σ → τ) df = let f = λv → df v 0v in (f ⊕ df) ⊖ f
erase _ dv = ...

We define these functions by induction on types (for elements of ∆τ, not arbitrary change structures), and we overload ⊖ for ILC'14 and ILC'17. We conjecture that, for all types τ and for all ILC'17 changes dv′ (of the right type), unerase τ dv′ erases to dv′, and that all ILC'14 changes dv erase to erase τ dv.

Erasure is a well-behaved logical relation, similar to the ones relating source and destination languages of a compiler, and to partial equivalence relations. In particular, it also induces partial equivalence relations (PERs) (see Sec. 14.2.3), both on ILC'14 changes and on ILC'17 changes: two ILC'14 changes are equivalent if they erase to the same ILC'17 change, and two ILC'17 changes are equivalent if the same ILC'14 change erases to both. Both relations are partial equivalence relations (and total on valid changes). Because changes that erase to each other share source and destination, these induced equivalences again coincide with change equivalence. That both relations are PERs also means that erasure is a so-called quasi-PER [Krishnaswami and Dreyer 2013]. Quasi-PERs are a natural (though not obvious) generalization of PERs for relations among different sets R ⊆ S1 × S2; such relations cannot be either symmetric or transitive. However, we make no use of additional properties of quasi-PERs, hence we don't discuss them in further detail.

15.4.1 One-sided vs. two-sided validity

There are also further, superficial differences between the two definitions. In ILC'14, changes valid with source a : A have dependent type ∆a. This dependent type is indexed by the source but not by the destination. Dependent function changes with source f : A → B have type (a : A) → ∆a → ∆(f a), relating the behavior of function change df with the behavior of f on original inputs. But this is half of function validity: to relate the behavior of df with the behavior of f ⊕ df on updated inputs, in ILC'14 valid function changes have to satisfy an additional equation called preservation of future:²

f1 a1 ⊕ df a1 da = (f1 ⊕ df) (a1 ⊕ da)

²Name suggested by Yufei Cai.


This equation appears inelegant, and mechanized proofs were often complicated by the need to perform rewritings using it. Worse, to show that a function change is valid, we have to use different approaches to prove it has the correct source and the correct destination.

This difference is, however, superficial. If we replace f1 ⊕ df with f2 and a1 ⊕ da with a2, this equation becomes f1 a1 ⊕ df a1 da = f2 a2, a consequence of df ▷ f1 ↪ f2 : A → B. So one might suspect that ILC'17 valid function changes also satisfy this equation. This is indeed the case:

Lemma 15.4.1
A valid function change df ▷ f1 ↪ f2 : A → B satisfies the equation

f1 a1 ⊕ df a1 da = (f1 ⊕ df) (a1 ⊕ da)

on any valid input da ▷ a1 ↪ a2 : A.

Conversely, one can also show that ILC'14 function changes satisfy two-sided validity as defined in ILC'17. Hence, the only true difference between the ILC'14 and ILC'17 models is the one we discussed earlier, namely whether function changes can be applied to invalid inputs or not.

We believe it could be possible to formalize the ILC'14 model using two-sided validity, by defining a dependent type of valid changes: ∆₂ (A → B) f1 f2 = (a1 a2 : A) → ∆₂ A a1 a2 → ∆₂ B (f1 a1) (f2 a2). We provide more details on such a transformation in Chapter 18.

Models restricted to valid changes (like ILC'14) are related to models based on directed graphs and reflexive graphs, where values are graph vertices and changes are edges between change source and change destination (as hinted earlier). In graph language, validity preservation means that function changes are graph homomorphisms.

Based on similar insights, Atkey [2015] suggests modeling ILC using reflexive graphs, which have been used to construct parametric models for System F and extensions, and calls for research on the relation between ILC and parametricity. As follow-up work, Cai [2017] studies models of ILC based on directed and reflexive graphs.

Chapter 16

Differentiation in practice

In practice, successful incrementalization requires both correctness and performance of the derivatives. Correctness of derivatives is guaranteed by the theoretical development in the previous sections, together with the interface for differentiation and proof plugins, whereas performance of derivatives has to come from careful design and implementation of differentiation plugins.

16.1 The role of differentiation plugins

Users of our approach need to (1) choose which base types and primitives they need, (2) implement suitable differentiation plugins for these base types and primitives, (3) rewrite (relevant parts of) their programs in terms of these primitives, and (4) arrange for their program to be called on changes instead of updated inputs.

As discussed in Sec. 10.5, differentiation supports abstraction, application and variables, but since computation on base types is performed by primitives for those types, efficient derivatives for primitives are essential for good performance.

To make such derivatives efficient, change types must also have efficient implementations, and allow describing precisely what changed. The efficient derivative of sum in Chapter 10 is possible only if bag changes can describe deletions and insertions, and integer changes can describe additive differences.

For many conceivable base types, we do not have to design the differentiation plugins from scratch. Instead, we can reuse the large body of existing research on incrementalization in first-order and domain-specific settings. For instance, we reuse the approach from Gluche et al. [1997] to support incremental bags and maps. By wrapping a domain-specific incrementalization result in a differentiation plugin, we adapt it to be usable in the context of a higher-order and general-purpose programming language, and in interaction with other differentiation plugins for the other base types of that language.

For base types with no known incrementalization strategy, the precise interfaces for differentiation and proof plugins can guide the implementation effort. These interfaces could also form the basis for a library of differentiation plugins that work well together.

Rewriting whole programs in our language would be an excessive requirement. Instead, we embed our object language as an EDSL in some more expressive meta-language (Scala in our case study), so that embedded programs are reified. The embedded language can be made to resemble the metalanguage [Rompf and Odersky 2010]. To incrementalize a part of a computation, we write it in our embedded object language, invoke D on the embedded program, optionally optimize the resulting programs, and finally invoke them. The metalanguage also acts as a macro system for the object language, as usual. This allows us to simulate polymorphic collections such as (Bag ι), even though the object language is simply-typed; technically, our plugin exposes a family of base types to the object language.


histogram : Map Int (Bag word) → Map word Int
histogram = mapReduce groupOnBags additiveGroupOnIntegers histogramMap histogramReduce
  where additiveGroupOnIntegers = Group (+) (λn → −n) 0
        histogramMap = foldBag groupOnBags (λn → singletonBag (n, 1))
        histogramReduce = foldBag additiveGroupOnIntegers id

-- Precondition:
-- For every key1 : k1 and key2 : k2, the terms mapper key1 and reducer key2 are homomorphisms.
mapReduce : Group v1 → Group v3 → (k1 → v1 → Bag (k2, v2)) → (k2 → Bag v2 → v3) → Map k1 v1 → Map k2 v3
mapReduce group1 group3 mapper reducer = reducePerKey ∘ groupByKey ∘ mapPerKey
  where mapPerKey = foldMap group1 groupOnBags mapper
        groupByKey = foldBag (groupOnMaps groupOnBags) (λ(key, val) → singletonMap key (singletonBag val))
        reducePerKey = foldMap groupOnBags (groupOnMaps group3) (λkey bag → singletonMap key (reducer key bag))

Figure 16.1: The λ-term histogram, with Haskell-like syntactic sugar. additiveGroupOnIntegers is the abelian group induced on integers by addition, (Z, +, 0, −).


16.2 Predicting nil changes

Handling changes to all inputs can induce excessive overhead in incremental programs [Acar 2009]. It is also often unnecessary: for instance, the function argument of fold in Chapter 10 does not change, since it is a closed subterm of the program, so fold will receive a nil change for it. A (conservative) static analysis can detect changes that are guaranteed to be nil at runtime. We can then specialize derivatives that receive this change, so that they need not inspect the change at runtime.

For our case study, we have implemented a simple static analysis which detects and propagates information about closed terms. The analysis is not interesting, and we omit the details for lack of space.

16.2.1 Self-maintainability

In databases, a self-maintainable view [Gupta and Mumick 1999] is a function that can update its result from input changes alone, without looking at the actual input. By analogy, we call a derivative self-maintainable if it uses no base parameters, only their changes. Self-maintainable derivatives describe efficient incremental computations: since they do not use their base input, their running time does not have to depend on the input size.

Example 16.2.1
D⟦merge⟧ = λx dx y dy → merge dx dy is self-maintainable with the change structure Bag S described in Example 10.3.1, because it does not use the base inputs x and y. Other derivatives are self-maintainable only in certain contexts. The derivative of element-wise function application (map f xs) ignores the original value of the bag xs if the changes to f are always nil, because the underlying primitive foldBag is self-maintainable in this case (as discussed in the next section). We take advantage of this by implementing a specialized derivative for foldBag.
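As a concrete sketch in the Scala encoding of bags used below (Fig. 16.2), where a bag change can itself be represented as a bag of multiplicity deltas, the derivative of merge never inspects its base arguments (dmerge is our own name; this is an illustration, not code from the thesis):

type Bag[A] = collection.immutable.HashMap[A, Int]
type BagChange[A] = Bag[A]  // element -> multiplicity delta

// Self-maintainable: x and y are never used, so the cost depends
// only on the sizes of the changes dx and dy.
def dmerge[A](x: Bag[A], dx: BagChange[A],
              y: Bag[A], dy: BagChange[A]): BagChange[A] =
  dx.merged(dy) { case ((k, c1), (_, c2)) => (k, c1 + c2) }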

Similarly to what we have seen in Sec. 10.6, dgrandTotal needlessly recomputes merge xs ys without optimizations. However, the result is a base input to fold′. In the next section we'll replace fold′ by a self-maintainable derivative (based again on foldBag), and will avoid this recomputation.

To conservatively predict whether a derivative is going to be self-maintainable (and thus efficient), one can inspect whether the program restricts itself to (conditionally) self-maintainable primitives, like merge (always) or map f (only if df is nil, which is guaranteed when f is a closed term).

To avoid recomputing base arguments for self-maintainable derivatives (which never need

them), we currently employ lazy evaluation. Since we could use standard techniques for dead-code elimination [Appel and Jim 1997] instead, laziness is not central to our approach.

A significant restriction is that non-self-maintainable derivatives can require expensive computations to supply their base arguments, which can be expensive to compute. Since they are also computed while running the base program, one could reuse the previously computed value, through memoization or extensions of static caching (as discussed in Sec. 19.2.3). We leave implementing these optimizations for future work. As a consequence, our current implementation delivers good results only if most derivatives are self-maintainable.

16.3 Case study

We perform a case study on a nontrivial, realistic program, to demonstrate that ILC can speed it up. We take the MapReduce-based skeleton of the word-count example [Lämmel 2007]. We define a suitable differentiation plugin, adapt the program to use it, and show that incremental computation is faster than recomputation. We designed and implemented the differentiation plugin following the requirements of the corresponding proof plugin, even though we did not formalize the proof plugin (e.g., in Agda). For lack of space, we focus on the base types which are crucial for our example and its performance, that is, collections. The plugin also implements tuples, tagged unions, Booleans and integers, with the usual introduction and elimination forms, with few optimizations for their derivatives.

wordcount takes a map from document IDs to documents and produces a map from words appearing in the input to the count of their appearances, that is, a histogram:

wordcount : Map ID Document → Map Word Int

For simplicity, instead of modeling strings, we model documents as bags of words, and document IDs as integers. Hence, what we implement is:

histogram : Map Int (Bag a) → Map a Int

We model words by integers (a = Int), but treat them parametrically. Other than that, we adapt directly Lämmel's code to our language. Figure 16.1 shows the λ-term histogram.

Figure 16.2 shows a simplified Scala implementation of the primitives used in Fig. 16.1. As bag primitives, we provide constructors and a fold operation, following Gluche et al. [1997]. The constructors for bags are empty (constructing the empty bag), singleton (constructing a bag with one element), merge (constructing the merge of two bags) and negate (negate b constructs a bag with the same elements as b, but negated multiplicities); all but singleton represent abelian group operations. Unlike for usual ADT constructors, the same bag can be constructed in different ways, which are equivalent by the equations defining abelian groups; for instance, since merge is commutative, merge x y = merge y x. Folding on a bag will represent the bag through constructors in an arbitrary way, and then replace constructors with arguments; to ensure a well-defined result, the arguments of fold should respect the same equations, that is, they should form an abelian group; for instance, the binary operator should be commutative. Hence, the fold operator foldBag can be defined to take a function (corresponding to singleton) and an abelian group (for the other constructors). foldBag is then defined by the equations:

foldBag : Group τ → (σ → τ) → Bag σ → τ
foldBag g@(_, _, e) f empty = e


// Abelian groups
abstract class Group[A] {
  def merge(value1: A, value2: A): A
  def inverse(value: A): A
  def zero: A
}

// Bags
type Bag[A] = collection.immutable.HashMap[A, Int]

def groupOnBags[A] = new Group[Bag[A]] {
  def merge(bag1: Bag[A], bag2: Bag[A]) =
    // (body lost in the extracted text; a standard implementation adds
    // multiplicities and drops elements whose multiplicity becomes 0)
    bag1.merged(bag2) { case ((k, c1), (_, c2)) => (k, c1 + c2) }.filter(_._2 != 0)
  def inverse(bag: Bag[A]) = bag.map {
    case (value, count) => (value, -count)
  }
  def zero = collection.immutable.HashMap()
}

def foldBag[A, B](group: Group[B], f: A => B, bag: Bag[A]): B =
  bag.flatMap {
    case (x, c) if c >= 0 => Seq.fill(c)(f(x))
    case (x, c) if c < 0  => Seq.fill(-c)(group.inverse(f(x)))
  }.fold(group.zero)(group.merge)

// Maps
type Map[K, A] = collection.immutable.HashMap[K, A]

def groupOnMaps[K, A](group: Group[A]) = new Group[Map[K, A]] {
  def merge(dict1: Map[K, A], dict2: Map[K, A]): Map[K, A] =
    dict1.merged(dict2) {
      case ((k, v1), (_, v2)) => (k, group.merge(v1, v2))
    }.filter {
      case (k, v) => v != group.zero
    }

  def inverse(dict: Map[K, A]): Map[K, A] = dict.map {
    case (k, v) => (k, group.inverse(v))
  }

  def zero = collection.immutable.HashMap()
}

// The general map fold
def foldMapGen[K, A, B](zero: B, merge: (B, B) => B)
                       (f: (K, A) => B, dict: Map[K, A]): B =
  dict.map(Function.tupled(f)).fold(zero)(merge)

// By using foldMap instead of foldMapGen, the user promises that
// f k is a homomorphism from groupA to groupB, for each k: K.
def foldMap[K, A, B](groupA: Group[A], groupB: Group[B])
                    (f: (K, A) => B, dict: Map[K, A]): B =
  foldMapGen(groupB.zero, groupB.merge)(f, dict)

Figure 16.2: A Scala implementation of primitives for bags and maps. In the code, we call ⊞, ⊟ and e respectively merge, inverse and zero. We also omit the relatively standard primitives.
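For instance, with the primitives of Fig. 16.2, summing the elements of a bag of integers looks as follows (our own usage example; gZ is a local name for the additive group on Int mentioned in the text):

val gZ = new Group[Int] {
  def merge(a: Int, b: Int) = a + b
  def inverse(a: Int) = -a
  def zero = 0
}

// The bag {1, 1, 2}, represented as element -> multiplicity.
val bag: Bag[Int] = collection.immutable.HashMap(1 -> 2, 2 -> 1)

val total = foldBag[Int, Int](gZ, x => x, bag)  // = 1 + 1 + 2 = 4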


foldBag g@(⊞, _, _) f (merge b1 b2) = foldBag g f b1 ⊞ foldBag g f b2
foldBag g@(_, ⊟, _) f (negate b) = ⊟ (foldBag g f b)
foldBag g f (singleton v) = f v

If g is a group, these equations specify foldBag g precisely [Gluche et al. 1997]. Moreover, the first three equations mean that foldBag g f is the abelian group homomorphism between the abelian group on bags and the group g (because those equations coincide with the definition). Figure 16.2 shows an implementation of foldBag as specified above. Moreover, all functions which deconstruct a bag can be expressed in terms of foldBag with suitable arguments. For instance, we can sum the elements of a bag of integers with foldBag gZ (λx → x), where gZ is the abelian group induced on integers by addition, (Z, +, 0, −). Users of foldBag can define different abelian groups to specify different operations (for instance, to multiply floating-point numbers).

If g and f do not change, foldBag g f has a self-maintainable derivative. By the equations above,

foldBag g f (b ⊕ db)
= foldBag g f (merge b db)
= foldBag g f b ⊞ foldBag g f db
= foldBag g f b ⊕ GroupChange g (foldBag g f db)

We will describe the GroupChange change constructor in a moment. Before that, we note that, as a consequence, the derivative of foldBag g f is

λb db → GroupChange g (foldBag g f db)

and we can see it does not use b: as desired, it is self-maintainable. Additional restrictions are required to make foldMap's derivative self-maintainable; those restrictions require the precondition on mapReduce in Fig. 16.1. foldMapGen has the same implementation, but without those restrictions; as a consequence, its derivative is not self-maintainable, but it is more generally applicable. Lack of space prevents us from giving more details.

To define GroupChange, we need a suitable erased change structure on τ, such that ⊕ will be equivalent to ⊞. Since there might be multiple groups on τ, we allow the changes to specify a group, and have ⊕ delegate to ⊞ (which we extract by pattern-matching on the group):

∆τ = Replace τ | GroupChange (AbelianGroup τ) τ
v ⊕ (Replace u) = u
v ⊕ (GroupChange (⊞, inverse, zero) dv) = v ⊞ dv
v ⊖ u = Replace v

That is, a change between two values is either simply the new value (which replaces the old one, triggering recomputation), or their difference (computed with abelian group operations, like in the change structures for groups from Sec. 13.1.1). The ⊖ operator does not know which group to use, so it does not take advantage of the group structure. However, foldBag is now able to generate a group change.
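In Scala, this change structure might be encoded as follows (a sketch with our own names; the object-language definition above is the authoritative one):

sealed trait Change[A]
case class Replace[A](newValue: A) extends Change[A]
case class GroupChange[A](group: Group[A], delta: A) extends Change[A]

def oplus[A](v: A, dv: Change[A]): A = dv match {
  case Replace(u)            => u                  // replacement: recompute
  case GroupChange(g, delta) => g.merge(v, delta)  // v ⊞ delta
}

// ⊖ does not know which group to use, so it falls back to replacement.
def ominus[A](v: A, u: A): Change[A] = Replace(v)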


We rewrite grandTotal in terms of foldBag, to take advantage of group-based changes:

id = λx → x
G+ = (Z, +, −, 0)
grandTotal = λxs → λys → foldBag G+ id (merge xs ys)
D⟦grandTotal⟧ =
  λxs → λdxs → λys → λdys →
    foldBag′ G+ G+′ id id′
      (merge xs ys)
      (merge′ xs dxs ys dys)

It is now possible to write down the derivative of foldBag:

foldBag′ = D⟦foldBag⟧ =
  -- (if static analysis detects that dG and df are nil changes)
  λG → λdG → λf → λdf → λzs → λdzs →
    GroupChange G (foldBag G f dzs)

We know from Sec. 10.6 that

merge′ = λu → λdu → λv → λdv → merge du dv.

Inlining foldBag′ and merge′ gives us a more readable term, β-equivalent to the derivative of grandTotal:

D⟦grandTotal⟧ =
  λxs → λdxs → λys → λdys → foldBag G+ id (merge dxs dys)

16.4 Benchmarking setup

We run object language programs by generating corresponding Scala code. To ensure rigorous benchmarking [Georges et al. 2007], we use the ScalaMeter benchmarking library. To show that the performance difference from the baseline is statistically significant, we show 99%-confidence intervals in graphs.

We verify Eq. (10.1) experimentally, by checking that the two sides of the equation always evaluate to the same value.

We ran benchmarks on an 8-core Intel Core i7-2600 CPU running at 3.4 GHz, with 8 GB of RAM, running Scientific Linux release 6.4. While the benchmark code is single-threaded, the JVM offloads garbage collection to other cores. We use the preinstalled OpenJDK 1.7.0_25 and Scala 2.10.2.

Input generation. Inputs are randomly generated to resemble English words over all webpages on the internet: the vocabulary size and the average length of a webpage stay relatively the same, while the number of webpages grows day by day. To generate a size-n input of type Map Int (Bag Int), we generate n random numbers between 1 and 1000 and distribute them randomly in n/1000 bags. Changes are randomly generated to resemble edits: a change has 50% probability to delete a random existing number, and 50% probability to insert a random number at a random location.


Experimental units. Thanks to Eq. (10.1), both recomputation f (a ⊕ da) and incremental computation f a ⊕ D⟦f⟧ a da produce the same result. Both programs are written in our object language. To show that derivatives are faster, we compare these two computations. To compare with recomputation, we measure the aggregated running time for running the derivative on the change and for updating the original output with the result of the derivative.

16.5 Benchmark results

Our results show (Fig. 16.3) that our program reacts to input changes in essentially constant time, as expected, hence orders of magnitude faster than recomputation. Constant factors are small enough that the speedup is apparent on realistic input sizes.

We present our results in Fig. 16.3. As expected, the runtime of incremental computation is essentially constant in the size of the input, while the runtime of recomputation is linear in the input size. Hence, on our biggest inputs, incremental computation is over 10⁴ times faster.

Derivative time is in fact slightly irregular for the first few inputs, but this irregularity decreases slowly with increasing warmup cycles. In the end, for derivatives we use 10⁴ warmup cycles. With fewer warmup cycles, running time for derivatives decreases significantly during execution, going from 2.6 ms for n = 1000 to 0.2 ms for n = 512000. Hence, we believe extended warmup is appropriate, and the changes do not affect our general conclusions. Considering confidence intervals, in our experiments the running time for derivatives varies between 0.139 ms and 0.378 ms.

In our current implementation, the code of the generated derivatives can become quite big. For the histogram example (which is around 1 KB of code), a pretty-print of its derivative is around 40 KB of code. The function application case in Fig. 12.1c can lead to quadratic growth in the worst case. More importantly, we realized that our home-grown transformation system in some cases performs overly aggressive inlining, especially for derivatives, even though this is not required for incrementalization; we believe this explains a significant part of the problem. Indeed, code blowup problems do not currently appear in later experiments (see Sec. 17.4).

16.6 Chapter conclusion

Our results show that the incrementalized program runs in essentially constant time, and hence orders of magnitude faster than the alternative of recomputation from scratch.

An important lesson from the evaluations is that, as anticipated in Sec. 16.2.1, to achieve good performance our current implementation requires some form of dead-code elimination, such as laziness.


[Figure 16.3 is a log–log plot; x-axis: input size, from 1k to 4096k; y-axis: runtime in ms, from 0.1 to 10000; two series: Incremental and Recomputation.]

Figure 16.3: Performance results in log–log scale, with input size on the x-axis and runtime in ms on the y-axis. Confidence intervals are shown by the whiskers; most whiskers are too small to be visible.

Chapter 17

Cache-transfer-style conversion

17.1 Introduction

Incremental computation is often desirable: after computing an output from some input, we often need to produce new outputs corresponding to changed inputs. One can simply rerun the same base program on the new input; but instead, incremental computation transforms the input change to an output change. This can be desirable because it is more efficient.

Incremental computation could also be desirable because the changes themselves are of interest: imagine a typechecker explaining how some change to input source propagates to changes to the typechecking result. More generally, imagine a program explaining how some change to its input propagates through a computation into changes to the output.

ILC (Incremental λ-Calculus) [Cai et al. 2014] is a recent approach to incremental computation for higher-order languages. ILC represents changes from an old value v1 to an updated value v2 as a first-class value, written dv. ILC also statically transforms base programs to incremental programs, or derivatives: derivatives produce output changes from changes to all their inputs. Since functions are first-class values, ILC introduces a notion of function changes.

However, as mentioned by Cai et al. and as we explain below, ILC as currently defined does not allow reusing intermediate results created by the base computation during the incremental computation. That restricts ILC to supporting efficiently only self-maintainable computations, a rather restrictive class: for instance, mapping self-maintainable functions on a sequence is self-maintainable, but dividing numbers isn't. In this paper, we remove this limitation.

To remember intermediate results, many incrementalization approaches rely on forms of memoization: one uses hashtables to memoize function results, or dynamic dependence graphs [Acar 2005] to remember the computation trace. However, such data structures often remember results that might not be reused; moreover, the data structures themselves (as opposed to their values) occupy additional memory, looking up intermediate results has a cost in time, and typical general-purpose optimizers cannot predict results from memoization lookups. Instead, ILC aims to produce purely functional programs that are suitable for further optimizations.

We eschew memoization: instead, we transform programs to cache-transfer style (CTS), following ideas from Liu and Teitelbaum [1995]. CTS functions output caches of intermediate results along with their primary results. Caches are just nested tuples whose structure is derived from code, and accessing them does not involve looking up keys depending on inputs. We also extend differentiation to produce CTS derivatives, which can extract from caches any intermediate results they need. This approach was inspired and pioneered by Liu and Teitelbaum for untyped first-order functional languages, but we integrate it with ILC and extend it to higher-order typed languages.
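To give a feel for the shape of CTS programs before the formal development, here is a minimal Scala sketch of a CTS function and its CTS derivative for f x = x·x + x, with additive integer changes. This is entirely our own illustration; the paper's transformation is defined later:

// The cache records the intermediate result of the base computation.
final case class FCache(square: Int)

// CTS base function: returns the result together with its cache.
def fC(x: Int): (Int, FCache) = {
  val square = x * x
  (square + x, FCache(square))
}

// CTS derivative: consumes the old cache, returns the output change and
// the updated cache, reusing the cached square instead of recomputing x * x.
def dfC(x: Int, dx: Int, c: FCache): (Int, FCache) = {
  val newSquare = c.square + 2 * x * dx + dx * dx
  val dy = (newSquare - c.square) + dx  // f(x+dx) - f(x)
  (dy, FCache(newSquare))
}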


While CTS functions still produce additional intermediate data structures produced programscan be subject to further optimizations We believe static analysis of a CTS function and its CTSderivative can identify and remove unneeded state (similar to what has been done by Liu andTeitelbaum) as we discuss in Sec 1756 We leave a more careful analysis to future work

We prove most of our formalization correct in Coq To support non-simply-typed programs allour proofs are for untyped λ-calculus while previous ILC correctness proofs were restricted tosimply-typed λ-calculus Unlike previous ILC proofs we simply dene which changes are valid viaa logical relation then show the fundamental property for this logical relation (see Sec 1721) Toextend this proof to untyped λ-calculus we switch to step-indexed logical relations

To support dierentiation on our case studies we also represent function changes as closuresthat can be inspected to support manipulating them more eciently and detecting at runtime whena function change is nil hence need not be applied To show this representation is correct we alsouse closures in our mechanized proof

Unlike plain ILC typing programs in CTS is challenging because the shape of caches for afunction depends on the function implementation Our case studies show how to non-triviallyembed resulting programs in typed languages at least for our case studies but our proofs supportan untyped target language

In sum we present the following contributions

bull via examples we motivate extending ILC to remember intermediate results (Sec 172)

bull we give a novel proof of correctness for ILC for untyped λ-calculus based on step-indexedlogical relations (Sec 1733)

bull building on top of ILC-style dierentiation we show how to transform untyped higher-orderprograms to cache-transfer-style (CTS) (Sec 1735)

bull we show that programs and derivatives in cache-transfer style simulate correctly their non-CTS variants (Sec 1736)

bull we mechanize in Coq most of our proofs

bull we perform performance case studies (in Sec 174) applying (by hand) extension of thistechnique to Haskell programs and incrementalize eciently also programs that do not admitself-maintainable derivatives

The rest of the paper is organized as follows Sec 172 summarizes ILC and motivates theextension to cache-transfer style Sec 173 presents our formalization and proofs Sec 174 presentsour case studies and benchmarks Sec 175 discusses limitations and future work Sec 176 discussesrelated work and Sec 177 concludes

17.2 Introducing Cache-Transfer Style

In this section we motivate cache-transfer style (CTS). Sec. 17.2.1 summarizes a reformulation of ILC, so we recommend it also to readers familiar with Cai et al. [2014]. In Sec. 17.2.2 we consider a minimal first-order example, namely an average function. We incrementalize it using ILC, explain why the result is inefficient, and show that remembering results via cache-transfer style enables efficient incrementalization with asymptotic speedups. In Sec. 17.2.3 we consider a higher-order example that requires not just cache-transfer style but also efficient support for both nil and non-nil function changes, together with the ability to detect nil changes.


17.2.1 Background

ILC considers simply-typed programs, and assumes that base types and primitives come with support for incrementalization.

The ILC framework describes changes as first-class values, and types them using dependent types. To each type A we associate a type ∆A of changes for A, and an update operator ⊕ : A → ∆A → A that updates an initial value with a change to compute an updated value. We also consider changes for evaluation environments, which contain changes for each environment entry.
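For instance, a minimal change structure for integers (our illustration, not part of the formal development) can represent changes as differences:

-- A minimal sketch (ours): integer changes as arithmetic differences.
type DeltaInt = Int

-- The update operator ⊕, specialized to integers.
oplus :: Int -> DeltaInt -> Int
oplus a da = a + da

-- Example: 10 `oplus` 3 evaluates to 13.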

A change da : ∆A can be valid from a1 : A to a2 : A, and we write then da ▷ a1 ↪ a2 : A. Then a1 is the source or initial value for da, and a2 is the destination or updated value for da. From da ▷ a1 ↪ a2 : A follows that a2 coincides with a1 ⊕ da, but validity imposes additional invariants that are useful during incrementalization. A change can be valid for more than one source, but a change da and a source a1 uniquely determine the destination a2 = a1 ⊕ da. To avoid ambiguity, we always consider a change together with a specific source.

Each type comes with its definition of validity. Validity is a ternary logical relation. For function types A → B, we define ∆(A → B) = A → ∆A → ∆B, and say that a function change df : ∆(A → B) is valid from f1 : A → B to f2 : A → B (that is, df ▷ f1 ↪ f2 : A → B) if and only if df maps valid input changes to valid output changes; by that we mean that if da ▷ a1 ↪ a2 : A, then df a1 da ▷ f1 a1 ↪ f2 a2 : B. Source and destination of df a1 da, that is f1 a1 and f2 a2, are the result of applying two different functions, that is f1 and f2.

ILC expresses incremental programs as derivatives. Generalizing previous usage, we simply say derivative for all terms produced by differentiation. If dE is a valid environment change from E1 to E2, and term t is well-typed and can be evaluated against environments E1, E2, then term Dι⟦t⟧, the derivative of t, evaluated against dE, produces a change from v1 to v2, where v1 is the value of t in environment E1, and v2 is the value of t in environment E2. This correctness property follows immediately from the fundamental property for the logical relation of validity, and can be proven accordingly; we give a step-indexed variant of this proof in Sec. 17.3.3. If t is a function and dE is a nil change (that is, its source E1 and destination E2 coincide), then Dι⟦t⟧ produces a nil function change and is also a derivative according to Cai et al. [2014].

To support incrementalization, one must define change types and validity for each base type, and a correct derivative for each primitive. Functions written in terms of primitives can be differentiated automatically. As in all approaches to incrementalization (see Sec. 17.6), one cannot incrementalize efficiently an arbitrary program; ILC limits the effort to base types and primitives.

17.2.2 A first-order motivating example: computing averages

Suppose we need to compute the average y of a bag of numbers xs : Bag ℤ, and that whenever we receive a change dxs : ∆(Bag ℤ) to this bag we need to compute the change dy to the average y.

In fact, we expect multiple consecutive updates to our bag. Hence, we say we have an initial bag xs1, and compute its average y1 as y1 = avg xs1, and then consecutive changes dxs1, dxs2, .... Change dxs1 goes from xs1 to xs2, dxs2 goes from xs2 to xs3, and so on. We need to compute y2 = avg xs2, y3 = avg xs3, ..., but more quickly than we would by calling avg again.

We can compute the average through the following function (that we present in Haskell):

avg xs =
  let s = sum xs
      n = length xs
      r = div s n
  in r


We write this function in A'-normal form (A'NF), a small variant of A-normal form (ANF) [Sabry and Felleisen, 1993] that we introduce. In A'NF, programs bind to a variable the result of each function call in avg, instead of using it directly; unlike plain ANF, A'NF also binds the result of tail calls, such as div s n in avg. A'NF simplifies conversion to cache-transfer style.
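For contrast, here is an illustrative pair of definitions (ours, over lists instead of bags, to stay self-contained): plain ANF may return a tail call directly, while A'NF also names its result.

-- ANF: the tail call div s n sits directly in result position.
avgANF :: [Int] -> Int
avgANF xs = let s = sum xs
                n = length xs
            in div s n

-- A'NF: the tail call's result is also let-bound, to r.
avgA'NF :: [Int] -> Int
avgA'NF xs = let s = sum xs
                 n = length xs
                 r = div s n
             in r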

We can incrementalize efficiently both sum and length by generating via ILC their derivatives dsum and dlength, assuming a language plugin supporting bags and folds over them.

But division is more challenging. To be sure, we can write the following derivative:

ddiv a1 da b1 db =
  let a2 = a1 ⊕ da
      b2 = b1 ⊕ db
  in div a2 b2 - div a1 b1

Function ddiv computes the difference between the updated and the original result without any special optimization, but still takes O(1) for machine integers. But unlike other derivatives, ddiv uses its base inputs a1 and b1, that is, it is not self-maintainable [Cai et al., 2014].

Because ddiv is not self-maintainable, a derivative calling it will not be efficient. To wit, let us look at davg, the derivative of avg:

davg xs dxs =
  let s  = sum xs
      ds = dsum xs dxs
      n  = length xs
      dn = dlength xs dxs
      r  = div s n
      dr = ddiv s ds n dn
  in dr

This function recomputes s, n and r just like in avg, but r is not used, so its recomputation can easily be removed by later optimizations. On the other hand, derivative ddiv will use the values of its base inputs a1 and b1, so derivative davg will need to recompute s and n, and save no time over recomputation. If ddiv were instead a self-maintainable derivative, the computations of s and n would also be unneeded and could be optimized away. Cai et al. leave efficient support for non-self-maintainable derivatives for future work.

To avoid recomputation, we must simply remember intermediate results as needed. Executing davg xs1 dxs1 will compute exactly the same s and n as executing avg xs1, so we must just save and reuse them, without needing to use memoization proper. Hence, we CTS-convert each function f to a CTS function fC and a CTS derivative dfC: CTS function fC produces, together with its final result, a cache that the caller must pass to CTS derivative dfC. A cache is just a tuple of values, containing information from subcalls: either inputs (as we explain in a bit), intermediate results, or subcaches, that is, caches returned from further function calls. In fact, typically only primitive functions like div need to remember actual results; automatically transformed functions only need to remember subcaches or inputs.

CTS conversion is simplified by first converting to A'NF, where all results of subcomputations are bound to variables: we just collect all caches in scope and return them.

As an additional step, we avoid always passing base inputs to derivatives by defining ∆(A → B) = ∆A → ∆B. Instead of always passing a base input and possibly not using it, we can simply assume that primitives whose derivative needs a base input store the input in the cache.

To make the translation uniform, we stipulate all functions in the program are transformed to CTS, using a (potentially empty) cache. Since the type of the cache for a function f : A → B


depends on the implementation of f, we introduce for each function f a type for its cache FC, so that CTS function fC has type A → (B, FC) and CTS derivative dfC has type ∆A → FC → (∆B, FC).

The definition of FC is only needed inside fC and dfC, and it can be hidden from other functions to keep implementation details hidden in transformed code; because of limitations of Haskell modules, we can only hide such definitions from functions in other modules.

Since functions of the same type translate to functions of different types, the translation does not preserve well-typedness in a higher-order language in general, but it works well in our case studies (Sec. 17.4); Sec. 17.4.1 shows how to map such functions. We return to this point briefly in Sec. 17.5.1.

CTS-converting our example produces the following code:

data AvgC = AvgC SumC LengthC DivC

avgC :: Bag ℤ → (ℤ, AvgC)
avgC xs =
  let (s, cs1) = sumC xs
      (n, cn1) = lengthC xs
      (r, cr1) = s `divC` n
  in (r, AvgC cs1 cn1 cr1)

davgC :: ∆(Bag ℤ) → AvgC → (∆ℤ, AvgC)
davgC dxs (AvgC cs1 cn1 cr1) =
  let (ds, cs2) = dsumC dxs cs1
      (dn, cn2) = dlengthC dxs cn1
      (dr, cr2) = ddivC ds dn cr1
  in (dr, AvgC cs2 cn2 cr2)

In the above program, sumC and lengthC are self-maintainable, that is, they need no base inputs and can be transformed to use empty caches. On the other hand, ddiv is not self-maintainable, so we transform it to remember and use its base inputs.

divC a b = (a `div` b, (a, b))

ddivC da db (a1, b1) =
  let a2 = a1 ⊕ da
      b2 = b1 ⊕ db
  in (div a2 b2 - div a1 b1, (a2, b2))

Crucially, ddivC must return an updated cache to ensure correct incrementalization, so that application of further changes works correctly. In general, if (y, c1) = fC x and (dy, c2) = dfC dx c1, then (y ⊕ dy, c2) must equal the result of the base function fC applied to the new input x ⊕ dx, that is (y ⊕ dy, c2) = fC (x ⊕ dx).
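This invariant can be checked on a runnable instantiation of divC and ddivC (our sketch, with integer changes represented as differences, so that ⊕ is (+)):

-- divC and ddivC instantiated at Int; the cache stores both base inputs.
divC :: Int -> Int -> (Int, (Int, Int))
divC a b = (a `div` b, (a, b))

ddivC :: Int -> Int -> (Int, Int) -> (Int, (Int, Int))
ddivC da db (a1, b1) =
  let a2 = a1 + da
      b2 = b1 + db
  in (div a2 b2 - div a1 b1, (a2, b2))

-- The invariant (y ⊕ dy, c2) = fC (x ⊕ dx), as a testable property.
prop_ddivC :: Int -> Int -> Int -> Int -> Bool
prop_ddivC a da b db =
  b == 0 || b + db == 0 ||   -- rule out division by zero
    let (y,  c1) = divC a b
        (dy, c2) = ddivC da db c1
    in (y + dy, c2) == divC (a + da) (b + db)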

Finally, to use these functions we need to adapt the caller. Let's first show how to deal with a sequence of changes: imagine that dxs1 is a valid change for xs, and that dxs2 is a valid change for xs ⊕ dxs1. To update the average for both changes, we can then call avgC and davgC as follows:

-- A simple example caller with two consecutive changes
avgDAvgC :: Bag ℤ → ∆(Bag ℤ) → ∆(Bag ℤ) → (ℤ, ∆ℤ, ∆ℤ, AvgC)
avgDAvgC xs dxs1 dxs2 =
  let (res1,  cache1) = avgC xs
      (dres1, cache2) = davgC dxs1 cache1
      (dres2, cache3) = davgC dxs2 cache2
  in (res1, dres1, dres2, cache3)


Incrementalization guarantees that the produced changes update the output correctly in response to the input changes, that is, we have res1 ⊕ dres1 = avg (xs ⊕ dxs1) and res1 ⊕ dres1 ⊕ dres2 = avg (xs ⊕ dxs1 ⊕ dxs2). We also return the last cache to allow further updates to be processed.

Alternatively, we can try writing a caller that gets an initial input and a (lazy) list of changes, does incremental computation, and prints updated outputs:

processChangeList (dxsN : dxss) yN cacheN = do
  let (dy, cacheN') = davgC dxsN cacheN
      yN' = yN ⊕ dy
  print yN'
  processChangeList dxss yN' cacheN'

-- Example caller with multiple consecutive changes
someCaller xs1 dxss = do
  let (y1, cache1) = avgC xs1
  processChangeList dxss y1 cache1

More generally, we produce both an augmented base function and a derivative, where the augmented base function communicates with the derivative by returning a cache. The contents of this cache are determined statically, and can be accessed by tuple projections without dynamic lookups. In the rest of the paper, we use the above idea to develop a correct transformation that allows incrementalizing programs using cache-transfer style.

We'll return to this example in Sec. 17.4.1.

17.2.3 A higher-order motivating example: nested loops

Next, we consider CTS differentiation on a minimal higher-order example. To incrementalize this example, we enable detecting nil function changes at runtime, by representing function changes as closures that can be inspected by incremental programs. We'll return to this example in Sec. 17.4.2.

We take an example of nested loops over sequences, implemented in terms of standard Haskell functions map and concat. For simplicity, we compute the Cartesian product of inputs:

cartesianProduct :: Sequence a → Sequence b → Sequence (a, b)
cartesianProduct xs ys =
  concatMap (λx → map (λy → (x, y)) ys) xs

concatMap f xs = concat (map f xs)

Computing cartesianProduct xs ys loops over each element x from sequence xs and y from sequence ys, and produces a list of pairs (x, y), taking quadratic time O(n²) (we assume for simplicity that |xs| and |ys| are both O(n)). Adding a fresh element to either xs or ys generates an output change containing Θ(n) fresh pairs, hence derivative dcartesianProduct must take at least linear time. Thanks to specialized derivatives dmap and dconcat for primitives map and concat, dcartesianProduct has asymptotically optimal time complexity. To achieve this complexity, dmap f df must detect when df is a nil function change and avoid applying it to unchanged input elements.

To simplify the transformations we describe, we λ-lift programs before differentiating and transforming them.

cartesianProduct xs ys =
  concatMap (mapPair ys) xs

mapPair ys = λx → map (pair x) ys


pair x = λy → (x, y)

concatMap f xs =
  let yss = map f xs
  in concat yss

Suppose we add an element to either xs or ys. If change dys adds one element to ys, then dmapPair ys dys, the argument to dconcatMap, is a non-nil function change taking constant time, so dconcatMap must apply it to each element of xs ⊕ dxs.

Suppose next that change dxs adds one element to xs, and dys is a nil change for ys. Then dmapPair ys dys is a nil function change, and we must detect this dynamically. If a function change df : ∆(A → B) is represented as a function, and A is infinite, one cannot detect dynamically that it is a nil change. To enable runtime nil change detection, we apply closure conversion on function changes: a function change df, represented as a closure, is nil for f only if all environment changes it contains are also nil, and if the contained function is a derivative of f's function.
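A simplified sketch of this runtime check (ours; constructor names are hypothetical, and the actual representation in Sec. 17.4.2 is richer):

-- A function change is either an explicit nil (a derivative carrying no
-- environment change) or a closure change carrying an environment change.
data FunChange env = DNilFun | DClosureChange env

-- The change is nil overall only if its environment change is nil.
isNilFun :: (env -> Bool) -> FunChange env -> Bool
isNilFun _        DNilFun            = True
isNilFun isNilEnv (DClosureChange e) = isNilEnv e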

17.3 Formalization

In this section we formalize cache-transfer-style (CTS) differentiation and formally prove its soundness with respect to differentiation. We furthermore present a novel proof of correctness for differentiation itself.

To simplify the proof, we encode many invariants of input and output programs by defining input and output languages with a suitably restricted abstract syntax. Restricting the input language simplifies the transformations, while restricting the output languages simplifies the semantics. Since base programs and derivatives have different shapes, we define different syntactic categories of base terms and change terms.

We define and prove sound three transformations across two languages. First, we present a variant of differentiation (as defined by Cai et al. [2014]), going from base terms of language λAL to change terms of λAL, and we prove it sound with respect to non-incremental evaluation. Second, we define CTS conversion as a pair of transformations going from base terms of λAL: CTS translation produces CTS versions of base functions as base terms in iλAL, and CTS differentiation produces CTS derivatives as change terms in iλAL.

As source language for CTS differentiation, we use a core language named λAL. This language is a common target of a compiler front-end for a functional programming language: in this language, terms are written in λ-lifted and A'-normal form (A'NF), so that every intermediate result is named, and can thus be stored in a cache by CTS conversion and reused later (as described in Sec. 17.2).

The target language for CTS differentiation is named iλAL. Programs in this target language are also λ-lifted and in A'NF. But additionally, in these programs every toplevel function f produces a cache, which is to be conveyed to the derivatives of f.

The rest of this section is organized as follows. Sec. 17.3.1 presents syntax and semantics of source language λAL. Sec. 17.3.2 defines differentiation, and Sec. 17.3.3 proves it correct. Sec. 17.3.4 presents syntax and semantics of target language iλAL. Sec. 17.3.5 defines CTS conversion, and Sec. 17.3.6 proves it correct. Most of the formalization has been mechanized in Coq (the proofs of some straightforward lemmas are left for future work).

17.3.1 The source language λAL

Syntax The syntax of λAL is given in Figure 17.1. Our source language allows representing both base terms t and change terms dt.


Base terms
  t ::= let y = f x in t          Call
      | let y = (x) in t          Tuple
      | x                         Result

Change terms
  dt ::= let y = f x, dy = df x dx in dt   Call
       | let y = (x), dy = (dx) in dt      Tuple
       | dx                                Result

Closed values
  v ::= E[λx. t]    Closure
      | (v)         Tuple
      | ℓ           Literal
      | p           Primitive

Value environments
  E ::= E; x = v    Value binding
      | •           Empty

Change values
  dv ::= dE[λx dx. dt]   Closure
       | (dv)            Tuple
       | dℓ              Literal
       | !v              Replace
       | nil             Nil

Change environments
  dE ::= dE; x = v, dx = dv   Binding
       | •                    Empty

n, k ∈ ℕ    Step indexes

Figure 17.1: Source language λAL (syntax)

[SVar]

  E ⊢ x ⇓1 E(x)

[STuple]
  E; y = (E(x)) ⊢ t ⇓n v
  ------------------------------
  E ⊢ let y = (x) in t ⇓n+1 v

[SPrimitiveCall]
  E(f) = p    E; y = δp(E(x)) ⊢ t ⇓n v
  ---------------------------------------
  E ⊢ let y = f x in t ⇓n+1 v

[SClosureCall]
  E(f) = Ef[λx. tf]    Ef; x = E(x) ⊢ tf ⇓m vy    E; y = vy ⊢ t ⇓n v
  --------------------------------------------------------------------
  E ⊢ let y = f x in t ⇓m+n+1 v

Figure 17.2: Step-indexed big-step semantics for base terms of source language λAL

Our syntax for base terms t represents λ-lifted programs in A'-normal form (Sec. 17.2.2). We write x for a sequence of variables of some unspecified length x1, x2, ..., xm. A term can be either a bound variable x, or a let-binding of y in subterm t to either a new tuple (x) (let y = (x) in t) or the result of calling function f on argument x (let y = f x in t). Both f and x are variables to be looked up in the environment. Terms cannot contain λ-abstractions, as they have been λ-lifted to top-level definitions, which we encode as closures in the initial environments. Unlike standard ANF, we add no special syntax for function calls in tail position (see Sec. 17.5.3 for a discussion about this limitation). We often inspect the result of a function call f x, which is not a valid term in our syntax. To enable this, we write "(f x)" for "let y = f x in y", where y is chosen fresh.

Our syntax for change terms dt mimics the syntax for base terms, except that (i) each function call is immediately followed by the evaluation of its derivative, and (ii) the final value returned by a change term is a change variable dx. As we will see, this ad-hoc syntax of change terms is tailored to capture the programs produced by differentiation. We only allow α-renamings that maintain the invariant that the definition of x is immediately followed by the definition of dx: if x is renamed to y, then dx must be renamed to dy.

Semantics A closed value for base terms is either a closure, a tuple of values, a constant, or a primitive. A closure is a pair of an evaluation environment E and a λ-abstraction closed with respect to E. The set of available constants is left abstract; it may contain usual first-order constants like integers. We also leave abstract the primitives p, like if-then-else or projections of tuple components. As usual, environments E map variables to closed values. With no loss of generality, we assume that all bound variables are distinct.

Figure 17.2 shows a step-indexed big-step semantics for base terms, defined through judgment E ⊢ t ⇓n v, pronounced "Under environment E, base term t evaluates to closed value v in n steps." Our


step-indexes count the number of "nodes" of a big-step derivation.¹ We explain each rule in turn. Rule [SVar] looks variable x up in environment E. Other rules evaluate let-binding let y = ... in t in environment E: each rule computes y's new value vy (taking m steps, where m can be zero) and evaluates in n steps body t to v, using environment E extended by binding y to vy. The overall let-binding evaluates to v in m + n + 1 steps. But different rules compute y's value differently. [STuple] looks each variable in x up in E to evaluate tuple (x) (in m = 0 steps). [SPrimitiveCall] evaluates function calls where variable f is bound in E to a primitive p. How primitives are evaluated is specified by function δp(–) from closed values to closed values; to evaluate such a primitive call, this rule applies δp(–) to x's value (in m = 0 steps). [SClosureCall] evaluates function calls where variable f is bound in E to closure Ef[λx. tf]: this rule evaluates closure body tf in m steps, using closure environment Ef extended with the value of argument x in E.

Change semantics We move on to define how change terms evaluate to change values. We start with the required auxiliary definitions.

A closed change value is either a closure change, a tuple change, a literal change, a replacement change, or a nil change. A closure change is a pair made of an evaluation environment dE and a λ-abstraction expecting a value and a change value as arguments, to evaluate a change term into an output change value. An evaluation environment dE follows the same structure as let-bindings of change terms: it binds variables to closed values, and each variable x is immediately followed by a binding for its associated change variable dx. As with let-bindings of change terms, α-renamings in an environment dE must rename dx into dy if x is renamed into y. With no loss of generality, we assume that all bound term variables are distinct in these environments.

We define the original environment ⌊dE⌋1 and the new environment ⌊dE⌋2 of a change environment dE by induction over dE:

⌊•⌋i = •    (i = 1, 2)
⌊dE; x = v, dx = dv⌋1 = ⌊dE⌋1; x = v
⌊dE; x = v, dx = dv⌋2 = ⌊dE⌋2; x = v ⊕ dv

This last rule makes use of an operation ⊕ to update a value with a change, which may fail at runtime. Indeed, change update is a partial function written "v ⊕ dv", defined as follows:

v ⊕ nil = v
v1 ⊕ !v2 = v2
ℓ ⊕ dℓ = δ⊕(ℓ, dℓ)
E[λx. t] ⊕ dE[λx dx. dt] = (E ⊕ dE)[λx. t]
(v1, ..., vn) ⊕ (dv1, ..., dvn) = (v1 ⊕ dv1, ..., vn ⊕ dvn)

where
(E; x = v) ⊕ (dE; x = v, dx = dv) = ((E ⊕ dE); x = (v ⊕ dv))

Nil and replacement changes can be used to update all values (constants, tuples, primitives and closures), while tuple changes can only update tuples, literal changes can only update literals, and closure changes can only update closures. A nil change leaves a value unchanged. A replacement change overrides the current value v with a new one v′. On literals, ⊕ is defined via some interpretation function δ⊕. Change update for a closure ignores dt, instead of combining it with t somehow. This may seem surprising, but we only need ⊕ to behave well for valid changes (as shown by Lemma 17.3.1): for valid closure changes, dt must behave similarly to Dι⟦t⟧ anyway, so only environment updates matter. This definition also avoids having to modify terms at runtime, which would be difficult to implement safely.
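The following sketch (ours) renders change update as a partial function on a small value AST, mirroring the equations above; closures are omitted, δ⊕ is instantiated as addition, and Nothing models runtime failure:

import Control.Monad (zipWithM)

data Value  = VLit Int | VTuple [Value] deriving (Eq, Show)
data Change = DNil | DReplace Value | DLit Int | DTuple [Change]

oplus :: Value -> Change -> Maybe Value
oplus v DNil             = Just v                 -- nil: value unchanged
oplus _ (DReplace v')    = Just v'                -- replacement overrides
oplus (VLit n) (DLit dn) = Just (VLit (n + dn))   -- δ⊕ chosen as (+)
oplus (VTuple vs) (DTuple dvs)
  | length vs == length dvs = VTuple <$> zipWithM oplus vs dvs
oplus _ _ = Nothing                               -- shape mismatch: fail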

¹Instead, Ahmed [2006] and Acar et al. [2008] count the number of steps that small-step evaluation would take (as we did in Appendix C), but this simplifies some proof steps and makes a minor difference in others.


[SDVar]

  dE ⊢ dx ⇓ dE(dx)

[SDTuple]
  dE(x, dx) = vx, dvx
  dE; y = (vx), dy = (dvx) ⊢ dt ⇓ dv
  -------------------------------------------
  dE ⊢ let y = (x), dy = (dx) in dt ⇓ dv

[SDReplaceCall]
  dE(df) = !vf
  ⌊dE⌋1 ⊢ (f x) ⇓n vy
  ⌊dE⌋2 ⊢ (f x) ⇓m vy′
  dE; y = vy, dy = !vy′ ⊢ dt ⇓ dv
  -------------------------------------------
  dE ⊢ let y = f x, dy = df x dx in dt ⇓ dv

[SDClosureNil]
  dE(f, df) = Ef[λx. tf], nil
  Ef; x = dE(x) ⊢ tf ⇓n vy
  Ef; x = ⌊dE⌋2(x) ⊢ tf ⇓m vy′
  dE; y = vy, dy = !vy′ ⊢ dt ⇓ dv
  -------------------------------------------
  dE ⊢ let y = f x, dy = df x dx in dt ⇓ dv

[SDPrimitiveNil]
  dE(f, df) = p, nil
  dE(x, dx) = vx, dvx
  dE; y = δp(vx), dy = ∆p(vx, dvx) ⊢ dt ⇓ dv
  -------------------------------------------
  dE ⊢ let y = f x, dy = df x dx in dt ⇓ dv

[SDClosureChange]
  dE(f, df) = Ef[λx. tf], dEf[λx dx. dtf]
  dE(x, dx) = vx, dvx
  Ef; x = vx ⊢ tf ⇓n vy
  dEf; x = vx, dx = dvx ⊢ dtf ⇓ dvy
  dE; y = vy, dy = dvy ⊢ dt ⇓ dv
  -------------------------------------------
  dE ⊢ let y = f x, dy = df x dx in dt ⇓ dv

Figure 17.3: Step-indexed big-step semantics for the change terms of the source language λAL

Having given these definitions, we show in Fig. 17.3 a big-step semantics for change terms (without step-indexing), defined through judgment dE ⊢ dt ⇓ dv, pronounced "Under the environment dE, the change term dt evaluates into the closed change value dv." [SDVar] looks up into dE to return a value for dx. [SDTuple] builds a tuple out of the values of x and a change tuple out of the change values of dx, as found in the environment dE. There are four rules to evaluate let-bindings, depending on the nature of dE(df). These four rules systematically recompute the value vy of y in the original environment; they differ in the way they compute the change dy to y.

If dE(df) is a replacement, [SDReplaceCall] applies. Replacing the value of f in the environment forces recomputing f x from scratch in the new environment. The resulting value vy′ is the new value which must replace vy, so dy binds to !vy′ when evaluating the let body.

If dE(df) is a nil change, we have two rules, depending on the nature of dE(f). If dE(f) is a closure, [SDClosureNil] applies; in that case, the nil change of dE(f) is the exact same closure. Hence, to compute dy, we reevaluate this closure applied to the updated argument ⌊dE⌋2(x) to a value vy′, and bind dy to !vy′. In other words, this rule is equivalent to [SDReplaceCall] in the case where a closure is replaced by itself.² If dE(f) is a primitive, [SDPrimitiveNil] applies. The nil change of a primitive p is its derivative, whose interpretation is realized by a function ∆p(–). The evaluation of this function on the input value and the input change leads to the change dvy bound to dy.

If dE(f ) is a closure change dEf [λx dx dtf ] [SDClosureChange] applies The change dvy resultsfrom the evaluation of dtf in the closure change environment dEf augmented with an input valuefor x and a change value for dx Again let us recall that we will maintain the invariant that theterm dtf behaves as the derivative of f so this rule can be seen as the invocation of f rsquos derivative

²Based on Appendix C, we are confident we could in fact use the derivative of tf instead of replacement changes, but transforming terms in a semantics seems aesthetically wrong. We can also restrict nil to primitives, as we essentially did in Appendix C.


Expressiveness A closure in the initial environment can be used to represent a top-level definition. Since environment entries can point to primitives, we need no syntax to directly represent calls of primitives in the syntax of base terms. To encode in our syntax a program with top-level definitions and a term to be evaluated, representing the entry point, one can produce a term t representing the entry point, together with an environment E containing as values any top-level definitions, primitives and constants used in the program.

Our formalization does not model directly n-ary functions, but they can be encoded through unary functions and tuples. This encoding does not support currying efficiently, but we discuss possible solutions in Sec. 17.5.7.

Control operators, like recursion combinators or branching, can be introduced as primitive operations as well. If the branching condition changes, expressing the output change in general requires replacement changes. Similarly to branching, we can add tagged unions.

17.3.2 Static differentiation in λAL

Definition 10.5.1 defines differentiation for simply-typed λ-calculus terms. Figure 17.4 shows differentiation for λAL syntax.

Dι⟦x⟧ = dx
Dι⟦let y = (x) in t⟧ = let y = (x), dy = (dx) in Dι⟦t⟧
Dι⟦let y = f x in t⟧ = let y = f x, dy = df x dx in Dι⟦t⟧

Figure 17.4: Static differentiation in λAL

Differentiating a base term t produces a change term Dι⟦t⟧, its derivative. Differentiating the final result variable x produces its change variable dx. Differentiation copies each binding of an intermediate result y to the output, and adds a new binding for its change dy. If y is bound to tuple (x), then dy will be bound to the change tuple (dx). If y is bound to function application f x, then dy will be bound to the application of function change df to input x and its change dx.
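For instance, on a term with two chained calls, the rules above give (our worked example):

Dι⟦let y = f x in let z = g y in z⟧
  = let y = f x, dy = df x dx in
    let z = g y, dz = dg y dy in
    dz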

Evaluating Dι⟦t⟧ recomputes all intermediate results computed by t. This recomputation will be avoided through cache-transfer style in Sec. 17.3.5.

The original transformation for static differentiation of λ-terms [Cai et al., 2014] has three cases, which we recall here:

Derive(x) = dx
Derive(t u) = Derive(t) u Derive(u)
Derive(λx. t) = λx dx. Derive(t)

Even though the first two cases of Cai et al.'s differentiation map into the two cases of our differentiation variant, one may ask where the third case is realized now. Actually, this third case occurs while we transform the initial environment. Indeed, we will assume that the closures of the environment of the source program have each been adjoined a derivative. More formally, we suppose that the derivative of t is evaluated under an environment Dι⟦E⟧ obtained as follows:

Dι⟦•⟧ = •
Dι⟦E; f = Ef[λx. t]⟧ = Dι⟦E⟧; f = Ef[λx. t], df = Dι⟦Ef⟧[λx dx. Dι⟦t⟧]
Dι⟦E; x = v⟧ = Dι⟦E⟧; x = v, dx = nil    (if v is not a closure)


17.3.3 A new soundness proof for static differentiation

As in Cai et al. [2014]'s development, static differentiation is only sound on input changes that are valid. However, our definition of validity needs to be significantly different. Cai et al. prove soundness for a strongly normalizing simply-typed λ-calculus using denotational semantics. We generalize this result to an untyped and Turing-complete language, using purely syntactic arguments. In both scenarios, a function change is only valid if it turns valid input changes into valid output changes, so validity is a logical relation. Since standard logical relations only apply to typed languages, we turn to step-indexed logical relations.

Validity as a step-indexed logical relation

To state and prove soundness of differentiation, we define validity by introducing a ternary step-indexed relation over base values, changes and updated values, following previous work on step-indexed logical relations [Ahmed, 2006; Acar et al., 2008]. Experts might notice small differences in our step-indexing, as mentioned in Sec. 17.3.1, but they do not affect the substance of the proof. We write

dv ▷k v1 ↪ v2

and say that "dv is a valid change from v1 to v2, up to k steps" to mean that dv is a change from v1 to v2, and that dv is a valid description of the differences between v1 and v2, with validity tested with up to k steps. To justify this intuition of validity, we state and prove two lemmas: a valid change from v1 to v2 goes indeed from v1 to v2 (Lemma 17.3.1), and if a change is valid up to k steps it is also valid up to fewer steps (Lemma 17.3.2).

Lemma 17.3.1 (⊕ agrees with validity)
If we have dv ▷k v1 ↪ v2 for all step-indexes k, then v1 ⊕ dv = v2.

Lemma 17.3.2 (Downward-closure)
If N ≥ n, then dv ▷N v1 ↪ v2 implies dv ▷n v1 ↪ v2.

• dℓ ▷n ℓ ↪ δ⊕(ℓ, dℓ)

• !v2 ▷n v1 ↪ v2

• nil ▷n v ↪ v

• (dv1, ..., dvm) ▷n (v1, ..., vm) ↪ (v1′, ..., vm′)
  if and only if ∀k < n, ∀i ∈ [1 ... m], dvi ▷k vi ↪ vi′

• dE[λx dx. dt] ▷n E1[λx. t] ↪ E2[λx. t]
  if and only if E2 = E1 ⊕ dE and ∀k < n, ∀v1, dv, v2,
  if dv ▷k v1 ↪ v2 then
  (dE; x = v1, dx = dv ⊢ dt) ▷k (E1; x = v1 ⊢ t) ↪ (E2; x = v2 ⊢ t)

Figure 17.5: Step-indexed relation between values and changes

As usual with step-indexed logical relations, validity is defined by well-founded induction over naturals ordered by <. To show this, it helps to observe that evaluation always takes at least one step.

Validity is formally defined by cases in Figure 17.5; we describe each case in turn. First, a constant change dℓ is a valid change from ℓ to ℓ ⊕ dℓ = δ⊕(ℓ, dℓ). Since the function δ⊕ is partial, the relation only holds for the constant changes dℓ which are valid changes for ℓ. Second, a replacement change !v2 is always a valid change from any value v1 to v2. Third, a nil change is a valid change between any value and itself. Fourth, a tuple change is valid up to step n if each of its components is valid up to any step strictly less than n. Fifth, we define validity for closure changes. Roughly


speaking, this statement means that a closure change is valid if (i) its environment change dE is valid for the original closure environment E1 and for the new closure environment E2; (ii) when applied to related values, the closure bodies t1 and t2 are related by dt. The validity relation between terms is defined as follows:

(dE ⊢ dt) ▷n (E1 ⊢ t1) ↪ (E2 ⊢ t2)
if and only if ∀k < n, v1, v2,
E1 ⊢ t1 ⇓k v1 and E2 ⊢ t2 ⇓ v2
implies that ∃dv, dE ⊢ dt ⇓ dv and dv ▷n−k v1 ↪ v2

We extend this relation from values to environments, by defining a judgment dE ▷k E1 ↪ E2 as follows:

-----------------
• ▷k • ↪ •

dE ▷k E1 ↪ E2    dv ▷k v1 ↪ v2
------------------------------------------------------
(dE; x = v1, dx = dv) ▷k (E1; x = v1) ↪ (E2; x = v2)

The above lemmas about validity for values extend to environments.

Lemma 17.3.3 (⊕ agrees with validity for environments)
If dE ▷k E1 ↪ E2, then E1 ⊕ dE = E2.

Lemma 17.3.4 (Downward-closure for environments)
If N ≥ n, then dE ▷N E1 ↪ E2 implies dE ▷n E1 ↪ E2.

Finally, for values, terms and environments, omitting the step count k from validity means that validity holds for all k. That is, for instance, dv ▷ v1 ↪ v2 means dv ▷k v1 ↪ v2 for all k.

Soundness of differentiation

We can state a soundness theorem for differentiation without mentioning step-indexes. Instead of proving it directly, we must first prove a more technical statement (Lemma 17.3.6) that mentions step-indexes explicitly.

Theorem 17.3.5 (Soundness of differentiation in λAL)
If dE is a valid change environment from base environment E1 to updated environment E2, that is dE ▷ E1 ↪ E2, and if t converges both in the base and updated environment, that is E1 ⊢ t ⇓ v1 and E2 ⊢ t ⇓ v2, then Dι⟦t⟧ evaluates under the change environment dE to a valid change dv between base result v1 and updated result v2, that is dE ⊢ Dι⟦t⟧ ⇓ dv, dv ▷ v1 ↪ v2 and v1 ⊕ dv = v2.

Lemma 17.3.6 (Fundamental Property)
For each n, if dE ▷n E1 ↪ E2, then (dE ⊢ Dι⟦t⟧) ▷n (E1 ⊢ t) ↪ (E2 ⊢ t).

17.3.4 The target language iλAL

In this section, we present the target language of a transformation that extends static differentiation with CTS conversion. As said earlier, the functions of iλAL compute both their output and a cache, which contains the intermediate values that contributed to that output. Derivatives receive this cache and use it to compute changes without recomputing intermediate values; derivatives also update the caches according to their input changes.


Base terms
  M ::= let y, c^y_fx = f x in M    Call
      | let y = (x) in M            Tuple
      | (x, C)                      Result

Cache terms
  C ::= •            Empty
      | C, c^y_fx    Sub-cache
      | C, x         Cached value

Change terms
  dM ::= let dy, c^y_fx = df dx c^y_fx in dM   Call
       | let dy = (dx) in dM                   Tuple
       | (dx, dC)                              Result

Cache updates
  dC ::= •                Empty
       | dC, c^y_fx       Sub-cache
       | dC, (x ⊕ dx)     Updated value

Closed values
  V ::= F[λx. M]    Closure
      | (V)         Tuple
      | ℓ           Literal
      | p           Primitive

Cache values
  Vc ::= •           Empty
       | Vc, Vc′     Sub-cache
       | Vc, V       Cached value

Change values
  dV ::= dF[λdx C. dM]   Closure
       | (dV)            Tuple
       | dℓ              Literal
       | nil             Nil
       | !V              Replacement

Base definitions
  Dv ::= x = V          Value definition
       | c^y_fx = Vc    Cache definition

Change definitions
  dDv ::= Dv         Base
        | dx = dV    Change

Evaluation environments
  F ::= F; Dv    Binding
      | •        Empty

Change environments
  dF ::= dF; dDv   Binding
       | •         Empty

Figure 17.6: Target language iλAL (syntax)

Syntax The syntax of iλAL is defined in Figure 17.6. Base terms of iλAL follow again λ-lifted A'NF, like λAL, except that a let-binding for a function application f x now binds an extra cache identifier c^y_fx besides output y. Cache identifiers have non-standard syntax: a cache identifier can be seen as a triple that refers to the value identifiers f, x and y. Hence, an α-renaming of one of these three identifiers must refresh the cache identifier accordingly. Result terms explicitly return cache C through syntax (x, C).

The syntax for caches has three cases: a cache can be empty, or it can prepend a value or a cache variable to a cache. In other words, a cache is a tree-like data structure which is isomorphic to an execution trace, containing both immediate values and the execution traces of the function calls issued during the evaluation.

The syntax for change terms of iλAL witnesses the CTS discipline followed by the derivatives: to determine dy, the derivative of f evaluated at point x with change dx expects the cache produced by evaluating y in the base term. The derivative returns the updated cache, which contains the intermediate results that would be gathered by the evaluation of f (x ⊕ dx). The result term of every change term returns a cache update dC in addition to the computed change.

The syntax for cache updates resembles the one for caches, but each value identifier x of the input cache is updated with its corresponding change dx.

Semantics An evaluation environment F of iλAL contains not only values but also cache values. The syntax for values V includes closures, tuples, primitives and constants. The syntax for cache values Vc mimics the one for cache terms. The evaluation of change terms expects the evaluation environments dF to also include bindings for change values.

There are five kinds of change values: closure changes, tuple changes, literal changes, nil changes and replacements. Closure changes embed an environment dF and a code pointer for a function waiting for both a base value x and a cache C. By abuse of notation, we reuse the same syntax C to both deconstruct and construct caches. Other changes are similar to the ones found in λAL.

Base terms of the language are evaluated using a big-step semantics defined in Figure 17.7.


Evaluation of base terms   F ⊢ M ⇓ (V, Vc)

[TResult]
  F(x) = V    F ⊢ C ⇓ Vc
  ------------------------
  F ⊢ (x, C) ⇓ (V, Vc)

[TTuple]
  F; y = (F(x)) ⊢ M ⇓ (V, Vc)
  --------------------------------
  F ⊢ let y = (x) in M ⇓ (V, Vc)

[TClosureCall]
  F(f) = F′[λx′. M′]
  F′; x′ = F(x) ⊢ M′ ⇓ (V′, Vc′)
  F; y = V′; c^y_fx = Vc′ ⊢ M ⇓ (V, Vc)
  ----------------------------------------
  F ⊢ let y, c^y_fx = f x in M ⇓ (V, Vc)

[TPrimitiveCall]
  F(f) = p    δp(F(x)) = (V′, Vc′)
  F; y = V′; c^y_fx = Vc′ ⊢ M ⇓ (V, Vc)
  ----------------------------------------
  F ⊢ let y, c^y_fx = f x in M ⇓ (V, Vc)

Evaluation of caches   F ⊢ C ⇓ Vc

[TEmptyCache]

  F ⊢ • ⇓ •

[TCacheVar]
  F(x) = V    F ⊢ C ⇓ Vc
  ------------------------
  F ⊢ C, x ⇓ Vc, V

[TCacheSubCache]
  F(c^y_fx) = Vc′    F ⊢ C ⇓ Vc
  --------------------------------
  F ⊢ C, c^y_fx ⇓ Vc, Vc′

Figure 17.7: Target language iλAL (semantics of base terms and caches)

Judgment "F ⊢ M ⇓ (V, Vc)" is read "Under evaluation environment F, base term M evaluates to value V and cache Vc". Auxiliary judgment "F ⊢ C ⇓ Vc" evaluates cache terms into cache values. Rule [TResult] not only looks into the environment for the return value V, but it also evaluates the returned cache C. Rule [TTuple] is similar to the rule of the source language, since no cache is produced by the allocation of a tuple. Rule [TClosureCall] works exactly as [SClosureCall], except that the cache value returned by the closure is bound to cache identifier c^y_fx. In the same way, [TPrimitiveCall] resembles [SPrimitiveCall], but also binds c^y_fx. Rule [TEmptyCache] evaluates an empty cache term into an empty cache value. Rule [TCacheVar] computes the value of cache term C, x by appending the value of variable x to cache value Vc computed for cache term C. Similarly, rule [TCacheSubCache] appends the cache value of a cache named c^y_fx to the cache value Vc computed for C.

Change terms of the target language are also evaluated using a big-step semantics, defined in Fig. 17.8. Judgment "dF ⊢ dM ⇓ (dV, Vc)" is read "Under evaluation environment dF, change term dM evaluates to change value dV and updated cache Vc". The first auxiliary judgment "dF ⊢ dC ⇓ Vc" defines evaluation of cache update terms. We omit the rules for this judgment, since it is similar to the one for cache terms, except that cached values are computed by x ⊕ dx, not simply x. The final auxiliary judgment "Vc ∼ C → dF" describes a limited form of pattern matching used by CTS derivatives: namely, how a cache pattern C matches a cache value Vc to produce a change environment dF.

Rule [TDResult] returns the final change value of a computation, as well as an updated cache resulting from the evaluation of the cache update term dC. Rule [TDTuple] resembles its counterpart in the source language, but the tuple for y is not built, as it has already been pushed in the environment by the cache.

As for λAL, there are four rules to deal with let-bindings, depending on the shape of the change bound to df in the environment. If df is bound to a replacement, the rule [TDReplaceCall] applies.


Evaluation of change terms   dF ⊢ dM ⇓ (dV, Vc)

[TDResult]
  dF(dx) = dV    dF ⊢ dC ⇓ Vc
  ------------------------------
  dF ⊢ (dx, dC) ⇓ (dV, Vc)

[TDTuple]
  dF; dy = (dF(dx)) ⊢ dM ⇓ (dV, Vc)
  ----------------------------------------
  dF ⊢ let dy = (dx) in dM ⇓ (dV, Vc)

[TDReplaceCall]
  dF(df) = !Vf
  ⌊dF⌋2 ⊢ (f x) ⇓ (V′, Vc′)
  dF; dy = !V′; c^y_fx = Vc′ ⊢ dM ⇓ (dV, Vc)
  ----------------------------------------------------
  dF ⊢ let dy, c^y_fx = df dx c^y_fx in dM ⇓ (dV, Vc)

[TDClosureNil]
  dF(f, df) = Ff[λx. Mf], nil
  Ff; x = ⌊dF⌋2(x) ⊢ Mf ⇓ (Vy′, Vc′)
  dF; dy = !Vy′; c^y_fx = Vc′ ⊢ dM ⇓ (dV, Vc)
  ----------------------------------------------------
  dF ⊢ let dy, c^y_fx = df dx c^y_fx in dM ⇓ (dV, Vc)

[TDPrimitiveNil]
  dF(f, df) = p, nil    dF(x, dx) = Vx, dVx
  dF; dy, c^y_fx = ∆p(Vx, dVx, dF(c^y_fx)) ⊢ dM ⇓ (dV, Vc)
  ----------------------------------------------------
  dF ⊢ let dy, c^y_fx = df dx c^y_fx in dM ⇓ (dV, Vc)

[TDClosureChange]
  dF(df) = dFf[λdx C. dMf]    dF(c^y_fx) ∼ C → dF′
  dFf; dx = dF(dx); dF′ ⊢ dMf ⇓ (dVy, Vc′)
  dF; dy = dVy; c^y_fx = Vc′ ⊢ dM ⇓ (dV, Vc)
  ----------------------------------------------------
  dF ⊢ let dy, c^y_fx = df dx c^y_fx in dM ⇓ (dV, Vc)

Binding of caches   Vc ∼ C → dF

[TMatchEmptyCache]

  • ∼ • → •

[TMatchCachedValue]
  Vc ∼ C → dF
  ------------------------------
  Vc, V ∼ C, x → dF; (x = V)

[TMatchSubCache]
  Vc ∼ C → dF
  ----------------------------------------
  Vc, Vc′ ∼ C, c^y_fx → dF; (c^y_fx = Vc′)

Figure 17.8: Target language iλAL (semantics of change terms and cache updates)


CTS translation of toplevel definitions   C⟦f = λx. t⟧

C⟦f = λx. t⟧ = f = λx. M
               df = λdx C. D_{•, (x⊕dx)}⟦t⟧
  where (C, M) = C_{•, x}⟦t⟧

CTS differentiation of terms   D_dC⟦t⟧

D_dC⟦let y = f x in t⟧ = let dy, c^y_fx = df dx c^y_fx in M
  where M = D_{dC, (y⊕dy), c^y_fx}⟦t⟧

D_dC⟦let y = (x) in t⟧ = let dy = (dx) in M
  where M = D_{dC, (y⊕dy)}⟦t⟧

D_dC⟦x⟧ = (dx, dC)

CTS translation of terms   C_C⟦t⟧

C_C⟦let y = f x in t⟧ = (C′, let y, c^y_fx = f x in M)
  where (C′, M) = C_{C, y, c^y_fx}⟦t⟧

C_C⟦let y = (x) in t⟧ = (C′, let y = (x) in M)
  where (C′, M) = C_{C, y}⟦t⟧

C_C⟦x⟧ = (C, (x, C))

Figure 17.9: Cache-Transfer Style (CTS) conversion from λAL to iλAL

In that case, we reevaluate the function call in the updated environment ⌊dF⌋2 (defined similarly as in the source language). This evaluation leads to a new value V′, which replaces the original one, as well as an updated cache for c^y_fx.

If df is bound to a nil change and f is bound to a closure, the rule [TDClosureNil] applies. This rule mimics again its counterpart in the source language, with the difference that only the resulting change and the updated cache are bound in the environment.

If df is bound to a nil change and f is bound to primitive p, the rule [TDPrimitiveNil] applies. The derivative of p is invoked with the value of x, its change value, and the cache of the original call to p. The semantics of p's derivative is given by builtin function ∆p(–), as in the source language.

If df is bound to a closure change and f is bound to a closure, the rule [TDClosureChange] applies. The body of the closure change is evaluated under the closure change environment, extended with the value of the formal argument dx and with the environment resulting from the binding of the original cache value to the variables occurring in the cache C. This evaluation leads to a change and an updated cache, bound in the environment to continue with the evaluation of the rest of the term.

17.3.5 CTS conversion from λAL to iλAL

CTS conversion from λAL to iλAL is defined in Figure 17.9. It comprises CTS differentiation D⟦–⟧, from λAL base terms to iλAL change terms, and CTS translation C⟦–⟧, from λAL to iλAL, which is overloaded over top-level definitions C⟦f = λx. t⟧ and terms C_C⟦t⟧.

By the first rule, C⟦–⟧ maps each source toplevel definition "f = λx. t" to the compiled code of the function f and to the derivative df of f, expressed in the target language iλAL. These target definitions are generated by a first call to the compilation function C_{•, x}⟦t⟧: it returns both M, the compiled body of f, and the cache term C, which contains the names of the intermediate values


CTS translation of values   C⟦v⟧

C⟦E[λx. t]⟧ = C⟦E⟧[λx. M]
  where (C, M) = C_{•, x}⟦t⟧

C⟦ℓ⟧ = ℓ

CTS translation of change values   C⟦dv⟧

C⟦dE[λx dx. Dι⟦t⟧]⟧ = C⟦dE⟧[λdx C. D_{•, (x⊕dx)}⟦t⟧]
  where (C, M) = C_{•, x}⟦t⟧

C⟦!v⟧ = !C⟦v⟧
C⟦dℓ⟧ = dℓ

CTS translation of value environments   C⟦E⟧

C⟦•⟧ = •
C⟦E; x = v⟧ = C⟦E⟧; x = C⟦v⟧

CTS translation of change environments   C⟦dE⟧

C⟦•⟧ = •
C⟦dE; x = v, dx = dv⟧ = C⟦dE⟧; x = C⟦v⟧, dx = C⟦dv⟧

Figure 17.10: Extending CTS translation to values, change values, environments and change environments

computed by the evaluation of M. This cache term C is used as a cache pattern to define the second argument of the derivative of f. That way, we make sure that the shape of the cache expected by df is consistent with the shape of the cache produced by f. The body of derivative df is computed by the derivation call D_{•, (x⊕dx)}⟦t⟧.

CTS translation on terms, C_C⟦t⟧, accepts a term t and a cache term C. This cache is a fragment of output code: in tail position (t = x), the translation generates code to return both the result x and the cache C. When the transformation visits let-bindings, it outputs extra bindings for caches c^y_fx, and appends all variables newly bound in the output to the cache used when visiting the let-body.

Similarly to C_C⟦t⟧, CTS derivation D_dC⟦t⟧ accepts a cache update dC to return in tail position. While cache terms record intermediate results, cache updates record result updates: for let-bindings, to update y by change dy, CTS derivation appends to dC the term y ⊕ dy to replace y.
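As a worked instance of these rules (ours), consider again the two-call term from Sec. 17.3.2. CTS translation threads a cache through both calls and returns it, while CTS differentiation threads the corresponding cache updates:

C_•⟦let y = f x in let z = g y in z⟧
  = ((•, y, c^y_fx, z, c^z_gy),
     let y, c^y_fx = f x in
     let z, c^z_gy = g y in
     (z, (•, y, c^y_fx, z, c^z_gy)))

D_•⟦let y = f x in let z = g y in z⟧
  = let dy, c^y_fx = df dx c^y_fx in
    let dz, c^z_gy = dg dy c^z_gy in
    (dz, (•, y ⊕ dy, c^y_fx, z ⊕ dz, c^z_gy))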

17.3.6 Soundness of CTS conversion

In this section, we outline definitions and main lemmas needed to prove CTS conversion sound. The proof is based on a mostly straightforward simulation in lock-step, but two subtle points emerge. First, we must relate λAL environments, which do not contain caches, with iλAL environments, which do. Second, while evaluating CTS derivatives, the evaluation environment mixes caches from the base computation and updated caches computed by the derivatives.

Evaluation commutes with CTS conversion Figure 17.10 extends CTS translation to values, change values, environments and change environments. CTS translation of base terms commutes with our semantics:


dF ↑ • ⇝i dF

dF ↑ C ⇝i dF′
--------------------
dF ↑ C, x ⇝i dF′

dF ↑ C ⇝i dF′    ⌊dF⌋i ⊢ let y, c^y_fx = f x in (y, c^y_fx) ⇓ (_, Vc)
-----------------------------------------------------------------------
dF ↑ C, c^y_fx ⇝i dF′; c^y_fx = Vc

Figure 17.11: Extension of an environment with cache values dF ↑ C ⇝i dF′ (for i = 1, 2)

Lemma 17.3.7 (Evaluation commutes with CTS translation)
For all E, t and v such that E ⊢ t ⇓ v, and for all C, there exists Vc such that C⟦E⟧ ⊢ C_C⟦t⟧ ⇓ (C⟦v⟧, Vc).

Stating a corresponding lemma for CTS translation of derived terms is trickier. If the derivative of t evaluates correctly in some environment (that is, dE ⊢ Dι⟦t⟧ ⇓ dv), CTS derivative D_dC⟦t⟧ cannot be evaluated in environment C⟦dE⟧. A CTS derivative can only evaluate against environments containing cache values from the base computation, but no cache values appear in C⟦dE⟧.

As a fix, we enrich C⟦dE⟧ with the values of a cache C, using the pair of judgments dF ↑ C ⇝i dF′ (for i = 1, 2) defined in Fig. 17.11. Judgment dF ↑ C ⇝i dF′ is read "Target change environment dF′ extends dF with the original (for i = 1) or updated (for i = 2) values of cache C". Since C is essentially a list of variables containing no values, cache values in dF′ must be computed from ⌊dF⌋i.

Lemma 17.3.8 (Evaluation commutes with CTS differentiation)
Let C be such that (C, _) = C_•⟦t⟧. For all dE, t and dv, if dE ⊢ Dι⟦t⟧ ⇓ dv and C⟦dE⟧ ↑ C ⇝1 dF, then dF ⊢ D_dC⟦t⟧ ⇓ (C⟦dv⟧, Vc) for some Vc.

The proof of this lemma is not immediate, since during the evaluation of D_dC⟦t⟧ the new caches replace the old caches. In our Coq development, we enforce a physical separation between the part of the environment containing old caches and the one containing new caches, and we maintain the invariant that the second part of the environment corresponds to the remaining part of the term.

Soundness of CTS conversion Finally, we can state soundness of CTS differentiation relative to differentiation. The theorem says that (a) the CTS derivative D_C⟦t⟧ computes the CTS translation C⟦dv⟧ of the change computed by the standard derivative Dι⟦t⟧; (b) the updated cache Vc2 produced by the CTS derivative coincides with the cache produced by the CTS-translated base term M in the updated environment ⌊dF2⌋2. We must use ⌊dF2⌋2 instead of dF2 to evaluate CTS-translated base term M, since dF2, produced by environment extension, contains updated caches, changes and original values. Since we require a correct cache via condition (b), we can use this cache to invoke the CTS derivative on further changes, as described in Sec. 17.2.2.

Theorem 17.3.9 (Soundness of CTS differentiation with respect to differentiation)
Let C and M be such that (C, M) = C_•⟦t⟧. For all dE, t and dv such that dE ⊢ Dι⟦t⟧ ⇓ dv, and for all dF1 and dF2 such that

C⟦dE⟧ ↑ C ⇝1 dF1
C⟦dE⟧ ↑ C ⇝2 dF2
⌊dF1⌋1 ⊢ M ⇓ (V1, Vc1)
⌊dF2⌋2 ⊢ M ⇓ (V2, Vc2)

we have dF1 ⊢ D_C⟦t⟧ ⇓ (C⟦dv⟧, Vc2).


17.4 Incrementalization case studies

In this section, we investigate whether our transformations incrementalize efficiently programs in a typed language such as Haskell. And indeed, after providing support for the needed bulk operations on sequences, bags and maps, we successfully transform a few case studies and obtain efficient incremental programs, that is, ones for which incremental computation is faster than from scratch recomputation.

We type CTS programs by associating to each function f a cache type FC. We can transform programs that use higher-order functions by making closure construction explicit.

We illustrate those points on three case studies: average computation over bags of integers, a nested loop over two sequences, and a more involved example inspired by Koch et al.'s work on incrementalizing database queries.

In all cases, we confirm that results are consistent between from scratch recomputation and incremental evaluation.

Our benchmarks were compiled by GHC 8.0.2. They were run on a 2.60GHz dual core Intel i7-6600U CPU with 12GB of RAM, running Ubuntu 16.04.

17.4.1 Averaging bags of integers

Sec. 17.2.2 motivates our transformation with a running example of computing the average over a bag of integers. We represent bags as maps from elements to (possibly negative) multiplicities. Earlier work [Cai et al., 2014; Koch et al., 2014] represents bag changes as bags of removed and added elements: to update element a with change da, one removes a and adds a ⊕ da. Instead, our bag changes contain element changes: a valid bag change is either a replacement change (with a new bag) or a list of atomic changes. An atomic change contains an element a, a valid change for that element da, a multiplicity n, and a change to that multiplicity dn. Updating a bag with an atomic change means removing n occurrences of a and inserting n ⊕ dn occurrences of updated element a ⊕ da.

type Bag a = Map a ℤ
type ∆(Bag a) = Replace (Bag a) | Ch [AtomicChange a]
data AtomicChange a = AtomicChange a (∆a) ℤ (∆ℤ)

This change type enables both users and derivatives to express efficiently various bag changes, such as inserting a single element or changing a single element, but also changing some but not all occurrences of an element in a bag.

insertSingle :: a → ∆(Bag a)
insertSingle a = Ch [AtomicChange a 0_a 0 (Ch 1)]

changeSingle :: a → ∆a → ∆(Bag a)
changeSingle a da = Ch [AtomicChange a da 1 0_1]
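A hedged sketch (ours) of applying one atomic change to a bag, following the description above; element changes are specialized to integer differences, the atomic change is modelled as a plain tuple (a, da, n, dn), and zero-multiplicity entries are not pruned:

import qualified Data.Map.Strict as Map

applyAtomic :: Map.Map Int Int -> (Int, Int, Int, Int) -> Map.Map Int Int
applyAtomic bag (a, da, n, dn) =
  Map.insertWith (+) (a + da) (n + dn)      -- insert n ⊕ dn copies of a ⊕ da
    (Map.insertWith (+) a (negate n) bag)   -- remove n copies of a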

Based on this change structure, we provide efficient primitives and their derivatives. The CTS variant of map, that we call mapC, takes a function fC in CTS and a bag as, and produces a bag and a cache. Because map is not self-maintainable, the cache stores the arguments fC and as. In addition, the cache stores, for each invocation of fC, and therefore for each distinct element in as, the result of fC of type b and the cache of type c. The incremental function dmapC has to produce an updated cache. We check that this is the same cache that mapC would produce when applied to updated inputs, using the QuickCheck library.


Following ideas inspired by Rossberg et al. [2010], all higher-order functions (and typically also their caches) are parametric over cache types of their function arguments. Here, functions mapC and dmapC and cache type MapC are parametric over the cache type c of fC and dfC.

map :: (a → b) → Bag a → Bag b
type MapC a b c = (a → (b, c), Bag a, Map a (b, c))
mapC :: (a → (b, c)) → Bag a → (Bag b, MapC a b c)
dmapC :: (∆a → c → (∆b, c)) → ∆(Bag a) → MapC a b c → (∆(Bag b), MapC a b c)

We wrote the length and sum functions used in our benchmarks in terms of primitives map and foldGroup. We applied our transformations to get lengthC, dlengthC, sumC and dsumC. This transformation could in principle be automated; implementing the CTS variant and CTS derivative of primitives efficiently requires manual work.

We evaluate whether we can produce an updated result with davgC faster than by from scratch recomputation with avg. We use the criterion library for benchmarking. The benchmarking code and our raw results are available in the supplementary material. We fix an initial bag of size n: it contains the integers from 1 to n. We define a sequence of consecutive changes of length r as the insertion of 1, the deletion of 1, the insertion of 2, the deletion of 2, and so on. We update the initial bag with these changes, one after another, to obtain a sequence of input bags.

We measure the time taken by from scratch recomputation: we apply avg to each input bag in the sequence and fully force all results. We report the time from scratch recomputation takes as this measured time divided by the number r of input bags. We measure the time taken by our CTS variant avgC similarly; we make sure to fully force all results and caches avgC produces on the sequence of input bags.

We measure the time taken by our CTS derivative davgC. We assume a fully forced initial result and cache. We apply davgC to each change in the sequence of bag changes, using the cache from the previous invocation. We then apply the result change to the previous result. We fully force the obtained sequence of results. Finally, we measure the time taken to only produce the sequence of result changes, without applying them.

Because of laziness, choosing what exactly to measure is tricky. We ensure that reported times include the entire cost of computing the whole sequence of results. To measure this cost, we ensure any thunks within the sequence are forced. We need not force any partial results, such as output changes or caches. Doing so would distort the results, because forcing the whole cache can cost asymptotically more than running derivatives. However, not using the cache at all underestimates the cost of incremental computation. Hence, we measure a sequence of consecutive updates, each using the cache produced by the previous one.
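A hedged sketch of this benchmark structure with criterion (ours): avg, avgC and davgC are the functions from Sec. 17.2.2, while inputBags, initialBag, bagChanges and the driver applyAll are hypothetical names; nf fully forces results, matching the methodology above.

import Criterion.Main

main :: IO ()
main = defaultMain
  [ bench "from scratch" $ nf (map avg) inputBags
  , bench "caching"      $ nf (map (fst . avgC)) inputBags
  , bench "incremental"  $ nf (applyAll davgC (avgC initialBag)) bagChanges
  ]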

The plot in Fig. 17.12a shows execution time versus the size n of the initial input. To produce the initial result and cache, avgC takes longer than the original avg function takes to produce just the result. This is to be expected. Producing the result incrementally is much faster than from scratch recomputation. This speedup increases with the size of the initial bag. For an initial bag size of 800, incrementally updating the result is 70 times faster than from scratch recomputation.

The difference between only producing the output change versus producing the output change and updating the result is very small in this example. This is to be expected, because in this example the result is an integer, and updating and evaluating an integer is fast. We will see an example where there is a bigger difference between the two.


Figure 17.12: [Plots: execution time (in milliseconds) versus input size, with lines for original, caching, change only, and incremental.]
(a) Benchmark results for avg
(b) Benchmark results for totalPrice

17.4.2 Nested loops over two sequences

We implemented incremental sequences and related primitives following Firsov and Jeltsch [2016]: our change operations and first-order operations (such as concat) reuse their implementation. On the other hand, we must extend higher-order operations such as map to handle non-nil function changes and caching. A correct and efficient CTS incremental dmapC function has to work differently depending on whether the given function change is nil or not: for a non-nil function change it has to go over the input sequence; for a nil function change it can avoid that.

Consider the running example again, this time in A'NF. The partial application of a lambda-lifted function constructs a closure. We made that explicit with a closure function.

cartesianProduct xs ys = let
  mapPairYs = closure mapPair ys
  xys = concatMap mapPairYs xs
  in xys

mapPair ys = λx → let
  pairX = closure pair x
  xys = map pairX ys
  in xys

While the only valid change for closed functions is their nil change, for closures we can have non-nil function changes. We represent closed functions and closures as variants of the same type. Correspondingly, we represent changes to a closed function and changes to a closure as variants of the same type of function changes. We inspect this representation at runtime to find out if a function change is a nil change.

data Fun a b c where
  Closed  :: (a → (b, c)) → Fun a b c
  Closure :: (e → a → (b, c)) → e → Fun a b c

data ∆(Fun a b c) where
  DClosed  :: (∆a → c → (∆b, c)) → ∆(Fun a b c)
  DClosure :: (∆e → ∆a → c → (∆b, c)) → ∆e → ∆(Fun a b c)
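The runtime nil test can then be a simple case analysis on this representation. The following is a hedged sketch in plain Haskell (DFun stands in for ∆(Fun a b c) with change types made explicit, and the NilTestable class for environment changes is a hypothetical ingredient, not the thesis's API); it relies on the fact, stated above, that closed functions only admit their nil change.

{-# LANGUAGE GADTs #-}

class NilTestable de where
  isNil :: de -> Bool

-- DFun a da b db c mirrors the ∆(Fun a b c) declaration above.
data DFun a da b db c where
  DClosed  :: (da -> c -> (db, c)) -> DFun a da b db c
  DClosure :: NilTestable de
           => (de -> da -> c -> (db, c)) -> de -> DFun a da b db c

-- dmapC can dispatch on this test: for a nil function change it can skip
-- traversing the input sequence entirely.
isNilFunChange :: DFun a da b db c -> Bool
isNilFunChange (DClosed _)     = True   -- closed functions admit only nil changes
isNilFunChange (DClosure _ de) = isNil de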


(Figure: Benchmark results for the Cartesian product. (a) Changing the inner sequence. (b) Changing the outer sequence. Plots show execution time in seconds versus input size, for the original, caching, change-only and incremental variants.)

We have also evaluated another representation of function changes with different tradeoffs, where we use defunctionalization instead of closure conversion. We discuss the use of defunctionalization in Appendix D.

We use the same benchmark setup as in the benchmark for the average computation on bags. The input for size n is a pair of sequences (xs, ys). Each sequence initially contains the integers from 1 to n. Updating the result in reaction to a change dxs to xs takes less time than updating the result in reaction to a change dys to ys. The reason is that when dys is a nil change, we dynamically detect that the function change passed to dconcatMapC is the nil change and avoid looping over xs entirely.

We benchmark changes to the outer sequence xs and the inner sequence ys separately. In both cases, the sequence of changes consists of insertions and deletions of different numbers at different positions. When we benchmark changes to the outer sequence xs, we take this sequence of changes for xs, and all changes to ys will be nil changes. When we benchmark changes to the inner sequence ys, we take this sequence of changes for ys, and all changes to xs will be nil changes.

In this example, preparing the cache takes much longer than from-scratch recomputation. The speedup of incremental computation over from-scratch recomputation increases with the size of the initial sequences. It reaches 2.5 for a change to the inner sequence and 8.8 for a change to the outer sequence, when the initial sequences have size 800. Producing and fully forcing only the changes to the results is 5 times faster for a change to the inner sequence and 370 times faster for a change to the outer sequence.

While we do get speedups for both kinds of changes (to the inner and to the outer sequence), the speedups for changes to the outer sequence are bigger. Due to our benchmark methodology, we have to fully force updated results; this alone takes time quadratic in the size of the input sequences n. Therefore, we also report the time it takes to produce and fully force only the output changes, which is of a lower time complexity.

17.4.3 Indexed joins of two bags

As expected, we found that we can compose functions into larger and more complex programs and apply CTS differentiation to get a fast incremental program.

For this example, imagine we have a bag of orders and a bag of line items. An order is a pair of an integer key and an exchange rate, represented as an integer. A line item is a pair of a key of a corresponding order and a price, represented as an integer. The price of an order is the sum of the prices of the corresponding line items, multiplied by the exchange rate. We want to find the total price, defined as the sum of the prices of all orders.

To do so efficiently, we build an index for line items and an index for orders. We represent indexes as maps. We build a map from order key to the sum of the prices of corresponding line items. Because orders are not necessarily unique by key, and because multiplication distributes over addition, we can build an index mapping each order key to the sum of all exchange rates for this order key. We then compute the total price by merging the two maps by key, multiplying corresponding sums of exchange rates and sums of prices. We compute the total price as the sum of those products.

Because our indexes are maps, we need a change structure for maps. We generalize the change structure for bags, by generalizing from multiplicities to arbitrary values, as long as they have a group structure. A change to a map is either a replacement change (with a new map) or a list of atomic changes. An atomic change is a key k, a valid change to that key dk, a value a, and a valid change to that value da. To update a map with an atomic change means to use the group structure of the type of a to subtract a at key k and to add a ⊕ da at key k ⊕ dk.

type ∆(Map k a) = Replace (Map k a) | Ch [AtomicChange k a]
data AtomicChange k a = AtomicChange k (∆k) a (∆a)
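As an illustration, here is a minimal sketch of applying one atomic change on top of Data.Map, under stated assumptions: a hypothetical Group class with gplus/ginverse, and the ⊕ operations on keys and values passed as explicit arguments. The thesis's actual primitives may differ.

import qualified Data.Map.Strict as Map

class Group a where
  gplus    :: a -> a -> a
  ginverse :: a -> a

-- Subtract a at key k, then add (a ⊕ da) at key (k ⊕ dk).
applyAtomic :: (Ord k, Group a)
            => (k -> dk -> k) -> (a -> da -> a)  -- ⊕ on keys and on values
            -> k -> dk -> a -> da
            -> Map.Map k a -> Map.Map k a
applyAtomic oplusK oplusV k dk a da m =
  let subtracted = Map.insertWith gplus k (ginverse a) m
  in Map.insertWith gplus (oplusK k dk) (oplusV a da) subtracted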

Bags have a group structure, as long as the elements have a group structure. Maps have a group structure, as long as the values have a group structure. This means, for example, that we can efficiently incrementalize programs on maps from keys to bags of elements. We implemented efficient caching and incremental primitives for maps and verified their correctness with QuickCheck.

To build the indexes, we use a groupBy function built from primitive functions foldMapGroup on bags and singleton for bags and maps, respectively. While computing the indexes with groupBy is self-maintainable, merging them is not. We need to cache and incrementally update the intermediately created indexes to avoid recomputing them.
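A plausible definition of groupBy in terms of the named primitives (a sketch; the exact signatures of foldMapGroup and the singleton functions are assumed, not taken from the thesis): each element contributes a singleton map from its key to a singleton bag, and the group structure on maps of bags merges the contributions.

-- Assumed primitives: Bag.foldMapGroup folds a group-valued function over a
-- bag; Map.singleton and Bag.singleton build one-entry containers.
groupBy :: Ord k => (a -> k) -> Bag a -> Map k (Bag a)
groupBy f = Bag.foldMapGroup (\x -> Map.singleton (f x) (Bag.singleton x))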

type Order = (Z, Z)
type LineItem = (Z, Z)

totalPrice :: Bag Order → Bag LineItem → Z
totalPrice orders lineItems = let
  orderIndex = groupBy fst orders
  orderSumIndex = Map.map (Bag.foldMapGroup snd) orderIndex
  lineItemIndex = groupBy fst lineItems
  lineItemSumIndex = Map.map (Bag.foldMapGroup snd) lineItemIndex
  merged = Map.merge orderSumIndex lineItemSumIndex
  total = Map.foldMapGroup multiply merged
  in total

This example is inspired by Koch et al. [2014]. Unlike them, we don't automate indexing, so to get good performance we start from a program that explicitly uses indexes.

We evaluate the performance in the same way we did for the average computation on bags. The initial input of size n is a pair of bags where both contain the pairs (i, i) for i between 1 and n. The sequence of changes consists of alternations between insertion and deletion of pairs (j, j) for different j. We alternate the insertions and deletions between the orders bag and the line items bag.

Our CTS derivative of the original program produces updated results much faster than from-scratch recomputation, and again the speedup increases with the size of the initial bags. For an initial size of the two bags of 800, incrementally updating the result is 180 times faster than from-scratch recomputation.

17.5 Limitations and future work

In this section we describe limitations to be addressed in future work.

17.5.1 Hiding the cache type

In our experiments, functions of the same type f1, f2 :: A → B can be transformed to CTS functions f1 :: A → (B, C1), f2 :: A → (B, C2) with different cache types C1, C2, since cache types depend on the implementation. We can fix this problem, with some runtime overhead, by using a single cache type Cache, defined as a tagged union of all cache types. If we defunctionalize function changes, we can index this cache type with tags representing functions, but other approaches are possible, and we omit details. We conjecture (but have not proven) this fix gives a type-preserving translation, but leave this question for future work.
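One way to realize such a single cache type in Haskell is an existential wrapper with a runtime-checked cast. This is a sketch of the idea (all names are ours, not the thesis's); the possibly failing cast materializes the runtime overhead mentioned above.

{-# LANGUAGE GADTs #-}
import Data.Typeable (Typeable, cast)

-- One shared cache type: effectively a tagged union over all concrete caches.
data Cache where
  Cache :: Typeable c => c -> Cache

-- Hide a CTS function's concrete cache type.
wrap :: Typeable c => (a -> (b, c)) -> (a -> (b, Cache))
wrap f a = let (b, c) = f a in (b, Cache c)

-- Recover the concrete cache inside a derivative; Nothing on a tag mismatch.
unwrap :: Typeable c => Cache -> Maybe c
unwrap (Cache c) = cast c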

17.5.2 Nested bags

Our implementation of bags makes nested bags overly slow: we represent bags as tree-based maps from elements to multiplicities, so looking up a bag b in a bag of bags takes time proportional to the size of b. Possible solutions in this case include shredding, as done by Koch et al. [2016]. We have no such problem for nested sequences, or other nested data which can be addressed in O(1).
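For concreteness, the representation described above amounts to something like the following simplified sketch; looking up a nested bag as a key then requires comparing tree-based maps, which takes time linear in their size.

import qualified Data.Map as Map

-- A bag as a tree-based map from elements to (possibly negative) multiplicities.
type Bag a = Map.Map a Int

-- Nested bags: the keys are themselves maps, so key comparison is expensive.
type BagOfBags a = Map.Map (Bag a) Int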

17.5.3 Proper tail calls

CTS transformation conflicts with proper tail calls, as it turns most tail calls into non-tail calls. In A'NF syntax, tail calls such as let y = f x in g y become let y = f x in let z = g y in z, and in CTS that becomes let (y, cy) = f x in let (z, cz) = g y in (z, (cy, cz)), where the call to g is genuinely not in tail position. This prevents recursion on deeply nested data structures like long lists; but such programs incrementalize inefficiently if deeply nested data is affected, so it is advisable to replace lists by other sequences anyway. It's unclear whether such fixes are available for other uses of tail calls.

17.5.4 Pervasive replacement values

Thanks to replacement changes, we can compute a change from any v1 to any v2 in constant time. Cai et al. [2014] use a difference operator instead, but it's hard to implement in constant time on values of non-constant size. So our formalization and implementation allow replacement values everywhere, to ensure all computations can be incrementalized in some sense. Supporting replacement changes introduces overhead even if they are not used, because it prevents writing self-maintainable CTS derivatives. Hence, to improve performance, one should consider dropping support for replacement values and restricting supported computations. Consider a call to a binary CTS derivative dfc da db c1, after computing (y1, c1) = fc a1 b1: if db is a replacement change carrying a new value b2, then dfc must compute a result afresh by invoking fc a2 b2, and a2 = a1 ⊕ da requires remembering the previous input a1 inside c1. By the same argument, c1 must also remember input b1. Worse, replacement values are only needed to handle cases where incremental computation reduces to recomputation, because the new input is completely different, or because the condition of an if expression changed. Such changes to conditions are forbidden by other works [Koch et al. 2016], we believe for similar reasons.
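The argument above can be made concrete on a tiny example. The following hypothetical sketch uses f a b = a * b on integers, with changes that are either additive differences or replacements; handling Replace forces the cache to retain both previous inputs.

data IntChange = Diff Int | Replace Int

applyC :: Int -> IntChange -> Int      -- plays the role of ⊕ here
applyC x (Diff dx)   = x + dx
applyC _ (Replace y) = y

data MulCache = MulCache Int Int       -- must remember previous inputs a1, b1

mulC :: Int -> Int -> (Int, MulCache)  -- CTS version of (*)
mulC a b = (a * b, MulCache a b)

dmulC :: IntChange -> IntChange -> MulCache -> (IntChange, MulCache)
dmulC da db (MulCache a1 b1) =
  let a2 = applyC a1 da
      b2 = applyC b1 db
  in case (da, db) of
       (Diff da', Diff db') ->         -- properly incremental case
         (Diff (da' * b1 + a1 * db' + da' * db'), MulCache a2 b2)
       _ ->                            -- a replacement: recompute from scratch
         let (y2, c2) = mulC a2 b2 in (Replace y2, c2)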

17.5.5 Recomputing updated values

In some cases, the same updated input might be recomputed more than once. If a derivative df needs some base input x (that is, if df is not self-maintainable), df's input cache will contain a copy of x, and df's output cache will contain its updated value x ⊕ dx. When all or most derivatives are self-maintainable, this is convenient, because in most cases updated inputs will not need to be computed. But if most derivatives are not self-maintainable, the same updated input might be computed multiple times: specifically, if derivative dh calls functions df and dg, and both df and dg need the same base input x, caches for both df and dg will contain the updated value x ⊕ dx, computed independently. Worse, because of pervasive replacement values (Sec. 17.5.4), derivatives in our case studies tend to not be self-maintainable.

In some cases, such repeated updates should be removable by a standard optimizer after inlining and common-subexpression elimination, but it is unclear how often this happens. To solve this problem, derivatives could take and return both old inputs x1 and updated ones x2 = x1 ⊕ dx, and x2 could be computed at the single location where dx is bound. In this case, to avoid updates for unused base inputs, we would have to rely more on absence analysis (Sec. 17.5.6); pruning function inputs appears easier than pruning caches. Otherwise, computations of updated inputs that are not used in a lazy context might cause space leaks, where thunks for x2 = x1 ⊕ dx1, x3 = x2 ⊕ dx2, and so on might accumulate and grow without bounds.

17.5.6 Cache pruning via absence analysis

To reduce memory usage and runtime overhead, it should be possible to automatically remove from transformed programs any caches or cache fragments that are not used (directly or indirectly) to compute outputs. Liu [2000] performs this transformation on CTS programs by using absence analysis, which was later extended to higher-order languages by Sergey et al. [2014]. In lazy languages, absence analysis removes thunks that are not needed to compute the output. We conjecture that, as long as the analysis is extended to not treat caches as part of the output, it should be able to remove unused caches or inputs (assuming unused inputs exist, see Sec. 17.5.4).

17.5.7 Unary vs. n-ary abstraction

We only show our transformation correct for unary functions and tuples. But many languages provide efficient support for applying curried functions such as div :: Z → Z → Z, via either the push-enter or eval-apply evaluation model. For instance, invoking div m n should not require the overhead of a function invocation for each argument [Marlow and Jones 2006]. Naively transforming such a curried function to CTS would produce a function divc of type Z → (Z → (Z, DivC2), DivC1) with DivC1 = (), which adds excessive overhead.³ Based on preliminary experiments, we believe we can straightforwardly combine our approach with a push-enter or eval-apply evaluation strategy for transformed curried functions. Alternatively, we expect that translating Haskell to Strict Core [Bolingbroke and Peyton Jones 2009] would take care of turning Haskell into a language where function types describe their arity, which our transformation can easily take care of. We leave a more proper investigation for future work.
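For illustration, the naive encoding mentioned above would look roughly as follows (a sketch; the trivial cache types are ours), with one intermediate pair allocated per argument.

type DivC1 = ()
type DivC2 = ()  -- stands in for div's real cache; () just for illustration

divc :: Int -> (Int -> (Int, DivC2), DivC1)
divc m = (\n -> (m `div` n, ()), ())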

³In Sec. 17.2 and our evaluation, we use curried functions and never need to use this naive encoding, but only because we always invoke functions of known arity.


17.6 Related work

Of all research on incremental computation in both programming languages and databases [Gupta and Mumick 1999; Ramalingam and Reps 1993], we discuss the most closely related works. Other related work, less specific to cache-transfer style, is discussed in Chapter 19.

Previous work on cache-transfer style. Liu [2000]'s work has been the fundamental inspiration to this work, but her approach has no correctness proof and is restricted to a first-order untyped language (in part because no absence analysis for higher-order languages was available). Moreover, while the idea of cache-transfer style is similar, it's unclear if her approach to incrementalization would extend to higher-order programs.

Firsov and Jeltsch [2016] also approach incrementalization by code transformation, but their approach does not deal with changes to functions. Instead of transforming functions written in terms of primitives, they provide combinators to write CTS functions and derivatives together. On the other hand, they extend their approach to support mutable caches, while restricting to immutable ones as we do might lead to a logarithmic slowdown.

Finite differencing. Incremental computation on collections or databases by finite differencing has a long tradition [Paige and Koenig 1982; Blakeley et al. 1986]. The most recent and impressive line of work is the one on DBToaster [Koch 2010; Koch et al. 2014], which is a highly efficient approach to incrementalize queries over bags by combining iterated finite differencing with other program transformations. They show asymptotic speedups both in theory and through experimental evaluations. Changes are only allowed for datatypes that form groups (such as bags or certain maps), but not, for instance, for lists or sets. Similar ideas were recently extended to higher-order and nested computation [Koch et al. 2016], though still restricted to datatypes that can be turned into groups.

Logical relations. To study correctness of incremental programs, we use a logical relation among initial values v1, updated values v2 and changes dv. To define a logical relation for an untyped λ-calculus, we use a step-indexed logical relation, following Appel and McAllester [2001] and Ahmed [2006]; in particular, our definitions are closest to the ones by Acar et al. [2008], who also work with an untyped language, big-step semantics and (a different form of) incremental computation. However, their logical relation does not mention changes explicitly, since they do not have first-class status in their system. Moreover, we use environments rather than substitution, and use a slightly different step-indexing for our semantics.

Dynamic incrementalization. The approaches to incremental computation with the widest applicability are in the family of self-adjusting computation [Acar 2005, 2009], including its descendant Adapton [Hammer et al. 2014]. These approaches incrementalize programs by combining memoization and change propagation: after creating a trace of base computations, updated inputs are compared with old ones in O(1) to find corresponding outputs, which are updated to account for input modifications. Compared to self-adjusting computation, Adapton only updates results when they are demanded. As usual, incrementalization is not efficient on arbitrary programs. To incrementalize efficiently, a program must be designed so that input changes produce small changes to the computation trace; refinement type systems have been designed to assist in this task [Çiçek et al. 2016; Hammer et al. 2016]. Instead of comparing inputs by pointer equality, Nominal Adapton [Hammer et al. 2015] introduces first-class labels to identify matching inputs, enabling reuse in more situations.


Recording traces often has significant overheads in both space and time (slowdowns of 20–30× are common), though Acar et al. [2010] alleviate that with datatype-specific support for tracing higher-level operations, while Chen et al. [2014] reduce that overhead by optimizing traces to not record redundant entries, and by logging chunks of operations at once, which reduces memory overhead but also potential speedups.

17.7 Chapter conclusion

We have presented a program transformation which turns a functional program into its derivative and efficiently shares redundant computations between them, thanks to a statically computed cache. Although our first practical case studies show promising results, it remains now to integrate this transformation into a realistic compiler.

Chapter 18

Towards differentiation for System F

Differentiation is closely related to both logical relations and parametricity, as noticed by various authors in discussions of differentiation. Appendix C presents novel proofs of ILC correctness by adapting some logical relation techniques. As a demonstration, we define in this chapter differentiation for System F, by adapting results about parametricity. We stop short of a full proof that this generalization is correct, but we have implemented and tested it on a mature implementation of a System F typechecker; we believe the proof will mostly be a straightforward adaptation of existing ones about parametricity, but we leave verification for future work. A key open issue is discussed in Remark 18.2.1.

History and motivation. Various authors noticed that differentiation appears related to (binary) parametricity (including Atkey [2015]). In particular, it resembles a transformation presented by Bernardy and Lasson [2011] for arbitrary PTSs.¹ By analogy with unary parametricity, Yufei Cai sketched an extension of differentiation for arbitrary PTSs, but many questions remained open, and at the time, our correctness proof for ILC was significantly more involved and trickier to extend to System F, since it was defined in terms of denotational equivalence. Later, we reduced the proof core to defining a logical relation, proving its fundamental property and a few corollaries, as shown in Chapter 12. Extending this logical relation to System F proved comparably more straightforward.

Parametricity versus ILC. Both parametricity and ILC define logical relations across program executions on different inputs. When studying parametricity, differences are only allowed in the implementations of abstractions (through abstract types or other mechanisms). To be related, different implementations of the same abstraction must give results that are equivalent according to the calling program. Indeed, parametricity defines not just a logical relation but a logical equivalence, which can be shown to be equivalent to contextual equivalence (as explained for instance by Harper [2016, Ch. 48] or by Ahmed [2006]).

When studying ILC, logical equivalence between terms t1 and t2 (written (t1, t2) ∈ P⟦τ⟧) appears to be generalized by the existence of a valid change dt between t1 and t2 (that is, dt ▷ t1 ↪ t2 : τ). As in earlier chapters, if terms t1 and t2 are equivalent, any valid change dt between them is a nil change, but validity allows describing other changes.

¹Bernardy and Lasson were not the first to introduce such a transformation, but most earlier work focuses on System F, and our presentation follows theirs and uses their added generality. We refer for details to existing discussions of related work [Wadler 2007; Bernardy and Lasson 2011].



18.1 The parametricity transformation

First, we show a variant of their parametricity transformation, adapted to a variant of STLC without base types but with type variables. Presenting λ→ using type variables will help when we come back to System F, and allows discussing parametricity on STLC. This transformation is based on the presentation of STLC as calculus λ→, a Pure Type System (PTS) [Barendregt 1992, Sec. 5.2].

Background. In PTSs, terms and types form a single syntactic category, but are distinguished through an extended typing judgement (written Γ ⊢ t : t′) using additional terms called sorts. Function types σ → τ are generalized by Π-types Π(x : s). t, where x can in general appear free in t, a generalization of Π-types in dependently-typed languages, but also of universal types ∀X. T in the System F family; if x does not appear free in t, we write s → t. Typing restricts whether terms of some sort s1 can abstract over terms of sort s2; different PTSs allow different combinations of sorts s1 and s2 (specified by relations R), but lots of metatheory for PTSs is parameterized over the choice of combinations. In STLC presented as a PTS, there is an additional sort ∗: those terms τ such that ⊢ τ : ∗ are types.² We do not intend to give a full introduction to PTSs, only to introduce what's strictly needed for our presentation, and refer the readers to existing presentations [Barendregt 1992].

Instead of base types, λ→ uses uninterpreted type variables α, but does not allow terms to bind them. Nevertheless, we can write open terms, free in type variables for, say, naturals, and in term variables for operations on naturals. STLC restricts Π-types Π(x : A). B to the usual arrow types A → B: through R, one can show that in Π(x : A). B, variable x cannot occur free in B.

Parametricity. Bernardy and Lasson show how to transform a typed term Γ ⊢ t : τ in a strongly normalizing PTS P into a proof that t satisfies a parametricity statement for τ. The proof is in a logic represented by PTS P². PTS P² is constructed uniformly from P, and is strongly normalizing whenever P is. When the base PTS P is λ→, Bernardy and Lasson's transformation produces terms in a PTS λ2→, produced by transforming λ→. PTS λ2→ extends λ→ with a separate sort de of propositions, together with enough abstraction power to abstract propositions over values. Like λ→, λ2→ uses uninterpreted type variables α and does not allow abstracting over them.

In parametricity statements about λ→, we write (t1, t2) ∈ P⟦τ⟧ for a proposition (hence living in de) that states that t1 and t2 are related. This proposition is defined, as usual, via a logical relation. We write dxx for a proof that x1 and x2 are related. For type variables α, transformed terms abstract over an arbitrary relation Rα between α1 and α2. When α is instantiated by τ, Rα can (but does not have to) be instantiated with relation (–, –) ∈ P⟦τ⟧, but Rα abstracts over arbitrary candidate relations (similar to the notion of reducibility candidates [Girard et al. 1989, Ch. 14.1.1]). Allowing alternative instantiations makes parametricity statements more powerful, and it is also necessary to define parametricity for impredicative type systems (like System F) in a predicative ambient logic.³

A transformed term P⟦t⟧ relates two executions of terms t in different environments: it can be read as a proof that term t maps related inputs to related outputs. The proof strategy that P⟦t⟧ uses is the standard one for proving fundamental properties of logical relations; but instead of doing a proof in the metalevel logic (by induction on terms and typing derivations), λ2→ serves as an object-level logic, and P⟦–⟧ generates proof terms in this logic by recursion on terms. At the metalevel, Bernardy and Lasson [2011, Th. 3] prove that, for any well-typed term Γ ⊢ t : τ, proof P⟦t⟧

²Calculus λ→ also has a sort □ such that ⊢ ∗ : □, but □ itself has no type. Such a sort appears only in PTS typing derivations, not in well-typed terms themselves, so we do not discuss it further.

³Specifically, if we require Rα to be instantiated with the logical relation itself for τ when α is instantiated with τ, the definition becomes circular. Consider the definition of (t1, t2) ∈ P⟦∀α. τ⟧: type variable α can be instantiated with the same type ∀α. τ, so the definition of (t1, t2) ∈ P⟦∀α. τ⟧ becomes impredicative.


shows that t satisfies the parametricity statement for type τ, given the hypotheses P⟦Γ⟧ (or, more formally, P⟦Γ⟧ ⊢ P⟦t⟧ : (S1⟦t⟧, S2⟦t⟧) ∈ P⟦τ⟧).

(f1, f2) ∈ P⟦σ → τ⟧ = Π(x1 : S1⟦σ⟧) (x2 : S2⟦σ⟧) (dxx : (x1, x2) ∈ P⟦σ⟧).
                      (f1 x1, f2 x2) ∈ P⟦τ⟧
(x1, x2) ∈ P⟦α⟧     = Rα x1 x2

P⟦x⟧            = dxx
P⟦λ(x : σ) → t⟧ = λ(x1 : S1⟦σ⟧) (x2 : S2⟦σ⟧) (dxx : (x1, x2) ∈ P⟦σ⟧) →
                    P⟦t⟧
P⟦s t⟧          = P⟦s⟧ S1⟦s⟧ S2⟦s⟧ P⟦t⟧

P⟦ε⟧        = ε
P⟦Γ, x : τ⟧ = P⟦Γ⟧, x1 : S1⟦τ⟧, x2 : S2⟦τ⟧, dxx : (x1, x2) ∈ P⟦τ⟧
P⟦Γ, α⟧     = P⟦Γ⟧, α1, α2, Rα : α1 → α2 → de

In the above, S1⟦s⟧ and S2⟦s⟧ simply subscript all (term and type) variables in their arguments with 1 and 2, to make them refer to the first or second computation. To wit, the straightforward definition is:

Si⟦x⟧            = xi
Si⟦λ(x : σ) → t⟧ = λ(xi : Si⟦σ⟧) → Si⟦t⟧
Si⟦s t⟧          = Si⟦s⟧ Si⟦t⟧
Si⟦σ → τ⟧        = Si⟦σ⟧ → Si⟦τ⟧
Si⟦α⟧            = αi

It might be unclear how the proof P⟦t⟧ references the original term t. Indeed, it does not do so explicitly. But since in PTSs β-equivalent types have the same members, we can construct typing judgements that mention t. As mentioned, from Γ ⊢ t : τ it follows that P⟦Γ⟧ ⊢ P⟦t⟧ : (S1⟦t⟧, S2⟦t⟧) ∈ P⟦τ⟧ [Bernardy and Lasson 2011, Th. 3]. This is best shown on a small example.

Example 18.1.1. Take for instance an identity function id = λ(x : α) → x, which is typed in an open context (that is, α ⊢ λ(x : α) → x). The transformation gives us

pid = P⟦id⟧ = λ(x1 : α1) (x2 : α2) (dxx : (x1, x2) ∈ P⟦α⟧) → dxx

which simply returns the proof that the inputs are related:

α1, α2, Rα : α1 → α2 → de ⊢
  pid : Π(x1 : α1) (x2 : α2). (x1, x2) ∈ P⟦α⟧ → (x1, x2) ∈ P⟦α⟧

This typing judgement does not mention id. But since x1 =β S1⟦id⟧ x1 and x2 =β S2⟦id⟧ x2, we can also show that id's outputs are related whenever the inputs are:

α1, α2, Rα : α1 → α2 → de ⊢
  pid : Π(x1 : α1) (x2 : α2). (x1, x2) ∈ P⟦α⟧ → (S1⟦id⟧ x1, S2⟦id⟧ x2) ∈ P⟦α⟧

More concisely, we can show that

α1, α2, Rα : α1 → α2 → de ⊢ pid : (S1⟦id⟧, S2⟦id⟧) ∈ P⟦α → α⟧


18.2 Differentiation and parametricity

We obtain a close variant of differentiation by altering the transformation for binary parametricity; we obtain a variant very close to the one investigated by Bernardy, Jansson and Paterson [2010]. Instead of only having proofs that values are related, we can modify (t1, t2) ∈ P⟦τ⟧ to be a type of values, more precisely a dependent type (t1, t2) ∈ ∆2⟦τ⟧ of valid changes, indexed by source t1 : S1⟦τ⟧ and destination t2 : S2⟦τ⟧. Similarly, Rα is replaced by a dependent type of changes, not propositions, that we write ∆α. Formally, this is encoded by replacing sort de with ∗ in Bernardy and Lasson [2011]'s transform, and by moving target programs in a PTS that allows for both simple and dependent types, like the Edinburgh Logical Framework; such a PTS is known as λP [Barendregt 1992].

For type variables α, we specialize the transformation further, ensuring that α1 = α2 = α (and adapting S1⟦–⟧, S2⟦–⟧ accordingly, to not add subscripts to type variables). Without this specialization, we get to deal with changes across different types, which we don't do just yet but defer to Sec. 18.4.

We first show how the transformation affects typing contexts:

D⟦ε⟧        = ε
D⟦Γ, x : σ⟧ = D⟦Γ⟧, x1 : σ, x2 : σ, dx : (x1, x2) ∈ ∆2⟦σ⟧
D⟦Γ, α⟧     = D⟦Γ⟧, α, ∆α : α → α → ∗

Unlike standard differentiation, transformed programs bind, for each input variable x : σ, a source x1 : σ, a destination x2 : σ, and a valid change dx from x1 to x2. In more detail, for each variable in input programs x : σ, programs produced by standard differentiation bind both x : σ and dx : ∆σ; valid use of these programs requires dx to be a valid change with source x, but this is not enforced through types. Instead, programs produced by this variant of differentiation bind, for each variable in input programs x : σ, source x1 : σ, destination x2 : σ, and a valid change dx from x1 to x2, where validity is enforced through dependent types.

For input type variables α, output programs also bind a dependent type of valid changes ∆α.

Next, we define the type of changes. Since change types are indexed by source and destination, the type of function changes forces them to be valid, and the definition of function changes resembles our earlier logical relation for validity. More precisely, if df is a change from f1 to f2 (df : (f1, f2) ∈ ∆2⟦σ → τ⟧), then df must map any change dx from initial input x1 to updated input x2 to a change from initial output f1 x1 to updated output f2 x2:

(f1, f2) ∈ ∆2⟦σ → τ⟧ = Π(x1 x2 : σ) (dx : (x1, x2) ∈ ∆2⟦σ⟧).
                       (f1 x1, f2 x2) ∈ ∆2⟦τ⟧
(x1, x2) ∈ ∆2⟦α⟧     = ∆α x1 x2

At this point, the transformation on terms acts on abstraction and application to match the transformation on typing contexts. The definition of D⟦λ(x : σ) → t⟧ follows the definition of D⟦Γ, x : σ⟧: both transformation results bind the same variables x1, x2, dx with the same types. Application provides corresponding arguments.

D⟦x⟧            = dx
D⟦λ(x : σ) → t⟧ = λ(x1 x2 : σ) (dx : (x1, x2) ∈ ∆2⟦σ⟧) → D⟦t⟧
D⟦s t⟧          = D⟦s⟧ S1⟦s⟧ S2⟦s⟧ D⟦t⟧

If we extend the language to support primitives and their manually-written derivatives, it is useful to have contexts bind, next to type variables α, also change structures for α, to allow terms to use change operations. Since the differentiation output does not use change operations, we omit change structures for now.


Remark 18.2.1 (Validity and ⊕). Because we omit change structures (here and through most of the chapter), the type of differentiation only suggests that D⟦t⟧ is a valid change, but not that validity agrees with ⊕.

In other words, dependent types as used here do not prove all theorems expected of an incremental transformation. While we can prove the fundamental property of our logical relation, we cannot prove that ⊕ agrees with differentiation for abstract types. As we did in Sec. 13.4 (in particular Plugin Requirement 13.4.1), we must require that validity relations provided for type variables agree with implementations of ⊕ as provided for type variables: formally, we must state (internally) that ∀(x1 x2 : α) (dx : ∆α x1 x2). (x1 ⊕ dx, x2) ∈ P⟦α⟧. We can state this requirement by internalizing logical equivalence (such as shown in Sec. 18.1, or using Bernardy et al. [2010]'s variant in λP). But since this statement quantifies over members of α, it is not clear if it can be proven internally when instantiating α with a type argument. We leave this question (admittedly crucial) for future work.

This transformation is not incremental, as it recomputes both source and destination for each application, but we could fix this by replacing S2⟦s⟧ with S1⟦s⟧ ⊕ D⟦s⟧ (and passing change structures to make ⊕ available to terms), or by not passing destinations. We ignore such complications here.

18.3 Proving differentiation correct externally

Instead of producing dependently-typed programs that show their correctness, we might want to produce simply-typed programs (which are easier to compile efficiently) and use an external correctness proof, as in Chapter 12. We show how to generate such external proofs here. For simplicity, we here produce external correctness proofs for the transformation we just defined on dependently-typed outputs, rather than defining a separate simply-typed transformation.

Specifically, we show how to generate from well-typed terms t a proof that D⟦t⟧ is correct, that is, that (S1⟦t⟧, S2⟦t⟧, D⟦t⟧) ∈ ∆V⟦τ⟧.

Proofs are generated through the following transformation, defined in terms of the transformation from Sec. 18.2. Again, we show first typing contexts and the logical relation:

DP⟦ε⟧        = ε
DP⟦Γ, x : τ⟧ = DP⟦Γ⟧, x1 : τ, x2 : τ,
               dx : (x1, x2) ∈ ∆2⟦τ⟧, dxx : (x1, x2, dx) ∈ ∆V⟦τ⟧
DP⟦Γ, α⟧     = DP⟦Γ⟧, α, ∆α : α → α → ∗,
               Rα : Π(x1 : α) (x2 : α) (dx : (x1, x2) ∈ ∆2⟦α⟧) → de

(f1, f2, df) ∈ ∆V⟦σ → τ⟧ =
  Π(x1 x2 : σ) (dx : (x1, x2) ∈ ∆2⟦σ⟧) (dxx : (x1, x2, dx) ∈ ∆V⟦σ⟧).
  (f1 x1, f2 x2, df x1 x2 dx) ∈ ∆V⟦τ⟧

(x1, x2, dx) ∈ ∆V⟦α⟧ = Rα x1 x2 dx

The transformation from terms to proofs then matches the definition of typing contexts:

DP⟦x⟧            = dxx
DP⟦λ(x : σ) → t⟧ =
  λ(x1 x2 : σ) (dx : (x1, x2) ∈ ∆2⟦σ⟧) (dxx : (x1, x2, dx) ∈ ∆V⟦σ⟧) →
    DP⟦t⟧
DP⟦s t⟧          = DP⟦s⟧ S1⟦s⟧ S2⟦s⟧ D⟦t⟧ DP⟦t⟧

This term produces a proof object in PTS λ2P, which is produced by augmenting λP following Bernardy and Lasson [2011]. The informal proof content of generated proofs follows the proof


of D⟦–⟧'s correctness (Theorem 12.2.2). For a variable x, we just use the assumption dxx that (x1, x2, dx) ∈ ∆V⟦τ⟧ that we have in context. For abstractions λx → t, we have to show that D⟦t⟧ is correct for all valid input changes x1, x2, dx and for all proofs dxx : (x1, x2, dx) ∈ ∆V⟦σ⟧ that dx is indeed a valid input change; so we bind all those variables in context, including proof dxx, and use DP⟦t⟧ recursively to prove that D⟦t⟧ is correct in the extended context. For applications s t, we use the proof that D⟦s⟧ is correct (obtained recursively via DP⟦s⟧). To show that D⟦s t⟧, that is D⟦s⟧ S1⟦s⟧ S2⟦s⟧ D⟦t⟧, is correct, we simply show that D⟦s⟧ is being applied to valid inputs, using the proof that D⟦t⟧ is correct (obtained recursively via DP⟦t⟧).

18.4 Combinators for polymorphic change structures

Earlier, we restricted our transformation on λ→ terms so that there could be a change dt from t1 to t2 only if t1 and t2 have the same type. In this section, we lift this restriction and define polymorphic change structures (also called change structures when no ambiguity arises). To do so, we sketch how to extend core change-structure operations to polymorphic change structures.

We also show a few combinators, combining existing polymorphic change structures into new ones. We believe the combinator types are more enlightening than their implementations, but we include them here for completeness. We already described a few constructions on non-polymorphic change structures; however, polymorphic change structures enable new constructions that insert or remove data, or apply isomorphisms to the source or destination type.

We conjecture that combinators for polymorphic change structures can be used to compose, out of small building blocks, change structures that, for instance, allow inserting or removing elements from a recursive datatype such as lists. Derivatives for primitives could then be produced using equational reasoning, as described in Sec. 11.2. However, we leave investigation of such avenues to future work.

We describe change operations for polymorphic change structures via a Haskell record containing change operations, and we define combinators on polymorphic change structures. Of all change operations, we only consider ⊕; to avoid confusion, we write ⊙ for the polymorphic variant of ⊕.

data CS τ1 δτ τ2 = CS {(⊙) :: τ1 → δτ → τ2}

This code defines a record type constructor CS, with a data constructor also written CS, and a field accessor written (⊙); we use τ1, δτ and τ2 (and later also σ1, δσ and σ2) for Haskell type variables. To follow Haskell lexical rules, we use here lowercase letters (even though Greek ones).

We have not formalized definitions of validity, or proofs that it agrees with ⊙; but for all the change structures and combinators in this section, this exercise appears no harder than the ones in Chapter 13.

In Sec. 11.1 and 17.2, change structures are embedded in Haskell using type class ChangeStruct τ. Conversely, here we do not define a type class of polymorphic change structures, because (apart from the simplest scenarios) Haskell type class resolution is unable to choose a canonical way to construct a polymorphic change structure using our combinators.

All existing change structures (that is, instances of ChangeStruct τ) induce generalized change structures CS τ (∆τ) τ:

typeCS :: ChangeStruct τ ⇒ CS τ (∆τ) τ
typeCS = CS (⊕)

We can also have change structures across different types. Replacement changes are possible:


replCS :: CS τ1 τ2 τ2
replCS = CS $ λx1 x2 → x2

But replacement changes are not the only option. For product types, or for any form of nested data, we can apply changes to the different components, changing the type of some components. We can also define a change structure for nullary products (the unit type), which can be useful as a building block in other change structures:

prodCS :: CS σ1 δσ σ2 → CS τ1 δτ τ2 → CS (σ1, τ1) (δσ, δτ) (σ2, τ2)
prodCS scs tcs = CS $ λ(s1, t1) (ds, dt) → ((⊙) scs s1 ds, (⊙) tcs t1 dt)

unitCS :: CS () () ()
unitCS = CS $ λ() () → ()

The ability to modify a field to one of a different type is also known in the Haskell community as polymorphic record update, a feature that has proven desirable in the context of lens libraries [O'Connor 2012; Kmett 2012].

We can also define a combinator sumCS for change structures on sum types, similarly to our earlier construction described in Sec. 11.6. This time, we choose to forbid changes across branches, since they're inefficient, though we could support them as well if desired.

sumCS :: CS s1 ds s2 → CS t1 dt t2 →
         CS (Either s1 t1) (Either ds dt) (Either s2 t2)
sumCS scs tcs = CS go
  where
    go (Left s1)  (Left ds)  = Left $ (⊙) scs s1 ds
    go (Right t1) (Right dt) = Right $ (⊙) tcs t1 dt
    go _ _ = error "Invalid changes"

Given two change structures from τ1 to τ2, with respective change types δτa and δτb, we can also define a new change structure with change type Either δτa δτb, which allows using changes from either structure. We capture this construction through combinator mSumCS, having the following signature:

mSumCS :: CS τ1 δτa τ2 → CS τ1 δτb τ2 → CS τ1 (Either δτa δτb) τ2
mSumCS acs bcs = CS go
  where
    go t1 (Left dt)  = (⊙) acs t1 dt
    go t1 (Right dt) = (⊙) bcs t1 dt

This construction is also possible for non-polymorphic change structures: we only need change structures to be first-class (instead of a type class) to be able to internalize this construction in Haskell.

Using combinator lInsCS, we can describe updates going from type τ1 to type (σ, τ2), assuming a change structure from τ1 to τ2: that is, we can prepend a value of type σ to our data while we modify it. Similarly, combinator lRemCS allows removing values:

lInsCS :: CS t1 dt t2 → CS t1 (s, dt) (s, t2)
lInsCS tcs = CS $ λt1 (s, dt) → (s, (⊙) tcs t1 dt)

lRemCS :: CS t1 dt (s, t2) → CS t1 dt t2
lRemCS tcs = CS $ λt1 dt → snd $ (⊙) tcs t1 dt
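As a small usage sketch of these combinators (under the same operations defined above): composing lInsCS with replCS yields a change structure whose changes replace an Int while prepending a String tag.

taggedRepl :: CS Int (String, Int) (String, Int)
taggedRepl = lInsCS replCS

-- (⊙) taggedRepl 7 ("tag", 42) evaluates to ("tag", 42): the old value 7 is
-- replaced by 42, and the tag is prepended.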

We can also transform change structures, given suitable conversion functions:


lIsoCS :: (t1 → s1) → CS s1 dt t2 → CS t1 dt t2
mIsoCS :: (dt → ds) → CS t1 ds t2 → CS t1 dt t2
rIsoCS :: (s2 → t2) → CS t1 dt s2 → CS t1 dt t2
isoCS  :: (t1 → s1) → (dt → ds) → (s2 → t2) → CS s1 ds s2 → CS t1 dt t2

To do so, we must only compose with the given conversion functions according to the types. Combinator isoCS arises by simply combining the other ones:

lIsoCS f tcs = CS $ λs1 dt → (⊙) tcs (f s1) dt
mIsoCS g tcs = CS $ λt1 ds → (⊙) tcs t1 (g ds)
rIsoCS h tcs = CS $ λt1 dt → h $ (⊙) tcs t1 dt
isoCS f g h scs = lIsoCS f $ mIsoCS g $ rIsoCS h scs

With a bit of datatype-generic programming infrastructure, we can reobtain, only using combinators, the change structure for lists shown in Sec. 11.3.3, which allows modifying list elements. The core definition is the following one:

listMuChangeCS :: CS a1 da a2 → CS (Listµ a1) (Listµ da) (Listµ a2)
listMuChangeCS acs = go where
  go = isoCS unRollL unRollL rollL $ sumCS unitCS $ prodCS acs go

The needed infrastructure appears in Fig. 18.1.

-- Our list type
data List a = Nil | Cons a (List a)

-- Equivalently, we can represent lists as a fixpoint of a sum-of-product pattern functor
data Mu f = Roll {unRoll :: f (Mu f)}
data ListF a x = L {unL :: Either () (a, x)}
type Listµ a = Mu (ListF a)

rollL :: Either () (a, Listµ a) → Listµ a
rollL = Roll ∘ L
unRollL :: Listµ a → Either () (a, Listµ a)
unRollL = unL ∘ unRoll

-- Isomorphism between List and Listµ
lToLMu :: List a → Listµ a
lToLMu Nil = rollL $ Left ()
lToLMu (Cons a as) = rollL $ Right (a, lToLMu as)
lMuToL :: Listµ a → List a
lMuToL = go ∘ unRollL where
  go (Left ()) = Nil
  go (Right (a, as)) = Cons a (lMuToL as)

-- A change structure for List

listChangesCS :: CS a1 da a2 → CS (List a1) (List da) (List a2)
listChangesCS acs = isoCS lToLMu lToLMu lMuToL $ listMuChangeCS acs

Figure 18.1: A change structure that allows modifying list elements.
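As a usage sketch (with an assumed change structure on integers that adds the change), listChangesCS modifies list elements position-wise:

intCS :: CS Int Int Int
intCS = CS (+)

-- (⊙) (listChangesCS intCS) (Cons 1 (Cons 2 Nil)) (Cons 10 (Cons 20 Nil))
-- evaluates to Cons 11 (Cons 22 Nil).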


Section summary. We have defined polymorphic change structures and shown they admit a rich combinator language. Now we return to using these change structures for differentiating λ→ and System F.

18.5 Differentiation for System F

After introducing changes across different types, we can also generalize differentiation for λ→, so that it allows for the now generalized changes:

(f1, f2) ∈ ∆2⟦σ → τ⟧ = Π(x1 : S1⟦σ⟧) (x2 : S2⟦σ⟧) (dx : (x1, x2) ∈ ∆2⟦σ⟧).
                       (f1 x1, f2 x2) ∈ ∆2⟦τ⟧
(x1, x2) ∈ ∆2⟦α⟧     = ∆α x1 x2

D⟦x⟧            = dx
D⟦λ(x : σ) → t⟧ = λ(x1 : S1⟦σ⟧) (x2 : S2⟦σ⟧) (dx : (x1, x2) ∈ ∆2⟦σ⟧) → D⟦t⟧
D⟦s t⟧          = D⟦s⟧ S1⟦s⟧ S2⟦s⟧ D⟦t⟧

D⟦ε⟧        = ε
D⟦Γ, x : τ⟧ = D⟦Γ⟧, x1 : S1⟦τ⟧, x2 : S2⟦τ⟧, dx : (x1, x2) ∈ ∆2⟦τ⟧
D⟦Γ, α⟧     = D⟦Γ⟧, α1, α2, ∆α : α1 → α2 → ∗

By adding a few additional rules, we can extend differentiation to System F (the PTS λ2). We choose to present the additional rules using System F syntax, rather than PTS syntax:

(f1, f2) ∈ ∆2⟦∀α. T⟧ =
  Π(α1 : ∗) (α2 : ∗) (∆α : α1 → α2 → ∗). (f1 [α1], f2 [α2]) ∈ ∆2⟦T⟧

D⟦Λα. t⟧ =
  λ(α1 α2 : ∗) (∆α : α1 → α2 → ∗) → D⟦t⟧

D⟦t [τ]⟧ = D⟦t⟧ S1⟦τ⟧ S2⟦τ⟧ ((–, –) ∈ ∆2⟦τ⟧)

Produced terms use a combination of System F and dependent types, which is known as λP2 [Barendregt 1992] and is strongly normalizing. This PTS corresponds to second-order (intuitionistic) predicate logic and is part of Barendregt's lambda cube. λP2 does not admit types depending on types (that is, type-level functions), but admits all other forms of abstraction in the lambda cube (terms on terms, like λ→; terms on types, like System F; and types on terms, like LF (λP)).

Finally, we sketch a transformation producing proofs that differentiation is correct for System F:

DP⟦ε⟧        = ε
DP⟦Γ, x : τ⟧ = DP⟦Γ⟧, x1 : S1⟦τ⟧, x2 : S2⟦τ⟧,
               dx : (x1, x2) ∈ ∆2⟦τ⟧, dxx : (x1, x2, dx) ∈ ∆V⟦τ⟧
DP⟦Γ, α⟧     = DP⟦Γ⟧, α1, α2, ∆α : α1 → α2 → ∗,
               Rα : Π(x1 : α1) (x2 : α2) (dx : (x1, x2) ∈ ∆2⟦α⟧) → de

(f1, f2, df) ∈ ∆V⟦∀α. T⟧ =
  Π(α1 : ∗) (α2 : ∗) (∆α : α1 → α2 → ∗)
   (Rα : Π(x1 : α1) (x2 : α2) (dx : (x1, x2) ∈ ∆2⟦α⟧) → de).
  (f1 [α1], f2 [α2], df [α1] [α2] [∆α]) ∈ ∆V⟦T⟧

(f1, f2, df) ∈ ∆V⟦σ → τ⟧ =
  Π(x1 : S1⟦σ⟧) (x2 : S2⟦σ⟧) (dx : (x1, x2) ∈ ∆2⟦σ⟧) (dxx : (x1, x2, dx) ∈ ∆V⟦σ⟧).
  (f1 x1, f2 x2, df x1 x2 dx) ∈ ∆V⟦τ⟧

(x1, x2, dx) ∈ ∆V⟦α⟧ = Rα x1 x2 dx


DP⟦x⟧            = dxx
DP⟦λ(x : σ) → t⟧ =
  λ(x1 : S1⟦σ⟧) (x2 : S2⟦σ⟧) (dx : (x1, x2) ∈ ∆2⟦σ⟧)
   (dxx : (x1, x2, dx) ∈ ∆V⟦σ⟧) →
    DP⟦t⟧
DP⟦s t⟧ = DP⟦s⟧ S1⟦s⟧ S2⟦s⟧ D⟦t⟧ DP⟦t⟧
DP⟦Λα. t⟧ =
  λ(α1 α2 : ∗) (∆α : α1 → α2 → ∗)
   (Rα : Π(x1 : α1) (x2 : α2) (dx : (x1, x2) ∈ ∆2⟦α⟧) → de) →
    DP⟦t⟧
DP⟦t [τ]⟧ = DP⟦t⟧ S1⟦τ⟧ S2⟦τ⟧ ((–, –) ∈ ∆2⟦τ⟧) ((–, –, –) ∈ ∆V⟦τ⟧)

Produced terms live in λ2P2, the logic produced by extending λP2 following Bernardy and Lasson. A variant producing proofs for a non-dependently-typed differentiation (as suggested earlier) would produce proofs in λ22, the logic produced by extending λ2 following Bernardy and Lasson.

18.6 Related work

Dependently-typed differentiation for System F, as given, coincides with the parametricity transformation for System F given by Bernardy, Jansson and Paterson [2010, Sec. 3.1]. But our application is fundamentally different: for known types, Bernardy et al. only consider identity relations, while we can consider non-identity relations, as long as we assume that ⊕ is available for all types. What is more, Bernardy et al. do not consider changes, the update operator ⊕, or change structures across different types: changes are replaced by proofs of relatedness, with no computational significance. Finally, non-dependently-typed differentiation (and its correctness proof) is novel here, as it makes limited sense in the context of parametricity, even though it is a small variant of Bernardy et al.'s parametricity transform.

18.7 Prototype implementation

We have written a prototype implementation of the above rules for a PTS presentation of System F, and verified it on a few representative examples of System F terms. We have built our implementation on top of an existing typechecker for PTSs.⁴

Since our implementation is defined in terms of a generic PTS, some of the rules presented above are unified and generalized in our implementation, suggesting the transformation might generalize to arbitrary PTSs. In the absence of further evidence on the correctness of this generalization, we leave a detailed investigation as future work.

18.8 Chapter conclusion

In this chapter, we have sketched how to define and prove correct differentiation following Bernardy and Lasson [2011]'s and Bernardy et al. [2010]'s work on parametricity by code transformation. We give no formal correctness proof, but proofs appear mostly an extension of their methods. An important open point is Remark 18.2.1. We have implemented and tested differentiation in an existing mature PTS implementation, and verified it is type-correct on a few typical terms.

We leave further investigation as future work.

⁴https://github.com/Toxaris/pts, by Tillmann Rendel, based on van Benthem Jutting et al. [1994]'s algorithm for PTS typechecking.

Chapter 19

Related work

Existing work on incremental computation can be divided into two groups: static incrementalization and dynamic incrementalization. Static approaches analyze a program statically and generate an incremental version of it. Dynamic approaches create dynamic dependency graphs while the program runs and propagate changes along these graphs.

The trade-off between the two is that static approaches have the potential to be faster, because no dependency tracking at runtime is needed, whereas dynamic approaches can support more expressive programming languages. ILC is a static approach, but compared to the other static approaches it supports more expressive languages.

In the remainder of this section, we analyze the relation to the most closely related prior works. Ramalingam and Reps [1993], Gupta and Mumick [1999] and Acar et al. [2006] discuss further related work. Other related work, more closely related to cache-transfer style, is discussed in Sec. 17.6.

19.1 Dynamic approaches

One of the most advanced dynamic approaches to incrementalization is self-adjusting computation, which has been applied to Standard ML and large subsets of C [Acar 2009; Hammer et al. 2011]. In this approach, programs execute on the original input in an enhanced runtime environment that tracks the dependencies between values in a dynamic dependence graph [Acar et al. 2006]; intermediate results are memoized. Later, changes to the input propagate through dependency graphs from changed inputs to results, updating both intermediate and final results; this processing is often more efficient than recomputation.

However, creating dynamic dependence graphs imposes a large constant-factor overhead during runtime, ranging from 2 to 30 in reported experiments [Acar et al. 2009, 2010], and affecting the initial run of the program on its base input. Acar et al. [2010] show how to support high-level data types in the context of self-adjusting computation; however, the approach still requires expensive runtime bookkeeping during the initial run. Our approach, like other static ones, uses a standard runtime environment and has no overhead during base computation, but may be less efficient when processing changes. This pays off if the initial input is big compared to its changes.

Chen et al. [2011] have developed a static transformation for purely functional programs, but this transformation just provides a superior interface to use the runtime support with less boilerplate, and does not reduce this performance overhead. Hence, it is still a dynamic approach, unlike the transformation this work presents.

Another property of self-adjusting computation is that incrementalization is only efficient if the program has a suitable computation structure. For instance, a program folding the elements of a bag with a left or right fold will not have efficient incremental behavior; instead, it's necessary that the fold be shaped like a balanced tree. In general, incremental computations become efficient only if they are stable [Acar 2005]. Hence, one may need to massage the program to make it efficient. Our methodology is different: since we do not aim to incrementalize arbitrary programs written in standard programming languages, we can select primitives that have efficient derivatives and thereby require the programmer to use them.

Functional reactive programming [Elliott and Hudak 1997] can also be seen as a dynamic approach to incremental computation; recent work by Maier and Odersky [2013] has focused on speeding up reactions to input changes by making them incremental on collections. Willis et al. [2008] use dynamic techniques to incrementalize JQL queries.

19.2 Static approaches

Static approaches analyze a program at compile-time and produce an incremental version that efficiently updates the output of the original program according to changing inputs.

Static approaches have the potential to be more efficient than dynamic approaches, because no bookkeeping at runtime is required. Also, the computed incremental versions can often be optimized using standard compiler techniques, such as constant folding or inlining. However, none of them support first-class functions; some approaches have further restrictions.

Our aim is to apply static incrementalization to more expressive languages; in particular, ILC supports first-class functions and an open set of base types with associated primitive operations.

Sundaresh and Hudak [1991] propose to incrementalize programs using partial evaluation, given a partitioning of program inputs in parts that change and parts that stay constant. Sundaresh and Hudak partially evaluate a given program relative to the constant input parts, and combine the result with the changing inputs.

19.2.1 Finite differencing

Paige and Koenig [1982] present derivatives for a first-order language with a fixed set of primitives. Blakeley et al. [1986] apply these ideas to a class of relational queries. The database community extended this work to queries on relational data, such as in algebraic differencing [Gupta and Mumick 1999], which inspired our work and terminology. However, most of this work does not apply to nested collections or algebraic data types, but only to relational (flat) data, and no previous approach handles first-class functions or programs resulting from defunctionalization or closure conversion. Incremental support is typically designed monolithically for a whole language, rather than piecewise. Improving on algebraic differencing, DBToaster (Koch [2010]; Koch et al. [2014]) guarantees asymptotic speedups with a compositional query transformation and delivers huge speedups in realistic benchmarks, though still for a first-order database language.

More general (non-relational) data types are considered in the work by Gluche et al. [1997]; our support for bags and the use of groups is inspired by their work. But their architecture is still rather restrictive: they lack support for function changes and restrict incrementalization to self-maintainable views, without hinting at a possible solution.

It seems possible to transform higher-order functional programs to database queries using a variety of approaches [Grust et al. 2009; Cheney et al. 2013], some of which support first-class functions via closure conversion [Grust and Ulrich 2013; Grust et al. 2013], and incrementalize the resulting programs using standard database technology. Such a solution would inherit limitations of database incrementalization approaches: in particular, it appears that database incrementalization approaches such as DBToaster can handle the insertion and removal of entire table rows, not of smaller changes. Nevertheless, such an alternative approach might be worth investigating.


Unlike later approaches to higher-order differentiation, we do not restrict our base types to groups, unlike Koch et al. [2016], and the transformed programs we produce do not require further runtime transformation, unlike Koch et al. [2016] and Huesca [2015], as we discuss further next.

19.2.2 λ-diff and partial differentials

In work concurrent with Cai et al. [2014], Huesca [2015] introduces an alternative formalism, called λ-diff, for incremental computation by program transformation. While λ-diff has some appealing features, it currently appears to require program transformations at runtime. Other systems appear to share this feature [Koch et al. 2016]. Hence, this section discusses the reason in some detail.

Instead of differentiating a term t relative to all inputs (free variables and function arguments) via D⟦t⟧, like ILC, λ-diff differentiates terms relative to one input variable, and writes ∂t/∂x dx for the result of differentiating t relative to x, a term that computes the change in t when the value for x is updated by change dx. The formalism also uses pointwise function changes, similarly to what we discussed in Sec. 15.3.

Unfortunately, it is not known how to define such a transformation for a λ-calculus with a standard semantics, and the needed semantics appear to require runtime term transformations, which are usually considered problematic when implementing functional languages. In particular, it appears necessary to introduce a new term constructor D t, which evaluates t to a function value λy → u, and then evaluates to λ(y, dy) → ∂u/∂y dy, which differentiates t at runtime relative to its head variable y. As an indirect consequence, if the program under incrementalization contains function term Γ ⊢ t : σ → τ, where Γ contains n free variables, it can be necessary in the worst case to differentiate t over any subset of the n free variables in Γ. There are 2ⁿ such subsets. To perform all term transformations before runtime, it seems hence necessary, in the worst case, to precompute 2ⁿ partial derivatives for each function term t, which appears unfeasible. On the other hand, it is not clear how often this worst case is realized, or how big n grows in typical programs, or if it is simply feasible to perform differentiation at runtime, similarly to JIT compilation. Overall, an efficient implementation of λ-diff remains an open problem. It appears Koch et al. [2016] suffer similar problems, but a few details appear simpler, since they restrict focus to functions over groups.

To see why λ-diff needs to introduce D t, consider differentiating ∂(s t)/∂x dx, that is, the change d of s t when x is updated by change dx. Change d depends (a) on the change of t when x is updated by dx, that is ∂t/∂x dx; (b) on how s changes when its input t is updated by ∂t/∂x dx; to express this change, λ-diff writes (D s) t (∂t/∂x dx); (c) on the change of s (applied to the updated t) when x is updated by dx, that is ∂s/∂x dx. To compute component (b), λ-diff writes D s to differentiate s not relative to x, but relative to the still unknown head variable of s. If s evaluates to λy → u, then y is the head variable of s, and D s differentiates u relative to y and evaluates to λ(y, dy) → ∂u/∂y dy.

Overall the rule for dierentiating application in λ-di is

∂(s t)/∂x dx = (D s) (t, ∂t/∂x dx) ⊚ (∂s/∂x dx) (t ⊕ ∂t/∂x dx)

This rule appears closely related to Eq. (15.1), hence we refer to its discussion for clarification.

On the other hand, differentiating a term relative to all its inputs introduces a different sort of overhead. For instance, it is much more efficient to differentiate map f xs relative to xs than relative to f: if f undergoes a non-nil change df, D⟦map f xs⟧ must apply df to each element in the updated input xs. Therefore, in our practical implementations, D⟦map f xs⟧ tests whether df is nil and, if so, uses a more efficient implementation. In Chapter 16, we detect at compile time whether df is guaranteed to be nil; in Sec. 17.4.2, we instead detect at runtime whether df is nil. In both cases, authors of derivatives must implement this optimization by hand. Instead, λ-diff hints at a more general solution.
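To make the runtime nil-test concrete, here is a minimal Haskell sketch; the coarse Change type, oplus, and dmap below are illustrative assumptions for this example, not the thesis's actual library code.

-- A deliberately coarse change type: either no change, or full replacement.
data Change a = Nil | Replace a

-- oplus: update a value by a change.
oplus :: a -> Change a -> a
oplus x Nil         = x
oplus _ (Replace y) = y

-- Derivative of map: when both changes are nil, return a nil change in O(1);
-- otherwise, apply the updated function to each element of the updated input.
dmap :: (a -> b) -> Change (a -> b) -> [a] -> Change [a] -> Change [b]
dmap _ Nil _ Nil = Nil
dmap f df xs dxs = Replace (map (f `oplus` df) (xs `oplus` dxs))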


19.2.3 Static memoization

Liu's work [Liu 2000] allows incrementalizing a first-order base program f(x_old) to compute f(x_new), knowing how x_new is related to x_old. To this end, they transform f(x_new) into an incremental program which reuses the intermediate results produced while computing f(x_old), the base program. Specifically, (i) first the base program is transformed to save all its intermediate results, then (ii) the incremental program is transformed to reuse those intermediate results, and finally (iii) intermediate results which are not needed are pruned from the base program. However, to reuse intermediate results, the incremental program must often be rearranged, using some form of equational reasoning, into an equivalent program where partial results appear literally. For instance, if the base program f uses a left fold to sum the elements of a list of integers x_old, accessing them from the head onwards, and x_new prepends a new element h to the list, at no point does f(x_new) recompute the same results. But since addition is commutative on integers, we can rewrite f(x_new) as f(x_old) + h. The author's CACHET system will try to perform such rewritings automatically, but it is not guaranteed to succeed. Similarly, CACHET will try to synthesize any additional results which can be computed cheaply by the base program, to help make the incremental program more efficient.
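As a concrete rendering of this example, consider the following Haskell sketch (our illustration, not CACHET's actual output):

-- Base program: sums a list of integers with a left fold.
f :: [Int] -> Int
f = foldl (+) 0

-- Incremental program for the change "prepend h", justified by the equational
-- rewriting f (h : xs) = f xs + h, valid because integer addition is
-- commutative and associative.
fPrepended :: Int -> Int -> Int
fPrepended h fOld = fOld + h

-- Example: reuse f [2, 3] = 5 to compute f (1 : [2, 3]) as 5 + 1 = 6.
main :: IO ()
main = print (fPrepended 1 (f [2, 3]))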

Since it is hard to fully automate such reasoning, we move equational reasoning to the plugin design phase. A plugin provides general-purpose higher-order primitives for which the plugin authors have devised efficient derivatives (by using equational reasoning in the design phase). Then, the differentiation algorithm computes incremental versions of user programs without requiring further user intervention. It would be useful to combine ILC with some form of static caching to make the computation of derivatives which are not self-maintainable more efficient. We plan to do so in future work.

Chapter 20

Conclusion and future work

In databases, standard finite differencing technology allows incrementalizing programs in a specific domain-specific first-order language of collection operations. Unlike other approaches, finite differencing transforms programs to programs, which can in turn be transformed further.

Differentiation (Chapter 10) transforms higher-order programs to incremental ones, called derivatives, as long as one can provide incremental support for primitive types and operations.

Case studies in this thesis consider support for several such primitives. We first study a setting restricted to self-maintainable derivatives (Chapter 16). Then we study a more general setting where it is possible to remember inputs and intermediate results from one execution to the next, thanks to a transformation to cache-transfer style (Chapter 17). In all cases, we are able to deliver order-of-magnitude speedups on non-trivial examples; moreover, incrementalization produces programs in standard languages that are subject to further optimization and code transformation.

Correctness of incrementalization appeared initially surprising, so we devoted significant effort to formal correctness proofs, either on paper or in mechanized proof assistants. The original correctness proof of differentiation, using a set-theoretic denotational semantics [Cai et al. 2014], was a significant milestone, but since then we have simplified the proof to a logical relations proof, which defines when a change is valid from a source to a destination and proves that differentiation produces valid changes (Chapter 12). Valid changes witness the difference between sources and destinations; since changes can be nil, change validity arises as a generalization of the concept of logical equivalence and parametricity for a language (at least in terminating languages) (Chapter 18). Crucially, changes are not just witnesses: operation ⊕ takes a change and its source to the change destination. One can consider further operations, which give rise to an algebraic structure that we call a change structure (Chapter 13).
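In Haskell-like notation, the interface of change structures can be sketched as the following type class; this is an illustrative rendering under assumed names, not the thesis's formal definition, which also includes the algebraic laws these operations must satisfy.

{-# LANGUAGE TypeFamilies #-}

-- A change structure: a type of changes, an update operation oplus taking a
-- change and its source to the destination, and a nil change for each value.
class ChangeStruct a where
  type Delta a
  oplus :: a -> Delta a -> a
  nil   :: a -> Delta a

-- Integers with differences as changes: v `oplus` dv = v + dv.
instance ChangeStruct Int where
  type Delta Int = Int
  oplus = (+)
  nil _ = 0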

Based on this simplified proof, in this thesis we generalize correctness to further languages, using big-step operational semantics and (step-indexed) logical relations (Appendix C). Using operational semantics, we reprove correctness for simply-typed λ-calculus, then add support for recursive functions (which would require domain-theoretic methods when using denotational semantics), and finally extend the proof to untyped λ-calculus. Building on a variant of this proof (Sec. 17.3.3), we show that conversion to cache-transfer style also preserves correctness.

Based on a different formalism for logical equivalence and parametricity [Bernardy and Lasson 2011], we sketch variants of the transformation and correctness proofs for simply-typed λ-calculus with type variables, and then for full System F (Chapter 18).

Future work. To extend differentiation to System F, we must consider changes where source and destination have different types. This generalization of changes makes change operations much more flexible: it becomes possible to define a combinator language for change structures, and it appears possible to define nontrivial change structures for algebraic datatypes using this combinator language.

We conjecture such a combinator language will allow programmers to define correct change structures out of simple reusable components.
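For instance, one such reusable component could be a combinator giving the change structure for products in terms of change structures for the components; the following sketch extends the hypothetical ChangeStruct class shown earlier (same TypeFamilies extension).

-- Changes to a pair are pairs of changes to the components; oplus and nil are
-- defined componentwise, so correctness reduces to correctness of the
-- component change structures.
instance (ChangeStruct a, ChangeStruct b) => ChangeStruct (a, b) where
  type Delta (a, b) = (Delta a, Delta b)
  oplus (x, y) (dx, dy) = (x `oplus` dx, y `oplus` dy)
  nil (x, y) = (nil x, nil y)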

Incrementalizing primitives correctly remains, at present, a significant challenge. We provide tools to support this task by formalizing equational reasoning, but it appears necessary to provide more tools to programmers, as done by Liu [2000]. Conceivably, it might be possible to build such a rewriting tool on top of the rewriting and automation support of theorem provers such as Coq.

Appendixes


Appendix A

Preliminaries: syntax and semantics of simply-typed λ-calculus

To discuss how we incrementalize programs, and to prove that our incrementalization technique gives correct results, we specify which foundation we use for our proofs and what object language we study throughout most of Part II.

We mechanize our correctness proof using Agda, hence we use Agda's underlying type theory as our foundation. We discuss what this means in Appendix A.1.

Our object language is a standard simply-typed λ-calculus (STLC) [Pierce 2002, Ch. 9], parameterized over base types and constants. We term the set of base types and constants a language plugin (see Appendix A.2.5). In our examples, we assume that the language plugin supports needed base types and constants. Later (e.g., in Chapter 12), we add further requirements to language plugins, to support incrementalization of the language features they add to our STLC. Rather than operational semantics, we use a denotational semantics, which is however set-theoretic rather than domain-theoretic. Our object language and its semantics are summarized in Fig. A.1.

At this point, readers might want to skip to Chapter 10 right away, or focus on denotational semantics and refer to this section as needed.

A.1 Our proof meta-language

In this section we describe the logic (or meta-language) used in our mechanized correctness proof.

First, as usual, we distinguish between "formalization" (that is, on-paper formalized proofs) and "mechanization" (that is, proofs encoded in the language of a proof assistant for computer-aided, mechanized verification).

To prove the correctness of ILC, we provide a mechanized proof in Agda [Agda Development Team 2013]. Agda implements intensional Martin-Löf type theory (from now on, simply type theory), so type theory is also the foundation of our proofs.

At times, we use conventional set-theoretic language to discuss our proofs, but the differences are only superficial. For instance, we might talk about a set of elements S, with elements such as s ∈ S, though we always mean that S is a metalanguage type, that s is a metalanguage value, and that s : S. Talking about sets avoids ambiguity between types of our meta-language and types of our object-language (which we discuss next, in Appendix A.2).


Notation A.1.1. We'll let uppercase Latin letters A, B, C, V, U range over sets, never over types.

We do not prove correctness of all our language plugins. However, in earlier work [Cai et al. 2014], we prove correctness for a language plugin supporting bags, a type of collection (described in Sec. 10.2). For that proof, we extend our logic by postulating a few standard axioms on the implementation of bags, to avoid proving correct an implementation of bags, or needing to account for different values representing the same bag (such different values typically arise when implementing bags as search trees).

A.1.1 Type theory versus set theory

Here we summarize a few features of type theory over set theory.

Type theory is dependently typed, so it generalizes function type A → B to dependent function type (x : A) → B, where x can appear free in B. Such a type guarantees that if we apply a function f : (x : A) → B to an argument a : A, the result has type B[x = a], that is, B where x is substituted by a. At times, we will use dependent types in our presentation.

Moreover, by using type theory:

• We do not postulate the law of excluded middle; that is, our logic is constructive.

• Unlike set theory, type theory is proof-relevant: that is, proofs are first-class mathematical objects.

• Instead of subsets {x ∈ A | P(x)}, we must use Σ-types Σ(x : A) P(x), which contain pairs of elements x and proofs they satisfy predicate P.

• In set theory, we can assume without further ado functional extensionality, that is, that functions that give equal results on all equal inputs are equal themselves. Intuitionistic type theory does not prove functional extensionality, so we need to add it as a postulate. In Agda, this postulate is known to be consistent [Hofmann 1996], hence it is safe to assume.¹

To handle binding issues in our object language, our formalization uses typed de Bruijn indexes, because this technique takes advantage of Agda's support for type refinement in pattern matching. On top of that, we implement a HOAS-like frontend, which we use for writing specific terms.

A.2 Simply-typed λ-calculus

We consider as object language a strongly-normalizing simply-typed λ-calculus (STLC). We choose STLC as it is the simplest language with first-class functions and types, while being a sufficient model of realistic total languages.² We recall the syntax and typing rules of STLC in Figs. A.1a and A.1b, together with the metavariables we use. Language plugins define base types ι and constants c. Types can be base types ι or function types σ → τ. Terms can be constants c, variables x, function applications t1 t2, or λ-abstractions λ(x : σ) → t. To describe assumptions on variable types when typing terms, we define (typing) contexts Γ as being either empty ε, or context extensions Γ, x : τ, which extend context Γ by asserting that variable x has type τ. Typing is defined through a judgment

¹http://permalink.gmane.org/gmane.comp.lang.agda/2343
²To know why we restrict to total languages, see Sec. 15.1.


ι ::= ...                                (base types)
σ, τ ::= ι | τ → τ                       (types)
Γ ::= ε | Γ, x : τ                       (typing contexts)
c ::= ...                                (constants)
s, t ::= c | λ(x : σ) → t | t t | x      (terms)

(a) Syntax

⊢C c : τ
------------- Const
Γ ⊢ c : τ

------------------------- Lookup
Γ1, x : τ, Γ2 ⊢ x : τ

Γ, x : σ ⊢ t : τ
------------------------------- Lam
Γ ⊢ λ(x : σ) → t : σ → τ

Γ ⊢ s : σ → τ    Γ ⊢ t : σ
------------------------------- App
Γ ⊢ s t : τ

(b) Typing (defining the judgement Γ ⊢ t : τ)

⟦ι⟧ = ...
⟦σ → τ⟧ = ⟦σ⟧ → ⟦τ⟧

(c) Values

⟦ε⟧ = {ε}
⟦Γ, x : τ⟧ = {(ρ, x = v) | ρ ∈ ⟦Γ⟧ ∧ v ∈ ⟦τ⟧}

(d) Environments

⟦c⟧ρ = ⟦c⟧C
⟦λ(x : σ) → t⟧ρ = λ(v : ⟦σ⟧) → ⟦t⟧(ρ, x = v)
⟦s t⟧ρ = (⟦s⟧ρ) (⟦t⟧ρ)
⟦x⟧ρ = lookup x in ρ

(e) Denotational semantics

Figure A.1: Standard definitions for the simply-typed λ-calculus.

Γ ⊢ t : τ, stating that term t under context Γ has type τ.³ For a proper introduction to STLC, we refer the reader to Pierce [2002, Ch. 9]; we will assume significant familiarity with it.

An extensible syntax of types. In fact, the definition of base types can be mutually recursive with the definition of types. So a language plugin might add as base types, for instance, collections of elements of type τ, products and sums of type σ and type τ, and so on. However, this mutual recursion must satisfy a few technical restrictions to avoid introducing subtle inconsistencies, and Agda cannot enforce these restrictions across modules. Hence, if we define language plugins as separate modules in our mechanization, we need to verify by hand that such restrictions are satisfied (which they are). See Appendix B.1 for the gory details.

Notation A.2.1. We typically omit type annotations on λ-abstractions; that is, we write λx → t rather than λ(x : σ) → t. Such type annotations can often be inferred from context (or type inference). Nevertheless, whenever we discuss terms of shape λx → t, we're in fact discussing λ(x : σ) → t for some arbitrary σ. We write λx → t instead of λx. t for consistency with the notation we use later for Haskell programs.

³We only formalize typed terms, not untyped ones, so that each term has a unique type. That is, in the relevant jargon, we use Church-style typing, as opposed to Curry-style typing. Alternatively, we use an intrinsically-typed term representation. In fact, arguably we mechanize at once both well-typed terms and their typing derivations; this is even more clear in our mechanization (see discussion in Appendix A.2.4).

We often omit ε from typing contexts with some assumptions. For instance, we write x : τ1, y : τ2 instead of ε, x : τ1, y : τ2.

We overload symbols (often without warning) when they can be disambiguated from context, especially when we can teach modern programming languages to disambiguate such overloadings. For instance, we reuse → for lambda abstractions λx → t, function spaces A → B, and function types σ → τ, even though in the first case → is just a separator.

Extensions. In our examples, we will use some unproblematic syntactic sugar over STLC, including let expressions, global definitions and type inference, and we will use a Haskell-like concrete syntax. In particular, when giving type signatures or type annotations in Haskell snippets, we will use :: to separate terms or variables from their types, rather than : as in λ-calculus. To avoid confusion, we never use : to denote the constructor for Haskell lists.

At times, our concrete examples will use Hindley-Milner (prenex) polymorphism, but this is also not such a significant extension. A top-level definition using prenex polymorphism, that is of type ∀α. τ (where α is free in τ), can be taken as sugar for a metalevel family of object-level programs, indexed by type argument τ1, of definitions of type τ[α = τ1]. We use this trick without explicit mention in our first implementation of incrementalization in Scala [Cai et al. 2014].
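A small Haskell illustration of this reading (names chosen for this sketch):

-- A prenex-polymorphic definition...
idPoly :: a -> a
idPoly x = x

-- ...read as a metalevel family of monomorphic object-level definitions,
-- one per instantiation of the type variable.
idInt :: Int -> Int
idInt = idPoly

idBool :: Bool -> Bool
idBool = idPoly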

A.2.1 Denotational semantics for STLC

To prove that incrementalization preserves the semantics of our object-language programs, we define a semantics for STLC. We use a naive set-theoretic denotational semantics. Since STLC is strongly normalizing [Pierce 2002, Ch. 12], its semantics need not handle partiality. Hence, we can use denotational semantics but eschew using domain theory, and simply use sets from the metalanguage (see Appendix A.1). Likewise, we can use normal functions as domains for function types.

We first associate to every type τ a set of values ⟦τ⟧, so that terms of type τ evaluate to values in ⟦τ⟧. We call the set ⟦τ⟧ a domain. Domains associated to types τ depend on the domains associated to base types ι, which must be specified by language plugins (Plugin Requirement A.2.10).

Definition A.2.2 (Domains and values). The domain ⟦τ⟧ of a type τ is defined as in Fig. A.1c. A value is a member of a domain.

We let metavariables u, v, a, b, ... range over members of domains; we tend to use v for generic values, and a for values we use as function arguments. We also let metavariables f, g, ... range over values in the domain for a function type. At times, we might also use metavariables f, g, ... to range over terms of function types; the context will clarify what is intended.

Given this domain construction, we can now define a denotational semantics for terms. The plugin has to provide the evaluation function for constants. In general, the evaluation function ⟦t⟧ computes the value of a well-typed term t, given the values of all free variables in t. The values of the free variables are provided in an environment.

Definition A.2.3 (Environments). An environment ρ assigns values to the names of free variables:

ρ ::= ε | ρ, x = v

We write ⟦Γ⟧ for the set of environments that assign values to the names bound in Γ (see Fig. A.1d).


Notation. We often omit ε from environments with some assignments. For instance, we write x = v1, y = v2 instead of ε, x = v1, y = v2.

Definition A.2.4 (Evaluation). Given Γ ⊢ t : τ, the meaning of t is defined by the function ⟦t⟧ of type ⟦Γ⟧ → ⟦τ⟧ in Fig. A.1e.

This is a standard denotational semantics of the simply-typed λ-calculus.

For each constant c : τ, the plugin provides ⟦c⟧C : ⟦τ⟧, the semantics of c (by Plugin Requirement A.2.11); since constants don't contain free variables, ⟦c⟧C does not depend on an environment. We define a program equivalence across terms of the same type, t1 ≃ t2, to mean ⟦t1⟧ = ⟦t2⟧.

Definition A.2.5 (Denotational equivalence). We say that two terms Γ ⊢ t1 : τ and Γ ⊢ t2 : τ are denotationally equal, and write Γ ⊨ t1 ≃ t2 : τ (or sometimes t1 ≃ t2), if for all environments ρ : ⟦Γ⟧ we have that ⟦t1⟧ρ = ⟦t2⟧ρ.

Remark A.2.6. Beware that denotational equivalence cannot always be strengthened by dropping unused variables: that is, Γ, x : σ ⊨ t1 ≃ t2 : τ does not imply Γ ⊨ t1 ≃ t2 : τ, even if x does not occur free in either t1 or t2. Counterexamples rely on σ being an empty type. For instance, we cannot weaken x : 0τ ⊨ 0 ≃ 1 : Z (where 0τ is an empty type): this equality is only true vacuously, because there exists no environment for context x : 0τ.

A.2.2 Weakening

While we don't discuss our formalization of variables in full, in this subsection we discuss briefly weakening on STLC terms, and state as a lemma that weakening preserves meaning. This lemma is needed in a key proof, the one of Theorem 12.2.2.

As usual, if a term t is well-typed in a given context Γ1, and context Γ2 extends Γ1 (which we write as Γ1 ⪯ Γ2), then t is also well-typed in Γ2.

Lemma A.2.7 (Weakening is admissible). The following typing rule is admissible:

Γ1 ⊢ t : τ    Γ1 ⪯ Γ2
----------------------- Weaken
Γ2 ⊢ t : τ

Weakening also preserves semantics: if a term t is typed in context Γ1, evaluating it requires an environment matching Γ1. So if we weaken t to a bigger context Γ2, evaluation requires an extended environment matching Γ2, and is going to produce the same result.

Lemma A.2.8 (Weakening preserves meaning). Take Γ1 ⊢ t : τ and ρ1 : ⟦Γ1⟧. If Γ1 ⪯ Γ2 and ρ2 : ⟦Γ2⟧ extends ρ1, then we have that

⟦t⟧ρ1 = ⟦t⟧ρ2.

Mechanizing these statements and their proofs requires some care. We have a meta-level type Term Γ τ of object terms having type τ in context Γ. Evaluation has type ⟦–⟧ : Term Γ τ → ⟦Γ⟧ → ⟦τ⟧, so ⟦t⟧ρ1 = ⟦t⟧ρ2 is not directly well-typed. To remedy this, we define formally the subcontext relation Γ1 ⪯ Γ2, and an explicit operation that weakens a term in context Γ1 to a corresponding term in the bigger context Γ2: weaken : Γ1 ⪯ Γ2 → Term Γ1 τ → Term Γ2 τ. We define the subcontext relation Γ1 ⪯ Γ2 as a judgment, using order-preserving embeddings.⁴ We refer to our mechanized proof for details, including auxiliary definitions and relevant lemmas.

⁴As mentioned by James Chapman at https://lists.chalmers.se/pipermail/agda/2011/003423.html, who attributes them to Conor McBride.


A.2.3 Substitution

Some facts can be presented using (capture-avoiding) substitution rather than environments, and we do so at some points, so let us fix notation. We write t[x = s] for the result of substituting variable x in term t by term s.

We have mostly avoided mechanizing proofs about substitution, but we have mechanized substitution following Keller and Altenkirch [2010], and proved the following substitution lemma.

Lemma A.2.9 (Substitution lemma). For any term Γ ⊢ t : τ and variable x : σ bound in Γ, we write Γ − x for the result of removing variable x from Γ (as defined by Keller and Altenkirch). Take term Γ − x ⊢ s : σ and environment ρ : ⟦Γ − x⟧. Then we have that substitution and evaluation commute as follows:

⟦t[x = s]⟧ρ = ⟦t⟧(ρ, x = ⟦s⟧ρ)

A.2.4 Discussion: our mechanization and semantic style

To formalize the meaning of our programs, we use denotational semantics, while nowadays most prefer operational semantics, in particular small-step. Hence, we next justify our choice and discuss related work.

We expect we could use other semantic techniques, such as big-step or small-step semantics. But at least for such a simple object language, working with denotational semantics as we use it is easier than other approaches in a proof assistant, especially in Agda:

• Our semantics ⟦–⟧ is a function, and not a relation like in small-step or big-step semantics.

• It is clear to Agda that our semantics is a total function, since it is structurally recursive.

• Agda can normalize ⟦–⟧ on partially-known terms when normalizing goals.

• The meanings of our programs are well-behaved Agda functions, not syntax, so we know "what they mean" and need not prove any lemmas about it. We need not prove, say, that evaluation is deterministic.

In Agda, the domains for our denotational semantics are simply Agda types, and semantic values are Agda values; in other words, we give a denotational semantics in terms of type theory. Using denotational semantics allows us to state the specification of differentiation directly in the semantic domain, and to take advantage of Agda's support for equational reasoning for proving equalities between Agda functions.
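For readers more familiar with Haskell than Agda, the semantics of Fig. A.1e can be sketched as a definitional interpreter; unlike the Agda mechanization, this untyped rendering needs a tagged Value type and error cases, which intrinsically-typed terms avoid.

data Term = Const Int | Var String | Lam String Term | App Term Term

data Value = N Int | F (Value -> Value)

type Env = [(String, Value)]

-- The denotation function of Fig. A.1e, written as a compositional,
-- structurally recursive interpreter.
eval :: Term -> Env -> Value
eval (Const n) _   = N n
eval (Var x)   rho = maybe (error "unbound variable") id (lookup x rho)
eval (Lam x t) rho = F (\v -> eval t ((x, v) : rho))
eval (App s t) rho = case eval s rho of
                       F f -> f (eval t rho)
                       N _ -> error "applied a non-function"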

Related work. Our variant is used, for instance, by McBride [2010], who attributes it to Augustsson and Carlsson [1999] and Altenkirch and Reus [1999]. In particular, Altenkirch and Reus [1999] already define our type Term Γ τ of simply-typed λ-terms t, well-typed with type τ in context Γ, while Augustsson and Carlsson [1999] define semantic domains by induction over types. Benton et al. [2012] and Allais et al. [2017] also discuss this approach to formalizing λ-terms, and discuss how to best prove various lemmas needed to reason, for instance, about substitution.

More generally, similar approaches are becoming more common when using proof assistants. Our denotational semantics could be otherwise called a definitional interpreter (which is in particular compositional), and mechanized formalizations using a variety of definitional interpreters are nowadays often advocated, either using denotational semantics [Chlipala 2008] or using functional big-step semantics. Functional semantics are so convenient that their use has been advocated even for languages that are not strongly normalizing [Owens et al. 2016; Amin and Rompf 2017], even at the cost of dealing with step-indexes.


A.2.5 Language plugins

Our object language is parameterized by language plugins (or just plugins), which encapsulate its domain-specific aspects.

In our examples, our language plugin will typically support integers and primitive operations on them. However, our plugin will also support various sorts of collections and base operations on them. Our first example of collection will be bags. Bags are unordered collections (like sets) where elements are allowed to appear more than once (unlike in sets); they are also called multisets.

Our formalization is parameterized over one language plugin providing all base types and primitives. In fact, we expect a language plugin to be composed out of multiple language plugins, merged together [Erdweg et al. 2012]. Our mechanization is mostly parameterized over language plugins, but see Appendix B.1 for discussion of a few limitations.

The sets of base types and primitive constants, as well as the types for primitive constants, are on purpose left unspecified and only defined by plugins: they are extension points. We write some extension points using ellipses ("..."), and other ones by creating names, which typically use C as a superscript.

A plugin defines a set of base types ι, primitives c, and their denotational semantics ⟦c⟧C. As usual, we require that ⟦c⟧C : ⟦τ⟧ whenever ⊢C c : τ.

Summary. To sum up the discussion of plugins, we collect formally the plugin requirements we have mentioned in this chapter.

Plugin Requirement A.2.10 (Base types). There is a set of base types ι, and for each there is a domain ⟦ι⟧.

Plugin Requirement A.2.11 (Constants). There is a set of constants c. To each constant is associated a type τ, such that the constant has that type, that is ⊢C c : τ, and the constant's semantics matches that type, that is ⟦c⟧C : ⟦τ⟧.

After discussing the metalanguage of our proofs, the object language we study, and its semantics, we begin discussing incrementalization in the next chapter.


Appendix B

More on our formalization

B.1 Mechanizing plugins modularly and limitations

Next, we discuss our mechanization of language plugins in Agda, and its limitations. For the concerned reader, we can say these limitations affect essentially how modular our proofs are, and not their cogency.

In essence, it's not possible to express in Agda the correct interface for language plugins, so some parts of language plugins can't be modularized as desirable. Alternatively, we can mechanize any fixed language plugin together with the main formalization, which is not truly modular; or mechanize a core formulation parameterized on a language plugin, which runs into a few limitations; or encode plugins so they can be modularized, and deal with encoding overhead.

This section requires some Agda knowledge not provided here, but we hope that readers familiar with both Haskell and Coq will be able to follow along.

Our mechanization is divided into multiple Agda modules. Most modules have definitions that depend on language plugins, directly or indirectly. Those take definitions from language plugins as module parameters.

For instance, STLC object types are formalized through Agda type Type, defined in module Parametric.Syntax.Type. The latter is parameterized over Base, the type of base types.

For instance, Base can support a base type of integers, and a base type of bags of elements of type ι (where ι : Base). Simplifying a few details, our definition is equivalent to the following Agda code:

module Parametric.Syntax.Type (Base : Set) where
  data Type : Set where
    base : (ι : Base) → Type
    _⇒_  : (σ : Type) → (τ : Type) → Type

-- Elsewhere, in plugin:
data Base : Set where
  baseInt : Base
  baseBag : (ι : Base) → Base
  -- ...

But with these definitions, we only have bags of elements of base type: if ι is a base type, base (baseBag ι) is the type of bags with elements of type base ι. Hence, we have bags of elements of base type, but we don't have a way to encode Bag τ if τ is an arbitrary non-base type, such as base baseInt ⇒ base baseInt (the encoding of object type Z → Z). Can we do better? If we ignore modularization, we can define types through the following mutually recursive datatypes:

mutual
  data Type : Set where
    base : (ι : Base) → Type
    _⇒_  : (σ : Type) → (τ : Type) → Type

  data Base : Set where
    baseInt : Base
    baseBag : (ι : Type) → Base

So far, so good, but these types have to be defined together. We can go a step further by defining:

mutual
  data Type : Set where
    base : (ι : Base Type) → Type
    _⇒_  : (σ : Type) → (τ : Type) → Type

  data Base (Type : Set) : Set where
    baseInt : Base Type
    baseBag : (ι : Type) → Base Type

Here, Base takes the type of object types as a parameter, and Type uses Base Type to tie the recursion. This definition still works, but only as long as Base and Type are defined together.

If we try to separate the definitions of Base and Type into different modules, we run into trouble:

module Parametric.Syntax.Type (Base : Set → Set) where
  data Type : Set where
    base : (ι : Base Type) → Type
    _⇒_  : (σ : Type) → (τ : Type) → Type

-- Elsewhere, in plugin:
data Base (Type : Set) : Set where
  baseInt : Base Type
  baseBag : (ι : Type) → Base Type

Here, Type is defined for an arbitrary function on types Base : Set → Set. However, this definition is rejected by Agda's positivity checker. Like Coq, Agda forbids defining datatypes that are not strictly positive, as they can introduce inconsistencies.

The above definition of Type is not strictly positive, because we could pass to it as argument Base = λτ → (τ → τ), so that Base Type = Type → Type, making Type occur in a negative position. However, the actual uses of Base we are interested in are fine. The problem is that we cannot inform the positivity checker that Base is supposed to be a strictly positive type function, because Agda doesn't supply the needed expressivity.
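To see why such negative occurrences are forbidden, consider this well-known counterexample, here in Haskell (which, unlike Agda, accepts non-strictly-positive datatypes): it produces a diverging term without any recursive definition, which in a proof assistant would be an inconsistency.

-- Bad occurs to the left of an arrow in its own constructor:
-- not strictly positive.
newtype Bad = Bad (Bad -> Bad)

selfApply :: Bad -> Bad
selfApply b@(Bad f) = f b

-- omega reduces to itself forever, with no recursion anywhere in sight.
omega :: Bad
omega = selfApply (Bad selfApply)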

This problem is well-known. It could be solved if Agda function spaces supported positivity annotations,¹ or by encoding a universe of strictly-positive type constructors. This encoding is not fully transparent, and adds hard-to-hide noise to the development [Schwaab and Siek 2013].² Few alternatives remain:

• We can forbid types from occurring in base types, as we did in our original paper [Cai et al. 2014]. There we did not discuss at all a recursive definition of base types.

¹As discussed in https://github.com/agda/agda/issues/570 and https://github.com/agda/agda/issues/2411.
²Using pattern synonyms and DISPLAY pragmas might successfully hide this noise.


• We can use the modular mechanization, disable positivity checking, and risk introducing inconsistencies. We tried this successfully in a branch of that formalization. We believe we did not introduce inconsistencies in that branch, but have no hard evidence.

• We can simply combine the core formalization with a sample plugin. This is not truly modular, because the core mechanization can only be reused by copy-and-paste. Moreover, in dependently-typed languages, the content of a definition can affect whether later definitions typecheck, so alternative plugins using the same interface might not typecheck.³

Sadly, giving up on modularity altogether appears the more conventional choice. Either way, as we claimed at the outset, these modularity concerns only limit the modularity of the mechanized proofs, not their cogency.

³Indeed, if Base were not strictly positive, the application Type Base would be ill-typed, as it would fail positivity checking, even though Base : Set → Set does not require Base to be strictly positive.


Appendix C

(Un)typed ILC operationally

In Chapter 12 we have proved ILC correct for a simply-typed λ-calculus. What about other languages, with more expressive type systems, or no type system at all?

In this chapter, we prove that ILC is still correct in untyped call-by-value (CBV) λ-calculus. We do so without using denotational semantics, but using only an environment-based big-step operational semantics and step-indexed logical relations. The formal development in this chapter stands alone from the rest of the thesis, though we do not repeat ideas present elsewhere.

We prove ILC correct using, in increasing order of complexity:

1. STLC and standard syntactic logical relations;

2. STLC and step-indexed logical relations;

3. an untyped λ-calculus and step-indexed logical relations.

We have fully mechanized the second proof in Agda,¹ and done the others on paper. In all cases, we prove the fundamental property for validity; we detail later which corollaries we prove in which case. The proof for untyped λ-calculus is the most interesting, but the others can serve as stepping stones. Yann Régis-Gianas, in collaboration with me, recently mechanized a similar proof for untyped λ-calculus in Coq; it appears in Sec. 17.3.²

Using operational semantics and step-indexed logical relations simplifies extending the proofs to more expressive languages, where denotational semantics or other forms of logical relations would require more sophistication, as argued by Ahmed [2006].

Proofs by (step-indexed) logical relations also promise to be scalable. All these proofs appear to be slight variants of proof techniques for logical program equivalence and parametricity, which are well-studied topics, suggesting the approach might scale to more expressive type systems. Hence, we believe these proofs clarify the relation with parametricity that has been noticed earlier [Atkey 2015]. However, actually proving ILC correct for a polymorphic language (such as System F) is left as future work.

We also expect that, from our logical relations, one might derive a logical partial equivalence relation among changes, similarly to Sec. 14.2, but we leave such a development for future work.

Compared to earlier chapters, this one will be more technical and concise, because we already introduced the ideas behind both ILC and logical relation proofs.

¹Source code available in this GitHub repo: https://github.com/inc-lc/ilc-agda
²Mechanizing the proof for untyped λ-calculus is harder for purely technical reasons: mechanizing well-founded induction in Agda is harder than mechanizing structural induction.


Binding and environments. On the technical side, we are able to mechanize our proof without needing any technical lemmas about binding or weakening, thanks to a number of choices we mention later.

Among other reasons, we avoid lemmas on binding because, instead of substituting variables with arbitrary terms, we record mappings from variables to closed values via environments. We also used environments in Appendix A, but this time we use syntactic values rather than semantic ones. As a downside, on paper, using environments makes for more and bigger definitions, because we need to carry around both an environment and a term, instead of merging them into a single term via substitution, and because values are not a subset of terms but an entirely separate category. But on paper the extra work is straightforward, and in a mechanized setting it is simpler than substitution, especially in a setting with an intrinsically typed term representation (see Appendix A.2.4).

Background/related work. Our development is inspired significantly by the use of step-indexed logical relations by Ahmed [2006] and Acar, Ahmed, and Blume [2008]. We refer to those works, and to Ahmed's lectures at OPLSS 2013,³ for an introduction to (step-indexed) logical relations.

Intensional and extensional validity. Until this point, change validity only specifies how function changes behave, that is, their extension. Using operational semantics, we can specify how valid function changes are concretely defined, that is, their intension. To distinguish the two concepts, we contrast extensional validity and intensional validity. Some extensionally valid changes are not intensionally valid, but such changes are never created by derivation. Defining intensional validity helps to understand function changes: function changes are produced from changes to values in environments, or from functions being replaced altogether. Requiring intensional validity helps to implement change operations such as ⊕ more efficiently, by operating on environments. Later, in Appendix D, we use similar ideas to implement change operations on defunctionalized function changes; intensional validity helps put such efforts on more robust foundations, even though we do not account formally for defunctionalization, but only for the use of closures in the semantics.

We use operational semantics to define extensional validity in Appendix C.3 (using plain logical relations) and Appendix C.4 (using step-indexed logical relations). We switch to an intensional validity definition in Appendix C.5.

Non-termination and general recursion. This proof implies correctness of ILC in the presence of general recursion, because untyped λ-calculus supports general recursion via fixpoint combinators. However, the proof only applies to terminating executions of base programs: like earlier authors [Acar, Ahmed, and Blume 2008], we prove that if a function terminates against both a base input v1 and an updated one v2, its derivative terminates against the base input and a valid input change dv from v1 to v2.

We can also add a fixpoint construct to our typed λ-calculus, and to our mechanization, without significant changes to our relations. However, a mechanical proof would require use of well-founded induction, which our current mechanization avoids. We return to this point in Appendix C.7.

While this support for general recursion is effective in some scenarios, other scenarios can still be incrementalized better using structural recursion. More efficient support for general recursion is a separate problem that we do not tackle here and leave for future work; we refer to the discussion in Sec. 15.1.

Correctness statement. Our final correctness theorem is a variant of Corollary 13.4.5, which we repeat for comparison.

³https://www.cs.uoregon.edu/research/summerschool/summer13/curriculum.html


Corollary 13.4.5 (D⟦–⟧ is correct, corollary). If Γ ⊢ t : τ and dρ ▷ ρ1 ↪ ρ2 : Γ, then ⟦t⟧ρ1 ⊕ ⟦D⟦t⟧⟧dρ = ⟦t⟧ρ2.

We present our final correctness theorem statement in Appendix C.5. We anticipate it here for illustration: the new statement is more explicit about evaluation, but otherwise broadly similar.

Corollary C.5.6 (D⟦–⟧ is correct, corollary). Take any term t that is well-typed (Γ ⊢ t : τ) and any suitable environments ρ1, dρ, ρ2, intensionally valid at any step count (∀k. (k, ρ1, dρ, ρ2) ∈ RG⟦Γ⟧). Assume t terminates in both the old environment ρ1 and the new environment ρ2, evaluating to output values v1 and v2 (ρ1 ⊢ t ⇓ v1 and ρ2 ⊢ t ⇓ v2). Then D⟦t⟧ evaluates in environment ρ1 and change environment dρ to a change value dv (ρ1, dρ ⊢Δ D⟦t⟧ ⇓ dv), and dv is a valid change from v1 to v2, so that v1 ⊕ dv = v2.

Overall, in this chapter we present the following contributions:

• We give an alternative presentation of derivation that can be mechanized without any binding-related lemmas, not even weakening-related ones, by introducing a separate syntax for change terms (Appendix C.1).

• We prove formally ILC correct for STLC, using big-step semantics and logical relations (Appendix C.3).

• We show formally (with pen-and-paper proofs) that our semantics is equivalent to small-step semantics definitions (Appendix C.2).

• We introduce a formalized step-indexed variant of our definitions and proofs for simply-typed λ-calculus (Appendix C.4), which scales directly to definitions and proofs for untyped λ-calculus.

• For typed λ-calculus, we also mechanize our step-indexed proof.

• In addition to (extensional) validity, we introduce a concept of intensional validity, which captures formally that function changes arise from changing environments, or from functions being replaced altogether (Appendix C.5).

C.1 Formalization

To present the proofs, we first describe our formal model of CBV ANF λ-calculus. We define an untyped ANF language, called λA. We also define a simply-typed variant, called λA→, by adding on top of λA a separate Curry-style type system.

In our mechanization of λA→, however, we find it more convenient to define a Church-style type system (that is, a syntax that only describes typing derivations for well-typed terms), separately from the untyped language.

Terms resulting from differentiation satisfy additional invariants, and exploiting those invariants helps simplify the proof. Hence, we define separate languages for change terms produced from differentiation, again in untyped (λΔA) and typed (λΔA→) variants.

The syntax is summarized in Fig. C.1, the type systems in Fig. C.2, and the semantics in Fig. C.3. The base languages are mostly standard, while the change languages make a few different choices.


p ::= succ | add | ...                   (primitives)
m, n ::= ...                             (numbers)
c ::= n | ...                            (constants)
w ::= x | λx → t | ⟨w, w⟩ | c            (neutral forms)
s, t ::= w | w w | p w | let x = t in t  (terms)
v ::= ρ[λx → t] | ⟨v, v⟩ | c             (syntactic values)
ρ ::= x1 = v1, ..., xn = vn              (environments)

(a) Base terms (λA)

dc ::= 0 | ...
dw ::= dx | λx dx → dt | ⟨dw, dw⟩ | dc
ds, dt ::= dw | dw w dw | 0p w dw | let x = t, dx = dt in dt
dv ::= ρ, dρ[λx dx → dt] | ⟨dv, dv⟩ | dc
dρ ::= dx1 = dv1, ..., dxn = dvn

(b) Change terms (λΔA)

DC⟦n⟧ = 0

D⟦x⟧ = dx
D⟦λx → t⟧ = λx dx → D⟦t⟧
D⟦⟨wa, wb⟩⟧ = ⟨D⟦wa⟧, D⟦wb⟧⟩
D⟦c⟧ = DC⟦c⟧

D⟦wf wa⟧ = D⟦wf⟧ wa D⟦wa⟧
D⟦p w⟧ = 0p w D⟦w⟧
D⟦let x = s in t⟧ = let x = s, dx = D⟦s⟧ in D⟦t⟧

(c) Differentiation

τ ::= N | τa × τb | σ → τ

ΔN = N
Δ(τa × τb) = Δτa × Δτb
Δ(σ → τ) = σ → Δσ → Δτ

Δε = ε
Δ(Γ, x : τ) = ΔΓ, dx : Δτ

(d) Types and contexts

Figure C.1: ANF λ-calculus, λA and λΔA.

C.1.1 Types and contexts

We show the syntax of types, contexts, and change types in Fig. C.1d. We introduce types for functions, binary products, and naturals. Tuples can be encoded as usual through nested pairs. Change types are mostly like earlier, but this time we use naturals as changes for naturals (hence, we cannot define a total ⊖ operation).

We modify the definition of change contexts and environment changes to not contain entries for base values: in this presentation, we use separate environments for base variables and change variables. This choice avoids the need to define weakening lemmas.

C.1.2 Base syntax for λA

For convenience, we consider a λ-calculus in A-normal form. We do not parameterize this calculus over language plugins, to reduce mechanization overhead, but we define separate syntactic categories for possible extension points.

We show the syntax of terms in Fig. C.1a.

Meta-variable v ranges over (closed) syntactic values, that is, evaluation results. Values are numbers, pairs of values, or closures. A closure is a pair of a function and an environment, as usual. Environments ρ are finite maps from variables to syntactic values; in our mechanization using de Bruijn indexes, environments are in particular finite lists of syntactic values.

Meta-variable t ranges over arbitrary terms, and w ranges over neutral forms. Neutral forms evaluate to values in zero steps, but unlike values they can be open: a neutral form is either a variable, a constant value c, a λ-abstraction, or a pair of neutral forms.


------------ T-Lit
⊢C n : N

------------------- T-Succ
⊢P succ : N → N

------------------------ T-Add
⊢P add : (N × N) → N

x : τ ∈ Γ
------------ T-Var
Γ ⊢ x : τ

⊢C c : τ
------------ T-Const
Γ ⊢ c : τ

Γ ⊢ wf : σ → τ    Γ ⊢ wa : σ
------------------------------- T-App
Γ ⊢ wf wa : τ

Γ, x : σ ⊢ t : τ
------------------------ T-Lam
Γ ⊢ λx → t : σ → τ

Γ ⊢ s : σ    Γ, x : σ ⊢ t : τ
------------------------------- T-Let
Γ ⊢ let x = s in t : τ

Γ ⊢ wa : τa    Γ ⊢ wb : τb
----------------------------- T-Pair
Γ ⊢ ⟨wa, wb⟩ : τa × τb

⊢P p : σ → τ    Γ ⊢ w : σ
----------------------------- T-Prim
Γ ⊢ p w : τ

(a) λA→ base typing (judgement Γ ⊢ t : τ)

x : τ ∈ Γ
-------------- T-DVar
Γ ⊢Δ dx : τ

⊢C c : Δτ
-------------- T-DConst
Γ ⊢Δ c : τ

Γ ⊢Δ dwf : σ → τ    Γ ⊢ wa : σ    Γ ⊢Δ dwa : σ
------------------------------------------------ T-DApp
Γ ⊢Δ dwf wa dwa : τ

Γ ⊢ s : σ    Γ ⊢Δ ds : σ    Γ, x : σ ⊢Δ dt : τ
------------------------------------------------ T-DLet
Γ ⊢Δ let x = s, dx = ds in dt : τ

Γ, x : σ ⊢Δ dt : τ
---------------------------- T-DLam
Γ ⊢Δ λx dx → dt : σ → τ

Γ ⊢Δ dwa : τa    Γ ⊢Δ dwb : τb
--------------------------------- T-DPair
Γ ⊢Δ ⟨dwa, dwb⟩ : τa × τb

⊢P p : σ → τ    Γ ⊢ w : σ    Γ ⊢Δ dw : σ
-------------------------------------------- T-DPrim
Γ ⊢Δ 0p w dw : τ

(b) λΔA→ change typing (judgement Γ ⊢Δ dt : τ). Judgement Γ ⊢Δ dt : τ means that variables from both Γ and ΔΓ are in scope in dt, and the final type is in fact Δτ.

Figure C.2: ANF λ-calculus, λA→ and λΔA→ type system.


--------------------------- E-Val
ρ ⊢ w ⇓0 V⟦w⟧ρ

-------------------------------- E-Prim
ρ ⊢ p w ⇓1 P⟦p⟧ (V⟦w⟧ρ)

ρ ⊢ s ⇓ns vs    (ρ, x = vs) ⊢ t ⇓nt vt
----------------------------------------- E-Let
ρ ⊢ let x = s in t ⇓1+ns+nt vt

ρ ⊢ wf ⇓0 vf    ρ ⊢ wa ⇓0 va    app vf va ⇓n v
-------------------------------------------------- E-App
ρ ⊢ wf wa ⇓1+n v

app (ρ′[λx → t]) va = (ρ′, x = va) ⊢ t

V⟦x⟧ρ = ρ(x)
V⟦λx → t⟧ρ = ρ[λx → t]
V⟦⟨wa, wb⟩⟧ρ = ⟨V⟦wa⟧ρ, V⟦wb⟧ρ⟩
V⟦c⟧ρ = c

P⟦succ⟧ n = 1 + n
P⟦add⟧ (m, n) = m + n

(a) Base semantics. Judgement ρ ⊢ t ⇓n v says that ρ ⊢ t, a pair of environment ρ and term t, evaluates to value v in n steps; app vf va ⇓n v constructs the pair to evaluate via app vf va.

------------------------------------- E-DVal
ρ, dρ ⊢Δ dw ⇓ VΔ⟦dw⟧ρ dρ

------------------------------------------------------- E-DPrim
ρ, dρ ⊢Δ 0p w dw ⇓ PΔ⟦p⟧ (V⟦w⟧ρ) (VΔ⟦dw⟧ρ dρ)

ρ, dρ ⊢Δ dwf ⇓ dvf    ρ ⊢ wa ⇓ va    ρ, dρ ⊢Δ dwa ⇓ dva    dapp dvf va dva ⇓ dv
-------------------------------------------------------------------------------- E-DApp
ρ, dρ ⊢Δ dwf wa dwa ⇓ dv

ρ ⊢ s ⇓ vs    ρ, dρ ⊢Δ ds ⇓ dvs    (ρ, x = vs), (dρ, dx = dvs) ⊢Δ dt ⇓ dvt
---------------------------------------------------------------------------- E-DLet
ρ, dρ ⊢Δ let x = s, dx = ds in dt ⇓ dvt

dapp (ρ′, dρ′[λx dx → dt]) va dva = (ρ′, x = va), (dρ′, dx = dva) ⊢Δ dt

VΔ⟦dx⟧ρ dρ = dρ(dx)
VΔ⟦λx dx → dt⟧ρ dρ = ρ, dρ[λx dx → dt]
VΔ⟦⟨dwa, dwb⟩⟧ρ dρ = ⟨VΔ⟦dwa⟧ρ dρ, VΔ⟦dwb⟧ρ dρ⟩
VΔ⟦c⟧ρ dρ = c

PΔ⟦succ⟧ n dn = dn
PΔ⟦add⟧ ⟨m, n⟩ ⟨dm, dn⟩ = dm + dn

(b) Change semantics. Judgement ρ, dρ ⊢Δ dt ⇓ dv says that ρ, dρ ⊢Δ dt, a triple of environment ρ, change environment dρ, and change term dt, evaluates to change value dv; dapp dvf va dva ⇓ dv constructs the triple to evaluate via dapp dvf va dva.

Figure C.3: ANF λ-calculus (λA and λΔA) CBV semantics.


A term is either a neutral form, an application of neutral forms, a let expression, or an application of a primitive function p to a neutral form. Multi-argument primitives are encoded as primitives taking (nested) tuples of arguments. Here we use literal numbers as constants, and +1 and addition as primitives (to illustrate different arities), but further primitives are possible.

Notation C.1.1. We use subscripts a, b for pair components, f, a for function and argument, and keep using 1, 2 for old and new values.

C.1.3 Change syntax for λΔA

Next, we consider a separate language for change terms, which can be transformed into the base language. This language supports directly the structure of change terms: base variables and change variables live in separate namespaces. As we show later, for the typed language those namespaces are represented by typing contexts Γ and ΔΓ; that is, the typing context for change variables is always the change context for Γ.

We show the syntax of change terms in Fig. C.1b.

Change terms often take or bind two parameters at once: one for a base value, and one for its change. Since a function change is applied to a base input and a change for it at once, the syntax for change terms has a special binary application node dwf wa dwa; otherwise, in ANF, such syntax would have to be encoded through separate applications, via let dfa = dwf wa in dfa dwa. In the same way, closure changes ρ, dρ[λx dx → dt] bind two variables at once, and close over separate environments for base and change variables. Various other choices in the same spirit simplify similar formalization and mechanization details.

In change terms, we write 0p as syntax for the derivative of p, evaluated as such by the semantics. Strictly speaking, differentiation must map primitives to standard terms, so that the resulting programs can be executed by a standard semantics; hence, we should replace 0p by a concrete implementation of the derivative of p. However, doing so in a new formalization yields little additional insight, and requires writing concrete derivatives of primitives as de Bruijn terms.

C.1.4 Differentiation

We show differentiation in Fig. C.1c. Differentiation maps constructs in the language of base terms one-to-one to constructs in the language of change terms.
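As an illustration, differentiation can be transcribed almost literally into Haskell over a named rendering of the ANF syntax; the datatypes and names below are assumptions of this sketch (the mechanization instead uses de Bruijn indices and separate namespaces).

type Name = String

-- Base syntax (Fig. C.1a): neutral forms and terms.
data Ntrl = Var Name | Lam Name Tm | Pair Ntrl Ntrl | Num Int
data Tm   = W Ntrl | App Ntrl Ntrl | Prim Name Ntrl | Let Name Tm Tm

-- Change syntax (Fig. C.1b): change neutral forms and change terms.
data DNtrl = DVar Name | DLam Name Name DTm | DPair DNtrl DNtrl | DNum Int
data DTm   = DW DNtrl | DApp DNtrl Ntrl DNtrl | DPrim Name Ntrl DNtrl
           | DLet Name Tm Name DTm DTm

d :: Name -> Name          -- change variable associated to a base variable
d x = "d" ++ x

deriveN :: Ntrl -> DNtrl   -- differentiation of neutral forms
deriveN (Var x)      = DVar (d x)
deriveN (Lam x t)    = DLam x (d x) (derive t)
deriveN (Pair wa wb) = DPair (deriveN wa) (deriveN wb)
deriveN (Num _)      = DNum 0                  -- DC[[n]] = 0

derive :: Tm -> DTm        -- differentiation of terms (Fig. C.1c)
derive (W w)       = DW (deriveN w)
derive (App wf wa) = DApp (deriveN wf) wa (deriveN wa)
derive (Prim p w)  = DPrim p w (deriveN w)
derive (Let x s t) = DLet x s (d x) (derive s) (derive t)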

C.1.5 Typing λA→ and λΔA→

We define typing judgements for λA→ base terms and for λΔA→ change terms. We show the typing rules in Fig. C.2.

Typing for base terms is mostly standard. We use judgements ⊢P p and ⊢C c to specify typing of primitive functions and constant values. For change terms, one could expect a type system only proving judgements with shape Γ, ΔΓ ⊢ dt : Δτ (where Γ, ΔΓ stands for the concatenation of Γ and ΔΓ). To simplify inversion on such judgements (especially in Agda), we write instead Γ ⊢Δ dt : τ, so that one can verify the following derived typing rule for D⟦–⟧:

Γ ⊢ t : τ
------------------ T-Derive
Γ ⊢Δ D⟦t⟧ : τ


We also use mutually recursive typing judgements ⊢ v : τ for values and ⊢ ρ : Γ for environments, and similarly ⊢Δ dv : τ for change values and ⊢Δ dρ : Γ for change environments. We only show the (unsurprising) rules for ⊢ v : τ, and omit the others. One could alternatively and equivalently define typing on syntactic values v by translating them to neutral forms w = v* (using unsurprising definitions in Appendix C.2) and reusing typing on terms, but as usual we prefer to avoid substitution.

------------ TV-Nat
⊢ n : N

⊢ va : τa    ⊢ vb : τb
-------------------------- TV-Pair
⊢ ⟨va, vb⟩ : τa × τb

Γ, x : σ ⊢ t : τ    ⊢ ρ : Γ
------------------------------ TV-Lam
⊢ ρ[λx → t] : σ → τ

C.1.6 Semantics

We present our semantics for base terms in Fig. C.3a. Our semantics gives meaning to pairs of an environment ρ and a term t, consistent with each other, which we write ρ ⊢ t. By "consistent" we mean that ρ contains values for all free variables of t and, in a typed language, values of compatible types (if Γ ⊢ t : τ, then ⊢ ρ : Γ). Judgement ρ ⊢ t ⇓n v says that ρ ⊢ t evaluates to value v in n steps. The definition is given via a CBV big-step semantics. Following Acar, Ahmed, and Blume [2008], we index our evaluation judgements via a step count, which counts, in essence, β-reduction steps; we use such step counts later to define step-indexed logical relations. Since our semantics uses environments, β-reduction steps are implemented not via substitution but via environment extension; the resulting step-counts are however the same (Appendix C.2). Applying closure vf = ρ′[λx → t] to argument va produces the environment-term pair (ρ′, x = va) ⊢ t, which we abbreviate as app vf va. We'll reuse this syntax later to define logical relations.
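Continuing the Haskell sketch from Appendix C.1.4, the step-counting judgement can be rendered as an evaluator returning the number of β-reduction (and primitive) steps alongside the value; again, the names are assumptions of the sketch, not the mechanization's code.

data Val = Clos Env Name Tm | VPair Val Val | VNum Int
type Env = [(Name, Val)]

evalN :: Env -> Ntrl -> Val          -- V[[w]]ρ: neutral forms, zero steps
evalN rho (Var x)      = maybe (error "unbound variable") id (lookup x rho)
evalN rho (Lam x t)    = Clos rho x t
evalN rho (Pair wa wb) = VPair (evalN rho wa) (evalN rho wb)
evalN _   (Num n)      = VNum n

evalP :: Name -> Val -> Val          -- P[[p]]v: primitives
evalP "succ" (VNum n)                  = VNum (1 + n)
evalP "add"  (VPair (VNum m) (VNum n)) = VNum (m + n)
evalP _      _                         = error "ill-typed primitive"

eval :: Env -> Tm -> (Int, Val)      -- ρ ⊢ t ⇓n v, returning (n, v)
eval rho (W w)       = (0, evalN rho w)                      -- E-Val
eval rho (Prim p w)  = (1, evalP p (evalN rho w))            -- E-Prim
eval rho (App wf wa) = case evalN rho wf of                  -- E-App
  Clos rho' x t -> let (n, v) = eval ((x, evalN rho wa) : rho') t
                   in (1 + n, v)
  _             -> error "applied a non-closure"
eval rho (Let x s t) =                                       -- E-Let
  let (ns, vs) = eval rho s
      (nt, vt) = eval ((x, vs) : rho) t
  in (1 + ns + nt, vt)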

In our mechanized formalization, we have additionally proved lemmas to ensure that this semantics is sound relative to our earlier denotational semantics (adapted for the ANF syntax).

Evaluation of primitives is delegated to function P⟦–⟧ –. We show complete equations for the typed case; for the untyped case, we must turn P⟦–⟧ – and PΔ⟦–⟧ – into relations (or add explicit error results), but we omit the standard details (see also Appendix C.6). For simplicity, we assume evaluation of primitives takes one step. We conjecture higher-order primitives might need to be assigned different costs, but leave the details for future work.

We can evaluate neutral forms w to syntactic values v using a simple evaluation function V⟦w⟧ρ, and use P⟦p⟧v to evaluate primitives. When we need to omit indexes, we write ρ ⊢ t ⇓ v to mean that, for some n, we have ρ ⊢ t ⇓n v.

We can also define an analogous non-indexed big-step semantics for change terms; we present it in Fig. C.3b.

C.1.7 Type soundness

Evaluation preserves types in the expected way.

Lemma C.1.2 (Big-step preservation).
1. If Γ ⊢ t : τ, ⊢ ρ : Γ and ρ ⊢ t ⇓n v, then ⊢ v : τ.
2. If Γ ⊢Δ dt : τ, ⊢ ρ : Γ, ⊢Δ dρ : Γ and ρ, dρ ⊢Δ dt ⇓ dv, then ⊢Δ dv : τ.


Proof. By structural induction on evaluation judgements. In our intrinsically typed mechanization, this is actually part of the definition of values and big-step evaluation rules.

To ensure that our semantics for λA→ is complete for the typed language, instead of proving a small-step progress lemma or extending the semantics with errors, we just prove that all typed terms normalize, in the standard way. As usual, this fails if we add fixpoints, or for untyped terms. If we wanted to ensure type safety in such a case, we could switch to functional big-step semantics or definitional interpreters [Amin and Rompf 2017; Owens et al. 2016].

Theorem C.1.3 (CBV normalization). For any well-typed and closed term ⊢ t : τ, there exist a step count n and a value v such that ⊢ t ⇓n v.

Proof. A variant of standard proofs of normalization for STLC [Pierce 2002, Ch. 12], adapted for big-step semantics rather than small-step semantics (similarly to Appendix C.3). We omit the needed definitions, and refer interested readers to our Agda mechanization.

We haven't attempted to prove this lemma for arbitrary change terms (though we expect we could prove it by defining an erasure to the base language and relating evaluation results), but we prove it for the result of differentiating well-typed terms, in Corollary C.3.2.

C.2 Validating our step-indexed semantics

In this section, we show how we ensure that the step counts in our base semantics are set correctly, and how we can relate this environment-based semantics to more conventional semantics, based on substitution and/or small-step. We only consider the core calculus, without primitives, constants and pairs. Results from this section are not needed later, and we have proved them formally on paper but not mechanized them, as our goal is to use environment-based big-step semantics in our mechanization.

To this end, we relate our semantics first with a big-step semantics based on substitution (rather than environments), and then relate this alternative semantics to a small-step semantics. Results in this section are useful to understand our semantics better, and as a design aid if one wants to modify it, but are not necessary to the proof, so we have not mechanized them.

As a consequence, we also conjecture that our logical relations and proofs could be adapted to small-step semantics, along the lines of Ahmed [2006]. We, however, do not find that necessary. While small-step semantics gives meaning to non-terminating programs, and that is important for type soundness proofs, it does not seem useful (or possible) to try to incrementalize them, or to ensure we do so correctly.

In proofs using step-indexed logical relations, the use of step-counts in definitions is often delicate and tricky to get right. But Acar, Ahmed, and Blume provide a robust recipe to ensure correct step-indexing in the semantics. To be able to prove the fundamental property of logical relations, we ensure step-counts agree with the ones induced by small-step semantics (which counts β-reductions). Such a lemma is not actually needed in other proofs, but is only useful as a sanity check. We also attempted using the style of step-indexing used by Amin and Rompf [2017], but were unable to produce a proof. To the best of our knowledge, all proofs using step-indexed logical relations, even with functional big-step semantics [Owens et al. 2016], use step-indexing that agrees with small-step semantics.

Unlike Acar, Ahmed, and Blume, we use environments in our big-step semantics; this avoids the need to define substitution in our mechanized proof. Nevertheless, one can show the two semantics correspond to each other. Our environments ρ can be regarded as closed value substitutions, as long as we also substitute away environments in values. Formally, we write ρ*(t) for the "homomorphic extension" of substitution ρ to terms, which produces other terms. If v is a value using environments, we write w = v* for the result of translating that value to not use environments; this operation produces a closed neutral form w. Operations ρ*(t) and v* can be defined in a mutually recursive way:

ρ*(x) = (ρ(x))*
ρ*(λx → t) = λx → ρ*(t)
ρ*(w1 w2) = ρ*(w1) ρ*(w2)
ρ*(let x = t1 in t2) = let x = ρ*(t1) in ρ*(t2)

(ρ[λx → t])* = λx → ρ*(t)

If ρ ⊢ t ⇓n v in our semantics, a standard induction over the derivation of ρ ⊢ t ⇓n v shows that ρ*(t) ⇓n v* in a more conventional big-step semantics, using substitution rather than environments (also following Acar, Ahmed, and Blume):

---------- Var′
x ⇓0 x

---------------------- Lam′
λx → t ⇓0 λx → t

t[x = w2] ⇓n w′
------------------------ App′
(λx → t) w2 ⇓1+n w′

t1 ⇓n1 w1    t2[x = w1] ⇓n2 w2
---------------------------------- Let′
let x = t1 in t2 ⇓1+n1+n2 w2

In this form, it is more apparent that the step indexes count steps of β-reduction or substitution. It's also easy to see that this big-step semantics agrees with a standard small-step semantics ↦ (which we omit): if t ⇓n w, then t ↦n w. Overall, the two statements can be composed, so our original semantics agrees with small-step semantics: if ρ ⊢ t ⇓n v, then ρ*(t) ⇓n v*, and finally ρ*(t) ↦n v*. Hence, we can translate evaluation derivations using big-step semantics to derivations using small-step semantics, with the same step count.

However, to state and prove the fundamental property, we need not prove that our semantics is sound relative to some other semantics. We simply define the appropriate logical relation for validity, and show it agrees with a suitable definition for ⊕.

Having defined our semantics, we proceed to define extensional validity.

C3 Validity syntactically (λArarr λ∆Ararr)For our typed language λArarr at least as long as we do not add xpoint operators we can denelogical relations using big-step semantics but without using step-indexes The resulting relationsare well-founded only because they use structural recursion on types We present in Fig C4 theneeded denitions as a stepping stone to the denitions using step-indexed logical relations

Following Ahmed [2006] and Acar Ahmed and Blume [2008] we encode extensional validitythrough two mutually recursive type-indexed families of ternary logical relations RV nτ o overclosed values and RC nτ o over terms (and environments)

These relations are analogous to notions we considered earlier and express similar informal notions:

• With denotational semantics, we write dv ▷ v1 ↪ v2 : τ to say that change value dv ∈ ⟦Δτ⟧ is a valid change from v1 to v2 at type τ. With operational semantics, instead, we write (v1, dv, v2) ∈ RV⟦τ⟧, where v1, dv and v2 are now closed syntactic values.


• For terms, with denotational semantics, we write ⟦dt⟧dρ ▷ ⟦t1⟧ρ1 ↪ ⟦t2⟧ρ2 : τ to say that dt is a valid change from t1 to t2, considering the respective environments. With operational semantics, instead, we write (ρ1 ⊢ t1, ρ dρ ⊢Δ dt, ρ2 ⊢ t2) ∈ RC⟦τ⟧.

Since we use Church typing and only mechanize typed terms, we must include in all cases appropriate typing assumptions.

Relation RC⟦τ⟧ relates tuples of environments and computations, ρ1 ⊢ t1, ρ dρ ⊢Δ dt and ρ2 ⊢ t2: it holds if, whenever t1 evaluates in environment ρ1 to v1 and t2 evaluates in environment ρ2 to v2, then dt must evaluate in environments ρ and dρ to a change value dv, with v1, dv, v2 related by RV⟦τ⟧. The environments themselves need not be related: this definition characterizes validity extensionally, that is, it can relate t1, dt and t2 that have unrelated implementations and unrelated environments (in fact, even unrelated typing contexts). This flexibility is useful when relating closures of type σ → τ: two closures might be related even if they close over environments of different shapes. For instance, closures v1 = [λx → 0] and v2 = (y = 0)[λx → y] are related by a nil change such as dv = [λx dx → 0]. In Appendix C.5 we discuss instead an intensional definition of validity.

In particular, for function types, the relation RV⟦σ → τ⟧ relates function values f1, df and f2 if they map related input values (and, for df, input changes) to related output computations.

We also extend the relation on values to environments via RG⟦Γ⟧: environments are related if their corresponding entries are related values.

RV⟦ℕ⟧ = {(n1, dn, n2) | n1, dn, n2 ∈ ℕ ∧ n1 + dn = n2}

RV⟦τa × τb⟧ = {(⟨va1, vb1⟩, ⟨dva, dvb⟩, ⟨va2, vb2⟩) |
    (va1, dva, va2) ∈ RV⟦τa⟧ ∧ (vb1, dvb, vb2) ∈ RV⟦τb⟧}

RV⟦σ → τ⟧ = {(vf1, dvf, vf2) |
    ⊢ vf1 : σ → τ ∧ ⊢Δ dvf : σ → τ ∧ ⊢ vf2 : σ → τ ∧
    ∀(v1, dv, v2) ∈ RV⟦σ⟧. (app vf1 v1, dapp dvf v1 dv, app vf2 v2) ∈ RC⟦τ⟧}

RC⟦τ⟧ = {(ρ1 ⊢ t1, ρ dρ ⊢Δ dt, ρ2 ⊢ t2) |
    (∃Γ1, Γ, Γ2. Γ1 ⊢ t1 : τ ∧ Γ ⊢Δ dt : τ ∧ Γ2 ⊢ t2 : τ) ∧
    ∀v1, v2. (ρ1 ⊢ t1 ⇓ v1 ∧ ρ2 ⊢ t2 ⇓ v2) ⇒
      ∃dv. ρ dρ ⊢Δ dt ⇓ dv ∧ (v1, dv, v2) ∈ RV⟦τ⟧}

RG⟦ε⟧ = {((), (), ())}
RG⟦Γ, x : τ⟧ = {((ρ1, x = v1), (dρ, dx = dv), (ρ2, x = v2)) |
    (ρ1, dρ, ρ2) ∈ RG⟦Γ⟧ ∧ (v1, dv, v2) ∈ RV⟦τ⟧}

Γ ⊨ dt ▷ t1 ↪ t2 : τ = ∀(ρ1, dρ, ρ2) ∈ RG⟦Γ⟧.
    (ρ1 ⊢ t1, ρ1 dρ ⊢Δ dt, ρ2 ⊢ t2) ∈ RC⟦τ⟧

Figure C.4: Defining extensional validity via logical relations and big-step semantics.


Given these definitions, one can prove the fundamental property:

Theorem C.3.1 (Fundamental property: correctness of D⟦–⟧)
For every well-typed term Γ ⊢ t : τ, we have that Γ ⊨ D⟦t⟧ ▷ t ↪ t : τ.

Proof sketch. By induction on the structure of terms, using ideas similar to Theorem 12.2.2.

It also follows that D⟦t⟧ normalizes:

Corollary C.3.2 (D⟦–⟧ normalizes to a nil change)
For any well-typed and closed term ⊢ t : τ, there exist a value v and a change value dv such that ⊢ t ⇓ v, ⊢Δ D⟦t⟧ ⇓ dv and (v, dv, v) ∈ RV⟦τ⟧.

Proof. A corollary of the fundamental property and of the normalization theorem (Theorem C.1.3): since t is well-typed and closed, it normalizes to a value v for some step count (⊢ t ⇓ v). The empty environment change is valid (((), (), ()) ∈ RG⟦ε⟧), so from the fundamental property we get that (⊢ t, ⊢Δ D⟦t⟧, ⊢ t) ∈ RC⟦τ⟧. From the definition of RC⟦τ⟧ and ⊢ t ⇓ v, it follows that there exists dv such that ⊢Δ D⟦t⟧ ⇓ dv and (v, dv, v) ∈ RV⟦τ⟧.

Remark C.3.3
Compared to prior work, these relations are unusual for two reasons. First, instead of just relating two executions of a term, we relate two executions of a term with an execution of a change term. Second, most such logical relations (including Ahmed [2006]'s, but not Acar et al. [2008]'s) define a logical relation (sometimes called logical equivalence) that characterizes contextual equivalence, while we don't.

Consider a logical equivalence defined through sets RC⟦τ⟧ and RV⟦τ⟧. If (t1, t2) ∈ RC⟦τ⟧ holds and t1 terminates (with result v1), then t2 must terminate as well (with result v2), and their results v1 and v2 must in turn be logically equivalent ((v1, v2) ∈ RV⟦τ⟧). And at base types like ℕ, (v1, v2) ∈ RV⟦ℕ⟧ means that v1 = v2.

Here, instead, the fundamental property relates two executions of a term on different inputs, which might take different paths during execution. In a suitably extended language, we could even write term t = λx → if x = 0 then 1 else loop and run it on inputs v1 = 0 and v2 = 1: these inputs are related by change dv = 1, but t will converge on v1 and diverge on v2. We must use a semantics that allows such behavioral differences. Hence, at base type ℕ, (v1, dv, v2) ∈ RV⟦ℕ⟧ means just that dv is a change from v1 to v2, hence that v1 ⊕ dv is equivalent to v2, because ⊕ agrees with extensional validity in this context as well. And if (ρ1 ⊢ t1, ρ dρ ⊢Δ dt, ρ2 ⊢ t2) ∈ RC⟦τ⟧, term t1 might converge while t2 diverges; only if both converge must their results be related.

These subtleties become more relevant in the presence of general recursion and non-terminating programs, as in untyped language λA, or in a hypothetical extension of λA→ with fixpoint operators.

C.4 Step-indexed extensional validity (λA→, λΔA→)

Step-indexed logical relations define approximations to a relation, to enable dealing with non-terminating programs. Logical relations relate the behavior of multiple terms during evaluation; with step-indexed logical relations, we can take a bound k and restrict attention to evaluations that take at most k steps overall, as made precise by the definitions. Crucially, entities related at step count k are also related at any step count j < k. Conversely, the higher the step count, the more precise the defined relation. In the limit, if entities are related at all step counts, we simply say they are related. This construction of limits resembles various other approaches to constructing relations by approximations, but the entire framework remains elementary. In particular, the relations are well-defined simply because they are well-founded (that is, they are only defined by induction on smaller numbers).


Proofs of logical relation statements need to deal with step-indexes, but when they work (as here), they are otherwise not much harder than other syntactic or logical relation proofs.

For instance, if we define equivalence as a step-indexed logical relation, we can say that two terms are equivalent for k or fewer steps, even if they might have different behavior with more steps available. In our case, we can say that a change appears valid at step count k if it behaves like a valid change in "observations" using at most k steps.

Like before, we define a relation on values and one on computations, as sets RV⟦τ⟧ and RC⟦τ⟧. Instead of indexing the sets with the step-count, we add the step counts to the tuples they contain; so, for instance, (k, v1, dv, v2) ∈ RV⟦τ⟧ means that value v1, change value dv and value v2 are related at step count k (or k-related), and similarly for RC⟦τ⟧.

The details of the relation definitions are subtle, but follow closely the use of step-indexing by Acar, Ahmed and Blume [2008]. We add mention of changes, and must decide how to use step-indexing for them.

How step-indexing proceeds. We explain gradually, in words, how the definition proceeds. First, we say that k-related function values take j-related arguments to j-related results, for all j less than k. That is reflected in the definition of RV⟦σ → τ⟧: it contains (k, vf1, dvf, vf2) if, for all (j, v1, dv, v2) ∈ RV⟦σ⟧ with j < k, the results of application are also j-related. However, the results of application are not syntactic applications encoding vf1 v1.⁴ It is instead necessary to use app vf1 v1, the result of one step of reduction. The two definitions are not equivalent, because a syntactic application would take one extra step to reduce.

The definition for computations takes longer to describe. Roughly speaking, computations are k-related if, after j steps of evaluation (with j < k), they produce values related at k − j steps; in particular, if the computations happen to be neutral forms and evaluate in zero steps, they're k-related as computations if the values they produce are k-related. In fact, the rule for evaluation has a wrinkle. Formally, instead of saying that computations (ρ1 ⊢ t1, ρ dρ ⊢Δ dt, ρ2 ⊢ t2) are k-related, we say that (k, ρ1 ⊢ t1, ρ dρ ⊢Δ dt, ρ2 ⊢ t2) ∈ RC⟦τ⟧. We do not require all three computations to take j steps. Instead, if the first computation ρ1 ⊢ t1 evaluates to a value in j < k steps, and the second computation ρ2 ⊢ t2 evaluates to a value in any number of steps, then the change computation ρ dρ ⊢Δ dt must also terminate, to a change value dv (in an unspecified number of steps), and the resulting values must be related at step-count k − j (that is, (k − j, v1, dv, v2) ∈ RV⟦τ⟧).

What is new in the above definition is the addition of changes, and the choice to allow change term dt to evaluate to dv in an unbounded number of steps (like t2), as no bound is necessary for our proofs. This is why the semantics we defined for change terms has no step counts.

Well-foundedness of step-indexed logical relations has a small wrinkle, because k − j need not be strictly less than k. But we define the two relations in a mutually recursive way, and the pair of relations at step-count k is defined in terms of the pair of relations at smaller step-counts. All other recursive uses of relations are at smaller step-indexes.

In this section, we index the relation by both types and step-indexes, since this is the version we use in our mechanized proof. This relation is defined by structural induction on types. We show this definition in Fig. C.5. Instead, in Appendix C.6, we consider untyped λ-calculus and drop types. The resulting definition is very similar, but is defined by well-founded recursion on step-indexes.

Again, since we use Church typing and only mechanize typed terms, we must include in all cases appropriate typing assumptions. This choice does not match Ahmed [2006], but is one alternative she describes as equivalent. Indeed, while adapting the proof, the extra typing assumptions and proof obligations were not a problem.

⁴That happens to be illegal syntax in this presentation, but it can be encoded, for instance, as (f = vf1, x = v1) ⊢ (f x); and the problem is more general.


RV⟦ℕ⟧ = {(k, n1, dn, n2) | n1, dn, n2 ∈ ℕ ∧ n1 + dn = n2}

RV⟦τa × τb⟧ = {(k, ⟨va1, vb1⟩, ⟨dva, dvb⟩, ⟨va2, vb2⟩) |
    (k, va1, dva, va2) ∈ RV⟦τa⟧ ∧ (k, vb1, dvb, vb2) ∈ RV⟦τb⟧}

RV⟦σ → τ⟧ = {(k, vf1, dvf, vf2) |
    ⊢ vf1 : σ → τ ∧ ⊢Δ dvf : σ → τ ∧ ⊢ vf2 : σ → τ ∧
    ∀(j, v1, dv, v2) ∈ RV⟦σ⟧. j < k ⇒
      (j, app vf1 v1, dapp dvf v1 dv, app vf2 v2) ∈ RC⟦τ⟧}

RC⟦τ⟧ = {(k, ρ1 ⊢ t1, ρ dρ ⊢Δ dt, ρ2 ⊢ t2) |
    (∃Γ1, Γ, Γ2. Γ1 ⊢ t1 : τ ∧ Γ ⊢Δ dt : τ ∧ Γ2 ⊢ t2 : τ) ∧
    ∀j, v1, v2. (j < k ∧ ρ1 ⊢ t1 ⇓j v1 ∧ ρ2 ⊢ t2 ⇓ v2) ⇒
      ∃dv. ρ dρ ⊢Δ dt ⇓ dv ∧ (k − j, v1, dv, v2) ∈ RV⟦τ⟧}

RG⟦ε⟧ = {(k, (), (), ()) | k ∈ ℕ}
RG⟦Γ, x : τ⟧ = {(k, (ρ1, x = v1), (dρ, dx = dv), (ρ2, x = v2)) |
    (k, ρ1, dρ, ρ2) ∈ RG⟦Γ⟧ ∧ (k, v1, dv, v2) ∈ RV⟦τ⟧}

Γ ⊨ dt ▷ t1 ↪ t2 : τ = ∀(k, ρ1, dρ, ρ2) ∈ RG⟦Γ⟧.
    (k, ρ1 ⊢ t1, ρ1 dρ ⊢Δ dt, ρ2 ⊢ t2) ∈ RC⟦τ⟧

Figure C.5: Defining extensional validity via step-indexed logical relations and big-step semantics.

At this moment, we do not require that related closures contain related environments: again, we are defining extensional validity.

Given these definitions, we can prove that all relations are downward-closed: that is, relations at step-count n imply relations at step-count k < n.

Lemma C.4.1 (Extensional validity is downward-closed)
Assume k ≤ n.

1. If (n, v1, dv, v2) ∈ RV⟦τ⟧, then (k, v1, dv, v2) ∈ RV⟦τ⟧.

2. If (n, ρ1 ⊢ t1, ρ dρ ⊢Δ dt, ρ2 ⊢ t2) ∈ RC⟦τ⟧, then
   (k, ρ1 ⊢ t1, ρ dρ ⊢Δ dt, ρ2 ⊢ t2) ∈ RC⟦τ⟧.

3. If (n, ρ1, dρ, ρ2) ∈ RG⟦Γ⟧, then (k, ρ1, dρ, ρ2) ∈ RG⟦Γ⟧.

Proof sketch. For RV⟦τ⟧, case split on τ and expand hypothesis and thesis. If τ = ℕ, they coincide. For RV⟦σ → τ⟧, parts of the hypothesis and thesis match. For some relation P, the rest of the hypothesis has shape ∀j < n. P(j, v1, dv, v2), and the rest of the thesis has shape ∀j < k. P(j, v1, dv, v2). Assume j < k. We must prove P(j, v1, dv, v2), but since j < k ≤ n, we can just apply the hypothesis.


The proof for RC⟦τ⟧ follows the same idea as RV⟦σ → τ⟧. For RG⟦Γ⟧, apply the theorem for RV⟦τ⟧ to each environment entry x : τ.

At this point, we prove the fundamental property:

Theorem C.4.2 (Fundamental property: correctness of D⟦–⟧)
For every well-typed term Γ ⊢ t : τ, we have that Γ ⊨ D⟦t⟧ ▷ t ↪ t : τ.

Proof sketch. By structural induction on typing derivations, using ideas similar to Theorem C.3.1 and relying on Lemma C.4.1 to reduce step counts where needed.

C.5 Step-indexed intensional validity

Up to now, we have defined when a function change is valid extensionally, that is, purely based on its behavior, as we have done earlier when using denotational semantics. We conjecture that, with these definitions, one can define ⊕ and prove it agrees with extensional validity. However, we have not done so.

Instead, we modify the definitions in Appendix C.4 to define validity intensionally. To ensure that f1 ⊕ df = f2 (for a suitable ⊕), we choose to require that closures f1, df and f2 close over environments of matching shapes. This change does not complicate the proof of the fundamental lemma: all the additional proof obligations are automatically satisfied.

However, it can still be necessary to replace a function value with a different one. Hence, we extend our definition of values to allow replacement values. Closure replacements produce replacements as results, so we make replacement values into valid changes for all types. We must also extend the change semantics, both to allow evaluating closure replacements and to allow derivatives of primitives to handle replacement values.

We have not added replacement values to the syntax, so currently they can just be added to change environments, but we don't expect adding them to the syntax would cause any significant trouble.

We present the changes described above for the typed semantics. We have successfully mechanized this variant of the semantics as well. Adding replacement values !v requires extending the definition of change values, evaluation and validity. We add replacement values to change values:

dv ::= ⋯ | !v

Derivatives of primitives, when applied to replacement changes, must recompute their output. The required additional equations are not interesting, but we show them anyway for completeness:

PΔ⟦succ⟧ n1 (!n2) = !(n2 + 1)
PΔ⟦add⟧ ⟨_, _⟩ (!⟨m2, n2⟩) = !(m2 + n2)
PΔ⟦add⟧ p1 (dp@⟨dm, !n2⟩) = !(P⟦add⟧ (p1 ⊕ dp))
PΔ⟦add⟧ p1 (dp@⟨!m, dv⟩) = !(P⟦add⟧ (p1 ⊕ dp))

Evaluation requires a new rule, E-BangApp, to evaluate change applications where the function change evaluates to a replacement change:

ρ dρ ⊢Δ dwf ⇓ !vf    ρ ⊢ wa ⇓ va    ρ dρ ⊢Δ dwa ⇓ dva    app vf (va ⊕ dva) ⇓ v
――――――――――――――――――――――――――――――――――――――――――――――――――   (E-BangApp)
ρ dρ ⊢Δ dwf wa dwa ⇓ !v

Evaluation rule E-BangApp requires defining ⊕ on syntactic values. We define it intensionally:


Definition C.5.1 (Update operator ⊕)
Operator ⊕ is defined on values by the following equations, where matchΓ ρ dρ (whose definition we omit) tests if ρ and dρ are environments for the same typing context Γ:

v1 ⊕ !v2 = v2
ρ[λx → t] ⊕ dρ[λx dx → dt] =
  if matchΓ ρ dρ then
    -- If ρ and dρ are environments for the same typing context Γ:
    (ρ ⊕ dρ)[λx → t]
  else
    -- Otherwise, the input change is invalid, so just give
    -- any type-correct result:
    ρ[λx → t]
n ⊕ dn = n + dn
⟨va1, vb1⟩ ⊕ ⟨dva, dvb⟩ = ⟨va1 ⊕ dva, vb1 ⊕ dvb⟩
-- An additional equation is needed in the untyped language,
-- not in the typed one. This equation is for invalid
-- changes, so we can just return v1:
v1 ⊕ dv = v1

We define ⊕ on environments for matching contexts to combine values and changes pointwise:

(x1 = v1, …, xn = vn) ⊕ (dx1 = dv1, …, dxn = dvn) =
  (x1 = v1 ⊕ dv1, …, xn = vn ⊕ dvn)

The definition of update for closures can only update them in a few cases, but this is not a problem: as we show in a moment, we restrict validity to the closure changes for which it is correct.

We ensure replacement values are accepted as valid for all types by requiring that the following equation holds (hence modifying all equations for RV⟦–⟧; we omit details):

RV⟦τ⟧ ⊇ {(k, v1, !v2, v2) | ⊢ v1 : τ ∧ ⊢ v2 : τ}        (C.1)

where we write ⊢ v : τ to state that value v has type τ; we omit the unsurprising rules for this judgement.

To restrict valid closure changes, RV⟦σ → τ⟧ now requires that related elements satisfy predicate matchImpl vf1 dvf vf2, defined by the following equations:

matchImpl (ρ1[λx → t])
          (ρ1 dρ[λx dx → D⟦t⟧])
          ((ρ1 ⊕ dρ)[λx → t]) = True
matchImpl _ _ _ = False

In other words, validity (k, vf1, dvf, vf2) ∈ RV⟦σ → τ⟧ now requires, via matchImpl, that the base closure environment ρ1 and the base environment of the closure change dvf coincide, that ρ2 = ρ1 ⊕ dρ, and that vf1 and vf2 have λx → t as body, while dvf has body λx dx → D⟦t⟧.

To define intensional validity for function changes, RV⟦σ → τ⟧ must use matchImpl and explicitly support replacement closures, to satisfy Eq. (C.1). Its definition is as follows:

RV⟦σ → τ⟧ = {(k, vf1, dvf, vf2) |
    ⊢ vf1 : σ → τ ∧ ⊢Δ dvf : σ → τ ∧ ⊢ vf2 : σ → τ ∧
    matchImpl vf1 dvf vf2 ∧
    ∀(j, v1, dv, v2) ∈ RV⟦σ⟧. j < k ⇒
      (j, app vf1 v1, dapp dvf v1 dv, app vf2 v2) ∈ RC⟦τ⟧}
  ∪ {(k, vf1, !vf2, vf2) | ⊢ vf1 : σ → τ ∧ ⊢ vf2 : σ → τ}

Definitions of RV⟦–⟧ for other types can be similarly updated to support replacement changes and satisfy Eq. (C.1).

Using these updated definitions, we can again prove the fundamental property, with the same statement as Theorem C.4.2. Furthermore, we now prove that ⊕ agrees with validity.

Theorem C.5.2 (Fundamental property: correctness of D⟦–⟧)
For every well-typed term Γ ⊢ t : τ, we have that Γ ⊨ D⟦t⟧ ▷ t ↪ t : τ.

Theorem C.5.3 (⊕ agrees with step-indexed intensional validity)
If (k, v1, dv, v2) ∈ RV⟦τ⟧, then v1 ⊕ dv = v2.

Proof. By induction on types. For type ℕ, validity coincides with the thesis. For pair types τa × τb, we must apply the induction hypothesis on both pair components.

For closures, validity requires that v1 = ρ1[λx → t], dv = dρ[λx dx → D⟦t⟧], v2 = ρ2[λx → t], with ρ1 ⊕ dρ = ρ2, and that there exists Γ such that Γ, x : σ ⊢ t : τ. Moreover, from validity we can show that ρ and dρ have matching shapes: ρ is an environment matching Γ and dρ is a change environment matching ΔΓ. Hence, v1 ⊕ dv can update the stored environment, and we can show the thesis by the following calculation:

v1 ⊕ dv = ρ1[λx → t] ⊕ dρ[λx dx → dt] =
  (ρ1 ⊕ dρ)[λx → t] = ρ2[λx → t] = v2

We can also define 0 intensionally, as a metafunction on values and environments, and prove it correct. For closures, we differentiate the body and recurse on the environment. The definition extends from values to environments variable-wise, so we omit the standard formal definition.

Definition C.5.4 (Nil changes 0)
Nil changes on values are defined as follows:

0ρ[λx→t] = ρ 0ρ [λx dx → D⟦t⟧]
0⟨a,b⟩ = ⟨0a, 0b⟩
0n = 0

Lemma C.5.5 (0 produces valid changes)
For all values ⊢ v : τ and indexes k, we have (k, v, 0v, v) ∈ RV⟦τ⟧.

Proof sketch. By induction on v. For closures, we must apply the fundamental property (Theorem C.5.2) to D⟦t⟧.


Because 0 transforms closure bodies, we cannot define it internally to the language. This problem can be avoided by defunctionalizing functions and function changes, as we do in Appendix D.

We conclude with the overall correctness theorem, analogous to Corollary 13.4.5.

Corollary C.5.6 (D⟦–⟧ is correct, corollary)
Take any term t that is well-typed (Γ ⊢ t : τ) and any suitable environments ρ1, dρ, ρ2, intensionally valid at any step count (∀k. (k, ρ1, dρ, ρ2) ∈ RG⟦Γ⟧). Assume t terminates in both the old environment ρ1 and the new environment ρ2, evaluating to output values v1 and v2 (ρ1 ⊢ t ⇓ v1 and ρ2 ⊢ t ⇓ v2). Then D⟦t⟧ evaluates in environment ρ1 and change environment dρ to a change value dv (ρ1 dρ ⊢Δ D⟦t⟧ ⇓ dv), and dv is a valid change from v1 to v2, so that v1 ⊕ dv = v2.

Proof. Follows immediately from Theorem C.5.2 and Theorem C.5.3.

C.6 Untyped step-indexed validity (λA, λΔA)

By removing mentions of types from step-indexed validity (intensional or extensional, though we show extensional definitions), we can adapt it to an untyped language. We can still distinguish between functions, numbers and pairs by matching on values themselves, instead of matching on types. Without types, typing contexts Γ now degenerate to lists of free variables of a term; we still use them to ensure that environments contain enough valid entries to evaluate a term. Validity applies to terminating executions, hence we need not consider executions producing dynamic type errors when proving the fundamental property.

We show the resulting definitions for extensional validity in Fig. C.6, but we can also define intensional validity and prove the fundamental lemma for it. As mentioned earlier, for λA we must turn P⟦–⟧ – and PΔ⟦–⟧ – into relations and update E-Prim accordingly.

The main difference in the proof is that this time the recursion used in the relations can only be proved to be well-founded because of the use of step-indexes; we omit details [Ahmed 2006].

Otherwise, the proof proceeds just as earlier in Appendix C.4. We prove that the relations are downward-closed, just like in Lemma C.4.1 (we omit the new statement), and we prove the new fundamental lemma by induction on the structure of terms (not of typing derivations).

Theorem C.6.1 (Fundamental property: correctness of D⟦–⟧)
If FV(t) ⊆ Γ, then we have that Γ ⊨ D⟦t⟧ ▷ t ↪ t.

Proof sketch. Similar to the proof of Theorem C.4.2, but by structural induction on terms and complete induction on step counts, not on typing derivations.

However, we can use the induction hypothesis in the same way as in earlier proofs for typed languages: all uses of the induction hypothesis in the proof are on smaller terms, and some also at smaller step counts.

C.7 General recursion in λA→ and λΔA→

We have sketched informally in Sec. 15.1 how to support fixpoint combinators.

We have also extended our typed languages with a fixpoint combinator and proved it correct formally (though not mechanically yet). In this section, we include the needed definitions to make our claim precise. They are mostly unsurprising, if long to state.

Since we are in a call-by-value setting, we only add recursive functions, not recursive values in general. To this end, we replace λ-abstraction λx → t with recursive function rec f x → t, which binds both f and x in t, and replaces f with the function itself upon evaluation.


RV = {(k, n1, dn, n2) | n1, dn, n2 ∈ ℕ ∧ n1 + dn = n2}
  ∪ {(k, vf1, dvf, vf2) |
      ∀(j, v1, dv, v2) ∈ RV. j < k ⇒
        (j, app vf1 v1, dapp dvf v1 dv, app vf2 v2) ∈ RC}
  ∪ {(k, ⟨va1, vb1⟩, ⟨dva, dvb⟩, ⟨va2, vb2⟩) |
      (k, va1, dva, va2) ∈ RV ∧ (k, vb1, dvb, vb2) ∈ RV}

RC = {(k, ρ1 ⊢ t1, ρ dρ ⊢Δ dt, ρ2 ⊢ t2) |
    ∀j, v1, v2. (j < k ∧ ρ1 ⊢ t1 ⇓j v1 ∧ ρ2 ⊢ t2 ⇓ v2) ⇒
      ∃dv. ρ dρ ⊢Δ dt ⇓ dv ∧ (k − j, v1, dv, v2) ∈ RV}

RG⟦ε⟧ = {(k, (), (), ()) | k ∈ ℕ}
RG⟦Γ, x⟧ = {(k, (ρ1, x = v1), (dρ, dx = dv), (ρ2, x = v2)) |
    (k, ρ1, dρ, ρ2) ∈ RG⟦Γ⟧ ∧ (k, v1, dv, v2) ∈ RV}

Γ ⊨ dt ▷ t1 ↪ t2 = ∀(k, ρ1, dρ, ρ2) ∈ RG⟦Γ⟧.
    (k, ρ1 ⊢ t1, ρ1 dρ ⊢Δ dt, ρ2 ⊢ t2) ∈ RC

Figure C.6: Defining extensional validity via untyped step-indexed logical relations and big-step semantics.

The associated small-step reduction rule would be (rec f x → t) v → t[x = v, f = rec f x → t]. As before, we formalize reduction using big-step semantics.

Typing rule T-Lam is replaced by a rule for recursive functions:

Γ, f : σ → τ, x : σ ⊢ t : τ
――――――――――――――――――――――   (T-Rec)
Γ ⊢ rec f x → t : σ → τ

We replace closures with recursive closures in the definition of values:

w ::= rec f x → t | ⋯
v ::= ρ[rec f x → t] | ⋯

We also modify the semantics for abstraction and application. Rules E-Val and E-App are unchanged; it is sufficient to adapt the definitions of V⟦–⟧ – and app, so that evaluation of a function value vf has access to vf in the environment:

V⟦rec f x → t⟧ ρ = ρ[rec f x → t]
app (vf @(ρ′[rec f x → t])) va =
  (ρ′, f = vf, x = va) ⊢ t

Like in Haskell, we write x@p in equations to bind an argument as metavariable x and match it against pattern p.

Similarly, we modify the language of changes, the definition of differentiation, and the evaluation metafunctions VΔ⟦–⟧ – and dapp. Since the derivative of a recursive function f = rec f x → t can call the base function, we remember the original function body t in the derivative, together with its derivative D⟦t⟧. This should not be surprising: in Sec. 15.1, where recursive functions are defined using letrec, a recursive function f is in scope in the body of its derivative df. Here we use a different syntax, but still ensure that f is in scope in the body of derivative df. The definitions are otherwise unsurprising, if long:

dw ::= drec f df x dx → t dt | ⋯
dv ::= ρ dρ [drec f df x dx → t dt] | ⋯

D⟦rec f x → t⟧ = drec f df x dx → t D⟦t⟧

VΔ⟦drec f df x dx → t dt⟧ ρ dρ =
  ρ dρ [drec f df x dx → t dt]

dapp (dvf @(ρ′ dρ′ [drec f df x dx → t dt])) va dva =
  let vf = ρ′[rec f x → t]
  in (ρ′, f = vf, x = va) (dρ′, df = dvf, dx = dva) ⊢Δ dt

Γ, f : σ → τ, x : σ ⊢ t : τ    Γ, f : σ → τ, x : σ ⊢Δ dt : τ
――――――――――――――――――――――――――――――――――――   (T-DRec)
Γ ⊢Δ drec f df x dx → t dt : σ → τ

We can adapt the proof of the fundamental property to the use of recursive functions.

Theorem C.7.1 (Fundamental property: correctness of D⟦–⟧)
For every well-typed term Γ ⊢ t : τ, we have that Γ ⊨ D⟦t⟧ ▷ t ↪ t : τ.

Proof sketch. Mostly as before, modulo one interesting difference: to prove the fundamental property for recursive functions at step-count k, this time we must use the fundamental property inductively on the same term, but at step-count j < k. This happens because, to evaluate dw = D⟦rec f x → t⟧, we evaluate D⟦t⟧ with the value dv for dw in the environment; to show this invocation is valid, we must show dw is itself a valid change. But the step-indexed definition of RV⟦σ → τ⟧ constrains the evaluation of the body only for all j < k.

C.8 Future work

We have shown that ⊕ and 0 agree with validity, which we consider a key requirement of a core ILC proof. However, change structures support further operators. We leave operator ⊖ for future work, though we are not aware of particular difficulties. However, change composition ⊚ deserves special attention.

C.8.1 Change composition

We have looked into change composition, and it appears that composition of change expressions is not always valid, but we conjecture that composition of change values preserves validity. Showing that change composition is valid appears related to showing that Ahmed's logical equivalence is a transitive relation, which is a subtle issue: she only proves transitivity in a typed setting and with a stronger relation, and her proof does not carry over directly; indeed, there is no corresponding proof in the untyped setting of Acar, Ahmed and Blume [2008].

However, the failure of transitivity we have verified is not overly worrisome: the problem is that transitivity is too strong an expectation in general. Assume that Γ ⊨ de1 ▷ e1 ↪ e2 and Γ ⊨ de2 ▷ e2 ↪ e3, and try to show that Γ ⊨ de1 ⊚ de2 ▷ e1 ↪ e3: that is, very roughly and ignoring the environments, we can assume that e1 and e3 terminate, and we have to show that their results satisfy some properties. To use both our hypotheses, we need to know that e1, e2 and e3 all terminate, but we have no such guarantee for e2. Indeed, if e2 always diverges (because it is, say, the diverging term ω = (λx → x x) (λx → x x)), then de1 and de2 are vacuously valid. If e1 and e3 terminate, we can't expect de1 ⊚ de2 to be a change between them. To wit, take e1 = 0, e2 = ω, e3 = 10, and de1 = de2 = 0. We can verify that for any Γ we have Γ ⊨ de1 ▷ e1 ↪ e2 and Γ ⊨ de2 ▷ e2 ↪ e3, while Γ ⊨ de1 ⊚ de2 ▷ e1 ↪ e3 means the absurd Γ ⊨ 0 ⊚ 0 ▷ 0 ↪ 10.

A possible fix. Does transitivity hold if e2 terminates? If (k, e1, de1, e2) ∈ RC⟦τ⟧ and (k, e2, de2, e3) ∈ RC⟦τ⟧, we still cannot conclude anything. But, like in Ahmed [2006], if e2 and e3 are related at all step counts, that is, if (k, e1, de1, e2) ∈ RC⟦τ⟧ and (∀n. (n, e2, de2, e3) ∈ RC⟦τ⟧), and if additionally e2 terminates, we conjecture that Ahmed's proof goes through. We have, however, not yet examined all details.

C.9 Development history

The proof strategy used in this chapter comes from a collaboration between me and Yann Régis-Gianas, who came up with the general strategy and the first partial proofs for untyped λ-calculi. After we both struggled for a while to set up step-indexing correctly enough for a full proof, I first managed to give the definitions in this chapter and complete the proofs here described. Régis-Gianas then mechanized a variant of our proof for untyped λ-calculus in Coq [Giarrusso et al., Submitted], which appears here in Sec. 17.3. That proof makes a few different choices, and unfortunately, strictly speaking, neither proof subsumes the other. (1) We also give a non-step-indexed syntactic proof for simply-typed λ-calculus, together with proofs defining validity extensionally. (2) To support remembering intermediate results by conversion to cache-transfer-style (CTS), Régis-Gianas' proof uses a lambda-lifted A'NF syntax instead of plain ANF. (3) Régis-Gianas' formalization adds to change values a single token 0, which is a valid nil change for all valid values. Hence, if we know a change is nil, we can erase it. As a downside, evaluating df a da when df is 0 requires looking up f. In our presentation, instead, if f and g are different function values, they have different nil changes. Such nil changes carry information, so they can be evaluated directly, but they cannot be erased. Techniques in Appendix D enable erasing and reconstructing a nil change df for function value f, as long as the value of f is available.

C.10 Conclusion

In this chapter, we have shown how to construct novel models for ILC by using (step-indexed) logical relations, and we have used this technique to deliver a new syntactic proof of correctness for ILC for simply-typed lambda-calculus, and to deliver the first proof of correctness of ILC for untyped λ-calculus. Moreover, our proof appears rather close to existing logical-relations proofs; hence, we believe it should be possible to translate other results to ILC theorems.

By formally defining intensional validity for closures, we provide a solid foundation for the use of defunctionalized function changes (Appendix D).

This proof builds on Ahmed [2006]'s work on step-indexed logical relations, which enables handling powerful semantic features using rather elementary techniques. The only downside is that it can be tricky to set up the correct definitions, especially for a slightly non-standard semantics like ours. As an exercise, we have shown that our semantics is equivalent to more conventional presentations, down to the produced step counts.


Appendix D

Defunctionalizing function changes

In Chapter 12 and throughout most of Part II, we represent function changes as functions, which can only be applied. However, incremental programs often inspect changes to decide how to react to them most efficiently. Also inspecting function changes would help performance further. Representing function changes as closures, as we do in Chapter 17 and Appendix C, allows implementing some operations more efficiently, but is not fully satisfactory. In this chapter, we address these restrictions by defunctionalizing functions and function changes, so that we can inspect both at runtime without restrictions.

Once we defunctionalize function changes, we can detect at runtime whether a function change is nil. As we have mentioned in Sec. 16.3, nil function changes can typically be handled more efficiently. For instance, consider t = map f xs, a term that maps function f to each element of sequence xs. In general, D⟦t⟧ = dmap f df xs dxs must handle any change in dxs (which we assume to be small) but also apply function change df to each element of xs (which we assume to be big). However, if df is nil, we can skip this step, decreasing the time complexity of dmap f df xs dxs from O(|xs| + |dxs|) to O(|dxs|).

We will also present a change structure on defunctionalized function changes, and show that operations on defunctionalized function changes become cheaper.

D.1 Setup

We write incremental programs based on ILC by manually writing Haskell code, containing both manually-written plugin code and code that is transformed systematically, based on informal generalizations and variants of D⟦–⟧. Our main goal is to study variants of differentiation and of encodings in Haskell, while also studying language plugins for non-trivial primitives, such as different sorts of collections. We make liberal use of GHC extensions where needed.

Code in this chapter has been extracted and type-checked, though we elide a few details (mostly language extensions and imports from the standard library). Code in this appendix is otherwise self-contained. We have also tested a copy of this code more extensively.

As sketched before, we define change structures inside Haskell:

class ChangeStruct t where
  -- Next line declares Δt as an injective type function:
  type Δt = r | r → t
  (⊕) :: t → Δt → t
  oreplace :: t → Δt

class ChangeStruct t ⇒ NilChangeStruct t where
  0 :: t → Δt

class ChangeStruct a ⇒ CompChangeStruct a where
  -- Compose change dx1 with dx2, so that
  -- x ⊕ (dx1 ⊚ dx2) ≡ x ⊕ dx1 ⊕ dx2:
  (⊚) :: Δa → Δa → Δa

With this code, we define type classes ChangeStruct, NilChangeStruct and CompChangeStruct. We explain each of the declared members in turn.

First, type Δt represents the change type for type t. To improve Haskell type inference, we declare that Δ is injective, so that Δt1 = Δt2 implies t1 = t2. This forbids some potentially useful change structures, but in exchange it makes type inference vastly more usable.

Next, we declare ⊕, 0 and ⊚ as available to object programs. Last, we introduce oreplace to construct replacement changes, characterized by the absorption law x ⊕ oreplace y = y for all x. Function oreplace encodes !t, that is, the bang operator !. We use a different notation because ! is reserved for other purposes in Haskell.

These typeclasses omit operation ⊖ intentionally: we do not require that change structures support a proper difference operation. Nevertheless, as discussed, b ⊖ a can be expressed through oreplace b.
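For concreteness, here is a minimal sketch of such a derived difference operator; the name ominus is ours and hypothetical, not part of the interface above:

ominus :: ChangeStruct t ⇒ t → t → Δt
ominus b a = oreplace b
-- Correct by the absorption law: a ⊕ (b `ominus` a) = a ⊕ oreplace b = b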

We can then differentiate Haskell functions, even polymorphic ones. We show a few trivial examples to highlight how derivatives are encoded in Haskell, especially polymorphic ones.

-- The standard id function:
id :: a → a
id x = x
-- and its derivative:
did :: a → Δa → Δa
did x dx = dx

instance (NilChangeStruct a, ChangeStruct b) ⇒
    ChangeStruct (a → b) where
  type Δ(a → b) = a → Δa → Δb
  f ⊕ df = λx → f x ⊕ df x 0x
  oreplace f = λx dx → oreplace (f (x ⊕ dx))
instance (NilChangeStruct a, ChangeStruct b) ⇒
    NilChangeStruct (a → b) where
  0f = oreplace f

-- Same for apply:
apply :: (a → b) → a → b
apply f x = f x
dapply :: (a → b) → Δ(a → b) → a → Δa → Δb
dapply f df x dx = df x dx
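As one more small illustration, here is a sketch of the derivative of a curried two-argument function; it assumes a hypothetical change structure on ℤ with Δℤ = ℤ and x ⊕ dx = x + dx (such an instance is not shown in this appendix):

add :: ℤ → ℤ → ℤ
add x y = x + y
-- Its derivative combines both input changes:
dadd :: ℤ → Δℤ → ℤ → Δℤ → Δℤ
dadd x dx y dy = dx + dy
-- Valid, since (x + y) ⊕ dadd x dx y dy = (x ⊕ dx) + (y ⊕ dy)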

Which polymorphism? As visible here, polymorphism does not cause particular problems. However, we only support ML (or prenex) polymorphism, not first-class polymorphism, for two reasons.


First, with first-class polymorphism, we can encode existential types ∃X. T, and two values v1, v2 of the same existential type ∃X. T can hide different types T1, T2. Hence, a change between v1 and v2 requires handling changes between types. While we discuss such topics in Chapter 18, we avoid them here.

Second, prenex polymorphism is a small extension of simply-typed lambda calculus metatheoretically. We can treat prenex-polymorphic definitions as families of monomorphic (hence simply-typed) definitions; to each definition, we can apply all the ILC theory we developed to show differentiation is correct.

D.2 Defunctionalization

Defunctionalization is a whole-program transformation that turns a program relying on first-class functions into a first-order program. The resulting program is expressed in a first-order language (often a subset of the original language): closures are encoded by data values, which embed both the closure environment and a tag to distinguish different functions. Defunctionalization also generates a function that interprets encoded closures, which we call applyFun.

In a typed language, defunctionalization can be done using generalized algebraic datatypes (GADTs) [Pottier and Gauthier, 2004]. Each first-class function of type σ → τ is replaced by a value of a new GADT Fun σ τ, which represents defunctionalized function values and has a constructor for each different function. If a first-class function t1 closes over x : τ1, the corresponding constructor C1 will take x : τ1 as an argument. The interpreter for defunctionalized function values has type signature applyFun :: Fun σ τ → σ → τ. The resulting programs are expressed in a first-order subset of the original programming language. In defunctionalized programs, all remaining functions are first-order top-level functions.

For instance, consider the program on sequences in Fig. D.1.

successors :: [ℤ] → [ℤ]
successors xs = map (λx → x + 1) xs

nestedLoop :: [σ] → [τ] → [(σ, τ)]
nestedLoop xs ys = concatMap (λx → map (λy → (x, y)) ys) xs

map :: (σ → τ) → [σ] → [τ]
map f [] = []
map f (x : xs) = f x : map f xs

concatMap :: (σ → [τ]) → [σ] → [τ]
concatMap f xs = concat (map f xs)

Figure D.1: A small example program for defunctionalization.

In this program, the first-class function values arise from evaluating the three terms λy → (x, y), which we call pair, λx → map (λy → (x, y)) ys, which we call mapPair, and λx → x + 1, which we call addOne. Defunctionalization creates a type Fun σ τ with a constructor for each of the three terms, respectively Pair, MapPair and AddOne. Both pair and mapPair close over some free variables, so their corresponding constructors will take an argument for each free variable; for pair we have

x : σ ⊢ λy → (x, y) : τ → (σ, τ)

while for mapPair we have

ys : [τ] ⊢ λx → map (λy → (x, y)) ys : σ → [(σ, τ)]


Hence, the type of defunctionalized functions Fun σ τ and its interpreter applyFun become:

data Fun σ τ where
  AddOne :: Fun ℤ ℤ
  Pair :: σ → Fun τ (σ, τ)
  MapPair :: [τ] → Fun σ [(σ, τ)]

applyFun :: Fun σ τ → σ → τ
applyFun AddOne x = x + 1
applyFun (Pair x) y = (x, y)
applyFun (MapPair ys) x = mapDF (Pair x) ys

We also need to transform the rest of the program accordingly.

successors :: [ℤ] → [ℤ]
successors xs = mapDF AddOne xs

nestedLoopDF :: [σ] → [τ] → [(σ, τ)]
nestedLoopDF xs ys = concatMapDF (MapPair ys) xs

mapDF :: Fun σ τ → [σ] → [τ]
mapDF f [] = []
mapDF f (x : xs) = applyFun f x : mapDF f xs

concatMapDF :: Fun σ [τ] → [σ] → [τ]
concatMapDF f xs = concat (mapDF f xs)

Figure D.2: Defunctionalized program.
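For instance (a small usage illustration; the name test is ours):

test :: [(ℤ, Char)]
test = nestedLoopDF [1, 2] "ab"
-- evaluates to [(1, 'a'), (1, 'b'), (2, 'a'), (2, 'b')];
-- MapPair "ab" plays the role of the inner loop λx → map (λy → (x, y)) ys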

Some readers might notice this program still uses first-class functions, because it encodes multi-argument functions through currying. To get a fully first-order program, we encode multi-argument functions using tuples instead of currying.¹ Using tuples, our example becomes:

applyFun :: (Fun σ τ, σ) → τ
applyFun (AddOne, x) = x + 1
applyFun (Pair x, y) = (x, y)
applyFun (MapPair ys, x) = mapDF (Pair x, ys)

mapDF :: (Fun σ τ, [σ]) → [τ]
mapDF (f, []) = []
mapDF (f, x : xs) = applyFun (f, x) : mapDF (f, xs)

concatMapDF :: (Fun σ [τ], [σ]) → [τ]
concatMapDF (f, xs) = concat (mapDF (f, xs))

nestedLoopDF :: ([σ], [τ]) → [(σ, τ)]
nestedLoopDF (xs, ys) = concatMapDF (MapPair ys, xs)

However, we'll often write such defunctionalized programs using Haskell's typical curried syntax, as in Fig. D.2. Such programs must not contain partial applications.

¹Strictly speaking, the resulting program is still not first-order, because in Haskell multi-argument data constructors, such as the pair constructor (,) that we use, are still first-class curried functions, unlike, for instance, in OCaml. To make this program truly first-order, we must formalize tuple constructors as a term constructor, or formalize these function definitions as multi-argument functions. At this point, this discussion is merely a technicality that does not affect our goals, but it becomes relevant if we formalize the resulting first-order language, as in Sec. 17.3.


In general, defunctionalization creates a constructor C of type Fun σ τ for each first-class function expression Γ ⊢ t : σ → τ in the source program.² While lambda expression l closes implicitly over environment ρ : ⟦Γ⟧, C closes over it explicitly: the values bound to free variables in environment ρ are passed as arguments to constructor C. As a standard optimization, we only include variables that actually occur free in l, not all those that are bound in the context where l occurs.

D.2.1 Defunctionalization with separate function codes

Next, we show a slight variant of defunctionalization, which we use to achieve our goals with less code duplication, even at the expense of some efficiency; we call this variant defunctionalization with separate function codes.

We first encode contexts as types and environments as values. Empty environments are encoded as empty tuples. Environments for a context such as x : τ1, y : τ2, … are encoded as values of type τ1 × τ2 × ….
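For instance (a small illustration of this encoding; the names are hypothetical):

-- The context x : ℤ, y : [ℤ] is encoded as a product type,
-- and each of its environments as a value of that type:
type EnvXY = (ℤ, [ℤ])
envXY :: EnvXY
envXY = (1, [2, 3]) -- encodes the environment (x = 1, y = [2, 3])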

In this defunctionalization variant, instead of defining directly a GADT of defunctionalized functions Fun σ τ, we define a GADT of function codes Code env σ τ, whose values contain no environment. Type Code is indexed not just by σ and τ, but also by the type of environments, and has a constructor for each first-class function expression in the source program, like Fun σ τ does in conventional defunctionalization. We then define Fun σ τ as a pair of a function code of type Code env σ τ and an environment of type env.

As a downside, separating function codes adds a few indirections to the memory representation of closures: for instance, we use (AddOne, ()) instead of AddOne, and (Pair, 1) instead of Pair 1.

As an upside, with separate function codes we can define many operations generically across all function codes (see Appendix D.2.2), instead of generating definitions matching on each function. What's more, we later define operations that use raw function codes and need no environment; we could alternatively define function codes without using them in the representation of function values, at the expense of even more code duplication. Code duplication is especially relevant because we currently perform defunctionalization by hand, though we are confident it would be conceptually straightforward to automate the process.

Defunctionalizing the program with separate function codes produces the following GADT of function codes:

type Env env = (CompChangeStruct env, NilTestable env)
data Code env σ τ where
  AddOne :: Code () ℤ ℤ
  Pair :: Code σ τ (σ, τ)
  MapPair :: Env σ ⇒ Code [τ] σ [(σ, τ)]

In this definition, type Env names a set of typeclass constraints on the type of the environment, using the ConstraintKinds GHC language extension. Satisfying these constraints is necessary to implement a few operations on functions. We also require an interpretation function applyCode for function codes. If c is the code for a function f = λx → t, calling applyCode c computes f's output from an environment env for f and an argument for x.

applyCode :: Code env σ τ → env → σ → τ
applyCode AddOne () x = x + 1
applyCode Pair x y = (x, y)
applyCode MapPair ys x = mapDF (F (Pair, x)) ys

²We only need codes for functions that are used as first-class arguments, not for other functions, though codes for the latter can be added as well.

The implementation of applyCode MapPair only works because of the Env σ constraint for constructor MapPair; this constraint is required when constructing the defunctionalized function value that we pass as argument to mapDF.

We represent defunctionalized function values through type RawFun env σ τ, a type synonym for the product of Code env σ τ and env. We encode type σ → τ through type Fun σ τ, defined as RawFun env σ τ where env is existentially bound. We add constraint Env env to the definition of Fun σ τ, because implementing ⊕ on function changes will require using ⊕ on environments.

type RawFun env σ τ = (Code env σ τ, env)
data Fun σ τ where
  F :: Env env ⇒ RawFun env σ τ → Fun σ τ

To interpret defunctionalized function values, we wrap applyCode in a new version of applyFun, having the same interface as the earlier applyFun:

applyFun :: Fun σ τ → σ → τ
applyFun (F (code, env)) arg = applyCode code env arg

The rest of the source program is defunctionalized like before, using the new definitions of Fun σ τ and of applyFun.

D.2.2 Defunctionalizing function changes

Defunctionalization encodes function values as pairs of function codes and environments. In ILC, a function value can change because the environment changes or because the whole closure is replaced by a different one, with a different function code and different environment. For now, we focus on environment changes for simplicity. To allow inspecting function changes, we defunctionalize them as well, but treat them specially.

Assume we want to defunctionalize a function change df with type Δ(σ → τ) = σ → Δσ → Δτ, valid for function f :: σ → τ. Instead of transforming type Δ(σ → τ) into Fun σ (Fun Δσ Δτ), we transform Δ(σ → τ) into a new type DFun σ τ, the change type of Fun σ τ (Δ(Fun σ τ) = DFun σ τ). To apply DFun σ τ, we introduce an interpreter dapplyFun :: Fun σ τ → Δ(Fun σ τ) → Δ(σ → τ), or equivalently dapplyFun :: Fun σ τ → DFun σ τ → σ → Δσ → Δτ, which also serves as the derivative of applyFun.

Like we did for Fun σ τ, we define DFun σ τ using function codes. That is, DFun σ τ pairs a function code Code env σ τ together with an environment change and a change structure for the environment type:

data DFun σ τ = ∀env. ChangeStruct env ⇒ DF (Δenv, Code env σ τ)

Without separate function codes, the definition of DFun would have to include one case for each first-class function.

Environment changes

Instead of defining change structures for environments, we encode environments using tuples and define change structures for tuples.

We first define change structures for empty tuples and pairs (Sec. 12.3.3):


instance ChangeStruct () where
  type Δ() = ()
  _ ⊕ _ = ()
  oreplace _ = ()

instance (ChangeStruct a, ChangeStruct b) ⇒ ChangeStruct (a, b) where
  type Δ(a, b) = (Δa, Δb)
  (a, b) ⊕ (da, db) = (a ⊕ da, b ⊕ db)
  oreplace (a2, b2) = (oreplace a2, oreplace b2)

To define change structures for n-uples of other arities, we have two choices, which we show on triples (a, b, c); they can be easily generalized.

We can encode triples as nested pairs (a, (b, (c, ()))). Or we can define change structures for triples directly:

instance (ChangeStruct a, ChangeStruct b, ChangeStruct c) ⇒
    ChangeStruct (a, b, c) where
  type Δ(a, b, c) = (Δa, Δb, Δc)
  (a, b, c) ⊕ (da, db, dc) = (a ⊕ da, b ⊕ db, c ⊕ dc)
  oreplace (a2, b2, c2) = (oreplace a2, oreplace b2, oreplace c2)

Generalizing from pairs and triples, one can define similar instances for n-uples in general (say, for values of n up to some high threshold).

Validity and ⊕ on defunctionalized function changes

A function change df is valid for f if df has the same function code as f, and if df's environment change is valid for f's environment:

DF dρ c ▷ F ρ1 c ↪ F ρ2 c : Fun σ τ  =  dρ ▷ ρ1 ↪ ρ2 : env

where c is a function code of type Code env σ τ, and where c's type binds the type variable env we use on the right-hand side.

Next, we implement ⊕ on function changes to match our definition of validity, as required. We only need f ⊕ df to give a result if df is a valid change for f. Hence, if the function code embedded in df does not match the one in f, we give an error.³ However, our first attempt does not typecheck, since the typechecker does not know whether the environment and the environment change have compatible types.

instance ChangeStruct (Fun σ τ) where
  type Δ(Fun σ τ) = DFun σ τ
  F (env, c1) ⊕ DF (denv, c2) =
    if c1 ≡ c2 then
      F (env ⊕ denv, c1)
    else
      error "Invalid function change in ⊕"

In particular, env ⊕ denv is reported as ill-typed, because we don't know that env and denv have compatible types. Take c1 = Pair, c2 = MapPair, f = F (env, Pair) : σ → τ and df = DF (denv, MapPair) : Δ(σ → τ). Assume we evaluate f ⊕ df = F (env, Pair) ⊕ DF (denv, MapPair): there, indeed, env :: σ while denv changes an environment of type [τ], so env ⊕ denv is not type-correct. Yet, evaluating f ⊕ df would not fail: because MapPair and Pair are different, c1 ≡ c2 will return false and env ⊕ denv won't be evaluated. But the typechecker does not know that.

³We originally specified ⊕ as a total function to avoid formalizing partial functions, but as mentioned in Sec. 10.3, we do not rely on the behavior of ⊕ on invalid changes.

Hence, we need an equality operation that produces a witness of type equality. We define the needed infrastructure with few lines of code. First, we need a GADT of witnesses of type equality; we can borrow its definition from GHC's standard library, which is just:

-- From Data.Type.Equality:
data τ1 ∼ τ2 where
  Refl :: τ ∼ τ

If x has type τ1 ∼ τ2 and matches pattern Refl, then by standard GADT typing rules τ1 and τ2 are equal. Even if τ1 ∼ τ2 has only constructor Refl, a match is necessary, since x might be bottom. Readers familiar with type theory, Agda or Coq will recognize that ∼ resembles Agda's propositional equality or Martin-Löf's identity types, even though it can only represent equality between types and not between values.
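For instance, matching on Refl makes the type equality available to the typechecker, so we can write a safe coercion; this mirrors castWith from Data.Type.Equality, and is shown here only as an illustration:

castWith :: τ1 ∼ τ2 → τ1 → τ2
castWith Refl x = x -- in this branch, the typechecker knows τ1 = τ2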

Next, we implement function codeMatch to compare codes. For equal codes, this operation produces a witness that their environment types match.⁴ Using this operation, we can complete the above instance of ChangeStruct (Fun σ τ):

codeMatch :: Code env1 σ τ → Code env2 σ τ → Maybe (env1 ∼ env2)
codeMatch AddOne AddOne = Just Refl
codeMatch Pair Pair = Just Refl
codeMatch MapPair MapPair = Just Refl
codeMatch _ _ = Nothing

instance ChangeStruct (Fun σ τ) where
  type Δ(Fun σ τ) = DFun σ τ
  F (c1, env) ⊕ DF (c2, denv) =
    case codeMatch c1 c2 of
      Just Refl → F (c1, env ⊕ denv)
      Nothing → error "Invalid function change in ⊕"

Applying function changes

After defining environment changes, we define an incremental interpretation function dapplyCode. If c is the code for a function f = λx → t, as discussed, calling applyCode c computes the output of f from an environment env for f and an argument for x. Similarly, calling dapplyCode c computes the output of D⟦f⟧ from an environment env for f, an environment change denv valid for env, an argument for x, and an argument change dx.

In our example, we have:

dapplyCode :: Code env σ τ → env → Δenv → σ → Δσ → Δτ
dapplyCode AddOne () () x dx = dx
dapplyCode Pair x dx y dy = (dx, dy)
dapplyCode MapPair ys dys x dx =
  dmapDF (F (Pair, x)) (DF (Pair, dx)) ys dys

⁴If a code is polymorphic in the environment type, it must take as argument a representation of its type argument, to be used to implement codeMatch. We represent type arguments at runtime via instances of Typeable, and omit standard details here.


On top of dapplyCode, we can define dapplyFun, which functions as a derivative for applyFun and allows applying function changes:

dapplyFun :: Fun σ τ → DFun σ τ → σ → Δσ → Δτ
dapplyFun (F (c1, env)) (DF (c2, denv)) x dx =
  case codeMatch c1 c2 of
    Just Refl → dapplyCode c1 env denv x dx
    Nothing → error "Invalid function change in dapplyFun"

However, we can also implement further accessors that inspect function changes. We can now finally detect if a change is nil. To this end, we first define a typeclass that allows testing changes to determine if they're nil:

class NilChangeStruct t ⇒ NilTestable t where
  isNil :: Δt → Bool

Now a function change is nil only if the contained environment change is nil:

instance NilTestable (Fun σ τ) where
  isNil :: DFun σ τ → Bool
  isNil (DF (denv, code)) = isNil denv

However, this definition of isNil only works given a typeclass instance NilTestable env; we need to add this requirement as a constraint, but we cannot add it to isNil's type signature, since type variable env is not in scope there. Instead, we must add the constraint where we introduce it, by existential quantification, just like the constraint ChangeStruct env. In particular, we can reuse the constraint Env env:

data DFun σ τ =
  ∀env. Env env ⇒
    DF (Code env σ τ, Δenv)

instance NilChangeStruct (Fun σ τ) where
  0(F (code, env)) = DF (code, 0env)

instance NilTestable (Fun σ τ) where
  isNil :: DFun σ τ → Bool
  isNil (DF (code, denv)) = isNil denv

We can then wrap a derivative via function wrapDF, to return a nil change immediately if at runtime all input changes turn out to be nil. This was not possible in the setting described by Cai et al. [2014], because nil function changes could not be detected at runtime, only at compile time. To do so, we must produce directly a nil change for v = applyFun f x. To avoid computing v, we assume we can compute a nil change for v without access to v, via operation onil and typeclass OnilChangeStruct (omitted here); argument Proxy is a constant required for purely technical reasons.

wrapDF :: OnilChangeStruct τ ⇒ Fun σ τ → DFun σ τ → σ → Δσ → Δτ
wrapDF f df x dx =
  if isNil df then
    onil Proxy -- Change-equivalent to 0(applyFun f x)
  else
    dapplyFun f df x dx


D.3 Defunctionalization and cache-transfer-style

We can combine the above ideas with cache-transfer-style (Chapter 17). We show the details quickly.

Combining the above with caching, we can use defunctionalization as described to implement the following interface for functions in caches. For extra generality, we use extension ConstraintKinds to allow instances to define the required typeclass constraints:

class FunOps k where
  type Dk k = (dk :: ∗ → ∗ → ∗ → ∗ → ∗) | dk → k
  type ApplyCtx k i o :: Constraint
  apply :: ApplyCtx k i o ⇒ k i o cache → i → (o, cache)
  type DApplyCtx k i o :: Constraint
  dApply :: DApplyCtx k i o ⇒
    Dk k i o cache1 cache2 → Δi → cache1 → (Δo, cache2)
  type DerivCtx k i o :: Constraint
  deriv :: DerivCtx k i o ⇒
    k i o cache → Dk k i o cache cache
  type FunUpdCtx k i o :: Constraint
  funUpdate :: FunUpdCtx k i o ⇒
    k i o cache1 → Dk k i o cache1 cache2 → k i o cache2
  isNilFun :: Dk k i o cache1 cache2 → Maybe (cache1 ∼ cache2)

updatedDeriv :: (FunOps k, FunUpdCtx k i o, DerivCtx k i o) ⇒
  k i o cache1 → Dk k i o cache1 cache2 → Dk k i o cache2 cache2
updatedDeriv f df = deriv (f `funUpdate` df)

Type constructor k defines the specific constructor for the function type. In this interface, the type of function changes Dk k i o cache1 cache2 represents changes to functions (encoded by type constructor k) with inputs of type i, outputs of type o, input cache type cache1 and output cache type cache2. Types cache1 and cache2 coincide for typical function changes, but can be different for replacement function changes or, more generally, for function changes across functions with different implementations and cache types. Correspondingly, dApply supports applying such changes across closures with different implementations; unfortunately, unless the two implementations are similar, the cache content cannot be reused.

To implement this interface, it is sufficient to define a type of codes that admits an instance of typeclass Codelike, which adapts the earlier definitions of codeMatch, applyCode and dapplyCode to cache-transfer style:

class Codelike code where
  codeMatch :: code env1 a1 r1 cache1 → code env2 a2 r2 cache2 →
    Maybe ((env1, cache1) ∼ (env2, cache2))
  applyCode :: code env a b cache → env → a → (b, cache)
  dapplyCode :: code env a b cache → Δenv → Δa → cache → (Δb, cache)
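For instance, here is a hedged, minimal sketch of a Codelike instance; IdCode and IdC are hypothetical names, a trivial identity-function code with an empty environment and a trivial cache:

data IdCode env a b cache where
  IdC :: IdCode () a a ()
instance Codelike IdCode where
  -- Same code, hence same (trivial) environment and cache types:
  codeMatch IdC IdC = Just Refl
  -- The identity function returns its argument and an empty cache:
  applyCode IdC () x = (x, ())
  -- Its derivative propagates the input change unchanged:
  dapplyCode IdC () dx () = (dx, ())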

Typically, a defunctionalized program uses no first-class functions and has a single type of functions. Having a type class of function codes weakens that property. We can still use a single type of codes throughout our program; we can also use different types of codes for different parts of a program, without allowing for communication between those parts.

On top of the Codelike interface, we can define instances of interfaces FunOps and ChangeStruct. To demonstrate this, we show a complete implementation in Figs. D.3 and D.4. Similarly to ⊕, we can implement ⊚ by comparing the codes contained in function changes; this is not straightforward when using closure conversion as in Sec. 17.4.2, unless we store even more type representations.

We can detect nil changes at runtime even in cache-passing style. We can for instance define function wrapDer1, which does something trickier than wrapDF: here we assume that dg is a derivative taking a function change df as argument. Then, if df is nil, dg df must also be nil, so we can return a nil change directly, together with the input cache. The input cache has the required type, because in this case the type cache2 of the desired output cache matches the type cache1 of the input cache (because we have a nil change df across them); the return value of isNilFun witnesses this type equality.

wrapDer1 :: (FunOps k, OnilChangeStruct r′) ⇒
  (Dk k i o cache1 cache2 → f cache1 → (∆r′, f cache2)) →
  (Dk k i o cache1 cache2 → f cache1 → (∆r′, f cache2))

wrapDer1 dg df c =
  case isNilFun df of
    Just Refl → (onil Proxy, c)
    Nothing → dg df c

We can also hide the difference between different cache types by defining a uniform type of caches, Cache code. To hide caches, we use a pair of a cache (of type cache) and a code for that cache type. When applying a function (change) to a cache, or when composing function changes, we can compare the function code with the cache code.
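
Figure D.4 below shows this comparison in action: dapplyFun matches the code carried by the function change against the code stored in the cache (via codeMatch) before reusing the cache contents.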

In this code, we have not shown support for replacement values for functions; we leave details to our implementation.


type RawFun a b code env cache = (code env a b cache, env)
type RawDFun a b code env cache = (code env a b cache, ∆env)

data Fun a b code = forall env cache . Env env ⇒
  F (RawFun a b code env cache)

data DFun a b code = forall env cache . Env env ⇒
  DF (RawDFun a b code env cache)

-- This cache is not indexed by a and b.
data Cache code = forall a b env cache .
  C (code env a b cache) cache

-- Wrapper:
data FunW code a b cache where
  W :: Fun a b code → FunW code a b (Cache code)

data DFunW code a b cache1 cache2 where
  DW :: DFun a b code → DFunW code a b (Cache code) (Cache code)

derivFun :: Fun a b code → DFun a b code
derivFun (F (code, env)) = DF (code, 0_env)

oplusBase :: Codelike code ⇒ Fun a b code → DFun a b code → Fun a b code
oplusBase (F (c1, env)) (DF (c2, denv)) =
  case codeMatch c1 c2 of
    Just Refl →
      F (c1, env ⊕ denv)
    _ → error "INVALID call to oplusBase"

ocomposeBase :: Codelike code ⇒ DFun a b code → DFun a b code → DFun a b code
ocomposeBase (DF (c1, denv1)) (DF (c2, denv2)) =
  case codeMatch c1 c2 of
    Just Refl →
      DF (c1, denv1 ⊚ denv2)
    _ → error "INVALID call to ocomposeBase"

instance Codelike code ⇒ ChangeStruct (Fun a b code) where
  type ∆(Fun a b code) = DFun a b code
  (⊕) = oplusBase

instance Codelike code ⇒ NilChangeStruct (Fun a b code) where
  0_(F (c, env)) = DF (c, 0_env)

instance Codelike code ⇒ CompChangeStruct (Fun a b code) where
  df1 ⊚ df2 = ocomposeBase df1 df2

instance Codelike code ⇒ NilTestable (Fun a b code) where
  isNil (DF (c, env)) = isNil env

Figure D.3: Implementing change structures using Codelike instances.


applyRaw :: Codelike code ⇒ RawFun a b code env cache → a → (b, cache)
applyRaw (code, env) = applyCode code env

dapplyRaw :: Codelike code ⇒ RawDFun a b code env cache → ∆a → cache → (∆b, cache)
dapplyRaw (code, denv) = dapplyCode code denv

applyFun :: Codelike code ⇒ Fun a b code → a → (b, Cache code)
applyFun (F f@(code, env)) arg =
  (id ∗∗∗ C code) $ applyRaw f arg

dapplyFun :: Codelike code ⇒ DFun a b code → ∆a → Cache code → (∆b, Cache code)
dapplyFun (DF (code1, denv)) darg (C code2 cache1) =
  case codeMatch code1 code2 of
    Just Refl →
      (id ∗∗∗ C code1) $ dapplyCode code1 denv darg cache1
    _ → error "INVALID call to dapplyFun"

instance Codelike code ⇒ FunOps (FunW code) where
  type Dk (FunW code) = DFunW code
  apply (W f) = applyFun f
  dApply (DW df) = dapplyFun df
  deriv (W f) = DW (derivFun f)
  funUpdate (W f) (DW df) = W (f ⊕ df)
  isNilFun (DW df) =
    if isNil df then
      Just Refl
    else
      Nothing

Figure D.4: Implementing FunOps using Codelike instances.


Acknowledgments

I thank my supervisor Klaus Ostermann for encouraging me to do a PhD and for his years of supervision. He has always been patient and kind.

Thanks also to Fritz Henglein for agreeing to review my thesis.

I have many people to thank and too little time to do it properly; I'll finish these acknowledgments in the final version.

The research presented here has benefited from the help of coauthors, reviewers and people who offered helpful comments, including comments from the POPL'18 reviewers. Moreover, this PhD thesis wouldn't be possible without all the people who have supported me, both during the years of my PhD and in the years before.



Bibliography

[Citing pages are listed after each reference.]

Umut A. Acar. Self-Adjusting Computation. PhD thesis, Princeton University, 2005. [Pages 73, 135, 161 and 174]

Umut A. Acar. Self-adjusting computation (an overview). In PEPM, pages 1–6. ACM, 2009. [Pages 2, 128, 161 and 173]

Umut A. Acar, Guy E. Blelloch, and Robert Harper. Adaptive functional programming. TOPLAS, 28(6):990–1034, November 2006. [Page 173]

Umut A. Acar, Amal Ahmed, and Matthias Blume. Imperative self-adjusting computation. In Proceedings of the 35th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages, POPL '08, pages 309–322, New York, NY, USA, 2008. ACM. ISBN 978-1-59593-689-9. doi: 10.1145/1328438.1328476. URL http://doi.acm.org/10.1145/1328438.1328476. [Pages 143, 146, 161, 194, 200, 201, 202, 204, 205 and 212]

Umut A. Acar, Guy E. Blelloch, Matthias Blume, Robert Harper, and Kanat Tangwongsan. An experimental analysis of self-adjusting computation. TOPLAS, 32(1):3:1–3:53, November 2009. [Page 173]

Umut A. Acar, Guy Blelloch, Ruy Ley-Wild, Kanat Tangwongsan, and Duru Turkoglu. Traceable data types for self-adjusting computation. In PLDI, pages 483–496. ACM, 2010. [Pages 162 and 173]

Stefan Ackermann, Vojin Jovanovic, Tiark Rompf, and Martin Odersky. Jet: An embedded DSL for high performance big data processing. In Int'l Workshop on End-to-end Management of Big Data (BigData), 2012. [Page 49]

Agda Development Team. The Agda Wiki. http://wiki.portal.chalmers.se/agda, 2013. Accessed on 2017-06-29. [Page 181]

Amal Ahmed. Step-Indexed Syntactic Logical Relations for Recursive and Quantified Types. In Programming Languages and Systems, pages 69–83. Springer Berlin Heidelberg, March 2006. doi: 10.1007/11693024_6. [Pages 143, 146, 161, 163, 193, 194, 201, 202, 204, 205, 210, 212 and 213]

Guillaume Allais, James Chapman, Conor McBride, and James McKinna. Type-and-scope safe programs and their proofs. In Yves Bertot and Viktor Vafeiadis, editors, CPP 2017. ACM, New York, NY, January 2017. ISBN 978-1-4503-4705-1. [Page 186]

Thorsten Altenkirch and Bernhard Reus. Monadic Presentations of Lambda Terms Using Generalized Inductive Types. In Jörg Flum and Mario Rodriguez-Artalejo, editors, Computer Science Logic, Lecture Notes in Computer Science, pages 453–468. Springer Berlin Heidelberg, September 1999. ISBN 978-3-540-66536-6, 978-3-540-48168-3. doi: 10.1007/3-540-48168-0_32. [Page 186]


Nada Amin and Tiark Rompf. Type soundness proofs with definitional interpreters. In Proceedings of the 44th ACM SIGPLAN Symposium on Principles of Programming Languages, POPL 2017, pages 666–679, New York, NY, USA, 2017. ACM. ISBN 978-1-4503-4660-3. doi: 10.1145/3009837.3009866. URL http://doi.acm.org/10.1145/3009837.3009866. [Pages 186 and 201]

Andrew W. Appel and Trevor Jim. Shrinking lambda expressions in linear time. JFP, 7:515–540, 1997. [Page 129]

Andrew W. Appel and David McAllester. An indexed model of recursive types for foundational proof-carrying code. ACM Trans. Program. Lang. Syst., 23(5):657–683, September 2001. ISSN 0164-0925. doi: 10.1145/504709.504712. URL http://doi.acm.org/10.1145/504709.504712. [Page 161]

Robert Atkey. The incremental λ-calculus and relational parametricity, 2015. URL http://bentnib.org/posts/2015-04-23-incremental-lambda-calculus-and-parametricity.html. Last accessed on 29 June 2017. [Pages 81, 126, 163 and 193]

Lennart Augustsson and Magnus Carlsson. An exercise in dependent types: A well-typed interpreter. In Workshop on Dependent Types in Programming, Gothenburg, 1999. [Page 186]

H. P. Barendregt. The Lambda Calculus: Its Syntax and Semantics. Elsevier, 1984. [Page 63]

Henk P. Barendregt. Lambda Calculi with Types, pages 117–309. Oxford University Press, New York, 1992. [Pages 164, 166 and 171]

Nick Benton, Chung-Kil Hur, Andrew J. Kennedy, and Conor McBride. Strongly Typed Term Representations in Coq. Journal of Automated Reasoning, 49(2):141–159, August 2012. ISSN 0168-7433, 1573-0670. doi: 10.1007/s10817-011-9219-0. [Page 186]

Jean-Philippe Bernardy and Marc Lasson. Realizability and parametricity in pure type systems. In Proceedings of the 14th International Conference on Foundations of Software Science and Computational Structures: Part of the Joint European Conferences on Theory and Practice of Software, FOSSACS'11/ETAPS'11, pages 108–122, Berlin, Heidelberg, 2011. Springer-Verlag. ISBN 978-3-642-19804-5. [Pages 89, 163, 164, 165, 166, 167, 172 and 177]

Jean-Philippe Bernardy, Patrik Jansson, and Ross Paterson. Parametricity and dependent types. In Proceedings of the 15th ACM SIGPLAN International Conference on Functional Programming, ICFP '10, pages 345–356, New York, NY, USA, 2010. ACM. ISBN 978-1-60558-794-3. doi: 10.1145/1863543.1863592. URL http://doi.acm.org/10.1145/1863543.1863592. [Pages 166, 167 and 172]

Gavin M. Bierman, Erik Meijer, and Mads Torgersen. Lost in translation: formalizing proposed extensions to C#. In OOPSLA, pages 479–498. ACM, 2007. [Page 47]

Jose A. Blakeley, Per-Ake Larson, and Frank Wm. Tompa. Efficiently updating materialized views. In SIGMOD, pages 61–71. ACM, 1986. [Pages v, vii, 2, 161 and 174]

Maximilian C. Bolingbroke and Simon L. Peyton Jones. Types are calling conventions. In Proceedings of the 2nd ACM SIGPLAN Symposium on Haskell, Haskell '09, pages 1–12, New York, NY, USA, 2009. ACM. ISBN 978-1-60558-508-6. doi: 10.1145/1596638.1596640. [Page 160]

K. J. Brown, A. K. Sujeeth, Hyouk Joong Lee, T. Rompf, H. Chafi, M. Odersky, and K. Olukotun. A heterogeneous parallel framework for domain-specific languages. In Parallel Architectures and Compilation Techniques (PACT), 2011 International Conference on, pages 89–100, 2011. doi: 10.1109/PACT.2011.15. [Page 78]


Yufei Cai. PhD thesis, University of Tübingen, 2017. In preparation. [Pages 81 and 126]

Yufei Cai, Paolo G. Giarrusso, Tillmann Rendel, and Klaus Ostermann. A theory of changes for higher-order languages — Incrementalizing λ-calculi by static differentiation. In Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI '14, pages 145–155, New York, NY, USA, 2014. ACM. ISBN 978-1-4503-2784-8. doi: 10.1145/2594291.2594304. URL http://doi.acm.org/10.1145/2594291.2594304. [Pages 2, 3, 59, 66, 84, 92, 93, 107, 117, 121, 124, 125, 135, 136, 137, 138, 141, 145, 146, 154, 159, 175, 177, 182, 184, 190 and 223]

Jacques Carette, Oleg Kiselyov, and Chung-chieh Shan. Finally tagless, partially evaluated: Tagless staged interpreters for simpler typed languages. JFP, 19:509–543, 2009. [Page 31]

Craig Chambers, Ashish Raniwala, Frances Perry, Stephen Adams, Robert R. Henry, Robert Bradshaw, and Nathan Weizenbaum. FlumeJava: easy, efficient data-parallel pipelines. In PLDI, pages 363–375. ACM, 2010. [Pages 10, 20 and 47]

Yan Chen, Joshua Dunfield, Matthew A. Hammer, and Umut A. Acar. Implicit self-adjusting computation for purely functional programs. In ICFP, pages 129–141. ACM, 2011. [Page 173]

Yan Chen, Umut A. Acar, and Kanat Tangwongsan. Functional programming for dynamic and large data with self-adjusting computation. In Proceedings of the 19th ACM SIGPLAN International Conference on Functional Programming, ICFP '14, pages 227–240, New York, NY, USA, 2014. ACM. ISBN 978-1-4503-2873-9. doi: 10.1145/2628136.2628150. URL http://doi.acm.org/10.1145/2628136.2628150. [Page 162]

James Cheney, Sam Lindley, and Philip Wadler. A Practical Theory of Language-integrated Query. In Proceedings of the 18th ACM SIGPLAN International Conference on Functional Programming, ICFP '13, pages 403–416, New York, NY, USA, 2013. ACM. ISBN 978-1-4503-2326-0. doi: 10.1145/2500365.2500586. [Page 174]

Adam Chlipala. Parametric higher-order abstract syntax for mechanized semantics. In Proceedings of the 13th ACM SIGPLAN international conference on Functional programming, ICFP '08, pages 143–156, New York, NY, USA, 2008. ACM. ISBN 978-1-59593-919-7. doi: 10.1145/1411204.1411226. URL http://doi.acm.org/10.1145/1411204.1411226. [Page 186]

Ezgi Çiçek, Zoe Paraskevopoulou, and Deepak Garg. A Type Theory for Incremental Computational Complexity with Control Flow Changes. In Proceedings of the 21st ACM SIGPLAN International Conference on Functional Programming, ICFP 2016, pages 132–145, New York, NY, USA, 2016. ACM. ISBN 978-1-4503-4219-3. doi: 10.1145/2951913.2951950. [Page 161]

Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. Introduction to Algorithms. MIT Press, Cambridge, MA, USA, 2nd edition, 2001. ISBN 0-262-03293-7, 9780262032933. [Page 73]

Oren Eini. The pain of implementing LINQ providers. Commun. ACM, 54(8):55–61, 2011. [Page 47]

Conal Elliott and Paul Hudak. Functional reactive animation. In ICFP, pages 263–273. ACM, 1997. [Page 174]

Conal Elliott, Sigbjorn Finne, and Oege de Moor. Compiling embedded languages. JFP, 13(2):455–481, 2003. [Pages 20, 21 and 48]

Burak Emir, Andrew Kennedy, Claudio Russo, and Dachuan Yu. Variance and generalized constraints for C# generics. In ECOOP, pages 279–303. Springer-Verlag, 2006. [Page 4]


Burak Emir, Qin Ma, and Martin Odersky. Translation correctness for first-order object-oriented pattern matching. In Proceedings of the 5th Asian conference on Programming languages and systems, APLAS'07, pages 54–70, Berlin, Heidelberg, 2007a. Springer-Verlag. ISBN 3-540-76636-7, 978-3-540-76636-0. URL http://dl.acm.org/citation.cfm?id=1784774.1784782. [Page 4]

Burak Emir, Martin Odersky, and John Williams. Matching objects with patterns. In ECOOP, pages 273–298. Springer-Verlag, 2007b. [Pages 4, 32, 44 and 45]

Sebastian Erdweg, Paolo G. Giarrusso, and Tillmann Rendel. Language composition untangled. In Proceedings of Workshop on Language Descriptions, Tools and Applications (LDTA), LDTA '12, pages 7:1–7:8, New York, NY, USA, 2012. ACM. ISBN 978-1-4503-1536-4. doi: 10.1145/2427048.2427055. URL http://doi.acm.org/10.1145/2427048.2427055. [Pages 4 and 187]

Sebastian Erdweg, Tijs van der Storm, and Yi Dai. Capture-Avoiding and Hygienic Program Transformations. In ECOOP 2014 – Object-Oriented Programming, pages 489–514. Springer Berlin Heidelberg, July 2014. doi: 10.1007/978-3-662-44202-9_20. [Page 93]

Andrew Farmer, Andy Gill, Ed Komp, and Neil Sculthorpe. The HERMIT in the Machine: A Plugin for the Interactive Transformation of GHC Core Language Programs. In Proceedings of the 2012 Haskell Symposium, Haskell '12, pages 1–12, New York, NY, USA, 2012. ACM. ISBN 978-1-4503-1574-6. doi: 10.1145/2364506.2364508. [Page 78]

Leonidas Fegaras. Optimizing queries with object updates. Journal of Intelligent Information Systems, 12:219–242, 1999. ISSN 0925-9902. URL http://dx.doi.org/10.1023/A:1008757010516. [Page 71]

Leonidas Fegaras and David Maier. Towards an effective calculus for object query languages. In Proceedings of the 1995 ACM SIGMOD international conference on Management of data, SIGMOD '95, pages 47–58, New York, NY, USA, 1995. ACM. ISBN 0-89791-731-6. doi: 10.1145/223784.223789. URL http://doi.acm.org/10.1145/223784.223789. [Page 71]

Leonidas Fegaras and David Maier. Optimizing object queries using an effective calculus. ACM Trans. Database Systems (TODS), 25:457–516, 2000. [Pages 11, 21 and 48]

Denis Firsov and Wolfgang Jeltsch. Purely Functional Incremental Computing. In Fernando Castor and Yu David Liu, editors, Programming Languages, number 9889 in Lecture Notes in Computer Science, pages 62–77. Springer International Publishing, September 2016. ISBN 978-3-319-45278-4, 978-3-319-45279-1. doi: 10.1007/978-3-319-45279-1_5. [Pages 70, 75, 156 and 161]

Philip J. Fleming and John J. Wallace. How not to lie with statistics: the correct way to summarize benchmark results. Commun. ACM, 29(3):218–221, March 1986. [Pages xvii, 39 and 41]

Bent Flyvbjerg. Five misunderstandings about case-study research. Qualitative Inquiry, 12(2):219–245, 2006. [Page 23]

Miguel Garcia, Anastasia Izmaylova, and Sibylle Schupp. Extending Scala with database query capability. Journal of Object Technology, 9(4):45–68, 2010. [Page 48]

Andy Georges, Dries Buytaert, and Lieven Eeckhout. Statistically rigorous Java performance evaluation. In OOPSLA, pages 57–76. ACM, 2007. [Pages 40, 41 and 132]

Paolo G. Giarrusso. Open GADTs and declaration-site variance: A problem statement. In Proceedings of the 4th Workshop on Scala, SCALA '13, pages 5:1–5:4, New York, NY, USA, 2013. ACM. ISBN 978-1-4503-2064-1. doi: 10.1145/2489837.2489842. URL http://doi.acm.org/10.1145/2489837.2489842. [Page 4]


Paolo G. Giarrusso, Klaus Ostermann, Michael Eichberg, Ralf Mitschke, Tillmann Rendel, and Christian Kästner. Reify your collection queries for modularity and speed. CoRR, abs/1210.6284, 2012. URL http://arxiv.org/abs/1210.6284. [Page 20]

Paolo G. Giarrusso, Klaus Ostermann, Michael Eichberg, Ralf Mitschke, Tillmann Rendel, and Christian Kästner. Reify your collection queries for modularity and speed. In AOSD, pages 1–12. ACM, 2013. [Pages 2 and 3]

Paolo G. Giarrusso, Yann Régis-Gianas, and Philipp Schuster. Incremental λ-calculus in cache-transfer style: Static memoization by program transformation. Submitted. [Pages 3 and 213]

Andrew Gill, John Launchbury, and Simon L. Peyton Jones. A short cut to deforestation. In FPCA, pages 223–232. ACM, 1993. [Page 9]

Jean-Yves Girard, Paul Taylor, and Yves Lafont. Proofs and types. Cambridge University Press, Cambridge, 1989. [Page 164]

Dieter Gluche, Torsten Grust, Christof Mainberger, and Marc Scholl. Incremental updates for materialized OQL views. In Deductive and Object-Oriented Databases, volume 1341 of LNCS, pages 52–66. Springer, 1997. [Pages 57, 127, 129, 131 and 174]

T. Grust and M. H. Scholl. Monoid comprehensions as a target for the translation of OQL. In Workshop on Performance Enhancement in Object Bases, Schloss Dagstuhl. Citeseer, 1996a. doi: 10.1.1.36.4925, 10.1.1.36.4481 (abstract). [Page 71]

Torsten Grust. Comprehending queries. PhD thesis, University of Konstanz, 1999. [Pages 32 and 48]

Torsten Grust and Marc H. Scholl. Translating OQL into monoid comprehensions: Stuck with nested loops? Technical Report 3a/1996, University of Konstanz, Department of Mathematics and Computer Science, Database Research Group, September 1996b. [Page 32]

Torsten Grust and Marc H. Scholl. How to comprehend queries functionally. Journal of Intelligent Information Systems, 12:191–218, 1999. [Page 21]

Torsten Grust and Alexander Ulrich. First-class functions for first-order database engines. In Todd J. Green and Alan Schmitt, editors, Proceedings of the 14th International Symposium on Database Programming Languages (DBPL 2013), August 30, 2013, Riva del Garda, Trento, Italy, 2013. URL http://arxiv.org/abs/1308.0158. [Page 174]

Torsten Grust, Manuel Mayr, Jan Rittinger, and Tom Schreiber. FERRY: database-supported program execution. In Proc. Int'l SIGMOD Conf. on Management of Data (SIGMOD), pages 1063–1066. ACM, 2009. [Pages 48 and 174]

Torsten Grust, Nils Schweinsberg, and Alexander Ulrich. Functions Are Data Too: Defunctionalization for PL/SQL. Proc. VLDB Endow., 6(12):1214–1217, August 2013. ISSN 2150-8097. doi: 10.14778/2536274.2536279. [Page 174]

Ashish Gupta and Inderpal Singh Mumick. Maintenance of materialized views: problems, techniques, and applications. In Ashish Gupta and Inderpal Singh Mumick, editors, Materialized views, pages 145–157. MIT Press, 1999. [Pages v, vii, 2, 128, 161, 173 and 174]

Elnar Hajiyev, Mathieu Verbaere, and Oege de Moor. CodeQuest: Scalable source code queries with Datalog. In ECOOP, pages 2–27. Springer, 2006. [Page 49]


Matthew A. Hammer, Georg Neis, Yan Chen, and Umut A. Acar. Self-adjusting stack machines. In OOPSLA, pages 753–772. ACM, 2011. [Page 173]

Matthew A. Hammer, Khoo Yit Phang, Michael Hicks, and Jeffrey S. Foster. Adapton: Composable Demand-driven Incremental Computation. In Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI '14, pages 156–166, New York, NY, USA, 2014. ACM. ISBN 978-1-4503-2784-8. doi: 10.1145/2594291.2594324. [Page 161]

Matthew A. Hammer, Joshua Dunfield, Kyle Headley, Nicholas Labich, Jeffrey S. Foster, Michael Hicks, and David Van Horn. Incremental computation with names. In Proceedings of the 2015 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications, OOPSLA 2015, pages 748–766, New York, NY, USA, 2015. ACM. ISBN 978-1-4503-3689-5. doi: 10.1145/2814270.2814305. URL http://doi.acm.org/10.1145/2814270.2814305. [Page 161]

Matthew A. Hammer, Joshua Dunfield, Dimitrios J. Economou, and Monal Narasimhamurthy. Typed Adapton: Refinement types for incremental computations with precise names. arXiv:1610.00097 [cs], October 2016. [Page 161]

Robert Harper. Constructing type systems over an operational semantics. Journal of Symbolic Computation, 14(1):71–84, July 1992. ISSN 0747-7171. doi: 10.1016/0747-7171(92)90026-Z. [Page 118]

Robert Harper. Practical Foundations for Programming Languages. Cambridge University Press, 2nd edition, 2016. ISBN 9781107150300. [Page 163]

Kyle Headley and Matthew A. Hammer. The Random Access Zipper: Simple, Purely-Functional Sequences. arXiv:1608.06009 [cs], August 2016. [Page 73]

Fritz Henglein. Optimizing Relational Algebra Operations Using Generic Equivalence Discriminators and Lazy Products. In Proceedings of the 2010 ACM SIGPLAN Workshop on Partial Evaluation and Program Manipulation, PEPM '10, pages 73–82, New York, NY, USA, 2010. ACM. ISBN 978-1-60558-727-1. doi: 10.1145/1706356.1706372. [Page 48]

Fritz Henglein and Ken Friis Larsen. Generic Multiset Programming for Language-integrated Querying. In Proceedings of the 6th ACM SIGPLAN Workshop on Generic Programming, WGP '10, pages 49–60, New York, NY, USA, 2010. ACM. ISBN 978-1-4503-0251-7. doi: 10.1145/1863495.1863503. [Page 48]

R. Hinze and R. Paterson. Finger trees: a simple general-purpose data structure. Journal of Functional Programming, 16(2):197–218, 2006. [Pages 73 and 75]

Christian Hofer, Klaus Ostermann, Tillmann Rendel, and Adriaan Moors. Polymorphic embedding of DSLs. In GPCE, pages 137–148. ACM, 2008. [Page 44]

Martin Hofmann. Conservativity of equality reflection over intensional type theory. In Types for Proofs and Programs, volume 1158 of LNCS, pages 153–164. Springer-Verlag, 1996. [Page 182]

David Hovemeyer and William Pugh. Finding bugs is easy. SIGPLAN Notices, 39(12):92–106, 2004. [Pages 10, 37 and 49]

Lourdes del Carmen González Huesca. Incrementality and Effect Simulation in the Simply Typed Lambda Calculus. PhD thesis, Université Paris Diderot-Paris VII, November 2015. [Pages 59 and 175]


Vojin Jovanovic, Amir Shaikhha, Sandro Stucki, Vladimir Nikolaev, Christoph Koch, and Martin Odersky. Yin-yang: Concealing the deep embedding of DSLs. In Proceedings of the 2014 International Conference on Generative Programming: Concepts and Experiences, GPCE 2014, pages 73–82, New York, NY, USA, 2014. ACM. ISBN 978-1-4503-3161-6. doi: 10.1145/2658761.2658771. URL http://doi.acm.org/10.1145/2658761.2658771. [Page 49]

Christian Kästner, Paolo G. Giarrusso, and Klaus Ostermann. Partial preprocessing C code for variability analysis. In Proceedings of the Fifth International Workshop on Variability Modelling of Software-intensive Systems (VaMoS), pages 137–140. ACM, January 2011a. ISBN 978-1-4503-0570-9. URL http://www.informatik.uni-marburg.de/~kaestner/vamos11.pdf. [Page 4]

Christian Kästner, Paolo G. Giarrusso, Tillmann Rendel, Sebastian Erdweg, Klaus Ostermann, and Thorsten Berger. Variability-aware parsing in the presence of lexical macros and conditional compilation. In Proceedings of the 26th Annual ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA). ACM, October 2011b. [Page 4]

Chantal Keller and Thorsten Altenkirch. Hereditary Substitutions for Simple Types, Formalized. In Proceedings of the Third ACM SIGPLAN Workshop on Mathematically Structured Functional Programming, MSFP '10, pages 3–10, New York, NY, USA, 2010. ACM. ISBN 978-1-4503-0255-5. doi: 10.1145/1863597.1863601. [Page 186]

Gregor Kiczales, John Lamping, Anurag Menhdhekar, Chris Maeda, Cristina Lopes, Jean-Marc Loingtier, and John Irwin. Aspect-oriented programming. In ECOOP, pages 220–242, 1997. [Page 9]

Edward Kmett. Mirrored lenses, 2012. URL http://comonad.com/reader/2012/mirrored-lenses. Last accessed on 29 June 2017. [Page 169]

Christoph Koch. Incremental query evaluation in a ring of databases. In Symp. Principles of Database Systems (PODS), pages 87–98. ACM, 2010. [Pages 57, 161 and 174]

Christoph Koch, Yanif Ahmad, Oliver Kennedy, Milos Nikolic, Andres Nötzli, Daniel Lupei, and Amir Shaikhha. DBToaster: higher-order delta processing for dynamic, frequently fresh views. The VLDB Journal, 23(2):253–278, 2014. ISSN 1066-8888. doi: 10.1007/s00778-013-0348-4. URL http://dx.doi.org/10.1007/s00778-013-0348-4. [Pages 154, 158, 161 and 174]

Christoph Koch, Daniel Lupei, and Val Tannen. Incremental View Maintenance For Collection Programming. In Proceedings of the 35th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, PODS '16, pages 75–90, New York, NY, USA, 2016. ACM. ISBN 978-1-4503-4191-2. doi: 10.1145/2902251.2902286. [Pages 2, 57, 159, 160, 161 and 175]

Neelakantan R. Krishnaswami and Derek Dreyer. Internalizing Relational Parametricity in the Extensional Calculus of Constructions. In Simona Ronchi Della Rocca, editor, Computer Science Logic 2013 (CSL 2013), volume 23 of Leibniz International Proceedings in Informatics (LIPIcs), pages 432–451, Dagstuhl, Germany, 2013. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik. ISBN 978-3-939897-60-6. doi: 10.4230/LIPIcs.CSL.2013.432. [Page 125]

Ralf Lämmel. Google's MapReduce programming model — revisited. Sci. Comput. Program., 68(3):208–237, October 2007. [Pages 67 and 129]

Daan Leijen and Erik Meijer. Domain specific embedded compilers. In DSL, pages 109–122. ACM, 1999. [Page 48]


Yanhong A. Liu. Efficiency by incrementalization: An introduction. HOSC, 13(4):289–313, 2000. [Pages 2, 56, 160, 161, 176 and 178]

Yanhong A. Liu and Tim Teitelbaum. Caching intermediate results for program improvement. In Proceedings of the 1995 ACM SIGPLAN symposium on Partial evaluation and semantics-based program manipulation, PEPM '95, pages 190–201, New York, NY, USA, 1995. ACM. ISBN 0-89791-720-0. doi: 10.1145/215465.215590. URL http://doi.acm.org/10.1145/215465.215590. [Pages v, vii, 135 and 136]

Ingo Maier and Martin Odersky. Higher-order reactive programming with incremental lists. In ECOOP, pages 707–731. Springer-Verlag, 2013. [Pages 43 and 174]

Simon Marlow and Simon Peyton Jones. Making a fast curry: push/enter vs. eval/apply for higher-order languages. Journal of Functional Programming, 16(4-5):415–449, 2006. ISSN 1469-7653. doi: 10.1017/S0956796806005995. [Page 160]

Conor McBride. The derivative of a regular type is its type of one-hole contexts, 2001. URL http://www.strictlypositive.org/diff.pdf. Last accessed on 29 June 2017. [Page 78]

Conor McBride. Outrageous but Meaningful Coincidences: Dependent Type-safe Syntax and Evaluation. In Proceedings of the 6th ACM SIGPLAN Workshop on Generic Programming, WGP '10, pages 1–12, New York, NY, USA, 2010. ACM. ISBN 978-1-4503-0251-7. doi: 10.1145/1863495.1863497. [Page 186]

Erik Meijer, Brian Beckman, and Gavin Bierman. LINQ: reconciling objects, relations and XML in the .NET framework. In Proc. Int'l SIGMOD Conf. on Management of Data (SIGMOD), page 706. ACM, 2006. [Page 47]

John C. Mitchell. Foundations of programming languages. MIT Press, 1996. [Page 118]

Adriaan Moors. Type constructor polymorphism, 2010. URL http://adriaanm.github.com/research/2010/10/06/new-in-scala-28-type-constructor-inference. Last accessed on 2012-04-11. [Page 45]

Adriaan Moors, Tiark Rompf, Philipp Haller, and Martin Odersky. Scala-virtualized. In Proc. Workshop on Partial Evaluation and Semantics-Based Program Manipulation, PEPM '12, pages 117–120. ACM, 2012. [Page 44]

Steven S. Muchnick. Advanced Compiler Design and Implementation. Morgan Kaufmann, 1997. ISBN 1-55860-320-4. [Page 21]

Russell O'Connor. Polymorphic update with van Laarhoven lenses, 2012. URL http://r6.ca/blog/20120623T104901Z.html. Last accessed on 29 June 2017. [Page 169]

M. Odersky and A. Moors. Fighting bit rot with types (experience report: Scala collections). In IARCS Conf. Foundations of Software Technology and Theoretical Computer Science, volume 4, pages 427–451, 2009. [Pages 20, 23, 30 and 48]

Martin Odersky. Pimp my library, 2006. URL http://www.artima.com/weblogs/viewpost.jsp?thread=179766. Last accessed on 10 April 2012. [Page 25]

Martin Odersky. The Scala language specification, version 2.9, 2011. [Page 25]

Martin Odersky, Lex Spoon, and Bill Venners. Programming in Scala. Artima Inc., 2nd edition, 2011. [Pages 10, 23, 25, 45 and 46]


Bruno C. d. S. Oliveira, Adriaan Moors, and Martin Odersky. Type classes as objects and implicits. In OOPSLA, OOPSLA '10, pages 341–360. ACM, 2010. [Page 30]

Klaus Ostermann, Paolo G. Giarrusso, Christian Kästner, and Tillmann Rendel. Revisiting information hiding: Reflections on classical and nonclassical modularity. In Proceedings of the 25th European Conference on Object-Oriented Programming (ECOOP). Springer-Verlag, 2011. URL http://www.informatik.uni-marburg.de/~kaestner/ecoop11.pdf. [Page 3]

Scott Owens, Magnus O. Myreen, Ramana Kumar, and Yong Kiam Tan. Functional Big-Step Semantics. In Proceedings of the 25th European Symposium on Programming Languages and Systems - Volume 9632, pages 589–615, New York, NY, USA, 2016. Springer-Verlag New York, Inc. ISBN 978-3-662-49497-4. doi: 10.1007/978-3-662-49498-1_23. [Pages 186 and 201]

Robert Paige and Shaye Koenig. Finite differencing of computable expressions. TOPLAS, 4(3):402–454, July 1982. [Pages v, vii, 2, 57, 161 and 174]

Simon L. Peyton Jones and Simon Marlow. Secrets of the Glasgow Haskell Compiler inliner. JFP, 12(4-5):393–434, 2002. [Pages 9 and 32]

F. Pfenning and C. Elliot. Higher-order abstract syntax. In PLDI, pages 199–208. ACM, 1988. [Pages 20 and 28]

Frank Pfenning. Partial polymorphic type inference and higher-order unification. In Proc. ACM Conf. on LISP and Functional Programming, LFP '88, pages 153–163. ACM, 1988. [Page 45]

Benjamin C. Pierce. Types and Programming Languages. MIT Press, 1st edition, 2002. [Pages 33, 181, 183, 184 and 201]

François Pottier and Nadji Gauthier. Polymorphic Typed Defunctionalization. In Proceedings of the 31st ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL '04, pages 89–98, New York, NY, USA, 2004. ACM. ISBN 1-58113-729-X. doi: 10.1145/964001.964009. [Page 217]

G. Ramalingam and Thomas Reps. A categorized bibliography on incremental computation. In POPL, pages 502–510. ACM, 1993. [Pages 55, 161 and 173]

Tiark Rompf and Nada Amin. Functional Pearl: A SQL to C Compiler in 500 Lines of Code. In Proceedings of the 20th ACM SIGPLAN International Conference on Functional Programming, ICFP 2015, pages 2–9, New York, NY, USA, 2015. ACM. ISBN 978-1-4503-3669-7. doi: 10.1145/2784731.2784760. [Page 1]

Tiark Rompf and Martin Odersky. Lightweight modular staging: a pragmatic approach to runtime code generation and compiled DSLs. In GPCE, pages 127–136. ACM, 2010. [Pages 4, 20, 21, 22, 25, 44, 49, 78 and 127]

Tiark Rompf, Arvind K. Sujeeth, HyoukJoong Lee, Kevin J. Brown, Hassan Chafi, Martin Odersky, and Kunle Olukotun. Building-blocks for performance oriented DSLs. In DSL, pages 93–117, 2011. [Pages 44, 46 and 49]

Tiark Rompf, Arvind K. Sujeeth, Nada Amin, Kevin J. Brown, Vojin Jovanovic, HyoukJoong Lee, Manohar Jonnalagedda, Kunle Olukotun, and Martin Odersky. Optimizing data structures in high-level programs: new directions for extensible compilers based on staging. In POPL, pages 497–510. ACM, 2013. [Pages 43, 48 and 49]


Andreas Rossberg, Claudio V. Russo, and Derek Dreyer. F-ing Modules. In Proceedings of the 5th ACM SIGPLAN Workshop on Types in Language Design and Implementation, TLDI '10, pages 89–102, New York, NY, USA, 2010. ACM. ISBN 978-1-60558-891-9. doi: 10.1145/1708016.1708028. [Page 155]

Amr Sabry and Matthias Felleisen. Reasoning about programs in continuation-passing style. Lisp and symbolic computation, 6(3-4):289–360, 1993. [Page 138]

Guido Salvaneschi and Mira Mezini. Reactive behavior in object-oriented applications: an analysis and a research roadmap. In AOSD, pages 37–48. ACM, 2013. [Pages 1 and 55]

Gabriel Scherer and Didier Rémy. GADTs meet subtyping. In ESOP, pages 554–573. Springer-Verlag, 2013. [Page 4]

Christopher Schwaab and Jeremy G. Siek. Modular type-safety proofs in Agda. In Proceedings of the 7th Workshop on Programming Languages Meets Program Verification, PLPV '13, pages 3–12, New York, NY, USA, 2013. ACM. ISBN 978-1-4503-1860-0. doi: 10.1145/2428116.2428120. [Page 190]

Ilya Sergey, Dimitrios Vytiniotis, and Simon Peyton Jones. Modular, higher-order cardinality analysis in theory and practice. In Proceedings of the 41st ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL '14, pages 335–347, New York, NY, USA, 2014. ACM. ISBN 978-1-4503-2544-8. doi: 10.1145/2535838.2535861. [Page 160]

Rohin Shah and Rastislav Bodik. Automated incrementalization through synthesis. Presented at IC 2017, First Workshop on Incremental Computing; 2-page abstract available, 2017. URL http://pldi17.sigplan.org/event/ic-2017-papers-automated-incrementalization-through-synthesis. [Page 56]

Norbert Siegmund, Marko Rosenmüller, Christian Kästner, Paolo G. Giarrusso, Sven Apel, and Sergiy Kolesnikov. Scalable prediction of non-functional properties in software product lines. In Proceedings of the 15th International Software Product Line Conference (SPLC), Los Alamitos, CA, August 2011. IEEE Computer Society. URL http://www.informatik.uni-marburg.de/~kaestner/SPLC11_nfp.pdf. To appear; submitted 27 Feb 2010, accepted 21 Apr 2011. [Page 4]

Norbert Siegmund, Marko Rosenmüller, Christian Kästner, Paolo G. Giarrusso, Sven Apel, and Sergiy S. Kolesnikov. Scalable prediction of non-functional properties in software product lines: Footprint and memory consumption. Information and Software Technology, 55(3):491–507, 2013. [Page 4]

Daniel Spiewak and Tian Zhao. ScalaQL: Language-integrated database queries for Scala. In Proc. Conf. Software Language Engineering (SLE), 2009. [Page 48]

R. Statman. Logical relations and the typed λ-calculus. Information and Control, 65(2):85–97, May 1985. ISSN 0019-9958. doi: 10.1016/S0019-9958(85)80001-2. [Page 89]

Guy L. Steele Jr. Organizing Functional Code for Parallel Execution or, Foldl and Foldr Considered Slightly Harmful. In Proceedings of the 14th ACM SIGPLAN International Conference on Functional Programming, ICFP '09, pages 1–2, New York, NY, USA, 2009. ACM. ISBN 978-1-60558-332-7. doi: 10.1145/1596550.1596551. [Page 43]

Michael Stonebraker, Samuel Madden, Daniel J. Abadi, Stavros Harizopoulos, Nabil Hachem, and Pat Helland. The end of an architectural era (it's time for a complete rewrite). In Proc. Int'l Conf. Very Large Data Bases (VLDB), pages 1150–1160. VLDB Endowment, 2007. [Pages 1 and 9]


Arvind K. Sujeeth, Austin Gibbons, Kevin J. Brown, HyoukJoong Lee, Tiark Rompf, Martin Odersky, and Kunle Olukotun. Forge: generating a high performance DSL implementation from a declarative specification. In Proceedings of the 12th international conference on Generative programming: concepts & experiences, GPCE '13, pages 145–154, New York, NY, USA, 2013a. ACM. ISBN 978-1-4503-2373-4. doi: 10.1145/2517208.2517220. URL http://doi.acm.org/10.1145/2517208.2517220. [Page 49]

Arvind K. Sujeeth, Tiark Rompf, Kevin J. Brown, HyoukJoong Lee, Hassan Chafi, Victoria Popic, Michael Wu, Aleksandar Prokopec, Vojin Jovanovic, Martin Odersky, and Kunle Olukotun. Composition and reuse with compiled domain-specific languages. In ECOOP, pages 52–78. Springer-Verlag, 2013b. [Pages 43 and 49]

R. S. Sundaresh and Paul Hudak. A theory of incremental computation and its application. In POPL, pages 1–13. ACM, 1991. [Page 174]

L. S. van Benthem Jutting, J. McKinna, and R. Pollack. Checking algorithms for Pure Type Systems. In Henk Barendregt and Tobias Nipkow, editors, Types for Proofs and Programs, number 806 in Lecture Notes in Computer Science, pages 19–61. Springer Berlin Heidelberg, January 1994. ISBN 978-3-540-58085-0, 978-3-540-48440-0. [Page 172]

Jan Vitek and Tomas Kalibera. Repeatability, reproducibility and rigor in systems research. In Proc. Int'l Conf. Embedded Software (EMSOFT), pages 33–38. ACM, 2011. [Page 36]

Philip Wadler. Theorems for Free! In Proceedings of the Fourth International Conference on Functional Programming Languages and Computer Architecture, FPCA '89, pages 347–359, New York, NY, USA, 1989. ACM. ISBN 978-0-89791-328-7. doi: 10.1145/99370.99404. [Page 89]

Philip Wadler. The Girard–Reynolds isomorphism (second edition). Theoretical Computer Science, 375(1–3):201–226, May 2007. ISSN 0304-3975. doi: 10.1016/j.tcs.2006.12.042. [Page 163]

Patrycja Wegrzynowicz and Krzysztof Stencel. The good, the bad, and the ugly: three ways to use a semantic code query system. In OOPSLA, pages 821–822. ACM, 2009. [Page 49]

Darren Willis, David Pearce, and James Noble. Efficient object querying for Java. In ECOOP, pages 28–49. Springer, 2006. [Page 48]

Darren Willis, David J. Pearce, and James Noble. Caching and incrementalisation in the Java Query Language. In OOPSLA, pages 1–18. ACM, 2008. [Pages 48 and 174]

Yuan Yu, Michael Isard, Dennis Fetterly, Mihai Budiu, Úlfar Erlingsson, Pradeep Kumar Gunda, and Jon Currey. DryadLINQ: a system for general-purpose distributed data-parallel computing using a high-level language. In Proc. Conf. Operating systems design and implementation, OSDI '08, pages 1–14. USENIX Association, 2008. [Page 47]

                                                                                                                                                                                                  • A2 Simply-typed λ-calculus
                                                                                                                                                                                                    • A21 Denotational semantics for STLC
                                                                                                                                                                                                    • A22 Weakening
                                                                                                                                                                                                    • A23 Substitution
                                                                                                                                                                                                    • A24 Discussion Our mechanization and semantic style
                                                                                                                                                                                                    • A25 Language plugins
                                                                                                                                                                                                        • B More on our formalization
                                                                                                                                                                                                          • B1 Mechanizing plugins modularly and limitations
                                                                                                                                                                                                            • C (Un)typed ILC operationally
                                                                                                                                                                                                              • C1 Formalization
                                                                                                                                                                                                                • C11 Types and contexts
                                                                                                                                                                                                                • C12 Base syntax for λA
                                                                                                                                                                                                                • C13 Change syntax for λΔA
                                                                                                                                                                                                                • C14 Differentiation
                                                                                                                                                                                                                • C15 Typing λArarr and λΔArarr
                                                                                                                                                                                                                • C16 Semantics
                                                                                                                                                                                                                • C17 Type soundness
                                                                                                                                                                                                                  • C2 Validating our step-indexed semantics
                                                                                                                                                                                                                  • C3 Validity syntactically (λArarr λΔArarr)
                                                                                                                                                                                                                  • C4 Step-indexed extensional validity (λArarr λΔArarr)
                                                                                                                                                                                                                  • C5 Step-indexed intensional validity
                                                                                                                                                                                                                  • C6 Untyped step-indexed validity (λA λΔA)
                                                                                                                                                                                                                  • C7 General recursion in λArarr and λΔArarr
                                                                                                                                                                                                                  • C8 Future work
                                                                                                                                                                                                                    • C81 Change composition
                                                                                                                                                                                                                      • C9 Development history
                                                                                                                                                                                                                      • C10 Conclusion
                                                                                                                                                                                                                        • D Defunctionalizing function changes
                                                                                                                                                                                                                          • D1 Setup
                                                                                                                                                                                                                          • D2 Defunctionalization
                                                                                                                                                                                                                            • D21 Defunctionalization with separate function codes
                                                                                                                                                                                                                            • D22 Defunctionalizing function changes
                                                                                                                                                                                                                              • D3 Defunctionalization and cache-transfer-style
                                                                                                                                                                                                                                • Acknowledgments
                                                                                                                                                                                                                                • Bibliography
Zusammenfassung

In modern general-purpose programming languages, queries on in-memory collections are often more computationally expensive than necessary. While database queries can be optimized comparatively easily, this is often difficult for in-memory collections, because general-purpose programming languages are as a rule more expressive than databases. In particular, these languages usually support nested, recursive data types and higher-order functions.

Collection queries can be optimized and incrementalized by hand, but this often reduces modularity and is frequently too error-prone to be feasible or to guarantee the maintainability of the resulting programs. This dissertation demonstrates how queries on collections can be optimized and incrementalized systematically and automatically, to free programmers from this burden. The resulting programs are expressed in the same core language, to enable further standard optimizations.

Part I develops a variant of the Scala collections API that uses staging to reify queries as abstract syntax trees. On top of this interface, domain-specific optimizations from programming languages and databases are then applied; among others, queries are rewritten to use indexes chosen by the programmer. Thanks to these indexes, a considerable speedup can be shown: an experimental evaluation shows speedups of 12x on average, up to a maximum of 12800x.

To incrementalize programs with higher-order functions by program transformation, Part II presents an extension of finite differencing [Paige and Koenig 1982; Blakeley et al. 1986; Gupta and Mumick 1999] and develops a first approach to incrementalization by program transformation for programs with higher-order functions. Programs are transformed to derivatives, that is, to programs that turn input changes into output changes. Furthermore, Chapters 12 and 13 prove the correctness of this incrementalization approach for simply-typed and untyped λ-calculus, and discuss extensions to System F.

Derivatives must often reuse results of the original programs: to enable such reuse, Chapter 17 extends the work of Liu and Teitelbaum [1995] to programs with higher-order functions and develops a transformation of such programs into cache-transfer style.

For efficient incrementalization it is moreover necessary to choose suitable primitive operations and to incrementalize them manually. This work covers a large portion of the most important primitive operations on collections. Case studies show significant performance improvements, both in practice and in asymptotic complexity.

Contents

Abstract v
Zusammenfassung vii
Contents ix
List of Figures xv
List of Tables xvii

1 Introduction 1
  1.1 This thesis 2
    1.1.1 Included papers 3
    1.1.2 Excluded papers 3
    1.1.3 Navigating this thesis 4

I Optimizing Collection Queries by Reification 7

2 Introduction 9
  2.1 Contributions and summary 10
  2.2 Motivation 10
    2.2.1 Optimizing by Hand 11

3 Automatic optimization with SQuOpt 15
  3.1 Adapting a Query 15
  3.2 Indexing 17

4 Implementing SQuOpt 19
  4.1 Expression Trees 19
  4.2 Optimizations 21
  4.3 Query Execution 22

5 A deep EDSL for collection queries 23
  5.1 Implementation: Expressing the interface in Scala 24
  5.2 Representing expression trees 24
  5.3 Lifting first-order expressions 25
  5.4 Lifting higher-order expressions 27
  5.5 Lifting collections 29
  5.6 Interpretation 31
  5.7 Optimization 32

6 Evaluating SQuOpt 35
  6.1 Study Setup 35
  6.2 Experimental Units 37
  6.3 Measurement Setup 40
  6.4 Results 40

7 Discussion 43
  7.1 Optimization limitations 43
  7.2 Deep embedding 43
    7.2.1 What worked well 43
    7.2.2 Limitations 44
    7.2.3 What did not work so well: Scala type inference 44
    7.2.4 Lessons for language embedders 46

8 Related work 47
  8.1 Language-Integrated Queries 47
  8.2 Query Optimization 48
  8.3 Deep Embeddings in Scala 48
    8.3.1 LMS 49
  8.4 Code Querying 49

9 Conclusions 51
  9.1 Future work 51

II Incremental λ-Calculus 53

10 Introduction to differentiation 55
  10.1 Generalizing the calculus of finite differences 57
  10.2 A motivating example 57
  10.3 Introducing changes 58
    10.3.1 Incrementalizing with changes 59
  10.4 Differentiation on open terms and functions 61
    10.4.1 Producing function changes 61
    10.4.2 Consuming function changes 62
    10.4.3 Pointwise function changes 62
    10.4.4 Passing change targets 62
  10.5 Differentiation informally 63
  10.6 Differentiation on our example 64
  10.7 Conclusion and contributions 66
    10.7.1 Navigating this thesis part 66
    10.7.2 Contributions 66

11 A tour of differentiation examples 69
  11.1 Change structures as type-class instances 69
  11.2 How to design a language plugin 70
  11.3 Incrementalizing a collection API 70
    11.3.1 Changes to type-class instances 71
    11.3.2 Incrementalizing aggregation 71
    11.3.3 Modifying list elements 73
    11.3.4 Limitations 74
  11.4 Efficient sequence changes 75
  11.5 Products 75
  11.6 Sums, pattern matching and conditionals 76
    11.6.1 Optimizing filter 78
  11.7 Chapter conclusion 78

12 Changes and differentiation, formally 79
  12.1 Changes and validity 79
    12.1.1 Function spaces 82
    12.1.2 Derivatives 83
    12.1.3 Basic change structures on types 84
    12.1.4 Validity as a logical relation 85
    12.1.5 Change structures on typing contexts 85
  12.2 Correctness of differentiation 86
    12.2.1 Plugin requirements 88
    12.2.2 Correctness proof 88
  12.3 Discussion 91
    12.3.1 The correctness statement 91
    12.3.2 Invalid input changes 92
    12.3.3 Alternative environment changes 92
    12.3.4 Capture avoidance 92
  12.4 Plugin requirement summary 94
  12.5 Chapter conclusion 95

13 Change structures 97
  13.1 Formally defining ⊕ and change structures 97
    13.1.1 Example: Group changes 98
    13.1.2 Properties of change structures 99
  13.2 Operations on function changes, informally 100
    13.2.1 Examples of nil changes 100
    13.2.2 Nil changes on arbitrary functions 101
    13.2.3 Constraining ⊕ on functions 102
  13.3 Families of change structures 102
    13.3.1 Change structures for function spaces 102
    13.3.2 Change structures for products 104
  13.4 Change structures for types and contexts 105
  13.5 Development history 107
  13.6 Chapter conclusion 107

14 Equational reasoning on changes 109
  14.1 Reasoning on changes syntactically 109
    14.1.1 Denotational equivalence for valid changes 110
    14.1.2 Syntactic validity 111
  14.2 Change equivalence 114
    14.2.1 Preserving change equivalence 114
    14.2.2 Sketching an alternative syntax 117
    14.2.3 Change equivalence is a PER 118
  14.3 Chapter conclusion 119

15 Extensions and theoretical discussion 121
  15.1 General recursion 121
    15.1.1 Differentiating general recursion 121
    15.1.2 Justification 122
  15.2 Completely invalid changes 122
  15.3 Pointwise function changes 123
  15.4 Modeling only valid changes 124
    15.4.1 One-sided vs two-sided validity 125

16 Differentiation in practice 127
  16.1 The role of differentiation plugins 127
  16.2 Predicting nil changes 128
    16.2.1 Self-maintainability 128
  16.3 Case study 129
  16.4 Benchmarking setup 132
  16.5 Benchmark results 133
  16.6 Chapter conclusion 133

17 Cache-transfer-style conversion 135
  17.1 Introduction 135
  17.2 Introducing Cache-Transfer Style 136
    17.2.1 Background 137
    17.2.2 A first-order motivating example: computing averages 137
    17.2.3 A higher-order motivating example: nested loops 140
  17.3 Formalization 141
    17.3.1 The source language λAL 141
    17.3.2 Static differentiation in λAL 145
    17.3.3 A new soundness proof for static differentiation 146
    17.3.4 The target language iλAL 147
    17.3.5 CTS conversion from λAL to iλAL 151
    17.3.6 Soundness of CTS conversion 152
  17.4 Incrementalization case studies 154
    17.4.1 Averaging integers bags 154
    17.4.2 Nested loops over two sequences 156
    17.4.3 Indexed joins of two bags 157
  17.5 Limitations and future work 159
    17.5.1 Hiding the cache type 159
    17.5.2 Nested bags 159
    17.5.3 Proper tail calls 159
    17.5.4 Pervasive replacement values 159
    17.5.5 Recomputing updated values 160
    17.5.6 Cache pruning via absence analysis 160
    17.5.7 Unary vs n-ary abstraction 160
  17.6 Related work 161
  17.7 Chapter conclusion 162

18 Towards differentiation for System F 163
  18.1 The parametricity transformation 164
  18.2 Differentiation and parametricity 166
  18.3 Proving differentiation correct externally 167
  18.4 Combinators for polymorphic change structures 168
  18.5 Differentiation for System F 171
  18.6 Related work 172
  18.7 Prototype implementation 172
  18.8 Chapter conclusion 172

19 Related work 173
  19.1 Dynamic approaches 173
  19.2 Static approaches 174
    19.2.1 Finite differencing 174
    19.2.2 λ-diff and partial differentials 175
    19.2.3 Static memoization 176

20 Conclusion and future work 177

Appendixes 181

A Preliminaries 181
  A.1 Our proof meta-language 181
    A.1.1 Type theory versus set theory 182
  A.2 Simply-typed λ-calculus 182
    A.2.1 Denotational semantics for STLC 184
    A.2.2 Weakening 185
    A.2.3 Substitution 186
    A.2.4 Discussion: Our mechanization and semantic style 186
    A.2.5 Language plugins 187

B More on our formalization 189
  B.1 Mechanizing plugins modularly and limitations 189

C (Un)typed ILC, operationally 193
  C.1 Formalization 195
    C.1.1 Types and contexts 196
    C.1.2 Base syntax for λA 196
    C.1.3 Change syntax for λΔA 199
    C.1.4 Differentiation 199
    C.1.5 Typing λA→ and λΔA→ 199
    C.1.6 Semantics 200
    C.1.7 Type soundness 200
  C.2 Validating our step-indexed semantics 201
  C.3 Validity syntactically (λA→, λΔA→) 202
  C.4 Step-indexed extensional validity (λA→, λΔA→) 204
  C.5 Step-indexed intensional validity 207
  C.6 Untyped step-indexed validity (λA, λΔA) 210
  C.7 General recursion in λA→ and λΔA→ 210
  C.8 Future work 212
    C.8.1 Change composition 212
  C.9 Development history 213
  C.10 Conclusion 213

D Defunctionalizing function changes 215
  D.1 Setup 215
  D.2 Defunctionalization 217
    D.2.1 Defunctionalization with separate function codes 219
    D.2.2 Defunctionalizing function changes 220
  D.3 Defunctionalization and cache-transfer-style 224

Acknowledgments 229
Bibliography 231

List of Figures

2.1 Definition of the schema and of some content 11
2.2 Our example query on the schema in Fig. 2.1, and a function which postprocesses its result 12
2.3 Composition of queries in Fig. 2.2, after inlining, query unnesting and hoisting 13

3.1 SQuOpt version of Fig. 2.2; recordsQuery contains a reification of the query, records its result 16

5.1 Desugaring of code in Fig. 2.2 29
5.2 Lifting TraversableLike 31

6.1 Measurement Setup Overview 37
6.5 Find covariant equals methods 39
6.6 A simple index definition 40

12.1 Defining differentiation and proving it correct. The rest of this chapter explains and motivates the above definitions 80

13.1 Defining change structures 108

16.1 The λ-term histogram with Haskell-like syntactic sugar 128
16.2 A Scala implementation of primitives for bags and maps 130
16.3 Performance results in log–log scale 134

17.1 Source language λAL (syntax) 142
17.2 Step-indexed big-step semantics for base terms of source language λAL 142
17.3 Step-indexed big-step semantics for the change terms of the source language λAL 144
17.4 Static differentiation in λAL 145
17.5 Step-indexed relation between values and changes 146
17.6 Target language iλAL (syntax) 148
17.7 Target language iλAL (semantics of base terms and caches) 149
17.8 Target language iλAL (semantics of change terms and cache updates) 150
17.9 Cache-Transfer Style (CTS) conversion from λAL to iλAL 151
17.10 Extending CTS translation to values, change values, environments and change environments 152
17.11 Extension of an environment with cache values dF ↑ C i dF′ (for i = 1, 2) 153

18.1 A change structure that allows modifying list elements 170

A.1 Standard definitions for the simply-typed lambda calculus 183

C.1 ANF λ-calculus: λA and λΔA 196
C.2 ANF λ-calculus: λA→ and λΔA→ type system 197
C.3 ANF λ-calculus (λA and λΔA): CBV semantics 198
C.4 Defining extensional validity via logical relations and big-step semantics 203
C.5 Defining extensional validity via step-indexed logical relations and big-step semantics 206
C.6 Defining extensional validity via untyped step-indexed logical relations and big-step semantics 211

D.1 A small example program for defunctionalization 217
D.2 Defunctionalized program 218
D.3 Implementing change structures using Codelike instances 226
D.4 Implementing FunOps using Codelike instances 227


List of Tables

6.2 Performance results. As in Sec. 6.1, (1) denotes the modular Scala implementation, (2) the hand-optimized Scala one, and (3−), (3o), (3x) refer to the SQuOpt implementation when run, respectively, without optimizations, with optimizations, and with optimizations and indexing. Queries marked with the R superscript were selected by random sampling 38

6.3 Average performance ratios. This table summarizes all interesting performance ratios across all queries, using the geometric mean [Fleming and Wallace 1986]. The meaning of speedups is discussed in Sec. 6.1 39

6.4 Description of abstractions removed during hand-optimization and number of queries where the abstraction is used (and optimized away) 39


Chapter 1

Introduction

Many programs perform queries on collections of data, and for non-trivial amounts of data it is useful to execute the queries efficiently. When the data is updated often enough, it can also be useful to update the results of some queries incrementally whenever the input changes, so that up-to-date results are quickly available, even for queries that are expensive to execute.

For instance, a program manipulating civil-registry data about the citizens of a country might need to compute statistics on them, such as their average age, and update those statistics when the set of citizens changes.
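To make the recomputation cost concrete, here is a minimal Scala sketch (with hypothetical Person and citizens names, not taken from later chapters): every change to the collection forces a full traversal unless the statistic is maintained incrementally.

case class Person(name: String, age: Int)

// Recomputing the statistic from scratch costs O(n) at each change.
def averageAge(citizens: Seq[Person]): Double =
  citizens.map(_.age).sum.toDouble / citizens.size

An incremental version would instead maintain a running sum and count, updating both in O(1) when a citizen is added or removed.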

Traditional relational database management systems (RDBMSs) support both query optimization and (in quite a few cases) incremental update of query results (called there incremental view maintenance).

However, often queries are executed on collections of data that are not stored in a database, but in collections manipulated by some program. Moving in-memory data to RDBMSs typically does not improve performance [Stonebraker et al. 2007; Rompf and Amin 2015], and reusing database optimizers is not trivial.

Moreover, many programming languages are far more expressive than RDBMSs. A typical RDBMS can only manipulate SQL relations, that is multisets (or bags) of tuples (or sometimes sets of tuples, after duplicate elimination). Typical programming languages (PLs) also support arbitrarily nested lists and maps of data, and allow programmers to define new data types with few restrictions; a typical PL will also allow a far richer set of operations than SQL.

However, typical PLs do not apply typical database optimizations to collection queries, and if queries are incrementalized, this is often done by hand, even though code implementing incremental queries is error-prone and hard to maintain [Salvaneschi and Mezini 2013].

What's worse, some of these manual optimizations are best done over the whole program: some optimizations only become possible after inlining, which reduces modularity if done by hand.1

Worse, adding an index on some collection can speed up looking for some information, but each index must be maintained incrementally when the underlying data changes. Depending on the actual queries and updates performed on the collection, and on how often they happen, it might turn out that updating an index takes more time than the index saves; hence, the choice of which indexes to enable depends on the whole program. However, adding or removing an index requires updating all PL queries to use it, while RDBMS queries can use an index transparently.
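As an illustration, the following sketch (with hypothetical names; a Book type like the one introduced later in Chapter 2) shows a hand-maintained index: every code path that updates the underlying data must also update the index, and this bookkeeping spreads through the whole program.

var allBooks: Set[Book] = Set()
// A hand-maintained index from publisher name to books.
var byPublisher: Map[String, Set[Book]] =
  Map.empty[String, Set[Book]].withDefaultValue(Set.empty)

def addBook(b: Book): Unit = {
  allBooks += b
  // Every update site must also remember to maintain the index:
  byPublisher += (b.publisher -> (byPublisher(b.publisher) + b))
}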

Overall, manual optimizations are not only effort-intensive and error-prone, but they also significantly reduce modularity.

1In Chapter 2 we define modularity as the ability to abstract behavior in a separate function (possibly part of a different module) to enable reuse and improve understandability.

1.1 This thesis

To reduce the need for manual optimizations, in this thesis we propose techniques for optimizing higher-order collection queries and executing them incrementally. By generalizing database-inspired approaches, we provide approaches to query optimization and incrementalization by code transformation that apply to higher-order collection queries, including user-defined functions. Instead of building on existing approaches to incrementalization, such as self-adjusting computation [Acar 2009], we introduce a novel incrementalization approach which enables further transformations on the incrementalization results. This approach is in fact not restricted to collection queries but applicable to other domains, though it requires adaptation for the domain under consideration. Further research is needed, but a number of case studies suggest applicability even in some scenarios beyond existing techniques [Koch et al. 2016].

We consider the problem for functional programming languages such as Haskell or Scala, and we consider collection queries written using the APIs of their collection libraries, which we treat as an embedded domain-specific language (EDSL). Such APIs (or DSLs) contain powerful operators on collections, such as map, which are higher-order: that is, they take as arguments arbitrary functions in the host language that we must also handle.
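For instance, in Scala even a one-line query interleaves collection operators with ordinary host-language code; the functions passed to filter and map below are arbitrary Scala code that an optimizer must be able to represent (people is a hypothetical collection of person records like the ones sketched above):

val names = people.filter(p => p.age >= 18).map(p => p.name)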

Therefore, our optimizations and incrementalizations must handle programs in higher-order EDSLs that can contain arbitrary code in the host language. Hence, many of our optimizations will exploit properties of our collection EDSL, but will need to handle host-language code. We restrict the problem to purely functional programs (without mutation or other side effects, but mostly including non-termination), because such programs can be more "declarative", and because avoiding side effects can enable more powerful optimizations and simplify the work of the optimizer at the same time.

This thesis is divided into two parts:

• In Part I, we optimize collection queries by static program transformation, based on a set of rewrite rules [Giarrusso et al. 2013].

• In Part II, we incrementalize programs by transforming them to new programs. This thesis presents the first approach that handles higher-order programs through program transformation; hence, while our main examples use collection queries, we phrase the work in terms of λ-calculi with unspecified primitives [Cai, Giarrusso, Rendel and Ostermann 2014]. In Chapter 17, we extend ILC with a further program transformation step, so that base programs can store intermediate results and derivatives can reuse them, but without resorting to dynamic memoization and without necessarily needing to look results up at runtime. To this end, we build on work by Liu [2000] and extend it to a higher-order, typed setting.

Part II is more theoretical than Part I, because optimizations in Part I are much better understood than our approach to incrementalization.

To incrementalize programs, we are the first to extend to higher-order programs techniques based on finite differencing for queries on collections [Paige and Koenig 1982] and databases [Blakeley et al. 1986; Gupta and Mumick 1999]. Incrementalizing by finite differencing is a well-understood technique for database queries. How to generalize it for higher-order programs or beyond databases was less clear, so we spend significant energy on providing sound mathematical foundations for this transformation.

In fact, it took us a while to be sure that our transformation was correct and to understand why our first correctness proof [Cai, Giarrusso, Rendel and Ostermann 2014], while a significant step, was still more complex than needed. In Part II, especially Chapter 12, we offer a mathematically much simpler proof.

Contributions of Part I are listed in Sec. 2.1. Contributions of Part II are listed at the end of Sec. 10.7.

1.1.1 Included papers

This thesis includes material from joint work with colleagues.

Part I is based on work by Giarrusso, Ostermann, Eichberg, Mitschke, Rendel and Kästner [2013], and Chapters 2 to 4, 6, 8 and 9 come from that manuscript. While the work was done in collaboration, a few clearer responsibilities arose: I did most of the implementation work and collaborated on the writing; among other things, I devised the embedding for collection operations, implemented optimizations and indexing, implemented compilation when interpretation did not achieve sufficient performance, and performed the performance evaluation. Michael Eichberg and Ralf Mitschke contributed to the evaluation by adapting FindBugs queries. Christian Kästner contributed, among other things, to the experiment design for the performance evaluation. Klaus Ostermann proposed the original idea (together with an initial prototype) and supervised the project.

The first chapters of Part II are originally based on work by Cai, Giarrusso, Rendel and Ostermann [2014], though significantly revised: Chapter 10 contains significantly revised text from that paper, and Chapters 16 and 19 survive mostly unchanged. This work was even more of a team effort: I initiated and led the overall project and came up with the original notion of change structures. Cai Yufei contributed differentiation itself and its first correctness proof. Tillmann Rendel contributed significant simplifications and started the overall mechanization effort. Klaus Ostermann provided senior supervision that proved essential to the project. Chapters 12 and 13 contain a novel correctness proof for simply-typed λ-calculus; for its history and contributions, see Sec. 13.5.

Furthermore, Chapter 17 comes from a recently submitted manuscript [Giarrusso, Régis-Gianas and Schuster, Submitted]. I designed the overall approach, the transformation and the case study on sequences and nested loops. Proofs were done in collaboration with Yann Régis-Gianas: he is the main author of the Coq mechanization and proof, though I contributed significantly to the correctness proof for ILC, in particular with the proofs described in Appendix C and their partial Agda mechanization. The history of this correctness proof is described in Appendix C.9. Philipp Schuster contributed to the evaluation, devising language plugins for bags and maps in collaboration with me.

1.1.2 Excluded papers

This thesis improves modularity of collection queries by automating different sorts of optimizations on them.

During my PhD work, I collaborated on several other papers, mostly on different facets of modularity, which do not appear in this thesis.

Modularity Module boundaries hide implementation information that is not relevant to clients. However, it often turns out that some scenarios, clients or tasks require such hidden information. As discussed, manual optimization requires hidden information; hence, researchers strive to automate optimizations. But other tasks often violate these boundaries. I collaborated on research on understanding why this happens and how to deal with this problem [Ostermann, Giarrusso, Kästner and Rendel 2011].

Software Product Lines In some scenarios, a different form of modular software is realized through software product lines (SPLs), where many variants of a single software artifact (with different sets of features) can be generated. This is often done using conditional compilation, for instance through the C preprocessor.

But if a software product line uses conditional compilation, processing its source code automatically is difficult; in fact, even lexing or parsing it is a challenge. Since the number of variants is exponential in the number of features, generating and processing each variant does not scale.

To address these challenges, I collaborated with the TypeChef project to develop a variability-aware lexer [Kästner, Giarrusso and Ostermann 2011a] and parser [Kästner, Giarrusso, Rendel, Erdweg, Ostermann and Berger 2011b].

Another challenge is predicting non-functional properties (such as code size or performance) of a variant of a software product line before generation and direct measurement. Such predictions can be useful to efficiently select a variant that satisfies some non-functional requirements. I collaborated on research on predicting such non-functional properties: we measure the properties on a few variants and extrapolate the results to other variants. In particular, I collaborated on the experimental evaluation on a few open-source projects, where even a simple model reached significant accuracy in a few scenarios [Siegmund, Rosenmüller, Kästner, Giarrusso, Apel and Kolesnikov 2011, 2013].

Domain-specific languages This thesis is concerned with DSLs, both ones for the domain of collection queries and also (in Part II) more general ones.

Different forms of language composition, both among DSLs and across general-purpose languages and DSLs, are possible, but their relationship is nontrivial. I collaborated on work classifying the different forms of language composition [Erdweg, Giarrusso and Rendel 2012].

The implementation of SQuOpt, as described in Part I, requires an extensible representation of ASTs, similar to the one used by LMS [Rompf and Odersky 2010]. While this representation is pretty flexible, it relies on Scala's support of GADTs [Emir et al. 2006, 2007b], which is known to be somewhat fragile due to implementation bugs. In fact, sound pattern matching on Scala's extensible GADTs is impossible without imposing significant restrictions on extensibility, due to language extensions not considered during formal study [Emir et al. 2006, 2007a]: the problem arises from the interaction between GADTs, declaration-site variance annotations and variant refinement of type arguments at inheritance time. To illustrate the problem, I have shown that it already applies to an extensible definitional interpreter for λ<: [Giarrusso 2013]. While solutions have been investigated in other settings [Scherer and Rémy 2013], the problem remains open for Scala to this day.

1.1.3 Navigating this thesis

The two parts of this thesis, while related by the common topic of collection queries, can be read mostly independently from each other. Summaries of the two parts are given respectively in Sec. 2.1 and Sec. 10.7.1.

Numbering We use numbering and hyperlinks to help navigation; we explain our conventions here to enable exploiting them.

To simplify navigation, we number all sorts of "theorems" (including here definitions, remarks, even examples or descriptions of notation) per chapter, with counters including section numbers. For instance, Definition 12.2.1 is the first such item in Chapter 12, followed by Theorem 12.2.2, and they both appear in Sec. 12.2. And we can be sure that Lemma 12.2.8 comes after both "theorems" because of its number, even if they are of different sorts.

Similarly, we number tables and figures per chapter, with a shared counter. For instance, Fig. 6.1 is the first figure or table in Chapter 6, followed by Table 6.2.

To help reading, at times we will repeat or anticipate statements of "theorems" to refer to them without forcing readers to jump back and forth. We will then reuse the original number. For instance, Sec. 12.2 contains a copy of Slogan 10.3.3 with its original number.

For ease of navigation, all such references are hyperlinked in electronic versions of this thesis, and the table of contents is exposed to PDF readers via PDF bookmarks.

Part I

Optimizing Collection Queries by Reification

Chapter 2

Introduction

In-memory collections of data often need efficient processing. For on-disk data, efficient processing is already provided by database management systems (DBMSs), thanks to their query optimizers, which support many optimizations specific to the domain of collections. Moving in-memory data to DBMSs, however, typically does not improve performance [Stonebraker et al. 2007], and query optimizers cannot be reused separately, since DBMSs are typically monolithic and their optimizers deeply integrated. A few collection-specific optimizations, such as shortcut fusion [Gill et al. 1993], are supported by compilers for purely functional languages such as Haskell. However, the implementation techniques for those optimizations do not generalize to many other ones, such as support for indexes. In general, collection-specific optimizations are not supported by the general-purpose optimizers used by typical (JIT) compilers.

Therefore, programmers needing collection-related optimizations perform them manually. To allow that, they are often forced to perform manual inlining [Peyton Jones and Marlow 2002]. But manual inlining modifies source code by combining distinct functions together, while often distinct functions should remain distinct, because they deal with different concerns, or because one function needs to be reused in a different context. In either case, manual inlining reduces modularity, defined here as the ability to abstract behavior in a separate function (possibly part of a different module) to enable reuse and improve understandability.

For these reasons, developers currently need to choose between modularity and performance, as also highlighted by Kiczales et al. [1997] on a similar example. Instead, we envision that they should rely on an automatic optimizer performing inlining and collection-specific optimizations. They would then achieve both performance and modularity.1

One way to implement such an optimizer would be to extend the compiler of the language with a collection-specific optimizer, or to add some kind of external preprocessor to the language. However, such solutions would be rather brittle (for instance, they lack composability with other language extensions), and they would preclude optimization opportunities that arise only at runtime.

For this reason, our approach is implemented as an embedded domain-specific language (EDSL), that is, as a regular library. We call this library SQuOpt, the Scala QUery OPTimizer. SQuOpt consists of a Scala EDSL for queries on collections, based on the Scala collections API. An expression in this EDSL produces at run time an expression tree in the host language: a data structure which represents the query to execute, similar to an abstract syntax tree (AST) or a query plan.

1In the terminology of Kiczales et al. [1997], our goal is to be able to decompose different generalized procedures of a program according to its primary decomposition, while separating the handling of some performance concerns. To this end, we are modularizing these performance concerns into a metaprogramming-based optimization module, which we believe could be called, in that terminology, an aspect.

Thanks to the extensibility of Scala, expressions in this language look almost identical to expressions with the same meaning in Scala. When executing the query, SQuOpt optimizes and compiles these expression trees for more efficient execution. Doing optimization at run time instead of compile time avoids the need for control-flow analyses to determine which code will actually be executed [Chambers et al. 2010], as we will see later.

We have chosen Scala [Odersky et al. 2011] to implement our library for two reasons: (i) Scala is a good meta-language for EDSLs, because it is syntactically flexible and has a powerful type system, and (ii) Scala has a sophisticated collections library with an attractive syntax (for-comprehensions) to specify queries.

To evaluate SQuOpt, we study queries from the FindBugs tool [Hovemeyer and Pugh 2004]. We rewrote a set of queries to use the Scala collections API and show that modularization incurs significant performance overhead. Subsequently, we consider versions of the same queries using SQuOpt. We demonstrate that the automatic optimization can reconcile modularity and performance in many cases. Adding advanced optimizations, such as indexing, can even improve the performance of the analyses beyond the original non-modular analyses.

2.1 Contributions and summary

Overall, our main contributions in Part I are the following:

• We illustrate the tradeoff between modularity and performance when manipulating collections, caused by the lack of domain-specific optimizations (Sec. 2.2). Conversely, we illustrate how domain-specific optimizations lead to more readable and more modular code (Chapter 3).

• We present the design and implementation of SQuOpt, an embedded DSL for queries on collections in Scala (Chapter 4).

• We evaluate SQuOpt to show that it supports writing queries that are at the same time modular and fast. We do so by re-implementing several code analyses of the FindBugs tool. The resulting code is more modular and/or more efficient, in some cases by orders of magnitude. In these case studies, we measured average speedups of 12x, with a maximum of 12800x (Chapter 6).

2.2 Motivation

In this section, we show how the absence of collection-specific optimizations forces programmers to trade modularity against performance, which motivates our design of SQuOpt to resolve this conflict.

As our running example throughout the chapter, we consider representing and querying a simple in-memory bibliography. A book has, in our schema, a title, a publisher and a list of authors. Each author, in turn, has a first and a last name. We represent authors and books as instances of the Scala classes Author and Book, shown in Fig. 2.1. The class declarations list the type of each field: titles, publishers and first and last names are all stored in fields of type String. The list of authors is stored in a field of type Seq[Author], that is, a sequence of authors (something that would be more complex to model in a relational database). The code fragment also defines a collection of books, named books.

As a common idiom to query such collections, Scala provides for-comprehensions. For instance, the for-comprehension computing records in Fig. 2.2 finds all books published by Pearson Education and yields, for each of those books and for each of its authors, a record containing the book title,

package schema
case class Author(firstName: String, lastName: String)
case class Book(title: String, publisher: String,
                authors: Seq[Author])

val books: Set[Book] = Set(
  new Book("Compilers: Principles, Techniques and Tools",
           "Pearson Education",
           Seq(new Author("Alfred V.", "Aho"),
               new Author("Monica S.", "Lam"),
               new Author("Ravi", "Sethi"),
               new Author("Jeffrey D.", "Ullman")))
  /* other books ... */)

Figure 2.1: Definition of the schema and of some content.

the full name of that author and the number of additional coauthors. The generator book <- books functions like a loop header: the remainder of the for-comprehension is executed once per book in the collection. Consequently, the generator author <- book.authors starts a nested loop. The return value of the for-comprehension is a collection of all yielded records. Note that if a book has multiple authors, this for-comprehension will return multiple records relative to this book, one for each author.

We can further process this collection with another for-comprehension, possibly in a different module. For example, still in Fig. 2.2, the function titleFilter filters book titles containing the word "Principles" and drops from each record the number of additional coauthors.

In Scala, the implementation of for-comprehensions is not fixed. Instead, the compiler desugars a for-comprehension to a series of API calls, and different collection classes can implement this API differently. Later, we will use this flexibility to provide an optimizing implementation of for-comprehensions, but in this section we focus on the behavior of the standard Scala collections, which implement for-comprehensions as loops that create intermediate collections.
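For instance, the compiler desugars the records query of Fig. 2.2 into calls of withFilter, flatMap and map; the following sketch shows the rough shape of the desugared code, modulo compiler-generated details (Fig. 5.1 shows the actual desugaring):

val records =
  books
    .withFilter(book => book.publisher == "Pearson Education")
    .flatMap(book => book.authors.map(author =>
      new BookData(book.title,
        author.firstName + " " + author.lastName,
        book.authors.size - 1)))

Each of flatMap and map here materializes its result eagerly, which is the source of the intermediate collections discussed next.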

2.2.1 Optimizing by Hand

In the naive implementation in Fig. 2.2, different concerns are separated, hence it is modular. However, it is also inefficient. To execute this code, we first build the original collection and only later perform further processing to build the new result; creating the intermediate collection at the interface between these functions is costly. Moreover, the same book can appear in records more than once if the book has more than one author, but all of these duplicates have the same title. Nevertheless, we test each duplicate title separately to see whether it contains the searched keyword. If books have 4 authors on average, this means a slowdown of a factor of 4 for the filtering step.

In general, one can only resolve these inefficiencies by manually optimizing the query; however, we will observe that these manual optimizations produce less modular code.2

To address the first problem above, that is, to avoid creating intermediate collections, we can manually inline titleFilter and records; we obtain two nested for-comprehensions. Furthermore, we can unnest the inner one [Fegaras and Maier 2000], obtaining roughly the sketch shown below.
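After inlining and unnesting, but before the hoisting described next, the combined query looks roughly like this sketch (note that the title test still runs once per author):

val res =
  for {
    book <- books
    if book.publisher == "Pearson Education"
    author <- book.authors
    if book.title.contains("Principles")
  } yield (book.title, author.firstName + " " + author.lastName)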

2The existing Scala collections API supports optimization, for instance through non-strict variants of the query operators (called 'views' in Scala), but they can only be used for a limited set of optimizations, as we discuss in Chapter 8.

case class BookData(title: String, authorName: String, coauthors: Int)

val records =
  for {
    book <- books
    if book.publisher == "Pearson Education"
    author <- book.authors
  } yield new BookData(book.title,
      author.firstName + " " + author.lastName,
      book.authors.size - 1)

def titleFilter(records: Set[BookData],
                keyword: String) =
  for {
    record <- records
    if record.title.contains(keyword)
  } yield (record.title, record.authorName)

val res = titleFilter(records, "Principles")

Figure 2.2: Our example query on the schema in Fig. 2.1, and a function which postprocesses its result.

To address the second problem above, that is, to avoid testing the same title multiple times, we hoist the filtering step; that is, we change the order of the processing steps in the query to first look for keyword within book.title and only then iterate over the set of authors. This does not change the overall semantics of the query, because the filter only accesses the title and does not depend on the author. In the end, we obtain the code in Fig. 2.3. The resulting query processes the title of each book only once. Since filtering in Scala is done lazily, the resulting query avoids building an intermediate collection.

This second optimization is only possible after inlining, and thereby after reducing the modularity of the code, because it mixes together processing steps from titleFilter and from the definition of records. Therefore, reusing the code creating records would now be harder.

To make titleFilterHandOpt more reusable, we could turn the publisher name into a parameter. However, the new version of titleFilter cannot be reused as-is if some details of the inlined code change: for instance, we might need to filter publishers differently, or not at all. On the other hand, if we express queries modularly, we might lose some opportunities for optimization. The design of the collections API, both in Scala and in typical languages, forces us to manually optimize our code by repeated inlining and subsequent application of query optimization rules, which leads to a loss of modularity.

def titleFilterHandOpt(books: Set[Book],
                       publisher: String, keyword: String) =
  for {
    book <- books
    if book.publisher == publisher && book.title.contains(keyword)
    author <- book.authors
  } yield (book.title, author.firstName + " " + author.lastName)

val res = titleFilterHandOpt(books,
  "Pearson Education", "Principles")

Figure 2.3: Composition of queries in Fig. 2.2, after inlining, query unnesting and hoisting.

Chapter 3

Automatic optimization with SQuOpt

The goal of SQuOpt is to let programmers write queries modularly and at a high level of abstraction, and to delegate optimization to a dedicated domain-specific optimizer. In our concrete example, programmers should be able to write queries similar to the one in Fig. 2.2, but get the efficiency of the one in Fig. 2.3. To allow this, SQuOpt overloads for-comprehensions and other constructs, such as string concatenation with + and field access book.author. Our overloads of these constructs reify the query as an expression tree. SQuOpt can then optimize this expression tree and execute the resulting optimized query. Programmers explicitly trigger processing by SQuOpt by adapting their queries, as we describe in the next subsection.

3.1 Adapting a Query

To use SQuOpt instead of native Scala queries, we first assume that the query does not use side effects and is thus purely functional. We argue that purely functional queries are more declarative. Side effects are used to improve performance, but SQuOpt makes that unnecessary through automatic optimizations; in fact, the lack of side effects enables more optimizations.

In Fig. 3.1, we show a version of our running example adapted to use SQuOpt. We first discuss the changes to records. To enable SQuOpt, a programmer needs to (a) import the SQuOpt library, (b) import some wrapper code specific to the types the collection operates on, in this case Book and Author (more about that later), (c) convert explicitly the native Scala collections involved to collections of our framework, by a call to asSquopt, (d) rename a few operators, such as == to ==# (this is necessary due to some Scala limitations), and (e) add a separate step where the query is evaluated (possibly after optimization). All these changes are lightweight and mostly of a syntactic nature.

For parameterized queries like titleFilter, we also need to adapt type annotations. The ones in titleFilterQuery reveal some details of our implementation: expressions that are reified have type Exp[T] instead of T. As the code shows, resQuery is optimized before compilation. This call will perform the optimizations that we previously did by hand and will return a query equivalent to that in Fig. 2.3, after verifying their safety conditions. For instance, after inlining, the filter if book.title.contains(keyword) does not reference author, hence it is safe to hoist. Note that checking this safety condition would not be possible without reifying the predicate: for instance, it would not be sufficient to only reify the calls to the collection API, because the predicate

import squopt_import schemasquopt_

val recordsQuery =for

book larr booksasSquoptif bookpublisher == Pearson Educationauthor larr bookauthors

yield new BookData(booktitle authorfirstName + + authorlastName bookauthorssize - 1)

val records = recordsQueryeval

def titleFilterQuery(records Exp[Set[BookData]] keyword Exp[String ]) = for record larr recordsif recordtitlecontains(keyword)

yield (recordtitle recordauthorName)

val resQuery = titleFilterQuery(recordsQuery Principles)val res = resQueryoptimizeeval

Figure 3.1: SQuOpt version of Fig. 2.2. recordsQuery contains a reification of the query, records its result.


is represented as a boolean function parameter. In general, our automatic optimizer inspects the whole reification of the query implementation to check that optimizations do not introduce changes in the overall result of the query and are therefore safe.

3.2 Indexing

SQuOpt also supports the transparent usage of indexes. Indexes can further improve the efficiency of queries, sometimes by orders of magnitude. In our running example, the query scans all books to look for the ones having the right publisher. To speed up this query, we can preprocess books to build an index, that is, a dictionary mapping from each publisher to a collection of all the books it published. This index can then be used to answer the original query without scanning all books.

We construct a query representing the desired dictionary and inform the optimizer that it should use this index where appropriate:

val idxByPublisher = books.asSquopt.indexBy(_.publisher)
Optimization.addIndex(idxByPublisher)

The indexBy collection method accepts a function that maps a collection element to a key: coll.indexBy(key) returns a dictionary mapping each key to the collection of all elements of coll having that key. Missing keys are mapped to an empty collection.¹ Optimization.addIndex simply preevaluates the index and updates a dictionary mapping the index to its preevaluated result.
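Outside the reified setting, the semantics of indexBy can be sketched in plain Scala as groupBy plus a default for missing keys (a sketch only; indexBy itself is part of SQuOpt's lifted API):

// Plain-Scala sketch of indexBy's behavior, not SQuOpt's implementation.
def indexBy[T, K](coll: Set[T])(key: T => K): Map[K, Set[T]] =
  coll.groupBy(key).withDefaultValue(Set.empty[T])

// Hypothetical usage mirroring the example above:
//   indexBy(books)(_.publisher)("Pearson Education")   // all matching books
//   indexBy(books)(_.publisher)("SomeOtherPublisher")  // Set(), not an error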

A call to optimize on a query will then take this index into account and rewrite the query to perform index lookup instead of scanning, if possible. For instance, the code in Fig. 3.1 would be transparently rewritten by the optimizer to a query similar to the following:

val indexedQuery = for {
  book <- idxByPublisher("Pearson Education")
  author <- book.authors
} yield new BookData(book.title,
  author.firstName + " " + author.lastName,
  book.authors.size - 1)

Since dictionaries in Scala are functions, in the above code dictionary lookup on idxByPublisher is represented simply as function application. The above code iterates over books having the desired publisher instead of scanning the whole library, and performs the remaining computation from the original query. Although the index use in the listing above is notated as idxByPublisher("Pearson Education"), only the cached result of evaluating the index is used when the query is executed, not the reified index definition.

This optimization could also be performed manually, of course, but the queries are on a higher abstraction level and more maintainable if indexing is defined separately and applied automatically. Manual application of indexing is a crosscutting concern, because adding or removing an index potentially affects many queries. SQuOpt does not free the developer from the task of assessing which index will 'pay off' (we have not considered automatic index creation yet), but at least it becomes simple to add or remove an index, since the application of the indexes is modularized in the optimizer.

¹For readers familiar with the Scala collection API, we remark that the only difference with the standard groupBy method is the handling of missing keys.


Chapter 4

Implementing SQuOpt

After describing how to use SQuOpt, we explain how SQuOpt represents queries internally and optimizes them. Here we give only a brief overview of our implementation technique; it is described in more detail in Sec. 5.1.

4.1 Expression Trees

In order to analyze and optimize collection queries at runtime, SQuOpt reifies their syntactic structure as expression trees. The expression tree reflects the syntax of the query after desugaring, that is, after for-comprehensions have been replaced by API calls. For instance, recordsQuery from Fig. 3.1 points to the following expression tree (with some boilerplate omitted for clarity):

new FlatMap(
  new Filter(
    new Const(books),
    v2 => new Eq(new Book_publisher(v2),
      new Const("Pearson Education"))),
  v3 => new MapNode(
    new Book_authors(v3),
    v4 => new BookData(
      new Book_title(v3),
      new StringConcat(
        new StringConcat(
          new Author_firstName(v4),
          new Const(" ")),
        new Author_lastName(v4)),
      new Plus(new Size(new Book_authors(v3)),
        new Negate(new Const(1))))))

The structure of the for-comprehension is encoded with the FlatMap, Filter and MapNode instances. These classes correspond to the API methods that for-comprehensions get desugared to; SQuOpt arranges for the implementation of flatMap to construct a FlatMap instance, etc. The instances of the other classes encode the rest of the structure of the collection query, that is, which methods are called on which arguments. On the one hand, SQuOpt defines classes such as Const or Eq that are generic and applicable to all queries. On the other hand, classes such as Book_publisher cannot be predefined, because they are specific to the user-defined types used



in a query. SQuOpt provides a small code generator which creates a case class for each method and field of a user-defined type. Functions in the query are represented by functions that create expression trees; representing functions in this way is frequently called higher-order abstract syntax [Pfenning and Elliot, 1988].

We can see that the reification of this code corresponds closely to an abstract syntax tree for the code which is executed; however, many calls to specific methods like map are represented by special nodes like MapNode, rather than as method calls. For the optimizer, it becomes easier to match and transform those nodes than with a generic abstract syntax tree.

Nodes for collection operations are carefully defined by hand to provide them highly generic type signatures and make them reusable for all collection types. In Scala, collection operations are highly polymorphic: for instance, map has a single implementation working on all collection types like List, Set, and we similarly want to represent all usages of map through instances of a single node type, namely MapNode. Having separate nodes ListMapNode, SetMapNode and so on would be inconvenient, for instance when writing the optimizer. However, map on a List[Int] will produce another List, while on a Set it will produce another Set, and so on for each specific collection type (in first approximation); moreover, this is guaranteed statically by the type of map. Yet, thanks to advanced type-system features, map is defined only once, avoiding redundancy, but has a type polymorphic enough to guarantee statically that the correct return value is produced. Since our tree representation is strongly typed, we need to have a similar level of polymorphism in MapNode. We achieved this by extending the techniques described by Odersky and Moors [2009], as detailed in our technical report [Giarrusso et al., 2012].

We get these expression trees by using Scala implicit conversions in a particular style, which we adopted from Rompf and Odersky [2010]. Implicit conversions allow us to add, for each method A.foo(B), an overload of Exp[A].foo(Exp[B]). Where a value of type Exp[T] is expected, a value of type T can be used, thanks to other implicit conversions which wrap it in a Const node. The initial call of asSquopt triggers the application of the implicit conversions by converting the collection to the leaf of an expression tree.

It is also possible to call methods that do not return expression trees; however, such method calls would then only be represented by an opaque MethodCall node in the expression tree, which means that the code of the method cannot be considered in optimizations.

Crucially, these expression trees are generated at runtime. For instance, the first Const contains a reference to the actual collection of books to which books refers. If a query uses another query, such as records in Fig. 3.1, then the subquery is effectively inlined. The same holds for method calls inside queries: if these methods return an expression tree (such as the titleFilterQuery method in Fig. 3.1), then these expression trees are inlined into the composite query. Since the reification happens at runtime, it is not necessary to predict the targets of dynamically bound method calls: a new (and possibly different) expression tree is created each time a block of code containing queries is executed.

Hence, we can say that expression trees represent the computation which is going to be executed after inlining; control flow or virtual calls in the original code typically disappear, especially if they manipulate the query as a whole. This is typical of deeply embedded DSLs like ours, where code, instead of performing computations, produces a representation of the computation to perform [Elliott et al., 2003; Chambers et al., 2010].

This inlining can duplicate computations; for instance, in this code:

val num: Exp[Int] = 10
val square = num * num
val sum = square + square


evaluating sum will evaluate square twice. Like Elliott et al. [2003], we avoid this using common-subexpression elimination.
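A minimal sketch of common-subexpression elimination via hash-consing, over a toy arithmetic tree (the node types here are illustrative stand-ins, not SQuOpt's own):

sealed trait E
case class Num(n: Int) extends E
case class Times(a: E, b: E) extends E
case class Plus(a: E, b: E) extends E

import scala.collection.mutable

// Structurally equal subtrees are rebuilt once and shared afterwards, so a
// backend that emits one binding per shared node evaluates `square` only once.
def cse(e: E, seen: mutable.Map[E, E] = mutable.Map.empty): E =
  seen.getOrElseUpdate(e, e match {
    case Times(a, b) => Times(cse(a, seen), cse(b, seen))
    case Plus(a, b)  => Plus(cse(a, seen), cse(b, seen))
    case leaf        => leaf
  })

// cse(Plus(Times(Num(10), Num(10)), Times(Num(10), Num(10)))) yields a tree
// whose two Times children are the same object.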

4.2 Optimizations

Our optimizer currently supports several algebraic optimizations. Any query, and in fact every reified expression, can be optimized by calling the optimize function on it. The ability to optimize reified expressions that are not queries is useful; for instance, optimizing a function that produces a query is similar to a "prepared statement" in relational databases.

The optimizations we implemented are mostly standard in compilers [Muchnick, 1997] or databases:

• Query unnesting merges a nested query into the containing one [Fegaras and Maier, 2000; Grust and Scholl, 1999], replacing for instance

for { val1 <- (for { val2 <- coll } yield f(val2)) } yield g(val1)

with

for { val2 <- coll; val1 = f(val2) } yield g(val1)

• Bulk operation fusion fuses higher-order operators on collections.

• Filter hoisting tries to apply filters as early as possible; in database query optimization, it is known as selection pushdown. For filter hoisting, it is important that the full query is reified, because otherwise the dependencies of the filter condition cannot be determined.

• We reduce tuple/case class accesses during optimization (see the sketch after this list). For instance, (a, b)._1 is simplified to a. This is important because the produced expression does not depend on b; removing this false dependency can allow, for instance, a filter containing this expression to be hoisted to a context where b is not bound.

• Indexing tries to apply one or more of the available indexes to speed up the query.

• Common subexpression elimination (CSE) avoids that the same computation is performed multiple times; we use techniques similar to Rompf and Odersky [2010].

• Smaller optimizations include constant folding, reassociation of associative operators, and removal of identity maps (coll.map(x => x), typically generated by the translation of for-comprehensions).
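The tuple-access reduction mentioned above can be sketched as a rewrite over stand-in node types (these are illustrative, not SQuOpt's actual classes):

sealed trait Exp[T]
case class Tuple2Node[A, B](a: Exp[A], b: Exp[B]) extends Exp[(A, B)]
case class Proj1[A, B](pair: Exp[(A, B)]) extends Exp[A]

// Rewrite (a, b)._1 to a; the result no longer mentions b, so an enclosing
// filter can be hoisted past b's binding.
def simplifyProj1(e: Exp[_]): Exp[_] = e match {
  case Proj1(Tuple2Node(a, _)) => a
  case other                   => other
}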

Each optimization is applied recursively bottom-up until it does not trigger anymore; different optimizations are composed in a fixed pipeline.

Optimizations are only guaranteed to be semantics-preserving if queries obey the restrictions we mentioned: for instance, queries should not involve side effects, such as assignments or I/O, and all collections used in queries should implement the specifications stated in the collections API. Obviously, the choice of optimizations involves many tradeoffs; for that reason, we believe that it is all the more important that the optimizer is not hard-wired into the compiler, but implemented as a library, with potentially many different implementations.

To make changes to the optimizer more practical, we designed our query representation so that optimizations are easy to express; restricting to pure queries also helps. For instance,


filter fusion can be implemented in a few lines of code: just match against expressions of the form coll.filter(pred2).filter(pred1) and rewrite them:¹

val mergeFilters = ExpTransformer {
  case Sym(Filter(Sym(Filter(collection, pred2)), pred1)) =>
    collection.filter(x => pred2(x) && pred1(x))
}

A more complex optimization such as filter hoisting requires only 20 lines of code. We have implemented a prototype of the optimizer with the mentioned optimizations. Many

additional algebraic optimizations can be added in future work, by us or others; a candidate would be loop hoisting, which moves out of loops arbitrary computations not depending on the loop variable (and not just filters). With some changes to the optimizer's architecture, it would also be possible to perform cost-based and dynamic optimizations.

4.3 Query Execution

Calling the eval method on a query will convert it to executable bytecode; this bytecode will be loaded and invoked by using Java reflection. We produce a thunk that, when evaluated, will execute the generated code.

In our prototype, we produce bytecode by converting expression trees to Scala code and invoking on the result the Scala compiler, scalac. Invoking scalac is typically quite slow, and we currently use caching to limit this concern; however, we believe it is merely an engineering problem to produce bytecode directly from expression trees, just as compilers do.

Our expression trees contain native Scala values wrapped in Const nodes, and in many cases one cannot produce Scala program text evaluating to the same value. To allow executing such expression trees, we need to implement cross-stage persistence (CSP): the generated code will be a function accepting the actual values as arguments [Rompf and Odersky, 2010]. This allows sharing the compiled code for expressions which differ only in the embedded values.

More in detail, our compilation algorithm is as follows. (a) We implement CSP by replacing embedded Scala values by references to the function arguments; so, for instance, List(1, 2, 3).map(x => x + 1) becomes the function (s1: List[Int], s2: Int) => s1.map(x => x + s2). (b) We look up the produced expression tree, together with the types of the constants we just removed, in a cache mapping to the generated classes; if the lookup fails, we update the cache with the result of the next steps. (c) We apply CSE on the expression. (d) We convert the tree to code, compile it, and load the generated code.

Preventing errors in generated code. Compiler errors in generated code are typically a concern with SQuOpt; however, they can only arise due to implementation bugs in SQuOpt (for instance in pretty-printing, which cannot be checked statically), so they do not concern users. Since our query language and tree representation are statically typed, type-incorrect queries will be rejected statically. For instance, consider again idxByPublisher, described previously:

val idxByPublisher = books.asSquopt.indexBy(_.publisher)

Since Book.publisher returns a String, idxByPublisher has type Exp[Map[String, Book]]. Looking up a key of the wrong type, for instance by writing idxByPublisher(book) where book: Book, will make scalac emit a static type error.

¹Sym nodes are part of the boilerplate we omitted earlier.

Chapter 5

A deep EDSL for collection queries

In this chapter, we discuss collections as a critical case study [Flyvbjerg, 2006] for deep DSL embedding.

As discussed in the previous chapter, to support optimizations we require a deep embedding of the collections DSL.

While the basic idea of deep embedding is well known, it is not obvious how to realize deep embedding when considering the following additional goals:

• To support users adopting SQuOpt, a generic SQuOpt query should share the "look and feel" of the ordinary collections API. In particular, query syntax should remain mostly unchanged. In our case, we want to preserve Scala's for-comprehension¹ syntax and its notation for anonymous functions.

• Again to support users adopting SQuOpt, a generic SQuOpt query should not only share the syntax of the ordinary collections API; it should also be well-typed if and only if the corresponding ordinary query is well-typed. This is particularly challenging in the Scala collections library, due to its deep integration with advanced type-system features such as higher-kinded generics and implicit objects [Odersky and Moors, 2009]. For instance, calling map on a List will return a List, and calling map on a Set will return a Set. Hence, the object-language representation and the transformations thereof should be as "typed" as possible. This precludes, among others, a first-order representation of object-language variables as strings.

• SQuOpt should be interoperable with ordinary Scala code and Scala collections. For instance, it should be possible to call normal, non-reified functions within a SQuOpt query, or mix native Scala queries and SQuOpt queries.

• The performance of SQuOpt queries should be reasonable even without optimizations. A non-optimized SQuOpt query should not be dramatically slower than a native Scala query. Furthermore, it should be possible to create new queries at run time and execute them without excessive overhead. This goal limits the options of applicable interpretation or compilation techniques.

We think that these challenges characterize deep embedding of queries on collections as a critical case study [Flyvbjerg, 2006] for DSL embedding. That is, it is so challenging that embedding

¹Also known as for expressions [Odersky et al., 2011, Ch. 23].



techniques successfully used in this case are likely to be successful on a broad range of other DSLs. In this chapter, we report from the case study the successes and failures of achieving these goals in SQuOpt.

5.1 Implementation: Expressing the interface in Scala

To optimize a query as described in the previous section, SQuOpt needs to reify, optimize and execute queries. Our implementation assigns responsibility for these steps to three main components: a generic library for reification and execution of general Scala expressions, a more specialized library for reification and execution of query operators, and a dedicated query optimizer. Queries then need to be executed, through either compilation (already discussed in Sec. 4.3) or interpretation (to be discussed in Sec. 5.6). We describe the implementation in more detail in the rest of this section. The full implementation is also available online.²

A core idea of SQuOpt is to reify Scala code as a data structure in memory. A programmer could directly create instances of that data structure, but we also provide a more convenient interface, based on advanced Scala features such as implicit conversions and type inference. That interface makes it possible to automatically reify code with a minimum of programmer annotations, as shown in the examples in Chapter 3. Since this is a case study on Scala's support for deep embedding of DSLs, we also describe in this section how Scala supports our task. In particular, we report on techniques we used and issues we faced.

5.2 Representing expression trees

In the previous section, we have seen that expressions that would have type T in a native Scala query are reified and have type Exp[T] in SQuOpt. The generic type Exp[T] is the base for our reification of Scala expressions as expression trees, that is, as data structures in memory. We provide a subclass of Exp[T] for every different form of expression we want to reify. For example, in Fig. 2.2 the expression author.firstName + " " + author.lastName must be reified, even though it is not collection-related, for otherwise the optimizer could not see whether author is used. Knowing this is needed, for instance, to remove variables which are bound but not used. Hence, this expression is reified as:

StringConcat(StringConcat(AuthorFirstName(author), Const(" ")),
  AuthorLastName(author))

This example uses the constructors of the following subclasses of Exp[T] to create the expression tree:

case class Const[T](t: T) extends Exp[T]
case class StringConcat(str1: Exp[String], str2: Exp[String]) extends Exp[String]
case class AuthorFirstName(t: Exp[Author]) extends Exp[String]
case class AuthorLastName(t: Exp[Author]) extends Exp[String]

Expression nodes additionally implement support code for tree traversals, to support optimizations; we omit this support code here.

This representation of expression trees is well-suited for a representation of the structure of expressions in memory, and also for pattern matching (which is automatically supported for case classes in Scala), but inconvenient for query writers. In fact, in Fig. 3.1 we have seen that SQuOpt provides a much more convenient front-end: the programmer writes almost the usual code for type T, and SQuOpt converts it automatically to Exp[T].

²http://www.informatik.uni-marburg.de/~pgiarrusso/SQuOpt


5.3 Lifting first-order expressions

We call the process of converting from T to Exp[T] lifting. Here we describe how we lift first-order expressions, that is, Scala expressions that do not contain anonymous function definitions.

To this end, consider again the fragment

author.firstName + " " + author.lastName

now in the context of the SQuOpt-enabled query in Fig. 3.1. It looks like a normal Scala expression, even syntactically unchanged from Fig. 2.2. However, evaluating that code in the context of Fig. 3.1 does not concatenate any strings, but creates an expression tree instead. Although the code looks like the same expression, it has a different type: Exp[String] instead of String. This difference in the type is caused by the context: the variable author is now bound in a SQuOpt-enabled query and therefore has type Exp[Author] instead of Author. We can still access the firstName field of author, because expression trees of type Exp[T] provide the same interface as values of type T, except that all operations return expression trees instead of values.

To understand how an expression tree of type Exp[T] can have the same interface as a value of type T, we consider two expression trees str1 and str2 of type Exp[String]. The implementation of lifting differs depending on the kind of expression we want to lift.

Method calls and operators. In our example, the operator + should be available on Exp[String] but not on Exp[Boolean], because + is available on String but not on Boolean. Furthermore, we want str1 + str2 to have type Exp[String] and to evaluate not to a string concatenation, but to a call of StringConcat, that is, to an expression tree which represents str1 + str2. This is a somewhat unusual requirement, because usually the interface of a generic type does not depend on the type parameters.

To provide such operators and to encode expression trees, we use implicit conversions in a similar style as Rompf and Odersky [2010]. Scala allows making expressions of a type T implicitly convertible to a different type U. To this end, one must declare an implicit conversion function having type T => U. Calls to such functions will be inserted by the compiler when required to fix a type mismatch between an expression of type T and a context expecting a value of type U. In addition, a method call e.m(args) can trigger the conversion of e to a type where the method m is present.³ Similarly, an operator usage as in str1 + str2 can also trigger an implicit conversion: an expression using an operator, like str1 + str2, is desugared to the method call str1.+(str2), which can trigger an implicit conversion from str1 to a type providing the + method. Therefore, from now on we do not distinguish between operators and methods.

To provide the method + on Exp[String], we define an implicit conversion from Exp[String] to a new type providing a + method which creates the appropriate expression node:

implicit def expToStringOps(t: Exp[String]) = new StringOps(t)
class StringOps(t: Exp[String]) {
  def +(that: Exp[String]): Exp[String] = StringConcat(t, that)
}

This is an example of the well-known Scala enrich-my-library pattern⁴ [Odersky, 2006]. With these declarations in scope, the Scala compiler rewrites str1 + str2 to expToStringOps(str1).+(str2),

which evaluates to StringConcat(str1, str2), as desired. The implicit conversion function expToStringOps is not applicable to Exp[Boolean], because it explicitly specifies the receiver of the +-call to have type Exp[String]. In other words, expressions like str1 + str2 are now lifted

³For the exact rules, see Odersky et al. [2011, Ch. 21] and Odersky [2011].
⁴Also known as pimp-my-library pattern.


on the level of expression trees in a type-safe way. For brevity, we refer to the defined operator as Exp[String].+.

Literal values. However, a string concatenation might also include constants, as in str1 + "foo" or "bar" + str1. To lift str1 + "foo", we introduce a lifting for constants, which wraps them in a Const node:

implicit def pure[T](t: T): Exp[T] = Const(t)

The compiler will now rewrite str1 + "foo" to expToStringOps(str1) + pure("foo"), and str1 + "foo" + str2 to expToStringOps(str1) + pure("foo") + str2. Different implicit conversions cooperate successfully to lift the expression.

Analogously, it would be convenient if the similar expression "bar" + str1 would be rewritten to expToStringOps(pure("bar")) + str1, but this is not the case, because implicit coercions are not chained automatically in Scala. Instead, we have to manually chain existing implicit conversions into a new one:

implicit def toStringOps(t: String) = expToStringOps(pure(t))

so that "bar" + str1 is rewritten to toStringOps("bar") + str1.

User-defined methods. Calls of user-defined methods, like author.firstName, are lifted the

same way as calls to built-in methods such as string concatenation, shown earlier. For the running example, the following definitions are necessary to lift the methods from Author to Exp[Author]:

package schema.squopt

implicit def expToAuthorOps(t: Exp[Author]) = new AuthorOps(t)
implicit def toAuthorOps(t: Author) = expToAuthorOps(pure(t))

class AuthorOps(t: Exp[Author]) {
  def firstName: Exp[String] = AuthorFirstName(t)
  def lastName: Exp[String] = AuthorLastName(t)
}

Implicit conversions for user-defined types cooperate with other ones; for instance, author.firstName + " " + author.lastName is rewritten to

(expToStringOps(expToAuthorOps(author).firstName) + pure(" ")) +
  expToAuthorOps(author).lastName

Author is not part of SQuOpt or the standard Scala library, but an application-specific class; hence, SQuOpt cannot pre-define the necessary lifting code. Instead, the application programmer needs to provide this code to connect SQuOpt to his application. To support the application programmer with this tedious task, we provide a code generator which discovers the interface of a class through reflection on its compiled version and generates the boilerplate code, such as the one above for Author. It also generates the application-specific expression tree types, such as AuthorFirstName, as shown in Sec. 5.2. In general, query writers need to generate and import the boilerplate lifting code for all application-specific types they want to use in a SQuOpt query.

If desired, we can exclude some methods to restrict the language supported in our deeply embedded programs. For instance, SQuOpt requires the user to write side-effect-free queries; hence, we do not lift methods which perform side effects.

Using similar techniques, we can also lift existing functions and implicit conversions.

Tuples and other generic constructors. The techniques presented above for the lifting of

method calls rely on overloading the name of the method with a signature that involves Exp. Implicit


resolution (for method calls) will then choose our lifted version of the function or method to satisfy the typing requirements of the context or arguments of the call. Unfortunately, this technique does not work for tuple constructors, which in Scala are not resolved like ordinary calls. Instead, support for tuple types is hard-wired into the language, and tuples are always created by the predefined tuple constructors.

For example, the expression (str1, str2) will always call Scala's built-in Tuple2 constructor and correspondingly have type (Exp[String], Exp[String]). We would prefer that it call a lifting function and produce an expression tree of type Exp[(String, String)] instead.

Even though we cannot intercept the call to Tuple2, we can add an implicit conversion to be called after the tuple is constructed:

implicit def tuple2ToTuple2Exp[A1, A2](tuple: (Exp[A1], Exp[A2])): LiftTuple2[A1, A2] =
  LiftTuple2[A1, A2](tuple._1, tuple._2)

case class LiftTuple2[A1, A2](t1: Exp[A1], t2: Exp[A2]) extends Exp[(A1, A2)]

We generate such conversions for different arities with a code generator. These conversions will be used only when the context requires an expression tree. Note that this technique is only applicable because tuples are generic and support arbitrary components, including expression trees.

In fact, we use the same technique also for other generic constructors, to avoid problems associated with shadowing of constructor functions. For example, an implicit conversion is used to lift Seq[Exp[T]] to Exp[Seq[T]]: code like Seq(str1, str2) first constructs a sequence of expression trees and then wraps the result with an expression node that describes a sequence.

Subtyping. So far we have seen that, for each first-order method m operating on instances of T, we can create a corresponding method which operates on Exp[T]. If the method accepts parameters having types A1, ..., An and has return type R, the corresponding lifted method will accept parameters having types Exp[A1], ..., Exp[An] and return type Exp[R]. However, Exp[T] also needs to support all methods that T inherits from its super-type S. To ensure this, we declare the type constructor Exp to be covariant in its type parameter, so that Exp[T] correctly inherits the liftings from Exp[S]. This works even with the enrich-my-library pattern, because implicit resolution respects subtyping in an appropriate way.

Limitations of lifting. Lifting methods of Any or AnyRef (Scala types at the root of the inheritance hierarchy) is not possible with this technique: Exp[T] inherits such methods and makes them directly available, hence the compiler will not insert an implicit conversion. Therefore, it is not possible to lift expressions such as x == y; rather, we have to rely on developer discipline to use ==# and !=# instead of == and !=.

An expression like "foo" + "bar" + str1 is converted to

toStringOps("foo" + "bar") + str1

Hence, part of the expression is evaluated before being reified. This is harmless here, since we want "foo" + "bar" to be evaluated at compile time, that is, constant-folded, but in other cases it is preferable to prevent the constant folding. We will see later examples where queries on collections are evaluated before reification, defeating the purpose of our framework, and how we work around those.

5.4 Lifting higher-order expressions

We have shown how to lift first-order expressions; however, the interface of collections also uses higher-order methods, that is, methods that accept functions as parameters, and we need to lift them as well to reify the complete collection EDSL. For instance, the map method applies a function to


each element of a collection. In this section, we describe how we reify such expressions of function type.

Higher-order abstract syntax. To represent functions, we have to represent the bound variables in the function bodies. For example, a reification of str => str + " " needs to reify the variable str of type String in the body of the anonymous function. This reification should retain the information where str is bound. We achieve this by representing bound variables using higher-order abstract syntax (HOAS) [Pfenning and Elliot, 1988], that is, we represent them by variables bound at the meta level. To continue the example, the above function is reified as (str: Exp[String]) => str + " ". Note how the type of str in the body of this version is Exp[String], because str is a reified variable now. Correspondingly, the expression str + " " is lifted as described in the previous section.

With all operations in the function body automatically lifted, the only remaining syntactic difference between normal and lifted functions is the type annotation for the function's parameter. Fortunately, Scala's local type inference can usually deduce the argument type from the context, for example from the signature of the map operation being called. Type inference plays a dual role here: first, it allows the query writer to leave out the annotation, and second, it triggers lifting in the function body by requesting a lifted function instead of a normal function. This is how, in Fig. 3.1, a single call to asSquopt triggers lifting of the overall query.
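For instance, the two lines below are equivalent; the first relies on inference to type record as Exp[BookData] and thereby lift the body (a sketch, assuming recordsQuery from Fig. 3.1 and the generated BookData liftings are in scope):

// record: Exp[BookData] is inferred from map's expected argument type,
// so record.title builds an expression tree rather than reading a field.
val titles = recordsQuery.map(record => record.title)
val titlesExplicit = recordsQuery.map((record: Exp[BookData]) => record.title)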

Note that reified functions have type Exp[A] => Exp[B] instead of the more regular Exp[A => B]. We chose the former over the latter to support Scala's syntax for anonymous functions and for-comprehensions, which is hard-coded to produce or consume instances of the pre-defined A => B type. We have to reflect this irregularity in the lifting of methods and functions by treating the types of higher-order arguments accordingly.

User-defined methods, revised. We can now extend the lifting of signatures for methods or functions from the previous section to the general case, that is, the case of higher-order functions. We lift a method or function with signature

def m[A1, ..., An](a1: T1, ..., an: Tn): R

to a method or function with the following signature:

def m[A1, ..., An](a1: Li⟦T1⟧, ..., an: Li⟦Tn⟧): Li⟦R⟧

As before, the definition of the lifted method or function will return an expression node representing the call. If the original was a function, the lifted version is also defined as a function; if the original was a method on type T, then the lifted version is enriched onto T.

The type transformation Li⟦·⟧ converts the argument and return types of the method or function to be lifted. For most types, Li⟦·⟧ just wraps the type in the Exp type constructor, but function types are treated specially: Li⟦·⟧ recursively descends into function types to convert their arguments separately. Overall, Li⟦·⟧ behaves as follows:

Li⟦(A1, ..., An) => R⟧ = (Li⟦A1⟧, ..., Li⟦An⟧) => Li⟦R⟧
Li⟦A⟧ = Exp[A]    (otherwise)

We can use this extended definition of method lifting to implement map lifting for Lists, that is, a method with signature Exp[List[T]].map(Exp[T] => Exp[U]):

implicit def expToListOps[T](coll: Exp[List[T]]) = new ListOps(coll)
implicit def toListOps[T](coll: List[T]) = expToListOps(coll)

class ListOps[T](coll: Exp[List[T]]) {
  def map[U](f: Exp[T] => Exp[U]) = ListMapNode(coll, Fun(f))
}


val records = books.withFilter(book =>
    book.publisher == "Pearson Education")
  .flatMap(book => book.authors.map(author =>
    BookData(book.title,
      author.firstName + " " + author.lastName,
      book.authors.size - 1)))

Figure 5.1: Desugaring of the code in Fig. 2.2.

case class ListMapNode[T, U](coll: Exp[List[T]], mapping: Exp[T => U]) extends Exp[List[U]]

Note how map's parameter f has type Exp[T] => Exp[U], as necessary to enable Scala's syntax for anonymous functions and automatic lifting of the function body. This implementation would work for queries on lists, but does not support other collection types, or queries where the collection type changes. We show in Sec. 5.5 how SQuOpt integrates with such advanced features of the Scala collection EDSL.

5.5 Lifting collections

In this section, we first explain how for-comprehensions are desugared to calls to library functions, allowing an external library to give them a different meaning. We summarize afterwards needed information about the subset of the Scala collection EDSL that we reify. Then we present how we perform this reification. We finally present the reification of the running example (Fig. 3.1).

For-comprehensions. As we have seen, an idiomatic encoding in Scala of queries on collections are for-comprehensions. Although Scala is an impure functional language and supports side-effectful for-comprehensions, only pure queries are supported in our framework, because this enables or simplifies many domain-specific analyses. Hence, we will restrict our attention to pure queries.

The Scala compiler desugars for-comprehensions into calls to three collection methods, map, flatMap and withFilter, which we explain shortly; the query in Fig. 2.2 is desugared to the code in Fig. 5.1.

The compiler performs type inference and type checking on a for-comprehension only after desugaring it; this affords some freedom for the types of the map, flatMap and withFilter methods.

The Scala collection EDSL. A Scala collection containing elements of type T implements the trait Traversable[T]. On an expression coll of type Traversable[T], one can invoke methods declared (in first approximation) as follows:

def map[U](f: T => U): Traversable[U]
def flatMap[U](f: T => Traversable[U]): Traversable[U]
def withFilter(p: T => Boolean): Traversable[T]

For a Scala collection coll, the expression coll.map(f) applies f to each element of coll, collects the results in a new collection and returns it; coll.flatMap(f) applies f to each element of coll, concatenates the results in a new collection and returns it; coll.withFilter(p) produces a collection containing the elements of coll which satisfy the predicate p.

However, Scala supports many different collection types, and this complicates the actual types of these methods. Each collection can further implement subtraits like Seq[T] <: Traversable[T]


(for sequences), Set[T] <: Traversable[T] (for sets) and Map[K, V] <: Traversable[(K, V)] (for dictionaries); for each such trait, different implementations are provided.

One consequence of this syntactic desugaring is that a single for-comprehension can operate over different collection types. The type of the result depends essentially on the type of the root collection, that is, books in the example above. The example above can hence be altered to produce a sequence rather than a set by simply converting the root collection to another type:

val recordsSeq = for {
  book <- books.toSeq
  if book.publisher == "Pearson Education"
  author <- book.authors
} yield BookData(book.title,
  author.firstName + " " + author.lastName,
  book.authors.size - 1)

Precise static typing. The Scala collections EDSL achieves precise static typing while avoiding code duplication [Odersky and Moors, 2009]. Precise static typing is necessary because the return type of a query operation can depend on subtle details of the base collection and query arguments. To return the most specific static type, the Scala collection DSL defines a type-level relation between the source collection type, the element type for the transformed collection, and the type for the resulting collection. The relation is encoded through the concept pattern [Oliveira et al., 2010], i.e., through a type-class-style trait CanBuildFrom[From, Elem, To], and elements of the relation are expressed as implicit instances.

For example, a finite map can be treated as a set of pairs, so that mapping a function from pairs to pairs over it produces another finite map. This behavior is encoded in an instance of type CanBuildFrom[Map[K, V], (K1, V1), Map[K1, V1]]. The Map[K, V] is the type of the base collection, (K1, V1) is the return type of the function, and Map[K1, V1] is the return type of the map operation.

It is also possible to map some other function over the finite map, but the result will be a general collection instead of a finite map. This behavior is described by an instance of type CanBuildFrom[Traversable[T], U, Traversable[U]]. Note that this instance is also applicable to finite maps, because Map is a subclass of Traversable. Together, these two instances describe how to compute the return type of mapping over a finite map.
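The following plain Scala 2.x snippet (independent of SQuOpt) shows both instances in action:

val m = Map(1 -> "a", 2 -> "b")
// Pair-to-pair function: the Map-specific CanBuildFrom applies, so the result is a Map.
val shifted: Map[Int, String] = m.map { case (k, v) => (k + 1, v) }
// Non-pair result: only the generic instance applies, so the result is a plain collection.
val keys: Iterable[Int] = m.map { case (k, _) => k }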

Code reuse. Even though these two use cases for mapping over a finite map have different return types, they are implemented as a single method that uses its implicit CanBuildFrom parameter to compute both the static type and the dynamic representation of its result. So the Scala collections EDSL provides precise typing without code duplication. In our deep embedding, we want to preserve this property.

CanBuildFrom is used in the implementation of map, flatMap and withFilter. To further increase reuse, the implementations are provided in a helper trait TraversableLike[T, Repr] with the following signatures:

def map[U, That](f: T => U)(implicit cbf: CanBuildFrom[Repr, U, That]): That
def flatMap[U, That](f: T => Traversable[U])(implicit cbf: CanBuildFrom[Repr, U, That]): That
def withFilter(p: T => Boolean): Repr

The Repr type parameter represents the specific type of the receiver of the method call.

The lifting. The basic idea is to use the enrich-my-library pattern to lift collection methods

from TraversableLike[T, Repr] to TraversableLikeOps[T, Repr]:

implicit def expToTraversableLikeOps[T, Repr](v: Exp[TraversableLike[T, Repr]]) =
  new TraversableLikeOps[T, Repr](v)


class TraversableLikeOps[T, Repr <: Traversable[T] with TraversableLike[T, Repr]](val t: Exp[Repr]) {

  def withFilter(f: Exp[T] => Exp[Boolean]): Exp[Repr] =
    Filter(this.t, FuncExp(f))

  def map[U, That <: TraversableLike[U, That]](f: Exp[T] => Exp[U])
      (implicit c: CanBuildFrom[Repr, U, That]): Exp[That] =
    MapNode(this.t, FuncExp(f))

  // other methods; definitions of MapNode, Filter omitted
}

Figure 5.2: Lifting TraversableLike.

Subclasses of TraversableLike[T, Repr] also subclass both Traversable[T] and Repr; to take advantage of this during interpretation and optimization, we need to restrict the type of expToTraversableLikeOps, getting the following conversion:⁵

implicit def expToTraversableLikeOps[T, Repr <: Traversable[T] with TraversableLike[T, Repr]](
    v: Exp[Repr]) =
  new TraversableLikeOps[T, Repr](v)

The query operations are defined in class TraversableLikeOps[T, Repr]; a few examples are shown in Fig. 5.2.⁶

Note how the lifted query operations use CanBuildFrom to compute the same static return type as the corresponding non-lifted query operation would compute. This reuse of type-level machinery allows SQuOpt to provide the same interface as the Scala collections EDSL.

Code reuse, revisited. We already saw how we could reify List[T].map through a specific expression node ListMapNode. However, this approach would require generating many variants for different collections, with slightly different types; writing an optimizer able to handle all such variations would be unfeasible because of the amount of code duplication required. Instead, by reusing Scala type-level machinery, we obtain a reification which is statically typed and at the same time avoids code duplication in both our lifting and our optimizer, and in general in all possible consumers of our reification, making them feasible to write.

5.6 Interpretation

After optimization, SQuOpt needs to interpret the optimized expression trees to perform the query. Therefore, the trait Exp[T] declares a method def interpret(): T, and each expression node overrides it appropriately to implement a mostly standard typed tagless [Carette et al., 2009] environment-based interpreter. The interpreter computes a value of type T from an expression tree of type Exp[T]. This design allows query writers to extend the interpreter to handle application-specific operations; in fact, the lifting generator described in Sec. 5.4 automatically generates appropriate definitions of interpret for the lifted operations.

⁵Due to type inference bugs, the actual implicit conversion needs to be slightly more complex, to mention T directly in the argument type. We reported the bug at https://issues.scala-lang.org/browse/SI-5298.

⁶Similar liftings are introduced for traits similar to TraversableLike, like SeqLike, SetLike, MapLike and so on.


For example, the interpretation of string concatenation is simply string concatenation, as shown in the following fragment of the interpreter. Note that type safety of the interpreter is checked automatically by the Scala compiler when it compiles the fragments:

case class StringConcat(str1: Exp[String], str2: Exp[String]) extends Exp[String] {
  def interpret() = str1.interpret() + str2.interpret()
}

The subset of Scala we reify roughly corresponds to a typed lambda calculus with subtyping and type constructors. It does not include constructs for looping or recursion, so it should be strongly normalizing, as long as application programmers do not add expression nodes with non-terminating interpretations. However, query writers can use the host language to create a reified expression of infinite size. This should not be an issue if SQuOpt is used as a drop-in replacement for the Scala collection EDSL.

During optimization, nodes of the expression tree might get duplicated, and the interpreter could in principle observe this sharing and treat the expression tree as a DAG, to avoid recomputing results. Currently we do not exploit this, unlike during compilation.

5.7 Optimization

Our optimizer is structured as a pipeline of different transformations on a single intermediate representation, constituted by our expression trees. Each phase of the pipeline, and the pipeline as a whole, produce a new expression having the same type as the original one. Most of our transformations express simple rewrite rules, with or without side conditions, which are applied on expression trees from the bottom up; they are implemented using Scala's support for pattern matching [Emir et al., 2007b].

Some optimizations, like filter hoisting (which we applied manually to produce the code in Fig. 2.3), are essentially domain-specific and can improve the complexity of a query. To enable such optimizations to trigger, however, one often needs to perform inlining-like transformations and to simplify the result. Inlining-related transformations can for instance produce code like (x, y)._1, which we simplify to x, reducing abstraction overhead but also (more importantly) making it syntactically clear that the result does not depend on y, hence might be computed before y is even bound. This simplification extends to user-defined product types: with the definitions in Fig. 2.1, code like BookData(book.title, ...).title is simplified to book.title.

We have thus implemented optimizations of three classes:

• general-purpose simplifications, like inlining, compile-time beta-reduction, constant folding, reassociation on primitive types and other simplifications;⁷

• domain-specific simplifications, whose main goal is still to enable more important optimizations;

• domain-specific optimizations which can change the complexity class of a query, such as filter hoisting, hash-join transformation or indexing.

Among domain-specific simplifications, we implement a few described in the context of the monoid comprehension calculus [Grust and Scholl, 1996b; Grust, 1999], such as query unnesting and fusion of bulk operators. Query unnesting allows unnesting a for-comprehension nested inside another, producing a single for-comprehension. Furthermore, we can fuse different collection

⁷Beta-reduction and simplification are run in a fixpoint loop [Peyton Jones and Marlow, 2002]. Termination is guaranteed because our language does not admit general recursion.


operations together: collection operators like map, flatMap and withFilter can be expressed as folds producing new collections, which can be combined. Scala for-comprehensions are however more general than monoid comprehensions;⁸ hence, to ensure safety of some optimizations, we need some additional side conditions.⁹

Manipulating functions. To be able to inspect a HOAS function body funBody: Exp[S] => Exp[T], like str => str + " ", we convert it to first-order abstract syntax (FOAS), that is, to an expression tree of type Exp[T]. To this end, we introduce a representation of variables and a generator of fresh ones; since variable names are auto-generated, they are internally represented simply as integers instead of strings, for efficiency.

To convert funBody from HOAS to FOAS, we apply it to a fresh variable v of type TypedVar[S], obtaining a first-order representation of the function body, having type Exp[T] and containing occurrences of v.

This transformation is hidden in the constructor Fun, which converts Exp[S] => Exp[T], a representation of an expression with one free variable, to Exp[S => T], a representation of a function:

case class App[S, T](f: Exp[S => T], t: Exp[S]) extends Exp[T]

def Fun[S, T](funBody: Exp[S] => Exp[T]): Exp[S => T] = {
  val v = Fun.gensym[S]()
  FOASFun(funBody(v), v)
}

case class FOASFun[S, T](foasBody: Exp[T], v: TypedVar[S]) extends Exp[S => T]

implicit def app[S, T](f: Exp[S => T]): Exp[S] => Exp[T] =
  arg => App(f, arg)

Conversely, function applications are represented using the constructor App; an implicit conversion allows App to be inserted implicitly. Whenever f can be applied to arg and f is an expression tree, the compiler will convert f(arg) to app(f)(arg), that is, App(f, arg).

In our example, Fun(str => str + " ") produces
FOASFun(StringConcat(TypedVar[String](1), Const(" ")), TypedVar[String](1))

Since we auto-generate variable names, it is easiest to represent variable occurrences using the Barendregt convention, where bound variables must always be globally unique; we must be careful to perform renaming after beta-reduction to restore this invariant [Pierce, 2002, Ch. 6].

We can now easily implement substitution and beta-reduction and, through that, as shown before, enable other optimizations to trigger more easily and speed up queries.
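A sketch of these two steps over the nodes defined above (we assume TypedVar exposes its integer name as id; the traversal is shown for a few node types only, while the actual implementation generalizes it to all nodes):

// Substitute arg for the variable with the given id. Under the Barendregt
// convention ids are globally unique, so no capture can occur; the casts
// only work around Scala's limited GADT reasoning.
def subst[T](e: Exp[T], id: Int, arg: Exp[_]): Exp[T] = (e match {
  case TypedVar(`id`)     => arg
  case StringConcat(a, b) => StringConcat(subst(a, id, arg), subst(b, id, arg))
  case FOASFun(body, v)   => FOASFun(subst(body, id, arg), v)
  case App(f, t)          => App(subst(f, id, arg), subst(t, id, arg))
  case leaf               => leaf
}).asInstanceOf[Exp[T]]

// One step of beta-reduction: App(FOASFun(body, v), arg) ==> body[v := arg].
def betaReduce[T](e: Exp[T]): Exp[T] = e match {
  case App(FOASFun(body, v), arg) => subst(body, v.id, arg).asInstanceOf[Exp[T]]
  case other                      => other
}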

⁸For instance, a for-comprehension producing a list cannot iterate over a set.
⁹For instance, consider a for-comprehension producing a set and nested inside another producing a list. This comprehension does not correspond to a valid monoid comprehension (see previous footnote), and query unnesting does not apply here: if we unnested the inner comprehension into the outer one, we would not perform duplicate elimination on the inner comprehension, affecting the overall result.


Chapter 6

Evaluating SQuOpt

The key goals of SQuOpt are to reconcile modularity and efficiency. To evaluate this claim, we perform a rigorous performance evaluation of queries with and without SQuOpt. We also analyze the modularization potential of these queries and evaluate how modularization affects performance (with and without SQuOpt).

We show that modularization introduces a significant slowdown. The overhead of using SQuOpt is usually moderate, and optimizations can compensate this overhead, remove the modularization slowdown and improve performance of some queries by orders of magnitude, especially when indexes are used.

6.1 Study Setup

Throughout this chapter, we have already shown several compact queries for which our optimizations increase performance significantly compared to a naive execution. Since some optimizations change the complexity class of the query (e.g., by using an index), the speedups grow with the size of the data. However, to get a more realistic evaluation of SQuOpt, we decided to perform an experiment with existing real-world queries.

As we are interested in both performance and modularization, we have a specification and three different implementations of each query that we need to compare:

(0) Query specification: We selected a set of existing real-world queries, specified and implemented independently from our work and prior to it. We used only the specification of these queries.

(1) Modularized Scala implementation: We reimplemented each query as an expression on Scala collections; this is our baseline implementation. For modularity, we separated reusable domain abstractions into subqueries. We confirmed the abstractions with a domain expert and will later illustrate them, to emphasize their general nature.

(2) Hand-optimized Scala implementation: Next, we asked a domain expert to perform manual optimizations on the modularized queries. The expert should perform optimizations such as inlining and filter hoisting where he could find performance improvements.

(3) SQuOpt implementation: Finally, we rewrote the modularized Scala queries from (1) as SQuOpt queries. The rewrites are of a purely syntactic nature, to use our library (as described in Sec. 3.1), and preserve the modularity of the queries.



Since SQuOpt supports executing queries with and without optimizations and indexes, we actually measured three different execution modes of the SQuOpt implementation:

(3−) SQuOpt without optimizer: First, we execute the SQuOpt queries without performing optimization first, which should show the SQuOpt overhead compared to the modular Scala implementation (1). However, common-subexpression elimination is still used here, since it is part of the compilation pipeline. This is appropriate to counter the effects of excessive inlining due to using a deep embedding, as explained in Sec. 4.1.

(3o) SQuOpt with optimizer: Next, we execute SQuOpt queries after optimization.

(3x) SQuOpt with optimizer and indexes: Finally, we execute the queries after providing a set of indexes that the optimizer can consider.

In all cases, we measure query execution time for the generated code, excluding compilation; we consider this appropriate because the results of compilations are cached aggressively and can be reused when the underlying data is changed, potentially even across executions (even though this is not yet implemented), as the data is not part of the compiled code.

We use additional indexes in (3x), but not in the hand-optimized Scala implementation (2). We argue that indexes are less likely to be applied manually, because index application is a crosscutting concern and makes the whole query implementation more complicated and less abstract. Still, we offer measurement (3o) to compare the speedup without additional indexes.

This gives us a total of five settings to measure and compare (1, 2, 3−, 3o and 3x). Between them, we want to observe the following interesting performance ratios (speedups or slowdowns, computed through the indicated divisions):

(M) Modularization overhead (the relative performance difference between the modularized and the hand-optimized Scala implementation: 1/2).

(S) SQuOpt overhead (the overhead of executing unoptimized SQuOpt queries: 1/3−; smaller is better).

(H) Hand-optimization challenge (the performance overhead of our optimizer against hand-optimizations of a domain expert: 2/3o; bigger is better). This overhead is partly due to the SQuOpt overhead (S) and partly to optimizations which have not been automated or have not been effective enough. This comparison excludes the effects of indexing, since this is an optimization we did not perform by hand; we also report (H') = 2/3x, which includes indexing.

(O) Optimization potential (the speedup by optimizing modularized queries: 1/3o; bigger is better).

(X) Index influence (the speedup gained by using indexes: 3o/3x; bigger is better).

(T) Total optimization potential with indexes (1/3x; bigger is better), which is equal to (O) × (X). For instance, with the average ratios reported later in Table 6.3, (O) = 1.9x and (X) = 6.3x combine to (T) = 1.9 × 6.3 ≈ 12x.

In Fig. 6.1, we provide an overview of the setup. We made our raw data available and our results reproducible [Vitek and Kalibera, 2011].¹


[Diagram: the specification (0); the implementations 1 (modularized Scala), 2 (hand-optimized Scala), 3− (SQuOpt without optimizer), 3o (SQuOpt with optimizer) and 3x (SQuOpt with optimizer and indexes); and the performance ratios M, S, H, O, X, T derived from comparing them.]

Figure 6.1: Measurement Setup Overview.

6.2 Experimental Units

As experimental units, we sampled a set of queries on code structures from FindBugs 2.0 [Hovemeyer and Pugh, 2004]. FindBugs is a popular bug-finding tool for Java bytecode, available as open source. To detect instances of bug patterns, it queries a structural in-memory representation of a code base (extracted from bytecode). Concretely, a single loop traverses each class and invokes all visitors (implemented as listeners) on each element of the class. Many visitors, in turn, perform activities concerning multiple bug detectors, which are fused together. An extreme example: in FindBugs, query 4 is defined in class DumbMethods, together with 41 other bug detectors for distinct types of bugs. Typically, a bug detector is furthermore scattered across the different methods of the visitor, which handle different elements of the class. We believe this architecture has been chosen to achieve good performance; however, we do not consider such manual fusion of distinct bug detectors as modular. We selected queries from FindBugs because they represent typical non-trivial queries on in-memory collections and because we believe our framework allows expressing them more modularly.

We sampled queries in two batches. First, we manually selected 8 queries (from approx. 400 queries in FindBugs), chosen mainly to evaluate the potential speedups of indexing (queries that primarily looked for declarations of classes, methods or fields with specific properties, queries that inspect the type hierarchy, and queries that required analyzing method implementations). Subsequently, we randomly selected a batch of 11 additional queries. The batch excluded queries that rely on control/dataflow analyses (i.e., analyzing the effect of bytecode instructions on the stack), due to limitations of the bytecode toolkit we use. In total, we have 19 queries, as listed in Table 6.2 (the randomly selected queries are marked with the superscript R).

We implemented each query three times (see implementations (1)–(3) in Sec. 6.1), following the specifications given in the FindBugs documentation (0). Instead of using a hierarchy of visitors as the original implementations of the queries in FindBugs, we wrote the queries as for-comprehensions in Scala on an in-memory representation created by the Scala toolkit BAT.² BAT in particular provides comprehensive support for writing queries against Java bytecode in an idiomatic way. We exemplify an analysis in Fig. 6.5: it detects all covariant equals methods in a project by iterating over all class files (line 2) and all methods, searching for methods named "equals" that return a boolean

¹Data available at http://www.informatik.uni-marburg.de/~pgiarrusso/SQuOpt
²http://github.com/Delors/BAT


[Table 6.2 reports, for each query, the running times (ms) of implementations (1), (2), (3−), (3o) and (3x), and the performance ratios M (1/2), H (2/3o) and T (1/3x). The 19 queries are:]

Id    Description
1     Covariant compareTo() defined
2     Explicit garbage collection call
3     Protected field in final class
4     Explicit runFinalizersOnExit() call
5     clone() defined in non-Cloneable class
6     Covariant equals() defined
7     Public finalizer defined
8     Dubious catching of IllegalMonitorStateException
9R    Uninit. field read during construction of super
10R   Mutable static field declared public
11R   Refactor anon. inner class to static
12R   Inefficient use of toArray(Object[])
13R   Primitive boxed and unboxed for coercion
14R   Double precision conversion from 32 bit
15R   Privileged method used outside doPrivileged
16R   Mutable public static field should be final
17R   Serializable class is member of non-ser. class
18R   Swing methods used outside Swing thread
19R   Finalizer only calls superclass finalize

Table 6.2: Performance results. As in Sec. 6.1, (1) denotes the modular Scala implementation, (2) the hand-optimized Scala one, and (3−), (3o), (3x) refer to the SQuOpt implementation when run, respectively, without optimizations, with optimizations, with optimizations and indexing. Queries marked with the R superscript were selected by random sampling.


                                        M (1/2)  S (1/3−)  H (2/3o)  H′ (2/3x)  O (1/3o)  X (3o/3x)  T (1/3x)
Geometric means of performance ratios   2.4x     1.2x      0.8x      5.1x       1.9x      6.3x       12x

Table 6.3: Average performance ratios. This table summarizes all interesting performance ratios across all queries, using the geometric mean [Fleming and Wallace 1986]. The meaning of speedups is discussed in Sec. 6.1.

Abstraction                                                        Used
All fields in all class files                                      4
All methods in all class files                                     3
All method bodies in all class files                               3
All instructions in all method bodies and their bytecode index     5
Sliding window (size n) over all instructions (and their index)    3

Table 6.4: Description of abstractions removed during hand-optimization and number of queries where the abstraction is used (and optimized away).

value and define a single parameter of the type of the current class.

Abstractions: In the reference implementations (1), we identified several reusable abstractions, as shown in Table 6.4. The reference implementations of all queries except 17R use exactly one of these abstractions, which encapsulate the main loops of the queries.
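For instance, the "all methods in all class files" abstraction can be expressed as a reusable query along the following lines; this is a sketch based on the loop in Fig. 6.6, and the name allMethods is ours:

// Reusable abstraction encapsulating the main loop of several queries:
// all methods in all class files, paired with the defining class file.
val allMethods = for {
  classFile ← classFiles.asSquopt
  method ← classFile.methods
} yield (classFile, method)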

Indexes: For executing (3x) (SQuOpt with indexes), we have constructed three indexes to speed up navigation over the queried data of queries 1–8: indexes for method names, exception handlers and instruction types. We illustrate the implementation of the method-name index in Fig. 6.6: it produces a collection of all methods and then indexes them using indexBy; its argument extracts, from an entry, the key, that is, the method name. We selected which indexes to implement using guidance from SQuOpt itself: during optimizations, SQuOpt reports which indexes it could have applied to the given query. Among those, we tried to select indexes giving a reasonable compromise between construction cost and optimization speedup. We first measured the construction cost of these indexes.

for {
  classFile ← classFiles.asSquopt
  method ← classFile.methods
  if !method.isAbstract && method.name == "equals" &&
    method.descriptor.returnType == BooleanType
  parameterTypes ← Let(method.descriptor.parameterTypes)
  if parameterTypes.length == 1 && parameterTypes(0) == classFile.thisClass
} yield (classFile, method)

Figure 6.5: Find covariant equals methods.


val methodNameIdx: Exp[Map[String, Seq[(ClassFile, Method)]]] = (for {
  classFile ← classFiles.asSquopt
  method ← classFile.methods
} yield (classFile, method)) indexBy (entry ⇒ entry._2.name)

Figure 6.6: A simple index definition.

Index                Elapsed time (ms)
Method name          97.99 ± 2.94
Exception handlers   179.29 ± 3.21
Instruction type     4166.49 ± 202.85

For our test data, index construction takes less than 200 ms for the first two indexes, which is moderate compared to the time for loading the bytecode in the BAT representation (4755.32 ± 141.66 ms). Building the instruction index took around 4 seconds, which we consider acceptable, since this index maps each type of instruction (e.g., INSTANCEOF) to a collection of all bytecode instructions of that type.

6.3 Measurement Setup

To measure performance, we executed the queries on the preinstalled JDK class library (rt.jar), containing 58 MB of uncompressed Java bytecode. We also performed a preliminary evaluation by running queries on the much smaller ScalaTest library, getting comparable results that we hence do not discuss. Experiments were run on an 8-core Intel Core i7-2600, 3.40 GHz, with 8 GB of RAM, running Scientific Linux release 6.2. The benchmark code itself is single-threaded, so it uses only one core; however, the JVM used other cores to offload garbage collection. We used the preinstalled OpenJDK Java version 1.7.0_05-icedtea and Scala 2.10.0-M7.

We measure steady-state performance, as recommended by Georges et al. [2007]. We invoke the JVM p = 15 times; at the beginning of each JVM invocation, all the bytecode to analyze is loaded in memory and converted into BAT's representation. In each JVM invocation, we iterate each benchmark until the variations of results become low enough. We measure the variations of results through the coefficient of variation (CoV; standard deviation divided by the mean). Thus, we iterate each benchmark until the CoV in the last k = 10 iterations drops under the threshold θ = 0.1, or until we complete q = 50 iterations. We report the arithmetic mean of these measurements (and also report the usually low standard deviation on our web page).
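The following sketch shows this iteration logic within one JVM invocation (our rendering, not the actual harness code):

// Iterate a benchmark until the coefficient of variation (CoV) of the
// last k timings drops below theta, or q iterations are exhausted.
def measureSteadyState(benchmark: () ⇒ Unit,
    k: Int = 10, q: Int = 50, theta: Double = 0.1): Seq[Double] = {
  val times = scala.collection.mutable.ArrayBuffer.empty[Double]
  while (times.size < q) {
    val start = System.nanoTime()
    benchmark()
    times += (System.nanoTime() - start) / 1e6 // elapsed time in ms
    if (times.size >= k) {
      val window = times.takeRight(k)
      val mean = window.sum / k
      val cov = math.sqrt(window.map(t ⇒ (t - mean) * (t - mean)).sum / k) / mean
      if (cov < theta) return times.toSeq // steady state reached
    }
  }
  times.toSeq
}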

6.4 Results

Correctness: We machine-checked that, for each query, all variants in Table 6.2 agree.

Modularization Overhead: We first observe that performance suffers significantly when using the abstractions we described in Table 6.4. These abstractions, while natural in the domain and in the setting of a declarative language, are not idiomatic in Java or Scala, because without optimization they will obviously lead to bad performance. They are still useful abstractions from the point of view of modularity, though (as indicated by Table 6.4), and as such it would be desirable if one could use them without paying the performance penalty.


Scala Implementations vs. FindBugs: Before actually comparing the different Scala and SQuOpt implementations, we first ensured that the implementations are comparable to the original FindBugs implementation. A direct comparison between the FindBugs reference implementation and any of our implementations is not possible in a rigorous and fair manner: FindBugs bug detectors are not fully modularized, therefore we cannot reasonably isolate the implementation of the selected queries from support code. Furthermore, the architecture of the implementation has many differences that affect performance; among others, FindBugs also uses multithreading. Moreover, while in our case each query loops over all classes, in FindBugs, as discussed above, a single loop considers each class and invokes all visitors (implemented as listeners) on it.

We measured startup performance [Georges et al. 2007], that is, the performance of running the queries only once, to minimize the effect of compiler optimizations. We set up our SQuOpt-based analyses to only perform optimization and run the optimized query. To set up FindBugs, we manually disabled all unrelated bug detectors; we also made the modified FindBugs source code available. The result is that the Scala implementations of the queries (3−) have performance of the same order of magnitude as the original FindBugs queries – in our tests, the SQuOpt implementation was about twice as fast. However, since the comparison cannot be made fair, we refrained from a more detailed investigation.

SQuOpt Overhead and Optimization Potential: We present the results of our benchmarks in Table 6.2. Column names refer to a few of the definitions described above; for readability, we do not present all the ratios previously introduced for each query, but report the raw data. In Table 6.3, we report the geometric mean [Fleming and Wallace 1986] of each ratio, computed with the same weight for each query.

We see that in its current implementation SQuOpt can cause an overhead S (1/3−) of up to 3.4x. On average, SQuOpt queries are 1.2x faster. These differences are due to minor implementation details of certain collection operators. For query 18R, instead, the basic SQuOpt implementation is 12.9x faster; we are investigating the reason, and we suspect this might be related to the use of pattern matching in the original query.

As expected, not all queries benefit from optimizations; out of 19 queries, optimization affords significant speedups for 15 of them, ranging from a 1.2x factor to a 12800x factor; 10 queries are faster by a factor of at least 5. Only queries 10R, 11R and 12R fail to recover any modularization overhead.

We have analyzed the behavior of a few queries after optimization, to understand why their performance has (or has not) improved.

Optimization makes query 17R slower; we believe this is because optimization replaces filtering by lazy filtering, which is usually faster, but not here. Among queries where indexing succeeds, query 2 has the least speedup. After optimization, this query uses the instruction-type index to find all occurrences of invocation opcodes (INVOKESTATIC and INVOKEVIRTUAL); after this step, the query looks among those invocations for ones targeting runFinalizersOnExit. Since invocation opcodes are quite frequent, the used index is not very specific, hence it allows for little speedup (9.5x). However, no other index applies to this query; moreover, our framework does not maintain any selectivity statistics on indexes to predict these effects. Query 19R benefits from indexing without any specific tuning on our part, because it looks for implementations of finalize with some characteristic, hence the highly selective method-name index applies. After optimization, query 8 becomes simply an index lookup on the index for exception handlers, looking for handlers of IllegalMonitorStateException; it is thus not surprising that its speedup is extremely high (12800x). This speedup relies on an index which is specific to this kind of query, and building this index is slower than executing the unoptimized query. On the other hand, building this index is entirely appropriate in a situation where similar queries are common enough. Similar considerations apply to the usage of indexing in general, similarly to what happens in databases.


Optimization Overhead: The current implementation of the optimizer is not yet optimized for speed (of the optimization algorithm). For instance, expression trees are traversed and rebuilt completely once for each transformation. However, the optimization overhead is usually not excessive, and is 548 ± 855 ms, varying between 35 ms and 3817 ms (mostly depending on the query size).

Limitations: Although many speedups are encouraging, our optimizer is currently a proof-of-concept and we experienced some limitations:

• In a few cases, hand-optimized queries are still faster than what the optimizer can produce. We believe these problems could be addressed by adding further optimizations.

• Our implementation of indexing is currently limited to immutable collections. For mutable collections, indexes must be maintained incrementally. Since indexes are defined as special queries in SQuOpt, incremental index maintenance becomes an instance of incremental maintenance of query results, that is, of incremental view maintenance. We plan to support incremental view maintenance as part of future work; however, indexing in the current form is already useful, as illustrated by our experimental results.

Threats to Validity: With rigorous performance measurements and the chosen setup, our study was set up to maximize internal and construct validity. Although we did not involve an external domain expert, and we did not compare the results of our queries with the ones from FindBugs (except while developing the queries), we believe that the queries adequately represent the modularity and performance characteristics of FindBugs and SQuOpt. However, since we selected only queries from a single project, external validity is limited. While we cannot generalize our results beyond FindBugs yet, we believe that the FindBugs queries are representative of complex in-memory queries performed by applications.

Summary: We demonstrated on our real-world queries that relying on declarative abstractions in collection queries often causes a significant slowdown. As we have seen, using SQuOpt without optimization, or when no optimizations are possible, usually provides performance comparable to using standard Scala; however, SQuOpt optimizations can in most cases remove the slowdown due to declarative abstractions. Furthermore, relying on indexing allows achieving even greater speedups while still using a declarative programming style. Some implementation limitations restrict the effectiveness of our optimizer, but since this is a preliminary implementation, we believe our evaluation shows the great potential of optimizing queries to in-memory collections.

Chapter 7

Discussion

In this chapter, we discuss the degree to which SQuOpt fulfilled our original design goals, and the conclusions for host and domain-specific language design.

7.1 Optimization limitations

In our experiments, indexing achieved significant speedups, but when indexing does not apply, speedups are more limited. In comparison, later projects working on collection query optimization, such as OptiQL [Rompf et al 2013; Sujeeth et al 2013b], gave better speedups, as also discussed in Sec. 8.3.1.

A design goal of this project was to incrementalize optimized queries, and while it is easy to incrementalize collection operators such as map, flatMap or filter, it was much less clear to us how to optimize the result of inlining. We considered using shortcut fusion, but we did not know a good way to incrementalize the resulting programs; later work, as described in Part II, clarified what is possible and what is not.

Another problem is that most fusion techniques are designed for sequential processing, hence conflict with incrementalization. Most fusion techniques generally assume that bulk operations scan collections in linear order; for instance, shortcut fusion rewrites operators in terms of foldr. During parallel and/or incremental computation, instead, it is better to use tree-shaped folds, that is, to split the task in a divide-and-conquer fashion, so that the various subproblems form a balanced tree. This division minimizes the height of the tree, hence the number of steps needed to combine results from subproblems into the final result, as also discussed by Steele [2009]. It is not so clear how to apply shortcut fusion to parallel and incremental programs. Maier and Odersky [2013] describe an incremental tree-shaped fold, where each subproblem that is small enough is solved by scanning its input in linear order, but they do not perform code transformation and do not study how to perform fusion.
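For illustration, a tree-shaped fold over an indexed sequence can be sketched as follows (our sketch; op must be associative with unit z). Subresults form a balanced tree of height O(log n), so after a small input change only the combine steps along a root-to-leaf path must be redone:

// Divide-and-conquer fold, in contrast to a linear-order foldr.
def treeFold[A](xs: IndexedSeq[A])(z: A)(op: (A, A) ⇒ A): A =
  if (xs.isEmpty) z
  else if (xs.length == 1) xs(0)
  else {
    val (l, r) = xs.splitAt(xs.length / 2)
    op(treeFold(l)(z)(op), treeFold(r)(z)(op))
  }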

7.2 Deep embedding

7.2.1 What worked well

Several features of Scala contributed greatly to the success we achieved. With implicit conversions, the lifting can be made mostly transparent. The advanced type system features were quite helpful to make the expression tree representation typed. The fact that for-comprehensions are desugared



before type inference and type checking was also a prerequisite for automatic lifting. The syntactic expressiveness and uniformity of Scala, in particular the fact that custom types can have the same look-and-feel as primitive types, were also vital to lift expressions on primitive types.

7.2.2 Limitations

Despite these positive experiences and our experimental success, our embedding has a few significant limitations.

The first limitation is that we only lift a subset of Scala, and some interesting features are missing. We do not support statements in nested blocks in our queries, but this could be implemented reusing techniques from Delite [Rompf et al 2011]. More importantly for queries, pattern matching cannot be supported by a deep embedding similar to ours. In contrast to for-comprehension syntax, pattern matching is desugared only after type checking [Emir et al 2007b], which prevents us from lifting pattern matching notation. More specifically, because an extractor [Emir et al 2007b] cannot return the representation of a result value (say Exp[Boolean]) to later evaluate, it must produce its final result at pattern matching time. There is initial work on "virtualized pattern matching"¹, and we hope to use this feature in future work.

We also experienced problems with operators that cannot be overloaded, such as == or if-else, and with lifting methods in scala.Any, which forced us to provide alternative syntax for these features in queries. The Scala-virtualized project [Moors et al 2012] aims to address these limitations; unfortunately, it advertises no change on the other problems we found, which we subsequently detail.

It would also be desirable if we could enforce the absence of side effects in queries, but since Scala, like most practical programming languages except Haskell, does not track side effects in the type system, this does not seem to be possible.

Finally, compared to lightweight modular staging [Rompf and Odersky 2010] (the foundation of Delite) and to polymorphic embedding [Hofer et al 2008], we have less static checking for some programming errors when writing queries. The recommended way to use Delite is to write an EDSL program in one trait, in terms of the EDSL interface only, and combine it with the implementation in another trait. In polymorphic embedding, the EDSL program is a function of the specific implementation (in this case, semantics). Either approach ensures that the DSL program is parametric in the implementation, and hence cannot refer to details of a specific implementation. However, we judged the syntactic overhead for the programmer to be too high to use those techniques; in our implementation, we rely on encapsulation and on dynamic checks at query construction time to achieve similar guarantees.

The choice of interpreting expressions turned out to be a significant performance limitation. It could likely be addressed by using Delite and lightweight modular staging instead, but we wanted to experiment with how far we can get within the language, in a well-typed setting.

7.2.3 What did not work so well: Scala type inference

When implementing our library, we often struggled against limitations and bugs in the Scala compiler, especially regarding type inference and its interaction with implicit resolution, and we were often constrained by its limitations. Not only is Scala's type inference not complete, but we learned that its behavior is only specified by the behavior of the current implementation: in many cases where there is a clear desirable solution, type inference fails or finds an incorrect substitution, which leads to a type error. Hence we cannot distinguish, in this discussion, the Scala language from its implementation. We regard many of Scala's type inference problems as bugs, and reported them

¹http://stackoverflow.com/questions/8533826/what-is-scalas-experimental-virtual-pattern-matcher


as such when no previous bug report existed, as noted in the rest of this section. Some of them are long-standing issues; others were accepted; for other ones we received no feedback yet, at the time of this writing; and another one was already closed as WONTFIX, indicating that a fix would be possible but would have excessive complexity for the time being².

Overloading: The code in Fig. 3.1 uses the lifted BookData constructor. Two definitions of BookData are available, with signatures BookData(String, String, Int) and BookData(Exp[String], Exp[String], Exp[Int]), and it seems like the Scala compiler should be able to choose which one to apply using overload resolution. This, however, is not supported, simply because the two functions are defined in different scopes³; hence, importing BookData(Exp[String], Exp[String], Exp[Int]) shadows locally the original definition.

Type inference vs. implicit resolution: The interaction between type inference and implicit resolution is a hard problem, and Scalac also has many bugs, but the current situation requires further research; for instance, there is not even a specification for the behavior of type inference⁴.

As a consequence, to the best of our knowledge, some properties of type inference have not been formally established. For instance, a reasonable user expectation is that removing a call to an implicit conversion does not alter the program, if it is the only implicit conversion with the correct type in scope, or if it is more specific than the others [Odersky et al 2011, Ch. 21]. This is not always correct, because removing the implicit conversion reduces the information available for the type inference algorithm; we observed multiple cases⁵ where type inference becomes unable to figure out enough information about the type to trigger the implicit conversion.

We also consider it significant that Scala 2.8 required making both type inference and implicit resolution more powerful, specifically in order to support the collection library [Moors 2010; Odersky et al 2011, Sec. 21.7]; further extensions would be possible and desirable. For instance, if type inference were extended with higher-order unification⁶ [Pfenning 1988], it would better support a part of our DSL interface (not discussed in this chapter), by removing the need for type annotations.

Nested pattern matches for GADTs in Scala: Writing a typed decomposition for Exp requires pattern-matching support for generalized algebraic datatypes (GADTs). We found that support for GADTs in Scala is currently insufficient. Emir et al [2007b] define the concept of typecasing, essentially a form of pattern matching limited to non-nested patterns, and demonstrate that Scala supports typecasing on GADTs by exhibiting a typed evaluator; however, typecasing is rather verbose for deep patterns, since one has to nest multiple pattern-matching expressions. When using normal pattern matches, instead, the support for GADTs seems much weaker⁷. Hence, one has to choose between support for GADTs and the convenience of nested pattern matching. A third alternative is to ignore or disable compiler warnings, but we did not consider this option.

Implicit conversions do not chain: While implicit conversions by default do not chain, it is sometimes convenient to allow chaining selectively. For instance, let us assume a context such that a: Exp[A], b: Exp[B] and c: Exp[C]. In this context, let us consider again how we lift tuples. We have seen that the expression (a, b) has type (Exp[A], Exp[B]), but can be converted to Exp[(A, B)] through an implicit conversion. Let us now consider nested tuples, like ((a, b), c): it has type ((Exp[A], Exp[B]), Exp[C]), hence the previous conversion cannot be applied to this expression.

Odersky et al [2011, Ch. 21] describe a pattern which can address this goal. Using this pattern, to lift pairs, we must write an implicit conversion from pairs of elements which can be implicitly

²https://issues.scala-lang.org/browse/SI-2551
³https://issues.scala-lang.org/browse/SI-2551
⁴https://issues.scala-lang.org/browse/SI-5298?focusedCommentId=55971#comment-55971, reported by us
⁵https://issues.scala-lang.org/browse/SI-5592, reported by us
⁶https://issues.scala-lang.org/browse/SI-2712
⁷Due to bug https://issues.scala-lang.org/browse/SI-5298?focusedCommentId=56840#comment-56840, reported by us


converted to expressions: instead of requiring (Exp[A], Exp[B]), the implicit conversion should require (A, B), with the condition that A can be converted to Exp[A'] and B to Exp[B']. This conversion solves the problem if applied explicitly, but the compiler refuses to apply it implicitly, again because of type inference issues⁸.
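In code, the pattern looks roughly as follows (our sketch; pairExp stands for whichever constructor builds the reified pair expression):

// Lift any pair whose components are themselves convertible to expressions.
implicit def liftPair[A, B, A2, B2](p: (A, B))
    (implicit liftA: A ⇒ Exp[A2], liftB: B ⇒ Exp[B2]): Exp[(A2, B2)] =
  pairExp(liftA(p._1), liftB(p._2))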

Because of these type inference limitations, we failed to provide support for reifying code like ((a, b), c)⁹.

Error messages for implicit conversions: The enrich-my-library pattern has the declared goal of allowing to extend existing libraries transparently. However, implementation details shine through when a user program using this feature contains a type error: when invoking a method would require an implicit conversion which is not applicable, the compiler often just reports that the method is not available. The recommended approach to debugging such errors is to manually add the missing implicit conversion and investigate the type error [Odersky et al 2011, Ch. 21.8], but this destroys the transparency of the approach when creating or modifying code. We believe this could be solved in principle by research on error reporting: the compiler could automatically insert all implicit conversions enabling the method calls, and report the corresponding errors, even if at some performance cost.

7.2.4 Lessons for language embedders

Various domains, such as the one considered in our case study, allow powerful domain-specific optimizations. Such optimizations often are hard to express in a compositional way; hence, they cannot be performed while building the query, but must be expressed as global optimization passes. For those domains, deep embedding is key to allow significant optimizations. On the other hand, deep embedding requires implementing an interpreter or a compiler.

On the one hand, interpretation overhead is significant in Scala, even when using HOAS to take advantage of the metalevel implementation of argument access.

Instead of interpreting a program, one can compile an EDSL program to Scala and load it, as done by Rompf et al [2011]; while we are using this approach, the disadvantage is the compilation delay, especially for Scala, whose compilation process is complex and time-consuming. Possible alternatives include generating bytecode directly, or combining interpretation and compilation similarly to tiered JIT compilers, where only code which is executed often is compiled. We plan to investigate such alternatives in future work.

⁸https://issues.scala-lang.org/browse/SI-5651, reported by us
⁹One could of course write a specific implicit conversion for this case; however, (a, (b, c)) already requires a different implicit conversion, and there are infinite ways to nest pairs, let alone tuples of bounded arity.

Chapter 8

Related work

This chapter discusses prior work on language-integrated queries, query optimization, techniques for DSL embedding, and other works on code querying.

8.1 Language-Integrated Queries

Microsoft's Language-Integrated Query technology (Linq) [Meijer et al 2006; Bierman et al 2007] is similar to our work in that it also reifies queries on collections to enable analysis and optimization. Such queries can be executed against a variety of backends (such as SQL databases or in-memory objects), and adding new back-ends is supported. Its implementation uses expression trees, a compiler-supported implicit conversion between expressions and their reification as a syntax tree. There are various major differences, though. First, the support for expression trees is hard-coded into the compiler. This means that the techniques are not applicable in languages that do not explicitly support expression trees. More importantly, the way expression trees are created in Linq is generic and fixed. For instance, it is not possible to create different tree nodes for method calls that are relevant to an analysis (such as the map method) than for method calls that are irrelevant for the analysis (such as the toString method). For this reason, expression trees in Linq cannot be customized to the task at hand, and contain too much low-level information. It is well-known that this makes it quite hard to implement programs operating on expression trees [Eini 2011].

Linq queries can also not easily be decomposed and modularized. For instance, consider the task of refactoring the filter in the query from x in y where x.z == 1 select x into a function. Defining this function as bool comp(int v) { return v == 1; } would destroy the possibility of analyzing the filter for optimization, since the resulting expression tree would only contain a reference to an opaque function. The function could be declared as returning an expression tree instead, but then this function could not be used in the original query anymore, since the compiler expects an expression of type bool and not an expression tree of type bool. It could only be integrated if the expression tree of the original query were created by hand, without using the built-in support for expression trees.

Although queries against in-memory collections could theoretically also be optimized in Linq, the standard implementation, Linq2Objects, performs no optimizations.

A few optimized embedded DSLs allow executing queries or computations on distributed clusters. DryadLINQ [Yu et al 2008], based on Linq, optimizes queries for distributed execution. It inherits Linq's limitations and thus does not support decomposing queries into different modules. Modularizing queries is supported instead by FlumeJava [Chambers et al 2010], another library (in Java) for distributed query execution. However, FlumeJava cannot express many optimizations, because its



representation of expressions is more limited; also, its query language is more cumbersome. Both problems are rooted in Java's limited support for embedded DSLs. Other embedded DSLs support parallel platforms such as GPUs or many-core CPUs, such as Delite [Rompf et al 2013].

Willis et al [2006; 2008] add first-class queries to Java through a source-to-source translator, and implement a few selected optimizations, including join order optimization and incremental maintenance of query results. They investigate how well their techniques apply to Java programs, and they suggest that programmers use manual optimizations to avoid expensive constructs like nested loops. While the goal of these works is similar to ours, their implementation as an external source-to-source translator makes the adoption, extensibility and composability of their technique difficult.

There have been many approaches for a closer integration of SQL queries into programs, such as HaskellDB [Leijen and Meijer 1999] (which also inspired Linq) or Ferry [Grust et al 2009] (which moves part of a program's execution to a database). In Scala, there are also APIs which integrate SQL queries more closely, such as Slick¹. Its frontend allows defining and combining type-safe queries, similarly to ours (also in the way it is implemented). However, the language for defining queries maps to SQL, so it does not support nesting collections in other collections (a feature which simplified our example in Sec. 2.2), nor does it distinguish statically between different kinds of collections, such as Set or Seq. Based on Ferry, ScalaQL [Garcia et al 2010] extends Scala with a compiler plugin to integrate a query language on top of a relational database. The work by Spiewak and Zhao [2009] is unrelated to [Garcia et al 2010], but also called ScalaQL. It is similar to our approach in that it also proposes to reify queries based on for-comprehensions, but it is not clear from their paper how the reification works².

8.2 Query Optimization

Query optimization on relational data is a long-standing issue in the database community, but there are also many works on query optimization on objects [Fegaras and Maier 2000; Grust 1999]. Compared to these works, we have only implemented a few simple query optimizations, so there is potential for further improvement of our work by incorporating more advanced optimizations.

Henglein [2010] and Henglein and Larsen [2010] embed relational algebra in Haskell. Queries are not reified in their approach, but due to a particularly sophisticated representation of multisets, it is possible to execute some queries containing cross-products using faster equi-joins. While their approach appears not to rely on non-confluent rewriting rules, and hence appears more robust, it is not yet clear how to incorporate other optimizations.

8.3 Deep Embeddings in Scala

Technically, our implementation of SQuOpt is a deep embedding of a part of the Scala collections API [Odersky and Moors 2009]. Deep embeddings were pioneered by Leijen and Meijer [1999] and Elliott et al [2003] in Haskell, for other applications.

We regard the Scala collections API [Odersky and Moors 2009] as a shallowly embedded query DSL. Query operators are eager, that is, they immediately perform collection operations when called, so that it is not possible to optimize queries before execution.

Scala collections also provide views, which are closer to SQuOpt. Unlike standard query operators, they create lazy collections. Like SQuOpt, views reify query operators as data structures and interpret them later. Unlike SQuOpt, views are not used for automatic query optimization,

¹http://slick.typesafe.com
²We contacted the authors; they were not willing to provide more details or the sources of their approach.


but for explicitly changing the evaluation order of collection processing. Moreover, they cannot be reused by SQuOpt, as they reify too little: views embed deeply only the outermost pipeline of collection operators, while they embed shallowly nested collection operators and other Scala code in queries, such as arguments of filter, map and flatMap. Deep embedding of the whole query is necessary for many optimizations, as discussed in Chapter 3.

8.3.1 LMS

Our deep embedding is inspired by some of the Scala techniques presented by Lightweight Modular Staging (LMS) [Rompf and Odersky 2010], for using implicits and for adding infix operators to a type. Like Rompf and Odersky [2010], we also generate and compile Scala code on-the-fly, reusing the Scala compiler. A plausible alternative backend for SQuOpt would have been to use LMS and Delite [Rompf et al 2011], a framework for building highly efficient DSLs in Scala.

An alternative to SQuOpt based on LMS and Delite, named OptiQL, was indeed built and presented in concurrent [Rompf et al 2013] and subsequent work [Sujeeth et al 2013b]. Like SQuOpt, OptiQL enables writing and optimizing collection queries in Scala. On the minus side, OptiQL supports fewer collection types (ArrayList and HashMap).

On the plus side, OptiQL supports queries containing effects, and can reuse the support for automatic parallelization and multiple platforms present in Delite. While LMS allows embedding effectful queries, it is unclear how many of the implemented optimizations remain sound on such queries, and how many of them have been extended to such queries.

OptiQL achieves impressive speedups by fusing collection operations and transforming (or lowering) high-level bulk operators to highly optimized imperative programs using while loops. This gains significant advantages because it can avoid boxing and intermediate allocations, and because inlining heuristics in the HotSpot JIT compiler have never been tuned to bulk operators in functional programs³. We did not investigate such lowerings. It is unclear how well these optimizations would extend to other collections, which intrinsically carry further overhead. Moreover, it is unclear how to execute incrementally the result of these lowerings, as we discuss in Sec. 7.1.
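As an illustration (ours, not OptiQL's actual output), a pipeline such as xs.filter(p).map(f).sum can be lowered to a single while loop with no intermediate collections or boxing:

def filterMapSum(xs: Array[Int], p: Int ⇒ Boolean, f: Int ⇒ Int): Int = {
  var i = 0
  var acc = 0
  while (i < xs.length) { // filter, map and sum fused into one pass
    val x = xs(i)
    if (p(x)) acc += f(x)
    i += 1
  }
  acc
}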

Those works did not support embedding arbitrary libraries in an automated or semi-automated way; this was only addressed later in Forge [Sujeeth et al 2013a] and Yin-Yang [Jovanovic et al 2014].

Ackermann et al [2012] present Jet, which also optimizes collection queries, but targets MapReduce-style computations in a distributed environment. Like OptiQL, Jet does not apply typical database optimizations such as indexing or filter hoisting.

8.4 Code Querying

In our evaluation, we explore the usage of SQuOpt to express queries on code, and re-implement a subset of the FindBugs [Hovemeyer and Pugh 2004] analyses. There are various other specialized code query languages, such as CodeQuest [Hajiyev et al 2006] or D-CUBED [Wegrzynowicz and Stencel 2009]. Since these are special-purpose query languages that are not embedded into a host language, they are not directly comparable to our approach.

³See the discussion by Cliff Click at http://www.azulsystems.com/blog/cliff/2011-04-04-fixing-the-inlining-problem


Chapter 9

Conclusions

We have illustrated the tradeoff between performance and modularity for queries on in-memory collections. We have shown that it is possible to design a deep embedding of a version of the collections API which reifies queries and can optimize them at runtime. Writing queries using this framework is, except for minor syntactic details, the same as writing queries using the collection library; hence, the adoption barrier to using our optimizer is low.

Our evaluation shows that using abstractions in queries introduces a significant performance overhead with native Scala code, while SQuOpt in most cases makes the overhead much more tolerable or removes it completely. Optimizations are not sufficient on some queries, but since our optimizer is a proof-of-concept with many opportunities for improvement, we believe a more elaborate version will achieve even better performance and reduce these limitations.

9.1 Future work

To make our DSL more convenient to use, it would be useful to use the virtualized pattern matcher of Scala 2.10, when it becomes more robust, to add support for pattern matching in our virtualized queries.

Finally, while our optimizations are type-safe, as they rewrite an expression tree to another of the same type, currently the Scala type-checker cannot verify this statically, because of its limited support for GADTs. Solving this problem conveniently would allow checking statically that transformations are safe, and would make developing them easier.



Part II

Incremental λ-Calculus


Chapter 10

Introduction to differentiation

Incremental computation (or incrementalization) has a long-standing history in computer science [Ramalingam and Reps 1993]. Often, a program needs to quickly update the output of some nontrivial function f when the input to the computation changes. In this scenario, we assume we have computed y1 = f x1, and we need to compute y2, which equals f x2. Programmers typically have to choose between a few undesirable options:

• Programmers can call function f again on the updated input x2 and repeat the computation from scratch. This choice guarantees correct results and is easy to implement, but typically wastes computation time. Often, if the updated input is close to the original input, the same result can be computed much faster.

• Programmers can write by hand a new function df that updates the output based on input changes, using various techniques. Running a hand-written function df can be much more efficient than rerunning f, but writing df requires significant developer effort, is error-prone, and requires updating df by hand to keep it consistent with f whenever f is modified. In practice, this complicates code maintenance significantly [Salvaneschi and Mezini 2013].

• Programmers can write f using domain-specific languages that support incrementalization, for tasks where such languages are available. For instance, build scripts (our f) are written in domain-specific languages that support (coarse-grained) incremental builds. Database query languages also often have support for incrementalization.

• Programmers can attempt to use general-purpose techniques for incrementalizing programs, such as self-adjusting computation and variants such as Adapton. Self-adjusting computation applies to arbitrary purely functional programs and has been extended to imperative programs; however, it only guarantees efficient incrementalization when applied to base programs that are designed for efficient incrementalization. Nevertheless, self-adjusting computation enabled incrementalizing programs that had never been incrementalized by hand before.

No approach guarantees automatic efficient incrementalization for arbitrary programs. We propose instead to design domain-specific languages (DSLs) that can be efficiently incrementalized, which we call incremental DSLs (IDSLs).

To incrementalize IDSL programs, we use a transformation that we call (finite) differentiation. Differentiation produces programs in the same language, called derivatives, that can be optimized further and compiled to efficient code. Derivatives represent changes to values through further values, which we call simply changes.



For primitives, IDSL designers must specify the result of differentiation: IDSL designers are to choose primitives that encapsulate efficiently incrementalizable computation schemes, while IDSL users are to express their computation using the primitives provided by the IDSL.

Helping IDSL designers to incrementalize primitives automatically is a desirable goal, though one that we leave open. In our setting, incrementalizing primitives becomes a problem of program synthesis, and we agree with Shah and Bodik [2017] that it should be treated as such. Among others, Liu [2000] develops a systematic approach to this synthesis problem for first-order programs, based on equational reasoning, but it is unclear how scalable this approach is. We provide foundations for using equational reasoning, and sketch an IDSL for handling different sorts of collections. We also discuss avenues for providing language plugins for more fundamental primitives, such as algebraic datatypes with structural recursion.

In the IDSLs we consider, similarly to database languages, we use primitives for high-level operations of complexity similar to SQL operators. On the one hand, IDSL designers wish to design few general primitives, to limit the effort needed for manual incrementalization. On the other hand, overly general primitives can be harder to incrementalize efficiently. Nevertheless, we also provide some support for more general building blocks, such as product or sum types, and even (in some situations) recursive types. Other approaches provide more support for incrementalizing primitives, but even then, ensuring efficient incrementalization is not necessarily easy. Dynamic approaches to incrementalization are most powerful: they can find work that can be reused at runtime, using memoization, as long as the computation is structured so that memoization matches will occur. Moreover, for some programs, it seems more efficient to detect that some output can be reused thanks to a description of the input changes, rather than through runtime detection.

We propose that IDSLs be higher-order, so that primitives can be parameterized over functions and hence highly flexible, and purely functional, to enable more powerful optimizations both before and after differentiation. Hence, an incremental DSL is a higher-order, purely functional language, composed of a λ-calculus core extended with base types and primitives. Various database query languages support forms of finite differentiation (see Sec. 19.2.1), but only over first-order languages, which provide only restricted forms of operators such as map, filter or aggregation.

To support higher-order IDSLs, we define the first form of differentiation that supports higher-order functional languages; to this end, we introduce the concept of function changes, which contain changes to either the code of a function value or the values it closes over. While higher-order programs can be transformed to first-order ones, incrementalizing the resulting programs is still beyond reach for previous approaches to differentiation (see Sec. 19.2.1 for earlier work and Sec. 19.2.2 for later approaches). In Chapter 17 and Appendix D, we transform higher-order programs to first-order ones by closure conversion or defunctionalization, but we incrementalize defunctionalized programs using similar ideas, including changes to (defunctionalized) functions.

Our primary example will be DSLs for operations on collections: as discussed earlier (Chapter 2), we favor higher-order collection DSLs over relational databases.

We build our incremental DSLs based on simply-typed λ-calculus (STLC), extended with language plugins to define the domain-specific parts, as discussed in Appendix A.2 and summarized in Fig. A.1. We call our approach ILC, for Incremental Lambda Calculus.

The rest of this chapter is organized as follows. In Sec. 10.1 we explain that differentiation generalizes the calculus of finite differences, a relative of differential calculus. In Sec. 10.2 we show a motivating example for our approach. In Sec. 10.3 we introduce informally the concept of changes as values, and in Sec. 10.4 we introduce changes to functions. In Sec. 10.5 we define differentiation and motivate it informally. In Sec. 10.6 we apply differentiation to our motivating example.

Correctness of ILC is far from obvious. In Chapters 12 and 13, we introduce a formal theory of changes, and we use it to formalize differentiation and prove it correct.


10.1 Generalizing the calculus of finite differences

Our theory of changes generalizes an existing field of mathematics called the calculus of finite differences. If f is a real function, one can define its finite difference, that is, a function ∆f such that ∆f a da = f (a + da) − f a. Readers might notice the similarity (and the differences) between the finite difference and the derivative of f, since the latter is defined as

f′(a) = lim_{da→0} (f (a + da) − f (a)) / da

The calculus of finite differences helps to compute a closed formula for ∆f, given a closed formula for f. For instance, if function f is defined by f x = 2 · x, one can prove its finite difference is ∆f x dx = 2 · (x + dx) − 2 · x = 2 · dx.
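This example can be rendered directly in code (a small sketch in Scala):

val f: Int ⇒ Int = x ⇒ 2 * x
val df: (Int, Int) ⇒ Int = (x, dx) ⇒ 2 * dx // the finite difference ∆f
// Defining equation of finite differences: ∆f x dx = f (x + dx) − f x.
assert(f(5 + 3) == f(5) + df(5, 3))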

Finite differences are helpful for incrementalization because they allow computing functions on updated inputs based on results on base inputs, if we know how inputs change. Take again for instance f x = 2 · x: if x is a base input and x + dx is an updated input, we can compute f (x + dx) = f x + ∆f x dx. If we already computed y = f x and reuse the result, we can compute f (x + dx) = y + ∆f x dx. Here, the input change is dx and the output change is ∆f x dx.

However, the calculus of finite differences is usually defined for real functions. Since it is based on operators + and −, it can be directly extended to commutative groups. Incrementalization based on finite differences for groups and first-order programs has already been researched [Paige and Koenig 1982; Gluche et al 1997], most recently and spectacularly with DBToaster [Koch 2010; Koch et al 2016].

But it is not immediate how to generalize finite differencing beyond groups, and many useful types do not form a group: for instance, lists of integers don't form a group, but only a monoid. Moreover, it's hard to represent list changes simply through a list: how do we specify which elements were inserted (and where), which were removed, and which were subjected to change themselves?

In ILC, we generalize the calculus of finite differences by using distinct types for base values and changes, and adapting the surrounding theory. ILC generalizes operators + and − as operators ⊕ (pronounced "oplus" or "update") and ⊖ (pronounced "ominus" or "difference"). We show how ILC subsumes groups in Sec. 13.1.1.

10.2 A motivating example

In this section, we illustrate informally incrementalization on a small example.

In the following program, grandTotal xs ys sums the integer numbers in input collections xs and ys:

grandTotal : Bag Z → Bag Z → Z
s : Z
grandTotal xs ys = sum (merge xs ys)
s = grandTotal xs ys

This program computes output s from input collections xs and ys. These collections are multisets or bags, that is, collections that are unordered (like sets) where elements are allowed to appear more than once (unlike sets). In this example, we assume a language plugin that supports a base type of integers Z and a family of base types of bags Bag τ for any type τ.

We can run this program on specific inputs xs1 = {{1, 2, 3}} and ys1 = {{4}} to obtain output s1. Here, double braces denote a bag containing the elements among the double braces:

s1 = grandTotal xs1 ys1
   = sum {{1, 2, 3, 4}} = 10


This example uses small inputs for simplicity, but in practice they are typically much bigger; we use n to denote the input size. In this case, the asymptotic complexity of computing s is Θ(n).

Consider computing the updated output s2 from the updated inputs xs2 = {{1, 1, 2, 3}} and ys2 = {{4, 5}}. We could recompute s2 from scratch as

s2 = grandTotal xs2 ys2
   = sum {{1, 1, 2, 3, 4, 5}} = 16

But if the size of the updated inputs is Θ(n) recomputation also takes time Θ(n) and we would liketo obtain our result asymptotically faster

To compute the updated output s2 faster, we assume the changes to the inputs have a description of size dn that is asymptotically smaller than the input size n, that is, dn = o(n). All approaches to incrementalization require small input changes. Incremental computation will then process the input changes, rather than just the new inputs.

10.3 Introducing changes

To talk about the differences between old values and new values, we introduce a few concepts, for now without full definitions. In our approach to incrementalization, we describe changes to values as values themselves. We call such descriptions simply changes. Incremental programs examine changes to inputs to understand how to produce changes to outputs. Just like in STLC we have terms (programs) that evaluate to values, we also have change terms, which evaluate to change values. We require that going from old values to new values preserves types: that is, if an old value v1 has type τ, then also its corresponding new value v2 must have type τ. To each type τ we associate a type of changes or change type ∆τ: a change between v1 and v2 must be a value of type ∆τ. Similarly, environments can change: to each typing context Γ we associate change typing contexts ∆Γ, such that we can have an environment change dρ : ⟦∆Γ⟧ from ρ1 : ⟦Γ⟧ to ρ2 : ⟦Γ⟧.

Not all descriptions of changes are meaningful, so we also talk about valid changes. Valid changes satisfy additional invariants that are useful during incrementalization. A change value dv can be a valid change from v1 to v2. We can also consider a valid change as an edge from v1 to v2 in a graph associated to τ (where the vertexes are values of type τ), and we call v1 the source of dv and v2 the destination of dv. We only talk of source and destination for valid changes, so a change from v1 to v2 is (implicitly) valid. We'll discuss examples of valid and invalid changes in Examples 10.3.1 and 10.3.2.

We also introduce an operator ⊕ on values and changes: if dv is a valid change from v1 to v2, then v1 ⊕ dv (read as "v1 updated by dv") is guaranteed to return v2. If dv is not a valid change from v1, then v1 ⊕ dv can be defined to some arbitrary value, or not, without any effect on correctness. In practice, if ⊕ detects an invalid input change, it can trigger an error or return a dummy value; in our formalization, we assume for simplicity that ⊕ is total. Again, if dv is not valid from v1 to v1 ⊕ dv, then we do not talk of the source and destination of dv.

We also introduce an operator ⊖: given two values v1, v2 of the same type, v2 ⊖ v1 is a valid change from v1 to v2.

Finally, we introduce change composition: if dv1 is a valid change from v1 to v2 and dv2 is a valid change from v2 to v3, then dv1 ⊚ dv2 is a valid change from v1 to v3.

Change operators are overloaded over different types. Coherent definitions of validity and of operators ⊕, ⊖ and ⊚ for a type τ form a change structure over values of type τ (Definition 13.1.1). For each type τ we'll define a change structure (Definition 13.4.2), and operators will have types ⊕ : τ → ∆τ → τ, ⊖ : τ → τ → ∆τ, ⊚ : ∆τ → ∆τ → ∆τ.
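As a preview of the formal development in Chapter 13, a change structure can be sketched in Scala as a type class (our rendering; validity is a semantic condition, not captured by these types):

// A change structure over values of type V with changes of type DV.
trait ChangeStructure[V, DV] {
  def oplus(v: V, dv: DV): V         // ⊕: update a value by a change
  def ominus(v2: V, v1: V): DV       // ⊖: a valid change from v1 to v2
  def compose(dv1: DV, dv2: DV): DV  // ⊚: compose consecutive changes
}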

Chapter 10 Introduction to dierentiation 59

Example 10.3.1 (Changes on integers and bags). To show how incrementalization affects our example, we next describe valid changes for integers and bags. For now, a change das to a bag as1 simply contains all elements to be added to the initial bag as1 to obtain the updated bag as2 (we'll ignore removing elements for this section and discuss it later). In our example, the change from xs1 (that is, {{1, 2, 3}}) to xs2 (that is, {{1, 1, 2, 3}}) is dxs = {{1}}, while the change from ys1 (that is, {{4}}) to ys2 (that is, {{4, 5}}) is dys = {{5}}. To represent the output change ds from s1 to s2, we need integer changes. For now, we represent integer changes as integers, and define ⊕ on integers as addition: v1 ⊕ dv = v1 + dv.

For both bags and integers, a change dv is always valid between v1 and v2 = v1 ⊕ dv; for other changes, however, validity will be more restrictive.

Example 10.3.2 (Changes on naturals). For instance, say we want to define changes on a type of natural numbers, and we still want to have v1 ⊕ dv = v1 + dv. A change from 3 to 2 should still be −1, so the type of changes must be Z. But the result of ⊕ should still be a natural, that is, an integer ≥ 0: to ensure that v1 ⊕ dv ≥ 0, we need to require that dv ≥ −v1. We use this requirement to define validity on naturals: dv ⊲ v1 ↪ v1 + dv : N is defined as equivalent to dv ≥ −v1. We can guarantee equation v1 ⊕ dv = v1 + dv not for all changes, but only for valid changes. Conversely, if a change dv is invalid for v1, then v1 + dv < 0. We then define v1 ⊕ dv to be 0, though any other definition on invalid changes would work.¹
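In the Scala sketch introduced above, this change structure reads as follows; since validity (dv ≥ −v1) is not tracked by the types, oplus clamps invalid results to 0:

// Naturals (non-negative Int) with integer changes.
object NatChanges extends ChangeStructure[Int, Int] {
  def oplus(v: Int, dv: Int): Int = math.max(v + dv, 0) // 0 on invalid changes
  def ominus(v2: Int, v1: Int): Int = v2 - v1           // always valid: v1 + (v2 - v1) >= 0
  def compose(dv1: Int, dv2: Int): Int = dv1 + dv2
}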

10.3.1 Incrementalizing with changes

After introducing changes and related notions, we describe how we incrementalize our example program.

We consider again the scenario of Sec. 10.2: we need to compute the updated output s2, the result of calling our program grandTotal on updated inputs xs2 and ys2; and we have the initial output s1, from calling our program on initial inputs xs1 and ys1. In this scenario, we can compute s2 non-incrementally by calling grandTotal on the updated inputs, but we would like to obtain the same result faster. Hence, we compute s2 incrementally: that is, we first compute the output change ds from s1 to s2, then we update the old output s1 by change ds. Successful incremental computation must compute the correct s2 asymptotically faster than non-incremental computation. This speedup is possible because we take advantage of the computation already done to compute s1.

To compute the output change ds from s1 to s2, we propose to transform our base program grandTotal to a new program dgrandTotal, which we call the derivative of grandTotal: to compute ds, we call dgrandTotal on initial inputs and their respective changes. Unlike other approaches to incrementalization, dgrandTotal is a regular program in the same language as grandTotal, hence it can be further optimized with existing technology.

Below, we give the code for dgrandTotal and show that, in this example, incremental computation computes s2 correctly.

For ease of reference, we recall inputs, changes and outputs:

xs1 = {{1, 2, 3}}
dxs = {{1}}
xs2 = {{1, 1, 2, 3}}
ys1 = {{4}}
dys = {{5}}

¹In fact, we could leave ⊕ undefined on invalid changes. Our original presentation [Cai et al 2014], in essence, restricted ⊕ to valid changes through dependent types, by ensuring that applying it to invalid changes would be ill-typed. Later, Huesca [2015], in similar developments, simply made ⊕ partial on its domain, instead of restricting the domain, achieving similar results.


ys2 = {{4, 5}}
s1 = grandTotal xs1 ys1 = 10
s2 = grandTotal xs2 ys2 = 16

Incremental computation uses the following definitions to compute s2 correctly and, as desired, with fewer steps:

dgrandTotal xs dxs ys dys = sum (merge dxs dys)
ds = dgrandTotal xs1 dxs ys1 dys
   = sum {{1, 5}} = 6
s2 = s1 ⊕ ds = s1 + ds
   = 10 + 6 = 16

Incremental computation should be asymptotically faster than non-incremental computation; hence, the derivative we run should be asymptotically faster than the base program. Here, derivative dgrandTotal is faster simply because it ignores initial inputs altogether. Therefore, its time complexity depends only on the total size of changes, dn. In particular, the complexity of dgrandTotal is Θ(dn) = o(n).
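The example can also be replayed executably; the following sketch (ours) represents bags as maps from elements to multiplicities, and additive changes as bags:

type Bag = Map[Int, Int] // element → multiplicity
def merge(a: Bag, b: Bag): Bag =
  (a.keySet ++ b.keySet).map(k ⇒ k → (a.getOrElse(k, 0) + b.getOrElse(k, 0))).toMap
def sum(b: Bag): Int = b.map { case (k, n) ⇒ k * n }.sum
def grandTotal(xs: Bag, ys: Bag): Int = sum(merge(xs, ys))
// The derivative ignores the base inputs and processes only the changes.
def dgrandTotal(xs: Bag, dxs: Bag, ys: Bag, dys: Bag): Int = sum(merge(dxs, dys))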

We generate derivatives through a program transformation from terms to terms, which we call differentiation (or, sometimes, simply derivation). We write D⟦t⟧ for the result of differentiating term t. We apply D⟦–⟧ to terms of our non-incremental programs, or base terms, such as grandTotal. To define differentiation, we assume that we already have derivatives for the primitive functions they use; we discuss later how to write such derivatives by hand.

We define differentiation in Definition 10.5.1; some readers might prefer to peek ahead, but we prefer to first explain what differentiation is supposed to do.

A derivative of a function can be applied to initial inputs and changes from initial inputs to updated inputs, and returns a change from an initial output to an updated output. For instance, take derivative dgrandTotal, initial inputs xs1 and ys1, and changes dxs and dys from initial inputs to updated inputs. Then, change dgrandTotal xs1 dxs ys1 dys, that is ds, goes from initial output grandTotal xs1 ys1, that is s1, to updated output grandTotal xs2 ys2, that is s2. And because ds goes from s1 to s2, it follows as a corollary that s2 = s1 ⊕ ds. Hence, we can compute s2 incrementally through s1 ⊕ ds, as we have shown, rather than by evaluating grandTotal xs2 ys2.

We often just say that a derivative of function f maps changes to the inputs of f to changes tothe outputs of f leaving the initial inputs implicit In short

Slogan 1033Term D n t o maps input changes to output changes That is D n t o applied to initial base inputs andvalid input changes (from initial inputs to updated inputs) gives a valid output change from t appliedon old inputs to t applied on new inputs

For a generic unary function f : A → B, the behavior of D⟦f⟧ can be described as

  f a2 ≃ f a1 ⊕ D⟦f⟧ a1 da    (10.1)

or as

  f (a1 ⊕ da) ≃ f a1 ⊕ D⟦f⟧ a1 da    (10.2)

where da is a metavariable² standing for a valid change from a1 to a2 (with a1, a2 : A), and where ≃ denotes denotational equivalence (Definition A.2.5). Moreover, D⟦f⟧ a1 da is also a valid change, and can hence be used as an argument for operations that require valid changes. These equations follow from Theorem 12.2.2 and Corollary 13.4.5; we iron out the few remaining details to obtain these equations in Sec. 14.1.2.

²Nitpick: if da were read as an object variable, denotational equivalence would detect that these terms are not equivalent if da maps to an invalid change. Hence we said that da is a metavariable. Later we define denotational equivalence for valid changes (Definition 14.1.2), which gives a less cumbersome way to state such equations.

In our example, we have applied D⟦–⟧ to grandTotal and simplified the result via β-reduction to produce dgrandTotal, as we show in Sec. 10.6. Correctness of D⟦–⟧ guarantees that sum (merge dxs dys) evaluates to a change from sum (merge xs ys) evaluated on old inputs xs1 and ys1 to sum (merge xs ys) evaluated on new inputs xs2 and ys2.

In this section we have sketched the meaning of differentiation informally. We discuss incrementalization on higher-order terms in Sec. 10.4 and actually define differentiation in Sec. 10.5.

10.4 Differentiation on open terms and functions

We have shown that applying D⟦–⟧ on closed functions produces their derivatives. However, D⟦–⟧ is defined for all terms, hence also for open terms and for non-function types.

Open terms Γ ⊢ t : τ are evaluated with respect to an environment for Γ, and when this environment changes, the result of t changes as well. D⟦t⟧ computes the change to t's output. If Γ ⊢ t : τ, evaluating term D⟦t⟧ requires as input a change environment dρ : ⟦∆Γ⟧, containing changes from the initial environment ρ1 : ⟦Γ⟧ to the updated environment ρ2 : ⟦Γ⟧. The (environment) input change dρ is mapped by D⟦t⟧ to output change dv = ⟦D⟦t⟧⟧ dρ, a change from initial output ⟦t⟧ ρ1 to updated output ⟦t⟧ ρ2. If t is a function, dv maps in turn changes to the function arguments to changes to the function result. All this behavior, again, follows our slogan.

Environment changes contain changes for each variable in Γ. More precisely, if variable x appears with type τ in Γ, and hence in ρ1 and ρ2, then dx appears with type ∆τ in ∆Γ, and hence in dρ. Moreover, dρ extends ρ1, to provide D⟦t⟧ with initial inputs and not just with their changes.

The two environments ρ1 and ρ2 can share entries; in particular, environment change dρ can be a nil change from ρ to ρ. For instance, Γ can be empty: then ρ1 and ρ2 are also empty (since they match Γ) and equal, so dρ is a nil change. Alternatively, some or all the change entries in dρ can be nil changes. Whenever dρ is a nil change, ⟦D⟦t⟧⟧ dρ is also a nil change.

If t is a function, D⟦t⟧ will be a function change. Changes to functions in turn map input changes to output changes, following our Slogan 10.3.3. If a change df from f1 to f2 is applied (via df a1 da) to an input change da from a1 to a2, then df will produce a change dv = df a1 da from v1 = f1 a1 to v2 = f2 a2. The definition of function changes is recursive on types: that is, dv can in turn be a function change, mapping input changes to output changes.

Derivatives are a special case of function changes: a derivative df is simply a change from f to f itself, which maps input changes da from a1 to a2 to output changes dv = df a1 da from f a1 to f a2. This definition coincides with the earlier definition of derivatives, and it also coincides with the definition of function changes for the special case where f1 = f2 = f. That is why D⟦t⟧ produces derivatives if t is a closed function term: we can only evaluate D⟦t⟧ against a nil environment change, producing a nil function change.

Since the concept of function changes can be surprising, we examine it more closely next.

10.4.1 Producing function changes

A first-class function can close over free variables that can change, hence function values themselves can change; we therefore introduce function changes to describe these changes.

For instance, term tf = λx → x + y is a function that closes over y, so different values v for y give rise to different values for f = ⟦tf⟧ (y = v). Take a change dv from v1 = 5 to v2 = 6: different inputs v1 and v2 for y give rise to different outputs f1 = ⟦tf⟧ (y = v1) and f2 = ⟦tf⟧ (y = v2). We describe the difference between outputs f1 and f2 through a function change df from f1 to f2.

Consider again Slogan 10.3.3, and how it applies to term tf:

Slogan 10.3.3
Term D⟦t⟧ maps input changes to output changes. That is, D⟦t⟧ applied to initial base inputs and valid input changes (from initial inputs to updated inputs) gives a valid output change from t applied on old inputs to t applied on new inputs.

Since y is free in tf, the value for y is an input of tf. So, continuing our example, dtf = D⟦tf⟧ must map a valid input change dv from v1 to v2 for variable y to a valid output change df from f1 to f2; more precisely, we must have df = ⟦dtf⟧ (y = v1, dy = dv).
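To make this concrete, here is a small sketch of this example in plain Haskell; it is ours and simplified, fixing ∆Int = Int so that ⊕ on integers is addition:

  -- tf closes over y: different values for y give different functions.
  tf :: Int -> (Int -> Int)
  tf y = \x -> x + y

  -- The corresponding function change closes over y's change dy, and
  -- maps base input x and input change dx to an output change.
  dtf :: Int -> Int -> (Int -> Int -> Int)
  dtf y dy = \x dx -> dx + dy

  -- Validity, concretely: tf (y + dy) (x + dx) == tf y x + dtf y dy x dx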

10.4.2 Consuming function changes

Function changes can not only be produced but also be consumed in programs obtained from D⟦–⟧. We discuss next how.

As discussed, we consider the value for y as an input to tf = λx → x + y. However, we also choose to consider the argument for x as an input (of a different sort) to tf = λx → x + y, and we require our Slogan 10.3.3 to apply to input x too. While this might sound surprising, it works out well. Specifically, since df = ⟦dtf⟧ (y = v1, dy = dv) is a change from f1 to f2, we require df a1 da to be a change from f1 a1 to f2 a2, so df maps base input a1 and input change da to output change df a1 da, satisfying the slogan.

More in general, any valid function change df from f1 to f2 (where f1, f2 : ⟦σ → τ⟧) must in turn be a function that takes an input a1 and a change da, valid from a1 to a2, to a valid change df a1 da from f1 a1 to f2 a2.

This way, to satisfy our slogan on application t = tf x, we can simply define D⟦–⟧ so that D⟦tf x⟧ = D⟦tf⟧ x dx. Then

  ⟦D⟦tf x⟧⟧ (y = v1, dy = dv, x = a1, dx = da) = ⟦dtf⟧ (y = v1, dy = dv) a1 da = df a1 da

As required, that's a change from f1 a1 to f2 a2.

Overall, valid function changes preserve validity, just like D⟦t⟧ in Slogan 10.3.3, and map valid input changes to valid output changes. In turn, output changes can be function changes; since they are valid, they in turn map valid changes to their inputs to valid output changes (as we'll see in Lemma 12.1.10). We'll later formalize this, and define validity by recursion on types, that is, as a logical relation (see Sec. 12.1.4).

10.4.3 Pointwise function changes

It might seem more natural to describe a function change df′ between f1 and f2 via df′ = λx → f2 x ⊖ f1 x. We call such a df′ a pointwise change. While some might find pointwise changes a more natural concept, the output of differentiation often needs to compute f2 x2 ⊖ f1 x1, using a change df from f1 to f2 and a change dx from x1 to x2. Function changes perform this job directly. We discuss this point further in Sec. 15.3.

10.4.4 Passing change targets

It would be more symmetric to make function changes also take the updated input a2, that is, to have df a1 da a2 compute a change from f1 a1 to f2 a2. However, passing a2 explicitly adds no information: the value a2 can be computed from a1 and da as a1 ⊕ da. Indeed, in various cases a function change can compute its required output without actually computing a1 ⊕ da. Since we expect the size of a1 and a2 to be asymptotically larger than da, actually computing a2 could be expensive.³ Hence, we stick to our asymmetric form of function changes.

³We show later efficient change structures where ⊕ reuses part of a1 to output a2 in logarithmic time.

10.5 Differentiation, informally

Next, we define differentiation and explain informally why it does what we want. We then give an example of how differentiation applies to our example. A formal proof will follow soon in Sec. 12.2, justifying more formally why this definition is correct, but we proceed more gently here.

Definition 10.5.1 (Differentiation)
Differentiation is the following term transformation:

  D⟦λ(x : σ) → t⟧ = λ(x : σ) (dx : ∆σ) → D⟦t⟧
  D⟦s t⟧ = D⟦s⟧ t D⟦t⟧
  D⟦x⟧ = dx
  D⟦c⟧ = DC⟦c⟧

where DC⟦c⟧ defines differentiation on primitives and is provided by language plugins (see Appendix A.2.5), and dx stands for a variable generated by prefixing x's name with d, so that D⟦y⟧ = dy and so on.

If we extend the language with (non-recursive) let-bindings, we can give derived rules for it, such as:

  D⟦let x = t1 in t2⟧ = let x  = t1
                            dx = D⟦t1⟧
                        in D⟦t2⟧

In Sec. 15.1 we will explain that the same transformation rules apply for recursive let-bindings.

If t contains occurrences of both (say) x and dx, capture issues arise in D⟦t⟧. We defer these issues to Sec. 12.3.4, and assume throughout that they can be avoided by α-renaming and the Barendregt convention [Barendregt, 1984].

This transformation might seem deceptively simple. Indeed, pure λ-calculus only handles binding and higher-order functions, leaving "real work" to primitives; similarly, our transformation incrementalizes binding and higher-order functions, leaving "real work" to derivatives of primitives. However, our support of λ-calculus allows gluing primitives together. We'll discuss later how we add support for various primitives and families of primitives.
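Since Definition 10.5.1 is purely syntactic, it is straightforward to implement over an AST. The following is a minimal sketch of ours in Haskell (the thesis' actual implementation, in Scala, is discussed in Chapter 16); the let case follows the derived rule above:

  data Term
    = Var String
    | Con String            -- a primitive constant
    | Lam String Term
    | App Term Term
    | Let String Term Term

  -- dx for a variable x: prefix the name with 'd'.
  dVar :: String -> String
  dVar x = 'd' : x

  -- Differentiation; the first argument plays the role of DC⟦–⟧,
  -- mapping each primitive to its derivative (plugin-provided).
  derive :: (String -> Term) -> Term -> Term
  derive dc = go
    where
      go (Var x)       = Var (dVar x)
      go (Con c)       = dc c
      go (Lam x t)     = Lam x (Lam (dVar x) (go t))
      go (App s t)     = App (App (go s) t) (go t)
      go (Let x t1 t2) = Let x t1 (Let (dVar x) (go t1) (go t2))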

Now we try to motivate the transformation informally. We claimed that D⟦–⟧ must satisfy Slogan 10.3.3, which reads:

Slogan 10.3.3
Term D⟦t⟧ maps input changes to output changes. That is, D⟦t⟧ applied to initial base inputs and valid input changes (from initial inputs to updated inputs) gives a valid output change from t applied on old inputs to t applied on new inputs.

Let's analyze the definition of D⟦–⟧ by case analysis of the input term u. In each case we assume that our slogan applies to any subterms of u, and sketch why it applies to u itself.


• If u = x, by our slogan D⟦x⟧ must evaluate to the change of x when inputs change, so we set D⟦x⟧ = dx.

• If u = c, we simply delegate differentiation to DC⟦c⟧, which is defined by plugins. Since plugins can define arbitrary primitives, they need to provide their derivatives.

• If u = λx → t, then u introduces a function. Assume for simplicity that u is a closed term. Then D⟦t⟧ evaluates to the change of the result of this function u, evaluated in a context binding x and its change dx. Then, because of how function changes are defined, the change of u is the change of output t as a function of the base input x and its change dx, that is, D⟦u⟧ = λx dx → D⟦t⟧.

• If u = s t, then s is a function. Assume for simplicity that u is a closed term. Then D⟦s⟧ evaluates to the change of s, as a function of D⟦s⟧'s base input and input change. So we apply D⟦s⟧ to its actual base input t and actual input change D⟦t⟧, and obtain D⟦s t⟧ = D⟦s⟧ t D⟦t⟧.

This is not quite a correct proof sketch, because of many issues, but we fix these issues with our formal treatment in Sec. 12.2. In particular, in the case for abstraction u = λx → t, D⟦t⟧ depends not only on x and dx but also on the other free variables of u and their changes. Similarly, we must deal with free variables also in the case for application u = s t. But first, we apply differentiation to our earlier example.

10.6 Differentiation on our example

To exemplify the behavior of differentiation concretely, and to help fix ideas for later discussion, in this section we show what the derivative of grandTotal looks like:

grandTotal = λxs ys → sum (merge xs ys)
s = grandTotal {{1, 2, 3}} {{4}} = 10

Differentiation is a structurally recursive program transformation, so we first compute D⟦merge xs ys⟧. To compute its change we simply call the derivative of merge, that is dmerge, and apply it to the base inputs and their changes: hence, we write

D⟦merge xs ys⟧ = dmerge xs dxs ys dys

As we'll see later, we can define function dmerge as

dmerge = λxs dxs ys dys → merge dxs dys

so D⟦merge xs ys⟧ can be simplified by β-reduction to merge dxs dys:

D⟦merge xs ys⟧
  = dmerge xs dxs ys dys
  =β (λxs dxs ys dys → merge dxs dys) xs dxs ys dys
  =β merge dxs dys

Let's next derive sum (merge xs ys). First, as above, the derivative of sum zs would be dsum zs dzs, which depends on base input zs and its change dzs. As we'll see, dsum zs dzs can simply call sum on dzs, so that dsum zs dzs = sum dzs. To derive sum (merge xs ys), we must call the derivative of sum, that is dsum, on its base argument and its change, so on merge xs ys and D⟦merge xs ys⟧. We can later simplify again by β-reduction and obtain

D⟦sum (merge xs ys)⟧
  = dsum (merge xs ys) D⟦merge xs ys⟧
  =β sum D⟦merge xs ys⟧
  = sum (dmerge xs dxs ys dys)
  =β sum (merge dxs dys)

Here we see that the output of differentiation is defined in a bigger typing context: while merge xs ys only depends on base inputs xs and ys, D⟦merge xs ys⟧ also depends on their changes. This property extends beyond the examples we just saw: if a term t is defined in context Γ, then the output of derivation D⟦t⟧ is defined in context Γ, ∆Γ, where ∆Γ is a context binding a change dx for each base input x bound in the context Γ.

Next, we consider λxs ys → sum (merge xs ys). Since xs, dxs, ys, dys are free in D⟦sum (merge xs ys)⟧ (ignoring later optimizations), term D⟦λxs ys → sum (merge xs ys)⟧ must bind all those variables:

D⟦λxs ys → sum (merge xs ys)⟧
  = λxs dxs ys dys → D⟦sum (merge xs ys)⟧
  =β λxs dxs ys dys → sum (merge dxs dys)

Next, we need to transform the binding of grandTotal to its body b = λxs ys → sum (merge xs ys). We copy this binding and add a new additional binding from dgrandTotal to the derivative of b:

grandTotal  = λxs ys → sum (merge xs ys)
dgrandTotal = λxs dxs ys dys → sum (merge dxs dys)

Finally, we need to transform the binding of the output s and its body. By iterating similar steps, in the end we get:

grandTotal  = λxs ys → sum (merge xs ys)
dgrandTotal = λxs dxs ys dys → sum (merge dxs dys)
s  = grandTotal {{1, 2, 3}} {{4}}
ds = dgrandTotal {{1, 2, 3}} {{1}} {{4}} {{5}}
   =β sum (merge {{1}} {{5}})

Self-maintainability. Differentiation does not always produce efficient derivatives without further program transformations; in particular, derivatives might need to recompute results produced by the base program. In the above example, if we don't inline derivatives and use β-reduction to simplify programs, D⟦sum (merge xs ys)⟧ is just dsum (merge xs ys) D⟦merge xs ys⟧. A direct execution of this program will compute merge xs ys, which would waste time linear in the base inputs.

We'll show how to avoid such recomputations in general in Sec. 17.2; but here we can avoid computing merge xs ys simply because dsum does not use its base argument, that is, it is self-maintainable. Without the approach described in Chapter 17, we are restricted to self-maintainable derivatives.


10.7 Conclusion and contributions

In this chapter, we have seen how a correct differentiation transform allows us to incrementalize programs.

10.7.1 Navigating this thesis part

Differentiation. This chapter and Chapters 11 to 14 form the core of incrementalization theory, with other chapters building on top of them. We study incrementalization for STLC; we summarize our formalization of STLC in Appendix A. Together with Chapter 10, Chapter 11 introduces the overall approach to incrementalization informally. Chapters 12 and 13 show that incrementalization using ILC is correct. Building on a set-theoretic semantics, these chapters also develop the theory underlying this correctness proof, and further results. Equational reasoning on terms is then developed in Chapter 14.

Chapters 12 to 14 contain full formal proofs: readers are welcome to skip or skim those proofs where appropriate. For ease of reference, and to help navigation and skimming, these chapters highlight and number all definitions, notations, theorem statements and so on, and we strive not to hide important definitions inside theorems.

Later chapters build on this core, but are again independent from each other.

Extensions and theoretical discussion. Chapter 15 discusses a few assorted aspects of the theory that do not fit elsewhere and do not suffice for standalone chapters. We show how to differentiate general recursion (Sec. 15.1); we exhibit a function change that is not valid for any function (Sec. 15.2); we contrast our representation of function changes with pointwise function changes (Sec. 15.3); and we compare our formalization with the one presented in [Cai et al., 2014] (Sec. 15.4).

Performance of differentiated programs. Chapter 16 studies how to apply differentiation (as introduced in previous chapters) to incrementalize a case study, and empirically evaluates performance speedups.

Differentiation with cache-transfer-style conversion. Chapter 17 is self-contained, even though it builds on the rest of the material; it summarizes the basics of ILC in Sec. 17.2.1. Terminology in that chapter is sometimes subtly different from the rest of the thesis.

Towards differentiation for System F. Chapter 18 outlines how to extend differentiation to System F and suggests a proof avenue. Differentiation for System F requires a generalization of change structures to changes across different types (Sec. 18.4), which appears of independent interest, though further research remains to be done.

Related work, conclusions and appendixes. Finally, Sec. 17.6 and Chapter 19 discuss related work, and Chapter 20 concludes this part of the thesis.

10.7.2 Contributions

In Chapters 11 to 14 and Chapter 16, we make the following contributions:

• We present a novel mathematical theory of changes and derivatives, which is more general than other work in the field because changes are first-class entities, they are distinct from base values, and they are defined also for functions.


• We present the first approach to incremental computation for pure λ-calculi by a source-to-source transformation, D, that requires no run-time support. The transformation produces an incremental program in the same language; all optimization techniques for the original program are applicable to the incremental program as well.

• We prove that our incrementalizing transformation D is correct, by a novel machine-checked logical relation proof, mechanized in Agda.

• While we focus mainly on the theory of changes and derivatives, we also perform a performance case study. We implement the derivation transformation in Scala, with a plug-in architecture that can be extended with new base types and primitives. We define a plugin with support for different collection types, and use the plugin to incrementalize a variant of the MapReduce programming model [Lämmel, 2007]. Benchmarks show that, on this program, incrementalization can reduce asymptotic complexity and can turn O(n) performance into O(1), improving running time by over 4 orders of magnitude on realistic inputs (Chapter 16).

In Chapter 17 we make the following contributions:

• via examples, we motivate extending ILC to remember intermediate results (Sec. 17.2);

• we give a novel proof of correctness for ILC for untyped λ-calculus, based on step-indexed logical relations (Sec. 17.3.3);

• building on top of ILC-style differentiation, we show how to transform untyped higher-order programs to cache-transfer style (CTS) (Sec. 17.3.5);

• we show through formal proofs that programs and derivatives in cache-transfer style correctly simulate their non-CTS variants (Sec. 17.3.6);

• we perform performance case studies (in Sec. 17.4), applying (by hand) an extension of this technique to Haskell programs, and incrementalize efficiently also programs that do not admit self-maintainable derivatives.

Chapter 18 describes how to extend differentiation to System F. To this end, we extend change structures to allow changes where source and destination have different types, and enable defining more powerful combinators for change structures. While the results in this chapter call for further research, we consider them exciting.

Appendix C proposes the first correctness proofs for ILC via operational methods and (step-indexed) logical relations, for simply-typed λ-calculus (without and with general recursion) and for untyped λ-calculus. A later variant of this proof, adapted for use of cache-transfer style, is presented in Chapter 17; but we believe the presentation in Appendix C might also be interesting for theorists, as it explains how to extend ILC correctness proofs with fewer extraneous complications. Nevertheless, given the similarity between the proofs in Appendix C and in Chapter 17, we relegate the former to an appendix.

Finally, Appendix D shows how to implement all change operations on function changes efficiently, by defunctionalizing functions and function changes rather than using mere closure conversion.


Chapter 11

A tour of differentiation examples

Before formalizing ILC, we show more examples of change structures and primitives, to show (a) designs for reusable primitives and their derivatives, (b) to what extent we can incrementalize basic building blocks such as recursive functions and algebraic data types, and (c) to sketch how we can incrementalize collections efficiently. We make no attempt at incrementalizing a complete collection API here; we briefly discuss more complete implementations in Chapter 16 and Sec. 17.4.

To describe these examples informally, we use Haskell notation and let polymorphism as appropriate (see Appendix A.2).

We also motivate a few extensions to differentiation that we describe later. As we'll see in this chapter, we'll need to enable some forms of introspection on function changes to manipulate the embedded environments, as we discuss in Appendix D. We will also need ways to remember intermediate results, which we will discuss in Chapter 17. We will also use overly simplified change structures to illustrate a few points.

11.1 Change structures as type-class instances

We encode change structures, as sketched earlier in Sec. 10.3, through a type class named ChangeStruct. An instance ChangeStruct t defines a change type ∆t as an associated type, and operations ⊕, ⊖ and ⊚ are defined as methods. We also define method oreplace, such that oreplace v2 produces a replacement change from any source to v2; by default, v2 ⊖ v1 is simply an alias for oreplace v2.

class ChangeStruct t where
  type ∆t
  (⊕)      :: t → ∆t → t
  oreplace :: t → ∆t
  (⊖)      :: t → t → ∆t
  v2 ⊖ v1  = oreplace v2
  (⊚)      :: ∆t → ∆t → ∆t
  nil      :: t → ∆t

In this chapter we will often show change structures where only some methods are defined; in actual implementations we use a type-class hierarchy to encode which operations are available, but we collapse this hierarchy here to simplify the presentation.
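For instance, here is a sketch of ours of the instance for Int assumed informally so far, in the chapter's notation, where a change is just the difference to be added; note that with ∆Int = Int there is no way to implement oreplace, so that method is among those we leave undefined:

instance ChangeStruct Int where
  type ∆Int = Int
  a ⊕ da    = a + da
  a2 ⊖ a1   = a2 - a1       -- overrides the oreplace-based default
  da1 ⊚ da2 = da1 + da2     -- composing differences adds them
  nil a     = 0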



11.2 How to design a language plugin

When adding support for a datatype T, we will strive to define both a change structure and derivatives of introduction and elimination forms for T, since such forms constitute a complete API for using that datatype. However, we will sometimes have to restrict elimination forms to scenarios that can be incrementalized efficiently.

In general, to differentiate a primitive f : A → B, once we have defined a change structure for A, we can start by defining

  df a1 da = f (a1 ⊕ da) ⊖ f a1    (11.1)

where da is a valid change from a1 to a2. We then try to simplify and rewrite the expression using equational reasoning, so that it does not refer to ⊖ any more, as far as possible. We can assume that all argument changes are valid, especially if that allows producing faster derivatives; we formalize equational reasoning for valid changes in Sec. 14.1.1. In fact, instead of defining ⊖ and simplifying f a2 ⊖ f a1 to not use it, it is sufficient to produce a change from f a1 to f a2, even a different one. We write da1 =∆ da2 to mean that changes da1 and da2 are equivalent, that is, they have the same source and destination. We define this concept properly in Sec. 14.2.

We try to avoid running ⊖ on arguments of non-constant size, since it might easily take time linear or superlinear in the argument sizes; if ⊖ produces replacement values, it completes in constant time, but derivatives invoked on the produced changes are not efficient.
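As a concrete example of this recipe (ours, not from the thesis), take squaring on Int, with ∆Int = Int as above, so that ⊕ is addition and ⊖ is subtraction; starting from Eq. (11.1) and simplifying ⊖ away yields a derivative that runs in constant time:

  square :: Int -> Int
  square a = a * a

  -- dsquare a da = square (a ⊕ da) ⊖ square a
  --             = (a + da) * (a + da) - a * a
  --             = (2 * a + da) * da
  dsquare :: Int -> Int -> Int
  dsquare a da = (2 * a + da) * da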

11.3 Incrementalizing a collection API

In this section, we describe a collection API that we incrementalize (partially) in this chapter.

To avoid notation conflicts, we represent lists via datatype List a, defined as follows:

data List a = Nil | Cons a (List a)

We also consider as a primitive operation a standard mapping function, map. We also support two restricted forms of aggregation: (a) folding over an abelian group via fold, similar to how one usually folds over a monoid¹; (b) list concatenation via concat. We will not discuss how to differentiate concat, as we reuse existing solutions by Firsov and Jeltsch [2016].

singleton :: a → List a
singleton x = Cons x Nil

map :: (a → b) → List a → List b
map f Nil = Nil
map f (Cons x xs) = Cons (f x) (map f xs)

fold :: AbelianGroupChangeStruct b ⇒ List b → b
fold Nil = mempty
fold (Cons x xs) = x ⋄ fold xs -- where ⋄ is infix for mappend

concat :: List (List a) → List a
concat = ...

While usually fold requires only an instance Monoid b of type class Monoid to aggregate collection elements, our variant of fold requires an instance of type class AbelianGroupChangeStruct, a subclass of Monoid. This type class is not used by fold itself, but only by its derivative, as we explain in Sec. 11.3.2; nevertheless, we add this stronger constraint to fold itself, because we forbid derivatives with stronger type-class constraints. With this approach, all clients of fold can be incrementalized using differentiation.

¹https://hackage.haskell.org/package/base-4.9.1.0/docs/Data-Foldable.html

Using those primitives, one can define further higher-order functions on collections such as concatMap, filter, foldMap. In turn, these functions form the kernel of a collection API, as studied for instance by work on the monoid comprehension calculus [Grust and Scholl, 1996a; Fegaras and Maier, 1995; Fegaras, 1999], even if they are not complete by themselves.

concatMap :: (a → List b) → List a → List b
concatMap f = concat ∘ map f

filter :: (a → Bool) → List a → List a
filter p = concatMap (λx → if p x then singleton x else Nil)

foldMap :: AbelianGroupChangeStruct b ⇒ (a → b) → List a → b
foldMap f = fold ∘ map f

In first-order DSLs such as SQL, such functionality must typically be added through separate primitives (consider for instance filter), while here we can simply define, for instance, filter on top of concatMap, and incrementalize the resulting definitions using differentiation.

Function filter uses conditionals, which we haven't discussed yet; we show how to incrementalize filter successfully in Sec. 11.6.

11.3.1 Changes to type-class instances

In this whole chapter, we assume that type-class instances, such as fold's AbelianGroupChangeStruct argument, do not undergo changes. Since type-class instances are closed top-level definitions of operations, and are canonical for a datatype, it is hard to imagine a change to a type-class instance. On the other hand, type-class instances can be encoded as first-class values. We can for instance imagine a fold taking a unit value and an associative operation as arguments. In such scenarios, one needs additional effort to propagate changes to operation arguments, similarly to changes to the function argument of map.

11.3.2 Incrementalizing aggregation

Let's now discuss how to incrementalize fold. We consider an oversimplified change structure that allows only two sorts of changes, prepending an element to a list or removing the list head of a non-empty list, and study how to incrementalize fold for such changes:

data ListChange a = Prepend a | Remove

instance ChangeStruct (List a) where
  type ∆(List a) = ListChange a
  xs ⊕ Prepend x = Cons x xs
  (Cons x xs) ⊕ Remove = xs
  Nil ⊕ Remove = error "Invalid change"

dfold xs (Prepend x) = ...

Removing an element from an empty list is an invalid change, hence it is safe to give an error in that scenario, as mentioned when introducing ⊕ (Sec. 10.3).

By using equational reasoning, as suggested in Sec. 11.2, starting from Eq. (11.1), one can show formally that dfold xs (Prepend x) should be a change that, in a sense, "adds" x to the result, using group operations:


dfold xs (Prepend x)
  =∆ fold (xs ⊕ Prepend x) ⊖ fold xs
  =  fold (Cons x xs) ⊖ fold xs
  =  (x ⋄ fold xs) ⊖ fold xs

Similarly, dfold (Cons x xs) Remove should instead "subtract" x from the result:

dfold (Cons x xs) Remove
  =∆ fold (Cons x xs ⊕ Remove) ⊖ fold (Cons x xs)
  =  fold xs ⊖ fold (Cons x xs)
  =  fold xs ⊖ (x ⋄ fold xs)

As discussed, using ⊖ is fast enough on, say, integers or other primitive types, but not in general. To avoid using ⊖, we must rewrite its invocation to an equivalent expression. In this scenario we can use group changes for abelian groups, and restrict fold to situations where such changes are available:

dfold :: AbelianGroupChangeStruct b ⇒ List b → ∆(List b) → ∆b
dfold xs (Prepend x) = inject x
dfold (Cons x xs) Remove = inject (invert x)
dfold Nil Remove = error "Invalid change"

To support group changes, we define the following type classes to model abelian groups and group change structures, omitting APIs for more general groups. AbelianGroupChangeStruct only requires that group elements of type a can be converted into changes (type ∆a), allowing change type ∆a to contain other sorts of changes:

class Monoid g ⇒ AbelianGroup g where
  invert :: g → g

class (AbelianGroup a, ChangeStruct a) ⇒
    AbelianGroupChangeStruct a where
  -- Inject group elements into changes. Law:
  -- a ⊕ inject b = a ⋄ b
  inject :: a → ∆a

Chapter 16 discusses how we can use group changes without assuming a single group is defined on elements; but here we simply select the canonical group, as chosen by type-class resolution. To use a different group, as usual, one defines a different but isomorphic type via the Haskell newtype construct. As a downside, derivatives of newtype constructors must convert changes across the different representations.
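For instance, here is a hypothetical example of ours of this pattern, in the chapter's notation: integers under addition, wrapped in a newtype. With these instances, dfold maps Prepend (SumInt 5) to inject (SumInt 5), that is, "add 5 to the result".

newtype SumInt = SumInt Int

instance Monoid SumInt where
  mempty = SumInt 0
  mappend (SumInt a) (SumInt b) = SumInt (a + b)

instance AbelianGroup SumInt where
  invert (SumInt a) = SumInt (negate a)

instance ChangeStruct SumInt where
  type ∆SumInt = SumInt               -- a change is just an element to combine
  SumInt a ⊕ SumInt da = SumInt (a + da)

instance AbelianGroupChangeStruct SumInt where
  inject b = b  -- the law a ⊕ inject b = a ⋄ b holds definitionally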

Rewriting ⊖ away can also be possible for other specialized folds, though sometimes they can be incrementalized directly: for instance, map can be written as a fold. Incrementalizing map for the insertion of x into xs requires simplifying map f (Cons x xs) ⊖ map f xs. To avoid ⊖, we can rewrite this change statically to Insert (f x); indeed, we can incrementalize map also for more realistic change structures.

Associative tree folds. Other usages of fold over sequences produce results of small bounded size (such as integers). In this scenario, one can incrementalize the given fold efficiently using ⊖, instead of relying on group operations. For such scenarios, one can design a primitive

foldMonoid :: Monoid a ⇒ List a → a


for associative tree folds, that is, a function that folds over the input sequence using a monoid (that is, an associative operation with a unit). For efficient incrementalization, foldMonoid's intermediate results should form a balanced tree, and updating this tree should take logarithmic time; one approach to ensure this is to represent the input sequence itself using a balanced tree, such as a finger tree [Hinze and Paterson, 2006].

Various algorithms store intermediate results of folding inside an input balanced tree, as described by Cormen et al. [2001, Ch. 14] or by Hinze and Paterson [2006]. But intermediate results can also be stored outside the input tree, as is commonly done using self-adjusting computation [Acar, 2005, Sec. 9.1], or as can be done in our setting. While we do not use such folds, we describe the existing algorithms briefly and sketch how to integrate them in our setting.

Function foldMonoid must record the intermediate results, and the derivative dfoldMonoid must propagate input changes to affected intermediate results.² To study the time complexity of input change propagation, it is useful to consider the dependency graph of intermediate results: in this graph, an intermediate result v1 has an arc to intermediate result v2 if and only if computing v1 depends on v2. To ensure dfoldMonoid is efficient, the dependency graph of intermediate results from foldMonoid must form a balanced tree of logarithmic height, so that changes to a leaf only affect a logarithmic number of intermediate results.

²We discuss in Chapter 17 how base functions communicate results to derivatives.

In contrast, implementing foldMonoid using foldr on a list produces an unbalanced graph of intermediate results. For instance, take input list xs = [1..8], containing numbers from 1 to 8, and assume a given monoid ⋄. Summing them with foldr (⋄) mempty xs means evaluating

1 ⋄ (2 ⋄ (3 ⋄ (4 ⋄ (5 ⋄ (6 ⋄ (7 ⋄ (8 ⋄ mempty)))))))

Then, a change to the last element of input xs affects all intermediate results, hence incrementalization takes at least O(n). In contrast, using foldMonoid on xs should evaluate a balanced tree similar to

((1 ⋄ 2) ⋄ (3 ⋄ 4)) ⋄ ((5 ⋄ 6) ⋄ (7 ⋄ 8))

so that any individual change to a leaf, insertion or deletion only affects O(log n) intermediate results (where n is the sequence size). Upon modifications to the tree, one must ensure that the balancing is stable [Acar, 2005, Sec. 9.1]; in other words, altering the tree (by inserting or removing an element) must only alter O(log n) nodes.

We have implemented associative tree folds on very simple but unbalanced tree structures; we believe they could be implemented and incrementalized over balanced trees representing sequences, such as finger trees or random access zippers [Headley and Hammer, 2016], but doing so requires transforming the implementation of the data structure to cache-transfer style (CTS) (Chapter 17). We leave this for future work, together with an automated implementation of the CTS transformation.
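To fix intuitions, here is a minimal sketch of ours of an associative tree fold over an explicit binary tree; in a full implementation, each Node would additionally cache the fold of its subtree, so that changing one leaf only invalidates the O(log n) caches on its path to the root.

  data Tree a = Leaf a | Node (Tree a) (Tree a)

  -- Fold a monoid over the tree; associativity of (<>) makes the
  -- tree shape irrelevant to the result, but not to incremental cost.
  foldMonoidTree :: Monoid a => Tree a -> a
  foldMonoidTree (Leaf x)   = x
  foldMonoidTree (Node l r) = foldMonoidTree l <> foldMonoidTree r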

11.3.3 Modifying list elements

In this section, we consider another change structure on lists, which allows expressing changes to individual elements. Then we present dmap, a derivative of map for this change structure. Finally, we sketch informally the correctness of dmap, which we prove formally in Example 14.1.1.

We can then represent changes to a list (List a) as a list of changes (List (∆a)), one for each element. A list change dxs is valid for source xs if they have the same length and each element change is valid for its corresponding element. For this change structure we can define ⊕ and ⊚, but not a total ⊖: such list changes can't express the difference between two lists of different lengths. Nevertheless, this change structure is sufficient to define derivatives that act correctly on the changes that can be expressed. We can describe this change structure in Haskell, using a type-class instance for class ChangeStruct:

instance ChangeStruct (List a) where
  type ∆(List a) = List (∆a)
  Nil ⊕ Nil = Nil
  (Cons x xs) ⊕ (Cons dx dxs) = Cons (x ⊕ dx) (xs ⊕ dxs)
  _ ⊕ _ = Nil

The following dmap function is a derivative for the standard map function (included for reference) and the given change structure. We discuss derivatives for recursive functions in Sec. 15.1.

map :: (a → b) → List a → List b
map f Nil = Nil
map f (Cons x xs) = Cons (f x) (map f xs)

dmap :: (a → b) → ∆(a → b) → List a → ∆(List a) → ∆(List b)
-- A valid list change has the same length as the base list
dmap f df Nil Nil = Nil
dmap f df (Cons x xs) (Cons dx dxs) =
  Cons (df x dx) (dmap f df xs dxs)
-- Remaining cases deal with invalid changes, and a dummy
-- result is sufficient
dmap f df xs dxs = Nil

Function dmap is a correct derivative of map for this change structure, according to Slogan 10.3.3; we sketch an informal argument by induction. The equation for dmap f df Nil Nil returns Nil, a valid change from initial to updated outputs, as required. In the equation for dmap f df (Cons x xs) (Cons dx dxs), we compute changes to the head and tail of the result, to produce a change from map f (Cons x xs) to map (f ⊕ df) (Cons x xs ⊕ Cons dx dxs). To this end, (a) we use df x dx to compute a change to the head of the result, from f x to (f ⊕ df) (x ⊕ dx); (b) we use dmap f df xs dxs recursively to compute a change to the tail of the result, from map f xs to map (f ⊕ df) (xs ⊕ dxs); (c) we assemble the changes to head and tail with Cons into a change from map f (Cons x xs) to map (f ⊕ df) (Cons x xs ⊕ Cons dx dxs). In other words, dmap turns input changes to output changes correctly according to our Slogan 10.3.3: it is a correct derivative for map according to this change structure. We have reasoned informally; we formalize this style of reasoning in Sec. 14.1. Crucially, our conclusions only hold if input changes are valid: hence, term map f xs ⊕ dmap f df xs dxs is not denotationally equal to map (f ⊕ df) (xs ⊕ dxs) for arbitrary change environments; these two terms only evaluate to the same result for valid input changes.
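A small usage sketch of ours, with ∆Int = Int and the element-wise list changes just defined, may help; here df is a nil change of f, since f itself does not change:

xs  = Cons 1 (Cons 2 Nil)
dxs = Cons 10 (Cons 20 Nil)   -- one element change per element
f   = λx → x * 2
df  = λx dx → 2 * dx          -- a derivative (nil change) of f

-- dmap f df xs dxs = Cons 20 (Cons 40 Nil), a valid change from
-- map f xs = Cons 2 (Cons 4 Nil)
-- to map (f ⊕ df) (xs ⊕ dxs) = Cons 22 (Cons 44 Nil)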

Since this definition of dmap is a correct derivative, we could use it in an incremental DSL for list manipulation, together with other primitives. Because of limitations we describe next, we will instead use improved language plugins for sequences.

11.3.4 Limitations

We have shown simplified list changes, but they have a few limitations. Fixing those requires more sophisticated definitions.

As discussed, our list changes intentionally forbid changing the length of a list. And our definition of dmap has further limitations: a change to a list of n elements takes size O(n), even when most elements do not change, and calling dmap f df on it requires n calls to df. This is only faster if df is faster than f, but it adds no further speedup.

We can instead describe a change to an arbitrary list element x in xs by giving the change dx and the position of x in xs. A list change is then a sequence of such changes:


type ∆(List a) = List (AtomicChange a)
data AtomicChange a = Modify ℤ (∆a)

However, fetching the i-th list element still takes time linear in i: we need a better representation of sequences. In the next section, we switch to a change structure on sequences defined by finger trees [Hinze and Paterson, 2006], following Firsov and Jeltsch [2016].

11.4 Efficient sequence changes

Firsov and Jeltsch [2016] define an efficient representation of list changes in a framework similar to ILC, and incrementalize selected operations over this change structure. They also provide combinators to assemble further operations on top of the provided ones. We extend their framework to handle function changes, and generate derivatives for all functions that can be expressed in terms of the primitives.

Conceptually, a change for type Sequence a is a sequence of atomic changes. Each atomic change inserts one element at a given position, or removes one element, or changes an element at one position:³

data SeqSingleChange a
  = Insert   { idx :: ℤ, x :: a }
  | Remove   { idx :: ℤ }
  | ChangeAt { idx :: ℤ, dx :: ∆a }

data SeqChange a = SeqChange (Sequence (SeqSingleChange a))
type ∆(Sequence a) = SeqChange a

We use Firsov and Jeltsch's variant of this change structure in Sec. 17.4.

³Firsov and Jeltsch [2016], and our actual implementation, allow changes to multiple elements.
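As a minimal sketch of ours (not Firsov and Jeltsch's implementation), one can apply a single atomic change to a sequence from Data.Sequence in the containers package; insertAt and deleteAt require containers >= 0.5.8, and we simplify element changes to plain functions:

  import qualified Data.Sequence as S

  data SingleChange a = Ins Int a | Rem Int | ChAt Int (a -> a)

  applySingle :: S.Seq a -> SingleChange a -> S.Seq a
  applySingle s (Ins i v)  = S.insertAt i v s
  applySingle s (Rem i)    = S.deleteAt i s
  applySingle s (ChAt i f) = S.adjust f i s

  -- A whole change is a sequence of atomic changes, applied in order:
  -- s ⊕ ds corresponds to foldl applySingle s ds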

11.5 Products

It is also possible to define change structures for arbitrary sum and product types, and to provide derivatives for introduction and elimination forms for such datatypes. In this section we discuss products; in the next section, sums.

We define a simple change structure for product type A × B from change structures for A and B, similar to change structures for environments: operations act pointwise on the two components.

instance (ChangeStruct a, ChangeStruct b) ⇒ ChangeStruct (a, b) where
  type ∆(a, b) = (∆a, ∆b)
  (a, b) ⊕ (da, db) = (a ⊕ da, b ⊕ db)
  (a2, b2) ⊖ (a1, b1) = (a2 ⊖ a1, b2 ⊖ b1)
  oreplace (a2, b2) = (oreplace a2, oreplace b2)
  nil (a, b) = (nil a, nil b)
  (da1, db1) ⊚ (da2, db2) = (da1 ⊚ da2, db1 ⊚ db2)

Through equational reasoning, as in Sec. 11.2, we can also compute derivatives for basic primitives on product types: both the introduction form (which we alias as pair) and the elimination forms fst and snd. We just present the resulting definitions:

pair a b = (a, b)
dpair a da b db = (da, db)

fst (a, b) = a
snd (a, b) = b

dfst :: ∆(a, b) → ∆a
dfst (da, db) = da

dsnd :: ∆(a, b) → ∆b
dsnd (da, db) = db

uncurry :: (a → b → c) → (a, b) → c
uncurry f (a, b) = f a b

duncurry :: (a → b → c) → ∆(a → b → c) → (a, b) → ∆(a, b) → ∆c
duncurry f df (x, y) (dx, dy) = df x dx y dy

One can also define n-ary products in a similar way. However, a product change contains as many entries as the product itself.

11.6 Sums, pattern matching and conditionals

In this section, we define change structures for sum types, together with derivatives of their introduction and elimination forms. We also obtain support for booleans (which can be encoded as sum type 1 + 1) and conditionals (which can be encoded in terms of elimination for sums). We have mechanically proved correctness of this change structure and these derivatives, but we do not present the tedious details in this thesis, and refer to our Agda formalization.

Change structures for sums are more challenging than ones for products. We can define them, but in many cases we can do better with specialized structures. Nevertheless, such changes are useful in some scenarios.

In Haskell, sum types a + b are conventionally defined via datatype Either a b, with introduction forms Left and Right and elimination form either, which will be our primitives:

data Either a b = Left a | Right b

either :: (a → c) → (b → c) → Either a b → c
either f g (Left a) = f a
either f g (Right b) = g b

We can define the following change structure:

data EitherChange a b
  = LeftC (∆a)
  | RightC (∆b)
  | EitherReplace (Either a b)

instance (ChangeStruct a, ChangeStruct b) ⇒
    ChangeStruct (Either a b) where
  type ∆(Either a b) = EitherChange a b
  Left a  ⊕ LeftC da  = Left (a ⊕ da)
  Right b ⊕ RightC db = Right (b ⊕ db)
  Left _  ⊕ RightC _  = error "Invalid change"
  Right _ ⊕ LeftC _   = error "Invalid change"
  _ ⊕ EitherReplace e2 = e2
  oreplace = EitherReplace
  nil (Left a)  = LeftC (nil a)
  nil (Right b) = RightC (nil b)


Changes to a sum value can either keep the same "branch" (left or right) and modify the contained value, or replace the sum value with another one altogether. Specifically, change LeftC da is valid from Left a1 to Left a2 if da is valid from a1 to a2. Similarly, change RightC db is valid from Right b1 to Right b2 if db is valid from b1 to b2. Finally, replacement change EitherReplace e2 is valid from e1 to e2, for any e1.

Using Eq. (11.1), we can then obtain definitions for derivatives of primitives Left, Right and either. The resulting code is as follows:

dLeft :: a → ∆a → ∆(Either a b)
dLeft a da = LeftC da

dRight :: b → ∆b → ∆(Either a b)
dRight b db = RightC db

deither :: (NilChangeStruct a, NilChangeStruct b, DiffChangeStruct c) ⇒
  (a → c) → ∆(a → c) → (b → c) → ∆(b → c) →
  Either a b → ∆(Either a b) → ∆c
deither f df g dg (Left a) (LeftC da) = df a da
deither f df g dg (Right b) (RightC db) = dg b db
deither f df g dg e1 (EitherReplace e2) =
  either (f ⊕ df) (g ⊕ dg) e2 ⊖ either f g e1
deither _ _ _ _ _ _ = error "Invalid sum change"

We show only one case of the derivation of deither, as an example:

deither f df g dg (Left a) (LeftC da)
  =∆ { using variants of Eq. (11.1) for multiple arguments }
     either (f ⊕ df) (g ⊕ dg) (Left a ⊕ LeftC da) ⊖ either f g (Left a)
  =  { simplify ⊕ }
     either (f ⊕ df) (g ⊕ dg) (Left (a ⊕ da)) ⊖ either f g (Left a)
  =  { simplify either }
     (f ⊕ df) (a ⊕ da) ⊖ f a
  =∆ { because df is a valid change for f, and da for a }
     df a da

Unfortunately, with this change structure, a change from Left a1 to Right b2 is simply a replacement change, so derivatives processing it must recompute results from scratch. In general, we cannot do better, since there need be no shared data between the two branches of a datatype. We need to find specialized scenarios where better implementations are possible.

Extensions: changes for ADTs and future work. In some cases, we consider changes for type Either a b where a and b both contain some other type c. Take again lists: a change from list as to list Cons a2 as should simply say that we prepend a2 to the list. In a sense, we are just using a change structure from type List a to (a, List a). More in general, if change das from as1 to as2 is small, a change from list as1 to list Cons a2 as2 should simply "say" that we prepend a2, and that we modify as1 into as2; and similarly for removals.

In Sec. 18.4, we suggest how to construct such change structures, based on the concept of polymorphic change structures, where changes have source and destination of different types. Based on initial experiments, we believe one could develop these constructions into a powerful combinator language for change structures. In particular, it should be possible to build change structures for lists similar to the ones in Sec. 11.3.2. Generalizing beyond lists, similar systematic constructions should be able to represent insertions and removals of one-hole contexts [McBride, 2001] for arbitrary algebraic datatypes (ADTs); for ADTs representing balanced data structures, such changes could enable efficient incrementalization in many scenarios. However, many questions remain open, so we leave this effort for future work.

11.6.1 Optimizing filter

In Sec. 11.3, we have defined filter using a conditional, and we have just explained that in general conditionals are inefficient. This seems pretty unfortunate. But luckily, we can optimize filter using equational reasoning.

Consider again the earlier definition of filter:

filter :: (a → Bool) → List a → List a
filter p = concatMap (λx → if p x then singleton x else Nil)

As explained, we can encode conditionals using either and differentiate the resulting program. However, if p x changes from True to False, or vice versa (that is, for all elements x for which dp x dx is a non-nil change), we must compute ⊖ at runtime, and ⊖ will compare an empty list Nil with a singleton list produced by singleton x (in one direction or the other). We can have ⊖ detect this situation at runtime. But since the implementation of filter is known statically, we can instead optimize this code statically, rewriting singleton x ⊖ Nil to Insert x, and Nil ⊖ singleton x to Remove. To enable this optimization in dfilter, we need to inline the function that filter passes as argument to concatMap, and all the functions it calls except p. Moreover, we need to case-split on the possible return values of p x and dp x dx. We omit the steps because they are both tedious and standard.

It appears in principle possible to automate such transformations by adding domain-specific knowledge to a sufficiently smart compiler, though we have made no attempt at an actual implementation. It would first be necessary to investigate further classes of examples where optimizations are applicable. Sufficiently smart compilers are rare; but since our approach produces purely functional programs, we have access to GHC and HERMIT [Farmer et al., 2012]. An interesting alternative (which does have some support for side effects) is LMS [Rompf and Odersky, 2010] and Delite [Brown et al., 2011]. We leave further investigation for future work.

11.7 Chapter conclusion

In this chapter, we have toured what can and cannot be incrementalized using differentiation, and how using higher-order functions allows defining generic primitives to incrementalize.

Chapter 12

Changes and differentiation, formally

To support incrementalization, in this chapter we introduce differentiation and formally prove it correct. That is, we prove that ⟦D⟦t⟧⟧ produces derivatives. As we explain in Sec. 12.1.2, derivatives transform valid input changes into valid output changes (Slogan 10.3.3). Hence, we define what valid changes are (Sec. 12.1). As we'll explain in Sec. 12.1.4, validity is a logical relation. As we explain in Sec. 12.2.2, our correctness theorem is the fundamental property for the validity logical relation, proved by induction over the structure of terms. Crucial definitions and derived facts are summarized in Fig. 12.1. Later, in Chapter 13, we study consequences of correctness and change operations.

All definitions and proofs in this and the next chapter are mechanized in Agda, except where otherwise indicated. To this writer, given these definitions, all proofs have become straightforward and unsurprising; nevertheless, first obtaining these proofs took a while. So we typically include full proofs. We also believe these proofs clarify the meaning and consequences of our definitions. To make proofs as accessible as possible, we try to provide enough detail that our target readers can follow along without pencil and paper, at the expense of making our proofs look longer than they would usually be. As we target readers proficient with STLC (but not necessarily proficient with logical relations), we'll still omit routine steps needed to reason on STLC, such as typing derivations or binding issues.

12.1 Changes and validity

In this section, we introduce formally (a) a description of changes and (b) a definition of which changes are valid. We have already introduced these notions, and how they fit together, informally in Chapter 10. We next define the same notions formally, and deduce their key properties. Language plugins extend these definitions for the base types and constants they provide.

To formalize the notion of changes for elements of a set V, we define the notion of basic change structure on V.

Definition 12.1.1 (Basic change structures)
A basic change structure on set V, written V̂, comprises:

(a) a change set ∆V



∆τ

  ∆ι = ...  (defined by language plugins)
  ∆(σ → τ) = σ → ∆σ → ∆τ

(a) Change types (Definition 12.1.17).

∆Γ

  ∆ε = ε
  ∆(Γ, x : τ) = ∆Γ, x : τ, dx : ∆τ

(b) Change contexts (Definition 12.1.23).

D⟦t⟧

  D⟦λ(x : σ) → t⟧ = λ(x : σ) (dx : ∆σ) → D⟦t⟧
  D⟦s t⟧ = D⟦s⟧ t D⟦t⟧
  D⟦x⟧ = dx
  D⟦c⟧ = DC⟦c⟧

(c) Differentiation (Definition 10.5.1).

  Γ ⊢ t : τ
  ────────────────── Derive
  ∆Γ ⊢ D⟦t⟧ : ∆τ

(d) Differentiation typing (Lemma 12.2.8).

dv ▷ v1 → v2 : τ, with v1, v2 : ⟦τ⟧ and dv : ⟦∆τ⟧

  dv ▷ v1 → v2 : ι = ...  (defined by language plugins)
  df ▷ f1 → f2 : σ → τ = ∀da ▷ a1 → a2 : σ. df a1 da ▷ f1 a1 → f2 a2 : τ

dρ ▷ ρ1 → ρ2 : Γ, with ρ1, ρ2 : ⟦Γ⟧ and dρ : ⟦∆Γ⟧

  ──────────────
  ∅ ▷ ∅ → ∅ : ε

  dρ ▷ ρ1 → ρ2 : Γ    da ▷ a1 → a2 : τ
  ──────────────────────────────────────────────────────────────
  (dρ, x = a1, dx = da) ▷ (ρ1, x = a1) → (ρ2, x = a2) : Γ, x : τ

(e) Validity (Definitions 12.1.21 and 12.1.24).

If Γ ⊢ t : τ, then

  ∀dρ ▷ ρ1 → ρ2 : Γ. ⟦D⟦t⟧⟧ dρ ▷ ⟦t⟧ ρ1 → ⟦t⟧ ρ2 : τ

(f) Correctness of D⟦–⟧ (from Theorem 12.2.2).

Figure 12.1: Defining differentiation and proving it correct. The rest of this chapter explains and motivates the above definitions.


(b) a ternary validity relation dv ▷ v1 → v2 : V, for v1, v2 ∈ V and dv ∈ ∆V, which we read as "dv is a valid change from source v1 to destination v2 (on set V)".

Example 12.1.2
In Examples 10.3.1 and 10.3.2, we exemplified informally change types and validity on naturals, integers and bags. We now define formally basic change structures on naturals and integers. Compared to validity for integers, validity for naturals ensures that the destination v1 + dv is again a natural. For instance, given source v1 = 1, change dv = −2 is valid (with destination v2 = −1) only on integers, not on naturals.

Definition 12.1.3 (Basic change structure on integers)
Basic change structure ℤ̂ on integers has integers as changes (∆ℤ = ℤ) and the following validity judgment:

  dv ▷ v1 → v1 + dv : ℤ

Definition 12.1.4 (Basic change structure on naturals)
Basic change structure ℕ̂ on naturals has integers as changes (∆ℕ = ℤ) and the following validity judgment:

  v1 + dv ≥ 0
  ──────────────────────
  dv ▷ v1 → v1 + dv : ℕ
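These two basic change structures are easy to encode executably; here is a minimal sketch of ours in Haskell, representing both change sets as Int and validity as a decidable predicate over (change, source, destination) triples:

  -- validity on integers: dv ▷ v1 → v2 : ℤ
  validInt :: Int -> Int -> Int -> Bool
  validInt dv v1 v2 = v1 + dv == v2

  -- validity on naturals (encoded as non-negative Ints):
  -- the destination must also be a natural
  validNat :: Int -> Int -> Int -> Bool
  validNat dv v1 v2 = v1 >= 0 && v1 + dv == v2 && v2 >= 0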

Intuitively, we can think of a valid change from v1 to v2 as a graph edge from v1 to v2, so we'll often use graph terminology when discussing changes. This intuition is robust and can be made fully precise.¹ More specifically, a basic change structure on V can be seen as a directed multigraph, having as vertexes the elements of V, and as edges from v1 to v2 the valid changes dv from v1 to v2. This is a multigraph because our definition allows multiple edges between v1 and v2.

A change dv can be valid from v1 to v2 and from v3 to v4, but we'll still want to talk about the source and the destination of a change. When we talk about a change dv valid from v1 to v2, value v1 is dv's source and v2 is dv's destination. Hence, we'll systematically quantify theorems over valid changes dv, with their source v1 and destination v2, using the following notation.²

Notation 12.1.5 (Quantification over valid changes)
We write

  ∀dv ▷ v1 → v2 : V. P

and say "for all (valid) changes dv from v1 to v2 on set V we have P", as a shortcut for

  ∀v1, v2 ∈ V, dv ∈ ∆V. if dv ▷ v1 → v2 : V then P

Since we focus on valid changes, we'll omit the word "valid" when clear from context. In particular, a change from v1 to v2 is necessarily valid.

¹See for instance Robert Atkey's blog post [Atkey, 2015], or Yufei Cai's PhD thesis [Cai, 2017].
²If you prefer, you can tag a change with its source and destination by using a triple, and regard the whole triple (v1, dv, v2) as a change. Mathematically, this gives the correct results, but we'll typically not use such triples as changes in programs, for performance reasons.


We can have multiple basic change structures on the same set.

Example 12.1.6 (Replacement changes)
For instance, for any set V we can talk about replacement changes on V: a replacement change dv = !v2 for a value v1 : V simply specifies directly a new value v2 : V, so that !v2 ▷ v1 → v2 : V. We read ! as the "bang" operator.

A basic change structure can decide to use only replacement changes (which can be appropriate for primitive types with values of constant size), or to make ∆V a sum type, allowing both replacement changes and other ways to describe a change (as long as we're using a language plugin that adds sum types).

Nil changes. Just like integers have the null element 0, among changes there can be nil changes:

Definition 12.1.7 (Nil changes)
We say that dv : ∆V is a nil change for v : V if dv ▷ v → v : V.

For instance, 0 is a nil change for any integer number n. However, in general a change might be nil for one element but not for another. For instance, the replacement change !6 is a nil change on 6, but not on 5.

We'll say more on nil changes in Secs. 12.1.2 and 13.2.1.

12.1.1 Function spaces

Next, we define a basic change structure, which we call Â → B̂, for an arbitrary function space A → B, assuming we have basic change structures for both A and B. We claimed earlier that valid function changes map valid input changes to valid output changes. We make this claim formal through the next definition.

Definition 12.1.8 (Basic change structure on A → B)
Given basic change structures on A and B, we define a basic change structure on A → B, which we write Â → B̂, as follows:

(a) Change set ∆(A → B) is A → ∆A → ∆B.

(b) Function change df is valid from f1 to f2 (that is, df ▷ f1 → f2 : A → B) if and only if, for all valid input changes da ▷ a1 → a2 : A, value df a1 da is a valid output change from f1 a1 to f2 a2 (that is, df a1 da ▷ f1 a1 → f2 a2 : B).

Notation 12.1.9 (Applying function changes to changes)
When reading out df a1 da, we'll often talk, for brevity, about applying df to da, leaving da's source a1 implicit when it can be deduced from context.

We'll also consider valid changes df for curried n-ary functions. We show what their validity means for curried binary functions f : A → B → C; we omit similar statements for higher arities, as they add no new ideas.

Lemma 12.1.10 (Validity on A → B → C)
For any basic change structures Â, B̂ and Ĉ, function change df : ∆(A → B → C) is valid from f1 to f2 (that is, df ▷ f1 → f2 : A → B → C) if and only if applying df to valid input changes da ▷ a1 → a2 : A and db ▷ b1 → b2 : B gives a valid output change:

  df a1 da b1 db ▷ f1 a1 b1 → f2 a2 b2 : C


Proof. The equivalence follows from applying the definition of function validity of df twice.

That is, function change df is valid (df ▷ f1 → f2 : A → (B → C)) if and only if it maps valid input change da ▷ a1 → a2 : A to valid output change

  df a1 da ▷ f1 a1 → f2 a2 : B → C

In turn, df a1 da is a function change, which is valid if and only if it maps valid input change db ▷ b1 → b2 : B to

  df a1 da b1 db ▷ f1 a1 b1 → f2 a2 b2 : C

as required by the lemma. □

12.1.2 Derivatives

Among valid function changes, derivatives play a central role, especially in the statement of correctness of differentiation.

Definition 12.1.11 (Derivatives)
Given function f : A → B, function df : A → ∆A → ∆B is a derivative for f if, for all changes da from a1 to a2 on set A, change df a1 da is valid from f a1 to f a2.

In fact, derivatives are precisely nil function changes:

Lemma 12.1.12 (Derivatives as nil function changes)
Given function f : A → B, function change df : ∆(A → B) is a derivative of f if and only if df is a nil change of f (df ▷ f → f : A → B).

Proof First we show that nil changes are derivatives First a nil change df B f rarr f Ararr B hasthe right type to be a derivative because A rarr ∆A rarr ∆B = ∆(A rarr B) Since df is a nil changefrom f to f by denition it maps valid input changes da B a1 rarr a2 A to valid output changesdf a1 da B f a1 rarr f a2 B Hence df is a derivative as required

In fact all proof steps are equivalences and by tracing them backward we can show thatderivatives are nil changes Since a derivative df maps valid input changes da B a1 rarr a2 A tovalid output changes df a1 da B f a1 rarr f a2 B df is a change from f to f as required
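For a concrete instance (ours, again with additive integer changes assumed), here is a function together with one of its derivatives; as the lemma states, the derivative is a nil change for the function:

square :: Int -> Int
square a = a * a

-- A derivative of square: square a + dsquare a da == square (a + da)
-- for all a and da, since (a + da)^2 = a^2 + 2*a*da + da*da.
dsquare :: Int -> Int -> Int
dsquare a da = 2 * a * da + da * da
-- dsquare is a nil change for square; in particular it maps the nil
-- input change 0 to the nil output change: dsquare a 0 == 0.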

Applying derivatives to nil changes gives, again, nil changes. This fact is useful when reasoning on derivatives. The proof is a useful exercise in using validity.

Lemma 12.1.13 (Derivatives preserve nil changes). For any basic change structures Â and B̂, function change df : ∆(A → B) is a derivative of f : A → B (df ⊲ f ↪ f : A → B) if and only if applying df to an arbitrary input nil change da ⊲ a ↪ a : A gives a nil change:

df a da ⊲ f a ↪ f a : B.

Proof. Just rewrite the definition of derivatives (Lemma 12.1.12) using the definition of validity of df.

In detail: by definition of validity for function changes (Definition 12.1.8), df ⊲ f1 ↪ f2 : A → B means that from da ⊲ a1 ↪ a2 : A follows df a1 da ⊲ f1 a1 ↪ f2 a2 : B. Just substitute f1 = f2 = f and a1 = a2 = a to get the required equivalence.

Also derivatives of curried n-ary functions f preserve nil changes. We only state this formally for curried binary functions f : A → B → C; higher arities require no new ideas.


Lemma 12.1.14 (Derivatives preserve nil changes on A → B → C). For any basic change structures Â, B̂ and Ĉ, change df : ∆(A → B → C) is a derivative of f : A → B → C if and only if applying df to nil changes da ⊲ a ↪ a : A and db ⊲ b ↪ b : B gives a nil change:

df a da b db ⊲ f a b ↪ f a b : C.

Proof. Similarly to validity on A → B → C (Lemma 12.1.10), the thesis follows by applying twice the fact that derivatives preserve nil changes (Lemma 12.1.13).

In detail: since derivatives preserve nil changes, df is a derivative if and only if for all da ⊲ a ↪ a : A we have df a da ⊲ f a ↪ f a : B → C. But then df a da is a nil change, that is, a derivative, and since it preserves nil changes, df is a derivative if and only if for all da ⊲ a ↪ a : A and db ⊲ b ↪ b : B we have df a da b db ⊲ f a b ↪ f a b : C.

12.1.3 Basic change structures on types

After studying basic change structures in the abstract, we apply them to the study of our object language.

For each type τ, we can define a basic change structure on domain ⟦τ⟧, which we write τ̂. Language plugins must provide basic change structures for base types. To provide basic change structures for function types σ → τ, we use the earlier construction of basic change structures σ̂ → τ̂ on function spaces ⟦σ → τ⟧ (Definition 12.1.8).

Definition 12.1.15 (Basic change structures on types). For each type τ, we associate a basic change structure on domain ⟦τ⟧, written τ̂, through the following equations:

ι̂ = (the structure provided by the plugin; see below)
(σ → τ)̂ = σ̂ → τ̂

Plugin Requirement 12.1.16 (Basic change structures on base types). For each base type ι, the plugin defines a basic change structure on ⟦ι⟧ that we write ι̂.

Crucially, for each type τ we can define a type ∆τ of changes, which we call its change type, such that the change set ∆⟦τ⟧ is just the domain ⟦∆τ⟧ associated to change type ∆τ: ∆⟦τ⟧ = ⟦∆τ⟧. This equation allows writing change terms that evaluate directly to change values.³

Definition 12.1.17 (Change types). The change type ∆τ of a type τ is defined as follows:

∆ι = … (defined by the plugin, Plugin Requirement 12.1.19)
∆(σ → τ) = σ → ∆σ → ∆τ

Lemma 12.1.18 (∆ and ⟦–⟧ commute on types). For each type τ, change set ∆⟦τ⟧ equals the domain of change type ⟦∆τ⟧.

Proof. By induction on types. For the case of function types, we simply prove equationally that ∆⟦σ → τ⟧ = ∆(⟦σ⟧ → ⟦τ⟧) = ⟦σ⟧ → ∆⟦σ⟧ → ∆⟦τ⟧ = ⟦σ⟧ → ⟦∆σ⟧ → ⟦∆τ⟧ = ⟦σ → ∆σ → ∆τ⟧ = ⟦∆(σ → τ)⟧. The case for base types is delegated to plugins (Plugin Requirement 12.1.19).

³ Instead, in earlier proofs [Cai et al. 2014], the values of change terms were not change values, but had to be related to change values through a logical relation; see Sec. 15.4.


Plugin Requirement 12.1.19 (Base change types). For each base type ι, the plugin defines a change type ∆ι such that ∆⟦ι⟧ = ⟦∆ι⟧.

We refer to values of change types as change values, or just changes.
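In an implementation, a change type computed by recursion on types corresponds naturally to a type-level function. The following Haskell sketch (our own encoding, using a closed type family named Dt) mirrors Definition 12.1.17, taking Int as a sample base type whose plugin-defined change type is Int itself:

{-# LANGUAGE TypeFamilies #-}

type family Dt t where
  Dt Int      = Int               -- a plugin's choice for a base type
  Dt (a -> b) = a -> Dt a -> Dt b -- ∆(σ → τ) = σ → ∆σ → ∆τ

-- For example, Dt (Int -> Int) reduces to Int -> Int -> Int.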

Notation 12.1.20. We write basic change structures for types as τ̂, not as the anonymous structure on ⟦τ⟧; and we write dv ⊲ v1 ↪ v2 : τ, not dv ⊲ v1 ↪ v2 : ⟦τ⟧. We also consistently write ⟦∆τ⟧, not ∆⟦τ⟧.

12.1.4 Validity as a logical relation

Next, we show an equivalent definition of validity for values of terms, directly by induction on types, as a ternary logical relation between a change, its source and its destination. A typical logical relation constrains functions to map related inputs to related outputs. In a twist, validity constrains function changes to map related inputs to related outputs.

Definition 12.1.21 (Change validity). We say that dv is a (valid) change from v1 to v2 (on type τ), and write dv ⊲ v1 ↪ v2 : τ, if dv : ⟦∆τ⟧, v1, v2 : ⟦τ⟧, and dv is a "valid" description of the difference from v1 to v2, as we define in Fig. 12.1e.

The key equations for function types are:

∆(σ → τ) = σ → ∆σ → ∆τ

df ⊲ f1 ↪ f2 : σ → τ = ∀da ⊲ a1 ↪ a2 : σ. df a1 da ⊲ f1 a1 ↪ f2 a2 : τ

Remark 12.1.22. We have kept repeating the idea that valid function changes map valid input changes to valid output changes. As seen in Sec. 10.4 and Lemmas 12.1.10 and 12.1.14, such valid outputs can in turn be valid function changes. We'll see the same idea at work in Lemma 12.1.27, in the correctness proof of D⟦–⟧.

As we have finally seen in this section, this definition of validity can be formalized as a logical relation, defined by induction on types. We'll later take for granted the consequences of validity, together with lemmas such as Lemma 12.1.10.

12.1.5 Change structures on typing contexts

To describe changes to the inputs of a term, we now also introduce change contexts ∆Γ, environment changes dρ : ⟦∆Γ⟧, and validity for environment changes dρ ⊲ ρ1 ↪ ρ2 : Γ.

A valid environment change from ρ1 : ⟦Γ⟧ to ρ2 : ⟦Γ⟧ is an environment dρ : ⟦∆Γ⟧ that extends environment ρ1 with valid changes for each entry. We first define the shape of environment changes through change contexts:

Definition 12.1.23 (Change contexts). For each context Γ, we define change context ∆Γ as follows:

∆ε = ε
∆(Γ, x : τ) = ∆Γ, x : τ, dx : ∆τ

Then, we describe validity of environment changes via a judgment.


Definition 12.1.24 (Environment change validity). We define validity for environment changes through judgment dρ ⊲ ρ1 ↪ ρ2 : Γ, pronounced "dρ is an environment change from ρ1 to ρ2 (at context Γ)", where ρ1, ρ2 : ⟦Γ⟧ and dρ : ⟦∆Γ⟧, via the following inference rules:

∅ ⊲ ∅ ↪ ∅ : ε

dρ ⊲ ρ1 ↪ ρ2 : Γ        da ⊲ a1 ↪ a2 : τ
──────────────────────────────────────────────────────────────
(dρ, x = a1, dx = da) ⊲ (ρ1, x = a1) ↪ (ρ2, x = a2) : Γ, x : τ

Definition 12.1.25 (Basic change structures for contexts). To each context Γ, we associate a basic change structure on set ⟦Γ⟧. We take ⟦∆Γ⟧ as the change set and reuse validity on environment changes (Definition 12.1.24).

Notation 12.1.26. We write dρ ⊲ ρ1 ↪ ρ2 : Γ rather than dρ ⊲ ρ1 ↪ ρ2 : ⟦Γ⟧.

Finally, to state and prove correctness of differentiation, we are going to need to discuss function changes on term semantics. The semantics of a term Γ ⊢ t : τ is a function ⟦t⟧ from environments in ⟦Γ⟧ to values in ⟦τ⟧. To discuss changes to ⟦t⟧, we need a basic change structure on function space ⟦Γ⟧ → ⟦τ⟧.

Lemma 12.1.27. The construction of basic change structures on function spaces (Definition 12.1.8) associates to each context Γ and type τ a basic change structure Γ̂ → τ̂ on function space ⟦Γ⟧ → ⟦τ⟧.

Notation 12.1.28. As usual, we write the change set as ∆(⟦Γ⟧ → ⟦τ⟧); for validity, we write df ⊲ f1 ↪ f2 : Γ → τ rather than df ⊲ f1 ↪ f2 : ⟦Γ⟧ → ⟦τ⟧.

12.2 Correctness of differentiation

In this section we state and prove correctness of differentiation, a term-to-term transformation, written D⟦t⟧, that produces incremental programs. We recall that all our results apply only to well-typed terms (since we formalize no other ones).

Earlier, we described how D⟦–⟧ behaves through Slogan 10.3.3; here it is again for reference:

Slogan 10.3.3. Term D⟦t⟧ maps input changes to output changes. That is, D⟦t⟧, applied to initial base inputs and valid input changes (from initial inputs to updated inputs), gives a valid output change from t applied on old inputs to t applied on new inputs.

In our slogan we did not specify what we meant by inputs, though we gave examples during the discussion. We now have the notions needed for a more precise statement. Term D⟦t⟧ must satisfy our slogan for two sorts of inputs:

1. Evaluating D⟦t⟧ must map an environment change dρ from ρ1 to ρ2 into a valid result change ⟦D⟦t⟧⟧ dρ, going from ⟦t⟧ ρ1 to ⟦t⟧ ρ2.

2. As we learned since stating our slogan, validity is defined by recursion over types. If term t has type σ → τ, change ⟦D⟦t⟧⟧ dρ can in turn be a (valid) function change (Remark 12.1.22). Function changes map valid changes for their inputs to valid changes for their outputs.


Instead of saying that ⟦D⟦t⟧⟧ maps dρ ⊲ ρ1 ↪ ρ2 : Γ to a change from ⟦t⟧ ρ1 to ⟦t⟧ ρ2, we can say that function ⟦t⟧∆ = λρ1 dρ → ⟦D⟦t⟧⟧ dρ must be a nil change for ⟦t⟧, that is, a derivative for ⟦t⟧. We give a name to this function change, and state D⟦–⟧'s correctness theorem.

Definition 12.2.1 (Incremental semantics). We define the incremental semantics of a well-typed term Γ ⊢ t : τ in terms of differentiation as:

⟦t⟧∆ = (λρ1 dρ → ⟦D⟦t⟧⟧ dρ) : ⟦Γ⟧ → ⟦∆Γ⟧ → ⟦∆τ⟧

Theorem 12.2.2 (D⟦–⟧ is correct). Function ⟦t⟧∆ is a derivative of ⟦t⟧. That is, if Γ ⊢ t : τ and dρ ⊲ ρ1 ↪ ρ2 : Γ, then ⟦D⟦t⟧⟧ dρ ⊲ ⟦t⟧ ρ1 ↪ ⟦t⟧ ρ2 : τ.

For now we discuss this statement further; we defer the proof to Sec. 12.2.2.

Remark 12.2.3 (Why ⟦–⟧∆ ignores ρ1). You might wonder why ⟦t⟧∆ = λρ1 dρ → ⟦D⟦t⟧⟧ dρ appears to ignore ρ1. But for all dρ ⊲ ρ1 ↪ ρ2 : Γ, change environment dρ extends ρ1, which hence provides no further information. We are only interested in applying ⟦t⟧∆ to valid environment changes dρ, so ⟦t⟧∆ ρ1 dρ can safely ignore ρ1.

Remark 12.2.4 (Term derivatives). In Chapter 10 we suggested that D⟦t⟧ only produced a derivative for closed terms, not for open ones. But ⟦t⟧∆ = λρ dρ → ⟦D⟦t⟧⟧ dρ is always a nil change and derivative of ⟦t⟧, for any Γ ⊢ t : τ. There is no contradiction, because the value of D⟦t⟧ is ⟦D⟦t⟧⟧ dρ, which is only a nil change if dρ is a nil change as well. In particular, for closed terms (Γ = ε), dρ must equal the empty environment ∅, hence a nil change. If τ is a function type, df = ⟦D⟦t⟧⟧ dρ accepts further inputs; since df must be a valid function change, it will also map them to valid outputs, as required by our Slogan 10.3.3. Finally, if Γ = ε and τ is a function type, then df = ⟦D⟦t⟧⟧ ∅ is a derivative of f = ⟦t⟧ ∅.

We summarize this remark with the following definition and corollary.

Definition 12.2.5 (Derivatives of terms). For all closed terms of function type ⊢ t : σ → τ, we call D⟦t⟧ the (term) derivative of t.

Corollary 12.2.6 (Term derivatives evaluate to derivatives). For all closed terms of function type ⊢ t : σ → τ, function ⟦D⟦t⟧⟧ ∅ is a derivative of ⟦t⟧ ∅.

Proof. Because ⟦t⟧∆ is a derivative (Theorem 12.2.2), and applying derivative ⟦t⟧∆ to the nil change ∅ gives a derivative (Lemma 12.1.13).

Remark 12.2.7. We typically talk about a derivative of a function value f : A → B, not the derivative, since multiple different functions can satisfy the specification of derivatives. We talk about the derivative to refer to a canonically chosen derivative. For terms and their semantics, the canonical derivative is the one produced by differentiation. For language primitives, the canonical derivative is the one chosen by the language plugin under consideration.

Theorem 12.2.2 only makes sense if D⟦–⟧ has the right static semantics:


Lemma 12.2.8 (Typing of D⟦–⟧). The typing rule

Γ ⊢ t : τ
──────────────────── Derive
∆Γ ⊢ D⟦t⟧ : ∆τ

is derivable.

After we define ⊕ in the next chapter, we'll be able to relate ⊕ to validity, by proving Lemma 13.4.4, which we state in advance here:

Lemma 13.4.4 (⊕ agrees with validity). If dv ⊲ v1 ↪ v2 : τ, then v1 ⊕ dv = v2.

Hence, updating base result ⟦t⟧ ρ1 by change ⟦D⟦t⟧⟧ dρ via ⊕ gives the updated result ⟦t⟧ ρ2:

Corollary 13.4.5 (D⟦–⟧ is correct, corollary). If Γ ⊢ t : τ and dρ ⊲ ρ1 ↪ ρ2 : Γ, then ⟦t⟧ ρ1 ⊕ ⟦D⟦t⟧⟧ dρ = ⟦t⟧ ρ2.

We anticipate the proof of this corollary:

Proof. First, differentiation is correct (Theorem 12.2.2), so under the hypotheses

⟦D⟦t⟧⟧ dρ ⊲ ⟦t⟧ ρ1 ↪ ⟦t⟧ ρ2 : τ;

that judgment implies the thesis

⟦t⟧ ρ1 ⊕ ⟦D⟦t⟧⟧ dρ = ⟦t⟧ ρ2

because ⊕ agrees with validity (Lemma 13.4.4).

12.2.1 Plugin requirements

Differentiation is extended by plugins on constants, so plugins must prove their extensions correct.

Plugin Requirement 12.2.9 (Typing of DC⟦–⟧). For all ⊢C c : τ, the plugin defines DC⟦c⟧ satisfying ⊢ DC⟦c⟧ : ∆τ.

Plugin Requirement 12.2.10 (Correctness of DC⟦–⟧). For all ⊢C c : τ, ⟦DC⟦c⟧⟧ is a derivative for ⟦c⟧.

Since constants are typed in the empty context, and the only change for an empty environment is an empty environment, Plugin Requirement 12.2.10 means that for all ⊢C c : τ we have

⟦DC⟦c⟧⟧ ∅ ⊲ ⟦c⟧ ∅ ↪ ⟦c⟧ ∅ : τ.

12.2.2 Correctness proof

We next recall D⟦–⟧'s definition and prove it satisfies its correctness statement, Theorem 12.2.2.


Definition 10.5.1 (Differentiation). Differentiation is the following term transformation:

D⟦λ(x : σ) → t⟧ = λ(x : σ) (dx : ∆σ) → D⟦t⟧
D⟦s t⟧ = D⟦s⟧ t D⟦t⟧
D⟦x⟧ = dx
D⟦c⟧ = DC⟦c⟧

where DC⟦c⟧ defines differentiation on primitives and is provided by language plugins (see Appendix A.2.5), and dx stands for a variable generated by prefixing x's name with d, so that D⟦y⟧ = dy and so on.
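As an implementation-level illustration, here is a small Haskell sketch (ours; the AST and constructor names are hypothetical) of this transformation on a tiny untyped term representation. The plugin's DC⟦–⟧ is stubbed out by prefixing the primitive's name, and the capture issues of Sec. 12.3.4 are ignored by assuming no source variable starts with d:

data Term
  = Var String      -- x
  | App Term Term   -- s t
  | Lam String Term -- λx → t (type annotations omitted)
  | Prim String     -- a primitive constant c
  deriving Show

derive :: Term -> Term
derive (Var x)   = Var ("d" ++ x)                    -- D⟦x⟧ = dx
derive (App s t) = App (App (derive s) t) (derive t) -- D⟦s t⟧ = D⟦s⟧ t D⟦t⟧
derive (Lam x t) = Lam x (Lam ("d" ++ x) (derive t)) -- D⟦λx → t⟧
derive (Prim c)  = Prim ("d" ++ c)                   -- stands in for DC⟦c⟧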

Before correctness, we prove Lemma 12.2.8:

Lemma 12.2.8 (Typing of D⟦–⟧). The typing rule

Γ ⊢ t : τ
──────────────────── Derive
∆Γ ⊢ D⟦t⟧ : ∆τ

is derivable.

Proof. The thesis can be proven by induction on the typing derivation Γ ⊢ t : τ. The case for constants is delegated to plugins in Plugin Requirement 12.2.9.

We prove Theorem 12.2.2 using a typical logical relations strategy. We proceed by induction on term t, and prove for each case that if D⟦–⟧ preserves validity on the subterms of t, then also D⟦t⟧ preserves validity. Hence, if the input environment change dρ is valid, then the result of differentiation evaluates to valid change ⟦D⟦t⟧⟧ dρ.

Readers familiar with logical relations proofs should be able to reproduce this proof on their own, as it is rather standard, once one uses the given definitions. In particular, this proof closely resembles the proof of the abstraction theorem or relational parametricity (as given by Wadler [1989, Sec. 6] or by Bernardy and Lasson [2011, Sec. 3.3, Theorem 3]) and the proof of the fundamental theorem of logical relations by Statman [1985].

Nevertheless, we spell this proof out, and use it to motivate how D⟦–⟧ is defined, more formally than we did in Sec. 10.5. For each case, we first give a short proof sketch, and then redo the proof in more detail to make it easier to follow.

Theorem 12.2.2 (D⟦–⟧ is correct). Function ⟦t⟧∆ is a derivative of ⟦t⟧. That is, if Γ ⊢ t : τ and dρ ⊲ ρ1 ↪ ρ2 : Γ, then ⟦D⟦t⟧⟧ dρ ⊲ ⟦t⟧ ρ1 ↪ ⟦t⟧ ρ2 : τ.

Proof. By induction on the typing derivation Γ ⊢ t : τ.

• Case Γ ⊢ x : τ. The thesis is that ⟦D⟦x⟧⟧ is a derivative for ⟦x⟧, that is, ⟦D⟦x⟧⟧ dρ ⊲ ⟦x⟧ ρ1 ↪ ⟦x⟧ ρ2 : τ. Since dρ is a valid environment change from ρ1 to ρ2, ⟦dx⟧ dρ is a valid change from ⟦x⟧ ρ1 to ⟦x⟧ ρ2. Hence, defining D⟦x⟧ = dx satisfies our thesis.

• Case Γ ⊢ s t : τ. The thesis is that ⟦D⟦s t⟧⟧ is a derivative for ⟦s t⟧, that is, ⟦D⟦s t⟧⟧ dρ ⊲ ⟦s t⟧ ρ1 ↪ ⟦s t⟧ ρ2 : τ. By inversion of typing, there is some type σ such that Γ ⊢ s : σ → τ and Γ ⊢ t : σ.


To prove the thesis, in short, you can apply the inductive hypothesis to s and t, obtaining respectively that ⟦D⟦s⟧⟧ and ⟦D⟦t⟧⟧ are derivatives for ⟦s⟧ and ⟦t⟧. In particular, ⟦D⟦s⟧⟧ evaluates to a validity-preserving function change. Term D⟦s t⟧, that is D⟦s⟧ t D⟦t⟧, applies validity-preserving function change D⟦s⟧ to valid input change D⟦t⟧, and this produces a valid change for s t, as required.

In detail, our thesis is that for all dρ ⊲ ρ1 ↪ ρ2 : Γ we have

⟦D⟦s t⟧⟧ dρ ⊲ ⟦s t⟧ ρ1 ↪ ⟦s t⟧ ρ2 : τ,

where ⟦s t⟧ ρ = (⟦s⟧ ρ) (⟦t⟧ ρ) and

⟦D⟦s t⟧⟧ dρ
  = ⟦D⟦s⟧ t D⟦t⟧⟧ dρ
  = (⟦D⟦s⟧⟧ dρ) (⟦t⟧ dρ) (⟦D⟦t⟧⟧ dρ)
  = (⟦D⟦s⟧⟧ dρ) (⟦t⟧ ρ1) (⟦D⟦t⟧⟧ dρ).

The last step relies on ⟦t⟧ dρ = ⟦t⟧ ρ1. Since weakening preserves meaning (Lemma A.2.8), this follows because dρ : ⟦∆Γ⟧ extends ρ1 : ⟦Γ⟧ and t can be typed in context Γ. Our thesis becomes

⟦D⟦s⟧⟧ dρ (⟦t⟧ ρ1) (⟦D⟦t⟧⟧ dρ) ⊲ ⟦s⟧ ρ1 (⟦t⟧ ρ1) ↪ ⟦s⟧ ρ2 (⟦t⟧ ρ2) : τ.

By the inductive hypothesis on s and t, we have

⟦D⟦s⟧⟧ dρ ⊲ ⟦s⟧ ρ1 ↪ ⟦s⟧ ρ2 : σ → τ
⟦D⟦t⟧⟧ dρ ⊲ ⟦t⟧ ρ1 ↪ ⟦t⟧ ρ2 : σ.

Since the change to ⟦s⟧ is a function change, its validity means

∀da ⊲ a1 ↪ a2 : σ. ⟦D⟦s⟧⟧ dρ a1 da ⊲ ⟦s⟧ ρ1 a1 ↪ ⟦s⟧ ρ2 a2 : τ.

Instantiating in this statement the hypothesis da ⊲ a1 ↪ a2 : σ by ⟦D⟦t⟧⟧ dρ ⊲ ⟦t⟧ ρ1 ↪ ⟦t⟧ ρ2 : σ gives the thesis.

• Case Γ ⊢ λx → t : σ → τ. By inversion of typing, Γ, x : σ ⊢ t : τ. By typing of D⟦–⟧, you can show that

∆Γ, x : σ, dx : ∆σ ⊢ D⟦t⟧ : ∆τ.

In short, our thesis is that ⟦λx → t⟧∆ = λρ1 dρ → ⟦λx dx → D⟦t⟧⟧ dρ is a derivative of ⟦λx → t⟧. After a few simplifications, our thesis reduces to

⟦D⟦t⟧⟧ (dρ, x = a1, dx = da) ⊲ ⟦t⟧ (ρ1, x = a1) ↪ ⟦t⟧ (ρ2, x = a2) : τ

for all dρ ⊲ ρ1 ↪ ρ2 : Γ and da ⊲ a1 ↪ a2 : σ. But then, the thesis is simply that ⟦t⟧∆ is the derivative of ⟦t⟧, which is true by the inductive hypothesis.

More in detail, our thesis is that ⟦λx → t⟧∆ is a derivative for ⟦λx → t⟧, that is

∀dρ ⊲ ρ1 ↪ ρ2 : Γ. ⟦D⟦λx → t⟧⟧ dρ ⊲ ⟦λx → t⟧ ρ1 ↪ ⟦λx → t⟧ ρ2 : σ → τ. (12.1)


By simplifying, the thesis Eq. (12.1) becomes

∀dρ ⊲ ρ1 ↪ ρ2 : Γ.
(λa1 da → ⟦D⟦t⟧⟧ (dρ, x = a1, dx = da)) ⊲ (λa1 → ⟦t⟧ (ρ1, x = a1)) ↪ (λa2 → ⟦t⟧ (ρ2, x = a2)) : σ → τ. (12.2)

By definition of validity on function types, the thesis Eq. (12.2) becomes

∀dρ ⊲ ρ1 ↪ ρ2 : Γ. ∀da ⊲ a1 ↪ a2 : σ.
⟦D⟦t⟧⟧ (dρ, x = a1, dx = da) ⊲ ⟦t⟧ (ρ1, x = a1) ↪ ⟦t⟧ (ρ2, x = a2) : τ. (12.3)

To prove the rewritten thesis Eq. (12.3), take the inductive hypothesis on t: it says that ⟦D⟦t⟧⟧ is a derivative for ⟦t⟧, so ⟦D⟦t⟧⟧ maps valid environment changes on Γ, x : σ to valid changes on τ. But by inversion of the validity judgment, all valid environment changes on Γ, x : σ can be written as

(dρ, x = a1, dx = da) ⊲ (ρ1, x = a1) ↪ (ρ2, x = a2) : Γ, x : σ

for valid changes dρ ⊲ ρ1 ↪ ρ2 : Γ and da ⊲ a1 ↪ a2 : σ. So the inductive hypothesis is that

∀dρ ⊲ ρ1 ↪ ρ2 : Γ. ∀da ⊲ a1 ↪ a2 : σ.
⟦D⟦t⟧⟧ (dρ, x = a1, dx = da) ⊲ ⟦t⟧ (ρ1, x = a1) ↪ ⟦t⟧ (ρ2, x = a2) : τ. (12.4)

But that is exactly our thesis Eq. (12.3), so we're done.

• Case Γ ⊢ c : τ. In essence, since weakening preserves meaning, we can rewrite the thesis to match Plugin Requirement 12.2.10.

In more detail, the thesis is that DC⟦c⟧ is a derivative for c, that is, if dρ ⊲ ρ1 ↪ ρ2 : Γ, then ⟦D⟦c⟧⟧ dρ ⊲ ⟦c⟧ ρ1 ↪ ⟦c⟧ ρ2 : τ. Since constants don't depend on the environment and weakening preserves meaning (Lemma A.2.8), and by the definitions of ⟦–⟧ and D⟦–⟧ on constants, the thesis can be simplified to

⟦DC⟦c⟧⟧ ∅ ⊲ ⟦c⟧ ∅ ↪ ⟦c⟧ ∅ : τ,

which is delegated to plugins in Plugin Requirement 12.2.10.

12.3 Discussion

12.3.1 The correctness statement

We might have asked for the following correctness property:

Theorem 12.3.1 (Incorrect correctness statement). If Γ ⊢ t : τ and ρ1 ⊕ dρ = ρ2, then (⟦t⟧ ρ1) ⊕ (⟦D⟦t⟧⟧ dρ) = (⟦t⟧ ρ2).

However, this property is not quite right. We can only prove correctness if we restrict the statement to input changes dρ that are valid. Moreover, to prove this statement by induction we need to strengthen its conclusion: we require that the output change ⟦D⟦t⟧⟧ dρ is also valid. To


see why, consider term (λx → s) t. Here, the output of t is an input of s. Similarly, in D⟦(λx → s) t⟧ the output of D⟦t⟧ becomes an input change for subterm D⟦s⟧, and D⟦s⟧ behaves correctly only if D⟦t⟧ produces a valid change.

Typically, change types contain values that are invalid in some sense, but incremental programs will preserve validity. In particular, valid changes between functions are in turn functions that take valid input changes to valid output changes. This is why we formalize validity as a logical relation.

12.3.2 Invalid input changes

To see concretely why invalid changes, in general, can cause derivatives to produce incorrect results, consider again grandTotal = λxs ys → sum (merge xs ys) from Sec. 10.2. Suppose a bag change dxs removes an element 20 from input bag xs, while dys makes no changes to ys: in this case, the output should decrease, so dz = dgrandTotal xs dxs ys dys should be −20. However, that is only correct if 20 is actually an element of xs. Otherwise, xs ⊕ dxs will make no change to xs, hence the correct output change dz would be 0 instead of −20. Similar but trickier issues apply with function changes; see also Sec. 15.2.

12.3.3 Alternative environment changes

Environment changes can also be defined differently. We will use this alternative definition later (in Appendix D.2.2).

A change dρ from ρ1 to ρ2 contains a copy of ρ1. Thanks to this copy, we can use an environment change as the environment for the result of differentiation: that is, we can evaluate D⟦t⟧ with environment dρ, and Definition 12.2.1 can define ⟦t⟧∆ as λρ1 dρ → ⟦D⟦t⟧⟧ dρ.

But we could adapt the definitions to omit the copy of ρ1 from dρ, by setting

∆(Γ, x : τ) = ∆Γ, dx : ∆τ

and adapting other definitions. Evaluating D⟦t⟧ still requires base inputs; we could then set ⟦t⟧∆ = λρ1 dρ → ⟦D⟦t⟧⟧ (ρ1, dρ), where ρ1, dρ simply merges the two environments appropriately (we omit a formal definition). This is the approach taken by Cai et al. [2014]. When proving Theorem 12.2.2, using one or the other definition for environment changes makes little difference; if we embed the base environment in environment changes, we reduce noise, because we need not define environment merging formally.

Later (in Appendix D.2.2) we will deal with environments explicitly and manipulate them in programs. Then we will use this alternative definition for environment changes, since it will be convenient to store base environments separately from environment changes.

12.3.4 Capture avoidance

Differentiation generates new names, so a correct implementation must prevent accidental capture. Until now we have ignored this problem.

Using de Bruijn indexes. Our mechanization has no capture issues because it uses de Bruijn indexes. Change contexts just alternate variables for base inputs and input changes. A context such as Γ = x : ℤ, y : Bool is encoded as Γ = ℤ, Bool; its change context is ∆Γ = ℤ, ∆ℤ, Bool, ∆Bool. This solution is correct and robust, and is the one we rely on.

Alternatively, we can mechanize ILC using a separate syntax for change terms dt, with separate namespaces for base variables and change variables:


ds, dt ::= dc
         | λ(x : σ) (dx : ∆σ) → dt
         | ds t dt
         | dx

In that case, change variables live in a separate namespace. Example context Γ = ℤ, Bool gives rise to a different sort of change context, ∆Γ = ∆ℤ, ∆Bool. And a change term in context Γ is evaluated with separate environments for Γ and ∆Γ. This is appealing, because it allows defining differentiation and proving it correct without using weakening and its proof of soundness. We still need weakening to convert change terms to their equivalents in the base language, but proving that conversion correct is more straightforward.

Using names. Next, we discuss issues in implementing this transformation with names rather than de Bruijn indexes. Using names rather than de Bruijn indexes makes terms more readable; this is also why in this thesis we use names in our on-paper formalization.

Unlike the rest of this chapter, we keep this discussion informal, also because we have not mechanized any definitions using names (as may be possible using nominal logic), nor attempted formal proofs. The rest of the thesis does not depend on this material, so readers might want to skip to the next section.

Using names introduces the risk of capture, as is common for name-generating transformations [Erdweg et al. 2014]. For instance, differentiating term t = λx → f dx gives D⟦t⟧ = λx dx → df dx ddx. Here, variable dx represents a base input and is free in t, yet it is incorrectly captured in D⟦t⟧ by the other variable dx, the one representing x's change. Differentiation gives instead a correct result if we α-rename x in t to any other name (more on that in a moment).

A few workarounds and fixes are possible:

• As a workaround, we can forbid names starting with the letter d for variables in base terms, as we do in our examples; that's formally correct but pretty unsatisfactory and inelegant. With this approach, our term t = λx → f dx is simply forbidden.

• As a better workaround, instead of prefixing variable names with d, we can add change variables as a separate construct to the syntax of variables, and forbid differentiation on terms containing change variables. This is a variant of the earlier approach using separate change terms. While we used this approach in our prototype implementation in Scala [Cai et al. 2014], it makes our output language annoyingly non-standard. Converting to a standard language using names (not de Bruijn indexes) raises capture issues again.

• We can try to α-rename existing bound variables, as in the implementation of capture-avoiding substitution. As mentioned, in our case we can rename bound variable x to y and get t′ = λy → f dx. Differentiation gives the correct result D⟦t′⟧ = λy dy → df dx ddx. In general, we can define D⟦λx → t⟧ = λy dy → D⟦t[x := y]⟧, where neither y nor dy appears free in t; that is, we search for a fresh variable y (which, being fresh, does not appear anywhere else) such that also dy does not appear free in t.

This solution is, however, subtle: it reuses ideas from capture-avoiding substitution, which is well-known to be subtle. We have not attempted to formally prove such a solution correct (or even test it), and have no plan to do so.

• Finally, and most easily, we can α-rename the new bound variables, the ones used to refer to changes, or rather only pick them fresh. But if we pick, say, fresh variable dx1 to refer to the change of variable x, we must use dx1 consistently for every occurrence of x, so that


D⟦λx → x⟧ is not λx dx1 → dx2. Hence, we extend D⟦–⟧ to also take a map from names to names, as follows:

D⟦λ(x : σ) → t, m⟧ = λ(x : σ) (dx : ∆σ) → D⟦t, m[x → dx]⟧
D⟦s t, m⟧ = D⟦s, m⟧ t D⟦t, m⟧
D⟦x, m⟧ = m(x)
D⟦c, m⟧ = DC⟦c⟧

where m(x) represents lookup of x in map m, dx is now a fresh variable that does not appear in t, and m[x → dx] extends m with a new mapping from x to dx.

But this approach, that is, using a map from base variables to change variables, affects the interface of differentiation. In particular, it affects which variables are free in output terms; hence we must also update the definition of ∆Γ and the derived typing rule Derive. With this approach, if term s is closed, then D⟦s, emptyMap⟧ gives a result α-equivalent to the old D⟦s⟧, as long as s triggers no capture issues. But if instead s is open, invoking D⟦s, emptyMap⟧ is not meaningful: we must pass an initial map m containing mappings from s's free variables to fresh variables for their changes. These change variables appear free in D⟦s, m⟧, hence we must update typing rule Derive and modify ∆Γ to use m.

We define ∆mΓ by adding m as a parameter to ∆Γ, and use it in a modified rule Derive′:

∆mε = ε
∆m(Γ, x : τ) = ∆mΓ, x : τ, m(x) : ∆τ

Γ ⊢ t : τ
──────────────────────── Derive′
∆mΓ ⊢ D⟦t, m⟧ : ∆τ

We conjecture that Derive′ holds and that D⟦t, m⟧ is correct, but we have attempted no formal proof.

12.4 Plugin requirement summary

For reference, we repeat here the plugin requirements spread throughout the chapter.

Plugin Requirement 12.1.16 (Basic change structures on base types). For each base type ι, the plugin defines a basic change structure on ⟦ι⟧ that we write ι̂.

Plugin Requirement 12.1.19 (Base change types). For each base type ι, the plugin defines a change type ∆ι such that ∆⟦ι⟧ = ⟦∆ι⟧.

Plugin Requirement 12.2.9 (Typing of DC⟦–⟧). For all ⊢C c : τ, the plugin defines DC⟦c⟧ satisfying ⊢ DC⟦c⟧ : ∆τ.

Plugin Requirement 12.2.10 (Correctness of DC⟦–⟧). For all ⊢C c : τ, ⟦DC⟦c⟧⟧ is a derivative for ⟦c⟧.


12.5 Chapter conclusion

In this chapter we have formally defined changes for values and environments of our language, and when changes are valid. Through these definitions, we have explained that D⟦t⟧ is correct: it maps changes to the input environment to changes to the output. All of this assumes, among other things, that language plugins define valid changes for their base types and derivatives for their primitives.

In the next chapter we discuss operations we provide to construct and use changes. These operations will be especially useful to provide derivatives of primitives.


Chapter 13

Change structures

In the previous chapter, we have shown that evaluating the result of differentiation produces a valid change dv from the old output v1 to the new one v2. To compute v2 from v1 and dv, in this chapter we formally introduce the operator ⊕ mentioned earlier.

To define differentiation on primitives, plugins need a few operations on changes, not just ⊕: also ⊖, ⊚ and 0.

To formalize these operators and specify their behavior, we extend the notion of basic change structure into the notion of change structure in Sec. 13.1. The change structure for function spaces is not entirely intuitive, so we motivate it in Sec. 13.2. Then, we show how to take change structures on A and B and define new ones on A → B in Sec. 13.3. Using these structures, we finally show that, starting from change structures for base types, we can define change structures for all types τ and contexts Γ in Sec. 13.4, completing the core theory of changes.

13.1 Formally defining ⊕ and change structures

In this section we define what a change structure on a set V is. A change structure V̂ extends a basic change structure on V with change operators ⊕, ⊖, 0 and ⊚. Change structures also require change operators to respect validity, as described below. Key properties of change structures follow in Sec. 13.1.2.

As usual, we'll use metavariables: v, v1, v2, … will range over elements of V, while dv, dv1, dv2, … will range over elements of ∆V.

Let's first recall the change operators. Operator ⊕ ("oplus") updates a value with a change: if dv is a valid change from v1 to v2, then v1 ⊕ dv (read as "v1 updated by dv" or "v1 oplus dv") is guaranteed to return v2. Operator ⊖ ("ominus") produces a difference between two values: v2 ⊖ v1 is a valid change from v1 to v2. Operator 0 ("nil") produces nil changes: 0v is a nil change for v. Finally, change composition ⊚ ("ocompose") "pastes changes together": if dv1 is a valid change from v1 to v2 and dv2 is a valid change from v2 to v3, then dv1 ⊚ dv2 (read "dv1 composed with dv2") is a valid change from v1 to v3.

We summarize these descriptions in the following definition:

Definition 13.1.1. A change structure V̂ over a set V requires:

(a) A basic change structure for V (hence, change set ∆V and validity dv ⊲ v1 ↪ v2 : V).


(b) An update operation ⊕ : V → ∆V → V that updates a value with a change. Update must agree with validity: for all dv ⊲ v1 ↪ v2 : V, we require v1 ⊕ dv = v2.

(c) A nil change operation 0 : V → ∆V that must produce nil changes: for all v : V, we require 0v ⊲ v ↪ v : V.

(d) A difference operation ⊖ : V → V → ∆V that produces a change across two values: for all v1, v2 : V, we require v2 ⊖ v1 ⊲ v1 ↪ v2 : V.

(e) A change composition operation ⊚ : ∆V → ∆V → ∆V that composes together two changes relative to a base value. Change composition must preserve validity: for all dv1 ⊲ v1 ↪ v2 : V and dv2 ⊲ v2 ↪ v3 : V, we require dv1 ⊚ dv2 ⊲ v1 ↪ v3 : V.

Notation 13.1.2. Operators ⊕ and ⊖ can be subscripted to highlight their base set, but we will usually omit such subscripts. Moreover, ⊕ is left-associative, so that v ⊕ dv1 ⊕ dv2 means (v ⊕ dv1) ⊕ dv2.

Finally, whenever we have a change structure such as Â, B̂, V̂ and so on, we write respectively A, B, V to refer to its base set.
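For readers who prefer code, this Haskell sketch (our own encoding; the class name and methods are hypothetical) renders Definition 13.1.1 as a type class with an associated change type. Validity, and the laws tying each operator to it, remain proof obligations that the types do not enforce:

{-# LANGUAGE TypeFamilies #-}

class ChangeStruct v where
  type Dt v
  oplus    :: v -> Dt v -> v       -- update: v1 ⊕ dv
  ominus   :: v -> v -> Dt v       -- difference: v2 ⊖ v1 goes from v1 to v2
  nilOf    :: v -> Dt v            -- nil change 0v
  ocompose :: Dt v -> Dt v -> Dt v -- composition dv1 ⊚ dv2
  nilOf v = v `ominus` v           -- default, anticipating Lemma 13.1.8

-- Integers with additive changes satisfy all the required laws.
instance ChangeStruct Int where
  type Dt Int = Int
  oplus    = (+)
  ominus   = (-) -- v2 `ominus` v1 == v2 - v1
  ocompose = (+)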

13.1.1 Example: group changes

As an example, we show next that each group induces a change structure on its carrier. This change structure also subsumes the basic change structures we saw earlier on integers.

Definition 13.1.3 (Change structure on groups Ĝ). Given any group (G, e, +, −), we can define a change structure Ĝ on carrier set G as follows:

(a) The change set is G.

(b) Validity is defined as: dg ⊲ g1 ↪ g2 : G if and only if g2 = g1 + dg.

(c) Change update coincides with +: g1 ⊕ dg = g1 + dg. Hence ⊕ agrees perfectly with validity: for all g1 ∈ G, all changes dg are valid from g1 to g1 ⊕ dg (that is, dg ⊲ g1 ↪ g1 ⊕ dg : G).

(d) We define difference as g2 ⊖ g1 = (−g1) + g2. Verifying g2 ⊖ g1 ⊲ g1 ↪ g2 : G reduces to verifying g1 + ((−g1) + g2) = g2, which follows from the group axioms.

(e) The only nil change is the identity element: 0g = e. Validity 0g ⊲ g ↪ g : G reduces to g + e = g, which follows from the group axioms.

(f) Change composition also coincides with +: dg1 ⊚ dg2 = dg1 + dg2. Let's prove that composition respects validity. Our hypothesis is dg1 ⊲ g1 ↪ g2 : G and dg2 ⊲ g2 ↪ g3 : G. Because ⊕ agrees perfectly with validity, the hypothesis is equivalent to g2 = g1 ⊕ dg1 and

g3 = g2 ⊕ dg2 = (g1 ⊕ dg1) ⊕ dg2.

Our thesis is dg1 ⊚ dg2 ⊲ g1 ↪ g3 : G, that is

g3 = g1 ⊕ (dg1 ⊚ dg2).

Hence the thesis reduces to

(g1 ⊕ dg1) ⊕ dg2 = g1 ⊕ (dg1 ⊚ dg2),

hence to g1 + (dg1 + dg2) = (g1 + dg1) + dg2, which is just the associative law for group G. As we show later, group change structures are useful to support aggregation.
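A brief sketch of this construction in Haskell (ours; the record and field names are hypothetical), instantiated to the additive group of integers:

data Group g = Group { unit :: g, plus :: g -> g -> g, inv :: g -> g }

-- The induced change operators of Definition 13.1.3.
oplusG, ominusG :: Group g -> g -> g -> g
oplusG  gr g dg  = plus gr g dg           -- g ⊕ dg = g + dg
ominusG gr g2 g1 = plus gr (inv gr g1) g2 -- g2 ⊖ g1 = (−g1) + g2

nilG :: Group g -> g -> g
nilG gr _ = unit gr                       -- 0g = e, for every g

intAdd :: Group Int
intAdd = Group { unit = 0, plus = (+), inv = negate }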


13.1.2 Properties of change structures

To understand better the definition of change structures, we present next a few lemmas that follow from this definition.

Lemma 13.1.4 (⊕ inverts ⊖). ⊕ inverts ⊖, that is,

v1 ⊕ (v2 ⊖ v1) = v2

for any change structure V̂ and any values v1, v2 : V.

Proof. For change structures, we know v2 ⊖ v1 ⊲ v1 ↪ v2 : V, and v1 ⊕ (v2 ⊖ v1) = v2 follows.

More in detail: change dv = v2 ⊖ v1 is a valid change from v1 to v2 (because ⊖ produces valid changes, v2 ⊖ v1 ⊲ v1 ↪ v2 : V), so updating dv's source v1 with dv produces dv's destination v2 (because ⊕ agrees with validity, that is, if dv ⊲ v1 ↪ v2 : V then v1 ⊕ dv = v2).

Lemma 13.1.5 (A change can't be valid for two destinations with the same source). Given a change dv : ∆V and a source v1 : V, dv can only be valid with v1 as source for a single destination. That is, if dv ⊲ v1 ↪ v2a : V and dv ⊲ v1 ↪ v2b : V, then v2a = v2b.

Proof. The proof follows, intuitively, because ⊕ also maps change dv and its source v1 to its destination, and ⊕ is a function.

More technically, since ⊕ respects validity, the hypotheses mean that v2a = v1 ⊕ dv = v2b, as required.

Beware that changes can be valid for multiple sources, which can associate them to different destinations. For instance, integer 0 is a valid change for all integers.

If a change dv has source v, dv's destination equals v ⊕ dv. So, to specify that dv is valid with source v without mentioning dv's destination, we introduce the following definition.

Definition 13.1.6 (One-sided validity). We define relation 𝒱V as {(v, dv) ∈ V × ∆V | dv ⊲ v ↪ v ⊕ dv : V}.

We use this definition right away:

Lemma 13.1.7 (⊚ and ⊕ interact correctly). If (v1, dv1) ∈ 𝒱V and (v1 ⊕ dv1, dv2) ∈ 𝒱V, then v1 ⊕ (dv1 ⊚ dv2) = v1 ⊕ dv1 ⊕ dv2.

Proof. We know that ⊚ preserves validity, so under the hypotheses (v1, dv1) ∈ 𝒱V and (v1 ⊕ dv1, dv2) ∈ 𝒱V, we get that dv = dv1 ⊚ dv2 is a valid change from v1 to v1 ⊕ dv1 ⊕ dv2:

dv1 ⊚ dv2 ⊲ v1 ↪ v1 ⊕ dv1 ⊕ dv2 : V.

Hence, updating dv's source v1 with dv produces dv's destination v1 ⊕ dv1 ⊕ dv2:

v1 ⊕ (dv1 ⊚ dv2) = v1 ⊕ dv1 ⊕ dv2.

We can define 0 in terms of the other operations, and prove that it satisfies its requirements for change structures:

Lemma 13.1.8 (0 can be derived from ⊖). If we define 0v = v ⊖ v, then 0 produces valid changes as required (0v ⊲ v ↪ v : V), for any change structure V̂ and value v : V.

Proof. This follows from validity of ⊖ (v2 ⊖ v1 ⊲ v1 ↪ v2 : V), instantiated with v1 = v and v2 = v.


Moreover, nil changes are a right identity element for ⊕:

Corollary 13.1.9 (Nil changes are identity elements). Any nil change dv for a value v is a right identity element for ⊕, that is, we have v ⊕ dv = v for every set V with a basic change structure, every element v ∈ V, and every nil change dv for v.

Proof. This follows from Lemma 13.4.4 and the definition of nil changes.

The converse of this theorem does not hold: there exist changes dv such that v ⊕ dv = v, yet dv is not a valid change from v to v. More in general, v1 ⊕ dv = v2 does not imply that dv is a valid change. For instance, under earlier definitions for changes on naturals, if we take v = 0 and dv = −5, we have v ⊕ dv = v, even though dv is not valid; examples of invalid changes on functions are discussed in Sec. 12.3.2 and Sec. 15.2. However, we prefer to define "dv is a nil change" as we do, to imply that dv is valid, not just a neutral element.

13.2 Operations on function changes, informally

13.2.1 Examples of nil changes

We have not defined any change structure yet, but we can already exhibit nil changes for some values, including a few functions.

Example 13.2.1.

• An environment change for an empty environment ∅ : ⟦ε⟧ must be an environment for the empty context ∆ε = ε, so it must be the empty environment ∅. In other words, dρ ⊲ ∅ ↪ ∅ : ε if and only if dρ = ∅; in particular, dρ is then a nil change.

• If all values in an environment ρ have nil changes, the environment has a nil change dρ = 0ρ, associating each value to a nil change. Indeed, take a context Γ and a suitable environment ρ : ⟦Γ⟧. For each typing assumption x : τ in Γ, assume we have a nil change 0v for v. Then we define 0ρ : ⟦∆Γ⟧ by structural recursion on ρ as

0∅ = ∅
0(ρ, x=v) = (0ρ, x = v, dx = 0v).

Then we can see that 0ρ is indeed a nil change for ρ, that is, 0ρ ⊲ ρ ↪ ρ : Γ.

• We have seen in Theorem 12.2.2 that, whenever Γ ⊢ t : τ, ⟦t⟧ has nil change ⟦t⟧∆. Moreover, if we have an appropriate nil environment change dρ ⊲ ρ ↪ ρ : Γ (which we often do, as discussed above), then we also get a nil change ⟦t⟧∆ ρ dρ for ⟦t⟧ ρ:

⟦t⟧∆ ρ dρ ⊲ ⟦t⟧ ρ ↪ ⟦t⟧ ρ : τ.

In particular, for all closed well-typed terms ⊢ t : τ, we have

⟦t⟧∆ ∅ ∅ ⊲ ⟦t⟧ ∅ ↪ ⟦t⟧ ∅ : τ.


13.2.2 Nil changes on arbitrary functions

We have discussed how to find a nil change 0f for a function f if we know the intension of f, that is, its definition. What if we have only its extension, that is, its behavior? Can we still find 0f? That's necessary to implement 0 as an object-language function 0 from f to 0f, since such a function does not have access to f's implementation. That's also necessary to define 0 on metalanguage function spaces, and we look at this case first.

We seek a nil change 0f for an arbitrary metalanguage function f : A → B, where A and B are arbitrary sets; we assume a basic change structure on A → B, and will require A and B to support a few additional operations. We require that

0f ⊲ f ↪ f : A → B. (13.1)

Equivalently, whenever da ⊲ a1 ↪ a2 : A, then 0f a1 da ⊲ f a1 ↪ f a2 : B. By Lemma 13.4.4, it follows that

f a1 ⊕ 0f a1 da = f a2, (13.2)

where a1 ⊕ da = a2.

To define 0f, we solve Eq. (13.2). To understand how, we use an analogy: ⊕ and ⊖ are intended to resemble + and −. To solve f a1 + 0f a1 da = f a2, we subtract f a1 from both sides and write 0f a1 da = f a2 − f a1.

Similarly, here we use operator ⊖: 0f must equal

0f = λa1 da → f (a1 ⊕ da) ⊖ f a1. (13.3)

Because b2 ⊖ b1 ⊲ b1 ↪ b2 : B for all b1, b2 : B, we can verify that 0f as defined by Eq. (13.3) satisfies our original requirement Eq. (13.1), not just its weaker consequence Eq. (13.2).

We have shown that, to define 0 on functions f : A → B, we can use ⊖ at type B. Without using f's intension, we are aware of no alternative. To ensure 0f is defined for all f, we require that change structures define ⊖. We can then define 0 as a derived operation via 0v = v ⊖ v, and verify that this derived definition satisfies the requirements for 0.

Next, we show how to define ⊖ on functions. We seek a valid function change f2 ⊖ f1 from f1 to f2. We have just sought and found a valid change from f to f; generalizing the reasoning we used, we obtain that whenever da ⊲ a1 ↪ a2 : A, we need to have (f2 ⊖ f1) a1 da ⊲ f1 a1 ↪ f2 a2 : B. Since a2 = a1 ⊕ da, we can define

f2 ⊖ f1 = λa1 da → f2 (a1 ⊕ da) ⊖ f1 a1. (13.4)

One can verify that Eq. (13.4) defines f2 ⊖ f1 as a valid function change from f1 to f2, as desired. And after defining f2 ⊖ f1, we no longer need to define 0f separately using Eq. (13.3). We can just define 0f = f ⊖ f, simplify through the definition of ⊖ in Eq. (13.4), and reobtain Eq. (13.3) as a derived equation:

0f = f ⊖ f = λa1 da → f (a1 ⊕ da) ⊖ f a1.

We defined f2 ⊖ f1 on metalanguage functions. We can also internalize the change operators in our object language, that is, define for each type τ object-level terms ⊕τ, ⊖τ and so on, with the same behavior. We can define object-language change operators such as ⊖ using the same equations. However, the produced function change df is slow, because it recomputes the old output f1 a1, computes the new output f2 a2, and takes the difference.

However, we can implement ⊖σ→τ using replacement changes, if they are supported by the change structure on type τ. Let us define ⊖τ as b2 ⊖ b1 = !b2 and simplify Eq. (13.4); we obtain

f2 ⊖ f1 = λa1 da → !(f2 (a1 ⊕ da)).


We could even imagine allowing replacement changes on functions themselves. However, here the bang operator needs to be defined to produce a function change that can be applied; hence

!f2 = λa1 da → !(f2 (a1 ⊕ da)).

Alternatively, as we see in Appendix D, we could represent function changes not as functions but as data, through defunctionalization, and provide a function applying defunctionalized function changes df : ∆(σ → τ) to inputs t1 : σ and dt : ∆σ. In this case, !f2 would simply be another way to produce defunctionalized function changes.

13.2.3 Constraining ⊕ on functions

Next we discuss how ⊕ must be defined on functions, and show informally why we must define f1 ⊕ df = λv → f1 v ⊕ df v 0v to prove that ⊕ on functions agrees with validity (that is, Lemma 13.4.4).

We know that a valid function change df ⊲ f1 ↪ f2 : A → B takes valid input changes dv ⊲ v1 ↪ v2 : A to valid output changes df v1 dv ⊲ f1 v1 ↪ f2 v2 : B. We require that ⊕ agrees with validity (Lemma 13.4.4), so f2 = f1 ⊕ df, v2 = v1 ⊕ dv, and

f2 v2 = (f1 ⊕ df) (v1 ⊕ dv) = f1 v1 ⊕ df v1 dv. (13.5)

Instantiating dv with 0v gives equation

(f1 ⊕ df) v1 = (f1 ⊕ df) (v1 ⊕ 0v) = f1 v1 ⊕ df v1 0v,

which is not only a requirement on ⊕ for functions, but also defines ⊕ effectively.

13.3 Families of change structures

In this section we derive change structures for A → B and A × B from two change structures Â and B̂. The change structure on A → B enables defining change structures for function types. Similarly, the change structure on A × B enables defining a change structure for product types in a language plugin, as described informally in Sec. 11.5. In Sec. 11.6 we also discuss informally change structures for disjoint sums. Formally, we can derive a change structure for the disjoint union of sets A + B (from change structures for A and B), and this enables defining change structures for sum types; we have mechanized the required proofs, but omit the tedious details here.

13.3.1 Change structures for function spaces

Sec. 13.2 introduces informally how to define change operations on A → B. Next, we define formally change structures on function spaces, and then prove that their operations respect their constraints.

Definition 13.3.1 (Change structure for A → B). Given change structures Â and B̂, we define a change structure on their function space A → B, written Â → B̂, where:

(a) The change set is defined as ∆(A → B) = A → ∆A → ∆B.

(b) Validity is defined as

df ⊲ f1 ↪ f2 : A → B = ∀a1, da, a2. (da ⊲ a1 ↪ a2 : A) implies (df a1 da ⊲ f1 a1 ↪ f2 a2 : B).


(c) We define change update by

f1 ⊕ df = λa → f1 a ⊕ df a 0a.

(d) We define difference by

f2 ⊖ f1 = λa da → f2 (a ⊕ da) ⊖ f1 a.

(e) We define 0 as in Lemma 13.1.8:

0f = f ⊖ f.

(f) We define change composition as

df1 ⊚ df2 = λa da → df1 a 0a ⊚ df2 a da.
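Continuing the type-class sketch from Sec. 13.1 (again our own encoding, not notation from this thesis), these equations become an instance for function spaces; nilOf comes from the default definition via ⊖, as in Lemma 13.1.8:

instance (ChangeStruct a, ChangeStruct b) => ChangeStruct (a -> b) where
  type Dt (a -> b) = a -> Dt a -> Dt b
  oplus f df       = \a -> f a `oplus` df a (nilOf a)
  ominus f2 f1     = \a da -> f2 (a `oplus` da) `ominus` f1 a
  ocompose df1 df2 = \a da -> df1 a (nilOf a) `ocompose` df2 a da
  -- nilOf uses the default: nilOf f = f `ominus` f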

Lemma 13.3.2. Definition 13.3.1 defines a correct change structure Â → B̂.

Proof. • We prove that ⊕ agrees with validity on A → B. Consider f1, f2 : A → B and df ⊲ f1 ↪ f2 : A → B; we must show that f1 ⊕ df = f2. By functional extensionality, we only need prove that (f1 ⊕ df) a = f2 a, that is, that f1 a ⊕ df a 0a = f2 a. Since ⊕ agrees with validity on B, we just need to show that df a 0a ⊲ f1 a ↪ f2 a : B, which follows because 0a is a valid change from a to a and because df is a valid change from f1 to f2.

• We prove that ⊖ produces valid changes on A → B. Consider df = f2 ⊖ f1 for f1, f2 : A → B. For any valid input da ⊲ a1 ↪ a2 : A, we must show that df produces a valid output with the correct vertexes, that is, that df a1 da ⊲ f1 a1 ↪ f2 a2 : B. Since ⊕ agrees with validity, a2 equals a1 ⊕ da. By substituting away a2 and df, the thesis becomes f2 (a1 ⊕ da) ⊖ f1 a1 ⊲ f1 a1 ↪ f2 (a1 ⊕ da) : B, which is true because ⊖ produces valid changes on B.

• 0 produces valid changes, as proved in Lemma 13.1.8.

• We prove that change composition preserves validity on A → B. That is, we must prove

df1 a1 0a1 ⊚ df2 a1 da ⊲ f1 a1 ↪ f3 a2 : B

for every f1, f2, f3, df1, df2, a1, da, a2 satisfying df1 ⊲ f1 ↪ f2 : A → B, df2 ⊲ f2 ↪ f3 : A → B and da ⊲ a1 ↪ a2 : A.

Because change composition preserves validity on B, it's enough to prove that (1) df1 a1 0a1 ⊲ f1 a1 ↪ f2 a1 : B, and (2) df2 a1 da ⊲ f2 a1 ↪ f3 a2 : B. That is, intuitively, we create a composite change using ⊚, and it goes from f1 a1 to f3 a2, passing through f2 a1. Part (1) follows because df1 is a valid function change from f1 to f2, applied to a valid change 0a1 from a1 to a1. Part (2) follows because df2 is a valid function change from f2 to f3, applied to a valid change da from a1 to a2.


13.3.2 Change structures for products

We can define change structures on products A × B, given change structures on A and B: a change on pairs is just a pair of changes; all other change structure definitions distribute componentwise the same way, and their correctness reduces to the correctness on components.

Change structures on n-ary products, or records, present no additional difficulty.

Definition 13.3.3 (Change structure for A × B). Given change structures Â and B̂, we define a change structure Â × B̂ on product A × B:

(a) The change set is defined as ∆(A × B) = ∆A × ∆B.

(b) Validity is defined as

(da, db) ⊲ (a1, b1) ↪ (a2, b2) : A × B = (da ⊲ a1 ↪ a2 : A) and (db ⊲ b1 ↪ b2 : B).

In other words, validity distributes componentwise: a product change is valid if each component is valid.

(c) We define change update by

(a1, b1) ⊕ (da, db) = (a1 ⊕ da, b1 ⊕ db).

(d) We define difference by

(a2, b2) ⊖ (a1, b1) = (a2 ⊖ a1, b2 ⊖ b1).

(e) We define 0 to distribute componentwise:

0(a,b) = (0a, 0b).

(f) We define change composition to distribute componentwise:

(da1, db1) ⊚ (da2, db2) = (da1 ⊚ da2, db1 ⊚ db2).
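In the same type-class sketch, the product construction is one more instance (ours), with every operation acting componentwise:

instance (ChangeStruct a, ChangeStruct b) => ChangeStruct (a, b) where
  type Dt (a, b) = (Dt a, Dt b)
  oplus    (a, b)     (da, db)   = (a `oplus` da, b `oplus` db)
  ominus   (a2, b2)   (a1, b1)   = (a2 `ominus` a1, b2 `ominus` b1)
  ocompose (da1, db1) (da2, db2) = (da1 `ocompose` da2, db1 `ocompose` db2)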

Lemma 13.3.4. Definition 13.3.3 defines a correct change structure Â × B̂.

Proof. Since all these proofs are similar, and spelling out their details does not make them clearer, we only give the first such proof in full.

• ⊕ agrees with validity on A × B, because ⊕ agrees with validity on both A and B. For this property we give a full proof.

For each p1, p2 : A × B and dp ⊲ p1 ↪ p2 : A × B, we must show that p1 ⊕ dp = p2. Instead of quantifying over pairs p : A × B, we can quantify equivalently over components a : A, b : B. Hence, consider a1, a2 : A, b1, b2 : B, and changes da, db that are valid, that is, da ⊲ a1 ↪ a2 : A and db ⊲ b1 ↪ b2 : B. We must show that

(a1, b1) ⊕ (da, db) = (a2, b2).

That follows from a1 ⊕ da = a2 (which follows from da ⊲ a1 ↪ a2 : A) and b1 ⊕ db = b2 (which follows from db ⊲ b1 ↪ b2 : B).


bull produces valid changes on A times B because produces valid changes on both A and B Weomit a full proof the key step reduces the thesis

(a2 b2) (a1 b1) B (a1 b1) rarr (a2 b2) A times B

to a2 a1 B a1 rarr a2 A and b2 b1 B b1 rarr b2 B (where free variables range on suitabledomains)

bull 0ab is correct that is (0a 0b) B (a b) rarr (a b) AtimesB because 0 is correct on each component

bull Change composition is correct on A times B that is

(da1 da2 db1 db2) B (a1 b1) rarr (a3 b3) A times B

whenever (da1 db1) B (a1 b1) rarr (a2 b2) A times B and (da2 db2) B (a2 b2) rarr (a3 b3) A times Bin essence because change composition is correct on both A and B We omit details

13.4 Change structures for types and contexts

As promised, given change structures for base types, we can provide change structures for all types.

Plugin Requirement 13.4.1 (Change structures for base types). For each base type ι, we must have a change structure ι̂ defined on base set ⟦ι⟧, based on the basic change structures defined earlier.

Definition 13.4.2 (Change structure for types). For each type τ, we define a change structure τ̂ on base set ⟦τ⟧:

ι̂ = (the plugin-provided structure, Plugin Requirement 13.4.1)
(σ → τ)̂ = σ̂ → τ̂

Lemma 13.4.3. Change sets and validity as defined in Definition 13.4.2 give rise to the same basic change structures as the ones defined earlier in Definition 12.1.15, and to the change operations described in Fig. 13.1a.

Proof. This can be verified by induction on types. For each case, it is sufficient to compare definitions.

Lemma 13.4.4 (⊕ agrees with validity). If dv ⊲ v1 ↪ v2 : τ, then v1 ⊕ dv = v2.

Proof. Because τ̂ is a change structure, and in change structures ⊕ agrees with validity.

As proved already in Sec. 12.2, since ⊕ agrees with validity (Lemma 13.4.4) and D⟦–⟧ is correct (Theorem 12.2.2), we get Corollary 13.4.5:

Corollary 13.4.5 (D⟦–⟧ is correct, corollary). If Γ ⊢ t : τ and dρ ⊲ ρ1 ↪ ρ2 : Γ, then ⟦t⟧ ρ1 ⊕ ⟦D⟦t⟧⟧ dρ = ⟦t⟧ ρ2.


We can also define a change structure for environments. Recall that change structures for products define their operations to act on each component. Each change structure operation for environments acts "variable-wise". Recall that a typing context Γ is a list of variable assignments x : τ. For each such entry, environments ρ and environment changes dρ contain a base entry x = v, where v : ⟦τ⟧, and possibly a change dx = dv, where dv : ⟦∆τ⟧.

Definition 13.4.6 (Change structure for environments). To each context Γ, we associate a change structure Γ̂ that extends the basic change structure from Definition 12.1.25. Operations are defined as shown in Fig. 13.1b.

Base values v′ in environment changes are redundant with base values v in base environments, because for valid changes v = v′. So, when consuming an environment change, we choose arbitrarily to use v instead of v′. Alternatively, we could also use v′ and get the same results for valid inputs. When producing an environment change, base entries are created so as to ensure validity of the resulting environment change.

Lemma 13.4.7. Definition 13.4.6 defines a correct change structure Γ̂ for each context Γ.

Proof. All proofs are by structural induction on contexts. Most details are analogous to the ones for products and add nothing new, so we refer to our mechanization for most proofs.

However, we show by induction that ⊕ agrees with validity. For the empty context, there's a single environment ∅ : ⟦ε⟧, so ⊕ returns the correct environment. For the inductive case Γ′, x : τ, inversion on the validity judgment reduces our hypothesis to dv ⊲ v1 ↪ v2 : τ and dρ ⊲ ρ1 ↪ ρ2 : Γ′, and our thesis to (ρ1, x = v1) ⊕ (dρ, x = v1, dx = dv) = (ρ2, x = v2), where v1 appears both in the base environment and in the environment change. The thesis follows because ⊕ agrees with validity on both Γ′ and τ.

We summarize definitions on types in Fig. 13.1. Finally, we can lift change operators from the semantic level to the syntactic level, so that their meaning is coherent.

Definition 13.4.8 (Term-level change operators). We define type-indexed families of change operators at the term level, with the following signatures:

⊕τ : τ → ∆τ → τ
⊖τ : τ → τ → ∆τ
0– : τ → ∆τ
⊚τ : ∆τ → ∆τ → ∆τ

and definitions:

tf1 ⊕σ→τ dtf = λx → tf1 x ⊕ dtf x 0x
tf2 ⊖σ→τ tf1 = λx dx → tf2 (x ⊕ dx) ⊖ tf1 x
0σ→τ tf = tf ⊖σ→τ tf
dtf1 ⊚σ→τ dtf2 = λx dx → dtf1 x 0x ⊚ dtf2 x dx
tf1 ⊕ι dtf = … ;  tf2 ⊖ι tf1 = … ;  0ι tf = … ;  dtf1 ⊚ι dtf2 = …  (defined by plugins)


Lemma 13.4.9 (Term-level change operators agree with change structures). The following equations hold for all types τ, contexts Γ, well-typed terms Γ ⊢ t1, t2 : τ and ∆Γ ⊢ dt, dt1, dt2 : ∆τ, and environments ρ : ⟦Γ⟧, dρ : ⟦∆Γ⟧, such that all expressions are defined:

⟦t1 ⊕τ dt⟧ dρ = ⟦t1⟧ dρ ⊕ ⟦dt⟧ dρ
⟦t2 ⊖τ t1⟧ ρ = ⟦t2⟧ ρ ⊖ ⟦t1⟧ ρ
⟦0τ t⟧ ρ = 0(⟦t⟧ ρ)
⟦dt1 ⊚τ dt2⟧ dρ = ⟦dt1⟧ dρ ⊚ ⟦dt2⟧ dρ

Proof. By induction on types, simplifying both sides of the equalities. The proofs for ⊕ and ⊖ must be done by simultaneous induction.

At the time of writing, we have not mechanized the proof for ⊚.

To define the lifting and prove it coherent on base types, we must add a further plugin requirement:

Plugin Requirement 13.4.10 (Term-level change operators for base types). For each base type ι, the plugin defines change operators as required by Definition 13.4.8, satisfying the requirements of Lemma 13.4.9.

13.5 Development history

The proof presented in this and the previous chapter is a significant evolution of the original one by Cai et al. [2014]. While this formalization and the mechanization are both original to this thesis, some ideas were suggested by other (currently unpublished) developments by Yufei Cai and by Yann Régis-Gianas. Yufei Cai gives a simpler pen-and-paper set-theoretic proof by separating validity, while we noticed that separating validity works equally well in a mechanized type theory and simplifies the mechanization. The first to use a two-sided validity relation was Yann Régis-Gianas, but using a big-step operational semantics, while we were collaborating on an ILC correctness proof for untyped λ-calculus (as in Appendix C). I gave the first complete and mechanized ILC correctness proof using two-sided validity, again for a simply-typed λ-calculus with a denotational semantics. Based on two-sided validity, I also reconstructed the rest of the theory of changes.

13.6 Chapter conclusion

In this chapter we have seen how to define change operators, both on semantic values and on terms, and what their guarantees are, via the notion of change structures. We have also defined change structures for groups, function spaces and products. Finally, we have explained and shown Corollary 13.4.5. We continue in the next chapter by discussing how to reason syntactically about changes, before finally showing how to define some language plugins.


⊕τ : ⟦τ⟧ → ⟦∆τ⟧ → ⟦τ⟧
⊖τ : ⟦τ⟧ → ⟦τ⟧ → ⟦∆τ⟧
0– : ⟦τ⟧ → ⟦∆τ⟧
⊚τ : ⟦∆τ⟧ → ⟦∆τ⟧ → ⟦∆τ⟧

f1 ⊕σ→τ df = λv → f1 v ⊕ df v 0v
v1 ⊕ι dv = …
f2 ⊖σ→τ f1 = λv dv → f2 (v ⊕ dv) ⊖ f1 v
v2 ⊖ι v1 = …
0v = v ⊖τ v
df1 ⊚σ→τ df2 = λv dv → df1 v 0v ⊚ df2 v dv
dv1 ⊚ι dv2 = …

(a) Change structure operations on types (see Definition 13.4.2).

⊕Γ : ⟦Γ⟧ → ⟦∆Γ⟧ → ⟦Γ⟧
⊖Γ : ⟦Γ⟧ → ⟦Γ⟧ → ⟦∆Γ⟧
0– : ⟦Γ⟧ → ⟦∆Γ⟧
⊚Γ : ⟦∆Γ⟧ → ⟦∆Γ⟧ → ⟦∆Γ⟧

∅ ⊕ ∅ = ∅
(ρ, x = v) ⊕ (dρ, x = v′, dx = dv) = (ρ ⊕ dρ, x = v ⊕ dv)
∅ ⊖ ∅ = ∅
(ρ2, x = v2) ⊖ (ρ1, x = v1) = (ρ2 ⊖ ρ1, x = v1, dx = v2 ⊖ v1)
0∅ = ∅
0(ρ, x=v) = (0ρ, x = v, dx = 0v)
∅ ⊚ ∅ = ∅
(dρ1, x = v1, dx = dv1) ⊚ (dρ2, x = v2, dx = dv2) = (dρ1 ⊚ dρ2, x = v1, dx = dv1 ⊚ dv2)

(b) Change structure operations on environments (see Definition 13.4.6).

Lemma 13.4.4 (⊕ agrees with validity). If dv ⊲ v1 ↪ v2 : τ, then v1 ⊕ dv = v2.

Corollary 13.4.5 (D⟦–⟧ is correct, corollary). If Γ ⊢ t : τ and dρ ⊲ ρ1 ↪ ρ2 : Γ, then ⟦t⟧ ρ1 ⊕ ⟦D⟦t⟧⟧ dρ = ⟦t⟧ ρ2.

Figure 13.1: Defining change structures.

Chapter 14

Equational reasoning on changes

In this chapter we formalize equational reasoning directly on terms rather than on semantic values(Sec 141) and we discuss when two changes can be considered equivalent (Sec 142) We also showas an example a simple change structure on lists and a derivative of map for it (Example 1411)

To reason on terms, instead of describing the updated value of a term t by using an updated environment ρ2, we substitute in t each variable xi with expression xi ⊕ dxi, to produce a term that computes the updated value of t, so that we can say that dx is a change from x to x ⊕ dx, or that df x dx is a change from f x to (f ⊕ df) (x ⊕ dx). Much of the work in this chapter is needed to modify definitions and go from using an updated environment to using substitution in this fashion.

To compare terms that use changes for equivalence, we can't use denotational equivalence, but must restrict attention to valid changes.

Comparing changes is trickier: most often we are not interested in whether two changes produce the same value, but in whether two changes have the same source and destination. Hence, if two changes share source and destination, we say they are equivalent. As we show in this chapter, operations that preserve validity also respect change equivalence, because for all those operations the source and destination of any output change only depend on the source and destination of the input changes. Between the same source and destination there are often multiple changes, and the difference among them can affect how long a derivative takes, but not whether the result is correct.

We also show that change equivalence is a particular form of logical relation, a logical partial equivalence relation (PER). PERs are well-known to semanticists, and are often used to study sets with invalid elements, together with the appropriate equivalence on these sets.

The rest of the chapter expands on the details, even if they are not especially surprising.

14.1 Reasoning on changes syntactically

To define derivatives of primitives, we will often discuss validity of changes directly on programs, for instance saying that dx is a valid change from x to x ⊕ dx, or that f x ⊕ df x dx is equivalent to f (x ⊕ dx) if all changes in scope are valid.

In this section we formalize these notions. We have not mechanized proofs involving substitutions, but we include them here, even though they are not especially interesting.

But first, we exemplify these notions informally.

Example 14.1.1 (Deriving map on lists)
Let's consider again the example from Sec. 11.3.3, in particular dmap. Recall that a list change dxs is valid for source xs if they have the same length and each element change is valid for its



corresponding element.

map :: (a → b) → List a → List b
map f Nil = Nil
map f (Cons x xs) = Cons (f x) (map f xs)

dmap :: (a → b) → ∆(a → b) → List a → ∆List a → ∆List b
-- A valid list change has the same length as the base list
dmap f df Nil Nil = Nil
dmap f df (Cons x xs) (Cons dx dxs) =
  Cons (df x dx) (dmap f df xs dxs)
-- Remaining cases deal with invalid changes, and a dummy
-- result is sufficient
dmap f df xs dxs = Nil

In our example, one can show that dmap is a correct derivative for map. As a consequence, terms map (f ⊕ df) (xs ⊕ dxs) and map f xs ⊕ dmap f df xs dxs are interchangeable in all valid contexts, that is, contexts that bind df and dxs to valid changes, respectively, for f and xs.

We sketch an informal proof directly on terms.

Proof sketch. We must show that dy = dmap f df xs dxs is a change from initial output y1 = map f xs to updated output y2 = map (f ⊕ df) (xs ⊕ dxs), for valid inputs df and dxs.

We proceed by structural induction on xs and dxs (technically, on a proof of validity of dxs). Since dxs is valid, those two lists have to be of the same length. If xs and dxs are both empty, y1 = dy = y2 = Nil, so dy is a valid change, as required.

For the inductive step, both lists are Cons nodes, so we need to show that output change

dy = dmap f df (Cons x xs) (Cons dx dxs)

is a valid change from

y1 = map f (Cons x xs)

to

y2 = map (f ⊕ df) (Cons (x ⊕ dx) (xs ⊕ dxs))

To restate validity, we name the heads and tails of dy, y1, y2: if we write dy = Cons dh dt, y1 = Cons h1 t1 and y2 = Cons h2 t2, we need to show that dh is a change from h1 to h2 and that dt is a change from t1 to t2.

Indeed, head change dh = df x dx is a valid change from h1 = f x to h2 = (f ⊕ df) (x ⊕ dx). And tail change dt = dmap f df xs dxs is a valid change from t1 = map f xs to t2 = map (f ⊕ df) (xs ⊕ dxs), by induction. Hence dy is a valid change from y1 to y2.

Hopefully this proof is already convincing, but it relies on undefined concepts. On metalevel functions we could already make this proof formal, but not yet on terms. In this section, we define the required concepts.
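Before that, the claim can at least be spot-checked executably. Below is a minimal Haskell sketch of the list change structure used above, specialized to integer lists with additive element changes; all names (oplus, DInt) are illustrative and not part of the formal development:

type DInt = Int                     -- an integer change is an additive difference

oplus :: Int -> DInt -> Int         -- x ⊕ dx on integers
oplus = (+)

dmap :: (Int -> Int) -> (Int -> DInt -> DInt)
     -> [Int] -> [DInt] -> [DInt]
dmap _ _  []       []         = []
dmap f df (x : xs) (dx : dxs) = df x dx : dmap f df xs dxs
dmap _ _  _        _          = []  -- invalid changes: dummy result

main :: IO ()
main = do
  let xs  = [1, 2, 3]
      dxs = [10, 0, -1]             -- valid: same length as xs
      f x     = 2 * x
      df x dx = 2 * dx              -- a derivative of f (a nil function change)
  -- both lines print [22,4,4]:
  print (zipWith oplus (map f xs) (dmap f df xs dxs))  -- map f xs ⊕ dmap f df xs dxs
  print (map f (zipWith oplus xs dxs))                 -- map (f ⊕ df) (xs ⊕ dxs); f ⊕ df = f here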

14.1.1 Denotational equivalence for valid changes

This example uses the notion of denotational equivalence for valid changes. We now proceed to formalize it. For reference, we recall denotational equivalence of terms, and then introduce its restriction.


Definition A.2.5 (Denotational equivalence)
We say that two terms Γ ⊢ t1 : τ and Γ ⊢ t2 : τ are denotationally equal, and write Γ ⊨ t1 ≃ t2 : τ (or sometimes t1 ≃ t2), if for all environments ρ : ⟦Γ⟧ we have that ⟦t1⟧ ρ = ⟦t2⟧ ρ.

For open terms t1, t2 that depend on changes, denotational equivalence is too restrictive, since it requires t1 and t2 to also be equal when the changes they depend on are not valid. By restricting denotational equivalence to valid environment changes, and requiring terms to depend on change contexts, we obtain the following definition.

Definition 14.1.2 (Denotational equivalence for valid changes)
For any context Γ and type τ, we say that two terms ∆Γ ⊢ t1 : τ and ∆Γ ⊢ t2 : τ are denotationally equal for valid changes, and write ∆Γ ⊨ t1 ≃∆ t2 : τ, if for all valid environment changes dρ ▷ ρ1 ↪ ρ2 : Γ we have that t1 and t2 evaluate in environment dρ to the same value, that is, ⟦t1⟧ dρ = ⟦t2⟧ dρ.

Example 14.1.3
Terms f x ⊕ df x dx and f (x ⊕ dx) are denotationally equal for valid changes (for any types σ, τ): ∆(f : σ → τ, x : σ) ⊨ f x ⊕ df x dx ≃∆ f (x ⊕ dx) : τ.

Example 14.1.4
One of our claims in Example 14.1.1 can now be written as

∆Γ ⊨ map (f ⊕ df) (xs ⊕ dxs) ≃∆ map f xs ⊕ dmap f df xs dxs : List τ

for a suitable context Γ = f : σ → τ, xs : List σ, map : (σ → τ) → List σ → List τ (and for any types σ, τ).

Arguably, the need for a special equivalence is a defect in our semantics of change programs; it might be preferable to make the type of changes abstract throughout the program (except for derivatives of primitives, which must inspect changes), but this is not immediate, especially in a module system like Haskell's. Other possible alternatives are discussed in Sec. 15.4.

14.1.2 Syntactic validity

Next, we define syntactic validity, that is, when a change term dt is a (valid) change from source term t1 to destination t2. Intuitively, dt is valid from t1 to t2 if dt, t1 and t2, evaluated against the same valid environment change dρ, produce respectively a valid change, its source and its destination. Formally:

Definition 14.1.5 (Syntactic validity)
We say that term ∆Γ ⊢ dt : ∆τ is a (syntactic) change from ∆Γ ⊢ t1 : τ to ∆Γ ⊢ t2 : τ, and write Γ ⊨ dt ▷ t1 ↪ t2 : τ, if

∀dρ ▷ ρ1 ↪ ρ2 : Γ. ⟦dt⟧ dρ ▷ ⟦t1⟧ dρ ↪ ⟦t2⟧ dρ : τ

Notation 14.1.6
We often simply say that dt is a change from t1 to t2, leaving everything else implicit when not important.

Using syntactic validity, we can show, for instance, that dx is a change from x to x ⊕ dx, and that df x dx is a change from f x to (f ⊕ df) (x ⊕ dx); both examples follow from a general statement about D⟦–⟧ (Theorem 14.1.9). Our earlier informal proof of the correctness of dmap (Example 14.1.1) can also be justified in terms of syntactic validity.

Just like (semantic) ⊕ agrees with validity, term-level (or syntactic) ⊕ agrees with syntactic validity, up to denotational equivalence for valid changes.


Lemma 14.1.7 (Term-level ⊕ agrees with syntactic validity)
If dt is a change from t1 to t2 (Γ ⊨ dt ▷ t1 ↪ t2 : τ), then t1 ⊕ dt and t2 are denotationally equal for valid changes (∆Γ ⊨ t1 ⊕ dt ≃∆ t2 : τ).

Proof. Follows because term-level ⊕ agrees with semantic ⊕ (Lemma 13.4.9), and ⊕ agrees with validity. In more detail: fix an arbitrary valid environment change dρ ▷ ρ1 ↪ ρ2 : Γ. First, we have ⟦dt⟧ dρ ▷ ⟦t1⟧ dρ ↪ ⟦t2⟧ dρ : τ because of syntactic validity. Then we conclude with this calculation:

⟦t1 ⊕ dt⟧ dρ
= { term-level ⊕ agrees with ⊕ (Lemma 13.4.9) }
⟦t1⟧ dρ ⊕ ⟦dt⟧ dρ
= { ⊕ agrees with validity }
⟦t2⟧ dρ

Beware that the definition of Γ ⊨ dt ▷ t1 ↪ t2 : τ evaluates all terms against change environment dρ, containing separately base values and changes. In contrast, if we use validity in the change structure for ⟦Γ⟧ → ⟦τ⟧, we evaluate different terms against different environments. That is why we have that dx is a change from x to x ⊕ dx (where x ⊕ dx is evaluated in environment dρ), while ⟦dx⟧ is a valid change from ⟦x⟧ to ⟦x⟧ (where the destination ⟦x⟧ gets applied to environment ρ2).

Is syntactic validity trivial? Without needing syntactic validity, based on earlier chapters one can show that dv is a valid change from v to v ⊕ dv, or that df v dv is a valid change from f v to (f ⊕ df) (v ⊕ dv), or further examples. But that's all about values. In this section, we are just translating these notions to the level of terms and formalizing them.

Our semantics is arguably (intuitively) a trivial one, similar to a metainterpreter interpreting object-language functions in terms of metalanguage functions: our semantics simply embeds an object-level λ-calculus into a meta-level and more expressive λ-calculus, mapping for instance λf x → f x (an AST) into λf v → f v (syntax for a metalevel function). Hence, proofs in this section about syntactic validity deal mostly with this translation. We don't expect the proofs to give special insights, and we expect most developments would keep such issues informal (which is certainly reasonable).

Nevertheless, we write out the statements, to help readers refer to them, and write out (mostly) full proofs, to help ourselves (and readers) verify them. Proofs are mostly standard, but with a few twists, since we must often consider and relate three computations: the computation on initial values, and the ones on the change and on updated values.

We're also paying a proof debt. Had we used substitution and small-step semantics, we'd have directly simple statements on terms, instead of trickier ones involving terms and environments. We produce those statements now.

Differentiation and syntactic validity

We can also show that D⟦t⟧ produces a syntactically valid change from t, but we need to specify its destination: in general, D⟦t⟧ is not a change from t to t. The destination must evaluate to the updated value of t; to produce a term that evaluates to the right value, we use substitution. If the only free variable in t is x, then D⟦t⟧ is a syntactic change from t to t [x = x ⊕ dx]. To repeat the same for all variables in context Γ, we use the following notation.


Notation 14.1.8
We write t [Γ = Γ ⊕ ∆Γ] to mean t [x1 = x1 ⊕ dx1, x2 = x2 ⊕ dx2, …, xn = xn ⊕ dxn].

Theorem 14.1.9 (D⟦–⟧ is correct, syntactically)
For any well-typed term Γ ⊢ t : τ, term D⟦t⟧ is a syntactic change from t to t [Γ = Γ ⊕ ∆Γ].

We present the following straightforward (if tedious) proof (formal but not mechanized)

Proof. Let t2 = t [Γ = Γ ⊕ ∆Γ]. Take any dρ ▷ ρ1 ↪ ρ2 : Γ. We must show that ⟦D⟦t⟧⟧ dρ ▷ ⟦t⟧ dρ ↪ ⟦t2⟧ dρ : τ.

Because dρ extends ρ1 and t only needs entries in ρ1, we can show that ⟦t⟧ dρ = ⟦t⟧ ρ1, so our thesis becomes ⟦D⟦t⟧⟧ dρ ▷ ⟦t⟧ ρ1 ↪ ⟦t2⟧ dρ : τ.

Because D⟦–⟧ is correct (Theorem 12.2.2), we know that ⟦D⟦t⟧⟧ dρ ▷ ⟦t⟧ ρ1 ↪ ⟦t⟧ ρ2 : τ; that's almost our thesis, so we must only show that ⟦t2⟧ dρ = ⟦t⟧ ρ2. Since ⊕ agrees with validity and dρ is valid, we have that ρ2 = ρ1 ⊕ dρ, so our thesis is now the following equation, which we leave to Lemma 14.1.10:

⟦t [Γ = Γ ⊕ ∆Γ]⟧ dρ = ⟦t⟧ (ρ1 ⊕ dρ)

Here's the technical lemma to complete the proof.

Lemma 14.1.10
For any Γ ⊢ t : τ and dρ ▷ ρ1 ↪ ρ2 : Γ, we have

⟦t [Γ = Γ ⊕ ∆Γ]⟧ dρ = ⟦t⟧ (ρ1 ⊕ dρ)

Proof. This follows from the structure of valid environment changes, because term-level ⊕ (used on the left-hand side) agrees with value-level ⊕ (used on the right-hand side) by Lemma 13.4.9, and because of the substitution lemma.

More formally, we can show the thesis by induction over environments: for empty environments, the equation reduces to ⟦t⟧ = ⟦t⟧. For the case of Γ, x : σ (where x ∉ Γ), the thesis can be rewritten as

⟦(t [Γ = Γ ⊕ ∆Γ]) [x = x ⊕ dx]⟧ (dρ; x = v1; dx = dv) = ⟦t⟧ (ρ1 ⊕ dρ; x = v1 ⊕ dv)

We prove it via the following calculation:

⟦(t [Γ = Γ ⊕ ∆Γ]) [x = x ⊕ dx]⟧ (dρ; x = v1; dx = dv)
= { substitution lemma on x }
⟦t [Γ = Γ ⊕ ∆Γ]⟧ (dρ; x = (⟦x ⊕ dx⟧ (dρ; x = v1; dx = dv)); dx = dv)
= { term-level ⊕ agrees with ⊕ (Lemma 13.4.9); then simplify ⟦x⟧ and ⟦dx⟧ }
⟦t [Γ = Γ ⊕ ∆Γ]⟧ (dρ; x = v1 ⊕ dv; dx = dv)
= { t [Γ = Γ ⊕ ∆Γ] does not mention dx, so we can modify the environment entry for dx; this makes the environment into a valid environment change }
⟦t [Γ = Γ ⊕ ∆Γ]⟧ (dρ; x = v1 ⊕ dv; dx = 0(v1 ⊕ dv))
= { apply the induction hypothesis and simplify 0 away }
⟦t⟧ (ρ1 ⊕ dρ; x = v1 ⊕ dv)


14.2 Change equivalence

To optimize programs that manipulate changes, we often want to replace a change-producing term by another one, while preserving the overall program meaning. Hence, we define an equivalence on valid changes that is preserved by change operations, that is (in spirit) a congruence. We call this relation (change) equivalence, and refrain from using other equivalences on changes.

Earlier (say, in Sec. 15.3) we have sometimes required that changes be equal, but that's often too restrictive.

Change equivalence is defined in terms of validity, to ensure that validity-preserving operations preserve change equivalence: if two changes dv1 and dv2 are equivalent, one can be substituted for the other in a validity-preserving context. We can define this once for all change structures, hence in particular for values and environments.

Definition 14.2.1 (Change equivalence)
Given a change structure V, changes dva, dvb : ∆V are equivalent relative to source v1 : V (written dva =∆ dvb ▷ v1 ↪ v2 : V) if and only if there exists v2 such that both dva and dvb are valid from v1 to v2 (that is, dva ▷ v1 ↪ v2 : V and dvb ▷ v1 ↪ v2 : V).

Notation 14.2.2
When not ambiguous, we often abbreviate dva =∆ dvb ▷ v1 ↪ v2 : V as dva =∆^v1 dvb, or dva =∆ dvb.

Two changes are often equivalent relative to one source but not to others. Hence, dva =∆ dvb is always an abbreviation for change equivalence relative to a specific source.

Example 14.2.3
For instance, we later use a change structure for integers using both replacement changes and differences (Example 12.1.6). In this structure, change 0 is a nil change for all numbers, while change !5 ("bang 5") replaces any number with 5. Hence, changes 0 and !5 are equivalent only relative to source 5, and we write 0 =∆^5 !5.
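To make this concrete, here is a small Haskell sketch of such a change structure, with Diff and Bang as illustrative constructor names for differences and replacement changes:

data DInt = Diff Int | Bang Int     -- Bang u is the replacement change !u

oplus :: Int -> DInt -> Int
oplus v (Diff dv) = v + dv
oplus _ (Bang u)  = u

-- equivalent relative to source v iff they lead from v to the same destination
equivAt :: Int -> DInt -> DInt -> Bool
equivAt v dva dvb = oplus v dva == oplus v dvb

main :: IO ()
main = do
  print (equivAt 5 (Diff 0) (Bang 5))   -- True: both lead from 5 to 5
  print (equivAt 6 (Diff 0) (Bang 5))   -- False: destinations are 6 and 5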

By applying definitions, one can verify that change equivalence relative to a source v is a symmetric and transitive relation on ∆V. However, it is not an equivalence relation on ∆V, because it is only reflexive on changes valid for source v. Using the set-theoretic concept of subset, we can then state the following lemma (whose proof we omit as it is brief).

Lemma 14.2.4 (=∆ is an equivalence on valid changes)
For each set V and source v ∈ V, change equivalence relative to source v is an equivalence relation over the set of changes

{dv ∈ ∆V | dv is valid with source v}

We elaborate on this peculiar sort of equivalence in Sec. 14.2.3.

14.2.1 Preserving change equivalence

Change equivalence relative to a source v is respected, in an appropriate sense, by all validity-preserving expression contexts that accept changes with source v. To explain what this means, we study an example lemma: we show that, because valid function changes preserve validity, they also respect change equivalence. At first, we use "(expression) context" informally, to refer to expression contexts in the metalanguage. Later, we'll extend our discussion to actual expression contexts in the object language.

Lemma 14.2.5 (Valid function changes respect change equivalence)
Any valid function change

df ▷ f1 ↪ f2 : A → B


respects change equivalence: if dva =∆ dvb ▷ v1 ↪ v2 : A, then df v1 dva =∆ df v1 dvb ▷ f1 v1 ↪ f2 v2 : B. We also say that (expression) context df v1 – respects change equivalence.

Proof. The thesis means that df v1 dva ▷ f1 v1 ↪ f2 v2 : B and df v1 dvb ▷ f1 v1 ↪ f2 v2 : B. Both judgments follow in one step from validity of df, dva and dvb.

This lemma holds because the source and destination of df v1 dv don't depend on dv, only on its source and destination. Source and destination are shared by equivalent changes. Hence, validity-preserving functions map equivalent changes to equivalent changes.

In general, all operations that preserve validity also respect change equivalence, because for all those operations, the source and destination of any output changes, and the resulting value, only depend on the source and destination of the input changes.

However, Lemma 14.2.5 does not mean that df v1 dva = df v1 dvb, because there can be multiple changes with the same source and destination. For instance, say that dva is a list change that removes an element and re-adds it, and dvb is a list change that describes no modification. They are both nil changes, but a function change might handle them differently.

Moreover, we only proved that context df v1 – respects change equivalence relative to source v1. If value v3 differs from v1, df v3 dva and df v3 dvb need not be equivalent. Hence, we say that context df v1 accepts changes with source v1. More in general, a context accepts changes with source v1 if it preserves validity for changes with source v1; we can say, informally, that all such contexts also respect change equivalence.

Another example: context v1 ⊕ – also accepts changes with source v1. Since this context produces a base value and not a change, it maps equivalent changes to equal results.

Lemma 14.2.6 (⊕ respects change equivalence)
If dva =∆ dvb ▷ v1 ↪ v2 : V, then v1 ⊕ – respects the equivalence between dva and dvb, that is, v1 ⊕ dva = v1 ⊕ dvb.

Proof. v1 ⊕ dva = v2 = v1 ⊕ dvb.

There are more contexts that preserve equivalence. As discussed, function changes respect change equivalence, and D⟦–⟧ produces function changes, so D⟦t⟧ preserves equivalence on its environment and on any of its free variables.

Lemma 14.2.7 (D⟦–⟧ preserves change equivalence)
For any term Γ ⊢ t : τ, D⟦t⟧ preserves change equivalence of environments: for all dρa =∆ dρb ▷ ρ1 ↪ ρ2 : Γ, we have ⟦D⟦t⟧⟧ dρa =∆ ⟦D⟦t⟧⟧ dρb ▷ ⟦t⟧ ρ1 ↪ ⟦t⟧ ρ2 : τ.

Proof. To verify this, just apply correctness of differentiation to both changes dρa and dρb.

To show more formally in what sense change equivalence is a congruence, we first lift change equivalence to terms (Definition 14.2.9), similarly to syntactic change validity in Sec. 14.1.2. To do so, we first need a notation for one-sided or source-only validity.

Notation 14.2.8 (One-sided validity)
We write dv ▷ v1 : V to mean there exists v2 such that dv ▷ v1 ↪ v2 : V. We reuse existing conventions, and write dv ▷ v1 : τ instead of dv ▷ v1 : ⟦τ⟧, and dρ ▷ ρ1 : Γ instead of dρ ▷ ρ1 : ⟦Γ⟧.

Definition 14.2.9 (Syntactic change equivalence)
Two change terms ∆Γ ⊢ dta : ∆τ and ∆Γ ⊢ dtb : ∆τ are change equivalent, relative to source Γ ⊢ t : τ, if for all valid environment changes dρ ▷ ρ1 ↪ ρ2 : Γ we have that

⟦dta⟧ dρ =∆ ⟦dtb⟧ dρ ▷ ⟦t⟧ ρ1 : τ

We then write Γ ⊨ dta =∆ dtb ▷ t : τ, or dta =∆^t dtb, or simply dta =∆ dtb.


Saying that dta and dtb are equivalent relative to t does not specify the destination of dta and dtb, only their source. The only reason is to simplify the statement and proof of Theorem 14.2.10.

If two change terms are change equivalent with respect to the right source, we can replace one with the other in an expression context to optimize a program, as long as the expression context is validity-preserving and accepts the change.

In particular, substituting into D⟦t⟧ preserves syntactic change equivalence, according to the following theorem (for which we have only a pen-and-paper formal proof).

Theorem 14.2.10 (D⟦–⟧ preserves syntactic change equivalence)
For any equivalent changes Γ ⊨ dsa =∆ dsb ▷ s : σ, and for any term t typed as Γ, x : σ ⊢ t : τ, we can produce equivalent results by substituting into D⟦t⟧ either s and dsa, or s and dsb:

Γ ⊨ D⟦t⟧ [x = s, dx = dsa] =∆ D⟦t⟧ [x = s, dx = dsb] ▷ t [x = s] : τ

Proof sketch. The thesis holds because D⟦–⟧ preserves change equivalence (Lemma 14.2.7). A formal proof follows through routine (and tedious) manipulations of bindings. In essence, we can extend a change environment dρ for context Γ to equivalent environment changes for context Γ, x : σ, with the values of dsa and dsb. The tedious calculations follow.

Proof. Assume dρ ▷ ρ1 ↪ ρ2 : Γ. Because dsa and dsb are change-equivalent, we have

⟦dsa⟧ dρ =∆ ⟦dsb⟧ dρ ▷ ⟦s⟧ ρ1 : σ

Moreover, ⟦s⟧ ρ1 = ⟦s⟧ dρ, because dρ extends ρ1. We'll use this equality without explicit mention.

Hence, we can construct change-equivalent environments for evaluating D⟦t⟧, by combining dρ and the values of, respectively, dsa and dsb:

(dρ; x = ⟦s⟧ dρ; dx = ⟦dsa⟧ dρ) =∆
(dρ; x = ⟦s⟧ dρ; dx = ⟦dsb⟧ dρ)
▷ (ρ1; x = ⟦s⟧ ρ1) : (Γ, x : σ)    (14.1)

This environment change equivalence is respected by D⟦t⟧, hence:

⟦D⟦t⟧⟧ (dρ; x = ⟦s⟧ dρ; dx = ⟦dsa⟧ dρ) =∆
⟦D⟦t⟧⟧ (dρ; x = ⟦s⟧ dρ; dx = ⟦dsb⟧ dρ)
▷ ⟦t⟧ (ρ1; x = ⟦s⟧ ρ1) : τ    (14.2)

We want to deduce the thesis by applying to this statement the substitution lemma for denotational semantics: ⟦t⟧ (ρ; x = ⟦s⟧ ρ) = ⟦t [x = s]⟧ ρ.

To apply the substitution lemma to the substitution of dx, we must adjust Eq. (14.2) using soundness of weakening. We get:

⟦D⟦t⟧⟧ (dρ; x = ⟦s⟧ dρ; dx = ⟦dsa⟧ (dρ; x = ⟦s⟧ dρ)) =∆
⟦D⟦t⟧⟧ (dρ; x = ⟦s⟧ dρ; dx = ⟦dsb⟧ (dρ; x = ⟦s⟧ dρ))
▷ ⟦t⟧ (ρ1; x = ⟦s⟧ ρ1) : τ    (14.3)


This equation can now be rewritten (by applying the substitution lemma to the substitutions of dx and x) to the following one:

⟦D⟦t⟧ [dx = dsa] [x = s]⟧ dρ =∆
⟦D⟦t⟧ [dx = dsb] [x = s]⟧ dρ
▷ ⟦t [x = s]⟧ ρ1 : τ    (14.4)

Since x is not in scope in s, dsa or dsb, we can permute substitutions to conclude that

Γ ⊨ D⟦t⟧ [x = s, dx = dsa] =∆ D⟦t⟧ [x = s, dx = dsb] ▷ t [x = s] : τ

as required.

In this theorem, if x appears once in t, then dx appears once in D⟦t⟧ (this follows by induction on t); hence, D⟦t⟧ [x = s, dx = –] produces a one-hole expression context.

Further validity-preserving contexts. There are further operations that preserve validity. To represent terms with "holes" where other terms can be inserted, we can define one-level contexts F, and contexts E, as is commonly done:

F ::= [ ] t dt | ds t [ ] | λx dx → [ ] | t ⊕ [ ] | dt1 ⊚ [ ] | [ ] ⊚ dt2
E ::= [ ] | F [E]

If dt1 =∆ dt2 ▷ t1 ↪ t2 : τ and our context E accepts changes from t1, then F [dt1] and F [dt2] are change equivalent. It is easy to prove such a lemma for each possible shape of one-level context F, both on values (like Lemmas 14.2.5 and 14.2.6) and on terms. We have been unable to state a more general theorem, because it's not clear how to formalize the notion of a context accepting a change: in general, the syntax of a context does not always hint at the validity proofs embedded in it.

Cai et al. [2014] solve this problem for metalevel contexts by typing them with dependent types, but, as discussed, the overall proof is more awkward. Alternatively, it appears that the use of dependent types in Chapter 18 also ensures that change equivalence is a congruence (though, at present, this is still a conjecture), without overly complicating correctness proofs. However, it is not clear whether such a type system can be expressive enough without requiring additional coercions. Consider a change dv1 from v1 to v1 ⊕ dv1, a value v2 which is known to be (propositionally) equal to v1 ⊕ dv1, and a change dv2 from v2 to v3. Then, term dv1 ⊚ dv2 is not type-correct (for instance, in Agda): the typechecker will complain that dv1 has destination v1 ⊕ dv1, while dv2 has source v2. When working in Agda, to solve this problem we can explicitly coerce terms through propositional equalities, and we can use Agda to prove such equalities in the first place. We leave the design of a sufficiently expressive object language where change equivalence is a congruence for future work.

14.2.2 Sketching an alternative syntax

If we exclude composition, we can sketch an alternative syntax, which helps construct a congruence on changes. The idea is to manipulate, instead of changes alone, pairs of sources v ∈ V and valid changes {dv | dv ▷ v : V}:

t ::= src dt | dst dt | x | t t | λx → t
dt ::= dt dt | src dt | λdx → dt | dx

Adapting differentiation to this syntax is easy:


D⟦x⟧ = dx
D⟦s t⟧ = D⟦s⟧ D⟦t⟧
D⟦λx → t⟧ = λdx → D⟦t⟧

Derivatives of primitives only need to use dst dt instead of t ⊕ dt, and src dt instead of t, whenever dt is a change for t.

With this syntax, we can define change expression contexts, which can be filled in by change expressions:

E ::= src dE | dst dE | E t | t E | λx → E
dE ::= dt dE | dE dt | λdx → dE | [ ]

We conjecture that change equivalence is a congruence with respect to contexts dE, and that contexts E map change-equivalent changes to results that are denotationally equivalent for valid changes. We leave a proof to future work, but we expect it to be straightforward. It also appears straightforward to provide an isomorphism between this syntax and the standard one.

However, extending such contexts with composition does not appear trivial: contexts such as dt ⊚ dE or dE ⊚ dt only respect validity when the changes' sources and destinations align correctly.

We make no further use of this alternative syntax in this work

14.2.3 Change equivalence is a PER

Readers with relevant experience will recognize that change equivalence is a partial equivalence relation (PER) [Mitchell 1996, Ch. 5]. It is standard to use PERs to identify valid elements in a model [Harper 1992]. In this section, we state the connection, showing that change equivalence is not an ad-hoc construction, so that mathematical constructions using PERs can be adapted to use change equivalence.

We recall the definition of a PER.

Definition 14.2.11 (Partial equivalence relation (PER))
A relation R ⊆ S × S is a partial equivalence relation if it is symmetric (if a R b then b R a) and transitive (if a R b and b R c then a R c).

Elements related to one another are also related to themselves: if a R b then a R a (by transitivity from a R b and b R a). So a PER on S identifies a subset of valid elements of S. Since PERs are equivalence relations on that subset, they also induce a (partial) partition of the elements of S into equivalence classes of change-equivalent elements.

Lemma 14.2.12 (=∆ is a PER)
Change equivalence relative to a source a : A is a PER on the set ∆A.

Proof. A restatement of Lemma 14.2.4.

Typically, one studies logical PERs, which are logical relations and PERs at the same time [Mitchell 1996, Ch. 8]. In particular, with a logical PER, two functions are related if they map related inputs to related outputs. This helps showing that a PER is a congruence. Luckily, our PER is equivalent to such a definition.

Lemma 14.2.13 (Alternative definition for =∆)
Change equivalence is equivalent to the following logical relation:

dva =∆ dvb ▷ v1 ↪ v2 : ι  =def  dva ▷ v1 ↪ v2 : ι ∧ dvb ▷ v1 ↪ v2 : ι


dfa =∆ dfb ▷ f1 ↪ f2 : σ → τ  =def
∀dva =∆ dvb ▷ v1 ↪ v2 : σ. dfa v1 dva =∆ dfb v1 dvb ▷ f1 v1 ↪ f2 v2 : τ

Proof. By induction on types.

14.3 Chapter conclusion

In this chapter, we have put forms of reasoning about changes on terms on a more solid foundation, and defined an appropriate equivalence on changes.


Chapter 15

Extensions and theoretical discussion

In this chapter we collect discussion of a few additional topics related to ILC that do not suffice for standalone chapters. We show how to differentiate general recursion (Sec. 15.1); we exhibit a function change that is not valid for any function (Sec. 15.2); we contrast our representation of function changes with pointwise function changes (Sec. 15.3); and we compare our formalization with the one presented in [Cai et al. 2014] (Sec. 15.4).

15.1 General recursion

This section discusses informally how to differentiate terms using general recursion, and what the behavior of the resulting terms is.

15.1.1 Differentiating general recursion

Earlier, we gave a rule for deriving (non-recursive) let:

D⟦let x = t1 in t2⟧ = let x = t1
                          dx = D⟦t1⟧
                      in D⟦t2⟧

It turns out that we can use the same rule also for recursive let-bindings, which we write here (and only here) letrec, for distinction:

D⟦letrec x = t1 in t2⟧ = letrec x = t1
                                dx = D⟦t1⟧
                         in D⟦t2⟧

Example 15.1.1
In Example 14.1.1 we presented a derivative dmap for map. By using the rules for differentiating recursive functions, we obtain dmap, as presented, from map:

dmap f df Nil Nil = Nil
dmap f df (Cons x xs) (Cons dx dxs) =
  Cons (df x dx) (dmap f df xs dxs)



However, derivative dmap is not asymptotically faster than map. Even when we consider less trivial change structures, derivatives of recursive functions produced using this rule are often not asymptotically faster. Deriving letrec x = t1 in t2 can still be useful if D⟦t1⟧ and/or D⟦t2⟧ is faster than its base term, but during our work we focus mostly on using structural recursion. Alternatively, in Chapter 11 and Sec. 11.2 we have shown how to incrementalize functions (including recursive ones) using equational reasoning.

In general, when we invoke dmap on a change dxs from xs1 to xs2, it is important that xs1 and xs2 are similar enough that enough computation can be reused. Say that xs1 = Cons 2 (Cons 3 (Cons 4 Nil)) and xs2 = Cons 1 (Cons 2 (Cons 3 (Cons 4 Nil))): in this case, a change modifying each element of xs1 and then replacing Nil by Cons 4 Nil would be inefficient to process, and naive incrementalization would produce this scenario. In this case, it is clear that a preferable change should simply insert 1 at the beginning of the list, as illustrated in Sec. 11.3.2 (though we have omitted the straightforward definition of dmap for such a change structure). In approaches like self-adjusting computation, this is ensured by using memoization. In our approach, instead, we rely on changes that are nil or small to detect when a derivative can reuse input computation.

The same problem affects naive attempts to incrementalize, for instance, a simple factorial function; we omit details. Because of these issues, we focus on incrementalization of structurally recursive functions, and on incrementalizing generally recursive primitives using equational reasoning. We return to this issue in Chapter 20.

15.1.2 Justification

Here we justify informally the rule for differentiating recursive functions, using fixpoint operators.

Let's consider STLC extended with a family of standard fixpoint combinators fixτ : (τ → τ) → τ, with fix-reduction defined by equation fix f → f (fix f); we search for a definition of D⟦fix f⟧.

Using informal equational reasoning, if a correct definition of D⟦fix f⟧ exists, it must satisfy

D⟦fix f⟧ ≃ fix (D⟦f⟧ (fix f))

We can proceed as follows:

D⟦fix f⟧
= { imposing that D⟦–⟧ respects fix-reduction here }
D⟦f (fix f)⟧
= { using the rule for D⟦–⟧ on application }
D⟦f⟧ (fix f) D⟦fix f⟧

This is a recursive equation in D⟦fix f⟧, so we can try to solve it using fix itself:

D⟦fix f⟧ = fix (λdxf → D⟦f⟧ (fix f) dxf)

Indeed, this rule gives a correct derivative. Formalizing our reasoning using denotational semantics would presumably require the use of domain theory. Instead, we prove correct a variant of fix in Appendix C, but using operational semantics.
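The rule can be transcribed almost literally into Haskell, whose laziness plays the role of fix-reduction; the sketch below is illustrative only (dfix is not part of the formal development), and convergence depends on the strictness of df:

fix :: (a -> a) -> a
fix f = f (fix f)

-- dfix computes the change D⟦fix f⟧ from f and its derivative df
dfix :: (a -> a) -> (a -> da -> da) -> da
dfix f df = fix (\dxf -> df (fix f) dxf)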

15.2 Completely invalid changes

In some change sets, some changes might not be valid relative to any source. In particular, we can construct examples in ∆(Z → Z).


To understand why this is plausible, we recall that, as described in Sec. 15.3, df can be decomposed into a derivative and a pointwise function change that is independent of da. While pointwise changes can be defined arbitrarily, the behavior of the derivative of f on changes is determined by the behavior of f.

Example 15.2.1
We search for a function change df : ∆(Z → Z) such that there exist no f1, f2 : Z → Z for which df ▷ f1 ↪ f2 : Z → Z. To find df, we assume that there are f1, f2 such that df ▷ f1 ↪ f2 : Z → Z, prove a few consequences, and construct a df that cannot satisfy them. Alternatively, we could pick the desired definition for df right away, and prove by contradiction that there exist no f1, f2 such that df ▷ f1 ↪ f2 : Z → Z.

Recall that, on integers, a1 ⊕ da = a1 + da, and that da ▷ a1 ↪ a2 : Z means a2 = a1 ⊕ da = a1 + da. So, for any numbers a1, da, a2 such that a1 + da = a2, validity of df implies that

f2 (a1 + da) = f1 a1 + df a1 da

For any two numbers b1, db such that b1 + db = a1 + da, we have that

f1 a1 + df a1 da = f2 (a1 + da) = f2 (b1 + db) = f1 b1 + df b1 db

Rearranging terms, we have

df a1 da − df b1 db = f1 b1 − f1 a1,

that is, df a1 da − df b1 db does not depend on da and db. For concreteness, let us fix a1 = 0, b1 = 1, and a1 + da = b1 + db = s. We have then that

df 0 s − df 1 (s − 1) = f1 1 − f1 0

Once we set h = f1 1 − f1 0, we have df 0 s − df 1 (s − 1) = h. Because s is just the sum of two arbitrary numbers, while h only depends on f1, this equation must hold for a fixed h and for all integers s.

To sum up, we assumed that, for a given df, there exist f1, f2 such that df ▷ f1 ↪ f2 : Z → Z, and concluded that there exists h = f1 1 − f1 0 such that, for all s,

df 0 s − df 1 (s − 1) = h

At this point, we can try concrete families of functions df to obtain a contradiction. Substituting a linear polynomial df a da = c1 · a + c2 · da fails to obtain a contradiction: in fact, we can construct various f1, f2 such that df ▷ f1 ↪ f2 : Z → Z. So we try quadratic polynomials. Substituting df a da = c · da² succeeds: we have that there is h such that, for all integers s,

c · (s² − (s − 1)²) = h

However, c · (s² − (s − 1)²) = 2 · c · s − c, which isn't constant, so there can be no such h.

15.3 Pointwise function changes

We can also decompose function changes into orthogonal (and possibly easier to understand) concepts.

Consider two functions f1, f2 : A → B and two inputs a1, a2 : A. The difference between f2 a2 and f1 a1 is due to changes to both the function and its argument. We can compute the whole


change at once via a function change df, as df a1 da. Or we can compute separately the effects of the function change and of the argument change. We can account for changes from f1 a1 to f1 a2 using f′1, a derivative of f1: f′1 a1 da = f1 a2 ⊖ f1 a1 = f1 (a1 ⊕ da) ⊖ f1 a1.¹

We can account for changes from f1 to f2 using the pointwise difference of two functions, ∇f = λ(a : A) → f2 a ⊖ f1 a; in particular, f2 (a1 ⊕ da) ⊖ f1 (a1 ⊕ da) = ∇f (a1 ⊕ da). Hence, a function change simply combines a derivative with a pointwise change, using change composition:

df a1 da = f2 a2 ⊖ f1 a1
         = (f1 a2 ⊖ f1 a1) ⊚ (f2 a2 ⊖ f1 a2)
         = f′1 a1 da ⊚ ∇f (a1 ⊕ da)    (15.1)

One can also compute a pointwise change from a function change:

∇f a = df a 0a

While some might find pointwise changes a more natural concept, we find it easier to use our definition of function changes, which combines both pointwise changes and derivatives into a single concept. Some related works explore the use of pointwise changes; we discuss them in Sec. 19.2.2.
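The decomposition can be spot-checked for integers, where changes are additive differences and change composition ⊚ is just addition of differences; the following Haskell sketch (with illustrative names) assembles a function change per Eq. (15.1) and verifies its validity:

type DInt = Int

deriv :: (Int -> Int) -> Int -> DInt -> DInt
deriv f a da = f (a + da) - f a              -- f′ a da

pointwise :: (Int -> Int) -> (Int -> Int) -> Int -> DInt
pointwise f1 f2 a = f2 a - f1 a              -- ∇f a

df :: (Int -> Int) -> (Int -> Int) -> Int -> DInt -> DInt
df f1 f2 a da = deriv f1 a da + pointwise f1 f2 (a + da)   -- Eq. (15.1)

main :: IO ()
main =
  let f1 x = 2 * x; f2 x = x * x
  in print (and [ f1 a + df f1 f2 a da == f2 (a + da)
                | a <- [-3 .. 3], da <- [-3 .. 3] ])       -- True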

15.4 Modeling only valid changes

In this section, we contrast briefly the formalization of ILC in this thesis (for short, ILC'17) with the one we used in our first formalization [Cai et al. 2014] (for short, ILC'14). We keep the discussion somewhat informal; we have sketched proofs of our claims and mechanized some, but we omit all proofs here. We discuss both formalizations using our current notation and terminology, except for concepts that are not present here.

Both formalizations model function changes semantically, but the two models we present are different. Overall, ILC'17 uses simpler machinery and seems easier to extend to more general base languages, and the mechanization of ILC'17 appears simpler and smaller. Instead, ILC'14 studies additional entities, but better-behaved ones.

In ILC'17, input and output domains of function changes contain invalid changes, while in ILC'14 these domains are restricted to valid changes via dependent types. ILC'14 also considers the denotation of D⟦t⟧, whose domains include invalid changes, but such denotations are studied only indirectly. In both cases, function changes must map valid changes to valid changes. But ILC'14 application df v1 dv is only well-typed if dv is a change valid from v1, hence we can simply say that df v1 respects change equivalence. As discussed in Sec. 14.2, in ILC'17 the analogous property has a trickier statement: we can write df v1 and apply it to arbitrary equivalent changes dv1 =∆ dv2, even if their source is not v1, but such change equivalences are not preserved.

We can relate the two models by defining a logical relation called erasure (similar to the one described by Cai et al.): an ILC'14 function change df erases to an ILC'17 function change df′ relative to source f : A → B if, given any change da that erases to da′ relative to source a1 : A, output change df a1 da erases to df′ a1 da′ relative to source f a1. For base types, erasure simply connects corresponding da (with source) with da′ in a manner dependent on the base type (often, just throwing away any embedded proofs of validity). In all cases, one can show that if and only if

¹For simplicity, we use equality on changes, even though equality is too restrictive. Later (in Sec. 14.2) we'll define an equivalence relation on changes, called change equivalence and written =∆, and use it systematically to relate changes in place of equality. For instance, we'll write that f′1 a1 da =∆ f1 (a1 ⊕ da) ⊖ f1 a1. But for the present discussion, equality will do.


dv erases to dv′ with source v1, then v1 ⊕ dv = v1 ⊕ dv′ (for suitable variants of ⊕); in other words, dv and dv′ share source and destination (technically, ILC'17 changes have no fixed source, so we say that they are changes from v1 to v2 for some v2).

In ILC'14, there is a different incremental semantics ⟦t⟧∆ for terms t, but it is still a valid ILC'14 change. One can show that ⟦t⟧∆ (as defined in ILC'14) erases to ⟦D⟦t⟧⟧ (as defined in ILC'17) relative to source ⟦t⟧; in fact, the needed proof is sketched by Cai et al., though in disguise.

It seems clear there is no isomorphism between ILC'14 changes and ILC'17 changes. An ILC'17 function change also accepts invalid changes, and the behavior on those changes can't be preserved by an isomorphism. Worse, it seems hard to define even a non-isomorphic mapping: to map an ILC'14 change df to an ILC'17 change erase df, we have to define behavior for (erase df) a da even when da is invalid. As long as we work in a constructive setting, we cannot decide whether da is valid in general, because da can be a function change with infinite domain.

We can give, however, a definition that does not need to detect such invalid changes: just extract source and destination from a function change using valid change 0v, and take the difference of source and destination using ⊖ in the target system.

unerase (σ → τ) df′ = let f = λv → df′ v 0v in (f ⊕ df′) ⊖ f
unerase _ dv′ = …
erase (σ → τ) df = let f = λv → df v 0v in (f ⊕ df) ⊖ f
erase _ dv = …

We define these functions by induction on types (for elements of ∆τ, not arbitrary change structures), and we overload ⊕ and ⊖ for ILC'14 and ILC'17. We conjecture that, for all types τ and for all ILC'17 changes dv′ (of the right type), unerase τ dv′ erases to dv′, and that, for all ILC'14 changes dv, dv erases to erase τ dv.

Erasure is a well-behaved logical relation, similar to the ones relating source and destination language of a compiler, and to partial equivalence relations. In particular, it also induces partial equivalence relations (PERs) (see Sec. 14.2.3), both on ILC'14 changes and on ILC'17 changes: two ILC'14 changes are equivalent if they erase to the same ILC'17 change, and two ILC'17 changes are equivalent if the same ILC'14 change erases to both. Both relations are partial equivalence relations (and total on valid changes). Because changes that erase to each other share source and destination, these induced equivalences coincide again with change equivalence. That both relations are PERs also means that erasure is a so-called quasi-PER [Krishnaswami and Dreyer 2013]. Quasi-PERs are a natural (though not obvious) generalization of PERs for relations among different sets, R ⊆ S1 × S2: such relations cannot be either symmetric or transitive. However, we make no use of additional properties of quasi-PERs, hence we don't discuss them in further detail.

15.4.1 One-sided vs two-sided validity

There are also further superficial differences between the two definitions. In ILC'14, changes valid with source a : A have dependent type ∆a. This dependent type is indexed by the source, but not by the destination. Dependent function changes with source f : A → B have type (a : A) → ∆a → ∆(f a), relating the behavior of function change df with the behavior of f on original inputs. But this is half of function validity: to relate the behavior of df with the behavior of f on updated inputs, in ILC'14 valid function changes have to satisfy an additional equation, called preservation of future:²

f1 a1 ⊕ df a1 da = (f1 ⊕ df) (a1 ⊕ da)

²Name suggested by Yufei Cai.


This equation appears inelegant, and mechanized proofs were often complicated by the need to perform rewritings using it. Worse, to show that a function change is valid, we have to use different approaches to prove it has the correct source and the correct destination.

This difference is, however, superficial. If we replace f1 ⊕ df with f2 and a1 ⊕ da with a2, this equation becomes f1 a1 ⊕ df a1 da = f2 a2, a consequence of df ▷ f1 ↪ f2 : A → B. So one might suspect that ILC'17 valid function changes also satisfy this equation. This is indeed the case:

Lemma 15.4.1
A valid function change df ▷ f1 ↪ f2 : A → B satisfies equation

f1 a1 ⊕ df a1 da = (f1 ⊕ df) (a1 ⊕ da)

on any valid input da ▷ a1 ↪ a2 : A.
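For a concrete instance, the equation can be checked for integer-valued functions, with an ILC'17-style function change df a da = f2 (a + da) − f1 a, which is valid from f1 to f2; all names below are illustrative:

f1, f2 :: Int -> Int
f1 x = 2 * x
f2 x = x * x

df :: Int -> Int -> Int
df a da = f2 (a + da) - f1 a

fUpd :: Int -> Int              -- (f1 ⊕ df) v = f1 v ⊕ df v 0v, and 0v = 0 here
fUpd v = f1 v + df v 0

main :: IO ()
main = print (and [ f1 a + df a da == fUpd (a + da)
                  | a <- [-3 .. 3], da <- [-3 .. 3] ])   -- True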

Conversely, one can also show that ILC'14 function changes satisfy two-sided validity as defined in ILC'17. Hence, the only true difference between the ILC'14 and ILC'17 models is the one we discussed earlier, namely whether function changes can be applied to invalid inputs or not.

We believe it could be possible to formalize the ILC'14 model using two-sided validity, by defining a dependent type of valid changes: ∆2 (A → B) f1 f2 = (a1 a2 : A) → ∆2 A a1 a2 → ∆2 B (f1 a1) (f2 a2). We provide more details on such a transformation in Chapter 18.

Models restricted to valid changes (like ILC'14) are related to models based on directed graphs and reflexive graphs, where values are graph vertices and changes are edges between change source and change destination (as hinted earlier). In graph language, validity preservation means that function changes are graph homomorphisms.

Based on similar insights, Atkey [2015] suggests modeling ILC using reflexive graphs, which have been used to construct parametric models for System F and extensions, and calls for research on the relation between ILC and parametricity. As follow-up work, Cai [2017] studies models of ILC based on directed and reflexive graphs.

Chapter 16

Differentiation in practice

In practice, successful incrementalization requires both correctness and performance of the derivatives. Correctness of derivatives is guaranteed by the theoretical development in the previous sections, together with the interface for differentiation and proof plugins, whereas performance of derivatives has to come from careful design and implementation of differentiation plugins.

16.1 The role of differentiation plugins

Users of our approach need to (1) choose which base types and primitives they need, (2) implement suitable differentiation plugins for these base types and primitives, (3) rewrite (relevant parts of) their programs in terms of these primitives, and (4) arrange for their programs to be called on changes instead of updated inputs.

As discussed in Sec. 10.5, differentiation supports abstraction, application and variables, but since computation on base types is performed by primitives for those types, efficient derivatives for primitives are essential for good performance.

To make such derivatives efficient, change types must also have efficient implementations, and allow describing precisely what changed. The efficient derivative of sum in Chapter 10 is possible only if bag changes can describe deletions and insertions, and integer changes can describe additive differences.

For many conceivable base types, we do not have to design the differentiation plugins from scratch. Instead, we can reuse the large body of existing research on incrementalization in first-order and domain-specific settings. For instance, we reuse the approach from Gluche et al. [1997] to support incremental bags and maps. By wrapping a domain-specific incrementalization result in a differentiation plugin, we adapt it to be usable in the context of a higher-order and general-purpose programming language, and in interaction with other differentiation plugins for the other base types of that language.

For base types with no known incrementalization strategy, the precise interfaces for differentiation and proof plugins can guide the implementation effort. These interfaces could also form the basis for a library of differentiation plugins that work well together.

Rewriting whole programs in our language would be an excessive requirement. Instead, we embed our object language as an EDSL in some more expressive metalanguage (Scala in our case study), so that embedded programs are reified. The embedded language can be made to resemble the metalanguage [Rompf and Odersky 2010]. To incrementalize a part of a computation, we write it in our embedded object language, invoke D on the embedded program, optionally optimize the resulting programs, and finally invoke them. The metalanguage also acts as a macro system for the



histogram :: Map Int (Bag word) → Map word Int
histogram = mapReduce groupOnBags additiveGroupOnIntegers histogramMap histogramReduce
  where
    additiveGroupOnIntegers = Group (+) (λn → −n) 0
    histogramMap = foldBag groupOnBags (λn → singletonBag (n, 1))
    histogramReduce = foldBag additiveGroupOnIntegers id

-- Precondition:
-- for every key1 k1 and key2 k2, the terms mapper key1 and reducer key2 are homomorphisms.
mapReduce :: Group v1 → Group v3 → (k1 → v1 → Bag (k2, v2)) → (k2 → Bag v2 → v3)
          → Map k1 v1 → Map k2 v3
mapReduce group1 group3 mapper reducer = reducePerKey ∘ groupByKey ∘ mapPerKey
  where
    mapPerKey = foldMap group1 groupOnBags mapper
    groupByKey = foldBag (groupOnMaps groupOnBags)
                   (λ(key, val) → singletonMap key (singletonBag val))
    reducePerKey = foldMap groupOnBags (groupOnMaps group3)
                   (λkey bag → singletonMap key (reducer key bag))

Figure 16.1: The λ-term histogram, with Haskell-like syntactic sugar. additiveGroupOnIntegers is the abelian group induced on integers by addition, (Z, +, 0, −).

object language, as usual. This allows us to simulate polymorphic collections, such as Bag ι, even though the object language is simply-typed; technically, our plugin exposes a family of base types to the object language.

16.2 Predicting nil changes

Handling changes to all inputs can induce excessive overhead in incremental programs [Acar 2009]. It is also often unnecessary: for instance, the function argument of fold in Chapter 10 does not change, since it is a closed subterm of the program, so fold will receive a nil change for it. A (conservative) static analysis can detect changes that are guaranteed to be nil at runtime. We can then specialize derivatives that receive this change, so that they need not inspect the change at runtime.

For our case study, we have implemented a simple static analysis which detects and propagates information about closed terms. The analysis is not interesting, and we omit details for lack of space.

16.2.1 Self-maintainability

In databases, a self-maintainable view [Gupta and Mumick 1999] is a function that can update its result from input changes alone, without looking at the actual input. By analogy, we call a derivative self-maintainable if it uses no base parameters, only their changes. Self-maintainable derivatives describe efficient incremental computations: since they do not use their base input, their running time does not have to depend on the input size.

Example 16.2.1
D⟦merge⟧ = λx dx y dy → merge dx dy is self-maintainable, with the change structure Bag S described in Example 10.3.1, because it does not use the base inputs x and y. Other derivatives are self-maintainable only in certain contexts: the derivative of element-wise function application (map f xs) ignores the original value of the bag xs if the changes to f are always nil, because the underlying primitive foldBag is self-maintainable in this case (as discussed in the next section). We take advantage of this by implementing a specialized derivative for foldBag.

Similarly to what we have seen in Sec. 10.6, dgrandTotal needlessly recomputes merge xs ys without optimizations. However, the result is a base input to fold′. In the next section, we'll replace fold′ by a self-maintainable derivative (based again on foldBag) and will avoid this recomputation.

To conservatively predict whether a derivative is going to be self-maintainable (and thus efficient), one can inspect whether the program restricts itself to (conditionally) self-maintainable primitives,


like merge (always) or map f (only if df is nil, which is guaranteed when f is a closed term).

To avoid recomputing base arguments for self-maintainable derivatives (which never need them), we currently employ lazy evaluation. Since we could use standard techniques for dead-code elimination [Appel and Jim 1997] instead, laziness is not central to our approach.

A significant restriction is that non-self-maintainable derivatives can require their base arguments, which can be expensive to compute. Since these arguments are also computed while running the base program, one could reuse the previously computed values through memoization or extensions of static caching (as discussed in Sec. 19.2.3). We leave implementing these optimizations for future work. As a consequence, our current implementation delivers good results only if most derivatives are self-maintainable.

16.3 Case study

We perform a case study on a nontrivial, realistic program, to demonstrate that ILC can speed it up. We take the MapReduce-based skeleton of the word-count example [Lämmel 2007]. We define a suitable differentiation plugin, adapt the program to use it, and show that incremental computation is faster than recomputation. We designed and implemented the differentiation plugin following the requirements of the corresponding proof plugin, even though we did not formalize the proof plugin (e.g., in Agda). For lack of space, we focus on the base types which are crucial for our example and its performance, that is, collections. The plugin also implements tuples, tagged unions, Booleans and integers, with the usual introduction and elimination forms, and with a few optimizations for their derivatives.

wordcount takes a map from document IDs to documents, and produces a map from words appearing in the input to the count of their appearances, that is, a histogram:

wordcount :: Map ID Document → Map Word Int

For simplicity, instead of modeling strings, we model documents as bags of words and document IDs as integers. Hence, what we implement is:

histogram :: Map Int (Bag a) → Map a Int

We model words by integers (a = Int), but treat them parametrically. Other than that, we adapt Lämmel's code directly to our language. Figure 16.1 shows the λ-term histogram.

Figure 16.2 shows a simplified Scala implementation of the primitives used in Fig. 16.1. As bag primitives, we provide constructors and a fold operation, following Gluche et al. [1997]. The constructors for bags are empty (constructing the empty bag), singleton (constructing a bag with one element), merge (constructing the merge of two bags) and negate (negate b constructs a bag with the same elements as b but negated multiplicities); all but singleton represent abelian group operations. Unlike for usual ADT constructors, the same bag can be constructed in different ways, which are equivalent by the equations defining abelian groups; for instance, since merge is commutative, merge x y = merge y x. Folding on a bag will represent the bag through constructors in an arbitrary way, and then replace constructors with arguments; to ensure a well-defined result, the arguments of fold should respect the same equations, that is, they should form an abelian group; for instance, the binary operator should be commutative. Hence, the fold operator foldBag can be defined to take a function (corresponding to singleton) and an abelian group (for the other constructors). foldBag is then defined by equations:

foldBag :: Group τ → (σ → τ) → Bag σ → τ
foldBag g@(⊞, ⊟, e) f empty = e


// Abelian groups
abstract class Group[A] {
  def merge(value1: A, value2: A): A
  def inverse(value: A): A
  def zero: A
}

// Bags
type Bag[A] = collection.immutable.HashMap[A, Int]

def groupOnBags[A] = new Group[Bag[A]] {
  def merge(bag1: Bag[A], bag2: Bag[A]) =
    bag1.merged(bag2) {
      case ((x, count1), (_, count2)) => (x, count1 + count2)
    } filter { case (_, count) => count != 0 }
  def inverse(bag: Bag[A]) = bag.map {
    case (value, count) => (value, -count)
  }
  def zero = collection.immutable.HashMap()
}

def foldBag[A, B](group: Group[B], f: A => B, bag: Bag[A]): B =
  bag.flatMap {
    case (x, c) if c >= 0 => Seq.fill(c)(f(x))
    case (x, c) if c < 0  => Seq.fill(-c)(group.inverse(f(x)))
  }.fold(group.zero)(group.merge)

// Maps
type Map[K, A] = collection.immutable.HashMap[K, A]

def groupOnMaps[K, A](group: Group[A]) = new Group[Map[K, A]] {
  def merge(dict1: Map[K, A], dict2: Map[K, A]): Map[K, A] =
    dict1.merged(dict2) {
      case ((k, v1), (_, v2)) => (k, group.merge(v1, v2))
    } filter { case (k, v) => v != group.zero }
  def inverse(dict: Map[K, A]): Map[K, A] = dict.map {
    case (k, v) => (k, group.inverse(v))
  }
  def zero = collection.immutable.HashMap()
}

// The general map fold
def foldMapGen[K, A, B](zero: B, merge: (B, B) => B)
    (f: (K, A) => B, dict: Map[K, A]): B =
  dict.map(Function.tupled(f)).fold(zero)(merge)

// By using foldMap instead of foldMapGen, the user promises that f k is a
// homomorphism from groupA to groupB for each k: K.
def foldMap[K, A, B](groupA: Group[A], groupB: Group[B])
    (f: (K, A) => B, dict: Map[K, A]): B =
  foldMapGen(groupB.zero, groupB.merge)(f, dict)

Figure 16.2: A Scala implementation of primitives for bags and maps. In the code, we call ⊞, ⊟ and e respectively merge, inverse and zero. We also omit the relatively standard primitives.


foldBag g@(⊞, ⊟, e) f (merge b1 b2) = foldBag g f b1 ⊞ foldBag g f b2
foldBag g@(⊞, ⊟, e) f (negate b) = ⊟ (foldBag g f b)
foldBag g@(⊞, ⊟, e) f (singleton v) = f v

If g is a group, these equations specify foldBag g precisely [Gluche et al. 1997]. Moreover, the first three equations mean that foldBag g f is the abelian group homomorphism between the abelian group on bags and the group g (because those equations coincide with the definition). Figure 16.2 shows an implementation of foldBag as specified above. Moreover, all functions which deconstruct a bag can be expressed in terms of foldBag with suitable arguments. For instance, we can sum the elements of a bag of integers with foldBag gZ (λx → x), where gZ is the abelian group induced on integers by addition, (Z, +, 0, −). Users of foldBag can define different abelian groups to specify different operations (for instance, to multiply floating-point numbers).
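For readers who prefer Haskell, here is a sketch of the same bag representation and of foldBag, mirroring the Scala code in Fig. 16.2; the representation and all names are illustrative:

import qualified Data.Map.Strict as Map

data Group a = Group { gMerge :: a -> a -> a, gInverse :: a -> a, gZero :: a }

type Bag a = Map.Map a Int        -- element ↦ (possibly negative) multiplicity

foldBag :: Group b -> (a -> b) -> Bag a -> b
foldBag g f = Map.foldrWithKey step (gZero g)
  where
    step x c acc
      | c >= 0    = times c (gMerge g (f x)) acc
      | otherwise = times (negate c) (gMerge g (gInverse g (f x))) acc
    times n h acc = iterate h acc !! n   -- apply h, n times

gZ :: Group Int                   -- the group (Z, +, 0, −)
gZ = Group (+) negate 0

main :: IO ()
main = print (foldBag gZ id (Map.fromList [(1, 2), (10, 1), (3, -1)]))
-- 2·1 + 1·10 − 1·3 = 9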

If g and f do not change, foldBag g f has a self-maintainable derivative. By the equations above,

foldBag g f (b ⊕ db)
= foldBag g f (merge b db)
= foldBag g f b ⊞ foldBag g f db
= foldBag g f b ⊕ GroupChange g (foldBag g f db)

We will describe the GroupChange change constructor in a moment. Before that, we note that, as a consequence, the derivative of foldBag g f is

λb db → GroupChange g (foldBag g f db)

and we can see it does not use b: as desired, it is self-maintainable. Additional restrictions are required to make foldMap's derivative self-maintainable. Those restrictions require the precondition on mapReduce in Fig. 16.1; foldMapGen has the same implementation, but without those restrictions; as a consequence, its derivative is not self-maintainable, but it is more generally applicable. Lack of space prevents us from giving more details.

To define GroupChange, we need a suitable erased change structure on τ, such that ⊕ will be equivalent to ⊞. Since there might be multiple groups on τ, we allow the changes to specify a group, and have ⊕ delegate to ⊞ (which we extract by pattern-matching on the group):

∆τ = Replace τ | GroupChange (AbelianGroup τ) τ
v ⊕ (Replace u) = u
v ⊕ (GroupChange (⊞, ⊟, e) dv) = v ⊞ dv
v ⊖ u = Replace v

That is, a change between two values is either simply the new value (which replaces the old one, triggering recomputation), or their difference (computed with abelian group operations, like in the change structures for groups from Sec. 13.1.1). The ⊖ operator does not know which group to use, so it does not take advantage of the group structure. However, foldBag is now able to generate a group change.
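The same erased change structure can be restated as a compilable Haskell sketch (ours; constructor and field names are illustrative):

data Group a = Group { gmerge :: a -> a -> a, ginverse :: a -> a, gzero :: a }

data Change a
  = Replace a                 -- the new value; forces recomputation downstream
  | GroupChange (Group a) a   -- a difference dv, combined via the given group

oplus :: a -> Change a -> a
oplus _ (Replace u)        = u
oplus v (GroupChange g dv) = gmerge g v dv

-- ominus does not know which group to use, so it falls back to replacement
ominus :: a -> a -> Change a
ominus v _u = Replace v

-- e.g., with gZ = Group (+) negate 0:  oplus 10 (GroupChange gZ 5) == 15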


We rewrite grandTotal in terms of foldBag to take advantage of group-based changes:

id = λx → x
G+ = (ℤ, +, −, 0)
grandTotal = λxs → λys → foldBag G+ id (merge xs ys)
D⟦grandTotal⟧ =
  λxs → λdxs → λys → λdys →
    foldBag′ G+ G′+ id id′
      (merge xs ys)
      (merge′ xs dxs ys dys)

It is now possible to write down the derivative of foldBag:

(if static analysis detects that dG and df are nil changes)
foldBag′ = D⟦foldBag⟧ =
  λG → λdG → λf → λdf → λzs → λdzs →
    GroupChange G (foldBag G f dzs)

We know from Sec. 10.6 that

merge′ = λu → λdu → λv → λdv → merge du dv

Inlining foldBag′ and merge′ gives us a more readable term β-equivalent to the derivative of grandTotal:

D⟦grandTotal⟧ =
  λxs → λdxs → λys → λdys → foldBag G+ id (merge dxs dys)
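For illustration, the following self-contained Haskell sketch (ours; foldSum inlines foldBag G+ id, and bag changes are represented, as in this chapter, as bags of signed multiplicities) shows this derivative in action on concrete inputs:

import qualified Data.Map.Strict as M

type Bag a = M.Map a Int   -- element -> signed multiplicity; changes are bags too

merge :: Ord a => Bag a -> Bag a -> Bag a
merge = M.unionWith (+)

foldSum :: Bag Integer -> Integer   -- foldBag G+ id, inlined for brevity
foldSum = M.foldrWithKey (\x c acc -> x * fromIntegral c + acc) 0

grandTotal :: Bag Integer -> Bag Integer -> Integer
grandTotal xs ys = foldSum (merge xs ys)

dGrandTotal :: Bag Integer -> Bag Integer -> Integer  -- uses only the changes
dGrandTotal dxs dys = foldSum (merge dxs dys)

main :: IO ()
main = do
  let xs  = M.fromList [(1, 1), (2, 1)]
      ys  = M.fromList [(3, 1)]
      dxs = M.fromList [(4, 1)]          -- change: insert element 4 into xs
      dys = M.empty                      -- nil change for ys
      y1  = grandTotal xs ys             -- 6
  print (y1 + dGrandTotal dxs dys)                  -- incremental update: 10
  print (grandTotal (merge xs dxs) (merge ys dys))  -- recomputation: 10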

16.4 Benchmarking setup

We run object language programs by generating corresponding Scala code. To ensure rigorous benchmarking [Georges et al. 2007], we use the Scalameter benchmarking library. To show that the performance difference from the baseline is statistically significant, we show 99% confidence intervals in graphs.

We verify Eq. (10.1) experimentally, by checking that the two sides of the equation always evaluate to the same value.
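Concretely, the check can be phrased as the following reusable predicate (a Haskell sketch of ours, rather than the Scala harness actually used):

agreeOn :: Eq b
        => (a -> b)          -- base function f
        -> (a -> da -> db)   -- derivative of f
        -> (a -> da -> a)    -- update operator on inputs
        -> (b -> db -> b)    -- update operator on outputs
        -> a -> da -> Bool
agreeOn f df oplusA oplusB a da =
  f (oplusA a da) == oplusB (f a) (df a da)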

We ran benchmarks on an 8-core Intel Core i7-2600 CPU running at 3.4 GHz, with 8 GB of RAM, running Scientific Linux release 6.4. While the benchmark code is single-threaded, the JVM offloads garbage collection to other cores. We use the preinstalled OpenJDK 1.7.0_25 and Scala 2.10.2.

Input generation. Inputs are randomly generated to resemble English words over all webpages on the internet: the vocabulary size and the average length of a webpage stay relatively the same, while the number of webpages grows day by day. To generate a size-n input of type Map Int (Bag Int), we generate n random numbers between 1 and 1000 and distribute them randomly in n/1000 bags. Changes are randomly generated to resemble edits: a change has 50% probability to delete a random existing number, and 50% probability to insert a random number at a random location.


Experimental units. Thanks to Eq. (10.1), both recomputation f (a ⊕ da) and incremental computation f a ⊕ D⟦f⟧ a da produce the same result. Both programs are written in our object language. To show that derivatives are faster, we compare these two computations. To compare with recomputation, we measure the aggregated running time for running the derivative on the change and for updating the original output with the result of the derivative.

16.5 Benchmark results

Our results show (Fig. 16.3) that our program reacts to input changes in essentially constant time, as expected, hence orders of magnitude faster than recomputation. Constant factors are small enough that the speedup is apparent on realistic input sizes.

We present our results in Fig. 16.3. As expected, the runtime of incremental computation is essentially constant in the size of the input, while the runtime of recomputation is linear in the input size. Hence, on our biggest inputs, incremental computation is over 10⁴ times faster.

Derivative time is in fact slightly irregular for the first few inputs, but this irregularity decreases slowly with increasing warmup cycles. In the end, for derivatives we use 10⁴ warmup cycles. With fewer warmup cycles, running time for derivatives decreases significantly during execution, going from 2.6 ms for n = 1000 to 0.2 ms for n = 512000. Hence we believe extended warmup is appropriate, and the changes do not affect our general conclusions. Considering confidence intervals, in our experiments the running time for derivatives varies between 0.139 ms and 0.378 ms.

In our current implementation, the code of the generated derivatives can become quite big. For the histogram example (which is around 1 KB of code), a pretty-print of its derivative is around 40 KB of code. The function application case in Fig. 12.1c can lead to quadratic growth in the worst case. More importantly, we realized that our home-grown transformation system in some cases performs overly aggressive inlining, especially for derivatives, even though this is not required for incrementalization, and believe this explains a significant part of the problem. Indeed, code blowup problems do not currently appear in later experiments (see Sec. 17.4).

16.6 Chapter conclusion

Our results show that the incrementalized program runs in essentially constant time, and hence orders of magnitude faster than the alternative of recomputation from scratch.

An important lesson from the evaluations is that, as anticipated in Sec. 16.2.1, to achieve good performance our current implementation requires some form of dead code elimination, such as laziness.


[Figure: log–log plot of runtime (ms) against input size, from 1k to 4096k, for the Incremental and Recomputation programs.]

Figure 16.3: Performance results in log–log scale, with input size on the x-axis and runtime in ms on the y-axis. Confidence intervals are shown by the whiskers; most whiskers are too small to be visible.

Chapter 17

Cache-transfer-style conversion

17.1 Introduction

Incremental computation is often desirable: after computing an output from some input, we often need to produce new outputs corresponding to changed inputs. One can simply rerun the same base program on the new input; but instead, incremental computation transforms the input change to an output change. This can be desirable because it is more efficient.

Incremental computation could also be desirable because the changes themselves are of interest: imagine a typechecker explaining how some change to input source propagates to changes to the typechecking result. More generally, imagine a program explaining how some change to its input propagates through a computation into changes to the output.

ILC (Incremental λ-Calculus) [Cai et al. 2014] is a recent approach to incremental computation for higher-order languages. ILC represents changes from an old value v1 to an updated value v2 as a first-class value, written dv. ILC also statically transforms base programs to incremental programs or derivatives: derivatives produce output changes from changes to all their inputs. Since functions are first-class values, ILC introduces a notion of function changes.

However, as mentioned by Cai et al. and as we explain below, ILC as currently defined does not allow reusing intermediate results created by the base computation during the incremental computation. That restricts ILC to supporting efficiently only self-maintainable computations, a rather restrictive class: for instance, mapping self-maintainable functions on a sequence is self-maintainable, but dividing numbers isn't. In this paper we remove this limitation.

To remember intermediate results, many incrementalization approaches rely on forms of memoization: one uses hashtables to memoize function results, or dynamic dependence graphs [Acar 2005] to remember the computation trace. However, such data structures often remember results that might not be reused; moreover, the data structures themselves (as opposed to their values) occupy additional memory, looking up intermediate results has a cost in time, and typical general-purpose optimizers cannot predict results from memoization lookups. Instead, ILC aims to produce purely functional programs that are suitable for further optimizations.

We eschew memoization: instead, we transform programs to cache-transfer style (CTS), following ideas from Liu and Teitelbaum [1995]. CTS functions output caches of intermediate results along with their primary results. Caches are just nested tuples whose structure is derived from code, and accessing them does not involve looking up keys depending on inputs. We also extend differentiation to produce CTS derivatives, which can extract from caches any intermediate results they need. This approach was inspired and pioneered by Liu and Teitelbaum for untyped first-order functional languages, but we integrate it with ILC and extend it to higher-order typed languages.



While CTS functions still produce additional intermediate data structures, produced programs can be subject to further optimizations. We believe static analysis of a CTS function and its CTS derivative can identify and remove unneeded state (similar to what has been done by Liu and Teitelbaum), as we discuss in Sec. 17.5.6. We leave a more careful analysis to future work.

We prove most of our formalization correct in Coq. To support non-simply-typed programs, all our proofs are for untyped λ-calculus, while previous ILC correctness proofs were restricted to simply-typed λ-calculus. Unlike previous ILC proofs, we simply define which changes are valid via a logical relation, then show the fundamental property for this logical relation (see Sec. 17.2.1). To extend this proof to untyped λ-calculus, we switch to step-indexed logical relations.

To support differentiation on our case studies, we also represent function changes as closures that can be inspected, to support manipulating them more efficiently and detecting at runtime when a function change is nil, hence need not be applied. To show this representation is correct, we also use closures in our mechanized proof.

Unlike plain ILC, typing programs in CTS is challenging, because the shape of caches for a function depends on the function implementation. Our case studies show how to non-trivially embed resulting programs in typed languages, at least for our case studies, but our proofs support an untyped target language.

In sum, we present the following contributions:

• via examples, we motivate extending ILC to remember intermediate results (Sec. 17.2);

• we give a novel proof of correctness for ILC for untyped λ-calculus, based on step-indexed logical relations (Sec. 17.3.3);

• building on top of ILC-style differentiation, we show how to transform untyped higher-order programs to cache-transfer style (CTS) (Sec. 17.3.5);

• we show that programs and derivatives in cache-transfer style simulate correctly their non-CTS variants (Sec. 17.3.6);

• we mechanize in Coq most of our proofs;

• we perform performance case studies (in Sec. 17.4), applying (by hand) an extension of this technique to Haskell programs, and incrementalize efficiently also programs that do not admit self-maintainable derivatives.

The rest of the paper is organized as follows. Sec. 17.2 summarizes ILC and motivates the extension to cache-transfer style. Sec. 17.3 presents our formalization and proofs. Sec. 17.4 presents our case studies and benchmarks. Sec. 17.5 discusses limitations and future work. Sec. 17.6 discusses related work, and Sec. 17.7 concludes.

17.2 Introducing Cache-Transfer Style

In this section we motivate cache-transfer style (CTS). Sec. 17.2.1 summarizes a reformulation of ILC, so we recommend it also to readers familiar with Cai et al. [2014]. In Sec. 17.2.2 we consider a minimal first-order example, namely an average function. We incrementalize it using ILC, explain why the result is inefficient, and show that remembering results via cache-transfer style enables efficient incrementalization with asymptotic speedups. In Sec. 17.2.3 we consider a higher-order example that requires not just cache-transfer style but also efficient support for both nil and non-nil function changes, together with the ability to detect nil changes.


17.2.1 Background

ILC considers simply-typed programs, and assumes that base types and primitives come with support for incrementalization.

The ILC framework describes changes as first-class values, and types them using dependent types. To each type A, we associate a type ∆A of changes for A, and an update operator ⊕ : A → ∆A → A that updates an initial value with a change to compute an updated value. We also consider changes for evaluation environments, which contain changes for each environment entry.

A change da : ∆A can be valid from a1 : A to a2 : A, and we write then da ▷ a1 → a2 : A. Then a1 is the source or initial value for da, and a2 is the destination or updated value for da. From da ▷ a1 → a2 : A it follows that a2 coincides with a1 ⊕ da, but validity imposes additional invariants that are useful during incrementalization. A change can be valid for more than one source, but a change da and a source a1 uniquely determine the destination a2 = a1 ⊕ da. To avoid ambiguity, we always consider a change together with a specific source.

Each type comes with its definition of validity: validity is a ternary logical relation. For function types A → B, we define ∆(A → B) = A → ∆A → ∆B, and say that a function change df : ∆(A → B) is valid from f1 : A → B to f2 : A → B (that is, df ▷ f1 → f2 : A → B) if and only if df maps valid input changes to valid output changes; by that, we mean that if da ▷ a1 → a2 : A, then df a1 da ▷ f1 a1 → f2 a2 : B. Source and destination of df a1 da, that is f1 a1 and f2 a2, are the result of applying two different functions, that is f1 and f2.

ILC expresses incremental programs as derivatives. Generalizing previous usage, we simply say derivative for all terms produced by differentiation. If dE is a valid environment change from E1 to E2, and term t is well-typed and can be evaluated against environments E1, E2, then term Dι⟦t⟧, the derivative of t, evaluated against dE, produces a change from v1 to v2, where v1 is the value of t in environment E1, and v2 is the value of t in environment E2. This correctness property follows immediately from the fundamental property for the logical relation of validity, and can be proven accordingly; we give a step-indexed variant of this proof in Sec. 17.3.3. If t is a function and dE is a nil change (that is, its source E1 and destination E2 coincide), then Dι⟦t⟧ produces a nil function change and is also a derivative according to Cai et al. [2014].

To support incrementalization, one must define change types and validity for each base type, and a correct derivative for each primitive. Functions written in terms of primitives can be differentiated automatically. As in all approaches to incrementalization (see Sec. 17.6), one cannot incrementalize efficiently an arbitrary program: ILC limits the effort to base types and primitives.

17.2.2 A first-order motivating example: computing averages

Suppose we need to compute the average y of a bag of numbers xs : Bag ℤ, and that whenever we receive a change dxs : ∆(Bag ℤ) to this bag, we need to compute the change dy to the average y.

In fact, we expect multiple consecutive updates to our bag. Hence, we say we have an initial bag xs1 and compute its average y1 as y1 = avg xs1, and then receive consecutive changes dxs1, dxs2, and so on. Change dxs1 goes from xs1 to xs2, dxs2 goes from xs2 to xs3, and so on. We need to compute y2 = avg xs2, y3 = avg xs3, and so on, but more quickly than we would by calling avg again.

We can compute the average through the following function (that we present in Haskell):

avg xs =
  let s = sum xs
      n = length xs
      r = div s n
  in r


We write this function in A'-normal form (A'NF), a small variant of A-normal form (ANF) [Sabry and Felleisen 1993] that we introduce. In A'NF, programs bind to a variable the result of each function call in avg, instead of using it directly; unlike plain ANF, A'NF also binds the result of tail calls, such as div s n in avg. A'NF simplifies conversion to cache-transfer style.

We can incrementalize efficiently both sum and length by generating via ILC their derivatives dsum and dlength, assuming a language plugin supporting bags and folds.
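To see why such derivatives are efficient, here is a hedged Haskell sketch (ours, not the generated code), assuming bag changes are again bags of signed multiplicities; both operations are group homomorphisms, so their derivatives can ignore the base bag entirely:

import qualified Data.Map.Strict as M

type Bag a = M.Map a Int   -- element -> signed multiplicity; changes are bags too

sumB, lengthB :: Bag Integer -> Integer
sumB    = M.foldrWithKey (\x c acc -> x * fromIntegral c + acc) 0
lengthB = fromIntegral . sum . M.elems

-- Self-maintainable derivatives: they use only the change dxs, never xs,
-- and run in time proportional to the size of the change.
dsum, dlength :: Bag Integer -> Bag Integer -> Integer
dsum    _xs dxs = sumB dxs
dlength _xs dxs = lengthB dxs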

But division is more challenging. To be sure, we can write the following derivative:

ddiv a1 da b1 db =
  let a2 = a1 ⊕ da
      b2 = b1 ⊕ db
  in div a2 b2 ⊖ div a1 b1

Function ddiv computes the difference between the updated and the original result, without any special optimization, but still takes O(1) for machine integers. But unlike other derivatives, ddiv uses its base inputs a1 and b1, that is, it is not self-maintainable [Cai et al. 2014].

Because ddiv is not self-maintainable, a derivative calling it will not be efficient. To wit, let us look at davg, the derivative of avg:

davg xs dxs =
  let s  = sum xs
      ds = dsum xs dxs
      n  = length xs
      dn = dlength xs dxs
      r  = div s n
      dr = ddiv s ds n dn
  in dr

This function recomputes s, n and r just like in avg, but r is not used, so its recomputation can easily be removed by later optimizations. On the other hand, derivative ddiv will use the values of its base inputs a1 and b1, so derivative davg will need to recompute s and n, and save no time over recomputation. If ddiv were instead a self-maintainable derivative, the computations of s and n would also be unneeded and could be optimized away. Cai et al. leave efficient support for non-self-maintainable derivatives for future work.

To avoid recomputation, we must simply remember intermediate results as needed. Executing davg xs1 dxs1 will compute exactly the same s and n as executing avg xs1, so we must just save and reuse them, without needing to use memoization proper. Hence, we CTS-convert each function f to a CTS function fC and a CTS derivative dfC: CTS function fC produces, together with its final result, a cache, which the caller must pass to CTS derivative dfC. A cache is just a tuple of values, containing information from subcalls: either inputs (as we explain in a bit), intermediate results, or subcaches, that is, caches returned from further function calls. In fact, typically only primitive functions like div need to remember actual results; automatically transformed functions only need to remember subcaches or inputs.

CTS conversion is simplified by first converting to A'NF, where all results of subcomputations are bound to variables: we just collect all caches in scope and return them.

As an additional step, we avoid always passing base inputs to derivatives by defining ∆(A → B) = ∆A → ∆B. Instead of always passing a base input and possibly not using it, we can simply assume that primitives whose derivative needs a base input store the input in the cache.

To make the translation uniform, we stipulate all functions in the program are transformed to CTS, using a (potentially empty) cache. Since the type of the cache for a function f : A → B depends on the implementation of f, we introduce for each function f a type FC for its cache, so that CTS function fC has type A → (B, FC) and CTS derivative dfC has type ∆A → FC → (∆B, FC).

The definition of FC is only needed inside fC and dfC, and it can be hidden from other functions, to keep implementation details hidden in transformed code; because of limitations of Haskell modules, we can only hide such definitions from functions in other modules.

Since functions of the same type translate to functions of different types, the translation does not preserve well-typedness in a higher-order language in general, but it works well in our case studies (Sec. 17.4); Sec. 17.4.1 shows how to map such functions. We return to this point briefly in Sec. 17.5.1.

CTS-converting our example produces the following code:

data AvgC = AvgC SumC LengthC DivC

avgC :: Bag ℤ → (ℤ, AvgC)
avgC xs =
  let (s, cs1) = sumC xs
      (n, cn1) = lengthC xs
      (r, cr1) = s `divC` n
  in (r, AvgC cs1 cn1 cr1)

davgC :: ∆(Bag ℤ) → AvgC → (∆ℤ, AvgC)
davgC dxs (AvgC cs1 cn1 cr1) =
  let (ds, cs2) = dsumC dxs cs1
      (dn, cn2) = dlengthC dxs cn1
      (dr, cr2) = ddivC ds dn cr1
  in (dr, AvgC cs2 cn2 cr2)

In the above program, sumC and lengthC are self-maintainable, that is, they need no base inputs and can be transformed to use empty caches. On the other hand, ddiv is not self-maintainable, so we transform it to remember and use its base inputs.

divC a b = (a `div` b, (a, b))

ddivC da db (a1, b1) =
  let a2 = a1 ⊕ da
      b2 = b1 ⊕ db
  in (div a2 b2 ⊖ div a1 b1, (a2, b2))

Crucially, ddivC must return an updated cache to ensure correct incrementalization, so that application of further changes works correctly. In general, if (y, c1) = fC x and (dy, c2) = dfC dx c1, then (y ⊕ dy, c2) must equal the result of the base function fC applied to the new input x ⊕ dx, that is, (y ⊕ dy, c2) = fC (x ⊕ dx).
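This invariant can be stated as an executable predicate; the following Haskell sketch (ours) transliterates the equation:

ctsCorrect :: (Eq b, Eq c)
           => (a -> (b, c))         -- CTS function fC
           -> (da -> c -> (db, c))  -- CTS derivative dfC
           -> (a -> da -> a)        -- update operator on inputs
           -> (b -> db -> b)        -- update operator on outputs
           -> a -> da -> Bool
ctsCorrect fC dfC oplusA oplusB x dx =
  let (y,  c1) = fC x
      (dy, c2) = dfC dx c1
  in (oplusB y dy, c2) == fC (oplusA x dx)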

Finally, to use these functions we need to adapt the caller. Let's first show how to deal with a sequence of changes: imagine that dxs1 is a valid change for xs, and that dxs2 is a valid change for xs ⊕ dxs1. To update the average for both changes, we can then call avgC and davgC as follows:

-- A simple example caller with two consecutive changes
avgDAvgC :: Bag ℤ → ∆(Bag ℤ) → ∆(Bag ℤ) → (ℤ, ∆ℤ, ∆ℤ, AvgC)
avgDAvgC xs dxs1 dxs2 =
  let (res1,  cache1) = avgC xs
      (dres1, cache2) = davgC dxs1 cache1
      (dres2, cache3) = davgC dxs2 cache2
  in (res1, dres1, dres2, cache3)


Incrementalization guarantees that the produced changes update the output correctly in response to the input changes: that is, we have res1 ⊕ dres1 = avg (xs ⊕ dxs1) and res1 ⊕ dres1 ⊕ dres2 = avg (xs ⊕ dxs1 ⊕ dxs2). We also return the last cache, to allow further updates to be processed.

Alternatively, we can try writing a caller that gets an initial input and a (lazy) list of changes, does incremental computation, and prints updated outputs:

processChangeList (dxsN : dxss) yN cacheN = do
  let (dy, cacheN') = davgC dxsN cacheN
      yN' = yN ⊕ dy
  print yN'
  processChangeList dxss yN' cacheN'

-- Example caller with multiple consecutive changes
someCaller xs1 dxss = do
  let (y1, cache1) = avgC xs1
  processChangeList dxss y1 cache1

More generally, we produce both an augmented base function and a derivative, where the augmented base function communicates with the derivative by returning a cache. The contents of this cache are determined statically, and can be accessed by tuple projections without dynamic lookups. In the rest of the paper, we use the above idea to develop a correct transformation that allows incrementalizing programs using cache-transfer style.

We'll return to this example in Sec. 17.4.1.

17.2.3 A higher-order motivating example: nested loops

Next, we consider CTS differentiation on a minimal higher-order example. To incrementalize this example, we enable detecting nil function changes at runtime, by representing function changes as closures that can be inspected by incremental programs. We'll return to this example in Sec. 17.4.2.

We take an example of nested loops over sequences, implemented in terms of standard Haskell functions map and concat. For simplicity, we compute the Cartesian product of inputs:

cartesianProduct :: Sequence a → Sequence b → Sequence (a, b)
cartesianProduct xs ys =
  concatMap (λx → map (λy → (x, y)) ys) xs

concatMap f xs = concat (map f xs)

Computing cartesianProduct xs ys loops over each element x from sequence xs and y from sequence ys, and produces a list of pairs (x, y), taking quadratic time O(n²) (we assume for simplicity that |xs| and |ys| are both O(n)). Adding a fresh element to either xs or ys generates an output change containing Θ(n) fresh pairs, hence derivative dcartesianProduct must take at least linear time. Thanks to specialized derivatives dmap and dconcat for primitives map and concat, dcartesianProduct has asymptotically optimal time complexity. To achieve this complexity, dmap f df must detect when df is a nil function change, and avoid applying it to unchanged input elements.
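The following Haskell sketch (ours; the representations of function changes and sequence changes are hypothetical simplifications, and the real derivative also handles insertions and deletions) illustrates the key runtime test:

import qualified Data.IntMap.Strict as IM

-- Hypothetical representations, for illustration only.
data FunChange a da db
  = NilFun (a -> da -> db)   -- a nil change: carries only f's derivative
  | DFun   (a -> da -> db)   -- a genuine (non-nil) function change

type SeqChange da = IM.IntMap da   -- sparse per-index element changes

dmap :: da                    -- nil change for elements
     -> (a -> b)              -- base function f
     -> FunChange a da db     -- change df to f
     -> [a]                   -- base input sequence
     -> SeqChange da          -- input change
     -> SeqChange db          -- output change
dmap _ _ (NilFun deriv) xs dxs =
  -- df is nil: only changed positions need work, O(|dxs|) calls
  IM.mapWithKey (\i dx -> deriv (xs !! i) dx) dxs
dmap nilDa _ (DFun df) xs dxs =
  -- df is non-nil: it must be applied to every element, O(|xs|) calls
  IM.fromList [ (i, df x (IM.findWithDefault nilDa i dxs))
              | (i, x) <- zip [0 ..] xs ]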

To simplify the transformations we describe, we λ-lift programs before differentiating and transforming them:

cartesianProduct xs ys =
  concatMap (mapPair ys) xs

mapPair ys = λx → map (pair x) ys

pair x = λy → (x, y)

concatMap f xs =
  let yss = map f xs
  in concat yss

Suppose we add an element to either xs or ys. If change dys adds one element to ys, then dmapPair ys dys, the argument to dconcatMap, is a non-nil function change taking constant time, so dconcatMap must apply it to each element of xs ⊕ dxs.

Suppose next that change dxs adds one element to xs, and dys is a nil change for ys. Then dmapPair ys dys is a nil function change, and we must detect this dynamically. If a function change df : ∆(A → B) is represented as a function, and A is infinite, one cannot detect dynamically that it is a nil change. To enable runtime nil change detection, we apply closure conversion on function changes: a function change df, represented as a closure, is nil for f only if all environment changes it contains are also nil, and if the contained function is a derivative of f's function.
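A minimal Haskell sketch of this representation (ours; the names and the simplified environment changes are illustrative):

-- Hypothetical closure-converted function changes.
data EnvChange env = NilEnv | ReplaceEnv env   -- simplified environment changes

data ClosureChange env a da db = ClosureChange
  { changeCode :: env -> EnvChange env -> a -> da -> db  -- derivative-like code
  , changeEnv  :: EnvChange env                          -- change to captured env
  }

-- A change built from the derivative of f's code is nil exactly when all
-- the environment changes it carries are nil; that test is decidable at
-- runtime by inspecting the closure, unlike for changes encoded as bare
-- functions over an infinite domain.
isNilChange :: ClosureChange env a da db -> Bool
isNilChange dc = case changeEnv dc of
  NilEnv -> True
  _      -> False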

17.3 Formalization

In this section we formalize cache-transfer-style (CTS) differentiation and formally prove its soundness with respect to differentiation. We furthermore present a novel proof of correctness for differentiation itself.

To simplify the proof, we encode many invariants of input and output programs by defining input and output languages with a suitably restricted abstract syntax. Restricting the input language simplifies the transformations, while restricting the output languages simplifies the semantics. Since base programs and derivatives have different shapes, we define different syntactic categories of base terms and change terms.

We define and prove sound three transformations across two languages. First, we present a variant of differentiation (as defined by Cai et al. [2014]), going from base terms of language λAL to change terms of λAL, and we prove it sound with respect to non-incremental evaluation. Second, we define CTS conversion as a pair of transformations going from base terms of λAL: CTS translation produces CTS versions of base functions as base terms in iλAL, and CTS differentiation produces CTS derivatives as change terms in iλAL.

As source language for CTS differentiation, we use a core language named λAL. This language is a common target of a compiler front-end for a functional programming language: in this language, terms are written in λ-lifted and A'-normal form (A'NF), so that every intermediate result is named and can thus be stored in a cache by CTS conversion and reused later (as described in Sec. 17.2).

The target language for CTS differentiation is named iλAL. Programs in this target language are also λ-lifted and in A'NF. But additionally, in these programs every toplevel function f produces a cache, which is to be conveyed to the derivatives of f.

The rest of this section is organized as follows. Sec. 17.3.1 presents syntax and semantics of source language λAL. Sec. 17.3.2 defines differentiation, and Sec. 17.3.3 proves it correct. Sec. 17.3.4 presents syntax and semantics of target language iλAL. Sec. 17.3.5 defines CTS conversion, and Sec. 17.3.6 proves it correct. Most of the formalization has been mechanized in Coq (the proofs of some straightforward lemmas are left for future work).

17.3.1 The source language λAL

Syntax. The syntax of λAL is given in Figure 17.1. Our source language allows representing both base terms t and change terms dt.


Base terms
  t ::= let y = f x in t                    Call
      | let y = (x) in t                    Tuple
      | x                                   Result

Change terms
  dt ::= let y = f x, dy = df x dx in dt    Call
       | let y = (x), dy = (dx) in dt       Tuple
       | dx                                 Result

Closed values
  v ::= E[λx. t]                            Closure
      | (v)                                 Tuple
      | ℓ                                   Literal
      | p                                   Primitive

Value environments
  E ::= E; x = v                            Value binding
      | •                                   Empty

Change values
  dv ::= dE[λx dx. dt]                      Closure
       | (dv)                               Tuple
       | dℓ                                 Literal
       | !v                                 Replace
       | nil                                Nil

Change environments
  dE ::= dE; x = v, dx = dv                 Binding
       | •                                  Empty

n, k ∈ ℕ                                    Step indexes

Figure 17.1: Source language λAL (syntax).

[SVar]
  ----------------
  E ⊢ x ⇓1 E(x)

[STuple]
  E; y = (E(x)) ⊢ t ⇓n v
  -------------------------------
  E ⊢ let y = (x) in t ⇓n+1 v

[SPrimitiveCall]
  E(f) = p    E; y = δp(E(x)) ⊢ t ⇓n v
  -------------------------------------
  E ⊢ let y = f x in t ⇓n+1 v

[SClosureCall]
  E(f) = Ef[λx. tf]    Ef; x = E(x) ⊢ tf ⇓m vy    E; y = vy ⊢ t ⇓n v
  -------------------------------------------------------------------
  E ⊢ let y = f x in t ⇓m+n+1 v

Figure 17.2: Step-indexed big-step semantics for base terms of source language λAL.

Our syntax for base terms t represents λ-lifted programs in A'-normal form (Sec. 17.2.2). We write x for a sequence of variables of some unspecified length x1, x2, ..., xm. A term can be either a bound variable x, or a let-binding of y in subterm t, to either a new tuple (x) (let y = (x) in t) or the result of calling function f on argument x (let y = f x in t). Both f and x are variables, to be looked up in the environment. Terms cannot contain λ-abstractions, as they have been λ-lifted to top-level definitions, which we encode as closures in the initial environments. Unlike standard ANF, we add no special syntax for function calls in tail position (see Sec. 17.5.3 for a discussion about this limitation). We often inspect the result of a function call f x, which is not a valid term in our syntax. To enable this, we write "(f x)" for "let y = f x in y", where y is chosen fresh.

Our syntax for change terms dt mimics the syntax for base terms, except that (i) each function call is immediately followed by the evaluation of its derivative, and (ii) the final value returned by a change term is a change variable dx. As we will see, this ad-hoc syntax of change terms is tailored to capture the programs produced by differentiation. We only allow α-renamings that maintain the invariant that the definition of x is immediately followed by the definition of dx: if x is renamed to y, then dx must be renamed to dy.

Semantics. A closed value for base terms is either a closure, a tuple of values, a constant, or a primitive. A closure is a pair of an evaluation environment E and a λ-abstraction closed with respect to E. The set of available constants is left abstract; it may contain usual first-order constants like integers. We also leave abstract the primitives p, like if-then-else or projections of tuple components. As usual, environments E map variables to closed values. With no loss of generality, we assume that all bound variables are distinct.

Figure 17.2 shows a step-indexed big-step semantics for base terms, defined through judgment E ⊢ t ⇓n v, pronounced "Under environment E, base term t evaluates to closed value v in n steps."


Our step-indexes count the number of "nodes" of a big-step derivation.¹ We explain each rule in turn. Rule [SVar] looks variable x up in environment E. Other rules evaluate let-bindings let y = ... in t in environment E: each rule computes y's new value vy (taking m steps, where m can be zero) and evaluates in n steps body t to v, using environment E extended by binding y to vy. The overall let-binding evaluates to v in m + n + 1 steps. But different rules compute y's value differently. [STuple] looks each variable in x up in E to evaluate tuple (x) (in m = 0 steps). [SPrimitiveCall] evaluates function calls where variable f is bound in E to a primitive p. How primitives are evaluated is specified by a function δp(–) from closed values to closed values; to evaluate such a primitive call, this rule applies δp(–) to x's value (in m = 0 steps). [SClosureCall] evaluates function calls where variable f is bound in E to closure Ef[λx. tf]: this rule evaluates closure body tf in m steps, using closure environment Ef extended with the value of argument x in E.

Change semantics. We move on to define how change terms evaluate to change values. We start with the required auxiliary definitions.

A closed change value is either a closure change, a tuple change, a literal change, a replacement change, or a nil change. A closure change is a pair made of an evaluation environment dE and a λ-abstraction expecting a value and a change value as arguments, to evaluate a change term into an output change value. An evaluation environment dE follows the same structure as let-bindings of change terms: it binds variables to closed values, and each variable x is immediately followed by a binding for its associated change variable dx. As with let-bindings of change terms, α-renamings in an environment dE must rename dx into dy if x is renamed into y. With no loss of generality, we assume that all bound term variables are distinct in these environments.

We define the original environment ⌊dE⌋1 and the new environment ⌊dE⌋2 of a change environment dE by induction over dE:

⌊•⌋i = •    (i = 1, 2)
⌊dE; x = v, dx = dv⌋1 = ⌊dE⌋1; x = v
⌊dE; x = v, dx = dv⌋2 = ⌊dE⌋2; x = v ⊕ dv

This last rule makes use of an operation ⊕ to update a value with a change, which may fail at runtime. Indeed, change update is a partial function, written "v ⊕ dv", defined as follows:

v ⊕ nil = v
v1 ⊕ !v2 = v2
ℓ ⊕ dℓ = δ⊕(ℓ, dℓ)
E[λx. t] ⊕ dE[λx dx. dt] = (E ⊕ dE)[λx. t]
(v1, ..., vn) ⊕ (dv1, ..., dvn) = (v1 ⊕ dv1, ..., vn ⊕ dvn)

where
(E; x = v) ⊕ (dE; x = v, dx = dv) = ((E ⊕ dE); x = (v ⊕ dv))

Nil and replacement changes can be used to update all values (constants, tuples, primitives and closures), while tuple changes can only update tuples, literal changes can only update literals, and closure changes can only update closures. A nil change leaves a value unchanged. A replacement change overrides the current value v with a new one v′. On literals, ⊕ is defined via some interpretation function δ⊕. Change update for a closure ignores dt, instead of combining it with t somehow. This may seem surprising, but we only need ⊕ to behave well for valid changes (as shown by Lemma 17.3.1): for valid closure changes, dt must behave similarly to Dι⟦t⟧ anyway, so only environment updates matter. This definition also avoids having to modify terms at runtime, which would be difficult to implement safely.
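The following compilable Haskell sketch (ours; Term is left abstract and all names are illustrative) restates this partial update operation over a λAL-like value domain:

import Control.Monad (zipWithM)

data Term = Term                -- term structure elided in this sketch
type Env  = [(String, Value)]
type DEnv = [(String, (Value, Change))]

data Value  = VClosure Env Term | VTuple [Value] | VLit Int | VPrim String
data Change = DClosure DEnv Term | DTuple [Change] | DLit Int
            | DReplace Value | DNil

oplus :: Value -> Change -> Maybe Value
oplus v DNil             = Just v
oplus _ (DReplace v')    = Just v'
oplus (VLit l) (DLit dl) = Just (VLit (l + dl))     -- one choice of delta-plus
oplus (VClosure env t) (DClosure denv _dt) =        -- dt is ignored, as above
  (\env' -> VClosure env' t) <$> envOplus env denv
oplus (VTuple vs) (DTuple dvs)
  | length vs == length dvs = VTuple <$> zipWithM oplus vs dvs
oplus _ _ = Nothing                                 -- ill-matched change: fail

envOplus :: Env -> DEnv -> Maybe Env
envOplus env denv
  | map fst env == map fst denv =
      sequence [ (,) x <$> oplus v dv | ((x, v), (_, (_, dv))) <- zip env denv ]
  | otherwise = Nothing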

¹Instead, Ahmed [2006] and Acar et al. [2008] count the number of steps that small-step evaluation would take (as we did in Appendix C), but this simplifies some proof steps and makes a minor difference in others.


[SDVar]
  ---------------------
  dE ⊢ dx ⇓ dE(dx)

[SDTuple]
  dE(x, dx) = vx, dvx
  dE; y = (vx), dy = (dvx) ⊢ dt ⇓ dv
  -------------------------------------------
  dE ⊢ let y = (x), dy = (dx) in dt ⇓ dv

[SDReplaceCall]
  dE(df) = !vf
  ⌊dE⌋1 ⊢ (f x) ⇓n vy
  ⌊dE⌋2 ⊢ (f x) ⇓m vy′
  dE; y = vy, dy = !vy′ ⊢ dt ⇓ dv
  ---------------------------------------------
  dE ⊢ let y = f x, dy = df x dx in dt ⇓ dv

[SDClosureNil]
  dE(f, df) = Ef[λx. tf], nil
  Ef; x = dE(x) ⊢ tf ⇓n vy
  Ef; x = ⌊dE⌋2(x) ⊢ tf ⇓m vy′
  dE; y = vy, dy = !vy′ ⊢ dt ⇓ dv
  ---------------------------------------------
  dE ⊢ let y = f x, dy = df x dx in dt ⇓ dv

[SDPrimitiveNil]
  dE(f, df) = p, nil
  dE(x, dx) = vx, dvx
  dE; y = δp(vx), dy = ∆p(vx, dvx) ⊢ dt ⇓ dv
  ---------------------------------------------
  dE ⊢ let y = f x, dy = df x dx in dt ⇓ dv

[SDClosureChange]
  dE(f, df) = Ef[λx. tf], dEf[λx dx. dtf]
  dE(x, dx) = vx, dvx
  Ef; x = vx ⊢ tf ⇓n vy
  dEf; x = vx, dx = dvx ⊢ dtf ⇓ dvy
  dE; y = vy, dy = dvy ⊢ dt ⇓ dv
  ---------------------------------------------
  dE ⊢ let y = f x, dy = df x dx in dt ⇓ dv

Figure 17.3: Big-step semantics for the change terms of the source language λAL.

Having given these definitions, we show in Fig. 17.3 a big-step semantics for change terms (without step-indexing), defined through judgment dE ⊢ dt ⇓ dv, pronounced "Under the environment dE, the change term dt evaluates into the closed change value dv." [SDVar] looks up into dE to return a value for dx. [SDTuple] builds a tuple out of the values of x, and a change tuple out of the change values of dx, as found in the environment dE. There are four rules to evaluate let-bindings, depending on the nature of dE(df). These four rules systematically recompute the value vy of y in the original environment; they differ in the way they compute the change dy to y.

If dE(df) is a replacement, [SDReplaceCall] applies. Replacing the value of f in the environment forces recomputing f x from scratch in the new environment. The resulting value vy′ is the new value which must replace vy, so dy binds to !vy′ when evaluating the let body.

If dE(df) is a nil change, we have two rules, depending on the nature of dE(f). If dE(f) is a closure, [SDClosureNil] applies; in that case, the nil change of dE(f) is the exact same closure. Hence, to compute dy, we reevaluate this closure, applied to the updated argument ⌊dE⌋2(x), to a value vy′, and bind dy to !vy′. In other words, this rule is equivalent to [SDReplaceCall] in the case where a closure is replaced by itself.² If dE(f) is a primitive, [SDPrimitiveNil] applies. The nil change of a primitive p is its derivative, whose interpretation is realized by a function ∆p(–). The evaluation of this function on the input value and the input change leads to the change dvy bound to dy.

If dE(f ) is a closure change dEf [λx dx dtf ] [SDClosureChange] applies The change dvy resultsfrom the evaluation of dtf in the closure change environment dEf augmented with an input valuefor x and a change value for dx Again let us recall that we will maintain the invariant that theterm dtf behaves as the derivative of f so this rule can be seen as the invocation of f rsquos derivative

²Based on Appendix C, we are confident we could in fact use the derivative of tf instead of replacement changes, but transforming terms in a semantics seems aesthetically wrong. We can also restrict nil to primitives, as we essentially did in Appendix C.


Expressiveness. A closure in the initial environment can be used to represent a top-level definition. Since environment entries can point to primitives, we need no syntax to directly represent calls of primitives in the syntax of base terms. To encode in our syntax a program with top-level definitions, and a term to be evaluated representing the entry point, one can produce a term t representing the entry point, together with an environment E containing as values any top-level definitions, primitives and constants used in the program.

Our formalization does not model directly n-ary functions, but they can be encoded through unary functions and tuples. This encoding does not support currying efficiently, but we discuss possible solutions in Sec. 17.5.7.

Control operators, like recursion combinators or branching, can be introduced as primitive operations as well. If the branching condition changes, expressing the output change in general requires replacement changes. Similarly to branching, we can add tagged unions.

17.3.2 Static differentiation in λAL

Definition 10.5.1 defines differentiation for simply-typed λ-calculus terms. Figure 17.4 shows differentiation for λAL syntax.

Dι⟦x⟧ = dx
Dι⟦let y = (x) in t⟧ = let y = (x), dy = (dx) in Dι⟦t⟧
Dι⟦let y = f x in t⟧ = let y = f x, dy = df x dx in Dι⟦t⟧

Figure 17.4: Static differentiation in λAL.

Differentiating a base term t produces a change term Dι⟦t⟧, its derivative. Differentiating final result variable x produces its change variable dx. Differentiation copies each binding of an intermediate result y to the output, and adds a new binding for its change dy. If y is bound to tuple (x), then dy will be bound to the change tuple (dx). If y is bound to function application f x, then dy will be bound to the application of function change df to input x and its change dx.

Evaluating Dι⟦t⟧ recomputes all intermediate results computed by t. This recomputation will be avoided through cache-transfer style in Sec. 17.3.5.

The original transformation for static differentiation of λ-terms [Cai et al. 2014] has three cases, which we recall here:

Derive(x) = dx
Derive(t u) = Derive(t) u Derive(u)
Derive(λx. t) = λx dx. Derive(t)

Even though the first two cases of Cai et al.'s differentiation map into the two cases of our differentiation variant, one may ask where the third case is realized now. Actually, this third case occurs while we transform the initial environment. Indeed, we will assume that the closures of the environment of the source program have been adjoined a derivative. More formally, we suppose that the derivative of t is evaluated under an environment Dι⟦E⟧ obtained as follows:

Dι⟦•⟧ = •
Dι⟦E; f = Ef[λx. t]⟧ = Dι⟦E⟧; f = Ef[λx. t], df = Dι⟦Ef⟧[λx dx. Dι⟦t⟧]
Dι⟦E; x = v⟧ = Dι⟦E⟧; x = v, dx = nil    (if v is not a closure)


17.3.3 A new soundness proof for static differentiation

As in Cai et al. [2014]'s development, static differentiation is only sound on input changes that are valid. However, our definition of validity needs to be significantly different. Cai et al. prove soundness for a strongly normalizing simply-typed λ-calculus using denotational semantics. We generalize this result to an untyped and Turing-complete language, using purely syntactic arguments. In both scenarios, a function change is only valid if it turns valid input changes into valid output changes, so validity is a logical relation. Since standard logical relations only apply to typed languages, we turn to step-indexed logical relations.

Validity as a step-indexed logical relation

To state and prove soundness of differentiation, we define validity by introducing a ternary step-indexed relation over base values, changes and updated values, following previous work on step-indexed logical relations [Ahmed 2006; Acar et al. 2008]. Experts might notice small differences in our step-indexing, as mentioned in Sec. 17.3.1, but they do not affect the substance of the proof. We write

dv ▷k v1 → v2

and say that "dv is a valid change from v1 to v2, up to k steps" to mean that dv is a change from v1 to v2, and that dv is a valid description of the differences between v1 and v2, with validity tested with up to k steps. To justify this intuition of validity, we state and prove two lemmas: a valid change from v1 to v2 goes indeed from v1 to v2 (Lemma 17.3.1), and if a change is valid up to k steps, it is also valid up to fewer steps (Lemma 17.3.2).

Lemma 17.3.1 (⊕ agrees with validity). If we have dv ▷k v1 → v2 for all step-indexes k, then v1 ⊕ dv = v2.

Lemma 17.3.2 (Downward-closure). If N ≥ n, then dv ▷N v1 → v2 implies dv ▷n v1 → v2.

• dℓ ▷n ℓ → δ⊕(ℓ, dℓ)

• !v2 ▷n v1 → v2

• nil ▷n v → v

• (dv1, ..., dvm) ▷n (v1, ..., vm) → (v′1, ..., v′m)
  if and only if ∀k < n, ∀i ∈ [1 ... m], dvi ▷k vi → v′i

• dE[λx dx. dt] ▷n E1[λx. t] → E2[λx. t]
  if and only if E2 = E1 ⊕ dE, and ∀k < n, v1, dv, v2,
  if dv ▷k v1 → v2 then
  (dE; x = v1, dx = dv ⊢ dt) ▶k (E1; x = v1 ⊢ t) → (E2; x = v2 ⊢ t)

Figure 17.5: Step-indexed relation between values and changes.

As usual with step-indexed logical relations, validity is defined by well-founded induction over naturals ordered by <. To show this, it helps to observe that evaluation always takes at least one step.

Validity is formally defined by cases in Figure 17.5; we describe each case in turn. First, a constant change dℓ is a valid change from ℓ to ℓ ⊕ dℓ = δ⊕(ℓ, dℓ). Since the function δ⊕ is partial, the relation only holds for the constant changes dℓ which are valid changes for ℓ. Second, a replacement change !v2 is always a valid change from any value v1 to v2. Third, a nil change is a valid change between any value and itself. Fourth, a tuple change is valid up to step n if each of its components is valid up to any step strictly less than n. Fifth, we define validity for closure changes. Roughly

Chapter 17 Cache-transfer-style conversion 147

speaking, this statement means that a closure change is valid if (i) its environment change dE is valid for the original closure environment E1 and for the new closure environment E2; (ii) when applied to related values, the closure bodies t1 and t2 are related by dt. The validity relation between terms is defined as follows:

(dE ⊢ dt) ▶n (E1 ⊢ t1) → (E2 ⊢ t2)
if and only if ∀k < n, v1, v2,
E1 ⊢ t1 ⇓k v1 and E2 ⊢ t2 ⇓ v2
implies that ∃dv, dE ⊢ dt ⇓ dv and dv ▷n−k v1 → v2

We extend this relation from values to environments by defining a judgment dE ▷k E1 → E2, defined as follows:

  -------------
  • ▷k • → •

  dE ▷k E1 → E2    dv ▷k v1 → v2
  ----------------------------------------------------
  (dE; x = v1, dx = dv) ▷k (E1; x = v1) → (E2; x = v2)

The above lemmas about validity for values extend to environments.

Lemma 17.3.3 (⊕ agrees with validity for environments). If dE ▷k E1 → E2, then E1 ⊕ dE = E2.

Lemma 17.3.4 (Downward-closure for environments). If N ≥ n, then dE ▷N E1 → E2 implies dE ▷n E1 → E2.

Finally, for values, terms and environments, omitting the step count k from validity means validity holds for all k's. That is, for instance, dv ▷ v1 → v2 means dv ▷k v1 → v2 for all k.

Soundness of differentiation

We can state a soundness theorem for differentiation without mentioning step-indexes. Instead of proving it directly, we must first prove a more technical statement (Lemma 17.3.6) that mentions step-indexes explicitly.

Theorem 17.3.5 (Soundness of differentiation in λAL). If dE is a valid change environment from base environment E1 to updated environment E2, that is dE ▷ E1 → E2, and if t converges both in the base and updated environment, that is E1 ⊢ t ⇓ v1 and E2 ⊢ t ⇓ v2, then Dι⟦t⟧ evaluates under the change environment dE to a valid change dv between base result v1 and updated result v2, that is dE ⊢ Dι⟦t⟧ ⇓ dv, dv ▷ v1 → v2 and v1 ⊕ dv = v2.

Lemma 17.3.6 (Fundamental Property). For each n, if dE ▷n E1 → E2, then (dE ⊢ Dι⟦t⟧) ▶n (E1 ⊢ t) → (E2 ⊢ t).

17.3.4 The target language iλAL

In this section, we present the target language of a transformation that extends static differentiation with CTS conversion. As said earlier, the functions of iλAL compute both their output and a cache, which contains the intermediate values that contributed to that output. Derivatives receive this cache and use it to compute changes without recomputing intermediate values; derivatives also update the caches according to their input changes.


Base terms
  M ::= let y, cyfx = f x in M                 Call
      | let y = (x) in M                       Tuple
      | (x, C)                                 Result

Cache terms
  C ::= •                                      Empty
      | C, cyfx                                Sub-cache
      | C, x                                   Cached value

Change terms
  dM ::= let dy, cyfx = df dx cyfx in dM       Call
       | let dy = (dx) in dM                   Tuple
       | (dx, dC)                              Result

Cache updates
  dC ::= •                                     Empty
       | dC, cyfx                              Sub-cache
       | dC, (x ⊕ dx)                          Updated value

Closed values
  V ::= F[λx. M]                               Closure
      | (V)                                    Tuple
      | ℓ                                      Literal
      | p                                      Primitive

Cache values
  Vc ::= •                                     Empty
       | Vc, V′c                               Sub-cache
       | Vc, V                                 Cached value

Change values
  dV ::= dF[λdx C. dM]                         Closure
       | (dV)                                  Tuple
       | dℓ                                    Literal
       | nil                                   Nil
       | !V                                    Replacement

Base definitions
  Dv ::= x = V                                 Value definition
       | cyfx = Vc                             Cache definition

Change definitions
  dDv ::= Dv                                   Base
        | dx = dV                              Change

Evaluation environments
  F ::= F; Dv                                  Binding
      | •                                      Empty

Change environments
  dF ::= dF; dDv                               Binding
       | •                                     Empty

Figure 17.6: Target language iλAL (syntax).

Syntax. The syntax of iλAL is defined in Figure 17.6. Base terms of iλAL follow again λ-lifted A'NF, like λAL, except that a let-binding for a function application f x now binds an extra cache identifier cyfx besides output y. Cache identifiers have non-standard syntax: cyfx can be seen as a triple that refers to the value identifiers f, x and y. Hence, an α-renaming of one of these three identifiers must refresh the cache identifier accordingly. Result terms explicitly return cache C through syntax (x, C).

The syntax for caches has three cases: a cache can be empty, or it can prepend a value or a cache variable to a cache. In other words, a cache is a tree-like data structure which is isomorphic to an execution trace, containing both immediate values and the execution traces of the function calls issued during the evaluation.

The syntax for change terms of iλAL witnesses the CTS discipline followed by the derivatives: to determine dy, the derivative of f evaluated at point x with change dx expects the cache produced by evaluating y in the base term. The derivative returns the updated cache, which contains the intermediate results that would be gathered by the evaluation of f (x ⊕ dx). The result term of every change term returns a cache update dC in addition to the computed change.

The syntax for cache updates resembles the one for caches, but each value identifier x of the input cache is updated with its corresponding change dx.

Semantics. An evaluation environment F of iλAL contains not only values but also cache values. The syntax for values V includes closures, tuples, primitives and constants. The syntax for cache values Vc mimics the one for cache terms. The evaluation of change terms expects the evaluation environments dF to also include bindings for change values.

There are five kinds of change values: closure changes, tuple changes, literal changes, nil changes and replacements. Closure changes embed an environment dF and a code pointer for a function waiting for both a base value x and a cache C. By abuse of notation, we reuse the same syntax C to both deconstruct and construct caches. Other changes are similar to the ones found in λAL.

Base terms of the language are evaluated using a big-step semantics, defined in Figure 17.7.


Evaluation of base terms:  F ⊢ M ⇓ (V, Vc)

[TResult]
  F(x) = V    F ⊢ C ⇓ Vc
  ------------------------
  F ⊢ (x, C) ⇓ (V, Vc)

[TTuple]
  F; y = F(x) ⊢ M ⇓ (V, Vc)
  --------------------------------
  F ⊢ let y = (x) in M ⇓ (V, Vc)

[TClosureCall]
  F(f) = F′[λx′. M′]
  F′; x′ = F(x) ⊢ M′ ⇓ (V′, V′c)
  F; y = V′; cyfx = V′c ⊢ M ⇓ (V, Vc)
  -------------------------------------
  F ⊢ let y, cyfx = f x in M ⇓ (V, Vc)

[TPrimitiveCall]
  F(f) = p    δp(F(x)) = (V′, V′c)
  F; y = V′; cyfx = V′c ⊢ M ⇓ (V, Vc)
  -------------------------------------
  F ⊢ let y, cyfx = f x in M ⇓ (V, Vc)

Evaluation of caches:  F ⊢ C ⇓ Vc

[TEmptyCache]
  -----------
  F ⊢ • ⇓ •

[TCacheVar]
  F(x) = V    F ⊢ C ⇓ Vc
  ------------------------
  F ⊢ C, x ⇓ Vc, V

[TCacheSubCache]
  F(cyfx) = V′c    F ⊢ C ⇓ Vc
  -----------------------------
  F ⊢ C, cyfx ⇓ Vc, V′c

Figure 17.7: Target language iλAL (semantics of base terms and caches).

Judgment "F ⊢ M ⇓ (V, Vc)" is read "Under evaluation environment F, base term M evaluates to value V and cache Vc." Auxiliary judgment "F ⊢ C ⇓ Vc" evaluates cache terms into cache values. Rule [TResult] not only looks into the environment for the return value V, but it also evaluates the returned cache C. Rule [TTuple] is similar to the rule of the source language, since no cache is produced by the allocation of a tuple. Rule [TClosureCall] works exactly as [SClosureCall], except that the cache value returned by the closure is bound to cache identifier cyfx. In the same way, [TPrimitiveCall] resembles [SPrimitiveCall], but also binds cyfx. Rule [TEmptyCache] evaluates an empty cache term into an empty cache value. Rule [TCacheVar] computes the value of cache term C, x by appending the value of variable x to cache value Vc computed for cache term C. Similarly, rule [TCacheSubCache] appends the cache value of a cache named cyfx to the cache value Vc computed for C.

Change terms of the target language are also evaluated using a big-step semantics, defined in Fig. 17.8. Judgment "dF ⊢ dM ⇓ (dV, Vc)" is read "Under evaluation environment dF, change term dM evaluates to change value dV and updated cache Vc." The first auxiliary judgment "dF ⊢ dC ⇓ Vc" defines evaluation of cache update terms. We omit the rules for this judgment, since it is similar to the one for cache terms, except that cached values are computed by x ⊕ dx, not simply x. The final auxiliary judgment "Vc ∼ C → dF" describes a limited form of pattern matching used by CTS derivatives: namely, how a cache pattern C matches a cache value Vc to produce a change environment dF.

Rule [TDResult] returns the final change value of a computation, as well as an updated cache resulting from the evaluation of the cache update term dC. Rule [TDTuple] resembles its counterpart in the source language, but the tuple for y is not built, as it has already been pushed in the environment by the cache.

As for λAL, there are four rules to deal with let-bindings, depending on the shape of the change bound to df in the environment. If df is bound to a replacement, the rule [TDReplaceCall] applies.


Evaluation of change terms:  dF ⊢ dM ⇓ (dV, Vc)

[TDResult]
  dF(dx) = dV    dF ⊢ dC ⇓ Vc
  -----------------------------
  dF ⊢ (dx, dC) ⇓ (dV, Vc)

[TDTuple]
  dF; dy = dF(dx) ⊢ dM ⇓ (dV, Vc)
  ---------------------------------------
  dF ⊢ let dy = (dx) in dM ⇓ (dV, Vc)

[TDReplaceCall]
  dF(df) = !Vf
  ⌊dF⌋2 ⊢ (f x) ⇓ (V′, V′c)
  dF; dy = !V′; cyfx = V′c ⊢ dM ⇓ (dV, Vc)
  -------------------------------------------------
  dF ⊢ let dy, cyfx = df dx cyfx in dM ⇓ (dV, Vc)

[TDClosureNil]
  dF(f, df) = dFf[λx. Mf], nil
  dFf; x = ⌊dF⌋2(x) ⊢ Mf ⇓ (V′y, V′c)
  dF; dy = !V′y; cyfx = V′c ⊢ dM ⇓ (dV, Vc)
  -------------------------------------------------
  dF ⊢ let dy, cyfx = df dx cyfx in dM ⇓ (dV, Vc)

[TDPrimitiveNil]
  dF(f, df) = p, nil    dF(x, dx) = Vx, dVx
  dF; (dy, cyfx) = ∆p(Vx, dVx, dF(cyfx)) ⊢ dM ⇓ (dV, Vc)
  -------------------------------------------------
  dF ⊢ let dy, cyfx = df dx cyfx in dM ⇓ (dV, Vc)

[TDClosureChange]
  dF(df) = dFf[λdx C. dMf]    dF(cyfx) ∼ C → dF′
  dFf; dx = dF(dx); dF′ ⊢ dMf ⇓ (dVy, V′c)
  dF; dy = dVy; cyfx = V′c ⊢ dM ⇓ (dV, Vc)
  -------------------------------------------------
  dF ⊢ let dy, cyfx = df dx cyfx in dM ⇓ (dV, Vc)

Binding of caches:  Vc ∼ C → dF

[TMatchEmptyCache]
  -------------
  • ∼ • → •

[TMatchCachedValue]
  Vc ∼ C → dF
  -------------------------------
  Vc, V ∼ C, x → dF; (x = V)

[TMatchSubCache]
  Vc ∼ C → dF
  --------------------------------------
  Vc, V′c ∼ C, cyfx → dF; (cyfx = V′c)

Figure 17.8: Target language iλAL (semantics of change terms and cache updates).


CTS translation of toplevel definitions:  C⟦f = λx. t⟧

C⟦f = λx. t⟧ = f = λx. M
               df = λdx C. D_{•,(x⊕dx)}⟦t⟧
    where (C, M) = C_{•,x}⟦t⟧

CTS differentiation of terms:  D_dC⟦t⟧

D_dC⟦let y = f x in t⟧ = let dy, cyfx = df dx cyfx in M
    where M = D_{dC,(y⊕dy),cyfx}⟦t⟧

D_dC⟦let y = (x) in t⟧ = let dy = (dx) in M
    where M = D_{dC,(y⊕dy)}⟦t⟧

D_dC⟦x⟧ = (dx, dC)

CTS translation of terms:  C_C⟦t⟧

C_C⟦let y = f x in t⟧ = (C′, let y, cyfx = f x in M)
    where (C′, M) = C_{C,y,cyfx}⟦t⟧

C_C⟦let y = (x) in t⟧ = (C′, let y = (x) in M)
    where (C′, M) = C_{C,y}⟦t⟧

C_C⟦x⟧ = (C, (x, C))

Figure 17.9: Cache-Transfer Style (CTS) conversion from λAL to iλAL.

In that case, we reevaluate the function call in the updated environment ⌊dF⌋2 (defined similarly as in the source language). This evaluation leads to a new value V′, which replaces the original one, as well as an updated cache for cyfx.

If df is bound to a nil change and f is bound to a closure, the rule [TDClosureNil] applies. This rule mimics again its counterpart in the source language, with the difference that only the resulting change and the updated cache are bound in the environment.

If df is bound to a nil change and f is bound to primitive p, the rule [TDPrimitiveNil] applies. The derivative of p is invoked with the value of x, its change value, and the cache of the original call to p. The semantics of p's derivative is given by builtin function ∆p(–), as in the source language.

If df is bound to a closure change and f is bound to a closure, the rule [TDClosureChange] applies. The body of the closure change is evaluated under the closure change environment, extended with the value of the formal argument dx and with the environment resulting from the binding of the original cache value to the variables occurring in the cache C. This evaluation leads to a change and an updated cache, bound in the environment to continue with the evaluation of the rest of the term.

17.3.5 CTS conversion from λAL to iλAL

CTS conversion from λAL to iλAL is defined in Figure 17.9. It comprises CTS differentiation D⟦–⟧, from λAL base terms to iλAL change terms, and CTS translation C⟦–⟧, from λAL to iλAL, which is overloaded over top-level definitions C⟦f = λx. t⟧ and terms C_C⟦t⟧.

By the first rule, C⟦–⟧ maps each source toplevel definition "f = λx. t" to the compiled code of the function f and to the derivative df of f, expressed in the target language iλAL. These target definitions are generated by a first call to the compilation function C_{•,x}⟦t⟧: it returns both M, the compiled body of f, and the cache term C, which contains the names of the intermediate values computed by the evaluation of M.


CTS translation of values:  C⟦v⟧

C⟦E[λx. t]⟧ = C⟦E⟧[λx. M]
    where (C, M) = C_{•,x}⟦t⟧
C⟦ℓ⟧ = ℓ

CTS translation of change values:  C⟦dv⟧

C⟦dE[λx dx. Dι⟦t⟧]⟧ = C⟦dE⟧[λdx C. D_{•,(x⊕dx)}⟦t⟧]
    where (C, M) = C_{•,x}⟦t⟧
C⟦!v⟧ = !C⟦v⟧
C⟦dℓ⟧ = dℓ

CTS translation of value environments:  C⟦E⟧

C⟦•⟧ = •
C⟦E; x = v⟧ = C⟦E⟧; x = C⟦v⟧

CTS translation of change environments:  C⟦dE⟧

C⟦•⟧ = •
C⟦dE; x = v, dx = dv⟧ = C⟦dE⟧; x = C⟦v⟧, dx = C⟦dv⟧

Figure 17.10: Extending CTS translation to values, change values, environments and change environments.

This cache term C is used as a cache pattern to define the second argument of the derivative of f. That way, we make sure that the shape of the cache expected by df is consistent with the shape of the cache produced by f. The body of derivative df is computed by the derivation call D_{•,(x⊕dx)}⟦t⟧.

CTS translation on terms, C_C⟦t⟧, accepts a term t and a cache term C. This cache is a fragment of output code: in tail position (t = x), the translation generates code to return both the result x and the cache C. When the transformation visits let-bindings, it outputs extra bindings for caches cyfx and appends all variables newly bound in the output to the cache used when visiting the let-body.

Similarly to C_C⟦t⟧, CTS derivation D_dC⟦t⟧ accepts a cache update dC to return in tail position. While cache terms record intermediate results, cache updates record result updates. For let-bindings, to update y by change dy, CTS derivation appends to dC the term y ⊕ dy to replace y.

17.3.6 Soundness of CTS conversion

In this section we outline definitions and main lemmas needed to prove CTS conversion sound. The proof is based on a mostly straightforward simulation in lock-step, but two subtle points emerge. First, we must relate λAL environments that do not contain caches with iλAL environments that do. Second, while evaluating CTS derivatives, the evaluation environment mixes caches from the base computation and updated caches computed by the derivatives.

Evaluation commutes with CTS conversion. Figure 17.10 extends CTS translation to values, change values, environments, and change environments. CTS translation of base terms commutes with our semantics.


dF ↑ᵢ • dF

dF ↑ᵢ C dF′
────────────────
dF ↑ᵢ (C; x) dF′

dF ↑ᵢ C dF′    ⌊dF⌋ᵢ ⊢ let y, c_yfx = f x in (y, c_yfx) ⇓ (_, Vc)
─────────────────────────────────────────────────────────────────
dF ↑ᵢ (C; c_yfx) (dF′; c_yfx = Vc)

Figure 17.11: Extension of an environment with cache values, dF ↑ᵢ C dF′ (for i = 1, 2).

Lemma 17.3.7 (Evaluation commutes with CTS translation)
For all E, t, and v such that E ⊢ t ⇓ v, and for all C, there exists Vc such that C⟦E⟧ ⊢ C_C⟦t⟧ ⇓ (C⟦v⟧, Vc).

Stating a corresponding lemma for CTS translation of derived terms is trickier. If the derivative of t evaluates correctly in some environment (that is, dE ⊢ Dι⟦t⟧ ⇓ dv), CTS derivative D_dC⟦t⟧ cannot be evaluated in environment C⟦dE⟧. A CTS derivative can only evaluate against environments containing cache values from the base computation, but no cache values appear in C⟦dE⟧.

As a fix, we enrich C⟦dE⟧ with the values of a cache C, using the pair of judgments dF ↑ᵢ C dF′ (for i = 1, 2) defined in Fig. 17.11. Judgment dF ↑ᵢ C dF′ is read "Target change environment dF′ extends dF with the original (for i = 1) or updated (for i = 2) values of cache C". Since C is essentially a list of variables containing no values, cache values in dF′ must be computed from ⌊dF⌋ᵢ.

Lemma 17.3.8 (Evaluation commutes with CTS differentiation)
Let C be such that (C, _) = C_•⟦t⟧. For all dE, t, and dv, if dE ⊢ Dι⟦t⟧ ⇓ dv and C⟦dE⟧ ↑₁ C dF, then there exists Vc such that dF ⊢ D_dC⟦t⟧ ⇓ (C⟦dv⟧, Vc).

The proof of this lemma is not immediate, since during the evaluation of D_dC⟦t⟧ the new caches replace the old caches. In our Coq development, we enforce a physical separation between the part of the environment containing old caches and the one containing new caches, and we maintain the invariant that the second part of the environment corresponds to the remaining part of the term.

Soundness of CTS conversion. Finally, we can state soundness of CTS differentiation relative to differentiation. The theorem says that (a) the CTS derivative D_C⟦t⟧ computes the CTS translation C⟦dv⟧ of the change computed by the standard derivative Dι⟦t⟧; (b) the updated cache Vc₂ produced by the CTS derivative coincides with the cache produced by the CTS-translated base term M in the updated environment ⌊dF₂⌋₂. We must use ⌊dF₂⌋₂ instead of dF₂ to evaluate the CTS-translated base term M, since dF₂, produced by environment extension, contains updated caches, changes, and original values. Since we require a correct cache via condition (b), we can use this cache to invoke the CTS derivative on further changes, as described in Sec. 17.2.2.

Theorem 17.3.9 (Soundness of CTS differentiation wrt. differentiation)
Let C and M be such that (C, M) = C_•⟦t⟧. For all dE, t, and dv such that dE ⊢ Dι⟦t⟧ ⇓ dv, and for all dF₁ and dF₂ such that

C⟦dE⟧ ↑₁ C dF₁
C⟦dE⟧ ↑₂ C dF₂
⌊dF₁⌋₁ ⊢ M ⇓ (V₁, Vc₁)
⌊dF₂⌋₂ ⊢ M ⇓ (V₂, Vc₂)

we have

dF₁ ⊢ D_C⟦t⟧ ⇓ (C⟦dv⟧, Vc₂)


17.4 Incrementalization case studies

In this section we investigate whether our transformations incrementalize programs efficiently in a typed language such as Haskell. And indeed, after providing support for the needed bulk operations on sequences, bags, and maps, we successfully transform a few case studies and obtain efficient incremental programs, that is, ones for which incremental computation is faster than from-scratch recomputation.

We type CTS programs by associating to each function f a cache type FC. We can transform programs that use higher-order functions by making closure construction explicit.

We illustrate those points on three case studies: average computation over bags of integers, a nested loop over two sequences, and a more involved example inspired by Koch et al.'s work on incrementalizing database queries.

In all cases, we confirm that results are consistent between from-scratch recomputation and incremental evaluation.

Our benchmarks were compiled by GHC 8.0.2. They were run on a 2.60GHz dual-core Intel i7-6600U CPU with 12GB of RAM, running Ubuntu 16.04.

17.4.1 Averaging bags of integers

Sec. 17.2.2 motivates our transformation with a running example of computing the average over a bag of integers. We represent bags as maps from elements to (possibly negative) multiplicities. Earlier work [Cai et al. 2014; Koch et al. 2014] represents bag changes as bags of removed and added elements: to update element a with change da, one removes a and adds a ⊕ da. Instead, our bag changes contain element changes: a valid bag change is either a replacement change (with a new bag) or a list of atomic changes. An atomic change contains an element a, a valid change for that element da, a multiplicity n, and a change to that multiplicity dn. Updating a bag with an atomic change means removing n occurrences of a and inserting n ⊕ dn occurrences of updated element a ⊕ da.

type Bag a = Map a ℤ
type ∆(Bag a) = Replace (Bag a) | Ch [AtomicChange a]
data AtomicChange a = AtomicChange a (∆a) ℤ (∆ℤ)

This change type enables both users and derivatives to express various bag changes efficiently, such as inserting a single element or changing a single element, but also changing some but not all occurrences of an element in a bag.

insertSingle :: a → ∆(Bag a)
insertSingle a = Ch [AtomicChange a 0ₐ 0 (Ch 1)]

changeSingle :: a → ∆a → ∆(Bag a)
changeSingle a da = Ch [AtomicChange a da 1 0₁]
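To make the update semantics concrete, the following is a minimal sketch of applying a bag change against the representation above. This is our reconstruction, not the dissertation's library code: we assume ⊕ on elements and on multiplicities is available, and we do not bother pruning zero-multiplicity entries.

oplusBag :: Ord a ⇒ Bag a → ∆(Bag a) → Bag a
oplusBag _ (Replace bag') = bag'
oplusBag bag (Ch changes) = foldl applyAtomic bag changes
  where
    -- remove n occurrences of a, then add n ⊕ dn occurrences of a ⊕ da
    applyAtomic b (AtomicChange a da n dn) =
      addN (a ⊕ da) (n ⊕ dn) (addN a (negate n) b)
    -- adjust an element's multiplicity in the underlying map
    addN a n = Map.insertWith (+) a n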

Based on this change structure, we provide efficient primitives and their derivatives. The CTS variant of map, which we call mapC, takes a function fC in CTS and a bag as, and produces a bag and a cache. Because map is not self-maintainable, the cache stores the arguments fC and as. In addition, the cache stores, for each invocation of fC (and therefore for each distinct element in as), the result of fC of type b and the cache of type c. The incremental function dmapC has to produce an updated cache. We check that this is the same cache that mapC would produce when applied to updated inputs, using the QuickCheck library.


Following ideas inspired by Rossberg et al. [2010], all higher-order functions (and typically also their caches) are parametric over cache types of their function arguments. Here, functions mapC and dmapC and cache type MapC are parametric over the cache type c of fC and dfC.

map :: (a → b) → Bag a → Bag b

type MapC a b c = (a → (b, c), Bag a, Map a (b, c))
mapC :: (a → (b, c)) → Bag a → (Bag b, MapC a b c)
dmapC :: (∆a → c → (∆b, c)) → ∆(Bag a) → MapC a b c → (∆(Bag b), MapC a b c)
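For illustration, a mapC consistent with this cache type could be implemented roughly as follows, using Bag a = Map a ℤ as above; this sketch is our reconstruction, not the dissertation's implementation.

mapC :: (Ord a, Ord b) ⇒ (a → (b, c)) → Bag a → (Bag b, MapC a b c)
mapC fC as = (bs, (fC, as, results))
  where
    -- result and cache of fC for each distinct element of as
    results = Map.fromList [(a, fC a) | (a, _n) ← Map.toList as]
    -- output bag: map each element through fC, preserving multiplicities
    bs = Map.fromListWith (+) [(fst (fC a), n) | (a, n) ← Map.toList as]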

We wrote the length and sum functions used in our benchmarks in terms of the primitives map and foldGroup. We applied our transformations to get lengthC, dlengthC, sumC, and dsumC. This transformation could in principle be automated. Implementing the CTS variant and CTS derivative of primitives efficiently requires manual work.
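Concretely, the benchmarked functions can be written in terms of those primitives as follows; avg's body is our reconstruction of the running example, assuming integer division.

sum :: Bag ℤ → ℤ
sum = foldGroup

length :: Bag a → ℤ
length = foldGroup ∘ map (λ_ → 1)

avg :: Bag ℤ → ℤ
avg xs = sum xs `div` length xs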

We evaluate whether we can produce an updated result with davgC faster than by from-scratch recomputation with avg. We use the criterion library for benchmarking. The benchmarking code and our raw results are available in the supplementary material. We fix an initial bag of size n: it contains the integers from 1 to n. We define a sequence of consecutive changes of length r as the insertion of 1, the deletion of 1, the insertion of 2, the deletion of 2, and so on. We update the initial bag with these changes, one after another, to obtain a sequence of input bags.

We measure the time taken by from-scratch recomputation. We apply avg to each input bag in the sequence and fully force all results. We report the time from-scratch recomputation takes as this measured time divided by the number r of input bags. We measure the time taken by our CTS variant avgC similarly. We make sure to fully force all results and caches avgC produces on the sequence of input bags.

We measure the time taken by our CTS derivative davgC. We assume a fully forced initial result and cache. We apply davgC to each change in the sequence of bag changes, using the cache from the previous invocation. We then apply the result change to the previous result. We fully force the obtained sequence of results. Finally, we measure the time taken to only produce the sequence of result changes, without applying them.

Because of laziness, choosing what exactly to measure is tricky. We ensure that reported times include the entire cost of computing the whole sequence of results. To measure this cost, we ensure any thunks within the sequence are forced. We need not force any partial results, such as output changes or caches. Doing so would distort the results, because forcing the whole cache can cost asymptotically more than running derivatives. However, not using the cache at all underestimates the cost of incremental computation. Hence we measure a sequence of consecutive updates, each using the cache produced by the previous one.

The plot in Fig. 17.12a shows execution time versus the size n of the initial input. To produce the initial result and cache, avgC takes longer than the original avg function takes to produce just the result. This is to be expected. Producing the result incrementally is much faster than from-scratch recomputation. This speedup increases with the size of the initial bag. For an initial bag size of 800, incrementally updating the result is 70 times faster than from-scratch recomputation.

The difference between only producing the output change versus producing the output change and updating the result is very small in this example. This is to be expected, because in this example the result is an integer, and updating and evaluating an integer is fast. We will see an example where there is a bigger difference between the two.


[Plot: execution time in milliseconds vs. input size (200 to 800), for the variants original, caching, change only, and incremental.]

(a) Benchmark results for avg

[Plot: execution time in milliseconds vs. input size (200 to 800), for the variants original, caching, change only, and incremental.]

(b) Benchmark results for totalPrice

Figure 17.12: Benchmark results for avg (a) and totalPrice (b).

17.4.2 Nested loops over two sequences

We implemented incremental sequences and related primitives following Firsov and Jeltsch [2016]: our change operations and first-order operations (such as concat) reuse their implementation. On the other hand, we must extend higher-order operations such as map to handle non-nil function changes and caching. A correct and efficient CTS incremental dmapC function has to work differently depending on whether the given function change is nil or not: for a non-nil function change it has to go over the input sequence; for a nil function change it can avoid that.

Consider the running example again, this time in A'NF: the partial application of a lambda-lifted function constructs a closure. We made that explicit with a closure function.

cartesianProduct xs ys = let
  mapPairYs = closure mapPair ys
  xys = concatMap mapPairYs xs
  in xys

mapPair ys = λx → let
  pairX = closure pair x
  xys = map pairX ys
  in xys

While the only valid change for closed functions is their nil change, for closures we can have non-nil function changes. We represent closed functions and closures as variants of the same type. Correspondingly, we represent changes to a closed function and changes to a closure as variants of the same type of function changes. We inspect this representation at runtime to find out if a function change is a nil change.

data Fun a b c where
  Closed :: (a → (b, c)) → Fun a b c
  Closure :: (e → a → (b, c)) → e → Fun a b c

data ∆(Fun a b c) where
  DClosed :: (∆a → c → (∆b, c)) → ∆(Fun a b c)
  DClosure :: (∆e → ∆a → c → (∆b, c)) → ∆e → ∆(Fun a b c)
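For illustration, applying a function in this representation just dispatches on the two constructors; this small helper (whose name is ours) is a sketch, not code from the dissertation.

-- Apply a CTS function, closed or closure, to a base argument.
applyFun :: Fun a b c → a → (b, c)
applyFun (Closed f) x = f x
applyFun (Closure f env) x = f env x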


[Plot: execution time in seconds vs. input size (200 to 800), for the variants original, caching, change only, and incremental.]

(a) Benchmark results for Cartesian product, changing the inner sequence

[Plot: execution time in seconds vs. input size (200 to 800), for the variants original, caching, change only, and incremental.]

(b) Benchmark results for Cartesian product, changing the outer sequence

We have also evaluated another representation of function changes with different tradeoffs, where we use defunctionalization instead of closure conversion. We discuss the use of defunctionalization in Appendix D.

We use the same benchmark setup as in the benchmark for the average computation on bags. The input of size n is a pair of sequences (xs, ys). Each sequence initially contains the integers from 1 to n. Updating the result in reaction to a change dxs to xs takes less time than updating the result in reaction to a change dys to ys. The reason is that, when dys is a nil change, we dynamically detect that the function change passed to dconcatMapC is the nil change and avoid looping over xs entirely.

We benchmark changes to the outer sequence xs and the inner sequence ys separately. In both cases, the sequence of changes consists of insertions and deletions of different numbers at different positions. When we benchmark changes to the outer sequence xs, we take this sequence of changes for xs, and all changes to ys will be nil changes. When we benchmark changes to the inner sequence ys, we take this sequence of changes for ys, and all changes to xs will be nil changes.

In this example, preparing the cache takes much longer than from-scratch recomputation. The speedup of incremental computation over from-scratch recomputation increases with the size of the initial sequences. It reaches 2.5 for a change to the inner sequence and 8.8 for a change to the outer sequence, when the initial sequences have size 800. Producing and fully forcing only the changes to the results is 5 times faster for a change to the inner sequence and 370 times faster for a change to the outer sequence.

While we do get speedups for both kinds of changes (to the inner and to the outer sequence), the speedups for changes to the outer sequence are bigger. Due to our benchmark methodology, we have to fully force updated results. This alone takes time quadratic in the size of the input sequences n. Therefore, we also report the time it takes to produce and fully force only the output changes, which is of a lower time complexity.

17.4.3 Indexed joins of two bags

As expected, we found that we can compose functions into larger and more complex programs and apply CTS differentiation to get a fast incremental program.

For this example, imagine we have a bag of orders and a bag of line items. An order is a pair of


an integer key and an exchange rate, represented as an integer. A line item is a pair of a key of a corresponding order and a price, represented as an integer. The price of an order is the sum of the prices of the corresponding line items, multiplied by the exchange rate. We want to find the total price, defined as the sum of the prices of all orders.

To do so efficiently, we build an index for line items and an index for orders. We represent indexes as maps. We build a map from order key to the sum of the prices of corresponding line items. Because orders are not necessarily unique by key, and because multiplication distributes over addition, we can build an index mapping each order key to the sum of all exchange rates for this order key. We then compute the total price by merging the two maps by key, multiplying corresponding sums of exchange rates and sums of prices. We compute the total price as the sum of those products.

Because our indexes are maps, we need a change structure for maps. We generalize the change structure for bags by generalizing from multiplicities to arbitrary values, as long as they have a group structure. A change to a map is either a replacement change (with a new map) or a list of atomic changes. An atomic change is a key k, a valid change to that key dk, a value a, and a valid change to that value da. To update a map with an atomic change means to use the group structure of the type of a to subtract a at key k, and to add a ⊕ da at key k ⊕ dk.

type ∆(Map k a) = Replace (Map k a) | Ch [AtomicChange k a]
data AtomicChange k a = AtomicChange k (∆k) a (∆a)

Bags have a group structure as long as the elements have a group structure. Maps have a group structure as long as the values have a group structure. This means, for example, that we can efficiently incrementalize programs on maps from keys to bags of elements. We implemented efficient caching and incremental primitives for maps and verified their correctness with QuickCheck.

To build the indexes, we use a groupBy function built from the primitive functions foldMapGroup on bags and singleton for bags and maps respectively. While computing the indexes with groupBy is self-maintainable, merging them is not. We need to cache and incrementally update the intermediately created indexes to avoid recomputing them.
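Concretely, groupBy can be assembled from those primitives roughly as follows; this reconstruction, including the qualified names, is ours.

-- Index a bag by a key function: each element is placed, with its
-- multiplicity, in the bag stored under its key.
groupBy :: Ord k ⇒ (a → k) → Bag a → Map k (Bag a)
groupBy f = Bag.foldMapGroup (λa → Map.singleton (f a) (Bag.singleton a))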

type Order = (ℤ, ℤ)
type LineItem = (ℤ, ℤ)

totalPrice :: Bag Order → Bag LineItem → ℤ
totalPrice orders lineItems = let
  orderIndex = groupBy fst orders
  orderSumIndex = Map.map (Bag.foldMapGroup snd) orderIndex
  lineItemIndex = groupBy fst lineItems
  lineItemSumIndex = Map.map (Bag.foldMapGroup snd) lineItemIndex
  merged = Map.merge orderSumIndex lineItemSumIndex
  total = Map.foldMapGroup multiply merged
  in total

This example is inspired by Koch et al. [2014]. Unlike them, we don't automate indexing, so to get good performance we start from a program that explicitly uses indexes.

We evaluate the performance in the same way we did for the average computation on bags. The initial input of size n is a pair of bags, where both contain the pairs (i, i) for i between 1 and n. The sequence of changes alternates between insertion and deletion of pairs (j, j) for different j. We alternate the insertions and deletions between the orders bag and the line items bag.

Our CTS derivative of the original program produces updated results much faster than from-scratch recomputation, and again the speedup increases with the size of the initial bags. For an


initial size of the two bags of 800, incrementally updating the result is 180 times faster than from-scratch recomputation.

17.5 Limitations and future work

In this section we describe limitations to be addressed in future work.

17.5.1 Hiding the cache type

In our experiments, functions of the same type f1, f2 :: A → B can be transformed to CTS functions f1 :: A → (B, C1) and f2 :: A → (B, C2) with different cache types C1 and C2, since cache types depend on the implementation. We can fix this problem, with some runtime overhead, by using a single cache type Cache, defined as a tagged union of all cache types. If we defunctionalize function changes, we can index this cache type with tags representing functions, but other approaches are possible and we omit details. We conjecture (but have not proven) this fix gives a type-preserving translation, but leave this question for future work.
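A minimal sketch of this single cache type in Haskell, reusing A, B, C1, and C2 from the discussion above; the constructor names are invented here for illustration:

-- One constructor per transformed function wraps its specific cache type,
-- so that f1 and f2 can share the return type A → (B, Cache).
data Cache = F1Cache C1 | F2Cache C2

f1 :: A → (B, Cache)
f2 :: A → (B, Cache)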

17.5.2 Nested bags

Our implementation of bags makes nested bags overly slow: we represent bags as tree-based maps from elements to multiplicity, so looking up a bag b in a bag of bags takes time proportional to the size of b. Possible solutions in this case include shredding, as done by Koch et al. [2016]. We have no such problem for nested sequences or other nested data, which can be addressed in O(1).

17.5.3 Proper tail calls

CTS transformation conflicts with proper tail calls, as it turns most tail calls into non-tail calls. In A'NF syntax, tail calls such as let y = f x in g y become let y = f x in let z = g y in z, and in CTS that becomes let (y, cy) = f x in let (z, cz) = g y in (z, (cy, cz)), where the call to g is genuinely not in tail position. This prevents recursion on deeply nested data structures like long lists; but such programs incrementalize inefficiently if deeply nested data is affected, so it is advisable to replace lists by other sequences anyway. It's unclear whether such fixes are available for other uses of tail calls.

17.5.4 Pervasive replacement values

Thanks to replacement changes, we can compute a change from any v1 to any v2 in constant time. Cai et al. [2014] use a difference operator ⊖ instead, but it's hard to implement ⊖ in constant time on values of non-constant size. So our formalization and implementation allow replacement values everywhere, to ensure all computations can be incrementalized in some sense. Supporting replacement changes introduces overhead even if they are not used, because it prevents writing self-maintainable CTS derivatives. Hence, to improve performance, one should consider dropping support for replacement values and restricting supported computations. Consider a call to a binary CTS derivative dfc da db c1, after computing (y1, c1) = fc a1 b1: if db is a replacement change !b2, then dfc must compute a result afresh by invoking fc a2 b2, and a2 = a1 ⊕ da requires remembering previous input a1 inside c1. By the same argument, c1 must also remember input b1. Worse, replacement values are only needed to handle cases where incremental computation reduces to recomputation, because the new input is completely different, or because the condition of an if


expression changed. Such changes to conditions are forbidden by other works [Koch et al. 2016], we believe for similar reasons.

17.5.5 Recomputing updated values

In some cases, the same updated input might be recomputed more than once. If a derivative df needs some base input x (that is, if df is not self-maintainable), df's input cache will contain a copy of x, and df's output cache will contain its updated value x ⊕ dx. When all or most derivatives are self-maintainable, this is convenient, because in most cases updated inputs will not need to be computed. But if most derivatives are not self-maintainable, the same updated input might be computed multiple times: specifically, if derivative dh calls functions df and dg, and both df and dg need the same base input x, the caches for both df and dg will contain the updated value x ⊕ dx, computed independently. Worse, because of pervasive replacement values (Sec. 17.5.4), derivatives in our case studies tend to not be self-maintainable.

In some cases, such repeated updates should be removable by a standard optimizer after inlining and common-subexpression elimination, but it is unclear how often this happens. To solve this problem, derivatives could take and return both old inputs x1 and updated ones x2 = x1 ⊕ dx, and x2 could be computed at the single location where dx is bound. In this case, to avoid updates for unused base inputs, we would have to rely more on absence analysis (Sec. 17.5.6); pruning function inputs appears easier than pruning caches. Otherwise, computations of updated inputs that are not used in a lazy context might cause space leaks, where thunks for x2 = x1 ⊕ dx1, x3 = x2 ⊕ dx2, and so on, might accumulate and grow without bounds.

17.5.6 Cache pruning via absence analysis

To reduce memory usage and runtime overhead, it should be possible to automatically remove from transformed programs any caches or cache fragments that are not used (directly or indirectly) to compute outputs. Liu [2000] performs this transformation on CTS programs by using absence analysis, which was later extended to higher-order languages by Sergey et al. [2014]. In lazy languages, absence analysis removes thunks that are not needed to compute the output. We conjecture that, as long as the analysis is extended to not treat caches as part of the output, it should be able to remove unused caches or inputs (assuming unused inputs exist, see Sec. 17.5.4).

17.5.7 Unary vs. n-ary abstraction

We only show our transformation correct for unary functions and tuples. But many languages provide efficient support for applying curried functions such as div :: ℤ → ℤ → ℤ, via either the push-enter or eval-apply evaluation model. For instance, invoking div m n should not require the overhead of a function invocation for each argument [Marlow and Jones 2006]. Naively transforming such a curried function to CTS would produce a function divC of type ℤ → ((ℤ → (ℤ, DivC2)), DivC1) with DivC1 = (), which adds excessive overhead.³ Based on preliminary experiments, we believe we can straightforwardly combine our approach with a push-enter or eval-apply evaluation strategy for transformed curried functions. Alternatively, we expect that translating Haskell to Strict Core [Bolingbroke and Peyton Jones 2009] would take care of turning Haskell into a language where function types describe their arity, which our transformation can easily take care of. We leave a more proper investigation for future work.

³In Sec. 17.2 and our evaluation, we use curried functions and never need to use this naive encoding, but only because we always invoke functions of known arity.


17.6 Related work

Of all research on incremental computation in both programming languages and databases [Gupta and Mumick 1999; Ramalingam and Reps 1993], we discuss the most closely related works. Other related work, more closely related to cache-transfer style, is discussed in Chapter 19.

Previous work on cache-transfer style. Liu [2000]'s work has been the fundamental inspiration to this work, but her approach has no correctness proof and is restricted to a first-order untyped language (in part because no absence analysis for higher-order languages was available). Moreover, while the idea of cache-transfer style is similar, it's unclear if her approach to incrementalization would extend to higher-order programs.

Firsov and Jeltsch [2016] also approach incrementalization by code transformation, but their approach does not deal with changes to functions. Instead of transforming functions written in terms of primitives, they provide combinators to write CTS functions and derivatives together. On the other hand, they extend their approach to support mutable caches, while restricting to immutable ones, as we do, might lead to a logarithmic slowdown.

Finite differencing. Incremental computation on collections or databases by finite differencing has a long tradition [Paige and Koenig 1982; Blakeley et al. 1986]. The most recent and impressive line of work is the one on DBToaster [Koch 2010; Koch et al. 2014], a highly efficient approach to incrementalize queries over bags by combining iterated finite differencing with other program transformations. They show asymptotic speedups both in theory and through experimental evaluations. Changes are only allowed for datatypes that form groups (such as bags or certain maps), but not, for instance, for lists or sets. Similar ideas were recently extended to higher-order and nested computation [Koch et al. 2016], though still restricted to datatypes that can be turned into groups.

Logical relations. To study correctness of incremental programs, we use a logical relation among initial values v1, updated values v2, and changes dv. To define a logical relation for an untyped λ-calculus, we use a step-indexed logical relation, following Appel and McAllester [2001] and Ahmed [2006]; in particular, our definitions are closest to the ones by Acar et al. [2008], who also work with an untyped language, big-step semantics, and (a different form of) incremental computation. However, their logical relation does not mention changes explicitly, since changes do not have first-class status in their system. Moreover, we use environments rather than substitution, and use a slightly different step-indexing for our semantics.

Dynamic incrementalization. The approaches to incremental computation with the widest applicability are in the family of self-adjusting computation [Acar 2005, 2009], including its descendant Adapton [Hammer et al. 2014]. These approaches incrementalize programs by combining memoization and change propagation: after creating a trace of base computations, updated inputs are compared with old ones in O(1) to find corresponding outputs, which are updated to account for input modifications. Compared to self-adjusting computation, Adapton only updates results when they are demanded. As usual, incrementalization is not efficient on arbitrary programs. To incrementalize efficiently, a program must be designed so that input changes produce small changes to the computation trace; refinement type systems have been designed to assist in this task [Çiçek et al. 2016; Hammer et al. 2016]. Instead of comparing inputs by pointer equality, Nominal Adapton [Hammer et al. 2015] introduces first-class labels to identify matching inputs, enabling reuse in more situations.


Recording traces often has significant overheads in both space and time (slowdowns of 20–30× are common), though Acar et al. [2010] alleviate that with datatype-specific support for tracing higher-level operations, while Chen et al. [2014] reduce that overhead by optimizing traces to not record redundant entries, and by logging chunks of operations at once, which reduces memory overhead but also potential speedups.

17.7 Chapter conclusion

We have presented a program transformation which turns a functional program into its derivative and efficiently shares redundant computations between them, thanks to a statically computed cache. Although our first practical case studies show promising results, it remains now to integrate this transformation into a realistic compiler.

Chapter 18

Towards differentiation for System F

Differentiation is closely related to both logical relations and parametricity, as noticed by various authors in discussions of differentiation. Appendix C presents novel proofs of ILC correctness by adapting some logical relation techniques. As a demonstration, we define in this chapter differentiation for System F, by adapting results about parametricity. We stop short of a full proof that this generalization is correct, but we have implemented and tested it on a mature implementation of a System F typechecker; we believe the proof will mostly be a straightforward adaptation of existing ones about parametricity, but we leave verification for future work. A key open issue is discussed in Remark 18.2.1.

History and motivation. Various authors noticed that differentiation appears related to (binary) parametricity (including Atkey [2015]). In particular, it resembles a transformation presented by Bernardy and Lasson [2011] for arbitrary PTSs.¹ By analogy with unary parametricity, Yufei Cai sketched an extension of differentiation for arbitrary PTSs, but many questions remained open, and at the time our correctness proof for ILC was significantly more involved and trickier to extend to System F, since it was defined in terms of denotational equivalence. Later, we reduced the proof core to defining a logical relation, proving its fundamental property and a few corollaries, as shown in Chapter 12. Extending this logical relation to System F proved comparably more straightforward.

Parametricity versus ILC. Both parametricity and ILC define logical relations across program executions on different inputs. When studying parametricity, differences are only allowed in the implementations of abstractions (through abstract types or other mechanisms). To be related, different implementations of the same abstraction must give results that are equivalent according to the calling program. Indeed, parametricity defines not just a logical relation but a logical equivalence, which can be shown to be equivalent to contextual equivalence (as explained for instance by Harper [2016, Ch. 48] or by Ahmed [2006]).

When studying ILC, logical equivalence between terms t1 and t2 (written (t1, t2) ∈ P⟦τ⟧) appears to be generalized by the existence of a valid change dt between t1 and t2 (that is, dt ▷ t1 → t2 : τ). As in earlier chapters, if terms t1 and t2 are equivalent, any valid change dt between them is a nil change, but validity allows describing other changes.

¹Bernardy and Lasson were not the first to introduce such a transformation. But most earlier work focuses on System F, and our presentation follows theirs and uses their added generality. We refer for details to existing discussions of related work [Wadler 2007; Bernardy and Lasson 2011].



18.1 The parametricity transformation

First, we show a variant of their parametricity transformation, adapted to a variant of STLC without base types but with type variables. Presenting λ→ using type variables will help when we come back to System F, and allows discussing parametricity on STLC. This transformation is based on the presentation of STLC as calculus λ→, a Pure Type System (PTS) [Barendregt 1992, Sec. 5.2].

Background. In PTSs, terms and types form a single syntactic category, but are distinguished through an extended typing judgement (written Γ ⊢ t : t) using additional terms called sorts. Function types σ → τ are generalized by Π-types Πx : s. t, where x can in general appear free in t: a generalization of Π-types in dependently-typed languages, but also of universal types ∀X. T in the System F family; if x does not appear free in t, we write s → t. Typing restricts whether terms of some sort s1 can abstract over terms of sort s2; different PTSs allow different combinations of sorts s1 and s2 (specified by relations R), but lots of metatheory for PTSs is parameterized over the choice of combinations. In STLC presented as a PTS, there is an additional sort ∗; those terms τ such that ⊢ τ : ∗ are types.² We do not intend to give a full introduction to PTSs, only to introduce what's strictly needed for our presentation, and refer the readers to existing presentations [Barendregt 1992].

Instead of base types, λ→ uses uninterpreted type variables α, but does not allow terms to bind them. Nevertheless, we can write open terms, free in type variables for, say, naturals, and in term variables for operations on naturals. STLC restricts Π-types Πx : A. B to the usual arrow types A → B through R: one can show that in Πx : A. B, variable x cannot occur free in B.

Parametricity. Bernardy and Lasson show how to transform a typed term Γ ⊢ t : τ in a strongly normalizing PTS P into a proof that t satisfies a parametricity statement for τ. The proof is in a logic represented by PTS P². PTS P² is constructed uniformly from P, and is strongly normalizing whenever P is. When the base PTS P is λ→, Bernardy and Lasson's transformation produces terms in PTS λ2→, obtained by transforming λ→. PTS λ2→ extends λ→ with a separate sort de of propositions, together with enough abstraction power to abstract propositions over values. Like λ→, λ2→ uses uninterpreted type variables α and does not allow abstracting over them.

In parametricity statements about λ→, we write (t1, t2) ∈ P⟦τ⟧ for a proposition (hence living in de) that states that t1 and t2 are related. This proposition is defined, as usual, via a logical relation. We write dxx for a proof that x1 and x2 are related. For type variables α, transformed terms abstract over an arbitrary relation Rα between α1 and α2. When α is instantiated by τ, Rα can (but does not have to) be instantiated with relation (–, –) ∈ P⟦τ⟧; Rα abstracts over arbitrary candidate relations (similar to the notion of reducibility candidates [Girard et al. 1989, Ch. 14.1.1]). Allowing alternative instantiations makes parametricity statements more powerful, and it is also necessary to define parametricity for impredicative type systems (like System F) in a predicative ambient logic.³

A transformed term P⟦t⟧ relates two executions of term t in different environments: it can be read as a proof that term t maps related inputs to related outputs. The proof strategy that P⟦t⟧ uses is the standard one for proving fundamental properties of logical relations; but instead of doing a proof in the metalevel logic (by induction on terms and typing derivations), λ2→ serves as an object-level logic, and P⟦–⟧ generates proof terms in this logic by recursion on terms. At the metalevel, Bernardy and Lasson [2011, Th. 3] prove that for any well-typed term Γ ⊢ t : τ, proof P⟦t⟧

²Calculus λ→ has also a sort □ such that ⊢ ∗ : □, but □ itself has no type. Such a sort appears only in PTS typing derivations, not in well-typed terms themselves, so we do not discuss it further.

³Specifically, if we require Rα to be instantiated with the logical relation itself for τ when α is instantiated with τ, the definition becomes circular. Consider the definition of (t1, t2) ∈ P⟦∀α. τ⟧: type variable α can be instantiated with the same type ∀α. τ, so the definition of (t1, t2) ∈ P⟦∀α. τ⟧ becomes impredicative.


shows that t satisfies the parametricity statement for type τ, given the hypotheses P⟦Γ⟧ (or, more formally, P⟦Γ⟧ ⊢ P⟦t⟧ : (S₁⟦t⟧, S₂⟦t⟧) ∈ P⟦τ⟧).

(f1, f2) ∈ P⟦σ → τ⟧ = Π(x1 : S₁⟦σ⟧) (x2 : S₂⟦σ⟧) (dxx : (x1, x2) ∈ P⟦σ⟧).
                        (f1 x1, f2 x2) ∈ P⟦τ⟧
(x1, x2) ∈ P⟦α⟧ = Rα x1 x2

P⟦x⟧ = dxx
P⟦λ(x : σ) → t⟧ = λ(x1 : S₁⟦σ⟧) (x2 : S₂⟦σ⟧) (dxx : (x1, x2) ∈ P⟦σ⟧) → P⟦t⟧
P⟦s t⟧ = P⟦s⟧ S₁⟦t⟧ S₂⟦t⟧ P⟦t⟧

P⟦ε⟧ = ε
P⟦Γ, x : τ⟧ = P⟦Γ⟧, x1 : S₁⟦τ⟧, x2 : S₂⟦τ⟧, dxx : (x1, x2) ∈ P⟦τ⟧
P⟦Γ, α⟧ = P⟦Γ⟧, α1, α2, Rα : α1 → α2 → de

In the above, S₁⟦s⟧ and S₂⟦s⟧ simply subscript all (term and type) variables in their arguments with 1 and 2, to make them refer to the first or second computation. To wit, the straightforward definition is:

Sᵢ⟦x⟧ = xᵢ
Sᵢ⟦λ(x : σ) → t⟧ = λ(xᵢ : Sᵢ⟦σ⟧) → Sᵢ⟦t⟧
Sᵢ⟦s t⟧ = Sᵢ⟦s⟧ Sᵢ⟦t⟧
Sᵢ⟦σ → τ⟧ = Sᵢ⟦σ⟧ → Sᵢ⟦τ⟧
Sᵢ⟦α⟧ = αᵢ

It might be unclear how the proof P⟦t⟧ references the original term t. Indeed, it does not do so explicitly. But since in PTSs β-equivalent types have the same members, we can construct typing judgements that mention t. As mentioned, from Γ ⊢ t : τ it follows that P⟦Γ⟧ ⊢ P⟦t⟧ : (S₁⟦t⟧, S₂⟦t⟧) ∈ P⟦τ⟧ [Bernardy and Lasson 2011, Th. 3]. This is best shown on a small example.

Example 18.1.1
Take for instance an identity function id = λ(x : α) → x, which is typed in an open context (that is, α ⊢ λ(x : α) → x). The transformation gives us

pid = P⟦id⟧ = λ(x1 : α1) (x2 : α2) (dxx : (x1, x2) ∈ P⟦α⟧) → dxx

which simply returns the proofs that inputs are related:

α1, α2, Rα : α1 → α2 → de ⊢
  pid : Π(x1 : α1) (x2 : α2). (x1, x2) ∈ P⟦α⟧ → (x1, x2) ∈ P⟦α⟧

This typing judgement does not mention id. But since x1 =β S₁⟦id⟧ x1 and x2 =β S₂⟦id⟧ x2,

we can also show that id's outputs are related whenever the inputs are:

α1, α2, Rα : α1 → α2 → de ⊢
  pid : Π(x1 : α1) (x2 : α2). (x1, x2) ∈ P⟦α⟧ → (S₁⟦id⟧ x1, S₂⟦id⟧ x2) ∈ P⟦α⟧

More concisely, we can show that

α1, α2, Rα : α1 → α2 → de ⊢ pid : (S₁⟦id⟧, S₂⟦id⟧) ∈ P⟦α → α⟧


18.2 Differentiation and parametricity

We obtain a close variant of differentiation by altering the transformation for binary parametricity. We obtain a variant very close to the one investigated by Bernardy, Jansson, and Paterson [2010]. Instead of only having proofs that values are related, we can modify (t1, t2) ∈ P⟦τ⟧ to be a type of values: more precisely, a dependent type (t1, t2) ∈ ∆₂⟦τ⟧ of valid changes, indexed by source t1 : S₁⟦τ⟧ and destination t2 : S₂⟦τ⟧. Similarly, Rα is replaced by a dependent type of changes, not propositions, that we write ∆α. Formally, this is encoded by replacing sort de with ∗ in Bernardy and Lasson [2011]'s transform, and by moving target programs into a PTS that allows for both simple and dependent types, like the Edinburgh Logical Framework; such a PTS is known as λP [Barendregt 1992].

For type variables α, we specialize the transformation further, ensuring that α1 = α2 = α (and adapting S₁⟦–⟧ and S₂⟦–⟧ accordingly, to not add subscripts to type variables). Without this specialization, we get to deal with changes across different types, which we don't do just yet, but defer to Sec. 18.4.

We first show how the transformation affects typing contexts:

D⟦ε⟧ = ε
D⟦Γ, x : σ⟧ = D⟦Γ⟧, x1 : σ, x2 : σ, dx : (x1, x2) ∈ ∆₂⟦σ⟧
D⟦Γ, α⟧ = D⟦Γ⟧, α, ∆α : α → α → ∗

Unlike standard differentiation, transformed programs bind, for each input variable x : σ, a source x1 : σ, a destination x2 : σ, and a valid change dx from x1 to x2. More in detail, for each variable x : σ in input programs, programs produced by standard differentiation bind both x : σ and dx : ∆σ; valid use of these programs requires dx to be a valid change with source x, but this is not enforced through types. Instead, programs produced by this variant of differentiation bind, for each variable x : σ in input programs, both source x1 : σ, destination x2 : σ, and a valid change dx from x1 to x2, where validity is enforced through dependent types.

For input type variables α, output programs also bind a dependent type of valid changes ∆α.

Next, we define the type of changes. Since change types are indexed by source and destination, the type of function changes forces them to be valid, and the definition of function changes resembles our earlier logical relation for validity. More precisely, if df is a change from f1 to f2 (df : (f1, f2) ∈ ∆₂⟦σ → τ⟧), then df must map any change dx from initial input x1 to updated input x2 to a change from initial output f1 x1 to updated output f2 x2:

(f1, f2) ∈ ∆₂⟦σ → τ⟧ = Π(x1 x2 : σ) (dx : (x1, x2) ∈ ∆₂⟦σ⟧).
                         (f1 x1, f2 x2) ∈ ∆₂⟦τ⟧
(x1, x2) ∈ ∆₂⟦α⟧ = ∆α x1 x2

At this point, the transformation on terms acts on abstraction and application to match the transformation on typing contexts. The definition of D⟦λ(x : σ) → t⟧ follows the definition of D⟦Γ, x : σ⟧: both transformation results bind the same variables x1, x2, dx with the same types. Application provides corresponding arguments:

D⟦x⟧ = dx
D⟦λ(x : σ) → t⟧ = λ(x1 x2 : σ) (dx : (x1, x2) ∈ ∆₂⟦σ⟧) → D⟦t⟧
D⟦s t⟧ = D⟦s⟧ S₁⟦t⟧ S₂⟦t⟧ D⟦t⟧
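For instance, for the identity function id = λ(x : α) → x from Example 18.1.1, these rules give

D⟦id⟧ = λ(x1 x2 : α) (dx : (x1, x2) ∈ ∆₂⟦α⟧) → dx

a dependently-typed analogue of id's derivative: it maps a valid change between the inputs to the same change, which is also a valid change between the outputs.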

If we extend the language to support primitives and their manually-written derivatives, it is useful to have contexts also bind, next to type variables α, change structures for α, to allow terms to use change operations. Since the differentiation output does not use change operations, here we omit change structures for now.


Remark 18.2.1 (Validity and ⊕)
Because we omit change structures (here and through most of the chapter), the type of differentiation only suggests that D⟦t⟧ is a valid change, but not that validity agrees with ⊕.

In other words, dependent types as used here do not prove all theorems expected of an incremental transformation. While we can prove the fundamental property of our logical relation, we cannot prove that ⊕ agrees with differentiation for abstract types. As we did in Sec. 13.4 (in particular Plugin Requirement 13.4.1), we must require that validity relations provided for type variables agree with implementations of ⊕ as provided for type variables: formally, we must state (internally) that ∀(x1 x2 : α) (dx : ∆α x1 x2). (x1 ⊕ dx, x2) ∈ P⟦α⟧. We can state this requirement by internalizing logical equivalence (such as shown in Sec. 18.1, or using Bernardy et al. [2010]'s variant in λP). But since this statement quantifies over members of α, it is not clear if it can be proven internally when instantiating α with a type argument. We leave this question (admittedly crucial) for future work.

This transformation is not incremental, as it recomputes both source and destination for each application, but we could fix this by replacing S₂⟦s⟧ with S₁⟦s⟧ ⊕ D⟦s⟧ (and passing change structures, to make ⊕ available to terms), or by not passing destinations. We ignore such complications here.

18.3 Proving differentiation correct externally

Instead of producing dependently-typed programs that show their correctness, we might want to produce simply-typed programs (which are easier to compile efficiently) and use an external correctness proof, as in Chapter 12. We show how to generate such external proofs here. For simplicity, here we produce external correctness proofs for the transformation we just defined on dependently-typed outputs, rather than defining a separate simply-typed transformation.

Specifically, we show how to generate from well-typed terms t a proof that D⟦t⟧ is correct, that is, that (S₁⟦t⟧, S₂⟦t⟧, D⟦t⟧) ∈ ∆V⟦τ⟧.

Proofs are generated through the following transformation, defined in terms of the transformation from Sec. 18.2. Again, we show first typing contexts and the logical relation:

DP⟦ε⟧ = ε
DP⟦Γ, x : τ⟧ = DP⟦Γ⟧, x1 : τ, x2 : τ,
               dx : (x1, x2) ∈ ∆₂⟦τ⟧, dxx : (x1, x2, dx) ∈ ∆V⟦τ⟧
DP⟦Γ, α⟧ = DP⟦Γ⟧, α, ∆α : α → α → ∗,
           Rα : Π(x1 : α) (x2 : α) (dx : (x1, x2) ∈ ∆₂⟦α⟧) → de

(f1, f2, df) ∈ ∆V⟦σ → τ⟧ =
  Π(x1 x2 : σ) (dx : (x1, x2) ∈ ∆₂⟦σ⟧) (dxx : (x1, x2, dx) ∈ ∆V⟦σ⟧).
  (f1 x1, f2 x2, df x1 x2 dx) ∈ ∆V⟦τ⟧

(x1, x2, dx) ∈ ∆V⟦α⟧ = Rα x1 x2 dx

The transformation from terms to proofs then matches the definition of typing contexts:

DP⟦x⟧ = dxx
DP⟦λ(x : σ) → t⟧ =
  λ(x1 x2 : σ) (dx : (x1, x2) ∈ ∆₂⟦σ⟧) (dxx : (x1, x2, dx) ∈ ∆V⟦σ⟧) →
    DP⟦t⟧
DP⟦s t⟧ = DP⟦s⟧ S₁⟦t⟧ S₂⟦t⟧ D⟦t⟧ DP⟦t⟧

This transformation produces a proof object in PTS λ2P, which is produced by augmenting λP following Bernardy and Lasson [2011]. The informal proof content of generated proofs follows the proof


of D⟦–⟧'s correctness (Theorem 12.2.2). For a variable x, we just use the assumption dxx : (x1, x2, dx) ∈ ∆V⟦τ⟧ that we have in context. For abstractions λx → t, we have to show that D⟦t⟧ is correct for all valid input changes x1, x2, dx and for all proofs dxx : (x1, x2, dx) ∈ ∆V⟦σ⟧ that dx is indeed a valid input change, so we bind all those variables in context, including proof dxx, and use DP⟦t⟧ recursively to prove that D⟦t⟧ is correct in the extended context. For applications s t, we use the proof that D⟦s⟧ is correct (obtained recursively via DP⟦s⟧): to show that D⟦s t⟧, that is D⟦s⟧ S₁⟦t⟧ S₂⟦t⟧ D⟦t⟧, is correct, we simply show that D⟦s⟧ is being applied to valid inputs, using the proof that D⟦t⟧ is correct (obtained recursively via DP⟦t⟧).

18.4 Combinators for polymorphic change structures

Earlier, we restricted our transformation on λ→ terms so that there could be a change dt from t1 to t2 only if t1 and t2 have the same type. In this section we lift this restriction and define polymorphic change structures (also called change structures when no ambiguity arises). To do so, we sketch how to extend core change-structure operations to polymorphic change structures.

We also show a few combinators that combine existing polymorphic change structures into new ones. We believe the combinator types are more enlightening than their implementations, but we include them here for completeness. We already described a few constructions on non-polymorphic change structures; however, polymorphic change structures enable new constructions that insert or remove data, or apply isomorphisms to the source or destination type.

We conjecture that combinators for polymorphic change structures can be used to compose, out of small building blocks, change structures that, for instance, allow inserting or removing elements from a recursive datatype such as lists. Derivatives for primitives could then be produced using equational reasoning, as described in Sec. 11.2. However, we leave investigation of such avenues to future work.

We describe change operations for polymorphic change structures via a Haskell record containing change operations, and we define combinators on polymorphic change structures. Of all change operations, we only consider ⊕; to avoid confusion, we write ⊞ for the polymorphic variant of ⊕.

data CS τ1 δτ τ2 = CS {(⊞) :: τ1 → δτ → τ2}

This code defines a record type constructor CS, with a data constructor also written CS, and a field accessor written (⊞). We use τ1, δτ, and τ2 (and later also σ1, δσ, and σ2) for Haskell type variables; to follow Haskell lexical rules, we use lowercase letters here (even though Greek ones).

We have not formalized definitions of validity, or proofs that it agrees with ⊞, but for all the change structures and combinators in this section, this exercise appears no harder than the ones in Chapter 13.

In Sec. 11.1 and 17.2, change structures are embedded in Haskell using a type class ChangeStruct τ. Conversely, here we do not define a type class of polymorphic change structures, because (apart from the simplest scenarios) Haskell type class resolution is unable to choose a canonical way to construct a polymorphic change structure using our combinators.

All existing change structures (that is, instances of ChangeStruct τ) induce generalized change structures CS τ (∆τ) τ:

typeCS :: ChangeStruct τ ⇒ CS τ (∆τ) τ
typeCS = CS (⊕)

We can also have change structures across different types. Replacement changes are possible:


replCS :: CS τ1 τ2 τ2
replCS = CS $ λx1 x2 → x2

But replacement changes are not the only option. For product types, or for any form of nested data, we can apply changes to the different components, changing the type of some components. We can also define a change structure for nullary products (the unit type), which can be useful as a building block in other change structures:

prodCS :: CS σ1 δσ σ2 → CS τ1 δτ τ2 → CS (σ1, τ1) (δσ, δτ) (σ2, τ2)
prodCS scs tcs = CS $ λ(s1, t1) (ds, dt) → ((⊞) scs s1 ds, (⊞) tcs t1 dt)

unitCS :: CS () () ()
unitCS = CS $ λ() () → ()

The ability to modify a field to one of a different type is also known in the Haskell community as polymorphic record update, a feature that has proven desirable in the context of lens libraries [O'Connor 2012; Kmett 2012].

We can also define a combinator sumCS for change structures on sum types, similarly to our earlier construction described in Sec. 11.6. This time, we choose to forbid changes across branches since they're inefficient, though we could support them as well if desired:

sumCS :: CS s1 ds s2 → CS t1 dt t2 →
         CS (Either s1 t1) (Either ds dt) (Either s2 t2)
sumCS scs tcs = CS go
  where
    go (Left s1) (Left ds) = Left $ (⊞) scs s1 ds
    go (Right t1) (Right dt) = Right $ (⊞) tcs t1 dt
    go _ _ = error "Invalid changes"

Given two change structures from τ1 to τ2, with respective change types δτa and δτb, we can also define a new change structure with change type Either δτa δτb, which allows using changes from either structure. We capture this construction through the combinator mSumCS, having the following signature:

mSumCS :: CS τ1 δτa τ2 → CS τ1 δτb τ2 → CS τ1 (Either δτa δτb) τ2
mSumCS acs bcs = CS go
  where
    go t1 (Left dt) = (⊞) acs t1 dt
    go t1 (Right dt) = (⊞) bcs t1 dt

This construction is possible for non-polymorphic change structures as well; we only need change structures to be first-class (instead of a type class) to be able to internalize this construction in Haskell.

Using the combinator lInsCS, we can describe updates going from type τ1 to type (σ, τ2), assuming a change structure from τ1 to τ2: that is, we can prepend a value of type σ to our data while we modify it. Similarly, the combinator lRemCS allows removing values:

lInsCS :: CS t1 dt t2 → CS t1 (s, dt) (s, t2)
lInsCS tcs = CS $ λt1 (s, dt) → (s, (⊞) tcs t1 dt)

lRemCS :: CS t1 dt (s, t2) → CS t1 dt t2
lRemCS tcs = CS $ λt1 dt → snd $ (⊞) tcs t1 dt

We can also transform change structures given suitable conversion functions:


lIsoCS :: (t1 → s1) → CS s1 dt t2 → CS t1 dt t2
mIsoCS :: (dt → ds) → CS t1 ds t2 → CS t1 dt t2
rIsoCS :: (s2 → t2) → CS t1 dt s2 → CS t1 dt t2
isoCS :: (t1 → s1) → (dt → ds) → (s2 → t2) → CS s1 ds s2 → CS t1 dt t2

To do so, we must only compose with the given conversion functions according to the types. Combinator isoCS arises by simply combining the other ones:

lIsoCS f tcs = CS $ λs1 dt → (⊞) tcs (f s1) dt
mIsoCS g tcs = CS $ λt1 ds → (⊞) tcs t1 (g ds)
rIsoCS h tcs = CS $ λt1 dt → h $ (⊞) tcs t1 dt
isoCS f g h scs = lIsoCS f $ mIsoCS g $ rIsoCS h scs

With a bit of datatype-generic programming infrastructure, we can reobtain, using only combinators, the change structure for lists shown in Sec. 11.3.3, which allows modifying list elements. The core definition is the following one:

listMuChangeCS :: CS a1 da a2 → CS (Listμ a1) (Listμ da) (Listμ a2)
listMuChangeCS acs = go
  where go = isoCS unRollL unRollL rollL $ sumCS unitCS $ prodCS acs go

The needed infrastructure appears in Fig. 18.1.

-- Our list type
data List a = Nil | Cons a (List a)

-- Equivalently, we can represent lists as a fixpoint of a sum-of-product pattern functor
data Mu f = Roll {unRoll :: f (Mu f)}
data ListF a x = L {unL :: Either () (a, x)}
type Listμ a = Mu (ListF a)

rollL :: Either () (a, Listμ a) → Listμ a
rollL = Roll ∘ L
unRollL :: Listμ a → Either () (a, Listμ a)
unRollL = unL ∘ unRoll

-- Isomorphism between List and Listμ
lToLMu :: List a → Listμ a
lToLMu Nil = rollL $ Left ()
lToLMu (Cons a as) = rollL $ Right (a, lToLMu as)

lMuToL :: Listμ a → List a
lMuToL = go ∘ unRollL
  where
    go (Left ()) = Nil
    go (Right (a, as)) = Cons a (lMuToL as)

-- A change structure for List
listChangesCS :: CS a1 da a2 → CS (List a1) (List da) (List a2)
listChangesCS acs = isoCS lToLMu lToLMu lMuToL $ listMuChangeCS acs

Figure 18.1: A change structure that allows modifying list elements.
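As a usage example (ours, not from the dissertation), combining listChangesCS with replCS gives a change structure whose changes replace list elements pointwise:

-- Changes carry one replacement per element of the source list.
replaceEach :: CS (List ℤ) (List ℤ) (List ℤ)
replaceEach = listChangesCS replCS

-- (⊞) replaceEach (Cons 1 (Cons 2 Nil)) (Cons 10 (Cons 20 Nil))
--   evaluates to Cons 10 (Cons 20 Nil)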


Section summary. We have defined polymorphic change structures and shown they admit a rich combinator language. Now we return to using these change structures for differentiating λ→ and System F.

18.5 Differentiation for System F

After introducing changes across different types, we can also generalize differentiation for λ→, so that it allows for the now generalized changes:

(f1, f2) ∈ ∆₂⟦σ → τ⟧ = Π(x1 : S₁⟦σ⟧) (x2 : S₂⟦σ⟧) (dx : (x1, x2) ∈ ∆₂⟦σ⟧).
                         (f1 x1, f2 x2) ∈ ∆₂⟦τ⟧
(x1, x2) ∈ ∆₂⟦α⟧ = ∆α x1 x2

D⟦x⟧ = dx
D⟦λ(x : σ) → t⟧ = λ(x1 : S₁⟦σ⟧) (x2 : S₂⟦σ⟧) (dx : (x1, x2) ∈ ∆₂⟦σ⟧) → D⟦t⟧
D⟦s t⟧ = D⟦s⟧ S₁⟦t⟧ S₂⟦t⟧ D⟦t⟧

D⟦ε⟧ = ε
D⟦Γ, x : τ⟧ = D⟦Γ⟧, x1 : S₁⟦τ⟧, x2 : S₂⟦τ⟧, dx : (x1, x2) ∈ ∆₂⟦τ⟧
D⟦Γ, α⟧ = D⟦Γ⟧, α1, α2, ∆α : α1 → α2 → ∗

By adding a few additional rules, we can extend differentiation to System F (the PTS λ2). We choose to present the additional rules using System F syntax rather than PTS syntax:

(f1, f2) ∈ ∆₂⟦∀α. T⟧ =
  Π(α1 : ∗) (α2 : ∗) (∆α : α1 → α2 → ∗). (f1 [α1], f2 [α2]) ∈ ∆₂⟦T⟧

D⟦Λα. t⟧ = λ(α1 α2 : ∗) (∆α : α1 → α2 → ∗) → D⟦t⟧

D⟦t [τ]⟧ = D⟦t⟧ S₁⟦τ⟧ S₂⟦τ⟧ ((–, –) ∈ ∆₂⟦τ⟧)

Produced terms use a combination of System F and dependent types, which is known as λP2 [Barendregt 1992] and is strongly normalizing. This PTS corresponds to second-order (intuitionistic) predicate logic and is part of Barendregt's lambda cube. λP2 does not admit types depending on types (that is, type-level functions), but admits all other forms of abstraction in the lambda cube (terms on terms, like λ→; terms on types, like System F; and types on terms, like LF (λP)).

Finally, we sketch a transformation producing proofs that differentiation is correct for System F:

DP⟦ε⟧ = ε
DP⟦Γ, x : τ⟧ = DP⟦Γ⟧, x1 : S₁⟦τ⟧, x2 : S₂⟦τ⟧,
               dx : (x1, x2) ∈ ∆₂⟦τ⟧, dxx : (x1, x2, dx) ∈ ∆V⟦τ⟧
DP⟦Γ, α⟧ = DP⟦Γ⟧, α1, α2, ∆α : α1 → α2 → ∗,
           Rα : Π(x1 : α1) (x2 : α2) (dx : (x1, x2) ∈ ∆₂⟦α⟧) → de

(f1, f2, df) ∈ ∆V⟦∀α. T⟧ =
  Π(α1 : ∗) (α2 : ∗) (∆α : α1 → α2 → ∗)
   (Rα : Π(x1 : α1) (x2 : α2) (dx : (x1, x2) ∈ ∆₂⟦α⟧) → de).
  (f1 [α1], f2 [α2], df [α1] [α2] [∆α]) ∈ ∆V⟦T⟧

(f1, f2, df) ∈ ∆V⟦σ → τ⟧ =
  Π(x1 : S₁⟦σ⟧) (x2 : S₂⟦σ⟧) (dx : (x1, x2) ∈ ∆₂⟦σ⟧) (dxx : (x1, x2, dx) ∈ ∆V⟦σ⟧).
  (f1 x1, f2 x2, df x1 x2 dx) ∈ ∆V⟦τ⟧

(x1, x2, dx) ∈ ∆V⟦α⟧ = Rα x1 x2 dx


DP⟦x⟧ = dxx
DP⟦λ(x : σ) → t⟧ =
  λ(x1 : S₁⟦σ⟧) (x2 : S₂⟦σ⟧) (dx : (x1, x2) ∈ ∆₂⟦σ⟧)
   (dxx : (x1, x2, dx) ∈ ∆V⟦σ⟧) →
    DP⟦t⟧
DP⟦s t⟧ = DP⟦s⟧ S₁⟦t⟧ S₂⟦t⟧ D⟦t⟧ DP⟦t⟧
DP⟦Λα. t⟧ =
  λ(α1 α2 : ∗) (∆α : α1 → α2 → ∗)
   (Rα : Π(x1 : α1) (x2 : α2) (dx : (x1, x2) ∈ ∆₂⟦α⟧) → de) →
    DP⟦t⟧
DP⟦t [τ]⟧ = DP⟦t⟧ S₁⟦τ⟧ S₂⟦τ⟧ ((–, –) ∈ ∆₂⟦τ⟧) ((–, –, –) ∈ ∆V⟦τ⟧)

Produced terms live in λ2P2, the logic produced by extending λP2 following Bernardy and Lasson. A variant producing proofs for a non-dependently-typed differentiation (as suggested earlier) would produce proofs in λ22, the logic produced by extending λ2 following Bernardy and Lasson.

18.6 Related work

Dependently-typed differentiation for System F, as given, coincides with the parametricity transformation for System F given by Bernardy, Jansson, and Paterson [2010, Sec. 3.1]. But our application is fundamentally different: for known types, Bernardy et al. only consider identity relations, while we can consider non-identity relations, as long as we assume that ⊕ is available for all types. What is more, Bernardy et al. do not consider changes, the update operator ⊕, or change structures across different types: changes are replaced by proofs of relatedness with no computational significance. Finally, non-dependently-typed differentiation (and its correctness proof) is novel here, as it makes limited sense in the context of parametricity, even though it is a small variant of Bernardy et al.'s parametricity transform.

18.7 Prototype implementation
We have written a prototype implementation of the above rules for a PTS presentation of System F, and verified it on a few representative examples of System F terms. We have built our implementation on top of an existing typechecker for PTSs.⁴

Since our implementation is defined in terms of a generic PTS, some of the rules presented above are unified and generalized in our implementation, suggesting the transformation might generalize to arbitrary PTSs. In the absence of further evidence on the correctness of this generalization, we leave a detailed investigation as future work.

18.8 Chapter conclusion
In this chapter we have sketched how to define and prove correct differentiation, following Bernardy and Lasson [2011]'s and Bernardy et al. [2010]'s work on parametricity by code transformation. We give no formal correctness proof, but the proofs appear mostly an extension of their methods. An important open point is Remark 18.2.1. We have implemented and tested differentiation in an existing mature PTS implementation, and verified it is type-correct on a few typical terms.

We leave further investigation as future work.

⁴https://github.com/Toxaris/pts by Tillmann Rendel, based on van Benthem Jutting et al. [1994]'s algorithm for PTS typechecking.

Chapter 19

Related work

Existing work on incremental computation can be divided into two groups: static incrementalization and dynamic incrementalization. Static approaches analyze a program statically and generate an incremental version of it. Dynamic approaches create dynamic dependency graphs while the program runs and propagate changes along these graphs.

The trade-off between the two is that static approaches have the potential to be faster, because no dependency tracking at runtime is needed, whereas dynamic approaches can support more expressive programming languages. ILC is a static approach, but compared to the other static approaches it supports more expressive languages.

In the remainder of this section, we analyze the relation to the most closely related prior works. Ramalingam and Reps [1993], Gupta and Mumick [1999] and Acar et al. [2006] discuss further related work. Other work more closely related to cache-transfer style is discussed in Sec. 17.6.

19.1 Dynamic approaches
One of the most advanced dynamic approaches to incrementalization is self-adjusting computation, which has been applied to Standard ML and large subsets of C [Acar 2009; Hammer et al. 2011]. In this approach, programs execute on the original input in an enhanced runtime environment that tracks the dependencies between values in a dynamic dependence graph [Acar et al. 2006]; intermediate results are memoized. Later, changes to the input propagate through dependency graphs from changed inputs to results, updating both intermediate and final results; this processing is often more efficient than recomputation.

However, creating dynamic dependence graphs imposes a large constant-factor overhead during runtime, ranging from 2 to 30 in reported experiments [Acar et al. 2009, 2010], and affecting the initial run of the program on its base input. Acar et al. [2010] show how to support high-level data types in the context of self-adjusting computation; however, the approach still requires expensive runtime bookkeeping during the initial run. Our approach, like other static ones, uses a standard runtime environment and has no overhead during base computation, but may be less efficient when processing changes. This pays off if the initial input is big compared to its changes.

Chen et al. [2011] have developed a static transformation for purely functional programs, but this transformation just provides a superior interface to use the runtime support with less boilerplate, and does not reduce this performance overhead. Hence it is still a dynamic approach, unlike the transformation this work presents.

Another property of self-adjusting computation is that incrementalization is only efficient if the program has a suitable computation structure. For instance, a program folding the elements of a bag with a left or right fold will not have efficient incremental behavior; instead, it's necessary that the fold be shaped like a balanced tree. In general, incremental computations become efficient only if they are stable [Acar 2005]. Hence one may need to massage the program to make it efficient. Our methodology is different: since we do not aim to incrementalize arbitrary programs written in standard programming languages, we can select primitives that have efficient derivatives and thereby require the programmer to use them.
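To make the shape requirement concrete, here is a minimal Haskell sketch (our own illustration, not self-adjusting computation's API): treeFold computes the same result as a left fold, but its balanced application tree means one changed element only affects O(log n) intermediate results, which is the kind of stability such systems need.

  import Data.List (foldl')

  -- A fold shaped like a balanced tree: for an associative operation,
  -- changing one input element invalidates only the O(log n) path of
  -- intermediate results from that leaf to the root, unlike a left
  -- fold, where it invalidates the whole chain of partial sums.
  treeFold :: (a -> a -> a) -> a -> [a] -> a
  treeFold _ z []  = z
  treeFold _ _ [x] = x
  treeFold f z xs  = treeFold f z (pairUp xs)
    where
      pairUp (x : y : rest) = f x y : pairUp rest
      pairUp rest           = rest

  main :: IO ()
  main = do
    print (foldl' (+) 0 [1 .. 10 :: Int])   -- 55, as a chain
    print (treeFold (+) 0 [1 .. 10 :: Int]) -- 55, as a balanced tree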

Functional reactive programming [Elliott and Hudak 1997] can also be seen as a dynamic approach to incremental computation; recent work by Maier and Odersky [2013] has focused on speeding up reactions to input changes by making them incremental on collections. Willis et al. [2008] use dynamic techniques to incrementalize JQL queries.

19.2 Static approaches
Static approaches analyze a program at compile-time and produce an incremental version that efficiently updates the output of the original program according to changing inputs.

Static approaches have the potential to be more efficient than dynamic approaches, because no bookkeeping at runtime is required. Also, the computed incremental versions can often be optimized using standard compiler techniques, such as constant folding or inlining. However, none of them support first-class functions; some approaches have further restrictions.

Our aim is to apply static incrementalization to more expressive languages; in particular, ILC supports first-class functions and an open set of base types with associated primitive operations.

Sundaresh and Hudak [1991] propose to incrementalize programs using partial evaluation, given a partitioning of program inputs in parts that change and parts that stay constant. Sundaresh and Hudak partially evaluate a given program relative to the constant input parts and combine the result with the changing inputs.

19.2.1 Finite differencing
Paige and Koenig [1982] present derivatives for a first-order language with a fixed set of primitives. Blakeley et al. [1986] apply these ideas to a class of relational queries. The database community extended this work to queries on relational data, such as in algebraic differencing [Gupta and Mumick 1999], which inspired our work and terminology. However, most of this work does not apply to nested collections or algebraic data types, but only to relational (flat) data, and no previous approach handles first-class functions or programs resulting from defunctionalization or closure conversion. Incremental support is typically designed monolithically for a whole language, rather than piecewise. Improving on algebraic differencing, DBToaster [Koch 2010; Koch et al. 2014] guarantees asymptotic speedups with a compositional query transformation and delivers huge speedups in realistic benchmarks, though still for a first-order database language.
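To convey the flavor of finite differencing on flat data, here is a toy Haskell sketch (names and representation are ours): a counting query over a bag, together with a delta version that processes only the change, in the self-maintainable style of algebraic differencing.

  import qualified Data.Map.Strict as M

  -- Bags and bag changes as multiplicity maps; in a change, positive
  -- multiplicities are insertions and negative ones are deletions.
  type Bag a    = M.Map a Int
  type Change a = M.Map a Int

  -- Query: how many elements satisfying p does the bag contain?
  count :: (a -> Bool) -> Bag a -> Int
  count p b = sum [ n | (k, n) <- M.toList b, p k ]

  -- Its finite difference: looks only at the change, not the whole bag.
  dcount :: (a -> Bool) -> Change a -> Int
  dcount p db = sum [ dn | (k, dn) <- M.toList db, p k ]

  apply :: Ord a => Bag a -> Change a -> Bag a
  apply b db = M.filter (/= 0) (M.unionWith (+) b db)

  main :: IO ()
  main = do
    let b  = M.fromList [(1, 2), (4, 1)] :: Bag Int
        db = M.fromList [(2, 1), (4, -1)]
    print (count even (apply b db))       -- recomputing from scratch
    print (count even b + dcount even db) -- updating the old result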

More general (non-relational) data types are considered in the work by Gluche et al. [1997]; our support for bags and the use of groups is inspired by their work, but their architecture is still rather restrictive: they lack support for function changes and restrict incrementalization to self-maintainable views, without hinting at a possible solution.

It seems possible to transform higher-order functional programs to database queries using a variety of approaches [Grust et al. 2009; Cheney et al. 2013], some of which support first-class functions via closure conversion [Grust and Ulrich 2013; Grust et al. 2013], and incrementalize the resulting programs using standard database technology. Such a solution would inherit the limitations of database incrementalization approaches; in particular, it appears that database incrementalization approaches such as DBToaster can handle the insertion and removal of entire table rows, not of smaller changes. Nevertheless, such an alternative approach might be worth investigating.


Unlike later approaches to higher-order differentiation, we do not restrict our base types to groups (unlike Koch et al. [2016]), and the transformed programs we produce do not require further runtime transformation (unlike Koch et al. [2016] and Huesca [2015]), as we discuss further next.

19.2.2 λ-diff and partial differentials
In work concurrent with Cai et al. [2014], Huesca [2015] introduces an alternative formalism, called λ-diff, for incremental computation by program transformation. While λ-diff has some appealing features, it currently appears to require program transformations at runtime. Other systems appear to share this feature [Koch et al. 2016]. Hence, this section discusses the reason in some detail.

Instead of differentiating a term t relative to all inputs (free variables and function arguments) via D⟦t⟧, like ILC, λ-diff differentiates terms relative to one input variable, and writes ∂t/∂x dx for the result of differentiating t relative to x: a term that computes the change in t when the value for x is updated by change dx. The formalism also uses pointwise function changes, similarly to what we discussed in Sec. 15.3.

Unfortunately, it is not known how to define such a transformation for a λ-calculus with a standard semantics, and the needed semantics appear to require runtime term transformations, which are usually considered problematic when implementing functional languages. In particular, it appears necessary to introduce a new term constructor D t, which evaluates t to a function value λy → u, and then evaluates to λ(y, dy) → ∂t/∂y dy, which differentiates t at runtime relative to its head variable y. As an indirect consequence, if the program under incrementalization contains a function term Γ ⊢ t : σ → τ, where Γ contains n free variables, it can be necessary in the worst case to differentiate t over any subset of the n free variables in Γ. There are 2ⁿ such subsets. To perform all term transformations before runtime, it seems hence necessary in the worst case to precompute 2ⁿ partial derivatives for each function term t, which appears unfeasible. On the other hand, it is not clear how often this worst case is realized, or how big n grows in typical programs, or if it is simply feasible to perform differentiation at runtime, similarly to JIT compilation. Overall, an efficient implementation of λ-diff remains an open problem. It appears Koch et al. [2016] also suffers from similar problems, but a few details appear simpler, since they restrict focus to functions over groups.

To see why λ-diff needs to introduce D t, consider differentiating ∂(s t)/∂x dx, that is the change d of s t when x is updated by change dx. Change d depends (a) on the change of t when x is updated by dx, that is ∂t/∂x dx; (b) on how s changes when its input t is updated by ∂t/∂x dx; to express this change, λ-diff writes (D s) t (∂t/∂x dx); (c) on the change of s (applied to the updated t) when x is updated by dx, that is ∂s/∂x dx. To compute component (b), λ-diff writes D s to differentiate s not relative to x, but relative to the still unknown head variable of s. If s evaluates to λy → u, then y is the head variable of s, and D s differentiates u relative to y and evaluates to λ(y, dy) → ∂u/∂y dy.

Overall, the rule for differentiating application in λ-diff is:

  ∂(s t)/∂x dx = (D s) t (∂t/∂x dx) ⊚ (∂s/∂x dx) (t ⊕ ∂t/∂x dx)

This rule appears closely related to Eq. (15.1), hence we refer to its discussion for clarification. On the other hand, differentiating a term relative to all its inputs introduces a different sort of overhead. For instance, it is much more efficient to differentiate map f xs relative to xs than relative to f: if f undergoes a non-nil change df, D⟦map f xs⟧ must apply df to each element in the updated input xs. Therefore, in our practical implementations, D⟦map f xs⟧ tests whether df is nil and uses a more efficient implementation. In Chapter 16 we detect at compile time whether df is guaranteed to be nil. In Sec. 17.4.2 we instead detect at runtime whether df is nil. In both cases, authors of derivatives must implement this optimization by hand. Instead, λ-diff hints at a more general solution.
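The following Haskell sketch (ours, with simplified replacement changes rather than the thesis's change structures) shows the shape of such a hand-written nil test in a derivative of map.

  -- Simplified changes: Nothing is the nil change, Just v replaces a value.
  type Change a = Maybe a

  oplus :: a -> Change a -> a
  oplus v Nothing   = v
  oplus _ (Just v') = v'

  -- A hand-written derivative of map for element-wise list changes.
  -- If the function change is nil, we only translate element changes;
  -- otherwise we must recompute every element of the updated input.
  dmap :: (a -> b) -> Change (a -> b) -> [a] -> [Change a] -> [Change b]
  dmap f Nothing   _  dxs = map (fmap f) dxs   -- cheap path: df is nil
  dmap _ (Just f') xs dxs =
    [ Just (f' (x `oplus` dx)) | (x, dx) <- zip xs dxs ]

  main :: IO ()
  main = do
    let xs  = [1, 2, 3 :: Int]
        dxs = [Nothing, Just 5, Nothing]  -- only the second element changes
        f   = (* 10)
    print (zipWith oplus (map f xs) (dmap f Nothing xs dxs))  -- [10,50,30]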


19.2.3 Static memoization
Liu's work [Liu 2000] allows incrementalizing a first-order base program f(x_old) to compute f(x_new), knowing how x_new is related to x_old. To this end, they transform f(x_new) into an incremental program which reuses the intermediate results produced while computing f(x_old), the base program. To this end, (i) first the base program is transformed to save all its intermediate results, then (ii) the incremental program is transformed to reuse those intermediate results, and finally (iii) intermediate results which are not needed are pruned from the base program. However, to reuse intermediate results, the incremental program must often be rearranged, using some form of equational reasoning, into some equivalent program where partial results appear literally. For instance, if the base program f uses a left fold to sum the elements of a list of integers x_old, accessing them from the head onwards, and x_new prepends a new element h to the list, at no point does f(x_new) recompute the same results. But since addition is commutative on integers, we can rewrite f(x_new) as f(x_old) + h. The author's CACHET system will try to perform such rewritings automatically, but it is not guaranteed to succeed. Similarly, CACHET will try to synthesize any additional results which can be computed cheaply by the base program, to help make the incremental program more efficient.
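Rendered as Haskell (our own illustration of the example above):

  -- Base program: a left fold summing a list, head onwards.
  f :: [Int] -> Int
  f = foldl (+) 0

  -- f (h : xs) shares no literal partial result with f xs, because the
  -- left fold accumulates from the head; but since + is commutative and
  -- associative we may rewrite f (h : xs) into f xs + h, reusing the
  -- saved base result. Finding such rewritings is what CACHET attempts.
  fInc :: Int   -- cached result of f xs
       -> Int   -- newly prepended element h
       -> Int
  fInc fOld h = fOld + h

  main :: IO ()
  main = do
    let xs = [1 .. 1000]
        h  = 42
    print (f (h : xs))      -- from scratch: 500542
    print (fInc (f xs) h)   -- incremental reuse: 500542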

Since it is hard to fully automate such reasoning, we move equational reasoning to the plugin design phase. A plugin provides general-purpose higher-order primitives for which the plugin authors have devised efficient derivatives (by using equational reasoning in the design phase). Then, the differentiation algorithm computes incremental versions of user programs without requiring further user intervention. It would be useful to combine ILC with some form of static caching to make the computation of derivatives which are not self-maintainable more efficient. We plan to do so in future work.

Chapter 20

Conclusion and future work

In databases, standard finite differencing technology allows incrementalizing programs in a specific domain-specific first-order language of collection operations. Unlike other approaches, finite differencing transforms programs to programs, which can in turn be transformed further.

Differentiation (Chapter 10) transforms higher-order programs to incremental ones, called derivatives, as long as one can provide incremental support for primitive types and operations.

Case studies in this thesis consider support for several such primitives. We first study a setting restricted to self-maintainable derivatives (Chapter 16). Then we study a more general setting, where it is possible to remember inputs and intermediate results from one execution to the next, thanks to a transformation to cache-transfer style (Chapter 17). In all cases, we are able to deliver order-of-magnitude speedups on non-trivial examples; moreover, incrementalization produces programs in standard languages that are subject to further optimization and code transformation.

Correctness of incrementalization appeared initially surprising, so we devoted significant effort to formal correctness proofs, either on paper or in mechanized proof assistants. The original correctness proof of differentiation, using a set-theoretic denotational semantics [Cai et al. 2014], was a significant milestone, but since then we have simplified the proof to a logical relations proof, that defines when a change is valid from a source to a destination, and proves that differentiation produces valid changes (Chapter 12). Valid changes witness the difference between sources and destinations. Since changes can be nil, change validity arises as a generalization of the concept of logical equivalence and parametricity for a language (at least in terminating languages) (Chapter 18). Crucially, changes are not just witnesses: operation ⊕ takes a change and its source to the change destination. One can consider further operations, that give rise to an algebraic structure that we call change structure (Chapter 13).
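In Haskell-like notation, the core of such an algebraic structure might be rendered as follows (a sketch with our own names; the thesis develops change structures formally in Chapter 13):

  {-# LANGUAGE TypeFamilies #-}

  -- A change structure over v: a type of changes Delta v, an update
  -- operator oplus, a difference operator ominus and nil changes.
  -- Intended laws (not encoded here):
  --   v `oplus` (u `ominus` v) == u   and   v `oplus` nil v == v.
  class ChangeStruct v where
    type Delta v
    oplus  :: v -> Delta v -> v
    ominus :: v -> v -> Delta v  -- u `ominus` v: a change from v to u
    nil    :: v -> Delta v
    nil v = v `ominus` v

  -- Integers with additive changes.
  instance ChangeStruct Int where
    type Delta Int = Int
    oplus  = (+)
    ominus = (-)

  main :: IO ()
  main = print ((10 :: Int) `oplus` (13 `ominus` (10 :: Int)))  -- 13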

Based on this simplified proof, in this thesis we generalize correctness to further languages, using big-step operational semantics and (step-indexed) logical relations (Appendix C). Using operational semantics, we reprove correctness for simply-typed λ-calculus, then add support for recursive functions (which would require domain-theoretic methods when using denotational semantics), and finally extend the proof to untyped λ-calculus. Building on a variant of this proof (Sec. 17.3.3), we show that conversion to cache-transfer style also preserves correctness.

Based on a different formalism for logical equivalence and parametricity [Bernardy and Lasson 2011], we sketch variants of the transformation and correctness proofs for simply-typed λ-calculus with type variables, and then for full System F (Chapter 18).

Future work. To extend differentiation to System F, we must consider changes where source and destination have different types. This generalization of changes makes change operations much more flexible: it becomes possible to define a combinator language for change structures, and it appears possible to define nontrivial change structures for algebraic datatypes using this combinator language.

We conjecture such a combinator language will allow programmers to define correct change structures out of simple reusable components.

Incrementalizing primitives correctly remains at present a significant challenge. We provide tools to support this task by formalizing equational reasoning, but it appears necessary to provide more tools to programmers, as done by Liu [2000]. Conceivably, it might be possible to build such a rewriting tool on top of the rewriting and automation support of theorem provers such as Coq.

Appendixes


Appendix A

Preliminaries: syntax and semantics of simply-typed λ-calculus

To discuss how we incrementalize programs and prove that our incrementalization technique gives correct results, we specify which foundation we use for our proofs and what object language we study throughout most of Part II.

We mechanize our correctness proof using Agda, hence we use Agda's underlying type theory as our foundation. We discuss what this means in Appendix A.1.

Our object language is a standard simply-typed λ-calculus (STLC) [Pierce 2002, Ch. 9], parameterized over base types and constants. We term the set of base types and constants a language plugin (see Appendix A.2.5). In our examples, we assume that the language plugin supports needed base types and constants. Later (e.g. in Chapter 12), we add further requirements to language plugins, to support incrementalization of the language features they add to our STLC. Rather than operational semantics, we use a denotational semantics, which is however set-theoretic rather than domain-theoretic. Our object language and its semantics are summarized in Fig. A.1.

At this point, readers might want to skip to Chapter 10 right away, or focus on denotational semantics, and refer to this section as needed.

A.1 Our proof meta-language
In this section we describe the logic (or meta-language) used in our mechanized correctness proof.

First, as usual, we distinguish between "formalization" (that is, on-paper formalized proofs) and "mechanization" (that is, proofs encoded in the language of a proof assistant for computer-aided mechanized verification).

To prove the correctness of ILC, we provide a mechanized proof in Agda [Agda Development Team 2013]. Agda implements intensional Martin-Löf type theory (from now on, simply type theory), so type theory is also the foundation of our proofs.

At times, we use conventional set-theoretic language to discuss our proofs, but the differences are only superficial. For instance, we might talk about a set of elements S, with elements such as s ∈ S, though we always mean that S is a metalanguage type, that s is a metalanguage value, and that s : S. Talking about sets avoids ambiguity between types of our meta-language and types of our object-language (that we discuss next in Appendix A.2).



Notation A.1.1. We'll let uppercase Latin letters A, B, C, V, U range over sets, never over types.

We do not prove correctness of all our language plugins. However, in earlier work [Cai et al. 2014], we prove correctness for a language plugin supporting bags, a type of collection (described in Sec. 10.2). For that proof, we extend our logic by postulating a few standard axioms on the implementation of bags, to avoid proving correct an implementation of bags, or needing to account for different values representing the same bag (such different values typically arise when implementing bags as search trees).

A.1.1 Type theory versus set theory

Here we summarize a few features of type theory over set theory. Type theory is dependently typed, so it generalizes function type A → B to dependent function type (x : A) → B, where x can appear free in B. Such a type guarantees that if we apply a function f : (x : A) → B to an argument a : A, the result has type B[x = a], that is B where x is substituted by a. At times we will use dependent types in our presentation.

Moreover, by using type theory:

• We do not postulate the law of excluded middle; that is, our logic is constructive.

• Unlike set theory, type theory is proof-relevant: that is, proofs are first-class mathematical objects.

• Instead of subsets {x ∈ A | P(x)}, we must use Σ-types Σ(x : A). P(x), which contain pairs of elements x and proofs they satisfy predicate P.

• In set theory, we can assume without further ado functional extensionality, that is, that functions that give equal results on all equal inputs are equal themselves. Intuitionistic type theory does not prove functional extensionality, so we need to add it as a postulate. In Agda, this postulate is known to be consistent [Hofmann 1996], hence it is safe to assume.¹

To handle binding issues in our object language, our formalization uses typed de Bruijn indexes, because this technique takes advantage of Agda's support for type refinement in pattern matching. On top of that, we implement a HOAS-like frontend, which we use for writing specific terms.

A.2 Simply-typed λ-calculus
We consider as object language a strongly-normalizing simply-typed λ-calculus (STLC). We choose STLC as it is the simplest language with first-class functions and types, while being a sufficient model of realistic total languages.² We recall the syntax and typing rules of STLC in Figs. A.1a and A.1b, together with the metavariables we use. Language plugins define base types ι and constants c. Types can be base types ι or function types σ → τ. Terms can be constants c, variables x, function applications t₁ t₂ or λ-abstractions λ(x : σ) → t. To describe assumptions on variable types when typing terms, we define (typing) contexts Γ as being either empty ε, or as context extensions Γ, x : τ, which extend context Γ by asserting variable x has type τ. Typing is defined through a judgment

¹http://permalink.gmane.org/gmane.comp.lang.agda/2343
²To know why we restrict to total languages, see Sec. 15.1.


(a) Syntax:

  ι ::= ...                              (base types)
  σ, τ ::= ι | τ → τ                     (types)
  Γ ::= ε | Γ, x : τ                     (typing contexts)
  c ::= ...                              (constants)
  s, t ::= c | λ(x : σ) → t | t t | x    (terms)

(b) Typing (judgement Γ ⊢ t : τ):

  ⊢C c : τ
  ─────────── Const
  Γ ⊢ c : τ

  ──────────────────────── Lookup
  Γ₁, x : τ, Γ₂ ⊢ x : τ

  Γ, x : σ ⊢ t : τ
  ──────────────────────────── Lam
  Γ ⊢ λ(x : σ) → t : σ → τ

  Γ ⊢ s : σ → τ    Γ ⊢ t : σ
  ──────────────────────────── App
  Γ ⊢ s t : τ

(c) Values:

  ⟦ι⟧ = ...
  ⟦σ → τ⟧ = ⟦σ⟧ → ⟦τ⟧

(d) Environments:

  ⟦ε⟧ = {ε}
  ⟦Γ, x : τ⟧ = {(ρ, x = v) | ρ ∈ ⟦Γ⟧ ∧ v ∈ ⟦τ⟧}

(e) Denotational semantics:

  ⟦c⟧ρ = ⟦c⟧C
  ⟦λ(x : σ) → t⟧ρ = λ(v : ⟦σ⟧) → ⟦t⟧(ρ, x = v)
  ⟦s t⟧ρ = (⟦s⟧ρ) (⟦t⟧ρ)
  ⟦x⟧ρ = lookup x in ρ

Figure A.1: Standard definitions for the simply-typed lambda calculus.

Γ ⊢ t : τ, stating that term t under context Γ has type τ.³ For a proper introduction to STLC, we refer the reader to Pierce [2002, Ch. 9]. We will assume significant familiarity with it.
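As a concrete rendering, the following Haskell datatype transcribes this syntax (a sketch of ours with named variables; the mechanization instead uses typed de Bruijn indexes):

  -- Object types: a sample base type and function types σ → τ.
  data Type = TNat | Type :-> Type
    deriving (Eq, Show)

  infixr 5 :->

  -- Terms: constants, variables, applications and λ-abstractions.
  data Term
    = Const Int
    | Var String
    | App Term Term
    | Lam String Type Term   -- λ(x : σ) → t
    deriving Show

  -- Typing contexts Γ: either empty, or extended with x : τ.
  type Context = [(String, Type)]

  -- Example: λ(f : ℕ → ℕ) → λ(x : ℕ) → f x
  example :: Term
  example = Lam "f" (TNat :-> TNat) (Lam "x" TNat (App (Var "f") (Var "x")))

  main :: IO ()
  main = print example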

An extensible syntax of types. In fact, the definition of base types can be mutually recursive with the definition of types. So a language plugin might add as base types, for instance, collections of elements of type τ, products and sums of type σ and type τ, and so on. However, this mutual recursion must satisfy a few technical restrictions to avoid introducing subtle inconsistencies, and Agda cannot enforce these restrictions across modules. Hence, if we define language plugins as separate modules in our mechanization, we need to verify by hand that such restrictions are satisfied (which they are). See Appendix B.1 for the gory details.

Notation A.2.1. We typically omit type annotations on λ-abstractions, that is, we write λx → t rather than λ(x : σ) → t. Such type annotations can often be inferred from context (or type inference). Nevertheless,

³We only formalize typed terms, not untyped ones, so that each term has a unique type. That is, in the relevant jargon, we use Church-style typing, as opposed to Curry-style typing. Alternatively, we use an intrinsically-typed term representation. In fact, arguably we mechanize at once both well-typed terms and their typing derivations. This is even more clear in our mechanization; see discussion in Appendix A.2.4.


whenever we discuss terms of shape λx → t, we're in fact discussing λ(x : σ) → t for some arbitrary σ. We write λx → t instead of λx. t for consistency with the notation we use later for Haskell programs.

We often omit ε from typing contexts with some assumptions. For instance, we write x : τ₁, y : τ₂ instead of ε, x : τ₁, y : τ₂.

We overload symbols (often without warning) when they can be disambiguated from context, especially when we can teach modern programming languages to disambiguate such overloadings. For instance, we reuse → for lambda abstractions λx → t, function spaces A → B and function types σ → τ, even though the first → is just a separator.

Extensions. In our examples, we will use some unproblematic syntactic sugar over STLC, including let expressions, global definitions and type inference, and we will use a Haskell-like concrete syntax. In particular, when giving type signatures or type annotations in Haskell snippets, we will use :: to separate terms or variables from their types, rather than : as in λ-calculus. To avoid confusion, we never use :: to denote the constructor for Haskell lists.

At times, our concrete examples will use Hindley-Milner (prenex) polymorphism, but this is also not such a significant extension. A top-level definition using prenex polymorphism, that is of type ∀α. τ (where α is free in τ), can be taken as sugar for a metalevel family of object-level programs, indexed by type argument τ₁, of definitions of type τ[α = τ₁]. We use this trick without explicit mention in our first implementation of incrementalization in Scala [Cai et al. 2014].

A.2.1 Denotational semantics for STLC
To prove that incrementalization preserves the semantics of our object-language programs, we define a semantics for STLC. We use a naive set-theoretic denotational semantics. Since STLC is strongly normalizing [Pierce 2002, Ch. 12], its semantics need not handle partiality. Hence, we can use denotational semantics but eschew using domain theory, and simply use sets from the metalanguage (see Appendix A.1). Likewise, we can use normal functions as domains for function types.

We first associate to every type τ a set of values ⟦τ⟧, so that terms of a type τ evaluate to values in ⟦τ⟧. We call set ⟦τ⟧ a domain. Domains associated to types τ depend on the domains associated to base types ι, that must be specified by language plugins (Plugin Requirement A.2.10).
Definition A.2.2 (Domains and values). The domain ⟦τ⟧ of a type τ is defined as in Fig. A.1c. A value is a member of a domain.

We let metavariables u, v, a, b, ... range over members of domains; we tend to use v for generic values and a for values we use as a function argument. We also let metavariables f, g, ... range over values in the domain for a function type. At times, we might also use metavariables f, g, ... to range over terms of function types; the context will clarify what is intended.

Given this domain construction, we can now define a denotational semantics for terms. The plugin has to provide the evaluation function for constants. In general, the evaluation function ⟦t⟧ computes the value of a well-typed term t, given the values of all free variables in t. The values of the free variables are provided in an environment.
Definition A.2.3 (Environments). An environment ρ assigns values to the names of free variables.

ρ ::= ε | ρ, x = v

We write ⟦Γ⟧ for the set of environments that assign values to the names bound in Γ (see Fig. A.1d).


Notation. We often omit ε from environments with some assignments. For instance, we write x = v₁, y = v₂ instead of ε, x = v₁, y = v₂.
Definition A.2.4 (Evaluation). Given Γ ⊢ t : τ, the meaning of t is defined by the function ⟦t⟧ of type ⟦Γ⟧ → ⟦τ⟧ in Fig. A.1e.

This is a standard denotational semantics of the simply-typed λ-calculus. For each constant c : τ, the plugin provides ⟦c⟧C : ⟦τ⟧, the semantics of c (by Plugin Requirement A.2.11); since constants don't contain free variables, ⟦c⟧C does not depend on an environment. We define a program equivalence across terms of the same type, t₁ ≃ t₂, to mean ⟦t₁⟧ = ⟦t₂⟧.
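In Haskell, the corresponding definitional interpreter over a universal value domain might look as follows (a sketch of ours; the Agda mechanization instead uses type-indexed domains, so no error cases arise):

  import qualified Data.Map as M

  data Term = Const Int | Var String | App Term Term | Lam String Term

  -- A universal domain standing in for the type-indexed sets ⟦τ⟧.
  data Value = VNat Int | VFun (Value -> Value)

  type Env = M.Map String Value   -- environments ρ ∈ ⟦Γ⟧

  -- ⟦t⟧ρ, clause by clause as in Fig. A.1e.
  eval :: Term -> Env -> Value
  eval (Const n) _   = VNat n
  eval (Var x)   env = env M.! x                 -- lookup x in ρ
  eval (Lam x t) env = VFun (\v -> eval t (M.insert x v env))
  eval (App s t) env = case eval s env of
    VFun f -> f (eval t env)
    _      -> error "ill-typed application"

  main :: IO ()
  main = case eval (App (Lam "x" (Var "x")) (Const 5)) M.empty of
    VNat n -> print n   -- 5
    _      -> error "unexpected result"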

Definition A.2.5 (Denotational equivalence). We say that two terms Γ ⊢ t₁ : τ and Γ ⊢ t₂ : τ are denotationally equal, and write Γ ⊨ t₁ ≃ t₂ : τ (or sometimes t₁ ≃ t₂), if for all environments ρ : ⟦Γ⟧ we have that ⟦t₁⟧ρ = ⟦t₂⟧ρ.

Remark A.2.6. Beware that denotational equivalence cannot always be strengthened by dropping unused variables: that is, Γ, x : σ ⊨ t₁ ≃ t₂ : τ does not imply Γ ⊨ t₁ ≃ t₂ : τ, even if x does not occur free in either t₁ or t₂. Counterexamples rely on σ being an empty type. For instance, we cannot weaken x : 0τ ⊨ 0 ≃ 1 : ℤ (where 0τ is an empty type): this equality is only true vacuously, because there exists no environment for context x : 0τ.

A.2.2 Weakening
While we don't discuss our formalization of variables in full, in this subsection we discuss briefly weakening on STLC terms, and state as a lemma that weakening preserves meaning. This lemma is needed in a key proof, the one of Theorem 12.2.2.

As usual, if a term t is well-typed in a given context Γ₁, and context Γ₂ extends Γ₁ (which we write as Γ₁ ≼ Γ₂), then t is also well-typed in Γ₂.
Lemma A.2.7 (Weakening is admissible). The following typing rule is admissible:

  Γ₁ ⊢ t : τ    Γ₁ ≼ Γ₂
  ────────────────────── Weaken
  Γ₂ ⊢ t : τ

Weakening also preserves semantics. If a term t is typed in context Γ₁, evaluating it requires an environment matching Γ₁. So if we weaken t to a bigger context Γ₂, evaluation requires an extended environment matching Γ₂, and is going to produce the same result.
Lemma A.2.8 (Weakening preserves meaning). Take Γ₁ ⊢ t : τ and ρ₁ : ⟦Γ₁⟧. If Γ₁ ≼ Γ₂ and ρ₂ : ⟦Γ₂⟧ extends ρ₁, then we have that

⟦t⟧ρ₁ = ⟦t⟧ρ₂.

Mechanizing these statements and their proofs requires some care. We have a meta-level type Term Γ τ of object terms having type τ in context Γ. Evaluation has type ⟦–⟧ : Term Γ τ → ⟦Γ⟧ → ⟦τ⟧, so ⟦t⟧ρ₁ = ⟦t⟧ρ₂ is not directly well-typed. To remedy this, we define formally the subcontext relation Γ₁ ≼ Γ₂, and an explicit operation that weakens a term in context Γ₁ to a corresponding term in bigger context Γ₂, weaken : Γ₁ ≼ Γ₂ → Term Γ₁ τ → Term Γ₂ τ. We define the subcontext relation Γ₁ ≼ Γ₂ as a judgment, using order-preserving embeddings.⁴ We refer to our mechanized proof for details, including auxiliary definitions and relevant lemmas.

⁴As mentioned by James Chapman at https://lists.chalmers.se/pipermail/agda/2011/003423.html, who attributes them to Conor McBride.


A.2.3 Substitution
Some facts can be presented using (capture-avoiding) substitution rather than environments, and we do so at some points, so let us fix notation. We write t[x = s] for the result of substituting variable x in term t by term s.

We have mostly avoided mechanizing proofs about substitution, but we have mechanized substitution following Keller and Altenkirch [2010], and proved the following substitution lemma.
Lemma A.2.9 (Substitution lemma). For any term Γ ⊢ t : τ and variable x : σ bound in Γ, we write Γ − x for the result of removing variable x from Γ (as defined by Keller and Altenkirch). Take term Γ − x ⊢ s : σ and environment ρ : ⟦Γ − x⟧. Then we have that substitution and evaluation commute as follows:

⟦t[x = s]⟧ρ = ⟦t⟧(ρ, x = ⟦s⟧ρ)

A.2.4 Discussion: Our mechanization and semantic style
To formalize the meaning of our programs, we use denotational semantics, while nowadays most prefer operational semantics, in particular small-step. Hence, we next justify our choice and discuss related work.

We expect we could use other semantics techniques, such as big-step or small-step semantics. But at least for such a simple object language, working with denotational semantics as we use it is easier than other approaches in a proof assistant, especially in Agda:

• Our semantics ⟦–⟧ is a function, and not a relation like in small-step or big-step semantics.

• It is clear to Agda that our semantics is a total function, since it is structurally recursive.

• Agda can normalize ⟦–⟧ on partially-known terms when normalizing goals.

• The meanings of our programs are well-behaved Agda functions, not syntax, so we know "what they mean" and need not prove any lemmas about it. We need not prove, say, that evaluation is deterministic.

In Agda, the domains for our denotational semantics are simply Agda types, and semantic values are Agda values; in other words, we give a denotational semantics in terms of type theory. Using denotational semantics allows us to state the specification of differentiation directly in the semantic domain, and take advantage of Agda's support for equational reasoning for proving equalities between Agda functions.

Related work. Our variant is used for instance by McBride [2010], who attributes it to Augustsson and Carlsson [1999] and Altenkirch and Reus [1999]. In particular, Altenkirch and Reus [1999] already define our type Term Γ τ of simply-typed λ-terms t, well-typed with type τ in context Γ, while Augustsson and Carlsson [1999] define semantic domains by induction over types. Benton et al. [2012] and Allais et al. [2017] also discuss this approach to formalizing λ-terms, and discuss how to best prove various lemmas needed to reason, for instance, about substitution.

More generally, similar approaches are becoming more common when using proof assistants. Our denotational semantics could be otherwise called a definitional interpreter (which is in particular compositional), and mechanized formalizations using a variety of definitional interpreters are nowadays often advocated, either using denotational semantics [Chlipala 2008] or using functional big-step semantics. Functional semantics are so convenient that their use has been advocated even for languages that are not strongly normalizing [Owens et al. 2016; Amin and Rompf 2017], even at the cost of dealing with step-indexes.


A.2.5 Language plugins
Our object language is parameterized by language plugins (or just plugins) that encapsulate its domain-specific aspects.

In our examples, our language plugin will typically support integers and primitive operations on them. However, our plugin will also support various sorts of collections and base operations on them. Our first example of collection will be bags. Bags are unordered collections (like sets) where elements are allowed to appear more than once (unlike in sets), and they are also called multisets.

Our formalization is parameterized over one language plugin providing all base types and primitives. In fact, we expect a language plugin to be composed out of multiple language plugins, merged together [Erdweg et al. 2012]. Our mechanization is mostly parameterized over language plugins, but see Appendix B.1 for discussion of a few limitations.

The sets of base types and primitive constants, as well as the types for primitive constants, are on purpose left unspecified and only defined by plugins: they are extension points. We write some extension points using ellipses ("...") and other ones by creating names, which typically use C as a superscript.

A plugin defines a set of base types ι, primitives c and their denotational semantics ⟦c⟧C. As usual, we require that ⟦c⟧C : ⟦τ⟧ whenever c : τ.

Summary. To sum up the discussion of plugins, we collect formally the plugin requirements we have mentioned in this chapter.

Plugin Requirement A.2.10 (Base types). There is a set of base types ι, and for each there is a domain ⟦ι⟧.

Plugin Requirement A.2.11 (Constants). There is a set of constants c. To each constant is associated a type τ, such that the constant has that type, that is ⊢C c : τ, and the constant's semantics matches that type, that is ⟦c⟧C : ⟦τ⟧.
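One way to transcribe these two requirements in Haskell (hypothetical names; the mechanization passes them as Agda module parameters) is as a record of operations:

  -- A language plugin packages base types with their domains, and
  -- constants with their types and semantics (Plugin Requirements
  -- A.2.10 and A.2.11), here in dynamically-typed style for brevity.
  data Dom = DInt Int | DBag [Int]   -- sample domains ⟦ι⟧
    deriving Show

  data Plugin ty con = Plugin
    { typeOfConst :: con -> ty    -- ⊢C c : τ
    , semOfConst  :: con -> Dom   -- ⟦c⟧C, expected to inhabit ⟦τ⟧
    }

  data BaseTy = TyInt | TyBag deriving Show
  data Const  = LitInt Int | EmptyBag

  samplePlugin :: Plugin BaseTy Const
  samplePlugin = Plugin
    { typeOfConst = \c -> case c of { LitInt _ -> TyInt; EmptyBag -> TyBag }
    , semOfConst  = \c -> case c of { LitInt n -> DInt n; EmptyBag -> DBag [] }
    }

  main :: IO ()
  main = print (semOfConst samplePlugin (LitInt 3))  -- DInt 3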

After discussing the metalanguage of our proofs, the object language we study, and its semantics, we begin discussing incrementalization in the next chapter.


Appendix B

More on our formalization

B.1 Mechanizing plugins modularly and limitations
Next we discuss our mechanization of language plugins in Agda, and its limitations. For the concerned reader, we can say these limitations affect essentially how modular our proofs are, and not their cogency.

In essence, it's not possible to express in Agda the correct interface for language plugins, so some parts of language plugins can't be modularized as desirable. Alternatively, we can mechanize any fixed language plugin together with the main formalization, which is not truly modular; or mechanize a core formulation parameterized on a language plugin, which runs into a few limitations; or encode plugins so they can be modularized, and deal with encoding overhead.

This section requires some Agda knowledge not provided here, but we hope that readers familiar with both Haskell and Coq will be able to follow along.

Our mechanization is divided into multiple Agda modules. Most modules have definitions that depend on language plugins, directly or indirectly. Those take definitions from language plugins as module parameters.

For instance, STLC object types are formalized through Agda type Type, defined in module Parametric.Syntax.Type. The latter is parameterized over Base, the type of base types.

For instance, Base can support a base type of integers, and a base type of bags of elements of type ι (where ι : Base). Simplifying a few details, our definition is equivalent to the following Agda code:

  module Parametric.Syntax.Type (Base : Set) where
    data Type : Set where
      base : (ι : Base) → Type
      _⇒_ : (σ : Type) → (τ : Type) → Type

  -- Elsewhere, in the plugin:
  data Base : Set where
    baseInt : Base
    baseBag : (ι : Base) → Base
  -- ...

But with these definitions, we only have bags of elements of base type. If ι is a base type, base (baseBag ι) is the type of bags with elements of type base ι. Hence we have bags of elements of base type. But we don't have a way to encode Bag τ if τ is an arbitrary non-base type, such as base baseInt ⇒ base baseInt (the encoding of object type ℤ → ℤ). Can we do better? If we ignore modularization, we can define types through the following mutually recursive datatypes:

  mutual
    data Type : Set where
      base : (ι : Base) → Type
      _⇒_ : (σ : Type) → (τ : Type) → Type

    data Base : Set where
      baseInt : Base
      baseBag : (ι : Type) → Base

So far so good, but these types have to be defined together. We can go a step further by defining:

  mutual
    data Type : Set where
      base : (ι : Base Type) → Type
      _⇒_ : (σ : Type) → (τ : Type) → Type

    data Base (Type : Set) : Set where
      baseInt : Base Type
      baseBag : (ι : Type) → Base Type

Here, Base takes the type of object types as a parameter, and Type uses Base Type to tie the recursion. This definition still works, but only as long as Base and Type are defined together.

If we try to separate the definitions of Base and Type into different modules, we run into trouble:

  module Parametric.Syntax.Type (Base : Set → Set) where
    data Type : Set where
      base : (ι : Base Type) → Type
      _⇒_ : (σ : Type) → (τ : Type) → Type

  -- Elsewhere, in the plugin:
  data Base (Type : Set) : Set where
    baseInt : Base Type
    baseBag : (ι : Type) → Base Type

Here, Type is defined for an arbitrary function on types Base : Set → Set. However, this definition is rejected by Agda's positivity checker. Like Coq, Agda forbids defining datatypes that are not strictly positive, as they can introduce inconsistencies.

The above definition of Type is not strictly positive, because we could pass to it as argument Base = λτ → (τ → τ), so that Base Type = Type → Type, making Type occur in a negative position. However, the actual uses of Base we are interested in are fine. The problem is that we cannot inform the positivity checker that Base is supposed to be a strictly positive type function, because Agda doesn't supply the needed expressivity.

This problem is well-known. It could be solved if Agda function spaces supported positivity annotations,¹ or by encoding a universe of strictly-positive type constructors. This encoding is not fully transparent, and adds hard-to-hide noise to the development [Schwaab and Siek 2013].² Few alternatives remain:

• We can forbid types from occurring in base types, as we did in our original paper [Cai et al. 2014]. There we did not discuss at all a recursive definition of base types.

¹As discussed in https://github.com/agda/agda/issues/570 and https://github.com/agda/agda/issues/2411.
²Using pattern synonyms and DISPLAY pragmas might successfully hide this noise.


• We can use the modular mechanization, disable positivity checking, and risk introducing inconsistencies. We tried this successfully in a branch of that formalization. We believe we did not introduce inconsistencies in that branch, but have no hard evidence.

• We can simply combine the core formalization with a sample plugin. This is not truly modular, because the core modularization can only be reused by copy-and-paste. Moreover, in dependently-typed languages the content of a definition can affect whether later definitions typecheck, so alternative plugins using the same interface might not typecheck.³

Sadly, giving up on modularity altogether appears the more conventional choice. Either way, as we claimed at the outset, these modularity concerns only limit the modularity of the mechanized proofs, not their cogency.

³Indeed, if Base were not strictly positive, the application Type Base would be ill-typed, as it would fail positivity checking, even though Base : Set → Set does not require Base to be strictly positive.


Appendix C

(Un)typed ILC operationally

In Chapter 12 we have proved ILC correct for a simply-typed λ-calculus. What about other languages, with more expressive type systems, or no type system at all?

In this chapter, we prove that ILC is still correct in untyped call-by-value (CBV) λ-calculus. We do so without using denotational semantics, but using only an environment-based big-step operational semantics and step-indexed logical relations. The formal development in this chapter stands alone from the rest of the thesis, though we do not repeat ideas present elsewhere.

We prove ILC correct using, in increasing order of complexity:

1. STLC and standard syntactic logical relations;

2. STLC and step-indexed logical relations;

3. an untyped λ-calculus and step-indexed logical relations.

We have fully mechanized the second proof in Agda,¹ and done the others on paper. In all cases, we prove the fundamental property for validity; we detail later which corollaries we prove in which case. The proof for untyped λ-calculus is the most interesting, but the others can serve as stepping stones. Yann Régis-Gianas, in collaboration with me, recently mechanized a similar proof for untyped λ-calculus in Coq, which appears in Sec. 17.3.²

Using operational semantics and step-indexed logical relations simplifies extending the proofs to more expressive languages, where denotational semantics or other forms of logical relations would require more sophistication, as argued by Ahmed [2006].

Proofs by (step-indexed) logical relations also promise to be scalable. All these proofs appear to be slight variants of proof techniques for logical program equivalence and parametricity, which are well-studied topics, suggesting the approach might scale to more expressive type systems. Hence, we believe these proofs clarify the relation with parametricity that has been noticed earlier [Atkey 2015]. However, actually proving ILC correct for a polymorphic language (such as System F) is left as future work.

We also expect that from our logical relations, one might derive a logical partial equivalence relation among changes, similarly to Sec. 14.2, but we leave a development for future work.

Compared to earlier chapters, this one will be more technical and concise, because we already introduced the ideas behind both ILC and logical relation proofs.

¹Source code available in this GitHub repo: https://github.com/inc-lc/ilc-agda
²Mechanizing the proof for untyped λ-calculus is harder for purely technical reasons: mechanizing well-founded induction in Agda is harder than mechanizing structural induction.



Binding and environments. On the technical side, we are able to mechanize our proof without needing any technical lemmas about binding or weakening, thanks to a number of choices we mention later.

Among other reasons, we avoid lemmas on binding because, instead of substituting variables with arbitrary terms, we record mappings from variables to closed values via environments. We also used environments in Appendix A, but this time we use syntactic values rather than semantic ones. As a downside, on paper, using environments makes for more and bigger definitions, because we need to carry around both an environment and a term instead of merging them into a single term via substitution, and because values are not a subset of terms but an entirely separate category. But on paper the extra work is straightforward, and in a mechanized setting it is simpler than substitution, especially in a setting with an intrinsically typed term representation (see Appendix A.2.4).

Background/related work. Our development is inspired significantly by the use of step-indexed logical relations by Ahmed [2006] and Acar, Ahmed and Blume [2008]. We refer to those works, and to Ahmed's lectures at OPLSS 2013,³ for an introduction to (step-indexed) logical relations.

Intensional and extensional validity. Until this point, change validity only specifies how function changes behave, that is, their extension. Using operational semantics, we can specify how valid function changes are concretely defined, that is, their intension. To distinguish the two concepts, we contrast extensional validity and intensional validity. Some extensionally valid changes are not intensionally valid, but such changes are never created by derivation. Defining intensional validity helps to understand function changes: function changes are produced from changes to values in environments, or from functions being replaced altogether. Requiring intensional validity helps to implement change operations such as ⊕ more efficiently, by operating on environments. Later, in Appendix D, we use similar ideas to implement change operations on defunctionalized function changes; intensional validity helps put such efforts on more robust foundations, even though we do not account formally for defunctionalization, but only for the use of closures in the semantics.

We use operational semantics to define extensional validity in Appendix C.3 (using plain logical relations) and Appendix C.4 (using step-indexed logical relations). We switch to the intensional validity definition in Appendix C.5.

Non-termination and general recursion. This proof implies correctness of ILC in the presence of general recursion, because untyped λ-calculus supports general recursion via fixpoint combinators. However, the proof only applies to terminating executions of base programs: like for earlier authors [Acar, Ahmed and Blume 2008], we prove that if a function terminates against both a base input v₁ and an updated one v₂, its derivative terminates against the base input and a valid input change dv from v₁ to v₂.

We can also add a fixpoint construct to our typed λ-calculus and to our mechanization, without significant changes to our relations. However, a mechanical proof would require use of well-founded induction, which our current mechanization avoids. We return to this point in Appendix C.7.

While this support for general recursion is effective in some scenarios, other scenarios can still be incrementalized better using structural recursion. More efficient support for general recursion is a separate problem that we do not tackle here and leave for future work. We refer to the discussion in Sec. 15.1.

Correctness statement. Our final correctness theorem is a variant of Corollary 13.4.5, that we repeat for comparison:

³https://www.cs.uoregon.edu/research/summerschool/summer13/curriculum.html


Corollary 13.4.5 (D⟦–⟧ is correct, corollary). If Γ ⊢ t : τ and dρ ▷ ρ₁ ↪ ρ₂ : Γ, then ⟦t⟧ρ₁ ⊕ ⟦D⟦t⟧⟧dρ = ⟦t⟧ρ₂.

We present our final correctness theorem statement in Appendix C.5. We anticipate it here for illustration; the new statement is more explicit about evaluation, but otherwise broadly similar.

Corollary C.5.6 (D⟦–⟧ is correct, corollary). Take any term t that is well-typed (Γ ⊢ t : τ), and any suitable environments ρ₁, dρ, ρ₂, intensionally valid at any step count (∀k. (k, ρ₁, dρ, ρ₂) ∈ RG⟦Γ⟧). Assume t terminates in both the old environment ρ₁ and the new environment ρ₂, evaluating to output values v₁ and v₂ (ρ₁ ⊢ t ⇓ v₁ and ρ₂ ⊢ t ⇓ v₂). Then D⟦t⟧ evaluates in environment ρ₁ and change environment dρ to a change value dv (ρ₁, dρ ⊢Δ D⟦t⟧ ⇓ dv), and dv is a valid change from v₁ to v₂, so that v₁ ⊕ dv = v₂.

Overall, in this chapter we present the following contributions:

• We give an alternative presentation of derivation, that can be mechanized without any binding-related lemmas, not even weakening-related ones, by introducing a separate syntax for change terms (Appendix C.1).

• We prove formally ILC correct for STLC, using big-step semantics and logical relations (Appendix C.3).

• We show formally (with pen-and-paper proofs) that our semantics is equivalent to small-step semantics definitions (Appendix C.2).

• We introduce a formalized step-indexed variant of our definitions and proofs for simply-typed λ-calculus (Appendix C.4), which scales directly to definitions and proofs for untyped λ-calculus.

• For typed λ-calculus, we also mechanize our step-indexed proof.

• In addition to (extensional) validity, we introduce a concept of intensional validity, that captures formally that function changes arise from changing environments, or from functions being replaced altogether (Appendix C.5).

C.1 Formalization

To present the proofs, we first describe our formal model of CBV ANF λ-calculus. We define an untyped ANF language, called λA. We also define a simply-typed variant, called λA→, by adding on top of λA a separate Curry-style type system.

In our mechanization of λA→, however, we find it more convenient to define a Church-style type system (that is, a syntax that only describes typing derivations for well-typed terms), separately from the untyped language.

Terms resulting from differentiation satisfy additional invariants, and exploiting those invariants helps simplify the proof. Hence, we define separate languages for change terms produced from differentiation, again in untyped (λΔA) and typed (λΔA→) variants.

The syntax is summarized in Fig. C.1, the type systems in Fig. C.2, and the semantics in Fig. C.3. The base languages are mostly standard, while the change languages pick a few different choices.


(a) Base terms (λA):

  p ::= succ | add | ...
  m, n ::= ...                               (numbers)
  c ::= n | ...
  w ::= x | λx → t | ⟨w, w⟩ | c
  s, t ::= w | w w | p w | let x = t in t
  v ::= ρ[λx → t] | ⟨v, v⟩ | c
  ρ ::= x₁ = v₁, ..., xₙ = vₙ

(b) Change terms (λΔA):

  dc ::= 0
  dw ::= dx | λx dx → dt | ⟨dw, dw⟩ | dc
  ds, dt ::= dw | dw w dw | 0p w dw | let x = t, dx = dt in dt
  dv ::= ρ dρ[λx dx → dt] | ⟨dv, dv⟩ | dc
  dρ ::= dx₁ = dv₁, ..., dxₙ = dvₙ

(c) Differentiation:

  DC⟦n⟧ = 0
  D⟦x⟧ = dx
  D⟦λx → t⟧ = λx dx → D⟦t⟧
  D⟦⟨wa, wb⟩⟧ = ⟨D⟦wa⟧, D⟦wb⟧⟩
  D⟦c⟧ = DC⟦c⟧
  D⟦wf wa⟧ = D⟦wf⟧ wa D⟦wa⟧
  D⟦p w⟧ = 0p w D⟦w⟧
  D⟦let x = s in t⟧ = let x = s, dx = D⟦s⟧ in D⟦t⟧

(d) Types and contexts:

  τ ::= ℕ | τa × τb | σ → τ

  Δℕ = ℕ
  Δ(τa × τb) = Δτa × Δτb
  Δ(σ → τ) = σ → Δσ → Δτ

  Δε = ε
  Δ(Γ, x : τ) = ΔΓ, dx : Δτ

Figure C.1: ANF λ-calculus, λA and λΔA.

C.1.1 Types and contexts
We show the syntax of types, contexts and change types in Fig. C.1d. We introduce types for functions, binary products and naturals. Tuples can be encoded as usual through nested pairs. Change types are mostly like earlier, but this time we use naturals as changes for naturals (hence, we cannot define a total ⊖ operation).

We modify the definition of change contexts and environment changes to not contain entries for base values: in this presentation, we use separate environments for base variables and change variables. This choice avoids the need to define weakening lemmas.

C.1.2 Base syntax for λA
For convenience, we consider a λ-calculus in A-normal form. We do not parameterize this calculus over language plugins to reduce mechanization overhead, but we define separate syntactic categories for possible extension points.

We show the syntax of terms in Fig. C.1a. Meta-variable v ranges over (closed) syntactic values, that is, evaluation results. Values are numbers, pairs of values or closures. A closure is a pair of a function and an environment, as usual. Environments ρ are finite maps from variables to syntactic values; in our mechanization using de Bruijn indexes, environments are in particular finite lists of syntactic values.
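In Haskell, the values and environments of λA might be transcribed as follows (a sketch of ours using named variables instead of the mechanization's de Bruijn indexes):

  import qualified Data.Map as M

  -- Neutral forms w and terms t of the ANF base language (simplified).
  data Wf   = Var String | Lam String Term | Pair Wf Wf | Lit Int
  data Term = W Wf | App Wf Wf | Prim String Wf | Let String Term Term

  -- Syntactic values: a closure pairs a λ-abstraction's binder and body
  -- with the environment it closed over; environments map variables
  -- to syntactic values.
  data Value = Closure Env String Term | VPair Value Value | VLit Int
  type Env   = M.Map String Value

  -- Example: the closure for λx → x in the empty environment.
  identityV :: Value
  identityV = Closure M.empty "x" (W (Var "x"))

  main :: IO ()
  main = case identityV of
    Closure _ x _ -> putStrLn ("closure binding " ++ x)
    _             -> pure ()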

Meta-variable t ranges over arbitrary terms, and w ranges over neutral forms. Neutral forms evaluate to values in zero steps, but unlike values they can be open: a neutral form is either a


(a) λA→ base typing (judgement Γ ⊢ t : τ):

  ⊢C n : ℕ  (T-Lit)        ⊢P succ : ℕ → ℕ  (T-Succ)        ⊢P add : (ℕ × ℕ) → ℕ  (T-Add)

  x : τ ∈ Γ
  ─────────── T-Var
  Γ ⊢ x : τ

  ⊢C c : τ
  ─────────── T-Const
  Γ ⊢ c : τ

  Γ ⊢ wf : σ → τ    Γ ⊢ wa : σ
  ────────────────────────────── T-App
  Γ ⊢ wf wa : τ

  Γ, x : σ ⊢ t : τ
  ────────────────────── T-Lam
  Γ ⊢ λx → t : σ → τ

  Γ ⊢ s : σ    Γ, x : σ ⊢ t : τ
  ─────────────────────────────── T-Let
  Γ ⊢ let x = s in t : τ

  Γ ⊢ wa : τa    Γ ⊢ wb : τb
  ───────────────────────────── T-Pair
  Γ ⊢ ⟨wa, wb⟩ : τa × τb

  ⊢P p : σ → τ    Γ ⊢ w : σ
  ──────────────────────────── T-Prim
  Γ ⊢ p w : τ

(b) λΔA→ change typing. Judgement Γ ⊢Δ dt : τ means that variables from both Γ and ΔΓ are in scope in dt, and the final type is in fact Δτ:

  x : τ ∈ Γ
  ────────────── T-DVar
  Γ ⊢Δ dx : τ

  ⊢C c : Δτ
  ────────────── T-DConst
  Γ ⊢Δ c : τ

  Γ ⊢Δ dwf : σ → τ    Γ ⊢ wa : σ    Γ ⊢Δ dwa : σ
  ───────────────────────────────────────────────── T-DApp
  Γ ⊢Δ dwf wa dwa : τ

  Γ ⊢ s : σ    Γ ⊢Δ ds : σ    Γ, x : σ ⊢Δ dt : τ
  ───────────────────────────────────────────────── T-DLet
  Γ ⊢Δ let x = s, dx = ds in dt : τ

  Γ, x : σ ⊢Δ dt : τ
  ──────────────────────────── T-DLam
  Γ ⊢Δ λx dx → dt : σ → τ

  Γ ⊢Δ dwa : τa    Γ ⊢Δ dwb : τb
  ───────────────────────────────── T-DPair
  Γ ⊢Δ ⟨dwa, dwb⟩ : τa × τb

  ⊢P p : σ → τ    Γ ⊢ w : σ    Γ ⊢Δ dw : σ
  ─────────────────────────────────────────── T-DPrim
  Γ ⊢Δ 0p w dw : τ

Figure C.2: ANF λ-calculus λA→ and λΔA→, type system.


(a) Base semantics. Judgement ρ ⊢ t ⇓ₙ v says that ρ ⊢ t, a pair of environment ρ and term t, evaluates to value v in n steps, and app vf va ⇓ₙ v constructs the pair to evaluate via app vf va:

  ρ ⊢ w ⇓₀ V⟦w⟧ρ  (E-Val)
  ρ ⊢ p w ⇓₁ P⟦p⟧ (V⟦w⟧ρ)  (E-Prim)

  ρ ⊢ s ⇓ns vs    (ρ, x = vs) ⊢ t ⇓nt vt
  ──────────────────────────────────────── E-Let
  ρ ⊢ let x = s in t ⇓1+ns+nt vt

  ρ ⊢ wf ⇓₀ vf    ρ ⊢ wa ⇓₀ va    app vf va ⇓ₙ v
  ───────────────────────────────────────────────── E-App
  ρ ⊢ wf wa ⇓1+n v

  app (ρ′[λx → t]) va = (ρ′, x = va) ⊢ t

  V⟦x⟧ρ = ρ(x)
  V⟦λx → t⟧ρ = ρ[λx → t]
  V⟦⟨wa, wb⟩⟧ρ = ⟨V⟦wa⟧ρ, V⟦wb⟧ρ⟩
  V⟦c⟧ρ = c

  P⟦succ⟧ n = 1 + n
  P⟦add⟧ (m, n) = m + n

(b) Change semantics. Judgement ρ, dρ ⊢Δ dt ⇓ dv says that ρ, dρ ⊢Δ dt, a triple of environment ρ, change environment dρ and change term dt, evaluates to change value dv, and dapp dvf va dva ⇓ dv constructs the triple to evaluate via dapp dvf va dva:

  ρ, dρ ⊢Δ dw ⇓ VΔ⟦dw⟧ρ dρ  (E-DVal)
  ρ, dρ ⊢Δ 0p w dw ⇓ PΔ⟦p⟧ (V⟦w⟧ρ) (VΔ⟦dw⟧ρ dρ)  (E-DPrim)

  ρ, dρ ⊢Δ dwf ⇓ dvf    ρ ⊢ wa ⇓ va    ρ, dρ ⊢Δ dwa ⇓ dva    dapp dvf va dva ⇓ dv
  ──────────────────────────────────────────────────────────────────────────────── E-DApp
  ρ, dρ ⊢Δ dwf wa dwa ⇓ dv

  ρ ⊢ s ⇓ vs    ρ, dρ ⊢Δ ds ⇓ dvs    (ρ, x = vs), (dρ, dx = dvs) ⊢Δ dt ⇓ dvt
  ───────────────────────────────────────────────────────────────────────────── E-DLet
  ρ, dρ ⊢Δ let x = s, dx = ds in dt ⇓ dvt

  dapp (ρ′ dρ′[λx dx → dt]) va dva = (ρ′, x = va), (dρ′, dx = dva) ⊢Δ dt

  VΔ⟦dx⟧ρ dρ = dρ(dx)
  VΔ⟦λx dx → dt⟧ρ dρ = ρ dρ[λx dx → dt]
  VΔ⟦⟨dwa, dwb⟩⟧ρ dρ = ⟨VΔ⟦dwa⟧ρ dρ, VΔ⟦dwb⟧ρ dρ⟩
  VΔ⟦c⟧ρ dρ = c

  PΔ⟦succ⟧ n dn = dn
  PΔ⟦add⟧ ⟨m, n⟩ ⟨dm, dn⟩ = dm + dn

Figure C.3: ANF λ-calculus (λA and λΔA), CBV semantics.


variable, a constant value c, a λ-abstraction, or a pair of neutral forms. A term is either a neutral form, an application of neutral forms, a let expression, or an application of a primitive function p to a neutral form. Multi-argument primitives are encoded as primitives taking (nested) tuples of arguments. Here we use literal numbers as constants, and +1 and addition as primitives (to illustrate different arities), but further primitives are possible.

Notation C.1.1. We use subscripts a, b for pair components, f, a for function and argument, and keep using 1, 2 for old and new values.

C.1.3 Change syntax for λΔA
Next, we consider a separate language for change terms, which can be transformed into the base language. This language supports directly the structure of change terms: base variables and change variables live in separate namespaces. As we show later for the typed language, those namespaces are represented by typing contexts Γ and ΔΓ: that is, the typing context for change variables is always the change context for Γ.

We show the syntax of change terms in Fig C1bChange terms often take or bind two parameters at once one for a base value and one for its

change Since a function change is applied to a base input and a change for it at once the syntax forchange term has a special binary application node dwf wa dwa otherwise in ANF such syntaxmust be encoded through separate applications via let df a = dwf wa in df a dwa In the sameway closure changes ρ dρ [λx dx rarr dt ] bind two variables at once and close over separateenvironments for base and change variables Various other changes in the same spirit simplifysimilar formalization and mechanization details

In change terms we write 0p as syntax for the derivative of p evaluated as such by the semanticsStrictly speaking dierentiation must map primitives to standard terms so that the resultingprograms can be executed by a standard semantics hence we should replace 0p by a concreteimplementation of the derivative of p However doing so in a new formalization yields littleadditional insight and requires writing concrete derivatives of primitives as de Bruijn terms

C14 DierentiationWe show dierentiation in Fig C1c Dierentiation maps constructs in the language of base termsone-to-one to constructs in the language of change terms

C15 Typing λArarr and λ∆ArarrWe dene typing judgement for λArarr base terms and for λ∆Ararr change terms We show typing rulesin Fig C2b

Typing for base terms is mostly standard We use judgements `P p and `C c to specify typing ofprimitive functions and constant values For change terms one could expect a type system onlyproving judgements with shape Γ∆Γ ` dt ∆τ (where Γ∆Γ stands for the concatenation of Γ and∆Γ) To simplify inversion on such judgements (especially in Agda) we write instead Γ `∆ dt τ sothat one can verify the following derived typing rule for D n ndash o

Γ ` t τΓ `∆ D n t o τ

T-Derive

200 Chapter C (Un)typed ILC operationally

We also use mutually recursive typing judgment v τ for values and ρ Γ for environmentsand similarly ∆ dv τ for change values and ∆ dρ Γ for change environments We only showthe (unsurprising) rules for v τ and omit the others One could alternatively and equivalentlydene typing on syntactic values v by translating them to neutral forms w = vlowast (using unsur-prising denitions in Appendix C2) and reusing typing on terms but as usual we prefer to avoidsubstitution

n NTV-Nat

va τa vb τb 〈va vb 〉 τa times τb

TV-Pair v τ

Γ x σ ` t τ ρ Γ ρ [λx rarr t ] σ rarr τ

TV-Lam

C16 SemanticsWe present our semantics for base terms in Fig C3a Our semantics gives meaning to pairs ofa environment ρ and term t consistent with each other that we write ρ ` t By ldquoconsistentrdquowe mean that ρ contains values for all free variables of t and (in a typed language) values withcompatible values (if Γ ` t τ then ρ Γ) Judgement ρ ` t dArrn v says that ρ ` t evaluates tovalue v in n steps The denition is given via a CBV big-step semantics Following Acar Ahmedand Blume [2008] we index our evaluation judgements via a step count which counts in essenceβ-reduction steps we use such step counts later to dene step-indexed logical relations Sinceour semantics uses environments β-reduction steps are implemented not via substitution but viaenvironment extension but the resulting step-counts are the same (Appendix C2) Applying closurevf = ρ prime [λx rarr t ] to argument va produces environment-term pair (ρ prime x = va) ` t which weabbreviate as app vf va Wersquoll reuse this syntax later to dene logical relations

In our mechanized formalization we have additionally proved lemmas to ensure that thissemantics is sound relative to our earlier denotational semantics (adapted for the ANF syntax)

Evaluation of primitives is delegated to function P n ndash o ndash We show complete equations for thetyped case for the untyped case we must turn P n ndash o ndash and P∆ n ndash o ndash into relations (or add expliciterror results) but we omit the standard details (see also Appendix C6) For simplicity we assumeevaluation of primitives takes one step We conjecture higher-order primitives might need to beassigned dierent costs but leave details for future work

We can evaluate neutral forms w to syntactic values v using a simple evaluation functionV nw o ρ and use P n p o v to evaluate primitives When we need to omit indexes we write ρ ` t dArr vto mean that for some n we have ρ ` t dArrn v

We can also dene an analogous non-indexed big-step semantics for change terms and wepresent it in Fig C3b

C17 Type soundnessEvaluation preserves types in the expected way

Lemma C12 (Big-step preservation)1 If Γ ` t τ ρ Γ and ρ ` t dArrn v then v τ

2 If Γ `∆ dt τ ρ Γ ∆ dρ Γ and ρ dρ `∆ dt dArr dv then ∆ dv τ

Chapter C (Un)typed ILC operationally 201

Proof By structural induction on evaluation judgements In our intrinsically typed mechanizationthis is actually part of the denition of values and big-step evaluation rules

To ensure that our semantics for λArarr is complete for the typed language instead of provinga small-step progress lemma or extending the semantics with errors we just prove that all typedterms normalize in the standard way As usual this fails if we add xpoints or for untyped terms Ifwe wanted to ensure type safety in such a case we could switch to functional big-step semantics ordenitional interpreters [Amin and Rompf 2017 Owens et al 2016]

Theorem C13 (CBV normalization)For any well-typed and closed term ` t τ there exist a step count n and value v such that ` t dArrn v

Proof A variant of standard proofs of normalization for STLC [Pierce 2002 Ch 12] adapted forbig-step semantics rather than small-step semantics (similarly to Appendix C3) We omit neededdenitions and refer interested readers to our Agda mechanization

We havenrsquot attempted to prove this lemma for arbitrary change terms (though we expect wecould prove it by dening an erasure to the base language and relating evaluation results) but weprove it for the result of dierentiating well-typed terms in Corollary C32

C2 Validating our step-indexed semanticsIn this section we show how we ensure the step counts in our base semantics are set correctly andhow we can relate this environment-based semantics to more conventional semantics based onsubstitution andor small-step We only consider the core calculus without primitives constantsand pairs Results from this section are not needed later and we have proved them formally onpaper but not mechanized them as our goal is to use environment-based big-step semantics in ourmechanization

To this end we relate our semantics rst with a big-step semantics based on substitution (ratherthan environments) and then relating this alternative semantics to a small-step semantics Resultsin this section are useful to understand better our semantics and as a design aide to modify it butare not necessary to the proof so we have not mechanized them

As a consequence we also conjecture that our logical relations and proofs could be adaptedto small-step semantics along the lines of Ahmed [2006] We however do not nd that necessaryWhile small-step semantics gives meaning to non-terminating programs and that is important fortype soundness proofs it does not seem useful (or possible) to try to incrementalize them or toensure we do so correctly

In proofs using step-indexed logical relations the use of step-counts in denitions is oftendelicate and tricky to get right But Acar Ahmed and Blume provide a robust recipe to ensurecorrect step-indexing in the semantics To be able to prove the fundamental property of logicalrelations we ensure step-counts agree with the ones induced by small-step semantics (which countsβ-reductions) Such a lemma is not actually needed in other proofs but only useful as a sanitycheck We also attempted using the style of step-indexing used by Amin and Rompf [2017] butwere unable to produce a proof To the best of our knowledge all proofs using step-indexed logicalrelations even with functional big-step semantics [Owens et al 2016] use step-indexing that agreeswith small-step semantics

Unlike Acar Ahmed and Blume we use environments in our big-step semantics this avoids theneed to dene substitution in our mechanized proof Nevertheless one can show the two semanticscorrespond to each other Our environments ρ can be regarded as closed value substitutions as longas we also substitute away environments in values Formally we write ρlowast(t) for the ldquohomomorphic

202 Chapter C (Un)typed ILC operationally

extensionrdquo of substitution ρ to terms which produces other terms If v is a value using environmentswe write w = vlowast for the result of translating that value to not use environments this operationproduces a closed neutral form w Operations ρlowast(t) and vlowast can be dened in a mutually recursiveway

ρlowast(x) = (ρ (x))lowast

ρlowast(λx rarr t) = λx rarr ρlowast(t)ρlowast(w1 w2) = ρ

lowast(w1) ρlowast(w2)

ρlowast(let x = t1 in t2) = let x = ρlowast(t1) in ρlowast(t2)

(ρ [λx rarr t ])lowast = λx rarr ρlowast(t)

If ρ ` t dArrn v in our semantics a standard induction over the derivation of ρ ` t dArrn v shows thatρlowast(t) dArrn vlowast in a more conventional big-step semantics using substitution rather than environments(also following Acar Ahmed and Blume)

x dArr0 xVarrsquo

λx rarr t dArr0 λx rarr tLamrsquo

t [x = w2 ] dArrn w prime

(λx rarr t) w2 dArr1+n w primeApprsquo

t1 dArrn1 w1 t2 [x = w1 ] dArrn2 w2

let x = t1 in t2 dArr1+n1+n2 w2Letrsquo

In this form it is more apparent that the step indexes count steps of β-reduction or substitutionItrsquos also easy to see that this big-step semantics agrees with a standard small-step semantics

7rarr (which we omit) if t dArrn w then t 7rarrn w Overall the two statements can be composed soour original semantics agrees with small-step semantics if ρ ` t dArrn v then ρlowast(t) dArrn vlowast and nallyρlowast(t) 7rarrn vlowast Hence we can translate evaluation derivations using big-step semantics to derivationsusing small-step semantics with the same step count

However to state and prove the fundamental property we need not prove that our semanticsis sound relative to some other semantics We simply dene the appropriate logical relation forvalidity and show it agrees with a suitable denition for oplus

Having dened our semantics we proceed to dene extensional validity

C3 Validity syntactically (λArarr λ∆Ararr)For our typed language λArarr at least as long as we do not add xpoint operators we can denelogical relations using big-step semantics but without using step-indexes The resulting relationsare well-founded only because they use structural recursion on types We present in Fig C4 theneeded denitions as a stepping stone to the denitions using step-indexed logical relations

Following Ahmed [2006] and Acar Ahmed and Blume [2008] we encode extensional validitythrough two mutually recursive type-indexed families of ternary logical relations RV nτ o overclosed values and RC nτ o over terms (and environments)

These relations are analogous to notions we considered earlier and express similar informalnotions

bull With denotational semantics we write dv B v1 rarr v2 τ to say that change value dv isin n∆τ ois a valid change from v1 to v2 at type τ With operational semantics instead we write(v1 dv v2) isin RV nτ o where v1 dv and v2 are now closed syntactic values

Chapter C (Un)typed ILC operationally 203

bull For terms with denotational semantics we write n dt o dρ B n t1 o ρ1 rarr n t2 o ρ2 τ tosay that dt is a valid change from t1 and t2 considering the respective environments Withoperational semantics instead we write (ρ1 ` t1 ρ dρ `∆ dt ρ2 ` t2) isin RC nτ o

Since we use Church typing and only mechanize typed terms we must include in all casesappropriate typing assumptions

Relation RC nτ o relates tuples of environments and computations ρ1 ` t1 ρ dρ `∆ dt andρ2 ` t2 it holds if t1 evaluates in environment ρ1 to v1 and t2 evaluates in environment ρ2 to v2then dt must evaluate in environments ρ and dρ to a change value dv with v1 dv v2 related byRV nτ o The environments themselves need not be related this denition characterizes validityextensionally that is it can relate t1 dt and t2 that have unrelated implementations and unrelatedenvironments mdash in fact even unrelated typing contexts This exibility is useful to when relatingclosures of type σ rarr τ two closures might be related even if they have close over environmentsof dierent shape For instance closures v1 = [λx rarr 0] and v2 = (y = 0) [λx rarr y ] are relatedby a nil change such as dv = [λx dx rarr 0] In Appendix C5 we discuss instead an intensionaldenition of validity

In particular for function types the relation RV nσ rarr τ o relates function values f1 df and f2 ifthey map related input values (and for df input changes) to related output computations

We also extend the relation on values to environments via RG n Γ o environments are related iftheir corresponding entries are related values

RV nN o = (n1 dn n2) | n1 dn n2 isin N and n1 + dn = n2 RV nτa times τb o = (〈va1 vb1〉 〈dva dvb 〉 〈va2 vb2〉) |

(va1 dva va2) isin RV nτa o and (vb1 dvb vb2) isin RV nτb o RV nσ rarr τ o = (vf 1 dvf vf 2) |

vf 1 σ rarr τ and ∆ dvf σ rarr τ and vf 2 σ rarr τ

andforall(v1 dv v2) isin RV nσ o (app vf 1 v1 dapp dvf v1 dv app vf 2 v2) isin RC nτ o

RC nτ o = (ρ1 ` t1 ρ dρ `∆ dt ρ2 ` t2) |(existΓ1 Γ Γ2 Γ1 ` t1 τ and Γ `∆ dt τ and Γ2 ` t2 τ )andforallv1 v2(ρ1 ` t1 dArr v1 and ρ2 ` t2 dArr v2) rArrexistdv ρ dρ `∆ dt dArr dv and (v1 dv v2) isin RV nτ o

RG n ε o = () RG n Γ x τ o = ((ρ1 x = v1) (dρ dx = dv) (ρ2 x = v2)) |

(ρ1 dρ ρ2) isin RG n Γ o and (v1 dv v2) isin RV nτ o Γ |= dt I t1 rarr t2 τ = forall(ρ1 dρ ρ2) isin RG n Γ o

(ρ1 ` t1 ρ1 dρ `∆ dt ρ2 ` t2) isin RC nτ o

Figure C4 Dening extensional validity via logical relations and big-step semantics

204 Chapter C (Un)typed ILC operationally

Given these denitions one can prove the fundamental property

Theorem C31 (Fundamental property correctness of D n ndasho)For every well-typed term Γ ` t τ we have that Γ |= D n t o I t rarr t τ

Proof sketch By induction on the structure on terms using ideas similar to Theorem 1222

It also follows that D n t o normalizes

Corollary C32 (D n ndasho normalizes to a nil change)For any well-typed and closed term ` t τ there exist value v and change value dv such that ` t dArr v`∆ D n t o dArr dv and (v dv v) isin RV nτ o

Proof A corollary of the fundamental property and of the normalization theorem (Theorem C13)since t is well-typed and closed it normalizes to a value v for some step count (` t dArr v) The emptyenvironment change is valid () isin RG n ε o so from the fundamental property we get that(` t `∆ D n t o ` t) isin RC nτ o From the denition of RC nτ o and ` t dArr v it follows that there existsdv such that `∆ D n t o dArr dv and (v dv v) isin RV nτ o

Remark C33Compared to prior work these relations are unusual for two reasons First instead of just relatingtwo executions of a term we relate two executions of a term with an execution of a change termSecond most such logical relations (including Ahmed [2006]rsquos one but except Acar et al [2008]rsquosone) dene a logical relation (sometimes called logical equivalence) that characterizes contextualequivalence while we donrsquot

Consider a logical equivalence dened through sets RC nτ o and RV nτ o If (t1 t2) isin RC nτ oholds and t1 terminates (with result v1) then t2 must terminate as well (with result v2) and theirresults v1 and v2 must in turn be logically equivalent (v1 v2 isin RV nτ o) And at base types like N(v1 v2) isin RV nN o means that v1 = v2

Here instead the fundamental property relates two executions of a term on dierent inputswhich might take dierent paths during execution In a suitably extended language we could evenwrite term t = λx rarr if x = 0 then 1 else loop and run it on inputs v1 = 0 and v2 = 1 theseinputs are related by change dv = 1 but t will converge on v1 and diverge on v2 We must usea semantics that allow such behavioral dierence Hence at base type N (v1 dv v2) isin RV nN omeans just that dv is a change from v1 to v2 hence that v1 oplus dv is equivalent to v2 because oplus agreeswith extensional validity in this context as well And if (ρ1 ` t1 ρ dρ `∆ dt ρ2 ` t2) isin RC nτ oterm t1 might converge while t2 diverges only if both converge must their results be related

These subtleties become more relevant in the presence of general recursion and non-terminatingprograms as in untyped language λA or in a hypothetical extension of λArarr with xpoint operators

C4 Step-indexed extensional validity (λArarr λ∆Ararr)Step-indexed logical relations dene approximations to a relation to enable dealing with non-terminating programs Logical relations relate the behavior of multiple terms during evaluationwith step-indexed logical relations we can take a bound k and restrict attention to evaluations thattake at most k steps overall as made precise by the denitions Crucially entities related at stepcount k are also related at any step count j lt k Conversely the higher the step count the moreprecise the dened relation In the limit if entities are related at all step counts we simply say theyare related This construction of limits resembles various other approaches to constructing relationsby approximations but the entire framework remains elementary In particular the relations aredened simply because they are well-founded (that is only dened by induction on smaller numbers)

Chapter C (Un)typed ILC operationally 205

Proofs of logical relation statement need to deal with step-indexes but when they work (as here)they are otherwise not much harder than other syntactic or logical relation proofs

For instance if we dene equivalence as a step-indexed logical relation we can say that twoterms are equivalent for k or fewer steps even if they might have dierent behavior with moresteps available In our case we can say that a change appears valid at step count k if it behaves likea valid change in ldquoobservationsrdquo using at most k steps

Like before we dene a relation on values and one on computations as sets RV nτ o and RC nτ oInstead of indexing the sets with the step-count we add the step counts to the tuples they containso for instance (k v1 dv v2) isin RV nτ o means that value v1 change value dv and value v2 arerelated at step count k (or k-related) and similarly for RC nτ o

The details or the relation denitions are subtle but follow closely the use of step-indexingby Acar Ahmed and Blume [2008] We add mention of changes and must decide how to usestep-indexing for them

How step-indexing proceeds We explain gradually in words how the denition proceedsFirst we say k-related function values take j-related arguments to j-related results for all j

less than k That is reected in the denition for RV nσ rarr τ o it contains (k vf 1 dvf vf 2) if forall (j v1 dv v2) isin RV nσ o) with j lt k the result of applications are also j-related However theresult of application are not syntactic applications encoding vf 1 v14 It is instead necessary to useapp vf 1 v1 the result of one step of reduction The two denitions are not equivalent because asyntactic application would take one extra step to reduce

The denition for computations takes longer to describe Roughly speaking computations arek-related if after j steps of evaluations (with j lt k) they produce values related at k minus j stepsin particular if the computations happen to be neutral forms and evaluate in zero steps theyrsquorek-related as computations if the values they produce are k-related In fact the rule for evaluation hasa wrinkle Formally instead of saying that computations (ρ1 ` t1 ρ dρ `∆ dt ρ2 ` t2) are k-relatedwe say that (k ρ1 ` t1 ρ dρ `∆ dt ρ2 ` t2) isin RC nτ o We do not require all three computationsto take j steps Instead if the rst computation ρ1 ` t1 evaluates to a value in j lt k steps and thesecond computation ρ2 ` t2 evaluates to a value in any number of steps then the change computationρ dρ `∆ dt must also terminate to a change value dv (in an unspecied number of steps) and theresulting values must be related at step-count k minus j (that is (k minus j v1 dv v2) isin RV nτ o)

What is new in the above denition is the addition of changes and the choice to allow changeterm dt to evaluate to dv in an unbounded number of steps (like t2) as no bound is necessary forour proofs This is why the semantics we dened for change terms has no step counts

Well-foundedness of step-indexed logical relations has a small wrinkle because k minus j need notbe strictly less than k But we dene the two relations in a mutually recursive way and the pair ofrelations at step-count k is dened in terms of the pair of relation at smaller step-count All otherrecursive uses of relations are at smaller step-indexes

In this section we index the relation by both types and step-indexes since this is the one we usein our mechanized proof This relation is dened by structural induction on types We show thisdenition in Fig C5 Instead in Appendix C6 we consider untyped λ-calculus and drop types Theresulting denition is very similar but is dened by well-founded recursion on step-indexes

Again since we use Church typing and only mechanize typed terms we must include in all casesappropriate typing assumptions This choice does not match Ahmed [2006] but is one alternativeshe describes as equivalent Indeed while adapting the proof the extra typing assumptions andproof obligations were not a problem

4That happens to be illegal syntax in this presentation but can be encoded for instance as f = vf 1 x = v1 ` (f x) andthe problem is more general

206 Chapter C (Un)typed ILC operationally

RV nN o = (k n1 dn n2) | n1 dn n2 isin N and n1 + dn = n2 RV nτa times τb o = (k 〈va1 vb1〉 〈dva dvb 〉 〈va2 vb2〉) |

(k va1 dva va2) isin RV nτa o and (k vb1 dvb vb2) isin RV nτb o RV nσ rarr τ o = (k vf 1 dvf vf 2) |

vf 1 σ rarr τ and ∆ dvf σ rarr τ and vf 2 σ rarr τ

andforall(j v1 dv v2) isin RV nσ o j lt k rArr(j app vf 1 v1 dapp dvf v1 dv app vf 2 v2) isin RC nτ o

RC nτ o = (k ρ1 ` t1 ρ dρ `∆ dt ρ2 ` t2) |(existΓ1 Γ Γ2 Γ1 ` t1 τ and Γ `∆ dt τ and Γ2 ` t2 τ )andforallj v1 v2(j lt k and ρ1 ` t1 dArrj v1 and ρ2 ` t2 dArr v2) rArrexistdv ρ dρ `∆ dt dArr dv and (k minus j v1 dv v2) isin RV nτ o

RG n ε o = (k) RG n Γ x τ o = (k (ρ1 x = v1) (dρ dx = dv) (ρ2 x = v2)) |

(k ρ1 dρ ρ2) isin RG n Γ o and (k v1 dv v2) isin RV nτ o Γ |= dt I t1 rarr t2 τ = forall(k ρ1 dρ ρ2) isin RG n Γ o

(k ρ1 ` t1 ρ1 dρ `∆ dt ρ2 ` t2) isin RC nτ o

Figure C5 Dening extensional validity via step-indexed logical relations and big-step semantics

At this moment we do not require that related closures contain related environments againwe are dening extensional validity

Given these denitions we can prove that all relations are downward-closed that is relations atstep-count n imply relations at step-count k lt n

Lemma C41 (Extensional validity is downward-closed)Assume k le n

1 If (n v1 dv v2) isin RV nτ o then (k v1 dv v2) isin RV nτ o

2 If (n ρ1 ` t1 ρ dρ `∆ dt ρ2 ` t2) isin RC nτ o then

(k ρ1 ` t1 ρ dρ `∆ dt ρ2 ` t2) isin RC nτ o

3 If (n ρ1 dρ ρ2) isin RG n Γ o then (k ρ1 dρ ρ2) isin RG n Γ o

Proof sketch For RV nτ o case split on τ and expand hypothesis and thesis If τ = N they coincideFor RV nσ rarr τ o parts of the hypothesis and thesis match For some relation P the rest of thehypothesis has shape foralljltn P (j v1 dv v2) and the rest of the thesis has shape foralljltk P (j v1 dv v2)Assume j lt k We must prove P (j v1 dv v2) but since j lt k le n we can just apply the hypothesis

Chapter C (Un)typed ILC operationally 207

The proof for RC nτ o follows the same idea as RV nσ rarr τ oFor RG n Γ o apply the theorem for RV nτ o to each environments entry x τ

At this point we prove the fundamental property

Theorem C42 (Fundamental property correctness of D n ndasho)For every well-typed term Γ ` t τ we have that Γ |= D n t o I t rarr t τ

Proof sketch By structural induction on typing derivations using ideas similar to Theorem C31and relying on Lemma C41 to reduce step counts where needed

C5 Step-indexed intensional validityUp to now we have dened when a function change is valid extensionally that is purely based onits behavior as we have done earlier when using denotational semantics We conjecture that withthese one can dene oplus and prove it agrees with extensional validity However we have not done so

Instead we modify denitions in Appendix C4 to dene validity intensionally To ensure thatf1oplusdf = f2 (for a suitable oplus) we choose to require that closures f1 df and f2 close over environmentsof matching shapes This change does not complicate the proof of the fundamental lemma all theadditional proof obligations are automatically satised

However it can still be necessary to replace a function value with a dierent one Hencewe extend our denition of values to allow replacement values Closure replacements producereplacements as results so we make replacement values into valid changes for all types We mustalso extend the change semantics both to allow evaluating closure replacements and to allowderivatives of primitive to handle replacement values

We have not added replacement values to the syntax so currently they can just be added tochange environments but we donrsquot expect adding them to the syntax would cause any signicanttrouble

We present the changes described above for the typed semantics We have successfully mecha-nized this variant of the semantics as well Adding replacement values v requires extending thedenition of change values evaluation and validity We add replacement values to change values

dv = | v

Derivatives of primitives when applied to replacement changes must recompute their output Therequired additional equations are not interesting but we show them anyway for completeness

P∆ n succ o n1 (n2) = (n2 + 1)P∆ n add o 〈_ _〉 (〈m2 n2〉) = (m2 + n2)P∆ n add o p1 (dp 〈dm n2〉) = (P n add o (p1 oplus dp))P∆ n add o p1 (dp 〈m dv〉) = (P n add o (p1 oplus dp))

Evaluation requires a new rule E-BangApp to evaluate change applications where the functionchange evaluates to a replacement change

ρ dρ `∆ dwf dArr vf ρ ` wa dArr va ρ dρ `∆ dwa dArr dva app vf (va oplus dva ) dArr v

ρ dρ `∆ dwf wa dwa dArr vE-BangApp

Evaluation rule E-BangApp requires dening oplus on syntactic values We dene it intensionally

208 Chapter C (Un)typed ILC operationally

Denition C51 (Update operator oplus)Operator oplus is dened on values by the following equations where match` ρ dρ (whose denitionwe omit) tests if ρ and dρ are environments for the same typing context Γ

v1 oplus v2 = v2ρ [λx rarr t ] oplus dρ [λx dx rarr dt ] =if match` ρ dρ then

-- If ρ and dρ are environments for the same typing context Γ(ρ oplus dρ) [λx rarr t ]

else-- otherwise the input change is invalid so just give-- any type-correct result

ρ [λx rarr t ]n oplus dn = n + dn〈va1 vb1〉 oplus 〈dva dvb 〉 = 〈va1 oplus dva vb1 oplus dvb 〉

-- An additional equation is needed in the untyped language-- not in the typed one This equation is for invalid-- changes so we can just return v1

v1 oplus dv = v1

We dene oplus on environments for matching contexts to combine values and changes pointwise

(x1 = v1 xn =vn) oplus (dx1 = dv1 dxn = dvn) =(x1 = v1 oplus dv1 xn =vn oplus dvn)

The denition of update for closures can only update them in few cases but this is not a problemas we show in a moment we restrict validity to the closure changes for which it is correct

We ensure replacement values are accepted as valid for all types by requiring the followingequation holds (hence modifying all equations for RV n ndash o we omit details)

RV nτ o supe (k v1 v2 v2) | v1 τ and v2 τ (C1)

where we write v τ to state that value v has type τ we omit the unsurprising rules for thisjudgement

To restrict valid closure changes RV nσ rarr τ o now requires that related elements satisfypredicate matchImpl vf 1 dvf vf 2 dened by the following equations

matchImpl(ρ1 [λx rarr t ])(ρ1 dρ [λx dx rarr D n t o])((ρ1 oplus dρ) [λx rarr t ]) = True

matchImpl _ _ _ = False

In other words validity (k vf 1 dvf vf 2) isin RV nσ rarr τ o now requires via matchImpl that thebase closure environment ρ1 and the base environment of the closure change dvf coincide thatρ2 = ρ1 oplus dρ and that vf 1 and vf 2 have λx rarr t as body while dvf has body λx dx rarr D n t o

To dene intensional validity for function changes RV nσ rarr τ o must use matchImpl and

Chapter C (Un)typed ILC operationally 209

explicitly support replacement closures to satisfy Eq (C1) Its denition is as follows

RV nσ rarr τ o = (k vf 1 dvf vf 2) | vf 1 σ rarr τ and ∆ dvf σ rarr τ and vf 2 σ rarr τ

andmatchImpl vf 1 dvf vf 2andforall(j v1 dv v2) isin RV nσ o j lt k rArr(j app vf 1 v1 dapp dvf v1 dv app vf 2 v2) isin RC nτ o cup (k vf 1 vf 2 vf 2) | vf 1 σ rarr τ and vf 2 σ rarr τ

Denitions of RV n ndash o for other types can be similarly updated to support replacement changesand satisfy Eq (C1)

Using these updated denitions we can again prove the fundamental property with the samestatement as Theorem C42 Furthermore we now prove that oplus agrees with validity

Theorem C52 (Fundamental property correctness of D n ndasho)For every well-typed term Γ ` t τ we have that Γ |= D n t o I t rarr t τ

Theorem C53 (oplus agrees with step-indexed intensional validity)If (k v1 dv v2) isin RV nτ o then v1 oplus dv = v2

Proof By induction on types For type N validity coincides with the thesis For type 〈τa τb 〉 wemust apply the induction hypothesis on both pair components

For closures validity requires that v1 = ρ1 [λx rarr t ] dv = dρ [λx dx rarr D n t o] v2 =ρ2 [λx rarr t ] with ρ1 oplus dρ = ρ2 and there exists Γ such that Γ x σ ` t τ Moreover from validitywe can show that ρ and dρ have matching shapes ρ is an environment matching Γ and dρ is achange environment matching ∆Γ Hence v1 oplus dv can update the stored environment and we canshow the thesis by the following calculation

v1 oplus dv = ρ1 [λx rarr t ] oplus dρ [λx dx rarr dt ] =(ρ1 oplus dρ) [λx rarr t ] = ρ2 [λx rarr t ] = v2

We can also dene 0 intensionallyas a metafunction on values and environments and proveit correct For closures we dierentiate the body and recurse on the environment The denitionextends from values to environments variable-wise so we omit the standard formal denitionDenition C54 (Nil changes 0)Nil changes on values are dened as follows

0ρ [λxrarrt ] = ρ 0ρ [λx dx rarr D n t o]0〈ab〉 = 〈0a 0b〉0n = 0

Lemma C55 (0 produces valid changes)For all values v τ and indexes k we have (k v 0v v) isin RV nτ o

Proof sketch By induction on v For closures we must apply the fundamental property (Theo-rem C52) to D n t o

210 Chapter C (Un)typed ILC operationally

Because 0 transforms closure bodies we cannot dene it internally to the language This problemcan be avoided by defunctionalizing functions and function changes as we do in Appendix D

We conclude with the overall correctness theorem analogous to Corollary 1345

Corollary C56 (D n ndasho is correct corollary)Take any term t that is well-typed (Γ ` t τ ) and any suitable environments ρ1 dρ ρ2 intensionallyvalid at any step count (forallk (k ρ1 dρ ρ2) isin RG n Γ o) Assume t terminates in both the old envi-ronment ρ1 and the new environment ρ2 evaluating to output values v1 and v2 (ρ1 ` t dArr v1 andρ2 ` t dArr v2) Then D n t o evaluates in environment ρ and change environment dρ to a change valuedv (ρ1 dρ `∆ t dArr dv) and dv is a valid change from v1 to v2 so that v1 oplus dv = v2

Proof Follows immediately from Theorem C52 and Theorem C53

C6 Untyped step-indexed validity (λA λ∆A)By removing mentions of types from step-indexed validity (intensional or extensional though weshow extensional denitions) we can adapt it to an untyped language We can still distinguishbetween functions numbers and pairs by matching on values themselves instead of matching ontypes Without types typing contexts Γ now degenerate to lists of free variables of a term we stilluse them to ensure that environments contain enough valid entries to evaluate a term Validityapplies to terminating executions hence we need not consider executions producing dynamic typeerrors when proving the fundamental property

We show resulting denitions for extensional validity in Fig C6 but we can also dene inten-sional validity and prove the fundamental lemma for it As mentioned earlier for λA we must turnP n ndash o ndash and P∆ n ndash o ndash into relations and update E-Prim accordingly

The main dierence in the proof is that this time the recursion used in the relations can only beproved to be well-founded because of the use of step-indexes we omit details [Ahmed 2006]

Otherwise the proof proceeds just as earlier in Appendix C4 We prove that the relations aredownward-closed just like in Lemma C41 (we omit the new statement) and we prove the newfundamental lemma by induction on the structure of terms (not of typing derivations)

Theorem C61 (Fundamental property correctness of D n ndasho)If FV (t) sube Γ then we have that Γ |= D n t o I t rarr t

Proof sketch Similar to the proof of Theorem C42 but by structural induction on terms andcomplete induction on step counts not on typing derivations

However we can use the induction hypothesis in the same ways as in earlier proofs for typedlanguages all uses of the induction hypothesis in the proof are on smaller terms and some also atsmaller step counts

C7 General recursion in λArarr and λ∆ArarrWe have sketched informally in Sec 151 how to support xpoint combinators

We have also extended our typed languages with a xpoint combinators and proved them correctformally (not mechanically yet) In this section we include the needed denitions to make ourclaim precise They are mostly unsurprising if long to state

Since we are in a call-by-value setting we only add recursive functions not recursive values ingeneral To this end we replace λ-abstraction λx rarr t with recursive function rec f x rarr t whichbinds both f and x in t and replaces f with the function itself upon evaluation

Chapter C (Un)typed ILC operationally 211

RV = (k n1 dn n2) | n1 dn n2 isin N and n1 + dn = n2 cup (k vf 1 dvf vf 2) |forall(j v1 dv v2) isin RV j lt k rArr(j app vf 1 v1 dapp dvf v1 dv app vf 2 v2) isin RC cup (k 〈va1 vb1〉 〈dva dvb 〉 〈va2 vb2〉) |(k va1 dva va2) isin RV and (k vb1 dvb vb2) isin RV

RC = (k ρ1 ` t1 ρ dρ `∆ dt ρ2 ` t2) |forallj v1 v2(j lt k and ρ1 ` t1 dArrj v1 and ρ2 ` t2 dArr v2) rArrexistdv ρ dρ `∆ dt dArr dv and (k minus j v1 dv v2) isin RV

RG n ε o = (k) RG n Γ x o = (k (ρ1 x = v1) (dρ dx = dv) (ρ2 x = v2)) |

(k ρ1 dρ ρ2) isin RG n Γ o and (k v1 dv v2) isin RV Γ |= dt I t1 rarr t2 = forall(k ρ1 dρ ρ2) isin RG n Γ o

(k ρ1 ` t1 ρ1 dρ `∆ dt ρ2 ` t2) isin RC

Figure C6 Dening extensional validity via untyped step-indexed logical relations and big-stepsemantics

The associated small-step reduction rule would be (rec f x rarr t) v rarr t [x =v f =rec f x rarr t ]As before we formalize reduction using big-step semantics

Typing rule T-Lam is replaced by a rule for recursive functions

Γ f σ rarr τ x σ ` t τΓ ` rec f x rarr t σ rarr τ

T-Rec

We replace closures with recursive closures in the denition of values

w = rec f x rarr t | v = ρ [rec f x rarr t ] |

We also modify the semantics for abstraction and application Rules E-Val and E-App are unchangedit is sucient to adapt the denitions of V n ndash o ndash and app so that evaluation of a function value vfhas access to vf in the environment

V n rec f x rarr t o ρ = ρ [rec f x rarr t ]app (vf (ρ prime [rec f x rarr t ])) va =(ρ prime f = vf x = va) ` t

Like in Haskell we write x p in equations to bind an argument as metavariable x and match itagainst pattern p

Similarly we modify the language of changes the denition of dierentiation and evaluationmetafunctions V∆ n ndash o ndash and dapp Since the derivative of a recursive function f = rec f x rarr t

212 Chapter C (Un)typed ILC operationally

can call the base function we remember the original function body t in the derivative togetherwith its derivative D n t o This should not be surprising in Sec 151 where recursive functions aredened using letrec a recursive function f is in scope in the body of its derivative df Here we usea dierent syntax but still ensure that f is in scope in the body of derivative df The denitions areotherwise unsurprising if long

dw = drec f df x dx rarr t dt | dv = ρ dρ [drec f df x dx rarr t dt ] | D n rec f x rarr t o = drec f df x dx rarr t D n t oV∆ ndrec f df x dx rarr dt o ρ dρ =ρ dρ [drec f df x dx rarr t dt ]

dapp (dvf (ρ prime dρ prime [rec f df x dx rarr dt ])) va dva =let vf = ρ prime [rec f x rarr t ]in (ρ prime f = vf x = va) (dρ prime df = dvf dx = dva) `∆ dt

Γ f σ rarr τ x σ ` t τ Γ f σ rarr τ x σ `∆ dt τΓ `∆ drec f df x dx rarr t dt σ rarr τ

T-DRec

We can adapt the proof of the fundamental property to the use of recursive functions

Theorem C71 (Fundamental property correctness of D n ndasho)For every well-typed term Γ ` t τ we have that Γ |= D n t o I t rarr t τ

Proof sketch Mostly as before modulo one interesting dierence To prove the fundamentalproperty for recursive functions at step-count k this time we must use the fundamental prop-erty inductively on the same term but at step-count j lt k This happens because to evaluatedw = D n rec f x rarr t o we evaluate D n t o with the value dv for dw in the environment to showthis invocation is valid we must show dw is itself a valid change But the step-indexed denition toRV nσ rarr τ o constrains the evaluation of the body only forallj lt k

C8 Future workWe have shown that oplus and 0 agree with validity which we consider a key requirement of a coreILC proof However change structures support further operators We leave operator for futurework though we are not aware of particular diculties However deserves special attention

C81 Change compositionWe have looked into change composition and it appears that composition of change expression isnot always valid but we conjecture that composition of change values preserves validity Showingthat change composition is valid appears related to showing that Ahmedrsquos logical equivalence is atransitive relation which is a subtle issue She only proves transitivity in a typed setting and witha stronger relation and her proof does not carry over directly indeed there is no correspondingproof in the untyped setting of Acar Ahmed and Blume [2008]

However the failure of transitivity we have veried is not overly worrisome the problem isthat transitivity is too strong an expectation in general Assume that Γ |= de1 I e1 rarr e2 andΓ |= de2 I e2 rarr e3 and try to show that Γ |= de1 de2 I e1 rarr e3 that is very roughly andignoring the environments we can assume that e1 and e3 terminate and have to show that their

Chapter C (Un)typed ILC operationally 213

result satisfy some properties To use both our hypotheses we need to know that e1 e2 and e3all terminate but we have no such guaranteed for e2 Indeed if e2 always diverges (because it issay the diverging term ω = (λx rarr x x) (λx rarr x x)) then de1 and de2 are vacuously valid Ife1 and e3 terminate we canrsquot expect de1 de2 to be a change between them To wit take e1 = 0e2 = ω e3 = 10 and de1 = de2 = 0 We can verify that for any Γ we have Γ |= de1 I e1 rarr e2 andΓ |= de2 I e2 rarr e3 while Γ |= de1 de2 I e1 rarr e3 means the absurd Γ |= 0 0 I 0 rarr 10

Apossible x Does transitivity hold if e2 terminates If (k e1 de1 e2) isin RC nτ o and (k e2 de2 e3) isinRC nτ o we still cannot conclude anything But like in Ahmed [2006] if e2 amd e3 are related at allstep counts that is if (k e1 de1 e2) isin RC nτ o and (foralln (n e2 de2 e3) isin RC nτ o) and if additionallye2 terminates we conjecture that Ahmedrsquos proof goes through We have however not yet examinedall details

C9 Development historyThe proof strategy used in this chapter comes from a collaboration between me and Yann Reacutegis-Gianas who came up with the general strategy and the rst partial proofs for untyped λ-calculiAfter we both struggled for a while to set up step-indexing correctly enough for a full proof I rstmanaged to give the denitions in this chapter and complete the proofs here described Reacutegis-Gianasthen mechanized a variant of our proof for untyped λ-calculus in Coq [Giarrusso et al Submitted]that appears here in Sec 173 That proof takes a few dierent choices and unfortunately strictlyspeaking neither proof subsumes the other (1) We also give a non-step-indexed syntactic prooffor simply-typed λ-calculus together with proofs dening validity extensionally (2) To supportremembering intermediate results by conversion to cache-transfer-style (CTS) Reacutegis-Gianasrsquo proofuses a lambda-lifted ArsquoNF syntax instead of plain ANF (3) Reacutegis-Gianasrsquo formalization adds tochange values a single token 0 which is a valid nil change for all valid values Hence if we knowa change is nil we can erase it As a downside evaluating df a da when df is 0 requires lookingup f In our presentation instead if f and g are dierent function values they have dierent nilchanges Such nil changes carry information so they can be evaluated directly but they cannot beerased Techniques in Appendix D enable erasing and reconstructing a nil change df for functionvalue f as long as the value of f is available

C10 ConclusionIn this chapter we have shown how to construct novel models for ILC by using (step-indexed)logical relations and have used this technique to deliver a new syntactic proof of correctness for ILCfor simply-typed lambda-calculus and to deliver the rst proof of correctness of ILC for untypedλ-calculus Moreover our proof appears rather close to existing logical-relations proofs hence webelieve it should be possible to translate other results to ILC theorems

By formally dening intensional validity for closures we provide a solid foundation for the useof defunctionalized function changes (Appendix D)

This proof builds on Ahmed [2006]rsquos work on step-indexed logical relations which enablehandling of powerful semantics feature using rather elementary techniques The only downside isthat it can be tricky to set up the correct denitions especially for a slightly non-standard semanticslike ours As an exercise we have shown that the our semantics is equivalent to more conventionalpresentations down to the produced step counts

214 Chapter C (Un)typed ILC operationally

Appendix D

Defunctionalizing functionchanges

In Chapter 12 and throughout most of Part II we represent function changes as functions which canonly be applied However incremental programs often inspect changes to decide how to react to themmost eciently Also inspecting function changes would help performance further Representingfunction changes as closures as we do in Chapter 17 and Appendix C allows implementing someoperations more ecient but is not fully satisfactory In this chapter we address these restrictionsby defunctionalizing functions and function changes so that we can inspect both at runtime withoutrestrictions

Once we defunctionalize function changes we can detect at runtime whether a function changeis nil As we have mentioned in Sec 163 nil function changes can typically be handled moreeciently For instance consider t = map f xs a term that maps function f to each element ofsequence xs In general D n t o = dmap f df xs dxs must handle any change in dxs (which weassume to be small) but also apply function change df to each element of xs (which we assume tobe big) However if df is nil we can skip this step decreasing time complexity of dmap f df xs dxsfrom O (|xs | + |dxs |) to O (|dxs |)

We will also present a change structure on defunctionalized function changes and show thatoperations on defunctionalized function changes become cheaper

D1 SetupWe write incremental programs based on ILC by manually writing Haskell code containing bothmanually-written plugin code and code that is transformed systematically based on informalgeneralizations and variants of D n ndash o Our main goal is to study variants of dierentiation andof encodings in Haskell while also studying language plugins for non-trivial primitives such asdierent sorts of collections We make liberal use of GHC extensions where needed

Code in this chapter has been extracted and type-checked though we elide a few details (mostlylanguage extensions and imports from the standard library) Code in this appendix is otherwiseself-contained We have also tested a copy of this code more extensively

As sketched before we dene change structure inside Haskell

class ChangeStruct t where-- Next line declares ∆t as an injective type function

215

216 Chapter D Defunctionalizing function changes

type ∆t = r | r rarr t(oplus) t rarr ∆t rarr toreplace t rarr ∆t

class ChangeStruct t rArr NilChangeStruct t where0 t rarr ∆t

class ChangeStruct arArr CompChangeStruct a where-- Compose change dx1 with dx2 so that-- x oplus (dx1 dx2) equiv x oplus dx1 oplus dx2() ∆ararr ∆ararr ∆a

With this code we dene type classes ChangeStruct NilChangeStruct and CompChangeStruct Weexplain each of the declared members in turn

First type ∆t represents the change type for type t To improve Haskell type inference wedeclare that ∆ is injective so that ∆t1 = ∆t2 implies t1 = t2 This forbids some potentially usefulchange structures but in exchange it makes type inference vastly more usable

Next we declare oplus 0 and as available to object programs Last we introduce oreplace toconstruct replacement changes characterized by the absorption law x oplus oreplace y = y for all xFunction oreplace encodes t that is the bang operator We use a dierent notation because isreserved for other purposes in Haskell

These typeclasses omit operation intentionally we do not require that change structuressupport a proper dierence operation Nevertheless as discussed b a can be expressed throughoreplace b

We can then dierentiate Haskell functions mdash even polymorphic ones We show a few trivialexamples to highlight how derivatives are encoded in Haskell especially polymorphic ones

-- The standard id functionid ararr aid x = x

-- and its derivativedid ararr ∆ararr ∆adid x dx = dxinstance (NilChangeStruct aChangeStruct b) rArr

ChangeStruct (ararr b) wheretype ∆(ararr b) = ararr ∆ararr ∆bf oplus df = λx rarr f x oplus df x 0xoreplace f = λx dx rarr oreplace (f (x oplus dx))

instance (NilChangeStruct aChangeStruct b) rArrNilChangeStruct (ararr b) where

0f = oreplace f-- Same for apply

apply (ararr b) rarr ararr bapply f x = f xdapply (ararr b) rarr ∆(ararr b) rarr ararr ∆ararr ∆bdapply f df x dx = df x dx

Which polymorphism As visible here polymorphism does not cause particular problemsHowever we only support ML (or prenex) polymorphism not rst-class polymorphism for tworeasons

Chapter D Defunctionalizing function changes 217

First with rst-class polymorphism we can encode existential types existX T and two valuesv1 v2 of the same existential type existX T can hide dierent types T1 T2 Hence a change between v1and v2 requires handling changes between types While we discuss such topics in Chapter 18 weavoid them here

Second prenex polymorphism is a small extension of simply-typed lambda calculus metatheoreti-cally We can treat prenex-polymorphic denitions as families of monomorphic (hence simply-typed)denitions to each denition we can apply all the ILC theory we developed to show dierentiationis correct

D2 DefunctionalizationDefunctionalization is a whole-program transformation that turns a program relying on rst-classfunctions into a rst-order program The resulting program is expressed in a rst-order language(often a subset of the original language) closures are encoded by data values which embed both theclosure environment and a tag to distinguish dierent function Defunctionalization also generatesa function that interprets encoded closures which we call applyFun

In a typed language defunctionalization can be done using generalized algebraic datatypes(GADTs) [Pottier and Gauthier 2004] Each rst-class function of type σ rarr τ is replaced by a valueof a new GADT Fun σ τ that represents defunctionalized function values and has a constructor foreach dierent function If a rst-class function t1 closes over x τ1 the corresponding constructorC1 will take x τ1 as an argument The interpreter for defunctionalized function values has typesignature applyFun Fun σ τ rarr σ rarr τ The resulting programs are expressed in a rst-ordersubset of the original programming language In defunctionalized programs all remaining functionsare rst-order top-level functions

For instance consider the program on sequences in Fig D1

successors [Z] rarr [Z]successors xs = map (λx rarr x + 1) xsnestedLoop [σ ] rarr [τ ] rarr [(σ τ )]nestedLoop xs ys = concatMap (λx rarr map (λy rarr (x y)) ys) xsmap (σ rarr τ ) rarr [σ ] rarr [τ ]map f [ ] = [ ]map f (x xs) = f x map f xsconcatMap (σ rarr [τ ]) rarr [σ ] rarr [τ ]concatMap f xs = concat (map f xs)

Figure D1 A small example program for defunctionalization

In this program the rst-class function values arise from evaluating the three terms λy rarr (x y)that we call pair λx rarr map (λy rarr (x y)) ys that we call mapPair and λx rarr x + 1 that we calladdOne Defunctionalization creates a type Fun σ τ with a constructor for each of the three termsrespectively Pair MapPair and AddOne Both pair and mapPair close over some free variables sotheir corresponding constructors will take an argument for each free variable for pair we have

x σ ` λy rarr (x y) τ rarr (σ τ )

while for mapPair we have

ys [τ ] ` λx rarr map (λy rarr (x y)) ys σ rarr [(σ τ )]

218 Chapter D Defunctionalizing function changes

Hence the type of defunctionalized functions Fun σ τ and its interpreter applyFun become

data Fun σ τ whereAddOne Fun Z ZPair σ rarr Fun τ (σ τ )MapPair [τ ] rarr Fun σ [(σ τ )]

applyFun Fun σ τ rarr σ rarr τapplyFun AddOne x = x + 1applyFun (Pair x) y = (x y)applyFun (MapPair ys) x = mapDF (Pair x) ys

We need to also transform the rest of the program accordingly

successors [Z] rarr [Z]successors xs = map (λx rarr x + 1) xsnestedLoopDF [σ ] rarr [τ ] rarr [(σ τ )]nestedLoopDF xs ys = concatMapDF (MapPair ys) xsmapDF Fun σ τ rarr [σ ] rarr [τ ]mapDF f [ ] = [ ]mapDF f (x xs) = applyFun f x mapDF f xsconcatMapDF Fun σ [τ ] rarr [σ ] rarr [τ ]concatMapDF f xs = concat (mapDF f xs)

Figure D2 Defunctionalized program

Some readers might notice this program still uses rst-class function because it encodes multi-argument functions through currying To get a fully rst-order program we encode multi-argumentsfunctions using tuples instead of currying1 Using tuples our example becomes

applyFun (Fun σ τ σ ) rarr τapplyFun (AddOne x) = x + 1applyFun (Pair x y) = (x y)applyFun (MapPair ys x) = mapDF (Pair x ys)mapDF (Fun σ τ [σ ]) rarr [τ ]mapDF (f [ ]) = [ ]mapDF (f x xs) = applyFun (f x) mapDF (f xs)concatMapDF (Fun σ [τ ] [σ ]) rarr [τ ]concatMapDF (f xs) = concat (mapDF (f xs))nestedLoopDF ([σ ] [τ ]) rarr [(σ τ )]nestedLoopDF (xs ys) = concatMapDF (MapPair ys xs)

However wersquoll often write such defunctionalized programs using Haskellrsquos typical curriedsyntax as in Fig D2 Such programs must not contain partial applications

1Strictly speaking the resulting program is still not rst-order because in Haskell multi-argument data constructorssuch as the pair constructor () that we use are still rst-class curried functions unlike for instance in OCaml To make thisprogram truly rst-order we must formalize tuple constructors as a term constructor or formalize these function denitionsas multi-argument functions At this point this discussion is merely a technicality that does not aect our goals but itbecomes relevant if we formalize the resulting rst-order language as in Sec 173

Chapter D Defunctionalizing function changes 219

In general defunctionalization creates a constructor C of type Fun σ τ for each rst-classfunction expression Γ ` t σ rarr τ in the source program2 While lambda expression l closesimplicitly over environment ρ n Γ o C closes over it explicitly the values bound to free variablesin environment ρ are passed as arguments to constructor C As a standard optimization we onlyincludes variables that actually occur free in l not all those that are bound in the context where loccurs

D21 Defunctionalization with separate function codesNext we show a slight variant of defunctionalization that we use to achieve our goals with lesscode duplication even at the expense of some eciency we call this variant defunctionalizationwith separate function codes

We rst encode contexts as types and environments as values Empty environments are encodedas empty tuples Environments for a context such as x τ1 y τ2 are encoded as values of typeτ1 times τ2 times

In this defunctionalization variant instead of dening directly a GADT of defunctionalizedfunctions Fun σ τ we dene a GADT of function codes Code env σ τ whose values contain noenvironment Type Code is indexed not just by σ and τ but also by the type of environments andhas a constructor for each rst-class function expression in the source program like Fun σ τ doesin conventional defunctionalization We then dene Fun σ τ as a pair of a function code of typeCode env σ τ and an environment of type env

As a downside separating function codes adds a few indirections to the memory representationof closures for instance we use (AddOne ()) instead of AddOne and (Pair 1) instead of Pair 1

As an upside with separate function codes we can dene many operations generically across allfunction codes (see Appendix D22) instead of generating denitions matching on each functionWhatrsquos more we later dene operations that use raw function codes and need no environmentwe could alternatively dene function codes without using them in the representation of functionvalues at the expense of even more code duplication Code duplication is especially relevant becausewe currently perform defunctionalization by hand though we are condent it would be conceptuallystraightforward to automate the process

Defunctionalizing the program with separate function codes produces the following GADT offunction codes

type Env env = (CompChangeStruct envNilTestable env)data Code env σ τ where

AddOne Code () Z ZPair Code σ τ (σ τ )MapPair Env σ rArr Code [τ ] σ [(σ τ )]

In this denition type Env names a set of typeclass constraints on the type of the environmentusing the ConstraintKinds GHC language extension Satisfying these constraints is necessary toimplement a few operations on functions We also require an interpretation function applyCode forfunction codes If c is the code for a function f = λx rarr t calling applyCode c computes f rsquos outputfrom an environment env for f and an argument for x

applyCode Code env σ τ rarr env rarr σ rarr τapplyCode AddOne () x = x + 1

2We only need codes for functions that are used as rst-class arguments not for other functions though codes for thelatter can be added as well

220 Chapter D Defunctionalizing function changes

applyCode Pair x y = (x y)applyCode MapPair ys x = mapDF (F (Pair x)) ys

The implementation of applyCode MapPair only works because of the Env σ constraint for con-structor MapPair this constraint is required when constructing the defunctionalized function valuethat we pass as argument to mapDF

We represent defunctionalized function values through type RawFun env σ τ a type synonymof the product of Code env σ τ and env We encode type σ rarr τ through type Fun σ τ dened asRawFun env σ τ where env is existentially bound We add constraint Env env to the denition ofFun σ τ because implementing oplus on function changes will require using oplus on environments

type RawFun env σ τ = (Code env σ τ env)data Fun σ τ whereF Env env rArr RawFun env σ τ rarr Fun σ τ

To interpret defunctionalized function values we wrap applyCode in a new version of applyFunhaving the same interface as the earlier applyFun

applyFun Fun σ τ rarr σ rarr τapplyFun (F (code env)) arg = applyCode code env arg

The rest of the source program is defunctionalized like before using the new denition of Fun σ τand of applyFun

D22 Defunctionalizing function changesDefunctionalization encodes function values as pairs of function codes and environments In ILC afunction value can change because the environment changes or because the whole closure is replacedby a dierent one with a dierent function code and dierent environment For now we focus onenvironment changes for simplicity To allow inspecting function changes we defunctionalize themas well but treat them specially

Assume we want to defunctionalize a function change df with type ∆(σ rarr τ ) = σ rarr ∆σ rarr ∆τ valid for function f σ rarr τ Instead of transforming type ∆(σ rarr τ ) into Fun σ (Fun ∆σ ∆τ ) wetransform ∆(σ rarr τ ) into a new type DFun σ τ the change type of Fun σ τ (∆(Fun σ τ ) = DFun σ τ )To apply DFun σ τ we introduce an interpreter dapplyFun Fun σ τ rarr ∆(Fun σ τ ) rarr ∆(σ rarr τ )or equivalently dapplyFun Fun σ τ rarr DFun σ τ rarr σ rarr ∆σ rarr ∆τ which also serves asderivative of applyFun

Like we did for Fun σ τ, we define DFun σ τ using function codes. That is, DFun σ τ pairs a function code Code env σ τ together with an environment change and a change structure for the environment type.

data DFun σ τ = ∀env. ChangeStruct env ⇒ DF (Code env σ τ, ∆env)

Without separate function codes, the definition of DFun would have to include one case for each first-class function.

Environment changes

Instead of defining change structures for environments, we encode environments using tuples and define change structures for tuples.

We first define change structures for empty tuples and pairs (Sec. 12.3.3):


instance ChangeStruct () where
  type ∆() = ()
  _ ⊕ _ = ()
  oreplace _ = ()

instance (ChangeStruct a, ChangeStruct b) ⇒ ChangeStruct (a, b) where
  type ∆(a, b) = (∆a, ∆b)
  (a, b) ⊕ (da, db) = (a ⊕ da, b ⊕ db)
  oreplace (a2, b2) = (oreplace a2, oreplace b2)

To define change structures for n-tuples of other arities we have two choices, which we show on triples (a, b, c) and which can be easily generalized.

We can encode triples as nested pairs (a, (b, (c, ()))). Or we can define change structures for triples directly:

instance (ChangeStruct a, ChangeStruct b, ChangeStruct c) ⇒
    ChangeStruct (a, b, c) where
  type ∆(a, b, c) = (∆a, ∆b, ∆c)
  (a, b, c) ⊕ (da, db, dc) = (a ⊕ da, b ⊕ db, c ⊕ dc)
  oreplace (a2, b2, c2) = (oreplace a2, oreplace b2, oreplace c2)

Generalizing from pairs and triples, one can define similar instances for n-tuples in general (say, for values of n up to some high threshold).
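For concreteness, here is a hedged, self-contained ASCII sketch of these instances, against a minimal ChangeStruct class of our own (Delta stands for ∆ and oplus for ⊕; for this sketch only, Int changes are replacement values, since such a minimal class cannot express richer change types):

{-# LANGUAGE TypeFamilies #-}

-- Minimal sketch of a change-structure class; not the thesis's full class.
class ChangeStruct t where
  type Delta t
  oplus    :: t -> Delta t -> t  -- apply a change to a base value
  oreplace :: t -> Delta t       -- replacement change

-- Assumption for this sketch only: an Int change replaces the old value.
instance ChangeStruct Int where
  type Delta Int = Int
  oplus _ x2 = x2
  oreplace = id

instance ChangeStruct () where
  type Delta () = ()
  oplus _ _ = ()
  oreplace _ = ()

instance (ChangeStruct a, ChangeStruct b) => ChangeStruct (a, b) where
  type Delta (a, b) = (Delta a, Delta b)
  oplus (a, b) (da, db) = (oplus a da, oplus b db)
  oreplace (a2, b2) = (oreplace a2, oreplace b2)

main :: IO ()
main = print (oplus (1 :: Int, (2 :: Int, ())) (10, (20, ())))
-- prints (10,(20,())): changes propagate componentwise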

Validity and ⊕ on defunctionalized function changes

A function change df is valid for f if df has the same function code as f, and if df's environment change is valid for f's environment:

DF (c, dρ) ▷ F (c, ρ1) ↪ F (c, ρ2) : Fun σ τ  =  dρ ▷ ρ1 ↪ ρ2 : env

where c is a function code of type Code env σ τ, and where c's type binds the type variable env that we use on the right-hand side.

Next, we implement ⊕ on function changes to match our definition of validity, as required. We only need f ⊕ df to give a result if df is a valid change for f. Hence, if the function code embedded in df does not match the one in f, we give an error.³ However, our first attempt does not typecheck, since the typechecker does not know whether the environment and the environment change have compatible types.

instance ChangeStruct (Fun σ τ) where
  type ∆(Fun σ τ) = DFun σ τ
  F (c1, env) ⊕ DF (c2, denv) =
    if c1 ≡ c2 then
      F (c1, env ⊕ denv)
    else
      error "Invalid function change in ⊕"

In particular, env ⊕ denv is reported as ill-typed because we don't know that env and denv have compatible types. Take c1 = Pair, c2 = MapPair, f = F (Pair, env) :: σ → τ and df = DF (MapPair, denv) :: ∆(σ → τ). Assume we evaluate f ⊕ df = F (Pair, env) ⊕ DF (MapPair, denv): there, indeed, env :: σ and denv :: ∆[τ], so env ⊕ denv is not type-correct. Yet evaluating f ⊕ df would not fail, because MapPair and Pair are different: c1 ≡ c2 will return false and env ⊕ denv won't be evaluated. But the typechecker does not know that.

³ We originally specified ⊕ as a total function to avoid formalizing partial functions, but as mentioned in Sec. 10.3 we do not rely on the behavior of ⊕ on invalid changes.

Hence, we need an equality operation that produces a witness of type equality. We define the needed infrastructure with a few lines of code. First, we need a GADT of witnesses of type equality; we can borrow its definition from GHC's standard library, which is just:

-- From Data.Type.Equality
data τ1 ∼ τ2 where
  Refl :: τ ∼ τ

If x has type τ1 ∼ τ2 and matches pattern Refl, then by standard GADT typing rules τ1 and τ2 are equal. Even if τ1 ∼ τ2 has only constructor Refl, a match is necessary since x might be bottom. Readers familiar with type theory, Agda or Coq will recognize that ∼ resembles Agda's propositional equality or Martin-Löf's identity types, even though it can only represent equality between types and not between values.
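The payoff of matching on Refl is that the match brings the equality into scope. The following standalone sketch (our analogue of Data.Type.Equality, with Equal in place of ∼) shows the safe cast one gets this way:

{-# LANGUAGE GADTs #-}

-- Standalone analogue of Data.Type.Equality's witness type.
data Equal a b where
  Refl :: Equal a a

-- Matching on Refl teaches the typechecker that a and b coincide,
-- so returning x at type b is well-typed.
castWith :: Equal a b -> a -> b
castWith Refl x = x

main :: IO ()
main = print (castWith Refl (42 :: Int))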

Next, we implement function codeMatch to compare codes. For equal codes, this operation produces a witness that their environment types match.⁴ Using this operation, we can complete the above instance of ChangeStruct (Fun σ τ).

codeMatch :: Code env1 σ τ → Code env2 σ τ → Maybe (env1 ∼ env2)
codeMatch AddOne  AddOne  = Just Refl
codeMatch Pair    Pair    = Just Refl
codeMatch MapPair MapPair = Just Refl
codeMatch _ _ = Nothing

instance ChangeStruct (Fun σ τ) where
  type ∆(Fun σ τ) = DFun σ τ
  F (c1, env) ⊕ DF (c2, denv) =
    case codeMatch c1 c2 of
      Just Refl → F (c1, env ⊕ denv)
      Nothing   → error "Invalid function change in ⊕"

Applying function changes

After defining environment changes, we define an incremental interpretation function dapplyCode. If c is the code for a function f = λx → t, as discussed, calling applyCode c computes the output of f from an environment env for f and an argument for x. Similarly, calling dapplyCode c computes the output of D⟦f⟧ from an environment env for f, an environment change denv valid for env, an argument for x, and an argument change dx.

In our example, we have:

dapplyCode :: Code env σ τ → env → ∆env → σ → ∆σ → ∆τ
dapplyCode AddOne () () x dx = dx
dapplyCode Pair x dx y dy = (dx, dy)
dapplyCode MapPair ys dys x dx =
  dmapDF (F (Pair, x)) (DF (Pair, dx)) ys dys

⁴ If a code is polymorphic in the environment type, it must take as argument a representation of its type argument, to be used to implement codeMatch. We represent type arguments at runtime via instances of Typeable, and omit standard details here.


On top of dapplyCode we can define dapplyFun, which functions as a derivative of applyFun and allows applying function changes:

dapplyFun :: Fun σ τ → DFun σ τ → σ → ∆σ → ∆τ
dapplyFun (F (c1, env)) (DF (c2, denv)) x dx =
  case codeMatch c1 c2 of
    Just Refl → dapplyCode c1 env denv x dx
    Nothing   → error "Invalid function change in dapplyFun"
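As a sanity check, we can run the derivative property on a small example; here we read integer changes as additive differences (so that x ⊕ dx = x + dx), an assumption of this example only. Take f = F (AddOne, ()) and its nil change df = DF (AddOne, ()); since ∆() = (), we have dapplyFun f df x dx = dapplyCode AddOne () () x dx = dx, and indeed:

  applyFun f (x ⊕ dx) = (x + dx) + 1
                      = (x + 1) + dx
                      = applyFun f x ⊕ dapplyFun f df x dx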

However, we can also implement further accessors that inspect function changes. We can now finally detect if a change is nil. To this end, we first define a typeclass that allows testing changes to determine if they're nil:

class NilChangeStruct t ⇒ NilTestable t where
  isNil :: ∆t → Bool

Now, a function change is nil only if the contained environment change is nil:

instance NilTestable (Fun σ τ) where
  isNil :: DFun σ τ → Bool
  isNil (DF (code, denv)) = isNil denv

However, this definition of isNil only works given a typeclass instance for NilTestable env; we need to add this requirement as a constraint, but we cannot add it to isNil's type signature, since type variable env is not in scope there. Instead, we must add the constraint where we introduce it by existential quantification, just like the constraint ChangeStruct env. In particular, we can reuse the constraint Env env:

data DFun σ τ =
  ∀env. Env env ⇒
    DF (Code env σ τ, ∆env)

instance NilChangeStruct (Fun σ τ) where
  0_(F (code, env)) = DF (code, 0_env)

instance NilTestable (Fun σ τ) where
  isNil :: DFun σ τ → Bool
  isNil (DF (code, denv)) = isNil denv

We can then wrap a derivative via function wrapDF to return a nil change immediately if, at runtime, all input changes turn out to be nil. This was not possible in the setting described by Cai et al. [2014], because nil function changes could not be detected at runtime, only at compile time. To do so, we must produce directly a nil change for v = applyFun f x. To avoid computing v, we assume we can compute a nil change for v without access to v, via operation onil and typeclass OnilChangeStruct (omitted here); argument Proxy is a constant, required for purely technical reasons.

wrapDF :: OnilChangeStruct τ ⇒ Fun σ τ → DFun σ τ → σ → ∆σ → ∆τ
wrapDF f df x dx =
  if isNil df then
    onil Proxy -- Change-equivalent to 0_(applyFun f x)
  else
    dapplyFun f df x dx
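The thesis omits OnilChangeStruct; as a rough sketch (our guess at a minimal signature, not the implementation's actual definition), it could look as follows, with an instance for integers under additive changes:

{-# LANGUAGE TypeFamilies #-}

import Data.Proxy (Proxy (..))

-- Hypothetical sketch of the omitted OnilChangeStruct class: onil
-- produces a nil change for values of type t without inspecting any
-- value, so Proxy merely pins down the type. Delta stands for ∆.
type family Delta t

class OnilChangeStruct t where
  onil :: Proxy t -> Delta t

-- Assumption for this sketch: additive Int changes, whose nil change is 0.
type instance Delta Int = Int

instance OnilChangeStruct Int where
  onil _ = 0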


D.3 Defunctionalization and cache-transfer-style

We can combine the above ideas with cache-transfer-style (Chapter 17). We show the details quickly.

Combining the above with caching, we can use defunctionalization as described to implement the following interface for functions in caches. For extra generality, we use extension ConstraintKinds to allow instances to define the required typeclass constraints.

class FunOps k where
  type Dk k = (dk :: ∗ → ∗ → ∗ → ∗ → ∗) | dk → k

  type ApplyCtx k i o :: Constraint
  apply :: ApplyCtx k i o ⇒
    k i o cache → i → (o, cache)

  type DApplyCtx k i o :: Constraint
  dApply :: DApplyCtx k i o ⇒
    Dk k i o cache1 cache2 → ∆i → cache1 → (∆o, cache2)

  type DerivCtx k i o :: Constraint
  deriv :: DerivCtx k i o ⇒
    k i o cache → Dk k i o cache cache

  type FunUpdCtx k i o :: Constraint
  funUpdate :: FunUpdCtx k i o ⇒
    k i o cache1 → Dk k i o cache1 cache2 → k i o cache2

  isNilFun :: Dk k i o cache1 cache2 → Maybe (cache1 ∼ cache2)

updatedDeriv :: (FunOps k, FunUpdCtx k i o, DerivCtx k i o) ⇒
  k i o cache1 → Dk k i o cache1 cache2 → Dk k i o cache2 cache2
updatedDeriv f df = deriv (f `funUpdate` df)

Type constructor k defines the specific constructor for the function type. In this interface, the type of function changes, Dk k i o cache1 cache2, represents changes (encoded by type constructor Dk k) to functions with inputs of type i, outputs of type o, input cache type cache1 and output cache type cache2. Types cache1 and cache2 coincide for typical function changes, but can be different for replacement function changes, or more generally for function changes across functions with different implementations and cache types. Correspondingly, dApply supports applying such changes across closures with different implementations; unfortunately, unless the two implementations are similar, the cache content cannot be reused.

To implement this interface, it is sufficient to define a type of codes that admits an instance of typeclass Codelike, whose methods adapt the earlier definitions of codeMatch, applyCode and dapplyCode to cache-transfer style:

class Codelike code where
  codeMatch :: code env1 a1 r1 cache1 → code env2 a2 r2 cache2 →
    Maybe ((env1, cache1) ∼ (env2, cache2))
  applyCode :: code env a b cache → env → a → (b, cache)
  dapplyCode :: code env a b cache → ∆env → ∆a → cache → (∆b, cache)

Typically, a defunctionalized program uses no first-class functions and has a single type of functions; having a type class of function codes weakens that property. We can still use a single type of codes throughout our program, but we can also use different types of codes for different parts of a program, without allowing for communication between those parts.

On top of the Codelike interface, we can define instances of interfaces FunOps and ChangeStruct. To demonstrate this, we show a complete implementation in Figs. D.3 and D.4. Similarly to ⊕, we can implement ⊚ by comparing codes contained in function changes; this is not straightforward when using closure conversion as in Sec. 17.4.2, unless we store even more type representations.

We can detect nil changes at runtime, even in cache-passing style. We can for instance define function wrapDer1, which does something trickier than wrapDF: here we assume that dg is a derivative taking function change df as argument. Then, if df is nil, dg df must also be nil, so we can return a nil change directly, together with the input cache. The input cache has the required type because, in this case, the type cache2 of the desired output cache matches the type cache1 of the input cache (because we have a nil change df across them); the return value of isNilFun witnesses this type equality.

wrapDer1 :: (FunOps k, OnilChangeStruct r′) ⇒
  (Dk k i o cache1 cache2 → f cache1 → (∆r′, f cache2)) →
  (Dk k i o cache1 cache2 → f cache1 → (∆r′, f cache2))
wrapDer1 dg df c =
  case isNilFun df of
    Just Refl → (onil Proxy, c)
    Nothing   → dg df c

We can also hide the difference between different cache types by defining a uniform type of caches, Cache code. To hide caches, we can use a pair of a cache (of type cache) and a code for that cache type. When applying a function (change) to a cache, or when composing function changes, we can compare the function code with the cache code.

In this code, we have not shown support for replacement values for functions; we leave details to our implementation.


type RawFun a b code env cache = (code env a b cache, env)
type RawDFun a b code env cache = (code env a b cache, ∆env)

data Fun a b code = ∀env cache. Env env ⇒
  F (RawFun a b code env cache)

data DFun a b code = ∀env cache. Env env ⇒
  DF (RawDFun a b code env cache)

-- This cache is not indexed by a and b.
data Cache code = ∀a b env cache.
  C (code env a b cache) cache

-- Wrappers
data FunW code a b cache where
  W :: Fun a b code → FunW code a b (Cache code)

data DFunW code a b cache1 cache2 where
  DW :: DFun a b code → DFunW code a b (Cache code) (Cache code)

derivFun :: Fun a b code → DFun a b code
derivFun (F (code, env)) = DF (code, 0_env)

oplusBase :: Codelike code ⇒ Fun a b code → DFun a b code → Fun a b code
oplusBase (F (c1, env)) (DF (c2, denv)) =
  case codeMatch c1 c2 of
    Just Refl →
      F (c1, env ⊕ denv)
    _ → error "INVALID call to oplusBase"

ocomposeBase :: Codelike code ⇒ DFun a b code → DFun a b code → DFun a b code
ocomposeBase (DF (c1, denv1)) (DF (c2, denv2)) =
  case codeMatch c1 c2 of
    Just Refl →
      DF (c1, denv1 ⊚ denv2)
    _ → error "INVALID call to ocomposeBase"

instance Codelike code ⇒ ChangeStruct (Fun a b code) where
  type ∆(Fun a b code) = DFun a b code
  (⊕) = oplusBase

instance Codelike code ⇒ NilChangeStruct (Fun a b code) where
  0_(F (c, env)) = DF (c, 0_env)

instance Codelike code ⇒ CompChangeStruct (Fun a b code) where
  df1 ⊚ df2 = ocomposeBase df1 df2

instance Codelike code ⇒ NilTestable (Fun a b code) where
  isNil (DF (c, denv)) = isNil denv

Figure D.3: Implementing change structures using Codelike instances.


applyRaw :: Codelike code ⇒ RawFun a b code env cache → a → (b, cache)
applyRaw (code, env) = applyCode code env

dapplyRaw :: Codelike code ⇒ RawDFun a b code env cache → ∆a → cache → (∆b, cache)
dapplyRaw (code, denv) = dapplyCode code denv

applyFun :: Codelike code ⇒ Fun a b code → a → (b, Cache code)
applyFun (F f@(code, env)) arg =
  (id ∗∗∗ C code) $ applyRaw f arg

dapplyFun :: Codelike code ⇒ DFun a b code → ∆a → Cache code → (∆b, Cache code)
dapplyFun (DF (code1, denv)) darg (C code2 cache1) =
  case codeMatch code1 code2 of
    Just Refl →
      (id ∗∗∗ C code1) $ dapplyCode code1 denv darg cache1
    _ → error "INVALID call to dapplyFun"

instance Codelike code ⇒ FunOps (FunW code) where
  type Dk (FunW code) = DFunW code
  apply (W f) = applyFun f
  dApply (DW df) = dapplyFun df
  deriv (W f) = DW (derivFun f)
  funUpdate (W f) (DW df) = W (f ⊕ df)
  isNilFun (DW df) =
    if isNil df then
      Just Refl
    else
      Nothing

Figure D.4: Implementing FunOps using Codelike instances.


Acknowledgments

I thank my supervisor Klaus Ostermann for encouraging me to do a PhD and for his years of supervision. He has always been patient and kind.

Thanks also to Fritz Henglein for agreeing to review my thesis.

I have many people to thank and too little time to do it properly; I'll finish these acknowledgments in the final version.

The research presented here has benefited from the help of coauthors, reviewers and people who offered helpful comments, including comments from the POPL'18 reviewers. Moreover, this PhD thesis wouldn't be possible without all the people who have supported me, both during all the years of my PhD and in the years before.



Bibliography

[Citing pages are listed after each reference.]

Umut A. Acar. Self-Adjusting Computation. PhD thesis, Princeton University, 2005. [Pages 73, 135, 161 and 174]

Umut A. Acar. Self-adjusting computation (an overview). In PEPM, pages 1–6. ACM, 2009. [Pages 2, 128, 161 and 173]

Umut A. Acar, Guy E. Blelloch, and Robert Harper. Adaptive functional programming. TOPLAS, 28(6):990–1034, November 2006. [Page 173]

Umut A. Acar, Amal Ahmed, and Matthias Blume. Imperative self-adjusting computation. In Proceedings of the 35th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL '08, pages 309–322, New York, NY, USA, 2008. ACM. ISBN 978-1-59593-689-9. doi: 10.1145/1328438.1328476. URL http://doi.acm.org/10.1145/1328438.1328476. [Pages 143, 146, 161, 194, 200, 201, 202, 204, 205 and 212]

Umut A. Acar, Guy E. Blelloch, Matthias Blume, Robert Harper, and Kanat Tangwongsan. An experimental analysis of self-adjusting computation. TOPLAS, 32(1):3:1–3:53, November 2009. [Page 173]

Umut A. Acar, Guy Blelloch, Ruy Ley-Wild, Kanat Tangwongsan, and Duru Turkoglu. Traceable data types for self-adjusting computation. In PLDI, pages 483–496. ACM, 2010. [Pages 162 and 173]

Stefan Ackermann, Vojin Jovanovic, Tiark Rompf, and Martin Odersky. Jet: An embedded DSL for high performance big data processing. In Int'l Workshop on End-to-end Management of Big Data (BigData), 2012. [Page 49]

Agda Development Team. The Agda Wiki. http://wiki.portal.chalmers.se/agda, 2013. Accessed on 2017-06-29. [Page 181]

Amal Ahmed. Step-Indexed Syntactic Logical Relations for Recursive and Quantified Types. In Programming Languages and Systems, pages 69–83. Springer Berlin Heidelberg, March 2006. doi: 10.1007/11693024_6. [Pages 143, 146, 161, 163, 193, 194, 201, 202, 204, 205, 210, 212 and 213]

Guillaume Allais, James Chapman, Conor McBride, and James McKinna. Type-and-scope safe programs and their proofs. In Yves Bertot and Viktor Vafeiadis, editors, CPP 2017. ACM, New York, NY, January 2017. ISBN 978-1-4503-4705-1. [Page 186]

Thorsten Altenkirch and Bernhard Reus. Monadic Presentations of Lambda Terms Using Generalized Inductive Types. In Jörg Flum and Mario Rodriguez-Artalejo, editors, Computer Science Logic, Lecture Notes in Computer Science, pages 453–468. Springer Berlin Heidelberg, September 1999. ISBN 978-3-540-66536-6, 978-3-540-48168-3. doi: 10.1007/3-540-48168-0_32. [Page 186]


Nada Amin and Tiark Rompf. Type soundness proofs with definitional interpreters. In Proceedings of the 44th ACM SIGPLAN Symposium on Principles of Programming Languages, POPL 2017, pages 666–679, New York, NY, USA, 2017. ACM. ISBN 978-1-4503-4660-3. doi: 10.1145/3009837.3009866. URL http://doi.acm.org/10.1145/3009837.3009866. [Pages 186 and 201]

Andrew W. Appel and Trevor Jim. Shrinking lambda expressions in linear time. JFP, 7(5):515–540, 1997. [Page 129]

Andrew W. Appel and David McAllester. An indexed model of recursive types for foundational proof-carrying code. ACM Trans. Program. Lang. Syst., 23(5):657–683, September 2001. ISSN 0164-0925. doi: 10.1145/504709.504712. URL http://doi.acm.org/10.1145/504709.504712. [Page 161]

Robert Atkey. The incremental λ-calculus and relational parametricity, 2015. URL http://bentnib.org/posts/2015-04-23-incremental-lambda-calculus-and-parametricity.html. Last accessed on 29 June 2017. [Pages 81, 126, 163 and 193]

Lennart Augustsson and Magnus Carlsson. An exercise in dependent types: A well-typed interpreter. In Workshop on Dependent Types in Programming, Gothenburg, 1999. [Page 186]

H. P. Barendregt. The Lambda Calculus: Its Syntax and Semantics. Elsevier, 1984. [Page 63]

Henk P. Barendregt. Lambda Calculi with Types, pages 117–309. Oxford University Press, New York, 1992. [Pages 164, 166 and 171]

Nick Benton, Chung-Kil Hur, Andrew J. Kennedy, and Conor McBride. Strongly Typed Term Representations in Coq. Journal of Automated Reasoning, 49(2):141–159, August 2012. ISSN 0168-7433, 1573-0670. doi: 10.1007/s10817-011-9219-0. [Page 186]

Jean-Philippe Bernardy and Marc Lasson. Realizability and parametricity in pure type systems. In Proceedings of the 14th International Conference on Foundations of Software Science and Computational Structures: Part of the Joint European Conferences on Theory and Practice of Software, FOSSACS'11/ETAPS'11, pages 108–122, Berlin, Heidelberg, 2011. Springer-Verlag. ISBN 978-3-642-19804-5. [Pages 89, 163, 164, 165, 166, 167, 172 and 177]

Jean-Philippe Bernardy, Patrik Jansson, and Ross Paterson. Parametricity and dependent types. In Proceedings of the 15th ACM SIGPLAN International Conference on Functional Programming, ICFP '10, pages 345–356, New York, NY, USA, 2010. ACM. ISBN 978-1-60558-794-3. doi: 10.1145/1863543.1863592. URL http://doi.acm.org/10.1145/1863543.1863592. [Pages 166, 167 and 172]

Gavin M. Bierman, Erik Meijer, and Mads Torgersen. Lost in translation: formalizing proposed extensions to C♯. In OOPSLA, pages 479–498. ACM, 2007. [Page 47]

Jose A. Blakeley, Per-Åke Larson, and Frank Wm. Tompa. Efficiently updating materialized views. In SIGMOD, pages 61–71. ACM, 1986. [Pages v, vii, 2, 161 and 174]

Maximilian C. Bolingbroke and Simon L. Peyton Jones. Types are calling conventions. In Proceedings of the 2nd ACM SIGPLAN Symposium on Haskell, Haskell '09, pages 1–12, New York, NY, USA, 2009. ACM. ISBN 978-1-60558-508-6. doi: 10.1145/1596638.1596640. [Page 160]

K. J. Brown, A. K. Sujeeth, Hyouk Joong Lee, T. Rompf, H. Chafi, M. Odersky, and K. Olukotun. A heterogeneous parallel framework for domain-specific languages. In Parallel Architectures and Compilation Techniques (PACT), 2011 International Conference on, pages 89–100, 2011. doi: 10.1109/PACT.2011.15. [Page 78]


Yufei Cai. PhD thesis, University of Tübingen, 2017. In preparation. [Pages 81 and 126]

Yufei Cai, Paolo G. Giarrusso, Tillmann Rendel, and Klaus Ostermann. A theory of changes for higher-order languages – incrementalizing λ-calculi by static differentiation. In Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI '14, pages 145–155, New York, NY, USA, 2014. ACM. ISBN 978-1-4503-2784-8. doi: 10.1145/2594291.2594304. URL http://doi.acm.org/10.1145/2594291.2594304. [Pages 2, 3, 59, 66, 84, 92, 93, 107, 117, 121, 124, 125, 135, 136, 137, 138, 141, 145, 146, 154, 159, 175, 177, 182, 184, 190 and 223]

Jacques Carette, Oleg Kiselyov, and Chung-chieh Shan. Finally tagless, partially evaluated: Tagless staged interpreters for simpler typed languages. JFP, 19:509–543, 2009. [Page 31]

Craig Chambers, Ashish Raniwala, Frances Perry, Stephen Adams, Robert R. Henry, Robert Bradshaw, and Nathan Weizenbaum. FlumeJava: easy, efficient data-parallel pipelines. In PLDI, pages 363–375. ACM, 2010. [Pages 10, 20 and 47]

Yan Chen, Joshua Dunfield, Matthew A. Hammer, and Umut A. Acar. Implicit self-adjusting computation for purely functional programs. In ICFP, pages 129–141. ACM, 2011. [Page 173]

Yan Chen, Umut A. Acar, and Kanat Tangwongsan. Functional programming for dynamic and large data with self-adjusting computation. In Proceedings of the 19th ACM SIGPLAN International Conference on Functional Programming, ICFP '14, pages 227–240, New York, NY, USA, 2014. ACM. ISBN 978-1-4503-2873-9. doi: 10.1145/2628136.2628150. URL http://doi.acm.org/10.1145/2628136.2628150. [Page 162]

James Cheney, Sam Lindley, and Philip Wadler. A Practical Theory of Language-integrated Query. In Proceedings of the 18th ACM SIGPLAN International Conference on Functional Programming, ICFP '13, pages 403–416, New York, NY, USA, 2013. ACM. ISBN 978-1-4503-2326-0. doi: 10.1145/2500365.2500586. [Page 174]

Adam Chlipala. Parametric higher-order abstract syntax for mechanized semantics. In Proceedings of the 13th ACM SIGPLAN International Conference on Functional Programming, ICFP '08, pages 143–156, New York, NY, USA, 2008. ACM. ISBN 978-1-59593-919-7. doi: 10.1145/1411204.1411226. URL http://doi.acm.org/10.1145/1411204.1411226. [Page 186]

Ezgi Çiçek, Zoe Paraskevopoulou, and Deepak Garg. A Type Theory for Incremental Computational Complexity with Control Flow Changes. In Proceedings of the 21st ACM SIGPLAN International Conference on Functional Programming, ICFP 2016, pages 132–145, New York, NY, USA, 2016. ACM. ISBN 978-1-4503-4219-3. doi: 10.1145/2951913.2951950. [Page 161]

Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. Introduction to Algorithms. MIT Press, Cambridge, MA, USA, 2nd edition, 2001. ISBN 0-262-03293-7, 9780262032933. [Page 73]

Oren Eini. The pain of implementing LINQ providers. Commun. ACM, 54(8):55–61, 2011. [Page 47]

Conal Elliott and Paul Hudak. Functional reactive animation. In ICFP, pages 263–273. ACM, 1997. [Page 174]

Conal Elliott, Sigbjorn Finne, and Oege de Moor. Compiling embedded languages. JFP, 13(2):455–481, 2003. [Pages 20, 21 and 48]

Burak Emir, Andrew Kennedy, Claudio Russo, and Dachuan Yu. Variance and generalized constraints for C♯ generics. In ECOOP, pages 279–303. Springer-Verlag, 2006. [Page 4]


Burak Emir, Qin Ma, and Martin Odersky. Translation correctness for first-order object-oriented pattern matching. In Proceedings of the 5th Asian Conference on Programming Languages and Systems, APLAS'07, pages 54–70, Berlin, Heidelberg, 2007a. Springer-Verlag. ISBN 3-540-76636-7, 978-3-540-76636-0. URL http://dl.acm.org/citation.cfm?id=1784774.1784782. [Page 4]

Burak Emir, Martin Odersky, and John Williams. Matching objects with patterns. In ECOOP, pages 273–298. Springer-Verlag, 2007b. [Pages 4, 32, 44 and 45]

Sebastian Erdweg, Paolo G. Giarrusso, and Tillmann Rendel. Language composition untangled. In Proceedings of the Workshop on Language Descriptions, Tools and Applications (LDTA), LDTA '12, pages 7:1–7:8, New York, NY, USA, 2012. ACM. ISBN 978-1-4503-1536-4. doi: 10.1145/2427048.2427055. URL http://doi.acm.org/10.1145/2427048.2427055. [Pages 4 and 187]

Sebastian Erdweg, Tijs van der Storm, and Yi Dai. Capture-Avoiding and Hygienic Program Transformations. In ECOOP 2014 – Object-Oriented Programming, pages 489–514. Springer Berlin Heidelberg, July 2014. doi: 10.1007/978-3-662-44202-9_20. [Page 93]

Andrew Farmer, Andy Gill, Ed Komp, and Neil Sculthorpe. The HERMIT in the Machine: A Plugin for the Interactive Transformation of GHC Core Language Programs. In Proceedings of the 2012 Haskell Symposium, Haskell '12, pages 1–12, New York, NY, USA, 2012. ACM. ISBN 978-1-4503-1574-6. doi: 10.1145/2364506.2364508. [Page 78]

Leonidas Fegaras. Optimizing queries with object updates. Journal of Intelligent Information Systems, 12:219–242, 1999. ISSN 0925-9902. URL http://dx.doi.org/10.1023/A:1008757010516. [Page 71]

Leonidas Fegaras and David Maier. Towards an effective calculus for object query languages. In Proceedings of the 1995 ACM SIGMOD International Conference on Management of Data, SIGMOD '95, pages 47–58, New York, NY, USA, 1995. ACM. ISBN 0-89791-731-6. doi: 10.1145/223784.223789. URL http://doi.acm.org/10.1145/223784.223789. [Page 71]

Leonidas Fegaras and David Maier. Optimizing object queries using an effective calculus. ACM Trans. Database Systems (TODS), 25:457–516, 2000. [Pages 11, 21 and 48]

Denis Firsov and Wolfgang Jeltsch. Purely Functional Incremental Computing. In Fernando Castor and Yu David Liu, editors, Programming Languages, number 9889 in Lecture Notes in Computer Science, pages 62–77. Springer International Publishing, September 2016. ISBN 978-3-319-45278-4, 978-3-319-45279-1. doi: 10.1007/978-3-319-45279-1_5. [Pages 70, 75, 156 and 161]

Philip J. Fleming and John J. Wallace. How not to lie with statistics: the correct way to summarize benchmark results. Commun. ACM, 29(3):218–221, March 1986. [Pages xvii, 39 and 41]

Bent Flyvbjerg. Five misunderstandings about case-study research. Qualitative Inquiry, 12(2):219–245, 2006. [Page 23]

Miguel Garcia, Anastasia Izmaylova, and Sibylle Schupp. Extending Scala with database query capability. Journal of Object Technology, 9(4):45–68, 2010. [Page 48]

Andy Georges, Dries Buytaert, and Lieven Eeckhout. Statistically rigorous Java performance evaluation. In OOPSLA, pages 57–76. ACM, 2007. [Pages 40, 41 and 132]

Paolo G. Giarrusso. Open GADTs and declaration-site variance: A problem statement. In Proceedings of the 4th Workshop on Scala, SCALA '13, pages 5:1–5:4, New York, NY, USA, 2013. ACM. ISBN 978-1-4503-2064-1. doi: 10.1145/2489837.2489842. URL http://doi.acm.org/10.1145/2489837.2489842. [Page 4]


Paolo G. Giarrusso, Klaus Ostermann, Michael Eichberg, Ralf Mitschke, Tillmann Rendel, and Christian Kästner. Reify your collection queries for modularity and speed! CoRR, abs/1210.6284, 2012. URL http://arxiv.org/abs/1210.6284. [Page 20]

Paolo G. Giarrusso, Klaus Ostermann, Michael Eichberg, Ralf Mitschke, Tillmann Rendel, and Christian Kästner. Reify your collection queries for modularity and speed! In AOSD, pages 1–12. ACM, 2013. [Pages 2 and 3]

Paolo G. Giarrusso, Yann Régis-Gianas, and Philipp Schuster. Incremental λ-calculus in cache-transfer style: Static memoization by program transformation. Submitted. [Pages 3 and 213]

Andrew Gill, John Launchbury, and Simon L. Peyton Jones. A short cut to deforestation. In FPCA, pages 223–232. ACM, 1993. [Page 9]

Jean-Yves Girard, Paul Taylor, and Yves Lafont. Proofs and Types. Cambridge University Press, Cambridge, 1989. [Page 164]

Dieter Gluche, Torsten Grust, Christof Mainberger, and Marc Scholl. Incremental updates for materialized OQL views. In Deductive and Object-Oriented Databases, volume 1341 of LNCS, pages 52–66. Springer, 1997. [Pages 57, 127, 129, 131 and 174]

T. Grust and M. H. Scholl. Monoid comprehensions as a target for the translation of OQL. In Workshop on Performance Enhancement in Object Bases, Schloss Dagstuhl. Citeseer, 1996a. [Page 71]

Torsten Grust. Comprehending queries. PhD thesis, University of Konstanz, 1999. [Pages 32 and 48]

Torsten Grust and Marc H. Scholl. Translating OQL into monoid comprehensions: Stuck with nested loops? Technical Report 3a/1996, University of Konstanz, Department of Mathematics and Computer Science, Database Research Group, September 1996b. [Page 32]

Torsten Grust and Marc H. Scholl. How to comprehend queries functionally. Journal of Intelligent Information Systems, 12:191–218, 1999. [Page 21]

Torsten Grust and Alexander Ulrich. First-class functions for first-order database engines. In Todd J. Green and Alan Schmitt, editors, Proceedings of the 14th International Symposium on Database Programming Languages (DBPL 2013), August 30, 2013, Riva del Garda, Trento, Italy, 2013. URL http://arxiv.org/abs/1308.0158. [Page 174]

Torsten Grust, Manuel Mayr, Jan Rittinger, and Tom Schreiber. FERRY: database-supported program execution. In Proc. Int'l SIGMOD Conf. on Management of Data (SIGMOD), pages 1063–1066. ACM, 2009. [Pages 48 and 174]

Torsten Grust, Nils Schweinsberg, and Alexander Ulrich. Functions Are Data Too: Defunctionalization for PL/SQL. Proc. VLDB Endow., 6(12):1214–1217, August 2013. ISSN 2150-8097. doi: 10.14778/2536274.2536279. [Page 174]

Ashish Gupta and Inderpal Singh Mumick. Maintenance of materialized views: problems, techniques, and applications. In Ashish Gupta and Inderpal Singh Mumick, editors, Materialized Views, pages 145–157. MIT Press, 1999. [Pages v, vii, 2, 128, 161, 173 and 174]

Elnar Hajiyev, Mathieu Verbaere, and Oege de Moor. CodeQuest: Scalable source code queries with Datalog. In ECOOP, pages 2–27. Springer, 2006. [Page 49]


Matthew A. Hammer, Georg Neis, Yan Chen, and Umut A. Acar. Self-adjusting stack machines. In OOPSLA, pages 753–772. ACM, 2011. [Page 173]

Matthew A. Hammer, Khoo Yit Phang, Michael Hicks, and Jeffrey S. Foster. Adapton: Composable Demand-driven Incremental Computation. In Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI '14, pages 156–166, New York, NY, USA, 2014. ACM. ISBN 978-1-4503-2784-8. doi: 10.1145/2594291.2594324. [Page 161]

Matthew A. Hammer, Joshua Dunfield, Kyle Headley, Nicholas Labich, Jeffrey S. Foster, Michael Hicks, and David Van Horn. Incremental computation with names. In Proceedings of the 2015 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications, OOPSLA 2015, pages 748–766, New York, NY, USA, 2015. ACM. ISBN 978-1-4503-3689-5. doi: 10.1145/2814270.2814305. URL http://doi.acm.org/10.1145/2814270.2814305. [Page 161]

Matthew A. Hammer, Joshua Dunfield, Dimitrios J. Economou, and Monal Narasimhamurthy. Typed Adapton: Refinement types for incremental computations with precise names. arXiv:1610.00097 [cs], October 2016. [Page 161]

Robert Harper. Constructing type systems over an operational semantics. Journal of Symbolic Computation, 14(1):71–84, July 1992. ISSN 0747-7171. doi: 10.1016/0747-7171(92)90026-Z. [Page 118]

Robert Harper. Practical Foundations for Programming Languages. Cambridge University Press, 2nd edition, 2016. ISBN 9781107150300. [Page 163]

Kyle Headley and Matthew A. Hammer. The Random Access Zipper: Simple, Purely-Functional Sequences. arXiv:1608.06009 [cs], August 2016. [Page 73]

Fritz Henglein. Optimizing Relational Algebra Operations Using Generic Equivalence Discriminators and Lazy Products. In Proceedings of the 2010 ACM SIGPLAN Workshop on Partial Evaluation and Program Manipulation, PEPM '10, pages 73–82, New York, NY, USA, 2010. ACM. ISBN 978-1-60558-727-1. doi: 10.1145/1706356.1706372. [Page 48]

Fritz Henglein and Ken Friis Larsen. Generic Multiset Programming for Language-integrated Querying. In Proceedings of the 6th ACM SIGPLAN Workshop on Generic Programming, WGP '10, pages 49–60, New York, NY, USA, 2010. ACM. ISBN 978-1-4503-0251-7. doi: 10.1145/1863495.1863503. [Page 48]

R. Hinze and R. Paterson. Finger trees: a simple general-purpose data structure. Journal of Functional Programming, 16(2):197–218, 2006. [Pages 73 and 75]

Christian Hofer, Klaus Ostermann, Tillmann Rendel, and Adriaan Moors. Polymorphic embedding of DSLs. In GPCE, pages 137–148. ACM, 2008. [Page 44]

Martin Hofmann. Conservativity of equality reflection over intensional type theory. In Types for Proofs and Programs, volume 1158 of LNCS, pages 153–164. Springer-Verlag, 1996. [Page 182]

David Hovemeyer and William Pugh. Finding bugs is easy. SIGPLAN Notices, 39(12):92–106, 2004. [Pages 10, 37 and 49]

Lourdes del Carmen Gonzalez Huesca. Incrementality and Effect Simulation in the Simply Typed Lambda Calculus. PhD thesis, Université Paris Diderot-Paris VII, November 2015. [Pages 59 and 175]


Vojin Jovanovic, Amir Shaikhha, Sandro Stucki, Vladimir Nikolaev, Christoph Koch, and Martin Odersky. Yin-yang: Concealing the deep embedding of DSLs. In Proceedings of the 2014 International Conference on Generative Programming: Concepts and Experiences, GPCE 2014, pages 73–82, New York, NY, USA, 2014. ACM. ISBN 978-1-4503-3161-6. doi: 10.1145/2658761.2658771. URL http://doi.acm.org/10.1145/2658761.2658771. [Page 49]

Christian Kästner, Paolo G. Giarrusso, and Klaus Ostermann. Partial preprocessing C code for variability analysis. In Proceedings of the Fifth International Workshop on Variability Modelling of Software-intensive Systems (VaMoS), pages 137–140. ACM, January 2011a. ISBN 978-1-4503-0570-9. URL http://www.informatik.uni-marburg.de/~kaestner/vamos11.pdf. [Page 4]

Christian Kästner, Paolo G. Giarrusso, Tillmann Rendel, Sebastian Erdweg, Klaus Ostermann, and Thorsten Berger. Variability-aware parsing in the presence of lexical macros and conditional compilation. In Proceedings of the 26th Annual ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA). ACM, October 2011b. [Page 4]

Chantal Keller and Thorsten Altenkirch. Hereditary Substitutions for Simple Types, Formalized. In Proceedings of the Third ACM SIGPLAN Workshop on Mathematically Structured Functional Programming, MSFP '10, pages 3–10, New York, NY, USA, 2010. ACM. ISBN 978-1-4503-0255-5. doi: 10.1145/1863597.1863601. [Page 186]

Gregor Kiczales, John Lamping, Anurag Menhdhekar, Chris Maeda, Cristina Lopes, Jean-Marc Loingtier, and John Irwin. Aspect-oriented programming. In ECOOP, pages 220–242, 1997. [Page 9]

Edward Kmett. Mirrored lenses, 2012. URL http://comonad.com/reader/2012/mirrored-lenses/. Last accessed on 29 June 2017. [Page 169]

Christoph Koch. Incremental query evaluation in a ring of databases. In Symp. Principles of Database Systems (PODS), pages 87–98. ACM, 2010. [Pages 57, 161 and 174]

Christoph Koch, Yanif Ahmad, Oliver Kennedy, Milos Nikolic, Andres Nötzli, Daniel Lupei, and Amir Shaikhha. DBToaster: higher-order delta processing for dynamic, frequently fresh views. The VLDB Journal, 23(2):253–278, 2014. ISSN 1066-8888. doi: 10.1007/s00778-013-0348-4. URL http://dx.doi.org/10.1007/s00778-013-0348-4. [Pages 154, 158, 161 and 174]

Christoph Koch, Daniel Lupei, and Val Tannen. Incremental View Maintenance For Collection Programming. In Proceedings of the 35th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, PODS '16, pages 75–90, New York, NY, USA, 2016. ACM. ISBN 978-1-4503-4191-2. doi: 10.1145/2902251.2902286. [Pages 2, 57, 159, 160, 161 and 175]

Neelakantan R. Krishnaswami and Derek Dreyer. Internalizing Relational Parametricity in the Extensional Calculus of Constructions. In Simona Ronchi Della Rocca, editor, Computer Science Logic 2013 (CSL 2013), volume 23 of Leibniz International Proceedings in Informatics (LIPIcs), pages 432–451, Dagstuhl, Germany, 2013. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik. ISBN 978-3-939897-60-6. doi: 10.4230/LIPIcs.CSL.2013.432. [Page 125]

Ralf Lämmel. Google's MapReduce programming model – revisited. Sci. Comput. Program., 68(3):208–237, October 2007. [Pages 67 and 129]

Daan Leijen and Erik Meijer. Domain specific embedded compilers. In DSL, pages 109–122. ACM, 1999. [Page 48]


Yanhong A. Liu. Efficiency by incrementalization: An introduction. HOSC, 13(4):289–313, 2000. [Pages 2, 56, 160, 161, 176 and 178]

Yanhong A. Liu and Tim Teitelbaum. Caching intermediate results for program improvement. In Proceedings of the 1995 ACM SIGPLAN Symposium on Partial Evaluation and Semantics-based Program Manipulation, PEPM '95, pages 190–201, New York, NY, USA, 1995. ACM. ISBN 0-89791-720-0. doi: 10.1145/215465.215590. URL http://doi.acm.org/10.1145/215465.215590. [Pages v, vii, 135 and 136]

Ingo Maier and Martin Odersky. Higher-order reactive programming with incremental lists. In ECOOP, pages 707–731. Springer-Verlag, 2013. [Pages 43 and 174]

Simon Marlow and Simon Peyton Jones. Making a fast curry: push/enter vs. eval/apply for higher-order languages. Journal of Functional Programming, 16(4-5):415–449, 2006. ISSN 1469-7653. doi: 10.1017/S0956796806005995. [Page 160]

Conor McBride. The derivative of a regular type is its type of one-hole contexts, 2001. URL http://www.strictlypositive.org/diff.pdf. Last accessed on 29 June 2017. [Page 78]

Conor McBride. Outrageous but Meaningful Coincidences: Dependent Type-safe Syntax and Evaluation. In Proceedings of the 6th ACM SIGPLAN Workshop on Generic Programming, WGP '10, pages 1–12, New York, NY, USA, 2010. ACM. ISBN 978-1-4503-0251-7. doi: 10.1145/1863495.1863497. [Page 186]

Erik Meijer, Brian Beckman, and Gavin Bierman. LINQ: reconciling objects, relations and XML in the .NET framework. In Proc. Int'l SIGMOD Conf. on Management of Data (SIGMOD), page 706. ACM, 2006. [Page 47]

John C. Mitchell. Foundations of Programming Languages. MIT Press, 1996. [Page 118]

Adriaan Moors. Type constructor polymorphism, 2010. URL http://adriaanm.github.com/research/2010/10/06/new-in-scala-28-type-constructor-inference. Last accessed on 2012-04-11. [Page 45]

Adriaan Moors, Tiark Rompf, Philipp Haller, and Martin Odersky. Scala-virtualized. In Proc. Workshop on Partial Evaluation and Semantics-Based Program Manipulation, PEPM '12, pages 117–120. ACM, 2012. [Page 44]

Steven S. Muchnick. Advanced Compiler Design and Implementation. Morgan Kaufmann, 1997. ISBN 1-55860-320-4. [Page 21]

Russell O'Connor. Polymorphic update with van Laarhoven lenses, 2012. URL http://r6.ca/blog/20120623T104901Z.html. Last accessed on 29 June 2017. [Page 169]

M. Odersky and A. Moors. Fighting bit rot with types (experience report: Scala collections). In IARCS Conf. Foundations of Software Technology and Theoretical Computer Science, volume 4, pages 427–451, 2009. [Pages 20, 23, 30 and 48]

Martin Odersky. Pimp my library, 2006. URL http://www.artima.com/weblogs/viewpost.jsp?thread=179766. Last accessed on 10 April 2012. [Page 25]

Martin Odersky. The Scala language specification, version 2.9, 2011. [Page 25]

Martin Odersky, Lex Spoon, and Bill Venners. Programming in Scala. Artima Inc., 2nd edition, 2011. [Pages 10, 23, 25, 45 and 46]


Bruno C. d. S. Oliveira, Adriaan Moors, and Martin Odersky. Type classes as objects and implicits. In OOPSLA '10, pages 341–360. ACM, 2010. [Page 30]

Klaus Ostermann, Paolo G. Giarrusso, Christian Kästner, and Tillmann Rendel. Revisiting information hiding: Reflections on classical and nonclassical modularity. In Proceedings of the 25th European Conference on Object-Oriented Programming (ECOOP). Springer-Verlag, 2011. URL http://www.informatik.uni-marburg.de/~kaestner/ecoop11.pdf. [Page 3]

Scott Owens, Magnus O. Myreen, Ramana Kumar, and Yong Kiam Tan. Functional Big-Step Semantics. In Proceedings of the 25th European Symposium on Programming Languages and Systems - Volume 9632, pages 589–615, New York, NY, USA, 2016. Springer-Verlag New York, Inc. ISBN 978-3-662-49497-4. doi: 10.1007/978-3-662-49498-1_23. [Pages 186 and 201]

Robert Paige and Shaye Koenig. Finite differencing of computable expressions. TOPLAS, 4(3):402–454, July 1982. [Pages v, vii, 2, 57, 161 and 174]

Simon L. Peyton Jones and Simon Marlow. Secrets of the Glasgow Haskell Compiler inliner. JFP, 12(4-5):393–434, 2002. [Pages 9 and 32]

F. Pfenning and C. Elliot. Higher-order abstract syntax. In PLDI, pages 199–208. ACM, 1988. [Pages 20 and 28]

Frank Pfenning. Partial polymorphic type inference and higher-order unification. In Proc. ACM Conf. on LISP and Functional Programming, LFP '88, pages 153–163. ACM, 1988. [Page 45]

Benjamin C. Pierce. Types and Programming Languages. MIT Press, 1st edition, 2002. [Pages 33, 181, 183, 184 and 201]

François Pottier and Nadji Gauthier. Polymorphic Typed Defunctionalization. In Proceedings of the 31st ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL '04, pages 89–98, New York, NY, USA, 2004. ACM. ISBN 1-58113-729-X. doi: 10.1145/964001.964009. [Page 217]

G. Ramalingam and Thomas Reps. A categorized bibliography on incremental computation. In POPL, pages 502–510. ACM, 1993. [Pages 55, 161 and 173]

Tiark Rompf and Nada Amin. Functional Pearl: A SQL to C Compiler in 500 Lines of Code. In Proceedings of the 20th ACM SIGPLAN International Conference on Functional Programming, ICFP 2015, pages 2–9, New York, NY, USA, 2015. ACM. ISBN 978-1-4503-3669-7. doi: 10.1145/2784731.2784760. [Page 1]

Tiark Rompf and Martin Odersky. Lightweight modular staging: a pragmatic approach to runtime code generation and compiled DSLs. In GPCE, pages 127–136. ACM, 2010. [Pages 4, 20, 21, 22, 25, 44, 49, 78 and 127]

Tiark Rompf, Arvind K. Sujeeth, HyoukJoong Lee, Kevin J. Brown, Hassan Chafi, Martin Odersky, and Kunle Olukotun. Building-blocks for performance oriented DSLs. In DSL, pages 93–117, 2011. [Pages 44, 46 and 49]

Tiark Rompf, Arvind K. Sujeeth, Nada Amin, Kevin J. Brown, Vojin Jovanovic, HyoukJoong Lee, Manohar Jonnalagedda, Kunle Olukotun, and Martin Odersky. Optimizing data structures in high-level programs: new directions for extensible compilers based on staging. In POPL, pages 497–510. ACM, 2013. [Pages 43, 48 and 49]


Andreas Rossberg, Claudio V. Russo, and Derek Dreyer. F-ing Modules. In Proceedings of the 5th ACM SIGPLAN Workshop on Types in Language Design and Implementation, TLDI '10, pages 89–102, New York, NY, USA, 2010. ACM. ISBN 978-1-60558-891-9. doi: 10.1145/1708016.1708028. [Page 155]

Amr Sabry and Matthias Felleisen. Reasoning about programs in continuation-passing style. Lisp and Symbolic Computation, 6(3-4):289–360, 1993. [Page 138]

Guido Salvaneschi and Mira Mezini. Reactive behavior in object-oriented applications: an analysis and a research roadmap. In AOSD, pages 37–48. ACM, 2013. [Pages 1 and 55]

Gabriel Scherer and Didier Rémy. GADTs meet subtyping. In ESOP, pages 554–573. Springer-Verlag, 2013. [Page 4]

Christopher Schwaab and Jeremy G. Siek. Modular type-safety proofs in Agda. In Proceedings of the 7th Workshop on Programming Languages Meets Program Verification, PLPV '13, pages 3–12, New York, NY, USA, 2013. ACM. ISBN 978-1-4503-1860-0. doi: 10.1145/2428116.2428120. [Page 190]

Ilya Sergey, Dimitrios Vytiniotis, and Simon Peyton Jones. Modular, higher-order cardinality analysis in theory and practice. In Proceedings of the 41st ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL '14, pages 335–347, New York, NY, USA, 2014. ACM. ISBN 978-1-4503-2544-8. doi: 10.1145/2535838.2535861. [Page 160]

Rohin Shah and Rastislav Bodik. Automated incrementalization through synthesis. Presented at IC 2017: First Workshop on Incremental Computing; 2-page abstract available, 2017. URL http://pldi17.sigplan.org/event/ic-2017-papers-automated-incrementalization-through-synthesis. [Page 56]

Norbert Siegmund, Marko Rosenmüller, Christian Kästner, Paolo G. Giarrusso, Sven Apel, and Sergiy Kolesnikov. Scalable prediction of non-functional properties in software product lines. In Proceedings of the 15th International Software Product Line Conference (SPLC), Los Alamitos, CA, August 2011. IEEE Computer Society. URL http://www.informatik.uni-marburg.de/~kaestner/SPLC11_nfp.pdf. To appear; submitted 27 Feb 2010, accepted 21 Apr 2011. [Page 4]

Norbert Siegmund, Marko Rosenmüller, Christian Kästner, Paolo G. Giarrusso, Sven Apel, and Sergiy S. Kolesnikov. Scalable prediction of non-functional properties in software product lines: Footprint and memory consumption. Information and Software Technology, 55(3):491–507, 2013. [Page 4]

Daniel Spiewak and Tian Zhao. ScalaQL: Language-integrated database queries for Scala. In Proc. Conf. Software Language Engineering (SLE), 2009. [Page 48]

R. Statman. Logical relations and the typed λ-calculus. Information and Control, 65(2):85–97, May 1985. ISSN 0019-9958. doi: 10.1016/S0019-9958(85)80001-2. [Page 89]

Guy L. Steele Jr. Organizing Functional Code for Parallel Execution; or, Foldl and Foldr Considered Slightly Harmful. In Proceedings of the 14th ACM SIGPLAN International Conference on Functional Programming, ICFP '09, pages 1–2, New York, NY, USA, 2009. ACM. ISBN 978-1-60558-332-7. doi: 10.1145/1596550.1596551. [Page 43]

Michael Stonebraker, Samuel Madden, Daniel J. Abadi, Stavros Harizopoulos, Nabil Hachem, and Pat Helland. The end of an architectural era (it's time for a complete rewrite). In Proc. Int'l Conf. Very Large Data Bases (VLDB), pages 1150–1160. VLDB Endowment, 2007. [Pages 1 and 9]


Arvind K. Sujeeth, Austin Gibbons, Kevin J. Brown, HyoukJoong Lee, Tiark Rompf, Martin Odersky, and Kunle Olukotun. Forge: generating a high performance DSL implementation from a declarative specification. In Proceedings of the 12th International Conference on Generative Programming: Concepts & Experiences, GPCE '13, pages 145–154, New York, NY, USA, 2013a. ACM. ISBN 978-1-4503-2373-4. doi: 10.1145/2517208.2517220. URL http://doi.acm.org/10.1145/2517208.2517220. [Page 49]

Arvind K. Sujeeth, Tiark Rompf, Kevin J. Brown, HyoukJoong Lee, Hassan Chafi, Victoria Popic, Michael Wu, Aleksandar Prokopec, Vojin Jovanovic, Martin Odersky, and Kunle Olukotun. Composition and reuse with compiled domain-specific languages. In ECOOP, pages 52–78. Springer-Verlag, 2013b. [Pages 43 and 49]

R. S. Sundaresh and Paul Hudak. A theory of incremental computation and its application. In POPL, pages 1–13. ACM, 1991. [Page 174]

L. S. van Benthem Jutting, J. McKinna, and R. Pollack. Checking algorithms for Pure Type Systems. In Henk Barendregt and Tobias Nipkow, editors, Types for Proofs and Programs, number 806 in Lecture Notes in Computer Science, pages 19–61. Springer Berlin Heidelberg, January 1994. ISBN 978-3-540-58085-0, 978-3-540-48440-0. [Page 172]

Jan Vitek and Tomas Kalibera. Repeatability, reproducibility and rigor in systems research. In Proc. Int'l Conf. Embedded Software (EMSOFT), pages 33–38. ACM, 2011. [Page 36]

Philip Wadler. Theorems for Free! In Proceedings of the Fourth International Conference on Functional Programming Languages and Computer Architecture, FPCA '89, pages 347–359, New York, NY, USA, 1989. ACM. ISBN 978-0-89791-328-7. doi: 10.1145/99370.99404. [Page 89]

Philip Wadler. The Girard–Reynolds isomorphism (second edition). Theoretical Computer Science, 375(1–3):201–226, May 2007. ISSN 0304-3975. doi: 10.1016/j.tcs.2006.12.042. [Page 163]

Patrycja Wegrzynowicz and Krzysztof Stencel. The good, the bad, and the ugly: three ways to use a semantic code query system. In OOPSLA, pages 821–822. ACM, 2009. [Page 49]

Darren Willis, David Pearce, and James Noble. Efficient object querying for Java. In ECOOP, pages 28–49. Springer, 2006. [Page 48]

Darren Willis, David J. Pearce, and James Noble. Caching and incrementalisation in the Java Query Language. In OOPSLA, pages 1–18. ACM, 2008. [Pages 48 and 174]

Yuan Yu, Michael Isard, Dennis Fetterly, Mihai Budiu, Úlfar Erlingsson, Pradeep Kumar Gunda, and Jon Currey. DryadLINQ: a system for general-purpose distributed data-parallel computing using a high-level language. In Proc. Conf. Operating Systems Design and Implementation, OSDI'08, pages 1–14. USENIX Association, 2008. [Page 47]

                                                                                                                                                • 16 Differentiation in practice
                                                                                                                                                  • 161 The role of differentiation plugins
                                                                                                                                                  • 162 Predicting nil changes
                                                                                                                                                    • 1621 Self-maintainability
                                                                                                                                                      • 163 Case study
                                                                                                                                                      • 164 Benchmarking setup
                                                                                                                                                      • 165 Benchmark results
                                                                                                                                                      • 166 Chapter conclusion
                                                                                                                                                        • 17 Cache-transfer-style conversion
                                                                                                                                                          • 171 Introduction
                                                                                                                                                          • 172 Introducing Cache-Transfer Style
                                                                                                                                                            • 1721 Background
                                                                                                                                                            • 1722 A first-order motivating example computing averages
                                                                                                                                                            • 1723 A higher-order motivating example nested loops
                                                                                                                                                              • 173 Formalization
                                                                                                                                                                • 1731 The source language λAL
                                                                                                                                                                • 1732 Static differentiation in λAL
                                                                                                                                                                • 1733 A new soundness proof for static differentiation
                                                                                                                                                                • 1734 The target language iλAL
                                                                                                                                                                • 1735 CTS conversion from λAL to iλAL
                                                                                                                                                                • 1736 Soundness of CTS conversion
                                                                                                                                                                  • 174 Incrementalization case studies
                                                                                                                                                                    • 1741 Averaging integers bags
                                                                                                                                                                    • 1742 Nested loops over two sequences
                                                                                                                                                                    • 1743 Indexed joins of two bags
                                                                                                                                                                      • 175 Limitations and future work
                                                                                                                                                                        • 1751 Hiding the cache type
                                                                                                                                                                        • 1752 Nested bags
                                                                                                                                                                        • 1753 Proper tail calls
                                                                                                                                                                        • 1754 Pervasive replacement values
                                                                                                                                                                        • 1755 Recomputing updated values
                                                                                                                                                                        • 1756 Cache pruning via absence analysis
                                                                                                                                                                        • 1757 Unary vs n-ary abstraction
                                                                                                                                                                          • 176 Related work
                                                                                                                                                                          • 177 Chapter conclusion
                                                                                                                                                                            • 18 Towards differentiation for System F
                                                                                                                                                                              • 181 The parametricity transformation
                                                                                                                                                                              • 182 Differentiation and parametricity
                                                                                                                                                                              • 183 Proving differentiation correct externally
                                                                                                                                                                              • 184 Combinators for polymorphic change structures
                                                                                                                                                                              • 185 Differentiation for System F
                                                                                                                                                                              • 186 Related work
                                                                                                                                                                              • 187 Prototype implementation
                                                                                                                                                                              • 188 Chapter conclusion
                                                                                                                                                                                • 19 Related work
                                                                                                                                                                                  • 191 Dynamic approaches
                                                                                                                                                                                  • 192 Static approaches
                                                                                                                                                                                    • 1921 Finite differencing
                                                                                                                                                                                    • 1922 λndashdiff and partial differentials
                                                                                                                                                                                    • 1923 Static memoization
                                                                                                                                                                                        • 20 Conclusion and future work
                                                                                                                                                                                          • Appendixes
                                                                                                                                                                                            • A Preliminaries
                                                                                                                                                                                              • A1 Our proof meta-language
                                                                                                                                                                                                • A11 Type theory versus set theory
                                                                                                                                                                                                  • A2 Simply-typed λ-calculus
                                                                                                                                                                                                    • A21 Denotational semantics for STLC
                                                                                                                                                                                                    • A22 Weakening
                                                                                                                                                                                                    • A23 Substitution
                                                                                                                                                                                                    • A24 Discussion Our mechanization and semantic style
                                                                                                                                                                                                    • A25 Language plugins
                                                                                                                                                                                                        • B More on our formalization
                                                                                                                                                                                                          • B1 Mechanizing plugins modularly and limitations
                                                                                                                                                                                                            • C (Un)typed ILC operationally
                                                                                                                                                                                                              • C1 Formalization
                                                                                                                                                                                                                • C11 Types and contexts
                                                                                                                                                                                                                • C12 Base syntax for λA
                                                                                                                                                                                                                • C13 Change syntax for λΔA
                                                                                                                                                                                                                • C14 Differentiation
                                                                                                                                                                                                                • C15 Typing λArarr and λΔArarr
                                                                                                                                                                                                                • C16 Semantics
                                                                                                                                                                                                                • C17 Type soundness
                                                                                                                                                                                                                  • C2 Validating our step-indexed semantics
                                                                                                                                                                                                                  • C3 Validity syntactically (λArarr λΔArarr)
                                                                                                                                                                                                                  • C4 Step-indexed extensional validity (λArarr λΔArarr)
                                                                                                                                                                                                                  • C5 Step-indexed intensional validity
                                                                                                                                                                                                                  • C6 Untyped step-indexed validity (λA λΔA)
                                                                                                                                                                                                                  • C7 General recursion in λArarr and λΔArarr
                                                                                                                                                                                                                  • C8 Future work
                                                                                                                                                                                                                    • C81 Change composition
                                                                                                                                                                                                                      • C9 Development history
                                                                                                                                                                                                                      • C10 Conclusion
                                                                                                                                                                                                                        • D Defunctionalizing function changes
                                                                                                                                                                                                                          • D1 Setup
                                                                                                                                                                                                                          • D2 Defunctionalization
                                                                                                                                                                                                                            • D21 Defunctionalization with separate function codes
                                                                                                                                                                                                                            • D22 Defunctionalizing function changes
                                                                                                                                                                                                                              • D3 Defunctionalization and cache-transfer-style
                                                                                                                                                                                                                                • Acknowledgments
                                                                                                                                                                                                                                • Bibliography
Page 3: Optimizing and Incrementalizing Higher-order Collection Queries by AST Transformation


Zusammenfassung

In modern general-purpose programming languages, queries on in-memory collections are often more computationally expensive than necessary. While database queries can be optimized comparatively easily, this is often hard for in-memory collections, because general-purpose programming languages are as a rule more expressive than databases: in particular, these languages usually support nested, recursive data types and higher-order functions.

Collection queries can be optimized and incrementalized by hand, but this often reduces modularity and is frequently too error-prone to be feasible, or to guarantee maintainability of the resulting programs. This dissertation demonstrates how queries on collections can be optimized and incrementalized systematically and automatically, to free programmers from this burden. The resulting programs are expressed in the same core language, to enable further standard optimizations.

Part I develops a variant of the Scala collections API that uses staging to reify queries as abstract syntax trees. On top of this interface, domain-specific optimizations from programming languages and databases are then applied; among others, queries are rewritten to use indexes selected by the programmer. Thanks to these indexes, a considerable speedup of execution can be shown: an experimental evaluation shows speedups of 12x on average, up to a maximum of 12800x.

To incrementalize programs with higher-order functions by program transformation, Part II presents an extension of finite differencing [Paige and Koenig 1982; Blakeley et al. 1986; Gupta and Mumick 1999] and develops a first approach to incrementalization by program transformation for programs with higher-order functions. Programs are transformed into derivatives, that is, into programs that turn input changes into output changes. Furthermore, Chapters 12–13 prove the correctness of this incrementalization approach for the simply-typed and untyped λ-calculus, and discuss extensions to System F.

Derivatives must often reuse results of the original programs. To enable such reuse, Chapter 17 extends the work of Liu and Teitelbaum [1995] to programs with higher-order functions, and develops a transformation of such programs into cache-transfer style.

For efficient incrementalization it is moreover necessary to choose appropriate primitive operations and to incrementalize them by hand. This thesis covers a large subset of the most important primitive operations on collections. Case studies show significant performance improvements, both in practice and in asymptotic complexity.


Contents

Abstract v

Zusammenfassung vii

Contents ix

List of Figures xv

List of Tables xvii

1 Introduction 1
1.1 This thesis 2
1.1.1 Included papers 3
1.1.2 Excluded papers 3
1.1.3 Navigating this thesis 4

I Optimizing Collection Queries by Reification 7

2 Introduction 9
2.1 Contributions and summary 10
2.2 Motivation 10
2.2.1 Optimizing by Hand 11

3 Automatic optimization with SQuOpt 15
3.1 Adapting a Query 15
3.2 Indexing 17

4 Implementing SQuOpt 19
4.1 Expression Trees 19
4.2 Optimizations 21
4.3 Query Execution 22

5 A deep EDSL for collection queries 23
5.1 Implementation: Expressing the interface in Scala 24
5.2 Representing expression trees 24
5.3 Lifting first-order expressions 25
5.4 Lifting higher-order expressions 27
5.5 Lifting collections 29


5.6 Interpretation 31
5.7 Optimization 32

6 Evaluating SQuOpt 35
6.1 Study Setup 35
6.2 Experimental Units 37
6.3 Measurement Setup 40
6.4 Results 40

7 Discussion 43
7.1 Optimization limitations 43
7.2 Deep embedding 43
7.2.1 What worked well 43
7.2.2 Limitations 44
7.2.3 What did not work so well: Scala type inference 44
7.2.4 Lessons for language embedders 46

8 Related work 47
8.1 Language-Integrated Queries 47
8.2 Query Optimization 48
8.3 Deep Embeddings in Scala 48
8.3.1 LMS 49
8.4 Code Querying 49

9 Conclusions 51
9.1 Future work 51

II Incremental λ-Calculus 53

10 Introduction to differentiation 55
10.1 Generalizing the calculus of finite differences 57
10.2 A motivating example 57
10.3 Introducing changes 58
10.3.1 Incrementalizing with changes 59
10.4 Differentiation on open terms and functions 61
10.4.1 Producing function changes 61
10.4.2 Consuming function changes 62
10.4.3 Pointwise function changes 62
10.4.4 Passing change targets 62
10.5 Differentiation, informally 63
10.6 Differentiation on our example 64
10.7 Conclusion and contributions 66
10.7.1 Navigating this thesis part 66
10.7.2 Contributions 66


11 A tour of differentiation examples 69
11.1 Change structures as type-class instances 69
11.2 How to design a language plugin 70
11.3 Incrementalizing a collection API 70
11.3.1 Changes to type-class instances 71
11.3.2 Incrementalizing aggregation 71
11.3.3 Modifying list elements 73
11.3.4 Limitations 74
11.4 Efficient sequence changes 75
11.5 Products 75
11.6 Sums, pattern matching and conditionals 76
11.6.1 Optimizing filter 78
11.7 Chapter conclusion 78

12 Changes and differentiation, formally 79
12.1 Changes and validity 79
12.1.1 Function spaces 82
12.1.2 Derivatives 83
12.1.3 Basic change structures on types 84
12.1.4 Validity as a logical relation 85
12.1.5 Change structures on typing contexts 85
12.2 Correctness of differentiation 86
12.2.1 Plugin requirements 88
12.2.2 Correctness proof 88
12.3 Discussion 91
12.3.1 The correctness statement 91
12.3.2 Invalid input changes 92
12.3.3 Alternative environment changes 92
12.3.4 Capture avoidance 92
12.4 Plugin requirement summary 94
12.5 Chapter conclusion 95

13 Change structures 97
13.1 Formally defining ⊕ and change structures 97
13.1.1 Example: Group changes 98
13.1.2 Properties of change structures 99
13.2 Operations on function changes, informally 100
13.2.1 Examples of nil changes 100
13.2.2 Nil changes on arbitrary functions 101
13.2.3 Constraining ⊕ on functions 102
13.3 Families of change structures 102
13.3.1 Change structures for function spaces 102
13.3.2 Change structures for products 104
13.4 Change structures for types and contexts 105
13.5 Development history 107
13.6 Chapter conclusion 107


14 Equational reasoning on changes 109
14.1 Reasoning on changes syntactically 109
14.1.1 Denotational equivalence for valid changes 110
14.1.2 Syntactic validity 111
14.2 Change equivalence 114
14.2.1 Preserving change equivalence 114
14.2.2 Sketching an alternative syntax 117
14.2.3 Change equivalence is a PER 118
14.3 Chapter conclusion 119

15 Extensions and theoretical discussion 121
15.1 General recursion 121
15.1.1 Differentiating general recursion 121
15.1.2 Justification 122
15.2 Completely invalid changes 122
15.3 Pointwise function changes 123
15.4 Modeling only valid changes 124
15.4.1 One-sided vs. two-sided validity 125

16 Differentiation in practice 127
16.1 The role of differentiation plugins 127
16.2 Predicting nil changes 128
16.2.1 Self-maintainability 128
16.3 Case study 129
16.4 Benchmarking setup 132
16.5 Benchmark results 133
16.6 Chapter conclusion 133

17 Cache-transfer-style conversion 135
17.1 Introduction 135
17.2 Introducing Cache-Transfer Style 136
17.2.1 Background 137
17.2.2 A first-order motivating example: computing averages 137
17.2.3 A higher-order motivating example: nested loops 140
17.3 Formalization 141
17.3.1 The source language λAL 141
17.3.2 Static differentiation in λAL 145
17.3.3 A new soundness proof for static differentiation 146
17.3.4 The target language iλAL 147
17.3.5 CTS conversion from λAL to iλAL 151
17.3.6 Soundness of CTS conversion 152
17.4 Incrementalization case studies 154
17.4.1 Averaging integers bags 154
17.4.2 Nested loops over two sequences 156
17.4.3 Indexed joins of two bags 157
17.5 Limitations and future work 159
17.5.1 Hiding the cache type 159
17.5.2 Nested bags 159
17.5.3 Proper tail calls 159


17.5.4 Pervasive replacement values 159
17.5.5 Recomputing updated values 160
17.5.6 Cache pruning via absence analysis 160
17.5.7 Unary vs. n-ary abstraction 160
17.6 Related work 161
17.7 Chapter conclusion 162

18 Towards differentiation for System F 163
18.1 The parametricity transformation 164
18.2 Differentiation and parametricity 166
18.3 Proving differentiation correct externally 167
18.4 Combinators for polymorphic change structures 168
18.5 Differentiation for System F 171
18.6 Related work 172
18.7 Prototype implementation 172
18.8 Chapter conclusion 172

19 Related work 173
19.1 Dynamic approaches 173
19.2 Static approaches 174
19.2.1 Finite differencing 174
19.2.2 λ–diff and partial differentials 175
19.2.3 Static memoization 176

20 Conclusion and future work 177

Appendixes 181

A Preliminaries 181
A.1 Our proof meta-language 181
A.1.1 Type theory versus set theory 182
A.2 Simply-typed λ-calculus 182
A.2.1 Denotational semantics for STLC 184
A.2.2 Weakening 185
A.2.3 Substitution 186
A.2.4 Discussion: Our mechanization and semantic style 186
A.2.5 Language plugins 187

B More on our formalization 189
B.1 Mechanizing plugins modularly, and limitations 189

C (Un)typed ILC, operationally 193
C.1 Formalization 195
C.1.1 Types and contexts 196
C.1.2 Base syntax for λA 196
C.1.3 Change syntax for λ∆A 199
C.1.4 Differentiation 199
C.1.5 Typing λA→ and λ∆A→ 199
C.1.6 Semantics 200


C.1.7 Type soundness 200
C.2 Validating our step-indexed semantics 201
C.3 Validity, syntactically (λA→, λ∆A→) 202
C.4 Step-indexed extensional validity (λA→, λ∆A→) 204
C.5 Step-indexed intensional validity 207
C.6 Untyped step-indexed validity (λA, λ∆A) 210
C.7 General recursion in λA→ and λ∆A→ 210
C.8 Future work 212
C.8.1 Change composition 212
C.9 Development history 213
C.10 Conclusion 213

D Defunctionalizing function changes 215
D.1 Setup 215
D.2 Defunctionalization 217
D.2.1 Defunctionalization with separate function codes 219
D.2.2 Defunctionalizing function changes 220
D.3 Defunctionalization and cache-transfer-style 224

Acknowledgments 229

Bibliography 231

List of Figures

2.1 Definition of the schema and of some content 11
2.2 Our example query on the schema in Fig. 2.1, and a function which postprocesses its result 12
2.3 Composition of queries in Fig. 2.2 after inlining, query unnesting and hoisting 13

3.1 SQuOpt version of Fig. 2.2; recordQuery contains a reification of the query, records its result 16

5.1 Desugaring of code in Fig. 2.2 29
5.2 Lifting TraversableLike 31

6.1 Measurement Setup Overview 37
6.5 Find covariant equals methods 39
6.6 A simple index definition 40

12.1 Defining differentiation and proving it correct. The rest of this chapter explains and motivates the above definitions 80

13.1 Defining change structures 108

16.1 The λ-term histogram with Haskell-like syntactic sugar 128
16.2 A Scala implementation of primitives for bags and maps 130
16.3 Performance results, in log–log scale 134

17.1 Source language λAL (syntax) 142
17.2 Step-indexed big-step semantics for base terms of source language λAL 142
17.3 Step-indexed big-step semantics for the change terms of the source language λAL 144
17.4 Static differentiation in λAL 145
17.5 Step-indexed relation between values and changes 146
17.6 Target language iλAL (syntax) 148
17.7 Target language iλAL (semantics of base terms and caches) 149
17.8 Target language iλAL (semantics of change terms and cache updates) 150
17.9 Cache-Transfer Style (CTS) conversion from λAL to iλAL 151
17.10 Extending CTS translation to values, change values, environments and change environments 152
17.11 Extension of an environment with cache values, dF ↑Ci dF′ (for i = 1, 2) 153

18.1 A change structure that allows modifying list elements 170


A.1 Standard definitions for the simply-typed lambda calculus 183

C.1 ANF λ-calculus: λA and λ∆A 196
C.2 ANF λ-calculus: λA→ and λ∆A→ type system 197
C.3 ANF λ-calculus (λA and λ∆A): CBV semantics 198
C.4 Defining extensional validity via logical relations and big-step semantics 203
C.5 Defining extensional validity via step-indexed logical relations and big-step semantics 206
C.6 Defining extensional validity via untyped step-indexed logical relations and big-step semantics 211

D.1 A small example program for defunctionalization 217
D.2 Defunctionalized program 218
D.3 Implementing change structures using Codelike instances 226
D.4 Implementing FunOps using Codelike instances 227

List of Tables

6.2 Performance results. As in Sec. 6.1, (1) denotes the modular Scala implementation, (2) the hand-optimized Scala one, and (3−), (3o), (3x) refer to the SQuOpt implementation when run, respectively, without optimizations, with optimizations, and with optimizations and indexing. Queries marked with the R superscript were selected by random sampling 38

6.3 Average performance ratios. This table summarizes all interesting performance ratios across all queries, using the geometric mean [Fleming and Wallace 1986]. The meaning of speedups is discussed in Sec. 6.1 39

6.4 Description of abstractions removed during hand-optimization, and number of queries where the abstraction is used (and optimized away) 39


Chapter 1

Introduction

Many programs perform queries on collections of data, and for non-trivial amounts of data it is useful to execute these queries efficiently. When the data is updated often enough, it can also be useful to update the results of some queries incrementally whenever the input changes, so that up-to-date results are quickly available, even for queries that are expensive to execute.

For instance, a program manipulating demographic data about the citizens of a country might need to compute statistics on them, such as their average age, and update those statistics when the set of citizens changes.
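To make this scenario concrete, here is a minimal Scala sketch (our own illustration, not code from later chapters): recomputing the average age from scratch takes time proportional to the whole collection, while an incremental variant that maintains a running sum and count absorbs each change in constant time.

case class Citizen(name: String, age: Int)

// Recomputing from scratch: O(n) after every change.
def averageAge(citizens: Seq[Citizen]): Double =
  citizens.map(_.age).sum.toDouble / citizens.size

// Incremental variant: keep a running sum and count,
// and update them in O(1) when a citizen is added.
final class AverageAge(private var sum: Long, private var count: Long) {
  def value: Double = sum.toDouble / count
  def add(c: Citizen): Unit = { sum += c.age; count += 1 }
}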

Traditional relational database management systems (RDBMS) support both query optimization and (in quite a few cases) incremental updating of query results (called there incremental view maintenance).

However, queries are often executed on collections of data that are not stored in a database, but in collections manipulated by some program. Moving in-memory data to RDBMSs typically does not improve performance [Stonebraker et al. 2007; Rompf and Amin 2015], and reusing database optimizers is not trivial.

Moreover, many programming languages are far more expressive than RDBMSs. A typical RDBMS can only manipulate SQL relations, that is, multisets (or bags) of tuples (or sometimes sets of tuples, after duplicate elimination). Typical programming languages (PLs) also support arbitrarily nested lists and maps of data, and allow programmers to define new data types with few restrictions; a typical PL will also allow a far richer set of operations than SQL.

However, typical PLs do not apply typical database optimizations to collection queries, and if queries are incrementalized, this is often done by hand, even though code implementing incremental queries is error-prone and hard to maintain [Salvaneschi and Mezini 2013].

What's worse, some of these manual optimizations are best done over the whole program. Some optimizations only become possible after inlining, which reduces modularity if done by hand.¹

Worse, adding an index on some collection can speed up looking for some information, but each index must be maintained incrementally when the underlying data changes. Depending on the actual queries and updates performed on the collection, and on how often they happen, it might turn out that updating an index takes more time than the index saves; hence, the choice of which indexes to enable depends on the whole program. However, adding or removing an index requires updating all PL queries to use it, while RDBMS queries can use an index transparently.
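To illustrate (with our own sketch; the stripped-down Book class here merely stands in for the schema introduced in Chapter 2), a hand-rolled index is typically a map that every update site must remember to maintain, and every query must be rewritten by hand to consult it:

case class Book(title: String, publisher: String)

var books: Set[Book] = Set.empty
// A hand-maintained index from publisher to books.
var booksByPublisher: Map[String, Set[Book]] = Map.empty

def addBook(b: Book): Unit = {
  books = books + b // update the base collection...
  // ...and remember to update the index at every update site:
  booksByPublisher = booksByPublisher.updatedWith(b.publisher) {
    case Some(bs) => Some(bs + b)
    case None     => Some(Set(b))
  }
}

// Queries must be rewritten by hand to use the index:
def booksFrom(publisher: String): Set[Book] =
  booksByPublisher.getOrElse(publisher, Set.empty)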

Overall, manual optimizations are not only effort-intensive and error-prone, but they also significantly reduce modularity.

¹In Chapter 2 we define modularity as the ability to abstract behavior in a separate function (possibly part of a different module), to enable reuse and improve understandability.


1.1 This thesis

To reduce the need for manual optimizations, in this thesis we propose techniques for optimizing higher-order collection queries and executing them incrementally. By generalizing database-inspired approaches, we provide approaches to query optimization and incrementalization by code transformation that apply to higher-order collection queries, including user-defined functions. Instead of building on existing approaches to incrementalization such as self-adjusting computation [Acar 2009], we introduce a novel incrementalization approach which enables further transformations on the incrementalization results. This approach is in fact not restricted to collection queries but applicable to other domains as well, though it requires adaptation for the domain under consideration. Further research is needed, but a number of case studies suggest applicability even in some scenarios beyond existing techniques [Koch et al. 2016].

We consider the problem for functional programming languages such as Haskell or Scala, and we consider collection queries written using the APIs of their collection libraries, which we treat as an embedded domain-specific language (EDSL). Such APIs (or DSLs) contain powerful collection operators such as map, which are higher-order: they take as arguments arbitrary functions in the host language, which we must also handle.

Therefore, our optimizations and incrementalizations must handle programs in higher-order EDSLs that can contain arbitrary code in the host language. Hence, many of our optimizations will exploit properties of our collection EDSL, but will need to handle host-language code. We restrict the problem to purely functional programs (without mutation or other side effects, mostly including non-termination), because such programs can be more “declarative”, and because avoiding side effects can enable more powerful optimizations while simplifying the work of the optimizer.

This thesis is divided into two parts:

• In Part I, we optimize collection queries by static program transformation, based on a set of rewrite rules [Giarrusso et al. 2013].

• In Part II, we incrementalize programs by transforming them to new programs. This thesis presents the first approach that handles higher-order programs through program transformation; hence, while our main examples use collection queries, we phrase the work in terms of λ-calculi with unspecified primitives [Cai, Giarrusso, Rendel and Ostermann 2014] (a small illustration follows this list). In Chapter 17, we extend ILC with a further program transformation step, so that base programs can store intermediate results and derivatives can reuse them, but without resorting to dynamic memoization and necessarily needing to look results up at runtime. To this end, we build on work by Liu [2000] and extend it to a higher-order, typed setting.
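To preview the idea with a deliberately simple Scala sketch (our own; Part II develops this formally and uses richer change types): a derivative consumes a change to the input and produces a change to the output, in time proportional to the change rather than to the whole input.

// Base program: sums the elements of a sequence.
def sum(xs: Seq[Int]): Int = xs.sum

// A very simple change type: elements appended to the input.
case class Additions(elems: Seq[Int])

// Derivative of sum: maps the input change to the output change.
def dsum(xs: Seq[Int], dxs: Additions): Int = dxs.elems.sum

val xs = Seq(1, 2, 3)
val base = sum(xs) // 6
// Updating the old result with the output change:
val updated = base + dsum(xs, Additions(Seq(10))) // 16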

Part II is more theoretical than Part I, because the optimizations in Part I are much better understood than our approach to incrementalization.

To incrementalize programs, we are the first to extend to higher-order programs techniques based on finite differencing for queries on collections [Paige and Koenig 1982] and databases [Blakeley et al. 1986; Gupta and Mumick 1999]. Incrementalizing by finite differencing is a well-understood technique for database queries; how to generalize it to higher-order programs or beyond databases was less clear, so we spend significant energy on providing sound mathematical foundations for this transformation.

In fact, it took us a while to be sure that our transformation was correct, and to understand why: our first correctness proof [Cai, Giarrusso, Rendel and Ostermann 2014], while a significant step, was still more complex than needed. In Part II, especially Chapter 12, we offer a mathematically much simpler proof.


Contributions of Part I are listed in Sec. 2.1. Contributions of Part II are listed at the end of Sec. 10.7.

1.1.1 Included papers

This thesis includes material from joint work with colleagues.

Part I is based on work by Giarrusso, Ostermann, Eichberg, Mitschke, Rendel and Kästner [2013], and Chapters 2 to 4, 6, 8 and 9 come from that manuscript. While the work was done in collaboration, a few clearer responsibilities arose: I did most of the implementation work and collaborated on the writing; among other things, I devised the embedding for collection operations, implemented optimizations and indexing, implemented compilation when interpretation did not achieve sufficient performance, and performed the performance evaluation. Michael Eichberg and Ralf Mitschke contributed to the evaluation by adapting FindBugs queries. Christian Kästner contributed, among other things, to the experiment design for the performance evaluation. Klaus Ostermann proposed the original idea (together with an initial prototype) and supervised the project.

The first chapters of Part II are originally based on work by Cai, Giarrusso, Rendel and Ostermann [2014], though significantly revised: Chapter 10 contains significantly revised text from that paper, and Chapters 16 and 19 survive mostly unchanged. This work was even more of a team effort. I initiated and led the overall project and came up with the original notion of change structures. Cai Yufei contributed differentiation itself and its first correctness proof. Tillmann Rendel contributed significant simplifications and started the overall mechanization effort. Klaus Ostermann provided senior supervision that proved essential to the project. Chapters 12 and 13 contain a novel correctness proof for simply-typed λ-calculus; for its history and contributions, see Sec. 13.5.

Furthermore, Chapter 17 comes from a recently submitted manuscript [Giarrusso, Régis-Gianas and Schuster, Submitted]. I designed the overall approach, the transformation, and the case study on sequences and nested loops. Proofs were done in collaboration with Yann Régis-Gianas: he is the main author of the Coq mechanization and proof, though I contributed significantly to the correctness proof for ILC, in particular with the proofs described in Appendix C and their partial Agda mechanization. The history of this correctness proof is described in Appendix C.9. Philipp Schuster contributed to the evaluation, devising language plugins for bags and maps in collaboration with me.

1.1.2 Excluded papers

This thesis improves modularity of collection queries by automating different sorts of optimizations on them.

During my PhD work, I collaborated on several other papers, mostly on different facets of modularity, which do not appear in this thesis.

Modularity Module boundaries hide implementation information that is not relevant to clients. However, it often turns out that some scenarios, clients or tasks require such hidden information. As discussed, manual optimization requires hidden information; hence, researchers strive to automate optimizations. But other tasks often violate these boundaries too. I collaborated on research on understanding why this happens and how to deal with this problem [Ostermann, Giarrusso, Kästner and Rendel 2011].

Software Product Lines In some scenarios, a different form of modular software is realized through software product lines (SPLs), where many variants of a single software artifact (with different sets of features) can be generated. This is often done using conditional compilation, for instance through the C preprocessor.

But if a software product line uses conditional compilation, processing its source code automatically is difficult; in fact, even lexing or parsing it is a challenge. Since the number of variants is exponential in the number of features, generating and processing each variant does not scale.

To address these challenges, I collaborated with the TypeChef project to develop a variability-aware lexer [Kästner, Giarrusso and Ostermann 2011a] and parser [Kästner, Giarrusso, Rendel, Erdweg, Ostermann and Berger 2011b].

Another challenge is predicting non-functional properties (such as code size or performance) of a variant of a software product line before generation and direct measurement. Such predictions can be useful to efficiently select a variant that satisfies some non-functional requirements. I collaborated on research on predicting such non-functional properties: we measure the properties on a few variants and extrapolate the results to other variants. In particular, I collaborated on the experimental evaluation on a few open-source projects, where even a simple model reached significant accuracy in a few scenarios [Siegmund, Rosenmüller, Kästner, Giarrusso, Apel and Kolesnikov 2011, 2013].

Domain-specific languages This thesis is concerned with DSLs: both ones for the domain of collection queries, and also (in Part II) more general ones.

Different forms of language composition, both among DSLs and across general-purpose languages and DSLs, are possible, but their relationship is nontrivial. I collaborated on work classifying the different forms of language composition [Erdweg, Giarrusso and Rendel 2012].

The implementation of SQuOpt, as described in Part I, requires an extensible representation of ASTs, similar to the one used by LMS [Rompf and Odersky 2010]. While this representation is pretty flexible, it relies on Scala's support of GADTs [Emir et al. 2006, 2007b], which is known to be somewhat fragile due to implementation bugs. In fact, sound pattern matching on Scala's extensible GADTs is impossible without imposing significant restrictions on extensibility, due to language extensions not considered during formal study [Emir et al. 2006, 2007a]: the problem arises from the interaction between GADTs, declaration-site variance annotations and variant refinement of type arguments at inheritance time. To illustrate the problem, I've shown it already applies to an extensible definitional interpreter for λ<: [Giarrusso 2013]. While solutions have been investigated in other settings [Scherer and Rémy 2013], the problem remains open for Scala to this day.

1.1.3 Navigating this thesis

The two parts of this thesis, while related by the common topic of collection queries, can be read mostly independently from each other. Summaries of the two parts are given respectively in Sec. 2.1 and Sec. 10.7.1.

Numbering We use numbering and hyperlinks to help navigation; we explain our conventions here to enable exploiting them.

To simplify navigation, we number all sorts of “theorems” (including here definitions, remarks, even examples or descriptions of notation) per chapter, with counters including section numbers. For instance, Definition 12.2.1 is the first such item in Chapter 12, followed by Theorem 12.2.2, and they both appear in Sec. 12.2. And we can be sure that Lemma 12.2.8 comes after both “theorems” because of its number, even if they are of different sorts.

Similarly, we number tables and figures per chapter, with a shared counter. For instance, Fig. 6.1 is the first figure or table in Chapter 6, followed by Table 6.2.


To help reading, at times we will repeat or anticipate statements of “theorems”, to refer to them without forcing readers to jump back and forth. We will then reuse the original number. For instance, Sec. 12.2 contains a copy of Slogan 10.3.3 with its original number.

For ease of navigation, all such references are hyperlinked in electronic versions of this thesis, and the table of contents is exposed to PDF readers via PDF bookmarks.


Part I

Optimizing Collection Queries by Reification


Chapter 2

Introduction

In-memory collections of data often need efficient processing. For on-disk data, efficient processing is already provided by database management systems (DBMS), thanks to their query optimizers, which support many optimizations specific to the domain of collections. Moving in-memory data to DBMSs, however, typically does not improve performance [Stonebraker et al. 2007], and query optimizers cannot be reused separately, since DBMSs are typically monolithic and their optimizers deeply integrated. A few collection-specific optimizations, such as shortcut fusion [Gill et al. 1993], are supported by compilers for purely functional languages such as Haskell. However, the implementation techniques for those optimizations do not generalize to many other ones, such as support for indexes. In general, collection-specific optimizations are not supported by the general-purpose optimizers used by typical (JIT) compilers.

Therefore, programmers needing collection-related optimizations perform them manually. To allow that, they are often forced to perform manual inlining [Peyton Jones and Marlow 2002]. But manual inlining modifies source code by combining distinct functions, while often distinct functions should remain distinct, because they deal with different concerns or because one function needs to be reused in a different context. In either case, manual inlining reduces modularity, defined here as the ability to abstract behavior in a separate function (possibly part of a different module), to enable reuse and improve understandability.

For these reasons, developers currently need to choose between modularity and performance, as also highlighted by Kiczales et al. [1997] on a similar example. Instead, we envision that they should rely on an automatic optimizer performing inlining and collection-specific optimizations. They would then achieve both performance and modularity.¹

One way to implement such an optimizer would be to extend the compiler of the language with a collection-specific optimizer, or to add some kind of external preprocessor to the language. However, such solutions would be rather brittle (for instance, they lack composability with other language extensions), and they would preclude optimization opportunities that arise only at runtime.

For this reason, our approach is implemented as an embedded domain-specific language (EDSL), that is, as a regular library. We call this library SQuOpt, the Scala QUery OPTimizer. SQuOpt consists of a Scala EDSL for queries on collections, based on the Scala collections API. An expression in this EDSL produces at run time an expression tree in the host language: a data structure which represents the query to execute, similar to an abstract syntax tree (AST) or a query plan. Thanks to the extensibility of Scala, expressions in this language look almost identical to expressions with the same meaning in Scala. When executing the query, SQuOpt optimizes and compiles these expression trees for more efficient execution. Doing optimization at run time instead of compile time avoids the need for control-flow analyses to determine which code will actually be executed [Chambers et al. 2010], as we will see later.

¹In the terminology of Kiczales et al. [1997], our goal is to be able to decompose different generalized procedures of a program according to its primary decomposition, while separating the handling of some performance concerns. To this end, we are modularizing these performance concerns into a metaprogramming-based optimization module, which we believe could be called, in that terminology, an aspect.
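To give a flavor of reification, here is a deliberately tiny sketch of an expression tree for string expressions (our own illustration; SQuOpt's actual, much richer representation is described in Chapters 4 and 5):

// A minimal reified-expression type: instead of computing a value,
// an expression builds an AST that can be inspected, optimized,
// and only then evaluated.
sealed trait Exp[T] { def eval: T }
case class Const[T](value: T) extends Exp[T] {
  def eval: T = value
}
case class StringLength(s: Exp[String]) extends Exp[Int] {
  def eval: Int = s.eval.length
}

val query: Exp[Int] = StringLength(Const("Pearson Education"))
val result: Int = query.eval // 17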

We have chosen Scala [Odersky et al. 2011] to implement our library, for two reasons: (i) Scala is a good meta-language for EDSLs, because it is syntactically flexible and has a powerful type system, and (ii) Scala has a sophisticated collections library with an attractive syntax (for-comprehensions) to specify queries.

To evaluate SQuOpt, we study queries of the FindBugs tool [Hovemeyer and Pugh 2004]. We rewrote a set of queries to use the Scala collections API, and show that modularization incurs significant performance overhead. Subsequently, we consider versions of the same queries using SQuOpt. We demonstrate that the automatic optimization can reconcile modularity and performance in many cases. Adding advanced optimizations such as indexing can even improve the performance of the analyses beyond the original non-modular analyses.

2.1 Contributions and summary

Overall, our main contributions in Part I are the following:

• We illustrate the tradeoff between modularity and performance when manipulating collections, caused by the lack of domain-specific optimizations (Sec. 2.2). Conversely, we illustrate how domain-specific optimizations lead to more readable and more modular code (Chapter 3).

• We present the design and implementation of SQuOpt, an embedded DSL for queries on collections in Scala (Chapter 4).

• We evaluate SQuOpt to show that it supports writing queries that are at the same time modular and fast. We do so by re-implementing several code analyses of the FindBugs tool. The resulting code is more modular and/or more efficient, in some cases by orders of magnitude. In these case studies, we measured average speedups of 12x, with a maximum of 12800x (Chapter 6).

2.2 Motivation

In this section we show how the absence of collection-specific optimizations forces programmers to trade modularity against performance, which motivates our design of SQuOpt to resolve this conflict.

As our running example through the chapter, we consider representing and querying a simple in-memory bibliography. A book has, in our schema, a title, a publisher and a list of authors. Each author, in turn, has a first and last name. We represent authors and books as instances of the Scala classes Author and Book, shown in Fig. 2.1. The class declarations list the type of each field: titles, publishers, and first and last names are all stored in fields of type String. The list of authors is stored in a field of type Seq[Author], that is, a sequence of authors – something that would be more complex to model in a relational database. The code fragment also defines a collection of books named books.

As a common idiom to query such collections, Scala provides for-comprehensions. For instance, the for-comprehension computing records in Fig. 2.2 finds all books published by Pearson Education and yields, for each of those books and for each of its authors, a record containing the book title, the full name of that author, and the number of additional coauthors.


package schema
case class Author(firstName: String, lastName: String)
case class Book(title: String, publisher: String,
                authors: Seq[Author])

val books: Set[Book] = Set(
  new Book("Compilers: Principles, Techniques and Tools",
           "Pearson Education",
           Seq(new Author("Alfred V.", "Aho"),
               new Author("Monica S.", "Lam"),
               new Author("Ravi", "Sethi"),
               new Author("Jeffrey D.", "Ullman"))),
  /* other books... */)

Figure 2.1: Definition of the schema and of some content.

The generator book <- books functions like a loop header: the remainder of the for-comprehension is executed once per book in the collection. Consequently, the generator author <- book.authors starts a nested loop. The return value of the for-comprehension is a collection of all yielded records. Note that if a book has multiple authors, this for-comprehension will return multiple records relative to this book, one for each author.
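To see these semantics on a tiny self-contained example (our own toy data, not the bibliography schema), consider a for-comprehension with a filter and a nested generator; it behaves like two nested loops and collects every yielded value:

val pairs = for {
  x <- Set(1, 2, 3)
  if x % 2 == 1          // keeps 1 and 3
  y <- Seq("a", "b")     // nested loop over a sequence
} yield (x, y)
// pairs: Set[(Int, String)] = Set((1,"a"), (1,"b"), (3,"a"), (3,"b"))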

We can further process this collection with another for-comprehension, possibly in a different module. For example, still in Fig. 2.2, the function titleFilter filters book titles containing the word "Principles" and drops from each record the number of additional coauthors.

In Scala, the implementation of for-comprehensions is not fixed. Instead, the compiler desugars a for-comprehension to a series of API calls, and different collection classes can implement this API differently. Later, we will use this flexibility to provide an optimizing implementation of for-comprehensions, but in this section we focus on the behavior of the standard Scala collections, which implement for-comprehensions as loops that create intermediate collections.

2.2.1 Optimizing by Hand

In the naive implementation in Fig. 2.2, different concerns are separated, hence it is modular. However, it is also inefficient. To execute this code, we first build the original collection and only later perform further processing to build the new result; creating the intermediate collection at the interface between these functions is costly. Moreover, the same book can appear in records more than once, if the book has more than one author; all of these duplicates have the same title. Nevertheless, we test each duplicate title separately for whether it contains the searched keyword. If books have 4 authors on average, this means a slowdown of a factor of 4 for the filtering step.

In general, one can only resolve these inefficiencies by manually optimizing the query; however, we will observe that these manual optimizations produce less modular code.²

To address the first problem above, that is, to avoid creating intermediate collections, we can manually inline titleFilter and records; we obtain two nested for-comprehensions. Furthermore, we can unnest the inner one [Fegaras and Maier 2000].

²The existing Scala collections API supports optimization, for instance through non-strict variants of the query operators (called 'views' in Scala), but they can only be used for a limited set of optimizations, as we discuss in Chapter 8.


case class BookData(title: String, authorName: String, coauthors: Int)

val records =
  for {
    book <- books
    if book.publisher == "Pearson Education"
    author <- book.authors
  } yield new BookData(book.title,
      author.firstName + " " + author.lastName,
      book.authors.size - 1)

def titleFilter(records: Set[BookData],
                keyword: String) =
  for {
    record <- records
    if record.title.contains(keyword)
  } yield (record.title, record.authorName)

val res = titleFilter(records, "Principles")

Figure 2.2: Our example query on the schema in Fig. 2.1 and a function which postprocesses its result.

To address the second problem above, that is, to avoid testing the same title multiple times, we hoist the filtering step; that is, we change the order of the processing steps in the query to first look for keyword within book.title and then iterate over the set of authors. This does not change the overall semantics of the query, because the filter only accesses the title and does not depend on the author. In the end, we obtain the code in Fig. 2.3. The resulting query processes the title of each book only once. Since filtering in Scala is done lazily, the resulting query avoids building an intermediate collection.

This second optimization is only possible after inlining, and thereby reducing the modularity of the code, because it mixes together processing steps from titleFilter and from the definition of records. Therefore, reusing the code creating records would now be harder.

To make titleFilterHandOpt more reusable, we could turn the publisher name into a parameter. However, the new versions of titleFilter cannot be reused as-is if some details of the inlined code change; for instance, we might need to filter publishers differently, or not at all. On the other hand, if we express queries modularly, we might lose some opportunities for optimization. The design of the collections API, both in Scala and in typical languages, forces us to manually optimize our code by repeated inlining and subsequent application of query optimization rules, which leads to a loss of modularity.


def titleFilterHandOpt(books: Set[Book],
                       publisher: String, keyword: String) =
  for {
    book <- books
    if book.publisher == publisher && book.title.contains(keyword)
    author <- book.authors
  } yield (book.title, author.firstName + " " + author.lastName)

val res = titleFilterHandOpt(books,
  "Pearson Education", "Principles")

Figure 2.3: Composition of queries in Fig. 2.2, after inlining, query unnesting and hoisting.


Chapter 3

Automatic optimization with SQuOpt

The goal of SQuOpt is to let programmers write queries modularly and at a high level of abstraction, and to deal with optimization by a dedicated domain-specific optimizer. In our concrete example, programmers should be able to write queries similar to the one in Fig. 2.2, but get the efficiency of the one in Fig. 2.3. To allow this, SQuOpt overloads for-comprehensions and other constructs, such as string concatenation with + and field access (as in book.author). Our overloads of these constructs reify the query as an expression tree. SQuOpt can then optimize this expression tree and execute the resulting optimized query. Programmers explicitly trigger processing by SQuOpt by adapting their queries, as we describe in the next subsection.

3.1 Adapting a Query

To use SQuOpt instead of native Scala queries, we first assume that the query does not use side effects and is thus purely functional. We argue that purely functional queries are more declarative. Side effects are used to improve performance, but SQuOpt makes that unnecessary through automatic optimizations. In fact, the lack of side effects enables more optimizations.

In Fig. 3.1 we show a version of our running example adapted to use SQuOpt. We first discuss changes to records. To enable SQuOpt, a programmer needs to (a) import the SQuOpt library, (b) import some wrapper code specific to the types the collection operates on, in this case Book and Author (more about that later), (c) explicitly convert the native Scala collections involved to collections of our framework, by a call to asSquopt, (d) rename a few operators, such as == to ==# (this is necessary due to some Scala limitations), and (e) add a separate step where the query is evaluated (possibly after optimization). All these changes are lightweight and mostly of a syntactic nature.

For parameterized queries like titleFilter, we need to also adapt type annotations. The ones in titleFilterQuery reveal some details of our implementation: expressions that are reified have type Exp[T] instead of T. As the code shows, resQuery is optimized before compilation. This call will perform the optimizations that we previously did by hand and will return a query equivalent to that in Fig. 2.3, after verifying their safety conditions. For instance, after inlining, the filter if book.title.contains(keyword) does not reference author, hence it is safe to hoist. Note that checking this safety condition would not be possible without reifying the predicate. For instance, it would not be sufficient to only reify the calls to the collection API, because the predicate would then be represented just as a boolean function parameter.



import squopt._
import schema.squopt._

val recordsQuery =
  for {
    book <- books.asSquopt
    if book.publisher ==# "Pearson Education"
    author <- book.authors
  } yield new BookData(book.title,
      author.firstName + " " + author.lastName,
      book.authors.size - 1)

val records = recordsQuery.eval

def titleFilterQuery(records: Exp[Set[BookData]], keyword: Exp[String]) =
  for {
    record <- records
    if record.title.contains(keyword)
  } yield (record.title, record.authorName)

val resQuery = titleFilterQuery(recordsQuery, "Principles")
val res = resQuery.optimize.eval

Figure 3.1: SQuOpt version of Fig. 2.2; recordsQuery contains a reification of the query, records its result.


In general, our automatic optimizer inspects the whole reification of the query implementation, to check that optimizations do not introduce changes in the overall result of the query and are therefore safe.

3.2 Indexing

SQuOpt also supports the transparent usage of indexes. Indexes can further improve the efficiency of queries, sometimes by orders of magnitude. In our running example, the query scans all books to look for the ones having the right publisher. To speed up this query, we can preprocess books to build an index, that is, a dictionary mapping each publisher to a collection of all the books it published. This index can then be used to answer the original query without scanning all books.

We construct a query representing the desired dictionary and inform the optimizer that it should use this index where appropriate:

val idxByPublisher = books.asSquopt.indexBy(_.publisher)
Optimization.addIndex(idxByPublisher)

The indexBy collection method accepts a function that maps a collection element to a key; coll.indexBy(key) returns a dictionary mapping each key to the collection of all elements of coll having that key. Missing keys are mapped to an empty collection.¹ Optimization.addIndex simply preevaluates the index and updates a dictionary mapping the index to its preevaluated result.
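On ordinary (non-reified) Scala collections, this behavior can be sketched in terms of the standard groupBy method, which the footnote below compares it to; this is our own illustration, not SQuOpt's actual implementation:

def indexBy[T, K](coll: Set[T])(key: T => K): Map[K, Set[T]] =
  coll.groupBy(key).withDefaultValue(Set.empty)

val byPublisher = indexBy(books)(_.publisher)  // books as in Fig. 2.1
byPublisher("Pearson Education")  // all books with that publisher
byPublisher("No Such Publisher")  // Set(), instead of a lookup failure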

A call to optimize on a query will then take this index into account and rewrite the query to perform index lookup instead of scanning, if possible. For instance, the code in Fig. 3.1 would be transparently rewritten by the optimizer to a query similar to the following:

val indexedQuery =
  for {
    book <- idxByPublisher("Pearson Education")
    author <- book.authors
  } yield new BookData(book.title,
      author.firstName + " " + author.lastName,
      book.authors.size - 1)

Since dictionaries in Scala are functions, in the above code dictionary lookup on idxByPublisher is represented simply as function application. The above code iterates over books having the desired publisher, instead of scanning the whole library, and performs the remaining computation from the original query. Although the index use in the listing above is notated as idxByPublisher("Pearson Education"), only the cached result of evaluating the index is used when the query is executed, not the reified index definition.

This optimization could also be performed manually, of course, but the queries are on a higher abstraction level and more maintainable if indexing is defined separately and applied automatically. Manual application of indexing is a crosscutting concern, because adding or removing an index affects potentially many queries. SQuOpt does not free the developer from the task of assessing which index will 'pay off' (we have not considered automatic index creation yet), but at least it becomes simple to add or remove an index, since the application of the indexes is modularized in the optimizer.

¹For readers familiar with the Scala collection API, we remark that the only difference with the standard groupBy method is the handling of missing keys.


Chapter 4

Implementing SQuOpt

After describing how to use SQuOpt, we explain how SQuOpt represents queries internally and optimizes them. Here we give only a brief overview of our implementation technique; it is described in more detail in Sec. 5.1.

4.1 Expression Trees

In order to analyze and optimize collection queries at runtime, SQuOpt reifies their syntactic structure as expression trees. The expression tree reflects the syntax of the query after desugaring, that is, after for-comprehensions have been replaced by API calls. For instance, recordsQuery from Fig. 3.1 points to the following expression tree (with some boilerplate omitted for clarity):

new FlatMap(
  new Filter(
    new Const(books),
    v2 => new Eq(new Book_publisher(v2),
                 new Const("Pearson Education"))),
  v3 => new MapNode(
    new Book_authors(v3),
    v4 => new BookData(
      new Book_title(v3),
      new StringConcat(
        new StringConcat(
          new Author_firstName(v4),
          new Const(" ")),
        new Author_lastName(v4)),
      new Plus(new Size(new Book_authors(v3)),
               new Negate(new Const(1))))))

The structure of the for-comprehension is encoded with the FlatMap, Filter and MapNode instances. These classes correspond to the API methods that for-comprehensions get desugared to; SQuOpt arranges for the implementation of flatMap to construct a FlatMap instance, etc. The instances of the other classes encode the rest of the structure of the collection query, that is, which methods are called on which arguments. On the one hand, SQuOpt defines classes such as Const or Eq, which are generic and applicable to all queries. On the other hand, classes such as Book_publisher cannot be predefined, because they are specific to the user-defined types used in a query.



SQuOpt provides a small code generator which creates a case class for each method and field of a user-defined type. Functions in the query are represented by functions that create expression trees; representing functions in this way is frequently called higher-order abstract syntax [Pfenning and Elliot 1988].

We can see that the reification of this code corresponds closely to an abstract syntax tree for the code which is executed; however, many calls to specific methods, like map, are represented by special nodes, like MapNode, rather than as method calls. For the optimizer, it becomes easier to match and transform those nodes than with a generic abstract syntax tree.

Nodes for collection operations are carefully defined by hand to provide them highly generic type signatures and make them reusable for all collection types. In Scala, collection operations are highly polymorphic; for instance, map has a single implementation working on all collection types, like List and Set, and we similarly want to represent all usages of map through instances of a single node type, namely MapNode. Having separate nodes ListMapNode, SetMapNode and so on would be inconvenient, for instance when writing the optimizer. However, map on a List[Int] will produce another List, while on a Set it will produce another Set, and so on for each specific collection type (in first approximation); moreover, this is guaranteed statically by the type of map. Yet, thanks to advanced type-system features, map is defined only once, avoiding redundancy, but has a type polymorphic enough to guarantee statically that the correct return value is produced. Since our tree representation is strongly typed, we need to have a similar level of polymorphism in MapNode. We achieved this by extending the techniques described by Odersky and Moors [2009], as detailed in our technical report [Giarrusso et al. 2012].

We get these expression trees by using Scala implicit conversions in a particular style, which we adopted from Rompf and Odersky [2010]. Implicit conversions allow us to add, for each method A.foo(B), an overload of Exp[A].foo(Exp[B]). Where a value of type Exp[T] is expected, a value of type T can be used, thanks to other implicit conversions which wrap it in a Const node. The initial call of asSquopt triggers the application of the implicit conversions by converting the collection to the leaf of an expression tree.

It is also possible to call methods that do not return expression trees; however, such method calls would then only be represented by an opaque MethodCall node in the expression tree, which means that the code of the method cannot be considered in optimizations.

Crucially, these expression trees are generated at runtime. For instance, the first Const contains a reference to the actual collection of books to which books refers. If a query uses another query, such as records in Fig. 3.1, then the subquery is effectively inlined. The same holds for method calls inside queries: if these methods return an expression tree (such as the titleFilterQuery method in Fig. 3.1), then these expression trees are inlined into the composite query. Since the reification happens at runtime, it is not necessary to predict the targets of dynamically bound method calls: a new (and possibly different) expression tree is created each time a block of code containing queries is executed.

Hence, we can say that expression trees represent the computation which is going to be executed after inlining; control flow or virtual calls in the original code typically disappear — especially if they manipulate the query as a whole. This is typical of deeply embedded DSLs like ours, where code, instead of performing computations, produces a representation of the computation to perform [Elliott et al. 2003; Chambers et al. 2010].

This inlining can duplicate computations, for instance in this code:

val num: Exp[Int] = 10
val square = num * num
val sum = square + square


Evaluating sum will evaluate square twice, since the reified tree bound to sum duplicates the subtree bound to square. Like Elliott et al. [2003], we avoid this using common-subexpression elimination.

4.2 Optimizations

Our optimizer currently supports several algebraic optimizations. Any query, and in fact every reified expression, can be optimized by calling the optimize function on it. The ability to optimize reified expressions that are not queries is useful; for instance, optimizing a function that produces a query is similar to a "prepared statement" in relational databases.

The optimizations we implemented are mostly standard in compilers [Muchnick 1997] or databases:

• Query unnesting merges a nested query into the containing one [Fegaras and Maier 2000; Grust and Scholl 1999], replacing, for instance,

  for { val1 <- (for { val2 <- coll } yield f(val2)) } yield g(val1)

with

  for { val2 <- coll; val1 = f(val2) } yield g(val1)

• Bulk operation fusion fuses higher-order operators on collections.

• Filter hoisting tries to apply filters as early as possible; in database query optimization, it is known as selection pushdown. For filter hoisting, it is important that the full query is reified, because otherwise the dependencies of the filter condition cannot be determined.

• We reduce, during optimization, tuple and case-class accesses. For instance, (a, b)._1 is simplified to a. This is important because the produced expression does not depend on b; removing this false dependency can allow, for instance, a filter containing this expression to be hoisted to a context where b is not bound.

• Indexing tries to apply one or more of the available indexes to speed up the query.

• Common-subexpression elimination (CSE) avoids performing the same computation multiple times; we use techniques similar to Rompf and Odersky [2010].

• Smaller optimizations include constant folding, reassociation of associative operators and removal of identity maps (coll.map(x => x), typically generated by the translation of for-comprehensions).

Each optimization is applied recursively bottom-up until it does not trigger anymore; different optimizations are composed in a fixed pipeline.
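Schematically, the driver behind this strategy can be sketched as follows (with our own helper names, not SQuOpt's actual API); detecting that a pass no longer triggers relies on the structural equality of case-class trees:

// Apply one pass repeatedly until the tree no longer changes.
def fixpoint[T](pass: Exp[T] => Exp[T])(e: Exp[T]): Exp[T] = {
  val e2 = pass(e)
  if (e2 == e) e else fixpoint(pass)(e2)
}

// Compose the individual passes into one fixed pipeline.
def pipeline[T](passes: List[Exp[T] => Exp[T]]): Exp[T] => Exp[T] =
  passes.reduce(_ andThen _)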

Optimizations are only guaranteed to be semantics-preserving if queries obey the restrictions we mentioned: for instance, queries should not involve side effects, such as assignments or I/O, and all collections used in queries should implement the specifications stated in the collections API. Obviously, the choice of optimizations involves many tradeoffs; for that reason, we believe that it is all the more important that the optimizer is not hard-wired into the compiler, but implemented as a library, with potentially many different implementations.

To make changes to the optimizer more practical, we designed our query representation so that optimizations are easy to express; restricting to pure queries also helps. For instance,


filter fusion can be implemented in a few lines of code: just match against expressions of the form coll.filter(pred2).filter(pred1) and rewrite them:¹

val mergeFilters = ExpTransformer {
  case Sym(Filter(Sym(Filter(collection, pred2)), pred1)) =>
    collection.filter(x => pred2(x) && pred1(x))
}

A more complex optimization, such as filter hoisting, requires only 20 lines of code. We have implemented a prototype of the optimizer with the mentioned optimizations. Many

additional algebraic optimizations can be added in future work, by us or others; a candidate would be loop hoisting, which moves out of loops arbitrary computations not depending on the loop variable (and not just filters). With some changes to the optimizer's architecture, it would also be possible to perform cost-based and dynamic optimizations.

4.3 Query Execution

Calling the eval method on a query will convert it to executable bytecode; this bytecode will be loaded and invoked by using Java reflection. We produce a thunk that, when evaluated, will execute the generated code.

In our prototype, we produce bytecode by converting expression trees to Scala code and invoking on the result the Scala compiler, scalac. Invoking scalac is typically quite slow, and we currently use caching to limit this concern; however, we believe it is merely an engineering problem to produce bytecode directly from expression trees, just as compilers do.

Our expression trees contain native Scala values wrapped in Const nodes, and in many cases one cannot produce Scala program text evaluating to the same value. To allow executing such expression trees, we need to implement cross-stage persistence (CSP): the generated code will be a function, accepting the actual values as arguments [Rompf and Odersky 2010]. This allows sharing the compiled code for expressions which differ only in the embedded values.

In more detail, our compilation algorithm is as follows. (a) We implement CSP by replacing embedded Scala values by references to the function arguments; so, for instance, List(1, 2, 3).map(x => x + 1) becomes the function (s1: List[Int], s2: Int) => s1.map(x => x + s2). (b) We look up the produced expression tree, together with the types of the constants we just removed, in a cache mapping to the generated classes. If the lookup fails, we update the cache with the result of the next steps. (c) We apply CSE on the expression. (d) We convert the tree to code, compile it, and load the generated code.

Preventing errors in generated code. Compiler errors in generated code are typically a concern with SQuOpt; however, they can only arise due to implementation bugs in SQuOpt (for instance, in pretty-printing, which cannot be checked statically), so they do not concern users. Since our query language and tree representation are statically typed, type-incorrect queries will be rejected statically. For instance, consider again idxByPublisher, described previously:

val idxByPublisher = books.asSquopt.indexBy(_.publisher)

Since Book.publisher returns a String, idxByPublisher has type Exp[Map[String, Set[Book]]]. Looking up a key of the wrong type, for instance by writing idxByPublisher(book) where book: Book, will make scalac emit a static type error.

¹Sym nodes are part of the boilerplate we omitted earlier.

Chapter 5

A deep EDSL for collection queries

In this chapter, we discuss collections as a critical case study [Flyvbjerg 2006] for deep DSL embedding.

As discussed in the previous chapter, to support optimizations we require a deep embedding of the collections DSL.

While the basic idea of deep embedding is well known, it is not obvious how to realize deep embedding when considering the following additional goals:

• To support users adopting SQuOpt, a generic SQuOpt query should share the "look and feel" of the ordinary collections API. In particular, query syntax should remain mostly unchanged. In our case, we want to preserve Scala's for-comprehension¹ syntax and its notation for anonymous functions.

• Again to support users adopting SQuOpt, a generic SQuOpt query should not only share the syntax of the ordinary collections API; it should also be well-typed if and only if the corresponding ordinary query is well-typed. This is particularly challenging in the Scala collections library, due to its deep integration with advanced type-system features such as higher-kinded generics and implicit objects [Odersky and Moors 2009]. For instance, calling map on a List will return a List, and calling map on a Set will return a Set. Hence, the object-language representation and the transformations thereof should be as "typed" as possible. This precludes, among others, a first-order representation of object-language variables as strings.

• SQuOpt should be interoperable with ordinary Scala code and Scala collections. For instance, it should be possible to call normal, non-reified functions within a SQuOpt query, or to mix native Scala queries and SQuOpt queries.

• The performance of SQuOpt queries should be reasonable even without optimizations. A non-optimized SQuOpt query should not be dramatically slower than a native Scala query. Furthermore, it should be possible to create new queries at run time and execute them without excessive overhead. This goal limits the options of applicable interpretation or compilation techniques.

We think that these challenges characterize deep embedding of queries on collections as a critical case study [Flyvbjerg 2006] for DSL embedding; that is, it is so challenging that embedding techniques successfully used in this case are likely to be successful on a broad range of other DSLs.

¹Also known as for expressions [Odersky et al. 2011, Ch. 23].



In this chapter, we report from this case study the successes and failures of achieving these goals in SQuOpt.

5.1 Implementation: Expressing the interface in Scala

To optimize a query as described in the previous section, SQuOpt needs to reify, optimize and execute queries. Our implementation assigns responsibility for these steps to three main components: a generic library for reification and execution of general Scala expressions, a more specialized library for reification and execution of query operators, and a dedicated query optimizer. Queries then need to be executed, through either compilation (already discussed in Sec. 4.3) or interpretation (discussed in Sec. 5.6). We describe the implementation in more detail in the rest of this section. The full implementation is also available online.²

A core idea of SQuOpt is to reify Scala code as a data structure in memory. A programmer could directly create instances of that data structure, but we also provide a more convenient interface, based on advanced Scala features such as implicit conversions and type inference. That interface makes it possible to reify code automatically, with a minimum of programmer annotations, as shown in the examples in Chapter 3. Since this is a case study on Scala's support for deep embedding of DSLs, we also describe in this section how Scala supports our task. In particular, we report on techniques we used and issues we faced.

5.2 Representing expression trees

In the previous section, we have seen that expressions that would have type T in a native Scala query are reified and have type Exp[T] in SQuOpt. The generic type Exp[T] is the base for our reification of Scala expressions as expression trees, that is, as data structures in memory. We provide a subclass of Exp[T] for every different form of expression we want to reify. For example, in Fig. 2.2 the expression author.firstName + " " + author.lastName must be reified, even though it is not collection-related, for otherwise the optimizer could not see whether author is used. Knowing this is needed, for instance, to remove variables which are bound but not used. Hence, this expression is reified as:

StringConcat(StringConcat(AuthorFirstName(author), Const(" ")),
  AuthorLastName(author))

This example uses the constructors of the following subclasses of Exp[T] to create the expression tree:

case class Const[T](t: T) extends Exp[T]
case class StringConcat(str1: Exp[String], str2: Exp[String]) extends Exp[String]
case class AuthorFirstName(t: Exp[Author]) extends Exp[String]
case class AuthorLastName(t: Exp[Author]) extends Exp[String]

Expression nodes additionally implement support code for tree traversals, to support optimizations, which we omit here.

This representation of expression trees is well-suited for representing the structure of expressions in memory, and also for pattern matching (which is automatically supported for case classes in Scala), but it is inconvenient for query writers. In fact, in Fig. 3.1 we have seen that SQuOpt provides a much more convenient front-end: the programmer writes almost the usual code for type T, and SQuOpt converts it automatically to Exp[T].

²http://www.informatik.uni-marburg.de/~pgiarrusso/SQuOpt


5.3 Lifting first-order expressions

We call the process of converting from T to Exp[T] lifting. Here we describe how we lift first-order expressions – Scala expressions that do not contain anonymous function definitions.

To this end, consider again the fragment

author.firstName + " " + author.lastName

now in the context of the SQuOpt-enabled query in Fig. 3.1. It looks like a normal Scala expression, even syntactically unchanged from Fig. 2.2. However, evaluating that code in the context of Fig. 3.1 does not concatenate any strings, but creates an expression tree instead. Although the code looks like the same expression, it has a different type: Exp[String] instead of String. This difference in the type is caused by the context: the variable author is now bound in a SQuOpt-enabled query and therefore has type Exp[Author] instead of Author. We can still access the firstName field of author, because expression trees of type Exp[T] provide the same interface as values of type T, except that all operations return expression trees instead of values.

To understand how an expression tree of type Exp[T] can have the same interface as a value of type T, we consider two expression trees str1 and str2 of type Exp[String]. The implementation of lifting differs depending on the kind of expression we want to lift.

Method calls and operators. In our example, the operator + should be available on Exp[String], but not on Exp[Boolean], because + is available on String but not on Boolean. Furthermore, we want str1 + str2 to have type Exp[String], and to evaluate not to a string concatenation, but to a call of StringConcat, that is, to an expression tree which represents str1 + str2. This is a somewhat unusual requirement, because usually the interface of a generic type does not depend on the type parameters.

To provide such operators and to encode expression trees, we use implicit conversions in a similar style as Rompf and Odersky [2010]. Scala allows making expressions of a type T implicitly convertible to a different type U. To this end, one must declare an implicit conversion function having type T => U. Calls to such functions will be inserted by the compiler when required to fix a type mismatch between an expression of type T and a context expecting a value of type U. In addition, a method call e.m(args) can trigger the conversion of e to a type where the method m is present.³ Similarly, an operator usage, as in str1 + str2, can also trigger an implicit conversion: an expression using an operator like str1 + str2 is desugared to the method call str1.+(str2), which can trigger an implicit conversion from str1 to a type providing the + method. Therefore, from now on, we do not distinguish between operators and methods.

To provide the method + on Exp[String], we define an implicit conversion from Exp[String] to a new type providing a + method which creates the appropriate expression node:

implicit def expToStringOps(t: Exp[String]) = new StringOps(t)
class StringOps(t: Exp[String]) {
  def +(that: Exp[String]): Exp[String] = StringConcat(t, that)
}

This is an example of the well-known Scala enrich-my-library pattern⁴ [Odersky 2006]. With these declarations in scope, the Scala compiler rewrites str1 + str2 to expToStringOps(str1).+(str2),

which evaluates to StringConcat(str1, str2), as desired. The implicit conversion function expToStringOps is not applicable to Exp[Boolean], because it explicitly specifies the receiver of the +-call to have type Exp[String]. In other words, expressions like str1 + str2 are now lifted

³For the exact rules, see Odersky et al. [2011, Ch. 21] and Odersky [2011].
⁴Also known as the pimp-my-library pattern.


to the level of expression trees in a type-safe way. For brevity, we refer to the defined operator as Exp[String].+.

Literal values. However, a string concatenation might also include constants, as in str1 + "foo" or "bar" + str1. To lift str1 + "foo", we introduce a lifting for constants, which wraps them in a Const node:

implicit def pure[T](t: T): Exp[T] = Const(t)

The compiler will now rewrite str1 + "foo" to expToStringOps(str1) + pure("foo"), and str1 + "foo" + str2 to expToStringOps(str1) + pure("foo") + str2. Different implicit conversions cooperate successfully to lift the expression.

Analogously, it would be convenient if the similar expression "bar" + str1 were rewritten to expToStringOps(pure("bar")) + str1, but this is not the case, because implicit coercions are not chained automatically in Scala. Instead, we have to manually chain existing implicit conversions into a new one:

implicit def toStringOps(t: String) = expToStringOps(pure(t))

so that "bar" + str1 is rewritten to toStringOps("bar") + str1.

User-defined methods. Calls of user-defined methods, like author.firstName, are lifted the

same way as calls to built-in methods, such as the string concatenation shown earlier. For the running example, the following definitions are necessary to lift the methods from Author to Exp[Author]:

package schema.squopt

implicit def expToAuthorOps(t: Exp[Author]) = new AuthorOps(t)
implicit def toAuthorOps(t: Author) = expToAuthorOps(pure(t))

class AuthorOps(t: Exp[Author]) {
  def firstName: Exp[String] = AuthorFirstName(t)
  def lastName: Exp[String] = AuthorLastName(t)
}

Implicit conversions for user-defined types cooperate with other ones; for instance, author.firstName + " " + author.lastName is rewritten to

(expToStringOps(expToAuthorOps(author).firstName) + pure(" ")) +
  expToAuthorOps(author).lastName

Author is not part of SQuOpt or the standard Scala library, but an application-specific class; hence, SQuOpt cannot pre-define the necessary lifting code. Instead, the application programmer needs to provide this code to connect SQuOpt to his application. To support the application programmer with this tedious task, we provide a code generator which discovers the interface of a class through reflection on its compiled version and generates the boilerplate code, such as the one above for Author. It also generates the application-specific expression tree types, such as AuthorFirstName, as shown in Sec. 5.2. In general, query writers need to generate and import the boilerplate lifting code for all application-specific types they want to use in a SQuOpt query.

If desired, we can exclude some methods to restrict the language supported in our deeply embedded programs. For instance, SQuOpt requires the user to write side-effect-free queries; hence, we do not lift methods which perform side effects.

Using similar techniques, we can also lift existing functions and implicit conversions.

Tuples and other generic constructors. The techniques presented above for the lifting of

method calls rely on overloading the name of the method with a signature that involves Exp. Implicit


resolution (for method calls) will then choose our lifted version of the function or method, to satisfy the typing requirements of the context or arguments of the call. Unfortunately, this technique does not work for tuple constructors, which in Scala are not resolved like ordinary calls. Instead, support for tuple types is hard-wired into the language, and tuples are always created by the predefined tuple constructors.

For example, the expression (str1, str2) will always call Scala's built-in Tuple2 constructor and correspondingly have type (Exp[String], Exp[String]). We would prefer that it call a lifting function and produce an expression tree of type Exp[(String, String)] instead.

Even though we cannot intercept the call to Tuple2, we can add an implicit conversion to be called after the tuple is constructed:

implicit def tuple2ToTuple2Exp[A1, A2](tuple: (Exp[A1], Exp[A2])): LiftTuple2[A1, A2] =
  LiftTuple2[A1, A2](tuple._1, tuple._2)
case class LiftTuple2[A1, A2](t1: Exp[A1], t2: Exp[A2]) extends Exp[(A1, A2)]

We generate such conversions for different arities with a code generator. These conversions will be used only when the context requires an expression tree. Note that this technique is only applicable because tuples are generic and support arbitrary components, including expression trees.
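For instance (our illustration, with str1 and str2 as above), the conversion fires exactly when the expected type is a reified pair:

val reified: Exp[(String, String)] = (str1, str2)     // via tuple2ToTuple2Exp
val plain: (Exp[String], Exp[String]) = (str1, str2)  // no conversion inserted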

In fact, we use the same technique also for other generic constructors, to avoid problems associated with shadowing of constructor functions. For example, an implicit conversion is used to lift Seq[Exp[T]] to Exp[Seq[T]]: code like Seq(str1, str2) first constructs a sequence of expression trees and then wraps the result with an expression node that describes a sequence.

Subtyping. So far, we have seen that for each first-order method m operating on instances of T, we can create a corresponding method which operates on Exp[T]. If the method accepts parameters having types A1, ..., An and has return type R, the corresponding lifted method will accept parameters having types Exp[A1], ..., Exp[An] and return type Exp[R]. However, Exp[T] also needs to support all methods that T inherits from its super-type S. To ensure this, we declare the type constructor Exp to be covariant in its type parameter, so that Exp[T] correctly inherits the liftings from Exp[S]. This works even with the enrich-my-library pattern, because implicit resolution respects subtyping in an appropriate way.
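The following self-contained sketch (toy classes of our own, not from the running example) illustrates the interplay of covariance and the enrich-my-library pattern: because Exp is covariant, an Exp[Employee] is also an Exp[Person], so enrichments defined for Exp[Person] apply to it as well.

trait Exp[+T]
class Person(val name: String)
class Employee(name: String, val salary: Int) extends Person(name)

case class PersonName(p: Exp[Person]) extends Exp[String]
implicit def expToPersonOps(p: Exp[Person]): PersonOps = new PersonOps(p)
class PersonOps(p: Exp[Person]) {
  def name: Exp[String] = PersonName(p)
}
// For e: Exp[Employee], the call e.name type-checks and builds PersonName(e).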

Limitations of Lifting. Lifting methods of Any or AnyRef (Scala types at the root of the inheritance hierarchy) is not possible with this technique: Exp[T] inherits such methods and makes them directly available, hence the compiler will not insert an implicit conversion. Therefore, it is not possible to lift expressions such as x == y; rather, we have to rely on developer discipline to use ==# and !=# instead of == and !=.

An expression like "foo" + "bar" + str1 is converted to

toStringOps("foo" + "bar") + str1

Hence, part of the expression is evaluated before being reified. This is harmless here, since we want "foo" + "bar" to be evaluated at compile time, that is, constant-folded, but in other cases it is preferable to prevent the constant folding. We will see later examples where queries on collections are evaluated before reification, defeating the purpose of our framework, and how we work around those.

5.4 Lifting higher-order expressions

We have shown how to lift first-order expressions; however, the interface of collections also uses higher-order methods, that is, methods that accept functions as parameters, and we need to lift them as well to reify the complete collection EDSL. For instance, the map method applies a function to


each element of a collection. In this section, we describe how we reify such expressions of function type.

Higher-order abstract syntax. To represent functions, we have to represent the bound variables in the function bodies. For example, a reification of str => str + "!" needs to reify the variable str of type String in the body of the anonymous function. This reification should retain the information where str is bound. We achieve this by representing bound variables using higher-order abstract syntax (HOAS) [Pfenning and Elliot 1988], that is, we represent them by variables bound at the meta level. To continue the example, the above function is reified as (str: Exp[String]) => str + "!". Note how the type of str in the body of this version is Exp[String], because str is a reified variable now. Correspondingly, the expression str + "!" is lifted as described in the previous section.

With all operations in the function body automatically lifted, the only remaining syntactic difference between normal and lifted functions is the type annotation for the function's parameter. Fortunately, Scala's local type inference can usually deduce the argument type from the context, for example from the signature of the map operation being called. Type inference plays a dual role here: first, it allows the query writer to leave out the annotation, and second, it triggers lifting in the function body by requesting a lifted function instead of a normal function. This is how, in Fig. 3.1, a single call to asSquopt triggers lifting of the overall query.

Note that reified functions have type Exp[A] => Exp[B], instead of the more regular Exp[A => B]. We chose the former over the latter to support Scala's syntax for anonymous functions and for-comprehensions, which is hard-coded to produce or consume instances of the pre-defined A => B type. We have to reflect this irregularity in the lifting of methods and functions by treating the types of higher-order arguments accordingly.

User-defined methods, revised. We can now extend the lifting of signatures for methods or functions from the previous section to the general case, that is, the case of higher-order functions. We lift a method or function with signature

def m[A1, ..., An](a1: T1, ..., an: Tn): R

to a method or function with the following signature:

def m[A1, ..., An](a1: Li⟦T1⟧, ..., an: Li⟦Tn⟧): Li⟦R⟧

As before, the definition of the lifted method or function will return an expression node representing the call. If the original was a function, the lifted version is also defined as a function. If the original was a method on type T, then the lifted version is enriched onto T.

The type transformation Li converts the argument and return types of the method or function to be lifted. For most types, Li just wraps the type in the Exp type constructor, but function types are treated specially: Li recursively descends into function types to convert their arguments separately. Overall, Li behaves as follows:

Li⟦(A1, ..., An) => R⟧ = (Li⟦A1⟧, ..., Li⟦An⟧) => Li⟦R⟧
Li⟦A⟧ = Exp[A]    (otherwise)

We can use this extended definition of method lifting to implement map lifting for Lists, that is, a method with signature Exp[List[T]].map(Exp[T] => Exp[U]):

implicit def expToListOps[T](coll: Exp[List[T]]) = new ListOps(coll)
implicit def toListOps[T](coll: List[T]) = expToListOps(coll)

class ListOps[T](coll: Exp[List[T]]) {
  def map[U](f: Exp[T] => Exp[U]) = ListMapNode(coll, Fun(f))
}


val records = books.withFilter(book =>
    book.publisher == "Pearson Education"
  ).flatMap(book =>
    book.authors.map(author =>
      BookData(book.title,
        author.firstName + " " + author.lastName,
        book.authors.size - 1)))

Figure 5.1: Desugaring of code in Fig. 2.2.

case class ListMapNode[T, U](coll: Exp[List[T]],
  mapping: Exp[T => U]) extends Exp[List[U]]

Note how map's parameter f has type Exp[T] => Exp[U], as necessary to enable Scala's syntax for anonymous functions and automatic lifting of the function body. This implementation would work for queries on lists, but it does not support other collection types, or queries where the collection type changes. We show in Sec. 5.5 how SQuOpt integrates with such advanced features of the Scala Collection EDSL.

5.5 Lifting collections

In this section, we first explain how for-comprehensions are desugared to calls to library functions, allowing an external library to give them a different meaning. Afterwards, we summarize needed information about the subset of the Scala collection EDSL that we reify. Then we present how we perform this reification. We finally present the reification of the running example (Fig. 3.1).

For-comprehensions. As we have seen, an idiomatic encoding in Scala of queries on collections are for-comprehensions. Although Scala is an impure functional language and supports side-effectful for-comprehensions, only pure queries are supported in our framework, because this enables or simplifies many domain-specific analyses. Hence, we will restrict our attention to pure queries.

The Scala compiler desugars for-comprehensions into calls to three collection methods, map, flatMap and withFilter, which we explain shortly; the query in Fig. 2.2 is desugared to the code in Fig. 5.1.

The compiler performs type inference and type checking on a for-comprehension only after desugaring it; this affords some freedom for the types of the map, flatMap and withFilter methods.

The Scala Collection EDSL. A Scala collection containing elements of type T implements the trait Traversable[T]. On an expression coll of type Traversable[T], one can invoke methods declared (in first approximation) as follows:

def map[U](f: T => U): Traversable[U]
def flatMap[U](f: T => Traversable[U]): Traversable[U]
def withFilter(p: T => Boolean): Traversable[T]

For a Scala collection coll, the expression coll.map(f) applies f to each element of coll, collects the results in a new collection and returns it; coll.flatMap(f) applies f to each element of coll, concatenates the results in a new collection and returns it; coll.withFilter(p) produces a collection containing the elements of coll which satisfy the predicate p.
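Concretely, on plain (non-reified) collections these methods behave as follows; since withFilter returns a lazy wrapper rather than a collection, we force it here with a trailing map:

val xs = Set(1, 2, 3)
xs.map(x => x * 10)                        // Set(10, 20, 30)
xs.flatMap(x => Set(x, x + 10))            // Set(1, 11, 2, 12, 3, 13)
xs.withFilter(x => x % 2 == 1).map(x => x) // Set(1, 3)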

However, Scala supports many different collection types, and this complicates the actual types of these methods. Each collection can further implement subtraits like Seq[T] <: Traversable[T]


(for sequences), Set[T] <: Traversable[T] (for sets), and Map[K, V] <: Traversable[(K, V)] (for dictionaries); for each such trait, different implementations are provided.

One consequence of this syntactic desugaring is that a single for-comprehension can operate over different collection types. The type of the result depends essentially on the type of the root collection, that is, books in the example above. The example above can hence be altered to produce a sequence rather than a set, by simply converting the root collection to another type:

val recordsSeq = for {
  book <- books.toSeq
  if book.publisher == "Pearson Education"
  author <- book.authors
} yield BookData(book.title,
    author.firstName + " " + author.lastName,
    book.authors.size - 1)

Precise static typing. The Scala collections EDSL achieves precise static typing while avoiding code duplication [Odersky and Moors 2009]. Precise static typing is necessary because the return type of a query operation can depend on subtle details of the base collection and query arguments. To return the most specific static type, the Scala collection DSL defines a type-level relation between the source collection type, the element type for the transformed collection, and the type for the resulting collection. The relation is encoded through the concept pattern [Oliveira et al. 2010], i.e., through a type-class-style trait CanBuildFrom[From, Elem, To], and elements of the relation are expressed as implicit instances.

For example, a finite map can be treated as a set of pairs, so that mapping a function from pairs to pairs over it produces another finite map. This behavior is encoded in an instance of type CanBuildFrom[Map[K, V], (K1, V1), Map[K1, V1]]. The Map[K, V] is the type of the base collection, (K1, V1) is the return type of the function, and Map[K1, V1] is the return type of the map operation.

It is also possible to map some other function over the finite map, but the result will be a general collection instead of a finite map. This behavior is described by an instance of type CanBuildFrom[Traversable[T], U, Traversable[U]]. Note that this instance is also applicable to finite maps, because Map is a subclass of Traversable. Together, these two instances describe how to compute the return type of mapping over a finite map.
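Concretely (our illustration, using plain Scala 2.x collections):

val m = Map("a" -> 1, "b" -> 2)
m.map { case (k, v) => (k, v + 1) } // a Map[String, Int]: pairs map to pairs
m.map { case (k, v) => v }          // a general collection of Int, not a Map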

Code reuse. Even though these two use cases for mapping over a finite map have different return types, they are implemented as a single method that uses its implicit CanBuildFrom parameter to compute both the static type and the dynamic representation of its result. So the Scala Collections EDSL provides precise typing without code duplication. In our deep embedding, we want to preserve this property.

CanBuildFrom is used in the implementation of map, flatMap and withFilter. To further increase reuse, the implementations are provided in a helper trait TraversableLike[T, Repr] with the following signatures:

def map[U, That](f: T => U)(implicit cbf: CanBuildFrom[Repr, U, That]): That
def flatMap[U, That](f: T => Traversable[U])(implicit cbf: CanBuildFrom[Repr, U, That]): That
def withFilter(p: T => Boolean): Repr

The Repr type parameter represents the specific type of the receiver of the method call.

The lifting. The basic idea is to use the enrich-my-library pattern to lift collection methods

from TraversableLike[T, Repr] to TraversableLikeOps[T, Repr]:

implicit def expToTraversableLikeOps[T, Repr](v: Exp[TraversableLike[T, Repr]]) =
  new TraversableLikeOps[T, Repr](v)


class TraversableLikeOps[T, Repr <: Traversable[T] with TraversableLike[T, Repr]](val t: Exp[Repr]) {

  def withFilter(f: Exp[T] => Exp[Boolean]): Exp[Repr] =
    Filter(this.t, FuncExp(f))

  def map[U, That <: TraversableLike[U, That]](f: Exp[T] => Exp[U])(
      implicit c: CanBuildFrom[Repr, U, That]): Exp[That] =
    MapNode(this.t, FuncExp(f))

  // other method definitions; MapNode, Filter omitted
}

Figure 5.2: Lifting TraversableLike.

Subclasses of TraversableLike[T, Repr] also subclass both Traversable[T] and Repr; to take advantage of this during interpretation and optimization, we need to restrict the type of expToTraversableLikeOps, getting the following conversion:⁵

implicit def expToTraversableLikeOps[T, Repr <: Traversable[T] with TraversableLike[T, Repr]](v: Exp[Repr]) =
  new TraversableLikeOps[T, Repr](v)

The query operations are defined in class TraversableLikeOps[T, Repr]; a few examples are shown in Fig. 5.2.⁶

Note how the lifted query operations use CanBuildFrom to compute the same static return type as the corresponding non-lifted query operation would compute. This reuse of type-level machinery allows SQuOpt to provide the same interface as the Scala Collections EDSL.

Code reuse, revisited. We already saw how we could reify List[T].map through a specific expression node ListMapNode. However, this approach would require generating many variants for different collections with slightly different types; writing an optimizer able to handle all such variations would be unfeasible, because of the amount of code duplication required. Instead, by reusing Scala type-level machinery, we obtain a reification which is statically typed and at the same time avoids code duplication, in both our lifting and our optimizer, and in general in all possible consumers of our reification, making them feasible to write.

5.6 Interpretation

After optimization, SQuOpt needs to interpret the optimized expression trees to perform the query. Therefore, the trait Exp[T] declares a def interpret(): T method, and each expression node overrides it appropriately to implement a mostly standard typed, tagless [Carette et al. 2009], environment-based interpreter. The interpreter computes a value of type T from an expression tree of type Exp[T]. This design allows query writers to extend the interpreter to handle application-specific operations. In fact, the lifting generator described in Sec. 5.4 automatically generates appropriate definitions of interpret for the lifted operations.

⁵Due to type inference bugs, the actual implicit conversion needs to be slightly more complex, to mention T directly in the argument type. We reported the bug at https://issues.scala-lang.org/browse/SI-5298.

⁶Similar liftings are introduced for traits similar to TraversableLike, like SeqLike, SetLike, MapLike and so on.


For example, the interpretation of string concatenation is simply string concatenation, as shown in the following fragment of the interpreter. Note that type-safety of the interpreter is checked automatically by the Scala compiler when it compiles the fragments:

case class StringConcat(str1: Exp[String], str2: Exp[String]) extends Exp[String] {
  def interpret() = str1.interpret() + str2.interpret()
}
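Other generic nodes follow the same pattern; for instance, a node like the Eq node used in Chapter 4 can be interpreted as follows (our rendering of the obvious definition):

case class Eq[T](e1: Exp[T], e2: Exp[T]) extends Exp[Boolean] {
  def interpret() = e1.interpret() == e2.interpret()
}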

The subset of Scala we reify roughly corresponds to a typed lambda calculus with subtyping and type constructors. It does not include constructs for looping or recursion, so it should be strongly normalizing, as long as application programmers do not add expression nodes with non-terminating interpretations. However, query writers can use the host language to create a reified expression of infinite size. This should not be an issue if SQuOpt is used as a drop-in replacement for the Scala Collection EDSL.

During optimization, nodes of the expression tree might get duplicated, and the interpreter could in principle observe this sharing and treat the expression tree as a DAG, to avoid recomputing results. Currently we do not exploit this, unlike during compilation.

5.7 Optimization

Our optimizer is structured as a pipeline of different transformations on a single intermediate representation, constituted by our expression trees. Each phase of the pipeline, and the pipeline as a whole, produce a new expression having the same type as the original one. Most of our transformations express simple rewrite rules, with or without side conditions, which are applied on expression trees from the bottom up, and are implemented using Scala's support for pattern matching [Emir et al. 2007b].

Some optimizations, like filter hoisting (which we applied manually to produce the code in Fig. 2.3), are essentially domain-specific and can improve the complexity of a query. For such optimizations to trigger, however, one often needs to perform inlining-like transformations and to simplify their result. Inlining-related transformations can, for instance, produce code like (x, y)._1, which we simplify to x; this reduces abstraction overhead, but also (more importantly) makes it syntactically clear that the result does not depend on y, hence might be computed before y is even bound. This simplification extends to user-defined product types: with the definitions in Fig. 2.1, code like BookData(book.title, ...).title is simplified to book.title.

We have thus implemented optimizations of three classes:

• general-purpose simplifications, like inlining, compile-time beta-reduction, constant folding, reassociation on primitive types, and other simplifications;⁷

• domain-specific simplifications, whose main goal is still to enable more important optimizations;

• domain-specific optimizations which can change the complexity class of a query, such as filter hoisting, hash-join transformation or indexing.

Among domain-specific simplifications, we implement a few described in the context of the monoid comprehension calculus [Grust and Scholl 1996b; Grust 1999], such as query unnesting and fusion of bulk operators. Query unnesting allows unnesting a for-comprehension nested inside another, producing a single for-comprehension. Furthermore, we can fuse different collection operations together: collection operators like map, flatMap and withFilter can be expressed as folds producing new collections, which can be combined.

⁷Beta-reduction and simplification are run in a fixpoint loop [Peyton Jones and Marlow 2002]. Termination is guaranteed because our language does not admit general recursion.


Scala for-comprehensions are, however, more general than monoid comprehensions⁸; hence, to ensure the safety of some optimizations, we need some additional side conditions.⁹

Manipulating functions. To be able to inspect a HOAS function body funBody: Exp[S] ⇒ Exp[T], like str ⇒ str + "!", we convert it to first-order abstract syntax (FOAS), that is, to an expression tree of type Exp[T]. To this end, we introduce a representation of variables and a generator of fresh ones; since variable names are auto-generated, they are internally represented simply as integers instead of strings, for efficiency.

To convert funBody from HOAS to FOAS, we apply it to a fresh variable v of type TypedVar[S], obtaining a first-order representation of the function body, having type Exp[T] and containing occurrences of v.

This transformation is hidden in the constructor Fun, which converts Exp[S] ⇒ Exp[T], a representation of an expression with one free variable, to Exp[S ⇒ T], a representation of a function.

case class App[S, T](f: Exp[S ⇒ T], t: Exp[S]) extends Exp[T]

def Fun[S, T](funBody: Exp[S] ⇒ Exp[T]): Exp[S ⇒ T] = {
  val v = Fun.gensym[S]()
  FOASFun(funBody(v), v)
}

case class FOASFun[S, T](foasBody: Exp[T], v: TypedVar[S]) extends Exp[S ⇒ T]

implicit def app[S, T](f: Exp[S ⇒ T]): Exp[S] ⇒ Exp[T] =
  arg ⇒ App(f, arg)

Conversely, function applications are represented using the constructor App; an implicit conversion allows App to be inserted implicitly. Whenever f can be applied to arg and f is an expression tree, the compiler will convert f(arg) to app(f)(arg), that is, App(f, arg).

In our example, Fun(str ⇒ str + "!") produces
FOASFun(StringConcat(TypedVar[String](1), Const("!")), TypedVar[String](1)).

Since we auto-generate variable names, it is easiest to represent variable occurrences using the Barendregt convention, where bound variables must always be globally unique; we must be careful to perform renaming after beta-reduction to restore this invariant [Pierce, 2002, Ch. 6].

We can now easily implement substitution and beta-reduction and, through that, as shown before, enable other optimizations to trigger more easily and speed up queries.
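As an illustration, a single beta-reduction step on this representation could be sketched as follows; subst is a hypothetical helper replacing occurrences of a variable, and, as discussed in Sec. 7.2.3, a cast works around Scala's limited GADT support:

// Sketch: one step of beta-reduction on the FOAS representation.
// subst(body, v, arg) is a hypothetical helper replacing occurrences of
// variable v inside body with arg; afterwards, bound variables must be
// renamed to restore the Barendregt convention.
def betaReduce[T](e: Exp[T]): Exp[T] = e match {
  case App(FOASFun(foasBody, v), arg) ⇒ subst(foasBody, v, arg)
  case _                              ⇒ e
}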

8 For instance, a for-comprehension producing a list cannot iterate over a set.
9 For instance, consider a for-comprehension producing a set and nested inside another producing a list. This comprehension does not correspond to a valid monoid comprehension (see previous footnote), and query unnesting does not apply here: if we unnested the inner comprehension into the outer one, we would not perform duplicate elimination on the inner comprehension, affecting the overall result.


Chapter 6

Evaluating SQuOpt

The key goals of SQuOpt are to reconcile modularity and efficiency. To evaluate this claim, we perform a rigorous performance evaluation of queries with and without SQuOpt. We also analyze the modularization potential of these queries and evaluate how modularization affects performance (with and without SQuOpt).

We show that modularization introduces a significant slowdown. The overhead of using SQuOpt is usually moderate, and optimizations can compensate this overhead, remove the modularization slowdown and improve performance of some queries by orders of magnitude, especially when indexes are used.

6.1 Study Setup

Throughout this chapter, we have already shown several compact queries for which our optimizations increase performance significantly compared to a naive execution. Since some optimizations change the complexity class of the query (e.g. by using an index), the speedups grow with the size of the data. However, to get a more realistic evaluation of SQuOpt, we decided to perform an experiment with existing real-world queries.

As we are interested in both performance and modularization, we have a specification and three different implementations of each query that we need to compare:

(0) Query specification: We selected a set of existing real-world queries, specified and implemented independently from our work and prior to it. We used only the specification of these queries.

(1) Modularized Scala implementation: We reimplemented each query as an expression on Scala collections, our baseline implementation. For modularity, we separated reusable domain abstractions into subqueries. We confirmed the abstractions with a domain expert and will later illustrate them, to emphasize their general nature.

(2) Hand-optimized Scala implementation: Next, we asked a domain expert to perform manual optimizations on the modularized queries. The expert should perform optimizations such as inlining and filter hoisting, where he could find performance improvements.

(3) SQuOpt implementation: Finally, we rewrote the modularized Scala queries from (1) as SQuOpt queries. The rewrites are of purely syntactic nature, to use our library (as described in Sec. 3.1), and preserve the modularity of the queries.



Since SQuOpt supports executing queries with and without optimizations and indexes, we actually measured three different execution modes of the SQuOpt implementation:

(3−) SQuOpt without optimizer: First, we execute the SQuOpt queries without performing optimization first, which should show the SQuOpt overhead compared to the modular Scala implementation (1). However, common-subexpression elimination is still used here, since it is part of the compilation pipeline. This is appropriate to counter the effects of excessive inlining due to using a deep embedding, as explained in Sec. 4.1.

(3o) SQuOpt with optimizer: Next, we execute SQuOpt queries after optimization.

(3x) SQuOpt with optimizer and indexes: Finally, we execute the queries after providing a set of indexes that the optimizer can consider.

In all cases, we measure query execution time for the generated code, excluding compilation: we consider this appropriate because the results of compilations are cached aggressively and can be reused when the underlying data is changed, potentially even across executions (even though this is not yet implemented), as the data is not part of the compiled code.

We use additional indexes in (3x), but not in the hand-optimized Scala implementation (2). We argue that indexes are less likely to be applied manually, because index application is a crosscutting concern and makes the whole query implementation more complicated and less abstract. Still, we offer measurement (3o) to compare the speedup without additional indexes.

This gives us a total of five settings to measure and compare (1, 2, 3−, 3o and 3x). Between them, we want to observe the following interesting performance ratios (speedups or slowdowns, computed through the indicated divisions):

(M) Modularization overhead (the relative performance difference between the modularized and the hand-optimized Scala implementation: 1/2).

(S) SQuOpt overhead (the overhead of executing unoptimized SQuOpt queries: 1/3−; smaller is better).

(H) Hand-optimization challenge (the performance overhead of our optimizer against hand-optimizations of a domain expert: 2/3o; bigger is better). This overhead is partly due to the SQuOpt overhead (S), and partly to optimizations which have not been automated or have not been effective enough. This comparison excludes the effects of indexing, since this is an optimization we did not perform by hand; we also report (H') = 2/3x, which includes indexing.

(O) Optimization potential (the speedup by optimizing modularized queries: 1/3o; bigger is better).

(X) Index influence (the speedup gained by using indexes: 3o/3x; bigger is better).

(T) Total optimization potential with indexes (1/3x; bigger is better), which is equal to (O) × (X).

In Fig. 6.1, we provide an overview of the setup. We made our raw data available and our results reproducible [Vitek and Kalibera, 2011].1


[Figure 6.1: Measurement Setup Overview. A diagram relating the specification (0), the implementations (1: modularized Scala, 2: hand-optimized Scala, 3−: SQuOpt without optimizer, 3o: SQuOpt with optimizer, 3x: SQuOpt with optimizer and indexes) and the performance ratios M, S, H, O, X and T derived from their comparison.]

6.2 Experimental Units

As experimental units, we sampled a set of queries on code structures from FindBugs 2.0 [Hovemeyer and Pugh, 2004]. FindBugs is a popular bug-finding tool for Java bytecode, available as open source. To detect instances of bug patterns, it queries a structural in-memory representation of a code base (extracted from bytecode). Concretely, a single loop traverses each class and invokes all visitors (implemented as listeners) on each element of the class. Many visitors, in turn, perform activities concerning multiple bug detectors, which are fused together. An extreme example is that in FindBugs, query 4 is defined in class DumbMethods, together with 41 other bug detectors for distinct types of bugs. Typically, a bug detector is furthermore scattered across the different methods of the visitor, which handle different elements of the class. We believe this architecture has been chosen to achieve good performance; however, we do not consider such manual fusion of distinct bug detectors as modular. We selected queries from FindBugs because they represent typical non-trivial queries on in-memory collections, and because we believe our framework allows expressing them more modularly.

We sampled queries in two batches. First, we manually selected 8 queries (from the approx. 400 queries in FindBugs), chosen mainly to evaluate the potential speedups of indexing (queries that primarily looked for declarations of classes, methods or fields with specific properties, queries that inspect the type hierarchy, and queries that required analyzing method implementations). Subsequently, we randomly selected a batch of 11 additional queries. The batch excluded queries that rely on control-/dataflow analyses (i.e. analyzing the effect of bytecode instructions on the stack), due to limitations of the bytecode toolkit we use. In total, we have 19 queries, as listed in Table 6.2 (the randomly selected queries are marked with the superscript R).

We implemented each query three times (see implementations (1)–(3) in Sec. 6.1), following the specifications given in the FindBugs documentation (0). Instead of using a hierarchy of visitors as in the original implementations of the queries in FindBugs, we wrote the queries as for-comprehensions in Scala, on an in-memory representation created by the Scala toolkit BAT.2 BAT in particular provides comprehensive support for writing queries against Java bytecode in an idiomatic way. We exemplify an analysis in Fig. 6.5. It detects all co-variant equals methods in a project, by iterating over all class files (line 2) and all methods, searching for methods named "equals" that return a boolean

1 Data available at http://www.informatik.uni-marburg.de/~pgiarrusso/SOpt
2 http://github.com/Delors/BAT


Id   Description
1    Covariant compareTo() defined
2    Explicit garbage collection call
3    Protected field in final class
4    Explicit runFinalizersOnExit() call
5    clone() defined in non-Cloneable class
6    Covariant equals() defined
7    Public finalizer defined
8    Dubious catching of IllegalMonitorStateException
9R   Uninitialized field read during construction of super
10R  Mutable static field declared public
11R  Refactor anonymous inner class to static
12R  Inefficient use of toArray(Object[])
13R  Primitive boxed and unboxed for coercion
14R  Double precision conversion from 32 bit
15R  Privileged method used outside doPrivileged
16R  Mutable public static field should be final
17R  Serializable class is member of non-serializable class
18R  Swing methods used outside Swing thread
19R  Finalizer only calls superclass finalize

Table 6.2: Performance results (in ms) and performance ratios. As in Sec. 6.1, (1) denotes the modular Scala implementation, (2) the hand-optimized Scala one, and (3−), (3o), (3x) refer to the SQuOpt implementation when run, respectively, without optimizations, with optimizations, and with optimizations and indexing; for each query, the table also reports the ratios M (1/2), H (2/3o) and T (1/3x). Queries marked with the R superscript were selected by random sampling.


                                M (1/2)  S (1/3−)  H (2/3o)  H' (2/3x)  O (1/3o)  X (3o/3x)  T (1/3x)
Geometric means of
performance ratios              2.4x     1.2x      0.8x      5.1x       1.9x      6.3x       12x

Table 6.3: Average performance ratios. This table summarizes all interesting performance ratios across all queries, using the geometric mean [Fleming and Wallace, 1986]. The meaning of the speedups is discussed in Sec. 6.1.

Abstraction                                                        Used
All fields in all class files                                      4
All methods in all class files                                     3
All method bodies in all class files                               3
All instructions in all method bodies and their bytecode index    5
Sliding window (size n) over all instructions (and their index)   3

Table 6.4: Description of abstractions removed during hand-optimization, and number of queries where the abstraction is used (and optimized away).

value, and define a single parameter of the type of the current class.

Abstractions. In the reference implementations (1), we identified several reusable abstractions, as shown in Table 6.4. The reference implementations of all queries except 17R use exactly one of these abstractions, which encapsulate the main loops of the queries.

Indexes. For executing (3x) (SQuOpt with indexes), we have constructed three indexes to speed up navigation over the queried data of queries 1–8: indexes for method name, exception handlers, and instruction types. We illustrate the implementation of the method-name index in Fig. 6.6: it produces a collection of all methods, and then indexes them using indexBy; its argument extracts from an entry the key, that is, the method name. We selected which indexes to implement using guidance from SQuOpt itself: during optimizations, SQuOpt reports which indexes it could have applied to the given query. Among those, we tried to select indexes giving a reasonable compromise between construction cost and optimization speedup. We first measured the construction cost of these indexes.

for {
  classFile ← classFiles.asSquopt
  method ← classFile.methods
  if !method.isAbstract && method.name == "equals" &&
    method.descriptor.returnType == BooleanType
  parameterTypes ← Let(method.descriptor.parameterTypes)
  if parameterTypes.length == 1 && parameterTypes(0) == classFile.thisClass
} yield (classFile, method)

Figure 6.5: Find covariant equals methods.


val methodNameIdx: Exp[Map[String, Seq[(ClassFile, Method)]]] =
  (for {
    classFile ← classFiles.asSquopt
    method ← classFile.methods
  } yield (classFile, method)) indexBy (entry ⇒ entry._2.name)

Figure 6.6: A simple index definition.
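For intuition, the following hand-written sketch shows the shape of the rewrite the optimizer performs when this index applies: a scan filtering methods by name becomes a lookup in methodNameIdx. This is only a sketch; we assume here a lifted lookup operation (apply) on reified maps:

// Before: scan all methods, keeping those named "finalize".
val scan = for {
  classFile ← classFiles.asSquopt
  method ← classFile.methods
  if method.name == "finalize"
} yield (classFile, method)

// After indexing (sketch): a single lookup in the method-name index.
val viaIndex = methodNameIdx("finalize")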

Index                Elapsed time (ms)
Method name          97.99 ± 2.94
Exception handlers   179.29 ± 3.21
Instruction type     4166.49 ± 202.85

For our test data, index construction takes less than 200 ms for the first two indexes, which is moderate compared to the time for loading the bytecode into the BAT representation (4755.32 ± 141.66 ms). Building the instruction index took around 4 seconds, which we consider acceptable since this index maps each type of instruction (e.g. INSTANCEOF) to a collection of all bytecode instructions of that type.

6.3 Measurement Setup

To measure performance, we executed the queries on the preinstalled JDK class library (rt.jar), containing 58 MB of uncompressed Java bytecode. We also performed a preliminary evaluation by running queries on the much smaller ScalaTest library, getting comparable results that we hence do not discuss. Experiments were run on an 8-core Intel Core i7-2600, 3.40 GHz, with 8 GB of RAM, running Scientific Linux release 6.2. The benchmark code itself is single-threaded, so it uses only one core; however, the JVM used other cores to offload garbage collection. We used the preinstalled OpenJDK Java version 1.7.0_05-icedtea and Scala 2.10.0-M7.

We measure steady-state performance, as recommended by Georges et al. [2007]. We invoke the JVM p = 15 times; at the beginning of each JVM invocation, all the bytecode to analyze is loaded in memory and converted into BAT's representation. In each JVM invocation, we iterate each benchmark until the variation of results becomes low enough. We measure the variation of results through the coefficient of variation (CoV; standard deviation divided by the mean). Thus, we iterate each benchmark until the CoV in the last k = 10 iterations drops under the threshold θ = 0.1, or until we complete q = 50 iterations. We report the arithmetic mean of these measurements (and also report the usually low standard deviation on our web page).
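A minimal sketch of this per-invocation measurement loop follows; benchmark is a placeholder for the query under test, and the thresholds match those just described (k = 10, θ = 0.1, q = 50):

// Iterate the benchmark until the CoV of the last k timings drops
// below theta, or until q iterations have run; return all timings (ms).
def measure(benchmark: () ⇒ Unit,
            k: Int = 10, theta: Double = 0.1, q: Int = 50): Seq[Double] = {
  val timings = scala.collection.mutable.Buffer.empty[Double]
  def cov(xs: Seq[Double]): Double = {
    val mean = xs.sum / xs.size
    val sd = math.sqrt(xs.map(x ⇒ (x - mean) * (x - mean)).sum / xs.size)
    sd / mean
  }
  while (timings.size < q &&
         (timings.size < k || cov(timings.takeRight(k).toSeq) >= theta)) {
    val start = System.nanoTime()
    benchmark()
    timings += (System.nanoTime() - start) / 1e6 // nanoseconds to ms
  }
  timings.toSeq
}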

6.4 Results

Correctness. We machine-checked that, for each query, all variants in Table 6.2 agree.

Modularization Overhead. We first observe that performance suffers significantly when using the abstractions we described in Table 6.4. These abstractions, while natural in the domain and in the setting of a declarative language, are not idiomatic in Java or Scala, because without optimization they will obviously lead to bad performance. They are still useful abstractions from the point of view of modularity, though (as indicated by Table 6.4), and as such it would be desirable if one could use them without paying the performance penalty.


Scala Implementations vs. FindBugs. Before actually comparing between the different Scala and SQuOpt implementations, we first ensured that the implementations are comparable to the original FindBugs implementation. A direct comparison between the FindBugs reference implementation and any of our implementations is not possible in a rigorous and fair manner: FindBugs bug detectors are not fully modularized, therefore we cannot reasonably isolate the implementation of the selected queries from support code. Furthermore, the architecture of the implementation has many differences that affect performance: among others, FindBugs also uses multithreading. Moreover, while in our case each query loops over all classes, in FindBugs, as discussed above, a single loop considers each class and invokes all visitors (implemented as listeners) on it.

We measured startup performance [Georges et al., 2007], that is, the performance of running the queries only once, to minimize the effect of compiler optimizations. We set up our SQuOpt-based analyses to only perform optimization and run the optimized query. To set up FindBugs, we manually disabled all unrelated bug detectors; we also made the modified FindBugs source code available. The result is that the performance of the Scala implementations of the queries (3−) is of the same order of magnitude as the original FindBugs queries; in our tests, the SQuOpt implementation was about twice as fast. However, since the comparison cannot be made fair, we refrained from a more detailed investigation.

SQuOpt Overhead and Optimization Potential. We present the results of our benchmarks in Table 6.2. Column names refer to a few of the definitions described above; for readability, we do not present all the ratios previously introduced for each query, but report the raw data. In Table 6.3, we report the geometric mean [Fleming and Wallace, 1986] of each ratio, computed with the same weight for each query.

We see that, in its current implementation, SQuOpt can cause an overhead S (1/3−) of up to 3.4x. On average, SQuOpt queries are 1.2x faster. These differences are due to minor implementation details of certain collection operators. For query 18R, instead, we have that the basic SQuOpt implementation is 1.29x faster, and we are investigating the reason; we suspect this might be related to the use of pattern matching in the original query.

As expected, not all queries benefit from optimizations: out of 19 queries, optimization affords significant speedups for 15 of them, ranging from a 1.2x factor to a 12800x factor; 10 queries are faster by a factor of at least 5. Only queries 10R, 11R and 12R fail to recover any modularization overhead.

We have analyzed the behavior of a few queries after optimization, to understand why their performance has (or has not) improved.

Optimization makes query 17R slower; we believe this is because optimization replaces filtering by lazy filtering, which is usually faster, but not here. Among queries where indexing succeeds, query 2 has the least speedup. After optimization, this query uses the instruction-type index to find all occurrences of invocation opcodes (INVOKESTATIC and INVOKEVIRTUAL); after this step, the query looks, among those invocations, for ones targeting runFinalizersOnExit. Since invocation opcodes are quite frequent, the used index is not very specific, hence it allows for little speedup (9.5x). However, no other index applies to this query; moreover, our framework does not maintain any selectivity statistics on indexes to predict these effects. Query 19R benefits from indexing without any specific tuning on our part, because it looks for implementations of finalize with some characteristic, hence the highly selective method-name index applies. After optimization, query 8 becomes simply an index lookup on the index for exception handlers, looking for handlers of IllegalMonitorStateException; it is thus not surprising that its speedup is extremely high (12800x). This speedup relies on an index which is specific for this kind of query, and building this index is slower than executing the unoptimized query. On the other hand, building this index is entirely appropriate in a situation where similar queries are common enough. Similar considerations apply to usage of indexing in general, similarly to what happens in databases.


Optimization Overhead. The current implementation of the optimizer is not yet optimized for speed (of the optimization algorithm). For instance, expression trees are traversed and rebuilt completely once for each transformation. However, the optimization overhead is usually not excessive: it is 548 ± 855 ms, varying between 35 ms and 3817 ms (mostly depending on the query size).

Limitations. Although many speedups are encouraging, our optimizer is currently a proof-of-concept, and we experienced some limitations:

• In a few cases, hand-optimized queries are still faster than what the optimizer can produce. We believe these problems could be addressed by adding further optimizations.

• Our implementation of indexing is currently limited to immutable collections. For mutable collections, indexes must be maintained incrementally. Since indexes are defined as special queries in SQuOpt, incremental index maintenance becomes an instance of incremental maintenance of query results, that is, of incremental view maintenance. We plan to support incremental view maintenance as part of future work; however, indexing in the current form is already useful, as illustrated by our experimental results.

Threats to Validity. With rigorous performance measurements and the chosen setup, our study was set up to maximize internal and construct validity. Although we did not involve an external domain expert, and we did not compare the results of our queries with the ones from FindBugs (except while developing the queries), we believe that the queries adequately represent the modularity and performance characteristics of FindBugs and SQuOpt. However, since we selected only queries from a single project, external validity is limited. While we cannot generalize our results beyond FindBugs yet, we believe that the FindBugs queries are representative of complex in-memory queries performed by applications.

Summary. We demonstrated on our real-world queries that relying on declarative abstractions in collection queries often causes a significant slowdown. As we have seen, using SQuOpt without optimization, or when no optimizations are possible, usually provides performance comparable to using standard Scala; however, SQuOpt optimizations can in most cases remove the slowdown due to declarative abstractions. Furthermore, relying on indexing allows achieving even greater speedups while still using a declarative programming style. Some implementation limitations restrict the effectiveness of our optimizer, but since this is a preliminary implementation, we believe our evaluation shows the great potential of optimizing queries to in-memory collections.

Chapter 7

Discussion

In this chapter, we discuss the degree to which SQuOpt fulfilled our original design goals, and the conclusions for host and domain-specific language design.

7.1 Optimization limitations

In our experiments, indexing achieved significant speedups, but when indexing does not apply, speedups are more limited; in comparison, later projects working on collection query optimization, such as OptiQL [Rompf et al., 2013; Sujeeth et al., 2013b], gave better speedups, as also discussed in Sec. 8.3.1.

A design goal of this project was to incrementalize optimized queries, and while it is easy to incrementalize collection operators such as map, flatMap or filter, it was much less clear to us how to optimize the result of inlining. We considered using shortcut fusion, but we did not know a good way to incrementalize the resulting programs; later work, as described in Part II, clarified what is possible and what is not.

Another problem is that most fusion techniques are designed for sequential processing, hence conflict with incrementalization. Most fusion techniques assume that bulk operations scan collections in linear order. For instance, shortcut fusion rewrites operators in terms of foldr. During parallel and/or incremental computation, instead, it is better to use tree-shaped folds, that is, to split the task in a divide-and-conquer fashion, so that the various subproblems form a balanced tree. This division minimizes the height of the tree, hence the number of steps needed to combine results from subproblems into the final result, as also discussed by Steele [2009]. It is not so clear how to apply shortcut fusion to parallel and incremental programs. Maier and Odersky [2013] describe an incremental tree-shaped fold, where each subproblem that is small enough is solved by scanning its input in linear order, but do not perform code transformation and do not study how to perform fusion.

7.2 Deep embedding

7.2.1 What worked well

Several features of Scala contributed greatly to the success we achieved. With implicit conversions, the lifting can be made mostly transparent. The advanced type system features were quite helpful to make the expression tree representation typed. The fact that for-comprehensions are desugared



before type inference and type checking was also a prerequisite for automatic lifting. The syntactic expressiveness and uniformity of Scala, in particular the fact that custom types can have the same look-and-feel as primitive types, were also vital to lift expressions on primitive types.

7.2.2 Limitations

Despite these positive experiences and our experimental success, our embedding has a few significant limitations.

The first limitation is that we only lift a subset of Scala, and some interesting features are missing. We do not support statements in nested blocks in our queries, but this could be implemented reusing techniques from Delite [Rompf et al., 2011]. More importantly for queries, pattern matching cannot be supported by a deep embedding similar to ours. In contrast to for-comprehension syntax, pattern matching is desugared only after type checking [Emir et al., 2007b], which prevents us from lifting pattern matching notation. More specifically, because an extractor [Emir et al., 2007b] cannot return the representation of a result value (say Exp[Boolean]) to later evaluate, it must produce its final result at pattern matching time. There is initial work on "virtualized pattern matching",1 and we hope to use this feature in future work.

We also experienced problems with operators that cannot be overloaded, such as == or if-else, and with lifting methods in scala.Any, which forced us to provide alternative syntax for these features in queries. The Scala-virtualized project [Moors et al., 2012] aims to address these limitations; unfortunately, it advertises no change on the other problems we found, which we subsequently detail.

It would also be desirable if we could enforce the absence of side effects in queries, but since Scala, like most practical programming languages except Haskell, does not track side effects in the type system, this does not seem to be possible.

Finally, compared to lightweight modular staging [Rompf and Odersky, 2010] (the foundation of Delite) and to polymorphic embedding [Hofer et al., 2008], we have less static checking for some programming errors when writing queries. The recommended way to use Delite is to write an EDSL program in one trait, in terms of the EDSL interface only, and combine it with the implementation in another trait. In polymorphic embedding, the EDSL program is a function of the specific implementation (in this case, semantics). Either approach ensures that the DSL program is parametric in the implementation, and hence cannot refer to details of a specific implementation. However, we judged the syntactic overhead for the programmer to be too high to use those techniques; in our implementation, we rely on encapsulation and on dynamic checks at query construction time to achieve similar guarantees.

The choice of interpreting expressions turned out to be a significant performance limitation. It could likely be addressed by using Delite and lightweight modular staging instead, but we wanted to experiment with how far we can get within the language, in a well-typed setting.

7.2.3 What did not work so well: Scala type inference

When implementing our library, we often struggled against limitations and bugs in the Scala compiler, especially regarding type inference and its interaction with implicit resolution, and we were often constrained by its limitations. Not only is Scala's type inference not complete, but we learned that its behavior is only specified by the behavior of the current implementation: in many cases where there is a clear desirable solution, type inference fails or finds an incorrect substitution, which leads to a type error. Hence, we cannot distinguish, in this discussion, the Scala language from its implementation. We regard many of Scala's type inference problems as bugs, and reported them

1 http://stackoverflow.com/questions/8533826/what-is-scalas-experimental-virtual-pattern-matcher


as such when no previous bug report existed, as noted in the rest of this section. Some of them are long-standing issues, others were accepted, for other ones we received no feedback yet at the time of this writing, and another one was already closed as WONTFIX, indicating that a fix would be possible but would have excessive complexity for the time being.2

Overloading. The code in Fig. 3.1 uses the lifted BookData constructor. Two definitions of BookData are available, with signatures BookData(String, String, Int) and BookData(Exp[String], Exp[String], Exp[Int]), and it seems like the Scala compiler should be able to choose which one to apply using overload resolution. This, however, is not supported, simply because the two functions are defined in different scopes,3 hence importing BookData(Exp[String], Exp[String], Exp[Int]) shadows locally the original definition.

Type inference vs. implicit resolution. The interaction between type inference and implicit resolution is a hard problem, and Scalac also has many bugs; the current situation requires further research: for instance, there is not even a specification for the behavior of type inference.4

As a consequence, to the best of our knowledge, some properties of type inference have not been formally established. For instance, a reasonable user expectation is that removing a call to an implicit conversion does not alter the program, if it is the only implicit conversion with the correct type in scope, or if it is more specific than the others [Odersky et al., 2011, Ch. 21]. This is not always correct, because removing the implicit conversion reduces the information available for the type inference algorithm; we observed multiple cases5 where type inference becomes unable to figure out enough information about the type to trigger the implicit conversion.

We also consider it significant that Scala 2.8 required making both type inference and implicit resolution more powerful, specifically in order to support the collection library [Moors, 2010; Odersky et al., 2011, Sec. 21.7]; further extensions would be possible and desirable. For instance, if type inference were extended with higher-order unification6 [Pfenning, 1988], it would better support a part of our DSL interface (not discussed in this chapter), by removing the need for type annotations.

Nested pattern matches for GADTs in Scala. Writing a typed decomposition for Exp requires pattern-matching support for generalized algebraic datatypes (GADTs). We found that support for GADTs in Scala is currently insufficient. Emir et al. [2007b] define the concept of typecasing, essentially a form of pattern-matching limited to non-nested patterns, and demonstrate that Scala supports typecasing on GADTs by exhibiting a typed evaluator; however, typecasing is rather verbose for deep patterns, since one has to nest multiple pattern-matching expressions. When using normal pattern matches, instead, the support for GADTs seems much weaker.7 Hence, one has to choose between support for GADTs and the convenience of nested pattern matching. A third alternative is to ignore or disable compiler warnings, but we did not consider this option.

Implicit conversions do not chain. While implicit conversions by default do not chain, it is sometimes convenient to allow chaining selectively. For instance, let us assume a context such that a: Exp[A], b: Exp[B] and c: Exp[C]. In this context, let us consider again how we lift tuples. We have seen that the expression (a, b) has type (Exp[A], Exp[B]), but can be converted to Exp[(A, B)] through an implicit conversion. Let us now consider nested tuples, like ((a, b), c): it has type ((Exp[A], Exp[B]), Exp[C]), hence the previous conversion cannot be applied to this expression.

Odersky et al. [2011, Ch. 21] describe a pattern which can address this goal. Using this pattern, to lift pairs, we must write an implicit conversion from pairs of elements which can be implicitly

2 https://issues.scala-lang.org/browse/SI-2551
3 https://issues.scala-lang.org/browse/SI-2551
4 https://issues.scala-lang.org/browse/SI-5298?focusedCommentId=55971#comment-55971, reported by us
5 https://issues.scala-lang.org/browse/SI-5592, reported by us
6 https://issues.scala-lang.org/browse/SI-2712
7 Due to bug https://issues.scala-lang.org/browse/SI-5298?focusedCommentId=56840#comment-56840, reported by us


converted to expressions. Instead of requiring (Exp[A], Exp[B]), the implicit conversion should require (A, B), with the condition that A can be converted to Exp[A'] and B to Exp[B']. This conversion solves the problem if applied explicitly, but the compiler refuses to apply it implicitly, again because of type inference issues.8
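Concretely, the pattern instantiated for pairs looks roughly as follows; this is a sketch, where PairExp is a hypothetical node reifying pair construction, and the implicit function parameters express the condition that each component be convertible to an expression:

// Lift any pair whose components are convertible to expressions.
implicit def liftPair[A, B, A2, B2](p: (A, B))(
    implicit liftA: A ⇒ Exp[A2], liftB: B ⇒ Exp[B2]): Exp[(A2, B2)] =
  PairExp(liftA(p._1), liftB(p._2))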

Because of these type inference limitations, we failed to provide support for reifying code like ((a, b), c).9

Error messages for implicit conversions. The enrich-my-library pattern has the declared goal of allowing to extend existing libraries transparently. However, implementation details shine through when a user program using this feature contains a type error. When invoking a method would require an implicit conversion which is not applicable, the compiler often just reports that the method is not available. The recommended approach to debugging such errors is to manually add the missing implicit conversion and investigate the type error [Odersky et al., 2011, Ch. 21.8], but this destroys the transparency of the approach when creating or modifying code. We believe this could be solved in principle by research on error reporting: the compiler could automatically insert all implicit conversions enabling the method calls and report the corresponding errors, even if at some performance cost.

7.2.4 Lessons for language embedders

Various domains, such as the one considered in our case study, allow powerful domain-specific optimizations. Such optimizations often are hard to express in a compositional way, hence they cannot be performed while building the query, but must be expressed as global optimization passes. For those domains, deep embedding is key to allow significant optimizations. On the other hand, deep embedding requires implementing an interpreter or a compiler.

On the one hand, interpretation overhead is significant in Scala, even when using HOAS to take advantage of the metalevel implementation of argument access.

Instead of interpreting a program, one can compile an EDSL program to Scala and load it, as done by Rompf et al. [2011]; while we are using this approach, the disadvantage is the compilation delay, especially for Scala, whose compilation process is complex and time-consuming. Possible alternatives include generating bytecode directly, or combining interpretation and compilation, similarly to tiered JIT compilers, where only code which is executed often is compiled. We plan to investigate such alternatives in future work.

8 https://issues.scala-lang.org/browse/SI-5651, reported by us
9 One could of course write a specific implicit conversion for this case; however, (a, (b, c)) already requires a different implicit conversion, and there are infinite ways to nest pairs, let alone tuples of bounded arity.

Chapter 8

Related work

This chapter builds on prior work on language-integrated queries, query optimization, techniques for DSL embedding, and other works on code querying.

8.1 Language-Integrated Queries

Microsoft's Language-Integrated Query technology (Linq) [Meijer et al., 2006; Bierman et al., 2007] is similar to our work in that it also reifies queries on collections to enable analysis and optimization. Such queries can be executed against a variety of backends (such as SQL databases or in-memory objects), and adding new backends is supported. Its implementation uses expression trees, a compiler-supported implicit conversion between expressions and their reification as a syntax tree. There are various major differences, though. First, the support for expression trees is hard-coded into the compiler. This means that the techniques are not applicable in languages that do not explicitly support expression trees. More importantly, the way expression trees are created in Linq is generic and fixed. For instance, it is not possible to create different tree nodes for method calls that are relevant to an analysis (such as the map method) than for method calls that are irrelevant for the analysis (such as the toString method). For this reason, expression trees in Linq cannot be customized to the task at hand and contain too much low-level information. It is well-known that this makes it quite hard to implement programs operating on expression trees [Eini, 2011].

Linq queries can also not easily be decomposed and modularized. For instance, consider the task of refactoring the filter in the query from x in y where x.z == 1 select x into a function. Defining this function as bool comp(int v) { return v == 1; } would destroy the possibility of analyzing the filter for optimization, since the resulting expression tree would only contain a reference to an opaque function. The function could be declared as returning an expression tree instead, but then this function could not be used in the original query anymore, since the compiler expects an expression of type bool and not an expression tree of type bool. It could only be integrated if the expression tree of the original query is created by hand, without using the built-in support for expression trees.

Although queries against in-memory collections could theoretically also be optimized in Linq, the standard implementation, Linq2Objects, performs no optimizations.

A few optimized embedded DSLs allow executing queries or computations on distributed clusters. DryadLINQ [Yu et al., 2008], based on Linq, optimizes queries for distributed execution. It inherits Linq's limitations and thus does not support decomposing queries into different modules. Modularizing queries is supported instead by FlumeJava [Chambers et al., 2010], another library (in Java) for distributed query execution. However, FlumeJava cannot express many optimizations, because its



representation of expressions is more limited; also, its query language is more cumbersome. Both problems are rooted in Java's limited support for embedded DSLs. Other embedded DSLs support parallel platforms such as GPUs or many-core CPUs, such as Delite [Rompf et al., 2013].

Willis et al. [2006, 2008] add first-class queries to Java through a source-to-source translator, and implement a few selected optimizations, including join order optimization and incremental maintenance of query results. They investigate how well their techniques apply to Java programs, and they suggest that programmers use manual optimizations to avoid expensive constructs like nested loops. While the goal of these works is similar to ours, their implementation as an external source-to-source translator makes the adoption, extensibility and composability of their technique difficult.

There have been many approaches for a closer integration of SQL queries into programs, such as HaskellDB [Leijen and Meijer, 1999] (which also inspired Linq) or Ferry [Grust et al., 2009] (which moves part of a program execution to a database). In Scala, there are also APIs which integrate SQL queries more closely, such as Slick.1 Its frontend allows defining and combining type-safe queries, similarly to ours (also in the way it is implemented). However, the language for defining queries maps to SQL, so it does not support nesting collections in other collections (a feature which simplified our example in Sec. 2.2), nor does it distinguish statically between different kinds of collections, such as Set or Seq. Based on Ferry, ScalaQL [Garcia et al., 2010] extends Scala with a compiler plugin to integrate a query language on top of a relational database. The work by Spiewak and Zhao [2009] is unrelated to [Garcia et al., 2010], but also called ScalaQL. It is similar to our approach in that it also proposes to reify queries based on for-comprehensions, but it is not clear from their paper how the reification works.2

8.2 Query Optimization

Query optimization on relational data is a long-standing issue in the database community, but there are also many works on query optimization on objects [Fegaras and Maier, 2000; Grust, 1999]. Compared to these works, we have only implemented a few simple query optimizations, so there is potential for further improvement of our work by incorporating more advanced optimizations.

Henglein [2010] and Henglein and Larsen [2010] embed relational algebra in Haskell. Queries are not reified in their approach, but due to a particularly sophisticated representation of multisets, it is possible to execute some queries containing cross-products using faster equi-joins. While their approach appears to not rely on non-confluent rewriting rules, and hence appears more robust, it is not yet clear how to incorporate other optimizations.

8.3 Deep Embeddings in Scala

Technically, our implementation of SQuOpt is a deep embedding of a part of the Scala collections API [Odersky and Moors, 2009]. Deep embeddings were pioneered by Leijen and Meijer [1999] and Elliott et al. [2003] in Haskell, for other applications.

We regard the Scala collections API [Odersky and Moors, 2009] as a shallowly embedded query DSL. Query operators are eager, that is, they immediately perform collection operations when called, so that it is not possible to optimize queries before execution.

Scala collections also provide views, which are closer to SQuOpt. Unlike standard query operators, they create lazy collections. Like SQuOpt, views reify query operators as data structures and interpret them later. Unlike SQuOpt, views are not used for automatic query optimization,

1 http://slick.typesafe.com
2 We contacted the authors; they were not willing to provide more details or the sources of their approach.


but for explicitly changing the evaluation order of collection processing. Moreover, they cannot be reused by SQuOpt, as they reify too little: views embed deeply only the outermost pipeline of collection operators, while they embed shallowly nested collection operators and other Scala code in queries, such as the arguments of filter, map and flatMap. Deep embedding of the whole query is necessary for many optimizations, as discussed in Chapter 3.

8.3.1 LMS

Our deep embedding is inspired by some of the Scala techniques presented by Lightweight Modular Staging (LMS) [Rompf and Odersky, 2010], for using implicits and for adding infix operators to a type. Like Rompf and Odersky [2010], we also generate and compile Scala code on-the-fly, reusing the Scala compiler. A plausible alternative backend for SQuOpt would have been to use LMS and Delite [Rompf et al., 2011], a framework for building highly efficient DSLs in Scala.

An alternative to SQuOpt based on LMS and Delite, named OptiQL, was indeed built and presented in concurrent [Rompf et al., 2013] and subsequent work [Sujeeth et al., 2013b]. Like SQuOpt, OptiQL enables writing and optimizing collection queries in Scala. On the minus side, OptiQL supports fewer collection types (ArrayList and HashMap).

On the plus side, OptiQL supports queries containing effects, and can reuse the support for automatic parallelization and multiple platforms present in Delite. While LMS allows embedding effectful queries, it is unclear how many of the implemented optimizations remain sound on such queries, and how many of them have been extended to such queries.

OptiQL achieves impressive speedups by fusing collection operations and transforming (or lowering) high-level bulk operators to highly optimized imperative programs using while loops. This gains significant advantages, because it can avoid boxing and intermediate allocations, and because inlining heuristics in the HotSpot JIT compiler have never been tuned for bulk operators in functional programs.3 We did not investigate such lowerings. It is unclear how well these optimizations would extend to other collections, which intrinsically carry further overhead. Moreover, it is unclear how to execute incrementally the result of these lowerings, as we discuss in Sec. 7.1.

Those works did not support embedding arbitrary libraries in an automated or semi-automated way; this was only addressed later in Forge [Sujeeth et al., 2013a] and Yin-Yang [Jovanovic et al., 2014].

Ackermann et al. [2012] present Jet, which also optimizes collection queries, but targets MapReduce-style computations in a distributed environment. Like OptiQL, Jet does not apply typical database optimizations, such as indexing or filter hoisting.

8.4 Code Querying

In our evaluation, we explore the usage of SQuOpt to express queries on code, and re-implement a subset of the FindBugs [Hovemeyer and Pugh, 2004] analyses. There are various other specialized code query languages, such as CodeQuest [Hajiyev et al., 2006] or D-CUBED [Wegrzynowicz and Stencel, 2009]. Since these are special-purpose query languages that are not embedded into a host language, they are not directly comparable to our approach.

3 See discussion by Cliff Click at http://www.azulsystems.com/blog/cliff/2011-04-04-fixing-the-inlining-problem


Chapter 9

Conclusions

We have illustrated the tradeoff between performance and modularity for queries on in-memory collections. We have shown that it is possible to design a deep embedding of a version of the collections API which reifies queries and can optimize them at runtime. Writing queries using this framework is, except for minor syntactic details, the same as writing queries using the collection library, hence the adoption barrier to using our optimizer is low.

Our evaluation shows that using abstractions in queries introduces a significant performance overhead with native Scala code, while SQuOpt in most cases makes the overhead much more tolerable or removes it completely. Optimizations are not sufficient on some queries, but since our optimizer is a proof-of-concept with many opportunities for improvement, we believe a more elaborate version will achieve even better performance and reduce these limitations.

9.1 Future work

To make our DSL more convenient to use, it would be useful to use the virtualized pattern matcher of Scala 2.10, when it will be more robust, to add support for pattern matching in our virtualized queries.

Finally, while our optimizations are type-safe, as they rewrite an expression tree to another of the same type, currently the Scala type-checker cannot verify this statically, because of its limited support for GADTs. Solving this problem conveniently would allow checking statically that transformations are safe, and would make developing them easier.



Part II

Incremental λ-Calculus


Chapter 10

Introduction to differentiation

Incremental computation (or incrementalization) has a long-standing history in computer science [Ramalingam and Reps, 1993]. Often, a program needs to quickly update the output of some nontrivial function f when the input to the computation changes. In this scenario, we assume we have computed y1 = f x1, and we need to compute y2, which equals f x2. Programmers typically have to choose between a few undesirable options:

• Programmers can call function f again on the updated input x2 and repeat the computation from scratch. This choice guarantees correct results and is easy to implement, but typically wastes computation time. Often, if the updated input is close to the original input, the same result can be computed much faster.

• Programmers can write by hand a new function df that updates the output based on input changes, using various techniques. Running a hand-written function df can be much more efficient than rerunning f, but writing df requires significant developer effort, is error-prone, and requires updating df by hand to keep it consistent with f whenever f is modified. In practice, this complicates code maintenance significantly [Salvaneschi and Mezini, 2013].

• Programmers can write f using domain-specific languages that support incrementalization, for tasks where such languages are available. For instance, build scripts (our f) are written in domain-specific languages that support (coarse-grained) incremental builds. Database query languages also often have support for incrementalization.

• Programmers can attempt using general-purpose techniques for incrementalizing programs, such as self-adjusting computation and variants such as Adapton. Self-adjusting computation applies to arbitrary purely functional programs, and has been extended to imperative programs; however, it only guarantees efficient incrementalization when applied to base programs that are designed for efficient incrementalization. Nevertheless, self-adjusting computation enabled incrementalizing programs that had never been incrementalized by hand before.

No approach guarantees automatic efficient incrementalization for arbitrary programs. We propose instead to design domain-specific languages (DSLs) that can be efficiently incrementalized, which we call incremental DSLs (IDSLs).

To incrementalize IDSL programs, we use a transformation that we call (finite) differentiation. Differentiation produces programs in the same language, called derivatives, that can be optimized further and compiled to efficient code. Derivatives represent changes to values through further values, which we call simply changes.



For primitives, IDSL designers must specify the result of differentiation: IDSL designers are to choose primitives that encapsulate efficiently incrementalizable computation schemes, while IDSL users are to express their computation using the primitives provided by the IDSL.

Helping IDSL designers to incrementalize primitives automatically is a desirable goal, though one that we leave open. In our setting, incrementalizing primitives becomes a problem of program synthesis, and we agree with Shah and Bodik [2017] that it should be treated as such. Among others, Liu [2000] develops a systematic approach to this synthesis problem for first-order programs, based on equational reasoning, but it is unclear how scalable this approach is. We provide foundations for using equational reasoning, and sketch an IDSL for handling different sorts of collections. We also discuss avenues at providing language plugins for more fundamental primitives, such as algebraic datatypes with structural recursion.

In the IDSLs we consider, similarly to database languages, we use primitives for high-level operations, of complexity similar to SQL operators. On the one hand, IDSL designers wish to design few general primitives, to limit the effort needed for manual incrementalization. On the other hand, overly general primitives can be harder to incrementalize efficiently. Nevertheless, we also provide some support for more general building blocks, such as product or sum types, and even (in some situations) recursive types. Other approaches provide more support for incrementalizing primitives, but even then, ensuring efficient incrementalization is not necessarily easy. Dynamic approaches to incrementalization are most powerful: they can find work that can be reused at runtime using memoization, as long as the computation is structured so that memoization matches will occur. Moreover, for some programs, it seems more efficient to detect that some output can be reused thanks to a description of the input changes, rather than through runtime detection.

We propose that IDSLs be higher-order, so that primitives can be parameterized over functions and hence be highly flexible, and purely functional, to enable more powerful optimizations both before and after differentiation. Hence, an incremental DSL is a higher-order, purely functional language, composed of a λ-calculus core extended with base types and primitives. Various database query languages support forms of finite differentiation (see Sec. 19.2.1), but only over first-order languages, which provide only restricted forms of operators, such as map, filter or aggregation.

To support higher-order IDSLs, we define the first form of differentiation that supports higher-order functional languages; to this end, we introduce the concept of function changes, which contain changes to either the code of a function value or the values it closes over. While higher-order programs can be transformed to first-order ones, incrementalizing the resulting programs is still beyond reach for previous approaches to differentiation (see Sec. 19.2.1 for earlier work and Sec. 19.2.2 for later approaches). In Chapter 17 and Appendix D, we transform higher-order programs to first-order ones by closure conversion or defunctionalization, but we incrementalize defunctionalized programs using similar ideas, including changes to (defunctionalized) functions.

Our primary example will be DSLs for operations on collections: as discussed earlier (Chapter 2), we favor higher-order collection DSLs over relational databases.

We build our incremental DSLs based on simply-typed λ-calculus (STLC), extended with language plugins to define the domain-specific parts, as discussed in Appendix A.2 and summarized in Fig. A.1. We call our approach ILC, for Incremental Lambda Calculus.

The rest of this chapter is organized as follows. In Sec. 10.1, we explain that differentiation generalizes the calculus of finite differences, a relative of differential calculus. In Sec. 10.2, we show a motivating example for our approach. In Sec. 10.3, we introduce informally the concept of changes as values, and in Sec. 10.4, we introduce changes to functions. In Sec. 10.5, we define differentiation and motivate it informally. In Sec. 10.6, we apply differentiation to our motivating example.

Correctness of ILC is far from obvious. In Chapters 12 and 13, we introduce a formal theory of changes, and we use it to formalize differentiation and prove it correct.


10.1 Generalizing the calculus of finite differences

Our theory of changes generalizes an existing field of mathematics called the calculus of finite differences: if f is a real function, one can define its finite difference, that is, a function ∆f such that ∆f a da = f (a + da) − f a. Readers might notice the similarity (and the differences) between the finite difference and the derivative of f, since the latter is defined as

f ′(a) = lim_{da → 0} (f (a + da) − f (a)) / da

The calculus of finite differences helps computing a closed formula for ∆f given a closed formula for f. For instance, if function f is defined by f x = 2 · x, one can prove its finite difference is ∆f x dx = 2 · (x + dx) − 2 · x = 2 · dx.

Finite differences are helpful for incrementalization because they allow computing functions on updated inputs based on results on base inputs, if we know how inputs change. Take again for instance f x = 2 · x: if x is a base input and x + dx is an updated input, we can compute f (x + dx) = f x + ∆f x dx. If we already computed y = f x and reuse the result, we can compute f (x + dx) = y + ∆f x dx. Here, the input change is dx and the output change is ∆f x dx.
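In code, this reuse pattern is straightforward; here is a minimal Scala sketch for f x = 2 · x and its finite difference:

// Sketch: incrementalizing f x = 2 * x through its finite difference.
def f(x: Int): Int = 2 * x
def df(x: Int)(dx: Int): Int = 2 * dx // ∆f x dx = 2 · dx

val x  = 10
val dx = 3
val y  = f(x)          // base result: 20
val y2 = y + df(x)(dx) // updated result: 26, equal to f(x + dx)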

However, the calculus of finite differences is usually defined for real functions. Since it is based on operators + and −, it can be directly extended to commutative groups. Incrementalization based on finite differences for groups and first-order programs has already been researched [Paige and Koenig, 1982; Gluche et al., 1997], most recently and spectacularly with DBToaster [Koch, 2010; Koch et al., 2016].

But it is not immediate how to generalize finite differencing beyond groups. And many useful types do not form a group: for instance, lists of integers don't form a group, but only a monoid. Moreover, it's hard to represent list changes simply through a list: how do we specify which elements were inserted (and where), which were removed, and which were subjected to change themselves?

In ILC we generalize the calculus of finite differences by using distinct types for base values and changes, and adapting the surrounding theory. ILC generalizes operators + and − as operators ⊕ (pronounced “oplus” or “update”) and ⊖ (pronounced “ominus” or “difference”). We show how ILC subsumes groups in Sec. 13.1.1.

10.2 A motivating example

In this section, we illustrate informally incrementalization on a small example.

In the following program, grandTotal xs ys sums integer numbers in input collections xs and ys.

grandTotal :: Bag ℤ → Bag ℤ → ℤ
s :: ℤ
grandTotal xs ys = sum (merge xs ys)
s = grandTotal xs ys

This program computes output s from input collections xs and ys. These collections are multisets or bags, that is, collections that are unordered (like sets) where elements are allowed to appear more than once (unlike sets). In this example, we assume a language plugin that supports a base type of integers ℤ and a family of base types of bags, Bag τ, for any type τ.

We can run this program on specific inputs xs1 = {{1, 2, 3}} and ys1 = {{4}} to obtain output s1. Here, double braces denote a bag containing the elements among the double braces.

s1 = grandTotal xs1 ys1
   = sum {{1, 2, 3, 4}} = 10


This example uses small inputs for simplicity, but in practice they are typically much bigger; we use n to denote the input size. In this case, the asymptotic complexity of computing s is Θ(n).

Consider computing updated output s2 from updated inputs xs2 = {{1, 1, 2, 3}} and ys2 = {{4, 5}}. We could recompute s2 from scratch as

s2 = grandTotal xs2 ys2
   = sum {{1, 1, 2, 3, 4, 5}} = 16

But if the size of the updated inputs is Θ(n), recomputation also takes time Θ(n), and we would like to obtain our result asymptotically faster.

To compute the updated output s2 faster, we assume the changes to the inputs have a description of size dn that is asymptotically smaller than the input size n, that is dn = o(n). All approaches to incrementalization require small input changes. Incremental computation will then process the input changes, rather than just the new inputs.

10.3 Introducing changes

To talk about the differences between old values and new values, we introduce a few concepts, for now without full definitions. In our approach to incrementalization, we describe changes to values as values themselves. We call such descriptions simply changes. Incremental programs examine changes to inputs to understand how to produce changes to outputs. Just like in STLC we have terms (programs) that evaluate to values, we also have change terms, which evaluate to change values. We require that going from old values to new values preserves types: that is, if an old value v1 has type τ, then also its corresponding new value v2 must have type τ. To each type τ we associate a type of changes or change type ∆τ: a change between v1 and v2 must be a value of type ∆τ. Similarly, environments can change: to typing context Γ we associate change typing contexts ∆Γ, such that we can have an environment change dρ : ⟦∆Γ⟧ from ρ1 : ⟦Γ⟧ to ρ2 : ⟦Γ⟧.

Not all descriptions of changes are meaningful, so we also talk about valid changes. Valid changes satisfy additional invariants that are useful during incrementalization. A change value dv can be a valid change from v1 to v2. We can also consider a valid change as an edge from v1 to v2 in a graph associated to τ (where the vertexes are values of type τ), and we call v1 the source of dv and v2 the destination of dv. We only talk of source and destination for valid changes, so a change from v1 to v2 is (implicitly) valid. We'll discuss examples of valid and invalid changes in Examples 10.3.1 and 10.3.2.

We also introduce an operator ⊕ on values and changes: if dv is a valid change from v1 to v2, then v1 ⊕ dv (read as “v1 updated by dv”) is guaranteed to return v2. If dv is not a valid change from v1, then v1 ⊕ dv can be defined to some arbitrary value, or not, without any effect on correctness. In practice, if ⊕ detects an invalid input change, it can trigger an error or return a dummy value; in our formalization, we assume for simplicity that ⊕ is total. Again, if dv is not valid from v1 to v1 ⊕ dv, then we do not talk of the source and destination of dv.

We also introduce operator ⊖: given two values v1, v2 of the same type, v2 ⊖ v1 is a valid change from v1 to v2.

Finally, we introduce change composition: if dv1 is a valid change from v1 to v2 and dv2 is a valid change from v2 to v3, then dv1 ⊚ dv2 is a valid change from v1 to v3.

Change operators are overloaded over different types. Coherent definitions of validity and of operators ⊕ and ⊖ for a type τ form a change structure over values of type τ (Definition 13.1.1). For each type τ, we'll define a change structure (Definition 13.4.2), and operators will have types ⊕ : τ → ∆τ → τ, ⊖ : τ → τ → ∆τ, ⊚ : ∆τ → ∆τ → ∆τ.
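As a warm-up, the following sketch (ours, anticipating Example 10.3.1) instantiates these operators for integers, where a change is itself an integer:

-- Changes to integers, represented as integers.
type DeltaInt = Int

oplus :: Int -> DeltaInt -> Int         -- ⊕
oplus v dv = v + dv

ominus :: Int -> Int -> DeltaInt        -- ⊖: ominus v2 v1 goes from v1 to v2
ominus v2 v1 = v2 - v1

ocompose :: DeltaInt -> DeltaInt -> DeltaInt  -- composition of two changes
ocompose dv1 dv2 = dv1 + dv2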


Example 10.3.1 (Changes on integers and bags)
To show how incrementalization affects our example, we next describe valid changes for integers and bags. For now, a change das to a bag as1 simply contains all elements to be added to the initial bag as1 to obtain the updated bag as2 (we'll ignore removing elements for this section and discuss it later). In our example, the change from xs1 (that is, {{1, 2, 3}}) to xs2 (that is, {{1, 1, 2, 3}}) is dxs = {{1}}, while the change from ys1 (that is, {{4}}) to ys2 (that is, {{4, 5}}) is dys = {{5}}. To represent the output change ds from s1 to s2, we need integer changes. For now, we represent integer changes as integers, and define ⊕ on integers as addition: v1 ⊕ dv = v1 + dv.

For both bags and integers, a change dv is always valid between v1 and v2 = v1 ⊕ dv; for other changes, however, validity will be more restrictive.

Example 10.3.2 (Changes on naturals)
For instance, say we want to define changes on a type of natural numbers, and we still want to have v1 ⊕ dv = v1 + dv. A change from 3 to 2 should still be −1, so the type of changes must be ℤ. But the result of ⊕ should still be a natural, that is an integer ≥ 0: to ensure that v1 ⊕ dv ≥ 0, we need to require that dv ≥ −v1. We use this requirement to define validity on naturals: dv ▷ v1 ↪ v1 + dv : ℕ is defined as equivalent to dv ≥ −v1. We can guarantee equation v1 ⊕ dv = v1 + dv not for all changes, but only for valid changes. Conversely, if a change dv is invalid for v1, then v1 + dv < 0. We then define v1 ⊕ dv to be 0, though any other definition on invalid changes would work.¹
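A small sketch (ours) of this change structure on naturals, modeling the validity proposition as a Boolean predicate:

type Nat = Int        -- assumed non-negative
type DeltaNat = Int   -- changes to naturals are integers

validNat :: Nat -> DeltaNat -> Bool
validNat v1 dv = dv >= negate v1

oplusNat :: Nat -> DeltaNat -> Nat
oplusNat v1 dv
  | validNat v1 dv = v1 + dv
  | otherwise      = 0   -- arbitrary choice on invalid changes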

10.3.1 Incrementalizing with changes

After introducing changes and related notions, we describe how we incrementalize our example program.

We consider again the scenario of Sec. 10.2: we need to compute the updated output s2, the result of calling our program grandTotal on updated inputs xs2 and ys2. And we have the initial output s1 from calling our program on initial inputs xs1 and ys1. In this scenario, we can compute s2 non-incrementally by calling grandTotal on the updated inputs, but we would like to obtain the same result faster. Hence, we compute s2 incrementally: that is, we first compute the output change ds from s1 to s2, then we update the old output s1 by change ds. Successful incremental computation must compute the correct s2 asymptotically faster than non-incremental computation. This speedup is possible because we take advantage of the computation already done to compute s1.

To compute the output change ds from s1 to s2, we propose to transform our base program grandTotal to a new program dgrandTotal, which we call the derivative of grandTotal: to compute ds, we call dgrandTotal on initial inputs and their respective changes. Unlike other approaches to incrementalization, dgrandTotal is a regular program in the same language as grandTotal, hence it can be further optimized with existing technology.

Below, we give the code for dgrandTotal and show that, in this example, incremental computation computes s2 correctly.

For ease of reference, we recall inputs, changes and outputs:

xs1 = {{1, 2, 3}}
dxs = {{1}}
xs2 = {{1, 1, 2, 3}}
ys1 = {{4}}
dys = {{5}}

¹In fact, we could leave ⊕ undefined on invalid changes. Our original presentation [Cai et al. 2014], in essence, restricted ⊕ to valid changes through dependent types, by ensuring that applying it to invalid changes would be ill-typed. Later, Huesca [2015], in similar developments, simply made ⊕ partial on its domain instead of restricting the domain, achieving similar results.


ys2 = {{4, 5}}
s1 = grandTotal xs1 ys1 = 10
s2 = grandTotal xs2 ys2 = 16

Incremental computation uses the following definitions to compute s2 correctly and with fewer steps, as desired:

dgrandTotal xs dxs ys dys = sum (merge dxs dys)
ds = dgrandTotal xs1 dxs ys1 dys
   = sum {{1, 5}} = 6
s2 = s1 ⊕ ds = s1 + ds
   = 10 + 6 = 16

Incremental computation should be asymptotically faster than non-incremental computation; hence, the derivative we run should be asymptotically faster than the base program. Here, derivative dgrandTotal is faster simply because it ignores initial inputs altogether. Therefore, its time complexity depends only on the total size of changes dn. In particular, the complexity of dgrandTotal is Θ(dn) = o(n).

We generate derivatives through a program transformation from terms to terms, which we call differentiation (or, sometimes, simply derivation). We write D⟦t⟧ for the result of differentiating term t. We apply D⟦–⟧ to terms of our non-incremental programs, or base terms, such as grandTotal. To define differentiation, we assume that we already have derivatives for the primitive functions they use; we discuss later how to write such derivatives by hand.

We define differentiation in Definition 10.5.1; some readers might prefer to peek ahead, but we prefer to first explain what differentiation is supposed to do.

A derivative of a function can be applied to initial inputs and changes from initial inputs to updated inputs, and returns a change from an initial output to an updated output. For instance, take derivative dgrandTotal, initial inputs xs1 and ys1, and changes dxs and dys from initial inputs to updated inputs. Then, change dgrandTotal xs1 dxs ys1 dys, that is ds, goes from initial output grandTotal xs1 ys1, that is s1, to updated output grandTotal xs2 ys2, that is s2. And because ds goes from s1 to s2, it follows as a corollary that s2 = s1 ⊕ ds. Hence, we can compute s2 incrementally through s1 ⊕ ds, as we have shown, rather than by evaluating grandTotal xs2 ys2.

We often just say that a derivative of function f maps changes to the inputs of f to changes to the outputs of f, leaving the initial inputs implicit. In short:

Slogan 10.3.3
Term D⟦t⟧ maps input changes to output changes. That is, D⟦t⟧ applied to initial base inputs and valid input changes (from initial inputs to updated inputs) gives a valid output change from t applied on old inputs to t applied on new inputs.

For a generic unary function f : A → B, the behavior of D⟦f⟧ can be described as

f a2 ≅ f a1 ⊕ D⟦f⟧ a1 da    (10.1)

or as

f (a1 ⊕ da) ≅ f a1 ⊕ D⟦f⟧ a1 da    (10.2)

where da is a metavariable standing for a valid change from a1 to a2 (with a1, a2 : A), and where ≅ denotes denotational equivalence (Definition A.2.5).² Moreover, D⟦f⟧ a1 da is also a valid change, and can hence be used as an argument for operations that require valid changes. These equations


follow from Theorem 12.2.2 and Corollary 13.4.5; we iron out the few remaining details to obtain these equations in Sec. 14.1.2.

In our example, we have applied D⟦–⟧ to grandTotal and simplified the result via β-reduction to produce dgrandTotal, as we show in Sec. 10.6. Correctness of D⟦–⟧ guarantees that sum (merge dxs dys) evaluates to a change from sum (merge xs ys) evaluated on old inputs xs1 and ys1, to sum (merge xs ys) evaluated on new inputs xs2 and ys2.

In this section, we have sketched the meaning of differentiation informally. We discuss incrementalization on higher-order terms in Sec. 10.4 and actually define differentiation in Sec. 10.5.

10.4 Differentiation on open terms and functions

We have shown that applying D⟦–⟧ on closed functions produces their derivatives. However, D⟦–⟧ is defined for all terms, hence also for open terms and for non-function types.

Open terms Γ ⊢ t : τ are evaluated with respect to an environment for Γ, and when this environment changes, the result of t changes as well. D⟦t⟧ computes the change to t's output. If Γ ⊢ t : τ, evaluating term D⟦t⟧ requires as input a change environment dρ : ⟦∆Γ⟧, containing changes from the initial environment ρ1 : ⟦Γ⟧ to the updated environment ρ2 : ⟦Γ⟧. The (environment) input change dρ is mapped by D⟦t⟧ to output change dv = ⟦D⟦t⟧⟧ dρ, a change from initial output ⟦t⟧ ρ1 to updated output ⟦t⟧ ρ2. If t is a function, dv maps in turn changes to the function arguments to changes to the function result. All this behavior, again, follows our slogan.

Environment changes contain changes for each variable in Γ. More precisely, if variable x appears with type τ in Γ, and hence in ρ1, ρ2, then dx appears with type ∆τ in ∆Γ, and hence in dρ. Moreover, dρ extends ρ1, to provide D⟦t⟧ with initial inputs and not just with their changes.

The two environments ρ1 and ρ2 can share entries; in particular, environment change dρ can be a nil change from ρ to ρ. For instance, Γ can be empty: then ρ1 and ρ2 are also empty (since they match Γ) and equal, so dρ is a nil change. Alternatively, some or all the change entries in dρ can be nil changes. Whenever dρ is a nil change, ⟦D⟦t⟧⟧ dρ is also a nil change.

If t is a function, D⟦t⟧ will be a function change. Changes to functions, in turn, map input changes to output changes, following our Slogan 10.3.3. If a change df from f1 to f2 is applied (via df a1 da) to an input change da from a1 to a2, then df will produce a change dv = df a1 da from v1 = f1 a1 to v2 = f2 a2. The definition of function changes is recursive on types: that is, dv can in turn be a function change, mapping input changes to output changes.

Derivatives are a special case of function changes: a derivative df is simply a change from f to f itself, which maps input changes da from a1 to a2 to output changes dv = df a1 da from f a1 to f a2. This definition coincides with the earlier definition of derivatives, and it also coincides with the definition of function changes for the special case where f1 = f2 = f. That is why D⟦t⟧ produces derivatives if t is a closed function term: we can only evaluate D⟦t⟧ against a nil environment change, producing a nil function change.

Since the concept of function changes can be surprising, we examine it more closely next.

10.4.1 Producing function changes

A first-class function can close over free variables that can change, hence function values themselves can change; therefore we introduce function changes to describe these changes.

For instance, term tf = λx → x + y is a function that closes over y, so different values v for y give rise to different values for f = ⟦tf⟧ (y = v). Take a change dv from v1 = 5 to v2 = 6:

²Nitpick: if da is read as an object variable, denotational equivalence will detect that these terms are not equivalent if da maps to an invalid change. Hence we said that da is a metavariable. Later we define denotational equivalence for valid changes (Definition 14.1.2), which gives a less cumbersome way to state such equations.


different inputs v1 and v2 for y give rise to different outputs f1 = ⟦tf⟧ (y = v1) and f2 = ⟦tf⟧ (y = v2). We describe the difference between outputs f1 and f2 through a function change df from f1 to f2.

Consider again Slogan 10.3.3 and how it applies to term tf:

Slogan 10.3.3
Term D⟦t⟧ maps input changes to output changes. That is, D⟦t⟧ applied to initial base inputs and valid input changes (from initial inputs to updated inputs) gives a valid output change from t applied on old inputs to t applied on new inputs.

Since y is free in tf, the value for y is an input of tf. So, continuing our example, dtf = D⟦tf⟧ must map a valid input change dv from v1 to v2 for variable y to a valid output change df from f1 to f2; more precisely, we must have df = ⟦dtf⟧ (y = v1, dy = dv).

10.4.2 Consuming function changes

Function changes can not only be produced but also be consumed in programs obtained from D⟦–⟧. We discuss next how.

As discussed, we consider the value for y as an input to tf = λx → x + y. However, we also choose to consider the argument for x as an input (of a different sort) to tf = λx → x + y, and we require our Slogan 10.3.3 to apply to input x too. While this might sound surprising, it works out well. Specifically, since df = ⟦D⟦tf⟧⟧ (y = v1, dy = dv) is a change from f1 to f2, we require df a1 da to be a change from f1 a1 to f2 a2, so df maps base input a1 and input change da to output change df a1 da, satisfying the slogan.

More in general, any valid function change df from f1 to f2 (where f1, f2 : ⟦σ → τ⟧) must in turn be a function that takes an input a1 and a change da, valid from a1 to a2, to a valid change df a1 da from f1 a1 to f2 a2.
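To illustrate, here is a small executable sketch (ours, with integer changes) for tf = λx → x + y, where the derivative dtf consumes base inputs and changes for both the environment entry y and the argument x:

-- tf = λx → x + y, with the environment entry y made explicit.
tf :: Int -> Int -> Int
tf y x = x + y

-- Its derivative maps y, dy, x, dx to an output change.
dtf :: Int -> Int -> Int -> Int -> Int
dtf y dy x dx = dx + dy

main :: IO ()
main = do
  let (v1, dv) = (5, 1)    -- y changes from 5 to 6
      (a1, da) = (10, 3)   -- x changes from 10 to 13
  -- dtf v1 dv behaves as a function change df, so df a1 da is a
  -- change from f1 a1 to f2 a2:
  print (tf v1 a1 + dtf v1 dv a1 da == tf (v1 + dv) (a1 + da))  -- True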

This way, to satisfy our slogan on application t = tf x, we can simply define D⟦–⟧ so that D⟦tf x⟧ = D⟦tf⟧ x dx. Then

⟦D⟦tf x⟧⟧ (y = v1, dy = dv, x = a1, dx = da) = ⟦D⟦tf⟧⟧ (y = v1, dy = dv) a1 da = df a1 da

As required, that's a change from f1 a1 to f2 a2.

Overall, valid function changes preserve validity, just like D⟦t⟧ in Slogan 10.3.3, and map valid input changes to valid output changes. In turn, output changes can be function changes; since they are valid, they in turn map valid changes to their inputs to valid output changes (as we'll see in Lemma 12.1.10). We'll later formalize this and define validity by recursion on types, that is, as a logical relation (see Sec. 12.1.4).

10.4.3 Pointwise function changes

It might seem more natural to describe a function change df′ between f1 and f2 via df′ = λx → f2 x ⊖ f1 x. We call such a df′ a pointwise change. While some might find pointwise changes a more natural concept, the output of differentiation often needs to compute f2 x2 ⊖ f1 x1, using a change df from f1 to f2 and a change dx from x1 to x2. Function changes perform this job directly. We discuss this point further in Sec. 15.3.

10.4.4 Passing change targets

It would be more symmetric to make function changes also take the updated input a2; that is, to have df a1 da a2 compute a change from f1 a1 to f2 a2. However, passing a2 explicitly adds no information: the value a2 can be computed from a1 and da as a1 ⊕ da. Indeed, in various cases


a function change can compute its required output without actually computing a1 ⊕ da. Since we expect the sizes of a1 and a2 to be asymptotically larger than the size of da, actually computing a2 could be expensive.³ Hence, we stick to our asymmetric form of function changes.

10.5 Differentiation informally

Next, we define differentiation and explain informally why it does what we want. We then give an example of how differentiation applies to our example. A formal proof will follow soon in Sec. 12.2, justifying more formally why this definition is correct, but we proceed more gently.

Definition 10.5.1 (Differentiation)
Differentiation is the following term transformation:

D⟦λ(x : σ) → t⟧ = λ(x : σ) (dx : ∆σ) → D⟦t⟧
D⟦s t⟧ = D⟦s⟧ t D⟦t⟧
D⟦x⟧ = dx
D⟦c⟧ = DC⟦c⟧

where DC⟦c⟧ defines differentiation on primitives and is provided by language plugins (see Appendix A.2.5), and dx stands for a variable generated by prefixing x's name with d, so that D⟦y⟧ = dy and so on.
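To make the transformation concrete, here is a minimal sketch (ours, not the thesis's mechanization) of D⟦–⟧ over a tiny term datatype, generating each dx name by prefixing as described above:

data Term
  = Var String       -- x
  | Lam String Term  -- λx → t (type annotations omitted in this sketch)
  | App Term Term    -- s t
  | Const String     -- primitive constant c
  deriving Show

-- Differentiation, following Definition 10.5.1.
derive :: Term -> Term
derive (Lam x t) = Lam x (Lam ('d' : x) (derive t))
derive (App s t) = App (App (derive s) t) (derive t)
derive (Var x)   = Var ('d' : x)
derive (Const c) = deriveConst c

-- Derivatives of primitives come from the language plugin; here we
-- simply assume each constant c has a derivative named dc.
deriveConst :: String -> Term
deriveConst c = Const ('d' : c)

For instance, derive (Lam "x" (App (Const "f") (Var "x"))) yields the representation of λx → λdx → df x dx, matching the rules above.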

If we extend the language with (non-recursive) let-bindings, we can give derived rules for it, such as:

D⟦let x = t1 in t2⟧ = let x  = t1
                          dx = D⟦t1⟧
                      in D⟦t2⟧

In Sec. 15.1 we will explain that the same transformation rules apply for recursive let-bindings.

If t contains occurrences of both (say) x and dx, capture issues arise in D⟦t⟧. We defer these issues to Sec. 12.3.4, and assume throughout that such issues can be avoided by α-renaming and the Barendregt convention [Barendregt 1984].

This transformation might seem deceptively simple. Indeed, pure λ-calculus only handles binding and higher-order functions, leaving “real work” to primitives. Similarly, our transformation incrementalizes binding and higher-order functions, leaving “real work” to derivatives of primitives. However, our support of λ-calculus allows gluing primitives together. We'll discuss later how we add support for various primitives and families of primitives.

Now we try to motivate the transformation informally. We claimed that D⟦–⟧ must satisfy Slogan 10.3.3, which reads:

Slogan 10.3.3
Term D⟦t⟧ maps input changes to output changes. That is, D⟦t⟧ applied to initial base inputs and valid input changes (from initial inputs to updated inputs) gives a valid output change from t applied on old inputs to t applied on new inputs.

Let's analyze the definition of D⟦–⟧ by case analysis on the input term u. In each case, we assume that our slogan applies to any subterms of u, and sketch why it applies to u itself.

³We show later efficient change structures where ⊕ reuses part of a1 to output a2 in logarithmic time.


• if u = x, by our slogan D⟦x⟧ must evaluate to the change of x when inputs change, so we set D⟦x⟧ = dx.

• if u = c, we simply delegate differentiation to DC⟦c⟧, which is defined by plugins. Since plugins can define arbitrary primitives, they need to provide their derivatives.

• if u = λx → t, then u introduces a function. Assume for simplicity that u is a closed term. Then D⟦t⟧ evaluates to the change of the result of this function u, evaluated in a context binding x and its change dx. Then, because of how function changes are defined, the change of u is the change of output t as a function of the base input x and its change dx, that is D⟦u⟧ = λx dx → D⟦t⟧.

• if u = s t, then s is a function. Assume for simplicity that u is a closed term. Then D⟦s⟧ evaluates to the change of s, as a function of D⟦s⟧'s base input and input change. So we apply D⟦s⟧ to its actual base input t and actual input change D⟦t⟧, and obtain D⟦s t⟧ = D⟦s⟧ t D⟦t⟧.

This is not quite a correct proof sketch, because of many issues, but we fix these issues with our formal treatment in Sec. 12.2. In particular, in the case for abstraction u = λx → t, D⟦t⟧ depends not only on x and dx, but also on other free variables of u and their changes. Similarly, we must deal with free variables also in the case for application u = s t. But first, we apply differentiation to our earlier example.

10.6 Differentiation on our example

To exemplify the behavior of differentiation concretely, and to help fix ideas for later discussion, in this section we show what the derivative of grandTotal looks like.

grandTotal = λxs ys → sum (merge xs ys)
s = grandTotal {{1, 2, 3}} {{4}} = 10

Differentiation is a structurally recursive program transformation, so we first compute D⟦merge xs ys⟧. To compute its change, we simply call the derivative of merge, that is dmerge, and apply it to the base inputs and their changes; hence, we write

D⟦merge xs ys⟧ = dmerge xs dxs ys dys

As we'll see better later, we can define function dmerge as

dmerge = λxs dxs ys dys → merge dxs dys

so D⟦merge xs ys⟧ can be simplified by β-reduction to merge dxs dys:

D⟦merge xs ys⟧
= dmerge xs dxs ys dys
=β (λxs dxs ys dys → merge dxs dys) xs dxs ys dys
=β merge dxs dys

Let's next derive sum (merge xs ys). First, like above, the derivative of sum zs would be dsum zs dzs, which depends on base input zs and its change dzs. As we'll see, dsum zs dzs can simply call sum on dzs, so dsum zs dzs = sum dzs. To derive sum (merge xs ys), we must call


the derivative of sum, that is dsum, on its base argument and its change, that is, on merge xs ys and D⟦merge xs ys⟧. We can later simplify again by β-reduction and obtain

D⟦sum (merge xs ys)⟧
= dsum (merge xs ys) D⟦merge xs ys⟧
=β sum D⟦merge xs ys⟧
= sum (dmerge xs dxs ys dys)
=β sum (merge dxs dys)

Here we see that the output of differentiation is defined in a bigger typing context: while merge xs ys only depends on base inputs xs and ys, D⟦merge xs ys⟧ also depends on their changes. This property extends beyond the examples we just saw: if a term t is defined in context Γ, then the output of derivation D⟦t⟧ is defined in context Γ, ∆Γ, where ∆Γ is a context that binds a change dx for each base input x bound in the context Γ.

Next, we consider λxs ys → sum (merge xs ys). Since xs, dxs, ys, dys are free in D⟦sum (merge xs ys)⟧ (ignoring later optimizations), term

D⟦λxs ys → sum (merge xs ys)⟧

must bind all those variables:

D⟦λxs ys → sum (merge xs ys)⟧
= λxs dxs ys dys → D⟦sum (merge xs ys)⟧
=β λxs dxs ys dys → sum (merge dxs dys)

Next, we need to transform the binding of grandTotal to its body b = λxs ys → sum (merge xs ys). We copy this binding and add a new additional binding from dgrandTotal to the derivative of b:

grandTotal  = λxs ys         → sum (merge xs ys)
dgrandTotal = λxs dxs ys dys → sum (merge dxs dys)

Finally, we need to transform the binding of the output s and its body. By iterating similar steps, in the end we get:

grandTotal  = λxs ys         → sum (merge xs ys)
dgrandTotal = λxs dxs ys dys → sum (merge dxs dys)
s  = grandTotal {{1, 2, 3}} {{4}}
ds = dgrandTotal {{1, 2, 3}} {{1}} {{4}} {{5}}
   =β sum (merge {{1}} {{5}})

Self-maintainability. Differentiation does not always produce efficient derivatives without further program transformations; in particular, derivatives might need to recompute results produced by the base program. In the above example, if we don't inline derivatives and use β-reduction to simplify programs, D⟦sum (merge xs ys)⟧ is just dsum (merge xs ys) D⟦merge xs ys⟧. A direct execution of this program will compute merge xs ys, which would waste time linear in the base inputs.

We'll show how to avoid such recomputations in general in Sec. 17.2, but here we can avoid computing merge xs ys simply because dsum does not use its base argument, that is, it is self-maintainable. Without the approach described in Chapter 17, we are restricted to self-maintainable derivatives.


10.7 Conclusion and contributions

In this chapter, we have seen how a correct differentiation transform allows us to incrementalize programs.

10.7.1 Navigating this thesis part

Differentiation. This chapter and Chapters 11 to 14 form the core of incrementalization theory, with other chapters building on top of them. We study incrementalization for STLC; we summarize our formalization of STLC in Appendix A. Together with Chapter 10, Chapter 11 introduces the overall approach to incrementalization informally. Chapters 12 and 13 show that incrementalization using ILC is correct. Building on a set-theoretic semantics, these chapters also develop the theory underlying this correctness proof, and further results. Equational reasoning on terms is then developed in Chapter 14.

Chapters 12 to 14 contain full formal proofs; readers are welcome to skip or skim those proofs where appropriate. For ease of reference, and to help navigation and skimming, these chapters highlight and number all definitions, notations, theorem statements and so on, and we strive not to hide important definitions inside theorems.

Later chapters build on this core, but are again independent from each other.

Extensions and theoretical discussion. Chapter 15 discusses a few assorted aspects of the theory that do not fit elsewhere and do not suffice for standalone chapters. We show how to differentiate general recursion (Sec. 15.1), we exhibit a function change that is not valid for any function (Sec. 15.2), we contrast our representation of function changes with pointwise function changes (Sec. 15.3), and we compare our formalization with the one presented in [Cai et al. 2014] (Sec. 15.4).

Performance of differentiated programs. Chapter 16 studies how to apply differentiation (as introduced in previous chapters) to incrementalize a case study, and empirically evaluates performance speedups.

Differentiation with cache-transfer-style conversion. Chapter 17 is self-contained, even though it builds on the rest of the material; it summarizes the basics of ILC in Sec. 17.2.1. Terminology in that chapter is sometimes subtly different from the rest of the thesis.

Towards differentiation for System F. Chapter 18 outlines how to extend differentiation to System F and suggests a proof avenue. Differentiation for System F requires a generalization of change structures to changes across different types (Sec. 18.4), which appears of independent interest, though further research remains to be done.

Related work, conclusions and appendixes. Finally, Sec. 17.6 and Chapter 19 discuss related work, and Chapter 20 concludes this part of the thesis.

10.7.2 Contributions

In Chapters 11 to 14 and Chapter 16, we make the following contributions:

• We present a novel mathematical theory of changes and derivatives, which is more general than other work in the field because changes are first-class entities, they are distinct from base values, and they are defined also for functions.


• We present the first approach to incremental computation for pure λ-calculi by a source-to-source transformation, D, that requires no run-time support. The transformation produces an incremental program in the same language; all optimization techniques for the original program are applicable to the incremental program as well.

• We prove that our incrementalizing transformation D is correct, by a novel machine-checked logical-relation proof, mechanized in Agda.

• While we focus mainly on the theory of changes and derivatives, we also perform a performance case study. We implement the derivation transformation in Scala, with a plug-in architecture that can be extended with new base types and primitives. We define a plugin with support for different collection types, and use the plugin to incrementalize a variant of the MapReduce programming model [Lämmel 2007]. Benchmarks show that, on this program, incrementalization can reduce asymptotic complexity and can turn O(n) performance into O(1), improving running time by over 4 orders of magnitude on realistic inputs (Chapter 16).

In Chapter 17, we make the following contributions:

• via examples, we motivate extending ILC to remember intermediate results (Sec. 17.2);

• we give a novel proof of correctness for ILC for untyped λ-calculus, based on step-indexed logical relations (Sec. 17.3.3);

• building on top of ILC-style differentiation, we show how to transform untyped higher-order programs to cache-transfer-style (CTS) (Sec. 17.3.5);

• we show through formal proofs that programs and derivatives in cache-transfer style correctly simulate their non-CTS variants (Sec. 17.3.6);

• we perform performance case studies (in Sec. 17.4), applying (by hand) an extension of this technique to Haskell programs, and incrementalize efficiently also programs that do not admit self-maintainable derivatives.

Chapter 18 describes how to extend differentiation to System F. To this end, we extend change structures to allow changes where source and destination have different types, and enable defining more powerful combinators for change structures. While the results in this chapter call for further research, we consider them exciting.

Appendix C proposes the first correctness proofs for ILC via operational methods and (step-indexed) logical relations, for simply-typed λ-calculus (without and with general recursion) and for untyped λ-calculus. A later variant of this proof, adapted for use of cache-transfer style, is presented in Chapter 17, but we believe the presentation in Appendix C might also be interesting for theorists, as it explains how to extend ILC correctness proofs with fewer extraneous complications. Nevertheless, given the similarity between the proofs in Appendix C and Chapter 17, we relegate the former to an appendix.

Finally, Appendix D shows how to implement all change operations on function changes efficiently, by defunctionalizing functions and function changes, rather than using mere closure conversion.


Chapter 11

A tour of differentiation examples

Before formalizing ILC, we show more examples of change structures and primitives, to show (a) designs for reusable primitives and their derivatives, (b) to what extent we can incrementalize basic building blocks such as recursive functions and algebraic data types, and (c) to sketch how we can incrementalize collections efficiently. We make no attempt at incrementalizing a complete collection API here; we discuss briefly more complete implementations in Chapter 16 and Sec. 17.4.

To describe these examples informally, we use Haskell notation and let polymorphism as appropriate (see Appendix A.2).

We also motivate a few extensions to differentiation that we describe later. As we'll see in this chapter, we'll need to enable some forms of introspection on function changes to manipulate the embedded environments, as we discuss in Appendix D. We will also need ways to remember intermediate results, which we will discuss in Chapter 17. We will also use overly simplified change structures to illustrate a few points.

11.1 Change structures as type-class instances

We encode change structures, as sketched earlier in Sec. 10.3, through a type class named ChangeStruct. An instance ChangeStruct t defines a change type ∆t as an associated type, and operations ⊕ and ⊖ are defined as methods. We also define method oreplace, such that oreplace v2 produces a replacement change from any source to v2; by default, v2 ⊖ v1 is simply an alias for oreplace v2.

class ChangeStruct t where
  type ∆t
  (⊕) :: t → ∆t → t
  oreplace :: t → ∆t
  (⊖) :: t → t → ∆t
  v2 ⊖ v1 = oreplace v2
  (⊚) :: ∆t → ∆t → ∆t
  0 :: t → ∆t

In this chapter, we will often show change structures where only some methods are defined; in actual implementations, we use a type class hierarchy to encode what operations are available, but we collapse this hierarchy here to simplify presentation.
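For concreteness, here is an ASCII rendition (ours) of a partial instance for integers, matching Example 10.3.1, that compiles with standard GHC:

{-# LANGUAGE TypeFamilies #-}

class ChangeStruct t where
  type Delta t                 -- the associated change type ∆t
  oplus  :: t -> Delta t -> t  -- ⊕
  ominus :: t -> t -> Delta t  -- ⊖

-- Integer changes are integers.
instance ChangeStruct Int where
  type Delta Int = Int
  oplus v dv   = v + dv
  ominus v2 v1 = v2 - v1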



11.2 How to design a language plugin

When adding support for a datatype T, we will strive to define both a change structure and derivatives of introduction and elimination forms for T, since such forms constitute a complete API for using that datatype. However, we will sometimes have to restrict elimination forms to scenarios that can be incrementalized efficiently.

In general, to differentiate a primitive f : A → B, once we have defined a change structure for A, we can start by defining

df a1 da = f (a1 ⊕ da) ⊖ f a1    (11.1)

where da is a valid change from a1 to a2. We then try to simplify and rewrite the expression using equational reasoning, so that it does not refer to ⊖ any more, as far as possible. We can assume that all argument changes are valid, especially if that allows producing faster derivatives; we formalize equational reasoning for valid changes in Sec. 14.1.1. In fact, instead of defining and simplifying f a2 ⊖ f a1 to not use ⊖, it is sufficient to produce a change from f a1 to f a2, even a different one. We write da1 =∆ da2 to mean that changes da1 and da2 are equivalent, that is, they have the same source and destination. We define this concept properly in Sec. 14.2.

We try to avoid running ⊖ on arguments of non-constant size, since it might easily take time linear or superlinear in the argument sizes; if ⊖ produces replacement values, it completes in constant time, but derivatives invoked on the produced changes are not efficient.

11.3 Incrementalizing a collection API

In this section, we describe a collection API that we incrementalize (partially) in this chapter.

To avoid notation conflicts, we represent lists via datatype List a, defined as follows:

data List a = Nil | Cons a (List a)

We also consider as a primitive operation a standard mapping function, map. We also support two restricted forms of aggregation: (a) folding over an abelian group via fold, similar to how one usually folds over a monoid;¹ (b) list concatenation via concat. We will not discuss how to differentiate concat, as we reuse existing solutions by Firsov and Jeltsch [2016].

singleton :: a → List a
singleton x = Cons x Nil

map :: (a → b) → List a → List b
map f Nil = Nil
map f (Cons x xs) = Cons (f x) (map f xs)

fold :: AbelianGroupChangeStruct b ⇒ List b → b
fold Nil = mempty
fold (Cons x xs) = x <> fold xs -- where <> is infix for mappend

concat :: List (List a) → List a
concat = ...

While usually fold requires only an instance Monoid b of type class Monoid to aggregate collection elements, our variant of fold requires an instance of type class AbelianGroupChangeStruct, a subclass of Monoid. This type class is not used by fold itself, but only by its derivative, as we explain in Sec. 11.3.2; nevertheless, we add this stronger constraint to fold itself, because we forbid derivatives

¹https://hackage.haskell.org/package/base-4.9.1.0/docs/Data-Foldable.html


with stronger type-class constraints. With this approach, all clients of fold can be incrementalized using differentiation.

Using those primitives, one can define further higher-order functions on collections, such as concatMap, filter, foldMap. In turn, these functions form the kernel of a collection API, as studied for instance by work on the monoid comprehension calculus [Grust and Scholl 1996a; Fegaras and Maier 1995; Fegaras 1999], even if they are not complete by themselves.

concatMap :: (a → List b) → List a → List b
concatMap f = concat . map f

filter :: (a → Bool) → List a → List a
filter p = concatMap (λx → if p x then singleton x else Nil)

foldMap :: AbelianGroupChangeStruct b ⇒ (a → b) → List a → b
foldMap f = fold . map f

In first-order DSLs such as SQL, such functionality must typically be added through separate primitives (consider for instance filter), while here we can simply define, for instance, filter on top of concatMap, and incrementalize the resulting definitions using differentiation.

Function filter uses conditionals, which we haven't discussed yet; we show how to incrementalize filter successfully in Sec. 11.6.

11.3.1 Changes to type-class instances

In this whole chapter, we assume that type-class instances, such as fold's AbelianGroupChangeStruct argument, do not undergo changes. Since type-class instances are closed top-level definitions of operations, and are canonical for a datatype, it is hard to imagine a change to a type-class instance. On the other hand, type-class instances can be encoded as first-class values. We can for instance imagine a fold taking a unit value and an associative operation as arguments. In such scenarios, one needs additional effort to propagate changes to operation arguments, similarly to changes to the function argument of map.

11.3.2 Incrementalizing aggregation

Let's now discuss how to incrementalize fold. We consider an oversimplified change structure that allows only two sorts of changes, prepending an element to a list or removing the list head of a non-empty list, and study how to incrementalize fold for such changes:

data ListChange a = Prepend a | Remove

instance ChangeStruct (List a) where
  type ∆(List a) = ListChange a
  xs ⊕ Prepend x = Cons x xs
  (Cons x xs) ⊕ Remove = xs
  Nil ⊕ Remove = error "Invalid change"


Removing an element from an empty list is an invalid change, hence it is safe to give an error in that scenario, as mentioned when introducing ⊕ (Sec. 10.3).

By using equational reasoning, as suggested in Sec. 11.2, starting from Eq. (11.1), one can show formally that dfold xs (Prepend x) should be a change that, in a sense, “adds” x to the result, using group operations:


dfold xs (Prepend x)
=∆ fold (xs ⊕ Prepend x) ⊖ fold xs
=  fold (Cons x xs) ⊖ fold xs
=  (x <> fold xs) ⊖ fold xs

Similarly, dfold (Cons x xs) Remove should instead “subtract” x from the result:

dfold (Cons x xs) Remove
=∆ fold (Cons x xs ⊕ Remove) ⊖ fold (Cons x xs)
=  fold xs ⊖ fold (Cons x xs)
=  fold xs ⊖ (x <> fold xs)

As discussed, using ⊖ is fast enough on, say, integers or other primitive types, but not in general. To avoid using ⊖, we must rewrite its invocation to an equivalent expression. In this scenario, we can use group changes for abelian groups, and restrict fold to situations where such changes are available:

dfold :: AbelianGroupChangeStruct b ⇒ List b → ∆(List b) → ∆b
dfold xs (Prepend x) = inject x
dfold (Cons x xs) Remove = inject (invert x)
dfold Nil Remove = error "Invalid change"

To support group changes, we define the following type classes to model abelian groups and group change structures, omitting APIs for more general groups. AbelianGroupChangeStruct only requires that group elements of type g can be converted into changes (type ∆g), allowing change type ∆g to contain other sorts of changes:

class Monoid g ⇒ AbelianGroup g where
  invert :: g → g

class (AbelianGroup a, ChangeStruct a) ⇒
    AbelianGroupChangeStruct a where
  -- Inject group elements into changes. Law:
  -- a ⊕ inject b = a <> b
  inject :: a → ∆a

Chapter 16 discusses how we can use group changes without assuming a single group is defined on elements, but here we simply select the canonical group, as chosen by type-class resolution. To use a different group, as usual, one defines a different but isomorphic type via the Haskell newtype construct. As a downside, derivatives of newtype constructors must convert changes across the different representations.
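For intuition, here is a standalone specialization (ours) to the additive group of integers, where mempty = 0, mappend = (+), invert = negate and inject is the identity:

data IntListChange = Prepend Int | Remove

foldSum :: [Int] -> Int
foldSum = sum

dfoldSum :: [Int] -> IntListChange -> Int
dfoldSum _ (Prepend x)  = x         -- inject x
dfoldSum (x : _) Remove = negate x  -- inject (invert x)
dfoldSum [] Remove      = error "Invalid change"

applyChange :: [Int] -> IntListChange -> [Int]
applyChange xs (Prepend x)  = x : xs
applyChange (_ : xs) Remove = xs
applyChange [] Remove       = error "Invalid change"

-- For valid changes:
--   foldSum (applyChange xs dxs) == foldSum xs + dfoldSum xs dxs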

Rewriting away ⊖ can also be possible for other specialized folds, though sometimes they can be incrementalized directly; for instance, map can be written as a fold. Incrementalizing map for the insertion of x into xs requires simplifying map f (Cons x xs) ⊖ map f xs. To avoid ⊖, we can rewrite this change statically to Insert (f x); indeed, we can incrementalize map also for more realistic change structures.

Associative tree folds. Other usages of fold over sequences produce a result type of small bounded size (such as integers). In this scenario, one can incrementalize the given fold efficiently using ⊖, instead of relying on group operations. For such scenarios, one can design a primitive

foldMonoid :: Monoid a ⇒ List a → a


for associative tree folds, that is, a function that folds over the input sequence using a monoid (that is, an associative operation with a unit). For efficient incrementalization, foldMonoid's intermediate results should form a balanced tree, and updating this tree should take logarithmic time; one approach to ensure this is to represent the input sequence itself using a balanced tree, such as a finger tree [Hinze and Paterson 2006].

Various algorithms store intermediate results of folding inside an input balanced tree, as described by Cormen et al. [2001, Ch. 14] or by Hinze and Paterson [2006]. But intermediate results can also be stored outside the input tree, as is commonly done using self-adjusting computation [Acar 2005, Sec. 9.1], or as can be done in our setting. While we do not use such folds, we describe the existing algorithms briefly and sketch how to integrate them in our setting.

Function foldMonoid must record the intermediate results, and the derivative dfoldMonoid must propagate input changes to affected intermediate results.²

To study the time complexity of input change propagation, it is useful to consider the dependency graph of intermediate results: in this graph, an intermediate result v1 has an arc to intermediate result v2 if and only if computing v1 depends on v2. To ensure dfoldMonoid is efficient, the dependency graph of intermediate results from foldMonoid must form a balanced tree of logarithmic height, so that changes to a leaf only affect a logarithmic number of intermediate results.

In contrast, implementing foldMonoid using foldr on a list produces an unbalanced graph of intermediate results. For instance, take input list xs = [1..8], containing numbers from 1 to 8, and assume a given monoid. Summing them with foldr (<>) mempty xs means evaluating

1 <> (2 <> (3 <> (4 <> (5 <> (6 <> (7 <> (8 <> mempty)))))))

Then, a change to the last element of input xs affects all intermediate results, hence incrementalization takes at least O(n). In contrast, using foldAssoc on xs should evaluate a balanced tree similar to

((1 <> 2) <> (3 <> 4)) <> ((5 <> 6) <> (7 <> 8))

so that any individual change to a leaf, insertion or deletion only affects O(log n) intermediate results (where n is the sequence size). Upon modifications to the tree, one must ensure that the balancing is stable [Acar 2005, Sec. 9.1]. In other words, altering the tree (by inserting or removing an element) must only alter O(log n) nodes.
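As a sketch of this balanced shape (ours; it does not yet store intermediate results or handle stable rebalancing under updates):

-- Fold a list with a monoid by splitting it in halves, so the tree
-- of intermediate results has logarithmic height.
foldAssoc :: Monoid a => [a] -> a
foldAssoc []  = mempty
foldAssoc [x] = x
foldAssoc xs  = foldAssoc l <> foldAssoc r
  where (l, r) = splitAt (length xs `div` 2) xs

-- On map Sum [1 .. 8], this evaluates the balanced shape
-- ((1<>2)<>(3<>4)) <> ((5<>6)<>(7<>8)), so a change to one leaf
-- affects only O(log n) intermediate results.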

We have implemented associative tree folds on very simple but unbalanced tree structures; we believe they could be implemented and incrementalized over balanced trees representing sequences, such as finger trees or random access zippers [Headley and Hammer 2016], but doing so requires transforming the implementation of their data structure to cache-transfer style (CTS) (Chapter 17). We leave this for future work, together with an automated implementation of the CTS transformation.

11.3.3 Modifying list elements

In this section, we consider another change structure on lists, which allows expressing changes to individual elements. Then, we present dmap, a derivative of map for this change structure. Finally, we sketch informally the correctness of dmap, which we prove formally in Example 14.1.1.

We can then represent changes to a list (List a) as a list of changes (List (∆a)), one for each element. A list change dxs is valid for source xs if they have the same length and each element change is valid for its corresponding element. For this change structure, we can define ⊕ and ⊚, but not a total ⊖: such list changes can't express the difference between two lists of different lengths. Nevertheless, this change structure is sufficient to define derivatives that act correctly

²We discuss in Chapter 17 how base functions communicate results to derivatives.


on the changes that can be expressed. We can describe this change structure in Haskell, using a type-class instance for class ChangeStruct:

instance ChangeStruct (List a) where
  type ∆(List a) = List (∆a)
  Nil ⊕ Nil = Nil
  (Cons x xs) ⊕ (Cons dx dxs) = Cons (x ⊕ dx) (xs ⊕ dxs)
  _ ⊕ _ = Nil

The following dmap function is a derivative for the standard map function (included for reference) and the given change structure. We discuss derivatives for recursive functions in Sec. 15.1.

map :: (a → b) → List a → List b
map f Nil = Nil
map f (Cons x xs) = Cons (f x) (map f xs)

dmap :: (a → b) → ∆(a → b) → List a → ∆(List a) → ∆(List b)
-- A valid list change has the same length as the base list
dmap f df Nil Nil = Nil
dmap f df (Cons x xs) (Cons dx dxs) =
  Cons (df x dx) (dmap f df xs dxs)
-- Remaining cases deal with invalid changes, and a dummy
-- result is sufficient
dmap f df xs dxs = Nil

Function dmap is a correct derivative of map for this change structure, according to Slogan 10.3.3; we sketch an informal argument by induction. The equation for dmap f df Nil Nil returns Nil, a valid change from initial to updated outputs, as required. In the equation for dmap f df (Cons x xs) (Cons dx dxs), we compute changes to the head and tail of the result, to produce a change from map f (Cons x xs) to map (f ⊕ df) (Cons x xs ⊕ Cons dx dxs). To this end, (a) we use df x dx to compute a change to the head of the result, from f x to (f ⊕ df) (x ⊕ dx); (b) we use dmap f df xs dxs recursively to compute a change to the tail of the result, from map f xs to map (f ⊕ df) (xs ⊕ dxs); (c) we assemble changes to head and tail with Cons into a change from map f (Cons x xs) to map (f ⊕ df) (Cons x xs ⊕ Cons dx dxs). In other words, dmap turns input changes to output changes correctly according to our Slogan 10.3.3: it is a correct derivative for map, according to this change structure. We have reasoned informally; we formalize this style of reasoning in Sec. 14.1. Crucially, our conclusions only hold if input changes are valid: hence, term map f xs ⊕ dmap f df xs dxs is not denotationally equal to map (f ⊕ df) (xs ⊕ dxs) for arbitrary change environments; these two terms only evaluate to the same result for valid input changes.
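As a concrete check (our example), using ordinary Haskell lists and the integer changes of Example 10.3.1, with one element change per element:

f :: Int -> Int
f x = 2 * x

df :: Int -> Int -> Int   -- derivative of f, here a nil function change
df _ dx = 2 * dx

dmapL :: (Int -> Int -> Int) -> [Int] -> [Int] -> [Int]
dmapL d = zipWith d       -- one output element change per element

main :: IO ()
main = do
  let xs  = [1, 2, 3]
      dxs = [1, 0, 0]     -- first element changes from 1 to 2
      new = zipWith (+) (map f xs) (dmapL df xs dxs)
  print (new == map f (zipWith (+) xs dxs))  -- prints True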

Since this definition of dmap is a correct derivative, we could use it in an incremental DSL for list manipulation, together with other primitives. Because of limitations we describe next, we will instead use improved language plugins for sequences.

11.3.4 Limitations

We have shown simplified list changes, but they have a few limitations. Fixing those requires more sophisticated definitions.

As discussed, our list changes intentionally forbid changing the length of a list. And our definition of dmap has further limitations: a change to a list of n elements takes size O(n), even when most elements do not change, and calling dmap f df on it requires n calls to df. This is only faster if df is faster than f, but it adds no further speedup.

We can instead describe a change to an arbitrary list element x in xs by giving the change dx and the position of x in xs. A list change is then a sequence of such changes:


type ∆(List a) = List (AtomicChange a)
data AtomicChange a = Modify ℤ (∆a)

However, fetching the i-th list element still takes time linear in i: we need a better representation of sequences. In the next section, we switch to a change structure on sequences defined by finger trees [Hinze and Paterson 2006], following Firsov and Jeltsch [2016].

11.4 Efficient sequence changes

Firsov and Jeltsch [2016] define an efficient representation of list changes in a framework similar to ILC, and incrementalize selected operations over this change structure. They also provide combinators to assemble further operations on top of the provided ones. We extend their framework to handle function changes, and generate derivatives for all functions that can be expressed in terms of the primitives.

Conceptually, a change for type Sequence a is a sequence of atomic changes. Each atomic change inserts one element at a given position, or removes one element, or changes an element at one position.³

data SeqSingleChange a
  = Insert   { idx :: ℤ, x :: a }
  | Remove   { idx :: ℤ }
  | ChangeAt { idx :: ℤ, dx :: ∆a }

data SeqChange a = SeqChange (Sequence (SeqSingleChange a))
type ∆(Sequence a) = SeqChange a

We use Firsov and Jeltsch's variant of this change structure in Sec. 17.4.

11.5 Products

It is also possible to define change structures for arbitrary sum and product types, and to provide derivatives for introduction and elimination forms for such datatypes. In this section we discuss products; in the next section, sums.

We define a simple change structure for product type A × B from change structures for A and B, similar to change structures for environments: operations act pointwise on the two components.

instance (ChangeStruct a, ChangeStruct b) ⇒ ChangeStruct (a, b) where
  type ∆(a, b) = (∆a, ∆b)
  (a, b) ⊕ (da, db) = (a ⊕ da, b ⊕ db)
  (a2, b2) ⊖ (a1, b1) = (a2 ⊖ a1, b2 ⊖ b1)
  oreplace (a2, b2) = (oreplace a2, oreplace b2)
  0_(a, b) = (0_a, 0_b)
  (da1, db1) ⊚ (da2, db2) = (da1 ⊚ da2, db1 ⊚ db2)

Through equational reasoning, as in Sec. 11.2, we can also compute derivatives for basic primitives on product types, both the introduction form (that we alias as pair) and the elimination forms fst and snd. We just present the resulting definitions:

pair a b = (a, b)
dpair a da b db = (da, db)

³Firsov and Jeltsch [2016] and our actual implementation allow changes to multiple elements.


fst (a, b) = a
snd (a, b) = b

dfst :: ∆(a, b) → ∆a
dfst (da, db) = da

dsnd :: ∆(a, b) → ∆b
dsnd (da, db) = db

uncurry :: (a → b → c) → (a, b) → c
uncurry f (a, b) = f a b

duncurry :: (a → b → c) → ∆(a → b → c) → (a, b) → ∆(a, b) → ∆c
duncurry f df (x, y) (dx, dy) = df x dx y dy

One can also define n-ary products in a similar way. However, a product change contains as many entries as the product itself.

11.6 Sums, pattern matching and conditionals

In this section, we define change structures for sum types, together with the derivatives of their introduction and elimination forms. We also obtain support for booleans (which can be encoded as sum type 1 + 1) and conditionals (which can be encoded in terms of elimination for sums). We have mechanically proved correctness of this change structure and derivatives, but we do not present the tedious details in this thesis, and refer to our Agda formalization.

Change structures for sums are more challenging than ones for products. We can define them, but in many cases we can do better with specialized structures. Nevertheless, such changes are useful in some scenarios.

In Haskell, sum types a + b are conventionally defined via datatype Either a b, with introduction forms Left and Right and elimination form either, which will be our primitives:

data Either a b = Left a | Right b

either :: (a → c) → (b → c) → Either a b → c
either f g (Left a) = f a
either f g (Right b) = g b

We can define the following change structure:

data EitherChange a b
  = LeftC (∆a)
  | RightC (∆b)
  | EitherReplace (Either a b)

instance (ChangeStruct a, ChangeStruct b) ⇒
    ChangeStruct (Either a b) where
  type ∆(Either a b) = EitherChange a b
  Left a ⊕ LeftC da = Left (a ⊕ da)
  Right b ⊕ RightC db = Right (b ⊕ db)
  Left _ ⊕ RightC _ = error "Invalid change"
  Right _ ⊕ LeftC _ = error "Invalid change"
  _ ⊕ EitherReplace e2 = e2
  oreplace = EitherReplace
  0_(Left a) = LeftC 0_a
  0_(Right a) = RightC 0_a


Changes to a sum value can either keep the same “branch” (left or right) and modify the contained value, or replace the sum value with another one altogether. Specifically, change LeftC da is valid from Left a1 to Left a2 if da is valid from a1 to a2. Similarly, change RightC db is valid from Right b1 to Right b2 if db is valid from b1 to b2. Finally, replacement change EitherReplace e2 is valid from e1 to e2, for any e1.

Using Eq. (11.1), we can then obtain definitions for the derivatives of primitives Left, Right and either. The resulting code is as follows:

dLeft :: a → ∆a → ∆(Either a b)
dLeft a da = LeftC da

dRight :: b → ∆b → ∆(Either a b)
dRight b db = RightC db

deither :: (NilChangeStruct a, NilChangeStruct b, DiffChangeStruct c) ⇒
  (a → c) → ∆(a → c) → (b → c) → ∆(b → c) →
  Either a b → ∆(Either a b) → ∆c
deither f df g dg (Left a) (LeftC da) = df a da
deither f df g dg (Right b) (RightC db) = dg b db
deither f df g dg e1 (EitherReplace e2) =
  either (f ⊕ df) (g ⊕ dg) e2 ⊖ either f g e1
deither _ _ _ _ _ _ = error "Invalid sum change"

We show only one case of the derivation of deither, as an example:

deither f df g dg (Left a) (LeftC da)
=∆ { using variants of Eq. (11.1) for multiple arguments }
   either (f ⊕ df) (g ⊕ dg) (Left a ⊕ LeftC da) ⊖ either f g (Left a)
=  { simplify ⊕ }
   either (f ⊕ df) (g ⊕ dg) (Left (a ⊕ da)) ⊖ either f g (Left a)
=  { simplify either }
   (f ⊕ df) (a ⊕ da) ⊖ f a
=∆ { because df is a valid change for f, and da for a }
   df a da

Unfortunately, with this change structure, a change from Left a1 to Right b2 is simply a replacement change, so derivatives processing it must recompute results from scratch. In general, we cannot do better, since there need be no shared data between the two branches of a datatype. We need to find specialized scenarios where better implementations are possible.

Extensions: changes for ADTs and future work. In some cases, we consider changes for type Either a b where a and b both contain some other type c. Take again lists: a change from list as to list Cons a2 as should simply say that we prepend a2 to the list. In a sense, we are just using a change structure from type List a to (a, List a). More in general, if change das from as1 to as2 is small, a change from list as1 to list Cons a2 as2 should simply “say” that we prepend a2 and that we modify as1 into as2, and similarly for removals.

In Sec. 18.4 we suggest how to construct such change structures, based on the concept of polymorphic change structures, where changes have source and destination of different types. Based on initial experiments, we believe one could develop these constructions into a powerful combinator language for change structures. In particular, it should be possible to build change structures for lists


similar to the ones in Sec. 11.3.2. Generalizing beyond lists, similar systematic constructions should be able to represent insertions and removals of one-hole contexts [McBride 2001] for arbitrary algebraic datatypes (ADTs); for ADTs representing balanced data structures, such changes could enable efficient incrementalization in many scenarios. However, many questions remain open, so we leave this effort for future work.

11.6.1 Optimizing filter

In Sec. 11.3 we have defined filter using a conditional, and we have just explained that, in general, conditionals are inefficient. This seems pretty unfortunate. But luckily, we can optimize filter using equational reasoning.

Consider again the earlier definition of filter:

filter :: (a → Bool) → List a → List a
filter p = concatMap (λx → if p x then singleton x else Nil)

As explained, we can encode conditionals using either and differentiate the resulting program. However, if p x changes from True to False, or vice versa (that is, for all elements x for which dp x dx is a non-nil change), we must compute ⊖ at runtime. However, ⊖ will compare an empty list Nil with a singleton list produced by singleton x (in one direction or the other). We can have ⊖ detect this situation at runtime. But since the implementation of filter is known statically, we can optimize this code statically, rewriting singleton x ⊖ Nil to Insert x and Nil ⊖ singleton x to Remove. To enable this optimization in dfilter, we need to inline the function that filter passes as argument to concatMap, and all the functions it calls except p. Moreover, we need to case-split on possible return values for p x and dp x dx. We omit the steps because they are both tedious and standard.

It appears in principle possible to automate such transformations by adding domain-specific knowledge to a sufficiently smart compiler, though we have made no attempt at an actual implementation. It would first be necessary to investigate further classes of examples where optimizations are applicable. Sufficiently smart compilers are rare; but since our approach produces purely functional programs, we have access to GHC and HERMIT [Farmer et al. 2012]. An interesting alternative (which does have some support for side effects) is LMS [Rompf and Odersky 2010] and Delite [Brown et al. 2011]. We leave further investigation for future work.

11.7 Chapter conclusion

In this chapter we have toured what can and cannot be incrementalized using differentiation, and how using higher-order functions allows defining generic primitives to incrementalize.

Chapter 12

Changes and differentiation, formally

To support incrementalization, in this chapter we introduce differentiation and formally prove it correct. That is, we prove that ⟦ D⟦ t ⟧ ⟧ produces derivatives. As we explain in Sec. 12.1.2, derivatives transform valid input changes into valid output changes (Slogan 10.3.3). Hence, we define what valid changes are (Sec. 12.1). As we'll explain in Sec. 12.1.4, validity is a logical relation. As we explain in Sec. 12.2.2, our correctness theorem is the fundamental property for the validity logical relation, proved by induction over the structure of terms. Crucial definitions or derived facts are summarized in Fig. 12.1. Later, in Chapter 13, we study consequences of correctness and change operations.

All definitions and proofs in this and the next chapter are mechanized in Agda, except where otherwise indicated. To this writer, given these definitions, all proofs have become straightforward and unsurprising. Nevertheless, first obtaining these proofs took a while. So we typically include full proofs. We also believe these proofs clarify the meaning and consequences of our definitions. To make proofs as accessible as possible, we try to provide enough detail that our target readers can follow along without pencil and paper, at the expense of making our proofs look longer than they would usually be. As we target readers proficient with STLC (but not necessarily proficient with logical relations), we'll still omit routine steps needed to reason on STLC, such as typing derivations or binding issues.

12.1 Changes and validity

In this section, we introduce formally (a) a description of changes and (b) a definition of which changes are valid. We have already introduced informally in Chapter 10 these notions and how they fit together. We next define the same notions formally, and deduce their key properties. Language plugins extend these definitions for base types and constants that they provide.

To formalize the notion of changes for elements of a set V, we define the notion of basic change structure on V.

Definition 12.1.1 (Basic change structures)
A basic change structure on set V, written V̂, comprises:

(a) a change set ∆V;



∆τ:
  ∆ι = …
  ∆(σ → τ) = σ → ∆σ → ∆τ
(a) Change types (Definition 12.1.17).

∆Γ:
  ∆ε = ε
  ∆(Γ, x : τ) = ∆Γ, x : τ, dx : ∆τ
(b) Change contexts (Definition 12.1.23).

D⟦ t ⟧:
  D⟦ λ(x : σ) → t ⟧ = λ(x : σ) (dx : ∆σ) → D⟦ t ⟧
  D⟦ s t ⟧ = D⟦ s ⟧ t D⟦ t ⟧
  D⟦ x ⟧ = dx
  D⟦ c ⟧ = DC⟦ c ⟧
(c) Differentiation (Definition 10.5.1).

  Γ ⊢ t : τ
  ────────────────── Derive
  ∆Γ ⊢ D⟦ t ⟧ : ∆τ
(d) Differentiation typing (Lemma 12.2.8).

dv ▷ v1 → v2 : τ, with v1, v2 : ⟦ τ ⟧ and dv : ⟦ ∆τ ⟧:
  dv ▷ v1 → v2 : ι = …
  df ▷ f1 → f2 : σ → τ = ∀da ▷ a1 → a2 : σ. df a1 da ▷ f1 a1 → f2 a2 : τ

dρ ▷ ρ1 → ρ2 : Γ, with ρ1, ρ2 : ⟦ Γ ⟧ and dρ : ⟦ ∆Γ ⟧:

  ──────────────
  ∅ ▷ ∅ → ∅ : ε

  dρ ▷ ρ1 → ρ2 : Γ    da ▷ a1 → a2 : τ
  ──────────────────────────────────────────────────────────
  (dρ, x = a1, dx = da) ▷ (ρ1, x = a1) → (ρ2, x = a2) : Γ, x : τ
(e) Validity (Definitions 12.1.21 and 12.1.24).

If Γ ⊢ t : τ then
  ∀dρ ▷ ρ1 → ρ2 : Γ. ⟦ D⟦ t ⟧ ⟧ dρ ▷ ⟦ t ⟧ ρ1 → ⟦ t ⟧ ρ2 : τ.
(f) Correctness of D⟦ – ⟧ (from Theorem 12.2.2).

Figure 12.1: Defining differentiation and proving it correct. The rest of this chapter explains and motivates the above definitions.


(b) a ternary validity relation dv ▷ v1 → v2 : V, for v1, v2 ∈ V and dv ∈ ∆V, that we read as "dv is a valid change from source v1 to destination v2 (on set V)".

Example 12.1.2
In Examples 10.3.1 and 10.3.2 we exemplified informally change types and validity on naturals, integers and bags. We define formally basic change structures on naturals and integers. Compared to validity for integers, validity for naturals ensures that the destination v1 + dv is again a natural. For instance, given source v1 = 1, change dv = −2 is valid (with destination v2 = −1) only on integers, not on naturals.

Definition 12.1.3 (Basic change structure on integers)
Basic change structure Ẑ on integers has integers as changes (∆Z = Z) and the following validity judgment:

  ─────────────────────
  dv ▷ v1 → v1 + dv : Z

Definition 12.1.4 (Basic change structure on naturals)
Basic change structure N̂ on naturals has integers as changes (∆N = Z) and the following validity judgment:

  v1 + dv ≥ 0
  ─────────────────────
  dv ▷ v1 → v1 + dv : N
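Both judgments are easy to execute. A minimal sketch in Haskell (our encoding, not the thesis mechanization), representing each basic change structure by its validity relation and encoding naturals as non-negative Integers:

validInt :: Integer -> Integer -> Integer -> Bool
validInt dv v1 v2 = v2 == v1 + dv

validNat :: Integer -> Integer -> Integer -> Bool
validNat dv v1 v2 = v2 == v1 + dv && v1 + dv >= 0

-- validInt (-2) 1 (-1) == True, but validNat (-2) 1 (-1) == False,
-- matching the example above.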

Intuitively, we can think of a valid change from v1 to v2 as a graph edge from v1 to v2, so we'll often use graph terminology when discussing changes. This intuition is robust and can be made fully precise.¹ More specifically, a basic change structure on V can be seen as a directed multigraph, having as vertexes the elements of V, and as edges from v1 to v2 the valid changes dv from v1 to v2. This is a multigraph because our definition allows multiple edges between v1 and v2.

A change dv can be valid from v1 to v2 and from v3 to v4, but we'll still want to talk about the source and the destination of a change. When we talk about a change dv valid from v1 to v2, value v1 is dv's source and v2 is dv's destination. Hence we'll systematically quantify theorems over valid changes dv with their sources v1 and destinations v2, using the following notation.²

Notation 12.1.5 (Quantification over valid changes)
We write

∀dv ▷ v1 → v2 : V. P

and say "for all (valid) changes dv from v1 to v2 on set V we have P", as a shortcut for

∀v1, v2 ∈ V, dv ∈ ∆V, if dv ▷ v1 → v2 : V then P.

Since we focus on valid changes, we'll omit the word "valid" when clear from context. In particular, a change from v1 to v2 is necessarily valid.

¹See for instance Robert Atkey's blog post [Atkey, 2015], or Yufei Cai's PhD thesis [Cai, 2017].
²If you prefer, you can tag a change with its source and destination by using a triple, and regard the whole triple (v1, dv, v2) as a change. Mathematically, this gives the correct results, but we'll typically not use such triples as changes in programs, for performance reasons.


We can have multiple basic change structures on the same set.

Example 12.1.6 (Replacement changes)
For instance, for any set V we can talk about replacement changes on V: a replacement change dv = !v2 for a value v1 : V simply specifies directly a new value v2 : V, so that !v2 ▷ v1 → v2 : V. We read ! as the "bang" operator.

A basic change structure can decide to use only replacement changes (which can be appropriate for primitive types with values of constant size), or to make ∆V a sum type, allowing both replacement changes and other ways to describe a change (as long as we're using a language plugin that adds sum types).
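A minimal Haskell sketch of such a sum-type change set, on integers (constructor names are ours):

data IntChange
  = Add Integer   -- additive change: Add dv is valid from v1 to v1 + dv
  | Bang Integer  -- replacement change: Bang v2 is valid from any v1 to v2

applyIntChange :: Integer -> IntChange -> Integer
applyIntChange v1 (Add dv)  = v1 + dv
applyIntChange _  (Bang v2) = v2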

Nil changes Just like integers have a null element 0, among changes there can be nil changes:

Definition 12.1.7 (Nil changes)
We say that dv : ∆V is a nil change for v : V if dv ▷ v → v : V.

For instance, 0 is a nil change for any integer number n. However, in general a change might be nil for an element but not for another. For instance, the replacement change !6 is a nil change on 6 but not on 5.

We'll say more on nil changes in Sec. 12.1.2 and Sec. 13.2.1.

12.1.1 Function spaces

Next, we define a basic change structure that we call Â → B̂ for an arbitrary function space A → B, assuming we have basic change structures for both A and B. We claimed earlier that valid function changes map valid input changes to valid output changes. We make this claim formal through the next definition.

Definition 12.1.8 (Basic change structure on A → B)
Given basic change structures on A and B, we define a basic change structure on A → B, that we write Â → B̂, as follows:

(a) Change set ∆(A → B) is A → ∆A → ∆B.

(b) Function change df is valid from f1 to f2 (that is, df ▷ f1 → f2 : A → B) if and only if, for all valid input changes da ▷ a1 → a2 : A, value df a1 da is a valid output change from f1 a1 to f2 a2 (that is, df a1 da ▷ f1 a1 → f2 a2 : B).

Notation 12.1.9 (Applying function changes to changes)
When reading out df a1 da, we'll often talk for brevity about applying df to da, leaving da's source a1 implicit when it can be deduced from context.

We'll also consider valid changes df for curried n-ary functions. We show what their validity means for curried binary functions f : A → B → C. We omit similar statements for higher arities, as they add no new ideas.

Lemma 12.1.10 (Validity on A → B → C)
For any basic change structures Â, B̂ and Ĉ, function change df : ∆(A → B → C) is valid from f1 to f2 (that is, df ▷ f1 → f2 : A → B → C) if and only if applying df to valid input changes da ▷ a1 → a2 : A and db ▷ b1 → b2 : B gives a valid output change:

df a1 da b1 db ▷ f1 a1 b1 → f2 a2 b2 : C.


Proof. The equivalence follows from applying the definition of function validity of df twice.

That is, function change df is valid (df ▷ f1 → f2 : A → (B → C)) if and only if it maps valid input change da ▷ a1 → a2 : A to valid output change

df a1 da ▷ f1 a1 → f2 a2 : B → C.

In turn, df a1 da is a function change, which is valid if and only if it maps valid input change db ▷ b1 → b2 : B to

df a1 da b1 db ▷ f1 a1 b1 → f2 a2 b2 : C,

as required by the lemma. □

12.1.2 Derivatives

Among valid function changes, derivatives play a central role, especially in the statement of correctness of differentiation.

Definition 12.1.11 (Derivatives)
Given function f : A → B, function df : A → ∆A → ∆B is a derivative for f if, for all changes da from a1 to a2 on set A, change df a1 da is valid from f a1 to f a2.

However, it follows that derivatives are nil function changes:

Lemma 12.1.12 (Derivatives as nil function changes)
Given function f : A → B, function change df : ∆(A → B) is a derivative of f if and only if df is a nil change of f (df ▷ f → f : A → B).

Proof. First we show that nil changes are derivatives. First, a nil change df ▷ f → f : A → B has the right type to be a derivative, because A → ∆A → ∆B = ∆(A → B). Since df is a nil change from f to f, by definition it maps valid input changes da ▷ a1 → a2 : A to valid output changes df a1 da ▷ f a1 → f a2 : B. Hence df is a derivative, as required.

In fact, all proof steps are equivalences, and by tracing them backward we can show that derivatives are nil changes: since a derivative df maps valid input changes da ▷ a1 → a2 : A to valid output changes df a1 da ▷ f a1 → f a2 : B, df is a change from f to f, as required. □

Applying derivatives to nil changes gives again nil changes. This fact is useful when reasoning on derivatives. The proof is a useful exercise on using validity.

Lemma 12.1.13 (Derivatives preserve nil changes)
For any basic change structures Â and B̂, function change df : ∆(A → B) is a derivative of f : A → B (df ▷ f → f : A → B) if and only if applying df to an arbitrary input nil change da ▷ a → a : A gives a nil change:

df a da ▷ f a → f a : B.

Proof. Just rewrite the definition of derivatives (Lemma 12.1.12) using the definition of validity of df.

In detail: by definition of validity for function changes (Definition 12.1.8), df ▷ f1 → f2 : A → B means that from da ▷ a1 → a2 : A follows df a1 da ▷ f1 a1 → f2 a2 : B. Just substitute f1 = f2 = f and a1 = a2 = a to get the required equivalence. □

Also derivatives of curried n-ary functions f preserve nil changes. We only state this formally for curried binary functions f : A → B → C; higher arities require no new ideas.


Lemma 12.1.14 (Derivatives preserve nil changes on A → B → C)
For any basic change structures Â, B̂ and Ĉ, change df : ∆(A → B → C) is a derivative of f : A → B → C if and only if applying df to nil changes da ▷ a → a : A and db ▷ b → b : B gives a nil change:

df a da b db ▷ f a b → f a b : C.

Proof. Similarly to validity on A → B → C (Lemma 12.1.10), the thesis follows by applying twice the fact that derivatives preserve nil changes (Lemma 12.1.13).

In detail: since derivatives preserve nil changes, df is a derivative if and only if for all da ▷ a → a : A we have df a da ▷ f a → f a : B → C. But then, df a da is a nil change, that is a derivative, and since it preserves nil changes, df is a derivative if and only if for all da ▷ a → a : A and db ▷ b → b : B we have df a da b db ▷ f a b → f a b : C. □

12.1.3 Basic change structures on types

After studying basic change structures in the abstract, we apply them to the study of our object language.

For each type τ, we can define a basic change structure on domain ⟦ τ ⟧, which we write τ̂. Language plugins must provide basic change structures for base types. To provide basic change structures for function types σ → τ, we use the earlier construction for basic change structures σ̂ → τ̂ on function spaces ⟦ σ → τ ⟧ (Definition 12.1.8).

Definition 12.1.15 (Basic change structures on types)
For each type τ, we associate a basic change structure on domain ⟦ τ ⟧, written τ̂, through the following equations:

ι̂ = …
(σ → τ)̂ = σ̂ → τ̂

Plugin Requirement 12.1.16 (Basic change structures on base types)
For each base type ι, the plugin defines a basic change structure on ⟦ ι ⟧ that we write ι̂.

Crucially, for each type τ we can define a type ∆τ of changes, that we call change type, such that the change set ∆⟦ τ ⟧ is just the domain ⟦ ∆τ ⟧ associated to change type ∆τ: ∆⟦ τ ⟧ = ⟦ ∆τ ⟧. This equation allows writing change terms that evaluate directly to change values.³

Definition 12.1.17 (Change types)
The change type ∆τ of a type τ is defined as follows:

∆ι = …

∆(σ → τ) = σ → ∆σ → ∆τ
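For instance (our instantiation of the definition), unfolding these equations on a curried binary function type over integers gives

∆(Z → Z → Z) = Z → ∆Z → (Z → ∆Z → ∆Z),

matching the shape of validity for curried binary functions in Lemma 12.1.10; ∆Z itself is supplied by the language plugin.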

Lemma 12.1.18 (∆ and ⟦ – ⟧ commute on types)
For each type τ, change set ∆⟦ τ ⟧ equals the domain of change type ⟦ ∆τ ⟧.

Proof. By induction on types. For the case of function types, we simply prove equationally that ∆⟦ σ → τ ⟧ = ∆(⟦ σ ⟧ → ⟦ τ ⟧) = ⟦ σ ⟧ → ∆⟦ σ ⟧ → ∆⟦ τ ⟧ = ⟦ σ ⟧ → ⟦ ∆σ ⟧ → ⟦ ∆τ ⟧ = ⟦ σ → ∆σ → ∆τ ⟧ = ⟦ ∆(σ → τ) ⟧. The case for base types is delegated to plugins (Plugin Requirement 12.1.19). □

³Instead, in earlier proofs [Cai et al. 2014], the values of change terms were not change values, but had to be related to change values through a logical relation; see Sec. 15.4.


Plugin Requirement 12.1.19 (Base change types)
For each base type ι, the plugin defines a change type ∆ι such that ∆⟦ ι ⟧ = ⟦ ∆ι ⟧.

We refer to values of change types as change values, or just changes.

Notation 12.1.20
We write basic change structures for types as τ̂, not ⟦ τ ⟧̂, and dv ▷ v1 → v2 : τ, not dv ▷ v1 → v2 : ⟦ τ ⟧. We also write consistently ⟦ ∆τ ⟧, not ∆⟦ τ ⟧.

12.1.4 Validity as a logical relation

Next, we show an equivalent definition of validity for values of terms, directly by induction on types, as a ternary logical relation between a change, its source and its destination. A typical logical relation constrains functions to map related inputs to related outputs. In a twist, validity constrains function changes to map related inputs to related outputs.

Definition 12.1.21 (Change validity)
We say that dv is a (valid) change from v1 to v2 (on type τ), and write dv ▷ v1 → v2 : τ, if dv : ⟦ ∆τ ⟧, v1, v2 : ⟦ τ ⟧, and dv is a "valid" description of the difference from v1 to v2, as we define in Fig. 12.1e.

The key equations for function types are:

∆(σ → τ) = σ → ∆σ → ∆τ

df ▷ f1 → f2 : σ → τ = ∀da ▷ a1 → a2 : σ. df a1 da ▷ f1 a1 → f2 a2 : τ
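For a concrete instance (our example, not from the text): on integers, with ∆Z = Z and validity as in Definition 12.1.3, take f1 = f2 = λa → a + 1 and df = λa da → da. For any da ▷ a1 → a2 : Z, that is a2 = a1 + da, we get df a1 da = da ▷ a1 + 1 → a2 + 1 : Z, so df ▷ f1 → f2 : Z → Z: df is a nil change (indeed, a derivative) for the successor function.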

Remark 12.1.22
We have kept repeating the idea that valid function changes map valid input changes to valid output changes. As seen in Sec. 10.4 and Lemmas 12.1.10 and 12.1.14, such valid outputs can in turn be valid function changes. We'll see the same idea at work in Lemma 12.1.27, in the correctness proof of D⟦ – ⟧.

As we have finally seen in this section, this definition of validity can be formalized as a logical relation, defined by induction on types. We'll later take for granted the consequences of validity, together with lemmas such as Lemma 12.1.10.

12.1.5 Change structures on typing contexts

To describe changes to the inputs of a term, we now also introduce change contexts ∆Γ, environment changes dρ : ⟦ ∆Γ ⟧, and validity for environment changes dρ ▷ ρ1 → ρ2 : Γ.

A valid environment change from ρ1 : ⟦ Γ ⟧ to ρ2 : ⟦ Γ ⟧ is an environment dρ : ⟦ ∆Γ ⟧ that extends environment ρ1 with valid changes for each entry. We first define the shape of environment changes through change contexts:

Definition 12.1.23 (Change contexts)
For each context Γ, we define change context ∆Γ as follows:

∆ε = ε

∆(Γ, x : τ) = ∆Γ, x : τ, dx : ∆τ
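For example (our instantiation): for Γ = x : Z, y : Bool, the definition gives ∆Γ = x : Z, dx : ∆Z, y : Bool, dy : ∆Bool, so an environment change carries, for each entry, the base value together with its change; this matches the de Bruijn encoding discussed later in Sec. 12.3.4.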

Then, we describe validity of environment changes via a judgment.


Definition 12.1.24 (Environment change validity)
We define validity for environment changes through judgment dρ ▷ ρ1 → ρ2 : Γ, pronounced "dρ is an environment change from ρ1 to ρ2 (at context Γ)", where ρ1, ρ2 : ⟦ Γ ⟧ and dρ : ⟦ ∆Γ ⟧, via the following inference rules:

  ──────────────
  ∅ ▷ ∅ → ∅ : ε

  dρ ▷ ρ1 → ρ2 : Γ    da ▷ a1 → a2 : τ
  ──────────────────────────────────────────────────────────
  (dρ, x = a1, dx = da) ▷ (ρ1, x = a1) → (ρ2, x = a2) : Γ, x : τ

Definition 12.1.25 (Basic change structures for contexts)
To each context Γ we associate a basic change structure on set ⟦ Γ ⟧: we take ⟦ ∆Γ ⟧ as change set and reuse validity on environment changes (Definition 12.1.24).

Notation 12.1.26
We write dρ ▷ ρ1 → ρ2 : Γ rather than dρ ▷ ρ1 → ρ2 : ⟦ Γ ⟧.

Finally, to state and prove correctness of differentiation, we are going to need to discuss function changes on term semantics. The semantics of a term Γ ⊢ t : τ is a function ⟦ t ⟧ from environments in ⟦ Γ ⟧ to values in ⟦ τ ⟧. To discuss changes to ⟦ t ⟧, we need a basic change structure on function space ⟦ Γ ⟧ → ⟦ τ ⟧.

Lemma 12.1.27
The construction of basic change structures on function spaces (Definition 12.1.8) associates to each context Γ and type τ a basic change structure Γ̂ → τ̂ on function space ⟦ Γ ⟧ → ⟦ τ ⟧.

Notation 12.1.28
As usual, we write the change set as ∆(⟦ Γ ⟧ → ⟦ τ ⟧); for validity, we write df ▷ f1 → f2 : Γ → τ rather than df ▷ f1 → f2 : ⟦ Γ ⟧ → ⟦ τ ⟧.

12.2 Correctness of differentiation

In this section we state and prove correctness of differentiation, a term-to-term transformation written D⟦ t ⟧ that produces incremental programs. We recall that all our results apply only to well-typed terms (since we formalize no other ones).

Earlier, we described how D⟦ – ⟧ behaves through Slogan 10.3.3; here it is again for reference:

Slogan 10.3.3
Term D⟦ t ⟧ maps input changes to output changes. That is, D⟦ t ⟧, applied to initial base inputs and valid input changes (from initial inputs to updated inputs), gives a valid output change from t applied on old inputs to t applied on new inputs.

In our slogan we did not specify what we meant by inputs, though we gave examples during the discussion. We have now the notions needed for a more precise statement. Term D⟦ t ⟧ must satisfy our slogan for two sorts of inputs:

1. Evaluating D⟦ t ⟧ must map an environment change dρ from ρ1 to ρ2 into a valid result change ⟦ D⟦ t ⟧ ⟧ dρ, going from ⟦ t ⟧ ρ1 to ⟦ t ⟧ ρ2.

2. As we learned since stating our slogan, validity is defined by recursion over types. If term t has type σ → τ, change ⟦ D⟦ t ⟧ ⟧ dρ can in turn be a (valid) function change (Remark 12.1.22). Function changes map valid changes for their inputs to valid changes for their outputs.


Instead of saying that ⟦ D⟦ t ⟧ ⟧ maps dρ ▷ ρ1 → ρ2 : Γ to a change from ⟦ t ⟧ ρ1 to ⟦ t ⟧ ρ2, we can say that function ⟦ t ⟧∆ = λρ1 dρ → ⟦ D⟦ t ⟧ ⟧ dρ must be a nil change for ⟦ t ⟧, that is, a derivative for ⟦ t ⟧. We give a name to this function change, and state D⟦ – ⟧'s correctness theorem.

Definition 12.2.1 (Incremental semantics)
We define the incremental semantics of a well-typed term Γ ⊢ t : τ in terms of differentiation as:

⟦ t ⟧∆ = (λρ1 dρ → ⟦ D⟦ t ⟧ ⟧ dρ) : ⟦ Γ ⟧ → ⟦ ∆Γ ⟧ → ⟦ ∆τ ⟧

Theorem 12.2.2 (D⟦ – ⟧ is correct)
Function ⟦ t ⟧∆ is a derivative of ⟦ t ⟧. That is, if Γ ⊢ t : τ and dρ ▷ ρ1 → ρ2 : Γ, then ⟦ D⟦ t ⟧ ⟧ dρ ▷ ⟦ t ⟧ ρ1 → ⟦ t ⟧ ρ2 : τ.

For now, we discuss this statement further; we defer the proof to Sec. 12.2.2.

Remark 12.2.3 (Why ⟦ – ⟧∆ ignores ρ1)
You might wonder why ⟦ t ⟧∆ = λρ1 dρ → ⟦ D⟦ t ⟧ ⟧ dρ appears to ignore ρ1. But for all dρ ▷ ρ1 → ρ2 : Γ, change environment dρ extends ρ1, which hence provides no further information. We are only interested in applying ⟦ t ⟧∆ to valid environment changes dρ, so ⟦ t ⟧∆ ρ1 dρ can safely ignore ρ1.

Remark 12.2.4 (Term derivatives)
In Chapter 10, we suggested that D⟦ t ⟧ only produced a derivative for closed terms, not for open ones. But ⟦ t ⟧∆ = λρ dρ → ⟦ D⟦ t ⟧ ⟧ dρ is always a nil change and derivative of ⟦ t ⟧, for any Γ ⊢ t : τ. There is no contradiction, because the value of D⟦ t ⟧ is ⟦ D⟦ t ⟧ ⟧ dρ, which is only a nil change if dρ is a nil change as well. In particular, for closed terms (Γ = ε), dρ must equal the empty environment, hence a nil change. If τ is a function type, df = ⟦ D⟦ t ⟧ ⟧ dρ accepts further inputs; since df must be a valid function change, it will also map them to valid outputs, as required by our Slogan 10.3.3. Finally, if Γ = ε and τ is a function type, then df = ⟦ D⟦ t ⟧ ⟧ is a derivative of f = ⟦ t ⟧.

We summarize this remark with the following definition and corollary.

Definition 12.2.5 (Derivatives of terms)
For all closed terms of function type ⊢ t : σ → τ, we call D⟦ t ⟧ the (term) derivative of t.

Corollary 12.2.6 (Term derivatives evaluate to derivatives)
For all closed terms of function type ⊢ t : σ → τ, function ⟦ D⟦ t ⟧ ⟧ is a derivative of ⟦ t ⟧.

Proof. Because ⟦ t ⟧∆ is a derivative (Theorem 12.2.2), and applying derivative ⟦ t ⟧∆ to a nil change gives a derivative (Lemma 12.1.13). □

Remark 12.2.7
We typically talk about a derivative of a function value f : A → B, not the derivative, since multiple different functions can satisfy the specification of derivatives. We talk about the derivative to refer to a canonically chosen derivative. For terms and their semantics, the canonical derivative is the one produced by differentiation. For language primitives, the canonical derivative is the one chosen by the language plugin under consideration.

Theorem 12.2.2 only makes sense if D⟦ – ⟧ has the right static semantics:


Lemma 12.2.8 (Typing of D⟦ – ⟧)
Typing rule

  Γ ⊢ t : τ
  ────────────────── Derive
  ∆Γ ⊢ D⟦ t ⟧ : ∆τ

is derivable.

After we define ⊕ in the next chapter, we'll be able to relate ⊕ to validity, by proving Lemma 13.4.4, which we state in advance here:

Lemma 13.4.4 (⊕ agrees with validity)
If dv ▷ v1 → v2 : τ then v1 ⊕ dv = v2.

Hence, updating base result ⟦ t ⟧ ρ1 by change ⟦ D⟦ t ⟧ ⟧ dρ via ⊕ gives the updated result ⟦ t ⟧ ρ2:

Corollary 13.4.5 (D⟦ – ⟧ is correct, corollary)
If Γ ⊢ t : τ and dρ ▷ ρ1 → ρ2 : Γ then ⟦ t ⟧ ρ1 ⊕ ⟦ D⟦ t ⟧ ⟧ dρ = ⟦ t ⟧ ρ2.

We anticipate the proof of this corollary:

Proof. First, differentiation is correct (Theorem 12.2.2), so under the hypotheses

⟦ D⟦ t ⟧ ⟧ dρ ▷ ⟦ t ⟧ ρ1 → ⟦ t ⟧ ρ2 : τ;

that judgement implies the thesis

⟦ t ⟧ ρ1 ⊕ ⟦ D⟦ t ⟧ ⟧ dρ = ⟦ t ⟧ ρ2

because ⊕ agrees with validity (Lemma 13.4.4). □

12.2.1 Plugin requirements

Differentiation is extended by plugins on constants, so plugins must prove their extensions correct.

Plugin Requirement 12.2.9 (Typing of DC⟦ – ⟧)
For all ⊢C c : τ, the plugin defines DC⟦ c ⟧ satisfying ⊢ DC⟦ c ⟧ : ∆τ.

Plugin Requirement 12.2.10 (Correctness of DC⟦ – ⟧)
For all ⊢C c : τ, DC⟦ c ⟧ is a derivative for ⟦ c ⟧.

Since constants are typed in the empty context, and the only change for an empty environment is an empty environment, Plugin Requirement 12.2.10 means that for all ⊢C c : τ we have

DC⟦ c ⟧ ▷ ⟦ c ⟧ → ⟦ c ⟧ : τ.
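For a concrete instance (ours, not from the text): a hypothetical plugin providing base type Z with ∆Z = Z and a primitive succ : Z → Z (with ⟦ succ ⟧ a = a + 1) could define DC⟦ succ ⟧ = λa da → da. This satisfies the requirement: given da ▷ a1 → a2 : Z, that is a2 = a1 + da, we have da ▷ a1 + 1 → a2 + 1 : Z, hence DC⟦ succ ⟧ ▷ ⟦ succ ⟧ → ⟦ succ ⟧ : Z → Z.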

12.2.2 Correctness proof

We next recall D⟦ – ⟧'s definition and prove it satisfies its correctness statement, Theorem 12.2.2.


Definition 10.5.1 (Differentiation)
Differentiation is the following term transformation:

D⟦ λ(x : σ) → t ⟧ = λ(x : σ) (dx : ∆σ) → D⟦ t ⟧
D⟦ s t ⟧ = D⟦ s ⟧ t D⟦ t ⟧
D⟦ x ⟧ = dx
D⟦ c ⟧ = DC⟦ c ⟧

where DC⟦ c ⟧ defines differentiation on primitives and is provided by language plugins (see Appendix A.2.5), and dx stands for a variable generated by prefixing x's name with d, so that D⟦ y ⟧ = dy and so on.
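As a small worked example (ours): for t = λ(x : σ) → f (f x), with f bound in the context, the equations above give

D⟦ λ(x : σ) → f (f x) ⟧ = λ(x : σ) (dx : ∆σ) → df (f x) (df x dx):

the inner application contributes both the base argument f x and its change df x dx, which the derivative df of the outer call consumes.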

Before correctness, we prove Lemma 12.2.8:

Lemma 12.2.8 (Typing of D⟦ – ⟧)
Typing rule

  Γ ⊢ t : τ
  ────────────────── Derive
  ∆Γ ⊢ D⟦ t ⟧ : ∆τ

is derivable.

Proof. The thesis can be proven by induction on the typing derivation Γ ⊢ t : τ. The case for constants is delegated to plugins in Plugin Requirement 12.2.9. □

We prove Theorem 12.2.2 using a typical logical relations strategy. We proceed by induction on term t, and prove for each case that if D⟦ – ⟧ preserves validity on subterms of t, then also D⟦ t ⟧ preserves validity. Hence, if the input environment change dρ is valid, then the result of differentiation evaluates to valid change ⟦ D⟦ t ⟧ ⟧ dρ.

Readers familiar with logical relations proofs should be able to reproduce this proof on their own, as it is rather standard, once one uses the given definitions. In particular, this proof resembles closely the proof of the abstraction theorem or relational parametricity (as given by Wadler [1989, Sec. 6] or by Bernardy and Lasson [2011, Sec. 3.3, Theorem 3]), and the proof of the fundamental theorem of logical relations by Statman [1985].

Nevertheless, we spell this proof out, and use it to motivate how D⟦ – ⟧ is defined, more formally than we did in Sec. 10.5. For each case, we first give a short proof sketch, and then redo the proof in more detail to make the proof easier to follow.

Theorem 12.2.2 (D⟦ – ⟧ is correct)
Function ⟦ t ⟧∆ is a derivative of ⟦ t ⟧. That is, if Γ ⊢ t : τ and dρ ▷ ρ1 → ρ2 : Γ, then ⟦ D⟦ t ⟧ ⟧ dρ ▷ ⟦ t ⟧ ρ1 → ⟦ t ⟧ ρ2 : τ.

Proof. By induction on typing derivation Γ ⊢ t : τ.

• Case Γ ⊢ x : τ. The thesis is that ⟦ D⟦ x ⟧ ⟧ is a derivative for ⟦ x ⟧, that is ⟦ D⟦ x ⟧ ⟧ dρ ▷ ⟦ x ⟧ ρ1 → ⟦ x ⟧ ρ2 : τ. Since dρ is a valid environment change from ρ1 to ρ2, ⟦ dx ⟧ dρ is a valid change from ⟦ x ⟧ ρ1 to ⟦ x ⟧ ρ2. Hence, defining D⟦ x ⟧ = dx satisfies our thesis.

• Case Γ ⊢ s t : τ. The thesis is that ⟦ D⟦ s t ⟧ ⟧ is a derivative for ⟦ s t ⟧, that is ⟦ D⟦ s t ⟧ ⟧ dρ ▷ ⟦ s t ⟧ ρ1 → ⟦ s t ⟧ ρ2 : τ. By inversion of typing, there is some type σ such that Γ ⊢ s : σ → τ and Γ ⊢ t : σ.


To prove the thesis, in short, you can apply the inductive hypothesis to s and t, obtaining respectively that ⟦ D⟦ s ⟧ ⟧ and ⟦ D⟦ t ⟧ ⟧ are derivatives for ⟦ s ⟧ and ⟦ t ⟧. In particular, ⟦ D⟦ s ⟧ ⟧ evaluates to a validity-preserving function change. Term D⟦ s t ⟧, that is D⟦ s ⟧ t D⟦ t ⟧, applies validity-preserving function D⟦ s ⟧ to valid input change D⟦ t ⟧, and this produces a valid change for s t, as required.

In detail, our thesis is that for all dρ ▷ ρ1 → ρ2 : Γ we have

⟦ D⟦ s t ⟧ ⟧ dρ ▷ ⟦ s t ⟧ ρ1 → ⟦ s t ⟧ ρ2 : τ,

where ⟦ s t ⟧ ρ = (⟦ s ⟧ ρ) (⟦ t ⟧ ρ) and

⟦ D⟦ s t ⟧ ⟧ dρ
= ⟦ D⟦ s ⟧ t D⟦ t ⟧ ⟧ dρ
= (⟦ D⟦ s ⟧ ⟧ dρ) (⟦ t ⟧ dρ) (⟦ D⟦ t ⟧ ⟧ dρ)
= (⟦ D⟦ s ⟧ ⟧ dρ) (⟦ t ⟧ ρ1) (⟦ D⟦ t ⟧ ⟧ dρ)

The last step relies on ⟦ t ⟧ dρ = ⟦ t ⟧ ρ1. Since weakening preserves meaning (Lemma A.2.8), this follows because dρ : ⟦ ∆Γ ⟧ extends ρ1 : ⟦ Γ ⟧ and t can be typed in context Γ.

Our thesis becomes

⟦ D⟦ s ⟧ ⟧ dρ (⟦ t ⟧ ρ1) (⟦ D⟦ t ⟧ ⟧ dρ) ▷ ⟦ s ⟧ ρ1 (⟦ t ⟧ ρ1) → ⟦ s ⟧ ρ2 (⟦ t ⟧ ρ2) : τ.

By the inductive hypothesis on s and t we have

⟦ D⟦ s ⟧ ⟧ dρ ▷ ⟦ s ⟧ ρ1 → ⟦ s ⟧ ρ2 : σ → τ

⟦ D⟦ t ⟧ ⟧ dρ ▷ ⟦ t ⟧ ρ1 → ⟦ t ⟧ ρ2 : σ.

Since ⟦ s ⟧ is a function, validity of its change ⟦ D⟦ s ⟧ ⟧ dρ means

∀da ▷ a1 → a2 : σ. ⟦ D⟦ s ⟧ ⟧ dρ a1 da ▷ ⟦ s ⟧ ρ1 a1 → ⟦ s ⟧ ρ2 a2 : τ.

Instantiating in this statement the hypothesis da ▷ a1 → a2 : σ by ⟦ D⟦ t ⟧ ⟧ dρ ▷ ⟦ t ⟧ ρ1 → ⟦ t ⟧ ρ2 : σ gives the thesis.

• Case Γ ⊢ λx → t : σ → τ. By inversion of typing, Γ, x : σ ⊢ t : τ. By typing of D⟦ – ⟧ you can show that

∆Γ, x : σ, dx : ∆σ ⊢ D⟦ t ⟧ : ∆τ.

In short, our thesis is that ⟦ λx → t ⟧∆ = λρ1 dρ → ⟦ λx dx → D⟦ t ⟧ ⟧ dρ is a derivative of ⟦ λx → t ⟧. After a few simplifications, our thesis reduces to

⟦ D⟦ t ⟧ ⟧ (dρ, x = a1, dx = da) ▷ ⟦ t ⟧ (ρ1, x = a1) → ⟦ t ⟧ (ρ2, x = a2) : τ

for all dρ ▷ ρ1 → ρ2 : Γ and da ▷ a1 → a2 : σ. But then, the thesis is simply that ⟦ t ⟧∆ is the derivative of ⟦ t ⟧, which is true by inductive hypothesis.

More in detail, our thesis is that ⟦ λx → t ⟧∆ is a derivative for ⟦ λx → t ⟧, that is

∀dρ ▷ ρ1 → ρ2 : Γ. ⟦ D⟦ λx → t ⟧ ⟧ dρ ▷ ⟦ λx → t ⟧ ρ1 → ⟦ λx → t ⟧ ρ2 : σ → τ.   (12.1)


By simplifying, the thesis Eq. (12.1) becomes

∀dρ ▷ ρ1 → ρ2 : Γ.
  λa1 da → ⟦ D⟦ t ⟧ ⟧ (dρ, x = a1, dx = da) ▷
    (λa1 → ⟦ t ⟧ (ρ1, x = a1)) → (λa2 → ⟦ t ⟧ (ρ2, x = a2)) : σ → τ.   (12.2)

By definition of validity on function types, the thesis Eq. (12.2) becomes

∀dρ ▷ ρ1 → ρ2 : Γ. ∀da ▷ a1 → a2 : σ.
  ⟦ D⟦ t ⟧ ⟧ (dρ, x = a1, dx = da) ▷ ⟦ t ⟧ (ρ1, x = a1) → ⟦ t ⟧ (ρ2, x = a2) : τ.   (12.3)

To prove the rewritten thesis Eq. (12.3), take the inductive hypothesis on t: it says that ⟦ D⟦ t ⟧ ⟧ is a derivative for ⟦ t ⟧, so ⟦ D⟦ t ⟧ ⟧ maps valid environment changes on Γ, x : σ to valid changes on τ. But by inversion of the validity judgment, all valid environment changes on Γ, x : σ can be written as

(dρ, x = a1, dx = da) ▷ (ρ1, x = a1) → (ρ2, x = a2) : Γ, x : σ

for valid changes dρ ▷ ρ1 → ρ2 : Γ and da ▷ a1 → a2 : σ. So the inductive hypothesis is that

∀dρ ▷ ρ1 → ρ2 : Γ. ∀da ▷ a1 → a2 : σ.
  ⟦ D⟦ t ⟧ ⟧ (dρ, x = a1, dx = da) ▷ ⟦ t ⟧ (ρ1, x = a1) → ⟦ t ⟧ (ρ2, x = a2) : τ.   (12.4)

But that is exactly our thesis Eq. (12.3), so we're done.

• Case Γ ⊢ c : τ. In essence, since weakening preserves meaning, we can rewrite the thesis to match Plugin Requirement 12.2.10.

In more detail, the thesis is that DC⟦ c ⟧ is a derivative for c, that is, if dρ ▷ ρ1 → ρ2 : Γ then ⟦ D⟦ c ⟧ ⟧ dρ ▷ ⟦ c ⟧ ρ1 → ⟦ c ⟧ ρ2 : τ. Since constants don't depend on the environment and weakening preserves meaning (Lemma A.2.8), and by the definitions of ⟦ – ⟧ and D⟦ – ⟧ on constants, the thesis can be simplified to

DC⟦ c ⟧ ▷ ⟦ c ⟧ → ⟦ c ⟧ : τ,

which is delegated to plugins in Plugin Requirement 12.2.10. □

12.3 Discussion

12.3.1 The correctness statement

We might have asked for the following correctness property:

Theorem 12.3.1 (Incorrect correctness statement)
If Γ ⊢ t : τ and ρ1 ⊕ dρ = ρ2 then (⟦ t ⟧ ρ1) ⊕ (⟦ D⟦ t ⟧ ⟧ dρ) = (⟦ t ⟧ ρ2).

However, this property is not quite right. We can only prove correctness if we restrict the statement to input changes dρ that are valid. Moreover, to prove this statement by induction we need to strengthen its conclusion: we require that the output change ⟦ D⟦ t ⟧ ⟧ dρ is also valid. To


see why, consider term (λx → s) t. Here the output of t is an input of s. Similarly, in D⟦ (λx → s) t ⟧ the output of D⟦ t ⟧ becomes an input change for subterm D⟦ s ⟧, and D⟦ s ⟧ behaves correctly only if D⟦ t ⟧ produces a valid change.

Typically, change types contain values that are invalid in some sense, but incremental programs will preserve validity. In particular, valid changes between functions are in turn functions that take valid input changes to valid output changes. This is why we formalize validity as a logical relation.

12.3.2 Invalid input changes

To see concretely why invalid changes, in general, can cause derivatives to produce incorrect results, consider again grandTotal = λxs ys → sum (merge xs ys) from Sec. 10.2. Suppose a bag change dxs removes an element 20 from input bag xs, while dys makes no changes to ys: in this case, the output should decrease, so dz = dgrandTotal xs dxs ys dys should be −20. However, that is only correct if 20 is actually an element of xs. Otherwise, xs ⊕ dxs will make no change to xs, hence the correct output change dz would be 0 instead of −20. Similar but trickier issues apply with function changes; see also Sec. 15.2.
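A minimal sketch of this failure in Haskell (our encoding: bags modeled as lists and a bag change as a list of elements to remove; not the thesis's bag implementation):

import Data.List (delete)

type Bag = [Integer]

grandTotal :: Bag -> Bag -> Integer
grandTotal xs ys = sum xs + sum ys

-- delete is a no-op when the element is absent, which is exactly the case
-- in which such a removal change is invalid.
applyRemovals :: Bag -> [Integer] -> Bag
applyRemovals = foldl (flip delete)

-- This derivative trusts the change to be valid: every removal is assumed
-- to hit an element actually present in the bag.
dgrandTotal :: Bag -> [Integer] -> Bag -> [Integer] -> Integer
dgrandTotal _xs dxs _ys dys = negate (sum dxs + sum dys)

-- With xs = [1,2,3]: dgrandTotal xs [20] [] [] = -20, yet
-- applyRemovals xs [20] = [1,2,3], so the correct output change is 0.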

12.3.3 Alternative environment changes

Environment changes can also be defined differently. We will use this alternative definition later (in Appendix D.2.2).

A change dρ from ρ1 to ρ2 contains a copy of ρ1. Thanks to this copy, we can use an environment change as the environment for the result of differentiation: that is, we can evaluate D⟦ t ⟧ with environment dρ, and Definition 12.2.1 can define ⟦ t ⟧∆ as λρ1 dρ → ⟦ D⟦ t ⟧ ⟧ dρ.

But we could adapt definitions to omit the copy of ρ1 from dρ, by setting

∆(Γ, x : τ) = ∆Γ, dx : ∆τ

and adapting other definitions. Evaluating D⟦ t ⟧ still requires base inputs; we could then set ⟦ t ⟧∆ = λρ1 dρ → ⟦ D⟦ t ⟧ ⟧ (ρ1, dρ), where ρ1, dρ simply merges the two environments appropriately (we omit a formal definition). This is the approach taken by Cai et al. [2014]. When proving Theorem 12.2.2, using one or the other definition for environment changes makes little difference; if we embed the base environment in environment changes, we reduce noise because we need not define environment merging formally.

Later (in Appendix D.2.2) we will deal with environments explicitly, and manipulate them in programs. Then we will use this alternative definition for environment changes, since it will be convenient to store base environments separately from environment changes.

12.3.4 Capture avoidance

Differentiation generates new names, so a correct implementation must prevent accidental capture. Till now we have ignored this problem.

Using de Bruijn indexes Our mechanization has no capture issues because it uses de Bruijn indexes. Change contexts simply alternate variables for base inputs and input changes. A context such as Γ = x : Z, y : Bool is encoded as Γ = Z, Bool; its change context is ∆Γ = Z, ∆Z, Bool, ∆Bool. This solution is correct and robust, and is the one we rely on.

Alternatively, we can mechanize ILC using a separate syntax for change terms dt, one with separate namespaces for base variables and change variables:


ds, dt ::= dc
         | λ(x : σ) (dx : ∆σ) → dt
         | ds t dt
         | dx

In that case, change variables live in a separate namespace. Example context Γ = Z, Bool gives rise to a different sort of change context, ∆Γ = ∆Z, ∆Bool. And a change term in context Γ is evaluated with separate environments for Γ and ∆Γ. This is appealing because it allows defining differentiation and proving it correct without using weakening and applying its proof of soundness. We still need to use weakening to convert change terms to their equivalents in the base language, but proving that conversion correct is more straightforward.

Using names Next, we discuss issues in implementing this transformation with names rather than de Bruijn indexes. Using names rather than de Bruijn indexes makes terms more readable; this is also why in this thesis we use names in our on-paper formalization.

Unlike the rest of this chapter, we keep this discussion informal, also because we have not mechanized any definitions using names (as may be possible using nominal logic), nor attempted formal proofs. The rest of the thesis does not depend on this material, so readers might want to skip to the next section.

Using names introduces the risk of capture, as is common for name-generating transformations [Erdweg et al. 2014]. For instance, differentiating term t = λx → f dx gives D⟦ t ⟧ = λx dx → df dx ddx. Here, variable dx represents a base input and is free in t, yet it is incorrectly captured in D⟦ t ⟧ by the other variable dx, the one representing x's change. Differentiation gives instead a correct result if we α-rename x in t to any other name (more on that in a moment).

A few workarounds and fixes are possible:

• As a workaround, we can forbid names starting with the letter d for variables in base terms, as we do in our examples; that's formally correct but pretty unsatisfactory and inelegant. With this approach, our term t = λx → f dx is simply forbidden.

• As a better workaround, instead of prefixing variable names with d, we can add change variables as a separate construct to the syntax of variables, and forbid differentiation on terms containing change variables. This is a variant of the earlier approach using separate change terms. While we used this approach in our prototype implementation in Scala [Cai et al. 2014], it makes our output language annoyingly non-standard. Converting to a standard language using names (not de Bruijn indexes) raises again capture issues.

• We can try to α-rename existing bound variables, as in the implementation of capture-avoiding substitution. As mentioned, in our case we can rename bound variable x to y and get t′ = λy → f dx. Differentiation gives the correct result D⟦ t′ ⟧ = λy dy → df dx ddx. In general, we can define D⟦ λx → t ⟧ = λy dy → D⟦ t [x = y] ⟧, where neither y nor dy appears free in t; that is, we search for a fresh variable y (which, being fresh, does not appear anywhere else) such that also dy does not appear free in t.
This solution is however subtle: it reuses ideas from capture-avoiding substitution, which is well-known to be subtle. We have not attempted to formally prove such a solution correct (or even test it) and have no plan to do so.

• Finally, and most easily, we can α-rename new bound variables, the ones used to refer to changes, or rather only pick them fresh. But if we pick, say, fresh variable dx1 to refer to the change of variable x, we must use dx1 consistently for every occurrence of x, so that


D⟦ λx → x ⟧ is not λdx1 → dx2. Hence, we extend D⟦ – ⟧ to also take a map from names to names, as follows:

D⟦ λ(x : σ) → t, m ⟧ = λ(x : σ) (dx : ∆σ) → D⟦ t, (m [x → dx ]) ⟧
D⟦ s t, m ⟧ = D⟦ s, m ⟧ t D⟦ t, m ⟧
D⟦ x, m ⟧ = m(x)
D⟦ c, m ⟧ = DC⟦ c ⟧

where m(x) represents lookup of x in map m, dx is now a fresh variable that does not appear in t, and m [x → dx ] extends m with a new mapping from x to dx.

But this approach, that is, using a map from base variables to change variables, affects the interface of differentiation. In particular, it affects which variables are free in output terms, hence we must also update the definition of ∆Γ and the derived typing rule Derive. With this approach, if term s is closed, then D⟦ s, emptyMap ⟧ gives a result α-equivalent to the old D⟦ s ⟧, as long as s triggers no capture issues. But if instead s is open, invoking D⟦ s, emptyMap ⟧ is not meaningful: we must pass an initial map m containing mappings from s's free variables to fresh variables for their changes. These change variables appear free in D⟦ s, m ⟧, hence we must update typing rule Derive and modify ∆Γ to use m.

We define ∆mΓ by adding m as a parameter to ∆Γ, and use it in a modified rule Derive′:

∆m ε = ε
∆m (Γ, x : τ) = ∆m Γ, x : τ, m(x) : ∆τ

  Γ ⊢ t : τ
  ─────────────────────── Derive′
  ∆m Γ ⊢ D⟦ t, m ⟧ : ∆τ

We conjecture that Derive′ holds and that D⟦ t, m ⟧ is correct, but we have attempted no formal proof.

12.4 Plugin requirement summary

For reference, we repeat here the plugin requirements spread through the chapter.

Plugin Requirement 12.1.16 (Basic change structures on base types)
For each base type ι, the plugin defines a basic change structure on ⟦ ι ⟧ that we write ι̂.

Plugin Requirement 12.1.19 (Base change types)
For each base type ι, the plugin defines a change type ∆ι such that ∆⟦ ι ⟧ = ⟦ ∆ι ⟧.

Plugin Requirement 12.2.9 (Typing of DC⟦ – ⟧)
For all ⊢C c : τ, the plugin defines DC⟦ c ⟧ satisfying ⊢ DC⟦ c ⟧ : ∆τ.

Plugin Requirement 12.2.10 (Correctness of DC⟦ – ⟧)
For all ⊢C c : τ, DC⟦ c ⟧ is a derivative for ⟦ c ⟧.


12.5 Chapter conclusion

In this chapter, we have formally defined changes for values and environments of our language, and when changes are valid. Through these definitions, we have explained that D⟦ t ⟧ is correct, that is, that it maps changes to the input environment to changes to the output. All of this assumes, among other things, that language plugins define valid changes for their base types and derivatives for their primitives.

In the next chapter, we discuss the operations we provide to construct and use changes. These operations will be especially useful to provide derivatives of primitives.


Chapter 13

Change structures

In the previous chapter we have shown that evaluating the result of differentiation produces a valid change dv from the old output v1 to the new one v2. To compute v2 from v1 and dv, in this chapter we introduce formally the operator ⊕ mentioned earlier.

To define differentiation on primitives, plugins need a few operations on changes, not just ⊕: also ⊖, 0 and ⊚.

To formalize these operators and specify their behavior, we extend the notion of basic change structure into the notion of change structure in Sec. 13.1. The change structure for function spaces is not entirely intuitive, so we motivate it in Sec. 13.2. Then, we show how to take change structures on A and B and define new ones on A → B in Sec. 13.3. Using these structures, we finally show how, starting from change structures for base types, we define change structures for all types τ and contexts Γ in Sec. 13.4, completing the core theory of changes.

13.1 Formally defining ⊕ and change structures

In this section we define what a change structure on a set V is. A change structure V̂ extends a basic change structure V̂ with change operators ⊕, ⊖, 0 and ⊚. Change structures also require change operators to respect validity, as described below. Key properties of change structures follow in Sec. 13.1.2.

As usual, metavariables v, v1, v2 will range over elements of V, while dv, dv1, dv2 will range over elements of ∆V.

Let's first recall the change operators. Operator ⊕ ("oplus") updates a value with a change: if dv is a valid change from v1 to v2, then v1 ⊕ dv (read as "v1 updated by dv" or "v1 oplus dv") is guaranteed to return v2. Operator ⊖ ("ominus") produces a difference between two values: v2 ⊖ v1 is a valid change from v1 to v2. Operator 0 ("nil") produces nil changes: 0v is a nil change for v. Finally, change composition ⊚ ("ocompose") "pastes changes together": if dv1 is a valid change from v1 to v2 and dv2 is a valid change from v2 to v3, then dv1 ⊚ dv2 (read "dv1 composed with dv2") is a valid change from v1 to v3.

We summarize these descriptions in the following definition.

Definition 13.1.1
A change structure V̂ over a set V requires:

(a) A basic change structure for V (hence, change set ∆V and validity dv ▷ v1 → v2 : V).



(b) An update operation ⊕ : V → ∆V → V that updates a value with a change. Update must agree with validity: for all dv ▷ v1 → v2 : V, we require v1 ⊕ dv = v2.

(c) A nil change operation 0 : V → ∆V that must produce nil changes: for all v : V, we require 0v ▷ v → v : V.

(d) A difference operation ⊖ : V → V → ∆V that produces a change across two values: for all v1, v2 : V, we require v2 ⊖ v1 ▷ v1 → v2 : V.

(e) A change composition operation ⊚ : ∆V → ∆V → ∆V that composes together two changes, relative to a base value. Change composition must preserve validity: for all dv1 ▷ v1 → v2 : V and dv2 ▷ v2 → v3 : V, we require dv1 ⊚ dv2 ▷ v1 → v3 : V.

Notation 13.1.2
Operators ⊕ and ⊖ can be subscripted to highlight their base set, but we will usually omit such subscripts. Moreover, ⊕ is left-associative, so that v ⊕ dv1 ⊕ dv2 means (v ⊕ dv1) ⊕ dv2.

Finally, whenever we have a change structure such as Â, B̂, V̂ and so on, we write respectively A, B, V to refer to its base set.

13.1.1 Example: Group changes

As an example, we show next that each group induces a change structure on its carrier. This change structure also subsumes the basic change structures we saw earlier on integers.

Definition 13.1.3 (Change structure on groups Ĝ)
Given any group (G, e, +, −), we can define a change structure Ĝ on carrier set G as follows:

(a) The change set is G.

(b) Validity is defined as dg ▷ g1 → g2 : G if and only if g2 = g1 + dg.

(c) Change update coincides with +: g1 ⊕ dg = g1 + dg. Hence ⊕ agrees perfectly with validity: for all g1 ∈ G, all changes dg are valid from g1 to g1 ⊕ dg (that is, dg ▷ g1 → g1 ⊕ dg : G).

(d) We define difference as g2 ⊖ g1 = (−g1) + g2. Verifying g2 ⊖ g1 ▷ g1 → g2 : G reduces to verifying g1 + ((−g1) + g2) = g2, which follows from the group axioms.

(e) The only nil change is the identity element: 0g = e. Validity 0g ▷ g → g : G reduces to g + e = g, which follows from the group axioms.

(f) Change composition also coincides with +: dg1 ⊚ dg2 = dg1 + dg2. Let's prove that composition respects validity. Our hypothesis is dg1 ▷ g1 → g2 : G and dg2 ▷ g2 → g3 : G. Because ⊕ agrees perfectly with validity, the hypothesis is equivalent to g2 = g1 ⊕ dg1 and

g3 = g2 ⊕ dg2 = (g1 ⊕ dg1) ⊕ dg2.

Our thesis is dg1 ⊚ dg2 ▷ g1 → g3 : G, that is

g3 = g1 ⊕ (dg1 ⊚ dg2).

Hence, the thesis reduces to

(g1 ⊕ dg1) ⊕ dg2 = g1 ⊕ (dg1 ⊚ dg2),

hence to g1 + (dg1 + dg2) = (g1 + dg1) + dg2, which is just the associative law for group G. □

As we show later, group change structures are useful to support aggregation.
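A minimal sketch of this construction in Haskell (record and field names are ours): a record of change operators, instantiated to the group (Integer, 0, +, negate):

data ChangeStruct v dv = ChangeStruct
  { apply   :: v -> dv -> v    -- ⊕
  , diff    :: v -> v -> dv    -- ⊖ (first argument is the destination v2)
  , nilOf   :: v -> dv         -- 0
  , compose :: dv -> dv -> dv  -- ⊚
  }

intGroupCS :: ChangeStruct Integer Integer
intGroupCS = ChangeStruct
  { apply   = (+)                       -- g1 ⊕ dg = g1 + dg
  , diff    = \g2 g1 -> negate g1 + g2  -- g2 ⊖ g1 = (−g1) + g2
  , nilOf   = const 0                   -- 0g = e
  , compose = (+)                       -- dg1 ⊚ dg2 = dg1 + dg2
  }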


13.1.2 Properties of change structures

To understand better the definition of change structures, we present next a few lemmas following from this definition.

Lemma 13.1.4 (⊕ inverts ⊖)
⊕ inverts ⊖, that is

v1 ⊕ (v2 ⊖ v1) = v2

for any change structure V̂ and any values v1, v2 : V.

Proof. For change structures, we know v2 ⊖ v1 ▷ v1 → v2 : V, and v1 ⊕ (v2 ⊖ v1) = v2 follows.

More in detail: change dv = v2 ⊖ v1 is a valid change from v1 to v2 (because ⊖ produces valid changes, v2 ⊖ v1 ▷ v1 → v2 : V), so updating dv's source v1 with dv produces dv's destination v2 (because ⊕ agrees with validity, that is, if dv ▷ v1 → v2 : V then v1 ⊕ dv = v2). □

Lemma 13.1.5 (A change can't be valid for two destinations with the same source)
Given a change dv : ∆V and a source v1 : V, dv can only be valid with v1 as source for a single destination. That is, if dv ▷ v1 → v2a : V and dv ▷ v1 → v2b : V, then v2a = v2b.

Proof. The proof follows, intuitively, because ⊕ also maps change dv and its source v1 to its destination, and ⊕ is a function.

More technically, since ⊕ respects validity, the hypotheses mean that v2a = v1 ⊕ dv = v2b, as required. □

Beware that changes can be valid for multiple sources, associating them to different destinations. For instance, integer 0 is a valid change for all integers.

If a change dv has source v, dv's destination equals v ⊕ dv. So, to specify that dv is valid with source v without mentioning dv's destination, we introduce the following definition.

Definition 13.1.6 (One-sided validity)
We define relation VV as {(v, dv) ∈ V × ∆V | dv ▷ v → v ⊕ dv : V}.

We use this definition right away:

Lemma 13.1.7 (⊚ and ⊕ interact correctly)
If (v1, dv1) ∈ VV and (v1 ⊕ dv1, dv2) ∈ VV then v1 ⊕ (dv1 ⊚ dv2) = v1 ⊕ dv1 ⊕ dv2.

Proof. We know that ⊚ preserves validity, so under the hypotheses (v1, dv1) ∈ VV and (v1 ⊕ dv1, dv2) ∈ VV we get that dv = dv1 ⊚ dv2 is a valid change from v1 to v1 ⊕ dv1 ⊕ dv2:

dv1 ⊚ dv2 ▷ v1 → v1 ⊕ dv1 ⊕ dv2 : V.

Hence, updating dv's source v1 with dv produces dv's destination v1 ⊕ dv1 ⊕ dv2:

v1 ⊕ (dv1 ⊚ dv2) = v1 ⊕ dv1 ⊕ dv2. □

We can define 0 in terms of the other operations, and prove it satisfies its requirements for change structures.

Lemma 13.1.8 (0 can be derived from ⊖)
If we define 0v = v ⊖ v, then 0 produces valid changes as required (0v ▷ v → v : V), for any change structure V̂ and value v : V.

Proof. This follows from validity of ⊖ (v2 ⊖ v1 ▷ v1 → v2 : V), instantiated with v1 = v and v2 = v. □


Moreover, nil changes are a right identity element for ⊕:

Corollary 13.1.9 (Nil changes are identity elements)
Any nil change dv for a value v is a right identity element for ⊕, that is, we have v ⊕ dv = v for every set V with a basic change structure, every element v ∈ V, and every nil change dv for v.

Proof. This follows from Lemma 13.4.4 and the definition of nil changes. □

The converse of this theorem does not hold: there exist changes dv such that v ⊕ dv = v yet dv is not a valid change from v to v. More in general, v1 ⊕ dv = v2 does not imply that dv is a valid change. For instance, under earlier definitions for changes on naturals, if we take v = 0 and dv = −5, we have v ⊕ dv = v even though dv is not valid; examples of invalid changes on functions are discussed in Sec. 12.3.2 and Sec. 15.2. However, we prefer to define "dv is a nil change" as we do, to imply that dv is valid, not just a neutral element.

13.2 Operations on function changes, informally

13.2.1 Examples of nil changes

We have not defined any change structure yet, but we can already exhibit nil changes for some values, including a few functions.

Example 13.2.1
• An environment change for an empty environment (in ⟦ ε ⟧) must be an environment for the empty context ∆ε = ε, so it must be the empty environment. In other words, if and only if dρ ▷ ∅ → ∅ : ε, then and only then dρ = ∅; in particular, dρ is a nil change.

• If all values in an environment ρ have nil changes, the environment has a nil change dρ = 0ρ associating each value to a nil change. Indeed, take a context Γ and a suitable environment ρ : ⟦ Γ ⟧. For each typing assumption x : τ in Γ, assume we have a nil change 0v for v. Then we define 0ρ : ⟦ ∆Γ ⟧ by structural recursion on ρ as

0∅ = ∅
0(ρ, x = v) = 0ρ, x = v, dx = 0v

Then we can see that 0ρ is indeed a nil change for ρ, that is, 0ρ ▷ ρ → ρ : Γ.

• We have seen in Theorem 12.2.2 that whenever Γ ⊢ t : τ, ⟦ t ⟧ has nil change ⟦ t ⟧∆. Moreover, if we have an appropriate nil environment change dρ ▷ ρ → ρ : Γ (which we often do, as discussed above), then we also get a nil change ⟦ t ⟧∆ ρ dρ for ⟦ t ⟧ ρ:

⟦ t ⟧∆ ρ dρ ▷ ⟦ t ⟧ ρ → ⟦ t ⟧ ρ : τ.

In particular, for all closed well-typed terms ⊢ t : τ, we have

⟦ t ⟧∆ ∅ ∅ ▷ ⟦ t ⟧ ∅ → ⟦ t ⟧ ∅ : τ.


13.2.2 Nil changes on arbitrary functions

We have discussed how to find a nil change 0f for a function f if we know the intension of f, that is, its definition. What if we have only its extension, that is, its behavior? Can we still find 0f? That's necessary to implement 0 as an object-language function 0 from f to 0f, since such a function does not have access to f's implementation. That's also necessary to define 0 on metalanguage function spaces, and we look at this case first.

We seek a nil change 0f for an arbitrary metalanguage function f : A → B, where A and B are arbitrary sets; we assume a basic change structure on A → B, and will require A and B to support a few additional operations. We require that

0f ▷ f → f : A → B.   (13.1)

Equivalently, whenever da ▷ a1 → a2 : A, then 0f a1 da ▷ f a1 → f a2 : B. By Lemma 13.4.4, it follows that

f a1 ⊕ 0f a1 da = f a2,   (13.2)

where a1 ⊕ da = a2.

To define 0f, we solve Eq. (13.2). To understand how, we use an analogy: ⊕ and ⊖ are intended to resemble + and −. To solve f a1 + 0f a1 da = f a2, we subtract f a1 from both sides and write 0f a1 da = f a2 − f a1.

Similarly, here we use operator ⊖: 0f must equal

0f = λa1 da → f (a1 ⊕ da) ⊖ f a1.   (13.3)

Because b2 ⊖ b1 ▷ b1 → b2 : B for all b1, b2 : B, we can verify that 0f, as defined by Eq. (13.3), satisfies our original requirement Eq. (13.1), not just its weaker consequence Eq. (13.2).

We have shown that to define 0 on functions f : A → B, we can use ⊖ at type B. Without using f's intension, we are aware of no alternative. To ensure 0f is defined for all f, we require that change structures define ⊖. We can then define 0 as a derived operation via 0v = v ⊖ v, and verify that this derived definition satisfies the requirements for 0.

Next, we show how to define ⊖ on functions. We seek a valid function change f2 ⊖ f1 from f1 to f2. We have just sought and found a valid change from f to f; generalizing the reasoning we used, we obtain that whenever da ▷ a1 → a2 : A, then we need to have (f2 ⊖ f1) a1 da ▷ f1 a1 → f2 a2 : B. Since a2 = a1 ⊕ da, we can define

f2 ⊖ f1 = λa1 da → f2 (a1 ⊕ da) ⊖ f1 a1.   (13.4)

One can verify that Eq. (13.4) defines f2 ⊖ f1 as a valid function change from f1 to f2, as desired. And after defining f2 ⊖ f1, we need no more define 0f separately using Eq. (13.3). We can just define 0f = f ⊖ f, simplify through the definition of ⊖ in Eq. (13.4), and reobtain Eq. (13.3) as a derived equation:

0f = f ⊖ f = λa1 da → f (a1 ⊕ da) ⊖ f a1.

We defined f2 ⊖ f1 on metalanguage functions. We can also internalize change operators in our object language, that is, define for each type τ object-level terms ⊕τ, ⊖τ and so on, with the same behavior. We can define object-language change operators such as ⊖ using the same equations. However, the produced function change df is slow, because it recomputes the old output f1 a1, computes the new output f2 a2, and takes the difference.

However, we can implement ⊖σ→τ using replacement changes, if they are supported by the change structure on type τ. Let us define ⊖τ as b2 ⊖ b1 = !b2 and simplify Eq. (13.4); we obtain

f2 ⊖ f1 = λa1 da → !(f2 (a1 ⊕ da)).


We could even imagine allowing replacement changes on functions themselves. However, here the bang operator needs to be defined to produce a function change that can be applied, hence:

!f2 = λa1 da → !(f2 (a1 ⊕ da)).

Alternatively, as we see in Appendix D, we could represent function changes not as functions but as data, through defunctionalization, and provide a function applying defunctionalized function changes df : ∆(σ → τ) to inputs t1 : σ and dt : ∆σ. In this case, !f2 would simply be another way to produce defunctionalized function changes.

13.2.3 Constraining ⊕ on functions

Next, we discuss how ⊕ must be defined on functions, and show informally why we must define f1 ⊕ df = λv → f1 v ⊕ df v 0v to prove that ⊕ on functions agrees with validity (that is, Lemma 13.4.4).

We know that a valid function change df ▷ f1 → f2 : A → B takes valid input changes dv ▷ v1 → v2 : A to a valid output change df v1 dv ▷ f1 v1 → f2 v2 : B. We require that ⊕ agrees with validity (Lemma 13.4.4), so f2 = f1 ⊕ df, v2 = v1 ⊕ dv and

f2 v2 = (f1 ⊕ df) (v1 ⊕ dv) = f1 v1 ⊕ df v1 dv.   (13.5)

Instantiating dv with 0v gives equation

(f1 ⊕ df) v1 = (f1 ⊕ df) (v1 ⊕ 0v) = f1 v1 ⊕ df v1 0v,

which is not only a requirement on ⊕ for functions, but also defines ⊕ effectively.

13.3 Families of change structures

In this section, we derive change structures for A → B and A × B from two change structures Â and B̂. The change structure on A → B enables defining change structures for function types. Similarly, the change structure on A × B enables defining a change structure for product types in a language plugin, as described informally in Sec. 11.5. In Sec. 11.6 we also discuss informally change structures for disjoint sums. Formally, we can derive a change structure for the disjoint union of sets A + B (from change structures for A and B), and this enables defining change structures for sum types; we have mechanized the required proofs, but omit the tedious details here.

13.3.1 Change structures for function spaces

Sec. 13.2 introduces informally how to define change operations on A → B. Next, we define formally change structures on function spaces, and then prove that their operations respect their constraints.

Definition 13.3.1 (Change structure for A → B)
Given change structures Â and B̂, we define a change structure on their function space A → B, written Â → B̂, where:

(a) The change set is defined as ∆(A → B) = A → ∆A → ∆B.

(b) Validity is defined as

df ▷ f1 → f2 : A → B = ∀a1, da, a2. (da ▷ a1 → a2 : A) implies (df a1 da ▷ f1 a1 → f2 a2 : B).


(c) We define change update by

f1 ⊕ df = λa → f1 a ⊕ df a 0a.

(d) We define difference by

f2 ⊖ f1 = λa da → f2 (a ⊕ da) ⊖ f1 a.

(e) We define 0 like in Lemma 13.1.8, as

0f = f ⊖ f.

(f) We define change composition as

df1 ⊚ df2 = λa da → df1 a 0a ⊚ df2 a da.

Lemma 13.3.2
Definition 13.3.1 defines a correct change structure Â → B̂.

Proof. • We prove that ⊕ agrees with validity on A → B. Consider f1, f2 : A → B and df ▷ f1 → f2 : A → B; we must show that f1 ⊕ df = f2. By functional extensionality, we only need prove that (f1 ⊕ df) a = f2 a, that is, that f1 a ⊕ df a 0a = f2 a. Since ⊕ agrees with validity on B, we just need to show that df a 0a ▷ f1 a → f2 a : B, which follows because 0a is a valid change from a to a and because df is a valid change from f1 to f2.

• We prove that ⊖ produces valid changes on A → B. Consider df = f2 ⊖ f1 for f1, f2 : A → B. For any valid input da ▷ a1 → a2 : A, we must show that df produces a valid output with the correct vertexes, that is, that df a1 da ▷ f1 a1 → f2 a2 : B. Since ⊕ agrees with validity, a2 equals a1 ⊕ da. By substituting away a2 and df, the thesis becomes f2 (a1 ⊕ da) ⊖ f1 a1 ▷ f1 a1 → f2 (a1 ⊕ da) : B, which is true because ⊖ produces valid changes on B.

• 0 produces valid changes, as proved in Lemma 13.1.8.

• We prove that change composition preserves validity on A → B. That is, we must prove

df1 a1 0a1 ⊚ df2 a1 da ▷ f1 a1 → f3 a2 : B

for every f1, f2, f3, df1, df2, a1, da, a2 satisfying df1 ▷ f1 → f2 : A → B, df2 ▷ f2 → f3 : A → B and da ▷ a1 → a2 : A.

Because change composition preserves validity on B, it's enough to prove that (1) df1 a1 0a1 ▷ f1 a1 → f2 a1 : B; (2) df2 a1 da ▷ f2 a1 → f3 a2 : B. That is, intuitively, we create a composite change using ⊚, and it goes from f1 a1 to f3 a2 passing through f2 a1. Part (1) follows because df1 is a valid function change from f1 to f2, applied to a valid change 0a1 from a1 to a1. Part (2) follows because df2 is a valid function change from f2 to f3, applied to a valid change da from a1 to a2. □
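A sketch of this construction in Haskell (ours), reusing the ChangeStruct record from the earlier group sketch; each field transcribes the corresponding clause of Definition 13.3.1:

funCS :: ChangeStruct a da -> ChangeStruct b db
      -> ChangeStruct (a -> b) (a -> da -> db)
funCS ca cb = cs
  where
    cs = ChangeStruct
      { apply   = \f1 df -> \a -> apply cb (f1 a) (df a (nilOf ca a))
        -- f1 ⊕ df = λa → f1 a ⊕ df a 0a
      , diff    = \f2 f1 -> \a da -> diff cb (f2 (apply ca a da)) (f1 a)
        -- f2 ⊖ f1 = λa da → f2 (a ⊕ da) ⊖ f1 a
      , nilOf   = \f -> diff cs f f
        -- 0f = f ⊖ f
      , compose = \df1 df2 -> \a da ->
          compose cb (df1 a (nilOf ca a)) (df2 a da)
        -- df1 ⊚ df2 = λa da → df1 a 0a ⊚ df2 a da
      }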

104 Chapter 13 Change structures

1332 Change structures for productsWe can dene change structures on products A times B given change structures on A and B a changeon pairs is just a pair of changes all other change structure denitions distribute componentwisethe same way and their correctness reduce to the correctness on components

Change structures on n-ary products or records present no additional diculty

Denition 1333 (Change structure for A times B)Given change structures A and B we dene a change structure A times B on product A times B

(a) The change set is dened as ∆(A times B) = ∆A times ∆B

(b) Validity is dened as

(da db) B (a1 b1) rarr (a2 b2) A times B =(da B a1 rarr a2 A) and (db B b1 rarr b2 B)

In other words validity distributes componentwise a product change is valid if each compo-nent is valid

(c) We dene change update by

(a1 b1) oplus (da db) = (a1 oplus da b1 oplus db)

(d) We dene dierence by

(a2 b2) (a1 b1) = (a2 a1 b2 b1)

(e) We dene 0 to distribute componentwise

0ab = (0a 0b)

(f) We dene change composition to distribute componentwise

(da1 db1) (da2 db2) = (da1 da2 db1 db2)

Lemma 1334Denition 1333 denes a correct change structure A times B

Proof Since all these proofs are similar and spelling out their details does not make them clearerwe only give the rst such proof in full

bull oplus agrees with validity on A times B because oplus agrees with validity on both A and B For thisproperty we give a full proofFor each p1 p2 A times B and dp B p1 rarr p2 A times B we must show that p1 oplus dp = p2Instead of quantifying over pairs p A times B we can quantify equivalently over componentsa A b B Hence consider a1 a2 A b1 b2 B and changes da db that are valid that isda B a1 rarr a2 A and db B b1 rarr b2 B We must show that

(a1 b1) oplus (da db) = (a2 b2)

That follows from a1 oplus da = a2 (which follows from da B a1 rarr a2 A) and b1 oplus db = b2(which follows from db B b1 rarr b2 B)

Chapter 13 Change structures 105

bull produces valid changes on A times B because produces valid changes on both A and B Weomit a full proof the key step reduces the thesis

(a2 b2) (a1 b1) B (a1 b1) rarr (a2 b2) A times B

to a2 a1 B a1 rarr a2 A and b2 b1 B b1 rarr b2 B (where free variables range on suitabledomains)

bull 0ab is correct that is (0a 0b) B (a b) rarr (a b) AtimesB because 0 is correct on each component

bull Change composition is correct on A times B that is

(da1 da2 db1 db2) B (a1 b1) rarr (a3 b3) A times B

whenever (da1 db1) B (a1 b1) rarr (a2 b2) A times B and (da2 db2) B (a2 b2) rarr (a3 b3) A times Bin essence because change composition is correct on both A and B We omit details

134 Change structures for types and contextsAs promised given change structures for base types we can provide change structures for all types

Plugin Requirement 1341 (Change structures for base types)For each base type ι we must have a change structure ι dened on base set n ι o based on the basicchange structures dened earlier

Denition 1342 (Change structure for types)For each type τ we dene a change structure τ on base set nτ o

ι = σ rarr τ = σ rarr τ

Lemma 1343Change sets and validity as dened in Denition 1342 give rise to the same basic change structuresas the ones dened earlier in Denition 12115 and to the change operations described in Fig 131a

Proof This can be veried by induction on types For each case it is sucient to compare deni-tions

Lemma 1344 (oplus agrees with validity)If dv B v1 rarr v2 τ then v1 oplus dv = v2

Proof Because τ is a change structure and in change structures oplus agrees with validity

As shortly proved in Sec 122 since oplus agrees with validity (Lemma 1344) and D n ndash o is correct(Theorem 1222) we get Corollary 1345

Corollary 1345 (D n ndasho is correct corollary)If Γ ` t τ and dρ B ρ1 rarr ρ2 Γ then n t o ρ1 oplus nD n t o o dρ = n t o ρ2

106 Chapter 13 Change structures

We can also dene a change structure for environments Recall that change structures forproducts dene their operations to act on each component Each change structure operation forenvironments acts ldquovariable-wiserdquo Recall that a typing context Γ is a list of variable assignmentx τ For each such entry environments ρ and environment changes dρ contain a base entry x = vwhere v nτ o and possibly a change dx = dv where dv n∆τ o

Denition 1346 (Change structure for environments)To each context Γ we associate a change structure Γ that extends the basic change structure fromDenition 12125 Operations are dened as shown in Fig 131b

Base values v prime in environment changes are redundant with base values v in base environmentsbecause for valid changes v = v prime So when consuming an environment change we choose arbitrarilyto use v instead of v prime Alternatively we could also use v prime and get the same results for valid inputsWhen producing an environment change they are created to ensure validity of the resultingenvironment change

Lemma 1347Denition 1346 denes a correct change structure Γ for each context Γ

Proof All proofs are by structural induction on contexts Most details are analogous to the ones forproducts and add no details so we refer to our mechanization for most proofs

However we show by induction that oplus agrees with validity For the empty context therersquosa single environment n ε o so oplus returns the correct environment For the inductive caseΓprime x τ inversion on the validity judgment reduces our hypothesis to dv B v1 rarr v2 τ anddρ B ρ1 rarr ρ2 Γ and our thesis to (ρ1 x = v1) oplus (dρ x = v1 dx = dv) = (ρ2 x = v2) where v1appears both in the base environment and the environment change The thesis follows because oplusagrees with validity on both Γ and τ

We summarize denitions on types in Fig 131Finally we can lift change operators from the semantic level to the syntactic level so that their

meaning is coherent

Denition 1348 (Term-level change operators)We dene type-indexed families of change operators at the term level with the following signatures

oplusτ τ rarr ∆τ rarr ττ τ rarr τ rarr ∆τ0τ ndash τ rarr ∆ττ ∆τ rarr ∆τ rarr ∆τ

and denitions

tf 1 oplusσrarrτ dtf = λx rarr tf 1 x oplus dtf x 0xtf 2 σrarrτ tf 1 = λx dx rarr tf 2 (x oplus dx) tf 1 x0σrarrτ tf = tf σrarrτ tfdtf 1 σrarrτ dtf 2 = λx dx rarr dtf 1 x 0x dtf 2 x dxtf 1 oplusι dtf = tf 2 ι tf 1 = 0ι tf = dtf 1 ι dtf 2 =

Chapter 13 Change structures 107

Lemma 1349 (Term-level change operators agree with change structures)The following equations hold for all types τ contexts Γ well-typed terms Γ ` t1 t2 τ ∆Γ `dt dt1 dt2 ∆τ and environments ρ n Γ o dρ n∆Γ o such that all expressions are dened

n t1 oplusτ dt o dρ = n t1 o dρ oplus n dt o dρn t2 τ t1 o ρ = n t2 o ρ oplus n t1 o ρ0τ t

ρ = 0n t o ρ

n dt1 τ dt2 o dρ = n dt1 o dρ n dt2 o dρ

Proof By induction on types and simplifying both sides of the equalities The proofs for oplus and must be done by simultaneous induction

At the time of writing we have not mechanized the proof for To dene the lifting and prove it coherent on base types we must add a further plugin require-

mentPlugin Requirement 13410 (Term-level change operators for base types)For each base type ι we dene change operators as required by Denition 1348 and satisfyingrequirements for Lemma 1349

135 Development historyThe proof presented in this and the previous chapter is an signicant evolution of the original oneby Cai et al [2014] While this formalization and the mechanization are both original with thisthesis some ideas were suggested by other (currently unpublished) developments by Yufei Cai andby Yann Reacutegis-Gianas Yufei Cai gives a simpler pen-and-paper set-theoretic proof by separatingvalidity while we noticed separating validity works equally well in a mechanized type theory andsimplies the mechanization The rst to use a two-sided validity relation was Yann Reacutegis-Gianasbut using a big-step operational semantics while we were collaborating on an ILC correctness prooffor untyped λ-calculus (as in Appendix C) I gave the rst complete and mechanized ILC correctnessproof using two-sided validity again for a simply-typed λ-calculus with a denotational semanticsBased on two-sided validity I also reconstructed the rest of the theory of changes

136 Chapter conclusionIn this chapter we have seen how to dene change operators both on semantic values nad onterms and what are their guarantees via the notion of change structures We have also denedchange structures for groups function spaces and products Finally we have explained and shownCorollary 1345 We continue in next chapter by discussing how to reason syntactically aboutchanges before nally showing how to dene some language plugins

108 Chapter 13 Change structures

oplusτ nτ rarr ∆τ rarr τ oτ nτ rarr τ rarr ∆τ o

0ndash nτ rarr ∆τ oτ n∆τ rarr ∆τ rarr ∆τ o

f1 oplusσrarrτ df = λv rarr f1 v oplus df v 0vv1 oplusι dv = f2 σrarrτ f1 = λv dv rarr f2 (v oplus dv) f1 vv2 ι v1 = 0v = v τ vdv1 ι dv2 = df 1 σrarrτ df 2 = λv dv rarr df 1 v 0v df 2 v dv

(a) Change structure operations on types (see Denition 1342)oplusΓ n Γ rarr ∆Γ rarr Γ oΓ n Γ rarr Γ rarr ∆Γ o

0ndash n Γ rarr ∆Γ oΓ n∆Γ rarr ∆Γ rarr ∆Γ o

oplus =

(ρ x = v) oplus (dρ x = v prime dx = dv) = (ρ oplus dρ x = v oplus dv) =

(ρ2 x = v2) (ρ1 x = v1) = (ρ2 ρ1 x = v1 dx = v2 v1)0 =

0ρx=v = (0ρ x = v dx = 0v) = (dρ1 x = v1 dx = dv1) (dρ2 x = v2 dx = dv2) =(dρ1 dρ2 x = v1 dx = dv1 dv2)

(b) Change structure operations on environments (see Denition 1346)

Lemma 1344 (oplus agrees with validity)If dv B v1 rarr v2 τ then v1 oplus dv = v2

Corollary 1345 (D n ndasho is correct corollary)If Γ ` t τ and dρ B ρ1 rarr ρ2 Γ then n t o ρ1 oplus nD n t o o dρ = n t o ρ2

Figure 131 Dening change structures

Chapter 14

Equational reasoning on changes

In this chapter we formalize equational reasoning directly on terms rather than on semantic values(Sec 141) and we discuss when two changes can be considered equivalent (Sec 142) We also showas an example a simple change structure on lists and a derivative of map for it (Example 1411)

To reason on terms instead of describing the updated value of a term t by using an updatedenvironment ρ2 we substitute in t each variable xi with expression xi oplus dxi to produce a term thatcomputes the updated value of t so that we can say that dx is a change from x to x oplus dx or thatdf x dx is a change from f x to (f oplus df ) (x oplus dx) Lots of the work in this chapter is needed tomodify denitions and go from using an updated environment to using substitution in this fashion

To compare for equivalence terms that use changes we canrsquot use denotational equivalence butmust restrict to consider valid changes

Comparing changes is trickier most often we are not interested in whether two changes producethe same value but whether two changes have the same source and destination Hence if twochanges share source and destination we say they are equivalent As we show in this chapteroperations that preserve validity also respect change equivalence because for all those operationsthe source and destination of any output changes only depend on source and destination of inputchanges Among the same source and destination there often are multiple changes and the dierenceamong them can aect how long a derivative takes but not whether the result is correct

We also show that change equivalence is a particular form of logical relation a logical partialequivalence relation (PER) PERs are well-known to semanticists and often used to study sets withinvalid elements together with the appropriate equivalence on these sets

The rest of the chapter expands on the details even if they are not especially surprising

141 Reasoning on changes syntacticallyTo dene derivatives of primitives we will often discuss validity of changes directly on programsfor instance saying that dx is a valid change from x to x oplus dx or that f x oplus df x dx is equivalent tof (x oplus dx) if all changes in scope are valid

In this section we formalize these notions We have not mechanized proofs involving substitu-tions but we include them here even though they are not especially interesting

But rst we exemplify informally these notions

Example 1411 (Derivingmap on lists)Letrsquos consider again the example from Sec 1133 in particular dmap Recall that a list changedxs is valid for source xs if they have the same length and each element change is valid for its

109

110 Chapter 14 Equational reasoning on changes

corresponding element

map (ararr b) rarr List ararr List bmap f Nil = Nilmap f (Cons x xs) = Cons (f x) (map f xs)dmap (ararr b) rarr ∆(ararr b) rarr List ararr ∆List ararr ∆List b

-- A valid list change has the same length as the base listdmap f df Nil Nil = Nildmap f df (Cons x xs) (Cons dx dxs) =

Cons (df x dx) (dmap f df xs dxs)-- Remaining cases deal with invalid changes and a dummy-- result is sucient

dmap f df xs dxs = Nil

In our example one can show that dmap is a correct derivative for map As a consequenceterms map (f oplus df ) (xs oplus dxs) and map f xs oplus dmap f df xs dxs are interchangeable in all validcontexts that is contexts that bind df and dxs to valid changes respectively for f and xs

We sketch an informal proof directly on terms

Proof sketch We must show that dy = dmap f df xs dxs is a change change from initial outputy1 = map f xs to updated output y2 = map (f oplus df ) (xs oplus dxs) for valid inputs df and dxs

We proceed by structural induction on xs and dxs (technically on a proof of validity of dxs)Since dxs is valid those two lists have to be of the same length If xs and dxs are both emptyy1 = dy = y2 = Nil so dy is a valid change as required

For the inductive step both lists are Cons nodes so we need to show that output change

dy = dmap f df (Cons x xs) (Cons dx dxs)

is a valid change fromy1 = map f (Cons x xs)

toy2 = map (f oplus df ) (Cons (x oplus dx) (xs oplus dxs))

To restate validity we name heads and tails of dy y1 y2 If we write dy = Cons dh dt y1 =Cons h1 t1 and y2 = Cons h2 t2 we need to show that dh is a change from h1 to h2 and dt is a changefrom t1 to t2

Indeed head change dh = df x dx is a valid change from h1 = f x to h2 = f (x oplus dx) And tailchange dt = dmap f df xs dxs is a valid change from t1 = map f xs to t2 = map (f oplus df ) (xs oplus dxs)by induction Hence dy is a valid change from y1 to y2

Hopefully this proof is already convincing but it relies on undened concepts On a metalevelfunction we could already make this proof formal but not so on terms yet In this section wedene the required concepts

1411 Denotational equivalence for valid changesThis example uses the notion of denotational equivalence for valid changes We now proceed toformalize it For reference we recall denotational equivalence of terms and then introduce itsrestriction

Chapter 14 Equational reasoning on changes 111

Denition A25 (Denotational equivalence)We say that two terms Γ ` t1 τ and Γ ` t2 τ are denotationally equal and write Γ t1 t2 τ (orsometimes t1 t2) if for all environments ρ n Γ o we have that n t1 o ρ = n t2 o ρ

For open terms t1 t2 that depend on changes denotational equivalence is too restrictive since itrequires t1 and t2 to also be equal when the changes they depend on are not valid By restrictingdenotational equivalence to valid environment changes and terms to depend on contexts we obtainthe following denitionDenition 1412 (Denotational equivalence for valid changes)For any context Γ and type τ we say that two terms ∆Γ ` t1 τ and ∆Γ ` t2 τ are denotationallyequal for valid changes and write ∆Γ t1 ∆ t2 τ if for all valid environment changes dρ B ρ1 rarrρ2 Γ we have that t1 and t2 evaluate in environment dρ to the same value that is n t1 o dρ =n t2 o dρ

Example 1413Terms f x oplus df x dx and f (x oplus dx) are denotationally equal for valid changes (for any types σ τ )∆(f σ rarr τ x σ ) f x oplus df x dx ∆ f (x oplus dx) τ

Example 1414One of our claims in Example 1411 can now be written as

∆Γ map (f oplus df ) (xs oplus dxs) ∆ map f xs oplus dmap f df xs dxs List b

for a suitable context Γ = f List σ rarr List τ xs List σ map (σ rarr τ ) rarr List σ rarr List τ (andfor any types σ τ )

Arguably the need for a special equivalence is a defect in our semantics of change programs itmight be more preferable to make the type of changes abstract throughout the program (except forderivatives of primitives which must inspect derivatives) but this is not immediate especially in amodule system like Haskell Other possible alternatives are discussed in Sec 154

1412 Syntactic validityNext we dene syntactic validity that is when a change term dt is a (valid) change from sourceterm t1 to destination t2 Intuitively dt is valid from t1 to t2 if dt t1 and t2 evaluated all against thesame valid environment change dρ produce a valid change its source and destination FormallyDenition 1415 (Syntactic validity)We say that term ∆Γ ` dt ∆τ is a (syntactic) change from ∆Γ ` t1 τ to ∆Γ ` t2 τ and writeΓ |= dt I t1 rarr t2 τ if

foralldρ B ρ1 rarr ρ2 Γ n dt o dρ B n t1 o dρ rarr n t2 o dρ τ

Notation 1416We often simply say that dt is a change from t1 to t2 leaving everything else implicit when notimportant

Using syntactic validity we can show for instance that dx is a change from x to x oplus dx thatdf x dx is a change from f x to (f oplusdf ) (xoplusdx) both examples follow from a general statement aboutD n ndash o (Theorem 1419) Our earlier informal proof of the correctness of dmap (Example 1411)can also be justied in terms of syntactic validity

Just like (semantic) oplus agrees with validity term-level (or syntactic) oplus agrees with syntacticvalidity up to denotational equivalence for valid changes

112 Chapter 14 Equational reasoning on changes

Lemma 1417 (Term-level oplus agrees with syntactic validity)If dt is a change from t1 to t2 (Γ |= dt I t1 rarr t2 τ ) then t1 oplus dt and t2 are denotationally equal forvalid changes (∆Γ t1 oplus dt ∆ t2 τ )

Proof Follows because term-level oplus agrees with semantic oplus (Lemma 1349) and oplus agrees withvalidity In more detail x an arbitrary valid environment change dρ B ρ1 rarr ρ2 Γ First we haven dt o dρ B n t1 o dρ rarr n t2 o dρ τ because of syntactic validity Then we conclude with thiscalculation

n t1 oplus dt o dρ= term-level oplus agrees with oplus (Lemma 1349)

n t1 o dρ oplus n dt o dρ= oplus agrees with validity

n t2 o dρ

Beware that the denition of Γ |= dt I t1 rarr t2 τ evaluates all terms against changeenvironment dρ containing separately base values and changes In contrast if we use validity inthe change structure for n Γ orarr nτ o we evaluate dierent terms against dierent environmentsThat is why we have that dx is a change from x to x oplus dx (where x oplus dx is evaluated in environmentdρ) while n dx o is a valid change from n x o to n x o (where the destination n x o gets applied toenvironment ρ2)

Is syntactic validity trivial Without needing syntactic validity based on earlier chapters onecan show that dv is a valid change from v to v oplus dv or that df v dv is a valid change from f vto (f oplus df ) (v oplus dv) or further examples But thatrsquos all about values In this section we are justtranslating these notions to the level of terms and formalize them

Our semantics is arguably (intuitively) a trivial one similar to a metainterpreter interpretingobject-language functions in terms of metalanguage functions our semantics simply embeds anobject-level λ-calculus into a meta-level and more expressive λ-calculus mapping for instanceλf x rarr f x (an AST) into λf v rarr f v (syntax for a metalevel function) Hence proofs in thissection about syntactic validity deal mostly with this translation We donrsquot expect the proofs togive special insights and we expect most development would keep such issues informal (which iscertainly reasonable)

Nevertheless we write out the statements to help readers refer to them and write out (mostly)full proofs to help ourselves (and readers) verify them Proofs are mostly standard but with a fewtwists since we must often consider and relate three computations the computation on initialvalues and the ones on the change and on updated values

Wersquore also paying a proof debt Had we used substitution and small step semantics wersquod havedirectly simple statements on terms instead of trickier ones involving terms and environments Weproduce those statements now

Dierentiation and syntactic validity

We can also show that D n t o produces a syntactically valid change from t but we need to specifyits destination In general D n t o is not a change from t to t The destination must evaluate to theupdated value of t to produce a term that evaluates to the right value we use substitution If theonly free variable in t is x then D n t o is a syntactic change from t to t [x = x oplus dx ] To repeat thesame for all variables in context Γ we use the following notation

Chapter 14 Equational reasoning on changes 113

Notation 1418We write t [Γ = Γ oplus ∆Γ ] to mean t [x1 = x1 oplus dx1 x2 = x2 oplus dx2 xn = xn oplus dxn ]

Theorem 1419 (D n ndasho is correct syntactically)For any well-typed term Γ ` t τ term D n t o is a syntactic change from t to t [Γ = Γ oplus ∆Γ ]

We present the following straightforward (if tedious) proof (formal but not mechanized)

Proof Let t2 = t [Γ = Γ oplus ∆Γ ] Take any dρ B ρ1 rarr ρ2 Γ We must show that nD n t o o dρ Bn t o dρ rarr n t2 o dρ τ

Because dρ extend ρ1 and t only needs entries in ρ1 we can show that n t o dρ = n t o ρ1 soour thesis becomes nD n t o o dρ B n t o ρ1 rarr n t2 o dρ τ

Because D n ndash o is correct (Theorem 1222) we know that nD n t o o dρ B n t o ρ1 rarr n t o ρ2 τ thatrsquos almost our thesis so we must only show that n t2 o dρ = n t o ρ2 Since oplus agrees with validityand dρ is valid we have that ρ2 = ρ1 oplus dρ so our thesis is now the following equation which weleave to Lemma 14110

n t o [Γ = Γ oplus ∆Γ ] dρ = n t o (ρ1 oplus dρ)

Herersquos the technical lemma to complete the proof

Lemma 14110For any Γ ` t τ and dρ B ρ1 rarr ρ2 Γ we have

n t o [Γ = Γ oplus ∆Γ ] dρ = n t o (ρ1 oplus dρ)

Proof This follows from the structure of valid environment changes because term-level oplus (used onthe left-hand side) agrees with value-level oplus (used on the right-hand side) by Lemma 1349 andbecause of the substitution lemma

More formally we can show the thesis by induction over environments for empty environmentsthe equations reduces to n t o = n t o For the case of Γ x σ (where x lt Γ) the thesis can berewritten as

n (t [Γ = Γ oplus ∆Γ ]) [x = x oplus dx ] o (dρ x = v1 dx = dv) = n t o (ρ1 oplus dρ x = v1 oplus dv)

We prove it via the following calculation

n (t [Γ = Γ oplus ∆Γ ]) [x = x oplus dx ] o (dρ x = v1 dx = dv)= Substitution lemma on x

n (t [Γ = Γ oplus ∆Γ ]) o (dρ x = (n x oplus dx o (dρ x = v1 dx = dv)) dx = dv)= Term-level oplus agrees with oplus (Lemma 1349)

Then simplify n x o and n dx o n (t [Γ = Γ oplus ∆Γ ]) o (dρ x = v1 oplus dv dx = dv)

= t [Γ = Γ oplus ∆Γ ] does not mention dx So we can modify the environment entry for dx This makes the environment into a valid environment change

n (t [Γ = Γ oplus ∆Γ ]) o (dρ x = v1 oplus dv dx = 0v1oplusdv)= By applying the induction hypothesis and simplifying 0 away

n t o (ρ1 oplus dρ x = v1 oplus dv)

114 Chapter 14 Equational reasoning on changes

142 Change equivalenceTo optimize programs manipulate changes we often want to replace a change-producing term byanother one while preserving the overall program meaning Hence we dene an equivalence onvalid changes that is preserved by change operations that is (in spirit) a congruence We call thisrelation (change) equivalence and refrain from using other equivalences on changes

Earlier (say in Sec 153) we have sometimes required that changes be equal but thatrsquos often toorestrictive

Change equivalence is dened in terms of validity to ensure that validity-preserving operationspreserve change equivalence If two changes dv1 and dv2 are equivalent one can be substituted forthe other in a validity-preserving context We can dene this once for all change structures hencein particular for values and environmentsDenition 1421 (Change equivalence)Given a change structure V changes dva dvb ∆V are equivalent relative to source v1 V (writtendva =∆ dvb B v1 rarr v2 V ) if and only if there exists v2 such that both dva and dvb are valid fromv1 to v2 (that is dva B v1 rarr v2 V dvb B v1 rarr v2 V )

Notation 1422When not ambiguous we often abbreviate dva =∆ dvb B v1 rarr v2 V as dva =v1∆ dvb or dva =∆ dvb

Two changes are often equivalent relative to a source but not others Hence dva =∆ dvb isalways an abbreviation for change equivalence for a specic source

Example 1423For instance we later use a change structure for integers using both replacement changes anddierences (Example 1216) In this structure change 0 is nil for all numbers while change 5 (ldquobang5rdquo) replaces any number with 5 Hence changes 0 and 5 are equivalent only relative to source 5and we write 0 =5∆5

By applying denitions one can verify that change equivalence relative to a source v is asymmetric and transitive relation on ∆V However it is not an equivalence relation on ∆V becauseit is only reexive on changes valid for source v Using the set-theoretic concept of subset we canthen state the following lemma (whose proof we omit as it is brief)Lemma 1424 (=∆ is an equivalence on valid changes)For each set V and source v isin V change equivalence relative to source v is an equivalence relationover the set of changes

dv isin ∆V | dv is valid with source v

We elaborate on this peculiar sort of equivalence in Sec 1423

1421 Preserving change equivalenceChange equivalence relative to a source v is respected in an appropriate sense by all validity-preserving expression contexts that accept changes with source v To explain what this means westudy an example lemma we show that because valid function changes preserve validity they alsorespect change equivalence At rst we use ldquo(expression) contextrdquo informally to refer to expressioncontexts in the metalanguage Later wersquoll extend our discussion to actual expression contexts inthe object languageLemma 1425 (Valid function changes respect change equivalence)Any valid function change

df B f1 rarr f2 Ararr B

Chapter 14 Equational reasoning on changes 115

respects change equivalence if dva =∆ dvb B v1 rarr v2 A then df v1 dva =∆ df v1 dvb B f1 v1 rarrf2 v2 B We also say that (expression) context df v1 ndash respects change equivalence

Proof The thesis means that df v1 dva B f1 v1 rarr f2 v2 B and df v1 dvb B f1 v1 rarr f2 v2 B Bothequivalences follow in one step from validity of df dva and dvb

This lemma holds because the source and destination of df v1 dv donrsquot depend on dv onlyon its source and destination Source and destination are shared by equivalent changes Hencevalidity-preserving functions map equivalent changes to equivalent changes

In general all operations that preserve validity also respect change equivalence because for allthose operations the source and destination of any output changes and the resulting value onlydepend on source and destination of input changes

However Lemma 1425 does not mean that df v1 dva = df v1 dvb because there can bemultiple changes with the same source and destination For instance say that dva is a list changethat removes an element and readds it and dvb is a list change that describes no modication Theyare both nil changes but a function change might handle them dierently

Moreover we only proved that context df v1 ndash respects change equivalence relative to sourcev1 If value v3 diers from v1 df v3 dva and df v3 dvb need not be equivalent Hence we saythat context df v1 accepts changes with source v1 More in general a context accepts changes withsource v1 if it preserves validity for changes with source v1 we can say informally that all suchcontexts also respect change equivalence

Another example context v1oplusndash also accepts changes with source v1 Since this context producesa base value and not a change it maps equivalent changes to equal resultsLemma 1426 (oplus respects change equivalence)If dva =∆ dvb B v1 rarr v2 V then v1 oplus ndash respects the equivalence between dva and dvb that isv1 oplus dva = v1 oplus dvb

Proof v1 oplus dva = v2 = v1 oplus dvb

There are more contexts that preserve equivalence As discussed function changes preserve con-texts and D n ndash o produces functions changes so D n t o preserves equivalence on its environmentand on any of its free variablesLemma 1427 (D n ndasho preserves change equivalence)For any term Γ ` t τ D n t o preserves change equivalence of environments for all dρa =∆ dρb Bρ1 rarr ρ2 Γ we have nD n t o o dρa =∆ nD n t o o dρb B n t o ρ1 rarr n t o ρ2 Γ rarr τ

Proof To verify this just apply correctness of dierentiation to both changes dρa and dρb

To show more formally in what sense change equivalence is a congruence we rst lift changeequivalence to terms (Denition 1429) similarly to syntactic change validity in Sec 1412 To doso we rst need a notation for one-sided or source-only validityNotation 1428 (One-sided validity)We write dv B v1 V to mean there exists v2 such that dv B v1 rarr v2 V We will reuseexisting conventions and write dv B v1 τ instead of dv B v1 nτ o and dρ B ρ1 Γ instead ofdρ B ρ1 n Γ o

Denition 1429 (Syntactic change equivalence)Two change terms ∆Γ ` dta ∆τ and ∆Γ ` dtb ∆τ are change equivalent relative to sourceΓ ` t τ if for all valid environment changes dρ B ρ1 rarr ρ2 Γ we have that

n dta o dρ =∆ n dtb o dρ B n t o ρ1 τ

We write then Γ |= dta =∆ dtb I t τ or dta =t∆ dtb or simply dta =∆ dtb

116 Chapter 14 Equational reasoning on changes

Saying that dta and dtb are equivalent relative to t does not specify the destination of dta and dtb only their source The only reason is to simplify the statement and proof of Theorem 14210

If two change terms are change equivalent with respect to the right source we can replace onefor the other in an expression context to optimize a program as long as the expression context isvalidity-preserving and accepts the change

In particular substituting into D n t o preserves syntactic change equivalence according to thefollowing theorem (for which we have only a pen-and-paper formal proof)

Theorem 14210 (D n ndasho preserves syntactic change equivalence)For any equivalent changes Γ |= s I dsa =∆ dsb σ and for any term t typed as Γ x σ ` t τ wecan produce equivalent results by substituting into D n t o either s and dsa or s and dsb

Γ |= D n t o [x = s dx = dsa ] =∆ D n t o [x = s dx = dsb ] I t [x = s ] τ

Proof sketch The thesis holds because D n ndash o preserves change equivalence Lemma 1427 A formalproof follows through routine (and tedious) manipulations of bindings In essence we can extend achange environment dρ for context Γ to equivalent environment changes for context Γ x σ withthe values of dsa and dsb The tedious calculations follow

Proof Assume dρ B ρ1 rarr ρ2 Γ Because dsa and dsb are change-equivalent we have

n dsa o dρ =∆ n dsb o dρ B n s o ρ1 σ

Moreover n s o ρ1 = n s o dρ because dρ extends ρ1 Wersquoll use this equality without explicitmention

Hence we can construct change-equivalent environments for evaluating D n t o by combiningdρ and the values of respectively dsa and dsb

(dρ x = n s o dρ dx = n dsa o dρ) =∆

(dρ x = n s o dρ dx = n dsb o dρ)B (ρ1 x = n s o ρ1) (Γ x σ ) (141)

This environment change equivalence is respected by D n t o hence

nD n t o o (dρ x = n s o dρ dx = n dsa o dρ) =∆

nD n t o o (dρ x = n s o dρ dx = n dsb o dρ)B n t o (ρ1 x = n s o ρ1) Γ rarr τ (142)

We want to deduce the thesis by applying to this statement the substitution lemma for denotationalsemantics n t o (ρ x = n s o ρ) = n t [x = s ] o ρ

To apply the substitution lemma to the substitution of dx we must adjust Eq (142) usingsoundness of weakening We get

nD n t o o (dρ x = n s o dρ dx = n dsa o (dρ x = n s o dρ)) =∆

nD n t o o (dρ x = n s o dρ dx = n dsb o (dρ x = n s o dρ))B n t o (ρ1 x = n s o ρ1) Γ rarr τ (143)

Chapter 14 Equational reasoning on changes 117

This equation can now be rewritten (by applying the substitution lemma to the substitutions ofdx and x) to the following one

n (D n t o [dx = dsa ] [x = s ]) o dρ =∆

n (D n t o [dx = dsb ] [x = s ]) o dρB n t [x = s ] o ρ1 Γ rarr τ (144)

Since x is not in scope in s dsa dsb we can permute substitutions to conclude that

Γ |= D n t o [x = s dx = dsa ] =∆ D n t o [x = s dx = dsb ] I t [x = s ] τ

as required

In this theorem if x appears once in t then dx appears once in D n t o (this follows by induction ont) hence D n t o [x = s dx = ndash] produces a one-hole expression context

Further validity-preserving contexts There are further operations that preserve validity Torepresent terms with ldquoholesrdquo where other terms can be inserted we can dene one-level contexts F and contexts E as is commonly done

F = [ ] t dt | ds t [ ] | λx dx rarr [ ] | t oplus [ ] | dt1 [ ] | [ ] dt2E = [ ] | F [E ]

If dt1 =∆ dt2 B t1 rarr t2 τ and our context E accepts changes from t1 then F [dt1 ] and F [dt2 ] arechange equivalent It is easy to prove such a lemma for each possible shape of one-level context F both on values (like Lemmas 1425 and 1426) and on terms We have been unable to state a moregeneral theorem because itrsquos not clear how to formalize the notion of a context accepting a changein general the syntax of a context does not always hint at the validity proofs embedded

Cai et al [2014] solve this problem for metalevel contexts by typing them with dependenttypes but as discussed the overall proof is more awkward Alternatively it appears that the use ofdependent types in Chapter 18 also ensures that change equivalence is a congruence (though atpresent this is still a conjecture) without overly complicating correctness proofs However it is notclear whether such a type system can be expressive enough without requiring additional coercionsConsider a change dv1 from v1 to v1 oplus dv1 a value v2 which is known to be (propositionally) equalto v1 oplus dv1 and a change dv2 from v2 to v3 Then term dv1 dv2 is not type correct (for instancein Agda) the typechecker will complain that dv1 has destination v1 oplus dv1 while dv2 has source v2When working in Agda to solve this problem we can explicitly coerce terms through propositionalequalities and can use Agda to prove such equalities in the rst place We leave the design of asuciently expressive object language where change equivalence is a congruence for future work

1422 Sketching an alternative syntaxIf we exclude composition we can sketch an alternative syntax which helps construct a congruenceon changes The idea is to manipulate instead of changes alone pairs of sources v isin V and validchanges dv | dv B v V

t = src dt | dst dt | x | t t | λx rarr tdt = dt dt | src dt | λdx rarr dt | dx

Adapting dierentiation to this syntax is easy

118 Chapter 14 Equational reasoning on changes

D n x o = dxD n s t o = D n s o D n t oD n λx rarr t o = D n λdx rarr dt o

Derivatives of primitives only need to use dst dt instead of t oplus dt and src dt instead of t wheneverdt is a change for t

With this syntax we can dene change expression contexts which can be lled in by changeexpressions

E = src dE | dst dE | E t | t E | λx rarr EdE = dt dE | dE dt | λdx rarr dE | [ ]

We conjecture change equivalence is a congruence with respect to contexts dE and that contextsE map change-equivalent changes to results that are denotationally equivalent for valid changes Weleave a proof to future work but we expect it to be straightforward It also appears straightforwardto provide an isomorphism between this syntax and the standard one

However extending such contexts with composition does not appear trivial contexts such asdt dE or dE dt only respect validity when the changes sources and destinations align correctly

We make no further use of this alternative syntax in this work

1423 Change equivalence is a PERReaders with relevant experience will recognize that change equivalence is a partial equivalencerelation (PER) [Mitchell 1996 Ch 5] It is standard to use PERs to identify valid elements in amodel [Harper 1992] In this section we state the connection showing that change equivalence isnot an ad-hoc construction so that mathematical constructions using PERs can be adapted to usechange equivalence

We recall the denition of a PERDenition 14211 (Partial equivalence relation (PER))A relation R sube StimesS is a partial equivalence relation if it is symmetric (if aRb then bRa) and transitive(if aRb and bRc then aRc)

Elements related to another are also related to themselves If aRb then aRa (by transitivity aRbbRa hence aRa) So a PER on S identies a subset of valid elements of S Since PERs are equivalencerelations on that subset they also induce a (partial) partition of elements of S into equivalenceclasses of change-equivalent elements

Lemma 14212 (=∆ is a PER)Change equivalence relative to a source a A is a PER on set ∆A

Proof A restatement of Lemma 1424

Typically one studies logical PERs which are logical relations and PERs at the same time [Mitchell1996 Ch 8] In particular with a logical PER two functions are related if they map related inputs torelated outputs This helps showing that a PERs is a congruence Luckily our PER is equivalent tosuch a denitionLemma 14213 (Alternative denition for =∆)Change equivalence is equivalent to the following logical relation

dva =∆ dvb B v1 rarr v2 ι =defdva B v1 rarr v2 ι and dva B v1 rarr v2 ι

Chapter 14 Equational reasoning on changes 119

df a =∆ df b B f1 rarr f2 σ rarr τ =defforalldva =∆ dvb B v1 rarr v2 σ df a v1 dva =∆ df b v2 dvb B f1 v1 rarr f2 v2 τ

Proof By induction on types

143 Chapter conclusionIn this chapter we have put on a more solid foundation forms of reasoning about changes on termsand dened an appropriate equivalence on changes

120 Chapter 14 Equational reasoning on changes

Chapter 15

Extensions and theoreticaldiscussion

In this chapter we collect discussion of a few additional topics related to ILC that do not sucefor standalone chapters We show how to dierentiation general recursion Sec 151 we exhibita function change that is not valid for any function (Sec 152) we contrast our representation offunction changes with pointwise function changes (Sec 153) and we compare our formalizationwith the one presented in [Cai et al 2014] (Sec 154)

151 General recursionThis section discusses informally how to dierentiate terms using general recursion and what is thebehavior of the resulting terms

1511 Dierentiating general recursionEarlier we gave a rule for deriving (non-recursive) let

D n let x = t1 in t2 o = let x = t1dx = D n t1 o

in D n t2 oIt turns out that we can use the same rule also for recursive let-bindings which we write here (andonly here) letrec for distinction

D n letrec x = t1 in t2 o = letrec x = t1dx = D n t1 o

in D n t2 oExample 1511In Example 1411 we presented a derivative dmap for map By using rules for dierentiatingrecursive functions we obtain dmap as presented from map

dmap f df Nil Nil = Nildmap f df (Cons x xs) (Cons dx dxs) =Cons (df x dx) (dmap f df xs dxs)

121

122 Chapter 15 Extensions and theoretical discussion

However derivative dmap is not asymptotically faster than map Even when we consider lesstrivial change structures derivatives of recursive functions produced using this rule are often notasymptotically faster Deriving letrec x = t1 in t2 can still be useful if D n t1 o andor D n t2 ois faster than its base term but during our work we focus mostly on using structural recursionAlternatively in Chapter 11 and Sec 112 we have shown how to incrementalize functions (includingrecursive ones) using equational reasoning

In general when we invoke dmap on a change dxs from xs1 to xs2 it is important that xs1 and xs2are similar enough that enough computation can be reused Say that xs1 = Cons 2 (Cons 3 (Cons 4 Nil))and xs2 = Cons 1 (Cons 2 (Cons 3 (Cons 4 Nil))) in this case a change modifying each elementof xs1 and then replacing Nil by Cons 4 Nil would be inecient to process and naive incremen-talization would produce this scenario In this case it is clear that a preferable change shouldsimply insert 1 at the beginning of the list as illustrated in Sec 1132 (though we have omitted thestraightforward denition of dmap for such a change structure) In approaches like self-adjustingcomputation this is ensured by using memoization In our approach instead we rely on changesthat are nil or small to detect when a derivative can reuse input computation

The same problem aects naive attempts to incrementalize for instance a simple factorialfunction we omit details Because of these issues we focus on incrementalization of structurally re-cursive functions and on incrementalizing generally recursive primitives using equational reasoningWe return to this issue in Chapter 20

1512 JusticationHere we justify informally the rule for dierentiating recursive functions using xpoint operators

Letrsquos consider STLC extended with a family of standard xpoint combinators xτ (τ rarr τ ) rarr τ with x-reduction dened by equation x f rarr f (x f ) we search for a denition of D nx f o

Using informal equational reasoning if a correct denition of D nx f o exists it must satisfy

D nx f o x (D n f o (x f ))

We can proceed as follows

D nx f o= imposing that D n ndash o respects x-reduction here

D n f (x f ) o= using rules for D n ndash o on application

D n f o (x f ) D nx f o

This is a recursive equation in D nx f o so we can try to solve it using x itself

D nx f o = x (λdxf rarr D n f o (x f ) dxf )

Indeed this rule gives a correct derivative Formalizing our reasoning using denotationalsemantics would presumably require the use of domain theory Instead we prove correct a variantof x in Appendix C but using operational semantics

152 Completely invalid changesIn some change sets some changes might not be valid relative to any source In particular we canconstruct examples in ∆(Zrarr Z)

Chapter 15 Extensions and theoretical discussion 123

To understand why this is plausible we recall that as described in Sec 153 df can be decomposedinto a derivative and a pointwise function change that is independent of da While pointwisechanges can be dened arbitrarily the behavior of the derivative of f on changes is determined bythe behavior of f

Example 1521We search for a function change df ∆(Zrarr Z) such that there exist no f1 f2 Zrarr Z for whichdf B f1 rarr f2 Zrarr Z To nd df we assume that there are f1 f2 such that df B f1 rarr f2 Zrarr Zprove a few consequences and construct df that cannot satisfy them Alternatively we could pickthe desired denition for df right away and prove by contradiction that there exist no f1 f2 suchthat df B f1 rarr f2 Zrarr Z

Recall that on integers a1oplusda = a1+da and that da B a1 rarr a2 Zmeans a2 = a1oplusda = a1+daSo for any numbers a1 da a2 such that a1 + da = a2 validity of df implies that

f2 (a1 + da) = f1 a1 + df a1 da

For any two numbers b1 db such that b1 + db = a1 + da we have that

f1 a1 + df a1 da = f2 (a1 + da) = f2 (b1 + db) = f1 b1 + df b1 db

Rearranging terms we have

df a1 da minus df b1 db = f1 b1 minus f1 a1

that is df a1 da minus df b1 db does not depend on da and dbFor concreteness let us x a1 = 0 b1 = 1 and a1 + da = b1 + db = s We have then that

df 0 s minus df 1 (s minus 1) = f1 1 minus f1 0

Once we set h = f1 1 minus f1 0 we have df 0 s minus df 1 (s minus 1) = h Because s is just the sum of twoarbitrary numbers while h only depends on f1 this equation must hold for a xed h and for allintegers s

To sum up we assumed for a given df there exists f1 f2 such that df B f1 rarr f2 Zrarr Z andconcluded that there exists h = f1 1 minus f1 0 such that for all s

df 0 s minus df 1 (s minus 1) = h

At this point we can try concrete families of functions df to obtain a contradiction Substitutinga linear polynomial df a da = c1 middot a + c2 middot da fails to obtain a contradiction in fact we can constructvarious f1 f2 such that df B f1 rarr f2 Z rarr Z So we try quadratic polynomials Substitutingdf a da = c middot da2 succeeds we have that there is h such that for all integers s

c middot(s2 minus (s minus 1)2

)= h

However c middot(s2 minus (s minus 1)2

)= 2 middot c middot s minus c which isnrsquot constant so there can be no such h

153 Pointwise function changesWe can also decompose function changes into orthogonal (and possibly easier to understand)concepts

Consider two functions f1 f2 Ararr B and two inputs a1 a2 A The dierence between f2 a2and f1 a1 is due to changes to both the function and its argument We can compute the whole

124 Chapter 15 Extensions and theoretical discussion

change at once via a function change df as df a1 da Or we can compute separately the eects ofthe function change and of the argument change We can account for changes from f1 a1 to f2 a2using f prime1 a derivative of f1 f prime1 a1 da = f1 a2 f1 a2 = f1 (a1 oplus da) f a11

We can account for changes from f1 to f2 using the pointwise dierence of two functions nablaf1 =λ(a A) rarr f2 a f1 a in particular f2 (a1 oplus da) f1 (a1 oplus da) = nablaf (a1 oplus da) Hence a functionchange simply combines a derivative with a pointwise change using change composition

df a1 da = f2 a2 f1 a1= (f1 a2 f1 a1) (f2 a2 f1 a2)= f prime1 a1 da nablaf (a1 oplus da)

(151)

One can also compute a pointwise change from a function change

nabla f a = df a 0a

While some might nd pointwise changes a more natural concept we nd it easier to use ourdenitions of function changes which combines both pointwise changes and derivatives into asingle concept Some related works explore the use of pointwise changes we discuss them inSec 1922

154 Modeling only valid changesIn this section we contrast briey the formalization of ILC in this thesis (for short ILCrsquo17) with theone we used in our rst formalization [Cai et al 2014] (for short ILCrsquo14) We keep the discussionsomewhat informal we have sketched proofs of our claims and mechanized some but we omit allproofs here We discuss both formalizations using our current notation and terminology except forconcepts that are not present here

Both formalizations model function changes semantically but the two models we present aredierent Overall ILCrsquo17 uses simpler machinery and seems easier to extend to more general baselanguages and its mechanization of ILCrsquo17 appears simpler and smaller Instead ILCrsquo14 studiesadditional entities but better behaved entities

In ILCrsquo17 input and output domains of function changes contain invalid changes while inILCrsquo14 these domains are restricted to valid changes via dependent types ILCrsquo14 also considers thedenotation of D n t o whose domains include invalid changes but such denotations are studied onlyindirectly In both cases function changes must map valid changes to valid changes But ILCrsquo14application df v1 dv is only well-typed is dv is a change valid from v1 hence we can simply say thatdf v1 respects change equivalence As discussed in Sec 142 in ILCrsquo17 the analogous property hasa trickier statement we can write df v1 and apply it to arbitrary equivalent changes dv1 =∆ dv2even if their source is not v1 but such change equivalences are not preserved

We can relate the two models by dening a logical relation called erasure (similar to the onedescribed by Cai et al) an ILCrsquo14 function change df erases to an ILCrsquo17 function change df prime

relative to source f A rarr B if given any change da that erases to daprime relative to source a1 Aoutput change df a1 da erases to df prime a1 daprime relative to source f a1 For base types erasure simplyconnects corresponding da (with source) with daprime in a manner dependent from the base type (oftenjust throwing away any embedded proofs of validity) In all cases one can show that if and only if

1For simplicity we use equality on changes even though equality is too restrictive Later (in Sec 142) wersquoll dene anequivalence relation on changes called change equivalence and written =∆ and use it systematically to relate changes inplace of equality For instance wersquoll write that f prime1 a1 da =∆ f1 (a1 oplus da) f1 a1 But for the present discussion equality willdo

Chapter 15 Extensions and theoretical discussion 125

dv erases to dv prime with source v1 then v1 oplus dv = v2 oplus dv prime (for suitable variants of oplus) in other wordsdv and dv prime share source and destination (technically ILCrsquo17 changes have no xed source so wesay that they are changes from v1 to v2 for some v2)

In ILCrsquo14 there is a dierent incremental semantics n t o∆ for terms t but it is still a valid ILCrsquo14change One can show that n t o∆ (as dened in ILCrsquo14) erases to nD n t o o∆ (as dened in ILCrsquo17)relative to source n t o in fact the needed proof is sketched by Cai et al through in disguise

It seems clear there is no isomorphism between ILCrsquo14 changes and ILCrsquo17 changes An ILCrsquo17function change also accepts invalid changes and the behavior on those changes canrsquot be preservedby an isomorphism Worse it seems hard to dene a non-isomorphic mapping to map an ILCrsquo14change df to an an ILCrsquo17 change erase df we have to dene behavior for (erase df ) a da evenwhen da is invalid As long as we work in a constructive setting we cannot decide whether da isvalid in general because da can be a function change with innite domain

We can give however a denition that does not need to detect such invalid changes Just extractsource and destination from a function change using valid change 0v and take dierence of sourceand destination using in the target system

unerase (σ rarr τ ) df prime = let f = λv rarr df prime v 0v in (f oplus df prime) funerase _ dv prime = erase (σ rarr τ ) df = let f = λv rarr df v 0v in (f oplus df ) ferase _ dv =

We dene these function by induction on types (for elements of ∆τ not arbitrary change structures)and we overload for ILCrsquo14 and ILCrsquo17 We conjecture that for all types τ and for all ILCrsquo17changes dv prime (of the right type) unerase τ dv prime erases to dv prime and for all ILCrsquo14 changes dv dv erasesto erase τ dv prime

Erasure is a well-behaved logical relation similar to the ones relating source and destinationlanguage of a compiler and to partial equivalence relations In particular it also induces partialequivalence relations (PER) (see Sec 1423) both on ILCrsquo14 changes and on ILCrsquo17 changes twoILCrsquo14 changes are equivalent if they erase to the same ILCrsquo17 change and two ILCrsquo17 changes areequivalent if the same ILCrsquo14 change erases to both Both relations are partial equivalence relations(and total on valid changes) Because changes that erase to each other share source and destinationthese induced equivalences coincide again with change equivalence That both relations are PERsalso means that erasure is a so-called quasi-PER [Krishnaswami and Dreyer 2013] Quasi-PERs area natural (though not obvious) generalization of PERs for relations among dierent sets R sube S1 times S2such relations cannot be either symmetric or transitive However we make no use of additionalproperties of quasi-PERs hence we donrsquot discuss them in further detail

1541 One-sided vs two-sided validity

There are also further supercial dierences among the two denitions In ILCrsquo14 changes validwith soure a have dependent type ∆a This dependent type is indexed by the source but not by thedestination Dependent function changes with source f Ararr B have type (a A) rarr ∆ararr ∆(f a)relating the behavior of function change df with the behavior of f on original inputs But this ishalf of function validity to relate the behavior of df with the behavior of df on updated inputs inILCrsquo14 valid function changes have to satisfy an additional equation called preservation of future2

f1 a1 oplus df a1 da = (f1 oplus df ) (a1 oplus da)

2Name suggested by Yufei Cai

126 Chapter 15 Extensions and theoretical discussion

This equation appears inelegant and mechanized proofs were often complicated by the need toperform rewritings using it Worse to show that a function change is valid we have to use dierentapproaches to prove it has the correct source and the correct destination

This dierence is however supercial If we replace f1 oplus df with f2 and a1 oplus da with a2 thisequation becomes f1 a1 oplus df a1 da = f2 a2 a consequence of f2 B df rarr ndash f1 So one might suspectthat ILCrsquo17 valid function changes also satisfy this equation This is indeed the case

Lemma 1541A valid function change df B f1 rarr f2 Ararr B satises equation

f1 a1 oplus df a1 da = (f1 oplus df ) (a1 oplus da)

on any valid input da B a1 rarr a2 Ararr B

Conversely one can also show that ILCrsquo14 function changes also satisfy two-sided validity asdened in ILCrsquo17 Hence the only true dierence between ILCrsquo14 and ILCrsquo17 models is the one wediscussed earlier namely whether function changes can be applied to invalid inputs or not

We believe it could be possible to formalize the ILCrsquo14 model using two-sided validity bydening a dependent type of valid changes ∆2 (A rarr B) f1 f2 = (a1 a2 A) rarr ∆2 A a1 a2 rarr∆2 B (f1 a1) (f2 a2) We provide more details on such a transformation in Chapter 18

Models restricted to valid changes (like ILCrsquo14) are related to models based on directed graphsand reexive graphs where values are graphs vertexes changes are edges between change sourceand change destination (as hinted earlier) In graph language validity preservation means thatfunction changes are graph homomorphisms

Based on similar insights Atkey [2015] suggests modeling ILC using reexive graphs whichhave been used to construct parametric models for System F and extensions and calls for researchon the relation between ILC and parametricity As follow-up work Cai [2017] studies models ofILC based on directed and reexive graphs

Chapter 16

Dierentiation in practice

In practice successful incrementalization requires both correctness and performance of the deriva-tives Correctness of derivatives is guaranteed by the theoretical development the previous sectionstogether with the interface for dierentiation and proof plugins whereas performance of derivativeshas to come from careful design and implementation of dierentiation plugins

161 The role of dierentiation pluginsUsers of our approach need to (1) choose which base types and primitives they need (2) implementsuitable dierentiation plugins for these base types and primitives (3) rewrite (relevant parts of)their programs in terms of these primitives and (4) arrange for their program to be called on changesinstead of updated inputs

As discussed in Sec 105 dierentiation supports abstraction application and variables butsince computation on base types is performed by primitives for those types ecient derivatives forprimitives are essential for good performance

To make such derivatives ecient change types must also have ecient implementations andallow describing precisely what changed The ecient derivative of sum in Chapter 10 is possibleonly if bag changes can describe deletions and insertions and integer changes can describe additivedierences

For many conceivable base types we do not have to design the dierentiation plugins fromscratch Instead we can reuse the large body of existing research on incrementalization in rst-orderand domain-specic settings For instance we reuse the approach from Gluche et al [1997] tosupport incremental bags and maps By wrapping a domain-specic incrementalization result in adierentiation plugin we adapt it to be usable in the context of a higher-order and general-purposeprogramming language and in interaction with other dierentiation plugins for the other basetypes of that language

For base types with no known incrementalization strategy the precise interfaces for dierentia-tion and proof plugins can guide the implementation eort These interfaces could also from thebasis for a library of dierentiation plugins that work well together

Rewriting whole programs in our language would be an excessive requirements Instead weembed our object language as an EDSL in some more expressive meta-language (Scala in our casestudy) so that embedded programs are reied The embedded language can be made to resemblethe metalanguage [Rompf and Odersky 2010] To incrementalize a part of a computation we writeit in our embedded object language invoke D on the embedded program optionally optimize theresulting programs and nally invoke them The metalanguage also acts as a macro system for the

127

128 Chapter 16 Dierentiation in practice

histogram :: Map Int (Bag word) → Map word Int
histogram = mapReduce groupOnBags additiveGroupOnIntegers
              histogramMap histogramReduce
  where
    additiveGroupOnIntegers = Group (+) (λn → −n) 0
    histogramMap    = foldBag groupOnBags (λn → singletonBag (n, 1))
    histogramReduce = foldBag additiveGroupOnIntegers id

-- Precondition:
-- for every key1 :: k1 and key2 :: k2, the terms mapper key1 and
-- reducer key2 are homomorphisms.
mapReduce :: Group v1 → Group v3
          → (k1 → v1 → Bag (k2, v2)) → (k2 → Bag v2 → v3)
          → Map k1 v1 → Map k2 v3
mapReduce group1 group3 mapper reducer =
  reducePerKey ∘ groupByKey ∘ mapPerKey
  where
    mapPerKey    = foldMap group1 groupOnBags mapper
    groupByKey   = foldBag (groupOnMaps groupOnBags)
                     (λ(key, val) → singletonMap key (singletonBag val))
    reducePerKey = foldMap groupOnBags (groupOnMaps group3)
                     (λkey bag → singletonMap key (reducer key bag))

Figure 16.1: The λ-term histogram with Haskell-like syntactic sugar. additiveGroupOnIntegers is the abelian group induced on integers by addition, (ℤ, +, 0, −).

16.2 Predicting nil changes

Handling changes to all inputs can induce excessive overhead in incremental programs [Acar, 2009]. It is also often unnecessary: for instance, the function argument of fold in Chapter 10 does not change, since it is a closed subterm of the program, so fold will receive a nil change for it. A (conservative) static analysis can detect changes that are guaranteed to be nil at runtime. We can then specialize derivatives that receive this change, so that they need not inspect the change at runtime.

For our case study, we have implemented a simple static analysis which detects and propagates information about closed terms. The analysis is not interesting, and we omit details for lack of space.
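For concreteness, a minimal sketch of such a closedness analysis in Haskell follows (AST and names ours): a subterm whose set of free variables is empty is closed, and is therefore guaranteed to receive a nil change.

import Data.Set (Set)
import qualified Data.Set as Set

-- A minimal λ-calculus AST.
data Term = Var String | Lam String Term | App Term Term

-- Free variables of a term.
freeVars :: Term -> Set String
freeVars (Var x)   = Set.singleton x
freeVars (Lam x t) = Set.delete x (freeVars t)
freeVars (App s t) = freeVars s `Set.union` freeVars t

-- Closed subterms can only receive nil changes at runtime.
isClosed :: Term -> Bool
isClosed = Set.null . freeVars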

16.2.1 Self-maintainability

In databases, a self-maintainable view [Gupta and Mumick, 1999] is a function that can update its result from input changes alone, without looking at the actual input. By analogy, we call a derivative self-maintainable if it uses no base parameters, only their changes. Self-maintainable derivatives describe efficient incremental computations: since they do not use their base input, their running time does not have to depend on the input size.

Example 16.2.1
D⟦merge⟧ = λx dx y dy → merge dx dy is self-maintainable with the change structure Bag S described in Example 10.3.1, because it does not use the base inputs x and y. Other derivatives are self-maintainable only in certain contexts. The derivative of element-wise function application (map f xs) ignores the original value of the bag xs if the changes to f are always nil, because the underlying primitive foldBag is self-maintainable in this case (as discussed in the next section). We take advantage of this by implementing a specialized derivative for foldBag.

Similarly, we have seen in Sec. 10.6 that dgrandTotal needlessly recomputes merge xs ys without optimizations. However, the result is a base input to fold′. In the next section, we'll replace fold′ by a self-maintainable derivative (based again on foldBag) and will avoid this recomputation.

To conservatively predict whether a derivative is going to be self-maintainable (and thus efficient), one can inspect whether the program restricts itself to (conditionally) self-maintainable primitives, like merge (always) or map f (only if df is nil, which is guaranteed when f is a closed term).

To avoid recomputing base arguments for self-maintainable derivatives (which never need them), we currently employ lazy evaluation. Since we could use standard techniques for dead-code elimination [Appel and Jim, 1997] instead, laziness is not central to our approach.

A significant restriction is that non-self-maintainable derivatives require their base arguments, which can be expensive to compute. Since those arguments are also computed while running the base program, one could reuse the previously computed values through memoization or extensions of static caching (as discussed in Sec. 19.2.3). We leave implementing these optimizations for future work. As a consequence, our current implementation delivers good results only if most derivatives are self-maintainable.

16.3 Case study

We perform a case study on a nontrivial, realistic program, to demonstrate that ILC can speed it up. We take the MapReduce-based skeleton of the word-count example [Lämmel, 2007]. We define a suitable differentiation plugin, adapt the program to use it and show that incremental computation is faster than recomputation. We designed and implemented the differentiation plugin following the requirements of the corresponding proof plugin, even though we did not formalize the proof plugin (e.g. in Agda). For lack of space, we focus on base types which are crucial for our example and its performance, that is, collections. The plugin also implements tuples, tagged unions, Booleans and integers, with the usual introduction and elimination forms, with few optimizations for their derivatives.

wordcount takes a map from document IDs to documents and produces a map from words appearing in the input to the count of their appearances, that is, a histogram:

wordcount :: Map ID Document → Map Word Int

For simplicity, instead of modeling strings, we model documents as bags of words and document IDs as integers. Hence, what we implement is:

histogram :: Map Int (Bag a) → Map a Int

We model words by integers (a = Int), but treat them parametrically. Other than that, we adapt Lämmel's code directly to our language. Figure 16.1 shows the λ-term histogram.

Figure 16.2 shows a simplified Scala implementation of the primitives used in Fig. 16.1. As bag primitives, we provide constructors and a fold operation, following Gluche et al. [1997]. The constructors for bags are empty (constructing the empty bag), singleton (constructing a bag with one element), merge (constructing the merge of two bags) and negate (negate b constructs a bag with the same elements as b, but negated multiplicities); all but singleton represent abelian group operations. Unlike for usual ADT constructors, the same bag can be constructed in different ways, which are equivalent by the equations defining abelian groups; for instance, since merge is commutative, merge x y = merge y x. Folding on a bag will represent the bag through constructors in an arbitrary way, and then replace constructors with arguments; to ensure a well-defined result, the arguments of fold should respect the same equations, that is, they should form an abelian group; for instance, the binary operator should be commutative. Hence, the fold operator foldBag can be defined to take a function (corresponding to singleton) and an abelian group (for the other constructors). foldBag is then defined by equations:

foldBag :: Group τ → (σ → τ) → Bag σ → τ
foldBag g@(⊞, ⊟, e) f empty         = e
foldBag g@(⊞, ⊟, e) f (merge b1 b2) = foldBag g f b1 ⊞ foldBag g f b2
foldBag g@(⊞, ⊟, e) f (negate b)    = ⊟ (foldBag g f b)
foldBag g@(⊞, ⊟, e) f (singleton v) = f v


// Abelian groups
abstract class Group[A] {
  def merge(value1: A, value2: A): A
  def inverse(value: A): A
  def zero: A
}

// Bags
type Bag[A] = collection.immutable.HashMap[A, Int]

def groupOnBags[A] = new Group[Bag[A]] {
  def merge(bag1: Bag[A], bag2: Bag[A]) = ...
  def inverse(bag: Bag[A]) = bag.map({
    case (value, count) => (value, -count)
  })
  def zero = collection.immutable.HashMap()
}

def foldBag[A, B](group: Group[B], f: A => B, bag: Bag[A]): B =
  bag.flatMap({
    case (x, c) if c >= 0 => Seq.fill(c)(f(x))
    case (x, c) if c < 0  => Seq.fill(-c)(group.inverse(f(x)))
  }).fold(group.zero)(group.merge)

// Maps
type Map[K, A] = collection.immutable.HashMap[K, A]

def groupOnMaps[K, A](group: Group[A]) = new Group[Map[K, A]] {
  def merge(dict1: Map[K, A], dict2: Map[K, A]): Map[K, A] =
    dict1.merged(dict2)({
      case ((k, v1), (_, v2)) => (k, group.merge(v1, v2))
    }).filter({
      case (k, v) => v != group.zero
    })
  def inverse(dict: Map[K, A]): Map[K, A] = dict.map({
    case (k, v) => (k, group.inverse(v))
  })
  def zero = collection.immutable.HashMap()
}

// The general map fold
def foldMapGen[K, A, B](zero: B, merge: (B, B) => B)
                       (f: (K, A) => B, dict: Map[K, A]): B =
  dict.map(Function.tupled(f)).fold(zero)(merge)

// By using foldMap instead of foldMapGen, the user promises that
// f k is a homomorphism from groupA to groupB for each k: K.
def foldMap[K, A, B](groupA: Group[A], groupB: Group[B])
                    (f: (K, A) => B, dict: Map[K, A]): B =
  foldMapGen(groupB.zero, groupB.merge)(f, dict)

Figure 16.2: A Scala implementation of primitives for bags and maps. In the code, we call ⊞, ⊟ and e respectively merge, inverse and zero. We also omit the relatively standard primitives.



If g is a group, these equations specify foldBag g precisely [Gluche et al., 1997]. Moreover, the first three equations mean that foldBag g f is the abelian group homomorphism between the abelian group on bags and the group g (because those equations coincide with the definition). Figure 16.2 shows an implementation of foldBag as specified above. Moreover, all functions which deconstruct a bag can be expressed in terms of foldBag with suitable arguments. For instance, we can sum the elements of a bag of integers with foldBag gℤ (λx → x), where gℤ is the abelian group induced on integers by addition, (ℤ, +, 0, −). Users of foldBag can define different abelian groups to specify different operations (for instance, to multiply floating-point numbers).
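To make the interface concrete, here is a small executable Haskell transcription of foldBag over bags represented as maps from elements to (possibly negative) multiplicities; the record field names and gZ are ours, and the dissertation's actual implementation is the Scala code of Figure 16.2.

import qualified Data.Map as M

-- A group packages merge, inverse and zero.
data Group b = Group { merge :: b -> b -> b, inverse :: b -> b, zero :: b }

type Bag a = M.Map a Int  -- element mapped to (possibly negative) multiplicity

gZ :: Group Integer
gZ = Group (+) negate 0

foldBag :: Group b -> (a -> b) -> Bag a -> b
foldBag g f = M.foldrWithKey step (zero g)
  where
    step x c acc = merge g (rep c (f x)) acc
    rep c v
      | c >= 0    = foldr (merge g) (zero g) (replicate c v)
      | otherwise = inverse g (rep (negate c) v)

-- Summing a bag of integers, as in the text:
-- foldBag gZ id (M.fromList [(1, 2), (5, 1)]) == 7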

If g and f do not change, foldBag g f has a self-maintainable derivative. By the equations above:

foldBag g f (b ⊕ db)
  = foldBag g f (merge b db)
  = foldBag g f b ⊞ foldBag g f db
  = foldBag g f b ⊕ GroupChange g (foldBag g f db)

We will describe the GroupChange change constructor in a moment. Before that, we note that, as a consequence, the derivative of foldBag g f is

λb db → GroupChange g (foldBag g f db)

and we can see it does not use b: as desired, it is self-maintainable. Additional restrictions are required to make foldMap's derivative self-maintainable. Those restrictions require the precondition on mapReduce in Fig. 16.1. foldMapGen has the same implementation but without those restrictions; as a consequence, its derivative is not self-maintainable, but it is more generally applicable. Lack of space prevents us from giving more details.

To define GroupChange, we need a suitable erased change structure on τ, such that ⊕ will be equivalent to ⊞. Since there might be multiple groups on τ, we allow the changes to specify a group, and have ⊕ delegate to ⊞ (which we extract by pattern-matching on the group):

∆τ = Replace τ | GroupChange (AbelianGroup τ) τ
v ⊕ (Replace u) = u
v ⊕ (GroupChange (⊞, inverse, zero) dv) = v ⊞ dv
v ⊖ u = Replace v

That is, a change between two values is either simply the new value (which replaces the old one, triggering recomputation) or their difference (computed with abelian group operations, like in the change structures for groups from Sec. 13.1.1). The ⊖ operator does not know which group to use, so it does not take advantage of the group structure. However, foldBag is now able to generate a group change.


We rewrite grandTotal in terms of foldBag to take advantage of group-based changes:

id = λx → x
G+ = (ℤ, +, −, 0)
grandTotal = λxs → λys → foldBag G+ id (merge xs ys)
D⟦grandTotal⟧ =
  λxs → λdxs → λys → λdys →
    foldBag′ G+ G′+ id id′
      (merge xs ys)
      (merge′ xs dxs ys dys)

It is now possible to write down the derivative of foldBag:

-- (if static analysis detects that dG and df are nil changes)
foldBag′ = D⟦foldBag⟧ =
  λG → λdG → λf → λdf → λzs → λdzs →
    GroupChange G (foldBag G f dzs)

We know from Sec. 10.6 that

merge′ = λu → λdu → λv → λdv → merge du dv

Inlining foldBag′ and merge′ gives us a more readable term, β-equivalent to the derivative of grandTotal:

D⟦grandTotal⟧ =
  λxs → λdxs → λys → λdys → foldBag G+ id (merge dxs dys)

16.4 Benchmarking setup

We run object language programs by generating corresponding Scala code. To ensure rigorous benchmarking [Georges et al., 2007], we use the ScalaMeter benchmarking library. To show that the performance difference from the baseline is statistically significant, we show 99%-confidence intervals in graphs.

We verify Eq. (10.1) experimentally, by checking that the two sides of the equation always evaluate to the same value.

We ran benchmarks on an 8-core Intel Core i7-2600 CPU running at 3.4 GHz, with 8 GB of RAM, running Scientific Linux release 6.4. While the benchmark code is single-threaded, the JVM offloads garbage collection to other cores. We use the preinstalled OpenJDK 1.7.0_25 and Scala 2.10.2.

Input generation. Inputs are randomly generated to resemble English words over all webpages on the internet: the vocabulary size and the average length of a webpage stay relatively the same, while the number of webpages grows day by day. To generate a size-n input of type Map Int (Bag Int), we generate n random numbers between 1 and 1000 and distribute them randomly in n/1000 bags. Changes are randomly generated to resemble edits: a change has 50% probability to delete a random existing number, and 50% probability to insert a random number at a random location.
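The following Haskell sketch reproduces this generation scheme (a sketch only; our actual benchmark generators are written in Scala, and all names here are ours):

import System.Random (randomRIO)
import qualified Data.Map as M

-- n random words (integers in [1, 1000]) distributed over n/1000
-- documents, each document being a bag (word mapped to multiplicity).
genInput :: Int -> IO (M.Map Int (M.Map Int Int))
genInput n = do
  ws <- mapM (const (randomRIO (1, 1000))) [1 .. n]
  ds <- mapM (const (randomRIO (1, max 1 (n `div` 1000)))) [1 .. n]
  let insert m (d, w) = M.insertWith (M.unionWith (+)) d (M.singleton w 1) m
  return (foldl insert M.empty (zip ds ws))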


Experimental units. Thanks to Eq. (10.1), both recomputation f (a ⊕ da) and incremental computation f a ⊕ D⟦f⟧ a da produce the same result. Both programs are written in our object language. To show that derivatives are faster, we compare these two computations. To compare with recomputation, we measure the aggregated running time for running the derivative on the change and for updating the original output with the result of the derivative.

16.5 Benchmark results

Our results show (Fig. 16.3) that our program reacts to input changes in essentially constant time, as expected, hence orders of magnitude faster than recomputation. Constant factors are small enough that the speedup is apparent on realistic input sizes.

We present our results in Fig. 16.3. As expected, the runtime of incremental computation is essentially constant in the size of the input, while the runtime of recomputation is linear in the input size. Hence, on our biggest inputs, incremental computation is over 10⁴ times faster.

Derivative time is in fact slightly irregular for the first few inputs, but this irregularity decreases slowly with increasing warmup cycles. In the end, for derivatives we use 10⁴ warmup cycles. With fewer warmup cycles, running time for derivatives decreases significantly during execution, going from 2.6 ms for n = 1000 to 0.2 ms for n = 512000. Hence, we believe the extended warmup is appropriate, and the changes do not affect our general conclusions. Considering confidence intervals, in our experiments the running time for derivatives varies between 0.139 ms and 0.378 ms.

In our current implementation, the code of the generated derivatives can become quite big. For the histogram example (which is around 1 KB of code), a pretty-print of its derivative is around 40 KB of code. The function application case in Fig. 12.1c can lead to quadratic growth in the worst case. More importantly, we realized that our home-grown transformation system in some cases performs overly aggressive inlining, especially for derivatives, even though this is not required for incrementalization, and we believe this explains a significant part of the problem. Indeed, code blowup problems do not currently appear in later experiments (see Sec. 17.4).

16.6 Chapter conclusion

Our results show that the incrementalized program runs in essentially constant time, and hence orders of magnitude faster than the alternative of recomputation from scratch.

An important lesson from the evaluation is that, as anticipated in Sec. 16.2.1, to achieve good performance our current implementation requires some form of dead-code elimination, such as laziness.


[Figure: log–log plot of runtime (ms, 0.1 to 10000) versus input size (1k to 4096k), comparing Incremental and Recomputation.]

Figure 16.3: Performance results in log–log scale, with input size on the x-axis and runtime in ms on the y-axis. Confidence intervals are shown by the whiskers; most whiskers are too small to be visible.

Chapter 17

Cache-transfer-style conversion

17.1 Introduction

Incremental computation is often desirable: after computing an output from some input, we often need to produce new outputs corresponding to changed inputs. One can simply rerun the same base program on the new input; but instead, incremental computation transforms the input change to an output change. This can be desirable because it is more efficient.

Incremental computation could also be desirable because the changes themselves are of interest: imagine a typechecker explaining how some change to input source propagates to changes to the typechecking result. More generally, imagine a program explaining how some change to its input propagates through a computation into changes to the output.

ILC (Incremental λ-Calculus) [Cai et al., 2014] is a recent approach to incremental computation for higher-order languages. ILC represents changes from an old value v1 to an updated value v2 as a first-class value, written dv. ILC also statically transforms base programs to incremental programs or derivatives: derivatives produce output changes from changes to all their inputs. Since functions are first-class values, ILC introduces a notion of function changes.

However, as mentioned by Cai et al. and as we explain below, ILC as currently defined does not allow reusing intermediate results created by the base computation during the incremental computation. That restricts ILC to supporting efficiently only self-maintainable computations, a rather restrictive class: for instance, mapping self-maintainable functions on a sequence is self-maintainable, but dividing numbers isn't. In this paper we remove this limitation.

To remember intermediate results, many incrementalization approaches rely on forms of memoization: one uses hashtables to memoize function results, or dynamic dependence graphs [Acar, 2005] to remember the computation trace. However, such data structures often remember results that might not be reused; moreover, the data structures themselves (as opposed to their values) occupy additional memory, looking up intermediate results has a cost in time, and typical general-purpose optimizers cannot predict results from memoization lookups. Instead, ILC aims to produce purely functional programs that are suitable for further optimizations.

We eschew memoization: instead, we transform programs to cache-transfer style (CTS), following ideas from Liu and Teitelbaum [1995]. CTS functions output caches of intermediate results along with their primary results. Caches are just nested tuples whose structure is derived from code, and accessing them does not involve looking up keys depending on inputs. We also extend differentiation to produce CTS derivatives, which can extract from caches any intermediate results they need. This approach was inspired and pioneered by Liu and Teitelbaum for untyped first-order functional languages, but we integrate it with ILC and extend it to higher-order typed languages.


While CTS functions still produce additional intermediate data structures, produced programs can be subject to further optimizations. We believe static analysis of a CTS function and its CTS derivative can identify and remove unneeded state (similar to what has been done by Liu and Teitelbaum), as we discuss in Sec. 17.5.6. We leave a more careful analysis to future work.

We prove most of our formalization correct in Coq. To support non-simply-typed programs, all our proofs are for untyped λ-calculus, while previous ILC correctness proofs were restricted to simply-typed λ-calculus. Unlike previous ILC proofs, we simply define which changes are valid via a logical relation, then show the fundamental property for this logical relation (see Sec. 17.2.1). To extend this proof to untyped λ-calculus, we switch to step-indexed logical relations.

To support differentiation on our case studies, we also represent function changes as closures that can be inspected, to support manipulating them more efficiently and detecting at runtime when a function change is nil, hence need not be applied. To show this representation is correct, we also use closures in our mechanized proof.

Unlike plain ILC, typing programs in CTS is challenging, because the shape of caches for a function depends on the function implementation. Our case studies show how to non-trivially embed the resulting programs in typed languages, though our proofs support an untyped target language.

In sum, we present the following contributions:

• via examples, we motivate extending ILC to remember intermediate results (Sec. 17.2);

• we give a novel proof of correctness for ILC for untyped λ-calculus, based on step-indexed logical relations (Sec. 17.3.3);

• building on top of ILC-style differentiation, we show how to transform untyped higher-order programs to cache-transfer style (CTS) (Sec. 17.3.5);

• we show that programs and derivatives in cache-transfer style simulate correctly their non-CTS variants (Sec. 17.3.6);

• we mechanize in Coq most of our proofs;

• we perform performance case studies (in Sec. 17.4), applying (by hand) an extension of this technique to Haskell programs, and incrementalize efficiently also programs that do not admit self-maintainable derivatives.

The rest of the paper is organized as follows. Sec. 17.2 summarizes ILC and motivates the extension to cache-transfer style. Sec. 17.3 presents our formalization and proofs. Sec. 17.4 presents our case studies and benchmarks. Sec. 17.5 discusses limitations and future work. Sec. 17.6 discusses related work, and Sec. 17.7 concludes.

17.2 Introducing Cache-Transfer Style

In this section we motivate cache-transfer style (CTS). Sec. 17.2.1 summarizes a reformulation of ILC, so we recommend it also to readers familiar with Cai et al. [2014]. In Sec. 17.2.2 we consider a minimal first-order example, namely an average function. We incrementalize it using ILC, explain why the result is inefficient, and show that remembering results via cache-transfer style enables efficient incrementalization with asymptotic speedups. In Sec. 17.2.3 we consider a higher-order example that requires not just cache-transfer style, but also efficient support for both nil and non-nil function changes, together with the ability to detect nil changes.


17.2.1 Background

ILC considers simply-typed programs, and assumes that base types and primitives come with support for incrementalization.

The ILC framework describes changes as first-class values, and types them using dependent types. To each type A we associate a type ∆A of changes for A, and an update operator ⊕ : A → ∆A → A that updates an initial value with a change to compute an updated value. We also consider changes for evaluation environments, which contain changes for each environment entry.
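As a minimal sketch, this interface can be transcribed into a Haskell type class (class and method names are ours, not part of the formalization):

{-# LANGUAGE TypeFamilies #-}

-- Each type a comes with a change type (Delta a) and an update
-- operator oplus, realizing ⊕ : A → ∆A → A.
class ChangeStruct a where
  type Delta a
  oplus :: a -> Delta a -> a

-- Integers with additive differences:
instance ChangeStruct Integer where
  type Delta Integer = Integer
  oplus = (+)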

A change da : ∆A can be valid from a1 : A to a2 : A, and we then write da ▷ a1 → a2 : A. Then a1 is the source or initial value for da, and a2 is the destination or updated value for da. From da ▷ a1 → a2 : A it follows that a2 coincides with a1 ⊕ da, but validity imposes additional invariants that are useful during incrementalization. A change can be valid for more than one source, but a change da and a source a1 uniquely determine the destination a2 = a1 ⊕ da. To avoid ambiguity, we always consider a change together with a specific source.

Each type comes with its definition of validity; validity is a ternary logical relation. For function types A → B, we define ∆(A → B) = A → ∆A → ∆B, and say that a function change df : ∆(A → B) is valid from f1 : A → B to f2 : A → B (that is, df ▷ f1 → f2 : A → B) if and only if df maps valid input changes to valid output changes; by that, we mean that if da ▷ a1 → a2 : A, then df a1 da ▷ f1 a1 → f2 a2 : B. Source and destination of df a1 da, that is f1 a1 and f2 a2, are the result of applying two different functions, that is f1 and f2.

ILC expresses incremental programs as derivatives. Generalizing previous usage, we simply say derivative for all terms produced by differentiation. If dE is a valid environment change from E1 to E2, and term t is well-typed and can be evaluated against environments E1, E2, then term Dι⟦t⟧, the derivative of t, evaluated against dE, produces a change from v1 to v2, where v1 is the value of t in environment E1, and v2 is the value of t in environment E2. This correctness property follows immediately from the fundamental property for the logical relation of validity, and can be proven accordingly; we give a step-indexed variant of this proof in Sec. 17.3.3. If t is a function and dE is a nil change (that is, its source E1 and destination E2 coincide), then Dι⟦t⟧ produces a nil function change and is also a derivative according to Cai et al. [2014].

To support incrementalization, one must define change types and validity for each base type, and a correct derivative for each primitive. Functions written in terms of primitives can be differentiated automatically. As in all approaches to incrementalization (see Sec. 17.6), one cannot incrementalize efficiently an arbitrary program: ILC limits the effort to base types and primitives.

17.2.2 A first-order motivating example: computing averages

Suppose we need to compute the average y of a bag of numbers xs : Bag ℤ, and that whenever we receive a change dxs : ∆(Bag ℤ) to this bag, we need to compute the change dy to the average y.

In fact, we expect multiple consecutive updates to our bag. Hence, we say we have an initial bag xs1 and compute its average y1 as y1 = avg xs1, and then consecutive changes dxs1, dxs2, … Change dxs1 goes from xs1 to xs2, dxs2 goes from xs2 to xs3, and so on. We need to compute y2 = avg xs2, y3 = avg xs3, …, but more quickly than we would by calling avg again.

We can compute the average through the following function (that we present in Haskell):

avg xs =
  let s = sum xs
      n = length xs
      r = div s n
  in r


We write this function in A'-normal form (A'NF), a small variant of A-normal form (ANF) [Sabry and Felleisen, 1993] that we introduce. In A'NF, programs bind to a variable the result of each function call in avg, instead of using it directly; unlike plain ANF, A'NF also binds the result of tail calls, such as div s n in avg. A'NF simplifies conversion to cache-transfer style.

We can incrementalize efficiently both sum and length by generating, via ILC, their derivatives dsum and dlength, assuming a language plugin supporting bags and folds on them.

But division is more challenging. To be sure, we can write the following derivative:

ddiv a1 da b1 db =
  let a2 = a1 ⊕ da
      b2 = b1 ⊕ db
  in div a2 b2 − div a1 b1

Function ddiv computes the difference between the updated and the original result, without any special optimization, but still takes O(1) for machine integers. But unlike other derivatives, ddiv uses its base inputs a1 and b1; that is, it is not self-maintainable [Cai et al., 2014].

Because ddiv is not self-maintainable, a derivative calling it will not be efficient. To wit, let us look at davg, the derivative of avg:

davg xs dxs =
  let s  = sum xs
      ds = dsum xs dxs
      n  = length xs
      dn = dlength xs dxs
      r  = div s n
      dr = ddiv s ds n dn
  in dr

This function recomputes s, n and r just like in avg, but r is not used, so its recomputation can easily be removed by later optimizations. On the other hand, derivative ddiv will use the values of its base inputs a1 and b1, so derivative davg will need to recompute s and n, and save no time over recomputation. If ddiv were instead a self-maintainable derivative, the computations of s and n would also be unneeded and could be optimized away. Cai et al. leave efficient support for non-self-maintainable derivatives for future work.

To avoid recomputation, we must simply remember intermediate results as needed. Executing davg xs1 dxs1 will compute exactly the same s and n as executing avg xs1, so we must just save and reuse them, without needing to use memoization proper. Hence, we CTS-convert each function f to a CTS function fC and a CTS derivative dfC: CTS function fC produces, together with its final result, a cache, which the caller must pass to CTS derivative dfC. A cache is just a tuple of values, containing information from subcalls: either inputs (as we explain in a bit), intermediate results, or subcaches, that is, caches returned from further function calls. In fact, typically only primitive functions like div need to remember actual results; automatically transformed functions only need to remember subcaches or inputs.

CTS conversion is simplified by first converting to A'NF, where all results of subcomputations are bound to variables: we just collect all caches in scope and return them.

As an additional step, we avoid always passing base inputs to derivatives by defining ∆(A → B) = ∆A → ∆B. Instead of always passing a base input and possibly not using it, we can simply assume that primitives whose derivative needs a base input store the input in the cache.

To make the translation uniform, we stipulate all functions in the program are transformed to CTS, using a (potentially empty) cache. Since the type of the cache for a function f : A → B depends on the implementation of f, we introduce for each function f a type for its cache FC, so that CTS function fC has type A → (B, FC) and CTS derivative dfC has type ∆A → FC → (∆B, FC).

The definition of FC is only needed inside fC and dfC, and it can be hidden in other functions, to keep implementation details hidden in transformed code; because of limitations of Haskell modules, we can only hide such definitions from functions in other modules.

Since functions of the same type translate to functions of different types, the translation does not preserve well-typedness in a higher-order language in general, but it works well in our case studies (Sec. 17.4); Sec. 17.4.1 shows how to map such functions. We return to this point briefly in Sec. 17.5.1.

CTS-converting our example produces the following code:

data AvgC = AvgC SumC LengthC DivC

avgC :: Bag ℤ → (ℤ, AvgC)
avgC xs =
  let (s, cs1) = sumC xs
      (n, cn1) = lengthC xs
      (r, cr1) = s `divC` n
  in (r, AvgC cs1 cn1 cr1)

davgC :: ∆(Bag ℤ) → AvgC → (∆ℤ, AvgC)
davgC dxs (AvgC cs1 cn1 cr1) =
  let (ds, cs2) = dsumC dxs cs1
      (dn, cn2) = dlengthC dxs cn1
      (dr, cr2) = ddivC ds dn cr1
  in (dr, AvgC cs2 cn2 cr2)

In the above program, sumC and lengthC are self-maintainable, that is they need no base inputs, and can be transformed to use empty caches. On the other hand, ddiv is not self-maintainable, so we transform it to remember and use its base inputs:

divC a b = (a `div` b, (a, b))

ddivC da db (a1, b1) =
  let a2 = a1 ⊕ da
      b2 = b1 ⊕ db
  in (div a2 b2 − div a1 b1, (a2, b2))

Crucially, ddivC must return an updated cache, to ensure correct incrementalization, so that application of further changes works correctly. In general, if (y, c1) = fC x and (dy, c2) = dfC dx c1, then (y ⊕ dy, c2) must equal the result of the base function fC applied to the new input x ⊕ dx, that is (y ⊕ dy, c2) = fC (x ⊕ dx).
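This invariant can be phrased as an executable property; the following Haskell sketch (names of our own choosing) checks it for one step of change propagation, for any CTS pair fC, dfC:

-- One-step CTS correctness check for a CTS function fC and its CTS
-- derivative dfC, given update operators for inputs and outputs.
ctsStepOK :: (Eq b, Eq c)
          => (a -> da -> a)        -- update (⊕) on inputs
          -> (b -> db -> b)        -- update (⊕) on outputs
          -> (a -> (b, c))         -- CTS base function fC
          -> (da -> c -> (db, c))  -- CTS derivative dfC
          -> a -> da -> Bool
ctsStepOK oplusA oplusB fC dfC x dx =
  let (y,  c1) = fC x
      (dy, c2) = dfC dx c1
  in (y `oplusB` dy, c2) == fC (x `oplusA` dx)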

Finally, to use these functions, we need to adapt the caller. Let's first show how to deal with a sequence of changes: imagine that dxs1 is a valid change for xs, and that dxs2 is a valid change for xs ⊕ dxs1. To update the average for both changes, we can then call avgC and davgC as follows:

-- A simple example caller with two consecutive changes
avgDAvgC :: Bag ℤ → ∆(Bag ℤ) → ∆(Bag ℤ) → (ℤ, ∆ℤ, ∆ℤ, AvgC)
avgDAvgC xs dxs1 dxs2 =
  let (res1,  cache1) = avgC xs
      (dres1, cache2) = davgC dxs1 cache1
      (dres2, cache3) = davgC dxs2 cache2
  in (res1, dres1, dres2, cache3)


Incrementalization guarantees that the produced changes update the output correctly in response to the input changes: that is, we have res1 ⊕ dres1 = avg (xs ⊕ dxs1) and res1 ⊕ dres1 ⊕ dres2 = avg (xs ⊕ dxs1 ⊕ dxs2). We also return the last cache, to allow further updates to be processed.

Alternatively, we can write a caller that gets an initial input and a (lazy) list of changes, does incremental computation, and prints updated outputs:

processChangeList (dxsN : dxss) yN cacheN = do
  let (dy, cacheN') = davgC dxsN cacheN
      yN' = yN ⊕ dy
  print yN'
  processChangeList dxss yN' cacheN'

-- Example caller with multiple consecutive changes
someCaller xs1 dxss = do
  let (y1, cache1) = avgC xs1
  processChangeList dxss y1 cache1

More in general, we produce both an augmented base function and a derivative, where the augmented base function communicates with the derivative by returning a cache. The contents of this cache are determined statically, and can be accessed by tuple projections without dynamic lookups. In the rest of the paper, we use the above idea to develop a correct transformation that allows incrementalizing programs using cache-transfer style.

We'll return to this example in Sec. 17.4.1.

17.2.3 A higher-order motivating example: nested loops

Next, we consider CTS differentiation on a minimal higher-order example. To incrementalize this example, we enable detecting nil function changes at runtime, by representing function changes as closures that can be inspected by incremental programs. We'll return to this example in Sec. 17.4.2.

We take an example of nested loops over sequences, implemented in terms of standard Haskell functions map and concat. For simplicity, we compute the Cartesian product of inputs:

cartesianProduct :: Sequence a → Sequence b → Sequence (a, b)
cartesianProduct xs ys =
  concatMap (λx → map (λy → (x, y)) ys) xs

concatMap f xs = concat (map f xs)

Computing cartesianProduct xs ys loops over each element x from sequence xs and y from sequence ys, and produces a list of pairs (x, y), taking quadratic time O(n²) (we assume for simplicity that |xs| and |ys| are both O(n)). Adding a fresh element to either xs or ys generates an output change containing Θ(n) fresh pairs: hence, derivative dcartesianProduct must take at least linear time. Thanks to specialized derivatives dmap and dconcat for primitives map and concat, dcartesianProduct has asymptotically optimal time complexity. To achieve this complexity, dmap f df must detect when df is a nil function change and avoid applying it to unchanged input elements.

To simplify the transformations we describe, we λ-lift programs before differentiating and transforming them:

cartesianProduct xs ys =
  concatMap (mapPair ys) xs

mapPair ys = λx → map (pair x) ys

pair x = λy → (x, y)

concatMap f xs =
  let yss = map f xs
  in concat yss

Suppose we add an element to either xs or ys. If change dys adds one element to ys, then dmapPair ys dys, the argument to dconcatMap, is a non-nil function change taking constant time, so dconcatMap must apply it to each element of xs ⊕ dxs.

Suppose next that change dxs adds one element to xs, and dys is a nil change for ys. Then dmapPair ys dys is a nil function change, and we must detect this dynamically. If a function change df : ∆(A → B) is represented as a function, and A is infinite, one cannot detect dynamically that it is a nil change. To enable runtime nil change detection, we apply closure conversion on function changes: a function change df, represented as a closure, is nil for f only if all the environment changes it contains are also nil, and if the contained function is a derivative of f's function.
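The following Haskell sketch (representation and names ours) conveys the idea: a function change in closure form pairs derivative code with an explicit environment change, so that, given a nil test for environment changes, nilness of the whole function change becomes decidable at runtime.

-- A function change as an inspectable closure: derivative code plus
-- the change to the captured environment.
data FunChange denv a da db = FunChange
  { derivCode :: denv -> a -> da -> db  -- code, assumed to be a derivative
  , envDelta  :: denv                   -- change to the captured environment
  }

-- Assuming derivCode is a derivative, the function change is nil
-- exactly when its environment change is nil.
isNilFun :: (denv -> Bool) -> FunChange denv a da db -> Bool
isNilFun isNilEnv = isNilEnv . envDelta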

17.3 Formalization

In this section, we formalize cache-transfer-style (CTS) differentiation and formally prove its soundness with respect to differentiation. We furthermore present a novel proof of correctness for differentiation itself.

To simplify the proof, we encode many invariants of input and output programs by defining input and output languages with a suitably restricted abstract syntax. Restricting the input language simplifies the transformations, while restricting the output languages simplifies the semantics. Since base programs and derivatives have different shapes, we define different syntactic categories of base terms and change terms.

We define, and prove sound, three transformations across two languages. First, we present a variant of differentiation (as defined by Cai et al. [2014]), going from base terms of language λAL to change terms of λAL, and we prove it sound with respect to non-incremental evaluation. Second, we define CTS conversion as a pair of transformations going from base terms of λAL: CTS translation produces CTS versions of base functions as base terms in iλAL, and CTS differentiation produces CTS derivatives as change terms in iλAL.

As source language for CTS differentiation, we use a core language named λAL. This language is a common target of a compiler front-end for a functional programming language: in this language, terms are written in λ-lifted and A'-normal form (A'NF), so that every intermediate result is named, and can thus be stored in a cache by CTS conversion and reused later (as described in Sec. 17.2).

The target language for CTS differentiation is named iλAL. Programs in this target language are also λ-lifted and in A'NF. But additionally, in these programs every toplevel function f produces a cache, which is to be conveyed to the derivatives of f.

The rest of this section is organized as follows. Sec. 17.3.1 presents syntax and semantics of source language λAL. Sec. 17.3.2 defines differentiation, and Sec. 17.3.3 proves it correct. Sec. 17.3.4 presents syntax and semantics of target language iλAL. Sec. 17.3.5 defines CTS conversion, and Sec. 17.3.6 proves it correct. Most of the formalization has been mechanized in Coq (the proofs of some straightforward lemmas are left for future work).

17.3.1 The source language λAL

Syntax. The syntax of λAL is given in Figure 17.1. Our source language allows representing both base terms t and change terms dt.


Base terms
  t ::= let y = f x in t                  (Call)
      | let y = (x) in t                  (Tuple)
      | x                                 (Result)

Change terms
  dt ::= let y = f x, dy = df x dx in dt  (Call)
       | let y = (x), dy = (dx) in dt     (Tuple)
       | dx                               (Result)

Closed values
  v ::= E[λx. t]                          (Closure)
      | (v)                               (Tuple)
      | ℓ                                 (Literal)
      | p                                 (Primitive)

Value environments
  E ::= E; x = v                          (Value binding)
      | •                                 (Empty)

Change values
  dv ::= dE[λx dx. dt]                    (Closure)
       | (dv)                             (Tuple)
       | dℓ                               (Literal)
       | !v                               (Replace)
       | nil                              (Nil)

Change environments
  dE ::= dE; x = v, dx = dv               (Binding)
       | •                                (Empty)

n, k ∈ ℕ                                  (Step indexes)

Figure 17.1: Source language λAL (syntax).

[SVar]
  ──────────────────
  E ⊢ x ⇓₁ E(x)

[STuple]
  E; y = (E(x)) ⊢ t ⇓ₙ v
  ───────────────────────────────
  E ⊢ let y = (x) in t ⇓ₙ₊₁ v

[SPrimitiveCall]
  E(f) = p    E; y = δₚ(E(x)) ⊢ t ⇓ₙ v
  ─────────────────────────────────────
  E ⊢ let y = f x in t ⇓ₙ₊₁ v

[SClosureCall]
  E(f) = Ef[λx. tf]    Ef; x = E(x) ⊢ tf ⇓ₘ vy    E; y = vy ⊢ t ⇓ₙ v
  ────────────────────────────────────────────────────────────────────
  E ⊢ let y = f x in t ⇓ₘ₊ₙ₊₁ v

Figure 17.2: Step-indexed big-step semantics for base terms of source language λAL.

Our syntax for base terms t represents λ-lifted programs in A'-normal form (Sec. 17.2.2). We write x for a sequence of variables of some unspecified length x1, x2, …, xm. A term can be either a bound variable x, or a let-binding of y in subterm t, to either a new tuple (x) (let y = (x) in t) or the result of calling function f on argument x (let y = f x in t). Both f and x are variables, to be looked up in the environment. Terms cannot contain λ-abstractions, as they have been λ-lifted to top-level definitions, which we encode as closures in the initial environments. Unlike standard ANF, we add no special syntax for function calls in tail position (see Sec. 17.5.3 for a discussion about this limitation). We often inspect the result of a function call f x, which is not a valid term in our syntax. To enable this, we write "(f x)" for "let y = f x in y", where y is chosen fresh.

Our syntax for change terms dt mimics the syntax for base terms, except that (i) each function call is immediately followed by the evaluation of its derivative, and (ii) the final value returned by a change term is a change variable dx. As we will see, this ad-hoc syntax of change terms is tailored to capture the programs produced by differentiation. We only allow α-renamings that maintain the invariant that the definition of x is immediately followed by the definition of dx: if x is renamed to y, then dx must be renamed to dy.

Semantics. A closed value for base terms is either a closure, a tuple of values, a constant, or a primitive. A closure is a pair of an evaluation environment E and a λ-abstraction closed with respect to E. The set of available constants is left abstract; it may contain usual first-order constants like integers. We also leave abstract the primitives p, like if-then-else or projections of tuple components. As usual, environments E map variables to closed values. With no loss of generality, we assume that all bound variables are distinct.

Figure 17.2 shows a step-indexed big-step semantics for base terms, defined through judgment E ⊢ t ⇓ₙ v, pronounced "Under environment E, base term t evaluates to closed value v in n steps". Our step-indexes count the number of "nodes" of a big-step derivation.¹ We explain each rule in turn. Rule [SVar] looks variable x up in environment E. Other rules evaluate let-bindings let y = … in t in environment E: each rule computes y's new value vy (taking m steps, where m can be zero) and evaluates, in n steps, body t to v, using environment E extended by binding y to vy; the overall let-binding evaluates to v in m + n + 1 steps. But different rules compute y's value differently. [STuple] looks each variable in x up in E, to evaluate tuple (x) (in m = 0 steps). [SPrimitiveCall] evaluates function calls where variable f is bound in E to a primitive p. How primitives are evaluated is specified by a function δₚ(–) from closed values to closed values; to evaluate such a primitive call, this rule applies δₚ(–) to x's value (in m = 0 steps). [SClosureCall] evaluates function calls where variable f is bound in E to closure Ef[λx. tf]: this rule evaluates closure body tf, in m steps, using closure environment Ef extended with the value of argument x in E.

Change semantics. We move on to define how change terms evaluate to change values. We start with the required auxiliary definitions.

A closed change value is either a closure change, a tuple change, a literal change, a replacement change, or a nil change. A closure change is a pair, made of an evaluation environment dE and a λ-abstraction expecting a value and a change value as arguments, to evaluate a change term into an output change value. An evaluation environment dE follows the same structure as let-bindings of change terms: it binds variables to closed values, and each variable x is immediately followed by a binding for its associated change variable dx. As with let-bindings of change terms, α-renamings in an environment dE must rename dx into dy if x is renamed into y. With no loss of generality, we assume that all bound term variables are distinct in these environments.

We define the original environment ⌊dE⌋₁ and the new environment ⌊dE⌋₂ of a change environment dE by induction over dE:

⌊•⌋ᵢ = •   (i = 1, 2)
⌊dE; x = v, dx = dv⌋₁ = ⌊dE⌋₁; x = v
⌊dE; x = v, dx = dv⌋₂ = ⌊dE⌋₂; x = v ⊕ dv

This last rule makes use of an operation ⊕ to update a value with a change, which may fail at runtime. Indeed, change update is a partial function written "v ⊕ dv", defined as follows:

v ⊕ nil = v
v1 ⊕ !v2 = v2
ℓ ⊕ dℓ = δ⊕(ℓ, dℓ)
E[λx. t] ⊕ dE[λx dx. dt] = (E ⊕ dE)[λx. t]
(v1, …, vn) ⊕ (dv1, …, dvn) = (v1 ⊕ dv1, …, vn ⊕ dvn)

where

(E; x = v) ⊕ (dE; x = v, dx = dv) = ((E ⊕ dE); x = (v ⊕ dv))

Nil and replacement changes can be used to update all values (constants, tuples, primitives and closures), while tuple changes can only update tuples, literal changes can only update literals, and closure changes can only update closures. A nil change leaves a value unchanged. A replacement change overrides the current value v with a new one v′. On literals, ⊕ is defined via some interpretation function δ⊕. Change update for a closure ignores dt, instead of combining it with t somehow. This may seem surprising, but we only need ⊕ to behave well for valid changes (as shown by Lemma 17.3.1): for valid closure changes, dt must behave similarly to Dι⟦t⟧ anyway, so only environment updates matter. This definition also avoids having to modify terms at runtime, which would be difficult to implement safely.
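These equations can be transcribed almost literally into executable form; the following Haskell sketch (datatypes ours, literals specialized to integers with additive changes) implements v ⊕ dv as a partial operation:

import Control.Monad (zipWithM)

-- Values and changes of a λAL-like language; Code is left opaque.
data Code   = Code deriving (Eq, Show)
data Value  = Lit Int | Tuple [Value] | Closure Env Code deriving (Eq, Show)
data Change = Nil | Replace Value | DLit Int | DTuple [Change] | DClosure DEnv Code
  deriving (Eq, Show)

type Env  = [(String, Value)]            -- E; x = v
type DEnv = [(String, (Value, Change))]  -- dE; x = v, dx = dv

-- v ⊕ dv: a partial operation, Nothing on ill-matched changes.
oplusV :: Value -> Change -> Maybe Value
oplusV v Nil             = Just v
oplusV _ (Replace v')    = Just v'
oplusV (Lit n) (DLit dn) = Just (Lit (n + dn))  -- δ⊕ on literals
oplusV (Tuple vs) (DTuple dvs)
  | length vs == length dvs = Tuple <$> zipWithM oplusV vs dvs
oplusV (Closure env c) (DClosure denv _) =      -- the change term is ignored
  (\env' -> Closure env' c) <$> oplusE env denv
oplusV _ _ = Nothing

oplusE :: Env -> DEnv -> Maybe Env
oplusE [] [] = Just []
oplusE ((x, v) : env) ((x', (_, dv)) : denv)
  | x == x' = (\v' env' -> (x, v') : env') <$> oplusV v dv <*> oplusE env denv
oplusE _ _ = Nothing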

¹Instead, Ahmed [2006] and Acar et al. [2008] count the number of steps that small-step evaluation would take (as we did in Appendix C), but this simplifies some proof steps and makes a minor difference in others.


[SDVar]
  ─────────────────────
  dE ⊢ dx ⇓ dE(dx)

[SDTuple]
  dE(x, dx) = vx, dvx
  dE; y = (vx), dy = (dvx) ⊢ dt ⇓ dv
  ──────────────────────────────────────────
  dE ⊢ let y = (x), dy = (dx) in dt ⇓ dv

[SDReplaceCall]
  dE(df) = !vf
  ⌊dE⌋₁ ⊢ (f x) ⇓ₙ vy
  ⌊dE⌋₂ ⊢ (f x) ⇓ₘ vy′
  dE; y = vy, dy = !vy′ ⊢ dt ⇓ dv
  ──────────────────────────────────────────────
  dE ⊢ let y = f x, dy = df x dx in dt ⇓ dv

[SDClosureNil]
  dE(f, df) = Ef[λx. tf], nil
  Ef; x = dE(x) ⊢ tf ⇓ₙ vy
  Ef; x = ⌊dE⌋₂(x) ⊢ tf ⇓ₘ vy′
  dE; y = vy, dy = !vy′ ⊢ dt ⇓ dv
  ──────────────────────────────────────────────
  dE ⊢ let y = f x, dy = df x dx in dt ⇓ dv

[SDPrimitiveNil]
  dE(f, df) = p, nil
  dE(x, dx) = vx, dvx
  dE; y = δₚ(vx), dy = ∆ₚ(vx, dvx) ⊢ dt ⇓ dv
  ──────────────────────────────────────────────
  dE ⊢ let y = f x, dy = df x dx in dt ⇓ dv

[SDClosureChange]
  dE(f, df) = Ef[λx. tf], dEf[λx dx. dtf]
  dE(x, dx) = vx, dvx
  Ef; x = vx ⊢ tf ⇓ₙ vy
  dEf; x = vx, dx = dvx ⊢ dtf ⇓ dvy
  dE; y = vy, dy = dvy ⊢ dt ⇓ dv
  ──────────────────────────────────────────────
  dE ⊢ let y = f x, dy = df x dx in dt ⇓ dv

Figure 17.3: Step-indexed big-step semantics for the change terms of the source language λAL.

Having given these definitions, we show in Fig. 17.3 a big-step semantics for change terms (without step-indexing), defined through judgment dE ⊢ dt ⇓ dv, pronounced "Under the environment dE, the change term dt evaluates into the closed change value dv". [SDVar] looks up into dE to return a value for dx. [SDTuple] builds a tuple out of the values of x, and a change tuple out of the change values of dx, as found in the environment dE. There are four rules to evaluate let-bindings, depending on the nature of dE(df). These four rules systematically recompute the value vy of y in the original environment; they differ in the way they compute the change dy to y.

If dE(df) is a replacement, [SDReplaceCall] applies. Replacing the value of f in the environment forces recomputing f x from scratch in the new environment. The resulting value vy′ is the new value which must replace vy, so dy binds to !vy′ when evaluating the let body.

If dE(df) is a nil change, we have two rules, depending on the nature of dE(f). If dE(f) is a closure, [SDClosureNil] applies, and in that case the nil change of dE(f) is the exact same closure. Hence, to compute dy, we reevaluate this closure, applied to the updated argument ⌊dE⌋₂(x), to a value vy′, and bind dy to !vy′. In other words, this rule is equivalent to [SDReplaceCall] in the case where a closure is replaced by itself.² If dE(f) is a primitive, [SDPrimitiveNil] applies. The nil change of a primitive p is its derivative, whose interpretation is realized by a function ∆ₚ(–). The evaluation of this function on the input value and the input change leads to the change dvy, bound to dy.

If dE(df) is a closure change dEf[λx dx. dtf], [SDClosureChange] applies. The change dvy results from the evaluation of dtf in the closure change environment dEf, augmented with an input value for x and a change value for dx. Again, let us recall that we will maintain the invariant that the term dtf behaves as the derivative of f, so this rule can be seen as the invocation of f's derivative.

²Based on Appendix C, we are confident we could in fact use the derivative of tf instead of replacement changes, but transforming terms in a semantics seems aesthetically wrong. We can also restrict nil to primitives, as we essentially did in Appendix C.


Expressiveness. A closure in the initial environment can be used to represent a top-level definition. Since environment entries can point to primitives, we need no syntax to directly represent calls of primitives in the syntax of base terms. To encode in our syntax a program with top-level definitions, and a term to be evaluated representing the entry point, one can produce a term t representing the entry point, together with an environment E containing as values any top-level definitions, primitives and constants used in the program.

Our formalization does not model directly n-ary functions, but they can be encoded through unary functions and tuples. This encoding does not support currying efficiently, but we discuss possible solutions in Sec. 17.5.7.

Control operators, like recursion combinators or branching, can be introduced as primitive operations as well. If the branching condition changes, expressing the output change in general requires replacement changes. Similarly to branching, we can add tagged unions.

17.3.2 Static differentiation in λAL

Definition 10.5.1 defines differentiation for simply-typed λ-calculus terms. Figure 17.4 shows differentiation for λAL syntax.

Dι⟦x⟧ = dx
Dι⟦let y = (x) in t⟧ = let y = (x), dy = (dx) in Dι⟦t⟧
Dι⟦let y = f x in t⟧ = let y = f x, dy = df x dx in Dι⟦t⟧

Figure 17.4: Static differentiation in λAL.

Differentiating a base term t produces a change term Dι⟦t⟧, its derivative. Differentiating the final result variable x produces its change variable dx. Differentiation copies each binding of an intermediate result y to the output, and adds a new binding for its change dy. If y is bound to tuple (x), then dy will be bound to the change tuple (dx). If y is bound to function application f x, then dy will be bound to the application of function change df to input x and its change dx.
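This transformation is simple enough to transcribe directly; the following Haskell sketch (AST and constructor names ours, with change names derived by prefixing "d") mirrors the three equations of Figure 17.4:

-- Base terms of λAL, as in Figure 17.1.
data Term
  = Result String                      -- x
  | LetTuple String [String] Term      -- let y = (x) in t
  | LetCall String String String Term  -- let y = f x in t

-- Change terms; the change bindings (dy, dx) are determined by the
-- base names, here by prefixing "d".
data DTerm
  = DResult String                       -- dx
  | DLetTuple String [String] DTerm      -- let y = (x), dy = (dx) in dt
  | DLetCall String String String DTerm  -- let y = f x, dy = df x dx in dt

derive :: Term -> DTerm
derive (Result x)        = DResult ("d" ++ x)
derive (LetTuple y xs t) = DLetTuple y xs (derive t)
derive (LetCall y f x t) = DLetCall y f x (derive t)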

Evaluating Dι⟦t⟧ recomputes all intermediate results computed by t. This recomputation will be avoided through cache-transfer style in Sec. 17.3.5.

The original transformation for static differentiation of λ-terms [Cai et al., 2014] has three cases, which we recall here:

Derive(x) = dx
Derive(t u) = Derive(t) u Derive(u)
Derive(λx. t) = λx dx. Derive(t)

Even though the first two cases of Cai et al.'s differentiation map into the two cases of our differentiation variant, one may ask where the third case is realized now. Actually, this third case occurs while we transform the initial environment: indeed, we will assume that the closures of the environment of the source program have been adjoined a derivative. More formally, we suppose that the derivative of t is evaluated under an environment Dι⟦E⟧, obtained as follows:

Dι⟦•⟧ = •
Dι⟦E; f = Ef[λx. t]⟧ = Dι⟦E⟧; f = Ef[λx. t], df = Dι⟦Ef⟧[λx dx. Dι⟦t⟧]
Dι⟦E; x = v⟧ = Dι⟦E⟧; x = v, dx = nil   (if v is not a closure)


17.3.3 A new soundness proof for static differentiation

As in Cai et al. [2014]'s development, static differentiation is only sound on input changes that are valid. However, our definition of validity needs to be significantly different. Cai et al. prove soundness for a strongly normalizing simply-typed λ-calculus using denotational semantics; we generalize this result to an untyped and Turing-complete language, using purely syntactic arguments. In both scenarios, a function change is only valid if it turns valid input changes into valid output changes, so validity is a logical relation. Since standard logical relations only apply to typed languages, we turn to step-indexed logical relations.

Validity as a step-indexed logical relation

To state and prove soundness of differentiation, we define validity by introducing a ternary step-indexed relation over base values, changes, and updated values, following previous work on step-indexed logical relations [Ahmed, 2006; Acar et al., 2008]. Experts might notice small differences in our step-indexing, as mentioned in Sec. 17.3.1, but they do not affect the substance of the proof. We write

dv ▷ₖ v1 → v2

and say that "dv is a valid change from v1 to v2, up to k steps" to mean that dv is a change from v1 to v2, and that dv is a valid description of the differences between v1 and v2, with validity tested with up to k steps. To justify this intuition of validity, we state and prove two lemmas: a valid change from v1 to v2 goes indeed from v1 to v2 (Lemma 17.3.1), and if a change is valid up to k steps, it is also valid up to fewer steps (Lemma 17.3.2).

Lemma 17.3.1 (⊕ agrees with validity)
If we have dv ▷ₖ v1 → v2 for all step-indexes k, then v1 ⊕ dv = v2.

Lemma 17.3.2 (Downward-closure)
If N ≥ n, then dv ▷N v1 → v2 implies dv ▷ₙ v1 → v2.

• dℓ ▷ₙ ℓ → δ⊕(ℓ, dℓ)

• !v2 ▷ₙ v1 → v2

• nil ▷ₙ v → v

• (dv1, …, dvm) ▷ₙ (v1, …, vm) → (v1′, …, vm′)
  if and only if
  ∀k < n. ∀i ∈ [1 … m]. dvi ▷ₖ vi → vi′

• dE[λx dx. dt] ▷ₙ E1[λx. t] → E2[λx. t]
  if and only if E2 = E1 ⊕ dE and
  ∀k < n. ∀v1, dv, v2.
    if dv ▷ₖ v1 → v2
    then (dE; x = v1, dx = dv ⊢ dt) ▶ₖ (E1; x = v1 ⊢ t) → (E2; x = v2 ⊢ t)

Figure 17.5: Step-indexed relation between values and changes.

As usual with step-indexed logical relations, validity is defined by well-founded induction over naturals ordered by <. To show this, it helps to observe that evaluation always takes at least one step.

Validity is formally defined by cases in Figure 17.5; we describe each case in turn. First, a constant change dℓ is a valid change from ℓ to ℓ ⊕ dℓ = δ⊕(ℓ, dℓ). Since the function δ⊕ is partial, the relation only holds for the constant changes dℓ which are valid changes for ℓ. Second, a replacement change !v2 is always a valid change from any value v1 to v2. Third, a nil change is a valid change between any value and itself. Fourth, a tuple change is valid up to step n if each of its components is valid up to any step strictly less than n. Fifth, we define validity for closure changes. Roughly


speaking, this statement means that a closure change is valid if (i) its environment change dE is valid for the original closure environment E1 and for the new closure environment E2, and (ii) when applied to related values, the closure bodies t1 and t2 are related by dt. The validity relation between terms is defined as follows:

(dE ⊢ dt) ▶ₙ (E1 ⊢ t1) → (E2 ⊢ t2)
if and only if ∀k < n. ∀v1, v2.
  E1 ⊢ t1 ⇓ₖ v1 and E2 ⊢ t2 ⇓ v2
implies that ∃dv. dE ⊢ dt ⇓ dv and dv ▷ₙ₋ₖ v1 → v2

We extend this relation from values to environments, by defining a judgment dE ▷ₖ E1 → E2, defined as follows:

─────────────────
• ▷ₖ • → •

dE ▷ₖ E1 → E2    dv ▷ₖ v1 → v2
──────────────────────────────────────────────────────
(dE; x = v1, dx = dv) ▷ₖ (E1; x = v1) → (E2; x = v2)

The above lemmas about validity for values extend to environments:

Lemma 17.3.3 (⊕ agrees with validity for environments)
If dE ▷ₖ E1 → E2, then E1 ⊕ dE = E2.

Lemma 17.3.4 (Downward-closure for environments)
If N ≥ n, then dE ▷N E1 → E2 implies dE ▷ₙ E1 → E2.

Finally, for values, terms and environments alike, omitting the step count k from validity means validity holds for all k's. That is, for instance, dv ▷ v1 → v2 means dv ▷ₖ v1 → v2 for all k.

Soundness of differentiation

We can state a soundness theorem for differentiation without mentioning step-indexes. Instead of proving it directly, we must first prove a more technical statement (Lemma 17.3.6) that mentions step-indexes explicitly.

Theorem 17.3.5 (Soundness of differentiation in λAL)
If dE is a valid change environment from base environment E1 to updated environment E2, that is dE ▷ E1 → E2, and if t converges both in the base and updated environment, that is E1 ⊢ t ⇓ v1 and E2 ⊢ t ⇓ v2, then Dι⟦t⟧ evaluates under the change environment dE to a valid change dv between base result v1 and updated result v2, that is dE ⊢ Dι⟦t⟧ ⇓ dv, dv ▷ v1 → v2 and v1 ⊕ dv = v2.

Lemma 17.3.6 (Fundamental Property)
For each n, if dE ▷ₙ E1 → E2, then (dE ⊢ Dι⟦t⟧) ▶ₙ (E1 ⊢ t) → (E2 ⊢ t).

17.3.4 The target language iλAL

In this section, we present the target language of a transformation that extends static differentiation with CTS conversion. As said earlier, the functions of iλAL compute both their output and a cache, which contains the intermediate values that contributed to that output. Derivatives receive this cache and use it to compute changes without recomputing intermediate values; derivatives also update the caches according to their input changes.


Base terms
  M ::= let y, cyfx = f x in M            (Call)
      | let y = (x) in M                  (Tuple)
      | (x, C)                            (Result)

Cache terms
  C ::= •                                 (Empty)
      | C; cyfx                           (Sub-cache)
      | C; x                              (Cached value)

Change terms
  dM ::= let dy, cyfx = df dx cyfx in dM  (Call)
       | let dy = (dx) in dM              (Tuple)
       | (dx, dC)                         (Result)

Cache updates
  dC ::= •                                (Empty)
       | dC; cyfx                         (Sub-cache)
       | dC; (x ⊕ dx)                     (Updated value)

Closed values
  V ::= F[λx. M]                          (Closure)
      | (V)                               (Tuple)
      | ℓ                                 (Literal)
      | p                                 (Primitive)

Cache values
  Vc ::= •                                (Empty)
       | Vc; Vc                           (Sub-cache)
       | Vc; V                            (Cached value)

Change values
  dV ::= dF[λdx C. dM]                    (Closure)
       | (dV)                             (Tuple)
       | dℓ                               (Literal)
       | nil                              (Nil)
       | !V                               (Replacement)

Base definitions
  Dv ::= x = V                            (Value definition)
       | cyfx = Vc                        (Cache definition)

Change definitions
  dDv ::= Dv                              (Base)
        | dx = dV                         (Change)

Evaluation environments
  F ::= F; Dv                             (Binding)
      | •                                 (Empty)

Change environments
  dF ::= dF; dDv                          (Binding)
       | •                                (Empty)

Figure 17.6: Target language iλAL (syntax).

Syntax. The syntax of iλAL is defined in Figure 17.6. Base terms of iλAL follow again λ-lifted A'NF, like λAL, except that a let-binding for a function application f x now binds an extra cache identifier cyfx besides output y. Cache identifiers have non-standard syntax: a cache identifier can be seen as a triple that refers to the value identifiers f, x and y. Hence, an α-renaming of one of these three identifiers must refresh the cache identifier accordingly. Result terms explicitly return cache C through syntax (x, C).

The syntax for caches has three cases: a cache can be empty, or it can prepend a value or a cache variable to a cache. In other words, a cache is a tree-like data structure which is isomorphic to an execution trace, containing both immediate values and the execution traces of the function calls issued during the evaluation.

The syntax for change terms of iλAL witnesses the CTS discipline followed by the derivatives: to determine dy, the derivative of f evaluated at point x with change dx expects the cache produced by evaluating y in the base term. The derivative returns the updated cache, which contains the intermediate results that would be gathered by the evaluation of f (x ⊕ dx). The result term of every change term returns a cache update dC in addition to the computed change.

The syntax for cache updates resembles the one for caches, but each value identifier x of the input cache is updated with its corresponding change dx.

Semantics. An evaluation environment F of iλAL contains not only values but also cache values. The syntax for values V includes closures, tuples, primitives and constants. The syntax for cache values Vc mimics the one for cache terms. The evaluation of change terms expects the evaluation environments dF to also include bindings for change values.

There are five kinds of change values: closure changes, tuple changes, literal changes, nil changes and replacements. Closure changes embed an environment dF and a code pointer for a function waiting for both a base value x and a cache C. By abuse of notation, we reuse the same syntax C to both deconstruct and construct caches. Other changes are similar to the ones found in λAL.

Base terms of the language are evaluated using a big-step semantics, defined in Figure 17.7.


Evaluation of base terms   F ⊢ M ⇓ (V, Vc)

[TResult]
  F(x) = V    F ⊢ C ⇓ Vc
  -------------------------
  F ⊢ (x, C) ⇓ (V, Vc)

[TTuple]
  F; y = (F(x̄)) ⊢ M ⇓ (V, Vc)
  --------------------------------
  F ⊢ let y = (x̄) in M ⇓ (V, Vc)

[TClosureCall]
  F(f) = F′[λx′. M′]
  F′; x′ = F(x) ⊢ M′ ⇓ (V′, V′c)
  F; y = V′; cyfx = V′c ⊢ M ⇓ (V, Vc)
  --------------------------------------
  F ⊢ let y, cyfx = f x in M ⇓ (V, Vc)

[TPrimitiveCall]
  F(f) = p    δp(F(x)) = (V′, V′c)
  F; y = V′; cyfx = V′c ⊢ M ⇓ (V, Vc)
  --------------------------------------
  F ⊢ let y, cyfx = f x in M ⇓ (V, Vc)

Evaluation of caches   F ⊢ C ⇓ Vc

[TEmptyCache]
  --------------
  F ⊢ • ⇓ •

[TCacheVar]
  F(x) = V    F ⊢ C ⇓ Vc
  -------------------------
  F ⊢ C, x ⇓ Vc, V

[TCacheSubCache]
  F(cyfx) = V′c    F ⊢ C ⇓ Vc
  ------------------------------
  F ⊢ C, cyfx ⇓ Vc, V′c

Figure 17.7: Target language iλAL (semantics of base terms and caches).

Judgment "F ⊢ M ⇓ (V, Vc)" is read "Under evaluation environment F, base term M evaluates to value V and cache Vc". The auxiliary judgment "F ⊢ C ⇓ Vc" evaluates cache terms into cache values. Rule [TResult] not only looks into the environment for the return value V, but also evaluates the returned cache C. Rule [TTuple] is similar to the corresponding rule of the source language, since no cache is produced by the allocation of a tuple. Rule [TClosureCall] works exactly as [SClosureCall], except that the cache value returned by the closure is bound to the cache identifier cyfx. In the same way, [TPrimitiveCall] resembles [SPrimitiveCall], but also binds cyfx. Rule [TEmptyCache] evaluates an empty cache term into an empty cache value. Rule [TCacheVar] computes the value of cache term C, x by appending the value of variable x to the cache value Vc computed for cache term C. Similarly, rule [TCacheSubCache] appends the cache value of a cache named cyfx to the cache value Vc computed for C.

Change terms of the target language are also evaluated using a big-step semantics, defined in Fig. 17.8. Judgment "dF ⊢ dM ⇓ (dV, Vc)" is read "Under evaluation environment dF, change term dM evaluates to change value dV and updated cache Vc". The first auxiliary judgment "dF ⊢ dC ⇓ Vc" defines evaluation of cache update terms. We omit the rules for this judgment, since it is similar to the one for cache terms, except that cached values are computed by x ⊕ dx, not simply x. The final auxiliary judgment "Vc ∼ C → dF" describes a limited form of pattern matching used by CTS derivatives: namely, how a cache pattern C matches a cache value Vc to produce a change environment dF.

Rule [TDResult] returns the final change value of a computation, as well as an updated cache resulting from the evaluation of the cache update term dC. Rule [TDTuple] resembles its counterpart in the source language, but the tuple for y is not built, as it has already been pushed in the environment by the cache.

As for λAL, there are four rules to deal with let-bindings, depending on the shape of the change bound to df in the environment. If df is bound to a replacement, the rule [TDReplaceCall] applies.


Evaluation of change terms   dF ⊢ dM ⇓ (dV, Vc)

[TDResult]
  dF(dx) = dV    dF ⊢ dC ⇓ Vc
  ------------------------------
  dF ⊢ (dx, dC) ⇓ (dV, Vc)

[TDTuple]
  dF; dy = (dF(dx̄)) ⊢ dM ⇓ (dV, Vc)
  -------------------------------------
  dF ⊢ let dy = (dx̄) in dM ⇓ (dV, Vc)

[TDReplaceCall]
  dF(df) = !Vf
  ⌊dF⌋2 ⊢ f x ⇓ (V′, V′c)
  dF; dy = !V′; cyfx = V′c ⊢ dM ⇓ (dV, Vc)
  ------------------------------------------------
  dF ⊢ let dy, cyfx = df dx cyfx in dM ⇓ (dV, Vc)

[TDClosureNil]
  dF(f, df) = (dFf[λx. Mf], nil)
  dFf; x = ⌊dF⌋2(x) ⊢ Mf ⇓ (V′y, V′c)
  dF; dy = !V′y; cyfx = V′c ⊢ dM ⇓ (dV, Vc)
  ------------------------------------------------
  dF ⊢ let dy, cyfx = df dx cyfx in dM ⇓ (dV, Vc)

[TDPrimitiveNil]
  dF(f, df) = (p, nil)    dF(x, dx) = (Vx, dVx)
  dF; dy, cyfx = Δp(Vx, dVx, dF(cyfx)) ⊢ dM ⇓ (dV, Vc)
  ------------------------------------------------------
  dF ⊢ let dy, cyfx = df dx cyfx in dM ⇓ (dV, Vc)

[TDClosureChange]
  dF(df) = dFf[λdx C. dMf]    dF(cyfx) ∼ C → dF′
  dFf; dx = dF(dx); dF′ ⊢ dMf ⇓ (dVy, V′c)
  dF; dy = dVy; cyfx = V′c ⊢ dM ⇓ (dV, Vc)
  ------------------------------------------------
  dF ⊢ let dy, cyfx = df dx cyfx in dM ⇓ (dV, Vc)

Binding of caches   Vc ∼ C → dF

[TMatchEmptyCache]
  -------------
  • ∼ • → •

[TMatchCachedValue]
  Vc ∼ C → dF
  ------------------------------
  Vc, V ∼ C, x → dF; (x = V)

[TMatchSubCache]
  Vc ∼ C → dF
  ----------------------------------------
  Vc, V′c ∼ C, cyfx → dF; (cyfx = V′c)

Figure 17.8: Target language iλAL (semantics of change terms and cache updates).


CTS translation of toplevel definitions   C⟦f = λx. t⟧

C⟦f = λx. t⟧ = f  = λx. M
               df = λdx C. D_{•,(x⊕dx)}⟦t⟧
  where (C, M) = C_{•,x}⟦t⟧

CTS differentiation of terms   D_dC⟦t⟧

D_dC⟦let y = f x in t⟧ = let dy, cyfx = df dx cyfx in dM
  where dM = D_{(dC,(y⊕dy),cyfx)}⟦t⟧

D_dC⟦let y = (x̄) in t⟧ = let dy = (dx̄) in dM
  where dM = D_{(dC,(y⊕dy))}⟦t⟧

D_dC⟦x⟧ = (dx, dC)

CTS translation of terms   C_C⟦t⟧

C_C⟦let y = f x in t⟧ = (C′, let y, cyfx = f x in M)
  where (C′, M) = C_{(C,y,cyfx)}⟦t⟧

C_C⟦let y = (x̄) in t⟧ = (C′, let y = (x̄) in M)
  where (C′, M) = C_{(C,y)}⟦t⟧

C_C⟦x⟧ = (C, (x, C))

Figure 17.9: Cache-Transfer Style (CTS) conversion from λAL to iλAL.

In that case, we reevaluate the function call in the updated environment ⌊dF⌋2 (defined similarly as in the source language). This evaluation leads to a new value V′, which replaces the original one, as well as to an updated cache for cyfx.

If df is bound to a nil change and f is bound to a closure, the rule [TDClosureNil] applies. This rule again mimics its counterpart in the source language, with the difference that only the resulting change and the updated cache are bound in the environment.

If df is bound to a nil change and f is bound to a primitive p, the rule [TDPrimitiveNil] applies. The derivative of p is invoked with the value of x, its change value, and the cache of the original call to p. The semantics of p's derivative is given by the builtin function Δp(–), as in the source language.

If df is bound to a closure change and f is bound to a closure, the rule [TDClosureChange] applies. The body of the closure change is evaluated under the closure change environment, extended with the value of the formal argument dx and with the environment resulting from binding the original cache value to the variables occurring in the cache pattern C. This evaluation leads to a change and an updated cache, bound in the environment to continue with the evaluation of the rest of the term.

17.3.5 CTS conversion from λAL to iλAL

CTS conversion from λAL to iλAL is defined in Figure 17.9. It comprises CTS differentiation D⟦–⟧, from λAL base terms to iλAL change terms, and CTS translation C⟦–⟧, from λAL to iλAL, which is overloaded over top-level definitions C⟦f = λx. t⟧ and terms C_C⟦t⟧.

By the first rule, C⟦–⟧ maps each source toplevel definition "f = λx. t" to the compiled code of the function f and to the derivative df of f, expressed in the target language iλAL. These target definitions are generated by a first call to the compilation function C_{•,x}⟦t⟧: it returns both M, the compiled body of f, and the cache term C, which contains the names of the intermediate values


CTS translation of values   C⟦v⟧

C⟦E[λx. t]⟧ = C⟦E⟧[λx. M]    where (C, M) = C_{•,x}⟦t⟧
C⟦ℓ⟧ = ℓ

CTS translation of change values   C⟦dv⟧

C⟦dE[λx dx. Dι⟦t⟧]⟧ = C⟦dE⟧[λdx C. D_{•,(x⊕dx)}⟦t⟧]    where (C, M) = C_{•,x}⟦t⟧
C⟦!v⟧ = !C⟦v⟧
C⟦dℓ⟧ = dℓ

CTS translation of value environments   C⟦E⟧

C⟦•⟧ = •
C⟦E; x = v⟧ = C⟦E⟧; x = C⟦v⟧

CTS translation of change environments   C⟦dE⟧

C⟦•⟧ = •
C⟦dE; x = v, dx = dv⟧ = C⟦dE⟧; x = C⟦v⟧, dx = C⟦dv⟧

Figure 17.10: Extending CTS translation to values, change values, environments and change environments.

computed by the evaluation of M. This cache term C is used as a cache pattern to define the second argument of the derivative of f. That way, we make sure that the shape of the cache expected by df is consistent with the shape of the cache produced by f. The derivative body of df is computed by the derivation call D_{•,(x⊕dx)}⟦t⟧.

CTS translation on terms, C_C⟦t⟧, accepts a term t and a cache term C. This cache is a fragment of output code: in tail position (t = x), the translation generates code to return both the result x and the cache C. When the transformation visits let-bindings, it outputs extra bindings for caches cyfx and appends all variables newly bound in the output to the cache used when visiting the let-body.

Similarly to C_C⟦t⟧, CTS derivation D_dC⟦t⟧ accepts a cache update dC to return in tail position. While cache terms record intermediate results, cache updates record result updates. For let-bindings, to update y by change dy, CTS derivation appends to dC the term y ⊕ dy to replace y.
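For instance (a worked example of ours, with g an arbitrary unary function), applying the rules of Figure 17.9 to the source definition

  f = λx. let y = g x in let z = g y in z

produces the base function and derivative

  f  = λx. let y, cygx = g x in
           let z, czgy = g y in
           (z, (•, x, y, cygx, z, czgy))

  df = λdx (•, x, y, cygx, z, czgy).
         let dy, cygx = dg dx cygx in
         let dz, czgy = dg dy czgy in
         (dz, (•, (x ⊕ dx), (y ⊕ dy), cygx, (z ⊕ dz), czgy))

where the cache pattern bound by df matches exactly the cache that f returns, and the cache update returned by df records updated values for all intermediate results.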

17.3.6 Soundness of CTS conversion

In this section we outline definitions and main lemmas needed to prove CTS conversion sound. The proof is based on a mostly straightforward simulation in lock-step, but two subtle points emerge. First, we must relate λAL environments, which do not contain caches, with iλAL environments, which do. Second, while evaluating CTS derivatives, the evaluation environment mixes caches from the base computation and updated caches computed by the derivatives.

Evaluation commutes with CTS conversion. Figure 17.10 extends CTS translation to values, change values, environments and change environments. CTS translation of base terms commutes with our semantics:


  ----------------
  dF ↑ • ⇝i dF

  dF ↑ C ⇝i dF′
  ---------------------
  dF ↑ (C, x) ⇝i dF′

  dF ↑ C ⇝i dF′    ⌊dF⌋i ⊢ let y, cyfx = f x in (y, cyfx) ⇓ (_, Vc)
  -------------------------------------------------------------------
  dF ↑ (C, cyfx) ⇝i (dF′; cyfx = Vc)

Figure 17.11: Extension of an environment with cache values, dF ↑ C ⇝i dF′ (for i = 1, 2).

Lemma 17.3.7 (Evaluation commutes with CTS translation)
For all E, t and v such that E ⊢ t ⇓ v, and for all C, there exists Vc such that C⟦E⟧ ⊢ C_C⟦t⟧ ⇓ (C⟦v⟧, Vc).

Stating a corresponding lemma for CTS translation of derived terms is trickier. If the derivative of t evaluates correctly in some environment (that is, dE ⊢ Dι⟦t⟧ ⇓ dv), the CTS derivative D_dC⟦t⟧ cannot be evaluated in environment C⟦dE⟧: a CTS derivative can only evaluate against environments containing cache values from the base computation, but no cache values appear in C⟦dE⟧.

As a fix, we enrich C⟦dE⟧ with the values of a cache C, using the pair of judgments dF ↑ C ⇝i dF′ (for i = 1, 2) defined in Fig. 17.11. Judgment dF ↑ C ⇝i dF′ is read "Target change environment dF′ extends dF with the original (for i = 1), or updated (for i = 2), values of cache C". Since C is essentially a list of variables containing no values, cache values in dF′ must be computed from ⌊dF⌋i.

Lemma 17.3.8 (Evaluation commutes with CTS differentiation)
Let C be such that (C, _) = C_•⟦t⟧. For all dE, t and dv, if dE ⊢ Dι⟦t⟧ ⇓ dv and C⟦dE⟧ ↑ C ⇝1 dF, then there exists Vc such that dF ⊢ D_dC⟦t⟧ ⇓ (C⟦dv⟧, Vc).

The proof of this lemma is not immediate, since during the evaluation of D_dC⟦t⟧ the new caches replace the old caches. In our Coq development, we enforce a physical separation between the part of the environment containing old caches and the one containing new caches, and we maintain the invariant that the second part of the environment corresponds to the remaining part of the term.

Soundness of CTS conversion. Finally, we can state soundness of CTS differentiation relative to differentiation. The theorem says that (a) the CTS derivative D_C⟦t⟧ computes the CTS translation C⟦dv⟧ of the change computed by the standard derivative Dι⟦t⟧; (b) the updated cache Vc2 produced by the CTS derivative coincides with the cache produced by the CTS-translated base term M in the updated environment ⌊dF2⌋2. We must use ⌊dF2⌋2 instead of dF2 to evaluate the CTS-translated base term M, since dF2, produced by environment extension, contains updated caches, changes and original values. Since we require a correct cache via condition (b), we can use this cache to invoke the CTS derivative on further changes, as described in Sec. 17.2.2.

Theorem 17.3.9 (Soundness of CTS differentiation w.r.t. differentiation)
Let C and M be such that (C, M) = C_•⟦t⟧. For all dE, t and dv such that dE ⊢ Dι⟦t⟧ ⇓ dv, and for all dF1 and dF2 such that

  C⟦dE⟧ ↑ C ⇝1 dF1
  C⟦dE⟧ ↑ C ⇝2 dF2
  ⌊dF1⌋1 ⊢ M ⇓ (V1, Vc1)
  ⌊dF2⌋2 ⊢ M ⇓ (V2, Vc2)

we have

  dF1 ⊢ D_C⟦t⟧ ⇓ (C⟦dv⟧, Vc2).


17.4 Incrementalization case studies

In this section we investigate whether our transformations incrementalize programs efficiently in a typed language such as Haskell. And indeed, after providing support for the needed bulk operations on sequences, bags and maps, we successfully transform a few case studies and obtain efficient incremental programs, that is, ones for which incremental computation is faster than from-scratch recomputation.

We type CTS programs by associating to each function f a cache type FC. We can transform programs that use higher-order functions by making closure construction explicit.

We illustrate those points on three case studies: average computation over bags of integers, a nested loop over two sequences, and a more involved example inspired by Koch et al.'s work on incrementalizing database queries.

In all cases, we confirm that results are consistent between from-scratch recomputation and incremental evaluation.

Our benchmarks were compiled by GHC 8.0.2. They were run on a 2.60GHz dual-core Intel i7-6600U CPU with 12GB of RAM, running Ubuntu 16.04.

17.4.1 Averaging integer bags

Sec. 17.2.2 motivates our transformation with a running example of computing the average over a bag of integers. We represent bags as maps from elements to (possibly negative) multiplicities. Earlier work [Cai et al. 2014; Koch et al. 2014] represents bag changes as bags of removed and added elements: to update element a with change da, one removes a and adds a ⊕ da. Instead, our bag changes contain element changes: a valid bag change is either a replacement change (with a new bag) or a list of atomic changes. An atomic change contains an element a, a valid change for that element da, a multiplicity n and a change to that multiplicity dn. Updating a bag with an atomic change means removing n occurrences of a and inserting n ⊕ dn occurrences of the updated element a ⊕ da.

type Bag a = Map a Z
type ∆(Bag a) = Replace (Bag a) | Ch [AtomicChange a]
data AtomicChange a = AtomicChange a (∆a) Z (∆Z)

This change type enables both users and derivatives to express various bag changes efficiently, such as inserting a single element or changing a single element, but also changing some, but not all, occurrences of an element in a bag.

insertSingle :: a → ∆(Bag a)
insertSingle a = Ch [AtomicChange a 𝟘a 0 (Ch 1)]
changeSingle :: a → ∆a → ∆(Bag a)
changeSingle a da = Ch [AtomicChange a da 1 𝟘1]
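To make the change structure concrete, here is a sketch (ours, in this chapter's pseudo-Haskell) of how ⊕ applies such bag changes; it assumes ⊕ on elements and on multiplicities, and Data.Map imported qualified as Map:

-- Apply a bag change: a replacement discards the old bag; atomic changes
-- remove n occurrences of a and insert n ⊕ dn occurrences of a ⊕ da.
oplusBag :: Ord a ⇒ Bag a → ∆(Bag a) → Bag a
oplusBag b (Replace b2) = b2
oplusBag b (Ch changes) = foldl applyAtomic b changes
  where
    applyAtomic b (AtomicChange a da n dn) =
      Map.insertWith (+) (a ⊕ da) (n ⊕ dn) (Map.insertWith (+) a (negate n) b)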

Based on this change structure, we provide efficient primitives and their derivatives. The CTS variant of map, which we call mapC, takes a function fC in CTS and a bag as, and produces a bag and a cache. Because map is not self-maintainable, the cache stores the arguments fC and as. In addition, the cache stores, for each invocation of fC (and therefore for each distinct element in as), the result of fC of type b and the cache of type c. The incremental function dmapC has to produce an updated cache. We check that this is the same cache that mapC would produce when applied to the updated inputs, using the QuickCheck library.


Following ideas inspired by Rossberg et al. [2010], all higher-order functions (and typically also their caches) are parametric over the cache types of their function arguments. Here, the functions mapC and dmapC and the cache type MapC are parametric over the cache type c of fC and dfC.

map :: (a → b) → Bag a → Bag b
type MapC a b c = (a → (b, c), Bag a, Map a (b, c))
mapC :: (a → (b, c)) → Bag a → (Bag b, MapC a b c)
dmapC :: (∆a → c → (∆b, c)) → ∆(Bag a) → MapC a b c → (∆(Bag b), MapC a b c)
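To illustrate, a minimal sketch (ours; the real implementation is more efficient) of mapC matching this cache type:

-- Call fC once per distinct element of as, caching each result and
-- sub-cache; rebuild the output bag by summing multiplicities of equal
-- results.
mapC :: (Ord a, Ord b) ⇒ (a → (b, c)) → Bag a → (Bag b, MapC a b c)
mapC fC as = (bs, (fC, as, results))
  where
    results = Map.fromList [(a, fC a) | a ← Map.keys as]
    bs = Map.fromListWith (+) [(fst (results Map.! a), n) | (a, n) ← Map.toList as]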

We wrote the length and sum functions used in our benchmarks in terms of the primitives map and foldGroup. We applied our transformations to get lengthC, dlengthC, sumC and dsumC. This transformation could in principle be automated. Implementing the CTS variant and CTS derivative of primitives efficiently requires manual work.
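For reference, the running example can be phrased in terms of these primitives roughly as follows (a sketch, ours; foldGroup is assumed here to sum a bag of integers using its group structure):

sum :: Bag Z → Z
sum = foldGroup

length :: Bag Z → Z
length = foldGroup . map (λ_ → 1)

avg :: Bag Z → Z
avg bag = sum bag `div` length bag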

We evaluate whether we can produce an updated result with davgC faster than by from-scratch recomputation with avg. We use the criterion library for benchmarking. The benchmarking code and our raw results are available in the supplementary material. We fix an initial bag of size n; it contains the integers from 1 to n. We define a sequence of consecutive changes of length r as the insertion of 1, the deletion of 1, the insertion of 2, the deletion of 2, and so on. We update the initial bag with these changes, one after another, to obtain a sequence of input bags.

We measure the time taken by from-scratch recomputation. We apply avg to each input bag in the sequence and fully force all results. We report the time from-scratch recomputation takes as this measured time divided by the number r of input bags. We measure the time taken by our CTS variant avgC similarly. We make sure to fully force all results and caches that avgC produces on the sequence of input bags.

We measure the time taken by our CTS derivative davgC. We assume a fully forced initial result and cache. We apply davgC to each change in the sequence of bag changes, using the cache from the previous invocation. We then apply the result change to the previous result. We fully force the obtained sequence of results. Finally, we measure the time taken to only produce the sequence of result changes, without applying them.

Because of laziness, choosing what exactly to measure is tricky. We ensure that reported times include the entire cost of computing the whole sequence of results. To measure this cost, we ensure any thunks within the sequence are forced. We need not force any partial results, such as output changes or caches. Doing so would distort the results, because forcing the whole cache can cost asymptotically more than running derivatives. However, not using the cache at all underestimates the cost of incremental computation. Hence, we measure a sequence of consecutive updates, each using the cache produced by the previous one.

The plot in Fig. 17.12a shows execution time versus the size n of the initial input. To produce the initial result and cache, avgC takes longer than the original avg function takes to produce just the result. This is to be expected. Producing the result incrementally is much faster than from-scratch recomputation. This speedup increases with the size of the initial bag. For an initial bag size of 800, incrementally updating the result is 70 times faster than from-scratch recomputation.

The difference between only producing the output change, versus producing the output change and updating the result, is very small in this example. This is to be expected, because in this example the result is an integer, and updating and evaluating an integer is fast. We will see an example where there is a bigger difference between the two.


[Figure 17.12: Plots of execution time (in milliseconds) versus input size (200-800), for the original, caching, change-only and incremental computations. (a) Benchmark results for avg. (b) Benchmark results for totalPrice.]

17.4.2 Nested loops over two sequences

We implemented incremental sequences and related primitives following Firsov and Jeltsch [2016]: our change operations and first-order operations (such as concat) reuse their implementation. On the other hand, we must extend higher-order operations such as map to handle non-nil function changes and caching. A correct and efficient CTS incremental dmapC function has to work differently depending on whether the given function change is nil or not: for a non-nil function change it has to go over the input sequence; for a nil function change, it can avoid that.

Consider the running example again, this time in A'NF. The partial application of a lambda-lifted function constructs a closure. We made that explicit with a closure function.

cartesianProduct xs ys = let
  mapPairYs = closure mapPair ys
  xys = concatMap mapPairYs xs
  in xys

mapPair ys = λx → let
  pairX = closure pair x
  xys = map pairX ys
  in xys

While the only valid change for closed functions is their nil change, for closures we can have non-nil function changes. We represent closed functions and closures as variants of the same type. Correspondingly, we represent changes to a closed function and changes to a closure as variants of the same type of function changes. We inspect this representation at runtime to find out if a function change is a nil change.

data Fun a b c where
  Closed :: (a → (b, c)) → Fun a b c
  Closure :: (e → a → (b, c)) → e → Fun a b c

data ∆(Fun a b c) where
  DClosed :: (∆a → c → (∆b, c)) → ∆(Fun a b c)
  DClosure :: (∆e → ∆a → c → (∆b, c)) → ∆e → ∆(Fun a b c)
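For example (a sketch of ours), both function variants are applied uniformly at call sites, while derivatives pattern-match on the change variant (DClosed versus DClosure, the latter carrying an environment change) to decide at runtime whether a function change is nil:

applyFun :: Fun a b c → a → (b, c)
applyFun (Closed f) a = f a
applyFun (Closure f e) a = f e a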


[Plots of execution time (in seconds) versus input size (200-800), for the original, caching, change-only and incremental computations. (a) Benchmark results for Cartesian product, changing the inner sequence. (b) Benchmark results for Cartesian product, changing the outer sequence.]

We have also evaluated another representation of function changes with different tradeoffs, where we use defunctionalization instead of closure conversion. We discuss the use of defunctionalization in Appendix D.

We use the same benchmark setup as in the benchmark for the average computation on bags. The input of size n is a pair of sequences (xs, ys). Each sequence initially contains the integers from 1 to n. Updating the result in reaction to a change dxs to xs takes less time than updating the result in reaction to a change dys to ys. The reason is that, when dys is a nil change, we dynamically detect that the function change passed to dconcatMapC is the nil change, and avoid looping over xs entirely.

We benchmark changes to the outer sequence xs and the inner sequence ys separately. In both cases, the sequence of changes consists of insertions and deletions of different numbers at different positions. When we benchmark changes to the outer sequence xs, we take this sequence of changes for xs, and all changes to ys are nil changes. When we benchmark changes to the inner sequence ys, we take this sequence of changes for ys, and all changes to xs are nil changes.

In this example, preparing the cache takes much longer than from-scratch recomputation. The speedup of incremental computation over from-scratch recomputation increases with the size of the initial sequences. It reaches 2.5 for a change to the inner sequence and 8.8 for a change to the outer sequence, when the initial sequences have size 800. Producing and fully forcing only the changes to the results is 5 times faster for a change to the inner sequence and 370 times faster for a change to the outer sequence.

While we do get speedups for both kinds of changes (to the inner and to the outer sequence), the speedups for changes to the outer sequence are bigger. Due to our benchmark methodology, we have to fully force updated results. This alone takes time quadratic in the size n of the input sequences. Therefore, we also report the time it takes to produce and fully force only the output changes, which is of a lower time complexity.

17.4.3 Indexed joins of two bags

As expected, we found that we can compose functions into larger and more complex programs and apply CTS differentiation to get a fast incremental program.

For this example, imagine we have a bag of orders and a bag of line items. An order is a pair of an integer key and an exchange rate, represented as an integer. A line item is a pair of a key of a corresponding order and a price, represented as an integer. The price of an order is the sum of the prices of the corresponding line items, multiplied by the exchange rate. We want to find the total price, defined as the sum of the prices of all orders.

To do so efficiently, we build an index for line items and an index for orders. We represent indexes as maps. We build a map from order key to the sum of the prices of the corresponding line items. Because orders are not necessarily unique by key, and because multiplication distributes over addition, we can build an index mapping each order key to the sum of all exchange rates for this order key. We then compute the total price by merging the two maps by key, multiplying corresponding sums of exchange rates and sums of prices. We compute the total price as the sum of those products.

Because our indexes are maps, we need a change structure for maps. We generalize the change structure for bags by generalizing from multiplicities to arbitrary values, as long as they have a group structure. A change to a map is either a replacement change (with a new map) or a list of atomic changes. An atomic change is a key k, a valid change to that key dk, a value a and a valid change to that value da. To update a map with an atomic change means to use the group structure of the type of a to subtract a at key k, and to add a ⊕ da at key k ⊕ dk.

type ∆(Map k a) = Replace (Map k a) | Ch [AtomicChange k a]
data AtomicChange k a = AtomicChange k (∆k) a (∆a)
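A sketch (ours) of the corresponding ⊕ for maps, writing gplus and gnegate for the assumed group operations on values:

-- Apply a map change: subtract a at key k, then add a ⊕ da at key k ⊕ dk.
oplusMap :: (Ord k, Group a) ⇒ Map k a → ∆(Map k a) → Map k a
oplusMap m (Replace m2) = m2
oplusMap m (Ch changes) = foldl applyAtomic m changes
  where
    applyAtomic m (AtomicChange k dk a da) =
      Map.insertWith gplus (k ⊕ dk) (a ⊕ da) (Map.insertWith gplus k (gnegate a) m)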

Bags have a group structure as long as the elements have a group structure. Maps have a group structure as long as the values have a group structure. This means, for example, that we can efficiently incrementalize programs on maps from keys to bags of elements. We implemented efficient caching and incremental primitives for maps, and verified their correctness with QuickCheck.

To build the indexes, we use a groupBy function built from the primitive functions foldMapGroup on bags and singleton for bags and maps, respectively. While computing the indexes with groupBy is self-maintainable, merging them is not. We need to cache and incrementally update the intermediately created indexes to avoid recomputing them.
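The groupBy function itself can be sketched as follows (ours, with a minimal stand-in for foldMapGroup, here specialized to maps of bags; the real primitive is more general):

foldMapGroup :: Ord k ⇒ (a → Z → Map k (Bag a)) → Bag a → Map k (Bag a)
foldMapGroup f = Map.foldrWithKey (λa n acc → Map.unionWith (Map.unionWith (+)) (f a n) acc) Map.empty

-- Each element a, with multiplicity n, contributes a one-entry map from
-- its key f a to the singleton bag containing a with multiplicity n.
groupBy :: (Ord k, Ord a) ⇒ (a → k) → Bag a → Map k (Bag a)
groupBy f = foldMapGroup (λa n → Map.singleton (f a) (Map.singleton a n))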

type Order = (Z, Z)
type LineItem = (Z, Z)

totalPrice :: Bag Order → Bag LineItem → Z
totalPrice orders lineItems = let
  orderIndex = groupBy fst orders
  orderSumIndex = Map.map (Bag.foldMapGroup snd) orderIndex
  lineItemIndex = groupBy fst lineItems
  lineItemSumIndex = Map.map (Bag.foldMapGroup snd) lineItemIndex
  merged = Map.merge orderSumIndex lineItemSumIndex
  total = Map.foldMapGroup multiply merged
  in total

This example is inspired by Koch et al. [2014]. Unlike them, we don't automate indexing, so to get good performance we start from a program that explicitly uses indexes.

We evaluate the performance in the same way we did for the average computation on bags. The initial input of size n is a pair of bags where both contain the pairs (i, i) for i between 1 and n. The sequence of changes consists of alternations between insertion and deletion of pairs (j, j) for different j. We alternate the insertions and deletions between the orders bag and the line items bag.

Our CTS derivative of the original program produces updated results much faster than from-scratch recomputation, and again the speedup increases with the size of the initial bags. For an initial size of the two bags of 800, incrementally updating the result is 180 times faster than from-scratch recomputation.

17.5 Limitations and future work

In this section we describe limitations to be addressed in future work.

17.5.1 Hiding the cache type

In our experiments, functions of the same type f1, f2 :: A → B can be transformed to CTS functions f1 :: A → (B, C1), f2 :: A → (B, C2) with different cache types C1, C2, since cache types depend on the implementation. We can fix this problem, with some runtime overhead, by using a single cache type Cache, defined as a tagged union of all cache types. If we defunctionalize function changes, we can index this cache type with tags representing functions, but other approaches are possible, and we omit details. We conjecture (but have not proven) that this fix gives a type-preserving translation, but we leave this question for future work.
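A minimal sketch (ours, with toy types and functions) of this fix: f1C and f2C have different cache shapes, yet share the single CTS type Int → (Int, Cache).

data Cache = CacheF1 Int          -- cache shape of f1 (hypothetical)
           | CacheF2 (Int, Int)   -- cache shape of f2 (hypothetical)

f1C, f2C :: Int → (Int, Cache)
f1C x = let y = x + 1 in (y * 2, CacheF1 y)
f2C x = let y = x + 1; z = y * y in (z, CacheF2 (y, z))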

17.5.2 Nested bags

Our implementation of bags makes nested bags overly slow: we represent bags as tree-based maps from elements to multiplicities, so looking up a bag b in a bag of bags takes time proportional to the size of b. Possible solutions in this case include shredding, as done by Koch et al. [2016]. We have no such problem for nested sequences, or for other nested data which can be addressed in O(1).

17.5.3 Proper tail calls

CTS transformation conflicts with proper tail calls, as it turns most tail calls into non-tail calls. In A'NF syntax, tail calls such as let y = f x in g y become let y = f x in let z = g y in z, and in CTS that becomes let (y, cy) = f x in let (z, cz) = g y in (z, (cy, cz)), where the call to g is genuinely not in tail position. This prevents recursion on deeply nested data structures like long lists; but such programs incrementalize inefficiently if deeply nested data is affected, so it is advisable to replace lists by other sequences anyway. It's unclear whether such fixes are available for other uses of tail calls.

17.5.4 Pervasive replacement values

Thanks to replacement changes, we can compute a change from any v1 to any v2 in constant time. Cai et al. [2014] use a difference operator ⊖ instead, but it's hard to implement ⊖ in constant time on values of non-constant size. So our formalization and implementation allow replacement values everywhere, to ensure all computations can be incrementalized in some sense. Supporting replacement changes introduces overhead even if they are not used, because it prevents writing self-maintainable CTS derivatives. Hence, to improve performance, one should consider dropping support for replacement values and restricting supported computations. Consider a call to a binary CTS derivative dfc da db c1, after computing (y1, c1) = fc a1 b1: if db is a replacement change !b2, then dfc must compute a result afresh by invoking fc a2 b2, and a2 = a1 ⊕ da requires remembering the previous input a1 inside c1. By the same argument, c1 must also remember input b1. Worse, replacement values are only needed to handle cases where incremental computation reduces to recomputation, because the new input is completely different, or because the condition of an if expression changed. Such changes to conditions are forbidden by other works [Koch et al. 2016], we believe for similar reasons.

17.5.5 Recomputing updated values

In some cases, the same updated input might be recomputed more than once. If a derivative df needs some base input x (that is, if df is not self-maintainable), df's input cache will contain a copy of x, and df's output cache will contain its updated value x ⊕ dx. When all or most derivatives are self-maintainable, this is convenient, because in most cases updated inputs will not need to be computed. But if most derivatives are not self-maintainable, the same updated input might be computed multiple times: specifically, if derivative dh calls functions df and dg, and both df and dg need the same base input x, the caches for both df and dg will contain the updated value x ⊕ dx, computed independently. Worse, because of pervasive replacement values (Sec. 17.5.4), derivatives in our case studies tend not to be self-maintainable.

In some cases, such repeated updates should be removable by a standard optimizer after inlining and common-subexpression elimination, but it is unclear how often this happens. To solve this problem, derivatives could take and return both old inputs x1 and updated ones x2 = x1 ⊕ dx, and x2 could be computed at the single location where dx is bound. In this case, to avoid updates for unused base inputs, we would have to rely more on absence analysis (Sec. 17.5.6); pruning function inputs appears easier than pruning caches. Otherwise, computations of updated inputs that are not used in a lazy context might cause space leaks, where thunks for x2 = x1 ⊕ dx1, x3 = x2 ⊕ dx2, and so on, might accumulate and grow without bounds.

17.5.6 Cache pruning via absence analysis

To reduce memory usage and runtime overhead, it should be possible to automatically remove from transformed programs any caches or cache fragments that are not used (directly or indirectly) to compute outputs. Liu [2000] performs this transformation on CTS programs by using absence analysis, which was later extended to higher-order languages by Sergey et al. [2014]. In lazy languages, absence analysis removes thunks that are not needed to compute the output. We conjecture that, as long as the analysis is extended to not treat caches as part of the output, it should be able to remove unused caches or inputs (assuming unused inputs exist, see Sec. 17.5.4).

17.5.7 Unary vs. n-ary abstraction

We only show our transformation correct for unary functions and tuples. But many languages provide efficient support for applying curried functions, such as div :: Z → Z → Z, via either the push-enter or eval-apply evaluation model. For instance, invoking div m n should not require the overhead of a function invocation for each argument [Marlow and Jones 2006]. Naively transforming such curried functions to CTS would produce a function divC of type Z → ((Z → (Z, DivC2)), DivC1), with DivC1 = (), which adds excessive overhead.³ Based on preliminary experiments, we believe we can straightforwardly combine our approach with a push-enter or eval-apply evaluation strategy for transformed curried functions. Alternatively, we expect that translating Haskell to Strict Core [Bolingbroke and Peyton Jones 2009] would take care of turning Haskell into a language where function types describe their arity, which our transformation can easily take care of. We leave a more proper investigation for future work.

³In Sec. 17.2 and our evaluation we use curried functions and never need to use this naive encoding, but only because we always invoke functions of known arity.


17.6 Related work

Of all research on incremental computation, in both programming languages and databases [Gupta and Mumick 1999; Ramalingam and Reps 1993], we discuss the most closely related works. Other work more closely related to cache-transfer style is discussed in Chapter 19.

Previous work on cache-transfer style. Liu [2000]'s work has been the fundamental inspiration for this work, but her approach has no correctness proof and is restricted to a first-order untyped language (in part because no absence analysis for higher-order languages was available). Moreover, while the idea of cache-transfer style is similar, it's unclear if her approach to incrementalization would extend to higher-order programs.

Firsov and Jeltsch [2016] also approach incrementalization by code transformation, but their approach does not deal with changes to functions. Instead of transforming functions written in terms of primitives, they provide combinators to write CTS functions and derivatives together. On the other hand, they extend their approach to support mutable caches, while restricting to immutable ones, as we do, might lead to a logarithmic slowdown.

Finite differencing. Incremental computation on collections or databases by finite differencing has a long tradition [Paige and Koenig 1982; Blakeley et al. 1986]. The most recent and impressive line of work is the one on DBToaster [Koch 2010; Koch et al. 2014], a highly efficient approach to incrementalizing queries over bags by combining iterated finite differencing with other program transformations. They show asymptotic speedups both in theory and through experimental evaluations. Changes are only allowed for datatypes that form groups (such as bags or certain maps), but not, for instance, for lists or sets. Similar ideas were recently extended to higher-order and nested computation [Koch et al. 2016], though still restricted to datatypes that can be turned into groups.

Logical relations. To study correctness of incremental programs, we use a logical relation among initial values v1, updated values v2 and changes dv. To define a logical relation for an untyped λ-calculus, we use a step-indexed logical relation, following Appel and McAllester [2001] and Ahmed [2006]; in particular, our definitions are closest to the ones by Acar et al. [2008], who also work with an untyped language, big-step semantics and (a different form of) incremental computation. However, their logical relation does not mention changes explicitly, since changes do not have first-class status in their system. Moreover, we use environments rather than substitution, and use a slightly different step-indexing for our semantics.

Dynamic incrementalization. The approaches to incremental computation with the widest applicability are in the family of self-adjusting computation [Acar 2005, 2009], including its descendant Adapton [Hammer et al. 2014]. These approaches incrementalize programs by combining memoization and change propagation: after creating a trace of base computations, updated inputs are compared with old ones in O(1) to find corresponding outputs, which are updated to account for input modifications. Compared to self-adjusting computation, Adapton only updates results when they are demanded. As usual, incrementalization is not efficient on arbitrary programs. To incrementalize efficiently, a program must be designed so that input changes produce small changes to the computation trace; refinement type systems have been designed to assist in this task [Çiçek et al. 2016; Hammer et al. 2016]. Instead of comparing inputs by pointer equality, Nominal Adapton [Hammer et al. 2015] introduces first-class labels to identify matching inputs, enabling reuse in more situations.


Recording traces often has significant overheads in both space and time (slowdowns of 20-30× are common), though Acar et al. [2010] alleviate that with datatype-specific support for tracing higher-level operations, while Chen et al. [2014] reduce that overhead by optimizing traces to not record redundant entries, and by logging chunks of operations at once, which reduces memory overhead but also potential speedups.

17.7 Chapter conclusion

We have presented a program transformation which turns a functional program into its derivative and efficiently shares redundant computations between them, thanks to a statically computed cache. Although our first practical case studies show promising results, it now remains to integrate this transformation into a realistic compiler.

Chapter 18

Towards differentiation for System F

Differentiation is closely related to both logical relations and parametricity, as noticed by various authors in discussions of differentiation. Appendix C presents novel proofs of ILC correctness by adapting some logical relation techniques. As a demonstration, we define in this chapter differentiation for System F, by adapting results about parametricity. We stop short of a full proof that this generalization is correct, but we have implemented and tested it on a mature implementation of a System F typechecker; we believe the proof will mostly be a straightforward adaptation of existing ones about parametricity, but we leave verification for future work. A key open issue is discussed in Remark 18.2.1.

History and motivation. Various authors noticed that differentiation appears related to (binary) parametricity (including Atkey [2015]). In particular, it resembles a transformation presented by Bernardy and Lasson [2011] for arbitrary PTSs.¹ By analogy with unary parametricity, Yufei Cai sketched an extension of differentiation for arbitrary PTSs, but many questions remained open, and at the time our correctness proof for ILC was significantly more involved and trickier to extend to System F, since it was defined in terms of denotational equivalence. Later, we reduced the proof core to defining a logical relation, proving its fundamental property and a few corollaries, as shown in Chapter 12. Extending this logical relation to System F proved comparably more straightforward.

Parametricity versus ILC. Both parametricity and ILC define logical relations across program executions on different inputs. When studying parametricity, differences are only allowed in the implementations of abstractions (through abstract types or other mechanisms). To be related, different implementations of the same abstraction must give results that are equivalent according to the calling program. Indeed, parametricity defines not just a logical relation but a logical equivalence, which can be shown to be equivalent to contextual equivalence (as explained, for instance, by Harper [2016, Ch. 48] or by Ahmed [2006]).

When studying ILC, logical equivalence between terms t1 and t2 (written (t1, t2) ∈ P⟦τ⟧) appears to be generalized by the existence of a valid change dt between t1 and t2 (that is, dt ▷ t1 → t2 : τ). As in earlier chapters, if terms t1 and t2 are equivalent, any valid change dt between them is a nil change, but validity allows describing other changes.

¹Bernardy and Lasson were not the first to introduce such a transformation, but most earlier work focuses on System F, and our presentation follows theirs and uses their added generality. We refer for details to existing discussions of related work [Wadler 2007; Bernardy and Lasson 2011].



18.1 The parametricity transformation

First, we show a variant of their parametricity transformation, adapted to a variant of STLC without base types but with type variables. Presenting λ→ using type variables will help when we come back to System F, and allows discussing parametricity on STLC. This transformation is based on the presentation of STLC as calculus λ→, a Pure Type System (PTS) [Barendregt 1992, Sec. 5.2].

Background. In PTSs, terms and types form a single syntactic category, but are distinguished through an extended typing judgement (written Γ ⊢ t : t) using additional terms called sorts. Function types σ → τ are generalized by Π-types Πx : s. t, where x can in general appear free in t: a generalization of Π-types in dependently-typed languages, but also of universal types ∀X. T in the System F family; if x does not appear free in t, we write s → t. Typing restricts whether terms of some sort s1 can abstract over terms of sort s2; different PTSs allow different combinations of sorts s1 and s2 (specified by relations R), but much of the metatheory for PTSs is parameterized over the choice of combinations. In STLC presented as a PTS, there is an additional sort ⋆; those terms τ such that ⊢ τ : ⋆ are types.² We do not intend to give a full introduction to PTSs, only to introduce what's strictly needed for our presentation, and we refer the readers to existing presentations [Barendregt 1992].

Instead of base types, λ→ uses uninterpreted type variables α, but does not allow terms to bind them. Nevertheless, we can write open terms, free in type variables for, say, naturals, and in term variables for operations on naturals. STLC restricts Π-types Πx : A. B to the usual arrow types A → B through R; one can show that in Πx : A. B, variable x cannot occur free in B.

Parametricity. Bernardy and Lasson show how to transform a typed term Γ ⊢ t : τ in a strongly normalizing PTS P into a proof that t satisfies a parametricity statement for τ. The proof is in a logic represented by PTS P². PTS P² is constructed uniformly from P, and is strongly normalizing whenever P is. When the base PTS P is λ→, Bernardy and Lasson's transformation produces terms in a PTS λ²→, produced by transforming λ→. PTS λ²→ extends λ→ with a separate sort ⋄ of propositions, together with enough abstraction power to abstract propositions over values. Like λ→, λ²→ uses uninterpreted type variables α and does not allow abstracting over them.

In parametricity statements about λ→, we write (t1, t2) ∈ P⟦τ⟧ for a proposition (hence living in ⋄) that states that t1 and t2 are related. This proposition is defined as usual via a logical relation. We write dxx for a proof that x1 and x2 are related. For type variables α, transformed terms abstract over an arbitrary relation R α between α1 and α2. When α is instantiated by τ, R α can (but does not have to) be instantiated with the relation (–, –) ∈ P⟦τ⟧; R α abstracts over arbitrary candidate relations (similar to the notion of reducibility candidates [Girard et al. 1989, Ch. 14.1.1]). Allowing alternative instantiations makes parametricity statements more powerful, and it is also necessary to define parametricity for impredicative type systems (like System F) in a predicative ambient logic.³

A transformed term P⟦t⟧ relates two executions of term t in different environments: it can be read as a proof that term t maps related inputs to related outputs. The proof strategy that P⟦t⟧ uses is the standard one for proving fundamental properties of logical relations; but instead of doing a proof in the metalevel logic (by induction on terms and typing derivations), λ²→ serves as an object-level logic, and P⟦–⟧ generates proof terms in this logic by recursion on terms. At the metalevel, Bernardy and Lasson [2011, Th. 3] prove that for any well-typed term Γ ⊢ t : τ, proof P⟦t⟧

²Calculus λ→ also has a sort □ such that ⊢ ⋆ : □, but □ itself has no type. Such a sort appears only in PTS typing derivations, not in well-typed terms themselves, so we do not discuss it further.

³Specifically, if we require R α to be instantiated with the logical relation itself for τ when α is instantiated with τ, the definition becomes circular. Consider the definition of (t1, t2) ∈ P⟦∀α. τ⟧: type variable α can be instantiated with the same type ∀α. τ, so the definition of (t1, t2) ∈ P⟦∀α. τ⟧ becomes impredicative.


shows that t satisfies the parametricity statement for type τ, given the hypotheses P⟦Γ⟧ (or, more formally, P⟦Γ⟧ ⊢ P⟦t⟧ : (S1⟦t⟧, S2⟦t⟧) ∈ P⟦τ⟧).

(f1, f2) ∈ P⟦σ → τ⟧ = Π(x1 : S1⟦σ⟧) (x2 : S2⟦σ⟧) (dxx : (x1, x2) ∈ P⟦σ⟧).
                        (f1 x1, f2 x2) ∈ P⟦τ⟧
(x1, x2) ∈ P⟦α⟧ = R α x1 x2

P⟦x⟧ = dxx
P⟦λ(x : σ) → t⟧ = λ(x1 : S1⟦σ⟧) (x2 : S2⟦σ⟧) (dxx : (x1, x2) ∈ P⟦σ⟧) → P⟦t⟧
P⟦s t⟧ = P⟦s⟧ S1⟦s⟧ S2⟦s⟧ P⟦t⟧

P⟦ε⟧ = ε
P⟦Γ, x : τ⟧ = P⟦Γ⟧, x1 : S1⟦τ⟧, x2 : S2⟦τ⟧, dxx : (x1, x2) ∈ P⟦τ⟧
P⟦Γ, α⟧ = P⟦Γ⟧, α1, α2, R α : α1 → α2 → ⋄

In the above, S1⟦s⟧ and S2⟦s⟧ simply subscript all (term and type) variables in their arguments with 1 and 2, to make them refer to the first or second computation. To wit, the straightforward definition is:

Si⟦x⟧ = xi
Si⟦λ(x : σ) → t⟧ = λ(xi : Si⟦σ⟧) → Si⟦t⟧
Si⟦s t⟧ = Si⟦s⟧ Si⟦t⟧
Si⟦σ → τ⟧ = Si⟦σ⟧ → Si⟦τ⟧
Si⟦α⟧ = αi

It might be unclear how the proof P⟦t⟧ references the original term t. Indeed, it does not do so explicitly. But since in PTSs β-equivalent types have the same members, we can construct typing judgements that mention t. As mentioned, from Γ ⊢ t : τ it follows that P⟦Γ⟧ ⊢ P⟦t⟧ : (S1⟦t⟧, S2⟦t⟧) ∈ P⟦τ⟧ [Bernardy and Lasson 2011, Th. 3]. This is best shown on a small example.
Example 18.1.1
Take for instance an identity function id = λ(x : α) → x, which is typed in an open context (that is, α ⊢ λ(x : α) → x : α → α). The transformation gives us:

pid = P⟦id⟧ = λ(x1 : α1) (x2 : α2) (dxx : (x1, x2) ∈ P⟦α⟧) → dxx

which simply returns the proofs that inputs are related:

α1, α2, R α : α1 → α2 → ⋄ ⊢
  pid : Π(x1 : α1) (x2 : α2). (x1, x2) ∈ P⟦α⟧ → (x1, x2) ∈ P⟦α⟧

This typing judgement does not mention id. But since x1 =β S1⟦id⟧ x1 and x2 =β S2⟦id⟧ x2,

we can also show that id's outputs are related whenever the inputs are:

α1, α2, R α : α1 → α2 → ⋄ ⊢
  pid : Π(x1 : α1) (x2 : α2). (x1, x2) ∈ P⟦α⟧ → (S1⟦id⟧ x1, S2⟦id⟧ x2) ∈ P⟦α⟧

More concisely, we can show that:

α1, α2, R α : α1 → α2 → ⋄ ⊢ pid : (S1⟦id⟧, S2⟦id⟧) ∈ P⟦α → α⟧


18.2 Differentiation and parametricity

We obtain a close variant of differentiation by altering the transformation for binary parametricity; the variant we obtain is very close to the one investigated by Bernardy, Jansson and Paterson [2010]. Instead of only having proofs that values are related, we can modify (t1, t2) ∈ P⟦τ⟧ to be a type of values: more precisely, a dependent type (t1, t2) ∈ Δ2⟦τ⟧ of valid changes, indexed by source t1 : S1⟦τ⟧ and destination t2 : S2⟦τ⟧. Similarly, R α is replaced by a dependent type of changes, not propositions, which we write Δα. Formally, this is encoded by replacing sort ⋄ with ⋆ in Bernardy and Lasson [2011]'s transform, and by moving target programs into a PTS that allows for both simple and dependent types, like the Edinburgh Logical Framework; such a PTS is known as λP [Barendregt 1992].

For type variables α, we specialize the transformation further, ensuring that α1 = α2 = α (and adapting S1⟦–⟧ and S2⟦–⟧ accordingly, to not add subscripts to type variables). Without this specialization, we would have to deal with changes across different types, which we don't do just yet, but defer to Sec. 18.4.

We first show how the transformation affects typing contexts:

D⟦ε⟧ = ε
D⟦Γ, x : σ⟧ = D⟦Γ⟧, x1 : σ, x2 : σ, dx : (x1, x2) ∈ Δ2⟦σ⟧
D⟦Γ, α⟧ = D⟦Γ⟧, α, Δα : α → α → ⋆

Unlike standard differentiation, transformed programs bind, for each input variable x : σ, a source x1 : σ, a destination x2 : σ, and a valid change dx from x1 to x2. In more detail, for each variable x : σ in input programs, programs produced by standard differentiation bind both x : σ and dx : Δσ; valid use of these programs requires dx to be a valid change with source x, but this is not enforced through types. Instead, programs produced by this variant of differentiation bind, for each variable x : σ in input programs, both source x1 : σ, destination x2 : σ, and a valid change dx from x1 to x2, where validity is enforced through dependent types.

For input type variables α, output programs also bind a dependent type of valid changes Δα.

Next, we define the type of changes. Since change types are indexed by source and destination, the type of function changes forces them to be valid, and the definition of function changes resembles our earlier logical relation for validity. More precisely, if df is a change from f1 to f2 (df : (f1, f2) ∈ Δ2⟦σ → τ⟧), then df must map any change dx from initial input x1 to updated input x2 to a change from initial output f1 x1 to updated output f2 x2:

(f1, f2) ∈ Δ2⟦σ → τ⟧ = Π(x1 x2 : σ) (dx : (x1, x2) ∈ Δ2⟦σ⟧).
                         (f1 x1, f2 x2) ∈ Δ2⟦τ⟧
(x1, x2) ∈ Δ2⟦α⟧ = Δα x1 x2

At this point, the transformation on terms acts on abstraction and application to match the transformation on typing contexts. The definition of D⟦λ(x : σ) → t⟧ follows the definition of D⟦Γ, x : σ⟧: both transformation results bind the same variables x1, x2, dx with the same types. Application provides corresponding arguments:

D⟦x⟧ = dx
D⟦λ(x : σ) → t⟧ = λ(x1 x2 : σ) (dx : (x1, x2) ∈ Δ2⟦σ⟧) → D⟦t⟧
D⟦s t⟧ = D⟦s⟧ S1⟦s⟧ S2⟦s⟧ D⟦t⟧
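Mirroring Example 18.1.1 (the instance is ours), differentiating the identity function id = λ(x : α) → x yields a term that simply returns the input change, together with the typing judgement:

did = D⟦id⟧ = λ(x1 x2 : α) (dx : (x1, x2) ∈ Δ2⟦α⟧) → dx

α, Δα : α → α → ⋆ ⊢ did : Π(x1 x2 : α). Δα x1 x2 → Δα x1 x2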

If we extend the language to support primitives and their manually-written derivatives, it is useful to have contexts bind, next to type variables α, also change structures for α, to allow terms to use change operations. Since the differentiation output does not use change operations, we omit change structures for now.


Remark 18.2.1 (Validity and ⊕)
Because we omit change structures (here and through most of the chapter), the type of differentiation only suggests that D⟦t⟧ is a valid change, but not that validity agrees with ⊕.

In other words, dependent types as used here do not prove all theorems expected of an incremental transformation. While we can prove the fundamental property of our logical relation, we cannot prove that ⊕ agrees with differentiation for abstract types. As we did in Sec. 13.4 (in particular Plugin Requirement 13.4.1), we must require that validity relations provided for type variables agree with implementations of ⊕ as provided for type variables: formally, we must state (internally) that ∀(x1 x2 : α) (dx : Δα x1 x2). (x1 ⊕ dx, x2) ∈ P⟦α⟧. We can state this requirement by internalizing logical equivalence (such as shown in Sec. 18.1, or using Bernardy et al. [2010]'s variant in λP). But since this statement quantifies over members of α, it is not clear if it can be proven internally when instantiating α with a type argument. We leave this question (admittedly crucial) for future work.

This transformation is not incremental, as it recomputes both source and destination for each application; but we could fix this by replacing S2⟦s⟧ with S1⟦s⟧ ⊕ D⟦s⟧ (and passing change structures, to make ⊕ available to terms), or by not passing destinations. We ignore such complications here.

18.3 Proving differentiation correct externally

Instead of producing dependently-typed programs that show their correctness, we might want to produce simply-typed programs (which are easier to compile efficiently) and use an external correctness proof, as in Chapter 12. We show how to generate such external proofs here. For simplicity, we produce external correctness proofs for the transformation we just defined on dependently-typed outputs, rather than defining a separate simply-typed transformation.

Specifically, we show how to generate from well-typed terms t a proof that D⟦t⟧ is correct, that is, that (S1⟦t⟧, S2⟦t⟧, D⟦t⟧) ∈ ΔV⟦τ⟧.

Proofs are generated through the following transformation, defined in terms of the transformation from Sec. 18.2. Again, we show first typing contexts and the logical relation:

DP⟦ε⟧ = ε
DP⟦Γ, x : τ⟧ = DP⟦Γ⟧, x1 : τ, x2 : τ,
               dx : (x1, x2) ∈ Δ2⟦τ⟧, dxx : (x1, x2, dx) ∈ ΔV⟦τ⟧
DP⟦Γ, α⟧ = DP⟦Γ⟧, α,
           Δα : α → α → ⋆,
           R α : Π(x1 : α) (x2 : α) (dx : (x1, x2) ∈ Δ2⟦α⟧) → ⋄

(f1, f2, df) ∈ ΔV⟦σ → τ⟧ =
  Π(x1 x2 : σ) (dx : (x1, x2) ∈ Δ2⟦σ⟧) (dxx : (x1, x2, dx) ∈ ΔV⟦σ⟧).
    (f1 x1, f2 x2, df x1 x2 dx) ∈ ΔV⟦τ⟧
(x1, x2, dx) ∈ ΔV⟦α⟧ = R α x1 x2 dx

The transformation from terms to proofs then matches the definition of typing contexts:

DP⟦x⟧ = dxx
DP⟦λ(x : σ) → t⟧ =
  λ(x1 x2 : σ) (dx : (x1, x2) ∈ Δ2⟦σ⟧) (dxx : (x1, x2, dx) ∈ ΔV⟦σ⟧) →
    DP⟦t⟧
DP⟦s t⟧ = DP⟦s⟧ S1⟦s⟧ S2⟦s⟧ D⟦t⟧ DP⟦t⟧

This term produces a proof object in PTS λ²P, which is produced by augmenting λP following Bernardy and Lasson [2011]. The informal proof content of generated proofs follows the proof of D⟦–⟧'s correctness (Theorem 12.2.2). For a variable x, we just use the assumption dxx : (x1, x2, dx) ∈ ΔV⟦τ⟧ that we have in context. For abstractions λx → t, we have to show that D⟦t⟧ is correct for all valid input changes x1, x2, dx and for all proofs dxx : (x1, x2, dx) ∈ ΔV⟦σ⟧ that dx is indeed a valid input change; so we bind all those variables in context, including proof dxx, and use DP⟦t⟧ recursively to prove that D⟦t⟧ is correct in the extended context. For applications s t, we use the proof that D⟦s⟧ is correct (obtained recursively via DP⟦s⟧). To show that D⟦s⟧ D⟦t⟧, that is D⟦s⟧ S1⟦s⟧ S2⟦s⟧ D⟦t⟧, is correct, we simply show that D⟦s⟧ is being applied to valid inputs, using the proof that D⟦t⟧ is correct (obtained recursively via DP⟦t⟧).

18.4 Combinators for polymorphic change structures

Earlier, we restricted our transformation on λ→ terms so that there could be a change dt from t1 to t2 only if t1 and t2 have the same type. In this section we lift this restriction and define polymorphic change structures (also called change structures when no ambiguity arises). To do so, we sketch how to extend core change-structure operations to polymorphic change structures.

We also show a few combinators, combining existing polymorphic change structures into new ones. We believe the combinator types are more enlightening than their implementations, but we include the implementations here for completeness. We already described a few constructions on non-polymorphic change structures; however, polymorphic change structures enable new constructions that insert or remove data, or that apply isomorphisms to the source or destination type.

We conjecture that combinators for polymorphic change structures can be used to compose, out of small building blocks, change structures that, for instance, allow inserting or removing elements from a recursive datatype such as lists. Derivatives for primitives could then be produced using equational reasoning, as described in Sec. 11.2. However, we leave investigation of such avenues to future work.

We describe change operations for polymorphic change structures via a Haskell record containing change operations, and we define combinators on polymorphic change structures. Of all change operations, we only consider ⊕; to avoid confusion, we write ⊙ for the polymorphic variant of ⊕:

data CS τ1 δτ τ2 = CS {(⊙) :: τ1 → δτ → τ2}

This code defines a record type constructor CS, with a data constructor also written CS, and a field accessor written (⊙); we use τ1, δτ and τ2 (and later also σ1, δσ and σ2) for Haskell type variables. To follow Haskell lexical rules, we use here lowercase letters (even though Greek ones).

We have not formalized definitions of validity, or proofs that it agrees with ⊙, but for all the change structures and combinators in this section, this exercise appears no harder than the ones in Chapter 13.

In Sec. 11.1 and 17.2, change structures are embedded in Haskell using type class ChangeStruct τ. Conversely, here we do not define a type class of polymorphic change structures, because (apart from the simplest scenarios) Haskell type class resolution is unable to choose a canonical way to construct a polymorphic change structure using our combinators.

All existing change structures (that is, instances of ChangeStruct τ) induce generalized change structures CS τ (∆τ) τ:

typeCS :: ChangeStruct τ ⇒ CS τ (∆τ) τ
typeCS = CS (⊕)

We can also have change structures across different types. Replacement changes are possible:


replCS :: CS τ1 τ2 τ2
replCS = CS $ λx1 x2 → x2
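For instance (an example of ours, not from the thesis), a replacement change simply discards the source, so it may also change its type:

-- Hypothetical usage: the change is just the new value, of a new type.
exReplace :: String
exReplace = (⊙) replCS (1 :: Int) "one" -- evaluates to "one"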

But replacement changes are not the only option. For product types, or for any form of nested data, we can apply changes to the different components, changing the type of some components. We can also define a change structure for nullary products (the unit type), which can be useful as a building block in other change structures:

prodCS :: CS σ1 δσ σ2 → CS τ1 δτ τ2 → CS (σ1, τ1) (δσ, δτ) (σ2, τ2)
prodCS scs tcs = CS $ λ(s1, t1) (ds, dt) → ((⊙) scs s1 ds, (⊙) tcs t1 dt)
unitCS :: CS () () ()
unitCS = CS $ λ() () → ()
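As a usage sketch of ours (the names are hypothetical), combining prodCS with replacement changes updates a pair componentwise, retyping both components:

-- Hypothetical usage: replace both components, changing their types.
exPair :: (String, Bool)
exPair = (⊙) (prodCS replCS replCS) (1 :: Int, "yes") ("one", True)
-- evaluates to ("one", True)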

The ability to modify a field to one of a different type is also known in the Haskell community as polymorphic record update, a feature that has proven desirable in the context of lens libraries [O'Connor 2012; Kmett 2012].

We can also define a combinator sumCS for change structures on sum types, similarly to our earlier construction described in Sec. 11.6. This time, we choose to forbid changes across branches since they're inefficient, though we could support them as well if desired:

sumCS :: CS s1 ds s2 → CS t1 dt t2 →
  CS (Either s1 t1) (Either ds dt) (Either s2 t2)
sumCS scs tcs = CS go
  where
    go (Left s1) (Left ds) = Left $ (⊙) scs s1 ds
    go (Right t1) (Right dt) = Right $ (⊙) tcs t1 dt
    go _ _ = error "Invalid changes"

Given two change structures from τ1 to τ2, with respective change types δτa and δτb, we can also define a new change structure with change type Either δτa δτb, that allows using changes from either structure. We capture this construction through combinator mSumCS, having the following signature:

mSumCS :: CS τ1 δτa τ2 → CS τ1 δτb τ2 → CS τ1 (Either δτa δτb) τ2
mSumCS acs bcs = CS go
  where
    go t1 (Left dt) = (⊙) acs t1 dt
    go t1 (Right dt) = (⊙) bcs t1 dt

This construction is possible for non-polymorphic change structures as well: we only need change structures to be first-class (instead of a type class) to be able to internalize this construction in Haskell.

Using combinator lInsCS, we can describe updates going from type τ1 to type (σ, τ2), assuming a change structure from τ1 to τ2: that is, we can prepend a value of type σ to our data while we modify it. Similarly, combinator lRemCS allows removing values:

lInsCS :: CS t1 dt t2 → CS t1 (s, dt) (s, t2)
lInsCS tcs = CS $ λt1 (s, dt) → (s, (⊙) tcs t1 dt)
lRemCS :: CS t1 dt (s, t2) → CS t1 dt t2
lRemCS tcs = CS $ λt1 dt → snd $ (⊙) tcs t1 dt
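For example (our illustration, with hypothetical names), a change for lInsCS pairs the value to prepend with a change for the old data:

-- Hypothetical usage: prepend "tag" while replacing the old value.
exIns :: (String, Bool)
exIns = (⊙) (lInsCS replCS) (1 :: Int) ("tag", True)
-- evaluates to ("tag", True)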

We can also transform change structures, given suitable conversion functions:


lIsoCS :: (t1 → s1) → CS s1 dt t2 → CS t1 dt t2
mIsoCS :: (dt → ds) → CS t1 ds t2 → CS t1 dt t2
rIsoCS :: (s2 → t2) → CS t1 dt s2 → CS t1 dt t2
isoCS :: (t1 → s1) → (dt → ds) → (s2 → t2) → CS s1 ds s2 → CS t1 dt t2

To do so, we must only compose with the given conversion functions according to the types. Combinator isoCS arises by simply combining the other ones:

lIsoCS f tcs = CS $ λs1 dt → (⊙) tcs (f s1) dt
mIsoCS g tcs = CS $ λt1 dt → (⊙) tcs t1 (g dt)
rIsoCS h tcs = CS $ λt1 dt → h $ (⊙) tcs t1 dt
isoCS f g h scs = lIsoCS f $ mIsoCS g $ rIsoCS h scs
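As a small example of ours, the isomorphism combinators can transport a change structure across a newtype wrapper (Age and ageCS are hypothetical names):

newtype Age = Age Int

-- Reuse any change structure on Int as a change structure on Age.
ageCS :: CS Int δi Int → CS Age δi Age
ageCS intCS = lIsoCS (λ(Age n) → n) (rIsoCS Age intCS)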

With a bit of datatype-generic programming infrastructure, we can reobtain, using only combinators, the change structure for lists shown in Sec. 11.3.3, which allows modifying list elements. The core definition is the following one:

listMuChangeCS :: CS a1 da a2 → CS (Listμ a1) (Listμ da) (Listμ a2)
listMuChangeCS acs = go
  where
    go = isoCS unRollL unRollL rollL $ sumCS unitCS $ prodCS acs go

The needed infrastructure appears in Fig. 18.1.

-- Our list type:
data List a = Nil | Cons a (List a)

-- Equivalently, we can represent lists as a fixpoint of a sum-of-product
-- pattern functor:
data Mu f = Roll {unRoll :: f (Mu f)}
data ListF a x = L {unL :: Either () (a, x)}
type Listμ a = Mu (ListF a)

rollL :: Either () (a, Listμ a) → Listμ a
rollL = Roll ∘ L
unRollL :: Listμ a → Either () (a, Listμ a)
unRollL = unL ∘ unRoll

-- Isomorphism between List and Listμ:
lToLMu :: List a → Listμ a
lToLMu Nil = rollL $ Left ()
lToLMu (Cons a as) = rollL $ Right (a, lToLMu as)
lMuToL :: Listμ a → List a
lMuToL = go ∘ unRollL
  where
    go (Left ()) = Nil
    go (Right (a, as)) = Cons a (lMuToL as)

-- A change structure for List:
listChangesCS :: CS a1 da a2 → CS (List a1) (List da) (List a2)
listChangesCS acs = isoCS lToLMu lToLMu lMuToL $ listMuChangeCS acs

Figure 18.1: A change structure that allows modifying list elements.
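To illustrate (our example, not from the thesis), instantiating listChangesCS with replacement changes on elements updates a list elementwise, given a change list of the same shape:

exList :: List String
exList = (⊙) (listChangesCS replCS)
  (Cons (1 :: Int) (Cons 2 Nil))
  (Cons "one" (Cons "two" Nil))
-- evaluates to Cons "one" (Cons "two" Nil)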


Section summary. We have defined polymorphic change structures and shown they admit a rich combinator language. Now we return to using these change structures for differentiating λ→ and System F.

18.5 Differentiation for System F

After introducing changes across different types, we can also generalize differentiation for λ→, so that it allows for the now generalized changes:

(f₁, f₂) ∈ ∆₂⟦σ → τ⟧ = Π(x₁ : S₁⟦σ⟧) (x₂ : S₂⟦σ⟧) (dx : (x₁, x₂) ∈ ∆₂⟦σ⟧).
    (f₁ x₁, f₂ x₂) ∈ ∆₂⟦τ⟧
(x₁, x₂) ∈ ∆₂⟦α⟧ = ∆α x₁ x₂

D⟦x⟧ = dx
D⟦λ(x : σ) → t⟧ = λ(x₁ : S₁⟦σ⟧) (x₂ : S₂⟦σ⟧) (dx : (x₁, x₂) ∈ ∆₂⟦σ⟧) → D⟦t⟧
D⟦s t⟧ = D⟦s⟧ S₁⟦t⟧ S₂⟦t⟧ D⟦t⟧

D⟦ε⟧ = ε
D⟦Γ, x : τ⟧ = D⟦Γ⟧, x₁ : S₁⟦τ⟧, x₂ : S₂⟦τ⟧, dx : (x₁, x₂) ∈ ∆₂⟦τ⟧
D⟦Γ, α⟧ = D⟦Γ⟧, α₁, α₂, ∆α : α₁ → α₂ → ∗

By adding a few additional rules, we can extend differentiation to System F (the PTS λ2). We choose to present the additional rules using System F syntax, rather than PTS syntax:

(f₁, f₂) ∈ ∆₂⟦∀α. T⟧ =
    Π(α₁ : ∗) (α₂ : ∗) (∆α : α₁ → α₂ → ∗). (f₁ [α₁], f₂ [α₂]) ∈ ∆₂⟦T⟧

D⟦Λα. t⟧ =
    λ(α₁ α₂ : ∗) (∆α : α₁ → α₂ → ∗) → D⟦t⟧

D⟦t [τ]⟧ = D⟦t⟧ S₁⟦τ⟧ S₂⟦τ⟧ ((–, –) ∈ ∆₂⟦τ⟧)

Produced terms use a combination of System F and dependent types which is known as λP2 [Barendregt 1992] and is strongly normalizing. This PTS corresponds to second-order (intuitionistic) predicate logic and is part of Barendregt's lambda cube. λP2 does not admit types depending on types (that is, type-level functions), but admits all other forms of abstraction in the lambda cube (terms on terms, like λ→; terms on types, like System F; and types on terms, like LF (λP)).

Finally, we sketch a transformation producing proofs that differentiation is correct for System F:

DP⟦ε⟧ = ε
DP⟦Γ, x : τ⟧ = DP⟦Γ⟧, x₁ : S₁⟦τ⟧, x₂ : S₂⟦τ⟧,
    dx : (x₁, x₂) ∈ ∆₂⟦τ⟧, dxx : (x₁, x₂, dx) ∈ ∆V⟦τ⟧
DP⟦Γ, α⟧ = DP⟦Γ⟧, α₁, α₂, ∆α : α₁ → α₂ → ∗,
    Rα : Π(x₁ : α₁) (x₂ : α₂) (dx : (x₁, x₂) ∈ ∆₂⟦α⟧) → ∗

(f₁, f₂, df) ∈ ∆V⟦∀α. T⟧ =
    Π(α₁ : ∗) (α₂ : ∗) (∆α : α₁ → α₂ → ∗)
     (Rα : Π(x₁ : α₁) (x₂ : α₂) (dx : (x₁, x₂) ∈ ∆₂⟦α⟧) → ∗).
    (f₁ [α₁], f₂ [α₂], df [α₁] [α₂] [∆α]) ∈ ∆V⟦T⟧
(f₁, f₂, df) ∈ ∆V⟦σ → τ⟧ =
    Π(x₁ : S₁⟦σ⟧) (x₂ : S₂⟦σ⟧) (dx : (x₁, x₂) ∈ ∆₂⟦σ⟧) (dxx : (x₁, x₂, dx) ∈ ∆V⟦σ⟧).
    (f₁ x₁, f₂ x₂, df x₁ x₂ dx) ∈ ∆V⟦τ⟧
(x₁, x₂, dx) ∈ ∆V⟦α⟧ = Rα x₁ x₂ dx


DP⟦x⟧ = dxx
DP⟦λ(x : σ) → t⟧ =
    λ(x₁ : S₁⟦σ⟧) (x₂ : S₂⟦σ⟧) (dx : (x₁, x₂) ∈ ∆₂⟦σ⟧)
     (dxx : (x₁, x₂, dx) ∈ ∆V⟦σ⟧) →
    DP⟦t⟧
DP⟦s t⟧ = DP⟦s⟧ S₁⟦t⟧ S₂⟦t⟧ D⟦t⟧ DP⟦t⟧
DP⟦Λα. t⟧ =
    λ(α₁ α₂ : ∗) (∆α : α₁ → α₂ → ∗)
     (Rα : Π(x₁ : α₁) (x₂ : α₂) (dx : (x₁, x₂) ∈ ∆₂⟦α⟧) → ∗) →
    DP⟦t⟧
DP⟦t [τ]⟧ = DP⟦t⟧ S₁⟦τ⟧ S₂⟦τ⟧ ((–, –) ∈ ∆₂⟦τ⟧) ((–, –, –) ∈ ∆V⟦τ⟧)

Produced terms live in λP2², the logic produced by extending λP2 following Bernardy and Lasson. A variant producing proofs for a non-dependently-typed differentiation (as suggested earlier) would produce proofs in λ2², the logic produced by extending λ2 following Bernardy and Lasson.

18.6 Related work

Dependently-typed differentiation for System F, as given, coincides with the parametricity transformation for System F given by Bernardy, Jansson and Paterson [2010, Sec. 3.1]. But our application is fundamentally different: for known types, Bernardy et al. only consider identity relations, while we can consider non-identity relations, as long as we assume that ⊕ is available for all types. What is more, Bernardy et al. do not consider changes, the update operator ⊕, or change structures across different types: changes are replaced by proofs of relatedness with no computational significance. Finally, non-dependently-typed differentiation (and its correctness proof) is novel here, as it makes limited sense in the context of parametricity, even though it is a small variant of Bernardy et al.'s parametricity transform.

18.7 Prototype implementation

We have written a prototype implementation of the above rules for a PTS presentation of System F, and verified it on a few representative examples of System F terms. We have built our implementation on top of an existing typechecker for PTSs.⁴

Since our implementation is defined in terms of a generic PTS, some of the rules presented above are unified and generalized in our implementation, suggesting the transformation might generalize to arbitrary PTSs. In the absence of further evidence on the correctness of this generalization, we leave a detailed investigation as future work.

18.8 Chapter conclusion

In this chapter, we have sketched how to define differentiation and prove it correct, following Bernardy and Lasson [2011]'s and Bernardy et al. [2010]'s work on parametricity by code transformation. We give no formal correctness proof, but proofs appear mostly an extension of their methods. An important open point is Remark 18.2.1. We have implemented and tested differentiation in an existing, mature PTS implementation, and verified it is type-correct on a few typical terms.

We leave further investigation as future work.

⁴https://github.com/Toxaris/pts, by Tillmann Rendel, based on van Benthem Jutting et al. [1994]'s algorithm for PTS typechecking.

Chapter 19

Related work

Existing work on incremental computation can be divided into two groups: static incrementalization and dynamic incrementalization. Static approaches analyze a program statically and generate an incremental version of it. Dynamic approaches create dynamic dependency graphs while the program runs and propagate changes along these graphs.

The trade-off between the two is that static approaches have the potential to be faster, because no dependency tracking at runtime is needed, whereas dynamic approaches can support more expressive programming languages. ILC is a static approach, but compared to the other static approaches it supports more expressive languages.

In the remainder of this chapter, we analyze the relation to the most closely related prior works. Ramalingam and Reps [1993], Gupta and Mumick [1999] and Acar et al. [2006] discuss further related work. Other work more closely related to cache-transfer style is discussed in Sec. 17.6.

19.1 Dynamic approaches

One of the most advanced dynamic approaches to incrementalization is self-adjusting computation, which has been applied to Standard ML and large subsets of C [Acar 2009; Hammer et al. 2011]. In this approach, programs execute on the original input in an enhanced runtime environment that tracks the dependencies between values in a dynamic dependence graph [Acar et al. 2006]; intermediate results are memoized. Later, changes to the input propagate through dependency graphs, from changed inputs to results, updating both intermediate and final results; this processing is often more efficient than recomputation.

However, creating dynamic dependence graphs imposes a large constant-factor overhead during runtime, ranging from 2 to 30 in reported experiments [Acar et al. 2009, 2010], and affecting the initial run of the program on its base input. Acar et al. [2010] show how to support high-level data types in the context of self-adjusting computation; however, the approach still requires expensive runtime bookkeeping during the initial run. Our approach, like other static ones, uses a standard runtime environment and has no overhead during base computation, but may be less efficient when processing changes. This pays off if the initial input is big compared to its changes.

Chen et al. [2011] have developed a static transformation for purely functional programs, but this transformation just provides a superior interface to use the runtime support with less boilerplate, and does not reduce this performance overhead. Hence, it is still a dynamic approach, unlike the transformation this work presents.

Another property of self-adjusting computation is that incrementalization is only efficient if the program has a suitable computation structure. For instance, a program folding the elements of a


bag with a left or right fold will not have ecient incremental behavior instead itrsquos necessary thatthe fold be shaped like a balanced tree In general incremental computations become ecient onlyif they are stable [Acar 2005] Hence one may need to massage the program to make it ecientOur methodology is dierent Since we do not aim to incrementalize arbitrary programs writtenin standard programming languages we can select primitives that have ecient derivatives andthereby require the programmer to use them

Functional reactive programming [Elliott and Hudak 1997] can also be seen as a dynamic approach to incremental computation; recent work by Maier and Odersky [2013] has focused on speeding up reactions to input changes by making them incremental on collections. Willis et al. [2008] use dynamic techniques to incrementalize JQL queries.

19.2 Static approaches

Static approaches analyze a program at compile-time and produce an incremental version that efficiently updates the output of the original program according to changing inputs.

Static approaches have the potential to be more efficient than dynamic approaches, because no bookkeeping at runtime is required. Also, the computed incremental versions can often be optimized using standard compiler techniques, such as constant folding or inlining. However, none of them support first-class functions; some approaches have further restrictions.

Our aim is to apply static incrementalization to more expressive languages; in particular, ILC supports first-class functions and an open set of base types with associated primitive operations.

Sundaresh and Hudak [1991] propose to incrementalize programs using partial evaluation: given a partitioning of program inputs into parts that change and parts that stay constant, Sundaresh and Hudak partially evaluate a given program relative to the constant input parts, and combine the result with the changing inputs.

19.2.1 Finite differencing

Paige and Koenig [1982] present derivatives for a first-order language with a fixed set of primitives. Blakeley et al. [1986] apply these ideas to a class of relational queries. The database community extended this work to queries on relational data, such as in algebraic differencing [Gupta and Mumick 1999], which inspired our work and terminology. However, most of this work does not apply to nested collections or algebraic data types, but only to relational (flat) data, and no previous approach handles first-class functions or programs resulting from defunctionalization or closure conversion. Incremental support is typically designed monolithically for a whole language, rather than piecewise. Improving on algebraic differencing, DBToaster (Koch [2010]; Koch et al. [2014]) guarantees asymptotic speedups with a compositional query transformation, and delivers huge speedups in realistic benchmarks, though still for a first-order database language.

More general (non-relational) data types are considered in the work by Gluche et al. [1997]; our support for bags and the use of groups is inspired by their work, but their architecture is still rather restrictive: they lack support for function changes, and restrict incrementalization to self-maintainable views, without hinting at a possible solution.

It seems possible to transform higher-order functional programs to database queries using a variety of approaches [Grust et al. 2009; Cheney et al. 2013], some of which support first-class functions via closure conversion [Grust and Ulrich 2013; Grust et al. 2013], and incrementalize the resulting programs using standard database technology. Such a solution would inherit the limitations of database incrementalization approaches; in particular, it appears that database incrementalization approaches such as DBToaster can handle the insertion and removal of entire table rows, not of smaller changes. Nevertheless, such an alternative approach might be worth investigating.


Unlike later approaches to higher-order differentiation, we do not restrict our base types to groups (unlike Koch et al. [2016]), and the transformed programs we produce do not require further runtime transformation (unlike both Koch et al. [2016] and Huesca [2015]), as we discuss further next.

19.2.2 λ-diff and partial differentials

In work concurrent with Cai et al. [2014], Huesca [2015] introduces an alternative formalism, called λ-diff, for incremental computation by program transformation. While λ-diff has some appealing features, it currently appears to require program transformations at runtime. Other systems appear to share this feature [Koch et al. 2016]. Hence, this section discusses the reason in some detail.

Instead of differentiating a term t relative to all inputs (free variables and function arguments) via D⟦t⟧, like ILC, λ-diff differentiates terms relative to one input variable, and writes ∂t/∂x dx for the result of differentiating t relative to x: a term that computes the change in t when the value for x is updated by change dx. The formalism also uses pointwise function changes, similarly to what we discussed in Sec. 15.3.

Unfortunately, it is not known how to define such a transformation for a λ-calculus with a standard semantics, and the needed semantics appear to require runtime term transformations, which are usually considered problematic when implementing functional languages. In particular, it appears necessary to introduce a new term constructor D t, which evaluates t to a function value λy → u, and then evaluates to λ(y, dy) → ∂u/∂y dy, which differentiates t at runtime relative to its head variable y. As an indirect consequence, if the program under incrementalization contains a function term Γ ⊢ t : σ → τ, where Γ contains n free variables, it can be necessary in the worst case to differentiate t over any subset of the n free variables in Γ. There are 2ⁿ such subsets. To perform all term transformations before runtime, it seems hence necessary in the worst case to precompute 2ⁿ partial derivatives for each function term t, which appears unfeasible. On the other hand, it is not clear how often this worst case is realized, or how big n grows in typical programs, or if it is simply feasible to perform differentiation at runtime, similarly to JIT compilation. Overall, an efficient implementation of λ-diff remains an open problem. It appears Koch et al. [2016] suffers from similar problems, but a few details appear simpler, since they restrict focus to functions over groups.

To see why λ-diff needs to introduce D t, consider differentiating ∂(s t)/∂x dx, that is, the change of s t when x is updated by change dx. This change depends (a) on the change of t when x is updated by dx, that is ∂t/∂x dx; (b) on how s changes when its input t is updated by ∂t/∂x dx — to express this change, λ-diff writes (D s) (t, ∂t/∂x dx); (c) on the change of s (applied to the updated t) when x is updated by dx, that is ∂s/∂x dx. To compute component (b), λ-diff writes D s to differentiate s not relative to x, but relative to the still unknown head variable of s. If s evaluates to λy → u, then y is the head variable of s, and D s differentiates u relative to y and evaluates to λ(y, dy) → ∂u/∂y dy.

Overall, the rule for differentiating application in λ-diff is:

∂(s t)/∂x dx = (D s) (t, ∂t/∂x dx) ⊚ (∂s/∂x dx) (t ⊕ ∂t/∂x dx)

This rule appears closely related to Eq. (15.1), hence we refer to its discussion for clarification.

On the other hand, differentiating a term relative to all its inputs introduces a different sort of overhead. For instance, it is much more efficient to differentiate map f xs relative to xs than relative to f: if f undergoes a non-nil change df, D⟦map f xs⟧ must apply df to each element in the updated input xs. Therefore, in our practical implementations, D⟦map f xs⟧ tests whether df is nil, and uses a more efficient implementation. In Chapter 16, we detect at compile time whether df is guaranteed to be nil. In Sec. 17.4.2, we instead detect at runtime whether df is nil. In both cases, authors of derivatives must implement this optimization by hand. Instead, λ-diff hints at a more general solution.
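To make this nil-test concrete, here is a minimal sketch of ours, not the thesis's actual derivative: we assume element changes are Maybe-replacements (Nothing meaning nil) and the function change is Maybe a replacement function, so the derivative of map can skip unchanged elements exactly when the function change is nil.

import Data.Maybe (fromMaybe)

-- Assumed change representations, for illustration only.
dmap :: (a -> b) -> Maybe (a -> b) -> [a] -> [Maybe a] -> [Maybe b]
dmap f Nothing xs dxs = zipWith upd xs dxs
  where
    upd _ (Just x') = Just (f x') -- recompute changed elements only
    upd _ Nothing   = Nothing     -- unchanged element: nil output change
dmap _ (Just f') xs dxs = zipWith upd xs dxs
  where
    upd x dx = Just (f' (fromMaybe x dx)) -- f changed: touch every element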


19.2.3 Static memoization

Liu's work [Liu 2000] allows incrementalizing a first-order base program f(x_old) to compute f(x_new), knowing how x_new is related to x_old. To this end, they transform f(x_new) into an incremental program which reuses the intermediate results produced while computing f(x_old), the base program. To this end, (i) first, the base program is transformed to save all its intermediate results; then (ii) the incremental program is transformed to reuse those intermediate results; and finally (iii) intermediate results which are not needed are pruned from the base program. However, to reuse intermediate results, the incremental program must often be rearranged, using some form of equational reasoning, into some equivalent program where partial results appear literally. For instance, if the base program f uses a left fold to sum the elements of a list of integers x_old, accessing them from the head onwards, and x_new prepends a new element h to the list, at no point does f(x_new) recompute the same results. But since addition is commutative on integers, we can rewrite f(x_new) as f(x_old) + h. The author's CACHET system will try to perform such rewritings automatically, but it is not guaranteed to succeed. Similarly, CACHET will try to synthesize any additional results which can be computed cheaply by the base program, to help make the incremental program more efficient.
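As a concrete rendering of this example (our sketch; CACHET performs such rewrites automatically on first-order programs), commutativity of addition justifies computing f x_new from the cached result for x_old:

-- Base program: sum a list with a left fold.
f :: [Int] -> Int
f = foldl (+) 0

-- Incremental version for the change "prepend h": reuse the cached f xs.
fPrepend :: Int -> Int -> Int
fPrepend h fOld = h + fOld

-- e.g. fPrepend 5 (f [1,2,3]) == f (5 : [1,2,3])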

Since it is hard to fully automate such reasoning, we move equational reasoning to the plugin design phase. A plugin provides general-purpose higher-order primitives for which the plugin authors have devised efficient derivatives (by using equational reasoning in the design phase). Then, the differentiation algorithm computes incremental versions of user programs without requiring further user intervention. It would be useful to combine ILC with some form of static caching, to make the computation of derivatives which are not self-maintainable more efficient. We plan to do so in future work.

Chapter 20

Conclusion and future work

In databases, standard finite differencing technology allows incrementalizing programs in a specific domain-specific first-order language of collection operations. Unlike other approaches, finite differencing transforms programs to programs, which can in turn be transformed further.

Differentiation (Chapter 10) transforms higher-order programs to incremental ones, called derivatives, as long as one can provide incremental support for primitive types and operations.

Case studies in this thesis consider support for several such primitives. We first study a setting restricted to self-maintainable derivatives (Chapter 16). Then we study a more general setting, where it is possible to remember inputs and intermediate results from one execution to the next, thanks to a transformation to cache-transfer style (Chapter 17). In all cases, we are able to deliver order-of-magnitude speedups on non-trivial examples; moreover, incrementalization produces programs in standard languages that are subject to further optimization and code transformation.

Correctness of incrementalization appeared initially surprising, so we devoted significant effort to formal correctness proofs, either on paper or in mechanized proof assistants. The original correctness proof of differentiation, using a set-theoretic denotational semantics [Cai et al. 2014], was a significant milestone, but since then we have simplified the proof to a logical relations proof that defines when a change is valid from a source to a destination, and proves that differentiation produces valid changes (Chapter 12). Valid changes witness the difference between sources and destinations; since changes can be nil, change validity arises as a generalization of the concept of logical equivalence and parametricity for a language (at least in terminating languages) (Chapter 18). Crucially, changes are not just witnesses: operation ⊕ takes a change and its source to the change destination. One can consider further operations that give rise to an algebraic structure that we call change structure (Chapter 13).

Based on this simplified proof, in this thesis we generalize correctness to further languages, using big-step operational semantics and (step-indexed) logical relations (Appendix C). Using operational semantics, we reprove correctness for simply-typed λ-calculus, then add support for recursive functions (which would require domain-theoretic methods when using denotational semantics), and finally extend the proof to untyped λ-calculus. Building on a variant of this proof (Sec. 17.3.3), we show that conversion to cache-transfer style also preserves correctness.

Based on a different formalism for logical equivalence and parametricity [Bernardy and Lasson 2011], we sketch variants of the transformation and correctness proofs for simply-typed λ-calculus with type variables, and then for full System F (Chapter 18).

Future work. To extend differentiation to System F, we must consider changes where source and destination have different types. This generalization of changes makes change operations much


more flexible: it becomes possible to define a combinator language for change structures, and it appears possible to define nontrivial change structures for algebraic datatypes using this combinator language.

We conjecture such a combinator language will allow programmers to define correct change structures out of simple, reusable components.

Incrementalizing primitives correctly remains at present a significant challenge. We provide tools to support this task by formalizing equational reasoning, but it appears necessary to provide more tools to programmers, as done by Liu [2000]. Conceivably, it might be possible to build such a rewriting tool on top of the rewriting and automation support of theorem provers such as Coq.

Appendixes


Appendix A

Preliminaries: syntax and semantics of simply-typed λ-calculus

To discuss how we incrementalize programs, and prove that our incrementalization technique gives correct results, we specify which foundation we use for our proofs and what object language we study throughout most of Part II.

We mechanize our correctness proof using Agda, hence we use Agda's underlying type theory as our foundation. We discuss what this means in Appendix A.1.

Our object language is a standard simply-typed λ-calculus (STLC) [Pierce 2002, Ch. 9], parameterized over base types and constants. We term the set of base types and constants a language plugin (see Appendix A.2.5). In our examples, we assume that the language plugin supports needed base types and constants. Later (e.g., in Chapter 12), we add further requirements to language plugins, to support incrementalization of the language features they add to our STLC. Rather than operational semantics, we use a denotational semantics, which is however set-theoretic rather than domain-theoretic. Our object language and its semantics are summarized in Fig. A.1.

At this point, readers might want to skip to Chapter 10 right away, or focus on denotational semantics, and refer to this section as needed.

A.1 Our proof meta-language

In this section we describe the logic (or meta-language) used in our mechanized correctness proof.

First, as usual, we distinguish between "formalization" (that is, on-paper formalized proofs) and "mechanization" (that is, proofs encoded in the language of a proof assistant for computer-aided mechanized verification).

To prove the correctness of ILC, we provide a mechanized proof in Agda [Agda Development Team 2013]. Agda implements intensional Martin-Löf type theory (from now on, simply type theory), so type theory is also the foundation of our proofs.

At times, we use conventional set-theoretic language to discuss our proofs, but the differences are only superficial. For instance, we might talk about a set of elements S, and about elements such as s ∈ S, though we always mean that S is a metalanguage type, that s is a metalanguage value, and that s : S. Talking about sets avoids ambiguity between types of our meta-language and types of our object-language (that we discuss next in Appendix A.2).


Notation A.1.1
We'll let uppercase Latin letters A, B, C, V, U range over sets, never over types.

We do not prove correctness of all our language plugins. However, in earlier work [Cai et al. 2014], we prove correctness for a language plugin supporting bags, a type of collection (described in Sec. 10.2). For that proof, we extend our logic by postulating a few standard axioms on the implementation of bags, to avoid proving correct an implementation of bags, or needing to account for different values representing the same bag (such different values typically arise when implementing bags as search trees).

A.1.1 Type theory versus set theory

Here we summarize a few features of type theory over set theory.

Type theory is dependently typed, so it generalizes function type A → B to dependent function type (x : A) → B, where x can appear free in B. Such a type guarantees that if we apply a function f : (x : A) → B to an argument a : A, the result has type B[x := a], that is B where x is substituted by a. At times, we will use dependent types in our presentation.

Moreover, by using type theory:

• We do not postulate the law of excluded middle; that is, our logic is constructive.

• Unlike set theory, type theory is proof-relevant: that is, proofs are first-class mathematical objects.

• Instead of subsets {x ∈ A | P(x)}, we must use Σ-types Σ(x : A). P(x), which contain pairs of elements x and proofs they satisfy predicate P.

• In set theory, we can assume without further ado functional extensionality, that is, that functions that give equal results on all equal inputs are equal themselves. Intuitionistic type theory does not prove functional extensionality, so we need to add it as a postulate. In Agda, this postulate is known to be consistent [Hofmann 1996], hence it is safe to assume.¹

To handle binding issues in our object language, our formalization uses typed de Bruijn indexes, because this technique takes advantage of Agda's support for type refinement in pattern matching. On top of that, we implement a HOAS-like frontend, which we use for writing specific terms.

A.2 Simply-typed λ-calculus

We consider as object language a strongly-normalizing simply-typed λ-calculus (STLC). We choose STLC as it is the simplest language with first-class functions and types, while being a sufficient model of realistic total languages.² We recall the syntax and typing rules of STLC in Figs. A.1a and A.1b, together with the metavariables we use. Language plugins define base types ι and constants c. Types can be base types ι or function types σ → τ. Terms can be constants c, variables x, function applications t₁ t₂, or λ-abstractions λ(x : σ) → t. To describe assumptions on variable types when typing terms, we define (typing) contexts Γ as being either empty ε, or as context extensions Γ, x : τ, which extend context Γ by asserting variable x has type τ. Typing is defined through a judgment

¹http://permalink.gmane.org/gmane.comp.lang.agda/2343
²To know why we restrict to total languages, see Sec. 15.1.


ι ::= ... (base types)
σ, τ ::= ι | τ → τ (types)
Γ ::= ε | Γ, x : τ (typing contexts)
c ::= ... (constants)
s, t ::= c | λ(x : σ) → t | t t | x (terms)

(a) Syntax.

The typing judgment Γ ⊢ t : τ is defined by the following rules:

Const: if ⊢C c : τ then Γ ⊢ c : τ
Lookup: Γ₁, x : τ, Γ₂ ⊢ x : τ
Lam: if Γ, x : σ ⊢ t : τ then Γ ⊢ (λ(x : σ) → t) : σ → τ
App: if Γ ⊢ s : σ → τ and Γ ⊢ t : σ then Γ ⊢ s t : τ

(b) Typing.

⟦ι⟧ = ...
⟦σ → τ⟧ = ⟦σ⟧ → ⟦τ⟧

(c) Values.

⟦ε⟧ = {ε}
⟦Γ, x : τ⟧ = {(ρ, x = v) | ρ ∈ ⟦Γ⟧ ∧ v ∈ ⟦τ⟧}

(d) Environments.

⟦c⟧ ρ = ⟦c⟧C
⟦λ(x : σ) → t⟧ ρ = λ(v : ⟦σ⟧) → ⟦t⟧ (ρ, x = v)
⟦s t⟧ ρ = (⟦s⟧ ρ) (⟦t⟧ ρ)
⟦x⟧ ρ = lookup x in ρ

(e) Denotational semantics.

Figure A.1: Standard definitions for the simply-typed lambda calculus.

Γ ⊢ t : τ, stating that term t under context Γ has type τ.³ For a proper introduction to STLC, we refer the reader to Pierce [2002, Ch. 9]. We will assume significant familiarity with it.

An extensible syntax of types. In fact, the definition of base types can be mutually recursive with the definition of types. So a language plugin might add as base types, for instance, collections of elements of type τ, products and sums of type σ and type τ, and so on. However, this mutual recursion must satisfy a few technical restrictions to avoid introducing subtle inconsistencies, and Agda cannot enforce these restrictions across modules. Hence, if we define language plugins as separate modules in our mechanization, we need to verify by hand that such restrictions are satisfied (which they are). See Appendix B.1 for the gory details.

Notation A.2.1
We typically omit type annotations on λ-abstractions, that is, we write λx → t rather than λ(x : σ) → t. Such type annotations can often be inferred from context (or type inference). Nevertheless,

³We only formalize typed terms, not untyped ones, so that each term has a unique type. That is, in the relevant jargon, we use Church-style typing as opposed to Curry-style typing. Alternatively, we use an intrinsically-typed term representation. In fact, arguably we mechanize at once both well-typed terms and their typing derivations. This is even more clear in our mechanization; see discussion in Appendix A.2.4.


whenever we discuss terms of shape λx → t, we're in fact discussing λ(x : σ) → t for some arbitrary σ. We write λx → t instead of λx. t for consistency with the notation we use later for Haskell programs.

We often omit ε from typing contexts with some assumptions. For instance, we write x : τ₁, y : τ₂ instead of ε, x : τ₁, y : τ₂.

We overload symbols (often without warning) when they can be disambiguated from context, especially when we can teach modern programming languages to disambiguate such overloadings. For instance, we reuse → for lambda abstractions λx → t, function spaces A → B, and function types σ → τ, even though the first is a separator.

Extensions. In our examples, we will use some unproblematic syntactic sugar over STLC, including let expressions, global definitions, type inference, and we will use a Haskell-like concrete syntax. In particular, when giving type signatures or type annotations in Haskell snippets, we will use :: to separate terms or variables from their types, rather than : as in λ-calculus. To avoid confusion, we never use : to denote the constructor for Haskell lists.

At times, our concrete examples will use Hindley-Milner (prenex) polymorphism, but this is also not such a significant extension. A top-level definition using prenex polymorphism, that is of type ∀α. τ (where α is free in τ), can be taken as sugar for a metalevel family of object-level programs, indexed by type argument τ₁, of definitions of type τ[α := τ₁]. We use this trick without explicit mention in our first implementation of incrementalization in Scala [Cai et al. 2014].

A.2.1 Denotational semantics for STLC

To prove that incrementalization preserves the semantics of our object-language programs, we define a semantics for STLC. We use a naive set-theoretic denotational semantics. Since STLC is strongly normalizing [Pierce 2002, Ch. 12], its semantics need not handle partiality. Hence, we can use denotational semantics, but eschew using domain theory, and simply use sets from the metalanguage (see Appendix A.1). Likewise, we can use normal functions as domains for function types.

We first associate to every type τ a set of values ⟦τ⟧, so that terms of a type τ evaluate to values in ⟦τ⟧. We call set ⟦τ⟧ a domain. Domains associated to types τ depend on the domains associated to base types ι, that must be specified by language plugins (Plugin Requirement A.2.10).

Definition A.2.2 (Domains and values)
The domain ⟦τ⟧ of a type τ is defined as in Fig. A.1c. A value is a member of a domain.

We let metavariables u, v, a, b range over members of domains; we tend to use v for generic values, and a for values we use as a function argument. We also let metavariables f, g range over values in the domain for a function type. At times, we might also use metavariables f, g to range over terms of function types; the context will clarify what is intended.

Given this domain construction, we can now define a denotational semantics for terms. The plugin has to provide the evaluation function for constants. In general, the evaluation function ⟦t⟧ computes the value of a well-typed term t, given the values of all free variables in t. The values of the free variables are provided in an environment.

Definition A.2.3 (Environments)
An environment ρ assigns values to the names of free variables:

ρ ::= ε | ρ, x = v

We write ⟦Γ⟧ for the set of environments that assign values to the names bound in Γ (see Fig. A.1d).


Notation. We often omit ε from environments with some assignments. For instance, we write x = v₁, y = v₂ instead of ε, x = v₁, y = v₂.

Definition A.2.4 (Evaluation)
Given Γ ⊢ t : τ, the meaning of t is defined by the function ⟦t⟧ of type ⟦Γ⟧ → ⟦τ⟧ in Fig. A.1e.

This is a standard denotational semantics of the simply-typed λ-calculus.

For each constant c : τ, the plugin provides ⟦c⟧C ∈ ⟦τ⟧, the semantics of c (by Plugin Requirement A.2.11); since constants don't contain free variables, ⟦c⟧C does not depend on an environment.

We define a program equivalence across terms of the same type, t₁ ≃ t₂, to mean ⟦t₁⟧ = ⟦t₂⟧.

Definition A.2.5 (Denotational equivalence)
We say that two terms Γ ⊢ t₁ : τ and Γ ⊢ t₂ : τ are denotationally equal, and write Γ ⊨ t₁ ≃ t₂ : τ (or sometimes t₁ ≃ t₂), if for all environments ρ ∈ ⟦Γ⟧ we have that ⟦t₁⟧ ρ = ⟦t₂⟧ ρ.

Remark A.2.6
Beware that denotational equivalence cannot always be strengthened by dropping unused variables: that is, Γ, x : σ ⊨ t₁ ≃ t₂ : τ does not imply Γ ⊨ t₁ ≃ t₂ : τ, even if x does not occur free in either t₁ or t₂. Counterexamples rely on σ being an empty type. For instance, we cannot weaken x : 0τ ⊨ 0 ≃ 1 : Z (where 0τ is an empty type): this equality is only true vacuously, because there exists no environment for context x : 0τ.
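To convey the flavor of this semantics in executable form, here is a sketch of ours in Haskell rather than Agda; unlike the mechanization, it is not intrinsically typed, and it fixes a single hypothetical plugin whose constants are integers.

-- Untyped syntax; the mechanization instead indexes terms by Γ and τ.
data Term = Con Int | Var String | Lam String Term | App Term Term

-- Semantic values: metalanguage integers and functions serve as domains.
data Value = IntV Int | FunV (Value -> Value)

type Env = [(String, Value)]

-- The evaluation function of Fig. A.1e, as a definitional interpreter.
eval :: Term -> Env -> Value
eval (Con n) _ = IntV n -- constants: plugin-provided semantics
eval (Var x) env = maybe (error "unbound variable") id (lookup x env)
eval (Lam x t) env = FunV (\v -> eval t ((x, v) : env))
eval (App s t) env = case eval s env of
  FunV f -> f (eval t env)
  IntV _ -> error "application of non-function"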

A.2.2 Weakening

While we don't discuss our formalization of variables in full, in this subsection we discuss briefly weakening on STLC terms, and state as a lemma that weakening preserves meaning. This lemma is needed in a key proof, the one of Theorem 12.2.2.

As usual, if a term t is well-typed in a given context Γ₁, and context Γ₂ extends Γ₁ (which we write as Γ₁ ≼ Γ₂), then t is also well-typed in Γ₂.

Lemma A.2.7 (Weakening is admissible)
The following typing rule is admissible:

Weaken: if Γ₁ ⊢ t : τ and Γ₁ ≼ Γ₂, then Γ₂ ⊢ t : τ.

Weakening also preserves semantics. If a term t is typed in context Γ₁, evaluating it requires an environment matching Γ₁. So if we weaken t to a bigger context Γ₂, evaluation requires an extended environment matching Γ₂, and is going to produce the same result.

Lemma A.2.8 (Weakening preserves meaning)
Take Γ₁ ⊢ t : τ and ρ₁ ∈ ⟦Γ₁⟧. If Γ₁ ≼ Γ₂ and ρ₂ ∈ ⟦Γ₂⟧ extends ρ₁, then we have that

⟦t⟧ ρ₁ = ⟦t⟧ ρ₂.

Mechanizing these statements and their proofs requires some care. We have a meta-level type Term Γ τ of object terms having type τ in context Γ. Evaluation has type ⟦–⟧ : Term Γ τ → ⟦Γ⟧ → ⟦τ⟧, so ⟦t⟧ ρ₁ = ⟦t⟧ ρ₂ is not directly well-typed. To remedy this, we define formally the subcontext relation Γ₁ ≼ Γ₂, and an explicit operation that weakens a term in context Γ₁ to a corresponding term in bigger context Γ₂, weaken : Γ₁ ≼ Γ₂ → Term Γ₁ τ → Term Γ₂ τ. We define the subcontext relation Γ₁ ≼ Γ₂ as a judgment using order-preserving embeddings.⁴ We refer to our mechanized proof for details, including auxiliary definitions and relevant lemmas.

⁴As mentioned by James Chapman at https://lists.chalmers.se/pipermail/agda/2011/003423.html, who attributes them to Conor McBride.


A.2.3 Substitution

Some facts can be presented using (capture-avoiding) substitution rather than environments, and we do so at some points, so let us fix notation. We write t[x := s] for the result of substituting variable x in term t by term s.

We have mostly avoided mechanizing proofs about substitution, but we have mechanized substitution following Keller and Altenkirch [2010], and proved the following substitution lemma.

Lemma A.2.9 (Substitution lemma)
For any term Γ ⊢ t : τ and variable x : σ bound in Γ, we write Γ − x for the result of removing variable x from Γ (as defined by Keller and Altenkirch). Take term Γ − x ⊢ s : σ and environment ρ ∈ ⟦Γ − x⟧. Then, we have that substitution and evaluation commute as follows:

⟦t[x := s]⟧ ρ = ⟦t⟧ (ρ, x = ⟦s⟧ ρ)

A.2.4 Discussion: Our mechanization and semantic style

To formalize the meaning of our programs, we use denotational semantics, while nowadays most prefer operational semantics, in particular small-step. Hence, we next justify our choice and discuss related work.

We expect we could use other semantics techniques, such as big-step or small-step semantics. But, at least for such a simple object language, working with denotational semantics as we use it is easier than other approaches in a proof assistant, especially in Agda:

• Our semantics ⟦–⟧ is a function, and not a relation like in small-step or big-step semantics.

• It is clear to Agda that our semantics is a total function, since it is structurally recursive.

• Agda can normalize ⟦–⟧ on partially-known terms when normalizing goals.

• The meanings of our programs are well-behaved Agda functions, not syntax, so we know "what they mean" and need not prove any lemmas about it. We need not prove, say, that evaluation is deterministic.

In Agda, the domains for our denotational semantics are simply Agda types, and semantic values are Agda values — in other words, we give a denotational semantics in terms of type theory. Using denotational semantics allows us to state the specification of differentiation directly in the semantic domain, and take advantage of Agda's support for equational reasoning for proving equalities between Agda functions.

Related work. Our variant is used for instance by McBride [2010], who attributes it to Augustsson and Carlsson [1999] and Altenkirch and Reus [1999]. In particular, Altenkirch and Reus [1999] already define our type Term Γ τ of simply-typed λ-terms t, well-typed with type τ in context Γ, while Augustsson and Carlsson [1999] define semantic domains by induction over types. Benton et al. [2012] and Allais et al. [2017] also discuss this approach to formalizing λ-terms, and discuss how to best prove various lemmas needed to reason, for instance, about substitution.

More in general, similar approaches are becoming more common when using proof assistants. Our denotational semantics could be otherwise called a definitional interpreter (which is in particular compositional), and mechanized formalizations using a variety of definitional interpreters are nowadays often advocated, either using denotational semantics [Chlipala 2008], or using functional big-step semantics. Functional semantics are so convenient that their use has been advocated even for languages that are not strongly normalizing [Owens et al. 2016; Amin and Rompf 2017], even at the cost of dealing with step-indexes.


A.2.5 Language plugins

Our object language is parameterized by language plugins (or just plugins) that encapsulate its domain-specific aspects.

In our examples, our language plugin will typically support integers and primitive operations on them. However, our plugin will also support various sorts of collections and base operations on them. Our first example of collection will be bags. Bags are unordered collections (like sets) where elements are allowed to appear more than once (unlike in sets), and they are also called multisets.

Our formalization is parameterized over one language plugin providing all base types and primitives. In fact, we expect a language plugin to be composed out of multiple language plugins, merged together [Erdweg et al. 2012]. Our mechanization is mostly parameterized over language plugins, but see Appendix B.1 for a discussion of a few limitations.

The sets of base types and primitive constants, as well as the types for primitive constants, are on purpose left unspecified and only defined by plugins — they are extension points. We write some extension points using ellipses ("..."), and other ones by creating names, which typically use C as a superscript.

A plugin defines a set of base types ι, primitives c, and their denotational semantics ⟦c⟧C. As usual, we require that ⟦c⟧C ∈ ⟦τ⟧ whenever c : τ.

Summary. To sum up the discussion of plugins, we collect formally the plugin requirements we have mentioned in this chapter.

Plugin Requirement A.2.10 (Base types)
There is a set of base types ι, and for each there is a domain ⟦ι⟧.

Plugin Requirement A.2.11 (Constants)
There is a set of constants c. To each constant is associated a type τ, such that the constant has that type, that is ⊢C c : τ, and the constant's semantics matches that type, that is ⟦c⟧C ∈ ⟦τ⟧.

After discussing the metalanguage of our proofs, the object language we study, and its semantics, we begin discussing incrementalization in the next chapter.


Appendix B

More on our formalization

B.1 Mechanizing plugins modularly and limitations

Next, we discuss our mechanization of language plugins in Agda, and its limitations. For the concerned reader, we can say these limitations affect essentially how modular our proofs are, and not their cogency.

In essence, it's not possible to express in Agda the correct interface for language plugins, so some parts of language plugins can't be modularized as desirable. Alternatively, we can mechanize any fixed language plugin together with the main formalization, which is not truly modular; or mechanize a core formulation parameterized on a language plugin, which runs into a few limitations; or encode plugins so they can be modularized, and deal with encoding overhead.

This section requires some Agda knowledge not provided here, but we hope that readers familiar with both Haskell and Coq will be able to follow along.

Our mechanization is divided into multiple Agda modules. Most modules have definitions that depend on language plugins, directly or indirectly. Those take definitions from language plugins as module parameters.

For instance, STLC object types are formalized through Agda type Type, defined in module Parametric.Syntax.Type. The latter is parameterized over Base, the type of base types.

For instance, Base can support a base type of integers, and a base type of bags of elements of type ι (where ι : Base). Simplifying a few details, our definition is equivalent to the following Agda code:

module Parametric.Syntax.Type (Base : Set) where
  data Type : Set where
    base : (ι : Base) → Type
    _⇒_ : (σ : Type) → (τ : Type) → Type

-- Elsewhere, in the plugin:
data Base : Set where
  baseInt : Base
  baseBag : (ι : Base) → Base
-- ...

But with these definitions, we only have bags of elements of base type. If ι is a base type, base (baseBag ι) is the type of bags with elements of type base ι. Hence, we have bags of elements of base type. But we don't have a way to encode Bag τ if τ is an arbitrary non-base type, such as


base baseInt ⇒ base baseInt (the encoding of object type Z → Z). Can we do better? If we ignore modularization, we can define types through the following mutually recursive datatypes:

mutual
  data Type : Set where
    base : (ι : Base) → Type
    _⇒_ : (σ : Type) → (τ : Type) → Type

  data Base : Set where
    baseInt : Base
    baseBag : (ι : Type) → Base

So far so good, but these types have to be defined together. We can go a step further, by defining:

mutual
  data Type : Set where
    base : (ι : Base Type) → Type
    _⇒_ : (σ : Type) → (τ : Type) → Type

  data Base (Type : Set) : Set where
    baseInt : Base Type
    baseBag : (ι : Type) → Base Type

Here, Base takes the type of object types as a parameter, and Type uses Base Type to tie the recursion. This definition still works, but only as long as Base and Type are defined together.

If we try to separate the definitions of Base and Type into different modules, we run into trouble:

module Parametric.Syntax.Type (Base : Set → Set) where
  data Type : Set where
    base : (ι : Base Type) → Type
    _⇒_ : (σ : Type) → (τ : Type) → Type

-- Elsewhere, in the plugin:
data Base (Type : Set) : Set where
  baseInt : Base Type
  baseBag : (ι : Type) → Base Type

Here, Type is defined for an arbitrary function on types Base : Set → Set. However, this definition is rejected by Agda's positivity checker. Like Coq, Agda forbids defining datatypes that are not strictly positive, as they can introduce inconsistencies.

The above definition of Type is not strictly positive, because we could pass to it as argument Base = λτ → (τ → τ), so that Base Type = Type → Type, making Type occur in a negative position. However, the actual uses of Base we are interested in are fine. The problem is that we cannot inform the positivity checker that Base is supposed to be a strictly positive type function, because Agda doesn't supply the needed expressivity.

This problem is well-known. It could be solved if Agda function spaces supported positivity annotations,¹ or by encoding a universe of strictly-positive type constructors. This encoding is not fully transparent, and adds hard-to-hide noise to the development [Schwaab and Siek 2013].² Few alternatives remain:

• We can forbid types from occurring in base types, as we did in our original paper [Cai et al. 2014]. There, we did not discuss at all a recursive definition of base types.

¹As discussed in https://github.com/agda/agda/issues/570 and https://github.com/agda/agda/issues/2411.
²Using pattern synonyms and DISPLAY pragmas might successfully hide this noise.


• We can use the modular mechanization, disable positivity checking, and risk introducing inconsistencies. We tried this successfully in a branch of that formalization. We believe we did not introduce inconsistencies in that branch, but have no hard evidence.

• We can simply combine the core formalization with a sample plugin. This is not truly modular, because the core modularization can only be reused by copy-and-paste. Moreover, in dependently-typed languages, the content of a definition can affect whether later definitions typecheck, so alternative plugins using the same interface might not typecheck.³

Sadly, giving up on modularity altogether appears the more conventional choice. Either way, as we claimed at the outset, these modularity concerns only limit the modularity of the mechanized proofs, not their cogency.

³Indeed, if Base were not strictly positive, the application Type Base would be ill-typed, as it would fail positivity checking, even though Base : Set → Set does not require Base to be strictly positive.


Appendix C

(Un)typed ILC operationally

In Chapter 12 we have proved ILC correct for a simply-typed λ-calculus. What about other languages, with more expressive type systems, or no type system at all?

In this chapter, we prove that ILC is still correct in untyped call-by-value (CBV) λ-calculus. We do so without using denotational semantics, but using only an environment-based big-step operational semantics and step-indexed logical relations. The formal development in this chapter stands alone from the rest of the thesis, though we do not repeat ideas present elsewhere.

We prove ILC correct using, in increasing order of complexity:

1. STLC and standard syntactic logical relations;

2. STLC and step-indexed logical relations;

3. an untyped λ-calculus and step-indexed logical relations.

We have fully mechanized the second proof in Agda,¹ and done the others on paper. In all cases, we prove the fundamental property for validity; we detail later which corollaries we prove in which case. The proof for untyped λ-calculus is the most interesting, but the others can serve as stepping stones. Yann Régis-Gianas, in collaboration with me, recently mechanized a similar proof for untyped λ-calculus in Coq, which appears in Sec. 17.3.²

Using operational semantics and step-indexed logical relations simplifies extending the proofs to more expressive languages, where denotational semantics or other forms of logical relations would require more sophistication, as argued by Ahmed [2006].

Proofs by (step-indexed) logical relations also promise to be scalable. All these proofs appear to be slight variants of proof techniques for logical program equivalence and parametricity, which are well-studied topics, suggesting the approach might scale to more expressive type systems. Hence, we believe these proofs clarify the relation with parametricity that has been noticed earlier [Atkey 2015]. However, actually proving ILC correct for a polymorphic language (such as System F) is left as future work.

We also expect that from our logical relations, one might derive a logical partial equivalence relation among changes, similarly to Sec. 14.2, but we leave a development for future work.

Compared to earlier chapters, this one will be more technical and concise, because we already introduced the ideas behind both ILC and logical relation proofs.

¹Source code available in this GitHub repo: https://github.com/inc-lc/ilc-agda.
²Mechanizing the proof for untyped λ-calculus is harder for purely technical reasons: mechanizing well-founded induction in Agda is harder than mechanizing structural induction.


Binding and environments. On the technical side, we are able to mechanize our proof without needing any technical lemmas about binding or weakening, thanks to a number of choices we mention later.

Among other reasons, we avoid lemmas on binding because, instead of substituting variables with arbitrary terms, we record mappings from variables to closed values via environments. We also used environments in Appendix A, but this time we use syntactic values rather than semantic ones. As a downside, on paper, using environments makes for more and bigger definitions, because we need to carry around both an environment and a term, instead of merging them into a single term via substitution, and because values are not a subset of terms but an entirely separate category. But on paper the extra work is straightforward, and in a mechanized setting it is simpler than substitution, especially in a setting with an intrinsically typed term representation (see Appendix A.2.4).
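For concreteness, here is a Haskell sketch of ours (not the appendix's actual definitions) of the style of semantics meant here: big-step CBV evaluation where environments map variables to closed values, and values (closures) form a category separate from terms.

data Term = Var String | Lam String Term | App Term Term

-- Values are closures, not terms: a captured environment, binder and body.
data Value = Closure Env String Term

type Env = [(String, Value)]

-- ρ ⊢ t ⇓ v, as a partial function (Nothing models being stuck).
eval :: Env -> Term -> Maybe Value
eval env (Var x) = lookup x env
eval env (Lam x t) = Just (Closure env x t)
eval env (App s t) = do
  Closure env' x body <- eval env s
  v <- eval env t
  eval ((x, v) : env') body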

Background/related work. Our development is inspired significantly by the use of step-indexed logical relations by Ahmed [2006] and Acar, Ahmed and Blume [2008]. We refer to those works, and to Ahmed's lectures at OPLSS 2013,³ for an introduction to (step-indexed) logical relations.

Intensional and extensional validity. Until this point, change validity only specifies how function changes behave, that is, their extension. Using operational semantics, we can specify how valid function changes are concretely defined, that is, their intension. To distinguish the two concepts, we contrast extensional validity and intensional validity. Some extensionally valid changes are not intensionally valid, but such changes are never created by derivation. Defining intensional validity helps to understand function changes: function changes are produced from changes to values in environments, or from functions being replaced altogether. Requiring intensional validity helps to implement change operations such as ⊕ more efficiently, by operating on environments. Later, in Appendix D, we use similar ideas to implement change operations on defunctionalized function changes; intensional validity helps put such efforts on more robust foundations, even though we do not account formally for defunctionalization, but only for the use of closures in the semantics.

We use operational semantics to define extensional validity in Appendix C.3 (using plain logical relations) and Appendix C.4 (using step-indexed logical relations). We switch to an intensional validity definition in Appendix C.5.

Non-termination and general recursion. This proof implies correctness of ILC in the presence of general recursion, because untyped λ-calculus supports general recursion via fixpoint combinators. However, the proof only applies to terminating executions of base programs: like earlier authors [Acar, Ahmed and Blume 2008], we prove that if a function terminates against both a base input v1 and an updated one v2, its derivative terminates against the base input and a valid input change dv from v1 to v2.

We can also add a fixpoint construct to our typed λ-calculus and to our mechanization, without significant changes to our relations. However, a mechanical proof would require use of well-founded induction, which our current mechanization avoids. We return to this point in Appendix C.7.

While this support for general recursion is effective in some scenarios, other scenarios can still be incrementalized better using structural recursion. More efficient support for general recursion is a separate problem that we do not tackle here and leave for future work; we refer to the discussion in Sec. 15.1.

Correctness statement. Our final correctness theorem is a variant of Corollary 13.4.5, which we repeat for comparison:

³ https://www.cs.uoregon.edu/research/summerschool/summer13/curriculum.html


Corollary 13.4.5 (D⟦–⟧ is correct, corollary)
If Γ ⊢ t : τ and dρ ▷ ρ1 ↪ ρ2 : Γ, then ⟦t⟧ρ1 ⊕ ⟦D⟦t⟧⟧dρ = ⟦t⟧ρ2.

We present our final correctness theorem statement in Appendix C.5; we anticipate it here for illustration. The new statement is more explicit about evaluation, but otherwise broadly similar:

Corollary C.5.6 (D⟦–⟧ is correct, corollary)
Take any term t that is well-typed (Γ ⊢ t : τ) and any suitable environments ρ1, dρ, ρ2, intensionally valid at any step count (∀k. (k, ρ1, dρ, ρ2) ∈ RG⟦Γ⟧). Assume t terminates in both the old environment ρ1 and the new environment ρ2, evaluating to output values v1 and v2 (ρ1 ⊢ t ⇓ v1 and ρ2 ⊢ t ⇓ v2). Then D⟦t⟧ evaluates in environment ρ1 and change environment dρ to a change value dv (ρ1, dρ ⊢Δ D⟦t⟧ ⇓ dv), and dv is a valid change from v1 to v2, so that v1 ⊕ dv = v2.

Overall, in this chapter we present the following contributions:

• We give an alternative presentation of derivation that can be mechanized without any binding-related lemmas, not even weakening-related ones, by introducing a separate syntax for change terms (Appendix C.1).

• We prove formally ILC correct for STLC, using big-step semantics and logical relations (Appendix C.3).

• We show formally (with pen-and-paper proofs) that our semantics is equivalent to small-step semantics definitions (Appendix C.2).

• We introduce a formalized step-indexed variant of our definitions and proofs for simply-typed λ-calculus (Appendix C.4), which scales directly to definitions and proofs for untyped λ-calculus.

• For typed λ-calculus, we also mechanize our step-indexed proof.

• In addition to (extensional) validity, we introduce a concept of intensional validity, which captures formally that function changes arise from changing environments or functions being replaced altogether (Appendix C.5).

C.1 Formalization

To present the proofs, we first describe our formal model of the CBV ANF λ-calculus. We define an untyped ANF language called λA. We also define a simply-typed variant called λA→, by adding on top of λA a separate Curry-style type system.

In our mechanization of λA→, however, we find it more convenient to define a Church-style type system (that is, a syntax that only describes typing derivations for well-typed terms) separately from the untyped language.

Terms resulting from differentiation satisfy additional invariants, and exploiting those invariants helps simplify the proof. Hence, we define separate languages for change terms produced from differentiation, again in untyped (λΔA) and typed (λΔA→) variants.

The syntax is summarized in Fig. C.1, the type systems in Fig. C.2, and the semantics in Fig. C.3. The base languages are mostly standard, while the change languages pick a few different choices.


p ::= succ | add | ...                 m, n ::= ...   -- numbers
c ::= n | ...
w ::= x | λx → t | ⟨w, w⟩ | c
s, t ::= w | w w | p w | let x = t in t
v ::= ρ[λx → t] | ⟨v, v⟩ | c
ρ ::= x1 = v1, ..., xn = vn

(a) Base terms (λA)

dc ::= 0
dw ::= dx | λx dx → dt | ⟨dw, dw⟩ | dc
ds, dt ::= dw | dw w dw | ∂p w dw | let x = t, dx = dt in dt
dv ::= ρ, dρ [λx dx → dt] | ⟨dv, dv⟩ | dc
dρ ::= dx1 = dv1, ..., dxn = dvn

(b) Change terms (λΔA)

DC⟦n⟧ = 0

D⟦x⟧ = dx
D⟦λx → t⟧ = λx dx → D⟦t⟧
D⟦⟨wa, wb⟩⟧ = ⟨D⟦wa⟧, D⟦wb⟧⟩
D⟦c⟧ = DC⟦c⟧
D⟦wf wa⟧ = D⟦wf⟧ wa D⟦wa⟧
D⟦p w⟧ = ∂p w D⟦w⟧
D⟦let x = s in t⟧ = let x = s, dx = D⟦s⟧ in D⟦t⟧

(c) Differentiation

τ ::= N | τa × τb | σ → τ

ΔN = N
Δ(τa × τb) = Δτa × Δτb
Δ(σ → τ) = σ → Δσ → Δτ

Δε = ε
Δ(Γ, x : τ) = ΔΓ, dx : Δτ

(d) Types and contexts

Figure C.1: ANF λ-calculus, λA and λΔA.

C.1.1 Types and contexts

We show the syntax of types, contexts and change types in Fig. C.1d. We introduce types for functions, binary products and naturals. Tuples can be encoded as usual through nested pairs. Change types are mostly like earlier, but this time we use naturals as changes for naturals (hence we cannot define a total ⊖ operation).

We modify the definition of change contexts and environment changes to not contain entries for base values: in this presentation, we use separate environments for base variables and change variables. This choice avoids the need to define weakening lemmas.

C.1.2 Base syntax for λA

For convenience, we consider a λ-calculus in A-normal form. We do not parameterize this calculus over language plugins, to reduce mechanization overhead, but we define separate syntactic categories for possible extension points.

We show the syntax of terms in Fig. C.1a.

Meta-variable v ranges over (closed) syntactic values, that is, evaluation results. Values are numbers, pairs of values, or closures. A closure is a pair of a function and an environment, as usual. Environments ρ are finite maps from variables to syntactic values; in our mechanization using de Bruijn indexes, environments are in particular finite lists of syntactic values.

Meta-variable t ranges over arbitrary terms and w ranges over neutral forms. Neutral forms evaluate to values in zero steps, but unlike values they can be open: a neutral form is either a variable, a constant value c, a λ-abstraction, or a pair of neutral forms.


Base typing judgement: Γ ⊢ t : τ

(T-Lit)    ⊢C n : N
(T-Succ)   ⊢P succ : N → N
(T-Add)    ⊢P add : (N × N) → N

(T-Var)    x : τ ∈ Γ  ⟹  Γ ⊢ x : τ
(T-Const)  ⊢C c : τ  ⟹  Γ ⊢ c : τ
(T-App)    Γ ⊢ wf : σ → τ    Γ ⊢ wa : σ  ⟹  Γ ⊢ wf wa : τ
(T-Lam)    Γ, x : σ ⊢ t : τ  ⟹  Γ ⊢ λx → t : σ → τ
(T-Let)    Γ ⊢ s : σ    Γ, x : σ ⊢ t : τ  ⟹  Γ ⊢ let x = s in t : τ
(T-Pair)   Γ ⊢ wa : τa    Γ ⊢ wb : τb  ⟹  Γ ⊢ ⟨wa, wb⟩ : τa × τb
(T-Prim)   ⊢P p : σ → τ    Γ ⊢ w : σ  ⟹  Γ ⊢ p w : τ

(a) λA→ base typing

Change typing judgement: Γ ⊢Δ dt : τ

(T-DVar)   x : τ ∈ Γ  ⟹  Γ ⊢Δ dx : τ
(T-DConst) ⊢C c : Δτ  ⟹  Γ ⊢Δ c : τ
(T-DApp)   Γ ⊢Δ dwf : σ → τ    Γ ⊢ wa : σ    Γ ⊢Δ dwa : σ  ⟹  Γ ⊢Δ dwf wa dwa : τ
(T-DLet)   Γ ⊢ s : σ    Γ ⊢Δ ds : σ    Γ, x : σ ⊢Δ dt : τ  ⟹  Γ ⊢Δ let x = s, dx = ds in dt : τ
(T-DLam)   Γ, x : σ ⊢Δ dt : τ  ⟹  Γ ⊢Δ λx dx → dt : σ → τ
(T-DPair)  Γ ⊢Δ dwa : τa    Γ ⊢Δ dwb : τb  ⟹  Γ ⊢Δ ⟨dwa, dwb⟩ : τa × τb
(T-DPrim)  ⊢P p : σ → τ    Γ ⊢ w : σ    Γ ⊢Δ dw : σ  ⟹  Γ ⊢Δ ∂p w dw : τ

(b) λΔA→ change typing. Judgement Γ ⊢Δ dt : τ means that variables from both Γ and ΔΓ are in scope in dt, and the final type is in fact Δτ.

Figure C.2: ANF λ-calculus, λA→ and λΔA→ type system.


Base evaluation judgement: ρ ⊢ t ⇓n v

(E-Val)    ρ ⊢ w ⇓0 V⟦w⟧ρ
(E-Prim)   ρ ⊢ p w ⇓1 P⟦p⟧ (V⟦w⟧ρ)
(E-Let)    ρ ⊢ s ⇓ns vs    (ρ, x = vs) ⊢ t ⇓nt vt  ⟹  ρ ⊢ let x = s in t ⇓1+ns+nt vt
(E-App)    ρ ⊢ wf ⇓0 vf    ρ ⊢ wa ⇓0 va    app vf va ⇓n v  ⟹  ρ ⊢ wf wa ⇓1+n v

app (ρ′[λx → t]) va = (ρ′, x = va) ⊢ t

V⟦x⟧ρ = ρ(x)
V⟦λx → t⟧ρ = ρ[λx → t]
V⟦⟨wa, wb⟩⟧ρ = ⟨V⟦wa⟧ρ, V⟦wb⟧ρ⟩
V⟦c⟧ρ = c

P⟦succ⟧ n = 1 + n
P⟦add⟧ (m, n) = m + n

(a) Base semantics. Judgement ρ ⊢ t ⇓n v says that ρ ⊢ t, a pair of environment ρ and term t, evaluates to value v in n steps, and app vf va ⇓n v constructs the pair to evaluate via app vf va.

Change evaluation judgement: ρ, dρ ⊢Δ dt ⇓ dv

(E-DVal)   ρ, dρ ⊢Δ dw ⇓ VΔ⟦dw⟧ ρ dρ
(E-DPrim)  ρ, dρ ⊢Δ ∂p w dw ⇓ PΔ⟦p⟧ (V⟦w⟧ρ) (VΔ⟦dw⟧ ρ dρ)
(E-DApp)   ρ, dρ ⊢Δ dwf ⇓ dvf    ρ ⊢ wa ⇓ va    ρ, dρ ⊢Δ dwa ⇓ dva    dapp dvf va dva ⇓ dv
           ⟹  ρ, dρ ⊢Δ dwf wa dwa ⇓ dv
(E-DLet)   ρ ⊢ s ⇓ vs    ρ, dρ ⊢Δ ds ⇓ dvs    (ρ, x = vs), (dρ, dx = dvs) ⊢Δ dt ⇓ dvt
           ⟹  ρ, dρ ⊢Δ let x = s, dx = ds in dt ⇓ dvt

dapp (ρ′, dρ′ [λx dx → dt]) va dva = (ρ′, x = va), (dρ′, dx = dva) ⊢Δ dt

VΔ⟦dx⟧ ρ dρ = dρ(dx)
VΔ⟦λx dx → dt⟧ ρ dρ = ρ, dρ [λx dx → dt]
VΔ⟦⟨dwa, dwb⟩⟧ ρ dρ = ⟨VΔ⟦dwa⟧ ρ dρ, VΔ⟦dwb⟧ ρ dρ⟩
VΔ⟦dc⟧ ρ dρ = dc

PΔ⟦succ⟧ n dn = dn
PΔ⟦add⟧ ⟨m, n⟩ ⟨dm, dn⟩ = dm + dn

(b) Change semantics. Judgement ρ, dρ ⊢Δ dt ⇓ dv says that ρ, dρ ⊢Δ dt, a triple of environment ρ, change environment dρ and change term dt, evaluates to change value dv, and dapp dvf va dva ⇓ dv constructs the triple to evaluate via dapp dvf va dva.

Figure C.3: ANF λ-calculus (λA and λΔA), CBV semantics.


A term is either a neutral form, an application of neutral forms, a let expression, or an application of a primitive function p to a neutral form. Multi-argument primitives are encoded as primitives taking (nested) tuples of arguments. Here we use literal numbers as constants, and +1 and addition as primitives (to illustrate different arities), but further primitives are possible.

Notation C.1.1
We use subscripts a, b for pair components, f, a for function and argument, and keep using 1, 2 for old and new values.
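To make the grammar concrete, the following is a small Haskell rendering of the base syntax (our sketch, not the thesis's Agda mechanization; we use named variables where the mechanization uses de Bruijn indexes):

type Var = String

data Prim  = Succ | Add                -- p
data Const = Nat Int                   -- c: literal numbers

data Neutral                           -- w: evaluates in zero steps, may be open
  = NVar Var
  | NLam Var Term                      -- λx → t
  | NPair Neutral Neutral              -- ⟨w, w⟩
  | NConst Const

data Term                              -- t
  = TNeut Neutral
  | TApp Neutral Neutral               -- w w
  | TPrim Prim Neutral                 -- p w
  | TLet Var Term Term                 -- let x = t in t

data Value                             -- v: closed evaluation results
  = VClosure Env Var Term              -- ρ[λx → t]
  | VPair Value Value
  | VConst Const

type Env = [(Var, Value)]              -- ρ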

C.1.3 Change syntax for λΔA

Next, we consider a separate language for change terms, which can be transformed into the base language. This language supports directly the structure of change terms: base variables and change variables live in separate namespaces. As we show later for the typed language, those namespaces are represented by typing contexts Γ and ΔΓ; that is, the typing context for change variables is always the change context for Γ.

We show the syntax of change terms in Fig. C.1b.

Change terms often take or bind two parameters at once, one for a base value and one for its change. Since a function change is applied to a base input and a change for it at once, the syntax for change terms has a special binary application node dwf wa dwa; otherwise, in ANF, such syntax must be encoded through separate applications, via let dfa = dwf wa in dfa dwa. In the same way, closure changes ρ, dρ [λx dx → dt] bind two variables at once and close over separate environments for base and change variables. Various other choices in the same spirit simplify similar formalization and mechanization details.

In change terms, we write ∂p as syntax for the derivative of p, evaluated as such by the semantics. Strictly speaking, differentiation must map primitives to standard terms, so that the resulting programs can be executed by a standard semantics; hence, we should replace ∂p by a concrete implementation of the derivative of p. However, doing so in a new formalization yields little additional insight, and requires writing concrete derivatives of primitives as de Bruijn terms.
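Continuing the sketch (again ours, not from the thesis), change terms form a separate family of data types, with the binary application and binary let just described:

data DNeutral                          -- dw
  = DVar Var                           -- dx
  | DLam Var Var DTerm                 -- λx dx → dt
  | DPair DNeutral DNeutral            -- ⟨dw, dw⟩
  | DConst Const                       -- dc

data DTerm                             -- dt
  = DNeut DNeutral
  | DApp DNeutral Neutral DNeutral     -- dwf wa dwa: binary application node
  | DPrimD Prim Neutral DNeutral       -- ∂p w dw: derivative of a primitive
  | DLet Var Term Var DTerm DTerm      -- let x = s, dx = ds in dt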

C.1.4 Differentiation

We show differentiation in Fig. C.1c. Differentiation maps constructs in the language of base terms one-to-one to constructs in the language of change terms.
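In the Haskell sketch, differentiation becomes a pair of mutually recursive functions mirroring Fig. C.1c (ours; prefixing variable names with "d" stands in for the separate change-variable namespace of the formalization):

dVar :: Var -> Var
dVar x = "d" ++ x

derive :: Term -> DTerm
derive (TNeut w)    = DNeut (deriveW w)
derive (TApp wf wa) = DApp (deriveW wf) wa (deriveW wa)  -- D⟦wf wa⟧ = D⟦wf⟧ wa D⟦wa⟧
derive (TPrim p w)  = DPrimD p w (deriveW w)             -- D⟦p w⟧ = ∂p w D⟦w⟧
derive (TLet x s t) = DLet x s (dVar x) (derive s) (derive t)

deriveW :: Neutral -> DNeutral
deriveW (NVar x)         = DVar (dVar x)
deriveW (NLam x t)       = DLam x (dVar x) (derive t)
deriveW (NPair wa wb)    = DPair (deriveW wa) (deriveW wb)
deriveW (NConst (Nat _)) = DConst (Nat 0)                -- DC⟦n⟧ = 0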

C.1.5 Typing λA→ and λΔA→

We define the typing judgement for λA→ base terms and for λΔA→ change terms. We show the typing rules in Fig. C.2.

Typing for base terms is mostly standard. We use judgements ⊢P p and ⊢C c to specify typing of primitive functions and constant values. For change terms, one could expect a type system only proving judgements with shape Γ, ΔΓ ⊢ dt : Δτ (where Γ, ΔΓ stands for the concatenation of Γ and ΔΓ). To simplify inversion on such judgements (especially in Agda), we write instead Γ ⊢Δ dt : τ, so that one can verify the following derived typing rule for D⟦–⟧:

(T-Derive)  Γ ⊢ t : τ  ⟹  Γ ⊢Δ D⟦t⟧ : τ


We also use mutually recursive typing judgements ⊢ v : τ for values and ⊢ ρ : Γ for environments, and similarly ⊢Δ dv : τ for change values and ⊢Δ dρ : Γ for change environments. We only show the (unsurprising) rules for ⊢ v : τ and omit the others. One could alternatively, and equivalently, define typing on syntactic values v by translating them to neutral forms w = v* (using unsurprising definitions in Appendix C.2) and reusing typing on terms, but as usual we prefer to avoid substitution.

Value typing judgement: ⊢ v : τ

(TV-Nat)   ⊢ n : N
(TV-Pair)  ⊢ va : τa    ⊢ vb : τb  ⟹  ⊢ ⟨va, vb⟩ : τa × τb
(TV-Lam)   Γ, x : σ ⊢ t : τ    ⊢ ρ : Γ  ⟹  ⊢ ρ[λx → t] : σ → τ

C.1.6 Semantics

We present our semantics for base terms in Fig. C.3a. Our semantics gives meaning to pairs of an environment ρ and a term t, consistent with each other, that we write ρ ⊢ t. By "consistent" we mean that ρ contains values for all free variables of t, and (in a typed language) values of compatible types (if Γ ⊢ t : τ then ⊢ ρ : Γ). Judgement ρ ⊢ t ⇓n v says that ρ ⊢ t evaluates to value v in n steps. The definition is given via a CBV big-step semantics. Following Acar, Ahmed and Blume [2008], we index our evaluation judgements via a step count, which counts, in essence, β-reduction steps; we use such step counts later to define step-indexed logical relations. Since our semantics uses environments, β-reduction steps are implemented not via substitution but via environment extension, but the resulting step counts are the same (Appendix C.2). Applying closure vf = ρ′[λx → t] to argument va produces the environment-term pair (ρ′, x = va) ⊢ t, which we abbreviate as app vf va. We'll reuse this syntax later to define logical relations.

In our mechanized formalization, we have additionally proved lemmas to ensure that this semantics is sound relative to our earlier denotational semantics (adapted for the ANF syntax).

Evaluation of primitives is delegated to function P⟦–⟧ –. We show complete equations for the typed case; for the untyped case, we must turn P⟦–⟧ – and PΔ⟦–⟧ – into relations (or add explicit error results), but we omit the standard details (see also Appendix C.6). For simplicity, we assume evaluation of primitives takes one step. We conjecture higher-order primitives might need to be assigned different costs, but leave details for future work.

We can evaluate neutral forms w to syntactic values v using a simple evaluation function V⟦w⟧ρ, and use P⟦p⟧v to evaluate primitives. When we need to omit indexes, we write ρ ⊢ t ⇓ v to mean that for some n we have ρ ⊢ t ⇓n v.

We can also define an analogous non-indexed big-step semantics for change terms, and we present it in Fig. C.3b.
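For illustration, here is the step-indexed base semantics of Fig. C.3a as a partial Haskell evaluator over the sketched syntax (ours, not the mechanization); Nothing signals unbound variables or ill-typed applications, which cannot arise for well-typed terms:

evalW :: Neutral -> Env -> Maybe Value                 -- V⟦w⟧ρ: zero steps
evalW (NVar x)      rho = lookup x rho
evalW (NLam x t)    rho = Just (VClosure rho x t)
evalW (NPair wa wb) rho = VPair <$> evalW wa rho <*> evalW wb rho
evalW (NConst c)    _   = Just (VConst c)

evalP :: Prim -> Value -> Maybe Value                  -- P⟦p⟧ v
evalP Succ (VConst (Nat n))                         = Just (VConst (Nat (n + 1)))
evalP Add (VPair (VConst (Nat m)) (VConst (Nat n))) = Just (VConst (Nat (m + n)))
evalP _ _                                           = Nothing

-- eval t rho = Just (v, n) encodes the judgement ρ ⊢ t ⇓n v.
eval :: Term -> Env -> Maybe (Value, Int)
eval (TNeut w) rho = do { v <- evalW w rho; Just (v, 0) }            -- E-Val
eval (TPrim p w) rho =
  do { v <- evalW w rho; r <- evalP p v; Just (r, 1) }               -- E-Prim
eval (TApp wf wa) rho = do
  VClosure rho' x t <- evalW wf rho
  va <- evalW wa rho
  (v, n) <- eval t ((x, va) : rho')                                  -- app vf va
  Just (v, 1 + n)                                                    -- E-App
eval (TLet x s t) rho = do
  (vs, ns) <- eval s rho
  (vt, nt) <- eval t ((x, vs) : rho)
  Just (vt, 1 + ns + nt)                                             -- E-Let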

C.1.7 Type soundness

Evaluation preserves types in the expected way.

Lemma C.1.2 (Big-step preservation)
1. If Γ ⊢ t : τ, ⊢ ρ : Γ and ρ ⊢ t ⇓n v, then ⊢ v : τ.
2. If Γ ⊢Δ dt : τ, ⊢ ρ : Γ, ⊢Δ dρ : Γ and ρ, dρ ⊢Δ dt ⇓ dv, then ⊢Δ dv : τ.


Proof. By structural induction on evaluation judgements. In our intrinsically typed mechanization, this is actually part of the definition of values and big-step evaluation rules.

To ensure that our semantics for λA→ is complete for the typed language, instead of proving a small-step progress lemma or extending the semantics with errors, we just prove that all typed terms normalize in the standard way. As usual, this fails if we add fixpoints, or for untyped terms. If we wanted to ensure type safety in such a case, we could switch to functional big-step semantics or definitional interpreters [Amin and Rompf 2017; Owens et al. 2016].

Theorem C.1.3 (CBV normalization)
For any well-typed and closed term ⊢ t : τ, there exist a step count n and value v such that ⊢ t ⇓n v.

Proof. A variant of standard proofs of normalization for STLC [Pierce 2002, Ch. 12], adapted for big-step semantics rather than small-step semantics (similarly to Appendix C.3). We omit the needed definitions and refer interested readers to our Agda mechanization.

We haven't attempted to prove this lemma for arbitrary change terms (though we expect we could prove it by defining an erasure to the base language and relating evaluation results), but we prove it for the result of differentiating well-typed terms in Corollary C.3.2.

C.2 Validating our step-indexed semantics

In this section, we show how we ensure the step counts in our base semantics are set correctly, and how we can relate this environment-based semantics to more conventional semantics based on substitution and/or small-step. We only consider the core calculus, without primitives, constants and pairs. Results from this section are not needed later, and we have proved them formally on paper but not mechanized them, as our goal is to use environment-based big-step semantics in our mechanization.

To this end, we relate our semantics first with a big-step semantics based on substitution (rather than environments), and then relate this alternative semantics to a small-step semantics. Results in this section are useful to understand our semantics better, and as a design aid for modifying it.

As a consequence, we also conjecture that our logical relations and proofs could be adapted to small-step semantics, along the lines of Ahmed [2006]. We however do not find that necessary. While small-step semantics gives meaning to non-terminating programs, and that is important for type soundness proofs, it does not seem useful (or possible) to try to incrementalize non-terminating programs, or to ensure we do so correctly.

In proofs using step-indexed logical relations, the use of step counts in definitions is often delicate and tricky to get right. But Acar, Ahmed and Blume provide a robust recipe to ensure correct step-indexing in the semantics. To be able to prove the fundamental property of logical relations, we ensure step counts agree with the ones induced by small-step semantics (which counts β-reductions). Such a lemma is not actually needed in other proofs, but only useful as a sanity check. We also attempted using the style of step-indexing used by Amin and Rompf [2017], but were unable to produce a proof. To the best of our knowledge, all proofs using step-indexed logical relations, even with functional big-step semantics [Owens et al. 2016], use step-indexing that agrees with small-step semantics.

Unlike Acar, Ahmed and Blume, we use environments in our big-step semantics; this avoids the need to define substitution in our mechanized proof. Nevertheless, one can show the two semantics correspond to each other. Our environments ρ can be regarded as closed value substitutions, as long as we also substitute away environments in values. Formally, we write ρ*(t) for the "homomorphic extension" of substitution ρ to terms, which produces other terms. If v is a value using environments, we write w = v* for the result of translating that value to not use environments; this operation produces a closed neutral form w. Operations ρ*(t) and v* can be defined in a mutually recursive way:

ρ*(x) = (ρ(x))*
ρ*(λx → t) = λx → ρ*(t)
ρ*(w1 w2) = ρ*(w1) ρ*(w2)
ρ*(let x = t1 in t2) = let x = ρ*(t1) in ρ*(t2)

(ρ[λx → t])* = λx → ρ*(t)

If ρ ⊢ t ⇓n v in our semantics, a standard induction over the derivation of ρ ⊢ t ⇓n v shows that ρ*(t) ⇓n v* in a more conventional big-step semantics, using substitution rather than environments (also following Acar, Ahmed and Blume):

(Var′)  x ⇓0 x
(Lam′)  λx → t ⇓0 λx → t
(App′)  t[x := w2] ⇓n w′  ⟹  (λx → t) w2 ⇓1+n w′
(Let′)  t1 ⇓n1 w1    t2[x := w1] ⇓n2 w2  ⟹  let x = t1 in t2 ⇓1+n1+n2 w2

In this form, it is more apparent that the step indexes count steps of β-reduction or substitution. It's also easy to see that this big-step semantics agrees with a standard small-step semantics ↦ (which we omit): if t ⇓n w, then t ↦n w. Overall, the two statements can be composed, so our original semantics agrees with small-step semantics: if ρ ⊢ t ⇓n v, then ρ*(t) ⇓n v*, and finally ρ*(t) ↦n v*. Hence, we can translate evaluation derivations using big-step semantics to derivations using small-step semantics, with the same step count.

However, to state and prove the fundamental property, we need not prove that our semantics is sound relative to some other semantics. We simply define the appropriate logical relation for validity and show it agrees with a suitable definition for ⊕.

Having defined our semantics, we proceed to define extensional validity.

C.3 Validity, syntactically (λA→, λΔA→)

For our typed language λA→, at least as long as we do not add fixpoint operators, we can define logical relations using big-step semantics but without using step-indexes. The resulting relations are well-founded only because they use structural recursion on types. We present in Fig. C.4 the needed definitions, as a stepping stone to the definitions using step-indexed logical relations.

Following Ahmed [2006] and Acar, Ahmed and Blume [2008], we encode extensional validity through two mutually recursive type-indexed families of ternary logical relations: RV⟦τ⟧ over closed values, and RC⟦τ⟧ over terms (and environments).

These relations are analogous to notions we considered earlier and express similar informal notions:

• With denotational semantics, we write dv ▷ v1 ↪ v2 : τ to say that change value dv ∈ ⟦Δτ⟧ is a valid change from v1 to v2 at type τ. With operational semantics, instead, we write (v1, dv, v2) ∈ RV⟦τ⟧, where v1, dv and v2 are now closed syntactic values.


• For terms, with denotational semantics, we write ⟦dt⟧dρ ▷ ⟦t1⟧ρ1 ↪ ⟦t2⟧ρ2 : τ to say that dt is a valid change from t1 to t2, considering the respective environments. With operational semantics, instead, we write (ρ1 ⊢ t1, ρ dρ ⊢Δ dt, ρ2 ⊢ t2) ∈ RC⟦τ⟧.

Since we use Church typing and only mechanize typed terms, we must include in all cases appropriate typing assumptions.

Relation RC⟦τ⟧ relates tuples of environments and computations, ρ1 ⊢ t1, ρ dρ ⊢Δ dt and ρ2 ⊢ t2: it holds if, whenever t1 evaluates in environment ρ1 to v1 and t2 evaluates in environment ρ2 to v2, then dt must evaluate in environments ρ and dρ to a change value dv, with v1, dv, v2 related by RV⟦τ⟧. The environments themselves need not be related: this definition characterizes validity extensionally, that is, it can relate t1, dt and t2 that have unrelated implementations and unrelated environments; in fact, even unrelated typing contexts. This flexibility is useful when relating closures of type σ → τ: two closures might be related even if they close over environments of different shapes. For instance, closures v1 = [λx → 0] and v2 = (y = 0)[λx → y] are related by a nil change such as dv = [λx dx → 0]. In Appendix C.5, we discuss instead an intensional definition of validity.

In particular, for function types, the relation RV⟦σ → τ⟧ relates function values f1, df and f2 if they map related input values (and, for df, input changes) to related output computations.

We also extend the relation on values to environments via RG⟦Γ⟧: environments are related if their corresponding entries are related values.

RV⟦N⟧       = { (n1, dn, n2) | n1, dn, n2 ∈ N ∧ n1 + dn = n2 }
RV⟦τa × τb⟧ = { (⟨va1, vb1⟩, ⟨dva, dvb⟩, ⟨va2, vb2⟩) |
                  (va1, dva, va2) ∈ RV⟦τa⟧ ∧ (vb1, dvb, vb2) ∈ RV⟦τb⟧ }
RV⟦σ → τ⟧   = { (vf1, dvf, vf2) |
                  ⊢ vf1 : σ → τ ∧ ⊢Δ dvf : σ → τ ∧ ⊢ vf2 : σ → τ ∧
                  ∀(v1, dv, v2) ∈ RV⟦σ⟧.
                    (app vf1 v1, dapp dvf v1 dv, app vf2 v2) ∈ RC⟦τ⟧ }

RC⟦τ⟧ = { (ρ1 ⊢ t1, ρ dρ ⊢Δ dt, ρ2 ⊢ t2) |
            (∃Γ1 Γ Γ2. Γ1 ⊢ t1 : τ ∧ Γ ⊢Δ dt : τ ∧ Γ2 ⊢ t2 : τ) ∧
            ∀v1 v2. (ρ1 ⊢ t1 ⇓ v1 ∧ ρ2 ⊢ t2 ⇓ v2) ⟹
              ∃dv. ρ, dρ ⊢Δ dt ⇓ dv ∧ (v1, dv, v2) ∈ RV⟦τ⟧ }

RG⟦ε⟧        = { () }
RG⟦Γ, x : τ⟧ = { ((ρ1, x = v1), (dρ, dx = dv), (ρ2, x = v2)) |
                   (ρ1, dρ, ρ2) ∈ RG⟦Γ⟧ ∧ (v1, dv, v2) ∈ RV⟦τ⟧ }

Γ ⊨ dt ▷ t1 ↪ t2 : τ = ∀(ρ1, dρ, ρ2) ∈ RG⟦Γ⟧.
                          (ρ1 ⊢ t1, ρ1 dρ ⊢Δ dt, ρ2 ⊢ t2) ∈ RC⟦τ⟧

Figure C.4: Defining extensional validity via logical relations and big-step semantics.


Given these definitions, one can prove the fundamental property.

Theorem C.3.1 (Fundamental property, correctness of D⟦–⟧)
For every well-typed term Γ ⊢ t : τ, we have that Γ ⊨ D⟦t⟧ ▷ t ↪ t : τ.

Proof sketch. By induction on the structure of terms, using ideas similar to Theorem 12.2.2.

It also follows that D⟦t⟧ normalizes.

Corollary C.3.2 (D⟦–⟧ normalizes to a nil change)
For any well-typed and closed term ⊢ t : τ, there exist value v and change value dv such that ⊢ t ⇓ v, ⊢Δ D⟦t⟧ ⇓ dv, and (v, dv, v) ∈ RV⟦τ⟧.

Proof. A corollary of the fundamental property and of the normalization theorem (Theorem C.1.3): since t is well-typed and closed, it normalizes to a value v for some step count (⊢ t ⇓ v). The empty environment change is valid (() ∈ RG⟦ε⟧), so from the fundamental property we get that (⊢ t, ⊢Δ D⟦t⟧, ⊢ t) ∈ RC⟦τ⟧. From the definition of RC⟦τ⟧ and ⊢ t ⇓ v, it follows that there exists dv such that ⊢Δ D⟦t⟧ ⇓ dv and (v, dv, v) ∈ RV⟦τ⟧.

Remark C.3.3
Compared to prior work, these relations are unusual for two reasons. First, instead of just relating two executions of a term, we relate two executions of a term with an execution of a change term. Second, most such logical relations (including Ahmed [2006]'s one, but except Acar et al. [2008]'s one) define a logical relation (sometimes called logical equivalence) that characterizes contextual equivalence, while we don't.

Consider a logical equivalence defined through sets RC⟦τ⟧ and RV⟦τ⟧. If (t1, t2) ∈ RC⟦τ⟧ holds and t1 terminates (with result v1), then t2 must terminate as well (with result v2), and their results v1 and v2 must in turn be logically equivalent ((v1, v2) ∈ RV⟦τ⟧). And at base types like N, (v1, v2) ∈ RV⟦N⟧ means that v1 = v2.

Here, instead, the fundamental property relates two executions of a term on different inputs, which might take different paths during execution. In a suitably extended language, we could even write term t = λx → if x = 0 then 1 else loop, and run it on inputs v1 = 0 and v2 = 1: these inputs are related by change dv = 1, but t will converge on v1 and diverge on v2. We must use a semantics that allows such behavioral differences. Hence, at base type N, (v1, dv, v2) ∈ RV⟦N⟧ means just that dv is a change from v1 to v2, hence that v1 ⊕ dv is equivalent to v2, because ⊕ agrees with extensional validity in this context as well. And if (ρ1 ⊢ t1, ρ dρ ⊢Δ dt, ρ2 ⊢ t2) ∈ RC⟦τ⟧, term t1 might converge while t2 diverges: only if both converge must their results be related.

These subtleties become more relevant in the presence of general recursion and non-terminating programs, as in the untyped language λA, or in a hypothetical extension of λA→ with fixpoint operators.

C.4 Step-indexed extensional validity (λA→, λΔA→)

Step-indexed logical relations define approximations to a relation, to enable dealing with non-terminating programs. Logical relations relate the behavior of multiple terms during evaluation; with step-indexed logical relations, we can take a bound k and restrict attention to evaluations that take at most k steps overall, as made precise by the definitions. Crucially, entities related at step count k are also related at any step count j < k. Conversely, the higher the step count, the more precise the defined relation. In the limit, if entities are related at all step counts, we simply say they are related. This construction of limits resembles various other approaches to constructing relations by approximations, but the entire framework remains elementary. In particular, the relations are well-defined simply because they are well-founded (that is, only defined by induction on smaller numbers).


Proofs of logical relation statements need to deal with step-indexes, but when they work (as here), they are otherwise not much harder than other syntactic or logical relation proofs.

For instance, if we define equivalence as a step-indexed logical relation, we can say that two terms are equivalent for k or fewer steps, even if they might have different behavior with more steps available. In our case, we can say that a change appears valid at step count k if it behaves like a valid change in "observations" using at most k steps.

Like before, we define a relation on values and one on computations, as sets RV⟦τ⟧ and RC⟦τ⟧. Instead of indexing the sets with the step count, we add the step counts to the tuples they contain; so, for instance, (k, v1, dv, v2) ∈ RV⟦τ⟧ means that value v1, change value dv and value v2 are related at step count k (or k-related), and similarly for RC⟦τ⟧.

The details of the relation definitions are subtle, but follow closely the use of step-indexing by Acar, Ahmed and Blume [2008]. We add mention of changes, and must decide how to use step-indexing for them.

How step-indexing proceeds. We explain gradually, in words, how the definition proceeds.

First, we say k-related function values take j-related arguments to j-related results, for all j less than k. That is reflected in the definition for RV⟦σ → τ⟧: it contains (k, vf1, dvf, vf2) if, for all (j, v1, dv, v2) ∈ RV⟦σ⟧ with j < k, the results of application are also j-related. However, the results of application are not syntactic applications encoding vf1 v1:⁴ it is instead necessary to use app vf1 v1, the result of one step of reduction. The two definitions are not equivalent, because a syntactic application would take one extra step to reduce.

The definition for computations takes longer to describe. Roughly speaking, computations are k-related if, after j steps of evaluation (with j < k), they produce values related at k − j steps; in particular, if the computations happen to be neutral forms and evaluate in zero steps, they're k-related as computations if the values they produce are k-related. In fact, the rule for evaluation has a wrinkle. Formally, instead of saying that computations (ρ1 ⊢ t1, ρ dρ ⊢Δ dt, ρ2 ⊢ t2) are k-related, we say that (k, ρ1 ⊢ t1, ρ dρ ⊢Δ dt, ρ2 ⊢ t2) ∈ RC⟦τ⟧. We do not require all three computations to take j steps. Instead, if the first computation ρ1 ⊢ t1 evaluates to a value in j < k steps, and the second computation ρ2 ⊢ t2 evaluates to a value in any number of steps, then the change computation ρ, dρ ⊢Δ dt must also terminate to a change value dv (in an unspecified number of steps), and the resulting values must be related at step count k − j (that is, (k − j, v1, dv, v2) ∈ RV⟦τ⟧).

What is new in the above definition is the addition of changes, and the choice to allow change term dt to evaluate to dv in an unbounded number of steps (like t2), as no bound is necessary for our proofs. This is why the semantics we defined for change terms has no step counts.

Well-foundedness of step-indexed logical relations has a small wrinkle, because k − j need not be strictly less than k. But we define the two relations in a mutually recursive way, and the pair of relations at step count k is defined in terms of the pair of relations at smaller step counts. All other recursive uses of relations are at smaller step-indexes.

In this section, we index the relation by both types and step-indexes, since this is the variant we use in our mechanized proof; this relation is defined by structural induction on types. We show this definition in Fig. C.5. Instead, in Appendix C.6, we consider untyped λ-calculus and drop types. The resulting definition is very similar, but is defined by well-founded recursion on step-indexes.

Again, since we use Church typing and only mechanize typed terms, we must include in all cases appropriate typing assumptions. This choice does not match Ahmed [2006], but is one alternative she describes as equivalent. Indeed, while adapting the proof, the extra typing assumptions and proof obligations were not a problem.

⁴ That happens to be illegal syntax in this presentation, but it can be encoded, for instance as f = vf1, x = v1 ⊢ f x; and the problem is more general.


RV⟦N⟧       = { (k, n1, dn, n2) | n1, dn, n2 ∈ N ∧ n1 + dn = n2 }
RV⟦τa × τb⟧ = { (k, ⟨va1, vb1⟩, ⟨dva, dvb⟩, ⟨va2, vb2⟩) |
                  (k, va1, dva, va2) ∈ RV⟦τa⟧ ∧ (k, vb1, dvb, vb2) ∈ RV⟦τb⟧ }
RV⟦σ → τ⟧   = { (k, vf1, dvf, vf2) |
                  ⊢ vf1 : σ → τ ∧ ⊢Δ dvf : σ → τ ∧ ⊢ vf2 : σ → τ ∧
                  ∀(j, v1, dv, v2) ∈ RV⟦σ⟧. j < k ⟹
                    (j, app vf1 v1, dapp dvf v1 dv, app vf2 v2) ∈ RC⟦τ⟧ }

RC⟦τ⟧ = { (k, ρ1 ⊢ t1, ρ dρ ⊢Δ dt, ρ2 ⊢ t2) |
            (∃Γ1 Γ Γ2. Γ1 ⊢ t1 : τ ∧ Γ ⊢Δ dt : τ ∧ Γ2 ⊢ t2 : τ) ∧
            ∀j v1 v2. (j < k ∧ ρ1 ⊢ t1 ⇓j v1 ∧ ρ2 ⊢ t2 ⇓ v2) ⟹
              ∃dv. ρ, dρ ⊢Δ dt ⇓ dv ∧ (k − j, v1, dv, v2) ∈ RV⟦τ⟧ }

RG⟦ε⟧        = { (k) }
RG⟦Γ, x : τ⟧ = { (k, (ρ1, x = v1), (dρ, dx = dv), (ρ2, x = v2)) |
                   (k, ρ1, dρ, ρ2) ∈ RG⟦Γ⟧ ∧ (k, v1, dv, v2) ∈ RV⟦τ⟧ }

Γ ⊨ dt ▷ t1 ↪ t2 : τ = ∀(k, ρ1, dρ, ρ2) ∈ RG⟦Γ⟧.
                          (k, ρ1 ⊢ t1, ρ1 dρ ⊢Δ dt, ρ2 ⊢ t2) ∈ RC⟦τ⟧

Figure C.5: Defining extensional validity via step-indexed logical relations and big-step semantics.

At this moment, we do not require that related closures contain related environments: again, we are defining extensional validity.

Given these definitions, we can prove that all relations are downward-closed: that is, relations at step count n imply relations at step count k < n.

Lemma C.4.1 (Extensional validity is downward-closed)
Assume k ≤ n.

1. If (n, v1, dv, v2) ∈ RV⟦τ⟧, then (k, v1, dv, v2) ∈ RV⟦τ⟧.

2. If (n, ρ1 ⊢ t1, ρ dρ ⊢Δ dt, ρ2 ⊢ t2) ∈ RC⟦τ⟧, then (k, ρ1 ⊢ t1, ρ dρ ⊢Δ dt, ρ2 ⊢ t2) ∈ RC⟦τ⟧.

3. If (n, ρ1, dρ, ρ2) ∈ RG⟦Γ⟧, then (k, ρ1, dρ, ρ2) ∈ RG⟦Γ⟧.

Proof sketch. For RV⟦τ⟧, case split on τ and expand hypothesis and thesis. If τ = N, they coincide. For RV⟦σ → τ⟧, parts of the hypothesis and thesis match. For some relation P, the rest of the hypothesis has shape ∀j < n. P(j, v1, dv, v2), and the rest of the thesis has shape ∀j < k. P(j, v1, dv, v2). Assume j < k; we must prove P(j, v1, dv, v2), but since j < k ≤ n, we can just apply the hypothesis.


The proof for RC⟦τ⟧ follows the same idea as RV⟦σ → τ⟧. For RG⟦Γ⟧, apply the theorem for RV⟦τ⟧ to each environment entry x : τ.

At this point, we prove the fundamental property.

Theorem C.4.2 (Fundamental property, correctness of D⟦–⟧)
For every well-typed term Γ ⊢ t : τ, we have that Γ ⊨ D⟦t⟧ ▷ t ↪ t : τ.

Proof sketch. By structural induction on typing derivations, using ideas similar to Theorem C.3.1, and relying on Lemma C.4.1 to reduce step counts where needed.

C.5 Step-indexed intensional validity

Up to now, we have defined when a function change is valid extensionally, that is, purely based on its behavior, as we have done earlier when using denotational semantics. We conjecture that, with these definitions, one could define ⊕ and prove it agrees with extensional validity. However, we have not done so.

Instead, we modify the definitions in Appendix C.4 to define validity intensionally. To ensure that f1 ⊕ df = f2 (for a suitable ⊕), we choose to require that closures f1, df and f2 close over environments of matching shapes. This change does not complicate the proof of the fundamental lemma: all the additional proof obligations are automatically satisfied.

However, it can still be necessary to replace a function value with a different one. Hence, we extend our definition of values to allow replacement values. Closure replacements produce replacements as results, so we make replacement values into valid changes for all types. We must also extend the change semantics, both to allow evaluating closure replacements and to allow derivatives of primitives to handle replacement values.

We have not added replacement values to the syntax, so currently they can just be added to change environments, but we don't expect adding them to the syntax would cause any significant trouble.

We present the changes described above for the typed semantics. We have successfully mechanized this variant of the semantics as well. Adding replacement values !v requires extending the definition of change values, evaluation and validity. We add replacement values to change values:

dv ::= ... | !v

Derivatives of primitives, when applied to replacement changes, must recompute their output. The required additional equations are not interesting, but we show them anyway for completeness:

PΔ⟦succ⟧ n1 (!n2) = !(n2 + 1)
PΔ⟦add⟧ ⟨_, _⟩ (!⟨m2, n2⟩) = !(m2 + n2)
PΔ⟦add⟧ p1 (dp@⟨dm, !n2⟩) = !(P⟦add⟧ (p1 ⊕ dp))
PΔ⟦add⟧ p1 (dp@⟨!m2, dv⟩) = !(P⟦add⟧ (p1 ⊕ dp))

Evaluation requires a new rule, E-BangApp, to evaluate change applications where the function change evaluates to a replacement change:

(E-BangApp)  ρ, dρ ⊢Δ dwf ⇓ !vf    ρ ⊢ wa ⇓ va    ρ, dρ ⊢Δ dwa ⇓ dva    app vf (va ⊕ dva) ⇓ v
             ⟹  ρ, dρ ⊢Δ dwf wa dwa ⇓ !v

Evaluation rule E-BangApp requires defining ⊕ on syntactic values. We define it intensionally:


Definition C.5.1 (Update operator ⊕)
Operator ⊕ is defined on values by the following equations, where match ρ dρ (whose definition we omit) tests if ρ and dρ are environments for the same typing context Γ:

v1 ⊕ !v2 = v2
ρ[λx → t] ⊕ dρ[λx dx → dt] =
  if match ρ dρ then
    -- If ρ and dρ are environments for the same typing context Γ:
    (ρ ⊕ dρ)[λx → t]
  else
    -- otherwise, the input change is invalid, so just give
    -- any type-correct result:
    ρ[λx → t]
n ⊕ dn = n + dn
⟨va1, vb1⟩ ⊕ ⟨dva, dvb⟩ = ⟨va1 ⊕ dva, vb1 ⊕ dvb⟩
-- An additional equation is needed in the untyped language,
-- not in the typed one. This equation is for invalid
-- changes, so we can just return v1:
v1 ⊕ dv = v1

We define ⊕ on environments for matching contexts to combine values and changes pointwise:

(x1 = v1, ..., xn = vn) ⊕ (dx1 = dv1, ..., dxn = dvn) =
  (x1 = v1 ⊕ dv1, ..., xn = vn ⊕ dvn)

The definition of update for closures can only update them in a few cases, but this is not a problem: as we show in a moment, we restrict validity to the closure changes for which it is correct.

We ensure replacement values are accepted as valid for all types by requiring that the following equation holds (hence modifying all equations for RV⟦–⟧; we omit details):

RV⟦τ⟧ ⊇ { (k, v1, !v2, v2) | ⊢ v1 : τ ∧ ⊢ v2 : τ }    (C.1)

where we write ⊢ v : τ to state that value v has type τ; we omit the unsurprising rules for this judgement.

To restrict valid closure changes, RV⟦σ → τ⟧ now requires that related elements satisfy predicate matchImpl vf1 dvf vf2, defined by the following equations:

matchImpl (ρ1[λx → t])
          (ρ1, dρ [λx dx → D⟦t⟧])
          ((ρ1 ⊕ dρ)[λx → t]) = True
matchImpl _ _ _ = False

In other words, validity (k, vf1, dvf, vf2) ∈ RV⟦σ → τ⟧ now requires, via matchImpl, that the base closure environment ρ1 and the base environment of the closure change dvf coincide, that ρ2 = ρ1 ⊕ dρ, and that vf1 and vf2 have λx → t as body, while dvf has body λx dx → D⟦t⟧.

To define intensional validity for function changes, RV⟦σ → τ⟧ must use matchImpl and explicitly support replacement closures, to satisfy Eq. (C.1). Its definition is as follows:

RV⟦σ → τ⟧ = { (k, vf1, dvf, vf2) |
                ⊢ vf1 : σ → τ ∧ ⊢Δ dvf : σ → τ ∧ ⊢ vf2 : σ → τ ∧
                matchImpl vf1 dvf vf2 ∧
                ∀(j, v1, dv, v2) ∈ RV⟦σ⟧. j < k ⟹
                  (j, app vf1 v1, dapp dvf v1 dv, app vf2 v2) ∈ RC⟦τ⟧ }
            ∪ { (k, vf1, !vf2, vf2) | ⊢ vf1 : σ → τ ∧ ⊢ vf2 : σ → τ }

Definitions of RV⟦–⟧ for other types can be similarly updated to support replacement changes and satisfy Eq. (C.1).

Using these updated definitions, we can again prove the fundamental property, with the same statement as Theorem C.4.2. Furthermore, we now prove that ⊕ agrees with validity.

Theorem C.5.2 (Fundamental property, correctness of D⟦–⟧)
For every well-typed term Γ ⊢ t : τ, we have that Γ ⊨ D⟦t⟧ ▷ t ↪ t : τ.

Theorem C.5.3 (⊕ agrees with step-indexed intensional validity)
If (k, v1, dv, v2) ∈ RV⟦τ⟧, then v1 ⊕ dv = v2.

Proof. By induction on types. For type N, validity coincides with the thesis. For type τa × τb, we must apply the induction hypothesis on both pair components.

For closures, validity requires that v1 = ρ1[λx → t], dv = ρ1, dρ [λx dx → D⟦t⟧] and v2 = ρ2[λx → t], with ρ1 ⊕ dρ = ρ2, and that there exists Γ such that Γ, x : σ ⊢ t : τ. Moreover, from validity we can show that ρ1 and dρ have matching shapes: ρ1 is an environment matching Γ, and dρ is a change environment matching ΔΓ. Hence, v1 ⊕ dv can update the stored environment, and we can show the thesis by the following calculation:

v1 ⊕ dv = ρ1[λx → t] ⊕ ρ1, dρ [λx dx → D⟦t⟧] =
(ρ1 ⊕ dρ)[λx → t] = ρ2[λx → t] = v2

We can also define 0 intensionally, as a metafunction on values and environments, and prove it correct. For closures, we differentiate the body and recurse on the environment. The definition extends from values to environments variable-wise, so we omit the standard formal definition.

Definition C.5.4 (Nil changes 0)
Nil changes on values are defined as follows:

0(ρ[λx → t]) = ρ, 0ρ [λx dx → D⟦t⟧]
0⟨a, b⟩ = ⟨0a, 0b⟩
0n = 0
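Rendered in the Haskell sketch of Appendix C.1 (ours, not the mechanization), Definition C.5.4 reads as follows, reusing derive from earlier and introducing change values dv as a data type:

type DEnv = [(Var, DValue)]            -- dρ
data DValue                            -- dv
  = DClosure Env DEnv Var Var DTerm    -- ρ, dρ [λx dx → dt]
  | DVPair DValue DValue
  | DVConst Const

nilValue :: Value -> DValue            -- 0_v
nilValue (VClosure rho x t) = DClosure rho (nilEnv rho) x (dVar x) (derive t)
nilValue (VPair a b)        = DVPair (nilValue a) (nilValue b)
nilValue (VConst (Nat _))   = DVConst (Nat 0)

nilEnv :: Env -> DEnv                  -- 0_ρ, variable-wise
nilEnv rho = [ (dVar x, nilValue v) | (x, v) <- rho ]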

Lemma C.5.5 (0 produces valid changes)
For all values ⊢ v : τ and indexes k, we have (k, v, 0v, v) ∈ RV⟦τ⟧.

Proof sketch. By induction on v. For closures, we must apply the fundamental property (Theorem C.5.2) to D⟦t⟧.


Because 0 transforms closure bodies, we cannot define it internally to the language. This problem can be avoided by defunctionalizing functions and function changes, as we do in Appendix D.

We conclude with the overall correctness theorem, analogous to Corollary 13.4.5.

Corollary C.5.6 (D⟦–⟧ is correct, corollary)
Take any term t that is well-typed (Γ ⊢ t : τ) and any suitable environments ρ1, dρ, ρ2, intensionally valid at any step count (∀k. (k, ρ1, dρ, ρ2) ∈ RG⟦Γ⟧). Assume t terminates in both the old environment ρ1 and the new environment ρ2, evaluating to output values v1 and v2 (ρ1 ⊢ t ⇓ v1 and ρ2 ⊢ t ⇓ v2). Then D⟦t⟧ evaluates in environment ρ1 and change environment dρ to a change value dv (ρ1, dρ ⊢Δ D⟦t⟧ ⇓ dv), and dv is a valid change from v1 to v2, so that v1 ⊕ dv = v2.

Proof. Follows immediately from Theorem C.5.2 and Theorem C.5.3.

C.6 Untyped step-indexed validity (λA, λΔA)

By removing mentions of types from step-indexed validity (intensional or extensional, though we show extensional definitions), we can adapt it to an untyped language. We can still distinguish between functions, numbers and pairs by matching on values themselves, instead of matching on types. Without types, typing contexts Γ now degenerate to lists of free variables of a term; we still use them to ensure that environments contain enough valid entries to evaluate a term. Validity applies to terminating executions, hence we need not consider executions producing dynamic type errors when proving the fundamental property.

We show the resulting definitions for extensional validity in Fig. C.6, but we can also define intensional validity and prove the fundamental lemma for it. As mentioned earlier, for λA we must turn P⟦–⟧ – and PΔ⟦–⟧ – into relations and update E-Prim accordingly.

The main difference in the proof is that this time the recursion used in the relations can only be proved to be well-founded because of the use of step-indexes; we omit details [Ahmed 2006].

Otherwise, the proof proceeds just as earlier in Appendix C.4. We prove that the relations are downward-closed, just like in Lemma C.4.1 (we omit the new statement), and we prove the new fundamental lemma by induction on the structure of terms (not of typing derivations).

Theorem C.6.1 (Fundamental property, correctness of D⟦–⟧)
If FV(t) ⊆ Γ, then we have that Γ ⊨ D⟦t⟧ ▷ t ↪ t.

Proof sketch. Similar to the proof of Theorem C.4.2, but by structural induction on terms and complete induction on step counts, not on typing derivations.

However, we can use the induction hypothesis in the same ways as in earlier proofs for typed languages: all uses of the induction hypothesis in the proof are on smaller terms, and some are also at smaller step counts.

C.7 General recursion in λA→ and λΔA→

We have sketched informally, in Sec. 15.1, how to support fixpoint combinators.

We have also extended our typed languages with a fixpoint combinator, and proved them correct formally (though not mechanically yet). In this section, we include the needed definitions to make our claim precise; they are mostly unsurprising, if long to state.

Since we are in a call-by-value setting, we only add recursive functions, not recursive values in general. To this end, we replace λ-abstraction λx → t with recursive function rec f x → t, which binds both f and x in t, and replaces f with the function itself upon evaluation.


RV = { (k, n1, dn, n2) | n1, dn, n2 ∈ N ∧ n1 + dn = n2 }
   ∪ { (k, vf1, dvf, vf2) |
         ∀(j, v1, dv, v2) ∈ RV. j < k ⟹
           (j, app vf1 v1, dapp dvf v1 dv, app vf2 v2) ∈ RC }
   ∪ { (k, ⟨va1, vb1⟩, ⟨dva, dvb⟩, ⟨va2, vb2⟩) |
         (k, va1, dva, va2) ∈ RV ∧ (k, vb1, dvb, vb2) ∈ RV }

RC = { (k, ρ1 ⊢ t1, ρ dρ ⊢Δ dt, ρ2 ⊢ t2) |
         ∀j v1 v2. (j < k ∧ ρ1 ⊢ t1 ⇓j v1 ∧ ρ2 ⊢ t2 ⇓ v2) ⟹
           ∃dv. ρ, dρ ⊢Δ dt ⇓ dv ∧ (k − j, v1, dv, v2) ∈ RV }

RG⟦ε⟧    = { (k) }
RG⟦Γ, x⟧ = { (k, (ρ1, x = v1), (dρ, dx = dv), (ρ2, x = v2)) |
               (k, ρ1, dρ, ρ2) ∈ RG⟦Γ⟧ ∧ (k, v1, dv, v2) ∈ RV }

Γ ⊨ dt ▷ t1 ↪ t2 = ∀(k, ρ1, dρ, ρ2) ∈ RG⟦Γ⟧.
                      (k, ρ1 ⊢ t1, ρ1 dρ ⊢Δ dt, ρ2 ⊢ t2) ∈ RC

Figure C.6: Defining extensional validity via untyped step-indexed logical relations and big-step semantics.

The associated small-step reduction rule would be (rec f x → t) v → t[x := v, f := rec f x → t]. As before, we formalize reduction using big-step semantics.

Typing rule T-Lam is replaced by a rule for recursive functions:

(T-Rec)  Γ, f : σ → τ, x : σ ⊢ t : τ  ⟹  Γ ⊢ rec f x → t : σ → τ

We replace closures with recursive closures in the definition of values:

w ::= rec f x → t | ...
v ::= ρ[rec f x → t] | ...

We also modify the semantics for abstraction and application. Rules E-Val and E-App are unchanged; it is sufficient to adapt the definitions of V⟦–⟧ – and app, so that evaluation of a function value vf has access to vf in the environment:

V⟦rec f x → t⟧ρ = ρ[rec f x → t]
app (vf@(ρ′[rec f x → t])) va = (ρ′, f = vf, x = va) ⊢ t

Like in Haskell, we write x@p in equations to bind an argument as metavariable x and match it against pattern p.
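In the Haskell sketch of Appendix C.1 (ours), the corresponding change replaces closures by recursive closures, whose application also binds the closure itself to f (we reuse Term from the earlier sketch for brevity, although its λ form would likewise be replaced by rec):

type EnvR = [(Var, ValueR)]
data ValueR = VRecClosure EnvR Var Var Term   -- ρ[rec f x → t]

-- appR vf va yields the environment-term pair (ρ′, f = vf, x = va) ⊢ t.
appR :: ValueR -> ValueR -> (EnvR, Term)
appR vf@(VRecClosure rho f x t) va = ((f, vf) : (x, va) : rho, t)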

Similarly, we modify the language of changes, the definition of differentiation, and the evaluation metafunctions VΔ⟦–⟧ – and dapp. Since the derivative of a recursive function f = rec f x → t can call the base function, we remember the original function body t in the derivative, together with its derivative D⟦t⟧. This should not be surprising: in Sec. 15.1, where recursive functions are defined using letrec, a recursive function f is in scope in the body of its derivative df. Here we use a different syntax, but still ensure that f is in scope in the body of derivative df. The definitions are otherwise unsurprising, if long:

dw ::= drec f df x dx → t dt | ...
dv ::= ρ, dρ [drec f df x dx → t dt] | ...

D⟦rec f x → t⟧ = drec f df x dx → t D⟦t⟧

VΔ⟦drec f df x dx → t dt⟧ ρ dρ = ρ, dρ [drec f df x dx → t dt]

dapp (dvf@(ρ′, dρ′ [drec f df x dx → t dt])) va dva =
  let vf = ρ′[rec f x → t]
  in (ρ′, f = vf, x = va), (dρ′, df = dvf, dx = dva) ⊢Δ dt

(T-DRec)  Γ, f : σ → τ, x : σ ⊢ t : τ    Γ, f : σ → τ, x : σ ⊢Δ dt : τ
          ⟹  Γ ⊢Δ drec f df x dx → t dt : σ → τ

We can adapt the proof of the fundamental property to the use of recursive functions.

Theorem C.7.1 (Fundamental property, correctness of D⟦–⟧)
For every well-typed term Γ ⊢ t : τ, we have that Γ ⊨ D⟦t⟧ ▷ t ↪ t : τ.

Proof sketch. Mostly as before, modulo one interesting difference: to prove the fundamental property for recursive functions at step count k, this time we must use the fundamental property inductively on the same term, but at step count j < k. This happens because, to evaluate dw = D⟦rec f x → t⟧, we evaluate D⟦t⟧ with the value dv for dw in the environment; to show this invocation is valid, we must show dw is itself a valid change. But the step-indexed definition of RV⟦σ → τ⟧ constrains the evaluation of the body only for all j < k.

C.8 Future work

We have shown that ⊕ and 0 agree with validity, which we consider a key requirement of a core ILC proof. However, change structures support further operators. We leave operator ⊖ for future work, though we are not aware of particular difficulties. However, ⊚ deserves special attention.

C.8.1 Change composition

We have looked into change composition, and it appears that composition of change expressions is not always valid, but we conjecture that composition of change values preserves validity. Showing that change composition is valid appears related to showing that Ahmed's logical equivalence is a transitive relation, which is a subtle issue: she only proves transitivity in a typed setting and with a stronger relation, and her proof does not carry over directly; indeed, there is no corresponding proof in the untyped setting of Acar, Ahmed and Blume [2008].

However, the failure of transitivity we have verified is not overly worrisome: the problem is that transitivity is too strong an expectation in general. Assume that Γ ⊨ de1 ▷ e1 ↪ e2 and Γ ⊨ de2 ▷ e2 ↪ e3, and try to show that Γ ⊨ de1 ⊚ de2 ▷ e1 ↪ e3: that is, very roughly and ignoring the environments, we can assume that e1 and e3 terminate, and we have to show that their results satisfy some properties. To use both our hypotheses, we need to know that e1, e2 and e3 all terminate, but we have no such guarantee for e2. Indeed, if e2 always diverges (because it is, say, the diverging term ω = (λx → x x) (λx → x x)), then de1 and de2 are vacuously valid. If e1 and e3 terminate, we can't expect de1 ⊚ de2 to be a change between them. To wit, take e1 = 0, e2 = ω, e3 = 10, and de1 = de2 = 0. We can verify that, for any Γ, we have Γ ⊨ de1 ▷ e1 ↪ e2 and Γ ⊨ de2 ▷ e2 ↪ e3, while Γ ⊨ de1 ⊚ de2 ▷ e1 ↪ e3 means the absurd Γ ⊨ 0 ⊚ 0 ▷ 0 ↪ 10.

A possible fix. Does transitivity hold if e2 terminates? If (k, e1, de1, e2) ∈ RC⟦τ⟧ and (k, e2, de2, e3) ∈ RC⟦τ⟧, we still cannot conclude anything. But, like in Ahmed [2006], if e2 and e3 are related at all step counts, that is, if (k, e1, de1, e2) ∈ RC⟦τ⟧ and (∀n. (n, e2, de2, e3) ∈ RC⟦τ⟧), and if additionally e2 terminates, we conjecture that Ahmed's proof goes through. We have, however, not yet examined all details.

C.9 Development history

The proof strategy used in this chapter comes from a collaboration between me and Yann Régis-Gianas, who came up with the general strategy and the first partial proofs for untyped λ-calculi. After we both struggled for a while to set up step-indexing correctly enough for a full proof, I first managed to give the definitions in this chapter and complete the proofs here described. Régis-Gianas then mechanized a variant of our proof for untyped λ-calculus in Coq [Giarrusso et al., Submitted], which appears here in Sec. 17.3. That proof takes a few different choices, and unfortunately, strictly speaking, neither proof subsumes the other. (1) We also give a non-step-indexed syntactic proof for simply-typed λ-calculus, together with proofs defining validity extensionally. (2) To support remembering intermediate results by conversion to cache-transfer-style (CTS), Régis-Gianas' proof uses a lambda-lifted A'NF syntax instead of plain ANF. (3) Régis-Gianas' formalization adds to change values a single token 0, which is a valid nil change for all valid values. Hence, if we know a change is nil, we can erase it. As a downside, evaluating df a da when df is 0 requires looking up f. In our presentation, instead, if f and g are different function values, they have different nil changes. Such nil changes carry information, so they can be evaluated directly, but they cannot be erased. Techniques in Appendix D enable erasing and reconstructing a nil change df for function value f, as long as the value of f is available.

C.10 Conclusion

In this chapter, we have shown how to construct novel models for ILC by using (step-indexed) logical relations, and have used this technique to deliver a new syntactic proof of correctness for ILC for simply-typed λ-calculus, and to deliver the first proof of correctness of ILC for untyped λ-calculus. Moreover, our proof appears rather close to existing logical-relations proofs; hence, we believe it should be possible to translate other results to ILC theorems.

By formally defining intensional validity for closures, we provide a solid foundation for the use of defunctionalized function changes (Appendix D).

This proof builds on Ahmed [2006]'s work on step-indexed logical relations, which enables handling powerful semantic features using rather elementary techniques. The only downside is that it can be tricky to set up the correct definitions, especially for a slightly non-standard semantics like ours. As an exercise, we have shown that our semantics is equivalent to more conventional presentations, down to the produced step counts.


Appendix D

Defunctionalizing function changes

In Chapter 12 and throughout most of Part II, we represent function changes as functions, which can only be applied. However, incremental programs often inspect changes to decide how to react to them most efficiently; also, inspecting function changes would help performance further. Representing function changes as closures, as we do in Chapter 17 and Appendix C, allows implementing some operations more efficiently, but is not fully satisfactory. In this chapter, we address these restrictions by defunctionalizing functions and function changes, so that we can inspect both at runtime without restrictions.

Once we defunctionalize function changes, we can detect at runtime whether a function change is nil. As we have mentioned in Sec. 16.3, nil function changes can typically be handled more efficiently. For instance, consider t = map f xs, a term that maps function f to each element of sequence xs. In general, D⟦t⟧ = dmap f df xs dxs must handle any change in dxs (which we assume to be small), but also apply function change df to each element of xs (which we assume to be big). However, if df is nil, we can skip this step, decreasing the time complexity of dmap f df xs dxs from O(|xs| + |dxs|) to O(|dxs|).
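The following sketch shows the shape of such a dmap (ours, with hypothetical types, not the library developed later): sequence changes are kept sparse, as (index, element change) pairs, and the function change carries a runtime nilness flag of the kind defunctionalization makes testable:

-- Hypothetical representations, for illustration only.
data FunChange a da b db = FunChange
  { isNilChange :: Bool              -- True iff the change leaves the function unchanged
  , applyChange :: a -> da -> db }   -- apply the change to a base input and input change

type SeqChange da = [(Int, da)]      -- sparse: only changed positions appear

dmapSketch :: (a -> da)              -- nil change of an element (assumed given)
           -> (a -> b) -> FunChange a da b db
           -> [a] -> SeqChange da -> SeqChange db
dmapSketch nilElem _f df xs dxs
  | isNilChange df =
      -- df is nil: only changed elements yield output changes, O(|dxs|) applications.
      [ (i, applyChange df (xs !! i) dx) | (i, dx) <- dxs ]
  | otherwise =
      -- df is not nil: every element's image may change, O(|xs| + |dxs|).
      [ (i, applyChange df x (maybe (nilElem x) id (lookup i dxs)))
      | (i, x) <- zip [0 ..] xs ]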

We will also present a change structure on defunctionalized function changes, and show that operations on defunctionalized function changes become cheaper.

D.1 Setup

We write incremental programs based on ILC by manually writing Haskell code, containing both manually-written plugin code and code that is transformed systematically, based on informal generalizations and variants of D⟦–⟧. Our main goal is to study variants of differentiation and of encodings in Haskell, while also studying language plugins for non-trivial primitives, such as different sorts of collections. We make liberal use of GHC extensions where needed.

Code in this chapter has been extracted and type-checked, though we elide a few details (mostly language extensions and imports from the standard library). Code in this appendix is otherwise self-contained. We have also tested a copy of this code more extensively.

As sketched before, we define change structures inside Haskell:

class ChangeStruct t where
  -- Next line declares ∆t as an injective type function
  type ∆t = r | r → t
  (⊕) :: t → ∆t → t
  oreplace :: t → ∆t

class ChangeStruct t ⇒ NilChangeStruct t where
  0 :: t → ∆t

class ChangeStruct a ⇒ CompChangeStruct a where
  -- Compose change dx1 with dx2, so that
  -- x ⊕ (dx1 ⊚ dx2) ≡ x ⊕ dx1 ⊕ dx2.
  (⊚) :: ∆a → ∆a → ∆a

With this code, we define type classes ChangeStruct, NilChangeStruct and CompChangeStruct. We explain each of the declared members in turn.

First, type ∆t represents the change type for type t. To improve Haskell type inference, we declare that ∆ is injective, so that ∆t1 = ∆t2 implies t1 = t2. This forbids some potentially useful change structures, but in exchange it makes type inference vastly more usable.

Next, we declare ⊕, 0 and ⊚ as available to object programs. Last, we introduce oreplace to construct replacement changes, characterized by the absorption law x ⊕ oreplace y = y for all x. Function oreplace encodes !t, that is, the bang operator. We use a different notation because ! is reserved for other purposes in Haskell.

These typeclasses omit a ⊖ operation intentionally: we do not require that change structures support a proper difference operation. Nevertheless, as discussed, b ⊖ a can be expressed through oreplace b.
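For instance (our example, in the chapter's pretty-printed style; ominus is not a class method of the thesis), a difference operation is definable from oreplace alone:

-- ominus b a is a change from a to b; correctness is immediate from
-- the absorption law x ⊕ oreplace y = y.
ominus :: ChangeStruct t ⇒ t → t → ∆t
ominus b _a = oreplace b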

We can then differentiate Haskell functions, even polymorphic ones. We show a few trivial examples, to highlight how derivatives are encoded in Haskell, especially polymorphic ones:

-- The standard id function...
id :: a → a
id x = x
-- ...and its derivative
did :: a → ∆a → ∆a
did x dx = dx

instance (NilChangeStruct a, ChangeStruct b) ⇒
    ChangeStruct (a → b) where
  type ∆(a → b) = a → ∆a → ∆b
  f ⊕ df = λx → f x ⊕ df x 0x
  oreplace f = λx dx → oreplace (f (x ⊕ dx))

instance (NilChangeStruct a, ChangeStruct b) ⇒
    NilChangeStruct (a → b) where
  0f = oreplace f

-- Same for apply:
apply :: (a → b) → a → b
apply f x = f x
dapply :: (a → b) → ∆(a → b) → a → ∆a → ∆b
dapply f df x dx = df x dx

Which polymorphism? As visible here, polymorphism does not cause particular problems. However, we only support ML (or prenex) polymorphism, not first-class polymorphism, for two reasons.


First, with first-class polymorphism, we can encode existential types ∃X. T, and two values v1, v2 of the same existential type ∃X. T can hide different types T1, T2. Hence, a change between v1 and v2 requires handling changes between types. While we discuss such topics in Chapter 18, we avoid them here.

Second, prenex polymorphism is a small extension of simply-typed lambda calculus metatheoretically. We can treat prenex-polymorphic definitions as families of monomorphic (hence simply-typed) definitions; to each definition we can apply all the ILC theory we developed to show differentiation is correct.

D.2 Defunctionalization

Defunctionalization is a whole-program transformation that turns a program relying on first-class functions into a first-order program. The resulting program is expressed in a first-order language (often a subset of the original language): closures are encoded by data values, which embed both the closure environment and a tag to distinguish different functions. Defunctionalization also generates a function that interprets encoded closures, which we call applyFun.

In a typed language, defunctionalization can be done using generalized algebraic datatypes (GADTs) [Pottier and Gauthier, 2004]. Each first-class function of type σ → τ is replaced by a value of a new GADT Fun σ τ, that represents defunctionalized function values and has a constructor for each different function. If a first-class function t1 closes over x : τ1, the corresponding constructor C1 will take x : τ1 as an argument. The interpreter for defunctionalized function values has type signature applyFun :: Fun σ τ → σ → τ. The resulting programs are expressed in a first-order subset of the original programming language. In defunctionalized programs, all remaining functions are first-order, top-level functions.

For instance, consider the program on sequences in Fig. D.1.

successors :: [Z] → [Z]
successors xs = map (λx → x + 1) xs

nestedLoop :: [σ] → [τ] → [(σ, τ)]
nestedLoop xs ys = concatMap (λx → map (λy → (x, y)) ys) xs

map :: (σ → τ) → [σ] → [τ]
map f [] = []
map f (x : xs) = f x : map f xs

concatMap :: (σ → [τ]) → [σ] → [τ]
concatMap f xs = concat (map f xs)

Figure D.1: A small example program for defunctionalization.

In this program, the first-class function values arise from evaluating the three terms λy → (x, y), that we call pair, λx → map (λy → (x, y)) ys, that we call mapPair, and λx → x + 1, that we call addOne. Defunctionalization creates a type Fun σ τ with a constructor for each of the three terms, respectively Pair, MapPair and AddOne. Both pair and mapPair close over some free variables, so their corresponding constructors will take an argument for each free variable: for pair we have

x : σ ⊢ λy → (x, y) : τ → (σ, τ)

while for mapPair we have

ys : [τ] ⊢ λx → map (λy → (x, y)) ys : σ → [(σ, τ)]


Hence, the type of defunctionalized functions Fun σ τ and its interpreter applyFun become:

data Fun σ τ where
  AddOne :: Fun Z Z
  Pair :: σ → Fun τ (σ, τ)
  MapPair :: [τ] → Fun σ [(σ, τ)]

applyFun :: Fun σ τ → σ → τ
applyFun AddOne x = x + 1
applyFun (Pair x) y = (x, y)
applyFun (MapPair ys) x = mapDF (Pair x) ys

We need to also transform the rest of the program accordingly.

successorsDF :: [Z] → [Z]
successorsDF xs = mapDF AddOne xs

nestedLoopDF :: [σ] → [τ] → [(σ, τ)]
nestedLoopDF xs ys = concatMapDF (MapPair ys) xs

mapDF :: Fun σ τ → [σ] → [τ]
mapDF f [] = []
mapDF f (x : xs) = applyFun f x : mapDF f xs

concatMapDF :: Fun σ [τ] → [σ] → [τ]
concatMapDF f xs = concat (mapDF f xs)

Figure D.2: Defunctionalized program.
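As a quick usage sketch (a GHCi session, with our own example values):

  >>> nestedLoopDF [1, 2] [10, 20]
  [(1,10),(1,20),(2,10),(2,20)]
  >>> applyFun AddOne 5
  6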

Some readers might notice this program still uses first-class functions, because it encodes multi-argument functions through currying. To get a fully first-order program, we encode multi-argument functions using tuples instead of currying.¹ Using tuples, our example becomes:

applyFun :: (Fun σ τ, σ) → τ
applyFun (AddOne, x) = x + 1
applyFun (Pair x, y) = (x, y)
applyFun (MapPair ys, x) = mapDF (Pair x, ys)

mapDF :: (Fun σ τ, [σ]) → [τ]
mapDF (f, []) = []
mapDF (f, x : xs) = applyFun (f, x) : mapDF (f, xs)

concatMapDF :: (Fun σ [τ], [σ]) → [τ]
concatMapDF (f, xs) = concat (mapDF (f, xs))

nestedLoopDF :: ([σ], [τ]) → [(σ, τ)]
nestedLoopDF (xs, ys) = concatMapDF (MapPair ys, xs)

However, we'll often write such defunctionalized programs using Haskell's typical curried syntax, as in Fig. D.2. Such programs must not contain partial applications.

¹Strictly speaking, the resulting program is still not first-order, because in Haskell multi-argument data constructors, such as the pair constructor (,) that we use, are still first-class curried functions, unlike for instance in OCaml. To make this program truly first-order, we must formalize tuple constructors as a term constructor, or formalize these function definitions as multi-argument functions. At this point, this discussion is merely a technicality that does not affect our goals, but it becomes relevant if we formalize the resulting first-order language as in Sec. 17.3.


In general, defunctionalization creates a constructor C of type Fun σ τ for each first-class function expression Γ ⊢ t : σ → τ in the source program.² While lambda expression l closes implicitly over environment ρ : ⟦Γ⟧, C closes over it explicitly: the values bound to free variables in environment ρ are passed as arguments to constructor C. As a standard optimization, we only include variables that actually occur free in l, not all those that are bound in the context where l occurs.

²We only need codes for functions that are used as first-class arguments, not for other functions, though codes for the latter can be added as well.

D.2.1 Defunctionalization with separate function codes

Next, we show a slight variant of defunctionalization, that we use to achieve our goals with less code duplication, even at the expense of some efficiency; we call this variant defunctionalization with separate function codes.

We first encode contexts as types and environments as values. Empty environments are encoded as empty tuples. Environments for a context such as x : τ1, y : τ2, ... are encoded as values of type τ1 × τ2 × ....
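For instance, under one possible nested encoding (the names below are our own), the context x : Int, y : Bool becomes a tuple type, and an environment for it is a value of that type:

  -- Sketch: the context x : Int, y : Bool as a nested tuple type.
  type EnvXY = (Int, (Bool, ()))

  envXY :: EnvXY
  envXY = (42, (True, ()))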

In this defunctionalization variant, instead of defining directly a GADT of defunctionalized functions Fun σ τ, we define a GADT of function codes, Code env σ τ, whose values contain no environment. Type Code is indexed not just by σ and τ but also by the type of environments, and has a constructor for each first-class function expression in the source program, like Fun σ τ does in conventional defunctionalization. We then define Fun σ τ as a pair of a function code of type Code env σ τ and an environment of type env.

As a downside, separating function codes adds a few indirections to the memory representation of closures: for instance, we use (AddOne, ()) instead of AddOne, and (Pair, 1) instead of Pair 1.

As an upside, with separate function codes we can define many operations generically across all function codes (see Appendix D.2.2), instead of generating definitions matching on each function. What's more, we later define operations that use raw function codes and need no environment; we could alternatively define function codes without using them in the representation of function values, at the expense of even more code duplication. Code duplication is especially relevant because we currently perform defunctionalization by hand, though we are confident it would be conceptually straightforward to automate the process.

Defunctionalizing the program with separate function codes produces the following GADT of function codes:

type Env env = (CompChangeStruct env, NilTestable env)

data Code env σ τ where
  AddOne :: Code () Z Z
  Pair :: Code σ τ (σ, τ)
  MapPair :: Env σ ⇒ Code [τ] σ [(σ, τ)]

In this definition, type Env names a set of typeclass constraints on the type of the environment, using the ConstraintKinds GHC language extension. Satisfying these constraints is necessary to implement a few operations on functions. We also require an interpretation function applyCode for function codes. If c is the code for a function f = λx → t, calling applyCode c computes f's output from an environment env for f and an argument for x.

applyCode :: Code env σ τ → env → σ → τ
applyCode AddOne () x = x + 1
applyCode Pair x y = (x, y)
applyCode MapPair ys x = mapDF (F (Pair, x)) ys

The implementation of applyCode MapPair only works because of the Env σ constraint for constructor MapPair: this constraint is required when constructing the defunctionalized function value that we pass as argument to mapDF.

We represent defunctionalized function values through type RawFun env σ τ, a type synonym of the product of Code env σ τ and env. We encode type σ → τ through type Fun σ τ, defined as RawFun env σ τ where env is existentially bound. We add constraint Env env to the definition of Fun σ τ, because implementing ⊕ on function changes will require using ⊕ on environments.

type RawFun env σ τ = (Code env σ τ, env)

data Fun σ τ where
  F :: Env env ⇒ RawFun env σ τ → Fun σ τ

To interpret defunctionalized function values, we wrap applyCode in a new version of applyFun, having the same interface as the earlier applyFun:

applyFun :: Fun σ τ → σ → τ
applyFun (F (code, env)) arg = applyCode code env arg

The rest of the source program is defunctionalized like before, using the new definition of Fun σ τ and of applyFun.

D.2.2 Defunctionalizing function changes

Defunctionalization encodes function values as pairs of function codes and environments. In ILC, a function value can change because the environment changes or because the whole closure is replaced by a different one, with a different function code and different environment. For now, we focus on environment changes for simplicity. To allow inspecting function changes, we defunctionalize them as well, but treat them specially.

Assume we want to defunctionalize a function change df with type ∆(σ → τ) = σ → ∆σ → ∆τ, valid for function f :: σ → τ. Instead of transforming type ∆(σ → τ) into Fun σ (Fun ∆σ ∆τ), we transform ∆(σ → τ) into a new type DFun σ τ, the change type of Fun σ τ (∆(Fun σ τ) = DFun σ τ). To apply DFun σ τ, we introduce an interpreter dapplyFun :: Fun σ τ → ∆(Fun σ τ) → ∆(σ → τ), or equivalently dapplyFun :: Fun σ τ → DFun σ τ → σ → ∆σ → ∆τ, which also serves as derivative of applyFun.

Like we did for Fun σ τ, we define DFun σ τ using function codes. That is, DFun σ τ pairs a function code Code env σ τ together with an environment change and a change structure for the environment type:

data DFun σ τ = ∀env. ChangeStruct env ⇒ DF (Code env σ τ, ∆env)

Without separate function codes, the definition of DFun would have to include one case for each first-class function.
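To illustrate what that alternative would look like (our own sketch, monomorphized to Int to avoid type families; DInt and DList stand in for the ∆ types), the change type would mirror Fun's structure, one constructor per source lambda, each carrying its environment change:

  {-# LANGUAGE GADTs #-}

  type DInt = Int      -- stand-in for ∆Int
  type DList = [Int]   -- stand-in for ∆[Int]

  data DFunAlt a b where
    DAddOne  :: DFunAlt Int Int
    DPair    :: DInt -> DFunAlt Int (Int, Int)     -- environment change for pair
    DMapPair :: DList -> DFunAlt Int [(Int, Int)]  -- environment change for mapPair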

Environment changes

Instead of defining change structures for environments, we encode environments using tuples, and define change structures for tuples.

We define first change structures for empty tuples and pairs (Sec. 12.3.3):


instance ChangeStruct () where
  type ∆() = ()
  _ ⊕ _ = ()
  oreplace _ = ()

instance (ChangeStruct a, ChangeStruct b) ⇒ ChangeStruct (a, b) where
  type ∆(a, b) = (∆a, ∆b)
  (a, b) ⊕ (da, db) = (a ⊕ da, b ⊕ db)
  oreplace (a2, b2) = (oreplace a2, oreplace b2)

To define change structures for n-uples of other arities, we have two choices, which we show on triples (a, b, c) and can be easily generalized.

We can encode triples as nested pairs (a, (b, (c, ()))). Or we can define change structures for triples directly:

instance (ChangeStruct a, ChangeStruct b,
    ChangeStruct c) ⇒ ChangeStruct (a, b, c) where
  type ∆(a, b, c) = (∆a, ∆b, ∆c)
  (a, b, c) ⊕ (da, db, dc) = (a ⊕ da, b ⊕ db, c ⊕ dc)
  oreplace (a2, b2, c2) = (oreplace a2, oreplace b2, oreplace c2)

Generalizing from pairs and triples, one can define similar instances for n-uples in general (say, for values of n up to some high threshold).
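A sketch of the first option (the helper names are our own): encode a triple as nested pairs and inherit the change structure from the unit and pair instances above.

  type Nested3 a b c = (a, (b, (c, ())))

  toNested :: (a, b, c) -> Nested3 a b c
  toNested (a, b, c) = (a, (b, (c, ())))

  fromNested :: Nested3 a b c -> (a, b, c)
  fromNested (a, (b, (c, ()))) = (a, b, c)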

Validity and ⊕ on defunctionalized function changes

A function change df is valid for f if df has the same function code as f, and if df's environment change is valid for f's environment:

DF (c, dρ) ▷ F (c, ρ1) ↪ F (c, ρ2) : Fun σ τ  =  dρ ▷ ρ1 ↪ ρ2 : env

where c is a function code of type Code env σ τ, and where c's type binds the type variable env we use on the right-hand side.

Next, we implement ⊕ on function changes to match our definition of validity, as required. We only need f ⊕ df to give a result if df is a valid change for f. Hence, if the function code embedded in df does not match the one in f, we give an error.³ However, our first attempt does not typecheck, since the typechecker does not know whether the environment and the environment change have compatible types:

instance ChangeStruct (Fun σ τ) where
  type ∆(Fun σ τ) = DFun σ τ
  F (c1, env) ⊕ DF (c2, denv) =
    if c1 ≡ c2 then
      F (c1, env ⊕ denv)
    else
      error "Invalid function change in ⊕"

In particular, env ⊕ denv is reported as ill-typed, because we don't know that env and denv have compatible types. Take c1 = Pair, c2 = MapPair, f = F (Pair, env) :: σ → τ and df = DF (MapPair, denv) :: ∆(σ → τ). Assume we evaluate f ⊕ df = F (Pair, env) ⊕ DF (MapPair, denv): there, indeed, env :: σ and denv :: ∆[τ], so env ⊕ denv is not type-correct. Yet, evaluating f ⊕ df would not fail, because MapPair and Pair are different: c1 ≡ c2 will return false and env ⊕ denv won't be evaluated. But the typechecker does not know that.

³We originally specified ⊕ as a total function to avoid formalizing partial functions, but as mentioned in Sec. 10.3, we do not rely on the behavior of ⊕ on invalid changes.

Hence, we need an equality operation that produces a witness of type equality. We define the needed infrastructure with a few lines of code. First, we need a GADT of witnesses of type equality; we can borrow from GHC's standard library its definition, which is just:

-- From Data.Type.Equality
data τ1 ∼ τ2 where
  Refl :: τ ∼ τ

If x has type τ1 ∼ τ2 and matches pattern Refl, then by standard GADT typing rules τ1 and τ2 are equal. Even if τ1 ∼ τ2 has only constructor Refl, a match is necessary, since x might be bottom. Readers familiar with type theory, Agda or Coq will recognize that ∼ resembles Agda's propositional equality or Martin-Löf's identity types, even though it can only represent equality between types and not between values.
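A small usage sketch of how matching on Refl refines types (our own example, using GHC's Data.Type.Equality; it mirrors the library's castWith):

  import Data.Type.Equality ((:~:) (Refl))

  -- Until we match on Refl, x cannot be used at type b.
  cast :: a :~: b -> a -> b
  cast Refl x = x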

Next, we implement function codeMatch to compare codes. For equal codes, this operation produces a witness that their environment types match.⁴ Using this operation, we can complete the above instance of ChangeStruct (Fun σ τ):

codeMatch :: Code env1 σ τ → Code env2 σ τ → Maybe (env1 ∼ env2)
codeMatch AddOne AddOne = Just Refl
codeMatch Pair Pair = Just Refl
codeMatch MapPair MapPair = Just Refl
codeMatch _ _ = Nothing

instance ChangeStruct (Fun σ τ) where
  type ∆(Fun σ τ) = DFun σ τ
  F (c1, env) ⊕ DF (c2, denv) =
    case codeMatch c1 c2 of
      Just Refl → F (c1, env ⊕ denv)
      Nothing → error "Invalid function change in ⊕"

Applying function changes

After defining environment changes, we define an incremental interpretation function dapplyCode. If c is the code for a function f = λx → t, as discussed, calling applyCode c computes the output of f from an environment env for f and an argument for x. Similarly, calling dapplyCode c computes the output of D⟦f⟧ from an environment env for f, an environment change denv valid for env, an argument for x, and an argument change dx.

In our example, we have:

dapplyCode :: Code env σ τ → env → ∆env → σ → ∆σ → ∆τ
dapplyCode AddOne () () x dx = dx
dapplyCode Pair x dx y dy = (dx, dy)
dapplyCode MapPair ys dys x dx =
  dmapDF (F (Pair, x)) (DF (Pair, dx)) ys dys

⁴If a code is polymorphic in the environment type, it must take as argument a representation of its type argument, to be used to implement codeMatch. We represent type arguments at runtime via instances of Typeable, and omit standard details here.


On top of dapplyCode, we can define dapplyFun, which functions as a derivative for applyFun and allows applying function changes:

dapplyFun :: Fun σ τ → DFun σ τ → σ → ∆σ → ∆τ
dapplyFun (F (c1, env)) (DF (c2, denv)) x dx =
  case codeMatch c1 c2 of
    Just Refl → dapplyCode c1 env denv x dx
    Nothing → error "Invalid function change in dapplyFun"

However, we can also implement further accessors that inspect function changes. We can now finally detect if a change is nil. To this end, we first define a typeclass that allows testing changes to determine if they're nil:

class NilChangeStruct t ⇒ NilTestable t where
  isNil :: ∆t → Bool

Now a function change is nil only if the contained environment change is nil:

instance NilTestable (Fun σ τ) where
  isNil :: DFun σ τ → Bool
  isNil (DF (code, denv)) = isNil denv

However, this definition of isNil only works given a typeclass instance for NilTestable env; we need to add this requirement as a constraint, but we cannot add it to isNil's type signature, since type variable env is not in scope there. Instead, we must add the constraint where we introduce it, by existential quantification, just like the constraint ChangeStruct env. In particular, we can reuse the constraint Env env:

data DFun σ τ =
  ∀env. Env env ⇒
    DF (Code env σ τ, ∆env)

instance NilChangeStruct (Fun σ τ) where
  0 (F (code, env)) = DF (code, 0 env)

instance NilTestable (Fun σ τ) where
  isNil :: DFun σ τ → Bool
  isNil (DF (code, denv)) = isNil denv

We can then wrap a derivative via function wrapDF, to return a nil change immediately if at runtime all input changes turn out to be nil. This was not possible in the setting described by Cai et al. [2014], because nil function changes could not be detected at runtime, only at compile time. To do so, we must produce directly a nil change for v = applyFun f x. To avoid computing v, we assume we can compute a nil change for v without access to v, via operation onil and typeclass OnilChangeStruct (omitted here); argument Proxy is a constant, required for purely technical reasons.

wrapDF :: OnilChangeStruct τ ⇒ Fun σ τ → DFun σ τ → σ → ∆σ → ∆τ
wrapDF f df x dx =
  if isNil df then
    onil Proxy -- Change-equivalent to 0 (applyFun f x)
  else
    dapplyFun f df x dx


D.3 Defunctionalization and cache-transfer-style

We can combine the above ideas with cache-transfer-style (Chapter 17). We show the details quickly.

Combining the above with caching, we can use defunctionalization as described to implement the following interface for functions in caches. For extra generality, we use extension ConstraintKinds to allow instances to define the required typeclass constraints.

class FunOps k where
  type Dk k = (dk :: ∗ → ∗ → ∗ → ∗ → ∗) | dk → k

  type ApplyCtx k i o :: Constraint
  apply :: ApplyCtx k i o ⇒ k i o cache → i → (o, cache)

  type DApplyCtx k i o :: Constraint
  dApply :: DApplyCtx k i o ⇒
    Dk k i o cache1 cache2 → ∆i → cache1 → (∆o, cache2)

  type DerivCtx k i o :: Constraint
  deriv :: DerivCtx k i o ⇒
    k i o cache → Dk k i o cache cache

  type FunUpdCtx k i o :: Constraint
  funUpdate :: FunUpdCtx k i o ⇒
    k i o cache1 → Dk k i o cache1 cache2 → k i o cache2

  isNilFun :: Dk k i o cache1 cache2 → Maybe (cache1 ∼ cache2)

updatedDeriv :: (FunOps k, FunUpdCtx k i o, DerivCtx k i o) ⇒
  k i o cache1 → Dk k i o cache1 cache2 → Dk k i o cache2 cache2
updatedDeriv f df = deriv (f `funUpdate` df)

Type constructor k defines the specific constructor for the function type. In this interface, the type Dk k i o cache1 cache2 of function changes represents changes to functions with inputs of type i, outputs of type o, input cache type cache1 and output cache type cache2. Types cache1 and cache2 coincide for typical function changes, but can be different for replacement function changes, or more generally, for function changes across functions with different implementations and cache types. Correspondingly, dApply supports applying such changes across closures with different implementations; unfortunately, unless the two implementations are similar, the cache content cannot be reused.

To implement this interface, it is sufficient to define a type of codes that admits an instance of typeclass Codelike, which collects the earlier definitions of codeMatch, applyCode and dapplyCode, adapted to cache-transfer style:

class Codelike code where
  codeMatch :: code env1 a1 r1 cache1 → code env2 a2 r2 cache2 →
    Maybe ((env1, cache1) ∼ (env2, cache2))
  applyCode :: code env a b cache → env → a → (b, cache)
  dapplyCode :: code env a b cache → ∆env → ∆a → cache → (∆b, cache)

Typically, a defunctionalized program uses no first-class functions and has a single type of functions; having a type class of function codes weakens that property. We can still use a single type of codes throughout our program; we can also use different types of codes for different parts of a program, without allowing for communication between those parts.

On top of the Codelike interface, we can define instances of interface FunOps and ChangeStruct. To demonstrate this, we show a complete implementation in Figs. D.3 and D.4. Similarly to ⊕, we can implement ⊚ by comparing codes contained in function changes; this is not straightforward when using closure conversion as in Sec. 17.4.2, unless we store even more type representations.

We can detect nil changes at runtime even in cache-passing style. We can for instance define function wrapDer1, which does something trickier than wrapDF: here we assume that dg is a derivative taking function change df as argument. Then, if df is nil, dg df must also be nil, so we can return a nil change directly, together with the input cache. The input cache has the required type, because in this case the type cache2 of the desired output cache matches type cache1 of the input cache (because we have a nil change df across them); the return value of isNilFun witnesses this type equality.

wrapDer1 :: (FunOps k, OnilChangeStruct r′) ⇒
  (Dk k i o cache1 cache2 → f cache1 → (∆r′, f cache2)) →
  (Dk k i o cache1 cache2 → f cache1 → (∆r′, f cache2))
wrapDer1 dg df c =
  case isNilFun df of
    Just Refl → (onil Proxy, c)
    Nothing → dg df c

We can also hide the difference between different cache types by defining a uniform type of caches, Cache code. To hide caches, we can use a pair of a cache (of type cache) and a code for that cache type. When applying a function (change) to a cache, or when composing function changes, we can compare the function code with the cache code.
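A sketch of that comparison (matchCache is our own helper, not in the thesis): given the Codelike class above and the Cache type of Fig. D.3, we can recover a typed cache from the uniform one by comparing codes.

  matchCache :: Codelike code
             => code env a b cache -> Cache code -> Maybe cache
  matchCache c1 (C c2 cache) =
    case codeMatch c1 c2 of
      Just Refl -> Just cache    -- codes match, so the cache types match
      Nothing   -> Nothing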

In this code, we have not shown support for replacement values for functions; we leave details to our implementation.


type RawFun a b code env cache = (code env a b cache, env)
type RawDFun a b code env cache = (code env a b cache, ∆env)

data Fun a b code = ∀env cache. Env env ⇒
  F (RawFun a b code env cache)

data DFun a b code = ∀env cache. Env env ⇒
  DF (RawDFun a b code env cache)

-- This cache is not indexed by a and b.
data Cache code = ∀a b env cache.
  C (code env a b cache) cache

-- Wrapper
data FunW code a b cache where
  W :: Fun a b code → FunW code a b (Cache code)

data DFunW code a b cache1 cache2 where
  DW :: DFun a b code → DFunW code a b (Cache code) (Cache code)

derivFun :: Fun a b code → DFun a b code
derivFun (F (code, env)) = DF (code, 0 env)

oplusBase :: Codelike code ⇒ Fun a b code → DFun a b code → Fun a b code
oplusBase (F (c1, env)) (DF (c2, denv)) =
  case codeMatch c1 c2 of
    Just Refl →
      F (c1, env ⊕ denv)
    _ → error "INVALID call to oplusBase"

ocomposeBase :: Codelike code ⇒ DFun a b code → DFun a b code → DFun a b code
ocomposeBase (DF (c1, denv1)) (DF (c2, denv2)) =
  case codeMatch c1 c2 of
    Just Refl →
      DF (c1, denv1 ⊚ denv2)
    _ → error "INVALID call to ocomposeBase"

instance Codelike code ⇒ ChangeStruct (Fun a b code) where
  type ∆(Fun a b code) = DFun a b code
  (⊕) = oplusBase

instance Codelike code ⇒ NilChangeStruct (Fun a b code) where
  0 (F (c, env)) = DF (c, 0 env)

instance Codelike code ⇒ CompChangeStruct (Fun a b code) where
  df1 ⊚ df2 = ocomposeBase df1 df2

instance Codelike code ⇒ NilTestable (Fun a b code) where
  isNil (DF (c, denv)) = isNil denv

Figure D.3: Implementing change structures using Codelike instances.


applyRaw :: Codelike code ⇒ RawFun a b code env cache → a → (b, cache)
applyRaw (code, env) = applyCode code env

dapplyRaw :: Codelike code ⇒ RawDFun a b code env cache → ∆a → cache → (∆b, cache)
dapplyRaw (code, denv) = dapplyCode code denv

applyFun :: Codelike code ⇒ Fun a b code → a → (b, Cache code)
applyFun (F f@(code, env)) arg =
  (id ∗∗∗ C code) $ applyRaw f arg

dapplyFun :: Codelike code ⇒ DFun a b code → ∆a → Cache code → (∆b, Cache code)
dapplyFun (DF (code1, denv)) darg (C code2 cache1) =
  case codeMatch code1 code2 of
    Just Refl →
      (id ∗∗∗ C code1) $ dapplyCode code1 denv darg cache1
    _ → error "INVALID call to dapplyFun"

instance Codelike code ⇒ FunOps (FunW code) where
  type Dk (FunW code) = DFunW code
  apply (W f) = applyFun f
  dApply (DW df) = dapplyFun df
  deriv (W f) = DW (derivFun f)
  funUpdate (W f) (DW df) = W (f ⊕ df)
  isNilFun (DW df) =
    if isNil df then
      Just Refl
    else
      Nothing

Figure D.4: Implementing FunOps using Codelike instances.


Acknowledgments

I thank my supervisor Klaus Ostermann for encouraging me to do a PhD and for his years of supervision. He has always been patient and kind.

Thanks also to Fritz Henglein for agreeing to review my thesis.

I have many people to thank and too little time to do it properly. I'll finish these acknowledgments in the final version.

The research presented here has benefited from the help of coauthors, reviewers and people who offered helpful comments, including comments from the POPL'18 reviewers. Moreover, this PhD thesis wouldn't be possible without all the people who have supported me, both during all the years of my PhD but also in the years before.



Bibliography

[Citing pages are listed after each reference.]

Umut A. Acar. Self-Adjusting Computation. PhD thesis, Princeton University, 2005. [Pages 73, 135, 161 and 174]

Umut A. Acar. Self-adjusting computation (an overview). In PEPM, pages 1–6. ACM, 2009. [Pages 2, 128, 161 and 173]

Umut A. Acar, Guy E. Blelloch, and Robert Harper. Adaptive functional programming. TOPLAS, 28(6):990–1034, November 2006. [Page 173]

Umut A. Acar, Amal Ahmed, and Matthias Blume. Imperative self-adjusting computation. In Proceedings of the 35th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages, POPL '08, pages 309–322, New York, NY, USA, 2008. ACM. ISBN 978-1-59593-689-9. doi: 10.1145/1328438.1328476. URL http://doi.acm.org/10.1145/1328438.1328476. [Pages 143, 146, 161, 194, 200, 201, 202, 204, 205 and 212]

Umut A. Acar, Guy E. Blelloch, Matthias Blume, Robert Harper, and Kanat Tangwongsan. An experimental analysis of self-adjusting computation. TOPLAS, 32(1):3:1–3:53, November 2009. [Page 173]

Umut A. Acar, Guy Blelloch, Ruy Ley-Wild, Kanat Tangwongsan, and Duru Turkoglu. Traceable data types for self-adjusting computation. In PLDI, pages 483–496. ACM, 2010. [Pages 162 and 173]

Stefan Ackermann, Vojin Jovanovic, Tiark Rompf, and Martin Odersky. Jet: An embedded DSL for high performance big data processing. In Int'l Workshop on End-to-end Management of Big Data (BigData), 2012. [Page 49]

Agda Development Team. The Agda Wiki. http://wiki.portal.chalmers.se/agda, 2013. Accessed on 2017-06-29. [Page 181]

Amal Ahmed. Step-Indexed Syntactic Logical Relations for Recursive and Quantified Types. In Programming Languages and Systems, pages 69–83. Springer Berlin Heidelberg, March 2006. doi: 10.1007/11693024_6. [Pages 143, 146, 161, 163, 193, 194, 201, 202, 204, 205, 210, 212 and 213]

Guillaume Allais, James Chapman, Conor McBride, and James McKinna. Type-and-scope safe programs and their proofs. In Yves Bertot and Viktor Vafeiadis, editors, CPP 2017. ACM, New York, NY, January 2017. ISBN 978-1-4503-4705-1. [Page 186]

Thorsten Altenkirch and Bernhard Reus. Monadic Presentations of Lambda Terms Using Generalized Inductive Types. In Jörg Flum and Mario Rodriguez-Artalejo, editors, Computer Science Logic, Lecture Notes in Computer Science, pages 453–468. Springer Berlin Heidelberg, September 1999. ISBN 978-3-540-66536-6, 978-3-540-48168-3. doi: 10.1007/3-540-48168-0_32. [Page 186]



Nada Amin and Tiark Rompf. Type soundness proofs with definitional interpreters. In Proceedings of the 44th ACM SIGPLAN Symposium on Principles of Programming Languages, POPL 2017, pages 666–679, New York, NY, USA, 2017. ACM. ISBN 978-1-4503-4660-3. doi: 10.1145/3009837.3009866. URL http://doi.acm.org/10.1145/3009837.3009866. [Pages 186 and 201]

Andrew W. Appel and Trevor Jim. Shrinking lambda expressions in linear time. JFP, 7:515–540, 1997. [Page 129]

Andrew W. Appel and David McAllester. An indexed model of recursive types for foundational proof-carrying code. ACM Trans. Program. Lang. Syst., 23(5):657–683, September 2001. ISSN 0164-0925. doi: 10.1145/504709.504712. URL http://doi.acm.org/10.1145/504709.504712. [Page 161]

Robert Atkey. The incremental λ-calculus and relational parametricity, 2015. URL http://bentnib.org/posts/2015-04-23-incremental-lambda-calculus-and-parametricity.html. Last accessed on 29 June 2017. [Pages 81, 126, 163 and 193]

Lennart Augustsson and Magnus Carlsson. An exercise in dependent types: A well-typed interpreter. In Workshop on Dependent Types in Programming, Gothenburg, 1999. [Page 186]

H. P. Barendregt. The Lambda Calculus: Its Syntax and Semantics. Elsevier, 1984. [Page 63]

Henk P. Barendregt. Lambda Calculi with Types, pages 117–309. Oxford University Press, New York, 1992. [Pages 164, 166 and 171]

Nick Benton, Chung-Kil Hur, Andrew J. Kennedy, and Conor McBride. Strongly Typed Term Representations in Coq. Journal of Automated Reasoning, 49(2):141–159, August 2012. ISSN 0168-7433, 1573-0670. doi: 10.1007/s10817-011-9219-0. [Page 186]

Jean-Philippe Bernardy and Marc Lasson. Realizability and parametricity in pure type systems. In Proceedings of the 14th International Conference on Foundations of Software Science and Computational Structures: Part of the Joint European Conferences on Theory and Practice of Software, FOSSACS'11/ETAPS'11, pages 108–122, Berlin, Heidelberg, 2011. Springer-Verlag. ISBN 978-3-642-19804-5. [Pages 89, 163, 164, 165, 166, 167, 172 and 177]

Jean-Philippe Bernardy, Patrik Jansson, and Ross Paterson. Parametricity and dependent types. In Proceedings of the 15th ACM SIGPLAN International Conference on Functional Programming, ICFP '10, pages 345–356, New York, NY, USA, 2010. ACM. ISBN 978-1-60558-794-3. doi: 10.1145/1863543.1863592. URL http://doi.acm.org/10.1145/1863543.1863592. [Pages 166, 167 and 172]

Gavin M. Bierman, Erik Meijer, and Mads Torgersen. Lost in translation: formalizing proposed extensions to C♯. In OOPSLA, pages 479–498. ACM, 2007. [Page 47]

Jose A. Blakeley, Per-Ake Larson, and Frank Wm. Tompa. Efficiently updating materialized views. In SIGMOD, pages 61–71. ACM, 1986. [Pages v, vii, 2, 161 and 174]

Maximilian C. Bolingbroke and Simon L. Peyton Jones. Types are calling conventions. In Proceedings of the 2nd ACM SIGPLAN Symposium on Haskell, Haskell '09, pages 1–12, New York, NY, USA, 2009. ACM. ISBN 978-1-60558-508-6. doi: 10.1145/1596638.1596640. [Page 160]

K. J. Brown, A. K. Sujeeth, Hyouk Joong Lee, T. Rompf, H. Chafi, M. Odersky, and K. Olukotun. A heterogeneous parallel framework for domain-specific languages. In Parallel Architectures and Compilation Techniques (PACT), 2011 International Conference on, pages 89–100, 2011. doi: 10.1109/PACT.2011.15. [Page 78]


Yufei Cai. PhD thesis, University of Tübingen, 2017. In preparation. [Pages 81 and 126]

Yufei Cai, Paolo G. Giarrusso, Tillmann Rendel, and Klaus Ostermann. A theory of changes for higher-order languages – Incrementalizing λ-calculi by static differentiation. In Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI '14, pages 145–155, New York, NY, USA, 2014. ACM. ISBN 978-1-4503-2784-8. doi: 10.1145/2594291.2594304. URL http://doi.acm.org/10.1145/2594291.2594304. [Pages 2, 3, 59, 66, 84, 92, 93, 107, 117, 121, 124, 125, 135, 136, 137, 138, 141, 145, 146, 154, 159, 175, 177, 182, 184, 190 and 223]

Jacques Carette, Oleg Kiselyov, and Chung-chieh Shan. Finally tagless, partially evaluated: Tagless staged interpreters for simpler typed languages. JFP, 19:509–543, 2009. [Page 31]

Craig Chambers, Ashish Raniwala, Frances Perry, Stephen Adams, Robert R. Henry, Robert Bradshaw, and Nathan Weizenbaum. FlumeJava: easy, efficient data-parallel pipelines. In PLDI, pages 363–375. ACM, 2010. [Pages 10, 20 and 47]

Yan Chen, Joshua Dunfield, Matthew A. Hammer, and Umut A. Acar. Implicit self-adjusting computation for purely functional programs. In ICFP, pages 129–141. ACM, 2011. [Page 173]

Yan Chen, Umut A. Acar, and Kanat Tangwongsan. Functional programming for dynamic and large data with self-adjusting computation. In Proceedings of the 19th ACM SIGPLAN International Conference on Functional Programming, ICFP '14, pages 227–240, New York, NY, USA, 2014. ACM. ISBN 978-1-4503-2873-9. doi: 10.1145/2628136.2628150. URL http://doi.acm.org/10.1145/2628136.2628150. [Page 162]

James Cheney, Sam Lindley, and Philip Wadler. A Practical Theory of Language-integrated Query. In Proceedings of the 18th ACM SIGPLAN International Conference on Functional Programming, ICFP '13, pages 403–416, New York, NY, USA, 2013. ACM. ISBN 978-1-4503-2326-0. doi: 10.1145/2500365.2500586. [Page 174]

Adam Chlipala. Parametric higher-order abstract syntax for mechanized semantics. In Proceedings of the 13th ACM SIGPLAN international conference on Functional programming, ICFP '08, pages 143–156, New York, NY, USA, 2008. ACM. ISBN 978-1-59593-919-7. doi: 10.1145/1411204.1411226. URL http://doi.acm.org/10.1145/1411204.1411226. [Page 186]

Ezgi Çiçek, Zoe Paraskevopoulou, and Deepak Garg. A Type Theory for Incremental Computational Complexity with Control Flow Changes. In Proceedings of the 21st ACM SIGPLAN International Conference on Functional Programming, ICFP 2016, pages 132–145, New York, NY, USA, 2016. ACM. ISBN 978-1-4503-4219-3. doi: 10.1145/2951913.2951950. [Page 161]

Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. Introduction to Algorithms. MIT Press, Cambridge, MA, USA, 2nd edition, 2001. ISBN 0-262-03293-7, 9780262032933. [Page 73]

Oren Eini. The pain of implementing LINQ providers. Commun. ACM, 54(8):55–61, 2011. [Page 47]

Conal Elliott and Paul Hudak. Functional reactive animation. In ICFP, pages 263–273. ACM, 1997. [Page 174]

Conal Elliott, Sigbjorn Finne, and Oege de Moor. Compiling embedded languages. JFP, 13(2):455–481, 2003. [Pages 20, 21 and 48]

Burak Emir, Andrew Kennedy, Claudio Russo, and Dachuan Yu. Variance and generalized constraints for C♯ generics. In ECOOP, pages 279–303. Springer-Verlag, 2006. [Page 4]


Burak Emir, Qin Ma, and Martin Odersky. Translation correctness for first-order object-oriented pattern matching. In Proceedings of the 5th Asian conference on Programming languages and systems, APLAS'07, pages 54–70, Berlin, Heidelberg, 2007a. Springer-Verlag. ISBN 3-540-76636-7, 978-3-540-76636-0. URL http://dl.acm.org/citation.cfm?id=1784774.1784782. [Page 4]

Burak Emir, Martin Odersky, and John Williams. Matching objects with patterns. In ECOOP, pages 273–298. Springer-Verlag, 2007b. [Pages 4, 32, 44 and 45]

Sebastian Erdweg, Paolo G. Giarrusso, and Tillmann Rendel. Language composition untangled. In Proceedings of Workshop on Language Descriptions, Tools and Applications (LDTA), LDTA '12, pages 71–78, New York, NY, USA, 2012. ACM. ISBN 978-1-4503-1536-4. doi: 10.1145/2427048.2427055. URL http://doi.acm.org/10.1145/2427048.2427055. [Pages 4 and 187]

Sebastian Erdweg, Tijs van der Storm, and Yi Dai. Capture-Avoiding and Hygienic Program Transformations. In ECOOP 2014 – Object-Oriented Programming, pages 489–514. Springer Berlin Heidelberg, July 2014. doi: 10.1007/978-3-662-44202-9_20. [Page 93]

Andrew Farmer, Andy Gill, Ed Komp, and Neil Sculthorpe. The HERMIT in the Machine: A Plugin for the Interactive Transformation of GHC Core Language Programs. In Proceedings of the 2012 Haskell Symposium, Haskell '12, pages 1–12, New York, NY, USA, 2012. ACM. ISBN 978-1-4503-1574-6. doi: 10.1145/2364506.2364508. [Page 78]

Leonidas Fegaras. Optimizing queries with object updates. Journal of Intelligent Information Systems, 12:219–242, 1999. ISSN 0925-9902. URL http://dx.doi.org/10.1023/A:1008757010516. [Page 71]

Leonidas Fegaras and David Maier. Towards an effective calculus for object query languages. In Proceedings of the 1995 ACM SIGMOD international conference on Management of data, SIGMOD '95, pages 47–58, New York, NY, USA, 1995. ACM. ISBN 0-89791-731-6. doi: 10.1145/223784.223789. URL http://doi.acm.org/10.1145/223784.223789. [Page 71]

Leonidas Fegaras and David Maier. Optimizing object queries using an effective calculus. ACM Trans. Database Systems (TODS), 25:457–516, 2000. [Pages 11, 21 and 48]

Denis Firsov and Wolfgang Jeltsch. Purely Functional Incremental Computing. In Fernando Castor and Yu David Liu, editors, Programming Languages, number 9889 in Lecture Notes in Computer Science, pages 62–77. Springer International Publishing, September 2016. ISBN 978-3-319-45278-4, 978-3-319-45279-1. doi: 10.1007/978-3-319-45279-1_5. [Pages 70, 75, 156 and 161]

Philip J. Fleming and John J. Wallace. How not to lie with statistics: the correct way to summarize benchmark results. Commun. ACM, 29(3):218–221, March 1986. [Pages xvii, 39 and 41]

Bent Flyvbjerg. Five misunderstandings about case-study research. Qualitative Inquiry, 12(2):219–245, 2006. [Page 23]

Miguel Garcia, Anastasia Izmaylova, and Sibylle Schupp. Extending Scala with database query capability. Journal of Object Technology, 9(4):45–68, 2010. [Page 48]

Andy Georges, Dries Buytaert, and Lieven Eeckhout. Statistically rigorous Java performance evaluation. In OOPSLA, pages 57–76. ACM, 2007. [Pages 40, 41 and 132]

Paolo G. Giarrusso. Open GADTs and declaration-site variance: A problem statement. In Proceedings of the 4th Workshop on Scala, SCALA '13, pages 51–54, New York, NY, USA, 2013. ACM. ISBN 978-1-4503-2064-1. doi: 10.1145/2489837.2489842. URL http://doi.acm.org/10.1145/2489837.2489842. [Page 4]


Paolo G. Giarrusso, Klaus Ostermann, Michael Eichberg, Ralf Mitschke, Tillmann Rendel, and Christian Kästner. Reify your collection queries for modularity and speed! CoRR, abs/1210.6284, 2012. URL http://arxiv.org/abs/1210.6284. [Page 20]

Paolo G. Giarrusso, Klaus Ostermann, Michael Eichberg, Ralf Mitschke, Tillmann Rendel, and Christian Kästner. Reify your collection queries for modularity and speed! In AOSD, pages 1–12. ACM, 2013. [Pages 2 and 3]

Paolo G. Giarrusso, Yann Régis-Gianas, and Philipp Schuster. Incremental λ-calculus in cache-transfer style: Static memoization by program transformation. Submitted. [Pages 3 and 213]

Andrew Gill, John Launchbury, and Simon L. Peyton Jones. A short cut to deforestation. In FPCA, pages 223–232. ACM, 1993. [Page 9]

Jean-Yves Girard, Paul Taylor, and Yves Lafont. Proofs and types. Cambridge University Press, Cambridge, 1989. [Page 164]

Dieter Gluche, Torsten Grust, Christof Mainberger, and Marc Scholl. Incremental updates for materialized OQL views. In Deductive and Object-Oriented Databases, volume 1341 of LNCS, pages 52–66. Springer, 1997. [Pages 57, 127, 129, 131 and 174]

T. Grust and M. H. Scholl. Monoid comprehensions as a target for the translation of OQL. In Workshop on Performance Enhancement in Object Bases, Schloss Dagstuhl. Citeseer, 1996a. doi 10113649251011364481 (abstract). [Page 71]

Torsten Grust. Comprehending queries. PhD thesis, University of Konstanz, 1999. [Pages 32 and 48]

Torsten Grust and Marc H. Scholl. Translating OQL into monoid comprehensions: Stuck with nested loops? Technical Report 3a/1996, University of Konstanz, Department of Mathematics and Computer Science, Database Research Group, September 1996b. [Page 32]

Torsten Grust and Marc H. Scholl. How to comprehend queries functionally. Journal of Intelligent Information Systems, 12:191–218, 1999. [Page 21]

Torsten Grust and Alexander Ulrich. First-class functions for first-order database engines. In Todd J. Green and Alan Schmitt, editors, Proceedings of the 14th International Symposium on Database Programming Languages (DBPL 2013), August 30, 2013, Riva del Garda, Trento, Italy, 2013. URL http://arxiv.org/abs/1308.0158. [Page 174]

Torsten Grust, Manuel Mayr, Jan Rittinger, and Tom Schreiber. FERRY: database-supported program execution. In Proc. Int'l SIGMOD Conf. on Management of Data (SIGMOD), pages 1063–1066. ACM, 2009. [Pages 48 and 174]

Torsten Grust, Nils Schweinsberg, and Alexander Ulrich. Functions Are Data Too: Defunctionalization for PL/SQL. Proc. VLDB Endow., 6(12):1214–1217, August 2013. ISSN 2150-8097. doi: 10.14778/2536274.2536279. [Page 174]

Ashish Gupta and Inderpal Singh Mumick. Maintenance of materialized views: problems, techniques, and applications. In Ashish Gupta and Iderpal Singh Mumick, editors, Materialized views, pages 145–157. MIT Press, 1999. [Pages v, vii, 2, 128, 161, 173 and 174]

Elnar Hajiyev, Mathieu Verbaere, and Oege de Moor. CodeQuest: Scalable source code queries with Datalog. In ECOOP, pages 2–27. Springer, 2006. [Page 49]


Matthew A. Hammer, Georg Neis, Yan Chen, and Umut A. Acar. Self-adjusting stack machines. In OOPSLA, pages 753–772. ACM, 2011. [Page 173]

Matthew A. Hammer, Khoo Yit Phang, Michael Hicks, and Jeffrey S. Foster. Adapton: Composable Demand-driven Incremental Computation. In Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI '14, pages 156–166, New York, NY, USA, 2014. ACM. ISBN 978-1-4503-2784-8. doi: 10.1145/2594291.2594324. [Page 161]

Matthew A. Hammer, Joshua Dunfield, Kyle Headley, Nicholas Labich, Jeffrey S. Foster, Michael Hicks, and David Van Horn. Incremental computation with names. In Proceedings of the 2015 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications, OOPSLA 2015, pages 748–766, New York, NY, USA, 2015. ACM. ISBN 978-1-4503-3689-5. doi: 10.1145/2814270.2814305. URL http://doi.acm.org/10.1145/2814270.2814305. [Page 161]

Matthew A. Hammer, Joshua Dunfield, Dimitrios J. Economou, and Monal Narasimhamurthy. Typed Adapton: Refinement types for incremental computations with precise names. arXiv:1610.00097 [cs], October 2016. [Page 161]

Robert Harper. Constructing type systems over an operational semantics. Journal of Symbolic Computation, 14(1):71–84, July 1992. ISSN 0747-7171. doi: 10.1016/0747-7171(92)90026-Z. [Page 118]

Robert Harper. Practical Foundations for Programming Languages. Cambridge University Press, 2nd edition, 2016. ISBN 9781107150300. [Page 163]

Kyle Headley and Matthew A. Hammer. The Random Access Zipper: Simple, Purely-Functional Sequences. arXiv:1608.06009 [cs], August 2016. [Page 73]

Fritz Henglein. Optimizing Relational Algebra Operations Using Generic Equivalence Discriminators and Lazy Products. In Proceedings of the 2010 ACM SIGPLAN Workshop on Partial Evaluation and Program Manipulation, PEPM '10, pages 73–82, New York, NY, USA, 2010. ACM. ISBN 978-1-60558-727-1. doi: 10.1145/1706356.1706372. [Page 48]

Fritz Henglein and Ken Friis Larsen. Generic Multiset Programming for Language-integrated Querying. In Proceedings of the 6th ACM SIGPLAN Workshop on Generic Programming, WGP '10, pages 49–60, New York, NY, USA, 2010. ACM. ISBN 978-1-4503-0251-7. doi: 10.1145/1863495.1863503. [Page 48]

R. Hinze and R. Paterson. Finger trees: a simple general-purpose data structure. Journal of Functional Programming, 16(2):197–218, 2006. [Pages 73 and 75]

Christian Hofer, Klaus Ostermann, Tillmann Rendel, and Adriaan Moors. Polymorphic embedding of DSLs. In GPCE, pages 137–148. ACM, 2008. [Page 44]

Martin Hofmann. Conservativity of equality reflection over intensional type theory. In Types for Proofs and Programs, volume 1158 of LNCS, pages 153–164. Springer-Verlag, 1996. [Page 182]

David Hovemeyer and William Pugh. Finding bugs is easy. SIGPLAN Notices, 39(12):92–106, 2004. [Pages 10, 37 and 49]

Lourdes del Carmen Gonzalez Huesca. Incrementality and Effect Simulation in the Simply Typed Lambda Calculus. PhD thesis, Universite Paris Diderot-Paris VII, November 2015. [Pages 59 and 175]


Vojin Jovanovic, Amir Shaikhha, Sandro Stucki, Vladimir Nikolaev, Christoph Koch, and Martin Odersky. Yin-yang: Concealing the deep embedding of DSLs. In Proceedings of the 2014 International Conference on Generative Programming: Concepts and Experiences, GPCE 2014, pages 73–82, New York, NY, USA, 2014. ACM. ISBN 978-1-4503-3161-6. doi: 10.1145/2658761.2658771. URL http://doi.acm.org/10.1145/2658761.2658771. [Page 49]

Christian Kästner, Paolo G. Giarrusso, and Klaus Ostermann. Partial preprocessing C code for variability analysis. In Proceedings of the Fifth International Workshop on Variability Modelling of Software-intensive Systems (VaMoS), pages 137–140. ACM, January 2011a. ISBN 978-1-4503-0570-9. URL http://www.informatik.uni-marburg.de/~kaestner/vamos11.pdf. [Page 4]

Christian Kästner, Paolo G. Giarrusso, Tillmann Rendel, Sebastian Erdweg, Klaus Ostermann, and Thorsten Berger. Variability-aware parsing in the presence of lexical macros and conditional compilation. In Proceedings of the 26th Annual ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages and Applications (OOPSLA). ACM, October 2011b. [Page 4]

Chantal Keller and Thorsten Altenkirch. Hereditary Substitutions for Simple Types, Formalized. In Proceedings of the Third ACM SIGPLAN Workshop on Mathematically Structured Functional Programming, MSFP '10, pages 3–10, New York, NY, USA, 2010. ACM. ISBN 978-1-4503-0255-5. doi: 10.1145/1863597.1863601. [Page 186]

Gregor Kiczales, John Lamping, Anurag Menhdhekar, Chris Maeda, Cristina Lopes, Jean-Marc Loingtier, and John Irwin. Aspect-oriented programming. In ECOOP, pages 220–242, 1997. [Page 9]

Edward Kmett. Mirrored lenses, 2012. URL http://comonad.com/reader/2012/mirrored-lenses/. Last accessed on 29 June 2017. [Page 169]

Christoph Koch. Incremental query evaluation in a ring of databases. In Symp. Principles of Database Systems (PODS), pages 87–98. ACM, 2010. [Pages 57, 161 and 174]

Christoph Koch, Yanif Ahmad, Oliver Kennedy, Milos Nikolic, Andres Nötzli, Daniel Lupei, and Amir Shaikhha. DBToaster: higher-order delta processing for dynamic, frequently fresh views. The VLDB Journal, 23(2):253–278, 2014. ISSN 1066-8888. doi: 10.1007/s00778-013-0348-4. URL http://dx.doi.org/10.1007/s00778-013-0348-4. [Pages 154, 158, 161 and 174]

Christoph Koch, Daniel Lupei, and Val Tannen. Incremental View Maintenance For Collection Programming. In Proceedings of the 35th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, PODS '16, pages 75–90, New York, NY, USA, 2016. ACM. ISBN 978-1-4503-4191-2. doi: 10.1145/2902251.2902286. [Pages 2, 57, 159, 160, 161 and 175]

Neelakantan R. Krishnaswami and Derek Dreyer. Internalizing Relational Parametricity in the Extensional Calculus of Constructions. In Simona Ronchi Della Rocca, editor, Computer Science Logic 2013 (CSL 2013), volume 23 of Leibniz International Proceedings in Informatics (LIPIcs), pages 432–451, Dagstuhl, Germany, 2013. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik. ISBN 978-3-939897-60-6. doi: http://dx.doi.org/10.4230/LIPIcs.CSL.2013.432. [Page 125]

Ralf Lämmel. Google's MapReduce programming model – revisited. Sci. Comput. Program., 68(3):208–237, October 2007. [Pages 67 and 129]

Daan Leijen and Erik Meijer. Domain specific embedded compilers. In DSL, pages 109–122. ACM, 1999. [Page 48]


Yanhong A. Liu. Efficiency by incrementalization: An introduction. HOSC, 13(4):289–313, 2000. [Pages 2, 56, 160, 161, 176 and 178]

Yanhong A. Liu and Tim Teitelbaum. Caching intermediate results for program improvement. In Proceedings of the 1995 ACM SIGPLAN symposium on Partial evaluation and semantics-based program manipulation, PEPM '95, pages 190–201, New York, NY, USA, 1995. ACM. ISBN 0-89791-720-0. doi: 10.1145/215465.215590. URL http://doi.acm.org/10.1145/215465.215590. [Pages v, vii, 135 and 136]

Ingo Maier and Martin Odersky. Higher-order reactive programming with incremental lists. In ECOOP, pages 707–731. Springer-Verlag, 2013. [Pages 43 and 174]

Simon Marlow and Simon Peyton Jones. Making a fast curry: push/enter vs. eval/apply for higher-order languages. Journal of Functional Programming, 16(4-5):415–449, 2006. ISSN 1469-7653. doi: 10.1017/S0956796806005995. [Page 160]

Conor McBride. The derivative of a regular type is its type of one-hole contexts, 2001. URL http://www.strictlypositive.org/diff.pdf. Last accessed on 29 June 2017. [Page 78]

Conor McBride. Outrageous but Meaningful Coincidences: Dependent Type-safe Syntax and Evaluation. In Proceedings of the 6th ACM SIGPLAN Workshop on Generic Programming, WGP '10, pages 1–12, New York, NY, USA, 2010. ACM. ISBN 978-1-4503-0251-7. doi: 10.1145/1863495.1863497. [Page 186]

Erik Meijer, Brian Beckman, and Gavin Bierman. LINQ: reconciling objects, relations and XML in the .NET framework. In Proc. Int'l SIGMOD Conf. on Management of Data (SIGMOD), page 706. ACM, 2006. [Page 47]

John C. Mitchell. Foundations of programming languages. MIT Press, 1996. [Page 118]

Adriaan Moors. Type constructor polymorphism, 2010. URL http://adriaanm.github.com/research/2010/10/06/new-in-scala-28-type-constructor-inference. Last accessed on 2012-04-11. [Page 45]

Adriaan Moors, Tiark Rompf, Philipp Haller, and Martin Odersky. Scala-virtualized. In Proc. Workshop on Partial Evaluation and Semantics-Based Program Manipulation, PEPM '12, pages 117–120. ACM, 2012. [Page 44]

Steven S. Muchnick. Advanced Compiler Design and Implementation. Morgan Kaufmann, 1997. ISBN 1-55860-320-4. [Page 21]

Russell O'Connor. Polymorphic update with van Laarhoven lenses, 2012. URL http://r6.ca/blog/20120623T104901Z.html. Last accessed on 29 June 2017. [Page 169]

M. Odersky and A. Moors. Fighting bit rot with types (experience report: Scala collections). In IARCS Conf. Foundations of Software Technology and Theoretical Computer Science, volume 4, pages 427–451, 2009. [Pages 20, 23, 30 and 48]

Martin Odersky. Pimp my library, 2006. URL http://www.artima.com/weblogs/viewpost.jsp?thread=179766. Last accessed on 10 April 2012. [Page 25]

Martin Odersky. The Scala language specification version 2.9, 2011. [Page 25]

Martin Odersky, Lex Spoon, and Bill Venners. Programming in Scala. Artima Inc, 2nd edition, 2011. [Pages 10, 23, 25, 45 and 46]


Bruno C. d. S. Oliveira, Adriaan Moors, and Martin Odersky. Type classes as objects and implicits. In OOPSLA, OOPSLA '10, pages 341–360. ACM, 2010. [Page 30]

Klaus Ostermann, Paolo G. Giarrusso, Christian Kästner, and Tillmann Rendel. Revisiting information hiding: Reflections on classical and nonclassical modularity. In Proceedings of the 25th European Conference on Object-Oriented Programming (ECOOP). Springer-Verlag, 2011. URL http://www.informatik.uni-marburg.de/~kaestner/ecoop11.pdf. [Page 3]

Scott Owens, Magnus O. Myreen, Ramana Kumar, and Yong Kiam Tan. Functional Big-Step Semantics. In Proceedings of the 25th European Symposium on Programming Languages and Systems - Volume 9632, pages 589–615, New York, NY, USA, 2016. Springer-Verlag New York, Inc. ISBN 978-3-662-49497-4. doi: 10.1007/978-3-662-49498-1_23. [Pages 186 and 201]

Robert Paige and Shaye Koenig. Finite differencing of computable expressions. TOPLAS, 4(3):402–454, July 1982. [Pages v, vii, 2, 57, 161 and 174]

Simon L. Peyton Jones and Simon Marlow. Secrets of the Glasgow Haskell Compiler inliner. JFP, 12(4-5):393–434, 2002. [Pages 9 and 32]

F. Pfenning and C. Elliot. Higher-order abstract syntax. In PLDI, pages 199–208. ACM, 1988. [Pages 20 and 28]

Frank Pfenning. Partial polymorphic type inference and higher-order unification. In Proc. ACM Conf. on LISP and Functional Programming, LFP '88, pages 153–163. ACM, 1988. [Page 45]

Benjamin C. Pierce. Types and Programming Languages. MIT Press, 1st edition, 2002. [Pages 33, 181, 183, 184 and 201]

François Pottier and Nadji Gauthier. Polymorphic Typed Defunctionalization. In Proceedings of the 31st ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL '04, pages 89–98, New York, NY, USA, 2004. ACM. ISBN 1-58113-729-X. doi: 10.1145/964001.964009. [Page 217]

G. Ramalingam and Thomas Reps. A categorized bibliography on incremental computation. In POPL, pages 502–510. ACM, 1993. [Pages 55, 161 and 173]

Tiark Rompf and Nada Amin. Functional Pearl: A SQL to C Compiler in 500 Lines of Code. In Proceedings of the 20th ACM SIGPLAN International Conference on Functional Programming, ICFP 2015, pages 2–9, New York, NY, USA, 2015. ACM. ISBN 978-1-4503-3669-7. doi: 10.1145/2784731.2784760. [Page 1]

Tiark Rompf and Martin Odersky. Lightweight modular staging: a pragmatic approach to runtime code generation and compiled DSLs. In GPCE, pages 127–136. ACM, 2010. [Pages 4, 20, 21, 22, 25, 44, 49, 78 and 127]

Tiark Rompf, Arvind K. Sujeeth, HyoukJoong Lee, Kevin J. Brown, Hassan Chafi, Martin Odersky, and Kunle Olukotun. Building-blocks for performance oriented DSLs. In DSL, pages 93–117, 2011. [Pages 44, 46 and 49]

Tiark Rompf, Arvind K. Sujeeth, Nada Amin, Kevin J. Brown, Vojin Jovanovic, HyoukJoong Lee, Manohar Jonnalagedda, Kunle Olukotun, and Martin Odersky. Optimizing data structures in high-level programs: new directions for extensible compilers based on staging. In POPL, pages 497–510. ACM, 2013. [Pages 43, 48 and 49]


Andreas Rossberg, Claudio V. Russo, and Derek Dreyer. F-ing Modules. In Proceedings of the 5th ACM SIGPLAN Workshop on Types in Language Design and Implementation, TLDI '10, pages 89–102, New York, NY, USA, 2010. ACM. ISBN 978-1-60558-891-9. doi: 10.1145/1708016.1708028. [Page 155]

Amr Sabry and Matthias Felleisen. Reasoning about programs in continuation-passing style. Lisp and symbolic computation, 6(3-4):289–360, 1993. [Page 138]

Guido Salvaneschi and Mira Mezini. Reactive behavior in object-oriented applications: an analysis and a research roadmap. In AOSD, pages 37–48. ACM, 2013. [Pages 1 and 55]

Gabriel Scherer and Didier Rémy. GADTs meet subtyping. In ESOP, pages 554–573. Springer-Verlag, 2013. [Page 4]

Christopher Schwaab and Jeremy G. Siek. Modular type-safety proofs in Agda. In Proceedings of the 7th Workshop on Programming Languages Meets Program Verification, PLPV '13, pages 3–12, New York, NY, USA, 2013. ACM. ISBN 978-1-4503-1860-0. doi: 10.1145/2428116.2428120. [Page 190]

Ilya Sergey, Dimitrios Vytiniotis, and Simon Peyton Jones. Modular, higher-order cardinality analysis in theory and practice. In Proceedings of the 41st ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL '14, pages 335–347, New York, NY, USA, 2014. ACM. ISBN 978-1-4503-2544-8. doi: 10.1145/2535838.2535861. [Page 160]

Rohin Shah and Rastislav Bodik. Automated incrementalization through synthesis. Presented at IC 2017, First Workshop on Incremental Computing; 2-page abstract available, 2017. URL http://pldi17.sigplan.org/event/ic-2017-papers-automated-incrementalization-through-synthesis. [Page 56]

Norbert Siegmund, Marko Rosenmüller, Christian Kästner, Paolo G. Giarrusso, Sven Apel, and Sergiy Kolesnikov. Scalable prediction of non-functional properties in software product lines. In Proceedings of the 15th International Software Product Line Conference (SPLC), Los Alamitos, CA, August 2011. IEEE Computer Society. URL http://www.informatik.uni-marburg.de/~kaestner/SPLC11_nfp.pdf. To appear; submitted 27 Feb 2010, accepted 21 Apr 2011. [Page 4]

Norbert Siegmund, Marko Rosenmüller, Christian Kästner, Paolo G. Giarrusso, Sven Apel, and Sergiy S. Kolesnikov. Scalable prediction of non-functional properties in software product lines: Footprint and memory consumption. Information and Software Technology, 55(3):491–507, 2013. [Page 4]

Daniel Spiewak and Tian Zhao. ScalaQL: Language-integrated database queries for Scala. In Proc. Conf. Software Language Engineering (SLE), 2009. [Page 48]

R. Statman. Logical relations and the typed λ-calculus. Information and Control, 65(2):85–97, May 1985. ISSN 0019-9958. doi: 10.1016/S0019-9958(85)80001-2. [Page 89]

Guy L. Steele, Jr. Organizing Functional Code for Parallel Execution; or, Foldl and Foldr Considered Slightly Harmful. In Proceedings of the 14th ACM SIGPLAN International Conference on Functional Programming, ICFP '09, pages 1–2, New York, NY, USA, 2009. ACM. ISBN 978-1-60558-332-7. doi: 10.1145/1596550.1596551. [Page 43]

Michael Stonebraker, Samuel Madden, Daniel J. Abadi, Stavros Harizopoulos, Nabil Hachem, and Pat Helland. The end of an architectural era (it's time for a complete rewrite). In Proc. Int'l Conf. Very Large Data Bases (VLDB), pages 1150–1160. VLDB Endowment, 2007. [Pages 1 and 9]


Arvind K. Sujeeth, Austin Gibbons, Kevin J. Brown, HyoukJoong Lee, Tiark Rompf, Martin Odersky, and Kunle Olukotun. Forge: generating a high performance DSL implementation from a declarative specification. In Proceedings of the 12th international conference on Generative programming: concepts & experiences, GPCE '13, pages 145–154, New York, NY, USA, 2013a. ACM. ISBN 978-1-4503-2373-4. doi: 10.1145/2517208.2517220. URL http://doi.acm.org/10.1145/2517208.2517220. [Page 49]

Arvind K. Sujeeth, Tiark Rompf, Kevin J. Brown, HyoukJoong Lee, Hassan Chafi, Victoria Popic, Michael Wu, Aleksandar Prokopec, Vojin Jovanovic, Martin Odersky, and Kunle Olukotun. Composition and reuse with compiled domain-specific languages. In ECOOP, pages 52–78. Springer-Verlag, 2013b. [Pages 43 and 49]

R. S. Sundaresh and Paul Hudak. A theory of incremental computation and its application. In POPL, pages 1–13. ACM, 1991. [Page 174]

L. S. van Benthem Jutting, J. McKinna, and R. Pollack. Checking algorithms for Pure Type Systems. In Henk Barendregt and Tobias Nipkow, editors, Types for Proofs and Programs, number 806 in Lecture Notes in Computer Science, pages 19–61. Springer Berlin Heidelberg, January 1994. ISBN 978-3-540-58085-0, 978-3-540-48440-0. [Page 172]

Jan Vitek and Tomas Kalibera. Repeatability, reproducibility and rigor in systems research. In Proc. Int'l Conf. Embedded Software (EMSOFT), pages 33–38. ACM, 2011. [Page 36]

Philip Wadler. Theorems for Free! In Proceedings of the Fourth International Conference on Functional Programming Languages and Computer Architecture, FPCA '89, pages 347–359, New York, NY, USA, 1989. ACM. ISBN 978-0-89791-328-7. doi: 10.1145/99370.99404. [Page 89]

Philip Wadler. The Girard–Reynolds isomorphism (second edition). Theoretical Computer Science, 375(1–3):201–226, May 2007. ISSN 0304-3975. doi: 10.1016/j.tcs.2006.12.042. [Page 163]

Patrycja Wegrzynowicz and Krzysztof Stencel. The good, the bad, and the ugly: three ways to use a semantic code query system. In OOPSLA, pages 821–822. ACM, 2009. [Page 49]

Darren Willis, David Pearce, and James Noble. Efficient object querying for Java. In ECOOP, pages 28–49. Springer, 2006. [Page 48]

Darren Willis, David J. Pearce, and James Noble. Caching and incrementalisation in the Java Query Language. In OOPSLA, pages 1–18. ACM, 2008. [Pages 48 and 174]

Yuan Yu, Michael Isard, Dennis Fetterly, Mihai Budiu, Úlfar Erlingsson, Pradeep Kumar Gunda, and Jon Currey. DryadLINQ: a system for general-purpose distributed data-parallel computing using a high-level language. In Proc. Conf. Operating systems design and implementation, OSDI'08, pages 1–14. USENIX Association, 2008. [Page 47]

Abstract

In modern programming languages, queries on in-memory collections are often more expensive than needed. While database queries can be readily optimized, collection queries often don't translate trivially to databases, as modern programming languages are far more expressive than databases, supporting both nested data and first-class functions.

Collection queries can be optimized and incrementalized by hand, but this reduces modularity, and is often too error-prone to be feasible or to enable maintenance of resulting programs. In this thesis, we optimize and incrementalize such collection queries to free programmers from such burdens. Resulting programs are expressed in the same core language, so that they can be subjected to other standard optimizations.

To enable optimizing collection queries which occur inside programs, we develop a staged variant of the Scala collection API that reifies queries as ASTs. On top of this interface, we adapt domain-specific optimizations from the fields of programming languages and databases; among others, we rewrite queries to use indexes chosen by programmers. Thanks to the use of indexes, we show significant speedups in our experimental evaluation, with an average of 12x and a maximum of 12800x.

To incrementalize higher-order programs by program transformation, we extend finite differencing [Paige and Koenig 1982; Blakeley et al. 1986; Gupta and Mumick 1999] and develop the first approach to incrementalization by program transformation for higher-order programs. Base programs are transformed to derivatives, programs that transform input changes to output changes. We prove that our incrementalization approach is correct. We develop the theory underlying this approach for simply-typed and untyped λ-calculus, and discuss extensions to System F.

Derivatives often need to reuse results produced by base programs: to enable such reuse, we extend work by Liu and Teitelbaum [1995] to higher-order programs, and develop and prove correct a program transformation converting higher-order programs to cache-transfer-style.

For efficient incrementalization, it is necessary to choose and incrementalize by hand appropriate primitive operations. We incrementalize a significant subset of collection operations, and perform case studies showing order-of-magnitude speedups both in practice and in asymptotic complexity.


Zusammenfassung

In modernen universellen Programmiersprachen sind Abfragen auf Speicher-basierten Kollektionen oft rechenintensiver als erforderlich. Während Datenbankabfragen vergleichsweise einfach optimiert werden können, fällt dies bei Speicher-basierten Kollektionen oft schwer, denn universelle Programmiersprachen sind in aller Regel ausdrucksstärker als Datenbanken. Insbesondere unterstützen diese Sprachen meistens verschachtelte, rekursive Datentypen und Funktionen höherer Ordnung.

Kollektionsabfragen können per Hand optimiert und inkrementalisiert werden, jedoch verringert dies häufig die Modularität und ist oft zu fehleranfällig, um realisierbar zu sein oder um die Instandhaltung der entstandenen Programme zu gewährleisten. Die vorliegende Doktorarbeit demonstriert, wie Abfragen auf Kollektionen systematisch und automatisch optimiert und inkrementalisiert werden können, um Programmierer von dieser Last zu befreien. Die so erzeugten Programme werden in derselben Kernsprache ausgedrückt, um weitere Standardoptimierungen zu ermöglichen.

Teil I entwickelt eine Variante der Scala-API für Kollektionen, die Staging verwendet, um Abfragen als abstrakte Syntaxbäume zu reifizieren. Auf Basis dieser Schnittstelle werden anschließend domänenspezifische Optimierungen von Programmiersprachen und Datenbanken angewandt; unter anderem werden Abfragen umgeschrieben, um vom Programmierer ausgewählte Indizes zu benutzen. Dank dieser Indizes kann eine erhebliche Beschleunigung der Ausführungsgeschwindigkeit gezeigt werden; eine experimentelle Auswertung zeigt hierbei Beschleunigungen von durchschnittlich 12x bis zu einem Maximum von 12800x.

Um Programme mit Funktionen höherer Ordnung durch Programmtransformation zu inkrementalisieren, wird in Teil II eine Erweiterung der Finite-Differenzen-Methode vorgestellt [Paige and Koenig 1982; Blakeley et al. 1986; Gupta and Mumick 1999] und ein erster Ansatz zur Inkrementalisierung durch Programmtransformation für Programme mit Funktionen höherer Ordnung entwickelt. Dabei werden Programme zu Ableitungen transformiert, d.h. zu Programmen, die Eingangsdifferenzen in Ausgangsdifferenzen umwandeln. Weiterhin werden in den Kapiteln 12–13 die Korrektheit des Inkrementalisierungsansatzes für einfach-getypten und ungetypten λ-Kalkül bewiesen und Erweiterungen zu System F besprochen.

Ableitungen müssen oft Ergebnisse der ursprünglichen Programme wiederverwenden. Um eine solche Wiederverwendung zu ermöglichen, erweitert Kapitel 17 die Arbeit von Liu and Teitelbaum [1995] zu Programmen mit Funktionen höherer Ordnung und entwickelt eine Programmtransformation solcher Programme im Cache-Transfer-Stil.

Für eine effiziente Inkrementalisierung ist es weiterhin notwendig, passende Grundoperationen auszuwählen und manuell zu inkrementalisieren. Diese Arbeit deckt einen Großteil der wichtigsten Grundoperationen auf Kollektionen ab. Die Durchführung von Fallstudien zeigt deutliche Laufzeitverbesserungen sowohl in der Praxis als auch in der asymptotischen Komplexität.

Contents

Abstract

Zusammenfassung

Contents

List of Figures

List of Tables

1 Introduction
  1.1 This thesis
    1.1.1 Included papers
    1.1.2 Excluded papers
    1.1.3 Navigating this thesis

I Optimizing Collection Queries by Reification

2 Introduction
  2.1 Contributions and summary
  2.2 Motivation
    2.2.1 Optimizing by Hand

3 Automatic optimization with SQuOpt
  3.1 Adapting a Query
  3.2 Indexing

4 Implementing SQuOpt
  4.1 Expression Trees
  4.2 Optimizations
  4.3 Query Execution

5 A deep EDSL for collection queries
  5.1 Implementation: Expressing the interface in Scala
  5.2 Representing expression trees
  5.3 Lifting first-order expressions
  5.4 Lifting higher-order expressions
  5.5 Lifting collections
  5.6 Interpretation
  5.7 Optimization

6 Evaluating SQuOpt
  6.1 Study Setup
  6.2 Experimental Units
  6.3 Measurement Setup
  6.4 Results

7 Discussion
  7.1 Optimization limitations
  7.2 Deep embedding
    7.2.1 What worked well
    7.2.2 Limitations
    7.2.3 What did not work so well: Scala type inference
    7.2.4 Lessons for language embedders

8 Related work
  8.1 Language-Integrated Queries
  8.2 Query Optimization
  8.3 Deep Embeddings in Scala
    8.3.1 LMS
  8.4 Code Querying

9 Conclusions
  9.1 Future work

II Incremental λ-Calculus

10 Introduction to differentiation
  10.1 Generalizing the calculus of finite differences
  10.2 A motivating example
  10.3 Introducing changes
    10.3.1 Incrementalizing with changes
  10.4 Differentiation on open terms and functions
    10.4.1 Producing function changes
    10.4.2 Consuming function changes
    10.4.3 Pointwise function changes
    10.4.4 Passing change targets
  10.5 Differentiation, informally
  10.6 Differentiation on our example
  10.7 Conclusion and contributions
    10.7.1 Navigating this thesis part
    10.7.2 Contributions

11 A tour of differentiation examples
  11.1 Change structures as type-class instances
  11.2 How to design a language plugin
  11.3 Incrementalizing a collection API
    11.3.1 Changes to type-class instances
    11.3.2 Incrementalizing aggregation
    11.3.3 Modifying list elements
    11.3.4 Limitations
  11.4 Efficient sequence changes
  11.5 Products
  11.6 Sums, pattern matching and conditionals
    11.6.1 Optimizing filter
  11.7 Chapter conclusion

12 Changes and differentiation, formally
  12.1 Changes and validity
    12.1.1 Function spaces
    12.1.2 Derivatives
    12.1.3 Basic change structures on types
    12.1.4 Validity as a logical relation
    12.1.5 Change structures on typing contexts
  12.2 Correctness of differentiation
    12.2.1 Plugin requirements
    12.2.2 Correctness proof
  12.3 Discussion
    12.3.1 The correctness statement
    12.3.2 Invalid input changes
    12.3.3 Alternative environment changes
    12.3.4 Capture avoidance
  12.4 Plugin requirement summary
  12.5 Chapter conclusion

13 Change structures
  13.1 Formally defining ⊕ and change structures
    13.1.1 Example: Group changes
    13.1.2 Properties of change structures
  13.2 Operations on function changes, informally
    13.2.1 Examples of nil changes
    13.2.2 Nil changes on arbitrary functions
    13.2.3 Constraining ⊕ on functions
  13.3 Families of change structures
    13.3.1 Change structures for function spaces
    13.3.2 Change structures for products
  13.4 Change structures for types and contexts
  13.5 Development history
  13.6 Chapter conclusion

14 Equational reasoning on changes
  14.1 Reasoning on changes syntactically
    14.1.1 Denotational equivalence for valid changes
    14.1.2 Syntactic validity
  14.2 Change equivalence
    14.2.1 Preserving change equivalence
    14.2.2 Sketching an alternative syntax
    14.2.3 Change equivalence is a PER
  14.3 Chapter conclusion

15 Extensions and theoretical discussion
  15.1 General recursion
    15.1.1 Differentiating general recursion
    15.1.2 Justification
  15.2 Completely invalid changes
  15.3 Pointwise function changes
  15.4 Modeling only valid changes
    15.4.1 One-sided vs two-sided validity

16 Differentiation in practice
  16.1 The role of differentiation plugins
  16.2 Predicting nil changes
    16.2.1 Self-maintainability
  16.3 Case study
  16.4 Benchmarking setup
  16.5 Benchmark results
  16.6 Chapter conclusion

17 Cache-transfer-style conversion
  17.1 Introduction
  17.2 Introducing Cache-Transfer Style
    17.2.1 Background
    17.2.2 A first-order motivating example: computing averages
    17.2.3 A higher-order motivating example: nested loops
  17.3 Formalization
    17.3.1 The source language λAL
    17.3.2 Static differentiation in λAL
    17.3.3 A new soundness proof for static differentiation
    17.3.4 The target language iλAL
    17.3.5 CTS conversion from λAL to iλAL
    17.3.6 Soundness of CTS conversion
  17.4 Incrementalization case studies
    17.4.1 Averaging integers bags
    17.4.2 Nested loops over two sequences
    17.4.3 Indexed joins of two bags
  17.5 Limitations and future work
    17.5.1 Hiding the cache type
    17.5.2 Nested bags
    17.5.3 Proper tail calls
    17.5.4 Pervasive replacement values
    17.5.5 Recomputing updated values
    17.5.6 Cache pruning via absence analysis
    17.5.7 Unary vs n-ary abstraction
  17.6 Related work
  17.7 Chapter conclusion

18 Towards differentiation for System F
  18.1 The parametricity transformation
  18.2 Differentiation and parametricity
  18.3 Proving differentiation correct externally
  18.4 Combinators for polymorphic change structures
  18.5 Differentiation for System F
  18.6 Related work
  18.7 Prototype implementation
  18.8 Chapter conclusion

19 Related work
  19.1 Dynamic approaches
  19.2 Static approaches
    19.2.1 Finite differencing
    19.2.2 λ–diff and partial differentials
    19.2.3 Static memoization

20 Conclusion and future work

Appendixes

A Preliminaries
  A.1 Our proof meta-language
    A.1.1 Type theory versus set theory
  A.2 Simply-typed λ-calculus
    A.2.1 Denotational semantics for STLC
    A.2.2 Weakening
    A.2.3 Substitution
    A.2.4 Discussion: Our mechanization and semantic style
    A.2.5 Language plugins

B More on our formalization
  B.1 Mechanizing plugins modularly and limitations

C (Un)typed ILC, operationally
  C.1 Formalization
    C.1.1 Types and contexts
    C.1.2 Base syntax for λA
    C.1.3 Change syntax for λΔA
    C.1.4 Differentiation
    C.1.5 Typing λA→ and λΔA→
    C.1.6 Semantics
    C.1.7 Type soundness
  C.2 Validating our step-indexed semantics
  C.3 Validity, syntactically (λA→, λΔA→)
  C.4 Step-indexed extensional validity (λA→, λΔA→)
  C.5 Step-indexed intensional validity
  C.6 Untyped step-indexed validity (λA, λΔA)
  C.7 General recursion in λA→ and λΔA→
  C.8 Future work
    C.8.1 Change composition
  C.9 Development history
  C.10 Conclusion

D Defunctionalizing function changes
  D.1 Setup
  D.2 Defunctionalization
    D.2.1 Defunctionalization with separate function codes
    D.2.2 Defunctionalizing function changes
  D.3 Defunctionalization and cache-transfer-style

Acknowledgments

Bibliography

List of Figures

2.1 Definition of the schema and of some content
2.2 Our example query on the schema in Fig. 2.1, and a function which postprocesses its result
2.3 Composition of queries in Fig. 2.2, after inlining, query unnesting and hoisting

3.1 SQuOpt version of Fig. 2.2: recordQuery contains a reification of the query, records its result

5.1 Desugaring of code in Fig. 2.2
5.2 Lifting TraversableLike

6.1 Measurement Setup: Overview
6.5 Find covariant equals methods
6.6 A simple index definition

12.1 Defining differentiation and proving it correct. The rest of this chapter explains and motivates the above definitions

13.1 Defining change structures

16.1 The λ-term histogram with Haskell-like syntactic sugar
16.2 A Scala implementation of primitives for bags and maps
16.3 Performance results, in log–log scale

17.1 Source language λAL (syntax)
17.2 Step-indexed big-step semantics for base terms of source language λAL
17.3 Step-indexed big-step semantics for the change terms of the source language λAL
17.4 Static differentiation in λAL
17.5 Step-indexed relation between values and changes
17.6 Target language iλAL (syntax)
17.7 Target language iλAL (semantics of base terms and caches)
17.8 Target language iλAL (semantics of change terms and cache updates)
17.9 Cache-Transfer Style (CTS) conversion from λAL to iλAL
17.10 Extending CTS translation to values, change values, environments and change environments
17.11 Extension of an environment with cache values dF ↑ Ci dF′ (for i = 1, 2)

18.1 A change structure that allows modifying list elements

A.1 Standard definitions for the simply-typed lambda calculus

C.1 ANF λ-calculus: λA and λΔA
C.2 ANF λ-calculus: λA→ and λΔA→ type system
C.3 ANF λ-calculus (λA and λΔA), CBV semantics
C.4 Defining extensional validity via logical relations and big-step semantics
C.5 Defining extensional validity via step-indexed logical relations and big-step semantics
C.6 Defining extensional validity via untyped step-indexed logical relations and big-step semantics

D.1 A small example program for defunctionalization
D.2 Defunctionalized program
D.3 Implementing change structures using Codelike instances
D.4 Implementing FunOps using Codelike instances

List of Tables

6.2 Performance results. As in Sec. 6.1, (1) denotes the modular Scala implementation, (2) the hand-optimized Scala one, and (3−), (3o), (3x) refer to the SQuOpt implementation when run, respectively, without optimizations, with optimizations, with optimizations and indexing. Queries marked with the R superscript were selected by random sampling

6.3 Average performance ratios. This table summarizes all interesting performance ratios across all queries, using the geometric mean [Fleming and Wallace, 1986]. The meaning of speedups is discussed in Sec. 6.1

6.4 Description of abstractions removed during hand-optimization, and number of queries where the abstraction is used (and optimized away)

Chapter 1

Introduction

Many programs perform queries on collections of data, and for non-trivial amounts of data it is useful to execute the queries efficiently. When the data is updated often enough, it can also be useful to update the results of some queries incrementally whenever the input changes, so that up-to-date results are quickly available, even for queries that are expensive to execute.

For instance, a program manipulating demographic data about citizens of a country might need to compute statistics on them, such as their average age, and update those statistics when the set of citizens changes.
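To make this concrete, here is a minimal Scala sketch (with a hypothetical Citizen class) of such a query, and of the incremental behavior we would like to obtain automatically:

    case class Citizen(name: String, age: Int)

    def averageAge(citizens: Seq[Citizen]): Double =
      citizens.map(_.age).sum.toDouble / citizens.size

    val citizens = Seq(Citizen("Ada", 36), Citizen("Alan", 41))
    val avgBefore = averageAge(citizens)
    // A naive program recomputes the whole query in O(n) after each change:
    val avgAfter = averageAge(citizens :+ Citizen("Grace", 85))
    // An incremental program would instead maintain a running sum and
    // count, updating the average in O(1) per change.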

Traditional relational database management systems (RDBMSs) support both query optimization and (in quite a few cases) incremental update of query results (there called incremental view maintenance).

However, queries are often executed on collections of data that are not stored in a database, but in collections manipulated by some program. Moving in-memory data to RDBMSs typically does not improve performance [Stonebraker et al. 2007; Rompf and Amin 2015], and reusing database optimizers is not trivial.

Moreover, many programming languages are far more expressive than RDBMSs. A typical RDBMS can only manipulate SQL relations, that is, multisets (or bags) of tuples (or sometimes sets of tuples, after duplicate elimination). Typical programming languages (PLs) also support arbitrarily nested lists and maps of data, and allow programmers to define new data types with few restrictions; a typical PL will also allow a far richer set of operations than SQL.
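For instance, the following Scala query (over hypothetical Book and Author classes) traverses nested collections and passes first-class functions to flatMap and map, and has no direct counterpart in plain SQL:

    case class Author(name: String)
    case class Book(title: String, authors: List[Author])

    val books = List(
      Book("SICP", List(Author("Abelson"), Author("Sussman"))))

    // A higher-order query over nested data: each Book holds a nested list
    // of Authors, and flatMap/map accept arbitrary host-language functions.
    val records =
      books.flatMap(book =>
        book.authors.map(author => (book.title, author.name)))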

However, typical PLs do not apply typical database optimizations to collection queries, and if queries are incrementalized, this is often done by hand, even though code implementing incremental queries is error-prone and hard to maintain [Salvaneschi and Mezini 2013].

What's worse, some of these manual optimizations are best done over the whole program. Some optimizations only become possible after inlining, which reduces modularity if done by hand.1
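As a sketch of this effect (hypothetical code): the modular version below traverses its input twice, and only after inlining titles can the two traversals be fused into one.

    case class Paper(title: String)

    // Modular version: the reusable abstraction hides an intermediate list.
    def titles(papers: Seq[Paper]): Seq[String] = papers.map(_.title)
    def shortTitles(papers: Seq[Paper]): Seq[String] =
      titles(papers).filter(_.length < 10)

    // Only after inlining titles can map and filter be fused, avoiding the
    // intermediate collection, at the cost of modularity if done by hand:
    def shortTitlesFused(papers: Seq[Paper]): Seq[String] =
      papers.collect { case p if p.title.length < 10 => p.title }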

Worse, adding an index on some collection can speed up looking for some information, but each index must be maintained incrementally when the underlying data changes. Depending on the actual queries and updates performed on the collection, and on how often they happen, it might turn out that updating an index takes more time than the index saves; hence, the choice of which indexes to enable depends on the whole program. However, adding or removing an index requires updating all PL queries to use it, while RDBMS queries can use an index transparently.
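The following sketch (hypothetical code) shows what such a hand-written index looks like, and why it couples both updates and queries to the index:

    case class Person(name: String, city: String)

    var people: Seq[Person] =
      Seq(Person("Ada", "London"), Person("Alan", "Wilmslow"))

    // A hand-written index: a map from each city to the people living there.
    var byCity: Map[String, Seq[Person]] = people.groupBy(_.city)

    // Lookups through the index are fast...
    val londoners = byCity.getOrElse("London", Seq.empty)

    // ...but every update must now also maintain the index, and every query
    // must be rewritten by hand to use byCity instead of scanning people.
    def add(p: Person): Unit = {
      people = people :+ p
      byCity = byCity.updated(p.city, byCity.getOrElse(p.city, Seq.empty) :+ p)
    }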

Overall, manual optimizations are not only effort-intensive and error-prone, but they also significantly reduce modularity.

1 In Chapter 2 we define modularity as the ability to abstract behavior in a separate function (possibly part of a different module), to enable reuse and improve understandability.


1.1 This thesis

To reduce the need for manual optimizations, in this thesis we propose techniques for optimizing higher-order collection queries and executing them incrementally. By generalizing database-inspired approaches, we provide approaches to query optimization and incrementalization by code transformation that apply to higher-order collection queries, including user-defined functions. Instead of building on existing approaches to incrementalization, such as self-adjusting computation [Acar 2009], we introduce a novel incrementalization approach which enables further transformations on the incrementalization results. This approach is in fact not restricted to collection queries but applicable to other domains, though it requires adaptation for the domain under consideration. Further research is needed, but a number of case studies suggest applicability even in some scenarios beyond existing techniques [Koch et al. 2016].

We consider the problem for functional programming languages such as Haskell or Scala, and we consider collection queries written using the APIs of their collection libraries, which we treat as an embedded domain-specific language (EDSL). Such APIs (or DSLs) contain powerful operators on collections, such as map, which are higher-order: that is, they take as arguments arbitrary functions in the host language, which we must also handle.

Therefore, our optimizations and incrementalizations must handle programs in higher-order EDSLs that can contain arbitrary code in the host language. Hence, many of our optimizations will exploit properties of our collection EDSL, but will need to handle host language code. We restrict the problem to purely functional programs (without mutation or other side effects, mostly including non-termination), because such programs can be more "declarative", and because avoiding side effects can enable more powerful optimizations and simplify the work of the optimizer at the same time.
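A small sketch of why side effects obstruct optimization: once a function passed to map performs I/O, fusing or short-cutting the pipeline changes observable behavior, so an optimizer must be far more conservative.

    val xs = List(1, 2, 3)

    // With List, map is eager: this prints 1, 2 and 3, even though take(1)
    // then discards two of the mapped results.
    val ys = xs.map { x => println(x); x + 1 }.take(1)

    // A fused or lazy pipeline would print only 1; since that difference is
    // observable, such rewrites are invalid in general for effectful code.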

This thesis is divided into two parts:

• In Part I, we optimize collection queries by static program transformation, based on a set of rewrite rules [Giarrusso et al. 2013].

• In Part II, we incrementalize programs by transforming them to new programs. This thesis presents the first approach that handles higher-order programs through program transformation; hence, while our main examples use collection queries, we phrase the work in terms of λ-calculi with unspecified primitives [Cai, Giarrusso, Rendel, and Ostermann 2014]. In Chapter 17, we extend ILC with a further program transformation step, so that base programs can store intermediate results and derivatives can reuse them, without resorting to dynamic memoization and without necessarily needing to look results up at runtime. To this end, we build on work by Liu [2000] and extend it to a higher-order, typed setting.

Part II is more theoretical than Part I, because the optimizations in Part I are much better understood than our approach to incrementalization.

To incrementalize programs, we are the first to extend to higher-order programs techniques based on finite differencing for queries on collections [Paige and Koenig 1982] and databases [Blakeley et al. 1986; Gupta and Mumick 1999]. Incrementalizing by finite differencing is a well-understood technique for database queries; how to generalize it for higher-order programs or beyond databases was less clear, so we spend significant energy on providing sound mathematical foundations for this transformation.
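In the simplest first-order case, finite differencing replaces recomputation by a derivative that maps input changes to output changes; a minimal sketch, assuming changes are represented as the list of inserted elements:

    def sum(xs: List[Int]): Int = xs.sum

    // A derivative of sum: it maps an input change (here, by assumption,
    // the list of inserted elements) to the corresponding output change.
    def dsum(xs: List[Int], inserted: List[Int]): Int = inserted.sum

    val xs = List(1, 2, 3)
    val base    = sum(xs)                    // 6, computed once
    val updated = base + dsum(xs, List(4))   // 10, in time O(|change|)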

In fact, it took us a while to be sure that our transformation was correct, and to understand why our first correctness proof [Cai, Giarrusso, Rendel, and Ostermann 2014], while a significant step, was still more complex than needed. In Part II, especially Chapter 12, we offer a mathematically much simpler proof.


Contributions of Part I are listed in Sec. 2.1. Contributions of Part II are listed at the end of Sec. 10.7.

1.1.1 Included papers

This thesis includes material from joint work with colleagues.

Part I is based on work by Giarrusso, Ostermann, Eichberg, Mitschke, Rendel, and Kästner [2013], and Chapters 2 to 4, 6, 8 and 9 come from that manuscript. While the work was done in collaboration, a few clearer responsibilities arose. I did most of the implementation work and collaborated on the writing; among other things, I devised the embedding for collection operations, implemented optimizations and indexing, implemented compilation when interpretation did not achieve sufficient performance, and performed the performance evaluation. Michael Eichberg and Ralf Mitschke contributed to the evaluation by adapting FindBugs queries. Christian Kästner contributed, among other things, to the experiment design for the performance evaluation. Klaus Ostermann proposed the original idea (together with an initial prototype) and supervised the project.

The first chapters of Part II are originally based on work by Cai, Giarrusso, Rendel, and Ostermann [2014], though significantly revised: Chapter 10 contains significantly revised text from that paper, and Chapters 16 and 19 survive mostly unchanged. This work was even more of a team effort. I initiated and led the overall project, and came up with the original notion of change structures. Cai Yufei contributed differentiation itself and its first correctness proof. Tillmann Rendel contributed significant simplifications and started the overall mechanization effort. Klaus Ostermann provided senior supervision that proved essential to the project. Chapters 12 and 13 contain a novel correctness proof for simply-typed λ-calculus; for its history and contributions, see Sec. 13.5.

Furthermore, Chapter 17 comes from a recently submitted manuscript [Giarrusso, Régis-Gianas and Schuster, Submitted]. I designed the overall approach, the transformation and the case study on sequences and nested loops. Proofs were done in collaboration with Yann Régis-Gianas: he is the main author of the Coq mechanization and proof, though I contributed significantly to the correctness proof for ILC, in particular with the proofs described in Appendix C and their partial Agda mechanization. The history of this correctness proof is described in Appendix C.9. Philipp Schuster contributed to the evaluation, devising language plugins for bags and maps in collaboration with me.

1.1.2 Excluded papers

This thesis improves modularity of collection queries by automating different sorts of optimizations on them.

During my PhD work, I collaborated on several other papers, mostly on different facets of modularity, which do not appear in this thesis.

Modularity. Module boundaries hide implementation information that is not relevant to clients. However, it often turns out that some scenarios, clients or tasks require such hidden information. As discussed, manual optimization requires hidden information; hence researchers strive to automate optimizations. But other tasks often violate these boundaries as well. I collaborated on research on understanding why this happens and how to deal with this problem [Ostermann, Giarrusso, Kästner and Rendel 2011].

Software Product Lines. In some scenarios, a different form of modular software is realized through software product lines (SPLs), where many variants of a single software artifact (with different sets of features) can be generated. This is often done using conditional compilation, for instance through the C preprocessor.

But if a software product line uses conditional compilation, processing its source code automatically is difficult; in fact, even lexing or parsing it is a challenge. Since the number of variants is exponential in the number of features, generating and processing each variant does not scale.

To address these challenges, I collaborated within the TypeChef project to develop a variability-aware lexer [Kästner, Giarrusso and Ostermann 2011a] and parser [Kästner, Giarrusso, Rendel, Erdweg, Ostermann and Berger 2011b].

Another challenge is predicting non-functional properties (such as code size or performance) of a variant of a software product line before generating and measuring it directly. Such predictions can be useful to efficiently select a variant that satisfies some non-functional requirements. I collaborated on research on predicting such non-functional properties: we measure the properties on a few variants and extrapolate the results to other variants. In particular, I collaborated on the experimental evaluation on a few open-source projects, where even a simple model reached significant accuracy in a few scenarios [Siegmund, Rosenmüller, Kästner, Giarrusso, Apel and Kolesnikov 2011, 2013].

Domain-specific languages. This thesis is concerned with DSLs: both ones for the domain of collection queries and (in Part II) more general ones.

Different forms of language composition, both among DSLs and across general-purpose languages and DSLs, are possible, but their relationship is nontrivial. I collaborated on work classifying the different forms of language composition [Erdweg, Giarrusso and Rendel 2012].

The implementation of SQuOpt as described in Part I requires an extensible representation of ASTs, similar to the one used by LMS [Rompf and Odersky 2010]. While this representation is pretty flexible, it relies on Scala's support for GADTs [Emir et al. 2006, 2007b], which is known to be somewhat fragile due to implementation bugs. In fact, sound pattern matching on Scala's extensible GADTs is impossible without imposing significant restrictions on extensibility, due to language extensions not considered during formal study [Emir et al. 2006, 2007a]: the problem arises from the interaction between GADTs, declaration-site variance annotations and variant refinement of type arguments at inheritance time. To illustrate the problem, I have shown it already applies to an extensible definitional interpreter for λ<: [Giarrusso 2013]. While solutions have been investigated in other settings [Scherer and Rémy 2013], the problem remains open for Scala to this day.

1.1.3 Navigating this thesis

The two parts of this thesis, while related by the common topic of collection queries, can be read mostly independently from each other. Summaries of the two parts are given respectively in Sec. 2.1 and Sec. 10.7.1.

Numbering. We use numbering and hyperlinks to help navigation; we explain our conventions here to enable exploiting them.

To simplify navigation, we number all sorts of "theorems" (including here definitions, remarks, even examples or descriptions of notation) per chapter, with counters including section numbers. For instance, Definition 12.2.1 is the first such item in Chapter 12, followed by Theorem 12.2.2, and they both appear in Sec. 12.2. And we can be sure that Lemma 12.2.8 comes after both "theorems" because of its number, even though they are of different sorts.

Similarly, we number tables and figures per chapter, with a shared counter. For instance, Fig. 6.1 is the first figure or table in Chapter 6, followed by Table 6.2.


To help reading, at times we will repeat or anticipate statements of "theorems" to refer to them without forcing readers to jump back and forth. We will then reuse the original number. For instance, Sec. 12.2 contains a copy of Slogan 10.3.3 with its original number.

For ease of navigation, all such references are hyperlinked in electronic versions of this thesis, and the table of contents is exposed to PDF readers via PDF bookmarks.


Part I

Optimizing Collection Queries by Reification


Chapter 2

Introduction

In-memory collections of data often need efficient processing. For on-disk data, efficient processing is already provided by database management systems (DBMS), thanks to their query optimizers, which support many optimizations specific to the domain of collections. Moving in-memory data to DBMSs, however, typically does not improve performance [Stonebraker et al. 2007], and query optimizers cannot be reused separately, since DBMSs are typically monolithic and their optimizers deeply integrated. A few collection-specific optimizations, such as shortcut fusion [Gill et al. 1993], are supported by compilers for purely functional languages such as Haskell. However, the implementation techniques for those optimizations do not generalize to many other ones, such as support for indexes. In general, collection-specific optimizations are not supported by the general-purpose optimizers used by typical (JIT) compilers.

Therefore, programmers needing collection-related optimizations must perform them manually. To allow that, they are often forced to perform manual inlining [Peyton Jones and Marlow 2002]. But manual inlining modifies source code by combining distinct functions together, while often distinct functions should remain distinct, because they deal with different concerns, or because one function needs to be reused in a different context. In either case, manual inlining reduces modularity, defined here as the ability to abstract behavior in a separate function (possibly part of a different module) to enable reuse and improve understandability.

For these reasons, developers currently need to choose between modularity and performance, as also highlighted by Kiczales et al. [1997] on a similar example. Instead, we envision that they should rely on an automatic optimizer performing inlining and collection-specific optimizations. They would then achieve both performance and modularity.1

One way to implement such an optimizer would be to extend the compiler of the language with a collection-specific optimizer, or to add some kind of external preprocessor to the language. However, such solutions would be rather brittle (for instance, they lack composability with other language extensions), and they would preclude optimization opportunities that arise only at runtime.

For this reason, our approach is implemented as an embedded domain-specific language (EDSL), that is, as a regular library. We call this library SQuOpt, the Scala QUery OPTimizer. SQuOpt consists of a Scala EDSL for queries on collections, based on the Scala collections API. An expression in this EDSL produces at run time an expression tree in the host language: a data structure which represents the query to execute, similar to an abstract syntax tree (AST) or a query plan. Thanks to

1In the terminology of Kiczales et al. [1997], our goal is to be able to decompose different generalized procedures of a program according to its primary decomposition, while separating the handling of some performance concerns. To this end, we are modularizing these performance concerns into a metaprogramming-based optimization module, which we believe could be called, in that terminology, an aspect.



the extensibility of Scala, expressions in this language look almost identical to expressions with the same meaning in Scala. When executing the query, SQuOpt optimizes and compiles these expression trees for more efficient execution. Doing optimization at run time instead of at compile time avoids the need for control-flow analyses to determine which code will actually be executed [Chambers et al. 2010], as we will see later.

We have chosen Scala [Odersky et al. 2011] to implement our library for two reasons: (i) Scala is a good meta-language for EDSLs, because it is syntactically flexible and has a powerful type system, and (ii) Scala has a sophisticated collections library with an attractive syntax (for-comprehensions) to specify queries.

To evaluate SQuOpt, we study queries of the FindBugs tool [Hovemeyer and Pugh 2004]. We rewrote a set of queries to use the Scala collections API and show that modularization incurs significant performance overhead. Subsequently, we consider versions of the same queries using SQuOpt. We demonstrate that the automatic optimization can reconcile modularity and performance in many cases. Adding advanced optimizations such as indexing can even improve the performance of the analyses beyond the original non-modular analyses.

2.1 Contributions and summary

Overall, our main contributions in Part I are the following:

• We illustrate the tradeoff between modularity and performance when manipulating collections, caused by the lack of domain-specific optimizations (Sec. 2.2). Conversely, we illustrate how domain-specific optimizations lead to more readable and more modular code (Chapter 3).

• We present the design and implementation of SQuOpt, an embedded DSL for queries on collections in Scala (Chapter 4).

• We evaluate SQuOpt to show that it supports writing queries that are at the same time modular and fast. We do so by re-implementing several code analyses of the FindBugs tool. The resulting code is more modular and/or more efficient, in some cases by orders of magnitude. In these case studies, we measured average speedups of 12x, with a maximum of 12800x (Chapter 6).

2.2 Motivation

In this section, we show how the absence of collection-specific optimizations forces programmers to trade modularity against performance, which motivates our design of SQuOpt to resolve this conflict.

As our running example throughout this chapter, we consider representing and querying a simple in-memory bibliography. A book has, in our schema, a title, a publisher and a list of authors. Each author, in turn, has a first and a last name. We represent authors and books as instances of the Scala classes Author and Book, shown in Fig. 2.1. The class declarations list the type of each field: titles, publishers, and first and last names are all stored in fields of type String. The list of authors is stored in a field of type Seq[Author], that is, a sequence of authors, something that would be more complex to model in a relational database. The code fragment also defines a collection of books, named books.

As a common idiom to query such collections, Scala provides for-comprehensions. For instance, the for-comprehension computing records in Fig. 2.2 finds all books published by Pearson Education and yields, for each of those books and for each of its authors, a record containing the book title,


package schema
case class Author(firstName: String, lastName: String)
case class Book(title: String, publisher: String,
                authors: Seq[Author])

val books: Set[Book] = Set(
  new Book("Compilers: Principles, Techniques and Tools",
           "Pearson Education",
           Seq(new Author("Alfred V.", "Aho"),
               new Author("Monica S.", "Lam"),
               new Author("Ravi", "Sethi"),
               new Author("Jeffrey D.", "Ullman"))),
  /* other books... */)

Figure 2.1: Definition of the schema and of some content.

the full name of that author and the number of additional coauthors. The generator book <- books functions like a loop header: the remainder of the for-comprehension is executed once per book in the collection. Consequently, the generator author <- book.authors starts a nested loop. The return value of the for-comprehension is a collection of all yielded records. Note that if a book has multiple authors, this for-comprehension will return multiple records relative to this book, one for each author.

We can further process this collection with another for-comprehension, possibly in a different module. For example, still in Fig. 2.2, the function titleFilter filters book titles containing the word "Principles" and drops from each record the number of additional coauthors.

In Scala, the implementation of for-comprehensions is not fixed. Instead, the compiler desugars a for-comprehension to a series of API calls, and different collection classes can implement this API differently. Later, we will use this flexibility to provide an optimizing implementation of for-comprehensions, but in this section we focus on the behavior of the standard Scala collections, which implement for-comprehensions as loops that create intermediate collections.
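For instance, a minimal comprehension and its desugaring (following the standard translation scheme; the comprehension on the first line and the explicit calls on the second are equivalent):

val evensDoubled  = for (x <- List(1, 2, 3, 4) if x % 2 == 0) yield x * 2
// desugared by the compiler to:
val evensDoubled2 = List(1, 2, 3, 4).withFilter(x => x % 2 == 0).map(x => x * 2)
// both evaluate to List(4, 8)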

2.2.1 Optimizing by Hand

In the naive implementation in Fig. 2.2, different concerns are separated, hence it is modular. However, it is also inefficient. To execute this code, we first build the original collection and only later perform further processing to build the new result; creating the intermediate collection at the interface between these functions is costly. Moreover, the same book can appear in records more than once if the book has more than one author, but all of these duplicates have the same title. Nevertheless, we test each duplicate title separately to check whether it contains the searched keyword. If books have 4 authors on average, this means a slowdown of a factor of 4 for the filtering step.

In general, one can only resolve these inefficiencies by manually optimizing the query; however, we will observe that these manual optimizations produce less modular code.2

To address the first problem above, that is, to avoid creating intermediate collections, we can manually inline titleFilter and records; we obtain two nested for-comprehensions. Furthermore, we can unnest the inner one [Fegaras and Maier 2000].

2The existing Scala collections API supports optimization, for instance through non-strict variants of the query operators (called 'views' in Scala), but they can only be used for a limited set of optimizations, as we discuss in Chapter 8.


case class BookData(title: String, authorName: String, coauthors: Int)

val records =
  for {
    book <- books
    if book.publisher == "Pearson Education"
    author <- book.authors
  } yield new BookData(book.title,
    author.firstName + " " + author.lastName,
    book.authors.size - 1)

def titleFilter(records: Set[BookData],
                keyword: String) =
  for {
    record <- records
    if record.title.contains(keyword)
  } yield (record.title, record.authorName)

val res = titleFilter(records, "Principles")

Figure 2.2: Our example query on the schema in Fig. 2.1, and a function which postprocesses its result.

To address the second problem above, that is, to avoid testing the same title multiple times, we hoist the filtering step; that is, we change the order of the processing steps in the query, to first look for keyword within book.title and then iterate over the set of authors. This does not change the overall semantics of the query, because the filter only accesses the title and does not depend on the author. In the end, we obtain the code in Fig. 2.3. The resulting query processes the title of each book only once. Since filtering in Scala is done lazily, the resulting query avoids building an intermediate collection.

This second optimization is only possible after inlining, and hence after reducing the modularity of the code, because it mixes together processing steps from titleFilter and from the definition of records. Therefore, reusing the code creating records would now be harder.

To make titleFilterHandOpt more reusable, we could turn the publisher name into a parameter. However, the new versions of titleFilter cannot be reused as-is if some details of the inlined code change; for instance, we might need to filter publishers differently, or not at all. On the other hand, if we express queries modularly, we might lose some opportunities for optimization. The design of the collections API, both in Scala and in typical languages, forces us to manually optimize our code by repeated inlining and subsequent application of query optimization rules, which leads to a loss of modularity.


def titleFilterHandOpt(books: Set[Book],
                       publisher: String, keyword: String) =
  for {
    book <- books
    if book.publisher == publisher && book.title.contains(keyword)
    author <- book.authors
  } yield (book.title, author.firstName + " " + author.lastName)

val res = titleFilterHandOpt(books,
  "Pearson Education", "Principles")

Figure 2.3: Composition of the queries in Fig. 2.2, after inlining, query unnesting and hoisting.


Chapter 3

Automatic optimization with SQuOpt

The goal of SQuOpt is to let programmers write queries modularly and at a high level of abstraction, and to delegate optimization to a dedicated domain-specific optimizer. In our concrete example, programmers should be able to write queries similar to the one in Fig. 2.2, but get the efficiency of the one in Fig. 2.3. To allow this, SQuOpt overloads for-comprehensions and other constructs, such as string concatenation with + and field access, as in book.author. Our overloads of these constructs reify the query as an expression tree. SQuOpt can then optimize this expression tree and execute the resulting optimized query. Programmers explicitly trigger processing by SQuOpt by adapting their queries, as we describe in the next subsection.

3.1 Adapting a Query

To use SQuOpt instead of native Scala queries, we first assume that the query does not use side effects and is thus purely functional. We argue that purely functional queries are more declarative. Side effects are often used to improve performance, but SQuOpt makes that unnecessary through automatic optimizations; in fact, the lack of side effects enables more optimizations.

In Fig. 3.1 we show a version of our running example adapted to use SQuOpt. We first discuss the changes to records. To enable SQuOpt, a programmer needs to (a) import the SQuOpt library, (b) import some wrapper code specific to the types the collection operates on, in this case Book and Author (more about that later), (c) explicitly convert the native Scala collections involved to collections of our framework, by a call to asSquopt, (d) rename a few operators, such as == to ==# (this is necessary due to some Scala limitations), and (e) add a separate step where the query is evaluated (possibly after optimization). All these changes are lightweight and mostly of a syntactic nature.

For parameterized queries like titleFilter, we also need to adapt type annotations. The ones in titleFilterQuery reveal some details of our implementation: expressions that are reified have type Exp[T] instead of T. As the code shows, resQuery is optimized before compilation. This call will perform the optimizations that we previously did by hand, after verifying their safety conditions, and will return a query equivalent to that in Fig. 2.3. For instance, after inlining, the filter if book.title.contains(keyword) does not reference author; hence it is safe to hoist. Note that checking this safety condition would not be possible without reifying the predicate: for instance, it would not be sufficient to only reify the calls to the collection API, because the predicate



import squopt._
import schema.squopt._

val recordsQuery =
  for {
    book <- books.asSquopt
    if book.publisher ==# "Pearson Education"
    author <- book.authors
  } yield new BookData(book.title,
    author.firstName + " " + author.lastName,
    book.authors.size - 1)

val records = recordsQuery.eval

def titleFilterQuery(records: Exp[Set[BookData]],
                     keyword: Exp[String]) =
  for {
    record <- records
    if record.title.contains(keyword)
  } yield (record.title, record.authorName)

val resQuery = titleFilterQuery(recordsQuery, "Principles")
val res = resQuery.optimize.eval

Figure 3.1: SQuOpt version of Fig. 2.2. recordsQuery contains a reification of the query; records, its result.


is represented as a boolean function parameter. In general, our automatic optimizer inspects the whole reification of the query implementation, to check that optimizations do not introduce changes in the overall result of the query and are therefore safe.

3.2 Indexing

SQuOpt also supports the transparent usage of indexes. Indexes can further improve the efficiency of queries, sometimes by orders of magnitude. In our running example, the query scans all books to look for the ones having the right publisher. To speed up this query, we can preprocess books to build an index, that is, a dictionary mapping each publisher to a collection of all the books it published. This index can then be used to answer the original query without scanning all books.

We construct a query representing the desired dictionary, and inform the optimizer that it should use this index where appropriate:

val idxByPublisher = books.asSquopt.indexBy(_.publisher)
Optimization.addIndex(idxByPublisher)

The indexBy collection method accepts a function that maps a collection element to a key: coll.indexBy(key) returns a dictionary mapping each key to the collection of all elements of coll having that key. Missing keys are mapped to an empty collection.1 Optimization.addIndex simply preevaluates the index and updates a dictionary mapping the index to its preevaluated result.
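To illustrate only the intended semantics, indexBy can be approximated in plain Scala via the standard groupBy, plus a default for missing keys (this sketch is not SQuOpt's implementation, which reifies the index as a query):

// Sketch: like groupBy, but missing keys map to the empty collection.
def indexBy[T, K](coll: Set[T])(key: T => K): Map[K, Set[T]] =
  coll.groupBy(key).withDefaultValue(Set.empty[T])

val byPublisher = indexBy(books)(_.publisher)
byPublisher("Pearson Education")  // all books with that publisher
byPublisher("No Such Publisher")  // Set(), instead of throwing NoSuchElementException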

A call to optimize on a query will then take this index into account, and rewrite the query to perform index lookups instead of scanning, if possible. For instance, the code in Fig. 3.1 would be transparently rewritten by the optimizer to a query similar to the following:

val indexedQuery =
  for {
    book <- idxByPublisher("Pearson Education")
    author <- book.authors
  } yield new BookData(book.title,
    author.firstName + " " + author.lastName,
    book.authors.size - 1)

Since dictionaries in Scala are functions, in the above code dictionary lookup on idxByPublisher is represented simply as function application. The above code iterates over the books having the desired publisher, instead of scanning the whole library, and performs the remaining computation from the original query. Although the index use in the listing above is notated as idxByPublisher("Pearson Education"), only the cached result of evaluating the index is used when the query is executed, not the reified index definition.
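As a reminder of the language feature this builds on: Scala's Map[K, V] extends the function type K => V, so a lookup is literally a function application.

val dict = Map("a" -> 1, "b" -> 2)
dict("a")  // 1: applying the map looks up the key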

This optimization could also be performed manually, of course, but the queries are on a higher abstraction level and more maintainable if indexing is defined separately and applied automatically. Manual application of indexing is a crosscutting concern, because adding or removing an index potentially affects many queries. SQuOpt does not free the developer from the task of assessing which index will 'pay off' (we have not considered automatic index creation yet), but at least it becomes simple to add or remove an index, since the application of the indexes is modularized in the optimizer.

1For readers familiar with the Scala collection API, we remark that the only difference with the standard groupBy method is the handling of missing keys.


Chapter 4

Implementing SQuOpt

After describing how to use SQuOpt, we explain how SQuOpt represents queries internally and optimizes them. Here we give only a brief overview of our implementation technique; it is described in more detail in Sec. 5.1.

4.1 Expression Trees

In order to analyze and optimize collection queries at runtime, SQuOpt reifies their syntactic structure as expression trees. The expression tree reflects the syntax of the query after desugaring, that is, after for-comprehensions have been replaced by API calls. For instance, recordsQuery from Fig. 3.1 points to the following expression tree (with some boilerplate omitted for clarity):

new FlatMap(
  new Filter(
    new Const(books),
    v2 => new Eq(new Book_publisher(v2),
                 new Const("Pearson Education"))),
  v3 => new MapNode(
    new Book_authors(v3),
    v4 => new BookData(
      new Book_title(v3),
      new StringConcat(
        new StringConcat(
          new Author_firstName(v4),
          new Const(" ")),
        new Author_lastName(v4)),
      new Plus(new Size(new Book_authors(v3)),
               new Negate(new Const(1))))))

The structure of the for-comprehension is encoded with the FlatMap, Filter and MapNode instances. These classes correspond to the API methods that for-comprehensions get desugared to: SQuOpt arranges for the implementation of flatMap to construct a FlatMap instance, and so on. The instances of the other classes encode the rest of the structure of the collection query, that is, which methods are called on which arguments. On the one hand, SQuOpt defines classes such as Const or Eq that are generic and applicable to all queries. On the other hand, classes such as Book_publisher cannot be predefined, because they are specific to the user-defined types used



in a query. SQuOpt provides a small code generator which creates a case class for each method and field of a user-defined type. Functions in the query are represented by functions that create expression trees; representing functions in this way is frequently called higher-order abstract syntax [Pfenning and Elliot 1988].

We can see that the reification of this code corresponds closely to an abstract syntax tree for the code which is executed; however, many calls to specific methods, like map, are represented by special nodes, like MapNode, rather than as generic method calls. For the optimizer, it becomes easier to match and transform those nodes than with a generic abstract syntax tree.

Nodes for collection operations are carefully defined by hand, to give them highly generic type signatures and make them reusable for all collection types. In Scala, collection operations are highly polymorphic: for instance, map has a single implementation working on all collection types, like List and Set, and we similarly want to represent all usages of map through instances of a single node type, namely MapNode. Having separate nodes ListMapNode, SetMapNode and so on would be inconvenient, for instance when writing the optimizer. However, map on a List[Int] will produce another List, while on a Set it will produce another Set, and so on for each specific collection type (in first approximation); moreover, this is guaranteed statically by the type of map. Yet, thanks to advanced type-system features, map is defined only once, avoiding redundancy, but has a type polymorphic enough to guarantee statically that the correct return value is produced. Since our tree representation is strongly typed, we need a similar level of polymorphism in MapNode. We achieved this by extending the techniques described by Odersky and Moors [2009], as detailed in our technical report [Giarrusso et al. 2012].
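To convey the flavor of such a generic node, here is a hedged sketch (not SQuOpt's actual definition, which is detailed in the technical report): a single node type can track the result collection type through a CanBuildFrom-style witness, in anticipation of the machinery discussed in Chapter 5.

import scala.collection.generic.CanBuildFrom

// Sketch: one node for map on all collections. The implicit witness relates
// the source collection type Repr, the new element type U and the result
// type That, so mapping over Exp[List[T]] is typed as Exp[List[U]], while
// mapping over Exp[Set[T]] is typed as Exp[Set[U]].
case class MapNodeSketch[T, U, Repr, That](
    coll: Exp[Repr], f: Exp[T] => Exp[U])(
    implicit cbf: CanBuildFrom[Repr, U, That])
  extends Exp[That]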

We obtain these expression trees by using Scala implicit conversions in a particular style, which we adopted from Rompf and Odersky [2010]. Implicit conversions allow us to add, for each method A.foo(B), an overload of Exp[A].foo(Exp[B]). Where a value of type Exp[T] is expected, a value of type T can be used, thanks to other implicit conversions which wrap it in a Const node. The initial call of asSquopt triggers the application of the implicit conversions, by converting the collection to the leaf of an expression tree.

It is also possible to call methods that do not return expression trees; however, such method calls would then only be represented by an opaque MethodCall node in the expression tree, which means that the code of the method cannot be considered in optimizations.

Crucially, these expression trees are generated at runtime. For instance, the first Const contains a reference to the actual collection of books to which books refers. If a query uses another query, such as records in Fig. 3.1, then the subquery is effectively inlined. The same holds for method calls inside queries: if these methods return an expression tree (such as the titleFilterQuery method in Fig. 3.1), then these expression trees are inlined into the composite query. Since the reification happens at runtime, it is not necessary to predict the targets of dynamically bound method calls: a new (and possibly different) expression tree is created each time a block of code containing queries is executed.

Hence, we can say that expression trees represent the computation which is going to be executed after inlining; control flow or virtual calls in the original code typically disappear, especially if they manipulate the query as a whole. This is typical of deeply embedded DSLs like ours, where code, instead of performing computations, produces a representation of the computation to perform [Elliott et al. 2003; Chambers et al. 2010].

This inlining can duplicate computations, for instance in this code:

val num: Exp[Int] = 10
val square = num * num
val sum = square + square


evaluating sum will evaluate square twice. Elliott et al. [2003] and we avoid this by using common-subexpression elimination.

4.2 Optimizations

Our optimizer currently supports several algebraic optimizations. Any query, and in fact every reified expression, can be optimized by calling the optimize function on it. The ability to optimize reified expressions that are not queries is useful: for instance, optimizing a function that produces a query is similar to a "prepared statement" in relational databases.

The optimizations we implemented are mostly standard in compilers [Muchnick 1997] or databases:

• Query unnesting merges a nested query into the containing one [Fegaras and Maier 2000; Grust and Scholl 1999], replacing, for instance,

for { val1 <- (for { val2 <- coll } yield f(val2)) }
yield g(val1)

with

for { val2 <- coll; val1 = f(val2) } yield g(val1)

• Bulk operation fusion fuses higher-order operators on collections.

• Filter hoisting tries to apply filters as early as possible; in database query optimization, it is known as selection pushdown. For filter hoisting, it is important that the full query is reified, because otherwise the dependencies of the filter condition cannot be determined.

• During optimization, we reduce tuple and case-class accesses. For instance, (a, b)._1 is simplified to a. This is important because the produced expression does not depend on b; removing this false dependency can allow, for instance, a filter containing this expression to be hoisted to a context where b is not bound.

• Indexing tries to apply one or more of the available indexes to speed up the query.

• Common subexpression elimination (CSE) avoids performing the same computation multiple times; we use techniques similar to Rompf and Odersky [2010].

• Smaller optimizations include constant folding, reassociation of associative operators, and removal of identity maps (coll.map(x => x), typically generated by the translation of for-comprehensions).

Each optimization is applied recursively bottom-up until it does not trigger anymore; different optimizations are composed in a fixed pipeline.

Optimizations are only guaranteed to be semantics-preserving if queries obey the restrictions we mentioned: for instance, queries should not involve side effects, such as assignments or I/O, and all collections used in queries should implement the specifications stated in the collections API. Obviously, the choice of optimizations involves many tradeoffs; for that reason, we believe it is all the more important that the optimizer is not hard-wired into the compiler, but implemented as a library, with potentially many different implementations.

To make changes to the optimizer more practical, we designed our query representation so that optimizations are easy to express; restricting to pure queries also helps. For instance,


filter fusion can be implemented in a few lines of code: just match against expressions of the form coll.filter(pred2).filter(pred1) and rewrite them:1

val mergeFilters = ExpTransformer {
  case Sym(Filter(Sym(Filter(coll, pred2)), pred1)) =>
    coll.filter(x => pred2(x) && pred1(x))
}

A more complex optimization, such as filter hoisting, requires only 20 lines of code.

We have implemented a prototype of the optimizer with the mentioned optimizations. Many

additional algebraic optimizations can be added in future work, by us or others; a candidate would be loop hoisting, which moves out of loops arbitrary computations not depending on the loop variable (and not just filters). With some changes to the optimizer's architecture, it would also be possible to perform cost-based and dynamic optimizations.

4.3 Query Execution

Calling the eval method on a query will convert it to executable bytecode; this bytecode will be loaded and invoked by using Java reflection. We produce a thunk that, when evaluated, will execute the generated code.

In our prototype, we produce bytecode by converting expression trees to Scala code and invoking on the result the Scala compiler, scalac. Invoking scalac is typically quite slow, and we currently use caching to limit this concern; however, we believe it is merely an engineering problem to produce bytecode directly from expression trees, just as compilers do.

Our expression trees contain native Scala values wrapped in Const nodes, and in many cases one cannot produce Scala program text evaluating to the same value. To allow executing such expression trees, we need to implement cross-stage persistence (CSP): the generated code will be a function accepting the actual values as arguments [Rompf and Odersky 2010]. This also allows sharing the compiled code between expressions which differ only in the embedded values.

In more detail, our compilation algorithm is as follows. (a) We implement CSP by replacing embedded Scala values by references to function arguments; for instance, List(1, 2, 3).map(x => x + 1) becomes the function (s1: List[Int], s2: Int) => s1.map(x => x + s2). (b) We look up the produced expression tree, together with the types of the constants we just removed, in a cache mapping to the generated classes; if the lookup fails, we update the cache with the result of the next steps. (c) We apply CSE on the expression. (d) We convert the tree to code, compile it, and load the generated code.

Preventing errors in generated code. Compiler errors in generated code are typically a concern; with SQuOpt, however, they can only arise due to implementation bugs in SQuOpt (for instance in pretty-printing, which cannot be checked statically), so they do not concern users. Since our query language and tree representation are statically typed, type-incorrect queries will be rejected statically. For instance, consider again idxByPublisher, described previously:

val idxByPublisher = books.asSquopt.indexBy(_.publisher)

Since Book.publisher returns a String, idxByPublisher has type Exp[Map[String, Book]]. Looking up a key of the wrong type, for instance by writing idxByPublisher(book) where book: Book, will make scalac emit a static type error.

1Sym nodes are part of the boilerplate we omitted earlier.

Chapter 5

A deep EDSL for collection queries

In this chapter, we discuss collections as a critical case study [Flyvbjerg 2006] for deep DSL embedding.

As discussed in the previous chapter, to support optimizations we require a deep embedding of the collections DSL.

While the basic idea of deep embedding is well known, it is not obvious how to realize deep embedding when considering the following additional goals:

• To support users adopting SQuOpt, a generic SQuOpt query should share the "look and feel" of the ordinary collections API. In particular, query syntax should remain mostly unchanged. In our case, we want to preserve Scala's for-comprehension1 syntax and its notation for anonymous functions.

• Again to support users adopting SQuOpt, a generic SQuOpt query should not only share the syntax of the ordinary collections API; it should also be well-typed if and only if the corresponding ordinary query is well-typed. This is particularly challenging in the Scala collections library, due to its deep integration with advanced type-system features such as higher-kinded generics and implicit objects [Odersky and Moors 2009]. For instance, calling map on a List will return a List, and calling map on a Set will return a Set. Hence, the object-language representation and the transformations thereof should be as "typed" as possible. This precludes, among others, a first-order representation of object-language variables as strings.

• SQuOpt should be interoperable with ordinary Scala code and Scala collections. For instance, it should be possible to call normal, non-reified functions within a SQuOpt query, or to mix native Scala queries and SQuOpt queries.

• The performance of SQuOpt queries should be reasonable even without optimizations. A non-optimized SQuOpt query should not be dramatically slower than a native Scala query. Furthermore, it should be possible to create new queries at run time and execute them without excessive overhead. This goal limits the options of applicable interpretation or compilation techniques.

We think that these challenges characterize deep embedding of queries on collections as a critical case study [Flyvbjerg 2006] for DSL embedding. That is, it is so challenging that embedding

1Also known as for expressions [Odersky et al. 2011, Ch. 23].



techniques successfully used in this case are likely to be successful on a broad range of other DSLs. In this chapter, we report from the case study the successes and failures of achieving these goals in SQuOpt.

5.1 Implementation: Expressing the interface in Scala

To optimize a query as described in the previous section, SQuOpt needs to reify, optimize and execute queries. Our implementation assigns responsibility for these steps to three main components: a generic library for reification and execution of general Scala expressions, a more specialized library for reification and execution of query operators, and a dedicated query optimizer. Queries then need to be executed, either through compilation (already discussed in Sec. 4.3) or through interpretation (discussed in Sec. 5.6). We describe the implementation in more detail in the rest of this section. The full implementation is also available online.2

A core idea of SQuOpt is to reify Scala code as a data structure in memory. A programmer could directly create instances of that data structure, but we also provide a more convenient interface, based on advanced Scala features such as implicit conversions and type inference. That interface allows code to be reified automatically, with a minimum of programmer annotations, as shown in the examples in Chapter 3. Since this is a case study on Scala's support for deep embedding of DSLs, we also describe in this section how Scala supports our task. In particular, we report on techniques we used and issues we faced.

5.2 Representing expression trees

In the previous section we have seen that expressions that would have type T in a native Scala query are reified and have type Exp[T] in SQuOpt. The generic type Exp[T] is the base of our reification of Scala expressions as expression trees, that is, as data structures in memory. We provide a subclass of Exp[T] for every different form of expression we want to reify. For example, in Fig. 2.2 the expression author.firstName + " " + author.lastName must be reified, even though it is not collection-related, for otherwise the optimizer could not see whether author is used. Knowing this is needed, for instance, to remove variables which are bound but not used. Hence, this expression is reified as:

StringConcat(StringConcat(AuthorFirstName(author), Const(" ")),
             AuthorLastName(author))

This example uses the constructors of the following subclasses of Exp[T] to create the expression tree:

case class Const[T](t: T) extends Exp[T]
case class StringConcat(str1: Exp[String], str2: Exp[String]) extends Exp[String]
case class AuthorFirstName(t: Exp[Author]) extends Exp[String]
case class AuthorLastName(t: Exp[Author]) extends Exp[String]

Expression nodes additionally implement support code for tree traversals, to support optimizations; we omit that code here.

This representation of expression trees is well-suited for representing the structure of expressions in memory, and also for pattern matching (which is automatically supported for case classes in Scala), but it is inconvenient for query writers. In fact, in Fig. 3.1 we have seen that SQuOpt provides a much more convenient front-end: the programmer writes almost the usual code for type T, and SQuOpt converts it automatically to Exp[T].
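To illustrate why the case-class encoding pays off, a rewrite can be expressed by direct pattern matching on the nodes defined above; this particular simplification is a hedged example, not necessarily one SQuOpt implements:

// Sketch: drop concatenations with the empty-string constant.
def simplifyConcat(e: Exp[String]): Exp[String] = e match {
  case StringConcat(s, Const("")) => s
  case StringConcat(Const(""), s) => s
  case other                      => other
}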

2http://www.informatik.uni-marburg.de/~pgiarrusso/SQuOpt


5.3 Lifting first-order expressions

We call the process of converting from T to Exp[T] lifting. Here we describe how we lift first-order expressions, that is, Scala expressions that do not contain anonymous function definitions.

To this end, consider again the fragment

author.firstName + " " + author.lastName

now in the context of the SQuOpt-enabled query in Fig. 3.1. It looks like a normal Scala expression, even syntactically unchanged from Fig. 2.2. However, evaluating that code in the context of Fig. 3.1 does not concatenate any strings, but creates an expression tree instead. Although the code looks like the same expression, it has a different type: Exp[String] instead of String. This difference in type is caused by the context: the variable author is now bound in a SQuOpt-enabled query, and therefore has type Exp[Author] instead of Author. We can still access the firstName field of author, because expression trees of type Exp[T] provide the same interface as values of type T, except that all operations return expression trees instead of values.

To understand how an expression tree of type Exp[T] can have the same interface as a value of type T, we consider two expression trees str1 and str2 of type Exp[String]. The implementation of lifting differs depending on the kind of expression we want to lift.

Method calls and operators. In our example, the operator + should be available on Exp[String] but not on Exp[Boolean], because + is available on String but not on Boolean. Furthermore, we want str1 + str2 to have type Exp[String], and to evaluate not to a string concatenation but to a call of StringConcat, that is, to an expression tree which represents str1 + str2. This is a somewhat unusual requirement, because usually the interface of a generic type does not depend on its type parameters.

To provide such operators and to encode expression trees, we use implicit conversions in a similar style as Rompf and Odersky [2010]. Scala allows making expressions of a type T implicitly convertible to a different type U. To this end, one must declare an implicit conversion function having type T => U. Calls to such functions will be inserted by the compiler when required to fix a type mismatch between an expression of type T and a context expecting a value of type U. In addition, a method call e.m(args) can trigger the conversion of e to a type where the method m is present.3 Similarly, an operator usage as in str1 + str2 can also trigger an implicit conversion: an expression using an operator, like str1 + str2, is desugared to the method call str1.+(str2), which can trigger an implicit conversion from str1 to a type providing the + method. Therefore, from now on, we do not distinguish between operators and methods.

To provide the method + on Exp[String], we define an implicit conversion from Exp[String] to a new type providing a + method which creates the appropriate expression node:

implicit def expToStringOps(t: Exp[String]) = new StringOps(t)
class StringOps(t: Exp[String]) {
  def +(that: Exp[String]): Exp[String] = StringConcat(t, that)
}

This is an example of the well-known Scala enrich-my-library pattern4 [Odersky 2006]. With these declarations in scope, the Scala compiler rewrites str1 + str2 to expToStringOps(str1).+(str2),

which evaluates to StringConcat(str1, str2), as desired. The implicit conversion function expToStringOps is not applicable to Exp[Boolean], because it explicitly specifies the receiver of the + call to have type Exp[String]. In other words, expressions like str1 + str2 are now lifted

3For the exact rules, see Odersky et al. [2011, Ch. 21] and Odersky [2011]. 4Also known as the pimp-my-library pattern.


to the level of expression trees in a type-safe way. For brevity, we refer to the defined operator as Exp[String].+.

Literal values. However, a string concatenation might also include constants, as in str1 + "foo" or "bar" + str1. To lift str1 + "foo", we introduce a lifting for constants, which wraps them in a Const node:

implicit def pure[T](t: T): Exp[T] = Const(t)

The compiler will now rewrite str1 + "foo" to expToStringOps(str1) + pure("foo"), and str1 + "foo" + str2 to expToStringOps(str1) + pure("foo") + str2. Different implicit conversions cooperate successfully to lift the expression.

Analogously, it would be convenient if the similar expression "bar" + str1 were rewritten to expToStringOps(pure("bar")) + str1, but this is not the case, because implicit coercions are not chained automatically in Scala. Instead, we have to manually chain existing implicit conversions into a new one:

implicit def toStringOps(t: String) = expToStringOps(pure(t))

so that "bar" + str1 is rewritten to toStringOps("bar") + str1.

User-defined methods. Calls of user-defined methods like author.firstName are lifted the

same way as calls to built-in methods such as string concatenation, shown earlier. For the running example, the following definitions are necessary to lift the methods from Author to Exp[Author]:

package schema.squopt

implicit def expToAuthorOps(t: Exp[Author]) = new AuthorOps(t)
implicit def toAuthorOps(t: Author) = expToAuthorOps(pure(t))

class AuthorOps(t: Exp[Author]) {
  def firstName: Exp[String] = AuthorFirstName(t)
  def lastName: Exp[String] = AuthorLastName(t)
}

Implicit conversions for user-defined types cooperate with the other ones; for instance, author.firstName + " " + author.lastName is rewritten to

(expToStringOps(expToAuthorOps(author).firstName) + pure(" ")) +
  expToAuthorOps(author).lastName

Author is not part of SQuOpt or the standard Scala library, but an application-specific class; hence SQuOpt cannot pre-define the necessary lifting code. Instead, the application programmer needs to provide this code, to connect SQuOpt to his application. To support the application programmer with this tedious task, we provide a code generator which discovers the interface of a class through reflection on its compiled version, and generates the boilerplate code, such as the one above for Author. It also generates the application-specific expression tree types, such as AuthorFirstName, shown in Sec. 5.2. In general, query writers need to generate and import the boilerplate lifting code for all application-specific types they want to use in a SQuOpt query.

If desired, we can exclude some methods, to restrict the language supported in our deeply embedded programs. For instance, SQuOpt requires the user to write side-effect-free queries; hence we do not lift methods which perform side effects.

Using similar techniques, we can also lift existing functions and implicit conversions.

Tuples and other generic constructors. The techniques presented above for the lifting of method calls rely on overloading the name of the method with a signature that involves Exp. Implicit


resolution (for method calls) will then choose our lifted version of the function or method, to satisfy the typing requirements of the context or the arguments of the call. Unfortunately, this technique does not work for tuple constructors, which in Scala are not resolved like ordinary calls. Instead, support for tuple types is hard-wired into the language, and tuples are always created by the predefined tuple constructors.

For example, the expression (str1, str2) will always call Scala's built-in Tuple2 constructor and correspondingly have type (Exp[String], Exp[String]). We would prefer that it call a lifting function and produce an expression tree of type Exp[(String, String)] instead.

Even though we cannot intercept the call to Tuple2, we can add an implicit conversion to be called after the tuple is constructed:

implicit def tuple2ToTuple2Exp[A1, A2](tuple: (Exp[A1], Exp[A2])): LiftTuple2[A1, A2] =
  LiftTuple2[A1, A2](tuple._1, tuple._2)

case class LiftTuple2[A1, A2](t1: Exp[A1], t2: Exp[A2]) extends Exp[(A1, A2)]
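For instance, a context that expects a reified pair triggers the conversion after the native tuple has been built:

// (str1, str2) first has type (Exp[String], Exp[String]); the implicit
// conversion then yields LiftTuple2(str1, str2): Exp[(String, String)].
val pair: Exp[(String, String)] = (str1, str2)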

We generate such conversions for different arities with a code generator. These conversions will be used only when the context requires an expression tree. Note that this technique is only applicable because tuples are generic and support arbitrary components, including expression trees.

In fact, we use the same technique also for other generic constructors, to avoid problems associated with shadowing of constructor functions. For example, an implicit conversion is used to lift Seq[Exp[T]] to Exp[Seq[T]]: code like Seq(str1, str2) first constructs a sequence of expression trees, and then wraps the result with an expression node that describes a sequence.

Subtyping. So far we have seen that for each first-order method m operating on instances of T, we can create a corresponding method which operates on Exp[T]. If the method accepts parameters having types A1, ..., An and has return type R, the corresponding lifted method will accept parameters having types Exp[A1], ..., Exp[An] and return type Exp[R]. However, Exp[T] also needs to support all methods that T inherits from its supertype S. To ensure this, we declare the type constructor Exp to be covariant in its type parameter, so that Exp[T] correctly inherits the liftings from Exp[S]. This works even with the enrich-my-library pattern, because implicit resolution respects subtyping in an appropriate way.
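In sketch form (SQuOpt's actual Exp carries more structure than shown here):

trait Exp[+T]  // covariant: Author <: AnyRef implies Exp[Author] <: Exp[AnyRef],
               // so liftings enriched onto Exp[AnyRef] also apply to Exp[Author]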

Limitations of lifting. Lifting methods of Any or AnyRef (the Scala types at the root of the inheritance hierarchy) is not possible with this technique: Exp[T] inherits such methods and makes them directly available, hence the compiler will not insert an implicit conversion. Therefore, it is not possible to lift expressions such as x == y; rather, we have to rely on developer discipline to use ==# and !=# instead of == and !=.

An expression like "foo" + "bar" + str1 is converted to

toStringOps("foo" + "bar") + str1

Hence, part of the expression is evaluated before being reified. This is harmless here, since we want "foo" + "bar" to be evaluated at compile time, that is, constant-folded, but in other cases it is preferable to prevent the constant folding. We will see later examples where queries on collections are evaluated before reification, defeating the purpose of our framework, and how we work around those.

5.4 Lifting higher-order expressions

We have shown how to lift first-order expressions; however, the interface of collections also uses higher-order methods, that is, methods that accept functions as parameters, and we need to lift them as well to reify the complete collection EDSL. For instance, the map method applies a function to


each element of a collection. In this section, we describe how we reify such expressions of function type.

Higher-order abstract syntax. To represent functions, we have to represent the bound variables in the function bodies. For example, a reification of str => str + " " needs to reify the variable str, of type String, in the body of the anonymous function. This reification should retain the information of where str is bound. We achieve this by representing bound variables using higher-order abstract syntax (HOAS) [Pfenning and Elliot 1988], that is, we represent them by variables bound at the meta level. To continue the example, the above function is reified as (str: Exp[String]) => str + " ". Note how the type of str in the body of this version is Exp[String], because str is a reified variable now. Correspondingly, the expression str + " " is lifted as described in the previous section.

With all operations in the function body automatically lifted, the only remaining syntactic difference between normal and lifted functions is the type annotation for the function's parameter. Fortunately, Scala's local type inference can usually deduce the argument type from the context, for example from the signature of the map operation being called. Type inference plays a dual role here: first, it allows the query writer to leave out the annotation, and second, it triggers lifting in the function body, by requesting a lifted function instead of a normal function. This is how, in Fig. 3.1, a single call to asSquopt triggers lifting of the overall query.

Note that reified functions have type Exp[A] => Exp[B] instead of the more regular Exp[A => B]. We chose the former over the latter to support Scala's syntax for anonymous functions and for-comprehensions, which is hard-coded to produce or consume instances of the predefined A => B type. We have to reflect this irregularity in the lifting of methods and functions, by treating the types of higher-order arguments accordingly.

User-defined methods, revised. We can now extend the lifting of signatures for methods or functions from the previous section to the general case, that is, the case of higher-order functions. We lift a method or function with signature

def m[A1, ..., An](a1: T1, ..., an: Tn): R

to a method or function with the following signature:

def m[A1, ..., An](a1: Li⟦T1⟧, ..., an: Li⟦Tn⟧): Li⟦R⟧

As before, the definition of the lifted method or function will return an expression node representing the call. If the original was a function, the lifted version is also defined as a function. If the original was a method on type T, then the lifted version is enriched onto T.

The type transformation Li converts the argument and return types of the method or function to be lifted. For most types, Li just wraps the type in the Exp type constructor, but function types are treated specially: Li recursively descends into function types to convert their arguments separately. Overall, Li behaves as follows:

Li⟦(A1, ..., An) => R⟧ = (Li⟦A1⟧, ..., Li⟦An⟧) => Li⟦R⟧
Li⟦A⟧ = Exp[A]  (for all other types A)

We can use this extended definition of method lifting to implement the lifting of map for Lists, that is, a method with signature Exp[List[T]].map(f: Exp[T] => Exp[U]):

implicit def expToListOps[T](coll: Exp[List[T]]) = new ListOps(coll)
implicit def toListOps[T](coll: List[T]) = expToListOps(coll)

class ListOps[T](coll: Exp[List[T]]) {
  def map[U](f: Exp[T] => Exp[U]) = ListMapNode(coll, Fun(f))
}


val records = books
  .withFilter(book => book.publisher == "Pearson Education")
  .flatMap(book => book.authors.map(author =>
    BookData(book.title,
      author.firstName + " " + author.lastName,
      book.authors.size - 1)))

Figure 5.1: Desugaring of the code in Fig. 2.2.

case class ListMapNode[T, U](coll: Exp[List[T]], mapping: Exp[T => U]) extends Exp[List[U]]
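With these liftings in scope, a lifted map call produces a reified node rather than a new list. A small sketch (it assumes an analogous lifted + on Exp[Int], in the style of the Plus node from Sec. 4.1):

val xs: Exp[List[Int]] = pure(List(1, 2, 3))
val ys: Exp[List[Int]] = xs.map(x => x + 1)
// ys is ListMapNode(Const(List(1, 2, 3)), Fun(...)), not List(2, 3, 4).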

Note how map's parameter f has type Exp[T] => Exp[U], as necessary to enable Scala's syntax for anonymous functions and automatic lifting of the function body. This implementation would work for queries on lists, but it does not support other collection types, or queries where the collection type changes. We show in Sec. 5.5 how SQuOpt integrates with such advanced features of the Scala collection EDSL.

5.5 Lifting collections

In this section, we first explain how for-comprehensions are desugared to calls to library functions, allowing an external library to give them a different meaning. We then summarize the needed information about the subset of the Scala collection EDSL that we reify, and present how we perform this reification. We finally present the reification of the running example (Fig. 3.1).

For-comprehensions. As we have seen, an idiomatic encoding of queries on collections in Scala is as for-comprehensions. Although Scala is an impure functional language and supports side-effectful for-comprehensions, only pure queries are supported in our framework, because this enables or simplifies many domain-specific analyses. Hence, we restrict our attention to pure queries.

The Scala compiler desugars for-comprehensions into calls to three collection methods, map, flatMap and withFilter, which we explain shortly; the query in Fig. 2.2 is desugared to the code in Fig. 5.1.

The compiler performs type inference and type checking on a for-comprehension only after desugaring it; this affords some freedom for the types of the map, flatMap and withFilter methods.

The Scala Collection EDSL. A Scala collection containing elements of type T implements the trait Traversable[T]. On an expression coll of type Traversable[T], one can invoke methods declared (in first approximation) as follows:

def map[U](f: T => U): Traversable[U]
def flatMap[U](f: T => Traversable[U]): Traversable[U]
def withFilter(p: T => Boolean): Traversable[T]

For a Scala collection coll, the expression coll.map(f) applies f to each element of coll, collects the results in a new collection, and returns it; coll.flatMap(f) applies f to each element of coll, concatenates the results in a new collection, and returns it; coll.withFilter(p) produces a collection containing the elements of coll which satisfy the predicate p.
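For instance, on a concrete sequence:

val xs = Seq(1, 2, 3)
xs.map(x => x * 2)                  // Seq(2, 4, 6)
xs.flatMap(x => Seq(x, -x))         // Seq(1, -1, 2, -2, 3, -3)
xs.withFilter(_ > 1).map(identity)  // Seq(2, 3); withFilter is lazy, map forces it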

However, Scala supports many different collection types, and this complicates the actual types of these methods. Each collection can further implement subtraits like Seq[T] <: Traversable[T]


(for sequences), Set[T] <: Traversable[T] (for sets) and Map[K, V] <: Traversable[(K, V)] (for dictionaries); for each such trait, different implementations are provided.

One consequence of this syntactic desugaring is that a single for-comprehension can operate over different collection types. The type of the result depends essentially on the type of the root collection, that is, books in the example above. The example above can hence be altered to produce a sequence rather than a set by simply converting the root collection to another type:

val recordsSeq = for {
  book ← books.toSeq
  if book.publisher == "Pearson Education"
  author ← book.authors
} yield BookData(book.title, author.firstName + " " + author.lastName,
  book.authors.size - 1)

Precise static typing  The Scala collections EDSL achieves precise static typing while avoiding code duplication [Odersky and Moors 2009]. Precise static typing is necessary because the return type of a query operation can depend on subtle details of the base collection and query arguments. To return the most specific static type, the Scala collection DSL defines a type-level relation between the source collection type, the element type for the transformed collection, and the type for the resulting collection. The relation is encoded through the concept pattern [Oliveira et al. 2010], i.e. through a type-class-style trait CanBuildFrom[From, Elem, To], and elements of the relation are expressed as implicit instances.

For example, a finite map can be treated as a set of pairs, so that mapping a function from pairs to pairs over it produces another finite map. This behavior is encoded in an instance of type CanBuildFrom[Map[K, V], (K1, V1), Map[K1, V1]]. The Map[K, V] is the type of the base collection, (K1, V1) is the return type of the function, and Map[K1, V1] is the return type of the map operation.

It is also possible to map some other function over the finite map, but the result will be a general collection instead of a finite map. This behavior is described by an instance of type CanBuildFrom[Traversable[T], U, Traversable[U]]. Note that this instance is also applicable to finite maps, because Map is a subclass of Traversable. Together, these two instances describe how to compute the return type of mapping over a finite map.
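For instance:

val m = Map(1 -> "a", 2 -> "b")
m.map { case (k, v) ⇒ (v, k) }  // Map(a -> 1, b -> 2): still a Map, via the first instance
m.map { case (k, v) ⇒ k + 1 }   // a general collection containing 2 and 3, via the second instance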

Code reuse  Even though these two use cases for mapping over a finite map have different return types, they are implemented as a single method that uses its implicit CanBuildFrom parameter to compute both the static type and the dynamic representation of its result. So the Scala Collections EDSL provides precise typing without code duplication. In our deep embedding, we want to preserve this property.

CanBuildFrom is used in the implementation of map, flatMap and withFilter. To further increase reuse, the implementations are provided in a helper trait TraversableLike[T, Repr] with the following signatures:

def map[U, That](f: T ⇒ U)(implicit cbf: CanBuildFrom[Repr, U, That]): That
def flatMap[U, That](f: T ⇒ Traversable[U])(implicit cbf: CanBuildFrom[Repr, U, That]): That
def withFilter(p: T ⇒ Boolean): Repr

The Repr type parameter represents the specific type of the receiver of the method call.

The lifting  The basic idea is to use the enrich-my-library pattern to lift collection methods from TraversableLike[T, Repr] to TraversableLikeOps[T, Repr]:

implicit def expToTraversableLikeOps[T, Repr](v: Exp[TraversableLike[T, Repr]]) =
  new TraversableLikeOps[T, Repr](v)


class TraversableLikeOps[T, Repr <: Traversable[T] with TraversableLike[T, Repr]](val t: Exp[Repr]) {

  def withFilter(f: Exp[T] ⇒ Exp[Boolean]): Exp[Repr] =
    Filter(this.t, FuncExp(f))

  def map[U, That <: TraversableLike[U, That]](f: Exp[T] ⇒ Exp[U])
      (implicit c: CanBuildFrom[Repr, U, That]): Exp[That] =
    MapNode(this.t, FuncExp(f))

  // other methods; definitions of MapNode and Filter omitted
}

Figure 5.2: Lifting TraversableLike.

Subclasses of TraversableLike[T, Repr] also subclass both Traversable[T] and Repr; to take advantage of this during interpretation and optimization, we need to restrict the type of expToTraversableLikeOps, getting the following conversion:⁵

implicit def expToTraversableLikeOps[T, Repr <: Traversable[T] with TraversableLike[T, Repr]](v: Exp[Repr]) =
  new TraversableLikeOps[T, Repr](v)

The query operations are defined in class TraversableLikeOps[T, Repr]; a few examples are shown in Fig. 5.2.⁶

Note how the lifted query operations use CanBuildFrom to compute the same static return type as the corresponding non-lifted query operation would compute. This reuse of type-level machinery allows SQuOpt to provide the same interface as the Scala Collections EDSL.

Code reuse revisited  We already saw how we could reify List[T].map through a specific expression node ListMapNode. However, this approach would require generating many variants for different collections with slightly different types; writing an optimizer able to handle all such variations would be unfeasible because of the amount of code duplication required. Instead, by reusing Scala type-level machinery, we obtain a reification which is statically typed and at the same time avoids code duplication in both our lifting and our optimizer, and in general in all possible consumers of our reification, making them feasible to write.

5.6 Interpretation

After optimization, SQuOpt needs to interpret the optimized expression trees to perform the query. Therefore the trait Exp[T] declares a def interpret(): T method, and each expression node overrides it appropriately to implement a mostly standard typed, tagless [Carette et al. 2009], environment-based interpreter. The interpreter computes a value of type T from an expression tree of type Exp[T]. This design allows query writers to extend the interpreter to handle application-specific operations; in fact, the lifting generator described in Sec. 5.4 automatically generates appropriate definitions of interpret for the lifted operations.

⁵Due to type inference bugs, the actual implicit conversion needs to be slightly more complex, to mention T directly in the argument type. We reported the bug at https://issues.scala-lang.org/browse/SI-5298.

⁶Similar liftings are introduced for traits similar to TraversableLike, like SeqLike, SetLike, MapLike and so on.


For example, the interpretation of string concatenation is simply string concatenation, as shown in the following fragment of the interpreter. Note that type-safety of the interpreter is checked automatically by the Scala compiler when it compiles the fragments:

case class StringConcat(str1: Exp[String], str2: Exp[String]) extends Exp[String] {
  def interpret() = str1.interpret() + str2.interpret()
}
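Assuming a Const node for constants (such a node appears again in Sec. 5.7), a tree can then be built and interpreted directly:

val e: Exp[String] = StringConcat(Const("Hello, "), Const("world"))
e.interpret()  // returns "Hello, world"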

The subset of Scala we reify roughly corresponds to a typed lambda calculus with subtyping and type constructors. It does not include constructs for looping or recursion, so it should be strongly normalizing, as long as application programmers do not add expression nodes with non-terminating interpretations. However, query writers can use the host language to create a reified expression of infinite size. This should not be an issue if SQuOpt is used as a drop-in replacement for the Scala Collection EDSL.

During optimization, nodes of the expression tree might get duplicated; the interpreter could in principle observe this sharing and treat the expression tree as a DAG, to avoid recomputing results. Currently we do not exploit this, unlike during compilation.

5.7 Optimization

Our optimizer is structured as a pipeline of different transformations on a single intermediate representation, constituted by our expression trees. Each phase of the pipeline, and the pipeline as a whole, produce a new expression having the same type as the original one. Most of our transformations express simple rewrite rules, with or without side conditions, which are applied on expression trees from the bottom up and are implemented using Scala's support for pattern matching [Emir et al. 2007b].

Some optimizations, like filter hoisting (which we applied manually to produce the code in Fig. 2.3), are essentially domain-specific and can improve the complexity of a query. To enable such optimizations to trigger, however, one often needs to perform inlining-like transformations and to simplify the result. Inlining-related transformations can for instance produce code like (x, y)._1, which we simplify to x, reducing abstraction overhead but also (more importantly) making syntactically clear that the result does not depend on y, hence might be computed before y is even bound. This simplification extends to user-defined product types: with the definitions in Fig. 2.1, code like BookData(book.title, …).title is simplified to book.title.
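Such simplifications are expressible as rewrite rules through pattern matching; a minimal sketch follows, where Pair and Proj1 are hypothetical node names, not necessarily the ones in our implementation:

case class Pair[A, B](a: Exp[A], b: Exp[B]) extends Exp[(A, B)]
case class Proj1[A, B](p: Exp[(A, B)]) extends Exp[A]

// One bottom-up rewrite rule: (x, y)._1 is rewritten to x.
def simplifyProj1(e: Exp[_]): Exp[_] = e match {
  case Proj1(Pair(a, _)) ⇒ a
  case _                 ⇒ e
}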

We have thus implemented optimizations of three classes:

• general-purpose simplifications, like inlining, compile-time beta-reduction, constant folding, reassociation on primitive types, and other simplifications;⁷

• domain-specific simplifications, whose main goal is still to enable more important optimizations;

• domain-specific optimizations which can change the complexity class of a query, such as filter hoisting, hash-join transformation or indexing.

Among domain-specific simplifications, we implement a few described in the context of the monoid comprehension calculus [Grust and Scholl 1996b; Grust 1999], such as query unnesting and fusion of bulk operators. Query unnesting allows unnesting a for-comprehension nested inside another, producing a single for-comprehension. Furthermore, we can fuse different collection

⁷Beta-reduction and simplification are run in a fixpoint loop [Peyton Jones and Marlow 2002]. Termination is guaranteed because our language does not admit general recursion.


operations together: collection operators like map, flatMap and withFilter can be expressed as folds producing new collections, and such folds can be combined. Scala for-comprehensions are however more general than monoid comprehensions;⁸ hence, to ensure safety of some optimizations, we need some additional side conditions.⁹
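For instance, query unnesting rewrites, in supported cases, a nested comprehension into a flat one; assuming xs and zs are lists, the two queries below are equivalent:

// Nested: an inner for-comprehension appears inside the outer generator.
for { x ← xs; y ← (for { z ← zs } yield z * 2) } yield x + y

// After unnesting: a single flat comprehension.
for { x ← xs; z ← zs } yield x + z * 2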

Manipulating functions  To be able to inspect a HOAS function body funBody: Exp[S] ⇒ Exp[T], like str ⇒ str + "!", we convert it to first-order abstract syntax (FOAS), that is, to an expression tree of type Exp[T]. To this end, we introduce a representation of variables and a generator of fresh ones; since variable names are auto-generated, they are internally represented simply as integers instead of strings, for efficiency.

To convert funBody from HOAS to FOAS, we apply it to a fresh variable v of type TypedVar[S], obtaining a first-order representation of the function body, having type Exp[T] and containing occurrences of v.

This transformation is hidden in the constructor Fun, which converts Exp[S] ⇒ Exp[T], a representation of an expression with one free variable, to Exp[S ⇒ T], a representation of a function:

case class App[S, T](f: Exp[S ⇒ T], t: Exp[S]) extends Exp[T]

def Fun[S, T](funBody: Exp[S] ⇒ Exp[T]): Exp[S ⇒ T] = {
  val v = Fun.gensym[S]()
  FOASFun(funBody(v), v)
}

case class FOASFun[S, T](foasBody: Exp[T], v: TypedVar[S]) extends Exp[S ⇒ T]

implicit def app[S, T](f: Exp[S ⇒ T]): Exp[S] ⇒ Exp[T] =
  arg ⇒ App(f, arg)

Conversely, function applications are represented using the constructor App; an implicit conversion allows App to be inserted implicitly. Whenever f can be applied to arg and f is an expression tree, the compiler will convert f(arg) to app(f)(arg), that is, App(f, arg).

In our example, Fun(str ⇒ str + "!") produces
FOASFun(StringConcat(TypedVar[String](1), Const("!")), TypedVar[String](1)).

Since we auto-generate variable names, it is easiest to represent variable occurrences using the Barendregt convention, where bound variables must always be globally unique; we must be careful to perform renaming after beta-reduction to restore this invariant [Pierce 2002, Ch. 6].

We can now easily implement substitution and beta-reduction and, through that, as shown before, enable other optimizations to trigger more easily and speed up queries.
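The following self-contained sketch (deliberately independent of our Exp hierarchy) illustrates the interplay between substitution, beta-reduction and the renaming needed to restore the Barendregt convention:

sealed trait Tm
case class V(n: Int) extends Tm
case class Lam(v: Int, body: Tm) extends Tm
case class Ap(f: Tm, arg: Tm) extends Tm

var fresh = 0
def gensym(): Int = { fresh += 1; fresh }

// Substitution; each inserted copy of arg is refreshed, so binders stay globally unique.
def subst(t: Tm, v: Int, arg: Tm): Tm = t match {
  case V(`v`)    ⇒ refresh(arg)
  case V(_)      ⇒ t
  case Lam(x, b) ⇒ Lam(x, subst(b, v, arg))
  case Ap(f, a)  ⇒ Ap(subst(f, v, arg), subst(a, v, arg))
}

// Rename all binders in t to fresh names, restoring the Barendregt convention.
def refresh(t: Tm): Tm = t match {
  case V(_) ⇒ t
  case Lam(x, b) ⇒
    val y = gensym()
    Lam(y, refresh(subst(b, x, V(y))))
  case Ap(f, a) ⇒ Ap(refresh(f), refresh(a))
}

// One step of beta-reduction at the root.
def beta(t: Tm): Tm = t match {
  case Ap(Lam(x, b), arg) ⇒ subst(b, x, arg)
  case _                  ⇒ t
}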

⁸For instance, a for-comprehension producing a list cannot iterate over a set.

⁹For instance, consider a for-comprehension producing a set and nested inside another producing a list. This comprehension does not correspond to a valid monoid comprehension (see previous footnote), and query unnesting does not apply here: if we unnested the inner comprehension into the outer one, we would not perform duplicate elimination on the inner comprehension, affecting the overall result.


Chapter 6

Evaluating SQuOpt

The key goals of SQuOpt are to reconcile modularity and efficiency. To evaluate this claim, we perform a rigorous performance evaluation of queries with and without SQuOpt. We also analyze the modularization potential of these queries and evaluate how modularization affects performance (with and without SQuOpt).

We show that modularization introduces a significant slowdown. The overhead of using SQuOpt is usually moderate, and optimizations can compensate for this overhead, remove the modularization slowdown, and improve performance of some queries by orders of magnitude, especially when indexes are used.

6.1 Study Setup

Throughout this chapter, we have already shown several compact queries for which our optimizations increase performance significantly compared to a naive execution. Some optimizations even change the complexity class of the query (e.g. by using an index), so the speedups grow with the size of the data. However, to get a more realistic evaluation of SQuOpt, we decided to perform an experiment with existing real-world queries.

As we are interested in both performance and modularization, we have a specification and three different implementations of each query that we need to compare:

(0) Query specification: We selected a set of existing real-world queries, specified and implemented independently from our work and prior to it. We used only the specification of these queries.

(1) Modularized Scala implementation: We reimplemented each query as an expression on Scala collections — our baseline implementation. For modularity, we separated reusable domain abstractions into subqueries. We confirmed the abstractions with a domain expert and will later illustrate them to emphasize their general nature.

(2) Hand-optimized Scala implementation: Next, we asked a domain expert to perform manual optimizations on the modularized queries. The expert should perform optimizations such as inlining and filter hoisting where he could find performance improvements.

(3) SQuOpt implementation: Finally, we rewrote the modularized Scala queries from (1) as SQuOpt queries. The rewrites are of purely syntactic nature, to use our library (as described in Sec. 3.1), and preserve the modularity of the queries.



Since SQuOpt supports executing queries with and without optimizations and indexes, we actually measured three different execution modes of the SQuOpt implementation:

(3−) SQuOpt without optimizer: First, we execute the SQuOpt queries without performing optimization first, which should show the SQuOpt overhead compared to the modular Scala implementation (1). However, common-subexpression elimination is still used here, since it is part of the compilation pipeline. This is appropriate to counter the effects of excessive inlining due to using a deep embedding, as explained in Sec. 4.1.

(3o) SQuOpt with optimizer: Next, we execute SQuOpt queries after optimization.

(3x) SQuOpt with optimizer and indexes: Finally, we execute the queries after providing a set of indexes that the optimizer can consider.

In all cases, we measure query execution time for the generated code, excluding compilation; we consider this appropriate because the results of compilations are cached aggressively and can be reused when the underlying data is changed, potentially even across executions (even though this is not yet implemented), as the data is not part of the compiled code.

We use additional indexes in (3x), but not in the hand-optimized Scala implementation (2). We argue that indexes are less likely to be applied manually, because index application is a crosscutting concern and makes the whole query implementation more complicated and less abstract. Still, we offer measurement (3o) to compare the speedup without additional indexes.

This gives us a total of five settings to measure and compare (1, 2, 3−, 3o and 3x). Between them, we want to observe the following interesting performance ratios (speedups or slowdowns, computed through the indicated divisions):

(M) Modularization overhead (the relative performance difference between the modularized and the hand-optimized Scala implementation: 1/2).

(S) SQuOpt overhead (the overhead of executing unoptimized SQuOpt queries: 1/3−; smaller is better).

(H) Hand-optimization challenge (the performance overhead of our optimizer against hand-optimizations of a domain expert: 2/3o; bigger is better). This overhead is partly due to the SQuOpt overhead (S) and partly to optimizations which have not been automated or have not been effective enough. This comparison excludes the effects of indexing, since this is an optimization we did not perform by hand; we also report (H′) = 2/3x, which includes indexing.

(O) Optimization potential (the speedup by optimizing modularized queries: 1/3o; bigger is better).

(X) Index influence (the speedup gained by using indexes: 3o/3x; bigger is better).

(T) Total optimization potential with indexes (1/3x; bigger is better), which is equal to (O) × (X).

In Fig. 6.1 we provide an overview of the setup. We made our raw data available and our results reproducible [Vitek and Kalibera 2011].¹


Figure 6.1: Measurement Setup Overview. (The diagram relates the specification (0), the modularized Scala implementation (1), the hand-optimized Scala implementation (2), and the SQuOpt implementation without optimizer (3−), with optimizer (3o), and with optimizer and indexes (3x); edges mark the comparisons defining the ratios M, S, H, O, X and T.)

6.2 Experimental Units

As experimental units, we sampled a set of queries on code structures from FindBugs 2.0 [Hovemeyer and Pugh 2004]. FindBugs is a popular bug-finding tool for Java bytecode, available as open source. To detect instances of bug patterns, it queries a structural in-memory representation of a code base (extracted from bytecode). Concretely, a single loop traverses each class and invokes all visitors (implemented as listeners) on each element of the class. Many visitors, in turn, perform activities concerning multiple bug detectors, which are fused together. An extreme example is that in FindBugs, query 4 is defined in class DumbMethods together with 41 other bug detectors for distinct types of bugs. Typically, a bug detector is furthermore scattered across the different methods of the visitor, which handle different elements of the class. We believe this architecture has been chosen to achieve good performance; however, we do not consider such manual fusion of distinct bug detectors as modular. We selected queries from FindBugs because they represent typical non-trivial queries on in-memory collections, and because we believe our framework allows expressing them more modularly.

We sampled queries in two batches. First, we manually selected 8 queries (from the approx. 400 queries in FindBugs), chosen mainly to evaluate the potential speedups of indexing (queries that primarily looked for declarations of classes, methods or fields with specific properties, queries that inspect the type hierarchy, and queries that required analyzing method implementations). Subsequently, we randomly selected a batch of 11 additional queries. The batch excluded queries that rely on control-/dataflow analyses (i.e. analyzing the effect of bytecode instructions on the stack), due to limitations of the bytecode toolkit we use. In total we have 19 queries, as listed in Table 6.2 (the randomly selected queries are marked with the superscript R).

We implemented each query three times (see implementations (1)–(3) in Sec. 6.1), following the specifications given in the FindBugs documentation (0). Instead of using a hierarchy of visitors as in the original implementations of the queries in FindBugs, we wrote the queries as for-comprehensions in Scala on an in-memory representation created by the Scala toolkit BAT.² BAT in particular provides comprehensive support for writing queries against Java bytecode in an idiomatic way. We exemplify an analysis in Fig. 6.5: it detects all co-variant equals methods in a project by iterating over all class files (line 2) and all methods, searching for methods named “equals” that return a boolean

¹Data available at http://www.informatik.uni-marburg.de/~pgiarrusso/SQuOpt
²http://github.com/Delors/BAT


Id    Description                                          Performance (ms): 1, 2, 3−, 3o, 3x; ratios: M (1/2), H (2/3o), T (1/3x)

1     Covariant compareTo() defined
2     Explicit garbage collection call
3     Protected field in final class
4     Explicit runFinalizersOnExit() call
5     clone() defined in non-Cloneable class
6     Covariant equals() defined
7     Public finalizer defined
8     Dubious catching of IllegalMonitorStateException
9R    Uninit. field read during construction of super
10R   Mutable static field declared public
11R   Refactor anon. inner class to static
12R   Inefficient use of toArray(Object[])
13R   Primitive boxed and unboxed for coercion
14R   Double precision conversion from 32 bit
15R   Privileged method used outside doPrivileged
16R   Mutable public static field should be final
17R   Serializable class is member of non-ser. class
18R   Swing methods used outside Swing thread
19R   Finalizer only calls superclass finalize

(The per-query timings and derived ratios are not reproduced here.)

Table 6.2: Performance results. As in Sec. 6.1, (1) denotes the modular Scala implementation, (2) the hand-optimized Scala one, and (3−), (3o), (3x) refer to the SQuOpt implementation when run, respectively, without optimizations, with optimizations, and with optimizations and indexing. Queries marked with the R superscript were selected by random sampling.


                       M (1/2)  S (1/3−)  H (2/3o)  H′ (2/3x)  O (1/3o)  X (3o/3x)  T (1/3x)
Geometric means of
performance ratios     2.4x     1.2x      0.8x      5.1x       1.9x      6.3x       12x

Table 6.3: Average performance ratios. This table summarizes all interesting performance ratios across all queries, using the geometric mean [Fleming and Wallace 1986]. The meaning of speedups is discussed in Sec. 6.1.

Abstraction                                                        Used
All fields in all class files                                      4
All methods in all class files                                     3
All method bodies in all class files                               3
All instructions in all method bodies and their bytecode index    5
Sliding window (size n) over all instructions (and their index)   3

Table 6.4: Description of abstractions removed during hand-optimization and number of queries where the abstraction is used (and optimized away).

value, and define a single parameter of the type of the current class.

Abstractions  In the reference implementations (1), we identified several reusable abstractions,

as shown in Table 6.4. The reference implementations of all queries except 17R use exactly one of these abstractions, which encapsulate the main loops of the queries.

Indexes  For executing (3x) (SQuOpt with indexes), we have constructed three indexes to speed up navigation over the queried data of queries 1–8: indexes for method name, exception handlers, and instruction types. We illustrate the implementation of the method-name index in Fig. 6.6: it produces a collection of all methods and then indexes them using indexBy; its argument extracts from an entry the key, that is, the method name. We selected which indexes to implement using guidance from SQuOpt itself: during optimizations, SQuOpt reports which indexes it could have applied to the given query. Among those, we tried to select indexes giving a reasonable compromise between construction cost and optimization speedup. We first measured the construction cost of these indexes.

for {
  classFile ← classFiles.asSquopt
  method ← classFile.methods
  if !method.isAbstract && method.name == "equals" &&
    method.descriptor.returnType == BooleanType
  parameterTypes ← Let(method.descriptor.parameterTypes)
  if parameterTypes.length == 1 && parameterTypes(0) == classFile.thisClass
} yield (classFile, method)

Figure 6.5: Find covariant equals methods.


val methodNameIdx: Exp[Map[String, Seq[(ClassFile, Method)]]] = (for {
  classFile ← classFiles.asSquopt
  method ← classFile.methods
} yield (classFile, method)).indexBy(entry ⇒ entry._2.name)

Figure 6.6: A simple index definition.
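Once built, such an index can replace a scan over all class files: a query for all implementations of finalize (as in query 19R) can hypothetically be answered by a single lookup, assuming lookup is lifted on Exp[Map[...]]:

val finalizers: Exp[Seq[(ClassFile, Method)]] = methodNameIdx("finalize")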

Index                 Elapsed time (ms)
Method name           97.99 ± 2.94
Exception handlers    179.29 ± 3.21
Instruction type      4166.49 ± 202.85

For our test data, index construction takes less than 200 ms for the first two indexes, which is moderate compared to the time for loading the bytecode in the BAT representation (4755.32 ± 141.66 ms). Building the instruction index took around 4 seconds, which we consider acceptable since this index maps each type of instruction (e.g. INSTANCEOF) to a collection of all bytecode instructions of that type.

6.3 Measurement Setup

To measure performance, we executed the queries on the preinstalled JDK class library (rt.jar), containing 58 MB of uncompressed Java bytecode. We also performed a preliminary evaluation by running queries on the much smaller ScalaTest library, getting comparable results that we hence do not discuss. Experiments were run on an 8-core Intel Core i7-2600, 3.40 GHz, with 8 GB of RAM, running Scientific Linux release 6.2. The benchmark code itself is single-threaded, so it uses only one core; however, the JVM used also other cores to offload garbage collection. We used the preinstalled OpenJDK Java version 1.7.0_05-icedtea and Scala 2.10.0-M7.

We measure steady-state performance, as recommended by Georges et al. [2007]. We invoke the JVM p = 15 times; at the beginning of each JVM invocation, all the bytecode to analyze is loaded in memory and converted into BAT's representation. In each JVM invocation, we iterate each benchmark until the variation of results becomes low enough. We measure the variation of results through the coefficient of variation (CoV; standard deviation divided by the mean). Thus, we iterate each benchmark until the CoV in the last k = 10 iterations drops under the threshold θ = 0.1, or until we complete q = 50 iterations. We report the arithmetic mean of these measurements (and also report the usually low standard deviation on our web page).
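A sketch of this iteration scheme (not our actual harness; names are illustrative):

import scala.collection.mutable.ArrayBuffer

// Iterate a benchmark until the coefficient of variation (CoV) of the last
// k runs drops below theta, or until q iterations are done.
def steadyState(benchmark: () ⇒ Unit, k: Int = 10, theta: Double = 0.1, q: Int = 50): Seq[Double] = {
  val times = ArrayBuffer[Double]()
  while (times.size < q) {
    val t0 = System.nanoTime()
    benchmark()
    times += (System.nanoTime() - t0) / 1e6  // elapsed time in ms
    if (times.size >= k) {
      val last = times.takeRight(k)
      val mean = last.sum / k
      val sd = math.sqrt(last.map(x ⇒ (x - mean) * (x - mean)).sum / k)
      if (sd / mean < theta) return times.toSeq  // CoV below threshold: steady state
    }
  }
  times.toSeq
}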

6.4 Results

Correctness  We machine-checked that, for each query, all variants in Table 6.2 agree.

Modularization Overhead  We first observe that performance suffers significantly when using the abstractions we described in Table 6.4. These abstractions, while natural in the domain and in the setting of a declarative language, are not idiomatic in Java or Scala, because without optimization they will obviously lead to bad performance. They are still useful abstractions from the point of view of modularity, though — as indicated by Table 6.4 — and as such it would be desirable if one could use them without paying the performance penalty.


Scala Implementations vs. FindBugs  Before actually comparing between the different Scala and SQuOpt implementations, we first ensured that the implementations are comparable to the original FindBugs implementation. A direct comparison between the FindBugs reference implementation and any of our implementations is not possible in a rigorous and fair manner: FindBugs bug detectors are not fully modularized, therefore we cannot reasonably isolate the implementation of the selected queries from support code. Furthermore, the architecture of the implementation has many differences that affect performance; among others, FindBugs also uses multithreading. Moreover, while in our case each query loops over all classes, in FindBugs, as discussed above, a single loop considers each class and invokes all visitors (implemented as listeners) on it.

We measured startup performance [Georges et al. 2007], that is, the performance of running the queries only once, to minimize the effect of compiler optimizations. We set up our SQuOpt-based analyses to only perform optimization and run the optimized query. To set up FindBugs, we manually disabled all unrelated bug detectors; we also made the modified FindBugs source code available. The result is that the performance of the Scala implementations of the queries (3−) is of the same order of magnitude as the original FindBugs queries – in our tests, the SQuOpt implementation was about twice as fast. However, since the comparison cannot be made fair, we refrained from a more detailed investigation.

SQuOpt Overhead and Optimization Potential  We present the results of our benchmarks in Table 6.2. Column names refer to a few of the definitions described above; for readability, we do not present all the ratios previously introduced for each query, but report the raw data. In Table 6.3 we report the geometric mean [Fleming and Wallace 1986] of each ratio, computed with the same weight for each query.

We see that in its current implementation, SQuOpt can cause an overhead S (1/3−) of up to 3.4x. On average, SQuOpt queries are 1.2x faster. These differences are due to minor implementation details of certain collection operators. For query 18R, instead, the basic SQuOpt implementation is 12.9x faster; we are investigating the reason, and we suspect this might be related to the use of pattern matching in the original query.

As expected, not all queries benefit from optimizations; out of 19 queries, optimization affords significant speedups for 15 of them, ranging from a 1.2x factor to a 12800x factor; 10 queries are faster by a factor of at least 5. Only queries 10R, 11R and 12R fail to recover any modularization overhead.

We have analyzed the behavior of a few queries after optimization, to understand why their performance has (or has not) improved.

Optimization makes query 17R slower; we believe this is because optimization replaces filtering by lazy filtering, which is usually faster, but not here. Among queries where indexing succeeds, query 2 has the least speedup. After optimization, this query uses the instruction-type index to find all occurrences of invocation opcodes (INVOKESTATIC and INVOKEVIRTUAL); after this step, the query looks among those invocations for ones targeting runFinalizersOnExit. Since invocation opcodes are quite frequent, the used index is not very specific, hence it allows for little speedup (9.5x). However, no other index applies to this query; moreover, our framework does not maintain any selectivity statistics on indexes to predict these effects. Query 19R benefits from indexing without any specific tuning on our part, because it looks for implementations of finalize with some characteristic, hence the highly selective method-name index applies. After optimization, query 8 becomes simply an index lookup on the index for exception handlers, looking for handlers of IllegalMonitorStateException; it is thus not surprising that its speedup is extremely high (12800x). This speedup relies on an index which is specific for this kind of query, and building this index is slower than executing the unoptimized query. On the other hand, building this index is entirely appropriate in a situation where similar queries are common enough. Similar considerations apply to the usage of indexing in general, similarly to what happens in databases.


Optimization Overhead  The current implementation of the optimizer is not yet optimized for speed (of the optimization algorithm). For instance, expression trees are traversed and rebuilt completely once for each transformation. However, the optimization overhead is usually not excessive: it is 5.48 ± 8.55 ms, varying between 3.5 ms and 38.17 ms (mostly depending on the query size).

Limitations  Although many speedups are encouraging, our optimizer is currently a proof-of-concept and we experienced some limitations:

• In a few cases, hand-optimized queries are still faster than what the optimizer can produce. We believe these problems could be addressed by adding further optimizations.

• Our implementation of indexing is currently limited to immutable collections. For mutable collections, indexes must be maintained incrementally. Since indexes are defined as special queries in SQuOpt, incremental index maintenance becomes an instance of incremental maintenance of query results, that is, of incremental view maintenance. We plan to support incremental view maintenance as part of future work; however, indexing in the current form is already useful, as illustrated by our experimental results.

Threats to Validity  With rigorous performance measurements and the chosen setup, our study was set up to maximize internal and construct validity. Although we did not involve an external domain expert, and we did not compare the results of our queries with the ones from FindBugs (except while developing the queries), we believe that the queries adequately represent the modularity and performance characteristics of FindBugs and SQuOpt. However, since we selected only queries from a single project, external validity is limited. While we cannot generalize our results beyond FindBugs yet, we believe that the FindBugs queries are representative of complex in-memory queries performed by applications.

Summary  We demonstrated on our real-world queries that relying on declarative abstractions in collection queries often causes a significant slowdown. As we have seen, using SQuOpt without optimization, or when no optimizations are possible, usually provides performance comparable to using standard Scala; however, SQuOpt optimizations can in most cases remove the slowdown due to declarative abstractions. Furthermore, relying on indexing allows achieving even greater speedups while still using a declarative programming style. Some implementation limitations restrict the effectiveness of our optimizer, but since this is a preliminary implementation, we believe our evaluation shows the great potential of optimizing queries to in-memory collections.

Chapter 7

Discussion

In this chapter we discuss the degree to which SQuOpt fulfilled our original design goals and the conclusions for host and domain-specific language design.

7.1 Optimization limitations

In our experiments, indexing achieved significant speedups, but when indexing does not apply, speedups are more limited; in comparison, later projects working on collection query optimization, such as OptiQL [Rompf et al. 2013; Sujeeth et al. 2013b], gave better speedups, as also discussed in Sec. 8.3.1.

A design goal of this project was to incrementalize optimized queries, and while it is easy to incrementalize collection operators such as map, flatMap or filter, it was much less clear to us how to optimize the result of inlining. We considered using shortcut fusion, but we did not know a good way to incrementalize the resulting programs; later work, as described in Part II, clarified what is possible and what is not.

Another problem is that most fusion techniques are designed for sequential processing, hence conflict with incrementalization. Most fusion techniques generally assume that bulk operations scan collections in linear order; for instance, shortcut fusion rewrites operators in terms of foldr. During parallel and/or incremental computation, instead, it is better to use tree-shaped folds, that is, to split the task in a divide-and-conquer fashion, so that the various subproblems form a balanced tree. This division minimizes the height of the tree, hence the number of steps needed to combine results from subproblems into the final result, as also discussed by Steele [2009]. It is not so clear how to apply shortcut fusion to parallel and incremental programs. Maier and Odersky [2013] describe an incremental tree-shaped fold, where each subproblem that is small enough is solved by scanning its input in linear order, but they do not perform code transformation and do not study how to perform fusion.
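To make the contrast concrete, compare a linear fold with a tree-shaped fold over a vector (a minimal sketch):

// Linear fold, as used by shortcut fusion: combines elements in sequence.
def sumLinear(xs: Vector[Int]): Int = xs.foldRight(0)(_ + _)

// Tree-shaped fold: subproblems form a balanced tree, so they can be
// computed in parallel, or reused incrementally when only one half changes.
def sumTree(xs: Vector[Int]): Int =
  if (xs.size <= 1) xs.headOption.getOrElse(0)
  else {
    val (l, r) = xs.splitAt(xs.size / 2)
    sumTree(l) + sumTree(r)
  }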

7.2 Deep embedding

7.2.1 What worked well

Several features of Scala contributed greatly to the success we achieved. With implicit conversions, the lifting can be made mostly transparent. The advanced type system features were quite helpful to make the expression tree representation typed. The fact that for-comprehensions are desugared



before type inference and type checking was also a prerequisite for automatic lifting. The syntactic expressiveness and uniformity of Scala, in particular the fact that custom types can have the same look-and-feel as primitive types, were also vital to lift expressions on primitive types.

7.2.2 Limitations

Despite these positive experiences and our experimental success, our embedding has a few significant limitations.

The first limitation is that we only lift a subset of Scala, and some interesting features are missing. We do not support statements in nested blocks in our queries, but this could be implemented reusing techniques from Delite [Rompf et al. 2011]. More importantly for queries, pattern matching cannot be supported by a deep embedding similar to ours. In contrast to for-comprehension syntax, pattern matching is desugared only after type checking [Emir et al. 2007b], which prevents us from lifting pattern matching notation. More specifically, because an extractor [Emir et al. 2007b] cannot return the representation of a result value (say Exp[Boolean]) to later evaluate, it must produce its final result at pattern matching time. There is initial work on “virtualized pattern matching”,¹ and we hope to use this feature in future work.

We also experienced problems with operators that cannot be overloaded, such as == or if-else, and with lifting methods in scala.Any, which forced us to provide alternative syntax for these features in queries. The Scala-Virtualized project [Moors et al. 2012] aims to address these limitations; unfortunately, it advertises no change on the other problems we found, which we subsequently detail.

It would also be desirable if we could enforce the absence of side effects in queries, but since Scala, like most practical programming languages except Haskell, does not track side effects in the type system, this does not seem to be possible.

Finally, compared to lightweight modular staging [Rompf and Odersky 2010] (the foundation of Delite) and to polymorphic embedding [Hofer et al. 2008], we have less static checking for some programming errors when writing queries. The recommended way to use Delite is to write an EDSL program in one trait, in terms of the EDSL interface only, and combine it with the implementation in another trait. In polymorphic embedding, the EDSL program is a function of the specific implementation (in this case, semantics). Either approach ensures that the DSL program is parametric in the implementation, and hence cannot refer to details of a specific implementation. However, we judged the syntactic overhead for the programmer to be too high to use those techniques – in our implementation, we rely on encapsulation and on dynamic checks at query construction time to achieve similar guarantees.

The choice of interpreting expressions turned out to be a significant performance limitation. It could likely be addressed by using Delite and lightweight modular staging instead, but we wanted to experiment with how far we can get within the language, in a well-typed setting.

7.2.3 What did not work so well: Scala type inference

When implementing our library, we often struggled against limitations and bugs in the Scala compiler, especially regarding type inference and its interaction with implicit resolution, and we were often constrained by its limitations. Not only is Scala's type inference incomplete, but we learned that its behavior is only specified by the behavior of the current implementation: in many cases where there is a clear desirable solution, type inference fails or finds an incorrect substitution, which leads to a type error. Hence we cannot distinguish, in the discussion, the Scala language from its implementation. We regard many of Scala's type inference problems as bugs, and reported them

¹http://stackoverflow.com/questions/8533826/what-is-scalas-experimental-virtual-pattern-matcher


as such when no previous bug report existed, as noted in the rest of this section. Some of them are long-standing issues, others were accepted; for other ones we had received no feedback at the time of this writing, and another one was already closed as WONTFIX, indicating that a fix would be possible but would have excessive complexity for the time being.²

Overloading  The code in Fig. 3.1 uses the lifted BookData constructor. Two definitions of BookData are available, with signatures BookData(String, String, Int) and BookData(Exp[String], Exp[String], Exp[Int]), and it seems like the Scala compiler should be able to choose which one to apply using overload resolution. This, however, is not supported, simply because the two functions are defined in different scopes;³ hence, importing BookData(Exp[String], Exp[String], Exp[Int]) shadows locally the original definition.

Type inference vs. implicit resolution  The interaction between type inference and implicit resolution is a hard problem, and Scalac also has many bugs, but the current situation requires further research; for instance, there is not even a specification for the behavior of type inference.⁴

As a consequence, to the best of our knowledge, some properties of type inference have not been formally established. For instance, a reasonable user expectation is that removing a call to an implicit conversion does not alter the program, if it is the only implicit conversion with the correct type in scope, or if it is more specific than the others [Odersky et al. 2011, Ch. 21]. This is not always correct, because removing the implicit conversion reduces the information available for the type inference algorithm; we observed multiple cases⁵ where type inference becomes unable to figure out enough information about the type to trigger the implicit conversion.

We also consider it significant that Scala 2.8 required making both type inference and implicit resolution more powerful, specifically in order to support the collection library [Moors 2010; Odersky et al. 2011, Sec. 21.7]; further extensions would be possible and desirable. For instance, if type inference were extended with higher-order unification⁶ [Pfenning 1988], it would better support a part of our DSL interface (not discussed in this chapter), by removing the need for type annotations.

Nested pattern matches for GADTs in Scala  Writing a typed decomposition for Exp requires pattern-matching support for generalized algebraic datatypes (GADTs). We found that support for GADTs in Scala is currently insufficient. Emir et al. [2007b] define the concept of typecasing, essentially a form of pattern-matching limited to non-nested patterns, and demonstrate that Scala supports typecasing on GADTs by presenting a typed evaluator; however, typecasing is rather verbose for deep patterns, since one has to nest multiple pattern-matching expressions. When using normal pattern matches instead, the support for GADTs seems much weaker.⁷ Hence one has to choose between support for GADTs and the convenience of nested pattern matching. A third alternative is to ignore or disable compiler warnings, but we did not consider this option.

Implicit conversions do not chain  While implicit conversions by default do not chain, it is sometimes convenient to allow chaining selectively. For instance, let us assume a context such that a: Exp[A], b: Exp[B] and c: Exp[C]. In this context, let us consider again how we lift tuples. We have seen that the expression (a, b) has type (Exp[A], Exp[B]), but can be converted to Exp[(A, B)] through an implicit conversion. Let us now consider nested tuples, like ((a, b), c): it has type ((Exp[A], Exp[B]), Exp[C]), hence the previous conversion cannot be applied to this expression.

Odersky et al. [2011, Ch. 21] describe a pattern which can address this goal. Using this pattern, to lift pairs, we must write an implicit conversion from pairs of elements which can be implicitly

²https://issues.scala-lang.org/browse/SI-2551
³https://issues.scala-lang.org/browse/SI-2551
⁴https://issues.scala-lang.org/browse/SI-5298?focusedCommentId=55971#comment-55971, reported by us.
⁵https://issues.scala-lang.org/browse/SI-5592, reported by us.
⁶https://issues.scala-lang.org/browse/SI-2712
⁷Due to bug https://issues.scala-lang.org/browse/SI-5298?focusedCommentId=56840#comment-56840, reported by us.


converted to expressions. Instead of requiring (Exp[A], Exp[B]), the implicit conversion should require (A, B), with the condition that A can be converted to Exp[A′] and B to Exp[B′]. This conversion solves the problem if applied explicitly, but the compiler refuses to apply it implicitly, again because of type inference issues.⁸
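In code, this pattern looks roughly as follows (LiftedPair is a hypothetical node name; the conversion works when applied explicitly, but is not inserted implicitly):

case class LiftedPair[A, B](a: Exp[A], b: Exp[B]) extends Exp[(A, B)]

implicit def pairToExp[A, B, A1, B1](p: (A, B))
    (implicit liftA: A ⇒ Exp[A1], liftB: B ⇒ Exp[B1]): Exp[(A1, B1)] =
  LiftedPair(liftA(p._1), liftB(p._2))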

Because of these type inference limitations, we failed to provide support for reifying code like ((a, b), c).⁹

Error messages for implicit conversions  The enrich-my-library pattern has the declared goal of allowing to extend existing libraries transparently. However, implementation details shine through when a user program using this feature contains a type error. When invoking a method would require an implicit conversion which is not applicable, the compiler often just reports that the method is not available. The recommended approach to debugging such errors is to manually add the missing implicit conversion and investigate the type error [Odersky et al. 2011, Ch. 21.8], but this destroys the transparency of the approach when creating or modifying code. We believe this could be solved in principle by research on error reporting: the compiler could automatically insert all implicit conversions enabling the method calls and report corresponding errors, even if at some performance cost.

7.2.4 Lessons for language embedders

Various domains, such as the one considered in our case study, allow powerful domain-specific optimizations. Such optimizations often are hard to express in a compositional way; hence they cannot be performed while building the query, but must be expressed as global optimization passes. For those domains, deep embedding is key to allow significant optimizations. On the other hand, deep embedding requires implementing an interpreter or a compiler.

On the one hand, interpretation overhead is significant in Scala, even when using HOAS to take advantage of the metalevel implementation of argument access.

Instead of interpreting a program, one can compile an EDSL program to Scala and load it, as done by Rompf et al. [2011]; while we are using this approach, the disadvantage is the compilation delay, especially for Scala, whose compilation process is complex and time-consuming. Possible alternatives include generating bytecode directly, or combining interpretation and compilation, similarly to tiered JIT compilers, where only code which is executed often is compiled. We plan to investigate such alternatives in future work.

⁸https://issues.scala-lang.org/browse/SI-5651, reported by us.
⁹One could of course write a specific implicit conversion for this case; however, (a, (b, c)) already requires a different implicit conversion, and there are infinite ways to nest pairs, let alone tuples of bounded arity.

Chapter 8

Related work

This chapter discusses prior work on language-integrated queries, query optimization, techniques for DSL embedding, and other works on code querying.

8.1 Language-Integrated Queries

Microsoft's Language-Integrated Query technology (Linq) [Meijer et al. 2006; Bierman et al. 2007] is similar to our work in that it also reifies queries on collections to enable analysis and optimization. Such queries can be executed against a variety of backends (such as SQL databases or in-memory objects), and adding new backends is supported. Its implementation uses expression trees, a compiler-supported implicit conversion between expressions and their reification as a syntax tree. There are various major differences, though. First, the support for expression trees is hard-coded into the compiler. This means that the techniques are not applicable in languages that do not explicitly support expression trees. More importantly, the way expression trees are created in Linq is generic and fixed. For instance, it is not possible to create different tree nodes for method calls that are relevant to an analysis (such as the map method) than for method calls that are irrelevant for the analysis (such as the toString method). For this reason, expression trees in Linq cannot be customized to the task at hand and contain too much low-level information. It is well-known that this makes it quite hard to implement programs operating on expression trees [Eini 2011].

Linq queries can also not easily be decomposed and modularized. For instance, consider the task of refactoring the filter in the query from x in y where x.z == 1 select x into a function. Defining this function as bool comp(int v) { return v == 1; } would destroy the possibility of analyzing the filter for optimization, since the resulting expression tree would only contain a reference to an opaque function. The function could be declared as returning an expression tree instead, but then this function could not be used in the original query anymore, since the compiler expects an expression of type bool and not an expression tree of type bool. It could only be integrated if the expression tree of the original query is created by hand, without using the built-in support for expression trees.

Although queries against in-memory collections could theoretically also be optimized in Linq, the standard implementation, Linq2Objects, performs no optimizations.

A few optimized embedded DSLs allow executing queries or computations on distributed clusters. DryadLINQ [Yu et al. 2008], based on Linq, optimizes queries for distributed execution. It inherits Linq's limitations and thus does not support decomposing queries in different modules. Modularizing queries is supported instead by FlumeJava [Chambers et al. 2010], another library (in Java) for distributed query execution. However, FlumeJava cannot express many optimizations, because its



representation of expressions is more limited; also, its query language is more cumbersome. Both problems are rooted in Java's limited support for embedded DSLs. Other embedded DSLs support parallel platforms such as GPUs or many-core CPUs, such as Delite [Rompf et al. 2013].

Willis et al. [2006, 2008] add first-class queries to Java through a source-to-source translator and implement a few selected optimizations, including join order optimization and incremental maintenance of query results. They investigate how well their techniques apply to Java programs, and they suggest that programmers use manual optimizations to avoid expensive constructs like nested loops. While the goal of these works is similar to ours, their implementation as an external source-to-source translator makes the adoption, extensibility and composability of their technique difficult.

There have been many approaches for a closer integration of SQL queries into programs, such as HaskellDB [Leijen and Meijer 1999] (which also inspired Linq), or Ferry [Grust et al. 2009] (which moves part of a program execution to a database). In Scala, there are also APIs which integrate SQL queries more closely, such as Slick.¹ Its frontend allows defining and combining type-safe queries, similarly to ours (also in the way it is implemented). However, the language for defining queries maps to SQL, so it does not support nesting collections in other collections (a feature which simplified our example in Sec. 2.2), nor does it distinguish statically between different kinds of collections, such as Set or Seq. Based on Ferry, ScalaQL [Garcia et al. 2010] extends Scala with a compiler plugin to integrate a query language on top of a relational database. The work by Spiewak and Zhao [2009] is unrelated to [Garcia et al. 2010], but also called ScalaQL. It is similar to our approach in that it also proposes to reify queries based on for-comprehensions, but it is not clear from their paper how the reification works.²

8.2 Query Optimization

Query optimization on relational data is a long-standing issue in the database community, but there are also many works on query optimization on objects [Fegaras and Maier 2000; Grust 1999]. Compared to these works, we have only implemented a few simple query optimizations, so there is potential for further improvement of our work by incorporating more advanced optimizations.

Henglein [2010] and Henglein and Larsen [2010] embed relational algebra in Haskell. Queries are not reified in their approach, but due to a particularly sophisticated representation of multisets, it is possible to execute some queries containing cross-products using faster equi-joins. While their approach appears to not rely on non-confluent rewriting rules, and hence appears more robust, it is not yet clear how to incorporate other optimizations.

8.3 Deep Embeddings in Scala

Technically, our implementation of SQuOpt is a deep embedding of a part of the Scala collections API [Odersky and Moors 2009]. Deep embeddings were pioneered by Leijen and Meijer [1999] and Elliott et al. [2003] in Haskell, for other applications.

We regard the Scala collections API [Odersky and Moors 2009] as a shallowly embedded query DSL. Query operators are eager, that is, they immediately perform collection operations when called, so that it is not possible to optimize queries before execution.

Scala collections also provide views, which are closer to SQuOpt. Unlike standard query operators, they create lazy collections. Like SQuOpt, views reify query operators as data structures and interpret them later. Unlike SQuOpt, views are not used for automatic query optimization,

¹http://slick.typesafe.com
²We contacted the authors; they were not willing to provide more details or the sources of their approach.


but for explicitly changing the evaluation order of collection processing. Moreover, they cannot be reused by SQuOpt, as they reify too little: views embed deeply only the outermost pipeline of collection operators, while they embed shallowly nested collection operators and other Scala code in queries, such as arguments of filter, map and flatMap. Deep embedding of the whole query is necessary for many optimizations, as discussed in Chapter 3.

8.3.1 LMS

Our deep embedding is inspired by some of the Scala techniques presented by Lightweight Modular Staging (LMS) [Rompf and Odersky 2010] for using implicits and for adding infix operators to a type. Like Rompf and Odersky [2010], we also generate and compile Scala code on-the-fly, reusing the Scala compiler. A plausible alternative backend for SQuOpt would have been to use LMS and Delite [Rompf et al. 2011], a framework for building highly efficient DSLs in Scala.

An alternative to SQuOpt based on LMS and Delite, named OptiQL, was indeed built and presented in concurrent [Rompf et al. 2013] and subsequent work [Sujeeth et al. 2013b]. Like SQuOpt, OptiQL enables writing and optimizing collection queries in Scala. On the minus side, OptiQL supports fewer collection types (ArrayList and HashMap).

On the plus side, OptiQL supports queries containing effects, and can reuse support for automatic parallelization and multiple platforms present in Delite. While LMS allows embedding effectful queries, it is unclear how many of the implemented optimizations keep being sound on such queries, and how many of those have been extended to such queries.

OptiQL achieves impressive speedups by fusing collection operations and transforming (or lowering) high-level bulk operators to highly optimized imperative programs using while loops. This gains significant advantages because it can avoid boxing and intermediate allocations, and because inlining heuristics in the HotSpot JIT compiler have never been tuned to bulk operators in functional programs.³ We did not investigate such lowerings. It is unclear how well these optimizations would extend to other collections, which intrinsically carry further overhead. Moreover, it is unclear how to execute incrementally the result of these lowerings, as we discuss in Sec. 7.1.

Those works did not support embedding arbitrary libraries in an automated or semi-automated way; this was only addressed later in Forge [Sujeeth et al. 2013a] and Yin-Yang [Jovanovic et al. 2014].

Ackermann et al. [2012] present Jet, which also optimizes collection queries, but targets MapReduce-style computations in a distributed environment. Like OptiQL, Jet does not apply typical database optimizations such as indexing or filter hoisting.

8.4 Code Querying

In our evaluation, we explore the usage of SQuOpt to express queries on code and re-implement a subset of the FindBugs [Hovemeyer and Pugh 2004] analyses. There are various other specialized code query languages, such as CodeQuest [Hajiyev et al. 2006] or D-CUBED [Wegrzynowicz and Stencel 2009]. Since these are special-purpose query languages that are not embedded into a host language, they are not directly comparable to our approach.

³See the discussion by Cliff Click at http://www.azulsystems.com/blog/cliff/2011-04-04-fixing-the-inlining-problem


Chapter 9

Conclusions

We have illustrated the tradeoff between performance and modularity for queries on in-memory collections. We have shown that it is possible to design a deep embedding of a version of the collections API which reifies queries and can optimize them at runtime. Writing queries using this framework is, except for minor syntactic details, the same as writing queries using the collection library; hence the adoption barrier to using our optimizer is low.

Our evaluation shows that using abstractions in queries introduces a significant performance overhead with native Scala code, while SQuOpt in most cases makes the overhead much more tolerable or removes it completely. Optimizations are not sufficient on some queries, but since our optimizer is a proof-of-concept with many opportunities for improvement, we believe a more elaborate version will achieve even better performance and reduce these limitations.

9.1 Future work

To make our DSL more convenient to use, it would be useful to use the virtualized pattern matcher of Scala 2.10, when it becomes more robust, to add support for pattern matching in our virtualized queries.

Finally, while our optimizations are type-safe, as they rewrite an expression tree to another of the same type, currently the Scala type-checker cannot verify this statically, because of its limited support for GADTs. Solving this problem conveniently would allow checking statically that transformations are safe, and it would make developing them easier.



Part II

Incremental λ-Calculus


Chapter 10

Introduction to differentiation

Incremental computation (or incrementalization) has a long-standing history in computer science [Ramalingam and Reps 1993]. Often a program needs to update quickly the output of some nontrivial function f when the input to the computation changes. In this scenario, we assume we have computed y1 = f x1 and we need to compute y2 that equals f x2. Here, programmers typically have to choose between a few undesirable options.

• Programmers can call function f again on the updated input x2 and repeat the computation from scratch. This choice guarantees correct results and is easy to implement, but typically wastes computation time. Often, if the updated input is close to the original input, the same result can be computed much faster.

• Programmers can write by hand a new function df that updates the output based on input changes, using various techniques. Running a hand-written function df can be much more efficient than rerunning f, but writing df requires significant developer effort, is error-prone, and requires updating df by hand to keep it consistent with f whenever f is modified. In practice, this complicates code maintenance significantly [Salvaneschi and Mezini 2013].

• Programmers can write f using domain-specific languages that support incrementalization, for tasks where such languages are available. For instance, build scripts (our f) are written in domain-specific languages that support (coarse-grained) incremental builds. Database query languages also often have support for incrementalization.

• Programmers can attempt to use general-purpose techniques for incrementalizing programs, such as self-adjusting computation and variants such as Adapton. Self-adjusting computation applies to arbitrary purely functional programs and has been extended to imperative programs; however, it only guarantees efficient incrementalization when applied to base programs that are designed for efficient incrementalization. Nevertheless, self-adjusting computation enabled incrementalizing programs that had never been incrementalized by hand before.

No approach guarantees automatic efficient incrementalization for arbitrary programs. We propose instead to design domain-specific languages (DSLs) that can be efficiently incrementalized, which we call incremental DSLs (IDSLs).

To incrementalize IDSL programs, we use a transformation that we call (finite) differentiation. Differentiation produces programs in the same language, called derivatives, that can be optimized further and compiled to efficient code. Derivatives represent changes to values through further values, which we call simply changes.



For primitives, IDSL designers must specify the result of differentiation: IDSL designers are to choose primitives that encapsulate efficiently incrementalizable computation schemes, while IDSL users are to express their computation using the primitives provided by the IDSL.

Helping IDSL designers to incrementalize primitives automatically is a desirable goal, though one that we leave open. In our setting, incrementalizing primitives becomes a problem of program synthesis, and we agree with Shah and Bodik [2017] that it should be treated as such. Among others, Liu [2000] develops a systematic approach to this synthesis problem for first-order programs, based on equational reasoning, but it is unclear how scalable this approach is. We provide foundations for using equational reasoning, and sketch an IDSL for handling different sorts of collections. We also discuss avenues for providing language plugins for more fundamental primitives, such as algebraic datatypes with structural recursion.

In the IDSLs we consider, similarly to database languages, we use primitives for high-level operations of complexity similar to SQL operators. On the one hand, IDSL designers wish to design few general primitives, to limit the effort needed for manual incrementalization. On the other hand, overly general primitives can be harder to incrementalize efficiently. Nevertheless, we also provide some support for more general building blocks, such as product or sum types, and even (in some situations) recursive types. Other approaches provide more support for incrementalizing primitives, but even then, ensuring efficient incrementalization is not necessarily easy. Dynamic approaches to incrementalization are most powerful: they can find work that can be reused at runtime using memoization, as long as the computation is structured so that memoization matches will occur. Moreover, for some programs it seems more efficient to detect that some output can be reused thanks to a description of the input changes, rather than through runtime detection.

We propose that IDSLs be higher-order, so that primitives can be parameterized over functions and hence highly flexible, and purely functional, to enable more powerful optimizations both before and after differentiation. Hence, an incremental DSL is a higher-order, purely functional language, composed of a λ-calculus core extended with base types and primitives. Various database query languages support forms of finite differentiation (see Sec. 19.2.1), but only over first-order languages, which provide only restricted forms of operators such as map, filter or aggregation.

To support higher-order IDSLs, we define the first form of differentiation that supports higher-order functional languages; to this end, we introduce the concept of function changes, which contain changes to either the code of a function value or the values it closes over. While higher-order programs can be transformed to first-order ones, incrementalizing the resulting programs is still beyond the reach of previous approaches to differentiation (see Sec. 19.2.1 for earlier work and Sec. 19.2.2 for later approaches). In Chapter 17 and Appendix D, we transform higher-order programs to first-order ones by closure conversion or defunctionalization, but we incrementalize the defunctionalized programs using similar ideas, including changes to (defunctionalized) functions.

Our primary example will be DSLs for operations on collections: as discussed earlier (Chapter 2), we favor higher-order collection DSLs over relational databases.

We build our incremental DSLs based on simply-typed λ-calculus (STLC), extended with language plugins to define the domain-specific parts, as discussed in Appendix A.2 and summarized in Fig. A.1. We call our approach ILC, for Incremental Lambda Calculus.

The rest of this chapter is organized as follows. In Sec. 10.1 we explain that differentiation generalizes the calculus of finite differences, a relative of differential calculus. In Sec. 10.2 we show a motivating example for our approach. In Sec. 10.3 we introduce informally the concept of changes as values, and in Sec. 10.4 we introduce changes to functions. In Sec. 10.5 we define differentiation and motivate it informally. In Sec. 10.6 we apply differentiation to our motivating example.

Correctness of ILC is far from obvious. In Chapters 12 and 13, we introduce a formal theory of changes, and we use it to formalize differentiation and prove it correct.


10.1 Generalizing the calculus of finite differences

Our theory of changes generalizes an existing field of mathematics called the calculus of finite differences: if f is a real function, one can define its finite difference, that is a function ∆f such that ∆f a da = f (a + da) − f a. Readers might notice the similarity (and the differences) between the finite difference and the derivative of f, since the latter is defined as

f ′(a) = lim_{da → 0} (f (a + da) − f (a)) / da

The calculus of finite differences helps computing a closed formula for ∆f given a closed formula for f. For instance, if function f is defined by f x = 2 · x, one can prove its finite difference is ∆f x dx = 2 · (x + dx) − 2 · x = 2 · dx.

Finite differences are helpful for incrementalization because they allow computing functions on updated inputs based on results on base inputs, if we know how inputs change. Take again for instance f x = 2 · x: if x is a base input and x + dx is an updated input, we can compute f (x + dx) = f x + ∆f x dx. If we already computed y = f x and reuse the result, we can compute f (x + dx) = y + ∆f x dx. Here the input change is dx and the output change is ∆f x dx.
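As a minimal executable illustration of this equation (our sketch, using Int for the reals):

f :: Int → Int
f x = 2 * x

-- The finite difference of f: ∆f x dx = 2 * dx.
df :: Int → Int → Int
df _x dx = 2 * dx

-- With y = f 5 = 10 already computed, f (5 + 3) = y + df 5 3 = 10 + 6 = 16,
-- matching recomputation from scratch: f 8 = 16.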

However, the calculus of finite differences is usually defined for real functions. Since it is based on operators + and −, it can be directly extended to commutative groups. Incrementalization based on finite differences for groups and first-order programs has already been researched [Paige and Koenig 1982; Gluche et al. 1997], most recently and spectacularly with DBToaster [Koch 2010; Koch et al. 2016].

But it is not immediate how to generalize finite differencing beyond groups. And many useful types do not form a group: for instance, lists of integers don't form a group, but only a monoid. Moreover, it's hard to represent list changes simply through a list: how do we specify which elements were inserted (and where), which were removed, and which were subjected to change themselves?

In ILC, we generalize the calculus of finite differences by using distinct types for base values and changes, and adapting the surrounding theory. ILC generalizes operators + and − as operators ⊕ (pronounced "oplus" or "update") and ⊖ (pronounced "ominus" or "difference"). We show how ILC subsumes groups in Sec. 13.1.1.

10.2 A motivating example

In this section, we illustrate incrementalization informally on a small example.

In the following program, grandTotal xs ys sums the integer numbers in input collections xs and ys.

grandTotal :: Bag Z → Bag Z → Z
s :: Z
grandTotal xs ys = sum (merge xs ys)
s = grandTotal xs ys

This program computes output s from input collections xs and ys. These collections are multisets or bags, that is, collections that are unordered (like sets) where elements are allowed to appear more than once (unlike sets). In this example, we assume a language plugin that supports a base type of integers Z and a family of base types of bags, Bag τ for any type τ.

We can run this program on specific inputs xs1 = {{1, 2, 3}} and ys1 = {{4}} to obtain output s1. Here, double braces denote a bag containing the elements among the double braces.

s1 = grandTotal xs1 ys1
   = sum {{1, 2, 3, 4}} = 10
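For concreteness, one plausible realization of this plugin's primitives is to represent a bag as a map from elements to multiplicities; this representation, and the name sumBag (to avoid clashing with Prelude's sum), are our illustrative assumptions, not the plugin's definition:

import qualified Data.Map as M

-- A bag (multiset), as a map from elements to their multiplicities.
type Bag a = M.Map a Int

-- merge is multiset union: multiplicities add up.
merge :: Ord a ⇒ Bag a → Bag a → Bag a
merge = M.unionWith (+)

-- Sum of all integers in a bag, counting multiplicities.
sumBag :: Bag Int → Int
sumBag b = sum [x * n | (x, n) ← M.toList b]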


This example uses small inputs for simplicity, but in practice they are typically much bigger; we use n to denote the input size. In this case, the asymptotic complexity of computing s is Θ(n).

Consider computing the updated output s2 from updated inputs xs2 = {{1, 1, 2, 3}} and ys2 = {{4, 5}}. We could recompute s2 from scratch as

s2 = grandTotal xs2 ys2
   = sum {{1, 1, 2, 3, 4, 5}} = 16

But if the size of the updated inputs is Θ(n), recomputation also takes time Θ(n), and we would like to obtain our result asymptotically faster.

To compute the updated output s2 faster, we assume the changes to the inputs have a description of size dn that is asymptotically smaller than the input size n, that is, dn = o(n). All approaches to incrementalization require small input changes. Incremental computation will then process the input changes, rather than just the new inputs.

10.3 Introducing changes

To talk about the differences between old values and new values, we introduce a few concepts, for now without full definitions. In our approach to incrementalization, we describe changes to values as values themselves. We call such descriptions simply changes. Incremental programs examine changes to inputs to understand how to produce changes to outputs. Just like in STLC we have terms (programs) that evaluate to values, we also have change terms, which evaluate to change values. We require that going from old values to new values preserves types: that is, if an old value v1 has type τ, then also its corresponding new value v2 must have type τ. To each type τ we associate a type of changes or change type ∆τ: a change between v1 and v2 must be a value of type ∆τ. Similarly, environments can change: to typing context Γ we associate change typing contexts ∆Γ, such that we can have an environment change dρ : ⟦∆Γ⟧ from ρ1 : ⟦Γ⟧ to ρ2 : ⟦Γ⟧.

Not all descriptions of changes are meaningful, so we also talk about valid changes. Valid changes satisfy additional invariants that are useful during incrementalization. A change value dv can be a valid change from v1 to v2. We can also consider a valid change as an edge from v1 to v2 in a graph associated to τ (where the vertexes are values of type τ), and we call v1 the source of dv and v2 the destination of dv. We only talk of source and destination for valid changes: so a change from v1 to v2 is (implicitly) valid. We'll discuss examples of valid and invalid changes in Examples 10.3.1 and 10.3.2.

We also introduce an operator ⊕ on values and changes: if dv is a valid change from v1 to v2, then v1 ⊕ dv (read as "v1 updated by dv") is guaranteed to return v2. If dv is not a valid change from v1, then v1 ⊕ dv can be defined to some arbitrary value, or left undefined, without any effect on correctness. In practice, if ⊕ detects an invalid input change, it can trigger an error or return a dummy value; in our formalization, we assume for simplicity that ⊕ is total. Again, if dv is not valid from v1 to v1 ⊕ dv, then we do not talk of the source and destination of dv.

We also introduce operator ⊖: given two values v1, v2 of the same type, v2 ⊖ v1 is a valid change from v1 to v2.

Finally, we introduce change composition: if dv1 is a valid change from v1 to v2 and dv2 is a valid change from v2 to v3, then dv1 ⊚ dv2 is a valid change from v1 to v3.

Change operators are overloaded over different types. Coherent definitions of validity and of operators ⊕ and ⊖ for a type τ form a change structure over values of type τ (Definition 13.1.1). For each type τ, we'll define a change structure (Definition 13.4.2), and operators will have types ⊕ : τ → ∆τ → τ, ⊖ : τ → τ → ∆τ, and ⊚ : ∆τ → ∆τ → ∆τ.
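These operators are tied together by laws we will use throughout: updating by a difference recovers the new value, and composing valid changes updates in one step. As a sketch, using the Haskell encoding of change structures introduced later in Sec. 11.1 (the property name is ours):

-- For any change-structure type: v1 ⊕ (v2 ⊖ v1) == v2.
prop_diffUpdate :: (Eq t, ChangeStruct t) ⇒ t → t → Bool
prop_diffUpdate v1 v2 = v1 ⊕ (v2 ⊖ v1) == v2

-- And if dv1 goes from v1 to v2 and dv2 from v2 to v3, then
-- v1 ⊕ (dv1 ⊚ dv2) == v3 == (v1 ⊕ dv1) ⊕ dv2.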


Example 10.3.1 (Changes on integers and bags)
To show how incrementalization affects our example, we next describe valid changes for integers and bags. For now, a change das to a bag as1 simply contains all elements to be added to the initial bag as1 to obtain the updated bag as2 (we'll ignore removing elements for this section and discuss it later). In our example, the change from xs1 (that is, {{1, 2, 3}}) to xs2 (that is, {{1, 1, 2, 3}}) is dxs = {{1}}, while the change from ys1 (that is, {{4}}) to ys2 (that is, {{4, 5}}) is dys = {{5}}. To represent the output change ds from s1 to s2, we need integer changes: for now, we represent integer changes as integers, and define ⊕ on integers as addition: v1 ⊕ dv = v1 + dv.

For both bags and integers, a change dv is always valid between v1 and v2 = v1 ⊕ dv; for other changes, however, validity will be more restrictive.

Example 10.3.2 (Changes on naturals)
For instance, say we want to define changes on a type of natural numbers, and we still want to have v1 ⊕ dv = v1 + dv. A change from 3 to 2 should still be −1, so the type of changes must be Z. But the result of ⊕ should still be a natural, that is, an integer ≥ 0: to ensure that v1 ⊕ dv ≥ 0, we need to require that dv ≥ −v1. We use this requirement to define validity on naturals: dv ⊳ v1 ↪ v1 + dv : N is defined as equivalent to dv ≥ −v1. We can guarantee equation v1 ⊕ dv = v1 + dv not for all changes, but only for valid changes. Conversely, if a change dv is invalid for v1, then v1 + dv < 0. We then define v1 ⊕ dv to be 0, though any other definition on invalid changes would work.¹
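A small executable sketch of this change structure (the names and the explicit validity predicate are ours, added for illustration):

newtype Nat = Nat Int   -- invariant: the wrapped Int is ≥ 0

type DNat = Int         -- changes on naturals are integers

-- dv is valid from Nat v1 exactly when v1 + dv ≥ 0.
validNat :: Nat → DNat → Bool
validNat (Nat v1) dv = dv >= negate v1

-- On valid changes, ⊕ is addition; on invalid ones we pick 0.
oplusNat :: Nat → DNat → Nat
oplusNat (Nat v1) dv
  | dv >= negate v1 = Nat (v1 + dv)
  | otherwise       = Nat 0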

10.3.1 Incrementalizing with changes

After introducing changes and related notions, we describe how we incrementalize our example program.

We consider again the scenario of Sec. 10.2: we need to compute the updated output s2, the result of calling our program grandTotal on updated inputs xs2 and ys2, and we have the initial output s1 from calling our program on initial inputs xs1 and ys1. In this scenario, we can compute s2 non-incrementally by calling grandTotal on the updated inputs, but we would like to obtain the same result faster. Hence, we compute s2 incrementally: that is, we first compute the output change ds from s1 to s2, then we update the old output s1 by change ds. Successful incremental computation must compute the correct s2 asymptotically faster than non-incremental computation. This speedup is possible because we take advantage of the computation already done to compute s1.

To compute the output change ds from s1 to s2, we propose to transform our base program grandTotal to a new program dgrandTotal, which we call the derivative of grandTotal: to compute ds, we call dgrandTotal on initial inputs and their respective changes. Unlike other approaches to incrementalization, dgrandTotal is a regular program in the same language as grandTotal, hence it can be further optimized with existing technology.

Below we give the code for dgrandTotal, and show that, in this example, incremental computation computes s2 correctly.

For ease of reference, we recall inputs, changes and outputs:

xs1 = {{1, 2, 3}}
dxs = {{1}}
xs2 = {{1, 1, 2, 3}}
ys1 = {{4}}
dys = {{5}}

¹In fact, we could leave ⊕ undefined on invalid changes. Our original presentation [Cai et al. 2014], in essence, restricted ⊕ to valid changes through dependent types, by ensuring that applying it to invalid changes would be ill-typed. Later, Huesca [2015], in similar developments, simply made ⊕ partial on its domain instead of restricting the domain, achieving similar results.


ys2 = {{4, 5}}
s1 = grandTotal xs1 ys1 = 10

s2 = grandTotal xs2 ys2 = 16

Incremental computation uses the following definitions to compute s2 correctly and with fewer steps, as desired:

dgrandTotal xs dxs ys dys = sum (merge dxs dys)
ds = dgrandTotal xs1 dxs ys1 dys
   = sum (merge {{1}} {{5}}) = sum {{1, 5}} = 6
s2 = s1 ⊕ ds = s1 + ds
   = 10 + 6 = 16

Incremental computation should be asymptotically faster than non-incremental computation; hence, the derivative we run should be asymptotically faster than the base program. Here, derivative dgrandTotal is faster simply because it ignores initial inputs altogether. Therefore, its time complexity depends only on the total size of changes, dn. In particular, the complexity of dgrandTotal is Θ(dn) = o(n).

We generate derivatives through a program transformation from terms to terms, which we call differentiation (or, sometimes, simply derivation). We write D⟦t⟧ for the result of differentiating term t. We apply D⟦–⟧ to terms of our non-incremental programs, or base terms, such as grandTotal. To define differentiation, we assume that we already have derivatives for the primitive functions they use; we discuss later how to write such derivatives by hand.

We define differentiation in Definition 10.5.1; some readers might prefer to peek ahead, but we prefer to first explain what differentiation is supposed to do.

A derivative of a function can be applied to initial inputs and changes from initial inputs to updated inputs, and returns a change from an initial output to an updated output. For instance, take derivative dgrandTotal, initial inputs xs1 and ys1, and changes dxs and dys from initial inputs to updated inputs. Then change dgrandTotal xs1 dxs ys1 dys, that is ds, goes from initial output grandTotal xs1 ys1, that is s1, to updated output grandTotal xs2 ys2, that is s2. And because ds goes from s1 to s2, it follows as a corollary that s2 = s1 ⊕ ds. Hence, we can compute s2 incrementally through s1 ⊕ ds, as we have shown, rather than by evaluating grandTotal xs2 ys2.

We often just say that a derivative of function f maps changes to the inputs of f to changes to the outputs of f, leaving the initial inputs implicit. In short:

Slogan 10.3.3
Term D⟦t⟧ maps input changes to output changes. That is, D⟦t⟧ applied to initial base inputs and valid input changes (from initial inputs to updated inputs) gives a valid output change from t applied on old inputs to t applied on new inputs.

For a generic unary function f : A → B, the behavior of D⟦f⟧ can be described as

f a2 ≃ f a1 ⊕ D⟦f⟧ a1 da    (10.1)

or as

f (a1 ⊕ da) ≃ f a1 ⊕ D⟦f⟧ a1 da    (10.2)

where da is a metavariable² standing for a valid change from a1 to a2 (with a1, a2 : A), and where ≃ denotes denotational equivalence (Definition A.2.5). Moreover, D⟦f⟧ a1 da is also a valid change, and can hence be used as an argument for operations that require valid changes. These equations


follow from Theorem 12.2.2 and Corollary 13.4.5; we iron out the few remaining details to obtain these equations in Sec. 14.1.2.

²Nitpick: if da is read as an object variable, denotational equivalence will detect that these terms are not equivalent if da maps to an invalid change. Hence, we said that da is a metavariable. Later we define denotational equivalence for valid changes (Definition 14.1.2), which gives a less cumbersome way to state such equations.

In our example, we have applied D⟦–⟧ to grandTotal and simplified the result via β-reduction to produce dgrandTotal, as we show in Sec. 10.6. Correctness of D⟦–⟧ guarantees that sum (merge dxs dys) evaluates to a change from sum (merge xs ys), evaluated on old inputs xs1 and ys1, to sum (merge xs ys), evaluated on new inputs xs2 and ys2.

In this section, we have sketched the meaning of differentiation informally. We discuss incrementalization on higher-order terms in Sec. 10.4, and actually define differentiation in Sec. 10.5.

10.4 Differentiation on open terms and functions

We have shown that applying D⟦–⟧ to closed functions produces their derivatives. However, D⟦–⟧ is defined for all terms, hence also for open terms and for non-function types.

Open terms Γ ⊢ t : τ are evaluated with respect to an environment for Γ, and when this environment changes, the result of t changes as well. D⟦t⟧ computes the change to t's output: if Γ ⊢ t : τ, evaluating term D⟦t⟧ requires as input a change environment dρ : ⟦∆Γ⟧, containing changes from the initial environment ρ1 : ⟦Γ⟧ to the updated environment ρ2 : ⟦Γ⟧. The (environment) input change dρ is mapped by D⟦t⟧ to output change dv = ⟦D⟦t⟧⟧ dρ, a change from initial output ⟦t⟧ ρ1 to updated output ⟦t⟧ ρ2. If t is a function, dv maps in turn changes to the function arguments to changes to the function result. All this behavior, again, follows our slogan.

Environment changes contain changes for each variable in Γ. More precisely, if variable x appears with type τ in Γ, and hence in ρ1, ρ2, then dx appears with type ∆τ in ∆Γ, and hence in dρ. Moreover, dρ extends ρ1, to provide D⟦t⟧ with initial inputs and not just with their changes.

The two environments ρ1 and ρ2 can share entries; in particular, environment change dρ can be a nil change from ρ to ρ. For instance, Γ can be empty: then ρ1 and ρ2 are also empty (since they match Γ) and equal, so dρ is a nil change. Alternatively, some or all the change entries in dρ can be nil changes. Whenever dρ is a nil change, ⟦D⟦t⟧⟧ dρ is also a nil change.

If t is a function, D⟦t⟧ will be a function change. Changes to functions, in turn, map input changes to output changes, following our Slogan 10.3.3. If a change df from f1 to f2 is applied (via df a1 da) to an input change da from a1 to a2, then df will produce a change dv = df a1 da from v1 = f1 a1 to v2 = f2 a2. The definition of function changes is recursive on types: that is, dv can in turn be a function change, mapping input changes to output changes.

Derivatives are a special case of function changes: a derivative df is simply a change from f to f itself, which maps input changes da from a1 to a2 to output changes dv = df a1 da from f a1 to f a2. This definition coincides with the earlier definition of derivatives, and it also coincides with the definition of function changes for the special case where f1 = f2 = f. That is why D⟦t⟧ produces derivatives if t is a closed function term: we can only evaluate D⟦t⟧ against a nil environment change, producing a nil function change.

Since the concept of function changes can be surprising, we examine it more closely next.

10.4.1 Producing function changes

A first-class function can close over free variables that can change, hence function values themselves can change; we therefore introduce function changes to describe these changes.

For instance, term tf = λx → x + y is a function that closes over y, so different values v for y give rise to different values for f = ⟦tf⟧ (y = v). Take a change dv from v1 = 5 to v2 = 6: different inputs v1 and v2 for y give rise to different outputs f1 = ⟦tf⟧ (y = v1) and f2 = ⟦tf⟧ (y = v2). We describe the difference between outputs f1 and f2 through a function change df from f1 to f2. Consider again Slogan 10.3.3 and how it applies to term tf:

Slogan 10.3.3
Term D⟦t⟧ maps input changes to output changes. That is, D⟦t⟧ applied to initial base inputs and valid input changes (from initial inputs to updated inputs) gives a valid output change from t applied on old inputs to t applied on new inputs.

Since y is free in tf, the value for y is an input of tf. So, continuing our example, dtf = D⟦tf⟧ must map a valid input change dv from v1 to v2 for variable y to a valid output change df from f1 to f2; more precisely, we must have df = ⟦dtf⟧ (y = v1, dy = dv).
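To make this concrete, here is a small sketch in Haskell, modeling the closure and its change directly (representing tf as a function of its free variable y first is our encoding choice):

-- tf = λx → x + y, modeled as a function taking y first.
f :: Int → Int → Int
f y x = x + y

-- dtf maps base input y and its change dy to a function change:
-- given base argument x and argument change dx, it returns the
-- change to the result x + y.
df :: Int → Int → (Int → Int → Int)
df _y dy = λ_x dx → dx + dy

-- With dv = 1 from v1 = 5 to v2 = 6, and x unchanged at 10:
-- f 5 10 = 15, f 6 10 = 16, and f 5 10 + df 5 1 10 0 = 16.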

10.4.2 Consuming function changes

Function changes can not only be produced, but also consumed in programs obtained from D⟦–⟧. We discuss next how.

As discussed, we consider the value for y as an input to tf = λx → x + y. However, we also choose to consider the argument for x as an input (of a different sort) to tf = λx → x + y, and we require our Slogan 10.3.3 to apply to input x too. While this might sound surprising, it works out well. Specifically, since df = ⟦dtf⟧ (y = v1, dy = dv) is a change from f1 to f2, we require df a1 da to be a change from f1 a1 to f2 a2: so df maps base input a1 and input change da to output change df a1 da, satisfying the slogan.

More in general, any valid function change df from f1 to f2 (where f1, f2 : ⟦σ → τ⟧) must in turn be a function that maps an input a1 and a change da, valid from a1 to a2, to a valid change df a1 da from f1 a1 to f2 a2.

This way, to satisfy our slogan on application t = tf x, we can simply define D⟦–⟧ so that D⟦tf x⟧ = D⟦tf⟧ x dx. Then

⟦D⟦tf x⟧⟧ (y = v1, dy = dv, x = a1, dx = da) = ⟦D⟦tf⟧⟧ (y = v1, dy = dv) a1 da = df a1 da

As required, that's a change from f1 a1 to f2 a2. Overall, valid function changes preserve validity, just like D⟦t⟧ in Slogan 10.3.3, and map valid input changes to valid output changes. In turn, output changes can be function changes: since they are valid, they in turn map valid changes to their inputs to valid output changes (as we'll see in Lemma 12.1.10). We'll later formalize this, and define validity by recursion on types, that is, as a logical relation (see Sec. 12.1.4).

10.4.3 Pointwise function changes

It might seem more natural to describe a function change df′ between f1 and f2 via df′ = λx → f2 x ⊖ f1 x. We call such a df′ a pointwise change. While some might find pointwise changes a more natural concept, the output of differentiation often needs to compute f2 x2 ⊖ f1 x1, using a change df from f1 to f2 and a change dx from x1 to x2. Function changes perform this job directly. We discuss this point further in Sec. 15.3.

10.4.4 Passing change targets

It would be more symmetric to make function changes also take the updated input a2, that is, to have df a1 da a2 compute a change from f1 a1 to f2 a2. However, passing a2 explicitly adds no information: the value a2 can be computed from a1 and da as a1 ⊕ da. Indeed, in various cases a function change can compute its required output without actually computing a1 ⊕ da. Since we expect the sizes of a1 and a2 to be asymptotically larger than the size of da, actually computing a2 could be expensive.³ Hence, we stick to our asymmetric form of function changes.

10.5 Differentiation informally

Next, we define differentiation and explain informally why it does what we want. We then give an example of how differentiation applies to our example. A formal proof will follow soon in Sec. 12.2, justifying more formally why this definition is correct, but we proceed more gently here.

Definition 10.5.1 (Differentiation)
Differentiation is the following term transformation:

D⟦λ(x : σ) → t⟧ = λ(x : σ) (dx : ∆σ) → D⟦t⟧
D⟦s t⟧ = D⟦s⟧ t D⟦t⟧
D⟦x⟧ = dx
D⟦c⟧ = DC⟦c⟧

where DC⟦c⟧ defines differentiation on primitives and is provided by language plugins (see Appendix A.2.5), and dx stands for a variable generated by prefixing x's name with d, so that D⟦y⟧ = dy and so on.
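As a concrete sketch, the transformation can be implemented as a structural recursion over a toy term representation (the AST and helper names below are ours; deriveConst stands for the plugin-provided DC⟦–⟧):

data Term
  = Var String       -- x
  | Lam String Term  -- λx → t (types omitted in this sketch)
  | App Term Term    -- s t
  | Const String     -- a primitive, identified by name

-- Differentiation, following Definition 10.5.1.
derive :: Term → Term
derive (Var x)   = Var ("d" ++ x)                     -- D⟦x⟧ = dx
derive (Lam x t) = Lam x (Lam ("d" ++ x) (derive t))  -- D⟦λx → t⟧
derive (App s t) = App (App (derive s) t) (derive t)  -- D⟦s t⟧ = D⟦s⟧ t D⟦t⟧
derive (Const c) = deriveConst c

-- Plugin hook, standing for DC⟦–⟧: derivatives of primitives.
deriveConst :: String → Term
deriveConst c = Const ("d" ++ c)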

If we extend the language with (non-recursive) let-bindings, we can give derived rules for it, such as:

D⟦let x = t1 in t2⟧ = let x = t1
                          dx = D⟦t1⟧
                      in D⟦t2⟧

In Sec. 15.1, we will explain that the same transformation rules apply for recursive let-bindings.

If t contains occurrences of both (say) x and dx, capture issues arise in D⟦t⟧. We defer these issues to Sec. 12.3.4, and assume throughout that such issues can be avoided by α-renaming and the Barendregt convention [Barendregt 1984].

This transformation might seem deceptively simple. Indeed, pure λ-calculus only handles binding and higher-order functions, leaving "real work" to primitives. Similarly, our transformation incrementalizes binding and higher-order functions, leaving "real work" to derivatives of primitives. However, our support of λ-calculus allows gluing primitives together. We'll discuss later how we add support for various primitives and families of primitives.

Now we try to motivate the transformation informally. We claimed that D⟦–⟧ must satisfy Slogan 10.3.3, which reads:

Slogan 10.3.3
Term D⟦t⟧ maps input changes to output changes. That is, D⟦t⟧ applied to initial base inputs and valid input changes (from initial inputs to updated inputs) gives a valid output change from t applied on old inputs to t applied on new inputs.

Let's analyze the definition of D⟦–⟧ by case analysis of the input term u. In each case, we assume that our slogan applies to any subterms of u, and sketch why it applies to u itself.

³We show later efficient change structures where ⊕ reuses part of a1 to output a2 in logarithmic time.


• If u = x: by our slogan, D⟦x⟧ must evaluate to the change of x when inputs change, so we set D⟦x⟧ = dx.

• If u = c: we simply delegate differentiation to DC⟦c⟧, which is defined by plugins. Since plugins can define arbitrary primitives, they need to provide their derivatives.

• If u = λx → t, then u introduces a function. Assume for simplicity that u is a closed term. Then D⟦t⟧ evaluates to the change of the result of this function u, evaluated in a context binding x and its change dx. Then, because of how function changes are defined, the change of u is the change of output t as a function of the base input x and its change dx, that is, D⟦u⟧ = λx dx → D⟦t⟧.

• If u = s t, then s is a function. Assume for simplicity that u is a closed term. Then D⟦s⟧ evaluates to the change of s, as a function of D⟦s⟧'s base input and input change. So we apply D⟦s⟧ to its actual base input t and actual input change D⟦t⟧, and obtain D⟦s t⟧ = D⟦s⟧ t D⟦t⟧.

This is not quite a correct proof sketch, because of many issues, but we fix these issues with our formal treatment in Sec. 12.2. In particular, in the case for abstraction u = λx → t, D⟦t⟧ depends not only on x and dx, but also on the other free variables of u and their changes. Similarly, we must deal with free variables also in the case for application u = s t. But first, we apply differentiation to our earlier example.

10.6 Differentiation on our example

To exemplify the behavior of differentiation concretely, and to help fix ideas for later discussion, in this section we show what the derivative of grandTotal looks like.

grandTotal = λxs ys → sum (merge xs ys)
s = grandTotal {{1, 2, 3}} {{4}} = 10

Differentiation is a structurally recursive program transformation, so we first compute D⟦merge xs ys⟧. To compute its change, we simply call the derivative of merge, that is dmerge, and apply it to the base inputs and their changes; hence, we write

D⟦merge xs ys⟧ = dmerge xs dxs ys dys

As we'll see better later, we can define function dmerge as

dmerge = λxs dxs ys dys → merge dxs dys

so D⟦merge xs ys⟧ can be simplified by β-reduction to merge dxs dys:

D⟦merge xs ys⟧
  = dmerge xs dxs ys dys
  =β (λxs dxs ys dys → merge dxs dys) xs dxs ys dys
  =β merge dxs dys

Let's next derive sum (merge xs ys). First, as above, the derivative of sum zs would be dsum zs dzs, which depends on base input zs and its change dzs. As we'll see, dsum zs dzs can simply call sum on dzs, so dsum zs dzs = sum dzs. To derive sum (merge xs ys), we must call

Chapter 10 Introduction to dierentiation 65

the derivative of sum, that is dsum, on its base argument and its change: that is, on merge xs ys and D⟦merge xs ys⟧. We can later simplify again by β-reduction and obtain

D⟦sum (merge xs ys)⟧
  = dsum (merge xs ys) D⟦merge xs ys⟧
  =β sum D⟦merge xs ys⟧
  = sum (dmerge xs dxs ys dys)
  =β sum (merge dxs dys)

Here we see that the output of differentiation is defined in a bigger typing context: while merge xs ys only depends on base inputs xs and ys, D⟦merge xs ys⟧ also depends on their changes. This property extends beyond the examples we just saw: if a term t is defined in context Γ, then the output of derivation D⟦t⟧ is defined in context Γ, ∆Γ, where ∆Γ is a context that binds a change dx for each base input x bound in the context Γ.

Next we consider λxs ys → sum (merge xs ys). Since xs, dxs, ys, dys are free in D⟦sum (merge xs ys)⟧ (ignoring later optimizations), term D⟦λxs ys → sum (merge xs ys)⟧ must bind all those variables:

D⟦λxs ys → sum (merge xs ys)⟧
  = λxs dxs ys dys → D⟦sum (merge xs ys)⟧
  =β λxs dxs ys dys → sum (merge dxs dys)

Next, we need to transform the binding of grandTotal to its body b = λxs ys → sum (merge xs ys). We copy this binding, and add a new additional binding from dgrandTotal to the derivative of b:

grandTotal  = λxs ys      → sum (merge xs ys)
dgrandTotal = λxs dxs ys dys → sum (merge dxs dys)

Finally, we need to transform the binding of output s and its body. By iterating similar steps, in the end we get:

grandTotal  = λxs ys      → sum (merge xs ys)
dgrandTotal = λxs dxs ys dys → sum (merge dxs dys)
s  = grandTotal {{1, 2, 3}} {{4}}
ds = dgrandTotal {{1, 2, 3}} {{1}} {{4}} {{5}}
   =β sum (merge {{1}} {{5}})

Self-maintainability. Differentiation does not always produce efficient derivatives without further program transformations; in particular, derivatives might need to recompute results produced by the base program. In the above example, if we don't inline derivatives and use β-reduction to simplify programs, D⟦sum (merge xs ys)⟧ is just dsum (merge xs ys) D⟦merge xs ys⟧. A direct execution of this program will compute merge xs ys, which would waste time linear in the base inputs.

We'll show how to avoid such recomputations in general in Sec. 17.2, but here we can avoid computing merge xs ys, simply because dsum does not use its base argument: that is, dsum is self-maintainable. Without the approach described in Chapter 17, we are restricted to self-maintainable derivatives.


10.7 Conclusion and contributions

In this chapter, we have seen how a correct differentiation transform allows us to incrementalize programs.

10.7.1 Navigating this thesis part

Differentiation. This chapter and Chapters 11 to 14 form the core of incrementalization theory, with other chapters building on top of them. We study incrementalization for STLC; we summarize our formalization of STLC in Appendix A. Together with Chapter 10, Chapter 11 introduces the overall approach to incrementalization informally. Chapters 12 and 13 show that incrementalization using ILC is correct. Building on a set-theoretic semantics, these chapters also develop the theory underlying this correctness proof, together with further results. Equational reasoning on terms is then developed in Chapter 14.

Chapters 12 to 14 contain full formal proofs; readers are welcome to skip or skim those proofs where appropriate. For ease of reference, and to help navigation and skimming, these chapters highlight and number all definitions, notations, theorem statements and so on, and we strive not to hide important definitions inside theorems.

Later chapters build on this core, but are again independent from each other.

Extensions and theoretical discussion. Chapter 15 discusses a few assorted aspects of the theory that do not fit elsewhere and do not suffice for standalone chapters. We show how to differentiate general recursion (Sec. 15.1); we exhibit a function change that is not valid for any function (Sec. 15.2); we contrast our representation of function changes with pointwise function changes (Sec. 15.3); and we compare our formalization with the one presented in [Cai et al. 2014] (Sec. 15.4).

Performance of differentiated programs. Chapter 16 studies how to apply differentiation (as introduced in previous chapters) to incrementalize a case study, and empirically evaluates performance speedups.

Differentiation with cache-transfer-style conversion. Chapter 17 is self-contained, even though it builds on the rest of the material; it summarizes the basics of ILC in Sec. 17.2.1. Terminology in that chapter is sometimes subtly different from the rest of the thesis.

Towards differentiation for System F. Chapter 18 outlines how to extend differentiation to System F and suggests a proof avenue. Differentiation for System F requires a generalization of change structures to changes across different types (Sec. 18.4), which appears of independent interest, though further research remains to be done.

Related work, conclusions and appendixes. Finally, Sec. 17.6 and Chapter 19 discuss related work, and Chapter 20 concludes this part of the thesis.

10.7.2 Contributions

In Chapters 11 to 14 and Chapter 16, we make the following contributions:

• We present a novel mathematical theory of changes and derivatives, which is more general than other work in the field, because changes are first-class entities, they are distinct from base values, and they are defined also for functions.


• We present the first approach to incremental computation for pure λ-calculi by a source-to-source transformation, D, that requires no run-time support. The transformation produces an incremental program in the same language: all optimization techniques for the original program are applicable to the incremental program as well.

• We prove that our incrementalizing transformation D is correct, by a novel machine-checked logical relation proof, mechanized in Agda.

• While we focus mainly on the theory of changes and derivatives, we also perform a performance case study. We implement the derivation transformation in Scala, with a plug-in architecture that can be extended with new base types and primitives. We define a plugin with support for different collection types, and use the plugin to incrementalize a variant of the MapReduce programming model [Lämmel 2007]. Benchmarks show that, on this program, incrementalization can reduce asymptotic complexity and can turn O(n) performance into O(1), improving running time by over 4 orders of magnitude on realistic inputs (Chapter 16).

In Chapter 17, we make the following contributions:

• via examples, we motivate extending ILC to remember intermediate results (Sec. 17.2);

• we give a novel proof of correctness for ILC for untyped λ-calculus, based on step-indexed logical relations (Sec. 17.3.3);

• building on top of ILC-style differentiation, we show how to transform untyped higher-order programs to cache-transfer-style (CTS) (Sec. 17.3.5);

• we show through formal proofs that programs and derivatives in cache-transfer style simulate correctly their non-CTS variants (Sec. 17.3.6);

• we perform performance case studies (in Sec. 17.4), applying (by hand) an extension of this technique to Haskell programs, and we incrementalize efficiently also programs that do not admit self-maintainable derivatives.

Chapter 18 describes how to extend differentiation to System F. To this end, we extend change structures to allow changes where source and destination have different types, and we define more powerful combinators for change structures. While the results in this chapter call for further research, we consider them exciting.

Appendix C proposes the first correctness proofs for ILC via operational methods and (step-indexed) logical relations, for simply-typed λ-calculus (without and with general recursion) and for untyped λ-calculus. A later variant of this proof, adapted for use of cache-transfer style, is presented in Chapter 17, but we believe the presentation in Appendix C might also be interesting for theorists, as it explains how to extend ILC correctness proofs with fewer extraneous complications. Nevertheless, given the similarity between the proofs in Appendix C and Chapter 17, we relegate the former to an appendix.

Finally, Appendix D shows how to implement all change operations on function changes efficiently, by defunctionalizing functions and function changes, rather than using mere closure conversion.


Chapter 11

A tour of differentiation examples

Before formalizing ILC, we show more examples of change structures and primitives, to illustrate (a) designs for reusable primitives and their derivatives, (b) to what extent we can incrementalize basic building blocks such as recursive functions and algebraic data types, and (c) how we can incrementalize collections efficiently. We make no attempt at incrementalizing a complete collection API here; we discuss more complete implementations briefly in Chapter 16 and Sec. 17.4.

To describe these examples informally, we use Haskell notation and let-polymorphism, as appropriate (see Appendix A.2).

We also motivate a few extensions to differentiation that we describe later. As we'll see in this chapter, we'll need to enable some forms of introspection on function changes to manipulate the embedded environments, as we discuss in Appendix D. We will also need ways to remember intermediate results, which we will discuss in Chapter 17. We will also use overly simplified change structures to illustrate a few points.

11.1 Change structures as type-class instances

We encode change structures, as sketched earlier in Sec. 10.3, through a type class named ChangeStruct. An instance ChangeStruct t defines a change type ∆t as an associated type, and operations ⊕, ⊖ and ⊚ are defined as methods. We also define method oreplace, such that oreplace v2 produces a replacement change from any source to v2; by default, v2 ⊖ v1 is simply an alias for oreplace v2.

class ChangeStruct t where
  type ∆t
  (⊕) :: t → ∆t → t
  oreplace :: t → ∆t
  (⊖) :: t → t → ∆t
  v2 ⊖ v1 = oreplace v2
  (⊚) :: ∆t → ∆t → ∆t
  0 :: t → ∆t  -- 0 v is the nil change of v

In this chapter, we will often show change structures where only some methods are defined; in actual implementations, we use a type-class hierarchy to encode what operations are available, but we collapse this hierarchy here to simplify presentation.

69


11.2 How to design a language plugin

When adding support for a datatype T, we will strive to define both a change structure and derivatives of introduction and elimination forms for T, since such forms constitute a complete API for using that datatype. However, we will sometimes have to restrict elimination forms to scenarios that can be incrementalized efficiently.

In general, to differentiate a primitive f : A → B, once we have defined a change structure for A, we can start by defining

df a1 da = f (a1 ⊕ da) ⊖ f a1    (11.1)

where da is a valid change from a1 to a2. We then try to simplify and rewrite the expression using equational reasoning, so that it does not refer to ⊖ any more, as far as possible. We can assume that all argument changes are valid, especially if that allows producing faster derivatives; we formalize equational reasoning for valid changes in Sec. 14.1.1. In fact, instead of defining and simplifying f a2 ⊖ f a1 to not use ⊖, it is sufficient to produce a change from f a1 to f a2, even a different one. We write da1 =∆ da2 to mean that changes da1 and da2 are equivalent, that is, they have the same source and destination. We define this concept properly in Sec. 14.2.

We try to avoid running ⊖ on arguments of non-constant size, since it might easily take time linear or superlinear in the argument sizes; if ⊖ produces replacement values, it completes in constant time, but derivatives invoked on the produced changes are not efficient.
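As a tiny worked instance of this recipe (our example), take a primitive f x = x · 10 over integers, with the change structure of Example 10.3.1, where ⊕ is + and ⊖ is subtraction; Eq. (11.1) then simplifies to a derivative that never mentions ⊖:

f :: Int → Int
f x = x * 10

-- df x dx = f (x ⊕ dx) ⊖ f x = (x + dx) * 10 - x * 10 = dx * 10
df :: Int → Int → Int
df _x dx = dx * 10  -- constant time; ⊖ has been rewritten away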

11.3 Incrementalizing a collection API

In this section, we describe a collection API that we incrementalize (partially) in this chapter.

To avoid notation conflicts, we represent lists via datatype List a, defined as follows:

data List a = Nil | Cons a (List a)

We also consider as a primitive operation a standard mapping function, map. We also support two restricted forms of aggregation: (a) folding over an abelian group via fold, similar to how one usually folds over a monoid;¹ (b) list concatenation via concat. We will not discuss how to differentiate concat, as we reuse existing solutions by Firsov and Jeltsch [2016].

singleton :: a → List a
singleton x = Cons x Nil
map :: (a → b) → List a → List b
map f Nil = Nil
map f (Cons x xs) = Cons (f x) (map f xs)
fold :: AbelianGroupChangeStruct b ⇒ List b → b
fold Nil = mempty
fold (Cons x xs) = x <> fold xs  -- where <> is infix for mappend
concat :: List (List a) → List a
concat = ...

While usually fold requires only an instance Monoid b of type class Monoid to aggregate collection elements, our variant of fold requires an instance of type class AbelianGroupChangeStruct, a subclass of Monoid. This type class is not used by fold itself, but only by its derivative, as we explain in Sec. 11.3.2; nevertheless, we add this stronger constraint to fold itself, because we forbid derivatives

¹https://hackage.haskell.org/package/base-4.9.1.0/docs/Data-Foldable.html


with stronger type-class constraints. With this approach, all clients of fold can be incrementalized using differentiation.

Using those primitives, one can define further higher-order functions on collections, such as concatMap, filter, foldMap. In turn, these functions form the kernel of a collection API, as studied for instance by work on the monoid comprehension calculus [Grust and Scholl 1996a; Fegaras and Maier 1995; Fegaras 1999], even if they are not complete by themselves.

concatMap :: (a → List b) → List a → List b
concatMap f = concat ∘ map f
filter :: (a → Bool) → List a → List a
filter p = concatMap (λx → if p x then singleton x else Nil)
foldMap :: AbelianGroupChangeStruct b ⇒ (a → b) → List a → b
foldMap f = fold ∘ map f

In first-order DSLs such as SQL, such functionality must typically be added through separate primitives (consider, for instance, filter), while here we can simply define, for instance, filter on top of concatMap, and incrementalize the resulting definitions using differentiation.

Function filter uses conditionals, which we haven't discussed yet; we show how to incrementalize filter successfully in Sec. 11.6.

11.3.1 Changes to type-class instances

In this whole chapter, we assume that type-class instances, such as fold's AbelianGroupChangeStruct argument, do not undergo changes. Since type-class instances are closed top-level definitions of operations, and are canonical for a datatype, it is hard to imagine a change to a type-class instance. On the other hand, type-class instances can be encoded as first-class values. We can, for instance, imagine a fold taking a unit value and an associative operation as arguments. In such scenarios, one needs additional effort to propagate changes to operation arguments, similarly to changes to the function argument to map.

11.3.2 Incrementalizing aggregation

Let's now discuss how to incrementalize fold. We consider an oversimplified change structure that allows only two sorts of changes, prepending an element to a list or removing the list head of a non-empty list, and study how to incrementalize fold for such changes:

data ListChange a = Prepend a | Remove
instance ChangeStruct (List a) where
  type ∆(List a) = ListChange a
  xs ⊕ Prepend x = Cons x xs
  (Cons x xs) ⊕ Remove = xs
  Nil ⊕ Remove = error "Invalid change"


Removing an element from an empty list is an invalid change, hence it is safe to give an error in that scenario, as mentioned when introducing ⊕ (Sec. 10.3).

By using equational reasoning, as suggested in Sec. 11.2, starting from Eq. (11.1), one can show formally that dfold xs (Prepend x) should be a change that, in a sense, "adds" x to the result, using group operations:


dfold xs (Prepend x)
  =∆ fold (xs ⊕ Prepend x) ⊖ fold xs
  =  fold (Cons x xs) ⊖ fold xs
  =  (x <> fold xs) ⊖ fold xs

Similarly, dfold (Cons x xs) Remove should instead "subtract" x from the result:

dfold (Cons x xs) Remove
  =∆ fold ((Cons x xs) ⊕ Remove) ⊖ fold (Cons x xs)
  =  fold xs ⊖ fold (Cons x xs)
  =  fold xs ⊖ (x <> fold xs)

As discussed, using ⊖ is fast enough on, say, integers or other primitive types, but not in general. To avoid using ⊖, we must rewrite its invocation to an equivalent expression. In this scenario, we can use group changes for abelian groups, and restrict fold to situations where such changes are available:

dfold :: AbelianGroupChangeStruct b ⇒ List b → ∆(List b) → ∆b
dfold xs (Prepend x) = inject x
dfold (Cons x xs) Remove = inject (invert x)
dfold Nil Remove = error "Invalid change"

To support group changes, we define the following type classes to model abelian groups and group change structures, omitting APIs for more general groups. AbelianGroupChangeStruct only requires that group elements of type a can be converted into changes (type ∆a), allowing change type ∆a to contain other sorts of changes.

class Monoid g ⇒ AbelianGroup g where
  invert :: g → g

class (AbelianGroup a, ChangeStruct a) ⇒
    AbelianGroupChangeStruct a where
  -- Inject group elements into changes. Law:
  --   a ⊕ inject b = a <> b
  inject :: a → ∆a
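For example, integers under addition fit these classes; the sketch below wraps them in a newtype to pick the additive monoid (the instance bodies are ours, and we elide the Semigroup boilerplate modern GHC would require):

newtype SumInt = SumInt Int

instance Monoid SumInt where
  mempty = SumInt 0
  mappend (SumInt a) (SumInt b) = SumInt (a + b)

instance AbelianGroup SumInt where
  invert (SumInt a) = SumInt (negate a)

instance ChangeStruct SumInt where
  type ∆SumInt = Int
  SumInt a ⊕ da = SumInt (a + da)

instance AbelianGroupChangeStruct SumInt where
  inject (SumInt b) = b  -- a ⊕ inject b = a <> b holds by arithmetic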

Chapter 16 discusses how we can use group changes without assuming a single group is defined on elements; here we simply select the canonical group, as chosen by type-class resolution. To use a different group, as usual, one defines a different but isomorphic type via the Haskell newtype construct. As a downside, derivatives of newtype constructors must convert changes across the different representations.

Rewriting away ⊖ can also be possible for other specialized folds, though sometimes they can be incrementalized directly; for instance, map can be written as a fold. Incrementalizing map for the insertion of x into xs requires simplifying map f (Cons x xs) ⊖ map f xs. To avoid ⊖, we can rewrite this change statically to Insert (f x); indeed, we can incrementalize map also for more realistic change structures.

Associative tree folds. Other usages of fold over sequences produce results of small bounded size (such as integers). In this scenario, one can incrementalize the given fold efficiently using ⊖, instead of relying on group operations. For such scenarios, one can design a primitive

foldMonoid :: Monoid a ⇒ List a → a


for associative tree folds, that is, a function that folds over the input sequence using a monoid (that is, an associative operation with a unit). For efficient incrementalization, foldMonoid's intermediate results should form a balanced tree, and updating this tree should take logarithmic time; one approach to ensure this is to represent the input sequence itself using a balanced tree, such as a finger tree [Hinze and Paterson 2006].

Various algorithms store intermediate results of folding inside an input balanced tree, as described by Cormen et al. [2001, Ch. 14] or by Hinze and Paterson [2006]. But intermediate results can also be stored outside the input tree, as is commonly done using self-adjusting computation [Acar 2005, Sec. 9.1], or as can be done in our setting. While we do not use such folds, we describe the existing algorithms briefly and sketch how to integrate them in our setting.

Function foldMonoid must record the intermediate results, and the derivative dfoldMonoid must propagate input changes to the affected intermediate results.²

To study the time complexity of input change propagation, it is useful to consider the dependency graph of intermediate results: in this graph, an intermediate result v1 has an arc to intermediate result v2 if and only if computing v1 depends on v2. To ensure dfoldMonoid is efficient, the dependency graph of intermediate results from foldMonoid must form a balanced tree of logarithmic height, so that changes to a leaf only affect a logarithmic number of intermediate results.

In contrast, implementing foldMonoid using foldr on a list produces an unbalanced graph of intermediate results. For instance, take input list xs = [1 .. 8], containing the numbers from 1 to 8, and assume a given monoid. Summing them with foldr (<>) mempty xs means evaluating

1 <> (2 <> (3 <> (4 <> (5 <> (6 <> (7 <> (8 <> mempty)))))))

Then a change to the last element of input xs affects all intermediate results, hence incrementalization takes at least O(n). In contrast, using foldMonoid on xs should evaluate a balanced tree similar to

((1 <> 2) <> (3 <> 4)) <> ((5 <> 6) <> (7 <> 8))

so that any individual change to a leaf, insertion or deletion only affects O(log n) intermediate results (where n is the sequence size). Upon modifications to the tree, one must ensure that the balancing is stable [Acar 2005, Sec. 9.1]. In other words, altering the tree (by inserting or removing an element) must only alter O(log n) nodes.

We have implemented associative tree folds on very simple but unbalanced tree structures; we believe they could be implemented and incrementalized over balanced trees representing sequences, such as finger trees or random access zippers [Headley and Hammer 2016], but doing so requires transforming the implementation of their data structure to cache-transfer style (CTS) (Chapter 17). We leave this for future work, together with an automated implementation of the CTS transformation.
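To sketch the idea on the simple trees we used (the datatype and helpers are ours), each node caches the fold of its subtree, so updating one leaf only recomputes the caches on the path to the root:

data FoldTree a
  = Leaf a
  | Node a (FoldTree a) (FoldTree a)  -- caches the fold of both subtrees

cached :: FoldTree a → a
cached (Leaf x)     = x
cached (Node c _ _) = c

-- Smart constructor maintaining the cache.
node :: Monoid a ⇒ FoldTree a → FoldTree a → FoldTree a
node l r = Node (cached l <> cached r) l r

-- foldMonoid over such a tree is then just cached; updating a leaf
-- recomputes O(depth) caches, which is O(log n) if the tree is kept
-- balanced and stable.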

11.3.3 Modifying list elements

In this section, we consider another change structure on lists, which allows expressing changes to individual elements. Then we present dmap, a derivative of map for this change structure. Finally, we sketch informally the correctness of dmap, which we prove formally in Example 14.1.1.

We can represent changes to a list (List a) as a list of changes (List (∆a)), one for each element. A list change dxs is valid for source xs if they have the same length and each element change is valid for its corresponding element. For this change structure, we can define ⊕ and ⊚, but not a total ⊖: such list changes can't express the difference between two lists of different lengths. Nevertheless, this change structure is sufficient to define derivatives that act correctly

²We discuss in Chapter 17 how base functions communicate results to derivatives.


on the changes that can be expressed. We can describe this change structure in Haskell, using a type-class instance for class ChangeStruct:

instance ChangeStruct (List a) where
  type ∆(List a) = List (∆a)
  Nil ⊕ Nil = Nil
  (Cons x xs) ⊕ (Cons dx dxs) = Cons (x ⊕ dx) (xs ⊕ dxs)
  _ ⊕ _ = Nil

The following dmap function is a derivative for the standard map function (included for reference) and the given change structure. We discuss derivatives for recursive functions in Sec. 15.1.

map :: (a → b) → List a → List b
map f Nil = Nil
map f (Cons x xs) = Cons (f x) (map f xs)

dmap :: (a → b) → ∆(a → b) → List a → ∆(List a) → ∆(List b)
-- A valid list change has the same length as the base list.
dmap f df Nil Nil = Nil
dmap f df (Cons x xs) (Cons dx dxs) =
  Cons (df x dx) (dmap f df xs dxs)
-- Remaining cases deal with invalid changes; a dummy
-- result is sufficient.
dmap f df xs dxs = Nil

Function dmap is a correct derivative of map for this change structure, according to Slogan 10.3.3; we sketch an informal argument by induction. The equation for dmap f df Nil Nil returns Nil, a valid change from initial to updated outputs, as required. In the equation for dmap f df (Cons x xs) (Cons dx dxs), we compute changes to the head and tail of the result, to produce a change from map f (Cons x xs) to map (f ⊕ df) (Cons x xs ⊕ Cons dx dxs). To this end, (a) we use df x dx to compute a change to the head of the result, from f x to (f ⊕ df) (x ⊕ dx); (b) we use dmap f df xs dxs recursively to compute a change to the tail of the result, from map f xs to map (f ⊕ df) (xs ⊕ dxs); (c) we assemble the changes to head and tail with Cons into a change from map f (Cons x xs) to the updated output. In other words, dmap turns input changes to output changes correctly, according to our Slogan 10.3.3: it is a correct derivative for map, according to this change structure. We have reasoned informally; we formalize this style of reasoning in Sec. 14.1. Crucially, our conclusions only hold if input changes are valid: hence, term map f xs ⊕ dmap f df xs dxs is not denotationally equal to map (f ⊕ df) (xs ⊕ dxs) for arbitrary change environments; these two terms only evaluate to the same result for valid input changes.
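A quick usage sketch, with integer elements and integer changes as in Example 10.3.1 (the inputs and the function change below are our choices):

xs  = Cons 1 (Cons 2 (Cons 3 Nil))  -- base input
dxs = Cons 1 (Cons 0 (Cons 2 Nil))  -- element-wise changes

f :: Int → Int
f x = x * 2
df _x dx = dx * 2  -- a nil-like change to f: its derivative

-- dmap f df xs dxs = Cons 2 (Cons 0 (Cons 4 Nil)), and indeed
-- map f xs ⊕ dmap f df xs dxs = Cons 4 (Cons 4 (Cons 10 Nil))
--                             = map (f ⊕ df) (xs ⊕ dxs).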

Since this definition of dmap is a correct derivative, we could use it in an incremental DSL for list manipulation, together with other primitives. Because of limitations we describe next, we will use instead improved language plugins for sequences.

11.3.4 Limitations

We have shown simplified list changes, but they have a few limitations. Fixing those requires more sophisticated definitions.

As discussed, our list changes intentionally forbid changing the length of a list. And our definition of dmap has further limitations: a change to a list of n elements takes size O(n), even when most elements do not change, and calling dmap f df on it requires n calls to df. This is only faster if df is faster than f, but adds no further speedup.

We can describe instead a change to an arbitrary list element x in xs by giving the change dx and the position of x in xs. A list change is then a sequence of such changes:


type ∆(List a) = List (AtomicChange a)
data AtomicChange a = Modify ℤ (∆a)
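For illustration, ⊕ for this change structure could be defined along the following lines (a sketch of ours; the thesis does not spell it out), applying each atomic change in turn:

-- Apply a list of atomic changes, left to right.
xs ⊕ Nil          = xs
xs ⊕ (Cons c dcs) = applyAtomic xs c ⊕ dcs
  where
    applyAtomic (Cons x rest) (Modify 0 da) = Cons (x ⊕ da) rest
    applyAtomic (Cons x rest) (Modify i da) = Cons x (applyAtomic rest (Modify (i - 1) da))
    applyAtomic Nil           _             = Nil   -- invalid change: dummy result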

However, fetching the i-th list element still takes time linear in i: we need a better representation of sequences. In the next section, we switch to a change structure on sequences defined by finger trees [Hinze and Paterson 2006], following Firsov and Jeltsch [2016].

11.4 Efficient sequence changes

Firsov and Jeltsch [2016] define an efficient representation of list changes in a framework similar to ILC, and incrementalize selected operations over this change structure. They also provide combinators to assemble further operations on top of the provided ones. We extend their framework to handle function changes and generate derivatives for all functions that can be expressed in terms of the primitives.

Conceptually, a change for type Sequence a is a sequence of atomic changes. Each atomic change inserts one element at a given position, or removes one element, or changes an element at one position.³

data SeqSingleChange a
  = Insert   { idx :: ℤ, x :: a }
  | Remove   { idx :: ℤ }
  | ChangeAt { idx :: ℤ, dx :: ∆a }

data SeqChange a = SeqChange (Sequence (SeqSingleChange a))
type ∆(Sequence a) = SeqChange a

We use Firsov and Jeltsch's variant of this change structure in Sec. 17.4.
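To build intuition for how such changes act on actual sequences, here is a small runnable sketch of ours using Data.Sequence from the containers package, instantiated at Int so that element changes are just deltas; the SingleChange type and helper names are ours, not Firsov and Jeltsch's:

import Data.Sequence (Seq, insertAt, deleteAt, adjust')
import Data.Foldable (foldl')

data SingleChange = Ins Int Int | Rem Int | ChAt Int Int  -- positions and deltas

-- Apply one atomic change to a sequence of Ints.
applySingle :: Seq Int → SingleChange → Seq Int
applySingle s (Ins i x)   = insertAt i x s
applySingle s (Rem i)     = deleteAt i s
applySingle s (ChAt i dx) = adjust' (+ dx) i s   -- assuming ∆Int = Int, ⊕ = (+)

-- Apply a whole change, i.e. a list of atomic changes, left to right.
applyChange :: Seq Int → [SingleChange] → Seq Int
applyChange = foldl' applySingle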

11.5 Products

It is also possible to define change structures for arbitrary sum and product types, and to provide derivatives for introduction and elimination forms for such datatypes. In this section we discuss products; in the next section, sums.

We define a simple change structure for product type A × B from change structures for A and B: similar to change structures for environments, operations act pointwise on the two components.

instance (ChangeStruct a, ChangeStruct b) ⇒ ChangeStruct (a, b) where
  type ∆(a, b) = (∆a, ∆b)
  (a, b) ⊕ (da, db) = (a ⊕ da, b ⊕ db)
  (a2, b2) ⊖ (a1, b1) = (a2 ⊖ a1, b2 ⊖ b1)
  oreplace (a2, b2) = (oreplace a2, oreplace b2)
  0_(a, b) = (0_a, 0_b)
  (da1, db1) ⊚ (da2, db2) = (da1 ⊚ da2, db1 ⊚ db2)

Through equational reasoning, as in Sec. 11.2, we can also compute derivatives for basic primitives on product types: both the introduction form (which we alias as pair) and the elimination forms fst and snd. We just present the resulting definitions:

pair a b = (a, b)
dpair a da b db = (da, db)

³ Firsov and Jeltsch [2016] and our actual implementation allow changes to multiple elements.


fst (a, b) = a
snd (a, b) = b
dfst :: ∆(a, b) → ∆a
dfst (da, db) = da
dsnd :: ∆(a, b) → ∆b
dsnd (da, db) = db
uncurry :: (a → b → c) → (a, b) → c
uncurry f (a, b) = f a b
duncurry :: (a → b → c) → ∆(a → b → c) → (a, b) → ∆(a, b) → ∆c
duncurry f df (x, y) (dx, dy) = df x dx y dy

One can also define n-ary products in a similar way. However, a product change contains as many entries as the product itself.

11.6 Sums, pattern matching and conditionals

In this section we define change structures for sum types, together with the derivative of their introduction and elimination forms. We also obtain support for booleans (which can be encoded as sum type 1 + 1) and conditionals (which can be encoded in terms of elimination for sums). We have mechanically proved correctness of this change structure and derivatives, but we do not present the tedious details in this thesis and refer to our Agda formalization.

Change structures for sums are more challenging than ones for products. We can define them, but in many cases we can do better with specialized structures. Nevertheless, such changes are useful in some scenarios.

In Haskell, sum types a + b are conventionally defined via datatype Either a b, with introduction forms Left and Right and elimination form either, which will be our primitives:

data Either a b = Left a | Right b
either :: (a → c) → (b → c) → Either a b → c
either f g (Left a) = f a
either f g (Right b) = g b

We can define the following change structure:

data EitherChange a b
  = LeftC (∆a)
  | RightC (∆b)
  | EitherReplace (Either a b)

instance (ChangeStruct a, ChangeStruct b) ⇒
    ChangeStruct (Either a b) where
  type ∆(Either a b) = EitherChange a b
  Left a ⊕ LeftC da = Left (a ⊕ da)
  Right b ⊕ RightC db = Right (b ⊕ db)
  Left _ ⊕ RightC _ = error "Invalid change"
  Right _ ⊕ LeftC _ = error "Invalid change"
  _ ⊕ EitherReplace e2 = e2
  oreplace = EitherReplace
  0_(Left a) = LeftC 0_a
  0_(Right a) = RightC 0_a


Changes to a sum value can either keep the same “branch” (left or right) and modify the contained value, or replace the sum value with another one altogether. Specifically, change LeftC da is valid from Left a1 to Left a2 if da is valid from a1 to a2. Similarly, change RightC db is valid from Right b1 to Right b2 if db is valid from b1 to b2. Finally, replacement change EitherReplace e2 is valid from e1 to e2, for any e1.

Using Eq. (11.1), we can then obtain definitions for derivatives of primitives Left, Right and either. The resulting code is as follows:

dLeft :: a → ∆a → ∆(Either a b)
dLeft a da = LeftC da
dRight :: b → ∆b → ∆(Either a b)
dRight b db = RightC db
deither :: (NilChangeStruct a, NilChangeStruct b, DiffChangeStruct c) ⇒
  (a → c) → ∆(a → c) → (b → c) → ∆(b → c) →
  Either a b → ∆(Either a b) → ∆c

deither f df g dg (Left a) (LeftC da) = df a da
deither f df g dg (Right b) (RightC db) = dg b db
deither f df g dg e1 (EitherReplace e2) =
  either (f ⊕ df) (g ⊕ dg) e2 ⊖ either f g e1
deither _ _ _ _ _ _ = error "Invalid sum change"

We show only one case of the derivation of deither, as an example:

  deither f df g dg (Left a) (LeftC da)
=∆ { using variants of Eq. (11.1) for multiple arguments }
  either (f ⊕ df) (g ⊕ dg) (Left a ⊕ LeftC da) ⊖ either f g (Left a)
= { simplify ⊕ }
  either (f ⊕ df) (g ⊕ dg) (Left (a ⊕ da)) ⊖ either f g (Left a)
= { simplify either }
  (f ⊕ df) (a ⊕ da) ⊖ f a
=∆ { because df is a valid change for f, and da for a }
  df a da

Unfortunately, with this change structure, a change from Left a1 to Right b2 is simply a replacement change, so derivatives processing it must recompute results from scratch. In general, we cannot do better, since there need be no shared data between the two branches of a datatype. We need to find specialized scenarios where better implementations are possible.

Extensions: changes for ADTs and future work. In some cases we consider changes for type Either a b where a and b both contain some other type c. Take again lists: a change from list as to list Cons a2 as should simply say that we prepend a2 to the list. In a sense, we are just using a change structure from type List a to (a, List a). More in general, if change das from as1 to as2 is small, a change from list as1 to list Cons a2 as2 should simply “say” that we prepend a2 and that we modify as1 into as2, and similarly for removals.

In Sec. 18.4 we suggest how to construct such change structures, based on the concept of polymorphic change structure, where changes have source and destination of different types. Based on initial experiments, we believe one could develop these constructions into a powerful combinator language for change structures. In particular, it should be possible to build change structures for lists


similar to the ones in Sec. 11.3.2. Generalizing beyond lists, similar systematic constructions should be able to represent insertions and removals of one-hole contexts [McBride 2001] for arbitrary algebraic datatypes (ADTs); for ADTs representing balanced data structures, such changes could enable efficient incrementalization in many scenarios. However, many questions remain open, so we leave this effort for future work.

11.6.1 Optimizing filter

In Sec. 11.3 we have defined filter using a conditional, and we have just explained that in general conditionals are inefficient. This seems pretty unfortunate. But luckily, we can optimize filter using equational reasoning.

Consider again the earlier definition of filter:

filter :: (a → Bool) → List a → List a
filter p = concatMap (λx → if p x then singleton x else Nil)

As explained, we can encode conditionals using either and differentiate the resulting program. However, if p x changes from True to False, or viceversa (that is, for all elements x for which dp x dx is a non-nil change), we must compute ⊖ at runtime. Here ⊖ will compare an empty list Nil with a singleton list produced by singleton x (in one direction or the other). We can detect this situation at runtime. But since the implementation of filter is known statically, we can instead optimize this code ahead of time, rewriting singleton x ⊖ Nil to Insert x and Nil ⊖ singleton x to Remove. To enable this optimization in dfilter, we need to inline the function that filter passes as argument to concatMap, and all the functions it calls except p. Moreover, we need to case-split on possible return values for p x and dp x dx. We omit the steps because they are both tedious and standard.
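To sketch what the result of such an optimization could look like (our own reconstruction, not code from the thesis), the derivative below emits the atomic sequence changes of Sec. 11.4 directly, instead of computing ⊖ at runtime. Here newP x dx stands for the already-simplified (p ⊕ dp) (x ⊕ dx), i tracks the next position in the partially updated output, and indexes are Int for simplicity:

dfilterC :: ChangeStruct a
         ⇒ (a → Bool)          -- old predicate p
         → (a → ∆a → Bool)     -- updated result: newP x dx = (p ⊕ dp) (x ⊕ dx)
         → [a] → [∆a] → [SeqSingleChange a]
dfilterC p newP = go 0
  where
    go _ [] [] = []
    go i (x : xs) (dx : dxs) = case (p x, newP x dx) of
      (True , True ) → ChangeAt i dx     : go (i + 1) xs dxs  -- kept in both
      (True , False) → Remove i          : go i       xs dxs  -- dropped now
      (False, True ) → Insert i (x ⊕ dx) : go (i + 1) xs dxs  -- newly kept
      (False, False) →                     go i       xs dxs  -- absent in both
    go _ _ _ = []   -- length mismatch: invalid change, dummy result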

It appears in principle possible to automate such transformations by adding domain-specific knowledge to a sufficiently smart compiler, though we have made no attempt at an actual implementation. It would first be necessary to investigate further classes of examples where optimizations are applicable. Sufficiently smart compilers are rare; but since our approach produces purely functional programs, we have access to GHC and HERMIT [Farmer et al. 2012]. An interesting alternative (which does have some support for side effects) is LMS [Rompf and Odersky 2010] and Delite [Brown et al. 2011]. We leave further investigation for future work.

11.7 Chapter conclusion

In this chapter, we have toured what can and cannot be incrementalized using differentiation, and how using higher-order functions allows defining generic primitives to incrementalize.

Chapter 12

Changes and differentiation, formally

To support incrementalization, in this chapter we introduce differentiation and formally prove it correct. That is, we prove that ⟦D⟦t⟧⟧ produces derivatives. As we explain in Sec. 12.1.2, derivatives transform valid input changes into valid output changes (Slogan 10.3.3). Hence, we define what valid changes are (Sec. 12.1). As we'll explain in Sec. 12.1.4, validity is a logical relation. As we explain in Sec. 12.2.2, our correctness theorem is the fundamental property for the validity logical relation, proved by induction over the structure of terms. Crucial definitions, or derived facts, are summarized in Fig. 12.1. Later, in Chapter 13, we study consequences of correctness and change operations.

All definitions and proofs in this and the next chapter are mechanized in Agda, except where otherwise indicated. To this writer, given these definitions, all proofs have become straightforward and unsurprising. Nevertheless, first obtaining these proofs took a while. So we typically include full proofs. We also believe these proofs clarify the meaning and consequences of our definitions. To make proofs as accessible as possible, we try to provide enough detail that our target readers can follow along without pencil and paper, at the expense of making our proofs look longer than they would usually be. As we target readers proficient with STLC (but not necessarily proficient with logical relations), we'll still omit routine steps needed to reason on STLC, such as typing derivations or binding issues.

12.1 Changes and validity

In this section, we introduce formally (a) a description of changes; (b) a definition of which changes are valid. We have already introduced informally in Chapter 10 these notions and how they fit together. We next define the same notions formally, and deduce their key properties. Language plugins extend these definitions for base types and constants that they provide.

To formalize the notion of changes for elements of a set V, we define the notion of basic change structure on V.

Definition 12.1.1 (Basic change structures)
A basic change structure on set V, written V̂, comprises:

(a) a change set ∆V;



∆τ:
∆ι = …
∆(σ → τ) = σ → ∆σ → ∆τ

(a) Change types (Definition 12.1.17)

∆Γ:
∆ε = ε
∆(Γ, x : τ) = ∆Γ, x : τ, dx : ∆τ

(b) Change contexts (Definition 12.1.23)

D⟦t⟧:
D⟦λ(x : σ) → t⟧ = λ(x : σ) (dx : ∆σ) → D⟦t⟧
D⟦s t⟧ = D⟦s⟧ t D⟦t⟧
D⟦x⟧ = dx
D⟦c⟧ = DC⟦c⟧

(c) Differentiation (Definition 10.5.1)

Γ ⊢ t : τ
───────────────── Derive
∆Γ ⊢ D⟦t⟧ : ∆τ

(d) Differentiation typing (Lemma 12.2.8)

dv ▷ v1 ↪ v2 : τ, with v1, v2 ∈ ⟦τ⟧ and dv ∈ ⟦∆τ⟧:
dv ▷ v1 ↪ v2 : ι = … (defined by plugins)
df ▷ f1 ↪ f2 : σ → τ = ∀da ▷ a1 ↪ a2 : σ. df a1 da ▷ f1 a1 ↪ f2 a2 : τ

dρ ▷ ρ1 ↪ ρ2 : Γ, with ρ1, ρ2 ∈ ⟦Γ⟧ and dρ ∈ ⟦∆Γ⟧:

─────────────
∅ ▷ ∅ ↪ ∅ : ε

dρ ▷ ρ1 ↪ ρ2 : Γ        da ▷ a1 ↪ a2 : τ
──────────────────────────────────────────────────────────────
(dρ, x = a1, dx = da) ▷ (ρ1, x = a1) ↪ (ρ2, x = a2) : Γ, x : τ

(e) Validity (Definitions 12.1.21 and 12.1.24)

If Γ ⊢ t : τ then
∀dρ ▷ ρ1 ↪ ρ2 : Γ. ⟦D⟦t⟧⟧ dρ ▷ ⟦t⟧ ρ1 ↪ ⟦t⟧ ρ2 : τ

(f) Correctness of D⟦–⟧ (from Theorem 12.2.2)

Figure 12.1: Defining differentiation and proving it correct. The rest of this chapter explains and motivates the above definitions.


(b) a ternary validity relation dv ▷ v1 ↪ v2 : V, for v1, v2 ∈ V and dv ∈ ∆V, that we read as “dv is a valid change from source v1 to destination v2 (on set V)”.

Example 12.1.2
In Examples 10.3.1 and 10.3.2 we exemplified informally change types and validity on naturals, integers and bags. We define formally basic change structures on naturals and integers. Compared to validity for integers, validity for naturals ensures that the destination v1 + dv is again a natural. For instance, given source v1 = 1, change dv = −2 is valid (with destination v2 = −1) only on integers, not on naturals.

Definition 12.1.3 (Basic change structure on integers)
Basic change structure ℤ̂ on integers has integers as changes (∆ℤ = ℤ) and the following validity judgment:

dv ▷ v1 ↪ v1 + dv : ℤ

Definition 12.1.4 (Basic change structure on naturals)
Basic change structure ℕ̂ on naturals has integers as changes (∆ℕ = ℤ) and the following validity judgment:

v1 + dv ≥ 0
──────────────────────
dv ▷ v1 ↪ v1 + dv : ℕ
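In the Haskell notation of Chapter 11, the integer structure amounts to the following instance (a sketch of ours; validity is a specification-level notion with no code counterpart):

instance ChangeStruct Int where
  type ∆Int = Int
  x ⊕ dx = x + dx      -- dv ▷ v1 ↪ v1 + dv : ℤ
  x2 ⊖ x1 = x2 - x1    -- difference is subtraction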

Intuitively, we can think of a valid change from v1 to v2 as a graph edge from v1 to v2, so we'll often use graph terminology when discussing changes. This intuition is robust and can be made fully precise.¹ More specifically, a basic change structure on V can be seen as a directed multigraph, having as vertexes the elements of V, and as edges from v1 to v2 the valid changes dv from v1 to v2. This is a multigraph because our definition allows multiple edges between v1 and v2.

A change dv can be valid from v1 to v2 and from v3 to v4, but we'll still want to talk about the source and the destination of a change. When we talk about a change dv valid from v1 to v2, value v1 is dv's source and v2 is dv's destination. Hence, we'll systematically quantify theorems over valid changes dv, with their sources v1 and destinations v2, using the following notation.²

Notation 12.1.5 (Quantification over valid changes)
We write

∀dv ▷ v1 ↪ v2 : V. P

and say “for all (valid) changes dv from v1 to v2 on set V we have P”, as a shortcut for

∀v1, v2 ∈ V, dv ∈ ∆V. if dv ▷ v1 ↪ v2 : V then P.

Since we focus on valid changes, we'll omit the word “valid” when clear from context. In particular, a change from v1 to v2 is necessarily valid.

¹ See for instance Robert Atkey's blog post [Atkey 2015], or Yufei Cai's PhD thesis [Cai 2017].
² If you prefer, you can tag a change with its source and destination by using a triple, and regard the whole triple (v1, dv, v2) as a change. Mathematically, this gives the correct results, but we'll typically not use such triples as changes in programs, for performance reasons.


We can have multiple basic change structures on the same set.

Example 12.1.6 (Replacement changes)
For instance, for any set V we can talk about replacement changes on V: a replacement change dv = !v2 for a value v1 : V simply specifies directly a new value v2 : V, so that !v2 ▷ v1 ↪ v2 : V. We read ! as the “bang” operator.
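A sketch of this structure in the Haskell notation of Chapter 11, taking Char as an example constant-size primitive type (Replace is our name for the ! constructor):

newtype Replace a = Replace a      -- the change !v2

instance ChangeStruct Char where
  type ∆Char = Replace Char
  _v1 ⊕ Replace v2 = v2            -- !v2 ▷ v1 ↪ v2 : Char, for any v1
  v2 ⊖ _v1 = Replace v2            -- the difference is always a replacement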

A basic change structure can decide to use only replacement changes (which can be appropriate for primitive types with values of constant size), or to make ∆V a sum type allowing both replacement changes and other ways to describe a change (as long as we're using a language plugin that adds sum types).

Nil changes. Just like integers have a null element 0, among changes there can be nil changes:

Definition 12.1.7 (Nil changes)
We say that dv : ∆V is a nil change for v : V if dv ▷ v ↪ v : V.

For instance, 0 is a nil change for any integer number n. However, in general a change might be nil for an element but not for another. For instance, the replacement change !6 is a nil change on 6, but not on 5.

We'll say more on nil changes in Sec. 12.1.2 and 13.2.1.

12.1.1 Function spaces

Next, we define a basic change structure that we call Â → B̂ for an arbitrary function space A → B, assuming we have basic change structures for both A and B. We claimed earlier that valid function changes map valid input changes to valid output changes. We make this claim formal through the next definition.
Definition 12.1.8 (Basic change structure on A → B)
Given basic change structures on A and B, we define a basic change structure on A → B, written Â → B̂, as follows:

(a) Change set ∆(A → B) is A → ∆A → ∆B.

(b) Function change df is valid from f1 to f2 (that is, df ▷ f1 ↪ f2 : A → B) if and only if, for all valid input changes da ▷ a1 ↪ a2 : A, value df a1 da is a valid output change from f1 a1 to f2 a2 (that is, df a1 da ▷ f1 a1 ↪ f2 a2 : B).

Notation 12.1.9 (Applying function changes to changes)
When reading out df a1 da, we'll often talk for brevity about applying df to da, leaving da's source a1 implicit when it can be deduced from context.

We'll also consider valid changes df for curried n-ary functions. We show what their validity means for curried binary functions f : A → B → C. We omit similar statements for higher arities, as they add no new ideas.

Lemma 12.1.10 (Validity on A → B → C)
For any basic change structures Â, B̂ and Ĉ, function change df : ∆(A → B → C) is valid from f1 to f2 (that is, df ▷ f1 ↪ f2 : A → B → C) if and only if applying df to valid input changes da ▷ a1 ↪ a2 : A and db ▷ b1 ↪ b2 : B gives a valid output change:

df a1 da b1 db ▷ f1 a1 b1 ↪ f2 a2 b2 : C


Proof. The equivalence follows from applying the definition of function validity of df twice.
That is, function change df is valid (df ▷ f1 ↪ f2 : A → (B → C)) if and only if it maps valid input change da ▷ a1 ↪ a2 : A to valid output change

df a1 da ▷ f1 a1 ↪ f2 a2 : B → C.

In turn, df a1 da is a function change, which is valid if and only if it maps valid input change db ▷ b1 ↪ b2 : B to

df a1 da b1 db ▷ f1 a1 b1 ↪ f2 a2 b2 : C,

as required by the lemma.

12.1.2 Derivatives

Among valid function changes, derivatives play a central role, especially in the statement of correctness of differentiation.
Definition 12.1.11 (Derivatives)
Given function f : A → B, function df : A → ∆A → ∆B is a derivative for f if, for all changes da from a1 to a2 on set A, change df a1 da is valid from f a1 to f a2.

In fact, derivatives are exactly the nil function changes:

Lemma 12.1.12 (Derivatives as nil function changes)
Given function f : A → B, function df : ∆(A → B) is a derivative of f if and only if df is a nil change of f (df ▷ f ↪ f : A → B).

Proof. First, we show that nil changes are derivatives. A nil change df ▷ f ↪ f : A → B has the right type to be a derivative, because A → ∆A → ∆B = ∆(A → B). Since df is a nil change from f to f, by definition it maps valid input changes da ▷ a1 ↪ a2 : A to valid output changes df a1 da ▷ f a1 ↪ f a2 : B. Hence df is a derivative, as required.

In fact, all proof steps are equivalences, and by tracing them backward we can show that derivatives are nil changes: since a derivative df maps valid input changes da ▷ a1 ↪ a2 : A to valid output changes df a1 da ▷ f a1 ↪ f a2 : B, df is a change from f to f, as required.

Applying derivatives to nil changes gives again nil changes. This fact is useful when reasoning on derivatives. The proof is a useful exercise on using validity.

Lemma 12.1.13 (Derivatives preserve nil changes)
For any basic change structures Â and B̂, function change df : ∆(A → B) is a derivative of f : A → B (df ▷ f ↪ f : A → B) if and only if applying df to an arbitrary input nil change da ▷ a ↪ a : A gives a nil change:

df a da ▷ f a ↪ f a : B

Proof. Just rewrite the definition of derivatives (Lemma 12.1.12) using the definition of validity of df.

In detail, by definition of validity for function changes (Definition 12.1.8), df ▷ f1 ↪ f2 : A → B means that from da ▷ a1 ↪ a2 : A follows df a1 da ▷ f1 a1 ↪ f2 a2 : B. Just substitute f1 = f2 = f and a1 = a2 = a to get the required equivalence.

Also derivatives of curried n-ary functions f preserve nil changes. We only state this formally for curried binary functions f : A → B → C; higher arities require no new ideas.


Lemma 12.1.14 (Derivatives preserve nil changes on A → B → C)
For any basic change structures Â, B̂ and Ĉ, change df : ∆(A → B → C) is a derivative of f : A → B → C if and only if applying df to nil changes da ▷ a ↪ a : A and db ▷ b ↪ b : B gives a nil change:

df a da b db ▷ f a b ↪ f a b : C

Proof. Similarly to validity on A → B → C (Lemma 12.1.10), the thesis follows by applying twice the fact that derivatives preserve nil changes (Lemma 12.1.13).

In detail, since derivatives preserve nil changes, df is a derivative if and only if for all da ▷ a ↪ a : A we have df a da ▷ f a ↪ f a : B → C. But then df a da is a nil change, that is a derivative, and since it preserves nil changes, df is a derivative if and only if for all da ▷ a ↪ a : A and db ▷ b ↪ b : B we have df a da b db ▷ f a b ↪ f a b : C.

12.1.3 Basic change structures on types

After studying basic change structures in the abstract, we apply them to the study of our object language.

For each type τ, we can define a basic change structure on domain ⟦τ⟧, which we write τ̂. Language plugins must provide basic change structures for base types. To provide basic change structures for function types σ → τ, we use the earlier construction for basic change structures σ̂ → τ̂ on function spaces ⟦σ → τ⟧ (Definition 12.1.8).
Definition 12.1.15 (Basic change structures on types)
For each type τ, we associate a basic change structure on domain ⟦τ⟧, written τ̂, through the following equations:

ι̂ = ι̂ (as provided by the plugin)
(σ → τ)̂ = σ̂ → τ̂

Plugin Requirement 12.1.16 (Basic change structures on base types)
For each base type ι, the plugin defines a basic change structure on ⟦ι⟧ that we write ι̂.

Crucially, for each type τ we can define a type ∆τ of changes, that we call change type, such that the change set ∆⟦τ⟧ is just the domain ⟦∆τ⟧ associated to change type ∆τ: ∆⟦τ⟧ = ⟦∆τ⟧. This equation allows writing change terms that evaluate directly to change values.³

Definition 12.1.17 (Change types)
The change type ∆τ of a type τ is defined as follows:

∆ι = … (defined by plugins)

∆(σ → τ) = σ → ∆σ → ∆τ

Lemma 12.1.18 (∆ and ⟦–⟧ commute on types)
For each type τ, change set ∆⟦τ⟧ equals the domain of change type ⟦∆τ⟧.

Proof. By induction on types. For the case of function types, we simply prove equationally that ∆⟦σ → τ⟧ = ∆(⟦σ⟧ → ⟦τ⟧) = ⟦σ⟧ → ∆⟦σ⟧ → ∆⟦τ⟧ = ⟦σ⟧ → ⟦∆σ⟧ → ⟦∆τ⟧ = ⟦σ → ∆σ → ∆τ⟧ = ⟦∆(σ → τ)⟧. The case for base types is delegated to plugins (Plugin Requirement 12.1.19).

³ Instead, in earlier proofs [Cai et al. 2014], the values of change terms were not change values, but had to be related to change values through a logical relation; see Sec. 15.4.


Plugin Requirement 12.1.19 (Base change types)
For each base type ι, the plugin defines a change type ∆ι such that ∆⟦ι⟧ = ⟦∆ι⟧.

We refer to values of change types as change values, or just changes.

Notation 12.1.20
We write basic change structures for types as τ̂, not ⟦τ⟧̂, and dv ▷ v1 ↪ v2 : τ, not dv ▷ v1 ↪ v2 : ⟦τ⟧. We also write consistently ⟦∆τ⟧, not ∆⟦τ⟧.

12.1.4 Validity as a logical relation

Next, we show an equivalent definition of validity for values of terms, directly by induction on types, as a ternary logical relation between a change, its source and its destination. A typical logical relation constrains functions to map related inputs to related outputs. In a twist, validity constrains function changes to map related inputs to related outputs.

Definition 12.1.21 (Change validity)
We say that dv is a (valid) change from v1 to v2 (on type τ), and write dv ▷ v1 ↪ v2 : τ, if dv ∈ ⟦∆τ⟧, v1, v2 ∈ ⟦τ⟧ and dv is a “valid” description of the difference from v1 to v2, as we define in Fig. 12.1e.

The key equations for function types are:

∆(σ → τ) = σ → ∆σ → ∆τ

df ▷ f1 ↪ f2 : σ → τ = ∀da ▷ a1 ↪ a2 : σ. df a1 da ▷ f1 a1 ↪ f2 a2 : τ

Remark 12.1.22
We have kept repeating the idea that valid function changes map valid input changes to valid output changes. As seen in Sec. 10.4 and Lemmas 12.1.10 and 12.1.14, such valid outputs can in turn be valid function changes. We'll see the same idea at work in Lemma 12.1.27, in the correctness proof of D⟦–⟧.

As we have finally seen in this section, this definition of validity can be formalized as a logical relation, defined by induction on types. We'll later take for granted the consequences of validity, together with lemmas such as Lemma 12.1.10.

12.1.5 Change structures on typing contexts

To describe changes to the inputs of a term, we now also introduce change contexts ∆Γ, environment changes dρ ∈ ⟦∆Γ⟧, and validity for environment changes dρ ▷ ρ1 ↪ ρ2 : Γ.

A valid environment change from ρ1 ∈ ⟦Γ⟧ to ρ2 ∈ ⟦Γ⟧ is an environment dρ ∈ ⟦∆Γ⟧ that extends environment ρ1 with valid changes for each entry. We first define the shape of environment changes through change contexts:

Definition 12.1.23 (Change contexts)
For each context Γ, we define change context ∆Γ as follows:

∆ε = ε

∆(Γ, x : τ) = ∆Γ, x : τ, dx : ∆τ

Then, we describe validity of environment changes via a judgment.


Definition 12.1.24 (Environment change validity)
We define validity for environment changes through judgment dρ ▷ ρ1 ↪ ρ2 : Γ, pronounced “dρ is an environment change from ρ1 to ρ2 (at context Γ)”, where ρ1, ρ2 ∈ ⟦Γ⟧, dρ ∈ ⟦∆Γ⟧, via the following inference rules:

─────────────
∅ ▷ ∅ ↪ ∅ : ε

dρ ▷ ρ1 ↪ ρ2 : Γ        da ▷ a1 ↪ a2 : τ
──────────────────────────────────────────────────────────────
(dρ, x = a1, dx = da) ▷ (ρ1, x = a1) ↪ (ρ2, x = a2) : Γ, x : τ

Definition 12.1.25 (Basic change structures for contexts)
To each context Γ, we associate a basic change structure on set ⟦Γ⟧. We take ⟦∆Γ⟧ as change set, and reuse validity on environment changes (Definition 12.1.24).

Notation 12.1.26
We write dρ ▷ ρ1 ↪ ρ2 : Γ rather than dρ ▷ ρ1 ↪ ρ2 : ⟦Γ⟧.

Finally, to state and prove correctness of differentiation, we are going to need to discuss function changes on term semantics. The semantics of a term Γ ⊢ t : τ is a function ⟦t⟧ from environments in ⟦Γ⟧ to values in ⟦τ⟧. To discuss changes to ⟦t⟧, we need a basic change structure on function space ⟦Γ⟧ → ⟦τ⟧.

Lemma 12.1.27
The construction of basic change structures on function spaces (Definition 12.1.8) associates, to each context Γ and type τ, a basic change structure Γ̂ → τ̂ on function space ⟦Γ⟧ → ⟦τ⟧.

Notation 12.1.28
As usual, we write the change set as ∆(⟦Γ⟧ → ⟦τ⟧); for validity, we write df ▷ f1 ↪ f2 : Γ → τ rather than df ▷ f1 ↪ f2 : ⟦Γ⟧ → ⟦τ⟧.

12.2 Correctness of differentiation

In this section, we state and prove correctness of differentiation, a term-to-term transformation, written D⟦t⟧, that produces incremental programs. We recall that all our results apply only to well-typed terms (since we formalize no other ones).

Earlier, we described how D⟦–⟧ behaves through Slogan 10.3.3; here it is again for reference.

Slogan 10.3.3
Term D⟦t⟧ maps input changes to output changes. That is, D⟦t⟧ applied to initial base inputs and valid input changes (from initial inputs to updated inputs) gives a valid output change, from t applied on old inputs to t applied on new inputs.

In our slogan, we did not specify what we meant by inputs, though we gave examples during the discussion. We have now the notions needed for a more precise statement. Term D⟦t⟧ must satisfy our slogan for two sorts of inputs:

1. Evaluating D⟦t⟧ must map an environment change dρ from ρ1 to ρ2 into a valid result change ⟦D⟦t⟧⟧ dρ, going from ⟦t⟧ ρ1 to ⟦t⟧ ρ2.

2. As we learned since stating our slogan, validity is defined by recursion over types. If term t has type σ → τ, change ⟦D⟦t⟧⟧ dρ can in turn be a (valid) function change (Remark 12.1.22). Function changes map valid changes for their inputs to valid changes for their outputs.


Instead of saying that ⟦D⟦t⟧⟧ maps dρ ▷ ρ1 ↪ ρ2 : Γ to a change from ⟦t⟧ ρ1 to ⟦t⟧ ρ2, we can say that function ⟦t⟧∆ = λρ1 dρ → ⟦D⟦t⟧⟧ dρ must be a nil change for ⟦t⟧, that is, a derivative for ⟦t⟧. We give a name to this function change, and state D⟦–⟧'s correctness theorem.

Definition 12.2.1 (Incremental semantics)
We define the incremental semantics of a well-typed term Γ ⊢ t : τ in terms of differentiation as:

⟦t⟧∆ = (λρ1 dρ → ⟦D⟦t⟧⟧ dρ) : ⟦Γ⟧ → ⟦∆Γ⟧ → ⟦∆τ⟧

Theorem 12.2.2 (D⟦–⟧ is correct)
Function ⟦t⟧∆ is a derivative of ⟦t⟧. That is, if Γ ⊢ t : τ and dρ ▷ ρ1 ↪ ρ2 : Γ, then ⟦D⟦t⟧⟧ dρ ▷ ⟦t⟧ ρ1 ↪ ⟦t⟧ ρ2 : τ.

For now, we discuss this statement further; we defer the proof to Sec. 12.2.2.

Remark 12.2.3 (Why ⟦–⟧∆ ignores ρ1)
You might wonder why ⟦t⟧∆ = λρ1 dρ → ⟦D⟦t⟧⟧ dρ appears to ignore ρ1. But for all dρ ▷ ρ1 ↪ ρ2 : Γ, change environment dρ extends ρ1, which hence provides no further information. We are only interested in applying ⟦t⟧∆ to valid environment changes dρ, so ⟦t⟧∆ ρ1 dρ can safely ignore ρ1.

Remark 12.2.4 (Term derivatives)
In Chapter 10, we suggested that D⟦t⟧ only produces a derivative for closed terms, not for open ones. But ⟦t⟧∆ = λρ1 dρ → ⟦D⟦t⟧⟧ dρ is always a nil change and derivative of ⟦t⟧, for any Γ ⊢ t : τ. There is no contradiction, because the value of D⟦t⟧ is ⟦D⟦t⟧⟧ dρ, which is only a nil change if dρ is a nil change as well. In particular, for closed terms (Γ = ε), dρ must equal the empty environment ∅, hence a nil change. If τ is a function type, df = ⟦D⟦t⟧⟧ dρ accepts further inputs; since df must be a valid function change, it will also map them to valid outputs, as required by our Slogan 10.3.3. Finally, if Γ = ε and τ is a function type, then df = ⟦D⟦t⟧⟧ ∅ is a derivative of f = ⟦t⟧.

We summarize this remark with the following definition and corollary.

Definition 12.2.5 (Derivatives of terms)
For all closed terms of function type ⊢ t : σ → τ, we call D⟦t⟧ the (term) derivative of t.

Corollary 12.2.6 (Term derivatives evaluate to derivatives)
For all closed terms of function type ⊢ t : σ → τ, function ⟦D⟦t⟧⟧ is a derivative of ⟦t⟧.

Proof. Because ⟦t⟧∆ is a derivative (Theorem 12.2.2), and applying derivative ⟦t⟧∆ to the nil change ∅ gives a derivative (Lemma 12.1.13).

Remark 12.2.7
We typically talk about a derivative of a function value f : A → B, not the derivative, since multiple different functions can satisfy the specification of derivatives. We talk about the derivative to refer to a canonically chosen derivative. For terms and their semantics, the canonical derivative is the one produced by differentiation. For language primitives, the canonical derivative is the one chosen by the language plugin under consideration.

Theorem 12.2.2 only makes sense if D⟦–⟧ has the right static semantics:


Lemma 12.2.8 (Typing of D⟦–⟧)
Typing rule

Γ ⊢ t : τ
───────────────── Derive
∆Γ ⊢ D⟦t⟧ : ∆τ

is derivable.

After we define ⊕ in the next chapter, we'll be able to relate ⊕ to validity, by proving Lemma 13.4.4, which we state in advance here:

Lemma 13.4.4 (⊕ agrees with validity)
If dv ▷ v1 ↪ v2 : τ then v1 ⊕ dv = v2.

Hence, updating base result ⟦t⟧ ρ1 by change ⟦D⟦t⟧⟧ dρ via ⊕ gives the updated result ⟦t⟧ ρ2:

Corollary 13.4.5 (D⟦–⟧ is correct, corollary)
If Γ ⊢ t : τ and dρ ▷ ρ1 ↪ ρ2 : Γ, then ⟦t⟧ ρ1 ⊕ ⟦D⟦t⟧⟧ dρ = ⟦t⟧ ρ2.

We anticipate the proof of this corollary:

Proof. First, differentiation is correct (Theorem 12.2.2), so under the hypotheses

⟦D⟦t⟧⟧ dρ ▷ ⟦t⟧ ρ1 ↪ ⟦t⟧ ρ2 : τ;

that judgement implies the thesis

⟦t⟧ ρ1 ⊕ ⟦D⟦t⟧⟧ dρ = ⟦t⟧ ρ2

because ⊕ agrees with validity (Lemma 13.4.4).

12.2.1 Plugin requirements

Differentiation is extended by plugins on constants, so plugins must prove their extensions correct.

Plugin Requirement 12.2.9 (Typing of DC⟦–⟧)
For all ⊢C c : τ, the plugin defines DC⟦c⟧ satisfying ⊢ DC⟦c⟧ : ∆τ.

Plugin Requirement 12.2.10 (Correctness of DC⟦–⟧)
For all ⊢C c : τ, ⟦DC⟦c⟧⟧ is a derivative for ⟦c⟧.

Since constants are typed in the empty context, and the only change for an empty environment is an empty environment, Plugin Requirement 12.2.10 means that for all ⊢C c : τ we have

⟦DC⟦c⟧⟧ ▷ ⟦c⟧ ↪ ⟦c⟧ : τ.

12.2.2 Correctness proof

We next recall D⟦–⟧'s definition and prove it satisfies its correctness statement, Theorem 12.2.2.


Definition 10.5.1 (Differentiation)
Differentiation is the following term transformation:

D⟦λ(x : σ) → t⟧ = λ(x : σ) (dx : ∆σ) → D⟦t⟧
D⟦s t⟧ = D⟦s⟧ t D⟦t⟧
D⟦x⟧ = dx
D⟦c⟧ = DC⟦c⟧

where DC⟦c⟧ defines differentiation on primitives and is provided by language plugins (see Appendix A.2.5), and dx stands for a variable generated by prefixing x's name with d, so that D⟦y⟧ = dy and so on.
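For concreteness, differentiation is easy to render as a first-order AST transformation; the following Haskell sketch (ours, not the thesis's mechanization) implements exactly the four equations above, using the d-prefix convention and ignoring the capture issues discussed in Sec. 12.3.4:

data Ty   = Base String | Fun Ty Ty
data Term = Var String | App Term Term | Lam String Ty Term | Const String

-- ∆τ, per Definition 12.1.17; for base types we just tag the name.
deltaTy :: Ty → Ty
deltaTy (Base i)  = Base ("∆" ++ i)
deltaTy (Fun s t) = Fun s (Fun (deltaTy s) (deltaTy t))

derive :: Term → Term
derive (Lam x s t) = Lam x s (Lam ('d' : x) (deltaTy s) (derive t))
derive (App s t)   = App (App (derive s) t) (derive t)
derive (Var x)     = Var ('d' : x)
derive (Const c)   = deriveConst c

-- DC⟦c⟧ is plugin-provided; this placeholder just tags the constant's name.
deriveConst :: String → Term
deriveConst c = Const ('d' : c)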

Before proving correctness, we prove Lemma 12.2.8.

Lemma 12.2.8 (Typing of D⟦–⟧)
Typing rule

Γ ⊢ t : τ
───────────────── Derive
∆Γ ⊢ D⟦t⟧ : ∆τ

is derivable.

Proof. The thesis can be proven by induction on the typing derivation Γ ⊢ t : τ. The case for constants is delegated to plugins in Plugin Requirement 12.2.9.

We prove Theorem 12.2.2 using a typical logical relations strategy. We proceed by induction on term t, and prove for each case that if D⟦–⟧ preserves validity on subterms of t, then also D⟦t⟧ preserves validity. Hence, if the input environment change dρ is valid, then the result of differentiation evaluates to valid change ⟦D⟦t⟧⟧ dρ.

Readers familiar with logical relations proofs should be able to reproduce this proof on their own, as it is rather standard, once one uses the given definitions. In particular, this proof resembles closely the proof of the abstraction theorem or relational parametricity (as given by Wadler [1989, Sec. 6] or by Bernardy and Lasson [2011, Sec. 3.3, Theorem 3]), and the proof of the fundamental theorem of logical relations by Statman [1985].

Nevertheless, we spell this proof out, and use it to motivate how D⟦–⟧ is defined, more formally than we did in Sec. 10.5. For each case, we first give a short proof sketch, and then redo the proof in more detail to make it easier to follow.

Theorem 12.2.2 (D⟦–⟧ is correct)
Function ⟦t⟧∆ is a derivative of ⟦t⟧. That is, if Γ ⊢ t : τ and dρ ▷ ρ1 ↪ ρ2 : Γ, then ⟦D⟦t⟧⟧ dρ ▷ ⟦t⟧ ρ1 ↪ ⟦t⟧ ρ2 : τ.

Proof. By induction on typing derivation Γ ⊢ t : τ.

• Case Γ ⊢ x : τ. The thesis is that ⟦D⟦x⟧⟧ is a derivative for ⟦x⟧, that is ⟦D⟦x⟧⟧ dρ ▷ ⟦x⟧ ρ1 ↪ ⟦x⟧ ρ2 : τ. Since dρ is a valid environment change from ρ1 to ρ2, ⟦dx⟧ dρ is a valid change from ⟦x⟧ ρ1 to ⟦x⟧ ρ2. Hence, defining D⟦x⟧ = dx satisfies our thesis.

• Case Γ ⊢ s t : τ. The thesis is that ⟦D⟦s t⟧⟧ is a derivative for ⟦s t⟧, that is ⟦D⟦s t⟧⟧ dρ ▷ ⟦s t⟧ ρ1 ↪ ⟦s t⟧ ρ2 : τ. By inversion of typing, there is some type σ such that Γ ⊢ s : σ → τ and Γ ⊢ t : σ.


To prove the thesis, in short, you can apply the inductive hypothesis to s and t, obtaining respectively that ⟦D⟦s⟧⟧ and ⟦D⟦t⟧⟧ are derivatives for ⟦s⟧ and ⟦t⟧. In particular, ⟦D⟦s⟧⟧ evaluates to a validity-preserving function change. Term D⟦s t⟧, that is D⟦s⟧ t D⟦t⟧, applies validity-preserving function change D⟦s⟧ to valid input change D⟦t⟧, and this produces a valid change for s t, as required.

In detail, our thesis is that for all dρ ▷ ρ1 ↪ ρ2 : Γ we have

⟦D⟦s t⟧⟧ dρ ▷ ⟦s t⟧ ρ1 ↪ ⟦s t⟧ ρ2 : τ,

where ⟦s t⟧ ρ = (⟦s⟧ ρ) (⟦t⟧ ρ) and

⟦D⟦s t⟧⟧ dρ
= ⟦D⟦s⟧ t D⟦t⟧⟧ dρ
= (⟦D⟦s⟧⟧ dρ) (⟦t⟧ dρ) (⟦D⟦t⟧⟧ dρ)
= (⟦D⟦s⟧⟧ dρ) (⟦t⟧ ρ1) (⟦D⟦t⟧⟧ dρ).

The last step relies on ⟦t⟧ dρ = ⟦t⟧ ρ1. Since weakening preserves meaning (Lemma A.2.8), this follows because dρ ∈ ⟦∆Γ⟧ extends ρ1 ∈ ⟦Γ⟧ and t can be typed in context Γ.
Our thesis becomes

⟦D⟦s⟧⟧ dρ (⟦t⟧ ρ1) (⟦D⟦t⟧⟧ dρ) ▷ ⟦s⟧ ρ1 (⟦t⟧ ρ1) ↪ ⟦s⟧ ρ2 (⟦t⟧ ρ2) : τ.

By the inductive hypothesis on s and t we have

⟦D⟦s⟧⟧ dρ ▷ ⟦s⟧ ρ1 ↪ ⟦s⟧ ρ2 : σ → τ

⟦D⟦t⟧⟧ dρ ▷ ⟦t⟧ ρ1 ↪ ⟦t⟧ ρ2 : σ.

Since ⟦D⟦s⟧⟧ dρ is a valid function change, its validity means

∀da ▷ a1 ↪ a2 : σ. ⟦D⟦s⟧⟧ dρ a1 da ▷ ⟦s⟧ ρ1 a1 ↪ ⟦s⟧ ρ2 a2 : τ.

Instantiating in this statement the hypothesis da ▷ a1 ↪ a2 : σ by ⟦D⟦t⟧⟧ dρ ▷ ⟦t⟧ ρ1 ↪ ⟦t⟧ ρ2 : σ gives the thesis.

• Case Γ ⊢ λx → t : σ → τ. By inversion of typing, Γ, x : σ ⊢ t : τ. By typing of D⟦–⟧, you can show that

∆Γ, x : σ, dx : ∆σ ⊢ D⟦t⟧ : ∆τ.

In short, our thesis is that ⟦λx → t⟧∆ = λρ1 dρ → ⟦λx dx → D⟦t⟧⟧ dρ is a derivative of ⟦λx → t⟧. After a few simplifications, our thesis reduces to

⟦D⟦t⟧⟧ (dρ, x = a1, dx = da) ▷ ⟦t⟧ (ρ1, x = a1) ↪ ⟦t⟧ (ρ2, x = a2) : τ

for all dρ ▷ ρ1 ↪ ρ2 : Γ and da ▷ a1 ↪ a2 : σ. But then, the thesis is simply that ⟦t⟧∆ is the derivative of ⟦t⟧, which is true by inductive hypothesis.
More in detail, our thesis is that ⟦λx → t⟧∆ is a derivative for ⟦λx → t⟧, that is

∀dρ ▷ ρ1 ↪ ρ2 : Γ. ⟦D⟦λx → t⟧⟧ dρ ▷ ⟦λx → t⟧ ρ1 ↪ ⟦λx → t⟧ ρ2 : σ → τ.   (12.1)


By simplifying, the thesis Eq. (12.1) becomes

∀dρ ▷ ρ1 ↪ ρ2 : Γ.
(λa1 da → ⟦D⟦t⟧⟧ (dρ, x = a1, dx = da)) ▷
(λa1 → ⟦t⟧ (ρ1, x = a1)) ↪ (λa2 → ⟦t⟧ (ρ2, x = a2)) : σ → τ.   (12.2)

By definition of validity at function type, the thesis Eq. (12.2) becomes

∀dρ ▷ ρ1 ↪ ρ2 : Γ. ∀da ▷ a1 ↪ a2 : σ.
⟦D⟦t⟧⟧ (dρ, x = a1, dx = da) ▷
⟦t⟧ (ρ1, x = a1) ↪ ⟦t⟧ (ρ2, x = a2) : τ.   (12.3)

To prove the rewritten thesis Eq. (12.3), take the inductive hypothesis on t: it says that ⟦D⟦t⟧⟧ is a derivative for ⟦t⟧, so ⟦D⟦t⟧⟧ maps valid environment changes on Γ, x : σ to valid changes on τ. But by inversion of the validity judgment, all valid environment changes on Γ, x : σ can be written as

(dρ, x = a1, dx = da) ▷ (ρ1, x = a1) ↪ (ρ2, x = a2) : Γ, x : σ

for valid changes dρ ▷ ρ1 ↪ ρ2 : Γ and da ▷ a1 ↪ a2 : σ. So the inductive hypothesis is that

∀dρ ▷ ρ1 ↪ ρ2 : Γ. ∀da ▷ a1 ↪ a2 : σ.
⟦D⟦t⟧⟧ (dρ, x = a1, dx = da) ▷
⟦t⟧ (ρ1, x = a1) ↪ ⟦t⟧ (ρ2, x = a2) : τ.   (12.4)

But that is exactly our thesis Eq. (12.3), so we're done.

• Case Γ ⊢ c : τ. In essence, since weakening preserves meaning, we can rewrite the thesis to match Plugin Requirement 12.2.10.
In more detail, the thesis is that DC⟦c⟧ is a derivative for c, that is, if dρ ▷ ρ1 ↪ ρ2 : Γ then ⟦D⟦c⟧⟧ dρ ▷ ⟦c⟧ ρ1 ↪ ⟦c⟧ ρ2 : τ. Since constants don't depend on the environment and weakening preserves meaning (Lemma A.2.8), and by the definitions of ⟦–⟧ and D⟦–⟧ on constants, the thesis can be simplified to

⟦DC⟦c⟧⟧ ▷ ⟦c⟧ ↪ ⟦c⟧ : τ,

which is delegated to plugins in Plugin Requirement 12.2.10.

12.3 Discussion

12.3.1 The correctness statement

We might have asked for the following correctness property:

Theorem 12.3.1 (Incorrect correctness statement)
If Γ ⊢ t : τ and ρ1 ⊕ dρ = ρ2 then (⟦t⟧ ρ1) ⊕ (⟦D⟦t⟧⟧ dρ) = ⟦t⟧ ρ2.

However, this property is not quite right. We can only prove correctness if we restrict the statement to input changes dρ that are valid. Moreover, to prove this statement by induction we need to strengthen its conclusion: we require that the output change ⟦D⟦t⟧⟧ dρ is also valid. To see why, consider term (λx → s) t. Here the output of t is an input of s. Similarly, in D⟦(λx → s) t⟧, the output of D⟦t⟧ becomes an input change for subterm D⟦s⟧, and D⟦s⟧ behaves correctly only if D⟦t⟧ produces a valid change.

Typically, change types contain values that are invalid in some sense, but incremental programs will preserve validity. In particular, valid changes between functions are in turn functions that take valid input changes to valid output changes. This is why we formalize validity as a logical relation.

12.3.2 Invalid input changes

To see concretely why invalid changes, in general, can cause derivatives to produce incorrect results, consider again grandTotal = λxs ys → sum (merge xs ys) from Sec. 10.2. Suppose a bag change dxs removes an element 20 from input bag xs, while dys makes no changes to ys: in this case, the output should decrease, so dz = dgrandTotal xs dxs ys dys should be −20. However, that is only correct if 20 is actually an element of xs. Otherwise, xs ⊕ dxs will make no change to xs, hence the correct output change dz would be 0 instead of −20. Similar but trickier issues apply with function changes; see also Sec. 15.2.

12.3.3 Alternative environment changes

Environment changes can also be defined differently. We will use this alternative definition later (in Appendix D.2.2).

A change dρ from ρ1 to ρ2 contains a copy of ρ1. Thanks to this copy, we can use an environment change as environment for the result of differentiation: that is, we can evaluate D⟦t⟧ with environment dρ, and Definition 12.2.1 can define ⟦t⟧∆ as λρ1 dρ → ⟦D⟦t⟧⟧ dρ.

But we could adapt definitions to omit the copy of ρ1 from dρ, by setting

∆(Γ, x : τ) = ∆Γ, dx : ∆τ

and adapting other definitions. Evaluating D⟦t⟧ still requires base inputs; we could then set ⟦t⟧∆ = λρ1 dρ → ⟦D⟦t⟧⟧ (merge ρ1 dρ), where merge ρ1 dρ simply merges the two environments appropriately (we omit a formal definition). This is the approach taken by Cai et al. [2014]. When proving Theorem 12.2.2, using one or the other definition for environment changes makes little difference; if we embed the base environment in environment changes, we reduce noise, because we need not define environment merging formally.

Later (in Appendix D.2.2) we will deal with environments explicitly, and manipulate them in programs. Then, we will use this alternative definition for environment changes, since it will be convenient to store base environments separately from environment changes.

12.3.4 Capture avoidance

Differentiation generates new names, so a correct implementation must prevent accidental capture. Till now, we have ignored this problem.

Using de Bruijn indexes. Our mechanization has no capture issues, because it uses de Bruijn indexes. Change contexts just alternate variables for base inputs and input changes. A context such as Γ = x : ℤ, y : Bool is encoded as Γ = ℤ, Bool; its change context is ∆Γ = ℤ, ∆ℤ, Bool, ∆Bool. This solution is correct and robust, and is the one we rely on.

Alternatively, we can mechanize ILC using separate syntax for change terms dt, one that uses separate namespaces for base variables and change variables:


ds, dt ::= dc
  | λ(x : σ) (dx : ∆σ) → dt
  | ds t dt
  | dx

In that case, change variables live in a separate namespace. Example context Γ = ℤ, Bool gives rise to a different sort of change context: ∆Γ = ∆ℤ, ∆Bool. And a change term in context Γ is evaluated with separate environments for Γ and ∆Γ. This is appealing, because it allows defining differentiation and proving it correct without using weakening and applying its proof of soundness. We still need to use weakening to convert change terms to their equivalents in the base language, but proving that conversion correct is more straightforward.

Using names. Next, we discuss issues in implementing this transformation with names rather than de Bruijn indexes. Using names rather than de Bruijn indexes makes terms more readable; this is also why in this thesis we use names in our on-paper formalization.

Unlike the rest of this chapter, we keep this discussion informal, also because we have not mechanized any definitions using names (as it may be possible using nominal logic), nor attempted formal proofs. The rest of the thesis does not depend on this material, so readers might want to skip to the next section.

Using names introduces the risk of capture, as is common for name-generating transformations [Erdweg et al. 2014]. For instance, differentiating term t = λx → f dx gives D⟦t⟧ = λx dx → df dx ddx. Here, variable dx represents a base input and is free in t, yet it is incorrectly captured in D⟦t⟧ by the other variable dx, the one representing x's change. Differentiation gives instead a correct result if we α-rename x in t to any other name (more on that in a moment).

A few workarounds and fixes are possible:

• As a workaround, we can forbid names starting with the letter d for variables in base terms, as we do in our examples; that's formally correct, but pretty unsatisfactory and inelegant. With this approach, our term t = λx → f dx is simply forbidden.

• As a better workaround, instead of prefixing variable names with d, we can add change variables as a separate construct to the syntax of variables, and forbid differentiation on terms containing change variables. This is a variant of the earlier approach using separate change terms. While we used this approach in our prototype implementation in Scala [Cai et al. 2014], it makes our output language annoyingly non-standard. Converting to a standard language using names (not de Bruijn indexes) raises again capture issues.

• We can try to α-rename existing bound variables, as in the implementation of capture-avoiding substitution. As mentioned, in our case we can rename bound variable x to y and get t′ = λy → f dx. Differentiation gives the correct result D⟦t′⟧ = λy dy → df dx ddx. In general, we can define D⟦λx → t⟧ = λy dy → D⟦t[x := y]⟧, where neither y nor dy appears free in t; that is, we search for a fresh variable y (which, being fresh, does not appear anywhere else) such that also dy does not appear free in t.
This solution is however subtle: it reuses ideas from capture-avoiding substitution, which is well-known to be subtle. We have not attempted to formally prove such a solution correct (or even test it), and have no plan to do so.

• Finally, and most easily, we can α-rename the new bound variables, the ones used to refer to changes, or rather, only pick them fresh. But if we pick, say, fresh variable dx1 to refer to the change of variable x, we must use dx1 consistently for every occurrence of x, so that D⟦λx → x⟧ is not λdx1 → dx2. Hence, we extend D⟦–⟧ to also take a map from names to names, as follows (see the code sketch after this list):

D⟦λ(x : σ) → t⟧ m = λ(x : σ) (dx : ∆σ) → D⟦t⟧ (m[x → dx])
D⟦s t⟧ m = D⟦s⟧ m t D⟦t⟧ m
D⟦x⟧ m = m(x)
D⟦c⟧ m = DC⟦c⟧

where m(x) represents lookup of x in map m, dx is now a fresh variable that does not appear in t, and m[x → dx] extends m with a new mapping from x to dx.

But this approach, that is, using a map from base variables to change variables, affects the interface of differentiation. In particular, it affects which variables are free in output terms; hence, we must also update the definition of ∆Γ and the derived typing rule Derive. With this approach, if term s is closed, then D⟦s⟧ emptyMap gives a result α-equivalent to the old D⟦s⟧, as long as s triggers no capture issues. But if instead s is open, invoking D⟦s⟧ emptyMap is not meaningful: we must pass an initial map m containing mappings from s's free variables to fresh variables for their changes. These change variables appear free in D⟦s⟧ m, hence we must update typing rule Derive and modify ∆Γ to use m.

We define ∆m Γ by adding m as a parameter to ∆Γ, and use it in a modified rule Derive′:

∆m ε = ε
∆m (Γ, x : τ) = ∆m Γ, x : τ, m(x) : ∆τ

Γ ⊢ t : τ
──────────────────── Derive′
∆m Γ ⊢ D⟦t⟧ m : ∆τ

We conjecture that Derive′ holds and that D⟦t⟧ m is correct, but we have attempted no formal proof.
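The map-based option above can be sketched in code as follows (our sketch, reusing the Term syntax and helpers from the sketch in Sec. 12.2.2; fresh is a placeholder that must pick a change variable not occurring in the body):

import qualified Data.Map as Map
import Data.Map (Map)

deriveM :: Map String String → Term → Term
deriveM m (Lam x s t) =
  let dx = fresh x t   -- must not appear (free or bound) in t
  in  Lam x s (Lam dx (deltaTy s) (deriveM (Map.insert x dx m) t))
deriveM m (App s t) = App (App (deriveM m s) t) (deriveM m t)
deriveM m (Var x)   = case Map.lookup x m of
  Just dx → Var dx            -- m(x)
  Nothing → error "unbound variable: extend the initial map m"
deriveM _ (Const c) = deriveConst c

fresh :: String → Term → String
fresh x _t = 'd' : x           -- placeholder; a real implementation avoids capture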

12.4 Plugin requirement summary

For reference, we repeat here the plugin requirements spread through the chapter.

Plugin Requirement 12.1.16 (Basic change structures on base types)
For each base type ι, the plugin defines a basic change structure on ⟦ι⟧ that we write ι̂.

Plugin Requirement 12.1.19 (Base change types)
For each base type ι, the plugin defines a change type ∆ι such that ∆⟦ι⟧ = ⟦∆ι⟧.

Plugin Requirement 12.2.9 (Typing of DC⟦–⟧)
For all ⊢C c : τ, the plugin defines DC⟦c⟧ satisfying ⊢ DC⟦c⟧ : ∆τ.

Plugin Requirement 12.2.10 (Correctness of DC⟦–⟧)
For all ⊢C c : τ, ⟦DC⟦c⟧⟧ is a derivative for ⟦c⟧.


12.5 Chapter conclusion

In this chapter, we have formally defined changes for values and environments of our language, and when changes are valid. Through these definitions, we have explained that D⟦t⟧ is correct, that is, that it maps changes to the input environment to changes to the output. All of this assumes, among other things, that language plugins define valid changes for their base types and derivatives for their primitives.

In the next chapter, we discuss the operations we provide to construct and use changes. These operations will be especially useful to provide derivatives of primitives.


Chapter 13

Change structures

In the previous chapter, we have shown that evaluating the result of differentiation produces a valid change dv from the old output v1 to the new one v2. To compute v2 from v1 and dv, in this chapter we introduce formally the operator ⊕ mentioned earlier.

To define differentiation on primitives, plugins need a few operations on changes, not just ⊕: ⊖, ⊚ and 0.

To formalize these operators and specify their behavior, we extend the notion of basic change structure into the notion of change structure, in Sec. 13.1. The change structure for function spaces is not entirely intuitive, so we motivate it in Sec. 13.2. Then we show how to take change structures on A and B and define new ones on A → B, in Sec. 13.3. Using these structures, we finally show that, starting from change structures for base types, we define change structures for all types τ and contexts Γ, in Sec. 13.4, completing the core theory of changes.

13.1 Formally defining ⊕ and change structures

In this section, we define what a change structure on a set V is. A change structure V̂ extends a basic change structure V̂ with change operators ⊕, ⊖, 0 and ⊚. Change structures also require change operators to respect validity, as described below. Key properties of change structures follow in Sec. 13.1.2.

As usual, we'll use metavariables v, v1, v2 to range over elements of V, while dv, dv1, dv2 will range over elements of ∆V.

Let's first recall the change operators. Operator ⊕ (“oplus”) updates a value with a change: if dv is a valid change from v1 to v2, then v1 ⊕ dv (read as “v1 updated by dv” or “v1 oplus dv”) is guaranteed to return v2. Operator ⊖ (“ominus”) produces a difference between two values: v2 ⊖ v1 is a valid change from v1 to v2. Operator 0 (“nil”) produces nil changes: 0_v is a nil change for v. Finally, change composition ⊚ (“ocompose”) “pastes changes together”: if dv1 is a valid change from v1 to v2 and dv2 is a valid change from v2 to v3, then dv1 ⊚ dv2 (read “dv1 composed with dv2”) is a valid change from v1 to v3.

We summarize these descriptions in the following definition.

Definition 13.1.1
A change structure V̂ over a set V requires:

(a) a basic change structure for V (hence, change set ∆V and validity dv ▷ v1 ↪ v2 : V);

(b) an update operation ⊕ : V → ∆V → V that updates a value with a change. Update must agree with validity: for all dv ▷ v1 ↪ v2 : V we require v1 ⊕ dv = v2;

(c) a nil change operation 0 : V → ∆V that must produce nil changes: for all v : V we require 0_v ▷ v ↪ v : V;

(d) a difference operation ⊖ : V → V → ∆V that produces a change across two values: for all v1, v2 : V we require v2 ⊖ v1 ▷ v1 ↪ v2 : V;

(e) a change composition operation ⊚ : ∆V → ∆V → ∆V that composes together two consecutive changes. Change composition must preserve validity: for all dv1 ▷ v1 ↪ v2 : V and dv2 ▷ v2 ↪ v3 : V we require dv1 ⊚ dv2 ▷ v1 ↪ v3 : V.

Notation 13.1.2
Operators ⊕ and ⊖ can be subscripted to highlight their base set, but we will usually omit such subscripts. Moreover, ⊕ is left-associative, so that v ⊕ dv1 ⊕ dv2 means (v ⊕ dv1) ⊕ dv2.

Finally, whenever we have a change structure such as Â, B̂, V̂ and so on, we write respectively A, B, V to refer to its base set.
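In the Haskell notation of Chapter 11, the operations of Definition 13.1.1 could be gathered into one type class, sketched below (the thesis's actual implementation splits them across several classes, such as NilChangeStruct and DiffChangeStruct, as seen in Sec. 11.6):

class ChangeStruct v where
  type ∆v
  (⊕) :: v → ∆v → v       -- update; must agree with validity
  (⊖) :: v → v → ∆v       -- difference: v2 ⊖ v1 ▷ v1 ↪ v2 : V
  nil :: v → ∆v           -- nil change, written 0_v in the text
  (⊚) :: ∆v → ∆v → ∆v     -- change composition
  nil v = v ⊖ v           -- default definition, justified by Lemma 13.1.8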

13.1.1 Example: group changes

As an example, we show next that each group induces a change structure on its carrier. This change structure also subsumes the basic change structures we saw earlier on integers.
Definition 13.1.3 (Change structure on groups Ĝ)
Given any group (G, e, +, −), we can define a change structure Ĝ on carrier set G as follows:

(a) The change set is G.

(b) Validity is defined as dg ▷ g1 ↪ g2 : G if and only if g2 = g1 + dg.

(c) Change update coincides with +: g1 ⊕ dg = g1 + dg. Hence, ⊕ agrees perfectly with validity: for all g1 ∈ G, all changes dg are valid from g1 to g1 ⊕ dg (that is, dg ▷ g1 ↪ g1 ⊕ dg : G).

(d) We define difference as g2 ⊖ g1 = (−g1) + g2. Verifying g2 ⊖ g1 ▷ g1 ↪ g2 : G reduces to verifying g1 + ((−g1) + g2) = g2, which follows from group axioms.

(e) The only nil change is the identity element: 0_g = e. Validity 0_g ▷ g ↪ g : G reduces to g + e = g, which follows from group axioms.

(f) Change composition also coincides with +: dg1 ⊚ dg2 = dg1 + dg2. Let's prove that composition respects validity. Our hypothesis is dg1 ▷ g1 ↪ g2 : G and dg2 ▷ g2 ↪ g3 : G. Because ⊕ agrees perfectly with validity, the hypothesis is equivalent to g2 = g1 ⊕ dg1 and

g3 = g2 ⊕ dg2 = (g1 ⊕ dg1) ⊕ dg2.

Our thesis is dg1 ⊚ dg2 ▷ g1 ↪ g3 : G, that is

g3 = g1 ⊕ (dg1 ⊚ dg2).

Hence, the thesis reduces to

(g1 ⊕ dg1) ⊕ dg2 = g1 ⊕ (dg1 ⊚ dg2),

hence to g1 + (dg1 + dg2) = (g1 + dg1) + dg2, which is just the associative law for group G.
As we show later, group change structures are useful to support aggregation.
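As a code sketch (ours; the Group class below is an assumption standing for any group on g), the induced operations read:

class Group g where
  gzero :: g           -- identity element e
  gplus :: g → g → g   -- group operation +
  gneg  :: g → g       -- inverse −

-- Change operations induced by the group, per Definition 13.1.3;
-- changes over g are again just values of g.
oplusG :: Group g ⇒ g → g → g
oplusG = gplus                        -- g1 ⊕ dg = g1 + dg

ominusG :: Group g ⇒ g → g → g
ominusG g2 g1 = gneg g1 `gplus` g2    -- g2 ⊖ g1 = (−g1) + g2

nilG :: Group g ⇒ g → g
nilG _ = gzero                        -- 0_g = e

ocomposeG :: Group g ⇒ g → g → g
ocomposeG = gplus                     -- dg1 ⊚ dg2 = dg1 + dg2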


13.1.2 Properties of change structures

To understand better the definition of change structures, we present next a few lemmas following from this definition.
Lemma 13.1.4 (⊖ inverts ⊕)
⊕ inverts ⊖, that is

v1 ⊕ (v2 ⊖ v1) = v2

for any change structure V̂ and any values v1, v2 : V.

Proof. For change structures, we know v2 ⊖ v1 ▷ v1 ↪ v2 : V, and v1 ⊕ (v2 ⊖ v1) = v2 follows.
More in detail: change dv = v2 ⊖ v1 is a valid change from v1 to v2 (because ⊖ produces valid changes, v2 ⊖ v1 ▷ v1 ↪ v2 : V), so updating dv's source v1 with dv produces dv's destination v2 (because ⊕ agrees with validity, that is, if dv ▷ v1 ↪ v2 : V then v1 ⊕ dv = v2).

Lemma 13.1.5 (A change can't be valid for two destinations with the same source)
Given a change dv : ∆V and a source v1 : V, dv can only be valid with v1 as source for a single destination. That is, if dv ▷ v1 ↪ v2a : V and dv ▷ v1 ↪ v2b : V, then v2a = v2b.

Proof. The proof follows, intuitively, because ⊕ also maps change dv and its source v1 to its destination, and ⊕ is a function.
More technically, since ⊕ respects validity, the hypotheses mean that v2a = v1 ⊕ dv = v2b, as required.

Beware that changes can be valid for multiple sources, and associate them to different destinations. For instance, integer 0 is a valid change for all integers.

If a change dv has source v, dv's destination equals v ⊕ dv. So, to specify that dv is valid with source v, without mentioning dv's destination, we introduce the following definition.

Definition 13.1.6 (One-sided validity)
We define relation V_V as { (v, dv) ∈ V × ∆V | dv ▷ v ↪ v ⊕ dv : V }.

We use this definition right away:

Lemma 13.1.7 (⊚ and ⊕ interact correctly)
If (v1, dv1) ∈ V_V and (v1 ⊕ dv1, dv2) ∈ V_V, then v1 ⊕ (dv1 ⊚ dv2) = v1 ⊕ dv1 ⊕ dv2.

Proof. We know that ⊚ preserves validity, so under the hypotheses (v1, dv1) ∈ V_V and (v1 ⊕ dv1, dv2) ∈ V_V, we get that dv = dv1 ⊚ dv2 is a valid change from v1 to v1 ⊕ dv1 ⊕ dv2:

dv1 ⊚ dv2 ▷ v1 ↪ v1 ⊕ dv1 ⊕ dv2 : V.

Hence, updating dv's source v1 with dv produces dv's destination v1 ⊕ dv1 ⊕ dv2:

v1 ⊕ (dv1 ⊚ dv2) = v1 ⊕ dv1 ⊕ dv2.

We can define 0 in terms of the other operations, and prove it satisfies its requirements for change structures.

Lemma 13.1.8 (0 can be derived from ⊖)
If we define 0_v = v ⊖ v, then 0 produces valid changes as required (0_v ▷ v ↪ v : V), for any change structure V̂ and value v : V.

Proof. This follows from validity of ⊖ (v2 ⊖ v1 ▷ v1 ↪ v2 : V), instantiated with v1 = v and v2 = v.


Moreover, nil changes are a right identity element for ⊕:

Corollary 13.1.9 (Nil changes are identity elements)
Any nil change dv for a value v is a right identity element for ⊕, that is, we have v ⊕ dv = v for every set V with a basic change structure, every element v ∈ V, and every nil change dv for v.

Proof. This follows from Lemma 13.4.4 and the definition of nil changes.

The converse of this theorem does not hold there exists changes dv such that v oplus dv = v yetdv is not a valid change from v to v More in general v1 oplus dv = v2 does not imply that dv is avalid change For instance under earlier denitions for changes on naturals if we take v = 0 anddv = minus5 we have v oplus dv = v even though dv is not valid examples of invalid changes on functionsare discussed at Sec 1232 and Sec 152 However we prefer to dene ldquodv is a nil changerdquo as we doto imply that dv is valid not just a neutral element

132 Operations on function changes informally

1321 Examples of nil changes

We have not dened any change structure yet but we can already exhibit nil changes for somevalues including a few functions

Example 1321bull An environment change for an empty environment n ε o must be an environment for the

empty context ∆ε = ε so it must be the empty environment In other words if and only ifdρ B rarr ε then and only then dρ = and in particular dρ is a nil change

bull If all values in an environment ρ have nil changes the environment has a nil change dρ = 0ρassociating each value to a nil change Indeed take a context Γ and a suitable environmentρ n Γ o For each typing assumption x τ in Γ assume we have a nil change 0v for v Thenwe dene 0ρ n∆Γ o by structural recursion on ρ as

0 = 0ρx=v = 0ρ x = v dx = 0v

Then we can see that 0ρ is indeed a nil change for ρ that is 0ρ B ρ rarr ρ Γ

bull We have seen in Theorem 1222 that whenever Γ ` t τ n t o has nil change n t o∆ Moreoverif we have an appropriate nil environment change dρ B ρ rarr ρ Γ (which we often do asdiscussed above) then we also get a nil change n t o∆ ρ dρ for n t o ρ

n t o∆ ρ dρ B n t o ρ rarr n t o ρ τ

In particular for all closed well-typed terms ` t τ we have

n t o∆ B n t o rarr n t o τ

Chapter 13 Change structures 101

1322 Nil changes on arbitrary functionsWe have discussed how to nd a nil change 0f for a function f if we know the intension of f that isits denition What if we have only its extension that is its behavior Can we still nd 0f Thatrsquosnecessary to implement 0 as an object-language function 0 from f to 0f since such a function doesnot have access to f rsquos implementation Thatrsquos also necessary to dene 0 on metalanguage functionspaces mdash and we look at this case rst

We seek a nil change 0f for an arbitrary metalanguage function f Ararr B where A and B arearbitrary sets we assume a basic change structure on Ararr B and will require A and B to support afew additional operations We require that

0f B f rarr f Ararr B (131)

Equivalently whenever da B a1 rarr a2 A then 0f a1 da B f a1 rarr f a2 B By Lemma 1344 itfollows that

f a1 oplus 0f a1 da = f a2 (132)where a1 oplus da = a2

To dene 0f we solve Eq (132) To understand how we use an analogy oplus and are intendedto resemble + and minus To solve f a1 + 0f a1 da = f a2 we subtract f a1 from both sides and write0f a1 da = f a2 minus f a1

Similarly here we use operator 0f must equal

0f = λa1 dararr f (a1 oplus da) f a1 (133)

Because b2 b1 B b1 rarr b2 B for all b1 b2 B we can verify that 0f as dened by Eq (133) satisesour original requirement Eq (131) not just its weaker consequence Eq (132)

We have shown that to dene 0 on functions f A rarr B we can use at type B Withoutusing f rsquos intension we are aware of no alternative To ensure 0f is dened for all f we require thatchange structures dene We can then dene 0 as a derived operation via 0v = v v and verifythis derived denition satises requirements for 0ndash

Next we show how to dene on functions We seek a valid function change f2 f1 from f1 tof2 We have just sought and found a valid change from f to f generalizing the reasoning we usedwe obtain that whenever da B a1 rarr a2 A then we need to have (f2 f1) a1 da B f1 a1 rarr f2 a2 Bsince a2 = a1 oplus da we can dene

f2 f1 = λa1 dararr f2 (a1 oplus da) f1 a1 (134)One can verify that Eq (134) denes f2 f1 as a valid function from f1 to f2 as desired And

after dening f2 f1 we need no more to dene 0f separately using Eq (133) We can just dene0f = f f simplify through the denition of in Eq (134) and reobtain Eq (133) as a derivedequation

0f = f f = λa1 dararr f (a1 oplus da) f a1

We dened f2 f1 on metalanguage functions We can also internalize change operators in ourobject language that is dene for each type τ object-level terms oplusτ τ and so on with the samebehavior We can dene object-language change operators such as using the same equationsHowever the produced function change df is slow because it recomputes the old output f1 a1computes the new output f2 a2 and takes the dierence

However we can implement σrarrτ using replacement changes if they are supported by thechange structure on type τ Let us dene τ as b2 b1 = b2 and simplify Eq (134) we obtain

f2 f1 = λa1 dararr (f2 (a1 oplus da))

102 Chapter 13 Change structures

We could even imagine allowing replacement changes on functions themselves However herethe bang operator needs to be dened to produce a function change that can be applied hence

f2 = λa1 dararr (f2 (a1 oplus da))

Alternatively as we see in Appendix D we could represent function changes not as functionsbut as data through defunctionalization and provide a function applying defunctionalized functionchanges df ∆(σ rarr τ ) to inputs t1 σ and dt ∆σ In this case f2 would simply be another way toproduce defunctionalized function changes

1323 Constraining oplus on functionsNext we discuss how oplus must be dened on functions and show informally why we must denef1oplusdf = λv rarr f1 xoplusdf v 0v to prove that oplus on functions agrees with validity (that is Lemma 1344)

We know that a valid function change df B f1 rarr f2 A rarr B takes valid input changesdv B v1 rarr v2 A to a valid output change df v1 dv B f1 v1 rarr f2 v2 B We require that oplus agreeswith validity (Lemma 1344) so f2 = f1 oplus df v2 = v1 oplus dv and

f2 v2 = (f1 oplus df ) (v1 oplus dv) = f1 v1 oplus df v1 dv (135)

Instantiating dv with 0v gives equation

(f1 oplus df ) v1 = (f1 oplus df ) (v1 oplus 0v) = f1 v1 oplus df v1 0v

which is not only a requirement on oplus for functions but also denes oplus eectively

133 Families of change structuresIn this section we derive change structures for A rarr B and A times B from two change structuresA and B The change structure on A rarr B enables dening change structures for function typesSimilarly the change structure on A times B enables dening a change structure for product types in alanguage plugin as described informally in Sec 115 In Sec 116 we also discuss informally changestructures for disjoint sums Formally we can derive a change structure for disjoint union of setsA + B (from change structures for A and B) and this enables dening change structures for sumtypes we have mechanized the required proofs but omit the tedious details here

1331 Change structures for function spacesSec 132 introduces informally how to dene change operations on Ararr B Next we dene formallychange structures on function spaces and then prove its operations respect their constraints

Denition 1331 (Change structure for Ararr B)Given change structures A and B we dene a change structure on their function space A rarr Bwritten Ararr B where

(a) The change set is dened as ∆(Ararr B) = Ararr ∆Ararr ∆B

(b) Validity is dened as

df B f1 rarr f2 Ararr B = foralla1 da a2 (da B a1 rarr a2 A)implies (df a1 da B f1 a1 rarr f2 a2 B)

Chapter 13 Change structures 103

(c) We dene change update by

f1 oplus df = λararr f1 a oplus df a 0a

(d) We dene dierence by

f2 f1 = λa dararr f2 (a oplus da) f1 a

(e) We dene 0 like in Lemma 1318 as

0f = f f

(f) We dene change composition as

df 1 df 2 = λa dararr df 1 a 0a df 2 a da

Lemma 1332Denition 1331 denes a correct change structure Ararr B

Proof bull We prove that oplus agrees with validity on A rarr B Consider f1 f2 A rarr B anddf B f1 rarr f2 A rarr B we must show that f1 oplus df = f2 By functional extensionality weonly need prove that (f1 oplus df ) a = f2 a that is that f1 a oplus df a 0a = f2 a Since oplus agrees withvalidity on B we just need to show that df a 0a B f1 a rarr f2 a B which follows because 0ais a valid change from a to a and because df is a valid change from f1 to f2

bull We prove that produces valid changes on Ararr B Consider df = f2 f1 for f1 f2 Ararr BFor any valid input da B a1 rarr a2 A we must show that df produces a valid output withthe correct vertexes that is that df a1 da B f1 a1 rarr f2 a2 B Since oplus agrees with validitya2 equals a1 oplus da By substituting away a2 and df the thesis becomes f2 (a1 oplus da) f1 a1 Bf1 a1 rarr f2 (a1 oplus da) B which is true because produces valid changes on B

bull 0 produces valid changes as proved in Lemma 1318

bull We prove that change composition preserves validity on Ararr B That is we must prove

df 1 a1 0a1 df 2 a1 da B f1 a1 rarr f3 a2 B

for every f1 f2 f3 df 1 df 2 a1 da a2 satifsfying df 1 B f1 rarr f2 Ararr B df 2 B f2 rarr f3 ArarrB and da B a1 rarr a2 A

Because change composition preserves validity on B itrsquos enough to prove that (1) df 1 a1 0a1 Bf1 a1 rarr f2 a1 B (2) df 2 a1 da B f2 a1 rarr f3 a2 B That is intuitively we create a compositechange using and it goes from f1 a1 to f3 a2 passing through f2 a1 Part (1) follows becausedf 1 is a valid function change from f1 to f2 applied to a valid change 0a1 from a1 to a1 Part(2) follows because df 2 is a valid function change from f2 to f3 applied to a valid change dafrom a1 to a2

104 Chapter 13 Change structures

1332 Change structures for productsWe can dene change structures on products A times B given change structures on A and B a changeon pairs is just a pair of changes all other change structure denitions distribute componentwisethe same way and their correctness reduce to the correctness on components

Change structures on n-ary products or records present no additional diculty

Denition 1333 (Change structure for A times B)Given change structures A and B we dene a change structure A times B on product A times B

(a) The change set is dened as ∆(A times B) = ∆A times ∆B

(b) Validity is dened as

(da db) B (a1 b1) rarr (a2 b2) A times B =(da B a1 rarr a2 A) and (db B b1 rarr b2 B)

In other words validity distributes componentwise a product change is valid if each compo-nent is valid

(c) We dene change update by

(a1 b1) oplus (da db) = (a1 oplus da b1 oplus db)

(d) We dene dierence by

(a2 b2) (a1 b1) = (a2 a1 b2 b1)

(e) We dene 0 to distribute componentwise

0ab = (0a 0b)

(f) We dene change composition to distribute componentwise

(da1 db1) (da2 db2) = (da1 da2 db1 db2)

Lemma 1334Denition 1333 denes a correct change structure A times B

Proof Since all these proofs are similar and spelling out their details does not make them clearerwe only give the rst such proof in full

bull oplus agrees with validity on A times B because oplus agrees with validity on both A and B For thisproperty we give a full proofFor each p1 p2 A times B and dp B p1 rarr p2 A times B we must show that p1 oplus dp = p2Instead of quantifying over pairs p A times B we can quantify equivalently over componentsa A b B Hence consider a1 a2 A b1 b2 B and changes da db that are valid that isda B a1 rarr a2 A and db B b1 rarr b2 B We must show that

(a1 b1) oplus (da db) = (a2 b2)

That follows from a1 oplus da = a2 (which follows from da B a1 rarr a2 A) and b1 oplus db = b2(which follows from db B b1 rarr b2 B)

Chapter 13 Change structures 105

bull produces valid changes on A times B because produces valid changes on both A and B Weomit a full proof the key step reduces the thesis

(a2 b2) (a1 b1) B (a1 b1) rarr (a2 b2) A times B

to a2 a1 B a1 rarr a2 A and b2 b1 B b1 rarr b2 B (where free variables range on suitabledomains)

bull 0ab is correct that is (0a 0b) B (a b) rarr (a b) AtimesB because 0 is correct on each component

bull Change composition is correct on A times B that is

(da1 da2 db1 db2) B (a1 b1) rarr (a3 b3) A times B

whenever (da1 db1) B (a1 b1) rarr (a2 b2) A times B and (da2 db2) B (a2 b2) rarr (a3 b3) A times Bin essence because change composition is correct on both A and B We omit details

134 Change structures for types and contextsAs promised given change structures for base types we can provide change structures for all types

Plugin Requirement 1341 (Change structures for base types)For each base type ι we must have a change structure ι dened on base set n ι o based on the basicchange structures dened earlier

Denition 1342 (Change structure for types)For each type τ we dene a change structure τ on base set nτ o

ι = σ rarr τ = σ rarr τ

Lemma 1343Change sets and validity as dened in Denition 1342 give rise to the same basic change structuresas the ones dened earlier in Denition 12115 and to the change operations described in Fig 131a

Proof This can be veried by induction on types For each case it is sucient to compare deni-tions

Lemma 1344 (oplus agrees with validity)If dv B v1 rarr v2 τ then v1 oplus dv = v2

Proof Because τ is a change structure and in change structures oplus agrees with validity

As shortly proved in Sec 122 since oplus agrees with validity (Lemma 1344) and D n ndash o is correct(Theorem 1222) we get Corollary 1345

Corollary 1345 (D n ndasho is correct corollary)If Γ ` t τ and dρ B ρ1 rarr ρ2 Γ then n t o ρ1 oplus nD n t o o dρ = n t o ρ2

106 Chapter 13 Change structures

We can also dene a change structure for environments Recall that change structures forproducts dene their operations to act on each component Each change structure operation forenvironments acts ldquovariable-wiserdquo Recall that a typing context Γ is a list of variable assignmentx τ For each such entry environments ρ and environment changes dρ contain a base entry x = vwhere v nτ o and possibly a change dx = dv where dv n∆τ o

Denition 1346 (Change structure for environments)To each context Γ we associate a change structure Γ that extends the basic change structure fromDenition 12125 Operations are dened as shown in Fig 131b

Base values v prime in environment changes are redundant with base values v in base environmentsbecause for valid changes v = v prime So when consuming an environment change we choose arbitrarilyto use v instead of v prime Alternatively we could also use v prime and get the same results for valid inputsWhen producing an environment change they are created to ensure validity of the resultingenvironment change

Lemma 1347Denition 1346 denes a correct change structure Γ for each context Γ

Proof All proofs are by structural induction on contexts Most details are analogous to the ones forproducts and add no details so we refer to our mechanization for most proofs

However we show by induction that oplus agrees with validity For the empty context therersquosa single environment n ε o so oplus returns the correct environment For the inductive caseΓprime x τ inversion on the validity judgment reduces our hypothesis to dv B v1 rarr v2 τ anddρ B ρ1 rarr ρ2 Γ and our thesis to (ρ1 x = v1) oplus (dρ x = v1 dx = dv) = (ρ2 x = v2) where v1appears both in the base environment and the environment change The thesis follows because oplusagrees with validity on both Γ and τ

We summarize denitions on types in Fig 131Finally we can lift change operators from the semantic level to the syntactic level so that their

meaning is coherent

Denition 1348 (Term-level change operators)We dene type-indexed families of change operators at the term level with the following signatures

oplusτ τ rarr ∆τ rarr ττ τ rarr τ rarr ∆τ0τ ndash τ rarr ∆ττ ∆τ rarr ∆τ rarr ∆τ

and denitions

tf 1 oplusσrarrτ dtf = λx rarr tf 1 x oplus dtf x 0xtf 2 σrarrτ tf 1 = λx dx rarr tf 2 (x oplus dx) tf 1 x0σrarrτ tf = tf σrarrτ tfdtf 1 σrarrτ dtf 2 = λx dx rarr dtf 1 x 0x dtf 2 x dxtf 1 oplusι dtf = tf 2 ι tf 1 = 0ι tf = dtf 1 ι dtf 2 =

Chapter 13 Change structures 107

Lemma 1349 (Term-level change operators agree with change structures)The following equations hold for all types τ contexts Γ well-typed terms Γ ` t1 t2 τ ∆Γ `dt dt1 dt2 ∆τ and environments ρ n Γ o dρ n∆Γ o such that all expressions are dened

n t1 oplusτ dt o dρ = n t1 o dρ oplus n dt o dρn t2 τ t1 o ρ = n t2 o ρ oplus n t1 o ρ0τ t

ρ = 0n t o ρ

n dt1 τ dt2 o dρ = n dt1 o dρ n dt2 o dρ

Proof By induction on types and simplifying both sides of the equalities The proofs for oplus and must be done by simultaneous induction

At the time of writing we have not mechanized the proof for To dene the lifting and prove it coherent on base types we must add a further plugin require-

mentPlugin Requirement 13410 (Term-level change operators for base types)For each base type ι we dene change operators as required by Denition 1348 and satisfyingrequirements for Lemma 1349

135 Development historyThe proof presented in this and the previous chapter is an signicant evolution of the original oneby Cai et al [2014] While this formalization and the mechanization are both original with thisthesis some ideas were suggested by other (currently unpublished) developments by Yufei Cai andby Yann Reacutegis-Gianas Yufei Cai gives a simpler pen-and-paper set-theoretic proof by separatingvalidity while we noticed separating validity works equally well in a mechanized type theory andsimplies the mechanization The rst to use a two-sided validity relation was Yann Reacutegis-Gianasbut using a big-step operational semantics while we were collaborating on an ILC correctness prooffor untyped λ-calculus (as in Appendix C) I gave the rst complete and mechanized ILC correctnessproof using two-sided validity again for a simply-typed λ-calculus with a denotational semanticsBased on two-sided validity I also reconstructed the rest of the theory of changes

136 Chapter conclusionIn this chapter we have seen how to dene change operators both on semantic values nad onterms and what are their guarantees via the notion of change structures We have also denedchange structures for groups function spaces and products Finally we have explained and shownCorollary 1345 We continue in next chapter by discussing how to reason syntactically aboutchanges before nally showing how to dene some language plugins

108 Chapter 13 Change structures

oplusτ nτ rarr ∆τ rarr τ oτ nτ rarr τ rarr ∆τ o

0ndash nτ rarr ∆τ oτ n∆τ rarr ∆τ rarr ∆τ o

f1 oplusσrarrτ df = λv rarr f1 v oplus df v 0vv1 oplusι dv = f2 σrarrτ f1 = λv dv rarr f2 (v oplus dv) f1 vv2 ι v1 = 0v = v τ vdv1 ι dv2 = df 1 σrarrτ df 2 = λv dv rarr df 1 v 0v df 2 v dv

(a) Change structure operations on types (see Denition 1342)oplusΓ n Γ rarr ∆Γ rarr Γ oΓ n Γ rarr Γ rarr ∆Γ o

0ndash n Γ rarr ∆Γ oΓ n∆Γ rarr ∆Γ rarr ∆Γ o

oplus =

(ρ x = v) oplus (dρ x = v prime dx = dv) = (ρ oplus dρ x = v oplus dv) =

(ρ2 x = v2) (ρ1 x = v1) = (ρ2 ρ1 x = v1 dx = v2 v1)0 =

0ρx=v = (0ρ x = v dx = 0v) = (dρ1 x = v1 dx = dv1) (dρ2 x = v2 dx = dv2) =(dρ1 dρ2 x = v1 dx = dv1 dv2)

(b) Change structure operations on environments (see Denition 1346)

Lemma 1344 (oplus agrees with validity)If dv B v1 rarr v2 τ then v1 oplus dv = v2

Corollary 1345 (D n ndasho is correct corollary)If Γ ` t τ and dρ B ρ1 rarr ρ2 Γ then n t o ρ1 oplus nD n t o o dρ = n t o ρ2

Figure 131 Dening change structures

Chapter 14

Equational reasoning on changes

In this chapter we formalize equational reasoning directly on terms rather than on semantic values(Sec 141) and we discuss when two changes can be considered equivalent (Sec 142) We also showas an example a simple change structure on lists and a derivative of map for it (Example 1411)

To reason on terms instead of describing the updated value of a term t by using an updatedenvironment ρ2 we substitute in t each variable xi with expression xi oplus dxi to produce a term thatcomputes the updated value of t so that we can say that dx is a change from x to x oplus dx or thatdf x dx is a change from f x to (f oplus df ) (x oplus dx) Lots of the work in this chapter is needed tomodify denitions and go from using an updated environment to using substitution in this fashion

To compare for equivalence terms that use changes we canrsquot use denotational equivalence butmust restrict to consider valid changes

Comparing changes is trickier most often we are not interested in whether two changes producethe same value but whether two changes have the same source and destination Hence if twochanges share source and destination we say they are equivalent As we show in this chapteroperations that preserve validity also respect change equivalence because for all those operationsthe source and destination of any output changes only depend on source and destination of inputchanges Among the same source and destination there often are multiple changes and the dierenceamong them can aect how long a derivative takes but not whether the result is correct

We also show that change equivalence is a particular form of logical relation a logical partialequivalence relation (PER) PERs are well-known to semanticists and often used to study sets withinvalid elements together with the appropriate equivalence on these sets

The rest of the chapter expands on the details even if they are not especially surprising

141 Reasoning on changes syntacticallyTo dene derivatives of primitives we will often discuss validity of changes directly on programsfor instance saying that dx is a valid change from x to x oplus dx or that f x oplus df x dx is equivalent tof (x oplus dx) if all changes in scope are valid

In this section we formalize these notions We have not mechanized proofs involving substitu-tions but we include them here even though they are not especially interesting

But rst we exemplify informally these notions

Example 1411 (Derivingmap on lists)Letrsquos consider again the example from Sec 1133 in particular dmap Recall that a list changedxs is valid for source xs if they have the same length and each element change is valid for its

109

110 Chapter 14 Equational reasoning on changes

corresponding element

map (ararr b) rarr List ararr List bmap f Nil = Nilmap f (Cons x xs) = Cons (f x) (map f xs)dmap (ararr b) rarr ∆(ararr b) rarr List ararr ∆List ararr ∆List b

-- A valid list change has the same length as the base listdmap f df Nil Nil = Nildmap f df (Cons x xs) (Cons dx dxs) =

Cons (df x dx) (dmap f df xs dxs)-- Remaining cases deal with invalid changes and a dummy-- result is sucient

dmap f df xs dxs = Nil

In our example one can show that dmap is a correct derivative for map As a consequenceterms map (f oplus df ) (xs oplus dxs) and map f xs oplus dmap f df xs dxs are interchangeable in all validcontexts that is contexts that bind df and dxs to valid changes respectively for f and xs

We sketch an informal proof directly on terms

Proof sketch We must show that dy = dmap f df xs dxs is a change change from initial outputy1 = map f xs to updated output y2 = map (f oplus df ) (xs oplus dxs) for valid inputs df and dxs

We proceed by structural induction on xs and dxs (technically on a proof of validity of dxs)Since dxs is valid those two lists have to be of the same length If xs and dxs are both emptyy1 = dy = y2 = Nil so dy is a valid change as required

For the inductive step both lists are Cons nodes so we need to show that output change

dy = dmap f df (Cons x xs) (Cons dx dxs)

is a valid change fromy1 = map f (Cons x xs)

toy2 = map (f oplus df ) (Cons (x oplus dx) (xs oplus dxs))

To restate validity we name heads and tails of dy y1 y2 If we write dy = Cons dh dt y1 =Cons h1 t1 and y2 = Cons h2 t2 we need to show that dh is a change from h1 to h2 and dt is a changefrom t1 to t2

Indeed head change dh = df x dx is a valid change from h1 = f x to h2 = f (x oplus dx) And tailchange dt = dmap f df xs dxs is a valid change from t1 = map f xs to t2 = map (f oplus df ) (xs oplus dxs)by induction Hence dy is a valid change from y1 to y2

Hopefully this proof is already convincing but it relies on undened concepts On a metalevelfunction we could already make this proof formal but not so on terms yet In this section wedene the required concepts

1411 Denotational equivalence for valid changesThis example uses the notion of denotational equivalence for valid changes We now proceed toformalize it For reference we recall denotational equivalence of terms and then introduce itsrestriction

Chapter 14 Equational reasoning on changes 111

Denition A25 (Denotational equivalence)We say that two terms Γ ` t1 τ and Γ ` t2 τ are denotationally equal and write Γ t1 t2 τ (orsometimes t1 t2) if for all environments ρ n Γ o we have that n t1 o ρ = n t2 o ρ

For open terms t1 t2 that depend on changes denotational equivalence is too restrictive since itrequires t1 and t2 to also be equal when the changes they depend on are not valid By restrictingdenotational equivalence to valid environment changes and terms to depend on contexts we obtainthe following denitionDenition 1412 (Denotational equivalence for valid changes)For any context Γ and type τ we say that two terms ∆Γ ` t1 τ and ∆Γ ` t2 τ are denotationallyequal for valid changes and write ∆Γ t1 ∆ t2 τ if for all valid environment changes dρ B ρ1 rarrρ2 Γ we have that t1 and t2 evaluate in environment dρ to the same value that is n t1 o dρ =n t2 o dρ

Example 1413Terms f x oplus df x dx and f (x oplus dx) are denotationally equal for valid changes (for any types σ τ )∆(f σ rarr τ x σ ) f x oplus df x dx ∆ f (x oplus dx) τ

Example 1414One of our claims in Example 1411 can now be written as

∆Γ map (f oplus df ) (xs oplus dxs) ∆ map f xs oplus dmap f df xs dxs List b

for a suitable context Γ = f List σ rarr List τ xs List σ map (σ rarr τ ) rarr List σ rarr List τ (andfor any types σ τ )

Arguably the need for a special equivalence is a defect in our semantics of change programs itmight be more preferable to make the type of changes abstract throughout the program (except forderivatives of primitives which must inspect derivatives) but this is not immediate especially in amodule system like Haskell Other possible alternatives are discussed in Sec 154

1412 Syntactic validityNext we dene syntactic validity that is when a change term dt is a (valid) change from sourceterm t1 to destination t2 Intuitively dt is valid from t1 to t2 if dt t1 and t2 evaluated all against thesame valid environment change dρ produce a valid change its source and destination FormallyDenition 1415 (Syntactic validity)We say that term ∆Γ ` dt ∆τ is a (syntactic) change from ∆Γ ` t1 τ to ∆Γ ` t2 τ and writeΓ |= dt I t1 rarr t2 τ if

foralldρ B ρ1 rarr ρ2 Γ n dt o dρ B n t1 o dρ rarr n t2 o dρ τ

Notation 1416We often simply say that dt is a change from t1 to t2 leaving everything else implicit when notimportant

Using syntactic validity we can show for instance that dx is a change from x to x oplus dx thatdf x dx is a change from f x to (f oplusdf ) (xoplusdx) both examples follow from a general statement aboutD n ndash o (Theorem 1419) Our earlier informal proof of the correctness of dmap (Example 1411)can also be justied in terms of syntactic validity

Just like (semantic) oplus agrees with validity term-level (or syntactic) oplus agrees with syntacticvalidity up to denotational equivalence for valid changes

112 Chapter 14 Equational reasoning on changes

Lemma 1417 (Term-level oplus agrees with syntactic validity)If dt is a change from t1 to t2 (Γ |= dt I t1 rarr t2 τ ) then t1 oplus dt and t2 are denotationally equal forvalid changes (∆Γ t1 oplus dt ∆ t2 τ )

Proof Follows because term-level oplus agrees with semantic oplus (Lemma 1349) and oplus agrees withvalidity In more detail x an arbitrary valid environment change dρ B ρ1 rarr ρ2 Γ First we haven dt o dρ B n t1 o dρ rarr n t2 o dρ τ because of syntactic validity Then we conclude with thiscalculation

n t1 oplus dt o dρ= term-level oplus agrees with oplus (Lemma 1349)

n t1 o dρ oplus n dt o dρ= oplus agrees with validity

n t2 o dρ

Beware that the denition of Γ |= dt I t1 rarr t2 τ evaluates all terms against changeenvironment dρ containing separately base values and changes In contrast if we use validity inthe change structure for n Γ orarr nτ o we evaluate dierent terms against dierent environmentsThat is why we have that dx is a change from x to x oplus dx (where x oplus dx is evaluated in environmentdρ) while n dx o is a valid change from n x o to n x o (where the destination n x o gets applied toenvironment ρ2)

Is syntactic validity trivial Without needing syntactic validity based on earlier chapters onecan show that dv is a valid change from v to v oplus dv or that df v dv is a valid change from f vto (f oplus df ) (v oplus dv) or further examples But thatrsquos all about values In this section we are justtranslating these notions to the level of terms and formalize them

Our semantics is arguably (intuitively) a trivial one similar to a metainterpreter interpretingobject-language functions in terms of metalanguage functions our semantics simply embeds anobject-level λ-calculus into a meta-level and more expressive λ-calculus mapping for instanceλf x rarr f x (an AST) into λf v rarr f v (syntax for a metalevel function) Hence proofs in thissection about syntactic validity deal mostly with this translation We donrsquot expect the proofs togive special insights and we expect most development would keep such issues informal (which iscertainly reasonable)

Nevertheless we write out the statements to help readers refer to them and write out (mostly)full proofs to help ourselves (and readers) verify them Proofs are mostly standard but with a fewtwists since we must often consider and relate three computations the computation on initialvalues and the ones on the change and on updated values

Wersquore also paying a proof debt Had we used substitution and small step semantics wersquod havedirectly simple statements on terms instead of trickier ones involving terms and environments Weproduce those statements now

Dierentiation and syntactic validity

We can also show that D n t o produces a syntactically valid change from t but we need to specifyits destination In general D n t o is not a change from t to t The destination must evaluate to theupdated value of t to produce a term that evaluates to the right value we use substitution If theonly free variable in t is x then D n t o is a syntactic change from t to t [x = x oplus dx ] To repeat thesame for all variables in context Γ we use the following notation

Chapter 14 Equational reasoning on changes 113

Notation 1418We write t [Γ = Γ oplus ∆Γ ] to mean t [x1 = x1 oplus dx1 x2 = x2 oplus dx2 xn = xn oplus dxn ]

Theorem 1419 (D n ndasho is correct syntactically)For any well-typed term Γ ` t τ term D n t o is a syntactic change from t to t [Γ = Γ oplus ∆Γ ]

We present the following straightforward (if tedious) proof (formal but not mechanized)

Proof Let t2 = t [Γ = Γ oplus ∆Γ ] Take any dρ B ρ1 rarr ρ2 Γ We must show that nD n t o o dρ Bn t o dρ rarr n t2 o dρ τ

Because dρ extend ρ1 and t only needs entries in ρ1 we can show that n t o dρ = n t o ρ1 soour thesis becomes nD n t o o dρ B n t o ρ1 rarr n t2 o dρ τ

Because D n ndash o is correct (Theorem 1222) we know that nD n t o o dρ B n t o ρ1 rarr n t o ρ2 τ thatrsquos almost our thesis so we must only show that n t2 o dρ = n t o ρ2 Since oplus agrees with validityand dρ is valid we have that ρ2 = ρ1 oplus dρ so our thesis is now the following equation which weleave to Lemma 14110

n t o [Γ = Γ oplus ∆Γ ] dρ = n t o (ρ1 oplus dρ)

Herersquos the technical lemma to complete the proof

Lemma 14110For any Γ ` t τ and dρ B ρ1 rarr ρ2 Γ we have

n t o [Γ = Γ oplus ∆Γ ] dρ = n t o (ρ1 oplus dρ)

Proof This follows from the structure of valid environment changes because term-level oplus (used onthe left-hand side) agrees with value-level oplus (used on the right-hand side) by Lemma 1349 andbecause of the substitution lemma

More formally we can show the thesis by induction over environments for empty environmentsthe equations reduces to n t o = n t o For the case of Γ x σ (where x lt Γ) the thesis can berewritten as

n (t [Γ = Γ oplus ∆Γ ]) [x = x oplus dx ] o (dρ x = v1 dx = dv) = n t o (ρ1 oplus dρ x = v1 oplus dv)

We prove it via the following calculation

n (t [Γ = Γ oplus ∆Γ ]) [x = x oplus dx ] o (dρ x = v1 dx = dv)= Substitution lemma on x

n (t [Γ = Γ oplus ∆Γ ]) o (dρ x = (n x oplus dx o (dρ x = v1 dx = dv)) dx = dv)= Term-level oplus agrees with oplus (Lemma 1349)

Then simplify n x o and n dx o n (t [Γ = Γ oplus ∆Γ ]) o (dρ x = v1 oplus dv dx = dv)

= t [Γ = Γ oplus ∆Γ ] does not mention dx So we can modify the environment entry for dx This makes the environment into a valid environment change

n (t [Γ = Γ oplus ∆Γ ]) o (dρ x = v1 oplus dv dx = 0v1oplusdv)= By applying the induction hypothesis and simplifying 0 away

n t o (ρ1 oplus dρ x = v1 oplus dv)

114 Chapter 14 Equational reasoning on changes

142 Change equivalenceTo optimize programs manipulate changes we often want to replace a change-producing term byanother one while preserving the overall program meaning Hence we dene an equivalence onvalid changes that is preserved by change operations that is (in spirit) a congruence We call thisrelation (change) equivalence and refrain from using other equivalences on changes

Earlier (say in Sec 153) we have sometimes required that changes be equal but thatrsquos often toorestrictive

Change equivalence is dened in terms of validity to ensure that validity-preserving operationspreserve change equivalence If two changes dv1 and dv2 are equivalent one can be substituted forthe other in a validity-preserving context We can dene this once for all change structures hencein particular for values and environmentsDenition 1421 (Change equivalence)Given a change structure V changes dva dvb ∆V are equivalent relative to source v1 V (writtendva =∆ dvb B v1 rarr v2 V ) if and only if there exists v2 such that both dva and dvb are valid fromv1 to v2 (that is dva B v1 rarr v2 V dvb B v1 rarr v2 V )

Notation 1422When not ambiguous we often abbreviate dva =∆ dvb B v1 rarr v2 V as dva =v1∆ dvb or dva =∆ dvb

Two changes are often equivalent relative to a source but not others Hence dva =∆ dvb isalways an abbreviation for change equivalence for a specic source

Example 1423For instance we later use a change structure for integers using both replacement changes anddierences (Example 1216) In this structure change 0 is nil for all numbers while change 5 (ldquobang5rdquo) replaces any number with 5 Hence changes 0 and 5 are equivalent only relative to source 5and we write 0 =5∆5

By applying denitions one can verify that change equivalence relative to a source v is asymmetric and transitive relation on ∆V However it is not an equivalence relation on ∆V becauseit is only reexive on changes valid for source v Using the set-theoretic concept of subset we canthen state the following lemma (whose proof we omit as it is brief)Lemma 1424 (=∆ is an equivalence on valid changes)For each set V and source v isin V change equivalence relative to source v is an equivalence relationover the set of changes

dv isin ∆V | dv is valid with source v

We elaborate on this peculiar sort of equivalence in Sec 1423

1421 Preserving change equivalenceChange equivalence relative to a source v is respected in an appropriate sense by all validity-preserving expression contexts that accept changes with source v To explain what this means westudy an example lemma we show that because valid function changes preserve validity they alsorespect change equivalence At rst we use ldquo(expression) contextrdquo informally to refer to expressioncontexts in the metalanguage Later wersquoll extend our discussion to actual expression contexts inthe object languageLemma 1425 (Valid function changes respect change equivalence)Any valid function change

df B f1 rarr f2 Ararr B

Chapter 14 Equational reasoning on changes 115

respects change equivalence if dva =∆ dvb B v1 rarr v2 A then df v1 dva =∆ df v1 dvb B f1 v1 rarrf2 v2 B We also say that (expression) context df v1 ndash respects change equivalence

Proof The thesis means that df v1 dva B f1 v1 rarr f2 v2 B and df v1 dvb B f1 v1 rarr f2 v2 B Bothequivalences follow in one step from validity of df dva and dvb

This lemma holds because the source and destination of df v1 dv donrsquot depend on dv onlyon its source and destination Source and destination are shared by equivalent changes Hencevalidity-preserving functions map equivalent changes to equivalent changes

In general all operations that preserve validity also respect change equivalence because for allthose operations the source and destination of any output changes and the resulting value onlydepend on source and destination of input changes

However Lemma 1425 does not mean that df v1 dva = df v1 dvb because there can bemultiple changes with the same source and destination For instance say that dva is a list changethat removes an element and readds it and dvb is a list change that describes no modication Theyare both nil changes but a function change might handle them dierently

Moreover we only proved that context df v1 ndash respects change equivalence relative to sourcev1 If value v3 diers from v1 df v3 dva and df v3 dvb need not be equivalent Hence we saythat context df v1 accepts changes with source v1 More in general a context accepts changes withsource v1 if it preserves validity for changes with source v1 we can say informally that all suchcontexts also respect change equivalence

Another example context v1oplusndash also accepts changes with source v1 Since this context producesa base value and not a change it maps equivalent changes to equal resultsLemma 1426 (oplus respects change equivalence)If dva =∆ dvb B v1 rarr v2 V then v1 oplus ndash respects the equivalence between dva and dvb that isv1 oplus dva = v1 oplus dvb

Proof v1 oplus dva = v2 = v1 oplus dvb

There are more contexts that preserve equivalence As discussed function changes preserve con-texts and D n ndash o produces functions changes so D n t o preserves equivalence on its environmentand on any of its free variablesLemma 1427 (D n ndasho preserves change equivalence)For any term Γ ` t τ D n t o preserves change equivalence of environments for all dρa =∆ dρb Bρ1 rarr ρ2 Γ we have nD n t o o dρa =∆ nD n t o o dρb B n t o ρ1 rarr n t o ρ2 Γ rarr τ

Proof To verify this just apply correctness of dierentiation to both changes dρa and dρb

To show more formally in what sense change equivalence is a congruence we rst lift changeequivalence to terms (Denition 1429) similarly to syntactic change validity in Sec 1412 To doso we rst need a notation for one-sided or source-only validityNotation 1428 (One-sided validity)We write dv B v1 V to mean there exists v2 such that dv B v1 rarr v2 V We will reuseexisting conventions and write dv B v1 τ instead of dv B v1 nτ o and dρ B ρ1 Γ instead ofdρ B ρ1 n Γ o

Denition 1429 (Syntactic change equivalence)Two change terms ∆Γ ` dta ∆τ and ∆Γ ` dtb ∆τ are change equivalent relative to sourceΓ ` t τ if for all valid environment changes dρ B ρ1 rarr ρ2 Γ we have that

n dta o dρ =∆ n dtb o dρ B n t o ρ1 τ

We write then Γ |= dta =∆ dtb I t τ or dta =t∆ dtb or simply dta =∆ dtb

116 Chapter 14 Equational reasoning on changes

Saying that dta and dtb are equivalent relative to t does not specify the destination of dta and dtb only their source The only reason is to simplify the statement and proof of Theorem 14210

If two change terms are change equivalent with respect to the right source we can replace onefor the other in an expression context to optimize a program as long as the expression context isvalidity-preserving and accepts the change

In particular substituting into D n t o preserves syntactic change equivalence according to thefollowing theorem (for which we have only a pen-and-paper formal proof)

Theorem 14210 (D n ndasho preserves syntactic change equivalence)For any equivalent changes Γ |= s I dsa =∆ dsb σ and for any term t typed as Γ x σ ` t τ wecan produce equivalent results by substituting into D n t o either s and dsa or s and dsb

Γ |= D n t o [x = s dx = dsa ] =∆ D n t o [x = s dx = dsb ] I t [x = s ] τ

Proof sketch The thesis holds because D n ndash o preserves change equivalence Lemma 1427 A formalproof follows through routine (and tedious) manipulations of bindings In essence we can extend achange environment dρ for context Γ to equivalent environment changes for context Γ x σ withthe values of dsa and dsb The tedious calculations follow

Proof Assume dρ B ρ1 rarr ρ2 Γ Because dsa and dsb are change-equivalent we have

n dsa o dρ =∆ n dsb o dρ B n s o ρ1 σ

Moreover n s o ρ1 = n s o dρ because dρ extends ρ1 Wersquoll use this equality without explicitmention

Hence we can construct change-equivalent environments for evaluating D n t o by combiningdρ and the values of respectively dsa and dsb

(dρ x = n s o dρ dx = n dsa o dρ) =∆

(dρ x = n s o dρ dx = n dsb o dρ)B (ρ1 x = n s o ρ1) (Γ x σ ) (141)

This environment change equivalence is respected by D n t o hence

nD n t o o (dρ x = n s o dρ dx = n dsa o dρ) =∆

nD n t o o (dρ x = n s o dρ dx = n dsb o dρ)B n t o (ρ1 x = n s o ρ1) Γ rarr τ (142)

We want to deduce the thesis by applying to this statement the substitution lemma for denotationalsemantics n t o (ρ x = n s o ρ) = n t [x = s ] o ρ

To apply the substitution lemma to the substitution of dx we must adjust Eq (142) usingsoundness of weakening We get

nD n t o o (dρ x = n s o dρ dx = n dsa o (dρ x = n s o dρ)) =∆

nD n t o o (dρ x = n s o dρ dx = n dsb o (dρ x = n s o dρ))B n t o (ρ1 x = n s o ρ1) Γ rarr τ (143)

Chapter 14 Equational reasoning on changes 117

This equation can now be rewritten (by applying the substitution lemma to the substitutions ofdx and x) to the following one

n (D n t o [dx = dsa ] [x = s ]) o dρ =∆

n (D n t o [dx = dsb ] [x = s ]) o dρB n t [x = s ] o ρ1 Γ rarr τ (144)

Since x is not in scope in s dsa dsb we can permute substitutions to conclude that

Γ |= D n t o [x = s dx = dsa ] =∆ D n t o [x = s dx = dsb ] I t [x = s ] τ

as required

In this theorem if x appears once in t then dx appears once in D n t o (this follows by induction ont) hence D n t o [x = s dx = ndash] produces a one-hole expression context

Further validity-preserving contexts There are further operations that preserve validity Torepresent terms with ldquoholesrdquo where other terms can be inserted we can dene one-level contexts F and contexts E as is commonly done

F = [ ] t dt | ds t [ ] | λx dx rarr [ ] | t oplus [ ] | dt1 [ ] | [ ] dt2E = [ ] | F [E ]

If dt1 =∆ dt2 B t1 rarr t2 τ and our context E accepts changes from t1 then F [dt1 ] and F [dt2 ] arechange equivalent It is easy to prove such a lemma for each possible shape of one-level context F both on values (like Lemmas 1425 and 1426) and on terms We have been unable to state a moregeneral theorem because itrsquos not clear how to formalize the notion of a context accepting a changein general the syntax of a context does not always hint at the validity proofs embedded

Cai et al [2014] solve this problem for metalevel contexts by typing them with dependenttypes but as discussed the overall proof is more awkward Alternatively it appears that the use ofdependent types in Chapter 18 also ensures that change equivalence is a congruence (though atpresent this is still a conjecture) without overly complicating correctness proofs However it is notclear whether such a type system can be expressive enough without requiring additional coercionsConsider a change dv1 from v1 to v1 oplus dv1 a value v2 which is known to be (propositionally) equalto v1 oplus dv1 and a change dv2 from v2 to v3 Then term dv1 dv2 is not type correct (for instancein Agda) the typechecker will complain that dv1 has destination v1 oplus dv1 while dv2 has source v2When working in Agda to solve this problem we can explicitly coerce terms through propositionalequalities and can use Agda to prove such equalities in the rst place We leave the design of asuciently expressive object language where change equivalence is a congruence for future work

1422 Sketching an alternative syntaxIf we exclude composition we can sketch an alternative syntax which helps construct a congruenceon changes The idea is to manipulate instead of changes alone pairs of sources v isin V and validchanges dv | dv B v V

t = src dt | dst dt | x | t t | λx rarr tdt = dt dt | src dt | λdx rarr dt | dx

Adapting dierentiation to this syntax is easy

118 Chapter 14 Equational reasoning on changes

D n x o = dxD n s t o = D n s o D n t oD n λx rarr t o = D n λdx rarr dt o

Derivatives of primitives only need to use dst dt instead of t oplus dt and src dt instead of t wheneverdt is a change for t

With this syntax we can dene change expression contexts which can be lled in by changeexpressions

E = src dE | dst dE | E t | t E | λx rarr EdE = dt dE | dE dt | λdx rarr dE | [ ]

We conjecture change equivalence is a congruence with respect to contexts dE and that contextsE map change-equivalent changes to results that are denotationally equivalent for valid changes Weleave a proof to future work but we expect it to be straightforward It also appears straightforwardto provide an isomorphism between this syntax and the standard one

However extending such contexts with composition does not appear trivial contexts such asdt dE or dE dt only respect validity when the changes sources and destinations align correctly

We make no further use of this alternative syntax in this work

1423 Change equivalence is a PERReaders with relevant experience will recognize that change equivalence is a partial equivalencerelation (PER) [Mitchell 1996 Ch 5] It is standard to use PERs to identify valid elements in amodel [Harper 1992] In this section we state the connection showing that change equivalence isnot an ad-hoc construction so that mathematical constructions using PERs can be adapted to usechange equivalence

We recall the denition of a PERDenition 14211 (Partial equivalence relation (PER))A relation R sube StimesS is a partial equivalence relation if it is symmetric (if aRb then bRa) and transitive(if aRb and bRc then aRc)

Elements related to another are also related to themselves If aRb then aRa (by transitivity aRbbRa hence aRa) So a PER on S identies a subset of valid elements of S Since PERs are equivalencerelations on that subset they also induce a (partial) partition of elements of S into equivalenceclasses of change-equivalent elements

Lemma 14212 (=∆ is a PER)Change equivalence relative to a source a A is a PER on set ∆A

Proof A restatement of Lemma 1424

Typically one studies logical PERs which are logical relations and PERs at the same time [Mitchell1996 Ch 8] In particular with a logical PER two functions are related if they map related inputs torelated outputs This helps showing that a PERs is a congruence Luckily our PER is equivalent tosuch a denitionLemma 14213 (Alternative denition for =∆)Change equivalence is equivalent to the following logical relation

dva =∆ dvb B v1 rarr v2 ι =defdva B v1 rarr v2 ι and dva B v1 rarr v2 ι

Chapter 14 Equational reasoning on changes 119

df a =∆ df b B f1 rarr f2 σ rarr τ =defforalldva =∆ dvb B v1 rarr v2 σ df a v1 dva =∆ df b v2 dvb B f1 v1 rarr f2 v2 τ

Proof By induction on types

143 Chapter conclusionIn this chapter we have put on a more solid foundation forms of reasoning about changes on termsand dened an appropriate equivalence on changes

120 Chapter 14 Equational reasoning on changes

Chapter 15

Extensions and theoreticaldiscussion

In this chapter we collect discussion of a few additional topics related to ILC that do not sucefor standalone chapters We show how to dierentiation general recursion Sec 151 we exhibita function change that is not valid for any function (Sec 152) we contrast our representation offunction changes with pointwise function changes (Sec 153) and we compare our formalizationwith the one presented in [Cai et al 2014] (Sec 154)

151 General recursionThis section discusses informally how to dierentiate terms using general recursion and what is thebehavior of the resulting terms

1511 Dierentiating general recursionEarlier we gave a rule for deriving (non-recursive) let

D n let x = t1 in t2 o = let x = t1dx = D n t1 o

in D n t2 oIt turns out that we can use the same rule also for recursive let-bindings which we write here (andonly here) letrec for distinction

D n letrec x = t1 in t2 o = letrec x = t1dx = D n t1 o

in D n t2 oExample 1511In Example 1411 we presented a derivative dmap for map By using rules for dierentiatingrecursive functions we obtain dmap as presented from map

dmap f df Nil Nil = Nildmap f df (Cons x xs) (Cons dx dxs) =Cons (df x dx) (dmap f df xs dxs)

121

122 Chapter 15 Extensions and theoretical discussion

However derivative dmap is not asymptotically faster than map Even when we consider lesstrivial change structures derivatives of recursive functions produced using this rule are often notasymptotically faster Deriving letrec x = t1 in t2 can still be useful if D n t1 o andor D n t2 ois faster than its base term but during our work we focus mostly on using structural recursionAlternatively in Chapter 11 and Sec 112 we have shown how to incrementalize functions (includingrecursive ones) using equational reasoning

In general when we invoke dmap on a change dxs from xs1 to xs2 it is important that xs1 and xs2are similar enough that enough computation can be reused Say that xs1 = Cons 2 (Cons 3 (Cons 4 Nil))and xs2 = Cons 1 (Cons 2 (Cons 3 (Cons 4 Nil))) in this case a change modifying each elementof xs1 and then replacing Nil by Cons 4 Nil would be inecient to process and naive incremen-talization would produce this scenario In this case it is clear that a preferable change shouldsimply insert 1 at the beginning of the list as illustrated in Sec 1132 (though we have omitted thestraightforward denition of dmap for such a change structure) In approaches like self-adjustingcomputation this is ensured by using memoization In our approach instead we rely on changesthat are nil or small to detect when a derivative can reuse input computation

The same problem aects naive attempts to incrementalize for instance a simple factorialfunction we omit details Because of these issues we focus on incrementalization of structurally re-cursive functions and on incrementalizing generally recursive primitives using equational reasoningWe return to this issue in Chapter 20

1512 JusticationHere we justify informally the rule for dierentiating recursive functions using xpoint operators

Letrsquos consider STLC extended with a family of standard xpoint combinators xτ (τ rarr τ ) rarr τ with x-reduction dened by equation x f rarr f (x f ) we search for a denition of D nx f o

Using informal equational reasoning if a correct denition of D nx f o exists it must satisfy

D nx f o x (D n f o (x f ))

We can proceed as follows

D nx f o= imposing that D n ndash o respects x-reduction here

D n f (x f ) o= using rules for D n ndash o on application

D n f o (x f ) D nx f o

This is a recursive equation in D nx f o so we can try to solve it using x itself

D nx f o = x (λdxf rarr D n f o (x f ) dxf )

Indeed this rule gives a correct derivative Formalizing our reasoning using denotationalsemantics would presumably require the use of domain theory Instead we prove correct a variantof x in Appendix C but using operational semantics

152 Completely invalid changesIn some change sets some changes might not be valid relative to any source In particular we canconstruct examples in ∆(Zrarr Z)

Chapter 15 Extensions and theoretical discussion 123

To understand why this is plausible we recall that as described in Sec 153 df can be decomposedinto a derivative and a pointwise function change that is independent of da While pointwisechanges can be dened arbitrarily the behavior of the derivative of f on changes is determined bythe behavior of f

Example 1521We search for a function change df ∆(Zrarr Z) such that there exist no f1 f2 Zrarr Z for whichdf B f1 rarr f2 Zrarr Z To nd df we assume that there are f1 f2 such that df B f1 rarr f2 Zrarr Zprove a few consequences and construct df that cannot satisfy them Alternatively we could pickthe desired denition for df right away and prove by contradiction that there exist no f1 f2 suchthat df B f1 rarr f2 Zrarr Z

Recall that on integers a1oplusda = a1+da and that da B a1 rarr a2 Zmeans a2 = a1oplusda = a1+daSo for any numbers a1 da a2 such that a1 + da = a2 validity of df implies that

f2 (a1 + da) = f1 a1 + df a1 da

For any two numbers b1 db such that b1 + db = a1 + da we have that

f1 a1 + df a1 da = f2 (a1 + da) = f2 (b1 + db) = f1 b1 + df b1 db

Rearranging terms we have

df a1 da minus df b1 db = f1 b1 minus f1 a1

that is df a1 da minus df b1 db does not depend on da and dbFor concreteness let us x a1 = 0 b1 = 1 and a1 + da = b1 + db = s We have then that

df 0 s minus df 1 (s minus 1) = f1 1 minus f1 0

Once we set h = f1 1 minus f1 0 we have df 0 s minus df 1 (s minus 1) = h Because s is just the sum of twoarbitrary numbers while h only depends on f1 this equation must hold for a xed h and for allintegers s

To sum up we assumed for a given df there exists f1 f2 such that df B f1 rarr f2 Zrarr Z andconcluded that there exists h = f1 1 minus f1 0 such that for all s

df 0 s minus df 1 (s minus 1) = h

At this point we can try concrete families of functions df to obtain a contradiction Substitutinga linear polynomial df a da = c1 middot a + c2 middot da fails to obtain a contradiction in fact we can constructvarious f1 f2 such that df B f1 rarr f2 Z rarr Z So we try quadratic polynomials Substitutingdf a da = c middot da2 succeeds we have that there is h such that for all integers s

c middot(s2 minus (s minus 1)2

)= h

However c middot(s2 minus (s minus 1)2

)= 2 middot c middot s minus c which isnrsquot constant so there can be no such h

153 Pointwise function changesWe can also decompose function changes into orthogonal (and possibly easier to understand)concepts

Consider two functions f1 f2 Ararr B and two inputs a1 a2 A The dierence between f2 a2and f1 a1 is due to changes to both the function and its argument We can compute the whole

124 Chapter 15 Extensions and theoretical discussion

change at once via a function change df as df a1 da Or we can compute separately the eects ofthe function change and of the argument change We can account for changes from f1 a1 to f2 a2using f prime1 a derivative of f1 f prime1 a1 da = f1 a2 f1 a2 = f1 (a1 oplus da) f a11

We can account for changes from f1 to f2 using the pointwise dierence of two functions nablaf1 =λ(a A) rarr f2 a f1 a in particular f2 (a1 oplus da) f1 (a1 oplus da) = nablaf (a1 oplus da) Hence a functionchange simply combines a derivative with a pointwise change using change composition

df a1 da = f2 a2 f1 a1= (f1 a2 f1 a1) (f2 a2 f1 a2)= f prime1 a1 da nablaf (a1 oplus da)

(151)

One can also compute a pointwise change from a function change

nabla f a = df a 0a

While some might nd pointwise changes a more natural concept we nd it easier to use ourdenitions of function changes which combines both pointwise changes and derivatives into asingle concept Some related works explore the use of pointwise changes we discuss them inSec 1922

154 Modeling only valid changesIn this section we contrast briey the formalization of ILC in this thesis (for short ILCrsquo17) with theone we used in our rst formalization [Cai et al 2014] (for short ILCrsquo14) We keep the discussionsomewhat informal we have sketched proofs of our claims and mechanized some but we omit allproofs here We discuss both formalizations using our current notation and terminology except forconcepts that are not present here

Both formalizations model function changes semantically but the two models we present aredierent Overall ILCrsquo17 uses simpler machinery and seems easier to extend to more general baselanguages and its mechanization of ILCrsquo17 appears simpler and smaller Instead ILCrsquo14 studiesadditional entities but better behaved entities

In ILCrsquo17 input and output domains of function changes contain invalid changes while inILCrsquo14 these domains are restricted to valid changes via dependent types ILCrsquo14 also considers thedenotation of D n t o whose domains include invalid changes but such denotations are studied onlyindirectly In both cases function changes must map valid changes to valid changes But ILCrsquo14application df v1 dv is only well-typed is dv is a change valid from v1 hence we can simply say thatdf v1 respects change equivalence As discussed in Sec 142 in ILCrsquo17 the analogous property hasa trickier statement we can write df v1 and apply it to arbitrary equivalent changes dv1 =∆ dv2even if their source is not v1 but such change equivalences are not preserved

We can relate the two models by defining a logical relation called erasure (similar to the one described by Cai et al.): an ILC'14 function change df erases to an ILC'17 function change df′

relative to source f : A → B if, given any change da that erases to da′ relative to source a1 : A, output change df a1 da erases to df′ a1 da′ relative to source f a1. For base types, erasure simply connects corresponding da (with source) with da′, in a manner dependent on the base type (often just throwing away any embedded proofs of validity). In all cases, one can show that dv erases to dv′ with source v1 if and only if v1 ⊕ dv = v1 ⊕ dv′ (for suitable variants of ⊕); in other words, dv and dv′ share source and destination (technically, ILC'17 changes have no fixed source, so we say that they are changes from v1 to v2 for some v2).

¹For simplicity we use equality on changes, even though equality is too restrictive. Later (in Sec. 14.2) we'll define an equivalence relation on changes, called change equivalence and written =∆, and use it systematically to relate changes in place of equality. For instance, we'll write that f′1 a1 da =∆ f1 (a1 ⊕ da) ⊖ f1 a1. But for the present discussion, equality will do.

In ILC'14, there is a different incremental semantics ⟦t⟧∆ for terms t, but it is still a valid ILC'14 change. One can show that ⟦t⟧∆ (as defined in ILC'14) erases to ⟦D⟦t⟧⟧ (as defined in ILC'17) relative to source ⟦t⟧; in fact, the needed proof is sketched by Cai et al., though in disguise.

It seems clear there is no isomorphism between ILC'14 changes and ILC'17 changes. An ILC'17 function change also accepts invalid changes, and the behavior on those changes can't be preserved by an isomorphism. Worse, it even seems hard to define a non-isomorphic mapping from an ILC'14 change df to an ILC'17 change erase df: we would have to define the behavior of (erase df) a da even when da is invalid. As long as we work in a constructive setting, we cannot decide whether da is valid in general, because da can be a function change with infinite domain.

We can give, however, a definition that does not need to detect such invalid changes: just extract source and destination from a function change using the valid change 0v, and take the difference of source and destination using ⊖ in the target system:

unerase (σ → τ) df′ = let f = λv → df′ v 0v in (f ⊕ df′) ⊖ f
unerase _ dv′ = …
erase (σ → τ) df = let f = λv → df v 0v in (f ⊕ df) ⊖ f
erase _ dv = …

We define these functions by induction on types (for elements of ∆τ, not arbitrary change structures), and we overload ⊕ and ⊖ for ILC'14 and ILC'17. We conjecture that, for all types τ and for all ILC'17 changes dv′ (of the right type), unerase τ dv′ erases to dv′, and that, for all ILC'14 changes dv, dv erases to erase τ dv.

Erasure is a well-behaved logical relation, similar to the ones relating source and destination language of a compiler, and to partial equivalence relations. In particular, it also induces partial equivalence relations (PERs) (see Sec. 14.2.3), both on ILC'14 changes and on ILC'17 changes: two ILC'14 changes are equivalent if they erase to the same ILC'17 change, and two ILC'17 changes are equivalent if the same ILC'14 change erases to both. Both relations are partial equivalence relations (and total on valid changes). Because changes that erase to each other share source and destination, these induced equivalences coincide again with change equivalence. That both relations are PERs also means that erasure is a so-called quasi-PER [Krishnaswami and Dreyer, 2013]. Quasi-PERs are a natural (though not obvious) generalization of PERs for relations among different sets, R ⊆ S1 × S2; such relations cannot be symmetric or transitive. However, we make no use of additional properties of quasi-PERs, hence we don't discuss them in further detail.

15.4.1 One-sided vs. two-sided validity

There are also further superficial differences between the two definitions. In ILC'14, changes valid with source a have dependent type ∆a. This dependent type is indexed by the source but not by the destination. Dependent function changes with source f : A → B have type (a : A) → ∆a → ∆(f a), relating the behavior of function change df with the behavior of f on original inputs. But this is only half of function validity: to relate the behavior of df with the behavior of f on updated inputs, in ILC'14 valid function changes have to satisfy an additional equation, called preservation of future²:

f1 a1 ⊕ df a1 da = (f1 ⊕ df) (a1 ⊕ da)

²Name suggested by Yufei Cai.


This equation appears inelegant, and mechanized proofs were often complicated by the need to perform rewritings using it. Worse, to show that a function change is valid, we have to use different approaches to prove it has the correct source and the correct destination.

This difference is, however, superficial. If we replace f1 ⊕ df with f2 and a1 ⊕ da with a2, this equation becomes f1 a1 ⊕ df a1 da = f2 a2, a consequence of df ▷ f1 ↪ f2. So one might suspect that ILC'17 valid function changes also satisfy this equation. This is indeed the case:

Lemma 15.4.1
A valid function change df ▷ f1 ↪ f2 : A → B satisfies the equation

f1 a1 ⊕ df a1 da = (f1 ⊕ df) (a1 ⊕ da)

for any valid input change da ▷ a1 ↪ a2 : A.
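A one-line proof sketch (our gloss, using only the definitions above): validity of df and da gives df a1 da ▷ f1 a1 ↪ f2 a2 : B, while f2 = f1 ⊕ df and a2 = a1 ⊕ da, hence

f1 a1 ⊕ df a1 da = f2 a2 = (f1 ⊕ df) (a1 ⊕ da).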

Conversely, one can also show that ILC'14 function changes satisfy two-sided validity as defined in ILC'17. Hence, the only true difference between the ILC'14 and ILC'17 models is the one we discussed earlier: namely, whether function changes can be applied to invalid inputs or not.

We believe it could be possible to formalize the ILC'14 model using two-sided validity, by defining a dependent type of valid changes: ∆₂ (A → B) f1 f2 = (a1 a2 : A) → ∆₂ A a1 a2 → ∆₂ B (f1 a1) (f2 a2). We provide more details on such a transformation in Chapter 18.

Models restricted to valid changes (like ILC'14) are related to models based on directed graphs and reflexive graphs, where values are graph vertices and changes are edges between change source and change destination (as hinted earlier). In graph language, validity preservation means that function changes are graph homomorphisms.

Based on similar insights, Atkey [2015] suggests modeling ILC using reflexive graphs, which have been used to construct parametric models for System F and extensions, and calls for research on the relation between ILC and parametricity. As follow-up work, Cai [2017] studies models of ILC based on directed and reflexive graphs.

Chapter 16

Differentiation in practice

In practice, successful incrementalization requires both correctness and performance of the derivatives. Correctness of derivatives is guaranteed by the theoretical development in the previous sections, together with the interface for differentiation and proof plugins, whereas performance of derivatives has to come from careful design and implementation of differentiation plugins.

16.1 The role of differentiation plugins

Users of our approach need to (1) choose which base types and primitives they need, (2) implement suitable differentiation plugins for these base types and primitives, (3) rewrite (relevant parts of) their programs in terms of these primitives, and (4) arrange for their program to be called on changes instead of updated inputs.

As discussed in Sec. 10.5, differentiation supports abstraction, application and variables, but since computation on base types is performed by primitives for those types, efficient derivatives for primitives are essential for good performance.

To make such derivatives efficient, change types must also have efficient implementations, and allow describing precisely what changed. The efficient derivative of sum in Chapter 10 is possible only if bag changes can describe deletions and insertions, and integer changes can describe additive differences.

For many conceivable base types, we do not have to design the differentiation plugins from scratch. Instead, we can reuse the large body of existing research on incrementalization in first-order and domain-specific settings. For instance, we reuse the approach from Gluche et al. [1997] to support incremental bags and maps. By wrapping a domain-specific incrementalization result in a differentiation plugin, we adapt it to be usable in the context of a higher-order and general-purpose programming language, and in interaction with other differentiation plugins for the other base types of that language.

For base types with no known incrementalization strategy, the precise interfaces for differentiation and proof plugins can guide the implementation effort. These interfaces could also form the basis for a library of differentiation plugins that work well together.

Rewriting whole programs in our language would be an excessive requirement. Instead, we embed our object language as an EDSL in some more expressive meta-language (Scala in our case study), so that embedded programs are reified. The embedded language can be made to resemble the metalanguage [Rompf and Odersky, 2010]. To incrementalize a part of a computation, we write it in our embedded object language, invoke D on the embedded program, optionally optimize the resulting programs, and finally invoke them. The metalanguage also acts as a macro system for the


histogram :: Map Int (Bag word) → Map word Int
histogram = mapReduce groupOnBags additiveGroupOnIntegers histogramMap histogramReduce
  where additiveGroupOnIntegers = Group (+) (λn → −n) 0
        histogramMap = foldBag groupOnBags (λn → singletonBag (n, 1))
        histogramReduce = foldBag additiveGroupOnIntegers id

-- Precondition:
-- for every key1 :: k1 and key2 :: k2, the terms mapper key1 and reducer key2 are homomorphisms.
mapReduce :: Group v1 → Group v3 → (k1 → v1 → Bag (k2, v2)) → (k2 → Bag v2 → v3) → Map k1 v1 → Map k2 v3
mapReduce group1 group3 mapper reducer = reducePerKey ∘ groupByKey ∘ mapPerKey
  where mapPerKey = foldMap group1 groupOnBags mapper
        groupByKey = foldBag (groupOnMaps groupOnBags) (λ(key, val) → singletonMap key (singletonBag val))
        reducePerKey = foldMap groupOnBags (groupOnMaps group3) (λkey bag → singletonMap key (reducer key bag))

Figure 16.1: The λ-term histogram with Haskell-like syntactic sugar. additiveGroupOnIntegers is the abelian group induced on integers by addition, (Z, +, 0, −).

object language, as usual. This allows us to simulate polymorphic collections such as (Bag ι), even though the object language is simply-typed; technically, our plugin exposes a family of base types to the object language.

16.2 Predicting nil changes

Handling changes to all inputs can induce excessive overhead in incremental programs [Acar, 2009]. It is also often unnecessary: for instance, the function argument of fold in Chapter 10 does not change, since it is a closed subterm of the program, so fold will receive a nil change for it. A (conservative) static analysis can detect changes that are guaranteed to be nil at runtime. We can then specialize derivatives that receive this change, so that they need not inspect the change at runtime.

For our case study, we have implemented a simple static analysis which detects and propagates information about closed terms. The analysis is not interesting, and we omit details for lack of space; a sketch of the core idea follows.
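As an illustration only (our sketch, not the implementation from the case study), such an analysis essentially computes free variables: a subterm with no free variables cannot observe any input change, so its change is guaranteed to be nil.

import Data.Set (Set)
import qualified Data.Set as Set

-- A tiny expression AST; constructor names are illustrative.
data Term = Var String | Lam String Term | App Term Term

-- Free variables of a term.
freeVars :: Term -> Set String
freeVars (Var x)   = Set.singleton x
freeVars (Lam x b) = Set.delete x (freeVars b)
freeVars (App f a) = Set.union (freeVars f) (freeVars a)

-- A closed subterm receives a nil change at runtime, so derivatives
-- specialized on it need not inspect the change.
isClosed :: Term -> Bool
isClosed = Set.null . freeVars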

16.2.1 Self-maintainability

In databases, a self-maintainable view [Gupta and Mumick, 1999] is a function that can update its result from input changes alone, without looking at the actual input. By analogy, we call a derivative self-maintainable if it uses no base parameters, only their changes. Self-maintainable derivatives describe efficient incremental computations: since they do not use their base input, their running time does not have to depend on the input size.

Example 16.2.1
D⟦merge⟧ = λx dx y dy → merge dx dy is self-maintainable with the change structure Bag S described in Example 10.3.1, because it does not use the base inputs x and y. Other derivatives are self-maintainable only in certain contexts. The derivative of element-wise function application (map f xs) ignores the original value of the bag xs if the changes to f are always nil, because the underlying primitive foldBag is self-maintainable in this case (as discussed in the next section). We take advantage of this by implementing a specialized derivative for foldBag.

Similarly, we have seen in Sec. 10.6 that dgrandTotal needlessly recomputes merge xs ys without optimizations. However, the result is a base input to fold′. In the next section, we'll replace fold′ by a self-maintainable derivative (based again on foldBag) and avoid this recomputation.

To conservatively predict whether a derivative is going to be self-maintainable (and thus efficient), one can inspect whether the program restricts itself to (conditionally) self-maintainable primitives,


like merge (always) or map f (only if df is nil, which is guaranteed when f is a closed term).

To avoid recomputing base arguments for self-maintainable derivatives (which never need

them), we currently employ lazy evaluation. Since we could instead use standard techniques for dead-code elimination [Appel and Jim, 1997], laziness is not central to our approach.

A significant restriction is that derivatives that are not self-maintainable require their base arguments, which can be expensive to compute. Since these arguments are also computed while running the base program, one could reuse the previously computed values through memoization or extensions of static caching (as discussed in Sec. 19.2.3). We leave implementing these optimizations for future work. As a consequence, our current implementation delivers good results only if most derivatives are self-maintainable.

16.3 Case study

We perform a case study on a nontrivial, realistic program to demonstrate that ILC can speed it up. We take the MapReduce-based skeleton of the word-count example [Lämmel, 2007]. We define a suitable differentiation plugin, adapt the program to use it, and show that incremental computation is faster than recomputation. We designed and implemented the differentiation plugin following the requirements of the corresponding proof plugin, even though we did not formalize the proof plugin (e.g., in Agda). For lack of space, we focus on the base types which are crucial for our example and its performance, that is, collections. The plugin also implements tuples, tagged unions, Booleans and integers, with the usual introduction and elimination forms, and with few optimizations for their derivatives.

wordcount takes a map from document IDs to documents, and produces a map from words appearing in the input to the count of their appearances, that is, a histogram:

wordcount :: Map ID Document → Map Word Int

For simplicity, instead of modeling strings, we model documents as bags of words and document IDs as integers. Hence, what we implement is:

histogram :: Map Int (Bag a) → Map a Int

We model words by integers (a = Int), but treat them parametrically. Other than that, we adapt Lämmel's code directly to our language. Figure 16.1 shows the λ-term histogram.

Figure 16.2 shows a simplified Scala implementation of the primitives used in Fig. 16.1. As bag primitives, we provide constructors and a fold operation, following Gluche et al. [1997]. The constructors for bags are empty (constructing the empty bag), singleton (constructing a bag with one element), merge (constructing the merge of two bags) and negate (negate b constructs a bag with the same elements as b but negated multiplicities); all but singleton represent abelian group operations. Unlike for usual ADT constructors, the same bag can be constructed in different ways, which are equivalent by the equations defining abelian groups; for instance, since merge is commutative, merge x y = merge y x. Folding on a bag will represent the bag through constructors in an arbitrary way, and then replace constructors with arguments; to ensure a well-defined result, the arguments of fold should respect the same equations, that is, they should form an abelian group; for instance, the binary operator should be commutative. Hence, the fold operator foldBag can be defined to take a function (corresponding to singleton) and an abelian group (for the other constructors). foldBag is then defined by the equations:

foldBag :: Group τ → (σ → τ) → Bag σ → τ

foldBag д@(⊞, ⊟, e) f empty = e


// Abelian groups
abstract class Group[A] {
  def merge(value1: A, value2: A): A
  def inverse(value: A): A
  def zero: A
}

// Bags
type Bag[A] = collection.immutable.HashMap[A, Int]

def groupOnBags[A] = new Group[Bag[A]] {
  def merge(bag1: Bag[A], bag2: Bag[A]) = ...
  def inverse(bag: Bag[A]) = bag.map({
    case (value, count) => (value, -count)
  })
  def zero = collection.immutable.HashMap()
}

def foldBag[A, B](group: Group[B], f: A => B, bag: Bag[A]): B =
  bag.flatMap({
    case (x, c) if c >= 0 => Seq.fill(c)(f(x))
    case (x, c) if c < 0  => Seq.fill(-c)(group.inverse(f(x)))
  }).fold(group.zero)(group.merge)

// Maps
type Map[K, A] = collection.immutable.HashMap[K, A]

def groupOnMaps[K, A](group: Group[A]) = new Group[Map[K, A]] {
  def merge(dict1: Map[K, A], dict2: Map[K, A]): Map[K, A] =
    dict1.merged(dict2)({
      case ((k, v1), (_, v2)) => (k, group.merge(v1, v2))
    }).filter({
      case (k, v) => v != group.zero
    })
  def inverse(dict: Map[K, A]): Map[K, A] = dict.map({
    case (k, v) => (k, group.inverse(v))
  })
  def zero = collection.immutable.HashMap()
}

// The general map fold
def foldMapGen[K, A, B](zero: B, merge: (B, B) => B)
                       (f: (K, A) => B, dict: Map[K, A]): B =
  dict.map(Function.tupled(f)).fold(zero)(merge)

// By using foldMap instead of foldMapGen, the user promises that
// f k is a homomorphism from groupA to groupB for each k: K.
def foldMap[K, A, B](groupA: Group[A], groupB: Group[B])
                    (f: (K, A) => B, dict: Map[K, A]): B =
  foldMapGen(groupB.zero, groupB.merge)(f, dict)

Figure 16.2: A Scala implementation of primitives for bags and maps. In the code, we call ⊞, ⊟ and e respectively merge, inverse and zero. We also omit the relatively standard primitives.


foldBag д@(⊞, ⊟, e) f (merge b1 b2) = foldBag д f b1 ⊞ foldBag д f b2

foldBag д@(⊞, ⊟, e) f (negate b) = ⊟ (foldBag д f b)

foldBag д@(⊞, ⊟, e) f (singleton v) = f v

If д is a group, these equations specify foldBag д precisely [Gluche et al., 1997]. Moreover, the first three equations mean that foldBag д f is the abelian group homomorphism between the abelian group on bags and the group д (because those equations coincide with the definition). Figure 16.2 shows an implementation of foldBag as specified above. Moreover, all functions which deconstruct a bag can be expressed in terms of foldBag with suitable arguments. For instance, we can sum the elements of a bag of integers with foldBag gZ (λx → x), where gZ is the abelian group induced on integers by addition, (Z, +, 0, −). Users of foldBag can define different abelian groups to specify different operations (for instance, to multiply floating-point numbers).

If д and f do not change, foldBag д f has a self-maintainable derivative. By the equations above:

foldBag д f (b ⊕ db)
  = foldBag д f (merge b db)
  = foldBag д f b ⊞ foldBag д f db
  = foldBag д f b ⊕ GroupChange д (foldBag д f db)

We will describe the GroupChange change constructor in a moment. Before that, we note that, as a consequence, the derivative of foldBag д f is

λb db → GroupChange д (foldBag д f db)

and we can see it does not use b: as desired, it is self-maintainable. Additional restrictions are required to make foldMap's derivative self-maintainable; those restrictions give rise to the precondition on mapReduce in Fig. 16.1. foldMapGen has the same implementation, but without those restrictions; as a consequence, its derivative is not self-maintainable, but it is more generally applicable. Lack of space prevents us from giving more details.

To define GroupChange, we need a suitable erased change structure on τ, such that ⊕ will be equivalent to ⊞. Since there might be multiple groups on τ, we allow the changes to specify a group, and have ⊕ delegate to ⊞ (which we extract by pattern-matching on the group):

∆τ = Replace τ | GroupChange (AbelianGroup τ) τ
v ⊕ (Replace u) = u
v ⊕ (GroupChange (⊞, inverse, zero) dv) = v ⊞ dv
v ⊖ u = Replace v

That is, a change between two values is either simply the new value (which replaces the old one, triggering recomputation), or their difference (computed with abelian group operations, like in the change structures for groups from Sec. 13.1.1). The ⊖ operator does not know which group to use, so it does not take advantage of the group structure. However, foldBag is now able to generate a group change.
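For concreteness, here is a plausible Haskell transcription of this change structure (our sketch; the field names of AbelianGroup are illustrative):

-- An abelian group on a, bundling the operations used above.
data AbelianGroup a = AbelianGroup
  { gMerge   :: a -> a -> a  -- the group operation, written ⊞ above
  , gInverse :: a -> a
  , gZero    :: a }

-- The erased change structure on a.
data Change a = Replace a | GroupChange (AbelianGroup a) a

oplus :: a -> Change a -> a
oplus _ (Replace u)        = u
oplus v (GroupChange g dv) = gMerge g v dv

-- ⊖ does not know which group to use, so it falls back to replacement.
ominus :: a -> a -> Change a
ominus v _ = Replace v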


We rewrite grandTotal in terms of foldBag to take advantage of group-based changes:

id = λx → x
G+ = (Z, +, −, 0)
grandTotal = λxs → λys → foldBag G+ id (merge xs ys)
D⟦grandTotal⟧ =
  λxs → λdxs → λys → λdys →
    foldBag′ G+ G′+ id id′
      (merge xs ys)
      (merge′ xs dxs ys dys)

It is now possible to write down the derivative of foldBag:

-- (if static analysis detects that dG and df are nil changes)
foldBag′ = D⟦foldBag⟧ =
  λG → λdG → λf → λdf → λzs → λdzs →
    GroupChange G (foldBag G f dzs)

We know from Sec. 10.6 that

merge′ = λu → λdu → λv → λdv → merge du dv

Inlining foldBag′ and merge′ gives us a more readable term, β-equivalent to the derivative of grandTotal:

D⟦grandTotal⟧ =
  λxs → λdxs → λys → λdys → foldBag G+ id (merge dxs dys)

16.4 Benchmarking setup

We run object language programs by generating corresponding Scala code. To ensure rigorous benchmarking [Georges et al., 2007], we use the ScalaMeter benchmarking library. To show that the performance difference from the baseline is statistically significant, we show 99% confidence intervals in graphs.

We verify Eq. (10.1) experimentally, by checking that the two sides of the equation always evaluate to the same value.

We ran benchmarks on an 8-core Intel Core i7-2600 CPU running at 3.4 GHz, with 8 GB of RAM, running Scientific Linux release 6.4. While the benchmark code is single-threaded, the JVM offloads garbage collection to other cores. We use the preinstalled OpenJDK 1.7.0_25 and Scala 2.10.2.

Input generation. Inputs are randomly generated to resemble English words over all webpages on the internet: the vocabulary size and the average length of a webpage stay relatively stable, while the number of webpages grows day by day. To generate a size-n input of type Map Int (Bag Int), we generate n random numbers between 1 and 1000 and distribute them randomly in n/1000 bags. Changes are randomly generated to resemble edits: a change has 50% probability to delete a random existing number, and 50% probability to insert a random number at a random location.


Experimental units. Thanks to Eq. (10.1), both recomputation f (a ⊕ da) and incremental computation f a ⊕ D⟦f⟧ a da produce the same result. Both programs are written in our object language. To show that derivatives are faster, we compare these two computations. To compare with recomputation, we measure the aggregated running time for running the derivative on the change and for updating the original output with the result of the derivative.

16.5 Benchmark results

Our results (Fig. 16.3) show that our program reacts to input changes in essentially constant time, as expected, hence orders of magnitude faster than recomputation. Constant factors are small enough that the speedup is apparent on realistic input sizes.

We present our results in Fig. 16.3. As expected, the runtime of incremental computation is essentially constant in the size of the input, while the runtime of recomputation is linear in the input size. Hence, on our biggest inputs, incremental computation is over 10⁴ times faster.

Derivative time is in fact slightly irregular for the first few inputs, but this irregularity decreases slowly with increasing warmup cycles. In the end, for derivatives we use 10⁴ warmup cycles. With fewer warmup cycles, running time for derivatives decreases significantly during execution, going from 2.6 ms for n = 1000 to 0.2 ms for n = 512000. Hence, we believe extended warmup is appropriate, and the changes do not affect our general conclusions. Considering confidence intervals, in our experiments the running time for derivatives varies between 0.139 ms and 0.378 ms.

In our current implementation, the code of the generated derivatives can become quite big. For the histogram example (which is around 1 KB of code), a pretty-print of its derivative is around 40 KB of code. The function application case in Fig. 12.1c can lead to quadratic growth in the worst case. More importantly, we realized that our home-grown transformation system in some cases performs overly aggressive inlining, especially for derivatives, even though this is not required for incrementalization; we believe this explains a significant part of the problem. Indeed, code blowup problems do not currently appear in later experiments (see Sec. 17.4).

16.6 Chapter conclusion

Our results show that the incrementalized program runs in essentially constant time, and hence orders of magnitude faster than the alternative of recomputation from scratch.

An important lesson from the evaluation is that, as anticipated in Sec. 16.2.1, to achieve good performance our current implementation requires some form of dead-code elimination, such as laziness.


[Plot: two series, Incremental and Recomputation; x-axis: input size, 1k to 4096k; y-axis: runtime (ms), 0.1 to 10000.]

Figure 16.3: Performance results in log–log scale, with input size on the x-axis and runtime in ms on the y-axis. Confidence intervals are shown by the whiskers; most whiskers are too small to be visible.

Chapter 17

Cache-transfer-style conversion

17.1 Introduction

Incremental computation is often desirable: after computing an output from some input, we often need to produce new outputs corresponding to changed inputs. One can simply rerun the same base program on the new input; but instead, incremental computation transforms the input change to an output change. This can be desirable because it is more efficient.

Incremental computation could also be desirable because the changes themselves are of interest: imagine a typechecker explaining how some change to input source code propagates to changes to the typechecking result. More generally, imagine a program explaining how some change to its input propagates through a computation into changes to the output.

ILC (Incremental λ-Calculus) [Cai et al., 2014] is a recent approach to incremental computation for higher-order languages. ILC represents changes from an old value v1 to an updated value v2 as a first-class value, written dv. ILC also statically transforms base programs to incremental programs or derivatives: derivatives produce output changes from changes to all their inputs. Since functions are first-class values, ILC introduces a notion of function changes.

However, as mentioned by Cai et al. and as we explain below, ILC as currently defined does not allow reusing intermediate results created by the base computation during the incremental computation. That restricts ILC to supporting efficiently only self-maintainable computations, a rather restrictive class: for instance, mapping self-maintainable functions on a sequence is self-maintainable, but dividing numbers isn't. In this paper we remove this limitation.

To remember intermediate results, many incrementalization approaches rely on forms of memoization: one uses hashtables to memoize function results, or dynamic dependence graphs [Acar, 2005] to remember the computation trace. However, such data structures often remember results that might not be reused; moreover, the data structures themselves (as opposed to their values) occupy additional memory, looking up intermediate results has a cost in time, and typical general-purpose optimizers cannot predict results from memoization lookups. Instead, ILC aims to produce purely functional programs that are suitable for further optimizations.

We eschew memoization: instead, we transform programs to cache-transfer style (CTS), following ideas from Liu and Teitelbaum [1995]. CTS functions output caches of intermediate results along with their primary results. Caches are just nested tuples whose structure is derived from code, and accessing them does not involve looking up keys depending on inputs. We also extend differentiation to produce CTS derivatives, which can extract from caches any intermediate results they need. This approach was inspired and pioneered by Liu and Teitelbaum for untyped first-order functional languages, but we integrate it with ILC and extend it to higher-order typed languages.


While CTS functions still produce additional intermediate data structures, produced programs can be subject to further optimizations. We believe static analysis of a CTS function and its CTS derivative can identify and remove unneeded state (similar to what has been done by Liu and Teitelbaum), as we discuss in Sec. 17.5.6. We leave a more careful analysis to future work.

We prove most of our formalization correct in Coq. To support non-simply-typed programs, all our proofs are for untyped λ-calculus, while previous ILC correctness proofs were restricted to simply-typed λ-calculus. Unlike previous ILC proofs, we simply define which changes are valid via a logical relation, then show the fundamental property for this logical relation (see Sec. 17.2.1). To extend this proof to untyped λ-calculus, we switch to step-indexed logical relations.

To support differentiation on our case studies, we also represent function changes as closures that can be inspected, to support manipulating them more efficiently and detecting at runtime when a function change is nil, hence need not be applied. To show this representation is correct, we also use closures in our mechanized proof.

Unlike plain ILC, typing programs in CTS is challenging, because the shape of caches for a function depends on the function implementation. Our case studies show how to non-trivially embed resulting programs in typed languages, at least for our case studies, but our proofs support an untyped target language.

In sum, we present the following contributions:

• via examples, we motivate extending ILC to remember intermediate results (Sec. 17.2);

• we give a novel proof of correctness for ILC for untyped λ-calculus, based on step-indexed logical relations (Sec. 17.3.3);

• building on top of ILC-style differentiation, we show how to transform untyped higher-order programs to cache-transfer style (CTS) (Sec. 17.3.5);

• we show that programs and derivatives in cache-transfer style correctly simulate their non-CTS variants (Sec. 17.3.6);

• we mechanize in Coq most of our proofs;

• we perform performance case studies (in Sec. 17.4), applying (by hand) an extension of this technique to Haskell programs, and incrementalizing efficiently also programs that do not admit self-maintainable derivatives.

The rest of the paper is organized as follows. Sec. 17.2 summarizes ILC and motivates the extension to cache-transfer style. Sec. 17.3 presents our formalization and proofs. Sec. 17.4 presents our case studies and benchmarks. Sec. 17.5 discusses limitations and future work. Sec. 17.6 discusses related work, and Sec. 17.7 concludes.

17.2 Introducing Cache-Transfer Style

In this section we motivate cache-transfer style (CTS). Sec. 17.2.1 summarizes a reformulation of ILC, so we recommend it also to readers familiar with Cai et al. [2014]. In Sec. 17.2.2, we consider a minimal first-order example, namely an average function. We incrementalize it using ILC, explain why the result is inefficient, and show that remembering results via cache-transfer style enables efficient incrementalization with asymptotic speedups. In Sec. 17.2.3, we consider a higher-order example that requires not just cache-transfer style, but also efficient support for both nil and non-nil function changes, together with the ability to detect nil changes.


17.2.1 Background

ILC considers simply-typed programs and assumes that base types and primitives come withsupport for incrementalization

The ILC framework describes changes as first-class values, and types them using dependent types. To each type A we associate a type ∆A of changes for A, and an update operator ⊕ : A → ∆A → A that updates an initial value with a change to compute an updated value. We also consider changes for evaluation environments, which contain changes for each environment entry.

A change da : ∆A can be valid from a1 : A to a2 : A, and we then write da ▷ a1 ↪ a2 : A. Then a1 is the source or initial value for da, and a2 is the destination or updated value for da. From da ▷ a1 ↪ a2 : A it follows that a2 coincides with a1 ⊕ da, but validity imposes additional invariants that are useful during incrementalization. A change can be valid for more than one source, but a change da and a source a1 uniquely determine the destination a2 = a1 ⊕ da. To avoid ambiguity, we always consider a change together with a specific source.

Each type comes with its definition of validity. Validity is a ternary logical relation. For function types A → B, we define ∆(A → B) = A → ∆A → ∆B, and say that a function change df : ∆(A → B) is valid from f1 : A → B to f2 : A → B (that is, df ▷ f1 ↪ f2 : A → B) if and only if df maps valid input changes to valid output changes; by that we mean that if da ▷ a1 ↪ a2 : A, then df a1 da ▷ f1 a1 ↪ f2 a2 : B. Source and destination of df a1 da, that is f1 a1 and f2 a2, are the result of applying two different functions, that is f1 and f2.

ILC expresses incremental programs as derivatives. Generalizing previous usage, we simply say derivative for all terms produced by differentiation. If dE is a valid environment change from E1 to E2, and term t is well-typed and can be evaluated against environments E1, E2, then term Dι⟦t⟧, the derivative of t, evaluated against dE, produces a change from v1 to v2, where v1 is the value of t in environment E1, and v2 is the value of t in environment E2. This correctness property follows immediately from the fundamental property for the logical relation of validity, and can be proven accordingly; we give a step-indexed variant of this proof in Sec. 17.3.3. If t is a function and dE is a nil change (that is, its source E1 and destination E2 coincide), then Dι⟦t⟧ produces a nil function change, which is also a derivative according to Cai et al. [2014].

To support incrementalization, one must define change types and validity for each base type, and a correct derivative for each primitive. Functions written in terms of primitives can be differentiated automatically. As in all approaches to incrementalization (see Sec. 17.6), one cannot incrementalize an arbitrary program efficiently; ILC limits the effort to base types and primitives.

17.2.2 A first-order motivating example: computing averages

Suppose we need to compute the average y of a bag of numbers xs : Bag Z, and that whenever we receive a change dxs : ∆(Bag Z) to this bag, we need to compute the change dy to the average y.

In fact, we expect multiple consecutive updates to our bag. Hence, we say we have an initial bag xs1 and compute its average y1 as y1 = avg xs1, and then consecutive changes dxs1, dxs2, . . . . Change dxs1 goes from xs1 to xs2, dxs2 goes from xs2 to xs3, and so on. We need to compute y2 = avg xs2, y3 = avg xs3, . . . , but more quickly than we would by calling avg again.

We can compute the average through the following function (which we present in Haskell):

avg xs =
  let s = sum xs
      n = length xs
      r = div s n
  in r


We write this function in A'-normal form (A'NF), a small variant of A-normal form (ANF) [Sabry and Felleisen, 1993] that we introduce. In A'NF, programs bind to a variable the result of each function call in avg, instead of using it directly; unlike plain ANF, A'NF also binds the result of tail calls, such as div s n in avg. A'NF simplifies conversion to cache-transfer style.

We can incrementalize both sum and length efficiently, by generating their derivatives dsum and dlength via ILC, assuming a language plugin supporting bags and folds on them.
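For intuition, here is a plausible sketch of such self-maintainable derivatives (our illustration, with bag changes encoded as bags of signed multiplicities; not the plugin's actual code):

-- Bags as association lists of (element, multiplicity); a bag change
-- uses signed multiplicities to encode insertions and deletions.
type Bag a = [(a, Int)]
type BagChange a = Bag a

-- Integer changes are additive differences, so both derivatives below
-- ignore their base input: they are self-maintainable.
dsum :: Bag Int -> BagChange Int -> Int
dsum _xs dxs = sum [ x * k | (x, k) <- dxs ]

dlength :: Bag a -> BagChange a -> Int
dlength _xs dxs = sum [ k | (_, k) <- dxs ]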

But division is more challenging. To be sure, we can write the following derivative:

ddiv a1 da b1 db =
  let a2 = a1 ⊕ da
      b2 = b1 ⊕ db
  in div a2 b2 ⊖ div a1 b1

Function ddiv computes the difference between the updated and the original result without any special optimization, but still takes O(1) for machine integers. But unlike the other derivatives, ddiv uses its base inputs a1 and b1, that is, it is not self-maintainable [Cai et al., 2014].

Because ddiv is not self-maintainable, a derivative calling it will not be efficient. To wit, let us look at davg, the derivative of avg:

davg xs dxs =
  let s = sum xs
      ds = dsum xs dxs
      n = length xs
      dn = dlength xs dxs
      r = div s n
      dr = ddiv s ds n dn
  in dr

This function recomputes s, n and r just like in avg, but r is not used, so its recomputation can easily be removed by later optimizations. On the other hand, derivative ddiv will use the values of its base inputs a1 and b1, so derivative davg will need to recompute s and n, and save no time over recomputation. If ddiv were instead a self-maintainable derivative, the computations of s and n would also be unneeded and could be optimized away. Cai et al. leave efficient support for non-self-maintainable derivatives for future work.

To avoid recomputation, we must simply remember intermediate results as needed. Executing davg xs1 dxs1 will compute exactly the same s and n as executing avg xs1, so we must just save and reuse them, without needing to use memoization proper. Hence, we CTS-convert each function f to a CTS function fC and a CTS derivative dfC: CTS function fC produces, together with its final result, a cache, which the caller must pass to CTS derivative dfC. A cache is just a tuple of values, containing information from subcalls: either inputs (as we explain in a bit), intermediate results, or subcaches, that is, caches returned from further function calls. In fact, typically only primitive functions like div need to remember actual results; automatically transformed functions only need to remember subcaches or inputs.

CTS conversion is simplified by first converting to A'NF, where all results of subcomputations are bound to variables: we just collect all caches in scope and return them.

As an additional step, we avoid always passing base inputs to derivatives, by defining ∆(A → B) = ∆A → ∆B. Instead of always passing a base input and possibly not using it, we can simply assume that primitives whose derivative needs a base input store the input in the cache.

To make the translation uniform, we stipulate that all functions in the program are transformed to CTS, using a (potentially empty) cache. Since the type of the cache for a function f : A → B


depends on the implementation of f, we introduce for each function f a type for its cache, FC, so that CTS function fC has type A → (B, FC) and CTS derivative dfC has type ∆A → FC → (∆B, FC).

The definition of FC is only needed inside fC and dfC, and it can be hidden from other functions, to keep implementation details hidden in transformed code; because of limitations of Haskell modules, we can only hide such definitions from functions in other modules.

Since functions of the same type translate to functions of different types, the translation does not preserve well-typedness in a higher-order language in general, but it works well in our case studies (Sec. 17.4). Sec. 17.4.1 shows how to map such functions. We return to this point briefly in Sec. 17.5.1.

CTS-converting our example produces the following code:

data AvgC = AvgC SumC LengthC DivC

avgC :: Bag Z → (Z, AvgC)
avgC xs =
  let (s, cs1) = sumC xs
      (n, cn1) = lengthC xs
      (r, cr1) = s `divC` n
  in (r, AvgC cs1 cn1 cr1)

davgC :: ∆(Bag Z) → AvgC → (∆Z, AvgC)
davgC dxs (AvgC cs1 cn1 cr1) =
  let (ds, cs2) = dsumC dxs cs1
      (dn, cn2) = dlengthC dxs cn1
      (dr, cr2) = ddivC ds dn cr1
  in (dr, AvgC cs2 cn2 cr2)

In the above program, sumC and lengthC are self-maintainable, that is, they need no base inputs and can be transformed to use empty caches. On the other hand, ddiv is not self-maintainable, so we transform it to remember and use its base inputs:

divC a b = (a `div` b, (a, b))

ddivC da db (a1, b1) =
  let a2 = a1 ⊕ da
      b2 = b1 ⊕ db
  in (div a2 b2 ⊖ div a1 b1, (a2, b2))

Crucially, ddivC must return an updated cache, to ensure correct incrementalization, so that application of further changes works correctly. In general, if (y, c1) = fC x and (dy, c2) = dfC dx c1, then (y ⊕ dy, c2) must equal the result of the base function fC applied to the new input x ⊕ dx, that is, (y ⊕ dy, c2) = fC (x ⊕ dx).
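This invariant is directly testable. A QuickCheck-style property stating it could look as follows (our sketch; fC, dfC and the two update operators stand for any CTS function/derivative pair and the corresponding ⊕ instances):

-- The CTS correctness invariant for one change step: updating the previous
-- result, and passing the updated cache, must agree with rerunning fC.
prop_ctsStep :: (Eq b, Eq c)
             => (a -> (b, c))          -- fC
             -> (da -> c -> (db, c))   -- dfC
             -> (a -> da -> a)         -- ⊕ on inputs
             -> (b -> db -> b)         -- ⊕ on outputs
             -> a -> da -> Bool
prop_ctsStep fC dfC oplusA oplusB x dx =
  let (y,  c1) = fC x
      (dy, c2) = dfC dx c1
  in (y `oplusB` dy, c2) == fC (x `oplusA` dx)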

Finally, to use these functions, we need to adapt the caller. Let's first show how to deal with a sequence of changes: imagine that dxs1 is a valid change for xs, and that dxs2 is a valid change for xs ⊕ dxs1. To update the average for both changes, we can then call avgC and davgC as follows:

-- A simple example caller with two consecutive changes
avgDAvgC :: Bag Z → ∆(Bag Z) → ∆(Bag Z) → (Z, ∆Z, ∆Z, AvgC)
avgDAvgC xs dxs1 dxs2 =
  let (res1, cache1) = avgC xs
      (dres1, cache2) = davgC dxs1 cache1
      (dres2, cache3) = davgC dxs2 cache2
  in (res1, dres1, dres2, cache3)


Incrementalization guarantees that the produced changes update the output correctly in response to the input changes: that is, we have res1 ⊕ dres1 = avg (xs ⊕ dxs1) and res1 ⊕ dres1 ⊕ dres2 = avg (xs ⊕ dxs1 ⊕ dxs2). We also return the last cache, to allow further updates to be processed.

Alternatively, we can try writing a caller that gets an initial input and a (lazy) list of changes, does incremental computation, and prints updated outputs:

processChangeList (dxsN : dxss) yN cacheN = do
  let (dy, cacheN′) = davgC dxsN cacheN
      yN′ = yN ⊕ dy
  print yN′
  processChangeList dxss yN′ cacheN′

-- Example caller with multiple consecutive changes
someCaller xs1 dxss = do
  let (y1, cache1) = avgC xs1
  processChangeList dxss y1 cache1

More generally, we produce both an augmented base function and a derivative, where the augmented base function communicates with the derivative by returning a cache. The contents of this cache are determined statically, and can be accessed by tuple projections without dynamic lookups. In the rest of the paper, we use the above idea to develop a correct transformation that allows incrementalizing programs using cache-transfer style.

We'll return to this example in Sec. 17.4.1.

17.2.3 A higher-order motivating example: nested loops

Next, we consider CTS differentiation on a minimal higher-order example. To incrementalize this example, we enable detecting nil function changes at runtime, by representing function changes as closures that can be inspected by incremental programs. We'll return to this example in Sec. 17.4.2.

We take an example of nested loops over sequences, implemented in terms of standard Haskell functions map and concat. For simplicity, we compute the Cartesian product of inputs:

cartesianProduct :: Sequence a → Sequence b → Sequence (a, b)
cartesianProduct xs ys =
  concatMap (λx → map (λy → (x, y)) ys) xs

concatMap f xs = concat (map f xs)

Computing cartesianProduct xs ys loops over each element x from sequence xs and y from sequence ys, and produces a list of pairs (x, y), taking quadratic time, O(n²) (we assume for simplicity that |xs| and |ys| are both O(n)). Adding a fresh element to either xs or ys generates an output change containing Θ(n) fresh pairs, hence derivative dcartesianProduct must take at least linear time. Thanks to specialized derivatives dmap and dconcat for primitives map and concat, dcartesianProduct has asymptotically optimal time complexity. To achieve this complexity, dmap f df must detect when df is a nil function change, and avoid applying it to unchanged input elements.

To simplify the transformations we describe, we λ-lift programs before differentiating and transforming them:

cartesianProduct xs ys =
  concatMap (mapPair ys) xs

mapPair ys = λx → map (pair x) ys

pair x = λy → (x, y)

concatMap f xs =
  let yss = map f xs
  in concat yss

Suppose we add an element to either xs or ys. If change dys adds one element to ys, then dmapPair ys dys, the argument to dconcatMap, is a non-nil function change taking constant time, so dconcatMap must apply it to each element of xs ⊕ dxs.

Suppose next that change dxs adds one element to xs, and dys is a nil change for ys. Then dmapPair ys dys is a nil function change, and we must detect this dynamically. If a function change df : ∆(A → B) is represented as a function, and A is infinite, one cannot detect dynamically that it is a nil change. To enable runtime nil change detection, we apply closure conversion on function changes: a function change df, represented as a closure, is nil for f only if all the environment changes it contains are nil, and if the contained function is a derivative of f's function.
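To illustrate the payoff, here is a sketch under simplifying assumptions (replacement-style sequence changes, and a Boolean flag standing in for the closure inspection just described; this is not the representation of Sec. 17.4):

-- A function change as an inspectable value: a flag recording whether all
-- captured environment changes are nil, plus the updated function f2.
data FunChange a b = FunChange
  { isNilChange :: Bool
  , updatedFun  :: a -> b }

-- A sequence change lists replaced elements by position.
type SeqChange a = [(Int, a)]

-- dmap can skip unchanged elements when df is nil, doing work proportional
-- to the size of the change rather than of the whole sequence.
dmap :: (a -> b) -> FunChange a b -> [a] -> SeqChange a -> SeqChange b
dmap f df xs dxs
  | isNilChange df = [ (i, f x') | (i, x') <- dxs ]
  | otherwise      = zip [0 ..] (map (updatedFun df) xs')
  where xs' = foldl (\ys (i, x') -> replaceAt i x' ys) xs dxs
        replaceAt i x' ys = take i ys ++ [x'] ++ drop (i + 1) ys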

17.3 Formalization

In this section, we formalize cache-transfer-style (CTS) differentiation and formally prove its soundness with respect to differentiation. We furthermore present a novel proof of correctness for differentiation itself.

To simplify the proof, we encode many invariants of input and output programs by defining input and output languages with a suitably restricted abstract syntax. Restricting the input language simplifies the transformations, while restricting the output languages simplifies the semantics. Since base programs and derivatives have different shapes, we define different syntactic categories of base terms and change terms.

We define and prove sound three transformations, across two languages. First, we present a variant of differentiation (as defined by Cai et al. [2014]), going from base terms of language λAL to change terms of λAL, and we prove it sound with respect to non-incremental evaluation. Second, we define CTS conversion as a pair of transformations going from base terms of λAL: CTS translation produces CTS versions of base functions as base terms in iλAL, and CTS differentiation produces CTS derivatives as change terms in iλAL.

As the source language for CTS differentiation, we use a core language named λAL. This language is a common target of compiler front-ends for functional programming languages: in this language, terms are written in λ-lifted and A'-normal form (A'NF), so that every intermediate result is named, and can thus be stored in a cache by CTS conversion and reused later (as described in Sec. 17.2).

The target language for CTS differentiation is named iλAL. Programs in this target language are also λ-lifted and in A'NF. But additionally, in these programs every toplevel function f produces a cache, which is to be conveyed to the derivatives of f.

The rest of this section is organized as follows. Sec. 17.3.1 presents the syntax and semantics of source language λAL. Sec. 17.3.2 defines differentiation, and Sec. 17.3.3 proves it correct. Sec. 17.3.4 presents the syntax and semantics of target language iλAL. Sec. 17.3.5 defines CTS conversion, and Sec. 17.3.6 proves it correct. Most of the formalization has been mechanized in Coq (the proofs of some straightforward lemmas are left for future work).

17.3.1 The source language λAL

Syntax. The syntax of λAL is given in Figure 17.1. Our source language allows representing both base terms t and change terms dt.

142 Chapter 17 Cache-transfer-style conversion

Base terms:
t ::= let y = f x in t            (Call)
   |  let y = (x̄) in t            (Tuple)
   |  x                           (Result)

Change terms:
dt ::= let y = f x, dy = df x dx in dt    (Call)
    |  let y = (x̄), dy = (dx̄) in dt       (Tuple)
    |  dx                                 (Result)

Closed values:
v ::= E[λx. t]    (Closure)
   |  (v̄)         (Tuple)
   |  ℓ           (Literal)
   |  p           (Primitive)

Value environments:
E ::= E; x = v    (Value binding)
   |  •           (Empty)

Change values:
dv ::= dE[λx dx. dt]    (Closure)
    |  (dv̄)             (Tuple)
    |  dℓ               (Literal)
    |  !v               (Replace)
    |  nil              (Nil)

Change environments:
dE ::= dE; x = v, dx = dv    (Binding)
    |  •                     (Empty)

n, k ∈ ℕ    (Step indexes)

Figure 17.1: Source language λAL (syntax).

[SVar]
E ⊢ x ⇓1 E(x)

[STuple]
E; y = (E(x̄)) ⊢ t ⇓n v
─────────────────────────────
E ⊢ let y = (x̄) in t ⇓n+1 v

[SPrimitiveCall]
E(f) = p        E; y = δp(E(x)) ⊢ t ⇓n v
─────────────────────────────
E ⊢ let y = f x in t ⇓n+1 v

[SClosureCall]
E(f) = Ef[λx. tf]        Ef; x = E(x) ⊢ tf ⇓m vy        E; y = vy ⊢ t ⇓n v
─────────────────────────────
E ⊢ let y = f x in t ⇓m+n+1 v

Figure 17.2: Step-indexed big-step semantics for base terms of source language λAL.

Our syntax for base terms t represents λ-lifted programs in A'-normal form (Sec. 17.2.2). We write x̄ for a sequence of variables of some unspecified length, x1, x2, . . . , xm. A term can be either a bound variable x, or a let-binding of y in subterm t, to either a new tuple (x̄) (let y = (x̄) in t) or the result of calling function f on argument x (let y = f x in t). Both f and x are variables, to be looked up in the environment. Terms cannot contain λ-abstractions, as they have been λ-lifted to top-level definitions, which we encode as closures in the initial environments. Unlike standard ANF, we add no special syntax for function calls in tail position (see Sec. 17.5.3 for a discussion of this limitation). We often inspect the result of a function call f x, which is not a valid term in our syntax. To enable this, we write "(f x)" for "let y = f x in y", where y is chosen fresh.

Our syntax for change terms dt mimics the syntax for base terms, except that (i) each function call is immediately followed by the evaluation of its derivative, and (ii) the final value returned by a change term is a change variable dx. As we will see, this ad-hoc syntax of change terms is tailored to capture the programs produced by differentiation. We only allow α-renamings that maintain the invariant that the definition of x is immediately followed by the definition of dx: if x is renamed to y, then dx must be renamed to dy.

Semantics. A closed value for base terms is either a closure, a tuple of values, a constant, or a primitive. A closure is a pair of an evaluation environment E and a λ-abstraction closed with respect to E. The set of available constants is left abstract; it may contain usual first-order constants like integers. We also leave abstract the primitives p, like if-then-else or projections of tuple components. As usual, environments E map variables to closed values. With no loss of generality, we assume that all bound variables are distinct.

Figure 17.2 shows a step-indexed big-step semantics for base terms, defined through judgment E ⊢ t ⇓n v, pronounced "Under environment E, base term t evaluates to closed value v in n steps." Our


step-indexes count the number of "nodes" of a big-step derivation.¹ We explain each rule in turn. Rule [SVar] looks variable x up in environment E. The other rules evaluate a let-binding let y = . . . in t in environment E: each rule computes y's new value vy (taking m steps, where m can be zero) and evaluates, in n steps, body t to v, using environment E extended by binding y to vy. The overall let-binding evaluates to v in m + n + 1 steps. But different rules compute y's value differently. [STuple] looks each variable in x̄ up in E, to evaluate tuple (x̄) (in m = 0 steps). [SPrimitiveCall] evaluates function calls where variable f is bound in E to a primitive p. How primitives are evaluated is specified by a function δp(–) from closed values to closed values; to evaluate such a primitive call, this rule applies δp(–) to x's value (in m = 0 steps). [SClosureCall] evaluates function calls where variable f is bound in E to closure Ef[λx. tf]: this rule evaluates closure body tf in m steps, using closure environment Ef extended with the value of argument x in E.

Change semantics. We move on to define how change terms evaluate to change values. We start with the required auxiliary definitions.

A closed change value is either a closure change, a tuple change, a literal change, a replacement change, or a nil change. A closure change is a pair made of an evaluation environment dE and a λ-abstraction expecting a value and a change value as arguments, to evaluate a change term into an output change value. An evaluation environment dE follows the same structure as let-bindings of change terms: it binds variables to closed values, and each variable x is immediately followed by a binding for its associated change variable dx. As with let-bindings of change terms, α-renamings in an environment dE must rename dx into dy if x is renamed into y. With no loss of generality, we assume that all bound term variables are distinct in these environments.

We define the original environment ⌊dE⌋₁ and the new environment ⌊dE⌋₂ of a change environment dE by induction over dE:

⌊•⌋ᵢ = •    (i = 1, 2)
⌊dE; x = v, dx = dv⌋₁ = ⌊dE⌋₁; x = v
⌊dE; x = v, dx = dv⌋₂ = ⌊dE⌋₂; x = v ⊕ dv

This last rule makes use of an operation ⊕ to update a value with a change, which may fail at runtime. Indeed, change update is a partial function, written "v ⊕ dv", defined as follows:

v ⊕ nil = v
v1 ⊕ !v2 = v2
ℓ ⊕ dℓ = δ⊕(ℓ, dℓ)
E[λx. t] ⊕ dE[λx dx. dt] = (E ⊕ dE)[λx. t]
(v1, . . . , vn) ⊕ (dv1, . . . , dvn) = (v1 ⊕ dv1, . . . , vn ⊕ dvn)

where

(E; x = v) ⊕ (dE; x = v, dx = dv) = ((E ⊕ dE); x = (v ⊕ dv))

Nil and replacement changes can be used to update all values (constants, tuples, primitives and closures), while tuple changes can only update tuples, literal changes can only update literals, and closure changes can only update closures. A nil change leaves a value unchanged. A replacement change overrides the current value v with a new one, v′. On literals, ⊕ is defined via some interpretation function δ⊕. Change update for a closure ignores dt, instead of combining it with t somehow. This may seem surprising, but we only need ⊕ to behave well for valid changes (as shown by Lemma 17.3.1): for valid closure changes, dt must behave similarly to Dι⟦t⟧ anyway, so only environment updates matter. This definition also avoids having to modify terms at runtime, which would be difficult to implement safely.
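The following Haskell fragment transcribes this definition for a restricted value grammar (a sketch for intuition; closures and primitives are omitted, and the constructor names are ours):

-- Closed values and change values, restricted to literals and tuples.
data Value = Lit Int | Tuple [Value]
  deriving (Eq, Show)

data ChangeValue
  = Nil                   -- nil change
  | Replace Value         -- replacement change, written !v above
  | DLit Int              -- literal change; δ⊕ here is addition
  | DTuple [ChangeValue]  -- tuple change
  deriving Show

-- Change update is partial: mismatched shapes fail at runtime.
oplus :: Value -> ChangeValue -> Value
oplus v Nil         = v
oplus _ (Replace u) = u
oplus (Lit n) (DLit dn) = Lit (n + dn)
oplus (Tuple vs) (DTuple dvs)
  | length vs == length dvs = Tuple (zipWith oplus vs dvs)
oplus _ _ = error "oplus: invalid change"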

¹Instead, Ahmed [2006] and Acar et al. [2008] count the number of steps that small-step evaluation would take (as we did in Appendix C), but this simplifies some proof steps and makes a minor difference in others.


[SDVar]
dE ⊢ dx ⇓ dE(dx)

[SDTuple]
dE(x̄, dx̄) = v̄x, dv̄x        dE; y = (v̄x), dy = (dv̄x) ⊢ dt ⇓ dv
─────────────────────────────────────────
dE ⊢ let y = (x̄), dy = (dx̄) in dt ⇓ dv

[SDReplaceCall]
dE(df) = !vf        ⌊dE⌋₁ ⊢ (f x) ⇓n vy        ⌊dE⌋₂ ⊢ (f x) ⇓m vy′        dE; y = vy, dy = !vy′ ⊢ dt ⇓ dv
─────────────────────────────────────────
dE ⊢ let y = f x, dy = df x dx in dt ⇓ dv

[SDClosureNil]
dE(f, df) = Ef[λx. tf], nil        Ef; x = dE(x) ⊢ tf ⇓n vy        Ef; x = ⌊dE⌋₂(x) ⊢ tf ⇓m vy′        dE; y = vy, dy = !vy′ ⊢ dt ⇓ dv
─────────────────────────────────────────
dE ⊢ let y = f x, dy = df x dx in dt ⇓ dv

[SDPrimitiveNil]
dE(f, df) = p, nil        dE(x, dx) = vx, dvx        dE; y = δp(vx), dy = ∆p(vx, dvx) ⊢ dt ⇓ dv
─────────────────────────────────────────
dE ⊢ let y = f x, dy = df x dx in dt ⇓ dv

[SDClosureChange]
dE(f, df) = Ef[λx. tf], dEf[λx dx. dtf]        dE(x, dx) = vx, dvx        Ef; x = vx ⊢ tf ⇓n vy        dEf; x = vx, dx = dvx ⊢ dtf ⇓ dvy        dE; y = vy, dy = dvy ⊢ dt ⇓ dv
─────────────────────────────────────────
dE ⊢ let y = f x, dy = df x dx in dt ⇓ dv

Figure 17.3: Step-indexed big-step semantics for the change terms of the source language λAL.

Having given these definitions, we show in Fig. 17.3 a big-step semantics for change terms (without step-indexing), defined through judgment dE ⊢ dt ⇓ dv, pronounced "Under the environment dE, the change term dt evaluates into the closed change value dv." [SDVar] looks up into dE to return a value for dx. [SDTuple] builds a tuple out of the values of x̄, and a change tuple out of the change values of dx̄, as found in the environment dE. There are four rules to evaluate let-bindings, depending on the nature of dE(df). These four rules systematically recompute the value vy of y in the original environment; they differ in the way they compute the change dy to y.

If dE(df) is a replacement, [SDReplaceCall] applies. Replacing the value of f in the environment forces recomputing f x from scratch in the new environment. The resulting value vy′ is the new value which must replace vy, so dy binds to !vy′ when evaluating the let body.

If dE(df) is a nil change, we have two rules, depending on the nature of dE(f). If dE(f) is a closure, [SDClosureNil] applies; in that case, the nil change of dE(f) is the exact same closure. Hence, to compute dy, we reevaluate this closure, applied to the updated argument ⌊dE⌋₂(x), to a value vy′, and bind dy to !vy′. In other words, this rule is equivalent to [SDReplaceCall] in the case where a closure is replaced by itself.² If dE(f) is a primitive, [SDPrimitiveNil] applies. The nil change of a primitive p is its derivative, whose interpretation is realized by a function ∆p(–). The evaluation of this function on the input value and the input change leads to the change dvy, bound to dy.

If dE(df) is a closure change dEf[λx dx. dtf], [SDClosureChange] applies. The change dvy results from the evaluation of dtf in the closure change environment dEf, augmented with an input value for x and a change value for dx. Again, let us recall that we will maintain the invariant that the term dtf behaves as the derivative of f, so this rule can be seen as the invocation of f's derivative.

²Based on Appendix C, we are confident we could in fact use the derivative of tf instead of replacement changes, but transforming terms in a semantics seems aesthetically wrong. We can also restrict nil to primitives, as we essentially did in Appendix C.


Expressiveness A closure in the initial environment can be used to represent a top-level definition. Since environment entries can point to primitives, we need no syntax to directly represent calls of primitives in the syntax of base terms. To encode in our syntax a program with top-level definitions, and a term to be evaluated representing the entry point, one can produce a term t representing the entry point, together with an environment E containing as values any top-level definitions, primitives and constants used in the program.

Our formalization does not model n-ary functions directly, but they can be encoded through unary functions and tuples. This encoding does not support currying efficiently, but we discuss possible solutions in Sec. 17.5.7.

Control operators, like recursion combinators or branching, can be introduced as primitive operations as well. If the branching condition changes, expressing the output change in general requires replacement changes. Similarly to branching, we can add tagged unions.

17.3.2 Static differentiation in λAL

Definition 10.5.1 defines differentiation for simply-typed λ-calculus terms. Figure 17.4 shows differentiation for λAL syntax.

Dι⟦x⟧ = dx
Dι⟦let y = (x) in t⟧ = let y = (x), dy = (dx) in Dι⟦t⟧
Dι⟦let y = f x in t⟧ = let y = f x, dy = df x dx in Dι⟦t⟧

Figure 17.4 Static differentiation in λAL.

Differentiating a base term t produces a change term Dι⟦t⟧, its derivative. Differentiating the final result variable x produces its change variable dx. Differentiation copies each binding of an intermediate result y to the output and adds a new binding for its change dy. If y is bound to tuple (x), then dy will be bound to the change tuple (dx). If y is bound to function application f x, then dy will be bound to the application of function change df to input x and its change dx.
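For instance, applying these rules to a term that calls f and then builds a pair of its input and its output gives the following derivative (a sketch; f, x, y and z are arbitrary names chosen for the example):

  Dι⟦let y = f x in let z = (x, y) in z⟧
    = let y = f x, dy = df x dx in
      let z = (x, y), dz = (dx, dy) in dz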

Evaluating Dι⟦t⟧ recomputes all intermediate results computed by t. This recomputation will be avoided through cache-transfer style in Sec. 17.3.5.

The original transformation for static differentiation of λ-terms [Cai et al. 2014] has three cases, which we recall here:

Derive(x) = dx
Derive(t u) = Derive(t) u Derive(u)
Derive(λx. t) = λx dx. Derive(t)

Even though the first two cases of Cai et al.'s differentiation map into the two cases of our differentiation variant, one may ask where the third case is realized now. Actually, this third case occurs while we transform the initial environment. Indeed, we will assume that the closures of the environment of the source program have been extended with a derivative. More formally, we suppose that the derivative of t is evaluated under an environment Dι⟦E⟧ obtained as follows:

Dι⟦•⟧ = •
Dι⟦E, f = Ef[λx. t]⟧ = Dι⟦E⟧, f = Ef[λx. t], df = Dι⟦Ef⟧[λx dx. Dι⟦t⟧]
Dι⟦E, x = v⟧ = Dι⟦E⟧, x = v, dx = nil    (if v is not a closure)


17.3.3 A new soundness proof for static differentiation

As in Cai et al. [2014]'s development, static differentiation is only sound on input changes that are valid. However, our definition of validity needs to be significantly different. Cai et al. prove soundness for a strongly normalizing simply-typed λ-calculus using denotational semantics. We generalize this result to an untyped and Turing-complete language using purely syntactic arguments. In both scenarios, a function change is only valid if it turns valid input changes into valid output changes, so validity is a logical relation. Since standard logical relations only apply to typed languages, we turn to step-indexed logical relations.

Validity as a step-indexed logical relation

To state and prove soundness of differentiation, we define validity by introducing a ternary step-indexed relation over base values, changes and updated values, following previous work on step-indexed logical relations [Ahmed 2006; Acar et al. 2008]. Experts might notice small differences in our step-indexing, as mentioned in Sec. 17.3.1, but they do not affect the substance of the proof. We write

  dv ▷k v1 ↪ v2

and say that "dv is a valid change from v1 to v2, up to k steps" to mean that dv is a change from v1 to v2 and that dv is a valid description of the differences between v1 and v2, with validity tested with up to k steps. To justify this intuition of validity, we state and prove two lemmas: a valid change from v1 to v2 goes indeed from v1 to v2 (Lemma 17.3.1), and if a change is valid up to k steps it is also valid up to fewer steps (Lemma 17.3.2).

Lemma 17.3.1 (⊕ agrees with validity)
If we have dv ▷k v1 ↪ v2 for all step-indexes k, then v1 ⊕ dv = v2.

Lemma 17.3.2 (Downward-closure)
If N ≥ n, then dv ▷N v1 ↪ v2 implies dv ▷n v1 ↪ v2.

• d𝓁 ▷n 𝓁 ↪ δ⊕(𝓁, d𝓁)

• !v2 ▷n v1 ↪ v2

• nil ▷n v ↪ v

• (dv1, …, dvm) ▷n (v1, …, vm) ↪ (v1′, …, vm′)
  if and only if ∀k < n, ∀i ∈ [1 … m]. dvi ▷k vi ↪ vi′

• dE[λx dx. dt] ▷n E1[λx. t] ↪ E2[λx. t]
  if and only if E2 = E1 ⊕ dE and ∀k < n, ∀v1, dv, v2,
  if dv ▷k v1 ↪ v2
  then (dE, x = v1, dx = dv ⊢ dt) ⊩k (E1, x = v1 ⊢ t) ↪ (E2, x = v2 ⊢ t)

Figure 17.5 Step-indexed relation between values and changes.

As usual with step-indexed logical relations, validity is defined by well-founded induction over naturals ordered by <. To show this, it helps to observe that evaluation always takes at least one step.

Validity is formally defined by cases in Figure 17.5; we describe each case in turn. First, a constant change d𝓁 is a valid change from 𝓁 to 𝓁 ⊕ d𝓁 = δ⊕(𝓁, d𝓁). Since the function δ⊕ is partial, the relation only holds for those constant changes d𝓁 which are valid changes for 𝓁. Second, a replacement change !v2 is always a valid change from any value v1 to v2. Third, a nil change is a valid change between any value and itself. Fourth, a tuple change is valid up to step n if each of its components is valid up to any step strictly less than n. Fifth, we define validity for closure changes. Roughly


speaking, this statement means that a closure change is valid if (i) its environment change dE is valid for the original closure environment E1 and for the new closure environment E2; (ii) when applied to related values, the closure bodies t1 and t2 are related by dt. The validity relation between terms is defined as follows:

(dE ⊢ dt) ⊩n (E1 ⊢ t1) ↪ (E2 ⊢ t2)
if and only if ∀k < n, ∀v1, v2,
E1 ⊢ t1 ⇓k v1 and E2 ⊢ t2 ⇓ v2
implies that ∃dv. dE ⊢ dt ⇓ dv and dv ▷(n−k) v1 ↪ v2

We extend this relation from values to environments by defining a judgment dE ▷k E1 ↪ E2, defined as follows:

  ─────────────
  • ▷k • ↪ •

  dE ▷k E1 ↪ E2        dv ▷k v1 ↪ v2
  ─────────────────────────────────────────────────────
  (dE, x = v1, dx = dv) ▷k (E1, x = v1) ↪ (E2, x = v2)

The above lemmas about validity for values extend to environments.

Lemma 17.3.3 (⊕ agrees with validity, for environments)
If dE ▷k E1 ↪ E2, then E1 ⊕ dE = E2.

Lemma 17.3.4 (Downward-closure, for environments)
If N ≥ n, then dE ▷N E1 ↪ E2 implies dE ▷n E1 ↪ E2.

Finally, for values, terms and environments, omitting the step count k from validity means that validity holds for all k's. That is, for instance, dv ▷ v1 ↪ v2 means dv ▷k v1 ↪ v2 for all k.

Soundness of differentiation

We can state a soundness theorem for differentiation without mentioning step-indexes. Instead of proving it directly, we must first prove a more technical statement (Lemma 17.3.6) that mentions step-indexes explicitly.

Theorem 17.3.5 (Soundness of differentiation in λAL)
If dE is a valid change environment from base environment E1 to updated environment E2 (that is, dE ▷ E1 ↪ E2), and if t converges both in the base and updated environment (that is, E1 ⊢ t ⇓ v1 and E2 ⊢ t ⇓ v2), then Dι⟦t⟧ evaluates under the change environment dE to a valid change dv between base result v1 and updated result v2 (that is, dE ⊢ Dι⟦t⟧ ⇓ dv, dv ▷ v1 ↪ v2 and v1 ⊕ dv = v2).

Lemma 17.3.6 (Fundamental Property)
For each n, if dE ▷n E1 ↪ E2, then (dE ⊢ Dι⟦t⟧) ⊩n (E1 ⊢ t) ↪ (E2 ⊢ t).

17.3.4 The target language iλAL

In this section, we present the target language of a transformation that extends static differentiation with CTS conversion. As said earlier, the functions of iλAL compute both their output and a cache, which contains the intermediate values that contributed to that output. Derivatives receive this cache and use it to compute changes without recomputing intermediate values; derivatives also update the caches according to their input changes.


Base terms
  M ::= let y, c_fx^y = f x in M    (Call)
      | let y = (x) in M            (Tuple)
      | (x, C)                      (Result)

Cache terms
  C ::= •            (Empty)
      | C, c_fx^y    (Sub-cache)
      | C, x         (Cached value)

Change terms
  dM ::= let dy, c_fx^y = df dx c_fx^y in dM   (Call)
       | let dy = (dx) in dM                   (Tuple)
       | (dx, dC)                              (Result)

Cache updates
  dC ::= •                (Empty)
       | dC, c_fx^y       (Sub-cache)
       | dC, (x ⊕ dx)     (Updated value)

Closed values
  V ::= F[λx. M]   (Closure)
      | (V)        (Tuple)
      | 𝓁          (Literal)
      | p          (Primitive)

Cache values
  Vc ::= •          (Empty)
       | Vc, Vc′    (Sub-cache)
       | Vc, V      (Cached value)

Change values
  dV ::= dF[λdx C. dM]   (Closure)
       | (dV)            (Tuple)
       | d𝓁              (Literal)
       | nil             (Nil)
       | !V              (Replacement)

Base definitions
  Dv ::= x = V         (Value definition)
       | c_fx^y = Vc   (Cache definition)

Change definitions
  dDv ::= Dv        (Base)
        | dx = dV   (Change)

Evaluation environments
  F ::= F, Dv   (Binding)
      | •       (Empty)

Change environments
  dF ::= dF, dDv   (Binding)
       | •         (Empty)

Figure 17.6 Target language iλAL (syntax).

Syntax The syntax of iλAL is defined in Figure 17.6. Base terms of iλAL follow again λ-lifted A'NF, like λAL, except that a let-binding for a function application f x now binds an extra cache identifier c_fx^y besides output y. Cache identifiers have non-standard syntax: c_fx^y can be seen as a triple that refers to the value identifiers f, x and y. Hence, an α-renaming of one of these three identifiers must refresh the cache identifier accordingly. Result terms explicitly return cache C through syntax (x, C).

The syntax for caches has three cases: a cache can be empty, or it can prepend a value or a cache variable to a cache. In other words, a cache is a tree-like data structure which is isomorphic to an execution trace, containing both immediate values and the execution traces of the function calls issued during the evaluation.

The syntax for change terms of iλAL witnesses the CTS discipline followed by the derivatives: to determine dy, the derivative of f evaluated at point x with change dx expects the cache produced by evaluating y in the base term. The derivative returns the updated cache, which contains the intermediate results that would be gathered by the evaluation of f (x ⊕ dx). The result term of every change term returns a cache update dC in addition to the computed change.

The syntax for cache updates resembles the one for caches, but each value identifier x of the input cache is updated with its corresponding change dx.

Semantics An evaluation environment F of iλAL contains not only values but also cache values. The syntax for values V includes closures, tuples, primitives and constants. The syntax for cache values Vc mimics the one for cache terms. The evaluation of change terms expects the evaluation environments dF to also include bindings for change values.

There are five kinds of change values: closure changes, tuple changes, literal changes, nil changes and replacements. Closure changes embed an environment dF and a code pointer for a function waiting for both a base value x and a cache C. By abuse of notation, we reuse the same syntax C to both deconstruct and construct caches. Other changes are similar to the ones found in λAL.

Base terms of the language are evaluated using a big-step semantics defined in Figure 17.7.


Evaluation of base terms F ⊢ M ⇓ (V, Vc):

[TResult]
  F(x) = V        F ⊢ C ⇓ Vc
  ─────────────────────────────
  F ⊢ (x, C) ⇓ (V, Vc)

[TTuple]
  F, y = F(x) ⊢ M ⇓ (V, Vc)
  ──────────────────────────────────
  F ⊢ let y = (x) in M ⇓ (V, Vc)

[TClosureCall]
  F(f) = F′[λx′. M′]
  F′, x′ = F(x) ⊢ M′ ⇓ (V′, Vc′)
  F, y = V′, c_fx^y = Vc′ ⊢ M ⇓ (V, Vc)
  ─────────────────────────────────────────
  F ⊢ let y, c_fx^y = f x in M ⇓ (V, Vc)

[TPrimitiveCall]
  F(f) = p        δp(F(x)) = (V′, Vc′)
  F, y = V′, c_fx^y = Vc′ ⊢ M ⇓ (V, Vc)
  ─────────────────────────────────────────
  F ⊢ let y, c_fx^y = f x in M ⇓ (V, Vc)

Evaluation of caches F ⊢ C ⇓ Vc:

[TEmptyCache]
  ─────────────
  F ⊢ • ⇓ •

[TCacheVar]
  F(x) = V        F ⊢ C ⇓ Vc
  ─────────────────────────────
  F ⊢ C, x ⇓ Vc, V

[TCacheSubCache]
  F(c_fx^y) = Vc′        F ⊢ C ⇓ Vc
  ──────────────────────────────────
  F ⊢ C, c_fx^y ⇓ Vc, Vc′

Figure 17.7 Target language iλAL (semantics of base terms and caches).

Judgment "F ⊢ M ⇓ (V, Vc)" is read "Under evaluation environment F, base term M evaluates to value V and cache Vc." Auxiliary judgment "F ⊢ C ⇓ Vc" evaluates cache terms into cache values. Rule [TResult] not only looks into the environment for the return value V, but it also evaluates the returned cache C. Rule [TTuple] is similar to the rule of the source language, since no cache is produced by the allocation of a tuple. Rule [TClosureCall] works exactly as [SClosureCall], except that the cache value returned by the closure is bound to cache identifier c_fx^y. In the same way, [TPrimitiveCall] resembles [SPrimitiveCall], but also binds c_fx^y. Rule [TEmptyCache] evaluates an empty cache term into an empty cache value. Rule [TCacheVar] computes the value of cache term C, x by appending the value of variable x to cache value Vc computed for cache term C. Similarly, rule [TCacheSubCache] appends the cache value of a cache named c_fx^y to the cache value Vc computed for C.

Change terms of the target language are also evaluated using a big-step semantics, defined in Fig. 17.8. Judgment "dF ⊢ dM ⇓ (dV, Vc)" is read "Under evaluation environment dF, change term dM evaluates to change value dV and updated cache Vc." The first auxiliary judgment "dF ⊢ dC ⇓ Vc" defines evaluation of cache update terms. We omit the rules for this judgment, since it is similar to the one for cache terms, except that cached values are computed by x ⊕ dx, not simply x. The final auxiliary judgment "Vc ∼ C → dF" describes a limited form of pattern matching used by CTS derivatives: namely, how a cache pattern C matches a cache value Vc to produce a change environment dF.

Rule [TDResult] returns the final change value of a computation, as well as an updated cache resulting from the evaluation of the cache update term dC. Rule [TDTuple] resembles its counterpart in the source language, but the tuple for y is not built, as it has already been pushed in the environment by the cache.

As for λAL, there are four rules to deal with let-bindings, depending on the shape of the change bound to df in the environment. If df is bound to a replacement, the rule [TDReplaceCall] applies.


Evaluation of change terms dF ⊢ dM ⇓ (dV, Vc):

[TDResult]
  dF(dx) = dV        dF ⊢ dC ⇓ Vc
  ──────────────────────────────────
  dF ⊢ (dx, dC) ⇓ (dV, Vc)

[TDTuple]
  dF, dy = dF(dx) ⊢ dM ⇓ (dV, Vc)
  ─────────────────────────────────────
  dF ⊢ let dy = (dx) in dM ⇓ (dV, Vc)

[TDReplaceCall]
  dF(df) = !Vf
  ⌊dF⌋₂ ⊢ f x ⇓ (V′, Vc′)
  dF, dy = !V′, c_fx^y = Vc′ ⊢ dM ⇓ (dV, Vc)
  ─────────────────────────────────────────────────────
  dF ⊢ let dy, c_fx^y = df dx c_fx^y in dM ⇓ (dV, Vc)

[TDClosureNil]
  dF(f, df) = (Ff[λx. Mf], nil)
  Ff, x = ⌊dF⌋₂(x) ⊢ Mf ⇓ (Vy′, Vc′)
  dF, dy = !Vy′, c_fx^y = Vc′ ⊢ dM ⇓ (dV, Vc)
  ─────────────────────────────────────────────────────
  dF ⊢ let dy, c_fx^y = df dx c_fx^y in dM ⇓ (dV, Vc)

[TDPrimitiveNil]
  dF(f, df) = (p, nil)        dF(x, dx) = (Vx, dVx)
  dF, (dy, c_fx^y) = ∆p(Vx, dVx, dF(c_fx^y)) ⊢ dM ⇓ (dV, Vc)
  ─────────────────────────────────────────────────────
  dF ⊢ let dy, c_fx^y = df dx c_fx^y in dM ⇓ (dV, Vc)

[TDClosureChange]
  dF(df) = dFf[λdx C. dMf]        dF(c_fx^y) ∼ C → dF′
  dFf, dx = dF(dx), dF′ ⊢ dMf ⇓ (dVy, Vc′)
  dF, dy = dVy, c_fx^y = Vc′ ⊢ dM ⇓ (dV, Vc)
  ─────────────────────────────────────────────────────
  dF ⊢ let dy, c_fx^y = df dx c_fx^y in dM ⇓ (dV, Vc)

Binding of caches Vc ∼ C → dF:

[TMatchEmptyCache]
  ─────────────
  • ∼ • → •

[TMatchCachedValue]
  Vc ∼ C → dF
  ──────────────────────────────
  Vc, V ∼ C, x → dF, (x = V)

[TMatchSubCache]
  Vc ∼ C → dF
  ──────────────────────────────────────────
  Vc, Vc′ ∼ C, c_fx^y → dF, (c_fx^y = Vc′)

Figure 17.8 Target language iλAL (semantics of change terms and cache updates).


CTS translation of toplevel definitions C⟦f = λx. t⟧:

C⟦f = λx. t⟧ = f = λx. M
               df = λdx C. D_{•,(x⊕dx)}⟦t⟧
  where (C, M) = C_{•,x}⟦t⟧

CTS differentiation of terms D_dC⟦t⟧:

D_dC⟦let y = f x in t⟧ = let dy, c_fx^y = df dx c_fx^y in M
  where M = D_{dC,(y⊕dy),c_fx^y}⟦t⟧
D_dC⟦let y = (x) in t⟧ = let dy = (dx) in M
  where M = D_{dC,(y⊕dy)}⟦t⟧
D_dC⟦x⟧ = (dx, dC)

CTS translation of terms C_C⟦t⟧:

C_C⟦let y = f x in t⟧ = (C′, let y, c_fx^y = f x in M)
  where (C′, M) = C_{C,y,c_fx^y}⟦t⟧
C_C⟦let y = (x) in t⟧ = (C′, let y = (x) in M)
  where (C′, M) = C_{C,y}⟦t⟧
C_C⟦x⟧ = (C, (x, C))

Figure 17.9 Cache-Transfer Style (CTS) conversion from λAL to iλAL.

In that case, we reevaluate the function call in the updated environment ⌊dF⌋₂ (defined similarly as in the source language). This evaluation leads to a new value V′ which replaces the original one, as well as an updated cache for c_fx^y.

If df is bound to a nil change and f is bound to a closure, the rule [TDClosureNil] applies. This rule mimics again its counterpart in the source language, with the difference that only the resulting change and the updated cache are bound in the environment.

If df is bound to a nil change and f is bound to primitive p, the rule [TDPrimitiveNil] applies. The derivative of p is invoked with the value of x, its change value, and the cache of the original call to p. The semantics of p's derivative is given by builtin function ∆p(–), as in the source language.

If df is bound to a closure change and f is bound to a closure, the rule [TDClosureChange] applies. The body of the closure change is evaluated under the closure change environment, extended with the value of the formal argument dx and with the environment resulting from the binding of the original cache value to the variables occurring in the cache pattern C. This evaluation leads to a change and an updated cache, bound in the environment to continue with the evaluation of the rest of the term.

17.3.5 CTS conversion from λAL to iλAL

CTS conversion from λAL to iλAL is defined in Figure 17.9. It comprises CTS differentiation D⟦–⟧, from λAL base terms to iλAL change terms, and CTS translation C⟦–⟧, from λAL to iλAL, which is overloaded over top-level definitions C⟦f = λx. t⟧ and terms C_C⟦t⟧.

By the first rule, C⟦–⟧ maps each source toplevel definition "f = λx. t" to the compiled code of the function f and to the derivative df of f, expressed in the target language iλAL. These target definitions are generated by a first call to the compilation function C_{•,x}⟦t⟧: it returns both M, the compiled body of f, and the cache term C, which contains the names of the intermediate values


CTS translation of values C⟦v⟧:

C⟦E[λx. t]⟧ = C⟦E⟧[λx. M]
  where (C, M) = C_{•,x}⟦t⟧
C⟦𝓁⟧ = 𝓁

CTS translation of change values C⟦dv⟧:

C⟦dE[λx dx. Dι⟦t⟧]⟧ = C⟦dE⟧[λdx C. D_{•,(x⊕dx)}⟦t⟧]
  where (C, M) = C_{•,x}⟦t⟧
C⟦!v⟧ = !C⟦v⟧
C⟦d𝓁⟧ = d𝓁

CTS translation of value environments C⟦E⟧:

C⟦•⟧ = •
C⟦E, x = v⟧ = C⟦E⟧, x = C⟦v⟧

CTS translation of change environments C⟦dE⟧:

C⟦•⟧ = •
C⟦dE, x = v, dx = dv⟧ = C⟦dE⟧, x = C⟦v⟧, dx = C⟦dv⟧

Figure 17.10 Extending CTS translation to values, change values, environments and change environments.

computed by the evaluation of M. This cache term C is used as a cache pattern to define the second argument of the derivative of f. That way, we make sure that the shape of the cache expected by df is consistent with the shape of the cache produced by f. The derivative body of df is computed by the derivation call D_{•,(x⊕dx)}⟦t⟧.

CTS translation on terms, C_C⟦t⟧, accepts a term t and a cache term C. This cache is a fragment of output code: in tail position (t = x), it generates code to return both the result x and the cache C. When the transformation visits let-bindings, it outputs extra bindings for caches c_fx^y and appends all variables newly bound in the output to the cache used when visiting the let-body.

Similarly to C_C⟦t⟧, CTS derivation D_dC⟦t⟧ accepts a cache update dC to return in tail position. While cache terms record intermediate results, cache updates record result updates. For let-bindings, to update y by change dy, CTS derivation appends to dC the term y ⊕ dy, to replace y.
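As a concrete sketch, converting the example term differentiated after Figure 17.4 (abbreviating the cache identifier c_fx^y as c) yields:

  C_{•,x}⟦let y = f x in let z = (x, y) in z⟧
    = ((•, x, y, c, z), let y, c = f x in let z = (x, y) in (z, (•, x, y, c, z)))

  D_{•,(x⊕dx)}⟦let y = f x in let z = (x, y) in z⟧
    = let dy, c = df dx c in
      let dz = (dx, dy) in (dz, (•, x⊕dx, y⊕dy, c, z⊕dz))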

17.3.6 Soundness of CTS conversion

In this section, we outline definitions and main lemmas needed to prove CTS conversion sound. The proof is based on a mostly straightforward simulation in lock-step, but two subtle points emerge. First, we must relate λAL environments that do not contain caches with iλAL environments that do. Second, while evaluating CTS derivatives, the evaluation environment mixes caches from the base computation and updated caches computed by the derivatives.

Evaluation commutes with CTS conversion Figure 17.10 extends CTS translation to values, change values, environments and change environments. CTS translation of base terms commutes with our semantics:


  ────────────────
  dF ↑ • ↝ᵢ dF

  dF ↑ C ↝ᵢ dF′
  ────────────────────
  dF ↑ C, x ↝ᵢ dF′

  dF ↑ C ↝ᵢ dF′        ⌊dF⌋ᵢ ⊢ let y, c_fx^y = f x in (y, c_fx^y) ⇓ (_, Vc)
  ─────────────────────────────────────────────────────────────────────────
  dF ↑ C, c_fx^y ↝ᵢ dF′, c_fx^y = Vc

Figure 17.11 Extension of an environment with cache values, dF ↑ C ↝ᵢ dF′ (for i = 1, 2).

Lemma 17.3.7 (Evaluation commutes with CTS translation)
For all E, t and v such that E ⊢ t ⇓ v, and for all C, there exists Vc such that C⟦E⟧ ⊢ C_C⟦t⟧ ⇓ (C⟦v⟧, Vc).

Stating a corresponding lemma for CTS translation of derived terms is trickier. If the derivative of t evaluates correctly in some environment (that is, dE ⊢ Dι⟦t⟧ ⇓ dv), the CTS derivative D_dC⟦t⟧ cannot be evaluated in environment C⟦dE⟧. A CTS derivative can only evaluate against environments containing cache values from the base computation, but no cache values appear in C⟦dE⟧.

As a fix, we enrich C⟦dE⟧ with the values of a cache C, using the pair of judgments dF ↑ C ↝ᵢ dF′ (for i = 1, 2) defined in Fig. 17.11. Judgment dF ↑ C ↝ᵢ dF′ is read "Target change environment dF′ extends dF with the original (for i = 1), or updated (for i = 2), values of cache C." Since C is essentially a list of variables containing no values, cache values in dF′ must be computed from ⌊dF⌋ᵢ.

Lemma 17.3.8 (Evaluation commutes with CTS differentiation)
Let C be such that (C, _) = C_•⟦t⟧. For all dE, t and dv, if dE ⊢ Dι⟦t⟧ ⇓ dv and C⟦dE⟧ ↑ C ↝₁ dF, then there exists Vc such that dF ⊢ D_dC⟦t⟧ ⇓ (C⟦dv⟧, Vc).

The proof of this lemma is not immediate, since during the evaluation of D_dC⟦t⟧ the new caches replace the old caches. In our Coq development, we enforce a physical separation between the part of the environment containing old caches and the one containing new caches, and we maintain the invariant that the second part of the environment corresponds to the remaining part of the term.

Soundness of CTS conversion Finally, we can state soundness of CTS differentiation relative to differentiation. The theorem says that (a) the CTS derivative D_C⟦t⟧ computes the CTS translation C⟦dv⟧ of the change computed by the standard derivative Dι⟦t⟧; (b) the updated cache Vc2 produced by the CTS derivative coincides with the cache produced by the CTS-translated base term M in the updated environment ⌊dF2⌋₂. We must use ⌊dF2⌋₂ instead of dF2 to evaluate CTS-translated base term M, since dF2, produced by environment extension, contains updated caches, changes and original values. Since we require a correct cache via condition (b), we can use this cache to invoke the CTS derivative on further changes, as described in Sec. 17.2.2.

Theorem 17.3.9 (Soundness of CTS differentiation wrt differentiation)
Let C and M be such that (C, M) = C_•⟦t⟧. For all dE, t and dv such that dE ⊢ Dι⟦t⟧ ⇓ dv, and for all dF1 and dF2 such that

  C⟦dE⟧ ↑ C ↝₁ dF1
  C⟦dE⟧ ↑ C ↝₂ dF2
  ⌊dF1⌋₁ ⊢ M ⇓ (V1, Vc1)
  ⌊dF2⌋₂ ⊢ M ⇓ (V2, Vc2)

we have

  dF1 ⊢ D_C⟦t⟧ ⇓ (C⟦dv⟧, Vc2).


17.4 Incrementalization case studies

In this section, we investigate whether our transformations incrementalize programs efficiently in a typed language such as Haskell. And indeed, after providing support for the needed bulk operations on sequences, bags and maps, we successfully transform a few case studies and obtain efficient incremental programs, that is, ones for which incremental computation is faster than from-scratch recomputation.

We type CTS programs by associating to each function f a cache type FC. We can transform programs that use higher-order functions by making closure construction explicit.
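As an illustration of this typing discipline, here is a minimal, self-contained sketch (not the code of our case studies): lists stand in for bags, a bag change is simplified to a list of inserted elements, and the contents of the cache type AvgC are invented for the example.

    -- Sketch: the CTS variant of a function returns its result plus a cache
    -- of intermediate results; the derivative threads that cache.
    data AvgC = AvgC Int Int  -- assumed cache: the intermediate sum and length

    avgC :: [Int] -> (Int, AvgC)
    avgC xs =
      let s = sum xs
          n = length xs
      in (s `div` n, AvgC s n)

    -- The derivative consumes an input change (here: inserted elements) and
    -- the old cache, and produces an output change plus the updated cache,
    -- without retraversing the old input.
    davgC :: [Int] -> AvgC -> (Int, AvgC)
    davgC ins (AvgC s n) =
      let s' = s + sum ins
          n' = n + length ins
      in (s' `div` n' - s `div` n, AvgC s' n')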

We illustrate those points on three case studies: average computation over bags of integers, a nested loop over two sequences, and a more involved example inspired by Koch et al.'s work on incrementalizing database queries.

In all cases, we confirm that results are consistent between from-scratch recomputation and incremental evaluation.

Our benchmarks were compiled by GHC 8.0.2. They were run on a 2.60GHz dual-core Intel i7-6600U CPU with 12GB of RAM, running Ubuntu 16.04.

17.4.1 Averaging integer bags

Sec. 17.2.2 motivates our transformation with a running example of computing the average over a bag of integers. We represent bags as maps from elements to (possibly negative) multiplicities. Earlier work [Cai et al. 2014; Koch et al. 2014] represents bag changes as bags of removed and added elements: to update element a with change da, one removes a and adds a ⊕ da. Instead, our bag changes contain element changes: a valid bag change is either a replacement change (with a new bag) or a list of atomic changes. An atomic change contains an element a, a valid change for that element da, a multiplicity n and a change to that multiplicity dn. Updating a bag with an atomic change means removing n occurrences of a and inserting n ⊕ dn occurrences of updated element a ⊕ da.

    type Bag a = Map a ℤ
    type ∆(Bag a) = Replace (Bag a) | Ch [AtomicChange a]
    data AtomicChange a = AtomicChange a (∆a) ℤ (∆ℤ)
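To make the update operation ⊕ on bags concrete, here is a hedged sketch using Data.Map; applyAtomic and oplusElem are names we introduce for illustration, and replacement changes are omitted.

    import qualified Data.Map.Strict as Map

    -- Apply one atomic change (a, da, n, dn): remove n occurrences of a, then
    -- insert n + dn occurrences of the updated element a ⊕ da. Multiplicities
    -- are integers, so ⊕ on them is addition.
    applyAtomic :: Ord a
                => (a -> da -> a)      -- ⊕ on elements, supplied by the caller
                -> Map.Map a Int       -- the bag
                -> (a, da, Int, Int)   -- the atomic change
                -> Map.Map a Int
    applyAtomic oplusElem bag (a, da, n, dn) =
      Map.filter (/= 0)                                  -- drop zero multiplicities
        (Map.insertWith (+) (oplusElem a da) (n + dn)    -- insert n ⊕ dn copies of a ⊕ da
          (Map.insertWith (+) a (negate n) bag))         -- remove n copies of a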

This change type enables both users and derivatives to express various bag changes efficiently, such as inserting a single element or changing a single element, but also changing some but not all occurrences of an element in a bag.

    insertSingle :: a → ∆(Bag a)
    insertSingle a = Ch [AtomicChange a 0_a 0 (Ch 1)]

    changeSingle :: a → ∆a → ∆(Bag a)
    changeSingle a da = Ch [AtomicChange a da 1 0_1]

Based on this change structure, we provide efficient primitives and their derivatives. The CTS variant of map, that we call mapC, takes a function fC in CTS and a bag as and produces a bag and a cache. Because map is not self-maintainable, the cache stores the arguments fC and as. In addition, the cache stores, for each invocation of fC, and therefore for each distinct element in as, the result of fC of type b and the cache of type c. The incremental function dmapC has to produce an updated cache. We check that this is the same cache that mapC would produce when applied to updated inputs, using the QuickCheck library.


Following ideas inspired by Rossberg et al. [2010], all higher-order functions (and typically also their caches) are parametric over the cache types of their function arguments. Here, functions mapC and dmapC and cache type MapC are parametric over the cache type c of fC and dfC.

    map :: (a → b) → Bag a → Bag b

    type MapC a b c = (a → (b, c), Bag a, Map a (b, c))

    mapC :: (a → (b, c)) → Bag a → (Bag b, MapC a b c)
    dmapC :: (∆a → c → (∆b, c)) → ∆(Bag a) →
             MapC a b c → (∆(Bag b), MapC a b c)

We wrote the length and sum functions used in our benchmarks in terms of primitives map and foldGroup. We applied our transformations to get lengthC, dlengthC, sumC and dsumC. This transformation could in principle be automated; implementing the CTS variant and CTS derivative of primitives efficiently requires manual work.

We evaluate whether we can produce an updated result with davgC faster than by from-scratch recomputation with avg. We use the criterion library for benchmarking. The benchmarking code and our raw results are available in the supplementary material. We fix an initial bag of size n: it contains the integers from 1 to n. We define a sequence of consecutive changes of length r as the insertion of 1, the deletion of 1, the insertion of 2, the deletion of 2, and so on. We update the initial bag with these changes, one after another, to obtain a sequence of input bags.

We measure the time taken by from-scratch recomputation. We apply avg to each input bag in the sequence and fully force all results. We report the time from-scratch recomputation takes as this measured time divided by the number r of input bags. We measure the time taken by our CTS variant avgC similarly. We make sure to fully force all results and caches avgC produces on the sequence of input bags.

We measure the time taken by our CTS derivative davgC. We assume a fully forced initial result and cache. We apply davgC to each change in the sequence of bag changes, using the cache from the previous invocation. We then apply the result change to the previous result. We fully force the obtained sequence of results. Finally, we measure the time taken to only produce the sequence of result changes, without applying them.

Because of laziness, choosing what exactly to measure is tricky. We ensure that reported times include the entire cost of computing the whole sequence of results. To measure this cost, we ensure any thunks within the sequence are forced. We need not force any partial results, such as output changes or caches. Doing so would distort the results, because forcing the whole cache can cost asymptotically more than running derivatives. However, not using the cache at all underestimates the cost of incremental computation. Hence, we measure a sequence of consecutive updates, each using the cache produced by the previous one.

The plot in Fig. 17.12a shows execution time versus the size n of the initial input. To produce the initial result and cache, avgC takes longer than the original avg function takes to produce just the result. This is to be expected. Producing the result incrementally is much faster than from-scratch recomputation. This speedup increases with the size of the initial bag: for an initial bag size of 800, incrementally updating the result is 70 times faster than from-scratch recomputation.

The difference between only producing the output change versus producing the output change and updating the result is very small in this example. This is to be expected, because in this example the result is an integer, and updating and evaluating an integer is fast. We will see an example where there is a bigger difference between the two.


[Figure 17.12: execution time (in milliseconds) against input size (200 to 800) for the original, caching, change-only and incremental variants. (a) Benchmark results for avg. (b) Benchmark results for totalPrice.]

17.4.2 Nested loops over two sequences

We implemented incremental sequences and related primitives following Firsov and Jeltsch [2016]: our change operations and first-order operations (such as concat) reuse their implementation. On the other hand, we must extend higher-order operations such as map to handle non-nil function changes and caching. A correct and efficient CTS incremental dmapC function has to work differently depending on whether the given function change is nil or not: for a non-nil function change it has to go over the input sequence; for a nil function change it can avoid that.

Consider the running example again, this time in A'NF. The partial application of a lambda-lifted function constructs a closure. We made that explicit with a closure function.

    cartesianProduct xs ys = let
      mapPairYs = closure mapPair ys
      xys = concatMap mapPairYs xs
      in xys

    mapPair ys = λx → let
      pairX = closure pair x
      xys = map pairX ys
      in xys

While the only valid change for closed functions is their nil change, for closures we can have non-nil function changes. We represent closed functions and closures as variants of the same type. Correspondingly, we represent changes to a closed function and changes to a closure as variants of the same type of function changes. We inspect this representation at runtime to find out if a function change is a nil change, as sketched after the following declarations.

    data Fun a b c where
      Closed :: (a → (b, c)) → Fun a b c
      Closure :: (e → a → (b, c)) → e → Fun a b c

    data ∆(Fun a b c) where
      DClosed :: (∆a → c → (∆b, c)) → ∆(Fun a b c)
      DClosure :: (∆e → ∆a → c → (∆b, c)) → ∆e → ∆(Fun a b c)
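Given these declarations, the runtime dispatch can be sketched as follows (dapplyFun is a hypothetical helper, written in the same pseudo-Haskell as the declarations above):

    -- Apply a function change to an argument change and the cache of the
    -- original call; the variant of the change value determines the case.
    dapplyFun :: ∆(Fun a b c) → ∆a → c → (∆b, c)
    dapplyFun (DClosed df)     da cache = df da cache     -- closed function: no environment change
    dapplyFun (DClosure df de) da cache = df de da cache  -- closure: also pass the environment change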


[Figure: execution time (in seconds) against input size (200 to 800) for the original, caching, change-only and incremental variants. (a) Benchmark results for Cartesian product, changing the inner sequence. (b) Benchmark results for Cartesian product, changing the outer sequence.]

We have also evaluated another representation of function changes with different tradeoffs, where we use defunctionalization instead of closure conversion. We discuss the use of defunctionalization in Appendix D.

We use the same benchmark setup as in the benchmark for the average computation on bags. The input for size n is a pair of sequences (xs, ys). Each sequence initially contains the integers from 1 to n. Updating the result in reaction to a change dxs to xs takes less time than updating the result in reaction to a change dys to ys. The reason is that when dys is a nil change, we dynamically detect that the function change passed to dconcatMapC is the nil change and avoid looping over xs entirely.

We benchmark changes to the outer sequence xs and the inner sequence ys separately. In both cases, the sequence of changes consists of insertions and deletions of different numbers at different positions. When we benchmark changes to the outer sequence xs, we take this sequence of changes for xs, and all changes to ys will be nil changes. When we benchmark changes to the inner sequence ys, we take this sequence of changes for ys, and all changes to xs will be nil changes.

In this example, preparing the cache takes much longer than from-scratch recomputation. The speedup of incremental computation over from-scratch recomputation increases with the size of the initial sequences. It reaches 2.5× for a change to the inner sequence and 8.8× for a change to the outer sequence, when the initial sequences have size 800. Producing and fully forcing only the changes to the results is 5 times faster for a change to the inner sequence and 370 times faster for a change to the outer sequence.

While we do get speedups for both kinds of changes (to the inner and to the outer sequence), the speedups for changes to the outer sequence are bigger. Due to our benchmark methodology, we have to fully force updated results. This alone takes time quadratic in the size of the input sequences n. Therefore, we also report the time it takes to produce and fully force only the output changes, which is of a lower time complexity.

17.4.3 Indexed joins of two bags

As expected, we found that we can compose functions into larger and more complex programs and apply CTS differentiation to get a fast incremental program.

For this example, imagine we have a bag of orders and a bag of line items. An order is a pair of


an integer key and an exchange rate, represented as an integer. A line item is a pair of a key of a corresponding order and a price, represented as an integer. The price of an order is the sum of the prices of the corresponding line items, multiplied by the exchange rate. We want to find the total price, defined as the sum of the prices of all orders.

To do so efficiently, we build an index for line items and an index for orders. We represent indexes as maps. We build a map from order key to the sum of the prices of corresponding line items. Because orders are not necessarily unique by key, and because multiplication distributes over addition, we can build an index mapping each order key to the sum of all exchange rates for this order key. We then compute the total price by merging the two maps by key, multiplying corresponding sums of exchange rates and sums of prices. We compute the total price as the sum of those products.

Because our indexes are maps, we need a change structure for maps. We generalize the change structure for bags by generalizing from multiplicities to arbitrary values, as long as they have a group structure. A change to a map is either a replacement change (with a new map) or a list of atomic changes. An atomic change is a key k, a valid change to that key dk, a value a and a valid change to that value da. To update a map with an atomic change means to use the group structure of the type of a to subtract a at key k and to add a ⊕ da at key k ⊕ dk.

    type ∆(Map k a) = Replace (Map k a) | Ch [AtomicChange k a]
    data AtomicChange k a = AtomicChange k (∆k) a (∆a)

Bags have a group structure as long as the elements have a group structure. Maps have a group structure as long as the values have a group structure. This means, for example, that we can efficiently incrementalize programs on maps from keys to bags of elements. We implemented efficient caching and incremental primitives for maps and verified their correctness with QuickCheck.

To build the indexes, we use a groupBy function built from primitive functions foldMapGroup on bags and singleton for bags and maps, respectively, as sketched below. While computing the indexes with groupBy is self-maintainable, merging them is not. We need to cache and incrementally update the intermediately created indexes to avoid recomputing them.
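A hedged sketch of such a groupBy, assuming foldMapGroup :: Group m ⇒ (a → m) → Bag a → m and the obvious singleton functions (the exact signatures of our primitives may differ):

    -- Map each element to a singleton index entry and combine the entries
    -- using the group structure of maps of bags.
    groupBy :: Ord k ⇒ (a → k) → Bag a → Map k (Bag a)
    groupBy keyOf = Bag.foldMapGroup (λa → Map.singleton (keyOf a) (Bag.singleton a))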

    type Order = (ℤ, ℤ)
    type LineItem = (ℤ, ℤ)

    totalPrice :: Bag Order → Bag LineItem → ℤ
    totalPrice orders lineItems = let
      orderIndex = groupBy fst orders
      orderSumIndex = Map.map (Bag.foldMapGroup snd) orderIndex
      lineItemIndex = groupBy fst lineItems
      lineItemSumIndex = Map.map (Bag.foldMapGroup snd) lineItemIndex
      merged = Map.merge orderSumIndex lineItemSumIndex
      total = Map.foldMapGroup multiply merged
      in total

This example is inspired by Koch et al. [2014]. Unlike them, we don't automate indexing, so to get good performance we start from a program that explicitly uses indexes.

We evaluate the performance in the same way we did for the average computation on bags. The initial input of size n is a pair of bags where both contain the pairs (i, i) for i between 1 and n. The sequence of changes consists of alternations between insertion and deletion of pairs (j, j) for different j. We alternate the insertions and deletions between the orders bag and the line items bag.

Our CTS derivative of the original program produces updated results much faster than from-scratch recomputation, and again, the speedup increases with the size of the initial bags. For an


initial size of the two bags of 800, incrementally updating the result is 180 times faster than from-scratch recomputation.

17.5 Limitations and future work

In this section, we describe limitations to be addressed in future work.

17.5.1 Hiding the cache type

In our experiments, functions of the same type f1, f2 :: A → B can be transformed to CTS functions f1 :: A → (B, C1), f2 :: A → (B, C2) with different cache types C1, C2, since cache types depend on the implementation. We can fix this problem, with some runtime overhead, by using a single cache type Cache, defined as a tagged union of all cache types. If we defunctionalize function changes, we can index this cache type with tags representing functions, but other approaches are possible, and we omit details. We conjecture (but have not proven) this fix gives a type-preserving translation, but leave this question for future work.
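A minimal runnable sketch of this fix, with invented cache contents (Int stands in for the real C1, Bool for C2):

    data Cache = CacheF1 Int | CacheF2 Bool  -- one constructor per function

    f1 :: Int → (Int, Cache)
    f1 a = (a + 1, CacheF1 a)                -- f1 caches its input

    df1 :: Int → Cache → (Int, Cache)
    df1 da (CacheF1 a) = (da, CacheF1 (a + da))
    df1 _ _ = error "cache mismatch"         -- unreachable if the translation is type-preserving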

17.5.2 Nested bags

Our implementation of bags makes nested bags overly slow: we represent bags as tree-based maps from elements to multiplicity, so looking up a bag b in a bag of bags takes time proportional to the size of b. Possible solutions in this case include shredding, like done by Koch et al. [2016]. We have no such problem for nested sequences, or other nested data which can be addressed in O(1).

17.5.3 Proper tail calls

CTS transformation conflicts with proper tail calls, as it turns most tail calls into non-tail calls. In A'NF syntax, tail calls such as let y = f x in g y become let y = f x in let z = g y in z, and in CTS that becomes let (y, cy) = f x in let (z, cz) = g y in (z, (cy, cz)), where the call to g is genuinely not in tail position. This prevents recursion on deeply nested data structures like long lists, but such programs incrementalize inefficiently if deeply nested data is affected, so it is advisable to replace lists by other sequences anyway. It's unclear whether such fixes are available for other uses of tail calls.

17.5.4 Pervasive replacement values

Thanks to replacement changes, we can compute a change from any v1 to any v2 in constant time. Cai et al. [2014] use a difference operator ⊖ instead, but it's hard to implement ⊖ in constant time on values of non-constant size. So our formalization and implementation allow replacement values everywhere, to ensure all computations can be incrementalized in some sense. Supporting replacement changes introduces overhead even if they are not used, because it prevents writing self-maintainable CTS derivatives. Hence, to improve performance, one should consider dropping support for replacement values and restricting supported computations. Consider a call to a binary CTS derivative dfc da db c1, after computing (y1, c1) = fc a1 b1: if db is a replacement change !b2, then dfc must compute a result afresh by invoking fc a2 b2, and a2 = a1 ⊕ da requires remembering previous input a1 inside c1. By the same argument, c1 must also remember input b1. Worse, replacement values are only needed to handle cases where incremental computation reduces to recomputation, because the new input is completely different, or because the condition of an if


expression changed. Such changes to conditions are forbidden by other works [Koch et al. 2016], we believe for similar reasons.

17.5.5 Recomputing updated values

In some cases, the same updated input might be recomputed more than once. If a derivative df needs some base input x (that is, if df is not self-maintainable), df's input cache will contain a copy of x, and df's output cache will contain its updated value x ⊕ dx. When all or most derivatives are self-maintainable, this is convenient, because in most cases updated inputs will not need to be computed. But if most derivatives are not self-maintainable, the same updated input might be computed multiple times: specifically, if derivative dh calls functions df and dg, and both df and dg need the same base input x, caches for both df and dg will contain the updated value of x ⊕ dx, computed independently. Worse, because of pervasive replacement values (Sec. 17.5.4), derivatives in our case studies tend to not be self-maintainable.

In some cases, such repeated updates should be removable by a standard optimizer after inlining and common-subexpression elimination, but it is unclear how often this happens. To solve this problem, derivatives could take and return both old inputs x1 and updated ones x2 = x1 ⊕ dx, and x2 could be computed at the single location where dx is bound. In this case, to avoid updates for unused base inputs, we would have to rely more on absence analysis (Sec. 17.5.6); pruning function inputs appears easier than pruning caches. Otherwise, computations of updated inputs that are not used in a lazy context might cause space leaks, where thunks for x2 = x1 ⊕ dx1, x3 = x2 ⊕ dx2 and so on might accumulate and grow without bounds.

17.5.6 Cache pruning via absence analysis

To reduce memory usage and runtime overhead, it should be possible to automatically remove from transformed programs any caches or cache fragments that are not used (directly or indirectly) to compute outputs. Liu [2000] performs this transformation on CTS programs by using absence analysis, which was later extended to higher-order languages by Sergey et al. [2014]. In lazy languages, absence analysis removes thunks that are not needed to compute the output. We conjecture that, as long as the analysis is extended to not treat caches as part of the output, it should be able to remove unused caches or inputs (assuming unused inputs exist, see Sec. 17.5.4).

17.5.7 Unary vs n-ary abstraction

We only show our transformation correct for unary functions and tuples. But many languages provide efficient support for applying curried functions such as div :: ℤ → ℤ → ℤ, via either the push-enter or eval-apply evaluation model. For instance, invoking div m n should not require the overhead of a function invocation for each argument [Marlow and Jones 2006]. Naively transforming such a curried function to CTS would produce a function divc of type ℤ → ((ℤ → (ℤ, DivC2)), DivC1) with DivC1 = (), which adds excessive overhead.³ Based on preliminary experiments, we believe we can straightforwardly combine our approach with a push-enter or eval-apply evaluation strategy for transformed curried functions. Alternatively, we expect that translating Haskell to Strict Core [Bolingbroke and Peyton Jones 2009] would take care of turning Haskell into a language where function types describe their arity, which our transformation can easily take care of. We leave a more proper investigation for future work.

³ In Sec. 17.2 and our evaluation, we use curried functions and never need to use this naive encoding, but only because we always invoke functions of known arity.


17.6 Related work

Of all research on incremental computation in both programming languages and databases [Gupta and Mumick 1999; Ramalingam and Reps 1993], we discuss the most closely related works. Other work more closely related to cache-transfer style is discussed in Chapter 19.

Previous work on cache-transfer style Liu [2000]'s work has been the fundamental inspiration to this work, but her approach has no correctness proof and is restricted to a first-order untyped language (in part because no absence analysis for higher-order languages was available). Moreover, while the idea of cache-transfer style is similar, it's unclear if her approach to incrementalization would extend to higher-order programs.

Firsov and Jeltsch [2016] also approach incrementalization by code transformation, but their approach does not deal with changes to functions. Instead of transforming functions written in terms of primitives, they provide combinators to write CTS functions and derivatives together. On the other hand, they extend their approach to support mutable caches, while restricting to immutable ones, as we do, might lead to a logarithmic slowdown.

Finite differencing Incremental computation on collections or databases by finite differencing has a long tradition [Paige and Koenig 1982; Blakeley et al. 1986]. The most recent and impressive line of work is the one on DBToaster [Koch 2010; Koch et al. 2014], which is a highly efficient approach to incrementalize queries over bags by combining iterated finite differencing with other program transformations. They show asymptotic speedups both in theory and through experimental evaluations. Changes are only allowed for datatypes that form groups (such as bags or certain maps), but not, for instance, for lists or sets. Similar ideas were recently extended to higher-order and nested computation [Koch et al. 2016], though still restricted to datatypes that can be turned into groups.

Logical relations To study correctness of incremental programs, we use a logical relation among initial values v1, updated values v2 and changes dv. To define a logical relation for an untyped λ-calculus, we use a step-indexed logical relation, following [Appel and McAllester 2001; Ahmed 2006]; in particular, our definitions are closest to the ones by Acar et al. [2008], who also work with an untyped language, big-step semantics and (a different form of) incremental computation. However, their logical relation does not mention changes explicitly, since they do not have first-class status in their system. Moreover, we use environments rather than substitution, and use a slightly different step-indexing for our semantics.

Dynamic incrementalization The approaches to incremental computation with the widest applicability are in the family of self-adjusting computation [Acar 2005, 2009], including its descendant Adapton [Hammer et al. 2014]. These approaches incrementalize programs by combining memoization and change propagation: after creating a trace of base computations, updated inputs are compared with old ones in O(1) to find corresponding outputs, which are updated to account for input modifications. Compared to self-adjusting computation, Adapton only updates results when they are demanded. As usual, incrementalization is not efficient on arbitrary programs. To incrementalize efficiently, a program must be designed so that input changes produce small changes to the computation trace; refinement type systems have been designed to assist in this task [Çiçek et al. 2016; Hammer et al. 2016]. Instead of comparing inputs by pointer equality, Nominal Adapton [Hammer et al. 2015] introduces first-class labels to identify matching inputs, enabling reuse in more situations.


Recording traces often has significant overheads in both space and time (slowdowns of 20–30× are common), though Acar et al. [2010] alleviate that with datatype-specific support for tracing higher-level operations, while Chen et al. [2014] reduce that overhead by optimizing traces to not record redundant entries, and by logging chunks of operations at once, which reduces memory overhead but also potential speedups.

17.7 Chapter conclusion

We have presented a program transformation which turns a functional program into its derivative and efficiently shares redundant computations between them, thanks to a statically computed cache. Although our first practical case studies show promising results, it remains now to integrate this transformation into a realistic compiler.

Chapter 18

Towards differentiation for System F

Differentiation is closely related to both logical relations and parametricity, as noticed by various authors in discussions of differentiation. Appendix C presents novel proofs of ILC correctness by adapting some logical relation techniques. As a demonstration, we define in this chapter differentiation for System F, by adapting results about parametricity. We stop short of a full proof that this generalization is correct, but we have implemented and tested it on a mature implementation of a System F typechecker; we believe the proof will mostly be a straightforward adaptation of existing ones about parametricity, but we leave verification for future work. A key open issue is discussed in Remark 18.2.1.

History and motivation Various authors noticed that differentiation appears related to (binary) parametricity (including Atkey [2015]). In particular, it resembles a transformation presented by Bernardy and Lasson [2011] for arbitrary PTSs.¹ By analogy with unary parametricity, Yufei Cai sketched an extension of differentiation for arbitrary PTSs, but many questions remained open, and at the time, our correctness proof for ILC was significantly more involved and trickier to extend to System F, since it was defined in terms of denotational equivalence. Later, we reduced the proof core to defining a logical relation, proving its fundamental property and a few corollaries, as shown in Chapter 12. Extending this logical relation to System F proved comparably more straightforward.

Parametricity versus ILC Both parametricity and ILC define logical relations across program executions on different inputs. When studying parametricity, differences are only allowed in the implementations of abstractions (through abstract types or other mechanisms). To be related, different implementations of the same abstraction must give results that are equivalent according to the calling program. Indeed, parametricity defines not just a logical relation but a logical equivalence, that can be shown to be equivalent to contextual equivalence (as explained for instance by Harper [2016, Ch. 48] or by Ahmed [2006]).

When studying ILC, logical equivalence between terms t1 and t2 (written (t1, t2) ∈ P⟦τ⟧) appears to be generalized by the existence of a valid change dt between t1 and t2 (that is, dt ▷ t1 ↪ t2 : τ). As in earlier chapters, if terms t1 and t2 are equivalent, any valid change dt between them is a nil change, but validity allows describing other changes.

¹ Bernardy and Lasson were not the first to introduce such a transformation. But most earlier work focuses on System F, and our presentation follows theirs and uses their added generality. We refer for details to existing discussions of related work [Wadler 2007; Bernardy and Lasson 2011].



18.1 The parametricity transformation

First, we show a variant of their parametricity transformation, adapted to a variant of STLC without base types but with type variables. Presenting λ→ using type variables will help when we come back to System F, and allows discussing parametricity on STLC. This transformation is based on the presentation of STLC as calculus λ→, a Pure Type System (PTS) [Barendregt 1992, Sec. 5.2].

Background In PTSs, terms and types form a single syntactic category, but are distinguished through an extended typing judgement (written Γ ⊢ t : t) using additional terms called sorts. Function types σ → τ are generalized by Π-types Πx : s. t, where x can in general appear free in t, a generalization of Π-types in dependently-typed languages but also of universal types ∀X. T in the System F family; if x does not appear free in t, we write s → t. Typing restricts whether terms of some sort s1 can abstract over terms of sort s2; different PTSs allow different combinations of sorts s1 and s2 (specified by relations R), but lots of metatheory for PTSs is parameterized over the choice of combinations. In STLC presented as a PTS, there is an additional sort ⋆: those terms τ such that ⊢ τ : ⋆ are types.² We do not intend to give a full introduction to PTSs, only to introduce what's strictly needed for our presentation, and refer the readers to existing presentations [Barendregt 1992].

Instead of base types, λ→ uses uninterpreted type variables α, but does not allow terms to bind them. Nevertheless, we can write open terms, free in type variables for, say, naturals, and in term variables for operations on naturals. STLC restricts Π-types Πx : A. B to the usual arrow types A → B through R: one can show that in Πx : A. B, variable x cannot occur free in B.

Parametricity Bernardy and Lasson show how to transform a typed term Γ ⊢ t : τ in a strongly normalizing PTS P into a proof that t satisfies a parametricity statement for τ. The proof is in a logic represented by PTS P². PTS P² is constructed uniformly from P, and is strongly normalizing whenever P is. When the base PTS P is λ→, Bernardy and Lasson's transformation produces terms in a PTS λ2→, produced by transforming λ→. PTS λ2→ extends λ→ with a separate sort Prop of propositions, together with enough abstraction power to abstract propositions over values. Like λ→, λ2→ uses uninterpreted type variables α and does not allow abstracting over them.

In parametricity statements about λ→, we write (t1, t2) ∈ P⟦τ⟧ for a proposition (hence living in Prop) that states that t1 and t2 are related. This proposition is defined as usual via a logical relation. We write dxx for a proof that x1 and x2 are related. For type variables α, transformed terms abstract over an arbitrary relation Rα between α1 and α2. When α is instantiated by τ, Rα can (but does not have to) be instantiated with relation (–, –) ∈ P⟦τ⟧, but Rα abstracts over arbitrary candidate relations (similar to the notion of reducibility candidates [Girard et al. 1989, Ch. 14.1.1]). Allowing alternative instantiations makes parametricity statements more powerful, and it is also necessary to define parametricity for impredicative type systems (like System F) in a predicative ambient logic.³

A transformed term P⟦t⟧ relates two executions of term t in different environments: it can be read as a proof that term t maps related inputs to related outputs. The proof strategy that P⟦t⟧ uses is the standard one for proving fundamental properties of logical relations; but instead of doing a proof in the metalevel logic (by induction on terms and typing derivations), λ2→ serves as an object-level logic, and P⟦–⟧ generates proof terms in this logic by recursion on terms. At the metalevel, Bernardy and Lasson [2011, Th. 3] prove that for any well-typed term Γ ⊢ t : τ, proof P⟦t⟧

² Calculus λ→ also has a sort □ such that ⊢ ⋆ : □, but □ itself has no type. Such a sort appears only in PTS typing derivations, not in well-typed terms themselves, so we do not discuss it further.

³ Specifically, if we require Rα to be instantiated with the logical relation itself for τ when α is instantiated with τ, the definition becomes circular. Consider the definition of (t1, t2) ∈ P⟦∀α. τ⟧: type variable α can be instantiated with the same type ∀α. τ, so the definition of (t1, t2) ∈ P⟦∀α. τ⟧ becomes impredicative.


shows that t satisfies the parametricity statement for type τ, given the hypotheses P⟦Γ⟧ (or, more formally, P⟦Γ⟧ ⊢ P⟦t⟧ : (S1⟦t⟧, S2⟦t⟧) ∈ P⟦τ⟧).

(f1, f2) ∈ P⟦σ → τ⟧ = Π(x1 : S1⟦σ⟧) (x2 : S2⟦σ⟧) (dxx : (x1, x2) ∈ P⟦σ⟧).
                        (f1 x1, f2 x2) ∈ P⟦τ⟧
(x1, x2) ∈ P⟦α⟧ = Rα x1 x2

P⟦x⟧ = dxx
P⟦λ(x : σ) → t⟧ = λ(x1 : S1⟦σ⟧) (x2 : S2⟦σ⟧) (dxx : (x1, x2) ∈ P⟦σ⟧) →
                   P⟦t⟧
P⟦s t⟧ = P⟦s⟧ S1⟦t⟧ S2⟦t⟧ P⟦t⟧

P⟦ε⟧ = ε
P⟦Γ, x : τ⟧ = P⟦Γ⟧, x1 : S1⟦τ⟧, x2 : S2⟦τ⟧, dxx : (x1, x2) ∈ P⟦τ⟧
P⟦Γ, α⟧ = P⟦Γ⟧, α1, α2, Rα : α1 → α2 → Prop

In the above, S1⟦s⟧ and S2⟦s⟧ simply subscript all (term and type) variables in their arguments with 1 and 2, to make them refer to the first or second computation. To wit, the straightforward definition is:

Si⟦x⟧ = xi
Si⟦λ(x : σ) → t⟧ = λ(xi : Si⟦σ⟧) → Si⟦t⟧
Si⟦s t⟧ = Si⟦s⟧ Si⟦t⟧
Si⟦σ → τ⟧ = Si⟦σ⟧ → Si⟦τ⟧
Si⟦α⟧ = αi

It might be unclear how the proof P⟦t⟧ references the original term t. Indeed, it does not do so explicitly. But since in PTSs β-equivalent types have the same members, we can construct typing judgements that mention t. As mentioned, from Γ ⊢ t : τ it follows that P⟦Γ⟧ ⊢ P⟦t⟧ : (S1⟦t⟧, S2⟦t⟧) ∈ P⟦τ⟧ [Bernardy and Lasson 2011, Th. 3]. This is best shown on a small example.
Example 18.1.1
Take for instance an identity function id = λ(x : α) → x, which is typed in an open context (that is, α ⊢ λ(x : α) → x). The transformation gives us

pid = P⟦id⟧ = λ(x1 : α1) (x2 : α2) (dxx : (x1, x2) ∈ P⟦α⟧) → dxx

which simply returns the proof that the inputs are related:

α1, α2, Rα : α1 → α2 → ∗ᵖ ⊢

pid : Π(x1 : α1) (x2 : α2). (x1, x2) ∈ P⟦α⟧ → (x1, x2) ∈ P⟦α⟧.

This typing judgement does not mention id. But since x1 =β S1⟦id⟧ x1 and x2 =β S2⟦id⟧ x2,

we can also show that id's outputs are related whenever the inputs are:

α1, α2, Rα : α1 → α2 → ∗ᵖ ⊢

pid : Π(x1 : α1) (x2 : α2). (x1, x2) ∈ P⟦α⟧ → (S1⟦id⟧ x1, S2⟦id⟧ x2) ∈ P⟦α⟧.

More concisely, we can show that

α1, α2, Rα : α1 → α2 → ∗ᵖ ⊢ pid : (S1⟦id⟧, S2⟦id⟧) ∈ P⟦α → α⟧.


18.2 Differentiation and parametricity
We obtain a close variant of differentiation by altering the transformation for binary parametricity. We obtain a variant very close to the one investigated by Bernardy, Jansson and Paterson [2010]. Instead of only having proofs that values are related, we can modify (t1, t2) ∈ P⟦τ⟧ to be a type of values: more precisely, a dependent type (t1, t2) ∈ ∆2⟦τ⟧ of valid changes, indexed by source t1 : S1⟦τ⟧ and destination t2 : S2⟦τ⟧. Similarly, Rα is replaced by a dependent type of changes, not propositions, that we write ∆α. Formally, this is encoded by replacing sort ∗ᵖ with ∗ in Bernardy and Lasson [2011]'s transform, and by moving target programs to a PTS that allows for both simple and dependent types, like the Edinburgh Logical Framework; such a PTS is known as λP [Barendregt 1992].

For type variables α, we specialize the transformation further, ensuring that α1 = α2 = α (and adapting S1⟦–⟧ and S2⟦–⟧ accordingly, so as not to add subscripts to type variables). Without this specialization, we would have to deal with changes across different types, which we don't do just yet but defer to Sec. 18.4.

We first show how the transformation affects typing contexts:

D⟦ε⟧ = ε
D⟦Γ, x : σ⟧ = D⟦Γ⟧, x1 : σ, x2 : σ, dx : (x1, x2) ∈ ∆2⟦σ⟧
D⟦Γ, α⟧ = D⟦Γ⟧, α, ∆α : α → α → ∗

Unlike standard differentiation, transformed programs bind, for each input variable x : σ, a source x1 : σ, a destination x2 : σ, and a valid change dx from x1 to x2. More in detail: for each variable x : σ in input programs, programs produced by standard differentiation bind both x : σ and dx : ∆σ; valid use of these programs requires dx to be a valid change with source x, but this is not enforced through types. Instead, programs produced by this variant of differentiation bind, for each variable x : σ in input programs, source x1 : σ, destination x2 : σ, and a valid change dx from x1 to x2, where validity is enforced through dependent types.

For input type variables α, output programs also bind a dependent type of valid changes ∆α.

Next, we define the type of changes. Since change types are indexed by source and destination,

the type of function changes forces them to be valid, and the definition of function changes resembles our earlier logical relation for validity. More precisely, if df is a change from f1 to f2 (df : (f1, f2) ∈ ∆2⟦σ → τ⟧), then df must map any change dx from initial input x1 to updated input x2 to a change from initial output f1 x1 to updated output f2 x2:

(f1, f2) ∈ ∆2⟦σ → τ⟧ = Π(x1 x2 : σ) (dx : (x1, x2) ∈ ∆2⟦σ⟧). (f1 x1, f2 x2) ∈ ∆2⟦τ⟧
(x1, x2) ∈ ∆2⟦α⟧ = ∆α x1 x2

At this point, the transformation on terms acts on abstraction and application to match the transformation on typing contexts. The definition of D⟦λ(x : σ) → t⟧ follows the definition of D⟦Γ, x : σ⟧: both transformation results bind the same variables x1, x2, dx, with the same types. Application provides corresponding arguments:

D⟦x⟧ = dx
D⟦λ(x : σ) → t⟧ = λ(x1 x2 : σ) (dx : (x1, x2) ∈ ∆2⟦σ⟧) → D⟦t⟧
D⟦s t⟧ = D⟦s⟧ S1⟦s⟧ S2⟦s⟧ D⟦t⟧
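For instance, applying these rules to the identity function from Example 18.1.1 (here at an arbitrary type σ) produces a term that binds source, destination and change for its input, and returns the input change unchanged:

D⟦λ(x : σ) → x⟧ = λ(x1 x2 : σ) (dx : (x1, x2) ∈ ∆2⟦σ⟧) → dx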

If we extend the language to support primitives and their manually-written derivatives, it is useful to have contexts bind, next to type variables α, also change structures for α, to allow terms to use change operations. Since the differentiation output does not use change operations, here we omit change structures for now.


Remark 18.2.1 (Validity and ⊕)
Because we omit change structures (here and through most of the chapter), the type of differentiation only suggests that D⟦t⟧ is a valid change, but not that validity agrees with ⊕.

In other words, dependent types as used here do not prove all theorems expected of an incremental transformation. While we can prove the fundamental property of our logical relation, we cannot prove that ⊕ agrees with differentiation for abstract types. As we did in Sec. 13.4 (in particular Plugin Requirement 13.4.1), we must require that validity relations provided for type variables agree with implementations of ⊕ as provided for type variables; formally, we must state (internally) that ∀(x1 x2 : α) (dx : ∆α x1 x2). (x1 ⊕ dx, x2) ∈ P⟦α⟧. We can state this requirement by internalizing logical equivalence (such as shown in Sec. 18.1, or using Bernardy et al. [2010]'s variant in λP). But since this statement quantifies over members of α, it is not clear if it can be proven internally when instantiating α with a type argument. We leave this question (admittedly crucial) for future work.

This transformation is not incremental, as it recomputes both source and destination for each application; but we could fix this by replacing S2⟦s⟧ with S1⟦s⟧ ⊕ D⟦s⟧ (and passing change structures to make ⊕ available to terms), or by not passing destinations. We ignore such complications here.

18.3 Proving differentiation correct externally
Instead of producing dependently-typed programs that show their own correctness, we might want to produce simply-typed programs (which are easier to compile efficiently) and use an external correctness proof, as in Chapter 12. We show how to generate such external proofs here. For simplicity, here we produce external correctness proofs for the transformation we just defined on dependently-typed outputs, rather than defining a separate simply-typed transformation.

Specifically, we show how to generate, from well-typed terms t, a proof that D⟦t⟧ is correct, that is, that (S1⟦t⟧, S2⟦t⟧, D⟦t⟧) ∈ ∆V⟦τ⟧.

Proofs are generated through the following transformation, defined in terms of the transformation from Sec. 18.2. Again, we show first typing contexts and the logical relation:

DP⟦ε⟧ = ε
DP⟦Γ, x : τ⟧ = DP⟦Γ⟧, x1 : τ, x2 : τ, dx : (x1, x2) ∈ ∆2⟦τ⟧, dxx : (x1, x2, dx) ∈ ∆V⟦τ⟧
DP⟦Γ, α⟧ = DP⟦Γ⟧, α, ∆α : α → α → ∗, Rα : Π(x1 : α) (x2 : α) (dx : (x1, x2) ∈ ∆2⟦α⟧) → ∗ᵖ

(f1, f2, df) ∈ ∆V⟦σ → τ⟧ =
  Π(x1 x2 : σ) (dx : (x1, x2) ∈ ∆2⟦σ⟧) (dxx : (x1, x2, dx) ∈ ∆V⟦σ⟧).
  (f1 x1, f2 x2, df x1 x2 dx) ∈ ∆V⟦τ⟧

(x1, x2, dx) ∈ ∆V⟦α⟧ = Rα x1 x2 dx

The transformation from terms to proofs then matches the definition of typing contexts:

DP⟦x⟧ = dxx
DP⟦λ(x : σ) → t⟧ =
  λ(x1 x2 : σ) (dx : (x1, x2) ∈ ∆2⟦σ⟧) (dxx : (x1, x2, dx) ∈ ∆V⟦σ⟧) →
  DP⟦t⟧
DP⟦s t⟧ = DP⟦s⟧ S1⟦s⟧ S2⟦s⟧ D⟦t⟧ DP⟦t⟧

This term produces a proof object in PTS λ2P, which is produced by augmenting λP following Bernardy and Lasson [2011]. The informal proof content of generated proofs follows the proof


of D⟦–⟧'s correctness (Theorem 12.2.2). For a variable x, we just use the assumption dxx : (x1, x2, dx) ∈ ∆V⟦τ⟧ that we have in context. For abstractions λx → t, we have to show that D⟦t⟧ is correct for all valid input changes x1, x2, dx and for all proofs dxx : (x1, x2, dx) ∈ ∆V⟦σ⟧ that dx is indeed a valid input change; so we bind all those variables in context, including proof dxx, and use DP⟦t⟧ recursively to prove that D⟦t⟧ is correct in the extended context. For applications s t, we use the proof that D⟦s⟧ is correct (obtained recursively via DP⟦s⟧). To show that D⟦s⟧ D⟦t⟧, that is D⟦s⟧ S1⟦s⟧ S2⟦s⟧ D⟦t⟧, is correct, we simply show that D⟦s⟧ is being applied to valid inputs, using the proof that D⟦t⟧ is correct (obtained recursively via DP⟦t⟧).

18.4 Combinators for polymorphic change structures
Earlier, we restricted our transformation on λ→ terms so that there could be a change dt from t1 to t2 only if t1 and t2 have the same type. In this section, we lift this restriction and define polymorphic change structures (also called just change structures when no ambiguity arises). To do so, we sketch how to extend core change-structure operations to polymorphic change structures.

We also show a few combinators, combining existing polymorphic change structures into new ones. We believe the combinator types are more enlightening than their implementations, but we include them here for completeness. We already described a few constructions on non-polymorphic change structures; however, polymorphic change structures enable new constructions that insert or remove data, or apply isomorphisms to the source or destination type.

We conjecture that combinators for polymorphic change structures can be used to compose, out of small building blocks, change structures that, for instance, allow inserting or removing elements from a recursive datatype such as lists. Derivatives for primitives could then be produced using equational reasoning, as described in Sec. 11.2. However, we leave investigation of such avenues to future work.

We describe change operations for polymorphic change structures via a Haskell record containing change operations, and we define combinators on polymorphic change structures. Of all change operations, we only consider ⊕; to avoid confusion, we write ⊙ for the polymorphic variant of ⊕:

data CS τ1 δτ τ2 = CS { (⊙) :: τ1 → δτ → τ2 }

This code defines a record type constructor CS, with a data constructor also written CS, and a field accessor written (⊙); we use τ1, δτ and τ2 (and later also σ1, δσ and σ2) for Haskell type variables. To follow Haskell lexical rules, we use here lowercase letters (even though Greek ones).

We have not formalized definitions of validity, or proofs that it agrees with ⊙; but for all the change structures and combinators in this section, this exercise appears no harder than the ones in Chapter 13.

In Sec. 11.1 and 17.2, change structures are embedded in Haskell using a type class ChangeStruct τ. Conversely, here we do not define a type class of polymorphic change structures, because (apart from the simplest scenarios) Haskell type class resolution is unable to choose a canonical way to construct a polymorphic change structure using our combinators.

All existing change structures (that is, instances of ChangeStruct τ) induce generalized change structures CS τ (∆τ) τ:

typeCS :: ChangeStruct τ ⇒ CS τ (∆τ) τ
typeCS = CS (⊕)

We can also have change structures across different types. Replacement changes are possible:


replCS :: CS τ1 τ2 τ2
replCS = CS $ λx1 x2 → x2
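A quick usage sketch (ours, not from the thesis): a replacement change simply supplies the destination, so it can go across types, here from Bool to String.

-- (⊙) replCS True "hello" evaluates to "hello"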

But replacement changes are not the only option. For product types, or for any form of nested data, we can apply changes to the different components, changing the type of some components. We can also define a change structure for nullary products (the unit type), which can be useful as a building block in other change structures:

prodCS :: CS σ1 δσ σ2 → CS τ1 δτ τ2 → CS (σ1, τ1) (δσ, δτ) (σ2, τ2)
prodCS scs tcs = CS $ λ(s1, t1) (ds, dt) → ((⊙) scs s1 ds, (⊙) tcs t1 dt)
unitCS :: CS () () ()
unitCS = CS $ λ() () → ()
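As a small usage sketch (examplePairCS is our name, not one from the thesis), prodCS composes with replCS to update both components of a pair, giving the second component a new type:

examplePairCS :: CS (Int, Bool) (Int, String) (Int, String)
examplePairCS = prodCS replCS replCS
-- (⊙) examplePairCS (1, True) (2, "two") evaluates to (2, "two")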

The ability to modify a field to one of a different type is also known in the Haskell community as polymorphic record update, a feature that has proven desirable in the context of lens libraries [O'Connor 2012; Kmett 2012].

We can also define a combinator sumCS for change structures on sum types, similarly to our earlier construction described in Sec. 11.6. This time, we choose to forbid changes across branches, since they're inefficient, though we could support them as well if desired:

sumCS :: CS s1 ds s2 → CS t1 dt t2 →
  CS (Either s1 t1) (Either ds dt) (Either s2 t2)
sumCS scs tcs = CS go
  where
    go (Left s1) (Left ds) = Left $ (⊙) scs s1 ds
    go (Right t1) (Right dt) = Right $ (⊙) tcs t1 dt
    go _ _ = error "Invalid changes"

Given two change structures from τ1 to τ2, with respective change types δτa and δτb, we can also define a new change structure with change type Either δτa δτb, which allows using changes from either structure. We capture this construction through combinator mSumCS, having the following signature:

mSumCS :: CS τ1 δτa τ2 → CS τ1 δτb τ2 → CS τ1 (Either δτa δτb) τ2
mSumCS acs bcs = CS go
  where
    go t1 (Left dt) = (⊙) acs t1 dt
    go t1 (Right dt) = (⊙) bcs t1 dt

This construction is possible for non-polymorphic change structures as well; we only need change structures to be first-class (instead of a type class) to be able to internalize this construction in Haskell.

Using combinator lInsCS, we can describe updates going from type τ1 to type (σ, τ2), assuming a change structure from τ1 to τ2: that is, we can prepend a value of type σ to our data while we modify it. Similarly, combinator lRemCS allows removing values:

lInsCS :: CS t1 dt t2 → CS t1 (s, dt) (s, t2)
lInsCS tcs = CS $ λt1 (s, dt) → (s, (⊙) tcs t1 dt)
lRemCS :: CS t1 dt (s, t2) → CS t1 dt t2
lRemCS tcs = CS $ λt1 dt → snd $ (⊙) tcs t1 dt
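For instance (tagCS is our hypothetical name, not one from the thesis), lInsCS lets us prepend a String tag while the Int payload is replaced:

tagCS :: CS Int (String, Int) (String, Int)
tagCS = lInsCS replCS
-- (⊙) tagCS 1 ("v2", 42) evaluates to ("v2", 42)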

We can also transform change structures, given suitable conversion functions:


lIsoCS :: (t1 → s1) → CS s1 dt t2 → CS t1 dt t2
mIsoCS :: (dt → ds) → CS t1 ds t2 → CS t1 dt t2
rIsoCS :: (s2 → t2) → CS t1 dt s2 → CS t1 dt t2
isoCS :: (t1 → s1) → (dt → ds) → (s2 → t2) → CS s1 ds s2 → CS t1 dt t2

To do so, we must only compose with the given conversion functions, according to the types. Combinator isoCS arises by simply combining the other ones:

lIsoCS f tcs = CS $ λs1 dt → (⊙) tcs (f s1) dt
mIsoCS g tcs = CS $ λt1 ds → (⊙) tcs t1 (g ds)
rIsoCS h tcs = CS $ λt1 dt → h $ (⊙) tcs t1 dt
isoCS f g h scs = lIsoCS f $ mIsoCS g $ rIsoCS h scs
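As a usage sketch (Wrap is our hypothetical wrapper type, not from the thesis), mIsoCS adapts the change type of an existing change structure, here letting replacement changes for Int be carried inside a newtype:

newtype Wrap = Wrap { unwrap :: Int }

wrapCS :: CS Int Wrap Int
wrapCS = mIsoCS unwrap replCS
-- (⊙) wrapCS 1 (Wrap 5) evaluates to 5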

With a bit of datatype-generic programming infrastructure, we can reobtain, using only combinators, the change structure for lists shown in Sec. 11.3.3, which allows modifying list elements. The core definition is the following one:

listMuChangeCS :: CS a1 da a2 → CS (Listμ a1) (Listμ da) (Listμ a2)
listMuChangeCS acs = go
  where go = isoCS unRollL unRollL rollL $ sumCS unitCS $ prodCS acs go

The needed infrastructure appears in Fig. 18.1.

-- Our list type:
data List a = Nil | Cons a (List a)

-- Equivalently, we can represent lists as a fixpoint of a sum-of-product pattern functor:
data Mu f = Roll { unRoll :: f (Mu f) }
data ListF a x = L { unL :: Either () (a, x) }
type Listμ a = Mu (ListF a)
rollL :: Either () (a, Listμ a) → Listμ a
rollL = Roll ∘ L
unRollL :: Listμ a → Either () (a, Listμ a)
unRollL = unL ∘ unRoll

-- Isomorphism between List and Listμ:
lToLMu :: List a → Listμ a
lToLMu Nil = rollL $ Left ()
lToLMu (Cons a as) = rollL $ Right (a, lToLMu as)
lMuToL :: Listμ a → List a
lMuToL = go ∘ unRollL
  where
    go (Left ()) = Nil
    go (Right (a, as)) = Cons a (lMuToL as)

-- A change structure for List:
listChangesCS :: CS a1 da a2 → CS (List a1) (List da) (List a2)
listChangesCS acs = isoCS lToLMu lToLMu lMuToL $ listMuChangeCS acs

Figure 18.1: A change structure that allows modifying list elements.
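As a usage sketch (ours, not from the thesis), instantiating the element change structure with replCS gives elementwise replacement, where the element type may change; note the change list must have the same spine as the source, since sumCS forbids changes across branches:

-- (⊙) (listChangesCS replCS) (Cons 1 (Cons 2 Nil)) (Cons "a" (Cons "b" Nil))
--   evaluates to Cons "a" (Cons "b" Nil)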


Section summary We have defined polymorphic change structures and shown they admit a rich combinator language. Now we return to using these change structures for differentiating λ→ and System F.

18.5 Differentiation for System F
After introducing changes across different types, we can also generalize differentiation for λ→, so that it allows for the now generalized changes:

(f1, f2) ∈ ∆2⟦σ → τ⟧ = Π(x1 : S1⟦σ⟧) (x2 : S2⟦σ⟧) (dx : (x1, x2) ∈ ∆2⟦σ⟧). (f1 x1, f2 x2) ∈ ∆2⟦τ⟧
(x1, x2) ∈ ∆2⟦α⟧ = ∆α x1 x2
D⟦x⟧ = dx
D⟦λ(x : σ) → t⟧ = λ(x1 : S1⟦σ⟧) (x2 : S2⟦σ⟧) (dx : (x1, x2) ∈ ∆2⟦σ⟧) → D⟦t⟧
D⟦s t⟧ = D⟦s⟧ S1⟦s⟧ S2⟦s⟧ D⟦t⟧
D⟦ε⟧ = ε
D⟦Γ, x : τ⟧ = D⟦Γ⟧, x1 : S1⟦τ⟧, x2 : S2⟦τ⟧, dx : (x1, x2) ∈ ∆2⟦τ⟧
D⟦Γ, α⟧ = D⟦Γ⟧, α1, α2, ∆α : α1 → α2 → ∗

By adding a few additional rules, we can extend differentiation to System F (the PTS λ2). We choose to present the additional rules using System F syntax, rather than PTS syntax:

(f1, f2) ∈ ∆2⟦∀α. T⟧ =
  Π(α1 : ∗) (α2 : ∗) (∆α : α1 → α2 → ∗). (f1 [α1], f2 [α2]) ∈ ∆2⟦T⟧

D⟦Λα. t⟧ =
  λ(α1 α2 : ∗) (∆α : α1 → α2 → ∗) → D⟦t⟧

D⟦t [τ]⟧ = D⟦t⟧ S1⟦τ⟧ S2⟦τ⟧ ((–, –) ∈ ∆2⟦τ⟧)

Produced terms use a combination of System F and dependent types, which is known as λP2 [Barendregt 1992] and is strongly normalizing. This PTS corresponds to second-order (intuitionistic) predicate logic, and is part of Barendregt's lambda cube. λP2 does not admit types depending on types (that is, type-level functions), but admits all other forms of abstraction in the lambda cube (terms on terms like λ→, terms on types like System F, and types on terms like LF, also known as λP).

Finally, we sketch a transformation producing proofs that differentiation is correct for System F:

DP⟦ε⟧ = ε
DP⟦Γ, x : τ⟧ = DP⟦Γ⟧, x1 : S1⟦τ⟧, x2 : S2⟦τ⟧,
  dx : (x1, x2) ∈ ∆2⟦τ⟧, dxx : (x1, x2, dx) ∈ ∆V⟦τ⟧
DP⟦Γ, α⟧ = DP⟦Γ⟧, α1, α2, ∆α : α1 → α2 → ∗,
  Rα : Π(x1 : α1) (x2 : α2) (dx : (x1, x2) ∈ ∆2⟦α⟧) → ∗ᵖ

(f1, f2, df) ∈ ∆V⟦∀α. T⟧ =
  Π(α1 : ∗) (α2 : ∗) (∆α : α1 → α2 → ∗)
  (Rα : Π(x1 : α1) (x2 : α2) (dx : (x1, x2) ∈ ∆2⟦α⟧) → ∗ᵖ).
  (f1 [α1], f2 [α2], df [α1] [α2] [∆α]) ∈ ∆V⟦T⟧

(f1, f2, df) ∈ ∆V⟦σ → τ⟧ =
  Π(x1 : S1⟦σ⟧) (x2 : S2⟦σ⟧) (dx : (x1, x2) ∈ ∆2⟦σ⟧) (dxx : (x1, x2, dx) ∈ ∆V⟦σ⟧).
  (f1 x1, f2 x2, df x1 x2 dx) ∈ ∆V⟦τ⟧

(x1, x2, dx) ∈ ∆V⟦α⟧ = Rα x1 x2 dx


DP⟦x⟧ = dxx
DP⟦λ(x : σ) → t⟧ =
  λ(x1 : S1⟦σ⟧) (x2 : S2⟦σ⟧) (dx : (x1, x2) ∈ ∆2⟦σ⟧)
  (dxx : (x1, x2, dx) ∈ ∆V⟦σ⟧) →
  DP⟦t⟧
DP⟦s t⟧ = DP⟦s⟧ S1⟦s⟧ S2⟦s⟧ D⟦t⟧ DP⟦t⟧
DP⟦Λα. t⟧ =
  λ(α1 α2 : ∗) (∆α : α1 → α2 → ∗)
  (Rα : Π(x1 : α1) (x2 : α2) (dx : (x1, x2) ∈ ∆2⟦α⟧) → ∗ᵖ) →
  DP⟦t⟧
DP⟦t [τ]⟧ = DP⟦t⟧ S1⟦τ⟧ S2⟦τ⟧ ((–, –) ∈ ∆2⟦τ⟧) ((–, –, –) ∈ ∆V⟦τ⟧)

Produced terms live in λ2P2, the logic produced by extending λP2 following Bernardy and Lasson. A variant producing proofs for a non-dependently-typed differentiation (as suggested earlier) would produce proofs in λ22, the logic produced by extending λ2 following Bernardy and Lasson.

18.6 Related work
Dependently-typed differentiation for System F, as given, coincides with the parametricity transformation for System F given by Bernardy, Jansson and Paterson [2010, Sec. 3.1]. But our application is fundamentally different: for known types, Bernardy et al. only consider identity relations, while we can consider non-identity relations, as long as we assume that ⊕ is available for all types. What is more, Bernardy et al. do not consider changes, the update operator ⊕, or change structures across different types: changes are replaced by proofs of relatedness, with no computational significance. Finally, non-dependently-typed differentiation (and its correctness proof) is novel here, as it makes limited sense in the context of parametricity, even though it is a small variant of Bernardy et al.'s parametricity transform.

18.7 Prototype implementation
We have written a prototype implementation of the above rules for a PTS presentation of System F, and verified it on a few representative examples of System F terms. We have built our implementation on top of an existing typechecker for PTSs.⁴

Since our implementation is defined in terms of a generic PTS, some of the rules presented above are unified and generalized in our implementation, suggesting the transformation might generalize to arbitrary PTSs. In the absence of further evidence on the correctness of this generalization, we leave a detailed investigation as future work.

18.8 Chapter conclusion
In this chapter, we have sketched how to define and prove correct differentiation following Bernardy and Lasson [2011]'s and Bernardy et al. [2010]'s work on parametricity by code transformation. We give no formal correctness proof, but proofs appear mostly an extension of their methods. An important open point is Remark 18.2.1. We have implemented and tested differentiation in an existing mature PTS implementation, and verified it is type-correct on a few typical terms.

We leave further investigation as future work.

⁴https://github.com/Toxaris/pts, by Tillmann Rendel, based on van Benthem Jutting et al. [1994]'s algorithm for PTS typechecking.

Chapter 19

Related work

Existing work on incremental computation can be divided into two groups: static incrementalization and dynamic incrementalization. Static approaches analyze a program statically and generate an incremental version of it. Dynamic approaches create dynamic dependency graphs while the program runs and propagate changes along these graphs.

The trade-off between the two is that static approaches have the potential to be faster, because no dependency tracking at runtime is needed, whereas dynamic approaches can support more expressive programming languages. ILC is a static approach, but compared to the other static approaches it supports more expressive languages.

In the remainder of this section, we analyze the relation to the most closely related prior works. Ramalingam and Reps [1993], Gupta and Mumick [1999] and Acar et al. [2006] discuss further related work. Other work more closely related to cache-transfer style is discussed in Sec. 17.6.

19.1 Dynamic approaches
One of the most advanced dynamic approaches to incrementalization is self-adjusting computation, which has been applied to Standard ML and large subsets of C [Acar 2009; Hammer et al. 2011]. In this approach, programs execute on the original input in an enhanced runtime environment that tracks the dependencies between values in a dynamic dependence graph [Acar et al. 2006]; intermediate results are memoized. Later, changes to the input propagate through dependency graphs from changed inputs to results, updating both intermediate and final results; this processing is often more efficient than recomputation.

However, creating dynamic dependence graphs imposes a large constant-factor overhead during runtime, ranging from 2 to 30 in reported experiments [Acar et al. 2009, 2010], and affecting the initial run of the program on its base input. Acar et al. [2010] show how to support high-level data types in the context of self-adjusting computation; however, the approach still requires expensive runtime bookkeeping during the initial run. Our approach, like other static ones, uses a standard runtime environment and has no overhead during base computation, but may be less efficient when processing changes. This pays off if the initial input is big compared to its changes.

Chen et al. [2011] have developed a static transformation for purely functional programs, but this transformation just provides a superior interface to use the runtime support with less boilerplate, and does not reduce this performance overhead. Hence, it is still a dynamic approach, unlike the transformation this work presents.

Another property of self-adjusting computation is that incrementalization is only efficient if the program has a suitable computation structure. For instance, a program folding the elements of a



bag with a left or right fold will not have efficient incremental behavior; instead, it's necessary that the fold be shaped like a balanced tree. In general, incremental computations become efficient only if they are stable [Acar 2005]. Hence, one may need to massage the program to make it efficient. Our methodology is different: since we do not aim to incrementalize arbitrary programs written in standard programming languages, we can select primitives that have efficient derivatives, and thereby require the programmer to use them.

Functional reactive programming [Elliott and Hudak 1997] can also be seen as a dynamic approach to incremental computation; recent work by Maier and Odersky [2013] has focused on speeding up reactions to input changes by making them incremental on collections. Willis et al. [2008] use dynamic techniques to incrementalize JQL queries.

19.2 Static approaches
Static approaches analyze a program at compile-time and produce an incremental version that efficiently updates the output of the original program according to changing inputs.

Static approaches have the potential to be more efficient than dynamic approaches, because no bookkeeping at runtime is required. Also, the computed incremental versions can often be optimized using standard compiler techniques, such as constant folding or inlining. However, none of them support first-class functions; some approaches have further restrictions.

Our aim is to apply static incrementalization to more expressive languages; in particular, ILC supports first-class functions and an open set of base types with associated primitive operations.

Sundaresh and Hudak [1991] propose to incrementalize programs using partial evaluation, given a partitioning of program inputs into parts that change and parts that stay constant. Sundaresh and Hudak partially evaluate a given program relative to the constant input parts and combine the result with the changing inputs.

19.2.1 Finite differencing
Paige and Koenig [1982] present derivatives for a first-order language with a fixed set of primitives. Blakeley et al. [1986] apply these ideas to a class of relational queries. The database community extended this work to queries on relational data, such as in algebraic differencing [Gupta and Mumick 1999], which inspired our work and terminology. However, most of this work does not apply to nested collections or algebraic data types, but only to relational (flat) data, and no previous approach handles first-class functions or programs resulting from defunctionalization or closure conversion. Incremental support is typically designed monolithically for a whole language, rather than piecewise. Improving on algebraic differencing, DBToaster [Koch 2010; Koch et al. 2014] guarantees asymptotic speedups with a compositional query transformation, and delivers huge speedups in realistic benchmarks, though still for a first-order database language.

More general (non-relational) data types are considered in the work by Gluche et al. [1997]; our support for bags and the use of groups is inspired by their work. But their architecture is still rather restrictive: they lack support for function changes and restrict incrementalization to self-maintainable views, without hinting at a possible solution.

It seems possible to transform higher-order functional programs to database queries using a variety of approaches [Grust et al. 2009; Cheney et al. 2013], some of which support first-class functions via closure conversion [Grust and Ulrich 2013; Grust et al. 2013], and incrementalize the resulting programs using standard database technology. Such a solution would inherit the limitations of database incrementalization approaches: in particular, it appears that database incrementalization approaches such as DBToaster can handle the insertion and removal of entire table rows, not of smaller changes. Nevertheless, such an alternative approach might be worth investigating.


Unlike later approaches to higher-order differentiation, we do not restrict our base types to groups (as Koch et al. [2016] do), and the transformed programs we produce do not require further runtime transformation (unlike Koch et al. [2016] and Huesca [2015]), as we discuss further next.

19.2.2 λ-diff and partial differentials
In work concurrent with Cai et al. [2014], Huesca [2015] introduces an alternative formalism, called λ-diff, for incremental computation by program transformation. While λ-diff has some appealing features, it currently appears to require program transformations at runtime. Other systems appear to share this feature [Koch et al. 2016]. Hence, this section discusses the reason in some detail.

Instead of differentiating a term t relative to all inputs (free variables and function arguments) via D⟦t⟧, like ILC, λ-diff differentiates terms relative to one input variable, and writes ∂t/∂x dx for the result of differentiating t relative to x: a term that computes the change in t when the value for x is updated by change dx. The formalism also uses pointwise function changes, similarly to what we discussed in Sec. 15.3.

Unfortunately, it is not known how to define such a transformation for a λ-calculus with a standard semantics, and the needed semantics appear to require runtime term transformations, which are usually considered problematic when implementing functional languages. In particular, it appears necessary to introduce a new term constructor D t, which evaluates t to a function value λy → u, and then evaluates to λ(y, dy) → ∂u/∂y dy; that is, it differentiates t at runtime relative to its head variable y. As an indirect consequence, if the program under incrementalization contains a function term Γ ⊢ t : σ → τ, where Γ contains n free variables, it can be necessary in the worst case to differentiate t over any subset of the n free variables in Γ. There are 2ⁿ such subsets. To perform all term transformations before runtime, it hence seems necessary, in the worst case, to precompute 2ⁿ partial derivatives for each function term t, which appears unfeasible. On the other hand, it is not clear how often this worst case is realized, how big n grows in typical programs, or whether it is simply feasible to perform differentiation at runtime, similarly to JIT compilation. Overall, an efficient implementation of λ-diff remains an open problem. It appears that Koch et al. [2016] suffers from similar problems, but a few details appear simpler, since they restrict focus to functions over groups.

To see why λ-diff needs to introduce D t, consider differentiating ∂(s t)/∂x dx, that is, the change d of s t when x is updated by change dx. Change d depends (a) on the change of t when x is updated by dx, that is ∂t/∂x dx; (b) on how s changes when its input t is updated by ∂t/∂x dx; to express this change, λ-diff writes (D s) t (∂t/∂x dx); and (c) on the change of s (applied to the updated t) when x is updated by dx, that is ∂s/∂x dx. To compute component (b), λ-diff writes D s to differentiate s not relative to x, but relative to the still unknown head variable of s. If s evaluates to λy → u, then y is the head variable of s, and D s differentiates u relative to y and evaluates to λ(y, dy) → ∂u/∂y dy.

Overall, the rule for differentiating application in λ-diff is:

∂(s t)/∂x dx = (D s) t (∂t/∂x dx) ⊚ (∂s/∂x dx) (t ⊕ ∂t/∂x dx)

This rule appears closely related to Eq. (15.1); hence we refer to its discussion for clarification.

On the other hand, differentiating a term relative to all its inputs introduces a different sort of overhead. For instance, it is much more efficient to differentiate map f xs relative to xs than relative to f: if f undergoes a non-nil change df, D⟦map f xs⟧ must apply df to each element in the updated input xs. Therefore, in our practical implementations, D⟦map f xs⟧ tests whether df is nil and uses a more efficient implementation. In Chapter 16, we detect at compile time whether df is guaranteed to be nil. In Sec. 17.4.2, we instead detect at runtime whether df is nil. In both cases, authors of derivatives must implement this optimization by hand. Instead, λ-diff hints at a more general solution.


19.2.3 Static memoization
Liu's work [Liu 2000] allows incrementalizing a first-order base program f(xold) to compute f(xnew), knowing how xnew is related to xold. To this end, they transform f(xnew) into an incremental program which reuses the intermediate results produced while computing f(xold), the base program. To this end, (i) first the base program is transformed to save all its intermediate results, then (ii) the incremental program is transformed to reuse those intermediate results, and finally (iii) intermediate results which are not needed are pruned from the base program. However, to reuse intermediate results, the incremental program must often be rearranged, using some form of equational reasoning, into some equivalent program where partial results appear literally. For instance, if the base program f uses a left fold to sum the elements of a list of integers xold, accessing them from the head onwards, and xnew prepends a new element h to the list, at no point does f(xnew) recompute the same results. But since addition is commutative on integers, we can rewrite f(xnew) as f(xold) + h. The author's CACHET system will try to perform such rewritings automatically, but it is not guaranteed to succeed. Similarly, CACHET will try to synthesize any additional results which can be computed cheaply by the base program, to help make the incremental program more efficient.
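To make the example concrete, here is a minimal Haskell sketch (our rendering; not CACHET's input language) of the rewriting step just described:

f :: [Int] → Int
f = foldl (+) 0
-- With xnew = h : xold, no intermediate result of f xnew equals one of
-- f xold; but since addition is commutative, f xnew can be rewritten to
-- f xold + h, which reuses the cached result f xold.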

Since it is hard to fully automate such reasoning, we move equational reasoning to the plugin design phase. A plugin provides general-purpose higher-order primitives, for which the plugin authors have devised efficient derivatives (by using equational reasoning in the design phase). Then, the differentiation algorithm computes incremental versions of user programs without requiring further user intervention. It would be useful to combine ILC with some form of static caching, to make the computation of derivatives which are not self-maintainable more efficient. We plan to do so in future work.

Chapter 20

Conclusion and future work

In databases, standard finite differencing technology allows incrementalizing programs in a specific domain-specific first-order language of collection operations. Unlike other approaches, finite differencing transforms programs to programs, which can in turn be transformed further.

Differentiation (Chapter 10) transforms higher-order programs to incremental ones, called derivatives, as long as one can provide incremental support for primitive types and operations.

Case studies in this thesis consider support for several such primitives. We first study a setting restricted to self-maintainable derivatives (Chapter 16). Then we study a more general setting, where it is possible to remember inputs and intermediate results from one execution to the next, thanks to a transformation to cache-transfer style (Chapter 17). In all cases, we are able to deliver order-of-magnitude speedups on non-trivial examples; moreover, incrementalization produces programs in standard languages that are subject to further optimization and code transformation.

Correctness of incrementalization appeared initially surprising, so we devoted significant effort to formal correctness proofs, either on paper or in mechanized proof assistants. The original correctness proof of differentiation, using a set-theoretic denotational semantics [Cai et al. 2014], was a significant milestone, but since then we have simplified the proof to a logical relations proof, which defines when a change is valid from a source to a destination, and proves that differentiation produces valid changes (Chapter 12). Valid changes witness the difference between sources and destinations; since changes can be nil, change validity arises as a generalization of the concepts of logical equivalence and parametricity for a language (at least in terminating languages) (Chapter 18). Crucially, changes are not just witnesses: operation ⊕ takes a change and its source to the change destination. One can consider further operations, which give rise to an algebraic structure that we call change structure (Chapter 13).

Based on this simplified proof, in this thesis we generalize correctness to further languages, using big-step operational semantics and (step-indexed) logical relations (Appendix C). Using operational semantics, we reprove correctness for simply-typed λ-calculus, then add support for recursive functions (which would require domain-theoretic methods when using denotational semantics), and finally extend the proof to untyped λ-calculus. Building on a variant of this proof (Sec. 17.3.3), we show that conversion to cache-transfer style also preserves correctness.

Based on a different formalism for logical equivalence and parametricity [Bernardy and Lasson 2011], we sketch variants of the transformation and correctness proofs for simply-typed λ-calculus with type variables, and then for full System F (Chapter 18).

Future work To extend differentiation to System F, we must consider changes where source and destination have different types. This generalization of changes makes change operations much



more flexible: it becomes possible to define a combinator language for change structures, and it appears possible to define nontrivial change structures for algebraic datatypes using this combinator language.

We conjecture such a combinator language will allow programmers to define correct change structures out of simple, reusable components.

Incrementalizing primitives correctly remains, at present, a significant challenge. We provide tools to support this task by formalizing equational reasoning, but it appears necessary to provide more tools to programmers, as done by Liu [2000]. Conceivably, it might be possible to build such a rewriting tool on top of the rewriting and automation support of theorem provers such as Coq.

Appendixes


Appendix A

Preliminaries: syntax and semantics of simply-typed λ-calculus

To discuss how we incrementalize programs, and prove that our incrementalization technique gives correct results, we specify which foundation we use for our proofs and what object language we study throughout most of Part II.

We mechanize our correctness proof using Agda; hence, we use Agda's underlying type theory as our foundation. We discuss what this means in Appendix A.1.

Our object language is a standard simply-typed λ-calculus (STLC) [Pierce 2002, Ch. 9], parameterized over base types and constants. We term the set of base types and constants a language plugin (see Appendix A.2.5). In our examples, we assume that the language plugin supports needed base types and constants. Later (e.g., in Chapter 12), we add further requirements to language plugins, to support incrementalization of the language features they add to our STLC. Rather than operational semantics, we use a denotational semantics, which is however set-theoretic rather than domain-theoretic. Our object language and its semantics are summarized in Fig. A.1.

At this point, readers might want to skip to Chapter 10 right away, or focus on denotational semantics and refer to this section as needed.

A.1 Our proof meta-language
In this section, we describe the logic (or meta-language) used in our mechanized correctness proof.

First, as usual, we distinguish between "formalization" (that is, on-paper formalized proofs) and "mechanization" (that is, proofs encoded in the language of a proof assistant for computer-aided mechanized verification).

To prove the correctness of ILC, we provide a mechanized proof in Agda [Agda Development Team 2013]. Agda implements intensional Martin-Löf type theory (from now on, simply type theory), so type theory is also the foundation of our proofs.

At times, we use conventional set-theoretic language to discuss our proofs, but the differences are only superficial. For instance, we might talk about a set of elements S, with elements such as s ∈ S, though we always mean that S is a metalanguage type, that s is a metalanguage value, and that s : S. Talking about sets avoids ambiguity between types of our meta-language and types of our object-language (which we discuss next, in Appendix A.2).

181


Notation A.1.1
We'll let uppercase Latin letters A, B, C, V, U range over sets, never over types.

We do not prove correctness of all our language plugins. However, in earlier work [Cai et al. 2014], we prove correctness for a language plugin supporting bags, a type of collection (described in Sec. 10.2). For that proof, we extend our logic by postulating a few standard axioms on the implementation of bags, to avoid proving correct an implementation of bags, or needing to account for different values representing the same bag (such different values typically arise when implementing bags as search trees).

A.1.1 Type theory versus set theory

Here we summarize a few features of type theory over set theory.
Type theory is dependently typed, so it generalizes function type A → B to dependent function

type (x : A) → B, where x can appear free in B. Such a type guarantees that if we apply a function f : (x : A) → B to an argument a : A, the result has type B[x := a], that is, B where x is substituted by a. At times, we will use dependent types in our presentation.

Moreover, by using type theory:

• We do not postulate the law of excluded middle; that is, our logic is constructive.

• Unlike set theory, type theory is proof-relevant: that is, proofs are first-class mathematical objects.

• Instead of subsets {x ∈ A | P(x)}, we must use Σ-types Σ(x : A). P(x), which contain pairs of elements x and proofs that they satisfy predicate P.

• In set theory, we can assume without further ado functional extensionality, that is, that functions that give equal results on all equal inputs are equal themselves. Intuitionistic type theory does not prove functional extensionality, so we need to add it as a postulate. In Agda, this postulate is known to be consistent [Hofmann 1996], hence it is safe to assume.¹

To handle binding issues in our object language, our formalization uses typed de Bruijn indexes, because this technique takes advantage of Agda's support for type refinement in pattern matching. On top of that, we implement a HOAS-like frontend, which we use for writing specific terms.

A.2 Simply-typed λ-calculus
We consider as object language a strongly-normalizing simply-typed λ-calculus (STLC). We choose STLC as it is the simplest language with first-class functions and types, while being a sufficient model of realistic total languages.² We recall the syntax and typing rules of STLC in Figs. A.1a and A.1b, together with the metavariables we use. Language plugins define base types ι and constants c. Types can be base types ι or function types σ → τ. Terms can be constants c, variables x, function applications t1 t2, or λ-abstractions λ(x : σ) → t. To describe assumptions on variable types when typing terms, we define (typing) contexts Γ, as being either empty ε, or context extensions Γ, x : τ, which extend context Γ by asserting variable x has type τ. Typing is defined through a judgment

¹http://permalink.gmane.org/gmane.comp.lang.agda/2343
²To know why we restrict to total languages, see Sec. 15.1.


ι ::= ...                                  (base types)
σ, τ ::= ι | τ → τ                         (types)
Γ ::= ε | Γ, x : τ                         (typing contexts)
c ::= ...                                  (constants)
s, t ::= c | λ(x : σ) → t | t t | x        (terms)

(a) Syntax

⊢C c : τ
---------- Const
Γ ⊢ c : τ

----------------------- Lookup
Γ1, x : τ, Γ2 ⊢ x : τ

Γ, x : σ ⊢ t : τ
--------------------------- Lam
Γ ⊢ λ(x : σ) → t : σ → τ

Γ ⊢ s : σ → τ    Γ ⊢ t : σ
--------------------------- App
Γ ⊢ s t : τ

(b) Typing (judgment Γ ⊢ t : τ)

⟦ι⟧ = ...
⟦σ → τ⟧ = ⟦σ⟧ → ⟦τ⟧

(c) Values

⟦ε⟧ = {ε}
⟦Γ, x : τ⟧ = {(ρ, x = v) | ρ ∈ ⟦Γ⟧ ∧ v ∈ ⟦τ⟧}

(d) Environments

⟦c⟧ ρ = ⟦c⟧C
⟦λ(x : σ) → t⟧ ρ = λ(v : ⟦σ⟧) → ⟦t⟧ (ρ, x = v)
⟦s t⟧ ρ = (⟦s⟧ ρ) (⟦t⟧ ρ)
⟦x⟧ ρ = lookup x in ρ

(e) Denotational semantics

Figure A.1: Standard definitions for the simply-typed lambda calculus.

Γ ⊢ t : τ, stating that term t under context Γ has type τ.³ For a proper introduction to STLC, we refer the reader to Pierce [2002, Ch. 9]. We will assume significant familiarity with it.

An extensible syntax of types In fact, the definition of base types can be mutually recursive with the definition of types. So a language plugin might add as base types, for instance, collections of elements of type τ, products and sums of type σ and type τ, and so on. However, this mutual recursion must satisfy a few technical restrictions to avoid introducing subtle inconsistencies, and Agda cannot enforce these restrictions across modules. Hence, if we define language plugins as separate modules in our mechanization, we need to verify by hand that such restrictions are satisfied (which they are). See Appendix B.1 for the gory details.

Notation A.2.1
We typically omit type annotations on λ-abstractions, that is, we write λx → t rather than λ(x : σ) → t. Such type annotations can often be inferred from context (or type inference). Nevertheless,

³We only formalize typed terms, not untyped ones, so that each term has a unique type. That is, in the relevant jargon, we use Church-style typing as opposed to Curry-style typing. Alternatively, we use an intrinsically-typed term representation. In fact, arguably we mechanize at once both well-typed terms and their typing derivations. This is even more clear in our mechanization; see discussion in Appendix A.2.4.


whenever we discuss terms of shape λx → t, we're in fact discussing λ(x : σ) → t for some arbitrary σ. We write λx → t instead of λx. t for consistency with the notation we use later for Haskell programs.

We often omit ε from typing contexts with some assumptions. For instance, we write x : τ1, y : τ2 instead of ε, x : τ1, y : τ2.

We overload symbols (often without warning) when they can be disambiguated from context, especially when we can teach modern programming languages to disambiguate such overloadings. For instance, we reuse → for lambda abstractions λx → t, function spaces A → B, and function types σ → τ, even though in the first case → is just a separator.

Extensions In our examples, we will use some unproblematic syntactic sugar over STLC, including let expressions, global definitions and type inference, and we will use a Haskell-like concrete syntax. In particular, when giving type signatures or type annotations in Haskell snippets, we will use :: to separate terms or variables from their types, rather than : as in λ-calculus. To avoid confusion, we never use : to denote the constructor for Haskell lists.

At times, our concrete examples will use Hindley-Milner (prenex) polymorphism, but this is also not such a significant extension. A top-level definition using prenex polymorphism, that is of type ∀α. τ (where α is free in τ), can be taken as sugar for a metalevel family of object-level programs, indexed by type argument τ1, of definitions of type τ[α := τ1]. We use this trick without explicit mention in our first implementation of incrementalization in Scala [Cai et al. 2014].

A.2.1 Denotational semantics for STLC
To prove that incrementalization preserves the semantics of our object-language programs, we define a semantics for STLC. We use a naive set-theoretic denotational semantics. Since STLC is strongly normalizing [Pierce 2002, Ch. 12], its semantics need not handle partiality. Hence, we can use denotational semantics but eschew using domain theory, and simply use sets from the metalanguage (see Appendix A.1). Likewise, we can use normal functions as domains for function types.

We first associate, to every type τ, a set of values ⟦τ⟧, so that terms of a type τ evaluate to values in ⟦τ⟧. We call set ⟦τ⟧ a domain. Domains associated to types τ depend on the domains associated to base types ι, which must be specified by language plugins (Plugin Requirement A.2.10).
Definition A.2.2 (Domains and values)
The domain ⟦τ⟧ of a type τ is defined as in Fig. A.1c. A value is a member of a domain.

We let metavariables u, v, a, b range over members of domains; we tend to use v for generic values and a for values we use as a function argument. We also let metavariables f, g range over values in the domain for a function type. At times, we might also use metavariables f, g to range over terms of function types; the context will clarify what is intended.

Given this domain construction, we can now define a denotational semantics for terms. The plugin has to provide the evaluation function for constants. In general, the evaluation function ⟦t⟧ computes the value of a well-typed term t, given the values of all free variables in t. The values of the free variables are provided in an environment.
Definition A.2.3 (Environments)
An environment ρ assigns values to the names of free variables:

ρ ::= ε | ρ, x = v

We write ⟦Γ⟧ for the set of environments that assign values to the names bound in Γ (see Fig. A.1d).


Notation We often omit ε from environments with some assignments. For instance, we write x = v1, y = v2 instead of ε, x = v1, y = v2.
Definition A.2.4 (Evaluation)
Given Γ ⊢ t : τ, the meaning of t is defined by the function ⟦t⟧, of type ⟦Γ⟧ → ⟦τ⟧, in Fig. A.1e.

This is a standard denotational semantics of the simply-typed λ-calculus.

For each constant c : τ, the plugin provides ⟦c⟧C : ⟦τ⟧, the semantics of c (by Plugin Requirement A.2.11); since constants don't contain free variables, ⟦c⟧C does not depend on an environment. We define a program equivalence across terms of the same type, t1 ≃ t2, to mean ⟦t1⟧ = ⟦t2⟧.

Definition A.2.5 (Denotational equivalence)
We say that two terms Γ ⊢ t1 : τ and Γ ⊢ t2 : τ are denotationally equal, and write Γ ⊨ t1 ≃ t2 : τ (or sometimes t1 ≃ t2), if for all environments ρ : ⟦Γ⟧ we have that ⟦t1⟧ ρ = ⟦t2⟧ ρ.

Remark A.2.6
Beware that denotational equivalence cannot always be strengthened by dropping unused variables: that is, Γ, x : σ ⊨ t1 ≃ t2 : τ does not imply Γ ⊨ t1 ≃ t2 : τ, even if x does not occur free in either t1 or t2. Counterexamples rely on σ being an empty type. For instance, we cannot weaken x : 0τ ⊨ 0 ≃ 1 : Z (where 0τ is an empty type): this equality is only true vacuously, because there exists no environment for context x : 0τ.

A.2.2 Weakening
While we don't discuss our formalization of variables in full, in this subsection we discuss briefly weakening on STLC terms, and state as a lemma that weakening preserves meaning. This lemma is needed in a key proof, the one of Theorem 12.2.2.

As usual, if a term t is well-typed in a given context Γ1, and context Γ2 extends Γ1 (which we write as Γ1 ⪯ Γ2), then t is also well-typed in Γ2.
Lemma A.2.7 (Weakening is admissible)
The following typing rule is admissible:

Γ1 ⊢ t : τ    Γ1 ⪯ Γ2
---------------------- Weaken
Γ2 ⊢ t : τ

Weakening also preserves semantics. If a term t is typed in context Γ1, evaluating it requires an environment matching Γ1. So if we weaken t to a bigger context Γ2, evaluation requires an extended environment matching Γ2, and is going to produce the same result.
Lemma A.2.8 (Weakening preserves meaning)
Take Γ1 ⊢ t : τ and ρ1 : ⟦Γ1⟧. If Γ1 ⪯ Γ2 and ρ2 : ⟦Γ2⟧ extends ρ1, then we have that

⟦t⟧ ρ1 = ⟦t⟧ ρ2.

Mechanizing these statements and their proofs requires some care. We have a meta-level type Term Γ τ of object terms having type τ in context Γ. Evaluation has type ⟦–⟧ : Term Γ τ → ⟦Γ⟧ → ⟦τ⟧, so ⟦t⟧ ρ1 = ⟦t⟧ ρ2 is not directly well-typed. To remedy this, we define formally the subcontext relation Γ1 ⪯ Γ2, and an explicit operation that weakens a term in context Γ1 to a corresponding term in bigger context Γ2: weaken : Γ1 ⪯ Γ2 → Term Γ1 τ → Term Γ2 τ. We define the subcontext relation Γ1 ⪯ Γ2 as a judgment, using order preserving embeddings.⁴ We refer to our mechanized proof for details, including auxiliary definitions and relevant lemmas.

⁴As mentioned by James Chapman at https://lists.chalmers.se/pipermail/agda/2011/003423.html, who attributes them to Conor McBride.


A.2.3 Substitution
Some facts can be presented using (capture-avoiding) substitution rather than environments, and we do so at some points, so let us fix notation. We write t[x := s] for the result of substituting variable x in term t by term s.

We have mostly avoided mechanizing proofs about substitution, but we have mechanized substitution following Keller and Altenkirch [2010], and proved the following substitution lemma.
Lemma A.2.9 (Substitution lemma)
For any term Γ ⊢ t : τ and variable x : σ bound in Γ, we write Γ − x for the result of removing variable x from Γ (as defined by Keller and Altenkirch). Take term Γ − x ⊢ s : σ and environment ρ : ⟦Γ − x⟧. Then, we have that substitution and evaluation commute as follows:

⟦t[x := s]⟧ ρ = ⟦t⟧ (ρ, x = ⟦s⟧ ρ)

A.2.4 Discussion: Our mechanization and semantic style
To formalize the meaning of our programs, we use denotational semantics, while nowadays most prefer operational semantics, in particular small-step. Hence, we next justify our choice and discuss related work.

We expect we could use other semantics techniques, such as big-step or small-step semantics. But at least for such a simple object language, working with denotational semantics as we use it is easier than other approaches in a proof assistant, especially in Agda:

• Our semantics ⟦–⟧ is a function, and not a relation like in small-step or big-step semantics.

• It is clear to Agda that our semantics is a total function, since it is structurally recursive.

• Agda can normalize ⟦–⟧ on partially-known terms when normalizing goals.

• The meanings of our programs are well-behaved Agda functions, not syntax, so we know "what they mean" and need not prove any lemmas about it. We need not prove, say, that evaluation is deterministic.

In Agda, the domains for our denotational semantics are simply Agda types, and semantic values are Agda values; in other words, we give a denotational semantics in terms of type theory. Using denotational semantics allows us to state the specification of differentiation directly in the semantic domain, and to take advantage of Agda's support for equational reasoning for proving equalities between Agda functions.
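As an illustration only (a Haskell sketch of ours, using GADTs in place of the Agda mechanization's dependent types; all names here are ours), the same style of intrinsically-typed terms with a total, structurally recursive evaluator can be rendered as:

{-# LANGUAGE GADTs #-}

-- Variables are typed de Bruijn indexes into an environment of nested pairs.
data Var env t where
  Here :: Var (env, t) t
  There :: Var env t → Var (env, s) t

-- Intrinsically-typed terms: only well-typed terms can be constructed.
data Term env t where
  Con :: Int → Term env Int -- a sample constant
  Var :: Var env t → Term env t
  Lam :: Term (env, s) t → Term env (s → t)
  App :: Term env (s → t) → Term env s → Term env t

-- Evaluation mirrors Fig. A.1e: a total function from terms and
-- environments to metalanguage values.
eval :: Term env t → env → t
eval (Con n) _ = n
eval (Var v) ρ = lookupVar v ρ
eval (Lam b) ρ = λa → eval b (ρ, a)
eval (App s t) ρ = eval s ρ (eval t ρ)

lookupVar :: Var env t → env → t
lookupVar Here (_, v) = v
lookupVar (There v) (ρ, _) = lookupVar v ρ

-- Example: eval (App (Lam (Var Here)) (Con 2)) () evaluates to 2.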

Related work Our variant is used, for instance, by McBride [2010], who attributes it to Augustsson and Carlsson [1999] and Altenkirch and Reus [1999]. In particular, Altenkirch and Reus [1999] already define our type Term Γ τ of simply-typed λ-terms t, well-typed with type τ in context Γ, while Augustsson and Carlsson [1999] define semantic domains by induction over types. Benton et al. [2012] and Allais et al. [2017] also discuss this approach to formalizing λ-terms, and discuss how to best prove various lemmas needed to reason, for instance, about substitution.

More in general, similar approaches are becoming more common when using proof assistants. Our denotational semantics could be otherwise called a definitional interpreter (which is in particular compositional), and mechanized formalizations using a variety of definitional interpreters are nowadays often advocated, either using denotational semantics [Chlipala 2008] or using functional big-step semantics. Functional semantics are so convenient that their use has been advocated even for languages that are not strongly normalizing [Owens et al. 2016; Amin and Rompf 2017], even at the cost of dealing with step-indexes.


A.2.5 Language plugins
Our object language is parameterized by language plugins (or just plugins) that encapsulate its domain-specific aspects.

In our examples, our language plugin will typically support integers and primitive operations on them. However, our plugin will also support various sorts of collections and base operations on them. Our first example of collection will be bags. Bags are unordered collections (like sets) where elements are allowed to appear more than once (unlike in sets); they are also called multisets.

Our formalization is parameterized over one language plugin providing all base types and primitives. In fact, we expect a language plugin to be composed out of multiple language plugins, merged together [Erdweg et al. 2012]. Our mechanization is mostly parameterized over language plugins, but see Appendix B.1 for discussion of a few limitations.

The sets of base types and primitive constants, as well as the types for primitive constants, are on purpose left unspecified and only defined by plugins: they are extension points. We write some extension points using ellipses ("...") and other ones by creating names, which typically use C as a superscript.

A plugin defines a set of base types ι, primitives c, and their denotational semantics ⟦c⟧C. As usual, we require that ⟦c⟧C : ⟦τ⟧ whenever c : τ.

Summary To sum up the discussion of plugins, we collect formally the plugin requirements we have mentioned in this chapter.

Plugin Requirement A.2.10 (Base types)
There is a set of base types ι, and for each there is a domain ⟦ι⟧.

Plugin Requirement A.2.11 (Constants)
There is a set of constants c. To each constant is associated a type τ, such that the constant has that type, that is, ⊢C c : τ, and the constant's semantics matches that type, that is, ⟦c⟧C : ⟦τ⟧.

After discussing the metalanguage of our proofs, the object language we study, and its semantics, we begin discussing incrementalization in the next chapter.


Appendix B

More on our formalization

B.1 Mechanizing plugins modularly and limitations
Next, we discuss our mechanization of language plugins in Agda, and its limitations. For the concerned reader, we can say these limitations affect essentially how modular our proofs are, and not their cogency.

In essence itrsquos not possible to express in Agda the correct interface for language plugins sosome parts of language plugins canrsquot be modularized as desirable Alternatively we can mechanizeany xed language plugin together with the main formalization which is not truly modular ormechanize a core formulation parameterized on a language plugin which that runs into a fewlimitations or encode plugins so they can be modularized and deal with encoding overhead

This section requires some Agda knowledge not provided here but we hope that readers familiarwith both Haskell and Coq will be able to follow along

Our mechanization is divided into multiple Agda modules Most modules have denitions thatdepend on language plugins directly or indirectly Those take denitions from language plugins asmodule parameters

For instance STLC object types are formalized through Agda type Type dened in moduleParametricSyntax Type The latter is parameterized over Base the type of base types

For instance Base can support a base type of integers and a base type of bags of elements oftype ι (where ι Base) Simplifying a few details our denition is equivalent to the following Agdacode

module ParametricSyntax.Type (Base : Set) where
  data Type : Set where
    base : (ι : Base) → Type
    _⇒_  : (σ : Type) → (τ : Type) → Type

-- Elsewhere, in plugin:
data Base : Set where
  baseInt : Base
  baseBag : (ι : Base) → Base
  -- ...

But with these definitions, we only have bags of elements of base type. If ι is a base type, base (baseBag ι) is the type of bags with elements of type base ι. Hence, we have bags of elements of base type. But we don't have a way to encode Bag τ if τ is an arbitrary non-base type, such as base baseInt ⇒ base baseInt (the encoding of object type ℤ → ℤ). Can we do better? If we ignore modularization, we can define types through the following mutually recursive datatypes:

mutual
  data Type : Set where
    base : (ι : Base) → Type
    _⇒_  : (σ : Type) → (τ : Type) → Type

  data Base : Set where
    baseInt : Base
    baseBag : (ι : Type) → Base

So far so good, but these types have to be defined together. We can go a step further by defining:

mutual
  data Type : Set where
    base : (ι : Base Type) → Type
    _⇒_  : (σ : Type) → (τ : Type) → Type

  data Base (Type : Set) : Set where
    baseInt : Base Type
    baseBag : (ι : Type) → Base Type

Here Base takes the type of object types as a parameter, and Type uses Base Type to tie the recursion. This definition still works, but only as long as Base and Type are defined together.

If we try to separate the definitions of Base and Type into different modules, we run into trouble:

module ParametricSyntax.Type (Base : Set → Set) where
  data Type : Set where
    base : (ι : Base Type) → Type
    _⇒_  : (σ : Type) → (τ : Type) → Type

-- Elsewhere, in plugin:
data Base (Type : Set) : Set where
  baseInt : Base Type
  baseBag : (ι : Type) → Base Type

Here Type is defined for an arbitrary function on types Base : Set → Set. However, this definition is rejected by Agda's positivity checker. Like Coq, Agda forbids defining datatypes that are not strictly positive, as they can introduce inconsistencies.

The above definition of Type is not strictly positive, because we could pass to it as argument Base = λτ → (τ → τ), so that Base Type = Type → Type, making Type occur in a negative position. However, the actual uses of Base we are interested in are fine. The problem is that we cannot inform the positivity checker that Base is supposed to be a strictly positive type function, because Agda doesn't supply the needed expressivity.

This problem is well-known. It could be solved if Agda function spaces supported positivity annotations,¹ or by encoding a universe of strictly-positive type constructors; this encoding is not fully transparent, and adds hard-to-hide noise to the development [Schwaab and Siek 2013]² (we sketch such an encoding at the end of this subsection). Few alternatives remain:

• We can forbid types from occurring in base types, as we did in our original paper [Cai et al. 2014]. There we did not discuss at all a recursive definition of base types.

¹As discussed in https://github.com/agda/agda/issues/570 and https://github.com/agda/agda/issues/2411.
²Using pattern synonyms and DISPLAY pragmas might successfully hide this noise.


• We can use the modular mechanization, disable positivity checking, and risk introducing inconsistencies. We tried this successfully in a branch of that formalization. We believe we did not introduce inconsistencies in that branch, but have no hard evidence.

• We can simply combine the core formalization with a sample plugin. This is not truly modular, because the core formalization can only be reused by copy-and-paste. Moreover, in dependently-typed languages, the content of a definition can affect whether later definitions typecheck, so alternative plugins using the same interface might not typecheck.³

Sadly, giving up on modularity altogether appears the more conventional choice. Either way, as we claimed at the outset, these modularity concerns only limit the modularity of the mechanized proofs, not their cogency.

³Indeed, if Base were not strictly positive, the application Type Base would be ill-typed, as it would fail positivity checking, even though Base : Set → Set does not require Base to be strictly positive.
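To illustrate the universe encoding mentioned above: one can define a datatype of codes for strictly-positive type constructors, and let plugins supply a code instead of a function Set → Set. The following is one standard minimal sketch, with our own names (Poly, ⟦_⟧Poly, μ), not the thesis mechanization; the promised noise shows up as soon as one must write and pattern-match on codes instead of ordinary types:

  open import Data.Sum     using (_⊎_)
  open import Data.Product using (_×_)

  -- Codes for strictly-positive type constructors.
  data Poly : Set₁ where
    K   : Set → Poly           -- constant functor
    Id  : Poly                 -- the recursive position
    _⊞_ : Poly → Poly → Poly   -- sum of functors
    _⊠_ : Poly → Poly → Poly   -- product of functors

  -- Every code denotes a function on Set, strictly positive by construction.
  ⟦_⟧Poly : Poly → Set → Set
  ⟦ K A   ⟧Poly X = A
  ⟦ Id    ⟧Poly X = X
  ⟦ p ⊞ q ⟧Poly X = ⟦ p ⟧Poly X ⊎ ⟦ q ⟧Poly X
  ⟦ p ⊠ q ⟧Poly X = ⟦ p ⟧Poly X × ⟦ q ⟧Poly X

  -- The positivity checker accepts fixpoints of arbitrary codes:
  data μ (p : Poly) : Set where
    ⟨_⟩ : ⟦ p ⟧Poly (μ p) → μ p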


Appendix C

(Un)typed ILC operationally

In Chapter 12 we have proved ILC correct for a simply-typed λ-calculus. What about other languages, with more expressive type systems, or no type system at all?

In this chapter, we prove that ILC is still correct in untyped call-by-value (CBV) λ-calculus. We do so without using denotational semantics, but using only an environment-based big-step operational semantics and step-indexed logical relations. The formal development in this chapter stands alone from the rest of the thesis, though we do not repeat ideas present elsewhere.

We prove ILC correct using, in increasing order of complexity:

1 STLC and standard syntactic logical relations

2 STLC and step-indexed logical relations

3 an untyped λ-calculus and step-indexed logical relations

We have fully mechanized the second proof in Agda,¹ and done the others on paper. In all cases, we prove the fundamental property for validity; we detail later which corollaries we prove in which case. The proof for untyped λ-calculus is the most interesting, but the others can serve as stepping stones. Yann Régis-Gianas, in collaboration with me, recently mechanized a similar proof for untyped λ-calculus in Coq, which appears in Sec. 17.3.²

Using operational semantics and step-indexed logical relations simplifies extending the proofs to more expressive languages, where denotational semantics or other forms of logical relations would require more sophistication, as argued by Ahmed [2006].

Proofs by (step-indexed) logical relations also promise to be scalable. All these proofs appear to be slight variants of proof techniques for logical program equivalence and parametricity, which are well-studied topics, suggesting the approach might scale to more expressive type systems. Hence, we believe these proofs clarify the relation with parametricity that has been noticed earlier [Atkey 2015]. However, actually proving ILC correct for a polymorphic language (such as System F) is left as future work.

We also expect that from our logical relations, one might derive a logical partial equivalence relation among changes, similarly to Sec. 14.2, but we leave such a development for future work.

Compared to earlier chapters, this one will be more technical and concise, because we have already introduced the ideas behind both ILC and logical relation proofs.

¹Source code available in this GitHub repo: https://github.com/inc-lc/ilc-agda.
²Mechanizing the proof for untyped λ-calculus is harder for purely technical reasons: mechanizing well-founded induction in Agda is harder than mechanizing structural induction.


Binding and environments. On the technical side, we are able to mechanize our proof without needing any technical lemmas about binding or weakening, thanks to a number of choices we mention later.

Among other reasons, we avoid lemmas on binding because, instead of substituting variables with arbitrary terms, we record mappings from variables to closed values via environments. We also used environments in Appendix A, but this time we use syntactic values rather than semantic ones. As a downside, on paper, using environments makes for more and bigger definitions, because we need to carry around both an environment and a term instead of merging them into a single term via substitution, and because values are not a subset of terms but an entirely separate category. But on paper the extra work is straightforward, and in a mechanized setting it is simpler than substitution, especially in a setting with an intrinsically typed term representation (see Appendix A.2.4).

Background/related work. Our development is inspired significantly by the use of step-indexed logical relations by Ahmed [2006] and Acar, Ahmed and Blume [2008]. We refer to those works and to Ahmed's lectures at OPLSS 2013³ for an introduction to (step-indexed) logical relations.

Intensional and extensional validity. Until this point, change validity only specifies how function changes behave, that is, their extension. Using operational semantics, we can also specify how valid function changes are concretely defined, that is, their intension. To distinguish the two concepts, we contrast extensional validity and intensional validity. Some extensionally valid changes are not intensionally valid, but such changes are never created by derivation. Defining intensional validity helps to understand function changes: function changes are produced from changes to values in environments, or from functions being replaced altogether. Requiring intensional validity helps to implement change operations such as ⊕ more efficiently, by operating on environments. Later, in Appendix D, we use similar ideas to implement change operations on defunctionalized function changes; intensional validity helps put such efforts on more robust foundations, even though we do not account formally for defunctionalization, but only for the use of closures in the semantics.

We use operational semantics to define extensional validity in Appendix C.3 (using plain logical relations) and Appendix C.4 (using step-indexed logical relations). We switch to an intensional validity definition in Appendix C.5.

Non-termination and general recursion. This proof implies correctness of ILC in the presence of general recursion, because untyped λ-calculus supports general recursion via fixpoint combinators. However, the proof only applies to terminating executions of base programs: like earlier authors [Acar, Ahmed and Blume 2008], we prove that if a function terminates against both a base input v1 and an updated one v2, its derivative terminates against the base input and a valid input change dv from v1 to v2.

We can also add a fixpoint construct to our typed λ-calculus and to our mechanization, without significant changes to our relations. However, a mechanical proof would require the use of well-founded induction, which our current mechanization avoids. We return to this point in Appendix C.7.

While this support for general recursion is effective in some scenarios, other scenarios can still be incrementalized better using structural recursion. More efficient support for general recursion is a separate problem that we do not tackle here and leave for future work; we refer to the discussion in Sec. 15.1.

Correctness statement. Our final correctness theorem is a variant of Corollary 13.4.5, which we repeat for comparison.

³https://www.cs.uoregon.edu/research/summerschool/summer13/curriculum.html


Corollary 13.4.5 (D⟦–⟧ is correct, corollary)
If Γ ⊢ t : τ and dρ ▷ ρ1 ↪ ρ2 : Γ, then ⟦t⟧ ρ1 ⊕ ⟦D⟦t⟧⟧ dρ = ⟦t⟧ ρ2.

We present our final correctness theorem statement in Appendix C.5. We anticipate it here for illustration; the new statement is more explicit about evaluation, but otherwise broadly similar.

Corollary C.5.6 (D⟦–⟧ is correct, corollary)
Take any term t that is well-typed (Γ ⊢ t : τ) and any suitable environments ρ1, dρ, ρ2, intensionally valid at any step count (∀k. (k, ρ1, dρ, ρ2) ∈ RG⟦Γ⟧). Assume t terminates in both the old environment ρ1 and the new environment ρ2, evaluating to output values v1 and v2 (ρ1 ⊢ t ⇓ v1 and ρ2 ⊢ t ⇓ v2). Then D⟦t⟧ evaluates in environment ρ1 and change environment dρ to a change value dv (ρ1, dρ ⊢∆ D⟦t⟧ ⇓ dv), and dv is a valid change from v1 to v2, so that v1 ⊕ dv = v2.

Overall, in this chapter we present the following contributions:

• We give an alternative presentation of derivation that can be mechanized without any binding-related lemmas, not even weakening-related ones, by introducing a separate syntax for change terms (Appendix C.1).

• We prove formally ILC correct for STLC, using big-step semantics and logical relations (Appendix C.3).

• We show formally (with pen-and-paper proofs) that our semantics is equivalent to small-step semantics definitions (Appendix C.2).

• We introduce a formalized step-indexed variant of our definitions and proofs for simply-typed λ-calculus (Appendix C.4), which scales directly to definitions and proofs for untyped λ-calculus.

• For typed λ-calculus, we also mechanize our step-indexed proof.

• In addition to (extensional) validity, we introduce a concept of intensional validity, which captures formally that function changes arise from changing environments or from functions being replaced altogether (Appendix C.5).

C.1 Formalization

To present the proofs, we first describe our formal model of CBV ANF λ-calculus. We define an untyped ANF language, called λA. We also define a simply-typed variant, called λA→, by adding on top of λA a separate Curry-style type system.

In our mechanization of λA→, however, we find it more convenient to define a Church-style type system (that is, a syntax that only describes typing derivations for well-typed terms), separately from the untyped language.

Terms resulting from differentiation satisfy additional invariants, and exploiting those invariants helps simplify the proof. Hence, we define separate languages for change terms produced from differentiation, again in untyped (λ∆A) and typed (λ∆A→) variants.

The syntax is summarized in Fig. C.1, the type systems in Fig. C.2, and the semantics in Fig. C.3. The base languages are mostly standard, while the change languages pick a few different choices.


p    = succ | add | …                       -- primitives
m, n = …                                    -- numbers
c    = n | …                                -- constants
w    = x | λx → t | 〈w, w〉 | c              -- neutral forms
s, t = w | w w | p w | let x = t in t       -- terms
v    = ρ [λx → t] | 〈v, v〉 | c              -- (closed) syntactic values
ρ    = x1 = v1, …, xn = vn                  -- environments

(a) Base terms (λA)

dc     = 0 | …
dw     = dx | λx dx → dt | 〈dw, dw〉 | dc
ds, dt = dw | dw w dw | 0p w dw | let x = t, dx = dt in dt
dv     = ρ, dρ [λx dx → dt] | 〈dv, dv〉 | dc
dρ     = dx1 = dv1, …, dxn = dvn

(b) Change terms (λ∆A)

DC⟦n⟧ = 0

D⟦x⟧ = dx
D⟦λx → t⟧ = λx dx → D⟦t⟧
D⟦〈wa, wb〉⟧ = 〈D⟦wa⟧, D⟦wb⟧〉
D⟦c⟧ = DC⟦c⟧
D⟦wf wa⟧ = D⟦wf⟧ wa D⟦wa⟧
D⟦p w⟧ = 0p w D⟦w⟧
D⟦let x = s in t⟧ = let x = s, dx = D⟦s⟧ in D⟦t⟧

(c) Differentiation

τ = N | τa × τb | σ → τ

∆N = N
∆(τa × τb) = ∆τa × ∆τb
∆(σ → τ) = σ → ∆σ → ∆τ

∆ε = ε
∆(Γ, x : τ) = ∆Γ, dx : ∆τ

(d) Types and contexts

Figure C.1: ANF λ-calculus, λA and λ∆A.

C.1.1 Types and contexts

We show the syntax of types, contexts and change types in Fig. C.1d. We introduce types for functions, binary products and naturals. Tuples can be encoded as usual through nested pairs. Change types are mostly like earlier, but this time we use naturals as changes for naturals (hence we cannot define a total ⊖ operation).
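For example, unfolding the equations of Fig. C.1d: ∆(N × N) = N × N, and ∆(N → N) = N → ∆N → ∆N = N → N → N, so a function change takes the base input together with an input change and returns an output change.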

We modify the definition of change contexts and environment changes to not contain entries for base values: in this presentation, we use separate environments for base variables and change variables. This choice avoids the need to define weakening lemmas.

C.1.2 Base syntax for λA

For convenience, we consider a λ-calculus in A-normal form. We do not parameterize this calculus over language plugins to reduce mechanization overhead, but we define separate syntactic categories for possible extension points.

We show the syntax of terms in Fig. C.1a.

Meta-variable v ranges over (closed) syntactic values, that is, evaluation results. Values are numbers, pairs of values, or closures. A closure is a pair of a function and an environment, as usual. Environments ρ are finite maps from variables to syntactic values; in our mechanization using de Bruijn indexes, environments are in particular finite lists of syntactic values.
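In Agda, this representation can be sketched as follows. This is a simplified fragment under our own names (Val, Env, num, pair, closure), with Term standing for the mechanization's actual type of de Bruijn terms:

  open import Data.List using (List)
  open import Data.Nat  using (ℕ)

  postulate Term : Set  -- stands for the actual de Bruijn terms

  mutual
    -- Syntactic values: numbers, pairs of values, and closures
    -- pairing a function body with the environment it closes over.
    data Val : Set where
      num     : ℕ → Val
      pair    : Val → Val → Val
      closure : Env → Term → Val   -- ρ [λx → t]

    -- With de Bruijn indexes, an environment is a finite list of
    -- values: position i holds the value of variable i.
    Env : Set
    Env = List Val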

Meta-variable t ranges over arbitrary terms and w ranges over neutral forms. Neutral forms evaluate to values in zero steps, but unlike values they can be open: a neutral form is either a


Typing judgement for base terms: Γ ⊢ t : τ

  T-Lit:   ⊢C n : N
  T-Succ:  ⊢P succ : N → N
  T-Add:   ⊢P add : (N × N) → N
  T-Var:   if x : τ ∈ Γ, then Γ ⊢ x : τ
  T-Const: if ⊢C c : τ, then Γ ⊢ c : τ
  T-App:   if Γ ⊢ wf : σ → τ and Γ ⊢ wa : σ, then Γ ⊢ wf wa : τ
  T-Lam:   if Γ, x : σ ⊢ t : τ, then Γ ⊢ λx → t : σ → τ
  T-Let:   if Γ ⊢ s : σ and Γ, x : σ ⊢ t : τ, then Γ ⊢ let x = s in t : τ
  T-Pair:  if Γ ⊢ wa : τa and Γ ⊢ wb : τb, then Γ ⊢ 〈wa, wb〉 : τa × τb
  T-Prim:  if ⊢P p : σ → τ and Γ ⊢ w : σ, then Γ ⊢ p w : τ

(a) λA→ base typing

Typing judgement for change terms: Γ ⊢∆ dt : τ

  T-DVar:   if x : τ ∈ Γ, then Γ ⊢∆ dx : τ
  T-DConst: if ⊢C c : ∆τ, then Γ ⊢∆ c : τ
  T-DApp:   if Γ ⊢∆ dwf : σ → τ, Γ ⊢ wa : σ and Γ ⊢∆ dwa : σ, then Γ ⊢∆ dwf wa dwa : τ
  T-DLet:   if Γ ⊢ s : σ, Γ ⊢∆ ds : σ and Γ, x : σ ⊢∆ dt : τ, then Γ ⊢∆ let x = s, dx = ds in dt : τ
  T-DLam:   if Γ, x : σ ⊢∆ dt : τ, then Γ ⊢∆ λx dx → dt : σ → τ
  T-DPair:  if Γ ⊢∆ dwa : τa and Γ ⊢∆ dwb : τb, then Γ ⊢∆ 〈dwa, dwb〉 : τa × τb
  T-DPrim:  if ⊢P p : σ → τ, Γ ⊢ w : σ and Γ ⊢∆ dw : σ, then Γ ⊢∆ 0p w dw : τ

(b) λ∆A→ change typing. Judgement Γ ⊢∆ dt : τ means that variables from both Γ and ∆Γ are in scope in dt, and the final type is in fact ∆τ.

Figure C.2: ANF λ-calculus, λA→ and λ∆A→ type system.


Evaluation judgement for base terms: ρ ⊢ t ⇓n v

  E-Val:  ρ ⊢ w ⇓0 V⟦w⟧ ρ
  E-Prim: ρ ⊢ p w ⇓1 P⟦p⟧ (V⟦w⟧ ρ)
  E-Let:  if ρ ⊢ s ⇓ns vs and (ρ, x = vs) ⊢ t ⇓nt vt, then ρ ⊢ let x = s in t ⇓1+ns+nt vt
  E-App:  if ρ ⊢ wf ⇓0 vf, ρ ⊢ wa ⇓0 va and app vf va ⇓n v, then ρ ⊢ wf wa ⇓1+n v

  app (ρ′ [λx → t]) va = (ρ′, x = va) ⊢ t

  V⟦x⟧ ρ = ρ(x)
  V⟦λx → t⟧ ρ = ρ [λx → t]
  V⟦〈wa, wb〉⟧ ρ = 〈V⟦wa⟧ ρ, V⟦wb⟧ ρ〉
  V⟦c⟧ ρ = c

  P⟦succ⟧ n = 1 + n
  P⟦add⟧ (m, n) = m + n

(a) Base semantics. Judgement ρ ⊢ t ⇓n v says that ρ ⊢ t (a pair of environment ρ and term t) evaluates to value v in n steps, and app vf va ⇓n v constructs the pair to evaluate via app vf va.

Evaluation judgement for change terms: ρ, dρ ⊢∆ dt ⇓ dv

  E-DVal:  ρ, dρ ⊢∆ dw ⇓ V∆⟦dw⟧ ρ dρ
  E-DPrim: ρ, dρ ⊢∆ 0p w dw ⇓ P∆⟦p⟧ (V⟦w⟧ ρ) (V∆⟦dw⟧ ρ dρ)
  E-DApp:  if ρ, dρ ⊢∆ dwf ⇓ dvf, ρ ⊢ wa ⇓ va, ρ, dρ ⊢∆ dwa ⇓ dva and dapp dvf va dva ⇓ dv, then ρ, dρ ⊢∆ dwf wa dwa ⇓ dv
  E-DLet:  if ρ ⊢ s ⇓ vs, ρ, dρ ⊢∆ ds ⇓ dvs and (ρ, x = vs), (dρ, dx = dvs) ⊢∆ dt ⇓ dvt, then ρ, dρ ⊢∆ let x = s, dx = ds in dt ⇓ dvt

  dapp (ρ′, dρ′ [λx dx → dt]) va dva = (ρ′, x = va), (dρ′, dx = dva) ⊢∆ dt

  V∆⟦dx⟧ ρ dρ = dρ(dx)
  V∆⟦λx dx → dt⟧ ρ dρ = ρ, dρ [λx dx → dt]
  V∆⟦〈dwa, dwb〉⟧ ρ dρ = 〈V∆⟦dwa⟧ ρ dρ, V∆⟦dwb⟧ ρ dρ〉
  V∆⟦c⟧ ρ dρ = c

  P∆⟦succ⟧ n dn = dn
  P∆⟦add⟧ 〈m, n〉 〈dm, dn〉 = dm + dn

(b) Change semantics. Judgement ρ, dρ ⊢∆ dt ⇓ dv says that ρ, dρ ⊢∆ dt (a triple of environment ρ, change environment dρ and change term dt) evaluates to change value dv, and dapp dvf va dva ⇓ dv constructs the triple to evaluate via dapp dvf va dva.

Figure C.3: ANF λ-calculus (λA and λ∆A), CBV semantics.


variable, a constant value c, a λ-abstraction, or a pair of neutral forms.

A term is either a neutral form, an application of neutral forms, a let expression, or an application of a primitive function p to a neutral form. Multi-argument primitives are encoded as primitives taking (nested) tuples of arguments. Here we use literal numbers as constants, and +1 and addition as primitives (to illustrate different arities), but further primitives are possible.

Notation C.1.1
We use subscripts a, b for pair components, f, a for function and argument, and keep using 1, 2 for old and new values.

C.1.3 Change syntax for λ∆A

Next, we consider a separate language for change terms, which can be transformed into the base language. This language supports directly the structure of change terms: base variables and change variables live in separate namespaces. As we show later, for the typed language, those namespaces are represented by typing contexts Γ and ∆Γ; that is, the typing context for change variables is always the change context for Γ.

We show the syntax of change terms in Fig. C.1b.

Change terms often take or bind two parameters at once, one for a base value and one for its change. Since a function change is applied to a base input and a change for it at once, the syntax for change terms has a special binary application node dwf wa dwa; otherwise, in ANF, such syntax must be encoded through separate applications via let dfa = dwf wa in dfa dwa. In the same way, closure changes ρ, dρ [λx dx → dt] bind two variables at once, and close over separate environments for base and change variables. Various other choices in the same spirit simplify similar formalization and mechanization details.

In change terms, we write 0p as syntax for the derivative of p, evaluated as such by the semantics. Strictly speaking, differentiation must map primitives to standard terms, so that the resulting programs can be executed by a standard semantics; hence, we should replace 0p by a concrete implementation of the derivative of p. However, doing so in a new formalization yields little additional insight, and requires writing concrete derivatives of primitives as de Bruijn terms.

C.1.4 Differentiation

We show differentiation in Fig. C.1c. Differentiation maps constructs in the language of base terms one-to-one to constructs in the language of change terms.

C.1.5 Typing λA→ and λ∆A→

We define typing judgements for λA→ base terms and for λ∆A→ change terms. We show the typing rules in Fig. C.2.

Typing for base terms is mostly standard. We use judgements ⊢P p and ⊢C c to specify typing of primitive functions and constant values. For change terms, one could expect a type system only proving judgements with shape Γ, ∆Γ ⊢ dt : ∆τ (where Γ, ∆Γ stands for the concatenation of Γ and ∆Γ). To simplify inversion on such judgements (especially in Agda), we write instead Γ ⊢∆ dt : τ, so that one can verify the following derived typing rule for D⟦–⟧:

T-Derive: if Γ ⊢ t : τ, then Γ ⊢∆ D⟦t⟧ : τ


We also use mutually recursive typing judgements, ⊨ v : τ for values and ⊨ ρ : Γ for environments, and similarly ⊨∆ dv : τ for change values and ⊨∆ dρ : Γ for change environments. We only show the (unsurprising) rules for ⊨ v : τ and omit the others. One could, alternatively and equivalently, define typing on syntactic values v by translating them to neutral forms w = v* (using unsurprising definitions in Appendix C.2) and reusing typing on terms, but as usual we prefer to avoid substitution.

Value typing judgement ⊨ v : τ:

  TV-Nat:  ⊨ n : N
  TV-Pair: if ⊨ va : τa and ⊨ vb : τb, then ⊨ 〈va, vb〉 : τa × τb
  TV-Lam:  if Γ, x : σ ⊢ t : τ and ⊨ ρ : Γ, then ⊨ ρ [λx → t] : σ → τ

C.1.6 Semantics

We present our semantics for base terms in Fig. C.3a. Our semantics gives meaning to pairs of an environment ρ and a term t, consistent with each other, which we write ρ ⊢ t. By "consistent" we mean that ρ contains values for all free variables of t and, in a typed language, values of compatible types (if Γ ⊢ t : τ, then ⊨ ρ : Γ). Judgement ρ ⊢ t ⇓n v says that ρ ⊢ t evaluates to value v in n steps. The definition is given via a CBV big-step semantics. Following Acar, Ahmed and Blume [2008], we index our evaluation judgements via a step count, which counts, in essence, β-reduction steps; we use such step counts later to define step-indexed logical relations. Since our semantics uses environments, β-reduction steps are implemented not via substitution but via environment extension, but the resulting step-counts are the same (Appendix C.2). Applying closure vf = ρ′ [λx → t] to argument va produces the environment-term pair (ρ′, x = va) ⊢ t, which we abbreviate as app vf va. We'll reuse this syntax later to define logical relations.
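As a small worked example of these step counts (not from the thesis, but directly instantiating the rules of Fig. C.3a): in environment ρ = (x = 1), we have ρ ⊢ succ x ⇓1 2 by E-Prim (since V⟦x⟧ ρ = 1 and P⟦succ⟧ 1 = 2), and (ρ, y = 2) ⊢ y ⇓0 2 by E-Val; hence ρ ⊢ let y = succ x in y ⇓1+1+0 2, that is, evaluation takes 2 steps by E-Let.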

In our mechanized formalization, we have additionally proved lemmas to ensure that this semantics is sound relative to our earlier denotational semantics (adapted for the ANF syntax).

Evaluation of primitives is delegated to function P⟦–⟧ –. We show complete equations for the typed case; for the untyped case, we must turn P⟦–⟧ – and P∆⟦–⟧ – into relations (or add explicit error results), but we omit the standard details (see also Appendix C.6). For simplicity, we assume evaluation of primitives takes one step. We conjecture higher-order primitives might need to be assigned different costs, but leave the details for future work.

We can evaluate neutral forms w to syntactic values v using a simple evaluation function V⟦w⟧ ρ, and use P⟦p⟧ v to evaluate primitives. When we need to omit indexes, we write ρ ⊢ t ⇓ v to mean that for some n we have ρ ⊢ t ⇓n v.

We can also define an analogous non-indexed big-step semantics for change terms; we present it in Fig. C.3b.

C.1.7 Type soundness

Evaluation preserves types in the expected way.

Lemma C.1.2 (Big-step preservation)
1. If Γ ⊢ t : τ, ⊨ ρ : Γ and ρ ⊢ t ⇓n v, then ⊨ v : τ.
2. If Γ ⊢∆ dt : τ, ⊨ ρ : Γ, ⊨∆ dρ : Γ and ρ, dρ ⊢∆ dt ⇓ dv, then ⊨∆ dv : τ.


Proof. By structural induction on evaluation judgements. In our intrinsically typed mechanization, this is actually part of the definition of values and big-step evaluation rules.

To ensure that our semantics for λA→ is complete for the typed language, instead of proving a small-step progress lemma or extending the semantics with errors, we just prove that all typed terms normalize in the standard way. As usual, this fails if we add fixpoints, or for untyped terms. If we wanted to ensure type safety in such a case, we could switch to functional big-step semantics or definitional interpreters [Amin and Rompf 2017; Owens et al. 2016].

Theorem C.1.3 (CBV normalization)
For any well-typed and closed term ⊢ t : τ, there exist a step count n and a value v such that ⊢ t ⇓n v.

Proof. A variant of standard proofs of normalization for STLC [Pierce 2002, Ch. 12], adapted for big-step semantics rather than small-step semantics (similarly to Appendix C.3). We omit the needed definitions and refer interested readers to our Agda mechanization.

We haven't attempted to prove this lemma for arbitrary change terms (though we expect we could prove it by defining an erasure to the base language and relating evaluation results), but we prove it for the result of differentiating well-typed terms in Corollary C.3.2.

C.2 Validating our step-indexed semantics

In this section, we show how we ensure that the step counts in our base semantics are set correctly, and how we can relate this environment-based semantics to more conventional semantics based on substitution and/or small-step. We only consider the core calculus, without primitives, constants and pairs. Results from this section are useful to understand our semantics better, and as a design aid for modifying it, but they are not necessary to the proof, so we have not mechanized them; our goal is to use environment-based big-step semantics in our mechanization.

To this end, we relate our semantics first with a big-step semantics based on substitution (rather than environments), and then relate this alternative semantics to a small-step semantics.

As a consequence, we also conjecture that our logical relations and proofs could be adapted to small-step semantics, along the lines of Ahmed [2006]. We however do not find that necessary. While small-step semantics gives meaning to non-terminating programs, and that is important for type soundness proofs, it does not seem useful (or possible) to try to incrementalize them, or to ensure we do so correctly.

In proofs using step-indexed logical relations, the use of step-counts in definitions is often delicate and tricky to get right. But Acar, Ahmed and Blume provide a robust recipe to ensure correct step-indexing in the semantics. To be able to prove the fundamental property of logical relations, we ensure step-counts agree with the ones induced by small-step semantics (which counts β-reductions). Such a lemma is not actually needed in other proofs, but is only useful as a sanity check. We also attempted using the style of step-indexing used by Amin and Rompf [2017], but were unable to produce a proof. To the best of our knowledge, all proofs using step-indexed logical relations, even those with functional big-step semantics [Owens et al. 2016], use step-indexing that agrees with small-step semantics.

Unlike Acar, Ahmed and Blume, we use environments in our big-step semantics; this avoids the need to define substitution in our mechanized proof. Nevertheless, one can show the two semantics correspond to each other. Our environments ρ can be regarded as closed value substitutions, as long as we also substitute away environments in values. Formally, we write ρ*(t) for the "homomorphic extension" of substitution ρ to terms, which produces other terms. If v is a value using environments, we write w = v* for the result of translating that value to not use environments; this operation produces a closed neutral form w. Operations ρ*(t) and v* can be defined in a mutually recursive way:

ρ*(x) = (ρ(x))*
ρ*(λx → t) = λx → ρ*(t)
ρ*(w1 w2) = ρ*(w1) ρ*(w2)
ρ*(let x = t1 in t2) = let x = ρ*(t1) in ρ*(t2)

(ρ [λx → t])* = λx → ρ*(t)

If ρ ⊢ t ⇓n v in our semantics, a standard induction over the derivation of ρ ⊢ t ⇓n v shows that ρ*(t) ⇓n v* in a more conventional big-step semantics, using substitution rather than environments (also following Acar, Ahmed and Blume):

  Var′: x ⇓0 x
  Lam′: λx → t ⇓0 λx → t
  App′: if t[x = w2] ⇓n w′, then (λx → t) w2 ⇓1+n w′
  Let′: if t1 ⇓n1 w1 and t2[x = w1] ⇓n2 w2, then let x = t1 in t2 ⇓1+n1+n2 w2

In this form, it is more apparent that the step indexes count steps of β-reduction or substitution. It's also easy to see that this big-step semantics agrees with a standard small-step semantics ↦ (which we omit): if t ⇓n w, then t ↦n w. Overall, the two statements can be composed, so our original semantics agrees with small-step semantics: if ρ ⊢ t ⇓n v, then ρ*(t) ⇓n v*, and finally ρ*(t) ↦n v*. Hence, we can translate evaluation derivations using big-step semantics to derivations using small-step semantics, with the same step count.

However, to state and prove the fundamental property, we need not prove that our semantics is sound relative to some other semantics. We simply define the appropriate logical relation for validity and show it agrees with a suitable definition for ⊕.

Having defined our semantics, we proceed to define extensional validity.

C.3 Validity syntactically (λA→, λ∆A→)

For our typed language λA→, at least as long as we do not add fixpoint operators, we can define logical relations using big-step semantics but without using step-indexes. The resulting relations are well-founded only because they use structural recursion on types. We present in Fig. C.4 the needed definitions, as a stepping stone to the definitions using step-indexed logical relations.

Following Ahmed [2006] and Acar, Ahmed and Blume [2008], we encode extensional validity through two mutually recursive type-indexed families of ternary logical relations: RV⟦τ⟧ over closed values and RC⟦τ⟧ over terms (and environments).

These relations are analogous to notions we considered earlier and express similar informal notions:

• With denotational semantics, we write dv ▷ v1 ↪ v2 : τ to say that change value dv ∈ ⟦∆τ⟧ is a valid change from v1 to v2 at type τ. With operational semantics, instead, we write (v1, dv, v2) ∈ RV⟦τ⟧, where v1, dv and v2 are now closed syntactic values.


• For terms, with denotational semantics, we write ⟦dt⟧ dρ ▷ ⟦t1⟧ ρ1 ↪ ⟦t2⟧ ρ2 : τ to say that dt is a valid change from t1 to t2, considering the respective environments. With operational semantics, instead, we write (ρ1 ⊢ t1, ρ dρ ⊢∆ dt, ρ2 ⊢ t2) ∈ RC⟦τ⟧.

Since we use Church typing and only mechanize typed terms, we must include in all cases appropriate typing assumptions.

Relation RC⟦τ⟧ relates tuples of environments and computations, ρ1 ⊢ t1, ρ dρ ⊢∆ dt and ρ2 ⊢ t2: it holds if, whenever t1 evaluates in environment ρ1 to v1 and t2 evaluates in environment ρ2 to v2, then dt must evaluate in environments ρ and dρ to a change value dv with v1, dv, v2 related by RV⟦τ⟧. The environments themselves need not be related: this definition characterizes validity extensionally, that is, it can relate t1, dt and t2 that have unrelated implementations and unrelated environments — in fact, even unrelated typing contexts. This flexibility is useful when relating closures of type σ → τ: two closures might be related even if they close over environments of different shape. For instance, closures v1 = [λx → 0] and v2 = (y = 0) [λx → y] are related by a nil change such as dv = [λx dx → 0]. In Appendix C.5 we discuss instead an intensional definition of validity.

In particular, for function types, the relation RV⟦σ → τ⟧ relates function values f1, df and f2 if they map related input values (and, for df, input changes) to related output computations.

We also extend the relation on values to environments via RG⟦Γ⟧: environments are related if their corresponding entries are related values.

RV⟦N⟧       = { (n1, dn, n2) | n1, dn, n2 ∈ N ∧ n1 + dn = n2 }
RV⟦τa × τb⟧ = { (〈va1, vb1〉, 〈dva, dvb〉, 〈va2, vb2〉) |
                (va1, dva, va2) ∈ RV⟦τa⟧ ∧ (vb1, dvb, vb2) ∈ RV⟦τb⟧ }
RV⟦σ → τ⟧   = { (vf1, dvf, vf2) |
                ⊨ vf1 : σ → τ ∧ ⊨∆ dvf : σ → τ ∧ ⊨ vf2 : σ → τ ∧
                ∀(v1, dv, v2) ∈ RV⟦σ⟧.
                  (app vf1 v1, dapp dvf v1 dv, app vf2 v2) ∈ RC⟦τ⟧ }

RC⟦τ⟧ = { (ρ1 ⊢ t1, ρ dρ ⊢∆ dt, ρ2 ⊢ t2) |
          (∃Γ1 Γ Γ2. Γ1 ⊢ t1 : τ ∧ Γ ⊢∆ dt : τ ∧ Γ2 ⊢ t2 : τ) ∧
          ∀v1 v2. (ρ1 ⊢ t1 ⇓ v1 ∧ ρ2 ⊢ t2 ⇓ v2) ⇒
            ∃dv. ρ dρ ⊢∆ dt ⇓ dv ∧ (v1, dv, v2) ∈ RV⟦τ⟧ }

RG⟦ε⟧        = { () }
RG⟦Γ, x : τ⟧ = { ((ρ1, x = v1), (dρ, dx = dv), (ρ2, x = v2)) |
                 (ρ1, dρ, ρ2) ∈ RG⟦Γ⟧ ∧ (v1, dv, v2) ∈ RV⟦τ⟧ }

Γ ⊨ dt ▷ t1 ↪ t2 : τ = ∀(ρ1, dρ, ρ2) ∈ RG⟦Γ⟧.
                         (ρ1 ⊢ t1, ρ1 dρ ⊢∆ dt, ρ2 ⊢ t2) ∈ RC⟦τ⟧

Figure C.4: Defining extensional validity via logical relations and big-step semantics.


Given these definitions, one can prove the fundamental property:

Theorem C.3.1 (Fundamental property: correctness of D⟦–⟧)
For every well-typed term Γ ⊢ t : τ, we have Γ ⊨ D⟦t⟧ ▷ t ↪ t : τ.

Proof sketch. By induction on the structure of terms, using ideas similar to Theorem 12.2.2.

It also follows that D⟦t⟧ normalizes:

Corollary C.3.2 (D⟦–⟧ normalizes to a nil change)
For any well-typed and closed term ⊢ t : τ, there exist a value v and a change value dv such that ⊢ t ⇓ v, ⊢∆ D⟦t⟧ ⇓ dv, and (v, dv, v) ∈ RV⟦τ⟧.

Proof. A corollary of the fundamental property and of the normalization theorem (Theorem C.1.3): since t is well-typed and closed, it normalizes to a value v for some step count (⊢ t ⇓ v). The empty environment change is valid, () ∈ RG⟦ε⟧, so from the fundamental property we get that (⊢ t, ⊢∆ D⟦t⟧, ⊢ t) ∈ RC⟦τ⟧. From the definition of RC⟦τ⟧ and ⊢ t ⇓ v, it follows that there exists dv such that ⊢∆ D⟦t⟧ ⇓ dv and (v, dv, v) ∈ RV⟦τ⟧.

Remark C.3.3
Compared to prior work, these relations are unusual for two reasons. First, instead of just relating two executions of a term, we relate two executions of a term with an execution of a change term. Second, most such logical relations (including Ahmed [2006]'s, but except Acar et al. [2008]'s) define a logical relation (sometimes called logical equivalence) that characterizes contextual equivalence, while we don't.

Consider a logical equivalence defined through sets RC⟦τ⟧ and RV⟦τ⟧. If (t1, t2) ∈ RC⟦τ⟧ holds and t1 terminates (with result v1), then t2 must terminate as well (with result v2), and their results v1 and v2 must in turn be logically equivalent ((v1, v2) ∈ RV⟦τ⟧). And at base types like N, (v1, v2) ∈ RV⟦N⟧ means that v1 = v2.

Here, instead, the fundamental property relates two executions of a term on different inputs, which might take different paths during execution. In a suitably extended language, we could even write term t = λx → if x = 0 then 1 else loop and run it on inputs v1 = 0 and v2 = 1: these inputs are related by change dv = 1, but t will converge on v1 and diverge on v2. We must use a semantics that allows such behavioral difference. Hence, at base type N, (v1, dv, v2) ∈ RV⟦N⟧ means just that dv is a change from v1 to v2, hence that v1 ⊕ dv is equivalent to v2, because ⊕ agrees with extensional validity in this context as well. And if (ρ1 ⊢ t1, ρ dρ ⊢∆ dt, ρ2 ⊢ t2) ∈ RC⟦τ⟧, term t1 might converge while t2 diverges; only if both converge must their results be related.

These subtleties become more relevant in the presence of general recursion and non-terminating programs, as in the untyped language λA, or in a hypothetical extension of λA→ with fixpoint operators.

C.4 Step-indexed extensional validity (λA→, λ∆A→)

Step-indexed logical relations define approximations to a relation, to enable dealing with non-terminating programs. Logical relations relate the behavior of multiple terms during evaluation; with step-indexed logical relations, we can take a bound k and restrict attention to evaluations that take at most k steps overall, as made precise by the definitions. Crucially, entities related at step count k are also related at any step count j < k. Conversely, the higher the step count, the more precise the defined relation. In the limit, if entities are related at all step counts, we simply say they are related. This construction of limits resembles various other approaches to constructing relations by approximations, but the entire framework remains elementary. In particular, the relations are defined simply because they are well-founded (that is, only defined by induction on smaller numbers).


Proofs of logical relation statements need to deal with step-indexes, but when they work (as here), they are otherwise not much harder than other syntactic or logical relation proofs.

For instance, if we define equivalence as a step-indexed logical relation, we can say that two terms are equivalent for k or fewer steps, even if they might have different behavior with more steps available. In our case, we can say that a change appears valid at step count k if it behaves like a valid change in "observations" using at most k steps.

Like before, we define a relation on values and one on computations, as sets RV⟦τ⟧ and RC⟦τ⟧. Instead of indexing the sets with the step-count, we add the step counts to the tuples they contain; so, for instance, (k, v1, dv, v2) ∈ RV⟦τ⟧ means that value v1, change value dv and value v2 are related at step count k (or k-related), and similarly for RC⟦τ⟧.

The details of the relation definitions are subtle, but they follow closely the use of step-indexing by Acar, Ahmed and Blume [2008]. We add mention of changes, and must decide how to use step-indexing for them.

How step-indexing proceeds. We explain gradually, in words, how the definition proceeds.

First, we say k-related function values take j-related arguments to j-related results, for all j less than k. That is reflected in the definition for RV⟦σ → τ⟧: it contains (k, vf1, dvf, vf2) if, for all (j, v1, dv, v2) ∈ RV⟦σ⟧ with j < k, the results of application are also j-related. However, the results of application are not syntactic applications encoding vf1 v1.⁴ It is instead necessary to use app vf1 v1, the result of one step of reduction. The two definitions are not equivalent, because a syntactic application would take one extra step to reduce.

The definition for computations takes longer to describe. Roughly speaking, computations are k-related if, after j steps of evaluation (with j < k), they produce values related at k − j steps; in particular, if the computations happen to be neutral forms and evaluate in zero steps, they're k-related as computations if the values they produce are k-related. In fact, the rule for evaluation has a wrinkle. Formally, instead of saying that computations (ρ1 ⊢ t1, ρ dρ ⊢∆ dt, ρ2 ⊢ t2) are k-related, we say that (k, ρ1 ⊢ t1, ρ dρ ⊢∆ dt, ρ2 ⊢ t2) ∈ RC⟦τ⟧. We do not require all three computations to take j steps. Instead, if the first computation ρ1 ⊢ t1 evaluates to a value in j < k steps, and the second computation ρ2 ⊢ t2 evaluates to a value in any number of steps, then the change computation ρ dρ ⊢∆ dt must also terminate, to a change value dv (in an unspecified number of steps), and the resulting values must be related at step-count k − j (that is, (k − j, v1, dv, v2) ∈ RV⟦τ⟧).

What is new in the above definition is the addition of changes, and the choice to allow change term dt to evaluate to dv in an unbounded number of steps (like t2), as no bound is necessary for our proofs. This is why the semantics we defined for change terms has no step counts.

Well-foundedness of step-indexed logical relations has a small wrinkle, because k − j need not be strictly less than k. But we define the two relations in a mutually recursive way, and the pair of relations at step-count k is defined in terms of the pair of relations at smaller step-counts. All other recursive uses of relations are at smaller step-indexes.

In this section, we index the relation by both types and step-indexes, since this is the version we use in our mechanized proof; this relation is defined by structural induction on types. We show this definition in Fig. C.5. Instead, in Appendix C.6, we consider untyped λ-calculus and drop types. The resulting definition is very similar, but it is defined by well-founded recursion on step-indexes.

Again, since we use Church typing and only mechanize typed terms, we must include in all cases appropriate typing assumptions. This choice does not match Ahmed [2006], but is one alternative she describes as equivalent. Indeed, while adapting the proof, the extra typing assumptions and proof obligations were not a problem.

⁴That happens to be illegal syntax in this presentation, but it can be encoded, for instance, as (f = vf1, x = v1) ⊢ f x; and the problem is more general.


RV⟦N⟧       = { (k, n1, dn, n2) | n1, dn, n2 ∈ N ∧ n1 + dn = n2 }
RV⟦τa × τb⟧ = { (k, 〈va1, vb1〉, 〈dva, dvb〉, 〈va2, vb2〉) |
                (k, va1, dva, va2) ∈ RV⟦τa⟧ ∧ (k, vb1, dvb, vb2) ∈ RV⟦τb⟧ }
RV⟦σ → τ⟧   = { (k, vf1, dvf, vf2) |
                ⊨ vf1 : σ → τ ∧ ⊨∆ dvf : σ → τ ∧ ⊨ vf2 : σ → τ ∧
                ∀(j, v1, dv, v2) ∈ RV⟦σ⟧. j < k ⇒
                  (j, app vf1 v1, dapp dvf v1 dv, app vf2 v2) ∈ RC⟦τ⟧ }

RC⟦τ⟧ = { (k, ρ1 ⊢ t1, ρ dρ ⊢∆ dt, ρ2 ⊢ t2) |
          (∃Γ1 Γ Γ2. Γ1 ⊢ t1 : τ ∧ Γ ⊢∆ dt : τ ∧ Γ2 ⊢ t2 : τ) ∧
          ∀j v1 v2. (j < k ∧ ρ1 ⊢ t1 ⇓j v1 ∧ ρ2 ⊢ t2 ⇓ v2) ⇒
            ∃dv. ρ dρ ⊢∆ dt ⇓ dv ∧ (k − j, v1, dv, v2) ∈ RV⟦τ⟧ }

RG⟦ε⟧        = { (k) }
RG⟦Γ, x : τ⟧ = { (k, (ρ1, x = v1), (dρ, dx = dv), (ρ2, x = v2)) |
                 (k, ρ1, dρ, ρ2) ∈ RG⟦Γ⟧ ∧ (k, v1, dv, v2) ∈ RV⟦τ⟧ }

Γ ⊨ dt ▷ t1 ↪ t2 : τ = ∀(k, ρ1, dρ, ρ2) ∈ RG⟦Γ⟧.
                         (k, ρ1 ⊢ t1, ρ1 dρ ⊢∆ dt, ρ2 ⊢ t2) ∈ RC⟦τ⟧

Figure C.5: Defining extensional validity via step-indexed logical relations and big-step semantics.

At this moment, we do not require that related closures contain related environments: again, we are defining extensional validity.

Given these definitions, we can prove that all relations are downward-closed: that is, relations at step-count n imply relations at any smaller step-count k.

Lemma C.4.1 (Extensional validity is downward-closed)
Assume k ≤ n.

1. If (n, v1, dv, v2) ∈ RV⟦τ⟧, then (k, v1, dv, v2) ∈ RV⟦τ⟧.

2. If (n, ρ1 ⊢ t1, ρ dρ ⊢∆ dt, ρ2 ⊢ t2) ∈ RC⟦τ⟧, then (k, ρ1 ⊢ t1, ρ dρ ⊢∆ dt, ρ2 ⊢ t2) ∈ RC⟦τ⟧.

3. If (n, ρ1, dρ, ρ2) ∈ RG⟦Γ⟧, then (k, ρ1, dρ, ρ2) ∈ RG⟦Γ⟧.

Proof sketch. For RV⟦τ⟧, case split on τ and expand hypothesis and thesis. If τ = N, they coincide. For RV⟦σ → τ⟧, parts of the hypothesis and thesis match. For some relation P, the rest of the hypothesis has shape ∀j < n. P(j, v1, dv, v2), and the rest of the thesis has shape ∀j < k. P(j, v1, dv, v2). Assume j < k; we must prove P(j, v1, dv, v2), but since j < k ≤ n, we can just apply the hypothesis.


The proof for RC⟦τ⟧ follows the same idea as RV⟦σ → τ⟧. For RG⟦Γ⟧, apply the theorem for RV⟦τ⟧ to each environment entry x : τ.

At this point, we prove the fundamental property:

Theorem C.4.2 (Fundamental property: correctness of D⟦–⟧)
For every well-typed term Γ ⊢ t : τ, we have Γ ⊨ D⟦t⟧ ▷ t ↪ t : τ.

Proof sketch. By structural induction on typing derivations, using ideas similar to Theorem C.3.1 and relying on Lemma C.4.1 to reduce step counts where needed.

C.5 Step-indexed intensional validity

Up to now, we have defined when a function change is valid extensionally, that is, purely based on its behavior, as we have done earlier when using denotational semantics. We conjecture that with these definitions one could define ⊕ and prove it agrees with extensional validity; however, we have not done so.

Instead, we modify the definitions in Appendix C.4 to define validity intensionally. To ensure that f1 ⊕ df = f2 (for a suitable ⊕), we choose to require that closures f1, df and f2 close over environments of matching shapes. This change does not complicate the proof of the fundamental lemma: all the additional proof obligations are automatically satisfied.

However, it can still be necessary to replace a function value with a different one. Hence, we extend our definition of values to allow replacement values. Closure replacements produce replacements as results, so we make replacement values into valid changes for all types. We must also extend the change semantics, both to allow evaluating closure replacements, and to allow derivatives of primitives to handle replacement values.

We have not added replacement values to the syntax, so currently they can just be added to change environments, but we don't expect adding them to the syntax would cause any significant trouble.

We present the changes described above for the typed semantics; we have successfully mechanized this variant of the semantics as well. Adding replacement values !v requires extending the definition of change values, evaluation and validity. We add replacement values to change values:

dv = … | !v

Derivatives of primitives, when applied to replacement changes, must recompute their output. The required additional equations are not interesting, but we show them anyway, for completeness:

P∆⟦succ⟧ n1 (!n2) = !(n2 + 1)
P∆⟦add⟧ 〈_, _〉 (!〈m2, n2〉) = !(m2 + n2)
P∆⟦add⟧ p1 (dp@〈dm, !n2〉) = !(P⟦add⟧ (p1 ⊕ dp))
P∆⟦add⟧ p1 (dp@〈!m, dv〉) = !(P⟦add⟧ (p1 ⊕ dp))

Evaluation requires a new rule, E-BangApp, to evaluate change applications where the function change evaluates to a replacement change:

E-BangApp: if ρ, dρ ⊢∆ dwf ⇓ !vf, ρ ⊢ wa ⇓ va, ρ, dρ ⊢∆ dwa ⇓ dva and app vf (va ⊕ dva) ⇓ v, then ρ, dρ ⊢∆ dwf wa dwa ⇓ !v

Evaluation rule E-BangApp requires defining ⊕ on syntactic values. We define it intensionally:


Definition C.5.1 (Update operator ⊕)
Operator ⊕ is defined on values by the following equations, where match ρ dρ (whose definition we omit) tests if ρ and dρ are environments for the same typing context Γ.

v1 ⊕ !v2 = v2
ρ [λx → t] ⊕ dρ [λx dx → dt] =
  if match ρ dρ then
    -- If ρ and dρ are environments for the same typing context Γ:
    (ρ ⊕ dρ) [λx → t]
  else
    -- otherwise, the input change is invalid, so just give
    -- any type-correct result:
    ρ [λx → t]
n ⊕ dn = n + dn
〈va1, vb1〉 ⊕ 〈dva, dvb〉 = 〈va1 ⊕ dva, vb1 ⊕ dvb〉

-- An additional equation is needed in the untyped language,
-- not in the typed one. This equation is for invalid
-- changes, so we can just return v1:
v1 ⊕ dv = v1

We define ⊕ on environments for matching contexts to combine values and changes pointwise:

(x1 = v1, …, xn = vn) ⊕ (dx1 = dv1, …, dxn = dvn) =
  (x1 = v1 ⊕ dv1, …, xn = vn ⊕ dvn)
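As a worked example (instantiating the equations above): (y = 1) [λx → y] ⊕ dv = (y = 3) [λx → y] whenever dv is a closure change whose change environment dρ = (dy = 2) matches the typing context of (y = 1), so that match succeeds and (y = 1) ⊕ (dy = 2) = (y = 1 ⊕ 2) = (y = 3). Only the environment is updated; the closure body is left unchanged.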

The definition of update for closures can only update them in few cases, but this is not a problem: as we show in a moment, we restrict validity to the closure changes for which it is correct.

We ensure replacement values are accepted as valid for all types by requiring that the following equation holds (hence modifying all equations for RV⟦–⟧; we omit details):

RV⟦τ⟧ ⊇ { (k, v1, !v2, v2) | ⊨ v1 : τ ∧ ⊨ v2 : τ }    (C.1)

where we write ⊨ v : τ to state that value v has type τ; we omit the unsurprising rules for this judgement.

To restrict valid closure changes, RV⟦σ → τ⟧ now requires that related elements satisfy predicate matchImpl vf1 dvf vf2, defined by the following equations:

matchImpl (ρ1 [λx → t])
          (ρ1, dρ [λx dx → D⟦t⟧])
          ((ρ1 ⊕ dρ) [λx → t]) = True
matchImpl _ _ _ = False

In other words, validity (k, vf1, dvf, vf2) ∈ RV⟦σ → τ⟧ now requires, via matchImpl, that the base closure environment ρ1 and the base environment of the closure change dvf coincide, that ρ2 = ρ1 ⊕ dρ, and that vf1 and vf2 have λx → t as body, while dvf has body λx dx → D⟦t⟧.

To define intensional validity for function changes, RV⟦σ → τ⟧ must use matchImpl and explicitly support replacement closures, to satisfy Eq. (C.1). Its definition is as follows:

RV⟦σ → τ⟧ = { (k, vf1, dvf, vf2) |
              ⊨ vf1 : σ → τ ∧ ⊨∆ dvf : σ → τ ∧ ⊨ vf2 : σ → τ ∧
              matchImpl vf1 dvf vf2 ∧
              ∀(j, v1, dv, v2) ∈ RV⟦σ⟧. j < k ⇒
                (j, app vf1 v1, dapp dvf v1 dv, app vf2 v2) ∈ RC⟦τ⟧ }
          ∪ { (k, vf1, !vf2, vf2) | ⊨ vf1 : σ → τ ∧ ⊨ vf2 : σ → τ }

Definitions of RV⟦–⟧ for other types can be similarly updated to support replacement changes and satisfy Eq. (C.1).

Using these updated definitions, we can again prove the fundamental property, with the same statement as Theorem C.4.2. Furthermore, we now prove that ⊕ agrees with validity.

Theorem C.5.2 (Fundamental property: correctness of D⟦–⟧)
For every well-typed term Γ ⊢ t : τ, we have Γ ⊨ D⟦t⟧ ▷ t ↪ t : τ.

Theorem C.5.3 (⊕ agrees with step-indexed intensional validity)
If (k, v1, dv, v2) ∈ RV⟦τ⟧, then v1 ⊕ dv = v2.

Proof. By induction on types. For type N, validity coincides with the thesis. For type τa × τb, we must apply the induction hypothesis on both pair components.

For closures, validity requires that v1 = ρ1 [λx → t], dv = dρ [λx dx → D⟦t⟧], v2 = ρ2 [λx → t], with ρ1 ⊕ dρ = ρ2, and that there exists Γ such that Γ, x : σ ⊢ t : τ. Moreover, from validity we can show that ρ1 and dρ have matching shapes: ρ1 is an environment matching Γ, and dρ is a change environment matching ∆Γ. Hence, v1 ⊕ dv can update the stored environment, and we can show the thesis by the following calculation:

v1 ⊕ dv = ρ1 [λx → t] ⊕ dρ [λx dx → dt]
        = (ρ1 ⊕ dρ) [λx → t] = ρ2 [λx → t] = v2

We can also define 0 intensionally, as a metafunction on values and environments, and prove it correct. For closures, we differentiate the body and recurse on the environment. The definition extends from values to environments variable-wise, so we omit the standard formal definition.

Definition C.5.4 (Nil changes 0)
Nil changes on values are defined as follows:

0ρ[λx→t] = ρ, 0ρ [λx dx → D⟦t⟧]
0〈a,b〉 = 〈0a, 0b〉
0n = 0
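For example, instantiating these equations for v = (y = 1) [λx → y]: 0v = (y = 1), (dy = 0) [λx dx → D⟦y⟧] = (y = 1), (dy = 0) [λx dx → dy], where the body is differentiated and the environment entry y = 1 is paired with its own nil change dy = 0.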

Lemma C.5.5 (0 produces valid changes)
For all values ⊨ v : τ and indexes k, we have (k, v, 0v, v) ∈ RV⟦τ⟧.

Proof sketch. By induction on v. For closures, we must apply the fundamental property (Theorem C.5.2) to D⟦t⟧.


Because 0 transforms closure bodies, we cannot define it internally to the language. This problem can be avoided by defunctionalizing functions and function changes, as we do in Appendix D.

We conclude with the overall correctness theorem, analogous to Corollary 13.4.5.

Corollary C.5.6 (D⟦–⟧ is correct, corollary)
Take any term t that is well-typed (Γ ⊢ t : τ) and any suitable environments ρ1, dρ, ρ2, intensionally valid at any step count (∀k. (k, ρ1, dρ, ρ2) ∈ RG⟦Γ⟧). Assume t terminates in both the old environment ρ1 and the new environment ρ2, evaluating to output values v1 and v2 (ρ1 ⊢ t ⇓ v1 and ρ2 ⊢ t ⇓ v2). Then D⟦t⟧ evaluates in environment ρ1 and change environment dρ to a change value dv (ρ1, dρ ⊢∆ D⟦t⟧ ⇓ dv), and dv is a valid change from v1 to v2, so that v1 ⊕ dv = v2.

Proof. Follows immediately from Theorem C.5.2 and Theorem C.5.3.

C.6 Untyped step-indexed validity (λA, λ∆A)

By removing mentions of types from step-indexed validity (intensional or extensional; we show the extensional definitions), we can adapt it to an untyped language. We can still distinguish between functions, numbers and pairs, by matching on values themselves instead of matching on types. Without types, typing contexts Γ now degenerate to lists of the free variables of a term; we still use them to ensure that environments contain enough valid entries to evaluate a term. Validity applies to terminating executions, hence we need not consider executions producing dynamic type errors when proving the fundamental property.

We show the resulting definitions for extensional validity in Fig. C.6, but we can also define intensional validity and prove the fundamental lemma for it. As mentioned earlier, for λA we must turn P⟦–⟧ – and P∆⟦–⟧ – into relations and update E-Prim accordingly.

The main difference in the proof is that, this time, the recursion used in the relations can only be proved to be well-founded because of the use of step-indexes; we omit the details [Ahmed 2006].

Otherwise, the proof proceeds just as earlier in Appendix C.4. We prove that the relations are downward-closed, just like in Lemma C.4.1 (we omit the new statement), and we prove the new fundamental lemma by induction on the structure of terms (not of typing derivations).

Theorem C.6.1 (Fundamental property: correctness of D⟦–⟧)
If FV(t) ⊆ Γ, then we have Γ ⊨ D⟦t⟧ ▷ t ↪ t.

Proof sketch. Similar to the proof of Theorem C.4.2, but by structural induction on terms and complete induction on step counts, not by induction on typing derivations.

However, we can use the induction hypothesis in the same way as in earlier proofs for typed languages: all uses of the induction hypothesis in the proof are on smaller terms, and some are also at smaller step counts.

C.7 General recursion in λA→ and λ∆A→

We have sketched informally in Sec. 15.1 how to support fixpoint combinators.

We have also extended our typed languages with a fixpoint combinator and proved it correct formally (though not mechanically yet). In this section, we include the needed definitions to make our claim precise. They are mostly unsurprising, if long to state.

Since we are in a call-by-value setting, we only add recursive functions, not recursive values in general. To this end, we replace λ-abstraction λx → t with recursive function rec f x → t, which binds both f and x in t, and replaces f with the function itself upon evaluation.


RV = { (k, n1, dn, n2) | n1, dn, n2 ∈ N ∧ n1 + dn = n2 }
   ∪ { (k, vf1, dvf, vf2) |
       ∀(j, v1, dv, v2) ∈ RV. j < k ⇒
         (j, app vf1 v1, dapp dvf v1 dv, app vf2 v2) ∈ RC }
   ∪ { (k, 〈va1, vb1〉, 〈dva, dvb〉, 〈va2, vb2〉) |
       (k, va1, dva, va2) ∈ RV ∧ (k, vb1, dvb, vb2) ∈ RV }

RC = { (k, ρ1 ⊢ t1, ρ dρ ⊢∆ dt, ρ2 ⊢ t2) |
       ∀j v1 v2. (j < k ∧ ρ1 ⊢ t1 ⇓j v1 ∧ ρ2 ⊢ t2 ⇓ v2) ⇒
         ∃dv. ρ dρ ⊢∆ dt ⇓ dv ∧ (k − j, v1, dv, v2) ∈ RV }

RG⟦ε⟧    = { (k) }
RG⟦Γ, x⟧ = { (k, (ρ1, x = v1), (dρ, dx = dv), (ρ2, x = v2)) |
             (k, ρ1, dρ, ρ2) ∈ RG⟦Γ⟧ ∧ (k, v1, dv, v2) ∈ RV }

Γ ⊨ dt ▷ t1 ↪ t2 = ∀(k, ρ1, dρ, ρ2) ∈ RG⟦Γ⟧.
                     (k, ρ1 ⊢ t1, ρ1 dρ ⊢∆ dt, ρ2 ⊢ t2) ∈ RC

Figure C.6: Defining extensional validity via untyped step-indexed logical relations and big-step semantics.

The associated small-step reduction rule would be (rec f x → t) v → t [x = v, f = rec f x → t]. As before, we formalize reduction using big-step semantics.

Typing rule T-Lam is replaced by a rule for recursive functions:

T-Rec: if Γ, f : σ → τ, x : σ ⊢ t : τ, then Γ ⊢ rec f x → t : σ → τ

We replace closures with recursive closures in the definition of values:

w = rec f x → t | …
v = ρ [rec f x → t] | …

We also modify the semantics for abstraction and application. Rules E-Val and E-App are unchanged; it is sufficient to adapt the definitions of V⟦–⟧ – and app, so that evaluation of a function value vf has access to vf in the environment:

V⟦rec f x → t⟧ ρ = ρ [rec f x → t]
app (vf@(ρ′ [rec f x → t])) va = (ρ′, f = vf, x = va) ⊢ t

Like in Haskell, we write x@p in equations to bind an argument as metavariable x and match it against pattern p.

Similarly, we modify the language of changes, the definition of differentiation, and the evaluation metafunctions V∆⟦–⟧ – and dapp. Since the derivative of a recursive function f = rec f x → t can call the base function, we remember the original function body t in the derivative, together with its derivative D⟦t⟧. This should not be surprising: in Sec. 15.1, where recursive functions are defined using letrec, a recursive function f is in scope in the body of its derivative df. Here we use a different syntax, but still ensure that f is in scope in the body of derivative df. The definitions are otherwise unsurprising, if long:

dw = drec f df x dx → t dt | …
dv = ρ, dρ [drec f df x dx → t dt] | …

D⟦rec f x → t⟧ = drec f df x dx → t D⟦t⟧

V∆⟦drec f df x dx → t dt⟧ ρ dρ = ρ, dρ [drec f df x dx → t dt]

dapp (dvf@(ρ′, dρ′ [drec f df x dx → t dt])) va dva =
  let vf = ρ′ [rec f x → t]
  in (ρ′, f = vf, x = va), (dρ′, df = dvf, dx = dva) ⊢∆ dt

Γ f σ rarr τ x σ ` t τ Γ f σ rarr τ x σ `∆ dt τΓ `∆ drec f df x dx rarr t dt σ rarr τ

T-DRec

We can adapt the proof of the fundamental property to the use of recursive functions

Theorem C71 (Fundamental property correctness of D n ndasho)For every well-typed term Γ ` t τ we have that Γ |= D n t o I t rarr t τ

Proof sketch Mostly as before modulo one interesting dierence To prove the fundamentalproperty for recursive functions at step-count k this time we must use the fundamental prop-erty inductively on the same term but at step-count j lt k This happens because to evaluatedw = D n rec f x rarr t o we evaluate D n t o with the value dv for dw in the environment to showthis invocation is valid we must show dw is itself a valid change But the step-indexed denition toRV nσ rarr τ o constrains the evaluation of the body only forallj lt k

C8 Future workWe have shown that oplus and 0 agree with validity which we consider a key requirement of a coreILC proof However change structures support further operators We leave operator for futurework though we are not aware of particular diculties However deserves special attention

C81 Change compositionWe have looked into change composition and it appears that composition of change expression isnot always valid but we conjecture that composition of change values preserves validity Showingthat change composition is valid appears related to showing that Ahmedrsquos logical equivalence is atransitive relation which is a subtle issue She only proves transitivity in a typed setting and witha stronger relation and her proof does not carry over directly indeed there is no correspondingproof in the untyped setting of Acar Ahmed and Blume [2008]

However the failure of transitivity we have veried is not overly worrisome the problem isthat transitivity is too strong an expectation in general Assume that Γ |= de1 I e1 rarr e2 andΓ |= de2 I e2 rarr e3 and try to show that Γ |= de1 de2 I e1 rarr e3 that is very roughly andignoring the environments we can assume that e1 and e3 terminate and have to show that their

Chapter C (Un)typed ILC operationally 213

result satisfy some properties To use both our hypotheses we need to know that e1 e2 and e3all terminate but we have no such guaranteed for e2 Indeed if e2 always diverges (because it issay the diverging term ω = (λx rarr x x) (λx rarr x x)) then de1 and de2 are vacuously valid Ife1 and e3 terminate we canrsquot expect de1 de2 to be a change between them To wit take e1 = 0e2 = ω e3 = 10 and de1 = de2 = 0 We can verify that for any Γ we have Γ |= de1 I e1 rarr e2 andΓ |= de2 I e2 rarr e3 while Γ |= de1 de2 I e1 rarr e3 means the absurd Γ |= 0 0 I 0 rarr 10

Apossible x Does transitivity hold if e2 terminates If (k e1 de1 e2) isin RC nτ o and (k e2 de2 e3) isinRC nτ o we still cannot conclude anything But like in Ahmed [2006] if e2 amd e3 are related at allstep counts that is if (k e1 de1 e2) isin RC nτ o and (foralln (n e2 de2 e3) isin RC nτ o) and if additionallye2 terminates we conjecture that Ahmedrsquos proof goes through We have however not yet examinedall details

C9 Development historyThe proof strategy used in this chapter comes from a collaboration between me and Yann Reacutegis-Gianas who came up with the general strategy and the rst partial proofs for untyped λ-calculiAfter we both struggled for a while to set up step-indexing correctly enough for a full proof I rstmanaged to give the denitions in this chapter and complete the proofs here described Reacutegis-Gianasthen mechanized a variant of our proof for untyped λ-calculus in Coq [Giarrusso et al Submitted]that appears here in Sec 173 That proof takes a few dierent choices and unfortunately strictlyspeaking neither proof subsumes the other (1) We also give a non-step-indexed syntactic prooffor simply-typed λ-calculus together with proofs dening validity extensionally (2) To supportremembering intermediate results by conversion to cache-transfer-style (CTS) Reacutegis-Gianasrsquo proofuses a lambda-lifted ArsquoNF syntax instead of plain ANF (3) Reacutegis-Gianasrsquo formalization adds tochange values a single token 0 which is a valid nil change for all valid values Hence if we knowa change is nil we can erase it As a downside evaluating df a da when df is 0 requires lookingup f In our presentation instead if f and g are dierent function values they have dierent nilchanges Such nil changes carry information so they can be evaluated directly but they cannot beerased Techniques in Appendix D enable erasing and reconstructing a nil change df for functionvalue f as long as the value of f is available

C10 ConclusionIn this chapter we have shown how to construct novel models for ILC by using (step-indexed)logical relations and have used this technique to deliver a new syntactic proof of correctness for ILCfor simply-typed lambda-calculus and to deliver the rst proof of correctness of ILC for untypedλ-calculus Moreover our proof appears rather close to existing logical-relations proofs hence webelieve it should be possible to translate other results to ILC theorems

By formally dening intensional validity for closures we provide a solid foundation for the useof defunctionalized function changes (Appendix D)

This proof builds on Ahmed [2006]rsquos work on step-indexed logical relations which enablehandling of powerful semantics feature using rather elementary techniques The only downside isthat it can be tricky to set up the correct denitions especially for a slightly non-standard semanticslike ours As an exercise we have shown that the our semantics is equivalent to more conventionalpresentations down to the produced step counts

214 Chapter C (Un)typed ILC operationally

Appendix D

Defunctionalizing functionchanges

In Chapter 12 and throughout most of Part II we represent function changes as functions which canonly be applied However incremental programs often inspect changes to decide how to react to themmost eciently Also inspecting function changes would help performance further Representingfunction changes as closures as we do in Chapter 17 and Appendix C allows implementing someoperations more ecient but is not fully satisfactory In this chapter we address these restrictionsby defunctionalizing functions and function changes so that we can inspect both at runtime withoutrestrictions

Once we defunctionalize function changes we can detect at runtime whether a function changeis nil As we have mentioned in Sec 163 nil function changes can typically be handled moreeciently For instance consider t = map f xs a term that maps function f to each element ofsequence xs In general D n t o = dmap f df xs dxs must handle any change in dxs (which weassume to be small) but also apply function change df to each element of xs (which we assume tobe big) However if df is nil we can skip this step decreasing time complexity of dmap f df xs dxsfrom O (|xs | + |dxs |) to O (|dxs |)

We will also present a change structure on defunctionalized function changes and show thatoperations on defunctionalized function changes become cheaper

D1 SetupWe write incremental programs based on ILC by manually writing Haskell code containing bothmanually-written plugin code and code that is transformed systematically based on informalgeneralizations and variants of D n ndash o Our main goal is to study variants of dierentiation andof encodings in Haskell while also studying language plugins for non-trivial primitives such asdierent sorts of collections We make liberal use of GHC extensions where needed

Code in this chapter has been extracted and type-checked though we elide a few details (mostlylanguage extensions and imports from the standard library) Code in this appendix is otherwiseself-contained We have also tested a copy of this code more extensively

As sketched before we dene change structure inside Haskell

class ChangeStruct t where-- Next line declares ∆t as an injective type function

215

216 Chapter D Defunctionalizing function changes

type ∆t = r | r rarr t(oplus) t rarr ∆t rarr toreplace t rarr ∆t

class ChangeStruct t rArr NilChangeStruct t where0 t rarr ∆t

class ChangeStruct arArr CompChangeStruct a where-- Compose change dx1 with dx2 so that-- x oplus (dx1 dx2) equiv x oplus dx1 oplus dx2() ∆ararr ∆ararr ∆a

With this code we dene type classes ChangeStruct NilChangeStruct and CompChangeStruct Weexplain each of the declared members in turn

First type ∆t represents the change type for type t To improve Haskell type inference wedeclare that ∆ is injective so that ∆t1 = ∆t2 implies t1 = t2 This forbids some potentially usefulchange structures but in exchange it makes type inference vastly more usable

Next we declare oplus 0 and as available to object programs Last we introduce oreplace toconstruct replacement changes characterized by the absorption law x oplus oreplace y = y for all xFunction oreplace encodes t that is the bang operator We use a dierent notation because isreserved for other purposes in Haskell

These typeclasses omit operation intentionally we do not require that change structuressupport a proper dierence operation Nevertheless as discussed b a can be expressed throughoreplace b

We can then dierentiate Haskell functions mdash even polymorphic ones We show a few trivialexamples to highlight how derivatives are encoded in Haskell especially polymorphic ones

-- The standard id functionid ararr aid x = x

-- and its derivativedid ararr ∆ararr ∆adid x dx = dxinstance (NilChangeStruct aChangeStruct b) rArr

ChangeStruct (ararr b) wheretype ∆(ararr b) = ararr ∆ararr ∆bf oplus df = λx rarr f x oplus df x 0xoreplace f = λx dx rarr oreplace (f (x oplus dx))

instance (NilChangeStruct aChangeStruct b) rArrNilChangeStruct (ararr b) where

0f = oreplace f-- Same for apply

apply (ararr b) rarr ararr bapply f x = f xdapply (ararr b) rarr ∆(ararr b) rarr ararr ∆ararr ∆bdapply f df x dx = df x dx

Which polymorphism As visible here polymorphism does not cause particular problemsHowever we only support ML (or prenex) polymorphism not rst-class polymorphism for tworeasons

Chapter D Defunctionalizing function changes 217

First with rst-class polymorphism we can encode existential types existX T and two valuesv1 v2 of the same existential type existX T can hide dierent types T1 T2 Hence a change between v1and v2 requires handling changes between types While we discuss such topics in Chapter 18 weavoid them here

Second prenex polymorphism is a small extension of simply-typed lambda calculus metatheoreti-cally We can treat prenex-polymorphic denitions as families of monomorphic (hence simply-typed)denitions to each denition we can apply all the ILC theory we developed to show dierentiationis correct

D2 DefunctionalizationDefunctionalization is a whole-program transformation that turns a program relying on rst-classfunctions into a rst-order program The resulting program is expressed in a rst-order language(often a subset of the original language) closures are encoded by data values which embed both theclosure environment and a tag to distinguish dierent function Defunctionalization also generatesa function that interprets encoded closures which we call applyFun

In a typed language defunctionalization can be done using generalized algebraic datatypes(GADTs) [Pottier and Gauthier 2004] Each rst-class function of type σ rarr τ is replaced by a valueof a new GADT Fun σ τ that represents defunctionalized function values and has a constructor foreach dierent function If a rst-class function t1 closes over x τ1 the corresponding constructorC1 will take x τ1 as an argument The interpreter for defunctionalized function values has typesignature applyFun Fun σ τ rarr σ rarr τ The resulting programs are expressed in a rst-ordersubset of the original programming language In defunctionalized programs all remaining functionsare rst-order top-level functions

For instance consider the program on sequences in Fig D1

successors [Z] rarr [Z]successors xs = map (λx rarr x + 1) xsnestedLoop [σ ] rarr [τ ] rarr [(σ τ )]nestedLoop xs ys = concatMap (λx rarr map (λy rarr (x y)) ys) xsmap (σ rarr τ ) rarr [σ ] rarr [τ ]map f [ ] = [ ]map f (x xs) = f x map f xsconcatMap (σ rarr [τ ]) rarr [σ ] rarr [τ ]concatMap f xs = concat (map f xs)

Figure D1 A small example program for defunctionalization

In this program the rst-class function values arise from evaluating the three terms λy rarr (x y)that we call pair λx rarr map (λy rarr (x y)) ys that we call mapPair and λx rarr x + 1 that we calladdOne Defunctionalization creates a type Fun σ τ with a constructor for each of the three termsrespectively Pair MapPair and AddOne Both pair and mapPair close over some free variables sotheir corresponding constructors will take an argument for each free variable for pair we have

x σ ` λy rarr (x y) τ rarr (σ τ )

while for mapPair we have

ys [τ ] ` λx rarr map (λy rarr (x y)) ys σ rarr [(σ τ )]

218 Chapter D Defunctionalizing function changes

Hence the type of defunctionalized functions Fun σ τ and its interpreter applyFun become

data Fun σ τ whereAddOne Fun Z ZPair σ rarr Fun τ (σ τ )MapPair [τ ] rarr Fun σ [(σ τ )]

applyFun Fun σ τ rarr σ rarr τapplyFun AddOne x = x + 1applyFun (Pair x) y = (x y)applyFun (MapPair ys) x = mapDF (Pair x) ys

We need to also transform the rest of the program accordingly

successors [Z] rarr [Z]successors xs = map (λx rarr x + 1) xsnestedLoopDF [σ ] rarr [τ ] rarr [(σ τ )]nestedLoopDF xs ys = concatMapDF (MapPair ys) xsmapDF Fun σ τ rarr [σ ] rarr [τ ]mapDF f [ ] = [ ]mapDF f (x xs) = applyFun f x mapDF f xsconcatMapDF Fun σ [τ ] rarr [σ ] rarr [τ ]concatMapDF f xs = concat (mapDF f xs)

Figure D2 Defunctionalized program

Some readers might notice this program still uses rst-class function because it encodes multi-argument functions through currying To get a fully rst-order program we encode multi-argumentsfunctions using tuples instead of currying1 Using tuples our example becomes

applyFun (Fun σ τ σ ) rarr τapplyFun (AddOne x) = x + 1applyFun (Pair x y) = (x y)applyFun (MapPair ys x) = mapDF (Pair x ys)mapDF (Fun σ τ [σ ]) rarr [τ ]mapDF (f [ ]) = [ ]mapDF (f x xs) = applyFun (f x) mapDF (f xs)concatMapDF (Fun σ [τ ] [σ ]) rarr [τ ]concatMapDF (f xs) = concat (mapDF (f xs))nestedLoopDF ([σ ] [τ ]) rarr [(σ τ )]nestedLoopDF (xs ys) = concatMapDF (MapPair ys xs)

However wersquoll often write such defunctionalized programs using Haskellrsquos typical curriedsyntax as in Fig D2 Such programs must not contain partial applications

1Strictly speaking the resulting program is still not rst-order because in Haskell multi-argument data constructorssuch as the pair constructor () that we use are still rst-class curried functions unlike for instance in OCaml To make thisprogram truly rst-order we must formalize tuple constructors as a term constructor or formalize these function denitionsas multi-argument functions At this point this discussion is merely a technicality that does not aect our goals but itbecomes relevant if we formalize the resulting rst-order language as in Sec 173

Chapter D Defunctionalizing function changes 219

In general defunctionalization creates a constructor C of type Fun σ τ for each rst-classfunction expression Γ ` t σ rarr τ in the source program2 While lambda expression l closesimplicitly over environment ρ n Γ o C closes over it explicitly the values bound to free variablesin environment ρ are passed as arguments to constructor C As a standard optimization we onlyincludes variables that actually occur free in l not all those that are bound in the context where loccurs

D21 Defunctionalization with separate function codesNext we show a slight variant of defunctionalization that we use to achieve our goals with lesscode duplication even at the expense of some eciency we call this variant defunctionalizationwith separate function codes

We rst encode contexts as types and environments as values Empty environments are encodedas empty tuples Environments for a context such as x τ1 y τ2 are encoded as values of typeτ1 times τ2 times

In this defunctionalization variant instead of dening directly a GADT of defunctionalizedfunctions Fun σ τ we dene a GADT of function codes Code env σ τ whose values contain noenvironment Type Code is indexed not just by σ and τ but also by the type of environments andhas a constructor for each rst-class function expression in the source program like Fun σ τ doesin conventional defunctionalization We then dene Fun σ τ as a pair of a function code of typeCode env σ τ and an environment of type env

As a downside separating function codes adds a few indirections to the memory representationof closures for instance we use (AddOne ()) instead of AddOne and (Pair 1) instead of Pair 1

As an upside with separate function codes we can dene many operations generically across allfunction codes (see Appendix D22) instead of generating denitions matching on each functionWhatrsquos more we later dene operations that use raw function codes and need no environmentwe could alternatively dene function codes without using them in the representation of functionvalues at the expense of even more code duplication Code duplication is especially relevant becausewe currently perform defunctionalization by hand though we are condent it would be conceptuallystraightforward to automate the process

Defunctionalizing the program with separate function codes produces the following GADT offunction codes

type Env env = (CompChangeStruct envNilTestable env)data Code env σ τ where

AddOne Code () Z ZPair Code σ τ (σ τ )MapPair Env σ rArr Code [τ ] σ [(σ τ )]

In this denition type Env names a set of typeclass constraints on the type of the environmentusing the ConstraintKinds GHC language extension Satisfying these constraints is necessary toimplement a few operations on functions We also require an interpretation function applyCode forfunction codes If c is the code for a function f = λx rarr t calling applyCode c computes f rsquos outputfrom an environment env for f and an argument for x

applyCode Code env σ τ rarr env rarr σ rarr τapplyCode AddOne () x = x + 1

2We only need codes for functions that are used as rst-class arguments not for other functions though codes for thelatter can be added as well

220 Chapter D Defunctionalizing function changes

applyCode Pair x y = (x y)applyCode MapPair ys x = mapDF (F (Pair x)) ys

The implementation of applyCode MapPair only works because of the Env σ constraint for con-structor MapPair this constraint is required when constructing the defunctionalized function valuethat we pass as argument to mapDF

We represent defunctionalized function values through type RawFun env σ τ a type synonymof the product of Code env σ τ and env We encode type σ rarr τ through type Fun σ τ dened asRawFun env σ τ where env is existentially bound We add constraint Env env to the denition ofFun σ τ because implementing oplus on function changes will require using oplus on environments

type RawFun env σ τ = (Code env σ τ env)data Fun σ τ whereF Env env rArr RawFun env σ τ rarr Fun σ τ

To interpret defunctionalized function values we wrap applyCode in a new version of applyFunhaving the same interface as the earlier applyFun

applyFun Fun σ τ rarr σ rarr τapplyFun (F (code env)) arg = applyCode code env arg

The rest of the source program is defunctionalized like before using the new denition of Fun σ τand of applyFun

D22 Defunctionalizing function changesDefunctionalization encodes function values as pairs of function codes and environments In ILC afunction value can change because the environment changes or because the whole closure is replacedby a dierent one with a dierent function code and dierent environment For now we focus onenvironment changes for simplicity To allow inspecting function changes we defunctionalize themas well but treat them specially

Assume we want to defunctionalize a function change df with type ∆(σ rarr τ ) = σ rarr ∆σ rarr ∆τ valid for function f σ rarr τ Instead of transforming type ∆(σ rarr τ ) into Fun σ (Fun ∆σ ∆τ ) wetransform ∆(σ rarr τ ) into a new type DFun σ τ the change type of Fun σ τ (∆(Fun σ τ ) = DFun σ τ )To apply DFun σ τ we introduce an interpreter dapplyFun Fun σ τ rarr ∆(Fun σ τ ) rarr ∆(σ rarr τ )or equivalently dapplyFun Fun σ τ rarr DFun σ τ rarr σ rarr ∆σ rarr ∆τ which also serves asderivative of applyFun

Like we did for Fun σ τ we dene DFun σ τ using function codes That is DFun σ τ pairs afunction code Code env σ τ together with an environment change and a change structure for theenvironment type

data DFun σ τ = forallenvChangeStruct env rArr DF (∆envCode env σ τ )

Without separate function codes the denition of DFun would have to include one case for eachrst-class function

Environment changes

Instead of dening change structures for environments we encode environments using tuples anddene change structures for tuples

We dene rst change structures for empty tuples and pairs Sec 1233

Chapter D Defunctionalizing function changes 221

instance ChangeStruct () wheretype ∆() = ()_ oplus _ = ()oreplace _ = ()

instance (ChangeStruct aChangeStruct b) rArr ChangeStruct (a b) wheretype ∆(a b) = (∆a∆b)(a b) oplus (da db) = (a oplus da b oplus db)oreplace (a2 b2) = (oreplace a2 oreplace b2)

To dene change structures for n-uples of other arities we have two choices which we show ontriples (a b c) and can be easily generalized

We can encode triples as nested pairs (a (b (c ()))) Or we can dene change structures fortriples directly

instance (ChangeStruct aChangeStruct bChangeStruct c) rArr ChangeStruct (a b c) where

type ∆(a b c) = (∆a∆b∆c)(a b c) oplus (da db dc) = (a oplus da b oplus db c oplus dc)oreplace (a2 b2 c2) = (oreplace a2 oreplace b2 oreplace c2)

Generalizing from pairs and triples one can dene similar instances for n-uples in general (sayfor values of n up to some high threshold)

Validity and oplus on defunctionalized function changes

A function change df is valid for f if df has the same function code as f and if df rsquos environmentchange is valid for f rsquos environment

DF dρ c B F ρ1 c rarr F ρ2 c Fun σ τ = dρ B ρ1 rarr ρ2 env

where c is a function code of type Code env σ τ and where crsquos type binds the type variable env weuse on the right-hand side

Next we implement oplus on function changes to match our denition of validity as required Weonly need f oplus df to give a result if df is a valid change for f Hence if the function code embeddedin df does not match the one in f we give an error3 However our rst attempt does not typechecksince the typechecker does not know whether the environment and the environment change havecompatible types

instance ChangeStruct (Fun σ τ ) wheretype ∆(Fun σ τ ) = DFun σ τF (env c1) oplus DF (denv c2) =

if c1 equiv c2 thenF (env oplus denv) c1

elseerror Invalid function change in oplus

In particular env oplus denv is reported as ill-typed because we donrsquot know that env and denvhave compatible types Take c1 = Pair c2 = MapPair f = F (env Pair) σ rarr τ and df =

3We originally specied oplus as a total function to avoid formalizing partial functions but as mentioned in Sec 103 we donot rely on the behavior of oplus on invalid changes

222 Chapter D Defunctionalizing function changes

DF (denvMapPair) ∆(σ rarr τ ) Assume we evaluate f oplus df = F (env Pair) oplus DF (denvMapPair)there indeed env σ and denv [τ ] so env oplus denv is not type-correct Yet evaluating f oplus dfwould not fail because MapPair and Pair are dierent c1 equiv c2 will return false and env oplus denvwonrsquot be evaluated But the typechecker does not know that

Hence we need an equality operation that produces a witness of type equality We dene theneeded infrastructure with few lines of code First we need a GADT of witnesses of type equalitywe can borrow from GHCrsquos standard library its denition which is just

-- From DataTypeEqualitydata τ1 sim τ2 where

Re τ sim τ

If x has type τ1 sim τ2 and matches pattern Re then by standard GADT typing rules τ1 and τ2 areequal Even if τ1 sim τ2 has only constructor Re a match is necessary since x might be bottomReaders familiar with type theory Agda or Coq will recognize that sim resembles Agdarsquos propositionalequality or Martin-Loumlfrsquos identity types even though it can only represents equality between typesand not between values

Next we implement function codeMatch to compare codes For equal codes this operationproduces a witness that their environment types match4 Using this operation we can complete theabove instance of ChangeStruct (Fun σ τ )

codeMatch Code env1 σ τ rarr Code env2 σ τ rarr Maybe (env1 sim env2)codeMatch AddOne AddOne = Just RecodeMatch Pair Pair = Just RecodeMatch MapPair MapPair = Just RecodeMatch _ _ = Nothinginstance ChangeStruct (Fun σ τ ) where

type ∆(Fun σ τ ) = DFun σ τF (c1 env) oplus DF (c2 denv) =case codeMatch c1 c2 ofJust Re rarr F (c1 env oplus denv)Nothing rarr error Invalid function change in oplus

Applying function changes

After dening environment changes we dene an incremental interpretation function dapplyCodeIf c is the code for a function f = λx rarr t as discussed calling applyCode c computes the output off from an environment env for f and an argument for x Similarly calling dapplyCode c computesthe output of D n f o from an environment env for f an environment change denv valid for env anargument for x and an argument change dx

In our example we have

dapplyCode Code env σ τ rarr env rarr ∆env rarr σ rarr ∆σ rarr ∆τdapplyCode AddOne () () x dx = dxdapplyCode Pair x dx y dy = (dx dy)dapplyCode MapPair ys dys x dx =dmapDF (F (Pair x)) (DF (Pair dx)) ys dys

4If a code is polymorphic in the environment type it must take as argument a representation of its type argument to beused to implement codeMatch We represent type arguments at runtime via instances of Typeable and omit standard detailshere

Chapter D Defunctionalizing function changes 223

On top of dapplyCode we can dene dapplyFun which functions as a a derivative for applyFunand allows applying function changes

dapplyFun Fun σ τ rarr DFun σ τ rarr σ rarr ∆σ rarr ∆τdapplyFun (F (c1 env)) (DF (c2 denv)) x dx =

case codeMatch c1 c2 ofJust Re rarr dapplyCode c1 env denv x dxNothing rarr error Invalid function change in dapplyFun

However we can also implement further accessors that inspect function changes We can nownally detect if a change is nil To this end we rst dene a typeclass that allows testing changes todetermine if theyrsquore nil

class NilChangeStruct t rArr NilTestable t whereisNil ∆t rarr Bool

Now a function change is nil only if the contained environment is nil

instance NilTestable (Fun σ τ ) whereisNil DFun σ τ rarr BoolisNil (DF (denv code)) = isNil denv

However this denition of isNil only works given a typeclass instance for NilTestable env we needto add this requirement as a constraint but we cannot add it to isNilrsquos type signature since typevariable env is not in scope there Instead we must add the constraint where we introduce it byexistential quantication just like the constraint ChangeStruct env In particular we can reuse theconstraint Env env

data DFun σ τ =forallenv Env env rArrDF (Code env σ τ ∆env)

instance NilChangeStruct (Fun σ τ ) where0F (codeenv) = DF (code 0env)

instance NilTestable (Fun σ τ ) whereisNil DFun σ τ rarr BoolisNil (DF (code denv)) = isNil denv

We can then wrap a derivative via function wrapDF to return a nil change immediately if atruntime all input changes turn out to be nil This was not possible in the setting described by Caiet al [2014] because nil function changes could not be detected at runtime only at compile timeTo do so we must produce directly a nil change for v = applyFun f x To avoid computing v weassume we can compute a nil change for v without access to v via operation onil and typeclassOnilChangeStruct (omitted here) argument Proxy is a constant required for purely technical reasons

wrapDF OnilChangeStruct τ rArr Fun σ τ rarr DFun σ τ rarr σ rarr ∆σ rarr ∆τwrapDF f df x dx =if isNil df thenonil Proxy -- Change-equivalent to 0applyFun f x

elsedapplyFun f df x dx

224 Chapter D Defunctionalizing function changes

D3 Defunctionalization and cache-transfer-styleWe can combine the above ideas with cache-transfer-style (Chapter 17) We show the details quickly

Combining the above with caching we can use defunctionalization as described to implement thefollowing interface for functions in caches For extra generality we use extension ConstraintKindsto allow instances to dene the required typeclass constraints

class FunOps k wheretype Dk k = (dk lowast rarr lowast rarr lowast rarr lowast rarr lowast) | dk rarr ktype ApplyCtx k i o Constraintapply ApplyCtx k i orArr k i o cacherarr irarr (o cache)type DApplyCtx k i o ConstraintdApply DApplyCtx k i orArrDk k i o cache1 cache2 rarr ∆irarr cache1 rarr (∆o cache2)

type DerivCtx k i o Constraintderiv DerivCtx k i orArrk i o cacherarr Dk k i o cache cache

type FunUpdCtx k i o ConstraintfunUpdate FunUpdCtx k i orArrk i o cache1 rarr Dk k i o cache1 cache2 rarr k i o cache2

isNilFun Dk k i o cache1 cache2 rarr Maybe (cache1 sim cache2)updatedDeriv (FunOps k FunUpdCtx k i oDerivCtx k i o) rArrk i o cache1 rarr Dk k i o cache1 cache2 rarr Dk k i o cache2 cache2

updatedDeriv f df = deriv (f lsquofunUpdatelsquo df )

Type constructor k denes the specic constructor for the function type In this interface the typeof function changes DK k i o cache1 cache2 represents functions (encoded by type constructor Dk k)with inputs of type i outputs of type o input cache type cache1 and output cache type cache2 Typescache1 and cache2 coincide for typical function changes but can be dierent for replacement functionchanges or more generally for function changes across functions with dierent implementationsand cache types Correspondingly dApply supports applying such changes across closures withdierent implementations unfortunately unless the two implementations are similar the cachecontent cannot be reused

To implement this interface it is sucient to dene a type of codes that admits an instance of type-class Codelike Earlier denitions of codeMatch applyFun and dapplyFun adapted to cache-transferstyle

class Codelike code wherecodeMatch code env1 a1 r1 cache1 rarr code env2 a2 r2 cache2 rarrMaybe ((env1 cache1) sim (env2 cache2))

applyCode code env a b cacherarr env rarr ararr (b cache)dapplyCode code env a b cacherarr ∆env rarr ∆ararr cacherarr (∆b cache)

Typically a defunctionalized program uses no rst-class functions and has a single type of functionsHaving a type class of function codes weakens that property We can still use a single type of codesthroughout our program we can also use dierent types of codes for dierent parts of a programwithout allowing for communications between those parts

On top of the Codelike interface we can dene instances of interface FunOps and ChangeStructTo demonstrate this we show a complete implementation in Figs D3 and D4 Similarly to oplus we

Chapter D Defunctionalizing function changes 225

can implement by comparing codes contained in function changes this is not straightforwardwhen using closure conversion as in Sec 1742 unless we store even more type representations

We can detect nil changes at runtime even in cache-passing style We can for instance denefunction wrapDer1 which does something trickier than wrapDF here we assume that dg is aderivative taking function change df as argument Then if df is nil dg df must also be nil so wecan return a nil change directly together with the input cache The input cache has the requiredtype because in this case the type cache2 of the desired output cache has matches type cache1 of theinput cache (because we have a nil change df across them) the return value of isNilFun witnessesthis type equality

wrapDer1 (FunOps kOnilChangeStruct r prime) rArr(Dk k i o cache1 cache2 rarr f cache1 rarr (∆r prime f cache2)) rarr(Dk k i o cache1 cache2 rarr f cache1 rarr (∆r prime f cache2))

wrapDer1 dg df c =case isNilFun df ofJust Re rarr (onil Proxy c)Nothing rarr dg df c

We can also hide the dierence between dierence cache types by dening a uniform type ofcaches Cache code To hide caches we can use a pair of a cache (of type cache) and a code for thatcache type When applying a function (change) to a cache or when composing the function we cancompare the function code with the cache code

In this code we have not shown support for replacement values for functions we leave detailsto our implementation

226 Chapter D Defunctionalizing function changes

type RawFun a b code env cache = (code env a b cache env)type RawDFun a b code env cache = (code env a b cache∆env)data Fun a b code = forallenv cache Env env rArr

F (RawFun a b code env cache)data DFun a b code = forallenv cache Env env rArr

DF (RawDFun a b code env cache)-- This cache is not indexed by a and b

data Cache code = foralla b env cacheC (code env a b cache) cache-- Wrapper

data FunW code a b cache whereW Fun a b coderarr FunW code a b (Cache code)

data DFunW code a b cache1 cache2 whereDW DFun a b coderarr DFunW code a b (Cache code) (Cache code)

derivFun Fun a b coderarr DFun a b codederivFun (F (code env)) = DF (code 0env)oplusBase Codelike coderArr Fun a b coderarr DFun a b coderarr Fun a b codeoplusBase (F (c1 env)) (DF (c2 denv)) =

case codeMatch c1 c2 ofJust Re rarrF (c1 env oplus denv)

_rarr error INVALID call to oplusBase

ocomposeBase Codelike coderArr DFun a b coderarr DFun a b coderarr DFun a b codeocomposeBase (DF (c1 denv1)) (DF (c2 denv2)) =case codeMatch c1 c2 ofJust Re rarrDF (c1 denv1 denv2)

_rarr error INVALID call to ocomposeBase

instance Codelike coderArr ChangeStruct (Fun a b code) wheretype ∆(Fun a b code) = DFun a b code(oplus) = oplusBase

instance Codelike coderArr NilChangeStruct (Fun a b code) where0F (cenv) = DF (c 0env)

instance Codelike coderArr CompChangeStruct (Fun a b code) wheredf 1 df 2 = ocomposeBase df 1 df 2

instance Codelike coderArr NilTestable (Fun a b code) whereisNil (DF (c env)) = isNil env

Figure D3 Implementing change structures using Codelike instances

Chapter D Defunctionalizing function changes 227

applyRaw Codelike coderArr RawFun a b code env cacherarr ararr (b cache)applyRaw (code env) = applyCode code envdapplyRaw Codelike coderArr RawDFun a b code env cacherarr ∆ararr cacherarr (∆b cache)dapplyRaw (code denv) = dapplyCode code denvapplyFun Codelike coderArr Fun a b coderarr ararr (bCache code)applyFun (F f (code env)) arg =(id lowastlowastlowast C code) $ applyRaw f arg

dapplyFun Codelike coderArr DFun a b coderarr ∆ararr Cache coderarr (∆bCache code)dapplyFun (DF (code1 denv)) darg (C code2 cache1) =case codeMatch code1 code2 ofJust Re rarr(id lowastlowastlowast C code1) $ dapplyCode code1 denv darg cache1

_rarr error INVALID call to dapplyFun

instance Codelike coderArr FunOps (FunW code) wheretype Dk (FunW code) = DFunW codeapply (W f ) = applyFun fdApply (DW df ) = dapplyFun dfderiv (W f ) = DW (derivFun f )funUpdate (W f ) (DW df ) = W (f oplus df )isNilFun (DW df ) =if isNil df thenJust Re

elseNothing

Figure D4 Implementing FunOps using Codelike instances

228 Chapter D Defunctionalizing function changes

Acknowledgments

I thank my supervisor Klaus Ostermann for encouraging me to do a PhD and for his years ofsupervision He has always been patient and kind

Thanks also to Fritz Henglein for agreeing to review my thesisI have many people to thank and too little time to do it properly Irsquoll nish these acknowledgments

in the nal versionThe research presented here has beneted from the help of coauthors reviewers and people

who oered helpful comments including comments from the POPLrsquo18 reviewers Moreover thisPhD thesis wouldnrsquot be possible without all the people who have supported me both during all theyears of my PhD but also in the years before

229

230 Chapter D Acknowledgments

Bibliography[Citing pages are listed after each reference]

Umut A Acar Self-Adjusting Computation PhD thesis Princeton University 2005 [Pages 73 135161 and 174]

Umut A Acar Self-adjusting computation (an overview) In PEPM pages 1ndash6 ACM 2009 [Pages 2128 161 and 173]

Umut A Acar Guy E Blelloch and Robert Harper Adaptive functional programming TOPLAS 28(6)990ndash1034 November 2006 [Page 173]

Umut A Acar Amal Ahmed and Matthias Blume Imperative self-adjusting computation InProceedings of the 35th annual ACM SIGPLAN-SIGACT symposium on Principles of programminglanguages POPL rsquo08 pages 309ndash322 New York NY USA 2008 ACM ISBN 978-1-59593-689-9doi 10114513284381328476 URL hpdoiacmorg10114513284381328476 [Pages 143 146161 194 200 201 202 204 205 and 212]

Umut A Acar Guy E Blelloch Matthias Blume Robert Harper and Kanat Tangwongsan Anexperimental analysis of self-adjusting computation TOPLAS 32(1)31ndash353 November 2009[Page 173]

Umut A Acar Guy Blelloch Ruy Ley-Wild Kanat Tangwongsan and Duru Turkoglu Traceable datatypes for self-adjusting computation In PLDI pages 483ndash496 ACM 2010 [Pages 162 and 173]

Stefan Ackermann Vojin Jovanovic Tiark Rompf and Martin Odersky Jet An embedded DSL forhigh performance big data processing In Intrsquol Workshop on End-to-end Management of Big Data(BigData) 2012 [Page 49]

Agda Development Team The Agda Wiki hpwikiportalchalmersseagda 2013 Accessed on2017-06-29 [Page 181]

Amal Ahmed Step-Indexed Syntactic Logical Relations for Recursive and Quantied Types InProgramming Languages and Systems pages 69ndash83 Springer Berlin Heidelberg March 2006 doi10100711693024_6 [Pages 143 146 161 163 193 194 201 202 204 205 210 212 and 213]

Guillaume Allais James Chapman Conor McBride and James McKinna Type-and-scope safeprograms and their proofs In Yves Bertot and Viktor Vafeiadis editors CPP 2017 ACM NewYork NY New York January 2017 ISBN 978-1-4503-4705-1 [Page 186]

Thorsten Altenkirch and Bernhard Reus Monadic Presentations of Lambda Terms Using GeneralizedInductive Types In Joumlrg Flum and Mario Rodriguez-Artalejo editors Computer Science LogicLecture Notes in Computer Science pages 453ndash468 Springer Berlin Heidelberg September 1999ISBN 978-3-540-66536-6 978-3-540-48168-3 doi 1010073-540-48168-0_32 [Page 186]

231

232 Bibliography

Nada Amin and Tiark Rompf Type soundness proofs with denitional interpreters In Proceedingsof the 44th ACM SIGPLAN Symposium on Principles of Programming Languages POPL 2017 pages666ndash679 New York NY USA 2017 ACM ISBN 978-1-4503-4660-3 doi 10114530098373009866URL hpdoiacmorg10114530098373009866 [Pages 186 and 201]

Andrew W Appel and Trevor Jim Shrinking lambda expressions in linear time JFP 7515ndash5401997 [Page 129]

Andrew W Appel and David McAllester An indexed model of recursive types for foundationalproof-carrying code ACM Trans Program Lang Syst 23(5)657ndash683 September 2001 ISSN0164-0925 doi 101145504709504712 URL hpdoiacmorg101145504709504712 [Page 161]

Robert Atkey The incremental λ-calculus and relational parametricity 2015 URL hpbentniborgposts2015-04-23-incremental-lambda-calculus-and-parametricityhtml Last accessed on 29June 2017 [Pages 81 126 163 and 193]

Lennart Augustsson and Magnus Carlsson An exercise in dependent types A well-typed interpreterIn In Workshop on Dependent Types in Programming Gothenburg 1999 [Page 186]

H P Barendregt The Lambda Calculus Its Syntax and Semantics Elsevier 1984 [Page 63]

Henk P Barendregt Lambda Calculi with Types pages 117ndash309 Oxford University Press New York1992 [Pages 164 166 and 171]

Nick Benton Chung-Kil Hur Andrew J Kennedy and Conor McBride Strongly Typed TermRepresentations in Coq Journal of Automated Reasoning 49(2)141ndash159 August 2012 ISSN0168-7433 1573-0670 doi 101007s10817-011-9219-0 [Page 186]

Jean-Philippe Bernardy and Marc Lasson Realizability and parametricity in pure type systemsIn Proceedings of the 14th International Conference on Foundations of Software Science and Com-putational Structures Part of the Joint European Conferences on Theory and Practice of SoftwareFOSSACSrsquo11ETAPSrsquo11 pages 108ndash122 Berlin Heidelberg 2011 Springer-Verlag ISBN 978-3-642-19804-5 [Pages 89 163 164 165 166 167 172 and 177]

Jean-Philippe Bernardy Patrik Jansson and Ross Paterson Parametricity and dependent typesIn Proceedings of the 15th ACM SIGPLAN International Conference on Functional ProgrammingICFP rsquo10 pages 345ndash356 New York NY USA 2010 ACM ISBN 978-1-60558-794-3 doi 10114518635431863592 URL hpdoiacmorg10114518635431863592 [Pages 166 167 and 172]

Gavin M Bierman Erik Meijer and Mads Torgersen Lost in translation formalizing proposedextensions to C In OOPSLA pages 479ndash498 ACM 2007 [Page 47]

Jose A Blakeley Per-Ake Larson and Frank Wm Tompa Eciently updating materialized viewsIn SIGMOD pages 61ndash71 ACM 1986 [Pages v vii 2 161 and 174]

Maximilian C Bolingbroke and Simon L Peyton Jones Types are calling conventions In Proceedingsof the 2nd ACM SIGPLAN Symposium on Haskell Haskell rsquo09 pages 1ndash12 New York NY USA2009 ACM ISBN 978-1-60558-508-6 doi 10114515966381596640 [Page 160]

KJ Brown AK Sujeeth Hyouk Joong Lee T Rompf H Cha M Odersky and K OlukotunA heterogeneous parallel framework for domain-specic languages In Parallel Architecturesand Compilation Techniques (PACT) 2011 International Conference on pages 89ndash100 2011 doi101109PACT201115 [Page 78]

Bibliography 233

Yufei Cai PhD thesis University of Tuumlbingen 2017 In preparation [Pages 81 and 126]

Yufei Cai Paolo G Giarrusso Tillmann Rendel and Klaus Ostermann A theory of changes for higher-order languages mdash Incrementalizing λ-calculi by static dierentiation In Proceedings of the 35thACM SIGPLAN Conference on Programming Language Design and Implementation PLDI rsquo14 pages145ndash155 New York NY USA 2014 ACM ISBN 978-1-4503-2784-8 doi 10114525942912594304URL hpdoiacmorg10114525942912594304 [Pages 2 3 59 66 84 92 93 107 117 121 124125 135 136 137 138 141 145 146 154 159 175 177 182 184 190 and 223]

Jacques Carette Oleg Kiselyov and Chung-chieh Shan Finally tagless partially evaluated Taglessstaged interpreters for simpler typed languages JFP 19509ndash543 2009 [Page 31]

Craig Chambers Ashish Raniwala Frances Perry Stephen Adams Robert R Henry Robert Bradshawand Nathan Weizenbaum FlumeJava easy ecient data-parallel pipelines In PLDI pages 363ndash375 ACM 2010 [Pages 10 20 and 47]

Yan Chen Joshua Duneld Matthew A Hammer and Umut A Acar Implicit self-adjustingcomputation for purely functional programs In ICFP pages 129ndash141 ACM 2011 [Page 173]

Yan Chen Umut A Acar and Kanat Tangwongsan Functional programming for dynamic and largedata with self-adjusting computation In Proceedings of the 19th ACM SIGPLAN InternationalConference on Functional Programming ICFP rsquo14 pages 227ndash240 New York NY USA 2014 ACMISBN 978-1-4503-2873-9 doi 10114526281362628150 URL hpdoiacmorg10114526281362628150 [Page 162]

James Cheney Sam Lindley and Philip Wadler A Practical Theory of Language-integrated QueryIn Proceedings of the 18th ACM SIGPLAN International Conference on Functional ProgrammingICFP rsquo13 pages 403ndash416 New York NY USA 2013 ACM ISBN 978-1-4503-2326-0 doi 10114525003652500586 [Page 174]

Adam Chlipala Parametric higher-order abstract syntax for mechanized semantics In Proceedingsof the 13th ACM SIGPLAN international conference on Functional programming ICFP rsquo08 pages143ndash156 New York NY USA 2008 ACM ISBN 978-1-59593-919-7 doi 10114514112041411226URL hpdoiacmorg10114514112041411226 [Page 186]

Ezgi Ccediliccedilek Zoe Paraskevopoulou and Deepak Garg A Type Theory for Incremental ComputationalComplexity with Control Flow Changes In Proceedings of the 21st ACM SIGPLAN InternationalConference on Functional Programming ICFP 2016 pages 132ndash145 New York NY USA 2016ACM ISBN 978-1-4503-4219-3 doi 10114529519132951950 [Page 161]

Thomas H Cormen Charles E Leiserson Ronald L Rivest and Cliord Stein Introduction to Algo-rithms MIT Press Cambridge MA USA 2nd edition 2001 ISBN 0-262-03293-7 9780262032933[Page 73]

Oren Eini The pain of implementing LINQ providers Commun ACM 54(8)55ndash61 2011 [Page 47]

Conal Elliott and Paul Hudak Functional reactive animation In ICFP pages 263ndash273 ACM 1997[Page 174]

Conal Elliott Sigbjorn Finne and Oege de Moor Compiling embedded languages JFP 13(2)455ndash4812003 [Pages 20 21 and 48]

Burak Emir Andrew Kennedy Claudio Russo and Dachuan Yu Variance and generalized constraintsfor C] generics In ECOOP pages 279ndash303 Springer-Verlag 2006 [Page 4]

234 Bibliography

Burak Emir Qin Ma and Martin Odersky Translation correctness for rst-order object-orientedpattern matching In Proceedings of the 5th Asian conference on Programming languages andsystems APLASrsquo07 pages 54ndash70 Berlin Heidelberg 2007a Springer-Verlag ISBN 3-540-76636-7978-3-540-76636-0 URL hpdlacmorgcitationcfmid=17847741784782 [Page 4]

Burak Emir Martin Odersky and John Williams Matching objects with patterns In ECOOP pages273ndash298 Springer-Verlag 2007b [Pages 4 32 44 and 45]

Sebastian Erdweg Paolo G Giarrusso and Tillmann Rendel Language composition untangled InProceedings of Workshop on Language Descriptions Tools and Applications (LDTA) LDTA rsquo12 pages71ndash78 New York NY USA 2012 ACM ISBN 978-1-4503-1536-4 doi 10114524270482427055URL hpdoiacmorg10114524270482427055 [Pages 4 and 187]

Sebastian Erdweg Tijs van der Storm and Yi Dai Capture-Avoiding and Hygienic ProgramTransformations In ECOOP 2014 ndash Object-Oriented Programming pages 489ndash514 Springer BerlinHeidelberg July 2014 doi 101007978-3-662-44202-9_20 [Page 93]

Andrew Farmer Andy Gill Ed Komp and Neil Sculthorpe The HERMIT in the Machine APlugin for the Interactive Transformation of GHC Core Language Programs In Proceedings ofthe 2012 Haskell Symposium Haskell rsquo12 pages 1ndash12 New York NY USA 2012 ACM ISBN978-1-4503-1574-6 doi 10114523645062364508 [Page 78]

Leonidas Fegaras Optimizing queries with object updates Journal of Intelligent InformationSystems 12219ndash242 1999 ISSN 0925-9902 URL hpdxdoiorg101023A1008757010516101023A1008757010516 [Page 71]

Leonidas Fegaras and David Maier Towards an eective calculus for object query languages InProceedings of the 1995 ACM SIGMOD international conference on Management of data SIGMODrsquo95 pages 47ndash58 New York NY USA 1995 ACM ISBN 0-89791-731-6 doi 101145223784223789URL hpdoiacmorg101145223784223789 [Page 71]

Leonidas Fegaras and David Maier Optimizing object queries using an eective calculus ACMTrans Database Systems (TODS) 25457ndash516 2000 [Pages 11 21 and 48]

Denis Firsov and Wolfgang Jeltsch Purely Functional Incremental Computing In Fernando Castorand Yu David Liu editors Programming Languages number 9889 in Lecture Notes in ComputerScience pages 62ndash77 Springer International Publishing September 2016 ISBN 978-3-319-45278-4978-3-319-45279-1 doi 101007978-3-319-45279-1_5 [Pages 70 75 156 and 161]

Philip J Fleming and John J Wallace How not to lie with statistics the correct way to summarizebenchmark results Commun ACM 29(3)218ndash221 March 1986 [Pages xvii 39 and 41]

Bent Flyvbjerg Five misunderstandings about case-study research Qualitative Inquiry 12(2)219ndash245 2006 [Page 23]

Miguel Garcia Anastasia Izmaylova and Sibylle Schupp Extending Scala with database querycapability Journal of Object Technology 9(4)45ndash68 2010 [Page 48]

Andy Georges Dries Buytaert and Lieven Eeckhout Statistically rigorous Java performanceevaluation In OOPSLA pages 57ndash76 ACM 2007 [Pages 40 41 and 132]

Paolo G Giarrusso Open GADTs and declaration-site variance A problem statement In Proceedingsof the 4th Workshop on Scala SCALA rsquo13 pages 51ndash54 New York NY USA 2013 ACM ISBN 978-1-4503-2064-1 doi 10114524898372489842 URL hpdoiacmorg10114524898372489842[Page 4]

Bibliography 235

Paolo G Giarrusso Klaus Ostermann Michael Eichberg Ralf Mitschke Tillmann Rendel andChristian Kaumlstner Reify your collection queries for modularity and speed CoRR abs121062842012 URL hparxivorgabs12106284 [Page 20]

Paolo G Giarrusso Klaus Ostermann Michael Eichberg Ralf Mitschke Tillmann Rendel andChristian Kaumlstner Reify your collection queries for modularity and speed In AOSD pages 1ndash12ACM 2013 [Pages 2 and 3]

Paolo G Giarrusso Yann Reacutegis-Gianas and Philipp Schuster Incremental λ-calculus in cache-transfer style Static memoization by program transformation Submitted [Pages 3 and 213]

Andrew Gill John Launchbury and Simon L Peyton Jones A short cut to deforestation In FPCApages 223ndash232 ACM 1993 [Page 9]

Jean-Yves Girard Paul Taylor and Yves Lafont Proofs and types Cambridge University PressCambridge 1989 [Page 164]

Dieter Gluche Torsten Grust Christof Mainberger and Marc Scholl Incremental updates formaterialized OQL views In Deductive and Object-Oriented Databases volume 1341 of LNCS pages52ndash66 Springer 1997 [Pages 57 127 129 131 and 174]

T Grust and MH Scholl Monoid comprehensions as a target for the translation of oql InWorkshop on Performance Enhancement in Object Bases Schloss Dagstuhl Citeseer 1996a doi10113649251011364481(abstract) [Page 71]

Torsten Grust Comprehending queries PhD thesis University of Konstanz 1999 [Pages 32 and 48]

Torsten Grust and Marc H Scholl Translating OQL into monoid comprehensions Stuck withnested loops Technical Report 3a1996 University of Konstanz Department of Mathematicsand Computer Science Database Research Group September 1996b [Page 32]

Torsten Grust and Marc H Scholl How to comprehend queries functionally Journal of IntelligentInformation Systems 12191ndash218 1999 [Page 21]

Torsten Grust and Alexander Ulrich First-class functions for rst-order database engines In Todd JGreen and Alan Schmitt editors Proceedings of the 14th International Symposium on DatabaseProgramming Languages (DBPL 2013) August 30 2013 Riva del Garda Trento Italy 2013 URLhparxivorgabs13080158 [Page 174]

Torsten Grust Manuel Mayr Jan Rittinger and Tom Schreiber FERRY database-supported programexecution In Proc Intrsquol SIGMOD Conf on Management of Data (SIGMOD) pages 1063ndash1066 ACM2009 [Pages 48 and 174]

Torsten Grust Nils Schweinsberg and Alexander Ulrich Functions Are Data Too Defunction-alization for PLSQL Proc VLDB Endow 6(12)1214ndash1217 August 2013 ISSN 2150-8097 doi101477825362742536279 [Page 174]

Ashish Gupta and Inderpal Singh Mumick Maintenance of materialized views problems techniquesand applications In Ashish Gupta and Iderpal Singh Mumick editors Materialized views pages145ndash157 MIT Press 1999 [Pages v vii 2 128 161 173 and 174]

Elnar Hajiyev Mathieu Verbaere and Oege de Moor CodeQuest Scalable source code queries withDatalog In ECOOP pages 2ndash27 Springer 2006 [Page 49]

236 Bibliography

Matthew A Hammer Georg Neis Yan Chen and Umut A Acar Self-adjusting stack machines InOOPSLA pages 753ndash772 ACM 2011 [Page 173]

Matthew A Hammer Khoo Yit Phang Michael Hicks and Jerey S Foster Adapton ComposableDemand-driven Incremental Computation In Proceedings of the 35th ACM SIGPLAN Conferenceon Programming Language Design and Implementation PLDI rsquo14 pages 156ndash166 New York NYUSA 2014 ACM ISBN 978-1-4503-2784-8 doi 10114525942912594324 [Page 161]

Matthew A Hammer Joshua Duneld Kyle Headley Nicholas Labich Jerey S Foster MichaelHicks and David Van Horn Incremental computation with names In Proceedings of the 2015ACM SIGPLAN International Conference on Object-Oriented Programming Systems Languagesand Applications OOPSLA 2015 pages 748ndash766 New York NY USA 2015 ACM ISBN 978-1-4503-3689-5 doi 10114528142702814305 URL hpdoiacmorg10114528142702814305[Page 161]

Matthew A Hammer Joshua Duneld Dimitrios J Economou and Monal Narasimhamurthy TypedAdapton Renement types for incremental computations with precise names arXiv161000097[cs] October 2016 [Page 161]

Robert Harper Constructing type systems over an operational semantics Journal of Symbolic Com-putation 14(1)71ndash84 July 1992 ISSN 0747-7171 doi 1010160747-7171(92)90026-Z [Page 118]

Robert Harper Practical Foundations for Programming Languages Cambridge University Press 2ndedition 2016 ISBN 9781107150300 [Page 163]

Kyle Headley and Matthew A Hammer The Random Access Zipper Simple Purely-FunctionalSequences arXiv160806009 [cs] August 2016 [Page 73]

Fritz Henglein Optimizing Relational Algebra Operations Using Generic Equivalence Discriminatorsand Lazy Products In Proceedings of the 2010 ACM SIGPLAN Workshop on Partial Evaluationand Program Manipulation PEPM rsquo10 pages 73ndash82 New York NY USA 2010 ACM ISBN978-1-60558-727-1 doi 10114517063561706372 [Page 48]

Fritz Henglein and Ken Friis Larsen Generic Multiset Programming for Language-integratedQuerying In Proceedings of the 6th ACM SIGPLAN Workshop on Generic Programming WGP rsquo10pages 49ndash60 New York NY USA 2010 ACM ISBN 978-1-4503-0251-7 doi 10114518634951863503 [Page 48]

R Hinze and R Paterson Finger trees a simple general-purpose data structure Journal of FunctionalProgramming 16(2)197ndash218 2006 [Pages 73 and 75]

Christian Hofer Klaus Ostermann Tillmann Rendel and Adriaan Moors Polymorphic embeddingof DSLs In GPCE pages 137ndash148 ACM 2008 [Page 44]

Martin Hofmann Conservativity of equality reection over intensional type theory In Types forProofs and Programs volume 1158 of LNCS pages 153ndash164 Springer-Verlag 1996 [Page 182]

David Hovemeyer and William Pugh Finding bugs is easy SIGPLAN Notices 39(12)92ndash106 2004[Pages 10 37 and 49]

Lourdes del Carmen Gonzalez Huesca Incrementality and Eect Simulation in the Simply TypedLambda Calculus PhD thesis Universite Paris Diderot-Paris VII November 2015 [Pages 59and 175]

Bibliography 237

Vojin Jovanovic Amir Shaikhha Sandro Stucki Vladimir Nikolaev Christoph Koch and MartinOdersky Yin-yang Concealing the deep embedding of dsls In Proceedings of the 2014 InternationalConference on Generative Programming Concepts and Experiences GPCE 2014 pages 73ndash82 NewYork NY USA 2014 ACM ISBN 978-1-4503-3161-6 doi 10114526587612658771 URLhpdoiacmorg10114526587612658771 [Page 49]

Christian Kaumlstner Paolo G Giarrusso and Klaus Ostermann Partial preprocessing C code forvariability analysis In Proceedings of the Fifth International Workshop on Variability Modelling ofSoftware-intensive Systems (VaMoS) pages 137ndash140 ACM January 2011a ISBN 978-1-4503-0570-9URL hpwwwinformatikuni-marburgde~kaestnervamos11pdf [Page 4]

Christian Kaumlstner Paolo G Giarrusso Tillmann Rendel Sebastian Erdweg Klaus Ostermann andThorsten Berger Variability-aware parsing in the presence of lexical macros and conditionalcompilation In Proceedings of the 26th Annual ACM SIGPLAN Conference on Object-OrientedProgramming Systems Languages and Applications (OOPSLA) ACM October 2011b [Page 4]

Chantal Keller and Thorsten Altenkirch Hereditary Substitutions for Simple Types FormalizedIn Proceedings of the Third ACM SIGPLAN Workshop on Mathematically Structured FunctionalProgramming MSFP rsquo10 pages 3ndash10 New York NY USA 2010 ACM ISBN 978-1-4503-0255-5doi 10114518635971863601 [Page 186]

Gregor Kiczales John Lamping Anurag Menhdhekar Chris Maeda Cristina Lopes Jean-MarcLoingtier and John Irwin Aspect-oriented programming In ECOOP pages 220ndash242 1997[Page 9]

Edward Kmett Mirrored lenses Last accessed on 29 June 2017 2012 URL hpcomonadcomreader2012mirrored-lenses [Page 169]

Christoph Koch Incremental query evaluation in a ring of databases In Symp Principles of DatabaseSystems (PODS) pages 87ndash98 ACM 2010 [Pages 57 161 and 174]

Christoph Koch Yanif Ahmad Oliver Kennedy Milos Nikolic Andres Noumltzli Daniel Lupei andAmir Shaikhha DBToaster higher-order delta processing for dynamic frequently fresh viewsThe VLDB Journal 23(2)253ndash278 2014 ISSN 1066-8888 doi 101007s00778-013-0348-4 URLhpdxdoiorg101007s00778-013-0348-4 [Pages 154 158 161 and 174]

Christoph Koch Daniel Lupei and Val Tannen Incremental View Maintenance For CollectionProgramming In Proceedings of the 35th ACM SIGMOD-SIGACT-SIGAI Symposium on Principlesof Database Systems PODS rsquo16 pages 75ndash90 New York NY USA 2016 ACM ISBN 978-1-4503-4191-2 doi 10114529022512902286 [Pages 2 57 159 160 161 and 175]

Neelakantan R Krishnaswami and Derek Dreyer Internalizing Relational Parametricity in theExtensional Calculus of Constructions In Simona Ronchi Della Rocca editor Computer ScienceLogic 2013 (CSL 2013) volume 23 of Leibniz International Proceedings in Informatics (LIPIcs) pages432ndash451 Dagstuhl Germany 2013 Schloss DagstuhlndashLeibniz-Zentrum fuer Informatik ISBN978-3-939897-60-6 doi httpdxdoiorg104230LIPIcsCSL2013432 [Page 125]

Ralf Laumlmmel Googlersquos MapReduce programming model mdash revisited Sci Comput Program 68(3)208ndash237 October 2007 [Pages 67 and 129]

Daan Leijen and Erik Meijer. Domain specific embedded compilers. In DSL, pages 109–122. ACM, 1999. [Page 48]


Yanhong A. Liu. Efficiency by incrementalization: An introduction. HOSC, 13(4):289–313, 2000. [Pages 2, 56, 160, 161, 176, and 178]

Yanhong A. Liu and Tim Teitelbaum. Caching intermediate results for program improvement. In Proceedings of the 1995 ACM SIGPLAN symposium on Partial evaluation and semantics-based program manipulation, PEPM '95, pages 190–201, New York, NY, USA, 1995. ACM. ISBN 0-89791-720-0. doi: 10.1145/215465.215590. URL http://doi.acm.org/10.1145/215465.215590. [Pages v, vii, 135, and 136]

Ingo Maier and Martin Odersky. Higher-order reactive programming with incremental lists. In ECOOP, pages 707–731. Springer-Verlag, 2013. [Pages 43 and 174]

Simon Marlow and Simon Peyton Jones. Making a fast curry: push/enter vs. eval/apply for higher-order languages. Journal of Functional Programming, 16(4-5):415–449, 2006. ISSN 1469-7653. doi: 10.1017/S0956796806005995. [Page 160]

Conor McBride. The derivative of a regular type is its type of one-hole contexts. Last accessed on 29 June 2017, 2001. URL http://www.strictlypositive.org/diff.pdf. [Page 78]

Conor McBride. Outrageous but Meaningful Coincidences: Dependent Type-safe Syntax and Evaluation. In Proceedings of the 6th ACM SIGPLAN Workshop on Generic Programming, WGP '10, pages 1–12, New York, NY, USA, 2010. ACM. ISBN 978-1-4503-0251-7. doi: 10.1145/1863495.1863497. [Page 186]

Erik Meijer, Brian Beckman, and Gavin Bierman. LINQ: reconciling objects, relations and XML in the .NET framework. In Proc. Int'l SIGMOD Conf. on Management of Data (SIGMOD), page 706. ACM, 2006. [Page 47]

John C. Mitchell. Foundations of programming languages. MIT Press, 1996. [Page 118]

Adriaan Moors. Type constructor polymorphism, 2010. URL http://adriaanm.github.com/research/2010/10/06/new-in-scala-28-type-constructor-inference. Last accessed on 2012-04-11. [Page 45]

Adriaan Moors, Tiark Rompf, Philipp Haller, and Martin Odersky. Scala-virtualized. In Proc. Workshop on Partial Evaluation and Semantics-Based Program Manipulation, PEPM '12, pages 117–120. ACM, 2012. [Page 44]

Steven S. Muchnick. Advanced Compiler Design and Implementation. Morgan Kaufmann, 1997. ISBN 1-55860-320-4. [Page 21]

Russell O'Connor. Polymorphic update with van Laarhoven lenses. Last accessed on 29 June 2017, 2012. URL http://r6.ca/blog/20120623T104901Z.html. [Page 169]

M. Odersky and A. Moors. Fighting bit rot with types (experience report: Scala collections). In IARCS Conf. Foundations of Software Technology and Theoretical Computer Science, volume 4, pages 427–451, 2009. [Pages 20, 23, 30, and 48]

Martin Odersky. Pimp my library, 2006. URL http://www.artima.com/weblogs/viewpost.jsp?thread=179766. Last accessed on 10 April 2012. [Page 25]

Martin Odersky. The Scala language specification, version 2.9, 2011. [Page 25]

Martin Odersky, Lex Spoon, and Bill Venners. Programming in Scala. Artima Inc., 2nd edition, 2011. [Pages 10, 23, 25, 45, and 46]


Bruno C.d.S. Oliveira, Adriaan Moors, and Martin Odersky. Type classes as objects and implicits. In OOPSLA '10, pages 341–360. ACM, 2010. [Page 30]

Klaus Ostermann, Paolo G. Giarrusso, Christian Kästner, and Tillmann Rendel. Revisiting information hiding: Reflections on classical and nonclassical modularity. In Proceedings of the 25th European Conference on Object-Oriented Programming (ECOOP). Springer-Verlag, 2011. URL http://www.informatik.uni-marburg.de/~kaestner/ecoop11.pdf. [Page 3]

Scott Owens, Magnus O. Myreen, Ramana Kumar, and Yong Kiam Tan. Functional Big-Step Semantics. In Proceedings of the 25th European Symposium on Programming Languages and Systems - Volume 9632, pages 589–615, New York, NY, USA, 2016. Springer-Verlag New York, Inc. ISBN 978-3-662-49497-4. doi: 10.1007/978-3-662-49498-1_23. [Pages 186 and 201]

Robert Paige and Shaye Koenig. Finite differencing of computable expressions. TOPLAS, 4(3):402–454, July 1982. [Pages v, vii, 2, 57, 161, and 174]

Simon L. Peyton Jones and Simon Marlow. Secrets of the Glasgow Haskell Compiler inliner. JFP, 12(4-5):393–434, 2002. [Pages 9 and 32]

F. Pfenning and C. Elliot. Higher-order abstract syntax. In PLDI, pages 199–208. ACM, 1988. [Pages 20 and 28]

Frank Pfenning. Partial polymorphic type inference and higher-order unification. In Proc. ACM Conf. on LISP and Functional Programming, LFP '88, pages 153–163. ACM, 1988. [Page 45]

Benjamin C. Pierce. Types and Programming Languages. MIT Press, 1st edition, 2002. [Pages 33, 181, 183, 184, and 201]

François Pottier and Nadji Gauthier. Polymorphic Typed Defunctionalization. In Proceedings of the 31st ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL '04, pages 89–98, New York, NY, USA, 2004. ACM. ISBN 1-58113-729-X. doi: 10.1145/964001.964009. [Page 217]

G. Ramalingam and Thomas Reps. A categorized bibliography on incremental computation. In POPL, pages 502–510. ACM, 1993. [Pages 55, 161, and 173]

Tiark Rompf and Nada Amin. Functional Pearl: A SQL to C Compiler in 500 Lines of Code. In Proceedings of the 20th ACM SIGPLAN International Conference on Functional Programming, ICFP 2015, pages 2–9, New York, NY, USA, 2015. ACM. ISBN 978-1-4503-3669-7. doi: 10.1145/2784731.2784760. [Page 1]

Tiark Rompf and Martin Odersky. Lightweight modular staging: a pragmatic approach to runtime code generation and compiled DSLs. In GPCE, pages 127–136. ACM, 2010. [Pages 4, 20, 21, 22, 25, 44, 49, 78, and 127]

Tiark Rompf, Arvind K. Sujeeth, HyoukJoong Lee, Kevin J. Brown, Hassan Chafi, Martin Odersky, and Kunle Olukotun. Building-blocks for performance oriented DSLs. In DSL, pages 93–117, 2011. [Pages 44, 46, and 49]

Tiark Rompf, Arvind K. Sujeeth, Nada Amin, Kevin J. Brown, Vojin Jovanovic, HyoukJoong Lee, Manohar Jonnalagedda, Kunle Olukotun, and Martin Odersky. Optimizing data structures in high-level programs: new directions for extensible compilers based on staging. In POPL, pages 497–510. ACM, 2013. [Pages 43, 48, and 49]


Andreas Rossberg, Claudio V. Russo, and Derek Dreyer. F-ing Modules. In Proceedings of the 5th ACM SIGPLAN Workshop on Types in Language Design and Implementation, TLDI '10, pages 89–102, New York, NY, USA, 2010. ACM. ISBN 978-1-60558-891-9. doi: 10.1145/1708016.1708028. [Page 155]

Amr Sabry and Matthias Felleisen. Reasoning about programs in continuation-passing style. Lisp and symbolic computation, 6(3-4):289–360, 1993. [Page 138]

Guido Salvaneschi and Mira Mezini. Reactive behavior in object-oriented applications: an analysis and a research roadmap. In AOSD, pages 37–48. ACM, 2013. [Pages 1 and 55]

Gabriel Scherer and Didier Rémy. GADTs meet subtyping. In ESOP, pages 554–573. Springer-Verlag, 2013. [Page 4]

Christopher Schwaab and Jeremy G. Siek. Modular type-safety proofs in Agda. In Proceedings of the 7th Workshop on Programming Languages Meets Program Verification, PLPV '13, pages 3–12, New York, NY, USA, 2013. ACM. ISBN 978-1-4503-1860-0. doi: 10.1145/2428116.2428120. [Page 190]

Ilya Sergey, Dimitrios Vytiniotis, and Simon Peyton Jones. Modular, higher-order cardinality analysis in theory and practice. In Proceedings of the 41st ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL '14, pages 335–347, New York, NY, USA, 2014. ACM. ISBN 978-1-4503-2544-8. doi: 10.1145/2535838.2535861. [Page 160]

Rohin Shah and Rastislav Bodik. Automated incrementalization through synthesis. Presented at IC 2017, First Workshop on Incremental Computing; 2-page abstract available, 2017. URL http://pldi17.sigplan.org/event/ic-2017-papers-automated-incrementalization-through-synthesis. [Page 56]

Norbert Siegmund, Marko Rosenmüller, Christian Kästner, Paolo G. Giarrusso, Sven Apel, and Sergiy Kolesnikov. Scalable prediction of non-functional properties in software product lines. In Proceedings of the 15th International Software Product Line Conference (SPLC), Los Alamitos, CA, August 2011. IEEE Computer Society. URL http://www.informatik.uni-marburg.de/~kaestner/SPLC11_nfp.pdf. To appear; submitted 27 Feb 2010, accepted 21 Apr 2011. [Page 4]

Norbert Siegmund, Marko Rosenmüller, Christian Kästner, Paolo G. Giarrusso, Sven Apel, and Sergiy S. Kolesnikov. Scalable prediction of non-functional properties in software product lines: Footprint and memory consumption. Information and Software Technology, 55(3):491–507, 2013. [Page 4]

Daniel Spiewak and Tian Zhao. ScalaQL: Language-integrated database queries for Scala. In Proc. Conf. Software Language Engineering (SLE), 2009. [Page 48]

R. Statman. Logical relations and the typed λ-calculus. Information and Control, 65(2):85–97, May 1985. ISSN 0019-9958. doi: 10.1016/S0019-9958(85)80001-2. [Page 89]

Guy L. Steele Jr. Organizing Functional Code for Parallel Execution or, Foldl and Foldr Considered Slightly Harmful. In Proceedings of the 14th ACM SIGPLAN International Conference on Functional Programming, ICFP '09, pages 1–2, New York, NY, USA, 2009. ACM. ISBN 978-1-60558-332-7. doi: 10.1145/1596550.1596551. [Page 43]

Michael Stonebraker, Samuel Madden, Daniel J. Abadi, Stavros Harizopoulos, Nabil Hachem, and Pat Helland. The end of an architectural era (it's time for a complete rewrite). In Proc. Int'l Conf. Very Large Data Bases (VLDB), pages 1150–1160. VLDB Endowment, 2007. [Pages 1 and 9]


Arvind K. Sujeeth, Austin Gibbons, Kevin J. Brown, HyoukJoong Lee, Tiark Rompf, Martin Odersky, and Kunle Olukotun. Forge: generating a high performance DSL implementation from a declarative specification. In Proceedings of the 12th International Conference on Generative Programming: Concepts & Experiences, GPCE '13, pages 145–154, New York, NY, USA, 2013a. ACM. ISBN 978-1-4503-2373-4. doi: 10.1145/2517208.2517220. URL http://doi.acm.org/10.1145/2517208.2517220. [Page 49]

Arvind K. Sujeeth, Tiark Rompf, Kevin J. Brown, HyoukJoong Lee, Hassan Chafi, Victoria Popic, Michael Wu, Aleksandar Prokopec, Vojin Jovanovic, Martin Odersky, and Kunle Olukotun. Composition and reuse with compiled domain-specific languages. In ECOOP, pages 52–78. Springer-Verlag, 2013b. [Pages 43 and 49]

R. S. Sundaresh and Paul Hudak. A theory of incremental computation and its application. In POPL, pages 1–13. ACM, 1991. [Page 174]

L. S. van Benthem Jutting, J. McKinna, and R. Pollack. Checking algorithms for Pure Type Systems. In Henk Barendregt and Tobias Nipkow, editors, Types for Proofs and Programs, number 806 in Lecture Notes in Computer Science, pages 19–61. Springer Berlin Heidelberg, January 1994. ISBN 978-3-540-58085-0, 978-3-540-48440-0. [Page 172]

Jan Vitek and Tomas Kalibera. Repeatability, reproducibility and rigor in systems research. In Proc. Int'l Conf. Embedded Software (EMSOFT), pages 33–38. ACM, 2011. [Page 36]

Philip Wadler. Theorems for Free. In Proceedings of the Fourth International Conference on Functional Programming Languages and Computer Architecture, FPCA '89, pages 347–359, New York, NY, USA, 1989. ACM. ISBN 978-0-89791-328-7. doi: 10.1145/99370.99404. [Page 89]

Philip Wadler. The Girard–Reynolds isomorphism (second edition). Theoretical Computer Science, 375(1–3):201–226, May 2007. ISSN 0304-3975. doi: 10.1016/j.tcs.2006.12.042. [Page 163]

Patrycja Wegrzynowicz and Krzysztof Stencel. The good, the bad, and the ugly: three ways to use a semantic code query system. In OOPSLA, pages 821–822. ACM, 2009. [Page 49]

Darren Willis, David Pearce, and James Noble. Efficient object querying for Java. In ECOOP, pages 28–49. Springer, 2006. [Page 48]

Darren Willis, David J. Pearce, and James Noble. Caching and incrementalisation in the Java Query Language. In OOPSLA, pages 1–18. ACM, 2008. [Pages 48 and 174]

Yuan Yu, Michael Isard, Dennis Fetterly, Mihai Budiu, Úlfar Erlingsson, Pradeep Kumar Gunda, and Jon Currey. DryadLINQ: a system for general-purpose distributed data-parallel computing using a high-level language. In Proc. Conf. Operating Systems Design and Implementation, OSDI '08, pages 1–14. USENIX Association, 2008. [Page 47]

  • Abstract
  • Zusammenfassung
  • Contents
  • List of Figures
  • List of Tables
  • 1 Introduction
    • 1.1 This thesis
      • 1.1.1 Included papers
      • 1.1.2 Excluded papers
      • 1.1.3 Navigating this thesis
  • I Optimizing Collection Queries by Reification
    • 2 Introduction
      • 2.1 Contributions and summary
      • 2.2 Motivation
        • 2.2.1 Optimizing by Hand
    • 3 Automatic optimization with SQuOpt
      • 3.1 Adapting a Query
      • 3.2 Indexing
    • 4 Implementing SQuOpt
      • 4.1 Expression Trees
      • 4.2 Optimizations
      • 4.3 Query Execution
    • 5 A deep EDSL for collection queries
      • 5.1 Implementation: Expressing the interface in Scala
      • 5.2 Representing expression trees
      • 5.3 Lifting first-order expressions
      • 5.4 Lifting higher-order expressions
      • 5.5 Lifting collections
      • 5.6 Interpretation
      • 5.7 Optimization
    • 6 Evaluating SQuOpt
      • 6.1 Study Setup
      • 6.2 Experimental Units
      • 6.3 Measurement Setup
      • 6.4 Results
    • 7 Discussion
      • 7.1 Optimization limitations
      • 7.2 Deep embedding
        • 7.2.1 What worked well
        • 7.2.2 Limitations
        • 7.2.3 What did not work so well: Scala type inference
        • 7.2.4 Lessons for language embedders
    • 8 Related work
      • 8.1 Language-Integrated Queries
      • 8.2 Query Optimization
      • 8.3 Deep Embeddings in Scala
        • 8.3.1 LMS
      • 8.4 Code Querying
    • 9 Conclusions
      • 9.1 Future work
  • II Incremental λ-Calculus
    • 10 Introduction to differentiation
      • 10.1 Generalizing the calculus of finite differences
      • 10.2 A motivating example
      • 10.3 Introducing changes
        • 10.3.1 Incrementalizing with changes
      • 10.4 Differentiation on open terms and functions
        • 10.4.1 Producing function changes
        • 10.4.2 Consuming function changes
        • 10.4.3 Pointwise function changes
        • 10.4.4 Passing change targets
      • 10.5 Differentiation informally
      • 10.6 Differentiation on our example
      • 10.7 Conclusion and contributions
        • 10.7.1 Navigating this thesis part
        • 10.7.2 Contributions
    • 11 A tour of differentiation examples
      • 11.1 Change structures as type-class instances
      • 11.2 How to design a language plugin
      • 11.3 Incrementalizing a collection API
        • 11.3.1 Changes to type-class instances
        • 11.3.2 Incrementalizing aggregation
        • 11.3.3 Modifying list elements
        • 11.3.4 Limitations
      • 11.4 Efficient sequence changes
      • 11.5 Products
      • 11.6 Sums, pattern matching and conditionals
        • 11.6.1 Optimizing filter
      • 11.7 Chapter conclusion
    • 12 Changes and differentiation, formally
      • 12.1 Changes and validity
        • 12.1.1 Function spaces
        • 12.1.2 Derivatives
        • 12.1.3 Basic change structures on types
        • 12.1.4 Validity as a logical relation
        • 12.1.5 Change structures on typing contexts
      • 12.2 Correctness of differentiation
        • 12.2.1 Plugin requirements
        • 12.2.2 Correctness proof
      • 12.3 Discussion
        • 12.3.1 The correctness statement
        • 12.3.2 Invalid input changes
        • 12.3.3 Alternative environment changes
        • 12.3.4 Capture avoidance
      • 12.4 Plugin requirement summary
      • 12.5 Chapter conclusion
    • 13 Change structures
      • 13.1 Formally defining ⊕ and change structures
        • 13.1.1 Example: Group changes
        • 13.1.2 Properties of change structures
      • 13.2 Operations on function changes, informally
        • 13.2.1 Examples of nil changes
        • 13.2.2 Nil changes on arbitrary functions
        • 13.2.3 Constraining ⊕ on functions
      • 13.3 Families of change structures
        • 13.3.1 Change structures for function spaces
        • 13.3.2 Change structures for products
      • 13.4 Change structures for types and contexts
      • 13.5 Development history
      • 13.6 Chapter conclusion
    • 14 Equational reasoning on changes
      • 14.1 Reasoning on changes syntactically
        • 14.1.1 Denotational equivalence for valid changes
        • 14.1.2 Syntactic validity
      • 14.2 Change equivalence
        • 14.2.1 Preserving change equivalence
        • 14.2.2 Sketching an alternative syntax
        • 14.2.3 Change equivalence is a PER
      • 14.3 Chapter conclusion
    • 15 Extensions and theoretical discussion
      • 15.1 General recursion
        • 15.1.1 Differentiating general recursion
        • 15.1.2 Justification
      • 15.2 Completely invalid changes
      • 15.3 Pointwise function changes
      • 15.4 Modeling only valid changes
        • 15.4.1 One-sided vs two-sided validity
    • 16 Differentiation in practice
      • 16.1 The role of differentiation plugins
      • 16.2 Predicting nil changes
        • 16.2.1 Self-maintainability
      • 16.3 Case study
      • 16.4 Benchmarking setup
      • 16.5 Benchmark results
      • 16.6 Chapter conclusion
    • 17 Cache-transfer-style conversion
      • 17.1 Introduction
      • 17.2 Introducing Cache-Transfer Style
        • 17.2.1 Background
        • 17.2.2 A first-order motivating example: computing averages
        • 17.2.3 A higher-order motivating example: nested loops
      • 17.3 Formalization
        • 17.3.1 The source language λAL
        • 17.3.2 Static differentiation in λAL
        • 17.3.3 A new soundness proof for static differentiation
        • 17.3.4 The target language iλAL
        • 17.3.5 CTS conversion from λAL to iλAL
        • 17.3.6 Soundness of CTS conversion
      • 17.4 Incrementalization case studies
        • 17.4.1 Averaging integer bags
        • 17.4.2 Nested loops over two sequences
        • 17.4.3 Indexed joins of two bags
      • 17.5 Limitations and future work
        • 17.5.1 Hiding the cache type
        • 17.5.2 Nested bags
        • 17.5.3 Proper tail calls
        • 17.5.4 Pervasive replacement values
        • 17.5.5 Recomputing updated values
        • 17.5.6 Cache pruning via absence analysis
        • 17.5.7 Unary vs n-ary abstraction
      • 17.6 Related work
      • 17.7 Chapter conclusion
    • 18 Towards differentiation for System F
      • 18.1 The parametricity transformation
      • 18.2 Differentiation and parametricity
      • 18.3 Proving differentiation correct externally
      • 18.4 Combinators for polymorphic change structures
      • 18.5 Differentiation for System F
      • 18.6 Related work
      • 18.7 Prototype implementation
      • 18.8 Chapter conclusion
    • 19 Related work
      • 19.1 Dynamic approaches
      • 19.2 Static approaches
        • 19.2.1 Finite differencing
        • 19.2.2 λ-diff and partial differentials
        • 19.2.3 Static memoization
    • 20 Conclusion and future work
  • Appendixes
    • A Preliminaries
      • A.1 Our proof meta-language
        • A.1.1 Type theory versus set theory
      • A.2 Simply-typed λ-calculus
        • A.2.1 Denotational semantics for STLC
        • A.2.2 Weakening
        • A.2.3 Substitution
        • A.2.4 Discussion: Our mechanization and semantic style
        • A.2.5 Language plugins
    • B More on our formalization
      • B.1 Mechanizing plugins modularly and limitations
    • C (Un)typed ILC, operationally
      • C.1 Formalization
        • C.1.1 Types and contexts
        • C.1.2 Base syntax for λA
        • C.1.3 Change syntax for λΔA
        • C.1.4 Differentiation
        • C.1.5 Typing λA→ and λΔA→
        • C.1.6 Semantics
        • C.1.7 Type soundness
      • C.2 Validating our step-indexed semantics
      • C.3 Validity, syntactically (λA→, λΔA→)
      • C.4 Step-indexed extensional validity (λA→, λΔA→)
      • C.5 Step-indexed intensional validity
      • C.6 Untyped step-indexed validity (λA, λΔA)
      • C.7 General recursion in λA→ and λΔA→
      • C.8 Future work
        • C.8.1 Change composition
      • C.9 Development history
      • C.10 Conclusion
    • D Defunctionalizing function changes
      • D.1 Setup
      • D.2 Defunctionalization
        • D.2.1 Defunctionalization with separate function codes
        • D.2.2 Defunctionalizing function changes
      • D.3 Defunctionalization and cache-transfer-style
  • Acknowledgments
  • Bibliography
Page 184: Optimizing and Incrementalizing Higher-order Collection Queries by AST Transformation
Page 185: Optimizing and Incrementalizing Higher-order Collection Queries by AST Transformation
Page 186: Optimizing and Incrementalizing Higher-order Collection Queries by AST Transformation
Page 187: Optimizing and Incrementalizing Higher-order Collection Queries by AST Transformation
Page 188: Optimizing and Incrementalizing Higher-order Collection Queries by AST Transformation
Page 189: Optimizing and Incrementalizing Higher-order Collection Queries by AST Transformation
Page 190: Optimizing and Incrementalizing Higher-order Collection Queries by AST Transformation
Page 191: Optimizing and Incrementalizing Higher-order Collection Queries by AST Transformation
Page 192: Optimizing and Incrementalizing Higher-order Collection Queries by AST Transformation
Page 193: Optimizing and Incrementalizing Higher-order Collection Queries by AST Transformation
Page 194: Optimizing and Incrementalizing Higher-order Collection Queries by AST Transformation
Page 195: Optimizing and Incrementalizing Higher-order Collection Queries by AST Transformation
Page 196: Optimizing and Incrementalizing Higher-order Collection Queries by AST Transformation
Page 197: Optimizing and Incrementalizing Higher-order Collection Queries by AST Transformation
Page 198: Optimizing and Incrementalizing Higher-order Collection Queries by AST Transformation
Page 199: Optimizing and Incrementalizing Higher-order Collection Queries by AST Transformation
Page 200: Optimizing and Incrementalizing Higher-order Collection Queries by AST Transformation
Page 201: Optimizing and Incrementalizing Higher-order Collection Queries by AST Transformation
Page 202: Optimizing and Incrementalizing Higher-order Collection Queries by AST Transformation
Page 203: Optimizing and Incrementalizing Higher-order Collection Queries by AST Transformation
Page 204: Optimizing and Incrementalizing Higher-order Collection Queries by AST Transformation
Page 205: Optimizing and Incrementalizing Higher-order Collection Queries by AST Transformation
Page 206: Optimizing and Incrementalizing Higher-order Collection Queries by AST Transformation
Page 207: Optimizing and Incrementalizing Higher-order Collection Queries by AST Transformation
Page 208: Optimizing and Incrementalizing Higher-order Collection Queries by AST Transformation
Page 209: Optimizing and Incrementalizing Higher-order Collection Queries by AST Transformation
Page 210: Optimizing and Incrementalizing Higher-order Collection Queries by AST Transformation
Page 211: Optimizing and Incrementalizing Higher-order Collection Queries by AST Transformation
Page 212: Optimizing and Incrementalizing Higher-order Collection Queries by AST Transformation
Page 213: Optimizing and Incrementalizing Higher-order Collection Queries by AST Transformation
Page 214: Optimizing and Incrementalizing Higher-order Collection Queries by AST Transformation
Page 215: Optimizing and Incrementalizing Higher-order Collection Queries by AST Transformation
Page 216: Optimizing and Incrementalizing Higher-order Collection Queries by AST Transformation
Page 217: Optimizing and Incrementalizing Higher-order Collection Queries by AST Transformation
Page 218: Optimizing and Incrementalizing Higher-order Collection Queries by AST Transformation
Page 219: Optimizing and Incrementalizing Higher-order Collection Queries by AST Transformation
Page 220: Optimizing and Incrementalizing Higher-order Collection Queries by AST Transformation
Page 221: Optimizing and Incrementalizing Higher-order Collection Queries by AST Transformation
Page 222: Optimizing and Incrementalizing Higher-order Collection Queries by AST Transformation
Page 223: Optimizing and Incrementalizing Higher-order Collection Queries by AST Transformation
Page 224: Optimizing and Incrementalizing Higher-order Collection Queries by AST Transformation
Page 225: Optimizing and Incrementalizing Higher-order Collection Queries by AST Transformation
Page 226: Optimizing and Incrementalizing Higher-order Collection Queries by AST Transformation
Page 227: Optimizing and Incrementalizing Higher-order Collection Queries by AST Transformation
Page 228: Optimizing and Incrementalizing Higher-order Collection Queries by AST Transformation
Page 229: Optimizing and Incrementalizing Higher-order Collection Queries by AST Transformation
Page 230: Optimizing and Incrementalizing Higher-order Collection Queries by AST Transformation
Page 231: Optimizing and Incrementalizing Higher-order Collection Queries by AST Transformation
Page 232: Optimizing and Incrementalizing Higher-order Collection Queries by AST Transformation
Page 233: Optimizing and Incrementalizing Higher-order Collection Queries by AST Transformation
Page 234: Optimizing and Incrementalizing Higher-order Collection Queries by AST Transformation
Page 235: Optimizing and Incrementalizing Higher-order Collection Queries by AST Transformation
Page 236: Optimizing and Incrementalizing Higher-order Collection Queries by AST Transformation
Page 237: Optimizing and Incrementalizing Higher-order Collection Queries by AST Transformation
Page 238: Optimizing and Incrementalizing Higher-order Collection Queries by AST Transformation
Page 239: Optimizing and Incrementalizing Higher-order Collection Queries by AST Transformation
Page 240: Optimizing and Incrementalizing Higher-order Collection Queries by AST Transformation
Page 241: Optimizing and Incrementalizing Higher-order Collection Queries by AST Transformation
Page 242: Optimizing and Incrementalizing Higher-order Collection Queries by AST Transformation
Page 243: Optimizing and Incrementalizing Higher-order Collection Queries by AST Transformation
Page 244: Optimizing and Incrementalizing Higher-order Collection Queries by AST Transformation
Page 245: Optimizing and Incrementalizing Higher-order Collection Queries by AST Transformation
Page 246: Optimizing and Incrementalizing Higher-order Collection Queries by AST Transformation
Page 247: Optimizing and Incrementalizing Higher-order Collection Queries by AST Transformation
Page 248: Optimizing and Incrementalizing Higher-order Collection Queries by AST Transformation
Page 249: Optimizing and Incrementalizing Higher-order Collection Queries by AST Transformation
Page 250: Optimizing and Incrementalizing Higher-order Collection Queries by AST Transformation
Page 251: Optimizing and Incrementalizing Higher-order Collection Queries by AST Transformation
Page 252: Optimizing and Incrementalizing Higher-order Collection Queries by AST Transformation
Page 253: Optimizing and Incrementalizing Higher-order Collection Queries by AST Transformation
Page 254: Optimizing and Incrementalizing Higher-order Collection Queries by AST Transformation
Page 255: Optimizing and Incrementalizing Higher-order Collection Queries by AST Transformation