OCaml for Scientists

© Flying Frog Consultancy Ltd., 2005

Dedicated to Emma

For more products and services by Flying Frog Consultancy Ltd., visit our website:

http://www.ffconsultancy.com

Contents

1 Introduction
1.1 Good programming style
1.2 A Brief History of OCaml
1.3 Benefits of OCaml
1.4 Running OCaml programs
1.4.1 Top-level
1.4.2 Byte-code compilation
1.4.3 Native-code compilation
1.5 OCaml syntax
1.5.1 Language overview
1.5.1.1 Types
1.5.1.2 Variables and functions
1.5.1.3 Tuples, records and variants
1.5.1.4 Lists and arrays
1.5.1.5 If
1.5.1.6 Program composition
1.5.1.7 More about functions
1.5.1.8 Modules
1.5.2 Pattern matching
1.5.2.1 Guarded patterns
1.5.2.2 Erroneous patterns
1.5.2.3 Good style
1.5.2.4 Nested patterns
1.5.2.5 Parallel pattern matching
1.5.3 Exceptions
1.5.4 Polymorphism
1.5.5 Currying
1.6 Functional vs Imperative programming
1.7 Recursion
1.8 Applicability

2 Program Structure
2.1 Nesting
2.2 Factoring
2.3 Modules
2.3.1 Signatures
2.3.2 Structures
2.3.3 Anonymous signatures
2.3.4 Use of the IntRange module
2.3.5 Another example
2.4 Objects
2.4.1 Classes
2.4.2 Objects
2.4.2.1 Immediate objects
2.4.2.2 Classed objects
2.4.2.3 Inheritance
2.5 OCaml browser
2.6 Compilation
2.6.1 Linking with libraries
2.7 Custom top-levels

3 Data Structures
3.1 Algorithmic Complexity
3.1.1 Primitive operations
3.1.2 Complexity
3.1.2.1 Asymptotic complexity
3.2 Arrays
3.3 Lists
3.4 Sets

3.5 Hash tables
3.6 Maps
3.7 Summary
3.8 Heterogeneous containers
3.9 Trees
3.9.1 Balanced trees
3.9.2 Unbalanced trees

4 Numerical Analysis
4.1 Number representation
4.1.1 Integers
4.1.2 Floating-point numbers
4.2 Quirks
4.3 Algebra
4.4 Interpolation
4.5 Quadratic solutions
4.6 Mean and variance
4.7 Other forms of arithmetic
4.7.1 Rational arithmetic
4.7.2 High-precision floating point
4.7.3 Adaptive precision

5 Input and Output
5.1 Printing to screen
5.2 Reading from and writing to disc
5.3 Marshalling
5.4 Lexing and Parsing
5.4.1 Lexing
5.4.2 Parsing

6 Visualization
6.1 Overview of OpenGL
6.1.1 GLUT
6.2 Basic rendering

6.2.1 Geometric primitives
6.2.2 Filling
6.2.3 Projection
6.2.4 Animation
6.3 Transformations
6.4 Efficient rendering
6.5 Rendering scientific data

7 Optimization
7.1 Profiling
7.2 Algorithmic optimization
7.3 Lower-level optimizations
7.3.1 Benchmarking data structures
7.3.2 Automated transformations
7.3.2.1 Compiler optimizations
7.3.2.2 Defunctorizing
7.3.3 Manual transformations
7.3.3.1 Tail-recursion
7.3.3.2 Deforesting
7.3.3.3 Terminating early
7.3.3.4 Specializing data structures
7.3.3.5 Avoiding polymorphic numerical functions
7.3.3.6 Unboxing data structures

8 Libraries
8.1 Command-line arguments
8.2 Timing
8.3 Big arrays
8.4 Vector-Matrix
8.5 Fourier transform

9 Simple Examples
9.1 Arithmetic
9.2 List related
9.2.1 count
9.2.2 position
9.2.3 mapi
9.2.4 chop
9.2.5 dice
9.2.6 replace
9.2.7 sub
9.2.8 extract
9.2.9 randomize
9.2.10 permute
9.2.11 Run-length encoding
9.3 String related
9.3.1 string_of_list
9.3.2 DNA sequence IO
9.3.3 Matrix IO
9.4 Array related
9.4.1 map2
9.4.2 Double folds
9.4.3 rotate
9.4.4 Matrix trace
9.5 Higher-order functions
9.5.1 Data structures of functions
9.5.2 Tuple related
9.5.3 Generalised products
9.5.4 Converting between container types

10 Complete Examples
10.1 Maximum entropy method
10.1.1 Formulation
10.1.2 Implementation

10.1.2.1 Lexer
10.1.2.2 Parser
10.1.2.3 Main program
10.1.2.4 Compilation
10.1.2.5 Results
10.1.2.6 Optimisation
10.2 Global minimization
10.2.1 The mutate function
10.2.2 Efficiency
10.2.3 Implementation
10.2.3.1 Lexer
10.2.3.2 Parser
10.2.3.5 Results
10.3 Finding nth-nearest neighbours
10.3.1 Formulation
10.3.2 Implementation
10.3.2.1 Lexer
10.3.2.2 Parser
10.3.3 Results
10.4 Eigen problems
10.4.1 Implementation
10.4.2 Results
10.5 Discrete wavelet transform

Bibliography

A Advanced Topics
A.1 Data sharing
A.2 Labelled and optional arguments
A.3 Defining binary infix operators
A.4 Installing top-level pretty printers
A.5 Monomorphism
A.6 Functors
A.7 Memoization
A.8 Polymorphic variants
A.9 Phantom types
A.10 Exponential type growth

B Troubleshooting
B.1 Dangerous if
B.2 Scoping subtleties
B.3 Evaluation order
B.4 Constructor arguments
B.5 Recycled types
B.6 Mutable array contents
B.7 Polymorphic problems
B.8 Local and non-local variable definitions

Preface

This book aims to encourage the scientific community to adopt stricter approaches to computer programming, emphasising correctness over performance, beginning with the selection of the Objective CAML language due to its inherent safety. Although scientists are the principal target audience, anyone interested in learning more about modern programming techniques is likely to benefit from reading this book.

Due to the widespread adoption of computers for everything from the logging and analysis of experimentally observed data to the computationally-intensive simulation of physical systems, the computer is now a vitally important tool for scientists. However, poor approaches to programming are endemic in current scientific culture. Specifically, more worth is placed on scientific results than on the creation of generic programs which could have been used to generate many more such results, leading to the constant redevelopment of disposable programs. If this can be cured, science will benefit from professional quality (correct, reusable and future-proof) programs and data formats which will greatly accelerate the rate of scientific discovery.

This book may be divided into three main parts. Chapters 1-5 introduce the reader to the syntax of Objective CAML and the creation and execution of working programs based upon the most important features found in the language. Chapters 6-8 deal with extended functionality available via libraries and optimisation. Finally, chapters 9 and 10 present a variety of examples. In particular, chapter 10 describes the creation of complete programs capable of solving some of the most important types of problem found in computational science.

Notation

The sets of integers, real and complex numbers are denoted as $\mathbb{Z}$, $\mathbb{R}$ and $\mathbb{C}$ respectively.

The unit imaginary number is denoted as $i = \sqrt{-1}$ according to the convention of the physical sciences (engineers use $j$). Real and imaginary parts of a complex number $z = x + iy$, $x, y \in \mathbb{R}$ are denoted as $\mathrm{Re}[z] = x$ and $\mathrm{Im}[z] = y$ respectively. The complex conjugate is denoted $z^* = x - iy$. In complex polar notation $z$ is written $z = re^{i\theta}$, $r, \theta \in \mathbb{R}$ where $r = |z|$ is termed the modulus of $z$ and $\theta = \arg[z]$ is termed the argument of $z$.

Vectors are written in bold typeface (e.g. $\mathbf{r}$) and default to $\mathbf{r} \in \mathbb{R}^3$.

Directed integer rounding functions are referred to as floor and ceiling and are denoted by and formulated as:

$$\lfloor x \rfloor = \max\{n \in \mathbb{Z};\, n \le x\} \quad \text{and} \quad \lceil x \rceil = \min\{n \in \mathbb{Z};\, n \ge x\}$$

respectively.
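As a quick numerical check of these two definitions, the following sketch uses OCaml's standard `floor` and `ceil` functions, which take and return floats. The sample values are arbitrary and chosen only for illustration.

```ocaml
(* floor x is the largest integer n <= x; ceil x is the smallest n >= x.
   In OCaml both functions take a float and return a float. *)
let () =
  assert (floor 2.5 = 2.0);
  assert (ceil 2.5 = 3.0);
  (* Directed rounding differs from truncation for negative arguments. *)
  assert (floor (-2.5) = -3.0);
  assert (ceil (-2.5) = -2.0);
  (* For an integer-valued argument the two functions coincide. *)
  assert (floor 4.0 = ceil 4.0);
  Printf.printf "floor 2.5 = %g, ceil 2.5 = %g\n" (floor 2.5) (ceil 2.5)
```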

Ranges are written using round or square braces to indicate exclusive or inclusive range ends respectively, e.g. for an integer range $[1 \ldots n) \equiv \{1 \ldots n-1\}$.

Derivatives of functions with respect to the first argument can be written in shorthand notation, e.g. $\phi''(x, y) \equiv \frac{\partial^2}{\partial x^2}\phi(x, y)$ and $\phi'(1, y) \equiv \left.\frac{\partial \phi}{\partial x}\right|_{x=1}$.

Inner products of complex-valued functions are written using Dirac notation, i.e.

$$\langle f | g \rangle = \int_{-\infty}^{\infty} f^*(x)\, g(x)\, dx$$

Readers should be aware that the standard mathematical notation $(g, f) \equiv \langle f | g \rangle$ is often used in other literature.

Fourier transforms are denoted and formulated according to convention for the physical sciences, with the forward transform:

and the reverse transform:

Readers should be aware that alternate formulations exist in related fields.

The set operations union, intersection and difference are denoted by the symbols $\cup$, $\cap$ and $\setminus$ respectively. The cartesian product of two sets $A$ and $B$ is written $A \times B$. For example, $\{1,2\} \times \{a,b,c\}$ is the set of pairs $\{\{1,a\},\{1,b\},\{1,c\},\{2,a\},\{2,b\},\{2,c\}\}$.

Function $L^p$ norms, denoted by $\|f\|_{L^p}$, are defined as:

$$\|f\|_{L^p} = \left( \int_{-\infty}^{\infty} |f(x)|^p \, dx \right)^{\frac{1}{p}}$$

and we default to the $L^2$ norm, e.g. the Plancherel equality may be written $\|\hat{f}\|_2 = \|f\|_2 \;\; \forall f \in L^2(\mathbb{R})$. In particular, $L^2(\mathbb{R})$ is the Hilbert space of functions $f : \mathbb{R} \to \mathbb{C}$ with $\|f\|_2 \in \mathbb{R}$.

A function $f$ which maps values from a set $A$ onto a set $B$ is written $f : A \to B$. Typically, this is used to indicate the argument and return types of a function, e.g. $f : \mathbb{R} \times \mathbb{R} \to \mathbb{C}$ (parsed as $f : (\mathbb{R} \times \mathbb{R}) \to \mathbb{C}$) is a function which maps two real numbers (expressed as an element in the cartesian product of the set of real numbers with itself) onto a complex number.

The variance $\sigma_f^2$ of a function $f$ satisfying $\|f\|_2 = 1$ is defined as:

$$\sigma_f^2 = \int_{-\infty}^{\infty} t^2 f(t)\, dt - \left( \int_{-\infty}^{\infty} t f(t)\, dt \right)^2$$

where $\sigma_f$ is known as the standard deviation.

The $\Gamma$-function $\Gamma[z] : \mathbb{C} \to \mathbb{C}$ is defined to be:

$$\Gamma[z] = \int_0^{\infty} t^{z-1} e^{-t}\, dt$$

Glossary

Glossary of terms

λ-function an anonymous function.

Abstract type a type with a visible name but hidden implementation. Abstract types are created by declaring only the name of the type in a module signature, and not the complete implementation of the type as given in the module structure.

Accumulator a variable used to build the result of a computation. The concept of an accumulator underpins the fold algorithm (introduced on page 36). For example, in the case of a function which sums the elements of a list, the accumulator is the variable which holds the cumulative sum while the algorithm is running.

Algorithm a mathematical recipe for solving a problem. For example, Euclid's method is a well-known algorithm for finding the greatest common divisor of two numbers.

Array a flat container which provides random access to its elements in O(1) time-complexity. See section 3.2.

Associative container a container which represents a mapping from keys to values.

Asymptotic complexity an approximation to the complexity of an algorithm, derived in the limit of infinite input complexity and, typically, as a lower or upper bound. For example, an algorithm with a complexity $f(n) = 3 + 3n + n^2$ has an asymptotic complexity $O(n^2)$. See section 3.1.

Balanced tree a tree data structure in which the maximum variation in depth can be shown to tend to a finite value in the limit of an infinite number of leaves in the tree. Often this restriction is tightened to require that the variation in depth is no more than a single level. See section 3.9.1 for a brief discussion.

Binary tree a tree data structure in which all non-leaf nodes contain exactly two binary trees.

Byte code a representation of a program which is intermediate between the source code and machine code. For example, the ocamlc compiler transforms OCaml source code into a platform-independent byte-code. Section 1.4.2 describes how to compile OCaml programs into byte code.

Cache an intermediate store used to accelerate the fetching of a subset of data.

Cache hit the quick process of retrieving data which is already in the cache.

Cache miss the slow process of fetching data to fill the cache when a request is made for data not already in the cache.

Cache coherent accessing data (typically in memory) sequentially, or more sequentially than random access, in order to minimize cache misses.

Cartesian cross product a set-theoretic form of outer product. For example, the cartesian cross product of the set $A = \{a, b\}$ with the set $B = \{c, d, e\}$ is the set of pairs $A \times B = \{(a,c),(a,d),(a,e),(b,c),(b,d),(b,e)\}$.

Class expression definition of values and methods implemented in any object created from this class.

Class type declaration of values and methods which any object adhering to this class type must provide.

Compile-time while a program is being compiled.

Compiler a program capable of transforming other programs. For example, the compiler ocamlopt transforms OCaml source code into executable machine code.

Complexity a quantitative indication of the growth of the computational requirements (such as time or memory) of an algorithm with respect to its input. Algorithmic complexity is described in section 3.1.

Cons the :: operator. When used in a pattern, h::t is said to decapitate a list, binding h to the first element of a list (the head) and t to a list containing the remaining elements (the tail). When used in an expression, h::t prepends the element h onto the list t. See sections 3.3 and 9.2 for example uses of the cons operator.

Container a data structure used to store values. The values held in a data structure are known as the elements of the data structure. Arrays, lists and sets are examples of data structures.

Curried function any function which returns a function as its result. See section 1.5.5.

Data structure a scheme for organizing related pieces of information.

Decapitate splitting a list into its first element (the head) and a list containing the remaining elements (the tail).

Exception a programming construct which allows the flow of execution to be altered by the raising of an exception. Execution then continues at the most recently defined exception handler capable of dealing with the exception. See section 1.5.3.

Flat container a non-hierarchical data structure representing a collection of values (elements). For example, arrays and lists.

Fixed point an int and a (possibly implicit) scaling. Used to represent real-valued numbers $x \in \mathbb{R}$ approximately, with a constant absolute error.

Float a type which, in OCaml, represents a double-precision IEEE floating-point number.

Floating point a number representation commonly used to approximate real-valued numbers $x \in \mathbb{R}$. See section 4.1.2.

Folds a higher-order function which applies its function argument to an accumulator and each element of a container. Introduced on page 36.

Function a mapping from input values to output values which may be described implicitly as an algorithm or explicitly, e.g. by a pattern match.

Functional programming a style of programming in which a computation is performed by composing the results of expressions without side-effects.

Functional language any programming language which allows functions to be passed as arguments to other functions, returned as the result of functions and stored as values in data structures.

Garbage collection the process of identifying data which are no longer accessible to a running program, destroying them and reclaiming the resources they required.

Generic programming the use of polymorphic functions and types.

Graph a data structure composed of vertices, and edges which link pairs of vertices.

Hash a constant-sized datum computed from an arbitrarily complicated value.

Hash table a data structure providing fast random access to its elements via their associated hash values.

Head the element at the front of a list.

Higher-order function any function which accepts another function as an argument. For example, f is a higher-order function in the definition f(g, x) = g(g(x)) because g must be a function (as g is applied to x and then to g(x)).

Heterogeneous container a data structure capable of storing several elements of different types. See section 3.8.

Homogeneous container a data structure (container) capable of storing several elements of the same type. For example, an array of integers is a homogeneous container because an array is a data structure containing elements of a single type, in this case integers.

Imperative programming a style of programming in which the result of a computation is generated by statements which act by way of side-effects, as opposed to functional programming.

Iteration a homonym with different meanings in different contexts and disciplines. In the context of numerical algorithms, an "iterative algorithm" means an algorithm designed to produce an approximate result by progressively converging on the solution. More generally, the word iterative is often used to describe repetitive algorithms, where a single repeat is known as an iteration.

Impure functional language a language, such as OCaml, which provides both functional and imperative programming constructs.

Int a type which exactly represents a contiguous subset of the integers $\mathbb{Z}$. See section 4.1.1.

IO input and output operations, such as printing to the screen or reading from disc.

Leaf in the context of tree data structures, a leaf node is a node containing no remaining trees.

Lex converting a character stream into a token stream. For example, recognising the keywords in a language before parsing them.

Linked List see list.

List a flat container providing prepend and decapitation operators in O(1) time-complexity. In OCaml, these are performed by the :: operator, known as the cons operator. A list is traversed by repeated decapitation. See section 3.3.

Maps either a container or a higher-order function:

• A data structure implementing a container which allows key-value pairs to be inserted and keys to be subsequently mapped onto their corresponding values. See sections 3.5 and 3.6.

• A higher-order function map $f\ \{l_0, \ldots, l_{n-1}\} \to \{f(l_0), \ldots, f(l_{n-1})\}$ which acts upon a container of elements to create a new container, the elements of which are the result of applying the given function $f$ to each element $l_i$ in the given container. Sometimes known as inner map.

Module a construct which encapsulates definitions in a structure and, optionally, allows the externally-visible portion of the definitions to be restricted to a subset of the definitions by way of a signature. See section 2.3.

Module signature a module interface, declaring the types, exceptions, variables and functions which are to be accessible to code outside a module using the signature. See section 2.3.1.

Module structure the body of a module, containing definitions of the constituent types, exceptions, variables and functions which make up the module. See section 2.3.2.

Monomorphic a single, possibly not-yet-known, type. See section A.5.

Mutable can be altered.

Native code the result of compiling a program into the machine language (machine code) understood natively by the CPU. Section 1.4.3 describes how to compile OCaml programs to native code.

Object-oriented programming the creation of objects, which encapsulate functions and data, at run-time. In particular, the use of inheritance to specify relationships between types of object.

Parse the act of understanding something formally. Parsing often refers to the recognition of grammatical constructs. See section 5.4.2.

Partial specialisation the specialisation of a program or function to part of its input data. For example, given a function to compute $x^n$ for any given floating-point number $x$ and integer $n$, generating a function to compute $x^3$ for any floating-point number $x$ is partially specialising the original to $n = 3$.

Pattern matching a construct in a programming language which allows patterns to be found and extracted from data structures.

Persistence the ability to reuse old data structures without having to worry about undoing state changes and unwanted interactions. An advantage of functional programming.

Platform a CPU architecture (e.g. ARM, MIPS, AMD) and operating system (e.g. IRIX, Linux, Mac OS X).

Polymorphic one of any type. In particular, polymorphic functions are generic over the types of at least one of their arguments. Variant types can be generic over polymorphic type-arguments. See section 1.5.4.

Primitive operation a low-level function or operation, used to formulate the time-complexity of an algorithm. See section 3.1.1.

Record a tuple with named fields. For example, a record of type { x: float; y: float } can have a value { x=1.; y=2. }.

Regular Expression a form of pattern matching.

Regexp common abbreviation of regular expression.

Root in the context of data structures, the root is the origin of the data structure, from which all other portions may be accessed.

Run-time while a program is being executed.

Side-effect any result of an expression apart from the value which the expression returns, e.g. altering a mutable variable or performing IO.

Signature see Module signature.

Source code the initial, manually-entered form of a program. For example, the source code to the FFTW library (for computing Fast Fourier Transforms) is written in OCaml. This OCaml code can be compiled and run to generate C code which can, in turn, be compiled into a library and linked into a final program.

Static typing completely type checking at compile-time such that no type checking is required at run-time.

Structure see Module structure.

Tail the remainder of a list without its front element.

Time-complexity complexity of the time taken to execute an algorithm, specified as the number of times a set of primitive operations are performed.

Top-level an interactive OCaml interpreter started by running the ocaml program. See section 1.4.1.

Tree a recursive data structure represented by nodes which may contain further trees. The root node is the node from which all others may be reached. Leaf nodes are those which contain no further trees. Trees are traversed by examining the child nodes of the current node recursively.

Tuple a type representing elements in the set of the cartesian cross product of the sets of types in the tuple. For example, the 2-tuple of floating-point numbers (x, y) of the type float * float is typically used to represent the set $\mathbb{R} \times \mathbb{R}$.

Type the set of possible values of a variable, function argument or result, or the mapping between argument and result types of a function.

Variant type explicitly listed sets of possible values. See section 1.5.1.3.
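Several of the terms defined above (cons, decapitate, accumulator, fold, higher-order function, curried function and record) can be seen together in a few lines of OCaml. The following is only an illustrative sketch: the names `head_and_tail`, `twice`, `power`, `cube` and `point` are hypothetical and do not come from the book.

```ocaml
(* Cons in an expression prepends an element onto a list. *)
let xs = 1 :: [2; 3]

(* Cons in a pattern decapitates a list into its head h and tail t. *)
let head_and_tail = function
  | [] -> None
  | h :: t -> Some (h, t)

(* A fold threads an accumulator (here the running sum) through a list. *)
let sum = List.fold_left (fun acc x -> acc + x) 0 xs

(* The higher-order function f(g, x) = g(g(x)) from the glossary entry. *)
let twice (g, x) = g (g x)

(* A curried function: applying it to n alone returns another function. *)
let power n x = x ** float_of_int n

(* Partial application fixing n = 3, in the spirit of the
   "Partial specialisation" entry. *)
let cube = power 3

(* A record with named fields, as in the Record entry. *)
type point = { x : float; y : float }
let p = { x = 1.; y = 2. }

let () =
  assert (xs = [1; 2; 3]);
  assert (head_and_tail xs = Some (1, [2; 3]));
  assert (sum = 6);
  assert (twice ((fun n -> n * n), 3) = 81);
  assert (cube 2.0 = 8.0);
  assert (p.x +. p.y = 3.0)
```

Note that `cube` is obtained here by ordinary partial application, which is the simplest way to fix one argument of a curried function; it does not by itself guarantee that the compiler generates specialised code.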

Glossary of acronyms

AST Abstract-syntax tree

BNF Backus-Naur form

CAML Categorical abstract machine language

FFT Fast Fourier transform

FFTW Fastest Fourier transform in the west

GOE Gaussian orthogonal ensemble

INRIA Institut National de Recherche en Informatique et en Automatique

IO Input and output

LCF Logic of computable functions

MEM Maximum entropy method

ML Meta-language

OCaml Objective CAML

OO Object-oriented

OOP Object-oriented programming

OpenGL Open graphics library

SGI Silicon Graphics Incorporated

VM Virtual machine

XML Extensible markup language


Chapter 1

Introduction

For the first time in history, and thanks to the exponential growth rate of computing power, an increasing number of scientists are finding that more time is spent creating, rather than executing, working programs. Indeed, much effort is spent writing small programs to automate otherwise tedious forms of analysis. In the future, this imbalance will doubtless be addressed by the adoption and teaching of more efficient programming techniques at the cost of less efficient programs. An important step in this direction is the use of higher-level programming languages, such as OCaml, in place of more conventional languages for scientific programming such as Fortran, C, C++ and Java.

In this chapter, we shall begin by laying down some guidelines for good programming which are applicable in any language before briefly reviewing the history of the OCaml language and outlining some of the features of the language which enforce some of these guidelines and other features which allow the remaining guidelines to be met. As we shall see, these aspects of the design of OCaml greatly improve reliability and development speed. Given also that a freely available, efficient compiler already exists for this language, it is no wonder that OCaml is already being adopted by scientists of all disciplines.

1.1 Good programming style

Regardless of the choice of language, some simple, generic guidelines can be productively adhered to. We shall now examine the most relevant such guidelines in the context of scientific computing:

Avoid premature optimisation Programs should be written correctly first and optimised last.

Structure programs Complicated programs should be hierarchically decomposed into progressively smaller, constituent components.

Factor programs Complicated or common operations should be factored out into separate functions.

Explicit interfaces Interfaces should always be made as explicit as possible.

Avoid magic numbers Numeric constants should be defined once and referred back to, rather than explicitly "hard-coding" their value multiple times at different places in a program.
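A few of these guidelines can be illustrated with a very small program. The sketch below is not taken from the book: the names `pi`, `sqr`, `circle_area` and `cylinder_volume` are chosen purely for illustration.

```ocaml
(* Avoid magic numbers: define the constant once and refer back to it. *)
let pi = 4. *. atan 1.

(* Factor programs: a common operation lives in one small function. *)
let sqr x = x *. x

(* Structure programs: larger results are composed from smaller parts. *)
let circle_area r = pi *. sqr r
let cylinder_volume r h = circle_area r *. h

let () =
  Printf.printf "area = %g, volume = %g\n"
    (circle_area 1.0) (cylinder_volume 1.0 2.0)
```

If the value of the constant or the definition of the area ever needs to change, only one line has to be edited, which is the practical benefit of the last three guidelines.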

We shall now examine some of the ways OCaml can help in enforcing these guidelines and how the OCaml compiler can exploit well-designed code.

1.2 A Brief History of OCaml

The Meta-Language (ML) was originally developed at Edinburgh University in the 1970s as a language designed to efficiently represent other languages. The language was pioneered by Robin Milner for the Logic of Computable Functions (LCF) theorem prover. The original ML, and its derivatives, were designed to stretch theoretical computer science to the limit, yielding remarkably robust and concise programming languages which can also be very efficient.

The Categorical Abstract Machine Language (CAML) was the acronym originally used to describe what is now known as the Caml family of languages. Gérard Huet designed and implemented Caml at Institut National de Recherche en Informatique et en Automatique (INRIA) in France, until 1994. Since then, development has continued as part of projet Cristal, now led by Xavier Leroy.

Objective Caml (OCaml¹) is the current flagship language of projet Cristal. The Cristal group have produced freely available tools for this language. Most notably, these are an interpreter which runs OCaml code in a virtual machine (VM) and two compilers, one which compiles OCaml to a machine independent byte-code which can then be executed by a byte-code interpreter and another which compiles OCaml directly to native code. At the time of writing, the native-code compiler is capable of producing code for Alpha, Sparc, x86, MIPS, HPPA, PowerPC, ARM, ia64 and x86-64 CPUs and the associated run-time environment has been ported to the Linux, Windows, MacOS X, BSD, Solaris, HPUX, IRIX and Tru64 operating systems.

¹ Pronounced oh-camel.

1.3 Benefits of OCaml

Before delving into the syntax of the language itself, we shall list the main, advantageous features offered by the OCaml language:

Safety OCaml programs are thoroughly checked at compile-time such that they are proven to be entirely safe to run, e.g. a compiled OCaml program cannot segfault.

Functional Functions may be nested, passed as arguments to other functions and stored in data structures as values.

Strongly typed The types of all values are checked during compilation to ensure that they are well defined and validly used.

Statically typed Any typing errors in a program are picked up at compile-time by the compiler, instead of at run-time as in many other languages.

Type inference The types of values are automatically inferred during compilation by the context in which they occur. Therefore, the types of variables and functions in OCaml code do not need to be specified explicitly, dramatically reducing source code size.

Polymorphism In cases where any of several different types may be valid, any such type can be used. This greatly simplifies the writing of generic, reusable code.

Pattern matching Values, particularly the contents of data structures, can be matched against arbitrarily-complicated patterns in order to determine the appropriate action.

Modules Programs can be structured by grouping their data structures and related functions into modules.

Objects Data structures and related functions can also be grouped into objects (object-oriented programming).

Separate compilation Source files can be compiled separately into object files which are then linked together to form an executable. When linking, object files are automatically type checked and optimized before the final executable is created.
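Some of these features are easiest to appreciate in a short example. The following sketch, whose function names (`norm`, `length`, `apply_all`) are hypothetical, touches on type inference, polymorphism, pattern matching and the use of functions as values; the syntax it relies on is introduced in section 1.5.

```ocaml
(* Type inference: no annotations are written, yet the compiler infers
   norm : float * float -> float from the use of *. and +. *)
let norm (x, y) = sqrt (x *. x +. y *. y)

(* Polymorphism and pattern matching: length works on lists of any
   element type, its inferred type being 'a list -> int. *)
let rec length = function
  | [] -> 0
  | _ :: t -> 1 + length t

(* Functional: functions are stored in a list and applied in turn. *)
let apply_all fs x = List.map (fun f -> f x) fs

let () =
  assert (norm (3.0, 4.0) = 5.0);
  assert (length [1; 2; 3] = 3);
  assert (length ["a"; "b"] = 2);
  assert (apply_all [sqrt; (fun x -> x +. 1.0)] 4.0 = [2.0; 5.0])
```

Passing an argument of the wrong type, for example `norm (3, 4)` with integers, is rejected by the compiler rather than failing when the program runs, which is what static typing means in practice.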

1.4 Running OCaml programs

OCaml provides three different ways to execute code. We shall now examine each of these three approaches, explaining how code can be executed using them and noting their relative advantages and disadvantages.

1.4.1 Top-level

The OCaml top-level interactively interprets OCaml code and is started by running the program ocaml:

$ ocaml
        Objective Caml version 3.08.0

#

OCaml code may then be entered at this # prompt, the end of which is delimited by ;;. For example, the following calculates 1 + 3 = 4:

# 1 + 3;;
- : int = 4
#

The top-level will also print the type of the result as well as its value (when the result has a value). For example, the following defines a variable called sqr which is a function:

# let sqr x = x *. x;;
val sqr : float -> float = <fun>
#

This response indicates that a function called sqr has been defined which accepts a float and returns a float. In general, the response of the top-level is either of the form:

- : type = value

or consisting of one or more descriptions of the form:

val name : type = value

where - indicates that a value has been returned but was not bound to a variable name, name is the name of a variable which has been bound, type is the type of the value and value is the value itself. Values are described explicitly for many data structures, such as 4 in the former case, but several other kinds of value are simply classified, such as <fun> to indicate that the value is a function in the latter case².
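To make the two response forms concrete, here are a few more definitions written as they would appear in a source file, with the top-level's response to each noted in a comment. The names `n`, `pair`, `a` and `b` are chosen only for this illustration; at the # prompt each phrase would additionally be terminated by ;;.

```ocaml
(* A bound integer value is described explicitly. *)
let n = 1 + 3            (* val n : int = 4 *)

(* A function value is only classified, not printed. *)
let sqr x = x *. x       (* val sqr : float -> float = <fun> *)

(* A compound value is printed together with its compound type. *)
let pair = (n, sqr 1.5)  (* val pair : int * float = (4, 2.25) *)

(* One phrase may bind several names, giving several descriptions:
   val a : int = 1
   val b : float = 2. *)
let a, b = (1, 2.0)

let () = assert (n = 4 && pair = (4, 2.25) && a = 1 && b = 2.0)
```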

Programs entered into the top-level execute almost as quickly as byte-code compiled programs (which is often quite a bit slower than native-code compiled programs). However, the interactivity of the top-level makes testing the validity of code segments much easier.

In the remainder of this book, we shall write numerous code snippets in this style, as if they had been entered into the top-level.

1.4.2 Byte-code compilation

When stored in a plain text file with the suffix ".ml", an OCaml program can be compiled toa machine independent byte-code using the ocamlc compiler. For example, for a file "test.mI"containing the code:

let _ = print_endline "Hello world!"

This file may be compiled at the Unix shell $ prompt into a byte-code executable called "test":

$ ocamlc test.ml -0 test

and then executed:

$ ./testHello world!

In this case, the result was to print the string "Hello world!" onto the screen. Byte-codecompilation is an adequate way to execute OCaml programs which do not perform intensivecomputations. If the time taken to execute a program needs to be reduced then native-codecompilation can be used instead.

2 Abstract types are denoted <abstr>, as we shall see in chapter 2.


1.4.3 Native-code compilation


The "test.ml" program could equivalently have been compiled to native code, creating a standalone, native-code executable called "test", using:

$ ocamlopt test.ml -o test

The resulting executable runs in exactly the same way:

$ ./test
Hello world!

Programs compiled to native code can be considerably faster to execute than their byte-code equivalents, particularly when they are numerically intensive.

1.5 OCaml syntax

Before we consider the features offered by OCaml, a brief overview of the syntax of the language is instructive, so that we can provide actual code examples later. Other books give more systematic, thorough and formal introductions to the whole of the OCaml language [1].

1.5.1 Language overview

In this section we shall evolve the notions of values, types, variables, functions, simple containers (lists and arrays) and program flow control. These notions will then be used to introduce more advanced features in the later sections of this chapter.

When presented with a block of code, even the most seasoned and fluent programmer may be unable to infer its purpose. Consequently, programs should contain additional descriptions written in plain English, known as comments. In OCaml, comments are enclosed between (* and *). They may be nested, i.e. (* (* ... *) *) is a valid comment. Comments are treated as whitespace, i.e. a (* ... *) b is understood to mean a b rather than ab.

Just as numbers can be defined to be members of sets such as integer ($\in \mathbb{Z}$), real ($\in \mathbb{R}$), complex ($\in \mathbb{C}$) and so on, so values in programs are also defined to be members of sets. These sets are known as types.

1.5.1.1 Types

Fundamentally, languages provide basic types and, often, allow more sophisticated types to be defined in terms of the basic types. OCaml provides a number of built-in types: unit, int, float, char, string and bool. We shall examine these built-in types before discussing the compound tuple, record and variant types.

Only one value is of type unit. This value is written () and, therefore, conveys no information. It is used to implement functions which require no input or expressions which return no value. For example, a new line can be printed by calling the print_newline function as print_newline (). This function requires no input, so it accepts a single argument () of type unit, and returns the value () of type unit.

Integers are written -2, -1, 0, 1 and 2. Floating-point numbers are written with a decimal point: -2., -1., -0.5, 0., 0.5, 1. and 2. are all of type float. For example:

# 3;;
- : int = 3
# 5.;;
- : float = 5.

Arithmetic can be performed using the conventional +, -, *, / and mod binary infix³ operators over the integers⁴. For example, the following expression is evaluated according to the usual mathematical convention regarding operator precedence, with multiplication taking precedence over addition:

# 1 * 2 + 2 * 3;;
- : int = 8

The floating-point infix functions have slightly different names, suffixed by a full stop: +., -., *. and /., as well as ** (raise to the power). For example, the following calculates $(1 \times 2) + (2 \times 3) = 8$:

# 1. *. 2. +. 2. *. 3.;;
- : float = 8.

The distinct names of the operators for different types arise as the most elegant solution to allowing the unambiguous inference of types in the presence of different forms of arithmetic. The definition of new operators is discussed later, in section A.3. In order to perform arithmetic using mixed types, functions such as float_of_int can be used to convert between types.

Unlike many other languages, OCaml is phenomenally pedantic about types. For example, the following fails because * denotes the multiplication of a pair of integers and cannot, therefore, be applied to a value of type float:

# 2 * 2.;;
This expression has type float but is here used with type int

Note that the OCaml top-level underlines the erroneous portion of the code, in this case the floating-point literal 2. on the right of the *.

Explicitly converting the value of type float to a value of type int using the built-in function int_of_float results in a valid expression which the top-level will execute:

# 2 * (int_of_float 2.);;
- : int = 4
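The conversions described above can be combined in ordinary definitions. The following is a minimal sketch, in which the names mean and truncated_mean are hypothetical helpers introduced only for illustration:

```ocaml
(* Hypothetical helper: the mean of two integers, returned as a float.
   Each int is converted explicitly because OCaml never converts implicitly. *)
let mean (i : int) (j : int) : float =
  (float_of_int i +. float_of_int j) /. 2.

(* Hypothetical helper: converting back with int_of_float truncates,
   discarding the fractional part. *)
let truncated_mean i j = int_of_float (mean i j)
```

Note that the floating-point operators +. and /. are used once the operands have been converted, so every sub-expression has a single, unambiguous type.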

³ An infix function is a function which is called with its name and arguments in a non-standard order. For example, the arguments i and j of the conventional addition operator + appear on either side of it: i + j.

⁴ As well as the bit-wise binary infix operators lsl, lsr, asr, land, lor and lxor described in the manual [2].


Arithmetic is typically performed using a single number representation (e.g. either int or float) and conversions between representations are, therefore, comparatively rare.

Single characters (of type char) are written in single quotes, e.g. 'a', which may also be written using a 3-digit decimal code, e.g. '\097'.

Strings are written in double quotes, e.g. "Hello World!". Characters in a string s of length n may be extracted using the notation s.[i] for $i \in \{0, \ldots, n-1\}$. For example, the fifth character in this string is 'o':

# "Hello world!".[4];;
- : char = 'o'

The character at index i in a string s may be set to c using the notation s.[i] <- c.

A pair of strings may be concatenated using the ^ operator⁵:

# "Hello " ^ "world!";;
- : string = "Hello world!"

Booleans are either true or false. Booleans are created by the usual comparison functions =, <> (not equal to), <, >, <=, >=. These functions are polymorphic, meaning they may be applied to pairs of values of the same type for any type⁶. The usual, short-circuit-evaluated⁷ logical operators && and || are also present. For example, the following expression tests that one is less than three and 2.5 is less than 2.7:

# 1 < 3 && 2.5 < 2.7;;
- : bool = true
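The string and boolean operations described above can be sketched together as follows. The names sentence, fifth and divides are hypothetical and chosen only for this illustration:

```ocaml
(* Join a list of words with single spaces using String.concat. *)
let sentence = String.concat " " ["Hello"; "world!"]

(* Extract the character at index 4, i.e. the fifth character. *)
let fifth = sentence.[4]

(* Short-circuit evaluation: when d = 0 the left-hand test is false,
   so n mod d is never evaluated and no division by zero occurs. *)
let divides d n = d <> 0 && n mod d = 0
```

The divides function relies on && evaluating its right-hand operand only when the left-hand operand is true.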

Values may be assigned, or bound, to names. As OCaml is a functional language, these values may themselves be functions, i.e. expressions mapping values to values. We shall now examine the binding of values and expressions to variable and function names.

1.5.1.2 Variables and functions

Variables and functions are both defined using the let construct and must be given names beginning with lower-case letters⁸. For example, the following defines a variable called a to have the value 2:

# let a = 2;;
val a : int = 2

⁵ A list of strings may be concatenated more efficiently than by repeated application of the ^ operator by using the String.concat function.

⁶ Any attempt to evaluate a comparison function over a value which has the type of a function raises an Invalid_argument exception at run-time.

⁷ Short-circuit evaluation refers to the premature escaping of a sequence of operations (in this case, boolean comparisons). For example, the expression false && expr need not evaluate expr as the result of the whole expression is necessarily false due to the preceding false.

⁸ In particular, names may include the ' character which provides an easy way to denote derivative functions, as we shall see at the end of this chapter.


Note that the language automatically infers types. In this case, a has been inferred to be of type int.

Local definitions can be made with let using the syntax:

let name = expr1 in expr2

This evaluates expr1 and binds the result to the variable name before evaluating expr2. For example, the following evaluates $a^2$ in the context $a = 3$, giving 9:

# let a = 3 in a * a;;
- : int = 9

Note that the value 3 bound to the variable a in this example was local to the expression a * a and, therefore, the global definition of a is still 2:

# a;;
- : int = 2

More recent definitions shadow previous definitions. For example, the following supersedes the definition $a = 2$ with $a = a \times a$ in order to calculate $2 \times 2 \times 2 \times 2 = 16$:

# let a = 2 in
  let a = a * a in
  a * a;;
- : int = 16

As OCaml is a functional language, values can be functions and variables can be bound to them in exactly the same way as we have just seen. Specifically, function definitions append a list of arguments between the name of the function and the = in the let construct. For example, a function called sqr which accepts an argument called x and returns x * x may be defined as:

# let sqr x = x * x;;
val sqr : int -> int = <fun>

In this case, the use of the integer multiply * results in OCaml correctly inferring the type of sqr to be int -> int, i.e. the sqr function accepts a value of type int and returns a value of type int.

The function sqr may then be applied to an integer as:

# sqr 5;;
- : int = 25
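Function definitions and local let bindings combine naturally. As a minimal sketch, the following hypothetical function hypotenuse (a name not used elsewhere in this book) takes two arguments and uses local definitions for its intermediate results:

```ocaml
(* Hypothetical example: length of the hypotenuse of a right triangle
   with sides a and b. *)
let hypotenuse a b =
  (* These local definitions are visible only inside the body. *)
  let a2 = a *. a in
  let b2 = b *. b in
  sqrt (a2 +. b2)
```

Because the floating-point operators *. and +. are used, the type is inferred to be float -> float -> float without any annotations.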

Typically, more sophisticated computations require the use of more complicated types. We shall now examine the three simplest ways by which more complicated types may be constructed.


1.5.1.3 Tuples, records and variants


Tuples are the simplest form of compound types, containing a fixed number of values which may be of different types. The type of a tuple is written analogously to conventional set-theoretic style, using * to denote the cartesian product between the sets of possible values for each type. For example, a tuple of three integers, conventionally denoted by the triple $(i, j, k) \in \mathbb{Z} \times \mathbb{Z} \times \mathbb{Z}$, can be represented by values (i, j, k) of the type int * int * int. When written, tuple values are comma-separated and enclosed in parentheses. For example, the following tuple contains three different values of type int:

# (1, 2, 3);;
- : int * int * int = (1, 2, 3)

A tuple containing n values is described as an n-tuple, e.g. the tuple (1, 2, 3) is a 3-tuple.

Records are essentially tuples with named components, known as fields. Records and, in particular, the names of their fields must be defined using a type construct before they can be used. Each record field is declared as name : type, where name is the name of the field (which must start with a lower-case letter) and type is the type of values in that field. The fields are semicolon-separated and enclosed in curly braces. For example, a record containing the x and y components of a 2D vector could be defined as:

# type vec2 = { x:float; y:float };;
type vec2 = { x : float; y : float; }

A value of this type representing the zero vector can then be defined using:

# let zero = { x=0.; y=0. };;
val zero : vec2 = {x = 0.; y = 0.}

Note that the use of a record with fields x and y allowed OCaml to infer the type of zero as vec2.

Whereas tuples are order-dependent, i.e. $(1, 2) \neq (2, 1)$, the named fields of a record may appear in any order, i.e. $\{x = 1, y = 2\} \equiv \{y = 2, x = 1\}$. Thus, we could, equivalently, have provided the x and y fields in reverse order:

# let zero = { y=0.; x=0. };;
val zero : vec2 = {x = 0.; y = 0.}

The fields in this record can be extracted individually using the notation record.field where record is the name of the record and field is the name of the field within that record. For example, the x field in the variable zero is 0:

# zero.x;;
- : float = 0.

Also, a shorthand notation using the with keyword exists for the creation of a new record from an existing record with one or more fields replaced. This is particularly useful when records contain many fields. For example, the record {x=1.; y=0.} may be obtained by replacing the field x in the variable zero with 1:


# let x_axis = { zero with x=1. };;
val x_axis : vec2 = {x = 1.; y = 0.}
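Field access and the with notation are enough to write simple vector functions. The following sketch repeats the vec2 type so that it is self-contained; the function names add and reflect are hypothetical:

```ocaml
type vec2 = { x : float; y : float }

(* Hypothetical helper: component-wise addition of two vectors,
   reading each field with the record.field notation. *)
let add a b = { x = a.x +. b.x; y = a.y +. b.y }

(* Hypothetical helper: reflect a vector in the x axis by copying it
   and replacing only the y field. *)
let reflect v = { v with y = -. v.y }
```

Both functions return new records and leave their arguments unchanged.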


Although OCaml is a functional language, it does support imperative programming. Fundamentally, record fields can be marked as mutable, in which case their value may be changed. For example, the type of a mutable, two-dimensional vector called vec2 may be defined as:

# type vec2 = { mutable x : float; mutable y : float };;
type vec2 = { mutable x : float; mutable y : float; }

A value r of this type may be defined:

# let r = { x=1.; y=2. };;
val r : vec2 = {x = 1.; y = 2.}

The x-coordinate of the vector r may be altered in-place using an imperative style:

# r.x <- 3.;;
- : unit = ()

The side-effect of this expression has mutated the value of the variable r, the x-coordinate of which is now 3 instead of 1:

# r;;
- : vec2 = {x = 3.; y = 2.}

A record with a single, mutable field is often useful. This data structure, called a reference, is already provided by the type ref. For example, the following defines a variable named a which is a reference to the integer 2:

# let a = ref 2;;
val a : int ref = {contents = 2}

The type of a is then int ref. The value referred to by a may be obtained using !a:

# !a;;
- : int = 2

The value of a may be set using :=:

# a := 3;;
- : unit = ()
# a;;
- : int ref = {contents = 3}

In the case of references to integers, two additional functions are provided, incr and decr, which increment and decrement references to integers, respectively:


# incr a;;
- : unit = ()
# a;;
- : int ref = {contents = 4}


The types of values stored in tuples and records are defined at compile-time. OCaml completely verifies the correct use of these types at compile-time. However, this is too restrictive in many circumstances. These requirements can be slightly relaxed by allowing a type to be defined which can acquire one of several possible types at run-time. These are known as variant types. OCaml still verifies the correct use of variant types as far as is theoretically possible.

Variant types are defined using the type construct with the possible constituent types referred to by constructors (the names of which must begin with upper-case letters) separated by the | character. For example, a variant type named button which may adopt the values On or Off may be written:

# type button = On | Off;;
type button = On | Off

The constructors On and Off may then be used as values of type button:

# On;;
- : button = On
# Off;;
- : button = Off

In this case, the constructors On and Off convey no information in themselves (i.e. like the type unit, On and Off do not carry data) but the choice of On or Off does convey information. Note that both expressions were correctly inferred to have results of type button.

More usefully, constructors may take arguments, allowing them to convey information by carrying data. The arguments are defined using of and are written in the same form as that of a tuple. For example, a replacement button type which provides an On constructor accepting two arguments may be written:

# type button = On of int * string | Off;;
type button = On of int * string | Off

The On constructor may then be used to create values of type button by appending the argument in the style of a tuple:

# On (1, "mine");;
- : button = On (1, "mine")
# On (2, "hers");;
- : button = On (2, "hers")
# Off;;
- : button = Off

Types can also be defined recursively, which is very useful when defining more sophisticated data structures, such as trees. For example, a binary tree contains either zero or two binary trees and can be defined as:

# type binary_tree = Leaf | Node of binary_tree * binary_tree;;
type binary_tree = Leaf | Node of binary_tree * binary_tree

A value of type binary_tree may be written in terms of these constructors:

# Node (Node (Leaf, Leaf), Leaf);;
- : binary_tree = Node (Node (Leaf, Leaf), Leaf)

Of course, we could also place data in the nodes to make a more useful data structure. This line of thinking will be pursued in chapter 3. In the mean time, let us consider two special data structures which have notations built into the language.

1.5.1.4 Lists and arrays

Lists are written [a; b; c] and arrays are written [|a; b; c|]. As we shall see in chapter 3, lists and arrays have different merits.

The types of lists and arrays of integers, for example, are written int list and int array, respectively:

# [1; 2; 3];;
- : int list = [1; 2; 3]
# [|1; 2; 3|];;
- : int array = [|1; 2; 3|]

In the case of lists, the infix cons operator :: provides a simple way to prepend an element to the front of a list. For example, prepending 1 onto the list [2; 3] gives the list [1; 2; 3]:

# 1 :: [2; 3];;
- : int list = [1; 2; 3]

In the case of arrays, the notation array.(i) may be used to extract the (i+1)th element. For example, [|1; 2; 3|].(1) gives the second element 2:

# [|1; 2; 3|].(1);;
- : int = 2

Also, a short-hand notation can be used to represent lists or arrays of tuples by omitting the parentheses. For example, [(a, b); (c, d)] may be written [a, b; c, d]:

# [1, 2; 3, 4];;
- : (int * int) list = [(1, 2); (3, 4)]
# [|1, 2; 3, 4|];;
- : (int * int) array = [|(1, 2); (3, 4)|]
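The list and array notations above can be sketched in a few definitions. The names primes, squares, third and pairs are hypothetical and serve only as illustrations:

```ocaml
(* Repeated use of :: builds a list from the front. *)
let primes = 2 :: 3 :: 5 :: [7]

(* Arrays are indexed from zero, so .(2) extracts the third element. *)
let squares = [|0; 1; 4; 9|]
let third = squares.(2)

(* A list of 2-tuples written without the inner parentheses. *)
let pairs = [1, "one"; 2, "two"]
```

Here pairs has the type (int * string) list, since the elements of a tuple may be of different types whereas all elements of a list must share one type.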

The use and properties of lists, arrays and several other data structures will be discussed in chapter 3. In the mean time, we shall examine programming constructs which allow more interesting computations to be performed.


1.5.1.5 If


Like many other programming languages, OCaml provides an if construct which allows a boolean "predicate" expression to determine which of two expressions is evaluated and returned, as well as a special if construct which optionally evaluates an expression of type unit:

if expr1 then expr2
if expr1 then expr2 else expr3

In both cases, expr1 must evaluate to a value of type bool. In the former case, expr2 is expected to evaluate to the value () of type unit. In the latter case, both expr2 and expr3 must evaluate to values of the same type.

The former evaluates the boolean expression expr1 and, only if the result is true, evaluates the expression expr2. Thus, the former is equivalent to:

if expr1 then expr2 else ()

The latter similarly evaluates expr1 but returns the result of either expr2, if expr1 evaluated to true, or of expr3 otherwise.

For example, the following function prints "Less than three" if the given argument is less than three:

# let f x = if x < 3 then print_endline "Less than three";;
val f : int -> unit = <fun>
# f 1;;
Less than three
- : unit = ()
# f 5;;
- : unit = ()

The following function returns the string "Less" if the argument is less than 3 and "Greater" otherwise:

# let f x = if x < 3 then "Less" else "Greater";;
val f : int -> string = <fun>
# f 1;;
- : string = "Less"
# f 5;;
- : string = "Greater"
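As if ... then ... else is itself an expression, it may be chained to choose between more than two results. The following is a minimal sketch; the function name clamp is hypothetical:

```ocaml
(* Hypothetical helper: restrict x to the interval [lo, hi].
   Every branch returns a value of the same type as the arguments. *)
let clamp lo hi x =
  if x < lo then lo
  else if x > hi then hi
  else x
```

Because the comparison functions are polymorphic, this definition applies equally to integers and floating-point numbers, provided all three arguments have the same type.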

The parts of the language we have covered can already be used to write some interesting programs. However, attention should be paid to the way in which programs are constructed from these parts.


1.5.1.6 Program composition

As we have seen, program segments may be written in the top-level which replies by reciting the automatically inferred types and executing expressions. However, the ;; used to force the top-level into producing output is not necessary in programs compiled with ocamlc and ocamlopt. For example, the two previous functions can be defined simultaneously, with only a single ;; at the end:

# let f1 x = if x < 3 then print_endline "Less than three"
  let f2 x = if x < 3 then "Less" else "Greater";;
val f1 : int -> unit = <fun>
val f2 : int -> string = <fun>

Note that OCaml has determined that this input corresponds to two separate function definitions. In fact, when written for the ocamlc or ocamlopt compilers, programs can be written entirely without ;;, such as:

let f1 x = if x < 3 then print_endline "Less than three"
let f2 x = if x < 3 then "Less" else "Greater"

As we have seen, expressions which act by way of a side-effect (such as printing) produce the value () of type unit. Many situations require a sequence of such expressions to be evaluated. Expressions of type unit may be concatenated into a compound expression by using the ; separator. For example, a function to print "A", "B" and then "C" on three separate lines could be written:

# let f () =
    print_endline "A";
    print_endline "B";
    print_endline "C";;
val f : unit -> unit = <fun>

Note that there is no final ;, only the delimiting ;;, so the value () of type unit produced by the final call to print_endline is returned by our f function.
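Sequencing with ; is most useful in combination with the references of section 1.5.1.3. As a minimal sketch, the following hypothetical counter (the names counter and next are illustrative only) performs a side-effect and then returns a value:

```ocaml
(* A hypothetical counter held in a reference to an integer. *)
let counter = ref 0

(* Each call first increments the counter, then returns its new value.
   The expression before the ; has type unit; the value of the whole
   sequence is that of the final expression. *)
let next () =
  incr counter;
  !counter
```

Only the expressions before the final one need have type unit, so next has the type unit -> int.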

1.5.1.7 More about functions

Functions can also be defined anonymously, known as λ-abstraction in computer science. For example, the following defines a function $f(x) = x \times x$ which has a type representing⁹ $f : \mathbb{Z} \to \mathbb{Z}$:

# fun x -> x * x;;
- : int -> int = <fun>

This is an anonymous equivalent to the sqr function defined earlier. The type of this expression is also inferred to be int -> int. This anonymous function can be called as if it were the name of a conventional function. For example, applying the function f to the value 2 gives $2 \times 2 = 4$:

⁹ We say "representing" because the OCaml type int is, in fact, a finite subset of $\mathbb{Z}$, as we shall see in chapter 4.


# (fun x -> x * x) 2;;
- : int = 4

Consequently, we could have defined sqr equivalently as:

# let sqr = fun x -> x * x;;
val sqr : int -> int = <fun>

Once defined, this version of the sqr function is indistinguishable from the original.

The let ... in construct allows definitions to be nested, including function definitions. For example, the following function ipow3 raises a given int to the power three using a sqr function nested within the body of the ipow3 function:

# let ipow3 x =
    let sqr x = x * x in
    x * sqr x;;
val ipow3 : int -> int = <fun>

Note that the function application sqr x takes precedence over the multiplication.

The let construct may also be used to define the elements of a tuple simultaneously. For example, the following defines two variables, a and b, simultaneously:

# let (a, b) = (3, 4);;
val a : int = 3
val b : int = 4
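Binding a tuple with let is the usual way to receive several results from one function. A minimal sketch follows, in which the function name divmod is hypothetical:

```ocaml
(* Hypothetical helper: return the quotient and the remainder of an
   integer division together, as a 2-tuple. *)
let divmod n d = (n / d, n mod d)

(* Bind both components of the result at once. *)
let (q, r) = divmod 17 5
```

In this case q is bound to 3 and r to 2, as $17 = 3 \times 5 + 2$.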

This is particularly useful when factoring code. For example, the following definition of the ipow4 function contains an implementation of the sqr function which is identical to that in our previous definition of the ipow3 function:

# let ipow4 x =
    let sqr x = x * x in
    (sqr x) * (sqr x);;
val ipow4 : int -> int = <fun>


Just as common subexpressions in a mathematical expression can be factored, so the ipow3 and ipow4 functions can be factored by sharing a common sqr function and returning the ipow3 and ipow4 functions simultaneously in a 2-tuple:

# let (ipow3, ipow4) =
    let sqr x = x * x in
    ((fun x -> x * (sqr x)), fun x -> (sqr x) * (sqr x));;
val ipow3 : int -> int = <fun>
val ipow4 : int -> int = <fun>

Factoring code is an important way to keep programs manageable. In particular, programs can be factored much more aggressively in the presence of higher-order functions, something which can be done in OCaml but not Java, C++ or Fortran. We shall discuss such factoring of OCaml programs as a means of code structuring in chapter 2. In the mean time, we shall examine functions which perform computations by applying themselves.
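As a brief illustration of the higher-order factoring mentioned above, the following sketch defines a function which accepts another function as an argument. The name twice is hypothetical, and this is only one of many possible ways to factor such code:

```ocaml
(* Hypothetical higher-order function: apply f twice to x. *)
let twice f x = f (f x)

let sqr x = x * x

(* The fourth power is the square of the square, so ipow4 can be
   obtained by partially applying twice to sqr. *)
let ipow4 = twice sqr
```

The same twice function may be reused with any function whose argument and result types agree, e.g. twice (fun s -> s ^ "!") "hi" appends two exclamation marks.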

As we have already seen, variable names in variable and function definitions refer to their previously defined values. This default behaviour can be overridden using the rec keyword, which allows a variable definition to refer to itself. This is necessary to define a recursive function¹⁰. For example, the following implementation of the ipow function, which computes $n^m$ for $n, m \geq 0 \in \mathbb{Z}$, calls itself recursively with smaller $m$ to build up the result until the base-case $n^m = 1$ for $m = 0$ is reached:

# let rec ipow n m = if m = 0 then 1 else n * ipow n (m - 1);;
val ipow : int -> int -> int = <fun>

For example, $2^{16} = 65{,}536$:

# ipow 2 16;;
- : int = 65536
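The ipow function above performs $m$ multiplications. As a hedged sketch of an alternative, the following hypothetical function ipow_fast uses repeated squaring, assuming $m \geq 0$ as before, and needs only a number of recursive calls proportional to $\log m$:

```ocaml
(* Hypothetical variant of ipow: exponentiation by repeated squaring.
   Assumes m >= 0. Uses n^m = (n^(m/2))^2 for even m and
   n^m = n * (n^(m/2))^2 for odd m, where m/2 is integer division. *)
let rec ipow_fast n m =
  if m = 0 then 1
  else
    let half = ipow_fast n (m / 2) in
    if m mod 2 = 0 then half * half else n * half * half
```

The local definition of half ensures that the recursive call is made only once per level of recursion.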

All of the programming constructs we have just introduced may be structured into modules.

1.5.1.8 Modules

In OCaml, modules are the most commonly used means to encapsulate related definitions. For example, many function definitions pertaining to lists are encapsulated in the List module. Visible definitions in modules may be referred to by the notation module.name where module is the name of the module and name is the name of the type or variable definition. For example, the List module contains a function length which returns the number of elements in the given list:

# List.length ["one"; "two"; "three"];;
- : int = 3
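Other definitions in the List and String modules are referred to in the same way. The following sketch uses only functions from the standard library; the variable names are hypothetical:

```ocaml
let words = ["one"; "two"; "three"]

(* The number of elements in the list. *)
let n = List.length words

(* The same elements in reverse order. *)
let reversed = List.rev words

(* Apply String.length to every element, giving a list of lengths. *)
let lengths = List.map String.length words
```

Note that String.length is itself passed as an argument to List.map, so the module.name notation may be used wherever a value is expected.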

The Pervasives module contains many common definitions, such as sqrt, and is automatically opened before a program starts so these definitions are available immediately.

The OCaml module system and program structuring in general are examined in chapter 2. We shall now examine some of the more advanced features of OCaml in more detail.

1.5.2 Pattern matching

As a program is executed, it is quite often necessary to choose the future course of action based upon the value of a previously computed result. As we have already seen, a two-way choice can be implemented using the if construct. However, the ability to choose from several different possible actions is often desirable. Although such cases can be reduced to a series of if tests, languages typically provide a more general construct to compare a result with several different possibilities more succinctly, more clearly and sometimes more efficiently than manually nested ifs. In Fortran, this is the SELECT CASE construct. In C and C++, it is the switch case construct.

¹⁰ A recursive function is a function which calls itself, possibly via other functions.

Unlike conventional languages, OCaml allows the value of a previous result to be compared against various patterns, a technique known as pattern matching. As we shall see, this approach is considerably more powerful than the conventional approaches.

The most common pattern matching construct in OCaml is the match ... with ... expression:

match expr with
  pattern1 -> expr1
| pattern2 -> expr2
| pattern3 -> expr3

This evaluates expr and compares the resulting value firstly with pattern1 then with pattern2 and so on, until a pattern is found which matches the value of expr, in which case the corresponding expression (e.g. expr2) is evaluated and returned. A pattern is an expression composed of constants and variable names. When a pattern matches a value, the variables in the pattern are bound to the corresponding parts of that value.

Patterns may contain arbitrary data structures (tuples, records, variant types, lists and arrays) and, in particular, the cons operator :: may be used in a pattern to decapitate a list. Also, the pattern _ matches any value without assigning a name to it. This is useful for clarifying that part of a pattern is not referred to in the corresponding expression.

For example, the following function compares its argument with several possible patterns of type int, returning the expression of type string corresponding to the pattern which matches:

# let f i = match i with
    0 -> "Zero"
  | 3 -> "Three"
  | _ -> "Neither zero nor three";;
val f : int -> string = <fun>

Applying this function to some expressions of type int demonstrates the functionality of the match construct:

# f 0;;
- : string = "Zero"
# f 1;;
- : string = "Neither zero nor three"
# f (1 + 2);;
- : string = "Three"
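Patterns over data structures bind variables to the parts of the value being matched. As a minimal sketch, the following hypothetical functions sum and leaves decapitate a list with :: and match over the constructors of the binary_tree type from section 1.5.1.3 (repeated here so that the example is self-contained):

```ocaml
(* Hypothetical helper: sum a list of integers by splitting it into
   its head h and tail t. *)
let rec sum l = match l with
  | [] -> 0
  | h :: t -> h + sum t

(* The binary tree type from section 1.5.1.3. *)
type binary_tree = Leaf | Node of binary_tree * binary_tree

(* Hypothetical helper: count the leaves of a tree by matching on its
   constructors and binding the two subtrees of a node. *)
let rec leaves tree = match tree with
  | Leaf -> 1
  | Node (l, r) -> leaves l + leaves r
```

In both functions the patterns, between them, cover every possible value of the argument type, so no general final pattern is needed.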

As pattern matching is such a fundamental concept in OCaml programming, we shall provide several more examples using pattern matching in this section.

A function is_empty_list which examines a given list and returns true if the list is empty and false if the list contains any elements, may be written without pattern matching by simply testing equality with the empty list:


# let is_empty_list l =
    l = [];;
val is_empty_list : 'a list -> bool = <fun>


Using pattern matching, this example may be written using the match ... with ... construct as:

# let is_empty_list l = match l with
    [] -> true
  | _ -> false;;
val is_empty_list : 'a list -> bool = <fun>

Note the use of the anonymous _ pattern to match any value, in this case accounting for all other possibilities.

The is_empty_list function can also be written using the function ... construct, used to create one-argument λ-functions which are pattern matched over their argument:

# let is_empty_list = function
    [] -> true
  | _ -> false;;
val is_empty_list : 'a list -> bool = <fun>

In general, functions which pattern match over their last argument may be rewritten more succinctly using function. Let us now consider some additional sophistication supported by OCaml's pattern matching.
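As a sketch of a function which pattern matches over its last argument, consider the following hypothetical head_or function, which takes a default value explicitly and the list implicitly via function:

```ocaml
(* Hypothetical helper: return the first element of a list, or the
   given default if the list is empty. The list is the last argument,
   so it can be matched directly by function. *)
let head_or default = function
  | [] -> default
  | h :: _ -> h
```

This is equivalent to writing let head_or default l = match l with ..., but avoids naming the list.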

1.5.2.1 Guarded patterns

Patterns can also have arbitrary tests associated with them, written using the when construct. Such patterns are referred to as guarded patterns and are only allowed to match when the associated boolean expression evaluates to true. For example, the following function evaluates to true only for lists which contain three integers, i, j and k, satisfying the equality $i - j - k = 0$:

# let f = function
    [i; j; k] when i - j - k = 0 -> true
  | _ -> false;;
val f : int list -> bool = <fun>
# f [2; 3];;
- : bool = false
# f [5; 2; 3];;
- : bool = true
# f [1; 2; 3];;
- : bool = false

Subsequent patterns sharing the same variable bindings and corresponding expression may be written in the short-hand notation:

match ... with
  pattern1 | pattern2 | ... -> ...
| ...


For example, the following function returns true if the given integer is in the set $\{-1, 0, 1\}$ and false otherwise:

# let is_sign = function
    -1 | 0 | 1 -> true
  | _ -> false;;
val is_sign : int -> bool = <fun>

The sophistication provided by pattern matching may be misused. Fortunately, the OCaml compilers go to great lengths to enforce correct use, even brashly criticising the programmer's style when appropriate.

1.5.2.2 Erroneous patterns

Sequences of patterns which match to the same corresponding expression are required to share the same set of variable bindings. For example, although the following function makes sense to a human, the OCaml compilers object to the patterns (a, 0.) and (0., b) binding different sets of variables ({a} and {b}, respectively):

# let product a b = match (a, b) with
    (a, 0.) | (0., b) -> 0.
  | (a, b) -> a *. b;;
Variable a must occur on both sides of this | pattern

In this case, this function could be corrected by using the anonymous _ pattern as neither a nor b is used in the first case:

# let product a b = match (a, b) with
    (_, 0.) | (0., _) -> 0.
  | (a, b) -> a *. b;;
val product : float -> float -> float = <fun>

This actually conveys useful information about the code. Specifically, the values matched by _ are not used in the corresponding expression.

OCaml uses type information to determine the possible values of the expression being matched over. If the set of pattern matches fails to cover all of the possible values of the input then, at compile-time, the compiler emits:

Warning: this pattern-matching is not exhaustive

followed by examples of values which could not be matched. If a program containing such pattern matches is executed and no matching pattern is found at run-time then the Match_failure exception is raised. Exceptions will be discussed in section 1.5.3.

For example, in the context of the variant type:

# type int_option = None | Some of int;;
type int_option = None | Some of int


The OCaml compiler will warn of a function matching only Some ... values and neglecting the None value:

# let extract = function Some i -> i;;
Warning: this pattern-matching is not exhaustive.
Here is an example of a value that is not matched:
None
val extract : int_option -> int = <fun>

This extract function then works as expected on Some ... values:

# extract (Some 3);;
- : int = 3

but causes a Match_failure exception to be raised at run-time if a None value is given, as none of the patterns in the pattern match of the extract function match this value:

# extract None;;
Exception: Match_failure ("", 5, -40).
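One way to avoid both the warning and the run-time exception is to handle the None case explicitly. The following is a sketch of one possible approach, in which the function extract_or and its default argument are hypothetical (the int_option type is repeated so that the example is self-contained):

```ocaml
type int_option = None | Some of int

(* Hypothetical exhaustive variant of extract: the caller supplies the
   value to return when there is nothing to extract. *)
let extract_or default = function
  | Some i -> i
  | None -> default
```

As both constructors of int_option are matched, the compiler emits no warning and no Match_failure exception can be raised by this function.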

As some approaches to pattern matching lead to more robust programs, some notions of good and bad programming styles arise in the context of pattern matching.

1.5.2.3 Good style

The compiler cannot provel1 that any given pattern match covers all eventualities in thegeneral case. Thus, some style guidelines may be productively adhered to when writing patternmatches, to aid the compiler in its proofs:

• Guarded patterns should be used only when necessary. In particular, in any givenpattern matching, at least one pattern should be unguarded.

• Unless all eventualities are clearly covered (such as [] and h: : t which, between them,match any list) the last pattern should be general.

As proof generation cannot be automated in general, the OCaml compilers do not try to provethat a sequence of guarded patterns will match all possible inputs. Instead, the programmeris expected to adhere to a good programming style, making the breadth of the final matchexplicit by removing the guard. For example, the OCaml compilers do not prove that thefollowing pattern match covers all possibilities:

# let sign = function
    i when i < 0. -> -1
  | 0. -> 0
  | i when i > 0. -> 1;;
Warning: this pattern-matching is not exhaustive.
Here is an example of a value that is not matched:
1.
(However, some guarded clause may match this value.)
val sign : float -> int = <fun>

¹¹ Indeed, it can be proven that the act of proving cannot be automated in the general case.


In this case, the function should have been written without the guard on the last pattern:

# let sign = function
    i when i < 0. -> -1
  | 0. -> 0
  | _ -> 1;;
val sign : float -> int = <fun>


Also, the OCaml compilers will try to determine any patterns which can never be matched. If such a pattern is found, the compiler will emit a warning. For example, in this case the first match accounts for all possible input values and, therefore, the second match will never be used:

# let product a b = match (a, b) with
    (a, b) -> a *. b
  | (0., _) | (_, 0.) -> 0.;;
Warning: this match case is unused.
val product : float -> float -> float = <fun>
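The warning disappears when the specific patterns are listed before the general one. The following is a minimal sketch of that ordering; the name product_zero_first is hypothetical and chosen only for this illustration:

```ocaml
(* Specific cases first, general case last, so every case is reachable. *)
let product_zero_first a b = match (a, b) with
  | (0., _) | (_, 0.) -> 0.   (* short-circuit when either factor is zero *)
  | (a, b) -> a *. b          (* general case *)
```

Note that the order of the cases now affects the result: product_zero_first 0. infinity gives 0. whereas 0. *. infinity is nan.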

When matching over the constructors of a type, all eventualities should be caught explicitly, i.e. the final pattern should not be made completely general. For example, in the context of a type which can represent different number representations:

# type number = Integer of int | Real of float;;
type number = Integer of int | Real of float

A function to test for equality with zero could be written in the following, poor style:

# let bad_is_zero = function
    Integer 0 -> true
  | Real 0. -> true
  | _ -> false;;
val bad_is_zero : number -> bool = <fun>

When applied to various values of type number, this function correctly acts as a predicate to test for equality with zero:

# bad_is_zero (Integer (-1));;
- : bool = false
# bad_is_zero (Integer 0);;
- : bool = true
# bad_is_zero (Real 0.);;
- : bool = true
# bad_is_zero (Real 2.6);;
- : bool = false

Although the bad_is_zero function works in this case, a better style would be to extract the numerical values from the constructors and test their equality with zero, avoiding the final catch-all case in the pattern match:


# let good_is_zero = function
    Integer i -> i = 0
  | Real x -> x = 0.;;
val good_is_zero : number -> bool = <fun>


Not only is the latter style more concise but, more importantly, this style is more robust. For example, if whilst developing our program, we were to supplement the definition of our number type with a new representation, say of the complex numbers $z = x + iy \in \mathbb{C}$:

# type number = Integer of int | Real of float | Complex of float * float;;
type number = Integer of int | Real of float | Complex of float * float

the bad_is_zero function, which is written in the poor style, would compile without warning to give incorrect functionality:

# let bad_is_zero = function
    Integer 0 -> true
  | Real 0. -> true
  | _ -> false;;
val bad_is_zero : number -> bool = <fun>

Specifically, this function treats all values which are not zero-integers and zero-reals as being non-zero. Thus, zero-complex $z = 0 + 0i$ is incorrectly deemed to be non-zero:

# bad_is_zero (Complex (0., 0.));;
- : bool = false

In contrast, the good_is_zero function, which was written using the good style, would allow the compiler to spot that part of the number type was no longer being accounted for in the pattern match:

# let good_is_zero = function
    Integer i -> i = 0
  | Real x -> x = 0.;;
Warning: this pattern-matching is not exhaustive.
Here is an example of a value that is not matched:
Complex (_, _)
val good_is_zero : number -> bool = <fun>

The programmer could then supplement this function with a case for complex numbers:

# let good_is_zero = function
    Integer i -> i = 0
  | Real x -> x = 0.
  | Complex (x, y) -> x = 0. && y = 0.;;
val good_is_zero : number -> bool = <fun>

The resulting function would then provide the correct functionality:

# good_is_zero (Complex (0., 0.));;
- : bool = true


Clearly, the ability to have such safety checks performed at compile-time can be very valuable. This is another important aspect of safety provided by the OCaml language, which results in considerably more robust programs.
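The same style carries over to any function over the number type. As a hedged illustration, the following sketch defines a hypothetical magnitude function (not part of the original text) with one case per constructor and no catch-all, so that adding a further constructor to the type would again provoke a non-exhaustiveness warning here:

```ocaml
type number = Integer of int | Real of float | Complex of float * float

(* Hypothetical example: the absolute value of a number as a float.
   Each constructor is handled explicitly; there is no "_" case. *)
let magnitude = function
  | Integer i -> float_of_int (abs i)
  | Real x -> abs_float x
  | Complex (x, y) -> sqrt (x *. x +. y *. y)
```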

Due to the ubiquity of pattern matching in OCaml programs, the number and structure of pattern matches can be non-trivial. In particular, patterns may be nested and may be performed in parallel.

1.5.2.4 Nested patterns

In some cases, nested pattern matches may be desirable. Inner pattern matches may be bundled either into parentheses (...) or, equivalently, into a begin ... end construct. When split across multiple lines, the begin ... end construct is the conventional choice. For example, the following function tests equality between two values of type number¹²:

# let number_equal a b = match a with
    Integer i ->
      begin
        match b with
          Integer j when i = j -> true
        | Complex (x, 0.) | Real x when x = float_of_int i -> true
        | Integer _ | Real _ | Complex _ -> false
      end
  | Real x | Complex (x, 0.) ->
      begin
        match b with
          Integer i when x = float_of_int i -> true
        | Complex (y, 0.) | Real y when x = y -> true
        | Integer _ | Real _ | Complex _ -> false
      end
  | Complex (x1, y1) ->
      begin
        match b with
          Complex (x2, y2) when x1 = x2 && y1 = y2 -> true
        | Integer _ | Real _ | Complex _ -> false
      end;;
val number_equal : number -> number -> bool = <fun>

In many cases, nested patterns may be written more succinctly and, in fact, more efficiently, when presented as a single pattern match which matches different values simultaneously.

1.5.2.5 Parallel pattern matching

In many cases, nested pattern matches may be combined into a single pattern match. This functionality is often obtained by combining variables into a tuple which is then matched over. This is known as parallel pattern matching. For example, the previous function could have been written:

¹² Note that the built-in, polymorphic equality = could be used to compare values of type number but this would perform a structural comparison rather than a numerical comparison, e.g. the expression Real 1. = Complex (1., 0.) evaluates to false.


# let number_equal a b = match (a, b) with
    (Integer i, Integer j) -> i = j
  | (Integer i, (Real x | Complex (x, 0.)))
  | ((Real x | Complex (x, 0.)), Integer i) -> x = float_of_int i
  | ((Real x1 | Complex (x1, 0.)), (Complex (x2, 0.) | Real x2)) -> x1 = x2
  | ((Integer _ | Real _), Complex _)
  | (Complex _, (Integer _ | Real _)) -> false
  | (Complex (x1, y1), Complex (x2, y2)) -> x1 = x2 && y1 = y2;;
val number_equal : number -> number -> bool = <fun>
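Parallel pattern matching is not specific to the number type. As a further, hedged illustration, the following sketch merges two sorted lists by matching over both lists at once; the function name merge is an assumption made for this example only:

```ocaml
(* Hypothetical example: merge two lists that are already sorted in
   ascending order, matching on the pair (a, b) in a single match. *)
let rec merge a b = match (a, b) with
  | ([], l) | (l, []) -> l                 (* one list exhausted *)
  | (x :: xs, y :: ys) ->
      if x <= y then x :: merge xs b       (* take the head of a *)
      else y :: merge a ys                 (* take the head of b *)
```

Written with nested matches, the same function would need one inner match per case of the outer list, duplicating the handling of the empty list.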

As a core feature of the OCaml language, pattern matching will be used extensively in the remainder of this book, particularly when dissecting data structures in chapter 3. One remaining form of pattern matching in OCaml programs appears in the handling of exceptions.

1.5.3 Exceptions

In many programming languages, program execution can be interrupted by the raising¹³ of an exception. This is a useful facility, typically used to handle problems such as failing to open a file or an unexpected flow of execution (e.g. due to a program being given invalid input) but exceptions are also useful as an efficient means to escape a computation, as we shall see in section 7.3.3.3.

Like a variant constructor in OCaml, the name of an exception must begin with a capital letter and an exception may or may not carry an associated value. Before an exception can be used, it must be declared. An exception which does not carry associated data may be declared as:

exception Name

An exception which carries associated data of type type may be declared:

exception Name of type

Exceptions are raised using the raise construct. For example, the following raises a built-in exception called Failure which carries a string:

# raise (Failure "My problem");;
Exception: Failure "My problem".

Exceptions may also be caught using the syntax:

try
  expr
with
  pattern1 -> expr1
| pattern2 -> expr2
| pattern3 -> expr3

¹³ Sometimes known as throwing an exception, e.g. in the context of the C++ language.


where expr is evaluated and its result returned if no exception was raised. If an exception was raised then the exception is matched against the patterns and the value of the corresponding expression (if any) is returned instead.

Note that, unlike other pattern matching constructs, patterns matching over exceptions need not account for all eventualities - any uncaught exceptions simply continue to propagate.
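The following sketch illustrates both points under stated assumptions: the exception Negative and the functions checked_sqrt, sqrt_or_zero, divide and propagated are hypothetical names introduced only for this example. The first handler catches an exception carrying data; the second shows an exception which is not listed in the inner handler propagating to an outer one:

```ocaml
(* Hypothetical exception carrying the offending value. *)
exception Negative of float

let checked_sqrt x = if x < 0. then raise (Negative x) else sqrt x

(* Only Negative is handled here; anything else would propagate. *)
let sqrt_or_zero x = try checked_sqrt x with Negative _ -> 0.

let divide a b = a / b   (* raises Division_by_zero when b = 0 *)

(* The inner handler does not mention Division_by_zero, so the
   exception passes through it and is caught by the outer handler. *)
let propagated =
  try
    (try divide 1 0 with Negative _ -> 0)
  with Division_by_zero -> -1
```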

For example, an exception called ZeroLength, which does not carry associated data, may be declared as:

# exception ZeroLength;;
exception ZeroLength

A function to normalise a 2D vector $\mathbf{r} = (x, y)$ to create a unit 2D vector $\hat{\mathbf{r}} = \mathbf{r}/|\mathbf{r}|$, catching the erroneous case of a zero-length vector, may then be written:

# let norm (x, y) =
    let l = sqrt (x *. x +. y *. y) in
    if l = 0. then raise ZeroLength else
    let il = 1. /. l in
    (il *. x, il *. y);;
val norm : float * float -> float * float = <fun>

Applying the norm function to a non-zero-length vector produces the correct result to within numerical error (a subject discussed in chapter 4):

# norm (3., 4.);;
- : float * float = (0.600000000000000089, 0.8)

Applying the norm function to the zero vector raises the ZeroLength exception:

# norm (0., 0.);;
Exception: ZeroLength.

A "safe" version of the norm function might catch this exception and return some reasonable result in the case of a zero-length vector:

# let safe_norm r = try norm r with ZeroLength -> (0., 0.);;
val safe_norm : float * float -> float * float = <fun>

Applying the safe_norm function to a non-zero-length vector causes the result of the expression norm r to be returned:

# safe_norm (3., 4.);;
- : float * float = (0.600000000000000089, 0.8)

However, applying the safe_norm function to the zero vector causes the norm function to raise the ZeroLength exception, which is then caught within the safe_norm function, which then returns the zero vector:


# safe_norm (0., 0.);;
- : float * float = (0., 0.)


The use of exceptions to handle unusual occurrences, such as in the safe_norm function, is one important application of exceptions. This functionality is exploited by many of the functions provided by the core OCaml library, such as those for handling files (discussed in chapter 5). The safe_norm function is a simple example using exceptions which could have been written using an if expression. However, exceptions are much more useful in more complicated circumstances, where many if expressions would be required in order to achieve the same effect.

Another important application is the use of exceptions to escape computations. The usefulness of this way of exploiting exceptions cannot be fully understood without first understanding data structures and algorithms and, therefore, this topic will be discussed in much more detail in chapter 3 and again, in the context of performance, in chapter 7.

The Pervasives module defines two exceptions, Invalid_argument and Failure, as well as two functions which simplify the raising of these exceptions. Specifically, the invalid_arg and failwith functions raise the Invalid_argument and Failure exceptions, respectively, using the given string.
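As a brief sketch of how these two functions might be used, consider the following definitions. The function names reciprocal and digit_of_char, and the message strings, are assumptions made purely for illustration:

```ocaml
(* Hypothetical example: reject an argument outside the valid domain. *)
let reciprocal x =
  if x = 0. then invalid_arg "reciprocal: zero" else 1. /. x

(* Hypothetical example: report a failure for unexpected input. *)
let digit_of_char c =
  if c >= '0' && c <= '9' then Char.code c - Char.code '0'
  else failwith "digit_of_char: not a digit"

(* Both exceptions carry the string given when they were raised. *)
let message =
  try string_of_float (reciprocal 0.) with
  | Invalid_argument msg -> msg
```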

Support for exceptions is not uncommon in modern languages. However, the automatic generalisation of functions over all types of data for which they are valid is rather unusual and is discussed next.

1.5.4 Polymorphism

As we have seen, OCaml will infer types in a program. But what if a specific type cannot be inferred? In this case, OCaml will create a polymorphic function which can act on any suitable type. For example, the following defines a higher-order function f which accepts a function g and a value x, and applies g to the result of applying g to x:

# let f g x = g (g x);;
val f : ('a -> 'a) -> 'a -> 'a = <fun>

Note that OCaml uses the notation 'a (conventionally written α) when writing the type of the function f. This OCaml function may then be used for any type 'a. For example, the following uses the polymorphic function to calculate $2^4$, first using the type int and then using the type float:

# f (fun x -> x * x) 2;;
- : int = 16
# f (fun x -> x *. x) 2.;;
- : float = 16.

Types may be constrained by specifying types in a definition using the syntax (expr : type). For example, specifying the type of the argument x to be a floating-point value in the definition of the function f results in OCaml inferring all of the previously polymorphic types to be float:


# let f g (x : float) = g (g x);;
val f : (float -> float) -> float -> float = <fun>

Although omitting the brackets results in the same types being inferred in this case:

# let f g x : float = g (g x);;
val f : (float -> float) -> float -> float = <fun>


The syntax of this latter form actually constrains the return type of the function f to be float, rather than constraining the type of the last argument, as in the former example.
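The distinction becomes visible when the argument type and the return type are not forced to coincide. The following sketch uses two hypothetical functions, apply_a and apply_b, introduced only for this illustration; the types given in the comments are those we would expect OCaml to infer:

```ocaml
(* Constrains the last argument: the result type stays polymorphic.
   Expected type: (float -> 'a) -> float -> 'a *)
let apply_a g (x : float) = g x

(* Constrains the return type: the argument type stays polymorphic.
   Expected type: ('a -> float) -> 'a -> float *)
let apply_b g x : float = g x
```

For example, apply_a int_of_float 2.5 returns an int from a float argument, whereas apply_b float_of_int 3 returns a float from an int argument.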

Variant types may contain polymorphic types, in which case the name of the variant type must be preceded by its polymorphic type arguments. For example, the polymorphic option:

# type 'a option = None | Some of 'a;;
type 'a option = None | Some of 'a

can have values such as Some 1 or Some 2 (for which the type is written int option) and the value None (for which the type defaults to 'a option). For example, this 3-tuple is allowed to have elements of different types:

# (Some 1, Some 2, None);;
- : int option * int option * 'a option = (Some 1, Some 2, None)

In contrast, the elements of a list must all be of the same type and, therefore, a None presented as an alternative to a Some 1 will be inferred to be of type int option:

# [Some 1; Some 2; None];;
- : int option list = [Some 1; Some 2; None]

The polymorphic option type 'a option is actually already defined in the Pervasives module.

Many polymorphic functions are provided by the language, most notably the comparison operators =, <>, <, >, <= and >=, as well as the total ordering function compare, which provides conventional ordering over the int and float types as well as lexicographic ordering over lists, arrays and strings:

$$\mathtt{compare}\ a\ b = \begin{cases} -1 & a < b \\ 0 & a = b \\ 1 & a > b \end{cases}$$

# compare 1 2;;
- : int = -1
# compare "slug" "plug";;
- : int = 1

The min and max functions use polymorphic comparison to find the smaller and larger of two given arguments, respectively:

# max 1 2;;
- : int = 2
# min "slug" "plug";;
- : string = "plug"
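Because compare is polymorphic, it can be passed directly to other functions. As a small sketch, the following uses List.sort from the standard library with compare as the ordering; the value names are chosen only for this example:

```ocaml
(* The same compare function orders integers and strings alike. *)
let sorted_ints = List.sort compare [3; 1; 2]
let sorted_words = List.sort compare ["slug"; "plug"; "bug"]

(* Lists are ordered lexicographically: the first differing
   elements (2 and 3) decide the result, which is negative here. *)
let list_order = compare [1; 2; 3] [1; 3]
```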

Before completing this introduction to OCaml, we have one remaining exotic topic to cover.


1.5.5 Currying

A curried function is a function which returns a function as its result. Curried functions are best introduced as a more powerful alternative to the conventional (non-curried) functions provided by imperative programming languages.

Effectively, imperative languages only allow functions which accept a single value (often a tuple) as an argument. For example, a raise-to-the-power function for integers would have to accept a single tuple as an argument which contained the two values used by the function:

# let rec ipow (x, n) = if n = 0 then 1. else x *. ipow (x, n - 1);;
val ipow : float * int -> float = <fun>

But, as we have seen, OCaml also allows:

# let rec ipow x n = if n = 0 then 1. else x *. ipow x (n - 1);;
val ipow : float -> int -> float = <fun>

This latter approach is actually a powerful generalization of the former, only available in functional programming languages.

The difference between these two styles is subtle but important. In the latter case, the type can be understood to be:

val ipow : float -> (int -> float)

i.e. this ipow function accepts a floating-point number and returns a function which raises that number to the given power. A function which returns a function is referred to as a curried function. As the curried style is more general than the non-curried style, functions are written in curried form by default in functional languages.
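The practical consequence is that a curried function may be applied to only some of its arguments. As a minimal sketch, the following applies the curried ipow to its first argument alone; the names pow2 and powers are hypothetical and introduced only for this illustration:

```ocaml
let rec ipow x n = if n = 0 then 1. else x *. ipow x (n - 1)

(* Supplying only the first argument yields a function of type
   int -> float which raises 2 to the given power. *)
let pow2 = ipow 2.

(* The partially applied function can be passed to other functions. *)
let powers = List.map pow2 [0; 1; 2; 3]
```

The tuple form of ipow offers no such shortcut: both values must be available, and packaged together, before the function can be applied.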

Now that we have examined the syntax of the OCaml language, we shall explain why the exotic programming styles offered by OCaml are highly relevant in the context of scientific computing.

1.6 Functional vs Imperative programming

The vast majority of current programmers write in imperative languages and use an imperative style. This refers to the use of statements or expressions which are designed to act by way of a side-effect.

For example, the following declares a mutable variable called x, executes a statement which has the effect of modifying the value of x (squaring it) and then examines the resulting value of x:

# let x = ref 2;;
val x : int ref = {contents = 2}
# x := !x * !x;;
- : unit = ()
# !x;;
- : int = 4


The only action of the statement "x := !x * !x" is to modify the value of an existing variable. This is its side-effect. In this case, the statement has no other effect and, consequently, returns the value () of type unit.

The functional equivalent to this imperative style is to define a new value (in this case, of the same name so that the old value is superseded):

# let x = 2;;
val x : int = 2
# let x = x * x;;
val x : int = 4
# x;;
- : int = 4

Purely functional programming has several advantages over imperative programming:

• easier to determine variable values, as they cannot be altered

• easier proofs of correctness

• typically more concise in terms of the quantity of source code required to perform a given task

• the ability to reuse old data structures (known as persistence) without having to worry about undoing state changes and unwanted interactions

• trivial multi-threading of programs due to the lack of data structure interdependencies
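Persistence, in particular, can be illustrated with a short sketch. The value names below are assumptions made for this example. New lists are built from an existing list without disturbing it, whereas an update made through one name for a mutable array is visible through every other name for that array:

```ocaml
(* Functional: original is reused and never altered. *)
let original = [2; 3; 4]
let extended = 1 :: original            (* shares original as its tail *)
let replaced = 0 :: List.tl original    (* a new head on the old tail *)

(* Imperative: a and b are two names for the same mutable array. *)
let a = [|2; 3; 4|]
let b = a
let () = b.(0) <- 0                     (* also changes a.(0) *)
```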

OCaml supports both functional and imperative programming and, hence, is known as an impure functional programming language. In particular, the OCaml core library provides implementations of several imperative data structures (strings, arrays and hash tables) as well as functional data structures (lists, sets and maps). We shall examine these data structures in detail in chapter 3.

In addition to mutable data structures, the OCaml language provides looping constructs for imperative programming. The while loop executes its body repeatedly while the condition is true, returning the value of type unit upon completion. For example, this while loop repeatedly decrements the mutable variable x, until it reaches zero:

# let x = ref 5;;
val x : int ref = {contents = 5}
# while !x > 0 do
    decr x;
  done;;
- : unit = ()
# !x;;
- : int = 0

The for loop introduces a new loop variable explicitly, giving the initial and final values of the loop variable. For example, this for loop runs a loop variable called i from 1 to 5, incrementing the mutable value x five times in total:


# for i = 1 to 5 do
    incr x;
  done;;
- : unit = ()
# !x;;
- : int = 5


Thus, while and for loops in OCaml are analogous to those found in most imperative languages.

In practice, the ability to choose between imperative and functional styles when programming in OCaml is very productive. Many programming tasks are naturally suited to either an imperative or a functional style. For example, portions of a program which deal with user input, such as mouse movements and key-presses, are likely to benefit from an imperative style where the program maintains a state and user input may result in a change of state. In contrast, functions dealing with the manipulation of complex data structures, such as trees and graphs, are likely to benefit from being written in a functional style, using recursive functions and immutable data, as this greatly simplifies the task of writing such functions correctly. In both cases, functions can refer to themselves - recursive functions. However, recursive functions are pivotal in functional programming, where they are used to implement functionality equivalent to the while and for looping constructs we have just examined.

1.7 Recursion

When a programmer is introduced to the concept of functional programming for the first time, the way to implement simple programming constructs such as loops does not appear obvious. If the loop variable cannot be changed then how can the loop proceed?

In essence, the answer to this question lies in the ability to convert looping constructs, such as mathematical sums and products, into recursive constructs, such as recurrence relations. For example, the factorial function is typically considered to be a product with the special case $0! = 1$:

$$n! = \begin{cases} 1 & n = 0 \\ \prod_{i=1}^{n} i = 1 \times 2 \times \dots \times (n - 1) \times n & n > 0 \end{cases}$$

However, this may also be expressed as a recurrence relation:

$$n! = \begin{cases} 1 & n = 0 \\ n \times (n - 1)! & n > 0 \end{cases}$$

Both the product and recurrence-relation forms of the factorial function may be expressed in OCaml. The product form is most obviously implemented in an imperative style, using mutable variables which are iteratively updated to accumulate the value of the product:

# let factorial n =
    let ans = ref 1 and n = ref n in
    while (!n > 1) do ( ans := !ans * !n; decr n ) done;
    !ans;;
val factorial : int -> int = <fun>
# factorial 5;;
- : int = 120


In contrast, the recurrence relation can be implemented more simply, as a recursive function:

# let rec factorial n = if n < 1 then 1 else n * factorial (n - 1);;
val factorial : int -> int = <fun>
# factorial 5;;
- : int = 120

In the case of the factorial function, the functional style is considerably more concise and, more importantly, is much easier to reason about, i.e. the functional version is more easily seen to be correct. For sophisticated and intrinsically complicated computations, these advantages result in functional programs often being both simpler and more reliable than their imperative equivalents.

However, functional programming is not always preferable to imperative programming. Many problems naturally lend themselves to either imperative or functional styles. Clearly the factorial function is most easily implemented when considered as a recurrence relation. Other computations are most naturally represented as sums and products. For example, the dot product $\mathbf{a} \cdot \mathbf{b}$ of a pair of $d$-dimensional vectors $\mathbf{a}$ and $\mathbf{b}$ is most naturally represented as a sum:

$$\mathbf{a} \cdot \mathbf{b} = \sum_{i=1}^{d} a_i \times b_i$$

This sum can be computed by a rather obfuscated recursive function:

# let dot a b =
    let len = Array.length a in
    if len <> Array.length b then invalid_arg "dot" else
    let rec aux i accu =
      if i < len then aux (i+1) (accu +. a.(i) *. b.(i)) else accu in
    aux 0 0.;;
val dot : float array -> float array -> float = <fun>

or by a clearer iterative function:

# let dot a b =
    let len = Array.length a in
    if len <> Array.length b then invalid_arg "dot" else
    let r = ref 0. in
    for i = 0 to len - 1 do
      r := !r +. a.(i) *. b.(i)
    done;
    !r;;
val dot : float array -> float array -> float = <fun>

For example, $(1, 2, 3) \cdot (2, 3, 4) = 20$:

# dot [|1.; 2.; 3.|] [|2.; 3.; 4.|];;
- : float = 20.

In this case, the imperative form of the vector dot product is easier to understand than the recursive form. Regardless of the choice of functional or imperative style, structured design and implementation is an important way to manage complicated problems.

Finally, this introductory chapter would not be complete without providing a taste of the value of OCaml in the context of scientific computing.


1.8 Applicability

Conventional languages vehemently separate functions from data. In contrast, OCaml allows the seamless treatment of functions as data. Specifically, OCaml allows functions to be stored as values in data structures, passed as arguments to other functions and returned as the results of expressions, including the return-values of functions. As we shall now demonstrate, this ability can be of direct relevance to scientific applications.

Many numerical algorithms are most obviously expressed as a function which accepts and acts upon another function. For example, consider a function called d which calculates a numerical approximation to the derivative of a given, one-argument function. The function d accepts a function $f : \mathbb{R} \to \mathbb{R}$ and a value $x$ and returns a function to compute an approximation to the derivative $\frac{df}{dx}$ given by $d : (\mathbb{R} \to \mathbb{R}) \to (\mathbb{R} \to \mathbb{R})$:

$$d[f](x) = \frac{f(x + \delta) - f(x - \delta)}{2\delta} \simeq \frac{df}{dx}$$

This is easily written in OCaml as the curried function d¹⁴:

# let d f x =
    let eps = sqrt epsilon_float in
    ((f (x +. eps)) -. (f (x -. eps))) /. (2. *. eps);;
val d : (float -> float) -> float -> float = <fun>

For example, consider the function $f(x) = x^3 - x - 1$:

# let f x = x *. x *. x -. x -. 1.;;
val f : float -> float = <fun>

The higher-order function d can be used to approximate $\left.\frac{df}{dx}\right|_{x=2} = 11$:

# d f 2.;;
- : float = 10.9999999701976776

More importantly, as d is a curried function, we can use d to create derivative functions. For example, the derivative $f'(x) = \frac{df}{dx}$:

# let f' = d f;;
val f' : float -> float = <fun>

The function f' can now be used to calculate a numerical approximation to the derivative of f for any x. For example, $f'(2) = 11$:

# f' 2.;;
- : float = 10.9999999701976776
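The same function d may be applied to any function of type float -> float. As a hedged check, the following sketch differentiates the built-in sin function and measures the discrepancy from the exact derivative cos; the names sin' and error_at are hypothetical and introduced only for this illustration:

```ocaml
let d f x =
  let eps = sqrt epsilon_float in
  (f (x +. eps) -. f (x -. eps)) /. (2. *. eps)

(* Partial application creates a numerical derivative of sin. *)
let sin' = d sin

(* Absolute difference between the approximation and cos x. *)
let error_at x = abs_float (sin' x -. cos x)

let () = Printf.printf "error at x = 1: %g\n" (error_at 1.)
```

We would expect the error to be of a similar order to that seen for f above, i.e. far smaller than the value of the derivative itself.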

As this demonstrates, functional programming languages such as OCaml offer many considerable improvements over conventional languages used for scientific computing. Before continuing, readers should be warned that, once learned, the techniques presented in this book soon become indispensable and, therefore, there is no going back after this chapter.

¹⁴ The value epsilon_float, defined in the Pervasives module, is the smallest floating-point number which, when added to 1, does not give 1. The square root of this value can be shown to give optimal properties when used in this way.

Chapter 2

Program Structure

In this chapter, we introduce some programming paradigms designed to improve program structure. As the topics addressed by this chapter are vast, we shall provide only overviews and references to literature containing more thorough descriptions and evaluations of the relative merits of the various approaches.

Structured programming is all about managing complexity. Modern computational approaches in all areas of science involve great intrinsic complexity. Consequently, the efficient structuring of programs is vitally important if this complexity is to be managed in order to produce robust, working programs.

Historically, many different approaches have been used to facilitate the structuring of programs. The simplest approach involves splitting the source code of the program between several different files, known as compilation units. A marginally more sophisticated approach involves the creation of namespaces, allowing variable and function names to be hierarchical, and structures, allowing types to be combined and variables to be grouped. More recently, an approach known as object-oriented (OO) programming has become widespread. As we shall see, the OCaml language supports all of these approaches as well as others. Consequently, OCaml programmers are encouraged to learn the relative advantages and disadvantages of each approach in order that they may make educated decisions regarding the design of new programs.

Structured programming is not only important in the context of large, complicated programs. In the case of simple programs, understanding the concepts behind structured programming can be instrumental in making efficient use of existing libraries.

2.1 Nesting

The concept of nested OCaml definitions is best introduced to scientists by drawing an analogy with the nesting of definitions in science:

1. When asked to define an animal, scientists from all disciplines would reply with very similar definitions - the definition of "animal" is global.


2. When asked to define a circuit, a physicist is likely to define an electronic circuit but an anaesthesiologist is likely to define an anaesthetic circuit - there are multiple, different definitions of "circuit" which are local to different scientific disciplines.

3. When asked to define the space-time metric, cosmologists and astrophysicists are likely to give similar definitions whereas scientists from other disciplines are likely to be unable to answer - the definition of "space-time metric" is local to the study of general relativity.

4. When asked to define a species, a school-level scientist is likely to reply with a simple, broad and probably unworkable definition whereas scientists in different specialised fields are likely to reply with different, specialised definitions - the definition of "species" is refined separately in separate scientific disciplines.

Analogously, function and variable definitions can be structured hierarchically within an OCaml program, allowing some definitions to be globally visible, others to be defined separately in distinct portions of the hierarchy, others to be visible only within a single branch of the hierarchy and others to be refined, specialising them within the hierarchy.

Compared to writing a program as a flat list of function and variable definitions, structuring a program into a hierarchy of definitions allows the number of dependencies within the program to be managed as the size and complexity of the program grows. This is achieved by nesting definitions. Thus, nesting is the simplest approach to the structuring of programs¹.

For example, the ipow3 function defined in the previous chapter contains a nested definition of a function sqr:

# let ipow3 x =
    let sqr x = x * x in
    x * sqr x;;
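As a further sketch of the same idea, the following hypothetical function (the name hypotenuse is an assumption made for this example) nests a helper which is visible only within the body of the enclosing definition:

```ocaml
(* The helper sqr is local: it exists only inside hypotenuse and
   cannot clash with, or be used by, definitions elsewhere. *)
let hypotenuse x y =
  let sqr z = z *. z in
  sqrt (sqr x +. sqr y)
```

Any attempt to refer to sqr after this definition would be rejected by the compiler as an unbound value, unless another sqr happened to be defined at the outer level.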


Nesting can be more productively exploited when combined with the factoring of subexpressions, functions and higher-order functions.

2.2 Factoring

The concept of program factoring is best introduced to a scientist in relation to the conventional factoring of mathematical expressions. When creating a complicated mathematical derivation, the ability to factor subexpressions, typically by introducing a substitution, is a productive way to manage the incidental complexity of the problem.

For example, the following function definition contains several duplicates of the subexpression $x - 1$:

$$f(x) = \left(x - 1 - (x - 1)(x - 1)\right)^{x - 1}$$

¹ Remarkably, many other languages, including C and C++, do not allow nesting of function and variable definitions.


By factoring out a subexpression a(x), this expression can be simplified:

$$a(x) = x - 1$$

$$f(a) = \left(a - a^2\right)^{a}$$


The factoring of subexpressions, such as $x - 1$, is the simplest form of factoring available to a programmer. The OCaml function equivalent to the original, unfactored expression is:

# let f x = (x -. 1. -. (x -. 1.) *. (x -. 1.)) ** (x -. 1.);;
val f : float -> float = <fun>
# f 5.;;
- : float = 20736.

The OCaml function equivalent to the factored form is:

# let f x = let a = x -. 1. in (a -. a *. a) ** a;;
val f : float -> float = <fun>
# f 5.;;
- : float = 20736.

By simplifying expressions, factoring is a means to manage the complexity of a program. However, the previous example only factors a subexpression. In functional languages, such as OCaml, the ability to factor out higher-order functions is much more powerful than subexpression factoring.

Whenever several definitions share functionality but implement this functionality independently, it is likely that a higher-order function may be factored out. In the context of functional programming, this form of factoring is most often seen with algorithms which act over data structures. As we shall see in chapter 3, the higher-order map and fold functions are used so commonly with data structures that implementations typically provide these functions. Indeed, these functions are provided by the OCaml core library for the implementations of lists, arrays, maps, sets and hash tables. In the mean time, let us consider a kind of fold function which does not act over an explicit data structure but, instead, acts over an implicit list of consecutive integers.

The following functions compute the sum and the product of a semi-inclusive range of integers $[l, u)$:

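# let rec sum_range l u =
    if l < u then let u = u - 1 in u + sum_range l u else 0;;
val sum_range : int -> int -> int = <fun>
# let rec product_range l u =
    if l < u then let u = u - 1 in u * product_range l u else 1;;
val product_range : int -> int -> int = <fun>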

For example, the product_range function may be used to compute 5! as the product of the integers [1, 6):

# product_range 1 6;;
- : int = 120


[Figure: fold_range f accu 1 9]

Figure 2.1: The fold_range function can be used to accumulate the result of applying a function f to a contiguous sequence of integers, in this case the sequence [1, 9).

The sum_range and product_range functions clearly share some functionality. Specifically, they both apply a function (integer add + and multiply *, respectively) to u - 1 before recursively applying themselves to the smaller range $[l, u - 1)$ until the range contains no integers, $l = u$. This shared functionality can be factored out as a higher-order function fold_range:

# let rec fold_range f accu l u =
    if l < u then fold_range f (f accu (u - 1)) l (u - 1) else accu;;
val fold_range : ('a -> int -> 'a) -> 'a -> int -> int -> 'a = <fun>

The fold_range function accepts a function f, an accumulator accu and a range specified by two integers l and u. Application of the fold_range function to mimic the sum_range or product_range functions begins with a base case in accu (0 or 1, respectively). If l ≥ u then accu is returned as the result. Otherwise, the fold_range function recurses with the smaller range [l, u - 1) and a new accumulator given by applying the function f to the old accu and to u - 1, which will give accu + (u - 1) in the case of the sum_range function or accu * (u - 1) in the case of product_range. This process, known as a right fold because f is applied to the rightmost integer u - 1 first, is illustrated in figure 2.1.
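The order in which a right fold visits the integers can be made visible by accumulating them into a list. The following is a minimal sketch, assuming the definition of fold_range described above (repeated so that the block is self-contained); the helper visit_order is a hypothetical name introduced only for this illustration.

```ocaml
(* Sketch of fold_range as described in the text: f is applied to u - 1
   first and the range then shrinks to [l, u - 1). *)
let rec fold_range f accu l u =
  if l < u then fold_range f (f accu (u - 1)) l (u - 1) else accu

(* Hypothetical helper: record each visited integer by appending it. *)
let visit_order l u = fold_range (fun visited i -> visited @ [i]) [] l u

let () =
  (* The integers of [1, 6) are visited from the right: 5 4 3 2 1. *)
  List.iter (Printf.printf "%d ") (visit_order 1 6);
  print_newline ();
  (* The sum expands to ((((0 + 5) + 4) + 3) + 2) + 1. *)
  Printf.printf "sum = %d\n" (fold_range ( + ) 0 1 6);
  Printf.printf "product = %d\n" (fold_range ( * ) 1 1 6)
```

Appending with @ is used here only to expose the visiting order; prepending with :: is the cheaper operation and yields the integers in increasing order.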

The sum_range and product_range functions may then be expressed more simply in terms of the fold_range function by supplying the integer addition or multiplication operators² and base case:

# let sum_range l u = fold_range ( + ) 0 l u;;
val sum_range : int -> int -> int = <fun>
# let product_range l u = fold_range ( * ) 1 l u;;
val product_range : int -> int -> int = <fun>

Furthermore, these forms of the sum_range and product_range functions can be further simplified by performing what is known in computer science as η-reduction, cancelling the final arguments l and u from both sides of a curried function definition:

² The non-infix form of the + operator is written ( + ), i.e. ( + ) a b is equivalent to a + b. Note the spaces to avoid ( * ) from being interpreted as the start of a comment.

# let sum_range = fold_range ( + ) 0;;
val sum_range : int -> int -> int = <fun>
# let product_range = fold_range ( * ) 1;;
val product_range : int -> int -> int = <fun>

These functions work in exactly the same way as the originals:

# product_range 1 6;;
- : int = 120

but their common functionality has been factored out into the fold_range function.
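The equivalence of the two forms can be checked directly. The sketch below assumes the fold_range definition described earlier and uses hypothetical names (sum_full, sum_reduced, duplicate). It also notes one caveat which is worth bearing in mind: a partial application is not generalised by the type checker, so a function intended to be polymorphic is usually best written out in full.

```ocaml
let rec fold_range f accu l u =
  if l < u then fold_range f (f accu (u - 1)) l (u - 1) else accu

(* Written out in full and eta-reduced: the same function. *)
let sum_full l u = fold_range ( + ) 0 l u
let sum_reduced = fold_range ( + ) 0

(* Kept in full so that it remains polymorphic in the element type.
   The eta-reduced form, List.map (fun x -> (x, x)), is a partial
   application and would be given a weak, non-generalised type. *)
let duplicate xs = List.map (fun x -> (x, x)) xs

let () =
  assert (sum_full 1 6 = sum_reduced 1 6);
  (* duplicate is used at two different element types. *)
  assert (duplicate [1; 2] = [(1, 1); (2, 2)]);
  assert (duplicate ["a"] = [("a", "a")])
```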

In addition to simplifying the definitions of the sum_range and product_range functions, the fold_range function may also be used in new function definitions. For example, the following higher-order function, given a length n and a function f, creates the list containing the n elements (f(0), f(1), ..., f(n - 1)):

# let list_init n f = fold_range (fun l i -> f i :: l) [] 0 n;;
val list_init : int -> (int -> 'a) -> 'a list = <fun>

This list_init function uses the fold_range function with a λ-function, an accumulator containing the empty list [] and a range [0, n). The λ-function prepends each f i onto the accumulator l to construct the result.

This is actually the list equivalent of the Array.init function for arrays. For example, these two applications of these functions both create the sequence x_i = i for i = 0 ... 9:

# Array.init 10 (fun i -> i);;
- : int array = [|0; 1; 2; 3; 4; 5; 6; 7; 8; 9|]
# list_init 10 (fun i -> i);;
- : int list = [0; 1; 2; 3; 4; 5; 6; 7; 8; 9]
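More recent releases of the OCaml standard library also provide a List.init function (we assume OCaml 4.06 or later here, which postdates this text). The following sketch, which repeats the definitions described above so that it is self-contained, checks that list_init agrees with both library functions:

```ocaml
let rec fold_range f accu l u =
  if l < u then fold_range f (f accu (u - 1)) l (u - 1) else accu

let list_init n f = fold_range (fun l i -> f i :: l) [] 0 n

let squares = list_init 5 (fun i -> i * i)

let () =
  (* The same sequence via the standard library. *)
  assert (squares = List.init 5 (fun i -> i * i));
  assert (squares = Array.to_list (Array.init 5 (fun i -> i * i)));
  List.iter (Printf.printf "%d ") squares;
  print_newline ()
```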

As we have seen, the nesting and factoring of functions and variables can be used to simplify programs and, therefore, to help manage intrinsic complexity. In addition to these approaches to program structuring, the OCaml language also provides two constructs designed to encapsulate program segments. We shall now examine the methodology and relative benefits of the approaches offered by modules and objects.

2.3 Modules

We have already encountered several modules. In particular, the Pervasives module encapsulates many function and type definitions initialised before a program is executed, such as operators on the built-in int, float, bool and string types. Also, the Array module encapsulates function and type definitions related to arrays, such as functions to create arrays (e.g. Array.init) and to count the number of elements in an array (Array.length).

In general, modules are used to encapsulate related definitions, i.e. function, type, data, module and object definitions. In addition to simply allowing related definitions to be grouped together, the OCaml module system allows these groups to form hierarchies and allows interfaces between these groups to be defined, the correct use of which is then enforced by the compiler at compile-time. This can be used to add "safety nets" when programming, improving program reliability. We shall begin by examining some of the modules available in the core OCaml library before describing how a program can be structured using modules, stating the syntactic constructs required to create modules and, finally, developing a new module.

The OCaml system comes with many modules, implementing a broad spectrum of functionalities, from file handling to implementations of various data structures. As we shall see later in this chapter, these modules are provided in both source and compiled form. The source code to these modules is a faithful source of wisdom, having been written by expert OCaml programmers. Moreover, the bundled modules are easily supplemented by user-written modules which, particularly if well written, can be shared between several programs and even distributed or sold.

Some conventions adhered to by the built-in modules can be productively adhered to when creating new modules. If a module's design is based around a single type, this type is conventionally named t, e.g. the Complex module in the core library provides a data type called Complex.t for storing complex-valued numbers and several functions for acting upon them (neg, conj, sub, add, mul, inv, div, sqrt, norm, norm2, arg, polar, exp, log and pow). Also, the most fundamental function to construct a value of this type is conventionally named make, e.g. the Array.make and String.make functions create arrays and strings, respectively, containing repetitions of a given element.
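These conventions can be seen at work in a short sketch using only functions from the standard library. The names z, w, zeros and rule are chosen purely for illustration:

```ocaml
(* Complex.t is a record with fields re and im. *)
let z = { Complex.re = 3.; im = 4. }

(* Functions of the module act upon values of type Complex.t. *)
let w = Complex.add z Complex.one

(* make functions build a container holding copies of one element. *)
let zeros = Array.make 3 0.
let rule = String.make 5 '-'

let () =
  Printf.printf "|z|^2 = %g, Re(w) = %g\n" (Complex.norm2 z) w.Complex.re;
  Printf.printf "%d zeros, rule = %s\n" (Array.length zeros) rule
```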

In the interests of clarity and correctness, modules provide a well-defined interface known as a signature. The signature of a module declares the contents of the module which are accessible to code which uses the module. For example, the type t in the Complex module is declared in the signature³. The code implementing a module is defined in the module structure.

The concepts of module signatures and structures are best introduced by example. Consider a module called IntRange which encapsulates code for handling contiguous ranges of integers and includes the functionality of the fold_range function derived in the previous section. This module will include:

• an abstract type t to represent a contiguous range of integers.

• a function make to construct a range from a given pair of integers.

• a function mem to test a given integer for membership in a given range.

• functions fold_left and fold_right to implement folds in different directions over ranges.

• a function map_into_list to map a range into a list, applying a given function to each element.

• a function to_list to convert a range into a list of consecutive integers.

We shall begin by defining the interface to the module rigorously as a module signature.

³ Specifically, in the core library file "complex.mli".

2.3.1 Signatures

Module signatures declare the interfaces to modules. Modules which adhere to a given signature must define all of the constructs declared in the signature but are free to define additional constructs. However, only those constructs declared in the signature will be accessible, or visible, from code outside the module (any additional constructs are hidden).

Signatures may contain several different kinds of declaration:

• type declarations in the form type ....

• exception declarations in the form exception ....

• variable and function type declarations in the form val ....

• open the namespaces of other signatures using open statements.

• replicate the contents of other signatures using include statements.

• other signature declarations, to nest signatures.

Signatures are declared using the syntax:

module type NAME = sig ... end

where the name of the signature (NAME) is conventionally written entirely in capital letters.

For example, the interface to our IntRange module may be defined rigorously as the module signature denoted (according to convention) INTRANGE:

module type INTRANGE =
  sig
    type t
    val make : int -> int -> t
    val mem : int -> t -> bool
    val fold_left : ('a -> int -> 'a) -> 'a -> t -> 'a
    val fold_right : (int -> 'a -> 'a) -> t -> 'a -> 'a
    val map_into_list : (int -> 'a) -> t -> 'a list
    val to_list : t -> int list
  end

The declarations made in this signature may then be implemented in a module structure which adheres to this signature.

2.3.2 Structures

Module structures may contain several different kinds of definition which, combined, implement the internals of the module:

• type definitions in the form type ....


• exception definitions in the form exception ....

• variable and function definitions in the form let ....

• open the namespaces of other module structures using open statements.

• replicate the contents of other module structures using include statements.

• other module signature and structure definitions, to nest modules.

Module structures are defined using the syntax:

module Name = struct ... end

where the name of the structure (Name) is required to begin with a capital letter.

Also, a module structure Name may be defined as adhering to an existing signature NAME using the syntax:

module Name : NAME = struct ... end

For example, the IntRange module may be defined as a structure adhering to the INTRANGE signature:

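module IntRange : INTRANGE =
  struct
    type t = { l : int; u : int }

    let make l u =
      if l <= u then { l = l; u = u } else invalid_arg "IntRange.make"

    let mem i r = r.l <= i && i < r.u

    let fold_left f accu r =
      let rec aux accu l u =
        if l < u then aux (f accu l) (l + 1) u else accu in
      aux accu r.l r.u

    let fold_right f r accu =
      let rec aux l u accu =
        if l < u then aux l (u - 1) (f (u - 1) accu) else accu in
      aux r.l r.u accu

    let map_into_list f r = fold_right (fun i l -> f i :: l) r []

    let to_list r = map_into_list (fun i -> i) r
  end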

The fold_right function performs the same fold as the fold_range function, applying f to the sequence of integers in decreasing order. The fold_left function applies f to l first (shrinking the range to [l + 1, u)), therefore applying f to the sequence of integers in increasing order. As we shall see in chapter 3, many data structures provide fold_left and fold_right functions. We shall examine the use of the IntRange module in section 2.3.4.
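The kinds of declaration and definition listed in the last two sections can be illustrated with a second, smaller example. The following is a minimal sketch of a hypothetical stack module (the names STACK, ListStack and split are ours, not part of any library): the signature uses type, exception and val declarations, and the structure contains a helper function which is hidden because the signature does not declare it.

```ocaml
(* A signature using type, exception and val declarations. *)
module type STACK = sig
  type 'a t
  exception Empty
  val empty : 'a t
  val push : 'a -> 'a t -> 'a t
  val pop : 'a t -> 'a * 'a t
end

(* A structure adhering to STACK. *)
module ListStack : STACK = struct
  type 'a t = 'a list
  exception Empty
  let empty = []
  let push x s = x :: s
  (* Hidden helper: not declared in STACK, so invisible outside. *)
  let split = function
    | [] -> raise Empty
    | x :: s -> (x, s)
  let pop s = split s
end

let () =
  let s = ListStack.push 2 (ListStack.push 1 ListStack.empty) in
  let top, _ = ListStack.pop s in
  Printf.printf "top = %d\n" top;
  match ListStack.pop ListStack.empty with
  | _ -> ()
  | exception ListStack.Empty -> print_endline "the stack is empty"
```

Any attempt to refer to ListStack.split from outside the module is rejected by the compiler, as is any attempt to treat a value of type 'a ListStack.t as a list.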

In many cases, a signature is created for a specific module, in which case the signature and corresponding structure may be productively defined simultaneously, using an anonymous signature.

2.3.3 Anonymous signatures

An anonymous signature and corresponding structure may be defined simultaneously as:

module Name : sig ... end = struct ... end

The IntRange module may be implemented as an anonymous signature and compliant structure as follows:

module IntRange :
  sig
    type t
    val make : int -> int -> t
    val mem : int -> t -> bool
    val fold_left : ('a -> int -> 'a) -> 'a -> t -> 'a
    val fold_right : (int -> 'a -> 'a) -> t -> 'a -> 'a
    val map_into_list : (int -> 'a) -> t -> 'a list
    val to_list : t -> int list
  end =
  struct
    type t = { l : int; u : int }

    let make l u =
      if l <= u then { l = l; u = u } else invalid_arg "IntRange.make"

    let mem i r = r.l <= i && i < r.u

    let fold_left f accu r =
      let rec aux accu l u =
        if l < u then aux (f accu l) (l + 1) u else accu in
      aux accu r.l r.u

    let fold_right f r accu =
      let rec aux l u accu =
        if l < u then aux l (u - 1) (f (u - 1) accu) else accu in
      aux r.l r.u accu

    let map_into_list f r = fold_right (fun i l -> f i :: l) r []

    let to_list r = map_into_list (fun i -> i) r
  end

The ability to declare an anonymous signature as the interface to a module structure is useful when the signature has been designed specifically for the given module, rather than to enforce a consistent interface for several modules.
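As a further sketch of this style, consider a hypothetical module Probability (our own example, not a library module) whose anonymous signature makes the type t abstract. Because values of type t can only be created by make, every such value is guaranteed to lie in [0, 1]:

```ocaml
module Probability : sig
  type t                        (* abstract: the representation is hidden *)
  val make : float -> t         (* raises Invalid_argument outside [0, 1] *)
  val to_float : t -> float
  val both : t -> t -> t        (* probability of two independent events *)
end = struct
  type t = float
  let make p =
    if 0. <= p && p <= 1. then p else invalid_arg "Probability.make"
  let to_float p = p
  let both p q = p *. q
end

let half = Probability.make 0.5
let quarter = Probability.both half half

let () = Printf.printf "%g\n" (Probability.to_float quarter)
```

Outside the module, an expression such as Probability.both 0.5 0.5 is a type error, even though the internal representation is a float.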


2.3.4 Use of the IntRange module

The IntRange module may be used to create and perform operations upon values of an abstract type representing a range of consecutive integers. Abiding by convention, the type IntRange.t is used to represent a range of consecutive integers. Internally, this type is represented as a record containing the lower and upper bounds of the range. Externally, this type is visible but abstract because the type name is declared in the signature as type t, i.e. without giving the internals of the type. The module defines a make function to construct a value of type t from a given pair of integers, testing that they form a valid range. For example, having defined the module in a top-level, a value of type IntRange.t may be constructed:

# let r = IntRange.make 0 10;;
val r : IntRange.t = <abstr>

Note that the value is described as <abstr> to denote the contents of an abstract type.

The input to the make function is validated to ensure that a meaningful range is specified, i.e. that l ≤ u. Attempting to create an invalid range will result in an exception being raised at run-time:

# IntRange.make 10 0;;
Exception: Invalid_argument "IntRange.make".
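Code which cannot guarantee that l ≤ u may prefer to handle this exception. The following is a minimal sketch, using a cut-down version of the IntRange module so that the block is self-contained; the wrapper make_opt is a hypothetical helper which is not part of the module as presented.

```ocaml
module IntRange : sig
  type t
  val make : int -> int -> t
  val mem : int -> t -> bool
end = struct
  type t = { l : int; u : int }
  let make l u =
    if l <= u then { l; u } else invalid_arg "IntRange.make"
  let mem i r = r.l <= i && i < r.u
end

(* Hypothetical wrapper: return None instead of raising an exception. *)
let make_opt l u =
  try Some (IntRange.make l u) with Invalid_argument _ -> None

let () =
  match make_opt 10 0 with
  | Some _ -> print_endline "valid range"
  | None -> print_endline "invalid range"
```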

The mem function tests an integer for membership in an integer range. For example, 5 ∈ [0,10) and 15 ∉ [0,10):

# IntRange.mem 5 r;;
- : bool = true
# IntRange.mem 15 r;;
- : bool = false

The module then defines useful higher-order functions fold_left and fold_right, which apply a given function over a range of integers in increasing and decreasing order, respectively. For example, these functions can be used to create a list of integers by using the cons operator :: to prepend each integer onto a list, starting with the empty list []:

# IntRange.fold_left (fun l i -> i :: l) [] r;;
- : int list = [9; 8; 7; 6; 5; 4; 3; 2; 1; 0]
# IntRange.fold_right (fun i l -> i :: l) r [];;
- : int list = [0; 1; 2; 3; 4; 5; 6; 7; 8; 9]

Note that the fold_left function prepended 0 first whereas the fold_right function prepended 9 first.
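The same behaviour can be observed with the fold_left and fold_right functions which the standard List module provides for explicit lists. This sketch uses only library functions; a non-commutative operation (subtraction) is included to make the difference in bracketing visible.

```ocaml
let xs = [0; 1; 2; 3; 4]

(* fold_left visits 0 first, so consing reverses the list. *)
let reversed = List.fold_left (fun l i -> i :: l) [] xs

(* fold_right visits 4 first, so consing preserves the order. *)
let copied = List.fold_right (fun i l -> i :: l) xs []

(* ((((0 - 0) - 1) - 2) - 3) - 4 *)
let left = List.fold_left (fun accu i -> accu - i) 0 xs

(* 0 - (1 - (2 - (3 - (4 - 0)))) *)
let right = List.fold_right (fun i accu -> i - accu) xs 0

let () = Printf.printf "left = %d, right = %d\n" left right
```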

This functionality is exploited by the map_into_list function which applies a given function to each integer before prepending it onto a list, starting with the empty list. For example, this can be used to create a list of squares x_i = i^2 by supplying a suitable λ-function:

# IntRange.map_into_list (fun i -> i * i) r;;
- : int list = [0; 1; 4; 9; 16; 25; 36; 49; 64; 81]

Finally, the map_into_list function is used to create a simple to_list function which supplies an identity function in order to create a list of consecutive integers:

# IntRange.to_list r;;
- : int list = [0; 1; 2; 3; 4; 5; 6; 7; 8; 9]

The functionality provided by this IntRange module can be used in several ways. For example, the list_init function, which computes the list (f(0), f(1), ..., f(n - 1)) given the length n and function f, may be written in terms of IntRange.map_into_list:

# let list_init n f = IntRange.map_into_list f (IntRange.make 0 n);;
val list_init : int -> (int -> 'a) -> 'a list = <fun>

A list of lists may be constructed by applying the list_init function to create a row for each column:

# let matrix_init n m f =
    let init_row f = list_init n f in
    let init_col m = init_row (fun n -> f n m) in
    list_init m init_col;;
val matrix_init : int -> int -> (int -> int -> 'a) -> 'a list list = <fun>

Matrices may then be created by passing the appropriate initialising function. For example, a function to create n × n identity matrices:

# let matrix_identity n =
    matrix_init n n (fun i j -> if i = j then 1 else 0);;
val matrix_identity : int -> int list list = <fun>
# matrix_identity 3;;
- : int list list = [[1; 0; 0]; [0; 1; 0]; [0; 0; 1]]

The IntRange module which we have just developed may be thought of as an implicit data structure, as the integers in a range are not stored explicitly (e.g. in a list) but, rather, are implied by a pair of integers specifying the bounds of the range. As we shall see in chapter 3, much of the functionality provided by our IntRange module is available in explicit data structures.
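For comparison, the identity matrix can be built directly in an explicit data structure using the Array.init function from the standard library. The function names in this sketch are hypothetical:

```ocaml
(* n-by-n identity matrix stored explicitly as an array of arrays. *)
let identity_array n =
  Array.init n (fun i -> Array.init n (fun j -> if i = j then 1 else 0))

(* Conversion to the list-of-lists form used in the text. *)
let to_lists m = Array.to_list (Array.map Array.to_list m)

let () =
  Array.iter
    (fun row ->
       Array.iter (Printf.printf "%d ") row;
       print_newline ())
    (identity_array 3)
```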

2.3.5 Another example

Before concluding our introduction to modules, consider a module encapsulating type, variable and function definitions relating to ranges [l ... u) ⊂ ℝ. This module will include:

• A type t representing a range [l ... u) of real-valued numbers which is represented internally by two floating-point values representing l and u but abstracted by the signature so that the contents of the internal representation can only be altered using functions defined in this module.

• A function make which creates a range [l ... u) from two floating-point values representing l and u.

• A function to_pair which converts a range [l ... u) into a 2-tuple of floating-point values.

• A function subrange which tests a pair of ranges to determine if the latter range is a subset of the former range.

• A function union which calculates the set union of a pair of ranges, returning a list of ranges.

• A function inter which calculates the set intersection of a pair of ranges, returning a list of ranges.

This module may be defined as:

module FloatRange :
  sig
    type t
    val make : float -> float -> t
    val to_pair : t -> float * float
    val subrange : t -> t -> bool
    val union : t -> t -> t list
    val inter : t -> t -> t list
  end =
  struct
    type t = float * float

    let make l u =
      if u < l then invalid_arg "FloatRange.make" else (l, u)

    let to_pair r = r

    let subrange (l1, u1) (l2, u2) = l1 <= l2 && u1 >= u2

    (* Order a pair of ranges so that the smaller lower bound comes first. *)
    let order (l1, u1) (l2, u2) =
      if l2 < l1 then ((l2, u2), (l1, u1)) else ((l1, u1), (l2, u2))

    let union r1 r2 =
      let ((l1, u1), (l2, u2)) = order r1 r2 in
      if u1 < l2 then [l1, u1; l2, u2] else [min l1 l2, max u1 u2]

    let inter r1 r2 =
      let ((l1, u1), (l2, u2)) = order r1 r2 in
      if u1 < l2 then [] else [max l1 l2, min u1 u2]
  end

This FloatRange module shares several design similarities with the IntRange module. We shall consider the similarities and differences between these modules before providing examples of the use of the FloatRange module.

Like the IntRange module, the FloatRange module also defines a type t which, in this case, represents a range of numbers on the real line. Again, the type t is abstract. The FloatRange module also defines a function make to construct a value of the type t.

Unlike the IntRange module, the internal representation of a FloatRange.t is chosen to be a 2-tuple of floating-point numbers rather than a record. From the point of view of code size and clarity, this is a comparatively inconsequential design decision⁴. More importantly, the FloatRange module makes use of a function, order, which appears in the structure of the module but not in the signature. Consequently, this function is only visible to definitions which appear in the module, after the definition of the order function. This has been done because the functionality of the order function has been factored out from both the union and the inter functions.

In this case, the effect of factoring out the order function whilst hiding it from code outside the FloatRange module could have been achieved using nesting by defining the union and inter functions simultaneously:

let (union, inter) =
  let order (l1, u1) (l2, u2) =
    if l2 < l1 then ((l2, u2), (l1, u1)) else ((l1, u1), (l2, u2)) in
  let union r1 r2 =
    let ((l1, u1), (l2, u2)) = order r1 r2 in
    if u1 < l2 then [l1, u1; l2, u2] else [min l1 l2, max u1 u2] in
  let inter r1 r2 =
    let ((l1, u1), (l2, u2)) = order r1 r2 in
    if u1 < l2 then [] else [max l1 l2, min u1 u2] in
  (union, inter)

The FloatRange module may be used to create and perform operations upon values of an abstract type representing a range of real-valued numbers. For example, a pair of ranges may be created:

# let a = FloatRange.make 1. 3. and b = FloatRange.make 2. 5.;;
val a : FloatRange.t = <abstr>
val b : FloatRange.t = <abstr>

The contents of these values can be extracted using the to_pair function:

# FloatRange.to_pair a;;
- : float * float = (1., 3.)
# FloatRange.to_pair b;;
- : float * float = (2., 5.)

The union and intersection of these ranges may then be calculated using the union and inter functions provided by the FloatRange module. In order to see the result we must extract the ranges as pairs of floating-point numbers by mapping the to_pair function over the resulting lists. For example, [1,3) ∪ [2,5) = [1,5):

# List.map FloatRange.to_pair (FloatRange.union a b);;
- : (float * float) list = [(1., 5.)]

and [1,3) ∩ [2,5) = [2,3):

# List.map FloatRange.to_pair (FloatRange.inter a b);;
- : (float * float) list = [(2., 3.)]

⁴ However, as we shall see in chapter 7, records containing fields which are all of the type float are represented more efficiently by the ocamlopt compiler.
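The union and inter functions return lists because the result is not always a single range. The following sketch exercises the disjoint case. It uses a cut-down FloatRange module in which the order function is assumed to place the range with the smaller lower bound first; the helper pairs is a hypothetical name.

```ocaml
module FloatRange : sig
  type t
  val make : float -> float -> t
  val to_pair : t -> float * float
  val union : t -> t -> t list
  val inter : t -> t -> t list
end = struct
  type t = float * float
  let make l u = if u < l then invalid_arg "FloatRange.make" else (l, u)
  let to_pair r = r
  (* Assumption: order puts the smaller lower bound first. *)
  let order (l1, u1) (l2, u2) =
    if l2 < l1 then ((l2, u2), (l1, u1)) else ((l1, u1), (l2, u2))
  let union r1 r2 =
    let ((l1, u1), (l2, u2)) = order r1 r2 in
    if u1 < l2 then [l1, u1; l2, u2] else [min l1 l2, max u1 u2]
  let inter r1 r2 =
    let ((l1, u1), (l2, u2)) = order r1 r2 in
    if u1 < l2 then [] else [max l1 l2, min u1 u2]
end

(* Hypothetical helper to make results printable and comparable. *)
let pairs rs = List.map FloatRange.to_pair rs

let a = FloatRange.make 1. 2.
let b = FloatRange.make 3. 4.

let () =
  (* Disjoint ranges: the union has two parts, the intersection none. *)
  List.iter (fun (l, u) -> Printf.printf "[%g, %g) " l u)
    (pairs (FloatRange.union b a));
  Printf.printf "\n%d ranges in the intersection\n"
    (List.length (FloatRange.inter a b))
```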

As we shall see in chapter 3, the OCaml core library provides many useful data structures which implement the functionalities presented here. Moreover, these implementations are also encapsulated into separate modules. However, the OCaml language provides an additional means of encapsulating definitions.

2.4 Objects

Object-oriented (OO) programming is a much touted approach for the structuring of programs. However, the hype surrounding this notion is primarily driven by the fact that recent, high-profile languages (particularly Java and C++) provide some support for objects but do not provide support for modules or other constructs to aid with the structuring of programs.

Despite this social aspect to OO approaches, the subject has a rigorous mathematical background [3, 4]. The OCaml language draws upon this foundation to provide a very carefully constructed and expressive object system which is particularly well suited to the writing of extensible programs. However, because the module system provides a safer means of encapsulation, OO programming is not as prevalent in OCaml as it is in other languages.

Fundamental concepts in OO programming are the ability to define types of object (class types), define implementations adhering to these types (class expressions), define relationships between class types, instantiate class expressions to produce objects at run-time and interrogate objects via their methods (which are often functions).

As objects encapsulate program and data, they are somewhat similar to data structures composed of tuples or records containing these values. However, static typing requires a tuple or record value to be from a well-defined set of possible values (its type). In contrast, the type of an object is the interface to which it adheres. A single object which satisfies many different interfaces may then be used in many different contexts, whereas a tuple or record cannot.

2.4.1 Classes

Like module signatures, class types may contain several different kinds of declaration:

• value declarations in the form val ....

• method declarations in the form method ....

• type constraints using the constraint keyword.

• inheritance using the inherit keyword.

Class types are declared using the syntax:

class type name = object ... end

where the name of the class type (name) must begin with a lower-case letter.

Like module structures, class expressions may contain several different kinds of definition:

• value definitions in the form val ....

• method definitions in the form method ....

• type constraints, using the constraint keyword.

• initializers, using the initializer keyword.

Class expressions are defined using the syntax:

class name = object ... end

where the name of the class expression (name) must begin with a lower-case letter.

2.4.2 Objects

Objects are instantiated (created) at run-time, either as immediate objects or as classed objects.

2.4.2.1 Immediate objects

Objects can be instantiated independently of classes, in which case they are known as immediate objects⁵. The following function accepts two floating-point values x and y, representing real and imaginary parts respectively, and creates an immediate object representing a complex-valued number z = x + iy:

# let z x y =
    object
      val x : float = x
      val y : float = y
      method re = x
      method im = y
    end;;
val z : float -> float -> < im : float; re : float > = <fun>

Note that the type of the object returned by the function z is defined only by the methods im and re which it implements.

The function z can be used to instantiate an object called a which represents the complex number a = 2 + 3i:

# let a = z 2. 3.;;
val a : < im : float; re : float > = <obj>

⁵ This feature is new in OCaml version 3.08.


The real and imaginary parts of the complex-valued number represented by a may be extracted using the re and im members of a, referred to as a#re and a#im, respectively:

# a#re;;
- : float = 2.
# a#im;;
- : float = 3.

In addition to immediate objects, objects of the same type may be expressed by instantiating from a single class expression.

2.4.2.2 Classed objects

The type of the immediate object a could have been expressed by the class type:

# class type number =
    object
      method re : float
      method im : float
    end;;
class type number = object method im : float method re : float end

Although not yet implemented, the class type number may be used as if it were a normal type. For example, the following declares a function which maps a number onto a float:

# let abs_number (z : number) =
    let sqr x = x *. x in
    sqrt (sqr z#re +. sqr z#im);;
val abs_number : number -> float = <fun>

Before being able to use this function we must define a class expression which adheres to the class type number. For example, the following class expression complex implements complex numbers in a way which adheres to the interface required by number:

# class complex x y =
    object
      val x = x
      val y = y
      method re : float = x
      method im : float = y
    end;;
class complex :
  float ->
  float ->
  object val x : float val y : float method im : float method re : float end

Objects may be instantiated from class expressions using the new keyword. For example, an object of type complex may be instantiated using:

# let b = new complex 2. 3.;;
val b : complex = <obj>

Note that the type of the object b is denoted by the name of the class, complex. The resulting object has the same properties as the original a object. Specifically, the members can be accessed equivalently:

# b#re;;
- : float = 2.
# b#im;;
- : float = 3.

Also, the abs_number function can be applied to b as complex adheres to the interface prescribed by number:

# abs_number b;;
- : float = 3.60555127546398912
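The application above is accepted because the object type of complex has exactly the methods of number. An object with additional methods must either be coerced explicitly using the :> operator or be passed to a function whose argument type is left open. The following sketch illustrates both; the names abs_any and p are hypothetical.

```ocaml
class type number = object
  method re : float
  method im : float
end

let sqr x = x *. x

(* Closed type: accepts objects with exactly the methods of number. *)
let abs_number (z : number) = sqrt (sqr z#re +. sqr z#im)

(* Open type: accepts any object with at least re and im methods. *)
let abs_any z = sqrt (sqr z#re +. sqr z#im)

(* An immediate object with an extra method. *)
let p =
  object
    method re = 3.
    method im = 4.
    method name = "p"
  end

let () =
  (* The extra method is forgotten by an explicit coercion. *)
  Printf.printf "%g\n" (abs_number (p :> number));
  (* No coercion is needed for the open type. *)
  Printf.printf "%g %s\n" (abs_any p) p#name
```

The inferred type of abs_any is < im : float; re : float; .. > -> float, where the ellipsis denotes any further methods.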

Classed objects have an important advantage over immediate objects: relationships may be defined between classes.

2.4.2.3 Inheritance

A class representing real-valued numbers may be derived from our complex class using the inherit keyword:

# class real x =
    object
      inherit complex x 0.
    end;;
class real :
  float ->
  object val x : float val y : float method im : float method re : float end

Objects of this class can then be instantiated from a single floating-point value:

# let c = new real 5.;;
val c : real = <obj>

Resulting objects always return an imaginary part of zero:

# c#re;;
- : float = 5.
# c#im;;
- : float = 0.
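Derived classes may also add new methods or override inherited ones. The following is a minimal sketch in which the subclasses complex_with_norm and conjugate are hypothetical examples of ours; the complex class is repeated so that the block is self-contained.

```ocaml
class complex x y =
  object
    val x = x
    val y = y
    method re : float = x
    method im : float = y
  end

(* Hypothetical subclass: inherits re and im and adds a norm method. *)
class complex_with_norm a b =
  object (self)
    inherit complex a b
    method norm = sqrt (self#re *. self#re +. self#im *. self#im)
  end

(* Hypothetical subclass: overrides the inherited im method. *)
class conjugate a b =
  object
    inherit complex a b as super
    method! im = -. super#im
  end

let () =
  let c = new complex_with_norm 3. 4. in
  let d = new conjugate 2. 3. in
  Printf.printf "%g %g %g\n" c#norm d#re d#im
```

The method! keyword states explicitly that an existing method is being overridden, and the name super gives access to the inherited definition.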


Figure 2.2: The ocamlbrowser program allows the contents of modules to be examined graphically. The ability to examine the types of library functions is particularly useful. The type of the Array.fold_left function is illustrated here.

The ability to derive new types of object from existing types makes OO programming ideally suited to the creation of extensible programs.

Many more sophisticated uses of the OCaml object system, including multiple inheritance, parameterized classes, polymorphic methods, coercions, cloning, mutually recursive classes, binary methods and friends, are described in the OCaml reference manual [2]. As the OCaml object system is much more powerful than the OO approaches used in other languages, such as C++ and Java, the ways in which OCaml objects may be exploited are a current research topic in scientific programming.

2.5 OCaml browser

The standard OCaml distribution contains an ocamlbrowser program which can be used to examine available modules using a graphical interface, in particular the core library. Whilst developing an OCaml program, the ability to review the contents of libraries can be very useful. In particular, the ability to find the type of a function at the click of a button, or to examine the documentation in the corresponding interface file, can greatly speed development.

By default, ocamlbrowser shows only the contents of the core library modules and any OCaml sources found in the directory in which ocamlbrowser was started. Selecting a module name allows the contents of a particular module to be examined. Selecting a type (t) or value (v) in a module presents the type, or the type of the value, at the bottom of the ocamlbrowser window (illustrated in figure 2.2). The Impl and Intf buttons in the main window open an additional window for browsing source code, displaying the contents of the implementation (".ml") or interface (".mli") files respectively, targeting the selected content (illustrated in figure 2.3).

In many cases, the ability to use the ocamlbrowser program to peruse the contents of other, non-core libraries can be desirable. Other modules can be added to the default selection by selecting the Modules > Path editor... choice from the menu and adding new search paths.

Figure 2.3: The ocamlbrowser program contains a simple editor which can be used to examine the contents of OCaml source code. In particular, this simplifies the task of examining documentation in the interface files of libraries, e.g. the description of the Array.fold_left function shown here.

The final step in creating workable programs based around modules is the compilation of the parts of a program into complete executables.

2.6 Compilation

In almost all cases, the code for a program will be split between separate files. This is typically because a program will make use of libraries to provide part of its functionality. For example, a program dealing with arbitrary precision arithmetic is likely to make use of the Nat, Num or Big_int modules or an interface to the GNU MP library.

The separate files used to store parts of an OCaml program are treated as modules. Pairs of files with the same name except for the suffixes ".mli" and ".ml" are treated as the signatures and structures of a single module, respectively, the name of which is given by the filename with its first letter capitalized. Single files with the suffix ".ml" are treated similarly but as module structures without signatures (i.e. all definitions in the module will be visible).

The process of creating an executable from source code consists of two separate stages. The first stage, known as compilation, converts the human-readable source code into an intermediate form known as object code⁶. The second stage, known as linking, combines object codes to create an executable.

As we have already seen, OCaml programs can be compiled either into byte-code or into native-code executables. The OCaml compilers generate names for the object files with the same name as the source files but with suffixes based upon the mode of compilation. Specifically, source code in files with the suffixes ".mli" and ".ml" is compiled either into byte-code object files with suffixes ".cmi" and ".cmo", respectively, or into native-code object files with suffixes ".cmi" and ".cmx".

⁶ This is nothing to do with object orientation.

Figure 2.4: Dependencies between the First, Second and Main modules in the example program.

When linking, the order in which object files are specified can be important. The OCaml compilers consider the list of object files in the order in which they are specified. Consequently, object files must be specified after any other object files which they refer to.

For example, consider a program composed of three separate compilation units named "first", "second" and "main" split between five files: "first.mli", "first.ml", "second.mli", "second.ml" and "main.ml". The "first.ml" file contains:

let sentence = "This string is in the first compilation unit."
let sentence2 = "This string is also in the first compilation unit."

The "first.mli" file contains:

val sentence : string

The "second.ml" file contains:

let sentence = "This string is in the second compilation unit."

The "second.mli" file again contains:

val sentence : string

Finally, the "main.ml" file contains:

let _ =
  print_endline First.sentence;
  print_endline Second.sentence

The source code contained in these five files may be compiled into a byte-code executable by first compiling the interface (".mli") files, then compiling the implementation (".ml") files and, finally, linking the resulting object files (".cmo") to form an executable. The dependencies between the three modules are illustrated in figure 2.4. The dependencies between the five initial and five generated files are illustrated in figure 2.5.

Figure 2.5: Dependencies between the example files used to make the "first.cmo", "second.cmo" and "main.cmo" files before they are linked to create the final executable program.

The interface files "first.mli" and "second.mli", which represent module signatures, may be compiled to object form for a byte-code executable using:

$ ocamlc first.mli
$ ocamlc second.mli

This generates the files "first.cmi" and "second.cmi". The implementation files "first.ml", "second.ml" and "main.ml", representing module structures, may be compiled to object form for a byte-code executable by supplying the -c flag to suppress the generation of an executable:

$ ocamlc -c first.ml
$ ocamlc -c second.ml
$ ocamlc -c main.ml

This generates the files "first.cmo", "second.cmo" and "main.cmo". During this step, the "first.cmi" and "second.cmi" compiled interface files are used to enforce the proper use of the corresponding modules. Finally, a byte-code executable may be created by linking the resulting ".cmo" files, in this case to form an executable named "test":

$ ocamlc first.cmo second.cmo main.cmo -o test

Executing test prints the two strings defined in the First and Second modules, as expected:

$ ./test
This string is in the first compilation unit.
This string is in the second compilation unit.

The source code can be compiled into native-code, rather than byte-code, by using the ocamlopt compiler to compile the source code and then link the ".cmx" files instead:

$ ocamlopt first.mli
$ ocamlopt second.mli
$ ocamlopt -c first.ml
$ ocamlopt -c second.ml
$ ocamlopt -c main.ml
$ ocamlopt first.cmx second.cmx main.cmx -o test

Suffix      File type
name.ml     Structure of the Name module
name.mli    Signature of the Name module
name.cmi    Compiled signature
name.cmo    Byte-code compiled structure
name.cmx    Native-code compiled structure
name.cma    Byte-code archive
name.cmxa   Native-code archive
name        Final executable

Table 2.1: Types of file handled by the OCaml compilers.

The resulting executable runs in exactly the same way.

When linking, the relative order of the implementations of the First and Second modules is not important, provided they are specified before the Main module which depends upon them. For example, when linking, we could have specified the First and Second modules in reverse order with no ill-effect:

$ ocamlopt second.cmx first.cmx main.cmx -o test

However, as the Main module depends upon both the First and Second modules, trying to link without these two modules preceding the Main module will fail:

$ ocamlopt first.cmx main.cmx second.cmx -o test
No implementations provided for the following modules:
  Second referenced from main.cmx
$

Note also that the contents of the interface file "first.mli" hide the sentence2 variable defined in the "first.ml" file from external code (i.e. from the code in "main.ml"). Consequently, attempting to access this variable from the implementation of the Main module:

let _ =
  print_endline First.sentence2

would have caused a compile-time error:

$ ocamlopt first.mli
$ ocamlopt second.mli
$ ocamlopt -c first.ml
$ ocamlopt -c second.ml
$ ocamlopt -c main.ml
File "main.ml", line 3, characters 16-31:
Unbound value First.sentence2
$
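The same hiding effect can be reproduced inside a single file, which can be convenient when experimenting. The following is a minimal sketch, assuming that a pair of ".mli" and ".ml" files behaves like a structure constrained by a signature; the in-file module First below only stands in for the two separate files of the example.

```ocaml
(* A sketch: an in-file module standing in for first.mli and first.ml. *)
module First : sig
  val sentence : string   (* the only name exported, as in first.mli *)
end = struct
  let sentence = "This string is in the first compilation unit."
  (* Defined in the structure but absent from the signature, so hidden. *)
  let sentence2 = "This string is also in the first compilation unit."
end

let () = print_endline First.sentence

(* Referring to First.sentence2 at this point would be rejected by the
   compiler with an unbound value error, as in the transcript above. *)
```

Only the names listed in the signature are visible outside the module, so the constraint plays the role of the interface file.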


Figure 2.6: Dependencies between the compilation units of the core library.

Several different types of file are used in the process of creating an executable from OCaml source code. Table 2.1 lists the types of file handled by the OCaml compilers.

In the case of complicated programs or libraries, containing many modules and dependencies, tools to visualise the dependencies can be useful. The ocamldep program can be used to create a graph-theoretic representation of dependencies between compilation units. For example, the following generates a graph representing the dependencies between all of the ".ml" files in the current directory:

$ ocamldep *.ml >dep.dep

The resulting data can be converted into the format used by the freely-available, general-purpose graph plotting program dot using the freely-available ocamldot program:

$ ocamldot <dep.dep >dep.dot

The dot program may then be used to generate a diagram in PostScript format:

$ dot -Tps dep.dot >dep.ps

For example, the dependency graph between some of the compilation units which make up the core OCaml library (in the "stdlib" directory of the distribution) is illustrated in figure 2.6.

The OCaml compilers check that compilation commands, as well as the programs themselves, are correct.

2.6.1 Linking with libraries

The ability to perform the compilation and linking stages separately is more important when compiling programs which make use of libraries supplied in the form of object code. For example, consider the program, assumed to be in the file "fact.ml":

open Num
let rec factorial n =
  if n=0 then (num_of_int 1) else (num_of_int n) */ factorial (n-1);;
print_endline ("100! = "^(string_of_num (factorial 100)));;

This program uses the Num module (which implements arbitrary precision integer arithmetic). The namespace of the Num module is opened in order to use the infix operators +/, -/, */, // (equivalent to +, -, * and / for machine-precision integers, respectively). A function to compute the factorial of a machine-precision integer to give an arbitrary precision integer is then written in terms of */. Finally, the value of 100! is printed.
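To see what the arbitrary precision multiplication is doing for us, the same computation can be sketched without any library at all. The following is only an illustration under stated assumptions: the representation (a little-endian list of base-10000 digits) and the names mul_small, big_factorial and string_of_big are hypothetical and are not how the Num module is implemented.

```ocaml
(* Natural numbers as little-endian lists of digits in base 10000.
   The names below are illustrative and not part of any library. *)
let base = 10_000

(* Multiply a big number by a small positive machine integer k. *)
let mul_small digits k =
  let rec go carry = function
    | [] -> if carry = 0 then [] else (carry mod base) :: go (carry / base) []
    | d :: rest ->
        let p = d * k + carry in
        (p mod base) :: go (p / base) rest
  in
  go 0 digits

(* n! as a big number, mirroring the structure of the factorial above. *)
let rec big_factorial n =
  if n = 0 then [1] else mul_small (big_factorial (n - 1)) n

(* Most significant group first; later groups are padded to 4 digits. *)
let string_of_big digits =
  match List.rev digits with
  | [] -> "0"
  | top :: rest ->
      String.concat ""
        (string_of_int top :: List.map (Printf.sprintf "%04d") rest)

let () = print_endline ("100! = " ^ string_of_big (big_factorial 100))
```

Only multiplication of a big number by a machine integer is needed for the factorial, which keeps the sketch short; a real library must of course handle general arithmetic.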

As this program depends upon the Num module, attempting to compile this program without first specifying the object files which it depends upon will fail:

$ ocamlc fact.ml -o fact
Error while linking fact.cmo: Reference to undefined global 'Num'

The Num module is not part of the OCaml core library but is provided as a separate library "nums.cma" (byte-code) and "nums.cmxa" (native-code) which we must link in. Thus, we can compile our program by specifying "nums.cma" before "fact.ml":

$ ocamlc nums.cma fact.ml -o fact
$ ./fact
100! = 93326215443944152681699238856266700490715968264381621468592963895217599993229915608941463976156518286253697920827223758251185210916864000000000000000000000000

In fact, the OCaml byte-code can be loaded dynamically, as programs are running or while the top-level is in use. In the case of the top-level, the functionality required to use the Num module may be obtained using the #load directive:

# #load "nums.cma";;

The source code in the "fact.ml" file may then be executed using the #use directive:

# #use "fact.ml";;
val factorial : int -> Num.num = <fun>
100! = 93326215443944152681699238856266700490715968264381621468592963895217599993229915608941463976156518286253697920827223758251185210916864000000000000000000000000
- : unit = ()

Although we can now compile arbitrary programs into executables and dynamically load byte-code into a running top-level, we do not yet know how to build a custom-made top-level which includes extended functionality, such as the Num module.

2.7 Custom top-levels

The interactivity of the OCaml top-level can be very useful when developing programs. However, the default top-level is bare in terms of the modules it provides. For example, attempting to execute the previous program in the default top-level fails because the Num module, and all that it depends upon, is unavailable:

$ ocaml main.ml
Reference to undefined global 'Num'

In order to execute this program in a top-level, we can first create a customised top-level which includes the necessary functionality. Top-levels are compiled in a way similar to byte-code-compiled programs but using the ocamlmktop compiler instead of ocamlc. Thus, we can build an appropriate top-level, which we choose to call "num.top", using:

$ ocamlmktop nums.cma -o num.top

Executing the resulting "num.top" file gives us a top-level which includes the functionality of the Num module:

$ ./num.top
        Objective Caml version 3.08.0

#

The program contained in the "main.ml" file may then be executed in the running num.top top-level using the #use directive:

# #use "main.ml";;
val factorial : int -> Num.num = <fun>
100! = 93326215443944152681699238856266700490715968264381621468592963895217599993229915608941463976156518286253697920827223758251185210916864000000000000000000000000
- : unit = ()

We shall discuss libraries relevant to scientific computing in more detail later, in chapter 8. In particular, we shall use the -cclib option for the compilers to include the functionality of libraries written in other languages. Meanwhile, let us examine some of the sophisticated data structures and algorithms provided with OCaml, all of which are encapsulated in modules.


Chapter 3

Data Structures

Scientific applications are among the most computationally intensive programs in existence. This places enormous emphasis on the efficiency of such programs. However, much time can be wasted by optimising fundamentally inefficient algorithms and concentrating on low-level optimisations when much more productive higher-level optimisations remain to be exploited.

Too commonly, a given problem is shoe-horned into using arrays because more sophisticated data structures are prohibitively complicated to implement in many common languages¹. Examples of this problem, endemic in scientific computing, are rife. For example, finite element materials simulations, numerical differential equation solvers, numerical integration, implicit surface tessellation and simulations of particle or fluid dynamics are often based around uniformly subdivided arrays when they should be based around adaptively subdivided trees.

Occasionally, the poor performance of these inappropriately-optimised programs even drives the use of alternative (often approximate) techniques. Examples of this include the use of padding to round vector sizes up to integer-powers of two when computing numerical Fourier transforms (Fourier series). In order to combat this folklore-based approach to optimisation, we shall introduce a more formal approach to quantifying the efficiency of computations. This approach is well known in computer science as complexity theory.

The most important choices determining the efficiency of a program are the selection of algorithms and of data structures. Before delving into the broad spectrum of data structures accessible from OCaml, we shall study the notion of algorithmic complexity. This concept quantifies algorithm efficiency and, therefore, is essential for the objective selection of algorithms and data structures based upon their performance. Studying algorithmic complexity is the first step towards drastically improving program performance.

3.1 Algorithmic Complexity

In order to compare the efficiencies of algorithms meaningfully, the time requirements of an algorithm must first be quantified. Although it is theoretically possible to predict the exact time taken to perform an arbitrarily complicated computation given details of the computer and of the input data, such an approach quickly becomes intractable.

¹ Primarily Fortran.

Consequently, exactness is often relinquished in favour of an approximate but still quantitative measure of the time taken for an algorithm to execute. This approximation, the conventional notion of algorithmic complexity, is derived as an upper- or lower-bound or average-case² of the amount of computation required, measured in units of some suitably chosen primitive operations. Furthermore, asymptotic algorithmic efficiency is derived by considering these forms in the limit of infinite algorithmic complexity.

We shall begin by describing the notion of the primitive operations of an algorithm before deriving a mathematical description for the asymptotic complexity of an algorithm. Finally, we shall demonstrate the usefulness of algorithmic complexity in the optimization of a simple function.

3.1.1 Primitive operations

In order to derive an algorithmic complexity, it is necessary to begin by identifying some suitable primitive operations. The complexity of an algorithm is then measured as the total number of these primitive operations it performs. In order to obtain a complexity which reflects the time required to execute an algorithm, the primitive operations should ideally terminate after a constant amount of time. However, this restriction cannot be satisfied in practice (due to effectively-random interference from cache effects etc.), so primitive operations are typically chosen which terminate in a finite amount of time for any input, as close to a constant amount of time as possible.

For example, a first version of a function to raise a floating-point value x to a positive, integer power n may be implemented naively as:

# let rec ipow_1 x n = if n = 0 then 1. else x *. ipow_1 x (n - 1);;
val ipow_1 : float -> int -> float = <fun>

The ipow_1 function executes an algorithm described by this recurrence relation:

\[ x^n = \begin{cases} 1 & n = 0 \\ x \times x^{n-1} & \text{otherwise} \end{cases} \]

Consequently, this algorithm performs the floating-point multiply operation exactly n times in order to obtain its result, i.e. $x^0 = 1$, $x^1 = x \times 1$, $x^2 = x \times x \times 1$ and so on. Thus, the built-in floating-point multiplication function:

val ( *. ) : float -> float -> float

is a logical choice of primitive operation. Moreover, this function multiplies finite-precision numbers and the algorithms used to perform this operation in practice (which are almost always implemented as dedicated hardware) always perform a finite number of more primitive operations at the bit level, regardless of their input. Thus, this choice of primitive operation will execute in a finite time regardless of its input.

We shall now examine an approximate but practically very useful measure of algorithmic complexity before exploiting this notion in the optimisation of the ipow_1 function.

² Average-case complexity is particularly useful when statistics are available on the likelihood of different inputs.

3.1.2 Complexity

The complexity of an algorithm is the number of primitive operations it performs. For example, the complexity of the ipow_1 function is T(n) = n.
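The claim T(n) = n can be checked empirically by counting calls to the primitive operation. The following is a minimal sketch, assuming a global counter and a wrapper function mul around the built-in multiplication; both names, and count_ipow_1, are introduced here purely for illustration.

```ocaml
(* Number of floating-point multiplies performed so far. *)
let multiplies = ref 0

(* The primitive operation, instrumented to count its invocations. *)
let mul a b = incr multiplies; a *. b

(* The naive power function, written in terms of the counted multiply. *)
let rec ipow_1 x n = if n = 0 then 1. else mul x (ipow_1 x (n - 1))

(* T(n): the number of multiplies performed when computing x^n. *)
let count_ipow_1 n =
  multiplies := 0;
  ignore (ipow_1 2. n);
  !multiplies
```

Applying count_ipow_1 to any non-negative n returns n itself, in agreement with the recurrence relation.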

As the complexity can be a strongly dependent function of the input, the mathematical derivation of the complexity quickly becomes intractable for reasonably complicated algorithms.

In practice, this is addressed in two different ways. The first approach is to derive the tightest possible bounds of the complexity. If such bounds cannot be obtained then the second approach is to derive bounds in the asymptotic limit of the complexity.

3.1.2.1 Asymptotic complexity

An easier-to-derive and still useful indicator of the performance of a function is its asymptotic algorithmic complexity. This gives the asymptotic performance of the function in the limit of infinite execution time.

Three notations exist for the asymptotic algorithmic complexity of a function f(x):

\[
\begin{aligned}
\Omega(g(x)) &\;\Rightarrow\; C_1 \le \lim_{x \to \infty} \frac{f(x)}{g(x)} \\
O(g(x)) &\;\Rightarrow\; \lim_{x \to \infty} \frac{f(x)}{g(x)} \le C_2 \\
\Theta(g(x)) &\;\Rightarrow\; C_1 \le \lim_{x \to \infty} \frac{f(x)}{g(x)} \le C_2
\end{aligned}
\]

for some constants $C_1, C_2 \in \mathbb{R}$.

The Θ form of asymptotic complexity is more restrictive and, therefore, conveys more information. In particular, "f(x) is Θ(g(x))" implies both "f(x) is O(g(x))" and "f(x) is Ω(g(x))". The O notation is more commonly encountered as it represents the practically more important notion of the upper-bound of the complexity.

The formulation of the asymptotic complexity of a function leads to some simple but powerfulmanipulations:

• f(x) is O(a g(x)), a > 0 ⇒ f(x) is O(g(x)), i.e. constant prefactors can be removed.

• f(x) is $O(x^a + x^b)$, a > b > 0 ⇒ f(x) is $O(x^a)$, i.e. the polynomial term with the largest exponent dominates all other polynomial terms.

• f(x) is $O(x^a + b^x)$, a > 0, b > 1 ⇒ f(x) is $O(b^x)$, i.e. exponential terms dominate any polynomial terms.

• f(x) is $O(a^x + b^x)$, a > b > 0 ⇒ f(x) is $O(a^x)$, i.e. the exponential term $a^x$ with the largest base a dominates all other exponential terms.

These rules can be used to simplify an asymptotic complexity.
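As a short worked illustration of these rules, consider two hypothetical functions f and h, chosen here only as examples, whose complexities are known to be bounded as shown on the left:

```latex
\begin{align*}
f(x) \text{ is } O\!\left(4\,(x^3 + x^2)\right)
  &\Rightarrow f(x) \text{ is } O(x^3 + x^2) && \text{(constant prefactor removed)} \\
  &\Rightarrow f(x) \text{ is } O(x^3)       && \text{(largest exponent dominates)} \\[1ex]
h(x) \text{ is } O(x^3 + 2^x)
  &\Rightarrow h(x) \text{ is } O(2^x)       && \text{(exponential dominates polynomial)}
\end{align*}
```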


n   Computation        T(n)
0   1                  0
1   x                  0
2   x × x              1
3   x × x²             2
4   (x²)²              2
5   x × (x²)²          3
6   (x × x²)²          3
7   x × (x × x²)²      4
8   ((x²)²)²           3

Table 3.1: Complexity of the ipow_2 function measured as the number of multiply operations performed.

As the complexity of the ipow_1 function is T(n) = n, the asymptotic complexities are clearly O(n), Ω(n) and, therefore, Θ(n).

The algorithm behind the ipow_1 function can be greatly improved upon by reducing the computation by a constant proportion at a time. In this case, this can be achieved by trying to halve n repeatedly, rather than decrementing it. The following recurrence relation describes such an approach:

\[ x^n = \begin{cases} 1 & n = 0 \\ x & n = 1 \\ x \times \left(x^{\lfloor n/2 \rfloor}\right)^2 & n > 1 \text{ and } n \text{ odd} \\ \left(x^{n/2}\right)^2 & n > 1 \text{ and } n \text{ even} \end{cases} \]

which may be implemented as:

# let rec ipow_2 x n =
    if n = 0 then 1. else if n = 1 then x else
    let x2 = ipow_2 x (n/2) in let x2 = x2 *. x2 in
    if n mod 2 = 1 then x *. x2 else x2;;

val ipow_2 : float -> int -> float = <fun>

This variant is clearly more efficient as it avoids the re-computation of previous results, e.g. $x^4$ is factored into $(x^2)^2$ to use two floating-point multiplications instead of four. Quantifying exactly how much more efficient is suddenly a bit of a challenge!

We can begin by expanding the computation manually for some small n (see Table 3.1) as well as computing and plotting the exact number of multiplications performed for both algorithms as a function of n (shown in figure 3.1).

Lower and upper bounds of the complexity can be derived by considering the minimum and maximum number of multiplies performed in the body of the ipow_2 function, and the minimum and maximum depths of recursion.

The minimum number of multiplies performed in the body of a single call to the ipow_2 function is 0 for n ≤ 1, and 1 for n > 1. The function recursively halves n, giving a depth of recursion of 1 for n ≤ 1, and at least ⌊log₂ n⌋ for n > 1. Thus, a lower bound of the complexity is 0 for n ≤ 1, and log₂(n) - 1 for n > 1.

Figure 3.1: Complexities of the ipow_1 and ipow_2 functions in terms of the number T(n) of multiplies performed. [Plot of T(n) against n for n up to 100.]

Figure 3.2: Complexities of the ipow_2 function in terms of the number of multiplies performed, showing: exact complexity T(n) (green dots) and lower- and upper-bound algorithmic complexities log₂(n) - 1 ≤ T(n) ≤ 2(1 + log₂ n) for n > 1 (black lines). [Plot of T(n) against n for n up to 100.]

The maximum number of multiplies performed in the body of a single call to the ipow_2 function is 2. The depth of recursion is 1 for n ≤ 1 and does not exceed ⌈log₂ n⌉ for n > 1. Thus, an upper bound of the complexity is 0 for n ≤ 1, and 2(1 + log₂ n) for n > 1.

From these lower and upper bounds, the asymptotic complexities of the ipow_2 function are clearly Ω(ln n), O(ln n) and, therefore, Θ(ln n). The logarithmic complexity of ipow_2 (illustrated in figure 3.2) originates from the divide-and-conquer strategy, reducing the computation required by a constant factor (halving n) at each stage rather than by a constant absolute amount (decrementing n).
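These bounds can be checked numerically by counting multiplies, in the same spirit as the manual expansion for small n. The following is a minimal sketch, assuming a global counter and a counted multiply mul; the names multiplies, mul, t, log_2 and within_bounds are hypothetical helpers introduced only for this check.

```ocaml
(* Number of floating-point multiplies performed so far. *)
let multiplies = ref 0
let mul a b = incr multiplies; a *. b

(* ipow_2 written in terms of the counted multiply. *)
let rec ipow_2 x n =
  if n = 0 then 1. else if n = 1 then x else
  let x2 = ipow_2 x (n / 2) in
  let x2 = mul x2 x2 in
  if n mod 2 = 1 then mul x x2 else x2

(* T(n): the exact number of multiplies used to compute x^n. *)
let t n =
  multiplies := 0;
  ignore (ipow_2 2. n);
  !multiplies

(* Base-2 logarithm, defined locally to keep the sketch self-contained. *)
let log_2 x = log x /. log 2.

(* Does T(n) lie between log2(n) - 1 and 2 (1 + log2 n), for n > 1? *)
let within_bounds n =
  let tn = float_of_int (t n) and l = log_2 (float_of_int n) in
  l -. 1. <= tn && tn <= 2. *. (1. +. l)
```

Mapping t over the integers 0 to 8 gives the counts 0, 0, 1, 2, 2, 3, 3, 4 and 3, and within_bounds holds for every n from 2 to 100.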

The actual performance of these two versions of the ipow function can be measured (see Figure 3.3). As expected from the algorithmic complexity, we find that the ipow_2 function is considerably faster for large n.
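A measurement of this kind can be sketched using the Sys.time function, which returns the processor time used so far in seconds. This is only one possible approach and not necessarily the method used to produce the figure; the helper time, the repetition count and the values of n are arbitrary choices made for illustration, and the timings obtained will vary between machines.

```ocaml
let rec ipow_1 x n = if n = 0 then 1. else x *. ipow_1 x (n - 1)

let rec ipow_2 x n =
  if n = 0 then 1. else if n = 1 then x else
  let x2 = ipow_2 x (n / 2) in
  let x2 = x2 *. x2 in
  if n mod 2 = 1 then x *. x2 else x2

(* Processor time, in seconds, taken by reps calls of f x n. *)
let time reps f x n =
  let t0 = Sys.time () in
  for _i = 1 to reps do ignore (f x n) done;
  Sys.time () -. t0

let () =
  List.iter
    (fun n ->
      Printf.printf "n=%3d  ipow_1: %.4fs  ipow_2: %.4fs\n" n
        (time 100_000 ipow_1 1.0001 n) (time 100_000 ipow_2 1.0001 n))
    [10; 50; 100]
```

Each call is repeated many times because a single call is far too quick to be resolved by the clock.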

Figure 3.3: Measured performance of the ipow_1 and ipow_2 functions, which have asymptotic algorithmic complexities of Θ(n) and Θ(ln n), respectively. [Plot of time t against n for n up to 100.]

Asymptotic algorithmic complexity, as we have just described, should be considered first when trying to choose an efficient algorithm or data structure. On the basis of this, we shall now examine some of the wide variety of data structures accessible from OCaml in the context of the algorithmic complexities of operations over them.

Figure 3.4: Arrays are the simplest data structure, allowing fast, random access (reading or writing) to the ith element ∀ i ∈ {0 ... n - 1} where n is the number of elements in the array. Elements cannot be added or removed without copying the whole array.

3.2 Arrays

Of all the data structures, arrays will be the most familiar to the scientific programmer. Arrays are containers of fixed size which allow the ith element to be extracted in O(1) time (illustrated in figure 3.4). This makes them ideally suited to situations which require a container with fast random access. As the elements of arrays are typically stored contiguously in memory, they are often the most efficient container for iterating over the elements in order. This is the principal alluring feature which leads to their (over!) use in numerically intensive programs.

As we have already seen, the OCaml language provides a notation for describing arrays:

# let a = [|1; 2|]
  let b = [|3; 4; 5|]
  let c = [|6; 7; 9|];;
val a : int array = [|1; 2|]
val b : int array = [|3; 4; 5|]
val c : int array = [|6; 7; 9|]

In OCaml, arrays are mutable, meaning that the elements in an array can be altered in-place. The element at index i of an array b may be read using the short-hand syntax b.(i):

# b.(1);;
- : int = 4

Note that array indices run over {0 ... n - 1} for an array containing n elements.

Array elements may be set using the syntax used for mutable record fields, namely:

# c.(2) <- 8;;
- : unit = ()

The contents of the array c have now been altered:

# c;;
- : int array = [|6; 7; 8|]

Any attempt to access an array element which is outside the bounds of the array results in an exception being raised at run-time:

# c.(3) <- 8;;
Exception: Invalid_argument "index out of bounds".
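Such an exception can be caught like any other. As a minimal sketch, the hypothetical helper safe_get below (not part of the Array module) converts an out-of-bounds read into an option value instead of letting the exception propagate:

```ocaml
(* Read element i of a, returning None if i is out of bounds.
   safe_get is an illustrative helper, not a library function. *)
let safe_get a i =
  try Some a.(i) with Invalid_argument _ -> None

let c = [|6; 7; 8|]
let inside = safe_get c 1    (* index 1 exists *)
let outside = safe_get c 3   (* index 3 is beyond the last index, 2 *)
```

The pattern matches any Invalid_argument rather than a particular message, so that the sketch does not depend on the exact wording of the error.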

The mutability of arrays typically leads to the use of an imperative style when arrays are being used.

The core OCaml library provides several functions which act upon arrays in the Array module. We shall examine some of these functions before looking at the more exotic array functions offered by OCaml.

The append function concatenates two arrays:

# Array.append a b;;
- : int array = [|1; 2; 3; 4; 5|]

The append function has complexity Θ(n) where n is the length of the resulting array.

The concat function concatenates a list of arrays:

# let e = Array.concat [a; b; c];;
val e : int array = [|1; 2; 3; 4; 5; 6; 7; 8|]

The concat function has complexity Θ(n + m) where n is the length of the resulting array and m is the length of the supplied list.

A new variable created from an existing array refers to the existing array. Thus the complexity of creating a new variable which refers to an existing array is Θ(1), i.e. independent of the number of elements in the array. However, all alterations to the array are visible from any variables which refer to the array. For example, the following creates a variable called d which refers to the same array as the variable called c:

# let d = c;;
val d : int array = [|6; 7; 8|]

The effect of altering the array via either c or d can be seen from both c and d:


# d.(0) <- 17;;
- : unit = ()
# (c, d);;
- : int array * int array = ([|17; 7; 8|], [|17; 7; 8|])
# c.(0) <- 6;
  (c, d);;
- : int array * int array = ([|6; 7; 8|], [|6; 7; 8|])

Figure 3.5: The higher-order Array.init function creates an array $a_i = f(i)$ for i ∈ {0 ... n - 1} using the given function f.

The copy function returns a new array which contains the same elements as the given array. For example, the following creates a variable d (superseding the previous d) which is a copy of c:

# let d = Array.copy c;;
val d : int array = [|6; 7; 8|]

Altering the elements of the copied array d does not alter the elements of the original array c:

# d.(0) <- 17; (c, d);;
- : int array * int array = ([|6; 7; 8|], [|17; 7; 8|])
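The distinction between referring to an array and copying it can also be tested directly using the physical equality operator ==, which asks whether two values are the very same object in memory. A minimal sketch, using the hypothetical names alias and copy:

```ocaml
let c = [|6; 7; 8|]
let alias = c              (* a second name for the same array *)
let copy = Array.copy c    (* a fresh array with the same elements *)

let same_before = (copy = c)    (* structural equality: same contents *)

let () = alias.(0) <- 17        (* alters the array shared by c and alias *)

let shared = (alias == c)       (* physically the same array *)
let distinct = not (copy == c)  (* the copy is a different array *)
```

After the assignment, c.(0) is 17 whereas copy.(0) is still 6.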

The sub function returns a new array which contains a copy of the range of elements specified by a starting index and a length. For example, the following copies a sub-array of 5 elements starting at index 2 (the third element):

# Array.sub e 2 5;;
- : int array = [|3; 4; 5; 6; 7|]

In addition to these conventional array functions, OCaml offers some more exotic functions. We shall now examine these functions in more detail.

The higher-order init function creates an array, filling the elements with the result of applying the given function to the index i ∈ {0 ... n - 1} of each element (illustrated in figure 3.5). For example, the following creates the array $a_i = i^2$ for i ∈ {0 ... 3}:

# let a = Array.init 4 (fun i -> i*i);;
val a : int array = [|0; 1; 4; 9|]

The Array.init function is analogous to the list_init function we defined on page 37.

The higher-order function iter executes a given function on each element in the given array in turn and returns the value () of type unit. The purpose of the function passed to this higher-order function must, therefore, lie in any side-effects it incurs. Hence, the iter function is only of use in the context of imperative, and not functional, programming. For example, the following prints the elements of the array a:

# Array.iter (fun e -> print_endline (string_of_int e)) a;;
0
1
4
9
- : unit = ()

Figure 3.6: The higher-order Array.map function creates an array containing the result of applying the given function f to each element in the given array a.

Figure 3.7: The higher-order Array.fold_left function repeatedly applies the given function f to the current accumulator and the current array element to produce a new accumulator to be applied with the next array element.

The map function applies a given function to each element in the given array, returning an array containing each result (illustrated in figure 3.6). For example, the following creates the array $b_i = a_i^2$:

# let b = Array.map (fun e -> e * e) a;;
val b : int array = [|0; 1; 16; 81|]

The higher-order fold_left and fold_right functions are more general and more useful than map. The fold functions accumulate the result of applying their function arguments to each element in turn. The fold_left function is illustrated in figure 3.7.

In the simplest case, the fold functions can be used to accumulate the sum or product of the elements of an array by folding the addition or multiplication operators over the elements of the array respectively, starting with a suitable base case. For example, the sum of the elements of the array [|3; 4; 5|] (the value originally bound to the variable b) is 3 + 4 + 5 = 12:

# Array.fold_left ( + ) 0 [|3; 4; 5|];;
- : int = 12


We have already encountered this functionality in the context of the IntRange module developed in section 2.3.3. However, arrays may contain arbitrary data of arbitrary types.

For example, an array could be converted to a list by prepending elements using the cons operator:

# let to_list a = Array.fold_right (fun e l -> e :: l) a [];;
val to_list : 'a array -> 'a list = <fun>
# to_list [|0; 1; 4; 9|];;
- : int list = [0; 1; 4; 9]

This to_list function uses fold_right to cumulatively prepend to the list l each element e of the array a in reverse order, starting with the base-case of an empty list []. The result is a list containing the elements of the array in the correct order. The to_list function is, in fact, already in the Array module:

# Array.to_list [|0; 1; 4; 9|];;
- : int list = [0; 1; 4; 9]

Although slightly more complicated than iter and map, the fold_left and fold_right functions are very useful because they can produce results of any type, including accumulated primitive types as well as different data structures.
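The generality of the folds can be illustrated by a few accumulations over the same array, each differing only in the function and the base case supplied. This is a minimal sketch; the variable names are chosen here for illustration.

```ocaml
let b = [|3; 4; 5|]

(* Accumulating primitive values: start from the identity of each operator. *)
let sum = Array.fold_left ( + ) 0 b
let product = Array.fold_left ( * ) 1 b
let maximum = Array.fold_left max min_int b

(* Accumulating a different data structure: prepending each element in
   turn from the left builds a list of the elements in reverse order. *)
let reversed = Array.fold_left (fun l e -> e :: l) [] b
```

The accumulator is an int in the first three cases and an int list in the last, although the array being folded over is the same.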

When pattern matching, the contents of arrays can be used in patterns. For example, the vector cross product a × b could be written:

# let vec_cross a b = match (a, b) with
    ([|x1; y1; z1|], [|x2; y2; z2|]) ->
      [|y1*.z2 -. z1*.y2; z1*.x2 -. x1*.z2; x1*.y2 -. y1*.x2|]
  | _ -> raise (Invalid_argument "cross");;
val vec_cross : float array -> float array -> float array = <fun>

Thus, patterns over arrays can be used to test the number and value of all elements, forming a useful complement to the map and fold algorithms.
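As a brief usage sketch, the cross product of the unit vectors along the x and y axes should be the unit vector along z, and arrays of any other length should fall through to the catch-all pattern. The names ex, ey, ez and rejects are introduced here for illustration only.

```ocaml
let vec_cross a b = match (a, b) with
  | ([|x1; y1; z1|], [|x2; y2; z2|]) ->
      [|y1 *. z2 -. z1 *. y2; z1 *. x2 -. x1 *. z2; x1 *. y2 -. y1 *. x2|]
  | _ -> raise (Invalid_argument "cross")

(* Unit vectors along the x and y axes. *)
let ex = [|1.; 0.; 0.|]
let ey = [|0.; 1.; 0.|]

(* Their cross product is the unit vector along the z axis. *)
let ez = vec_cross ex ey

(* True if applying vec_cross to a and b raises Invalid_argument. *)
let rejects a b =
  try ignore (vec_cross a b); false with Invalid_argument _ -> true
```

The array patterns only match arrays of exactly three elements, so two-dimensional vectors are rejected at run-time rather than silently mishandled.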

3.3 Lists

Arguably the simplest and most commonly used data structure, lists allow two fundamental operations (see figure 3.8). The first is decapitation of a list into two parts, the head (the first element of the list) and the tail (a list containing the remaining elements). The second is the reverse operation of prepending an element onto a list to create a new list. The complexities of both operations are Θ(1), i.e. the time taken to perform these operations is independent of the number of elements in the list. Thus, lists are ideally suited for the creation of a data structure containing an unknown number of elements (such as the loading of an arbitrarily-long sequence of numbers).

Figure 3.8: Lists are the simplest, arbitrarily-extensible data structure. Decapitation splits a list $l_i$, i ∈ {0 ... n - 1}, into the head element h and the tail list $t_i$, i ∈ {0 ... n - 2}.

As we have already seen, OCaml implements lists in the language itself. In particular, the cons operator :: can be used both to prepend an element and to represent decapitated lists in patterns. Unlike arrays, the implementation of lists is functional, so operations on lists produce new lists.
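Both uses of the cons operator can be seen in the following minimal sketch, in which the names l, l' and sum are chosen for illustration:

```ocaml
(* Prepending with :: builds a new list in constant time. *)
let l = 1 :: [2; 3]

(* Decapitation in a pattern: h is the head and t is the tail. *)
let rec sum = function
  | [] -> 0
  | h :: t -> h + sum t

(* Prepending to l creates a new list; l itself is left unchanged and
   becomes the tail of the new list. *)
let l' = 0 :: l
```

Because the new list shares the old one as its tail, nothing is copied when an element is prepended.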

The List module contains the iter, map, fold_left and fold_right functions, equivalent to those for arrays, as well as an append function and a flatten function, which provides equivalent functionality to that of Array.concat (i.e. to concatenate a list of lists into a single list). In particular, the append function is also available as the infix operator @:

# [1; 2] @ [3; 4];;
- : int list = [1; 2; 3; 4]

The List module also contains several functions for sorting and searching. The contents of a list may be sorted using the higher-order List.sort function, into an order specified by a given total order function (the compare function in the Pervasives module, in this case):

# List.sort compare [1; 5; 3; 4; 7; 9];;
- : int list = [1; 3; 4; 5; 7; 9]

An element may be tested for membership in a list using the List.mem function:

# List.mem 4 [1; 3; 4; 5; 7; 9];;
- : bool = true
# List.mem 6 [1; 3; 4; 5; 7; 9];;
- : bool = false

Similarly, the first element matching a given predicate function may be extracted using the higher-order function List.find:

# List.find (fun i -> (i-6)*i > 0) [1; 3; 4; 5; 7; 9];;
- : int = 7

This function raises the Not_found exception if all the elements in the given list fail to match the given predicate:

# List.find (fun i -> (i-6)*i = 0) [1; 3; 4; 5; 7; 9];;
Exception: Not_found.
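When the absence of a match is an expected outcome rather than an error, the exception can be handled at the point of the search. The following is a minimal sketch in which find_or is a hypothetical helper (not part of the List module) that returns a default value supplied by the caller:

```ocaml
(* The first element of l satisfying p, or default if there is none.
   find_or is an illustrative helper, not a library function. *)
let find_or default p l =
  try List.find p l with Not_found -> default

let l = [1; 3; 4; 5; 7; 9]
let found = find_or 0 (fun i -> (i - 6) * i > 0) l
let missing = find_or 0 (fun i -> (i - 6) * i = 0) l
```

Here found is 7, as in the transcript above, whereas missing falls back to the default of 0.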


The contents of a list of key-value pairs may be searched using the List.assoc function to find the value corresponding to the first matching key. For example, the following list l contains (i, i^2) key-value pairs:

# let l = List.map (fun i -> (i, i*i)) [1; 2; 3; 4; 5];;
val l : (int * int) list = [(1, 1); (2, 4); (3, 9); (4, 16); (5, 25)]

Searching l for the key i = 4 using the List.assoc function finds the corresponding value i^2 = 16:

# List.assoc 4 l;;
- : int = 16

As we shall see in sections 3.5 and 3.6, equivalent functionality is provided with considerably better asymptotic performance by the hash table and map data structures.
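Like List.find, the List.assoc function raises Not_found when no key matches. A minimal sketch of handling absent keys follows, where assoc_default is a hypothetical helper of our own rather than a library function:

```ocaml
(* The list of (i, i*i) pairs used above. *)
let squares = List.map (fun i -> (i, i * i)) [1; 2; 3; 4; 5]

(* Hypothetical helper: look up a key, returning a default value
   instead of raising Not_found when the key is absent. *)
let assoc_default default key pairs =
  try List.assoc key pairs with Not_found -> default

let found = assoc_default 0 4 squares    (* key present *)
let missing = assoc_default 0 7 squares  (* key absent *)

(* List.mem_assoc tests for the presence of a key without fetching it. *)
let has_three = List.mem_assoc 3 squares
```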

The ability to grow lists makes them ideal for filtering operations based upon arbitrary predicate functions. The List.partition function splits a given list into two lists containing those elements which match the predicate and those which do not. The following example uses the predicate x_i ≤ 3:

# List.partition (fun x -> x <= 3) [1; 2; 3; 4; 5];;
- : int list * int list = ([1; 2; 3], [4; 5])

Similarly, the List.filter function returns a list containing only those elements which matched the predicate, i.e. the first list that List.partition would have returned:

# List.filter (fun x -> x <= 3) [1; 2; 3; 4; 5];;
- : int list = [1; 2; 3]

The partition and filter functions are ideally suited to arbitrarily extensible data structures such as lists because the length of the output(s) cannot be precalculated.
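To illustrate why growable lists suit this task, here is a sketch of how partitioning might be written by hand. The name partition_sketch is ours and this is not the library's implementation:

```ocaml
(* Hypothetical re-implementation of partition, for illustration only:
   each element is prepended to one of two growing lists. *)
let partition_sketch pred l =
  List.fold_right
    (fun x (yes, no) -> if pred x then (x :: yes, no) else (yes, x :: no))
    l ([], [])

let small, large = partition_sketch (fun x -> x <= 3) [1; 2; 3; 4; 5]
```

Neither output list needs its length to be known in advance; each simply grows by one element at a time.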

In addition to the conventional higher-order iter, map, fold, sorting and searching functions, the List module contains several functions which act upon pairs of lists. These functions all assume the lists to be of equal length. If they are found to be of different lengths then an Invalid_argument exception is raised. We shall now elucidate these functions using examples from vector algebra.

The higher-order function map2 applies a given function to each pair of elements from two equal-length lists, producing a single list containing the results. The type of the map2 function is:

val map2 : ('a -> 'b -> 'c) -> 'a list -> 'b list -> 'c list

The map2 function can be used to write a function to convert a pair of lists into a list of pairs:

# let list_combine a b = List.map2 (fun a b -> (a, b)) a b;;
val list_combine : 'a list -> 'b list -> ('a * 'b) list = <fun>

Applying the list_combine function to a pair of lists of equal lengths combines them into a list of pairs:

# list_combine [1; 2; 3] [2; 3; 4];;
- : (int * int) list = [(1, 2); (2, 3); (3, 4)]

Applying the list_combine function to a pair of lists of unequal lengths causes an exception to be raised by the map2 function:

# list_combine [1; 2; 3] [2; 3; 4; 5];;
Exception: Invalid_argument "List.map2".

In fact, the functionality of this list_combine function is already provided by the combine function in the List module.

Vector addition can be written in terms of the map2 function:

# let vec_add = List.map2 ( +. );;
val vec_add : float list -> float list -> float list = <fun>

When given a pair of lists a and b of floating-point numbers, this function creates a list containing the sum of each corresponding pair of elements from the two given lists, i.e. a + b:

# vec_add [1.; 2.; 3.] [2.; 3.; 4.];;
- : float list = [3.; 5.; 7.]
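Other element-wise vector operations follow the same pattern. As a sketch, with vec_sub and vec_scale being names of our own choosing:

```ocaml
(* Element-wise difference of two equal-length vectors. *)
let vec_sub = List.map2 ( -. )

(* Multiply every element of a vector by the scalar s. *)
let vec_scale s = List.map (fun x -> s *. x)

let d = vec_sub [2.; 3.; 4.] [1.; 2.; 3.]
let e = vec_scale 2. [1.; 2.; 3.]
```

As with vector addition, applying vec_sub to lists of unequal lengths raises the Invalid_argument exception.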

The higher-order fold_left2 and fold_right2 functions in the List module are similar to the fold_left and fold_right functions, except that they act upon two lists simultaneously instead of one. The types of these functions are:

val fold_left2 : ('a -> 'b -> 'c -> 'a) -> 'a -> 'b list -> 'c list -> 'a
val fold_right2 : ('a -> 'b -> 'c -> 'c) -> 'a list -> 'b list -> 'c -> 'c

Thus, the fold_left2 and fold_right2 functions can be used to implement many algorithms which consider each pair of elements in a pair of lists in turn. For example, the vector dot product could be written succinctly using fold_left2 by accumulating the products of element pairs from the two lists:

# let vec_dot = List.fold_left2 (fun d a b -> d +. a *. b) 0.;;
val vec_dot : float list -> float list -> float = <fun>

When given two lists, a and b, of floating-point numbers, this function accumulates the products a_i × b_i of each pair of elements from the two given lists, i.e. the vector dot product a·b:

# vec_dot [1.; 2.; 3.] [2.; 3.; 4.];;
- : float = 20.
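Functions written in this style compose readily. For instance, the Euclidean length of a vector might be sketched in terms of the dot product, where vec_norm is a hypothetical helper of ours:

```ocaml
(* Dot product, as defined in the text. *)
let vec_dot = List.fold_left2 (fun d a b -> d +. a *. b) 0.

(* Hypothetical helper: Euclidean length |a| = sqrt (a . a). *)
let vec_norm a = sqrt (vec_dot a a)

let n = vec_norm [3.; 4.]
```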


The ability to write such functions using maps and folds is clearly an advantage in the context of scientific computing. Moreover, this style of programming can be seamlessly converted to using much more exotic data structures, as we shall see later in this chapter. In some cases, algorithms over lists cannot be expressed easily in terms of maps and folds. In such cases, pattern matching can be used instead.

Patterns over lists can not only reflect the number and value of all elements, as they can in arrays, but can also be used to test initial elements in the list using the cons operator :: to decapitate the list. In particular, pattern matching can be used to examine sequences of list elements. For example, the following function "downsamples" a signal, represented as a list of floating-point numbers, by averaging pairs of elements:

# let rec downsample = function
    [] -> []
  | h1 :: h2 :: t -> (h1 +. h2) /. 2. :: downsample t
  | [_] -> invalid_arg "downsample";;
val downsample : float list -> float list = <fun>

This is a simple, recursive function which uses a pattern match containing three patterns. The first pattern downsamples the empty list to the empty list. This pattern acts as the base-case for the recursive calls of the function (equivalent to the base-case of a recurrence relation). The second pattern matches the first two elements in the list (h1 and h2) and the remainder of the list (t). Matching this pattern results in prepending the average of h1 and h2 onto the list resulting from downsampling the remaining list t. The third pattern matches a list containing any single element, raising an exception if this erroneous input is encountered:

# downsample [5.];;
Exception: Invalid_argument "downsample".

As these three patterns are completely distinct (any input list necessarily matches one and only one pattern) they could, equivalently, have been presented in any order in the pattern match.

The downsample function can be used to downsample an eight-element list into a four-element list by averaging pairs of elements:

# downsample [0.; 1.; 0.; -1.; 0.; 1.; 0.; -1.];;
- : float list = [0.5; -0.5; 0.5; -0.5]

The ability to perform pattern matches over lists is extremely useful, resulting in a very concise syntax for many operations which act upon lists.
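As a further sketch of examining sequences of elements, the following hypothetical function (differences is our own name) computes the differences between neighbouring samples. Unlike downsample, consecutive pairs overlap here, so the second element is kept in the tail for the recursive call:

```ocaml
(* Hypothetical example: differences between neighbouring samples.
   The pattern h1 :: (h2 :: _ as t) names the first two elements but
   keeps h2 in the tail t, so that consecutive pairs overlap. *)
let rec differences = function
  | h1 :: (h2 :: _ as t) -> (h2 -. h1) :: differences t
  | [_] | [] -> []

let d = differences [0.; 1.; 3.; 6.]
```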

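Note that, in the context of lists, the iter and map functions can be expressed succinctly in terms of fold_left (in the case of map, the accumulated list must be reversed because fold_left conses the results on in reverse order):

let iter f l = List.fold_left (fun accu e -> f e) () l
let map f l = List.rev (List.fold_left (fun accu e -> f e :: accu) [] l)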
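Consing each result onto the accumulator of a left fold builds the output back to front. As a sketch (map_right and map_left_reversed are names of our own choosing), folding from the right preserves the order directly:

```ocaml
(* Hypothetical map built on fold_right: results keep the input order. *)
let map_right f l = List.fold_right (fun e accu -> f e :: accu) l []

(* Consing onto the accumulator of fold_left yields the reverse order. *)
let map_left_reversed f l = List.fold_left (fun accu e -> f e :: accu) [] l

let a = map_right (fun x -> 2 * x) [1; 2; 3]
let b = map_left_reversed (fun x -> 2 * x) [1; 2; 3]
```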

and that all of these functions can be expressed, albeit more verbosely, using pattern matching. The iter function simply applies the given function f to the head h of the list and then recurses to iterate over the elements in the tail t:

let rec iter f l = match l with
    h :: t -> f h; iter f t
  | [] -> ()

The map function applies the given function f to the head h of the list, prepending the result f h onto the result map f t of recursively mapping over the elements in the tail t:

let rec map f l = match l with
    h :: t -> f h :: map f t
  | [] -> []

The fold_left function applies the given function f to the current accumulator accu and the head h of the list, passing the result as the accumulator for folding over the remaining elements in the tail t of the list:

let rec fold_left f accu l = match l with
    h :: t -> fold_left f (f accu h) t
  | [] -> accu

The fold_right function applies the given function f to the head h of the list and the result of recursively folding over the remaining elements in the tail t of the list:

let rec fold_right f l accu = match l with
    h :: t -> f h (fold_right f t accu)
  | [] -> accu
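The difference between the two folds is easiest to see with a function that is not associative, such as subtraction. A small sketch, using the library versions of the folds:

```ocaml
(* fold_left nests to the left:   ((0 - 1) - 2) - 3 *)
let left = List.fold_left (fun accu h -> accu - h) 0 [1; 2; 3]

(* fold_right nests to the right:  1 - (2 - (3 - 0)) *)
let right = List.fold_right (fun h accu -> h - accu) [1; 2; 3] 0
```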

Thus, the map and fold functions can be thought of as higher-order functions which have been factored out of many algorithms. In the context of scientific programming, factoring out higher-order functions can greatly increase clarity, often providing new insights into the algorithms themselves. In chapter 9, we shall pursue this, developing several functions which supplement those in the core library.

As the algorithms provided over arrays refer to those using lists, the two functions for converting between lists and arrays are both in the Array module. The Array.of_list function creates an array from the given list and the Array.to_list function creates a list from the given array.
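A brief sketch of converting in both directions follows, using an in-place array sort between the two conversions (the names l, a and sorted are ours):

```ocaml
let l = [3; 1; 2]

(* Create a fresh array holding the elements of the list. *)
let a = Array.of_list l

(* Array.sort sorts the array in place; the list l is unaffected. *)
let () = Array.sort compare a

(* Convert the sorted array back into a list. *)
let sorted = Array.to_list a
```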

Having examined the two simplest containers provided by the OCaml language itself, we shall now examine some more sophisticated containers which are provided in the core library.

3.4 Sets

In the context of data structures, the term "set" typically means a sorted, unique, associative container. Sets are "sorted" containers because the elements in a set are stored in order according to a given comparison function. Sets are "unique" containers because they do not duplicate elements (adding an existing element to a set results in the same set). Sets are "associative" containers because elements determine how they are stored (using the specified comparison function).

The OCaml core library provides sets which are implemented as balanced binary trees³. This allows a single element to be added to or removed from a set containing n elements in O(ln n) time. Moreover, the OCaml implementation also provides functions union, inter and diff for performing the set-theoretic operations union, intersection and difference, respectively.

In order to implement the set-theoretic operations between sets efficiently, the sets used must be based upon the same comparison function. The set implementation in the OCaml core library enforces this requirement using a construct called a functor. Whereas functions map values to values, functors map modules to modules.

The Set.Make functor transforms a simple module, which must implement the element type t and a total-ordering function compare for comparing pairs of elements, into a complicated module which implements a set of elements of type t using the comparison function compare.

For example, elements in a set of integers may be represented by the following module which we choose to call Key:

# module Key =
    struct
      type t = int
      let compare (i : int) (j : int) = if i < j then -1 else if i = j then 0 else 1
    end;;
module Key : sig type t = int val compare : int -> int -> int end

The type t is used to specify the type of an element in the set. The compare function provides a total ordering over the elements of the set. This function returns an integer value which must be less than zero, zero or greater than zero when the first element compares as less than, equal to or greater than the second, respectively. In this case, we have chosen to specify a comparison function specific to values of type int (the type int -> int -> int of the compare function follows from the type annotations on its arguments; without them, the polymorphic operators < and = would make the function polymorphic). In general, the slower but simpler and polymorphic comparison function compare, implemented in the Pervasives module, can be used to compare pairs of values of various types. The polymorphic comparison function compare is equivalent to:

# let compare i j = if i < j then -1 else if i = j then 0 else 1;;
val compare : 'a -> 'a -> int = <fun>

and could have been used in the Key module by specifying:

let compare = compare

A module IntSet, representing a set of integers, may then be created by applying the Set.Make functor to our Key module in a single line of code, producing a module implementing a substantial number of functions:

# module IntSet = Set.Make(Key);;
module IntSet :
  sig
    type elt = Key.t
    type t = Set.Make(Key).t
    val empty : t
    val is_empty : t -> bool
    val mem : elt -> t -> bool
    val add : elt -> t -> t
    val singleton : elt -> t
    val remove : elt -> t -> t
    val union : t -> t -> t
    val inter : t -> t -> t
    val diff : t -> t -> t
    val compare : t -> t -> int
    val equal : t -> t -> bool
    val subset : t -> t -> bool
    val iter : (elt -> unit) -> t -> unit
    val fold : (elt -> 'a -> 'a) -> t -> 'a -> 'a
    val for_all : (elt -> bool) -> t -> bool
    val exists : (elt -> bool) -> t -> bool
    val filter : (elt -> bool) -> t -> t
    val partition : (elt -> bool) -> t -> t * t
    val cardinal : t -> int
    val elements : t -> elt list
    val min_elt : t -> elt
    val max_elt : t -> elt
    val choose : t -> elt
    val split : elt -> t -> t * bool * t
  end

³ Balanced binary trees will be discussed in more detail later in this chapter.

The IntSet module can be used to manipulate sets of integers. We shall only make use of some of the functions provided. Naturally, all of the functions are documented in the OCaml manual [2] and in the Set.Make functor of the core library itself, which can be read using ocamlbrowser.

The type IntSet.t represents a set of elements of the type IntSet.elt (which is the same as the specified type Key.t, i.e. int).

The set of integers containing no elements can be obtained as:

# IntSet.empty;;
- : IntSet.t = <abstr>

In order to demonstrate the use of sets, we shall define some helper functions to add a list of integers to a given set and to convert between lists and sets of integers. A function to add a list l of elements to an existing set s is most easily obtained by folding the add function of the IntSet module over the list:

# let add_list l s = List.fold_right IntSet.add l s;;
val add_list : IntSet.elt list -> IntSet.t -> IntSet.t = <fun>

A function to create a set from a given list l can then be written in terms of the add_list function by adding the list of elements to the empty set:

# let of_list l = add_list l IntSet.empty;;
val of_list : IntSet.elt list -> IntSet.t = <fun>

The elements of a set may be extracted using the elements function.

A set containing a single element is called a singleton set and can be created using the IntSet.singleton function:

# let s = IntSet.singleton 3;;
val s : IntSet.t = <abstr>
# IntSet.elements s;;
- : IntSet.elt list = [3]

As sets are implemented in a functional style, adding an element to a set returns the resulting set:

# let s = IntSet.add 5 s;;
val s : IntSet.t = <abstr>
# IntSet.elements s;;
- : IntSet.elt list = [3; 5]

By adding some more integers to our set we can check that duplicates are removed:

# let s = add_list [10; 1; 9; 2; 8; 4; 7; 4; 6; 7; 7] s;;
val s : IntSet.t = <abstr>

The number of elements in a set, known as the cardinality of the set, is given by:

# IntSet.cardinal s;;
- : int = 10

This indicates that the duplicates have been removed, leaving only the integers {1 ... 10}. In order to check this we can convert the set back into a list:

# IntSet.elements s;;
- : IntSet.elt list = [1; 2; 3; 4; 5; 6; 7; 8; 9; 10]

Note that the IntSet.elements function provided the elements in the set in the order prescribed by our Key.compare function.
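To illustrate how the comparison function dictates the order of the elements, consider the following sketch in which the arguments of the comparison are swapped. The modules Descending and DescSet are hypothetical names of our own:

```ocaml
(* Hypothetical module: integers ordered from largest to smallest. *)
module Descending = struct
  type t = int
  (* The compare on the right-hand side is the built-in polymorphic one. *)
  let compare (i : int) (j : int) = compare j i
end

module DescSet = Set.Make (Descending)

let s = List.fold_right DescSet.add [3; 1; 2; 3] DescSet.empty
let elts = DescSet.elements s
```

With this ordering, the elements are listed largest first and the "minimum" element according to the set is the largest integer.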

We can also demonstrate the set-theoretic union, intersection and difference operations. For example:

{1, 3, 5} ∪ {3, 5, 7} = {1, 3, 5, 7}

# IntSet.elements (IntSet.union (of_list [1; 3; 5]) (of_list [3; 5; 7]));;
- : int list = [1; 3; 5; 7]

{1, 3, 5} ∩ {3, 5, 7} = {3, 5}

# IntSet.elements (IntSet.inter (of_list [1; 3; 5]) (of_list [3; 5; 7]));;
- : int list = [3; 5]

{1, 3, 5} \ {3, 5, 7} = {1}

# IntSet.elements (IntSet.diff (of_list [1; 3; 5]) (of_list [3; 5; 7]));;
- : int list = [1]

The subset function tests if A ⊆ B. For example, {4, 5, 6} ⊆ {1 ... 10}:

# IntSet.subset (of_list [4; 5; 6]) s;;
- : bool = true

The split function divides a set into those elements less than a given element and those elements greater than the given element, as well as providing a boolean indicating whether the given element was present. For example, splitting s at 5 results in the sets {1 ... 4} and {6 ... 10} and the boolean true because 5 ∈ s:

# let (l, e, u) = IntSet.split 5 s in
  (IntSet.elements l, e, IntSet.elements u);;
([1; 2; 3; 4], true, [6; 7; 8; 9; 10])
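Two applications of split suffice to extract a range of elements. The following is a sketch only, in which the helper between is hypothetical and the integer set module is rebuilt inline so that the example is self-contained:

```ocaml
module IntSet = Set.Make (struct
  type t = int
  let compare (i : int) (j : int) = compare i j
end)

(* Hypothetical helper: the elements x of s with lo < x < hi. *)
let between lo hi s =
  let _, _, above = IntSet.split lo s in     (* elements greater than lo *)
  let below, _, _ = IntSet.split hi above in (* of those, the ones below hi *)
  below

let s = List.fold_right IntSet.add [1; 2; 3; 4; 5; 6; 7; 8; 9; 10] IntSet.empty
let middle = IntSet.elements (between 3 8 s)
```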

Applying the built-in polymorphic comparison functions (<, <=, =, >=, >, <> and compare) to many data structures of abstract types, such as sets, will produce unpredictable behaviour. The reason is simply that these functions perform structural comparison of the data structures and, in many cases, different structures will be used internally to convey the same semantic meaning. For example, the internal structure of a value of type IntSet.t depends upon the order in which the elements were inserted and, consequently, sets containing the same contents may have different internal structure and, therefore, may compare as unequal when using the polymorphic comparison functions despite being semantically equivalent:

# (of_list [1; 2; 3; 4; 5]) = (of_list [5; 4; 3; 2; 1]);;
- : bool = false

Sets can be compared correctly (semantically) using the compare function provided by the Set.Make functor. In this case:

# IntSet.compare (of_list [1; 2; 3; 4; 5]) (of_list [5; 4; 3; 2; 1]);;
- : int = 0
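When only equality is of interest, the equal function generated by the functor can be used instead. As a sketch, again rebuilding the set module inline so that the example stands alone:

```ocaml
module IntSet = Set.Make (struct
  type t = int
  let compare (i : int) (j : int) = compare i j
end)

let of_list l = List.fold_right IntSet.add l IntSet.empty

let a = of_list [1; 2; 3; 4; 5]
let b = of_list [5; 4; 3; 2; 1]

(* Semantic equality: true whenever both sets hold the same elements. *)
let same = IntSet.equal a b

(* Converting to sorted lists first also makes = safe to use. *)
let same_elements = IntSet.elements a = IntSet.elements b
```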

In the context of scientific computing, set data structures are useful for a variety of reasons. The set-theoretic operations union, intersection and difference are considerably faster when performed between set data structures than when performed between unsorted arrays and lists. Inserting new values such that ordering is preserved is also much faster for a set data structure than for arrays and lists. Consequently, whenever a task requires the use of a sorted container, the set data structure will, most likely, be more efficient than array- or list-based alternatives. In chapter 10, we shall use a set data structure to compute the set of nth-nearest neighbours in a graph and apply this to atomic-neighbour computations on a simulated molecule. In the meantime, we have more data structures to discover.


3.5 Hash tables

A hash table is an associative container mapping keys to corresponding values. We shall refer to the key-value pairs stored in a hash table as the elements of the hash table.

In terms of utility, hash tables are an efficient way to implement a mapping from one kind of value to another, for example to map strings onto functions. In order to provide their functionality, hash tables provide add and remove functions to insert and delete mappings, respectively, and a find function which looks up and returns the value corresponding to a given key.

Internally, hash tables compute an integer value, known as a hash, from each given key. This hash of a key is used as an index into an array in order to find the value corresponding to the key. The hash is computed from the key such that two identical keys share the same hash and two different keys are likely (but not guaranteed) to produce different hashes. Moreover, hash computation is restricted to Θ(1) time complexity, typically by terminating if a maximum number of computations is reached. Assuming that no two keys in a hash table produce the same hash, finding the value corresponding to a given key takes Θ(1) time⁴.

The OCaml core library contains an imperative implementation of hash tables in the Hashtbl module. Hash tables may be used in two different ways. The simplest approach, which we shall examine here, simply uses polymorphic hash tables generated by the Hashtbl.create function. A more sophisticated approach involves using the Hashtbl.Make functor to generate a module implementing a hash table which uses customised equality and hashing functions. The latter approach is required if the built-in polymorphic equality (=) and hash (Hashtbl.hash) functions are not applicable. These built-in functions are not applicable to any types of data structures for which different internal structures can be equal, e.g. two different balanced binary trees inside two equivalent sets should compare to be equal but the built-in equality function (=) will indicate that they are not equal because they do not have the same structure (as discussed in section 3.4).
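Although we shall not pursue the functor-based approach here, a minimal sketch may help to fix ideas. The key module CaseInsensitive below is a hypothetical example of ours, and the String.lowercase_ascii function is assumed to be available, as it is in reasonably recent versions of the standard library:

```ocaml
(* Hypothetical key module: strings compared without regard to case.
   Equal keys must hash equally, so both functions normalise first. *)
module CaseInsensitive = struct
  type t = string
  let equal a b = String.lowercase_ascii a = String.lowercase_ascii b
  let hash s = Hashtbl.hash (String.lowercase_ascii s)
end

module Table = Hashtbl.Make (CaseInsensitive)

let weights : float Table.t = Table.create 5
let () = Table.add weights "Carbon" 12.011

(* The lookup succeeds even though the case of the key differs. *)
let w = Table.find weights "CARBON"
```

The essential design constraint is that any two keys deemed equal by equal must produce the same hash.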

For example, a hash table mapping strings to floating-point numbers may be constructed by first creating a monomorphic hash table over, as yet, unknown types (denoted '_a and '_b in OCaml):

# let m = Hashtbl.create 5;;
val m : ('_a, '_b) Hashtbl.t = <abstr>

The integer passed to the Hashtbl.create function is intended to indicate the number of keys likely to be in the hash table.

Mappings may be added to the hash table m using the Hashtbl.add function:

# Hashtbl.add m "Hydrogen" 1.0079;
  Hashtbl.add m "Carbon" 12.011;
  Hashtbl.add m "Nitrogen" 14.00674;
  Hashtbl.add m "Oxygen" 15.9994;
  Hashtbl.add m "Sulphur" 32.06;;
- : unit = ()

⁴ Computing the hash in Θ(1) time and then using it to access an array element, also in Θ(1) time.

Note that, as an imperative data structure, adding elements alters the hash table in-place and, therefore, need not return the hash table.

The resulting hash table m, of type (string, float) Hashtbl.t, represents the following mapping from strings to floating-point values:

Hydrogen → 1.0079
Carbon → 12.011
Nitrogen → 14.00674
Oxygen → 15.9994
Sulphur → 32.06

Having been filled at run-time, the hash table may be used to look up the values corresponding to given keys. For example, we can find the average atomic weight of carbon:

# Hashtbl.find m "Carbon";;
- : float = 12.011

If necessary, we can also delete mappings from the hash table, such as the mapping for Oxygen:

# Hashtbl.remove m "Oxygen";;
- : unit = ()

The remaining mappings in the hash table are most easily printed using the iter function in the Hashtbl module:

# let aux spec weight =
    print_endline (spec ^ " -> " ^ (string_of_float weight)) in
  Hashtbl.iter aux m;;
Carbon -> 12.011
Nitrogen -> 14.00674
Sulphur -> 32.06
Hydrogen -> 1.0079
- : unit = ()

Note that the order in which the mappings are supplied by Hashtbl.iter (and Hashtbl.fold) is effectively random. In fact, the order is related to the hash function. Also, hashing is another form of structural comparison and, like the polymorphic comparison functions, should not be applied to many abstract types. For example, a hash table cannot be used with sets as keys as the hash of a set depends upon its internal structure and, therefore, semantically equivalent sets are likely to produce different hashes.
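When a predictable order is required, one option is to collect the mappings into a list and sort them explicitly. A sketch follows, where sorted_bindings is a name of our own choosing:

```ocaml
let m : (string, float) Hashtbl.t = Hashtbl.create 5

let () =
  Hashtbl.add m "Hydrogen" 1.0079;
  Hashtbl.add m "Carbon" 12.011;
  Hashtbl.add m "Nitrogen" 14.00674

(* Collect the mappings with fold, then impose an order explicitly.
   Pairs are compared by key first, so the result is sorted by key. *)
let sorted_bindings =
  List.sort compare (Hashtbl.fold (fun k v accu -> (k, v) :: accu) m [])
```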

Hash tables can clearly be useful in the context of scientific programming. However, a functional alternative to these imperative hash tables can sometimes be desirable. We shall now examine a functional data structure which uses different techniques to implement the same functionality of mapping keys to corresponding values.

[Plot: insertion time t against n (horizontal axis, marked from 1000 to 4000) for two series labelled Map and Hash Table.]

Figure 3.9: Measured performance (time t in seconds) for inserting key-value pairs into hash tables and functional maps containing n - 1 elements. Although the hash table implementation results in better average-case performance, the O(n) time-complexity incurred when n = 2^p - 1 ∀ p > 0 ∈ ℤ produces much slower worst-case performance by the hash table.

3.6 Maps

We described the functional implementation of the set data structure provided by OCaml in section 3.4. The core OCaml library provides a similar data structure known simply as a map⁵.

Much like a hash table, the map data structure associates keys with corresponding values. Consequently, the map data structure also provides add and remove functions to insert and delete mappings, respectively, and a find function which returns the value corresponding to a given key.

Unlike hash tables, maps are represented internally by a balanced binary tree, rather than an array, and maps differentiate between keys using a specified total ordering function, rather than a hash function.

Due to their design differences, maps have the following advantages over hash tables:

• Functional. Programs using maps are easier to reason about as maps cannot be mutated.

• Persistent. Old versions of maps may be kept and used. Thanks to the functional programming style, data is automatically shared between versions.

• Stable O(ln n) time-complexity for inserting and removing a mapping, compared to unstable, amortized Θ(1) time-complexity in the case of hash tables (which may take up to O(n) for some insertions, as illustrated in figure 3.9).

• Customised comparison and hashing functions are required for non-trivial types of key, and comparison functions are often simpler to obtain or define than custom hashing functions.

• The map data structure can be iterated over using iter, map, mapi and fold functions, all of which present mappings ordered by their keys according to the total-order function of the map.

⁵ Not to be confused with the higher-order map function provided with many data structures.

However, maps also have the following disadvantages compared to hash tables:

• Logarithmic O(ln n) time-complexity for finding a mapping, compared to O(1) time-complexity⁶ in the case of hash tables (see figure 3.9).

• Maps require a total ordering over keys to be specified as a function.

In the same way that the Set module contains the Set.Make functor, so the Map module contains a Map.Make functor. Also analogously, this functor transforms a module implementing the key type, and a comparison function giving a total ordering over keys, into a module implementing a map data structure with keys of this type and polymorphic corresponding values.

We shall now demonstrate the functionality of the map data structure, reusing the example of mapping strings to floating-point values. In this case, the type of a key is string and, therefore, we must begin by implementing a Key module providing this type and a total ordering over this type. This may be written as follows, making use of the String.compare function to provide a total ordering over strings:

# module Key =
    struct
      type t = string
      let compare = String.compare
    end;;
module Key :
  sig type t = string val compare : String.t -> String.t -> int end

A mapping with keys of type Key.t, i.e. string, may be created from the Key module using the Map.Make functor:

# module Weights = Map.Make(Key);;
module Weights :
  sig
    type key = Key.t
    type 'a t = 'a Map.Make(Key).t
    val empty : 'a t
    val is_empty : 'a t -> bool
    val add : key -> 'a -> 'a t -> 'a t
    val find : key -> 'a t -> 'a
    val remove : key -> 'a t -> 'a t
    val mem : key -> 'a t -> bool
    val iter : (key -> 'a -> unit) -> 'a t -> unit
    val map : ('a -> 'b) -> 'a t -> 'b t
    val mapi : (key -> 'a -> 'b) -> 'a t -> 'b t
    val fold : (key -> 'a -> 'b -> 'b) -> 'a t -> 'b -> 'b
    val compare : ('a -> 'a -> int) -> 'a t -> 'a t -> int
    val equal : ('a -> 'a -> bool) -> 'a t -> 'a t -> bool
  end

⁶ Assuming no two keys in the hash table share the same hash.


A map data structure containing no mappings is then represented by the value:

# let m = Weights.empty;;
val m : 'a Weights.t = <abstr>

Note that this value is polymorphic, representing a mapping from keys of type string to corresponding values of any type. This value can then be used to construct mappings from strings to any concrete type, such as float in this example.

Mappings may be added to m using the Weights.add function. As a functional data structure, adding elements to a map returns a map containing both the new and old mappings. Hence we repeatedly supersede the old data structure m with a new data structure m:

# let m =
    let m = Weights.add "Hydrogen" 1.0079 m in
    let m = Weights.add "Carbon" 12.011 m in
    let m = Weights.add "Nitrogen" 14.00674 m in
    let m = Weights.add "Oxygen" 15.9994 m in
    Weights.add "Sulphur" 32.06 m;;
val m : float Weights.t = <abstr>

Note that specifying mappings to floating-point values has caused the type attributed to m to change from a mapping of type 'a Weights.t to a mapping of type float Weights.t, i.e. to values of type float.

In fact, there is some subtlety involved here. As we saw in the previous section, newly created hash tables are monomorphic, their types ossified upon first use:

# let h = Hashtbl.create 1;;
val h : ('_a, '_b) Hashtbl.t = <abstr>
# Hashtbl.add h 1 2.;;
- : unit = ()
# h;;
- : (int, float) Hashtbl.t = <abstr>

In contrast, an empty map is polymorphic and the same empty map can be used to create separate lineages with different types:

# let parent = Weights.empty;;
val parent : 'a Weights.t = <abstr>
# let child1 = Weights.add "A string" 3 parent
  and child2 = Weights.add "A string" 3. parent;;
val child1 : int Weights.t = <abstr>
val child2 : float Weights.t = <abstr>

The differences between monomorphic and polymorphic types will be discussed in more detail in section A.5.

We can use the map m to find the average atomic weight of carbon:

# Weights.find "Carbon" m;;
- : float = 12.011
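Like the corresponding list and hash table functions, find raises the Not_found exception when the key is absent. A sketch of a lookup which returns an option instead follows; the helper find_opt_sketch is hypothetical, and the map module is built here from the library's String module so that the example is self-contained:

```ocaml
module Weights = Map.Make (String)

let m = Weights.add "Carbon" 12.011 Weights.empty

(* Hypothetical helper: Some weight if the key is present, else None. *)
let find_opt_sketch key m =
  try Some (Weights.find key m) with Not_found -> None

let carbon = find_opt_sketch "Carbon" m
let gold = find_opt_sketch "Gold" m
```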

Functions   α array   α list         α set    (α, β) hash table   (α, β) map
Create n    init      -              -        -                   -
Insert      -         -              add      replace             add
Find        -         find           find     find                find
Remove      -         remove_assoc   remove   remove              remove
Sort        sort      sort           -        N/A                 -
Mapping     get       nth            N/A      find                find

Table 3.2: Functions implementing common operations over data structures. In the case of set and map data structures, the functions are implemented in the module created by the Set.Make or Map.Make functors.

Deleting mappings from the functional map data structure produces a new data structure containing most of the old data structure:

# Weights.remove "Oxygen" m;;
- : float Weights.t = <abstr>

The remaining mappings are most easily printed using the Weights.iter function:

# let aux spec weight =
    print_endline (spec ^ " -> " ^ (string_of_float weight)) in
  Weights.iter aux m;;
Carbon -> 12.011
Hydrogen -> 1.0079
Nitrogen -> 14.00674
Oxygen -> 15.9994
Sulphur -> 32.06
- : unit = ()

Note that m still contains the entry for oxygen as we ignored the result of removing this entry, thus leaving m intact.

The ability to evolve the contents of data structures along many different lineages during the execution of a program can be very useful. This is clearly much easier when the data structure provides a functional, rather than an imperative, interface. In an imperative style, such approaches would most likely involve the inefficiency of explicitly duplicating the data structure (e.g. using the Hashtbl.copy function) at each fork in its evolution. In contrast, functional data structures provide this functionality naturally and, in particular, will share data between lineages.
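A small sketch of two lineages evolving from a common parent follows. The names base, without_oxygen, with_carbon and keys are ours, and the map module is again built from the library's String module for self-containment:

```ocaml
module Weights = Map.Make (String)

let base =
  Weights.add "Hydrogen" 1.0079 (Weights.add "Oxygen" 15.9994 Weights.empty)

(* Two lineages derived from the same parent map. *)
let without_oxygen = Weights.remove "Oxygen" base
let with_carbon = Weights.add "Carbon" 12.011 base

(* Hypothetical helper: the keys of a map, in key order. *)
let keys m = Weights.fold (fun k _ accu -> k :: accu) m [] |> List.rev
```

Neither derivation disturbs the parent: base still contains exactly its original two mappings.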

Having examined the data structures provided with OCaml, we shall now summarise the relative advantages and disadvantages of these data structures using the notion of algorithmic complexity developed in section 3.1.

3.7 Summary

As we have seen, the complexity of operations over data structures can be instrumental in choosing the appropriate data structure for a task. Consequently, it is beneficial to compare the asymptotic complexities of common operations over various data structures. Table 3.2 shows the functions provided by the core library to perform common operations. Table 3.3 gives the asymptotic complexities of the algorithms used by these functions, for data structures containing n elements.

Complexities   α array       α list        α set        (α, β) hash table   (α, β) map
Create n       Θ(n)          Θ(n)†         O(n ln n)†   O(n ln n)†          O(n ln n)†
Insert         Θ(n)†         ith: Θ(i)†    O(ln n)      Θ(1)††              O(ln n)
Find           O(n)†         O(n)          O(ln n)      Θ(1)††              O(ln n)
Remove         Θ(n)†         ith: Θ(i)     O(ln n)      Θ(1)                O(ln n)
Sort           O(n ln n)     O(n ln n)     Θ(1)†        N/A                 Θ(1)†
Mapping        I → α: Θ(1)   I → α: Θ(i)   N/A          α → β: O(1)         α → β: Θ(ln n)

Table 3.3: Asymptotic algorithmic complexities of operations over different data structures. The set I denotes valid indices i ∈ {0 ... n - 1} of an array or list containing n elements. † not provided with the core OCaml distribution. †† amortized complexity.

Having examined the various containers built in to OCaml, we shall now examine a generalisation of these containers which is easily handled in OCaml before considering the creation of new data structures.

3.8 Heterogeneous containers

The array, list, set, hash table and map containers are all homogeneous containers, i.e. for any such given container, the elements must all be of the same type. Containers of elements which may be of one of several types can also be useful. These are known as heterogeneous containers.

Heterogeneous containers can be defined by first creating a variant type which unifies the types allowed in the container. For example, values of this variant type number may contain data representing elements from the sets ℤ, ℝ or ℂ:

# type number = Integer of int | Real of float | Complex of float * float;;
type number = Integer of int | Real of float | Complex of float * float

A homogeneous container over this unified type may then be used to implement a heterogeneous container. The elements of a container of type number list can contain Integer, Real or Complex values:

# let nums = [Integer 1; Real 2.; Complex (3., 4.)];;
val nums : number list = [Integer 1; Real 2.; Complex (3., 4.)]

Let us consider a simple function to act upon the effectively heterogeneous container type number list. A function to convert values of type number to the built-in type Complex.t may be written:

# let complex_of_number = function
    Integer i -> { Complex.re = float_of_int i; im = 0. }
  | Real x -> { Complex.re = x; im = 0. }
  | Complex (re, im) -> { Complex.re = re; im = im };;
val complex_of_number : number -> Complex.t = <fun>

For example, mapping the complex_of_number function over the number list called nums gives a Complex.t list:

# List.map complex_of_number nums;;
- : Complex.t list =
[{Complex.re = 1.; Complex.im = 0.}; {Complex.re = 2.; Complex.im = 0.};
 {Complex.re = 3.; Complex.im = 4.}]
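The same conversion can be used to reduce a heterogeneous list to a single value. The following sketch sums a number list as a Complex.t; the helper sum_numbers is a hypothetical addition for illustration and relies only on Complex.add and Complex.zero from the standard library:

```ocaml
type number = Integer of int | Real of float | Complex of float * float

let complex_of_number = function
  | Integer i -> { Complex.re = float_of_int i; im = 0. }
  | Real x -> { Complex.re = x; im = 0. }
  | Complex (re, im) -> { Complex.re = re; im = im }

(* Hypothetical helper: fold the converted elements into one complex sum. *)
let sum_numbers nums =
  List.fold_left
    (fun acc n -> Complex.add acc (complex_of_number n))
    Complex.zero nums

let total = sum_numbers [Integer 1; Real 2.; Complex (3., 4.)]
```

Here total has real part 6 and imaginary part 4.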

The list, set, hash table and map data structures can clearly be useful in a scientific context, in addition to conventional arrays and the heterogeneous counterparts of these data structures. However, a major advantage of OCaml lies in its ability to create and manipulate new, custom-made data structures. We shall now examine this aspect of scientific programming in OCaml.

3.9 Trees

In addition to the built-in data structures, the ease with which the OCaml language allows tuples, records and variant types to be handled makes it an ideal language for creating and using new data structures. Trees are the most common such data structure.

A tree is a self-similar data structure used to store data hierarchically. The origin of a tree is, therefore, itself a tree, known as the root node. As a self-similar, or recursive, data structure, every node in a tree may contain further trees. A node which contains no further trees marks the end of a lineage in the tree and is known as a leaf node.

The simplest form of tree is a recursive data structure containing an arbitrarily-long list of trees. This may be represented in OCaml by the type:

# type tree = Node of tree list;;
type tree = Node of tree list

A balanced binary tree of depth d is represented by an empty node for d = 0 and a node containing two balanced binary trees, each of depth d − 1, for d > 0. This simple recurrence relation is most easily implemented as a purely functional, recursive function:

# let rec balanced_tree = function
    0 -> Node []
  | n -> Node [balanced_tree (n-1); balanced_tree (n-1)];;
val balanced_tree : int -> tree = <fun>

The tree depicted in figure 3.10 may then be constructed using:


Figure 3.10: A perfectly-balanced binary tree of depth x = 3 containing 2^(x+1) − 1 = 15 nodes, including the root node and 2^x = 8 leaf nodes.

# let example = balanced_tree 3;;
val example : tree =
  Node
   [Node [Node [Node []; Node []]; Node [Node []; Node []]];
    Node [Node [Node []; Node []]; Node [Node []; Node []]]]

We shall use this example tree to demonstrate more sophisticated computations over trees.

Functions over the type tree are easily written. For example, the following function counts the number of leaf nodes:

# let rec leaf_count = function
    Node [] -> 1
  | Node l -> List.fold_left (fun s t -> s + leaf_count t) 0 l;;
val leaf_count : tree -> int = <fun>
# leaf_count example;;
- : int = 8
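Other whole-tree quantities follow the same folding pattern. As a sketch, the hypothetical helpers node_count and depth below (names chosen here for illustration) count all nodes and measure the longest root-to-leaf path, which can be checked against the figures quoted for the depth-3 example tree:

```ocaml
type tree = Node of tree list

let rec balanced_tree = function
  | 0 -> Node []
  | n -> Node [balanced_tree (n-1); balanced_tree (n-1)]

(* Hypothetical helper: every node counts itself plus its descendants. *)
let rec node_count (Node l) =
  List.fold_left (fun s t -> s + node_count t) 1 l

(* Hypothetical helper: a leaf has depth 0; otherwise one more than the
   deepest child. *)
let rec depth (Node l) =
  List.fold_left (fun d t -> max d (1 + depth t)) 0 l

let example = balanced_tree 3
```

For this tree, node_count gives 15 and depth gives 3.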

Trees represented by the type tree are of limited utility as they cannot contain additional data in their nodes. An equivalent tree which allows arbitrary, polymorphic data to be placed in each node may be represented by the type:

# type 'a ptree = PNode of 'a * 'a ptree list;;
type 'a ptree = PNode of 'a * 'a ptree list

As a trivial example, the following function traverses a value of type tree to create an equivalent value of type ptree which contains a zero in each node:

# let rec boring_ptree_of_tree = function
    Node l -> PNode (0, List.map boring_ptree_of_tree l);;
val boring_ptree_of_tree : tree -> int ptree = <fun>

For example:

# boring_ptree_of_tree (Node [Node []; Node []]);;
- : int ptree = PNode (0, [PNode (0, []); PNode (0, [])])

As a slightly more interesting example, the following function converts a value of type tree to a value of type ptree, storing unique integers in each node of the resulting tree:

Figure 3.11: The result of inserting an integer counter into each node of the tree depicted in figure 3.10 using the counted_ptree_of_tree function.

# let counted_ptree_of_tree t =
    let rec aux n = function
        Node l ->
          let aux2 t (n, l) =
            let (n2, t) = aux (n+1) t in
            (n2, PNode (n, t) :: l) in
          List.fold_right aux2 l (n, []) in
    let (_, l) = aux 2 t in
    PNode (1, l);;
val counted_ptree_of_tree : tree -> int ptree = <fun>

This function marks the root node with the integer 1 and uses an auxiliary function aux to cumulatively convert a list of values of type tree into a list of values of type ptree, storing the result in a root PNode. The aux function folds an auxiliary function aux2 over each child of the current node, accumulating the counter and a list of child trees.

Applying this function to our example tree produces a more interesting result (illustrated in figure 3.11):

# counted_ptree_of_tree example;;
- : int ptree =
PNode (1,
 [PNode (9,
   [PNode (13, [PNode (15, []); PNode (14, [])]);
    PNode (10, [PNode (12, []); PNode (11, [])])]);
  PNode (2,
   [PNode (6, [PNode (8, []); PNode (7, [])]);
    PNode (3, [PNode (5, []); PNode (4, [])])])])

In practice, storing the maximum depth remaining in each branch of a tree can be useful when writing functions to handle trees. Values of our generic tree type may be converted into the ptree type, storing the integer depth in each node, using the following function:

# let rec depth_ptree_of_tree = function
    Node l ->
      let aux t (od, l) =
        let t = depth_ptree_of_tree t and depth_of (PNode (d, _)) = d in
        (max od (depth_of t), t :: l) in
      let (d, l) = List.fold_right aux l (-1, []) in
      PNode (d+1, l);;
val depth_ptree_of_tree : tree -> int ptree = <fun>

This function uses an auxiliary function aux to convert a list of child trees of type tree into a list of child trees of type ptree whilst accumulating the maximum depth in any child branch. The result is used to construct a PNode with a maximum depth of that of its children plus one, and a list of ptree children.

Applying this function to our example tree produces a rather uninteresting, symmetric set of branch depths:

# let example = depth_ptree_of_tree example;;
val example : int ptree =
  PNode (3,
   [PNode (2,
     [PNode (1, [PNode (0, []); PNode (0, [])]);
      PNode (1, [PNode (0, []); PNode (0, [])])]);
    PNode (2,
     [PNode (1, [PNode (0, []); PNode (0, [])]);
      PNode (1, [PNode (0, []); PNode (0, [])])])])

Using a tree of varying depth provides a more interesting result. The following function creates an unbalanced binary tree, effectively representing a list of increasingly deep balanced binary trees in the left children:

# let unbalanced_tree n =
    let rec aux m =
      if m = n then Node [] else
      Node [balanced_tree m; aux (m+1)] in
    aux 0;;
val unbalanced_tree : int -> tree = <fun>

This can be used to create a wonky tree:

# let wonky = unbalanced_tree 3;;
val wonky : tree =
  Node
   [Node [];
    Node
     [Node [Node []; Node []];
      Node
       [Node [Node [Node []; Node []]; Node [Node []; Node []]]; Node []]]]

Converting this wonky tree into a tree containing the remaining depth in each node we obtain a more interesting result (illustrated in figure 3.12):

# depth_ptree_of_tree wonky;;
- : int ptree =
PNode (5,
 [PNode (0, []);
  PNode (4,
   [PNode (1, [PNode (0, []); PNode (0, [])]);
    PNode (3,
     [PNode (2,
       [PNode (1, [PNode (0, []); PNode (0, [])]);
        PNode (1, [PNode (0, []); PNode (0, [])])]);
      PNode (0, [])])])])

Figure 3.12: An unbalanced binary tree with the remaining depth stored in every node.

In practice, the ability to express a tree in which each node may have an arbitrary number of branches often turns out to be a hindrance rather than a benefit. Consequently, the number of branches allowed at each node in a tree is typically restricted to one of two values:

• zero branches for leaf nodes and

• a constant number of branches for all other nodes.

Trees which allow only zero or two branches, known as binary trees, are particularly prolific as they form the simplest class of such trees⁷, simplifying the derivations of the complexities of operations over this kind of tree.

⁷ If only zero or one "branches" are allowed at each node then the tree is actually a list (see section 3.3).

Although binary trees could be represented using the tree or ptree data structures, this would require the programmer to ensure that all functions acting upon these types produced node lists containing either zero or two elements. In practice, this is likely to become a considerable source of human error and, therefore, of programmer frustration. Fortunately, when writing in OCaml, the type system can be used to enforce the use of a valid number of branches at each node. Such automated checking not only removes the need for careful inspection by the programmer but also removes the need to perform run-time checks of the data, improving performance. A binary tree analogous to our tree data structure can be defined as:

# type bin_tree = Leaf | Node of bin_tree * bin_tree;;
type bin_tree = Leaf | Node of bin_tree * bin_tree

A binary tree analogous to our ptree data structure can be defined as:

# type 'a pbin_tree =
    Leaf of 'a | Node of 'a * 'a pbin_tree * 'a pbin_tree;;
type 'a pbin_tree = Leaf of 'a | Node of 'a * 'a pbin_tree * 'a pbin_tree

Values of type ptree which represent binary trees may be converted to this pbin_tree type using the following function:


# let rec pbin_tree_of_ptree = function
    PNode (d, []) -> Leaf d
  | PNode (d, [l; r]) ->
      Node (d, pbin_tree_of_ptree l, pbin_tree_of_ptree r)
  | PNode (_, _) -> invalid_arg "pbin_tree_of_ptree";;
val pbin_tree_of_ptree : 'a ptree -> 'a pbin_tree = <fun>

For example, the arbitrary-branching-factor example tree may be converted into a binary tree using the pbin_tree_of_ptree function:

# pbin_tree_of_ptree example;;
- : int pbin_tree =
Node (3, Node (2, Node (1, Leaf 0, Leaf 0), Node (1, Leaf 0, Leaf 0)),
 Node (2, Node (1, Leaf 0, Leaf 0), Node (1, Leaf 0, Leaf 0)))

Note that the 'a pbin_tree type, which allows arbitrary data of type 'a to be stored in all nodes, could be usefully altered to an ('a, 'b) pbin_tree type which allows arbitrary data of type 'a to be stored in leaf nodes and arbitrary data of type 'b to be stored in all other nodes:

# type ('a, 'b) pbin_tree =
    Leaf of 'a
  | Node of 'b * ('a, 'b) pbin_tree * ('a, 'b) pbin_tree;;
type ('a, 'b) pbin_tree =
    Leaf of 'a
  | Node of 'b * ('a, 'b) pbin_tree * ('a, 'b) pbin_tree
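To illustrate how the two type parameters are used independently, the following sketch stores floats in the leaves and strings in the internal nodes, and collects the leaf data from left to right. The function leaves and the example value t are hypothetical, introduced only for this illustration:

```ocaml
type ('a, 'b) pbin_tree =
  | Leaf of 'a
  | Node of 'b * ('a, 'b) pbin_tree * ('a, 'b) pbin_tree

(* Hypothetical helper: gather leaf data left to right, ignoring the
   data held in internal nodes. *)
let rec leaves = function
  | Leaf x -> [x]
  | Node (_, l, r) -> leaves l @ leaves r

let t = Node ("root", Leaf 1.5, Node ("inner", Leaf 2.5, Leaf 3.5))
let xs = leaves t
```

The value t has type (float, string) pbin_tree, and xs is the list of its three leaf values in order.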

Having examined the fundamentals of tree-based data structures, we shall now examine the two main categories of trees: balanced trees and unbalanced trees.

3.9.1 Balanced trees

Balanced trees, in particular balanced binary trees, are prolific in computer science literature. As the simplest form of tree, binary trees simplify the derivation of algorithmic complexities. These complexities often depend upon the depth of the tree. Consequently, in the quest for efficient algorithms, data structures designed to maintain approximately uniform depth, known as balanced trees, are used as the foundation for a wide variety of algorithms.

A balanced tree (as illustrated in figure 3.10) can be defined as a tree for which the difference between the minimum and maximum depths tends to a finite value for any such tree containing n nodes in the limit⁸ n → ∞. Practically, this condition is often further restricted to be that the difference between the minimum and maximum depths is no more than 2.

⁸ Although taking limits over integer-valued variables may seem dubious, the required proofs can, in fact, be made rigorous.

Balanced binary trees are prolific because they are very efficient for many useful operations. The efficiency of these trees stems from their structure. In terms of the number of nodes traversed, any node in a tree containing either n nodes or n leaf nodes may be reached by O(ln n) traversals from the root.

Figure 3.13: An optimally unbalanced binary tree of depth x = 7 containing 2x + 1 = 15 nodes, including the root node and x + 1 = 8 leaf nodes.

For example, the set and map data structures provided in the OCaml core library both make use of balanced binary trees internally. This allows them to provide single-element insertion, removal and searching in O(ln n) time-complexity.
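As a brief sketch of these operations, a set of integers can be obtained by applying the standard Set.Make functor to an ordered type. The module name IntSet and the example data are assumptions made for illustration:

```ocaml
(* An ordered-type argument for the functor, using the built-in comparison. *)
module IntSet = Set.Make (struct
  type t = int
  let compare = compare
end)

(* Insertion, removal and membership are each logarithmic in the set size. *)
let s = List.fold_left (fun s x -> IntSet.add x s) IntSet.empty [5; 3; 8; 3; 1]
let s' = IntSet.remove 8 s
let has_three = IntSet.mem 3 s'
```

Duplicate insertions are ignored, so s contains four elements, and IntSet.elements returns them in increasing order.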

For detailed descriptions of balanced tree implementation, we refer the eager reader to the relevant computer science literature [5]. However, although computer science exploits balanced trees for the efficient asymptotic algorithmic complexities they provide for common operations, which is underpinned by their balanced structure, the natural sciences can also benefit from the use of unbalanced trees.

3.9.2 Unbalanced trees

Many forms of data commonly used in scientific computing can be usefully represented hierarchically, in tree data structures. In particular, trees which store exact information in leaf nodes and approximate information in non-leaf nodes can be of great utility when writing algorithms designed to compute approximate quantities. In this section, we shall consider the development of efficient functions required to simulate the dynamics of particle systems, providing implementations for one-dimensional systems of gravitating particles. We begin by describing a simple approach based upon the use of a flat data structure (an array of particles) before progressing on to a vastly more efficient, hierarchical approach which makes use of approximate methods and the representation of particle systems as unbalanced binary trees. Finally, we shall discuss the generalisation of the unbalanced-tree-based approach to higher dimensionalities and different problems.

In the context of a one-dimensional system of gravitating particles, the mass m > 0 ∈ ℝ and position r ∈ ℝ¹ of a particle may be represented by the record:

# type particle = { m : float; r : float };;
type particle = { m : float; r : float; }

A function force2 to compute the gravitational force (up to a constant coefficient):

$$F = \frac{m_1 m_2}{(r_2 - r_1)\,|r_2 - r_1|}$$

between two particles, p1 and p2, may then be written:

# let force2 p1 p2 =
    let d = p2.r -. p1.r in
    p1.m *. p2.m /. (d *. abs_float d);;
val force2 : particle -> particle -> float = <fun>

For example, the force on a particle p₁ of mass m₁ = 1 at position r₁ = 0.1 due to a particle p₂ of mass m₂ = 3 at position r₂ = 0.8 is:

$$F = \frac{1 \times 3}{(0.8 - 0.1)^2} = \frac{300}{49} \approx 6.12245$$

# force2 { m = 1.; r = 0.1 } { m = 3.; r = 0.8 };;
- : float = 6.12244897959183554

The particle type and force2 function underpin both the array-based and tree-based approaches outlined in the remainder of this section.

The simplest approach to computing the force on one particle due to a collection of other particles is to store the other particles as a particle array and simply loop through the array, accumulating the result of applying the force2 function. This can be achieved using a fold:

# let array_force p ps =
    Array.fold_left (fun f p2 -> f +. force2 p p2) 0. ps;;
val array_force : particle -> particle array -> float = <fun>

This function can be demonstrated on randomised particles. A particle with random mass m ∈ [0 … 1) and position r ∈ [0 … 1) can be created using the function:

# let random_particle _ = { m = Random.float 1.; r = Random.float 1. };;
val random_particle : 'a -> particle = <fun>

A random array of particles can then be created using the function:

# let random_array n = Array.init n random_particle;;
val random_array : int -> particle array = <fun>

The following function computes the force on a random particle due to a random array of 10⁵ particles, returning a 2-tuple of the time taken in seconds and the answer found⁹:

⁹ Timing functions such as Sys.time will be discussed in more detail in section 8.2.

# let origin = random_particle 0;;
val origin : particle = {m = 0.140791689359313688;
                         r = 0.582751366306423546}
# let sys = random_array 100000;;
val sys : particle array = ...
# let t = Sys.time () in
  let f = array_force origin sys in
  let t = Sys.time () -. t in
  (t, f);;
- : float * float = (0.91, 2178953383.57117701)

Computing the force on each particle in a system of particles is the most fundamental task when simulating particle dynamics. Typically, the whole system is simulated in discrete time steps, the force computed for each particle being used to calculate the velocity and acceleration of the particle in the next time step. In a system of n particles, the array_force function applies the force2 function exactly n − 1 times. Thus, using the array_force function to compute the force on all n particles would require Θ(n²) time-complexity. This quadratic complexity forms the bottleneck of the whole simulation. Hence, the array_force function is an ideal target for optimisation.
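The quadratic cost can be made concrete with a sketch of the all-particles computation. The helper all_forces below is a hypothetical illustration, not part of the text; it visits every ordered pair of distinct particles, skipping each particle's interaction with itself:

```ocaml
type particle = { m : float; r : float }

let force2 p1 p2 =
  let d = p2.r -. p1.r in
  p1.m *. p2.m /. (d *. abs_float d)

(* Hypothetical helper: net force on every particle. The nested iteration
   applies force2 n (n - 1) times for n particles. *)
let all_forces ps =
  Array.mapi
    (fun i p ->
      let f = ref 0. in
      Array.iteri (fun j p2 -> if i <> j then f := !f +. force2 p p2) ps;
      !f)
    ps

(* Three unit masses at positions 0, 1 and 2. *)
let fs =
  all_forces [| { m = 1.; r = 0. }; { m = 1.; r = 1. }; { m = 1.; r = 2. } |]
```

For three unit masses at r = 0, 1 and 2, the outer particles are pulled inwards with equal and opposite forces and the middle particle feels no net force.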

In this case, the array-based function to compute the force on a particle took 0.82 seconds. Applying this function to each of the 10⁵ particles would, therefore, be expected to take almost a day. Thus, computing the update to the particle dynamics for a single time step is likely to take at least a day. This is highly undesirable. Moreover, there is no known approach to computing the force on a particle which both improves upon the Θ(n²) asymptotic complexity and retains the apparent exactness of the simple, array-based computation we have just outlined.
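The estimate of almost a day follows directly from the quoted timing, assuming that each of the 10⁵ force evaluations takes the same 0.82 seconds:

```latex
t_{\text{step}} \approx 0.82\,\text{s} \times 10^{5}
              = 8.2 \times 10^{4}\,\text{s}
              = \frac{8.2 \times 10^{4}}{3600}\,\text{h}
              \approx 22.8\,\text{h}
```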

In computer science, algorithms are optimised by carefully designing alternative algorithms which possess better complexities whilst also producing exactly the same results. This pedantry concerning accuracy is almost always appropriate in computer science. However, many subjects, including the natural sciences, can benefit enormously from relinquishing this exactness in favour of artful approximation, in particular the computation of approximations known to be accurate to within a quantified error. As we shall now see, the performance of the array-based function to compute the force on a particle can be greatly improved upon by using an algorithm designed to compute an approximation to the exact result.

Promoting the adoption of approximate techniques in scientific computations can be somewhat of an uphill struggle. Thus, we shall now devote a little space to the arguments involved. Often, when encouraged to convert to the use of approximate computations, many scientists respond by wincing and citing an article concerning the weather and the wings of a butterfly. Their point is, quite validly, that the physical systems most commonly simulated on computer are chaotic. Indeed, if the evolution of such a system could be calculated by reducing the physical properties to a solvable problem, there would be no need to simulate the system computationally.

The chaotic nature of simulated systems raises the concern that converting to the use of approximate methods is likely to change the simulation result in an unpredictable way. This is a valid concern. However, virtually all such simulation methods are already inherently approximate. One approximation is made by the choice of simulation procedure, such as the Verlet method for numerically integrating particle dynamics over time [6]. Another approximation is made by the use of finite-precision arithmetic. Consequently, the results of simulations should never be examined at the microscopic level but, rather, via quantities averaged over the whole system. Thus, the use of approximate techniques does not worsen the situation.

We shall now develop approximation techniques of controllable accuracy for computing the force on a particle due to a given system of particles, culminating in the implementation of a force function which provides a substantially more efficient alternative to the array_force function for reasonable accuracies.

In general, the strength of particle-particle interactions diminishes with distance. Consequently, the force exerted by a collection of distant particles may be well-approximated by grouping the collection into a pseudo-particle. In the case of gravitational interactions, this corresponds to grouping the effects of large numbers of smaller masses into small numbers of larger masses. This grouping effect can be obtained by storing the particle system in a tree data structure in which branches of the tree represent spatial subdivision, leaf nodes store exact particle information and other nodes store the information required to make approximations pertaining to the particles in the region of space represented by their lineage of the tree.

The spatial partitioning of a system of particles at positions r_i ∈ ℝ may be represented by an unbalanced binary tree of the type:

# type partition =
    Leaf of particle list
  | Node of partition * particle * partition;;
type partition =
    Leaf of particle list
  | Node of partition * particle * partition

Leaf nodes in such a tree contain a list of particles at the same or at similar positions. Other nodes in the tree contain left and right branches (which will be used to represent implicit subranges [l, ½(l + u)) and [½(l + u), u), respectively) and the mass and position of a pseudo-particle chosen to approximate the summed effects of all particles farther down the tree.

The mass m_p and position r_p of a pseudo-particle approximating the effects of a list of particles (m_i, r_i) is given by the sum of the masses and the weighted average of the positions of particles in the child branches, respectively:

$$m_p = \sum_i m_i, \qquad r_p = \frac{1}{m_p} \sum_i m_i r_i$$

The following function computes the pseudo-particle approximant of the given list of particles:

# let average l =
    let aux a p = { m = a.m +. p.m; r = a.r +. p.m *. p.r } in
    let pp = List.fold_left aux { m = 0.; r = 0. } l in
    if pp.m = 0. then pp else { m = pp.m; r = pp.r /. pp.m };;
val average : particle list -> particle = <fun>

For example, the pseudo-particle representing two particles {m₁ = 1, r₁ = −1} and {m₂ = 3, r₂ = 1} is {m = 1 + 3, r = ¼(−1 + 3)}:

# average [{m = 1.; r = -1.}; {m = 3.; r = 1.}];;
- : particle = {m = 4.; r = 0.5}

A function to compute the root node shared by two branches of a partition tree, including the pseudo-particle approximant in the root node, may then be written:

# let node_of (left, right) =
    let of_child = function Leaf l -> average l | Node (_, p, _) -> p in
    let lp, rp = of_child left, of_child right in
    let m = lp.m +. rp.m in
    let r = if m = 0. then 0. else (lp.m *. lp.r +. rp.m *. rp.r) /. m in
    Node (left, { m = m; r = r }, right);;
val node_of : partition * partition -> partition = <fun>

The nested of_child function extracts the particle representation of a child branch in the tree, either as the pseudo-particle representation of a list of particles in a leaf node, computed by the average function, or as the pseudo-particle held in the non-leaf child node.

For example, creating a node from a left leaf containing two particles and an empty right leaf results in a node containing the left leaf, the pseudo-particle and the right leaf:

# node_of (Leaf [{m = 1.; r = -1.}; {m = 3.; r = -1.}], Leaf []);;
- : partition =
Node (Leaf [{m = 1.; r = -1.}; {m = 3.; r = -1.}], {m = 4.; r = -1.}, Leaf [])

A particle system consists of the lower and upper bounds of the partition and the partition itself:

# type system = { lower : float; tree : partition; upper : float };;
type system = { lower : float; tree : partition; upper : float; }

We shall assume that a system is initialised with a range which encompasses the positions of any particles which will be inserted into it. The task of inserting a particle then requires traversal of the tree to a leaf node representing a range which includes the position of the particle and, if necessary, the splitting of this leaf node to insert the new particle. This can be achieved using the following function:

# let insert p sys =
    let rec aux np l u =
      let aux2 np left right l m u =
        let (left, right) =
          if np.r < m then (aux np l m left, right)
          else (left, aux np m u right) in
        node_of (left, right) in
      function
        Leaf [] -> Leaf [p]
      | Leaf (pph::ppt as pp) ->
          if pph.r = np.r then Leaf (np::pph::ppt) else
          let m = 0.5 *. (l +. u) in
          let left, right = List.partition (fun p -> p.r < m) pp in
          let left, right = Leaf left, Leaf right in
          aux2 np left right l m u
      | Node (left, _, right) -> aux2 np left right l (0.5 *. (l +. u)) u in
    { sys with tree = aux p sys.lower sys.upper sys.tree };;
val insert : particle -> system -> system = <fun>


The nested aux function inserts the given particle into the given partition tree. The aux2 function nested within the aux function propagates insertion into the appropriate branch of the tree, left or right, by calling aux with the particle (np) to be inserted, the implicit range (either [l, m) or [m, u)) and the child tree. The aux function inserts a new particle into an empty leaf by replacing it with a leaf containing a new particle. A leaf already containing particles is split into a new pair of child partition trees and the aux2 function then used to insert the new particle into the appropriate child tree. A non-leaf node simply uses the aux2 function, which will replace the appropriate branch of the tree and the pseudo-particle whilst leaving the other branch intact.

For example, the empty particle system for particles in the range [0, 1) is:

# let empty_sys = { lower = 0.; tree = Leaf []; upper = 1. };;
val empty_sys : system = {lower = 0.; tree = Leaf []; upper = 1.}

Inserting a single particle results in the root node of the tree being a leaf node containing the particle:

# let sys = insert { m = 3.; r = 0.1 } empty_sys;;
val sys : system = {lower = 0.; tree = Leaf [{m = 3.; r = 0.1}]; upper = 1.}

Inserting a second particle in the other half of the range of the system creates a balanced binary tree of depth 1, the left-hand branch of the tree containing the particle in the lower-half of the range and the right-hand branch containing the particle in the upper-half:

# let sys = insert { m = 1.; r = 0.8 } sys;;
val sys : system =
  {lower = 0.;
   tree =
    Node (Leaf [{m = 3.; r = 0.1}], {m = 4.; r = 0.275},
     Leaf [{m = 1.; r = 0.8}]);
   upper = 1.}

Inserting a third particle near an existing particle deepens the tree, producing more interesting structure and pseudo-particle content:

# let sys = insert { m = 1.; r = 0.82 } sys;;
val sys : system =
  {lower = 0.;
   tree =
    Node (Leaf [{m = 3.; r = 0.1}], {m = 5.; r = 0.384},
     Node (Leaf [], {m = 2.; r = 0.81},
      Node
       (Node (Leaf [{m = 1.; r = 0.8}], {m = 2.; r = 0.81},
         Leaf [{m = 1.; r = 0.82}]),
       {m = 2.; r = 0.81}, Leaf [])));
   upper = 1.}

Figure 3.14: An unbalanced binary tree used to partition the space r ∈ [0, 1) in order to approximate the gravitational effect of a cluster of particles in a system.

This tree is illustrated in figure 3.14. Note that the pseudo-particle at the root node of the tree correctly indicates that the total mass of the system is m = 3 + 1 + 1 = 5 and the centre of mass is at r = ⅕(3 × 0.1 + 0.8 + 0.82) = 0.384.
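The root pseudo-particle can be checked independently by flattening the tree back into its real particles. In the sketch below, particles_of is a hypothetical helper and the tree is written out by hand from the three-particle example rather than built with insert:

```ocaml
type particle = { m : float; r : float }

type partition =
  | Leaf of particle list
  | Node of partition * particle * partition

(* Hypothetical helper: list the real particles, left to right. *)
let rec particles_of = function
  | Leaf ps -> ps
  | Node (left, _, right) -> particles_of left @ particles_of right

(* The three-particle tree from the text, transcribed by hand. *)
let tree =
  let pp = { m = 2.; r = 0.81 } in
  Node (Leaf [{ m = 3.; r = 0.1 }], { m = 5.; r = 0.384 },
        Node (Leaf [], pp,
              Node (Node (Leaf [{ m = 1.; r = 0.8 }], pp,
                          Leaf [{ m = 1.; r = 0.82 }]),
                    pp, Leaf [])))

let total_mass =
  List.fold_left (fun s p -> s +. p.m) 0. (particles_of tree)

let centre =
  List.fold_left (fun s p -> s +. p.m *. p.r) 0. (particles_of tree)
  /. total_mass
```

The recomputed total mass and centre of mass agree with the pseudo-particle stored at the root, up to floating-point rounding in the position.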

We shall now consider the force on the particle at r = 0.1, exerted by the other particles at r = 0.8 and 0.82. The force can be calculated exactly, in arbitrary units, as:

$$F = \sum_j \frac{m_i m_j}{(r_j - r_i)^2} = \frac{3 \times 1}{0.7^2} + \frac{3 \times 1}{0.72^2} \approx 11.9095$$

In this case, the force on the particle at r_i = 0.1 can also be well-approximated by grouping the effect of the other two particles into that of a pseudo-particle. From the tree, the pseudo-particle for the range ½ ≤ r_p < 1 is {m = 2.; r = 0.81}. Thus, the force may be well-approximated by:

$$F \simeq \frac{m_p m_i}{(r_p - r_i)^2} = \frac{3 \times 2}{0.71^2} \simeq 11.9024$$

where m_p and r_p are the mass and centre of mass of the pseudo-particle, respectively.
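These two values can be reproduced with the force2 function alone. The sketch below recomputes the exact and pseudo-particle forces for this example and the resulting fractional error; the names origin, exact, approx and fractional_error are illustrative assumptions:

```ocaml
type particle = { m : float; r : float }

let force2 p1 p2 =
  let d = p2.r -. p1.r in
  p1.m *. p2.m /. (d *. abs_float d)

let origin = { m = 3.; r = 0.1 }

(* Exact: sum over the two real particles. *)
let exact =
  force2 origin { m = 1.; r = 0.8 } +. force2 origin { m = 1.; r = 0.82 }

(* Approximate: a single pseudo-particle at the centre of mass. *)
let approx = force2 origin { m = 2.; r = 0.81 }

let fractional_error = abs_float (approx -. exact) /. exact
```

In this example the fractional error is below one part in a thousand, at the cost of one force evaluation instead of two.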

Given the representation of a particle system as an unbalanced partition tree, the force on any given "origin" particle due to the particles in the system can be computed very efficiently by recursively traversing the tree either until a pseudo-particle in a non-leaf node is found to approximate the effects of the particles in its branch of the tree to sufficient accuracy or until real particles are found in a leaf node. This approach can be made more rigorous by bounding the error of the approximation.

The simplest upper bound of error is obtained by computing the difference between the minimum and maximum forces which can be obtained by a particle distribution satisfying the constraint that it must produce the pseudo-particle with the appropriate mass and position. If r ∉ [l, u), the force F on a particle of mass m at position r, due to a total mass M in the range [l, u) with centre of mass c, is bounded by the force due to all the mass at the centre of mass and the force due to masses at either end of the range:

$$\frac{m M}{(c - r)^2} \le F \le \frac{m M}{u - l} \left( \frac{c - l}{(u - r)^2} + \frac{u - c}{(l - r)^2} \right)$$

For example, the bounds of the force in the previous example are given by r = 0.1, m = 3, l = 0.5, c = 0.81, u = 1 and M = 2:

$$\frac{3 \times 2}{(0.81 - 0.1)^2} \le F \le \frac{3 \times 2}{1 - 0.5} \left( \frac{0.81 - 0.5}{(1 - 0.1)^2} + \frac{1 - 0.81}{(0.5 - 0.1)^2} \right)$$

$$11.9024 \le F \le 18.8426$$

If this error was considered to be too large, the function to approximate the force would recurse into the smaller-scale spatial range [l, u) = [0.75, 1). This tightens the bound on the force to:

$$11.9024 \le F \le 12.5707$$

This recursive process can be repeated either until the bound on the force is tight enough or until an exact result is obtained.
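The bounds quoted above can be reproduced numerically. The following sketch evaluates the lower and upper bounds for a given range; the function force_bounds and its labelled arguments (with big_m standing for the pseudo-particle mass M) are hypothetical names, and the formula is valid only when r lies outside [l, u):

```ocaml
(* Hypothetical helper: (lower, upper) bounds on the force exerted on a
   particle of mass m at position r by total mass big_m spread over [l, u)
   with centre of mass c. Assumes r is outside [l, u). *)
let force_bounds ~m ~big_m ~r ~c ~l ~u =
  let sqr x = x *. x in
  let lower = m *. big_m /. sqr (c -. r) in
  let upper =
    m *. big_m /. (u -. l)
    *. ((c -. l) /. sqr (u -. r) +. (u -. c) /. sqr (l -. r)) in
  (lower, upper)

let (lo1, hi1) = force_bounds ~m:3. ~big_m:2. ~r:0.1 ~c:0.81 ~l:0.5 ~u:1.
let (lo2, hi2) = force_bounds ~m:3. ~big_m:2. ~r:0.1 ~c:0.81 ~l:0.75 ~u:1.
```

Narrowing the range from [0.5, 1) to [0.75, 1) leaves the lower bound unchanged and reduces the upper bound from about 18.84 to about 12.57.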

The following function computes the difference between the upper and lower bounds of the force on an origin particle p due to a pseudo-particle pp representing a particle distribution in the spatial range from l to u:

# let metric p pp l u =
    if l <= p.r && p.r < u then infinity else
    let sqr x = x *. x in
    let r = p.r and c = pp.r in
    let fmin = p.m *. pp.m /. sqr (p.r -. pp.r) in
    let fmax = p.m *. pp.m /. (u -. l) *.
      ((c -. l) /. (sqr (u -. r)) +. (u -. c) /. sqr (l -. r)) in
    fmax -. fmin;;
val metric : particle -> particle -> float -> float -> float = <fun>

Note that the metric function returns an infinite possible error if the particle p lies within the partition range [l, u), as the partition might contain another particle at the same position (where the force diverges).

For example, these are the errors resulting from progressively finer approximations:

# metric { m = 3.; r = 0.1 } { m = 2.; r = 0.81 } 0. 1.;;
- : float = infinity
# metric { m = 3.; r = 0.1 } { m = 2.; r = 0.81 } 0.5 1.;;
- : float = 6.94019227519524939
# metric { m = 3.; r = 0.1 } { m = 2.; r = 0.81 } 0.75 1.;;
- : float = 0.668276868664461787
# metric { m = 3.; r = 0.1 } { m = 2.; r = 0.81 } 0.75 0.875;;
- : float = 0.277220270131675051

A function to compute an approximation to the total force on a particle p due to other particles in a system sys to within an error delta can be written:

# let force p sys delta =
    let rec aux l u = function
        Leaf l -> List.fold_left (fun f p2 -> f +. force2 p p2) 0. l
      | Node (left, pp, right) ->
          if metric p pp l u < delta then force2 p pp else
          let m = 0.5 *. (l +. u) in
          (aux l m left) +. (aux m u right) in
    aux sys.lower sys.upper sys.tree;;
val force : particle -> system -> float -> float = <fun>
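As a small usage sketch, the force function can be applied to the two clustered particles from the earlier example, with the origin particle at r = 0.1 kept outside the system. The system others is a hypothetical value covering only the range [0.5, 1) and written out by hand from the right-hand branch of the example tree; the definitions are repeated so that the fragment is self-contained:

```ocaml
type particle = { m : float; r : float }
type partition =
  | Leaf of particle list
  | Node of partition * particle * partition
type system = { lower : float; tree : partition; upper : float }

let force2 p1 p2 =
  let d = p2.r -. p1.r in
  p1.m *. p2.m /. (d *. abs_float d)

let metric p pp l u =
  if l <= p.r && p.r < u then infinity else
  let sqr x = x *. x in
  let r = p.r and c = pp.r in
  let fmin = p.m *. pp.m /. sqr (r -. c) in
  let fmax =
    p.m *. pp.m /. (u -. l)
    *. ((c -. l) /. sqr (u -. r) +. (u -. c) /. sqr (l -. r)) in
  fmax -. fmin

let force p sys delta =
  let rec aux l u = function
    | Leaf ps -> List.fold_left (fun f p2 -> f +. force2 p p2) 0. ps
    | Node (left, pp, right) ->
        if metric p pp l u < delta then force2 p pp else
        let m = 0.5 *. (l +. u) in
        aux l m left +. aux m u right in
  aux sys.lower sys.upper sys.tree

(* Hypothetical hand-built system: the particles at 0.8 and 0.82 only. *)
let pp = { m = 2.; r = 0.81 }
let others =
  { lower = 0.5; upper = 1.;
    tree =
      Node (Leaf [], pp,
            Node (Node (Leaf [{ m = 1.; r = 0.8 }], pp,
                        Leaf [{ m = 1.; r = 0.82 }]),
                  pp, Leaf [])) }

let origin = { m = 3.; r = 0.1 }
let coarse = force origin others 10.   (* accepts the root pseudo-particle *)
let fine = force origin others 1e-9    (* descends to the real particles *)
```

With the loose tolerance the traversal stops at the root and returns the pseudo-particle estimate of about 11.9024, whereas the tight tolerance forces a descent to the leaves and returns the exact sum of about 11.9095.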

3.9. TREES

1092 t4

2

-2

-4

.-6

99

Figure 3.15: Measured performance of the tree-based approach relative to a simplearray-based approach for the evaluation oflong-range forces showing the resulting fractional error 8 = 10 - EllE vs time taken t = ttree/tarray relative to the array-basedmethod.

The tree representation of this particle system is easily constructed by folding our insertfunction over the array which was used to test the array_force function:

# let sys = Array. fold_left (fun s p -> insert p s) empty_sys sys;;val sys : system = ...

The tree-based force function can compute controllably accurate approximations to the force on an origin particle due to a collection of other particles, trading accuracy for performance. The following time function measures the time taken to compute the force to within the given permissible error e, returning a 2-tuple of the time taken and the answer obtained:

# let time e =
    let t = Sys.time () in
    let ans = force origin sys e in
    ((Sys.time ()) -. t, ans);;
val time : float -> float * float = <fun>

Applying this function with increasing permitted error results in a significant improvement in performance:

# time 1e-9;;
- : float * float = (0.490000000000000213, 2178953383.57115459)
# time 1e-6;;
- : float * float = (0.179999999999999716, 2178953383.57127047)
# time 1e-3;;
- : float * float = (0.0500000000000007105, 2178953383.58280039)

From measurements of real-time performance (illustrated in figure 3.15), when requiring a force computation accurate to one part in one million ($\log_2 \delta = -20$), the tree-based approach is approximately one thousand times faster ($\log_2 t \approx -10$) than the array-based approach. Considering that, even when using the array-based approach, such computations are inherently approximate, a fractional error of $10^{-6}$ is a small price to pay for three orders of magnitude improvement in performance.

The tree-based approach we have just described is a simple form of what is now known as the Fast Multipole Method (FMM) [7]. Before being applicable to most physical systems, the approaches we have described must be generalised to higher dimensionalities. This generalisation is most easily performed by increasing the branching factor of the tree from 2 to $2^d$ for a d-dimensional problem. A more powerful generalisation involves associating the branches of the binary tree with subdivision along a particular dimension (either implicitly, typically by cycling through the dimensions, or explicitly, by storing the index of the subdivided dimension in the node of the tree). In particular, this allows anisotropic subdivision of space, i.e. some dimensions can be subdivided more than others. Anisotropic subdivision is useful in the context of anisotropic particle distributions, such as those found in many astrophysical simulations. One such method of anisotropic subdivision is known as the k-D tree.

Chapter 4

Numerical Analysis

Computers can only perform finite computations. Consequently, computers only make use of finite precision representations of numbers. This has several important implications in the context of scientific computation.

This chapter provides an overview of the representations and properties of values of types int and float, used to represent members of the sets $\mathbb{Z}$ and $\mathbb{R}$, respectively. Practical examples demonstrating the robust use of floating-point arithmetic are then given. Finally, some other forms of arithmetic are discussed.

4.1 Number representation

In this section, we shall introduce the representation of integer and floating-point numbers before outlining some properties of these representations.

4.1.1 Integers

Positive integers are represented by a fixed number of their least-significant binary digits (bits). For example, the number 1 is represented by the bits ...00001 and the number 11 is represented by the bits ...01011. Negative integers are represented in two's-complement format. For example, the number -1 is represented by the bits ...11111 and the number -11 is represented by the bits ...10101.

Figure 4.1: Values i of the type int, called machine-precision integers, are an exact representation of a consecutive subset of the set of integers $i \in [l \ldots u] \subset \mathbb{Z}$ where l and u are given by min_int and max_int, respectively.

Figure 4.2: Values of the type float, called double-precision floating-point numbers, are an approximate representation of real-valued numbers, showing: a) full-precision (normalised) numbers (black), and b) denormalised numbers (red).

Consequently, the representation of integers $n \in \mathbb{Z}$ by values of the type int is exact within a finite range of integers (illustrated in figure 4.1). This range is platform specific and may be obtained as the min_int and max_int values in the Pervasives module. On a 32-bit platform, the range of representable integers is substantial:

# min_int, max_int;;
- : int * int = (-1073741824, 1073741823)

On a 64-bit platform, the range is even larger.

The binary representation of a value of type int may be obtained using the following function:

# let binary_of_int n =
    let rec aux i =
      let bit = if (n lsr i) land 1 = 0 then "0" else "1" in
      bit ^ (if i = 0 then "" else aux (i - 1)) in
    aux (Sys.word_size - 2);;
val binary_of_int : int -> string = <fun>

On a w-bit machine, an int may use w - 1 bits (the remaining bit is used by the garbage collector). This binary_of_int function contains a nested auxiliary function aux. The aux function considers each bit $i \in \{0 \ldots w - 2\}$, extracting the bit using the expression (n lsr i) land 1, and prepending a "0" or "1" onto the remaining computation. The aux function is initially called with i = w - 2, where w is given by Sys.word_size.

For example, the 31-bit binary representations of 11 and -11 are:

# binary_of_int 11;;
- : string = "0000000000000000000000000001011"
# binary_of_int (-11);;
- : string = "1111111111111111111111111110101"

As we shall see in this chapter, the exactness of the int type can be used in many ways.
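Because the range of int is finite, arithmetic that leaves this range wraps around silently rather than raising an exception. The following is a minimal sketch of an overflow-checked addition; the name safe_add is hypothetical and is not part of the standard library:

```ocaml
(* Hypothetical helper: add two ints, returning None on overflow. *)
let safe_add a b =
  let s = a + b in
  (* Overflow can only occur when both operands share a sign and the
     wrapped sum has the opposite sign. *)
  if (a >= 0) = (b >= 0) && (s >= 0) <> (a >= 0) then None else Some s

let () =
  (* Native int arithmetic is modular: max_int + 1 wraps to min_int. *)
  assert (max_int + 1 = min_int)
```

Returning an option keeps the check explicit at the call site; raising an exception would be an equally reasonable design.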

4.1.2 Floating-point numbers

In science, many important numbers are written in scientific notation. For example, Avogadro's number is conventionally written $N_A = 6.02214 \times 10^{23}$. This notation essentially specifies the two most important quantities about such a number:

1. the most significant digits called the mantissa, in this case 6.02214, and

2. the offset of the decimal point called the exponent, in this case 23.

Computers use a similar, finite representation called "floating point" which also contains a mantissa and exponent. In OCaml, floating-point numbers are represented by values of the type float. Roughly speaking, values of type int approximate real numbers between -max_int and max_int with a constant absolute error of $\pm\frac{1}{2}$ whereas values of the type float have an approximately-constant relative error.

In order to enter floating-point numbers succinctly, the OCaml language uses a standard "e" notation, equivalent to scientific number notation $a \times 10^b$. For example, the number $5.4 \times 10^{12}$ may be represented by the value:

# 5.4e12;;
- : float = 5.4e+12

As the name "floating point" implies, the use of a mantissa and an exponent allows the point to be "floated" to any of a wide range of offsets. Naturally, this format uses base-two (binary) rather than base-ten (decimal) and, hence, numbers are represented by the form $a \times 2^b$ where a is the mantissa and b is the exponent. Double-precision floating-point values consume 64 bits, of which 53 bits are attributed to the mantissa (52 stored mantissa bits plus one bit for the sign of the number) and the remaining 11 bits to the exponent.
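The bit layout just described can be inspected directly. The following is a minimal sketch, assuming the platform uses IEEE 754 double precision for float (as mainstream OCaml ports do); the name float_fields is hypothetical:

```ocaml
(* Split a float into its double-precision fields: 1 sign bit,
   11 exponent bits (biased by 1023) and 52 stored mantissa bits. *)
let float_fields x =
  let bits = Int64.bits_of_float x in
  let sign = Int64.to_int (Int64.shift_right_logical bits 63) in
  let exponent =
    Int64.to_int (Int64.logand (Int64.shift_right_logical bits 52) 0x7FFL) in
  let mantissa = Int64.logand bits 0xFFFFFFFFFFFFFL in
  (sign, exponent, mantissa)
```

For example, float_fields 1. gives (0, 1023, 0L): a clear sign bit, a biased exponent representing $2^0$ and an all-zero stored mantissa.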

Compared to the type int, the exponent in a value of type float allows a huge range of real-valued numbers to be approximated. As for the type int, this range is given by values in the Pervasives module:

# min_float, max_float;;
- : float * float = (2.22507385850720138e-308, 1.79769313486231571e+308)

Some useful values not in the set of real numbers $\mathbb{R}$ are also representable in floating-point number representation. Numbers out of range are expressed by the values -0. (negative zero), neg_infinity ($-\infty$) and infinity ($\infty$). For example, in floating-point arithmetic $-1/\infty = -0$:

# -1. /. infinity;;
- : float = -0.

Also, nan is a special value, reserved for calculations which do not return a real-valued number $x \in \mathbb{R}$, e.g. when a supplied parameter falls outside the domain of a function. For example, $\ln(-1) \notin \mathbb{R}$:

# log (-1.);;
- : float = nan

The domain of the function log in the Pervasives module is $0 \le x$, with log 0. evaluating to neg_infinity.

In particular, nan is the only float not equal to itself:

# nan <> nan;;
- : bool = true
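This property gives a simple test for nan. The following is a minimal sketch (the name is_nan is hypothetical); it also illustrates that the polymorphic compare function, unlike the = operator, treats nan as equal to itself:

```ocaml
(* nan is the only float for which x <> x holds. *)
let is_nan (x : float) = x <> x

let () =
  assert (is_nan (log (-1.)));
  assert (not (is_nan infinity));
  (* The = operator follows IEEE semantics ... *)
  assert (not (nan = nan));
  (* ... whereas compare defines a total order in which nan equals nan. *)
  assert (compare nan nan = 0);
  (* classify_float offers another way to detect nan. *)
  assert (classify_float nan = FP_nan)
```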

In the case of ln(-1), the implementation of complex numbers provided in the Complex module may be used to calculate the complex-valued result:

# Complex.log (Complex.neg Complex.one);;
- : Complex.t = {Complex.re = 0.; Complex.im = -3.14159265358979312}

As well as min_float, max_float, infinity, neg_infinity and nan, the Pervasives module also contains an epsilon_float value:

# epsilon_float;;
- : float = 2.22044604925031308e-16

This is the difference between one and the smallest representable float greater than one, so adding it to one does not give one:

# 1. +. epsilon_float;;
- : float = 1.00000000000000022

Consequently, the epsilon_float value is seen in the context of numerical algorithms as it encodes the accuracy of the mantissa in the floating point representation. In particular, the square root of this number often appears as the accuracy of numerical approximants computed using linear approximations (leaving quadratic terms as the largest remaining source of error). This still leaves a substantially accurate result, suitable for most computations:

# 1. +. sqrt epsilon_float;;
- : float = 1.00000001490116119
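The value of epsilon_float can also be recovered empirically. The following is a minimal sketch, assuming that float arithmetic is carried out in 64-bit double precision (extended-precision registers, discussed in section 4.2, could alter the result); the name machine_epsilon is hypothetical:

```ocaml
(* Halve a candidate until adding half of it to one no longer
   produces a result greater than one. *)
let machine_epsilon () =
  let rec aux eps =
    if 1. +. eps /. 2. > 1. then aux (eps /. 2.) else eps in
  aux 1.
```

Under the stated assumption, the loop stops at $2^{-52}$, which is the value of epsilon_float.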

The approximate nature of floating-point computations is often seen in simple calculations. For example, the evaluation of $\frac{1}{3}$ is only correct to 16 fractional digits:

# 1. /. 3.;;
- : float = 0.333333333333333315

In particular, the binary representation of floating-point numbers renders many decimal fractions approximate. For example, although 1 is represented exactly by the type float, the decimal fraction 0.9 is not:

# 1. -. 0.9;;
- : float = 0.0999999999999999778

Many of the properties of conventional algebra over real-valued numbers can no longer be relied upon when floating-point numbers are used as a representation. For more details, see the relevant literature [8].

4.2 Quirks

In the interests of efficiency, float arithmetic uses whatever form of floating-point arithmetic is provided in hardware and is closest to IEEE double-precision floating point.

However, x86 CPUs represent floating-point numbers in registers on the CPU using additional precision (80 bits instead of the usual 64). This can occasionally result in unexpected behaviour. For example, when compiled to byte-code or executed in the top-level, the following program prints zero as expected:

# let twothirds = 2. /. 3. in
  print_endline (string_of_float (2. /. 3. -. twothirds));;
0.
- : unit = ()

However, when compiled to x86 native-code using ocamlopt (version 3.08) this program produces a result close, but not equal, to zero:

3.70255041904e-17

This is a consequence of the value of the variable twothirds being stored as a 64-bit value in memory and the sub-expression 2. /. 3. and result being evaluated in 80-bit registers.

4.3 Algebra

In real arithmetic, addition is associative:

(a+b)+c=a+(b+c)

In general, this is not true in floating-point arithmetic. For example, in floating-point arithmetic $(0.1 + 0.2) + 0.3 \neq 0.1 + (0.2 + 0.3)$:

# (0.1 +. 0.2) +. 0.3 = 0.1 +. (0.2 +. 0.3);;
- : bool = false

In this case, approximate number representations have resulted in slightly different approximations to the exact answer:

# (0.1 +. 0.2) +. 0.3, 0.1 +. (0.2 +. 0.3);;
- : float * float = (0.600000000000000089, 0.6)

Hence, even in seemingly simple calculations, values of type float should not be compared for exact equality.
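One common alternative to exact equality is to compare within a tolerance. The following is a minimal sketch; the name approx_equal, the default tolerance of 1e-9 and the mixed absolute/relative scaling are all illustrative assumptions, and a suitable tolerance depends on the computation at hand:

```ocaml
(* Hypothetical helper: true when a and b agree to within tol,
   interpreted as an absolute tolerance for values of magnitude below
   one and as a relative tolerance otherwise. *)
let approx_equal ?(tol = 1e-9) a b =
  abs_float (a -. b) <= tol *. max 1. (max (abs_float a) (abs_float b))

let () =
  (* The two groupings of 0.1 + 0.2 + 0.3 now compare as equal. *)
  assert (approx_equal ((0.1 +. 0.2) +. 0.3) (0.1 +. (0.2 +. 0.3)))
```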

More significant errors are obtained when dealing with the addition and subtraction of numbers with wildly different exponents. For example, in real arithmetic $1.3 + 10^{15} - 10^{15} = 1.3$ but in the case of float arithmetic:

# 1.3 +. 1e15 -. 1e15;;
- : float = 1.25

The accuracy of this computation is limited by the accuracy of the largest magnitude numbers in the sum. In this case, these numbers are $10^{15}$ and $-10^{15}$, resulting in a significant error of 0.05.

The accuracy of calculations performed using floating-point arithmetic may often be improved by careful rearrangement of the expressions. Such rearrangements often result in more complicated expressions which are, therefore, slower to execute. For example, this form of the function f(x):

$$f_1(x) = \sqrt{1 + x} - 1$$

involves the subtraction of a pair of similar numbers when $x \approx 0$. This may be expressed in OCaml as:

# let f_1 x = sqrt (1. +. x) -. 1.;;
val f_1 : float -> float = <fun>

As expected, results of this function are significantly erroneous in the region $x \approx 0$. For example:

$$\sqrt{1 + 10^{-15}} - 1 \approx 4.99999999999999875\ldots \times 10^{-16}$$

# f_1 1e-15;;
- : float = 4.44089209850062616e-16

The $f_1$ function may be rearranged into a form which evades the subtraction of similar-sized numbers around $x \approx 0$:

$$f_2(x) = \frac{x}{1 + \sqrt{1 + x}}$$

This may be expressed in OCaml as:

# let f_2 x = x /. (1. +. sqrt (1. +. x));;
val f_2 : float -> float = <fun>

Although $f_1(x) = f_2(x)\ \forall x \in \mathbb{R}$, the $f_2$ form of the function is better behaved when evaluated using floating-point arithmetic, particularly in the region $x \approx 0$. For example, the value of the function at $x = 10^{-15}$ is much better approximated by $f_2$ than it was by $f_1$:

# f_2 1e-15;;
- : float = 4.9999999999999994e-16

This is particularly clear on a graph of the two functions around $x \approx 0$ (illustrated in figure 4.3).

Figure 4.3: Accuracy of two equivalent expressions when evaluated using floating-point arithmetic: a) $f_1(x) = \sqrt{1 + x} - 1$ (red line), and b) $f_2(x) = x/(1 + \sqrt{1 + x})$ (green line).

4.4 Interpolation

Due to the accumulation of round-off error, loops should not use loop variables of type float but, rather, use the type int and, if necessary, convert to the type float within the loop. Interpolation is an important example of this.

The following higher-order function tries to fold over an interpolation across a semi-inclusive range [l, u) making n applications of f(x) with:

$$x \in \{l, l + d, l + 2d, \ldots, u - d\}$$

where $d = (u - l)/n$:

# let interp f accu l u n =
    let d = (u -. l) /. float_of_int n in
    let rec aux accu x = if x < u then aux (f accu x) (x +. d) else accu in
    aux accu l;;
val interp : ('a -> float -> 'a) -> 'a -> float -> float -> int -> 'a = <fun>

However, this function makes inappropriate use of floating-point arithmetic. Specifically, the step size $d = (u - l)/n$ is precalculated and repeatedly added to the "loop variable" x. Consequently, this function is prone to cumulative errors in x.

Choosing a range and number of steps for which the floating-point representations happen to be exact, this function produces the desired behaviour. For example, with l = 0, u = 1 and n = 4, the function f is invoked exactly n = 4 times, as expected:

# interp (fun l x -> x :: l) [] 0. 1. 4;;
- : float list = [0.75; 0.5; 0.25; 0.]

However, when the required arithmetic does not happen to be exact, unexpected behaviour can arise. For example, with l = 0, u = 0.9 and n = 3, the function f is invoked four times instead of n = 3 times:

# interp (fun l x -> x :: l) [] 0. 0.9 3;;
- : float list = [0.899999999999999911; 0.6; 0.3; 0.]

In this case, the result of repeatedly adding the approximate representation of d to that of x, starting with x = l, produced an approximation which was slightly lower than u. Thus, the function f was erroneously applied an extra time, with an argument approximately equal to u = 0.9. This produced a list containing four elements instead of the expected three.

As such functionality is commonly required in scientific computing, a robust alternative must be found.

Fortunately, this problem is easily solved by resorting to an exact form of arithmetic for the loop variable, typically int arithmetic, and converting to floating-point representation at a later stage. For example, the interp function may be written robustly by using an integer loop variable i:

$$x(i) = l + \frac{i}{n}(u - l)$$

for $i \in \{0 \ldots n - 1\}$:

# let interp f accu l u n =
    let x i = l +. (float_of_int i) /. (float_of_int n) *. (u -. l) in
    let rec aux i accu = if i < n then aux (i + 1) (f accu (x i)) else accu in
    aux 0 accu;;
val interp : ('a -> float -> 'a) -> 'a -> float -> float -> int -> 'a = <fun>

Thanks to the use of an exact form of arithmetic, this function produces the desired behaviour:

# interp (fun l x -> x :: l) [] 0. 0.9 3;;
- : float list = [0.6; 0.3; 0.]
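As an illustration of how such a fold might be used, the following sketch repeats the robust interp function and builds a simple left Riemann sum on top of it. The riemann function and its name are hypothetical and are only one of many possible uses:

```ocaml
(* The robust, integer-indexed interp function from the text. *)
let interp f accu l u n =
  let x i = l +. (float_of_int i) /. (float_of_int n) *. (u -. l) in
  let rec aux i accu =
    if i < n then aux (i + 1) (f accu (x i)) else accu in
  aux 0 accu

(* Hypothetical example: left Riemann sum of g over [l, u) from n samples. *)
let riemann g l u n =
  let d = (u -. l) /. float_of_int n in
  d *. interp (fun sum x -> sum +. g x) 0. l u n
```

Because the loop is driven by an int, g is applied exactly n times regardless of rounding in l, u or d.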

We shall now conclude this chapter with two simple examples of the inaccuracy of floating-point arithmetic.

4.5 Quadratic solutions

The solutions of the quadratic equation $ax^2 + bx + c = 0$ are well known to be:

$$x_{1,2} = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$$

The root $\sqrt{b^2 - 4ac}$ may be productively factored out of these expressions:

$$y = \sqrt{b^2 - 4ac}$$

$$x_1 = \frac{y - b}{2a}$$

$$x_2 = -\frac{y + b}{2a}$$

These values are easily calculated using floating-point arithmetic:

# let quadratic a b c =
    let y = sqrt (b *. b -. 4. *. a *. c) in
    (-. b +. y) /. (2. *. a), (-. b -. y) /. (2. *. a);;
val quadratic : float -> float -> float -> float * float = <fun>

However, when evaluated using floating-point arithmetic, these expressions can be problematic. Specifically, when $b^2 \gg 4ac$, subtracting $4ac$ from $b^2$ in the subexpression $b^2 - 4ac$ will produce an inaccurate result approximately equal to $b^2$. This results in $-b + \sqrt{b^2 - 4ac}$ becoming equivalent to $-b + b$ and, therefore, an answer of zero.

For example, using the conditions $a = 1$, $b = 10^9$ and $c = 1$, the correct solutions are $x \approx -10^{-9}$ and $-10^9$ but the above implementation of the quadratic function rounds the smaller magnitude solution to zero:

# quadratic 1. 1e9 1.;;
- : float * float = (0., -1e+9)

The accuracy of the smaller-magnitude solution is most easily improved by calculating the smaller-magnitude solution in terms of the larger-magnitude solution, as:

$$x_1 = \begin{cases} -\dfrac{y + b}{2a} & b \ge 0 \\ \dfrac{y - b}{2a} & b < 0 \end{cases} \qquad x_2 = \frac{c}{x_1 a}$$

This formulation, which avoids the subtraction of similar values, may be written:

# let quadratic a b c =
    let y = sqrt (b *. b -. 4. *. a *. c) in
    let x1 = (if b < 0. then (y -. b) else -. (y +. b)) /. (2. *. a) in
    x1, c /. (x1 *. a);;
val quadratic : float -> float -> float -> float * float = <fun>

This form of the quadratic function is numerically robust, producing a more accurate approximation for the previous example:

# quadratic 1. 1e9 1.;;
- : float * float = (-1000000000., -1e-09)
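The robust formulation still assumes a genuinely quadratic equation with real roots. The following is a minimal sketch of a variant that makes those assumptions explicit; the name quadratic_roots and the choice of returning an option are illustrative and not taken from the text:

```ocaml
(* Hypothetical variant: returns None when a = 0 (not a quadratic) or
   when the discriminant is negative (no real roots). *)
let quadratic_roots a b c =
  let disc = b *. b -. 4. *. a *. c in
  if a = 0. || disc < 0. then None else
    let y = sqrt disc in
    (* Larger-magnitude root first, avoiding subtraction of similar values. *)
    let x1 = (if b < 0. then y -. b else -. (y +. b)) /. (2. *. a) in
    (* x1 = 0 only when b = c = 0, in which case both roots are zero. *)
    let x2 = if x1 = 0. then 0. else c /. (x1 *. a) in
    Some (x1, x2)
```

The guard on x1 = 0 avoids evaluating 0 / 0, which would otherwise give nan for equations such as $x^2 = 0$.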

Numerical robustness is required in a wide variety of algorithms. We shall now consider the evaluation of some simple quantities from statistics.

4.6 Mean and variance

In this section, we shall illustrate the importance of numerical stability using expressions for the mean and variance of a set of numbers.

The mean value $\bar{x}$ of a set of n numbers $x_k$ is given by:

$$\bar{x} = \frac{1}{n} \sum_{k=1}^{n} x_k$$

This expression may be computed by a function written in terms of a fold_left by accumulating the sum and number of the elements:


# let mean x =
    let (sum, n) =
      List.fold_left (fun (sum, n) e -> (sum +. e, n + 1)) (0., 0) x in
    sum /. float_of_int n;;
val mean : float list -> float = <fun>

For example, the mean of {1, 3, 5, 7} is $\frac{1}{4}(1 + 3 + 5 + 7) = 4$:

# mean [1.; 3.; 5.; 7.];;
- : float = 4.

Although the sum of a list of floating point numbers may be computed more accurately by accumulating numbers at different scales and then summing the result starting from the smallest scale numbers, the straightforward algorithm used by this mean function is often satisfactory. The same cannot be said of the straightforward computation of variance.
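For reference, one well-known way to sum more accurately is Kahan (compensated) summation. Note that this is a different technique from the scale-based accumulation mentioned above, shown here only as a hedged sketch; the name kahan_sum is hypothetical and the code assumes 64-bit double-precision arithmetic:

```ocaml
(* Kahan summation: c carries the rounding error lost by each addition
   so that it can be fed back into the next one. *)
let kahan_sum xs =
  let sum, _ =
    List.fold_left
      (fun (sum, c) x ->
         let y = x -. c in           (* apply the stored correction *)
         let t = sum +. y in         (* low-order bits of y may be lost *)
         (t, (t -. sum) -. y))       (* recover what was lost *)
      (0., 0.) xs in
  sum
```

The extra arithmetic makes each step a few times more expensive than a plain fold, which is why the straightforward sum is often preferred when it is accurate enough.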

The variance $\sigma^2$ of a set $x_k$ of numbers is typically written as a difference of sums.

Although variance is a strictly non-negative quantity, the subtraction of the sums in such an expression for the variance may produce small, negative results when computed directly using floating-point arithmetic, due to the accumulation of rounding errors. This problem can be avoided by computing via a recurrence relation [8]:

$$M_k = M_{k-1} + (x_k - M_{k-1})/k$$

$$S_k = S_{k-1} + (x_k - M_{k-1}) \times (x_k - M_k)$$

Thus, the variance may be computed more accurately using the following function:

# let variance =
    let aux (m_k, s_k, k) x_k =
      let m_k2 = m_k +. (x_k -. m_k) /. k in
      (m_k2, s_k +. (x_k -. m_k) *. (x_k -. m_k2), k +. 1.) in
    function [] -> invalid_arg "variance" | x1 :: t ->
      let (_, s, n2) = List.fold_left aux (x1, 0., 2.) t in
      s /. (n2 -. 2.);;
val variance : float list -> float = <fun>

The nested auxiliary function aux accumulates the recurrence relation when applied to fold_left. The body of the variance function is then a λ-function which tries to decapitate the given list. If the given list is empty then an Invalid_argument exception is raised. Otherwise, the first element of the given list is used to initialise the recurrence relation ($M_1 = x_1$, $S_1 = 0$ and $k = 2$) which is then executed over the remaining elements using fold_left, to return a better-behaved approximation to the variance $\sigma^2 \approx S_n/(n - 1)$.

For example, the variance of {1, 3, 5, 7} is $\sigma^2 = 6\frac{2}{3}$ and the variance function gives an accurate result:

# variance [1.; 3.; 5.; 7.];;
- : float = 6.66666666666666696
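As a check on the recurrence, it can be unrolled by hand for the set {1, 3, 5, 7}, starting from $M_1 = x_1 = 1$ and $S_1 = 0$. This worked example is an illustration added for clarity:

```latex
\begin{align*}
M_2 &= 1 + (3 - 1)/2 = 2, & S_2 &= 0 + (3 - 1)(3 - 2) = 2 \\
M_3 &= 2 + (5 - 2)/3 = 3, & S_3 &= 2 + (5 - 2)(5 - 3) = 8 \\
M_4 &= 3 + (7 - 3)/4 = 4, & S_4 &= 8 + (7 - 3)(7 - 4) = 20 \\
\sigma^2 &\approx S_4/(4 - 1) = 20/3 = 6\tfrac{2}{3}
\end{align*}
```

Every term added to $S_k$ is a product of two differences with the same sign, so the running sum never decreases, which is why the result cannot become negative.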

The numerical stability of this variance function allows us to write a function to compute the standard deviation $\sigma$ the obvious way, without having to worry about negative roots:

# let standard_deviation x = sqrt (variance x);;
val standard_deviation : float list -> float = <fun>

Clearly numerically stable algorithms which use floating-point arithmetic can be useful. We shall now examine some other forms of arithmetic.

4.7 Other forms of arithmetic

As we have seen, the int and float types in OCaml represent numbers to a fixed, finite precision. Although computers can only perform arithmetic on finite-precision numbers, the precision allowed and used could be extended indefinitely. Such representations are known as arbitrary-precision numbers.

In this section, we shall introduce arbitrary-precision rational and floating-point arithmetic as well as adaptive-precision arithmetic, which solves problems using as little extra precision as possible.

4.7.1 Rational arithmetic

Rationals, fractions of the form $p/q$ with $q > 0$ and $p \in \mathbb{Z}$, may be represented exactly using rational arithmetic. This form of arithmetic uses arbitrary precision integers to represent p and q.

Compared to the type float, rational arithmetic allows arbitrary precision to be used for any value in $\mathbb{R}$. The higher the precision, the slower the calculations.

Arbitrary-precision integer and rational arithmetic are implemented by the Num module (introduced in sections 2.6.1 and 2.7). We shall use the custom top-level built in section 2.7.

As we have seen, a factorial function may be written:

# open Num;;
# let rec factorial n =
    if n = 0 then Int 1 else Int n */ factorial (n - 1);;
val factorial : int -> Num.num = <fun>
# string_of_num (factorial 33);;
- : string = "8683317618811886495518194401280000000"

Rational arithmetic, represented by the constructor Ratio, may then be used to calculate an approximation to:

$$e = \sum_{i=0}^{\infty} \frac{1}{i!}$$

# let rec e n =
    if n = 0 then Int 1 else Int 1 // factorial n +/ e (n - 1);;
val e : int -> Num.num = <fun>

For example:

$$\sum_{i=0}^{17} \frac{1}{i!} = \frac{7437374403113}{2736057139200}$$

# string_of_num (e 17);;
- : string = "7437374403113/2736057139200"
# float_of_num (e 17);;
- : float = 2.71828182846
# exp 1.;;
- : float = 2.71828182845904509

Rational arithmetic can be useful in many circumstances, including geometric computations.

4.7.2 High-precision floating point

Rational arithmetic is ill-suited to calculations involving numbers with wildly varying magnitudes. In such cases, a form of high-precision floating-point arithmetic can be useful. Typically, this entails a representation with a controllably accurate mantissa and an arbitrary-precision integer exponent. This functionality is provided by the freely available GNU MP (GMP) library. OCaml bindings to GMP called mlgmp are freely available.

The OCaml bindings to the GNU MP library encapsulate the interface in a module Gmp. This module contains submodules F, Q and Z implementing arbitrary-precision floating-point, rational, and integer arithmetics, respectively.

A top-level which includes the functionality of mlgmp may be created using:

$ ocamlmktop -custom gmp.cma -o gmp.top

This top-level may then be used to execute code using the GMP library.

For example, consider computing $\sqrt{2}$ to 50 decimal places accuracy by applying the Newton-Raphson root-finding method to the function $f(x) = x^2 - 2$. We can begin by opening the namespace of the Gmp module:

# open Gmp;;

As we only wish to use high-precision floating-point arithmetic, we can productively replace the usual operators with their high-precision equivalents:

# let ( +. ), ( -. ), ( *. ), ( /. ) = F.add, F.sub, F.mul, F.div;;
val ( +. ) : Gmp.F.t -> Gmp.F.t -> Gmp.F.t = <fun>
val ( -. ) : Gmp.F.t -> Gmp.F.t -> Gmp.F.t = <fun>
val ( *. ) : Gmp.F.t -> Gmp.F.t -> Gmp.F.t = <fun>
val ( /. ) : Gmp.F.t -> Gmp.F.t -> Gmp.F.t = <fun>

The default precision (for new numbers) can be set in terms of the number of bits accuracy, with 100 digits being $\log_2 10^{100} \approx 332$:

# F.default_prec := 332;;
- : unit = ()

A generic Newton-Raphson method may be implemented as a higher-order function which accepts the function $f : \mathbb{R} \to \mathbb{R}$, derivative function $f' : \mathbb{R} \to \mathbb{R}$, initial estimate of the root x and number of iterations n:

# let rec newton_raphson f f' x n =
    if n = 0 then x else
      newton_raphson f f' (x -. ((f x) /. (f' x))) (n - 1);;
val newton_raphson :
  (Gmp.F.t -> Gmp.F.t) -> (Gmp.F.t -> Gmp.F.t) -> Gmp.F.t -> int -> Gmp.F.t =
  <fun>

In order to find $\sqrt{2}$, we use $f(x) = x^2 - 2$ and, therefore, $f'(x) = 2x$:

# let f x = x *. x -. F.from_float 2.;;
val f : Gmp.F.t -> Gmp.F.t = <fun>
# let f' x = x *. F.from_float 2.;;
val f' : Gmp.F.t -> Gmp.F.t = <fun>

Computing the root of f(x) using only 7 iterations produces a result accurate to 50 digits, represented by a value of the abstract type F.t:

# let x = newton_raphson f f' (F.from_float 1.) 7;;
val x : Gmp.F.t = <abstr>

This value may be converted into a string in the given base with the given number of digits using the to_string_base_digits function in the F submodule of Gmp¹:

# F.to_string_base_digits ~base:10 ~digits:50 x;;
- : string = "1.4142135623730950488016887242096980785696718753769E0"

Although such high-precision arithmetic can be used to achieve the necessary precision, robust algorithms using ordinary floating-point arithmetic, or adaptive-precision, are likely to be much more efficient.
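For comparison, the same Newton-Raphson scheme can be run with the ordinary float type, without rebinding the arithmetic operators. The following is a minimal sketch that does not use the Gmp module; the name root2 is hypothetical:

```ocaml
(* The same higher-order function, here over ordinary floats. *)
let rec newton_raphson f f' x n =
  if n = 0 then x else
    newton_raphson f f' (x -. f x /. f' x) (n - 1)

(* Approximate the square root of two from f(x) = x^2 - 2, f'(x) = 2x. *)
let root2 =
  newton_raphson (fun x -> x *. x -. 2.) (fun x -> 2. *. x) 1. 7
```

With ordinary floats, the result is limited to roughly 16 significant digits however many iterations are used.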

4.7.3 Adaptive precision

Some problems can use ordinary precision floating-point arithmetic most of the time and resort to higher-precision arithmetic only when necessary. This leads to adaptive-precision arithmetic, which uses fast, ordinary arithmetic where possible and resorts to suitable higher-precision arithmetic only when required.

Geometric algorithms are an important class of such problems and a great deal of interesting work has been done on this subject [9]. This work is likely to be of relevance to scientists studying the geometrical properties of natural systems.

¹ This uses named arguments (of the form ~name:value) which are described in section A.2.

Chapter 5

Input and Output

In this chapter, we examine the various ways in which an OCaml program can transfer data, including printing to the screen and saving and loading information on disc. In particular, we examine some sophisticated tools for OCaml which greatly simplify the task of designing and implementing programs to load files in well-defined formats.

5.1 Printing to screen

The ability to print information on the screen is useful as a means of conveying the result of a program (if the result is simple enough) and providing run-time information on the current state of the program as well as providing extra information to aid debugging. Naturally, OCaml provides several functions to print to the screen¹ which are in the Pervasives module.

The print_string function can be considered the most primitive function for printing to the screen. This function prints the given string with no carriage return. For example, using the print_string function to print the string "Hello " and then the string² "world!\n" is equivalent to just printing the string "Hello world!\n":

# print_string "Hello ";
  print_string "world!\n";;
Hello world!
- : unit = ()

Built-in types can be converted to strings using functions such as string_of_int. To save typing, abbreviated printing functions also exist for built-in types:

# print_float 1.3;;
1.3- : unit = ()

A carriage return can be printed using the print_newline function:

# print_newline ();;

- : unit = ()

¹ In Unix systems, these functions send character data to standard output (stdout) which is displayed on the console by default but which can be piped into a file instead, if desired.

² The character '\n' represents a new line.

The functionality of printing can only be provided by way of a side-effect and, therefore, printing must be performed by functions. Accidentally omitting the unit argument to the print_newline function is a common mistake which results in the function being returned without printing anything:

# print_newline;;
- : unit -> unit = <fun>

A string can be printed with a terminating new line using the print_endline function:

# print_endline "Hello world!";;
Hello world!
- : unit = ()

In addition to printing, the print_newline and print_endline functions also force any previous printing to be completed. This is known as flushing the output stream. Failing to flush the output stream, particularly when printing debugging information, can be a source of confusion. A stream may be flushed explicitly using the flush function in the Pervasives module:

# flush stdout;;
- : unit = ()

Data can be read from or saved into a file on disc in much the same way as printing to the screen.

5.2 Reading from and writing to disc

The act of saving data in a file is performed by opening the file, writing data to it and then closing the file. We shall now examine the basic syntax for performing these operations.

Files can be opened for reading or writing using the open_in and open_out functions, respectively. For example, the following opens a file (either replacing an existing file or creating a new file) called "test.txt" for output, referring to the result (known as a file handle) as handle:

# let handle = open_out "test.txt";;
val handle : out_channel = <abstr>

The resulting file handle can then be passed to the output_string function in order to write into the file. For example, the following outputs the string "Hello world!" to the file "test.txt":

# output_string handle "Hello world!";;
- : unit = ()

Files will be closed automatically when the handle is garbage collected but can also be closed explicitly using the close_in and close_out functions. For example, the following closes the "test.txt" file:

# close_out handle;;
- : unit = ()

A file might be closed explicitly to ensure that the file is closed before being reopened, e.g. when writing to and subsequently reading from the same file.

We can load the contents of this file using equivalent functions:

# let h = open_in "test.txt" in
  let s = input_line h in
  close_in h;
  s;;
- : string = "Hello world!"

The functions for reading from disc may also be used to read from the command-line by supplying the input channel stdin.
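The open, write or read, then close pattern described above can be packaged into small helper functions. The following is a minimal sketch; the names write_lines and read_lines are hypothetical, and the exception branch of match is assumed to be available (OCaml 4.02 or later):

```ocaml
(* Write each string in lines to filename, one per line. *)
let write_lines filename lines =
  let oc = open_out filename in
  List.iter (fun s -> output_string oc s; output_char oc '\n') lines;
  close_out oc

(* Read filename back as a list of lines, closing the channel explicitly
   once End_of_file is reached. *)
let read_lines filename =
  let ic = open_in filename in
  let rec aux acc =
    match input_line ic with
    | line -> aux (line :: acc)
    | exception End_of_file -> close_in ic; List.rev acc in
  aux []
```

Closing the channels explicitly, rather than relying on the garbage collector, ensures that written data is flushed before the file is reopened for reading.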

In addition to these functions, and equivalent functions dealing with integers, bytes and characters, OCaml provides functions for performing generic input and output, a technique known as marshalling.

5.3 Marshalling

A very powerful pair of functions capable of saving and loading any value of any type are also provided by OCaml. These functions are input_value and output_value. For example, the following outputs data representing a 3-tuple of type int * float * string list into a file "test.dat":

# output_value (open_out_bin "test.dat") (1, 3., ["piece"]);;
- : unit = ()

The input_value and output_value functions offer some sophistication in that they automatically detect and handle cyclic and shared data structures. However, they are not type safe. Consequently, when values are read back, their type must be specified explicitly and correctly. For example, the 3-tuple written to the file "test.dat" may be read back in using the input_value function by explicitly declaring the type of the result:

# let a : int * float * string list = input_value (open_in_bin "test.dat");;
val a : int * float * string list = (1, 3., ["piece"])
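The two steps can be wrapped into a pair of helpers that also close their channels. The following is a minimal sketch; the names save and load are hypothetical, and the caller remains responsible for annotating the type of the loaded value correctly:

```ocaml
(* Marshal any value v into filename, opened in binary mode. *)
let save filename v =
  let oc = open_out_bin filename in
  output_value oc v;
  close_out oc

(* Unmarshal a value from filename. The result type is unchecked, so
   the caller must constrain it with a type annotation. *)
let load filename =
  let ic = open_in_bin filename in
  let v = input_value ic in
  close_in ic;
  v

let () =
  let file = Filename.temp_file "marshal" ".dat" in
  save file (1, 3., ["piece"]);
  let (a : int * float * string list) = load file in
  assert (a = (1, 3., ["piece"]));
  Sys.remove file
```

Annotating the binding at the point of use, as in the final lines, keeps the unsafe step visible in the calling code.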

Note that the open_out_bin and open_in_bin functions were used to ensure that the file was opened in binary mode³. Although these functions are extremely useful, they count among the more developmental aspects of the language and, therefore, are likely to change in the future. In particular, the binary format is likely to change, potentially rendering old data unreadable. Hence, the input_value and output_value functions are most useful for storing data temporarily, for example in order to allow a program to be stopped and started without losing its intermediate data.

³ This distinction is only important when the operating system makes a distinction, e.g. under Microsoft Windows.

[Figure 5.1 diagram: Characters -> (lexing) -> Tokens -> (parsing) -> Abstract Syntax Tree. Example token stream: ... IDENT "x", ASSIGN, INTEGER 1, PLUS, INTEGER 2 ...]

Figure 5.1: Parsing character sequences often entails lexing into a token stream and then parsing to convert patterns of tokens into grammatical constructs represented hierarchically by a tree data structure.
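The need for an explicit and correct type annotation when reading marshalled data can be sketched as follows. The helper names save and load and the temporary file are assumptions introduced for this illustration; only output_value, input_value, open_out_bin and open_in_bin come from the text.

```ocaml
(* Save a value to [file] in binary mode, closing the handle so that the
   marshalled bytes are flushed to disc. *)
let save file v =
  let oc = open_out_bin file in
  output_value oc v;
  close_out oc

(* Load a value from [file]. The result type is whatever the caller asks
   for, so the caller is responsible for annotating it correctly. *)
let load file =
  let ic = open_in_bin file in
  let v = input_value ic in
  close_in ic;
  v

let () =
  let file = Filename.temp_file "test" ".dat" in
  save file (0, 3., ["piece"]);
  (* The annotation is the only thing telling the compiler what was stored. *)
  let (a : int * float * string list) = load file in
  Sys.remove file;
  let n, x, l = a in
  Printf.printf "%d %g %s\n" n x (String.concat "," l)
```

Annotating the result of load with any other type would still compile, which is exactly why these functions are not type safe.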

We shall now examine a very sophisticated, and yet safe and easy to use, method for inputting data from more exotic formats.

5.4 Lexing and Parsing

In addition to the primitive input and output functions offered by OCaml, the language is bundled with very powerful tools, called ocamllex and ocamlyacc, for deciphering the content of files according to a formal grammar. This aspect of the language has been particularly well honed due to the widespread use of this family of languages for writing compilers and interpreters, i.e. programs which understand, and operate on, other programs.

In the context of scientific computing, providing data in a human readable format which has a formally defined grammar is highly desirable. This allows a data file to convey useful information both to a human and to a computer audience. In the case of a human, the file can be read using a text editor. In the case of a computer, the file can be parsed to produce a data structure which reflects the information in the file (illustrated in figure 5.1). In the latter case, the program could then go on to perform computations on the data and also, possibly, save the data in the same format.


The ability to use ocamllex and ocamlyacc is, therefore, likely to be of great benefit to scientists. We shall now examine the use of these tools in more detail.

5.4.1 Lexing

The first step in using these tools to interpret a file is called lexing. This stage involves reading characters from the input, matching them against patterns and outputting a stream of tokens. A token is a value which represents some of the input. For example, a sequence of space-separated digits could be lexed into a stream of integer-valued tokens.

In order to produce tokens, patterns are spotted in the characters by matching them against regular expressions, also known as regexps. Analogously to pattern matches, regexps given to ocamllex may contain several kinds of structure. Many of the kinds of constructs available to a regexp for ocamllex are identical to those used in OCaml pattern matching. Specifically:

'x' match the specified, single character.

_ match any single character.

"string" match the given string of characters.

Several other constructs are specific to regexps:

[ 'a' 'c' 'e' ] match any character in the given set. Within a set, ranges of consecutive characters may be specified using the shorthand notation 'a'-'c' for 'a', 'b' or 'c'.

[^ 'a' 'c' 'e' ] match any character not in the given set.

regexp * match zero or more repetitions of a string matching regexp.

regexp + match one or more repetitions of a string matching regexp.

regexp ? match regexp or the empty string.

regexp1 # regexp2 match any character in the character set regexp1 which is not in the character set regexp2 (both operands must be character sets).

regexp1 | regexp2 match any string which either matches regexp1 or matches regexp2.

regexp1 regexp2 concatenate a string matching regexp1 with a string matching regexp2.

eof match the end of file.

Before delving into the use of ocamllex in writing lexers, some example regular expressions used to match practically important character sequences will be of use. String representations of integers are very simple, consisting only of one or more decimal digits. This is easily represented by the regular expression:

['0'-'9']+

Note that, for this and most other regular expressions to work, the lexer must be greedy. In this case, given a sequence of digits, the lexer will match them all to this regexp, rather than matching only the first digit.
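Greedy matching of the regexp ['0'-'9']+ can be illustrated by a small hand-written function. This is only a sketch of the idea, not how ocamllex works internally; the names is_digit, skip_digits and match_integer are hypothetical.

```ocaml
let is_digit c = '0' <= c && c <= '9'

(* Index just past the longest run of digits in [s] starting at [i].
   Returning [i] itself means that no digit was found. *)
let rec skip_digits s i =
  if i < String.length s && is_digit s.[i] then skip_digits s (i + 1) else i

(* Greedy match of digit+ at position [i]: the whole run is consumed,
   never just its first character. *)
let match_integer s i =
  let j = skip_digits s i in
  if j > i then Some (String.sub s i (j - i)) else None

let () =
  match match_integer "1234 56" 0 with
  | Some lexeme -> print_endline lexeme
  | None -> print_endline "no match"
```

A non-greedy matcher would be free to stop after the first digit and so split "1234" into four separate integers.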

String representations of floating-point numbers are somewhat more adventurous. An initial attempt at a regular expression might match a sequence of digits followed by a full-stop followed by another sequence of digits:

['0'-'9']+ '.' ['0'-'9']+

This will match "12.3" and "1.23" correctly but will fail to match "123." and ".123". These can be matched by splitting the regular expression into two variants, one which allows zero or more digits before the full-stop and one which allows zero or more digits after the full-stop:

['0'-'9']+ '.' ['0'-'9']* | ['0'-'9']* '.' ['0'-'9']+

Before continuing, we can usefully factor this regexp using a let construct provided by ocamllex for dealing with regexps:

let digit = ['0'-'9']
digit+ '.' digit* | digit* '.' digit+

As we have already seen, a conventional notation (e.g. 1,000,000,000,000=1e12) exists for decimal exponents. The exponent portion of the string ("e12") may be represented by the regexp:

let exponent = ['e' 'E'] ['+' '-']? digit+

Thus, a regular expression matching positive floating-point numbers represented as strings may be written:

let digit = ['0'-'9']
let exponent = ['e' 'E'] ['+' '-']? digit+
(digit+ '.' digit* | digit* '.' digit+) exponent?

On the basis of these example regular expressions for integer and floating-point number representations, we shall now develop a lexer. A file giving the description of a lexer for ocamllex has the suffix ".mll". Although lexer definitions depend upon external declarations, we shall examine the description of the lexer first. Specifically, we shall consider the file "myLexer.mll":


{
  open MyParser
  let line = ref 1
}

let digit = ['0'-'9']
let exponent = ['e' 'E'] ['+' '-']? digit+
let floating = (digit+ '.' digit* | digit* '.' digit+) exponent?

rule token = parse
  | [' ' '\t'] { token lexbuf }
  | '\n'       { incr line; CR }
  | floating   { REAL (float_of_string (Lexing.lexeme lexbuf)) }
  | digit+     { INTEGER (int_of_string (Lexing.lexeme lexbuf)) }
  | eof        { EOF }
  | _          { failwith ("Mistake at line " ^ string_of_int !line) }

A lexer description for ocamllex begins with a header of ordinary OCaml code enclosed in curly braces. In this case, the header opens the namespace of the MyParser module (which we have yet to define) and creates a mutable variable line to keep track of the line number. The header is followed by let constructs which build regular expressions. Finally, the guts of the lexer appear as a rule called token which parses a sequence of characters, matching them against regular expressions and executing corresponding actions, most of which produce tokens.

The first rule matches spaces and tabs, simply absorbing this "whitespace" by recursively calling the token rule. The second rule matches the new-line character, incrementing the line count and generating the CR token. The third rule matches the string representation of a floating point number. The lexbuf variable contains the current state of the lexer; the Lexing.lexeme function extracts the matched string from lexbuf. This string is used to generate a REAL token containing the corresponding value of type float using the float_of_string function. The fourth rule matches the string representation of an integer, generating an INTEGER token containing the corresponding value of type int. The fifth rule matches the end-of-file marker, generating the EOF token. Finally, the sixth rule is a catch-all which matches any other character sequences and raises an exception containing the line number of the invalid input.
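To make the behaviour of these six rules concrete, the following is a hand-written sketch of an equivalent tokeniser working on a string. It is not the code that ocamllex generates (which is table-driven); the function names are hypothetical, the token type is redeclared locally so that the example is self-contained, and the exponent is assumed to have an optional sign.

```ocaml
type token = CR | REAL of float | INTEGER of int | EOF

let is_digit c = '0' <= c && c <= '9'

(* Index just past the longest run of digits in [s] starting at [i]. *)
let rec skip_digits s i =
  if i < String.length s && is_digit s.[i] then skip_digits s (i + 1) else i

(* Length of the longest prefix of [s] from [i] matching
   (digit+ '.' digit* | digit* '.' digit+) exponent?, or 0 if none. *)
let match_floating s i =
  let n = String.length s in
  let dot = skip_digits s i in
  if dot >= n || s.[dot] <> '.' then 0
  else begin
    let frac_end = skip_digits s (dot + 1) in
    if dot = i && frac_end = dot + 1 then 0 (* a lone '.' has no digits *)
    else begin
      (* Optional exponent: ['e' 'E'] ['+' '-']? digit+ *)
      let stop =
        if frac_end < n && (s.[frac_end] = 'e' || s.[frac_end] = 'E') then begin
          let d =
            if frac_end + 1 < n
               && (s.[frac_end + 1] = '+' || s.[frac_end + 1] = '-')
            then frac_end + 2
            else frac_end + 1 in
          let e = skip_digits s d in
          if e > d then e else frac_end
        end else frac_end in
      stop - i
    end
  end

(* Convert a whole string into a list of tokens, mimicking the six rules. *)
let tokenize s =
  let line = ref 1 in
  let rec go i acc =
    if i >= String.length s then List.rev (EOF :: acc)
    else
      match s.[i] with
      | ' ' | '\t' -> go (i + 1) acc
      | '\n' -> incr line; go (i + 1) (CR :: acc)
      | _ ->
        let f = match_floating s i and d = skip_digits s i - i in
        if f > 0 then
          (* A floating match always extends past the digits, so it wins. *)
          go (i + f) (REAL (float_of_string (String.sub s i f)) :: acc)
        else if d > 0 then
          go (i + d) (INTEGER (int_of_string (String.sub s i d)) :: acc)
        else failwith ("Mistake at line " ^ string_of_int !line)
  in
  go 0 []

let string_of_token = function
  | CR -> "CR"
  | REAL x -> "REAL " ^ string_of_float x
  | INTEGER n -> "INTEGER " ^ string_of_int n
  | EOF -> "EOF"

let () =
  List.iter (fun t -> print_endline (string_of_token t)) (tokenize "4 5 6 7.9\n")
```

Writing the rules out by hand shows how much book-keeping the regexp notation of ocamllex saves.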

This lexer can be compiled into an OCaml program using the ocamllex compiler. For example, from the Unix shell:

$ ocamllex myLexer.mll
12 states, 322 transitions, table size 1360 bytes

The resulting OCaml source code which implements a lexer of this description is placed in the "myLexer.ml" file. Before compiling this file, we must create the MyParser module which it depends upon.

In many cases, a parser using a lexer would itself be generated from a parser description, using the ocamlyacc compiler. We shall describe this approach in the next section but, before this, we shall demonstrate how the functionality of a generated lexer may be exploited without using ocamlyacc.


Before compiling the OCaml program "myLexer.ml", which implements our lexer, we must create the MyParser module which it depends upon:

# module MyParser =
    struct
      type token = CR | REAL of float | INTEGER of int | EOF

      let rec main lexer lexbuf = match lexer lexbuf with
        | CR ->
            print_endline "CR";
            main lexer lexbuf
        | INTEGER n ->
            print_endline ("INTEGER " ^ string_of_int n);
            main lexer lexbuf
        | REAL x ->
            print_endline ("REAL " ^ string_of_float x);
            main lexer lexbuf
        | EOF -> ()
    end;;
module MyParser :
  sig
    type token = CR | REAL of float | INTEGER of int | EOF
    val main : ('a -> token) -> 'a -> unit
  end

Note that the CR, REAL, INTEGER and EOF tokens used by the lexer are actually nothing more than constructors of a variant type, in this case the MyParser.token type. Having defined the MyParser module and, in particular, the MyParser.token variant type, we can include the functionality of the lexer into the top-level using the #use directive⁴:

# #use "myLexer.ml";;

⁴ The top-level spits out a great deal of superfluous output, which we have not included here.

Finally, we can try our lexer by passing an entry point into the lexer and a buffer for the lexer to the MyParser.main function. The entry point for our lexer is the only rule, token, represented by the token function (which would be called MyLexer.token had the lexer been compiled as a separate module). A buffer for the lexer can be created using the Lexing.from_channel function from the OCaml standard library. Using standard input, we have:

# MyParser.main token (Lexing.from_channel stdin);;

The lexer is now being used to interpret standard input. Typing:

4 5 6 7.9

results in MyParser.main printing:

INTEGER 4
INTEGER 5
INTEGER 6
REAL 7.9
CR

Typing:

Rubbish!

results in:

Exception: Failure "Mistake at line 2".


The capabilities of a lexer can clearly be useful in a stand-alone configuration. In particular, programs using lexers, such as that we have just described, will validate their input to some degree. In contrast, many current scientific applications silently produce erroneous results. However, the capabilities of a lexer can be greatly supplemented by an associated parser, as we shall now demonstrate.

5.4.2 Parsing

The parsing stage of interpreting input converts the sequence of tokens from a lexer into a hierarchical representation (illustrated in figure 5.1) - the abstract syntax tree (AST). This is performed by accumulating tokens from the lexer either until a valid piece of grammar is recognised and can be acted upon, or until the sequence of tokens is clearly invalid, in which case a Parsing.Parse_error exception is raised.

The ocamlyacc compiler can be used to create OCaml programs which implement a specified parser. The specification for a parser is given by a description in a file with the suffix ".mly". Formally, these parsers implement LALR(1) grammars described by rules provided in Backus-Naur form (BNF).

For example, consider a parser, based upon the lexer described in the previous section, which interprets textual data files. These files are expected to contain an integer followed by three floating-point numbers on each line, for an unknown number of lines. This format might be used to represent the chemical element and three dimensional coordinate of each atom in a molecule.

A program to parse these files can be generated by ocamlyacc from a grammar which we shall now describe. Given a file "name.mly" describing a grammar, by default ocamlyacc produces a file "name.ml" containing an OCaml program implementing that grammar, together with its interface "name.mli". Therefore, in order to generate a MyParser module for the lexer, we shall place our grammar description in a "myParser.mly" file.

Grammar description files begin by listing the definitions of the tokens which the lexer may generate. In this case, the possible tokens are CR, EOF, INTEGER and REAL.

Tokens called NAME1, NAME2 and so on, which carry no associated values, may be defined by:

%token NAME1 NAME2 ...

A token called NAME which carries an associated value of type type is defined by:

%token <type> NAME

Thus, the tokens for our lexer can be defined as:

%token CR EOF
%token <int> INTEGER
%token <float> REAL

The token definitions are followed by a declaration of the entry point into the parser (i.e. the rule of the parser used to start the parsing process). For reasons that will become clear, we shall use:

%start main

This is followed by a declaration of the type returned by the action corresponding to the entry point. In this case:

%type <(int * float * float * float) list> main

Before the rules and corresponding actions describing the grammar and parser are given, the token and entry-point definitions are followed by a separator:

%%

The guts of the parser are represented by a sequence of groupings of rules and their corresponding actions:

group:
  | rule1
      { action1 }
  ...
  | rulen
      { actionn }
  ;

A grouping represents several possible grammatical constructs, all of which are used to produce OCaml values of the same type, i.e. the types of the expressions action1 to actionn must be the same. Rules are simply a list of expected tokens and groupings. In particular, rules may be recursive, i.e. they may refer to themselves, which is useful when building up arbitrarily long lists.

Our parser will begin with a description of the expected contents of a line of input. We shall call this single-rule group atom:


atom:

This group contains a single rule - the expected contents of a line of input:

  | INTEGER REAL REAL REAL

The action corresponding to this rule will convert the matched tokens into an OCaml data structure (which will become a branch of the resulting AST). In this case, the four matched tokens will be converted into a 4-tuple int * float * float * float containing the contents of each token. In general, the data associated with the i-th token of a rule's pattern may be referred to using the notation $i in the corresponding action. Thus, our action is simply:

{ ($1, $2, $3, $4) };

Our parser will only contain one other group, main, which we defined as the entry-point into the parser when we specified %start main. This group contains two rules, the actions of which return the type (int * float * float * float) list. The first rule handles a line of input containing an atom followed by the remainder of the file, prepending the 4-tuple generated for the atom onto the list generated by parsing the remainder of the input. The second rule matches the end of the input, producing the empty list:

main:
  | atom CR main
      { $1 :: $3 }
  | EOF
      { [] }
  ;

Note that a semicolon terminator is placed at the end of each group.
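The meaning of the atom and main groups can be sketched as a pair of ordinary recursive functions over a list of tokens. This is a hand-written recursive-descent illustration under the assumption that the tokens are already available as a list; ocamlyacc itself generates a table-driven LALR(1) parser, and the names parse_atom and parse_main are hypothetical.

```ocaml
type token = CR | REAL of float | INTEGER of int | EOF

(* atom: INTEGER REAL REAL REAL { ($1, $2, $3, $4) }
   Returns the 4-tuple together with the tokens that remain. *)
let parse_atom = function
  | INTEGER n :: REAL x :: REAL y :: REAL z :: rest -> ((n, x, y, z), rest)
  | _ -> raise Parsing.Parse_error

(* main: atom CR main { $1 :: $3 }
       | EOF          { [] } *)
let rec parse_main = function
  | EOF :: _ -> []
  | tokens ->
    (match parse_atom tokens with
     | atom, CR :: rest -> atom :: parse_main rest
     | _ -> raise Parsing.Parse_error)

let () =
  let atoms =
    parse_main [INTEGER 1; REAL 5.4; REAL 3.9; REAL 3.7; CR; EOF] in
  List.iter
    (fun (n, x, y, z) -> Printf.printf "%d %g %g %g\n" n x y z)
    atoms
```

The recursion in parse_main mirrors the recursive rule of the main group: each atom is prepended to the list obtained from the rest of the input.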

An OCaml program "myParser.ml" implementing this parser, described in this "myParser.mly" file, may be compiled using the ocamlyacc program:

$ ocamlyacc myParser.mly

The MyLexer and MyParser modules can then be compiled using the usual commands:

$ ocamlc -c myParser.mli
$ ocamlc -c myParser.ml
$ ocamlc -c myLexer.ml

Having compiled the lexer and parser into byte-code, we can create a custom top-level which includes their functionality:

$ ocamlmktop myParser.cmo myLexer.cmo -o myparser.top


We can now demonstrate this lexer and parser working on an example file called "test.dat" which contains⁵:

1 5.4 3.9 3.7
12 5.9 4.2 3.1
1 5.4 3.9 2.5

Running the custom top-level allows us to play with the lexer and parser. The following parses the "test.dat" file:

$ ./myparser.top
# let lexbuf = Lexing.from_channel (open_in "test.dat") in
  MyParser.main MyLexer.token lexbuf;;
- : (int * float * float * float) list =
[(1, 5.4, 3.9, 3.7); (12, 5.9, 4.2, 3.1); (1, 5.4, 3.9, 2.5)]

Thus our lexer and parser have worked together to interpret the integer and floating-point numbers contained in this file as well as the structure of the file in order to convert the information contained in the file into a data structure which can then be manipulated by further computation.

Moreover, the lexer and parser help to validate input. For example, a file which erroneously contains letters is caught by the lexer. In this case, the lexer will raise an exception which contains a reference to the line number at which the error was noticed, as demonstrated in the preceding section. A file which erroneously contains a floating-point value where an integer was expected will be lexed into tokens without fault but the parser will spot the grammatical error and raise the Parsing.Parse_error exception.

In an application, the call to the main function in the MyParser module can be wrapped in a try ... with ... to catch the Parsing.Parse_error exception and handle it, most likely making use of the line variable in the MyLexer module to obtain the line number at which the grammatical error was noticed.
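Such a wrapper might look like the following sketch. The function parse_or_report is hypothetical: its parse argument stands for an entry point already applied to its lexer (e.g. MyParser.main MyLexer.token) and its line argument for the lexer's line counter. The Ok and Error constructors assume a reasonably recent version of OCaml.

```ocaml
(* Run [parse] on [lexbuf], turning both kinds of failure described in the
   text into an error message instead of an uncaught exception. *)
let parse_or_report ~line parse lexbuf =
  try Ok (parse lexbuf) with
  | Parsing.Parse_error ->
    (* Grammatical error spotted by the parser. *)
    Error ("Syntax error at line " ^ string_of_int !line)
  | Failure msg ->
    (* Invalid characters spotted by the lexer's catch-all rule. *)
    Error msg

let () =
  (* A stand-in parser that always fails, used only to exercise the handler. *)
  let failing _ = raise Parsing.Parse_error in
  match parse_or_report ~line:(ref 3) failing (Lexing.from_string "") with
  | Ok () -> print_endline "parsed"
  | Error msg -> print_endline msg
```

Returning a result rather than re-raising leaves the decision about how to report the error to the calling code.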

Currently, the input is expected to end with a new line. In practice, it may be useful to relax the parser to also allow a file ending directly with an EOF, without the CR. This may be done by supplementing the main group with an extra rule:

main:
  | atom CR main
      { $1 :: $3 }
  | atom EOF
      { [$1] }
  | EOF
      { [] }
  ;

As we have seen, the ocamllex and ocamlyacc compilers can be indispensable in both designing and implementing programs which use a non-trivial file format. These tools are likely to be of great benefit to scientists wishing to create unambiguous, human-readable formats.

⁵ Note that the last line ends with a new line before the EOF.

Chapter 6

Visualization

The ability to visualise problems and data can be of great use when trying to understand difficult concepts and, hence, can be of great use to scientists. Perhaps the most obvious use of computer graphics is the visualisation of atomic and molecular systems. However, a great many other problems can also be elucidated through the use of real-time graphics. In particular, the ability to render animated 2D and 3D graphics can be exploited to improve upon current, made-for-print graph drawing applications.

In this chapter, we shall introduce a powerful library which can be used to render high-fidelity, real-time 2D and 3D graphics even on modest consumer hardware. We shall then introduce OCaml bindings to this library which provide access to its functionality from programs written in the OCaml language. Finally, we shall develop a graphical application written entirely in OCaml.

6.1 Overview of OpenGL

Over the years, several libraries have been developed with the intent of providing access to powerful computer graphics hardware whilst presenting a simple, easy-to-use interface to the functionality. However, only one library has become the de facto standard for this task - the Open Graphics Library (OpenGL) by Silicon Graphics Incorporated (SGI). In recent years, Microsoft have developed a competitor known as DirectX. However, although the capabilities of OpenGL and DirectX are similar, DirectX only works on Microsoft operating systems (i.e. Windows) whereas OpenGL is freely available for a wide variety of architectures and operating systems, most notably Linux, Mac OS X and Windows.

In order to be able to render to the screen using OpenGL, a program must first acquire a resource known as a rendering context. Obtaining a rendering context directly is often tricky and the code required is specific to an OS. Consequently, the process of acquiring a rendering context is typically performed using a cross-platform library. The OpenGL Utility Toolkit (GLUT) is one such freely available library, which also provides access to input data from the mouse and keyboard.

Fortunately, Jacques Garrigue, Issac Trotts and other authors have written OCaml bindings to OpenGL (called lablGL), glut (called lablglut) and several other OpenGL-related libraries. We shall use these libraries in order to write OCaml programs which use OpenGL.

We shall now introduce a basic template program, based upon glut, which will be used as the foundation for several visualisation programs written using OpenGL.

6.1.1 GLUT

A template program called "render.ml", which uses lablGL and lablglut, may be written¹:

let () =
  let width = ref 640 and height = ref 480 in
  let argv' = Glut.init Sys.argv in
  Glut.initDisplayMode ();
  Glut.initWindowSize ~w:!width ~h:!height;
  ignore (Glut.createWindow ~title:"The window name");
  let set_projection w h = () in
  let render () = () in
  let reshape ~w ~h =
    GlDraw.viewport 0 0 w h;
    set_projection w h;
    width := w; height := h;
    render () in
  Glut.reshapeFunc ~cb:(reshape);
  Glut.displayFunc ~cb:(render);
  Glut.mainLoop ()

Two functions are currently missing, set_projection and render.

The set_projection function defines the two- or three-dimensional space visualised by the rendering context. The render function is responsible for making the appropriate calls to OpenGL to render whatever is required.

This program may be compiled by supplying the lablGL and lablglut archives which it depends upon, using the syntax described in chapter 2:

$ ocamlopt -I +lablGL lablgl.cmxa lablglut.cmxa render.ml -o render

We shall derive working programs from this template program in the remainder of this chapter. In the mean time, let us dissect this template program.

The mutable width and height variables are used to store the current width and height of the OpenGL rendering context, measured in screen pixels. The init function in the Glut module is used to parse any command-line arguments pertaining to glut. The initDisplayMode function can be used to request various properties for the rendering context:

• an alpha channel using the alpha optional argument - used for transparency.

• double buffering using the double_buffer optional argument - used to eliminate flickering during animations.

• a depth buffer using the depth optional argument - used to make nearer objects obscure farther objects regardless of the order in which they are drawn.

• a stencil buffer using the stencil optional argument - used to restrict rendering to certain pixels.

¹ This uses named arguments (of the form ~name:value) which are described in the appendices, in section A.2, and the ignore function which simply ignores its argument and returns the value of type unit.

In this case, none of the properties are requested.

The Glut.initWindowSize function is used to request the initial size of the rendering context. The Glut.createWindow function creates a window with the given title. The resulting window contains the OpenGL rendering context, which can then be used for visualisation.

Once this code has been executed, call-back functions can be given to glut in order to perform arbitrary computations on demand, including rendering for visualisation. These call-back functions are executed by glut under appropriate circumstances, such as a key-press, mouse movement or required screen update. Several call-backs may be specified, the most important of which are:

• Render the contents of the window: displayFunc

• Resize the window: reshapeFunc

• Handle a key-press: keyboardFunc

• Handle a mouse button press: mouseFunc

• Handle mouse movement while a mouse button is pressed: motionFunc

• Handle mouse movement when no mouse buttons are pressed: passiveMotionFunc

Each of these call-backs requires specific arguments. For more details, consult the lablGL documentation.

Before continuing, let us fill in the set_projection and render functions in order to create a simple, working program.

The following implementation of the set_projection function sets up an orthogonal projection covering the two-dimensional space (0 ... w, 0 ... h) where w and h are the width and height of the window in pixels, respectively:

let set_projection w h =
  GlMat.mode `projection;
  GlMat.load_identity ();
  let w = float_of_int w and h = float_of_int h in
  GlMat.ortho ~x:(0., w) ~y:(0., h) ~z:(0., 1.);
  GlMat.mode `modelview in
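The effect of such an orthogonal projection on the x and y coordinates can be sketched in plain OCaml, without any calls to OpenGL: each axis interval is mapped linearly onto the range -1 ... 1 that OpenGL uses internally. The function names ortho_axis and to_ndc are hypothetical and the depth axis is ignored here.

```ocaml
(* Map a coordinate [v] in the interval (lo, hi) onto the range -1 ... 1,
   as an orthogonal projection does along the x and y axes. *)
let ortho_axis (lo, hi) v = (2. *. v -. (hi +. lo)) /. (hi -. lo)

(* Apply the mapping to a 2D point for a window of w by h pixels. *)
let to_ndc ~w ~h (x, y) =
  (ortho_axis (0., float_of_int w) x, ortho_axis (0., float_of_int h) y)

let () =
  (* The centre of a 640 by 480 window. *)
  let x, y = to_ndc ~w:640 ~h:480 (320., 240.) in
  Printf.printf "%g %g\n" x y
```

With this projection, the corners (0, 0) and (w, h) of the window map onto opposite corners of the renderable area, which is why vertex coordinates can be given directly in pixels.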

The following implementation of the render function asks OpenGL to draw a red triangle on a light-green background:


Figure 6.1: Simple demonstration, rendering a triangle using OpenGL.

let render () =
  GlClear.color (0.8, 1., 0.8);
  GlClear.clear [`color];
  GlDraw.color (1., 0., 0.);
  GlDraw.begins `triangles;
  GlDraw.vertex ~x:0. ~y:0. ();
  GlDraw.vertex ~x:100. ~y:200. ();
  GlDraw.vertex ~x:200. ~y:0. ();
  GlDraw.ends ();
  Gl.flush () in

The result of this program is shown in figure 6.1. In the next section, we shall alter this program to demonstrate ways that more complicated objects can be rendered in terms of the primitives provided by OpenGL.

6.2 Basic rendering

Several decades ago, the raster display emerged as the dominant method for displaying computer graphics. These displays scan a two-dimensional surface in a characteristic pattern to produce a graphical display. Ultimately, this is manifested by today's computers representing their display as a matrix of coloured dots called pixels. A wide variety of algorithms have been invented which determine pixel colours in order to produce meaningful or aesthetic results. The details of these algorithms can be phenomenally complicated, driven by desire for more capable graphics.

In the quest for ever more sophisticated graphics, dedicated hardware was produced in order to render graphics as quickly as possible. Thanks to the economies of mass production and the ubiquity of computer games, highly sophisticated rendering hardware is now commonplace. However, such hardware does not support all of the methods of graphics rendering which have been developed over the past few decades. In this section, we shall examine the rendering primitives which are supported by modern graphics hardware and OpenGL.

The information required to render an object can be split into geometry and fill. The geometry of the object defines the shape of the object in terms of OpenGL primitives. The fill determines the way in which pixels are coloured in. In the simplest case, the geometry of an object can be described by a set of triangles and the fill can be described by a single colour. Functions pertaining to simple geometry and fill specification are encapsulated in the GlDraw module by lablGL.

[Figure 6.2 diagram: panels showing a) Lines, Line strip, Line loop; b) Triangles, Triangle strip, Triangle fan; c) Quads, Quad strip; d) Polygon, each drawn with numbered vertices.]

Figure 6.2: OpenGL primitives: a) lines, b) triangles, c) quadrilaterals, and d) convex polygons.

6.2.1 Geometric primitives

Several geometric primitives are provided by OpenGL. These primitives are points, lines, triangles, quadrilaterals (quads) and convex polygons. Of these primitives, lines, triangles and quads may be rendered individually or in groups (illustrated in figure 6.2). Lines may be rendered individually or as a "strip", a set of abutting lines, or as a "loop", a strip with a closing line. Triangles may be rendered individually or as a "strip", where adjacent triangles share two vertices, or as a "fan", where the triangles all share a single, common vertex. Quads may be rendered individually or as a strip.

The lablGL names given to these different primitives are self-explanatory²:

• Points: `points

• Lines: `lines, `line_strip, `line_loop

• Triangles: `triangles, `triangle_strip, `triangle_fan

• Quads: `quads, `quad_strip

• Polygon: `polygon

² The ` preceding these names denotes a form of variant type called a polymorphic variant. These types have special semantics which we have not yet described and which are not important in this context. Polymorphic variants are discussed in more detail in section A.8.

Figure 6.3: A circular annulus drawn as a single triangle strip: a) rendered result, and b) the underlying geometry (a single triangle strip).

The example implementation of the render function drew a triangle by calling the begins function in the GlDraw module, with the argument `triangles, and then supplied three vertices as two-dimensional coordinates using the vertex function in the same module, with the labelled arguments x and y.

The other primitives may be drawn by making similar calls to functions in the GlDraw module, beginning with an appropriate argument to begins, followed by vertex definitions specified by calls to the vertex, vertex2, vertex3 or vertex4 functions and delimited by a call to the ends function. For example, the following implementation of the render function draws a circular annulus as a single triangle strip:

let strips = 72 in
let render () =
  GlClear.color (0.6, 1., 0.8);
  GlClear.clear [`color];
  GlDraw.color (0.5, 0., 1.);
  GlDraw.begins `triangle_strip;
  for i = 0 to strips do
    let pi = 4. *. atan 1. and i = float_of_int i in
    let theta = 2. *. pi *. i /. float_of_int strips in
    let x, y = sin theta, cos theta in
    let x1, y1 = 150. *. x +. 200., 150. *. y +. 200. in
    let x2, y2 = 200. *. x +. 200., 200. *. y +. 200. in
    GlDraw.vertex ~x:x1 ~y:y1 ();
    GlDraw.vertex ~x:x2 ~y:y2 ();
  done;
  GlDraw.ends ();
  Gl.flush () in
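The order in which the vertices of the triangle strip are submitted can be examined without OpenGL by collecting them into a list. The following is a sketch only; the function annulus_vertices and its labelled parameters are hypothetical, and List.init assumes a reasonably recent standard library.

```ocaml
(* Vertices of the triangle strip in submission order: a point on the inner
   rim followed by a point on the outer rim, for i = 0 ... strips. *)
let annulus_vertices ~strips ~inner ~outer ~centre:(cx, cy) =
  let pi = 4. *. atan 1. in
  List.concat
    (List.init (strips + 1) (fun i ->
         let theta = 2. *. pi *. float_of_int i /. float_of_int strips in
         let x, y = sin theta, cos theta in
         [ (inner *. x +. cx, inner *. y +. cy);
           (outer *. x +. cx, outer *. y +. cy) ]))

let () =
  let vs =
    annulus_vertices ~strips:72 ~inner:150. ~outer:200. ~centre:(200., 200.) in
  (* A strip of n vertices describes n - 2 triangles. *)
  Printf.printf "%d vertices, %d triangles\n"
    (List.length vs) (List.length vs - 2)
```

Because the loop runs from 0 to strips inclusive, the last pair of vertices coincides with the first, which closes the annulus.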


The result is shown in figure 6.3a along with the skeleton of the geometry in figure 6.3b, showing the triangle strip. The underlying geometry is most easily visualised by specifying that solid primitives are rendered as outlines rather than being filled, using the call GlDraw.polygon_mode `both `line.

Clearly, relatively simple programs can be used to visualise complicated geometries. Useful information can also be conveyed using additional information, such as pixel colour.

6.2.2 Filling

The simplest form of filling simply fills any covered pixels with a constant colour, specified in terms of red, green and blue components. The original example implementation of the render function drew a red triangle by first specifying the colour using a call to the GlDraw.color function with a 3-tuple specifying the red, green and blue components, respectively, as values of the type float in the range 0 ... 1, i.e. the colour red was obtained by specifying full red and no green or blue, giving the 3-tuple (1., 0., 0.). In the previous example, a mauve annulus was rendered by specifying the 3-tuple (0.5, 0., 1.), i.e. full blue, half red and no green.

OpenGL supports many more sophisticated forms of filling, including smooth shading (interpolating between different colours at different vertices), alpha blending (for transparency) and texture mapping. For more information on these additional forms of filling see some of the many books on OpenGL [10].

6.2.3 Projection

The set_projection function shared by both of the previous examples used a call to the GlMat.ortho function to specify that the renderable area represented a two-dimensional space (0 ... w, 0 ... h), i.e. the units of the space are pixels. This is easily altered to represent different regions of 2D space but OpenGL can also be used to visualise 3D spaces by projecting them onto the 2D space of the screen.

In order to use 3D rendering, the rendering context should possess a depth buffer. This can be requested in the call to initDisplayMode:

Glut.initDisplayMode ~depth:true ();

The set_projection function can then use 3D projection by:

• setting the properties of the perspective transform (the field-of-vision, the aspect-ratio and the range of depths, i.e. the near and far clipping distances) using the perspective function in the GluMat module.

• specifying the location, target and up-direction of the camera using the look_at function also in the GluMat module.

For example, the set_projection function could be altered to:


let set_projection w h =
  GlMat.mode `projection;
  GlMat.load_identity ();
  let w = float_of_int w and h = float_of_int h in
  GluMat.perspective ~fovy:45.0 ~aspect:(w /. h) ~z:(0.1, 1000.);
  GluMat.look_at ~eye:(3., 3., -5.) ~center:(0., 0., 0.) ~up:(0., 1., 0.);
  GlMat.mode `modelview;
  GlMat.load_identity () in

Figure 6.4: 3D perspective projection of the circular annulus.

The render function should also be altered to clear the depth buffer and enable depth testing. We shall first factor out a function to render the annulus:

let strips = 72 in
let render_annulus () =
  GlDraw.begins `triangle_strip;
  for i = 0 to strips do
    let pi = 4. *. atan 1. and i = float_of_int i in
    let theta = 2. *. pi *. i /. float_of_int strips in
    let x, y = sin theta, cos theta in
    List.iter GlDraw.vertex2 [1.5 *. x, 1.5 *. y; 2. *. x, 2. *. y]
  done;
  GlDraw.ends () in
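The geometry generated by this loop can be inspected without OpenGL. The following is a minimal sketch, assuming a hypothetical annulus_vertices helper (not part of lablGL or of the program above) which collects the same vertices in a list instead of passing them to GlDraw.vertex2:

```ocaml
(* Hypothetical helper: the vertices which the triangle strip would emit,
   returned as a list of (x, y) pairs instead of being sent to OpenGL. *)
let annulus_vertices ?(inner = 1.5) ?(outer = 2.) strips =
  let pi = 4. *. atan 1. in
  List.concat
    (List.init (strips + 1) (fun i ->
         let theta = 2. *. pi *. float_of_int i /. float_of_int strips in
         let x, y = sin theta, cos theta in
         (* One inner and one outer vertex for each angle. *)
         [ (inner *. x, inner *. y); (outer *. x, outer *. y) ]))

let () =
  let vs = annulus_vertices 72 in
  Printf.printf "%d vertices emitted\n" (List.length vs)
```

The loop runs from 0 to strips inclusive, so the final pair of vertices coincides (up to rounding) with the first pair, which is what closes the annulus.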

The render function may then be redefined as:

let render () =
  GlClear.color (0.6, 1., 0.8);
  GlClear.clear [`color; `depth];
  Gl.enable `depth_test;
  GlDraw.color (0.5, 0., 1.);
  render_annulus ();
  Gl.flush () in

The result of this 3D perspective view of the circular annulus is shown in figure 6.4.


6.2.4 Animation


In the current version of our OpenGL program, the render function is only called when necessary, i.e. when all or part of the window containing the rendering context becomes visible. Two fundamental alterations are required to produce animated graphics. Firstly, the animation needs to be redrawn constantly. This is easily achieved by requesting that the window be redrawn when nothing else is happening, by specifying the idle call-back:

Glut.idleFunc ~cb:(Some Glut.postRedisplay);

Secondly, flicker-free animation is typically achieved by using two buffers, only one of which is displayed at any one time. Implementing this requires asking for a double-buffered rendering context:

Glut.initDisplayMode ~depth:true ~double_buffer:true ();

The render function must then be altered to swap buffers once rendering is complete. This can be achieved by calling Glut.swapBuffers after the call to Gl.flush.

We shall begin by defining a time function before the definition of the new render function:

let time =
  let start = Unix.gettimeofday () in
  fun () -> Unix.gettimeofday () -. start in

This function returns the time in seconds since the program started. The render function may then be defined:

let render () =
  GlClear.color (0.6, 1., 0.8);
  GlClear.clear [`color; `depth];
  Gl.enable `depth_test;
  GlDraw.color (0.5 *. (1. +. sin (time ())), 0., 1.);
  render_annulus ();
  Gl.flush ();
  Glut.swapBuffers () in

The time function requires the gettimeofday function in the Unix module which, consequently, must now be specified when compiling:

$ ocamlopt -I +lablGL lablgl.cmxa lablglut.cmxa unix.cmxa render.ml -o render

The resulting program produces a result similar to that shown in figure 6.4 but cycling through the colours blue and purple. Considerably more interesting animations can be obtained by transforming objects.
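The time function relies on a closure which captures the start time once, when it is created. The following sketch isolates that idea. The make_timer and red names are hypothetical, and Sys.time (processor time, from the standard library) is assumed as a stand-in clock so that the example does not need the Unix module:

```ocaml
(* [make_timer clock] reads the clock once, when the timer is created, and
   returns a function giving the time elapsed since then. *)
let make_timer clock =
  let start = clock () in
  fun () -> clock () -. start

(* Assumption: Sys.time stands in for Unix.gettimeofday. *)
let time = make_timer Sys.time

(* The red component used by the animated render function. As sin lies in
   the range -1 ... 1, the result always lies in the range 0 ... 1. *)
let red t = 0.5 *. (1. +. sin t)

let () = Printf.printf "t = %f s, red = %f\n" (time ()) (red (time ()))
```

Passing the clock as an argument is only a convenience for experimentation: any function of type unit -> float can be substituted.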


6.3 Transformations

The task of transforming the vertices of an object, to shift or stretch the object, can be rather tedious. More importantly, this task can also be very computationally expensive if millions of vertices are in play. Fortunately, OpenGL provides a simple and efficient way to transform vertex coordinates. In particular, computations are automatically off-loaded onto dedicated hardware when possible, greatly improving performance.

In OpenGL, transformation matrices are held on a stack. By adding a new transformation onto the stack, or removing the last transformation from the stack, transformations can be applied hierarchically.

The pedagogical example of a hierarchy of transformations is the rendering of a robot arm. The arm might contain three joints, a shoulder joint at the base, an elbow and a wrist. The base of the arm is static and the upper arm, lower arm and hand are each affected by all joint rotations between them and the base. The upper arm is transformed only by the rotation at the shoulder joint. The lower arm is transformed by the rotation at the elbow joint as well as the shoulder joint. Finally, the hand is transformed by the rotations of the shoulder, elbow and wrist joints. This is a hierarchy of three rotations, one for each of the three joints, with each rotation affecting the remainder of the arm.

The current object transformation matrix (known as the "model view" matrix in OpenGL) can be altered in the render function using the rotate, scale and translate functions in the GlMat module. The current matrix can also be copied onto the stack using the push function and the last matrix moved back off the stack into the current matrix using the pop function.
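The behaviour of the matrix stack can be modelled in a few lines of plain OCaml. The following is a minimal sketch, not the lablGL implementation: it assumes 3×3 homogeneous matrices for 2D in place of OpenGL's 4×4 matrices, and the names current, stack, push, pop, translate and origin are all hypothetical:

```ocaml
(* 3x3 homogeneous matrices for 2D, standing in for OpenGL's 4x4 matrices. *)
type mat = float array array

let identity : mat = [| [| 1.; 0.; 0. |]; [| 0.; 1.; 0. |]; [| 0.; 0.; 1. |] |]

let mul (a : mat) (b : mat) : mat =
  Array.init 3 (fun i ->
      Array.init 3 (fun j ->
          (a.(i).(0) *. b.(0).(j)) +. (a.(i).(1) *. b.(1).(j))
          +. (a.(i).(2) *. b.(2).(j))))

let translation x y : mat = [| [| 1.; 0.; x |]; [| 0.; 1.; y |]; [| 0.; 0.; 1. |] |]

(* Hypothetical state: the current matrix and the stack of saved matrices. *)
let current = ref identity
let stack : mat list ref = ref []

(* push saves a copy of the current matrix; pop restores the last saved one. *)
let push () = stack := !current :: !stack

let pop () =
  match !stack with
  | m :: rest -> current := m; stack := rest
  | [] -> failwith "matrix stack underflow"

let translate x y = current := mul !current (translation x y)

(* Position of the local origin under the current matrix. *)
let origin () = (!current.(0).(2), !current.(1).(2))

let () =
  translate 1. 0.;              (* move to the "shoulder" *)
  push ();
  translate 2. 0.;              (* move on to the "elbow" *)
  let ex, ey = origin () in
  pop ();                       (* back to the shoulder *)
  let sx, sy = origin () in
  Printf.printf "elbow (%g, %g), shoulder (%g, %g)\n" ex ey sx sy
```

Transformations applied between a push and the matching pop affect only the objects drawn in between, which is what makes the hierarchical robot arm straightforward to express.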

For example, the animated annulus in the previous example may be made to spin by simply altering the current model view matrix using the rotate function. The resulting render function must initialise the model view matrix to the identity matrix using the load_identity function, to remove the effects of any transformations left over from the previous frame of animation:

let render () =
  GlMat.load_identity ();
  GlMat.rotate ~angle:(100. *. time ()) ~x:0. ~y:1. ~z:0. ();

The result is a spinning, coloured, circular annulus.

A more interesting result can be obtained by replacing the call to render_annulus with a sequence of transformations and calls:

GlMat.scale ~x:0.1 ~y:0.1 ~z:0.1 ();
for i = 1 to 20 do
  GlMat.rotate ~angle:(-10. *. time ()) ~x:0.25 ~y:1. ~z:0. ();
  GlMat.translate ~x:(-2.5) ~y:0. ~z:0. ()
done;
for i = 1 to 40 do
  GlMat.translate ~x:2.5 ~y:0. ~z:0. ();
  GlMat.rotate ~angle:(10. *. time ()) ~x:0.25 ~y:1. ~z:0. ();
  render_annulus ()
done;


Figure 6.5: Two frames from an animated sequence of transformed circular annuli: a) 0.8, and b) 1.7 seconds into the animation.

The second loop repeatedly translates and rotates the next object relative to the previous object, accumulating a spiral of 40 objects. The first loop performs the reverse operation (effectively transforming by the inverse) 20 times, such that the animation is centred on the 20th object. The result is the animated spiral of coloured, circular annuli shown in figure 6.5.

Note that each annulus is still flat: the apparent curvature of the spiral is an illusion caused by the presence of a sufficient number of small segments (the same illusion which makes the annuli appear circular rather than polygonal). Also, note that reversing the transformation requires the individual matrix transformations to be specified in reverse order: $A^{-1} B^{-1} B A = I$.
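The need to reverse the order when undoing a pair of transformations can be checked numerically. The following sketch uses 2D stand-ins for the OpenGL transformations, acting directly on points. All of the names are hypothetical, and A is taken to be a rotation and B a translation purely for illustration:

```ocaml
(* 2D stand-ins for the OpenGL transformations: rotation about the origin
   and translation, each acting directly on a point. *)
let rotate angle (x, y) =
  ((x *. cos angle) -. (y *. sin angle), (x *. sin angle) +. (y *. cos angle))

let translate (dx, dy) (x, y) = (x +. dx, y +. dy)

let a = rotate 0.3 and a_inv = rotate (-0.3)
let b = translate (2.5, 0.) and b_inv = translate (-2.5, 0.)

let there p = b (a p)          (* apply A first, then B *)
let back p = a_inv (b_inv p)   (* undo B first, then A *)
let wrong p = b_inv (a_inv p)  (* inverses applied in the original order *)

(* Approximate equality of points, to allow for rounding errors. *)
let close (x, y) (u, v) = abs_float (x -. u) +. abs_float (y -. v) < 1e-9

let () =
  let p = (1., 2.) in
  Printf.printf "reverse order restores p: %b\n" (close p (back (there p)));
  Printf.printf "same order restores p: %b\n" (close p (wrong (there p)))
```

Only the composition which undoes the most recent transformation first recovers the original point.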

As we have seen, even the simplest rendering techniques allow quite complicated objects to be rendered (the last example contains (73 - 2) × 40 = 2,840 triangles). However, in order to render more complicated objects in real time, more efficient approaches to rendering are required.

6.4 Efficient rendering

In the interests of correctness, the first version of any visualisation program should be written using immediate-mode rendering (i.e. calls to the begins, vertex and ends functions in the GlDraw module). Once a working first version of the required program has been written, the program may be optimised by using more sophisticated approaches to rendering.

The main overhead of immediate-mode rendering is the number of function calls required to perform any given rendering task. Thus, optimisations typically allow data to be conveyed to OpenGL more efficiently in order to reduce the number of function calls required.

Sequences of OpenGL calls, such as those used to define the vertices of polygons, may be cached in display lists. The contents of display lists cannot be altered without rebuilding the display list and, thus, display lists are ideal for static geometry. As a rule of thumb, performance is likely to increase when $\gg 10^2$ vertices are stored in a display list. Functions pertaining to display lists are encapsulated in the GlList module by lablGL.

For example, the circular annulus in the previous example is a static geometry requiring 72 × 2 = 144 unique vertices. Thus, the rendering of an annulus, as performed by the render_annulus function, may be productively replaced with a display list. This can be achieved by altering the render_annulus function to compile the calls required to define the geometry in a display list, reusing the display list in future calls:

let annulus_list = ref None and strips = 72 in
let render_annulus () = match !annulus_list with
  | None ->
      annulus_list := Some (GlList.create `compile_and_execute);
      GlDraw.begins `triangle_strip;
      for i = 0 to strips do
        let pi = 4. *. atan 1. and i = float_of_int i in
        let theta = 2. *. pi *. i /. float_of_int strips in
        let x, y = sin theta, cos theta in
        List.iter GlDraw.vertex2 [1.5 *. x, 1.5 *. y; 2. *. x, 2. *. y]
      done;
      GlDraw.ends ();
      GlList.ends ()
  | Some l -> GlList.call l in

The annulus_list variable contains an optional reference to the display list holding the definition of the annulus. When first called, the render_annulus function creates a new display list using the create function in the GlList module, storing the resulting list in the mutable annulus_list variable. A sequence of OpenGL commands is then both compiled into the display list and simultaneously executed, until the GlList.ends function is called.

Future calls to the render_annulus function find a display list in the annulus_list variable and execute this display list using the call function in the GlList module.
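The pattern used here, a mutable option which is filled in on first use, is independent of OpenGL. The following sketch illustrates it with a hypothetical build_geometry function standing in for the compilation of a display list, together with a counter recording how many times the expensive step actually runs:

```ocaml
(* Hypothetical stand-in for compiling a display list: an "expensive"
   computation which counts how many times it has been run. *)
let builds = ref 0

let build_geometry () =
  incr builds;
  Array.init 146 (fun i -> float_of_int i)

(* The cache starts empty and is filled in by the first call. *)
let cache = ref None

let geometry () =
  match !cache with
  | Some g -> g                      (* later calls: reuse the cached value *)
  | None ->
      let g = build_geometry () in   (* first call: build and remember *)
      cache := Some g;
      g

let () =
  ignore (geometry ());
  ignore (geometry ());
  Printf.printf "geometry built %d time(s)\n" !builds
```

The standard library's Lazy module provides much the same behaviour, but the explicit reference mirrors the structure of the render_annulus function more closely.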

Alternatively, vertex data may be stored in vertex arrays. These may include vertex positions, normal-vectors, colours and texture map coordinates. Vertices are then referred to by their index in the vertex array. Rendering then requires a sequence of such indices to be supplied. In order to further reduce the number of function calls made, sequences of vertex indices may be stored in index arrays. Functions pertaining to vertex and index arrays are encapsulated in the GlArray module by lablGL.

For example, a vertex array containing the 144 two-coordinate vertices of the circular annulus may be created using:

let strips = 72 in
let vertex_array =
  let a = Array.make (strips * 4) 0. in
  for i = 0 to strips - 1 do
    let j = i * 4 in
    let pi = 4. *. atan 1. and i = float_of_int i in
    let theta = 2. *. pi *. i /. float_of_int strips in
    let x, y = sin theta, cos theta in
    a.(j+0) <- 1.5 *. x; a.(j+1) <- 1.5 *. y;
    a.(j+2) <- 2. *. x; a.(j+3) <- 2. *. y;
  done;
  Raw.of_float_array a ~kind:`double in

The render_annulus function may then be altered to enable vertex arrays and to refer to vertices in the array, rather than specifying their coordinates explicitly:


let render_annulus () =
  GlArray.enable `vertex;
  GlArray.vertex `two vertex_array;
  GlDraw.begins `triangle_strip;
  for i = 0 to (strips + 1) * 2 - 1 do
    GlArray.element (i mod (strips * 2))
  done;
  GlDraw.ends ();
  GlArray.disable `vertex in

The call to the enable function in the GlArray module enables vertex arrays. The call to the vertex function tells OpenGL where the vertex array is. Calls to the element function refer to a vertex in the array, replacing calls to the vertex or vertex2 functions in the GlDraw module.

Alternatively, the sequence of vertex indices specified to the GlArray.element function may be contained in an index array:

let index_array =
  let a = Array.init ((strips + 1) * 2) (fun i -> i mod (strips * 2)) in
  Raw.of_array a ~kind:`uint in

The triangle strip may then be rendered with a single call to the draw_elements function in the GlArray module:

let render_annulus () =
  GlArray.enable `vertex;
  GlArray.vertex `two vertex_array;
  GlArray.draw_elements `triangle_strip ((strips + 1) * 2) index_array;
  GlArray.disable `vertex in

Unlike display lists, vertex arrays allow vertex data to be altered. Consequently, vertex arrays are most useful when rendering geometries which are constantly changing shape.
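The relationship between the vertex array and the index array can be examined in plain OCaml, without the Raw and GlArray modules. The following is a sketch under that assumption: ordinary OCaml arrays stand in for the raw arrays, and the hypothetical vertex and strip definitions mimic the look-ups which OpenGL would perform when given the indices:

```ocaml
let strips = 72
let pi = 4. *. atan 1.

(* Flat array of coordinates: for each angle, the inner vertex (x, y)
   followed by the outer vertex (x, y), as in the vertex array above. *)
let vertex_array =
  Array.init (strips * 4) (fun k ->
      let theta = 2. *. pi *. float_of_int (k / 4) /. float_of_int strips in
      let r = if k mod 4 < 2 then 1.5 else 2. in
      r *. (if k mod 2 = 0 then sin theta else cos theta))

(* The indices wrap around, so the strip ends on its first two vertices. *)
let index_array = Array.init ((strips + 1) * 2) (fun i -> i mod (strips * 2))

(* Plain-OCaml stand-in for the look-up: vertex i is the pair stored at
   positions 2i and 2i+1 of the flat array. *)
let vertex i = (vertex_array.(2 * i), vertex_array.(2 * i + 1))

let strip = Array.map vertex index_array

let () =
  Printf.printf "%d stored vertices, %d indices\n"
    (Array.length vertex_array / 2)
    (Array.length index_array)
```

Only 144 vertices are stored, yet the strip visits 146, because the last two indices refer back to the first two vertices rather than duplicating their coordinates.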

Having examined most of the fundamentals of rendering using OpenGL, we shall now develop a program capable of rendering something useful.

6.5 Rendering scientific data

The ability to render simulated atomic structures can be useful in elucidating complicated geometric properties but can also be used as a diagnostic tool. In this section, we shall develop a program capable of loading a one-component atomic structure from file and animating the structure in real-time using OpenGL.

We shall begin by defining a lexer and parser to load the file format. The file format consists of three numbers, the vector coordinates of an atom, on each line. Thus the lexer, called "lexer.mll", generates FLOAT, CR and EOF tokens:


{
  open Parser
  let line = ref 1
}

let digit = ['0'-'9']
let exponent = ['e' 'E'] ['+' '-'] digit+
let floating = '-'? (digit+ '.' digit* | digit* '.' digit+) exponent?

rule token = parse
  | [' ' '\t']          { token lexbuf }
  | '\n'                { incr line; CR }
  | floating | digit+   { FLOAT (float_of_string (Lexing.lexeme lexbuf)) }
  | eof                 { EOF }
  | _                   { failwith ("Mistake at line " ^ string_of_int !line) }

The parser, called "parser.mly", recognises these lines containing triples of numbers and produces a list of 3-tuples:

%token CR EOF
%token <float> FLOAT
%start main
%type <(float * float * float) list> main
%%
atom:
| FLOAT FLOAT FLOAT { $1, $2, $3 };
main:
| atom CR main { $1 :: $3 }
| atom EOF { [$1] }
| EOF { [] };
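For a file format as simple as this one, the same list of 3-tuples can also be obtained using only the standard library. The following sketch is an alternative to the lexer and parser rather than a description of them: the parse_atom and read_atoms names are hypothetical and, unlike the grammar above, blank lines are simply skipped:

```ocaml
(* Parse one line of three whitespace-separated numbers. Scanf raises an
   exception if the line does not have this form. *)
let parse_atom line = Scanf.sscanf line " %f %f %f" (fun x y z -> (x, y, z))

(* Read atoms from a channel until the end of the file, skipping blank
   lines. The accumulator keeps the function tail recursive. *)
let read_atoms ch =
  let rec loop acc =
    match input_line ch with
    | line when String.trim line = "" -> loop acc
    | line -> loop (parse_atom line :: acc)
    | exception End_of_file -> List.rev acc
  in
  loop []

let () =
  let x, y, z = parse_atom "1.5 -2.25 100.0" in
  Printf.printf "%g %g %g\n" x y z
```

The ocamllex and ocamlyacc approach scales better to richer file formats and can report the line number at which a mistake occurred, which is why it is the one developed in the text.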

The main program, called "render.ml", begins with the usual preamble:

let () =
  let width = ref 640 and height = ref 480 in
  let argv' = Glut.init Sys.argv in
  Glut.initDisplayMode ~depth:true ~double_buffer:true ();
  Glut.initWindowSize ~w:!width ~h:!height;
  ignore (Glut.createWindow ~title:"Atomic visualisation");

This is followed by a definition which loads the atomic coordinates as a list of 3-tuples, using the parser:

let atoms =
  Parser.main Lexer.token (Lexing.from_channel stdin) in

In order to centre the rendered object, the mean coordinate is computed using a fold:

let offset =
  let add (a, b, c) (d, e, f) = (a +. d, b +. e, c +. f) in
  let n, (x, y, z) =
    let aux (n, t) r = (n + 1, add t r) in
    List.fold_left aux (0, (0., 0., 0.)) atoms in
  let n = float_of_int n in
  (-. x /. n, -. y /. n, -. z /. n) in
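The fold accumulates the number of atoms and the sum of their coordinates in a single pass, and the offset is the negated mean. The following sketch restates the computation as a function of the list (an assumption made so that it can be tried on small inputs) and applies it to two atoms:

```ocaml
(* The same computation as above, written as a function of the list. *)
let offset atoms =
  let add (a, b, c) (d, e, f) = (a +. d, b +. e, c +. f) in
  let n, (x, y, z) =
    List.fold_left (fun (n, t) r -> (n + 1, add t r)) (0, (0., 0., 0.)) atoms
  in
  let n = float_of_int n in
  (-.x /. n, -.y /. n, -.z /. n)

let () =
  (* The mean of these two positions is (1, 2, 3), so the offset is its
     negation. *)
  let x, y, z = offset [ (0., 0., 0.); (2., 4., 6.) ] in
  Printf.printf "offset = (%g, %g, %g)\n" x y z
```

Note that an empty list leads to a division of zero by zero, so the components of the offset are then nan.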

The set_projection function initialises the perspective transform to use a 45° field-of-vision looking down the positive z-axis from the camera position (0, 0, -100):


let set_projection w h =
  GlMat.mode `projection;
  GlMat.load_identity ();
  let w = float_of_int w and h = float_of_int h in
  GluMat.perspective ~fovy:45.0 ~aspect:(w /. h) ~z:(0.1, 1000.);
  GluMat.look_at
    ~eye:(0., 0., -100.) ~center:(0., 0., 0.) ~up:(0., 1., 0.);
  GlMat.mode `modelview;
  GlMat.load_identity () in

In order to time the animation, we reuse the previously defined time function:

let time =
  let start = Unix.gettimeofday () in
  fun () -> Unix.gettimeofday () -. start in

The render function begins by rotating the object about the y-axis once the object has been centred by translating by the mean coordinate:

let render () =
  GlMat.load_identity ();
  GlMat.rotate ~angle:(30. *. time ()) ~x:0. ~y:1. ~z:0. ();
  GlMat.translate3 offset;

The colour and depth buffers are then cleared, giving a white background:

GlClear.color (1., 1., 1.);
GlClear.clear [`color; `depth];

The atoms are drawn as black points, five pixels in diameter, by iterating the vertex3 function over the list atoms of atomic coordinates:

GlDraw.color (0., 0., 0.);
GlDraw.point_size 5.;
GlDraw.begins `points;
List.iter GlDraw.vertex3 atoms;
GlDraw.ends ();

Finally, the render function flushes the stream of OpenGL commands, causing the scene to be rendered, and swaps buffers to display the result:

Gl.flush ();
Glut.swapBuffers () in

The program ends with the previous definition of the reshape function, registering reshape, display and idle call-backs and entering the glut "main loop" to begin the animation:


Figure 6.6: A frame from a smoothly animated 104-atom model of amorphous silicon.

let reshape ~w ~h =
  GlDraw.viewport 0 0 w h;
  set_projection w h;
  width := w; height := h;
  render () in

Glut.reshapeFunc ~cb:reshape;
Glut.displayFunc ~cb:render;
Glut.idleFunc ~cb:(Some render);
Glut.mainLoop ()

This program may be compiled using:

$ ocamllex lexer.mll
15 states, 375 transitions, table size 1590 bytes
$ ocamlyacc parser.mly
$ ocamlopt -c parser.mli
$ ocamlopt -c parser.ml
$ ocamlopt -c lexer.ml
$ ocamlopt -I +lablGL lablgl.cmxa lablglut.cmxa unix.cmxa parser.cmx lexer.cmx render.ml -o render


A snapshot of the animation resulting from the visualisation of a 104-atom model of amorphous silicon is shown in figure 6.6.

Before concluding this chapter we should reiterate that the lablGL bindings to OpenGL for OCaml do not yet provide the safe execution environment offered by the core OCaml distribution. Indeed, the task of writing safe bindings to such an unsafe interface is quite formidable. Next, we shall examine program transformations which can improve performance.


Chapter 7

Optimization

Despite advances in computer technology, improving efficiency can still be desirable. This task is greatly simplified by starting with a correct but inefficient version of a program. Compared to other languages, the features of OCaml make it ideally suited for the rapid creation of reliable programs. Once a program has been shown to function correctly, it can be targeted for optimization.

This chapter examines some techniques which can be used to optimize OCaml code. The overall approach to whole program optimization is to perform each of the following steps in order:

1. Profile the program compiled with automated optimizations and running on representative input.

2. Of the sequential computations performed by the program, identify the most time-consuming one from the profile.

3. Calculate the (possibly asymptotic) algorithmic complexity of this bottleneck in terms of suitable primitive operations.

4. If possible, manually alter the program such that the algorithm used by the bottleneck has a lower asymptotic complexity and repeat from step 1.

5. If possible, modify the bottleneck algorithm such that it accesses its data structures less randomly to increase cache coherence.

6. Perform low-level optimizations on the expressions in the bottleneck.

We shall now consider each of these processes in more detail.

7.1 Profiling

Before beginning to optimize a program, it is vitally important to profile the program running on representative inputs in order to ascertain quantitative information on any bottlenecks in the flow of the program. For example, many scientific programs load data, perform a computation and save the result. If the computation is trivial then most of the time will be spent loading and saving data. In this case, the I/O routines would be the best targets for optimization (in particular, the file formats themselves), and not the computational routine.
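Before turning to a full profiler, a coarse breakdown of this kind can be obtained by timing each phase of a program by hand. The following is a minimal sketch, assuming a hypothetical timed helper built on the standard library's Sys.time function (which reports processor time), with the creation and sorting of an array standing in for the phases of a real program:

```ocaml
(* Hypothetical helper: apply f to x, returning the result together with
   the processor time taken, in seconds. *)
let timed f x =
  let t0 = Sys.time () in
  let y = f x in
  (y, Sys.time () -. t0)

let () =
  (* Two stand-in phases: creating some data and then sorting it. *)
  let data, t_make =
    timed (fun n -> Array.init n (fun i -> float_of_int (n - i))) 100_000
  in
  let (), t_sort = timed (Array.sort compare) data in
  Printf.printf "make: %.4fs  sort: %.4fs\n" t_make t_sort
```

Such measurements only indicate which phase dominates. They do not show where the time is spent within a phase, which is the information a profiler provides.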

The most useful form of profiling offered by OCaml is on native code compiled with ocamlopt. In this case, specifying the -p flag (at compilation and at linking) results in the generation of a "gmon.out" file after the resulting executable is run. This file can be interpreted by the GNU profiler gprof using the syntax:

gprof name > profile.txt

where name is the name of the executable. The resulting file "profile.txt" is split into three sections:

1. List of functions in the program in descending order of the time which was spent within the body of the function, i.e. for a function f, this time excludes the time spent in the bodies of any other functions which were called by f.

2. A hierarchical representation of the time taken by each function call made in the program.

3. A bibliography of function references.

For example, the following test program "sort.ml" loads a file "input.dat" of numbers, sorts the numbers and saves the result as a file "output.dat":

let () =
  let infile = "input.dat" and outfile = "output.dat" in
  let data =
    let ch = open_in infile in
    let rec load l =
      try load (float_of_string (input_line ch) :: l)
      with End_of_file -> l in
    load [] in
  let data = List.sort compare data in
  let ch = open_out outfile in
  List.iter (fun x -> output_string ch (string_of_float x)) data

This program can be compiled into a native-code executable called "sort", with code to perform profiling measurements inserted by the compiler, using:

$ ocamlopt -p sort.ml -o sort

When executed with a file "input.dat" containing $10^6$ random numbers, this program runs 50% slower than without profiling, i.e. than if the -p flag had not been specified:

$ ./sort


The gprof program uses the "gmon.out" and "sort" files to create a textual representation of the profiling information:

$ gprof sort > profile.txt

The same "sort" executable should be used to generate profiling information as was executed to create the "gmon.out" file. Failing to do so will result in misleadingly erroneous output.

This information can be quite lengthy, hence we have chosen to redirect the output into a file "profile.txt". In this case, we find the first section of "profile.txt" to contain the following profile information:

Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self                self     total
 time   seconds   seconds     calls   s/call   s/call  name
35.43      5.13      5.13      1837     0.00     0.00  mark_slice
18.54      7.82      2.69  18673252     0.00     0.00  compare_val
11.40      9.47      1.65      1022     0.00     0.00  sweep_slice
 3.32      9.95      0.48  12209450     0.00     0.00  caml_oldify_one
 2.76     10.35      0.40      2859     0.00     0.00  caml_oldify_mopup
 2.69     10.74      0.39   1000001     0.00     0.00  caml_format_float
 2.62     11.12      0.38    174762     0.00     0.00  camlList__rev_merge_275
 2.24     11.44      0.33  12209439     0.00     0.00  caml_alloc_shr
 1.93     11.72      0.28    300950     0.00     0.00  camlList__rev_merge_rev_285
 1.93     12.00      0.28    475712     0.00     0.00  camlList__chop_267
 1.76     12.26      0.26   8870322     0.00     0.00  caml_fl_merge_block
 1.73     12.51      0.25  12209626     0.00     0.00  caml_fl_allocate
 1.38     12.71      0.20  12209439     0.00     0.00  allocate_block

This section indicates the time spent within the body of each profiled function, i.e. excluding the time spent in child functions. In this case, the mark_slice function, part of the OCaml garbage collector, is seen to have accounted for approximately 35% of the entire running time of the program, taking 5.13 seconds in total during 1837 calls to this function. Although this is interesting information, the second section typically provides more useful details.

The second section of the profiling information decomposes the time taken to execute the program in terms of the hierarchy of function calls made by the program. Each subsection concentrates on a different function, showing the functions which called it (including how much time was spent in them and how many times they called) above the function itself and the functions called below. In this case, the first subsection concerns the caml_main function:

granularity: each sample hit covers 2 byte(s) for 0.07% of 14.48 seconds

index % time    self  children    called     name
                0.00   14.38       1/1           main [2]
[1]     99.3    0.00   14.38       1         caml_main [1]
                0.00   14.37       1/1           caml_start_program [3]
                0.00    0.01       1/1           caml_init_gc [60]
                0.00    0.00       1/1           caml_init_custom_operations [133]
                0.00    0.00       1/1           caml_init_ieee_floats [134]
                0.00    0.00       1/1           parse_camlrunparam [147]
                0.00    0.00       1/1           caml_init_signals [135]
                0.00    0.00       1/1           init_atoms [142]
                0.00    0.00       1/1           caml_executable_name [131]
                0.00    0.00       1/1           caml_sys_init [139]
-----------------------------------------------


In this case, the caml_main function is seen to have been called once, by the main function, and to have called several other functions. Of the functions called by caml_main, the caml_start_program function accounted for virtually all of the time.

Later subsections, concerning functions which the programmer has control over, provide more useful information. The subsection concerning the camlList__sort_295 function is of particular interest:

[6]     67.4    0.03    9.73       1+951424  <cycle 1 as a whole> [6]
                0.00    4.76       349525        camlList__sort_295 <cycle 1> [13]

Firstly, note that the names of the OCaml functions have been mangled. In the name camlList__sort_295, the prefix caml simply denotes a function generated from OCaml code, the List__ denotes a function from the List module, the sort is the name of the function and the _295 is an internal index which allows OCaml to identify the precise function within the program (which may contain numerous functions called sort).

The function referred to as camlList__sort_295 is, in fact, not the List.sort function called by our program but another function named sort which is nested within the List.stable_sort function. This information can, of course, be discovered by examining the second section of the profile, tracing the function calls made by the program.

In this case, we see that a function in the List module called sort accounted for 67.4% of the total running time of the program, having been called 349,525 times. The <cycle 1> refers to the fact that this function is one in a chain of functions calling each other. Examining the source code shows that the nested sort function is mutually recursive with a rev_sort function. Indeed, other sections of the profile provide detailed information on the breakdown of the time spent in these functions. Given that this program spends most of its time sorting, and not loading or saving data, the sort function is the most suitable target for optimisation. In this case, perhaps an array-based sort would be faster.
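As a sketch of what such an alternative might look like, the list could be copied into an array, sorted in place using the Array.sort function from the standard library, and copied back. The sort_via_array name is hypothetical, and whether this is actually faster for a given input is exactly the kind of question which should be settled by profiling rather than assumed:

```ocaml
(* Hypothetical array-based alternative to List.sort: copy the list into an
   array, sort the array in place and convert the result back to a list. *)
let sort_via_array (l : float list) =
  let a = Array.of_list l in
  Array.sort compare a;
  Array.to_list a

let () =
  List.iter (Printf.printf "%g ") (sort_via_array [ 3.; 1.; 2. ]);
  print_newline ()
```

Both approaches are O(n ln n) on average, so any difference in performance would be a constant factor arising from effects such as allocation and memory access patterns.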

Profiling can be used to identify the performance critical portions of whole programs. These portions of the program can then be targeted for optimisation. Algorithmic optimisations are the most important set of optimisations which can be applied.

7.2 Algorithmic optimization

As we saw in chapter 3, the choice of data structure and of algorithm can have a huge impact on the performance of a program.

In the context of program optimisation, intuition is often terribly misleading. Specifically, given the profile of a program, intuition often tempts us to perform low-level optimisations on the function or functions which account for the largest proportion of the running time of the program. Counter-intuitively, the most productive optimisations often stem from attempts to reduce the number of calls made to the performance-critical functions, rather than trying to optimise the functions themselves. Thus, a programmer must always strive to "see the forest for the trees".


If profiling shows that most of the time is spent performing many calls to a single function then, before trying to optimize this function (which can only improve performance by a constant factor), consider alternative algorithms and data structures which can perform the same computation whilst executing this primitive operation less often. This can reduce the asymptotic complexity of the most time-consuming portion of the program and is likely to provide the most significant increases in performance.

For example, if profiling a program indicates that 80% of the running time is spent searching an array for a given element, altering the program to use a set data structure, instead of an array, is likely to be a productive optimisation, as searching is $O(n)$ for arrays and $O(\ln n)$ for sets (see table 3.3).
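The two forms of search can be sketched as follows. The IntSet module and the array_mem function are hypothetical names introduced for illustration, with integer elements chosen for simplicity:

```ocaml
module IntSet = Set.Make (struct
  type t = int
  let compare = compare
end)

(* Linear search: examines up to n elements of the array. *)
let array_mem x a = Array.exists (fun y -> y = x) a

let () =
  let n = 100_000 in
  let a = Array.init n (fun i -> 2 * i) in
  (* Building the set costs O(n ln n) once. Each subsequent search of the
     set then descends a balanced tree instead of scanning every element. *)
  let s = Array.fold_left (fun s x -> IntSet.add x s) IntSet.empty a in
  Printf.printf "array: %b  set: %b\n"
    (array_mem 199_998 a)
    (IntSet.mem 199_998 s)
```

The cost of constructing the set is only repaid if many searches are performed, so this transformation is worthwhile when the search really is the bottleneck identified by the profile.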

An extensive review of the performances of algorithms used in scientific computing is beyond the scope of this book. The current favourite computationally-intensive algorithm used to attack scientific problems in any particular subject area is often a rapidly moving target. Thus, in order to obtain information on the state-of-the-art choice of algorithm it is necessary to refer to published research in the specific area, or to web sites. When discussing particular topics, we shall endeavour to reference recent research.

Only once all attempts to reduce the asymptotic complexity have been exhausted should other forms of optimization be considered. We shall consider such optimizations in the next section.

7.3 Lower-level optimizations

Cache coherency should be considered carefully as the cost of cache-misses can be over an order of magnitude in performance. There are two principal optimizations to improve cache coherency. The first is to alter the algorithm such that the asymptotic complexity is preserved whilst reducing the randomness in accesses to data structures. The second is to use data structures which consume less memory and, therefore, are more likely to fit into caches. Both of these optimizations require careful selection of the data structure. We shall now quantify some of the differences in performance between the built-in data structures.

7.3.1 Benchmarking data structures

Measuring the performance of operations over the most common data structures can be a productive way to obtain quantitative information which can then be used to justify design decisions objectively when programming. However, measuring the performance of functions in any setting, other than those in which the functions are to be used in practice, can easily produce misleading results. Although we have made every attempt to provide independent performance measurements, effects such as the requirements put upon the garbage collector by the different algorithms are always likely to introduce systematic errors. Consequently, the performance measurements which we now present must be regarded only as indicative measurements.

Common operations over data structures include:

• the construction of a data structure containing a given number of elements n (e.g. using the Array.init function),

• mapping onto a new data structure (e.g. using the List.map and Array.map functions), and

• folding a function over the elements in a data structure (e.g. using the higher-order List.fold_left function).

[Figures 7.1 to 7.4 plot log2 t against log2 n, for n from 2^2 to 2^20.]

Figure 7.1: Measured performance (time t in seconds per element) of the creation of list, array and set data structures as a function of the number of elements n.

Figure 7.2: Measured performance (time t in seconds per element) of map functions over list, array and set data structures containing n elements.

Figure 7.3: Measured performance (time t in seconds per element) of the left-fold functions over list and array data structures containing n elements.

Figure 7.4: Measured performance (time t in seconds per element) of the right-fold functions over list, array and set data structures containing n elements.

In order to give some idea of the relative performance of these common operations when dealing with the most common data structures (lists, arrays and sets), we have measured the performance of these operations in artificial settings. In all cases, the data structures contain elements of type float. The set data structure was created as:

module Key = struct
  type t = float
  let compare i j = if i -. j < 0. then -1 else if i = j then 0 else 1
end

module Set = Set.Make(Key)

This data structure may then be compared against lists and arrays.

The Array.init function, used to create arrays, was matched by an equivalent, tail-recursive¹ list_init function to create lists:

# let list_init n f =
    let rec aux n l = if n < 0 then l else aux (n - 1) (f n :: l) in
    aux (n - 1) [];;
val list_init : int -> (int -> 'a) -> 'a list = <fun>

and a tail-recursive set_init function to create sets:

# let set_init n f =
    let rec aux n s = if n < 0 then s else aux (n - 1) (Set.add (f n) s) in
    aux (n - 1) Set.empty;;
val set_init : int -> (int -> Set.elt) -> Set.t = <fun>

¹ Tail recursion is the most important low-level optimisation and is discussed in section 7.3.3.1.


The measured performance of these functions when used to create data structures of different sizes is shown in figure 7.1. Array initialisation is fastest, followed by list initialisation and, finally, set initialisation.

Array initialisation consists of allocating the whole array followed by filling in the elements sequentially. Consequently, array initialisation becomes increasingly efficient for larger n, due to cache coherency and the lessening significance of the initial allocation.

In contrast, list initialisation requires each element to be allocated individually, to create a 2-tuple of the head of the list and a reference to the tail. As a result, list initialisation is typically 4 to 6 times slower than array initialisation, dominated by the cost of allocation, and the time taken per element is roughly constant (independent of the size of the final list).

The efficiency of set initialisation worsens for larger n, as expected from the O(ln n) asymptotic complexity of element insertion. However, considering the substantially more complicated data structure (a balanced binary tree) which underpins sets, set creation is surprisingly efficient compared to list creation.

The Array.map and non-tail-recursive List.map functions were used with an equivalent, tail-recursive set_map function for sets:

# let set_map f s = Set.fold (fun e s -> Set.add (f e) s) s Set.empty;;
val set_map : (Set.elt -> Set.elt) -> Set.t -> Set.t = <fun>

to measure performance of map operations (illustrated in figure 7.2). Overall, the measurements are similar to those of creation as the performance is dominated by the cost of allocating a new data structure. However, two important features are clearly visible. Firstly, the performance of the List.map function worsens considerably for $n > 2^{16}$. This is due to the non-tail-recursive nature of this function incurring a significant performance cost for deep recursion. In contrast, the tail-recursive set_map function does not incur this cost and, consequently, actually outperforms the List.map function for $n > 2^{19}$ despite the additional sophistication of the data structure. Secondly, the performance of the Array.map function exhibits transients at $n \approx 2^{13}$ and $2^{20}$. These sudden drops in performance can be attributed to machine-dependent cache effects.

The Array.fold_left and tail-recursive List.fold_left functions were used to measure the performance of left-folds over array and list data structures (illustrated in figure 7.3), respectively, accumulating the sum of the elements in the data structures. As these operations do not require the creation of new data structures and, therefore, do not require much allocation, the performance of the operation for lists is much closer to that for arrays compared to the two previous operations. Specifically, the performance is insignificantly different in the range 50 < n < 1000. However, the additional memory requirements of lists result in the performance loss due to cache effects appearing at slightly smaller n, at $n \approx 2^{13}$ compared to $n \approx 2^{15}$ for arrays.

The Array.fold_right, the non-tail-recursive List.fold_right and the tail-recursive Set.fold functions were used to measure the performance of right-folds over array, list and set data structures (illustrated in figure 7.4), respectively, again accumulating the sum of the elements in the data structures. As for the left-fold measurements, these right-fold results did not require the allocation of new data structures. However, the non-tail-recursive nature of the List.fold_right function incurs an increasing performance cost for $n > 2^{11}$, resulting in List.fold_right being 35 times slower than Array.fold_right for $n \approx 2^{20}$.

Note that, considering the sophistication of the underlying balanced binary tree data structure, folds over sets are remarkably fast.

These benchmark results may be used as reasonably objective, quantitative evidence to justify a choice of data structure. We shall now examine other forms of optimisation, in approximately decreasing order of productivity.

7.3.2 Automated transformations

The first optimizations to try are automated optimizations because these require little work. We shall now discuss options given to the compiler which affect performance and a whole-program transformation known as defunctorizing.

7.3.2.1 Compiler optimizations

The most obvious compiler optimization is to use the native-code compiler rather than the byte-code compiler. This typically results in code executing three times as quickly (often more in the case of numerically intensive programs).

The native-code compiler also understands three flags (presented to it on the command line) which affect performance:

• The -unsafe flag removes bounds checking when accesses are made to array elements.

• The -inline n flag controls the aggressiveness of the compiler's inlining of non-recursive functions. Specifying a larger integer n causes larger functions to be inlined. The default is n = 1.

• The -noassert flag causes assertions to be skipped when compiling.

Removing bounds checking can increase performance by up to 15% but at the severe cost of rendering an OCaml program unsafe to run. Consequently, turning bounds checking off is not recommended.

Inlining a function explicitly substitutes a called function into the body of the caller. This removes the overhead of making the function call and can facilitate better optimization of the resulting machine code but often at the cost of increasing the amount of code and, therefore, reducing program-cache coherence. Consequently, more aggressive inlining can increase or decrease performance.

Assertions, of the form assert (pred), verify that the given predicate pred evaluates to true, raising the Assert_failure exception if the predicate is false. This is a useful way to perform run-time sanity checks, all of which can be removed to improve performance by compiling with the -noassert flag.
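As a minimal sketch of the assertion idiom described above (the function name mean is hypothetical and chosen only for illustration), a precondition can be stated with assert at the top of a numerical function:

```ocaml
(* Hypothetical example: the arithmetic mean of a non-empty float array. *)
let mean a =
  let n = Array.length a in
  (* Sanity check; this test is skipped when compiling with -noassert. *)
  assert (n > 0);
  Array.fold_left ( +. ) 0. a /. float_of_int n
```

Applied to an empty array, mean raises Assert_failure. If the program is compiled with -noassert, the check disappears and the same call silently evaluates 0. /. 0., giving nan.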


7.3.2.2 Defunctorizing

The current OCaml native-code compiler does not inline functions from functors. Performing this inlining can provide a significant increase in performance (typically by less than a factor of two but, in a contrived example, by a factor of over ten times). This optimization can be performed by feeding the source code through a defunctorizer, such as the freely available ocamldefun program, before compiling.

7.3.3 Manual transformations

As a last resort, program transformations performed manually should be considered as a means of optimization. We shall now examine several different approaches. Although we try to associate quantitative performance benefits with the various approaches, these are only indicative and are often chosen to represent the best case.

7.3.3.1 Tail-recursion

Straightforward recursion is very efficient when used in moderation. However, the performance of deeply recursive functions can suffer. This can be seen in the performance measurements for the non-tail-recursive List.map and List.fold_right functions (shown in figures 7.2 and 7.4, respectively) at large n. Indeed, for $n > 2^{18}$ the performance is dominated by the cost of recursion in both cases. Performance degradation due to deep recursion can be avoided by performing tail recursion.

If a recursive function call is not tail recursive, state will be stored such that it may be restored after the recursive call has completed. This storing, and the subsequent retrieving, of state is responsible for the performance degradation.

Tail recursion involves writing recursive calls in a form which does not need this state. Most simply, a tail recursive call returns the result of the recursive call directly, i.e. without performing any computation on the result.

For example, the fold_range function from page 36 could have been defined as:

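# let rec fold_range1 f accu l u =
    if l < u then f (fold_range1 f accu (l + 1) u) l else accu;;
val fold_range1 : ('a -> int -> 'a) -> 'a -> int -> int -> 'a = <fun>

but was actually defined as:

# let rec fold_range2 f accu l u =
    if l < u then let u = u - 1 in fold_range2 f (f accu u) l u else accu;;
val fold_range2 : ('a -> int -> 'a) -> 'a -> int -> int -> 'a = <fun>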

These fold_range1 and fold_range2 functions produce the same results:

# fold_range1 (fun t h -> h :: t) [] 0 10;;
- : int list = [0; 1; 2; 3; 4; 5; 6; 7; 8; 9]
# fold_range2 (fun t h -> h :: t) [] 0 10;;
- : int list = [0; 1; 2; 3; 4; 5; 6; 7; 8; 9]


Figure 7.5: Measured performance of the non-tail-recursive fold_range1 and tail-recursive fold_range2 functions summing n integers, showing the time taken t in seconds: a) the non-tail-recursive form is approximately 15% more efficient for small ranges ($n < 2^{12}$), and b) the tail-recursive form is 5.7× more efficient for large ranges ($n > 2^{15}$).

However, the fold_range1 function acts upon the result of its recursive call by passing the result as an argument to the function f. Conversely, the fold_range2 function uses the result of the call to f as an argument to the recursive call, returning the result of the recursive call without acting upon it. This difference is due to the fold_range1 function counting upwards (i.e. applying f to l first) whereas the fold_range2 function counts downwards (i.e. applying f to u-1 first). Thus, the fold_range1 function is not tail recursive whereas the fold_range2 function is tail recursive.

As we have seen, tail recursiveness affects performance. Performance measurements for the fold_range1 and fold_range2 functions are illustrated in figure 7.5. In this case, as for most other functions, the non-tail-recursive form is slightly faster for shallow recursions (small n) whereas the tail-recursive form is considerably faster for deep recursions (large n).
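The same distinction can be sketched on a simpler function. The following pair, whose names sum_range1 and sum_range2 are hypothetical, sums the integers in the half-open range from l to u, first without and then with an accumulator:

```ocaml
(* Non-tail-recursive: the addition happens after the recursive call
   returns, so every pending call must keep its state. *)
let rec sum_range1 l u =
  if l < u then l + sum_range1 (l + 1) u else 0

(* Tail-recursive: the running total is carried in an accumulator and the
   recursive call is the last thing the function does. *)
let sum_range2 l u =
  let rec aux accu l = if l < u then aux (accu + l) (l + 1) else accu in
  aux 0 l
```

Both functions return 45 for the range 0 to 10. Only the second is safe to apply to very large ranges without deep recursion.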

Tail-recursion optimisations lead to several high-level code transformations which can considerably improve performance for large inputs. Specifically, the rev, rev_map, rev_map2 and rev_append functions in the List module are all tail-recursive and can, therefore, be used to form replacements for the non-tail-recursive append, concat, flatten, map, map2, fold_right, fold_right2, remove_assoc, remove_assq, split, combine and merge. We shall now examine some code transformations which can improve performance on large input lists.

Most simply, applying an operator which is both associative and commutative over a list can be done equivalently using fold_left or fold_right. As the latter is not tail recursive, the fold_left function should be preferred. For example, as integer addition commutes, the following are equivalent:

# let sum1 = List.fold_left ( + ) 0;;
val sum1 : int list -> int = <fun>
# let sum2 l = List.fold_right ( + ) l 0;;
val sum2 : int list -> int = <fun>

The tail-recursive sum1 function will be significantly more efficient than the non-tail-recursive sum2 function when l has a large number of elements (i.e. $\gg 10^3$).

The humble map function is another useful, but not tail-recursive, function. For long lists, the map function may be replaced by the tail-recursive rev_map function, which maps onto a reversed list, followed by the rev function, if necessary. For example, the following function will produce the same results as the map function (provided the function f being applied is order-independent, e.g. if f is purely functional) but is considerably more efficient for long lists:

# let map_tr f l = List.rev (List.rev_map f l);;
val map_tr : ('a -> 'b) -> 'a list -> 'b list = <fun>
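As an illustrative sketch of map_tr on a long input, the following builds a list of one million integers with a loop (so that constructing the test data involves no deep recursion) and doubles every element. The names big and doubled are hypothetical:

```ocaml
let map_tr f l = List.rev (List.rev_map f l)

(* Build [0; 1; ...; 999_999] with a loop rather than by recursion. *)
let big =
  let l = ref [] in
  for i = 999_999 downto 0 do l := i :: !l done;
  !l

(* Both List.rev_map and List.rev are tail recursive, so this call does
   not recurse deeply however long the list is. *)
let doubled = map_tr (fun x -> 2 * x) big
```

The result has the same length and order as the input, as it would with List.map.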

As a slightly more sophisticated example, the List.flatten function is not tail recursive and may be considered equivalent to:

# let flatten l = List.fold_right (fun l accu -> l @ accu) l [];;
val flatten : 'a list list -> 'a list = <fun>

This may be replaced with a tail-recursive version which builds the result up in reverse order using rev_append before reversing the result using rev:

# let flatten_tr l =
    let aux accu l = List.rev_append l accu in
    List.rev (List.fold_left aux [] l);;
val flatten_tr : 'a list list -> 'a list = <fun>

For example:

# flatten_tr [[1; 2; 3]; [4; 5; 6]; [7; 8; 9]];;
- : int list = [1; 2; 3; 4; 5; 6; 7; 8; 9]

This is useful when inlining sequences, for example when flattening a hierarchical data structure. However, note that the depth of recursion in the non-tail-recursive implementation of the flatten function is due to the fold_right and, therefore, is equal to the length of the input list, not the length of the output list. Thus, the effects of tail recursion on practical uses of the flatten function are likely to be less significant than for simpler functions, such as map and fold_right. In general, tail recursion is most important when the depth of recursion has the same complexity as the algorithm itself. For example, an O(n) algorithm with ln n deep non-tail recursion is not likely to suffer from performance degradation. The Set.fold function and the implementation of the discrete wavelet transform presented in section 10.5 are both examples of O(n) algorithms with ln n deep non-tail recursion.


[Diagram: a) List.map f (List.map f l), with columns Input, Temporary and Output; b) List.map (fun e -> f (f e)) l, with columns Input and Output.]

Figure 7.6: Deforestation refers to methods used to reduce the size of temporary data, such as the use of composite functions to avoid the creation of temporary data structures illustrated here: a) mapping a function f over a list l twice, and b) mapping the composite function f ∘ f over the list l once.

7.3.3.2 Deforesting

Functional programming style often results in the creation of temporary data due to the repeated use of maps, folds and other similar functions. The reduction of such temporary data is known as deforestation. In particular, deforestation includes the optimization of applying functions sequentially to individual elements, rather than to whole containers (such as lists and arrays), in order to minimize the number of temporary containers created (illustrated in figure 7.6).

For example, the Shannon entropy H of a vector v representing a discrete probability distribution is given by:

$$H(v) = \sum_{i=1}^{n} v_i \ln v_i$$

This could be written in OCaml by creating temporary containers, firstly $u_i = \ln v_i$ and then $w_i = u_i v_i$, and finally calculating the sum $H(v) = \sum_i w_i$:

# let entropy1 v =
    let u = List.map log v in
    let w = List.map2 ( *. ) v u in
    List.fold_left ( +. ) 0. w;;
val entropy1 : float list -> float = <fun>

This function can be completely deforested by performing all of the arithmetic operations at once for each element, avoiding the use of the temporary containers u and w:

# let entropy2 v =
    List.fold_left (fun h v -> h +. v *. log v) 0. v;;
val entropy2 : float list -> float = <fun>

Figure 7.7: Measured performance of the entropy1 and entropy2 functions computing the Shannon entropy of lists of n floating-point numbers, showing time taken t in seconds.

The measured performance of the entropy1 and entropy2 functions is shown in figure 7.7. The entropy2 function is approximately 35 times faster for $n \approx 10^6$.
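Before relying on a deforested function, it is worth confirming that it agrees with the original. The following sketch repeats the two definitions so that it is self-contained and compares them on a uniform distribution over four outcomes. The names p, h1 and h2 are hypothetical:

```ocaml
let entropy1 v =
  let u = List.map log v in
  let w = List.map2 ( *. ) v u in
  List.fold_left ( +. ) 0. w

let entropy2 v = List.fold_left (fun h v -> h +. v *. log v) 0. v

(* A uniform distribution: every term of the sum is 0.25 *. log 0.25. *)
let p = [0.25; 0.25; 0.25; 0.25]
let h1 = entropy1 p
let h2 = entropy2 p
```

Both h1 and h2 evaluate to log 0.25, approximately -1.386, because the sum as defined here is taken without the leading minus sign of the usual convention.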

7.3.3.3 Terminating early

Algorithms may execute more quickly if they are allowed to terminate prematurely. However, the trade-off between any extra tests required and the savings of exiting early can be difficult to predict. The only general solution is to try premature termination when performance is likely to be enhanced and revert to the simpler form if the savings are not found to be significant. We shall now consider a simple example of premature termination as found in the core library as well as a more sophisticated example requiring the use of exceptions.

The for_all function in the List module applies a predicate function to elements in a list, returning true if the predicate was found to be true for all elements and false otherwise. Note that the predicate need not be applied to all elements in the list, as the result is known to be false as soon as the predicate returns false for any given element. In the core library, this function is implemented as:

# let rec for_all p = function
    [] -> true
  | a::l -> p a && for_all p l;;

The premature termination of this function is not immediately obvious. In fact, the && operator has the unusual semantics of in-order, short-circuit evaluation. This means that the expression p a will be evaluated first and only if the result is true will the expression for_all p l be evaluated. Consequently, this implementation of the for_all function can return false without recursively applying the predicate function p to all of the elements in the given list.


When using higher-order functions, such as folds, algorithms can no longer be prematurely terminated in this way, i.e. by not recursing. The solution is to escape from the higher-order function by raising an exception. For example, the for_all function could be written in terms of iter:

# exception Finished;;
exception Finished
# let for_all p l =
    try
      List.iter (fun e -> if not (p e) then raise Finished) l;
      true
    with Finished -> false;;
val for_all : ('a -> bool) -> 'a list -> bool = <fun>

This implementation of the for_all function tries to apply the predicate to all of the elements of the given list without any applications returning false. If this is achieved then true is returned. Otherwise, the Finished exception will be raised when the predicate function first returns false. This exception will be caught by the try construct and the function returns false.

Effectively, this use of exceptions allows functions to "escape" from deep recursions. This use of exceptions is both generally applicable and useful.
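The same technique can also carry a result out of the higher-order function. As a minimal sketch (the exception Found and the function first_index are hypothetical names), the following escapes from Array.iteri with the index of the first element satisfying a predicate:

```ocaml
(* The exception carries the index at which the search stopped. *)
exception Found of int

(* Index of the first element of a satisfying p, or None if there is none. *)
let first_index p a =
  try
    Array.iteri (fun i e -> if p e then raise (Found i)) a;
    None
  with Found i -> Some i
```

Once a match is found, the remaining elements are never visited, just as in the for_all example.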

7.3.3.4 Specializing data structures

A trade-off often exists between the genericity and the efficiency of data structures and functions. This section concentrates on the performance impact of generic data structures. The next section deals with generic (polymorphic) functions.

The humble mathematical vector, for example, is ubiquitous in numerical applications. However, several different data structures can be used to represent a vector. As always, the choice of data structure can have a strong effect on the performance of the resulting program.

For example, if a vector is represented by a float list, the cross product may then be written such that it raises an exception if called with vectors of any dimensionality except 3:

# let vec_cross1 a b = match (a, b) with
    ([x1; y1; z1], [x2; y2; z2]) ->
      [y1 *. z2 -. z1 *. y2; z1 *. x2 -. x1 *. z2; x1 *. y2 -. y1 *. x2]
  | _ -> invalid_arg "vec_cross";;
val vec_cross1 : float list -> float list -> float list = <fun>

Alternatively, for a program which only uses 3D vectors (such as a particle simulation) we may have chosen to represent a 3D vector using the record type:

# type vec3 = {x:float; y:float; z:float};;
type vec3 = { x : float; y : float; z : float; }

The cross product may then be written such that it is always valid:

# let vec_cross2 {x=x1; y=y1; z=z1} {x=x2; y=y2; z=z2} =
    {x = y1 *. z2 -. z1 *. y2;
     y = z1 *. x2 -. x1 *. z2;
     z = x1 *. y2 -. y1 *. x2};;
val vec_cross2 : vec3 -> vec3 -> vec3 = <fun>

Essentially, the OCaml type checker verifies at compile-time that the vectors passed to the vec_cross2 function will always have three elements. Therefore, this need not be checked at run-time, avoiding some computation. Consequently, the vec_cross2 function is approximately 45% faster than the vec_cross1 function².

7.3.3.5 Avoiding polymorphic numerical functions

In OCaml, the creation and use of polymorphic functions (generic over the types which they can handle) can be subtle. Avoiding polymorphic functions in the primitive operations of numerically intensive algorithms can significantly improve performance.

For example, consider two functions which add and multiply the elements in an array of floating-point values, respectively:

# let sum1 a =
    let r = ref 0. in
    for i = 0 to Array.length a - 1 do
      r := !r +. a.(i)
    done;
    !r;;
val sum1 : float array -> float = <fun>
# let product1 a =
    let r = ref 1. in
    for i = 0 to Array.length a - 1 do
      r := !r *. a.(i)
    done;
    !r;;
val product1 : float array -> float = <fun>

The common, higher-order fold_left function can be factored out from sum1 and product1:

val fold_left : ('a -> 'b -> 'a) -> 'a -> 'b array -> 'a

When written in terms of fold_left, the sum and product functions may be written more concisely:

# let sum2 = Array.fold_left ( +. ) 0.
  and product2 = Array.fold_left ( *. ) 1.;;
val sum2 : float array -> float = <fun>
val product2 : float array -> float = <fun>

Clearly, this has significantly reduced the amount of code required to provide the specified functionality.

² This optimisation is facilitated by the unboxing of the vec3 type, as we shall see in section 7.3.3.6.


Figure 7.8: Measured performance of the sum and product functions when applied to arrays of n floating-point numbers, showing time taken t in seconds for: a) the sum functions (sum1 and sum2), and b) the product functions (product1 and product2).

However, the fold_left function is polymorphic. Currently, the OCaml compiler only generates generic implementations of polymorphic functions. Consequently, the fold_left function contains dispatch code to perform the task appropriately for any given type (although the type is always float in this case).

Also, the current OCaml compilers do not inline functions which are passed as arguments to higher-order functions. Consequently, trivial functions passed to Array.fold_left cannot be inlined and the resulting function call can be a significant overhead.

These overheads result in the sum2 and product2 functions executing significantly more slowly than the sum1 and product1 functions. The measured performance of the sum1, sum2, product1 and product2 functions is shown in figure 7.8. The polymorphism-free sum1 and product1 functions are approximately 50% faster than the polymorphic-fold-based sum2 and product2 functions. Thus, polymorphic functions should not be used in the performance-critical parts of programs.
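One possible middle ground, offered here as an untested assumption rather than a measured result, is a fold whose types are pinned to float by annotation. The names fold_left_float, sum3 and product3 are hypothetical:

```ocaml
(* Hypothetical float-only fold: the annotations fix every type to float,
   so the array accesses need not be generic. *)
let fold_left_float (f : float -> float -> float) (init : float)
    (a : float array) : float =
  let r = ref init in
  for i = 0 to Array.length a - 1 do
    r := f !r a.(i)
  done;
  !r

let sum3 = fold_left_float ( +. ) 0.
let product3 = fold_left_float ( *. ) 1.
```

This keeps the concision of the fold-based definitions while removing the polymorphism. The function f is still passed as an argument, however, so the cost of the call remains, and any benefit would have to be confirmed by measurement.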

7.3.3.6 Unboxing data structures

Typically in functional languages, most data structures are boxed. This means that data structures are stored as a reference to a different piece of memory. Although elegant, boxing can incur significant performance costs.

Figure 7.9: As data structures are boxed by default, an array of complex numbers $z_i = x_i + i y_i$ stored as a (float * float) array is actually represented by an array of pointers to pairs of pointers to floating-point numbers.

Figure 7.10: The ocamlopt compiler unboxes records with all fields of the type float. Consequently, an array of complex numbers stored as a Complex.t array is more efficient than a (float * float) array.

For example, much of the efficiency of arrays stems from their elements occupying a contiguous portion of memory and, therefore, accesses to elements with similar indices are cache coherent. However, if the array elements are boxed, only the references to the data structures will be in a contiguous portion of memory (see figure 7.10). The data structures themselves may be at completely random locations. Consequently, cache coherency may be very poor.

Fortunately, ocamlopt will not box values of many types, including float array, decreasing memory requirements and greatly improving performance. Consequently, in performance-critical code, unboxed data structures should be used in preference to boxed data structures.

Figure 7.11: The ocamlopt compiler unboxes values of the type float array. Consequently, an array of complex numbers stored as a float array of alternate real and imaginary values is more efficient than a Complex.t array.

For example, computing the product of an array of complex numbers is unnecessarily inefficient when the numbers are represented by a (float * float) array (see figure 7.9). A Complex.t array is more efficient (see figure 7.10), where Complex.t is defined in the Complex module of the core library as the record type:

type t = { re : float; im : float; }

A function to compute the product of an array of complex numbers might be written over the type (float * float) array, where each element is a 2-tuple representing the real and imaginary parts of a complex number:

# let c_prod1 a =
    let z = ref (1., 0.) in
    for i = 0 to Array.length a - 1 do
      z := match !z, a.(i) with (re1, im1), (re2, im2) ->
        re1 *. re2 -. im1 *. im2, re1 *. im2 +. im1 *. re2
    done;
    !z;;
val c_prod1 : (float * float) array -> float * float = <fun>

A more efficient alternative may be written over the type Complex.t array:

# let c_prod2 a =
    let z = ref Complex.one in
    for i = 0 to Array.length a - 1 do
      z := Complex.mul !z a.(i)
    done;
    (!z).Complex.re, (!z).Complex.im;;
val c_prod2 : Complex.t array -> float * float = <fun>

An even more efficient alternative may be written by storing the array of n complex numbers in a float array with 2n elements (see figure 7.11):

# let c_prod3 a =
    let z_re = ref 1. and z_im = ref 0. in
    for i = 0 to Array.length a / 2 - 1 do
      (* Read both parts before updating either of them. *)
      let re = !z_re and im = !z_im in
      z_re := re *. a.(i * 2) -. im *. a.(i * 2 + 1);
      z_im := re *. a.(i * 2 + 1) +. im *. a.(i * 2)
    done;
    !z_re, !z_im;;
val c_prod3 : float array -> float * float = <fun>

Measuring the performance of the c_prod1, c_prod2 and c_prod3 functions (illustrated in figure 7.12) shows that the c_prod2 function is approximately 33% faster than the c_prod1 function and the c_prod3 function is approximately 28% faster than the c_prod2 function.
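Data held as a Complex.t array must be converted before it can be passed to a function that works on the flat representation. A minimal sketch of such a conversion follows. The helper name interleave is hypothetical:

```ocaml
(* Hypothetical helper: flatten a Complex.t array into a float array of
   alternating real and imaginary parts, i.e. element k of the input
   occupies positions 2k and 2k + 1 of the output. *)
let interleave (a : Complex.t array) : float array =
  Array.init (2 * Array.length a) (fun k ->
    let z = a.(k / 2) in
    if k mod 2 = 0 then z.Complex.re else z.Complex.im)
```

The conversion itself allocates a new array, so the flat layout is most worthwhile when the data can be kept in that form throughout the performance-critical part of a program.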

Having examined the many ways OCaml programs may be optimised, we shall now review existing libraries which may be of use to scientific programmers before presenting a variety of example functions and programs.


Figure 7.12: Measured performance of the c_prod1, c_prod2 and c_prod3 functions computing the product of arrays of n complex numbers, showing time taken t in seconds.

Chapter 8

Libraries

8.1 Command-line arguments

Many programs, particularly under Unix, are designed to be run from the command line. Such programs typically allow arguments to be passed on the command line. For example, the ocamlopt compiler allows several flags such as -p and -unsafe, as we have seen:

$ ocamlopt -p -unsafe test.ml -o test

The Arg module can be used to parse such command-line arguments and, therefore, can be useful when writing controllable programs. This module is described in detail in the OCaml reference manual [2] but we shall provide an overview and examples here because this functionality is often required in scientific programs.

Parsing is performed by the parse function in the Arg module:

val parse : (Arg.key * Arg.spec * Arg.doc) list -> Arg.anon_fun -> Arg.usage_msg -> unit

Command-line arguments introduced by a hyphen are known as keyword arguments. For example, the -unsafe argument to the ocamlopt compiler is a keyword. Such named arguments may or may not be followed by associated information. For example, the -inline argument to the ocamlopt compiler is followed by an integer.

Command-line arguments specified without a keyword are known as anonymous arguments. For example, the file names to be compiled by ocamlopt are specified as anonymous arguments, such as test.ml in the above example.

The first argument to the parse function is a list of 3-tuples specifying the keyword, action specification and description of each named command-line argument. The second argument specifies the function used to handle anonymous command-line arguments (i.e. those without a keyword). The final argument specifies the usage message, printed if invalid command-line arguments are given to the program.

In the type of the parse function, the types key, doc and usage_msg are all string and the type spec is:

type spec =
    Unit of (unit -> unit)
  | Bool of (bool -> unit)
  | Set of bool ref
  | Clear of bool ref
  | String of (string -> unit)
  | Set_string of string ref
  | Int of (int -> unit)
  | Set_int of int ref
  | Float of (float -> unit)
  | Set_float of float ref
  | Tuple of Arg.spec list
  | Symbol of string list * (string -> unit)
  | Rest of (string -> unit)

This spec type allows appropriate actions to be specified for a comprehensive selection of named arguments. The way in which the Arg module is used to parse command-line arguments is best understood by example.

Consider a simple program "test.ml" which prints $x^y$ for some specified x and y. The arguments x and y are most easily specified as anonymous arguments:

let x, y =
  let input = ref [] in
  let usage = "Usage: test <x> <y>" in
  Arg.parse [] (fun x -> input := x :: !input) usage;
  match !input with
    [y; x] -> float_of_string x, float_of_string y
  | _ -> invalid_arg usage in
print_endline (string_of_float (x ** y))

Note that each new anonymous argument is prepended onto the string list referenced by input and, consequently, the arguments appear in reverse order in the pattern match.

This program can be compiled and executed quite simply because the Arg module is provided in the core OCaml distribution:

$ ocamlopt test.ml -o test
$ ./test 2 3
8.

The Arg module also provides a -help argument which causes the program to print its command-line options and exit:

$ ./test -help
Usage: test <x> <y>
  -help  Display this list of options
  --help  Display this list of options

For a more sophisticated example, consider a similar program which allows the arguments x and y to be specified as named arguments. This may be written by specifying the keywords -x and -y and using the Float constructor to provide a function to set x and y:


let x, y =
  let x = ref None and y = ref None in
  let usage = "Usage: test -x <x> -y <y>" in
  let set_x = "-x", Arg.Float (fun a -> x := Some a), "the x value" in
  let set_y = "-y", Arg.Float (fun a -> y := Some a), "the y value" in
  Arg.parse [set_x; set_y] ignore usage;
  match !x, !y with
    Some x, Some y -> x, y
  | _ -> invalid_arg usage in
print_endline (string_of_float (x ** y))

This program may be compiled and executed as simply as the first example:

$ ocamlopt test.ml -o test
$ ./test -x 2 -y 3
8.

Naturally, named command-line arguments may be specified in any order:

$ ./test -y 3 -x 2
8.

Specifying the -help argument now provides information on each named argument:

$ ./test -help
Usage: test -x <x> -y <y>
  -x the x value
  -y the y value
  -help  Display this list of options
  --help  Display this list of options

The ability to parse command-line arguments can be a productive first step towards writing easy-to-use programs.
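The other constructors of the spec type are used in the same way. As a minimal sketch, the following combines the Set and Set_int constructors with an anonymous-argument handler. The option names, the references verbose, count and files, and the example arguments are all hypothetical. It uses Arg.parse_argv, which parses an explicitly supplied array instead of the real command line, so that the example can be run without any arguments:

```ocaml
(* Hypothetical options: -v sets a flag and -n stores an integer. *)
let verbose = ref false
let count = ref 1
let files = ref []

let spec =
  [ "-v", Arg.Set verbose, "print extra output";
    "-n", Arg.Set_int count, "number of repetitions" ]

(* Element 0 of the array plays the role of the program name. *)
let () =
  Arg.parse_argv ~current:(ref 0)
    [| "test"; "-n"; "5"; "-v"; "data.txt" |]
    spec
    (fun f -> files := f :: !files)
    "Usage: test [-v] [-n <int>] <file>"
```

After parsing, verbose holds true, count holds 5 and files holds the single anonymous argument. In a real program, Arg.parse would be used with the same specification list to read the actual command line.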

8.2 Timing

Two different timing functions are provided by OCaml when running under Unix:

• The Sys.time function returns the time in seconds spent executing the current program.

• The Unix.gettimeofday function returns the time in seconds since the "Epoch"¹.

These timing functions can be useful in several circumstances. Most simply, they provide a means of profiling a program, measuring the amount of time required to perform different operations. By using timing functions selectively, useful profiling information can be obtained without introducing the overhead of full profiling caused by compiling with profiling on (as discussed in section 7.1). Timing functions can also be used in benchmarking (as seen in section 7.3.1) and in the creation of real-time programs, such as those producing animations (as discussed in chapter 6).

¹ Midnight UTC, 1st January 1970.

When timing operations, a higher-order function which returns the time taken to compute its function argument can be useful:

# #load "unix.cma";;
# let time f =
    let t = Unix.gettimeofday () in
    f ();
    (Unix.gettimeofday ()) -. t;;
val time : (unit -> 'a) -> float = <fun>

This function may be used to time any given function of type unit -> 'a. For example, the resolution of this timing function may be found experimentally by applying it within an application of Array.map (which is comparatively very fast). For a large array a:

# let a = Array.make 1000000 0;;
val a : int array = ...

The time taken to apply Array.map is small:

# time (fun () -> Array.map (fun i -> i) a);;
- : float = 0.389801025390625

The time spent in the timer function is then very significant:

# let t, b =
    let b = ref [||] in
    let f () = b := Array.map (fun _ -> time (fun i -> i)) a in
    let t = time f in
    t, !b;;
val t : float = 1.5448310375213623
val b : float array =
  [|1.1920928955078125e-06; 1.9073486328125e-06; 9.5367431640625e-07; 0.;
    ...

The two runs differ by 1.54 - 0.39 = 1.15 s over 10^6 calls, so the time taken to call this timer function is approximately 1.15 µs. The resolution of this timer is approximately 1 µs (which is not surprising given that this is known as the microsecond timer!).

In contrast, at least on this platform², the Sys.time function is remarkably inaccurate by comparison:

# let time f =
    let t = Sys.time () in
    f ();
    (Sys.time ()) -. t;;
val time : (unit -> 'a) -> float = <fun>
# time (fun () -> Array.map (fun i -> i) a);;
- : float = 0.369999999999999218
# let t, b =
    let b = ref [||] in
    let f () = b := Array.map (fun _ -> time (fun i -> i)) a in
    let t = time f in
    t, !b;;
val t : float = 1.38000000000000078
val b : float array =
  [|0.; 0.; 0.; 0.; 0.; 0.; 0.; 0.; 0.; 0.; 0.; 0.; 0.; 0.; 0.; 0.; 0.; 0.;
    ...

Not coincidentally, these results are all very close to multiples of 0.01 s, i.e. the Sys.time function only has centi-second resolution. Thus, the Unix.gettimeofday function is likely to be the timer of choice, at least on this platform.

2 Athlon/Linux.
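The measurements above discard the value computed by the timed function and infer the resolution from a million samples. As a minimal sketch under stated assumptions, the clock can instead be passed as an argument, the computed value returned alongside the elapsed time, and the resolution estimated by waiting for the clock reading to change. The names timed and resolution are hypothetical, and Sys.time is used as the default clock only because it needs no additional library.

```ocaml
(* [timed] returns both the result of f () and the elapsed time.
   The clock is an optional argument, defaulting to Sys.time. *)
let timed ?(clock = Sys.time) f =
  let t = clock () in
  let result = f () in
  (result, clock () -. t)

(* Busy-wait until the clock reading changes and return the step size,
   which is an estimate of the resolution of that clock. *)
let resolution clock =
  let t0 = clock () in
  let rec wait () =
    let t = clock () in
    if t > t0 then t -. t0 else wait () in
  wait ()

let () =
  let a = Array.make 1_000_000 0 in
  let b, dt = timed (fun () -> Array.map (fun i -> i + 1) a) in
  Printf.printf "mapped %d elements in %g s\n" (Array.length b) dt;
  Printf.printf "observed Sys.time step: %g s\n" (resolution Sys.time)
```

When the Unix library is linked, Unix.gettimeofday can be supplied as ~clock:Unix.gettimeofday to compare the two timers on a given platform.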

8.3 Big arrays

The array type was described in detail in section 3.2. This type offers a polymorphic, homogeneous container optimised for use within OCaml programs.

However, the array type suffers from two main drawbacks:

• The maximum number of elements, given by Sys.max_array_length, is only around four million³ on a 32-bit architecture.

• Values of the type array are not easily handled by functions written in other languages.

These drawbacks are most important in the context of numerical programming, which sometimes requires the use of very large arrays of numeric elements (e.g. float) and the use of functions written in other languages, such as C and Fortran.

These drawbacks are addressed by another type of array, known as big arrays, which can contain an arbitrary number of elements of various different numeric types and which are stored in either C or Fortran format. However, big arrays also have some relative disadvantages:

• The elegant pattern matching syntax for the array type cannot be used with big arrays.

• The top-level currently lacks the ability to print the contents of a big array.

• Access to the elements of big arrays from OCaml is slower than access to the array type.

3 In the case of float arrays, the maximum size is half Sys.max_array_length.


Definitions pertaining to big arrays are in the Bigarray module of the core OCaml distribution and are described in detail in the OCaml reference manual [2]. We shall give only a brief review of the functionality of big arrays, required to understand the remainder of this book.

A top-level including the functionality of big arrays may be created and entered using:

$ ocamlmktop bigarray.cma -o bigarray.top
$ ./bigarray.top

The Bigarray module is designed to have its namespace opened:

# open Bigarray;;

Specialised big arrays of one, two and three dimensions may be handled using definitions in the submodules Array1, Array2 and Array3, respectively, as well as arrays of arbitrary dimensionality in the Genarray submodule.

Elements in a big array essentially have two different types associated with them, defined by the kind of big array:

• The OCaml type used to handle the elements, e.g. int or float.

• The storage type used to represent each element in memory, e.g. int32_elt or float64_elt.

For example, the "kind" of a big array which uses the OCaml type float but which actually stores the floating-point values using the 32-bit IEEE single-precision format (i.e. sacrificing precision for memory usage) is denoted by the value:

# let mykind = float32;;
val mykind : (float, Bigarray.float32_elt) Bigarray.kind = <abstr>

Note that the abstract type of this value prescribes the use of the OCaml type float and the storage type float32_elt.

The array "layout" used by the C language may be specified using:

# let mylayout = c_layout;;
val mylayout : Bigarray.c_layout Bigarray.layout = <abstr>

A 1D array of this kind using this layout may then be created from a value of the type array using the of_array function in the Array1 module:

# let a = Array.init 4 (fun i -> 1. /. float_of_int (1 + i));;
val a : float array = [|1.; 0.5; 0.333333333333333315; 0.25|]
# let a = Array1.of_array mykind mylayout a;;
val a : (float, Bigarray.float32_elt, Bigarray.c_layout) Bigarray.Array1.t = <abstr>

An efficient iter function over such a big array may be defined by specialising the big array type:

# let iter f (a : (float, float32_elt, c_layout) Array1.t) =
    let len = Array1.dim a in
    if len = 0 then () else
      for i = 0 to len - 1 do
        f (Array1.get a i)
      done;;
val iter : (float -> 'a) -> (float, Bigarray.float32_elt, Bigarray.c_layout) Bigarray.Array1.t -> unit = <fun>

The contents of a may then be printed using this iter function:

# iter (fun x -> print_endline (string_of_float x)) a;;
1.
0.5
0.333333343267
0.25
- : unit = ()

Note the reduced precision of the representation of 1/3.
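The loss of precision noted above can be quantified directly. The following is a minimal sketch, assuming a compiler in which the Bigarray module is available without extra linking (as in recent OCaml releases); with older compilers the bigarray library must be linked as described earlier. The helper name sum32 is hypothetical, and the a.{i} notation is the built-in shorthand for Array1.get a i.

```ocaml
open Bigarray

(* Sum the elements of a single-precision big array, accumulating in
   OCaml's double-precision float. [sum32] is a hypothetical helper. *)
let sum32 (a : (float, float32_elt, c_layout) Array1.t) =
  let s = ref 0. in
  for i = 0 to Array1.dim a - 1 do
    s := !s +. a.{i}
  done;
  !s

let () =
  let exact = Array.init 4 (fun i -> 1. /. float_of_int (1 + i)) in
  let a = Array1.of_array float32 c_layout exact in
  (* Rounding error introduced by storing 1/3 in 32 bits. *)
  Printf.printf "error in 1/3: %g\n" (abs_float (a.{2} -. exact.(2)));
  Printf.printf "sum: %.9f\n" (sum32 a)
```

The values 1, 0.5 and 0.25 are exactly representable in single precision, so only the third element is altered by the conversion.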

We shall use big arrays in the context of vector-matrix computations and the Fourier transform.

8.4 Vector-Matrix

Many scientific problems can be phrased in terms of vector-matrix algebra. Thus, the ability to handle vectors and matrices can be instrumental in writing scientific programs. In particular, the ability to perform some complicated computations on them (e.g. finding the eigenvalues of a matrix) can be pivotal in scientific programs. Such computations are often prone to numerical error and, therefore, can be tedious to program robustly.

Fortunately, LAPACK is a well known, freely-available library of functions for performing many such vector and matrix computations [11]. Much of the functionality of LAPACK is available from OCaml through freely-available bindings called lacaml written by Markus Mottl, Christophe Troestler, Oleg Trott and Liam Stewart.

Once LAPACK and lacaml are installed, a top-level which includes the functionality of the lacaml bindings may be created using:

$ ocamlmktop -custom -cclib -llapack2 -I +lacaml bigarray.cma lacaml.cma -o lacaml.top

Programs may be compiled similarly to byte-code and native code:

$ ocamlc -custom -cclib -llapack2 -I +lacaml bigarray.cma lacaml.cma file.ml -o file
$ ocamlopt -cclib -llapack2 -I +lacaml bigarray.cmxa lacaml.cmxa file.ml -o file

The ability to use the LAPACK library to perform vector-matrix computations is a great boon for scientific programming in OCaml.


8.5 Fourier transform

Many scientific computations require the use of the Fourier transform, or an algorithm based upon the Fourier transform (such as fast convolution). In the context of numerical algorithms, the Fourier transform cannot be performed. This stems from the fact that the Fourier transform is obtained by taking the limit of infinite period and, of course, computers cannot handle infinite amounts of data. However, the coefficients of a Fourier series may be computed given uniformly spaced sampling data. Hence, Fourier series are typically computed in the place of the Fourier transform and, misleadingly, algorithms for computing Fourier series are referred to as Fourier transform algorithms. In particular, the Fast Fourier Transform (FFT) algorithm computes the Fourier series of a set of n uniform samplings in Θ(n ln n) time complexity for any⁴ n.

Implementing a FFT which works for any n is decidedly tricky. Fortunately, this hard work has already been done for us. Matteo Frigo and Steven G. Johnson have written and distributed an excellent implementation of the FFT, called the Fastest Fourier Transform in the West (FFTW). This implementation is freely available on the web. Christophe Troestler has written OCaml bindings for the FFTW library and also distributed them for free on the web. We shall now describe the use of the FFTW library via these bindings, assuming both FFTW and the bindings have already been installed.

A top-level which includes the functionality of the FFTW library may be created using:

$ ocamlmktop fftw.cma -o fftw.top

This top-level may then be used to compute Fourier series.

$ ./fftw.top
# open Bigarray;;

In the interests of efficiency, the FFTW library provides a function to generate partially specialised functions to compute the FFT for a given n. The OCaml bindings to the FFTW library present this functionality as a curried function Fftw.create. The resulting function acts upon a big array. We shall use the following fourier function which acts upon normal arrays of the type Complex.t array:

4 Although many numerical texts aimed at scientists (e.g. the infamous Numerical Recipes [12]) mistakenly claim that the Fast Fourier Transform can only be performed on integer power of two numbers of samples, this is not true. Indeed, this has not been true for decades.

# let (fourier, ifourier) =
    let to_big a =
      let n = Array.length a in
      let big_a = Array1.create Fftw.complex c_layout n in
      Array.iteri (fun i z -> Array1.set big_a i z) a;
      big_a in
    let of_big big_a =
      let n = Array1.dim big_a in
      Array.init n (fun i -> Array1.get big_a i) in
    let fft norm dir a =
      let n, big_a = Array.length a, to_big a in
      of_big (Fftw.create ~normalize:norm dir n big_a) in
    (fft false Fftw.forward, fft true Fftw.backward);;
val fourier : Complex.t array -> Complex.t array = <fun>
val ifourier : Complex.t array -> Complex.t array = <fun>

This fourier function will compute the Fourier series $v_s$ from $u_r$:

$$v_s = \sum_{r=0}^{n-1} u_r e^{-2\pi i r s/n}$$

The ifourier function will compute $u_r$ from the Fourier series $v_s$:

$$u_r = \frac{1}{n} \sum_{s=0}^{n-1} v_s e^{2\pi i r s/n}$$

Note the asymmetric normalisations, unconventional in the physical sciences.

In the interests of clarity, we shall use the following function to create a string representing a complex number:

# let string_of_complex z = match z.Complex.re, z.Complex.im with
    | 0., 0. -> "0"
    | x, 0. -> string_of_float x
    | 0., y -> (string_of_float y)^"i"
    | x, y -> (string_of_float x)^" + "^(string_of_float y)^"i";;
val string_of_complex : Complex.t -> string = <fun>

and the following function to create a string representing an array of complex numbers:

# let string_of_complex_array a =
    let l = Array.to_list a in
    "[|"^(String.concat "; " (List.map string_of_complex l))^"|]";;
val string_of_complex_array : Complex.t array -> string = <fun>

Let us create variables to use as short-hand notations for values n = -1, z = 0 and p = 1 of type Complex.t:


Figure 8.1: Fourier series of a discretely sampled sine wave, showing: a) the samples $u_r$, $r \in [0, 16)$, and Fourier series $\sin(\frac{\pi}{2} r)$, and b) the corresponding Fourier coefficients $v_s$ (plotted as $\mathrm{Im}[v_s]$) computed numerically using FFTW.

# let (n, z, p) =
    let p = Complex.one in
    (Complex.neg p, Complex.zero, p);;
val n : Complex.t = {Complex.re = -1.; Complex.im = -0.}
val z : Complex.t = {Complex.re = 0.; Complex.im = 0.}
val p : Complex.t = {Complex.re = 1.; Complex.im = 0.}

The following creates an array a containing the samples (0, 1, 0, -1, 0, 1, 0, -1, 0, 1, 0, -1, 0, 1, 0, -1), so that the number of samples is n = 16:

# let a = [|z; p; z; n; z; p; z; n; z; p; z; n; z; p; z; n|];;
val a : Complex.t array = ...
# string_of_complex_array a;;
- : string = "[|0; 1.; 0; -1.; 0; 1.; 0; -1.; 0; 1.; 0; -1.; 0; 1.; 0; -1.|]"

This discrete sampling is illustrated in figure 8.1a. We shall now calculate the functional form of the Fourier series via the numerically computed series found using FFTW.

Taking the samples to be unit-separated samples, the Nyquist frequency is $\nu_{\mathrm{Ny}} = \frac{1}{2}$. The signal a may be considered to be a sampling of a real-valued sinusoid $A \sin(2\pi\nu r)$ with amplitude $A = 1$ and frequency $\nu = \frac{1}{2}\nu_{\mathrm{Ny}} = \frac{1}{4}$. The FFT of a is:

# string_of_complex_array (fourier a);;
- : string = "[|0; 0; 0; 0; -8.i; 0; 0; 0; 0; 0; 0; 0; 8.i; 0; 0; 0|]"

Each element $v_s$ of this array may be taken to represent the frequency $\nu(s) = 2 s \nu_{\mathrm{Ny}}/n = s/n$ and amplitude $A = \frac{1}{n} v_s$ of Fourier components in the signal. As we are dealing with Fourier series, the indices s are periodic over n. Consequently, we may productively interpret the second half of this result as representing negative frequencies, as illustrated in figure 8.1b.
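The interpretation of the second half of the result as negative frequencies can be expressed as a small function. This is only a sketch: the name frequency_of_index is hypothetical, unit-separated samples are assumed as in the text, and indices with s >= n/2 are shifted down by n.

```ocaml
(* Signed frequency, in cycles per sample, represented by index s of an
   n-point series. Indices in the second half (2s >= n) are interpreted
   as negative frequencies. [frequency_of_index] is a hypothetical name. *)
let frequency_of_index n s =
  let s = if 2 * s < n then s else s - n in
  float_of_int s /. float_of_int n

let () =
  (* The two non-zero coefficients of the example lie at s = 4 and s = 12. *)
  List.iter
    (fun s -> Printf.printf "s = %2d: frequency %g\n" s (frequency_of_index 16 s))
    [4; 12]
```

For n = 16 this assigns the frequencies 1/4 and -1/4 to the indices 4 and 12, matching the frequency of the sampled sinusoid.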

In this case, the only non-zero elements are $v_4 = -8i$ and $v_{12} = 8i$. This shows that the signal can be represented as the sum of two plane waves which, in fact, partly cancel to give a sine wave:

$$\begin{aligned}
f(r) &= \frac{1}{n}\left(\sum_{0 \le s < \frac{1}{2}n} v_s e^{2\pi i r s/n} + \sum_{\frac{1}{2}n \le s < n} v_s e^{2\pi i r (s/n - 1)}\right) \\
&= \frac{1}{16}\left(v_4 e^{2\pi i r\,4/16} + v_{12} e^{2\pi i r (12/16 - 1)}\right) \\
&= \frac{1}{16}\left((-8i)\,i \sin(\tfrac{\pi}{2} r) + (8i)\,i \sin(-\tfrac{\pi}{2} r)\right) \\
&= \sin(\tfrac{\pi}{2} r)
\end{aligned}$$

as expected.

Also, the inverse FFT of the FFT of a recovers the original a:

# string_of_complex_array (ifourier (fourier a));;
- : string = "[|0; 1.; 0; -1.; 0; 1.; 0; -1.; 0; 1.; 0; -1.; 0; 1.; 0; -1.|]"

The FFTW library and the OCaml bindings for FFTW allow the Fourier series of large, possibly multidimensional data sets to be computed quickly and with relative ease.
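The definitions of the forward and inverse series used in this section can be checked without FFTW by evaluating the sums directly. The following is a minimal sketch, not the FFT algorithm: it takes time quadratic in n and is intended only as a reference for small inputs. The names dft, slow_fourier and slow_ifourier are hypothetical.

```ocaml
let pi = 4. *. atan 1.

(* Direct evaluation of the sums defining the transform pair:
   sign = -1. without scaling computes v_s from u_r;
   sign = +1. with scaling by 1/n computes u_r from v_s. *)
let dft sign scale u =
  let n = Array.length u in
  Array.init n (fun s ->
    let sum = ref Complex.zero in
    for r = 0 to n - 1 do
      let theta =
        sign *. 2. *. pi *. float_of_int (r * s) /. float_of_int n in
      (* Complex.polar 1. theta is the unit complex number e^(i theta). *)
      sum := Complex.add !sum (Complex.mul u.(r) (Complex.polar 1. theta))
    done;
    if scale then
      Complex.div !sum { Complex.re = float_of_int n; Complex.im = 0. }
    else !sum)

let slow_fourier u = dft (-1.) false u
let slow_ifourier v = dft 1. true v
```

Applied to the 16 samples of the example, slow_fourier should agree with the FFTW result up to rounding error, and slow_ifourier should recover the original samples.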


Chapter 9

Simple Examples

In this chapter, we shall design and develop many simple functions. These derivations are provided for readers who wish to expose themselves to simple examples of OCaml code before attempting to decipher the more involved examples presented in chapter 10. Readers fluent in OCaml may wish to skip this chapter.

9.1 Arithmetic

Many useful functions simply perform computations on the built-in number types. In this section, we shall examine progressively more sophisticated numerical computations.

The Heaviside step function:

$$H(x) = \begin{cases} 0 & x < 0 \\ 1 & x \ge 0 \end{cases}$$

is a simple numerical function which may be implemented trivially in OCaml:

# let heaviside x = if x < 0. then 0. else 1.;;
val heaviside : float -> float = <fun>

The Kronecker δ-function:

$$\delta_{ij} = \begin{cases} 1 & i = j \\ 0 & i \ne j \end{cases}$$

may be written as:

# let kronecker i j = if i = j then 1 else 0;;
val kronecker : 'a -> 'a -> int = <fun>

However, this implementation is polymorphic, which may be undesirable for two reasons:

• Static type checking will not enforce the application of this function to integer types only.

• Polymorphism incurs run-time performance penalties and this function may be used in performance-critical code.

Consequently, we may wish to restrict the type of this function by adding an explicit type annotation:

# let kronecker (i : int) j = if i = j then 1 else 0;;
val kronecker : int -> int -> int = <fun>

Erroneous applications of this function to inappropriate types will now be caught at compile time by the OCaml compilers and this function will execute more quickly due to removal of polymorphism.

Computations involving trigonometric functions may be performed using the sin, cos, tan, asin (arcsin), acos (arccos), atan (arctan), atan2 (as for atan but accepting signed numerator and denominator to determine which quadrant the result lies in), cosh, sinh and tanh functions. For example, the constant π (= 3.14159265358979312...) is most easily calculated as π = 4 arctan 1 using the atan function:

# let pi = 4. *. atan 1.;;
val pi : float = 3.14159265358979312

The conventional mathematical functions $\sqrt{x}$ (sqrt x) and $e^x$ (exp x) are required to compute the Gaussian:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x - \mu)^2}{2\sigma^2}}$$

Thus, a function to calculate the Gaussian may be written:

# let gaussian mu sigma x =
    let sqr x = x *. x in
    exp (-. sqr (x -. mu) /. (2. *. sqr sigma)) /.
      (sqrt (2. *. pi) *. sigma);;
val gaussian : float -> float -> float -> float = <fun>

As this implementation of the Gaussian function is curried, a function representing a probability distribution with a given μ and σ may be obtained by applying the first two arguments:

# let f = gaussian 1. 0.5;;
val f : float -> float = <fun>
# Array.init 21 (fun i -> f (float_of_int i /. 10.));;
- : float array =
[|0.107981933026376126; 0.157900316601788299; 0.221841669358911087;
  0.299454931271489755; 0.38837210996642596; 0.483941449038286731;
  0.579383105522965458; 0.666449205783599341; 0.73654028060664678;
  0.78208538795091187; 0.797884560802865406; 0.782085387950911759;
  0.73654028060664678; 0.666449205783599341; 0.57938310552296568;
  0.483941449038286731; 0.388372109966425849; 0.299454931271489755;
  0.221841669358911087; 0.157900316601788382; 0.107981933026376126|]
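As a probability distribution, the Gaussian should enclose unit area. This can be checked numerically with a partially applied gaussian. The following is a minimal sketch: the helper trapezium is hypothetical and implements the composite trapezium rule, and the choice of integration range (eight standard deviations either side of the mean) and of 1000 steps is an assumption made for illustration.

```ocaml
let pi = 4. *. atan 1.

let gaussian mu sigma x =
  let sqr x = x *. x in
  exp (-. sqr (x -. mu) /. (2. *. sqr sigma)) /. (sqrt (2. *. pi) *. sigma)

(* Composite trapezium rule for f over [a, b] using n equal steps.
   [trapezium] is a hypothetical helper, not part of the original text. *)
let trapezium f a b n =
  let h = (b -. a) /. float_of_int n in
  let sum = ref ((f a +. f b) /. 2.) in
  for i = 1 to n - 1 do
    sum := !sum +. f (a +. float_of_int i *. h)
  done;
  !sum *. h

let () =
  let f = gaussian 1. 0.5 in
  (* Integrate over mu +/- 8 sigma, beyond which the tails are negligible. *)
  Printf.printf "area = %.6f\n" (trapezium f (-3.) 5. 1000)
```

Because gaussian is curried, the function f passed to trapezium is obtained simply by supplying the first two arguments.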

            1
          1   1
        1   2   1
      1   3   3   1
    1   4   6   4   1
  1   5  10  10   5   1
1   6  15  20  15   6   1

Figure 9.1: The first seven rows of Pascal's triangle.

We have already seen the factorial function:

# let rec factorial n = if n < 1 then 1 else n * factorial (n - 1);;
val factorial : int -> int = <fun>

The binomial coefficient $\binom{n}{r}$ is typically defined in mathematics as:

$$\binom{n}{r} = \frac{n!}{r!(n - r)!}$$

Naturally, this may be written directly in terms of the factorial function:

# let binomial n r = factorial n / (factorial r * factorial (n - r));;
val binomial : int -> int -> int = <fun>
# List.map (binomial 6) [0; 1; 2; 3; 4; 5; 6];;
- : int list = [1; 6; 15; 20; 15; 6; 1]

However, even when the result can be represented within machine precision, this naïve implementation of the binomial function can fail because a subexpression (specifically n!) overflows:

# List.map (binomial 13) [0; 1; 2; 3; 4; 5; 6];;
- : int list = [1; 0; -2; -9; -24; -44; -59]

In this case, the correct result $\binom{13}{6} = 1716$ is well within machine precision and the erroneous results due to overflowing arithmetic are likely to be deemed unacceptable.
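The value of n at which n! first overflows depends upon the width of the native int type, so the erroneous results shown above for n = 13 are specific to 32-bit platforms. The following sketch locates the threshold on the current platform. The names naive_binomial and first_overflow are hypothetical, and the overflow test (dividing the product by n and comparing with the previous factorial) is one simple way of detecting wrap-around, assumed here for illustration.

```ocaml
let rec factorial n = if n < 1 then 1 else n * factorial (n - 1)

(* The factorial-based definition, renamed to avoid clashing with the
   later implementations. *)
let naive_binomial n r = factorial n / (factorial r * factorial (n - r))

(* Smallest n for which n! no longer fits in a native int, found by
   checking whether dividing the product by n recovers (n - 1)!. *)
let first_overflow () =
  let rec go n prev =
    let next = prev * n in
    if next / n <> prev then n else go (n + 1) next in
  go 1 1

let () =
  Printf.printf "native ints have %d bits; factorial first overflows at n = %d\n"
    Sys.int_size (first_overflow ())
```

On platforms with wider integers the factorial-based implementation remains correct for n = 13 but fails in the same way at a larger n.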

This problem with numerical precision is most easily circumvented by resorting to computing Pascal's triangle, where each number in the triangle is the sum of its two "parents" from the row before (illustrated in figure 9.1). This may be represented as the recurrence relation:

$$\binom{n}{r} = \begin{cases} 1 & r = 0 \\ 1 & r = n \\ \binom{n-1}{r} + \binom{n-1}{r-1} & \text{otherwise} \end{cases}$$

Computing binomial coefficients using Pascal's triangle is more robust than computing via factorials because the numbers involved now increase monotonically, only overflowing if the result overflows.

The recurrence relation may be implemented as a recursive function:

# let rec binomial n r =
    if r = 0 || r = n then 1 else
      binomial (n - 1) r + binomial (n - 1) (r - 1);;
val binomial : int -> int -> int = <fun>

However, the double recursion and lack of reuse of previous results leads to an asymptotic algorithmic time-complexity of $O(\binom{n}{r})$. Although this implementation is numerically robust, its complexity may be unacceptable.

The complexity may be improved by reusing previous results. This can be achieved by computing rows of Pascal's triangle up to row n and then extracting the r-th element. Such an algorithm may be implemented using a list to represent each row of the triangle:

# let binomial n r =
    let rec aux n =
      let rec aux2 = function
        | [] -> []
        | [h] -> [h]
        | h1::h2::t -> (h1 + h2) :: aux2 (h2 :: t) in
      if n = 0 then [1] else aux2 (0 :: aux (n - 1)) in
    List.nth (aux n) r;;
val binomial : int -> int -> int = <fun>

Alternatively, an equivalent function may be written using arrays, by mutating a single array in-place:

# let binomial n r =
    let b = Array.init (n + 1) (fun i -> if i = 0 then 1 else 0) in
    for i = 1 to n do
      b.(i) <- 1;
      for j = i - 1 downto 1 do
        b.(j) <- b.(j) + b.(j - 1);
      done;
    done;
    b.(r);;
val binomial : int -> int -> int = <fun>

Both the list- and array-based implementations can compute the previous example without overflowing:

# List.map (binomial 13) [0; 1; 2; 3; 4; 5; 6];;
- : int list = [1; 13; 78; 286; 715; 1287; 1716]

Although the asymptotic time-complexity has been worsened from O(n) for the factorial-based implementation to O(n²) for the Pascal's triangle implementations, this complexity is still acceptable because n is limited to small values by the rapid growth of the result. Also, note that the asymptotic space-complexity is O(n) for the implementations based upon Pascal's triangle.

The recurrence-relation-based implementation may also be made more efficient by simply storing previous results. This can be achieved by storing results in a hash table which maps 2-tuples (n, r) onto answers $\binom{n}{r}$:

# let rec binomial =
    let memory = Hashtbl.create 1 in
    fun n -> fun r ->
      if r = 0 || r = n then 1 else
        try Hashtbl.find memory (n, r)
        with Not_found ->
          let ans = binomial (n - 1) r + binomial (n - 1) (r - 1) in
          Hashtbl.add memory (n, r) ans;
          ans;;
val binomial : int -> int -> int = <fun>

Storing and recalling previously computed results in this manner is known as memoizing (see section A.7). This memoized implementation of the binomial function only computes results as necessary (lazy evaluation). Consequently, the asymptotic time-complexity of computing $\binom{n}{c}$ for some constant c is now O(n) rather than O(n²), as for the list- and array-based implementations. However, the asymptotic space-complexity has increased from O(n) to O(n²) in the memoized implementation.
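The claim that only the necessary results are computed can be observed by counting the entries held in the hash table. The following is a sketch under stated assumptions: the table is moved to the top level (unlike the version above, where it is hidden inside the function) purely so that its size can be inspected, and Hashtbl.find_opt is used in place of the exception handler.

```ocaml
(* The table is exposed at top level so that its size can be inspected. *)
let memory : (int * int, int) Hashtbl.t = Hashtbl.create 16

let rec binomial n r =
  if r = 0 || r = n then 1 else
    match Hashtbl.find_opt memory (n, r) with
    | Some ans -> ans
    | None ->
        let ans = binomial (n - 1) r + binomial (n - 1) (r - 1) in
        Hashtbl.add memory (n, r) ans;
        ans

let () =
  let b = binomial 100 2 in
  Printf.printf "binomial 100 2 = %d using %d stored results\n"
    b (Hashtbl.length memory)
```

For a fixed second argument, the number of stored results grows in proportion to n, in line with the O(n) time-complexity stated above.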

In addition to arithmetic¹, many useful functions are related to data structures.

9.2 List related

In this section, we shall examine a variety of functions which act upon lists. These functions are often polymorphic and typically make use of either recursion and pattern matching or higher-order functions. These concepts can all be very useful but are rarely seen in current scientific programs.

9.2.1 count

The ability to count the number of elements in a list for which a predicate returns true is sometimes useful. A function to perform this task may be written by accumulating the count, folding a test over each element in turn. As addition is both associative and commutative and fold_right is not tail recursive, this task is best performed using the fold_left function:

# let list_count pred l =
    List.fold_left (fun count e -> count + if pred e then 1 else 0) 0 l;;
val list_count : ('a -> bool) -> 'a list -> int = <fun>

For example, the following counts the number of elements which are exactly divisible by three (0, 3, 6 and 9):

# list_count (fun x -> x mod 3 = 0) [0; 1; 2; 3; 4; 5; 6; 7; 8; 9];;
- : int = 4

As a polymorphic function, list_count may be applied to lists of any type. For example, this counts the number of lists in the given list which are greater than or equal to² the list [2; 3; 4] according to the built-in lexicographic ordering:

# list_count (( <= ) [2; 3; 4]) [[1; 2; 3]; [2; 3; 4]; [3; 4; 5]];;
- : int = 2

The actual lists counted are easily extracted by applying the List.filter function instead:

# List.filter (( <= ) [2; 3; 4]) [[1; 2; 3]; [2; 3; 4]; [3; 4; 5]];;
- : int list list = [[2; 3; 4]; [3; 4; 5]]

1 Pun intended.
2 As ( <= ) a b means a <= b.

The list_count function applies the given predicate function to all n elements in the given list. Consequently, the asymptotic complexity of this function in terms of the number of predicate tests performed n is Θ(n).
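The relationship between counting and filtering can be made explicit. The following sketch compares the fold-based list_count with a hypothetical alternative, list_count_filter, which builds the filtered list and then measures its length. Both perform n predicate tests, but the alternative also allocates an intermediate list.

```ocaml
let list_count pred l =
  List.fold_left (fun count e -> count + if pred e then 1 else 0) 0 l

(* Hypothetical alternative: filter first, then take the length.
   Simpler to read, but allocates an intermediate list. *)
let list_count_filter pred l = List.length (List.filter pred l)

let () =
  let l = [0; 1; 2; 3; 4; 5; 6; 7; 8; 9] in
  let divisible_by_three x = x mod 3 = 0 in
  Printf.printf "%d %d\n"
    (list_count divisible_by_three l)
    (list_count_filter divisible_by_three l)
```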

9.2.2 position

The ability to prepend elements to lists indefinitely makes them the ideal data structure for many operations where the length of the output depends upon the input. Let us examine a function which composes an arbitrary length list as the result.

A function like list_count but which returns a list of the indices of the matching elements can also be useful. This functionality can be obtained by folding with an accumulator containing both the current index i and the resulting index list is:

# let list_position pred l =
    let aux (i, is) e = i + 1, if pred e then i :: is else is in
    snd (List.fold_left aux (0, []) l);;
val list_position : ('a -> bool) -> 'a list -> int list = <fun>

As the fold returns a 2-tuple containing the list length and the list of indices, the snd function from the Pervasives module is used to extract the second element of the 2-tuple (the result). The auxiliary function aux prepends the current index i onto the result is if the predicate pred matches, and increments the current index i.

For example, the following extracts the list of indices of the elements which are exactly divisible by three:

# list_position (fun x -> x mod 3 = 0) [0; 1; 2; 3; 4; 5; 6; 7; 8; 9];;
- : int list = [9; 6; 3; 0]

Like list_count, the list_position function is useful for general purpose list dissection.
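Note that list_position returns the indices in descending order, because each matching index is prepended onto the accumulator. When ascending order is wanted, the result may simply be reversed. The name list_position_ascending below is hypothetical.

```ocaml
let list_position pred l =
  let aux (i, is) e = (i + 1, if pred e then i :: is else is) in
  snd (List.fold_left aux (0, []) l)

(* Hypothetical variant: reverse the accumulated indices so that they
   appear in the same order as the matching elements. *)
let list_position_ascending pred l = List.rev (list_position pred l)

let () =
  let l = [0; 1; 2; 3; 4; 5; 6; 7; 8; 9] in
  List.iter (Printf.printf "%d ")
    (list_position_ascending (fun x -> x mod 3 = 0) l);
  print_newline ()
```

The additional reversal traverses only the list of matches, so the number of predicate tests is unchanged.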

9.2.3 mapi

In addition to the conventional higher-order functions map and rev_map, analogous functions which present the integer index as well as the value of each element can be useful, i.e. to find $\{f(0, l_0), f(1, l_1), \ldots, f(n-1, l_{n-1})\}$. These functions are conventionally called mapi and rev_mapi, the former currently being provided for arrays (described in section 3.2) but neither being provided for lists. For example, the Array.mapi function may be used to convert an array of values into an array of index-value pairs:

# Array.mapi (fun i e -> i, e) [|'a'; 'c'; 'e'; 'g'; 'i'|];;
- : (int * char) array = [|(0, 'a'); (1, 'c'); (2, 'e'); (3, 'g'); (4, 'i')|]

The mapi function for lists could be written using pattern matching, with an auxiliary function to accumulate the current index:

# let list_mapi f l =
    let rec aux n = function
      | h::t -> let h = f n h in h :: aux (n + 1) t
      | [] -> [] in
    aux 0 l;;
val list_mapi : (int -> 'a -> 'b) -> 'a list -> 'b list = <fun>

This implementation of the list_mapi function uses a 2-argument nested auxiliary function aux which accepts the current index n and the remaining list. The aux function is initially called with the arguments 0 and the input list l. The aux function repeatedly decapitates the remaining list, applying the given function f to the current index n and the head h of the remaining list and prepending the resulting value f n h onto the list formed by recursing on the tail t, until no elements remain. Note that, by using the form let h = f n h in, this function can guarantee to apply the given function f in forwards order to each element in the given list, i.e. the first application of f is to the first element of the given list.

This list_mapi function provides the same functionality for lists as the Array.mapi function does for arrays. For example:

# list_mapi (fun i e -> i, e) ['a'; 'c'; 'e'; 'g'; 'i'];;
- : (int * char) list = [(0, 'a'); (1, 'c'); (2, 'e'); (3, 'g'); (4, 'i')]

However, the list_mapi function is not tail-recursive. A tail-recursive alternative may be written by composing the result in reverse.

A rev_mapi function for lists could be written using pattern matching, with an auxiliary function to accumulate the current index:

# let list_rev_mapi f l =
    let rec aux n accu = function
      | h::t -> aux (n + 1) (f n h :: accu) t
      | [] -> accu in
    aux 0 [] l;;
val list_rev_mapi : (int -> 'a -> 'b) -> 'a list -> 'b list = <fun>

This implementation of the list_rev_mapi function uses a 3-argument nested auxiliary function aux which accepts the current index n, the accumulated result accu and the remaining list. This auxiliary function repeatedly decapitates the remaining list, recursing with the index, accumulator and remainder as the incremented index n+1, the result f n h of applying the given function f to the current index and the current element prepended onto the accumulator accu and the tail of the remainder. The way in which the aux function recurses is important in several ways:

• By repeatedly decapitating the input and prepending the result onto the accumulator, the accumulator is built in reverse order.

• As the function application f n h appears in an argument to the recursive call, this application of the function f must be applied before recursing³ and, therefore, the first application of the given function f can again be guaranteed to be to the first element of the given list. Hence, there is no need to use a let h = f n h in construct to guarantee application order, as in the previous example.

• This implementation of the list_rev_mapi function is tail recursive because the result of the recursive call is not acted upon. Consequently, the list_rev_mapi function will be considerably faster than the list_mapi function when applied to large lists.

The list_rev_mapi function produces the reverse of the result of the list_mapi function:

# list_rev_mapi (fun i e -> i, e) ['a'; 'c'; 'e'; 'g'; 'i'];;
- : (int * char) list = [(4, 'i'); (3, 'g'); (2, 'e'); (1, 'c'); (0, 'a')]

Interestingly, an equivalent list_rev_mapi function may be rewritten in terms of a fold, by accumulating a 2-tuple containing the current index n and the resulting list l2:

# let list_rev_mapi f l =
    snd (List.fold_left (fun (n, l2) e -> n + 1, (f n e :: l2)) (0, []) l);;
val list_rev_mapi : (int -> 'a -> 'b) -> 'a list -> 'b list = <fun>

However, an equivalent list_mapi function cannot be written in terms of a fold without performing 2 traversals of the input list. Specifically, the functionality may be obtained either by reversing the result of list_rev_mapi or by using fold_right, in which case the accumulated index must count down from the length of the list which can only be obtained by explicitly counting the number of elements in the list using length:

# let list_mapi f l = List.rev (list_rev_mapi f l);;
val list_mapi : (int -> 'a -> 'b) -> 'a list -> 'b list = <fun>
# let list_mapi f l =
    let aux e (n, l) = (n - 1, (f (n - 1) e :: l)) in
    snd (List.fold_right aux l (List.length l, []));;
val list_mapi : (int -> 'a -> 'b) -> 'a list -> 'b list = <fun>

Thus, for small input lists, list_mapi is best written in terms of pattern matching.
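Readers using a more recent OCaml distribution than the one described here may find that a mapi function for lists is now provided as List.mapi in the standard library. Assuming such a distribution, the implementations developed in this section can be checked against it, as in the following sketch.

```ocaml
(* Pattern-matching implementation, as developed in this section. *)
let list_mapi f l =
  let rec aux n = function
    | h :: t -> let h = f n h in h :: aux (n + 1) t
    | [] -> [] in
  aux 0 l

(* Fold-based reverse implementation, as developed in this section. *)
let list_rev_mapi f l =
  snd (List.fold_left (fun (n, l2) e -> (n + 1, f n e :: l2)) (0, []) l)

let () =
  let l = ['a'; 'c'; 'e'; 'g'; 'i'] in
  let pair i e = (i, e) in
  (* Assumes List.mapi is available in the standard library in use. *)
  assert (list_mapi pair l = List.mapi pair l);
  assert (list_rev_mapi pair l = List.rev (List.mapi pair l))
```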

9.2.4 chop

Consider a function to chop a list into two lists at a given index i. This can be achieved by recursively chopping the tail at index i - 1 until i = 0 and the 2-tuple of the empty list (the front list) and the remaining tail (the back list) is returned. When completed, the recursive calls have the decapitated head prepended onto the front list fr:

# let rec list_chop i l = match i, l with
    | 0, l -> ([], l)
    | i, h::t -> (fun (fr, ba) -> h :: fr, ba) (list_chop (i - 1) t)
    | _ -> invalid_arg "list_chop";;
val list_chop : int -> 'a list -> 'a list * 'a list = <fun>

3 Formally, this can be attributed to the fact that OCaml is a strict language, meaning that function arguments are always evaluated before the function application takes place. In contrast, some other languages, known as lazy languages, only evaluate expressions when their result is required.

As this implementation of the chop function only traverses the list to the given index i, the algorithm is Θ(i). Also, this implementation is not tail recursive, as the λ-function acts upon the result of the recursive call. A tail-recursive alternative may be written using an auxiliary function which accumulates the front list in reverse order, applying rev to obtain the correct result. As a function which returns the front list in reverse order can be useful when defining other functions, we shall separate this into a list_rev_chop function:

# let list_rev_chop i l =
    let rec aux i fr ba = match i, fr, ba with
      | 0, fr, ba -> (fr, ba)
      | i, fr, h::t -> aux (i - 1) (h :: fr) t
      | _ -> invalid_arg "list_rev_chop" in
    aux i [] l;;
val list_rev_chop : int -> 'a list -> 'a list * 'a list = <fun>

A tail-recursive function equivalent to list_chop may then be written in terms of list_rev_chop:

# let list_chop_tr i 1 =(fun (fr, ba) -> List.rev fr, ba) (list_rev_chop i 1);;

val list_chop_tr : int -> 'a list -> 'a list * 'a list = <fun>

For example, chopping the first five elements off the list {0 ... 9} gives the lists {0 ... 4} and {5 ... 9}:

# list_chop_tr 5 [0; 1; 2; 3; 4; 5; 6; 7; 8; 9];;
- : int list * int list = ([0; 1; 2; 3; 4], [5; 6; 7; 8; 9])

The list_rev_chop function supplies the first list in reverse order:

# list_rev_chop 5 [0; 1; 2; 3; 4; 5; 6; 7; 8; 9];;
- : int list * int list = ([4; 3; 2; 1; 0], [5; 6; 7; 8; 9])

As a tail-recursive function, list_chop_tr runs in constant stack space and will be considerably faster than list_chop for large i.
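The practical difference shows up on long lists. The following is a minimal sketch which repeats the definitions above so that it stands alone; the range helper and the list length of one million are assumptions chosen purely for illustration:

```ocaml
(* Definitions repeated from the text so that this sketch is self-contained. *)
let list_rev_chop i l =
  let rec aux i fr ba = match i, fr, ba with
    | 0, fr, ba -> (fr, ba)
    | i, fr, h :: t -> aux (i - 1) (h :: fr) t
    | _ -> invalid_arg "list_rev_chop" in
  aux i [] l

let list_chop_tr i l =
  let fr, ba = list_rev_chop i l in
  List.rev fr, ba

(* Hypothetical helper: builds [0; 1; ...; n-1] tail recursively. *)
let range n =
  let rec aux acc k = if k < 0 then acc else aux (k :: acc) (k - 1) in
  aux [] (n - 1)

let () =
  let n = 1_000_000 in
  (* Chop a million-element list in half without deep recursion. *)
  let fr, ba = list_chop_tr (n / 2) (range n) in
  assert (List.length fr = n / 2);
  assert (List.hd ba = n / 2)
```

Depending on the OCaml version and the stack limit of the platform, the non-tail-recursive list_chop may exhaust the stack for sufficiently large i, whereas list_chop_tr cannot.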

As we shall see, these functions can be used in the creation of several, more sophisticated functions.

9.2.5 dice

Consider a function called list_dice which splits a list containing nm elements into n lists of m elements each. This function may be written in terms of the list_chop function developed in section 9.2.4.

# let rec list_dice m l =
    match list_chop m l with
    | (l, []) -> [l]
    | (l1, l2) -> l1 :: list_dice m l2;;
val list_dice : int -> 'a list -> 'a list list = <fun>

For example, the list_dice function may be used to dice the list {1 ... 9} into lists containing 3 elements each:

# list_dice 3 [1; 2; 3; 4; 5; 6; 7; 8; 9];;
- : int list list = [[1; 2; 3]; [4; 5; 6]; [7; 8; 9]]

This function could be used, for example, to convert a stream of numbers into 3D vectors represented by lists containing three elements.
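The conversion of a flat list of coordinates into 3D vectors can be sketched as follows. The definitions are repeated so that the example stands alone, and the vec3_of_list helper (which turns each three-element list into a 3-tuple) is a hypothetical addition for illustration:

```ocaml
let rec list_chop i l = match i, l with
  | 0, l -> ([], l)
  | i, h :: t -> let fr, ba = list_chop (i - 1) t in (h :: fr, ba)
  | _ -> invalid_arg "list_chop"

let rec list_dice m l = match list_chop m l with
  | l, [] -> [l]
  | l1, l2 -> l1 :: list_dice m l2

(* Hypothetical helper: view a 3-element list as a 3-tuple. *)
let vec3_of_list = function
  | [x; y; z] -> (x, y, z)
  | _ -> invalid_arg "vec3_of_list"

(* Six coordinates become two vectors. *)
let vectors = List.map vec3_of_list (list_dice 3 [1.; 0.; 0.; 0.; 2.; 0.])
```

Note that, as written, list_dice raises Invalid_argument when m > 0 and the length of the list is not a positive multiple of m, because the underlying list_chop runs out of elements.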

9.2.6 replace

The ability to replace the ith element of a list is sometimes useful. As the ith element of a list may be reached by traversing the previous i elements, this task can be done in Θ(i) time complexity. A function to perform this task may be written in terms of the list_rev_chop function (described in section 9.2.4) by replacing the head of the back list before prepending the reversed front list using the rev_append function:

# let list_replace x i l = match list_rev_chop i l with
    fr, ba -> List.rev_append fr (x :: List.tl ba);;
val list_replace : 'a -> int -> 'a list -> 'a list = <fun>

For example, the following replaces the 6th element of the given list⁴ with the number 100:

# list_replace 100 5 [0; 1; 2; 3; 4; 5; 6; 7; 8; 9];;
- : int list = [0; 1; 2; 3; 4; 100; 6; 7; 8; 9]

More sophisticated functions may also be written in terms of the chop function.

9.2.7 sub

Another function found in the Array module but not in the List module is the sub function. This function extracts a subset of consecutive elements, a sub-array in the context of arrays. A tail-recursive equivalent for lists may be written:

# let list_sub i j l =
    fst (list_chop_tr (j - i) (snd (list_rev_chop i l)));;
val list_sub : int -> int -> 'a list -> 'a list = <fun>

This implementation takes the back list (using snd) after chopping at i and then chops this list at j - i, giving the result as the front list (extracted using the fst function).

For example, the sublist with indices [3,7) of the list {0 ... 9} is the list {3 ... 6}:

# list_sub 3 7 [0; 1; 2; 3; 4; 5; 6; 7; 8; 9];;
- : int list = [3; 4; 5; 6]

⁴ Remember, indices conventionally start at zero in OCaml.

Just as Array.sub can be useful, so this list_sub function can come in handy in many different circumstances.

9.2.8 extract

A function similar to the list_replace function (described in section 9.2.6) but which extracts the ith element of a list, giving a 2-tuple containing the element and a list without that element, can also be useful. As for list_replace, the list_extract function may be written in terms of the list_rev_chop function:

# let list_extract i l = match list_rev_chop i l with
    | fr, h::t -> h, List.rev_append fr t
    | _ -> invalid_arg "list_extract";;
val list_extract : int -> 'a list -> 'a * 'a list = <fun>

For example, extracting the element with index five from the list {0 ... 9} gives the element 5 and the list {0 ... 4, 6 ... 9}:

# list_extract 5 [0; 1; 2; 3; 4; 5; 6; 7; 8; 9];;
- : int * int list = (5, [0; 1; 2; 3; 4; 6; 7; 8; 9])

This function has many uses, such as randomizing the order of elements in lists.

9.2.9 randomize

This function can be used to randomize the order of the elements in a list, by repeatedly extracting randomly chosen elements to build up a new list:

# let list_randomize l =
    let extract_rand l = list_extract (Random.int (List.length l)) l in
    let rec aux accu = function
      | [] -> accu
      | l -> (fun (h, t) -> aux (h :: accu) t) (extract_rand l) in
    aux [] l;;
val list_randomize : 'a list -> 'a list = <fun>

This implementation contains a nested function extract_rand which extracts a random element from the given list and an auxiliary function aux which repeatedly extracts random elements, prepending them onto an accumulator to build up a randomized list. The aux function uses a λ-function to prepend the extracted element onto the accumulator and recurse. Although the recursive call to aux is within this λ-function, the result is not acted upon and, therefore, this implementation of the list_randomize function is tail recursive.

For example, applying the list_randomize function to the list {0 ... 9} gives a random permutation of the elements 0 ... 9:

# list_randomize [0; 1; 2; 3; 4; 5; 6; 7; 8; 9];;
- : int list = [6; 9; 8; 5; 1; 0; 3; 2; 7; 4]

This function is useful in many situations. For example, the programs used to measure the performance of various algorithms presented in this book used this list_randomize function to evaluate the necessary tests in a random order, to reduce systematic effects of garbage collection.
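Because the output is random, a useful sanity check is that the result is always a permutation of the input, i.e. that sorting it recovers the sorted input. The following sketch repeats the definitions so that it stands alone; the call to Random.self_init and the choice of test list are assumptions made for illustration:

```ocaml
let list_rev_chop i l =
  let rec aux i fr ba = match i, fr, ba with
    | 0, fr, ba -> (fr, ba)
    | i, fr, h :: t -> aux (i - 1) (h :: fr) t
    | _ -> invalid_arg "list_rev_chop" in
  aux i [] l

let list_extract i l = match list_rev_chop i l with
  | fr, h :: t -> h, List.rev_append fr t
  | _ -> invalid_arg "list_extract"

let list_randomize l =
  let extract_rand l = list_extract (Random.int (List.length l)) l in
  let rec aux accu = function
    | [] -> accu
    | l -> let h, t = extract_rand l in aux (h :: accu) t in
  aux [] l

let () =
  Random.self_init ();
  let l = [0; 1; 2; 3; 4; 5; 6; 7; 8; 9] in
  (* Whatever order comes out, sorting must give back the input. *)
  assert (List.sort compare (list_randomize l) = l)
```

Each extraction traverses the list (once for List.length and once for the chop), so shuffling n elements in this way takes time quadratic in n. This is acceptable for short lists such as a set of benchmark cases.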

9.2.10 permute

The ability to compute all permutations of a list is sometimes useful. Permutations may be computed using a simple recurrence relation, by inserting the head of a list into all positions of the permutations of the tail of the list. Thus, a function to permute a list is most easily written in terms of a function which inserts the given element into the given n-element list at all n + 1 possible positions:

# let rec distribute e = function
    | (h::t) as l -> (e::l) :: (List.map (fun x -> h::x) (distribute e t))
    | [] -> [[e]];;
val distribute : 'a -> 'a list -> 'a list list = <fun>

This distribute function builds its result from two parts: the given list l with the element e prepended (e inserted at the first position), followed by the head h of the given list prepended onto each of the distributions of e over the tail t (e inserted at every later position).

For example, the following inserts the element 3 at each of the three possible positions in the list [1; 2]:

# distribute 3 [1; 2];;
- : int list list = [[3; 1; 2]; [1; 3; 2]; [1; 2; 3]]

A function to permute a given list may then be written:

# let rec permute = function
    | e :: rest -> List.flatten (List.map (distribute e) (permute rest))
    | [] -> [[]];;
val permute : 'a list -> 'a list list = <fun>

This permute function then operates by distributing the head of the given list over the permutations of the tail.

For example, there are 3! = 6 permutations of three values:

# permute [1; 2; 3];;
- : int list list =
[[1; 2; 3]; [2; 1; 3]; [2; 3; 1]; [1; 3; 2]; [3; 1; 2]; [3; 2; 1]]

The permute function has many uses, including combinatorial optimisation.
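As a quick check of the recurrence, the number of permutations of n distinct values should be n! and no permutation should appear twice. A minimal sketch, with the definitions repeated so that it stands alone and a hypothetical factorial helper:

```ocaml
let rec distribute e = function
  | (h :: t) as l -> (e :: l) :: List.map (fun x -> h :: x) (distribute e t)
  | [] -> [[e]]

let rec permute = function
  | e :: rest -> List.flatten (List.map (distribute e) (permute rest))
  | [] -> [[]]

(* Hypothetical helper used only for the check below. *)
let rec factorial n = if n <= 1 then 1 else n * factorial (n - 1)

let () =
  let ps = permute [1; 2; 3; 4] in
  (* 4! = 24 permutations in total ... *)
  assert (List.length ps = factorial 4);
  (* ... and all of them are distinct. *)
  assert (List.length (List.sort_uniq compare ps) = factorial 4)
```

As the size of the result grows as n!, this function is only practical for short lists.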


9.2.11 Run-length encoding


A transformation called run-length encoding, often used for data compression, converts a list into a list of 2-tuples (x_i, n_i), each representing n_i consecutive repeats of the element x_i. A function to perform this task using a given comparison function may be written:

# let rle_eq eq l =
    let rec aux l2 x n = function
      | [] -> List.rev ((x, n)::l2)
      | h::t when eq x h -> aux l2 x (n+1) t
      | h::t -> aux ((x, n)::l2) h 1 t in
    match l with [] -> [] | h::t -> aux [] h 1 t;;
val rle_eq : ('a -> 'a -> bool) -> 'a list -> ('a * int) list = <fun>

The body of this rle_eq function either maps the empty list onto the empty list or applies the nested auxiliary function aux with an empty accumulator, the head of the input list, one (signifying one repeat of the head) and the tail of the input list as the remainder. The aux function then repeatedly decapitates the remaining list, either incrementing the repeat counter if the new head is the same as the previous head, pushing the completed 2-tuple (x, n) onto the accumulator and starting a new count of one for h if the new head is different, or returning the reverse of the accumulator if there are no remaining elements, i.e. the remaining list is empty.

For example, the following run-length encodes an int list by comparing elements using the polymorphic equality operator =:

# rle_eq (=) [1; 1; 1; 2; 2; 3; 4; 5; 6; 6; 7; 7; 7];;
- : (int * int) list =
[(1, 3); (2, 2); (3, 1); (4, 1); (5, 1); (6, 2); (7, 3)]
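The encoding loses no information, which can be illustrated by writing the inverse transformation. In the following sketch the rle_decode function is a hypothetical addition (it does not appear in the text) and rle_eq is repeated so that the example stands alone:

```ocaml
let rle_eq eq l =
  let rec aux l2 x n = function
    | [] -> List.rev ((x, n) :: l2)
    | h :: t when eq x h -> aux l2 x (n + 1) t
    | h :: t -> aux ((x, n) :: l2) h 1 t in
  match l with [] -> [] | h :: t -> aux [] h 1 t

(* Hypothetical inverse: expand each (x, n) into n copies of x.
   The copies are accumulated in reverse and reversed once at the end. *)
let rle_decode l =
  let rec repeat acc x n =
    if n <= 0 then acc else repeat (x :: acc) x (n - 1) in
  List.rev (List.fold_left (fun acc (x, n) -> repeat acc x n) [] l)

let () =
  let l = [1; 1; 1; 2; 2; 3; 4; 5; 6; 6; 7; 7; 7] in
  (* Decoding the encoded list recovers the original. *)
  assert (rle_decode (rle_eq ( = ) l) = l)
```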

Clearly, many useful functions can be written in a functional style. However, functions over strings and, particularly, over arrays are often better suited to an imperative style.

9.3 String related

Programs are often required to produce human-readable output. Many string-related functions can be used to simplify the task of creating such output. In this section, we shall describe the conventional factoring of string-related functions for printing data structures and develop a few such functions.

In the remainder of this chapter, we shall use a fold right function for strings not supplied by the Core library:

# let string_fold_right f s x =
    let r = ref x in
    for i = String.length s - 1 downto 0 do
      r := f s.[i] !r
    done;
    !r;;

val string_fold_right : (char -> 'a -> 'a) -> string -> 'a -> 'a = <fun>


Printing and reading data structures as strings is often accomplished by factoring the conversion into separate functions for:

1. Converting the individual parts of the data structure to and from strings.

2. Converting a whole data structure to and from a string.

3. Printing or reading the string using the usual IO functions (which were described in chapter 5).

We shall now demonstrate the development of such functions.

The ability to print a list can often be useful. In the interests of consistency, the output may be productively written using OCaml syntax. In order to write a polymorphic function, capable of converting any list into a string, a function is required to convert an individual element. Thus, a string_of_list function is most usefully implemented as a higher-order function:

# let string_of_list string_of l =
    "[" ^ String.concat "; " (List.map string_of l) ^ "]";;
val string_of_list : ('a -> string) -> 'a list -> string = <fun>

An int list may then be converted into a string by supplying the string_of_int function to the string_of_list function:

# string_of_list string_of_int [1; 2; 3; 4; 5];;
- : string = "[1; 2; 3; 4; 5]"

Naturally, an equivalent string_of_array function is easily defined. We shall now examine a slightly more sophisticated example.
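One possible definition of the array equivalent, given here only as a sketch, converts the array to a list and uses the OCaml array delimiters:

```ocaml
(* A sketch of string_of_array, mirroring string_of_list but using the
   [| ... |] delimiters of OCaml array syntax. *)
let string_of_array string_of a =
  "[|" ^ String.concat "; " (List.map string_of (Array.to_list a)) ^ "|]"

let () = print_endline (string_of_array string_of_int [|1; 2; 3|])
```

Going through Array.to_list keeps the definition short at the cost of one intermediate list.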

9.3.2 DNA sequence IO

The following variant type may be used to represent the set of DNA nucleotides:

# type nucleotide = Adenine | Cytosine | Guanine | Thymine;;
type nucleotide = Adenine | Cytosine | Guanine | Thymine

A DNA sequence may then be represented by the type:

# type sequence = nucleotide list;;
type sequence = nucleotide list

In order to write a function capable of reading DNA sequences, we begin by writing a function capable of reading a single nucleotide:

# let nucleotide_of_char = function
    | 'A' -> Adenine | 'C' -> Cytosine | 'G' -> Guanine | 'T' -> Thymine
    | _ -> invalid_arg "nucleotide_of_char";;
val nucleotide_of_char : char -> nucleotide = <fun>

For example, the Guanine constructor is the representation of the nucleotide corresponding to the character G:

# nucleotide_of_char 'G';;
- : nucleotide = Guanine

This function may then be folded over a string to build up a list of nucleotides, converting a string into a DNA sequence:

# let sequence_of_string s =
    string_fold_right (fun c seq -> nucleotide_of_char c :: seq) s [];;
val sequence_of_string : string -> nucleotide list = <fun>

For example, the string GATTACA may be converted into a list of explicitly-named nucleotides:

# let gattaca = sequence_of_string "GATTACA";;
val gattaca : nucleotide list =
  [Guanine; Adenine; Thymine; Thymine; Adenine; Cytosine; Adenine]

Finally, a function to read a line of characters as a DNA sequence is easily written in terms of the input_line function in the Pervasives module:

# let input_sequence ch = sequence_of_string (input_line ch);;
val input_sequence : in_channel -> nucleotide list = <fun>

This function may then be used to read DNA sequences from a file or from standard input.

The converse operations, used to print a DNA sequence, are written in a similar fashion, beginning with a function to convert a single nucleotide into a string:

# let string_of_nucleotide = function
    | Adenine -> "A" | Cytosine -> "C" | Guanine -> "G" | Thymine -> "T";;
val string_of_nucleotide : nucleotide -> string = <fun>

The string_of_nucleotide function may then be used to write a function to convert a list of nucleotides into a string by simply concatenating the string representations of each nucleotide:

# let string_of_sequence s =
    String.concat "" (List.map string_of_nucleotide s);;
val string_of_sequence : nucleotide list -> string = <fun>

For example, the previously generated nucleotide list can be converted back into the string GATTACA:

# string_of_sequence gattaca;;
- : string = "GATTACA"

Strings generated by string_of_sequence may, of course, be printed using the simple function:

# let print_sequence seq = print_endline (string_of_sequence seq);;
val print_sequence : nucleotide list -> unit = <fun>

The input_sequence and print_sequence functions may then be used to perform IO on DNA sequence information in human-readable form. We shall now consider the slightly more difficult task of printing matrices.

9.3.3 Matrix IO

Consider the more complicated problem of printing and reading matrices, represented as values of the type float array array such as:

# let i3 = [|[|1.; 0.; 0.|];
             [|0.; 1.; 0.|];
             [|0.; 0.; 1.|]|];;
val i3 : float array array =
  [|[|1.; 0.; 0.|]; [|0.; 1.; 0.|]; [|0.; 0.; 1.|]|]

A simple function to print such matrices may be written:

# let string_of_matrix m =
    let row r =
      String.concat " " (List.map string_of_float (Array.to_list r)) in
    String.concat "\n" (List.map row (Array.to_list m));;
val string_of_matrix : float array array -> string = <fun>

When applied to i3, this implementation of the string_of_matrix function works perfectly:

# print_endline (string_of_matrix i3);;
1. 0. 0.
0. 1. 0.
0. 0. 1.
- : unit = ()

However, when given a matrix with elements whose string representations are of different widths, the results produced by this implementation of the string_of_matrix function are not always desirable. For example, a matrix containing the numbers 0.1234 and 0:

# let m = Array.map (fun r -> Array.map (( *. ) 0.1234) r) i3;;
val m : float array array =
  [|[|0.1234; 0.; 0.|]; [|0.; 0.1234; 0.|]; [|0.; 0.; 0.1234|]|]

In this case, the result is confusing because the columns are not aligned:

# print_endline (string_of_matrix m);;
0.1234 0. 0.
0. 0.1234 0.
0. 0. 0.1234
- : unit = ()

This can be remedied by padding the columns to the maximum width for each column. A function to pad a string to the given length may be written:

# let string_pad_left s n =
    let len = String.length s in
    if len >= n then s else String.make (n - len) ' ' ^ s;;
val string_pad_left : string -> int -> string = <fun>

For example, padding the string "0.1234" out to ten characters inserts four spaces:

# string_pad_left "0.1234" 10;;
- : string = "    0.1234"

This string_pad_left function can then be used to create a string representation of a matrix more carefully, by padding columns out to their maximum width:

# let string_of_matrix m =
    let m = Array.map (Array.map string_of_float) m in
    let width =
      let aux j w r = max w (String.length r.(j)) in
      Array.init (Array.length m.(0))
        (fun j -> Array.fold_left (aux j) 0 m) in
    let m =
      Array.map (Array.mapi (fun j x -> string_pad_left x width.(j))) m in
    let row r = String.concat " " (Array.to_list r) in
    String.concat "\n" (List.map row (Array.to_list m));;
val string_of_matrix : float array array -> string = <fun>

In this implementation of the string_of_matrix function, the nested width variable contains the maximum width of each column.

For example, this string_of_matrix function can be used to create much more readable representations of matrices:

# print_endline (string_of_matrix m);;
0.1234     0.     0.
    0. 0.1234     0.
    0.     0. 0.1234
- : unit = ()
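The column-width logic is easiest to check on a matrix which is not square, where row and column indices cannot be confused. The following is a self-contained sketch of the padded conversion in which the width of column j is obtained by folding over the rows; the test matrix is an arbitrary choice for illustration:

```ocaml
let string_pad_left s n =
  let len = String.length s in
  if len >= n then s else String.make (n - len) ' ' ^ s

let string_of_matrix m =
  let m = Array.map (Array.map string_of_float) m in
  (* width.(j) is the widest string found anywhere in column j. *)
  let width =
    Array.init (Array.length m.(0))
      (fun j ->
         Array.fold_left (fun w r -> max w (String.length r.(j))) 0 m) in
  let m =
    Array.map (Array.mapi (fun j x -> string_pad_left x width.(j))) m in
  let row r = String.concat " " (Array.to_list r) in
  String.concat "\n" (List.map row (Array.to_list m))

(* A 2 x 3 matrix whose columns have widths 4, 4 and 2. *)
let () =
  print_endline (string_of_matrix [|[|1.; 10.5; 0.|]; [|0.25; 2.; 3.|]|])
```

Like the version in the text, this sketch assumes a non-empty, rectangular matrix.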

Further enhancements to this function might include the ability to align the decimal point down each column (although this would not be very useful with scientific notation).


9.4 Array related

Many useful functions are provided by the List module which are not provided by the Array module, in particular the fold_left2, fold_right2 and map2 functions, which handle pairs of lists. As we saw in section 3.3, these functions are useful when implementing binary operators which act over pairs of vectors.

9.4.1 map2

The map2 function may be written in terms of the existing init function:

# let array_map2 f a b =
    let len = Array.length a in
    if len <> Array.length b then invalid_arg "array_map2";
    Array.init len (fun i -> f a.(i) b.(i));;
val array_map2 : ('a -> 'b -> 'c) -> 'a array -> 'b array -> 'c array = <fun>

For example, the array_map2 function may be used to implement vector addition over vectors represented by the type float array:

# let vec_add a b = array_map2 (fun a b -> a +. b) a b;;
val vec_add : float array -> float array -> float array = <fun>
# vec_add [|1.; 2.; 3.|] [|2.; 3.; 4.|];;
- : float array = [|3.; 5.; 7.|]

Thus the array_map2 function clearly has a use in scientific computing.

9.4.2 Double folds

Mimicking the existing Array.fold_left function, we can write:

# let array_fold_left2 f x a b =
    let len = Array.length a in
    if len <> Array.length b then invalid_arg "array_fold_left2";
    let r = ref x in
    for i = 0 to len - 1 do
      r := f !r a.(i) b.(i)
    done;
    !r;;
val array_fold_left2 :
  ('a -> 'b -> 'c -> 'a) -> 'a -> 'b array -> 'c array -> 'a = <fun>

A fold_right2 function may be written analogously to the array_fold_left2 function:

# let array_fold_right2 f a b x =
    let len = Array.length a in
    if len <> Array.length b then invalid_arg "array_fold_right2";
    let r = ref x in
    for i = len - 1 downto 0 do
      r := f a.(i) b.(i) !r
    done;
    !r;;
val array_fold_right2 :
  ('a -> 'b -> 'c -> 'c) -> 'a array -> 'b array -> 'c -> 'c = <fun>

As we have already seen in section 3.3, these fold functions have natural uses in vector algebra, such as computing the vector dot product.
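For instance, a dot product over vectors of the type float array can be sketched in terms of array_fold_left2. The name vec_dot is a hypothetical choice and the fold is repeated so that the example stands alone:

```ocaml
let array_fold_left2 f x a b =
  let len = Array.length a in
  if len <> Array.length b then invalid_arg "array_fold_left2";
  let r = ref x in
  for i = 0 to len - 1 do
    r := f !r a.(i) b.(i)
  done;
  !r

(* Accumulate the sum of the pairwise products, starting from zero. *)
let vec_dot a b = array_fold_left2 (fun d x y -> d +. x *. y) 0. a b

let () =
  (* 1*2 + 2*3 + 3*4 = 20 *)
  assert (vec_dot [|1.; 2.; 3.|] [|2.; 3.; 4.|] = 20.)
```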

9.4.3 rotate

The ability to rotate the elements of an array can sometimes be of use. This can be achieved by creating a new array, the elements of which are given by looking up the elements with rotated indices in the given array:

# let array_rotate i a =
    let n = Array.length a in
    let aux k =
      let k = (k + i) mod n in
      a.(if k < 0 then n + k else k) in
    Array.init n aux;;
val array_rotate : int -> 'a array -> 'a array = <fun>

This function creates an array with the elements of a rotated left by i. For example, rotating two places to the left:

# array_rotate 2 [|0; 1; 2; 3; 4; 5; 6; 7; 8; 9|];;
- : int array = [|2; 3; 4; 5; 6; 7; 8; 9; 0; 1|]

Rotating right can be achieved by specifying a negative value for i. For example, rotating right three places:

# array_rotate (-3) [|0; 1; 2; 3; 4; 5; 6; 7; 8; 9|];;
- : int array = [|7; 8; 9; 0; 1; 2; 3; 4; 5; 6|]

Considering this function alone, the performance can be improved significantly by rotating the array elements in-place, by swapping pairs of elements. This can be regarded as a deforesting optimisation (see section 7.3.3.2). However, the more elegant approach presented here can be refactored in the case of many subsequent rotations (and other, similar operations) such that no intermediate arrays need be created. In section 10.2, this optimisation is used to improve the asymptotic complexity of a commonly implemented global minimization algorithm.
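One way to rotate in-place by swapping pairs of elements is the three-reversal technique: reverse the first i elements, reverse the remainder and then reverse the whole array. The following sketch is an illustration of that technique only, not necessarily the approach the author has in mind; the names reverse_range and array_rotate_inplace are hypothetical:

```ocaml
(* Reverse the elements a.(lo) ... a.(hi) in place by swapping pairs. *)
let reverse_range a lo hi =
  let lo = ref lo and hi = ref hi in
  while !lo < !hi do
    let t = a.(!lo) in
    a.(!lo) <- a.(!hi);
    a.(!hi) <- t;
    incr lo;
    decr hi
  done

(* Rotate a left by i places in place; negative i rotates right. *)
let array_rotate_inplace i a =
  let n = Array.length a in
  if n > 0 then begin
    (* Normalise i into the range 0 ... n-1. *)
    let i = ((i mod n) + n) mod n in
    reverse_range a 0 (i - 1);
    reverse_range a i (n - 1);
    reverse_range a 0 (n - 1)
  end

let () =
  let a = [|0; 1; 2; 3; 4; 5; 6; 7; 8; 9|] in
  array_rotate_inplace 2 a;
  assert (a = [|2; 3; 4; 5; 6; 7; 8; 9; 0; 1|])
```

This allocates no new array, at the cost of destroying the original ordering.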


9.4.4 Matrix trace

A useful quantity in the context of matrices is the trace of a square matrix, defined as the sum of the diagonal elements.

# let trace a =
    let aux (i, tr) r = i + 1, tr +. r.(i) in
    snd (Array.fold_left aux (0, 0.) a);;
val trace : float array array -> float = <fun>

This function folds a nested auxiliary function aux over the rows of the matrix M, accumulating the current row index i and the trace tr. The aux function increments i and adds the M_ii element to the trace.

For example, the trace of the 3 × 3 identity matrix is simply 3:

# trace [|[|1.; 0.; 0.|]; [|0.; 1.; 0.|]; [|0.; 0.; 1.|]|];;
- : float = 3.

Clearly, functions written in an imperative style can be useful. We shall now consider the high-level factorisation of the functional and imperative functions we have just developed.

9.5 Higher-order functions

As we have already hinted, aggressively factoring higher-order functions can greatly reduce code size and sometimes even lead to a better understanding of the problem. In this section, we shall consider various different forms of higher-order functions which can be productively used to aid brevity and, therefore, clarity.

9.5.1 Data structures of functions

In section 7.3.3.2, we introduced the concept of deforesting computations by composing composite functions. This task can be aided by the development of data structures (e.g. a list) of functions.

The task of mapping some functions over a list may be productively generalised to the task of repeatedly mapping a list of functions over a given list. This can be implemented naively by the following function:

# let maps fs l = List.fold_left (fun l f -> List.map f l) l fs;;
val maps : ('a -> 'a) list -> 'a list -> 'a list = <fun>

For example, the following multiplies each element by three and then adds two:

# maps [( * ) 3; ( + ) 2] [1; 2; 3];;
- : int list = [5; 8; 11]


However, as discussed in section 7.3.3.2, the efficiency of this implementation of the maps function may be considerably improved by first composing the list of functions into a single function and then mapping the composite function over the input list. This functionality is most easily achieved by first writing a higher-order function to compose a given list of functions:

# let compose fs = fun x -> List.fold_left (fun x f -> f x) x fs;;
val compose : ('a -> 'a) list -> 'a -> 'a = <fun>

For example, the composite of the two functions used in the previous example is given by:

# let f = compose [( * ) 3; ( + ) 2];;
val f : int -> int = <fun>

This function may then be applied to an individual value, e.g. f(2) = 3 × 2 + 2 = 8:

# f 2;;
- : int = 8

A deforested version of the maps function may then be written:

# let maps fs l = List.map (compose fs) l;;
val maps : ('a -> 'a) list -> 'a list -> 'a list = <fun>

For a list of n functions to be applied to a list of m values, this implementation of the maps function produces the same result as the previous implementation but without producing the intermediate lists:

# maps [( * ) 3; ( + ) 2] [1; 2; 3];;
- : int list = [5; 8; 11]

However, this is only likely to be of benefit when n ~ m. Moreover, functions in the list must have the same type and, therefore, the type has been inferred to be 'a -> 'a for all 'a. Thus, functions representing a series of transformations between different types may not be passed as a list.
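Transformations between different types can still be composed pairwise, because each composition is then typed individually rather than through a homogeneous list. A minimal sketch, in which the infix operator >> and the describe function are hypothetical names introduced for illustration:

```ocaml
(* Hypothetical left-to-right composition: (f >> g) x = g (f x). *)
let ( >> ) f g x = g (f x)

(* int -> float -> float -> string: three different types in one chain. *)
let describe = float_of_int >> sqrt >> string_of_float

let () = print_endline (describe 16)
```

The result can be mapped over a list in the usual way, e.g. List.map describe [1; 4; 9], again without creating intermediate lists.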

9.5.2 Tuple related

Functions to perform operations such as map over tuples of a particular arity are also useful. For example, the following implements some useful functions over 2-tuples:

# let map_2 f (a, b) = (f a, f b)
  and list_of_2 (a, b) = [a; b]
  and array_of_2 (a, b) = [|a; b|];;
val map_2 : ('a -> 'b) -> 'a * 'a -> 'b * 'b = <fun>
val list_of_2 : 'a * 'a -> 'a list = <fun>
val array_of_2 : 'a * 'a -> 'a array = <fun>

For example, mapping a function f over a 2-tuple (a, b) results in the 2-tuple (f(a), f(b)):

# map_2 string_of_float (5.7, 9.3);;
- : string * string = ("5.7", "9.3")

Such functions can be used to reduce code size in many cases.


9.5.3 Generalised products

The vector dot product is a specialised form of inner product. The inner and outer products may, therefore, be productively written as higher-order functions which can then be used as a basis for more specialised products, such as the dot product.

The inner product is most easily written in terms of a given fold_left2 function:

# let inner fold_left2 base f l1 l2 g =
    fold_left2 (fun accu e1 e2 -> g accu (f e1 e2)) base l1 l2;;
val inner :
  (('a -> 'b -> 'c -> 'd) -> 'e -> 'f -> 'g -> 'h) ->
  'e -> ('b -> 'c -> 'i) -> 'f -> 'g -> ('a -> 'i -> 'd) -> 'h = <fun>

The vector dot product for vectors represented by values of the type float list may then be written in terms of this inner function:

# let dot a b = inner List.fold_left2 0. ( *. ) a b ( +. );;
val dot : float list -> float list -> float = <fun>

For example, (1, 2, 3) · (2, 3, 4) = 20:

# dot [1.; 2.; 3.] [2.; 3.; 4.];;
- : float = 20.

The outer product is not so easily generalised over data structures. Thus, we shall content ourselves with a tail-recursive implementation specific to lists:

# let outer f l1 l2 =
    let aux l e1 =
      List.fold_left (fun l e2 -> f e1 e2 :: l) [] l2 :: l in
    List.rev_map List.rev (List.fold_left aux [] l1);;
val outer : ('a -> 'b -> 'c) -> 'a list -> 'b list -> 'c list list = <fun>

For example:

(1, 2, 3) \otimes (2, 3, 4) = \begin{pmatrix} 2 & 3 & 4 \\ 4 & 6 & 8 \\ 6 & 9 & 12 \end{pmatrix}

# outer ( *. ) [1.; 2.; 3.] [2.; 3.; 4.];;
- : float list list = [[2.; 3.; 4.]; [4.; 6.; 8.]; [6.; 9.; 12.]]

Aggressive factoring of higher-order functions can clearly be useful in the context of numerical computation. In fact, the inner and outer functions may be further generalised to apply to tensors of different ranks. We shall leave this as an exercise for the interested reader!


9.5.4 Converting between container types


The elements in a container may be copied into a container of a different type by folding an insertion function over the input, using the fold function of the input container and the insertion function of the output container. A function to convert a container into a list using a given fold function (with the interface of a fold_right function) may, therefore, be written:

# let list_of fold c = fold (fun h t -> h::t) c [];;
val list_of :
  (('a -> 'a list -> 'a list) -> 'b -> 'c list -> 'd) -> 'b -> 'd = <fun>

This may be used to create a list_of_array function, equivalent to the existing to_list function in the Array module, by passing the Array.fold_right function to the higher-order list_of function:

# let list_of_array a = list_of Array.fold_right a;;
val list_of_array : 'a array -> 'a list = <fun>

The functionality provided by the higher-order list_of function may also be applied to considerably more sophisticated containers with ease. For example, the following implements a set of strings:

# module StringSet = Set.Make(String);;

A function to convert a StringSet back into a string list may be written in terms of list_of:

# let list_of_string_set = list_of StringSet.fold;;
val list_of_string_set : StringSet.t -> StringSet.elt list = <fun>

Analogously to the list_of function, a higher-order function to convert data structures of strings into a StringSet may be written by taking the fold_right function of the data structure as an argument:

# let string_set_of fold c =
    fold (fun e s -> StringSet.add e s) c StringSet.empty;;
val string_set_of :
  ((StringSet.elt -> StringSet.t -> StringSet.t) -> 'a -> StringSet.t -> 'b) ->
  'a -> 'b = <fun>

A list of strings may be converted into a StringSet by passing the List.fold_right function as an argument to string_set_of:

# let string_set_of_list = string_set_of List.fold_right;;
val string_set_of_list : StringSet.elt list -> StringSet.t = <fun>

For example, let us create a set called myset by inserting "tree", "plug", "bug" and then "slug":

# let myset = string_set_of_list ["slug"; "bug"; "plug"; "tree"];;
val myset : StringSet.t = <abstr>

However, as the fold_right function in the List module is not tail recursive, the resulting string_set_of_list function will be unnecessarily inefficient on input lists with many elements. This is most easily addressed by using a higher-order function rev_fold to convert between the argument-orders of left and right folds:

# let rev_fold fold f a b = fold (fun a b -> f b a) b a;;
val rev_fold :
  (('a -> 'b -> 'c) -> 'd -> 'e -> 'f) -> ('b -> 'a -> 'c) -> 'e -> 'd -> 'f
  = <fun>

When the order of the elements is unimportant, this rev_fold function may then be used to apply a left fold where a right fold was expected and vice-versa. In the context of filling a set, the order of insertion makes no difference. Thus, the string_set_of_list function may be written using the more efficient, tail-recursive List.fold_left function:

# let string_set_of_list = string_set_of (rev_fold List.fold_left);;
val string_set_of_list : StringSet.elt list -> StringSet.t = <fun>

This may be used to create a set called myset2 by inserting "slug", "bug", "plug" and then "tree":

# let myset2 = string_set_of_list ["slug"; "bug"; "plug"; "tree"];;
val myset2 : StringSet.t = <abstr>

As expected, the two different versions of the string_set_of_list function produced the same result (as a set is a sorted container) despite inserting the elements into the set in different orders:

# List.map list_of_string_set [myset; myset2];;
- : StringSet.elt list list =
[["bug"; "plug"; "slug"; "tree"]; ["bug"; "plug"; "slug"; "tree"]]

Factoring higher-order functions dealing with data structures can clearly be very productive, not only in terms of brevity but also because alterations required to change functionality become more localised. For example, in some circumstances, the ideal choice of data structure is not obvious and, therefore, the ability to pick 'n' mix different data structures can be useful. This can be achieved by providing consistent interfaces to data structures in terms of higher-order functions.
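The same building blocks extend to other containers. As a sketch, the standard Queue module provides only a left fold, so rev_fold can adapt it to the right-fold interface expected by list_of. The definitions are repeated so that the example stands alone, and the name rev_list_of_queue is hypothetical:

```ocaml
let list_of fold c = fold (fun h t -> h :: t) c []
let rev_fold fold f a b = fold (fun a b -> f b a) b a

(* Queue.fold visits elements from oldest to newest, so consing each one
   onto the accumulator yields the elements in reverse order of insertion. *)
let rev_list_of_queue q = list_of (rev_fold Queue.fold) q

let () =
  let q = Queue.create () in
  List.iter (fun x -> Queue.add x q) [1; 2; 3];
  assert (rev_list_of_queue q = [3; 2; 1])
```

Here the order of the elements does matter, which is why the result is named as a reversed list; applying List.rev recovers insertion order.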

We shall now examine the design and implementation of some practically useful programs.

Chapter 10

Complete Examples

In this chapter, we shall develop several complete programs used in scientific computing. In particular, we shall take examples from each of the most generic computational problems encountered in scientific computing. The programs presented in this chapter could be optimised to improve performance but we have chosen to illustrate the advantages of clear and succinct code. In particular, we use comments for the first time, to describe the purpose and specification for portions of code. Comments should be used to clarify all but the simplest of programs.

10.1 Maximum entropy method

In this section, we shall develop a program which makes use of two important concepts commonly required in scientific computing as well as an arguably under-appreciated third concept:

• Fourier transform - a transform which converts between temporal and spectral representations of signals, commonly occurring in the mathematical descriptions of natural systems and often used in analysis.

• Local function minimization - algorithms used to find a minimum of a given function in the region of given initial arguments.

• Maximum entropy method - a technique used to extend available data whilst introducing minimal new information.

Specifically, we shall develop a program to arbitrarily extend experimentally observed diffraction data in order to facilitate transformation into real space via the Fourier transform.

Experimental measurements of a function of interest are typically limited to measuring over a finite range. In many cases, an interesting or important property can be represented in terms of the function over an infinite range. Diffraction experiments are one example of this.

Figure 10.1: Experimentally measured static structure factor S(k) of amorphous silicon, measured over a finite range 0.424 < k < 23.001 in a neutron-diffraction experiment [13].

Figure 10.2: Reduced static structure factor F(k) = k(S(k) - 1), interpolated to F(0) = 0 for k < k_l and clamped to F(k) = 0 for k > k_u.

10.1.1 Formulation

In a diffraction experiment, the scattering of incident waves diffracted by a material is measured as a function of the wavelength of the incident waves (illustrated in figure 10.1). Themeasured function, known as the static structure factor S(k) where k is the wavelength, mayonly be measured over a finite range of wavelengths 1q ::; k ::; ku .

When diffracting neutrons, for example, the lower limit $k_l$ is determined by experimental errors due to slow moving neutrons and the upper limit $k_u$ is determined by the maximum momentum which can be imparted to a neutron. However, the real-space radial distribution function g(r), which conveys information about the atomic structure on length scales r around 1 Å, is related to S(k) by a Fourier sine transform over all $k \ge 0$ [14]:

$$g(r) = 1 + \frac{1}{4\pi\rho_0 r} \int_0^\infty (S(k) - 1)\, k \sin(kr)\, dk$$

In the remainder of this discussion we shall concentrate on the treatment of the subexpression F(k) = k(S(k) - 1), known as the reduced static structure factor.

The missing data for $0 \le k < k_l$ and $k > k_u$ may be treated in several different ways. The most naïve approach is to interpolate F(k) to F(0) = 0 for $k < k_l$ and truncate $k > k_u$ by setting F(k) = 0 in this range (illustrated in figure 10.2). The interpolation is typically very reasonable, thanks to the linearity of the function in this region. However, the truncation is a poor approximation to the true signal, which is expected to continue oscillating to much higher k. Despite the fact that this truncation introduces severe oscillations when Fourier transformed, this approach is commonly used in practice as more appropriate approaches are regarded as being too difficult to implement. One such approach is the Maximum Entropy Method (MEM), which we shall now describe and, amazingly, implement as a little OCaml program.

The MEM regards observed data as constants and extends these data by adding variables. The values of these variables are then determined by maximising a suitably chosen measure of entropy with respect to the variables. In practice, the Shannon entropy of the discrete Fourier power spectrum is often used as the measure of entropy.

Thus, the reduced static structure factor shown in figure 10.2 may be objectively extended to arbitrarily higher k using the maximum entropy method. As F(k) appears in a Fourier sine transform, we shall use the Shannon entropy of the sine transform.

10.1.2 Implementation

Our program is split into a lexer, a parser and the main program which performs the core computation.

10.1.2.1 Lexer

We begin by defining a lexer "mem_lexer.mll". This lexer is based upon the parser "mem_parser.mly" and tracks the current line number in order to provide helpful error messages for unexpected input:

{
  open Mem_parser
  let line = ref 1
}

The lexer must be able to handle signed floating point numbers or integers. Thus we define the necessary regular expressions:

let digit = ['0'-'9']
let mantissa = digit+ '.' digit* | digit* '.' digit+
let exponent = ['e' 'E'] ['+' '-'] digit+
let floating = ['+' '-']? mantissa exponent?
let integer = ['+' '-']? digit+

The lexer contains a single rule which ignores whitespace, counts new lines and lexes curly braces, commas and numbers (which are treated as floating-point numbers):

rule token = parse
  | [' ' '\t'] { token lexbuf }
  | '\n'       { incr line; token lexbuf }
  | '{'        { OPEN }
  | '}'        { CLOSE }
  | ','        { COMMA }
  | floating
  | integer    { FLOAT(float_of_string (Lexing.lexeme lexbuf)) }
  | eof        { EOF }
  | _          { failwith ("Mistake at line "^string_of_int !line) }

As usual, the tokens used in the lexer are defined in and used by the parser.

10.1.2.2 Parser

The parser "mem_parser.mly" simply reads a list of comma separated numbers enclosed in curly braces. The EOF, OPEN, CLOSE, COMMA and FLOAT tokens produced by the lexer are declared first:

%token EOF OPEN CLOSE COMMA
%token <float> FLOAT

The main rule of the parser simply returns a float list:

%start main
%type <float list> main

%%

The parser contains only two rules. The recursive tail rule interprets the remainder of the list, finishing with a close brace:

tail:
  | FLOAT COMMA tail { $1 :: $3 }
  | FLOAT CLOSE      { [$1] };

The main rule interprets a whole list, beginning with an open brace:

main:
  | OPEN tail EOF  { $2 }
  | OPEN CLOSE EOF { [] };

The result produced by this simple parser is then ready to be analyzed by the main program.


The program implementing the maximum entropy method uses the FFTW library (described in section 8.5) to perform the discrete Fourier transforms. The OCaml bindings to this library represent data as big arrays (described in section 8.3). Thus, we begin by opening the namespace of the Bigarray module in order to access its members without having to prefix them with Bigarray. each time:

open Bigarray

We shall use the square root of the machine epsilon to determine the accuracy required by the local minimisation algorithm:

let delta = sqrt epsilon_float

A map2 function over arrays will also be used:

let array_map2 f a b =
  let len = Array.length a in
  if len <> Array.length b then invalid_arg "array_map2";
  Array.init len (fun i -> f a.(i) b.(i))

The local function minimization algorithm, which will be applied to the cost function f(x), requires a function to compute $\nabla_x f$. This is most simply achieved by computing:

$$(\nabla_x f)_k = \frac{f(y^{(k)}) - f(x)}{\delta}$$

where:

$$y_i^{(k)} = \begin{cases} x_i & i \ne k \\ x_i + \delta & i = k \end{cases}$$

This may be implemented by the following numerical gradient function:

(* Numerical approximation to the gradient of "f" at "x". *)
let n_grad f x =
  let n = Array.length x in
  let f_x = f x in
  let f' = Array.make n 0. in
  for i = 0 to n - 1 do
    let old_x_i = x.(i) in
    x.(i) <- x.(i) +. delta;
    f'.(i) <- (f x -. f_x) /. delta;
    x.(i) <- old_x_i;
  done;
  f'


Note that the old value of $x_i$ is stored, rather than trying to recompute it using the expression $x_i + \delta - \delta$, which would be prone to numerical error.
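As a quick check of this forward-difference scheme, the following self-contained sketch applies it to a function whose gradient is known analytically. The test function g and the point at which it is evaluated are assumptions chosen purely for illustration, and n_grad is repeated here only so that the block runs on its own.

```ocaml
(* Step size: the square root of the machine epsilon, as in the text. *)
let delta = sqrt epsilon_float

(* Forward-difference approximation to the gradient of "f" at "x". *)
let n_grad f x =
  let n = Array.length x in
  let f_x = f x in
  let f' = Array.make n 0. in
  for i = 0 to n - 1 do
    let old_x_i = x.(i) in
    x.(i) <- old_x_i +. delta;
    f'.(i) <- (f x -. f_x) /. delta;
    x.(i) <- old_x_i  (* restore the stored value exactly *)
  done;
  f'

(* Hypothetical test function g(x) = x0^2 + 3 x1, with gradient (2 x0, 3). *)
let g x = x.(0) *. x.(0) +. 3. *. x.(1)

(* Numerical gradient at (1, 2); the analytic gradient there is (2, 3). *)
let grad = n_grad g [| 1.; 2. |]
```

The numerical result should agree with the analytic gradient only to roughly the square root of the machine precision, which is the usual trade-off of a one-sided difference.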

The local function minimization can be performed using the gradient descent algorithm. This algorithm repeatedly tries to step in the opposite direction to the gradient by an amount $\lambda$:

$$x_{n+1} = x_n - \lambda \nabla_x f$$

If $f(x_{n+1}) < f(x_n)$ then the step is accepted and the step size $\lambda$ is increased slightly. If $f(x_{n+1}) \ge f(x_n)$ then the step is not accepted and the step size $\lambda$ is greatly reduced. In particular, steps which do not alter f(x) give $f(x_{n+1}) = f(x_n)$ to within machine precision and are rejected. Consequently, when x is as close to a local minimum as possible, small proposed steps will not alter f(x) and $\lambda$ will be reduced rapidly. The algorithm may then terminate.

The following function implements this algorithm:

(* Gradient-descent local-minimisation algorithm. *)
let grad_descent f f' x =
  let rec aux lambda x f_x =
    if lambda < delta then x else
    (* Step against the gradient, as in the update rule above. *)
    let new_x = array_map2 (fun x d -> x -. lambda *. d) x (f' x) in
    let f_new_x = f new_x in
    if f_new_x >= f_x then aux (0.5 *. lambda) x f_x else
    aux (1.1 *. lambda) new_x f_new_x in
  aux delta x (f x)

Note that the value of $f(x_n)$ is passed as an argument to the nested auxiliary function aux, thus avoiding the need to recompute $f(x_n)$ each iteration.
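To see the step-size control at work, the scheme may be tried on a function with a known minimum. The following is a minimal sketch, assuming a simple quadratic bowl (a hypothetical test function, not part of the program) together with its analytic gradient. The helper definitions are repeated so that the block is self-contained.

```ocaml
let delta = sqrt epsilon_float

(* Element-wise combination of two equal-length arrays. *)
let array_map2 f a b = Array.init (Array.length a) (fun i -> f a.(i) b.(i))

(* Gradient descent: step against the gradient, grow the step size by 10%
   after a successful step and halve it after an unsuccessful one. *)
let grad_descent f f' x =
  let rec aux lambda x f_x =
    if lambda < delta then x else
    let new_x = array_map2 (fun x d -> x -. lambda *. d) x (f' x) in
    let f_new_x = f new_x in
    if f_new_x >= f_x then aux (0.5 *. lambda) x f_x
    else aux (1.1 *. lambda) new_x f_new_x in
  aux delta x (f x)

(* Hypothetical test function with its minimum at (1, -2). *)
let bowl x =
  let a = x.(0) -. 1. and b = x.(1) +. 2. in
  a *. a +. b *. b

(* Analytic gradient of the test function. *)
let bowl' x = [| 2. *. (x.(0) -. 1.); 2. *. (x.(1) +. 2.) |]

(* Minimise starting from the origin. *)
let minimum = grad_descent bowl bowl' [| 0.; 0. |]
```

The factors 1.1 and 0.5 are one choice for "increased slightly" and "greatly reduced"; other values would also serve, at the cost of more or fewer rejected steps.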

The gradient descent and numerical gradient functions may be combined to produce a higher-order n_grad_descent function which will minimize the given function f using numerical approximations to the gradient $\nabla_x f$:

(* Gradient descent using numerical gradient. *)
let n_grad_descent f = grad_descent f (n_grad f)

As described in section 8.5, a fourier function, to compute the discrete Fourier transform of an array using the FFTW library, may be written by converting to and from big array formats:

(* Fast Fourier Transform. *)
let (fourier, ifourier) =
  let to_big a =
    let big_a = Array1.create Fftw.complex c_layout (Array.length a) in
    Array.iteri (fun i z -> Array1.set big_a i z) a;
    big_a in
  let of_big big_a =
    Array.init (Array1.dim big_a) (fun i -> Array1.get big_a i) in
  let fft norm dir a =
    let (n, big_a) = (Array.length a, to_big a) in
    of_big (Fftw.create dir n big_a) in
  (fft false Fftw.forward, fft true Fftw.backward)


The discrete Fourier sine transform of an array¹ $x = \{0, x_1, \ldots, x_{n-1}\}$ may be computed using the fourier function by transforming to a double-length array with odd symmetry:

$$y = \{0, y_1, \ldots, y_{2n-1}\} = \{0, x_1, \ldots, x_{n-1}, 0, -x_{n-1}, \ldots, -x_1\}$$

The Fourier sine transform $\hat{x}$ can then be extracted as the first half of the discrete Fourier transform $\hat{y}$ of y:

$$\hat{x}_i = \hat{y}_i \quad \forall\, i \in \{0 \ldots n-1\}$$

Thus, the Fourier sine transform is implemented by the following function:

(* Fourier Sine Transform in terms of FFT. *)
let fist a =
  let n = Array.length a in
  let aux i = {Complex.re = a.(i); im = 0.} in
  let aux i =
    if i = 0 || i = n then Complex.zero else
    if i < n then aux i else
    Complex.neg (aux (2 * n - i)) in
  let b = Array.init (2 * n) aux in
  let b = fourier b in
  Array.init n (fun i -> b.(i).Complex.im)
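The odd-extension construction can be checked without FFTW. The following is a minimal sketch which substitutes a naive O(n^2) discrete Fourier transform for the fourier function and compares the result with a direct evaluation of the sine sum. The names dft, fist_naive and direct are hypothetical, and the forward transform is assumed to use the sign convention exp(-2 pi i j k / m).

```ocaml
let pi = 4. *. atan 1.

(* Naive forward DFT: y^_j = sum_k y_k exp(-2 pi i j k / m). *)
let dft y =
  let m = Array.length y in
  Array.init m (fun j ->
    let acc = ref Complex.zero in
    Array.iteri (fun k y_k ->
      let theta = -2. *. pi *. float_of_int (j * k) /. float_of_int m in
      acc := Complex.add !acc (Complex.mul y_k (Complex.polar 1. theta)))
      y;
    !acc)

(* Sine transform through the odd extension to length 2n. *)
let fist_naive a =
  let n = Array.length a in
  let re x = { Complex.re = x; im = 0. } in
  let odd i =
    if i = 0 || i = n then Complex.zero
    else if i < n then re a.(i)
    else Complex.neg (re a.(2 * n - i)) in
  let b = dft (Array.init (2 * n) odd) in
  Array.init n (fun i -> b.(i).Complex.im)

(* Direct evaluation of -2 sum_k x_k sin(pi j k / n) for comparison. *)
let direct a =
  let n = Array.length a in
  Array.init n (fun j ->
    let s = ref 0. in
    for k = 1 to n - 1 do
      s := !s +. a.(k) *. sin (pi *. float_of_int (j * k) /. float_of_int n)
    done;
    -2. *. !s)
```

Under this sign convention the imaginary part of the transform of the odd extension equals minus twice the sine sum, so fist_naive and direct should agree to within rounding error for any input whose first element is zero.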

The Shannon entropy H(x) is conventionally defined for a probability distribution x as:

$$H(x) = -\sum_i x_i \ln x_i$$

where x is assumed to be normalised such that:

$$\sum_i x_i = 1$$

In order to compute the Shannon entropy of an unnormalised distribution, such as $\hat{y}_i$, we must account for the normalisation, giving:

$$H(x) = \ln\left(\sum_i x_i\right) - \sum_i x_i \ln x_i$$

This is most easily computed by accumulating the two sums simultaneously:

(* Compute the Shannon entropy of the constants and variables. *)
let entropy consts vars =
  let a = fist (Array.append consts vars) in
  let aux (s, h) x =
    let x = abs_float x in
    (* Skip negligible terms: x log x tends to zero as x tends to zero. *)
    if x < delta then (s, h) else
    (s +. x, h +. x *. log x) in
  let s, h = Array.fold_left aux (0., 0.) a in
  log s -. h

¹ As we are analysing real-valued functions related by the Fourier sine transform, the first element (which represents zero frequency) is always zero.


The main body of the program begins by parsing the command-line arguments in order to obtain the desired length n to which the input data is to be extended:

let _ =
  (* Parse command-line arguments. *)
  let n =
    let iters = ref [] in
    Arg.parse [] (fun s -> iters := s :: !iters) "mem <n>";
    match !iters with
    | [n] -> int_of_string n
    | _ -> invalid_arg "Usage: mem <n>" in

The data themselves are then loaded using the parser and converted into array form:

  (* Load the experimental data as the constants. *)
  let consts =
    Mem_parser.main Mem_lexer.token (Lexing.from_channel stdin) in
  let consts = Array.of_list consts in

The number of input data provided is referred to as i and a check is performed to ensure that i < n:

  let i = Array.length consts in
  if n <= i then invalid_arg "n too small";

The new variables vars are initialised to zero before being determined by locally maximizing the Shannon entropy with respect to the variable values. As n_grad_descent minimises the function it is given, the entropy is negated:

  (* Locally maximise entropy by minimising its negative. *)
  let vars =
    let vars = Array.init (n - i) (fun _ -> 0.) in
    n_grad_descent (fun vars -> -. entropy consts vars) vars in

Finally, the resulting data are output in the same form as the input:

  (* Output extended data. *)
  let out = Array.to_list (Array.append consts vars) in
  let out = List.map string_of_float out in
  print_endline ("{"^(String.concat ", " out)^"}")

In the interests of efficiency, the lexer, parser and main program should be compiled into native code before being executed.


Figure 10.3: Reduced static structure factor F(k) extended to $k \approx 50$.

This program, implementing the maximum entropy method approach to the extension of experimentally observed diffraction data, may be compiled using:

$ ocamllex mem_lexer.mll
17 states, 375 transitions, table size 1602 bytes
$ ocamlyacc mem_parser.mly
$ ocamlopt -c mem_parser.mli
$ ocamlopt -I +fftw -cclib -lfftw_stub bigarray.cmxa fftw.cmxa mem_parser.ml mem_lexer.ml mem.ml -o mem

The resulting executable mem may then be used to extend diffraction data.

10.1.2.5 Results

The mem program may be used to extend the experimentally observed data shown in figure 10.2. The number of samples can be extended² from 910 to 2,048 in under 4 hours, the result of which is illustrated in figure 10.3.

The ability of the maximum entropy method to extend such data is almost magical.

10.1.2.6 Optimisation

As this program spends most of its time computing FFTs, it is ideally written in a language such as OCaml where FFTs are easily computed and the important, but not performance-critical, remainder of the program may be written clearly and succinctly. However, there is still room for optimisation.

In order to optimise this program, we must first consider improvements to the asymptotic complexity. This is tricky because the complexity of the gradient descent algorithm is unknown. However, two potential improvements spring to mind:

• Use previous $x_n$, $f(x_n)$ and $[\nabla f](x_n)$. For example, by using another approach to local function minimization such as the conjugate gradient algorithm.

• Use the correlation between the values of nearby $x_i$. For example, by partially solving the problem for a subset of the variables and reusing the partial solution to create progressively more complete solutions. This could be done by interpolating the values of missing variables.

² We chose to extend to an integral power of two, $2^{11} = 2048$ samples, because the FFT is most efficient when applied to lengths which are products of small primes.

Having introduced a local function-minimization algorithm (gradient descent) in this section, we shall now examine the topic of global function minimization.

10.2 Global minimization

Finding a deeper, more global minimum of an arbitrary function is a significantly more challenging problem than local function minimization, and is also extremely important in the context of scientific computing. Several different global function minimization algorithms exist, many of which make repeated use of local function-minimization algorithms.

Simulated annealing is one such algorithm. This is a Monte-Carlo approach³ which considers a randomly altered set of parameter values $x \in \Omega$ and accepts or probabilistically rejects the proposed change based upon the increase $E \in \mathbb{R}$ in f(x), $f : \Omega \to \mathbb{R}$. If the change does not increase f(x) ($\Rightarrow E \le 0$) then the change is always accepted. If the change increases f(x) ($\Rightarrow E > 0$) then the change is randomly accepted with a probability $P(E) = e^{-\beta E}$ for some $\beta \in \mathbb{R}$. This probabilistic process is repeated many times with progressively larger values for $\beta$.

As the value of $\beta$ is analogous to $(k_B T)^{-1}$ in thermodynamics, $\beta$ may be considered to be the inverse of a fictitious temperature. Increasing the value of $\beta$ as the simulation progresses therefore corresponds to cooling the system, hence the name simulated annealing. As the fictitious temperature falls, proposed changes which increase the energy of the system are progressively less likely to be taken, and the system tends to fall into local minima.
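The acceptance rule may be sketched in a few lines. In the following illustration the names accept and metropolis are hypothetical, and the uniform random number is passed to accept explicitly so that the rule itself can be examined deterministically.

```ocaml
(* Decide whether to accept a change which increases f by "e" at inverse
   temperature "beta", given a uniform random number "u" between 0 and 1. *)
let accept beta e u = e <= 0. || u < exp (-. beta *. e)

(* The same rule, drawing the random number from the standard library. *)
let metropolis beta e = accept beta e (Random.float 1.)
```

For a fixed increase e > 0, raising beta lowers the threshold exp(-beta e), so that uphill moves which would have been accepted early in the simulation are rejected later on.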

Unlike our local function minimization example, we shall use a discrete problem to demonstrate global function minimization. Both discrete and continuous minimization problems are commonplace in science. In particular, the task of annealing a real system of atoms may be considered either continuously (in terms of the vector coordinates of the atoms $r \in (\mathbb{R}^3)^n$ [6]) or discretely (in terms of the nearest neighbour topology $\{N_1 \ldots N_n\}$ [15]). Discrete, global minimization problems also appear outside science, for example in complicated routing problems such as those arising in integrated circuits and printed circuit boards in electronics.

We shall address the d-dimensional travelling salesman problem, defined by a list of vertex coordinates $r \in (\mathbb{R}^d)^n$. The task is to find the route which traverses each vertex in the graph exactly once and has the shortest length. For a path P, defined as a list of vertex indices $P \in \Pi^n$ where $\Pi = \{1 \ldots n\}$ for some n > 1, the path length l(P) is:

$$l(P) = \sum_{i=1}^{n-1} \left| r_{P_i} - r_{P_{i+1}} \right|$$
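The path length translates directly into code. The following is a minimal sketch, assuming that vertex coordinates are stored as an array of float lists and that vertex indices count from zero rather than one. The names dist and square are hypothetical.

```ocaml
(* Euclidean distance between two points given as coordinate lists. *)
let dist a b =
  sqrt (List.fold_left2 (fun d x y -> d +. (x -. y) *. (x -. y)) 0. a b)

(* l(P): the sum of the n-1 edge lengths along the (open) path. *)
let path_length coords path =
  let len = ref 0. in
  for i = 0 to Array.length path - 2 do
    len := !len +. dist coords.(path.(i)) coords.(path.(i + 1))
  done;
  !len

(* Hypothetical data: the corners of the unit square. *)
let square = [| [0.; 0.]; [1.; 0.]; [1.; 1.]; [0.; 1.] |]

(* Walking around the perimeter traverses three unit edges. *)
let perimeter_walk = path_length square [| 0; 1; 2; 3 |]

(* Crossing both diagonals gives a longer path. *)
let crossed_walk = path_length square [| 0; 2; 1; 3 |]
```

Note that the path is open: the edge from the last vertex back to the first is not counted.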

In theory, this problem may be solved exactly by considering all permutations of $P \in \Pi^n$ and finding the shortest length permutation. However, the number of permutations is n! which, even for reasonably small problems, is too vast to compute explicitly. Simulated annealing provides a practical solution to this problem by relinquishing exactness in favour of an approximate solution.

³ Known as randomised algorithms in computer science.

Figure 10.4: Reducing the length of a path by swapping the order of a pair (i, i + 1) of adjacent vertices in the path P, showing: a) removed edges (red), and b) inserted edges (blue).

10.2.1 The mutate function

The practical solution to this problem, using simulated annealing, requires an auxiliary function to mutate a given path. The capabilities of this function are very important. Indeed, improvements to the mutate function are often more productive than any of the possible whole-program optimisations, even the more important low-level optimisations such as deforesting. In fact, improvements to the mutate function are likely to fall into the category of algorithmic optimisations but this is difficult to prove rigorously as the complexities of the travelling salesman problem are difficult to quantify in sufficient detail.

Understanding the ways in which the mutate function can be improved requires a deeper knowledge of the mechanism by which simulated annealing finds possible solutions. Although we shall concentrate on solving the travelling salesman problem, the points made are equally applicable to other applications of simulated annealing, including continuous optimisation problems.

Essentially, simulated annealing improves upon the naïve approach of trying randomly selected paths P by evolving the path gradually. In order to evolve the path gradually, simulated annealing must use a function which, when mutating a path, makes only small alterations to the path. However, the meaning of the phrase "small alterations to the path" is not obvious. Intuitively, this might mean introducing mutations which leave most of the $P_i$ unaltered. For example, by altering only two vertex indices i and j in P at a time, by swapping $P_i$ with $P_j$ (illustrated in figure 10.4). In theory, all possible paths, i.e. permutations, can be reached by repeatedly swapping pairs of elements in P.

In fact, this intuitive picture is horribly misleading. The phrase "small alterations to the path" actually relates to exchanging small numbers of edges, i.e. altering m edges at a time where m(n) is O(1). Thus, swapping pairs of adjacent vertices is only one of several ways to exchange edges in the path. Several other forms of mutation are also possible, all of which may be written in terms of O(n) adjacent-vertex swaps, i.e. these mutations are asymptotically faster:

• Rotation:

$$\{P_1, \ldots, P_n\} \to \{P_{1+i}, \ldots, P_n, P_1, \ldots, P_i\}$$

• Reversal:

$$\{P_1, \ldots, P_i, \ldots, P_j, \ldots, P_n\} \to \{P_1, \ldots, P_{i-1}, P_j, \ldots, P_i, P_{j+1}, \ldots, P_n\}$$

• Splice:

$$\{P_1, \ldots, P_i, \ldots, P_j, \ldots, P_k, P_{k+1}, \ldots, P_n\} \to \{P_1, \ldots, P_{i-1}, P_{j+1}, \ldots, P_k, P_i, \ldots, P_j, P_{k+1}, \ldots, P_n\}$$

Figure 10.5: Reducing the length of a path by rotating the path P by one vertex, showing: a) 1 removed edge (red), and b) 1 inserted edge (blue).

Figure 10.6: Reducing the length of a path by reversing the order of indices between an arbitrary pair (i, j) in the path P, showing: a) 2 removed edges (red), and b) 2 inserted edges (blue).

Figure 10.7: Reducing the length of a path by moving a vertex from i to j in the path P, showing: a) 3 removed edges (red), and b) 3 inserted edges (blue).
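The three mutations can be written out explicitly on arrays to make the index manipulations concrete. The following is a minimal sketch which uses zero-based positions and builds a new array each time, so each function is O(n). The argument conventions are assumptions made for this illustration only.

```ocaml
(* Rotate the path left by i places. *)
let rotate i p =
  let n = Array.length p in
  Array.init n (fun k -> p.((k + i) mod n))

(* Reverse the elements at positions i..j inclusive. *)
let reverse i j p =
  Array.mapi (fun k p_k -> if i <= k && k <= j then p.(j + i - k) else p_k) p

(* Move the block at positions i..j so that it follows position k (j < k). *)
let splice i j k p =
  Array.concat
    [ Array.sub p 0 i;                              (* before the block *)
      Array.sub p (j + 1) (k - j);                  (* elements j+1..k *)
      Array.sub p i (j - i + 1);                    (* the moved block *)
      Array.sub p (k + 1) (Array.length p - k - 1)  (* the remainder *) ]

(* A small path on which to try the mutations. *)
let p = [| 1; 2; 3; 4; 5; 6 |]
```

Each mutation changes only a handful of edges in the path even though it may move many vertex indices.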

We shall now define a program capable of approximating the solution to the travelling salesman problem for arbitrary n and d using the rotation, reverse and splice mutations.

10.2.2 Efficiency

We shall use an unconventional implementation of this algorithm which is asymptotically faster than the conventional, array-based implementation presented in most monographs [12]. Thus, before describing our implementation we shall review the conventional approach.

The rotate, reverse and splice functions are conventionally implemented by altering the array of indices used to represent the path. For randomly chosen mutations, the number of altered vertex indices is O(n) where n is the number of vertices on the path. As these O(n) operations form the bottleneck of the whole algorithm they are conventionally optimised by:

• Deforesting - rather than creating new arrays, the old arrays are altered in-place by swapping elements.

• Premature termination - the cost of a proposed mutation is calculated in O(1) time in terms of the lengths of added and removed edges. Only accepted mutations are then performed explicitly.

However, these low-level optimisations do not improve the asymptotic complexity.

In theory, the rotate, reverse and splice functions only alter O(1) edges in the path. Thus, a better implementation should be able to improve upon the O(n) complexity of the conventional approach. We shall use the simplest improvement of representing the path implicitly, as an array of vertex indices and a composite indirection function. This approach begins by storing the array explicitly and indirecting through the identity function. A mutation is represented by:

• The change in path length.

• An indirection function $f : \Pi \to \Pi$ ($\Pi = \{0 \ldots n-1\}$) which reorders the indices as necessary.

If a mutation is accepted then the path is altered implicitly by composing the current indirection function g with f to give $[g \circ f](i) = g(f(i)) : \Pi \to \Pi$.

When k indirections have been accumulated, the cost of vertex lookups in the path becomes O(k) rather than O(1). Thus, the number of indirections must be controlled in order to obtain good performance.

Indirections can be removed by explicitly generating a new array $P'_i = P_{g(i)}$ and replacing the current indirection function g with the identity function. We shall refer to this as flattening. As this requires copying each vertex index, flattening is O(nk). If the algorithm flattens when $k \ge x$ for some unknown $x \in \mathbb{R}$, the average complexity of flattening is O(nk/x). The average complexity of the O(x) indirections between flattens is O(xk). Thus, the asymptotic complexity of this algorithm is optimal when:

$$\frac{nk}{x} = xk \implies x = \sqrt{n}$$

Therefore, we shall flatten whenever $k \ge \sqrt{n}$, giving the mutate function $O(\sqrt{n})$ asymptotic complexity.
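The implicit representation may be sketched independently of the travelling salesman problem. In the following illustration the record type and the names make, get, flatten and apply are hypothetical. The sketch stores the explicit array alongside the accumulated indirection and flattens once the square of the number of indirections reaches n.

```ocaml
(* A path stored explicitly, plus a composite indirection function. *)
type path = {
  verts : int array;               (* explicit vertex indices *)
  mutable indirect : int -> int;   (* accumulated indirection *)
  mutable count : int;             (* number of accumulated indirections *)
}

let make verts = { verts; indirect = (fun i -> i); count = 0 }

(* Look up the i-th vertex through the accumulated indirections. *)
let get p i = p.verts.(p.indirect i)

(* Overwrite the explicit array and reset the indirection to the identity. *)
let flatten p =
  let n = Array.length p.verts in
  Array.blit (Array.init n (get p)) 0 p.verts 0 n;
  p.indirect <- (fun i -> i);
  p.count <- 0

(* Apply a mutation "f" implicitly, flattening once count^2 >= n. *)
let apply p f =
  let g = p.indirect in
  p.indirect <- (fun i -> g (f i));
  p.count <- p.count + 1;
  if p.count * p.count >= Array.length p.verts then flatten p
```

Between flattens, each call to apply costs O(1) regardless of how many vertex indices the mutation moves. The price is paid in get, which must traverse every accumulated indirection.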


This implementation is split into a lexer, parser and main program.

10.2.3.1 Lexer

We begin by defining a lexer and parser to load a description of the problem. The lexer, described by the file "salesman_lexer.mll", understands floating-point numbers and begins by opening the namespace of the parser and initialising the current line number to one:

{
  open Salesman_parser
  let line = ref 1
}

Regular expressions matching integer and floating point types are then defined before the lexing rule:

let digit = ['0'-'9']
let exponent = ['e' 'E'] ['+' '-'] digit+
let floating = (digit+ '.' digit* | digit* '.' digit+) exponent?

The lexer contains a single lexing rule which ignores whitespace, emits CR tokens for new lines, FLOAT tokens for floating-point numbers (in either usual or integer notation) and an EOF token at the end of the input:

rule token = parse
  | [' ' '\t']  { token lexbuf }
  | '\n'        { incr line; CR }
  | floating
  | ['0'-'9']+  { FLOAT(float_of_string (Lexing.lexeme lexbuf)) }
  | eof         { EOF }
  | _           { failwith ("Mistake at line "^string_of_int !line) }

As usual, the tokens used in the lexer are defined in and used by the parser.

10.2.3.2 Parser


The parser, described by the file "salesman_parser.mly", comprehends a list of vectors as lines of space-separated floating-point numbers. The CR, FLOAT and EOF tokens are declared first:

%token CR EOF
%token <float> FLOAT

This is followed by the definition of the entry point main and its expected type:

%start main
%type <float list list> main

%%

The parser uses two rules to parse input. The list rule reads a list of whitespace-separated floating-point numbers ending with a new line:

list:
  | FLOAT list { $1 :: $2 }
  | CR         { [] };

The main rule reads a list of these lists, ending with EOF:

main:
  | list main { $1 :: $2 }
  | EOF       { [] };

The result produced by this parser is then ready to be analyzed by the main program.



The main program, in the file "salesman.ml", begins with the definitions of infix operators⁴ to perform vector arithmetic:

let ( +| ) = List.map2 ( +. )
let ( -| ) = List.map2 ( -. )
let dot = List.fold_left2 (fun d a b -> d +. a *. b) 0.
let length a = sqrt (dot a a)

A helper function is then defined:

(* (i in {0 .. n-1}, j in {0 .. n-2}) -> k <> i in {0 .. n-1} *)
let including i j = if j >= i then j + 1 else j

The including function is intended to act upon values $i \in \{0 \ldots n-1\}$ and $j \in \{0 \ldots n-2\}$ to produce a value $k \ne i$, $k \in \{0 \ldots n-1\}$.

Two helper functions for dealing with random values of type int are then defined:

(* 0 -> 0 | n -> k in {0 .. n-1} *)
let rand = function 0 -> 0 | n -> Random.int n

(* (i, n) -> k <> i in {0 .. n-1} *)
let rand_except i n = including i (rand (n - 1))

The rand function is a thin wrapper around the Random.int function which returns 0 when given 0 and otherwise, given n, returns a value of type int in the range {0 ... n - 1}. The rand_except function produces a random number $k \ne i$, $k \in \{0 \ldots n-1\}$.
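The claim that including never produces i can be checked exhaustively for small n. The following sketch repeats the three definitions and adds a hypothetical check function which verifies that, for every i, the values j = 0 ... n-2 are mapped onto exactly the n-1 indices other than i.

```ocaml
let including i j = if j >= i then j + 1 else j
let rand = function 0 -> 0 | n -> Random.int n
let rand_except i n = including i (rand (n - 1))

(* For every i in {0 .. n-1}, the image of {0 .. n-2} under "including i"
   should be {0 .. n-1} with i removed, listed in increasing order. *)
let check n =
  let ok = ref true in
  for i = 0 to n - 1 do
    let image = List.init (n - 1) (including i) in
    let expected = List.filter (fun k -> k <> i) (List.init n (fun k -> k)) in
    if image <> expected then ok := false
  done;
  !ok
```

As the mapping is a bijection onto the indices other than i, a uniformly distributed j yields a uniformly distributed k, which is what rand_except relies upon.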

The main body of the program is then defined in the usual construct:

let _ =

The command-line arguments are then parsed to extract the required number of iterations:

  (* Extract the number of iterations as a command-line argument. *)
  let iters =
    let iters = ref [] in
    Arg.parse [] (fun s -> iters := s :: !iters) "salesman <iters>";
    match !iters with
    | [i] -> int_of_string i
    | _ -> invalid_arg "Usage: salesman <iters>" in

The input list of vertex coordinates is then parsed from standard input:

  (* Parse the vertex coordinates from stdin. *)
  let vert_coords =
    let lexbuf = Lexing.from_channel stdin in
    try Salesman_parser.main Salesman_lexer.token lexbuf

⁴ Infix operator definitions are discussed in section A.3 of the appendices.

Parsing errors raise the Parsing.Parse_error exception which is caught and handled by naming the line number on which the error was noticed:

    with Parsing.Parse_error ->
      let line = string_of_int (!Salesman_lexer.line) in
      failwith ("Syntax error at line "^line);
      exit 1 in

The list of vertex coordinates vert_coords is then converted from a float list list into a float list array:

  let vert_coords = Array.of_list vert_coords in

The number of vertices is denoted n:

let n = Array.length vert_coords in

The path is initialised to {0 ... n - 1} which, for randomly-ordered input, is a random path:

  let path = Array.init n (fun i -> i) in

The number of indirections through rotate, reverse or splice is zero and the accumulated indirection function is the identity function:

  let indirects, indirect = ref 0, ref (fun i -> i) in

A get function returns the indirected i-th vertex on the path:

  let get i = path.(!indirect i) in

An edge function calculates the distance between a given pair of vertices on the path, returning zero if either vertex is invalid:

  (* Calculate separation of vertices i and j. *)
  let edge i j =
    if i >= 0 && i < n && j >= 0 && j < n then
      length (vert_coords.(get i) -| vert_coords.(get j))
    else 0. in

A rotate_cost function calculates the change in path length due to a proposed rotation:

  let rotate_cost i =
    (edge (n - 1) 0) -. (edge (i - 1) i) in

The rotate function indirects array lookups to k using modulo arithmetic:

  (* Rotate by i in {1 .. n-1}. *)
  let rotate i k = (k + i) mod n in


A reverse_cost function calculates the change in path length due to a proposed reversal:

  let reverse_cost i j =
    let edge i j =
      if i >= 0 && i < n && j >= 0 && j < n then edge i j else 0. in
    edge (i - 1) j +. edge i (j + 1) -. edge (i - 1) i -. edge j (j + 1) in

The reverse function indirects array lookups to k, reversing the order of vertices between i and j inclusive:

  (* Reverse indices [i .. j]. *)
  let reverse i j k =
    if i <= k && k <= j then j + i - k else k in
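The O(1) cost calculation can be verified against a full recomputation of the path length. The following is a minimal, self-contained sketch, assuming a small set of hypothetical two-dimensional points and an explicit array path in place of the indirection function. The names pts, dist and path_len are not part of the program.

```ocaml
(* Hypothetical vertex coordinates in two dimensions. *)
let pts = [| (0., 0.); (3., 1.); (1., 4.); (5., 5.); (2., 2.); (6., 0.) |]
let n = Array.length pts

let dist (x1, y1) (x2, y2) = hypot (x1 -. x2) (y1 -. y2)

(* Length of the edge between positions i and j, or zero if out of range. *)
let edge p i j =
  if i >= 0 && i < n && j >= 0 && j < n then dist pts.(p.(i)) pts.(p.(j))
  else 0.

(* Full O(n) path length. *)
let path_len p =
  let len = ref 0. in
  for i = 0 to n - 2 do len := !len +. edge p i (i + 1) done;
  !len

(* O(1) change in length caused by reversing positions i..j. *)
let reverse_cost p i j =
  edge p (i - 1) j +. edge p i (j + 1) -. edge p (i - 1) i -. edge p j (j + 1)

(* Explicit reversal of positions i..j, for comparison only. *)
let reverse i j p =
  Array.mapi (fun k _ -> if i <= k && k <= j then p.(j + i - k) else p.(k)) p
```

Only the two edges at the ends of the reversed block change, because the edges inside the block are traversed in the opposite direction but keep their lengths.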

The splice function is more sophisticated than the rotate and reverse functions. Indices into the intermediate form (see figure 10.7) are first indirected through a function f. Indices into the final form are indirected through a function g which indirects through f if required:

$$f(k) = \begin{cases} k & k < i \\ k + l + 1 & i \le k \end{cases}$$

$$g(k) = \begin{cases} f(k) & k < j \\ k - j + i & j \le k \le j + l \\ f(k - l - 1) & j + l < k \end{cases}$$

The f(k) function, also a function of l and i in the program, is productively factored from splice_cost and splice:

  let f l i k = if k < i then k else k + l + 1 in

The splice_cost function calculates the change in path length due to a proposed splice:

  let splice_cost l i j =
    let j1, j2 = f l i (j - 1), f l i j in
    edge j1 i +. edge (i + l) j2 +. edge (i - 1) (i + l + 1)
    -. edge (i - 1) i -. edge (i + l) (i + l + 1) -. edge j1 j2 in

The splice function is easily implemented in OCaml:

  (* Splice indices from [i, i+l] into {0 .. j-1, ..., j .. n-1}. *)
  let splice l i j k =
    if k < j then f l i k else
    if k <= j + l then k - j + i else
    f l i (k - l - 1) in
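A worked instance may help in reading these index manipulations. The following self-contained sketch repeats f and splice as top-level functions and tabulates the indirection for one hypothetical choice of arguments: a path of seven vertices in which the block occupying positions 2 and 3 (l = 1, i = 2) is reinserted at position j = 3 of the intermediate form.

```ocaml
(* Map an index into the intermediate form (block removed) to the original. *)
let f l i k = if k < i then k else k + l + 1

(* Map an index into the final form to an index into the original path. *)
let splice l i j k =
  if k < j then f l i k else
  if k <= j + l then k - j + i else
  f l i (k - l - 1)

(* The original position from which each final position is read. *)
let table = Array.init 7 (splice 1 2 3)
```

In this instance the table is a permutation of the positions 0 to 6 in which positions 2 and 3 of the original path appear at positions 3 and 4 of the result.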


The mutate function randomly chooses to use either rotate, reverse or splice, passing randomly generated arguments and returning a 2-tuple of the change in path length and the indirection function:

  (* Randomly rotate, reverse or splice. *)
  let mutate () =
    match rand 3 with
    | 0 ->
        let i = 1 + rand (n - 1) in
        rotate_cost i, rotate i
    | 1 ->
        let i = rand n in let j = rand_except i n in
        let i, j = min i j, max i j in
        reverse_cost i j, reverse i j
    | _ ->
        let l = rand (n - 1) in
        let i = rand (n - l) in
        let j = rand_except i (n - l - 1) in
        splice_cost l i j, splice l i j in

A path_length function to compute the total length of a path simply loops through the edges, accumulating the edge lengths:

  (* Compute the length of the given path. *)
  let path_length path =
    if n < 2 then 0. else begin
      let len = ref 0. in
      for i = 0 to n - 2 do
        len := !len +. edge i (i + 1)
      done;
      !len
    end in

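The shortest path and its length are stored as a mutable 2-tuple, initialised to the initial path and its length, and the length of the current path is tracked in the mutable len:

  let shortest = ref (Array.copy path, path_length path) in
  let len = ref (snd !shortest) in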

Some subjective algorithm is required to decrease the fictitious temperature as the simulation progresses. We choose to decrease the temperature exponentially, increasing the inverse temperature $\beta$ from $t_0$ to $t_1$:

  (* Initial and final inverse fictional temperatures, beta. *)
  let t0 = 100. and t1 = 10000. in
  let beta = ref t0 in

This can be achieved by multiplying $\beta$ by:

$$\delta = \left(\frac{t_1}{t_0}\right)^{1/p}$$

where p is the number of iterations, called iters in this program, giving:

  let delta = (t1 /. t0) ** (1. /. float_of_int iters) in

The program then loops through iters iterations:

for i = 1 to iters do

At each iteration, $\beta$ is increased by multiplying by $\delta$:

    (* Lower the temperature. *)
    beta := !beta *. delta;
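The effect of this schedule is easily confirmed in isolation. The following sketch, which assumes a hypothetical iteration count of 1000, applies the multiplier once per iteration and records the final inverse temperature.

```ocaml
(* Initial and final inverse temperatures, as in the program. *)
let t0 = 100. and t1 = 10000.

(* Hypothetical number of iterations, chosen for illustration. *)
let iters = 1000

(* Per-iteration multiplier for beta. *)
let delta = (t1 /. t0) ** (1. /. float_of_int iters)

(* Apply the multiplier once per iteration, starting from t0. *)
let final_beta =
  let beta = ref t0 in
  for _i = 1 to iters do
    beta := !beta *. delta
  done;
  !beta
```

After all iterations the inverse temperature should have reached t1 to within rounding error, whatever the number of iterations.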

A mutation to the path is proposed, returning the change in path length delta_E and an indirection function f:

let delta_E, f = mutate () in

The mutation is accepted if it shortens the path or probabilistically accepted if it lengthens the path:

    if delta_E < 0. || Random.float 1. < exp (-. !beta *. delta_E) then
    begin

The record of the current path length is updated.

len := !len +. delta_E;

The number of indirections is incremented:

incr indirects;

The new indirection function indirect is obtained by applying the new mutate function f to a given index i before applying the current indirect⁵:

      indirect := let g = !indirect in fun i -> g (f i)
    end;

If the new path is believed to be the shortest path so far then the path length is computed explicitly, to remove any accumulated errors:

    if !len < snd !shortest then len := path_length ();

If the new path really is the shortest path so far then the mutable 2-tuple shortest is updated to contain a copy of the path and its length:

    if !len < snd !shortest then
      (* Accept the shortened path. *)
      shortest := (Array.copy path, !len);

⁵ Note that !indirect has been factored out as this must be evaluated now. Delaying the evaluation of this subexpression would result in the !indirect function executing itself and, therefore, looping indefinitely.

In order to be user friendly, the current shortest path is printed to stderr whenever the current iteration number i is a power of two, i.e. whenever $\log_2 i \in \mathbb{Z}$:

  if 0 = i land (i - 1) then begin
    let len = string_of_float (snd !shortest) in
    output_string stderr ("Length = "^len^"\n");
    flush stderr;
  end;
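The bit trick used in this test is worth seeing on its own. A minimal sketch, with the hypothetical name is_power_of_two:

```ocaml
(* [i land (i - 1)] clears the lowest set bit of [i], so the result is zero
   exactly when [i] has a single bit set. The guard [i > 0] excludes zero,
   which the loop above never reaches because it starts at 1. *)
let is_power_of_two i = i > 0 && i land (i - 1) = 0

let () =
  for i = 1 to 20 do
    if is_power_of_two i then Printf.printf "%d " i
  done;
  print_newline ()
```

Reporting only at powers of two keeps the amount of output logarithmic in the number of iterations.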

If the current number of indirections exceeds $\sqrt{n}$ then the implicit indirections are flattened by replacing the array representation of the path and implicit indirections by an explicit copy of the path, zero indirections and an identity indirection function:

  if !indirects * !indirects >= n then begin
    Array.blit (Array.init n (fun i -> get i)) 0 path 0 n;
    indirects := 0;
    indirect := (fun i -> i);
  end;
done;
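The interplay between composed indirections and flattening can be sketched on a toy path. This is an illustration only: the definition of get is an assumption (its definition lies outside this passage), and push and flatten are hypothetical names.

```ocaml
(* Toy path and an indirection function, initially the identity. *)
let path = [| 'a'; 'b'; 'c'; 'd'; 'e' |]
let n = Array.length path
let indirect = ref (fun i -> i)

(* Assumed definition: read the path through the current indirection. *)
let get i = path.(!indirect i)

(* Compose a new index mapping [f] with the current indirection. The current
   function is bound to [g] first so that it is read now, not later. *)
let push f =
  let g = !indirect in
  indirect := (fun i -> g (f i))

(* Replace the stored path by the path as seen through [get]. *)
let flatten () =
  Array.blit (Array.init n get) 0 path 0 n;
  indirect := (fun i -> i)

let () =
  push (fun i -> n - 1 - i);        (* reverse the path *)
  push (fun i -> (i + 1) mod n);    (* then rotate it by one place *)
  let before = Array.init n get in
  flatten ();
  assert (before = Array.init n get);
  Array.iter print_char path;
  print_newline ()
```

Each push costs constant time but makes every later lookup one call deeper, which is why the program flattens once the number of indirections grows to about the square root of n.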

In the interests of efficiency, the lexer, parser and main program should be compiled into native code.

This program, which implements the simulated annealing approach to the travelling salesman problem, may be compiled using:

$ ocamllex salesman_lexer.mll
12 states, 322 transitions, table size 1360 bytes
$ ocamlyacc salesman_parser.mly
$ ocamlopt -c salesman_parser.mli
$ ocamlopt unix.cmxa salesman_parser.ml salesman_lexer.ml salesman.ml -o salesman

For input in a file "verts.dat" of the form:

0.140791689359 0.582751366306
0.410471260708 0.330020920806
0.126148465548 0.315248400694

The salesman program may then be used to find a short path, stored in "path.dat", between the vertices by performing, say, $10^4$ iterations:

$ ./salesman 10000 <verts.dat >path.dat
Length = 26.7949782142
Length = 26.7949782142
Length = 25.5792771459
Length = 25.435009196
Length = 23.2092782558
Length = 19.5810248816
Length = 15.6036848984
Length = 13.4418641956

We shall now use this program to find short paths in a randomly generated array of vertices.

10.2.3.5 Results

The shortest paths found after various numbers of iterations of the full simulated annealing implementation are shown in figure 10.8.

Comparing the shortest path found against the number of iterations (illustrated in figure 10.9) shows that the more capable mutate function is substantially more efficient.

10.3 Finding nth-nearest neighbours

In this section, we shall develop a program which uses some operations required by advanced scientific programs:

• Set-theoretic operations: union, intersection and difference.

• Graph-theoretic operations: nth-nearest neighbours.

The graph-theoretic problem of finding the nth-nearest neighbours allows useful topological information to be gathered from many forms of data produced by other scientific computations. For example, in simulated atomic structures, topological information can aid the interpretation of experimental results when trying to understand molecular structure. Such topological information can also be used indirectly, in the computation of interesting properties such as the decomposition of correlation functions over neighbour shells [16], and shortest-path ring statistics [17].

We shall describe our unconventional formulation of the problem of computing the nth-nearest neighbours of atoms in an atomic structure simulated under periodic boundary conditions before describing a program for solving this problem and presenting demonstrative results.

10.3.1 Formulation

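The notion of the nth-nearest neighbours $N_i^n$ of a vertex i in a graph is rigorously defined by a recurrence relation based upon the set of first nearest neighbours $N_i^1 \equiv N_i$ of any atom i:

$$N_i^n = \begin{cases} \{i\} & n = 0 \\ N_i & n = 1 \\ \left( \bigcup_{j \in N_i^{n-1}} N_j \right) \setminus N_i^{n-1} \setminus N_i^{n-2} & n \ge 2 \end{cases}$$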

Figure 10.8: Shortest paths found using simulated annealing after: a) $10^4$, b) $10^5$, c) $10^6$, d) $10^7$, e) $10^8$, and f) $10^9$ iterations.

Figure 10.9: Performance of different mutate functions as number of iterations i vs excess path length $l - l_{min}$ over the minimum path length $l_{min}$ found (vertical axis: $\log_2 (l - l_{min})$), for: a) swapping pairs of vertices only (blue), and b) rotate, reverse and splice (red).

As a recurrence relation, this computational task naturally lends itself to recursion. As this recurrence relation only makes use of the set-theoretic operations union and difference, the data structure manipulated by the recursive function is most naturally a set (described in section 3.4).
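Before the periodic case is introduced, the recurrence can be tried on a small finite graph. The following is a minimal sketch, not the program developed in this section: it uses plain integer vertices, and the 6-vertex ring and the names IntSet and neighbours are assumptions made for illustration.

```ocaml
module IntSet = Set.Make (struct type t = int let compare = compare end)

(* First-nearest neighbours on a ring of six vertices numbered 0 to 5. *)
let neighbours i = IntSet.of_list [ (i + 5) mod 6; (i + 1) mod 6 ]

(* Direct transcription of the recurrence: union the first neighbours of the
   previous shell, then remove the two previous shells. *)
let rec nth_nn n i =
  match n with
  | 0 -> IntSet.singleton i
  | 1 -> neighbours i
  | n ->
      let pprev = nth_nn (n - 2) i and prev = nth_nn (n - 1) i in
      let t =
        IntSet.fold (fun j t -> IntSet.union (neighbours j) t)
          prev IntSet.empty in
      IntSet.diff (IntSet.diff t prev) pprev

let () =
  for n = 0 to 4 do
    Printf.printf "n = %d: {%s}\n" n
      (String.concat ", "
         (List.map string_of_int (IntSet.elements (nth_nn n 0))))
  done
```

On this finite ring the shells run out once the opposite vertex has been reached, whereas on an infinite periodic graph they continue indefinitely.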

In order to develop a useful scientific program, we shall use an infinite graph to represent the topology of a d-dimensional crystal, i.e. a periodic tiling. Computer simulations of non-crystalline materials are typically formulated as a crystal with the largest possible unit cell, known as the supercell. Conventional approaches to the analysis of these structures reference atoms by their index $i \in \mathbb{I} = \{1 \ldots N\}$ within the origin supercell. Edges in the graph representing bonded pairs of atoms in different cells are then handled by treating displacements modulo the supercell (illustrated in figure 10.10). However, this conventional approach is well known to be flawed when applied to insufficiently large supercells [17, 6], requiring erroneous results to be identified and weeded out manually.

Instead, we shall choose to reference atoms by an index $i = (i_o, i_i)$ where $i_o \in \mathbb{Z}^d$ and $i_i \in \mathbb{I}$. This explicitly includes the offset $i_o$ of the supercell as well as the index $i_i$ within the supercell (illustrated in figure 10.11). Neighbouring vertices in the graph representing the topology are defined not only by the index of the neighbouring vertex but also by the supercell containing this neighbour (assuming the origin vertex to be in the origin supercell at $\{0\}^d$).

We shall now develop a complete program for computing the nth-nearest neighbours of a given vertex with index i from a list of lists $N_i$ of the indices of the neighbours of each vertex. We begin by defining a lexer and parser to interpret a file defining the graph-representation $N_i$ as a list of space-separated integers on each line.

Figure 10.10: Conventionally, atoms are referenced only by their index $i \in \mathbb{I} = \{1 \ldots N\}$ within the supercell. Consequently, atoms $i, j \in \mathbb{I}$ at opposite ends of the supercell are considered to be bonded.

Figure 10.11: We use an unconventional representation which allows all atoms to be indexed by the supercell they are in as well as their index within the supercell. In this case, the pair of bonded atoms are referenced as $((0,0), i_i)$ and $((1,0), i_j)$, i.e. with i in the origin supercell (0,0) and j in the supercell with offset (1,0).


10.3.2.1 Lexer

The "nth_lexer.mll" file, defining the lexer, begins by opening the namespace of the parser and initialising the line number to one:

{
  open Nth_parser
  let line = ref 1
}

Regular expressions floating and integer are then defined, capable of handling signed numbers:

let digit = [ '0'-'9' ]
let exponent = [ 'e' 'E' ] [ '+' '-' ]? digit+
let floating = [ '+' '-' ]? (digit+ '.' digit* | digit* '.' digit+) exponent?
let integer = [ '+' '-' ]? digit+

Lexing uses a single rule (called token) which ignores whitespace and new lines and produces OPEN and CLOSE tokens for curly braces, a COMMA token for commas, INT and FLOAT tokens for integers and floating-point numbers and an EOF token when the end of the input is reached:

rule token = parse
    [' ' '\t']  { token lexbuf }
  | '\n'        { incr line; token lexbuf }
  | '{'         { OPEN }
  | '}'         { CLOSE }
  | ','         { COMMA }
  | integer     { INT(int_of_string (Lexing.lexeme lexbuf)) }
  | floating    { FLOAT(float_of_string (Lexing.lexeme lexbuf)) }
  | eof         { EOF }
  | _           { failwith ("Mistake at line "^string_of_int !line) }

As usual, the tokens used in the lexer are actually defined in the parser.

10.3.2.2 Parser

The "nth_parser.mly" file, defining the parser, begins with a header defining the tokens, entry point and type returned by the parser:

%token EOF OPEN CLOSE COMMA
%token <int> INT
%token <float> FLOAT

%start main
%type <float list * (float list * (int * int list) list) list> main

%%


The type returned by the parser contains an initial float list representing the vector extent of the periodic supercell, followed by a list of vertex descriptions. Each vertex is described by a float list giving the vertex coordinate and a list of neighbours. Each neighbour is defined by an int index to another vertex and an int list giving the supercell offset.

A comma-separated list of integers enclosed in curly braces is parsed by the int_list and int_list_tail rules:

int_list_tail:
| INT COMMA int_list_tail
    { $1 :: $3 }
| INT CLOSE
    { [$1] };

int_list:
| OPEN int_list_tail
    { $2 }
| OPEN CLOSE
    { [] };

A comma-separated list of numbers enclosed in curly braces is parsed in a similar way using a generic number rule which interprets floating-point numbers and integers, converting the latter into floating-point numbers:

number:
| FLOAT
    { $1 }
| INT
    { float_of_int $1 };

This number rule is then used to parse lists of numbers:

float_list_tail:
| number COMMA float_list_tail
    { $1 :: $3 }
| number CLOSE
    { [$1] };

float_list:
| OPEN float_list_tail
    { $2 }
| OPEN CLOSE
    { [] };

Neighbour information is parsed with the supercell offset appearing first, as an int list (see footnote 6), followed by the int index of the neighbour:

neighbour:
| OPEN int_list COMMA INT CLOSE
    { $4, $2 };

Footnote 6: Supercell offsets could be represented by arbitrary-precision integers but the int type is more than satisfactory in practice.


The description of a single vertex is parsed by decapitating the vertex coordinate and then loading the list of neighbours:

vertex_tail:
| neighbour COMMA vertex_tail
    { $1 :: $3 }
| neighbour CLOSE
    { [$1] };

vertex:
| OPEN float_list COMMA OPEN vertex_tail CLOSE
    { $2, $5 };

The body of the input contains a list of vertex descriptions:

vertex_list_tail:
| vertex COMMA vertex_list_tail
    { $1 :: $3 }
| vertex CLOSE
    { [$1] };

vertex_list:
| OPEN vertex_list_tail
    { $2 };

Finally, the whole input is parsed as the supercell extent followed by the vertex list:

main:
| OPEN float_list COMMA vertex_list CLOSE EOF
    { $2, $4 };

The result of parsing the input is then ready to be analyzed by the main program.

The "nth.ml" file, containing the main program, may then be written. This begins with the definition of a data structure to represent a set of vertices. The key of this set is a vertex description, the int index and int list supercell offset:

(* A vertex in a set. *)
module VertexKey = struct
  type t = int * int list
  let compare = compare
end

We could have defined a custom compare function for the VertexKey module but, in the interests of simplicity, we shall use the built-in polymorphic compare function.

A set of these elements may then be defined as:


(* A set of vertices. *)

module VertexSet = Set.Make (VertexKey)
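The set-theoretic operations listed at the start of this section are then available on sets of vertices. A minimal sketch, repeating the two module definitions so that it is self-contained; the vertices in a and b, and the helper show, are made up for illustration.

```ocaml
module VertexKey = struct
  type t = int * int list
  let compare = compare
end

module VertexSet = Set.Make (VertexKey)

(* Two small, made-up sets of (index, supercell offset) vertices. *)
let a = VertexSet.of_list [ (1, [0; 0]); (2, [0; 0]); (2, [1; 0]) ]
let b = VertexSet.of_list [ (2, [0; 0]); (3, [0; -1]) ]

(* Render a set as text; a hypothetical helper for this sketch only. *)
let show s =
  VertexSet.elements s
  |> List.map (fun (i, io) ->
         Printf.sprintf "(%d, [%s])" i
           (String.concat "; " (List.map string_of_int io)))
  |> String.concat " "

let () =
  print_endline ("union: " ^ show (VertexSet.union a b));
  print_endline ("inter: " ^ show (VertexSet.inter a b));
  print_endline ("diff:  " ^ show (VertexSet.diff a b))
```

Note that the same atom index in two different supercells, such as (2, [0; 0]) and (2, [1; 0]), counts as two distinct elements.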


The program then contains several helper functions (similar to those defined in chapter 9) used to manipulate and act upon lists and sets of integers.

A function to initialise a list:

let list_init n f =
  let rec aux l = function
      0 -> l
    | n -> aux (f n :: l) (n-1) in
  aux [] n

A function to add a pair of integer lists $i_o + j_o$:

let add_i = List.map2 ( + )

The higher-order list_iteri function applies its function argument to the index (counting from zero) and value of each element of the given list in turn:

let list_iteri f l =
  ignore (List.fold_left (fun n e -> f n e; n+1) 0 l)

The list_of_set and set_of_list functions convert between lists and sets of vertices:

let list_of_set s = VertexSet.fold (fun e l -> e::l) s []
let set_of_list l =
  List.fold_left (fun s e -> VertexSet.add e s) VertexSet.empty l

A list_map3 function, equivalent to the List.map2 function but acting over three lists simultaneously:

let rec list_map3 f l1 l2 l3 = match l1, l2, l3 with
    h1::t1, h2::t2, h3::t3 ->
      f h1 h2 h3 :: list_map3 f t1 t2 t3
  | [], [], [] -> []
  | _ -> invalid_arg "list_map3"
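A few sample calls make the behaviour of these helpers concrete. This sketch repeats three of the definitions so that it runs on its own; the sample inputs are made up.

```ocaml
let list_init n f =
  let rec aux l = function
    | 0 -> l
    | n -> aux (f n :: l) (n - 1) in
  aux [] n

let list_iteri f l =
  ignore (List.fold_left (fun n e -> f n e; n + 1) 0 l)

let rec list_map3 f l1 l2 l3 = match l1, l2, l3 with
  | h1 :: t1, h2 :: t2, h3 :: t3 -> f h1 h2 h3 :: list_map3 f t1 t2 t3
  | [], [], [] -> []
  | _ -> invalid_arg "list_map3"

(* list_init passes the indices n, n-1, ..., 1 to f, so the result is
   [f 1; ...; f n] rather than starting from index zero. *)
let squares = list_init 3 (fun i -> i * i)

let sums = list_map3 (fun a b c -> a + b + c) [1; 2] [10; 20] [100; 200]

let () =
  list_iteri (fun i x -> Printf.printf "%d: %d\n" i x) squares;
  List.iter (Printf.printf "%d ") sums;
  print_newline ()
```

In contrast to list_init, list_iteri counts its indices from zero, and list_map3 raises Invalid_argument when the three lists differ in length.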

After these definitions, the main part of the program is nested within the conventional construct:

let () =

The command-line arguments are parsed (see section 8.1) to extract the values of n and i:

(* Extract the centre-atom index "i" and neighbour shell "n". *)
let n, i =
  let input = ref [] in
  Arg.parse [] (fun x -> input := x :: !input) "nth_nn <n> ";
  match !input with
    [i; n] -> int_of_string n, int_of_string i
  | _ -> invalid_arg "Usage: nth_nn <n> " in

The lexer and parser are then used to interpret the input from stdin, returning a value of the type float list * (float list * (int * int list) list) list declared in the parser:

(* Load the supercell extent, atomic coordinates and bonds. *)
let supercell, pos, bonds =
  let supercell, atoms =
    try
      let lexbuf = Lexing.from_channel stdin in
      Nth_parser.main Nth_lexer.token lexbuf

In the event of an error during parsing (due to invalid input) the exception is caught and a helpful error message generated which includes the line of input at which the error was noticed:

    with Parsing.Parse_error ->
      let line = string_of_int !Nth_lexer.line in
      print_endline ("Syntax error at line "^line);
      exit 1 in

The list is the ideal data structure for loading the data because the number of vertices in the graph is determined by the size of the input and, therefore, is not already known.

However, the core of the program will be randomly accessing the sets of nearest neighbours for each vertex, before performing set-theoretic operations on these sets. Thus, the list of lists of vertices may be productively converted into an array of sets of vertices. This is most easily done by creating arrays and then filling in the vertex coordinate and nearest neighbour set for each vertex using the list_iteri function:

  (* Make an array of atomic coordinates and an array of sets of
     nearest-neighbours. *)
  let bonds = Array.create (List.length atoms) VertexSet.empty
  and pos = Array.create (List.length atoms) [] in
  let aux i (r, l) =
    pos.(i) <- r;
    bonds.(i) <- set_of_list l in
  list_iteri aux atoms;

This expression then returns the vector extent supercell of the supercell, the array pos of vertex coordinates and the array bonds of nearest neighbour sets:

  supercell, pos, bonds in

Before performing any neighbour computations, we perform some sanity checks on the input:


(* Check dimension consistency. *)

let () =
  let d = List.length supercell in


The vertex coordinates are tested to make sure they are all of the same dimensionality as the supercell:

  let test l = assert (d = List.length l) in
  Array.fold_left (fun () -> test) () pos;

The neighbour supercell offsets are tested similarly:

  let aux () l =
    VertexSet.fold (fun (_, l) () -> test l) l () in
  Array.fold_left aux () bonds;
  assert (0 0) supercell in

The core of the program is the nth_nn function which computes the set of nth-nearest neighbours of a vertex i. The nth_nn function is defined as a λ-function in order to provide a local hash table called memory:

(* Compute the "n"th nearest neighbours of "i". *)
let rec nth_nn =
  let memory = Hashtbl.create 1 in

The λ-function implementing the nth_nn function accepts two arguments representing n and i. However, i is represented by a 2-tuple giving the int index i and the int list supercell offset $i_o$:

  fun n (i, io) ->

In order to improve performance, the result of the nth_nn function is memoized (described in section A.7) in the hash table memory. Thus, the function begins by checking the hash table for a previously computed result, returning a previous result if one was found:

    (* Look for a previous result. *)
    try Hashtbl.find memory (n, i)

If a previous result was not found then the result is computed:

with Not_found -> match n with


The set of 0th-nearest neighbours of a vertex i is the singleton set {i}:

      0 -> VertexSet.singleton (i, io)

The set of 1st-nearest neighbours of a vertex $i \in \{1 \ldots N\}$ was given in the input and is stored as element i - 1 of the array bonds:

    | 1 ->
        let nn = bonds.(i - 1) in

If the vertex i is in the origin supercell then the neighbour is in the supercell given by its offset:

        if io = zero then nn else

Otherwise, the neighbour's offsets $j_o$ in the set should be translated by the offset $i_o$ of the vertex i using the add_i function:

        let aux (j, jo) s = VertexSet.add (j, add_i io jo) s in
        VertexSet.fold aux nn VertexSet.empty

The nth-nearest neighbours for n > 1 are given by set-theoretic operations on the sets $N_i^{n-2}$ and $N_i^{n-1}$, denoted pprev and prev, respectively:

    | n ->
        let pprev = nth_nn (n - 2) (i, io) in
        let prev = nth_nn (n - 1) (i, io) in

The union:

$$\bigcup_{j \in N_i^{n-1}} N_j$$

can be computed using only two lines of code by folding the union function over the set $N_i^{n-1}$:

        let aux j t = VertexSet.union (nth_nn 1 j) t in
        let t = VertexSet.fold aux prev VertexSet.empty in

The remainder of the computation involves removing the two previous neighbour shells:

        let t = VertexSet.diff (VertexSet.diff t prev) pprev in

Finally, the result is stored in the hash table memory for future reference, before being returned:

        Hashtbl.add memory (n, i) t;
        t in
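The pieces above can be assembled into a small self-contained sketch and run on a toy structure. This is an illustration under stated assumptions rather than the program itself: the bonds array describes a made-up one-dimensional chain with two atoms per supercell, zero is taken to be the all-zero offset, and the hash table here is keyed on the offset as well as on n and i so that every call may be cached.

```ocaml
module VertexSet = Set.Make (struct
  type t = int * int list
  let compare = compare
end)

(* Made-up input: atom 1 bonds to atom 2 in its own cell and in the cell to
   the left; atom 2 bonds to atom 1 in its own cell and in the cell to the
   right. *)
let bonds =
  [| VertexSet.of_list [ (2, [0]); (2, [-1]) ];
     VertexSet.of_list [ (1, [0]); (1, [1]) ] |]

let zero = [0]
let add_i = List.map2 ( + )

let rec nth_nn =
  let memory = Hashtbl.create 16 in
  fun n (i, io) ->
    try Hashtbl.find memory (n, i, io)
    with Not_found ->
      let t =
        match n with
        | 0 -> VertexSet.singleton (i, io)
        | 1 ->
            (* Translate each neighbour's offset by the offset of i. *)
            let aux (j, jo) s = VertexSet.add (j, add_i io jo) s in
            VertexSet.fold aux bonds.(i - 1) VertexSet.empty
        | n ->
            let pprev = nth_nn (n - 2) (i, io) in
            let prev = nth_nn (n - 1) (i, io) in
            let aux j t = VertexSet.union (nth_nn 1 j) t in
            let t = VertexSet.fold aux prev VertexSet.empty in
            VertexSet.diff (VertexSet.diff t prev) pprev in
      Hashtbl.add memory (n, i, io) t;
      t

let () =
  for n = 0 to 3 do
    Printf.printf "shell %d has %d vertices\n" n
      (VertexSet.cardinal (nth_nn n (1, zero)))
  done
```

Because each vertex carries its supercell offset, the two images of atom 2 that neighbour atom 1 remain distinct, and every shell beyond the first keeps exactly two vertices however far it is from the origin.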


A function pos_of is then defined to compute a string representation of the vector coordinate of a vertex at the given supercell offset, using the list_map3 function to offset the vertex coordinate pos.(i-1) by the supercell offset io multiples of the supercell extent supercell:

(* String representation of the coordinate of atom "i" in
   supercell "io". *)
let pos_of (i, io) =
  let aux io s r = r +. s *. float_of_int io in
  let r = list_map3 aux io supercell pos.(i - 1) in
  let r = List.map string_of_float r in
  "{"^(String.concat ", " r)^"}" in

The program ends with the appropriate invocation of the nth_nn function, conversion of the result to a string and output by printing to stdout:

(* Invoke "nth_nn". *)
let nn = list_of_set (nth_nn n (i, zero)) in
let nn = String.concat ", " (List.map pos_of nn) in
print_endline ("{"^nn^"}")

Before the components of this program may be executed, they must be compiled.


The lexer, parser and main program may be compiled into a native code executable nth using:

$ ocamllex nth_lexer.mll
17 states, 375 transitions, table size 1602 bytes
$ ocamlyacc nth_parser.mly
$ ocamlopt -c nth_parser.mli
$ ocamlopt unix.cmxa nth_parser.ml nth_lexer.ml nth.ml -o nth

For appropriate input "cfg", the nth executable may be invoked to find the nth-nearest neighbours of the ith atom and to produce a list of the coordinates of the neighbours in a file "neighbours.dat" by:

$ ./nth n i <cfg >neighbours.dat

We shall now examine some demonstrative results obtained using this program.

10.3.3 Results

In condensed matter physics, the set of nth-nearest neighbours is often known as the nth-nearest neighbour shell. This terminology is self-evident from the approximately-spherical shape of a neighbour shell for $n \gg 1$. Figure 10.12a shows the shell formed by the 150th-nearest neighbours of a randomly chosen atom in a $10^5$-atom model of amorphous silicon. In contrast, figure 10.12b shows the icosahedral shell formed by the 150th-nearest neighbours of an ideal diamond crystal. These examples are computed in only 10 minutes using the program we have just presented.

Figure 10.12: The 150th-nearest neighbour shells: a) 83,272 neighbours from a $10^5$-atom model of amorphous silicon [18], and b) 56,252 neighbours from a perfect diamond-structure crystal.

These approximately-spherical shells in amorphous structures have been shown to propagate order across surprisingly large distances, even in strongly disordered materials, and have been shown to be responsible for some of the anomalous properties of amorphous materials [19]. Of course, these results are considerably more compelling when visualised using real-time 3D graphics, e.g. a program similar to that developed in section 6.5.

10.4 Eigen problems

In this section, we shall develop a program which demonstrates two important concepts often seen in scientific computing:

• Matrix computations using LAPACK [11], specifically eigenvalue computation.

• Properties of random matrices.

A wide variety of naturally occurring phenomena may be modelled in terms of vectors and matrices. Most notable, perhaps, is the representation of quantum-mechanical operators as matrices, the eigenvalues of which are well known to have special importance [20]. Solving matrix problems can require various different forms of matrix manipulation, particularly forms of factorization. One prevalent task is the computation of eigenvalues which we shall examine here.

An interesting avenue of theoretical research aims to elucidate the properties of random matrices (ensembles of matrices with elements chosen randomly according to defined probability distributions). Thus, we shall now develop a program capable of generating a random matrix and computing its eigenvalues.


As this program only requires the desired extent of the matrix, there is no need for a lexer and parser and the input can be read as a command-line argument instead. Thus, the program consists of a single "eigen.ml" file. This file begins by opening the namespaces of the Bigarray and Lacaml.S modules:

open Bigarray
open Lacaml.S

The Lacaml.S module provides functions for handling matrices using single (32-bit) precision floating-point arithmetic.

A function init_matrix to create an n × m matrix as an array of arrays, using a given element-generating function f, may be written in terms of Array.init:

(* Initialise a matrix as an array of arrays. *)
let init_matrix n m f =
  Array.init n (fun i -> Array.init m (f i))
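The generating function receives the row index first and the column index second. A minimal sketch using only the standard library; the names identity and random_sign are hypothetical, and the sizes and random seed are chosen for illustration.

```ocaml
let init_matrix n m f =
  Array.init n (fun i -> Array.init m (f i))

(* A 3 x 3 identity matrix: f is given the row index i and column index j. *)
let identity = init_matrix 3 3 (fun i j -> if i = j then 1. else 0.)

(* Elements drawn at random from {-1, +1}, ignoring both indices. *)
let random_sign _ _ = float_of_int (2 * Random.int 2 - 1)

let () = Random.init 0
let m = init_matrix 3 4 random_sign

let () =
  Array.iter
    (fun row ->
      Array.iter (Printf.printf "%+.0f ") row;
      print_newline ())
    m
```

The partial application f i inside init_matrix is what turns a two-argument generator into the one-argument function expected by Array.init for each row.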

The eigenvalues resulting from a solution appear along the diagonal of a 2D big array (big arrays are discussed in section 8.3). The elements along the diagonal can be extracted as an array using the function:

(* Extract the diagonal of a 2D big array as an array. *)
let array_of_diag m =
  let n = Array2.dim1 m in
  if n <> Array2.dim2 m then invalid_arg "array_of_diag";
  Array.init n (fun i -> Array2.get m (i + 1) (i + 1))

The eigenvalues of a matrix can be computed using the geev function [11]. This function alters a given big array in-place, leaving the eigenvalues along the diagonal. Thus, the eigenvalues of a matrix may be computed using the function:

(* Compute the eigenvalues of a matrix. *)
let eigenvalues m =
  let m = Mat.of_array m in
  ignore (geev m);
  array_of_diag m

The main body of the program begins by parsing the command-line arguments to extract the desired number of rows and columns in the random matrix:

let () =
  let n =
    let n = ref [] in
    Arg.parse [] (fun s -> n := s :: !n) "eigen <n>";
    match !n with
      [n] -> int_of_string n
    | _ -> invalid_arg "Usage: eigen <n>" in

A random matrix is then generated using the init_matrix function and the eigenvalues $\lambda_k$ are found using the eigenvalues function:

  (* Compute the eigenvalues of a random matrix. *)
  let f _ _ = float_of_int (2 * Random.int 2 - 1) in
  let m = init_matrix n n f in
  let lambda = eigenvalues m in

The resulting eigenvalues are sorted into ascending order and output to stdout:

  (* Sort and output the eigenvalues. *)
  Array.sort compare lambda;
  let lambda = Array.to_list lambda in
  let lambda = List.map string_of_float lambda in
  print_endline ("{"^(String.concat ", " lambda)^"}")

This program can be used to compute the eigenvalues $\lambda_k$ of a randomly generated dense matrix. For example, the eigenvalues of a randomly generated 1024 × 1024 matrix may be generated and stored in the file "eigenvalues.dat" using:

$ ocamlopt -cclib -llapack2 -I +lacaml bigarray.cmxa lacaml.cmxa eigen.ml -o eigen
$ ./eigen 1024 >eigenvalues.dat

This computation only takes 1-2 minutes.

10.4.2 Results

A famous result of random matrix theory is the semi-circle law of eigenvalue densities for random n × n matrices from the Gaussian Orthogonal Ensemble (GOE):

$$P(\lambda) = \begin{cases} \frac{2}{\pi n} \sqrt{n - \lambda^2} & -\sqrt{n} < \lambda < \sqrt{n} \\ 0 & \text{otherwise} \end{cases}$$

Although the derivation of the semi-circle law only applies to GOE matrices, the distributions of the eigenvalues found by this program (for $M_{ij} = \pm 1$) are also well approximated by the semi-circle law (illustrated in figure 10.13).

In fact, matrix computations have given empirical evidence that the semi-circle law is far more widely applicable than its current derivation would suggest [21, 22].
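As a quick numerical check, a semi-circular density of radius $\sqrt{n}$ can be integrated to confirm that it encloses unit area. This is a minimal sketch that assumes the density is normalised as $\frac{2}{\pi n}\sqrt{n - \lambda^2}$; the function names semicircle and integrate are hypothetical.

```ocaml
let pi = 4. *. atan 1.

(* Semi-circle density of radius sqrt n, assumed normalised to unit area. *)
let semicircle n lambda =
  let n = float_of_int n in
  if lambda *. lambda < n
  then 2. /. (pi *. n) *. sqrt (n -. lambda *. lambda)
  else 0.

(* Midpoint-rule integral of [f] over [a, b] using [steps] strips. *)
let integrate f a b steps =
  let h = (b -. a) /. float_of_int steps in
  let sum = ref 0. in
  for k = 0 to steps - 1 do
    sum := !sum +. f (a +. (float_of_int k +. 0.5) *. h)
  done;
  !sum *. h

let area = integrate (semicircle 1024) (-32.) 32. 100000

let () =
  Printf.printf "area = %f\n" area;
  Printf.printf "n P(0) = %f\n" (1024. *. semicircle 1024 0.)
```

For n = 1024 the radius is 32, so all of the density lies between -32 and 32.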

Figure 10.13: The approximately semi-circular eigenvalue density $P(\lambda)$ for a dense, random, square matrix $M_{ij} = \pm 1$ with n = 1024, showing the prediction of the semi-circle law (blue line) and the computed eigenvalue distribution (red dots). The plot shows $n P(\lambda)$ against $\lambda$.

10.5 Discrete wavelet transform

In this section, we shall examine a simple form of wavelet transform known as the Haar wavelet transform. Remarkably, the definition of this transform is more comprehensible when given as a program, rather than as a mathematical formulation or English description.

The Haar wavelet transform of a length $n = 2^p$ ($p \ge 0 \in \mathbb{Z}$) float list is given by the following function:

# let haar l =
    let rec aux l s d = match l, s, d with
        [s], [], d -> s :: d
      | [], s, d -> aux s [] d
      | h1::h2::t, s, d -> aux t (h1 +. h2 :: s) (h1 -. h2 :: d)
      | _ -> invalid_arg "haar" in
    aux l [] [];;
val haar : float list -> float list = <fun>

For example, the Haar wavelet transform of the sequence (1,2,3,4,-4,-3,-2,-1) is the more redundant sequence (0,20,4,4,-1,-1,-1,-1):

# haar [1.; 2.; 3.; 4.; -4.; -3.; -2.; -1.];;
- : float list = [0.; 20.; 4.; 4.; -1.; -1.; -1.; -1.]

The aux function, nested inside the haar function, implements the transform by tail recursively taking pairs of elements off the input list and prepending the sum and difference of each pair onto two internal lists called s and d, respectively. When the input is exhausted, the process is repeated using the list of sums of pairs as the new input. Finally, when the input contains only a single element, the result is obtained by prepending this element (the total sum) onto the list of differences. This algorithm is difficult to describe any other way.

The inverse transform may be written:


# let ihaar =
    let rec aux l s d = match l, s, d with
        l, [], [] -> l
      | s, [], d -> aux [] s d
      | t, h1::s, h2::d -> aux (0.5 *. (h1 +. h2) :: 0.5 *. (h1 -. h2) :: t) s d
      | _ -> invalid_arg "ihaar" in
    function [] -> [] | s::d -> aux [] [s] d;;

# ihaar [0.; 20.; 4.; 4.; -1.; -1.; -1.; -1.];;
- : float list = [1.; 2.; 3.; 4.; -4.; -3.; -2.; -1.]
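A single level of the transform may be easier to follow in isolation. The following sketch is not the haar function above: the hypothetical haar_step keeps the sums and differences in their original order and is not tail recursive, which makes the pairing easier to see.

```ocaml
(* One level of the pyramid: sums and differences of adjacent pairs,
   returned in the order in which the pairs appear. *)
let rec haar_step = function
  | [] -> [], []
  | h1 :: h2 :: t ->
      let s, d = haar_step t in
      (h1 +. h2) :: s, (h1 -. h2) :: d
  | [_] -> invalid_arg "haar_step: odd length"

let () =
  let s, d = haar_step [1.; 2.; 3.; 4.; -4.; -3.; -2.; -1.] in
  let show l = String.concat "; " (List.map string_of_float l) in
  Printf.printf "sums  = [%s]\n" (show s);
  Printf.printf "diffs = [%s]\n" (show d)
```

Applying the same step again to the list of sums, and so on until one sum remains, yields the full transform.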

We shall now describe and formulate the fundamentals of wavelet transforms in order to put these functions into context.

All wavelet transforms consider their input (taken to be a function of time) in terms of oscillating functions (wavelets) which are localised in terms of both time and frequency. Specifically, wavelet transforms compute the inner product of the input with child wavelets which are translated dilates of a single, mother wavelet. As the mother wavelet is both temporally and spectrally localised, the child wavelets (as dilated translates) are scattered over the time-frequency plane. Thus, the wavelet transform of a signal conveys both temporal and spectral content simultaneously. This property is the foundation of the utility of wavelets.

Discrete wavelet transforms of a length n input restrict the translation and dilation parameters to n discrete values. Typically, the mother wavelet is defined such that the resulting child wavelets form an orthogonal basis. In 1988, Ingrid Daubechies introduced a particularly elegant construction which allows progressively finer scale child wavelets to be derived via a recurrence relation [23]. This formulation restricts the wavelet to a finite width, a property known as compact support. In particular, the pyramidal algorithm [24, 25] implementing Daubechies' transform (used by the above functions) requires only O(n) time complexity, even faster than the FFT. The Haar wavelet transform is the simplest such wavelet transform and the haar function above implements all of these features.
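The linear cost quoted above can be checked by counting pair operations, as a rough sketch that assumes each level halves its input as the haar function does. For $n = 2^p$, the first level processes $n/2$ pairs, the second $n/4$, and so on down to a single pair:

$$\sum_{k=1}^{p} \frac{n}{2^k} = n \left(1 - 2^{-p}\right) = n - 1$$

Each pair costs one addition and one subtraction, so the whole transform uses $2(n - 1)$ floating-point operations, which is O(n), compared with O(n log n) for the FFT.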

The Haar wavelet transform is our last example to demonstrate the remarkable expressiveness of OCaml.

Bibliography

[1] E. Chailloux, P. Manoury, and B. Pagano, Developing applications with Objective Caml. Cambridge, England: O'Reilly, 2000.

[2] X. Leroy, D. Doligez, J. Garrigue, D. Remy, and J. Vouillon, The Objective Caml system. 2004.

[3] C. A. Gunter and J. C. Mitchell, Theoretical Aspects of Object-Oriented Programming. Boston, MA, USA: MIT Press, 1994.

[4] M. Abadi and L. Cardelli, A Theory of Objects. New York, USA: Springer-Verlag, 1996.

[5] T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein, Introduction to algorithms. Cambridge, MA, USA: MIT Press, 2001.

[6] D. Frenkel and B. Smit, Understanding Molecular Simulation: From Algorithms to Applications. New York, USA: Academic Press, 1996.

[7] W. Rankin and J. Board, "A portable distributed implementation of the parallel multipole tree algorithm," in IEEE Symposium on High Performance Distributed Computing, (Los Alamitos), pp. 17-22, IEEE Computer Society Press, 1995.

[8] D. E. Knuth, The Art of Computer Programming. Boston, MA, USA: Addison Wesley, 1997.

[9] J. Shewchuk, "Adaptive precision floating-point arithmetic and fast robust geometric predicates," Discrete & Computational Geometry, vol. 18, no. 3, pp. 305-363, 1997.

[10] D. Shreiner, M. Woo, J. Neider, and T. Davis, OpenGL Programming Guide: The Official Guide to Learning OpenGL, Version 1.4. Harlow, England: Addison Wesley, 2004.

[11] E. Anderson, Z. Bai, C. Bischof, S. Blackford, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, and D. Sorensen, LAPACK Users' Guide. Philadelphia, USA: Society for Industrial and Applied Mathematics, third ed., 1999.

[12] W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery, Numerical Recipes in C: The Art of Scientific Computing. Cambridge, UK: Cambridge University Press, 1992.

[13] S. Kugler, L. Pusztai, L. Rosta, P. Chieux, and R. Bellissent, "The structure of evaporated pure amorphous silicon: neutron diffraction and reverse Monte Carlo investigations," Phys. Rev. B, vol. 48, p. 7685, 1993.


[14] J.-P. Hansen and I. R. McDonald, Theory of Simple Fluids. New York, USA: Academic Press, 1990.

[15] N. Mousseau and G. T. Barkema, "Traveling through potential energy landscapes of disordered materials: the activation-relaxation technique," Phys. Rev. E, vol. 57, p. 2419, 1998.

[16] S. R. Elliott, The physics and chemistry of solids. New York, USA: John Wiley & Sons, 2000.

[17] D. S. Franzblau, "Computation of ring statistics for network models of solids," Phys. Rev. B, vol. 44, no. 10, pp. 4925-4930, 1991.

[18] G. T. Barkema and N. Mousseau, "High-quality continuous random networks," Phys. Rev. B, vol. 62, p. 4985, 2000.

[19] J. D. Harrop, Structural properties of amorphous materials. PhD thesis, Cambridge University, UK, 2004.

[20] S. Gasiorowicz, Quantum physics. London, England: John Wiley and Sons, 2003.

[21] T. A. Brody, J. Flores, J. B. French, P. A. Mello, A. Pandey, and S. S. M. Wong, "Random matrix physics - spectrum and strength fluctuations," Rev. Mod. Phys., vol. 53, p. 385, 1981.

[22] P. A. Lee and T. V. Ramakrishnan, "Disordered electronic systems," Rev. Mod. Phys., vol. 57, p. 287, 1985.

[23] I. Daubechies, "Orthonormal bases of compactly supported wavelets," Comm. Pure Appl. Math., vol. 41, p. 909, 1988.

[24] S. Mallat, "Multiresolution approximations and wavelet orthonormal bases of $L^2(\mathbb{R})$," Transactions of the American Mathematical Society, vol. 315, no. 1, pp. 69-87, 1989.

[25] S. G. Mallat, "A theory for multiresolution signal decomposition: the wavelet representation," IEEE Trans. on Patt. Anal. and Mach. Intel., vol. 11, no. 7, pp. 674-693, 1989.

Appendix A

Advanced Topics

A.1 Data sharing

An important concept which sometimes arises in OCaml is the ability to share mutable data between several data structures. This can be achieved using double indirection, e.g. a reference to a reference to the shared data.

For example, this creates a (trivial) imperative data structure which we wish to share:

# let data = ref 3;;
val data : int ref = {contents = 3}

Two different data structures may then reference data:

# let a = ref data;;
val a : int ref ref = {contents = {contents = 3}}
# let b = ref data;;
val b : int ref ref = {contents = {contents = 3}}

The original data is now shared between a and b (illustrated in figure A.1). Therefore, the contents of a and of b may be altered simultaneously by altering data:

# data := 4;;
- : unit = ()
# a;;
- : int ref ref = {contents = {contents = 4}}
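The same behaviour can be checked in a compiled program. A minimal sketch, which also shows that pointing a at a fresh reference ends the sharing for a only; the values 10 and 5 are arbitrary.

```ocaml
let data = ref 3
let a = ref data
let b = ref data

let () =
  data := 4;
  (* Both a and b see the update because both refer to the same data. *)
  Printf.printf "!(!a) = %d, !(!b) = %d\n" !(!a) !(!b);
  (* Pointing a at a fresh reference breaks the sharing for a only. *)
  a := ref 10;
  data := 5;
  Printf.printf "!(!a) = %d, !(!b) = %d\n" !(!a) !(!b)
```

Physical equality, written ==, can be used to test whether two references are the same object, as opposed to merely holding equal contents.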

data 3

Figure A.I: By sharing a reference data to the value 3 between two references a and b,the value 3 may be shared.


The ability to share data in this way is useful in many circumstances.
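As a minimal sketch of why the second level of indirection matters, the outer references can be redirected independently whilst the inner reference remains shared. The sequence of updates and the names seen_by_a, seen_by_b, a_after and b_after are illustrative assumptions rather than part of the original example:

```ocaml
(* The shared mutable cell and two references to it, as in the text. *)
let data = ref 3
let a = ref data
let b = ref data

(* Altering data is visible through both a and b. *)
let () = data := 4
let seen_by_a = !(!a)
let seen_by_b = !(!b)

(* Redirecting a to a fresh cell ends the sharing for a only. *)
let () = a := ref 10
let () = data := 5
let a_after = !(!a)   (* a now refers to its own cell *)
let b_after = !(!b)   (* b still follows data *)
```

Note that the dereference is written !(!a) because !! would be read as a single, undefined prefix operator.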

However, an important caveat arises from the sharing of doubly indirected references. The make function in the Array module creates an array with all elements set to the given value. Thus, if the given value is mutable (e.g. a reference) then, as mutable data structures themselves, the elements of the resulting array will all share the same data. For example, the following array will have all elements sharing a single reference to zero:

# let a = Array.make 3 (ref 0);;
val a : int ref array = [|{contents = 0}; {contents = 0}; {contents = 0}|]

Altering any element in the array then affects the value of all other elements in the array:

# a.(0) := 7; a;;
- : int ref array = [|{contents = 7}; {contents = 7}; {contents = 7}|]

This is usually not the desired behaviour. Solutions to this problem are given in section B.6.

A.2 Labelled and optional arguments

The OCaml language supports two unusual forms of function argument:

Labelled arguments allow function arguments to be named and then supplied in any order.

Optional arguments allow function arguments to be omitted, replaced either by a default value (specified in the definition of the function) or by an option type.

Labelled arguments use the syntax ~arg:var where arg is the name of the argument label and var is the name of the variable bound to this argument in the body of the function. For example, the ipow function may be written using named arguments:

# let rec ipow ~x:x ~n:n = if n=0 then 1. else x *. ipow ~x:x ~n:(n-1);;
val ipow : x:float -> n:int -> float = <fun>

The arguments to this function may be specified conventionally, ignoring the labelling:

# ipow 5. 3;;
- : float = 125.

Alternatively, the labelled arguments to this function may then be specified in any order when the function is called by referring to them by name:

# ipow ~n:3 ~x:5.;;
- : float = 125.

In particular, this facility may be used to create more specialised functions by specifying some of the arguments of more general functions in any order. In the case of ipow, we are more likely to want to raise to the power of a constant rather than to raise a constant to any given power. For example, we may wish to define a function to cube a given number:


# let pow3 = ipow ~n:3;;
val pow3 : x:float -> float = <fun>
# pow3 5.;;
- : float = 125.


The ability to name function arguments is of particular use when dealing with complicated interfaces to libraries, such as the interfaces to graphical libraries discussed in chapter 6.

Optional arguments are defined using the syntax ?(var=val) where var is the argument and variable name and val is the default value of this variable, used if no value is specified. When called, optional arguments are specified using the same syntax as labelled arguments. For example, the following function creates a vector from the given pair of values, each defaulting to zero if unspecified:

# let make_vec ?(x=0.) ?(y=0.) () = (x, y);;
val make_vec : ?x:float -> ?y:float -> unit -> float * float = <fun>

Applying this function with both, only one or none of its arguments results in the omitted coordinates being substituted with zero:

# make_vec ~x:1. ~y:2. ();;
- : float * float = (1., 2.)
# make_vec ~y:2. ();;
- : float * float = (0., 2.)
# make_vec ();;
- : float * float = (0., 0.)

In order to infer which optional arguments have not been specified, optional arguments must always be accompanied by a non-labelled argument. Hence the trailing value of type unit in the previous example. The compilers spot failure to specify a non-labelled argument and complain:

# let f ?(x=0.) = ();;
Warning: This optional argument cannot be erased
val f : ?x:float -> unit = <fun>

Optional arguments may also be specified without default values, in which case the corresponding variable in the function body becomes an option type. This is best illustrated by a function which returns the argument it receives:

# let f ?x () = x;;
val f : ?x:'a -> unit -> 'a option = <fun>

Note that the return type of this function is 'a option, rather than simply 'a.

Specifying the optional argument results in a Some value being passed to the function:

# f ~x:5 ();;
- : int option = Some 5


Omitting the optional argument results in None being passed to the function:

# f ();;
- : 'a option = None

A short-hand notation exists for function definitions and calls with labelled arguments when the argument or variable name is the same as that of the label. In such cases, the argument or variable name and preceding colon may be omitted, i.e. ~x:x may be written ~x.

Optional arguments are particularly useful in the context of interfaces, such as libraries implementing graphical user interfaces, where optional arguments allow full functionality to be accessible whilst providing a simpler alternative for specifying common subsets of function arguments. This is used to good effect in the glut bindings for OCaml, described in chapter 6.
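Labelled and optional arguments combine naturally in numerical interfaces. The following is only a sketch under stated assumptions: the function name integrate, its labels lo and hi, the optional steps and its default of 1000 are all hypothetical, and the trapezium rule is chosen purely for brevity:

```ocaml
(* Hypothetical helper: trapezium-rule integration of f over [lo, hi].
   ?steps is optional (default 1000); ~lo and ~hi are labelled and may be
   supplied in either order. The integrand f is the trailing non-labelled
   argument, which allows an omitted ?steps to be resolved. *)
let integrate ?(steps = 1000) ~lo ~hi f =
  let h = (hi -. lo) /. float_of_int steps in
  let sum = ref (0.5 *. (f lo +. f hi)) in
  for i = 1 to steps - 1 do
    sum := !sum +. f (lo +. float_of_int i *. h)
  done;
  !sum *. h

(* Default number of steps, labels given out of order. *)
let area = integrate ~hi:1. ~lo:0. (fun x -> x *. x)

(* Overriding the optional argument. *)
let coarse = integrate ~steps:10 ~lo:0. ~hi:1. (fun x -> x *. x)
```

Both calls approximate the integral of x squared over the unit interval, which is 1/3; the coarse variant simply trades accuracy for fewer evaluations of the integrand.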

A.3 Defining binary infix operators

In the context of scientific computing, data types are often used to represent mathematical objects, such as vectors, matrices, quaternions and hyper-complex numbers. However, mathematical expressions written in terms of function calls are obfuscated compared to conventional mathematical notation.

As we have already seen, the arithmetic binary infix operators may be referred to as conventional functions by enclosing them in parentheses. For example, integer addition:

# ( + );;
- : int -> int -> int = <fun>
# ( + ) 3 4;;
- : int = 7

Similarly, binary infix operators may be defined by writing the function name in this syntax. For example, a += operator which increments and returns an int ref:

# let ( += ) a b = a := !a + b; a;;
val ( += ) : int ref -> int -> int ref = <fun>
# (ref 3) += 4;;
- : int ref = {contents = 7}

A handy trick when defining infix operators over a type is to define the infix operators in a module called Infixes which is nested within the module which defines the type and associated functions. This allows the namespace of the Name.Infixes module to be opened, providing access to the infix operators without providing access to other functions implemented by the module. This trick is exploited by the GMP bindings for arbitrary-precision integer arithmetic (Gmp.Z.Infixes).
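The nested-module trick can be sketched as follows. The module Vec2 and the operators +|, *| and |*| are hypothetical names invented for this illustration; they are not taken from the GMP bindings or any other library:

```ocaml
(* Hypothetical 2D vector module with its operators in a nested module. *)
module Vec2 = struct
  type t = { x : float; y : float }

  let make x y = { x = x; y = y }
  let add a b = { x = a.x +. b.x; y = a.y +. b.y }
  let scale s a = { x = s *. a.x; y = s *. a.y }
  let dot a b = a.x *. b.x +. a.y *. b.y

  (* Only the operators live here, so opening Infixes does not bring
     make, add, scale or dot into scope. *)
  module Infixes = struct
    let ( +| ) = add      (* vector addition *)
    let ( *| ) = scale    (* scalar times vector *)
    let ( |*| ) = dot     (* scalar product *)
  end
end

open Vec2.Infixes

let u = Vec2.make 1. 2.
let v = Vec2.make 3. 4.
let w = 2. *| u +| v      (* parsed as (2. *| u) +| v *)
let d = u |*| v
```

The precedence of a user-defined operator is determined by its leading character, which is why *| binds more tightly than +| here.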


A.4 Installing top-level pretty printers


In an earlier section we created a FloatRange module for handling ranges [l, u). However, when playing with this module from the top-level, values of type FloatRange.t are printed <abstr> as the type t is abstract:

# let a = FloatRange.make 1. 3. and b = FloatRange.make 2. 5.;;
val a : FloatRange.t = <abstr>
val b : FloatRange.t = <abstr>

The ability to print such values in the form [l, u) would be very useful.

This can be achieved by installing a custom pretty printer in the top-level, for printing any values of the type FloatRange.t. We shall begin by defining a function capable of converting such a value to a string:

# let string_of_floatrange r = match FloatRange.to_pair r with
    (l, u) -> "[" ^ string_of_float l ^ ", " ^ string_of_float u ^ ")";;
val string_of_floatrange : FloatRange.t -> string = <fun>

A function to print a range to a given format stream¹ may then be written:

# let print_floatrange f r = Format.fprintf f "%s" (string_of_floatrange r);;
val print_floatrange : Format.formatter -> FloatRange.t -> unit = <fun>

This function may then be installed as the top-level pretty printer for values of the type FloatRange.t:

# #install_printer print_floatrange;;

For example:

# let a = FloatRange.make 1. 3. and b = FloatRange.make 2. 5.;;
val a : FloatRange.t = [1., 3.)
val b : FloatRange.t = [2., 5.)

This functionality can be useful in many circumstances, particularly when dealing with mathematical constructs.
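A printer of this type is also usable outside the top-level, via the %a conversion of the Format module. The sketch below is self-contained only because it supplies a hypothetical stand-in for FloatRange (with assumed make and to_pair functions); the printer formats the bounds directly with %g rather than going through string_of_float:

```ocaml
(* Hypothetical stand-in for the FloatRange module, for illustration only. *)
module FloatRange : sig
  type t
  val make : float -> float -> t
  val to_pair : t -> float * float
end = struct
  type t = float * float
  let make l u = (l, u)
  let to_pair r = r
end

(* A printer with the type expected by #install_printer and by %a. *)
let print_floatrange f r =
  let l, u = FloatRange.to_pair r in
  Format.fprintf f "[%g, %g)" l u

(* The same printer reused to build a string in compiled code. *)
let s = Format.asprintf "%a" print_floatrange (FloatRange.make 1. 3.)
```

Writing printers against Format.formatter, rather than returning strings, means that one definition can serve the top-level, log output and string construction alike.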

A.5 Monomorphism

Although we have previously dedicated little discussion to the topic of monomorphic types, such as '_a, we have occasionally presented code which has been deemed to contain monomorphic types by OCaml. For example, the type of an empty hash table, as discussed in section 3.5:

¹ The Format module is described in the manual [2].


# Hashtbl.create 1;;

- : ('_a, '_b) Hashtbl.t = <abstr>


Monomorphic types can also appear as the result of performing an η-reduction, when a closure is formed by the application of some of a polymorphic function's arguments. For example, when applying the first argument to the List.map2 function:

# let combine a b = List.map2 (fun a b -> (a, b)) a b;;
val combine : 'a list -> 'b list -> ('a * 'b) list = <fun>
# let combine = List.map2 (fun a b -> (a, b));;
val combine : '_a list -> '_b list -> ('_a * '_b) list = <fun>

They can even appear as the result of seemingly trivial expressions. For example, although the list containing the empty array is polymorphic, the array containing the empty list is monomorphic:

# [[||]];;
- : 'a array list = [[||]]
# [|[]|];;
- : '_a list array = [|[]|]

The appearance of monomorphic types can be something of a mystery. Firstly note that, whereas a polymorphic type 'a denotes an expression which is valid for all types 'a, a monomorphic type '_a denotes an expression which is valid for some specific type '_a. Consequently, a monomorphic type will be ossified into a specific type as soon as its type can be inferred.

Thus, monomorphism is often a result of mutability, as demonstrated by the last example. The empty array is not mutable. Therefore, the list containing the empty array is not mutable. Consequently, the type of the list containing the empty array is truly polymorphic. In contrast, the array containing the empty list is mutable - the empty list could be replaced by a list of elements of some particular type.

This appearance of monomorphic types can be undesirable. For example, the η-reduced combine function was, most likely, intended to be a polymorphic function. Fortunately, monomorphic types can be generalised to polymorphic types by wrapping the offending expression in a polymorphic function (η-expansion).
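The effect of η-expansion can be seen by using the expanded function at two different types, which the η-reduced definition would not permit once its type had been fixed by its first use. The values ints and mixed below are illustrative names only:

```ocaml
(* Eta-expanded: the arguments a and b are written out explicitly, so the
   definition is a function and its type is fully generalised. *)
let combine a b = List.map2 (fun a b -> (a, b)) a b

(* Because combine is polymorphic, both uses are accepted. *)
let ints = combine [1; 2] [3; 4]
let mixed = combine ["x"; "y"] [1.5; 2.5]

(* The eta-reduced form
     let combine = List.map2 (fun a b -> (a, b))
   would be given a monomorphic type, which the first application would
   fix, so the second application would be rejected by the type checker. *)
```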

A.6 Functors

As we have seen, functors act as functions which map modules to modules, such as the Set.Make and Map.Make functors introduced in chapter 3. In the case of the Set and Map modules, a functor is used to enforce the correct use of operations between sets and maps, such as set unions.

The functionality of Set and Map could have been provided without functors:

• Each of the member functions could have been made to accept the comparison function as an argument. However, users of Set and Map could then accidentally pass the wrong comparison function, resulting in unexpected and undefined behaviour.


• The functor could be replaced by a higher-order function which accepts the comparison function and returns a record containing all of the member functions. Sharing the comparison function improves safety. However, the resulting records are then only distinguished by their type. Therefore, the compiler would not ensure that functions from one record were not accidentally applied to data from another record which happened to have the same type, e.g. when the records represent sets of integers with different comparison functions. Such a mistake would also result in unexpected and undefined behaviour.

Thus, the use of functors in the Set and Map modules improves safety.
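The safety argument above can be sketched with two integer sets which differ only in their comparison function. The module names Ascending and Descending are hypothetical, chosen for this illustration:

```ocaml
(* Two instantiations of Set.Make over int with opposite orderings. *)
module Ascending = Set.Make (struct
  type t = int
  let compare (a : int) (b : int) = compare a b
end)

module Descending = Set.Make (struct
  type t = int
  let compare (a : int) (b : int) = compare b a
end)

let up = Ascending.of_list [3; 1; 2]
let down = Descending.of_list [3; 1; 2]

(* Each set enumerates its elements according to its own ordering. *)
let up_elements = Ascending.elements up
let down_elements = Descending.elements down

(* Ascending.union up down is rejected at compile time: Ascending.t and
   Descending.t are distinct abstract types, even though both represent
   sets of integers. *)
```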

Functors can be used not only to add safety assurance but also to provide a form of specialisation. For example, a particle simulator may be written as a functor which maps a module representing a particle onto a module representing a simulation of such particles.

A.7 Memoization

Caching is a productive way to optimise a program or function. In many cases, when the result of a function depends only upon its argument values and the function produces no side-effects, a mapping from arguments to the resulting return value can be used to cache the effect of the function. This is known as memoization.

For example, the following function fib computes the nth Fibonacci number:

# let rec fib n = if n < 3 then 1 else fib (n - 2) + fib (n - 1);;
val fib : int -> int = <fun>
# Array.init 10 (fun i -> fib (i + 1));;
- : int array = [|1; 1; 2; 3; 5; 8; 13; 21; 34; 55|]

This implementation is quite slow, even for small n. A higher-order timer function can be used to measure the performance of different implementations of the fib function:

# let time f x =
    let t = Sys.time () in let fx = f x in fx, Sys.time () -. t;;
val time : ('a -> 'b) -> 'a -> 'b * float = <fun>

For example, using this fib function to compute fib 35 takes 2.26 seconds:

# time fib 35;;
- : int * float = (9227465, 2.26)

An important cause of the inefficiency of this function is the lack of reuse of previous results. This can be addressed by caching the return value of the fib function for each argument value n in a hash table memory:


# let rec cached_fib =
    let memory = Hashtbl.create 1 in
    fun n ->
      try Hashtbl.find memory n
      with Not_found ->
        let fn =
          if n < 3 then 1 else cached_fib (n - 2) + cached_fib (n - 1) in
        Hashtbl.add memory n fn;
        fn;;
val cached_fib : int -> int = <fun>

Caching the effect of the recursive fib function greatly improves its performance, to the extent that the previous benchmark now takes an immeasurably small time to execute:

# time cached_fib 35;;
- : int * float = (9227465, 0.)

Moreover, the process of caching the effect of a recursive single-argument function, such as the fib function, can be factored out into a higher-order function. However, this is non-trivial in the case of recursive functions as they must call the memoized version of themselves and not the original unmemoized version. This requires the fib function to be rewritten in the form of a higher-order function which accepts the function f which it is to call as an argument:

# let fib' f n = if n<3 then 1 else f (n - 2) + f (n - 1);;
val fib' : (int -> int) -> int -> int = <fun>

Functions such as this can then be memoized using the higher-order memoize1 function:

# let memoize1 f =
    let cache = Hashtbl.create 1 in
    let rec f' n =
      try Hashtbl.find cache n
      with Not_found -> (fun fn -> Hashtbl.add cache n fn; fn) (f f' n) in
    f';;
val memoize1 : (('a -> 'b) -> 'a -> 'b) -> 'a -> 'b = <fun>

For example, a memoized variant of the original fib function may be created by simply applying the memoize1 function to the unmemoized fib' function:

# let memoized_fib = memoize1 fib';;
val memoized_fib : int -> int = <fun>
# time memoized_fib 35;;
- : int * float = (9227465, 0.)

The memoize1 function may be productively used to memoize many other functions.
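As one hedged illustration of such reuse, a function of two arguments can be memoized by passing the arguments as a single tuple, which then serves as the hash table key. The binomial' and binomial functions are hypothetical examples introduced here, and memoize1 is restated (with the anonymous function replaced by an equivalent let binding) so that the sketch is self-contained:

```ocaml
(* memoize1 as in the text, restated for self-containment. *)
let memoize1 f =
  let cache = Hashtbl.create 1 in
  let rec f' n =
    try Hashtbl.find cache n
    with Not_found ->
      let fn = f f' n in
      Hashtbl.add cache n fn;
      fn in
  f'

(* Hypothetical example: binomial coefficients via Pascal's rule, written
   in open-recursive style. The pair (n, k) is the single argument. *)
let binomial' self (n, k) =
  if k = 0 || k = n then 1
  else self (n - 1, k - 1) + self (n - 1, k)

let binomial = memoize1 binomial'

let small = binomial (10, 3)
let large = binomial (30, 15)
```

Without memoization the second call would recompute the same sub-problems many times over; with it, each pair (n, k) is evaluated once.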


A.8 Polymorphic variants


The OCaml language allows the use of a special type known as a polymorphic variant, the names of which are denoted by an initial back-tick, e.g. `Stomp. The utility of polymorphic variants lies in their typing.

Polymorphic variants are used in the same way as variant types:

# let extract1 = function
    `None -> 0
  | `Some a -> a;;
val extract1 : [< `None | `Some of int ] -> int = <fun>

Note that there was no need to declare the polymorphic variants `None and `Some. Instead, the type [< `None | `Some of int ], attributed to the argument of the extract1 function, denotes any subset of the set containing these two types. For example:

# let a = [ `None; `Some 1; `Some 2 ];;
val a : [> `None | `Some of int ] list = [`None; `Some 1; `Some 2]
# List.map extract1 a;;
- : int list = [0; 1; 2]

Writing this function in a different style results in a different inferred type for the polymorphic variant argument:

# let extract2 = function
    `Some a -> a
  | _ -> 0;;
val extract2 : [> `Some of int ] -> int = <fun>

In this case, the type [> `Some of int ] denotes any superset of the set containing `Some.

For example:

# let a = [ `None; `Some 3; `Other ];;
val a : [> `None | `Other | `Some of int ] list =
  [`None; `Some 3; `Other]
# List.map extract2 a;;
- : int list = [0; 3; 0]

Polymorphic variants can be said to "weaken the type system" as they allow a wider range of potentially incorrect uses compared to conventional variants. However, polymorphic variants can be used in many productive ways. Most simply, polymorphic variants can be used to evade the verbosity associated with namespaces. For example, the lablGL bindings to OpenGL (described in chapter 6) use polymorphic variants to refer to the considerable number of enumerated values used by OpenGL.

Some discussion of the implementation and use of polymorphic variants is given in the literature [2].


A.9 Phantom types

Thus far, we have only examined straightforward use of the OCaml type system. This type system is clearly very sophisticated, automating a great many tedious and error-prone chores. In addition to simple use, the type system can be exploited in some non-trivial ways.

In particular, types may be made to include constraints which are then statically verified by the compiler. As OCaml is statically typed, the process of verifying the correct use of the constraints is performed at compile-time. Consequently, these methods do not incur any run-time performance cost.

For example, when writing a program which deals with both sorted and unsorted lists, a module can be written which implements sorted lists, verifying their correct use at compile time:

# module SortedList : sig
    type ('a, 'b) t
    val sorted_of : 'a list -> ('a, [ `Up ]) t
    val rev_up : ('a, [ `Up ]) t -> ('a, [ `Down ]) t
    val rev_down : ('a, [ `Down ]) t -> ('a, [ `Up ]) t
    val list_of : ('a, [< `Up | `Down ]) t -> 'a list
  end = struct
    type ('a, 'b) t = 'a list
    let sorted_of l = List.sort compare l
    let rev_up l = List.rev l and rev_down l = List.rev l
    let list_of l = l
  end;;
module SortedList :
  sig
    type ('a, 'b) t
    val sorted_of : 'a list -> ('a, [ `Up ]) t
    val rev_up : ('a, [ `Up ]) t -> ('a, [ `Down ]) t
    val rev_down : ('a, [ `Down ]) t -> ('a, [ `Up ]) t
    val list_of : ('a, [< `Up | `Down ]) t -> 'a list
  end

In this case, sorting a list of type 'a list using the SortedList.sorted_of function produces a list of type ('a, [ `Up ]) SortedList.t:

# let l = SortedList.sorted_of [1; 3; 2; 8; 6; 9];;
val l : (int, [ `Up ]) SortedList.t = <abstr>
# SortedList.list_of l;;
- : int list = [1; 2; 3; 6; 8; 9]

The phantom type [ `Up ] in the type of l is used to indicate that the elements in the list are in non-descending order. Applying the SortedList.rev_up function to l produces a list in descending order:


# let rl = SortedList.rev_up l;;
val rl : (int, [ `Down ]) SortedList.t = <abstr>
# SortedList.list_of rl;;
- : int list = [9; 8; 6; 3; 2; 1]


Again, the phantom type [ `Down ] is used to convey how the list is sorted, in this case in non-ascending order.

The OCaml type system will enforce the appropriate use of functions restricted by their phantom type. For example, trying to apply the rev_up function to a down sorted list results in a type error caught at compile-time:

# SortedList.rev_up rl;;
This expression has type (int, [ `Down ]) SortedList.t
but is here used with type (int, [ `Up ]) SortedList.t
These two variant types have no intersection

Thus, phantom types may well be useful in the context of scientific computing. For example, to enforce the correct use of different types of similar mathematical objects, such as row and column vectors.
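The row and column vector suggestion might be sketched as follows. Everything in this block is a hypothetical illustration: the module name Vec, the tags `Row and `Col, and the functions row, col, transpose_row, transpose_col and dot are assumptions made for the example rather than an existing interface:

```ocaml
(* Hypothetical vectors tagged with a phantom orientation. *)
module Vec : sig
  type 'orientation t
  val row : float array -> [ `Row ] t
  val col : float array -> [ `Col ] t
  val transpose_row : [ `Row ] t -> [ `Col ] t
  val transpose_col : [ `Col ] t -> [ `Row ] t
  (* The scalar product is only defined for a row times a column. *)
  val dot : [ `Row ] t -> [ `Col ] t -> float
end = struct
  (* The orientation parameter is unused in the representation. *)
  type 'orientation t = float array
  let row a = Array.copy a
  let col a = Array.copy a
  let transpose_row a = a
  let transpose_col a = a
  let dot r c =
    if Array.length r <> Array.length c then invalid_arg "Vec.dot";
    let sum = ref 0. in
    Array.iteri (fun i x -> sum := !sum +. x *. c.(i)) r;
    !sum
end

let r = Vec.row [| 1.; 2.; 3. |]
let c = Vec.col [| 4.; 5.; 6. |]
let d = Vec.dot r c

(* Vec.dot r r is rejected at compile time; a transpose is required. *)
let norm2 = Vec.dot r (Vec.transpose_row r)
```

As with SortedList, the tags exist only in the types, so the run-time representation is a plain float array and the checks cost nothing at run time.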

A.10 Exponential type growth

Surprisingly, the ML type inference algorithm has exponential worst-case complexity. In particular, the inferred type of an expression may grow exponentially with the size of the expression. For example, the following nested expressions result in a type which grows exponentially in complexity for each repeated nesting:

# let x x y z = z x y in
  let x y = x (x y) in
  let x y = x (x y) in
  let x y = x (x y) in
  let x y = x (x y) in
  x;;
- : 'a -> 'b -> (('c -> (('d -> (('e -> (('f -> (('g -> (('h -> (('i -> (('j -> (('k -> (('l -> (('m -> (('n -> (('o -> (('p -> (('q -> ('a -> 'q -> 'r) -> 'r) -> 'p -> 's) -> 's) -> 'o -> 't) -> 't) -> 'n -> 'u) -> 'u) -> 'm -> 'v) -> 'v) -> 'l -> 'w) -> 'w) -> 'k -> 'x) -> 'x) -> 'j -> 'y) -> 'y) -> 'i -> 'z) -> 'z) -> 'h -> 'a1) -> 'a1) -> 'g -> 'b1) -> 'b1) -> 'f -> 'c1) -> 'c1) -> 'e -> 'd1) -> 'd1) -> 'd -> 'e1) -> 'e1) -> 'c -> 'f1) -> 'f1) -> 'b -> 'g1) -> 'g1 = <fun>

Exponential types are totally useless but fun.


Appendix B

Troubleshooting

This appendix describes and solves some non-trivial problems typically encountered when learning the OCaml language.

B.1 Dangerous if

The if construct is easily misused in the absence of sufficient bracketing. Specifically, many functions begin with preliminary tests, such as sanity checks, which are often written as single-line if constructs such as:

# let rec factorial n =
    if n=0 then 1 else
    n * factorial (n-1);;
val factorial : int -> int = <fun>
# factorial 5;;
- : int = 120

In this case, the expression following the else may be equivalently bracketed in begin and end:

# let rec factorial n =
    if n=0 then 1 else begin
      n * factorial (n-1)
    end;;
val factorial : int -> int = <fun>
# factorial 5;;
- : int = 120

However, the latter form is more robust to seemingly simple alterations. For example, an incorrect attempt to supplement the termination of the function with some printed, debugging output:

# let rec factorial n =
    if n=0 then print_endline "Finished!"; 1 else
    n * factorial (n-1);;
Syntax error


In this case, the language is trying to parse this input as the nonsensical:

let rec factorial n =
  if n=0 then print_endline "Finished!";
  1

Bracketing can save the day:

# let rec factorial n =
    if n=0 then (print_endline "Finished!"; 1) else
    n * factorial (n-1);;
val factorial : int -> int = <fun>
# factorial 5;;
Finished!
- : int = 120

More confusingly, the grammar of the OCaml language is also eager to bind the first valid expression after an else. An incorrect attempt to print debugging information after the else is caught as a type error:

# let rec factorial n =
    if n=0 then 1 else
    print_endline "Working...";
    n * factorial (n-1);;
This expression has type unit but is here used with type int

In this case, the language has parsed this input as an attempt to present the result of the expression print_endline "Working...", which is of type unit, as an alternative in the if expression to the 1 of type int.

Such mistakes will not always be caught by the type checker. The following example is supposed to print out whether or not the given integer is an integer power of two, incrementing a counter if it was not:

# let counter = ref 0;;
val counter : int ref = {contents = 0}
# let ipow_of_2 i =
    if i land (i - 1) = 0 then print_endline "yes" else
    incr counter;
    print_endline "no";;
val ipow_of_2 : int -> unit = <fun>

Applying this function to a value which is not an integer power of two produces the expected response:

# ipow_of_2 3;;
no
- : unit = ()


However, when applied to an integer power of two, the ipow_of_2 function appears somewhat indecisive:

# ipow_of_2 4;;
yes
no
- : unit = ()

In fact, the function is incorrectly printing "no" in all cases because only the incr counter expression has been associated with the else of the if construct, i.e. the function was parsed as:

# let ipow_of_2 i =
    if i land (i - 1) = 0 then
      print_endline "yes"
    else
      incr counter;
    print_endline "no";;
val ipow_of_2 : int -> unit = <fun>

This problem is easily fixed by inserting correct bracketing:

# let ipow_of_2 i =
    if i land (i - 1) = 0 then print_endline "yes" else begin
      incr counter;
      print_endline "no"
    end;;
val ipow_of_2 : int -> unit = <fun>

Giving:

# ipow_of_2 3;;
no
- : unit = ()
# ipow_of_2 4;;
yes
- : unit = ()

as expected.

In general this problem can be avoided by adhering to a simple guideline: bracket all non-trivial expressions appearing after the then or else keywords. Notably, this guideline may be broken when the expression raises an exception and, therefore, control will never propagate beyond this point.

B.2 Scoping subtleties

The region of a program in which a bound variable name may be referred to is known as the scope of the variable. Although scoping is relatively simple in the context of imperative languages, the presence of locally defined functions can make things more interesting in OCaml. For example, the following function creates and returns a function for raising to the power of twice the given number:


# let f y =
    let z = 2. *. y in
    (fun x -> x ** z);;
val f : float -> float -> float = <fun>


In this case, the variable z is used in the λ-function result.

This can be a source of confusion and errors when nested functions use the same or similar variable names. Slight mistakes can then result in the wrong variable being used in a given context. As the typing of the program is likely to be correct, the compiler is unlikely to pick up such mistakes.
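A small sketch of how reused names interact with closures: a function captures the binding which is in scope where the function is defined, and a later definition of the same name creates a new binding rather than altering the old one. The names pow_z and pow_new_z are illustrative assumptions:

```ocaml
(* The first binding of z. *)
let z = 2.

(* pow_z captures the binding of z that is in scope here, i.e. 2. *)
let pow_z x = x ** z

(* A new binding shadows the old z; it does not modify it. *)
let z = 3.

(* pow_new_z captures the new binding, i.e. 3. *)
let pow_new_z x = x ** z

let four = pow_z 2.       (* still raises to the power 2 *)
let eight = pow_new_z 2.  (* raises to the power 3 *)
```

Both functions have the type float -> float, so confusing one for the other is exactly the kind of mistake which the type checker cannot detect.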

B.3 Evaluation order

Unlike in many other languages, arguments are evaluated in an unspecified order in OCaml. In the context of purely functional programs this makes no difference as expressions are independent. In the context of imperative programs, this can affect the result. Consequently, programmers must strive to avoid the temptation of assuming that arguments are evaluated in any particular order.

For example, the following mutable variable can be used to determine evaluation order:

# let x = ref 1;;
val x : int ref = {contents = 1}

The following function doubles its mutable argument:

# let double x = x := 2 * !x;;
val double : int ref -> unit = <fun>

Evaluating the following expression shows that, in this case, OCaml has chosen to evaluate the subexpressions in reverse order, incrementing x to 2, doubling it to 4 and storing it:

# (!x, double x, incr x);;
- : int * unit * unit = (4, (), ())

The order of evaluation of expressions may be guaranteed in a number of ways. In particular, the let ... in construct guarantees to evaluate the new definition before proceeding. For example, this extracts the current value of x (4) before doubling and incrementing it:

# let a = !x in let b = double x in (a, b, incr x);;
- : int * unit * unit = (4, (), ())

In the context of imperative programming, evaluation order is also guaranteed by compound expressions formed by the ; operator. For example, the following guarantees to double x before incrementing it:

# double x; incr x; !x;;
- : int = 19

In summary, programs should always be written in an evaluation-order independent manner. Short-circuit evaluated operators (i.e. && and ||) are notable exceptions as these operators are guaranteed to evaluate arguments in order, only as necessary.
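As a minimal sketch of the let ... in approach, consider a side-effecting function whose results are to be paired. The counter, next and ordered_pair names are hypothetical:

```ocaml
(* A hypothetical side-effecting function: returns 1, 2, 3, ... *)
let counter = ref 0
let next () = incr counter; !counter

(* Order-dependent: (next (), next ()) may yield either (1, 2) or (2, 1),
   because the order in which the components are evaluated is unspecified. *)

(* Order-independent: each definition is evaluated before the next
   is begun, so the first component is always the smaller. *)
let ordered_pair () =
  let a = next () in
  let b = next () in
  (a, b)

let pair = ordered_pair ()
```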


B.4 Constructor arguments


Multiple-argument variant-type constructors cannot have their arguments supplied in a tuple. This confusion arises because the syntax used to supply the arguments of a variant constructor looks like that of a tuple.

For example, in the context of the 2-argument constructor On:

# type button = Off | On of int * string;;
type button = Off | On of int * string

The following attempt to use the On constructor by supplying the two arguments as a 2-tuple mine fails:

# let mine = (1, "mine");;
val mine : int * string = (1, "mine")
# On mine;;
The constructor On expects 2 argument(s),
but is here applied to 1 argument(s)

This problem is easily circumvented by using a function to map a tuple to the variant type, such as:

# let button_of (i, s) = On (i, s);;
val button_of : int * string -> button = <fun>
# button_of mine;;
- : button = On (1, "mine")

Although the distinction between constructor arguments and tuples is comparatively uncommon, it can be a source of confusion.
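An alternative, where the type definition is under our control, is to parenthesise the argument type so that the constructor takes a single tuple. The type button2 and its constructors Off2 and On2 are hypothetical names used only to place the two declarations side by side:

```ocaml
(* Two arguments: On must be applied to a literal pair of arguments. *)
type button = Off | On of int * string

(* One argument which is itself a tuple: note the parentheses. *)
type button2 = Off2 | On2 of (int * string)

let mine = (1, "mine")

(* On mine is rejected, so the tuple is taken apart first. *)
let b1 = match mine with (i, s) -> On (i, s)

(* On2 accepts the tuple directly. *)
let b2 = On2 mine
```

The parenthesised form is more convenient when tuples are passed around whole, whereas the unparenthesised form avoids allocating a separate tuple inside each value.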

B.5 Recycled types

Type definitions are always treated uniquely by OCaml. Even if a type definition is identical to a previous definition, the language guarantees to treat it separately.

For example, the following defines two types (mytype1 and mytype2) and two values (a and b) of these types:

# type mytype1 = On;;
type mytype1 = On
# let a = On;;
val a : mytype1 = On
# type mytype2 = On;;
type mytype2 = On
# let b = On;;
val b : mytype2 = On

The values are clearly incomparable because they are of different types.


# a = b;;
This expression has type mytype2 but is here used with type mytype1

In this case, the error is entirely self-explanatory. However, the same situation arises even if the types have the same name, in which case the error message is more confusing:

# type mytype2 = On;;
type mytype2 = On
# let c = On;;
val c : mytype2 = On
# b = c;;
This expression has type mytype2 but is here used with type mytype2

Future versions of the OCaml compilers may produce a more useful error message in such cases but this is clearly worth remembering.

B.6 Mutable array contents

If an array is initialised using the Array.make function¹ with mutable contents then the contents will be shared between all elements in the array. This is usually not the desired effect. For example, the following creates an array containing 3 elements, all of which reference the same integer:

# let a = Array.make 3 (ref 0);;
val a : int ref array = [|{contents = 0}; {contents = 0}; {contents = 0}|]

Assigning to any element then affects all elements:

# a.(0) := 7; a;;
- : int ref array = [|{contents = 7}; {contents = 7}; {contents = 7}|]

This problem is most easily solved by using the Array.init function to create the array, specifying a function which returns a new reference at each invocation:

# let a = Array.init 3 (fun _ -> ref 0);;
val a : int ref array = [|{contents = 0}; {contents = 0}; {contents = 0}|]
# a.(0) := 7; a;;
- : int ref array = [|{contents = 7}; {contents = 0}; {contents = 0}|]

The λ-function creates a new reference to zero for each array element.

¹ Or, equivalently, the deprecated Array.create function.
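The same pitfall arises with arrays of arrays, since arrays are themselves mutable. The following sketch, using the illustrative names shared, independent and by_init, contrasts the shared form with two ways of obtaining independent rows:

```ocaml
(* Both rows are the same physical array. *)
let shared = Array.make 2 (Array.make 2 0)
let () = shared.(0).(0) <- 7
let leaked = shared.(1).(0)        (* the write is visible in row 1 *)

(* Array.make_matrix allocates a fresh row for each index. *)
let independent = Array.make_matrix 2 2 0
let () = independent.(0).(0) <- 7
let untouched = independent.(1).(0)

(* Array.init with a function creating each row has the same effect. *)
let by_init = Array.init 2 (fun _ -> Array.make 2 0)
let () = by_init.(0).(0) <- 7
let also_untouched = by_init.(1).(0)
```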


B.7 Polymorphic problems


As we have seen, the built-in polymorphic functions (e.g. =) can be inappropriately applied to:

• Data structures containing functions.

• Values of abstract types.

Also, note that this set of polymorphic functions includes compare and hash:

# compare;;
- : 'a -> 'a -> int = <fun>
# Hashtbl.hash;;
- : 'a -> int = <fun>

If applied to data structures containing function values, these polymorphic functions are likely to raise the Invalid_argument exception:

# let a = (1, fun i -> 1) and b = (1, fun i -> 2);;
val a : int * ('a -> int) = (1, <fun>)
val b : int * ('a -> int) = (1, <fun>)
# a = b;;
Exception: Invalid_argument "equal: functional value".

Inappropriate application of these polymorphic functions is likely to become a problem during the development of a program, as the data structures used by a program evolve. For example, if the development of a program results in a non-performance-critical portion of a whole program becoming performance critical then subsequent optimisation is likely to involve the use of more sophisticated data structures, such as replacing an association list with a map or hash table. Any applications of built-in polymorphic functions to this data structure (e.g. comparing sets of mappings by applying =) then become erroneous.

Such errors are currently difficult to track down. However, in theory, a tool may be written which finds some of these errors automatically by searching for applications of these polymorphic functions to types which contain functions or abstract types.
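For abstract types, the usual remedy is to use the comparison functions exported by the module itself. In the sketch below, the module name IntSet and the two construction orders are assumptions made for illustration; the point is that sets holding the same elements may be represented by differently shaped trees, so structural equality on the representation is not a reliable test, whereas the equal and compare functions of the set module compare contents:

```ocaml
module IntSet = Set.Make (struct
  type t = int
  let compare (a : int) (b : int) = compare a b
end)

(* The same five elements, inserted in different orders. *)
let a = IntSet.of_list [1; 2; 3; 4; 5]
let b = List.fold_right IntSet.add [1; 2; 3; 4; 5] IntSet.empty

(* Comparison by contents, as provided by the set module itself. *)
let same = IntSet.equal a b
let order = IntSet.compare a b

(* a = b would compare the underlying trees rather than the contents, so
   its result depends on how each set happened to be built. *)
```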

B.8 Local and non-local variable definitions

Although similar in appearance, the let keyword is used to create two different constructs. Specifically, non-nested (outermost) let ... = ... ;; constructs make new definitions in the current namespace whereas nested let ... = ... in ... constructs make local definitions.

The difference between nested and non-nested definitions can sometimes be confusing. For example, the following is valid OCaml code which defines a variable a:


# let a =
    let b = 4 in
    b * b;;
val a : int = 16


In contrast, the following tries to make a non-local definition for a within the nested expression for b, which is invalid:

# let b = 4 in let a = b * b;;
Syntax error

Regardless, the latter code can appear when programmers are drunk or tired.
