
Lazy Evaluation in Numeric Computing

20 October 2006

Игорь Николаевич Скопин

Agenda
• Elementary introduction to functional programming: reduction, composition and mapping functions
• Lazy Evaluation:
  – Scheme
  – Principles
  – Imperative Examples
  – Analysis
  – Necessity in Procedure Calls
• Dependencies of Lazy Evaluation on the Programming Style
• Newton-Raphson Square Roots
  – Functional Program
• Numerical Differentiation
• FC++ library
  – “Lazy list” data structure in FC++
• Memoization
• Lazy Evaluation in Boost/uBLAS: Vectors/Matrices Expressions
  – Vectors/matrices expressions and memoization
  – Style of programming
• The Myths about Lazy Evaluation
• Q&A

Gluing Functions Together: reduce

[ ]       nil
[1]       cons 1 nil
[1,2,3]   cons 1 (cons 2 (cons 3 nil))

list of x = nil | cons x (list of x)

sum nil = 0
sum (cons num list) = num + sum list

reduce f x nil = x
reduce f x (cons a l) = f a (reduce f x l)

reduce add 0 [1,2,3]      ⇒ 1 + 2 + 3 + 0
reduce multiply 1 [1,2,3] ⇒ 1 * 2 * 3 * 1

(reduce f x) l

Only the function and the base value are specific to sum:

sum = reduce add 0
add x y = x + y

product = reduce multiply 1
(reduce cons nil) α ⇒ α      // copy of α
reduce (:) [ ]               // Haskell

reduce is a function of 3 arguments, but here it is applied to only 2 ⇒ the result is a function (partial application)!

append a b = reduce cons b a
append [1,2] [3,4] = reduce cons [3,4] [1,2]

= (reduce cons [3,4]) (cons 1 (cons 2 nil))

= cons 1 (reduce cons [3,4] (cons 2 nil))

= cons 1 (cons 2 (reduce cons [3,4] nil))

= cons 1 (cons 2 (cons 3 (cons 4 nil)))


Gluing Functions Together: Composition and Map

A function to double all the elements of a list:

doubleall = reduce doubleandcons nil
  where doubleandcons num list = cons (2*num) list

Further:
doubleandcons = fandcons double
  where double n = 2*n
fandcons f el list = cons (f el) list

reduce f nil extends f over a whole list.
(double is the only part specific to doubling; f is an arbitrary function.)

Function composition — the standard operator “.”:
(f . g) h = f (g h)

So fandcons f = cons . f

Next version of doubleall:
doubleall = reduce (cons . double) nil

This definition is correct:
fandcons f el = (cons . f) el = cons (f el)
so fandcons f el list = cons (f el) list

Function map (applies a function to all the elements of a list):

map f = reduce (cons . f) nil

Final version of doubleall :

doubleall = map double

One more example:

summatrix = sum . map sum

Lazy Evaluation: Scheme

F and G are programs:
(G . F) input = G (F input)
It is possible to run F on input producing a temporary file tF, and then run G on tF, but it is not good!

The attractive approach is to make requests for computation:

[Diagram: G asks for the data it needs from F and holds up; F resumes and computes; as soon as the needed data are ready, F holds up and G resumes. More precisely, F produces only the data that G requests, one request at a time.]

Lazy Evaluation: Principles

Postulates:
• Any computation is activated if and only if it is necessary for one or more other computations. This situation is called a necessity of computation.
• An active computation is stopped if and only if its necessity vanishes (for example, it has been satisfied).
• The computation as a whole is activated forcibly, as a request (necessity) to obtain the results of the computational system's execution.

Consequences:
• The activation of all computational units is driven by a dataflow that starts when the necessity of the computation as a whole arises.
• The control flow of the computation is not considered an a priori defined process. It is formed dynamically, by the necessities of computations only.
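As a rough C++ illustration of these postulates (a minimal sketch, not taken from the talk; the Lazy class name is mine): a value whose computation is activated only when some other computation needs it, and never re-run once the necessity has been satisfied.

#include <cstdio>
#include <functional>
#include <optional>

template <class T>
class Lazy {
    std::function<T()> compute_;       // suspended computation
    std::optional<T>   value_;         // filled in only on demand
public:
    explicit Lazy(std::function<T()> f) : compute_(std::move(f)) {}
    const T& get() {                   // the necessity of computation appears here
        if (!value_) value_ = compute_();
        return *value_;
    }
};

int main() {
    Lazy<int> x([] { std::puts("computing..."); return 6 * 8; });
    std::puts("nothing computed yet");
    std::printf("%d\n", x.get());      // activates the computation
    std::printf("%d\n", x.get());      // necessity already satisfied: no recomputation
}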

Lazy Evaluation: Imperative Examples

Boolean expression ℜ ≡ α & β & γ:
(α == false) ⇒ ℜ == false
(α == true) & (β == false) ⇒ ℜ == false

if ( (precond) && (init) && (run) && (close) )
    { printf("OK!"); }
else …

Vector/matrix expressions

When we write
    vector a(n), b(n), c(n), d(n);
    a = b + c + d;
a real necessity of computation may not appear! The compiler does the following:

    Vector* _t1 = new Vector(n);
    for(int i=0; i < n; i++) _t1(i) = b(i) + c(i);
    Vector* _t2 = new Vector(n);
    for(int i=0; i < n; i++) _t2(i) = _t1(i) + d(i);
    for(int i=0; i < n; i++) a(i) = _t2(i);
    delete _t2;
    delete _t1;

So we have created and deleted two temporaries, instead of the single loop we want:
    for(int i=0; i < n; i++) a(i) = b(i) + c(i) + d(i);

The same holds for matrices.
Not everything is so good! We'll discuss this later.
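How a C++ library can avoid those temporaries is shown by expression templates: building b + c + d only records the expression tree, and a single loop runs at the assignment, where the necessity of computation appears. A minimal hand-written sketch (not the uBLAS implementation; the Vec and Sum names are mine):

#include <cstddef>
#include <vector>

// Node representing the unevaluated sum of two vector expressions.
template <class L, class R>
struct Sum {
    const L& l; const R& r;
    double operator()(std::size_t i) const { return l(i) + r(i); }
    std::size_t size() const { return l.size(); }
};

struct Vec {
    std::vector<double> data;
    explicit Vec(std::size_t n) : data(n, 0.0) {}
    double  operator()(std::size_t i) const { return data[i]; }
    double& operator()(std::size_t i)       { return data[i]; }
    std::size_t size() const { return data.size(); }

    // Assignment is where the necessity of computation appears: the whole
    // expression tree is evaluated element by element, with no temporaries.
    template <class E>
    Vec& operator=(const E& e) {
        for (std::size_t i = 0; i < size(); ++i) data[i] = e(i);
        return *this;
    }
};

// Building the expression does no arithmetic; it only records the tree.
template <class L, class R>
Sum<L, R> operator+(const L& l, const R& r) { return Sum<L, R>{l, r}; }

int main() {
    std::size_t n = 4;
    Vec a(n), b(n), c(n), d(n);
    a = b + c + d;   // one loop, no temporary vectors
}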

Arithmetic expression

x = ( … ) * 0;  ⇒  x = 0;  without computation of ( … )!

Lazy and eager evaluation of file handling

cat File_F | grep WWW | head -1      (the subject of the next slide)

Analysis of cat File_F | grep WWW | head -1: Lazy and Eager Evaluation

Eager variant of execution:
[Diagram: cat File_F → stdout/stdin → grep WWW → stdout/stdin → head -1; cat passes all the strings of File_F to grep, grep passes all the strings containing 'WWW' to head, head prints one string.]

Lazy variant of execution:
[Diagram: the same pipeline, but head -1 requests one string, grep requests strings of File_F from cat one at a time, and only one string with 'WWW' flows through. Suppose the needed string was not detected at first; the intermediate steps are omitted here.]

A UNIX pipeline may be considered an optimization of lazy evaluation (in this case)!

A nice question: is this version more correct?

Note: it is the same object in both cases!

Analysis of the Vector/Matrix Example

Consider the example more closely:
a = b + c + d;

Traditional scheme (order of calculations):
t1 = b + c;   (I)
t2 = t1 + d;  (II)
a  = t2;      (III)
[Diagram: expression tree over a, b, c, d with temporaries t1 and t2; the order is I ⇒ II ⇒ III (or I ⇒ II ∪ III).]

Lazy evaluation:
[Diagram: the same expression tree without temporaries; the necessity of computation (↓) appears only when a[i] is needed, and then a[i] = b[i] + c[i] + d[i] is evaluated elementwise.]

Necessity in Procedure Calls

procedure P (in a, out b); …  P (6*8, x);

When does the necessity of computations appear?
In parameters:
• Real and forced necessity
• There are many details
• Ingerman's thunks (Algol 60)
Out parameters:
• Only forced necessity is possible in imperative languages
What hampers the real necessity?
• Dependences on context
• Possibility of reassignment of variables
SISAL and other single-assignment languages — this palliative does not seem good enough.

P (in a, out b);
{ … a …
  … b = …; … }
actual arguments: 6*8 → a, x → b

Q (in a);
{ … a = 7; …
  … r = 9; …
  … t = 2 + a; …
  … }

…
r = 1;
Q (r + 5);
… r = x*5; …
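A minimal sketch of an Ingerman-style thunk in modern C++ terms (my own code, not from the talk): the in parameter is passed as a parameterless function, so the necessity of evaluating the argument appears only inside the callee, and a reassignment in the context changes what the thunk yields.

#include <cstdio>
#include <functional>

int r = 1;  // global context the argument expression depends on

// Q receives a thunk instead of a value: the necessity of computing
// the argument appears only at the call a().
void Q(std::function<int()> a) {
    r = 9;                      // reassignment in the context...
    int t = 2 + a();            // ...changes what the thunk yields: 2 + (9 + 5)
    std::printf("t = %d\n", t);
}

int main() {
    r = 1;
    Q([] { return r + 5; });    // the expression r + 5 is not evaluated here
}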

Dependencies of Lazy Evaluation on the Programming Style

Functional programming:
• Exactly the needed necessities
• Automatic, dataflow-driven necessities
• Combining functions and a composition-oriented approach
• Declarative programming

Operational programming:
• Forcibly arising necessities
• Manual control flow and agreement-driven necessities
• Control-flow and data-transformation-oriented approach
• Imperative programming

Nevertheless, there are useful possibilities to apply lazy evaluation in both cases!
The key notion is the definition of necessities.
Functional style may be characterized as a style that allows automatic dataflow-driven necessities.

Newton-Raphson Square Roots

Functional programs are inefficient. Is it true?

Algorithm:
– start from an initial approximation a0
– compute a better approximation by the rule
  a(n+1) = (a(n) + N/a(n)) / 2
If the approximations converge to some limit a, then a = (a + N/a) / 2,
so 2a = a + N/a, a = N/a, a*a = N ⇒ a = squareroot(N)

Imperative program (monolithic, FORTRAN):

      X = A0
      Y = A0 + 2.*EPS
100   IF (ABS(X-Y).LE.EPS) GOTO 200
      Y = X
      X = (X + N/X) / 2.
      GOTO 100
200   CONTINUE

This program is indivisible in conventional languages.

We want to show that it is possible to obtain:

• a simple functional program

• a technique for improving it

• and, as a result, a very expressive program!

Newton-Raphson Square Roots: Functional Program

• First version:
next N x = (x + N/x) / 2
repeat f a = cons a (repeat f (f a))        -- [a0, f a0, f(f a0), f(f(f a0)), ..]
repeat (next N) a0
within eps (cons a (cons b rest))
  = b, if abs(a-b) <= eps
  = within eps (cons b rest), otherwise
sqrt a0 eps N = within eps (repeat (next N) a0)

• Improvement:
relative eps (cons a (cons b rest))
  = b, if abs(a-b) <= eps*abs(b)
  = relative eps (cons b rest), otherwise
relativesqrt a0 eps N = relative eps (repeat (next N) a0)
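The essential point is that repeat produces a conceptually infinite sequence while within consumes only as many elements as it needs. As a rough illustration of the same structure in plain C++ (my own sketch, not FC++; the names Generator, repeat, within are mine), the lazy list is modeled by a generator that yields the next approximation on demand:

#include <cstdio>
#include <cmath>
#include <functional>

// A lazy, conceptually infinite sequence: each call yields the next element.
using Generator = std::function<double()>;

// repeat f a ~ the sequence a, f a, f (f a), ...
Generator repeat(std::function<double(double)> f, double a) {
    return [=]() mutable { double current = a; a = f(a); return current; };
}

// within eps: pull approximations until two consecutive ones are close enough.
double within(double eps, Generator next_approx) {
    double a = next_approx();
    for (;;) {
        double b = next_approx();
        if (std::fabs(a - b) <= eps) return b;
        a = b;
    }
}

int main() {
    double N = 2.0, a0 = 1.0, eps = 1e-12;
    auto next = [N](double x) { return (x + N / x) / 2; };
    std::printf("sqrt(%g) = %.12f\n", N, within(eps, repeat(next, a0)));
}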

Numerical Differentiation

easydiff f x h = (f (x + h) - f x) / h
A problem: small h ⇒ small (f (x + h) - f x) ⇒ rounding error

differentiate h0 f x = map (easydiff f x) (repeat halve h0)
halve x = x/2

within eps (differentiate h0 f x)                                  (1)
But this sequence of approximations converges fairly slowly.

elimerror n (cons a (cons b rest))
  = cons ((b*(2**n) - a)/(2**n - 1)) (elimerror n (cons b rest))

But n is unknown:
order (cons a (cons b (cons c rest))) = round (log2 ((a-c)/(b-c) - 1))

So a general function to improve a sequence of approximations is
improve s = elimerror (order s) s

More efficient variants:
within eps (improve (differentiate h0 f x))                        (2)
Using the halving property of the sequence, we obtain a fourth-order method:
within eps (improve (improve (improve (differentiate h0 f x))))    (3)
Using the following
super s = map second (repeat improve s)
second (cons a (cons b rest)) = b
(here repeat improve is used to get a sequence of more and more improved sequences), we obtain a very sophisticated algorithm in an easy manner:

within eps (super (differentiate h0 f x))                          (4)

Let A be the right answer and B*(h**n) the error term. Since h is halved from one approximation to the next,
a(i) = A + B*(2**n)*(h**n)    and    a(i+1) = A + B*(h**n)
Hence
A = (a(i+1)*(2**n) - a(i)) / (2**n - 1)

The order n is estimated from three successive approximations:
n ≈ log2( (a(i) - a(i+2)) / (a(i+1) - a(i+2)) - 1 )

FC++ library
• Higher-order functions — functions with functional arguments
• The FC++ library is a general framework for functional programming
• Polymorphic functions — passing them as arguments to other functions and returning them as results
• Support for higher-order polymorphic operators like compose(): a function that takes two functions as arguments and returns a (possibly polymorphic) result
• A large part of Haskell
• Support for lazy evaluation
• Transformation of FC++ data structures ⇔ data structures of the C++ Standard Template Library (STL)
• Operators for promoting normal functions into FC++ functoids
Finally, the library supplies “indirect functoids”: run-time variables that can refer to any functoid with a given monomorphic type signature.

“Lazy list” data structure in FC++

List<int> integers = enumFrom(1);          // infinite list of all the integers 1, 2, …
List<int> evens = filter(even, integers);  // infinite list of the even integers 2, 4, …

bool prime( int x ) { ... }                // a simple ordinary algorithm
filter( ptr_to_fun(&prime), integers );    // ptr_to_fun transforms a normal function into a functoid

Currying:
plus(x, y) ⇒ x + y
plus(2)    ⇒ a functoid mapping x to 2 + x

Limitations:
– Lambda functions
– Dependences on context (?)
– Possibility of reassignment of variables (?)
– The template technique is insufficient (blitz++)

Memoization

Imperative scheme:
1. Expressions are evaluated only when called explicitly
2. Procedure calls depend on the context
3. A rigid sequence of computation units (hard to parallelize)
4. If we want to memoize previous results, we must do this explicitly
5. The memoization process is controlled by the programmer
6. Only manual transformation to a suitable scheme of data representation is possible
7. Loop head and loop body are joined monolithically
8. Control-flow centric approach
9. Requires a difficult technique for def-use chain analysis

Functional scheme:
1. Expressions do not depend on context
2. Context-independent procedure calls
3. The sequence of computation units is chosen by the execution system (more flexible for parallelization)
4. Automatic memoization
5. The memoization process is not controlled, but filters are allowed (indirect control)
6. The stack technique of program representation and execution is not appropriate
7. Constructions like reduce and composition of functions allow considering loop head and body independently
8. Dataflow centric approach
9. Suitable for def-use chain analysis

Fibonacci example: F(n) = F(n-1) + F(n-2). It is a classical bad case for imperative computations. Why? The previous functional programs are easier to develop and understand. Why?
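A hedged sketch of what explicit memoization looks like in the imperative scheme (plain C++, my own code): the cache and its lookup are written by hand, whereas a lazy functional implementation could obtain the same sharing automatically.

#include <cstdint>
#include <cstdio>
#include <unordered_map>

std::uint64_t fib(unsigned n) {
    static std::unordered_map<unsigned, std::uint64_t> cache;  // explicit memo table
    if (n < 2) return n;
    auto it = cache.find(n);
    if (it != cache.end()) return it->second;                  // reuse a previous result
    std::uint64_t result = fib(n - 1) + fib(n - 2);             // otherwise compute once...
    cache[n] = result;                                          // ...and remember it
    return result;
}

int main() {
    std::printf("fib(50) = %llu\n", (unsigned long long)fib(50));
}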

Lazy Evaluation in Boost/uBLAS: Vectors/Matrices Expressions

Example: A + prod(B, V)
– Assignment activates evaluation
– Indexes define the evaluation of the expression tree: if we write the example, we initiate the following computation:
  for all i { A[i] + ∑k (B[i,k] * V[k]) }                                (1)

• "for all" means the computations for different i are independent (the order of computations is chosen by the execution system)
• It is possible (but not necessary) to have the following representation: A + [ ∑k (B[i,k] * V[k]) ]   ("[ ]" denotes a vector constructor)
• The necessity of computation is defined by this fragment for each i (possibly dynamically)
• Postulate that the "=" operator always leads to the appearance of a necessity of computations: the right-hand side should be computed and assigned to the left-hand side:
  D = A + prod(B, V);
• A common expression tree is used for the computations for each i in (1)
– Type coordination holds:
• A correct vectors/matrices expression includes constituents with types allowed by the operators of the expression (including prod and others)
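A small usage sketch with Boost.uBLAS (headers and calls as in the library; the sizes and values are invented for illustration): the right-hand side is an expression template, and the elementwise work happens only when the assignment to D demands it.

#include <boost/numeric/ublas/matrix.hpp>
#include <boost/numeric/ublas/vector.hpp>
#include <boost/numeric/ublas/io.hpp>
#include <cstddef>
#include <iostream>

int main() {
    using namespace boost::numeric::ublas;
    const std::size_t n = 3;
    matrix<double> B(n, n);
    vector<double> A(n), V(n), D(n);
    for (std::size_t i = 0; i < n; ++i) {
        A(i) = 1.0; V(i) = i + 1.0;
        for (std::size_t j = 0; j < n; ++j) B(i, j) = (i == j) ? 2.0 : 0.0;
    }
    D = A + prod(B, V);   // for all i: D[i] = A[i] + sum_k B[i,k] * V[k]
    std::cout << D << std::endl;
}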

Boost/uBLAS: Vectors/Matrices Expressions — Temporary and Memoization Problems

• Let us consider the expression x = A*x (this is an error from the functional-style viewpoint, but correct for operational C++)
– The naive implementation, for all i { x[i] = ∑k (A[i,k] * x[k]); }, is not correct: elements of x are overwritten while they are still needed!
– The suitable implementation should use a temporary t:
  for all i { t[i] = ∑k (A[i,k] * x[k]); }  x = t;
  The last assignment should not copy the value, but copy a reference and delete the previous value of x.

• Let us consider A * (B * x).
– If we don't use lazy evaluation, we obtain n² complexity (C1*n² for B*x plus C2*n² for the other multiplication).
– But in a straightforward lazy case, the obtained complexity would be n³.
• This is not a problem for a real functional language implementation because of the value propagation technique (automatic memoization).
• Instead of the value propagation technique, in a C++ implementation we can provide a temporary. Its assignment breaks the expression.
• We use temporaries in both cases. But what part of the information should really be saved?
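Both problems can be written down in uBLAS terms; a hedged sketch (my own example, relying on uBLAS's documented default of assuming aliasing on assignment):

#include <boost/numeric/ublas/matrix.hpp>
#include <boost/numeric/ublas/vector.hpp>
#include <cstddef>

int main() {
    using namespace boost::numeric::ublas;
    const std::size_t n = 100;
    matrix<double> A = identity_matrix<double>(n);
    matrix<double> B = identity_matrix<double>(n);
    vector<double> x = scalar_vector<double>(n, 1.0);
    vector<double> t(n), y(n);

    // x appears on both sides: by default uBLAS evaluates the product into a
    // temporary before assigning, so the aliasing is handled correctly
    // (noalias(x) = prod(A, x) would be wrong here).
    x = prod(A, x);

    // For A * (B * x), introducing the temporary ourselves keeps the cost at
    // two O(n^2) matrix-vector products; a straightforwardly lazy nested
    // expression would recompute B*x for every element and cost O(n^3).
    t = prod(B, x);
    y = prod(A, t);
}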

Boost/uBLAS: Style of Programming

• Object-oriented style
  – Standard techniques and patterns should be used
• C++ style (in addition to the previous)
  – Standard template technique
• Vectors/matrices expressions (in addition to the previous)
  – Tendency to use matrix and vector objects instead of variables with indexes
  – Tendency to write expressions instead of simple statements
  – Use uBLAS primitives as specializations of general templates
  – Don't extend classes directly by multilevel inheritance

Boost/uBLAS. Example task: the Jacobi method

• Let us consider the Jacobi method of solving the linear system
  ∑(j=1..n) a[i,j] * x[j] = b[i]

– We are able to write it using such formulas as
  x[i](k) = ( b[i] - ∑(j≠i) a[i,j] * x[j](k-1) ) / a[i,i]

– Instead of this, we should write it in matrix terms as
  x(k) = -D⁻¹ (L + U) x(k-1) + D⁻¹ b

– As the result, we obtain the following program (next slide)

Boost/uBLAS. Example program: the Jacobi method

matrix<double> A (n, n);                                  // input data
vector<double> B (n);                                     // input data
vector<double> X (n);                                     // output data
matrix<double> D (n, n);                                  // diagonal of A
matrix<double> D_1 (n, n);                                // D⁻¹
triangular_adaptor<matrix<double>, unit_lower> L (A);
triangular_adaptor<matrix<double>, unit_upper> U (A);
identity_matrix<double> I (n, n);

D = A - L - U + 2*I;                                      // receiving the diagonal
D_1 = inverse_matrix(D);                                  // we need to write this special function!

for (i = 0; i < count; ++ i)
    X = -prod(prod(D_1, L+U-2*I), X) + prod(D_1, B);

This implements x(k) = -D⁻¹ (L + U) x(k-1) + D⁻¹ b.

Example task in matrix terms: A * X = B.
[Diagram: the triangular_adaptor views L(A) and U(A) are elements of the uBLAS data structure that share the storage of A and have unit diagonals; that is why D = A - L - U + 2*I receives the diagonal of A, and why we need to write a special function for inverting D.]

Boost/uBLAS style of programming compared with using indexes

for ( k = 0; k < count; ++ k ) {
  for (i = 0; i < A.size1 (); ++ i) {
    T (i) = 0;
    for (j = 0; j < i; ++ j)                      // we are forced to treat the
      T (i) += A (i, j) * Y (j);                  // diagonal separately
    for (j = i+1; j < A.size1 (); ++ j)
      T (i) += A (i, j) * Y (j);
    T (i) = ( B (i) - T (i) ) / A (i, i);         // division by A(i,i) replaces D⁻¹
  }                                               // (this may be better than the
  Y = T;                                          // vector/matrix expression)
}                                                 // we force the use of a temporary!

• Obviousness is lost
• The sequence of computations is fixed rigidly (possibilities for compiler optimization are lost)
• It is harder to develop programs this way than in the alternative case
• Possibilities of using indexes are not lost in vectors/matrices expressions
• Vectors/matrices expressions are more suitable for finding the patterns needed to use special optimized external libraries

Boost/uBLAS: Gauss-Seidel method

Jacobi method:
…
D_1 = inverse_matrix(D);
for (i = 0; i < count; ++ i)
    X = -prod(prod(D_1, L+U-2*I), X) + prod(D_1, B);

Jacobi method as a fixed-point iteration:
x(k) = Ã x(k-1) + b̃,   i.e.  x(k) = F(x(k-1))

Gauss-Seidel method: x(k) = F(x(k-1), x(k)) — the already computed components of x(k) are used:
x[i](k) = ( b[i] - ∑(j<i) a[i,j] * x[j](k) - ∑(j>i) a[i,j] * x[j](k-1) ) / a[i,i]
In matrix terms: x(k) = (D + L)⁻¹ (b - U x(k-1))

Gauss-Seidel method:…

D_1 = inverse_matrix(D+L);

for (i = 0; i < count; ++ i)

X = prod( D_1, B - prod (U, X));

Recall the x = A*x problem: it is solved with a temporary (t = A*x; x = t). But the Gauss-Seidel method may be considered as the Jacobi method with this temporary avoided!

It is a new approach to organizing computations!

Boost/uBLAS: Style of Programming (Limitations)

• Frequently a rejection of the vectors/matrices style is required
• A problem of style compatibility
• The set of operators on matrices and vectors is not closed
  – Instead of a vectors/matrices operator "*", one should use the generic function prod(mv1, mv2) provided for all needed cases
  – uBLAS vector and matrix operators are not presented as an algebraic system (for instance, "⁻¹" and "|| ||" are not offered because of the problems they pose for lazy evaluation of vectors/matrices expressions)

An acceptable approach may be proposed as follows (see the sketch after this list):
  – The results of computing "⁻¹", "|| ||" and so on are considered as attributes of the matrix and vector classes
  – These attributes are computed outside the expression computation, by outside control, when the necessity arises
  – The expression's constituents with these operators are replaced by extracting the needed items from the corresponding attributes

The direction toward vectors and matrices expressions is very promising.
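A hedged sketch of the proposed "attribute" approach as I read it (my own code, not uBLAS): the norm of a vector is kept as a cached attribute, computed outside the expression only when the necessity arises, and expressions simply extract it.

#include <cmath>
#include <cstdio>
#include <optional>
#include <vector>

class CachedVector {
    std::vector<double> data_;
    mutable std::optional<double> norm_;      // attribute: || v ||, computed lazily
public:
    explicit CachedVector(std::vector<double> d) : data_(std::move(d)) {}

    double& operator()(std::size_t i) {       // a write invalidates the attribute
        norm_.reset();
        return data_[i];
    }

    double norm2() const {                    // extracted from the attribute, computed at most once
        if (!norm_) {
            double s = 0;
            for (double v : data_) s += v * v;
            norm_ = std::sqrt(s);
        }
        return *norm_;
    }
};

int main() {
    CachedVector v({3.0, 4.0});
    std::printf("%f\n", v.norm2());   // computed here
    std::printf("%f\n", v.norm2());   // reused from the attribute
    v(0) = 6.0;                       // invalidates the attribute
    std::printf("%f\n", v.norm2());   // recomputed
}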

The Myths about Lazy Evaluation

1. Lazy evaluation is possible only in functional languages.
   — The examples above indicate that this is not right.

2. Lazy evaluation and functional languages may be applied only in the artificial intelligence area.
   — As we have seen, this is not right: using higher-order functions is very promising in many cases.

3. We are able to realize lazy evaluation using an arbitrary programming language.
   — The language imposes a lot of limitations on using lazy evaluation.*

4. Using lazy evaluation decreases performance.
   — This depends only on the quality of the programming of the algorithms.

* C++ is criticized from this point of view by the blitz++ project developers.

Q&A