Functional approaches to parallel programming and meta-computers (Approches fonctionnelles de la programmation parallèle et des méta-ordinateurs)

Frédéric Gava, supervised by Frédéric Loulergue


Page 1: Approches fonctionnelles de la programmation parallèle  et des méta-ordinateurs

Functional approaches to parallel programming and meta-computers

Semantics, implementations and certification

Frédéric Gava, supervised by Frédéric Loulergue

Page 2: Approches fonctionnelles de la programmation parallèle  et des méta-ordinateurs

2/51

Parallel programming, from implicit to explicit:

• Automatic parallelization
• Skeletons
• Data-parallelism
• Parallel extensions
• Concurrent programming

Background

Page 3: Approches fonctionnelles de la programmation parallèle  et des méta-ordinateurs

3/51

Projects

• 2002-2004

• ACI Grid

• 4 partners

• Design of parallel and grid libraries of primitives for OCaml, with applications to distributed database management systems (DBMS) and numerical computations

• 2004-2007

• ACI Young researchers

• Production of a programming environment in which certified parallel programs can be written, proved and safely executed

Page 4: Approches fonctionnelles de la programmation parallèle  et des méta-ordinateurs

4/51

Outline

• Introduction

I. Semantics of BSML and certification

II. Extensions

a. New primitives: parallel composition & parallel IO

b. Library of parallel data structures

III. Globalized operations

• Conclusion and future work

Page 5: Approches fonctionnelles de la programmation parallèle  et des méta-ordinateurs

5/51

Introduction

Page 6: Approches fonctionnelles de la programmation parallèle  et des méta-ordinateurs

6/51

Characterized by:
– p: number of processors
– r: processor speed
– L: cost of a global synchronization
– g: cost of a communication phase in which each processor sends or receives at most 1 word

BSP architecture:

The BSP model

P/M P/M P/M P/M P/M

Network

Synchronization unit

Page 7: Approches fonctionnelles de la programmation parallèle  et des méta-ordinateurs

7/51

$T(s) = \left(\max_{0 \le i < p} w_i\right) + h \times g + L$

BSP model of execution
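The cost formula of a super-step can be evaluated numerically. The sketch below is only an illustration: the function name `superstep_cost` and its labelled arguments are hypothetical, and it assumes the usual BSP convention that h is the largest number of words sent or received by any single processor during the super-step.

```ocaml
(* Cost of one super-step: T(s) = (max_i w_i) + h * g + L.
   Assumption: h is the maximum, over all processors, of the number of
   words sent or received during the communication phase. *)
let superstep_cost ~g ~l ~(work : float array)
    ~(sent : int array) ~(received : int array) : float =
  (* Longest local computation among the p processors. *)
  let w = Array.fold_left max 0.0 work in
  (* h-relation: worst case of words sent or received by one processor. *)
  let h = ref 0 in
  Array.iteri (fun i s -> h := max !h (max s received.(i))) sent;
  w +. (float_of_int !h *. g) +. l

(* Example with p = 3: local work 3, 5 and 4 time units; at most 2 words
   sent or received per processor; g = 2 and L = 10. *)
let example =
  superstep_cost ~g:2.0 ~l:10.0
    ~work:[| 3.0; 5.0; 4.0 |]
    ~sent:[| 2; 0; 1 |]
    ~received:[| 1; 1; 1 |]
```

With these made-up figures the three terms are 5, 2 × 2 and 10.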

Page 8: Approches fonctionnelles de la programmation parallèle  et des méta-ordinateurs

8/51

λ-calculus + parallel constructions → BSλ-calculus

ML + parallel primitives → BSML

• Structured parallelism as an explicit parallel extension of ML

• Functional language with BSP cost predictions

• Allows the implementation of skeletons

• Implemented as a parallel library for the "Objective Caml" language

• Using a parallel data structure called parallel vector

The BSML language

Page 9: Approches fonctionnelles de la programmation parallèle  et des méta-ordinateurs

9/51

f_0 f_1 … f_{p-1}

g_0 g_1 … g_{p-1}

Parallel part

Sequential part

Replicated part

A BSML program

Page 10: Approches fonctionnelles de la programmation parallèle  et des méta-ordinateurs

10/51

mkpar: (int → α) → α par

(mkpar f) = <(f 0), (f 1), …, (f (p-1))>

apply: (α → β) par → α par → β par

apply <f_0, f_1, …, f_{p-1}> <v_0, v_1, …, v_{p-1}> = <f_0 v_0, f_1 v_1, …, f_{p-1} v_{p-1}>

Asynchronous primitives
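The behaviour of the two asynchronous primitives can be mimicked sequentially. The following is a toy model, not the BSMLlib implementation: a parallel vector is represented by an array whose slot i stands for processor i, and `bsp_p` is fixed to 4 purely for illustration.

```ocaml
(* Toy sequential model: slot i of the array plays the role of processor i. *)
type 'a par = 'a array

(* Assumption: a machine with 4 processors. *)
let bsp_p () = 4

(* mkpar f holds (f i) on processor i. *)
let mkpar (f : int -> 'a) : 'a par = Array.init (bsp_p ()) f

(* apply applies, on each processor, the local function to the local value. *)
let apply (fs : ('a -> 'b) par) (vs : 'a par) : 'b par =
  Array.init (bsp_p ()) (fun i -> fs.(i) vs.(i))

(* parfun, the usual derived function: the same f everywhere. *)
let parfun f v = apply (mkpar (fun _ -> f)) v

let pids = mkpar (fun pid -> pid)
let squares = parfun (fun x -> x * x) pids
```

Neither function needs data from another processor, which is why no communication or synchronization is involved.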

Page 11: Approches fonctionnelles de la programmation parallèle  et des méta-ordinateurs

11/51

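put: (int → α option) par → (int → α option) par

Before put (columns labelled 3 2 1 0, as on the slide):

None     None     Some v4  Some v1
None     None     Some v3  None
None     Some v5  None     None
None     None     Some v2  None
3        2        1        0

After put (the transpose of the table above):

None     None     None     None
None     None     Some v5  None
Some v4  Some v3  None     Some v2
Some v1  None     None     None
3        2        1        0

proj: α option par → (int → α option)

proj <v_0, v_1, …, v_{p-1}> = f such that (f i) = v_i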

Synchronous primitives
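The two synchronous primitives can also be modelled sequentially, with put acting as a transposition of messages. This is a minimal sketch under the same assumptions as any array-based simulation (slot i stands for processor i, `bsp_p` is arbitrarily 4); the example value `shift` is hypothetical.

```ocaml
type 'a par = 'a array

(* Assumption: a machine with 4 processors. *)
let bsp_p () = 4

let mkpar (f : int -> 'a) : 'a par = Array.init (bsp_p ()) f

(* put: processor i holds f_i, where (f_i j) is the message for processor j.
   Afterwards processor j holds g_j with (g_j i) = (f_i j). *)
let put (fs : (int -> 'a option) par) : (int -> 'a option) par =
  let p = bsp_p () in
  (* table.(i).(j) = message that processor i addresses to processor j. *)
  let table = Array.map (fun f -> Array.init p f) fs in
  mkpar (fun j -> fun i -> if 0 <= i && i < p then table.(i).(j) else None)

(* proj: turns a parallel vector into a function from pids to values. *)
let proj (v : 'a option par) : int -> 'a option =
  fun i -> if 0 <= i && i < Array.length v then v.(i) else None

(* Example: every processor sends its pid to its right-hand neighbour. *)
let shift =
  put (mkpar (fun i -> fun dst ->
    if dst = (i + 1) mod bsp_p () then Some i else None))
```

In this model `shift.(1) 0` is the message that processor 1 received from processor 0.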

Page 12: Approches fonctionnelles de la programmation parallèle  et des méta-ordinateurs

12/51

Semantics and certification

Page 13: Approches fonctionnelles de la programmation parallèle  et des méta-ordinateurs

13/51

Natural semantics: programming model, easy for proofs

Small steps semantics: easy for costs

Distributed semantics: execution model, makes asynchronous steps appear

Abstract machine: close to a real implementation

Outline

Page 14: Approches fonctionnelles de la programmation parallèle  et des méta-ordinateurs

14/51

Expressions of our mini language:

e ::= λ.e              functional core language
    | (e e)
    | …
    | (mkpar e)        parallel primitives
    | …
    | <e, e, …, e>     parallel vector
    | (e)[s]           substitution
    | λ.e[s]           closure

Mini language
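The grammar above maps naturally onto an OCaml variant type. The sketch below covers only the constructs listed on the slide; the `Var` constructor (0-based de Bruijn indices), the representation of substitutions as lists, and the helper `size` are assumptions made for the illustration.

```ocaml
(* Abstract syntax for the constructs shown on the slide. *)
type expr =
  | Var of int               (* assumed: 0-based de Bruijn index *)
  | Lam of expr              (* λ.e *)
  | App of expr * expr       (* (e e) *)
  | Mkpar of expr            (* (mkpar e) *)
  | Vector of expr list      (* <e, e, …, e> *)
  | Subst of expr * subst    (* (e)[s] *)
  | Closure of expr * subst  (* λ.e[s], storing the body e and s *)
and subst = expr list        (* assumed representation of s *)

(* replicate as a term: λ. mkpar (λ. a), where a is the outer variable. *)
let replicate = Lam (Mkpar (Lam (Var 1)))

(* Number of syntax nodes of an expression. *)
let rec size = function
  | Var _ -> 1
  | Lam e | Mkpar e -> 1 + size e
  | App (e1, e2) -> 1 + size e1 + size e2
  | Vector es -> List.fold_left (fun n e -> n + size e) 1 es
  | Subst (e, s) | Closure (e, s) ->
      List.fold_left (fun n e' -> n + size e') (1 + size e) s
```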

Page 15: Approches fonctionnelles de la programmation parallèle  et des méta-ordinateurs

15/51

• Semantics = set of axioms and inference rules
• Easy to understand, makes proofs easier
• Example:

Natural semantics

Confluent

Page 16: Approches fonctionnelles de la programmation parallèle  et des méta-ordinateurs

16/51

• Semantics = set of rewriting rules
• Using contexts for the strategy
• Easier understanding of costs and errors
• Example:

Confluent (costs and values)
Equivalent to the previous semantics

Small steps semantics

Global cost

Local costs

Page 17: Approches fonctionnelles de la programmation parallèle  et des méta-ordinateurs

17/51

• Semantics = set of parallel rewriting rules
• SPMD style: the same program (Prog) is evaluated on every processor

Prog:

    let scan op vec =
      let rec scan' fst lst op vec =
        if fst >= lst then vec
        else
          let mid = (fst + lst) / 2 in
          let vec' =
            mix mid (super (fun () -> scan' fst mid op vec)
                           (fun () -> scan' (mid + 1) lst op vec)) in
          let com = ... (* send wm to processes m+1…p+1 *) in
          let op' = ... (* applies op to wm and wi, m<i<p *) in
          parfun2 op' com vec'
      in scan' 0 (bsp_p () - 1) op vec

Diagram labels: small steps; distributed evaluation; parallel vector; parts of the parallel vector

Confluent
Equivalent to the previous semantics

Distributed semantics

Page 18: Approches fonctionnelles de la programmation parallèle  et des méta-ordinateurs

18/51

BSP-CAM = p × CAM + BSP instructions (SPMD style)

Each of the p CAMs runs the same instruction sequence, for example:

PUSH SWAP PID CONS APP SEND PUSH SWAP APP

• PID: PID of the machine, for mkpar
• SEND: synchronous instruction, for put

Minimal set of parallel instructions
Equivalence with the distributed semantics

Abstract machine

COMMUNICATIONS

Page 19: Approches fonctionnelles de la programmation parallèle  et des méta-ordinateurs

19/51

• The Coq proof assistant:
  – Typed λ-calculus with dependent types
  – Specification = term (goal)
  – Language of tactics to build a proof of this goal
  – Extraction of the proof (certified program)

• BSML and Coq:
  – Axiomatization of the semantics of the primitives in Coq
  – Proof of BSML programs as usual proofs of ML programs
  – Certification and extraction of BSML programs:
    a) Broadcast, total exchange …
    b) Prefixes
    c) Sort

Certification of BSML programs

Page 20: Approches fonctionnelles de la programmation parallèle  et des méta-ordinateurs

20/51

Example: replicate

Specification of replicate:

Proof script:

    intros T a.
    exists (mkpar T (fun pid: Z => a)).
    rewrite mkpar_def.

Certified extraction:

    let replicate a = mkpar (fun pid -> a)
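The extracted function can be exercised against a sequential stand-in for mkpar. The specification is not legible on the slide; the check below assumes it states that every component of the resulting vector equals `a`. The array model, `bsp_p` fixed to 4 and the helper `meets_spec` are all hypothetical.

```ocaml
(* Assumption: 4 processors, a parallel vector modelled as an array. *)
let bsp_p () = 4
let mkpar f = Array.init (bsp_p ()) f

(* The extracted program, as shown on the slide. *)
let replicate a = mkpar (fun _pid -> a)

(* Assumed specification: every component of (replicate a) equals a. *)
let meets_spec a = Array.for_all (fun x -> x = a) (replicate a)
```

A run-time check of this kind only tests a few inputs, whereas the Coq proof covers all of them.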

Page 21: Approches fonctionnelles de la programmation parallèle  et des méta-ordinateurs

21/51

Extensions and parallel data structures

Page 22: Approches fonctionnelles de la programmation parallèle  et des méta-ordinateurs

22/51

BSML

Parallel composition
• New primitive
• Divide-and-conquer
• Properties: confluent semantics, two equivalent semantics

External memory (IO)
• New primitives
• New cost model
• Property: confluent semantics

Parallel data structures (implemented with BSML)
• Simplify programming
• OCaml interfaces
• Load-balancing

Outline

Page 23: Approches fonctionnelles de la programmation parallèle  et des méta-ordinateurs

23/51

• Several programs on the same machine

• New primitives for parallel composition:
  – Superposition
  – Juxtaposition (implemented with the superposition)

• Divide-and-conquer BSP algorithms

Multiprogramming

Page 24: Approches fonctionnelles de la programmation parallèle  et des méta-ordinateurs

24/51

• super: (unit → α) → (unit → β) → α * β

super E1 E2 = (E1 (), E2 ())

• Fusion of communications/synchronization

• Preserves the BSP model

• Pure functional semantics

Parallel superposition
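The purely functional reading of the superposition, super E1 E2 = (E1 (), E2 ()), can be written directly in OCaml. This sketch only models the value returned; it does not model the fusion of the communication and synchronization phases of the two computations.

```ocaml
(* Value-level model of the superposition: both delayed computations are
   run and their results are paired.  The BSP behaviour (fused super-steps)
   is deliberately left out of this sketch. *)
let super (e1 : unit -> 'a) (e2 : unit -> 'b) : 'a * 'b =
  let v1 = e1 () in
  let v2 = e2 () in
  (v1, v2)

(* Two independent computations evaluated "together". *)
let sum, word = super (fun () -> 1 + 1) (fun () -> "ab" ^ "c")
```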

Page 25: Approches fonctionnelles de la programmation parallèle  et des méta-ordinateurs

25/51

[Diagram: super-steps of E1, of E2 and of super E1 E2 on processors 0, 1 and 2; each super-step ends with a communication phase and a synchronization, and these phases of E1 and E2 are fused in super E1 E2]

Confluent

BSP

Equivalence

Parallel superposition

Page 26: Approches fonctionnelles de la programmation parallèle  et des méta-ordinateurs

26/51

Example: parallel prefixes

[Plot: time (s) against the size of the polynomials; curves: direct version (BSML+MPI), superposition version, juxtaposition version]

Page 27: Approches fonctionnelles de la programmation parallèle  et des méta-ordinateurs

27/51

• Observations:

Data Structures are as important as algorithms

Symbolic computations use these data structures massively

• A parallel implementation of data structures:

Interfaces as close as possible to the sequential ones

Modular implementation for straightforward maintenance

Load-balancing of the data

Parallel data structures

Page 28: Approches fonctionnelles de la programmation parallèle  et des méta-ordinateurs

28/51

• 5 modules: Set, Map, Stack, Queue, Hashtable

• Interfaces:

Same as in OCaml

With some specific parallel functions such as parallel reductions

• A parallel data structure = one data structure on each processor

• Manual or automatic load-balancing:

To get similar sizes of the local data structures

Better performance for parallel iterations

A two-super-step algorithm using histograms

Parallel data structures
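The slide does not detail the balancing algorithm, so the following is only one possible reading of "two super-steps using histograms", simulated sequentially: first the sizes of the local structures are collected, then elements are moved so that sizes differ by at most one. All names (`histogram`, `target`, `split_at`, `balance`) are hypothetical, and local structures are modelled as lists.

```ocaml
(* Sequential model: local.(i) is the list held by processor i. *)

(* Super-step 1 (simulated): every processor learns the sizes of all
   local structures, i.e. the histogram. *)
let histogram (local : 'a list array) : int array =
  Array.map List.length local

(* Size processor i should have once n elements are spread over p
   processors: targets differ by at most one. *)
let target n p i = n / p + (if i < n mod p then 1 else 0)

(* Splits a list into its first k elements and the remainder. *)
let rec split_at k l =
  match l with
  | x :: tl when k > 0 ->
      let (taken, rest) = split_at (k - 1) tl in
      (x :: taken, rest)
  | _ -> ([], l)

(* Super-step 2 (simulated): elements are moved so that every processor
   reaches its target size.  Gathering everything first is a shortcut of
   this sketch; a real version would exchange only the surplus. *)
let balance (local : 'a list array) : 'a list array =
  let p = Array.length local in
  let n = Array.fold_left (+) 0 (histogram local) in
  let rec fill i remaining =
    if i = p then []
    else
      let (chunk, rest) = split_at (target n p i) remaining in
      chunk :: fill (i + 1) rest
  in
  Array.of_list (fill 0 (List.concat (Array.to_list local)))
```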

Page 29: Approches fonctionnelles de la programmation parallèle  et des méta-ordinateurs

29/51

Computation of the n-th nearest neighbor atoms in a molecule:

Example

[Plot: time (s) against the number of atoms; curves: sequential version, parallel version (BSML+PUB)]

Page 30: Approches fonctionnelles de la programmation parallèle  et des méta-ordinateurs

30/51

Example with load balancing

[Plot: time (s) against the number of atoms; curves: without balancing, with balancing]

Page 31: Approches fonctionnelles de la programmation parallèle  et des méta-ordinateurs

31/51

Motivations :

External memories

[Plot: time (s) against the number of elements; curves: measured, predicted]

Page 32: Approches fonctionnelles de la programmation parallèle  et des méta-ordinateurs

32/51

[Diagram: P/M nodes connected by a network; inside a node, the processor and the memory are connected by a bus to local disks Disk 1, Disk 2, …, Disk D]

We add to the BSP model:
• D = the number of disks
• B = the size of the blocks
• O = latency of the disks
• G = time to read/write a byte

The EM-BSP model

Page 33: Approches fonctionnelles de la programmation parallèle  et des méta-ordinateurs

33/51

[Diagram: P/M nodes connected by a network, together with shared disks Disk 1, Disk 2, …, Disk M]

We add to the BSP model:
• Shared disks
• With parameters similar to those of the local disks

Shared disks

Page 34: Approches fonctionnelles de la programmation parallèle  et des méta-ordinateurs

34/51

• For safety, two kinds of files: local and global ones

• New primitives to manipulate these files (IO primitives)

• New semantics
• Confluent
• EM-BSP cost of the primitives

External memory in BSML

Page 35: Approches fonctionnelles de la programmation parallèle  et des méta-ordinateurs

35/51

BSMLlib: primitives, standard library, parallel data structures

Lower level: Comm, Super, IO

Back ends: PUB, MPI, TCP/IP, threads

Modular implementation

Page 36: Approches fonctionnelles de la programmation parallèle  et des méta-ordinateurs

36/51

Cost prediction

[Plot: time (s) against the number of elements; curves: lists, arrays, predicted (max), predicted (avg)]

Page 37: Approches fonctionnelles de la programmation parallèle  et des méta-ordinateurs

37/51

IO cost prediction

[Plot: time (s) against the number of elements; curves: predicted BSML, measured BSML-IO, predicted BSML-IO]

Page 38: Approches fonctionnelles de la programmation parallèle  et des méta-ordinateurs

38/51

Globalized operations

Page 39: Approches fonctionnelles de la programmation parallèle  et des méta-ordinateurs

39/51

BSML → MSPML (desynchronize)

BSML + MSPML → DMML

Semantics, cost models, implementations

Outline

Page 40: Approches fonctionnelles de la programmation parallèle  et des méta-ordinateurs

40/51

• Using the MPM model (parameters similar to those of BSP)

• But with a different execution model:

• Same language as BSML (parallel vectors) but with a new communication primitive: put is replaced by mget

MSPML

Page 41: Approches fonctionnelles de la programmation parallèle  et des méta-ordinateurs

41/51

MSPML

Natural semantics

Small steps semantics

Distributed semantics

Programming model

Easy for proofs

Easy for costs

Execution model

Makes asynchronous steps appear

Similar to BSML

Very different

Similar to BSML

Page 42: Approches fonctionnelles de la programmation parallèle  et des méta-ordinateurs

42/51

[Diagram: processors 0, 1 and 2, each with an environment of communications (empty, or holding entries such as 0,v / 0,v' / 0,v''). Labels: local computation; get v 1; communication request 0 1; v'; a bit later]

Asynchronous communications

Page 43: Approches fonctionnelles de la programmation parallèle  et des méta-ordinateurs

43/51

[Diagram: processors 0, 1 and 2 with their environments of communications (empty / 0,v' 0,v'' 1,w' 2,w'' / 0,v 0,v' 1,w). Labels: request 2 0; not ready]

Asynchronous communications

Page 44: Approches fonctionnelles de la programmation parallèle  et des méta-ordinateurs

44/51

[Diagram: several BSML units connected through an intranet and coordinated by MSPML]

Departmental meta-computing

Page 45: Approches fonctionnelles de la programmation parallèle  et des méta-ordinateurs

45/51

• BSML + an MSPML-like language for coordination

• Two kinds of vectors: parallel vectors (α par) and departmental vectors (α dep)

• Operational semantics (confluent)

• Performance model (the DMM model)

• Implementation

Departmental Meta-computing ML

Page 46: Approches fonctionnelles de la programmation parallèle  et des méta-ordinateurs

46/51

• Computation of the prefixes where each processor contains a value

• Naive method: each processor sends its value to other processors

• Better method:

1) Each BSP unit computes a parallel prefix

2) One processor of each BSP unit receives values of other units

3) Each BSP unit finishes its computation with this value

Example: departmental prefixes
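The three steps of the better method can be traced on a sequential model in which each BSP unit is a list of values, one per processor. This is a sketch, not the DMML program: the names `prefix`, `last` and `departmental_prefix` are hypothetical, the operator is assumed associative, and every unit is assumed non-empty.

```ocaml
(* Inclusive prefix of a list: [x0; op x0 x1; op (op x0 x1) x2; ...]. *)
let prefix op = function
  | [] -> []
  | x :: tl ->
      let _, acc =
        List.fold_left
          (fun (last, acc) y -> let v = op last y in (v, v :: acc))
          (x, [x]) tl
      in
      List.rev acc

(* Last element of a non-empty list. *)
let last l = List.nth l (List.length l - 1)

(* units: one list per BSP unit, one value per processor of the unit. *)
let departmental_prefix op (units : 'a list list) : 'a list list =
  (* 1) Each BSP unit computes its own parallel prefix. *)
  let local = List.map (prefix op) units in
  (* 2) One processor per unit receives the totals of the other units;
        offsets gives, for unit k, the combined total of units 0..k. *)
  let offsets = prefix op (List.map last local) in
  (* 3) Each unit (except the first) finishes by combining the total of
        the preceding units with its local prefixes. *)
  List.mapi
    (fun k pre ->
      if k = 0 then pre
      else
        let off = List.nth offsets (k - 1) in
        List.map (fun v -> op off v) pre)
    local
```

For instance, with addition and the units [1; 2], [3; 4] and [5], the local prefixes are [1; 3], [3; 7] and [5], and the final result is [1; 3], [6; 10] and [15].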

Page 47: Approches fonctionnelles de la programmation parallèle  et des méta-ordinateurs

47/51

Experiments

[Plot: time (s) against the size of the polynomials; curves: better algorithm, naive algorithm, BSP algorithm (one cluster)]

Page 48: Approches fonctionnelles de la programmation parallèle  et des méta-ordinateurs

48/51

Conclusion and future work

Page 49: Approches fonctionnelles de la programmation parallèle  et des méta-ordinateurs

49/51

Conclusion

I. Semantics of BSML:
1) Confluent and equivalent semantics
2) Abstract machine
3) Proof of BSML programs

II. Expressivity:
1) Parallel composition
2) Parallel data structures
3) Parallel IO

III. Meta-computing:
1) Desynchronization of BSML (MSPML)
2) Departmental Meta-computing ML (DMML)

Semantics, cost models, implementations

Page 50: Approches fonctionnelles de la programmation parallèle  et des méta-ordinateurs

50/51

• Cost prediction:
1. Static analysis of the programs
2. Cost prediction of certified programs

• Proofs of BSP imperative programs:

Future work in the Propac project

IMP, ML

Coq: program correctness

Extension with BSP operations

BSML

Extension of the logical assertions

Page 51: Approches fonctionnelles de la programmation parallèle  et des méta-ordinateurs

51/51

• Design of parallel model checkers for High-level Petri Nets

• Using BSML to implement a toolkit:

a) Using the BSP model to dynamically load-balance

b) Using a modular and generic implementation to ease the use of this toolkit

• Using the Propac tools to certify this implementation

Vérification efficace par Interaction de Techniques (VITE)

Page 52: Approches fonctionnelles de la programmation parallèle  et des méta-ordinateurs

Thank you for your attention

Page 53: Approches fonctionnelles de la programmation parallèle  et des méta-ordinateurs

53/51

Proofs of programs (with Coq)

Natural semantics

BSML MSPML

Small steps semantics

Distributed semantics

CAM

BSP

MPM

PUB MPI TCP/IP TCP/IP

Programming model

Useful for costs

Execution model

Execution model

BSML and MSPML

Page 54: Approches fonctionnelles de la programmation parallèle  et des méta-ordinateurs

54/51

Place

Token

Transition

State

Arc

Petri nets

Page 55: Approches fonctionnelles de la programmation parallèle  et des méta-ordinateurs

55/51

BSML

Parallel semantics

Distributed evaluation

Abstract machines

Parallel implementation

Performance model

Design of BSP-CAM

High-level semantics

Nat, Step, Distr

Sequential implementation

Coq axiomatisation

Proofs of BSML programs

Dynamic cost analysis

Propac