Functional approaches to parallel programming and meta-computers


Semantics, implementations and certification

Frédéric Gava, supervised by Frédéric Loulergue


Background

Parallel programming:
• Implicit: automatic parallelization, skeletons, data-parallelism
• Explicit: parallel extensions, concurrent programming


Projects

• 2002-2004: ACI Grid (4 partners)
  – design of parallel and grid libraries of primitives for OCaml, with applications to distributed DBMS and numerical computations
• 2004-2007: ACI Young Researchers
  – production of a programming environment in which certified parallel programs can be written, proved and safely executed


Outline

• Introduction
I. Semantics of BSML and certification
II. Extensions:
  a. new primitives: parallel composition & parallel IO
  b. library of parallel data structures
III. Globalized operations
• Conclusion and future work


Introduction


The BSP model

BSP architecture: p processor/memory pairs (P/M) connected by a network, plus a global synchronization unit.

Characterized by:
– p: number of processors
– r: processor speed
– L: time of a global synchronization
– g: time of a communication phase in which each processor sends or receives at most 1 word

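BSP model of execution

The cost of a super-step s is

$$T(s) = \left(\max_{0 \le i < p} w_i\right) + h \cdot g + L$$

where w_i is the local computation time of processor i and h is the maximal number of words sent or received by one processor during the communication phase.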
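The formula can be evaluated directly. The OCaml sketch below computes h from the words each processor sends and receives, then the cost of one super-step and of a sequence of super-steps (the cost of a BSP program being the sum of the costs of its super-steps). The function names and the use of floats for times are assumptions of this illustration.

```ocaml
(* h: the largest number of words sent or received by a single processor *)
let h_relation (sent : int array) (received : int array) : int =
  let h = ref 0 in
  Array.iteri (fun i s -> h := max !h (max s received.(i))) sent;
  !h

(* T(s) = (max_{0<=i<p} w_i) + h*g + L for one super-step *)
let superstep_cost ~(g : float) ~(l : float) (w : float array) (h : int) : float =
  let w_max = Array.fold_left max 0.0 w in
  w_max +. (float_of_int h *. g) +. l

(* The cost of a BSP program is the sum of the costs of its super-steps *)
let program_cost ~g ~l (steps : (float array * int) list) : float =
  List.fold_left (fun acc (w, h) -> acc +. superstep_cost ~g ~l w h) 0.0 steps

let () =
  (* 4 processors, g = 2 and L = 50 (arbitrary time units) *)
  let h = h_relation [| 1; 3; 0; 2 |] [| 2; 1; 3; 0 |] in
  let t = superstep_cost ~g:2.0 ~l:50.0 [| 10.; 25.; 17.; 8. |] h in
  Printf.printf "h = %d, T(s) = %.1f\n" h t
```

With these numbers the program prints `h = 3, T(s) = 81.0`: the slowest processor needs 25, the communication costs 3 × 2 and the synchronization 50.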


The BSML language

(Diagram: the λ-calculus extended with parallel constructions gives the BSλ-calculus; ML extended with parallel primitives gives BSML.)

• Structured parallelism as an explicit parallel extension of ML
• Functional language with BSP cost predictions
• Allows the implementation of skeletons
• Implemented as a parallel library for the "Objective Caml" language
• Uses a parallel data structure called the parallel vector


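A BSML program

(Figure: a BSML program combines a replicated part, a sequential part and a parallel part, the latter made of parallel vectors such as <f_0, f_1, …, f_{p-1}> and <g_0, g_1, …, g_{p-1}>.)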


Asynchronous primitives

• mkpar: (int -> 'a) -> 'a par
  (mkpar f) = <(f 0), (f 1), …, (f (p-1))>
• apply: ('a -> 'b) par -> 'a par -> 'b par
  apply <f_0, f_1, …, f_{p-1}> <v_0, v_1, …, v_{p-1}> = <f_0 v_0, f_1 v_1, …, f_{p-1} v_{p-1}>
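To make the two equations concrete, here is a minimal sequential model in OCaml. It assumes that a parallel vector of type 'a par can be simulated by an array with one cell per processor and fixes p = 4; it is not the BSMLlib implementation, where 'a par is abstract and each processor holds only its own component. The parfun helper is a derived function written for this sketch.

```ocaml
(* Sequential model of BSML parallel vectors (illustration only):
   component i of the array is the value held by processor i. *)
type 'a par = 'a array

(* Assumed number of processors for the simulation *)
let bsp_p () = 4

(* mkpar f = <f 0, f 1, ..., f (p-1)> *)
let mkpar (f : int -> 'a) : 'a par = Array.init (bsp_p ()) f

(* apply <f_0, ..., f_(p-1)> <v_0, ..., v_(p-1)> = <f_0 v_0, ..., f_(p-1) v_(p-1)> *)
let apply (fv : ('a -> 'b) par) (vv : 'a par) : 'b par =
  Array.init (bsp_p ()) (fun i -> fv.(i) vv.(i))

(* Derived helper: apply the same function on every processor *)
let parfun (f : 'a -> 'b) (v : 'a par) : 'b par = apply (mkpar (fun _ -> f)) v

let () =
  let pids = mkpar (fun pid -> pid) in
  let squares = parfun (fun x -> x * x) pids in
  Array.iteri (fun i x -> Printf.printf "processor %d holds %d\n" i x) squares
```

Neither primitive communicates: each component of the result depends only on the components with the same index, which is why they are called asynchronous.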


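Synchronous primitives

• put: (int -> 'a option) par -> (int -> 'a option) par
  Each processor j holds a function f_j, where (f_j i) = Some v means "send v to processor i". After put, processor i holds a function g_i with (g_i j) = (f_j i).
  Example with 4 processors (all other entries are None):
  – before put: processor 0 sends v1 to processor 3; processor 1 sends v2 to processor 0, v3 to processor 2 and v4 to processor 3; processor 2 sends v5 to processor 1; processor 3 sends nothing
  – after put: processor 0 has received v2 from processor 1; processor 1 has received v5 from processor 2; processor 2 has received v3 from processor 1; processor 3 has received v1 from processor 0 and v4 from processor 1
• proj: 'a option par -> (int -> 'a option)
  proj <v_0, v_1, …, v_{p-1}> = f such that (f i) = v_i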
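The same kind of sequential model can express put and proj. In this OCaml sketch arrays stand for parallel vectors and p = 4 (assumptions of the sketch, not BSMLlib code); the usage part replays the communication pattern of the example above, with strings standing for v1 to v5. Returning None outside 0..p-1 is a choice made here.

```ocaml
type 'a par = 'a array
let bsp_p () = 4
let mkpar f = Array.init (bsp_p ()) f

(* put: processor j holds send_j, where (send_j i) = Some v means "send v to i".
   After put, processor i holds recv_i with (recv_i j) = (send_j i). *)
let put (sends : (int -> 'a option) par) : (int -> 'a option) par =
  (* messages.(j).(i) is the message from processor j to processor i *)
  let messages = Array.map (fun send -> Array.init (bsp_p ()) send) sends in
  mkpar (fun i j -> if 0 <= j && j < bsp_p () then messages.(j).(i) else None)

(* proj <v_0, ..., v_(p-1)> = f such that (f i) = v_i *)
let proj (v : 'a option par) : int -> 'a option =
  let values = Array.copy v in
  fun i -> if 0 <= i && i < bsp_p () then values.(i) else None

(* The communication pattern of the example *)
let sends =
  mkpar (fun j i ->
    match (j, i) with
    | 0, 3 -> Some "v1"
    | 1, 0 -> Some "v2"
    | 1, 2 -> Some "v3"
    | 1, 3 -> Some "v4"
    | 2, 1 -> Some "v5"
    | _ -> None)

let received = put sends

let () =
  for i = 0 to bsp_p () - 1 do
    for j = 0 to bsp_p () - 1 do
      match received.(i) j with
      | Some v -> Printf.printf "processor %d received %s from processor %d\n" i v j
      | None -> ()
    done
  done
```

Unlike mkpar and apply, the result of put on processor i depends on the components of other processors, which is why it ends a super-step in BSML.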


Semantics and certification


Outline

• Programming model: natural semantics (easy for proofs) and small-step semantics (easy for costs)
• Execution model: distributed semantics (makes asynchronous steps appear) and abstract machine (close to a real implementation)


Mini language

Expressions of our mini language:

e ::= λ.e | (e e) | …          functional core language
    | (mkpar e) | …            parallel primitives
    | <e, e, …, e>             parallel vector
    | (e)[s]                   substitution
    | λ.e[s]                   closure


Natural semantics

• Semantics = set of axioms and inference rules
• Easy to understand; makes proofs easier
• Example: (inference rules shown on the slide)
• Confluent
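Each inference rule of a natural semantics corresponds to one case of a recursive evaluator. The OCaml sketch below illustrates this for a tiny fragment of the mini language (integers, addition, functions, application, mkpar, apply and parallel vectors) with p fixed to 4. It is a simplification, not the semantics of the thesis: it uses named variables and environment-based closures instead of de Bruijn indices with explicit substitutions, and all constructor names are hypothetical.

```ocaml
let p = 4 (* assumed number of processors *)

type expr =
  | Int of int
  | Var of string
  | Lam of string * expr (* fun x -> e *)
  | App of expr * expr (* (e e) *)
  | Add of expr * expr
  | Mkpar of expr (* (mkpar e) *)
  | Apply of expr * expr (* (apply e e) *)

type value =
  | VInt of int
  | VClos of string * expr * env (* closure *)
  | VVec of value array (* parallel vector <v_0, ..., v_(p-1)> *)
and env = (string * value) list

(* eval env e = v reads as the judgement  env |- e => v *)
let rec eval (env : env) (e : expr) : value =
  match e with
  | Int n -> VInt n
  | Var x -> List.assoc x env
  | Lam (x, body) -> VClos (x, body, env)
  | App (e1, e2) -> call (eval env e1) (eval env e2)
  | Add (e1, e2) -> (
      match (eval env e1, eval env e2) with
      | VInt a, VInt b -> VInt (a + b)
      | _ -> failwith "Add: integers expected")
  | Mkpar e1 ->
      (* rule for mkpar: the function is applied to every processor id *)
      let f = eval env e1 in
      VVec (Array.init p (fun i -> call f (VInt i)))
  | Apply (e1, e2) -> (
      (* rule for apply: pointwise application of two parallel vectors *)
      match (eval env e1, eval env e2) with
      | VVec fs, VVec vs -> VVec (Array.init p (fun i -> call fs.(i) vs.(i)))
      | _ -> failwith "apply: parallel vectors expected")

and call (f : value) (v : value) : value =
  match f with
  | VClos (x, body, cenv) -> eval ((x, v) :: cenv) body
  | _ -> failwith "application of a non-function"

(* apply (mkpar (fun i -> fun x -> x + i)) (mkpar (fun j -> 10)) *)
let example =
  Apply (Mkpar (Lam ("i", Lam ("x", Add (Var "x", Var "i")))),
         Mkpar (Lam ("j", Int 10)))
```

Here `eval [] example` evaluates to `VVec [| VInt 10; VInt 11; VInt 12; VInt 13 |]`. Because every case of the evaluator is deterministic, each expression has at most one value, which is the intuition behind confluence in this fragment.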


Small-step semantics

• Semantics = set of rewriting rules
• Contexts are used to fix the evaluation strategy
• Easier understanding of costs and errors
• Example: (rewriting rules annotated with local costs and a global cost)
• Confluent (for costs and values) and equivalent to the previous semantics

Distributed semantics

• Semantics = set of parallel rewriting rules
• SPMD style: in the small-step semantics a single program Prog manipulates whole parallel vectors; in the distributed semantics each of the p processors evaluates its own copy of Prog and holds only its part of each parallel vector
• Confluent and equivalent to the previous semantics

Program used for the distributed evaluation (parallel prefixes computed by divide-and-conquer with superposition):

    let scan op vec =
      let rec scan' fst lst op vec =
        if fst >= lst then vec
        else
          let mid = (fst + lst) / 2 in
          let vec' =
            mix mid (super (fun () -> scan' fst mid op vec)
                           (fun () -> scan' (mid + 1) lst op vec)) in
          let com = ... (* send wm to processes m+1…p+1 *) in
          let op' = ... (* applies op to wm and wi, m<i<p *) in
          parfun2 op' com vec'
      in
      scan' 0 (bsp_p () - 1) op vec
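The elided parts of the program (com and op') are left as they are. As a separate, runnable illustration of the same divide-and-conquer idea, the OCaml sketch below completes the algorithm in a sequential model where arrays stand for parallel vectors and p = 4. Everything here is an assumption of the sketch: super simply evaluates its two arguments in sequence (its functional meaning, without the fusion of super-steps performed by the real implementation), mix m (v1, v2) is assumed to keep components 0..m of v1 and the others of v2, and the communication step is this sketch's own choice, not the author's code.

```ocaml
type 'a par = 'a array
let bsp_p () = 4
let mkpar f = Array.init (bsp_p ()) f
let apply fv vv = Array.init (bsp_p ()) (fun i -> fv.(i) vv.(i))
let parfun2 f v1 v2 = apply (apply (mkpar (fun _ -> f)) v1) v2

(* put: (send_j i) = Some v means "processor j sends v to processor i" *)
let put sends =
  let messages = Array.map (fun send -> Array.init (bsp_p ()) send) sends in
  mkpar (fun i j -> messages.(j).(i))

(* Functional meaning of the superposition *)
let super e1 e2 = (e1 (), e2 ())

(* Hypothetical mix: components 0..m from v1, components m+1..p-1 from v2 *)
let mix m (v1, v2) = mkpar (fun i -> if i <= m then v1.(i) else v2.(i))

(* Prefixes for an associative op: component i becomes v_0 op ... op v_i *)
let scan op vec =
  let rec scan' fst lst vec =
    if fst >= lst then vec
    else
      let mid = (fst + lst) / 2 in
      (* the two halves are computed by two superposed recursive calls *)
      let vec' =
        mix mid (super (fun () -> scan' fst mid vec)
                       (fun () -> scan' (mid + 1) lst vec)) in
      (* processor mid sends its prefix w_mid to processors mid+1..lst *)
      let com =
        put (apply (mkpar (fun j w i ->
                if j = mid && mid < i && i <= lst then Some w else None)) vec') in
      (* processors mid+1..lst combine the received w_mid with their own value *)
      let op' w received =
        match received mid with Some w_mid -> op w_mid w | None -> w in
      parfun2 op' vec' com
  in
  scan' 0 (bsp_p () - 1) vec

let () =
  let prefixes = scan ( + ) (mkpar (fun i -> i + 1)) in
  Array.iter (Printf.printf "%d ") prefixes;
  print_newline ()
```

The usage prints the prefixes 1 3 6 10. Since op w_mid w puts the left half's result first, the sketch also works for non-commutative operators such as string concatenation.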


Abstract machine

BSP-CAM = p × CAM + BSP instructions (SPMD style)

(Figure: p CAMs executing the same code, e.g. PUSH SWAP PID CONS APP SEND PUSH SWAP APP, with a communication phase between them.)
• PID: the PID of the machine, for mkpar
• SEND: the synchronous instruction, for put
• Minimal set of parallel instructions
• Equivalence with the distributed semantics


Certification of BSML programs

• The Coq proof assistant:
  – typed λ-calculus with dependent types
  – specification = term (goal)
  – language of tactics to build a proof of this goal
  – extraction of the proof (certified program)
• BSML and Coq:
  – axiomatization of the semantics of the primitives in Coq
  – proofs of BSML programs as usual proofs of ML programs
  – certification and extraction of BSML programs: a) broadcast, total exchange, … b) prefixes c) sort


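Example: replicate

Specification of replicate: for every value a, there exists a parallel vector whose components are all equal to a.

Proof script:

    intros T a. exists (mkpar T (fun pid: Z => a)). rewrite mkpar_def.

Certified extraction:

    let replicate a = mkpar (fun pid -> a)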


Extensions and parallel data structures


Outline

BSML is extended with:
• Parallel composition: new primitive, divide-and-conquer, properties (confluent semantics, two equivalent semantics)
• External memory (IO): new primitives, new cost model, property (confluent semantics)
• Parallel data structures, implemented with these extensions: simplify programming, OCaml interfaces, load-balancing


Multiprogramming

• Several programs on the same machine
• New primitives for parallel composition:
  – superposition
  – juxtaposition (implemented with the superposition)
• Divide-and-conquer BSP algorithms


Parallel superposition

• super : (unit -> 'a) -> (unit -> 'b) -> 'a * 'b
  super E1 E2 = (E1 (), E2 ())
• Fusion of communications/synchronizations
• Preserves the BSP model
• Pure functional semantics


Parallel superposition

(Figure: the successive communication and synchronization phases of E1 and of E2 on processors 0, 1 and 2, and those of super E1 E2, where the phases of E1 and E2 are fused.)

Properties: confluence, preservation of the BSP model, equivalence of the semantics


Example: parallel prefixes

(Plot: time (s) against size of the polynomials, for the direct version (BSML+MPI), the superposition version and the juxtaposition version.)


Parallel data structures

• Observations:
  – data structures are as important as algorithms
  – symbolic computations use these data structures massively
• A parallel implementation of data structures with:
  – interfaces as close as possible to the sequential ones
  – a modular implementation for straightforward maintenance
  – load-balancing of the data


Parallel data structures

• 5 modules: Set, Map, Stack, Queue, Hashtable
• Interfaces:
  – same as in OCaml
  – with some specific parallel functions such as parallel reductions
• A parallel data structure = one data structure on each processor
• Manual or automatic load-balancing:
  – to get similar sizes for the local data structures
  – better performance for parallel iterations
  – a two-super-step algorithm using histograms
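A parallel data structure in this sense can be sketched in OCaml as a vector of ordinary OCaml sets, one per processor. The sketch below simulates parallel vectors by arrays (p = 4), computes a global cardinal as a parallel reduction and computes balanced target sizes. Its rebalance function is a deliberately naive stand-in that gathers all elements and redistributes contiguous blocks; it is not the two-super-step histogram algorithm of the library, and it assumes that the local sets are disjoint. All names are hypothetical.

```ocaml
module IntSet = Set.Make (Int)

type 'a par = 'a array
let bsp_p () = 4
let mkpar f = Array.init (bsp_p ()) f

(* A parallel set: one ordinary OCaml set per processor *)
type pset = IntSet.t par

(* Parallel reduction (simulated): global cardinal = sum of local cardinals *)
let cardinal (s : pset) : int =
  Array.fold_left (fun acc local -> acc + IntSet.cardinal local) 0 s

(* Balanced sizes: every processor gets total/p elements and the first
   (total mod p) processors get one more *)
let targets (total : int) : int par =
  mkpar (fun i -> (total / bsp_p ()) + (if i < total mod bsp_p () then 1 else 0))

(* How many elements each processor must give (> 0) or receive (< 0) *)
let surplus (s : pset) : int par =
  let t = targets (cardinal s) in
  mkpar (fun i -> IntSet.cardinal s.(i) - t.(i))

(* Naive rebalancing: gather all elements, then cut them into blocks *)
let rebalance (s : pset) : pset =
  let all = Array.of_list (List.concat_map IntSet.elements (Array.to_list s)) in
  let t = targets (Array.length all) in
  let start = Array.make (bsp_p ()) 0 in
  for i = 1 to bsp_p () - 1 do
    start.(i) <- start.(i - 1) + t.(i - 1)
  done;
  mkpar (fun i -> IntSet.of_list (Array.to_list (Array.sub all start.(i) t.(i))))

(* All 10 elements start on processor 0 *)
let s0 : pset =
  mkpar (fun i ->
    if i = 0 then IntSet.of_list (List.init 10 (fun k -> k)) else IntSet.empty)
```

For s0 the local sizes are 10, 0, 0, 0; the targets are 3, 3, 2, 2, so the surplus is 7, -3, -2, -2, and after rebalancing each processor holds a set of the target size.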


Example

Computation of the "nth" nearest-neighbor atoms in a molecule:

(Plot: time (s) against number of atoms, for the sequential version and the parallel version (BSML+PUB).)


Example with load balancing

(Plot: time (s) against number of atoms, without and with balancing.)


External memories

Motivations:
(Plot: time (s) against number of elements, measured and predicted.)


The EM-BSP model

(Figure: the BSP architecture, P/M units connected by a network, where each processor also accesses its own disks, Disc 1 … Disc D, through a bus.)

We add to the BSP model:
• D = the number of disks
• B = the size of the blocks
• O = the latency of the disks
• G = the time to read/write a byte


Shared disks

(Figure: P/M units connected by a network, plus shared disks Disc 1 … Disc M.)

We add to the BSP model:
• Shared disks
• With parameters similar to those of the local disks


External memory in BSML

• For safety, two kinds of files: local and global ones
• New primitives to manipulate these files (IO primitives)
• New semantics (confluent) and EM-BSP cost of the primitives


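Modular implementation

(Diagram of BSMLlib, from the highest to the lowest level: parallel data structures; the standard library and the primitives; the Comm, Super and IO modules; lower-level libraries PUB, MPI, TCP/IP and threads.)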


Cost prediction

(Plot: time (s) against number of elements, for lists and arrays, with predicted (max) and predicted (avg) times.)


IO cost prediction

(Plot: time (s) against number of elements: predicted BSML, measured BSML-IO and predicted BSML-IO.)


Globalized operations


Outline

• BSML, desynchronized: MSPML
• BSML + MSPML: DMML
• For each: semantics, cost models, implementations


MSPML

• Uses the MPM model (parameters similar to those of BSP)
• But with a different execution model
• Same language as BSML (parallel vectors) but with new communication primitives: put, mget


MSPML

• Programming model: natural semantics (easy for proofs) and small-step semantics (easy for costs), similar to those of BSML
• Execution model: distributed semantics, which makes asynchronous steps appear and is very different from that of BSML


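Asynchronous communications

(Figure: processors 0, 1 and 2, whose environments of communications are initially empty; after some local computation each processor stores the values it communicates, tagged with a step number (0,v; 0,v'; 0,v''); get v 1 issues the request 0 1, which is answered with v'.)

A bit later: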


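Asynchronous communications

(Figure: the environments of communications of processors 0, 1 and 2 have grown, with entries such as 0,v; 0,v'; 1,w; 2,w''; the request 2 0 cannot be served yet: "Not ready".)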
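The slides describe each processor storing the values it makes available in an environment of communications, tagged with a step number, and a remote request being served only when the value for that step is present. The OCaml sketch below is a simplified, hypothetical model of that mechanism only: the names publish, request, Ready and Not_ready are inventions of this sketch and not MSPML primitives, and a real requester would wait rather than just observe "not ready".

```ocaml
(* Environment of communications of one processor:
   (step, value) pairs, most recent first *)
type 'a env = (int * 'a) list

type 'a answer = Ready of 'a | Not_ready

(* Processor i makes value v available for communication step `step` *)
let publish (envs : 'a env array) ~(i : int) ~(step : int) (v : 'a) : unit =
  envs.(i) <- (step, v) :: envs.(i)

(* Request to processor j for the value it made available at step `step`:
   it is served only if j has already reached that step *)
let request (envs : 'a env array) ~(j : int) ~(step : int) : 'a answer =
  match List.assoc_opt step envs.(j) with
  | Some v -> Ready v
  | None -> Not_ready

let () =
  let envs : string env array = Array.make 3 [] in
  publish envs ~i:1 ~step:0 "v'";
  (match request envs ~j:1 ~step:0 with
   | Ready v -> Printf.printf "served: %s\n" v
   | Not_ready -> print_endline "not ready");
  (* processor 0 has not reached step 2 yet *)
  match request envs ~j:0 ~step:2 with
  | Ready v -> Printf.printf "served: %s\n" v
  | Not_ready -> print_endline "not ready"
```

Keeping old entries in the environment is what lets a slow processor still obtain a value published earlier, without a global synchronization barrier.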


Departmental meta-computing

(Figure: several BSP units, each programmed in BSML, connected by an intranet and coordinated in an MSPML style.)


Departmental Meta-computing ML

• BSML, plus an MSPML-like layer for coordination
• Two kinds of vectors: parallel vectors (par) and departmental vectors (dep)
• Operational semantics (confluent)
• Performance model (the DMM model)
• Implementation


Example: departmental prefixes

• Computation of the prefixes where each processor contains a value
• Naive method: each processor sends its value to the other processors
• Better method:
  1) each BSP unit computes a parallel prefix
  2) one processor of each BSP unit receives the values of the other units
  3) each BSP unit finishes its computation with this value


Experiments

(Plot: time (s) against size of the polynomials, for the better algorithm, the naive algorithm and the BSP algorithm on one cluster.)


Conclusion and future work


Conclusion

I. Semantics of BSML:
  1) confluent and equivalent semantics
  2) abstract machine
  3) proof of BSML programs
II. Expressivity:
  1) parallel composition
  2) parallel data structures
  3) parallel IO
III. Meta-computing:
  1) desynchronization of BSML (MSPML)
  2) Departmental Meta-computing ML (DMML)

For each: semantics, cost models, implementations


Future work in the Propac project

• Cost prediction:
  1. static analysis of the programs
  2. cost prediction of certified programs
• Proofs of BSP imperative programs:
  (Diagram: IMP and ML, their extension with BSP operations (BSML for ML), and the proof of program correctness in Coq through an extension of the logical assertions.)


• Design of parallel model checkers for high-level Petri nets
• Using BSML to implement a toolkit:
  a) using the BSP model to load-balance dynamically
  b) using a modular and generic implementation to ease the use of this toolkit
• Using the Propac tools to certify this implementation

VITE project (Vérification efficace par Interaction de Techniques: efficient verification through the interaction of techniques)

Thank you for your attention


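BSML and MSPML

(Diagram: for BSML and MSPML, the programming model consists of the natural semantics, used for proofs of programs with Coq, and the small-step semantics, useful for costs; the execution model consists of the distributed semantics and the CAM; the cost models are BSP and MPM; the implementations rely on PUB, MPI and TCP/IP, and on TCP/IP.)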


Petri nets

(Figure labels: place, token, transition, state, arc.)


(Diagram of the work around BSML: high-level semantics (natural, small-step, distributed); parallel semantics and distributed evaluation; abstract machines and design of the BSP-CAM; sequential and parallel implementations; performance model and dynamic cost analysis; Coq axiomatisation and proofs of BSML programs; Propac.)
