BOINC Workshop 10

Hien Nguyen, Eshwar Rohit

University of Houston

Supervisors:

Dr. Jaspal Subhlok

University of Houston

Dr. David P. Anderson

SSL – U.C. Berkeley

Enabling interprocess communication for BOINC applications

RESEARCH GOAL

Hien Nguyen University of Houston

•Enable BOINC to efficiently support apps that require interprocess communication.

Goals:
•Easier programming for communicating applications
•Reduce execution time (not increase throughput)

Example Applications

REMD (Replica Exchange Molecular Dynamics) protein folding application: each process runs a standard molecular simulation at a different temperature.

Temperature held by each process (P1 to P8) at each step:

        P1   P2   P3   P4   P5   P6   P7   P8
STEP-1  270  280  290  300  310  320  330  340
STEP-2  280  270  290  300  320  310  340  330
STEP-3  280  290  270  300  320  340  310  330
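
The temperature rows above are consistent with exchanges between neighbouring processes, with the pairing alternating from one step to the next. The sketch below reproduces the three rows under that reading. The function name `exchange_step` and the sets of accepted pairs are assumptions read off the numbers; the acceptance test that a real REMD code would apply is left out.

```python
def exchange_step(temps, phase, accepted):
    """Swap temperatures between neighbouring processes.

    phase 0 pairs (P1,P2), (P3,P4), ...; phase 1 pairs (P2,P3), (P4,P5), ...
    accepted holds the 0-based index of the left process of each pair
    whose exchange goes ahead (taken from the slide, not computed here).
    """
    new = list(temps)
    for left in range(phase, len(new) - 1, 2):
        if left in accepted:
            new[left], new[left + 1] = new[left + 1], new[left]
    return new


step1 = [270, 280, 290, 300, 310, 320, 330, 340]
step2 = exchange_step(step1, phase=0, accepted={0, 4, 6})
step3 = exchange_step(step2, phase=1, accepted={1, 5})
```

Every exchange is a synchronization point between two processes, which is why this kind of application needs interprocess communication.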

Example Applications

Or many other applications:

•Differential equation solvers (grid) (synchronous)
•Game playing with alpha/beta pruning (asynchronous)
•Search applications
•...

Suitable applications: those with a moderate amount of communication.

[Figure label: Synchronization point]

DIFFICULTIES

[Figure: job execution on a fast host and a slow host; X marks failures]

•Slow hosts and failures (X) slow down overall execution speed.
•The problem gets worse as the number of hosts increases.

OUTLINE

1. Volpex Dataspace Overview
  •IPC for volunteer environment

2. Integration With BOINC
  •Process management
  •Host selection

3. Future and Related Work

Volpex Dataspace

•Dataspace: a global shared space that processes can use to exchange information without temporal or spatial coupling.

[Figure: one process issues Put(ABC, 800) to the Volpex Dataspace Server; another issues Get(ABC, ?) and receives 800.]
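
A minimal in-process sketch of the Put/Get idea, assuming a dictionary-backed store. The class name `Dataspace` and its methods are hypothetical and are not the Volpex API; a real dataspace server would be reached over the network.

```python
class Dataspace:
    """Toy shared space: values are stored under a tag and read back later."""

    def __init__(self):
        self._items = {}

    def put(self, tag, value):
        # The writer does not need to know who will read the value, or when.
        self._items[tag] = value

    def get(self, tag, default=None):
        # The reader only needs the tag, not the identity of the writer.
        return self._items.get(tag, default)


space = Dataspace()
space.put("ABC", 800)       # one process publishes a value
value = space.get("ABC")    # another process reads it later
```

Because reader and writer meet only through the tag, neither has to be running at the same time as the other or know where the other runs.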

Volpex Dataspace – Fault Tolerance

[Figure: Put(ABC, 800) and Get(ABC, ?) returning 800, issued by replicated processes; one instance is marked X.]

Unlike Linda and its variants, the Volpex DSS is designed to support redundant Put/Get operations.
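
One way a server could tolerate redundant operations from replicated processes is to key each operation by process ID and a per-process request number, so that a replica repeating an earlier operation sees the same result as the first instance did. The sketch below illustrates that idea only; the request-numbering scheme and all names are assumptions, not the Volpex DSS protocol.

```python
class ReplicaAwareDataspace:
    """Toy store in which repeated Put/Get requests are harmless."""

    def __init__(self):
        self._items = {}        # tag -> current value
        self._put_seen = set()  # (proc_id, request_no) of puts already applied
        self._get_log = {}      # (proc_id, request_no) -> value first returned

    def put(self, proc_id, request_no, tag, value):
        key = (proc_id, request_no)
        if key in self._put_seen:
            return False        # redundant put from a replica: ignore it
        self._put_seen.add(key)
        self._items[tag] = value
        return True

    def get(self, proc_id, request_no, tag):
        key = (proc_id, request_no)
        if key not in self._get_log:
            self._get_log[key] = self._items.get(tag)
        return self._get_log[key]   # replicas replaying a get see the logged value


dss = ReplicaAwareDataspace()
first = dss.put(proc_id=1, request_no=0, tag="ABC", value=800)
repeat = dss.put(proc_id=1, request_no=0, tag="ABC", value=800)  # replica of process 1
seen_by_original = dss.get(proc_id=2, request_no=0, tag="ABC")
dss.put(proc_id=1, request_no=1, tag="ABC", value=900)           # later update
seen_by_replica = dss.get(proc_id=2, request_no=0, tag="ABC")    # replica replays request 0
```

In this sketch the slower replica of process 2 still receives 800 for request 0, even though the tag has since been overwritten, so both instances follow the same execution path.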

Volpex Dataspace

•Why centralized? A central server raises scale issues.

•But: volunteer hosts sit behind firewalls and accept no incoming connections.

INTEGRATION WITH BOINC

•Mechanics: process management
  •Simultaneous process starting
  •Fault tolerance
  •Checkpoint/restart

•Policy: host selection

Job execution scheme

[Figure: job execution scheme. Components: BOINC Scheduler, Volpex Dataspace Server, and a node marked X. Labeled interactions: Get work; Put data item; Get data item; Get checkpoint; Put checkpoint.]

PROCESS MANAGEMENT

•Simultaneous process starting:

All processes start computation together. This reduces wasted resources, because processes have to wait for each other.

Volpex jobs have the highest (infinite) priority: other jobs cannot interrupt them.

While waiting for all processes of a Volpex job to be ready, a host can run other, finite-priority volunteer jobs.

Uses boinc_temporary_exit().
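
The start-up logic can be sketched as a barrier kept by a coordinator: a process begins computing only once every process of the job has reported ready, and until then it gives the host back to other work. The Python below is a simulation under assumptions. `StartBarrier`, `report_ready`, and the retry delay are hypothetical names and values, and the point where a real application would call `boinc_temporary_exit()` is only marked in a comment.

```python
class StartBarrier:
    """Tracks which processes of one job are ready to start."""

    def __init__(self, num_processes, retry_delay_seconds=300):
        self.num_processes = num_processes
        self.retry_delay_seconds = retry_delay_seconds
        self._ready = set()

    def report_ready(self, proc_id):
        self._ready.add(proc_id)
        if len(self._ready) == self.num_processes:
            return ("start", 0)
        # A real application would call boinc_temporary_exit() at this point,
        # so the host can run other jobs and check again after the delay.
        return ("yield", self.retry_delay_seconds)


barrier = StartBarrier(num_processes=3)
answers = [barrier.report_ready(p) for p in (0, 1, 0, 2)]
```

Processes 0 and 1 are told to yield until process 2 arrives; on their next check they receive "start" as well.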

PROCESS MANAGEMENT

•Fault tolerance

Dead instances are detected by a heartbeat mechanism: process instances regularly send heartbeats to the Volpex DSS.

The BOINC scheduler recruits a new volunteer host to replace the dead one.
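
A heartbeat check of this kind can be sketched as follows, assuming the server records the time of the last heartbeat per process and treats an instance as dead once a timeout has passed. `HeartbeatMonitor`, the 60-second timeout, and the explicit `now` arguments are illustrative assumptions, not details from the slides.

```python
class HeartbeatMonitor:
    """Records the last heartbeat per process and reports silent ones."""

    def __init__(self, timeout_seconds):
        self.timeout_seconds = timeout_seconds
        self._last_seen = {}

    def heartbeat(self, proc_id, now):
        self._last_seen[proc_id] = now

    def dead_instances(self, now):
        # An instance is presumed dead if it has been silent longer than the timeout.
        return sorted(
            proc_id
            for proc_id, seen in self._last_seen.items()
            if now - seen > self.timeout_seconds
        )


monitor = HeartbeatMonitor(timeout_seconds=60)
monitor.heartbeat("P1", now=0)
monitor.heartbeat("P2", now=0)
monitor.heartbeat("P1", now=50)
dead = monitor.dead_instances(now=100)
```

Here P2 has been silent for 100 seconds and is reported as dead, which is the point at which a replacement host would be requested.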

PROCESS MANAGEMENT

•Hot spare policy:

[Figure: BOINC Scheduler, Volpex Dataspace Server, and a hot spare group; a failed instance (X) gets a fast replacement from the hot spare group.]

PROCESS MANAGEMENT

•Checkpointing:

Each process instance commits and uploads checkpoints to the Volpex DSS, which stores only the latest checkpoint for each process.

A restarted process instance requests its checkpoint from the Volpex DSS.
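
The "latest checkpoint only" rule can be sketched as a store with one slot per process. The class `CheckpointStore` and the use of an increasing sequence number to decide which checkpoint is newer are assumptions made for illustration.

```python
class CheckpointStore:
    """Keeps one checkpoint per process: the most recent one committed."""

    def __init__(self):
        self._latest = {}   # proc_id -> (sequence_no, data)

    def commit(self, proc_id, sequence_no, data):
        current = self._latest.get(proc_id)
        # Overwrite only with a newer checkpoint, so a late upload from a
        # slower replica cannot roll the process back.
        if current is None or sequence_no > current[0]:
            self._latest[proc_id] = (sequence_no, data)

    def fetch(self, proc_id):
        return self._latest.get(proc_id)


store = CheckpointStore()
store.commit("P3", 1, b"state after step 100")
store.commit("P3", 2, b"state after step 200")
store.commit("P3", 1, b"state after step 100")   # stale upload, ignored
restart_point = store.fetch("P3")
```

A restarted instance of P3 would resume from sequence number 2; a process with no checkpoint yet gets `None` and starts from the beginning.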

HOST SELECTION

•Volpex job: a set of processes, submitted together by a scientist, that form one job.
•Has requirements on:
  •Deadline
  •Number of processes
•Has estimates of:
  •Total flops per process
  •Flops between 2 consecutive checkpoints
  •Memory usage, disk usage

HOST SELECTION POLICY

Criteria for selecting volunteer hosts to assign to a Volpex job: speed and availability.

Availability: the length of the interval during which a host is available without interruption (that is, the BOINC client is allowed to compute).

HOST SELECTION POLICY

Job’s minimum requirements:

•Minimum speed: the host must be fast enough to meet the job’s deadline.
Min speed = Total flops / Deadline

•Minimum expected availability: the host must be continuously available for x hours, long enough to commit at least 1 checkpoint.
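
The two minimum requirements can be written as a small eligibility check. This is a sketch under assumptions: the slides do not say how x is derived, so here it is taken to be the time the host needs to compute the flops between two checkpoints. All class and field names, and the example numbers, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class JobSpec:
    total_flops_per_process: float
    flops_between_checkpoints: float
    deadline_seconds: float


@dataclass
class Host:
    name: str
    flops_per_second: float
    expected_available_seconds: float


def min_speed(job):
    # Min speed = Total flops / Deadline
    return job.total_flops_per_process / job.deadline_seconds


def meets_minimums(host, job):
    if host.flops_per_second < min_speed(job):
        return False
    # Assumed reading of "x": time this host needs to reach one checkpoint.
    seconds_per_checkpoint = job.flops_between_checkpoints / host.flops_per_second
    return host.expected_available_seconds >= seconds_per_checkpoint


job = JobSpec(total_flops_per_process=3.6e13,
              flops_between_checkpoints=3.6e12,
              deadline_seconds=36_000)
hosts = [
    Host("fast", flops_per_second=2e9, expected_available_seconds=3_600),
    Host("slow", flops_per_second=5e8, expected_available_seconds=86_400),
    Host("flaky", flops_per_second=2e9, expected_available_seconds=600),
]
eligible = [h.name for h in hosts if meets_minimums(h, job)]
```

With these numbers the minimum speed is 1e9 flops per second. The slow host fails the speed test, and the flaky host is fast enough but is not expected to stay up for the 1800 seconds it needs to reach a checkpoint.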

Evaluate Host's Availability

•We want to predict the length of a host's availability interval.

•Method based on: "Exploiting Non-Dedicated Resources for Cloud Computing" by Artur Andrzejak, Derrick Kondo, and David P. Anderson (NOMS 2010).

Evaluate Host's Availability

Last value predictor: a simple predictor that uses the availability value from the last hourly interval before the prediction as the predicted availability for the next x-hour interval.

This is combined with ranking hosts by predictability: the number of availability changes per week.
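
Both ideas fit in a few lines, assuming each host's history is a list of hourly samples (1 for available, 0 for unavailable). The data layout and function names are assumptions; the cited paper describes the actual method.

```python
HOURS_PER_WEEK = 24 * 7


def last_value_prediction(hourly):
    # Predict the next interval to look like the most recent hour.
    return hourly[-1]


def changes_per_week(hourly):
    changes = sum(1 for a, b in zip(hourly, hourly[1:]) if a != b)
    return changes / (len(hourly) / HOURS_PER_WEEK)


def rank_hosts(histories):
    """Hosts predicted available, most predictable (fewest changes) first."""
    candidates = [name for name, h in histories.items()
                  if last_value_prediction(h) == 1]
    return sorted(candidates, key=lambda name: changes_per_week(histories[name]))


histories = {
    "steady": [1] * 168,          # available all week
    "flappy": [0, 1] * 84,        # toggles every hour, currently available
    "down":   [1] * 167 + [0],    # just became unavailable
}
ranking = rank_hosts(histories)
```

The host that never changed state ranks first, the host that toggles every hour ranks last among the candidates, and the host that just went down is not considered.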

Evaluate Host's Availability

In essence: select hosts which change availability very rarely.

A process assigned to a host with high predictability does not necessarily need to be replicated.

IMPLEMENTATION STATUS

•Volpex utilities: let scientists submit, abort, or query the status of a Volpex job.

•Modified BOINC scheduler: includes host selection for Volpex jobs.

•Modified Volpex DSS: handles new types of requests.

IMPLEMENTATION STATUS

[Figure: implementation architecture. Components: Scientist, BOINC Server, Database, BOINC Scheduler, Volpex Dataspace Server, hot spare group, and a failed instance marked X. Labeled interactions: submit job specs; create job & WU; scheduler request; scheduler reply; dynamically create result from WU; heartbeat; checkpoint; request procID; get procID; procID of failed instance; replacement.]

FUTURE WORK

Experiment and evaluate with different degrees of freedom:

•Number of processes: 10 to 1M
•Communication pattern (local/global, synchronous/asynchronous)
•Size and frequency of communication

Goal: study (via live experiment or simulation) the performance of Volpex/BOINC over this space

Applications: Eratosthenes, REMD Protein Folding

Enhance host selection policy

OTHER WORK

Volpex MPI:
•An MPI library designed for executing parallel applications in a volunteer environment.
•Direct communication between processes.
•Key features:
  •Controlled redundancy
  •Receiver-based direct communication
  •Distributed sender-based logging

More detail: “VolpexMPI: an MPI Library for Execution of Parallel Applications on Volatile Nodes” by Troy LeBlanc, Rakhi Anand, Edgar Gabriel, and Jaspal Subhlok.

Do you have an application?

We would be happy to cooperate with you.

Our team contacts:
•Dr. Jaspal Subhlok: jaspal@uh.edu
•Dr. David Anderson: davea@ssl.berkeley.edu
•Dr. Edgar Gabriel: gabriel@cs.uh.edu
•Hien Nguyen: hien.nguyen.nx@gmail.com
•Eshwar Rohit: eshwar.rohit@gmail.com
•Rakhi Anand: rakhi@cs.uh.edu

Our Website: http://www2.cs.uh.edu/~jsteach/volpex/index.htm
