Parallel System for Interactive Multi-Experiment Computational Studies (pSIMECS)


Simecs – Problem Description

● Multi-Experiment Computational Studies:

– Computational studies involving multiple experiments, each corresponding to an individual execution of a simulation software

● Example: Design Space Exploration

– Goal: Given a set of possible parameter values (a parameter space) and an experiment that maps a parameter value to a performance metric, find a subset of the parameter space whose performance metrics fit certain criteria.

Simecs – Problem Description

● Model Application: Pareto Frontier Discovery.

● The Pareto frontier is the set of points in the parameter space that are not completely dominated by any other point in the parameter space.

– p “completely dominates” q iff every component of p's performance metric is better than the corresponding component of q's.
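The dominance test and the resulting frontier can be sketched in a few lines of Python. This is a minimal illustration, not the Simecs code: it assumes every component of the performance metric is minimised (smaller is better), and the names `completely_dominates`, `pareto_frontier`, and the sample data are hypothetical.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, ...]    # a point in parameter space
Metric = Tuple[float, ...]   # its performance metric (assumed: smaller is better)


def completely_dominates(p: Metric, q: Metric) -> bool:
    """True if every component of p is strictly better (smaller) than q's."""
    return all(a < b for a, b in zip(p, q))


def pareto_frontier(results: Dict[Point, Metric]) -> List[Point]:
    """Parameter points whose metric is not completely dominated by any other."""
    return sorted(
        x for x, m in results.items()
        if not any(completely_dominates(other, m) for other in results.values())
    )


# Hypothetical results: parameter point -> (deflection, cost)
sample_results = {
    (0.2, 0.8): (3.0, 5.0),
    (0.3, 0.7): (2.0, 6.0),
    (0.5, 0.5): (4.0, 7.0),   # worse than (0.2, 0.8) in both components
}
frontier = pareto_frontier(sample_results)
```

Here `(0.5, 0.5)` drops out because `(0.2, 0.8)` beats it in both components, while the other two points trade deflection against cost and both stay on the frontier.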

Simecs – Pareto Frontier Insights

● Simulations are independent – embarrassingly parallel

● An experiment corresponds to an execution of a simulation software, which can itself be parallel or sequential

● The result of one simulation can be used to speed up simulations at nearby parameter values (e.g., as the initial guess for a Newton iteration).

Simecs – Pareto Frontier Insights

● Decisions can be made with imprecise results: precision can be traded off against resources

● If the parameter space is large, sweeps are inefficient.

● Need to prune portions of the space as the study progresses, either automatically or interactively.

● Active Sampler can automatically pick "interesting" simulations (e.g., close to boundary)

Simecs – Example Problem

● Bridge design computational study: 1D bridge in 2D space, with end points clamped. Two elastic supports are added in the middle of the bridge.

● Parameter space: distance of the two supports from the end of the bridge.

● Performance measures: maximum deflection of the bridge, and the cost of supports

● Bridge is clamped at all support points, with bending and stretching forces, and uniform load.

Simecs – Example Problem

Test Problem.

Parameter: $\langle r_0, r_1 \rangle$

Performance metric: $\langle \max_{0<r<L} f(r),\; c(r_0) + c(r_1) \rangle$

Cost function: $c(r)$

Simecs – Goal

● Simecs: Software on parallel systems that manages simulation processes in a Multi-Experiment Computational Study.

● Frees users and application developers from micromanaging every simulation process

● Goal: Interactive, Steerable Design Space Exploration

Simecs – User View

● Two types of parameters

– technique parameters (e.g., discretisation of nodes, convergence tolerance)

– model parameters (e.g., Young's modulus of a material, viscosity of a fluid).

● Goal: As the Pareto frontier obtained from one set of parameters is forming, the user can switch to another setup and continue the study.

– e.g., limit the exploration space but increase the resolution.

Simecs – Developer View

● Application Developer provides 3 modules:

– Simulation: Maps a parameter space point to a performance space point

– Visualisation & interaction: Displays the relevant information to the user; collects information from the user and maps it into the Simulation module

– Transformation: Transforms the state of a simulation under one set of technique parameters into the state under another.

● e.g., interpolate checkpoints from different resolutions
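As an illustration of what a Transformation module might do, the sketch below resamples a 1D checkpoint onto a grid of a different resolution by linear interpolation. It assumes the checkpoint is a list of nodal values on a uniform grid; the function name `resample_checkpoint` is hypothetical and the slides do not say which interpolation scheme is used.

```python
from typing import List


def resample_checkpoint(values: List[float], new_size: int) -> List[float]:
    """Linearly interpolate nodal values from a uniform grid onto new_size nodes."""
    if len(values) < 2 or new_size < 2:
        raise ValueError("need at least two nodes on both grids")
    # Position of each new node, measured in units of the old grid spacing.
    scale = (len(values) - 1) / (new_size - 1)
    out = []
    for i in range(new_size):
        x = i * scale
        lo = min(int(x), len(values) - 2)   # left neighbour on the old grid
        t = x - lo                          # fractional distance past it
        out.append((1 - t) * values[lo] + t * values[lo + 1])
    return out


coarse = [0.0, 1.0, 0.0]                 # checkpoint on 3 nodes
fine = resample_checkpoint(coarse, 5)    # same state on 5 nodes
```

The end nodes are preserved exactly, so clamped boundary values survive the change of resolution.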

Simecs – System View

● Shared object layer, Active sampler, Resource Allocator

Simecs – System View

● Shared object space layer: System-wide repository of shared objects (e.g., checkpoints, error estimations, results)

● Sampler: Based on users' specifications, issues sample points where simulations will be run

● Resource Allocator / Manager: Maps simulations into computing elements, decides whether to use a checkpoint.

Simecs – SISOL

● Spatially-Indexed Shared Object Layer (SISOL)

● Used for storing system-wide shared objects.

● For the model problem: checkpoints and results (performance metric at each parameter point).

● <Index, object set id> names a unique object in the system.

Simecs – SISOL

● Objects are typed: SISOL requires pack() and unpack() implementations for each type. For parallel object types, also requires a function to map parallel objects into different decompositions.

● Supports split-phase create, delete, read and write: to enforce read-modify-write consistency

● Supports neighborhood query

Simecs – SISOL Implementation

● Ideal implementation: directory-based cache, where each node participates in storing of objects.

● Current implementation:

– Single TCP server

– In core

– Hash-map based lookup

– Linear lookup for nearest neighbor

– Supports only sequential objects

Simecs – SISOL Implementation

– Object sets created on server

– Nearest neighbor query retrieves coordinates only

– Supports Sequential PETSc Vector object type by default.

● Sufficient for small sets, small objects
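The in-core part of such a store can be sketched as follows. This is an assumption-laden toy, not the SISOL API: the class and method names are hypothetical, `pickle` stands in for the per-type pack()/unpack() functions, and the TCP server and split-phase operations are left out entirely. It keeps the properties listed above: objects are named by (index, object set id), lookup is hash-map based, and the nearest-neighbor query is a linear scan that returns coordinates only.

```python
import math
import pickle
from typing import Any, Dict, Optional, Tuple

Index = Tuple[float, ...]   # spatial index, e.g. a parameter point


class ObjectStore:
    """Toy in-core store: (index, object set id) names one object."""

    def __init__(self) -> None:
        self._sets: Dict[str, Dict[Index, bytes]] = {}

    def create_set(self, set_id: str) -> None:
        self._sets.setdefault(set_id, {})

    def write(self, set_id: str, index: Index, obj: Any) -> None:
        self._sets[set_id][index] = pickle.dumps(obj)    # stand-in for pack()

    def read(self, set_id: str, index: Index) -> Any:
        return pickle.loads(self._sets[set_id][index])   # stand-in for unpack()

    def nearest(self, set_id: str, query: Index) -> Optional[Index]:
        """Linear scan over the set; returns the closest index, not the object."""
        keys = self._sets[set_id]
        if not keys:
            return None
        return min(keys, key=lambda k: math.dist(k, query))


store = ObjectStore()
store.create_set("checkpoints")
store.write("checkpoints", (0.2, 0.8), [1.0, 2.0])
store.write("checkpoints", (0.6, 0.4), [3.0])
closest = store.nearest("checkpoints", (0.25, 0.75))
```

Returning only the coordinates lets the caller decide whether the neighbor is close enough before paying for the transfer of the object itself.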

Simecs – SISOL Use

● Current Pareto Frontier problem uses two object sets:

– Result set (parameter point => performance metric)

– Checkpoint set (parameter point => Sequential PETSc vectors)

● In the test problem, parameter point is a 2D vector, so result set & checkpoint set have 2D indices.

Simecs – FUEL

● Frame/Update Exchange Layer: Control layer between the manager and simulation processes

● Codes that represent a functional aspect of a steerable application are grouped together (called a Satellite).

● Event-based on manager process; Poll-based on simulation processes

● Dynamic model: Satellites can be activated and decommissioned as a simulation is running

Simecs – FUEL Interaction

● As simulator runs one simulation for a parameter point, the manager is processing the last one(s).

[Figure: timeline of the two processes]

Simulator process: Calculate point X, then Calculate point Y, then Calculate point Z

Manager process: Query Sampler, get point Y; then Register X result, Query Sampler, get point Z; then Register Y result, Query Sampler, get point A

Messages: the simulator sends the X result and receives Y, sends the Y result and receives Z, sends the Z result and receives A

Simecs – Active Sampler

● Resolves the Pareto frontier progressively

– Maintains a task queue and a result set

– Task queue = points in parameter space of interest, result set = points discovered so far that are undominated (i.e., current Pareto set candidates)

– Seeds the task queue with points from a lattice on the parameter space.

– Runs the task queue.

Simecs – Active Sampler

– For each result that comes back, decide whether the point is undominated by all points in the result set. If so, remove all points in the result set that are dominated by it, add it to the result set, and insert its lattice neighbors into the task queue.

– Continue until the task queue is empty.

– Refine the lattice, then repeat

● Effect: the result set contains a set of Pareto point candidates that originated from a lattice. The lattice becomes finer as more time is spent.
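The steps above can be sketched sequentially in Python. Several details are assumptions rather than facts from the slides: the lattice is a 2D integer grid on [0, size] x [0, size], refinement halves the lattice stride, the queue is re-seeded after each refinement with the finer-lattice neighbors of the current candidates, and all metrics are minimised. The names `active_sample` and `toy_simulation` are hypothetical.

```python
from collections import deque
from typing import Callable, Dict, Tuple

Point = Tuple[int, int]      # lattice coordinates in parameter space
Metric = Tuple[float, ...]   # assumed: smaller is better


def dominates(p: Metric, q: Metric) -> bool:
    return all(a < b for a, b in zip(p, q))


def active_sample(simulate: Callable[[Point], Metric], size: int, stride: int):
    """Return (result set, number of simulations run)."""
    results: Dict[Point, Metric] = {}
    seen = set()
    queue = deque()

    def push(p: Point) -> None:
        if 0 <= p[0] <= size and 0 <= p[1] <= size and p not in seen:
            seen.add(p)
            queue.append(p)

    def push_neighbors(p: Point, step: int) -> None:
        for di, dj in ((step, 0), (-step, 0), (0, step), (0, -step)):
            push((p[0] + di, p[1] + dj))

    # Seed the task queue with the coarsest lattice.
    for i in range(0, size + 1, stride):
        for j in range(0, size + 1, stride):
            push((i, j))

    while stride >= 1:
        while queue:                                  # run the task queue
            p = queue.popleft()
            m = simulate(p)
            if any(dominates(r, m) for r in results.values()):
                continue                              # p is dominated: drop it
            for q in [q for q, r in results.items() if dominates(m, r)]:
                del results[q]                        # p displaces old candidates
            results[p] = m
            push_neighbors(p, stride)
        stride //= 2                                  # refine the lattice
        if stride >= 1:
            for p in list(results):
                push_neighbors(p, stride)             # assumed re-seeding rule
    return results, len(seen)


def toy_simulation(p: Point) -> Metric:
    return (float(p[0]), float(p[1]))                 # stand-in for a real run


frontier, runs = active_sample(toy_simulation, size=8, stride=4)
```

In the real system the queue would be drained by many simulation processes at once, so results arrive out of order; the sketch runs them one at a time only to keep the bookkeeping visible.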

Simecs – Active Sampler

[Figures: initial grid; 1st level results; first-level Pareto frontier; first refinement; 2nd level results; second-level Pareto frontier; 2nd refinement; 3rd level results; 3rd level Pareto frontier]

Simecs – Manager

● Spawns off simulation processes

● When the result of a simulation comes back (via a FUEL callback):

– Registers the result

– Asks the active sampler for the next point to run

– Looks up the SISOL for a checkpoint to jump-start the next point

– Sends the parameters of the next simulation, coordinates of the checkpoint, and error tolerances to the simulation process.
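The callback sequence can be sketched as below. Everything here is a stand-in chosen for illustration: the `Manager` class, the `next_point` callable that plays the role of the sampler, the plain dictionary that plays the role of the SISOL checkpoint set, and the shape of the returned message are all hypothetical, and no FUEL or TCP machinery is modelled.

```python
import math
from typing import Callable, Dict, Optional, Tuple

Point = Tuple[float, ...]
Metric = Tuple[float, ...]


class Manager:
    def __init__(self, next_point: Callable[[], Optional[Point]],
                 checkpoints: Dict[Point, object], tolerance: float) -> None:
        self.next_point = next_point      # stand-in for the sampler
        self.checkpoints = checkpoints    # stand-in for the checkpoint set
        self.tolerance = tolerance
        self.results: Dict[Point, Metric] = {}

    def on_result(self, point: Point, metric: Metric) -> Optional[dict]:
        """Called when a simulation process reports a finished point."""
        self.results[point] = metric                  # register the result
        nxt = self.next_point()                       # ask for the next point
        if nxt is None:
            return None                               # nothing left to run
        ckpt = None
        if self.checkpoints:                          # nearest stored checkpoint
            ckpt = min(self.checkpoints, key=lambda k: math.dist(k, nxt))
        # Message for the simulation process.
        return {"point": nxt, "checkpoint": ckpt, "tolerance": self.tolerance}


pending = iter([(0.5, 0.5)])
manager = Manager(lambda: next(pending, None),
                  {(0.4, 0.4): "ckpt-a", (0.9, 0.9): "ckpt-b"}, 1e-6)
message = manager.on_result((0.1, 0.1), (1.0, 2.0))
```

Only the checkpoint's coordinates go into the message, matching the bullet above; the simulation process would fetch the checkpoint object itself from the shared object layer.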

Simecs – Test System

● Single Server implementation of SISOL to store checkpoint set

● 3 sampler versions: Active, Random, and Sweep

● TCP-based FUEL

● Simulation implemented with PETSc SNES solver.

● Jump-start from checkpoints = use the checkpoint's configuration as the starting guess

Simecs – Test System

● Heterogeneous cluster:

– 1 × 1.5 GHz Athlon node (manager, SISOL server)

– 22 × 1.2 GHz Duron nodes (simulation processes)

– 10 × 3 GHz Pentium 4 nodes (simulation processes)

– 100 Mbps switched Ethernet network between Athlon and Duron nodes, 10 Mbps Ethernet between Pentium 4 nodes.

Simecs – Test Result (Sampler)

● Active Sampler compared against: 1) Grid-based sampler, which performs a parameter sweep on the grid with increasing refinement, 2) Random sampler

● Each sampler runs for 1500 simulations, and the partial frontiers are dumped at periodic intervals. The Hausdorff distance of each partial frontier is measured, using the final Active Sampler-based frontier with 1500 simulations as the ground truth.
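For reference, the Hausdorff distance between two finite point sets can be computed directly from its definition, as in this brute-force sketch. It assumes Euclidean distance between points; the slides do not state which metric or which space (parameter or performance) was used, and the function names and sample frontiers are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, ...]


def directed_hausdorff(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Largest distance from a point of a to its nearest point in b."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Symmetric Hausdorff distance between two non-empty point sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


partial = [(0.0, 0.0), (1.0, 0.0)]                  # partial frontier
reference = [(0.0, 0.0), (1.0, 0.0), (1.0, 3.0)]    # ground-truth frontier
distance = hausdorff(partial, reference)
```

The partial frontier here lies entirely on the reference, so the distance comes from the one reference point it has not yet discovered.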


Simecs – Test Result (Checkpoints)

● Jump-starting from checkpoints cuts down the number of iterations per simulation.
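The effect is easy to reproduce on a toy problem. The sketch below is a scalar Newton iteration, not the PETSc SNES solver used in the test system; the residual x^3 + x - p, the parameter values, and all names are made up for illustration. It solves the problem once for a nearby parameter, then compares a cold start against a start from that stored solution.

```python
from typing import Callable, Tuple


def newton(f: Callable[[float], float], df: Callable[[float], float],
           x0: float, tol: float = 1e-10, max_iter: int = 100) -> Tuple[float, int]:
    """Return (root, number of Newton steps taken)."""
    x = x0
    for step in range(max_iter):
        fx = f(x)
        if abs(fx) < tol:
            return x, step
        x -= fx / df(x)
    raise RuntimeError("Newton iteration did not converge")


def make_problem(p: float):
    """Toy residual x^3 + x - p and its derivative, for model parameter p."""
    return (lambda x: x ** 3 + x - p), (lambda x: 3 * x ** 2 + 1)


# Solution at a nearby parameter value plays the role of the checkpoint.
f_near, df_near = make_problem(9.9)
checkpoint, _ = newton(f_near, df_near, 0.0)

f, df = make_problem(10.0)
root_cold, cold_steps = newton(f, df, 0.0)          # no checkpoint
root_warm, warm_steps = newton(f, df, checkpoint)   # jump-start from checkpoint
```

Both runs reach the same root (x = 2 for p = 10), but the warm start begins inside the region of fast convergence and needs fewer steps.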

Simecs – Test Result (Scaling)

[Figure: scaling results as Duron nodes are added (slower speed, faster communication)]

Simecs – Conclusions

● Multiple experiments can be managed automatically

● Interactive speed can be achieved via re-use of checkpoints, active sampling, and partial results – run time goes from 3088 seconds down to 17 seconds, and lower if partial frontiers can be used

Simecs – Conclusions

● TCP-based communication framework gives the system portability: it can be used on heterogeneous clusters

● Spatially-indexed object sets are a useful communication substrate

Simecs – Future work

● Distributed implementation of SISOL

● Parallelise individual simulations (SISOL support for parallel objects)

● MPI-based communication for SISOL and FUEL

● Interactivity
