Data Structures and Algorithms in Parallel Computing Lecture 4

Data Structures and Algorithms in Parallel Computing

Lecture 4

Parallel computer

• A parallel computer consists of a set of processors that work together on solving a problem

• Moore’s law has broken down in 2007 because of increasing power consumption– O(f3) where f is the frequency– Tianhe-2 has 3.12 million processor cores and consumes 17.8

MW– It is as cheap to put 2 cores on the same chip as putting just one– 2 cores running at f/2 consume only ¼ of the power of a single

core running at f– However it is hard to achieve the same total computational

speed in practice

Bulk Synchronous Parallelism

• BSP model 1st proposed in 1989 • Alternative to the PRAM model• Used for distributed memory computers• Fast local memory access• Algorithm developers need not worry about

network details, only about global performance• Efficient algorithms which can be run on many

different parallel computers

• Processors + network + synchronization• Superstep– Concurrent parallel computation– Message exchanges between processors– Barrier synchronization

• All processors reaching this point wait for the rest

Supersteps

• A BSP algorithm is a sequence of supersteps– Computation superstep

• Many small steps– Example: floating point operations (addition, subtraction, etc.)

– Communication superstep• Communication operations each transmitting a data word

– Example: transfer a real number between 2 processors

• In theory we distinguish between the 2 types of supersteps

• In practice we assume a single superstep

Communication superstep

• h-relation– Superstep in which every processor sends and receives

at most h data words– h=max{hs, hr}

• hs is the maximum number of data words sent by a processor

• hd is the maximum number of data words received by a processor

• Cost– T(h)=hg+l

• Where g is the time per data word and l is the global synchronization time

Time of an h-relation on a 4-core Apple iMac desktop

• Taken from www.staff.science.uu.nl/~bisse101/Book/PSC/psc1_2.pdf

Computation superstep

• T=w+l– Where w is the maximum number of flops of a

processor in a superstep, and l is the global synchronization time

• Processors with less than w flops wait idle

Total cost of a BSP algorithm

• Add cost of all supersteps• a+bg+cl– g and l are a function of the number of processors– a, b, and c depend on p and the problem size n

BSP implementations

• Google Pregel• MapReduce• Apache Giraph– Open source implementation of Pregel

• Apache Hama– Inspired from Pregel

• BSPLib• …

Pregel framework

• Computations consist of a sequence of iterations called supersteps

• During a superstep, the framework invokes a user defined function for each vertex which specifies the behavior at a single vertex V and a single superstep S The function can: – Read messages sent to V in superstep S-1 – Send messages to other vertices that will be received in superstep S+1 – Modify the state of V and of the outgoing edges – Make topology changes (add/remove/update edges/vertices)

http://people.apache.org/~edwardyoon/documents/pregel.pdf

Model of computation: Progress

• In superstep 0 all vertices are active• Only active vertices participate in a superstep– They can go inactive by voting for halt– They can be reactivated by an external message

from another vertex• The algorithm terminates when all vertices

have voted for halt and there are no messages in transit

Model of computation: Vertex

Pregel execution

Pregel executionMaster-worker paradigm

1. Partition graphI. Random hash basedII. Custom (mincut, …)

2. Assign vertices to workers (processors)3. Mark all vertices as active4. Each active vertex executes a Compute() function and delivers messages

which were sent in the previous superstep1. Get/Modify current vertex value using GetValue()/MutableValue()

5. Respond to master with active vertices for the next superstep

All workers execute the same code! Master is used for coordination

Fault tolerance

• At the start of each superstep the master instructs workers to save their state

• Master pings workers to see who is running• If failure is detected, master reasigns

partitions to available workers

Example: find largest value in a graph

What’s next?

• BSP model– SSSP, connected components, pagerank

• Vertex centric vs. subgraph centric• Load balancing– Importance of partitioning and graph type

• ...

Data Structures and Algorithms in Parallel Computing Lecture 4

Documents

Parallel algorithms for Partial Differential Equations · Parallel Algorithms for Partial Differential Equations ... I GAMG solver: generalised ... Parallel Algorithms for Partial

Data Structures and Algorithms in Parallel Computing Lecture 1

Parallel and Concurrent Programming Classical Problems ... · Data structures and Algorithms Marwan Burelle Introduction Dining Philosophers Problem Tasks Systems Data Structures

Efficient Data-Structures and Parallel Algorithms for ...cerin/slideshow.pdf · PCS - COLIMA, Sept 20-21, 2004 Efficient Data-Structures and Parallel Algorithms for Association Rules

DASH: Distributed Data Structures and Parallel Algorithms in a … · 2020. 7. 30. · DASH: Distributed Data Structures and Parallel Algorithms in a Global Address Space Karl Fürlinger,

Advanced Topics in Algorithms and Data Structures Page 1 An overview of lecture 3 A simple parallel algorithm for computing parallel prefix. A parallel

Parallel Write-Efficient Algorithms and Data Structures ... · algorithms and data structures in computational geometry. Here, optimal write-efficiency means that the number of writes

2.1 Parallel Algorithms

Data Structures and Algorithms in Parallel Computing Lecture 3

Data Structures and Methods for Unstructured Distributed ...roystgnr/parmesh_talk.pdf · 1 Introduction 2 Data Structures Unstructured Meshing Distributed Meshing 3 Algorithms Parallel

DASH: A C++ PGAS Library for Distributed Data Structures and Parallel Algorithms (HPCC'16)

Data Structures and Algorithms in Parallel Computing Lecture 8

Parallel vs Sequential Algorithms. Design of efficient algorithms A parallel computer is of little use unless efficient parallel algorithms are available

Parallel Algorithms and Data Structures CS 448 , Stanford ... · Parallel Algorithms and Data Structures CS 448!, Stanford University 20 April 2010 John Owens Associate Professor,

Parallel Algorithms - Introduction Advanced Algorithms & Data Structures Lecture Theme 11 Prof. Dr. Th. Ottmann Summer Semester 2006

Data structures and algorithms for data-parallel

Parallel Peeling Algorithms - Yahoo Research Peeling Algorithms Jiayang Jiang Michael Mitzenmacher† Justin Thaler ‡ Abstract The analysis of several algorithms and data structures

Parallel Algorithms

CSE373: Data Structures & Algorithms Lecture 27: Parallel Reductions, Maps, and Algorithm Analysis Nicki Dell Spring 2014

(C) Ph. Tsigas 2003-2004© Ph. Tsigas 2003- 2004 Algorithm Engineering of Parallel Algorithms and Parallel Data Structures Philippas Tsigas