Parallel Architectures: Topologies
Heiko Schröder, 2003
Types of sequential processors (SISD)
[Figure: sequential processor organisations built from processor, cache and memory blocks]
Von Neumann bottleneck
SIMD and MIMD
[Figure: SIMD: several PEs under one global control unit, joined by an interconnection network. MIMD: each PE has its own control unit, and the PEs are joined by an interconnection network. Further labels: SPMD, SIMD.]
Message passing / shared address space
[Figure: message passing: nodes consisting of PE, memory M and control unit, joined by an interconnection network. Shared address space: processors P and memory modules M joined by an interconnection network. Further label: P/M.]
• Various communication networks
• State-of-the-art technology
• Important aspects of routing schemes
• Known results (theory)
The internet
Desirable features of a network
1. Algorithmic
• Low diameter (1 for the complete graph)
• High bisection width (complete graph)
Complete graph: n(n-1)/2 edges, degree n-1
2. Technical
• Low degree (pin limitations; constant degree; modular; mesh)
• Short wires (mesh)
• Small area (mesh)
• Regular structure (mesh)
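The trade-off between the algorithmic and the technical wish list can be checked on small graphs. The following is a minimal sketch, not part of the original slides: the helper names (`diameter`, `complete_graph`, `linear_array`, `edge_count`) are made up for illustration, and the graphs are assumed to be connected and given as adjacency lists.

```python
from collections import deque

def diameter(adj):
    """Longest shortest-path distance, found by a BFS from every node."""
    best = 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        best = max(best, max(dist.values()))
    return best

def complete_graph(n):
    """Every node is linked to every other node."""
    return {u: [v for v in range(n) if v != u] for u in range(n)}

def linear_array(n):
    """1-D mesh: node u is linked to u-1 and u+1 where they exist."""
    return {u: [v for v in (u - 1, u + 1) if 0 <= v < n] for u in range(n)}

def edge_count(adj):
    """Each undirected edge appears in two adjacency lists."""
    return sum(len(nbrs) for nbrs in adj.values()) // 2

n = 8
for name, graph in (("complete", complete_graph(n)), ("linear", linear_array(n))):
    degree = max(len(nbrs) for nbrs in graph.values())
    print(name, "diameter", diameter(graph), "edges", edge_count(graph), "degree", degree)
```

For n = 8 the complete graph reaches diameter 1 only by paying for 28 edges and degree 7, while the linear array keeps degree 2 at the price of diameter 7.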
Connection networks I
1-D mesh (linear array)
Diameter: n-1
Bisection width: 1
Tree
Diameter: 2 log n
Bisection width: 1
H-tree
Area: O(n)
Longest wire: O(√n)
Clock distribution
2-D Mesh (√n × √n, n nodes)
Diameter: 2(√n - 1)
Bisection width: √n
Torus
[Figure: an 8-node ring 1 2 3 4 5 6 7 8 laid out in the folded order 1 8 2 7 3 6 4 5]
Reduced diameter
Increased bisection width
All nodes equivalent
Long wires?
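The folded ordering 1 8 2 7 3 6 4 5 on the torus slide answers the "long wires?" question for a single ring. The sketch below is an illustration under stated assumptions: `folded_order` and `longest_wire` are hypothetical helper names, nodes sit in equally spaced slots on a line, and wire length is measured in slots.

```python
def folded_order(n):
    """Interleave the ring from both ends: 1, n, 2, n-1, ..."""
    order = []
    lo, hi = 1, n
    while lo < hi:
        order += [lo, hi]
        lo += 1
        hi -= 1
    if lo == hi:          # odd n: the middle node comes last
        order.append(lo)
    return order

def longest_wire(order):
    """Largest slot distance between ring neighbours v and v+1 (n wraps to 1)."""
    n = len(order)
    slot = {node: i for i, node in enumerate(order)}
    return max(abs(slot[v] - slot[v % n + 1]) for v in range(1, n + 1))

natural = list(range(1, 9))
folded = folded_order(8)
print(folded)                 # [1, 8, 2, 7, 3, 6, 4, 5]
print(longest_wire(natural))  # 7: the wrap-around wire spans the whole row
print(longest_wire(folded))   # 2: every ring edge spans at most two slots
```

Applying the same folding to every row and every column of a 2-D torus keeps all wires short, at the cost of doubling the typical wire length compared with a plain mesh.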
3-D Mesh
Diameter: about 3 · n^(1/3)
Bisection: n^(2/3)
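The mesh figures for one, two and three dimensions follow one pattern. As a rough sketch, assume a mesh without wrap-around whose side length m is the same even number in every one of its k dimensions, so that n = m^k; the function name `mesh_metrics` is an assumption made for this example.

```python
def mesh_metrics(side, dims):
    """Return (nodes, diameter, bisection width) of a side^dims mesh.

    Assumes no wrap-around edges and an even side length.
    """
    nodes = side ** dims
    # Corner to opposite corner: side - 1 hops in each dimension.
    diameter = dims * (side - 1)
    # Cutting one dimension in half severs one edge per remaining coordinate.
    bisection = side ** (dims - 1)
    return nodes, diameter, bisection

print(mesh_metrics(8, 1))    # linear array: (8, 7, 1)
print(mesh_metrics(256, 2))  # 2-D mesh:     (65536, 510, 256)
print(mesh_metrics(4, 3))    # 3-D mesh:     (64, 9, 16)
```

With n = m^k this gives a diameter of k(n^(1/k) - 1) and a bisection width of n^((k-1)/k).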
Hypercube
[Figure: hypercubes of dimension 0 to 4: 0-D (single node), 1-D (nodes 0, 1), 2-D (nodes 00, 01, 10, 11), 3-D (nodes 000 to 111), 4-D]
Diameter: log n
Bisection width: n/2
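Both hypercube figures can be read off the binary node labels. This is a minimal sketch with hypothetical helper names (`neighbours`, `distance`), assuming nodes are numbered 0 to n-1 and two nodes are adjacent exactly when their labels differ in one bit.

```python
def neighbours(node, k):
    """Flip each of the k label bits in turn."""
    return [node ^ (1 << d) for d in range(k)]

def distance(u, v):
    """Shortest-path length equals the Hamming distance of the labels."""
    return bin(u ^ v).count("1")

k = 4
n = 1 << k

# All nodes are equivalent, so the eccentricity of node 0 is the diameter.
diam = max(distance(0, v) for v in range(n))

# Count the edges that cross between the halves with top bit 0 and top bit 1.
cut = sum(
    1
    for u in range(n)
    for v in neighbours(u, k)
    if u < v and (u >> (k - 1)) != (v >> (k - 1))
)

print(diam)  # 4 = log n
print(cut)   # 8 = n/2
```

The cut along one dimension only shows that n/2 edges suffice to split the cube in half; that no smaller cut exists is the known result quoted on the slide.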
Cube Connected Cycles
[Figure: CCC examples: k = 4 gives 4 · 2^4 = 64 nodes; k = 8 gives 8 · 2^8 = 2048 nodes]
# nodes: k · 2^k
Diameter > 2k
Bisection width: 2^(k-1)
8-node shuffle-exchange graph
Edges: exchange (flip the lsb); shuffle (rotate left or right)
[Figure: nodes 000, 001, 010, 011, 100, 101, 110, 111]
Degree: 3
Diameter: 2 log n - 1 (at most log n - 1 shuffles + log n exchanges)
Bisection width: Θ(n / log n)
16-node shuffle-exchange graph
Edges: exchange (flip the lsb); shuffle (rotate left or right)
[Figure: nodes 0000 to 1111]
Routing from u1u2…uk to v1v2…vk (ex = exchange, ls = left shuffle):
u1u2…uk-1uk
→ (ex) u1u2…uk-1v1
→ (ls+ex) u2…uk-1v1v2
→ …
→ (ls+ex) uk-1v1v2…vk-1
→ (ls+ex) v1v2…vk
Degree: 3
Diameter: 2 log n - 1 (at most log n - 1 shuffles + log n exchanges)
Bisection width: Θ(n / log n)
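The routing scheme behind the diameter bound (one exchange, then left shuffle plus exchange repeated log n - 1 times) can be sketched as follows. The function name `route` is hypothetical, node labels are assumed to be k-bit integers with u1 as the most significant bit, and moves that would not change the node (an exchange whose bit is already right, a shuffle of 00…0 or 11…1) are skipped.

```python
def route(u, v, k):
    """Path from u to v in the 2**k-node shuffle-exchange graph."""
    mask = (1 << k) - 1
    path = [u]
    cur = u
    for i in range(k):
        if i > 0:
            # Left shuffle: rotate the k-bit label left by one position.
            cur = ((cur << 1) | (cur >> (k - 1))) & mask
            if cur != path[-1]:
                path.append(cur)
        # Exchange: set the lsb to the next destination bit, msb first.
        want = (v >> (k - 1 - i)) & 1
        if (cur & 1) != want:
            cur ^= 1
            path.append(cur)
    return path

path = route(0b000, 0b111, 3)
print([format(x, "03b") for x in path])
# ['000', '001', '010', '011', '110', '111']: 5 = 2 log n - 1 moves
```

The bits of v are written into the lsb one after another and rotated towards the front, so after the last exchange the label reads v1v2…vk.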
3-dimensional de Bruijn graph
Edges: u1u2…uk-1uk → u2u3…uk-1uk0 (edge label 0) and u1u2…uk-1uk → u2u3…uk-1uk1 (edge label 1)
In-degree = out-degree = 2
Diameter: log n
Bisection width: Θ(n / log n)
Each Eulerian tour corresponds to a de Bruijn sequence, which contains each possible substring of length 4 exactly once (read cyclically).
De Bruijn sequence: 1111001011010000
[Figure: nodes 000, 001, 010, 011, 100, 101, 110, 111; each edge is labelled 0 or 1]
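The claim about the sequence 1111001011010000 is easy to check mechanically. The sketch below uses hypothetical helper names (`successors`, `windows`, `is_de_bruijn`) and assumes the sequence is read cyclically, which is what an Eulerian tour returning to its start node implies.

```python
def successors(node, k):
    """Shift the k-bit label left, drop u1, append 0 or 1."""
    mask = (1 << k) - 1
    return [((node << 1) & mask) | bit for bit in (0, 1)]

def windows(seq, length):
    """All cyclic substrings of the given length, one per start position."""
    doubled = seq + seq[:length - 1]
    return [doubled[i:i + length] for i in range(len(seq))]

def is_de_bruijn(seq, length):
    """True if every binary string of that length occurs exactly once."""
    return len(seq) == 2 ** length and len(set(windows(seq, length))) == 2 ** length

print(successors(0b101, 3))                 # [2, 3], i.e. 010 and 011
print(is_de_bruijn("1111001011010000", 4))  # True
```

Each length-4 window names one edge of the 3-dimensional graph (its first three bits are the source node, its last three the target), so 16 distinct windows mean that all 16 edges are used exactly once.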
Butterfly network
Unique path
FFT
routing
sorting
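The unique-path property can be illustrated with a small sketch. It assumes one common butterfly layout, which the slide itself does not spell out: nodes are pairs (level, row) with levels 0 to k, and the edges from level l to level l+1 either keep the row or flip the l-th row bit counted from the most significant end. The name `butterfly_path` is hypothetical.

```python
def butterfly_path(src_row, dst_row, k):
    """Nodes visited from (0, src_row) to (k, dst_row)."""
    path = [(0, src_row)]
    row = src_row
    for level in range(k):
        bit = 1 << (k - 1 - level)
        # Only this level can change this bit, so the choice is forced.
        row = (row & ~bit) | (dst_row & bit)
        path.append((level + 1, row))
    return path

print(butterfly_path(0b000, 0b101, 3))
# [(0, 0), (1, 4), (2, 4), (3, 5)]
```

Because each level controls exactly one bit of the row number, there is no freedom anywhere along the way, which is why the path between an input and an output is unique.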
Benes network
Mesh of trees
Diameter: Θ(log n)
Bisection width: Θ(√n)
The Power of Hypercubes
[Figure: 4-D hypercube]
• Hamiltonian cycle
• Gray codes
• k-D meshes (tori), N nodes
• simulates mesh of trees
• simulates hypercubic networks
• contains complete binary tree, almost
• normal algorithms
Hamiltonian Cycle
A hypercube contains a Hamiltonian cycle -- proof by induction.
Each Hamiltonian cycle corresponds to a Gray code (only one bit is changed per link).
Gray code
0 1
00 01 11 10
000 001 011 010 110 111 101 100
(each code is built from the previous one by reflection)
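The reflection step can be written down directly. In this sketch (the function name `gray_code` is an assumption for illustration), the k-bit code is the (k-1)-bit code prefixed with 0, followed by the same code in reverse order prefixed with 1.

```python
def gray_code(k):
    """Reflected Gray code on k bits, as a list of bit strings."""
    if k == 0:
        return [""]
    prev = gray_code(k - 1)
    return ["0" + c for c in prev] + ["1" + c for c in reversed(prev)]

def one_bit_apart(a, b):
    """True if the two code words differ in exactly one position."""
    return sum(x != y for x, y in zip(a, b)) == 1

code = gray_code(3)
print(code)
# ['000', '001', '011', '010', '110', '111', '101', '100']

# Consecutive words, including last back to first, are hypercube neighbours.
print(all(one_bit_apart(code[i], code[(i + 1) % len(code)]) for i in range(len(code))))
```

Read as node labels, the list is a Hamiltonian cycle of the 3-D hypercube, and the recursion mirrors the induction proof: two copies of the cycle of the smaller cube are joined across the new dimension.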
Hypercube contains meshes/tori
[Figure: 4 × 4 torus with nodes labelled 00 to 33 embedded in a hypercube; wrap-around edges are marked]
Theorem: Any n1 x n2 x … x nk mesh (with or without wrap-arounds) is a subgraph of an n-D hypercube if n1 · n2 · … · nk = 2^n.
Proof: see Leighton (each sub-cube has a Hamiltonian cycle).
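One way to realise the embedding is to give every mesh coordinate its own Gray code and concatenate the codes. The following sketch is one such construction under stated assumptions, not necessarily the proof in Leighton: the helper names (`gray`, `embed`, `torus_is_subgraph`) are made up, and every side length is assumed to be 2^b with b ≥ 1.

```python
from itertools import product

def gray(i):
    """Binary reflected Gray code of i as an integer."""
    return i ^ (i >> 1)

def embed(coords, bits):
    """Hypercube label: Gray codes of all coordinates, concatenated."""
    label = 0
    for c, b in zip(coords, bits):
        label = (label << b) | gray(c)
    return label

def torus_is_subgraph(bits):
    """Check that every torus edge maps to a hypercube edge."""
    sizes = [1 << b for b in bits]
    for coords in product(*(range(s) for s in sizes)):
        for axis, size in enumerate(sizes):
            nxt = list(coords)
            nxt[axis] = (coords[axis] + 1) % size   # includes wrap-around
            diff = embed(coords, bits) ^ embed(nxt, bits)
            if bin(diff).count("1") != 1:
                return False
    return True

print(torus_is_subgraph([2, 2]))     # 4 x 4 torus in the 4-D hypercube
print(torus_is_subgraph([1, 2, 3]))  # 2 x 4 x 8 torus in the 6-D hypercube
```

The wrap-around edges work because a reflected Gray code is cyclic: its last word differs from its first in a single bit.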
Hypercube contains double-rooted trees
HC can implement all tree algorithms and also all mesh-of-tree-algorithms (possibly with minor delay).
double-roots (different dimension)
Normal algorithms
A hypercube algorithm is said to be normal if
• only one dimension of hypercube edges is used at any step, and
• consecutive dimensions are used in consecutive steps.
• Most hypercube algorithms are normal.
• Normal algorithms can be embedded efficiently on hypercubic networks.
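A typical normal algorithm is a global sum in which every node ends up with the total. The sketch below simulates it sequentially; the function name `hypercube_allreduce` is hypothetical, and the list index of a value is taken to be the label of the node that holds it.

```python
def hypercube_allreduce(values):
    """Simulate an all-to-all sum on a hypercube with len(values) nodes."""
    n = len(values)
    k = n.bit_length() - 1
    if n < 1 or (1 << k) != n:
        raise ValueError("number of nodes must be a power of two")
    vals = list(values)
    # Step d uses only dimension-d edges; dimensions are visited in order.
    for d in range(k):
        vals = [vals[i] + vals[i ^ (1 << d)] for i in range(n)]
    return vals

print(hypercube_allreduce([1, 2, 3, 4, 5, 6, 7, 8]))
# [36, 36, 36, 36, 36, 36, 36, 36]
```

After step d every node holds the sum over its (d+1)-dimensional sub-cube, so log n steps are enough, and each step touches exactly one dimension, as the definition of a normal algorithm requires.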
Josephus graph:
[Figure: 32-node Josephus graph with nodes 0 to 31; edges are labelled 1 or 2]
Every even node k is connected to k+2i-3
Diameter: about (log n) / 2
Star graph:
[Figure: 24-node star graph on the permutations of 1234]
Set of nodes: permutations of k elements; k! nodes of degree k-1.
Set of edges: exchange of the first element with one other.
Small degree, diameter about 2 log n.
Open problems: e.g. are there (k-1)/2 edge-disjoint Hamiltonian cycles?
Number of nodes versus degree (Star/HC):
Star: 24, 120, 720, 5040, 40320, 362880
HC: 16, 32, 64, 128, 256, 512
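The node and edge sets of the star graph translate into a few lines of code. This is a minimal sketch with a hypothetical helper name (`star_neighbours`), assuming nodes are stored as tuples holding a permutation of 1 to k.

```python
from itertools import permutations
from math import factorial

def star_neighbours(perm):
    """Swap the first element with each of the other k-1 elements."""
    out = []
    for i in range(1, len(perm)):
        p = list(perm)
        p[0], p[i] = p[i], p[0]
        out.append(tuple(p))
    return out

k = 4
nodes = list(permutations(range(1, k + 1)))
print(len(nodes))                     # 24 = 4!
print(star_neighbours((1, 2, 3, 4)))  # [(2, 1, 3, 4), (3, 2, 1, 4), (4, 2, 3, 1)]

# Node counts for k = 4..9: star graph (k!) against hypercube (2^k).
for kk in range(4, 10):
    print(kk, factorial(kk), 2 ** kk)
```

The last loop prints the two rows of the node-count comparison: the star graph reaches far more nodes than the hypercube for a comparable degree.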
pin-limitations
[Figure: pin limitations; labels: 1, 4-D, 12, 192, 16, 256, 16]
wiring-limitations
[Figure: wiring limitations; labels: 4-D, 12, 1]
2^16 nodes
Bisection width: 256 (2-D mesh) versus 32 K (hypercube)
25 cm versus 32 m
Improve the topology?
The internet
against parallelism
• cost(large) < cost (2 small)
• all the FORTRAN / C software
• let’s stick to pipelining
• let’s wait for faster machines
• Amdahl’s Law
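The last argument in the list can be made quantitative. In its usual form, Amdahl's Law bounds the speedup on p processors by 1 / (s + (1 - s) / p), where s is the fraction of the work that cannot be parallelised. The sketch below uses a hypothetical function name (`amdahl_speedup`) and ignores communication cost.

```python
def amdahl_speedup(serial_fraction, processors):
    """Upper bound on speedup when serial_fraction of the work stays sequential."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / processors)

for p in (10, 100, 1000):
    print(p, round(amdahl_speedup(0.1, p), 2))
```

With 10% sequential work the speedup stays below 10 however many processors are added, which is why the law is cited as an argument against massive parallelism.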