CS668 - Lecture 2 - Sept. 30
Today's topics
• Parallel Architectures (Chapter 2)
• Memory Hierarchy
• Buses and Switched Networks
• Interconnection Network Topologies
• Multiprocessors / Multicomputers
• Flynn's Taxonomy
• Analysis of Interconnection Networks
Theoretical Computer Architectures
• Turing Machine
• Von Neumann Architecture
• Fetch/Execute Cycle
• Memory Models
• RAM model
• PRAM model extension
• Shared Memory vs. Distributed Shared Memory vs. Distributed Memory
Processors and the Memory Hierarchy
• Registers (1 clock cycle, 100s of bytes)
• 1st-level cache (3-5 clock cycles, 100s of KBytes)
• 2nd-level cache (~10 clock cycles, MBytes)
• Main memory (~100 clock cycles, GBytes)
• Disk (milliseconds, 100s of GBytes and up); see the timing sketch after the figure
[Figure: dual-core CPU (IBM dual core) with per-core registers, split 1st-level instruction and data caches, and a unified 2nd-level (instructions & data) cache]
From the Intel® 64 and IA-32 Architectures Optimization Reference Manual:
http://www.intel.com/design/processor/manuals/248966.pdf
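The latency gap in this hierarchy is easy to observe from a program. A minimal timing sketch (the array size and stride values are arbitrary choices, not from the slides): both runs touch every element of the same array, but the large-stride walk defeats the caches and pays main-memory latency far more often.

#include <stdio.h>
#include <time.h>

#define N (1 << 24)                 /* 16M ints: much larger than the caches */
static int a[N];
static volatile long sink;          /* keeps the compiler from removing the loops */

/* Walk all N elements of a[] using the given stride and time it. */
static double sweep(int stride) {
    clock_t t0 = clock();
    long sum = 0;
    for (int start = 0; start < stride; start++)
        for (int i = start; i < N; i += stride)
            sum += a[i];
    sink = sum;
    return (double)(clock() - t0) / CLOCKS_PER_SEC;
}

int main(void) {
    /* Same number of loads either way, but the 4096-int stride touches a new
     * cache line on every access, so it runs much slower. */
    printf("stride 1:    %.3f s\n", sweep(1));
    printf("stride 4096: %.3f s\n", sweep(4096));
    return 0;
}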
Shared Memory Multiprocessor
• One or more memories
• Global address space (all system memory is visible to all processors)
• Data transfer between processors is usually implicit: just read from or write to a given address (e.g., with OpenMP; see the sketch below)
• Complex cache-coherency protocols maintain consistency between processors
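A minimal OpenMP sketch of this implicit model (the array and its size are illustrative): every thread reads and writes the same globally addressed array, and no explicit communication calls appear.

#include <stdio.h>
#include <omp.h>

#define N 1000000

static double a[N];   /* one array in the global address space, visible to every thread */

int main(void) {
    double sum = 0.0;

    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        a[i] = 0.5 * i;                 /* each thread writes its share of the array */

    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < N; i++)
        sum += a[i];                    /* every thread reads the shared array directly */

    printf("sum = %f\n", sum);
    return 0;
}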
[Figure: CPUs and memories all attached to one interconnection network]
(UMA) Uniform-memory-access shared-memory system
Distributed Shared Memory
• Single address space with implicit communication
• Hardware support for read/write to non-local memories, cache coherency
• Latency for a memory operation is greater when accessing non-local data than when accessing data within a CPU's own memory
(NUMA) Non-uniform-memory-access shared-memory system
[Figure: each CPU paired with a local memory, and the CPU-memory pairs connected by an interconnection network]
Distributed Memory / Message Passing
• Each processor has access to its own memory only
• Data transfer between processors is explicit: the user calls message-passing functions (see the MPI sketch below)
• Common libraries for message passing
– MPI, PVM
• The user has complete control of, and responsibility for, data placement and management
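A minimal MPI sketch of the explicit model (ranks, tag, and payload are illustrative): data moves only because one process calls a send and another posts a matching receive.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
    int rank, data[4] = {1, 2, 3, 4};

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        /* rank 0 owns the data; moving it to rank 1 requires an explicit send */
        MPI_Send(data, 4, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(data, 4, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("rank 1 received %d %d %d %d\n", data[0], data[1], data[2], data[3]);
    }

    MPI_Finalize();
    return 0;
}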
[Figure: CPU-memory pairs connected by an interconnection network; each memory is private to its CPU]
Hybrid Systems
• Distributed-memory system whose nodes are shared-memory multiprocessors (see the MPI + OpenMP sketch after the figure)
• Most common architecture for the current generation of parallel machines
[Figure: several nodes, each with multiple CPUs sharing one memory and a network interface, joined by an interconnection network]
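Such machines are commonly programmed with MPI between nodes and OpenMP within each node. A minimal sketch of that combination, assuming an MPI library that supports MPI_THREAD_FUNNELED (process/thread placement is illustrative):

#include <stdio.h>
#include <mpi.h>
#include <omp.h>

int main(int argc, char **argv) {
    int provided, rank;

    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    #pragma omp parallel
    {
        /* Threads on the same node communicate implicitly through shared memory. */
        printf("MPI rank %d, OpenMP thread %d of %d (threading level %d)\n",
               rank, omp_get_thread_num(), omp_get_num_threads(), provided);
    }

    /* Communication between nodes still goes through explicit MPI calls. */
    MPI_Finalize();
    return 0;
}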
Flynn’s Taxonomy (figure 2.20 from Quinn)
• SISD (single instruction stream, single data stream): uniprocessor
• SIMD (single instruction stream, multiple data streams): processor arrays, pipelined vector processors
• MISD (multiple instruction streams, single data stream): systolic array
• MIMD (multiple instruction streams, multiple data streams): multiprocessors, multicomputers
Analysis of Switch Network Topologies
• View a switched network as a graph
– n vertices = processors or switches
– m edges = communication paths
• Two kinds of topologies
– Direct: ratio of switches to processors is 1:1
– Indirect: ratio of switches to processors is greater than 1:1
Evaluating Switch Topologies
• Diameter (see the BFS sketch below)
• Bisection width
• Number of edges per node (d = degree)
• Constant edge length? (yes/no)
– Layout area / wire length
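Diameter, in particular, is just the longest shortest path in the graph view above. A small sketch that computes it with breadth-first search (the adjacency-matrix representation and size limit are arbitrary), shown checking a 4 x 4 mesh:

#include <stdio.h>
#include <string.h>

#define MAXN 64

/* Eccentricity of src: the longest BFS distance from src to any node. */
int bfs_ecc(int adj[MAXN][MAXN], int n, int src) {
    int dist[MAXN], queue[MAXN], head = 0, tail = 0, ecc = 0;
    memset(dist, -1, sizeof dist);
    dist[src] = 0;
    queue[tail++] = src;
    while (head < tail) {
        int u = queue[head++];
        for (int v = 0; v < n; v++)
            if (adj[u][v] && dist[v] < 0) {
                dist[v] = dist[u] + 1;
                if (dist[v] > ecc) ecc = dist[v];
                queue[tail++] = v;
            }
    }
    return ecc;
}

/* Diameter: the largest eccentricity over all nodes. */
int diameter(int adj[MAXN][MAXN], int n) {
    int d = 0;
    for (int s = 0; s < n; s++) {
        int e = bfs_ecc(adj, n, s);
        if (e > d) d = e;
    }
    return d;
}

int main(void) {
    int adj[MAXN][MAXN] = {0};
    int k = 4, n = k * k;                 /* 4 x 4 mesh, no wraparound */
    for (int r = 0; r < k; r++)
        for (int c = 0; c < k; c++) {
            int u = r * k + c;
            if (r + 1 < k) adj[u][u + k] = adj[u + k][u] = 1;
            if (c + 1 < k) adj[u][u + 1] = adj[u + 1][u] = 1;
        }
    printf("4x4 mesh diameter = %d\n", diameter(adj, n));   /* prints 6 */
    return 0;
}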
2-D Mesh Network
• Direct topology
• Switches arranged into a 2-D lattice
• Communication allowed only between neighboring switches (see the neighbor sketch below)
• Variants allow wraparound connections between switches on the edge of the mesh
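A small sketch of this neighbor rule (the function name and k x k indexing are illustrative), including the optional wraparound variant:

#include <stdio.h>

/* Print the neighbors of switch (r, c) in a k x k 2-D mesh,
 * optionally with wraparound (torus) connections at the edges. */
void mesh_neighbors(int r, int c, int k, int wraparound) {
    int dr[4] = {-1, 1, 0, 0};
    int dc[4] = {0, 0, -1, 1};
    for (int d = 0; d < 4; d++) {
        int nr = r + dr[d], nc = c + dc[d];
        if (wraparound) {
            nr = (nr + k) % k;            /* wrap around to the opposite edge */
            nc = (nc + k) % k;
        } else if (nr < 0 || nr >= k || nc < 0 || nc >= k) {
            continue;                     /* no neighbor off the edge of the mesh */
        }
        printf("(%d,%d) <-> (%d,%d)\n", r, c, nr, nc);
    }
}

int main(void) {
    mesh_neighbors(0, 0, 4, 0);   /* corner switch: 2 neighbors without wraparound */
    mesh_neighbors(0, 0, 4, 1);   /* 4 neighbors with wraparound */
    return 0;
}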
2-D Meshes
Evaluating 2-D Meshes
• Diameter: Θ(n^(1/2))
• Bisection width: Θ(n^(1/2))
• Number of edges per switch: 4
• Constant edge length? Yes
Binary Tree Network
• Indirect topology
• n = 2^d processor nodes, n - 1 switches
Evaluating Binary Tree Network
• Diameter: 2 log n
• Bisection width: 1
• Edges / node: 3
• Constant edge length? No
Hypertree Network
• Indirect topology
• Shares the low diameter of a binary tree
• Greatly improves bisection width
• From the “front” it looks like a k-ary tree of height d
• From the “side” it looks like an upside-down binary tree of height d
Evaluating 4-ary Hypertree
• Diameter: log n
• Bisection width: n / 2
• Edges / node: 6
• Constant edge length? No
Butterfly Network
• Indirect topology
• n = 2^d processor nodes connected by n(log n + 1) switching nodes
[Figure: butterfly network for n = 8, with processor nodes 0-7 and four ranks (0-3) of eight switching nodes each, labeled (rank, column)]
Butterfly Network Routing
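The usual scheme is destination-tag routing: starting from the rank-0 switch in the source's column, the switch at rank i examines bit i of the destination address (most significant bit first) and either continues straight down or takes the cross edge that corrects that bit. A sketch under that assumption (names are illustrative):

#include <stdio.h>

/* Destination-tag routing sketch for a butterfly with n = 2^d processors.
 * After the rank-i switch, bit i of the current column matches the destination,
 * so the route ends in the destination's column at rank d. */
void butterfly_route(unsigned int src, unsigned int dst, int d) {
    unsigned int col = src;
    printf("rank 0: switch (0,%u)\n", col);
    for (int i = 0; i < d; i++) {
        unsigned int mask = 1u << (d - 1 - i);             /* bit i, MSB first */
        const char *edge = ((col ^ dst) & mask) ? "cross" : "straight";
        col = (col & ~mask) | (dst & mask);                /* fix bit i of the column */
        printf("rank %d: switch (%d,%u) via %s edge\n", i + 1, i + 1, col, edge);
    }
}

int main(void) {
    butterfly_route(2, 5, 3);   /* route from processor 2 to processor 5 (n = 8) */
    return 0;
}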
Evaluating Butterfly Network
• Diameter: log n
• Bisection width: n / 2
• Edges per node: 4
• Constant edge length? No
Hypercube
• Direct topology
• 2 × 2 × … × 2 mesh
• Number of nodes is a power of 2
• Node addresses 0, 1, …, 2^k - 1
• Node i connected to the k nodes whose addresses differ from i in exactly one bit position (see the sketch below)
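A short sketch of this addressing rule (the function name is illustrative): the k neighbors of node i are obtained by flipping each of its address bits in turn.

#include <stdio.h>

/* List the neighbors of node i in a k-dimensional hypercube
 * by flipping each of the k address bits. */
void hypercube_neighbors(unsigned int i, int k) {
    for (int bit = 0; bit < k; bit++)
        printf("node %u <-> node %u (differ in bit %d)\n", i, i ^ (1u << bit), bit);
}

int main(void) {
    hypercube_neighbors(5, 4);   /* 0101 connects to 0100, 0111, 0001, 1101 */
    return 0;
}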
Hypercube Addressing
[Figure: 4-dimensional hypercube with 16 nodes addressed 0000 through 1111; each node links to the four nodes whose addresses differ in one bit]
Evaluating Hypercube Network
• Diameter: log n
• Bisection width: n / 2
• Edges per node: log n
• Constant edge length? No
Shuffle-exchange
• Direct topology
• Number of nodes is a power of 2
• Nodes have addresses 0, 1, …, 2^k - 1
• Two outgoing links from node i (see the sketch below)
– Shuffle link to node LeftCycle(i), the left cyclic rotation of i's k address bits
– Exchange link to node xor(i, 1), i.e., i with its lowest bit flipped
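A short sketch of both link rules (function names are illustrative), printed here for the 8-node network shown in the next figure:

#include <stdio.h>

/* Shuffle: left cyclic rotation of the k-bit address. */
unsigned int shuffle(unsigned int i, int k) {
    unsigned int high = (i >> (k - 1)) & 1u;          /* bit that wraps around */
    return ((i << 1) | high) & ((1u << k) - 1u);
}

/* Exchange: flip the least significant bit. */
unsigned int exchange(unsigned int i) {
    return i ^ 1u;
}

int main(void) {
    int k = 3;                                        /* 8 nodes, as in the figure */
    for (unsigned int i = 0; i < (1u << k); i++)
        printf("node %u: shuffle -> %u, exchange -> %u\n",
               i, shuffle(i, k), exchange(i));
    return 0;
}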
Shuffle-exchange Illustrated
[Figure: shuffle and exchange links among 8 nodes numbered 0 through 7]
Shuffle-exchange Addressing
[Figure: 16-node shuffle-exchange network with 4-bit node addresses 0000 through 1111]
Evaluating Shuffle-exchange
• Diameter: 2 log n - 1
• Bisection width: n / log n
• Edges per node: 2
• Constant edge length? No
Comparing Networks
• All have logarithmic diameter except the 2-D mesh (see the numeric sketch below)
• Hypertree, butterfly, and hypercube have bisection width n / 2
• All have a constant number of edges per node except the hypercube
• Only 2-D mesh keeps edge lengths constant as network size increases
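Plugging a concrete machine size into the diameter and bisection-width expressions from the preceding slides makes the comparison tangible. A small sketch for n = 64 processors (the choice of n is arbitrary; formulas are the ones quoted above):

#include <stdio.h>
#include <math.h>

int main(void) {
    double n = 64;
    double lg = log2(n);

    printf("n = %.0f processors\n", n);
    printf("2-D mesh:         diameter ~ %.0f, bisection ~ %.0f\n",
           2 * (sqrt(n) - 1), sqrt(n));
    printf("binary tree:      diameter = %.0f, bisection = 1\n", 2 * lg);
    printf("4-ary hypertree:  diameter = %.0f, bisection = %.0f\n", lg, n / 2);
    printf("butterfly:        diameter = %.0f, bisection = %.0f\n", lg, n / 2);
    printf("hypercube:        diameter = %.0f, bisection = %.0f\n", lg, n / 2);
    printf("shuffle-exchange: diameter = %.0f, bisection ~ %.0f\n",
           2 * lg - 1, n / lg);
    return 0;
}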