CS668 - Lecture 2 - Sept. 30
Today's topics
• Parallel Architectures (Chapter 2)
• Memory Hierarchy
• Buses and Switched Networks
• Interconnection Network Topologies
• Multiprocessors / Multicomputers
• Flynn's Taxonomy
• Analysis of Interconnection Networks
Theoretical Computer Architectures
• Turing Machine
• Von Neumann Architecture
• Fetch/Execute Cycle
• Memory Models
• RAM model
• PRAM model extension
• Shared Memory vs. Distributed Shared Memory vs. Distributed Memory
Processors and the Memory Hierarchy
• Registers (1 clock cycle, 100s of bytes)
• 1st-level cache (3-5 clock cycles, 100s of KBytes)
• 2nd-level cache (~10 clock cycles, MBytes)
• Main memory (~100 clock cycles, GBytes)
• Disk (milliseconds, 100s of GBytes and up); see the timing sketch after the figure
[Figure: dual-core CPU (IBM dual core) with per-core registers, split 1st-level instruction and data caches, and a unified 2nd-level (instructions & data) cache]
From the Intel® 64 and IA-32 Architectures Optimization Reference Manual:
http://www.intel.com/design/processor/manuals/248966.pdf
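The latency gap in this hierarchy is easy to observe from a program. A minimal timing sketch (the array size and stride values are arbitrary choices, not from the slides): both runs touch every element of the same array, but the large-stride walk defeats the caches and pays main-memory latency far more often.

#include <stdio.h>
#include <time.h>

#define N (1 << 24)                 /* 16M ints: much larger than the caches */
static int a[N];
static volatile long sink;          /* keeps the compiler from removing the loops */

/* Walk all N elements of a[] using the given stride and time it. */
static double sweep(int stride) {
    clock_t t0 = clock();
    long sum = 0;
    for (int start = 0; start < stride; start++)
        for (int i = start; i < N; i += stride)
            sum += a[i];
    sink = sum;
    return (double)(clock() - t0) / CLOCKS_PER_SEC;
}

int main(void) {
    /* Same number of loads either way, but the 4096-int stride touches a new
     * cache line on every access, so it runs much slower. */
    printf("stride 1:    %.3f s\n", sweep(1));
    printf("stride 4096: %.3f s\n", sweep(4096));
    return 0;
}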
Shared Memory Multiprocessor
• One or more memories
• Global address space (all system memory is visible to all processors)
• Data transfer between processors is usually implicit: just read from or write to a given address (e.g., with OpenMP; see the sketch below)
• Complex cache-coherency protocols maintain consistency between processors
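A minimal OpenMP sketch of this implicit model (the array and its size are illustrative): every thread reads and writes the same globally addressed array, and no explicit communication calls appear.

#include <stdio.h>
#include <omp.h>

#define N 1000000

static double a[N];   /* one array in the global address space, visible to every thread */

int main(void) {
    double sum = 0.0;

    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        a[i] = 0.5 * i;                 /* each thread writes its share of the array */

    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < N; i++)
        sum += a[i];                    /* every thread reads the shared array directly */

    printf("sum = %f\n", sum);
    return 0;
}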
[Figure: CPUs and memories all attached to one interconnection network]
(UMA) Uniform-memory-access shared-memory system
Distributed Shared Memory
• Single address space with implicit communication
• Hardware support for read/write to non-local memories, cache coherency
• Latency for a memory operation is greater when accessing non-local data than when accessing data within a CPU's own memory
(NUMA) Non-uniform-memory-access shared-memory system
[Figure: each CPU paired with a local memory, and the CPU-memory pairs connected by an interconnection network]
Distributed Memory / Message Passing
• Each processor has access to its own memory only
• Data transfer between processors is explicit: the user calls message-passing functions (see the MPI sketch below)
• Common libraries for message passing
– MPI, PVM
• The user has complete control of, and responsibility for, data placement and management
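A minimal MPI sketch of the explicit model (ranks, tag, and payload are illustrative): data moves only because one process calls a send and another posts a matching receive.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
    int rank, data[4] = {1, 2, 3, 4};

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        /* rank 0 owns the data; moving it to rank 1 requires an explicit send */
        MPI_Send(data, 4, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(data, 4, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("rank 1 received %d %d %d %d\n", data[0], data[1], data[2], data[3]);
    }

    MPI_Finalize();
    return 0;
}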
[Figure: CPU-memory pairs connected by an interconnection network; each memory is private to its CPU]
Hybrid Systems
• Distributed-memory system whose nodes are shared-memory multiprocessors (see the MPI + OpenMP sketch after the figure)
• Most common architecture for the current generation of parallel machines
[Figure: several nodes, each with multiple CPUs sharing one memory and a network interface, joined by an interconnection network]
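Such machines are commonly programmed with MPI between nodes and OpenMP within each node. A minimal sketch of that combination, assuming an MPI library that supports MPI_THREAD_FUNNELED (process/thread placement is illustrative):

#include <stdio.h>
#include <mpi.h>
#include <omp.h>

int main(int argc, char **argv) {
    int provided, rank;

    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    #pragma omp parallel
    {
        /* Threads on the same node communicate implicitly through shared memory. */
        printf("MPI rank %d, OpenMP thread %d of %d (threading level %d)\n",
               rank, omp_get_thread_num(), omp_get_num_threads(), provided);
    }

    /* Communication between nodes still goes through explicit MPI calls. */
    MPI_Finalize();
    return 0;
}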
Flynn’s Taxonomy (figure 2.20 from Quinn)
• SISD (single instruction stream, single data stream): uniprocessor
• SIMD (single instruction stream, multiple data streams): processor arrays, pipelined vector processors
• MISD (multiple instruction streams, single data stream): systolic array
• MIMD (multiple instruction streams, multiple data streams): multiprocessors, multicomputers
Analysis of Switch Network Topologies
• View a switched network as a graph
– n vertices = processors or switches
– m edges = communication paths
• Two kinds of topologies
– Direct: ratio of switches to processors is 1:1
– Indirect: ratio of switches to processors is greater than 1:1
Evaluating Switch Topologies
• Diameter (see the BFS sketch below)
• Bisection width
• Number of edges per node (d = degree)
• Constant edge length? (yes/no)
– Layout area / wire length
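Diameter, in particular, is just the longest shortest path in the graph view above. A small sketch that computes it with breadth-first search (the adjacency-matrix representation and size limit are arbitrary), shown checking a 4 x 4 mesh:

#include <stdio.h>
#include <string.h>

#define MAXN 64

/* Eccentricity of src: the longest BFS distance from src to any node. */
int bfs_ecc(int adj[MAXN][MAXN], int n, int src) {
    int dist[MAXN], queue[MAXN], head = 0, tail = 0, ecc = 0;
    memset(dist, -1, sizeof dist);
    dist[src] = 0;
    queue[tail++] = src;
    while (head < tail) {
        int u = queue[head++];
        for (int v = 0; v < n; v++)
            if (adj[u][v] && dist[v] < 0) {
                dist[v] = dist[u] + 1;
                if (dist[v] > ecc) ecc = dist[v];
                queue[tail++] = v;
            }
    }
    return ecc;
}

/* Diameter: the largest eccentricity over all nodes. */
int diameter(int adj[MAXN][MAXN], int n) {
    int d = 0;
    for (int s = 0; s < n; s++) {
        int e = bfs_ecc(adj, n, s);
        if (e > d) d = e;
    }
    return d;
}

int main(void) {
    int adj[MAXN][MAXN] = {0};
    int k = 4, n = k * k;                 /* 4 x 4 mesh, no wraparound */
    for (int r = 0; r < k; r++)
        for (int c = 0; c < k; c++) {
            int u = r * k + c;
            if (r + 1 < k) adj[u][u + k] = adj[u + k][u] = 1;
            if (c + 1 < k) adj[u][u + 1] = adj[u + 1][u] = 1;
        }
    printf("4x4 mesh diameter = %d\n", diameter(adj, n));   /* prints 6 */
    return 0;
}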
2-D Mesh Network
• Direct topology
• Switches arranged into a 2-D lattice
• Communication allowed only between neighboring switches (see the neighbor sketch below)
• Variants allow wraparound connections between switches on the edge of the mesh
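A small sketch of this neighbor rule (the function name and k x k indexing are illustrative), including the optional wraparound variant:

#include <stdio.h>

/* Print the neighbors of switch (r, c) in a k x k 2-D mesh,
 * optionally with wraparound (torus) connections at the edges. */
void mesh_neighbors(int r, int c, int k, int wraparound) {
    int dr[4] = {-1, 1, 0, 0};
    int dc[4] = {0, 0, -1, 1};
    for (int d = 0; d < 4; d++) {
        int nr = r + dr[d], nc = c + dc[d];
        if (wraparound) {
            nr = (nr + k) % k;            /* wrap around to the opposite edge */
            nc = (nc + k) % k;
        } else if (nr < 0 || nr >= k || nc < 0 || nc >= k) {
            continue;                     /* no neighbor off the edge of the mesh */
        }
        printf("(%d,%d) <-> (%d,%d)\n", r, c, nr, nc);
    }
}

int main(void) {
    mesh_neighbors(0, 0, 4, 0);   /* corner switch: 2 neighbors without wraparound */
    mesh_neighbors(0, 0, 4, 1);   /* 4 neighbors with wraparound */
    return 0;
}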
2-D Meshes
Evaluating 2-D Meshes
• Diameter: Θ(n^(1/2))
• Bisection width: Θ(n^(1/2))
• Number of edges per switch: 4
• Constant edge length? Yes
Binary Tree Network
• Indirect topology
• n = 2^d processor nodes, n - 1 switches
Evaluating Binary Tree Network
• Diameter: 2 log n
• Bisection width: 1
• Edges / node: 3
• Constant edge length? No
Hypertree Network
• Indirect topology
• Shares the low diameter of a binary tree
• Greatly improves bisection width
• From the “front” it looks like a k-ary tree of height d
• From the “side” it looks like an upside-down binary tree of height d
Evaluating 4-ary Hypertree
• Diameter: log n
• Bisection width: n / 2
• Edges / node: 6
• Constant edge length? No
Butterfly Network
• Indirect topology
• n = 2^d processor nodes connected by n(log n + 1) switching nodes
[Figure: butterfly network for n = 8, with processor nodes 0-7 and four ranks (0-3) of eight switching nodes each, labeled (rank, column)]
Butterfly Network Routing
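The usual scheme is destination-tag routing: starting from the rank-0 switch in the source's column, the switch at rank i examines bit i of the destination address (most significant bit first) and either continues straight down or takes the cross edge that corrects that bit. A sketch under that assumption (names are illustrative):

#include <stdio.h>

/* Destination-tag routing sketch for a butterfly with n = 2^d processors.
 * After the rank-i switch, bit i of the current column matches the destination,
 * so the route ends in the destination's column at rank d. */
void butterfly_route(unsigned int src, unsigned int dst, int d) {
    unsigned int col = src;
    printf("rank 0: switch (0,%u)\n", col);
    for (int i = 0; i < d; i++) {
        unsigned int mask = 1u << (d - 1 - i);             /* bit i, MSB first */
        const char *edge = ((col ^ dst) & mask) ? "cross" : "straight";
        col = (col & ~mask) | (dst & mask);                /* fix bit i of the column */
        printf("rank %d: switch (%d,%u) via %s edge\n", i + 1, i + 1, col, edge);
    }
}

int main(void) {
    butterfly_route(2, 5, 3);   /* route from processor 2 to processor 5 (n = 8) */
    return 0;
}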
Evaluating Butterfly Network
• Diameter: log n
• Bisection width: n / 2
• Edges per node: 4
• Constant edge length? No
Hypercube
• Direct topology
• 2 × 2 × … × 2 mesh
• Number of nodes is a power of 2
• Node addresses 0, 1, …, 2^k - 1
• Node i connected to the k nodes whose addresses differ from i in exactly one bit position (see the sketch below)
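A short sketch of this addressing rule (the function name is illustrative): the k neighbors of node i are obtained by flipping each of its address bits in turn.

#include <stdio.h>

/* List the neighbors of node i in a k-dimensional hypercube
 * by flipping each of the k address bits. */
void hypercube_neighbors(unsigned int i, int k) {
    for (int bit = 0; bit < k; bit++)
        printf("node %u <-> node %u (differ in bit %d)\n", i, i ^ (1u << bit), bit);
}

int main(void) {
    hypercube_neighbors(5, 4);   /* 0101 connects to 0100, 0111, 0001, 1101 */
    return 0;
}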
Hypercube Addressing
[Figure: 4-dimensional hypercube with 16 nodes addressed 0000 through 1111; each node links to the four nodes whose addresses differ in one bit]
Evaluating Hypercube Network
• Diameter: log n
• Bisection width: n / 2
• Edges per node: log n
• Constant edge length? No
Shuffle-exchange
• Direct topology
• Number of nodes is a power of 2
• Nodes have addresses 0, 1, …, 2^k - 1
• Two outgoing links from node i (see the sketch below)
– Shuffle link to node LeftCycle(i), the left cyclic rotation of i's k address bits
– Exchange link to node xor(i, 1), i.e., i with its lowest bit flipped
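A short sketch of both link rules (function names are illustrative), printed here for the 8-node network shown in the next figure:

#include <stdio.h>

/* Shuffle: left cyclic rotation of the k-bit address. */
unsigned int shuffle(unsigned int i, int k) {
    unsigned int high = (i >> (k - 1)) & 1u;          /* bit that wraps around */
    return ((i << 1) | high) & ((1u << k) - 1u);
}

/* Exchange: flip the least significant bit. */
unsigned int exchange(unsigned int i) {
    return i ^ 1u;
}

int main(void) {
    int k = 3;                                        /* 8 nodes, as in the figure */
    for (unsigned int i = 0; i < (1u << k); i++)
        printf("node %u: shuffle -> %u, exchange -> %u\n",
               i, shuffle(i, k), exchange(i));
    return 0;
}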
Shuffle-exchange Illustrated
[Figure: shuffle and exchange links among 8 nodes numbered 0 through 7]
Shuffle-exchange Addressing
[Figure: 16-node shuffle-exchange network with 4-bit node addresses 0000 through 1111]
Evaluating Shuffle-exchange
• Diameter: 2 log n - 1
• Bisection width: n / log n
• Edges per node: 2
• Constant edge length? No
Comparing Networks
• All have logarithmic diameter except the 2-D mesh (see the numeric sketch below)
• Hypertree, butterfly, and hypercube have bisection width n / 2
• All have a constant number of edges per node except the hypercube
• Only 2-D mesh keeps edge lengths constant as network size increases
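Plugging a concrete machine size into the diameter and bisection-width expressions from the preceding slides makes the comparison tangible. A small sketch for n = 64 processors (the choice of n is arbitrary; formulas are the ones quoted above):

#include <stdio.h>
#include <math.h>

int main(void) {
    double n = 64;
    double lg = log2(n);

    printf("n = %.0f processors\n", n);
    printf("2-D mesh:         diameter ~ %.0f, bisection ~ %.0f\n",
           2 * (sqrt(n) - 1), sqrt(n));
    printf("binary tree:      diameter = %.0f, bisection = 1\n", 2 * lg);
    printf("4-ary hypertree:  diameter = %.0f, bisection = %.0f\n", lg, n / 2);
    printf("butterfly:        diameter = %.0f, bisection = %.0f\n", lg, n / 2);
    printf("hypercube:        diameter = %.0f, bisection = %.0f\n", lg, n / 2);
    printf("shuffle-exchange: diameter = %.0f, bisection ~ %.0f\n",
           2 * lg - 1, n / lg);
    return 0;
}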