Mimd

Submitted to : Submitted by:

Ms Simarpreet Kiran, A06

Lec : Computer Kh. Loyanganba Meitei, 42

System and Architecture Bca Hons, D1111

*

2

*

*Types of shared memory

*Physically shared memory

*Virtual (or distributed) shared memory

*Scalability issues

*Organisation of memory

*Design of interconnection network

*Cache coherence protocols

Multi-processor:

Structure of Shared Memory MIMD

Architectures

Shared memory

computers

Single address space

memory access

Interconnection

scheme

Cache coherency

Virtual shared

memory

NUMA

CC-NUMA

COMA

Shared pathSwitching

network

Singled bus

based

Multiple bus

based

Bus

multiplication

Grid of buses

Hierarchical

system

CrossbarMultistage network

Hardware

basedSoftware based

Omega Banyan Benes

Physical shared

memory UMA

Design space of shared memory computers

*

*

*Also called distributed shared memory architecture

*The local memories of multi-computer are

components of global address space:

*any processor can access the local memory of any other

processor

*Three approaches:

*Non-uniform memory access (NUMA) machines

*Cache-only memory access (COMA) machines

*Cache-coherent non-uniform memory access

(CC-NUMA) machines

*

*

*Logically shared memory is physically

distributed

*Different access of local and remote memory

blocks. Remote access takes much more time

– latency

*Sensitive to data and program distribution

*Close to distributed memory systems, yet the

programming paradigm is different

*Example: Cray T3D

*

*

*

*Each block of the shared memory works as

local cache of a processor

*Continuous, dynamic migration of data

*Hit-rate decreases the traffic on the

Interconnection Network

*Solutions for data-consistency increase the

same traffic (see cache coherency problem

later)

*Examples: KSR-1, DDM

*

*

*A combination of NUMA and COMA

*Initially static data distribution, then

dynamic data migration

*Cache coherency problem is to be solved

*COMA and CC-NUMA are used in newer

generation of parallel computers

*Examples: Convex SPP1000, Stanford DASH,

MIT Alewife

*

+ No need to partition data or program, uniprocessor

programming techniques can be adapted

+ Communication between processor is efficient

+ Minor modifications of tool chain and compiler

- Synchronized access to share data in memory needed.

Synchronizing constructs (semaphores, conditional critical

regions, monitors) result in nondeterministic behavior which

can lead programming errors that are difficult to discover

- Lack of scalability due to (memory) contention problem

*

*Memory Access Time

*can be a bottleneck even in a single-processor system

*Contention for Memory

*two or more processors want to access a location in the same

block at the same time (hot spot problem).

*Contention for Communication

*processors should share and use exclusively elements of the

Interconnection Network

*Result: long latency-time, idle processors,

nonscalable system

*

*Problems of scalable computers

1. Tolerate and hide latency of remote loads

2. Tolerate and hide idling due to synchronization

*Solutions

1. Cache memory

* problem of cache coherence

2. Prefetching

3. Threads and fast context switching

*

Thanks....

Documents

Mimd