Upload
kiran-mangrulia
View
214
Download
4
Tags:
Embed Size (px)
Citation preview
Submitted to : Submitted by:
Ms Simarpreet Kiran, A06
Lec : Computer Kh. Loyanganba Meitei, 42
System and Architecture Bca Hons, D1111
*
2
*
*Types of shared memory
*Physically shared memory
*Virtual (or distributed) shared memory
*Scalability issues
*Organisation of memory
*Design of interconnection network
*Cache coherence protocols
Multi-processor:
Structure of Shared Memory MIMD
Architectures
Shared memory
computers
Single address space
memory access
Interconnection
scheme
Cache coherency
Virtual shared
memory
NUMA
CC-NUMA
COMA
Shared pathSwitching
network
Singled bus
based
Multiple bus
based
Bus
multiplication
Grid of buses
Hierarchical
system
CrossbarMultistage network
Hardware
basedSoftware based
Omega Banyan Benes
Physical shared
memory UMA
Design space of shared memory computers
*
*
*Also called distributed shared memory architecture
*The local memories of multi-computer are
components of global address space:
*any processor can access the local memory of any other
processor
*Three approaches:
*Non-uniform memory access (NUMA) machines
*Cache-only memory access (COMA) machines
*Cache-coherent non-uniform memory access
(CC-NUMA) machines
*
*
*Logically shared memory is physically
distributed
*Different access of local and remote memory
blocks. Remote access takes much more time
– latency
*Sensitive to data and program distribution
*Close to distributed memory systems, yet the
programming paradigm is different
*Example: Cray T3D
*
*
*
*Each block of the shared memory works as
local cache of a processor
*Continuous, dynamic migration of data
*Hit-rate decreases the traffic on the
Interconnection Network
*Solutions for data-consistency increase the
same traffic (see cache coherency problem
later)
*Examples: KSR-1, DDM
*
*
*A combination of NUMA and COMA
*Initially static data distribution, then
dynamic data migration
*Cache coherency problem is to be solved
*COMA and CC-NUMA are used in newer
generation of parallel computers
*Examples: Convex SPP1000, Stanford DASH,
MIT Alewife
*
+ No need to partition data or program, uniprocessor
programming techniques can be adapted
+ Communication between processor is efficient
+ Minor modifications of tool chain and compiler
- Synchronized access to share data in memory needed.
Synchronizing constructs (semaphores, conditional critical
regions, monitors) result in nondeterministic behavior which
can lead programming errors that are difficult to discover
- Lack of scalability due to (memory) contention problem
*
*Memory Access Time
*can be a bottleneck even in a single-processor system
*Contention for Memory
*two or more processors want to access a location in the same
block at the same time (hot spot problem).
*Contention for Communication
*processors should share and use exclusively elements of the
Interconnection Network
*Result: long latency-time, idle processors,
nonscalable system
*
*Problems of scalable computers
1. Tolerate and hide latency of remote loads
2. Tolerate and hide idling due to synchronization
*Solutions
1. Cache memory
* problem of cache coherence
2. Prefetching
3. Threads and fast context switching
*
Thanks....