Parallel Computers: Organizations and Architecture
Department of Computer Science, Southern Illinois University Edwardsville
Summer, 2015
Dr. Hiroshi Fujinoki
CS 312 Computer Organization and Architecture
Mult_Sched/001
Four hardware architectures for “parallel computers”
Tightly-Coupled Multi-Processor System
Functionally-Specialized Multi-Processor System
Loosely-Coupled Multi-Processor System
Distributed Systems (“most loosely coupled systems”)
Mult_Sched/002
Tightly-Coupled Multi-Processor System
• Multi-Processor System (multi-processor motherboard)
• Single-Processor System with a multi-core processor
[Figure: a multi-processor system (two processors) vs. a single-processor system with a multi-core processor, where each processor core contains an ALU and other units]
[Photo: two processors on a motherboard]
[Photo: single-processor system with a multi-core processor; labeled parts: CPU cores, motherboard, graphics interface, video RAM (“VRAM”)]
Mult_Sched/003
Functionally-Specialized Multi-Processor System
Examples:
• GPU on a graphics card
• Built-in processor on high-speed disk controllers or NICs (especially those using DMA)
[Figure: processor, graphics card (GPU and DAC), and monitor (CRT, flat panel)]
(1) The processor sends graphics commands to the GPU.
(2) The GPU processes image data in the graphics-card memory.
(3) The graphics card performs D/A conversion using the DAC.
(4) The graphics card sends analog image signals (RGB signals) to the monitor.
(GPU = “Graphics Processing Unit”)
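The division of labor above (the processor issues commands, the specialized processor does the work in its own memory) can be modeled in a few lines. This is a hypothetical sketch, not a real GPU API: a worker thread plays the GPU, a `queue.Queue` carries the graphics commands, and a list of rows stands in for graphics-card memory. The names `gpu_worker`, `render`, and the `"set_pixel"` command are illustrative only.

```python
import queue
import threading

# Hypothetical model, not a real GPU API: a worker thread plays the GPU and
# a list of rows plays the graphics-card memory (frame buffer).
WIDTH, HEIGHT = 8, 4

def gpu_worker(commands: queue.Queue, frame_buffer: list) -> None:
    """Execute drawing commands until the None sentinel arrives."""
    while True:
        cmd = commands.get()
        if cmd is None:  # sentinel: the processor has no more commands
            break
        op, x, y, value = cmd
        if op == "set_pixel":
            frame_buffer[y][x] = value

def render() -> list:
    frame_buffer = [[0] * WIDTH for _ in range(HEIGHT)]
    commands: queue.Queue = queue.Queue()
    gpu = threading.Thread(target=gpu_worker, args=(commands, frame_buffer))
    gpu.start()
    # The processor only issues commands; the "GPU" thread does the pixel work.
    for x in range(WIDTH):
        commands.put(("set_pixel", x, 0, 255))
    commands.put(None)
    gpu.join()  # wait until the GPU has drained the command queue
    return frame_buffer

if __name__ == "__main__":
    print(render()[0])  # [255, 255, 255, 255, 255, 255, 255, 255]
```

Once the commands are queued, the processor is free for other work; in real hardware the DAC stage would then read the frame buffer and drive the monitor.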
[Photo: DMA SCSI I/O card with its own CPU and control program (in ROM)]
Mult_Sched/004
Loosely-Coupled Multi-Processor System
• Multi-Systemboard (multiple motherboard) computers
[Figure: a computer system whose system boards (motherboards), each with its own processor and memory, are connected by a “bus”]
• A computer with multiple motherboards (“blades”)
• Blades communicate through the bus
• Each blade is a computer
• Communication delay over the bus: at least on the order of µs (microseconds)
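Because each blade is a separate computer, blades do not share memory; data moves between them only as messages over the bus. The sketch below is a hypothetical model of that idea: threads stand in for blades, a private dict stands in for each blade's local memory, and a `queue.Queue` stands in for the bus. (Python threads actually share one address space; the model simply never uses that.) The names `blade_a`, `blade_b`, and `run_blades` are illustrative.

```python
import queue
import threading

# Hypothetical model: each "blade" keeps a private dict as its local memory
# and exchanges data only by putting copies of values on the shared bus queue.

def blade_a(bus: queue.Queue) -> None:
    local_memory = {"x": 21}
    bus.put(("x", local_memory["x"]))  # send a copy of the value, not memory

def blade_b(bus: queue.Queue, out: dict) -> None:
    local_memory: dict = {}
    name, value = bus.get()            # block until blade A's message arrives
    local_memory[name] = value * 2
    out.update(local_memory)           # report B's memory for inspection

def run_blades() -> dict:
    bus: queue.Queue = queue.Queue()
    out: dict = {}
    b = threading.Thread(target=blade_b, args=(bus, out))
    a = threading.Thread(target=blade_a, args=(bus,))
    b.start()
    a.start()
    a.join()
    b.join()
    return out

print(run_blades())  # {'x': 42}
```

In real hardware every `bus.put`/`bus.get` pair would pay the bus communication delay, which is why loosely-coupled systems favor exchanging data infrequently.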
Mult_Sched/005
Distributed Systems (“most loosely coupled systems”)
[Figure: four systems, AS 1 to AS 4, connected by a network]
• Each system has its own processor, local memory, secondary storage, and other I/O
• Process migration: a process (executable code) moves from one system to another over the network
• Data migration: a file (data) moves from one system to another over the network
Mult_Sched/006
Three different types of tightly-coupled multi-processor systems
(1) “Fine-grained” multi-processor parallel computers
(2) “Medium-grained” multi-processor parallel computers
(3) “Coarse-grained” multi-processor parallel computers
Mult_Sched/007
Fine-Grained Multi-Processing
• Fine-grained = instruction-level multi-processing
Your program (binary executable):
  CPU 1: A = B + C;
  CPU 2: X = Y + Z;
  (synchronization)
  W = A + X;   (dependency: needs both A and X)
Granularity: 1~20 instructions
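The dependency in the example above can be found mechanically. A minimal sketch, assuming each instruction is written as a hypothetical `(destination, sources)` tuple and that only read-after-write (true) dependencies matter; a real instruction scheduler must also handle write-after-read and write-after-write hazards. Instructions placed in the same step can go to different CPUs, and moving to the next step corresponds to a synchronization point.

```python
# Each instruction: (destination, (source1, source2)), as in the example above.
PROGRAM = [
    ("A", ("B", "C")),
    ("X", ("Y", "Z")),
    ("W", ("A", "X")),
]

def schedule_levels(program):
    """Group instructions into steps so that no instruction reads a value
    produced in its own step (read-after-write dependencies only)."""
    ready_at = {}   # variable -> first step in which its new value is usable
    levels = []
    for dest, sources in program:
        # Earliest step in which every source value is available.
        step = max((ready_at.get(s, 0) for s in sources), default=0)
        while len(levels) <= step:
            levels.append([])
        levels[step].append(dest)
        ready_at[dest] = step + 1
    return levels

print(schedule_levels(PROGRAM))  # [['A', 'X'], ['W']]
```

Step 0 holds the two independent additions (one per CPU); `W = A + X` must wait for step 1.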
Mult_Sched/008
Medium-Grained Multi-Processing
• Medium-grained = thread-level multi-processing
[Figure: your program (binary executable) divided into Thread A, Thread B, Thread C, and Thread D, which run on multiple processors]
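Threads of one program share that program's memory, which is what makes thread-level work a good fit for tightly-coupled systems. A minimal sketch using Python's standard `threading` module: several threads update one shared counter, and a lock keeps concurrent updates from being lost. The function name `run_threads` is illustrative. Note that in CPython the global interpreter lock keeps these threads from executing Python bytecode truly in parallel; the sketch shows the structure (shared memory plus synchronization), not a speedup.

```python
import threading

def run_threads(n_threads: int = 4, increments: int = 10_000) -> int:
    """Run several threads of one program that all update the same variable."""
    shared = {"count": 0}     # one copy, visible to every thread of the program
    lock = threading.Lock()   # keeps concurrent updates from being lost

    def worker() -> None:
        for _ in range(increments):
            with lock:
                shared["count"] += 1

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared["count"]

print(run_threads())  # 40000
```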
Mult_Sched/009
Medium-Grained Multi-Processing
• Example: Web Browser
Thread A -- display thread (text output and JPEG image processing)
Thread B -- user input (edit boxes and radio buttons in the browser window)
Thread C -- network input (receiving data from the network)
Thread D -- network output (sending data to the network)
[Figure: browser activities for Threads A to D: receiving data, displaying data, user makes inputs, receiving data, transmitting data]
Mult_Sched/010
[Figure: the same activities (receiving data, displaying data, user inputs, transmitting data) handled concurrently by Threads A to D]
Browser execution with better responses
Granularity: 20~200 instructions
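The "better responses" come from letting threads overlap: the display thread can render each piece of the page as soon as the network-input thread hands it over, instead of waiting for the whole page. A hypothetical sketch of just Threads C and A (Threads B and D are omitted), with a `queue.Queue` as the hand-off buffer; the function name `browse` and the string chunks are illustrative. Requires Python 3.8+ for the `:=` operator.

```python
import queue
import threading

def browse(chunks):
    """Hypothetical browser model: Thread C (network input) receives chunks
    while Thread A (display) renders each one as soon as it arrives."""
    received: queue.Queue = queue.Queue()
    displayed = []

    def network_input():            # Thread C
        for chunk in chunks:
            received.put(chunk)     # hand each chunk over immediately
        received.put(None)          # sentinel: page fully received

    def display():                  # Thread A
        while (chunk := received.get()) is not None:
            displayed.append(f"rendered:{chunk}")

    threads = [threading.Thread(target=network_input),
               threading.Thread(target=display)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return displayed

print(browse(["header", "text", "image.jpg"]))
# ['rendered:header', 'rendered:text', 'rendered:image.jpg']
```

With one producer and one consumer, the FIFO queue preserves the order in which chunks arrived.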
Mult_Sched/011
Coarse-Grained Multi-Processing
• Coarse-grained = process-level multi-tasking
Process assignment to multiple processors in a multi-tasking environment
[Figure: processes in memory assigned to processors over time]
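One simple way to assign processes to a set of processors is greedy list scheduling: take the processes in order and give each one to whichever processor becomes free earliest. This is an assumed policy for illustration, not the assignment method of these slides; real operating-system schedulers also preempt, prioritize, and migrate processes. The function name `assign` and the `(name, run_time_ms)` format are hypothetical.

```python
import heapq

def assign(processes, n_processors):
    """Greedy list scheduling: give each process (name, run_time_ms), in
    order, to the processor that becomes free earliest.
    Returns {processor_id: [(name, start_ms, end_ms), ...]}."""
    free_at = [(0, p) for p in range(n_processors)]  # (time free, processor id)
    heapq.heapify(free_at)
    schedule = {p: [] for p in range(n_processors)}
    for name, run_time in processes:
        start, proc = heapq.heappop(free_at)          # earliest-free processor
        end = start + run_time
        schedule[proc].append((name, start, end))
        heapq.heappush(free_at, (end, proc))
    return schedule

print(assign([("P1", 30), ("P2", 10), ("P3", 20), ("P4", 5)], n_processors=2))
# {0: [('P1', 0, 30), ('P4', 30, 35)], 1: [('P2', 0, 10), ('P3', 10, 30)]}
```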
Mult_Sched/012
[Figure: processes in memory assigned to a processor pool over time]
Granularity = ms order
• 1 ms (@ 1 GHz) = 1 million instructions (assuming one instruction per clock cycle)
• 100 ms (@ 1 GHz) = 100M instructions
Granularity: 1~100M instructions
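The instruction counts above follow from instructions = time × clock rate × instructions per cycle. A minimal sketch of that arithmetic; the function name `instructions_in` is illustrative, and the default of one instruction per cycle is the same simplifying assumption the figures above rely on.

```python
def instructions_in(duration_s: float, clock_hz: float, ipc: float = 1.0) -> int:
    """Instructions executed in duration_s seconds at clock_hz, assuming
    `ipc` instructions complete per clock cycle (1.0 by default)."""
    return round(duration_s * clock_hz * ipc)

GHZ = 1_000_000_000
print(instructions_in(1e-3, GHZ))    # 1000000   (1 ms @ 1 GHz)
print(instructions_in(100e-3, GHZ))  # 100000000 (100 ms @ 1 GHz)
```

`round` absorbs the small floating-point error in values such as `1e-3`, so the results come out as exact integers.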