Parallel Processing
• Large class of techniques used to provide simultaneous data processing tasks
• Purpose: Increase computational speed of the computer
• A parallel processing system is able to process multiple tasks simultaneously
Parallel Processing
• While one instruction executes in the ALU, the next instruction is read from memory
• 2 or more ALUs, 2 or more processors
• Speedup and throughput - the amount of processing that can be done in a given amount of time (a worked example follows below)
• As the amount of hardware increases, cost increases and complexity increases
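The speedup/throughput trade-off can be made concrete with a small worked example. All the numbers below (task count, serial time, coordination overhead factor) are invented for illustration; the overhead factor reflects the point that more hardware also brings more cost and complexity:

```python
# Throughput: the amount of processing done in a given amount of time.
tasks = 1000        # units of work (assumed)
t_serial = 50.0     # seconds on one processor (assumed)

processors = 4
overhead = 1.25     # assumed coordination cost of the extra hardware
t_parallel = t_serial / processors * overhead   # 15.625 s

print("serial throughput:  ", tasks / t_serial)     # 20.0 tasks/s
print("parallel throughput:", tasks / t_parallel)   # 64.0 tasks/s
print("speedup:", t_serial / t_parallel)            # 3.2 (< 4 due to overhead)
```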
Parallel Processing
• Viewed at various levels of complexity
• Lowest - distinguish between serial and parallel load registers
• Higher level - multiple functional units (FUs)
– Arithmetic
• Adder-subtractor, integer multiplier
– Logic
• Logic unit, incrementer, shifter
– Floating point
• Add-subtract, multiply, divide
Parallel Processing Classification
• Internal organization of processors
• Interconnection structure between processors
• Flow of information through the system
• Organization of computer system by number of instructions and data items that are manipulated simultaneously
Classifications
• Normal operation of a computer is to fetch instructions from memory, then execute them in the processor
• The sequence of instructions read from memory is the instruction stream
• The operations performed on the data in the processor are the data stream
• Parallel processing may occur in the instruction stream, in the data stream, or in both
4 Major Groups
• SISD - Single Instruction, Single Data
• SIMD - Single Instruction, Multiple Data
• MISD - Multiple Instruction, Single Data
• MIMD - Multiple Instruction, Multiple Data
SISD
• Single computer containing a
– Control Unit
– Processing Unit
– Memory Unit
• Instructions are executed sequentially
• System may or may not have internal parallel processing capabilities
– Multiple FUs or pipelining
SIMD
• Organization including many processing units under the supervision of a common control unit
• All processors receive the same instruction from the control unit
• Operate on different items of data
• Shared memory unit must contain multiple modules so that it can communicate with all processors simultaneously
• Example: array processor (a software analogy follows below)
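As a software analogy (not the array-processor hardware itself), a vectorized NumPy operation expresses the SIMD idea: one instruction applied to many data items at once. This assumes NumPy is available:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([10.0, 20.0, 30.0, 40.0])

# SISD style: the add instruction is issued once per data item.
sisd = [float(a[i] + b[i]) for i in range(len(a))]

# SIMD style: a single "add" is expressed over all elements at once;
# NumPy can dispatch it to vector hardware where available.
simd = a + b

print(sisd)   # [11.0, 22.0, 33.0, 44.0]
print(simd)   # [11. 22. 33. 44.]
```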
MISD
• Only of theoretical interest
MIMD
• Computer system capable of processing several programs at the same time
• Most multiprocessor and multicomputer systems are in this category
• Flynn’s classification depends on the distinction between the performance of the control unit and the data processing unit
• Emphasizes behavioral characteristics of the computer system rather than its operational structures and interconnections
Pipelining
• Pipelining does not fit into Flynn’s parallel processing classification scheme
• Only 2 of the 4 categories are commonly used: SIMD and MIMD
Multiprocessors
• A multiprocessor system is an interconnection of 2 or more CPUs with memory and input-output equipment
• ‘Processor’ in multiprocessor can mean either a central processing unit (CPU) or an input-output processor (IOP)
• A system with a single CPU and multiple IOPs is usually not considered a multiprocessor
Multiprocessors / Multicomputers
• Both support concurrent operations
• Computers are interconnected with each other by means of communication lines to form a computer network
– Consists of several autonomous computers that may or may not communicate with each other
• A multiprocessor system is controlled by one operating system that provides interaction between processors; all components in the system cooperate to solve the problem at hand
Multiprocessors
• Major motivation for using microprocessors - cheap, small
• VLSI helps make it possible too
• Improves reliability
– Like mutual funds, risk is spread across units; some loss of efficiency
• Benefits
– Improved system performance
– Computations can proceed in parallel in 2 ways
• Multiple independent jobs run in parallel
• A single job can be partitioned into multiple parallel tasks
Multiprocessors
• Overall functions can be partitioned into several tasks
• System tasks can be allocated to specialized processors
– Designed for optimal performance
– Example: One processor performs standard tasks for an industrial process while others sense and control various parameters such as temperature and flow rate
– Example: One processor takes care of high-speed floating point operations while another handles standard operations and tasks
Performance Improvement
• Decompose the problem into multiple discrete tasks
• The user can explicitly direct the computer to split tasks (a sketch follows below)
• Provide a compiler that automatically detects when parts of a program can be split
– Parallelizing compiler
• Multiprocessors are classified by the way memory is organized
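A minimal sketch of explicit task splitting, using Python's multiprocessing to partition one job (a large sum) into independent tasks; the chunking scheme and the partial_sum helper are illustrative, not from the slides:

```python
from multiprocessing import Pool

def partial_sum(chunk):
    """One discrete task: sum a slice of the data."""
    return sum(chunk)

if __name__ == "__main__":
    data = list(range(100_000))
    n_tasks = 4
    step = len(data) // n_tasks
    chunks = [data[i * step:(i + 1) * step] for i in range(n_tasks)]

    # Explicitly direct the machine to run the tasks in parallel,
    # one worker process per chunk.
    with Pool(processes=n_tasks) as pool:
        results = pool.map(partial_sum, chunks)

    print(sum(results))  # same answer as sum(data), computed in parallel
```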
Tightly Coupled
• A multiprocessor system with common shared memory
– Called a shared-memory or tightly coupled multiprocessor
• Does not preclude each processor from having its own local memory
• Most commercial tightly coupled systems provide cache memory for each CPU
• In addition, a global common memory is provided that all CPUs can access
Loosely Coupled
• Distributed memory = Loosely coupled
• Each processing element (PE) in a loosely coupled system has its own local memory
• Processors are tied together by a switching scheme designed to route information between processors through a message-passing scheme
• Programs and data are relayed in packets consisting of an address, the data, and error detection codes (a sketch of such a packet follows below)
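A minimal sketch of such a packet, assuming a toy byte-sum error detection code; the field names and the Packet, make_packet, and is_intact helpers are illustrative:

```python
from dataclasses import dataclass

def checksum(payload: bytes) -> int:
    """Toy error-detection code: sum of payload bytes modulo 256."""
    return sum(payload) % 256

@dataclass
class Packet:
    dest: int        # address of the destination PE
    payload: bytes   # program or data being relayed
    check: int       # error detection code

def make_packet(dest: int, payload: bytes) -> Packet:
    return Packet(dest, payload, checksum(payload))

def is_intact(pkt: Packet) -> bool:
    # The receiver recomputes the code to detect corruption in transit.
    return checksum(pkt.payload) == pkt.check

pkt = make_packet(dest=3, payload=b"result: 42")
assert is_intact(pkt)
```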
Loosely Coupled
• Packets are either destined for a specific processor or grabbed by the first processor that finds them, depending on the communication system design
• Most efficient when interaction between tasks is minimal
• Tightly coupled systems can tolerate a higher degree of interaction between tasks
Interconnection Structures
• The components forming a multiprocessor are
– CPUs
– IOPs
– A memory unit (may be partitioned into separate modules)
• Interconnections can have different physical configurations
– Depending on the number of transfer paths available between processors and memory in a shared memory system
– Depending on the number of transfer paths among PEs in a loosely coupled system
Physical Forms
• Time-Shared Common Bus
• Multiported Memory
• Crossbar Switch
• Multistage Switching Network
• Hypercube System
Time-Shared Common Bus
• N processors connected through a common bus to a memory unit
• Only 1 processor can access (communicate with) the memory unit or another processor at a time
• Transfer operations are conducted by the processor that is in control of the bus
• Other processors must wait, checking the bus for availability
• A command is issued to inform the destination that communication is requested
– What operation, and from where
• The destination responds and the transfer begins
Common Bus
• Bus contention
• Resolved by including a bus controller (a software sketch follows below)
– Priorities
• Restricted to a single transfer at a time
– When one processor is transferring to/from memory, the other processors are either busy with internal processing or idle, waiting
• The system's overall transfer rate is limited by the speed of the bus
• Multiple buses are possible, but you pay a penalty ($$)
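A minimal software sketch of the single-transfer-at-a-time property: a lock stands in for the bus, so only the thread holding it may transfer while the others idle waiting. Priority arbitration by the bus controller is omitted; the memory layout is invented for illustration:

```python
import threading

bus = threading.Lock()    # the single shared bus
memory = [0] * 8          # the common memory unit

def processor(pid: int):
    for step in range(3):
        # Only the processor in control of the bus may transfer;
        # the others block here, idle, waiting for availability.
        with bus:
            memory[pid % len(memory)] += 1   # one transfer at a time

threads = [threading.Thread(target=processor, args=(p,)) for p in range(4)]
for t in threads: t.start()
for t in threads: t.join()
print(memory)   # [3, 3, 3, 3, 0, 0, 0, 0]
```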
Dual Buses
• Not more economical
• Local buses, local memory
• The system bus controller is the big coordinator
• Local memory can be cache memory
– Coherency problems possible
Multiported Memory
• Separate buses between each memory module (MM) and each processor
• Each processor bus is connected to each MM
• A processor bus consists of
– Address
– Data
– Control lines
• Each MM has 4 ports, 1 for each bus
Multiported Memory
• The MM must have internal logic to determine which bus has control
• Fixed priorities are assigned to each memory port (1, 2, 3, 4), as in the sketch below
• Advantage: high transfer rate
• Disadvantages:
– Expensive memory control logic
– Many cables and connectors
• Usually only appropriate for a small number of processors
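A minimal sketch of fixed-priority port arbitration inside one memory module; the grant helper and the 4-port assumption mirror the slide's example but are otherwise illustrative:

```python
def grant(requests):
    """Fixed-priority arbitration for a multiport memory module.

    requests: one boolean per port, True if that port's processor is
    requesting the module this cycle. Port 0 has the highest priority.
    """
    for port, requesting in enumerate(requests):
        if requesting:
            return port      # highest-priority requester wins
    return None              # no requests this cycle

print(grant([False, True, False, True]))  # -> 1 (port 1 beats port 3)
```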
Crossbar Switch
• Crosspoints placed at intersections of processor buses and memory buses
• See figure 13-4 on page 495
• Each switch determines the path (control logic)
– Examines the address on the bus
– Resolves conflicts by a predetermined, hardcoded priority
• See figure 13-5 on page 495
– Data flows in both directions
– Multiplexers select the data (remember select lines??)
Crossbar Switch
• Supports simultaneous transfers from all MMs
– A separate path is associated with each MM
• Hardware can be large and complex
• Number of switches needed is Processors × MMs (see the worked example below)
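Since one switch sits at every processor-bus/memory-bus intersection, the switch count grows with the product of the two, which is why large crossbars get expensive. A quick illustration:

```python
def crosspoints(processors: int, memory_modules: int) -> int:
    # One switch at every processor-bus / memory-bus intersection.
    return processors * memory_modules

for n in (4, 16, 64):
    print(f"{n} x {n} crossbar needs {crosspoints(n, n)} switches")
# 4 x 4 crossbar needs 16 switches
# 16 x 16 crossbar needs 256 switches
# 64 x 64 crossbar needs 4096 switches
```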
Multistage Switching Network
• The basic component is a 2-input, 2-output interchange switch
• See figure 13-6 on page 496
• The switch can arbitrate between conflicting requests
• Can be used to build a switching network (a model of the switch follows below)
• See figure 13-7 on page 497
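A minimal model of the interchange switch's two basic settings; a real interchange switch may also support broadcast settings, which are omitted here:

```python
def interchange(control: int, in0, in1):
    """2-input, 2-output interchange switch.

    control = 0: straight  (in0 -> out0, in1 -> out1)
    control = 1: exchange  (in0 -> out1, in1 -> out0)
    """
    return (in0, in1) if control == 0 else (in1, in0)

print(interchange(0, "A", "B"))  # ('A', 'B')  straight through
print(interchange(1, "A", "B"))  # ('B', 'A')  crossed
```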
Patterns & Omega
• Not all patterns are always available to all processors
• If P1 is accessing 0xx, then P2 can only access 1xx
• Used in both tightly and loosely coupled systems
• Omega Switching Network - see figure 13-8 on page 498
– Exactly 1 path from each source to each MM
– Some patterns cannot be connected simultaneously (e.g., 000 and 001)
• 1 switch carries 1 signal at a time (routing is sketched below)
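A minimal sketch of the destination-tag routing rule used in omega networks: at each stage, one bit of the destination address (most significant bit first) selects the upper or lower switch output. The omega_route helper is illustrative:

```python
def omega_route(dest: int, stages: int):
    """Destination-tag routing through an omega network.

    At stage i the switch examines one destination bit (MSB first):
    0 selects the upper switch output, 1 selects the lower output.
    """
    path = []
    for i in reversed(range(stages)):      # walk the bits MSB -> LSB
        bit = (dest >> i) & 1
        path.append("upper" if bit == 0 else "lower")
    return path

# Route to MM 5 (binary 101) through a 3-stage (8x8) omega network:
print(omega_route(5, 3))   # ['lower', 'upper', 'lower']
```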
Omega Network
• Tightly coupled systems
– Sources - processors
– Destinations - MMs
• Loosely coupled systems
– Source - a processor
– Destination - a processor
Hypercube
• Hypercube or binary n-cube
• Loosely coupled system composed of N = 2^n processors interconnected in an n-dimensional binary cube
• Each node contains a CPU, local memory, and I/O interfaces
• Direct communication paths to n other nodes (1 hop)
• There are 2^n distinct n-bit binary addresses to be assigned to the processors
• Each neighboring processor's address differs in exactly 1 bit position
• See figure 13-9 on page 499
Routing Messages
• A message takes from 1 to n hops (the maximum from source to destination)
• Routing procedure (sketched below)
– XOR the source and destination addresses
• The result shows on which axes the addresses differ
– Send along any indicated axis
– Repeat until arrival at the destination
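A minimal sketch of the XOR routing procedure, assuming node addresses are plain integers; the route helper and the bit trick for choosing an axis are illustrative:

```python
def route(src: int, dest: int):
    """XOR routing in an n-cube: flip one differing address bit per hop."""
    path = [src]
    node = src
    while node != dest:
        diff = node ^ dest                        # axes where addresses differ
        axis = (diff & -diff).bit_length() - 1    # pick the lowest such axis
        node ^= 1 << axis                         # one hop along that axis
        path.append(node)
    return path

# In a 3-cube, the neighbors of node 000 differ in exactly 1 bit:
print([format(1 << i, "03b") for i in range(3)])        # ['001', '010', '100']

# Routing 010 -> 111 takes at most n = 3 hops:
print([format(n, "03b") for n in route(0b010, 0b111)])  # ['010', '011', '111']
```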